Make Append Cost aware of some run time partition prune case

Started by Andy Fanabout 5 years ago6 messages

zhihui.fan1213@gmail.com

about 5 years ago

2 attachment(s)

Currently the cost model of append path sums the cost/rows for all the
subpaths, it usually works well until we run into the run-time partition
prune
case. The first result is that generic plans will rarely be used for some
cases.
For instance, SELECT * FROM p WHERE pkey = $1; The custom plan will only
count the cost of one partition, however generic plan will count the cost
for all the
partitions even we are sure that only 1 partition will survive. Another
impact
is that planners may choose a wrong plan. for example, SELECT * FROM t1,
p
WHERE t1.a = p.pkey; The cost/rows of t1 nest loop p is estimated highly
improperly. This patch wants to help this case to some extent.

The basic idea is we need to estimate how many partitions will survive after
considering the runtime partition prune. After that, we will adjust the
cost/rows accordingly. IIUC, this follows Robert's idea at [1]/messages/by-id/CA+TgmoZHYoAL4HYwnGO25B8CxCB+vNMdf+7rbUzYykR4sU9yUA@mail.gmail.com.
However there are too many special cases which is hard
to handle, but luckily the most common case would be sth like partykey =
$1,
which we can estimate well, this patch is aimed to handle that part
specially.
I supposed 75% partitions will survive for other cases arbitrarily,
actually I
want to use 100% to be the same as the current situation. If we decide to
handle their special case differently, PartitionedRelQual has to be
redefined
a bit (see comments around that.) which is the main data struct in the
implementation.

The attached is the minimum runnable patch. There are some obvious issue
here:

1. MergeAppend path is not handled on purpose, it will be done at last.
2. The cost for Parallel Append Path is not adjusted, I'm not sure about
what is
the best way to do it. So there are 2 test cases in
sql/partition_prune.out
failed due to this, I just echo the 'results' to expected' so that you
can
know which one is impacted.

Here are some simple test results.

create table a1 partition of a for values in (1) partition by range (b, c);
create table a1_1 partition of a1 for values from (1, 0) to (1, 10);
create table a1_2 partition of a1 for values from (2, 0) to (2, 10);

create table a2 partition of a for values in (2) partition by range (b, c);
create table a2_1 partition of a2 for values from (1, 0) to (1, 10);
create table a2_2 partition of a2 for values from (2, 0) to (2, 10);

insert into a select 1, i%2 + 1, i % 10 from generate_series(1, 10000) i;
insert into a select 2, i%2 + 1, i % 10 from generate_series(1, 10000) i;

analyze a;

set plan_cache_mode to force_generic_plan;

prepare s as select * from a where a = $1;
PREPARE
explain execute s(1);
QUERY PLAN
-------------------------------------------------------------------
Append (cost=0.00..231.00 rows=10000 width=12) *<< both rows/cost is
adjusted.*
Subplans Removed: 2
-> Seq Scan on a1_1 a_1 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $1)
-> Seq Scan on a1_2 a_2 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $1)
(6 rows)

prepare s4 as select * from a where a = $1 and b = $2 and c = $3;
PREPARE
explain execute s4(1, 1, 2);
QUERY PLAN
--------------------------------------------------------------------
Append (cost=0.00..120.50 rows=1000 width=12) *<< Here*
Subplans Removed: 3
-> Seq Scan on a1_1 a_1 (cost=0.00..115.50 rows=1000 width=12)
Filter: ((a = $1) AND (b = $2) AND (c = $3))
(4 rows)

prepare s2 as select * from a where a = $1 union all select * from a where
a = $2;
PREPARE
explain execute s2(1, 1);
QUERY PLAN
-------------------------------------------------------------------------
Append (cost=0.00..762.00 rows=20000 width=12)
-> Append (cost=0.00..231.00 rows=10000 width=12) *<< Here*
Subplans Removed: 2
-> Seq Scan on a1_1 a_1 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $1)
-> Seq Scan on a1_2 a_2 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $1)
-> Append (cost=0.00..231.00 rows=10000 width=12) << Here
Subplans Removed: 2
-> Seq Scan on a1_1 a_4 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $2)
-> Seq Scan on a1_2 a_5 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $2)
(13 rows)

prepare s3 as select * from a where a = $1 union select * from a where a =
$2;
PREPARE
explain execute s3(1, 1);
QUERY PLAN
-------------------------------------------------------------------------------
HashAggregate (cost=912.00..1112.00 rows=20000 width=12)
Group Key: a.a, a.b, a.c
-> Append (cost=0.00..762.00 rows=20000 width=12)
-> Append (cost=0.00..231.00 rows=10000 width=12) << Here
Subplans Removed: 2
-> Seq Scan on a1_1 a_1 (cost=0.00..90.50 rows=5000
width=12)
Filter: (a = $1)
-> Seq Scan on a1_2 a_2 (cost=0.00..90.50 rows=5000
width=12)
Filter: (a = $1)
-> Append (cost=0.00..231.00 rows=10000 width=12) << Here
Subplans Removed: 2
-> Seq Scan on a1_1 a_4 (cost=0.00..90.50 rows=5000
width=12)
Filter: (a = $2)
-> Seq Scan on a1_2 a_5 (cost=0.00..90.50 rows=5000
width=12)
Filter: (a = $2)
(15 rows)

-- add a limit to make sure the runtime partitions prune.
explain select * from generate_series(1, 10) i(i) join lateral (select *
from a
where a.a = (i.i % 2 + 1) and a.b = (i.i % 2 + 1) and a.c = (i.i % 10)
limit 10000000) m on true;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2030.10 rows=10000 width=16)
-> Function Scan on generate_series i (cost=0.00..0.10 rows=10 width=4)
-> Limit (cost=0.00..183.00 rows=1000 width=12)
-> Append (cost=0.00..183.00 rows=1000 width=12) << Here
-> Seq Scan on a1_1 a_1 (cost=0.00..178.00 rows=1000
width=12)
Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1))
AND (b = ((i.i % 2) + 1)))
-> Seq Scan on a1_2 a_2 (cost=0.00..178.00 rows=1000
width=12)
Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1))
AND (b = ((i.i % 2) + 1)))
-> Seq Scan on a2_1 a_3 (cost=0.00..178.00 rows=1000
width=12)
Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1))
AND (b = ((i.i % 2) + 1)))
-> Seq Scan on a2_2 a_4 (cost=0.00..178.00 rows=1000
width=12)
Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1))
AND (b = ((i.i % 2) + 1)))
(12 rows)

Any thoughts about this? Thanks

[1]: /messages/by-id/CA+TgmoZHYoAL4HYwnGO25B8CxCB+vNMdf+7rbUzYykR4sU9yUA@mail.gmail.com
/messages/by-id/CA+TgmoZHYoAL4HYwnGO25B8CxCB+vNMdf+7rbUzYykR4sU9yUA@mail.gmail.com

--
Best Regards
Andy Fan

Attachments:

v1-0001-Adjust-Append-Path-cost-model-for-runtime-partiti.patchapplication/octet-stream; name=v1-0001-Adjust-Append-Path-cost-model-for-runtime-partiti.patchDownload

From 41273c8cfa4d954553ad37de49b4805c893134b9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 2 Nov 2020 11:49:24 +0800
Subject: [PATCH v1] Adjust Append Path cost model for runtime partition prune
 case.

Currently the cost of Append Path ignored the runtime partition prune
totally which may cause some troubles.  However it looks like we can
estimated how many partitions will live accurately, but luckily the
most common case would be some case  partykey = $1, which can we estimate
well, this patch is aim to handle that part specially.
---
 src/backend/optimizer/path/costsize.c         | 121 +++++-
 src/backend/optimizer/util/pathnode.c         |   2 +-
 src/backend/partitioning/partprune.c          | 367 +++++++++++++++---
 src/include/nodes/nodes.h                     |   1 +
 src/include/nodes/plannodes.h                 |  28 +-
 src/include/optimizer/cost.h                  |   2 +-
 src/include/partitioning/partprune.h          |   6 +
 src/test/regress/expected/partition_prune.out |  54 +--
 8 files changed, 488 insertions(+), 93 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index f1dfdc1a4a..4c0fb3b65e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -92,6 +92,7 @@
 #include "optimizer/planmain.h"
 #include "optimizer/restrictinfo.h"
 #include "parser/parsetree.h"
+#include "partitioning/partprune.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
 #include "utils/spccache.h"
@@ -2036,14 +2037,76 @@ append_nonpartial_cost(List *subpaths, int numpaths, int parallel_workers)
 	return costarr[max_index];
 }
 
+
+/*
+ * find_partitioned_rel_offset
+ *   Given a relid from a subpath, return the offset in partitioned_rel which its
+ * parent in.
+ */
+static int
+find_partitioned_rel_offset(PlannerInfo *root, List *partitioned_rel,  int relid)
+{
+	int i = 0;
+	ListCell *lc;
+	int parent_relid = -1;
+	if (root->append_rel_array[relid] != NULL)
+		parent_relid = root->append_rel_array[relid]->parent_relid;
+	else
+		return -1;
+	foreach(lc,  partitioned_rel)
+	{
+		Relids relids = (Relids)lfirst(lc);
+		if (bms_is_member(parent_relid, relids))
+			return i;
+		i++;
+	}
+	return -1;
+}
+
+
 /*
  * cost_append
  *	  Determines and returns the cost of an Append node.
  */
 void
-cost_append(AppendPath *apath)
+cost_append(AppendPath *apath, PlannerInfo *root)
 {
 	ListCell   *l;
+	RelOptInfo *rel = apath->path.parent;
+	List	*partitioned_rels = apath->partitioned_rels;
+	double	*part_live_ratio = NULL;
+	int len;
+	bool can_runtime_prune;
+
+	/*
+	 * Check if we can have run-time prune, if so, we should reduce
+	 * some cost because of that.
+	 */
+	can_runtime_prune = enable_partition_pruning
+		&& rel->reloptkind == RELOPT_BASEREL
+		&& partitioned_rels != NIL
+		&& apath->subpaths != NIL
+		/* This is just for easy test, should be removed in the final patch */
+		&& enable_geqo;
+
+
+	if (can_runtime_prune)
+	{
+		int i = 0;
+		ListCell	*lc;
+		len = list_length(partitioned_rels);
+		part_live_ratio = palloc0(sizeof(double) * len);
+
+		foreach(lc, partitioned_rels)
+		{
+			part_live_ratio[i++] = est_runtime_part_prune(root,
+														  rel,
+														  (Relids) lfirst(lc),
+														  apath->path.param_info ?
+														  apath->path.param_info->ppi_clauses : NIL);
+
+		}
+	}
 
 	apath->path.startup_cost = 0;
 	apath->path.total_cost = 0;
@@ -2055,14 +2118,17 @@ cost_append(AppendPath *apath)
 	if (!apath->path.parallel_aware)
 	{
 		List	   *pathkeys = apath->path.pathkeys;
-
+		int			offset;
 		if (pathkeys == NIL)
 		{
 			Path	   *subpath = (Path *) linitial(apath->subpaths);
 
 			/*
 			 * For an unordered, non-parallel-aware Append we take the startup
-			 * cost as the startup cost of the first subpath.
+			 * cost as the startup cost of the first subpath when run time partition
+			 * would not happen. However when run time partition prune happens
+			 * we can't know which partition will survived, so still keep the
+			 * startup as the first path's startup_cost.
 			 */
 			apath->path.startup_cost = subpath->startup_cost;
 
@@ -2070,9 +2136,25 @@ cost_append(AppendPath *apath)
 			foreach(l, apath->subpaths)
 			{
 				Path	   *subpath = (Path *) lfirst(l);
-
-				apath->path.rows += subpath->rows;
-				apath->path.total_cost += subpath->total_cost;
+				if (can_runtime_prune)
+				{
+					offset = find_partitioned_rel_offset(root, partitioned_rels, subpath->parent->relid);
+					if (offset != -1)
+					{
+						apath->path.rows += subpath->rows * part_live_ratio[offset];
+						apath->path.total_cost += subpath->total_cost * part_live_ratio[offset];
+					}
+					else
+					{
+						apath->path.rows += subpath->rows;
+						apath->path.total_cost += subpath->total_cost;
+					}
+				}
+				else
+				{
+					apath->path.rows += subpath->rows;
+					apath->path.total_cost += subpath->total_cost;
+				}
 			}
 		}
 		else
@@ -2099,6 +2181,7 @@ cost_append(AppendPath *apath)
 			{
 				Path	   *subpath = (Path *) lfirst(l);
 				Path		sort_path;	/* dummy for result of cost_sort */
+				int			relid = subpath->parent->relid;
 
 				if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
 				{
@@ -2119,10 +2202,28 @@ cost_append(AppendPath *apath)
 							  apath->limit_tuples);
 					subpath = &sort_path;
 				}
-
-				apath->path.rows += subpath->rows;
-				apath->path.startup_cost += subpath->startup_cost;
-				apath->path.total_cost += subpath->total_cost;
+				if (can_runtime_prune)
+				{
+					offset = find_partitioned_rel_offset(root, partitioned_rels, relid);
+					if (offset != -1)
+					{
+						apath->path.rows += subpath->rows * part_live_ratio[offset];
+						apath->path.total_cost += subpath->total_cost * part_live_ratio[offset];
+						apath->path.startup_cost += subpath->startup_cost * part_live_ratio[offset];
+					}
+					else
+					{
+						apath->path.rows += subpath->rows;
+						apath->path.startup_cost += subpath->startup_cost;
+						apath->path.total_cost += subpath->total_cost;
+					}
+				}
+				else
+				{
+					apath->path.rows += subpath->rows;
+					apath->path.startup_cost += subpath->startup_cost;
+					apath->path.total_cost += subpath->total_cost;
+				}
 			}
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 5281a2f998..5bef994eeb 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1315,7 +1315,7 @@ create_append_path(PlannerInfo *root,
 		pathnode->path.pathkeys = child->pathkeys;
 	}
 	else
-		cost_append(pathnode);
+		cost_append(pathnode, root);
 
 	/* If the caller provided a row estimate, override the computed value. */
 	if (rows >= 0)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 8e1187e31f..26ed155231 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -49,6 +49,7 @@
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
 #include "parser/parsetree.h"
 #include "partitioning/partbounds.h"
 #include "partitioning/partprune.h"
@@ -321,6 +322,44 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 	return pruneinfo;
 }
 
+
+static List *
+translate_appendrel_vars(PlannerInfo *root,
+						 RelOptInfo *topmostrel,
+						 RelOptInfo *subpart,
+						 RelOptInfo **currel,
+						 List **prunequal)
+{
+	if (!*currel)
+	{
+		*currel = subpart;
+		if (!bms_equal(topmostrel->relids, subpart->relids))
+		{
+			int			nappinfos;
+			AppendRelInfo **appinfos = find_appinfos_by_relids(root,
+															   subpart->relids,
+															   &nappinfos);
+
+			*prunequal = (List *) adjust_appendrel_attrs(root, (Node *)
+														 *prunequal,
+														 nappinfos,
+														 appinfos);
+
+			pfree(appinfos);
+		}
+		return *prunequal;
+	}
+	else
+	{
+		return (List *)	adjust_appendrel_attrs_multilevel(root,
+														  (Node *)(*prunequal),
+														  subpart->relids,
+														  (*currel)->relids);
+	}
+	return NIL;
+}
+
+
 /*
  * make_partitionedrel_pruneinfo
  *		Build a List of PartitionedRelPruneInfos, one for each partitioned
@@ -387,53 +426,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 		Assert(rti < root->simple_rel_array_size);
 		relid_subpart_map[rti] = i++;
 
-		/*
-		 * Translate pruning qual, if necessary, for this partition.
-		 *
-		 * The first item in the list is the target partitioned relation.
-		 */
-		if (!targetpart)
-		{
-			targetpart = subpart;
-
-			/*
-			 * The prunequal is presented to us as a qual for 'parentrel'.
-			 * Frequently this rel is the same as targetpart, so we can skip
-			 * an adjust_appendrel_attrs step.  But it might not be, and then
-			 * we have to translate.  We update the prunequal parameter here,
-			 * because in later iterations of the loop for child partitions,
-			 * we want to translate from parent to child variables.
-			 */
-			if (!bms_equal(parentrel->relids, subpart->relids))
-			{
-				int			nappinfos;
-				AppendRelInfo **appinfos = find_appinfos_by_relids(root,
-																   subpart->relids,
-																   &nappinfos);
-
-				prunequal = (List *) adjust_appendrel_attrs(root, (Node *)
-															prunequal,
-															nappinfos,
-															appinfos);
-
-				pfree(appinfos);
-			}
-
-			partprunequal = prunequal;
-		}
-		else
-		{
-			/*
-			 * For sub-partitioned tables the columns may not be in the same
-			 * order as the parent, so we must translate the prunequal to make
-			 * it compatible with this relation.
-			 */
-			partprunequal = (List *)
-				adjust_appendrel_attrs_multilevel(root,
-												  (Node *) prunequal,
-												  subpart->relids,
-												  targetpart->relids);
-		}
+		partprunequal = translate_appendrel_vars(root,
+											  parentrel,
+											  subpart,
+											  &targetpart,
+											  &prunequal);
 
 		/*
 		 * Convert pruning qual to pruning steps.  We may need to do this
@@ -3600,3 +3597,279 @@ partkey_datum_from_expr(PartitionPruneContext *context,
 		*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
 	}
 }
+
+/*
+ * get_clause_strategy_num
+ *
+ *	Given a clause and check if it is OK to be used to run time partition prune
+ * estimation. If yes, return the StrategyNumber. It is possible that a clause
+ * can be used as partition prune but can't be used as partition prune estimation
+ * at the current implementation since we have to put more effort on such cases
+ * and we don't know how to estimate it well even so.  So it is still tuned to
+ * support such case or not.
+ */
+static int
+get_clause_strategy_num(RelOptInfo *rel, Expr *clause, Expr *partkey, int partkeyidx)
+{
+	PartitionScheme part_scheme = rel->part_scheme;
+	Oid			partopfamily = part_scheme->partopfamily[partkeyidx],
+		partcoll = part_scheme->partcollation[partkeyidx];
+
+	Expr *leftop, *rightop;
+	Oid			opno,
+		op_lefttype,
+		op_righttype,
+		negator = InvalidOid;
+	int			op_strategy = InvalidStrategy;
+	bool		is_opne_listp = false;
+
+	if (IsA(clause, OpExpr))
+	{
+		OpExpr *opexpr = (OpExpr *)clause;
+		leftop = (Expr *) get_leftop(clause);
+		if (IsA(leftop, RelabelType))
+			leftop = ((RelabelType *) leftop)->arg;
+		rightop = (Expr *) get_rightop(clause);
+		if (IsA(rightop, RelabelType))
+			rightop = ((RelabelType *) rightop)->arg;
+		opno = opexpr->opno;
+
+		/* I don't care about the expr actually */
+		if (!equal(leftop, partkey) && !equal(rightop, partkey))
+			return InvalidStrategy;
+
+		/* collation mismatch */
+		if (!PartCollMatchesExprColl(partcoll, opexpr->inputcollid))
+			return InvalidStrategy;
+
+		if (op_in_opfamily(opno, partopfamily))
+		{
+			get_op_opfamily_properties(opno, partopfamily,
+									   false, &op_strategy,
+									   &op_lefttype, &op_righttype);
+		}
+		else
+		{
+			if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+				return InvalidStrategy;
+
+			/* See if the negator is equality */
+			negator = get_negator(opno);
+			if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+			{
+				get_op_opfamily_properties(negator, partopfamily, false,
+										   &op_strategy, &op_lefttype,
+										   &op_righttype);
+				if (op_strategy == BTEqualStrategyNumber)
+					is_opne_listp = true;	/* bingo */
+			}
+
+			if (!is_opne_listp)
+				return InvalidStrategy;
+		}
+
+		if (!op_strict(opno))
+			return InvalidStrategy;
+
+		return op_strategy;
+	}
+	else if (IsA(clause, ScalarArrayOpExpr))
+	{
+		/* still tuned */
+	}
+
+	else
+	{
+		/* still tuned */
+	}
+
+	return InvalidStrategy;
+}
+
+
+/*
+ * gen_partitioned_rel_quals
+ *
+ *	We have got the translated quals for a partitioned rel, now we will match it
+ * partkey and return a PartitionedRelPruneQual for run-time partition prune
+ * estimation.
+ */
+static PartitionedRelPruneQual *
+gen_partitioned_rel_quals(RelOptInfo *rel, List *all_quals)
+{
+	int *strategy_nums = NULL;
+	ListCell *lc;
+	PartitionScheme part_scheme = rel->part_scheme;
+	int i;
+	PartitionedRelPruneQual *res;
+	foreach(lc, all_quals)
+	{
+		Expr	*clause = (Expr *) lfirst(lc);
+
+		if (IsA(clause, RestrictInfo))
+			clause = ((RestrictInfo *) clause)->clause;
+
+		for (i = 0;  i < part_scheme->partnatts; i++)
+		{
+			Expr	   *partkey = linitial(rel->partexprs[i]);
+			int op_strategy = get_clause_strategy_num(rel, clause, partkey, i);
+			if (op_strategy != InvalidStrategy)
+			{
+				if (strategy_nums == NULL)
+				{
+					strategy_nums = palloc0(part_scheme->partnatts * sizeof(int));
+				}
+				strategy_nums[i] = op_strategy;
+				break;
+			}
+		}
+	}
+
+	res = makeNode(PartitionedRelPruneQual);
+	res->rtindex = rel->relid;
+	res->strategy_number = strategy_nums;
+	return res;
+}
+
+
+/*
+ * get_lived_partition
+ *	 Given PartitionedRelPruneQual and RelOptInfo, we estimate how many
+ * partition will survive after run time prune. use `double` here just
+ * because it is a estimated value and double is more accurate than int.
+ * All the cases that I made it as 0.75 * total_parts is still tuned. But
+ * if we decide more complex cases to handle, the definition of PartitionedRelPruneQual
+ * probably need an enhancement. Replace the 0.75 with 1 will work same as the current situation.
+ * return live_parts and total_parts rather than a ratio because that
+ * will make the sub-partition estimation easier.
+ */
+static double
+get_lived_partition(struct PlannerInfo *root,
+					PartitionedRelPruneQual *prinfo,
+					double *total_parts,
+					RelOptInfo *rel)
+{
+	/* All the partition key have a BTEqualStrategyNumber is perfect */
+	bool	is_perfect = true;
+	int i;
+	Assert(IS_PARTITIONED_REL(rel));
+
+	/*
+	 * The total partition here is all the lived partition after plan time
+	 * partition prune.  It has some issue in the following example,
+	 * but I think the impact would be OK.
+	 * partition by (partkey1, partkey2), query is where partkey1 = Const.
+	 * In this case, we already have some plan time partition prune for partkey1
+	 * but we will count the partition prune info again here. However for the common
+	 * case like partkey1 = Const1 and partkey2 = Const2. or partkey1 = $1 and
+	 * partkey2 = $2, there is no impact at all.
+	 * And I can't use the total partition from partition definition since the
+	 * we use the calculate the prune_ratio which impacts on the total_cost/rows
+	 * which is from the partitions after the plan time partition prune.
+	 */
+	*total_parts = bms_num_members(rel->all_partrels);
+
+	/* No partkey quals at all. */
+	if (prinfo->strategy_number == NULL)
+		return *total_parts;
+
+	for (i = 0; i < rel->part_scheme->partnatts; i++)
+	{
+		if (prinfo->strategy_number[i] != BTEqualStrategyNumber)
+		{
+			is_perfect = false;
+			break;
+		}
+	}
+
+	/*
+	 * If we replace the 0.75 with 1 below, the behavior will be same as before
+	 * for such case.
+	 */
+	return is_perfect ? 1 : 0.75 * (*total_parts);
+}
+
+
+/*
+ * est_runtime_part_prune_internal
+ *
+ *	Given a list of PartitionedRelPruneQual, we estimate a prune factor
+ * for the partitioned rel.
+ */
+static double
+est_runtime_part_prune_internal(struct PlannerInfo *root,
+								List *prune_quals)
+{
+	ListCell *lc;
+	RelOptInfo *rel = NULL;
+	double total_parent_parts = 0, total_sub_parts = 0,
+		live_parent_parts = 0, live_sub_parts = 0;
+	double parent_live_ratio;
+	if (prune_quals == NIL)
+		return 1;
+	foreach(lc, prune_quals)
+	{
+		PartitionedRelPruneQual *pinfo = lfirst_node(PartitionedRelPruneQual, lc);
+		double live_parts, total_parts;
+		rel = find_base_rel(root, pinfo->rtindex);
+		Assert(IS_PARTITIONED_REL(rel));
+		live_parts = get_lived_partition(root, pinfo, &total_parts, rel);
+
+		if (rel->reloptkind == RELOPT_BASEREL)
+		{
+			Assert(total_parent_parts == 0);
+			Assert(live_parent_parts == 0);
+			total_parent_parts = total_parts;
+			live_parent_parts = live_parts;
+		}
+		else
+		{
+			total_sub_parts += total_parts;
+			live_sub_parts += live_parts;
+		}
+	}
+	if (total_parent_parts)
+	{
+		parent_live_ratio = live_parent_parts * 1.0 / total_parent_parts;		
+	}
+	else
+		parent_live_ratio = 1;
+
+	if (total_sub_parts > 0)
+		return live_sub_parts * 1.0 / total_sub_parts * parent_live_ratio;
+	return parent_live_ratio;
+}
+
+/*
+ * est_runtime_part_prune
+ *	estimate a runtime partition prune ratio for a partitioned_rel.
+ */
+double
+est_runtime_part_prune(struct PlannerInfo *root,
+					   struct RelOptInfo *topmostrel,
+					   Relids partitioned_rel,
+					   List *prmquals)
+{
+	List	*prunequal;
+	RelOptInfo *currel = NULL;
+	List	*rel_prune_quals = NIL;
+	int rti = -1;
+
+	prunequal = extract_actual_clauses(topmostrel->baserestrictinfo, false);
+	prunequal = list_concat(prunequal, prmquals);
+
+	while((rti = bms_next_member(partitioned_rel, rti)) >= 0)
+	{
+		RelOptInfo *subpart = find_base_rel(root, rti);
+		List *translated_quals = translate_appendrel_vars(root,
+														  topmostrel,
+														  subpart,
+														  &currel,
+														  &prunequal);
+		PartitionedRelPruneQual* partqual  = gen_partitioned_rel_quals(subpart,
+																	   translated_quals);
+		rel_prune_quals = lappend(rel_prune_quals, partqual);
+	}
+
+	return est_runtime_part_prune_internal(root, rel_prune_quals);
+}
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 7ddd8c011b..8e1a16a842 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -89,6 +89,7 @@ typedef enum NodeTag
 	T_NestLoopParam,
 	T_PlanRowMark,
 	T_PartitionPruneInfo,
+	T_PartitionedRelPruneQual,
 	T_PartitionedRelPruneInfo,
 	T_PartitionPruneStepOp,
 	T_PartitionPruneStepCombine,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7e6b10f86b..554336a7ec 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1122,9 +1122,35 @@ typedef struct PartitionPruneInfo
 	Bitmapset  *other_subplans;
 } PartitionPruneInfo;
 
+
+
+/*
+ * PartitionedRelPruneQual
+ *   Used to estimate the run time partition prune ratio for a given partitioned
+ * rel's qual.
+ *
+ * strategy_number is a array of `int` for now, indexed by partattrno.  it is OK
+ * for most of the cases,
+ * however it can't represent some more complex cases, like partkey between
+ * $1 and $2. OR (partkey = $1 or partkey = $2).  Since even when know this,
+ * we still can't estimate a good number for that, so I am not sure if it worth
+ * the trouble.  If we need that, we can change `int` to a struct which can include
+ * more info, so still tuned.
+ */
+
+typedef struct PartitionedRelPruneQual
+{
+	NodeTag		type;
+	int			rtindex;
+	int		*strategy_number;
+} PartitionedRelPruneQual;
+
+
 /*
  * PartitionedRelPruneInfo - Details required to allow the executor to prune
- * partitions for a single partitioned table.
+ * partitions for a single partitioned table or allow planner to estimate
+ * how many partition can survive after run time partition prune. In the
+ * later case, only part of the fields are set, see est_runtime_part_prune
  *
  * subplan_map[] and subpart_map[] are indexed by partition index of the
  * partitioned table referenced by 'rtindex', the partition index being the
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6141654e47..112308268a 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -107,7 +107,7 @@ extern void cost_incremental_sort(Path *path,
 								  Cost input_startup_cost, Cost input_total_cost,
 								  double input_tuples, int width, Cost comparison_cost, int sort_mem,
 								  double limit_tuples);
-extern void cost_append(AppendPath *path);
+extern void cost_append(AppendPath *path, PlannerInfo *root);
 extern void cost_merge_append(Path *path, PlannerInfo *root,
 							  List *pathkeys, int n_streams,
 							  Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index babdad2c3e..1e16bcedc8 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -15,6 +15,7 @@
 #define PARTPRUNE_H
 
 #include "nodes/execnodes.h"
+#include "nodes/pathnodes.h"
 #include "partitioning/partdefs.h"
 
 struct PlannerInfo;				/* avoid including pathnodes.h here */
@@ -77,4 +78,9 @@ extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
 										  List *pruning_steps);
 
+extern double est_runtime_part_prune(struct PlannerInfo *root,
+									 struct RelOptInfo *topmostrel,
+									 Relids partitioned_rel,
+									 List *prmquals);
+
 #endif							/* PARTPRUNE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index c72a6d051f..f71a0bbca9 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -3685,20 +3685,16 @@ alter table listp_12_1 set (parallel_workers = 0);
 -- Ensure that listp_12_2 is not scanned.  (The nested Append is not seen in
 -- the plan as it's pulled in setref.c due to having just a single subnode).
 select explain_parallel_append('select * from listp where a = (select 1);');
-                       explain_parallel_append                        
-----------------------------------------------------------------------
- Gather (actual rows=N loops=N)
-   Workers Planned: 2
-   Params Evaluated: $0
-   Workers Launched: N
+                   explain_parallel_append                    
+--------------------------------------------------------------
+ Append (actual rows=N loops=N)
    InitPlan 1 (returns $0)
      ->  Result (actual rows=N loops=N)
-   ->  Parallel Append (actual rows=N loops=N)
-         ->  Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
-               Filter: (a = $0)
-         ->  Parallel Seq Scan on listp_12_2 listp_2 (never executed)
-               Filter: (a = $0)
-(11 rows)
+   ->  Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
+         Filter: (a = $0)
+   ->  Seq Scan on listp_12_2 listp_2 (never executed)
+         Filter: (a = $0)
+(7 rows)
 
 -- Like the above but throw some more complexity at the planner by adding
 -- a UNION ALL.  We expect both sides of the union not to scan the
@@ -3707,32 +3703,24 @@ select explain_parallel_append(
 'select * from listp where a = (select 1)
   union all
 select * from listp where a = (select 2);');
-                              explain_parallel_append                              
------------------------------------------------------------------------------------
+                      explain_parallel_append                       
+--------------------------------------------------------------------
  Append (actual rows=N loops=N)
-   ->  Gather (actual rows=N loops=N)
-         Workers Planned: 2
-         Params Evaluated: $0
-         Workers Launched: N
+   ->  Append (actual rows=N loops=N)
          InitPlan 1 (returns $0)
            ->  Result (actual rows=N loops=N)
-         ->  Parallel Append (actual rows=N loops=N)
-               ->  Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
-                     Filter: (a = $0)
-               ->  Parallel Seq Scan on listp_12_2 listp_2 (never executed)
-                     Filter: (a = $0)
-   ->  Gather (actual rows=N loops=N)
-         Workers Planned: 2
-         Params Evaluated: $1
-         Workers Launched: N
+         ->  Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
+               Filter: (a = $0)
+         ->  Seq Scan on listp_12_2 listp_2 (never executed)
+               Filter: (a = $0)
+   ->  Append (actual rows=N loops=N)
          InitPlan 2 (returns $1)
            ->  Result (actual rows=N loops=N)
-         ->  Parallel Append (actual rows=N loops=N)
-               ->  Seq Scan on listp_12_1 listp_4 (never executed)
-                     Filter: (a = $1)
-               ->  Parallel Seq Scan on listp_12_2 listp_5 (actual rows=N loops=N)
-                     Filter: (a = $1)
-(23 rows)
+         ->  Seq Scan on listp_12_1 listp_4 (never executed)
+               Filter: (a = $1)
+         ->  Seq Scan on listp_12_2 listp_5 (actual rows=N loops=N)
+               Filter: (a = $1)
+(15 rows)
 
 drop table listp;
 reset parallel_tuple_cost;
-- 
2.21.0

part_runtime_prune.sqlapplication/octet-stream; name=part_runtime_prune.sqlDownload

Ryan Lambert

ryan@rustprooflabs.com

almost 5 years ago

In reply to: Andy Fan (#1)

Re: Make Append Cost aware of some run time partition prune case

On Mon, Nov 9, 2020 at 5:44 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Currently the cost model of append path sums the cost/rows for all the
subpaths, it usually works well until we run into the run-time partition
prune
case. The first result is that generic plans will rarely be used for some
cases.
For instance, SELECT * FROM p WHERE pkey = $1; The custom plan will only
count the cost of one partition, however generic plan will count the cost
for all the
partitions even we are sure that only 1 partition will survive. Another
impact
is that planners may choose a wrong plan. for example, SELECT * FROM t1,
p
WHERE t1.a = p.pkey; The cost/rows of t1 nest loop p is estimated highly
improperly. This patch wants to help this case to some extent.

Greetings,

I was referred to this patch by Amit as a possible improvement for an issue
I noticed recently. I had a test setup where I expected run-time pruning
to kick in but it did not. I am trying to test this patch to see if it
helps for that scenario, but ran into an error running make install against
the current master (commit 0a687c8f1).

costsize.c: In function ‘cost_append’:
costsize.c:2171:32: error: ‘AppendPath’ {aka ‘struct AppendPath’} has no
member named ‘partitioned_rels’
2171 | List *partitioned_rels = apath->partitioned_rels;
| ^~
make[4]: *** [<builtin>: costsize.o] Error 1
make[4]: Leaving directory
'/var/lib/postgresql/git/postgresql/src/backend/optimizer/path'
make[3]: *** [../../../src/backend/common.mk:39: path-recursive] Error 2
make[3]: Leaving directory
'/var/lib/postgresql/git/postgresql/src/backend/optimizer'
make[2]: *** [common.mk:39: optimizer-recursive] Error 2
make[2]: Leaving directory '/var/lib/postgresql/git/postgresql/src/backend'
make[1]: *** [Makefile:42: install-backend-recurse] Error 2
make[1]: Leaving directory '/var/lib/postgresql/git/postgresql/src'
make: *** [GNUmakefile:11: install-src-recurse] Error 2

Thanks,

Ryan Lambert

Andy Fan

zhihui.fan1213@gmail.com

almost 5 years ago

In reply to: Ryan Lambert (#2)

Re: Make Append Cost aware of some run time partition prune case

Hi Ryan:

On Thu, Mar 4, 2021 at 8:14 AM Ryan Lambert <ryan@rustprooflabs.com> wrote:

On Mon, Nov 9, 2020 at 5:44 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Currently the cost model of append path sums the cost/rows for all the
subpaths, it usually works well until we run into the run-time partition
prune
case. The first result is that generic plans will rarely be used for
some cases.
For instance, SELECT * FROM p WHERE pkey = $1; The custom plan will only
count the cost of one partition, however generic plan will count the cost
for all the
partitions even we are sure that only 1 partition will survive. Another
impact
is that planners may choose a wrong plan. for example, SELECT * FROM
t1, p
WHERE t1.a = p.pkey; The cost/rows of t1 nest loop p is estimated highly
improperly. This patch wants to help this case to some extent.

Greetings,

I was referred to this patch by Amit as a possible improvement for an
issue I noticed recently. I had a test setup where I expected run-time
pruning to kick in but it did not. I am trying to test this patch to see
if it helps for that scenario, but ran into an error running make install
against the current master (commit 0a687c8f1).

costsize.c: In function ‘cost_append’:
costsize.c:2171:32: error: ‘AppendPath’ {aka ‘struct AppendPath’} has no
member named ‘partitioned_rels’
2171 | List *partitioned_rels = apath->partitioned_rels;
| ^~
make[4]: *** [<builtin>: costsize.o] Error 1
make[4]: Leaving directory
'/var/lib/postgresql/git/postgresql/src/backend/optimizer/path'
make[3]: *** [../../../src/backend/common.mk:39: path-recursive] Error 2
make[3]: Leaving directory
'/var/lib/postgresql/git/postgresql/src/backend/optimizer'
make[2]: *** [common.mk:39: optimizer-recursive] Error 2
make[2]: Leaving directory '/var/lib/postgresql/git/postgresql/src/backend'
make[1]: *** [Makefile:42: install-backend-recurse] Error 2
make[1]: Leaving directory '/var/lib/postgresql/git/postgresql/src'
make: *** [GNUmakefile:11: install-src-recurse] Error 2

Thanks,

Ryan Lambert

Thanks for checking. This patch is on a very old master and the code is
too complex
since I wanted to handle a full scenario of a run time partition prune,
which has lots
of troubles and not very practical I think. so I am not happy with that
now.

I have implemented a new one, which only handles 1 level of partitioned
table, and
only 1 partition key. and only handle the eq operators like partkey = $1 /
partkey in ($1, $2)
/ parkey = $1 or partkey = $2; The patch works well in my user case. I
can send
one on the latest master shortly, and hope it is helpful for you as well.

(At the same time, I also ran into a case that we can expand more init
partition
prune case [1]/messages/by-id/CAKU4AWq4NLxu5JF9+d=o=A636-=eFNFmPx+kJ44ezTm=ikZ73w@mail.gmail.com, you can check that one if you like. I am happy with that
patch
now).

[1]: /messages/by-id/CAKU4AWq4NLxu5JF9+d=o=A636-=eFNFmPx+kJ44ezTm=ikZ73w@mail.gmail.com
/messages/by-id/CAKU4AWq4NLxu5JF9+d=o=A636-=eFNFmPx+kJ44ezTm=ikZ73w@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Andy Fan

zhihui.fan1213@gmail.com

almost 5 years ago

In reply to: Andy Fan (#3)

1 attachment(s)

Re: Make Append Cost aware of some run time partition prune case

I have implemented a new one, which only handles 1 level of partitioned
table, and
only 1 partition key. and only handle the eq operators like partkey = $1
/ partkey in ($1, $2)
/ parkey = $1 or partkey = $2; The patch works well in my user case. I
can send
one on the latest master shortly, and hope it is helpful for you as well.

Hi:

Here is the new patch for this topic, which has proved works in my limited
user
case (apply the similar logic on pg11), and it is incomplete since more
cases
should be covered but not. Uploading this patch now is just for discussing
and
testing.

Design principle:

1). the cost of AppendPath should be reduced for either init partition
prune or run
time partition prune. All of the startup_cost, total_cost, rows should be
adjusted. As for the startup_cost, if we should adjust it carefully, or
else we can
get the run_time_cost less than 0.

2). When we merge the subpath from sub-partitioned rel via
accumulate_append_subpath, currently we just merge the subpaths and discard
the
cost/rows in AppendPath. After this feature is involved, we may consider
to use
the AppendPath's cost as well during this stage.

3). When join is involved, AppendPath is not enough since the estimated
rows for
a join relation cares about rel1->rows, rel2->rows only, the Path.rows is
not
cared. so we need to adjust rel->rows as well. and only for the init
partition prune case. (appendrel->rows for planning time partition prune is
handled already).

The biggest problem of the above is I don't know how to adjust the cost for
Parallel Append Path. Currently I just ignore it, which would cause some
query
should use Parallel Append Path but not.

Something I don't want to handle really unless we can address its value.
1. Operators like >, <. Between and.
2. the uncompleted part key appeared in prunequals. Like we have partition
key (a,
b). But use just use A = 1 as restrictinfo.

The main reason I don't want to handle them are 1). it would be uncommon.
b). It introduces extra complexity. c). at last, we can't estimate it well
like partkey > $1,
what would be a prune ratio for ).

Something I don't handle so far are: 1). accumulate_append_subpath
stuff. 2). MergeAppend. 3). Multi Partition key.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachments:

v1-0001-adjust-cost-model-for-partition-prune-case.patchapplication/octet-stream; name=v1-0001-adjust-cost-model-for-partition-prune-case.patchDownload

From f25937e866b8c5a99c5a0be64a3d3036dfb17b47 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Thu, 4 Mar 2021 12:08:41 +0800
Subject: [PATCH v1] adjust cost model for partition prune case.

---
 src/backend/optimizer/path/allpaths.c         |  15 +-
 src/backend/optimizer/path/costsize.c         | 348 +++++++++++++++++-
 src/backend/optimizer/util/pathnode.c         |   2 +-
 src/include/optimizer/cost.h                  |   3 +-
 src/test/regress/expected/partition_prune.out | 280 +++++++-------
 5 files changed, 485 insertions(+), 163 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..f4bd73b3ed 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -947,7 +947,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 					Index rti, RangeTblEntry *rte)
 {
 	int			parentRTindex = rti;
-	bool		has_live_children;
+	int		n_live_children = 0;
 	double		parent_rows;
 	double		parent_size;
 	double	   *parent_attrsizes;
@@ -984,7 +984,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 	 * Note: if you consider changing this logic, beware that child rels could
 	 * have zero rows and/or width, if they were excluded by constraints.
 	 */
-	has_live_children = false;
 	parent_rows = 0;
 	parent_size = 0;
 	nattrs = rel->max_attr - rel->min_attr + 1;
@@ -1112,7 +1111,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			continue;
 
 		/* We have at least one live child. */
-		has_live_children = true;
+		n_live_children += 1;
 
 		/*
 		 * If any live child is not parallel-safe, treat the whole appendrel
@@ -1169,7 +1168,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
-	if (has_live_children)
+	if (n_live_children > 0)
 	{
 		/*
 		 * Save the finished size estimates.
@@ -1182,6 +1181,14 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		for (i = 0; i < nattrs; i++)
 			rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
 
+		if (enable_partition_pruning && IS_PARTITIONED_REL(rel))
+		{
+			/* reduce the rows that can be pruned at init partition prune stage */
+			rel->rows *= calculate_relrows_prune_ratio(rel, rte, n_live_children);
+			if (rel->rows < 1)
+				rel->rows = 1;
+		}
+
 		/*
 		 * Set "raw tuples" count equal to "rows" for the appendrel; needed
 		 * because some places assume rel->tuples is valid for any baserel.
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a25b674a19..0ac87ea9f2 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
 #include "access/amapi.h"
 #include "access/htup_details.h"
 #include "access/tsmapi.h"
+#include "catalog/pg_class.h"
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
@@ -188,6 +189,20 @@ static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
 
 
+static bool is_simple_append(AppendPath *apath, PlannerInfo *root);
+static bool is_exact_equal_partkey(Expr *clause, Expr *partkey,
+								   PartitionScheme part_scheme,
+								   bool external_param_only);
+static bool is_exact_bool_equal_partkey(BoolExpr *boolexpr, Expr *partkey,
+										PartitionScheme part_scheme,
+										bool extern_param_only);
+static bool has_exact_equal_partkey(List *clauses,  Expr *partkey,
+									PartitionScheme part_scheme,
+									bool extern_param_only);
+static double estimate_runtime_prune_ratio(AppendPath *apath,
+											PlannerInfo *root);
+
+
 /*
  * clamp_row_est
  *		Force a row-count estimate to a sane value.
@@ -1879,7 +1894,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 
 /*
  * cost_incremental_sort
- * 	Determines and returns the cost of sorting a relation incrementally, when
+ *		Determines and returns the cost of sorting a relation incrementally, when
  *  the input path is presorted by a prefix of the pathkeys.
  *
  * 'presorted_keys' is the number of leading pathkeys by which the input path
@@ -2136,14 +2151,17 @@ append_nonpartial_cost(List *subpaths, int numpaths, int parallel_workers)
  *	  Determines and returns the cost of an Append node.
  */
 void
-cost_append(AppendPath *apath)
+cost_append(AppendPath *apath, PlannerInfo *root)
 {
 	ListCell   *l;
+	double	prune_ratio = 1;
 
 	apath->path.startup_cost = 0;
 	apath->path.total_cost = 0;
 	apath->path.rows = 0;
 
+	prune_ratio = estimate_runtime_prune_ratio(apath, root);
+
 	if (apath->subpaths == NIL)
 		return;
 
@@ -2154,6 +2172,7 @@ cost_append(AppendPath *apath)
 		if (pathkeys == NIL)
 		{
 			Path	   *subpath = (Path *) linitial(apath->subpaths);
+			double		total_startup_cost = 0;
 
 			/*
 			 * For an unordered, non-parallel-aware Append we take the startup
@@ -2168,7 +2187,10 @@ cost_append(AppendPath *apath)
 
 				apath->path.rows += subpath->rows;
 				apath->path.total_cost += subpath->total_cost;
+				total_startup_cost += subpath->startup_cost;
 			}
+			if (prune_ratio < 1)
+				apath->path.startup_cost = total_startup_cost;
 		}
 		else
 		{
@@ -2276,6 +2298,13 @@ cost_append(AppendPath *apath)
 								   apath->path.parallel_workers);
 	}
 
+	apath->path.total_cost *= prune_ratio;
+	apath->path.startup_cost *= prune_ratio;
+	apath->path.rows *= prune_ratio;
+
+	if (apath->path.rows < 1)
+		apath->path.rows = 1;
+
 	/*
 	 * Although Append does not do any selection or projection, it's not free;
 	 * add a small per-tuple overhead.
@@ -5992,3 +6021,318 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
 
 	return pages_fetched;
 }
+
+
+/*
+ * estimate_runtime_prune_ratio
+ *
+ *	estimate a runtime prune ratio for a AppendPath.
+ */
+static double
+estimate_runtime_prune_ratio(AppendPath *apath, PlannerInfo *root)
+{
+	List  *prunequal;
+	RelOptInfo *rel;
+	Expr *partkey;
+
+	if (!is_simple_append(apath, root))
+		return 1.0;
+
+	rel = apath->path.parent;
+
+	prunequal = extract_actual_clauses(rel->baserestrictinfo, false);
+
+	if (apath->path.param_info)
+		prunequal = list_concat(prunequal, apath->path.param_info->ppi_clauses);
+
+	/* We only support 1 partition key right now, see is_simple_append */
+	partkey = linitial(rel->partexprs[0]);
+
+	/*
+	 * We need to handle 3 cases correctly. a). WHERE partexpr = $1 or partexpr = $2;
+	 * in this case, even it probably hit 2 partitions, but we still count the factor as
+	 * 1 / rel->parts since each partition counts the 2 variables already.  b). WHERE
+	 * p.partexpr = p2.partexpr. The append in Nestloop inner plan just need to handle
+	 * one for sure. c). WHERE (partexpr = $1 or partexpr = $2) AND p.partexpr2 = p2.partexpr.
+	 * The outer plan follows rule a) and inner Nestloop plan follows rule b.
+	 * At last, we just need to handle the factor as 1 / list_length(subpaths) for all
+	 * the cases.
+	 */
+	if (has_exact_equal_partkey(prunequal, partkey, rel->part_scheme, false))
+		return 1.0 / list_length(apath->subpaths);
+
+	return 1.0;
+}
+
+
+double
+calculate_relrows_prune_ratio(RelOptInfo *rel, RangeTblEntry *rte,  int live_children)
+{
+
+	Assert(IS_PARTITIONED_REL(rel));
+	if (rel->part_scheme->partnatts == 1 &&
+		rte->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		List	*quals = rel->baserestrictinfo;
+		Expr	*partkey = linitial(rel->partexprs[0]);
+		if (has_exact_equal_partkey(quals, partkey, rel->part_scheme, true /* extern param only */))
+			return 1.0 / live_children;
+	}
+	return 1.0;
+}
+
+
+/*
+ * has_exact_equal_partkey
+ */
+static bool
+has_exact_equal_partkey(List *clauses,  Expr *partkey,
+						PartitionScheme part_scheme,
+						bool extern_param_only)
+{
+	ListCell	*lc;
+	foreach(lc, clauses)
+	{
+		Expr *clause = lfirst(lc);
+		if (IsA(clause, RestrictInfo))
+			clause = ((RestrictInfo *)clause)->clause;
+
+		if (IsA(clause, BoolExpr))
+		{
+			if (is_exact_bool_equal_partkey((BoolExpr*)clause, partkey,
+												  part_scheme, extern_param_only))
+				return true;
+		}
+
+		if (is_exact_equal_partkey(clause, partkey,
+										 part_scheme, extern_param_only))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * is_simple_append
+ *
+ * Just add some limitations for user case to make this patch
+ * easier.
+ */
+static bool
+is_simple_append(AppendPath *apath, PlannerInfo *root)
+{
+	/*
+	 * Append may comes from p, subp, Union, Union all
+	 * or Append two join rel in partition wise case.
+	 */
+
+	RelOptInfo *rel = apath->path.parent;
+	ListCell *lc;
+
+	if (apath->path.parallel_aware)
+		/* We need to adjust both total_cost and startup_cost. If startup_cost
+		 * is not adjust, the (runtime cost = total_cost - startup_cost) might
+		 * less than 0, which will cause some bad result. However how to adjust
+		 * the startup_cost is not clear, so I just not handle this case now.
+		 */
+		return false;
+
+	if (!IS_PARTITIONED_REL(rel))
+		/* May come from Union, UnionAll or partition wise join*/
+		return false;
+
+	if (bms_num_members(rel->relids) > 1)
+		/* For safety */
+		return false;
+
+	if (rel->part_scheme->partnatts != 1)
+		/* Only support 1 for simple */
+		return false;
+
+	/*
+	 * Now it may still a partitoned rel or sub-partitioned rel.
+	 * Currently the subpaths from sub-partitioned rel will be
+	 * merged into parent->subpaths via accumulate_append_subpath
+	 * without considering the cost of the Append, but cares about
+	 * cost of its subpaths. This is not true any more when this
+	 * feature involved.  now I will just adjust the cost of the
+	 * Append without touching accumulate_append_subpath. It should
+	 * be done later. Filter out any sub-partitioned rel for now.
+	 */
+
+	foreach(lc, apath->subpaths)
+	{
+		Path	*path = (Path *) lfirst(lc);
+		RelOptInfo *pathrel = path->parent;
+
+		if (pathrel->reloptkind != RELOPT_OTHER_MEMBER_REL)
+			return false;
+		if (bms_equal(pathrel->relids, apath->path.parent->relids))
+			/* subpath from sub-partitioned rel*/
+			return false;
+	}
+
+	/* At last, I don't bother to handle the transated_vars for now */
+	foreach(lc, apath->subpaths)
+	{
+		Path *subpath = (Path *) lfirst(lc);
+		AppendRelInfo *appinfo = root->append_rel_array[subpath->parent->relid];
+		int i = 0;
+		ListCell *lc2;
+		if (appinfo == NULL)
+		{
+			Assert(false);
+			return false;
+		}
+
+		foreach(lc2, appinfo->translated_vars)
+		{
+			Var *var = (Var *) lfirst(lc2);
+			i++;
+			if (var == NULL)
+				/* dropped column */
+				continue;
+			if (!IsA(var, Var))
+			{
+				Assert(false);
+				return false;
+			}
+			if (var->varattno != i)
+				return false;
+		}
+	}
+	return true;
+}
+
+
+/*
+ * similar with match_clause_to_partition_key.
+ */
+static bool
+is_exact_equal_partkey(Expr *clause, Expr *partkey,
+							 PartitionScheme part_scheme,
+							 bool external_param_only)
+{
+	/* We only support one partkey for now */
+	Oid	 partopfamily = part_scheme->partopfamily[0];
+
+	int op_strategy = InvalidStrategy;
+	Oid opno;
+	Expr	*leftop, *rightop;
+	Oid	op_lefttype, op_righttype;
+
+	if (IsA(clause, OpExpr))
+	{
+		bool found = false;
+		OpExpr *opexpr = (OpExpr *)clause;
+		opno = opexpr->opno;
+		leftop = (Expr *)get_leftop(clause);
+		rightop = (Expr *)get_rightop(clause);
+		while (IsA(leftop, RelabelType))
+			leftop = ((RelabelType *) leftop)->arg;
+
+		while (IsA(rightop, RelabelType))
+			rightop = ((RelabelType *) rightop)->arg;
+
+		if (equal(leftop, partkey))
+		{
+			if (external_param_only)
+			{
+				if (IsA(rightop, Param) && castNode(Param, rightop)->paramkind == PARAM_EXTERN)
+				{
+					found = true;
+				}
+			}
+			else
+				found = true;
+		}
+		else if (equal(rightop, partkey))
+		{
+			if (external_param_only)
+			{
+				if (IsA(leftop, Param) && castNode(Param, leftop)->paramkind == PARAM_EXTERN)
+					found = true;
+			}
+			else
+				found = true;
+		}
+
+		if (!found)
+			return false;
+
+		if (!op_in_opfamily(opno, partopfamily))
+			return false;
+
+		get_op_opfamily_properties(opno, partopfamily,
+								   false, &op_strategy,
+								   &op_lefttype, &op_righttype);
+		/* We only support BTEqualStrategyNumber on purpose, support other types
+		 * not worthy the troubles and probably can't get a good res. */
+		return op_strategy == BTEqualStrategyNumber;
+	}
+	else if (IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *scalar_array_opexpr = (ScalarArrayOpExpr *) clause;
+		opno = scalar_array_opexpr->opno;
+		if (!op_in_opfamily(opno, partopfamily))
+			return false;
+
+		get_op_opfamily_properties(opno, partopfamily,
+								   false, &op_strategy,
+								   &op_lefttype, &op_righttype);
+		if(op_strategy != BTEqualStrategyNumber)
+			return false;
+		leftop = linitial(scalar_array_opexpr->args);
+		if (!equal(leftop, partkey))
+			return false;
+		if (external_param_only)
+		{
+			Expr *rexpr = lsecond(scalar_array_opexpr->args);
+			if (IsA(rexpr, ArrayCoerceExpr))
+				rexpr = castNode(ArrayCoerceExpr, rexpr)->arg;
+			if (IsA(rexpr, ArrayExpr))
+			{
+				List *elems = castNode(ArrayExpr, rexpr)->elements;
+				if (elems != NIL)
+				{
+					Expr * elem = linitial(elems);
+					if (IsA(elem, Param) && castNode(Param, elem)->paramkind == PARAM_EXTERN)
+						return true;
+				}
+				return false;
+			}
+		}
+		return true;
+	}
+	return false;
+}
+
+static bool
+is_exact_bool_equal_partkey(BoolExpr *boolexpr, Expr *partkey,
+								  PartitionScheme part_scheme,
+								  bool extern_param_only)
+{
+	ListCell	*lc;
+	switch(boolexpr->boolop)
+	{
+		case AND_EXPR:
+			foreach(lc, boolexpr->args)
+			{
+				Expr *expr = (Expr *)lfirst(lc);
+				if (is_exact_equal_partkey(expr, partkey, part_scheme, extern_param_only))
+					return true;
+			}
+		case OR_EXPR:
+			foreach(lc, boolexpr->args)
+			{
+				Expr *expr = (Expr *)lfirst(lc);
+				/* We need to make sure all the expr is a partkey */
+				if (!is_exact_equal_partkey(expr, partkey, part_scheme, extern_param_only))
+					return false;
+			}
+			return true;
+		case NOT_EXPR:
+			return false;
+	}
+	return false;
+}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 69b83071cf..0bfb638abc 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1342,7 +1342,7 @@ create_append_path(PlannerInfo *root,
 		pathnode->path.pathkeys = child->pathkeys;
 	}
 	else
-		cost_append(pathnode);
+		cost_append(pathnode, root);
 
 	/* If the caller provided a row estimate, override the computed value. */
 	if (rows >= 0)
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 1be93be098..38c1cef70f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -110,7 +110,7 @@ extern void cost_incremental_sort(Path *path,
 								  Cost input_startup_cost, Cost input_total_cost,
 								  double input_tuples, int width, Cost comparison_cost, int sort_mem,
 								  double limit_tuples);
-extern void cost_append(AppendPath *path);
+extern void cost_append(AppendPath *path, PlannerInfo *root);
 extern void cost_merge_append(Path *path, PlannerInfo *root,
 							  List *pathkeys, int n_streams,
 							  Cost input_startup_cost, Cost input_total_cost,
@@ -205,5 +205,6 @@ extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
 extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
 extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
 								   Path *bitmapqual, int loop_count, Cost *cost, double *tuple);
+extern double calculate_relrows_prune_ratio(RelOptInfo *rel, RangeTblEntry *rte,  int live_children);
 
 #endif							/* COST_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..430014f508 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1991,81 +1991,65 @@ select explain_parallel_append('execute ab_q4 (2, 2)');
 prepare ab_q5 (int, int, int) as
 select avg(a) from ab where a in($1,$2,$3) and b < 4;
 select explain_parallel_append('execute ab_q5 (1, 1, 1)');
-                              explain_parallel_append                               
-------------------------------------------------------------------------------------
- Finalize Aggregate (actual rows=N loops=N)
-   ->  Gather (actual rows=N loops=N)
-         Workers Planned: 2
-         Workers Launched: N
-         ->  Partial Aggregate (actual rows=N loops=N)
-               ->  Parallel Append (actual rows=N loops=N)
-                     Subplans Removed: 6
-                     ->  Parallel Seq Scan on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                           Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-                     ->  Parallel Seq Scan on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                           Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-                     ->  Parallel Seq Scan on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                           Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-(13 rows)
+                      explain_parallel_append                      
+-------------------------------------------------------------------
+ Aggregate (actual rows=N loops=N)
+   ->  Append (actual rows=N loops=N)
+         Subplans Removed: 6
+         ->  Seq Scan on ab_a1_b1 ab_1 (actual rows=N loops=N)
+               Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
+         ->  Seq Scan on ab_a1_b2 ab_2 (actual rows=N loops=N)
+               Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
+         ->  Seq Scan on ab_a1_b3 ab_3 (actual rows=N loops=N)
+               Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
+(9 rows)
 
 select explain_parallel_append('execute ab_q5 (2, 3, 3)');
-                              explain_parallel_append                               
-------------------------------------------------------------------------------------
- Finalize Aggregate (actual rows=N loops=N)
-   ->  Gather (actual rows=N loops=N)
-         Workers Planned: 2
-         Workers Launched: N
-         ->  Partial Aggregate (actual rows=N loops=N)
-               ->  Parallel Append (actual rows=N loops=N)
-                     Subplans Removed: 3
-                     ->  Parallel Seq Scan on ab_a2_b1 ab_1 (actual rows=N loops=N)
-                           Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-                     ->  Parallel Seq Scan on ab_a2_b2 ab_2 (actual rows=N loops=N)
-                           Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-                     ->  Parallel Seq Scan on ab_a2_b3 ab_3 (actual rows=N loops=N)
-                           Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-                     ->  Parallel Seq Scan on ab_a3_b1 ab_4 (actual rows=N loops=N)
-                           Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-                     ->  Parallel Seq Scan on ab_a3_b2 ab_5 (actual rows=N loops=N)
-                           Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-                     ->  Parallel Seq Scan on ab_a3_b3 ab_6 (actual rows=N loops=N)
-                           Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-(19 rows)
+                      explain_parallel_append                      
+-------------------------------------------------------------------
+ Aggregate (actual rows=N loops=N)
+   ->  Append (actual rows=N loops=N)
+         Subplans Removed: 3
+         ->  Seq Scan on ab_a2_b1 ab_1 (actual rows=N loops=N)
+               Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
+         ->  Seq Scan on ab_a2_b2 ab_2 (actual rows=N loops=N)
+               Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
+         ->  Seq Scan on ab_a2_b3 ab_3 (actual rows=N loops=N)
+               Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
+         ->  Seq Scan on ab_a3_b1 ab_4 (actual rows=N loops=N)
+               Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
+         ->  Seq Scan on ab_a3_b2 ab_5 (actual rows=N loops=N)
+               Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
+         ->  Seq Scan on ab_a3_b3 ab_6 (actual rows=N loops=N)
+               Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
+(15 rows)
 
 -- Try some params whose values do not belong to any partition.
 select explain_parallel_append('execute ab_q5 (33, 44, 55)');
-                  explain_parallel_append                  
------------------------------------------------------------
- Finalize Aggregate (actual rows=N loops=N)
-   ->  Gather (actual rows=N loops=N)
-         Workers Planned: 2
-         Workers Launched: N
-         ->  Partial Aggregate (actual rows=N loops=N)
-               ->  Parallel Append (actual rows=N loops=N)
-                     Subplans Removed: 9
-(7 rows)
+       explain_parallel_append        
+--------------------------------------
+ Aggregate (actual rows=N loops=N)
+   ->  Append (actual rows=N loops=N)
+         Subplans Removed: 9
+(3 rows)
 
 -- Test Parallel Append with PARAM_EXEC Params
 select explain_parallel_append('select count(*) from ab where (a = (select 1) or a = (select 3)) and b = 2');
-                           explain_parallel_append                            
-------------------------------------------------------------------------------
+                    explain_parallel_append                    
+---------------------------------------------------------------
  Aggregate (actual rows=N loops=N)
    InitPlan 1 (returns $0)
      ->  Result (actual rows=N loops=N)
    InitPlan 2 (returns $1)
      ->  Result (actual rows=N loops=N)
-   ->  Gather (actual rows=N loops=N)
-         Workers Planned: 2
-         Params Evaluated: $0, $1
-         Workers Launched: N
-         ->  Parallel Append (actual rows=N loops=N)
-               ->  Parallel Seq Scan on ab_a1_b2 ab_1 (actual rows=N loops=N)
-                     Filter: ((b = 2) AND ((a = $0) OR (a = $1)))
-               ->  Parallel Seq Scan on ab_a2_b2 ab_2 (never executed)
-                     Filter: ((b = 2) AND ((a = $0) OR (a = $1)))
-               ->  Parallel Seq Scan on ab_a3_b2 ab_3 (actual rows=N loops=N)
-                     Filter: ((b = 2) AND ((a = $0) OR (a = $1)))
-(16 rows)
+   ->  Append (actual rows=N loops=N)
+         ->  Seq Scan on ab_a1_b2 ab_1 (actual rows=N loops=N)
+               Filter: ((b = 2) AND ((a = $0) OR (a = $1)))
+         ->  Seq Scan on ab_a2_b2 ab_2 (never executed)
+               Filter: ((b = 2) AND ((a = $0) OR (a = $1)))
+         ->  Seq Scan on ab_a3_b2 ab_3 (actual rows=N loops=N)
+               Filter: ((b = 2) AND ((a = $0) OR (a = $1)))
+(12 rows)
 
 -- Test pruning during parallel nested loop query
 create table lprt_a (a int not null);
@@ -2463,74 +2447,72 @@ deallocate ab_q6;
 insert into ab values (1,2);
 explain (analyze, costs off, summary off, timing off)
 update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
-                                     QUERY PLAN                                      
--------------------------------------------------------------------------------------
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
  Update on ab_a1 (actual rows=0 loops=1)
    Update on ab_a1_b1 ab_a1_1
    Update on ab_a1_b2 ab_a1_2
    Update on ab_a1_b3 ab_a1_3
    ->  Nested Loop (actual rows=0 loops=1)
-         ->  Append (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-         ->  Materialize (actual rows=0 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
+               Recheck Cond: (a = 1)
+               ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Index Cond: (a = 1)
+         ->  Materialize (never executed)
+               ->  Append (never executed)
+                     ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (never executed)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b1_a_idx (never executed)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (never executed)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (never executed)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                                 Index Cond: (a = 1)
    ->  Nested Loop (actual rows=1 loops=1)
-         ->  Append (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
+               Recheck Cond: (a = 1)
+               Heap Blocks: exact=1
+               ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                     Index Cond: (a = 1)
          ->  Materialize (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
+               ->  Append (actual rows=1 loops=1)
+                     ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
+                           Recheck Cond: (a = 1)
+                           Heap Blocks: exact=1
+                           ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                                 Index Cond: (a = 1)
    ->  Nested Loop (actual rows=0 loops=1)
-         ->  Append (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-         ->  Materialize (actual rows=0 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-(65 rows)
+         ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_3 (actual rows=0 loops=1)
+               Recheck Cond: (a = 1)
+               ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                     Index Cond: (a = 1)
+         ->  Materialize (never executed)
+               ->  Append (never executed)
+                     ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (never executed)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b1_a_idx (never executed)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (never executed)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (never executed)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                                 Index Cond: (a = 1)
+(63 rows)
 
 table ab;
  a | b 
@@ -3713,20 +3695,16 @@ alter table listp_12_1 set (parallel_workers = 0);
 -- Ensure that listp_12_2 is not scanned.  (The nested Append is not seen in
 -- the plan as it's pulled in setref.c due to having just a single subnode).
 select explain_parallel_append('select * from listp where a = (select 1);');
-                       explain_parallel_append                        
-----------------------------------------------------------------------
- Gather (actual rows=N loops=N)
-   Workers Planned: 2
-   Params Evaluated: $0
-   Workers Launched: N
+                   explain_parallel_append                    
+--------------------------------------------------------------
+ Append (actual rows=N loops=N)
    InitPlan 1 (returns $0)
      ->  Result (actual rows=N loops=N)
-   ->  Parallel Append (actual rows=N loops=N)
-         ->  Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
-               Filter: (a = $0)
-         ->  Parallel Seq Scan on listp_12_2 listp_2 (never executed)
-               Filter: (a = $0)
-(11 rows)
+   ->  Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
+         Filter: (a = $0)
+   ->  Seq Scan on listp_12_2 listp_2 (never executed)
+         Filter: (a = $0)
+(7 rows)
 
 -- Like the above but throw some more complexity at the planner by adding
 -- a UNION ALL.  We expect both sides of the union not to scan the
@@ -3735,32 +3713,24 @@ select explain_parallel_append(
 'select * from listp where a = (select 1)
   union all
 select * from listp where a = (select 2);');
-                              explain_parallel_append                              
------------------------------------------------------------------------------------
+                      explain_parallel_append                       
+--------------------------------------------------------------------
  Append (actual rows=N loops=N)
-   ->  Gather (actual rows=N loops=N)
-         Workers Planned: 2
-         Params Evaluated: $0
-         Workers Launched: N
+   ->  Append (actual rows=N loops=N)
          InitPlan 1 (returns $0)
            ->  Result (actual rows=N loops=N)
-         ->  Parallel Append (actual rows=N loops=N)
-               ->  Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
-                     Filter: (a = $0)
-               ->  Parallel Seq Scan on listp_12_2 listp_2 (never executed)
-                     Filter: (a = $0)
-   ->  Gather (actual rows=N loops=N)
-         Workers Planned: 2
-         Params Evaluated: $1
-         Workers Launched: N
+         ->  Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
+               Filter: (a = $0)
+         ->  Seq Scan on listp_12_2 listp_2 (never executed)
+               Filter: (a = $0)
+   ->  Append (actual rows=N loops=N)
          InitPlan 2 (returns $1)
            ->  Result (actual rows=N loops=N)
-         ->  Parallel Append (actual rows=N loops=N)
-               ->  Seq Scan on listp_12_1 listp_4 (never executed)
-                     Filter: (a = $1)
-               ->  Parallel Seq Scan on listp_12_2 listp_5 (actual rows=N loops=N)
-                     Filter: (a = $1)
-(23 rows)
+         ->  Seq Scan on listp_12_1 listp_4 (never executed)
+               Filter: (a = $1)
+         ->  Seq Scan on listp_12_2 listp_5 (actual rows=N loops=N)
+               Filter: (a = $1)
+(15 rows)
 
 drop table listp;
 reset parallel_tuple_cost;
-- 
2.21.0

vignesh C

vignesh21@gmail.com

over 4 years ago

In reply to: Andy Fan (#4)

Re: Make Append Cost aware of some run time partition prune case

On Thu, Mar 4, 2021 at 9:51 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

I have implemented a new one, which only handles 1 level of partitioned table, and
only 1 partition key. and only handle the eq operators like partkey = $1 / partkey in ($1, $2)
/ parkey = $1 or partkey = $2; The patch works well in my user case. I can send
one on the latest master shortly, and hope it is helpful for you as well.

Hi:

Here is the new patch for this topic, which has proved works in my limited user
case (apply the similar logic on pg11), and it is incomplete since more cases
should be covered but not. Uploading this patch now is just for discussing and
testing.

Design principle:

1). the cost of AppendPath should be reduced for either init partition prune or run
time partition prune. All of the startup_cost, total_cost, rows should be
adjusted. As for the startup_cost, if we should adjust it carefully, or else we can
get the run_time_cost less than 0.

2). When we merge the subpath from sub-partitioned rel via
accumulate_append_subpath, currently we just merge the subpaths and discard the
cost/rows in AppendPath. After this feature is involved, we may consider to use
the AppendPath's cost as well during this stage.

3). When join is involved, AppendPath is not enough since the estimated rows for
a join relation cares about rel1->rows, rel2->rows only, the Path.rows is not
cared. so we need to adjust rel->rows as well. and only for the init
partition prune case. (appendrel->rows for planning time partition prune is
handled already).

The biggest problem of the above is I don't know how to adjust the cost for
Parallel Append Path. Currently I just ignore it, which would cause some query
should use Parallel Append Path but not.

Something I don't want to handle really unless we can address its value.
1. Operators like >, <. Between and.
2. the uncompleted part key appeared in prunequals. Like we have partition key (a,
b). But use just use A = 1 as restrictinfo.

The main reason I don't want to handle them are 1). it would be uncommon.
b). It introduces extra complexity. c). at last, we can't estimate it well like partkey > $1,
what would be a prune ratio for ).

Something I don't handle so far are: 1). accumulate_append_subpath
stuff. 2). MergeAppend. 3). Multi Partition key.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh

Daniel Gustafsson

daniel@yesql.se

about 4 years ago

In reply to: vignesh C (#5)

Re: Make Append Cost aware of some run time partition prune case

On 14 Jul 2021, at 17:55, vignesh C <vignesh21@gmail.com> wrote:

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

As the thread has stalled, this patch still doesn't apply (judging by the log
it's likely not too hard to resolve). I'm marking this patch Returned with
Feedback, feel free to open a new entry for an updated patch.

--
Daniel Gustafsson https://vmware.com/