Improve choose_custom_plan for initial partition prune case

Started by Andy Fan over 5 years ago, 11 messages
#1Andy Fan
zhihui.fan1213@gmail.com

Given the plan example:

CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

prepare s as select * from measurement where logdate = $1;
execute s('2006-02-01');

The generic plan will probably not be chosen because its cost still includes
the cost that can be reduced at initial_prune, while the custom plan reduces
such cost at planning time, which makes the cost comparison unfair. I'm
thinking whether we can get an estimated cost reduction of initial pruning
for the generic plan, based on the partitions pruned at plan time in the
custom plan, and then subtract such costs from the generic plan. I just went
through the related code but haven't written anything yet. I'd like to see
if this is a correct direction to go.

--
Best Regards
Andy Fan

#2Amit Langote
amitlangote09@gmail.com
In reply to: Andy Fan (#1)
Re: Improve choose_custom_plan for initial partition prune case

Hi Andy,

On Fri, Oct 2, 2020 at 1:04 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Given the plan example:

CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

prepare s as select * from measurement where logdate = $1;
execute s('2006-02-01');

The generic plan will probably not be chosen because it doesn't reduce the cost
which can be reduced at initial_prune, while the custom plan reduces such cost
at planning time, which makes the cost comparison unfair.

I agree that there is something to be done here. Actually, I think we
should try to find a solution that will allow us to consider not just
"initial" pruning, but also "execution-time" pruning. The latter
will allow a nested loop join whose inner side scans a partitioned
table using a parameterized scan on the partition key to be favored
over other join plans, because that parameterized scan can use
execution-time pruning which can make the inner scan very cheap.

I'm thinking whether we can get an estimated cost reduction of initial pruning
for the generic plan based on the partitions pruned at plan time in the custom
plan, and then subtract such costs from the generic plan. I just went through
the related code but haven't written anything yet. I'd like to see if this is
a correct direction to go.

That's an interesting idea, that is, to try to do this totally outside
the planner. When I was thinking about this a little while ago, I was
trying to find a way to adjust the cost of the plan in the planner
itself by looking at the runtime pruning info in the nodes that
support it, that is, Append, MergeAppend. Actually, such an approach
had also come up in the original run-time pruning discussion [1].

--
Amit Langote
EDB: http://www.enterprisedb.com

[1]: /messages/by-id/CA+TgmoZHYoAL4HYwnGO25B8CxCB+vNMdf+7rbUzYykR4sU9yUA@mail.gmail.com

#3Andy Fan
zhihui.fan1213@gmail.com
In reply to: Amit Langote (#2)
Re: Improve choose_custom_plan for initial partition prune case

Hi Amit:

Very glad to see your comment!

On Fri, Oct 2, 2020 at 4:21 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Andy,

On Fri, Oct 2, 2020 at 1:04 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Given the plan example:

CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

prepare s as select * from measurement where logdate = $1;
execute s('2006-02-01');

The generic plan will probably not be chosen because it doesn't reduce the
cost which can be reduced at initial_prune, while the custom plan reduces
such cost at planning time, which makes the cost comparison unfair.

I agree that there is something to be done here. Actually, I think we
should try to find a solution that will allow us to consider not just
"initial" pruning, but also "execution-time" pruning. The latter
will allow a nested loop join whose inner side scans a partitioned
table using a parameterized scan on the partition key to be favored
over other join plans, because that parameterized scan can use
execution-time pruning which can make the inner scan very cheap.

This looks like it addresses another important issue with partition pruning,
one that can arise entirely at planning time (no generic or custom plan
involved): for example, choosing between a nested loop plan that can use
some run-time partition pruning and a hash join that can't. I'm restating
your idea just to make sure I understand you correctly.

I'm thinking whether we can get an estimated cost reduction of initial
pruning for the generic plan based on the partitions pruned at plan time in
the custom plan, and then subtract such costs from the generic plan. I just
went through the related code but haven't written anything yet. I'd like to
see if this is a correct direction to go.

That's an interesting idea, that is, to try to do this totally outside
the planner. When I was thinking about this a little while ago, I was
trying to find a way to adjust the cost of the plan in the planner
itself by looking at the runtime pruning info in the nodes that
support it, that is, Append, MergeAppend. Actually, such an approach
had also come up in the original run-time pruning discussion [1].

Thank you for your comments. It looks like your approach could help in both
cases, and I thought about it a bit as well. However, it looks complex (to
me), and I would prefer to estimate how many partitions can be pruned from
real data, even if it is past data (I assume that will not make much
difference in practice).

I'm not sure whether I should treat Robert's comments [1] as an objection,
but I think there are some small differences. I'd like to implement my idea
soon, and I'm glad to see any opposing ideas at any time; of course, the
sooner the better. :)

[1]: /messages/by-id/CA+TgmoZv8sd9cKyYtHwmd_13+BAjkVKo=ECe7G98tBK5Ejwatw@mail.gmail.com

--
Best Regards
Andy Fan

#4Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andy Fan (#1)
Re: Improve choose_custom_plan for initial partition prune case

On Thu, Oct 1, 2020 at 9:34 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Given the plan example:

CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

prepare s as select * from measurement where logdate = $1;
execute s('2006-02-01');

The generic plan will probably not be chosen because it doesn't reduce the cost
which can be reduced at initial_prune, while the custom plan reduces such cost
at planning time, which makes the cost comparison unfair. I'm thinking whether
we can get an estimated cost reduction of initial pruning for the generic plan
based on the partitions pruned at plan time in the custom plan, and then
subtract such costs from the generic plan. I just went through the related code
but haven't written anything yet. I'd like to see if this is a correct direction to go.

I think the end result will depend on the values passed to the first
few executions of the prepared statement. If they happen to have estimates
similar to or higher than the generic plan's, the generic plan will be
chosen later. But if they happen to be much lower than the generic
plan's estimates, a custom plan might be chosen. What happens when we
later execute the plan with values whose estimates are similar to the
generic plan's, once we have moderated the generic plan's cost based on
the custom plans?

If the table has a good distribution of the partition key, which also
results in a good distribution of data across partitions, the generic plan's
cost will be similar to the custom plans' costs. If not, that's something
we want to fix. But if there is large data skew, it is probably better to
let the custom plan always win. [1] talks about the generic plan not being
chosen just because it has a higher cost even though its shape is
similar to a custom plan's. Leveraging that idea might be a good option:
if the custom plans produced have shapes similar to the generic plan,
stop producing them.

[1]: /messages/by-id/CA+TgmoZHYoAL4HYwnGO25B8CxCB+vNMdf+7rbUzYykR4sU9yUA@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

#5Andy Fan
zhihui.fan1213@gmail.com
In reply to: Ashutosh Bapat (#4)
Re: Improve choose_custom_plan for initial partition prune case

Hi Ashutosh:

Thanks for coming.

On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

On Thu, Oct 1, 2020 at 9:34 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Given the plan example:

CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

prepare s as select * from measurement where logdate = $1;
execute s('2006-02-01');

The generic plan will probably not be chosen because it doesn't reduce the
cost which can be reduced at initial_prune, while the custom plan reduces
such cost at planning time, which makes the cost comparison unfair. I'm
thinking whether we can get an estimated cost reduction of initial pruning
for the generic plan based on the partitions pruned at plan time in the
custom plan, and then subtract such costs from the generic plan. I just went
through the related code but haven't written anything yet. I'd like to see
if this is a correct direction to go.

What happens when we
execute plans with values that have estimates similar to the generic
plan later when we moderate generic plan costs based on the custom
plans?

In the example at the beginning of this thread, I used the exact same value
every time, and the custom plan was chosen every time, which is bad. The
main reason is that the custom plan knows the exact value from the Execute
message, so it runs plan-time partition pruning and the total cost is low.
For the generic plan, however, partition pruning happens at the run-time
initial-pruning stage, so the cost of the partitions that can be pruned at
that stage is still included in the total cost, and the generic plan can't
be chosen. That is what I want to fix.

If the table has a good distribution of the partition key, which also
results in a good distribution of data across partitions, the generic plan's
cost will be similar to the custom plans' costs. If not, that's something
we want to fix.

[1] talks about generic plan being
not chosen just because it has higher cost even though its shape is
similar to a custom plan. Leveraging that idea might be a good option.
If the custom plans produced have shapes similar to the generic plan,
stop producing those.

If I understand you correctly, the issue I want to fix here matches this
goal.

--
Best Regards
Andy Fan

#6Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#3)
Re: Improve choose_custom_plan for initial partition prune case

On Sat, Oct 3, 2020 at 10:05 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi Amit:

Very glad to see your comment!

On Fri, Oct 2, 2020 at 4:21 PM Amit Langote <amitlangote09@gmail.com>
wrote:

Hi Andy,

On Fri, Oct 2, 2020 at 1:04 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Given the plan example:

CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

prepare s as select * from measurement where logdate = $1;
execute s('2006-02-01');

The generic plan will probably not be chosen because it doesn't reduce the
cost which can be reduced at initial_prune, while the custom plan reduces
such cost at planning time, which makes the cost comparison unfair.

I agree that there is something to be done here. Actually, I think we
should try to find a solution that will allow us to consider not just
"initial" pruning, but also "execution-time" pruning. The latter
will allow a nested loop join whose inner side scans a partitioned
table using a parameterized scan on the partition key to be favored
over other join plans, because that parameterized scan can use
execution-time pruning which can make the inner scan very cheap.

This looks like it addresses another important issue with partition pruning,
one that can arise entirely at planning time (no generic or custom plan
involved): for example, choosing between a nested loop plan that can use
some run-time partition pruning and a hash join that can't. I'm restating
your idea just to make sure I understand you correctly.

I'm thinking whether we can get an estimated cost reduction of initial
pruning for the generic plan based on the partitions pruned at plan time in
the custom plan, and then subtract such costs from the generic plan. I just
went through the related code but haven't written anything yet. I'd like to
see if this is a correct direction to go.

That's an interesting idea, that is, to try to do this totally outside
the planner. When I was thinking about this a little while ago, I was
trying to find a way to adjust the cost of the plan in the planner
itself by looking at the runtime pruning info in the nodes that
support it, that is, Append, MergeAppend. Actually, such an approach
had also come up in the original run-time pruning discussion [1].

Thank you for your comments. It looks like your approach could help in both
cases, and I thought about it a bit as well. However, it looks complex (to
me), and I would prefer to estimate how many partitions can be pruned from
real data, even if it is past data (I assume that will not make much
difference in practice).

I'm not sure whether I should treat Robert's comments [1] as an objection,
but I think there are some small differences. I'd like to implement my idea
soon, and I'm glad to see any opposing ideas at any time; of course, the
sooner the better. :)

After some days of coding in this direction, I have run into a rather
upsetting situation.

The main idea is: if N partitions survive plan-time partition pruning in the
custom plan, then I assume a similar number of partitions will survive
initial partition pruning in the generic plan, so we should subtract the
corresponding costs from the generic plan.

However, the hard part of the implementation is that we can't associate the
Append node in the generic plan with the surviving-partition information
from the custom plan, even though the Append node has apprelids in it.

1. Associate them by rtindex. The final rtindex is affected by
glob->finalrtable; however, glob->finalrtable includes the child partitions,
so the custom plan ends up with a smaller finalrtable and the rtindexes
can't be matched.

2. Associate them by relation OID, which we can also record in the Append
node. The bad part is that the same relation OID may appear multiple times
in a query, for example: SELECT .. FROM p p1, p p2 WHERE p1.partkey1 = $1
AND p2.partkey2 = $2;

Any hint on this would be appreciated.

--
Best Regards
Andy Fan

#7Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#6)
Re: Improve choose_custom_plan for initial partition prune case

2. Associate them by relation OID, which we can also record in the Append
node. The bad part is that the same relation OID may appear multiple times
in a query, for example: SELECT .. FROM p p1, p p2 WHERE p1.partkey1 = $1
AND p2.partkey2 = $2;

I just came up with a new idea. Since this situation should be rare, we can
simply fall back to the original method (ignore the cost reduction entirely)
or use the average number. Fixing 99% of the cases would be a big win as
well, IMO.

So any hint on this will be appreciated..

--
Best Regards
Andy Fan


#8Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#7)
1 attachment(s)
Re: Improve choose_custom_plan for initial partition prune case

On Wed, Oct 7, 2020 at 2:43 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

2. Associate them by relation OID, which we can also record in the Append
node. The bad part is that the same relation OID may appear multiple times
in a query, for example: SELECT .. FROM p p1, p p2 WHERE p1.partkey1 = $1
AND p2.partkey2 = $2;

I just came up with a new idea. Since this situation should be rare, we can
simply fall back to the original method (ignore the cost reduction entirely)
or use the average number. Fixing 99% of the cases would be a big win as
well, IMO.

I just uploaded a runnable patch for this idea, but it looks like the design
was wrong from the beginning. For example:

Nest Loop
  -> Append
       p_1
       p_2
  -> inner

The patch only reduces the cost of the Append node, but in fact, since the
loop count of the inner plan is reduced as well, that cost should be reduced
too. However, even if we could reduce the cost of the join, that would still
not be a smart solution, since the shape of the generic plan itself might
already be wrong to begin with.

--
Best Regards
Andy Fan

Attachments:

v1-0001-Reduce-some-generic-plan-cost-by-adjusting-the-Ap.patch (application/octet-stream)
From bc879dc79c3a76de3d56acd980c06227c1f44c45 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 7 Oct 2020 16:19:25 +0800
Subject: [PATCH v1] Reduce some generic plan cost by adjusting the Append node
 cost only

The generic plan includes some costs which can be reduced at the initial
partition prune stage, while such costs are reduced at plan time for a
custom plan.  So choose_custom_plan always returns true for this case.

This patch tries to fix the issue by estimating how many partitions can be
pruned at initial partition prune time (based on the prune result from the
custom plan).  But it doesn't help much, since reducing the cost of the
Append node is not enough.  In the join case, after the initial partition
prune, not only can the Append's cost be reduced, the overall join cost
should be reduced as well.
---
 src/backend/nodes/copyfuncs.c           |  16 ++
 src/backend/nodes/nodeFuncs.c           |  13 ++
 src/backend/nodes/outfuncs.c            |   1 +
 src/backend/nodes/readfuncs.c           |   1 +
 src/backend/optimizer/path/costsize.c   |   8 -
 src/backend/optimizer/plan/createplan.c |   4 +
 src/backend/optimizer/plan/planner.c    |   4 +
 src/backend/optimizer/plan/setrefs.c    |   2 +
 src/backend/optimizer/util/inherit.c    |  63 +++++++-
 src/backend/utils/cache/plancache.c     | 190 +++++++++++++++++++++++-
 src/include/nodes/nodeFuncs.h           |   2 +
 src/include/nodes/nodes.h               |   1 +
 src/include/nodes/pathnodes.h           |  13 ++
 src/include/nodes/plannodes.h           |  10 +-
 src/include/optimizer/cost.h            |   7 +
 src/include/optimizer/inherit.h         |   2 +-
 src/include/utils/plancache.h           |   2 +
 17 files changed, 321 insertions(+), 18 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0409a40b82..90be22bcfc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -94,6 +94,7 @@ _copyPlannedStmt(const PlannedStmt *from)
 	COPY_NODE_FIELD(rootResultRelations);
 	COPY_NODE_FIELD(appendRelations);
 	COPY_NODE_FIELD(subplans);
+	COPY_NODE_FIELD(flatten_live_parts);
 	COPY_BITMAPSET_FIELD(rewindPlanIDs);
 	COPY_NODE_FIELD(rowMarks);
 	COPY_NODE_FIELD(relationOids);
@@ -243,6 +244,7 @@ _copyAppend(const Append *from)
 	 * copy remainder of node
 	 */
 	COPY_BITMAPSET_FIELD(apprelids);
+	COPY_SCALAR_FIELD(relid);
 	COPY_NODE_FIELD(appendplans);
 	COPY_SCALAR_FIELD(first_partial_plan);
 	COPY_NODE_FIELD(part_prune_info);
@@ -4673,6 +4675,16 @@ _copyPartitionCmd(const PartitionCmd *from)
 	return newnode;
 }
 
+static LivePartition *
+_copyLivePartition(const LivePartition *from)
+{
+	LivePartition *newnode = makeNode(LivePartition);
+	COPY_SCALAR_FIELD(relid);
+	COPY_SCALAR_FIELD(lived_count);
+	COPY_SCALAR_FIELD(count);
+	return newnode;
+}
+
 static CreatePublicationStmt *
 _copyCreatePublicationStmt(const CreatePublicationStmt *from)
 {
@@ -5706,6 +5718,10 @@ copyObjectImpl(const void *from)
 			retval = _copyPartitionCmd(from);
 			break;
 
+		case T_LivePartition:
+			retval = _copyLivePartition(from);
+			break;
+
 			/*
 			 * MISCELLANEOUS NODES
 			 */
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 9ce8f43385..df96b72290 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1681,6 +1681,19 @@ set_sa_opfuncid(ScalarArrayOpExpr *opexpr)
 		opexpr->opfuncid = get_opcode(opexpr->opno);
 }
 
+LivePartition *
+find_related_liveparts(List *live_parts, Oid relid)
+{
+	ListCell *lc;
+	foreach(lc, live_parts)
+	{
+		LivePartition *live_part = lfirst_node(LivePartition, lc);
+		if (live_part->relid == relid)
+			return live_part;
+	}
+	return NULL;
+}
+
 
 /*
  *	check_functions_in_node -
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f0386480ab..0d09fa6235 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -433,6 +433,7 @@ _outAppend(StringInfo str, const Append *node)
 	_outPlanInfo(str, (const Plan *) node);
 
 	WRITE_BITMAPSET_FIELD(apprelids);
+	WRITE_UINT_FIELD(relid);
 	WRITE_NODE_FIELD(appendplans);
 	WRITE_INT_FIELD(first_partial_plan);
 	WRITE_NODE_FIELD(part_prune_info);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..e7e4ef52e6 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1669,6 +1669,7 @@ _readAppend(void)
 	ReadCommonPlan(&local_node->plan);
 
 	READ_BITMAPSET_FIELD(apprelids);
+	READ_UINT_FIELD(relid);
 	READ_NODE_FIELD(appendplans);
 	READ_INT_FIELD(first_partial_plan);
 	READ_NODE_FIELD(part_prune_info);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index cd3716d494..4dde6867a8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -100,14 +100,6 @@
 
 #define LOG2(x)  (log(x) / 0.693147180559945)
 
-/*
- * Append and MergeAppend nodes are less expensive than some other operations
- * which use cpu_tuple_cost; instead of adding a separate GUC, estimate the
- * per-tuple cost as cpu_tuple_cost multiplied by this value.
- */
-#define APPEND_CPU_COST_MULTIPLIER 0.5
-
-
 double		seq_page_cost = DEFAULT_SEQ_PAGE_COST;
 double		random_page_cost = DEFAULT_RANDOM_PAGE_COST;
 double		cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 99278eed93..e1dfb62d3b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1132,6 +1132,10 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
 	plan->plan.lefttree = NULL;
 	plan->plan.righttree = NULL;
 	plan->apprelids = rel->relids;
+	if (IS_PARTITIONED_REL(rel) && rel->relid > 0)
+		plan->relid = planner_rt_fetch(rel->relid, root)->relid;
+	else
+		plan->relid = InvalidOid;
 
 	if (pathkeys != NIL)
 	{
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e2b4965c4..fa9cb2db92 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -315,6 +315,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	glob->lastPlanNodeId = 0;
 	glob->transientPlan = false;
 	glob->dependsOnRole = false;
+	glob->flatten_live_parts = NIL;
+	glob->append_plans = NIL;
 
 	/*
 	 * Assess whether it's feasible to use parallel mode for this query. We
@@ -523,6 +525,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->rootResultRelations = glob->rootResultRelations;
 	result->appendRelations = glob->appendRelations;
 	result->subplans = glob->subplans;
+	result->flatten_live_parts = glob->flatten_live_parts;
+	result->append_plans = glob->append_plans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
 	result->relationOids = glob->relationOids;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dd8e2e966d..a15fa7f552 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1000,6 +1000,8 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 		case T_Append:
+			root->glob->append_plans = lappend(root->glob->append_plans,
+											   plan);
 			/* Needs special treatment, see comments below */
 			return set_append_references(root,
 										 (Append *) plan,
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 3132fd35a5..a209ff3673 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -21,6 +21,7 @@
 #include "catalog/pg_type.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/inherit.h"
 #include "optimizer/optimizer.h"
@@ -38,7 +39,8 @@
 static void expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
 									   RangeTblEntry *parentrte,
 									   Index parentRTindex, Relation parentrel,
-									   PlanRowMark *top_parentrc, LOCKMODE lockmode);
+									   PlanRowMark *top_parentrc, LOCKMODE lockmode,
+									   bool *any_pruned);
 static void expand_single_inheritance_child(PlannerInfo *root,
 											RangeTblEntry *parentrte,
 											Index parentRTindex, Relation parentrel,
@@ -131,6 +133,7 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
 	/* Scan the inheritance set and expand it */
 	if (oldrelation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
+		bool any_pruned = false;
 		/*
 		 * Partitioned table, so set up for partitioning.
 		 */
@@ -141,7 +144,9 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
 		 * extract the partition key columns of all the partitioned tables.
 		 */
 		expand_partitioned_rtentry(root, rel, rte, rti,
-								   oldrelation, oldrc, lockmode);
+								   oldrelation, oldrc, lockmode, &any_pruned);
+		if (any_pruned)
+			count_live_partitions(root, rel);
 	}
 	else
 	{
@@ -278,13 +283,15 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
 
 /*
  * expand_partitioned_rtentry
- *		Recursively expand an RTE for a partitioned table.
+ *		Recursively expand an RTE for a partitioned table. any_pruned is an
+ * out parameter which shows if any partition is pruned at this step.
  */
 static void
 expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
 						   RangeTblEntry *parentrte,
 						   Index parentRTindex, Relation parentrel,
-						   PlanRowMark *top_parentrc, LOCKMODE lockmode)
+						   PlanRowMark *top_parentrc, LOCKMODE lockmode,
+						   bool *any_pruned)
 {
 	PartitionDesc partdesc;
 	Bitmapset  *live_parts;
@@ -334,6 +341,9 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
 	if (num_live_parts > 0)
 		expand_planner_arrays(root, num_live_parts);
 
+	if (num_live_parts < partdesc->nparts)
+		*any_pruned = true;
+
 	/*
 	 * We also store partition RelOptInfo pointers in the parent relation.
 	 * Since we're palloc0'ing, slots corresponding to pruned partitions will
@@ -383,7 +393,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
 		if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 			expand_partitioned_rtentry(root, childrelinfo,
 									   childrte, childRTindex,
-									   childrel, top_parentrc, lockmode);
+									   childrel, top_parentrc, lockmode,
+									   any_pruned);
 
 		/* Close child relation, but keep locks */
 		table_close(childrel, NoLock);
@@ -804,3 +815,45 @@ apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,
 
 	return true;
 }
+
+
+/*
+ * See how many child partitions survived for a partitioned relation
+ */
+void
+count_live_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+	int live_parts = 0, i = -1;
+	RelOptInfo *childrel;
+	LivePartition *live_part_stat;
+	Oid relid;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	while((i = bms_next_member(rel->all_partrels, i)) >= 0)
+	{
+		childrel = root->simple_rel_array[i];
+		if (!IS_PARTITIONED_REL(childrel))
+			live_parts += 1;
+		else
+			live_parts += bms_num_members(childrel->all_partrels);
+	}
+
+	relid = planner_rt_fetch(rel->relid, root)->relid;
+	live_part_stat = find_related_liveparts(root->glob->flatten_live_parts, relid);
+
+	if (live_part_stat)
+	{
+		live_part_stat->count++;
+		live_part_stat->lived_count += live_parts;
+	}
+	else
+	{
+		LivePartition *tmp = makeNode(LivePartition);
+		tmp->relid = relid;
+		tmp->lived_count = live_parts;
+		tmp->count = 1;
+		root->glob->flatten_live_parts = lappend(root->glob->flatten_live_parts,
+												 tmp);
+	}
+}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 50d6ad28b4..9c640bb68e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -62,6 +62,7 @@
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
+#include "optimizer/cost.h"
 #include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "storage/lmgr.h"
@@ -105,6 +106,8 @@ static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
 static bool choose_custom_plan(CachedPlanSource *plansource,
 							   ParamListInfo boundParams);
 static double cached_plan_cost(CachedPlan *plan, bool include_planner);
+static double cost_generic_plan(CachedPlanSource *plansource, List *plist);
+static void merge_live_partitions(List **accumulatedl_live_parts, List *curr_live_parts);
 static Query *QueryListGetPrimaryStmt(List *stmts);
 static void AcquireExecutorLocks(List *stmt_list, bool acquire);
 static void AcquirePlannerLocks(List *stmt_list, bool acquire);
@@ -220,6 +223,7 @@ CreateCachedPlan(RawStmt *raw_parse_tree,
 	plansource->total_custom_cost = 0;
 	plansource->num_generic_plans = 0;
 	plansource->num_custom_plans = 0;
+	plansource->total_lived_parts = NIL;
 
 	MemoryContextSwitchTo(oldcxt);
 
@@ -957,6 +961,15 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
 		 */
 		MemoryContextSwitchTo(plan_context);
 
+		if (boundParams == NULL)
+		{
+			/*
+			 * Generic plan: PlannedStmt->append_plans is not copied on purpose
+			 * (see comments in find_append_plans), so we must call
+			 * cost_generic_plan() before copyObject().
+			 */
+			plansource->generic_cost = cost_generic_plan(plansource, plist);
+		}
 		plist = copyObject(plist);
 	}
 	else
@@ -1065,12 +1078,123 @@ choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
 	return true;
 }
 
+/*
+ * append_plan_cost_reduction
+ *	Estimate the cost reduction for an Append plan, based on the plan-time
+ *	partition pruning observed in previous custom plans.
+ */
+static double
+append_plan_cost_reduction(Append *aplan, LivePartition *live_parts)
+{
+	double result;
+	double total_child_plans = 0, total_child_cost = 0;
+	double avg_lives, live_part_ratio;
+	ListCell *lc;
+	foreach(lc, aplan->appendplans)
+	{
+		Plan	*plan = lfirst(lc);
+		++total_child_plans;
+		total_child_cost += plan->total_cost;
+	}
+
+	avg_lives = (double) live_parts->lived_count / live_parts->count;
+	if (avg_lives < 1)
+		avg_lives = 1;
+	live_part_ratio = avg_lives / total_child_plans;
+	result = (1 - live_part_ratio) * total_child_cost;
+	result +=
+		cpu_tuple_cost * APPEND_CPU_COST_MULTIPLIER *
+		(1 - live_part_ratio) * aplan->plan.plan_rows;
+
+	elog(INFO, "append_plan_cost_reduction %d %f", aplan->plan.plan_node_id, result);
+	/*
+	 * XXX: I assume there is no need to account for parallelism here, since
+	 * plan_rows and each subplan's total_cost have considered it already.
+	 */
+	return result;
+}
+
+
+/*
+ * find_append_plans
+ *	Find the Append plans in a PlannedStmt.
+ */
+static List *
+find_append_plans(PlannedStmt *stmt)
+{
+	/*
+	 * Method 1: use the append_plans list gathered at the set_plan_refs
+	 * stage.  However, that list might be big for a partitioned table with
+	 * thousands of partitions and is of no use to CachedPlanSource, so I
+	 * didn't copy it in _copyPlannedStmt on purpose.  Copying an object
+	 * only partially might be bad practice, though, so I have Method 2 in
+	 * mind, but it is not implemented yet.
+	 */
+	return stmt->append_plans;
+
+	/*
+	 * Method 2: write a plantree_walker(Plan, bool (*walker)(), void *context)
+	 * and go through all the plans in the PlannedStmt; not implemented yet.
+	 */
+}
+
+
+/*
+ * cost_generic_plan
+ *
+ * Cost the generic plan by estimating how much cost would be removed at the
+ * initial partition pruning stage, then subtracting that from total_cost.
+ */
+static double
+cost_generic_plan(CachedPlanSource *plansource, List *plist)
+{
+	double	result = 0;
+	ListCell	*lc1, *lc2;
+	double init_prune_cost_reduction = 0;
+
+	foreach(lc1, plist)
+	{
+		PlannedStmt *stmt = lfirst_node(PlannedStmt, lc1);
+		if (stmt->commandType == CMD_UTILITY)
+			continue;
+		result += stmt->planTree->total_cost;
+	}
+
+	if (plansource->total_lived_parts != NIL)
+	{
+		Assert(list_length(plansource->total_lived_parts) == list_length(plist));
+		forboth(lc1, plist, lc2, plansource->total_lived_parts)
+		{
+			PlannedStmt *stmt = lfirst_node(PlannedStmt, lc1);
+			List	*lived_parts = lfirst_node(List, lc2);
+			ListCell	*lc3;
+			if (stmt->commandType == CMD_UTILITY)
+				continue;
+			foreach(lc3, find_append_plans(stmt))
+			{
+				Append *aplan = lfirst_node(Append, lc3);
+				LivePartition *live_part;
+				if (!OidIsValid(aplan->relid))
+					continue;
+				live_part = find_related_liveparts(lived_parts, aplan->relid);
+				if (live_part)
+					init_prune_cost_reduction += append_plan_cost_reduction(aplan, live_part);
+			}
+		}
+	}
+
+	return result - init_prune_cost_reduction;
+}
+
 /*
  * cached_plan_cost: calculate estimated cost of a plan
  *
  * If include_planner is true, also include the estimated cost of constructing
  * the plan.  (We must factor that into the cost of using a custom plan, but
  * we don't count it for a generic plan.)
+ * XXX: could be renamed to custom_plan_cost, since the generic plan is now
+ * costed in cost_generic_plan.
  */
 static double
 cached_plan_cost(CachedPlan *plan, bool include_planner)
@@ -1119,6 +1243,68 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
 	return result;
 }
 
+
+/*
+ * record_runtime_partition_prunes
+ *	Record plan-time partition pruning information from a custom plan into
+ *	the CachedPlanSource for later use.
+ */
+static void
+record_runtime_partition_prunes(CachedPlanSource *plansource, CachedPlan *cplan)
+{
+	MemoryContext oldCtx;
+
+	oldCtx = MemoryContextSwitchTo(plansource->context);
+	if (plansource->total_lived_parts == NIL)
+	{
+		ListCell	*lc;
+		foreach(lc, cplan->stmt_list)
+		{
+			PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+			plansource->total_lived_parts = lappend(plansource->total_lived_parts,
+													copyObject(stmt->flatten_live_parts));
+		}
+	}
+	else
+	{
+		ListCell	*lc1, *lc2;
+		Assert(list_length(plansource->total_lived_parts) == list_length(cplan->stmt_list));
+		forboth(lc1, plansource->total_lived_parts, lc2, cplan->stmt_list)
+		{
+			List	*plansource_live_parts = lfirst_node(List, lc1);
+			List	*stmt_live_parts = lfirst_node(PlannedStmt, lc2)->flatten_live_parts;
+			merge_live_partitions(&plansource_live_parts, stmt_live_parts);
+		}
+	}
+	MemoryContextSwitchTo(oldCtx);
+}
+
+/*
+ * merge_live_partitions
+ *	Merge the live-partition stats of the current plan into the stats
+ *	accumulated so far in the CachedPlanSource.
+ */
+static void
+merge_live_partitions(List **accumulated_live_parts, List *curr_live_parts)
+{
+	ListCell *lc;
+	foreach(lc, curr_live_parts)
+	{
+		LivePartition *curr_live_part = lfirst_node(LivePartition, lc);
+		LivePartition *accm_live_part = find_related_liveparts(*accumulated_live_parts,
+															   curr_live_part->relid);
+		if (accm_live_part)
+		{
+			accm_live_part->lived_count += curr_live_part->lived_count;
+			accm_live_part->count += curr_live_part->count;
+		}
+		else
+			*accumulated_live_parts = lappend(*accumulated_live_parts,
+											  copyObject(curr_live_part));
+	}
+}
+
 /*
  * GetCachedPlan: get a cached plan from a CachedPlanSource.
  *
@@ -1188,8 +1374,6 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
 				MemoryContextSetParent(plan->context,
 									   MemoryContextGetParent(plansource->context));
 			}
-			/* Update generic_cost whenever we make a new generic plan */
-			plansource->generic_cost = cached_plan_cost(plan, false);
 
 			/*
 			 * If, based on the now-known value of generic_cost, we'd not have
@@ -1217,7 +1401,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
 		plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
 		/* Accumulate total costs of custom plans */
 		plansource->total_custom_cost += cached_plan_cost(plan, true);
-
+		record_runtime_partition_prunes(plansource, plan);
 		plansource->num_custom_plans++;
 	}
 	else
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 9cc56eecaa..c508f73389 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -14,6 +14,7 @@
 #define NODEFUNCS_H
 
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 /* flags bits for query_tree_walker and query_tree_mutator */
@@ -53,6 +54,7 @@ extern int	exprLocation(const Node *expr);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
+extern LivePartition *find_related_liveparts(List *live_parts, Oid relid);
 
 /* Is clause a FuncExpr clause? */
 static inline bool
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 7ddd8c011b..b1b3f727bf 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -274,6 +274,7 @@ typedef enum NodeTag
 	T_RollupData,
 	T_GroupingSetData,
 	T_StatisticExtInfo,
+	T_LivePartition,
 
 	/*
 	 * TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dbe86e7af6..6297010fe9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -147,6 +147,10 @@ typedef struct PlannerGlobal
 	char		maxParallelHazard;	/* worst PROPARALLEL hazard level */
 
 	PartitionDirectory partition_directory; /* partition descriptors */
+
+	List	   *flatten_live_parts; /* LivePartition stats from planning */
+
+	List	   *append_plans;	/* set in set_plan_refs */
 } PlannerGlobal;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -2513,6 +2517,15 @@ typedef struct
 	int64		offset_est;
 } FinalPathExtraData;
 
+
+typedef struct LivePartition
+{
+	NodeTag		tag;
+	Oid			relid;			/* OID of the partitioned parent table */
+	int			lived_count;	/* total surviving partitions accumulated */
+	int			count;			/* number of samples accumulated */
+} LivePartition;
+
 /*
  * For speed reasons, cost estimation for join paths is performed in two
  * phases: the first phase tries to quickly derive a lower bound for the
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..a15c17cb4e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -79,6 +79,14 @@ typedef struct PlannedStmt
 	List	   *subplans;		/* Plan trees for SubPlan expressions; note
 								 * that some could be NULL */
 
+	List	   *flatten_live_parts; /* LivePartition stats from planning */
+
+	/*
+	 * Intentionally not copied by _copyPlannedStmt, to keep the copied
+	 * plan small.  This might be a bad practice ...
+	 */
+	List	   *append_plans;
+
 	Bitmapset  *rewindPlanIDs;	/* indices of subplans that require REWIND */
 
 	List	   *rowMarks;		/* a list of PlanRowMark's */
@@ -252,8 +260,8 @@ typedef struct Append
 {
 	Plan		plan;
 	Bitmapset  *apprelids;		/* RTIs of appendrel(s) formed by this node */
+	Oid			relid;	/* the partitioned table's OID, or InvalidOid */
 	List	   *appendplans;
-
 	/*
 	 * All 'appendplans' preceding this index are non-partial plans. All
 	 * 'appendplans' from this index onwards are partial plans.
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6141654e47..07bfd9066d 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -31,6 +31,13 @@
 
 #define DEFAULT_EFFECTIVE_CACHE_SIZE  524288	/* measured in pages */
 
+/*
+ * Append and MergeAppend nodes are less expensive than some other operations
+ * which use cpu_tuple_cost; instead of adding a separate GUC, estimate the
+ * per-tuple cost as cpu_tuple_cost multiplied by this value.
+ */
+#define APPEND_CPU_COST_MULTIPLIER 0.5
+
 typedef enum
 {
 	CONSTRAINT_EXCLUSION_OFF,	/* do not use c_e */
diff --git a/src/include/optimizer/inherit.h b/src/include/optimizer/inherit.h
index 0ba6132652..396b25d4e3 100644
--- a/src/include/optimizer/inherit.h
+++ b/src/include/optimizer/inherit.h
@@ -23,5 +23,5 @@ extern void expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
 extern bool apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,
 								  RelOptInfo *childrel, RangeTblEntry *childRTE,
 								  AppendRelInfo *appinfo);
-
+extern void count_live_partitions(PlannerInfo *root, RelOptInfo *rel);
 #endif							/* INHERIT_H */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 4901568553..2d1dcde5c6 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,7 @@
 #include "access/tupdesc.h"
 #include "lib/ilist.h"
 #include "nodes/params.h"
+#include "nodes/pathnodes.h"
 #include "tcop/cmdtag.h"
 #include "utils/queryenvironment.h"
 #include "utils/resowner.h"
@@ -130,6 +131,7 @@ typedef struct CachedPlanSource
 	/* State kept to help decide whether to use custom or generic plans: */
 	double		generic_cost;	/* cost of generic plan, or -1 if not known */
 	double		total_custom_cost;	/* total cost of custom plans so far */
+	List	   *total_lived_parts;	/* accumulated LivePartition stats */
 	int64		num_custom_plans;	/* # of custom plans included in total */
 	int64		num_generic_plans;	/* # of generic plans */
 } CachedPlanSource;
-- 
2.21.0

#9Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andy Fan (#5)
Re: Improve choose_custom_plan for initial partition prune case

On Wed, Oct 7, 2020 at 11:20 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi Ashutosh:

Thanks for coming.

On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Oct 1, 2020 at 9:34 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Given the plan example:

CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

prepare s as select * from measurement where logdate = $1;
execute s('2006-02-01').

The generic plan will probably not be chosen because it doesn't reduce the cost
which can be reduced at initial_prune while the custom plan reduces such cost
at planning time. which makes the cost comparison not fair. I'm thinking if we can
get an estimated cost reduction of initial_prunne for generic plan based on the
partition pruned at plan time from custom plan and then reducing
such costs from the generic plan. I just went through the related code but
didn't write anything now. I'd like to see if this is a correct direction to go.

What happens when we
execute plans with values that have estimates similar to the generic
plan later when we moderate generic plan costs based on the custom
plans?

In the example at the beginning of this thread, I used the exact same values
every time, so the custom plan will be chosen all the time, which is bad.
The main reason is that the custom plan knows the exact value from the
Execute message, so it runs plan-time partition pruning and the total cost
is low; for the generic plan, partition pruning happens at the run-time
initial pruning stage, so the cost of the partitions which can be pruned at
that stage is still included in the total cost, and generic plans can't be
chosen. That would be the thing I want to fix.

Something is wrong in the generic plan costing then. IIUC, the
selectivity estimate for only a single partition should come out >= 1.
For all the other partitions, it should be 1 and that too because we
clamp the row counts. So the final costs for generic and custom plans
shouldn't be far off unless there's large deviation in the selectivity
of a partition key. I am assuming that there's an equality condition
on a partition key. That's what I meant by the paragraph below.

If the table has good distribution of a partition key, which also
results in good distribution of data across partitions, generic plan
cost will be similar to the custom plan costs. If not that's something
we want to fix.

Can you please investigate on these lines?

--
Best Wishes,
Ashutosh Bapat

#10Amit Langote
amitlangote09@gmail.com
In reply to: Ashutosh Bapat (#9)
Re: Improve choose_custom_plan for initial partition prune case

Hi Ashutosh,

On Wed, Oct 7, 2020 at 8:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Oct 7, 2020 at 11:20 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

What happens when we
execute plans with values that have estimates similar to the generic
plan later when we moderate generic plan costs based on the custom
plans?

The example at the beginning of this thread, I used the exact same values
every time, the custom plan will be chosen all the time, which is bad,
The main reason is the custom plan knows the exact value in Execute
message, so it run plan time partition prune, then the total cost is low, however
for the generic plan the partition prune happens at Runtime initial_partition prune
stage, so the cost of the partitions which can be pruned at that stage is still
included the total cost, so generic plans can't be chosen. that would be the
thing I want to fix.

Something is wrong in the generic plan costing then.

That's right.

IIUC, the
selectivity estimate for only a single partition should come out >= 1.
For all the other partitions, it should be 1 and that too because we
clamp the row counts. So the final costs for generic and custom plans
shouldn't be far off unless there's large deviation in the selectivity
of a partition key. I am assuming that there's an equality condition
on a partition key. That's what I meant by the paragraph below.

If the table has good distribution of a partition key, which also
results in good distribution of data across partitions, generic plan
cost will be similar to the custom plan costs. If not that's something
we want to fix.

Can you please investigate on these lines?

The planner currently costs a generic plan assuming that *all*
partitions will be scanned, which is not the case, because run-time
pruning will ensure that scan nodes of all but one partition will be
discarded, assuming a simple query like `select * from parted_table
where a = $1`. With N partitions, a generic plan for such a query is
N times as expensive as a custom plan, so will never win.

For example:

create table bar (a int, b int, c int) partition by range (a);
create table bar_1 partition of bar for values from (1) to (100001);
create table bar_2 partition of bar for values from (100001) to (200001);
create table bar_3 partition of bar for values from (200001) to (300001);
create table bar_4 partition of bar for values from (300001) to (400001);
create table bar_5 partition of bar for values from (400001) to (500001);
create table bar_6 partition of bar for values from (500001) to (600001);
create table bar_7 partition of bar for values from (600001) to (700001);
create table bar_8 partition of bar for values from (700001) to (800001);
create table bar_9 partition of bar for values from (800001) to (900001);
create table bar_10 partition of bar for values from (900001) to (1000001);
insert into bar select generate_series(1, 1000000);
create index on bar (a);
analyze bar;

set plan_cache_mode to force_custom_plan ;
prepare q as select * from bar where a = $1;
explain execute q (1);
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using bar_1_a_idx on bar_1 bar (cost=0.29..8.31 rows=1 width=12)
Index Cond: (a = 1)
(2 rows)

deallocate q;
set plan_cache_mode to force_generic_plan ;
prepare q as select * from bar where a = $1;
explain execute q (1);
QUERY PLAN
--------------------------------------------------------------------------------
Append (cost=0.29..83.15 rows=10 width=12)
Subplans Removed: 9
-> Index Scan using bar_1_a_idx on bar_1 (cost=0.29..8.31 rows=1 width=12)
Index Cond: (a = $1)
(4 rows)

As you can see, the cost of the generic plan is 83.15, essentially 8.31 *
10, even though only one partition is actually scanned, so the actual
cost is the same as a custom plan's. The planner fails to consider that
those 9 subplans will be removed during execution.

--
Amit Langote
EDB: http://www.enterprisedb.com

#11Andy Fan
zhihui.fan1213@gmail.com
In reply to: Amit Langote (#10)
Re: Improve choose_custom_plan for initial partition prune case

Thank you Amit and Ashutosh for your reply!

On Wed, Oct 7, 2020 at 8:41 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Ashutosh,

On Wed, Oct 7, 2020 at 8:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Oct 7, 2020 at 11:20 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

What happens when we
execute plans with values that have estimates similar to the generic
plan later when we moderate generic plan costs based on the custom
plans?

In the example at the beginning of this thread, I used the exact same values
every time, so the custom plan will be chosen all the time, which is bad.
The main reason is that the custom plan knows the exact value from the
Execute message, so it runs plan-time partition pruning and the total cost
is low; for the generic plan, partition pruning happens at the run-time
initial pruning stage, so the cost of the partitions which can be pruned at
that stage is still included in the total cost, and generic plans can't be
chosen. That would be the thing I want to fix.

Something is wrong in the generic plan costing then.

That's right.

IIUC, the
selectivity estimate for only a single partition should come out >= 1.
For all the other partitions, it should be 1 and that too because we
clamp the row counts.

If the planner knows the exact parameter value, it can estimate the other
partitions at 1 row (actually 0 rows). However, if it doesn't know the
value, the estimate will be rows / ndistinctVals for the "partkey = $1" case.


I also had some replies at the original runtime partition prune thread [1].
I am very sorry that I messed this topic up with 2 threads. I have some new
understanding about this (might be old ideas for others), so I'd like to
continue this discussion in that thread and add you to the cc list there.

Thanks!

[1]: /messages/by-id/CAKU4AWrzi7f1Y1J8q0xWO8fXOwf-BtdG+M9_7bVPnQyd5cLS0Q@mail.gmail.com

--
Best Regards
Andy Fan