Make Append Cost aware of some run time partition prune case

Started by Andy Fanover 5 years ago6 messageshackers
Jump to latest
#1Andy Fan
zhihui.fan1213@gmail.com

Currently the cost model of append path sums the cost/rows for all the
subpaths, it usually works well until we run into the run-time partition
prune
case. The first result is that generic plans will rarely be used for some
cases.
For instance, SELECT * FROM p WHERE pkey = $1; The custom plan will only
count the cost of one partition, however generic plan will count the cost
for all the
partitions even we are sure that only 1 partition will survive. Another
impact
is that planners may choose a wrong plan. for example, SELECT * FROM t1,
p
WHERE t1.a = p.pkey; The cost/rows of t1 nest loop p is estimated highly
improperly. This patch wants to help this case to some extent.

The basic idea is we need to estimate how many partitions will survive after
considering the runtime partition prune. After that, we will adjust the
cost/rows accordingly. IIUC, this follows Robert's idea at [1]/messages/by-id/CA+TgmoZHYoAL4HYwnGO25B8CxCB+vNMdf+7rbUzYykR4sU9yUA@mail.gmail.com.
However there are too many special cases which is hard
to handle, but luckily the most common case would be sth like partykey =
$1,
which we can estimate well, this patch is aimed to handle that part
specially.
I supposed 75% partitions will survive for other cases arbitrarily,
actually I
want to use 100% to be the same as the current situation. If we decide to
handle their special case differently, PartitionedRelQual has to be
redefined
a bit (see comments around that.) which is the main data struct in the
implementation.

The attached is the minimum runnable patch. There are some obvious issue
here:

1. MergeAppend path is not handled on purpose, it will be done at last.
2. The cost for Parallel Append Path is not adjusted, I'm not sure about
what is
the best way to do it. So there are 2 test cases in
sql/partition_prune.out
failed due to this, I just echo the 'results' to expected' so that you
can
know which one is impacted.

Here are some simple test results.

create table a1 partition of a for values in (1) partition by range (b, c);
create table a1_1 partition of a1 for values from (1, 0) to (1, 10);
create table a1_2 partition of a1 for values from (2, 0) to (2, 10);

create table a2 partition of a for values in (2) partition by range (b, c);
create table a2_1 partition of a2 for values from (1, 0) to (1, 10);
create table a2_2 partition of a2 for values from (2, 0) to (2, 10);

insert into a select 1, i%2 + 1, i % 10 from generate_series(1, 10000) i;
insert into a select 2, i%2 + 1, i % 10 from generate_series(1, 10000) i;

analyze a;

set plan_cache_mode to force_generic_plan;

prepare s as select * from a where a = $1;
PREPARE
explain execute s(1);
QUERY PLAN
-------------------------------------------------------------------
Append (cost=0.00..231.00 rows=10000 width=12) *<< both rows/cost is
adjusted.*
Subplans Removed: 2
-> Seq Scan on a1_1 a_1 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $1)
-> Seq Scan on a1_2 a_2 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $1)
(6 rows)

prepare s4 as select * from a where a = $1 and b = $2 and c = $3;
PREPARE
explain execute s4(1, 1, 2);
QUERY PLAN
--------------------------------------------------------------------
Append (cost=0.00..120.50 rows=1000 width=12) *<< Here*
Subplans Removed: 3
-> Seq Scan on a1_1 a_1 (cost=0.00..115.50 rows=1000 width=12)
Filter: ((a = $1) AND (b = $2) AND (c = $3))
(4 rows)

prepare s2 as select * from a where a = $1 union all select * from a where
a = $2;
PREPARE
explain execute s2(1, 1);
QUERY PLAN
-------------------------------------------------------------------------
Append (cost=0.00..762.00 rows=20000 width=12)
-> Append (cost=0.00..231.00 rows=10000 width=12) *<< Here*
Subplans Removed: 2
-> Seq Scan on a1_1 a_1 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $1)
-> Seq Scan on a1_2 a_2 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $1)
-> Append (cost=0.00..231.00 rows=10000 width=12) << Here
Subplans Removed: 2
-> Seq Scan on a1_1 a_4 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $2)
-> Seq Scan on a1_2 a_5 (cost=0.00..90.50 rows=5000 width=12)
Filter: (a = $2)
(13 rows)

prepare s3 as select * from a where a = $1 union select * from a where a =
$2;
PREPARE
explain execute s3(1, 1);
QUERY PLAN
-------------------------------------------------------------------------------
HashAggregate (cost=912.00..1112.00 rows=20000 width=12)
Group Key: a.a, a.b, a.c
-> Append (cost=0.00..762.00 rows=20000 width=12)
-> Append (cost=0.00..231.00 rows=10000 width=12) << Here
Subplans Removed: 2
-> Seq Scan on a1_1 a_1 (cost=0.00..90.50 rows=5000
width=12)
Filter: (a = $1)
-> Seq Scan on a1_2 a_2 (cost=0.00..90.50 rows=5000
width=12)
Filter: (a = $1)
-> Append (cost=0.00..231.00 rows=10000 width=12) << Here
Subplans Removed: 2
-> Seq Scan on a1_1 a_4 (cost=0.00..90.50 rows=5000
width=12)
Filter: (a = $2)
-> Seq Scan on a1_2 a_5 (cost=0.00..90.50 rows=5000
width=12)
Filter: (a = $2)
(15 rows)

-- add a limit to make sure the runtime partitions prune.
explain select * from generate_series(1, 10) i(i) join lateral (select *
from a
where a.a = (i.i % 2 + 1) and a.b = (i.i % 2 + 1) and a.c = (i.i % 10)
limit 10000000) m on true;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2030.10 rows=10000 width=16)
-> Function Scan on generate_series i (cost=0.00..0.10 rows=10 width=4)
-> Limit (cost=0.00..183.00 rows=1000 width=12)
-> Append (cost=0.00..183.00 rows=1000 width=12) << Here
-> Seq Scan on a1_1 a_1 (cost=0.00..178.00 rows=1000
width=12)
Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1))
AND (b = ((i.i % 2) + 1)))
-> Seq Scan on a1_2 a_2 (cost=0.00..178.00 rows=1000
width=12)
Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1))
AND (b = ((i.i % 2) + 1)))
-> Seq Scan on a2_1 a_3 (cost=0.00..178.00 rows=1000
width=12)
Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1))
AND (b = ((i.i % 2) + 1)))
-> Seq Scan on a2_2 a_4 (cost=0.00..178.00 rows=1000
width=12)
Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1))
AND (b = ((i.i % 2) + 1)))
(12 rows)

Any thoughts about this? Thanks

[1]: /messages/by-id/CA+TgmoZHYoAL4HYwnGO25B8CxCB+vNMdf+7rbUzYykR4sU9yUA@mail.gmail.com
/messages/by-id/CA+TgmoZHYoAL4HYwnGO25B8CxCB+vNMdf+7rbUzYykR4sU9yUA@mail.gmail.com

--
Best Regards
Andy Fan

Attachments:

v1-0001-Adjust-Append-Path-cost-model-for-runtime-partiti.patchapplication/octet-stream; name=v1-0001-Adjust-Append-Path-cost-model-for-runtime-partiti.patchDownload+488-92
part_runtime_prune.sqlapplication/octet-stream; name=part_runtime_prune.sqlDownload
#2Ryan Lambert
ryan@rustprooflabs.com
In reply to: Andy Fan (#1)
Re: Make Append Cost aware of some run time partition prune case

On Mon, Nov 9, 2020 at 5:44 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Currently the cost model of append path sums the cost/rows for all the
subpaths, it usually works well until we run into the run-time partition
prune
case. The first result is that generic plans will rarely be used for some
cases.
For instance, SELECT * FROM p WHERE pkey = $1; The custom plan will only
count the cost of one partition, however generic plan will count the cost
for all the
partitions even we are sure that only 1 partition will survive. Another
impact
is that planners may choose a wrong plan. for example, SELECT * FROM t1,
p
WHERE t1.a = p.pkey; The cost/rows of t1 nest loop p is estimated highly
improperly. This patch wants to help this case to some extent.

Greetings,

I was referred to this patch by Amit as a possible improvement for an issue
I noticed recently. I had a test setup where I expected run-time pruning
to kick in but it did not. I am trying to test this patch to see if it
helps for that scenario, but ran into an error running make install against
the current master (commit 0a687c8f1).

costsize.c: In function ‘cost_append’:
costsize.c:2171:32: error: ‘AppendPath’ {aka ‘struct AppendPath’} has no
member named ‘partitioned_rels’
2171 | List *partitioned_rels = apath->partitioned_rels;
| ^~
make[4]: *** [<builtin>: costsize.o] Error 1
make[4]: Leaving directory
'/var/lib/postgresql/git/postgresql/src/backend/optimizer/path'
make[3]: *** [../../../src/backend/common.mk:39: path-recursive] Error 2
make[3]: Leaving directory
'/var/lib/postgresql/git/postgresql/src/backend/optimizer'
make[2]: *** [common.mk:39: optimizer-recursive] Error 2
make[2]: Leaving directory '/var/lib/postgresql/git/postgresql/src/backend'
make[1]: *** [Makefile:42: install-backend-recurse] Error 2
make[1]: Leaving directory '/var/lib/postgresql/git/postgresql/src'
make: *** [GNUmakefile:11: install-src-recurse] Error 2

Thanks,

Ryan Lambert

#3Andy Fan
zhihui.fan1213@gmail.com
In reply to: Ryan Lambert (#2)
Re: Make Append Cost aware of some run time partition prune case

Hi Ryan:

On Thu, Mar 4, 2021 at 8:14 AM Ryan Lambert <ryan@rustprooflabs.com> wrote:

On Mon, Nov 9, 2020 at 5:44 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Currently the cost model of append path sums the cost/rows for all the
subpaths, it usually works well until we run into the run-time partition
prune
case. The first result is that generic plans will rarely be used for
some cases.
For instance, SELECT * FROM p WHERE pkey = $1; The custom plan will only
count the cost of one partition, however generic plan will count the cost
for all the
partitions even we are sure that only 1 partition will survive. Another
impact
is that planners may choose a wrong plan. for example, SELECT * FROM
t1, p
WHERE t1.a = p.pkey; The cost/rows of t1 nest loop p is estimated highly
improperly. This patch wants to help this case to some extent.

Greetings,

I was referred to this patch by Amit as a possible improvement for an
issue I noticed recently. I had a test setup where I expected run-time
pruning to kick in but it did not. I am trying to test this patch to see
if it helps for that scenario, but ran into an error running make install
against the current master (commit 0a687c8f1).

costsize.c: In function ‘cost_append’:
costsize.c:2171:32: error: ‘AppendPath’ {aka ‘struct AppendPath’} has no
member named ‘partitioned_rels’
2171 | List *partitioned_rels = apath->partitioned_rels;
| ^~
make[4]: *** [<builtin>: costsize.o] Error 1
make[4]: Leaving directory
'/var/lib/postgresql/git/postgresql/src/backend/optimizer/path'
make[3]: *** [../../../src/backend/common.mk:39: path-recursive] Error 2
make[3]: Leaving directory
'/var/lib/postgresql/git/postgresql/src/backend/optimizer'
make[2]: *** [common.mk:39: optimizer-recursive] Error 2
make[2]: Leaving directory '/var/lib/postgresql/git/postgresql/src/backend'
make[1]: *** [Makefile:42: install-backend-recurse] Error 2
make[1]: Leaving directory '/var/lib/postgresql/git/postgresql/src'
make: *** [GNUmakefile:11: install-src-recurse] Error 2

Thanks,

Ryan Lambert

Thanks for checking. This patch is on a very old master and the code is
too complex
since I wanted to handle a full scenario of a run time partition prune,
which has lots
of troubles and not very practical I think. so I am not happy with that
now.

I have implemented a new one, which only handles 1 level of partitioned
table, and
only 1 partition key. and only handle the eq operators like partkey = $1 /
partkey in ($1, $2)
/ parkey = $1 or partkey = $2; The patch works well in my user case. I
can send
one on the latest master shortly, and hope it is helpful for you as well.

(At the same time, I also ran into a case that we can expand more init
partition
prune case [1]/messages/by-id/CAKU4AWq4NLxu5JF9+d=o=A636-=eFNFmPx+kJ44ezTm=ikZ73w@mail.gmail.com, you can check that one if you like. I am happy with that
patch
now).

[1]: /messages/by-id/CAKU4AWq4NLxu5JF9+d=o=A636-=eFNFmPx+kJ44ezTm=ikZ73w@mail.gmail.com
/messages/by-id/CAKU4AWq4NLxu5JF9+d=o=A636-=eFNFmPx+kJ44ezTm=ikZ73w@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#4Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#3)
Re: Make Append Cost aware of some run time partition prune case

I have implemented a new one, which only handles 1 level of partitioned
table, and
only 1 partition key. and only handle the eq operators like partkey = $1
/ partkey in ($1, $2)
/ parkey = $1 or partkey = $2; The patch works well in my user case. I
can send
one on the latest master shortly, and hope it is helpful for you as well.

Hi:

Here is the new patch for this topic, which has proved works in my limited
user
case (apply the similar logic on pg11), and it is incomplete since more
cases
should be covered but not. Uploading this patch now is just for discussing
and
testing.

Design principle:

1). the cost of AppendPath should be reduced for either init partition
prune or run
time partition prune. All of the startup_cost, total_cost, rows should be
adjusted. As for the startup_cost, if we should adjust it carefully, or
else we can
get the run_time_cost less than 0.

2). When we merge the subpath from sub-partitioned rel via
accumulate_append_subpath, currently we just merge the subpaths and discard
the
cost/rows in AppendPath. After this feature is involved, we may consider
to use
the AppendPath's cost as well during this stage.

3). When join is involved, AppendPath is not enough since the estimated
rows for
a join relation cares about rel1->rows, rel2->rows only, the Path.rows is
not
cared. so we need to adjust rel->rows as well. and only for the init
partition prune case. (appendrel->rows for planning time partition prune is
handled already).

The biggest problem of the above is I don't know how to adjust the cost for
Parallel Append Path. Currently I just ignore it, which would cause some
query
should use Parallel Append Path but not.

Something I don't want to handle really unless we can address its value.
1. Operators like >, <. Between and.
2. the uncompleted part key appeared in prunequals. Like we have partition
key (a,
b). But use just use A = 1 as restrictinfo.

The main reason I don't want to handle them are 1). it would be uncommon.
b). It introduces extra complexity. c). at last, we can't estimate it well
like partkey > $1,
what would be a prune ratio for ).

Something I don't handle so far are: 1). accumulate_append_subpath
stuff. 2). MergeAppend. 3). Multi Partition key.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachments:

v1-0001-adjust-cost-model-for-partition-prune-case.patchapplication/octet-stream; name=v1-0001-adjust-cost-model-for-partition-prune-case.patchDownload+485-157
#5vignesh C
vignesh21@gmail.com
In reply to: Andy Fan (#4)
Re: Make Append Cost aware of some run time partition prune case

On Thu, Mar 4, 2021 at 9:51 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

I have implemented a new one, which only handles 1 level of partitioned table, and
only 1 partition key. and only handle the eq operators like partkey = $1 / partkey in ($1, $2)
/ parkey = $1 or partkey = $2; The patch works well in my user case. I can send
one on the latest master shortly, and hope it is helpful for you as well.

Hi:

Here is the new patch for this topic, which has proved works in my limited user
case (apply the similar logic on pg11), and it is incomplete since more cases
should be covered but not. Uploading this patch now is just for discussing and
testing.

Design principle:

1). the cost of AppendPath should be reduced for either init partition prune or run
time partition prune. All of the startup_cost, total_cost, rows should be
adjusted. As for the startup_cost, if we should adjust it carefully, or else we can
get the run_time_cost less than 0.

2). When we merge the subpath from sub-partitioned rel via
accumulate_append_subpath, currently we just merge the subpaths and discard the
cost/rows in AppendPath. After this feature is involved, we may consider to use
the AppendPath's cost as well during this stage.

3). When join is involved, AppendPath is not enough since the estimated rows for
a join relation cares about rel1->rows, rel2->rows only, the Path.rows is not
cared. so we need to adjust rel->rows as well. and only for the init
partition prune case. (appendrel->rows for planning time partition prune is
handled already).

The biggest problem of the above is I don't know how to adjust the cost for
Parallel Append Path. Currently I just ignore it, which would cause some query
should use Parallel Append Path but not.

Something I don't want to handle really unless we can address its value.
1. Operators like >, <. Between and.
2. the uncompleted part key appeared in prunequals. Like we have partition key (a,
b). But use just use A = 1 as restrictinfo.

The main reason I don't want to handle them are 1). it would be uncommon.
b). It introduces extra complexity. c). at last, we can't estimate it well like partkey > $1,
what would be a prune ratio for ).

Something I don't handle so far are: 1). accumulate_append_subpath
stuff. 2). MergeAppend. 3). Multi Partition key.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh

#6Daniel Gustafsson
daniel@yesql.se
In reply to: vignesh C (#5)
Re: Make Append Cost aware of some run time partition prune case

On 14 Jul 2021, at 17:55, vignesh C <vignesh21@gmail.com> wrote:

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

As the thread has stalled, this patch still doesn't apply (judging by the log
it's likely not too hard to resolve). I'm marking this patch Returned with
Feedback, feel free to open a new entry for an updated patch.

--
Daniel Gustafsson https://vmware.com/