Improve planner cost estimations for alternative subplans

Started by Alexey Bashtanovalmost 6 years ago11 messageshackers
Jump to latest
#1Alexey Bashtanov
bashtanov@imap.cc

Hello,

Currently, in case of alternative subplans that do hashed vs per-row
lookups,
the per-row estimate is used when planning the rest of the query.
It's also coded in not quite an explicit way.

In [1]/messages/by-id/ff42b25b-ff03-27f8-ed11-b8255d658cd5@imap.cc we found a situation where it leads to a suboptimal plan,
as it bloats the overall cost into large figures,
a decision related to an outer part of the plan look negligible to the
planner,
and as a result it doesn't elaborate on choosing the optimal one.

The patch is to fix it. Our linear model for costs cannot quite accommodate
the piecewise linear matter of alternative subplans,
so it is based on ugly heuristics and still cannot be very precise,
but I think it's better than the current one.

Thoughts?

Best, Alex

[1]: /messages/by-id/ff42b25b-ff03-27f8-ed11-b8255d658cd5@imap.cc
/messages/by-id/ff42b25b-ff03-27f8-ed11-b8255d658cd5@imap.cc

Attachments:

alternatives-cost-as-geom-mean-v2.patchtext/x-patch; charset=UTF-8; name=alternatives-cost-as-geom-mean-v2.patchDownload+72-35
#2Melanie Plageman
melanieplageman@gmail.com
In reply to: Alexey Bashtanov (#1)
Re: Improve planner cost estimations for alternative subplans

On Fri, Jun 5, 2020 at 9:08 AM Alexey Bashtanov <bashtanov@imap.cc> wrote:

In [1] we found a situation where it leads to a suboptimal plan,
as it bloats the overall cost into large figures,
a decision related to an outer part of the plan look negligible to the
planner,
and as a result it doesn't elaborate on choosing the optimal one.

I've just started looking at this patch today, but I was wondering if
you might include a test case which minimally reproduces the original
problem you had.
The only plan diff I see is in updatable_views.sql, and it doesn't
illustrate the problem as well as a more straightforward SELECT query
with EXISTS sublink might.

After putting in some logging, I see that there are only a
few non-catalog queries which exercise this code path.
This query from groupingsets.sql is an example of one such query:

select ten, sum(distinct four) from onek a
group by grouping sets((ten,four),(ten))
having exists (select 1 from onek b where sum(distinct a.four) = b.four);

But, the chosen plan for this query stays the same.

It would be helpful to see a query where a different plan is chosen
because of this change that is not from updatable_views.sql.

--
Melanie Plageman

#3Melanie Plageman
melanieplageman@gmail.com
In reply to: Alexey Bashtanov (#1)
Re: Improve planner cost estimations for alternative subplans

On Fri, Jun 5, 2020 at 9:08 AM Alexey Bashtanov <bashtanov@imap.cc> wrote:

In [1] we found a situation where it leads to a suboptimal plan,
as it bloats the overall cost into large figures,
a decision related to an outer part of the plan look negligible to the
planner,
and as a result it doesn't elaborate on choosing the optimal one.

Did this geometric average method result in choosing the desired plan for
this case?

The patch is to fix it. Our linear model for costs cannot quite accommodate
the piecewise linear matter of alternative subplans,
so it is based on ugly heuristics and still cannot be very precise,
but I think it's better than the current one.

Thoughts?

Is there another place in planner where two alternatives are averaged
together and that cost is used?

To me, it feels a little bit weird that we are averaging together the
startup cost of a plan which will always have a 0 startup cost and a
plan that will always have a non-zero startup cost and the per tuple
cost of a plan that will always have a negligible per tuple cost and one
that might have a very large per tuple cost.

I guess it feels different because instead of comparing alternatives you
are blending them.

I don't have any academic basis for saying that the alternatives costs
shouldn't be averaged together for use in the rest of the plan, so I
could definitely be wrong.

--
Melanie Plageman

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Melanie Plageman (#3)
Re: Improve planner cost estimations for alternative subplans

On Wed, Jun 17, 2020 at 06:21:58PM -0700, Melanie Plageman wrote:

On Fri, Jun 5, 2020 at 9:08 AM Alexey Bashtanov <bashtanov@imap.cc> wrote:

In [1] we found a situation where it leads to a suboptimal plan,
as it bloats the overall cost into large figures,
a decision related to an outer part of the plan look negligible to the
planner,
and as a result it doesn't elaborate on choosing the optimal one.

Did this geometric average method result in choosing the desired plan for
this case?

The patch is to fix it. Our linear model for costs cannot quite accommodate
the piecewise linear matter of alternative subplans,
so it is based on ugly heuristics and still cannot be very precise,
but I think it's better than the current one.

Thoughts?

Is there another place in planner where two alternatives are averaged
together and that cost is used?

To me, it feels a little bit weird that we are averaging together the
startup cost of a plan which will always have a 0 startup cost and a
plan that will always have a non-zero startup cost and the per tuple
cost of a plan that will always have a negligible per tuple cost and one
that might have a very large per tuple cost.

I guess it feels different because instead of comparing alternatives you
are blending them.

I don't have any academic basis for saying that the alternatives costs
shouldn't be averaged together for use in the rest of the plan, so I
could definitely be wrong.

I agree it feels weird. Even if it actually improved the problematic
case, I think it'll be quite hard to convince ourselves this helps in
general. For example, for cases that actually end up using the first
plan, this is bound to make the estimates worse. I find it hard to
believe it won't cause regressions in at least some cases.

Maybe this heuristics really is better than the old one, but I think we
need to understand why - a single query probably is not enough.

I think the crucial limitation here is that we don't know which of the
alternative plans will be used. Is there a chance to improve this,
perhaps by making some sort of guess?

I'm not particularly familiar with AlternativeSubPlans, but I see we're
picking the one in nodeSubplan.c based on plan_rows. Can't we do the
same thing in cost_qual_eval_walker?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#4)
Re: Improve planner cost estimations for alternative subplans

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

I'm not particularly familiar with AlternativeSubPlans, but I see we're
picking the one in nodeSubplan.c based on plan_rows. Can't we do the
same thing in cost_qual_eval_walker?

Nope. The entire reason why we have that kluge is that we don't know
until much later how many times we expect to execute the subplan.
AlternativeSubPlan allows the decision which subplan form to use to be
postponed till runtime; but when we're doing things like estimating the
cost and selectivity of a where-clause, we don't know that.

Maybe there's some way to recast things to avoid that problem,
but I have little clue what it'd look like.

I agree that averaging together the costs of the alternatives seems
wrong in principle. It's going to be one or the other, not some
quantum-mechanical superposition. Maybe there's a case for taking the
min costs (if you're feeling lucky) or the max costs (if you're not),
on the belief that the executor will/will not pick the choice that
contributes least to the total query cost. But I have a feeling that
either of those would distort our estimates too much. The existing
costing behavior at least can be seen to match *one* actual runtime
behavior.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: Improve planner cost estimations for alternative subplans

I wrote:

Nope. The entire reason why we have that kluge is that we don't know
until much later how many times we expect to execute the subplan.
AlternativeSubPlan allows the decision which subplan form to use to be
postponed till runtime; but when we're doing things like estimating the
cost and selectivity of a where-clause, we don't know that.

Maybe there's some way to recast things to avoid that problem,
but I have little clue what it'd look like.

Actually ... maybe it's not that bad. Clearly there would be a
circularity issue for selectivity estimation, but all the alternatives
should have the same selectivity. Cost estimation is a different story:
by the time we need to do cost estimation for a subexpression, we do in
many cases have an idea how often the subexpression will be executed.

I experimented with adding a number-of-evaluations parameter to
cost_qual_eval, and found that the majority of call sites do have
something realistic they can pass. The attached very-much-WIP
patch shows my results so far. There's a lot of loose ends:

* Any call site using COST_QUAL_EVAL_DUMMY_NUM_EVALS is a potential spot
for future improvement. The only one that seems like it might be
fundamentally unsolvable is cost_subplan's usage; but that would only
matter if a subplan's testexpr contains an AlternativeSubPlan, which is
probably a negligible case in practice. The other ones seem to just
require more refactoring than I cared to do on a Sunday afternoon.

* I did not do anything for postgres_fdw.c beyond making it compile.
We can surely do better there, but it might require some rethinking
of the way that plan costs get cached.

* The merge and hash join costsize.c functions estimate costs of qpquals
(i.e. quals to be applied at the join that are not being used as merge
or hash conditions) by computing cost_qual_eval of the whole
joinrestrictlist and then subtracting off the cost of the merge or hash
quals. This is kind of broken if we want to use different num_eval
estimates for the qpquals and the merge/hash quals, which I think we do.
This probably just needs some refactoring to fix. We also need to save
the relevant rowcounts in the join Path nodes so that createplan.c can
do the right thing.

* I think it might be possible to improve the situation in
get_agg_clause_costs() if we're willing to postpone collection
of the actual aggregate costs till later. This'd require two
passes over the aggregate expressions, but that seems like it
might not be terribly expensive. (I'd be inclined to also look
at the idea of merging duplicate agg calls at plan time not
run time, if we refactor that.)

* I had to increase the number of table rows in one updatable_views.sql
test to keep the plans the same. Without that, the planner realized
that a seqscan would be cheaper than an indexscan. The result wasn't
wrong exactly, but it failed to prove that leakproof quals could be
used as indexquals, so I think we need to keep the plan choice the same.

Anyway, this is kind of invasive, but I think it shouldn't really
add noticeable costs as long as we save relevant rowcounts rather
than recomputing them in createplan.c. Is it worth doing? I dunno.
AlternativeSubPlan is pretty much a backwater, I think --- if it
were interesting performance-wise to a lot of people, more would
have been done with it by now.

regards, tom lane

Attachments:

better-alternativesubplan-cost-ests-1.patchtext/x-diff; charset=us-ascii; name=better-alternativesubplan-cost-ests-1.patchDownload+340-123
#7Alexey Bashtanov
bashtanov@imap.cc
In reply to: Melanie Plageman (#2)
Re: Improve planner cost estimations for alternative subplans

Hi Melanie,

Sorry for the delay.

I've just started looking at this patch today, but I was wondering if
you might include a test case which minimally reproduces the original
problem you had.

I could reproduce it with an easier generated data set, please see attached.

However, to be honest with you, while searching I encountered a few
examples of the opposite behavior,
when the patched version was slower than the master branch.
So I'm not so sure whether we should use the patch, maybe we should
rather consider Tom's approach.

Best, Alex

Attachments:

altplan-example.sqlapplication/sql; name=altplan-example.sqlDownload
#8Alexey Bashtanov
bashtanov@imap.cc
In reply to: Tom Lane (#6)
Re: Improve planner cost estimations for alternative subplans

Hi Tom,

sorry for the delay,

I experimented with adding a number-of-evaluations parameter to
cost_qual_eval, and found that the majority of call sites do have
something realistic they can pass. The attached very-much-WIP
patch shows my results so far. There's a lot of loose ends:

I like the idea, so if we alternative subplans remain there
I think we should implement it.

Best, Alex

#9Andy Fan
zhihui.fan1213@gmail.com
In reply to: Tom Lane (#6)
Re: Improve planner cost estimations for alternative subplans

On Mon, Jun 22, 2020 at 9:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Nope. The entire reason why we have that kluge is that we don't know
until much later how many times we expect to execute the subplan.
AlternativeSubPlan allows the decision which subplan form to use to be
postponed till runtime; but when we're doing things like estimating the
cost and selectivity of a where-clause, we don't know that.

Maybe there's some way to recast things to avoid that problem,
but I have little clue what it'd look like.

Actually ... maybe it's not that bad. Clearly there would be a
circularity issue for selectivity estimation, but all the alternatives
should have the same selectivity. Cost estimation is a different story:
by the time we need to do cost estimation for a subexpression, we do in
many cases have an idea how often the subexpression will be executed.

I read your idea of "ripping out all executor support for AlternativeSubPlan
and instead having the planner replace an AlternativeSubPlan with
the desired specific SubPlan somewhere late in planning, possibly
setrefs.c."
in [1]/messages/by-id/1992952.1592785225@sss.pgh.pa.us. I was thinking that if we can do such a replacement sooner,
for example once we know the num_calls for the subplans, Unknown if it
is possible though. If we can, then we can handle the issue here as well.

The attached is a very PoC version, I'm not sure if it is the right
direction
to go. I'm sorry that I still need more time to understand your solution
below but I'm too excited about your original idea.

[1]: /messages/by-id/1992952.1592785225@sss.pgh.pa.us

I experimented with adding a number-of-evaluations parameter to
cost_qual_eval, and found that the majority of call sites do have
something realistic they can pass. The attached very-much-WIP
patch shows my results so far. There's a lot of loose ends:

* Any call site using COST_QUAL_EVAL_DUMMY_NUM_EVALS is a potential spot
for future improvement. The only one that seems like it might be
fundamentally unsolvable is cost_subplan's usage; but that would only
matter if a subplan's testexpr contains an AlternativeSubPlan, which is
probably a negligible case in practice. The other ones seem to just
require more refactoring than I cared to do on a Sunday afternoon.

* I did not do anything for postgres_fdw.c beyond making it compile.
We can surely do better there, but it might require some rethinking
of the way that plan costs get cached.

* The merge and hash join costsize.c functions estimate costs of qpquals
(i.e. quals to be applied at the join that are not being used as merge
or hash conditions) by computing cost_qual_eval of the whole
joinrestrictlist and then subtracting off the cost of the merge or hash
quals. This is kind of broken if we want to use different num_eval
estimates for the qpquals and the merge/hash quals, which I think we do.
This probably just needs some refactoring to fix. We also need to save
the relevant rowcounts in the join Path nodes so that createplan.c can
do the right thing.

* I think it might be possible to improve the situation in
get_agg_clause_costs() if we're willing to postpone collection
of the actual aggregate costs till later. This'd require two
passes over the aggregate expressions, but that seems like it
might not be terribly expensive. (I'd be inclined to also look
at the idea of merging duplicate agg calls at plan time not
run time, if we refactor that.)

* I had to increase the number of table rows in one updatable_views.sql
test to keep the plans the same. Without that, the planner realized
that a seqscan would be cheaper than an indexscan. The result wasn't
wrong exactly, but it failed to prove that leakproof quals could be
used as indexquals, so I think we need to keep the plan choice the same.

Anyway, this is kind of invasive, but I think it shouldn't really
add noticeable costs as long as we save relevant rowcounts rather
than recomputing them in createplan.c. Is it worth doing? I dunno.
AlternativeSubPlan is pretty much a backwater, I think --- if it
were interesting performance-wise to a lot of people, more would
have been done with it by now.

regards, tom lane

--
Best Regards
Andy Fan

Attachments:

v1-0001-Convert-the-AlternativeSubplan-to-Subplan-as-soon.patchapplication/octet-stream; name=v1-0001-Convert-the-AlternativeSubplan-to-Subplan-as-soon.patchDownload+102-5
#10Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#9)
Re: Improve planner cost estimations for alternative subplans

On Mon, Aug 17, 2020 at 10:12 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, Jun 22, 2020 at 9:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Nope. The entire reason why we have that kluge is that we don't know
until much later how many times we expect to execute the subplan.
AlternativeSubPlan allows the decision which subplan form to use to be
postponed till runtime; but when we're doing things like estimating the
cost and selectivity of a where-clause, we don't know that.

Maybe there's some way to recast things to avoid that problem,
but I have little clue what it'd look like.

Actually ... maybe it's not that bad. Clearly there would be a
circularity issue for selectivity estimation, but all the alternatives
should have the same selectivity. Cost estimation is a different story:
by the time we need to do cost estimation for a subexpression, we do in
many cases have an idea how often the subexpression will be executed.

I read your idea of "ripping out all executor support for
AlternativeSubPlan
and instead having the planner replace an AlternativeSubPlan with
the desired specific SubPlan somewhere late in planning, possibly
setrefs.c."
in [1]. I was thinking that if we can do such a replacement sooner,
for example once we know the num_calls for the subplans, Unknown if it
is possible though. If we can, then we can handle the issue here as well.

The attached is a very PoC version, I'm not sure if it is the right
direction
to go.

The idea behind it is if we have a RelOptInfo which have
some AlternativeSubPlan,
and assume these subplans have some correlated vars which can be expressed
as
deps_relids. Then we can convert the AlternativeSubPlan to SubPlan once
bms_is_subset(deps_relids, rel->relids). My patch is able to fix the
issue reported
here and it only converts the AlternativeSubPlan in rel->reltarget for demo
purpose.

--
Best Regards
Andy Fan

#11Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#10)
Re: Improve planner cost estimations for alternative subplans

On Wed, Aug 26, 2020 at 4:21 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, Aug 17, 2020 at 10:12 PM Andy Fan <zhihui.fan1213@gmail.com>
wrote:

On Mon, Jun 22, 2020 at 9:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Nope. The entire reason why we have that kluge is that we don't know
until much later how many times we expect to execute the subplan.
AlternativeSubPlan allows the decision which subplan form to use to be
postponed till runtime; but when we're doing things like estimating the
cost and selectivity of a where-clause, we don't know that.

Maybe there's some way to recast things to avoid that problem,
but I have little clue what it'd look like.

Actually ... maybe it's not that bad. Clearly there would be a
circularity issue for selectivity estimation, but all the alternatives
should have the same selectivity. Cost estimation is a different story:
by the time we need to do cost estimation for a subexpression, we do in
many cases have an idea how often the subexpression will be executed.

I read your idea of "ripping out all executor support for
AlternativeSubPlan
and instead having the planner replace an AlternativeSubPlan with
the desired specific SubPlan somewhere late in planning, possibly
setrefs.c."
in [1]. I was thinking that if we can do such a replacement sooner,
for example once we know the num_calls for the subplans, Unknown if it
is possible though. If we can, then we can handle the issue here as well.

The attached is a very PoC version, I'm not sure if it is the right
direction
to go.

The idea behind it is if we have a RelOptInfo which have
some AlternativeSubPlan,
and assume these subplans have some correlated vars which can be expressed
as
deps_relids. Then we can convert the AlternativeSubPlan to SubPlan once
bms_is_subset(subplan->deps_relids, rel->relids).

The way of figuring out subplan->deps_relids was wrong in my patch, I will
fix it later.
But the general idea is the same.

My patch is able to fix the issue reported
here and it only converts the AlternativeSubPlan in rel->reltarget for
demo purpose.

--
Best Regards
Andy Fan

--
Best Regards
Andy Fan