generalizing the planner knobs

Started by Neil Conway over 20 years ago | 41 messages | hackers

neilc@samurai.com

over 20 years ago

There are currently some rather crude knobs for persuading the planner
to favour certain kinds of query plans: the enable_XXX GUC variables.
Several people have asked for a more flexible way to give hints to the
planner. I'm not interested in implementing fully-general planner hints
at the moment, but ISTM that a simple improvement to what we have now
would allow for a wider range of planner hints with only minor changes:
we could replace the enable_XXX variables with a set of variables that
would add an arbitrary constant to the estimated cost of each type of
query node. (Alternatively, an arbitrary multiplier could be specified;
I'm not sure which would be better.)

This would also be useful when diagnosing bad query plans: for example,
setting enable_seqscan=false often causes the planner to disregard the
use of *any* sequential scan, anywhere in the plan. The ability to
slightly bump up the cost of particular operations would allow more
alternative plans to be examined.
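
As an illustration of the difference between disabling a node type outright and nudging its cost, here is a minimal sketch in Python. It is a toy model, not PostgreSQL code: the plan names, the cost figures, DISABLE_PENALTY and both helper functions are hypothetical, and "disabling" is modelled simply as a very large additive penalty.

```python
# Toy model only; none of these names or numbers come from PostgreSQL.
DISABLE_PENALTY = 1.0e8  # stand-in for "this node type is effectively disabled"

# Each candidate plan is a list of (node_type, estimated_cost) pairs.
PLANS = {
    "seqscan both + hashjoin": [
        ("seqscan", 100.0), ("seqscan", 900.0), ("hashjoin", 50.0)],
    "indexscan small + seqscan big + hashjoin": [
        ("indexscan", 130.0), ("seqscan", 900.0), ("hashjoin", 50.0)],
    "indexscans + nestloop": [
        ("indexscan", 130.0), ("indexscan", 4000.0), ("nestloop", 20.0)],
}


def total_cost(nodes, penalty_by_type):
    """Sum node costs, adding the per-node-type penalty where one is set."""
    return sum(cost + penalty_by_type.get(node_type, 0.0)
               for node_type, cost in nodes)


def cheapest(plans, penalty_by_type):
    """Return the name of the plan with the lowest penalised total cost."""
    return min(plans, key=lambda name: total_cost(plans[name], penalty_by_type))
```

In this toy setup, no penalty selects the all-seqscan plan, the huge penalty pushes out every plan that contains a sequential scan, and a small penalty of 50 per seqscan node makes the intermediate plan (which still uses one sequential scan) the cheapest.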

On the other hand, the whole mechanism is still a hack. It also means
that applications using this will be more dependent on the actual
costing values produced by the planner, which is not good. However, if
you're in the sort of desperate straits where this sort of hackery is
required, perhaps that's acceptable.

Comments?

-Neil

Jonah H. Harris

jonah.harris@gmail.com

over 20 years ago

In reply to: Neil Conway (#1)

Re: generalizing the planner knobs

Hey Neil,

In the last couple weeks I too have been thinking about planner hints.
Assuming I have read your post correctly, the issue I see with this idea is
that, in most cases, there won't be much of a difference between adding an
arbitrary cost value to each type of node and disabling it completely.
Also, by fiddling with an arbitrary cost the user may introduce a lot of
variation into the planner which may actually result in worse query plans.

While Tom's done a great job with the planner, there are certain cases where
a user knows exactly what type of join or index they want to use for a
query. In that case I'd favor run-time hints from the user similar to
Oracle. I've read about seven papers on query optimization and planning in
the last few weeks and have a lot of ideas... I'm just not sure when I may
get time to work on them :(

-Jonah

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Neil Conway (#1)

Re: generalizing the planner knobs

Neil Conway <neilc@samurai.com> writes:

... ISTM that a simple improvement to what we have now
would allow for a wider range of planner hints with only minor changes:
we could replace the enable_XXX variables with a set of variables that
would add an arbitrary constant to the estimated cost of each type of
query node. (Alternatively, an arbitrary multiplier could be specified;
I'm not sure which would be better.)

I think the multiplier would be better, because it'd avoid the problem
you mention later that useful values would be dependent on the planner's
cost units. Also, one could sanely allow a multiplier less than one,
so as to favor instead of penalize a particular plan type.
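
A small sketch of why a multiplier does not depend on the cost units while an additive constant does. This is a toy Python model with made-up numbers; pick and the three adjustment functions are hypothetical names, not planner APIs.

```python
def pick(costs, adjust):
    """Return the node type with the lowest adjusted cost."""
    return min(costs, key=lambda node: adjust(node, costs[node]))


def add_penalty(node, cost):
    # Additive knob: a fixed 30 cost units on top of every seqscan.
    return cost + 30.0 if node == "seqscan" else cost


def multiply_penalty(node, cost):
    # Multiplicative knob: every seqscan is treated as 30% more expensive.
    return cost * 1.3 if node == "seqscan" else cost


def favor_index(node, cost):
    # A multiplier below one favors a node type instead of penalizing it.
    return cost * 0.8 if node == "indexscan" else cost


costs = {"seqscan": 100.0, "indexscan": 120.0}
# The same two plans, expressed in cost units ten times smaller.
rescaled = {node: cost * 10.0 for node, cost in costs.items()}
```

With these numbers the additive penalty picks the indexscan in the original units but the seqscan after rescaling, whereas the multiplier picks the indexscan in both cases.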

regards, tom lane

Qingqing Zhou

zhouqq@cs.toronto.edu

over 20 years ago

In reply to: Neil Conway (#1)

Re: generalizing the planner knobs

"Neil Conway" <neilc@samurai.com> wrote

This would also be useful when diagnosing bad query plans: for example,
setting enable_seqscan=false often causes the planner to disregard the
use of *any* sequential scan, anywhere in the plan. The ability to
slightly bump up the cost of particular operations would allow more
alternative plans to be examined.

This method also has the problem of "enable_seqscan=false" in some
situations. I would vote we implement the final general solution like query
plan hints directly.

Regards,
Qingqing

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Jonah H. Harris (#2)

Re: generalizing the planner knobs

"Jonah H. Harris" <jonah.harris@gmail.com> writes:

In the last couple weeks I too have been thinking about planner hints.
Assuming I have read your post correctly, the issue I see with this idea is
that, in most cases, there won't be much of a difference between adding an
arbitrary cost value to each type of node and disabling it completely.
Also, by fiddling with an arbitrary cost the user may introduce a lot of
variation into the planner which may actually result in worse query plans.

Which is pretty much exactly the problem with "planner hints", too.
I've resisted that suggestion in the past and will continue to do so,
because hints are accidents waiting to happen. Even if the hint is right
today for your current Postgres version and current data distribution,
it's likely not to be right further down the road --- but once the hint
is embedded in your application, how often are you going to revisit it?
As an example, a hint forcing the planner to use an indexscan with a
particular index might have been a great idea in PG 8.0 and a lousy idea
in 8.1, because it would prevent substitution of a possibly-far-better
bitmap indexscan.

The enable_foo switches are debug aids, not something you are expected
to fool with for production purposes, and the same would be true of
Neil's suggested multipliers. While I don't feel any strong need for
variable multipliers, they'd be a small enough incremental amount of
work that the suggestion doesn't require a lot of supporting argument.
Adding a planner hint facility would be several orders of magnitude
more work, and it would be taking the system in a design direction that
I think is fundamentally misguided.

regards, tom lane

Jonah H. Harris

jonah.harris@gmail.com

over 20 years ago

In reply to: Tom Lane (#5)

Re: generalizing the planner knobs

Tom,

Don't get me wrong, I agree with you completely. I would rather put effort
into enhancing the planner than in developing work-arounds. In 99% of all
cases the planner works correctly, but I know people who actually have to
disable planning options (mergejoin) in production applications because they
get bad plans. The "bad" plans are not really bad in terms of what the
planner knows about the query, just in areas where the planner doesn't look
at other things.

I also agree that a significant amount of work would be required to add
run-time hints which would be better spent enhancing the system as a whole.
My only suggestion was that it would be better than Part 1 of Neil's
statement. Somehow I missed the mention of multipliers at the end, which
I agree requires less effort.

Bruce Momjian

bruce@momjian.us

over 20 years ago

In reply to: Jonah H. Harris (#6)

Re: generalizing the planner knobs

"Jonah H. Harris" <jonah.harris@gmail.com> writes:

Tom,

Don't get me wrong, I agree with you completely. I would rather put effort
into enhancing the planner than in developing work-arounds. In 99% of all
cases the planner works correctly, but I know people who actually have to
disable planning options (mergejoin) in production applications because they
get bad plans. The "bad" plans are not really bad in terms of what the
planner knows about the query, just in areas where the planner doesn't look
at other things.

I would like to draw a distinction between two sorts of hints. Currently
you're talking about one sort of hint, namely hints that tell the planner to
alter its cost model and choose a different plan than the inputs it has would
dictate. Using these requires the user to have a fairly in-depth understanding
of the planner and what options it has available.

On the other hand the type I would prefer to see are hints that feed directly
into filling in information the planner lacks. This only requires that the
user understand his own data and still lets the planner pick the best plan
based on the provided information.

So for example I would love to see a hint that allowed you to specify the
selectivity of a where clause. And one that let you specify the "density" of a
grouping clause.

Most of the time when the planner makes a mistake, it's because of a bad
miscalculation in estimating these givens. If it had the correct values for
the inputs then it would make the right decision about the plan.
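
To make the idea concrete, here is a deliberately crude sketch of how a user-supplied selectivity could flip a scan choice. Everything in it is an assumption for illustration: the cost formulas, the parameter names and their default values are a toy model and not the planner's real cost functions.

```python
def estimated_rows(table_rows, selectivity):
    """Rows expected to pass a WHERE clause with the given selectivity."""
    return table_rows * selectivity


def choose_scan(table_rows, table_pages, selectivity,
                seq_page_cost=1.0, random_page_cost=4.0):
    """Pick a scan type from a toy cost model (hypothetical formulas)."""
    # Toy assumption: a sequential scan reads every page once.
    seq_cost = table_pages * seq_page_cost
    # Toy assumption: each matching row costs one random page fetch.
    index_cost = estimated_rows(table_rows, selectivity) * random_page_cost
    return "indexscan" if index_cost < seq_cost else "seqscan"
```

For a table of 1,000,000 rows on 10,000 pages, a guessed selectivity of 0.005 leads this model to a sequential scan, while a user-provided selectivity of 0.0001 leads it to an index scan. The planner still makes the decision; only its input changed.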

Making the planner very good at making the right decisions given accurate
inputs is an attainable goal. Computers are pretty deterministic and it's
possible to come up with very accurate cost models. Despite some known
problems with Postgres's current models they're remarkably good already. And
there's no particular reason to think they can't be made nearly perfect.

Making the planner very good at producing accurate estimates is a much harder
goal. No matter how accurate it gets there will always be more complex
expressions that are harder to predict and there will always be cases the
planner can't estimate well. The user however knows his own data and may well
know the answer.

In the extreme consider user-defined operators, which will always be dependent
on the user to provide estimator functions. If it's a rarely used operator the
user may find it easier to simply tell the planner the selectivity of each
expression rather than come up with a general solution.

(I also think estimates for things like joins and group by clauses can never
be guaranteed accurate in general. Not without a lot more costs up front,
including giving up on calculating statistics based on only a sample.)

--
greg

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Bruce Momjian (#7)

Re: generalizing the planner knobs

Greg Stark <gsstark@mit.edu> writes:

On the other hand the type I would prefer to see are hints that feed directly
into filling in information the planner lacks. This only requires that the
user understand his own data and still lets the planner pick the best plan
based on the provided information.

This would avoid some issues, but it still is vulnerable to the problem
that the hint you put in your code today will fail to track changes in
your data tomorrow.

regards, tom lane

Pollard, Mike

mpollard@cincom.com

over 20 years ago

In reply to: Tom Lane (#8)

Re: generalizing the planner knobs

Greg Stark <gsstark@mit.edu> writes:

On the other hand the type I would prefer to see are hints that feed directly
into filling in information the planner lacks. This only requires that the
user understand his own data and still lets the planner pick the best plan
based on the provided information.

Optimizer hints were added because some databases just don't have a very
smart optimizer. But you are much better served tracking down cases in
which the optimizer makes a bad choice, and teaching the optimizer how
to make a better one. That way, all users get the benefit of the fix.
Remember, the purpose of SQL is to isolate the end user from having to
care about how the data is retrieved; that is the RDBMS' problem. (the
other thing forgotten was that it was supposed to be a natural language.
NVL. Bah.)

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.

#10

Gregory Maxwell

gmaxwell@gmail.com

over 20 years ago

In reply to: Pollard, Mike (#9)

Re: generalizing the planner knobs

On 12/1/05, Pollard, Mike <mpollard@cincom.com> wrote:

Optimizer hints were added because some databases just don't have a very
smart optimizer. But you are much better served tracking down cases in
which the optimizer makes a bad choice, and teaching the optimizer how
to make a better one. That way, all users get the benefit of the fix.
Remember, the purpose of SQL is to isolate the end user from having to
care about how the data is retrieved; that is the RDBMS' problem. (the
other thing forgotten was that it was supposed to be a natural language.
NVL. Bah.)

The flipside there is that a good set of hinting options may increase
the amount of detailed feedback we get from users on improvements
needed in the optimizer. The current knobs are pretty blunt and don't
do as much as I'd like when trying to track down exactly where the
optimiser has gone wrong.

If we'd really like to avoid people using the knobs to rig queries,
how about making them only work with explain analyze, useful for
debugging but not so useful for actual queries.

#11

Pollard, Mike

mpollard@cincom.com

over 20 years ago

In reply to: Gregory Maxwell (#10)

Re: generalizing the planner knobs

Gregory Maxwell <gmaxwell@gmail.com> wrote:

The flipside there is that a good set of hinting options may increase
the amount of detailed feedback we get from users on improvements
needed in the optimizer. The current knobs are pretty blunt and don't
do as much as I'd like when trying to track down exactly where the
optimiser has gone wrong.

Point conceded. Any information that can help diagnose an issue is good
information. I like the idea of only allowing it on explain.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.

#12

Neil Conway

neilc@samurai.com

over 20 years ago

In reply to: Gregory Maxwell (#10)

Re: generalizing the planner knobs

On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:

If we'd really like to avoid people using the knobs to rig queries,
how about making them only work with explain analyze, useful for
debugging but not so useful for actual queries.

That seems a pretty arbitrary limitation. I agree that it's not ideal to
have users adjust planner behavior via this means, but until we have
something better, I think applying that limitation would only make the
status quo worse.

-Neil

#13

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Neil Conway (#12)

Re: generalizing the planner knobs

Neil Conway <neilc@samurai.com> writes:

On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:

If we'd really like to avoid people using the knobs to rig queries,
how about making them only work with explain analyze, useful for
debugging but not so useful for actual queries.

That seems a pretty arbitrary limitation. I agree that it's not ideal to
have users adjust planner behavior via this means, but until we have
something better, I think applying that limitation would only make the
status quo worse.

Yeah, I agree. Adding code to prevent people from using a facility
doesn't seem very reasonable, even if it's our policy that using the
facility for production purposes is not a good idea. In fact, we just
today had a counterexample --- see this thread:
http://archives.postgresql.org/pgsql-performance/2005-12/msg00015.php
Being able to use enable_nestloop got Markus out of a short-term bind,
which to me is exactly what you want to be able to do with this sort
of thing.

I don't have any problem with expending small amounts of work to make
it easier to hack the planner in small ways. The real problem I have
with a "planner hints" facility (in the form that I think most people
who ask for it have in mind) is that it would be a *very large* amount
of work to do it reasonably well, and I think that amount of effort
would be better spent in other ways.

regards, tom lane

#14

Trent Shipley

tshipley@deru.com

over 20 years ago

In reply to: Gregory Maxwell (#10)

Re: generalizing the planner knobs

On Thursday 2005-12-01 19:01, Gregory Maxwell wrote:

On 12/1/05, Pollard, Mike <mpollard@cincom.com> wrote:

Optimizer hints were added because some databases just don't have a very
smart optimizer. But you are much better served tracking down cases in
which the optimizer makes a bad choice, and teaching the optimizer how
to make a better one. That way, all users get the benefit of the fix.
Remember, the purpose of SQL is to isolate the end user from having to
care about how the data is retrieved; that is the RDBMS' problem. (the
other thing forgotten was that it was supposed to be a natural language.
NVL. Bah.)

The flipside there is that a good set of hinting options may increase
the amount of detailed feedback we get from users on improvements
needed in the optimizer. The current knobs are pretty blunt and don't
do as much as I'd like when trying to track down exactly where the
optimiser has gone wrong.

If we'd really like to avoid people using the knobs to rig queries,
how about making them only work with explain analyze, useful for
debugging but not so useful for actual queries.

I'm all in favor of sticking to the declarative language ideal.

Also, I'm much in favor of protecting people from themselves.

On the other hand, if folks insist on engaging in extreme sports (like second
guessing the optimizer) I'm against regulating their freedom. I think
exposing planner variables would be a good thing, on net. Naturally, you
would warn everyone not to touch them. (Safety and freedom are both
necessary.)

If you can play with the knobs, you should let them be used to return real
result sets. That way, when you get feedback, you will be able to tell if
the cost estimator is "broken". Just returning a modified plan won't
challenge costing assumptions.

#15

Bruce Momjian

bruce@momjian.us

over 20 years ago

In reply to: Pollard, Mike (#9)

Re: generalizing the planner knobs

"Pollard, Mike" <mpollard@cincom.com> writes:

Optimizer hints were added because some databases just don't have a very
smart optimizer. But you are much better served tracking down cases in
which the optimizer makes a bad choice, and teaching the optimizer how
to make a better one.

You more or less missed my entire point.

You can always teach the optimizer to make better decisions based on good
data. Your statement is basically right when talking about tweaking the
optimizer's decisions to ignore its best judgement.

But there are many many cases where the data the optimizer has available isn't
good and for good reason. And in plenty of those cases the data the optimizer
has available *can't* be good.

In the extreme, no amount of added intelligence in the optimizer is going to
help it come up with any sane selectivity estimate for something like

WHERE radius_authenticate(user) = 'OK'

--
greg

#16

Csaba Nagy

nagy@ecircle-ag.com

over 20 years ago

In reply to: Tom Lane (#8)

Re: generalizing the planner knobs

On Thu, 2005-12-01 at 22:01, Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

On the other hand the type I would prefer to see are hints that feed directly
into filling in information the planner lacks. This only requires that the
user understand his own data and still lets the planner pick the best plan
based on the provided information.

This would avoid some issues, but it still is vulnerable to the problem
that the hint you put in your code today will fail to track changes in
your data tomorrow.

Tom, I have to disagree here. At least in our application, we must
provide for an acceptable worst case scenario, and sometimes a slightly
wrong estimate can lead to a plan which is very fast 99% of the time but
completely wrong in 1% of the cases. Sometimes the percentage is 50/50.
I've had this situation with some "limit" plans where the planner had
chosen the wrong index. The problem there was that the planner had
estimated that the query would return 20 rows, but it returned fewer,
which resulted in a complete scan of the index... as opposed to the much
smaller scan that would have resulted from using the other index, as
that one would have provided an end condition orders of magnitude
sooner. Now the statistics will always be only an estimate, and +/- a
few can really make a big difference in some situations. In this
particular situation the index choice of the planner would have been
faster for all cases where there were really 20 rows returned, but I
forced it to always choose the other plan (by adding the proper order
by) because I can't risk a bad result in any of the cases.
In this particular case I was able to force the planner to choose a
specific plan, but that might not be always possible, so I guess it
really would make sense to be able to tell the planner how selective
some conditions are. And yes, sometimes I would like to "freeze" a
specific "safe" plan for a specific query, even if it is not optimal.

So for me the "hint" mechanism is good for telling the server that I'm
not interested in the BEST plan if it risks going very bad on occasion;
I want a good enough plan which is safe.

And as for selectivity changing over time, the hints will change along
with it. In most situations where selectivity changes, the SQL has to
change too, sometimes even the complete workflow. I find that if changing
a hint helps on some occasions, then having hints would mean less
maintenance than the code rewriting that would otherwise be involved...
and I'm completely sure the server can't compensate for the change of
the dynamics of the data all the time. And it definitely can't keep up
with highly dynamic data, where the statistics change constantly in big
tables...

Our application for example has kind of batch processing, where we
insert smaller or larger batches of data in a HUGE table (~200 million
rows), and then that data is immediately used for different
operations and then reports, and furthermore it is heavily updated. I
can't think of any reasonable statistics target and ANALYZE strategy
which could satisfy both small batches and large batches without running
ANALYZE permanently with high statistics targets on the key fields...
and even that would not be specific enough when "limit 20" is involved.
For queries involving this table I really would like to freeze plans, as
any misplanning has bad consequences.

Cheers,
Csaba.

#17

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

In reply to: Csaba Nagy (#16)

Re: generalizing the planner knobs

On Fri, Dec 02, 2005 at 11:07:06AM +0100, Csaba Nagy wrote:

So for me the "hint" mechanism is good for telling the server that I'm
not interested in the BEST plan if it risks going very bad on occasion;
I want a good enough plan which is safe.

I'm wondering if long term another approach might be to have another
parameter in the planner, cost_error or selectivity_error which is an
indication of how accurate we think it is.

So for example an index scan might cost x but with a possible
error of 15%, while a seqscan might cost y but with an error of 1%.

The "error" for a nested loop would be the product of the two inputs,
whereas a merge join would be much less sensitive to error. A sort or
hash join would react badly to large variations in input.

So given a choice between two indexscans, one that is slightly more
expensive but more accurately estimated and can feed a mergejoin would
be a better choice than a possibly highly selective index that lacks
accurate info and needs to be fed into a nested loop. Even though the
latter might look better, the former is the "safer" option.
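
A rough sketch of what choosing the "safer" plan could look like. This is one possible reading of the idea, with assumptions throughout: the Estimate class, the worst-case rule cost * (1 + error) and all the numbers are hypothetical, not an existing planner feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    cost: float       # estimated cost of the plan
    rel_error: float  # how far off the estimate might be, e.g. 0.15 for 15%

    def worst_case(self):
        """Pessimistic cost if the estimate is off by its full error."""
        return self.cost * (1.0 + self.rel_error)


def pick_cheapest(plans):
    """Plan choice on estimated cost alone."""
    return min(plans, key=lambda name: plans[name].cost)


def pick_safest(plans):
    """Plan choice on worst-case cost, so uncertain estimates are penalised."""
    return min(plans, key=lambda name: plans[name].worst_case())


plans = {
    "selective index + nestloop": Estimate(cost=100.0, rel_error=0.50),
    "accurate index + mergejoin": Estimate(cost=120.0, rel_error=0.05),
}
```

On estimated cost alone the nested-loop plan wins (100 against 120), but its worst case is about 150 against about 126 for the merge-join plan, so the error-aware choice is the merge join.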

I think this would solve the problem where people see sudden flip-flops
between good and bad plans. The downside is that it's yet another
parameter for the planner to get wrong.

Unfortunately, this is the kind of thing people write theses on and I
don't think many people have the grounding in statistics to make it all
work.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#18

Pollard, Mike

mpollard@cincom.com

over 20 years ago

In reply to: Martijn van Oosterhout (#17)

Re: generalizing the planner knobs

Greg Stark <gsstark@mit.edu> writes:

You more or less missed my entire point.

Only because I am still getting used to how powerful and flexible
Postgres is; but I am working on expanding my horizons.

In the extreme, no amount of added intelligence in the optimizer is going to
help it come up with any sane selectivity estimate for something like

WHERE radius_authenticate(user) = 'OK'

Yeah, I can see where something like this would be problematic. While I
still think that in an ideal world, you want to leave all of this to the
engine, it is true that in the real world sometimes we still have to do
some of the thinking for the computer. It's just that I've seen code
absolutely littered with optimizer hints, and that really bothers me.
But you can't not build a useful tool just because some would abuse it.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.

#19

Rod Taylor

rbt@rbt.ca

over 20 years ago

In reply to: Bruce Momjian (#15)

Re: generalizing the planner knobs

In the extreme, no amount of added intelligence in the optimizer is going to
help it come up with any sane selectivity estimate for something like

WHERE radius_authenticate(user) = 'OK'

Why not?

The missing capability in this case is to be able to provide or generate
(self learning?) statistics for a function that describe a typical
result and the cost of getting that result.
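
As a sketch of what such statistics might track (purely hypothetical, nothing like this is described in the thread as existing), a per-function record could keep a running match rate and mean call time:

```python
class FunctionStats:
    """Hypothetical running statistics for one user-defined function."""

    def __init__(self):
        self.calls = 0
        self.matches = 0
        self.total_seconds = 0.0

    def record(self, matched, seconds):
        """Note one call: whether the predicate held and how long it took."""
        self.calls += 1
        self.matches += int(matched)
        self.total_seconds += seconds

    def selectivity(self, default=0.5):
        """Observed fraction of calls that matched; default if never called."""
        return self.matches / self.calls if self.calls else default

    def mean_cost(self, default=0.0):
        """Observed mean seconds per call; default if never called."""
        return self.total_seconds / self.calls if self.calls else default
```

After four recorded calls of which one matched, selectivity() returns 0.25. Such numbers describe past calls only, so they inherit the usual problem of statistics lagging behind the data.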
--

#20

Bruce Momjian

bruce@momjian.us

over 20 years ago

In reply to: Rod Taylor (#19)

Re: generalizing the planner knobs

Rod Taylor <pg@rbt.ca> writes:

In the extreme, no amount of added intelligence in the optimizer is going to
help it come up with any sane selectivity estimate for something like

WHERE radius_authenticate(user) = 'OK'

Why not?

The missing capability in this case is to be able to provide or generate
(self learning?) statistics for a function that describe a typical result
and the cost of getting that result.

Ok, try "WHERE radius_authenticate(user, (select ...), ?)"

The point is that you can improve the estimates the planner gets. But you can
never make them omniscient. There will always be cases where the user knows
his data better than the planner. And those hints are still valid when a new
optimizer has new plans available.

This is different from hints that tell the planner what plan to use. Every
situation where the predicted cost is inaccurate despite accurate estimates
represents a fixable bug in the optimizer's cost model. When a new version of
the optimizer is available with a more accurate cost model or new available
plans those kinds of hints will only get in the way.

--
greg

#21

Rod Taylor

rbt@rbt.ca

over 20 years ago

In reply to: Bruce Momjian (#20)

#22

Gregory Maxwell

gmaxwell@gmail.com

over 20 years ago

In reply to: Bruce Momjian (#20)

#23