Hints proposal

Started by Jim Nasbyover 19 years ago73 messageshackers

Jim.Nasby@BlueTreble.com

over 19 years ago

Posting here instead of hackers since this is where the thread got
started...

The argument has been made that producing a hints system will be as hard
as actually fixing the optimizer. There's also been clamoring for an
actual proposal, so here's one that (I hope) wouldn't be very difficult
to implemen.

My goal with this is to keep the coding aspect as simple as possible, so
that implementation and maintenance of this isn't a big burden. Towards
that end, these hints either tell the planner specifically how to handle
some aspect of a query, or they tell it to modify specific cost
estimates. My hope is that this information could be added to the
internal representation of a query without much pain, and that the
planner can then use that information when generating plans.

The syntax these hints is something arbitrary. I'm borrowing Oracle's
idea of embedding hints in comments, but we can use some other method if
desired. Right now I'm more concerned with getting the general idea
across.

Since this is such a controversial topic, I've left this at a 'rough
draft' stage - it's meant more as a framework for discussion than a
final proposal for implementation.

Forcing a Plan
--------------
These hints would outright force the planner to do things a certain way.

... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This would force the planner to access table via a seqscan or
index_name. For the index case, you can also specify if the access must
or must not be via a bitmap scan. If neither is specified, the planner
is free to choose either one.

Theoretically, we could also allow "ACCESS INDEX" without an index name,
which would simply enforce that a seqscan not be used, but I'm not sure
how useful that would be.

... FROM a JOIN b /* {HASH|NESTED LOOP|MERGE} JOIN */ ON (...)
... FROM a JOIN b ON (...) /* [HASH|NESTED LOOP|MERGE] JOIN */

Force the specified join mechanism on the join. The first form would not
enforce a join order, it would only force table b to be joined to the
rest of the relations using the specified join type. The second form
would specify that a joins to b in that order, and optionally specify
what type of join to use.

... GROUP BY ... /* {HASH|SORT} AGGREGATE */

Specify how aggregation should be handled.

Cost Tweaking
-------------
It would also be useful to allow tweaking of planner cost estimates.
This would take the general form of

node operator value

where node would be a planner node/hint (ie: ACCESS INDEX), operator
would be +, -, *, /, and value would be the amount to change the
estimate by. So "ACCESS INDEX my_index / 2" would tell the planner to
cut the estimated cost of any index scan on a given table in half.

(I realize the syntax will probably need to change to avoid pain in the
grammar code.)

Unlike the hints above that are ment to force a certain behavior on an
operation, you could potentially have multiple cost hints in a single
location, ie:

FROM a /* HASH JOIN * 1.1 NESTED LOOP JOIN * 2 MERGE JOIN + 5000 */
JOIN b ON (...) /* NESTED LOOP JOIN - 5000 */

The first comment block would apply to any joins against a, while the
second one would apply only to joins between a and b. The effects would
be cumulative, so this example means that any merge join against a gets
an added cost of 5000, unless it's a join with b (because +5000 + -5000
= 0). I think you could end up with odd cases if the second form just
over-rode the first, which is why it should be cummulative.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

Bruce Momjian

bruce@momjian.us

over 19 years ago

In reply to: Jim Nasby (#1)

Re: Hints proposal

Because DB2 doesn't like hints, and the fact that they have gotten to a
point where they feel they do not need them, I feel we too can get to a
point where we don't need them either. The question is whether we can
get there quickly enough for our userbase.

I perfer attacking the problem at the table definition level, like
something like "volatile", or adding to the existing table statistics.

---------------------------------------------------------------------------

Jim C. Nasby wrote:

Posting here instead of hackers since this is where the thread got
started...

The argument has been made that producing a hints system will be as hard
as actually fixing the optimizer. There's also been clamoring for an
actual proposal, so here's one that (I hope) wouldn't be very difficult
to implemen.

My goal with this is to keep the coding aspect as simple as possible, so
that implementation and maintenance of this isn't a big burden. Towards
that end, these hints either tell the planner specifically how to handle
some aspect of a query, or they tell it to modify specific cost
estimates. My hope is that this information could be added to the
internal representation of a query without much pain, and that the
planner can then use that information when generating plans.

The syntax these hints is something arbitrary. I'm borrowing Oracle's
idea of embedding hints in comments, but we can use some other method if
desired. Right now I'm more concerned with getting the general idea
across.

Since this is such a controversial topic, I've left this at a 'rough
draft' stage - it's meant more as a framework for discussion than a
final proposal for implementation.

Forcing a Plan
--------------
These hints would outright force the planner to do things a certain way.

... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This would force the planner to access table via a seqscan or
index_name. For the index case, you can also specify if the access must
or must not be via a bitmap scan. If neither is specified, the planner
is free to choose either one.

Theoretically, we could also allow "ACCESS INDEX" without an index name,
which would simply enforce that a seqscan not be used, but I'm not sure
how useful that would be.

... FROM a JOIN b /* {HASH|NESTED LOOP|MERGE} JOIN */ ON (...)
... FROM a JOIN b ON (...) /* [HASH|NESTED LOOP|MERGE] JOIN */

Force the specified join mechanism on the join. The first form would not
enforce a join order, it would only force table b to be joined to the
rest of the relations using the specified join type. The second form
would specify that a joins to b in that order, and optionally specify
what type of join to use.

... GROUP BY ... /* {HASH|SORT} AGGREGATE */

Specify how aggregation should be handled.

Cost Tweaking
-------------
It would also be useful to allow tweaking of planner cost estimates.
This would take the general form of

node operator value

where node would be a planner node/hint (ie: ACCESS INDEX), operator
would be +, -, *, /, and value would be the amount to change the
estimate by. So "ACCESS INDEX my_index / 2" would tell the planner to
cut the estimated cost of any index scan on a given table in half.

(I realize the syntax will probably need to change to avoid pain in the
grammar code.)

Unlike the hints above that are ment to force a certain behavior on an
operation, you could potentially have multiple cost hints in a single
location, ie:

FROM a /* HASH JOIN * 1.1 NESTED LOOP JOIN * 2 MERGE JOIN + 5000 */
JOIN b ON (...) /* NESTED LOOP JOIN - 5000 */

The first comment block would apply to any joins against a, while the
second one would apply only to joins between a and b. The effects would
be cumulative, so this example means that any merge join against a gets
an added cost of 5000, unless it's a join with b (because +5000 + -5000
= 0). I think you could end up with odd cases if the second form just
over-rode the first, which is why it should be cummulative.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Joshua Marsh

icub3d@gmail.com

over 19 years ago

In reply to: Jim Nasby (#1)

Re: Hints proposal

On 10/12/06, Jim C. Nasby <jim@nasby.net> wrote:

Posting here instead of hackers since this is where the thread got
started...

The argument has been made that producing a hints system will be as hard
as actually fixing the optimizer. There's also been clamoring for an
actual proposal, so here's one that (I hope) wouldn't be very difficult
to implemen.

My goal with this is to keep the coding aspect as simple as possible, so
that implementation and maintenance of this isn't a big burden. Towards
that end, these hints either tell the planner specifically how to handle
some aspect of a query, or they tell it to modify specific cost
estimates. My hope is that this information could be added to the
internal representation of a query without much pain, and that the
planner can then use that information when generating plans.

I've been following the last thread with a bit of interest. I like the
proposal. It seems simple and easy to use. What is it about hinting that
makes it so easily breakable with new versions? I don't have any experience
with Oracle, so I'm not sure how they screwed logic like this up. Hinting
to use a specific merge or scan seems fairly straight forward; if the query
requests to use an index on a join, I don't see how hard it is to go with
the suggestion. It will become painfully obvious to the developer if his
hinting is broken.

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Jim Nasby (#1)

Re: Hints proposal

[ This is off-topic for -performance, please continue the thread in
-hackers ]

"Jim C. Nasby" <jim@nasby.net> writes:

These hints would outright force the planner to do things a certain way.
... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This proposal seems to deliberately ignore every point that has been
made *against* doing things that way. It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.

It would also be useful to allow tweaking of planner cost estimates.
This would take the general form of
node operator value

This is at least focusing on the right sort of thing, although I still
find it completely misguided to be attaching hints like this to
individual queries.

What I would like to see is information *stored in a system catalog*
that affects the planner's cost estimates. As an example, the DBA might
know that a particular table is touched sufficiently often that it's
likely to remain RAM-resident, in which case reducing the page fetch
cost estimates for just that table would make sense. (BTW, this is
something the planner could in principle know, but we're unlikely to
do it anytime soon, for a number of reasons including a desire for plan
stability.) The other general category of thing I think we need is a
way to override selectivity estimates for particular forms of WHERE
clauses.

regards, tom lane

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 19 years ago

In reply to: Bruce Momjian (#2)

Re: Hints proposal

Bruce Momjian wrote:

Because DB2 doesn't like hints, and the fact that they have gotten to a
point where they feel they do not need them, I feel we too can get to a
point where we don't need them either. The question is whether we can
get there quickly enough for our userbase.

In all fairness, when I used to work with DB2 we often had to rewrite
queries to persuade the planner to choose a different plan. Often it was
more of an issue of plan stability; a query would suddenly become
horribly slow in production because a table had grown slowly to the
point that it chose a different plan than before. Then we had to modify
the query again, or manually set the statistics. In extreme cases we had
to split a query to multiple parts and use temporary tables and move
logic to the application to get a query to perform consistently and fast
enough. I really really missed hints.

Because DB2 doesn't have MVCC, an accidental table scan is very serious,
because with stricter isolation levels that keeps the whole table locked.

That said, I really don't like the idea of hints like "use index X"
embedded in a query. I do like the idea of hints that give the planner
more information about the data. I don't have a concrete proposal, but
here's some examples of hints I'd like to see:

"table X sometimes has millions of records and sometimes it's empty"
"Expression (table.foo = table2.bar * 2) has selectivity 0.99"
"if foo.bar = 5 then foo.field2 IS NULL"
"Column X is unique"
"function foobar() always returns either 1 or 2, and it returns 2 90% of
the time."
"if it's Monday, then table NEW_ORDERS has a cardinality of 100000,
otherwise 10."

BTW: Do we make use of CHECK constraints in the planner? In DB2, that
was one nice and clean way of hinting the planner about things. If I
remember correctly, you could even define CHECK constraints that weren't
actually checked at run-time, but were used by the planner.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Merlin Moncure

mmoncure@gmail.com

over 19 years ago

In reply to: Tom Lane (#4)

Re: Hints proposal

On 10/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ This is off-topic for -performance, please continue the thread in
-hackers ]

This proposal seems to deliberately ignore every point that has been
made *against* doing things that way. It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.

what about extending the domain system so that we can put in ranges
that override the statistics or (imo much more importantly) provide
information when the planner would have to restort to a guess. my case
for this is prepared statements with a parameterized limit clause.

prepare foo(l int) as select * from bar limit $1;

maybe:
create domain foo_lmt as int hint 1; -- probably needs to be fleshed out
prepare foo(l foolmt) as select * from bar limit $1;

this says: "if you have to guess me, please use this"

what I like about this over previous attempts to persuade you is the
grammar changes are localized and also imo future proofed. planner can
ignore the hints if they are not appropriate for the oparation.

merlin

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Heikki Linnakangas (#5)

Re: Hints proposal

Heikki Linnakangas <heikki@enterprisedb.com> writes:

BTW: Do we make use of CHECK constraints in the planner?

Only for "constraint exclusion", and at the moment that's off by default.

The gating problem here is that if the planner relies on a CHECK
constraint, and then you drop the constraint, the previously generated
plan might start to silently deliver wrong answers. So I'd like to see
a plan invalidation mechanism in place before we go very far down the
path of relying on constraints for planning. That's something I'm going
to try to make happen for 8.3, though.

regards, tom lane

Jim Nasby

Jim.Nasby@BlueTreble.com

over 19 years ago

In reply to: Tom Lane (#4)

Re: [HACKERS] Hints proposal

On Thu, Oct 12, 2006 at 11:42:32AM -0400, Tom Lane wrote:

[ This is off-topic for -performance, please continue the thread in
-hackers ]

"Jim C. Nasby" <jim@nasby.net> writes:

These hints would outright force the planner to do things a certain way.
... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This proposal seems to deliberately ignore every point that has been
made *against* doing things that way. It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.

Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
also my comment below.

It would also be useful to allow tweaking of planner cost estimates.
This would take the general form of
node operator value

This is at least focusing on the right sort of thing, although I still
find it completely misguided to be attaching hints like this to
individual queries.

Yes, but as I mentioned the idea here was to come up with something that
is (hopefully) easy to define and implement. In other words, something
that should be doable for 8.3. Because this proposal essentially amounts
to limiting plans the planner will consider and tweaking it's cost
estimates, I'm hoping that it should be (relatively) easy to implement.

What I would like to see is information *stored in a system catalog*
that affects the planner's cost estimates. As an example, the DBA might
know that a particular table is touched sufficiently often that it's
likely to remain RAM-resident, in which case reducing the page fetch
cost estimates for just that table would make sense. (BTW, this is
something the planner could in principle know, but we're unlikely to
do it anytime soon, for a number of reasons including a desire for plan
stability.)

All this stuff is great and I would love to see it! But this is all so
abstract that I'm doubtful this could make it into 8.4, let alone 8.3.
Especially if we want a comprehensive system that will handle most/all
cases. I don't know if we even have a list of all the cases we need to
handle.

The other general category of thing I think we need is a
way to override selectivity estimates for particular forms of WHERE
clauses.

I hadn't thought about that for hints, but it would be a good addition.
I think the stats-tweaking model would work, but we'd probably want to
allow "=" as well (which could go into the other stats tweaking hints as
well).

... WHERE a = b /* SELECTIVITY {+|-|*|/|=} value */
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

Csaba Nagy

nagy@ecircle-ag.com

over 19 years ago

In reply to: Tom Lane (#4)

Re: Hints proposal

OK, I just have to comment...

"Jim C. Nasby" <jim@nasby.net> writes:

These hints would outright force the planner to do things a certain way.
... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This proposal seems to deliberately ignore every point that has been
made *against* doing things that way. It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.

But whatever arguments you made about planner improvements and the like,
it will NEVER be possible to correctly estimate in all cases the
statistics for a query, even if you perfectly know WHAT statistics you
need, which is also not the case all the time.

Tom, you're the one who knows best how the planner works... can you bet
anything you care about on the fact that one day the planner will never
ever generate a catastrophic plan without DBA tweaking ? And how far in
time we'll get to that point ?

Until that point is achieved, the above proposal is one of the simplest
to understand for the tweaking DBA, and the fastest to deploy when faced
with catastrophic plans. And I would guess it is one of the simplest to
be implemented and probably not very high maintenance either, although
this is just a guess.

If I could hint some of my queries, I would enable anonymous prepared
statements to take into account the parameter values, but I can't
because that results in runaway queries every now and then, so I had to
force postgres generate generic queries without knowing anything about
parameter values... so the effect for me is an overall slower postgres
system because I couldn't fix the particular problems I had and had to
tweak general settings. And when I have a problem I can't wait until the
planner is fixed, I have to solve it immediately... the current means to
do that are suboptimal.

The argument that planner hints would hide problems from being solved is
a fallacy. To put a hint in place almost the same amount of analysis is
needed from the DBA as solving the problem now, so users who ask now for
help will further do it even in the presence of hints. The ones who
wouldn't are not coming for help now either, they know their way out of
the problems... and the ones who still report a shortcoming of the
planner will do it with hints too.

I would even say it would be an added benefit, cause then you could
really see how well a specific plan will do without having the planner
capable to generate alone that plan... so knowledgeable users could come
to you further down the road when they know where the planner is wrong,
saving you time.

I must say it again, this kind of query-level hinting would be the
easiest to understand for the developers... there are many
trial-end-error type of programmers out there, if you got a hint wrong,
you fix it and move on, doesn't need to be perfect, it just have to be
good enough. I heavily doubt that postgres will get bad publicity
because user Joe sot himself in the foot by using bad hints... the
probability for that is low, you must actively put those hints there,
and if you take the time to do that then you're not the average Joe, and
probably not so lazy either, and if you're putting random hints, then
you would probably mess it up some other way anyway.

And the thing about missing new features is also not very founded. If I
would want to exclude a full table scan on a specific table for a
specific query, than that's about for sure that I want to do that
regardless what new features postgres will offer in the future. Picking
one specific access method is more prone to missing new access methods,
but even then, when I upgrade the DB server to a new version, I usually
have enough other compatibility problems (till now I always had some on
every upgrade I had) that making a round of upgrading hints is not an
outstanding problem. And if the application works good enough with
suboptimal plans, why would I even take that extra effort ?

I guess the angle is: I, as a practicing DBA would like to be able to
experiment and get most out of the imperfect tool I have, and you, the
developers, want to make the tool perfect... I don't care about perfect
tools, it just have to do the job... hints or anything else, if I can
make it work GOOD ENOUGH, it's all fine. And hints is something I would
understand and be able to use.

Thanks for your patience if you're still reading this...

Cheers,
Csaba.

#10

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Jim Nasby (#8)

Re: [PERFORM] Hints proposal

"Jim C. Nasby" <jim@nasby.net> writes:

Yes, but as I mentioned the idea here was to come up with something that
is (hopefully) easy to define and implement. In other words, something
that should be doable for 8.3.

Sorry, but that is not anywhere on my list of criteria for an important
feature. Having to live with a quick-and-dirty design for the
foreseeable future is an ugly prospect --- and anything that puts hints
into application code is going to lock us down to supporting it forever.

regards, tom lane

#11

Bucky Jordan

bjordan@lumeta.com

over 19 years ago

In reply to: Joshua Marsh (#3)

Re: Hints proposal

What is it about hinting that makes it so easily breakable with new versions? I >don't have any experience with Oracle, so I'm not sure how they screwed logic like >this up.

I don't have a ton of experience with oracle either, mostly DB2, MSSQL and PG. So, I thought I'd do some googling, and maybe others might find this useful info.

http://asktom.oracle.com/pls/ask/f?p=4950:8:2177642270773127589::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:7038986332061

Interesting quote: "In Oracle Applications development (11i apps - HR, CRM, etc) Hints are strictly forbidden. We find the underlying cause and fix it." and
"Hints -- only useful if you are in RBO and you want to make use of an access
path."

Maybe because I haven't had access to hints before, I've never been tempted to use them. However, I can't remember having to re-write SQL due to a PG upgrade either.

Oh, and if you want to see everything that gets broken/depreciated with new versions, just take a look at oracle's release notes for 9i and 10g. I particularly dislike how they rename stuff for no apparent reason (e.g. NOPARALLEL is now NO_PARALLEL - http://www.oracle-base.com/articles/10g/PerformanceTuningEnhancements10g.php)

At the very least, I agree it is important to separate the query (what data do I want) from performance options (config, indexes, hints, etc). The data I want doesn't change unless I have a functionality/requirements change. So I'd prefer not to have to go back and change that code just to tweak performance. In addition, this creates an even bigger mess for dynamic queries. I would be much more likely to consider hints if they could be applied separately.

- Bucky

#12

Josh Berkus

josh@agliodbs.com

over 19 years ago

In reply to: Jim Nasby (#8)

Re: [HACKERS] Hints proposal

Jim,

These hints would outright force the planner to do things a certain way.
... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This proposal seems to deliberately ignore every point that has been
made *against* doing things that way. It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.

Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
also my comment below.

I don't see how adding extra tags to queries is easier to implement than
an ability to modify the system catalogs. Quite the opposite, really.

And, as I said, if you're going to push for a feature that will be
obsolesced in one version, then you're going to have a really rocky row
to hoe.

Yes, but as I mentioned the idea here was to come up with something that
is (hopefully) easy to define and implement. In other words, something
that should be doable for 8.3. Because this proposal essentially amounts
to limiting plans the planner will consider and tweaking it's cost
estimates, I'm hoping that it should be (relatively) easy to implement.

Even I, the chief marketing geek, am more concerned with getting a
feature that we will still be proud of in 5 years than getting one in
the next nine months. Keep your pants on!

I actually think the way to attack this issue is to discuss the kinds of
errors the planner makes, and what tweaks we could do to correct them.
Here's the ones I'm aware of:

-- Incorrect selectivity of WHERE clause
-- Incorrect selectivity of JOIN
-- Wrong estimate of rows returned from SRF
-- Incorrect cost estimate for index use

Can you think of any others?

I also feel that a tenet of the design of the "planner tweaks" system
ought to be that the tweaks are collectible and analyzable in some form.
This would allow DBAs to mail in their tweaks to -performance or
-hackers, and then allow us to continue improving the planner.

--Josh Berkus

#13

Jeff Davis

pgsql@j-davis.com

over 19 years ago

In reply to: Jim Nasby (#1)

Re: Hints proposal

On Thu, 2006-10-12 at 10:14 -0500, Jim C. Nasby wrote:

The syntax these hints is something arbitrary. I'm borrowing Oracle's
idea of embedding hints in comments, but we can use some other method if
desired. Right now I'm more concerned with getting the general idea
across.

Is there any advantage to having the hints in the queries? To me that's
asking for trouble with no benefit at all. It would seem to me to be
better to have a system catalog that defined hints as something like:

"If user A executes a query matching regex R, then coerce (or force) the
planner in this way."

I'm not suggesting that we do that, but it seems better then embedding
the hints in the queries themselves.

Regards,
Jeff Davis

#14

Josh Berkus

josh@agliodbs.com

over 19 years ago

In reply to: Csaba Nagy (#9)

Re: [HACKERS] Hints proposal

Csaba,

I guess the angle is: I, as a practicing DBA would like to be able to
experiment and get most out of the imperfect tool I have, and you, the
developers, want to make the tool perfect... I don't care about perfect
tools, it just have to do the job... hints or anything else, if I can
make it work GOOD ENOUGH, it's all fine. And hints is something I would
understand and be able to use.

Hmmm, if you already understand Visual Basic syntax, should we support
that too? Or maybe we should support MySQL's use of '0000-00-00' as the
"zero" date because people "understand" that?

We're just not going to adopt a bad design because Oracle DBAs are used
to it. If we wanted to do that, we could shut down the project and
join a proprietary DB staff.

The current discussion is:

a) Planner tweaking is sometimes necessary;
b) Oracle HINTS are a bad design for planner tweaking;
c) Can we come up with a good design for planner tweaking?

So, how about suggestions for a good design?

--Josh Berkus

#15

Jim Nasby

Jim.Nasby@BlueTreble.com

over 19 years ago

In reply to: Joshua Marsh (#3)

Re: Hints proposal

On Thu, Oct 12, 2006 at 09:26:24AM -0600, Joshua Marsh wrote:

On 10/12/06, Jim C. Nasby <jim@nasby.net> wrote:

Posting here instead of hackers since this is where the thread got
started...

The argument has been made that producing a hints system will be as hard
as actually fixing the optimizer. There's also been clamoring for an
actual proposal, so here's one that (I hope) wouldn't be very difficult
to implemen.

My goal with this is to keep the coding aspect as simple as possible, so
that implementation and maintenance of this isn't a big burden. Towards
that end, these hints either tell the planner specifically how to handle
some aspect of a query, or they tell it to modify specific cost
estimates. My hope is that this information could be added to the
internal representation of a query without much pain, and that the
planner can then use that information when generating plans.

I've been following the last thread with a bit of interest. I like the
proposal. It seems simple and easy to use. What is it about hinting that
makes it so easily breakable with new versions? I don't have any experience
with Oracle, so I'm not sure how they screwed logic like this up. Hinting
to use a specific merge or scan seems fairly straight forward; if the query
requests to use an index on a join, I don't see how hard it is to go with
the suggestion. It will become painfully obvious to the developer if his
hinting is broken.

The problem is that when you 'hint' (which is actually not a great name
for the first part of my proposal, since it's really forcing the planner
to do something), you're tying the planner's hands. As the planner
improves in newer versions, it's very possible to end up with forced
query plans that are much less optimal than what the newer planner could
come up with. This is especially true as new query execution nodes are
created, such as hashaggregate.

The other downside is that it's per-query. It would certainly be useful
to be able to nudge the planner in the right direction on a per-table
level, but it's just not clear how to accomplish that. Like I said, the
idea behind my proposal is to have something that can be done soon, like
for 8.3.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#16

Jim Nasby

Jim.Nasby@BlueTreble.com

over 19 years ago

In reply to: Heikki Linnakangas (#5)

Re: Hints proposal

On Thu, Oct 12, 2006 at 04:55:17PM +0100, Heikki Linnakangas wrote:

Bruce Momjian wrote:

Because DB2 doesn't like hints, and the fact that they have gotten to a
point where they feel they do not need them, I feel we too can get to a
point where we don't need them either. The question is whether we can
get there quickly enough for our userbase.

In all fairness, when I used to work with DB2 we often had to rewrite
queries to persuade the planner to choose a different plan. Often it was
more of an issue of plan stability; a query would suddenly become
horribly slow in production because a table had grown slowly to the
point that it chose a different plan than before. Then we had to modify
the query again, or manually set the statistics. In extreme cases we had
to split a query to multiple parts and use temporary tables and move
logic to the application to get a query to perform consistently and fast
enough. I really really missed hints.

Oracle has an interesting way to deal with this, in that you can store a
plan that the optimizer generates and tell it to always use it for that
query. There's some other management tools built on top of that. I don't
know how commonly it's used, though...

Also, on the DB2 argument... I'm wondering what happens when people end
up with a query that they can't get to execute the way it should? Is the
planner *that* good that it never happens? Do you have to wait for a
fixpack when it does happen? I'm all for having a super-smart planner,
but I'm highly doubtful it will always know exactly what to do.

That said, I really don't like the idea of hints like "use index X"
embedded in a query. I do like the idea of hints that give the planner
more information about the data. I don't have a concrete proposal, but

Which is part of the problem... there's nothing to indicate we'll have
support for these improved hints anytime soon, especially if a number of
them depend on plan invalidation.

here's some examples of hints I'd like to see:

"table X sometimes has millions of records and sometimes it's empty"
"Expression (table.foo = table2.bar * 2) has selectivity 0.99"
"if foo.bar = 5 then foo.field2 IS NULL"
"Column X is unique"
"function foobar() always returns either 1 or 2, and it returns 2 90% of
the time."
"if it's Monday, then table NEW_ORDERS has a cardinality of 100000,
otherwise 10."

BTW: Do we make use of CHECK constraints in the planner? In DB2, that
was one nice and clean way of hinting the planner about things. If I
remember correctly, you could even define CHECK constraints that weren't
actually checked at run-time, but were used by the planner.

I think you're right... and it is an elegant way to hint the planner.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#17

Csaba Nagy

nagy@ecircle-ag.com

over 19 years ago

In reply to: Josh Berkus (#14)

Re: [HACKERS] Hints proposal

Hmmm, if you already understand Visual Basic syntax, should we support
that too? Or maybe we should support MySQL's use of '0000-00-00' as the
"zero" date because people "understand" that?

You completely misunderstood me... I have no idea about oracle hints,
never used Oracle in fact. My company uses oracle, but I have only very
very limited contact with oracle issues, and never touched a hint.

I'm only talking about ease of use, learning curves, and complexity in
general. While I do like the idea of an all automatic system optimizer
which takes your query portofolio and analyzes the data based on those
queries and creates you all the indexes you need and all that, that's
not gonna happen soon, because it's a very complex thing to implement.

The alternative is that you take your query portofolio, analyze it
yourself, figure out what statistics you need, create indexes, tweak
queries, hint the planner for correlations and stuff... which is a
complex task, and if you have to tell the server about some correlations
with the phase of the moon, you're screwed cause there will never be any
DB engine which will understand that.

But you always can put the corresponding hint in the query when you know
the correlation is there...

The problem is that the application sometimes really knows better than
the server, when the correlations are not standard.

We're just not going to adopt a bad design because Oracle DBAs are used
to it. If we wanted to do that, we could shut down the project and
join a proprietary DB staff.

I have really nothing to do with Oracle. I think you guys are simply too
blinded by Oracle hate... I don't care about Oracle.

The current discussion is:

a) Planner tweaking is sometimes necessary;
b) Oracle HINTS are a bad design for planner tweaking;

While there are plenty of arguments you made against query level hints
(can we not call them Oracle-hints ?), there are plenty of users of
postgres who expressed they would like them. I guess they were tweaking
postgres installations when they needed it, and not Oracle
installations. I expressed it clearly that for me query level hinting
would give more control and better understanding of what I have to do
for the desired result. Perfect planning -> forget it, I only care about
good enough with reasonable tuning effort. If I have to tweak statistics
I will NEVER be sure postgres will not backfire on me again. On the
other hand if I say never do a seq scan on this table for this query, I
could be sure it won't...

c) Can we come up with a good design for planner tweaking?

Angles again: good enough now is better for end users, but programmers
always go for perfect tomorrow... pity.

Cheers,
Csaba.

#18

Csaba Nagy

nagy@ecircle-ag.com

over 19 years ago

In reply to: Jeff Davis (#13)

Re: Hints proposal

I'm not suggesting that we do that, but it seems better then embedding
the hints in the queries themselves.

OK, what about this: if I execute the same query from a web client, I
want the not-so-optimal-but-safe plan, if I execute it asynchronously, I
let the planner choose the
best-overall-performance-but-sometimes-may-be-slow plan ?

What kind of statistics/table level hinting will get you this ?

I would say only query level hinting will buy you query level control.
And that's perfectly good in some situations.

I really can't see why a query-level hinting mechanism is so evil, why
it couldn't be kept forever, and augmented with the possibility of
correlation hinting, or table level hinting.

These are really solving different problems, with some overlapping...

Cheers,
Csaba.

#19

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Csaba Nagy (#9)

Re: [HACKERS] Hints proposal

Csaba Nagy <nagy@ecircle-ag.com> writes:

Until that point is achieved, the above proposal is one of the simplest
to understand for the tweaking DBA, and the fastest to deploy when faced
with catastrophic plans. And I would guess it is one of the simplest to
be implemented and probably not very high maintenance either, although
this is just a guess.

That guess is wrong ... but more to the point, if you think that "simple
and easy to implement" should be the overriding concern for designing a
new feature, see mysql. They've used that design approach for years and
look what a mess they've got. This project has traditionally done
things differently and I feel no need to change that mindset now.

regards, tom lane

#20

Andrew Sullivan

ajs@crankycanuck.ca

over 19 years ago

In reply to: Jim Nasby (#8)

Re: [PERFORM] Hints proposal

On Thu, Oct 12, 2006 at 11:25:25AM -0500, Jim C. Nasby wrote:

Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
also my comment below.

If I may argue in the other direction, speaking as one whose career
(if we may be generous enough to call it that) has been pretty much
exclusively on the operations end of things, I think that's an awful
idea.

There are two ways that quick-fix solve-the-problem-now hints are
going to be used. One is in the sort of one-off query that a DBA has
to run from time to time, that takes a long time, but that isn't
really a part of regular application load. The thing is, if you
already know your data well enough to provide a useful hint, you also
know your data well enough to work around the problem in the short
run (with some temp table tricks and the like).

The _other_ way it's going to be used is as a stealthy alteration to
regular behaviour, to solve a particular nasty performance problem
that happens to result on a given day. And every single time I've
seen anything like that done, the long term effect is always
monstrous. Two releases later, all your testing and careful
inspection and planning goes to naught one Saturday night at 3 am
(because we all know computers know what time it is _where you are_)
when the one-off trick that you pulled last quarter to solve the
manager's promise (which was made while out golfing, so nobody wrote
anything down) turns out to have a nasty effect now that the data
distribution is different. Or you think so. But now you're not
sure, because the code was tweaked a little to take some advantage of
something you now have because of the query plans that you ended up
getting because of the hint that was there because of the golf game,
so now if you start fiddling with the hints, maybe you break
something else. And you're tired, but the client is on the phone
from Hong King _right now_.

The second case is, from my experience, exactly the sort of thing you
want really a lot when the golf game is just over, and the sort of
thing you end up kicking yourself for in run-on sentences in the
middle of the night six months after the golf game is long since
forgotten.

The idea for knobs on the planner that allows the DBA to give
directed feedback, from which new planner enhancements can also come,
seems to me a really good idea. But any sort of quick and dirty hint
for right now gives me the willies.

--
Andrew Sullivan | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
--H.W. Fowler

#21

Merlin Moncure

mmoncure@gmail.com

over 19 years ago

In reply to: Andrew Sullivan (#20)

#22

Andrew Sullivan

ajs@crankycanuck.ca

over 19 years ago

In reply to: Merlin Moncure (#21)

#23

Jeff Davis

pgsql@j-davis.com

over 19 years ago

In reply to: Csaba Nagy (#18)

#24

Jim Nasby

Jim.Nasby@BlueTreble.com

over 19 years ago

In reply to: Josh Berkus (#12)

#25

Jim Nasby

Jim.Nasby@BlueTreble.com

over 19 years ago

In reply to: Jeff Davis (#13)

#26

Josh Berkus

josh@agliodbs.com

over 19 years ago

In reply to: Jim Nasby (#24)

#27

Arjen van der Meijden

acmmailing@tweakers.net

over 19 years ago

In reply to: Jeff Davis (#23)

#28

Bucky Jordan

bjordan@lumeta.com

over 19 years ago

In reply to: Josh Berkus (#26)

#29

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Arjen van der Meijden (#27)

#30

Richard Broersma Jr

rabroersma@yahoo.com

over 19 years ago

In reply to: Arjen van der Meijden (#27)

#31

Alvaro Herrera

alvherre@2ndquadrant.com

over 19 years ago

In reply to: Bucky Jordan (#28)

#32

Jeff Davis

pgsql@j-davis.com

over 19 years ago

In reply to: Tom Lane (#29)

#33

Jeff Davis

pgsql@j-davis.com

over 19 years ago

In reply to: Jim Nasby (#25)

#34

Chris Browne

cbbrowne@acm.org

over 19 years ago

In reply to: Arjen van der Meijden (#27)

#35

Chris Browne

cbbrowne@acm.org

over 19 years ago

In reply to: Jim Nasby (#1)

#36

Zeugswetter Andreas SB SD

ZeugswetterA@spardat.at

over 19 years ago

In reply to: Andrew Sullivan (#20)

#37

Zeugswetter Andreas SB SD

ZeugswetterA@spardat.at

over 19 years ago

In reply to: Josh Berkus (#12)

#38

Andrew Sullivan

ajs@crankycanuck.ca

over 19 years ago

In reply to: Zeugswetter Andreas SB SD (#36)

#39

Csaba Nagy

nagy@ecircle-ag.com

over 19 years ago

In reply to: Andrew Sullivan (#38)

#40

Zeugswetter Andreas SB SD

ZeugswetterA@spardat.at

over 19 years ago

In reply to: Andrew Sullivan (#38)

#41

Jim Nasby

Jim.Nasby@BlueTreble.com

over 19 years ago

In reply to: Josh Berkus (#26)

#42

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Jim Nasby (#41)

#43

Jeff Davis

pgsql@j-davis.com

over 19 years ago

In reply to: Chris Browne (#35)

#44

Bruce Momjian

bruce@momjian.us

over 19 years ago

In reply to: Josh Berkus (#26)

#45

Jeff Davis

pgsql@j-davis.com

over 19 years ago

In reply to: Alvaro Herrera (#31)

#46

Bucky Jordan

bjordan@lumeta.com

over 19 years ago

In reply to: Jeff Davis (#45)

#47

Jeff Davis

pgsql@j-davis.com

over 19 years ago

In reply to: Bucky Jordan (#46)

#48

Jim Nasby

Jim.Nasby@BlueTreble.com

over 19 years ago

In reply to: Jeff Davis (#47)

#49

Joshua D. Drake

jd@commandprompt.com

over 19 years ago

In reply to: Jim Nasby (#48)

#50

Bucky Jordan

bjordan@lumeta.com

over 19 years ago

In reply to: Jim Nasby (#41)

#51

David Fetter

david@fetter.org

over 19 years ago

In reply to: Jim Nasby (#48)

#52

Jeff Davis

pgsql@j-davis.com

over 19 years ago

In reply to: Jim Nasby (#48)

#53

Andrew Dunstan

andrew@dunslane.net

over 19 years ago

In reply to: Joshua D. Drake (#49)

#54

Stefan Kaltenbrunner

stefan@kaltenbrunner.cc

over 19 years ago

In reply to: Jim Nasby (#48)

#55

Josh Berkus

josh@agliodbs.com

over 19 years ago

In reply to: Jim Nasby (#41)

#56

Josh Berkus

josh@agliodbs.com

over 19 years ago

In reply to: Zeugswetter Andreas SB SD (#37)

#57

Josh Berkus

josh@agliodbs.com

over 19 years ago

In reply to: Csaba Nagy (#39)

#58

Jim Nasby

Jim.Nasby@BlueTreble.com

over 19 years ago

In reply to: Josh Berkus (#55)

#59

Alvaro Herrera

alvherre@2ndquadrant.com

over 19 years ago

In reply to: Jim Nasby (#58)

#60

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Jim Nasby (#58)

#61

Craig A. James

cjames@modgraph-usa.com

over 19 years ago

In reply to: Josh Berkus (#12)

#62

Craig A. James

cjames@modgraph-usa.com

over 19 years ago

In reply to: Bucky Jordan (#50)

#63

Mark Mielke

mark@mark.mielke.cc

over 19 years ago

In reply to: Craig A. James (#62)

#64

Mark Kirkwood

mark.kirkwood@catalyst.net.nz

over 19 years ago

In reply to: Craig A. James (#62)

#65

Csaba Nagy

nagy@ecircle-ag.com

over 19 years ago

In reply to: Mark Kirkwood (#64)

#66

Brian Hurt

bhurt@janestcapital.com

over 19 years ago

In reply to: Csaba Nagy (#65)

#67

Bruce Momjian

bruce@momjian.us

over 19 years ago

In reply to: Brian Hurt (#66)

#68

Shaun Thomas

sthomas@leapfrogonline.com

over 19 years ago

In reply to: Brian Hurt (#66)

#69

Mark Kirkwood

mark.kirkwood@catalyst.net.nz

over 19 years ago

In reply to: Csaba Nagy (#65)

#70

Robert Treat

xzilla@users.sourceforge.net

over 19 years ago

In reply to: Bruce Momjian (#44)

#71

Robert Treat

xzilla@users.sourceforge.net

over 19 years ago

In reply to: Bucky Jordan (#11)

#72

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Robert Treat (#71)

#73

Robert Treat

xzilla@users.sourceforge.net

over 19 years ago

In reply to: Tom Lane (#72)