Index Tuning Features
For 8.3, I'd like to add the following two related features to assist
with Index Tuning and usability:
- Virtual Indexes
An index which only exists in the catalog, so is visible to the planner
but not the executor. This is useful where a specific SQL query is being
hand-tuned, allowing very specific options to be selected.
Virtual indexes would only be seen by the planner when performing an
EXPLAIN and when enable_virtual_index = on (default: off, Userset).
Normal SQL statements would ignore them completely, whatever
enable_virtual_index is set to.
It would not be possible to have both a virtual and a real index defined
identically at the same time. (If facilities existed to make temporary
tables exist only for a single backend, rather than requiring catalog
access then that implementation route would also work here, but until
that does, simple updates seem fine).
SQL: CREATE [VIRTUAL] [UNIQUE] INDEX ...
- RECOMMEND command
Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).
At planning time, all possible single column indexes would be assumed to
exist, plus all groups of cols that make up a multi-col Foreign Key
would be assumed to make a multi-col index. (PKs always exist,
remember). We track whether hypothetical indexes exist on the plan, so
once the cheapest plan has been decided we can report what they are (if
any). Hypothetical indexes last only for the duration of planning - no
catalog changes are made.
Command will return 1 row per selected index (can be more than one for a
complex query), first col gives list of indexed cols, second col shows
the SQL required to create that index. Virtual indexes will be noted,
though treated identically to hypothetical indexes.
The changes to do this would not be very invasive to the planner and
mainly involve adding additional fields to the planner data structures,
some additional branching code and command changes/additions.
Overall we need both of these new features: RECOMMEND covers many cases
in an easy to use form, with VIRTUAL indexes covers the rest of the
search space for possible new indexes for specific cases.
There's a host of other little tweaky bits we might imagine to enhance
this capability further, but this seems to cover the basic requirements.
Specifically, multi-column indexes are not considered very heavily in
RECOMMEND. This is deliberate because
a) we don't have good multi-col interaction stats (though we might have
for 8.3?)
b) it greatly increases the run-time of exhaustive searching and
c) because we have bitmap index interaction the usefulness of
multi-column indexes is much reduced anyhow, so cost/benefit not good.
Comments? (I'll do a summary of feedback tomorrow.)
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote:
For 8.3, I'd like to add the following two related features to assist
with Index Tuning and usability:- Virtual Indexes
This seems useful, but I'm not sure we need a catalog object for that.
It might be sufficient to declare these hypothetical indexes within the
EXPLAIN command. That is after all the only place where they are
applied.
- RECOMMEND command
Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).
This functionality also seems useful, but maybe it should be the job of
a user-space tool?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Simon Riggs <simon@2ndquadrant.com> writes:
- Virtual Indexes
An index which only exists in the catalog, so is visible to the planner
but not the executor.
Say what? What would that possibly be useful for, other than crashing
any bit of code that failed to know about it?
- RECOMMEND command
Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).
Both of these seem to assume that EXPLAIN results, without EXPLAIN
ANALYZE results to back them up, are sufficient for tuning. I find
this idea a bit dubious, particularly for cases of "marginal" indexes.
Specifically, multi-column indexes are not considered very heavily in
RECOMMEND.
That seems like a bad idea as well --- multicol indexes are exactly the
sort of thing a novice DBA might fail to consider. If you're going to
do this then you should consider all cases.
regards, tom lane
Peter Eisentraut wrote:
Simon Riggs wrote:
For 8.3, I'd like to add the following two related features to assist
with Index Tuning and usability:- Virtual Indexes
This seems useful, but I'm not sure we need a catalog object for that.
It might be sufficient to declare these hypothetical indexes within the
EXPLAIN command. That is after all the only place where they are
applied.- RECOMMEND command
Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).This functionality also seems useful, but maybe it should be the job of
a user-space tool?
On this same vein I thought it would be interesting if we added a
suggestion to explain analyze... Something like:
Your estimated number of rows appears to be off. Have you ran analyze
lately?
Sincerely,
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
On Tue, Oct 10, 2006 at 06:06:09PM +0200, Peter Eisentraut wrote:
Simon Riggs wrote:
For 8.3, I'd like to add the following two related features to assist
with Index Tuning and usability:- Virtual Indexes
This seems useful, but I'm not sure we need a catalog object for that.
It might be sufficient to declare these hypothetical indexes within the
EXPLAIN command. That is after all the only place where they are
applied.
If you wanted to try multiple scenarios, that might become a pain. I
guess it depends on how verbose the syntax was...
- RECOMMEND command
Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).This functionality also seems useful, but maybe it should be the job of
a user-space tool?
I think it makes the most sense to have this in core, though I guess an
argument could be made for having it be seperate from the backend. But
it'd have to be easy to call from an external tool, such as pgAdmin,
which means in probably needs to speak libpq.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Tuesday 10 October 2006 12:06, Tom Lane wrote:
Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).Both of these seem to assume that EXPLAIN results, without EXPLAIN
ANALYZE results to back them up, are sufficient for tuning. I find
this idea a bit dubious, particularly for cases of "marginal" indexes.
While I agree with Tom that generally EXPLAIN is not enough for tuning, I also
know that when your dealing with queries that have run times in multiples of
hours (and the corresponding hour long index builds) EXPLAIN ANALYZE just
isn't an option. Anything that can be done to wheedle down your choices
before you have to run EXPLAIN ANALYZE is a bonus.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes:
Anything that can be done to wheedle down your choices
before you have to run EXPLAIN ANALYZE is a bonus.
Fair enough, but I prefer Peter's suggestion of attaching the
hypothetical index definitions to EXPLAIN itself, rather than making
bogus catalog entries. Something along the line of
EXPLAIN <statement>
ASSUMING INDEX fooi ON foo ....
[ ASSUMING INDEX ... ]
although this exact syntax probably doesn't work unless we're willing
to make ASSUMING a fully reserved word :-(
I have some vague recollection that this idea has been discussed
before...
regards, tom lane
Simon Riggs <simon@2ndquadrant.com> writes:
- RECOMMEND command
Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).Both of these seem to assume that EXPLAIN results, without EXPLAIN
ANALYZE results to back them up, are sufficient for tuning. I find
this idea a bit dubious, particularly for cases of "marginal" indexes.
I think the idea of "virtual indexes" is pretty interesting, but
ultimately a lesser solution to a more fundimental issue, and that would
be "hands on" control over the planner. Estimating the effect of an index
on a query "prior" to creating the index is a great idea, how that is done
is something different than building concensus that it should be done.
Another thing that this brings up is "hints" to a query. Over the years, I
have run into situation where the planner wasn't great. It would be nice
to try forcing different strategies on the planner and see if performance
caan be improved.
Mark,
Another thing that this brings up is "hints" to a query. Over the years,
I have run into situation where the planner wasn't great. It would be
nice to try forcing different strategies on the planner and see if
performance caan be improved.
See discussion on -performance.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
On 10/10/06, Mark Woodward <pgsql@mohawksoft.com> wrote:
I think the idea of "virtual indexes" is pretty interesting, but
ultimately a lesser solution to a more fundimental issue, and that would
be "hands on" control over the planner. Estimating the effect of an index
on a query "prior" to creating the index is a great idea, how that is done
is something different than building concensus that it should be done.Another thing that this brings up is "hints" to a query. Over the years, I
have run into situation where the planner wasn't great. It would be nice
to try forcing different strategies on the planner and see if performance
caan be improved.
you can do this by setting enable_"access_method" type parameters.
--
regards,
Jaime Casanova
"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
Richard Cook
On Tue, 2006-10-10 at 20:17 -0400, Mark Woodward wrote:
Another thing that this brings up is "hints" to a query. Over the
years, I
have run into situation where the planner wasn't great. It would be
nice
to try forcing different strategies on the planner and see if
performance
caan be improved.
/*+ Not on this thread, p-l-e-a-s-e */
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Another thing that this brings up is "hints" to a query. Over the
years, I have run into situation where the planner wasn'tgreat. It
would be nice to try forcing different strategies on the
planner and
see if performance caan be improved.
you can do this by setting enable_"access_method" type parameters.
No, not generally. Usual problems include join order and wrong index,
not only wrong access method.
Andreas
Thanks everybody for comments so far; this will be a useful discussion.
On Tue, 2006-10-10 at 18:56 -0400, Robert Treat wrote:
On Tuesday 10 October 2006 12:06, Tom Lane wrote:
Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).Both of these seem to assume that EXPLAIN results, without EXPLAIN
ANALYZE results to back them up, are sufficient for tuning. I find
this idea a bit dubious, particularly for cases of "marginal" indexes.While I agree with Tom that generally EXPLAIN is not enough for tuning, I also
know that when your dealing with queries that have run times in multiples of
hours (and the corresponding hour long index builds) EXPLAIN ANALYZE just
isn't an option. Anything that can be done to wheedle down your choices
before you have to run EXPLAIN ANALYZE is a bonus.
IMHO you need EXPLAIN, EXPLAIN ANALYZE and RECOMMEND
As Robert points out, using EA can make tuning take a long time and that
is the critical factor when you have a whole database/app to tune.
This discussion helps me to make explicit what my thoughts had been on
what an ideal index tuning process is:
1. Recommendation: Use RECOMMEND to get an 80/20 setting for a
statement. As Peter suggests a "user-space" tool, I also imagine a tool
that would automatically run RECOMMEND on all SQL statements in a
workload and come up with proposals for additional indexes. We would
have a first cut index design in minutes rather than days.
2. Evaluation: We can then create the potential indexes as Virtual ones
and then re-run EXPLAINs to model how a whole workload would behave. We
can begin to prune low-impact indexes out of the mix at this stage.
Again, this can be done automatically.
3. Implementation: We re-create the new indexes as real indexes (perhaps
concurrently)
4. Correction: We then run the workload and then use existing tools to
spot the statements causing the most problems and manually assess them
using EXPLAIN ANALYZE. Manually postulate new Virtual indexes and
re-model the workload again as (2)
Steps (3) and (4) have both been improved for 8.2. Steps (1) and (2) are
completely new steps for 8.3
The above process can be performed without tool support, but its clear
that further automation will help greatly here. I foresee that the
development of both server-side and tools will take more than one
release. Discussion of tool support can begin once we have agreed
server-side capability.
With that as a backdrop, further comments are:
On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
Anything that can be done to wheedle down your choices
before you have to run EXPLAIN ANALYZE is a bonus.Fair enough, but I prefer Peter's suggestion of attaching the
hypothetical index definitions to EXPLAIN itself, rather than making
bogus catalog entries. Something along the line ofEXPLAIN <statement>
ASSUMING INDEX fooi ON foo ....
[ ASSUMING INDEX ... ]
I do like this, though for step (2) above we would need to attach the
appropriate indexes to each of the SQL statements prior to execution.
Doing this for a single SQL statement is fine, but doing that for a
whole workload of 1000s of statements is not very practical, hence an
externally declarative approach seems better.
I can imagine many other declarative approaches other than the one I
proposed; it just seems pretty neat to me to use almost exactly the same
syntax for a virtual index as for a real index. As I mentioned, ideally
this would not be a full-strength catalog object, but I was thinking
towards implementation also. Another possibility would be to use a local
pg_virtual_indexes table.
On Tue, 2006-10-10 at 18:06 +0200, Peter Eisentraut wrote:
Simon Riggs wrote:
- RECOMMEND command
Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).This functionality also seems useful, but maybe it should be the job of
a user-space tool?
So from above, Yes, I see a user-space tool also, but not instead.
The RECOMMEND command is the minimal server functionality required to
enable an (external) automated tuning support tool to be developed.
Possible architectures for this functionality include both user-space
and server-space options. Much thinking has been done on this in the DB
research community, with the general consensus being its easier to
extend the planner to cope with postulation that it is to create an
external postulation tool that acts (accurately) like the planner.
"DB2 advisor: An optimizer smart enough to recommend its own indexes."
Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohnman, and
Alan Skelley.
In The 16th International Conference on Data Engineering (ICDE'00), San
Diego, CA. IEEE Computer Society, February 2000.
A wonderful summary of which is available here, but not sure if the full
paper is publicly available for free.
http://www.andrew.cmu.edu/user/ngm/15-823/summaries/08.pdf
DB2 Design Advisor: Integrated Automatic Physical Database Design
"DB2 Design Advisor: Integrated Automatic Physical Database Design"
Zilio et al
which is available at
http://www.vldb.org/conf/2004/IND4P1.PDF#search=%22db2%20design%
20advisor%22
On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote:
Specifically, multi-column indexes are not considered very heavily in
RECOMMEND.That seems like a bad idea as well --- multicol indexes are exactly the
sort of thing a novice DBA might fail to consider. If you're going to
do this then you should consider all cases.
Calculating all index cases would follow the combinatorial explosion of
sum(N!/(r!(N-r)!)) though we can argue about exactly what N is in this
case. So we have the same problem as the main optimiser: exhaustive
search is not practical, so we must find a heuristic that allows us to
limit the search space so RECOMMEND doesn't run for too long.
The "no multi-col indexes except FKs" is just a proposed heuristic, so
happy to debate exactly what that heuristic should be. (There are
various research papers available with proposed heuristics).
Multi-col indexes are also subject to over-fitting, since RECOMMEND
would be liable to return (for example) 7-column indexes as the best
choice for a single query, which would be bad overall.
I'd been thinking about this for some time: the virtual index concept
fills in the gaps so that taken together, RECOMMEND and virtual indexes
provide a reasonable toolset for both limiting search space and yet
allowing more complex ideas to be tested.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
The above process can be performed without tool support, but its clear
that further automation will help greatly here. I foresee that the
development of both server-side and tools will take more than one
release. Discussion of tool support can begin once we have agreed
server-side capability.
If it came to automated tools, wouldn't fit in this discussion to give
some performance requirement limits to the RECOMMEND tool ? In a
workload not all queries are real time or high priority, and such a
lesser impact index can help enough sometimes to meet the requirements,
compared to a high impact index which would make the query fly.
Example: inserting in a table must be real time, reporting can be taken
offline...
So it would be nice to have a recommendation tool which can take into
account the performance requirements of the individual queries, possibly
making the right compromises to meat all requirements for all queries.
Cheers,
Csaba.
On 10/10/06, Mark Woodward <pgsql@mohawksoft.com> wrote:
I think the idea of "virtual indexes" is pretty interesting, but
ultimately a lesser solution to a more fundimental issue, and that would
be "hands on" control over the planner. Estimating the effect of an
index
on a query "prior" to creating the index is a great idea, how that is
done
is something different than building concensus that it should be done.Another thing that this brings up is "hints" to a query. Over the years,
I
have run into situation where the planner wasn't great. It would be
nice
to try forcing different strategies on the planner and see if
performance
caan be improved.you can do this by setting enable_"access_method" type parameters.
Here's your hammer, all your problems are now nails.
The enable_xxx setting are OK for simple queries gone wrong, but if you
have a more complex query, any one of those settins may help or hinder
different parts of a query, then you would be left with choosing which of
them helps more than hurts the over-all query.
being able to alter the query plan would help in areas where there are
data patterns in a database that the ANALYZE command can't pick up because
it is not designed too.
Imagine you have a street map database ordered by zip, street, number. The
primary order is zipcode, the secondary order is street. There is a
relationship of number to street, and zip to street. The analyzer, at
least the last time I checked, does not recognize these relationships. So,
a search by street and number would probably use a sequential scan rather
than the street index.
"Mark Woodward" <pgsql@mohawksoft.com> writes:
The analyzer, at least the last time I checked, does not recognize these
relationships.
The analyzer is imperfect but arguing from any particular imperfection is weak
because someone will just come back and say we should work on that problem --
though I note nobody's actually volunteering to do so whereas they appear to
be for hints.
I think the stronger argument is to say that there are some statistical
properties that the analyzer _cannot_ be expected to figure out. Either
because
a) they're simply too complex to ever expect to be able to find automatically,
b) too expensive to make it worthwhile in the general case, or
c) because of some operational issue such as the data changing frequently
enough that the analyzes that would be necessary to keep the statistics up
to date would become excessively expensive or even be impossible to perform
rapidly enough.
The people arguing that hints themselves are of negative benefit are taking
the argument far too far. I've never heard an Oracle DBA gripe about having to
fix hints on an upgrade; they're usually the first ones to suggest hinting a
poorly written query. In fact Oracle is going in the opposite direction of
even relying on hints internally. Its plan stability feature depends on
generating and storing hints internally associated with every query.
The argument against hints is usually that the effort would be better spent
elsewhere, not that hints are inherently a bad idea. We already have enable_*
parameters and they are absolutely necessary for testing and experimenting to
understand whether the planner is incorrect and where it has gone wrong. Hints
are just a more precisely targeted version of these. There have been plenty of
instances on this list where people posted 20-30 line query plans with several
joins of each type where the enable_* parameters were too coarse grained to
use effectively.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
"Mark Woodward" <pgsql@mohawksoft.com> writes:
The analyzer, at least the last time I checked, does not recognize these
relationships.The analyzer is imperfect but arguing from any particular imperfection is
weak
because someone will just come back and say we should work on that problem
--
though I note nobody's actually volunteering to do so whereas they appear
to
be for hints.I think the stronger argument is to say that there are some statistical
properties that the analyzer _cannot_ be expected to figure out. Either
becausea) they're simply too complex to ever expect to be able to find
automatically,b) too expensive to make it worthwhile in the general case, or
c) because of some operational issue such as the data changing frequently
enough that the analyzes that would be necessary to keep the statistics
up
to date would become excessively expensive or even be impossible to
perform rapidly enough.
Well, from a purely data domain standpoint, it is impossible to charactize
the exact nature of a data set without enough information to recreate it.
Anything less must be designed for a fixed set of assumptions. There is no
way that every specific trend can be covered by a fixed number of
assumptions.
The argument that all we need is better statistics completely misses the
point. There will *always* be a number cases where the planner will not
work optimally. I would say that a "simpler" planner with better hints
will always be capable of creating a better query plan.
"Mark Woodward" <pgsql@mohawksoft.com> writes:
I would say that a "simpler" planner with better hints
will always be capable of creating a better query plan.
This is demonstrably false: all you need is an out-of-date hint, and
you can have a worse plan.
The argument against hints is not about whether someone could knock
together a crappy hint facility and be able to get some use out of it.
It is about how much work it would take to design a *good* hint facility
that makes it easy to maintain hints that are robust in the face of data
and query changes. If someone were to sit down and design and build
such a thing, it'd very likely get accepted into core Postgres --- but
personally, I think the equivalent amount of effort would be better
spent on improving the planner and the statistics.
As Josh already noted, Oracle-like hints are pretty likely to get
rejected ... not only because of doubts about their true usefulness,
but out of fear of falling foul of some Oracle patent or other.
regards, tom lane
"Mark Woodward" <pgsql@mohawksoft.com> writes:
I would say that a "simpler" planner with better hints
will always be capable of creating a better query plan.This is demonstrably false: all you need is an out-of-date hint, and
you can have a worse plan.
That doesn't make it false, it makes it higher maintenance. Hints are
understood to require maintenance.
The argument against hints is not about whether someone could knock
together a crappy hint facility and be able to get some use out of it.
It is about how much work it would take to design a *good* hint facility
that makes it easy to maintain hints that are robust in the face of data
and query changes. If someone were to sit down and design and build
such a thing, it'd very likely get accepted into core Postgres --- but
personally, I think the equivalent amount of effort would be better
spent on improving the planner and the statistics.
While it is always true that something can be improved, there comes a
point where work outweighs benefits. I can't say that the planner is at
that point, but I think that isn't even an issue.
The notion of hints would probably one of the biggest steps toward
improving the planner. Like I said, it is inarguable that there will
always be queries that the planner can not execute efficiently based on
the statistics gathered by analze. Since that number must be greater than
zero, some methodology to deal with it should be created.
As Josh already noted, Oracle-like hints are pretty likely to get
rejected ... not only because of doubts about their true usefulness,
but out of fear of falling foul of some Oracle patent or other.
Well, if it would get rejected if it looked like Oracle, assuming you
would probably be one of the people rejecting it, what do you envision as
not being rejected?
On Wed, Oct 11, 2006 at 12:40:42PM -0400, Gregory Stark wrote:
poorly written query. In fact Oracle is going in the opposite direction of
even relying on hints internally. Its plan stability feature depends on
generating and storing hints internally associated with every query.
But IBM, whose DB2 planner and optimiser is generally regarded as way
better than Oracle's (at least by anyone I know who's used both),
doesn't like hints. The IBM people all say the same thing Tom has
said before: that the work to design the thing correctly is better
spent making the planner and optimiser parts smarter and cheaper,
because out of that work you also manage not to have the DBA
accidentally mess things up by simple-minded rule-based hints. (Note
that I'm not trying to wade into the actual argument; I'm just
pointing out that even the biggest industry people don't agree on
this point.)
A
--
Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton