Hypothetical Indexes
Hello,
I have just finished my M.Sc. thesis at Pontifícia Universidade
Católica do Rio de Janeiro (PUC-Rio), Brazil, and we have developed a
research prototype for automatic index selection using a software
agent in PostgreSQL. In order to make index selection possible, we
have extended the PostgreSQL DBMS to allow the simulation of
hypothetical indexes. We believe these server extensions may be of
value for addition to the PostgreSQL code base.
A description of the work we have been doing is available at:
http://www.inf.puc-rio.br/~postgresql/
There you will find a link to a tutorial based description of the
hypothetical indexes feature we have implemented on PostgreSQL 7.4
beta 3.
I am aware that we shouldn't go on working on things that are not on
the TODO list, but we had a research oriented focus. At the end of the
day, it turned out that some of the things we coded might be useful to
the PostgreSQL community and we would be happy if we can contribute.
So, what do you people think? Should we work together to make this
new feature available on newer PostgreSQL versions?
Thanks in advance for your attention,
Marcos Salles.
P.S: An interesting consequence of having hypothetical indexes in the
system is that this eases the implementation of index tuning tools
(e.g. the Microsoft SQL Server's Index Tuning Wizard). We are planning
to implement one such tool to the PostgreSQL database (at first as a
command-line utility).
It seems to me to be a very valuable and interesting idea. (IBM has something similar)
Probably, for production you do not want your optimizer puzzling about possible indexes. However, it would be a very valuable tool for the test environment. I can easily imagine a setting that allows the feature to be turned on or off and the suggestions could (of course) be very valuable.
Do you have a paper describing your work? I would like to read it.
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
Marcos A Vaz Salles
Sent: Wednesday, October 06, 2004 7:40 PM
To: pgsql-hackers@postgresql.org
Cc: Sergio Lifschitz; Maira Ferreira de Noronha; msalles@acm.org
Subject: [HACKERS] Hypothetical IndexesHello,
I have just finished my M.Sc. thesis at Pontifícia
Universidade Católica do Rio de Janeiro (PUC-Rio), Brazil,
and we have developed a research prototype for automatic
index selection using a software agent in PostgreSQL. In
order to make index selection possible, we have extended the
PostgreSQL DBMS to allow the simulation of hypothetical
indexes. We believe these server extensions may be of value
for addition to the PostgreSQL code base.A description of the work we have been doing is available at:
http://www.inf.puc-rio.br/~postgresql/
There you will find a link to a tutorial based description of the hypothetical indexes feature we have implemented on PostgreSQL 7.4 beta 3.
I am aware that we shouldn't go on working on things that are not on the TODO list, but we had a research oriented focus. At the end of the day, it turned out that some of the things we coded might be useful to the PostgreSQL community and we would be happy if we can contribute.
So, what do you people think? Should we work together to make this new feature available on newer PostgreSQL versions?
Thanks in advance for your attention,
Marcos Salles.
P.S: An interesting consequence of having hypothetical indexes in the system is that this eases the implementation of index tuning tools (e.g. the Microsoft SQL Server's Index Tuning Wizard). We are planning to implement one such tool to the PostgreSQL database (at first as a command-line utility).
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Import Notes
Resolved by subject fallback
Marcos A Vaz Salles <msalles@gmail.com> writes:
In order to make index selection possible, we
have extended the PostgreSQL DBMS to allow the simulation of
hypothetical indexes. We believe these server extensions may be of
value for addition to the PostgreSQL code base.
This would be of some value if the optimizer's cost estimates were
highly reliable, but unfortunately they are far from being so :-(
Without the ability to measure *actual* as opposed to estimated
costs, I'm not sure you can really do much.
regards, tom lane
Tom,
This would be of some value if the optimizer's cost estimates were
highly reliable, but unfortunately they are far from being so :-(
Without the ability to measure *actual* as opposed to estimated
costs, I'm not sure you can really do much.
In fact, any index selection tool that uses hypothetical indexes
will not recommend indexes that the optimizer does not recognize as
good enough for the query. The bright side of this is that the better
the optimizer gets, the better are the recommendations made by the
index selection tool. And people tend to invest resources in
constructing good query optimizers. Other index selection tools, such
as Microsoft's and IBM's, have the same limitation. Even though, the
tools are useful for people that have to deal with databases with a
big quantity of tables and queries. Finding useful indexes in this
kind of setting is a difficult problem for DBAs.
So, our point is that hypothetical indexes just have to be as well
estimated by the optimizer as conventional, real indexes. An index not
suggested by the optimizer might still be usable, but that would
require rewriting the query or using hints, things that need the
intervention of a more skilled DBA anyway.
Best regards,
Marcos.
Show quoted text
regards, tom lane
Marcos,
http://www.inf.puc-rio.br/~postgresql/
There you will find a link to a tutorial based description of the
hypothetical indexes feature we have implemented on PostgreSQL 7.4
beta 3.
I would love to see this as an add-in project on pgFoundry. Particularly
since your online "tutorial" only works in Internet Explorer, so I can't read
it.
I believe that there was/is a team exploring a set of utilities to produce
database optimization "hints" for the admin. Your idea would dovetail
nicely with that.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Import Notes
Reply to msg id not found: 20041012192756.20B6332AE6F@svr1.postgresql.orgReference msg id not found: 20041012192756.20B6332AE6F@svr1.postgresql.org | Resolved by subject fallback
Hello Josh,
I will take a look at pgFoundry and register a new project for index
selection. I will also look for other projects there that we may help
somehow. About the tutorial, I will send you the presentation we used
to generate the web pages so that you can see it with OpenOffice. ;)
Thanks for your feedback,
Marcos.
Show quoted text
On Wed, 13 Oct 2004 11:13:55 -0700, Josh Berkus <josh@agliodbs.com> wrote:
Marcos,
http://www.inf.puc-rio.br/~postgresql/
There you will find a link to a tutorial based description of the
hypothetical indexes feature we have implemented on PostgreSQL 7.4
beta 3.I would love to see this as an add-in project on pgFoundry. Particularly
since your online "tutorial" only works in Internet Explorer, so I can't read
it.I believe that there was/is a team exploring a set of utilities to produce
database optimization "hints" for the admin. Your idea would dovetail
nicely with that.--
--JoshJosh Berkus
Aglio Database Solutions
San Francisco
On 10/12/2004 4:02 PM Tom Lane could be overheard saying::
Marcos A Vaz Salles <msalles@gmail.com> writes:
In order to make index selection possible, we
have extended the PostgreSQL DBMS to allow the simulation of
hypothetical indexes. We believe these server extensions may be of
value for addition to the PostgreSQL code base.This would be of some value if the optimizer's cost estimates were
highly reliable, but unfortunately they are far from being so :-(
Without the ability to measure *actual* as opposed to estimated
costs, I'm not sure you can really do much.
Is it possible for the backend to store performance data and try to
modify its cost estimates? I was thinking of the statistics analyzer
currently in use and whether (query) performance data could piggy back
on it or if it would need a different process, something that could try
and correlate cost estimates with actual costs. Given sample data the
calculation could result in an actual execution time estimation. Would
it be worth pursuing, or would it be too invasive?
I know the autotune project has similar goals from a different angle,
system/memory settings.
Thomas