Ad Hoc Indexes

Started by Justinalmost 18 years ago14 messages
#1Justin
justin@emproshunts.com

Question: We have recently moved PostgreSql as we just migrated to
OpenMfg. i really like the database and PL/pgSQL its very similar to
how FoxPro language layout so learning has been pretty easy.

Now for my question It does not appear PostgreSQL does not have an Ad
Hoc Indexes ability where the Query planner will create an in memory
index based on the Select, Update, Insert or Delete commands. I got
pretty lazy with MsSQL and Foxpro doing ad hoc indexes to speed up
complex queries where i should have created an index but never did.

Is there any plans in the future to add the ability for PostgreSQL to
create Ad Hoc indexes if it makes sense. Example i was creating a new
report thats presently only processing around 120,000 records in the 7
tables for the query. But the query took between 18 to 19 seconds to
run. I analyzed the query and added 2 indexes and drop the processing
time to 93 milliseconds.
This query will be run 3 to 5 times a day so it makes sense to create
indexes

But with reports that run maybe once a week or once a month it does not
make sense to create indexes. this is where I think Add Hoc indexes are
nice to have because to my understanding maintaining indexes can get
pretty expensive.

Whats the likely hood of Ad Hoc Indexes being added to PostgreSql

thanks

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Justin (#1)
Re: Ad Hoc Indexes

Justin wrote:

Now for my question It does not appear PostgreSQL does not have an Ad
Hoc Indexes ability where the Query planner will create an in memory
index based on the Select, Update, Insert or Delete commands.

How is that supposed to work? In order to create an index you would need to
visit all the rows in the table. If you do that, you could just as well
answer the query off a sequential scan.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#3Hans-Juergen Schoenig
postgres@cybertec.at
In reply to: Peter Eisentraut (#2)
Re: Ad Hoc Indexes

On Feb 18, 2008, at 9:09 PM, Peter Eisentraut wrote:

Justin wrote:

Now for my question It does not appear PostgreSQL does not have an Ad
Hoc Indexes ability where the Query planner will create an in memory
index based on the Select, Update, Insert or Delete commands.

How is that supposed to work? In order to create an index you
would need to
visit all the rows in the table. If you do that, you could just as
well
answer the query off a sequential scan.

this is not quite true.
this kind of indexing makes sense if you visit the same data over and
over again. WITH-queries would be an example for that and self joins
could benefit from the this feature too.

the question however is: why not create "normal indexes" straight away?
i am not sure if the benefit of ad-hoc indexes justify additional
complexity in the code ...

best regards,

hans

--
Cybertec Schönig & Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, 2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin (#1)
Re: Ad Hoc Indexes

Justin <justin@emproshunts.com> writes:

Is there any plans in the future to add the ability for PostgreSQL to
create Ad Hoc indexes if it makes sense.

No, I'm not aware of anyone contemplating such a thing. I can hardly
imagine a situation where building an index for a single query is
actually a win. Maybe those DBMSes you mention were using this as a
substitute for having decent join mechanisms, or something?

regards, tom lane

#5Justin
justin@emproshunts.com
In reply to: Tom Lane (#4)
3 attachment(s)
Re: Ad Hoc Indexes

The idea of ad hoc indexes is speed up loop scans To prove my idea i
created a sql file in PGAdmin that creates the indexes on the fly then
runs the query then drops the indexs.

without the indexes it takes 18 to 19 seconds to run the query.

To create the index and do the query takes 400 milliseconds. I wish i
could do an Explain on it with create index in the query but i can't it
errors out. So i reran the query with indexes already in place it drops
the query time 191 milliseconds.

Create and deleting the indexes on the fly improves performance almost
50 times. I think creating Ad Hoc indexes on the fly in memory makes
sense . I imagine it would be even faster if the index stayed in memory

Tom Lane wrote:

Show quoted text

Justin <justin@emproshunts.com> writes:

Is there any plans in the future to add the ability for PostgreSQL to
create Ad Hoc indexes if it makes sense.

No, I'm not aware of anyone contemplating such a thing. I can hardly
imagine a situation where building an index for a single query is
actually a win. Maybe those DBMSes you mention were using this as a
substitute for having decent join mechanisms, or something?

regards, tom lane

Attachments:

Query Explain With No Indexes.txttext/plain; name="Query Explain With No Indexes.txt"Download
proof.sqltext/plainDownload
query explain with indexes.txttext/plain; name="query explain with indexes.txt"Download
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin (#5)
Re: Ad Hoc Indexes

Justin <justin@emproshunts.com> writes:

The idea of ad hoc indexes is speed up loop scans To prove my idea i
created a sql file in PGAdmin that creates the indexes on the fly then
runs the query then drops the indexs.

without the indexes it takes 18 to 19 seconds to run the query.

To create the index and do the query takes 400 milliseconds.

The example you show doesn't convince me of much of anything, because
the estimated rowcounts are so far off. I think you're basically
dealing with an estimation failure and it's pure luck that the extra
index fixes it.

regards, tom lane

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Hans-Juergen Schoenig (#3)
Re: Ad Hoc Indexes

Hans-Juergen Schoenig wrote:

this kind of indexing makes sense if you visit the same data over and �
over again. WITH-queries would be an example for that and self joins �
could benefit from the this feature too.

Yes, for joins it is useful. We have hash joins, which are kind of ad hoc
hash indexes. If anyone wants to implement a btree join, that might work. :)
But I think we essentially support what the OP is asking for, it is just
under different labels.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#8Justin
justin@emproshunts.com
In reply to: Tom Lane (#6)
Re: Ad Hoc Indexes

Then why are the estimates so far off??? If estimates where correct
would it improve the performance that much.

Vaccum is set to run automatically so the stats stay update.

Total record count for the tables for all the tables put together is
around 120,000 the query returns only 458 records which is correct.

If i am correct in my understanding the reason the index improved the
query so much is the wooper table gets hit hard because it appears in 3
separate nested queries . So taking only 458 records returned from the
parent query times 3 for 1,375 table scans going through 21,873 records
for a total number records being processed to 30,075,375 on a table
with no index. So if you look at it that way PostgreSql did remarkably
well processing the query in 18 to 20 seconds.

The idea behind adhoc indexes is when one shot queries or really used
queries are created that would require numerous indexes to run in a
decent time can be run in a faction of the time. This also saves
processing times across the entire system where creating indexes for the
all the possible queries is impractical

This does not take away the need for index but speed up ad-hoc queries
created from a website or other business analysis tool that someone
might create

Tom Lane wrote:

Show quoted text

Justin <justin@emproshunts.com> writes:

The idea of ad hoc indexes is speed up loop scans To prove my idea i
created a sql file in PGAdmin that creates the indexes on the fly then
runs the query then drops the indexs.

without the indexes it takes 18 to 19 seconds to run the query.

To create the index and do the query takes 400 milliseconds.

The example you show doesn't convince me of much of anything, because
the estimated rowcounts are so far off. I think you're basically
dealing with an estimation failure and it's pure luck that the extra
index fixes it.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#9Justin
justin@emproshunts.com
In reply to: Justin (#8)
Re: Ad Hoc Indexes

oops dam spell checker really should be rarely sorry

Justin wrote:

Show quoted text

Then why are the estimates so far off??? If estimates where correct
would it improve the performance that much.

Vaccum is set to run automatically so the stats stay update.

Total record count for the tables for all the tables put together is
around 120,000 the query returns only 458 records which is correct.

If i am correct in my understanding the reason the index improved the
query so much is the wooper table gets hit hard because it appears in
3 separate nested queries . So taking only 458 records returned from
the parent query times 3 for 1,375 table scans going through 21,873
records for a total number records being processed to 30,075,375 on a
table with no index. So if you look at it that way PostgreSql did
remarkably well processing the query in 18 to 20 seconds.

The idea behind adhoc indexes is when one shot queries or rarely used
queries are created that would require numerous indexes to run in a
decent time can be run in a faction of the time. This also saves
processing times across the entire system where creating indexes for
the all the possible queries is impractical

This does not take away the need for index but speed up ad-hoc
queries created from a website or other business analysis tool that
someone might create

Tom Lane wrote:

Justin <justin@emproshunts.com> writes:

The idea of ad hoc indexes is speed up loop scans To prove my idea i
created a sql file in PGAdmin that creates the indexes on the fly then
runs the query then drops the indexs.

without the indexes it takes 18 to 19 seconds to run the query.

To create the index and do the query takes 400 milliseconds.

The example you show doesn't convince me of much of anything, because
the estimated rowcounts are so far off. I think you're basically
dealing with an estimation failure and it's pure luck that the extra
index fixes it.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#10Stephen Denne
Stephen.Denne@datamail.co.nz
In reply to: Justin (#8)
Re: Ad Hoc Indexes

Justin wrote:

Then why are the estimates so far off?

Perhaps a really high correlation between those records where coitem_status = 'O' and those records in your date range. The planner estimates 1 row when restricting by both restrictions, whereas there where 458. The 1 row was probably a result of a small % for status='O' multiplied by a small % for the date range.

If estimates where correct would it improve the performance that much.

Possibly, but a better performance gain might be obtained by rewriting the query, changing the case expression to something along the lines of:

coalesce((
select wrkcnt_code || ' Operation Completed :' || cast(wooper_seqnumber as text)
from wooper inner join wrkcnt on wrkcnt_id = wooper_wrkcnt_id
where wooper_rncomplete = true and wooper_wo_id = coitem_order_id
order by wooper_seqnumber desc limit 1
),'No Operation Completed') as LastFinshedOp

regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________

#11Justin
justin@emproshunts.com
In reply to: Stephen Denne (#10)
2 attachment(s)
Re: Ad Hoc Indexes

yes that improved the select statement allot from 18 to 20 to 6
seconds 3 fold improvement with no indexes. thanks

Now i tested your new and improved select statement with the indexes its
query time was taking from 90 to 140 milliseconds compared to the
original select statement of 94 to 120 milliseconds. I tested both
select statements 5 times to get an average and they both hover around
110 to 120

I attached the results from the explain that used the indexes and one
with no index

although in both case with and with out indexes the estimate still
failed to return the correct number by allot.

Stephen Denne wrote:

Show quoted text

Justin wrote:

Then why are the estimates so far off?

Perhaps a really high correlation between those records where coitem_status = 'O' and those records in your date range. The planner estimates 1 row when restricting by both restrictions, whereas there where 458. The 1 row was probably a result of a small % for status='O' multiplied by a small % for the date range.

If estimates where correct would it improve the performance that much.

Possibly, but a better performance gain might be obtained by rewriting the query, changing the case expression to something along the lines of:

coalesce((
select wrkcnt_code || ' Operation Completed :' || cast(wooper_seqnumber as text)
from wooper inner join wrkcnt on wrkcnt_id = wooper_wrkcnt_id
where wooper_rncomplete = true and wooper_wo_id = coitem_order_id
order by wooper_seqnumber desc limit 1
),'No Operation Completed') as LastFinshedOp

regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Attachments:

newimprovequerynoindexes.txttext/plain; name=newimprovequerynoindexes.txtDownload
newimproved querywithindexes.txttext/plain; name="newimproved querywithindexes.txt"Download
#12Stephen Denne
Stephen.Denne@datamail.co.nz
In reply to: Justin (#11)
Re: Ad Hoc Indexes

Justin wrote:

although in both case with and with out indexes the estimate still failed to return the correct number by allot.

The improvement wasn't to the part of the query that had the bad cost estimate, it was to the part that was being performed hundreds of times instead of the one time the planner estimated.

The planner still thought it was only going to perform a sequential scan of your wooper table once. So even if there had been any Ad Hoc Index creation code that had been used to consider creating indexes as part of a plan cost estimate, it wouldn't have bothered creating any indexes on wooper.

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Denne (#12)
Re: Ad Hoc Indexes

"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:

The improvement wasn't to the part of the query that had the bad cost estimate, it was to the part that was being performed hundreds of times instead of the one time the planner estimated.

The planner still thought it was only going to perform a sequential scan of your wooper table once. So even if there had been any Ad Hoc Index creation code that had been used to consider creating indexes as part of a plan cost estimate, it wouldn't have bothered creating any indexes on wooper.

Right. And even more to the point, if it had gotten the estimate right
and known that the subquery would have been repeated, that would have
(ideally) prompted it to shift to a different plan structure.

As Peter pointed out upthread, the existing hash join logic seems to be
a pretty decent facsimile of an "ad hoc index" --- in fact, the hash
table *is* a hash index for all intents and purposes. If you tilt your
head at the right angle, a merge join could be seen as comparable to
constructing a couple of ad-hoc btree indexes. Plus the join code is
not burdened by any overhead that actual index code would be likely to
have, such as niggling worries about crash-safety or concurrent access
to the index.

So in my mind the issue here is not why don't we have ad hoc indexes,
it's why the planner didn't choose a more appropriate join method.
It's clear that faulty rowcount estimation was one part of the answer.
Now it may also be that there are some outright structural limitations
involved due to the "join" arising from a sub-select --- I'm not sure
that the planner *can* generate a bulk join plan from a query expressed
like this one. But the TODO items that this suggests to me do not
include implementing anything I'd call an ad-hoc index.

regards, tom lane

#14Justin
justin@emproshunts.com
In reply to: Tom Lane (#13)
Re: Ad Hoc Indexes

there are certainly problems with this idea. Figuring out the cost to
either create an index or just do table scan would be pain but not
impossible. The hit to index a table in memory i don't think would be
that bad compare to do 100's to thousands of loops with table scans.

I see more options for the query planner in config file to handle ad-hoc
indexes

Now to Toms points. The point of Ad Hoc index is they're only alive for
the period time the query is running and only live in memory. Once
the query completes they die horrible deaths

These temporay indexes will not do a table lock or row lock ever it only
needs to view the record to create this index. So the same problems
that affects 100 table scans would affect create temp index. I think
this would help reduce concurrence sense the table is not being scanned
thousands of times

The idea of creating one time use indexes has been around for long time
and has showed to be a benefit when dealing with large data sets where
queries is one time or rarely used and its to much labor and cost to
figure out how to make it faster. This would also reduce IO disk
activity allot if the table can't fit in memory but the index would fit
because it relativity small in comparison.

Tom Lane wrote:

Show quoted text

"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:

The improvement wasn't to the part of the query that had the bad cost estimate, it was to the part that was being performed hundreds of times instead of the one time the planner estimated.

The planner still thought it was only going to perform a sequential scan of your wooper table once. So even if there had been any Ad Hoc Index creation code that had been used to consider creating indexes as part of a plan cost estimate, it wouldn't have bothered creating any indexes on wooper.

Right. And even more to the point, if it had gotten the estimate right
and known that the subquery would have been repeated, that would have
(ideally) prompted it to shift to a different plan structure.

As Peter pointed out upthread, the existing hash join logic seems to be
a pretty decent facsimile of an "ad hoc index" --- in fact, the hash
table *is* a hash index for all intents and purposes. If you tilt your
head at the right angle, a merge join could be seen as comparable to
constructing a couple of ad-hoc btree indexes. Plus the join code is
not burdened by any overhead that actual index code would be likely to
have, such as niggling worries about crash-safety or concurrent access
to the index.

So in my mind the issue here is not why don't we have ad hoc indexes,
it's why the planner didn't choose a more appropriate join method.
It's clear that faulty rowcount estimation was one part of the answer.
Now it may also be that there are some outright structural limitations
involved due to the "join" arising from a sub-select --- I'm not sure
that the planner *can* generate a bulk join plan from a query expressed
like this one. But the TODO items that this suggests to me do not
include implementing anything I'd call an ad-hoc index.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings