"Hash index" vs. "b-tree index" (PostgreSQL 8.0)
Greetings,
We are working on speeding up our queries by creating indexes. We have
queries with search criteria such as "select ... where col1='...'".
This is a simple query with only the "=" operation, so I set up a hash
index on column "col1". However, the PostgreSQL 8 documentation says:
Note: Testing has shown PostgreSQL's hash indexes to perform no
better than B-tree indexes, and the index size and build time for hash
indexes is much worse. For these reasons, hash index use is presently
discouraged.
For a simple "=" query, which provides better performance: a hash index
or a B-tree index?
Thanks,
Emi
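A quick way to answer this empirically is to build both index types and ask the planner what it does with the "=" query. The sketch below is only an illustration: the table t, the column col1, and the use of psycopg2 are assumptions, not anything from this thread.

# Minimal sketch: build both index types and inspect the chosen plan.
# Assumes a reachable database "test" and a hypothetical table t(col1 text).
import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

# Build both index types on the same column (names are illustrative only).
cur.execute("CREATE INDEX t_col1_btree ON t USING btree (col1)")
cur.execute("CREATE INDEX t_col1_hash  ON t USING hash (col1)")
conn.commit()

# Ask the planner how it runs the simple "=" query.
cur.execute("EXPLAIN ANALYZE SELECT * FROM t WHERE col1 = %s", ("some-value",))
for (line,) in cur.fetchall():
    print(line)

cur.close()
conn.close()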
Ying Lu wrote:
For a simple "=" query, which provides better performance: a hash index
or a B-tree index?
I don't think we've found a case in which the hash index code
outperforms B+-tree indexes, even for "=". The hash index code also has
a number of additional issues: for example, it isn't WAL safe, it has
relatively poor concurrency, and creating a hash index is significantly
slower than creating a b+-tree index.
-Neil
On 5/9/05, Neil Conway <neilc@samurai.com> wrote:
I don't think we've found a case in which the hash index code
outperforms B+-tree indexes, even for "=". The hash index code also has
a number of additional issues: for example, it isn't WAL safe, it has
relatively poor concurrency, and creating a hash index is significantly
slower than creating a b+-tree index.
This being the case, is there ever ANY reason for someone to use it?
If not, then shouldn't we consider deprecating it and eventually
removing it? This would reduce complexity, I think.
Chris
--
| Christopher Petrilli
| petrilli@gmail.com
Christopher Petrilli wrote:
This being the case, is there ever ANY reason for someone to use it?
Well, someone might fix it up at some point in the future. I don't think
there's anything fundamentally wrong with hash indexes, it is just that
the current implementation is a bit lacking.
If not, then shouldn't we consider deprecating it and eventually
removing it?
I would personally consider the code to be deprecated already.
I don't think there is much to be gained by removing it: the code is
pretty isolated from the rest of the tree, and (IMHO) not a significant
maintenance burden.
-Neil
On Tue, May 10, 2005 at 01:34:57AM +1000, Neil Conway wrote:
Christopher Petrilli wrote:
This being the case, is there ever ANY reason for someone to use it?
Well, someone might fix it up at some point in the future. I don't think
there's anything fundamentally wrong with hash indexes, it is just that
the current implementation is a bit lacking.
If not, then shouldn't we consider deprecating it and eventually
removing it?
I would personally consider the code to be deprecated already.
I don't think there is much to be gained by removing it: the code is
pretty isolated from the rest of the tree, and (IMHO) not a significant
maintenance burden.
That may be true, but it's also a somewhat 'developer-centric' view. ;)
Having indexes that people shouldn't be using does add confusion for
users, and presents the opportunity for foot-shooting. I don't know what
purpose they once served, but if there's no advantage to them they
should be officially deprecated and eventually removed. Even if there is
some kind of advantage (would they possibly speed up hash joins?), if
there are no plans to fix them they should still be removed. If someone
ever really wanted to do something with it, the code would still be in CVS.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Jim C. Nasby wrote:
Having indexes that people shouldn't be using does add confusion for
users, and presents the opportunity for foot-shooting.
Emitting a warning/notice on hash-index creation is something I've
suggested in the past -- that would be fine with me.
Even if there is some kind of advantage (would they possibly speed up
hash joins?)
No, hash joins and hash indexes are unrelated.
-Neil
On Tue, May 10, 2005 at 02:38:41AM +1000, Neil Conway wrote:
Jim C. Nasby wrote:
Having indexes that people shouldn't be using does add confusion for
users, and presents the opportunity for foot-shooting.
Emitting a warning/notice on hash-index creation is something I've
suggested in the past -- that would be fine with me.
Probably not a bad idea.
Even if there is some kind of advantage (would they possibly speed up
hash joins?)
No, hash joins and hash indexes are unrelated.
I know they are now, but does that have to be the case? Like I said, I
don't know the history, so I don't know why we even have them to begin
with.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Note: Testing has shown PostgreSQL's hash indexes to perform no
better than B-tree indexes, and the index size and build time for hash
indexes is much worse. For these reasons, hash index use is presently
discouraged.
For a simple "=" query, which provides better performance: a hash index
or a B-tree index?
If you trust the documentation, use a b-tree. If you don't trust the
documentation, do your own tests.
please don't cross post.
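"Do your own tests" can be as simple as timing the same set of "=" lookups with each index type in place. The harness below is a sketch only; the table t, column col1, the sample keys, and the psycopg2 connection string are assumptions.

# Rough timing harness: run the same equality lookups and compare the totals
# once with a b-tree index in place and once with a hash index.
import time
import psycopg2

def time_lookups(dsn, keys, repeats=5):
    """Return total seconds spent running one '=' lookup per key."""
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    start = time.perf_counter()
    for _ in range(repeats):
        for k in keys:
            cur.execute("SELECT count(*) FROM t WHERE col1 = %s", (k,))
            cur.fetchone()
    elapsed = time.perf_counter() - start
    cur.close()
    conn.close()
    return elapsed

# Usage (hypothetical): drop one index, keep the other, and compare.
# print(time_lookups("dbname=test", ["a", "b", "c"]))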
Jim C. Nasby wrote:
No, hash joins and hash indexes are unrelated.
I know they are now, but does that have to be the case?
I mean, the algorithms are fundamentally unrelated. They share a bit of
code such as the hash functions themselves, but they are really solving
two different problems (disk based indexing with (hopefully) good
concurrency and WAL logging vs. in-memory joins via hashing with spill
to disk if needed).
Like I said, I don't know the history, so I don't know why we even
have them to begin with.
As I said, the idea of using hash indexes for better performance on
equality scans is perfectly valid, it is just the implementation that
needs work.
-Neil
Jim C. Nasby wrote:
On Tue, May 10, 2005 at 02:38:41AM +1000, Neil Conway wrote:
Jim C. Nasby wrote:
Having indexes that people shouldn't be using does add confusion for
users, and presents the opportunity for foot-shooting.
Emitting a warning/notice on hash-index creation is something I've
suggested in the past -- that would be fine with me.
Probably not a bad idea.
Agreed.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Neil Conway <neilc@samurai.com> writes:
Jim C. Nasby wrote:
No, hash joins and hash indexes are unrelated.
I know they are now, but does that have to be the case?
I mean, the algorithms are fundamentally unrelated. They share a bit of
code such as the hash functions themselves, but they are really solving
two different problems
In fact, up till fairly recently they didn't even share the hash
functions. Which was a bug not a feature, but the fact remains ---
there's not a lot of commonality.
Like I said, I don't know the history, so I don't know why we even
have them to begin with.
I think it's largely because some Berkeley grad student had a need to
implement hash indexes for academic reasons ;-)
As I said, the idea of using hash indexes for better performance on
equality scans is perfectly valid, it is just the implementation that
needs work.
I was thinking about that earlier today. It seems to me there is a
window within which hash indexes are theoretically superior, but it
might be pretty narrow. The basic allure of a hash index is that you
look at the search key, do some allegedly-trivial computations, and go
directly to the relevant index page(s); whereas a btree index requires
descending through several upper levels of index pages to reach the
target leaf page. On the other hand, once you reach the target index
page, a hash index has no better method than linear scan through all
the page's index entries to find the actually wanted key(s); in fact you
have to search all the pages in that index bucket. A btree index can
use binary search to find the relevant items within the page.
So it strikes me that important parameters include the index entry size
and the number of entries matching any particular key value.
btree will probably win for smaller keys, on two grounds: it will have
fewer tree levels to descend through, because of higher fan-out, and it
will be much more efficient at finding the target entries within the
target page when there are many entries per page. (As against this,
it will have to work harder at each upper tree page to decide where to
descend to --- but I think that's a second-order consideration.)
hash will tend to win as the number of duplicate keys increases, because
its relative inefficiency at finding the matches within a particular
bucket will become less significant. (The ideal situation for a hash
index is to have only one actual key value per bucket. You can't really
afford to store only one index entry per bucket, because of the sheer
I/O volume that would result, so you need multiple entries that will all
be responsive to your search.) (This also brings up the thought that
it might be interesting to support hash buckets smaller than a page ...
but I don't know how to make that work in an adaptive fashion.)
I suspect that people haven't looked hard for a combination of these
parameters within which hash can win. Of course the real question for
us is whether the window is wide enough to justify the maintenance
effort for hash.
regards, tom lane
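A back-of-the-envelope model of the trade-off Tom describes: the b-tree pays for descending several upper levels but then binary-searches one leaf page, while the hash index goes straight to a bucket but must scan every page in that bucket. This is only a sketch; the fan-out, entries-per-page, and table sizes below are made-up assumptions.

# Very rough page-fetch model of b-tree descent vs. hash bucket scan.
import math

def btree_page_reads(n_entries, fanout=200):
    """Approximate number of levels (root to leaf) to descend."""
    return max(1, math.ceil(math.log(n_entries, fanout)))

def hash_page_reads(n_entries, n_buckets, entries_per_page=400):
    """Pages in one bucket, all of which must be scanned linearly."""
    per_bucket = n_entries / n_buckets
    return max(1, math.ceil(per_bucket / entries_per_page))

n = 10_000_000
print("btree:", btree_page_reads(n))                   # e.g. 4 levels
print("hash :", hash_page_reads(n, n_buckets=50_000))  # e.g. 1 bucket page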
Tom Lane wrote:
On the other hand, once you reach the target index page, a hash index
has no better method than linear scan through all the page's index
entries to find the actually wanted key(s)
I wonder if it would be possible to store the keys in a hash bucket in
sorted order, provided that the necessary ordering is defined for the
index keys -- considering the ubiquity of b+-trees in Postgres, the
chances of an ordering being defined are pretty good. Handling overflow
pages would be tricky: perhaps we could store the entries in a given
page in sorted order, but not try to maintain that order for the hash
bucket as a whole. This would mean we'd need to do a binary search for
each page of the bucket, but that would still be a win.
-Neil
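A toy illustration of that idea (not PostgreSQL code): each bucket page keeps its entries sorted by key, so a lookup does a per-page binary search instead of a linear scan, and overflow pages are searched independently rather than keeping the bucket globally ordered.

# Sketch: per-page binary search within a hash bucket whose pages are each
# individually sorted by key.
import bisect

def search_bucket(pages, key):
    """pages: list of pages; each page is a sorted list of (key, heap_tid)."""
    matches = []
    for page in pages:
        keys = [k for k, _ in page]
        lo = bisect.bisect_left(keys, key)
        while lo < len(page) and page[lo][0] == key:
            matches.append(page[lo][1])
            lo += 1
    return matches

# Example: two pages of one bucket, each individually sorted.
bucket = [[("apple", 1), ("pear", 2)], [("apple", 3), ("plum", 4)]]
print(search_bucket(bucket, "apple"))   # [1, 3]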
Neil Conway <neilc@samurai.com> writes:
Tom Lane wrote:
On the other hand, once you reach the target index page, a hash index
has no better method than linear scan through all the page's index
entries to find the actually wanted key(s)
I wonder if it would be possible to store the keys in a hash bucket in
sorted order, provided that the necessary ordering is defined for the
index keys -- considering the ubiquity of b+-trees in Postgres, the
chances of an ordering being defined are pretty good.
I have a gut reaction against that: it makes hash indexes fundamentally
subservient to btrees. We shouldn't bring in concepts that are outside
the basic opclass abstraction.
However: what about storing the things in hashcode order? Ordering uint32s
doesn't seem like any big conceptual problem.
I think that efficient implementation of this would require explicitly
storing the hash code for each index entry, which we don't do now, but
it seems justifiable on multiple grounds --- besides this point, the
search could avoid doing the data-type-specific comparison if the hash
code isn't equal.
There is evidence in the code that indexes used to store more info than
what we now think of as a "standard" index tuple. I am not sure when
that went away or what it'd cost to bring it back, but it seems worth
looking into.
regards, tom lane
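A sketch of that variant, with everything below assumed for illustration: each index entry carries its 32-bit hash code, the page is kept sorted by that code, and the stored key is only compared when the hash codes are equal, so non-matching entries never incur a data-type-specific comparison.

# Sketch: page entries sorted by stored hash code, with a short-circuit that
# only compares keys when the hash codes collide.
import bisect

def search_page(entries, probe_hash, probe_key):
    """entries: list of (hash_code, key, heap_tid) sorted by hash_code."""
    hashes = [h for h, _, _ in entries]
    i = bisect.bisect_left(hashes, probe_hash)
    tids = []
    while i < len(entries) and entries[i][0] == probe_hash:
        # Equal hash codes may still be collisions; confirm with the real key.
        if entries[i][1] == probe_key:
            tids.append(entries[i][2])
        i += 1
    return tids

page = sorted([(0x1a2b, "apple", 7), (0x9c00, "pear", 9), (0x1a2b, "axle", 8)])
print(search_page(page, 0x1a2b, "apple"))   # [7]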
Tom Lane wrote:
I have a gut reaction against that: it makes hash indexes fundamentally
subservient to btrees.
I wouldn't say "subservient" -- if there is no ordering defined for the
index key, we just do a linear scan.
However: what about storing the things in hashcode order? Ordering uint32s
doesn't seem like any big conceptual problem.
Hmm, my memory of the hash code is a bit fuzzy. Do I understand correctly?
- we only use some of the bits in the hash to map from the hash of a key
to its bucket
- therefore within a bucket, we can still distinguish most of the
non-equal tuples from one another by comparing their full hash values
- if we keep the entries in a bucket (or page, I guess -- per earlier
mail) sorted by full hash value, we can use that to perform a binary search
Sounds like a good idea to me. How likely is it that the hash index will
be sufficiently large that we're using most of the bits in the hash just
to map hash values to buckets, so that the binary search won't be very
effective? (At this point many of the distinct keys in each bucket will
be full-on hash collisions, although sorting by the key values
themselves would still be effective.)
I think that efficient implementation of this would require explicitly
storing the hash code for each index entry, which we don't do now, but
it seems justifiable on multiple grounds --- besides this point, the
search could avoid doing the data-type-specific comparison if the hash
code isn't equal.
Another benefit is that it would speed up page splits -- there would be
no need to rehash all the keys in a bucket when doing the split.
-Neil
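A small illustration of the point being checked here (the numbers are made up): with 2^k buckets, only the k low-order bits of the 32-bit hash pick the bucket, so the remaining high-order bits are still available to separate entries when a bucket is kept sorted by the full hash value.

# Sketch of bucket mapping via low-order bits; the full hash still
# distinguishes most entries that land in the same bucket.
def bucket_of(hash32, n_buckets):
    # Assumes n_buckets is a power of two (a simplification).
    return hash32 & (n_buckets - 1)

h1, h2 = 0xDEADBEEF, 0x1234BEEF      # same low 16 bits, different high bits
n_buckets = 1 << 16
print(bucket_of(h1, n_buckets) == bucket_of(h2, n_buckets))  # True: same bucket
print(h1 == h2)                                              # False: full hash still separates them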
Tom Lane <tgl@sss.pgh.pa.us> writes:
However: what about storing the things in hashcode order? Ordering uint32s
doesn't seem like any big conceptual problem.
I think that efficient implementation of this would require explicitly
storing the hash code for each index entry, which we don't do now, but
it seems justifiable on multiple grounds --- besides this point, the
search could avoid doing the data-type-specific comparison if the hash
code isn't equal.
It seems that means doubling the size of the hash index. That's a pretty big
i/o to cpu tradeoff.
What if the hash index stored *only* the hash code? That could be useful for
indexing large datatypes that would otherwise create large indexes. A good
hash function should have a pretty low collision rate anyways so the
occasional extra i/o should more than be outweighed by the decrease in i/o
needed to use the index.
--
greg
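A sketch of the suggestion, with names and structures invented for illustration: the index stores only (hash code, heap TID) pairs and never the key itself, so it stays small even for very large datatypes; every probe is lossy and the candidate rows must be rechecked against the heap to discard collisions.

# Sketch: lossy hash-code-only index with a heap recheck.
def lossy_lookup(index_entries, heap, probe_key, hash_fn):
    """index_entries: list of (hash_code, tid); heap: dict tid -> key."""
    probe_hash = hash_fn(probe_key)
    candidates = [tid for h, tid in index_entries if h == probe_hash]
    # Recheck in the heap: collisions are possible, correctness is preserved.
    return [tid for tid in candidates if heap[tid] == probe_key]

heap = {1: "a very long document ...", 2: "another long value ..."}
hash32 = lambda k: hash(k) & 0xFFFFFFFF
idx = [(hash32(v), tid) for tid, v in heap.items()]
print(lossy_lookup(idx, heap, "a very long document ...", hash32))  # [1]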
Greg Stark <gsstark@mit.edu> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
I think that efficient implementation of this would require explicitly
storing the hash code for each index entry,
It seems that means doubling the size of the hash index. That's a pretty big
i/o to cpu tradeoff.
Hardly. The minimum possible size of a hash entry today is 8 bytes
header plus 4 bytes datum, plus there's a 4-byte line pointer to factor
in. So under the most pessimistic assumptions, storing the hash code
would add 25% to the size. (On MAXALIGN=8 hardware, it might cost you
nothing at all.)
What if the hash index stored *only* the hash code? That could be useful for
indexing large datatypes that would otherwise create large indexes.
Hmm, that could be a thought.
regards, tom lane
On Tue, May 10, 2005 at 10:14:11AM +1000, Neil Conway wrote:
Jim C. Nasby wrote:
No, hash joins and hash indexes are unrelated.
I know they are now, but does that have to be the case?
I mean, the algorithms are fundamentally unrelated. They share a bit of
code such as the hash functions themselves, but they are really solving
two different problems (disk based indexing with (hopefully) good
concurrency and WAL logging vs. in-memory joins via hashing with spill
to disk if needed).
Well, in a hash-join right now you normally end up feeding at least one
side of the join with a seqscan. Wouldn't it speed things up
considerably if you could look up hashes in the hash index instead? That
way you can eliminate going to the heap for any hashes that match. Of
course, if limited tuple visibility info was added to hash indexes
(similar to what I think is currently happening to B-trees), many of
the heap scans could be eliminated as well. A similar method could also
be used for hash aggregates, assuming they use the same hash.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
On Tue, May 10, 2005 at 12:10:57AM -0400, Tom Lane wrote:
be responsive to your search.) (This also brings up the thought that
it might be interesting to support hash buckets smaller than a page ...
but I don't know how to make that work in an adaptive fashion.)
IIRC, other databases that support hash indexes also allow you to define
the bucket size, so it might be a good start to allow for that. DBAs
usually have a pretty good idea of what a table will look like in
production, so if there's clear documentation on the effect of bucket
size a good DBA should be able to make a good decision.
What's the challenge to making it adaptive: coming up with an algorithm
that gives you the optimal bucket size (which I would think there's
research on...) or allowing the index to accommodate different bucket
sizes existing in the index at once? (Presumably you don't want to
re-write the entire index every time it looks like a different bucket
size would help.)
--
Jim C. Nasby, Database Consultant decibel@decibel.org
"Jim C. Nasby" <decibel@decibel.org> writes:
What's the challenge to making it adaptive: coming up with an algorithm
that gives you the optimal bucket size (which I would think there's
research on...) or allowing the index to accommodate different bucket
sizes existing in the index at once? (Presumably you don't want to
re-write the entire index every time it looks like a different bucket
size would help.)
Exactly. That's (a) expensive and (b) really hard to fit into the WAL
paradigm --- I think we could only handle it as a REINDEX. So if it
were adaptive at all I think we'd have to support multiple bucket sizes
existing simultaneously in the index, and I do not see a good way to do
that.
Allowing a bucket size to be specified at CREATE INDEX doesn't seem out
of line though. We'd have to think up a scheme for index-AM-specific
index parameters ...
regards, tom lane
On Tue, May 10, 2005 at 11:49:50AM -0400, Tom Lane wrote:
"Jim C. Nasby" <decibel@decibel.org> writes:
What's the challenge to making it adaptive: coming up with an algorithm
that gives you the optimal bucket size (which I would think there's
research on...) or allowing the index to accommodate different bucket
sizes existing in the index at once? (Presumably you don't want to
re-write the entire index every time it looks like a different bucket
size would help.)
Exactly. That's (a) expensive and (b) really hard to fit into the WAL
paradigm --- I think we could only handle it as a REINDEX. So if it
were adaptive at all I think we'd have to support multiple bucket sizes
existing simultaneously in the index, and I do not see a good way to do
that.
I'm not really familiar enough with hash indexes to know if this would
work, but if the maximum bucket size was known you could use that to
determine a maximum range of buckets to look at. In some cases, that
range would include only one bucket, otherwise it would be a set of
buckets. If you found a set of buckets, I think you could then just go
to the specific one you need.
If we assume that the maximum bucket size is one page it becomes more
realistic to take an existing large bucket and split it into several
smaller ones. This could be done on an update to the index page, or a
background process could handle it.
In any case, should this go on the TODO list?
Allowing a bucket size to be specified at CREATE INDEX doesn't seem out
of line though. We'd have to think up a scheme for index-AM-specific
index parameters ...
--
Jim C. Nasby, Database Consultant decibel@decibel.org