HASH: Out of overflow pages. Out of luck
Hi Everybody!
I'm sorry I dropped out for so long -- was switching jobs and was on
the verge of deportation for a while. Still not entirely back to
normal, but can raise my head and look around.
The first thing I discovered in the current version (7.2.1) -- as well
as in 7.1.3 -- is what seems to be an old problem with the hash access
method: it clusters too much.
The data I'm trying to index is of type text, so it uses
hashvarlena(). The values look something like this:
REC00014
REC00015
....
REC02463
RBS00001
RBS00002
....
RBS11021
....
It's like several very long sequences with multiple gaps.
With the existing hashvarlena(), I can index about 2M rows, but not
much more -- beyond that it fails with the "out of overflow pages"
message.
hashvarlena() maps this input in a piecewise-linear fashion: the
succession 'REC00010' .. 'REC00019' hashes to the consecutive values
1346868191 .. 1346868200. The next ten keys also hash to sequential
values, but in a different range.
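To show what I mean, here is a toy version of the hash -- a sketch,
from memory, of the Berkeley-style 65599 multiplicative hash that
hash_any() descends from, not the verbatim PG source (the function
name is mine, and the absolute values may differ from the ones above,
but the pattern is the point):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Sketch of the old multiplicative scheme: each byte is folded in as
 * h = byte + 65599 * h, so the last byte ends up with weight 1. */
static uint32_t
mult_hash(const char *key, size_t len)
{
    uint32_t h = 0;

    for (size_t i = 0; i < len; i++)
        h = (unsigned char) key[i] + 65599u * h;
    return h;
}

int
main(void)
{
    char buf[16];

    /* 'REC00010' .. 'REC00019': only the last byte varies, so the
     * ten hash values come out consecutive. */
    for (int i = 0; i < 10; i++)
    {
        snprintf(buf, sizeof buf, "REC0001%d", i);
        printf("%s -> %u\n", buf, (unsigned) mult_hash(buf, strlen(buf)));
    }
    return 0;
}

Because the last byte carries weight 1, bumping it by one bumps the
whole hash value by one.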
The quality of the hash function may be a factor here, but probably
not the major one. I was able to jack my limit up to over 3.7M rows by
reversing the order of bytes in hashvarlena() -- I made the pointer go
down instead of up. That spread the hash values more sparsely, but it
failed with the same message when I fed it more than 4M rows.
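The reversed variant I tried looks roughly like this (a reconstruction
of my experiment, not the actual patch; it drops into the toy program
above in place of mult_hash):

/* Walk the bytes back to front, so the rapidly changing last byte
 * picks up the largest multiplier instead of weight 1. */
static uint32_t
mult_hash_reversed(const char *key, size_t len)
{
    uint32_t h = 0;

    for (size_t i = len; i > 0; i--)
        h = (unsigned char) key[i - 1] + 65599u * h;
    return h;
}

Now two keys that differ only in the last byte differ by a multiple of
65599^(len-1) mod 2^32, which scatters them instead of lining them up.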
I saw Tom answer a similar question a year ago, saying that the
hash access method is poorly supported and that there is no advantage
to using it. I am not sure about the former, but the latter is not
entirely true: we saw at least a 20% gain in performance when we
switched from btree to hash, and my boss considers 20% a big enough
improvement. Besides, he knows database theory and is a
long-time BerkeleyDB user; in his world, hash is greatly superior
to btree, so he wonders why the postgres implementations of the two
are so close in performance. And it's a tough challenge to explain
to a Libertarian that he'd better not do something.
I guess we can make such people happy by either fixing hash, or by
making btree very much worse -- whichever is easier :)
Seriously, though, if anybody wants to look at the problem, I saved
the set of keys that caused it at
http://home.xnet.com/~selkovjr/tmp/prot.gz
Also, I have heard that other database systems have special access
methods for sequences that address this issue, and likewise that
without an option to use an arbitrary hash function instead of a
single built-in one, such problems are bound to happen with particular
data sets. How true is that? If a better access method exists, what is
it?
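To illustrate what an arbitrary hash function could buy, here is the
same idea with a stronger mixer -- FNV-1a, chosen purely as a familiar
example, not as a proposal for what PG should adopt. Dropped into the
toy harness above, it spreads the keys across the 32-bit range instead
of packing them into narrow consecutive bands:

/* FNV-1a: xor each byte in, then multiply by the FNV prime. */
static uint32_t
fnv1a(const char *key, size_t len)
{
    uint32_t h = 2166136261u;    /* FNV-1a 32-bit offset basis */

    for (size_t i = 0; i < len; i++)
    {
        h ^= (unsigned char) key[i];
        h *= 16777619u;          /* FNV-1a 32-bit prime */
    }
    return h;
}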
Thank you,
Gene
> I guess we can make such people happy by either fixing hash, or by
> making btree very much worse -- whichever is easier :)
Cool. I'm sure that making btree much worse is definitely within my
ability -- I'll submit a patch shortly with new pgbench results.
Chris
"Gene Selkov, Jr." <selkovjr@xnet.com> writes:
> I saw Tom answer a similar question a year ago, saying that the
> hash access method is poorly supported and that there is no advantage
> to using it. I am not sure about the former, but the latter is not
> entirely true: we saw at least a 20% gain in performance when we
> switched from btree to hash, and my boss considers 20% a big enough
> improvement. Besides, he knows database theory and is a
> long-time BerkeleyDB user; in his world, hash is greatly superior
> to btree, so he wonders why the postgres implementations of the two
> are so close in performance. And it's a tough challenge to explain
> to a Libertarian that he'd better not do something.
Hey, if he wants to fix the hash code, more power to him ;-). Patches
will be gladly accepted.
The real problem with the PG hash index code is that approximately zero
effort has been put into it since the code left Berkeley, while quite
a lot of work has been put into the btree code. Thus, for most purposes
the btree index type leaves hash in the dust, no matter what theoretical
concerns may say.
If you or he would like to expend the effort to bring hash indexing up
to speed, I'll surely not stand in your way. But be advised that
there's a lot of work to be done there (concurrency issues and WAL
support being at the top of my list) ... are you sure you believe that
hash is worth the effort?
regards, tom lane