Re: HASH: Out of overflow pages. Out of luck

Started by Hannu Krosing over 23 years ago, 3 messages

hannu@tm.ee

over 23 years ago

On Mon, 2002-08-05 at 07:26, Gene Selkov, Jr. wrote:

Hi Everybody!

I'm sorry I dropped out for so long -- was switching jobs and was on
the verge of deportation for a while. Still not entirely back to
normal, but can raise my head and look around.

The first thing I discovered in the current version (7.2.1) -- as well
as in 7.1.3 -- seems to be an old problem with the hash am. It's
clustering too much.

...

The quality of the hash function can be a factor here, but probably
not the major one. I was able to jack my limit up to over 3.7M rows by
reversing the order of bytes in hashvarlena() -- I made the pointer go
down instead of up. That spread the hash values more sparsely, but it
failed with the same message when I fed it with more than 4M rows.
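
To illustrate why the direction of the byte walk changes the resulting values, here is a minimal sketch in C. It is not the actual hashvarlena() code: the function names hash_forward and hash_backward and the multiplier 37 are assumptions chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical byte-wise multiplicative hash (NOT PostgreSQL's
 * hashvarlena()). The pointer walks up through the key, so the
 * last bytes are multiplied the fewest times.
 */
static uint32_t hash_forward(const unsigned char *key, size_t len)
{
    uint32_t h = 0;

    for (size_t i = 0; i < len; i++)
        h = h * 37u + key[i];
    return h;
}

/*
 * Same hypothetical hash, but the pointer walks down from the end
 * of the key, so the first bytes are multiplied the fewest times.
 */
static uint32_t hash_backward(const unsigned char *key, size_t len)
{
    uint32_t h = 0;

    for (size_t i = len; i > 0; i--)
        h = h * 37u + key[i - 1];
    return h;
}
```

With a hash of this shape, the same key generally gets a different value in each direction, and only palindromic keys hash identically both ways. Whether one direction spreads a given dataset better depends on where the keys vary.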

I saw Tom answer a similar question a year ago, by saying that the
hash access method is poorly supported and that there is no advantage
to using it. I am not sure about the former, but the latter is not
entirely true: we saw at least 20% gain in performance when we
switched from btree to hash, and my boss considers 20% a big enough
improvement. Besides, he knows the database theory and he is a
long-time BerkeleyDB user,

As BerkeleyDB came into being by splitting index methods out of an early
version of Postgres, it should still have some similar structure left,
so one possibility is to check what they are doing to not be that bad.

Have you tried to index your dataset into a BerkeleyDB database?

and in his world, hash is greatly superior
to btree, so he is wondering why the postgres implementations are so
close. Besides, it's a tough challenge to explain it to a Libertarian
that he'd better not do something.

-------------
Hannu

Import Notes

Reply to msg id not found: 20020805022621.06AA1A13D@selkovjr.xnet.com
Reference msg id not found: 20020805022621.06AA1A13D@selkovjr.xnet.com

Noname

selkovjr@xnet.com

over 23 years ago

In reply to: Hannu Krosing (#1)

Re: HASH: Out of overflow pages. Out of luck

From: Hannu Krosing <hannu@tm.ee>

As BerkeleyDB came into being by splitting index methods out of an early
version of Postgres, it should still have some similar structure left,
so one possibility is to check what they are doing to not be that bad.

Have you tried to index your dataset into a BerkeleyDB database?

Yes, it works fine with BerkeleyDB. I looked at both code bases and I was
stupefied by their complexity. Even if there is a similar structure,
it must be very well disguised. Some of the data structures resemble
each other's counterparts; the only piece that is exactly the same
is one of BerkeleyDB's five hash functions.

The only useful experiment that I feel I am capable of making is
trying their __ham_hash5() function, which they claim is generally
better than the other four, for most purposes. But they warn in their
comments that there is no such thing as "a hash function" -- there
must be one for each purpose.

So another experiment I might try is writing an adapter for a
user-supplied hash -- that might help in figuring out the role of the
hash function in bin overflows. That should be easy enough to do, but
fixing or re-writing the access method itself -- I'm sorry: the level
of complexity scares me. Appears like a couple man-months
(those Mythical Man-Months :).
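
A minimal sketch of what such an adapter experiment could look like outside the server, in C. Everything here is an assumption for illustration: the hash_fn type, max_bucket_load(), and the two sample hashes are hypothetical names, and this does not touch the real access method or its overflow-page logic. It only counts how many keys a given hash sends to the fullest bucket.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature for a user-supplied hash function. */
typedef uint32_t (*hash_fn)(const unsigned char *key, size_t len);

/*
 * Distribute nkeys NUL-terminated keys over nbuckets buckets using fn
 * and return the size of the fullest bucket. Returns 0 if nbuckets is 0
 * or the counter array cannot be allocated.
 */
static size_t max_bucket_load(const char *const *keys, size_t nkeys,
                              size_t nbuckets, hash_fn fn)
{
    size_t *counts;
    size_t worst = 0;

    if (nbuckets == 0)
        return 0;
    counts = calloc(nbuckets, sizeof *counts);
    if (counts == NULL)
        return 0;

    for (size_t i = 0; i < nkeys; i++) {
        const unsigned char *k = (const unsigned char *)keys[i];
        size_t b = fn(k, strlen(keys[i])) % nbuckets;

        if (++counts[b] > worst)
            worst = counts[b];
    }
    free(counts);
    return worst;
}

/* Deliberately poor sample hash: looks at the first byte only. */
static uint32_t hash_first_byte(const unsigned char *key, size_t len)
{
    return len > 0 ? key[0] : 0;
}

/* 32-bit FNV-1a, used here only as a generic comparison hash. */
static uint32_t hash_fnv1a(const unsigned char *key, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 16777619u;
    }
    return h;
}
```

Running max_bucket_load() over the same key set with different functions gives a rough measure of how much the hash function alone contributes to crowded buckets, independent of how the access method handles overflow.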

--Gene

Noname

nconway@klamath.dyndns.org

over 23 years ago

In reply to: Noname (#2)

Re: HASH: Out of overflow pages. Out of luck

On Wed, Aug 07, 2002 at 12:41:04AM -0500, selkovjr@xnet.com wrote:

Some of the data structures resemble each other's counterparts;
the only piece that is exactly the same is one of BerkeleyDB's
five hash functions.

FYI, the development version of PostgreSQL uses a completely
different (and higher quality) hash function.

The only useful experiment that I feel I am capable of making is
trying their __ham_hash5() function, which they claim is generally
better than the other four, for most purposes.

I'm skeptical that changing the hash function would make a significant
difference to the usability of hash indexes. At the very least, the
reproducible deadlocks under concurrent access need to be fixed, as well
as a host of other issues.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC