new hashing function

Started by Neil Conway in March 2002; 4 messages
#1 Neil Conway <nconway@klamath.dyndns.org>
1 attachment

Hi,

I've implemented Bob Jenkins' hash function for PostgreSQL; more
information on the hash function can be found at
http://burtleburtle.net/bob/hash/doobs.html

I'm posting this to -hackers because I'd like to get some feedback on
whether this actually improves performance. I've tested 2 situations
locally:

(1) pgbench, btree indexes (to test typical performance)

(2) pgbench, hash indexes

I haven't looked at the implementation of hash joins; if they happen to
use this hash function as well, that would be another informative
situation to benchmark.

In my local tests, the performance in (1) is basically the same, while
the performance in (2) is increased by 4% to 8%. If you could let me
know the results on your local setup, it should become clear whether
this patch is a performance win overall.

Note that to test case (2) properly, you'll need to drop and re-create
your pgbench indexes after you apply the patch (so that when the new
indexes are created, they use the new hash function). Also, it would be
a good idea to use concurrency level 1; at higher concurrency levels,
hash indexes have a tendency to deadlock (yes, I'm currently trying to
fix that too ;-) ).
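
For example, the rebuild could look something like this (the names here
are illustrative; use whatever table and index names your pgbench setup
actually created):

	-- rebuild the pgbench index on accounts as a hash index
	DROP INDEX accounts_pkey;
	CREATE INDEX accounts_aid_hash ON accounts USING hash (aid);
	-- then benchmark at concurrency level 1:
	--   $ pgbench -c 1 -t 1000 mydb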

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Attachment:

jenkins_hash-3.patch (text/plain; charset=ISO-8859-1)
Index: src/backend/access/hash/hash.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/hash/hash.c,v
retrieving revision 1.53
diff -c -r1.53 hash.c
*** src/backend/access/hash/hash.c	2001/10/25 05:49:20	1.53
--- src/backend/access/hash/hash.c	2002/03/03 05:50:37
***************
*** 165,173 ****
  	char	   *nulls = (char *) PG_GETARG_POINTER(2);
  	ItemPointer ht_ctid = (ItemPointer) PG_GETARG_POINTER(3);
  
- #ifdef NOT_USED
- 	Relation	heapRel = (Relation) PG_GETARG_POINTER(4);
- #endif
  	InsertIndexResult res;
  	HashItem	hitem;
  	IndexTuple	itup;
--- 165,170 ----
***************
*** 333,339 ****
  
  /*
   *	hashmarkpos() -- save current scan position
-  *
   */
  Datum
  hashmarkpos(PG_FUNCTION_ARGS)
--- 330,335 ----
Index: src/backend/access/hash/hashfunc.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/hash/hashfunc.c,v
retrieving revision 1.31
diff -c -r1.31 hashfunc.c
*** src/backend/access/hash/hashfunc.c	2002/02/25 04:06:47	1.31
--- src/backend/access/hash/hashfunc.c	2002/03/03 05:50:37
***************
*** 116,176 ****
  	return result;
  }
  
  
  /*
!  * hash_any --- compute a hash function for any specified chunk of memory
!  *
!  * This can be used as the underlying hash function for any pass-by-reference
!  * data type in which there are no non-significant bits.
!  *
!  * (Comment from the original db3 hashing code: )
!  *
!  * This is INCREDIBLY ugly, but fast.  We break the string up into 8 byte
!  * units.  On the first time through the loop we get the 'leftover bytes'
!  * (strlen % 8).  On every later iteration, we perform 8 HASHC's so we handle
!  * all 8 bytes.  Essentially, this saves us 7 cmp & branch instructions.  If
!  * this routine is heavily used enough, it's worth the ugly coding.
!  *
!  * "OZ's original sdbm hash"
   */
  Datum
! hash_any(const char *keydata, int keylen)
  {
! 	uint32		n;
! 	int			loop;
! 
! #define HASHC	n = *keydata++ + 65599 * n
! 
! 	n = 0;
! 	if (keylen > 0)
! 	{
! 		loop = (keylen + 8 - 1) >> 3;
! 
! 		switch (keylen & (8 - 1))
! 		{
! 			case 0:
! 				do
! 				{				/* All fall throughs */
! 					HASHC;
! 			case 7:
! 					HASHC;
! 			case 6:
! 					HASHC;
! 			case 5:
! 					HASHC;
! 			case 4:
! 					HASHC;
! 			case 3:
! 					HASHC;
! 			case 2:
! 					HASHC;
! 			case 1:
! 					HASHC;
! 				} while (--loop);
! 		}
! 	}
! 
! #undef HASHC
  
! 	PG_RETURN_UINT32(n);
  }
--- 116,201 ----
  	return result;
  }
  
+ /* This hash function was written by Bob Jenkins, and superficially
+  * adapted for PostgreSQL by Neil Conway. For more information on
+  * this hash function, see http://burtleburtle.net/bob/hash/doobs.html
+  */
+ 
+ /*
+  * mix -- mix 3 32-bit values reversibly.
+  * For every delta with one or two bits set, and the deltas of all three
+  * high bits or all three low bits, whether the original value of a,b,c
+  * is almost all zero or is uniformly distributed,
+  * - If mix() is run forward or backward, at least 32 bits in a,b,c
+  *   have at least 1/4 probability of changing.
+  * - If mix() is run forward, every bit of c will change between 1/3 and
+  *   2/3 of the time.  (Well, 22/100 and 78/100 for some 2-bit deltas.)
+  */
+ #define mix(a,b,c) \
+ { \
+   a -= b; a -= c; a ^= (c>>13); \
+   b -= c; b -= a; b ^= (a<<8); \
+   c -= a; c -= b; c ^= (b>>13); \
+   a -= b; a -= c; a ^= (c>>12);  \
+   b -= c; b -= a; b ^= (a<<16); \
+   c -= a; c -= b; c ^= (b>>5); \
+   a -= b; a -= c; a ^= (c>>3);  \
+   b -= c; b -= a; b ^= (a<<10); \
+   c -= a; c -= b; c ^= (b>>15); \
+ }
  
  /*
!  * hash_any() -- hash a variable-length key into a 32-bit value
!  *      k       : the key (the unaligned variable-length array of bytes)
!  *      len     : the length of the key, counting by bytes
!  * Returns a 32-bit value.  Every bit of the key affects every bit of
!  * the return value.  Every 1-bit and 2-bit delta achieves avalanche.
!  * About 6*len+35 instructions.
   */
  Datum
! hash_any(const char *k, int keylen)
  {
!    register Datum a,b,c,len;
  
!    /* Set up the internal state */
!    len = keylen;
!    a = b = 0x9e3779b9;  /* the golden ratio; an arbitrary value */
!    /* Another arbitrary value. If the hash function is called
!     * multiple times, this could be the previously generated
!     * hash value; however, the interface currently doesn't allow
!     * this. AFAIK this isn't a big deal.
!     */
!    c = 3923095;
! 
!    /* handle most of the key */
!    while (len >= 12)
!    {
!       a += (k[0] +((Datum)k[1]<<8) +((Datum)k[2]<<16) +((Datum)k[3]<<24));
!       b += (k[4] +((Datum)k[5]<<8) +((Datum)k[6]<<16) +((Datum)k[7]<<24));
!       c += (k[8] +((Datum)k[9]<<8) +((Datum)k[10]<<16)+((Datum)k[11]<<24));
!       mix(a,b,c);
!       k += 12; len -= 12;
!    }
! 
!    /* handle the last 11 bytes */
!    c += keylen;
!    switch(len)              /* all the case statements fall through */
!    {
!    case 11: c+=((Datum)k[10]<<24);
!    case 10: c+=((Datum)k[9]<<16);
!    case 9 : c+=((Datum)k[8]<<8);
!       /* the first byte of c is reserved for the length */
!    case 8 : b+=((Datum)k[7]<<24);
!    case 7 : b+=((Datum)k[6]<<16);
!    case 6 : b+=((Datum)k[5]<<8);
!    case 5 : b+=k[4];
!    case 4 : a+=((Datum)k[3]<<24);
!    case 3 : a+=((Datum)k[2]<<16);
!    case 2 : a+=((Datum)k[1]<<8);
!    case 1 : a+=k[0];
!      /* case 0: nothing left to add */
!    }
!    mix(a,b,c);
!    /* report the result */
!    return c;
  }
Index: src/backend/access/hash/hashinsert.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/hash/hashinsert.c,v
retrieving revision 1.23
diff -c -r1.23 hashinsert.c
*** src/backend/access/hash/hashinsert.c	2001/10/25 05:49:21	1.23
--- src/backend/access/hash/hashinsert.c	2002/03/03 05:50:38
***************
*** 17,24 ****
  
  #include "access/hash.h"
  
! static InsertIndexResult _hash_insertonpg(Relation rel, Buffer buf, int keysz, ScanKey scankey, HashItem hitem, Buffer metabuf);
! static OffsetNumber _hash_pgaddtup(Relation rel, Buffer buf, int keysz, ScanKey itup_scankey, Size itemsize, HashItem hitem);
  
  /*
   *	_hash_doinsert() -- Handle insertion of a single HashItem in the table.
--- 17,24 ----
  
  #include "access/hash.h"
  
! static InsertIndexResult _hash_insertonpg(Relation rel, Buffer buf, HashItem hitem, Buffer metabuf);
! static OffsetNumber _hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, HashItem hitem);
  
  /*
   *	_hash_doinsert() -- Handle insertion of a single HashItem in the table.
***************
*** 49,61 ****
  	itup = &(hitem->hash_itup);
  	if ((natts = rel->rd_rel->relnatts) != 1)
  		elog(ERROR, "Hash indices valid for only one index key.");
! 	itup_scankey = _hash_mkscankey(rel, itup, metap);
  
  	/*
  	 * find the first page in the bucket chain containing this key and
  	 * place it in buf.  _hash_search obtains a read lock for us.
  	 */
! 	_hash_search(rel, natts, itup_scankey, &buf, metap);
  	page = BufferGetPage(buf);
  	_hash_checkpage(page, LH_BUCKET_PAGE);
  
--- 49,61 ----
  	itup = &(hitem->hash_itup);
  	if ((natts = rel->rd_rel->relnatts) != 1)
  		elog(ERROR, "Hash indices valid for only one index key.");
! 	itup_scankey = _hash_mkscankey(rel, itup);
  
  	/*
  	 * find the first page in the bucket chain containing this key and
  	 * place it in buf.  _hash_search obtains a read lock for us.
  	 */
! 	_hash_search(rel, itup_scankey, &buf, metap);
  	page = BufferGetPage(buf);
  	_hash_checkpage(page, LH_BUCKET_PAGE);
  
***************
*** 78,85 ****
  	 */
  
  	/* do the insertion */
! 	res = _hash_insertonpg(rel, buf, natts, itup_scankey,
! 						   hitem, metabuf);
  
  	/* be tidy */
  	_hash_freeskey(itup_scankey);
--- 78,84 ----
  	 */
  
  	/* do the insertion */
! 	res = _hash_insertonpg(rel, buf, hitem, metabuf);
  
  	/* be tidy */
  	_hash_freeskey(itup_scankey);
***************
*** 103,110 ****
  static InsertIndexResult
  _hash_insertonpg(Relation rel,
  				 Buffer buf,
- 				 int keysz,
- 				 ScanKey scankey,
  				 HashItem hitem,
  				 Buffer metabuf)
  {
--- 102,107 ----
***************
*** 171,177 ****
  		Assert(pageopaque->hasho_bucket == bucket);
  	}
  
! 	itup_off = _hash_pgaddtup(rel, buf, keysz, scankey, itemsz, hitem);
  	itup_blkno = BufferGetBlockNumber(buf);
  
  	/* by here, the new tuple is inserted */
--- 168,174 ----
  		Assert(pageopaque->hasho_bucket == bucket);
  	}
  
! 	itup_off = _hash_pgaddtup(rel, buf, itemsz, hitem);
  	itup_blkno = BufferGetBlockNumber(buf);
  
  	/* by here, the new tuple is inserted */
***************
*** 214,221 ****
  static OffsetNumber
  _hash_pgaddtup(Relation rel,
  			   Buffer buf,
- 			   int keysz,
- 			   ScanKey itup_scankey,
  			   Size itemsize,
  			   HashItem hitem)
  {
--- 211,216 ----
***************
*** 232,238 ****
  			 RelationGetRelationName(rel));
  
  	/* write the buffer, but hold our lock */
! 	_hash_wrtnorelbuf(rel, buf);
  
  	return itup_off;
  }
--- 227,233 ----
  			 RelationGetRelationName(rel));
  
  	/* write the buffer, but hold our lock */
! 	_hash_wrtnorelbuf(buf);
  
  	return itup_off;
  }
Index: src/backend/access/hash/hashovfl.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/hash/hashovfl.c,v
retrieving revision 1.31
diff -c -r1.31 hashovfl.c
*** src/backend/access/hash/hashovfl.c	2001/10/25 05:49:21	1.31
--- src/backend/access/hash/hashovfl.c	2002/03/03 05:50:38
***************
*** 73,83 ****
  	ovflopaque->hasho_flag = LH_OVERFLOW_PAGE;
  	ovflopaque->hasho_oaddr = oaddr;
  	ovflopaque->hasho_bucket = pageopaque->hasho_bucket;
! 	_hash_wrtnorelbuf(rel, ovflbuf);
  
  	/* logically chain overflow page to previous page */
  	pageopaque->hasho_nextblkno = ovflblkno;
! 	_hash_wrtnorelbuf(rel, buf);
  	return ovflbuf;
  }
  
--- 73,83 ----
  	ovflopaque->hasho_flag = LH_OVERFLOW_PAGE;
  	ovflopaque->hasho_oaddr = oaddr;
  	ovflopaque->hasho_bucket = pageopaque->hasho_bucket;
! 	_hash_wrtnorelbuf(ovflbuf);
  
  	/* logically chain overflow page to previous page */
  	pageopaque->hasho_nextblkno = ovflblkno;
! 	_hash_wrtnorelbuf(buf);
  	return ovflbuf;
  }
  
***************
*** 574,580 ****
  		 * the "next" ItemId.
  		 */
  		PageIndexTupleDelete(rpage, roffnum);
! 		_hash_wrtnorelbuf(rel, rbuf);
  
  		/*
  		 * if the "read" page is now empty because of the deletion, free
--- 574,580 ----
  		 * the "next" ItemId.
  		 */
  		PageIndexTupleDelete(rpage, roffnum);
! 		_hash_wrtnorelbuf(rbuf);
  
  		/*
  		 * if the "read" page is now empty because of the deletion, free
Index: src/backend/access/hash/hashpage.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/hash/hashpage.c,v
retrieving revision 1.34
diff -c -r1.34 hashpage.c
*** src/backend/access/hash/hashpage.c	2002/01/15 22:14:16	1.34
--- src/backend/access/hash/hashpage.c	2002/03/03 05:50:38
***************
*** 151,157 ****
  		elog(ERROR, "Problem with _hash_initbitmap.");
  
  	/* all done */
! 	_hash_wrtnorelbuf(rel, metabuf);
  
  	/*
  	 * initialize the first two buckets
--- 151,157 ----
  		elog(ERROR, "Problem with _hash_initbitmap.");
  
  	/* all done */
! 	_hash_wrtnorelbuf(metabuf);
  
  	/*
  	 * initialize the first two buckets
***************
*** 260,266 ****
   *		or a reference to the buffer.
   */
  void
! _hash_wrtnorelbuf(Relation rel, Buffer buf)
  {
  	BlockNumber blkno;
  
--- 260,266 ----
   *		or a reference to the buffer.
   */
  void
! _hash_wrtnorelbuf(Buffer buf)
  {
  	BlockNumber blkno;
  
***************
*** 383,389 ****
  	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
  
  	PageIndexTupleDelete(page, offno);
! 	_hash_wrtnorelbuf(rel, buf);
  
  	if (PageIsEmpty(page) && (opaque->hasho_flag & LH_OVERFLOW_PAGE))
  	{
--- 383,389 ----
  	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
  
  	PageIndexTupleDelete(page, offno);
! 	_hash_wrtnorelbuf(buf);
  
  	if (PageIsEmpty(page) && (opaque->hasho_flag & LH_OVERFLOW_PAGE))
  	{
***************
*** 505,511 ****
  	nopaque->hasho_flag = LH_BUCKET_PAGE;
  	nopaque->hasho_oaddr = InvalidOvflAddress;
  	nopaque->hasho_bucket = nbucket;
! 	_hash_wrtnorelbuf(rel, nbuf);
  
  	/*
  	 * make sure the old bucket isn't empty.  advance 'opage' and friends
--- 505,511 ----
  	nopaque->hasho_flag = LH_BUCKET_PAGE;
  	nopaque->hasho_oaddr = InvalidOvflAddress;
  	nopaque->hasho_bucket = nbucket;
! 	_hash_wrtnorelbuf(nbuf);
  
  	/*
  	 * make sure the old bucket isn't empty.  advance 'opage' and friends
***************
*** 628,634 ****
  				== InvalidOffsetNumber)
  				elog(ERROR, "_hash_splitpage: failed to add index item to %s",
  					 RelationGetRelationName(rel));
! 			_hash_wrtnorelbuf(rel, nbuf);
  
  			/*
  			 * now delete the tuple from the old bucket.  after this
--- 628,634 ----
  				== InvalidOffsetNumber)
  				elog(ERROR, "_hash_splitpage: failed to add index item to %s",
  					 RelationGetRelationName(rel));
! 			_hash_wrtnorelbuf(nbuf);
  
  			/*
  			 * now delete the tuple from the old bucket.  after this
***************
*** 640,646 ****
  			 * instead of calling PageGetMaxOffsetNumber.
  			 */
  			PageIndexTupleDelete(opage, ooffnum);
! 			_hash_wrtnorelbuf(rel, obuf);
  			omaxoffnum = OffsetNumberPrev(omaxoffnum);
  
  			/*
--- 640,646 ----
  			 * instead of calling PageGetMaxOffsetNumber.
  			 */
  			PageIndexTupleDelete(opage, ooffnum);
! 			_hash_wrtnorelbuf(obuf);
  			omaxoffnum = OffsetNumberPrev(omaxoffnum);
  
  			/*
Index: src/backend/access/hash/hashsearch.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/hash/hashsearch.c,v
retrieving revision 1.27
diff -c -r1.27 hashsearch.c
*** src/backend/access/hash/hashsearch.c	2001/10/25 05:49:21	1.27
--- src/backend/access/hash/hashsearch.c	2002/03/03 05:50:38
***************
*** 24,30 ****
   */
  void
  _hash_search(Relation rel,
- 			 int keysz,
  			 ScanKey scankey,
  			 Buffer *bufP,
  			 HashMetaPage metap)
--- 24,29 ----
***************
*** 198,204 ****
  	 */
  
  	/* find the correct bucket page and load it into buf */
! 	_hash_search(rel, 1, scan->keyData, &buf, metap);
  	page = BufferGetPage(buf);
  	_hash_checkpage(page, LH_BUCKET_PAGE);
  	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
--- 197,203 ----
  	 */
  
  	/* find the correct bucket page and load it into buf */
! 	_hash_search(rel, scan->keyData, &buf, metap);
  	page = BufferGetPage(buf);
  	_hash_checkpage(page, LH_BUCKET_PAGE);
  	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
Index: src/backend/access/hash/hashutil.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/hash/hashutil.c,v
retrieving revision 1.27
diff -c -r1.27 hashutil.c
*** src/backend/access/hash/hashutil.c	2001/10/06 23:21:43	1.27
--- src/backend/access/hash/hashutil.c	2002/03/03 05:50:38
***************
*** 21,27 ****
  
  
  ScanKey
! _hash_mkscankey(Relation rel, IndexTuple itup, HashMetaPage metap)
  {
  	ScanKey		skey;
  	TupleDesc	itupdesc;
--- 21,27 ----
  
  
  ScanKey
! _hash_mkscankey(Relation rel, IndexTuple itup)
  {
  	ScanKey		skey;
  	TupleDesc	itupdesc;
Index: src/include/access/hash.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/hash.h,v
retrieving revision 1.43
diff -c -r1.43 hash.h
*** src/include/access/hash.h	2002/02/25 04:06:52	1.43
--- src/include/access/hash.h	2002/03/03 05:50:38
***************
*** 265,273 ****
  extern Datum hashint2vector(PG_FUNCTION_ARGS);
  extern Datum hashname(PG_FUNCTION_ARGS);
  extern Datum hashvarlena(PG_FUNCTION_ARGS);
! extern Datum hash_any(const char *keydata, int keylen);
  
- 
  /* private routines */
  
  /* hashinsert.c */
--- 265,272 ----
  extern Datum hashint2vector(PG_FUNCTION_ARGS);
  extern Datum hashname(PG_FUNCTION_ARGS);
  extern Datum hashvarlena(PG_FUNCTION_ARGS);
! extern Datum hash_any(const char *k, int keylen);
  
  /* private routines */
  
  /* hashinsert.c */
***************
*** 288,294 ****
  extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno, int access);
  extern void _hash_relbuf(Relation rel, Buffer buf, int access);
  extern void _hash_wrtbuf(Relation rel, Buffer buf);
! extern void _hash_wrtnorelbuf(Relation rel, Buffer buf);
  extern Page _hash_chgbufaccess(Relation rel, Buffer *bufp, int from_access,
  				   int to_access);
  extern void _hash_pageinit(Page page, Size size);
--- 287,293 ----
  extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno, int access);
  extern void _hash_relbuf(Relation rel, Buffer buf, int access);
  extern void _hash_wrtbuf(Relation rel, Buffer buf);
! extern void _hash_wrtnorelbuf(Buffer buf);
  extern Page _hash_chgbufaccess(Relation rel, Buffer *bufp, int from_access,
  				   int to_access);
  extern void _hash_pageinit(Page page, Size size);
***************
*** 304,310 ****
  
  
  /* hashsearch.c */
! extern void _hash_search(Relation rel, int keysz, ScanKey scankey,
  			 Buffer *bufP, HashMetaPage metap);
  extern RetrieveIndexResult _hash_next(IndexScanDesc scan, ScanDirection dir);
  extern RetrieveIndexResult _hash_first(IndexScanDesc scan, ScanDirection dir);
--- 303,309 ----
  
  
  /* hashsearch.c */
! extern void _hash_search(Relation rel, ScanKey scankey,
  			 Buffer *bufP, HashMetaPage metap);
  extern RetrieveIndexResult _hash_next(IndexScanDesc scan, ScanDirection dir);
  extern RetrieveIndexResult _hash_first(IndexScanDesc scan, ScanDirection dir);
***************
*** 313,320 ****
  
  
  /* hashutil.c */
! extern ScanKey _hash_mkscankey(Relation rel, IndexTuple itup,
! 				HashMetaPage metap);
  extern void _hash_freeskey(ScanKey skey);
  extern bool _hash_checkqual(IndexScanDesc scan, IndexTuple itup);
  extern HashItem _hash_formitem(IndexTuple itup);
--- 312,318 ----
  
  
  /* hashutil.c */
! extern ScanKey _hash_mkscankey(Relation rel, IndexTuple itup);
  extern void _hash_freeskey(ScanKey skey);
  extern bool _hash_checkqual(IndexScanDesc scan, IndexTuple itup);
  extern HashItem _hash_formitem(IndexTuple itup);
#2 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Neil Conway (#1)
Re: new hashing function

Neil Conway <nconway@klamath.dyndns.org> writes:
> I haven't looked at the implementation of hash joins; if they happen to
> use this hash function as well, that would be another informative
> situation to benchmark.

Hash joins use some chosen-at-random hashing code of their own; see
hashFunc() in src/backend/executor/nodeHash.c. One of the things on my
to-do list has been to replace that with the datatype-specific hash
functions used for indexing/caching, since the latter seem better
engineered (even before your improvements).
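
Roughly, the replacement would look something like this; just a sketch,
and the lookup helper here is hypothetical (the real work would be
fetching the hash support procedure for the datum's type from its
operator class):

	#include "postgres.h"
	#include "fmgr.h"

	/*
	 * Hypothetical sketch: hash a join key through the datatype-specific
	 * hash function instead of nodeHash.c's private hashFunc().
	 * get_hash_proc_for_type() is invented for illustration only.
	 */
	static uint32
	hash_join_key(Datum key, Oid typeoid)
	{
		FmgrInfo   *proc = get_hash_proc_for_type(typeoid); /* hypothetical */

		/* invoke the type's hash function, e.g. hashint4 or hashvarlena */
		return DatumGetUInt32(FunctionCall1(proc, key));
	}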

BTW, I don't particularly approve of the parts of this patch that
simply remove unused arguments from various routines. You aren't
going to save very many cycles that way, and you are reducing
flexibility (eg, the changes to remove nkeys would interfere with
trying to make hash indexes support multiple columns).

regards, tom lane

#3 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Neil Conway (#1)
Re: new hashing function

Neil Conway <nconway@klamath.dyndns.org> writes:
> I've implemented Bob Jenkins' hash function for PostgreSQL; more
> information on the hash function can be found at
> http://burtleburtle.net/bob/hash/doobs.html

One other thought --- presently, catcache.c is careful to use a prime
size (257) for its hash tables, so that reducing the raw hash value
mod 257 will allow all bits of the hash to contribute to determining
the hash bucket number. This is necessary because of the relatively
poor randomness of the hash functions. Perhaps with Jenkins' function
we could dispense with that, and use a fixed power-of-2 size so that the
division becomes a simple bit masking. On machines with slow integer
division, this could be a nice speedup. Wouldn't help for hash indexes
or joins though, since they don't use constant hashtable sizes.
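
To illustrate the difference (a minimal sketch; 257 and 256 are just the
sizes discussed above):

	#include <stdint.h>

	/* weak hash: a prime table size (257) lets every bit of the hash
	 * influence the bucket, at the cost of an integer division */
	static uint32_t bucket_prime(uint32_t hash)  { return hash % 257; }

	/* well-mixed hash (e.g. Jenkins'): a power-of-2 table size reduces
	 * bucket selection to a simple bit mask */
	static uint32_t bucket_masked(uint32_t hash) { return hash & (256 - 1); }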

regards, tom lane

#4 Neil Conway <nconway@klamath.dyndns.org>
In reply to: Tom Lane (#2)
Re: new hashing function

On Sun, 2002-03-03 at 12:31, Tom Lane wrote:

> Neil Conway <nconway@klamath.dyndns.org> writes:
> > I haven't looked at the implementation of hash joins; if they happen to
> > use this hash function as well, that would be another informative
> > situation to benchmark.
>
> Hash joins use some chosen-at-random hashing code of their own; see
> hashFunc() in src/backend/executor/nodeHash.c. One of the things on my
> to-do list has been to replace that with the datatype-specific hash
> functions used for indexing/caching, since the latter seem better
> engineered (even before your improvements).

Okay, I'll implement this.

> BTW, I don't particularly approve of the parts of this patch that
> simply remove unused arguments from various routines. You aren't
> going to save very many cycles that way, and you are reducing
> flexibility (eg, the changes to remove nkeys would interfere with
> trying to make hash indexes support multiple columns).

Hmmm... I had viewed removing extra, unused arguments to functions as
basically code cleanup. But I see your point -- although really, the
future purpose behind the extra arguments should probably be
documented... I'll review my changes and revert the removals where the
arguments seem to have some future benefit.

Thanks for the feedback, Tom.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC