[PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Started by Zdenek Kotala over 16 years ago, 33 messages
#1 Zdenek Kotala
Zdenek.Kotala@Sun.COM
1 attachment(s)

The attached patch cleans up the hash index headers so that the hash
access method (hasham) can be compiled for the 8.3 version. This would
give pg_migrator the ability to migrate a database that contains hash
indexes without reindexing them.

I discussed this patch with Alvaro a year ago, when we were trying to
clean up the include-bloat problem. It should also reduce the number of
includes.

The main change is that the hash functions for individual datatypes now
live in the source files of the corresponding datatypes in the utils/adt
directory, and hash_any() and hash_uint32() are now in
utils/hashfunc.c.
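For readers who want to see what the relocated helpers compute, here is a minimal standalone sketch transcribed from the hashfunc.c code that the patch removes from access/hash. It is not the patch's code: it uses <stdint.h> types instead of PostgreSQL's uint32/Datum and fmgr macros, the names sketch_hash_any(), sketch_hash_uint32(), sketch_hashint8() and check_hash_properties() are hypothetical, and it always reads bytes in little-endian order with no aligned word-at-a-time fast path (so its values match the original only on little-endian machines). The INT64_IS_BUSTED fallback is also omitted. It illustrates two properties stated in the removed comments: hash_uint32(k) equals hash_any() over k's four bytes, and hashint8's high-half fold makes int8 values within int4 range hash exactly like hashint4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by k bits (0 < k < 32). */
#define ROT(x, k) (((x) << (k)) | ((x) >> (32 - (k))))

/* Reversible mixing of three 32-bit words, as in the removed mix(). */
#define MIX(a, b, c) do { \
    a -= c; a ^= ROT(c, 4);  c += b; \
    b -= a; b ^= ROT(a, 6);  a += c; \
    c -= b; c ^= ROT(b, 8);  b += a; \
    a -= c; a ^= ROT(c, 16); c += b; \
    b -= a; b ^= ROT(a, 19); a += c; \
    c -= b; c ^= ROT(b, 4);  b += a; \
} while (0)

/* Final avalanche of (a,b,c) into c, as in the removed final(). */
#define FINAL(a, b, c) do { \
    c ^= b; c -= ROT(b, 14); \
    a ^= c; a -= ROT(c, 11); \
    b ^= a; b -= ROT(a, 25); \
    c ^= b; c -= ROT(b, 16); \
    a ^= c; a -= ROT(c, 4);  \
    b ^= a; b -= ROT(a, 14); \
    c ^= b; c -= ROT(b, 24); \
} while (0)

/* Read 4 bytes as a little-endian 32-bit word. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical stand-in for hash_any(); little-endian byte order only. */
static uint32_t sketch_hash_any(const unsigned char *k, size_t keylen)
{
    uint32_t len = (uint32_t) keylen;
    uint32_t a, b, c;

    a = b = c = 0x9e3779b9u + len + 3923095u;
    while (len >= 12)                   /* consume full 12-byte blocks */
    {
        a += load_le32(k);
        b += load_le32(k + 4);
        c += load_le32(k + 8);
        MIX(a, b, c);
        k += 12;
        len -= 12;
    }
    switch (len)                        /* last 0..11 bytes; cases fall through */
    {
        case 11: c += (uint32_t) k[10] << 24;
        case 10: c += (uint32_t) k[9] << 16;
        case 9:  c += (uint32_t) k[8] << 8;
        case 8:  b += (uint32_t) k[7] << 24;
        case 7:  b += (uint32_t) k[6] << 16;
        case 6:  b += (uint32_t) k[5] << 8;
        case 5:  b += k[4];
        case 4:  a += (uint32_t) k[3] << 24;
        case 3:  a += (uint32_t) k[2] << 16;
        case 2:  a += (uint32_t) k[1] << 8;
        case 1:  a += k[0];
        default: break;                 /* len == 0: nothing left to add */
    }
    FINAL(a, b, c);
    return c;
}

/* Hypothetical stand-in for hash_uint32(): one word, no memory access. */
static uint32_t sketch_hash_uint32(uint32_t k)
{
    uint32_t a, b, c;

    a = b = c = 0x9e3779b9u + (uint32_t) sizeof(uint32_t) + 3923095u;
    a += k;
    FINAL(a, b, c);
    return c;
}

/* Hypothetical stand-in for hashint8(): fold the high half into the low. */
static uint32_t sketch_hashint8(int64_t val)
{
    uint32_t lohalf = (uint32_t) val;
    uint32_t hihalf = (uint32_t) ((uint64_t) val >> 32);

    /* For values in int4 range hihalf is 0 or ~0, so the fold is a no-op. */
    lohalf ^= (val >= 0) ? hihalf : ~hihalf;
    return sketch_hash_uint32(lohalf);
}

/* Assert both properties for one int4 value. */
static void check_hash_properties(int32_t x)
{
    uint32_t k = (uint32_t) x;
    unsigned char bytes[4] = {
        (unsigned char) k, (unsigned char) (k >> 8),
        (unsigned char) (k >> 16), (unsigned char) (k >> 24)
    };

    assert(sketch_hash_uint32(k) == sketch_hash_any(bytes, sizeof bytes));
    assert(sketch_hashint8((int64_t) x) == sketch_hash_uint32(k));
}
```

Calling check_hash_properties() for a few int4 values, including the extremes, exercises both properties. Note that the fold is not injective: for example, 2^32 as an int8 hashes the same as 1. Because the sketch drops the aligned fast path and the big-endian byte order of the original, it is only an illustration, not a replacement for the functions the patch moves.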

It would be nice to have this in 8.4, because it would allow the index
migration functionality to be tested.

Thanks,
Zdenek

Attachments:

hash.patch (text/x-patch; charset=US-ASCII)
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/access/hash/hashfunc.c pgsql_indexcompat/src/backend/access/hash/hashfunc.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/access/hash/hashfunc.c	2009-05-22 15:56:34.409314434 -0400
--- pgsql_indexcompat/src/backend/access/hash/hashfunc.c	1969-12-31 19:00:00.000000000 -0500
***************
*** 1,528 ****
- /*-------------------------------------------------------------------------
-  *
-  * hashfunc.c
-  *	  Support functions for hash access method.
-  *
-  * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
-  * Portions Copyright (c) 1994, Regents of the University of California
-  *
-  *
-  * IDENTIFICATION
-  *	  $PostgreSQL: pgsql/src/backend/access/hash/hashfunc.c,v 1.57 2009/01/01 17:23:35 momjian Exp $
-  *
-  * NOTES
-  *	  These functions are stored in pg_amproc.	For each operator class
-  *	  defined for hash indexes, they compute the hash value of the argument.
-  *
-  *	  Additional hash functions appear in /utils/adt/ files for various
-  *	  specialized datatypes.
-  *
-  *	  It is expected that every bit of a hash function's 32-bit result is
-  *	  as random as every other; failure to ensure this is likely to lead
-  *	  to poor performance of hash joins, for example.  In most cases a hash
-  *	  function should use hash_any() or its variant hash_uint32().
-  *-------------------------------------------------------------------------
-  */
- 
- #include "postgres.h"
- 
- #include "access/hash.h"
- 
- 
- /* Note: this is used for both "char" and boolean datatypes */
- Datum
- hashchar(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32((int32) PG_GETARG_CHAR(0));
- }
- 
- Datum
- hashint2(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32((int32) PG_GETARG_INT16(0));
- }
- 
- Datum
- hashint4(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32(PG_GETARG_INT32(0));
- }
- 
- Datum
- hashint8(PG_FUNCTION_ARGS)
- {
- 	/*
- 	 * The idea here is to produce a hash value compatible with the values
- 	 * produced by hashint4 and hashint2 for logically equal inputs; this is
- 	 * necessary to support cross-type hash joins across these input types.
- 	 * Since all three types are signed, we can xor the high half of the int8
- 	 * value if the sign is positive, or the complement of the high half when
- 	 * the sign is negative.
- 	 */
- #ifndef INT64_IS_BUSTED
- 	int64		val = PG_GETARG_INT64(0);
- 	uint32		lohalf = (uint32) val;
- 	uint32		hihalf = (uint32) (val >> 32);
- 
- 	lohalf ^= (val >= 0) ? hihalf : ~hihalf;
- 
- 	return hash_uint32(lohalf);
- #else
- 	/* here if we can't count on "x >> 32" to work sanely */
- 	return hash_uint32((int32) PG_GETARG_INT64(0));
- #endif
- }
- 
- Datum
- hashoid(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32((uint32) PG_GETARG_OID(0));
- }
- 
- Datum
- hashenum(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32((uint32) PG_GETARG_OID(0));
- }
- 
- Datum
- hashfloat4(PG_FUNCTION_ARGS)
- {
- 	float4		key = PG_GETARG_FLOAT4(0);
- 	float8		key8;
- 
- 	/*
- 	 * On IEEE-float machines, minus zero and zero have different bit patterns
- 	 * but should compare as equal.  We must ensure that they have the same
- 	 * hash value, which is most reliably done this way:
- 	 */
- 	if (key == (float4) 0)
- 		PG_RETURN_UINT32(0);
- 
- 	/*
- 	 * To support cross-type hashing of float8 and float4, we want to return
- 	 * the same hash value hashfloat8 would produce for an equal float8 value.
- 	 * So, widen the value to float8 and hash that.  (We must do this rather
- 	 * than have hashfloat8 try to narrow its value to float4; that could fail
- 	 * on overflow.)
- 	 */
- 	key8 = key;
- 
- 	return hash_any((unsigned char *) &key8, sizeof(key8));
- }
- 
- Datum
- hashfloat8(PG_FUNCTION_ARGS)
- {
- 	float8		key = PG_GETARG_FLOAT8(0);
- 
- 	/*
- 	 * On IEEE-float machines, minus zero and zero have different bit patterns
- 	 * but should compare as equal.  We must ensure that they have the same
- 	 * hash value, which is most reliably done this way:
- 	 */
- 	if (key == (float8) 0)
- 		PG_RETURN_UINT32(0);
- 
- 	return hash_any((unsigned char *) &key, sizeof(key));
- }
- 
- Datum
- hashoidvector(PG_FUNCTION_ARGS)
- {
- 	oidvector  *key = (oidvector *) PG_GETARG_POINTER(0);
- 
- 	return hash_any((unsigned char *) key->values, key->dim1 * sizeof(Oid));
- }
- 
- Datum
- hashint2vector(PG_FUNCTION_ARGS)
- {
- 	int2vector *key = (int2vector *) PG_GETARG_POINTER(0);
- 
- 	return hash_any((unsigned char *) key->values, key->dim1 * sizeof(int2));
- }
- 
- Datum
- hashname(PG_FUNCTION_ARGS)
- {
- 	char	   *key = NameStr(*PG_GETARG_NAME(0));
- 	int			keylen = strlen(key);
- 
- 	Assert(keylen < NAMEDATALEN);		/* else it's not truncated correctly */
- 
- 	return hash_any((unsigned char *) key, keylen);
- }
- 
- Datum
- hashtext(PG_FUNCTION_ARGS)
- {
- 	text	   *key = PG_GETARG_TEXT_PP(0);
- 	Datum		result;
- 
- 	/*
- 	 * Note: this is currently identical in behavior to hashvarlena, but keep
- 	 * it as a separate function in case we someday want to do something
- 	 * different in non-C locales.	(See also hashbpchar, if so.)
- 	 */
- 	result = hash_any((unsigned char *) VARDATA_ANY(key),
- 					  VARSIZE_ANY_EXHDR(key));
- 
- 	/* Avoid leaking memory for toasted inputs */
- 	PG_FREE_IF_COPY(key, 0);
- 
- 	return result;
- }
- 
- /*
-  * hashvarlena() can be used for any varlena datatype in which there are
-  * no non-significant bits, ie, distinct bitpatterns never compare as equal.
-  */
- Datum
- hashvarlena(PG_FUNCTION_ARGS)
- {
- 	struct varlena *key = PG_GETARG_VARLENA_PP(0);
- 	Datum		result;
- 
- 	result = hash_any((unsigned char *) VARDATA_ANY(key),
- 					  VARSIZE_ANY_EXHDR(key));
- 
- 	/* Avoid leaking memory for toasted inputs */
- 	PG_FREE_IF_COPY(key, 0);
- 
- 	return result;
- }
- 
- /*
-  * This hash function was written by Bob Jenkins
-  * (bob_jenkins@burtleburtle.net), and superficially adapted
-  * for PostgreSQL by Neil Conway. For more information on this
-  * hash function, see http://burtleburtle.net/bob/hash/doobs.html,
-  * or Bob's article in Dr. Dobb's Journal, Sept. 1997.
-  *
-  * In the current code, we have adopted Bob's 2006 update of his hash
-  * function to fetch the data a word at a time when it is suitably aligned.
-  * This makes for a useful speedup, at the cost of having to maintain
-  * four code paths (aligned vs unaligned, and little-endian vs big-endian).
-  * It also uses two separate mixing functions mix() and final(), instead
-  * of a slower multi-purpose function.
-  */
- 
- /* Get a bit mask of the bits set in non-uint32 aligned addresses */
- #define UINT32_ALIGN_MASK (sizeof(uint32) - 1)
- 
- /* Rotate a uint32 value left by k bits - note multiple evaluation! */
- #define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
- 
- /*----------
-  * mix -- mix 3 32-bit values reversibly.
-  *
-  * This is reversible, so any information in (a,b,c) before mix() is
-  * still in (a,b,c) after mix().
-  *
-  * If four pairs of (a,b,c) inputs are run through mix(), or through
-  * mix() in reverse, there are at least 32 bits of the output that
-  * are sometimes the same for one pair and different for another pair.
-  * This was tested for:
-  * * pairs that differed by one bit, by two bits, in any combination
-  *   of top bits of (a,b,c), or in any combination of bottom bits of
-  *   (a,b,c).
-  * * "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
-  *   the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
-  *   is commonly produced by subtraction) look like a single 1-bit
-  *   difference.
-  * * the base values were pseudorandom, all zero but one bit set, or
-  *   all zero plus a counter that starts at zero.
-  * 
-  * This does not achieve avalanche.  There are input bits of (a,b,c)
-  * that fail to affect some output bits of (a,b,c), especially of a.  The
-  * most thoroughly mixed value is c, but it doesn't really even achieve
-  * avalanche in c. 
-  * 
-  * This allows some parallelism.  Read-after-writes are good at doubling
-  * the number of bits affected, so the goal of mixing pulls in the opposite
-  * direction from the goal of parallelism.  I did what I could.  Rotates
-  * seem to cost as much as shifts on every machine I could lay my hands on,
-  * and rotates are much kinder to the top and bottom bits, so I used rotates.
-  *----------
-  */
- #define mix(a,b,c) \
- { \
-   a -= c;  a ^= rot(c, 4);  c += b; \
-   b -= a;  b ^= rot(a, 6);  a += c; \
-   c -= b;  c ^= rot(b, 8);  b += a; \
-   a -= c;  a ^= rot(c,16);  c += b; \
-   b -= a;  b ^= rot(a,19);  a += c; \
-   c -= b;  c ^= rot(b, 4);  b += a; \
- }
- 
- /*----------
-  * final -- final mixing of 3 32-bit values (a,b,c) into c
-  *
-  * Pairs of (a,b,c) values differing in only a few bits will usually
-  * produce values of c that look totally different.  This was tested for
-  * * pairs that differed by one bit, by two bits, in any combination
-  *   of top bits of (a,b,c), or in any combination of bottom bits of
-  *   (a,b,c).
-  * * "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
-  *   the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
-  *   is commonly produced by subtraction) look like a single 1-bit
-  *   difference.
-  * * the base values were pseudorandom, all zero but one bit set, or
-  *   all zero plus a counter that starts at zero.
-  *     
-  * The use of separate functions for mix() and final() allow for a
-  * substantial performance increase since final() does not need to
-  * do well in reverse, but is does need to affect all output bits.
-  * mix(), on the other hand, does not need to affect all output
-  * bits (affecting 32 bits is enough).  The original hash function had
-  * a single mixing operation that had to satisfy both sets of requirements
-  * and was slower as a result.
-  *----------
-  */
- #define final(a,b,c) \
- { \
-   c ^= b; c -= rot(b,14); \
-   a ^= c; a -= rot(c,11); \
-   b ^= a; b -= rot(a,25); \
-   c ^= b; c -= rot(b,16); \
-   a ^= c; a -= rot(c, 4); \
-   b ^= a; b -= rot(a,14); \
-   c ^= b; c -= rot(b,24); \
- }
- 
- /*
-  * hash_any() -- hash a variable-length key into a 32-bit value
-  *		k		: the key (the unaligned variable-length array of bytes)
-  *		len		: the length of the key, counting by bytes
-  *
-  * Returns a uint32 value.	Every bit of the key affects every bit of
-  * the return value.  Every 1-bit and 2-bit delta achieves avalanche.
-  * About 6*len+35 instructions. The best hash table sizes are powers
-  * of 2.  There is no need to do mod a prime (mod is sooo slow!).
-  * If you need less than 32 bits, use a bitmask.
-  *
-  * Note: we could easily change this function to return a 64-bit hash value
-  * by using the final values of both b and c.  b is perhaps a little less
-  * well mixed than c, however.
-  */
- Datum
- hash_any(register const unsigned char *k, register int keylen)
- {
- 	register uint32 a,
- 				b,
- 				c,
- 				len;
- 
- 	/* Set up the internal state */
- 	len = keylen;
- 	a = b = c = 0x9e3779b9 + len + 3923095;
- 
- 	/* If the source pointer is word-aligned, we use word-wide fetches */
- 	if (((long) k & UINT32_ALIGN_MASK) == 0)
- 	{
- 		/* Code path for aligned source data */
- 		register const uint32 *ka = (const uint32 *) k;
- 
- 		/* handle most of the key */
- 		while (len >= 12)
- 		{
- 			a += ka[0];
- 			b += ka[1];
- 			c += ka[2];
- 			mix(a, b, c);
- 			ka += 3;
- 			len -= 12;
- 		}
- 
- 		/* handle the last 11 bytes */
- 		k = (const unsigned char *) ka;
- #ifdef WORDS_BIGENDIAN
- 		switch (len)
- 		{
- 			case 11:
- 				c += ((uint32) k[10] << 8);
- 				/* fall through */
- 			case 10:
- 				c += ((uint32) k[9] << 16);
- 				/* fall through */
- 			case 9:
- 				c += ((uint32) k[8] << 24);
- 				/* the lowest byte of c is reserved for the length */
- 				/* fall through */
- 			case 8:
- 				b += ka[1];
- 				a += ka[0];
- 				break;
- 			case 7:
- 				b += ((uint32) k[6] << 8);
- 				/* fall through */
- 			case 6:
- 				b += ((uint32) k[5] << 16);
- 				/* fall through */
- 			case 5:
- 				b += ((uint32) k[4] << 24);
- 				/* fall through */
- 			case 4:
- 				a += ka[0];
- 				break;
- 			case 3:
- 				a += ((uint32) k[2] << 8);
- 				/* fall through */
- 			case 2:
- 				a += ((uint32) k[1] << 16);
- 				/* fall through */
- 			case 1:
- 				a += ((uint32) k[0] << 24);
- 			/* case 0: nothing left to add */
- 		}
- #else /* !WORDS_BIGENDIAN */
- 		switch (len)
- 		{
- 			case 11:
- 				c += ((uint32) k[10] << 24);
- 				/* fall through */
- 			case 10:
- 				c += ((uint32) k[9] << 16);
- 				/* fall through */
- 			case 9:
- 				c += ((uint32) k[8] << 8);
- 				/* the lowest byte of c is reserved for the length */
- 				/* fall through */
- 			case 8:
- 				b += ka[1];
- 				a += ka[0];
- 				break;
- 			case 7:
- 				b += ((uint32) k[6] << 16);
- 				/* fall through */
- 			case 6:
- 				b += ((uint32) k[5] << 8);
- 				/* fall through */
- 			case 5:
- 				b += k[4];
- 				/* fall through */
- 			case 4:
- 				a += ka[0];
- 				break;
- 			case 3:
- 				a += ((uint32) k[2] << 16);
- 				/* fall through */
- 			case 2:
- 				a += ((uint32) k[1] << 8);
- 				/* fall through */
- 			case 1:
- 				a += k[0];
- 			/* case 0: nothing left to add */
- 		}
- #endif /* WORDS_BIGENDIAN */
- 	}
- 	else
- 	{
- 		/* Code path for non-aligned source data */
- 
- 		/* handle most of the key */
- 		while (len >= 12)
- 		{
- #ifdef WORDS_BIGENDIAN
- 			a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));
- 			b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));
- 			c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));
- #else /* !WORDS_BIGENDIAN */
- 			a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
- 			b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
- 			c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
- #endif /* WORDS_BIGENDIAN */
- 			mix(a, b, c);
- 			k += 12;
- 			len -= 12;
- 		}
- 
- 		/* handle the last 11 bytes */
- #ifdef WORDS_BIGENDIAN
- 		switch (len)			/* all the case statements fall through */
- 		{
- 			case 11:
- 				c += ((uint32) k[10] << 8);
- 			case 10:
- 				c += ((uint32) k[9] << 16);
- 			case 9:
- 				c += ((uint32) k[8] << 24);
- 				/* the lowest byte of c is reserved for the length */
- 			case 8:
- 				b += k[7];
- 			case 7:
- 				b += ((uint32) k[6] << 8);
- 			case 6:
- 				b += ((uint32) k[5] << 16);
- 			case 5:
- 				b += ((uint32) k[4] << 24);
- 			case 4:
- 				a += k[3];
- 			case 3:
- 				a += ((uint32) k[2] << 8);
- 			case 2:
- 				a += ((uint32) k[1] << 16);
- 			case 1:
- 				a += ((uint32) k[0] << 24);
- 			/* case 0: nothing left to add */
- 		}
- #else /* !WORDS_BIGENDIAN */
- 		switch (len)			/* all the case statements fall through */
- 		{
- 			case 11:
- 				c += ((uint32) k[10] << 24);
- 			case 10:
- 				c += ((uint32) k[9] << 16);
- 			case 9:
- 				c += ((uint32) k[8] << 8);
- 				/* the lowest byte of c is reserved for the length */
- 			case 8:
- 				b += ((uint32) k[7] << 24);
- 			case 7:
- 				b += ((uint32) k[6] << 16);
- 			case 6:
- 				b += ((uint32) k[5] << 8);
- 			case 5:
- 				b += k[4];
- 			case 4:
- 				a += ((uint32) k[3] << 24);
- 			case 3:
- 				a += ((uint32) k[2] << 16);
- 			case 2:
- 				a += ((uint32) k[1] << 8);
- 			case 1:
- 				a += k[0];
- 			/* case 0: nothing left to add */
- 		}
- #endif /* WORDS_BIGENDIAN */
- 	}
- 
- 	final(a, b, c);
- 
- 	/* report the result */
- 	return UInt32GetDatum(c);
- }
- 
- /*
-  * hash_uint32() -- hash a 32-bit value
-  *
-  * This has the same result as
-  *		hash_any(&k, sizeof(uint32))
-  * but is faster and doesn't force the caller to store k into memory.
-  */
- Datum
- hash_uint32(uint32 k)
- {
- 	register uint32 a,
- 				b,
- 				c;
- 
- 	a = b = c = 0x9e3779b9 + (uint32) sizeof(uint32) + 3923095;
- 	a += k;
- 
- 	final(a, b, c);
- 
- 	/* report the result */
- 	return UInt32GetDatum(c);
- }
--- 0 ----
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/access/hash/Makefile pgsql_indexcompat/src/backend/access/hash/Makefile
*** pgsql_indexcompat.5d4d60e3a557/src/backend/access/hash/Makefile	2009-05-22 15:56:34.353808065 -0400
--- pgsql_indexcompat/src/backend/access/hash/Makefile	2009-05-22 15:56:34.409876088 -0400
***************
*** 12,18 ****
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = hash.o hashfunc.o hashinsert.o hashovfl.o hashpage.o hashscan.o \
         hashsearch.o hashsort.o hashutil.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,18 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = hash.o hashinsert.o hashovfl.o hashpage.o hashscan.o \
         hashsearch.o hashsort.o hashutil.o
  
  include $(top_srcdir)/src/backend/common.mk
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/nodes/bitmapset.c pgsql_indexcompat/src/backend/nodes/bitmapset.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/nodes/bitmapset.c	2009-05-22 15:56:34.355138078 -0400
--- pgsql_indexcompat/src/backend/nodes/bitmapset.c	2009-05-22 15:56:34.410159011 -0400
***************
*** 21,27 ****
  #include "postgres.h"
  
  #include "nodes/bitmapset.h"
! #include "access/hash.h"
  
  
  #define WORDNUM(x)	((x) / BITS_PER_BITMAPWORD)
--- 21,27 ----
  #include "postgres.h"
  
  #include "nodes/bitmapset.h"
! #include "utils/hashfunc.h"
  
  
  #define WORDNUM(x)	((x) / BITS_PER_BITMAPWORD)
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/tsearch/ts_typanalyze.c pgsql_indexcompat/src/backend/tsearch/ts_typanalyze.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/tsearch/ts_typanalyze.c	2009-05-22 15:56:34.355928139 -0400
--- pgsql_indexcompat/src/backend/tsearch/ts_typanalyze.c	2009-05-22 15:56:34.410422878 -0400
***************
*** 13,24 ****
   */
  #include "postgres.h"
  
- #include "access/hash.h"
  #include "catalog/pg_operator.h"
  #include "commands/vacuum.h"
  #include "tsearch/ts_type.h"
  #include "utils/builtins.h"
  #include "utils/hsearch.h"
  
  
  /* A hash key for lexemes */
--- 13,24 ----
   */
  #include "postgres.h"
  
  #include "catalog/pg_operator.h"
  #include "commands/vacuum.h"
  #include "tsearch/ts_type.h"
  #include "utils/builtins.h"
  #include "utils/hsearch.h"
+ #include "utils/hashfunc.h"
  
  
  /* A hash key for lexemes */
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/date.c pgsql_indexcompat/src/backend/utils/adt/date.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/date.c	2009-05-22 15:56:34.360806325 -0400
--- pgsql_indexcompat/src/backend/utils/adt/date.c	2009-05-22 15:56:34.411013646 -0400
***************
*** 20,32 ****
  #include <float.h>
  #include <time.h>
  
- #include "access/hash.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
  #include "parser/scansup.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/date.h"
  #include "utils/nabstime.h"
  
  /*
--- 20,32 ----
  #include <float.h>
  #include <time.h>
  
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
  #include "parser/scansup.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/date.h"
+ #include "utils/hashfunc.h"
  #include "utils/nabstime.h"
  
  /*
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/enum.c pgsql_indexcompat/src/backend/utils/adt/enum.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/enum.c	2009-05-22 15:56:34.361377758 -0400
--- pgsql_indexcompat/src/backend/utils/adt/enum.c	2009-05-22 15:56:34.411175195 -0400
***************
*** 17,22 ****
--- 17,23 ----
  #include "fmgr.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
  #include "libpq/pqformat.h"
***************
*** 433,435 ****
--- 434,442 ----
  		return 1;
  	return 0;
  }
+ 
+ Datum
+ hashenum(PG_FUNCTION_ARGS)
+ {
+ 	return hash_uint32((uint32) PG_GETARG_OID(0));
+ }
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/float.c pgsql_indexcompat/src/backend/utils/adt/float.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/float.c	2009-05-22 15:56:34.367021659 -0400
--- pgsql_indexcompat/src/backend/utils/adt/float.c	2009-05-22 15:56:34.411406466 -0400
***************
*** 23,28 ****
--- 23,29 ----
  #include "libpq/pqformat.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  
  
  #ifndef M_PI
***************
*** 2745,2750 ****
--- 2746,2793 ----
  	PG_RETURN_INT32(result);
  }
  
+ Datum
+ hashfloat4(PG_FUNCTION_ARGS)
+ {
+ 	float4		key = PG_GETARG_FLOAT4(0);
+ 	float8		key8;
+ 
+ 	/*
+ 	 * On IEEE-float machines, minus zero and zero have different bit patterns
+ 	 * but should compare as equal.  We must ensure that they have the same
+ 	 * hash value, which is most reliably done this way:
+ 	 */
+ 	if (key == (float4) 0)
+ 		PG_RETURN_UINT32(0);
+ 
+ 	/*
+ 	 * To support cross-type hashing of float8 and float4, we want to return
+ 	 * the same hash value hashfloat8 would produce for an equal float8 value.
+ 	 * So, widen the value to float8 and hash that.  (We must do this rather
+ 	 * than have hashfloat8 try to narrow its value to float4; that could fail
+ 	 * on overflow.)
+ 	 */
+ 	key8 = key;
+ 
+ 	return hash_any((unsigned char *) &key8, sizeof(key8));
+ }
+ 
+ Datum
+ hashfloat8(PG_FUNCTION_ARGS)
+ {
+ 	float8		key = PG_GETARG_FLOAT8(0);
+ 
+ 	/*
+ 	 * On IEEE-float machines, minus zero and zero have different bit patterns
+ 	 * but should compare as equal.  We must ensure that they have the same
+ 	 * hash value, which is most reliably done this way:
+ 	 */
+ 	if (key == (float8) 0)
+ 		PG_RETURN_UINT32(0);
+ 
+ 	return hash_any((unsigned char *) &key, sizeof(key));
+ }
+ 
  /* ========== PRIVATE ROUTINES ========== */
  
  #ifndef HAVE_CBRT
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/char.c pgsql_indexcompat/src/backend/utils/adt/char.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/char.c	2009-05-22 15:56:34.356602890 -0400
--- pgsql_indexcompat/src/backend/utils/adt/char.c	2009-05-22 15:56:34.410785720 -0400
***************
*** 19,24 ****
--- 19,25 ----
  
  #include "libpq/pqformat.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  
  /*****************************************************************************
   *	 USER I/O ROUTINES														 *
***************
*** 211,213 ****
--- 212,221 ----
  
  	PG_RETURN_TEXT_P(result);
  }
+ 
+ /* Note: this is used for both "char" and boolean datatypes */
+ Datum
+ hashchar(PG_FUNCTION_ARGS)
+ {
+ 	return hash_uint32((int32) PG_GETARG_CHAR(0));
+ }
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/int.c pgsql_indexcompat/src/backend/utils/adt/int.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/int.c	2009-05-22 15:56:34.369838628 -0400
--- pgsql_indexcompat/src/backend/utils/adt/int.c	2009-05-22 15:56:34.411593455 -0400
***************
*** 36,41 ****
--- 36,42 ----
  #include "libpq/pqformat.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  
  
  #define SAMESIGN(a,b)	(((a) < 0) == ((b) < 0))
***************
*** 1353,1355 ****
--- 1354,1376 ----
  		/* do when there is no more left */
  		SRF_RETURN_DONE(funcctx);
  }
+ 
+ Datum
+ hashint2(PG_FUNCTION_ARGS)
+ {
+ 	return hash_uint32((int32) PG_GETARG_INT16(0));
+ }
+ 
+ Datum
+ hashint4(PG_FUNCTION_ARGS)
+ {
+ 	return hash_uint32(PG_GETARG_INT32(0));
+ }
+ 
+ Datum
+ hashint2vector(PG_FUNCTION_ARGS)
+ {
+ 	int2vector *key = (int2vector *) PG_GETARG_POINTER(0);
+ 
+ 	return hash_any((unsigned char *) key->values, key->dim1 * sizeof(int2));
+ }
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/int8.c pgsql_indexcompat/src/backend/utils/adt/int8.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/int8.c	2009-05-22 15:56:34.372307503 -0400
--- pgsql_indexcompat/src/backend/utils/adt/int8.c	2009-05-22 15:56:34.411779391 -0400
***************
*** 21,26 ****
--- 21,27 ----
  #include "libpq/pqformat.h"
  #include "nodes/nodes.h"
  #include "utils/int8.h"
+ #include "utils/hashfunc.h"
  
  
  #define MAXINT8LEN		25
***************
*** 1401,1403 ****
--- 1402,1429 ----
  		/* do when there is no more left */
  		SRF_RETURN_DONE(funcctx);
  }
+ 
+ Datum
+ hashint8(PG_FUNCTION_ARGS)
+ {
+ 	/*
+ 	 * The idea here is to produce a hash value compatible with the values
+ 	 * produced by hashint4 and hashint2 for logically equal inputs; this is
+ 	 * necessary to support cross-type hash joins across these input types.
+ 	 * Since all three types are signed, we can xor the high half of the int8
+ 	 * value if the sign is positive, or the complement of the high half when
+ 	 * the sign is negative.
+ 	 */
+ #ifndef INT64_IS_BUSTED
+ 	int64		val = PG_GETARG_INT64(0);
+ 	uint32		lohalf = (uint32) val;
+ 	uint32		hihalf = (uint32) (val >> 32);
+ 
+ 	lohalf ^= (val >= 0) ? hihalf : ~hihalf;
+ 
+ 	return hash_uint32(lohalf);
+ #else
+ 	/* here if we can't count on "x >> 32" to work sanely */
+ 	return hash_uint32((int32) PG_GETARG_INT64(0));
+ #endif
+ }
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/mac.c pgsql_indexcompat/src/backend/utils/adt/mac.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/mac.c	2009-05-22 15:56:34.372677805 -0400
--- pgsql_indexcompat/src/backend/utils/adt/mac.c	2009-05-22 15:56:34.411931673 -0400
***************
*** 6,14 ****
  
  #include "postgres.h"
  
- #include "access/hash.h"
  #include "libpq/pqformat.h"
  #include "utils/builtins.h"
  #include "utils/inet.h"
  
  
--- 6,14 ----
  
  #include "postgres.h"
  
  #include "libpq/pqformat.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  #include "utils/inet.h"
  
  
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/name.c pgsql_indexcompat/src/backend/utils/adt/name.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/name.c	2009-05-22 15:56:34.373430941 -0400
--- pgsql_indexcompat/src/backend/utils/adt/name.c	2009-05-22 15:56:34.412080342 -0400
***************
*** 27,32 ****
--- 27,33 ----
  #include "miscadmin.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  #include "utils/lsyscache.h"
  
  
***************
*** 319,321 ****
--- 320,333 ----
  
  	PG_RETURN_POINTER(array);
  }
+ 
+ Datum
+ hashname(PG_FUNCTION_ARGS)
+ {
+ 	char	   *key = NameStr(*PG_GETARG_NAME(0));
+ 	int			keylen = strlen(key);
+ 
+ 	Assert(keylen < NAMEDATALEN);		/* else it's not truncated correctly */
+ 
+ 	return hash_any((unsigned char *) key, keylen);
+ }
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/network.c pgsql_indexcompat/src/backend/utils/adt/network.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/network.c	2009-05-22 15:56:34.376147427 -0400
--- pgsql_indexcompat/src/backend/utils/adt/network.c	2009-05-22 15:56:34.412266819 -0400
***************
*** 12,24 ****
  #include <netinet/in.h>
  #include <arpa/inet.h>
  
- #include "access/hash.h"
  #include "catalog/pg_type.h"
  #include "libpq/ip.h"
  #include "libpq/libpq-be.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
  #include "utils/builtins.h"
  #include "utils/inet.h"
  
  
--- 12,24 ----
  #include <netinet/in.h>
  #include <arpa/inet.h>
  
  #include "catalog/pg_type.h"
  #include "libpq/ip.h"
  #include "libpq/libpq-be.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  #include "utils/inet.h"
  
  
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/numeric.c pgsql_indexcompat/src/backend/utils/adt/numeric.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/numeric.c	2009-05-22 15:56:34.382545862 -0400
--- pgsql_indexcompat/src/backend/utils/adt/numeric.c	2009-05-22 15:56:34.412630511 -0400
***************
*** 26,37 ****
  #include <limits.h>
  #include <math.h>
  
- #include "access/hash.h"
  #include "catalog/pg_type.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/int8.h"
  #include "utils/numeric.h"
  
--- 26,37 ----
  #include <limits.h>
  #include <math.h>
  
  #include "catalog/pg_type.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  #include "utils/int8.h"
  #include "utils/numeric.h"
  
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/oid.c pgsql_indexcompat/src/backend/utils/adt/oid.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/oid.c	2009-05-22 15:56:34.384087154 -0400
--- pgsql_indexcompat/src/backend/utils/adt/oid.c	2009-05-22 15:56:34.412785642 -0400
***************
*** 21,26 ****
--- 21,27 ----
  #include "libpq/pqformat.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  
  
  #define OidVectorSize(n)	(offsetof(oidvector, values) + (n) * sizeof(Oid))
***************
*** 419,421 ****
--- 420,436 ----
  
  	PG_RETURN_BOOL(cmp > 0);
  }
+ 
+ Datum
+ hashoid(PG_FUNCTION_ARGS)
+ {
+ 	return hash_uint32((uint32) PG_GETARG_OID(0));
+ }
+ 
+ Datum
+ hashoidvector(PG_FUNCTION_ARGS)
+ {
+ 	oidvector  *key = (oidvector *) PG_GETARG_POINTER(0);
+ 
+ 	return hash_any((unsigned char *) key->values, key->dim1 * sizeof(Oid));
+ }
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/timestamp.c pgsql_indexcompat/src/backend/utils/adt/timestamp.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/timestamp.c	2009-05-22 15:56:34.391980267 -0400
--- pgsql_indexcompat/src/backend/utils/adt/timestamp.c	2009-05-22 15:56:34.413104630 -0400
***************
*** 21,27 ****
  #include <limits.h>
  #include <sys/time.h>
  
- #include "access/hash.h"
  #include "access/xact.h"
  #include "catalog/pg_type.h"
  #include "funcapi.h"
--- 21,26 ----
***************
*** 31,36 ****
--- 30,36 ----
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/datetime.h"
+ #include "utils/hashfunc.h"
  
  /*
   * gcc's -ffast-math switch breaks routines that expect exact results from
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/uuid.c pgsql_indexcompat/src/backend/utils/adt/uuid.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/uuid.c	2009-05-22 15:56:34.392585968 -0400
--- pgsql_indexcompat/src/backend/utils/adt/uuid.c	2009-05-22 15:56:34.413254701 -0400
***************
*** 13,21 ****
  
  #include "postgres.h"
  
- #include "access/hash.h"
  #include "libpq/pqformat.h"
  #include "utils/builtins.h"
  #include "utils/uuid.h"
  
  /* uuid size in bytes */
--- 13,21 ----
  
  #include "postgres.h"
  
  #include "libpq/pqformat.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  #include "utils/uuid.h"
  
  /* uuid size in bytes */
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/varchar.c pgsql_indexcompat/src/backend/utils/adt/varchar.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/varchar.c	2009-05-22 15:56:34.394025160 -0400
--- pgsql_indexcompat/src/backend/utils/adt/varchar.c	2009-05-22 15:56:34.413430641 -0400
***************
*** 15,25 ****
  #include "postgres.h"
  
  
- #include "access/hash.h"
  #include "access/tuptoaster.h"
  #include "libpq/pqformat.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "mb/pg_wchar.h"
  
  
--- 15,25 ----
  #include "postgres.h"
  
  
  #include "access/tuptoaster.h"
  #include "libpq/pqformat.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  #include "mb/pg_wchar.h"
  
  
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/varlena.c pgsql_indexcompat/src/backend/utils/adt/varlena.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/adt/varlena.c	2009-05-22 15:56:34.400341684 -0400
--- pgsql_indexcompat/src/backend/utils/adt/varlena.c	2009-05-22 15:56:34.413685740 -0400
***************
*** 24,29 ****
--- 24,30 ----
  #include "parser/scansup.h"
  #include "regex/regex.h"
  #include "utils/builtins.h"
+ #include "utils/hashfunc.h"
  #include "utils/lsyscache.h"
  #include "utils/pg_locale.h"
  
***************
*** 3102,3104 ****
--- 3103,3144 ----
  
  	PG_RETURN_INT32(result);
  }
+ 
+ Datum
+ hashtext(PG_FUNCTION_ARGS)
+ {
+ 	text	   *key = PG_GETARG_TEXT_PP(0);
+ 	Datum		result;
+ 
+ 	/*
+ 	 * Note: this is currently identical in behavior to hashvarlena, but keep
+ 	 * it as a separate function in case we someday want to do something
+ 	 * different in non-C locales.	(See also hashbpchar, if so.)
+ 	 */
+ 	result = hash_any((unsigned char *) VARDATA_ANY(key),
+ 					  VARSIZE_ANY_EXHDR(key));
+ 
+ 	/* Avoid leaking memory for toasted inputs */
+ 	PG_FREE_IF_COPY(key, 0);
+ 
+ 	return result;
+ }
+ 
+ /*
+  * hashvarlena() can be used for any varlena datatype in which there are
+  * no non-significant bits, ie, distinct bitpatterns never compare as equal.
+  */
+ Datum
+ hashvarlena(PG_FUNCTION_ARGS)
+ {
+ 	struct varlena *key = PG_GETARG_VARLENA_PP(0);
+ 	Datum		result;
+ 
+ 	result = hash_any((unsigned char *) VARDATA_ANY(key),
+ 					  VARSIZE_ANY_EXHDR(key));
+ 
+ 	/* Avoid leaking memory for toasted inputs */
+ 	PG_FREE_IF_COPY(key, 0);
+ 
+ 	return result;
+ }
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/hash/hashfn.c pgsql_indexcompat/src/backend/utils/hash/hashfn.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/hash/hashfn.c	2009-05-22 15:56:34.401341014 -0400
--- pgsql_indexcompat/src/backend/utils/hash/hashfn.c	2009-05-22 15:56:34.414083214 -0400
***************
*** 21,28 ****
   */
  #include "postgres.h"
  
- #include "access/hash.h"
  #include "nodes/bitmapset.h"
  
  
  /*
--- 21,28 ----
   */
  #include "postgres.h"
  
  #include "nodes/bitmapset.h"
+ #include "utils/hashfunc.h"
  
  
  /*
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/hash/hashfunc.c pgsql_indexcompat/src/backend/utils/hash/hashfunc.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/hash/hashfunc.c	1969-12-31 19:00:00.000000000 -0500
--- pgsql_indexcompat/src/backend/utils/hash/hashfunc.c	2009-05-22 15:56:34.414915382 -0400
***************
*** 0 ****
--- 1,357 ----
+ /*-------------------------------------------------------------------------
+  *
+  * hashfunc.c
+  *	  Support functions for hash access method.
+  *
+  * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *
+  * IDENTIFICATION
+  *	  $PostgreSQL: pgsql/src/backend/utils/hash/hashfunc.c,v 1.57 2009/01/01 17:23:35 momjian Exp $
+  *
+  * NOTES
+  *	  It is expected that every bit of a hash function's 32-bit result is
+  *	  as random as every other; failure to ensure this is likely to lead
+  *	  to poor performance of hash joins, for example.  In most cases a hash
+  *	  function should use hash_any() or its variant hash_uint32().
+  *-------------------------------------------------------------------------
+  */
+ 
+ #include "postgres.h"
+ 
+ #include "utils/hashfunc.h"
+ 
+ /*
+  * This hash function was written by Bob Jenkins
+  * (bob_jenkins@burtleburtle.net), and superficially adapted
+  * for PostgreSQL by Neil Conway. For more information on this
+  * hash function, see http://burtleburtle.net/bob/hash/doobs.html,
+  * or Bob's article in Dr. Dobb's Journal, Sept. 1997.
+  *
+  * In the current code, we have adopted Bob's 2006 update of his hash
+  * function to fetch the data a word at a time when it is suitably aligned.
+  * This makes for a useful speedup, at the cost of having to maintain
+  * four code paths (aligned vs unaligned, and little-endian vs big-endian).
+  * It also uses two separate mixing functions mix() and final(), instead
+  * of a slower multi-purpose function.
+  */
+ 
+ /* Get a bit mask of the bits set in non-uint32 aligned addresses */
+ #define UINT32_ALIGN_MASK (sizeof(uint32) - 1)
+ 
+ /* Rotate a uint32 value left by k bits - note multiple evaluation! */
+ #define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
+ 
+ /*----------
+  * mix -- mix 3 32-bit values reversibly.
+  *
+  * This is reversible, so any information in (a,b,c) before mix() is
+  * still in (a,b,c) after mix().
+  *
+  * If four pairs of (a,b,c) inputs are run through mix(), or through
+  * mix() in reverse, there are at least 32 bits of the output that
+  * are sometimes the same for one pair and different for another pair.
+  * This was tested for:
+  * * pairs that differed by one bit, by two bits, in any combination
+  *   of top bits of (a,b,c), or in any combination of bottom bits of
+  *   (a,b,c).
+  * * "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
+  *   the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
+  *   is commonly produced by subtraction) look like a single 1-bit
+  *   difference.
+  * * the base values were pseudorandom, all zero but one bit set, or
+  *   all zero plus a counter that starts at zero.
+  * 
+  * This does not achieve avalanche.  There are input bits of (a,b,c)
+  * that fail to affect some output bits of (a,b,c), especially of a.  The
+  * most thoroughly mixed value is c, but it doesn't really even achieve
+  * avalanche in c. 
+  * 
+  * This allows some parallelism.  Read-after-writes are good at doubling
+  * the number of bits affected, so the goal of mixing pulls in the opposite
+  * direction from the goal of parallelism.  I did what I could.  Rotates
+  * seem to cost as much as shifts on every machine I could lay my hands on,
+  * and rotates are much kinder to the top and bottom bits, so I used rotates.
+  *----------
+  */
+ #define mix(a,b,c) \
+ { \
+   a -= c;  a ^= rot(c, 4);  c += b; \
+   b -= a;  b ^= rot(a, 6);  a += c; \
+   c -= b;  c ^= rot(b, 8);  b += a; \
+   a -= c;  a ^= rot(c,16);  c += b; \
+   b -= a;  b ^= rot(a,19);  a += c; \
+   c -= b;  c ^= rot(b, 4);  b += a; \
+ }
+ 
+ /*----------
+  * final -- final mixing of 3 32-bit values (a,b,c) into c
+  *
+  * Pairs of (a,b,c) values differing in only a few bits will usually
+  * produce values of c that look totally different.  This was tested for
+  * * pairs that differed by one bit, by two bits, in any combination
+  *   of top bits of (a,b,c), or in any combination of bottom bits of
+  *   (a,b,c).
+  * * "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
+  *   the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
+  *   is commonly produced by subtraction) look like a single 1-bit
+  *   difference.
+  * * the base values were pseudorandom, all zero but one bit set, or
+  *   all zero plus a counter that starts at zero.
+  *     
+  * The use of separate functions for mix() and final() allow for a
+  * substantial performance increase since final() does not need to
+  * do well in reverse, but is does need to affect all output bits.
+  * mix(), on the other hand, does not need to affect all output
+  * bits (affecting 32 bits is enough).  The original hash function had
+  * a single mixing operation that had to satisfy both sets of requirements
+  * and was slower as a result.
+  *----------
+  */
+ #define final(a,b,c) \
+ { \
+   c ^= b; c -= rot(b,14); \
+   a ^= c; a -= rot(c,11); \
+   b ^= a; b -= rot(a,25); \
+   c ^= b; c -= rot(b,16); \
+   a ^= c; a -= rot(c, 4); \
+   b ^= a; b -= rot(a,14); \
+   c ^= b; c -= rot(b,24); \
+ }
+ 
+ /*
+  * hash_any() -- hash a variable-length key into a 32-bit value
+  *		k		: the key (the unaligned variable-length array of bytes)
+  *		len		: the length of the key, counting by bytes
+  *
+  * Returns a uint32 value.	Every bit of the key affects every bit of
+  * the return value.  Every 1-bit and 2-bit delta achieves avalanche.
+  * About 6*len+35 instructions. The best hash table sizes are powers
+  * of 2.  There is no need to do mod a prime (mod is sooo slow!).
+  * If you need less than 32 bits, use a bitmask.
+  *
+  * Note: we could easily change this function to return a 64-bit hash value
+  * by using the final values of both b and c.  b is perhaps a little less
+  * well mixed than c, however.
+  */
+ Datum
+ hash_any(register const unsigned char *k, register int keylen)
+ {
+ 	register uint32 a,
+ 				b,
+ 				c,
+ 				len;
+ 
+ 	/* Set up the internal state */
+ 	len = keylen;
+ 	a = b = c = 0x9e3779b9 + len + 3923095;
+ 
+ 	/* If the source pointer is word-aligned, we use word-wide fetches */
+ 	if (((long) k & UINT32_ALIGN_MASK) == 0)
+ 	{
+ 		/* Code path for aligned source data */
+ 		register const uint32 *ka = (const uint32 *) k;
+ 
+ 		/* handle most of the key */
+ 		while (len >= 12)
+ 		{
+ 			a += ka[0];
+ 			b += ka[1];
+ 			c += ka[2];
+ 			mix(a, b, c);
+ 			ka += 3;
+ 			len -= 12;
+ 		}
+ 
+ 		/* handle the last 11 bytes */
+ 		k = (const unsigned char *) ka;
+ #ifdef WORDS_BIGENDIAN
+ 		switch (len)
+ 		{
+ 			case 11:
+ 				c += ((uint32) k[10] << 8);
+ 				/* fall through */
+ 			case 10:
+ 				c += ((uint32) k[9] << 16);
+ 				/* fall through */
+ 			case 9:
+ 				c += ((uint32) k[8] << 24);
+ 				/* the lowest byte of c is reserved for the length */
+ 				/* fall through */
+ 			case 8:
+ 				b += ka[1];
+ 				a += ka[0];
+ 				break;
+ 			case 7:
+ 				b += ((uint32) k[6] << 8);
+ 				/* fall through */
+ 			case 6:
+ 				b += ((uint32) k[5] << 16);
+ 				/* fall through */
+ 			case 5:
+ 				b += ((uint32) k[4] << 24);
+ 				/* fall through */
+ 			case 4:
+ 				a += ka[0];
+ 				break;
+ 			case 3:
+ 				a += ((uint32) k[2] << 8);
+ 				/* fall through */
+ 			case 2:
+ 				a += ((uint32) k[1] << 16);
+ 				/* fall through */
+ 			case 1:
+ 				a += ((uint32) k[0] << 24);
+ 			/* case 0: nothing left to add */
+ 		}
+ #else /* !WORDS_BIGENDIAN */
+ 		switch (len)
+ 		{
+ 			case 11:
+ 				c += ((uint32) k[10] << 24);
+ 				/* fall through */
+ 			case 10:
+ 				c += ((uint32) k[9] << 16);
+ 				/* fall through */
+ 			case 9:
+ 				c += ((uint32) k[8] << 8);
+ 				/* the lowest byte of c is reserved for the length */
+ 				/* fall through */
+ 			case 8:
+ 				b += ka[1];
+ 				a += ka[0];
+ 				break;
+ 			case 7:
+ 				b += ((uint32) k[6] << 16);
+ 				/* fall through */
+ 			case 6:
+ 				b += ((uint32) k[5] << 8);
+ 				/* fall through */
+ 			case 5:
+ 				b += k[4];
+ 				/* fall through */
+ 			case 4:
+ 				a += ka[0];
+ 				break;
+ 			case 3:
+ 				a += ((uint32) k[2] << 16);
+ 				/* fall through */
+ 			case 2:
+ 				a += ((uint32) k[1] << 8);
+ 				/* fall through */
+ 			case 1:
+ 				a += k[0];
+ 			/* case 0: nothing left to add */
+ 		}
+ #endif /* WORDS_BIGENDIAN */
+ 	}
+ 	else
+ 	{
+ 		/* Code path for non-aligned source data */
+ 
+ 		/* handle most of the key */
+ 		while (len >= 12)
+ 		{
+ #ifdef WORDS_BIGENDIAN
+ 			a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));
+ 			b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));
+ 			c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));
+ #else /* !WORDS_BIGENDIAN */
+ 			a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
+ 			b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
+ 			c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
+ #endif /* WORDS_BIGENDIAN */
+ 			mix(a, b, c);
+ 			k += 12;
+ 			len -= 12;
+ 		}
+ 
+ 		/* handle the last 11 bytes */
+ #ifdef WORDS_BIGENDIAN
+ 		switch (len)			/* all the case statements fall through */
+ 		{
+ 			case 11:
+ 				c += ((uint32) k[10] << 8);
+ 			case 10:
+ 				c += ((uint32) k[9] << 16);
+ 			case 9:
+ 				c += ((uint32) k[8] << 24);
+ 				/* the lowest byte of c is reserved for the length */
+ 			case 8:
+ 				b += k[7];
+ 			case 7:
+ 				b += ((uint32) k[6] << 8);
+ 			case 6:
+ 				b += ((uint32) k[5] << 16);
+ 			case 5:
+ 				b += ((uint32) k[4] << 24);
+ 			case 4:
+ 				a += k[3];
+ 			case 3:
+ 				a += ((uint32) k[2] << 8);
+ 			case 2:
+ 				a += ((uint32) k[1] << 16);
+ 			case 1:
+ 				a += ((uint32) k[0] << 24);
+ 			/* case 0: nothing left to add */
+ 		}
+ #else /* !WORDS_BIGENDIAN */
+ 		switch (len)			/* all the case statements fall through */
+ 		{
+ 			case 11:
+ 				c += ((uint32) k[10] << 24);
+ 			case 10:
+ 				c += ((uint32) k[9] << 16);
+ 			case 9:
+ 				c += ((uint32) k[8] << 8);
+ 				/* the lowest byte of c is reserved for the length */
+ 			case 8:
+ 				b += ((uint32) k[7] << 24);
+ 			case 7:
+ 				b += ((uint32) k[6] << 16);
+ 			case 6:
+ 				b += ((uint32) k[5] << 8);
+ 			case 5:
+ 				b += k[4];
+ 			case 4:
+ 				a += ((uint32) k[3] << 24);
+ 			case 3:
+ 				a += ((uint32) k[2] << 16);
+ 			case 2:
+ 				a += ((uint32) k[1] << 8);
+ 			case 1:
+ 				a += k[0];
+ 			/* case 0: nothing left to add */
+ 		}
+ #endif /* WORDS_BIGENDIAN */
+ 	}
+ 
+ 	final(a, b, c);
+ 
+ 	/* report the result */
+ 	return UInt32GetDatum(c);
+ }
+ 
+ /*
+  * hash_uint32() -- hash a 32-bit value
+  *
+  * This has the same result as
+  *		hash_any(&k, sizeof(uint32))
+  * but is faster and doesn't force the caller to store k into memory.
+  */
+ Datum
+ hash_uint32(uint32 k)
+ {
+ 	register uint32 a,
+ 				b,
+ 				c;
+ 
+ 	a = b = c = 0x9e3779b9 + (uint32) sizeof(uint32) + 3923095;
+ 	a += k;
+ 
+ 	final(a, b, c);
+ 
+ 	/* report the result */
+ 	return UInt32GetDatum(c);
+ }
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/utils/hash/Makefile pgsql_indexcompat/src/backend/utils/hash/Makefile
*** pgsql_indexcompat.5d4d60e3a557/src/backend/utils/hash/Makefile	2009-05-22 15:56:34.400748291 -0400
--- pgsql_indexcompat/src/backend/utils/hash/Makefile	2009-05-22 15:56:34.413939001 -0400
***************
*** 12,17 ****
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = dynahash.o hashfn.o pg_crc.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = hashfunc.o dynahash.o hashfn.o pg_crc.o
  
  include $(top_srcdir)/src/backend/common.mk
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/include/access/hash.h pgsql_indexcompat/src/include/access/hash.h
*** pgsql_indexcompat.5d4d60e3a557/src/include/access/hash.h	2009-05-22 15:56:34.403100419 -0400
--- pgsql_indexcompat/src/include/access/hash.h	2009-05-22 15:56:34.414443320 -0400
***************
*** 251,281 ****
  extern Datum hashvacuumcleanup(PG_FUNCTION_ARGS);
  extern Datum hashoptions(PG_FUNCTION_ARGS);
  
- /*
-  * Datatype-specific hash functions in hashfunc.c.
-  *
-  * These support both hash indexes and hash joins.
-  *
-  * NOTE: some of these are also used by catcache operations, without
-  * any direct connection to hash indexes.  Also, the common hash_any
-  * routine is also used by dynahash tables.
-  */
- extern Datum hashchar(PG_FUNCTION_ARGS);
- extern Datum hashint2(PG_FUNCTION_ARGS);
- extern Datum hashint4(PG_FUNCTION_ARGS);
- extern Datum hashint8(PG_FUNCTION_ARGS);
- extern Datum hashoid(PG_FUNCTION_ARGS);
- extern Datum hashenum(PG_FUNCTION_ARGS);
- extern Datum hashfloat4(PG_FUNCTION_ARGS);
- extern Datum hashfloat8(PG_FUNCTION_ARGS);
- extern Datum hashoidvector(PG_FUNCTION_ARGS);
- extern Datum hashint2vector(PG_FUNCTION_ARGS);
- extern Datum hashname(PG_FUNCTION_ARGS);
- extern Datum hashtext(PG_FUNCTION_ARGS);
- extern Datum hashvarlena(PG_FUNCTION_ARGS);
- extern Datum hash_any(register const unsigned char *k, register int keylen);
- extern Datum hash_uint32(uint32 k);
- 
  /* private routines */
  
  /* hashinsert.c */
--- 251,256 ----
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/include/utils/builtins.h pgsql_indexcompat/src/include/utils/builtins.h
*** pgsql_indexcompat.5d4d60e3a557/src/include/utils/builtins.h	2009-05-22 15:56:34.408004271 -0400
--- pgsql_indexcompat/src/include/utils/builtins.h	2009-05-22 15:56:34.414756725 -0400
***************
*** 127,132 ****
--- 127,133 ----
  extern Datum i4tochar(PG_FUNCTION_ARGS);
  extern Datum text_char(PG_FUNCTION_ARGS);
  extern Datum char_text(PG_FUNCTION_ARGS);
+ extern Datum hashchar(PG_FUNCTION_ARGS);
  
  /* domains.c */
  extern Datum domain_in(PG_FUNCTION_ARGS);
***************
*** 150,155 ****
--- 151,157 ----
  extern Datum enum_last(PG_FUNCTION_ARGS);
  extern Datum enum_range_bounds(PG_FUNCTION_ARGS);
  extern Datum enum_range_all(PG_FUNCTION_ARGS);
+ extern Datum hashenum(PG_FUNCTION_ARGS);
  
  /* int.c */
  extern Datum int2in(PG_FUNCTION_ARGS);
***************
*** 238,243 ****
--- 240,249 ----
  extern Datum generate_series_int4(PG_FUNCTION_ARGS);
  extern Datum generate_series_step_int4(PG_FUNCTION_ARGS);
  extern int2vector *buildint2vector(const int2 *int2s, int n);
+ extern Datum hashint2(PG_FUNCTION_ARGS);
+ extern Datum hashint4(PG_FUNCTION_ARGS);
+ extern Datum hashint8(PG_FUNCTION_ARGS);
+ extern Datum hashint2vector(PG_FUNCTION_ARGS);
  
  /* name.c */
  extern Datum namein(PG_FUNCTION_ARGS);
***************
*** 257,262 ****
--- 263,269 ----
  extern Datum session_user(PG_FUNCTION_ARGS);
  extern Datum current_schema(PG_FUNCTION_ARGS);
  extern Datum current_schemas(PG_FUNCTION_ARGS);
+ extern Datum hashname(PG_FUNCTION_ARGS);
  
  /* numutils.c */
  extern int32 pg_atoi(char *s, int size, int c);
***************
*** 411,416 ****
--- 418,425 ----
  extern Datum float84gt(PG_FUNCTION_ARGS);
  extern Datum float84ge(PG_FUNCTION_ARGS);
  extern Datum width_bucket_float8(PG_FUNCTION_ARGS);
+ extern Datum hashfloat4(PG_FUNCTION_ARGS);
+ extern Datum hashfloat8(PG_FUNCTION_ARGS);
  
  /* dbsize.c */
  extern Datum pg_tablespace_size_oid(PG_FUNCTION_ARGS);
***************
*** 461,466 ****
--- 470,477 ----
  extern Datum oidvectorle(PG_FUNCTION_ARGS);
  extern Datum oidvectorge(PG_FUNCTION_ARGS);
  extern Datum oidvectorgt(PG_FUNCTION_ARGS);
+ extern Datum hashoid(PG_FUNCTION_ARGS);
+ extern Datum hashoidvector(PG_FUNCTION_ARGS);
  extern oidvector *buildoidvector(const Oid *oids, int n);
  
  /* pseudotypes.c */
***************
*** 698,703 ****
--- 709,716 ----
  extern Datum to_hex64(PG_FUNCTION_ARGS);
  extern Datum md5_text(PG_FUNCTION_ARGS);
  extern Datum md5_bytea(PG_FUNCTION_ARGS);
+ extern Datum hashtext(PG_FUNCTION_ARGS);
+ extern Datum hashvarlena(PG_FUNCTION_ARGS);
  
  extern Datum unknownin(PG_FUNCTION_ARGS);
  extern Datum unknownout(PG_FUNCTION_ARGS);
diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/include/utils/hashfunc.h pgsql_indexcompat/src/include/utils/hashfunc.h
*** pgsql_indexcompat.5d4d60e3a557/src/include/utils/hashfunc.h	1969-12-31 19:00:00.000000000 -0500
--- pgsql_indexcompat/src/include/utils/hashfunc.h	2009-05-22 15:56:34.415054363 -0400
***************
*** 0 ****
--- 1,20 ----
+ /*-------------------------------------------------------------------------
+  *
+  * hashfunc.h
+  *	  header file for hash functions
+  *
+  *
+  * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * $PostgreSQL: pgsql/src/include/access/hash.h,v 1.91 2008/10/17 23:50:57 tgl Exp $
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef HASHFUNC_H
+ #define HASHFUNC_H
+ 
+ extern Datum hash_any(register const unsigned char *k, register int keylen);
+ extern Datum hash_uint32(uint32 k);
+ 
+ #endif   /* HASHFUNC_H */
#2 Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Zdenek Kotala (#1)
1 attachment(s)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

I forgot to fix contrib. Updated patch attached.

Zdenek

Zdenek Kotala wrote on Fri, 22 May 2009 at 16:23 -0400:

The attached patch cleans up the hash index headers so that the 8.3 version
of the hash access method can be compiled. This helps pg_migrator gain the
ability to migrate a database with hash indexes without reindexing.

I discussed this patch a year ago with Alvaro, when we tried to clean up the
include-bloat problem. It should also reduce the number of includes.

The main point is that the hash functions for datatypes now live in the
related datatype files in the utils/adt directory, and hash_any() and
hash_uint32() are now in utils/hash/hashfunc.c.

It would be nice to have this in 8.4 because it would allow testing the index
migration functionality.

Thanks, Zdenek

Attachments:

hash_02.patch.gz (application/x-gzip; name=hash_02.patch.gz)
#3 Kenneth Marshall
ktm@rice.edu
In reply to: Zdenek Kotala (#2)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On Sat, May 23, 2009 at 02:52:49PM -0400, Zdenek Kotala wrote:

I forgot to fix contrib. Updated patch attached.

Zdenek

Zdenek Kotala wrote on Fri, 22 May 2009 at 16:23 -0400:

The attached patch cleans up the hash index headers so that the 8.3 version
of the hash access method can be compiled. This helps pg_migrator gain the
ability to migrate a database with hash indexes without reindexing.

I discussed this patch a year ago with Alvaro, when we tried to clean up the
include-bloat problem. It should also reduce the number of includes.

The main point is that the hash functions for datatypes now live in the
related datatype files in the utils/adt directory, and hash_any() and
hash_uint32() are now in utils/hash/hashfunc.c.

It would be nice to have this in 8.4 because it would allow testing the index
migration functionality.

Thanks, Zdenek

How does that work with the updated hash functions without a reindex?

Regards,
Ken

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kenneth Marshall (#3)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Kenneth Marshall <ktm@rice.edu> writes:

On Sat, May 23, 2009 at 02:52:49PM -0400, Zdenek Kotala wrote:

The attached patch cleans up the hash index headers so that the 8.3 version
of the hash access method can be compiled. This helps pg_migrator gain the
ability to migrate a database with hash indexes without reindexing.

How does that work with the updated hash functions without a reindex?

I looked at this patch and I don't see how it helps pg_migrator at all.
It's just pushing some code and function declarations around.

The rearrangement might be marginally nicer from a code beautification
point of view --- right now we're a bit inconsistent about whether
datatype-specific hash functions live in hashfunc.c or in the datatype's
utils/adt/ file. But I'm not sure that removing hashfunc.c altogether is
an appropriate solution to that, not least because of the loss of CVS
history for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.

What does seem useful is to refactor the headers so that datatype hash
functions don't need to include all of the AM's implementation details.
But this patch seems to do both more and less than that --- I don't
think it's gotten rid of all external #includes of access/hash.h, and
in any case moving the function code is not necessary to that goal.

In any case, the barriers to implementing 8.3-style hash indexes in 8.4
are pretty huge: you'd need to duplicate not only the hash AM code, but
also all the hash functions, and therefore all of the hash pg_amop and
pg_amproc entries. Given the close-to-zero usefulness of hash indexes
in production installations, I don't think it's worth the trouble. It
would be much more helpful to look into supporting 8.3-compatible GIN
indexes.

regards, tom lane

#5Dimitri Fontaine
dfontaine@hi-media.com
In reply to: Tom Lane (#4)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Hi,

Tom Lane <tgl@sss.pgh.pa.us> writes:

The rearrangement might be marginally nicer from a code beautification
point of view --- right now we're a bit inconsistent about whether
datatype-specific hash functions live in hashfunc.c or in the datatype's
utils/adt/ file. But I'm not sure that removing hashfunc.c altogether is
an appropriate solution to that, not least because of the loss of CVS
history for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.

I guess someone has to talk about it: git will follow the code even when
the file hosting it changes. It's not all magic though:

http://kerneltrap.org/node/11765

"And when using git, the whole 'keep code movement separate from
changes' has an even more fundamental reason: git can track code
movement (again, whether moving a whole file or just a function
between files), and doing a 'git blame -C' will actually follow code
movement between files. It does that by similarity analysis, but it
does mean that if you both move the code *and* change it at the same
time, git cannot see that 'oh, that function came originally from that
other file', and now you get worse annotations about where code
actually originated."

Better tools could perhaps also help maintain the high code-quality
standards that are already established.

Regards,
--
dim

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dimitri Fontaine (#5)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Dimitri Fontaine <dfontaine@hi-media.com> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

The rearrangement might be marginally nicer from a code beautification
point of view --- right now we're a bit inconsistent about whether
datatype-specific hash functions live in hashfunc.c or in the datatype's
utils/adt/ file. But I'm not sure that removing hashfunc.c altogether is
an appropriate solution to that, not least because of the loss of CVS
history for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.

I guess someone has to talk about it: git will follow the code even when
the file hosting it changes.

That might possibly be relevant a year from now; it is 100% irrelevant
to a change being proposed for 8.4.

regards, tom lane

#7David Fetter
david@fetter.org
In reply to: Tom Lane (#6)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On Mon, May 25, 2009 at 09:59:14AM -0400, Tom Lane wrote:

Dimitri Fontaine <dfontaine@hi-media.com> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

The rearrangement might be marginally nicer from a code
beautification point of view --- right now we're a bit
inconsistent about whether datatype-specific hash functions live
in hashfunc.c or in the datatype's utils/adt/ file. But I'm not
sure that removing hashfunc.c altogether is an appropriate
solution to that, not least because of the loss of CVS history
for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.

I guess someone has to talk about it: git will follow the code
even when the file hosting it changes.

That might possibly be relevant a year from now; it is 100%
irrelevant to a change being proposed for 8.4.

It's pretty relevant as far as the schedule goes. I'm not alone
thinking that the appropriate place to make this change, given
buildfarm support, is at the transition to 8.5.

CVS is dead. Long live git! :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#8Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#4)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Tom Lane wrote on Sun, 24 May 2009 at 18:46 -0400:

Kenneth Marshall <ktm@rice.edu> writes:

On Sat, May 23, 2009 at 02:52:49PM -0400, Zdenek Kotala wrote:

The attached patch cleans up the hash index headers to allow compiling the
8.3 version of the hash AM. It helps pg_migrator gain the capability to
migrate databases with hash indexes without reindexing.

How does that work with the updated hash functions without a reindex?

I looked at this patch and I don't see how it helps pg_migrator at all.
It's just pushing some code and function declarations around.

The most important thing is moving the hash_any()/hash_uint32() functions
out of the hash AM. That should reduce the amount of code that has to be
duplicated for the old hash index implementation. The rest is only
rearrangement and cleanup.

The rearrangement might be marginally nicer from a code beautification
point of view --- right now we're a bit inconsistent about whether
datatype-specific hash functions live in hashfunc.c or in the datatype's
utils/adt/ file.

I personally prefer to keep them with the type definition. The AM should
be independent of which types are installed, and datatype code should be
self-contained.

But I'm not sure that removing hashfunc.c altogether is
an appropriate solution to that, not least because of the loss of CVS
history for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.

Until we have a better version control system, hashfunc.c can stay where
it is, but what I need is hashfunc.h. A minimal version of this patch
would create hashfunc.h and modify the related #includes in C and header
files.

What does seem useful is to refactor the headers so that datatype hash
functions don't need to include all of the AM's implementation details.
But this patch seems to do both more and less than that --- I don't
think it's gotten rid of all external #includes of access/hash.h, and
in any case moving the function code is not necessary to that goal.

Agreed; I will prepare a minimal version with only hashfunc.h.

In any case, the barriers to implementing 8.3-style hash indexes in 8.4
are pretty huge: you'd need to duplicate not only the hash AM code, but
also all the hash functions, and therefore all of the hash pg_amop and
pg_amproc entries.

I'm not sure I need to duplicate the functions. In general yes, but it
seems to me that the hash index changes did not change the functions'
behavior, so they could be shared for now.

Given the close-to-zero usefulness of hash indexes
in production installations, I don't think it's worth the trouble. It
would be much more helpful to look into supporting 8.3-compatible GIN
indexes.

Agreed. I wanted to quickly verify the function-naming collision problem,
and the hash index seemed to me a better candidate for this basic test.
GIN has priority.

Zdenek

#9Stephen Frost
sfrost@snowman.net
In reply to: David Fetter (#7)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

David,

* David Fetter (david@fetter.org) wrote:

It's pretty relevant as far as the schedule goes. I'm not alone
thinking that the appropriate place to make this change, given
buildfarm support, is at the transition to 8.5.

CVS is dead. Long live git! :)

I'm all for moving to git, but not until at least the core folks are
more familiar with it and have been using it. I don't believe that
experience will be there by the time we open for 8.5 and a forced march
when we have numerous big things hopefully hitting on the first
commitfest seems like a bad idea.

I would encourage core, committers and contributors to start becoming
familiar with git on the expectation that we'll be making that move
when we open for 8.6/9.0.

Ideally, there could be an official decision made about when it's going
to happen followed by an announcement when 8.4 is released.

Thoughts?

Stephen

#10David Fetter
david@fetter.org
In reply to: Stephen Frost (#9)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On Mon, May 25, 2009 at 11:45:33AM -0400, Stephen Frost wrote:

David,

* David Fetter (david@fetter.org) wrote:

It's pretty relevant as far as the schedule goes. I'm not alone
thinking that the appropriate place to make this change, given
buildfarm support, is at the transition to 8.5.

CVS is dead. Long live git! :)

I'm all for moving to git, but not until at least the core folks are
more familiar with it and have been using it.

Which ones aren't familiar and haven't been using it for at least the
past year? I count two.

I don't believe that experience will be there by the time we open
for 8.5 and a forced march when we have numerous big things
hopefully hitting on the first commitfest seems like a bad idea.

Your portrayal of a rough and complicated transition is not terribly
well supported by other projects' switches to git.

I would encourage core, committers and contributors to start
becoming familiar with git on the expectation that we'll be making
that move when we open for 8.6/9.0.

Ideally, there could be an official decision made about when it's
going to happen followed by an announcement when 8.4 is released.

Thoughts?

Here's mine: Git delayed is git denied.

Cheers,
David.

#11Andrew Dunstan
andrew@dunslane.net
In reply to: David Fetter (#7)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

David Fetter wrote:

On Mon, May 25, 2009 at 09:59:14AM -0400, Tom Lane wrote:

Dimitri Fontaine <dfontaine@hi-media.com> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

The rearrangement might be marginally nicer from a code
beautification point of view --- right now we're a bit
inconsistent about whether datatype-specific hash functions live
in hashfunc.c or in the datatype's utils/adt/ file. But I'm not
sure that removing hashfunc.c altogether is an appropriate
solution to that, not least because of the loss of CVS history
for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.

I guess someone has to talk about it: git will follow the code
even when the file hosting it changes.

That might possibly be relevant a year from now; it is 100%
irrelevant to a change being proposed for 8.4.

It's pretty relevant as far as the schedule goes. I'm not alone
thinking that the appropriate place to make this change, given
buildfarm support, is at the transition to 8.5.

CVS is dead. Long live git! :)

That still misses Tom's point, since the change is proposed for 8.4 and
at the earliest we would not change SCCMs until after 8.4 is released
(and, notwithstanding your eagerness, I suspect it will be rather later).

cheers

andrew

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#10)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

David Fetter <david@fetter.org> writes:

On Mon, May 25, 2009 at 11:45:33AM -0400, Stephen Frost wrote:

I'm all for moving to git, but not until at least the core folks are
more familiar with it and have been using it.

Which ones aren't familiar and haven't been using it for at least the
past year? I count two.

I'm not familiar with it, and neither is Bruce, and frankly that's
entirely sufficient reason not to change now.

What was more or less agreed to at the developers' meeting was that
we would move towards git in an orderly fashion. I'm thinking something
like six months to a year before cutting over the core repository.

If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?

regards, tom lane

#13David Fetter
david@fetter.org
In reply to: Tom Lane (#12)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:

David Fetter <david@fetter.org> writes:

On Mon, May 25, 2009 at 11:45:33AM -0400, Stephen Frost wrote:

I'm all for moving to git, but not until at least the core folks are
more familiar with it and have been using it.

Which ones aren't familiar and haven't been using it for at least
the past year? I count two.

I'm not familiar with it, and neither is Bruce, and frankly that's
entirely sufficient reason not to change now.

What was more or less agreed to at the developers' meeting was that
we would move towards git in an orderly fashion.

The rest have already been moving to it in "an orderly fashion," some
for more than a year.

I'm thinking something like six months to a year before cutting over
the core repository.

What would gate that?

If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?

I've been pestering them :)

Cheers,
David.

#14David Fetter
david@fetter.org
In reply to: Tom Lane (#12)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:

If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?

It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)

Cheers,
David.

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zdenek Kotala (#8)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

Tom Lane wrote on Sun, 24 May 2009 at 18:46 -0400:

In any case, the barriers to implementing 8.3-style hash indexes in 8.4
are pretty huge: you'd need to duplicate not only the hash AM code, but
also all the hash functions, and therefore all of the hash pg_amop and
pg_amproc entries.

I'm not sure I need to duplicate the functions. In general yes, but it
seems to me that the hash index changes did not change the functions'
behavior, so they could be shared for now.

No, the behavior of the hash functions themselves changed during 8.4.
Twice, even:

2008-04-06 12:54 tgl

* contrib/dblink/expected/dblink.out,
contrib/dblink/sql/dblink.sql, src/backend/access/hash/hashfunc.c,
src/include/catalog/catversion.h,
src/test/regress/expected/portals.out,
src/test/regress/sql/portals.sql: Improve hash_any() to use
word-wide fetches when hashing suitably aligned data. This makes
for a significant speedup at the cost that the results now vary
between little-endian and big-endian machines; which forces us to
add explicit ORDER BYs in a couple of regression tests to preserve
machine-independent comparison results. Also, force initdb by
bumping catversion, since the contents of hash indexes will change
(at least on big-endian machines).

Kenneth Marshall and Tom Lane, based on work from Bob Jenkins.
This commit does not adopt Bob's new faster mix() algorithm,
however, since we still need to convince ourselves that that
doesn't degrade the quality of the hashing.

2009-02-09 16:18 tgl

* src/: backend/access/hash/hashfunc.c,
include/catalog/catversion.h,
test/regress/expected/polymorphism.out,
test/regress/expected/union.out, test/regress/sql/polymorphism.sql:
Adopt Bob Jenkins' improved hash function for hash_any(). This
changes the contents of hash indexes (again), so bump catversion.

Kenneth Marshall
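The byte-order caveat in the first log entry can be illustrated with a small standalone sketch. This is not PostgreSQL's hash_any() code, and the function names below are hypothetical. It shows that loading four bytes as one 32-bit word yields a value that depends on the host's byte order, whereas assembling the same bytes one at a time yields the same value everywhere, which is why a hash built on word-wide fetches can differ between little-endian and big-endian machines.

```c
#include <stdint.h>
#include <string.h>

/* Word-wide fetch (hypothetical helper): copy 4 bytes into a uint32_t
 * exactly as the CPU stores them, so the value depends on host byte
 * order. memcpy sidesteps alignment and strict-aliasing problems. */
uint32_t fetch_word(const unsigned char *p)
{
    uint32_t w;

    memcpy(&w, p, sizeof w);
    return w;
}

/* Byte-at-a-time assembly (hypothetical helper): the code fixes the
 * position of every byte (little-endian order here), so the result is
 * identical on every machine. */
uint32_t assemble_bytes(const unsigned char *p)
{
    return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}

/* Returns 1 when the lowest-addressed byte of a uint32_t holds its
 * least significant bits, i.e. on a little-endian host. */
int host_is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 1;
}
```

For the bytes 01 02 03 04, assemble_bytes() always returns 0x04030201, while fetch_word() returns 0x04030201 on a little-endian machine and 0x01020304 on a big-endian one.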

So as far as I can see, you need completely separate copies of both
hash_any() and the SQL-level functions that call it. I'm not really
seeing that the proposed refactoring makes this any easier. You might
as well just copy-and-paste all that old code into a separate set of
files, and not worry about what is in access/hash.h.

regards, tom lane

#16Alvaro Herrera
alvherre@commandprompt.com
In reply to: David Fetter (#14)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

David Fetter wrote:

On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:

If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?

It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)

Can you be more specific on the necessary contents of such file?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#14)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

David Fetter <david@fetter.org> writes:

On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:

If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?

It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)

Cool, let's see one.

If we were to put it into a repository config file, that would more or
less have the effect of enforcing a project style for diffs, no?

regards, tom lane

#18Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#16)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On 05/25/2009 07:20 PM, Alvaro Herrera wrote:

David Fetter wrote:

On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:

If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?

It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)

Can you be more specific on the necessary contents of such file?

A very sketchy notion of it is at:
http://wiki.postgresql.org/wiki/Talk:Working_with_Git

I will try to correct the wording and the Windows information after eating.

Andres

#19Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#17)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On 05/25/2009 07:31 PM, Tom Lane wrote:

David Fetter<david@fetter.org> writes:

On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:

If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?

It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)

Cool, let's see one.

If we were to put it into a repository config file, that would more or
less have the effect of enforcing a project style for diffs, no?

Yes and no.

You can define that a subset (or all) files use a specific "diff driver"
in the repository - unfortunately the definition of that driver has to
be done locally. Defining it currently involves installing a wrapper
like the one on http://wiki.postgresql.org/wiki/Talk:Working_with_Git
and doing

Andres

#20Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#19)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On 05/25/2009 07:53 PM, Andres Freund wrote:

On 05/25/2009 07:31 PM, Tom Lane wrote:

David Fetter<david@fetter.org> writes:

On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:

If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?

It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)

Cool, let's see one.

If we were to put it into a repository config file, that would more or
less have the effect of enforcing a project style for diffs, no?

Yes and no.

You can define that a subset (or all) files use a specific "diff driver"
in the repository - unfortunately the definition of that driver has to
be done locally. Defining it currently involves installing a wrapper
like the one on http://wiki.postgresql.org/wiki/Talk:Working_with_Git
and doing

Ugh, hit the wrong key:
and executing
`git config --global diff.context.command "git-external-diff"`
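The wiki's shell wrapper is not reproduced here, but the idea can be sketched in C under stated assumptions: GNU diff (for --label) is on the PATH, the system is POSIX (fork/exec, so this version does not cover Windows), and git calls the driver command with its documented seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode). The function names and the GIT_CONTEXT_DIFF_MAIN macro are hypothetical. The driver would be selected per file with a .gitattributes line such as "*.c diff=context" together with the git config command quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Program name + git's seven arguments. */
enum { GIT_DIFF_ARGC = 8 };

/* Hypothetical helper: fill out[0..8] with
 *   diff -c --label a/PATH --label b/PATH OLD-FILE NEW-FILE
 * label_a and label_b are caller buffers of size cap. Returns -1 if git
 * did not pass seven arguments (it passes only the path for unmerged
 * files) or if the labels would not fit. */
int build_context_diff_argv(int argc, char *const argv[],
                            char *label_a, char *label_b, size_t cap,
                            const char *out[9])
{
    if (argc != GIT_DIFF_ARGC || strlen(argv[1]) + 3 > cap)
        return -1;
    snprintf(label_a, cap, "a/%s", argv[1]);
    snprintf(label_b, cap, "b/%s", argv[1]);
    out[0] = "diff";
    out[1] = "-c";
    out[2] = "--label";
    out[3] = label_a;
    out[4] = "--label";
    out[5] = label_b;
    out[6] = argv[2];           /* old-file */
    out[7] = argv[5];           /* new-file */
    out[8] = NULL;
    return 0;
}

/* Run diff -c and return 0 whenever diff itself worked. */
int run_context_diff(int argc, char *const argv[])
{
    char label_a[4096], label_b[4096];
    const char *args[9];

    if (build_context_diff_argv(argc, argv, label_a, label_b,
                                sizeof label_a, args) != 0)
        return 0;               /* nothing to show; do not stop git */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        execvp("diff", (char *const *) args);
        perror("execvp diff");
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return 1;
    }
    /* diff exits 0 = identical, 1 = different, >1 = trouble. */
    return (WIFEXITED(status) && WEXITSTATUS(status) <= 1) ? 0 : 1;
}

#ifdef GIT_CONTEXT_DIFF_MAIN
int main(int argc, char *argv[])
{
    return run_context_diff(argc, argv);
}
#endif
```

Compiled with -DGIT_CONTEXT_DIFF_MAIN, this becomes a standalone wrapper. It reports success even when the files differ, because diff exits with status 1 in that case and git has traditionally treated a non-zero exit from an external diff command as fatal.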

Andres

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#20)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Andres Freund <andres@anarazel.de> writes:

You can define that a subset (or all) files use a specific "diff driver"
in the repository - unfortunately the definition of that driver has to
be done locally. Defining it currently involves installing a wrapper
like the one on http://wiki.postgresql.org/wiki/Talk:Working_with_Git
and doing

Ugh, hit the wrong key:
and executing
`git config --global diff.context.command "git-external-diff"`

Okay, so it will more or less have to be a local option. That's okay
... all I really insist on is being able to get a readable diff out
of it. I grant that not everyone may have the same opinion about
what's readable.

regards, tom lane

#22Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#20)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On 05/25/2009 07:58 PM, Andres Freund wrote:

On 05/25/2009 07:53 PM, Andres Freund wrote:

On 05/25/2009 07:31 PM, Tom Lane wrote:

David Fetter<david@fetter.org> writes:

On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:

If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?

If we were to put it into a repository config file, that would more or
less have the effect of enforcing a project style for diffs, no?

Yes and no.
You can define that a subset (or all) files use a specific "diff driver"
in the repository - unfortunately the definition of that driver has to
be done locally. Defining it currently involves installing a wrapper
like the one on http://wiki.postgresql.org/wiki/Talk:Working_with_Git
and doing

Ugh, hit the wrong key:
and executing
`git config --global diff.context.command "git-external-diff"`

The content of the former page is now merged into the main page about
git http://wiki.postgresql.org/wiki/Working_with_Git and the notes on
the Talk: page are deleted.

Andres

#23Peter Eisentraut
peter_e@gmx.net
In reply to: Andres Freund (#20)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On Monday 25 May 2009 20:58:59 Andres Freund wrote:

and executing
`git config --global diff.context.command "git-external-diff"`

We already knew that you could do it with a wrapper. But that isn't the
answer we were looking for, because it will basically mean that 98% of casual
contributors will get it wrong, and it will probably not work very well on
Windows.

The goal is to get git-diff to do it itself.

#24Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#23)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Hi,

On 05/26/2009 01:39 PM, Peter Eisentraut wrote:

On Monday 25 May 2009 20:58:59 Andres Freund wrote:

and executing
`git config --global diff.context.command "git-external-diff"`

We already knew that you could do it with a wrapper. But that isn't the
answer we were looking for, because it will basically mean that 98% of casual
contributors will get it wrong, and it will probably not work very well on
Windows.

It works on Windows, Linux and Solaris (that's what I could get my hands
on without much bother). I tested it: it works with any non-ancient
version of git ("ancient" in the sense that git at that time didn't work
properly on Windows anyway).
And providing a ready-to-download 5-line wrapper surely makes it easier
than figuring out how to write one from the git man pages.

Also, it at least lets those who prefer context diffs use them easily
with git, and they seem to be the ones who prefer them most.

The goal is to get git-diff to do it itself.

I do not disagree.

Andres

#25Greg Stark
greg.stark@enterprisedb.com
In reply to: Andres Freund (#24)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

I'll repeat my suggestion that everyone pooh-poohed: we can have the
mail list filters recognize patches, run filterdiff on them with our
preferred options, and attach the result as an additional attachment
(or link to some web directory).

I think it would be simple to do and would be happy to give it a go if
I can get the necessary access.

It doesn't solve *all* the problems since the committer still needs a
unified diff if he wants to take advantage of git's merge abilities.

I think this is actually all a red herring since it's pretty easy for
the reviewer to run filterdiff anyways. But having things be automatic
is still always easier than not.

--
Greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#15)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Tom Lane wrote on Mon, 25 May 2009 at 13:07 -0400:

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

Tom Lane wrote on Sun, 24 May 2009 at 18:46 -0400:

In any case, the barriers to implementing 8.3-style hash indexes in 8.4
are pretty huge: you'd need to duplicate not only the hash AM code, but
also all the hash functions, and therefore all of the hash pg_amop and
pg_amproc entries.

I'm not sure I need to duplicate the functions. In general yes, but it
seems to me that the hash index changes did not change the functions'
behavior, so they could be shared for now.

No, the behavior of the hash functions themselves changed during 8.4.
Twice, even:

Hmm, I missed that. :(

So as far as I can see, you need completely separate copies of both
hash_any() and the SQL-level functions that call it. I'm not really
seeing that the proposed refactoring makes this any easier. You might
as well just copy-and-paste all that old code into a separate set of
files, and not worry about what is in access/hash.h.

Yeah, in that case everything has to be duplicated, which is not a big
deal compared with doing the same amount of work for GIN. Then I can
start with GIN.

The only advantage of the refactoring is then nicer code.

thanks Zdenek

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#25)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Greg Stark <greg.stark@enterprisedb.com> writes:

I'll repeat my suggestion that everyone pooh-poohed: we can have the
mail list filters recognize patches, run filterdiff on them with our
preferred options, and attach the result as an additional attachment
(or link to some web directory).

The argument that was made at the developer meeting is that the
preferred way of working will be to apply the submitted patch in one's
local git repository, and then do any needed editorialization as a
second patch on top of it. So the critical need as I see it is to be
able to see a -c version of a patch-in-progress (ie, diff current
working state versus some previous committed state). Readability of the
patch as-submitted is useful for quick eyeball checks, but I think all
serious reviewing is going to be done on local copies.

I think this is actually all a red herring since it's pretty easy for
the reviewer to run filterdiff anyways.

I don't trust filterdiff one bit :-(

regards, tom lane

#28Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#27)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

On Tue, May 26, 2009 at 10:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <greg.stark@enterprisedb.com> writes:

I'll repeat my suggestion that everyone pooh-poohed: we can have the
mail list filters recognize patches, run filterdiff on them with our
preferred options, and attach the result as an additional attachment
(or link to some web directory).

The argument that was made at the developer meeting is that the
preferred way of working will be to apply the submitted patch in one's
local git repository, and then do any needed editorialization as a
second patch on top of it.  So the critical need as I see it is to be
able to see a -c version of a patch-in-progress (ie, diff current
working state versus some previous committed state).  Readability of the
patch as-submitted is useful for quick eyeball checks, but I think all
serious reviewing is going to be done on local copies.

I think this is actually all a red herring since it's pretty easy for
the reviewer to run filterdiff anyways.

I don't trust filterdiff one bit :-(

For any particular reason, or just natural skepticism?

I believe there have been some wild-eyed claims tossed around in this
space previously that unified diffs don't provide all the same
information as context diffs, which is flatly false. AIUI, the reason
for the name "unified diff" is that it combines, or unifies, the
"before" and "after" versions of the code into a single chunk. The
nice thing about this is that when you have a bunch of small changes
in a file, you don't end up with all of the surrounding lines repeated
in both the "before" and "after" sections. If you change four
consecutive lines and run a unified diff, you end up with 4 +s, 4 -s,
and 6 lines of context (3 before and 3 after), for a total of 14
lines. If you run a context diff, you end up with 4 !s and 6 lines of
context in the before section and the same in the after section, for a
total of 20 lines, 6 of which are duplicated. This means that in many
cases you can see what's changed without having to page up and down in
the diff.
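The line counts above can be checked with a small sketch. It uses Python's standard difflib module, which is our choice of tool and not something used in this thread. Its file headers differ slightly from GNU diff's, but the hunk bodies use the same " ", "-", "+" (unified) and "  ", "!" (context) markers. The ten-line input is hypothetical: lines 4 through 7 are rewritten, leaving the default three lines of context on each side.

```python
import difflib

# Hypothetical 10-line file: lines 4-7 are rewritten, leaving exactly
# three unchanged lines on each side (difflib's default context, n=3).
old = [f"line {i}\n" for i in range(1, 11)]
new = old[:3] + [f"changed {i}\n" for i in range(4, 8)] + old[7:]


def hunk_body(diff):
    """Drop file headers and hunk markers; keep only the content lines."""
    markers = ("---", "+++", "***", "@@")
    return [line for line in diff if not line.startswith(markers)]


unified = hunk_body(difflib.unified_diff(old, new, "old", "new"))
context = hunk_body(difflib.context_diff(old, new, "old", "new"))

# Unified: 3 context + 4 '-' + 4 '+' + 3 context lines.
print(len(unified))  # 14
# Context: (3 context + 4 '!' + 3 context) in each of its two sections.
print(len(context))  # 20
# Unchanged lines that the context diff prints once per section.
print(sum(line.startswith("  ") for line in context) // 2)  # 6
```

The duplication comes from the context format printing the surrounding unchanged lines once in the "before" section and again in the "after" section.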

The not-so-nice thing about unified diffs is that when there is a huge
hunk of code that's changed, there are probably by chance a few
identical lines buried in there, like " }", so the + and - lines
end up mixed together in a way that wouldn't happen in a context diff
(which would turn the whole thing into two big "!" sections). It's no
problem for a machine to understand this, but it's hard to read for a
human being.
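The interleaving problem can be reproduced the same way. This is again a sketch with Python's difflib, using a hypothetical rewrite in which every line changes except a lone closing brace. The matcher keeps "}" as common context, so the unified diff alternates "-" and "+" lines around it. The context diff keeps all old lines together in its first section and all new lines in its second, with the brace shown as unchanged context in each.

```python
import difflib

# Hypothetical rewrite where the only line the old and new code share
# is a lone closing brace.
old = ["alpha();\n", "}\n", "beta();\n"]
new = ["gamma();\n", "}\n", "delta();\n"]


def hunk_body(diff):
    """Drop file headers and hunk markers; keep only the content lines."""
    markers = ("---", "+++", "***", "@@")
    return [line.rstrip("\n") for line in diff if not line.startswith(markers)]


unified = hunk_body(difflib.unified_diff(old, new))
context = hunk_body(difflib.context_diff(old, new))

# Unified: old and new lines alternate around the shared brace.
print("\n".join(unified))
# Context: all old lines first, then all new lines, with the brace
# shown as unchanged context in each section.
print("\n".join(context))
```

With three lines this is still easy to follow. The same alternation repeated across a large rewritten block is what makes big -u hunks harder to read.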

I haven't personally verified the filterdiff code, but the
transformation is pretty mechanical so I'm not sure why we should
believe that it hasn't been implemented correctly without some
evidence along those lines.

I don't think there's any way to make anyone 100% happy here. I
personally prefer unified diffs, so when I'm reviewing a complex patch
formatted as a context diff I typically apply it and then run a
unified diff using git. When I'm submitting a patch I use a unified
diff to check my work and then convert it to a context diff for
submission. On the other hand, I assume that if you were presented
with a complex unified diff, you would just apply it and then run a
context diff to review it. Since, as you say, serious reviewing will
be done on local copies anyway, I really don't see the point of
worrying too much about how they're submitted to the mailing list.
Let's just tell everyone to keep using context diffs as they have been
doing, and if anyone doesn't then let's THROW THEIR PATCH ON THE
DUST-HEAP OF HISTORY AND HAUL THEM OUT TO BE DRAWN AND QUARTERED...
er, um, I mean, ask them not to do it that way the next time.

If there's an issue here that's worth getting worked up about, I'm not
seeing it.

...Robert

#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#28)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, May 26, 2009 at 10:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't trust filterdiff one bit :-(

For any particular reason, or just natural skepticism?

IIRC it was demonstrated to be broken the last time it was proposed
as a solution to our problems. Maybe it's been fixed since then, but
I don't have any confidence in it, since evidently it's not been stress
tested very hard.

I believe there have been some wild-eyed claims tossed around in this
space previously that unified diffs don't provide all the same
information as context diffs, which is flatly false.

No, the gripe has always been just that they're less readable for
nontrivial changes.

The not-so-nice thing about unified diffs is that when there is a huge
hunk of code that's changed, there are probably by chance a few
identical lines buried in there, like " }", so the + and - lines
end up mixed together in a way that wouldn't happen in a context diff
(which would turn the whole thing into two big "!" sections). It's no
problem for a machine to understand this, but it's hard to read for a
human being.

Exactly. Even without identical lines, I find that the old and new code
gets intermixed in easily-confusing ways. -u is very readable for
isolated single-line changes, but for anything larger, not so much.

regards, tom lane

#30Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#29)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, May 26, 2009 at 10:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't trust filterdiff one bit :-(

For any particular reason, or just natural skepticism?

IIRC it was demonstrated to be broken the last time it was proposed
as a solution to our problems. Maybe it's been fixed since then, but
I don't have any confidence in it, since evidently it's not been stress
tested very hard.

I think you're probably confusing it with interdiff. I've had the
latter fail several times (and I haven't really used it all that much),
but I've never seen filterdiff make a mistake even though I use it
frequently.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#30)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

IIRC it was demonstrated to be broken the last time it was proposed
as a solution to our problems. Maybe it's been fixed since then, but
I don't have any confidence in it, since evidently it's not been stress
tested very hard.

I think you're probably confusing it with interdiff.

No, because I never heard of interdiff before. Checking the archives,
the discussion I was remembering was definitely about filterdiff, but
the rap on it was undocumented (so maybe "demonstrated" is too harsh):

http://archives.postgresql.org/pgsql-hackers/2007-10/msg01243.php

regards, tom lane

#32Greg Stark
greg.stark@enterprisedb.com
In reply to: Tom Lane (#31)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Uhm, the rap you quoted was ambiguous, but I read it as referring to the
ability I described of viewing the difference between two patches,
which I didn't name but which is in fact interdiff.

--
Greg

On 26 May 2009, at 19:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:


Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

IIRC it was demonstrated to be broken the last time it was proposed
as a solution to our problems. Maybe it's been fixed since then, but
I don't have any confidence in it, since evidently it's not been stress
tested very hard.

I think you're probably confusing it with interdiff.

No, because I never heard of interdiff before. Checking the archives,
the discussion I was remembering was definitely about filterdiff, but
the rap on it was undocumented (so maybe "demonstrated" is too harsh):

http://archives.postgresql.org/pgsql-hackers/2007-10/msg01243.php

regards, tom lane

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#32)
Re: [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

Greg Stark <greg.stark@enterprisedb.com> writes:

On 26 May 2009, at 19:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:

http://archives.postgresql.org/pgsql-hackers/2007-10/msg01243.php

Uhm, the rap you quoted was ambiguous, but I read it as referring to the
ability I described of viewing the difference between two patches,
which I didn't name but which is in fact interdiff.

[ squint... ] Hmm, maybe you're right. I see how it could be read
that way, anyway.

regards, tom lane