dynahash API questions

Started by Neil Conwayabout 19 years ago3 messages
#1Neil Conway
neilc@samurai.com

I'd like to use the dynahash package to construct a hash table whose
keys are an array of k Datums. The value of "k" for a particular hash
table is known at hash_create() time. This raises two questions:

(1) How should variably-sized keys be represented?

The simplest way would be just a k * sizeof(Datum) block of memory.
Unfortunately, the dynahash package requires the key to be a prefix of
the full hash entry, so a variably-sized key would mean that we wouldn't
know the right offset to use for accesses to fields of a hash entry
struct until runtime. Maybe you could make this ugly by hiding the
offset calculation in macros, but it still seems gross. An alternative
is to just make the hash key a pointer to an array of Datums:

struct HashEntry { Datum *key; };

which makes the hash key constant-sized, but requires an extra level of
indirection and therefore a bit of legwork when writing the custom
copying, comparison and hashing functions. Anyone see a better way of
doing this?

(2) How should the hashing and comparison functions be implemented?

We can get the right behavior by using the appropriate opclass members.
To call the corresponding fmgr functions, we need access to some state
(FmgrInfo, etc.) within the hashing and comparison functions. However,
the signatures for these function pointers are:

typedef uint32 (*HashValueFunc) (const void *key, Size keysize);
typedef int (*HashCompareFunc) (const void *key1, const void *key2,
Size keysize);

i.e. there is no provision for passing additional data to either of
these callbacks. It would be possible to workaround this by including a
pointer to the additional data in the key itself:

struct HashEntryKey
{
Datum *key_values;
void *data_ptr;
};

struct HashEntry { HashEntryKey key; };

and then making sure to set the data_ptr correctly before calling
hash_search(). This works, but obviously it's pretty ugly. However, the
simple solution of adding a void pointer argument to these function
pointers would uglify all the existing hashing and comparison functions.
Does anyone have any ideas for a better solution?

-Neil

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#1)
Re: dynahash API questions

Neil Conway <neilc@samurai.com> writes:

... This works, but obviously it's pretty ugly. However, the
simple solution of adding a void pointer argument to these function
pointers would uglify all the existing hashing and comparison functions.
Does anyone have any ideas for a better solution?

The solution that's been used so far is a static variable known to
the caller and the hash/comparison functions; see for instance
CurTupleHashTable in executor/execGrouping.c. At some point there
might be enough of these to justify adding a void pointer argument
... not sure that we're there yet though.

(Right offhand it sounds like you might be reinventing execGrouping.c
--- what is your application exactly?)

regards, tom lane

#3Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#2)
Re: dynahash API questions

On Sat, 2006-11-25 at 01:36 -0500, Tom Lane wrote:

The solution that's been used so far is a static variable known to
the caller and the hash/comparison functions; see for instance
CurTupleHashTable in executor/execGrouping.c

Ah, fair enough -- that should work.

(Right offhand it sounds like you might be reinventing execGrouping.c
--- what is your application exactly?)

I'm playing around with writing a memoization facility as a UDF:
cache(f, a1, a2, ...) = f(a1, a2, ...), except that cache() keeps an
internal hash of f()'s return values and uses that to avoid calling f()
when possible. Hence the need for a hash table whose keys are (a1,
a2, ...).

-Neil