Document hashtext() and Friends?

Started by David E. Wheeler almost 14 years ago, 9 messages
#1 David E. Wheeler
david@justatheory.com

Hackers,

Is there a reason that hashtext() and friends are not documented? Given that they’re likely to be used more and more for partitioning and sharding, I think it would be useful to do so, starting with something like this. Comments?

*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1557,1562 ****
--- 1557,1577 ----
        <row>
         <entry>
          <indexterm>
+          <primary>hashtext</primary>
+         </indexterm>
+         <literal><function>hashtext(<parameter>string</parameter>)</function></literal>
+        </entry>
+        <entry><type>int</type></entry>
+        <entry>
+         Generate a hash value for string.
+        </entry>
+        <entry><literal>hashtext('greetings, human')</literal></entry>
+        <entry><literal>-1132466231</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>left</primary>
          </indexterm>
          <literal><function>left(<parameter>str</parameter> <type>text</type>,

Best

David

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#1)
Re: Document hashtext() and Friends?

"David E. Wheeler" <david@justatheory.com> writes:

> Is there a reason that hashtext() and friends are not documented?

Yes. They are internal functions that exist for the convenience of the
system, not for users. We've discussed this before, and decided that
we don't want people to rely on them continuing to have exactly the
current behavior. One example of a possible future change is to widen
the results from 4 bytes to 8.

regards, tom lane

#3 Michael Glaesemann
grzm@seespotcode.net
In reply to: Tom Lane (#2)
Re: Document hashtext() and Friends?

On Feb 21, 2012, at 15:01, Tom Lane wrote:

> "David E. Wheeler" <david@justatheory.com> writes:
>
>> Is there a reason that hashtext() and friends are not documented?
>
> Yes. They are internal functions that exist for the convenience of the
> system, not for users. We've discussed this before, and decided that
> we don't want people to rely on them continuing to have exactly the
> current behavior. One example of a possible future change is to widen
> the results from 4 bytes to 8.

And hashtext *has* changed across versions, which is why Peter Eisentraut published a version-independent hash function library: https://github.com/petere/pgvihash

Michael Glaesemann
grzm seespotcode net
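
As a rough illustration of what a version-independent hash buys for sharding, here is a minimal Python sketch. It uses 32-bit FNV-1a as a stand-in; this is not the algorithm behind hashtext(), and not necessarily what pgvihash implements. The names fnv1a_32 and shard_for are hypothetical. The only point is that an application-owned hash with a fixed, published definition keeps key-to-shard assignments stable across server upgrades.

```python
# Sketch only: FNV-1a (32-bit) as a stand-in for a version-independent hash.
# This is NOT PostgreSQL's hashtext(); function names here are hypothetical.

FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data as an unsigned integer."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the result within 32 bits
    return h


def shard_for(key: str, num_shards: int) -> int:
    """Map a text key to a shard number in the range [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return fnv1a_32(key.encode("utf-8")) % num_shards
```

Calling shard_for("greetings, human", 16) gives the same shard number on every run and every platform, which is the property sharding needs and which an internal function does not promise.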

#4 Peter Geoghegan
peter@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Document hashtext() and Friends?

On 21 February 2012 20:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "David E. Wheeler" <david@justatheory.com> writes:
>
>> Is there a reason that hashtext() and friends are not documented?
>
> Yes.  They are internal functions that exist for the convenience of the
> system, not for users.  We've discussed this before, and decided that
> we don't want people to rely on them continuing to have exactly the
> current behavior.  One example of a possible future change is to widen
> the results from 4 bytes to 8.

My pg_stat_statements normalisation patch actually extends the
underlying hash_any() function to support 8 byte results, exactly as
currently anticipated by comments above that function, while supplying
a compatibility macro that is used by existing hash_any() clients.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#5 David E. Wheeler
david@justatheory.com
In reply to: Michael Glaesemann (#3)
Re: Document hashtext() and Friends?

On Feb 21, 2012, at 12:11 PM, Michael Glaesemann wrote:

> And hashtext *has* changed across versions, which is why Peter Eisentraut published a version-independent hash function library: https://github.com/petere/pgvihash

Yes, Marko wrote one, too:

https://github.com/markokr/pghashlib

But as I’m about to build a system that is going to have many billions of nodes, I could use a variant that returns a bigint. Anyone got a pointer to something like that?

Thanks,

David
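
For readers after the same thing, a hash that fits a bigint column can be sketched in a few lines of Python. This assumes 64-bit FNV-1a as the algorithm, which is a stand-in chosen for brevity and not anything PostgreSQL or the libraries above are confirmed to use. The names fnv1a_64, to_bigint, and hash_bigint are hypothetical. Because bigint is signed, the unsigned 64-bit result is reinterpreted as two's complement.

```python
# Sketch only: a 64-bit hash mapped into the signed bigint range.
# FNV-1a is an assumed stand-in; function names here are hypothetical.

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of data as an unsigned integer."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the result within 64 bits
    return h


def to_bigint(u64: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return u64 - (1 << 64) if u64 >= (1 << 63) else u64


def hash_bigint(key: str) -> int:
    """Hash a text key to a value in the signed 64-bit (bigint) range."""
    return to_bigint(fnv1a_64(key.encode("utf-8")))
```

Roughly half of all keys will come out negative, so a modulo-based shard lookup on top of this should normalise the sign first.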

#6 David E. Wheeler
david@justatheory.com
In reply to: David E. Wheeler (#5)
Re: Document hashtext() and Friends?

On Feb 21, 2012, at 12:14 PM, David E. Wheeler wrote:

>> And hashtext *has* changed across versions, which is why Peter Eisentraut published a version-independent hash function library: https://github.com/petere/pgvihash
>
> Yes, Marko wrote one, too:
>
> https://github.com/markokr/pghashlib

Oh, and these are great extensions for PGXN. Any chance of seeing them submitted soon, Peter and Marko?

Thanks,

David

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#4)
Re: Document hashtext() and Friends?

Peter Geoghegan <peter@2ndquadrant.com> writes:

> My pg_stat_statements normalisation patch actually extends the
> underlying hash_any() function to support 8 byte results,

... er, what? That seems rather out of scope for that patch,
not to mention unnecessary.

regards, tom lane

#8 ktm@rice.edu
ktm@rice.edu
In reply to: David E. Wheeler (#5)
Re: Document hashtext() and Friends?

On Tue, Feb 21, 2012 at 12:14:03PM -0800, David E. Wheeler wrote:

> On Feb 21, 2012, at 12:11 PM, Michael Glaesemann wrote:
>
>> And hashtext *has* changed across versions, which is why Peter Eisentraut published a version-independent hash function library: https://github.com/petere/pgvihash
>
> Yes, Marko wrote one, too:
>
> https://github.com/markokr/pghashlib
>
> But as I’m about to build a system that is going to have many billions of nodes, I could use a variant that returns a bigint. Anyone got a pointer to something like that?
>
> Thanks,
>
> David

Hi David,

The existing hash_any() function can return a 64-bit hash instead of the current
32-bit hash by returning both the b and c values, rather than just the c value as
it does now, per the comment at the start of the function. It sounded like
Peter had already done this in his pg_stat_statements normalization patch, but I
could not find it.

Regards,
Ken
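
The widening described above amounts to packing two 32-bit words into one 64-bit value. The Python sketch below shows only that packing step; it does not reproduce hash_any()'s mixing. The arguments b and c are placeholders for the function's final internal state words, the choice of b as the high half is an assumption, and combine_words and low_word are hypothetical names.

```python
# Sketch only: packing two 32-bit state words into a single 64-bit result.
# b and c are placeholder inputs; which word goes in the high half is an
# assumption, and the function names are hypothetical.

MASK32 = 0xFFFFFFFF


def combine_words(b: int, c: int) -> int:
    """Pack b into the high 32 bits and c into the low 32 bits."""
    return ((b & MASK32) << 32) | (c & MASK32)


def low_word(h64: int) -> int:
    """Recover the low 32 bits, i.e. the c word packed by combine_words."""
    return h64 & MASK32
```

With c kept in the low half, an existing 32-bit caller could still get its old value by truncating the 64-bit result, which is one way a compatibility wrapper might be arranged.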

#9 Peter Geoghegan
peter@2ndquadrant.com
In reply to: Tom Lane (#7)
Re: Document hashtext() and Friends?

On 21 February 2012 20:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Peter Geoghegan <peter@2ndquadrant.com> writes:
>
>> My pg_stat_statements normalisation patch actually extends the
>> underlying hash_any() function to support 8 byte results,
>
> ... er, what?  That seems rather out of scope for that patch,
> not to mention unnecessary.

Well, assuming that you deem a uint64 query_id to be necessary, and
based on your earlier comments I take it that you do, that seemed like
the most natural way of going about getting such a value, particularly
since this change is anticipated by the comments above the function.

Of course, any further input you can give on that patch would be most
appreciated. I'm particularly eager to resolve the problems with core
infrastructure (such as that apparent bug with some Const locations),
so that we can at the very least be sure that the community won't have
to wait for the release of 9.3 at the earliest before having a
normalisation capability with stat_statements.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services