Hash Functions
/messages/by-id/CAMp0ubeo3fzzEfiE1vmc1AJkkRPxLnZQoOASeu6cCcco-c+9zw@mail.gmail.com
In that thread, I pointed out some important considerations for the
hash functions themselves. This is a follow-up, after I looked more
carefully.
1. The hash functions as they exist today aren't portable -- they can
return different results on different machines. That means using these
functions for hash partitioning would yield different contents for the
same partition on different architectures (and that's bad, considering
they are logical partitions and not some internal detail).
The core hashing algorithm is based on 32-bit chunks, so it's only
portable if the input is an int32 (or array of int32s). That's not
true for varchar (etc.) or numeric (which is based on an array of
int16s). There is a hack for int8, but see issue #2 below.
We could try to mark portable vs. unportable hash functions, but it
seems quite valuable to be able to logically partition on varchar, so
I think we should have some portable answer there. Another annoyance
is that we would have to assume container types are unportable, or
make them inherit the portability of the underlying type's hash
function.
We could revert 26043592 (copied Tom because that was his patch) to
make hash_any() go back to being portable -- do we know what that
speedup actually was? Maybe the benefit is smaller on newer
processors? Another option is to try to do some combination of
byteswapping and word-at-a-time, which might be better than
byte-at-a-time if the byteswapping is done with a native instruction.
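As a sketch of that last idea (untested; it assumes a pg_bswap32()
helper or compiler builtin that compiles to a native byte-swap
instruction):

static uint32
fetch_chunk(const unsigned char *p)
{
    uint32  w;

    memcpy(&w, p, sizeof(w));       /* word-at-a-time load */
#ifdef WORDS_BIGENDIAN
    w = pg_bswap32(w);              /* normalize to one byte order */
#endif
    return w;
}

The mixing steps would then see identical 32-bit chunks regardless of
the machine's byte order, and little-endian machines pay nothing.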
2. hashint8() is a little dubious. If the input is positive, it XORs
the high bits; if negative, it XORs the complement of the high bits.
That works for compatibility with hashint2/4, but it does not produce
good hash properties[1]. I prefer that we either (a) upcast int2/4 to
int8 before hashing and then hash the whole 64 bits; or (b) if the
value is within range, downcast to int4, otherwise hash the whole 64
bits.
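A minimal sketch of option (b), reusing the existing hash_uint32() and
hash_any() helpers (untested, and the 64-bit branch would still need
the portability fix from issue #1):

Datum
hashint8(PG_FUNCTION_ARGS)
{
    int64   val = PG_GETARG_INT64(0);

    /* Values in int4 range hash exactly as hashint2/4 do. */
    if (val >= PG_INT32_MIN && val <= PG_INT32_MAX)
        return hash_uint32((uint32) (int32) val);

    /* Out-of-range values get all 64 bits hashed. */
    return hash_any((unsigned char *) &val, sizeof(val));
}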
3. We should downcast numeric to int4/8 before hashing if it's in
range, so that it can be a part of the same opfamily as int2/4/8.
4. text/varchar should use hashbpchar() so that they can be part of
the same opfamily. Trailing blanks seem unlikely to be significant for
most real-world data anyway.
5. For salts[2], I don't think it's too hard to support them in an
optional way. We just allow the function to be a two-argument function
with a default. Places that care about specifying the salt may do so
if the function has pronargs==2, otherwise it just gets the default
value. If we have salts, I don't think having 64-bit hashes is very
important. If we run out of bits, we can just salt the hash function
differently and get more hash bits. This is not urgent and I believe
we should just implement salts when and if some algorithm needs them.
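As a sketch, the two-argument form might look like this (hypothetical:
hash_any_seeded() here stands in for a hash_any() variant with a
settable initial state, which doesn't exist today):

Datum
hashint4_salted(PG_FUNCTION_ARGS)
{
    uint32  val = (uint32) PG_GETARG_INT32(0);
    uint32  salt = (uint32) PG_GETARG_INT32(1); /* default 0 via pg_proc */

    /* The salt only perturbs the hash's initial state. */
    return hash_any_seeded((unsigned char *) &val, sizeof(val), salt);
}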
Regards,
Jeff Davis
[1]: You can see a kind of mirroring in the hash outputs, indicating bad mixing:
postgres=# select hashint8((2^32-100)::int8);
hashint8
----------
41320869
(1 row)
postgres=# select hashint8((-2^32-100)::int8);
hashint8
----------
41320869
(1 row)
postgres=# select hashint8((2^32-101)::int8);
hashint8
-------------
-1214482326
(1 row)
postgres=# select hashint8((-2^32-101)::int8);
hashint8
-------------
-1214482326
(1 row)
[2]: Not sure I'm using the term "salt" properly. I really just mean a
way to affect the initial state, which I think is good enough for our
purposes.
On Fri, May 12, 2017 at 12:08 AM, Jeff Davis <pgsql@j-davis.com> wrote:
1. The hash functions as they exist today aren't portable -- they can
return different results on different machines. That means using these
functions for hash partitioning would yield different contents for the
same partition on different architectures (and that's bad, considering
they are logical partitions and not some internal detail).
Hmm, yeah, that is bad.
We could revert 26043592 (copied Tom because that was his patch) to
make hash_any() go back to being portable -- do we know what that
speedup actually was? Maybe the benefit is smaller on newer
processors? Another option is to try to do some combination of
byteswapping and word-at-a-time, which might be better than
byte-at-a-time if the byteswapping is done with a native instruction.
With regard to portability, I find that in 2009, according to Tom, we
had "already agreed" that it was dispensible:
/messages/by-id/23832.1234214526@sss.pgh.pa.us
I was not able to find where that was agreed. On performance, I found this:
/messages/by-id/20081104202655.GP18362@it.is.rice.edu
It says at the end: "The average time to reindex the table using our
current hash_any() without the separate mix()/final() was 1696ms and
1482ms with the separate mix()/final() stages giving almost 13% better
performance for this stupid metric."
5. For salts[2], I don't think it's too hard to support them in an
optional way. We just allow the function to be a two-argument function
with a default. Places that care about specifying the salt may do so
if the function has pronargs==2, otherwise it just gets the default
value. If we have salts, I don't think having 64-bit hashes is very
important. If we run out of bits, we can just salt the hash function
differently and get more hash bits. This is not urgent and I believe
we should just implement salts when and if some algorithm needs them.
The potential problem with that is that the extra argument might slow
down the hash functions enough to matter. Argument unpacking is not
free, and the hashing logic itself will get more complex.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On May 12, 2017 10:05:56 AM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, May 12, 2017 at 12:08 AM, Jeff Davis <pgsql@j-davis.com> wrote:
1. The hash functions as they exist today aren't portable -- they can
return different results on different machines. That means using these
functions for hash partitioning would yield different contents for the
same partition on different architectures (and that's bad, considering
they are logical partitions and not some internal detail).
Hmm, yeah, that is bad.
Given that a lot of data types have an architecture-dependent representation, it seems somewhat unrealistic and expensive to have a hard rule to keep them architecture-agnostic. And if that's not guaranteed, then I'm doubtful it makes sense as a soft rule either.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Fri, May 12, 2017 at 1:12 PM, Andres Freund <andres@anarazel.de> wrote:
Given that a lot of data types have an architecture-dependent representation, it seems somewhat unrealistic and expensive to have a hard rule to keep them architecture-agnostic. And if that's not guaranteed, then I'm doubtful it makes sense as a soft rule either.
That's a good point, but the flip side is that, if we don't have such
a rule, a pg_dump of a hash-partitioned table on one architecture
might fail to restore on another architecture. Today, I believe that,
while the actual database cluster is architecture-dependent, a pg_dump
is architecture-independent. Is it OK to lose that property?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, May 12, 2017 at 1:12 PM, Andres Freund <andres@anarazel.de> wrote:
Given that a lot of data types have an architecture-dependent representation, it seems somewhat unrealistic and expensive to have a hard rule to keep them architecture-agnostic. And if that's not guaranteed, then I'm doubtful it makes sense as a soft rule either.
That's a good point, but the flip side is that, if we don't have such
a rule, a pg_dump of a hash-partitioned table on one architecture
might fail to restore on another architecture. Today, I believe that,
while the actual database cluster is architecture-dependent, a pg_dump
is architecture-independent. Is it OK to lose that property?
I'd vote that it's not, which means that this whole approach to hash
partitioning is unworkable. I agree with Andres that demanding hash
functions produce architecture-independent values will not fly.
Maintaining such a property for float8 (and the types that depend on it)
might be possible if you believe that nobody ever uses anything but IEEE
floats, but we've never allowed that as a hard assumption before.
Even architecture dependence isn't the whole scope of the problem.
Consider for example dumping a LATIN1-encoded database and trying
to reload it into a UTF8-encoded database. People will certainly
expect that to be possible, and do you want to guarantee that the
hash of a text value is encoding-independent?
regards, tom lane
On 05/12/2017 10:17 AM, Robert Haas wrote:
On Fri, May 12, 2017 at 1:12 PM, Andres Freund wrote:
Given that a lot of data types have an architecture-dependent
representation, it seems somewhat unrealistic and expensive to have
a hard rule to keep them architecture-agnostic. And if that's not
guaranteed, then I'm doubtful it makes sense as a soft rule either.
That's a good point, but the flip side is that, if we don't have
such a rule, a pg_dump of a hash-partitioned table on one
architecture might fail to restore on another architecture. Today, I
believe that, while the actual database cluster is
architecture-dependent, a pg_dump is architecture-independent. Is it
OK to lose that property?
Not from where I sit.
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On Fri, May 12, 2017 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'd vote that it's not, which means that this whole approach to hash
partitioning is unworkable. I agree with Andres that demanding hash
functions produce architecture-independent values will not fly.
If we can't produce architecture-independent hash values, then what's
the other option?
One alternative would be to change the way that we dump and restore
the data. Instead of dumping the data with the individual partitions,
dump it all out for the parent and let tuple routing sort it out at
restore time. Of course, this isn't very satisfying because now
dump-and-restore hasn't really preserved the state of the database;
indeed, you could make it into a hard failure by creating a foreign
key referencing a hash partition. After dump-and-restore,
the row ends up in some other partition and the foreign key can't be
recreated because the relationship no longer holds. This isn't
limited to foreign keys, either; similar problems could be created
with CHECK constraints or other per-table properties that can vary
between one child and another.
I basically think it's pretty futile to suppose that we can get away
with having a dump and restore move rows around between partitions
without having that blow up in some cases.
Maintaining such a property for float8 (and the types that depend on it)
might be possible if you believe that nobody ever uses anything but IEEE
floats, but we've never allowed that as a hard assumption before.
I don't know how standard that is. Is there any hardware that
anyone's likely to be using that doesn't? TBH, I don't really care if
support for obscure, nearly-dead platforms like VAX or whatever don't
quite work with hash-partitioned tables. In practice, PostgreSQL only
sorta works on that kind of platform anyway; there are far bigger
problems than this. On the other hand, if there are servers being
shipped in 2017 that don't use IEEE floats, that's another problem.
What about integers? I think we're already assuming two's-complement
arithmetic, which I think means that the only problem with making the
hash values portable for integers is big-endian vs. little-endian.
That sounds solvable-ish.
Even architecture dependence isn't the whole scope of the problem.
Consider for example dumping a LATIN1-encoded database and trying
to reload it into a UTF8-encoded database. People will certainly
expect that to be possible, and do you want to guarantee that the
hash of a text value is encoding-independent?
No, I think that's expecting too much. I'd be just fine telling
people that if you hash-partition on a text column, it may not load
into a database with another encoding. If you care about that, don't
use hash-partitioning, or don't change the encoding, or dump out the
partitions one by one and reload all the rows into the parent table
for re-routing, solving whatever problems come up along the way.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, May 12, 2017 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'd vote that it's not, which means that this whole approach to hash
partitioning is unworkable. I agree with Andres that demanding hash
functions produce architecture-independent values will not fly.
If we can't produce architecture-independent hash values, then what's
the other option?
Forget hash partitioning. There's no law saying that that's a good
idea and we have to have it. With a different set of constraints,
maybe we could do it, but I think the existing design decisions have
basically locked it out --- and I doubt that hash partitioning is so
valuable that we should jettison other desirable properties to get it.
One alternative would be to change the way that we dump and restore
the data. Instead of dumping the data with the individual partitions,
dump it all out for the parent and let tuple routing sort it out at
restore time. Of course, this isn't very satisfying because now
dump-and-restore hasn't really preserved the state of the database;
indeed, you could make it into a hard failure by creating a foreign
key referencing a hash partition.
Yeah, that isn't really appetizing at all. If we were doing physical
partitioning below the user-visible level, we could make it fly.
But the existing design makes the partition boundaries user-visible
which means we have to insist that the partitioning rule is immutable
(in the strongest sense of being invariant across all installations
you could possibly wish to transport data between).
Now, we already have some issues of that sort with range partitioning;
if say you try to transport data to another system with a different
sort ordering, you have probably got problems. But that issue already
existed with the old manual partitioning approach, ie if you have a
CHECK constraint like "(col >= 'foo' && col < 'gob')" then a transfer
to such a system could fail already. So I'm okay with that. But it
seems like hash partitioning is going to introduce new, exciting, and
hard-to-document-or-avoid portability gotchas. Do we really need
to go there?
regards, tom lane
On Fri, May 12, 2017 at 02:23:14PM -0400, Robert Haas wrote:
What about integers? I think we're already assuming two's-complement
arithmetic, which I think means that the only problem with making the
hash values portable for integers is big-endian vs. little-endian.
That sounds solvable-ish.
xxhash produces identical hashes on big-endian and little-endian
machines.
https://github.com/Cyan4973/xxHash
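For example, a minimal use of its seeded API (a sketch; xxhash.h comes
from the repository above, with buf/len standing in for the bytes to
hash):

#include <stddef.h>
#include "xxhash.h"

/* Returns the same value on big- and little-endian machines; the
 * seed argument could double as the "salt" discussed upthread. */
unsigned int
portable_hash(const void *buf, size_t len, unsigned int seed)
{
    return XXH32(buf, len, seed);
}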
Regards,
Ken
On Fri, May 12, 2017 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah, that isn't really appetizing at all. If we were doing physical
partitioning below the user-visible level, we could make it fly.
But the existing design makes the partition boundaries user-visible
which means we have to insist that the partitioning rule is immutable
(in the strongest sense of being invariant across all installations
you could possibly wish to transport data between).
I think you're right.
Now, we already have some issues of that sort with range partitioning;
if say you try to transport data to another system with a different
sort ordering, you have probably got problems. But that issue already
existed with the old manual partitioning approach, ie if you have a
CHECK constraint like "(col >= 'foo' && col < 'gob')" then a transfer
to such a system could fail already. So I'm okay with that. But it
seems like hash partitioning is going to introduce new, exciting, and
hard-to-document-or-avoid portability gotchas. Do we really need
to go there?
That is a good question. I think it basically amounts to this
question: is hash partitioning useful, and if so, for what?
Range and list partitioning inherently provide management benefits.
For example, if you range-partition your data by year, then when you
want to get rid of the oldest year worth of data, you can drop the
entire partition at once, which is likely to be much faster than a
DELETE statement. But hash partitioning provides no such benefits
because, by design, the distribution of the data among the partitions
is essentially random. Dropping one partition will not usually be a
useful thing to do because the rows in that partition are logically
unconnected. So, if you have a natural way of partitioning a table by
range or list, that's probably going to be superior to hash
partitioning, but sometimes there isn't an obviously useful way of
breaking up the data, but you still want to partition it in order to
reduce the size of the individual tables.
That, in turn, allows maintenance operations to be performed on each
partition separately.
that running CLUSTER or VACUUM FULL on it takes eight hours, but you
can't really afford an eight-hour maintenance window when the table
gets bloated. However, you can afford (on Sunday nights when activity
is lowest) a window of, say, one hour. Well, if you hash-partition
the table, you can CLUSTER or VACUUM FULL one partition a week for N
weeks until you get to them all. Similarly, if you need to create an
index, you can build it on one partition at a time. You can even add
the index to one partition to see how well it works -- for example,
does it turn too many HOT updates into non-HOT updates, causing bloat?
-- and try it out before you go do it across the board. And an
unpartitioned table can only accommodate one VACUUM process at a time,
but a partitioned table can have one per partition. Partitioning a
table also means that you don't need a single disk volume large enough
to hold the whole thing; instead, you can put each partition in a
separate tablespace or (in some exciting future world where PostgreSQL
looks more like a distributed system) perhaps even on another server.
For a table that is a few tens of gigabytes, none of this amounts to a
hill of beans, but for a table that is a few tens of terabytes, the
time it takes to perform administrative operations can become a really
big problem. A good example is, say, a table full of orders. Imagine
a high-velocity business like Amazon or UPS that has a constant stream
of new orders, or a mobile app that has a constant stream of new user
profiles. If that data grows fast enough that the table needs to be
partitioned, how should it be done? It's often desirable to create
partitions of about equal size and about equal hotness, and
range-partitioning on the creation date or order ID number means
continually creating new partitions, and having all of the hot data in
the same partition.
In my experience, it is *definitely* the case that users with very
large tables are experiencing significant pain right now. The freeze
map changes were a good step towards ameliorating some of that pain,
but I think hash partitioning is another step in that direction, and I
think there will be other applications as well. Even for users who
don't have data that large, there are also interesting
query-performance implications of hash partitioning. Joins between
very large tables tend to get implemented as merge joins, which often
means sorting all the data, which is O(n lg n) and not easy to
parallelize. But if those very large tables happened to be compatibly
partitioned on the join key, you could do a partitionwise join per the
patch Ashutosh proposed, which would be cheaper and easier to do in
parallel.
Maybe a shorter argument for hash partitioning is that not one but two
different people proposed patches for it within months of the initial
partitioning patch going in. When multiple people are thinking about
implementing the same feature almost immediately after the
prerequisite patches land, that's a good clue that it's a desirable
feature. So I think we should try to solve the problems, rather than
giving up.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 5/12/17 14:23, Robert Haas wrote:
One alternative would be to change the way that we dump and restore
the data. Instead of dumping the data with the individual partitions,
dump it all out for the parent and let tuple routing sort it out at
restore time.
I think this could be a pg_dump option. One way it dumps out the
partitions, and another way it recomputes the partitions. I think that
could be well within pg_dump's mandate.
(cough ... logical replication ... cough)
Of course, this isn't very satisfying because now
dump-and-restore hasn't really preserved the state of the database;
That depends on whether you consider the partitions to be user-visible
or an internal detail. The current approach is sort of in the middle,
so it makes sense to allow the user to interpret it either way depending
on need.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut wrote:
On 5/12/17 14:23, Robert Haas wrote:
One alternative would be to change the way that we dump and restore
the data. Instead of dumping the data with the individual partitions,
dump it all out for the parent and let tuple routing sort it out at
restore time.
I think this could be a pg_dump option. One way it dumps out the
partitions, and another way it recomputes the partitions. I think that
could be well within pg_dump's mandate.
I was thinking the same, but enable that option automatically for hash
partitioning. Each partition is still dumped separately, but instead of
referencing the specific partition in the TABLE DATA object, reference
the parent table. This way, the restore can still be parallelized, but
tuple routing is executed for each tuple.
(cough ... logical replication ... cough)
I think for logical replication the tuple should appear as being in the
parent table, not the partition. No?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/12/17 18:13, Alvaro Herrera wrote:
I think for logical replication the tuple should appear as being in the
parent table, not the partition. No?
Logical replication replicates base table to base table. How those
tables are tied together into a partitioned table or an inheritance tree
is up to the system catalogs on each side.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, May 12, 2017 at 06:38:55PM -0400, Peter Eisentraut wrote:
On 5/12/17 18:13, Alvaro Herrera wrote:
I think for logical replication the tuple should appear as being in the
parent table, not the partition. No?
Logical replication replicates base table to base table. How those
tables are tied together into a partitioned table or an inheritance tree
is up to the system catalogs on each side.
This seems like a totally reasonable approach to pg_dump, especially
in light of the fact that logical replication already (and quite
reasonably) does it this way. Hard work has been done to make
tuple-routing cheap, and this is one of the payoffs.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, May 12, 2017 at 7:36 PM, David Fetter <david@fetter.org> wrote:
On Fri, May 12, 2017 at 06:38:55PM -0400, Peter Eisentraut wrote:
On 5/12/17 18:13, Alvaro Herrera wrote:
I think for logical replication the tuple should appear as being in the
parent table, not the partition. No?
Logical replication replicates base table to base table. How those
tables are tied together into a partitioned table or an inheritance tree
is up to the system catalogs on each side.
This seems like a totally reasonable approach to pg_dump, especially
in light of the fact that logical replication already (and quite
reasonably) does it this way. Hard work has been done to make
tuple-routing cheap, and this is one of the payoffs.
Cheap isn't free, though. It's got a double-digit percentage overhead
rather than a large-multiple-of-the-runtime overhead as triggers do,
but people still won't want to pay it unnecessarily, I think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-05-12 21:56:30 -0400, Robert Haas wrote:
Cheap isn't free, though. It's got a double-digit percentage overhead
rather than a large-multiple-of-the-runtime overhead as triggers do,
but people still won't want to pay it unnecessarily, I think.
That should be partially addressable with reasonable amounts of
engineering though. Computing the target partition in a
hash-partitioned table can be implemented very efficiently, and adding
infrastructure for multiple bulk insert targets in copy should be quite
doable as well. It's also work that's generally useful, not just for
backups.
The bigger issue to me here wrt pg_dump is that partitions can be restored
in parallel, but that'd probably not work as well if dumped
separately. Unless we'd do the re-routing on load, rather than when
dumping - which'd also increase cache locality, by most of the time
(same architecture/encoding/etc) having one backend insert into the same
partition.
Greetings,
Andres Freund
On Sat, May 13, 2017 at 1:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Maybe a shorter argument for hash partitioning is that not one but two
different people proposed patches for it within months of the initial
partitioning patch going in. When multiple people are thinking about
implementing the same feature almost immediately after the
prerequisite patches land, that's a good clue that it's a desirable
feature. So I think we should try to solve the problems, rather than
giving up.
Can we think of defining separate portable hash functions which can be
used for the purpose of hash partitioning?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, May 13, 2017 at 12:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Can we think of defining separate portable hash functions which can be
used for the purpose of hash partitioning?
I think that would be a good idea. I think it shouldn't even be that
hard. By data type:
- Integers. We'd need to make sure that we get the same results for
the same value on big-endian and little-endian hardware, and that
performance is good on both systems. That seems doable.
- Floats. There may be different representations in use on different
hardware, which could be a problem. Tom didn't answer my question
about whether any even-vaguely-modern hardware is still using non-IEEE
floats, which I suspect means that the answer is "no". If every bit
of hardware we are likely to find uses basically the same
representation of the same float value, then this shouldn't be hard.
(Also, even if this turns out to be hard for floats, using a float as
a partitioning key would be a surprising choice because the default
output representation isn't even unambiguous; you need
extra_float_digits for that.)
- Strings. There's basically only one representation for a string.
If we assume that the hash code only needs to be portable across
hardware and not across encodings, a position for which I already
argued upthread, then I think this should be manageable.
- Everything Else. Basically, everything else is just a composite of
that stuff, I think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 12, 2017 at 10:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maintaining such a property for float8 (and the types that depend on it)
might be possible if you believe that nobody ever uses anything but IEEE
floats, but we've never allowed that as a hard assumption before.
This is not such a big practical problem (for me at least) because
hashing of floats is of dubious value.
Even architecture dependence isn't the whole scope of the problem.
Consider for example dumping a LATIN1-encoded database and trying
to reload it into a UTF8-encoded database. People will certainly
expect that to be possible, and do you want to guarantee that the
hash of a text value is encoding-independent?
That is a major problem. In an ideal world, we could make that work
with something like ucol_getSortKey(), but we don't require ICU, and
we can't mix getSortKey() with strxfrm(), or even strxfrm() results
from different platforms.
I don't think it's correct to hash the code points, either, because
strings may be considered equal in a locale even if the code points
aren't identical. But I don't think postgres lives up to that standard
currently.
But hash partitioning is too valuable to give up on entirely. I think
we should consider supporting a limited subset of types for now with
something not based on the hash am.
Regards,
Jeff Davis
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, May 13, 2017 at 12:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Can we think of defining separate portable hash functions which can be
used for the purpose of hash partitioning?
I think that would be a good idea. I think it shouldn't even be that
hard. By data type:
- Integers. We'd need to make sure that we get the same results for
the same value on big-endian and little-endian hardware, and that
performance is good on both systems. That seems doable.
- Floats. There may be different representations in use on different
hardware, which could be a problem. Tom didn't answer my question
about whether any even-vaguely-modern hardware is still using non-IEEE
floats, which I suspect means that the answer is "no". If every bit
of hardware we are likely to find uses basically the same
representation of the same float value, then this shouldn't be hard.
(Also, even if this turns out to be hard for floats, using a float as
a partitioning key would be a surprising choice because the default
output representation isn't even unambiguous; you need
extra_float_digits for that.)
- Strings. There's basically only one representation for a string.
If we assume that the hash code only needs to be portable across
hardware and not across encodings, a position for which I already
argued upthread, then I think this should be manageable.
Basically, this is simply saying that you're willing to ignore the
hard cases, which reduces the problem to one of documenting the
portability limitations. You might as well not even bother with
worrying about the integer case, because porting between little-
and big-endian systems is surely far less common than cases you've
already said you're okay with blowing off.
That's not an unreasonable position to take, perhaps; doing better
than that is going to be a lot more work and it's not very clear
how much real-world benefit results. But I can't follow the point
of worrying about endianness but not encoding.
regards, tom lane
On Fri, May 12, 2017 at 11:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Forget hash partitioning. There's no law saying that that's a good
idea and we have to have it. With a different set of constraints,
maybe we could do it, but I think the existing design decisions have
basically locked it out --- and I doubt that hash partitioning is so
valuable that we should jettison other desirable properties to get it.
A lot of the optimizations that can make use of hash partitioning
could also make use of range partitioning. But let me defend hash
partitioning:
* hash partitioning requires fewer decisions by the user
* naturally balances data and workload among partitions in most cases
* easy to match with a degree of parallelism
But with range partitioning, you can have situations where different
tables have different distributions of data. If you partition to
balance the data between partitions in both cases, then that makes
partition-wise join a lot harder because the boundaries don't line up.
If you make the boundaries line up to do partition-wise join, the
partitions might have wildly different amounts of data in them. Either
way, it makes parallelism harder.
Even without considering joins, range partitioning could force you to
make a choice between balancing the data and balancing the workload.
If you are partitioning based on date, then a lot of the workload will
be on more recent partitions. That's desirable sometimes (e.g. for
vacuum) but not always desirable for parallelism.
Hash partitioning doesn't have these issues and goes very nicely with
parallel query.
Regards,
Jeff Davis
On Fri, May 12, 2017 at 12:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
That is a good question. I think it basically amounts to this
question: is hash partitioning useful, and if so, for what?
Two words: parallel query. To get parallelism, one of the best
approaches is dividing the data, then doing as much work as possible
before combining it again. If you have hash partitions on some key,
then you can do partition-wise joins or partition-wise aggregation on
that key in parallel with no synchronization/communication overhead
(until the final result).
You've taken postgres pretty far in this direction already; hash
partitioning will take it one step further by allowing more pushdowns
and lower sync/communication costs.
Some of these things can be done with range partitioning, too, but see
my other message here:
/messages/by-id/CAMp0ubfNMSGRvZh7N7TRzHHN5tz0ZeFP13Aq3sv6b0H37fdcPg@mail.gmail.com
Regards,
Jeff Davis
On 2017-05-13 10:29:09 -0400, Robert Haas wrote:
On Sat, May 13, 2017 at 12:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Can we think of defining separate portable hash functions which can be
used for the purpose of hash partitioning?
I think that would be a good idea. I think it shouldn't even be that
hard. By data type:
- Integers. We'd need to make sure that we get the same results for
the same value on big-endian and little-endian hardware, and that
performance is good on both systems. That seems doable.
- Floats. There may be different representations in use on different
hardware, which could be a problem. Tom didn't answer my question
about whether any even-vaguely-modern hardware is still using non-IEEE
floats, which I suspect means that the answer is "no". If every bit
of hardware we are likely to find uses basically the same
representation of the same float value, then this shouldn't be hard.
(Also, even if this turns out to be hard for floats, using a float as
a partitioning key would be a surprising choice because the default
output representation isn't even unambiguous; you need
extra_float_digits for that.)
- Strings. There's basically only one representation for a string.
If we assume that the hash code only needs to be portable across
hardware and not across encodings, a position for which I already
argued upthread, then I think this should be manageable.
- Everything Else. Basically, everything else is just a composite of
that stuff, I think.
I seriously doubt that's true. A lot of more complex types have
internal alignment padding and such. Consider e.g. something like
jsonb, hstore, or postgis types - you *can* convert them to something
that's unambiguous, but it's going to be fairly expensive. Essentially
you'd have to do something like calling the output function, and then
hashing the result of that. And hash-partitioning is particularly
interesting for e.g. large amounts of points in a geospatial scenario,
because other types of partitioning are quite hard to maintain.
- Andres
On Sat, May 13, 2017 at 7:08 PM, Andres Freund <andres@anarazel.de> wrote:
I seriously doubt that's true. A lot of more complex types have
internal alignment padding and such.
True, but I believe we require those padding bytes to be zero. If we
didn't, then hstore_hash would be broken already.
Consider e.g. something like
jsonb, hstore, or postgis types - you *can* convert them to something
that's unambiguous, but it's going to be fairly expensive.
I'm fuzzy on what you think we'd need to do.
Essentially
you'd have to do something like calling the output function, and then
hashing the result of that.
I really don't see why we'd have to go to nearly that length.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On May 13, 2017 8:44:22 PM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, May 13, 2017 at 7:08 PM, Andres Freund <andres@anarazel.de> wrote:
I seriously doubt that's true. A lot of more complex types have
internal alignment padding and such.
True, but I believe we require those padding bytes to be zero. If we
didn't, then hstore_hash would be broken already.
It'll be differently sized on different platforms. So everyone will have to write hash functions that look at each member individually, rather than hashing the entire struct at once. And for each member you'll have to use a type specific hash function...
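As a hypothetical illustration of that per-member approach (MyType and
hash_combine32() are made up here; the combiner is the Boost-style
mixer):

typedef struct MyType
{
    uint32  i;
    char   *data;
    int     len;
} MyType;

static uint32
hash_combine32(uint32 a, uint32 b)
{
    a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
    return a;
}

/* Hash each member with its own type's hash function and mix the
 * results, instead of hashing the raw struct bytes, whose size and
 * padding differ across platforms. */
static uint32
hash_mytype(const MyType *v)
{
    uint32  h = 0;

    h = hash_combine32(h, DatumGetUInt32(hash_uint32(v->i)));
    h = hash_combine32(h, DatumGetUInt32(hash_any((unsigned char *) v->data,
                                                  v->len)));
    return h;
}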
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sat, May 13, 2017 at 1:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Basically, this is simply saying that you're willing to ignore the
hard cases, which reduces the problem to one of documenting the
portability limitations. You might as well not even bother with
worrying about the integer case, because porting between little-
and big-endian systems is surely far less common than cases you've
already said you're okay with blowing off.
That's not an unreasonable position to take, perhaps; doing better
than that is going to be a lot more work and it's not very clear
how much real-world benefit results. But I can't follow the point
of worrying about endianness but not encoding.
Encoding is a user choice, not a property of the machine. Or, looking
at it from another point of view, the set of values that can be
represented by an int4 is the same whether they are represented in
big-endian form or in little-endian form, but the set of values that
are representable changes when you switch encodings. You could argue
that text-under-LATIN1 and text-under-UTF8 aren't really the same data
type at all. It's one thing to say "you can pick up your data and
move it to a different piece of hardware and nothing will break".
It's quite another thing to say "you can pick up your data and convert
it to a different encoding and nothing will break". The latter is
generally false already. Maybe LATIN1 -> UTF8 is no-fail, but what
about UTF8 -> LATIN1 or SJIS -> anything? Based on previous mailing
list discussions, I'm under the impression that it is sometimes
debatable how a character in one encoding should be converted to some
other encoding, either because it's not clear whether there is a
mapping at all or it's unclear what mapping should be used. See,
e.g., 2dbbf33f4a95cdcce66365bcdb47c885a8858d3c, or
/messages/by-id/1739a900-30ab-f48e-aec4-2b35475ecf02@2ndquadrant.com
where it was discussed that being able to convert encoding A ->
encoding B does not guarantee the ability to perform the reverse
conversion.
Arguing that a given int4 value should hash to the same value on every
platform seems like a request that is at least superficially
reasonable, if possibly practically tricky in some cases. Arguing
that every currently supported encoding should hash every character
the same way when they don't all have the same set of characters and
the mappings between them are occasionally debatable is asking for the
impossible. I certainly don't want to commit to a design for hash
partitioning that involves a compatibility break any time somebody
changes any encoding conversion in the system, even if a hash function
that involved translating every character to some sort of universal
code point before hashing it didn't seem likely to be horribly slow.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, May 13, 2017 at 11:47 PM, Andres Freund <andres@anarazel.de> wrote:
It'll be differently sized on different platforms. So everyone will have to write hash functions that look at each member individually, rather than hashing the entire struct at once. And for each member you'll have to use a type specific hash function...
Well, that's possibly kind of annoying, but it still sounds like
pretty mechanical work.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 13 May 2017 at 10:29, Robert Haas <robertmhaas@gmail.com> wrote:
- Floats. There may be different representations in use on different
hardware, which could be a problem. Tom didn't answer my question
about whether any even-vaguely-modern hardware is still using non-IEEE
floats, which I suspect means that the answer is "no".
Fwiw the answer to that is certainly no. The only caveat is that some
platforms have not entirely complete implementations of IEEE, missing
corner cases such as denormalized values, but I don't think that would
be something that would be changed with a different hash function
though.
Personally while I would like to avoid code that actively crashes or
fails basic tests on Vax, even I don't think we need to worry about
replication or federated queries in a heterogeneous environment where
some servers are Vaxen and some are modern hardware. That seems a bit
far-fetched.
--
greg
On 2017-05-14 15:59:09 -0400, Greg Stark wrote:
Personally while I would like to avoid code that actively crashes or
fails basic tests on Vax
I personally vote for simply refusing to run/compile on non-IEEE
platforms, including VAX. The benefit of even trying to get that right,
not to speak of actually testing, seems just not there.
even I don't think we need to worry about
replication or federated queries in a heterogeneous environment where
some servers are Vaxen and some are modern hardware. That seems a bit
far-fetched.
Imo there's little point in trying to delineate some subset that we want
to support on platforms that are 20 years out of date. Either we do, or
don't. And since there's no point aside of some intellectual
curiosity...
On the other hand, I don't believe my opinion that requiring hashing to
be portable is unrealistic, is meaningfully affected by disallowing
non-IEEE floats.
Greetings,
Andres Freund
On Sat, May 13, 2017 at 9:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
The latter is
generally false already. Maybe LATIN1 -> UTF8 is no-fail, but what
about UTF8 -> LATIN1 or SJIS -> anything? Based on previous mailing
list discussions, I'm under the impression that it is sometimes
debatable how a character in one encoding should be converted to some
other encoding, either because it's not clear whether there is a
mapping at all or it's unclear what mapping should be used.
The express goal of the Unicode consortium is to replace all existing
encodings with Unicode. My personal opinion is that a Unicode
monoculture would be a good thing, provided reasonable differences can
be accommodated. So, it might be that there is ambiguity about how one
codepoint can be converted to another in another encoding, but that's
because encodings like SJIS and BIG5 are needlessly ambiguous. It has
nothing to do with cultural preferences leaving the question
undecidable (at least by a panel of natural language experts), and
everything to do with these regional character encoding systems being
objectively bad. They richly deserve to die, and are in fact dying.
Encoding actually *is* a property of the machine, even though regional
encodings obfuscate things. There is a reason why MacOS and Java use
UTF-16 rather than UTF-8, and there is a reason why the de facto
standard on the web is UTF-8, and those reasons are completely
technical. AFAICT, whatever non-technical reasons remain are actually
technical debt in disguise.
Where this leaves hash partitioning, I cannot say.
--
Peter Geoghegan
VMware vCenter Server
https://www.vmware.com/
On Mon, May 15, 2017 at 7:59 AM, Greg Stark <stark@mit.edu> wrote:
On 13 May 2017 at 10:29, Robert Haas <robertmhaas@gmail.com> wrote:
- Floats. There may be different representations in use on different
hardware, which could be a problem. Tom didn't answer my question
about whether any even-vaguely-modern hardware is still using non-IEEE
floats, which I suspect means that the answer is "no".
Fwiw the answer to that is certainly no. The only caveat is that some
platforms have not entirely complete implementations of IEEE, missing
corner cases such as denormalized values, but I don't think that would
be something that would be changed with a different hash function
though.
Well... along with the Intel/IEEE-754 and VAX formats, there is a
third floating point format that is/was in widespread use: IBM hex
float[1]https://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture. It's base 16 rather than base 2, and might be the default
on some IBM operating systems[2]https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcux01/flotcop.htm#flotcop for the C float and double types (but
fortunately not xlc for AIX or Linux, and I have no clue about i/OS).
This is probably irrelevant because it looks like people aren't
running PostgreSQL on z/OS right now[3]/messages/by-id/BLU437-SMTP4B3FF36035D8A3C3816D49C160@phx.gbl, but unlike VAXen these
systems are alive and well so I just wanted to respond to the
implication on this thread that the whole world is a VAX or an IEEE
system :-) People really use this... I happen to know a shop that has
petabytes of IBM hex float data.
(IBM hardware also supports base 10 floats, but they show up as
different types in C so not relevant.)
[1]: https://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture
[2]: https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcux01/flotcop.htm#flotcop
[3]: /messages/by-id/BLU437-SMTP4B3FF36035D8A3C3816D49C160@phx.gbl
--
Thomas Munro
http://www.enterprisedb.com
Andres Freund <andres@anarazel.de> writes:
On 2017-05-14 15:59:09 -0400, Greg Stark wrote:
Personally while I would like to avoid code that actively crashes or
fails basic tests on Vax
I personally vote for simply refusing to run/compile on non-IEEE
platforms, including VAX.
The point of wanting that is not because anybody thinks that VAX per
se is an interesting platform anymore. The point is to avoid locking
ourselves into narrow assumptions that we may need to break someday.
Saying "nobody cares about floats other than IEEE floats" is morally
indistinguishable from saying "nobody cares about running on any
platform except Windows", which was a depressingly common opinion
back in the nineties or so.
It may well be that we can get away with saying "we're not going
to make it simple to move hash-partitioned tables with float
partition keys between architectures with different float
representations". But there's a whole lot of daylight between that
and denying any support for float representations other than the
currently-most-popular one.
regards, tom lane
On 2017-05-14 18:25:08 -0400, Tom Lane wrote:
It may well be that we can get away with saying "we're not going
to make it simple to move hash-partitioned tables with float
partition keys between architectures with different float
representations". But there's a whole lot of daylight between that
and denying any support for float representations other than the
currently-most-popular one.
Note that I, IIRC in the mail you responded to, also argued that I don't
think it'd be a good idea to rely on hashfunctions being portable. The
amount of lock-in that'd create, especially for more complex datatypes,
seems wholly inadvisable. I still think that dumping tables in a way
they're reloaded via the top-partition (probably one copy statement for
each child partition), and prohibiting incoming fkeys to partitions, is
a better approach to all this.
Greetings,
Andres Freund
Peter Geoghegan <pg@bowt.ie> writes:
The express goal of the Unicode consortium is to replace all existing
encodings with Unicode. My personal opinion is that a Unicode
monoculture would be a good thing, provided reasonable differences can
be accommodated.
Can't help remembering Randall Munroe's take on such things:
https://xkcd.com/927/
I agree that the Far Eastern systems that can't easily be replaced
by Unicode are that way mostly because they're a mess. But I'm
still of the opinion that locking ourselves into Unicode is a choice
we might regret, far down the road.
regards, tom lane
On Mon, May 15, 2017 at 10:08 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Though looking more closely I see that the default is IEEE in 64 bit
builds, which seems like a good way to kill the older format off.
If/when someone gets PostgreSQL ported to z/OS it probably won't be
because they want to run it on an ancient 32 bit mainframe.
--
Thomas Munro
http://www.enterprisedb.com
On Sun, May 14, 2017 at 3:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I agree that the Far Eastern systems that can't easily be replaced
by Unicode are that way mostly because they're a mess. But I'm
still of the opinion that locking ourselves into Unicode is a choice
we might regret, far down the road.
It's not a choice that has any obvious upside, so I have no reason to
disagree. My point was only that Robert's contention that "You could
argue that text-under-LATIN1 and text-under-UTF8 aren't really the
same data type at all" seems wrong to me, because PostgreSQL seems to
want to treat encoding as a property of the machine. This is evidenced
by the fact that we expect applications to change client encoding
"transparently". That is, client encoding may be changed without in
any way affecting humans that speak a natural language that is
provided for by the application's client encoding. That's a great
ideal to have, and one that is very close to completely workable.
--
Peter Geoghegan
VMware vCenter Server
https://www.vmware.com/
On Sun, May 14, 2017 at 6:29 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-05-14 18:25:08 -0400, Tom Lane wrote:
It may well be that we can get away with saying "we're not going
to make it simple to move hash-partitioned tables with float
partition keys between architectures with different float
representations". But there's a whole lot of daylight between that
and denying any support for float representations other than the
currently-most-popular one.
Note that I, IIRC in the mail you responded to, also argued that I don't
think it'd be a good idea to rely on hash functions being portable. The
amount of lock-in that'd create, especially for more complex datatypes,
seems wholly inadvisable. I still think that dumping tables in a way
they're reloaded via the top-partition (probably one copy statement for
each child partition), and prohibiting incoming fkeys to partitions, is
a better approach to all this.
You'd have to prohibit a heck of a lot more than that in order for
this to work 100% reliably. You'd have to prohibit CHECK constraints,
triggers, rules, RLS policies, and UNIQUE indexes, at the least. You
might be able to convince me that some of those things are useless
when applied to partitions, but wanting a CHECK constraint that
applies to only one partition seems pretty reasonable (e.g. CHECK that
records for older years are all in the 'inactive' state, or whatever).
I think getting this to work 100% reliably in all cases would require
an impractically large hammer.
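(To make that concrete, here is a hedged sketch of such a per-partition
CHECK constraint, in PostgreSQL 10 range partitioning syntax; the table
and constraint names are purely illustrative.)

CREATE TABLE orders (id int8, year int, status text)
    PARTITION BY RANGE (year);
CREATE TABLE orders_2015 PARTITION OF orders
    FOR VALUES FROM (2015) TO (2016);
-- per-partition rule: rows for this older year must be inactive
ALTER TABLE orders_2015
    ADD CONSTRAINT orders_2015_inactive CHECK (status = 'inactive');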
Now that's not to say that we shouldn't have a
reload-through-the-top-parent switch; actually, I think that's a good
idea. I just don't believe that it can ever be a complete substitute
for portable hash functions. I think almost everybody here agrees
that it isn't necessary to have hash functions that are 100% portable,
e.g. to VAX. But it would be nice IMHO if you could use an integer
column as the partitioning key and have that be portable between Intel
and POWER.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2017-05-14 21:22:58 -0400, Robert Haas wrote:
but wanting a CHECK constraint that applies to only one partition
seems pretty reasonable (e.g. CHECK that records for older years are
all in the 'inactive' state, or whatever).
On a hash-partitioned table?
Now that's not to say that we shouldn't have a
reload-through-the-top-parent switch; actually, I think that's a good
idea. I just don't believe that it can ever be a complete substitute
for portable hash functions. I think almost everybody here agrees
that it isn't necessary to have hash functions that are 100% portable,
e.g. to VAX. But it would be nice IMHO if you could use an integer
column as the partitioning key and have that be portable between Intel
and POWER.
I'm not saying it can't work for any datatype, I just think it'd be a
very bad idea to make it work for any non-trivial ones. The likelihood
of regularly breaking or preventing us from improving things seems high.
I'm not sure that having a design where this works most of the time for
some datatypes is a good one.
Greetings,
Andres Freund
On Sun, May 14, 2017 at 01:06:03PM -0700, Andres Freund wrote:
On 2017-05-14 15:59:09 -0400, Greg Stark wrote:
Personally while I would like to avoid code that actively crashes or
fails basic tests on Vax
I personally vote for simply refusing to run/compile on non-IEEE
platforms, including VAX. The benefit of even trying to get that right,
not to speak of actually testing, seems just not there.
Do we even know that floats are precise enough to determine the
partition? For example, if you have 6.000000001, is it possible for
that to be 5.9999999 on some systems? Are IEEE systems all the same for
these values? I would say we should disallow any approximate data type
for partitioning completely.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
On Sun, May 14, 2017 at 8:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
Do we even know that floats are precise enough to determine the
partition? For example, if you have 6.000000001, is it possible for
that to be 5.9999999 on some systems? Are IEEE systems all the same for
these values? I would say we should disallow any approximate data type
for partitioning completely.
I'm inclined in this direction, as well. Hash partitioning is mostly
useful for things that are likely to be join keys or group keys, and
floats aren't. Same for complex user-defined types.
The real problem here is what Tom pointed out: that we would have
trouble hashing strings in an encoding-insensitive way. Strings are
useful as join/group keys, so it would be painful to not support them.
Perhaps there's some kind of compromise, like a pg_dump mode that
reloads through the root. Or maybe hash partitions are really a
"semi-logical" partitioning that the optimizer understands, but where
things like per-partition check constraints don't make sense.
Regards,
Jeff Davis
On Sun, May 14, 2017 at 6:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
You'd have to prohibit a heck of a lot more than that in order for
this to work 100% reliably. You'd have to prohibit CHECK constraints,
triggers, rules, RLS policies, and UNIQUE indexes, at the least. You
might be able to convince me that some of those things are useless
when applied to partitions, but wanting a CHECK constraint that
applies to only one partition seems pretty reasonable (e.g. CHECK that
records for older years are all in the 'inactive' state, or whatever).
I think getting this to work 100% reliably in all cases would require
an impractically large hammer.
The more I think about it the more I think hash partitions are
"semi-logical". A check constraint on a single partition of a
range-partitioned table makes sense, but it doesn't make sense on a
single partition of a hash-partitioned table.
I think hash partitioning is mainly useful for parallel query (so the
optimizer needs to know about it) and maintenance commands (as you
listed in another email). But those don't need hash portability.
FKs are a little more interesting, but I actually think they still
work regardless of hash portability. If the two sides are in the same
hash opfamily, they should hash the same and it's fine. And if they
aren't, the FK has no hope of working regardless of hash portability.
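(One way to check that same-opfamily condition is to look at the
catalogs; the query below uses only the standard pg_amop, pg_opfamily
and pg_am catalogs, and shows that int2, int4 and int8 all share the
integer_ops hash opfamily.)

SELECT f.opfname,
       format_type(o.amoplefttype, NULL) AS lefttype,
       format_type(o.amoprighttype, NULL) AS righttype
FROM pg_amop o
JOIN pg_opfamily f ON f.oid = o.amopfamily
JOIN pg_am m ON m.oid = f.opfmethod
WHERE m.amname = 'hash' AND f.opfname = 'integer_ops';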
This would mean we need to reload through the root as Andres and
others suggested, and disable a lot of logical partitioning
capabilities. I'd be a little worried about what we do with
attaching/detaching, though.
Regards,
Jeff Davis
On Mon, May 15, 2017 at 07:32:30AM -0700, Jeff Davis wrote:
On Sun, May 14, 2017 at 8:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
Do we even know that floats are precise enough to determine the
partition? For example, if you have 6.000000001, is it possible for
that to be 5.9999999 on some systems? Are IEEE systems all the same for
these values? I would say we should disallow any approximate data type
for partitioning completely.
I'm inclined in this direction, as well. Hash partitioning is mostly
useful for things that are likely to be join keys or group keys, and
floats aren't. Same for complex user-defined types.
The real problem here is what Tom pointed out: that we would have
trouble hashing strings in an encoding-insensitive way. Strings are
useful as join/group keys, so it would be painful to not support them.
Well, since we can't mix encodings in the same column, why can't we just
hash the binary representation of the string? My point is that with
hashing we aren't comparing one string with another, right?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
On Mon, May 15, 2017 at 07:48:14AM -0700, Jeff Davis wrote:
This would mean we need to reload through the root as Andres and
others suggested,
One refinement of this would be to traverse the partition tree,
stopping at the first place where the next branch has hash partitions,
or at any rate types which have no application-level significance, and
load from there. This could allow for parallelizing where it's
portable to do so.
Level            Table   Partition type
------------------------------------------------
Base table:      Log     (N/A)
Next partition:  Year    (range)
Next partition:  Month   (range)
Next partition:  Day     (range)  <---- Data gets loaded no lower than here
Next partition:  *       (hash)
That last level, and any below it, doesn't have a specific name because
those partitions are just an implementation detail, i.e. none has any
application-level meaning.
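(A hedged sketch of that hierarchy: the range levels are PostgreSQL 10
syntax, the PARTITION BY HASH level is the syntax from the then-pending
hash partitioning patch, and all names are illustrative.)

CREATE TABLE log (year int, month int, day int, payload text)
    PARTITION BY RANGE (year);
CREATE TABLE log_2017 PARTITION OF log
    FOR VALUES FROM (2017) TO (2018) PARTITION BY RANGE (month);
CREATE TABLE log_2017_05 PARTITION OF log_2017
    FOR VALUES FROM (5) TO (6) PARTITION BY RANGE (day);
-- data gets loaded no lower than this level
CREATE TABLE log_2017_05_14 PARTITION OF log_2017_05
    FOR VALUES FROM (14) TO (15) PARTITION BY HASH (payload);
-- the hash children below this point would be a pure implementation detail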
and disable a lot of logical partitioning capabilities. I'd be a
little worried about what we do with attaching/detaching, though.
Attaching and detaching partitions only makes sense for partitions
whose partition keys have application-level meaning anyway.
Does it make sense at this point to separate our partitions into two
categories: those which can have significance to applications, and
those which can't?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
On Sun, May 14, 2017 at 9:35 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-05-14 21:22:58 -0400, Robert Haas wrote:
but wanting a CHECK constraint that applies to only one partition
seems pretty reasonable (e.g. CHECK that records for older years are
all in the 'inactive' state, or whatever).
On a hash-partitioned table?
No, probably not. But do we really want the rules for partitioned
tables to be different depending on the kind of partitioning in use?
I'm not saying it can't work for any datatype, I just think it'd be a
very bad idea to make it work for any non-trivial ones. The likelihood
of regularly breaking or preventing us from improving things seems high.
I'm not sure that having a design where this works most of the time for
some datatypes is a good one.
I think you might be engaging in undue pessimism here, but I suspect
we need to actually try doing the work before we know how it will turn
out.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On May 15, 2017, at 7:48 AM, Jeff Davis <pgsql@j-davis.com> wrote:
On Sun, May 14, 2017 at 6:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
You'd have to prohibit a heck of a lot more than that in order for
this to work 100% reliably. You'd have to prohibit CHECK constraints,
triggers, rules, RLS policies, and UNIQUE indexes, at the least. You
might be able to convince me that some of those things are useless
when applied to partitions, but wanting a CHECK constraint that
applies to only one partition seems pretty reasonable (e.g. CHECK that
records for older years are all in the 'inactive' state, or whatever).
I think getting this to work 100% reliably in all cases would require
an impractically large hammer.
The more I think about it the more I think hash partitions are
"semi-logical". A check constraint on a single partition of a
range-partitioned table makes sense, but it doesn't make sense on a
single partition of a hash-partitioned table.
That depends on whether the user gets to specify the hash function, perhaps
indirectly by specifying a user defined opfamily. I can imagine clever hash
functions that preserve certain properties of the incoming data, and check
constraints in development versions of the database that help verify the hash
is not violating those properties.
That's not to say such hash functions must be allowed in the hash partitioning
implementation; just that it does make sense if you squint and look a bit sideways
at it.
Mark Dilger
On Mon, May 15, 2017 at 03:26:02PM -0400, Robert Haas wrote:
On Sun, May 14, 2017 at 9:35 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-05-14 21:22:58 -0400, Robert Haas wrote:
but wanting a CHECK constraint that applies to only one partition
seems pretty reasonable (e.g. CHECK that records for older years
are all in the 'inactive' state, or whatever).
On a hash-partitioned table?
No, probably not. But do we really want the rules for partitioned
tables to be different depending on the kind of partitioning in use?
As the discussion has devolved here, it appears that there are, at
least conceptually, two fundamentally different classes of partition:
public, which is to say meaningful to DB clients, and "private", used
for optimizations, but otherwise opaque to DB clients.
Mashing those two cases together appears to cause more problems than
it solves.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
On Mon, May 15, 2017 at 1:04 PM, David Fetter <david@fetter.org> wrote:
As the discussion has devolved here, it appears that there are, at
least conceptually, two fundamentally different classes of partition:
public, which is to say meaningful to DB clients, and "private", used
for optimizations, but otherwise opaque to DB clients.
Mashing those two cases together appears to cause more problems than
it solves.
I concur at this point. I originally thought hash functions might be
made portable, but I think Tom and Andres showed that to be too
problematic -- the issue with different encodings is the real killer.
But I also believe hash partitioning is important and we shouldn't
give up on it yet.
That means we need to have a concept of hash partitions that's
different from range/list partitioning. The terminology
"public"/"private" does not seem appropriate. Logical/physical or
external/internal might be better.
With hash partitioning:
* User only specifies number of partitions of the parent table; does
not specify individual partition properties (modulus, etc.)
* Dump/reload goes through the parent table (though we may provide
options so pg_dump/restore can optimize this)
* We could provide syntax to adjust the number of partitions, which
would be expensive but still useful sometimes.
* All DDL should be on the parent table, including check constraints,
FKs, unique constraints, exclusion constraints, indexes, etc.
- Unique and exclusion constraints would only be permitted if the
keys are a superset of the partition keys.
- FKs would only be permitted if the two tables' partition schemes
match and the keys are members of the same hash opfamily (this could
be relaxed slightly, but it gets a little confusing if so)
* No attach/detach of partitions
* All partitions have the same permissions
* Individual partitions would only be individually-addressable for
maintenance (like reindex and vacuum), but not for arbitrary queries
- perhaps also COPY for bulk loading/dumping, in case we get clients
smart enough to do their own hashing.
The only real downside is that it could surprise users -- why can I
add a CHECK constraint on my range-partitioned table but not the
hash-partitioned one? We should try to document this so users don't
find that out too far along. As long as they aren't surprised, I think
users will understand why these aren't quite the same concepts.
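(None of the following syntax exists; it is a purely hypothetical
illustration of "user only specifies the number of partitions".)

-- hypothetical: sixteen internally-managed hash partitions
CREATE TABLE events (user_id int8, payload text)
    PARTITION BY HASH (user_id) PARTITIONS 16;
-- hypothetical: expensive but occasionally useful repartitioning
ALTER TABLE events SET PARTITIONS 32;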
Regards,
Jeff Davis
On Tue, May 16, 2017 at 11:10 AM, Jeff Davis <pgsql@j-davis.com> wrote:
With hash partitioning:
* User only specifies number of partitions of the parent table; does
not specify individual partition properties (modulus, etc.)
* Dump/reload goes through the parent table (though we may provide
options so pg_dump/restore can optimize this)
* We could provide syntax to adjust the number of partitions, which
would be expensive but still useful sometimes.
* All DDL should be on the parent table, including check constraints,
FKs, unique constraints, exclusion constraints, indexes, etc.
- Unique and exclusion constraints would only be permitted if the
keys are a superset of the partition keys.
- FKs would only be permitted if the two tables' partition schemes
match and the keys are members of the same hash opfamily (this could
be relaxed slightly, but it gets a little confusing if so)
* No attach/detach of partitions
* All partitions have the same permissions
* Individual partitions would only be individually-addressable for
maintenance (like reindex and vacuum), but not for arbitrary queries
- perhaps also COPY for bulk loading/dumping, in case we get clients
smart enough to do their own hashing.
I don't really find this a very practical design. If the table
partitions are spread across different relfilenodes, then those
relfilenodes have to have separate pg_class entries and separate
indexes, and those indexes also need to have separate pg_class
entries. Otherwise, nothing works. And if they do have separate
pg_class entries, then the partitions have to have their own names,
and likewise for their indexes, and a dump-and-reload has to preserve
those names. If it doesn't, and those objects get new system-assigned
names after the dump-and-reload, then dump restoration can fail when a
system-assigned name collides with an existing name that is first
mentioned later in the dump.
If we had the ability to have anonymous pg_class entries -- relations
that have no names -- then maybe it would be possible to make
something like what you're talking about work. But that does not seem
easy to do. There's a unique index on (relname, relnamespace) for
good reason, and we can't make it partial on a system catalog. We
could make the relname column allow nulls, but that would add overhead
to any code that needs to access the relation name, and there's a fair
amount of that.
Similarly, if we had the ability to associate multiple relfilenodes
with a single relation, and if index entries could point to
<which-relfilenode, block, offset> rather than just <block, offset>,
then we could also make this work. But either of those things would
require significant re-engineering and would have downsides in other
cases.
If Java has portable hash functions, why can't we?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 16, 2017 at 08:10:39AM -0700, Jeff Davis wrote:
On Mon, May 15, 2017 at 1:04 PM, David Fetter <david@fetter.org> wrote:
As the discussion has devolved here, it appears that there are, at
least conceptually, two fundamentally different classes of partition:
public, which is to say meaningful to DB clients, and "private", used
for optimizations, but otherwise opaque to DB clients.
Mashing those two cases together appears to cause more problems than
it solves.
I concur at this point. I originally thought hash functions might be
made portable, but I think Tom and Andres showed that to be too
problematic -- the issue with different encodings is the real killer.
But I also believe hash partitioning is important and we shouldn't
give up on it yet.
That means we need to have a concept of hash partitions that's
different from range/list partitioning. The terminology
"public"/"private" does not seem appropriate. Logical/physical or
external/internal might be better.
I'm not attached to any particular terminology.
With hash partitioning:
* User only specifies number of partitions of the parent table; does
not specify individual partition properties (modulus, etc.)
Maybe this is over-thinking it, but I'm picturing us ending up with
something along the lines of:
PARTITION BY INTERNAL(EXPRESSION)
[ WITH ( [PARAMETERS] [, AS, APPROPRIATE] ) ]
i.e. it's not clear that we should wire in "number of partitions" as a
parameter.
In the not-too-distant future, ANALYZE and similar could have a say in
determining both the "how" and the "whether" of partitioning.
* Dump/reload goes through the parent table (though we may provide
options so pg_dump/restore can optimize this)
Would it be simplest to default to routing through the immediate
ancestor for now?
It occurs to me that with the opaque partition system we're designing
here, internal partitions would necessarily be leaves in the tree.
* We could provide syntax to adjust the number of partitions, which
would be expensive but still useful sometimes.
Yep. I suspect that techniques for this are described in literature,
and possibly even in code bases. Any pointers?
* All DDL should be on the parent table, including check constraints,
FKs, unique constraints, exclusion constraints, indexes, etc.
Necessarily.
- Unique and exclusion constraints would only be permitted if the
keys are a superset of the partition keys.
"Includes either all of the partition expression or none of it,"
maybe?
- FKs would only be permitted if the two tables' partition schemes
match and the keys are members of the same hash opfamily (this could
be relaxed slightly, but it gets a little confusing if so)
Relaxing sounds like a not-in-the-first-cut feature, and subtle.
* No attach/detach of partitions
Since they're opaque, this is the only sane thing.
* All partitions have the same permissions
Since they're opaque, this is the only sane thing.
* Individual partitions would only be individually-addressable for
maintenance (like reindex and vacuum), but not for arbitrary queries
Since they're opaque, this is the only sane thing.
- perhaps also COPY for bulk loading/dumping, in case we get clients
smart enough to do their own hashing.
This is appealing from a resource allocation point of view in the
sense of deciding where the hash computing resources are spent. Do we
want something like the NOT VALID/VALIDATE infrastructure to support
it?
The only real downside is that it could surprise users -- why can I
add a CHECK constraint on my range-partitioned table but not the
hash-partitioned one? We should try to document this so users don't
find that out too far along. As long as they aren't surprised, I think
users will understand why these aren't quite the same concepts.
+1
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
On Tuesday, May 16, 2017, Robert Haas <robertmhaas@gmail.com> wrote:
I don't really find this a very practical design. If the table
partitions are spread across different relfilenodes, then those
relfilenodes have to have separate pg_class entries and separate
indexes, and those indexes also need to have separate pg_class
entries. Otherwise, nothing works. And if they do have separate
pg_class entries, then the partitions have to have their own names,
and likewise for their indexes, and a dump-and-reload has to preserve
those names. If it doesn't, and those objects get new system-assigned
names after the dump-and-reload, then dump restoration can fail when a
system-assigned name collides with an existing name that is first
mentioned later in the dump.
Why can't hash partitions be stored in tables the same way as we do TOAST?
That should take care of the naming problem.
If Java has portable hash functions, why can't we?
Java standardizes on a particular Unicode encoding (UTF-16). Are you
suggesting that we do the same? Or is there another solution that I am
missing?
Regards,
Jeff Davis
On 5/16/17 11:10, Jeff Davis wrote:
I concur at this point. I originally thought hash functions might be
made portable, but I think Tom and Andres showed that to be too
problematic -- the issue with different encodings is the real killer.
I think it would be OK that if you want to move a hash-partitioned table
to a database with a different encoding, you have to do dump/restore
through the parent table. This is quite similar to what you have to do
now if you want to move a range-partitioned table to a database with a
different locale.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017/05/17 5:25, Jeff Davis wrote:
On Tuesday, May 16, 2017, Robert Haas <robertmhaas@gmail.com> wrote:
I don't really find this a very practical design. If the table
partitions are spread across different relfilenodes, then those
relfilenodes have to have separate pg_class entries and separate
indexes, and those indexes also need to have separate pg_class
entries. Otherwise, nothing works. And if they do have separate
pg_class entries, then the partitions have to have their own names,
and likewise for their indexes, and a dump-and-reload has to preserve
those names. If it doesn't, and those objects get new system-assigned
names after the dump-and-reload, then dump restoration can fail when a
system-assigned name collides with an existing name that is first
mentioned later in the dump.
Why can't hash partitions be stored in tables the same way as we do TOAST?
That should take care of the naming problem.
There is only one TOAST table per relation though, containing all of the
relation's TOASTed data. Doesn't the multiplicity of hash partitions pose
a problem? Will a hash partition of given name end up with the same
subset of data in the target database as it did in the source database? I
suppose it won't matter though if we make hash partitions an
implementation detail of hash partitioned tables to the extent you
described in your earlier email [1], whereby it doesn't matter to an
application which partition contains what subset of the table's data.
Thanks,
Amit
[1]: /messages/by-id/CAMp0ubcQ3VYdU1kNUCOmpj225U4hk6ZEoaUVeReP8h60p+mv1Q@mail.gmail.com
On Tue, May 16, 2017 at 8:40 PM, Jeff Davis <pgsql@j-davis.com> wrote:
On Mon, May 15, 2017 at 1:04 PM, David Fetter <david@fetter.org> wrote:
As the discussion has devolved here, it appears that there are, at
least conceptually, two fundamentally different classes of partition:
public, which is to say meaningful to DB clients, and "private", used
for optimizations, but otherwise opaque to DB clients.
Mashing those two cases together appears to cause more problems than
it solves.
I concur at this point. I originally thought hash functions might be
made portable, but I think Tom and Andres showed that to be too
problematic -- the issue with different encodings is the real killer.
But I also believe hash partitioning is important and we shouldn't
give up on it yet.
That means we need to have a concept of hash partitions that's
different from range/list partitioning. The terminology
"public"/"private" does not seem appropriate. Logical/physical or
external/internal might be better.
With hash partitioning:
* User only specifies number of partitions of the parent table; does
not specify individual partition properties (modulus, etc.)
A well-distributed integer column doesn't even need to be hashed; a
simple modulo works for it. If we are going towards "implicit" (yet
another name) partitioning, we could choose the strategy based on the
data type of the partition key, not just always hash. Although we
might end up hashing in most of the cases.
* Dump/reload goes through the parent table (though we may provide
options so pg_dump/restore can optimize this)
You probably mean the immediate hash-partitioned parent, but let me
clarify a bit. We support multi-level partitioning, with each
partitioned table anywhere in the partitioning hierarchy choosing any
partitioning scheme. So we can have a range-partitioned table as a
partition of a hash-partitioned table, or for that matter a non-hash
partitioned table somewhere in the hierarchy rooted at the
hash-partitioned table. So, for range/list or even hash partitions that
are grandchildren of a hash-partitioned table, we will need to route
dump/reload through that hash-partitioned table, i.e. route it through
the topmost hash-partitioned table.
* We could provide syntax to adjust the number of partitions, which
would be expensive but still useful sometimes.
* All DDL should be on the parent table, including check constraints,
FKs, unique constraints, exclusion constraints, indexes, etc.
i.e. the topmost hash-partitioned table, as explained above.
- Unique and exclusion constraints would only be permitted if the
keys are a superset of the partition keys.
Do you think this constraint applies even after we support global
indexes? Isn't this applicable to all the partitioning strategies?
- FKs would only be permitted if the two tables' partition schemes
match and the keys are members of the same hash opfamily (this could
be relaxed slightly, but it gets a little confusing if so)
* No attach/detach of partitions
It would be good if we could support this for maintenance purposes. If a
partition goes bad, we could replace it with a copy from somewhere, using
attach and detach, without affecting the whole table. Does that mean
that we will need to support some form of pg_dump/copy with a special
flag to create copies of individual partitions? I think that proposal
has already been floated.
* All partitions have the same permissions
Why's that?
The only real downside is that it could surprise users -- why can I
add a CHECK constraint on my range-partitioned table but not the
hash-partitioned one? We should try to document this so users don't
find that out too far along. As long as they aren't surprised, I think
users will understand why these aren't quite the same concepts.
For a transparent hash (as opposed to the non-transparent kind Mark
Dilger proposed), any constraint other than the implicit partitioning
constraint is applicable to the whole table or not applicable at all.
So it's better if the user adds it on the parent hash table. Yes, with
this reasoning, we could document this fact.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Tue, May 16, 2017 at 4:25 PM, Jeff Davis <pgsql@j-davis.com> wrote:
Why can't hash partitions be stored in tables the same way as we do TOAST?
That should take care of the naming problem.
Hmm, yeah, something like that could be done, but every place where
you are currently allowed to refer to a partition by name would have
to be changed to accept some other syntax for specifying the
partition. Even with all the things you propose to disallow, things
like CLUSTER, VACUUM, ANALYZE, etc. would still have to be accepted on
individual partitions. I suspect even CREATE INDEX and DROP INDEX
would need to be accepted on individual partitions, because if an index
on one partition somehow becomes bloated while the corresponding
indexes on other partitions are OK, you'll want to create a
replacement index concurrently and drop the old one. Of course, for
similar reasons, you'd need some way for \d on the parent to display
information on indexes on all the children, and all of that output
would have to be frobbed to use whatever syntax is now required, in lieu
of a name, to use an individual partition. Error messages would have
to be adjusted in probably quite a few places to use the new notation,
too. And on and on. It's not impossible to do, but we could end up
chasing down loose ends for a very long time.
Beyond that, I think it's a bad idea to make hash partitions behave
completely differently from list and range partitions. That's a lot
of extra code to maintain, and a lot of extra notional complexity
for users, for really not a lot of benefit. I think we're taking a
significant but not overwhelming problem -- our current hash functions
aren't portable -- and through a chain of logical links eventually
ending up with the conclusion that the design of partitioning needs to
be totally overhauled. I want to resist that conclusion. I'm not
saying that the problem isn't a problem, or that there's not some
logic to each step in the chain, but it's not that hard to blow a
small problem up into a huge one by assuming the worst possible
consequences or the tightest possible requirements at each step.
http://tvtropes.org/pmwiki/pmwiki.php/Main/ForWantOfANail is not an
argument for stricter regulation of the blacksmith industry.
If Java has portable hash functions, why can't we?
Java standardizes on a particular Unicode encoding (UTF-16). Are you
suggesting that we do the same? Or is there another solution that I am
missing?
Well, I've already said several times (and Peter Eisentraut has
agreed) that we don't really need the hash functions to be portable
across encodings. I think there are at least three good arguments for
that position.
First, as Peter Geoghegan points out, the world is increasingly
standardizing on Unicode, and that trend seems likely to continue. I
strongly suspect that UTF-8 is the most common database encoding by a
wide margin. There may occasionally be reasons to avoid it if, for
example, you're using one of the Eastern languages that doesn't play
entirely nicely with UTF-8, or if you happen to be storing a large
number of characters that can be represented in a single byte in some
other encoding but which require multiple bytes in UTF-8, but for an
awful lot of people UTF-8 just works and there's no need to think any
further. So, a lot of people will never hit the problem of needing to
migrate a database between encodings because they'll just use UTF-8.
Second, if the previous argument turns out to be wrong and the world
abandons UTF-8 in favor of some new and better system (UTF-9?), or if
users frequently want to make some other encoding conversion like
Tom's original example of LATIN1 -> UTF-8, we've already got a
proposed workaround for that case which seems like it will work just
fine. Just dump your data with pg_dump
--insert-hash-partitioned-data-into-parent and reload on the new
system. This isn't absolutely guaranteed to work if you've done
something silly that will make the load of a particular row work on
one partition and fail on some other one, but you probably won't do
that because it would be dumb. Also, it will be a bit slower than a
regular dump-and-reload cycle because tuple routing isn't free.
Neither of these problems really sounds very bad. If we're going to
start fixing things that could cause database migrations/upgrades to
occasionally fail in corner cases or run more slowly than expected,
there's a long list of things that you can do in your DDL that will
make pg_upgrade bomb out, and many of them are things that bite users
with some regularity (e.g. tablespaces inside the data directory or
other tablespaces, dependencies on system objects that are changed in
the new version, ALTER USER .. SET ROLE). For whatever reason, we
haven't viewed those warts as really high-priority items in need of
fixing; in some cases, instead of actually trying to improve
usability, we've all but mocked the people reporting those issues for
having the temerity to configure the system in a way that we think
isn't very smart. How can we then turn around and say "it's
absolutely unacceptable for there to be any case where dump and reload
fails even if there's a workaround switch for pg_dump that will almost
always solve the problem"? That's holding hash partitioning to a
stricter standard than our existing features, which seems unjustified.
Third, the fact that we support multiple encodings ought to be a
strong point of PostgreSQL, not an excuse for running away from
features. This same issue came up with JSON support, and the question
was, well, what do we do about the places where JSON assumes that
everything is UTF-8, like in \uXXXX escape sequences, given that we
support multiple encodings? We could have answered that question by
giving up and saying that JSON support is just too hard in our
multi-encoding environment, but we found some solution that let us
move forward. I think the result is that JSON is not really quite
per-spec unless you are using UTF-8 as your encoding, but I am
confident that it's better to have made that trade-off than to just
not support JSON at all. Similarly, I think that if we use the fact
that we support multiple encodings either as an excuse for why we
shouldn't have hash partitioning at all or for why the design needs to
be vastly more complex than would otherwise be required, we've made a
liability of what should be an asset. Tom's willing to make those
arguments, but he doesn't think hash partitioning is worthwhile in the
first place. If we believe it is worthwhile (and I do), then we
should be looking for a set of design decisions that make implementing
it reasonably practical, maybe with some trade-offs, rather than a set
of decisions that increase the complexity to the point where it will
take us 5-10 years to unravel all the problems.
Also, it's worth noting that our decision here doesn't need to be
monolithic. If we want, we can ship one opfamily that hashes strings
in the obvious way - i.e. fast but not portable across encodings - and
another that converts the input string to UTF-8 and then hashes the
result. The latter will be somewhat slow, but it will be portable
across encodings, and I don't have any problem at all with providing
it if somebody wants to do the legwork. It'll also fail in any
encodings that can't be converted to UTF-8, but it seems a bit much to
insist that hash partitioning must solve a problem that the Unicode
consortium hasn't managed to overcome. Anyway, if we do something
like that, then users can choose whether they want the portability for
which Tom advocates or the speed which I believe most users will
prefer.
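(A hedged sketch of how that second opclass could be wired up.
hashtext_utf8 is a hypothetical C function that would convert its input
to UTF-8, as convert_to(string, 'UTF8') does, and hash the resulting
bytes; only the DDL shapes below are standard syntax.)

CREATE FUNCTION hashtext_utf8(text) RETURNS integer
    AS 'MODULE_PATHNAME', 'hashtext_utf8'  -- hypothetical C implementation
    LANGUAGE C IMMUTABLE STRICT;
-- an alternative, encoding-portable hash opclass for text
CREATE OPERATOR CLASS text_utf8_hash_ops
    FOR TYPE text USING hash AS
        OPERATOR 1 = (text, text),
        FUNCTION 1 hashtext_utf8(text);

Users could then choose the portable-but-slower opclass per partitioning
key, while the default opclass keeps the fast behavior.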
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, May 16, 2017 at 4:25 PM, Jeff Davis <pgsql@j-davis.com> wrote:
Why can't hash partitions be stored in tables the same way as we do TOAST?
That should take care of the naming problem.
Hmm, yeah, something like that could be done, but every place where
you are currently allowed to refer to a partition by name would have
to be changed to accept some other syntax for specifying the
partition.
Uh ... toast tables have regular names, and can be specified in commands
just like any other table. I don't see why these "auto" partition tables
couldn't be handled the same way.
Beyond that, I think it's a bad idea to make hash partitions behave
completely differently from list and range partitions.
I think the question is whether we are going to make a distinction between
logical partitions (where the data division rule makes some sense to the
user) and physical partitions (where it needn't). I think it might be
perfectly reasonable for those to behave differently.
regards, tom lane
On Wed, May 17, 2017 at 2:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, May 16, 2017 at 4:25 PM, Jeff Davis <pgsql@j-davis.com> wrote:
Why can't hash partitions be stored in tables the same way as we do TOAST?
That should take care of the naming problem.
Hmm, yeah, something like that could be done, but every place where
you are currently allowed to refer to a partition by name would have
to be changed to accept some other syntax for specifying the
partition.
Uh ... toast tables have regular names, and can be specified in commands
just like any other table. I don't see why these "auto" partition tables
couldn't be handled the same way.
Really? That seems like a huge usability fail to me. If somebody
wants to create an index on one partition of a hash-partitioned table,
or reindex an index, do you really want them to have to dig out an
internal name to do it? And how exactly would you dump and restore
the partitions and their indexes? It's true that there are some
operations that can be performed directly on a TOAST table, but the
saving grace is that you usually don't need to do any of them. That
won't be true here.
Beyond that, I think it's a bad idea to make hash partitions behave
completely differently from list and range partitions.
I think the question is whether we are going to make a distinction between
logical partitions (where the data division rule makes some sense to the
user) and physical partitions (where it needn't). I think it might be
perfectly reasonable for those to behave differently.
I don't think I'd like to go so far as to say that it's unreasonable,
but I certainly wouldn't say I'm optimistic about such a design. I do
not think that it is going to work to conceal from the user that the
partitions are really separate tables with their own indexes. I also
think that trying to make such a thing work is just going to lead to a
lot of time and energy spent trying to paper over problems that are
basically self-inflicted, and that papering over those problems won't
really end up having any value for users.
Remember, the chain of reasoning here is something like:
1. To handle dump-and-reload the way partitioning does today, hash
functions would need to be portable across encodings.
2. That's impractically difficult.
3. So let's always load data through the top-parent.
4. But that could fail due to e.g. a UNIQUE index on an individual
child, so let's try to prohibit all of the things that could be done
to an individual partition that could cause a reload failure.
5. And then for good measure let's hide the existence of the partitions, too.
Every step in that chain of logic has a certain sense to it, but none
of them are exactly water-tight. #1 is basically a value judgement:
would people rather (a) have faster hash functions, or (b) would they
rather be able to port a database to a different encoding without
having rows move between partitions? The statement is only true
if you think it's the latter, but I tend to think it's the former. #2
is a judgement that the performance characteristics of
as-yet-unwritten portable hashing will be so bad that nobody could
possibly be satisfied with it. #3 is a great idea as an optional
behavior, but it's only a strict necessity if you're totally committed
to #1 and #2. It also has some performance cost, which makes it
somewhat undesirable as a default behavior. #4 is *probably* a
necessary consequence of #3. I don't know what the argument for #5 is
unless it's that #4 isn't hard enough already.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 17, 2017 at 12:10 PM, Robert Haas <robertmhaas@gmail.com> wrote:
1. To handle dump-and-reload the way partitioning does today, hash
functions would need to be portable across encodings.
2. That's impractically difficult.
3. So let's always load data through the top-parent.
4. But that could fail due to e.g. a UNIQUE index on an individual
child, so let's try to prohibit all of the things that could be done
to an individual partition that could cause a reload failure.
5. And then for good measure let's hide the existence of the partitions, too.
That is one thread of logic, but the discussion also
highlighted some of the consequences of treating hash partitions like
range/list partitions.
For instance, it makes little sense to have individual check
constraints, indexes, permissions, etc. on a hash-partitioned table.
It doesn't mean that we should necessarily forbid them, but it should
make us question whether combining range and hash partitions is really
the right design.
Regards,
Jeff Davis
On Wed, May 17, 2017 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think the question is whether we are going to make a distinction between
logical partitions (where the data division rule makes some sense to the
user) and physical partitions (where it needn't). I think it might be
perfectly reasonable for those to behave differently.
Agreed. To summarize my perspective:
* hash partitioning offers a nice way to divide the data for later
processing by parallel query
* range partitioning is good for partition elimination
(constraint_exclusion) and separating hot/cold data (e.g. partitioning
on date)
* both offer some maintenance benefits (e.g. reindex one partition at
a time), though range partitioning seems like it offers better
flexibility here in some cases
I lean toward separating the concepts, but Robert is making some
reasonable arguments and I could be convinced.
Regards,
Jeff Davis
On Thu, May 18, 2017 at 1:53 AM, Jeff Davis <pgsql@j-davis.com> wrote:
For instance, it makes little sense to have individual check
constraints, indexes, permissions, etc. on a hash-partitioned table.
It doesn't mean that we should necessarily forbid them, but it should
make us question whether combining range and hash partitions is really
the right design.
I think that it definitely makes sense to have individual indexes on a
hash-partitioned table. If you didn't, then as things stand today,
you'd have no indexes at all, which can't be good. In the future, we
might have some system where an index created on the parent cascades
down to all of the children, but even then, you might want to REINDEX
just one of those child indexes, or better yet, create a replacement
index concurrently and then drop the old one concurrently. You might
also want to add the same sort of new index to every partition, but
not in a single operation - for reasons of load, length of maintenance
window, time for which a snapshot is held open, etc.
I agree that separate constraints and permissions on hash partitions
don't make much sense. To a lesser extent, that's true of other kinds
of partitioning as well. I mean, there is probably some use case for
setting separate permissions on a range-partitioned table, but it's a
pretty thin use case. It certainly seems possible that many users
would prefer a rule that enforces uniform permissions across the
entire partitioning hierarchy. This is one of the key things that had
to be decided in regard to the partitioning implementation we now
have: for which things should we enforce uniformity, and for which
things should we allow diversity? I advocated for enforcing
uniformity only in areas where we could see a clear advantage to it,
which led to the fairly minimal approach of enforcing only that we had
no multiple inheritance and no extra columns in the children, but
that's certainly an arguable position. Other people argued for more
restrictions, I believe out of a desire to create more administrative
simplicity, but there is a risk of cutting yourself off from useful
configurations there, and it seems very difficult to me to draw a hard
line between what is useful and what is useless.
For example, consider a hash-partitioned table. Could it make sense
to have some but not all partitions be unlogged? I think it could.
Suppose you have a cluster of machines each of which has a replica of
the same hash-partitioned table. Each server uses logged tables for
the partitions for which it is the authoritative source of
information, and unlogged tables for the others. In the event of
crash, the data for any tables that are lost are replicated from the
master for that machine. I can think of some disadvantages of that
design, but I can think of some advantages, too, and I think it's
pretty hard to say that nobody should ever want to do it. And if it's
legitimate to want to do that, then what if I want to use
trigger-based replication rather than logical replication? Then I
might need triggers on some partitions but not all, or maybe different
triggers on different partitions.
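(For what it's worth, that particular configuration should already be
expressible with today's DDL, since a partition is an ordinary table;
the partition name here is hypothetical.)

-- on the node that is authoritative for partition 3:
ALTER TABLE measurements_p3 SET LOGGED;
-- on every other node, keep the same partition unlogged and
-- re-replicate it from its owner after a crash:
ALTER TABLE measurements_p3 SET UNLOGGED;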
Even for a permissions grant, suppose my production system is having
some problem that can't be replicated on the test data set. Is it
reasonable to want to give a trusted developer access to a slice, but
not all, of my production data? I could allow them access to just one
partition. Maybe not a common desire, but is that enough reason to
ban it? I'd say it's arguable. I don't think that there are bright
lines around any of this stuff. My experience with this area has led
me to give up on the idea of complete uniformity as impractical, and
instead look at it from the perspective of "what do we absolutely have
to ban in order for this to be sane?".
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thursday, May 18, 2017, Robert Haas <robertmhaas@gmail.com> wrote:
My experience with this area has led
me to give up on the idea of complete uniformity as impractical, and
instead look at it from the perspective of "what do we absolutely have
to ban in order for this to be sane?".
I could agree to something like that. Let's explore some of the challenges
there and potential solutions:
1. Dump/reload of hash partitioned data.
Falling back to restore-through-the-root seems like a reasonable answer
here. Moving to a different encoding is not an edge case, but it's not
common either, so a performance penalty seems acceptable. I'm not
immediately sure how we'd implement this in pg_dump/restore, so I'd feel a
little more comfortable if I saw a sketch.
2. Having a lot of hash partitions would be cumbersome
The user would need to create and manage each partition, and try to do
global operations in a sane way. The normal case would probably involve
scripts to do things like add an index to all partitions, or a column. Many
partitions would also just pollute the namespace unless you remember to put
them in a separate schema (yes, it's easy, but most people will still
forget). Some syntax sugar would go a long way here.
3. The user would need to specify details they really don't care about for
each partition.
Things like "modulus 16, remainder 0", "modulus 16, remainder 1" are
tedious boilerplate. And if the user makes a mistake, then 1/16 of inserts
start failing. Probably would be caught during testing, but not exactly a
good user experience. I'm not thrilled about this, considering that all the
user really wants is 16 partitions, but it's not the end of the world.
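(For reference, the boilerplate in question, in the syntax of the
pending hash partitioning patch; the table names are illustrative.)

CREATE TABLE t (id int8) PARTITION BY HASH (id);
CREATE TABLE t_p0 PARTITION OF t FOR VALUES WITH (MODULUS 16, REMAINDER 0);
CREATE TABLE t_p1 PARTITION OF t FOR VALUES WITH (MODULUS 16, REMAINDER 1);
-- ...and fourteen more; mistype one remainder and roughly 1/16
-- of inserts start failing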
4. Detach is a foot-gun
If you detach a partition, random inserts will start failing. Not thrilled
about this, but a hapless user would accept most of the blame if they
stumble over it. Another way of saying this is with hash partitioning you
really need the whole set for the table to be online at all. But we can't
really enforce that, because it would limit some of the flexibility that
you have in mind.
Stepping back, your approach might be closer to the general postgres
philosophy of allowing the user to assemble from spare parts first, then a
few releases later we offer some pre-built subassemblies, and a few
releases later we make the typical cases work out of the box. I'm fine with
it as long as we don't paint ourselves into a corner.
Of course we still have work to do on the hash functions. We should solve
at least the most glaring portability problems, and try to harmonize the
hash opfamilies. If you agree, I can put together a patch or two.
Regards,
Jeff Davis
On Fri, May 19, 2017 at 2:36 AM, Jeff Davis <pgsql@j-davis.com> wrote:
I could agree to something like that. Let's explore some of the challenges
there and potential solutions:
1. Dump/reload of hash partitioned data.
Falling back to restore-through-the-root seems like a reasonable answer
here. Moving to a different encoding is not an edge case, but it's not
common either, so a performance penalty seems acceptable. I'm not
immediately sure how we'd implement this in pg_dump/restore, so I'd feel a
little more comfortable if I saw a sketch.
Right, I think this needs some investigation. I can't whip up a
sketch on short notice, but I'll see if someone else at EnterpriseDB
can work on it unless somebody else wants to take a crack at it.
2. Having a lot of hash partitions would be cumbersome
The user would need to create and manage each partition, and try to do
global operations in a sane way. The normal case would probably involve
scripts to do things like add an index to all partitions, or a column. Many
partitions would also just pollute the namespace unless you remember to put
them in a separate schema (yes, it's easy, but most people will still
forget). Some syntax sugar would go a long way here.
I agree. Adding a column already cascades to all children, and
there's a proposal to make the same thing true for indexes. See
discussion beginning at
/messages/by-id/c8fe4f6b-ff46-aae0-89e3-e936a35f0cfd@postgrespro.ru
I haven't had time to review the code posted there yet, but I would
like to see something along the lines discussed there committed to
v11, and hopefully also something around foreign keys. It should be
possible to create an outbound foreign key on a partitioned table and have
that cascade to all children. Conversely, it should also be possible
to create a foreign key referencing a partitioned table provided that
the foreign key references the partitioning key, and that there's a
unique index on those same columns on every partition. (Referencing a
set of columns that is not the partitioning key will have to wait for
global indexes, I think.)
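(A hedged sketch of that rule; none of this was supported at the time of
writing, and all names are illustrative.)

CREATE TABLE accounts (id int8 NOT NULL)
    PARTITION BY HASH (id);
-- assume each partition of accounts carries a unique index on (id);
-- the proposal is to then allow an FK referencing the partitioning key:
CREATE TABLE payments (
    account_id int8 REFERENCES accounts (id),
    amount numeric
);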
These things are useful not just for hash partitioning, but also for
list and range partitioning, and we'll save a lot of work if we can
use the same infrastructure for both cases.
3. The user would need to specify details they really don't care about for
each partition.
Things like "modulus 16, remainder 0", "modulus 16, remainder 1" are tedious
boilerplate. And if the user makes a mistake, then 1/16 of inserts start
failing. Probably would be caught during testing, but not exactly a good
user experience. I'm not thrilled about this, considering that all the user
really wants is 16 partitions, but it's not the end of the world.
As I said on the hash partitioning thread, I think the way to handle
this is to get the basic feature in first, then add some convenience
syntax to auto-create the partitions. That shouldn't be very
difficult to implement; I just didn't want to complicate things more
than necessary for the first version. The same issue arose when
discussing list and range partitioning: Oracle has syntax like ours,
but with the ability (requirement?) to create the partitions in the
initial CREATE TABLE statement. However, I wanted to make sure we had
the "inconvenient" syntax fully working and fully tested before we
added that, because I believe pg_dump needs to dump everything out the
long way.
4. Detach is a foot-gun
If you detach a partition, random inserts will start failing. Not thrilled
about this, but a hapless user would accept most of the blame if they
stumble over it. Another way of saying this is with hash partitioning you
really need the whole set for the table to be online at all. But we can't
really enforce that, because it would limit some of the flexibility that you
have in mind.
Yes, I agree with all of that. I don't think it's really going to be
a problem in practice. The only reason to detach a hash partition is
if you want to split it, and we may eventually have convenience syntax
to do that in an automated (i.e. less error-prone) way. If somebody
does it manually and without a plan for putting back a replacement
partition, they may be sad, but if somebody puts a CHECK (false)
constraint on their table, they may be sad about that, too. It's more
important to allow for flexibility than to prohibit every stupid thing
somebody might try to do. Also, documentation helps. We've got a
chapter on partitioning and it can be expanded to discuss these kinds
of issues.
Stepping back, your approach might be closer to the general postgres
philosophy of allowing the user to assemble from spare parts first, then a
few releases later we offer some pre-built subassemblies, and a few releases
later we make the typical cases work out of the box. I'm fine with it as
long as we don't paint ourselves into a corner.
That's basically my thinking here. Right now, our partitioning is
primitive in numerous ways, and so the rough edges are pretty visible.
However, I believe that with careful design we can file down many of
those rough edges over time. Now, it's probably never going to be
quite as smooth as if the system had been designed for partitions from
the ground up, or at least not any time in the foreseeable future, but
I think it can still be very good. If we add the above-referenced
logic for index and foreign key handling, convenience syntax for
creating partitions along with tables and for data-movement
operations, better partition pruning in the planner, run-time
partition-pruning in the executor, partitionwise join and aggregate,
MergeAppend -> Append strength reduction when the required sort order
matches the partition order, UPDATE tuple routing, default partitions,
and so on, I believe that partitioning will go from "eh, that's better
than inheritance" to "hey, that's actually really good". There is a
ton of work to do there, but I think it is all doable within the
current infrastructure, and all of those things can be (and in a number
of cases already are being) worked on as separate patches, so we can
get into each release what is ready and push off to the next release
what isn't. As we go, we'll have fewer and fewer cases where
partitioning a table regresses performance and more and more cases
where the stuff you want to do just works.
Probably the toughest nut to crack is global indexes. An argument was
made on this very mailing list a number of years ago that nobody
should want global indexes because $REASONS, but I was a little
skeptical of that argument at the time and it's been clear to me that
EnterpriseDB's customers, at least, do not in any way accept those
arguments. It works on Oracle or other systems they use, and they
find it useful enough that they're unwilling to be without it. If
PostgreSQL provides the same functionality, they'll use it for more
things than if it doesn't. I find myself more than a bit intimidated
at the prospect of actually trying to make this work, but I've been
convinced that people won't stop asking until it does.
Of course we still have work to do on the hash functions. We should solve at
least the most glaring portability problems, and try to harmonize the hash
opfamilies. If you agree, I can put together a patch or two.
I definitely agree with solving the portability problems to the extent
that we can reasonably do so. I think adding more cross-type hash
opfamilies is a mildly good thing: I don't object to it, it probably
makes sense to do at the same time as any other hashing changes we
want to make, and it's better than not doing it. At the same time, I
wouldn't walk over hot coals for it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 12, 2017 at 1:35 PM, Joe Conway <mail@joeconway.com> wrote:
That's a good point, but the flip side is that, if we don't have
such a rule, a pg_dump of a hash-partitioned table on one
architecture might fail to restore on another architecture. Today, I
believe that, while the actual database cluster is
architecture-dependent, a pg_dump is architecture-independent. Is it
OK to lose that property?
Not from where I sit.
It was pointed out at PGCon that we've actually already crossed this
Rubicon. If you create a range-partitioned table, put a bunch of data
into it, and then try to reload it on another system with a different
set of encoding definitions, the proper partition for some row might
be different. That would result in the dump failing to reload with a
complaint about the partition key being violated. And, in fact, you
could have the exact same issue on earlier releases which don't have
partitioning, because a CHECK constraint of the form (a >= 'something'
AND b < 'something else') could be true under one encoding and false
under another, and you could define such a constraint on any release
(from this millennium, anyway).
I'm not actually aware of an instance where this has bitten anyone,
even though it seems like it certainly could have and maybe should've
gotten somebody at some point. Has anyone else? I think it's a
reasonable guess that such problems will become more common with the
advent of partitioning and more common still as we continue to improve
partitioning, because people who otherwise would have given up on
PostgreSQL -- or at least on partitioning -- will actually try to use
it in earnest and then hit this problem. However, my guess is that it
will still be pretty rare, and that having an optional
--dump-partition-data-with-parent flag that can be used when it crops
up will be an adequate workaround for most people. Of course, that is
just my opinion.
So now I think that the right way to think about the issues around
hash partitioning is as a new variant of a problem that we already
have rather than an altogether new problem. IMHO, the right questions
are:
1. Are the new problems worse than the old ones?
2. What could we do about it?
On #1, I'd say tentatively yes. The problem of portability across
encodings is probably not new. Suppose you have a table partitioned
by range, either using the new range partitioning or using the old
table inheritance method and CHECK constraints. If you move that
table to a different encoding, will the collation behavior you get
under the new encoding match the collation behavior you got under the
old encoding? The documentation says: "Also, a collation is tied to a
character set encoding (see Section 22.3). The same collation name may
exist for different encodings", which makes it sound like it is
possible but not guaranteed. Even if the same collation name happens
to exist, there's no guarantee it behaves the same way under the new
encoding, and given our experiences with glibc so far, I'd bet against
it. If glibc doesn't even think strcoll() and strxfrm() need to agree
with each other for the same collation, or that minor releases
shouldn't whack the behavior around, there doesn't seem to be room for
optimism about the possibility that they carefully preserve behavior
across similarly-named collations on different encodings. On the
other hand, collation rules probably tend to vary only around the
edges, so there's a reasonably good chance that even if the collation
rules change when you switch encodings, every row will still get put
into the same partition as before. If we implement hashing for hash
partitioning in some trivial way like hashing the raw bytes, that will
most certainly not be true -- *most* rows will move to a different
partition when you switch encodings.
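To make that concrete, here is a minimal standalone sketch (a toy byte-at-a-time hash standing in for "hash the raw bytes" -- this is not PostgreSQL's hash_any()) showing how the same character hashes differently as soon as the encoding changes the bytes:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Toy byte-wise hash standing in for "hash the raw bytes". */
static uint32_t
toy_hash(const unsigned char *buf, size_t len)
{
    uint32_t h = 5381;

    for (size_t i = 0; i < len; i++)
        h = h * 33 + buf[i];
    return h;
}

int
main(void)
{
    /* U+00E9 (e-acute): one byte in LATIN1, two bytes in UTF-8. */
    const unsigned char latin1[] = {0xE9};
    const unsigned char utf8[] = {0xC3, 0xA9};

    /* Different byte sequences, so different hashes -- and, once the
     * partition modulus is applied, usually different partitions. */
    printf("latin1: %u\n", toy_hash(latin1, sizeof latin1));
    printf("utf8:   %u\n", toy_hash(utf8, sizeof utf8));
    return 0;
}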
Furthermore, neither range nor list partitioning depends on properties
of the hardware, like how wide integers are, or whether they are
stored big-endian. A naive approach to hash partitioning would depend
on those things. That's clearly worse.
On #2, I'll attempt to list the approaches that have been proposed so far:
1. Don't implement hash partitioning (Tom Lane). I don't think this
proposal will attract many votes.
2. Add an option like --dump-partition-data-with-parent. I'm not sure
who originally proposed this, but it seems that everybody likes it.
What we disagree about is the degree to which it's sufficient. Jeff
Davis thinks it doesn't go far enough: what if you have an old
plain-format dump that you don't want to hand-edit, and the source
database is no longer available? Most people involved in the
unconference discussion of partitioning at PGCon seemed to feel that
wasn't really something we should worry about too much. I had been
taking that position also, more or less because I don't see that there
are better alternatives. For instance, Jeff proposed having the COPY
command specify both the parent and the child and providing a run-time
option of some kind to decide which table name gets used, but I think
that would be a fairly unpleasant syntax wart with some risk of
breaking other cases (e.g. what if you want to restore a single child
on a system where the parent doesn't exist?). Someone else in the
room had a different take on why we shouldn't worry about this
problem, which I'll try to summarize: "Well, encoding conversions are
already so painful that if you're laboring under any illusion that
it's all just going to work, you're wrong, and this isn't going to
make anything materially worse."
3. Implement portable hash functions (Jeff Davis or me, not sure
which). Andres scoffed at this idea, but I still think it might have
legs. Coming up with a hashing algorithm for integers that produces
the same results on big-endian and little-endian systems seems pretty
feasible, even with the additional constraint that it should still be
fast. Coming up with a hashing algorithm that produces the same
results across various encodings of the same characters is
definitely feasible in any case where we know how to do encoding
conversion among all relevant encodings, but it is probably hard to
make fast. Those two things also solve different parts of the
problem; one is insulating the user from a difference in hardware
architecture, while the other is insulating the user from a difference
in user-selected settings. I think that the first of those things is
more important than the second, because it's easier to change your
settings than it is to change your hardware. Also, I think that there
may be room for allowing for some user choice on this point; if people
are willing to write multiple hash opfamilies with different
portability-performance trade-offs, we can offer them all, either in
core or contrib, and users can pick the ones that have the properties
they want. My personal guess is that most people will prefer the fast
hash functions over the ones that solve their potential future
migration problems, but, hey, options are good.
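For the integer half of that, the feasible-looking trick is simply to serialize the key into a fixed byte order before hashing, so both endiannesses feed the hash identical input. A minimal sketch, with a placeholder byte-oriented hash (the real choice of algorithm is the open question):

#include <stdint.h>

/* Placeholder; any byte-at-a-time algorithm would do here. */
static uint32_t
hash_bytes(const unsigned char *buf, int len)
{
    uint32_t h = 0;

    for (int i = 0; i < len; i++)
        h = h * 31 + buf[i];
    return h;
}

/* Hash an int64 identically on big- and little-endian machines by
 * always serializing least-significant byte first. */
static uint32_t
hash_int64_portable(int64_t key)
{
    unsigned char buf[8];

    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char) ((uint64_t) key >> (8 * i));

    return hash_bytes(buf, 8);
}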
I think that some combination of #2 and #3 is probably as well as
we're going to do here, and personally I think we can make that pretty
good. Users who use PostgreSQL 11 to do hash partitioning using a
composite key consisting of one float8 column and one SJIS-encoded
text column on a VAX and attach to each partition a CHECK constraint
that happens to pass only because of the vagaries of which rows end up
in which partitions, and who then try to migrate to Linux64 with a
numeric column and a UTF-8 encoded text column with the same CHECK
constraints will have some problems to work out. However, for the
vast majority of people, reloading the data through the top-parent is
going to be good enough, and the more portable we can make the hash
functions, the fewer people will need to do even that much. Of
course, if there are other things we can do for a reasonable
investment of effort to make things better still, great! I'm open to
suggestions. But I don't think we need to panic about this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-06-01 13:59:42 -0400, Robert Haas wrote:
I'm not actually aware of an instance where this has bitten anyone,
even though it seems like it certainly could have and maybe should've
gotten somebody at some point. Has anyone else?
Two comments: First, citus has been doing hash-partitioning and
append/range partitioning for a while now, and I'm not aware of anyone
being bitten by this (although there've been plenty of other things ;)),
even though there've been cases upgrading to different collation &
encodings. Secondly, I think that's to a significant degree caused by
the fact that in practice people way more often partition on types like
int4/int8/date/timestamp/uuid rather than text - there's rarely good
reasons to do the latter.
Furthermore, neither range nor list partitioning depends on properties
of the hardware, like how wide integers are, or whether they are
stored big-endian. A naive approach to hash partitioning would depend
on those things. That's clearly worse.
I don't think our current int4/8 hash functions depend on
FLOAT8PASSBYVAL.
3. Implement portable hash functions (Jeff Davis or me, not sure
which). Andres scoffed at this idea, but I still think it might have
legs. Coming up with a hashing algorithm for integers that produces
the same results on big-endian and little-endian systems seems pretty
feasible, even with the additional constraint that it should still be
fast.
Just to clarify: I don't think it's a problem to do so for integers and
most other simple scalar types. There's plenty of hash algorithms that are
endianness-independent, and the rest is just a bit of care. Where I see
a lot more issues is doing so for more complex types like arrays, jsonb,
postgis geometry/geography types and the like, where the fast and simple
implementation is to just hash the entire datum - and that'll very
commonly not be portable at all due to padding and type width
differences.
My personal guess is that most people will prefer the fast
hash functions over the ones that solve their potential future
migration problems, but, hey, options are good.
I'm pretty sure that will be the case. I'm not sure that adding
infrastructure to allow for something that nobody will use in practice
is a good idea. If there ends up being demand for it, we can still go there.
I think that the number of people that migrate between architectures is
low enough that this isn't going to be a very common issue. Having some
feasible way around this is important, but I don't think we should
optimize heavily for it by developing new infrastructure / complicating
experience for the 'normal' uses.
Greetings,
Andres Freund
On 06/01/2017 11:25 AM, Andres Freund wrote:
On 2017-06-01 13:59:42 -0400, Robert Haas wrote:
My personal guess is that most people will prefer the fast
hash functions over the ones that solve their potential future
migration problems, but, hey, options are good.
I'm pretty sure that will be the case. I'm not sure that adding
infrastructure to allow for something that nobody will use in practice
is a good idea. If there ends up being demand for it, we can still go there.
I think that the number of people that migrate between architectures is
low enough that this isn't going to be a very common issue. Having some
feasible way around this is important, but I don't think we should
optimize heavily for it by developing new infrastructure / complicating
experience for the 'normal' uses.
+1
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On Thu, Jun 1, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
1. Are the new problems worse than the old ones?
2. What could we do about it?
Exactly the right questions.
1. For range partitioning, I think it's "yes, a little". As you point
out, there are already some weird edge cases -- the main way range
partitioning would make the problem worse is simply by having more
users.
But for hash partitioning I think the problems will become more
substantial. Different encodings, endian issues, etc. will be a
headache for someone, and potentially a bad day if they are urgently
trying to restore on a new machine. We should remember that not
everyone is a full-time postgres DBA, and users might reasonably think
that the default options to pg_dump[all] will give them a portable
dump.
2. I basically see two approaches to solve the problem:
(a) Tom suggested at PGCon that we could have a GUC that
automatically causes inserts to the partition to be re-routed through
the parent. We could discuss whether to always route through the
parent, or do a recheck on the partition constraints and only reroute
tuples that will fail it. If the user gets into trouble, the worst
that would happen is a helpful error message telling them to set the
GUC. I like this idea.
(b) I had suggested before that we could make the default text dump
(and the default output from pg_restore, for consistency) route
through the parent. Advanced users would dump with -Fc, and pg_restore
would support an option to do partition-wise loading. To me, this is
simpler, but users might forget to use (or not know about) the
pg_restore option and then it would load more slowly. Also, the ship
is sailing on range partitioning, so we might prefer option (a) just
to avoid making any changes.
I am fine with either option.
2. Add an option like --dump-partition-data-with-parent. I'm not sure
who originally proposed this, but it seems that everybody likes it.
What we disagree about is the degree to which it's sufficient. Jeff
Davis thinks it doesn't go far enough: what if you have an old
plain-format dump that you don't want to hand-edit, and the source
database is no longer available? Most people involved in the
unconference discussion of partitioning at PGCon seemed to feel that
wasn't really something we should worry about too much. I had been
taking that position also, more or less because I don't see that there
are better alternatives.
If the suggestions above are unacceptable, and we don't come up with
anything better, then of course we have to move on. I am worrying now
primarily because now is the best time to worry; I don't expect any
horrible outcome.
3. Implement portable hash functions (Jeff Davis or me, not sure
which). Andres scoffed at this idea, but I still think it might have
legs.
I think it reduces the problem, which has value, but it's hard to make
it rock-solid.
make fast. Those two things also solve different parts of the
problem; one is insulating the user from a difference in hardware
architecture, while the other is insulating the user from a difference
in user-selected settings. I think that the first of those things is
more important than the second, because it's easier to change your
settings than it is to change your hardware.
Good point.
Regards,
Jeff Davis
On Thu, Jun 1, 2017 at 11:25 AM, Andres Freund <andres@anarazel.de> wrote:
Secondly, I think that's to a significant degree caused by
the fact that in practice people way more often partition on types like
int4/int8/date/timestamp/uuid rather than text - there's rarely good
reasons to do the latter.
Once we support more pushdowns to partitions, the only question is:
what are your join keys and what are your grouping keys?
Text is absolutely a normal join key or group key. Consider joins on a
user ID or grouping by a model number.
Regards,
Jeff Davis
On Fri, Jun 2, 2017 at 1:24 AM, Jeff Davis <pgsql@j-davis.com> wrote:
1. For range partitioning, I think it's "yes, a little". As you point
out, there are already some weird edge cases -- the main way range
partitioning would make the problem worse is simply by having more
users.
I agree.
But for hash partitioning I think the problems will become more
substantial. Different encodings, endian issues, etc. will be a
headache for someone, and potentially a bad day if they are urgently
trying to restore on a new machine. We should remember that not
everyone is a full-time postgres DBA, and users might reasonably think
that the default options to pg_dump[all] will give them a portable
dump.
I agree to an extent. I think the problem will be worse for hash
partitioning but I might disagree with you on how much worse. I think
that most people don't do encoding conversions very often, and that
those who do know (or should know) enough to expect trouble. I think
most people do endian-ness conversions almost never, but since that's
a matter of hardware not configuration I'd like to paper over that
case if we can.
2. I basically see two approaches to solve the problem:
(a) Tom suggested at PGCon that we could have a GUC that
automatically causes inserts to the partition to be re-routed through
the parent. We could discuss whether to always route through the
parent, or do a recheck on the partition constraints and only reroute
tuples that will fail it. If the user gets into trouble, the worst
that would happen is a helpful error message telling them to set the
GUC. I like this idea.
Yeah, that's not crazy. I find it a bit surprising in terms of the
semantics, though. SET
when_i_try_to_insert_into_a_specific_partition_i_dont_really_mean_it =
true?
(b) I had suggested before that we could make the default text dump
(and the default output from pg_restore, for consistency) route
through the parent. Advanced users would dump with -Fc, and pg_restore
would support an option to do partition-wise loading. To me, this is
simpler, but users might forget to use (or not know about) the
pg_restore option and then it would load more slowly. Also, the ship
is sailing on range partitioning, so we might prefer option (a) just
to avoid making any changes.
I think this is a non-starter. The contents of the dump shouldn't
depend on the format chosen; that is bound to confuse somebody. I
also do not wish to inflict a speed penalty on the users of
plain-format dumps.
2. Add an option like --dump-partition-data-with-parent. I'm not sure
who originally proposed this, but it seems that everybody likes it.
What we disagree about is the degree to which it's sufficient. Jeff
Davis thinks it doesn't go far enough: what if you have an old
plain-format dump that you don't want to hand-edit, and the source
database is no longer available? Most people involved in the
unconference discussion of partitioning at PGCon seemed to feel that
wasn't really something we should worry about too much. I had been
taking that position also, more or less because I don't see that there
are better alternatives.
If the suggestions above are unacceptable, and we don't come up with
anything better, then of course we have to move on. I am worrying now
primarily because now is the best time to worry; I don't expect any
horrible outcome.
OK.
3. Implement portable hash functions (Jeff Davis or me, not sure
which). Andres scoffed at this idea, but I still think it might have
legs.
I think it reduces the problem, which has value, but it's hard to make
it rock-solid.
I agree.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 06/02/2017 05:47 AM, Robert Haas wrote:
On Fri, Jun 2, 2017 at 1:24 AM, Jeff Davis <pgsql@j-davis.com> wrote:
2. I basically see two approaches to solve the problem:
(a) Tom suggested at PGCon that we could have a GUC that
automatically causes inserts to the partition to be re-routed through
the parent. We could discuss whether to always route through the
parent, or do a recheck on the partition constraints and only reroute
tuples that will fail it. If the user gets into trouble, the worst
that would happen is a helpful error message telling them to set the
GUC. I like this idea.
Yeah, that's not crazy. I find it a bit surprising in terms of the
semantics, though. SET
when_i_try_to_insert_into_a_specific_partition_i_dont_really_mean_it =
true?
Maybe
SET partition_tuple_retry = true;
-or-
SET partition_tuple_reroute = true;
?
I like the idea of only rerouting when failing constraints although I
can envision where there might be use cases where you essentially want
to re-partition and therefore reroute everything, leading to:
SET partition_tuple_reroute = (none | error | all);
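To illustrate how those three values might behave, here is a hypothetical sketch -- the GUC name, enum, and function are invented for illustration, not taken from any patch:

#include <stdbool.h>

/* Hypothetical settings for a partition_tuple_reroute GUC. */
typedef enum PartitionTupleReroute
{
    REROUTE_NONE,   /* never reroute; a violating insert just errors */
    REROUTE_ERROR,  /* reroute only tuples failing the partition constraint */
    REROUTE_ALL     /* route every direct-partition insert via the parent */
} PartitionTupleReroute;

/* Would a tuple inserted directly into a partition be re-routed
 * through the parent's tuple-routing logic? */
static bool
should_reroute(PartitionTupleReroute mode, bool passes_constraint)
{
    switch (mode)
    {
        case REROUTE_ERROR:
            return !passes_constraint;
        case REROUTE_ALL:
            return true;
        case REROUTE_NONE:
        default:
            return false;
    }
}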
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On Fri, Jun 2, 2017 at 10:19 AM, Joe Conway <mail@joeconway.com> wrote:
Yeah, that's not crazy. I find it a bit surprising in terms of the
semantics, though. SET
when_i_try_to_insert_into_a_specific_partition_i_dont_really_mean_it =
true?
Maybe
SET partition_tuple_retry = true;
-or-
SET partition_tuple_reroute = true;
?
I like the idea of only rerouting when failing constraints, although I
can envision where there might be use cases where you essentially want
to re-partition and therefore reroute everything, leading to:
SET partition_tuple_reroute = (none | error | all);
Personally, I think it's more elegant to make this a pg_dump option
than to make it a server GUC, but I'm not going to spend time fighting
the server GUC idea if other people like it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jun 1, 2017 at 2:25 PM, Andres Freund <andres@anarazel.de> wrote:
Just to clarify: I don't think it's a problem to do so for integers and
most other simple scalar types. There's plenty of hash algorithms that are
endianness-independent, and the rest is just a bit of care.
Do you have any feeling for which of those endianness-independent hash
functions might be a reasonable choice for us?
https://github.com/markokr/pghashlib implements various hash functions
for PostgreSQL, and claims that, of those implemented, crc32, Jenkins,
lookup3be and lookup3le, md5, and siphash24 are endian-independent.
An interesting point here is that Jeff Davis asserted in the original
post on this thread that our existing hash_any() wasn't portable, but
our current hash_any seems to be the Jenkins algorithm -- so I'm
confused. Part of the problem seems to be that, according to
https://en.wikipedia.org/wiki/Jenkins_hash_function there are 4 of
those. I don't know whether the one in pghashlib is the same one
we've implemented.
Kenneth Marshall suggested xxhash as an endian-independent algorithm
upthread. Code for that is available under a 2-clause BSD license.
PostgreSQL page checksums use an algorithm based on, but not exactly,
FNV-1a. See storage/checksum_impl.h. The comments there say this
algorithm was chosen with speed in mind. Our version is not
endian-independent because it folds in 4-byte integers rather than
1-byte integers, but plain old FNV-1a *is* endian-independent and
could be used.
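For reference, plain byte-at-a-time FNV-1a is tiny. This sketch uses the standard published 32-bit offset basis and prime, and it is endian-independent precisely because it folds in one byte at a time (unlike the 4-byte-chunk variant in checksum_impl.h):

#include <stdint.h>
#include <stddef.h>

/* Plain 32-bit FNV-1a, one byte at a time. */
static uint32_t
fnv1a_32(const unsigned char *buf, size_t len)
{
    uint32_t hash = 2166136261u;        /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        hash ^= buf[i];
        hash *= 16777619u;              /* FNV prime */
    }
    return hash;
}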
We also have an implementation of CRC32C in core - see port/pg_crc32.h
and port/pg_crc32c_sb8.c. It's not clear to me whether this is
endian-independent or not, although there is stuff that depends on
WORDS_BIGENDIAN, so, uh, maybe?
Some other possibly-interesting links:
https://research.neustar.biz/2011/12/29/choosing-a-good-hash-function-part-2/
http://greenrobot.org/essentials/features/performant-hash-functions-for-java/comparison-of-hash-functions/
https://www.strchr.com/hash_functions
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2017-08-03 17:09:41 -0400, Robert Haas wrote:
On Thu, Jun 1, 2017 at 2:25 PM, Andres Freund <andres@anarazel.de> wrote:
Just to clarify: I don't think it's a problem to do so for integers and
most other simple scalar types. There's plenty of hash algorithms that are endianness-independent, and the rest is just a bit of care.
Do you have any feeling for which of those endianness-independent hash
functions might be a reasonable choice for us?
Not a strong / very informed one, TBH.
I'm not convinced it's worth trying to achieve this in the first place,
now that we "nearly" have the insert-via-parent feature. Isn't this a
lot of work, for very little practical gain? Having to select that when
switching architectures still seems unproblematic. People will just
about never switch endianness, so even a tiny performance & effort
overhead doesn't seem worth it to me.
Leaving that aside:
https://github.com/markokr/pghashlib implements various hash functions
for PostgreSQL, and claims that, of those implemented, crc32, Jenkins,
lookup3be and lookup3le, md5, and siphash24 are endian-independent.
An interesting point here is that Jeff Davis asserted in the original
post on this thread that our existing hash_any() wasn't portable, but
our current hash_any seems to be the Jenkins algorithm -- so I'm
confused. Part of the problem seems to be that, according to
https://en.wikipedia.org/wiki/Jenkins_hash_function there are 4 of
those. I don't know whether the one in pghashlib is the same one
we've implemented.
IIUC lookup3be/le from Marko's hashlib just has an endianness conversion
added. I'd personally not go for jenkins3; it's not particularly fast,
nor does it balance that out w/ being cryptographically secure.
Kenneth Marshall suggested xxhash as an endian-independent algorithm
upthread. Code for that is available under a 2-clause BSD license.
Yea, that'd have been one of my suggestions too. Especially as I still
want to implement better compression using lz4, and that'll depend on
xxhash in turn.
PostgreSQL page checksums use an algorithm based on, but not exactly,
FNV-1a. See storage/checksum_impl.h. The comments there say this
algorithm was chosen with speed in mind. Our version is not
endian-independent because it folds in 4-byte integers rather than
1-byte integers, but plain old FNV-1a *is* endian-independent and
could be used.
Non-SIMDed (which we hope to achieve with our implementation, which is
why we have separate compiler flags for that file) implementations of
FNV are, to my knowledge, not particularly fast. And the SIMD tricks
are, to my knowledge, not really applicable to the case at hand here. So
I'm not a fan of choosing FNV.
We also have an implementation of CRC32C in core - see port/pg_crc32.h
and port/pg_crc32c_sb8.c. It's not clear to me whether this is
endian-independent or not, although there is stuff that depends on
WORDS_BIGENDIAN, so, uh, maybe?
The results should be endian independent. It depends on WORDS_BIGENDIAN
because we need different pre-computed tables depending on endianness.
Greetings,
Andres Freund
On Thu, Aug 3, 2017 at 5:32 PM, Andres Freund <andres@anarazel.de> wrote:
Do you have any feeling for which of those endianness-independent hash
functions might be a reasonable choice for us?
Not a strong / very informed one, TBH.
I'm not convinced it's worth trying to achieve this in the first place,
now that we "nearly" have the insert-via-parent feature. Isn't this a
lot of work, for very little practical gain? Having to select that when
switching architectures still seems unproblematic. People will just
about never switch endianness, so even a tiny performance & effort
overhead doesn't seem worth it to me.
I kind of agree with you. There are some advantages of being
endian-independent, like maybe your hash partitioning is really across
multiple shards, and not all the shards are the same machine
architecture, but it's not going to come up for most people.
For me, the basic point here is that we need a set of hash functions
for hash partitioning that are different than what we use for hash
indexes and hash joins -- otherwise when we hash partition a table and
create hash indexes on each partition, those indexes will have nasty
clustering. Partitionwise hash joins will have similar problems. So,
a new set of hash functions specifically for hash partitioning is
quite desirable.
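The clustering effect is easy to demonstrate with a toy simulation: when one hash function drives both the partition choice (mod 16, say) and a per-partition hash index's bucket choice (mod 64, say), every hash within a given partition agrees in its low four bits, so only 1/16th of the index buckets are ever touched. The mixer below is an arbitrary stand-in (MurmurHash3's 32-bit finalizer), not hash_any():

#include <stdio.h>
#include <stdint.h>

static uint32_t
mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

int
main(void)
{
    int used[64] = {0};
    int nbuckets = 0;

    for (uint32_t key = 0; key < 100000; key++)
    {
        uint32_t h = mix32(key);

        if (h % 16 == 0)            /* keys routed to partition 0 ... */
            used[h % 64] = 1;       /* ... and the index bucket they hit */
    }

    for (int i = 0; i < 64; i++)
        nbuckets += used[i];

    /* Prints 4: only buckets 0, 16, 32, and 48 can ever be used. */
    printf("index buckets used in partition 0: %d of 64\n", nbuckets);
    return 0;
}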
Given that, if it's not a big problem to pick ones that have the
portability properties at least some people want, I'd be inclined to
do it. If it results in a noticeable slowdown on macrobenchmarks,
then not so much, but otherwise, I'd rather do what people are asking
for than spend time arguing about it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2017-08-03 17:43:44 -0400, Robert Haas wrote:
For me, the basic point here is that we need a set of hash functions
for hash partitioning that are different than what we use for hash
indexes and hash joins -- otherwise when we hash partition a table and
create hash indexes on each partition, those indexes will have nasty
clustering. Partitionwise hash joins will have similar problems. So,
a new set of hash functions specifically for hash partitioning is
quite desirable.
Couldn't that just as well be solved by being a bit smarter with an IV? I
doubt we want to end up with different hashfunctions for sharding,
partitioning, hashjoins (which seems to form a hierarchy). Having a
working hash-combine function, or even better a hash API that can
continue to use the hash's internal state, seems a more scalable
solution.
Greetings,
Andres Freund
On Thu, Aug 3, 2017 at 5:50 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-08-03 17:43:44 -0400, Robert Haas wrote:
For me, the basic point here is that we need a set of hash functions
for hash partitioning that are different than what we use for hash
indexes and hash joins -- otherwise when we hash partition a table and
create hash indexes on each partition, those indexes will have nasty
clustering. Partitionwise hash joins will have similar problems. So,
a new set of hash functions specifically for hash partitioning is
quite desirable.
Couldn't that just as well be solved by being a bit smarter with an IV? I
doubt we want to end up with different hashfunctions for sharding,
partitioning, hashjoins (which seems to form a hierarchy). Having a
working hash-combine function, or even better a hash API that can
continue to use the hash's internal state, seems a more scalable
solution.
That's another way to go, but it requires inventing a way to thread
the IV through the hash opclass interface. That's actually sort of a
problem anyway. Maybe I ought to have started with the question of
how we're going to make that end of things work. We could:
- Invent a new hash_partition AM that doesn't really make indexes but
supplies hash functions for hash partitioning.
- Add a new, optional support function 2 to the hash AM that takes a
value of the type *and* an IV as an argument.
- Something else.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-08-03 17:57:37 -0400, Robert Haas wrote:
On Thu, Aug 3, 2017 at 5:50 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-08-03 17:43:44 -0400, Robert Haas wrote:
For me, the basic point here is that we need a set of hash functions
for hash partitioning that are different than what we use for hash
indexes and hash joins -- otherwise when we hash partition a table and
create hash indexes on each partition, those indexes will have nasty
clustering. Partitionwise hash joins will have similar problems. So,
a new set of hash functions specifically for hash partitioning is
quite desirable.
Couldn't that just as well be solved by being a bit smarter with an IV? I
doubt we want to end up with different hashfunctions for sharding,
partitioning, hashjoins (which seems to form a hierarchy). Having a
working hash-combine function, or even better a hash API that can
continue to use the hash's internal state, seems a more scalable
solution.
That's another way to go, but it requires inventing a way to thread
the IV through the hash opclass interface.
Only if we really want to do it really well :P. Using a hash_combine()
like
/*
* Combine two hash values, resulting in another hash value, with decent bit
* mixing.
*
* Similar to boost's hash_combine().
*/
static inline uint32
hash_combine(uint32 a, uint32 b)
{
a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
return a;
}
between hash(IV) and the hashfunction should do the trick (the IV needs
to be hashed once, otherwise the bit mix is bad).
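Spelled out, that suggestion might look like the sketch below; mix32() is a toy stand-in for both hashing the IV and the type's regular hash support function:

#include <stdint.h>

/* Boost-style combiner, as above. */
static inline uint32_t
hash_combine(uint32_t a, uint32_t b)
{
    a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
    return a;
}

/* Toy stand-in for hash_uint32() / the type's hash function. */
static uint32_t
mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* One underlying hash, many uses: perturb with a per-use IV, hashing
 * the IV once first so the combined bits mix well. */
static uint32_t
hash_for_use(uint32_t value, uint32_t iv)
{
    return hash_combine(mix32(iv), mix32(value));
}

With, say, iv = 0 for hash indexes and iv = 1 for hash partitioning, the two consumers would see effectively independent hash functions.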
That's actually sort of a
problem anyway. Maybe I ought to have started with the question of
how we're going to make that end of things work.
+1 for that plan.
We could:
- Invent a new hash_partition AM that doesn't really make indexes but
supplies hash functions for hash partitioning.
- Add a new, optional support function 2 to the hash AM that takes a
value of the type *and* an IV as an argument.
- Something else.
Not arguing for it, but one option could also be to have pg_type.hash*
function(s).
One thing that I think might be advisable to think about is that we're
atm stuck with a relatively bad hash function for hash indexes (and hash
joins/aggs), and we should probably evolve it at some point. At the same
time there's currently people out there relying on the current hash
functions remaining stable.
Greetings,
Andres Freund
On Thu, Aug 3, 2017 at 6:08 PM, Andres Freund <andres@anarazel.de> wrote:
That's another way to go, but it requires inventing a way to thread
the IV through the hash opclass interface.
Only if we really want to do it really well :P. Using a hash_combine() like
/*
* Combine two hash values, resulting in another hash value, with decent bit
* mixing.
*
* Similar to boost's hash_combine().
*/
static inline uint32
hash_combine(uint32 a, uint32 b)
{
a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
return a;
}
between hash(IV) and the hashfunction should do the trick (the IV needs to be hashed once, otherwise the bit mix is bad).
That seems pretty lame, although it's sufficient to solve the
immediate problem, and I have to admit to a certain predilection for
things that solve the immediate problem without creating lots of
additional work.
We could:
- Invent a new hash_partition AM that doesn't really make indexes but
supplies hash functions for hash partitioning.
- Add a new, optional support function 2 to the hash AM that takes a
value of the type *and* an IV as an argument.
- Something else.
Not arguing for it, but one option could also be to have pg_type.hash* function(s).
True. That is a bit less configurable because you can't then have
multiple functions for the same type. Going through the opclass
interface means you can have hash_portable_ops and hash_fast_ops and
let people choose. But this would be easy to implement and enough for
most people in practice.
One thing that I think might be advisable to think about is that we're
atm stuck with a relatively bad hash function for hash indexes (and hash
joins/aggs), and we should probably evolve it at some point. At the same
time there's currently people out there relying on the current hash
functions remaining stable.
That to me is a bit of a separate problem.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 3, 2017 at 6:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
That seems pretty lame, although it's sufficient to solve the
immediate problem, and I have to admit to a certain predilection for
things that solve the immediate problem without creating lots of
additional work.
After some further thought, I propose the following approach to the
issues raised on this thread:
1. Allow hash functions to have a second, optional support function,
similar to what we did for btree opclasses in
c6e3ac11b60ac4a8942ab964252d51c1c0bd8845. The second function will
have a signature of (opclass_datatype, int64) and should return int64.
The int64 argument is a salt. When the salt is 0, the low 32 bits of
the return value should match what the existing hash support function
returns. Otherwise, the salt should be used to perturb the hash
calculation. This design kills two birds with one stone: it gives
callers a way to get 64-bit hash values if they want them (which
should make Tom happy, and we could later think about plugging it into
hash indexes) and it gives us a way of turning a single hash function
into many (which should allow us to prevent hash indexes or hash
tables built on a hash-partitioned table from having a heavily
lopsided distribution, and probably will also make people who are
interested in topics like Bloom filters happy). A sketch of what this
might look like appears below.
2. Introduce new hash opfamilies here which are faster, more
portable, and/or better in other ways than the ones we have today.
Given our current rather simplistic notion of a "default" opclass,
there doesn't seem to be an easy way to make whatever we introduce here
the default for hash partitioning while keeping the existing default
for other purposes. That should probably be fixed at some point.
However, given the amount of debate this topic has generated, it also
doesn't seem likely that we'd actually wish to decide on a different
default in the v11 release cycle, so I don't think there's any rush to
figure out exactly how we want to fix it. Focusing on introducing the
new opfamilies at all is probably a better use of time, IMHO.
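Picking up #1 from above, the per-opclass shape might look roughly like this for int4 -- a sketch only: hash_uint32_extended() is assumed here as a seeded, 64-bit analogue of hash_uint32(), and none of these names are committed API:

#include "postgres.h"
#include "fmgr.h"
#include "access/hash.h"

/*
 * Hypothetical second hash support function for int4: takes the value
 * plus an int64 salt and returns a 64-bit hash. With salt 0, the low
 * 32 bits must match what hashint4() returns today.
 */
Datum
hashint4extended(PG_FUNCTION_ARGS)
{
    uint32  value = (uint32) PG_GETARG_INT32(0);
    uint64  seed = (uint64) PG_GETARG_INT64(1);

    return hash_uint32_extended(value, seed);
}

The rest would be the catalog pushups needed to register this as an optional support procedure 2 of each hash opclass.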
Unless anybody strongly objects, I'm going to write a patch for #1 (or
convince somebody else to do it) and leave #2 for someone else to
tackle if they wish. In addition, I'll tackle (or convince someone
else to tackle) the project of adding that second optional support
function to every hash opclass in the core repository. Then Amul can
update the core hash partitioning patch to use the new infrastructure
when it's available and fall back to the existing method when it's
not, and I think we'll be in reasonably good shape.
Objections to this plan of attack?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
After some further thought, I propose the following approach to the
issues raised on this thread:
1. Allow hash functions to have a second, optional support function,
similar to what we did for btree opclasses in
c6e3ac11b60ac4a8942ab964252d51c1c0bd8845. The second function will
have a signature of (opclass_datatype, int64) and should return int64.
The int64 argument is a salt. When the salt is 0, the low 32 bits of
the return value should match what the existing hash support function
returns. Otherwise, the salt should be used to perturb the hash
calculation.
+1
2. Introduce new hash opfamilies here which are faster, more
portable, and/or better in other ways than the ones we have today.
This part seems, uh, under-defined and/or over-ambitious and/or unrelated
to the problem at hand. What are the concrete goals?
regards, tom lane
On Wed, Aug 16, 2017 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
After some further thought, I propose the following approach to the
issues raised on this thread:
1. Allow hash functions to have a second, optional support function,
similar to what we did for btree opclasses in
c6e3ac11b60ac4a8942ab964252d51c1c0bd8845. The second function will
have a signature of (opclass_datatype, int64) and should return int64.
The int64 argument is a salt. When the salt is 0, the low 32 bits of
the return value should match what the existing hash support function
returns. Otherwise, the salt should be used to perturb the hash
calculation.
+1
Cool.
2. Introduce new hash opfamilies here which are faster, more portable, and/or better in other ways than the ones we have today.
This part seems, uh, under-defined and/or over-ambitious and/or unrelated
to the problem at hand. What are the concrete goals?
In my view, it's for the person who proposed a new opclass to say what
goals they're trying to satisfy with that opclass. A variety of goals
that could justify a new opclass have been proposed upthread --
especially in Jeff Davis's original post (q.v.). Such goals could
include endian-ness independence, collation-independence, speed,
better bit-mixing, and opfamilies that span more types than our
current ones. These goals are probably mutually exclusive, in the
sense that endian-ness independence and collation-independence are
probably pulling in the exact opposite direction as speed, so
conceivably there could be multiple opclasses proposed with different
trade-offs. I take no position for the present on which of those
would be worth accepting into core.
I agree with you that all of this is basically unrelated to the
problem at hand, if "the problem at hand" means "hash partitioning".
In my mind, there are two really serious issues that have been raised
on that front. One is the problem of hash joins/aggregates/indexes on
hash-partitioned tables coming out lopsided, and adding an optional
salt will let us fix that problem. The other is that if you migrate
your data to a different encoding or endianness, you might have
dump/reload problems, but IMHO the already-committed patch for
--load-via-partition-root is as much as really *needs* to be done
there. I am less skeptical about the idea of endianness-independent
hash functions than he is, but "we can't have
$FEATURE_MANY_PEOPLE_WANT until we solve
$PROBLEM_ANDRES_THINKS_IS_NOT_PRACTICALLY_SOLVABLE" is not a route to
swift victory.
In short, I'm proposing to add a method to seed the existing hash
functions and get 64 bits out of them, and leaving any other potential
improvements to someone who wants to argue for them.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 16, 2017 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
After some further thought, I propose the following approach to the
issues raised on this thread:
1. Allow hash functions to have a second, optional support function,
similar to what we did for btree opclasses in
c6e3ac11b60ac4a8942ab964252d51c1c0bd8845. The second function will
have a signature of (opclass_datatype, int64) and should return int64.
The int64 argument is a salt. When the salt is 0, the low 32 bits of
the return value should match what the existing hash support function
returns. Otherwise, the salt should be used to perturb the hash
calculation.
+1
Attached is a quick sketch of how this could perhaps be done (ignoring
for the moment the relatively-boring opclass pushups). It introduces
a new function hash_any_extended which differs from hash_any() in that
(a) it combines both b and c into the result and (b) it accepts a seed
which is mixed into the initial state if it's non-zero.
Comments?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
hash-any-extended-v1.patch (application/octet-stream)
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index a127f3f8b1..b72df9b773 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -502,6 +502,226 @@ hash_any(register const unsigned char *k, register int keylen)
}
/*
+ * hash_any_extended() -- hash into a 64-bit value, using an optional seed
+ * k : the key (the unaligned variable-length array of bytes)
+ * len : the length of the key, counting by bytes
+ * seed : a 64-bit seed (0 means no seed)
+ *
+ * Returns a uint64 value. Otherwise similar to hash_any.
+ */
+Datum
+hash_any_extended(register const unsigned char *k, register int keylen,
+ uint64 seed)
+{
+ register uint32 a,
+ b,
+ c,
+ len;
+
+ /* Set up the internal state */
+ len = keylen;
+ a = b = c = 0x9e3779b9 + len + 3923095;
+
+ /* If the seed is non-zero, use it to perturb the internal state. */
+ if (seed != 0)
+ {
+ /*
+ * In essence, the seed is treated as part of the data being hashed,
+ * but for simplicity, we pretend that it's padded with four bytes of
+ * zeroes so that the seed constitutes a 4-byte chunk.
+ */
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ /* If the source pointer is word-aligned, we use word-wide fetches */
+ if (((uintptr_t) k & UINT32_ALIGN_MASK) == 0)
+ {
+ /* Code path for aligned source data */
+ register const uint32 *ka = (const uint32 *) k;
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+ a += ka[0];
+ b += ka[1];
+ c += ka[2];
+ mix(a, b, c);
+ ka += 3;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+ k = (const unsigned char *) ka;
+#ifdef WORDS_BIGENDIAN
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 8);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 16);
+ /* fall through */
+ case 5:
+ b += ((uint32) k[4] << 24);
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 8);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 16);
+ /* fall through */
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 16);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 8);
+ /* fall through */
+ case 5:
+ b += k[4];
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 16);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 8);
+ /* fall through */
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+ else
+ {
+ /* Code path for non-aligned source data */
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+#ifdef WORDS_BIGENDIAN
+ a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));
+ b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));
+ c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));
+#else /* !WORDS_BIGENDIAN */
+ a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
+ b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
+ c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
+#endif /* WORDS_BIGENDIAN */
+ mix(a, b, c);
+ k += 12;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+#ifdef WORDS_BIGENDIAN
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += k[7];
+ case 7:
+ b += ((uint32) k[6] << 8);
+ case 6:
+ b += ((uint32) k[5] << 16);
+ case 5:
+ b += ((uint32) k[4] << 24);
+ case 4:
+ a += k[3];
+ case 3:
+ a += ((uint32) k[2] << 8);
+ case 2:
+ a += ((uint32) k[1] << 16);
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += ((uint32) k[7] << 24);
+ case 7:
+ b += ((uint32) k[6] << 16);
+ case 6:
+ b += ((uint32) k[5] << 8);
+ case 5:
+ b += k[4];
+ case 4:
+ a += ((uint32) k[3] << 24);
+ case 3:
+ a += ((uint32) k[2] << 16);
+ case 2:
+ a += ((uint32) k[1] << 8);
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+
+ final(a, b, c);
+
+ /* report the result */
+ return UInt64GetDatum(((uint64) b << 32) | c);
+}
+
+/*
* hash_uint32() -- hash a 32-bit value
*
* This has the same result as
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 72fce3038c..383d7d67d3 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -322,6 +322,8 @@ extern bytea *hashoptions(Datum reloptions, bool validate);
extern bool hashvalidate(Oid opclassoid);
extern Datum hash_any(register const unsigned char *k, register int keylen);
+extern Datum hash_any_extended(register const unsigned char *k, register int
+ keylen, uint64 seed);
extern Datum hash_uint32(uint32 k);
/* private routines */
Robert Haas <robertmhaas@gmail.com> writes:
Attached is a quick sketch of how this could perhaps be done (ignoring
for the moment the relatively-boring opclass pushups). It introduces
a new function hash_any_extended which differs from hash_any() in that
(a) it combines both b and c into the result and (b) it accepts a seed
which is mixed into the initial state if it's non-zero.
Comments?
Hm. Despite the comment at lines 302-304, I'm not sure that we ought
to do this simply by using "b" as the high order bits. AFAICS that
exposes little or no additional randomness; in particular it seems
unlikely to meet Jenkins' original design goal that "every 1-bit and
2-bit delta achieves avalanche". There might be some simple way to
extend the existing code to produce a mostly-independent set of 32 more
bits, but I wonder if we wouldn't be better advised to just keep Jenkins'
code as-is and use some other method entirely for producing the
32 new result bits.
... In fact, on perusing the linked-to page
http://burtleburtle.net/bob/hash/doobs.html
Bob says specifically that taking b and c from this hash does not
produce a fully random 64-bit result. He has a new one that does,
lookup3.c, but probably he hasn't tried to make that bit-compatible
with the 1997 algorithm.
Your injection of the seed as prepended data seems unassailable from the
randomness standpoint. But I wonder whether we could do it more cheaply
by xoring the seed into the initial a/b/c values --- it's not very clear
whether those are magic in any interesting way, or just some randomly
chosen constants.
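(To make the comparison concrete, a hedged sketch of the two seeding strategies, written against the setup in hashfunc.c; mix() is the Jenkins mixing macro there, and none of this is in the patch as-is:)
/*
 * Illustrative sketch only: two ways to fold a 64-bit seed into the
 * initial Jenkins state; assumes hashfunc.c's mix() macro and setup.
 */
static void
seed_state(uint32 *a, uint32 *b, uint32 *c, uint32 len, uint64 seed,
		   bool seed_as_data)
{
	*a = *b = *c = 0x9e3779b9 + len + 3923095;

	if (seed_as_data)
	{
		/* the patch's approach: the seed acts as a prepended, zero-padded
		 * 12-byte chunk, at the cost of one extra mixing round */
		*a += (uint32) (seed >> 32);
		*b += (uint32) seed;
		mix(*a, *b, *c);
	}
	else
	{
		/* cheaper alternative: perturb the initial constants directly,
		 * relying on them not being magic in any interesting way */
		*a ^= (uint32) (seed >> 32);
		*b ^= (uint32) seed;
	}
}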
Anyway, I'd certainly suggest that whoever embarks on this for real
spend some time perusing Jenkins' website.
regards, tom lane
On Wed, Aug 16, 2017 at 05:58:41PM -0400, Tom Lane wrote:
... In fact, on perusing the linked-to page
http://burtleburtle.net/bob/hash/doobs.html
Bob says specifically that taking b and c from this hash does not
produce a fully random 64-bit result. He has a new one that does,
lookup3.c, but probably he hasn't tried to make that bit-compatible
with the 1997 algorithm.
Hi,
The updated hash functions that we currently use are based on Bob Jenkins'
lookup3.c, and using b as the higher-order bits is pretty darn good. I had
lobbied to present the 64-bit b+c hash in the original work for similar
reasons. We are definitely not using a lookup2.c version from 1997.
Regards,
Ken
Kenneth Marshall <ktm@rice.edu> writes:
The updated hash functions that we currently use are based on Bob Jenkins'
lookup3.c, and using b as the higher-order bits is pretty darn good. I had
lobbied to present the 64-bit b+c hash in the original work for similar
reasons. We are definitely not using a lookup2.c version from 1997.
Oh --- I overlooked the bit about "Bob's 2006 update". Really that
comment block should have been completely rewritten, rather than leaving
the original text there, especially since as it stands there are only
pointers to the old algorithm.
regards, tom lane
On Wed, Aug 16, 2017 at 5:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Attached is a quick sketch of how this could perhaps be done (ignoring
for the moment the relatively-boring opclass pushups).
Here it is with some relatively-boring opclass pushups added. I just
did the int4 bit; the same thing will need to be done, uh, 35 more
times. But you get the gist. No, not that kind of gist.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
add-optional-second-hash-proc-v1.patch
From 3228c640097de8b6817057b750a41d38265a219d Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 17 Aug 2017 11:16:51 -0400
Subject: [PATCH] Add optional second hash proc.
---
doc/src/sgml/xindex.sgml | 11 +-
src/backend/access/hash/hashfunc.c | 257 +++++++++++++++++++++++++++-
src/backend/access/hash/hashpage.c | 2 +-
src/backend/access/hash/hashutil.c | 6 +-
src/backend/access/hash/hashvalidate.c | 33 +++-
src/backend/commands/opclasscmds.c | 34 +++-
src/backend/utils/cache/lsyscache.c | 8 +-
src/backend/utils/cache/typcache.c | 2 +-
src/include/access/hash.h | 17 +-
src/include/catalog/pg_amproc.h | 1 +
src/include/catalog/pg_proc.h | 2 +
src/test/regress/expected/alter_generic.out | 4 +-
12 files changed, 344 insertions(+), 33 deletions(-)
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 333a36c456..0f3c46b11f 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -436,7 +436,8 @@
</table>
<para>
- Hash indexes require one support function, shown in <xref
+ Hash indexes require one support function, and allow a second one to be
+ supplied at the operator class author's option, as shown in <xref
linkend="xindex-hash-support-table">.
</para>
@@ -451,9 +452,15 @@
</thead>
<tbody>
<row>
- <entry>Compute the hash value for a key</entry>
+ <entry>Compute the 32-bit hash value for a key</entry>
<entry>1</entry>
</row>
+ <row>
+ <entry>
+ Compute the 64-bit hash value for a key given a 64-bit salt
+ </entry>
+ <entry>2</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index a127f3f8b1..511d079af7 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -59,6 +59,12 @@ hashint4(PG_FUNCTION_ARGS)
}
Datum
+hashint4extended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended(PG_GETARG_INT32(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint8(PG_FUNCTION_ARGS)
{
/*
@@ -502,7 +508,227 @@ hash_any(register const unsigned char *k, register int keylen)
}
/*
- * hash_uint32() -- hash a 32-bit value
+ * hash_any_extended() -- hash into a 64-bit value, using an optional seed
+ * k : the key (the unaligned variable-length array of bytes)
+ * len : the length of the key, counting by bytes
+ * seed : a 64-bit seed (0 means no seed)
+ *
+ * Returns a uint64 value. Otherwise similar to hash_any.
+ */
+Datum
+hash_any_extended(register const unsigned char *k, register int keylen,
+ uint64 seed)
+{
+ register uint32 a,
+ b,
+ c,
+ len;
+
+ /* Set up the internal state */
+ len = keylen;
+ a = b = c = 0x9e3779b9 + len + 3923095;
+
+ /* If the seed is non-zero, use it to perturb the internal state. */
+ if (seed != 0)
+ {
+ /*
+ * In essence, the seed is treated as part of the data being hashed,
+ * but for simplicity, we pretend that it's padded with four bytes of
+ * zeroes so that the seed constitutes a 4-byte chunk.
+ */
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ /* If the source pointer is word-aligned, we use word-wide fetches */
+ if (((uintptr_t) k & UINT32_ALIGN_MASK) == 0)
+ {
+ /* Code path for aligned source data */
+ register const uint32 *ka = (const uint32 *) k;
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+ a += ka[0];
+ b += ka[1];
+ c += ka[2];
+ mix(a, b, c);
+ ka += 3;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+ k = (const unsigned char *) ka;
+#ifdef WORDS_BIGENDIAN
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 8);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 16);
+ /* fall through */
+ case 5:
+ b += ((uint32) k[4] << 24);
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 8);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 16);
+ /* fall through */
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 16);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 8);
+ /* fall through */
+ case 5:
+ b += k[4];
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 16);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 8);
+ /* fall through */
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+ else
+ {
+ /* Code path for non-aligned source data */
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+#ifdef WORDS_BIGENDIAN
+ a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));
+ b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));
+ c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));
+#else /* !WORDS_BIGENDIAN */
+ a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
+ b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
+ c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
+#endif /* WORDS_BIGENDIAN */
+ mix(a, b, c);
+ k += 12;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+#ifdef WORDS_BIGENDIAN
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += k[7];
+ case 7:
+ b += ((uint32) k[6] << 8);
+ case 6:
+ b += ((uint32) k[5] << 16);
+ case 5:
+ b += ((uint32) k[4] << 24);
+ case 4:
+ a += k[3];
+ case 3:
+ a += ((uint32) k[2] << 8);
+ case 2:
+ a += ((uint32) k[1] << 16);
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += ((uint32) k[7] << 24);
+ case 7:
+ b += ((uint32) k[6] << 16);
+ case 6:
+ b += ((uint32) k[5] << 8);
+ case 5:
+ b += k[4];
+ case 4:
+ a += ((uint32) k[3] << 24);
+ case 3:
+ a += ((uint32) k[2] << 16);
+ case 2:
+ a += ((uint32) k[1] << 8);
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+
+ final(a, b, c);
+
+ /* report the result */
+ return UInt64GetDatum(((uint64) b << 32) | c);
+}
+
+/*
+ * hash_uint32() -- hash a 32-bit value to a 32-bit value
*
* This has the same result as
* hash_any(&k, sizeof(uint32))
@@ -523,3 +749,32 @@ hash_uint32(uint32 k)
/* report the result */
return UInt32GetDatum(c);
}
+
+/*
+ * hash_uint32_extended() -- hash a 32-bit value to a 64-bit value, with a seed
+ *
+ * Like hash_uint32, this is a convenience function.
+ */
+Datum
+hash_uint32_extended(uint32 k, uint64 seed)
+{
+ register uint32 a,
+ b,
+ c;
+
+ a = b = c = 0x9e3779b9 + (uint32) sizeof(uint32) + 3923095;
+
+ if (seed != 0)
+ {
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ a += k;
+
+ final(a, b, c);
+
+ /* report the result */
+ return UInt64GetDatum(((uint64) b << 32) | c);
+}
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 7b2906b0ca..05798419fc 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -373,7 +373,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
if (ffactor < 10)
ffactor = 10;
- procid = index_getprocid(rel, 1, HASHPROC);
+ procid = index_getprocid(rel, 1, HASHSTANDARD_PROC);
/*
* We initialize the metapage, the first N bucket pages, and the first
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 9b803af7c2..869cbc1081 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -85,7 +85,7 @@ _hash_datum2hashkey(Relation rel, Datum key)
Oid collation;
/* XXX assumes index has only one attribute */
- procinfo = index_getprocinfo(rel, 1, HASHPROC);
+ procinfo = index_getprocinfo(rel, 1, HASHSTANDARD_PROC);
collation = rel->rd_indcollation[0];
return DatumGetUInt32(FunctionCall1Coll(procinfo, collation, key));
@@ -108,10 +108,10 @@ _hash_datum2hashkey_type(Relation rel, Datum key, Oid keytype)
hash_proc = get_opfamily_proc(rel->rd_opfamily[0],
keytype,
keytype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!RegProcedureIsValid(hash_proc))
elog(ERROR, "missing support function %d(%u,%u) for index \"%s\"",
- HASHPROC, keytype, keytype,
+ HASHSTANDARD_PROC, keytype, keytype,
RelationGetRelationName(rel));
collation = rel->rd_indcollation[0];
diff --git a/src/backend/access/hash/hashvalidate.c b/src/backend/access/hash/hashvalidate.c
index 30b29cb100..f952bb9d0b 100644
--- a/src/backend/access/hash/hashvalidate.c
+++ b/src/backend/access/hash/hashvalidate.c
@@ -29,7 +29,7 @@
#include "utils/syscache.h"
-static bool check_hash_func_signature(Oid funcid, Oid restype, Oid argtype);
+static bool check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype);
/*
@@ -105,8 +105,9 @@ hashvalidate(Oid opclassoid)
/* Check procedure numbers and function signatures */
switch (procform->amprocnum)
{
- case HASHPROC:
- if (!check_hash_func_signature(procform->amproc, INT4OID,
+ case HASHSTANDARD_PROC:
+ case HASHEXTENDED_PROC:
+ if (!check_hash_func_signature(procform->amproc, procform->amprocnum,
procform->amproclefttype))
{
ereport(INFO,
@@ -264,19 +265,37 @@ hashvalidate(Oid opclassoid)
* hacks in the core hash opclass definitions.
*/
static bool
-check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
+check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype)
{
bool result = true;
+ Oid restype;
+ int16 nargs;
HeapTuple tp;
Form_pg_proc procform;
+ switch (amprocnum)
+ {
+ case HASHSTANDARD_PROC:
+ restype = INT4OID;
+ nargs = 1;
+ break;
+
+ case HASHEXTENDED_PROC:
+ restype = INT8OID;
+ nargs = 2;
+ break;
+
+ default:
+ elog(ERROR, "invalid amprocnum");
+ }
+
tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
if (!HeapTupleIsValid(tp))
elog(ERROR, "cache lookup failed for function %u", funcid);
procform = (Form_pg_proc) GETSTRUCT(tp);
if (procform->prorettype != restype || procform->proretset ||
- procform->pronargs != 1)
+ procform->pronargs != nargs)
result = false;
if (!IsBinaryCoercible(argtype, procform->proargtypes.values[0]))
@@ -308,6 +327,10 @@ check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
result = false;
}
+ /* If function takes a second argument, it must be for a 64-bit salt. */
+ if (nargs == 2 && procform->proargtypes.values[1] != INT8OID)
+ result = false;
+
ReleaseSysCache(tp);
return result;
}
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index a31b1acb9c..4a8aaf38ad 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -18,6 +18,7 @@
#include <limits.h>
#include "access/genam.h"
+#include "access/hash.h"
#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/htup_details.h"
@@ -1129,7 +1130,8 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
/*
* btree comparison procs must be 2-arg procs returning int4, while btree
* sortsupport procs must take internal and return void. hash support
- * procs must be 1-arg procs returning int4. Otherwise we don't know.
+ * proc 1 must be a 1-arg proc returning int4, while proc 2 must be a 2-arg
+ * proc returning int8. Otherwise we don't know.
*/
if (amoid == BTREE_AM_OID)
{
@@ -1172,14 +1174,28 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
}
else if (amoid == HASH_AM_OID)
{
- if (procform->pronargs != 1)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must have one argument")));
- if (procform->prorettype != INT4OID)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must return integer")));
+ if (member->number == HASHSTANDARD_PROC)
+ {
+ if (procform->pronargs != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must have one argument")));
+ if (procform->prorettype != INT4OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must return integer")));
+ }
+ else if (member->number == HASHEXTENDED_PROC)
+ {
+ if (procform->pronargs != 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must have two arguments")));
+ if (procform->prorettype != INT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must return bigint")));
+ }
/*
* If lefttype/righttype isn't specified, use the proc's input type
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 82763f8013..b7a14dc87e 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -490,8 +490,8 @@ get_compatible_hash_operators(Oid opno,
/*
* get_op_hash_functions
- * Get the OID(s) of hash support function(s) compatible with the given
- * operator, operating on its LHS and/or RHS datatype as required.
+ * Get the OID(s) of the standard hash support function(s) compatible with
+ * the given operator, operating on its LHS and/or RHS datatype as required.
*
* A function for the LHS type is sought and returned into *lhs_procno if
* lhs_procno isn't NULL. Similarly, a function for the RHS type is sought
@@ -542,7 +542,7 @@ get_op_hash_functions(Oid opno,
*lhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoplefttype,
aform->amoplefttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*lhs_procno))
continue;
/* Matching LHS found, done if caller doesn't want RHS */
@@ -564,7 +564,7 @@ get_op_hash_functions(Oid opno,
*rhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoprighttype,
aform->amoprighttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*rhs_procno))
{
/* Forget any LHS function from this opfamily */
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 7ec31eb3e3..96139ed204 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -474,7 +474,7 @@ lookup_type_cache(Oid type_id, int flags)
hash_proc = get_opfamily_proc(typentry->hash_opf,
typentry->hash_opintype,
typentry->hash_opintype,
- HASHPROC);
+ HASHSTANDARD_PROC);
/*
* As above, make sure hash_array will succeed. We don't currently
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 72fce3038c..13505bc580 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -289,12 +289,16 @@ typedef HashMetaPageData *HashMetaPage;
#define HTMaxStrategyNumber 1
/*
- * When a new operator class is declared, we require that the user supply
- * us with an amproc procudure for hashing a key of the new type.
- * Since we only have one such proc in amproc, it's number 1.
+ * When a new operator class is declared, we require that the user supply
+ * us with an amproc procedure for hashing a key of the new type, returning
+ * a 32-bit hash value. We call this the "standard" hash procedure. We
+ * also allow an optional "extended" hash procedure which accepts a salt and
+ * returns a 64-bit hash value. This is highly recommended but, for reasons
+ * of backward compatibility, optional.
*/
-#define HASHPROC 1
-#define HASHNProcs 1
+#define HASHSTANDARD_PROC 1
+#define HASHEXTENDED_PROC 2
+#define HASHNProcs 2
/* public routines */
@@ -322,7 +326,10 @@ extern bytea *hashoptions(Datum reloptions, bool validate);
extern bool hashvalidate(Oid opclassoid);
extern Datum hash_any(register const unsigned char *k, register int keylen);
+extern Datum hash_any_extended(register const unsigned char *k, register int
+ keylen, uint64 seed);
extern Datum hash_uint32(uint32 k);
+extern Datum hash_uint32_extended(uint32 k, uint64 seed);
/* private routines */
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 7d245b1271..79efc2f5dc 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -161,6 +161,7 @@ DATA(insert ( 1971 701 701 1 452 ));
DATA(insert ( 1975 869 869 1 422 ));
DATA(insert ( 1977 21 21 1 449 ));
DATA(insert ( 1977 23 23 1 450 ));
+DATA(insert ( 1977 23 23 2 425 ));
DATA(insert ( 1977 20 20 1 949 ));
DATA(insert ( 1983 1186 1186 1 1697 ));
DATA(insert ( 1985 829 829 1 399 ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8b33b4e0ea..f2e9f7a553 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -670,6 +670,8 @@ DATA(insert OID = 449 ( hashint2 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0
DESCR("hash");
DATA(insert OID = 450 ( hashint4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "23" _null_ _null_ _null_ _null_ _null_ hashint4 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 425 ( hashint4extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "23 20" _null_ _null_ _null_ _null_ _null_ hashint4extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 949 ( hashint8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "20" _null_ _null_ _null_ _null_ _null_ hashint8 _null_ _null_ _null_ ));
DESCR("hash");
DATA(insert OID = 451 ( hashfloat4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "700" _null_ _null_ _null_ _null_ _null_ hashfloat4 _null_ _null_ _null_ ));
diff --git a/src/test/regress/expected/alter_generic.out b/src/test/regress/expected/alter_generic.out
index 9f6ad4de33..767c09bec5 100644
--- a/src/test/regress/expected/alter_generic.out
+++ b/src/test/regress/expected/alter_generic.out
@@ -421,7 +421,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf13 USING hash;
CREATE FUNCTION fn_opf13 (int4) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf13 USING hash ADD FUNCTION 1 fn_opf13(int4);
-ERROR: hash procedures must return integer
+ERROR: hash procedure 1 must return integer
DROP OPERATOR FAMILY alt_opf13 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
@@ -439,7 +439,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf15 USING hash;
CREATE FUNCTION fn_opf15 (int4, int2) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf15 USING hash ADD FUNCTION 1 fn_opf15(int4, int2);
-ERROR: hash procedures must have one argument
+ERROR: hash procedure 1 must have one argument
DROP OPERATOR FAMILY alt_opf15 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
--
2.11.0 (Apple Git-81)
On Fri, Aug 18, 2017 at 8:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 16, 2017 at 5:34 PM, Robert Haas <robertmhaas@gmail.com>
wrote:
Attached is a quick sketch of how this could perhaps be done (ignoring
for the moment the relatively-boring opclass pushups).
Here it is with some relatively-boring opclass pushups added. I just
did the int4 bit; the same thing will need to be done, uh, 35 more
times. But you get the gist. No, not that kind of gist.
I will work on this.
I have a small query: what if I want a cache entry with the extended hash
function instead of the standard one? I might need that while adding the
hash_array_extended function. Do you think we need to extend
lookup_type_cache() as well?
Regards,
Amul
On Fri, Aug 18, 2017 at 1:12 PM, amul sul <sulamul@gmail.com> wrote:
I have a small query: what if I want a cache entry with the extended hash
function instead of the standard one? I might need that while adding the
hash_array_extended function. Do you think we need to extend
lookup_type_cache() as well?
Hmm, I thought you had changed the hash partitioning stuff so that it
didn't rely on lookup_type_cache(). You have to look up the function
using the opclass provided in the partition key definition;
lookup_type_cache() will give you the default one for the datatype.
Maybe just use get_opfamily_proc?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
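(A hedged sketch of what that lookup could look like in the partitioning code; the wrapper function and its parameter names are hypothetical -- only get_opfamily_proc, fmgr_info, and the patch's HASHEXTENDED_PROC are real:)
/*
 * Illustrative sketch: resolve the extended hash support function through
 * the partition key's opfamily instead of lookup_type_cache().
 * partopfamily/parttypid are hypothetical inputs.
 */
#include "postgres.h"
#include "access/hash.h"
#include "fmgr.h"
#include "utils/lsyscache.h"

static void
lookup_extended_hashproc(Oid partopfamily, Oid parttypid, FmgrInfo *finfo)
{
	Oid			funcid;

	funcid = get_opfamily_proc(partopfamily, parttypid, parttypid,
							   HASHEXTENDED_PROC);
	if (!OidIsValid(funcid))
		elog(ERROR, "missing extended hash function for type %u", parttypid);

	fmgr_info(funcid, finfo);	/* cache the result for repeated calls */
}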
On Fri, Aug 18, 2017 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Maybe just use get_opfamily_proc?
Yes, we can do that for the partitioning code, but my concern is a little
bit different; I apologize for not being clear enough.
I am trying to extend the hash_array & hash_range functions. hash_array and
hash_range compute their hashes by using the respective hash function for
the given argument type (i.e. the array/range element type), and those hash
functions are made available in the TypeCacheEntry via lookup_type_cache().
But the extended versions of hash_array & hash_range need the respective
extended hash function for those element types.
I have added hash_array_extended & hash_range_extended functions in the
attached patch 0001, which maintains a local copy of the TypeCacheEntry
with the extended hash functions loaded into it. I am a little bit
skeptical about this logic, though; any advice/suggestions would be greatly
appreciated.
The logic in the rest of the extended hash functions is the same as in the
standard ones.
Attaching patch 0002 for the reviewers' testing.
Regards,
Amul
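(One possible narrowing of that approach, offered only as an untested sketch: cache just the fields the element loop needs, plus the extended hash function's FmgrInfo, in fn_extra, rather than a copy of the whole TypeCacheEntry. ArrayExtHashInfo and the helper name are hypothetical; the lookups mirror the v2 patch.)
/*
 * Hypothetical alternative for hash_array_extended: keep a small private
 * struct in fn_extra instead of copying the entire TypeCacheEntry.
 */
#include "postgres.h"
#include "access/hash.h"
#include "fmgr.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"

typedef struct ArrayExtHashInfo
{
	Oid			element_type;	/* element type this entry was built for */
	int16		typlen;
	bool		typbyval;
	char		typalign;
	FmgrInfo	hashproc;		/* fmgr info for HASHEXTENDED_PROC */
} ArrayExtHashInfo;

static ArrayExtHashInfo *
get_array_ext_hash_info(FunctionCallInfo fcinfo, Oid element_type)
{
	ArrayExtHashInfo *info = (ArrayExtHashInfo *) fcinfo->flinfo->fn_extra;

	/* rebuild only when called for a different element type */
	if (info == NULL || info->element_type != element_type)
	{
		TypeCacheEntry *cache = lookup_type_cache(element_type,
												  TYPECACHE_HASH_OPFAMILY);
		Oid			funcid = get_opfamily_proc(cache->hash_opf,
											   cache->hash_opintype,
											   cache->hash_opintype,
											   HASHEXTENDED_PROC);

		if (!OidIsValid(funcid))
			elog(ERROR, "could not identify an extended hash function for type %u",
				 element_type);

		info = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, sizeof(*info));
		info->element_type = element_type;
		info->typlen = cache->typlen;
		info->typbyval = cache->typbyval;
		info->typalign = cache->typalign;
		fmgr_info_cxt(funcid, &info->hashproc, fcinfo->flinfo->fn_mcxt);
		fcinfo->flinfo->fn_extra = (void *) info;
	}
	return info;
}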
Attachments:
0001-add-optional-second-hash-proc-v2-wip.patch
From e50ccdacccc4987eed84cd85b459628eb3eedcaa Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Fri, 18 Aug 2017 15:28:26 +0530
Subject: [PATCH 1/2] add-optional-second-hash-proc-v2-wip
v2:
Extended the remaining hash functions.
TODO:
1. hash_array_extended & hash_range_extended code logic TBC.
v1:
Patch by Robert Haas.
---
doc/src/sgml/xindex.sgml | 11 +-
src/backend/access/hash/hashfunc.c | 401 +++++++++++++++++++++++++++-
src/backend/access/hash/hashpage.c | 2 +-
src/backend/access/hash/hashutil.c | 6 +-
src/backend/access/hash/hashvalidate.c | 42 ++-
src/backend/commands/opclasscmds.c | 34 ++-
src/backend/utils/adt/acl.c | 13 +
src/backend/utils/adt/arrayfuncs.c | 114 ++++++++
src/backend/utils/adt/date.c | 24 ++
src/backend/utils/adt/jsonb_op.c | 42 +++
src/backend/utils/adt/jsonb_util.c | 42 +++
src/backend/utils/adt/mac.c | 8 +
src/backend/utils/adt/mac8.c | 9 +
src/backend/utils/adt/network.c | 11 +
src/backend/utils/adt/numeric.c | 78 ++++++
src/backend/utils/adt/pg_lsn.c | 7 +
src/backend/utils/adt/rangetypes.c | 97 +++++++
src/backend/utils/adt/timestamp.c | 24 ++
src/backend/utils/adt/uuid.c | 8 +
src/backend/utils/adt/varchar.c | 19 ++
src/backend/utils/cache/lsyscache.c | 8 +-
src/backend/utils/cache/typcache.c | 2 +-
src/include/access/hash.h | 17 +-
src/include/catalog/pg_amproc.h | 36 +++
src/include/catalog/pg_proc.h | 54 ++++
src/include/utils/jsonb.h | 2 +
src/test/regress/expected/alter_generic.out | 4 +-
27 files changed, 1078 insertions(+), 37 deletions(-)
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 333a36c..0f3c46b 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -436,7 +436,8 @@
</table>
<para>
- Hash indexes require one support function, shown in <xref
+ Hash indexes require one support function, and allow a second one to be
+ supplied at the operator class author's option, as shown in <xref
linkend="xindex-hash-support-table">.
</para>
@@ -451,9 +452,15 @@
</thead>
<tbody>
<row>
- <entry>Compute the hash value for a key</entry>
+ <entry>Compute the 32-bit hash value for a key</entry>
<entry>1</entry>
</row>
+ <row>
+ <entry>
+ Compute the 64-bit hash value for a key given a 64-bit salt
+ </entry>
+ <entry>2</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index a127f3f..89db846 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -47,18 +47,36 @@ hashchar(PG_FUNCTION_ARGS)
}
Datum
+hashcharextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((int32) PG_GETARG_CHAR(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint2(PG_FUNCTION_ARGS)
{
return hash_uint32((int32) PG_GETARG_INT16(0));
}
Datum
+hashint2extended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((int32) PG_GETARG_INT16(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint4(PG_FUNCTION_ARGS)
{
return hash_uint32(PG_GETARG_INT32(0));
}
Datum
+hashint4extended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended(PG_GETARG_INT32(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint8(PG_FUNCTION_ARGS)
{
/*
@@ -79,18 +97,50 @@ hashint8(PG_FUNCTION_ARGS)
}
Datum
+hashint8extended(PG_FUNCTION_ARGS)
+{
+ /*
+ * The idea here is to produce a hash value compatible with the values
+ * produced by hashint4 and hashint2 for logically equal inputs; this is
+ * necessary to support cross-type hash joins across these input types.
+ * Since all three types are signed, we can xor the high half of the int8
+ * value if the sign is positive, or the complement of the high half when
+ * the sign is negative.
+ */
+ int64 val = PG_GETARG_INT64(0);
+ uint32 lohalf = (uint32) val;
+ uint32 hihalf = (uint32) (val >> 32);
+
+ lohalf ^= (val >= 0) ? hihalf : ~hihalf;
+
+ return hash_uint32_extended(lohalf, PG_GETARG_INT64(1));
+}
+
+Datum
hashoid(PG_FUNCTION_ARGS)
{
return hash_uint32((uint32) PG_GETARG_OID(0));
}
Datum
+hashoidextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((uint32) PG_GETARG_OID(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashenum(PG_FUNCTION_ARGS)
{
return hash_uint32((uint32) PG_GETARG_OID(0));
}
Datum
+hashenumextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((uint32) PG_GETARG_OID(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashfloat4(PG_FUNCTION_ARGS)
{
float4 key = PG_GETARG_FLOAT4(0);
@@ -117,6 +167,33 @@ hashfloat4(PG_FUNCTION_ARGS)
}
Datum
+hashfloat4extended(PG_FUNCTION_ARGS)
+{
+ float4 key = PG_GETARG_FLOAT4(0);
+ float8 key8;
+
+ /*
+ * On IEEE-float machines, minus zero and zero have different bit patterns
+ * but should compare as equal. We must ensure that they have the same
+ * hash value, which is most reliably done this way:
+ */
+ if (key == (float4) 0)
+ return UInt64GetDatum(0);
+
+ /*
+ * To support cross-type hashing of float8 and float4, we want to return
+ * the same hash value hashfloat8 would produce for an equal float8 value.
+ * So, widen the value to float8 and hash that. (We must do this rather
+ * than have hashfloat8 try to narrow its value to float4; that could fail
+ * on overflow.)
+ */
+ key8 = key;
+
+ return hash_any_extended((unsigned char *) &key8, sizeof(key8),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashfloat8(PG_FUNCTION_ARGS)
{
float8 key = PG_GETARG_FLOAT8(0);
@@ -133,6 +210,23 @@ hashfloat8(PG_FUNCTION_ARGS)
}
Datum
+hashfloat8extended(PG_FUNCTION_ARGS)
+{
+ float8 key = PG_GETARG_FLOAT8(0);
+
+ /*
+ * On IEEE-float machines, minus zero and zero have different bit patterns
+ * but should compare as equal. We must ensure that they have the same
+ * hash value, which is most reliably done this way:
+ */
+ if (key == (float8) 0)
+ return UInt64GetDatum(0);
+
+ return hash_any_extended((unsigned char *) &key, sizeof(key),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashoidvector(PG_FUNCTION_ARGS)
{
oidvector *key = (oidvector *) PG_GETARG_POINTER(0);
@@ -141,6 +235,16 @@ hashoidvector(PG_FUNCTION_ARGS)
}
Datum
+hashoidvectorextended(PG_FUNCTION_ARGS)
+{
+ oidvector *key = (oidvector *) PG_GETARG_POINTER(0);
+
+ return hash_any_extended((unsigned char *) key->values,
+ key->dim1 * sizeof(Oid),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashname(PG_FUNCTION_ARGS)
{
char *key = NameStr(*PG_GETARG_NAME(0));
@@ -149,6 +253,15 @@ hashname(PG_FUNCTION_ARGS)
}
Datum
+hashnameextended(PG_FUNCTION_ARGS)
+{
+ char *key = NameStr(*PG_GETARG_NAME(0));
+
+ return hash_any_extended((unsigned char *) key, strlen(key),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashtext(PG_FUNCTION_ARGS)
{
text *key = PG_GETARG_TEXT_PP(0);
@@ -168,6 +281,27 @@ hashtext(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashtextextended(PG_FUNCTION_ARGS)
+{
+ text *key = PG_GETARG_TEXT_PP(0);
+ Datum result;
+
+ /*
+ * Note: this is currently identical in behavior to hashvarlena, but keep
+ * it as a separate function in case we someday want to do something
+ * different in non-C locales. (See also hashbpchar, if so.)
+ */
+ result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
+ VARSIZE_ANY_EXHDR(key),
+ PG_GETARG_INT64(1));
+
+ /* Avoid leaking memory for toasted inputs */
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
+
/*
* hashvarlena() can be used for any varlena datatype in which there are
* no non-significant bits, ie, distinct bitpatterns never compare as equal.
@@ -187,6 +321,22 @@ hashvarlena(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashvarlenaextended(PG_FUNCTION_ARGS)
+{
+ struct varlena *key = PG_GETARG_VARLENA_PP(0);
+ Datum result;
+
+ result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
+ VARSIZE_ANY_EXHDR(key),
+ PG_GETARG_INT64(1));
+
+ /* Avoid leaking memory for toasted inputs */
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
+
/*
* This hash function was written by Bob Jenkins
* (bob_jenkins@burtleburtle.net), and superficially adapted
@@ -502,7 +652,227 @@ hash_any(register const unsigned char *k, register int keylen)
}
/*
- * hash_uint32() -- hash a 32-bit value
+ * hash_any_extended() -- hash into a 64-bit value, using an optional seed
+ * k : the key (the unaligned variable-length array of bytes)
+ * len : the length of the key, counting by bytes
+ * seed : a 64-bit seed (0 means no seed)
+ *
+ * Returns a uint64 value. Otherwise similar to hash_any.
+ */
+Datum
+hash_any_extended(register const unsigned char *k, register int keylen,
+ uint64 seed)
+{
+ register uint32 a,
+ b,
+ c,
+ len;
+
+ /* Set up the internal state */
+ len = keylen;
+ a = b = c = 0x9e3779b9 + len + 3923095;
+
+ /* If the seed is non-zero, use it to perturb the internal state. */
+ if (seed != 0)
+ {
+ /*
+ * In essence, the seed is treated as part of the data being hashed,
+ * but for simplicity, we pretend that it's padded with four bytes of
+ * zeroes so that the seed constitutes a 4-byte chunk.
+ */
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ /* If the source pointer is word-aligned, we use word-wide fetches */
+ if (((uintptr_t) k & UINT32_ALIGN_MASK) == 0)
+ {
+ /* Code path for aligned source data */
+ register const uint32 *ka = (const uint32 *) k;
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+ a += ka[0];
+ b += ka[1];
+ c += ka[2];
+ mix(a, b, c);
+ ka += 3;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+ k = (const unsigned char *) ka;
+#ifdef WORDS_BIGENDIAN
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 8);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 16);
+ /* fall through */
+ case 5:
+ b += ((uint32) k[4] << 24);
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 8);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 16);
+ /* fall through */
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 16);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 8);
+ /* fall through */
+ case 5:
+ b += k[4];
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 16);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 8);
+ /* fall through */
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+ else
+ {
+ /* Code path for non-aligned source data */
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+#ifdef WORDS_BIGENDIAN
+ a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));
+ b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));
+ c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));
+#else /* !WORDS_BIGENDIAN */
+ a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
+ b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
+ c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
+#endif /* WORDS_BIGENDIAN */
+ mix(a, b, c);
+ k += 12;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+#ifdef WORDS_BIGENDIAN
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += k[7];
+ case 7:
+ b += ((uint32) k[6] << 8);
+ case 6:
+ b += ((uint32) k[5] << 16);
+ case 5:
+ b += ((uint32) k[4] << 24);
+ case 4:
+ a += k[3];
+ case 3:
+ a += ((uint32) k[2] << 8);
+ case 2:
+ a += ((uint32) k[1] << 16);
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += ((uint32) k[7] << 24);
+ case 7:
+ b += ((uint32) k[6] << 16);
+ case 6:
+ b += ((uint32) k[5] << 8);
+ case 5:
+ b += k[4];
+ case 4:
+ a += ((uint32) k[3] << 24);
+ case 3:
+ a += ((uint32) k[2] << 16);
+ case 2:
+ a += ((uint32) k[1] << 8);
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+
+ final(a, b, c);
+
+ /* report the result */
+ return UInt64GetDatum(((uint64) b << 32) | c);
+}
+
+/*
+ * hash_uint32() -- hash a 32-bit value to a 32-bit value
*
* This has the same result as
* hash_any(&k, sizeof(uint32))
@@ -523,3 +893,32 @@ hash_uint32(uint32 k)
/* report the result */
return UInt32GetDatum(c);
}
+
+/*
+ * hash_uint32_extended() -- hash a 32-bit value to a 64-bit value, with a seed
+ *
+ * Like hash_uint32, this is a convenience function.
+ */
+Datum
+hash_uint32_extended(uint32 k, uint64 seed)
+{
+ register uint32 a,
+ b,
+ c;
+
+ a = b = c = 0x9e3779b9 + (uint32) sizeof(uint32) + 3923095;
+
+ if (seed != 0)
+ {
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ a += k;
+
+ final(a, b, c);
+
+ /* report the result */
+ return UInt64GetDatum(((uint64) b << 32) | c);
+}
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 7b2906b..0579841 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -373,7 +373,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
if (ffactor < 10)
ffactor = 10;
- procid = index_getprocid(rel, 1, HASHPROC);
+ procid = index_getprocid(rel, 1, HASHSTANDARD_PROC);
/*
* We initialize the metapage, the first N bucket pages, and the first
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 9b803af..869cbc1 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -85,7 +85,7 @@ _hash_datum2hashkey(Relation rel, Datum key)
Oid collation;
/* XXX assumes index has only one attribute */
- procinfo = index_getprocinfo(rel, 1, HASHPROC);
+ procinfo = index_getprocinfo(rel, 1, HASHSTANDARD_PROC);
collation = rel->rd_indcollation[0];
return DatumGetUInt32(FunctionCall1Coll(procinfo, collation, key));
@@ -108,10 +108,10 @@ _hash_datum2hashkey_type(Relation rel, Datum key, Oid keytype)
hash_proc = get_opfamily_proc(rel->rd_opfamily[0],
keytype,
keytype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!RegProcedureIsValid(hash_proc))
elog(ERROR, "missing support function %d(%u,%u) for index \"%s\"",
- HASHPROC, keytype, keytype,
+ HASHSTANDARD_PROC, keytype, keytype,
RelationGetRelationName(rel));
collation = rel->rd_indcollation[0];
diff --git a/src/backend/access/hash/hashvalidate.c b/src/backend/access/hash/hashvalidate.c
index 30b29cb..8b633c2 100644
--- a/src/backend/access/hash/hashvalidate.c
+++ b/src/backend/access/hash/hashvalidate.c
@@ -29,7 +29,7 @@
#include "utils/syscache.h"
-static bool check_hash_func_signature(Oid funcid, Oid restype, Oid argtype);
+static bool check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype);
/*
@@ -105,8 +105,9 @@ hashvalidate(Oid opclassoid)
/* Check procedure numbers and function signatures */
switch (procform->amprocnum)
{
- case HASHPROC:
- if (!check_hash_func_signature(procform->amproc, INT4OID,
+ case HASHSTANDARD_PROC:
+ case HASHEXTENDED_PROC:
+ if (!check_hash_func_signature(procform->amproc, procform->amprocnum,
procform->amproclefttype))
{
ereport(INFO,
@@ -264,19 +265,37 @@ hashvalidate(Oid opclassoid)
* hacks in the core hash opclass definitions.
*/
static bool
-check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
+check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype)
{
bool result = true;
+ Oid restype;
+ int16 nargs;
HeapTuple tp;
Form_pg_proc procform;
+ switch (amprocnum)
+ {
+ case HASHSTANDARD_PROC:
+ restype = INT4OID;
+ nargs = 1;
+ break;
+
+ case HASHEXTENDED_PROC:
+ restype = INT8OID;
+ nargs = 2;
+ break;
+
+ default:
+ elog(ERROR, "invalid amprocnum");
+ }
+
tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
if (!HeapTupleIsValid(tp))
elog(ERROR, "cache lookup failed for function %u", funcid);
procform = (Form_pg_proc) GETSTRUCT(tp);
if (procform->prorettype != restype || procform->proretset ||
- procform->pronargs != 1)
+ procform->pronargs != nargs)
result = false;
if (!IsBinaryCoercible(argtype, procform->proargtypes.values[0]))
@@ -290,24 +309,29 @@ check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
* identity, not just its input type, because hashvarlena() takes
* INTERNAL and allowing any such function seems too scary.
*/
- if (funcid == F_HASHINT4 &&
+ if ((funcid == F_HASHINT4 || funcid == F_HASHINT4EXTENDED) &&
(argtype == DATEOID ||
argtype == ABSTIMEOID || argtype == RELTIMEOID ||
argtype == XIDOID || argtype == CIDOID))
/* okay, allowed use of hashint4() */ ;
- else if (funcid == F_TIMESTAMP_HASH &&
+ else if ((funcid == F_TIMESTAMP_HASH ||
+ funcid == F_TIMESTAMP_HASH_EXTENDED) &&
argtype == TIMESTAMPTZOID)
/* okay, allowed use of timestamp_hash() */ ;
- else if (funcid == F_HASHCHAR &&
+ else if ((funcid == F_HASHCHAR || funcid == F_HASHCHAREXTENDED) &&
argtype == BOOLOID)
/* okay, allowed use of hashchar() */ ;
- else if (funcid == F_HASHVARLENA &&
+ else if ((funcid == F_HASHVARLENA || funcid == F_HASHVARLENAEXTENDED) &&
argtype == BYTEAOID)
/* okay, allowed use of hashvarlena() */ ;
else
result = false;
}
+ /* If function takes a second argument, it must be for a 64-bit salt. */
+ if (nargs == 2 && procform->proargtypes.values[1] != INT8OID)
+ result = false;
+
ReleaseSysCache(tp);
return result;
}
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index a31b1ac..4a8aaf3 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -18,6 +18,7 @@
#include <limits.h>
#include "access/genam.h"
+#include "access/hash.h"
#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/htup_details.h"
@@ -1129,7 +1130,8 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
/*
* btree comparison procs must be 2-arg procs returning int4, while btree
* sortsupport procs must take internal and return void. hash support
- * procs must be 1-arg procs returning int4. Otherwise we don't know.
+ * proc 1 must be a 1-arg proc returning int4, while proc 2 must be a 2-arg
+ * proc returning int8. Otherwise we don't know.
*/
if (amoid == BTREE_AM_OID)
{
@@ -1172,14 +1174,28 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
}
else if (amoid == HASH_AM_OID)
{
- if (procform->pronargs != 1)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must have one argument")));
- if (procform->prorettype != INT4OID)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must return integer")));
+ if (member->number == HASHSTANDARD_PROC)
+ {
+ if (procform->pronargs != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must have one argument")));
+ if (procform->prorettype != INT4OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must return integer")));
+ }
+ else if (member->number == HASHEXTENDED_PROC)
+ {
+ if (procform->pronargs != 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must have two arguments")));
+ if (procform->prorettype != INT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must return bigint")));
+ }
/*
* If lefttype/righttype isn't specified, use the proc's input type
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 2efb6c9..a64182b 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -717,6 +717,19 @@ hash_aclitem(PG_FUNCTION_ARGS)
PG_RETURN_UINT32((uint32) (a->ai_privs + a->ai_grantee + a->ai_grantor));
}
+/*
+ * Returns a uint64 value after adding seed. Otherwise similar to
+ * hash_aclitem
+ */
+Datum
+hash_aclitem_extended(PG_FUNCTION_ARGS)
+{
+ AclItem *a = PG_GETARG_ACLITEM_P(0);
+ uint64 seed = PG_GETARG_INT64(1);
+
+ return UInt64GetDatum((uint64) (a->ai_privs + a->ai_grantee + a->ai_grantor
+ + seed));
+}
/*
* acldefault() --- create an ACL describing default access permissions
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 34dadd6..bd6a0a2 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -20,6 +20,7 @@
#endif
#include <math.h>
+#include "access/hash.h"
#include "access/htup_details.h"
#include "catalog/pg_type.h"
#include "funcapi.h"
@@ -4020,6 +4021,119 @@ hash_array(PG_FUNCTION_ARGS)
PG_RETURN_UINT32(result);
}
+/* Returns a uint64 value. Otherwise similar to hash_array. */
+Datum
+hash_array_extended(PG_FUNCTION_ARGS)
+{
+ AnyArrayType *array = PG_GETARG_ANY_ARRAY(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ int ndims = AARR_NDIM(array);
+ int *dims = AARR_DIMS(array);
+ Oid element_type = AARR_ELEMTYPE(array);
+ uint64 result = 1;
+ int nitems;
+ TypeCacheEntry *typentry;
+ int typlen;
+ bool typbyval;
+ char typalign;
+ int i;
+ array_iter iter;
+ FunctionCallInfoData locfcinfo;
+
+ /*
+ * We arrange to look up the hash function only once per series of calls,
+ * assuming the element type doesn't change underneath us. Unlike
+ * hash_array, we need the extended hash function, which is not available
+ * in the type cache entry, and it's unsafe to modify the cache entry
+ * itself. So we copy the cached entry locally and load the extended
+ * hash function into the copy.
+ */
+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL ||
+ typentry->type_id != element_type)
+ {
+ TypeCacheEntry *cache = lookup_type_cache(element_type,
+ TYPECACHE_HASH_OPFAMILY);
+
+ typentry = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+ sizeof(TypeCacheEntry));
+
+ typentry->type_id = element_type;
+ typentry->typlen = cache->typlen;
+ typentry->typbyval = cache->typbyval;
+ typentry->typalign = cache->typalign;
+ typentry->hash_proc = get_opfamily_proc(cache->hash_opf,
+ cache->hash_opintype,
+ cache->hash_opintype,
+ HASHEXTENDED_PROC);
+ fmgr_info_cxt(typentry->hash_proc, &typentry->hash_proc_finfo,
+ fcinfo->flinfo->fn_mcxt);
+ if (!OidIsValid(typentry->hash_proc_finfo.fn_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify a hash function for type %s",
+ format_type_be(element_type))));
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
+ typlen = typentry->typlen;
+ typbyval = typentry->typbyval;
+ typalign = typentry->typalign;
+
+ /*
+ * apply the hash function to each array element.
+ */
+ InitFunctionCallInfoData(locfcinfo, &typentry->hash_proc_finfo, 2,
+ InvalidOid, NULL, NULL);
+
+ /* Loop over source data */
+ nitems = ArrayGetNItems(ndims, dims);
+ array_iter_setup(&iter, array);
+
+ for (i = 0; i < nitems; i++)
+ {
+ Datum elt;
+ bool isnull;
+ uint64 elthash;
+
+ /* Get element, checking for NULL */
+ elt = array_iter_next(&iter, &isnull, i, typlen, typbyval, typalign);
+
+ if (isnull)
+ {
+ /* Treat nulls as having hashvalue 0 */
+ elthash = 0;
+ }
+ else
+ {
+ /* Apply the hash function */
+ locfcinfo.arg[0] = elt;
+ locfcinfo.arg[1] = seed;
+ locfcinfo.argnull[0] = false;
+ locfcinfo.argnull[1] = false;
+ locfcinfo.isnull = false;
+ elthash = DatumGetUInt64(FunctionCallInvoke(&locfcinfo));
+ }
+
+ /*
+ * Combine hash values of successive elements by multiplying the
+ * current value by 31 and adding on the new element's hash value.
+ *
+ * The result is a sum in which each element's hash value is
+ * multiplied by a different power of 31. This is modulo 2^64
+ * arithmetic, and the powers of 31 modulo 2^64 form a cyclic group of
+ * order 2^59. So for arrays of up to 2^59 elements, each element's
+ * hash value is multiplied by a different (odd) number, resulting in
+ * a good mixing of all the elements' hash values.
+ */
+ result = (result << 5) - result + elthash;
+ }
+
+ /* Avoid leaking memory when handed toasted input. */
+ AARR_FREE_IF_COPY(array, 0);
+
+ return UInt64GetDatum(result);
+}
+
/*-----------------------------------------------------------------------------
* array overlap/containment comparisons
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 7d89d79..ad5cb2a 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -1509,6 +1509,12 @@ time_hash(PG_FUNCTION_ARGS)
}
Datum
+time_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
+
+Datum
time_larger(PG_FUNCTION_ARGS)
{
TimeADT time1 = PG_GETARG_TIMEADT(0);
@@ -2214,6 +2220,24 @@ timetz_hash(PG_FUNCTION_ARGS)
}
Datum
+timetz_hash_extended(PG_FUNCTION_ARGS)
+{
+ TimeTzADT *key = PG_GETARG_TIMETZADT_P(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ uint64 thash;
+
+ /*
+ * To avoid any problems with padding bytes in the struct, we figure the
+ * field hashes separately and XOR them.
+ */
+ thash = DatumGetUInt64(DirectFunctionCall2(hashint8extended,
+ Int64GetDatumFast(key->time),
+ seed));
+ thash ^= DatumGetUInt64(hash_uint32_extended(key->zone, seed));
+ return UInt64GetDatum(thash);
+}
+
+Datum
timetz_larger(PG_FUNCTION_ARGS)
{
TimeTzADT *time1 = PG_GETARG_TIMETZADT_P(0);
diff --git a/src/backend/utils/adt/jsonb_op.c b/src/backend/utils/adt/jsonb_op.c
index d4c490e..4a0d147 100644
--- a/src/backend/utils/adt/jsonb_op.c
+++ b/src/backend/utils/adt/jsonb_op.c
@@ -291,3 +291,45 @@ jsonb_hash(PG_FUNCTION_ARGS)
PG_FREE_IF_COPY(jb, 0);
PG_RETURN_INT32(hash);
}
+
+Datum
+jsonb_hash_extended(PG_FUNCTION_ARGS)
+{
+ Jsonb *jb = PG_GETARG_JSONB(0);
+ JsonbIterator *it;
+ JsonbValue v;
+ JsonbIteratorToken r;
+ uint64 hash = 0;
+
+ if (JB_ROOT_COUNT(jb) == 0)
+ return UInt64GetDatum(0);
+
+ it = JsonbIteratorInit(&jb->root);
+
+ while ((r = JsonbIteratorNext(&it, &v, false)) != WJB_DONE)
+ {
+ switch (r)
+ {
+ /* Rotation is left to JsonbHashScalarValue() */
+ case WJB_BEGIN_ARRAY:
+ hash ^= JB_FARRAY;
+ break;
+ case WJB_BEGIN_OBJECT:
+ hash ^= JB_FOBJECT;
+ break;
+ case WJB_KEY:
+ case WJB_VALUE:
+ case WJB_ELEM:
+ JsonbHashScalarValueExtended(&v, &hash, PG_GETARG_INT64(1));
+ break;
+ case WJB_END_ARRAY:
+ case WJB_END_OBJECT:
+ break;
+ default:
+ elog(ERROR, "invalid JsonbIteratorNext rc: %d", (int) r);
+ }
+ }
+
+ PG_FREE_IF_COPY(jb, 0);
+ return UInt64GetDatum(hash);
+}
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 4850569..27ff521 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -1249,6 +1249,48 @@ JsonbHashScalarValue(const JsonbValue *scalarVal, uint32 *hash)
*hash ^= tmp;
}
+/* Returns a uint64 value. Otherwise similar to JsonbHashScalarValue. */
+void
+JsonbHashScalarValueExtended(const JsonbValue *scalarVal, uint64 *hash,
+ uint64 seed)
+{
+ uint64 tmp;
+
+ /* Compute hash value for scalarVal */
+ switch (scalarVal->type)
+ {
+ case jbvNull:
+ tmp = 0x01;
+ break;
+ case jbvString:
+ tmp = DatumGetUInt64(hash_any_extended((const unsigned char *) scalarVal->val.string.val,
+ scalarVal->val.string.len, seed));
+ break;
+ case jbvNumeric:
+ /* Must hash equal numerics to equal hash codes */
+ tmp = DatumGetUInt64(DirectFunctionCall2(hash_numeric_extended,
+ NumericGetDatum(scalarVal->val.numeric),
+ seed));
+ break;
+ case jbvBool:
+ tmp = DatumGetUInt64(DirectFunctionCall2(hashcharextended,
+ BoolGetDatum(scalarVal->val.boolean),
+ seed));
+
+ break;
+ default:
+ elog(ERROR, "invalid jsonb scalar type");
+ break;
+ }
+
+ /*
+ * Combine hash values of successive keys, values and elements by rotating
+ * the previous value left 1 bit, then XOR'ing in the new
+ * key/value/element's hash value.
+ */
+ *hash = (*hash << 1) | (*hash >> 63);
+ *hash ^= tmp;
+}
/*
* Are two scalar JsonbValues of the same type a and b equal?
*/
diff --git a/src/backend/utils/adt/mac.c b/src/backend/utils/adt/mac.c
index d1c20c3..deedb57 100644
--- a/src/backend/utils/adt/mac.c
+++ b/src/backend/utils/adt/mac.c
@@ -271,6 +271,14 @@ hashmacaddr(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) key, sizeof(macaddr));
}
+Datum
+hashmacaddrextended(PG_FUNCTION_ARGS)
+{
+ macaddr *key = PG_GETARG_MACADDR_P(0);
+
+ return hash_any_extended((unsigned char *) key, sizeof(macaddr),
+ PG_GETARG_INT64(1));
+}
/*
* Arithmetic functions: bitwise NOT, AND, OR.
*/
diff --git a/src/backend/utils/adt/mac8.c b/src/backend/utils/adt/mac8.c
index 482d1fb..0410b98 100644
--- a/src/backend/utils/adt/mac8.c
+++ b/src/backend/utils/adt/mac8.c
@@ -407,6 +407,15 @@ hashmacaddr8(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) key, sizeof(macaddr8));
}
+Datum
+hashmacaddr8extended(PG_FUNCTION_ARGS)
+{
+ macaddr8 *key = PG_GETARG_MACADDR8_P(0);
+
+ return hash_any_extended((unsigned char *) key, sizeof(macaddr8),
+ PG_GETARG_INT64(1));
+}
+
/*
* Arithmetic functions: bitwise NOT, AND, OR.
*/
diff --git a/src/backend/utils/adt/network.c b/src/backend/utils/adt/network.c
index 5573c34..e1d7c8d 100644
--- a/src/backend/utils/adt/network.c
+++ b/src/backend/utils/adt/network.c
@@ -486,6 +486,17 @@ hashinet(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) VARDATA_ANY(addr), addrsize + 2);
}
+Datum
+hashinetextended(PG_FUNCTION_ARGS)
+{
+ inet *addr = PG_GETARG_INET_PP(0);
+ int addrsize = ip_addrsize(addr);
+
+ /* XXX this assumes there are no pad bytes in the data structure */
+ return hash_any_extended((unsigned char *) VARDATA_ANY(addr), addrsize + 2,
+ PG_GETARG_INT64(1));
+}
+
/*
* Boolean network-inclusion tests.
*/
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 3e5614e..3165357 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -2230,6 +2230,84 @@ hash_numeric(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
+Datum
+hash_numeric_extended(PG_FUNCTION_ARGS)
+{
+ Numeric key = PG_GETARG_NUMERIC(0);
+ Datum digit_hash;
+ Datum result;
+ int weight;
+ int start_offset;
+ int end_offset;
+ int i;
+ int hash_len;
+ NumericDigit *digits;
+
+ /* If it's NaN, don't try to hash the rest of the fields */
+ if (NUMERIC_IS_NAN(key))
+ return UInt64GetDatum(0);
+
+ weight = NUMERIC_WEIGHT(key);
+ start_offset = 0;
+ end_offset = 0;
+
+ /*
+ * Omit any leading or trailing zeros from the input to the hash. The
+ * numeric implementation *should* guarantee that leading and trailing
+ * zeros are suppressed, but we're paranoid. Note that we measure the
+ * starting and ending offsets in units of NumericDigits, not bytes.
+ */
+ digits = NUMERIC_DIGITS(key);
+ for (i = 0; i < NUMERIC_NDIGITS(key); i++)
+ {
+ if (digits[i] != (NumericDigit) 0)
+ break;
+
+ start_offset++;
+
+ /*
+ * The weight is effectively the # of digits before the decimal point,
+ * so decrement it for each leading zero we skip.
+ */
+ weight--;
+ }
+
+ /*
+ * If there are no non-zero digits, then the value of the number is zero,
+ * regardless of any other fields.
+ */
+ if (NUMERIC_NDIGITS(key) == start_offset)
+ return UInt64GetDatum(-1);
+
+ for (i = NUMERIC_NDIGITS(key) - 1; i >= 0; i--)
+ {
+ if (digits[i] != (NumericDigit) 0)
+ break;
+
+ end_offset++;
+ }
+
+ /* If we get here, there should be at least one non-zero digit */
+ Assert(start_offset + end_offset < NUMERIC_NDIGITS(key));
+
+ /*
+ * Note that we don't hash on the Numeric's scale, since two numerics can
+ * compare equal but have different scales. We also don't hash on the
+ * sign, although we could: since a sign difference implies inequality,
+ * this shouldn't affect correctness.
+ */
+ hash_len = NUMERIC_NDIGITS(key) - start_offset - end_offset;
+ digit_hash = hash_any_extended((unsigned char *) (NUMERIC_DIGITS(key)
+ + start_offset),
+ hash_len * sizeof(NumericDigit),
+ PG_GETARG_INT64(1));
+
+ /* Mix in the weight, via XOR */
+ result = digit_hash ^ weight;
+
+ PG_RETURN_DATUM(result);
+}
+
/* ----------------------------------------------------------------------
*
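Two properties of hash_numeric_extended are worth spelling out: the zero-stripping keeps it scale-insensitive, like hash_numeric; and, as written, a zero input returns the constant -1 before the seed is ever consulted (just as NaN returns 0). Both are easy to probe from SQL; I'd expect:

SELECT hash_numeric_extended(1.000, 6877457) = hash_numeric_extended(1, 6877457) AS scale_insensitive,
       hash_numeric_extended(0.0, 0) = hash_numeric_extended(0, 6877457) AS zero_ignores_seed;
-- both should be true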
diff --git a/src/backend/utils/adt/pg_lsn.c b/src/backend/utils/adt/pg_lsn.c
index aefbb87..c1795df 100644
--- a/src/backend/utils/adt/pg_lsn.c
+++ b/src/backend/utils/adt/pg_lsn.c
@@ -179,6 +179,13 @@ pg_lsn_hash(PG_FUNCTION_ARGS)
return hashint8(fcinfo);
}
+Datum
+pg_lsn_hash_extended(PG_FUNCTION_ARGS)
+{
+ /* We can use hashint8extended directly */
+ return hashint8extended(fcinfo);
+}
+
/*----------------------------------------------------------
* Arithmetic operators on PostgreSQL LSNs.
diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 09a4f14..5cb9f90 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -1280,6 +1280,103 @@ hash_range(PG_FUNCTION_ARGS)
PG_RETURN_INT32(result);
}
+Datum
+hash_range_extended(PG_FUNCTION_ARGS)
+{
+ RangeType *r = PG_GETARG_RANGE(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ uint64 result;
+ TypeCacheEntry *typcache;
+ TypeCacheEntry *scache;
+ RangeBound lower;
+ RangeBound upper;
+ bool empty;
+ char flags;
+ uint64 lower_hash;
+ uint64 upper_hash;
+ Oid rngtypid = RangeTypeGetOid(r);
+
+ check_stack_depth(); /* recurses when subtype is a range type */
+
+ typcache = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+
+ if (typcache == NULL ||
+ typcache->type_id != rngtypid)
+ {
+ TypeCacheEntry *cache = lookup_type_cache(rngtypid, TYPECACHE_RANGE_INFO);
+ if (cache->rngelemtype == NULL)
+ elog(ERROR, "type %u is not a range type", rngtypid);
+
+ /*
+ * Unlike hash_range, we need the extended hash function here, which is
+ * not available in the type cache entry, and it's unsafe to modify the
+ * shared cache entry itself. So copy the cached entry locally and load
+ * the extended hash functions into the copy.
+ */
+ typcache = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+ sizeof(TypeCacheEntry));
+ memcpy(typcache, cache, sizeof(TypeCacheEntry));
+
+ /*
+ * Allocate the element-type copy only after the memcpy above, which
+ * would otherwise clobber the pointer; also invalidate the copied
+ * hash proc fields so the extended lookup below isn't skipped.
+ */
+ typcache->rngelemtype = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+ sizeof(TypeCacheEntry));
+ memcpy(typcache->rngelemtype, cache->rngelemtype, sizeof(TypeCacheEntry));
+ typcache->rngelemtype->hash_proc = InvalidOid;
+ typcache->rngelemtype->hash_proc_finfo.fn_oid = InvalidOid;
+ fcinfo->flinfo->fn_extra = (void *) typcache;
+ }
+
+ /* deserialize */
+ range_deserialize(typcache, r, &lower, &upper, &empty);
+ flags = range_get_flags(r);
+
+ /*
+ * Look up the element type's hash function, if not done already.
+ */
+ scache = typcache->rngelemtype;
+ if (!OidIsValid(scache->hash_proc_finfo.fn_oid))
+ {
+ TypeCacheEntry *elemcache;
+
+ elemcache = lookup_type_cache(scache->type_id, TYPECACHE_HASH_OPFAMILY);
+
+ /* Save the result into our local copy, not the shared cache entry */
+ scache->hash_proc = get_opfamily_proc(elemcache->hash_opf,
+ elemcache->hash_opintype,
+ elemcache->hash_opintype,
+ HASHEXTENDED_PROC);
+ if (!OidIsValid(scache->hash_proc))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify a hash function for type %s",
+ format_type_be(scache->type_id))));
+
+ fmgr_info_cxt(scache->hash_proc, &scache->hash_proc_finfo,
+ fcinfo->flinfo->fn_mcxt);
+ }
+
+ /*
+ * Apply the hash function to each bound.
+ */
+ if (RANGE_HAS_LBOUND(flags))
+ lower_hash = DatumGetUInt64(FunctionCall2Coll(&scache->hash_proc_finfo,
+ typcache->rng_collation,
+ lower.val,
+ UInt64GetDatum(seed)));
+ else
+ lower_hash = 0;
+
+ if (RANGE_HAS_UBOUND(flags))
+ upper_hash = DatumGetUInt64(FunctionCall2Coll(&scache->hash_proc_finfo,
+ typcache->rng_collation,
+ upper.val,
+ UInt64GetDatum(seed)));
+ else
+ upper_hash = 0;
+
+ /* Merge hashes of flags and bounds */
+ result = DatumGetUInt64(hash_uint32_extended((uint32) flags, seed));
+ result ^= lower_hash;
+ result = (result << 1) | (result >> 63);
+ result ^= upper_hash;
+
+ return UInt64GetDatum(result);
+}
+
/*
*----------------------------------------------------------
* CANONICAL FUNCTIONS
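Note that an empty range has neither bound, so the element hash is never reached and only the flag byte feeds the result. A consequence, assuming I'm reading the flag handling correctly, is that empty ranges of different range types collide for a given seed:

SELECT hash_range_extended('empty'::int4range, 0) =
       hash_range_extended('empty'::int8range, 0) AS empties_collide;
-- expected: true, since both reduce to hash_uint32_extended(RANGE_EMPTY, seed)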
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 6fa126d..81530bf 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -2113,6 +2113,11 @@ timestamp_hash(PG_FUNCTION_ARGS)
return hashint8(fcinfo);
}
+Datum
+timestamp_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
/*
* Cross-type comparison functions for timestamp vs timestamptz
@@ -2419,6 +2424,25 @@ interval_hash(PG_FUNCTION_ARGS)
return DirectFunctionCall1(hashint8, Int64GetDatumFast(span64));
}
+Datum
+interval_hash_extended(PG_FUNCTION_ARGS)
+{
+ Interval *interval = PG_GETARG_INTERVAL_P(0);
+ INT128 span = interval_cmp_value(interval);
+ int64 span64;
+
+ /*
+ * Use only the least significant 64 bits for hashing. The upper 64 bits
+ * seldom add any useful information, and besides we must do it like this
+ * for compatibility with hashes calculated before use of INT128 was
+ * introduced.
+ */
+ span64 = int128_to_int64(span);
+
+ return DirectFunctionCall2(hashint8extended, Int64GetDatumFast(span64),
+ PG_GETARG_DATUM(1));
+}
+
/* overlaps_timestamp() --- implements the SQL OVERLAPS operator.
*
* Algorithm is per SQL spec. This is much harder than you'd think
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5f15c8e..f73c695 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -408,3 +408,11 @@ uuid_hash(PG_FUNCTION_ARGS)
return hash_any(key->data, UUID_LEN);
}
+
+Datum
+uuid_hash_extended(PG_FUNCTION_ARGS)
+{
+ pg_uuid_t *key = PG_GETARG_UUID_P(0);
+
+ return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
+}
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index cbc62b0..c0198d4 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -947,6 +947,25 @@ hashbpchar(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashbpcharextended(PG_FUNCTION_ARGS)
+{
+ BpChar *key = PG_GETARG_BPCHAR_PP(0);
+ char *keydata;
+ int keylen;
+ Datum result;
+
+ keydata = VARDATA_ANY(key);
+ keylen = bcTruelen(key);
+
+ result = hash_any_extended((unsigned char *) keydata, keylen,
+ PG_GETARG_INT64(1));
+
+ /* Avoid leaking memory for toasted inputs */
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
/*
* The following operators support character-by-character comparison
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 82763f8..b7a14dc 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -490,8 +490,8 @@ get_compatible_hash_operators(Oid opno,
/*
* get_op_hash_functions
- * Get the OID(s) of hash support function(s) compatible with the given
- * operator, operating on its LHS and/or RHS datatype as required.
+ * Get the OID(s) of the standard hash support function(s) compatible with
+ * the given operator, operating on its LHS and/or RHS datatype as required.
*
* A function for the LHS type is sought and returned into *lhs_procno if
* lhs_procno isn't NULL. Similarly, a function for the RHS type is sought
@@ -542,7 +542,7 @@ get_op_hash_functions(Oid opno,
*lhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoplefttype,
aform->amoplefttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*lhs_procno))
continue;
/* Matching LHS found, done if caller doesn't want RHS */
@@ -564,7 +564,7 @@ get_op_hash_functions(Oid opno,
*rhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoprighttype,
aform->amoprighttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*rhs_procno))
{
/* Forget any LHS function from this opfamily */
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 20567a3..9d20eab 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -474,7 +474,7 @@ lookup_type_cache(Oid type_id, int flags)
hash_proc = get_opfamily_proc(typentry->hash_opf,
typentry->hash_opintype,
typentry->hash_opintype,
- HASHPROC);
+ HASHSTANDARD_PROC);
/*
* As above, make sure hash_array will succeed. We don't currently
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 72fce30..13505bc 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -289,12 +289,16 @@ typedef HashMetaPageData *HashMetaPage;
#define HTMaxStrategyNumber 1
/*
- * When a new operator class is declared, we require that the user supply
- * us with an amproc procudure for hashing a key of the new type.
- * Since we only have one such proc in amproc, it's number 1.
+ * When a new operator class is declared, we require that the user supply
+ us with an amproc procedure for hashing a key of the new type, returning
+ * a 32-bit hash value. We call this the "standard" hash procedure. We
+ * also allow an optional "extended" hash procedure which accepts a salt and
+ * returns a 64-bit hash value. This is highly recommended but, for reasons
+ * of backward compatibility, optional.
*/
-#define HASHPROC 1
-#define HASHNProcs 1
+#define HASHSTANDARD_PROC 1
+#define HASHEXTENDED_PROC 2
+#define HASHNProcs 2
/* public routines */
@@ -322,7 +326,10 @@ extern bytea *hashoptions(Datum reloptions, bool validate);
extern bool hashvalidate(Oid opclassoid);
extern Datum hash_any(register const unsigned char *k, register int keylen);
+extern Datum hash_any_extended(register const unsigned char *k,
+ register int keylen, uint64 seed);
extern Datum hash_uint32(uint32 k);
+extern Datum hash_uint32_extended(uint32 k, uint64 seed);
/* private routines */
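With support number 2 defined, an operator class author opts in simply by listing a second function. A minimal sketch using the built-in int4 functions (the opclass name is made up, and the default int4 hash opclass is untouched):

CREATE OPERATOR CLASS demo_int4_hash_ops
    FOR TYPE int4 USING hash AS
    OPERATOR 1 = ,
    FUNCTION 1 hashint4(int4),
    FUNCTION 2 hashint4extended(int4, int8);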
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 7d245b1..fb6a829 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -153,41 +153,77 @@ DATA(insert ( 4033 3802 3802 1 4044 ));
/* hash */
DATA(insert ( 427 1042 1042 1 1080 ));
+DATA(insert ( 427 1042 1042 2 972 ));
DATA(insert ( 431 18 18 1 454 ));
+DATA(insert ( 431 18 18 2 446 ));
DATA(insert ( 435 1082 1082 1 450 ));
+DATA(insert ( 435 1082 1082 2 425 ));
DATA(insert ( 627 2277 2277 1 626 ));
+DATA(insert ( 627 2277 2277 2 782 ));
DATA(insert ( 1971 700 700 1 451 ));
+DATA(insert ( 1971 700 700 2 443 ));
DATA(insert ( 1971 701 701 1 452 ));
+DATA(insert ( 1971 701 701 2 444 ));
DATA(insert ( 1975 869 869 1 422 ));
+DATA(insert ( 1975 869 869 2 779 ));
DATA(insert ( 1977 21 21 1 449 ));
+DATA(insert ( 1977 21 21 2 441 ));
DATA(insert ( 1977 23 23 1 450 ));
+DATA(insert ( 1977 23 23 2 425 ));
DATA(insert ( 1977 20 20 1 949 ));
+DATA(insert ( 1977 20 20 2 442 ));
DATA(insert ( 1983 1186 1186 1 1697 ));
+DATA(insert ( 1983 1186 1186 2 3418 ));
DATA(insert ( 1985 829 829 1 399 ));
+DATA(insert ( 1985 829 829 2 778 ));
DATA(insert ( 1987 19 19 1 455 ));
+DATA(insert ( 1987 19 19 2 447 ));
DATA(insert ( 1990 26 26 1 453 ));
+DATA(insert ( 1990 26 26 2 445 ));
DATA(insert ( 1992 30 30 1 457 ));
+DATA(insert ( 1992 30 30 2 776 ));
DATA(insert ( 1995 25 25 1 400 ));
+DATA(insert ( 1995 25 25 2 448 ));
DATA(insert ( 1997 1083 1083 1 1688 ));
+DATA(insert ( 1997 1083 1083 2 3409 ));
DATA(insert ( 1998 1700 1700 1 432 ));
+DATA(insert ( 1998 1700 1700 2 780 ));
DATA(insert ( 1999 1184 1184 1 2039 ));
+DATA(insert ( 1999 1184 1184 2 3411 ));
DATA(insert ( 2001 1266 1266 1 1696 ));
+DATA(insert ( 2001 1266 1266 2 3410 ));
DATA(insert ( 2040 1114 1114 1 2039 ));
+DATA(insert ( 2040 1114 1114 2 3411 ));
DATA(insert ( 2222 16 16 1 454 ));
+DATA(insert ( 2222 16 16 2 446 ));
DATA(insert ( 2223 17 17 1 456 ));
+DATA(insert ( 2223 17 17 2 772 ));
DATA(insert ( 2225 28 28 1 450 ));
+DATA(insert ( 2225 28 28 2 425 ));
DATA(insert ( 2226 29 29 1 450 ));
+DATA(insert ( 2226 29 29 2 425 ));
DATA(insert ( 2227 702 702 1 450 ));
+DATA(insert ( 2227 702 702 2 425 ));
DATA(insert ( 2228 703 703 1 450 ));
+DATA(insert ( 2228 703 703 2 425 ));
DATA(insert ( 2229 25 25 1 400 ));
+DATA(insert ( 2229 25 25 2 448 ));
DATA(insert ( 2231 1042 1042 1 1080 ));
+DATA(insert ( 2231 1042 1042 2 972 ));
DATA(insert ( 2235 1033 1033 1 329 ));
+DATA(insert ( 2235 1033 1033 2 777 ));
DATA(insert ( 2969 2950 2950 1 2963 ));
+DATA(insert ( 2969 2950 2950 2 3412 ));
DATA(insert ( 3254 3220 3220 1 3252 ));
+DATA(insert ( 3254 3220 3220 2 3413 ));
DATA(insert ( 3372 774 774 1 328 ));
+DATA(insert ( 3372 774 774 2 781 ));
DATA(insert ( 3523 3500 3500 1 3515 ));
+DATA(insert ( 3523 3500 3500 2 3414 ));
DATA(insert ( 3903 3831 3831 1 3902 ));
+DATA(insert ( 3903 3831 3831 2 3417 ));
DATA(insert ( 4034 3802 3802 1 4045 ));
+DATA(insert ( 4034 3802 3802 2 3416 ));
/* gist */
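Once these rows are in place, the standard/extended pairing is visible straight from the catalogs, which makes a handy cross-check that no hash opfamily was missed:

SELECT opf.opfname, amp.amproclefttype::regtype AS keytype, amp.amproc
FROM pg_amproc amp
JOIN pg_opfamily opf ON opf.oid = amp.amprocfamily
JOIN pg_am am ON am.oid = opf.opfmethod
WHERE am.amname = 'hash' AND amp.amprocnum = 2
ORDER BY opf.opfname;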
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8b33b4e..d820b56 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -668,36 +668,68 @@ DESCR("convert char(n) to name");
DATA(insert OID = 449 ( hashint2 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "21" _null_ _null_ _null_ _null_ _null_ hashint2 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 441 ( hashint2extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "21 20" _null_ _null_ _null_ _null_ _null_ hashint2extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 450 ( hashint4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "23" _null_ _null_ _null_ _null_ _null_ hashint4 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 425 ( hashint4extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "23 20" _null_ _null_ _null_ _null_ _null_ hashint4extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 949 ( hashint8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "20" _null_ _null_ _null_ _null_ _null_ hashint8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 442 ( hashint8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "20 20" _null_ _null_ _null_ _null_ _null_ hashint8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 451 ( hashfloat4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "700" _null_ _null_ _null_ _null_ _null_ hashfloat4 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 443 ( hashfloat4extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "700 20" _null_ _null_ _null_ _null_ _null_ hashfloat4extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 452 ( hashfloat8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "701" _null_ _null_ _null_ _null_ _null_ hashfloat8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 444 ( hashfloat8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "701 20" _null_ _null_ _null_ _null_ _null_ hashfloat8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 453 ( hashoid PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "26" _null_ _null_ _null_ _null_ _null_ hashoid _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 445 ( hashoidextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "26 20" _null_ _null_ _null_ _null_ _null_ hashoidextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 454 ( hashchar PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "18" _null_ _null_ _null_ _null_ _null_ hashchar _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 446 ( hashcharextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "18 20" _null_ _null_ _null_ _null_ _null_ hashcharextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 455 ( hashname PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "19" _null_ _null_ _null_ _null_ _null_ hashname _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 447 ( hashnameextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "19 20" _null_ _null_ _null_ _null_ _null_ hashnameextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 400 ( hashtext PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "25" _null_ _null_ _null_ _null_ _null_ hashtext _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 448 ( hashtextextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "25 20" _null_ _null_ _null_ _null_ _null_ hashtextextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 456 ( hashvarlena PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2281" _null_ _null_ _null_ _null_ _null_ hashvarlena _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 772 ( hashvarlenaextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2281 20" _null_ _null_ _null_ _null_ _null_ hashvarlenaextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 457 ( hashoidvector PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "30" _null_ _null_ _null_ _null_ _null_ hashoidvector _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 776 ( hashoidvectorextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "30 20" _null_ _null_ _null_ _null_ _null_ hashoidvectorextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 329 ( hash_aclitem PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1033" _null_ _null_ _null_ _null_ _null_ hash_aclitem _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 777 ( hash_aclitem_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1033 20" _null_ _null_ _null_ _null_ _null_ hash_aclitem_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 399 ( hashmacaddr PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "829" _null_ _null_ _null_ _null_ _null_ hashmacaddr _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 778 ( hashmacaddrextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "829 20" _null_ _null_ _null_ _null_ _null_ hashmacaddrextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 422 ( hashinet PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "869" _null_ _null_ _null_ _null_ _null_ hashinet _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 779 ( hashinetextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "869 20" _null_ _null_ _null_ _null_ _null_ hashinetextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 432 ( hash_numeric PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1700" _null_ _null_ _null_ _null_ _null_ hash_numeric _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 780 ( hash_numeric_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1700 20" _null_ _null_ _null_ _null_ _null_ hash_numeric_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 328 ( hashmacaddr8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "774" _null_ _null_ _null_ _null_ _null_ hashmacaddr8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 781 ( hashmacaddr8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "774 20" _null_ _null_ _null_ _null_ _null_ hashmacaddr8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 438 ( num_nulls PGNSP PGUID 12 1 0 2276 0 f f f f f f i s 1 0 23 "2276" "{2276}" "{v}" _null_ _null_ _null_ pg_num_nulls _null_ _null_ _null_ ));
DESCR("count the number of NULL arguments");
@@ -747,6 +779,8 @@ DESCR("convert float8 to int8");
DATA(insert OID = 626 ( hash_array PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2277" _null_ _null_ _null_ _null_ _null_ hash_array _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 782 ( hash_array_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2277 20" _null_ _null_ _null_ _null_ _null_ hash_array_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 652 ( float4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 700 "20" _null_ _null_ _null_ _null_ _null_ i8tof _null_ _null_ _null_ ));
DESCR("convert int8 to float4");
@@ -1155,6 +1189,8 @@ DATA(insert OID = 3328 ( bpchar_sortsupport PGNSP PGUID 12 1 0 0 0 f f f f t f i
DESCR("sort support");
DATA(insert OID = 1080 ( hashbpchar PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1042" _null_ _null_ _null_ _null_ _null_ hashbpchar _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 972 ( hashbpcharextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1042 20" _null_ _null_ _null_ _null_ _null_ hashbpcharextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1081 ( format_type PGNSP PGUID 12 1 0 0 0 f f f f f f s s 2 0 25 "26 23" _null_ _null_ _null_ _null_ _null_ format_type _null_ _null_ _null_ ));
DESCR("format a type oid and atttypmod to canonical SQL");
DATA(insert OID = 1084 ( date_in PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 1082 "2275" _null_ _null_ _null_ _null_ _null_ date_in _null_ _null_ _null_ ));
@@ -2286,10 +2322,16 @@ DESCR("less-equal-greater");
DATA(insert OID = 1688 ( time_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1083" _null_ _null_ _null_ _null_ _null_ time_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3409 ( time_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1083 20" _null_ _null_ _null_ _null_ _null_ time_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1696 ( timetz_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1266" _null_ _null_ _null_ _null_ _null_ timetz_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3410 ( timetz_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1266 20" _null_ _null_ _null_ _null_ _null_ timetz_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1697 ( interval_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1186" _null_ _null_ _null_ _null_ _null_ interval_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3418 ( interval_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1186 20" _null_ _null_ _null_ _null_ _null_ interval_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* OID's 1700 - 1799 NUMERIC data type */
@@ -3078,6 +3120,8 @@ DATA(insert OID = 2038 ( timezone PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0
DESCR("adjust time with time zone to new zone");
DATA(insert OID = 2039 ( timestamp_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1114" _null_ _null_ _null_ _null_ _null_ timestamp_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3411 ( timestamp_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1114 20" _null_ _null_ _null_ _null_ _null_ timestamp_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 2041 ( overlaps PGNSP PGUID 12 1 0 0 0 f f f f f f i s 4 0 16 "1114 1114 1114 1114" _null_ _null_ _null_ _null_ _null_ overlaps_timestamp _null_ _null_ _null_ ));
DESCR("intervals overlap?");
DATA(insert OID = 2042 ( overlaps PGNSP PGUID 14 1 0 0 0 f f f f f f i s 4 0 16 "1114 1186 1114 1186" _null_ _null_ _null_ _null_ _null_ "select ($1, ($1 + $2)) overlaps ($3, ($3 + $4))" _null_ _null_ _null_ ));
@@ -4543,6 +4587,8 @@ DATA(insert OID = 2962 ( uuid_send PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1
DESCR("I/O");
DATA(insert OID = 2963 ( uuid_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2950" _null_ _null_ _null_ _null_ _null_ uuid_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3412 ( uuid_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2950 20" _null_ _null_ _null_ _null_ _null_ uuid_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* pg_lsn */
DATA(insert OID = 3229 ( pg_lsn_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 3220 "2275" _null_ _null_ _null_ _null_ _null_ pg_lsn_in _null_ _null_ _null_ ));
@@ -4564,6 +4610,8 @@ DATA(insert OID = 3251 ( pg_lsn_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0
DESCR("less-equal-greater");
DATA(insert OID = 3252 ( pg_lsn_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3220" _null_ _null_ _null_ _null_ _null_ pg_lsn_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3413 ( pg_lsn_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3220 20" _null_ _null_ _null_ _null_ _null_ pg_lsn_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* enum related procs */
DATA(insert OID = 3504 ( anyenum_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 3500 "2275" _null_ _null_ _null_ _null_ _null_ anyenum_in _null_ _null_ _null_ ));
@@ -4584,6 +4632,8 @@ DATA(insert OID = 3514 ( enum_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 2
DESCR("less-equal-greater");
DATA(insert OID = 3515 ( hashenum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3500" _null_ _null_ _null_ _null_ _null_ hashenum _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3414 ( hashenumextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3500 20" _null_ _null_ _null_ _null_ _null_ hashenumextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 3524 ( enum_smaller PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3500 "3500 3500" _null_ _null_ _null_ _null_ _null_ enum_smaller _null_ _null_ _null_ ));
DESCR("smaller of two");
DATA(insert OID = 3525 ( enum_larger PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3500 "3500 3500" _null_ _null_ _null_ _null_ _null_ enum_larger _null_ _null_ _null_ ));
@@ -4981,6 +5031,8 @@ DATA(insert OID = 4044 ( jsonb_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2
DESCR("less-equal-greater");
DATA(insert OID = 4045 ( jsonb_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3802" _null_ _null_ _null_ _null_ _null_ jsonb_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3416 ( jsonb_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3802 20" _null_ _null_ _null_ _null_ _null_ jsonb_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 4046 ( jsonb_contains PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 3802" _null_ _null_ _null_ _null_ _null_ jsonb_contains _null_ _null_ _null_ ));
DATA(insert OID = 4047 ( jsonb_exists PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 25" _null_ _null_ _null_ _null_ _null_ jsonb_exists _null_ _null_ _null_ ));
DATA(insert OID = 4048 ( jsonb_exists_any PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 1009" _null_ _null_ _null_ _null_ _null_ jsonb_exists_any _null_ _null_ _null_ ));
@@ -5171,6 +5223,8 @@ DATA(insert OID = 3881 ( range_gist_same PGNSP PGUID 12 1 0 0 0 f f f f t f i
DESCR("GiST support");
DATA(insert OID = 3902 ( hash_range PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3831" _null_ _null_ _null_ _null_ _null_ hash_range _null_ _null_ _null_ ));
DESCR("hash a range");
+DATA(insert OID = 3417 ( hash_range_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3831 20" _null_ _null_ _null_ _null_ _null_ hash_range_extended _null_ _null_ _null_ ));
+DESCR("hash a range");
DATA(insert OID = 3916 ( range_typanalyze PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 16 "2281" _null_ _null_ _null_ _null_ _null_ range_typanalyze _null_ _null_ _null_ ));
DESCR("range typanalyze");
DATA(insert OID = 3169 ( rangesel PGNSP PGUID 12 1 0 0 0 f f f f t f s s 4 0 701 "2281 26 2281 23" _null_ _null_ _null_ _null_ _null_ rangesel _null_ _null_ _null_ ));
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index ea9dd17..cfee2dc 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -370,6 +370,8 @@ extern Jsonb *JsonbValueToJsonb(JsonbValue *val);
extern bool JsonbDeepContains(JsonbIterator **val,
JsonbIterator **mContained);
extern void JsonbHashScalarValue(const JsonbValue *scalarVal, uint32 *hash);
+extern void JsonbHashScalarValueExtended(const JsonbValue *scalarVal,
+ uint64 *hash, uint64 seed);
/* jsonb.c support functions */
extern char *JsonbToCString(StringInfo out, JsonbContainer *in,
diff --git a/src/test/regress/expected/alter_generic.out b/src/test/regress/expected/alter_generic.out
index 9f6ad4d..767c09b 100644
--- a/src/test/regress/expected/alter_generic.out
+++ b/src/test/regress/expected/alter_generic.out
@@ -421,7 +421,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf13 USING hash;
CREATE FUNCTION fn_opf13 (int4) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf13 USING hash ADD FUNCTION 1 fn_opf13(int4);
-ERROR: hash procedures must return integer
+ERROR: hash procedure 1 must return integer
DROP OPERATOR FAMILY alt_opf13 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
@@ -439,7 +439,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf15 USING hash;
CREATE FUNCTION fn_opf15 (int4, int2) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf15 USING hash ADD FUNCTION 1 fn_opf15(int4, int2);
-ERROR: hash procedures must have one argument
+ERROR: hash procedure 1 must have one argument
DROP OPERATOR FAMILY alt_opf15 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
--
2.6.2
Attachment: 0002-test-Hash_functions.patch (application/octet-stream)
From b5103a59f50adf2919d5d73e652496356105553e Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Tue, 22 Aug 2017 14:06:50 +0530
Subject: [PATCH 2/2] test-Hash_functions
---
src/test/regress/expected/hash_func.out | 167 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/hash_func.sql | 48 +++++++++
3 files changed, 216 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/hash_func.out
create mode 100644 src/test/regress/sql/hash_func.sql
diff --git a/src/test/regress/expected/hash_func.out b/src/test/regress/expected/hash_func.out
new file mode 100644
index 0000000..eed8b96
--- /dev/null
+++ b/src/test/regress/expected/hash_func.out
@@ -0,0 +1,167 @@
+--
+-- Test hash functions
+--
+SELECT hashint2(837::int2), hashint2extended(837::int2, 0), hashint2extended(837::int2, 6877457);
+ hashint2 | hashint2extended | hashint2extended
+------------+----------------------+----------------------
+ -828119691 | -7711785034027113099 | -1911879269318603603
+(1 row)
+
+SELECT hashint4(837), hashint4extended(837, 0), hashint4extended(837, 6877457);
+ hashint4 | hashint4extended | hashint4extended
+------------+----------------------+----------------------
+ -828119691 | -7711785034027113099 | -1911879269318603603
+(1 row)
+
+SELECT hashint8(837), hashint8extended(837, 0), hashint8extended(837, 6877457);
+ hashint8 | hashint8extended | hashint8extended
+------------+----------------------+----------------------
+ -828119691 | -7711785034027113099 | -1911879269318603603
+(1 row)
+
+SELECT hashfloat4(837.576), hashfloat4extended(837.576, 0), hashfloat4extended(837.576, 6877457);
+ hashfloat4 | hashfloat4extended | hashfloat4extended
+-------------+---------------------+----------------------
+ -1248506800 | 5445423874677492816 | -7666025229379580075
+(1 row)
+
+SELECT hashfloat8(837.576), hashfloat8extended(837.576, 0), hashfloat8extended(837.576, 6877457);
+ hashfloat8 | hashfloat8extended | hashfloat8extended
+------------+---------------------+---------------------
+ 22809674 | 6692304374340521034 | 3015132189621834995
+(1 row)
+
+SELECT hashoid(445), hashoidextended(445, 0), hashoidextended(445, 6877457);
+ hashoid | hashoidextended | hashoidextended
+----------+----------------------+----------------------
+ 89651167 | -3499471517977675809 | -6835125710780429670
+(1 row)
+
+SELECT hashchar('x'), hashcharextended('x', 0), hashcharextended('x', 6877457);
+ hashchar | hashcharextended | hashcharextended
+-------------+---------------------+----------------------
+ -1072653310 | -300249585504183294 | -1689974060427644062
+(1 row)
+
+SELECT hashname('PostgreSQL'), hashnameextended('PostgreSQL', 0), hashnameextended('PostgreSQL', 6877457);
+ hashname | hashnameextended | hashnameextended
+-------------+----------------------+----------------------
+ -1696465276 | -5770657681951818108 | -6237257254085129499
+(1 row)
+
+SELECT hashtext('PostgreSQL'), hashtextextended('PostgreSQL', 0), hashtextextended('PostgreSQL', 6877457);
+ hashtext | hashtextextended | hashtextextended
+-------------+----------------------+----------------------
+ -1696465276 | -5770657681951818108 | -6237257254085129499
+(1 row)
+
+--SELECT hashvarlena(internal_type??), hashvarlenaextended(internal type??, 0), hashvarlenaextended(internal type??, 6877457);
+SELECT hashoidvector('1 2 3 4 5 6 7 8'), hashoidvectorextended('1 2 3 4 5 6 7 8', 0),
+ hashoidvectorextended('1 2 3 4 5 6 7 8', 6877457);
+ hashoidvector | hashoidvectorextended | hashoidvectorextended
+---------------+-----------------------+-----------------------
+ 1162769288 | -8958777976966712440 | 5137409721967698705
+(1 row)
+
+-- SELECT hash_aclitem(relacl[1]), hash_aclitem_extended(relacl[1], 0), hash_aclitem_extended(relacl[1], 6877457)
+-- FROM pg_class LIMIT 1;
+SELECT hashmacaddr('08:00:2b:01:02:04'), hashmacaddrextended('08:00:2b:01:02:04', 0),
+ hashmacaddrextended('08:00:2b:01:02:04', 6877457);
+ hashmacaddr | hashmacaddrextended | hashmacaddrextended
+-------------+----------------------+----------------------
+ 1310037952 | -1891888588126840896 | -7646251794294615557
+(1 row)
+
+SELECT hashmacaddr8('08:00:2b:01:02:03:04:05'), hashmacaddr8extended('08:00:2b:01:02:03:04:05', 0),
+ hashmacaddr8extended('08:00:2b:01:02:03:04:05', 6877457);
+ hashmacaddr8 | hashmacaddr8extended | hashmacaddr8extended
+--------------+----------------------+----------------------
+ -445665214 | -5218702474190278590 | -230507670101060516
+(1 row)
+
+SELECT hashinet('192.168.100.128/25'), hashinetextended('192.168.100.128/25', 0),
+ hashinetextended('192.168.100.128/25', 6877457);
+ hashinet | hashinetextended | hashinetextended
+------------+---------------------+---------------------
+ 1612896565 | 8936096150078019893 | 6997678326324099600
+(1 row)
+
+SELECT hash_numeric(149484958.204628457), hash_numeric_extended(149484958.204628457, 0),
+ hash_numeric_extended(149484958.204628457, 6877457);
+ hash_numeric | hash_numeric_extended | hash_numeric_extended
+--------------+-----------------------+-----------------------
+ -744679049 | -7302423428854900361 | 1216807249137458925
+(1 row)
+
+SELECT hash_array('{1,2,3,4,5,6,7,8}'::int2[]), hash_array_extended('{1,2,3,4,5,6,7,8}'::int2[], 0),
+ hash_array_extended('{1,2,3,4,5,6,7,8}'::int2[], 6877457);
+ hash_array | hash_array_extended | hash_array_extended
+------------+----------------------+----------------------
+ 1332756414 | -2139327289023774786 | -5977231353903978197
+(1 row)
+
+SELECT hashbpchar('PostgreSQL'), hashbpcharextended('PostgreSQL', 0), hashbpcharextended('PostgreSQL', 6877457);
+ hashbpchar | hashbpcharextended | hashbpcharextended
+-------------+----------------------+----------------------
+ -1696465276 | -5770657681951818108 | -6237257254085129499
+(1 row)
+
+SELECT time_hash('11:09:59'), time_hash_extended('11:09:59', 0),
+ time_hash_extended('11:09:59', 6877457);
+ time_hash | time_hash_extended | time_hash_extended
+------------+---------------------+---------------------
+ 1740227977 | 6564483993255396745 | 8740506789988024383
+(1 row)
+
+SELECT timetz_hash('2017-08-22 00:11:52.518762-07'), timetz_hash_extended('2017-08-22 00:11:52.518762-07', 0),
+ timetz_hash_extended('2017-08-22 00:11:52.518762-07', 6877457);
+ timetz_hash | timetz_hash_extended | timetz_hash_extended
+-------------+----------------------+----------------------
+ -360566912 | 9030192975979884416 | 1447257070792174569
+(1 row)
+
+SELECT timestamp_hash('2017-08-22 00:09:59'), timestamp_hash_extended('2017-08-22 00:09:59', 0),
+ timestamp_hash_extended('2017-08-22 00:09:59', 6877457);
+ timestamp_hash | timestamp_hash_extended | timestamp_hash_extended
+----------------+-------------------------+-------------------------
+ -1450627509 | 580235490235067979 | 3037149474310020228
+(1 row)
+
+SELECT interval_hash('5 month 7 day 46 minutes'), interval_hash_extended('5 month 7 day 46 minutes', 0),
+ interval_hash_extended('5 month 7 day 46 minutes', 6877457);
+ interval_hash | interval_hash_extended | interval_hash_extended
+---------------+------------------------+------------------------
+ -1650268900 | -2083448633614473956 | 7239542030696170388
+(1 row)
+
+SELECT uuid_hash('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'), uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 0),
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 6877457);
+ uuid_hash | uuid_hash_extended | uuid_hash_extended
+------------+---------------------+---------------------
+ 1154666245 | 7543054272012930821 | 5854430665682797134
+(1 row)
+
+SELECT pg_lsn_hash('16/B374D84'), pg_lsn_hash_extended('16/B374D84', 0), pg_lsn_hash_extended('16/B374D84', 6877457);
+ pg_lsn_hash | pg_lsn_hash_extended | pg_lsn_hash_extended
+-------------+----------------------+----------------------
+ 783723155 | 7832933829336017555 | -483403495645773502
+(1 row)
+
+--CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
+--SELECT hashenum('happy'::mood), hashenumextended('happy'::mood, 0), hashenumextended('happy'::mood, 6877457);
+--DROP TYPE mood;
+SELECT jsonb_hash('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'),
+ jsonb_hash_extended('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}', 0),
+ jsonb_hash_extended('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}', 6877457);
+ jsonb_hash | jsonb_hash_extended | jsonb_hash_extended
+------------+---------------------+----------------------
+ -258638552 | 2082234257548291112 | -8124667650490227351
+(1 row)
+
+SELECT hash_range(int4range(10, 20)), hash_range_extended(int4range(10, 20), 0),
+ hash_range_extended(int4range(10, 20), 6877457);
+ hash_range | hash_range_extended | hash_range_extended
+------------+---------------------+----------------------
+ 1202375768 | 7295368086535457881 | -7481851086854644720
+(1 row)
+
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index eefdeea..2fd3f2b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/sql/hash_func.sql b/src/test/regress/sql/hash_func.sql
new file mode 100644
index 0000000..7d751be
--- /dev/null
+++ b/src/test/regress/sql/hash_func.sql
@@ -0,0 +1,48 @@
+--
+-- Test hash functions
+--
+
+SELECT hashint2(837::int2), hashint2extended(837::int2, 0), hashint2extended(837::int2, 6877457);
+SELECT hashint4(837), hashint4extended(837, 0), hashint4extended(837, 6877457);
+SELECT hashint8(837), hashint8extended(837, 0), hashint8extended(837, 6877457);
+SELECT hashfloat4(837.576), hashfloat4extended(837.576, 0), hashfloat4extended(837.576, 6877457);
+SELECT hashfloat8(837.576), hashfloat8extended(837.576, 0), hashfloat8extended(837.576, 6877457);
+SELECT hashoid(445), hashoidextended(445, 0), hashoidextended(445, 6877457);
+SELECT hashchar('x'), hashcharextended('x', 0), hashcharextended('x', 6877457);
+SELECT hashname('PostgreSQL'), hashnameextended('PostgreSQL', 0), hashnameextended('PostgreSQL', 6877457);
+SELECT hashtext('PostgreSQL'), hashtextextended('PostgreSQL', 0), hashtextextended('PostgreSQL', 6877457);
+--SELECT hashvarlena(internal_type??), hashvarlenaextended(internal type??, 0), hashvarlenaextended(internal type??, 6877457);
+SELECT hashoidvector('1 2 3 4 5 6 7 8'), hashoidvectorextended('1 2 3 4 5 6 7 8', 0),
+ hashoidvectorextended('1 2 3 4 5 6 7 8', 6877457);
+-- SELECT hash_aclitem(relacl[1]), hash_aclitem_extended(relacl[1], 0), hash_aclitem_extended(relacl[1], 6877457)
+-- FROM pg_class LIMIT 1;
+SELECT hashmacaddr('08:00:2b:01:02:04'), hashmacaddrextended('08:00:2b:01:02:04', 0),
+ hashmacaddrextended('08:00:2b:01:02:04', 6877457);
+SELECT hashmacaddr8('08:00:2b:01:02:03:04:05'), hashmacaddr8extended('08:00:2b:01:02:03:04:05', 0),
+ hashmacaddr8extended('08:00:2b:01:02:03:04:05', 6877457);
+SELECT hashinet('192.168.100.128/25'), hashinetextended('192.168.100.128/25', 0),
+ hashinetextended('192.168.100.128/25', 6877457);
+SELECT hash_numeric(149484958.204628457), hash_numeric_extended(149484958.204628457, 0),
+ hash_numeric_extended(149484958.204628457, 6877457);
+SELECT hash_array('{1,2,3,4,5,6,7,8}'::int2[]), hash_array_extended('{1,2,3,4,5,6,7,8}'::int2[], 0),
+ hash_array_extended('{1,2,3,4,5,6,7,8}'::int2[], 6877457);
+SELECT hashbpchar('PostgreSQL'), hashbpcharextended('PostgreSQL', 0), hashbpcharextended('PostgreSQL', 6877457);
+SELECT time_hash('11:09:59'), time_hash_extended('11:09:59', 0),
+ time_hash_extended('11:09:59', 6877457);
+SELECT timetz_hash('2017-08-22 00:11:52.518762-07'), timetz_hash_extended('2017-08-22 00:11:52.518762-07', 0),
+ timetz_hash_extended('2017-08-22 00:11:52.518762-07', 6877457);
+SELECT timestamp_hash('2017-08-22 00:09:59'), timestamp_hash_extended('2017-08-22 00:09:59', 0),
+ timestamp_hash_extended('2017-08-22 00:09:59', 6877457);
+SELECT interval_hash('5 month 7 day 46 minutes'), interval_hash_extended('5 month 7 day 46 minutes', 0),
+ interval_hash_extended('5 month 7 day 46 minutes', 6877457);
+SELECT uuid_hash('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'), uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 0),
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 6877457);
+SELECT pg_lsn_hash('16/B374D84'), pg_lsn_hash_extended('16/B374D84', 0), pg_lsn_hash_extended('16/B374D84', 6877457);
+--CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
+--SELECT hashenum('happy'::mood), hashenumextended('happy'::mood, 0), hashenumextended('happy'::mood, 6877457);
+--DROP TYPE mood;
+SELECT jsonb_hash('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'),
+ jsonb_hash_extended('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}', 0),
+ jsonb_hash_extended('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}', 6877457);
+SELECT hash_range(int4range(10, 20)), hash_range_extended(int4range(10, 20), 0),
+ hash_range_extended(int4range(10, 20), 6877457);
--
2.6.2
On Tue, Aug 22, 2017 at 5:44 PM, amul sul <sulamul@gmail.com> wrote:
> On Fri, Aug 18, 2017 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Aug 18, 2017 at 1:12 PM, amul sul <sulamul@gmail.com> wrote:
>>> I have a small query: what if I want a cache entry with the extended
>>> hash function instead of the standard one? I might require that while
>>> adding the hash_array_extended function. Do you think we need to extend
>>> lookup_type_cache() as well?
>>
>> Hmm, I thought you had changed the hash partitioning stuff so that it
>> didn't rely on lookup_type_cache(). You have to look up the function
>> using the opclass provided in the partition key definition;
>> lookup_type_cache() will give you the default one for the datatype.
>> Maybe just use get_opfamily_proc?
>
> Yes, we can do that for the partitioning code, but my concern is a little
> bit different. I apologize, I wasn't clear enough.
>
> I am trying to extend the hash_array & hash_range functions. hash_array
> and hash_range calculate a hash by using the respective hash function for
> the given argument type (i.e. the array/range element type), and those
> hash functions are made available in the TypeCacheEntry via
> lookup_type_cache(). But the extended versions of hash_array & hash_range
> require the respective extended hash functions for those element types.
>
> I have added hash_array_extended & hash_range_extended functions in the
> attached patch 0001, which maintains a local copy of the TypeCacheEntry
> with the extended hash functions. But I am a little bit skeptical about
> this logic; any advice/suggestions would be greatly appreciated.

Instead, in the attached patch, I have modified lookup_type_cache() so that
it can be asked to supply the extended hash function in the TypeCacheEntry.
For that, I've introduced the new flags TYPECACHE_HASH_EXTENDED_PROC,
TYPECACHE_HASH_EXTENDED_PROC_FINFO and TCFLAGS_CHECKED_HASH_EXTENDED_PROC,
plus additional fields in the TypeCacheEntry structure to hold the extended
hash proc information.

The logic in the rest of the extended hash functions is the same as in the
standard ones, and the same goes for the hash_array_extended() &
hash_range_extended() functions.

Regards,
Amul
Attachments:
0001-add-optional-second-hash-proc-v2.patch (application/octet-stream)
From a2e655002d258cebe2c5abf9e4a614c258cf4ff9 Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Fri, 18 Aug 2017 15:28:26 +0530
Subject: [PATCH 1/2] add-optional-second-hash-proc-v2
v2:
Extended remaining hash function.
v1:
Patch by Robert Haas.
---
doc/src/sgml/xindex.sgml | 11 +-
src/backend/access/hash/hashfunc.c | 401 +++++++++++++++++++++++++++-
src/backend/access/hash/hashpage.c | 2 +-
src/backend/access/hash/hashutil.c | 6 +-
src/backend/access/hash/hashvalidate.c | 42 ++-
src/backend/commands/opclasscmds.c | 34 ++-
src/backend/utils/adt/acl.c | 13 +
src/backend/utils/adt/arrayfuncs.c | 98 +++++++
src/backend/utils/adt/date.c | 24 ++
src/backend/utils/adt/jsonb_op.c | 42 +++
src/backend/utils/adt/jsonb_util.c | 44 +++
src/backend/utils/adt/mac.c | 9 +
src/backend/utils/adt/mac8.c | 9 +
src/backend/utils/adt/network.c | 11 +
src/backend/utils/adt/numeric.c | 78 ++++++
src/backend/utils/adt/pg_lsn.c | 7 +
src/backend/utils/adt/rangetypes.c | 67 +++++
src/backend/utils/adt/timestamp.c | 24 ++
src/backend/utils/adt/uuid.c | 8 +
src/backend/utils/adt/varchar.c | 19 ++
src/backend/utils/cache/lsyscache.c | 8 +-
src/backend/utils/cache/typcache.c | 52 +++-
src/include/access/hash.h | 18 +-
src/include/catalog/pg_amproc.h | 36 +++
src/include/catalog/pg_proc.h | 54 ++++
src/include/utils/jsonb.h | 2 +
src/include/utils/typcache.h | 4 +
src/test/regress/expected/alter_generic.out | 4 +-
28 files changed, 1090 insertions(+), 37 deletions(-)
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 333a36c..0f3c46b 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -436,7 +436,8 @@
</table>
<para>
- Hash indexes require one support function, shown in <xref
+ Hash indexes require one support function, and allow a second one to be
+ supplied at the operator class author's option, as shown in <xref
linkend="xindex-hash-support-table">.
</para>
@@ -451,9 +452,15 @@
</thead>
<tbody>
<row>
- <entry>Compute the hash value for a key</entry>
+ <entry>Compute the 32-bit hash value for a key</entry>
<entry>1</entry>
</row>
+ <row>
+ <entry>
+ Compute the 64-bit hash value for a key given a 64-bit salt
+ </entry>
+ <entry>2</entry>
+ </row>
</tbody>
</tgroup>
</table>
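To illustrate what the salt buys us: the same key hashed under two different salts yields independent values, as the regression output elsewhere in this thread shows for the key 837 under salts 0 and 6877457. In SQL:

SELECT hashint4extended(837, 0) <> hashint4extended(837, 6877457) AS salts_differ;
-- expected: true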
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index a127f3f..89db846 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -47,18 +47,36 @@ hashchar(PG_FUNCTION_ARGS)
}
Datum
+hashcharextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((int32) PG_GETARG_CHAR(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint2(PG_FUNCTION_ARGS)
{
return hash_uint32((int32) PG_GETARG_INT16(0));
}
Datum
+hashint2extended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((int32) PG_GETARG_INT16(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint4(PG_FUNCTION_ARGS)
{
return hash_uint32(PG_GETARG_INT32(0));
}
Datum
+hashint4extended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended(PG_GETARG_INT32(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint8(PG_FUNCTION_ARGS)
{
/*
@@ -79,18 +97,50 @@ hashint8(PG_FUNCTION_ARGS)
}
Datum
+hashint8extended(PG_FUNCTION_ARGS)
+{
+ /*
+ * The idea here is to produce a hash value compatible with the values
+ * produced by hashint4extended and hashint2extended for logically equal
+ * inputs; this is
+ * necessary to support cross-type hash joins across these input types.
+ * Since all three types are signed, we can xor the high half of the int8
+ * value if the sign is positive, or the complement of the high half when
+ * the sign is negative.
+ */
+ int64 val = PG_GETARG_INT64(0);
+ uint32 lohalf = (uint32) val;
+ uint32 hihalf = (uint32) (val >> 32);
+
+ lohalf ^= (val >= 0) ? hihalf : ~hihalf;
+
+ return hash_uint32_extended(lohalf, PG_GETARG_INT64(1));
+}
+
+Datum
hashoid(PG_FUNCTION_ARGS)
{
return hash_uint32((uint32) PG_GETARG_OID(0));
}
Datum
+hashoidextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((uint32) PG_GETARG_OID(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashenum(PG_FUNCTION_ARGS)
{
return hash_uint32((uint32) PG_GETARG_OID(0));
}
Datum
+hashenumextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((uint32) PG_GETARG_OID(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashfloat4(PG_FUNCTION_ARGS)
{
float4 key = PG_GETARG_FLOAT4(0);
@@ -117,6 +167,33 @@ hashfloat4(PG_FUNCTION_ARGS)
}
Datum
+hashfloat4extended(PG_FUNCTION_ARGS)
+{
+ float4 key = PG_GETARG_FLOAT4(0);
+ float8 key8;
+
+ /*
+ * On IEEE-float machines, minus zero and zero have different bit patterns
+ * but should compare as equal. We must ensure that they have the same
+ * hash value, which is most reliably done this way:
+ */
+ if (key == (float4) 0)
+ return UInt64GetDatum(0);
+
+ /*
+ * To support cross-type hashing of float8 and float4, we want to return
+ * the same hash value hashfloat8extended would produce for an equal
+ * float8 value. So, widen the value to float8 and hash that. (We must do
+ * this rather than have hashfloat8extended try to narrow its value to
+ * float4; that could fail on overflow.)
+ */
+ key8 = key;
+
+ return hash_any_extended((unsigned char *) &key8, sizeof(key8),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashfloat8(PG_FUNCTION_ARGS)
{
float8 key = PG_GETARG_FLOAT8(0);
@@ -133,6 +210,23 @@ hashfloat8(PG_FUNCTION_ARGS)
}
Datum
+hashfloat8extended(PG_FUNCTION_ARGS)
+{
+ float8 key = PG_GETARG_FLOAT8(0);
+
+ /*
+ * On IEEE-float machines, minus zero and zero have different bit patterns
+ * but should compare as equal. We must ensure that they have the same
+ * hash value, which is most reliably done this way:
+ */
+ if (key == (float8) 0)
+ return UInt64GetDatum(0);
+
+ return hash_any_extended((unsigned char *) &key, sizeof(key),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashoidvector(PG_FUNCTION_ARGS)
{
oidvector *key = (oidvector *) PG_GETARG_POINTER(0);
@@ -141,6 +235,16 @@ hashoidvector(PG_FUNCTION_ARGS)
}
Datum
+hashoidvectorextended(PG_FUNCTION_ARGS)
+{
+ oidvector *key = (oidvector *) PG_GETARG_POINTER(0);
+
+ return hash_any_extended((unsigned char *) key->values,
+ key->dim1 * sizeof(Oid),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashname(PG_FUNCTION_ARGS)
{
char *key = NameStr(*PG_GETARG_NAME(0));
@@ -149,6 +253,15 @@ hashname(PG_FUNCTION_ARGS)
}
Datum
+hashnameextended(PG_FUNCTION_ARGS)
+{
+ char *key = NameStr(*PG_GETARG_NAME(0));
+
+ return hash_any_extended((unsigned char *) key, strlen(key),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashtext(PG_FUNCTION_ARGS)
{
text *key = PG_GETARG_TEXT_PP(0);
@@ -168,6 +281,27 @@ hashtext(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashtextextended(PG_FUNCTION_ARGS)
+{
+ text *key = PG_GETARG_TEXT_PP(0);
+ Datum result;
+
+ /*
+ * Note: this is currently identical in behavior to hashvarlenaextended,
+ * but keep it as a separate function in case we someday want to do
+ * something different in non-C locales. (See also hashbpcharextended, if
+ * so.)
+ */
+ result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
+ VARSIZE_ANY_EXHDR(key),
+ PG_GETARG_INT64(1));
+
+ /* Avoid leaking memory for toasted inputs */
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
+
/*
* hashvarlena() can be used for any varlena datatype in which there are
* no non-significant bits, ie, distinct bitpatterns never compare as equal.
@@ -187,6 +321,22 @@ hashvarlena(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashvarlenaextended(PG_FUNCTION_ARGS)
+{
+ struct varlena *key = PG_GETARG_VARLENA_PP(0);
+ Datum result;
+
+ result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
+ VARSIZE_ANY_EXHDR(key),
+ PG_GETARG_INT64(1));
+
+ /* Avoid leaking memory for toasted inputs */
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
+
/*
* This hash function was written by Bob Jenkins
* (bob_jenkins@burtleburtle.net), and superficially adapted
@@ -502,7 +652,227 @@ hash_any(register const unsigned char *k, register int keylen)
}
/*
- * hash_uint32() -- hash a 32-bit value
+ * hash_any_extended() -- hash into a 64-bit value, using an optional seed
+ * k : the key (the unaligned variable-length array of bytes)
+ * len : the length of the key, counting by bytes
+ * seed : a 64-bit seed (0 means no seed)
+ *
+ * Returns a uint64 value. Otherwise similar to hash_any.
+ */
+Datum
+hash_any_extended(register const unsigned char *k, register int keylen,
+ uint64 seed)
+{
+ register uint32 a,
+ b,
+ c,
+ len;
+
+ /* Set up the internal state */
+ len = keylen;
+ a = b = c = 0x9e3779b9 + len + 3923095;
+
+ /* If the seed is non-zero, use it to perturb the internal state. */
+ if (seed != 0)
+ {
+ /*
+ * In essence, the seed is treated as part of the data being hashed,
+ * but for simplicity, we pretend that it's padded with four bytes of
+ * zeroes so that the seed constitutes a 4-byte chunk.
+ */
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ /* If the source pointer is word-aligned, we use word-wide fetches */
+ if (((uintptr_t) k & UINT32_ALIGN_MASK) == 0)
+ {
+ /* Code path for aligned source data */
+ register const uint32 *ka = (const uint32 *) k;
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+ a += ka[0];
+ b += ka[1];
+ c += ka[2];
+ mix(a, b, c);
+ ka += 3;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+ k = (const unsigned char *) ka;
+#ifdef WORDS_BIGENDIAN
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 8);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 16);
+ /* fall through */
+ case 5:
+ b += ((uint32) k[4] << 24);
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 8);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 16);
+ /* fall through */
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 16);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 8);
+ /* fall through */
+ case 5:
+ b += k[4];
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 16);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 8);
+ /* fall through */
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+ else
+ {
+ /* Code path for non-aligned source data */
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+#ifdef WORDS_BIGENDIAN
+ a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));
+ b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));
+ c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));
+#else /* !WORDS_BIGENDIAN */
+ a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
+ b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
+ c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
+#endif /* WORDS_BIGENDIAN */
+ mix(a, b, c);
+ k += 12;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+#ifdef WORDS_BIGENDIAN
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += k[7];
+ case 7:
+ b += ((uint32) k[6] << 8);
+ case 6:
+ b += ((uint32) k[5] << 16);
+ case 5:
+ b += ((uint32) k[4] << 24);
+ case 4:
+ a += k[3];
+ case 3:
+ a += ((uint32) k[2] << 8);
+ case 2:
+ a += ((uint32) k[1] << 16);
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += ((uint32) k[7] << 24);
+ case 7:
+ b += ((uint32) k[6] << 16);
+ case 6:
+ b += ((uint32) k[5] << 8);
+ case 5:
+ b += k[4];
+ case 4:
+ a += ((uint32) k[3] << 24);
+ case 3:
+ a += ((uint32) k[2] << 16);
+ case 2:
+ a += ((uint32) k[1] << 8);
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+
+ final(a, b, c);
+
+ /* report the result */
+ return UInt64GetDatum(((uint64) b << 32) | c);
+}
+
+/*
+ * hash_uint32() -- hash a 32-bit value to a 32-bit value
*
* This has the same result as
* hash_any(&k, sizeof(uint32))
@@ -523,3 +893,32 @@ hash_uint32(uint32 k)
/* report the result */
return UInt32GetDatum(c);
}
+
+/*
+ * hash_uint32_extended() -- hash a 32-bit value to a 64-bit value, with a seed
+ *
+ * Like hash_uint32, this is a convenience function: it gives the same result as hash_any_extended(&k, sizeof(uint32), seed), but faster.
+ */
+Datum
+hash_uint32_extended(uint32 k, uint64 seed)
+{
+ register uint32 a,
+ b,
+ c;
+
+ a = b = c = 0x9e3779b9 + (uint32) sizeof(uint32) + 3923095;
+
+ if (seed != 0)
+ {
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ a += k;
+
+ final(a, b, c);
+
+ /* report the result */
+ return UInt64GetDatum(((uint64) b << 32) | c);
+}
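
A side note for datatype authors: with seed == 0 the code above runs
exactly as hash_any()/hash_uint32() up to the final report, so the
low-order 32 bits of the extended result equal the standard 32-bit hash.
A minimal sketch of an extended support function for a pad-free,
fixed-size type (mytype is hypothetical, not part of this patch), in the
same style as hashmacaddrextended() further down:

    Datum
    mytype_hash_extended(PG_FUNCTION_ARGS)
    {
        mytype     *key = (mytype *) PG_GETARG_POINTER(0);
        uint64      seed = PG_GETARG_INT64(1);

        /* assumes mytype contains no padding bytes */
        return hash_any_extended((unsigned char *) key, sizeof(mytype), seed);
    }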
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 7b2906b..0579841 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -373,7 +373,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
if (ffactor < 10)
ffactor = 10;
- procid = index_getprocid(rel, 1, HASHPROC);
+ procid = index_getprocid(rel, 1, HASHSTANDARD_PROC);
/*
* We initialize the metapage, the first N bucket pages, and the first
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 9b803af..869cbc1 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -85,7 +85,7 @@ _hash_datum2hashkey(Relation rel, Datum key)
Oid collation;
/* XXX assumes index has only one attribute */
- procinfo = index_getprocinfo(rel, 1, HASHPROC);
+ procinfo = index_getprocinfo(rel, 1, HASHSTANDARD_PROC);
collation = rel->rd_indcollation[0];
return DatumGetUInt32(FunctionCall1Coll(procinfo, collation, key));
@@ -108,10 +108,10 @@ _hash_datum2hashkey_type(Relation rel, Datum key, Oid keytype)
hash_proc = get_opfamily_proc(rel->rd_opfamily[0],
keytype,
keytype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!RegProcedureIsValid(hash_proc))
elog(ERROR, "missing support function %d(%u,%u) for index \"%s\"",
- HASHPROC, keytype, keytype,
+ HASHSTANDARD_PROC, keytype, keytype,
RelationGetRelationName(rel));
collation = rel->rd_indcollation[0];
diff --git a/src/backend/access/hash/hashvalidate.c b/src/backend/access/hash/hashvalidate.c
index 30b29cb..8b633c2 100644
--- a/src/backend/access/hash/hashvalidate.c
+++ b/src/backend/access/hash/hashvalidate.c
@@ -29,7 +29,7 @@
#include "utils/syscache.h"
-static bool check_hash_func_signature(Oid funcid, Oid restype, Oid argtype);
+static bool check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype);
/*
@@ -105,8 +105,9 @@ hashvalidate(Oid opclassoid)
/* Check procedure numbers and function signatures */
switch (procform->amprocnum)
{
- case HASHPROC:
- if (!check_hash_func_signature(procform->amproc, INT4OID,
+ case HASHSTANDARD_PROC:
+ case HASHEXTENDED_PROC:
+ if (!check_hash_func_signature(procform->amproc, procform->amprocnum,
procform->amproclefttype))
{
ereport(INFO,
@@ -264,19 +265,37 @@ hashvalidate(Oid opclassoid)
* hacks in the core hash opclass definitions.
*/
static bool
-check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
+check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype)
{
bool result = true;
+ Oid restype;
+ int16 nargs;
HeapTuple tp;
Form_pg_proc procform;
+ switch (amprocnum)
+ {
+ case HASHSTANDARD_PROC:
+ restype = INT4OID;
+ nargs = 1;
+ break;
+
+ case HASHEXTENDED_PROC:
+ restype = INT8OID;
+ nargs = 2;
+ break;
+
+ default:
+ elog(ERROR, "invalid amprocnum");
+ }
+
tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
if (!HeapTupleIsValid(tp))
elog(ERROR, "cache lookup failed for function %u", funcid);
procform = (Form_pg_proc) GETSTRUCT(tp);
if (procform->prorettype != restype || procform->proretset ||
- procform->pronargs != 1)
+ procform->pronargs != nargs)
result = false;
if (!IsBinaryCoercible(argtype, procform->proargtypes.values[0]))
@@ -290,24 +309,29 @@ check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
* identity, not just its input type, because hashvarlena() takes
* INTERNAL and allowing any such function seems too scary.
*/
- if (funcid == F_HASHINT4 &&
+ if ((funcid == F_HASHINT4 || funcid == F_HASHINT4EXTENDED) &&
(argtype == DATEOID ||
argtype == ABSTIMEOID || argtype == RELTIMEOID ||
argtype == XIDOID || argtype == CIDOID))
/* okay, allowed use of hashint4() */ ;
- else if (funcid == F_TIMESTAMP_HASH &&
+ else if ((funcid == F_TIMESTAMP_HASH ||
+ funcid == F_TIMESTAMP_HASH_EXTENDED) &&
argtype == TIMESTAMPTZOID)
/* okay, allowed use of timestamp_hash() */ ;
- else if (funcid == F_HASHCHAR &&
+ else if ((funcid == F_HASHCHAR || funcid == F_HASHCHAREXTENDED) &&
argtype == BOOLOID)
/* okay, allowed use of hashchar() */ ;
- else if (funcid == F_HASHVARLENA &&
+ else if ((funcid == F_HASHVARLENA || funcid == F_HASHVARLENAEXTENDED) &&
argtype == BYTEAOID)
/* okay, allowed use of hashvarlena() */ ;
else
result = false;
}
+ /* If function takes a second argument, it must be for a 64-bit salt. */
+ if (nargs == 2 && procform->proargtypes.values[1] != INT8OID)
+ result = false;
+
ReleaseSysCache(tp);
return result;
}
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index a31b1ac..d23e6d6 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -18,6 +18,7 @@
#include <limits.h>
#include "access/genam.h"
+#include "access/hash.h"
#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/htup_details.h"
@@ -1129,7 +1130,8 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
/*
* btree comparison procs must be 2-arg procs returning int4, while btree
* sortsupport procs must take internal and return void. hash support
- * procs must be 1-arg procs returning int4. Otherwise we don't know.
+ * proc 1 must be a 1-arg proc returning int4, while proc 2 must be a
+ * 2-arg proc returning int8. Otherwise we don't know.
*/
if (amoid == BTREE_AM_OID)
{
@@ -1172,14 +1174,28 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
}
else if (amoid == HASH_AM_OID)
{
- if (procform->pronargs != 1)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must have one argument")));
- if (procform->prorettype != INT4OID)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must return integer")));
+ if (member->number == HASHSTANDARD_PROC)
+ {
+ if (procform->pronargs != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must have one argument")));
+ if (procform->prorettype != INT4OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must return integer")));
+ }
+ else if (member->number == HASHEXTENDED_PROC)
+ {
+ if (procform->pronargs != 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must have two arguments")));
+ if (procform->prorettype != INT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must return bigint")));
+ }
/*
* If lefttype/righttype isn't specified, use the proc's input type
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 2efb6c9..a64182b 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -717,6 +717,19 @@ hash_aclitem(PG_FUNCTION_ARGS)
PG_RETURN_UINT32((uint32) (a->ai_privs + a->ai_grantee + a->ai_grantor));
}
+/*
+ * Returns a uint64 value after adding the seed. Otherwise similar to
+ * hash_aclitem().
+ */
+Datum
+hash_aclitem_extended(PG_FUNCTION_ARGS)
+{
+ AclItem *a = PG_GETARG_ACLITEM_P(0);
+ uint64 seed = PG_GETARG_INT64(1);
+
+ return UInt64GetDatum((uint64) (a->ai_privs + a->ai_grantee + a->ai_grantor
+ + seed));
+}
/*
* acldefault() --- create an ACL describing default access permissions
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 34dadd6..aa61753 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -20,6 +20,7 @@
#endif
#include <math.h>
+#include "access/hash.h"
#include "access/htup_details.h"
#include "catalog/pg_type.h"
#include "funcapi.h"
@@ -4020,6 +4021,103 @@ hash_array(PG_FUNCTION_ARGS)
PG_RETURN_UINT32(result);
}
+/* Returns a uint64 value. Otherwise similar to hash_any. */
+Datum
+hash_array_extended(PG_FUNCTION_ARGS)
+{
+ AnyArrayType *array = PG_GETARG_ANY_ARRAY(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ int ndims = AARR_NDIM(array);
+ int *dims = AARR_DIMS(array);
+ Oid element_type = AARR_ELEMTYPE(array);
+ uint64 result = 1;
+ int nitems;
+ TypeCacheEntry *typentry;
+ int typlen;
+ bool typbyval;
+ char typalign;
+ int i;
+ array_iter iter;
+ FunctionCallInfoData locfcinfo;
+
+ /*
+ * We arrange to look up the hash function only once per series of calls,
+ * assuming the element type doesn't change underneath us. The typcache
+ * is used so that we have no memory leakage when being used as an index
+ * support function.
+ */
+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL ||
+ typentry->type_id != element_type)
+ {
+ typentry = lookup_type_cache(element_type,
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO);
+ if (!OidIsValid(typentry->hash_extended_proc_finfo.fn_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify an extended hash function for type %s",
+ format_type_be(element_type))));
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
+ typlen = typentry->typlen;
+ typbyval = typentry->typbyval;
+ typalign = typentry->typalign;
+
+ /*
+ * apply the hash function to each array element.
+ */
+ InitFunctionCallInfoData(locfcinfo, &typentry->hash_extended_proc_finfo, 2,
+ InvalidOid, NULL, NULL);
+
+ /* Loop over source data */
+ nitems = ArrayGetNItems(ndims, dims);
+ array_iter_setup(&iter, array);
+
+ for (i = 0; i < nitems; i++)
+ {
+ Datum elt;
+ bool isnull;
+ uint64 elthash;
+
+ /* Get element, checking for NULL */
+ elt = array_iter_next(&iter, &isnull, i, typlen, typbyval, typalign);
+
+ if (isnull)
+ {
+ /* Treat nulls as having hashvalue 0 */
+ elthash = 0;
+ }
+ else
+ {
+ /* Apply the hash function */
+ locfcinfo.arg[0] = elt;
+ locfcinfo.arg[1] = Int64GetDatum(seed);
+ locfcinfo.argnull[0] = false;
+ locfcinfo.argnull[1] = false;
+ locfcinfo.isnull = false;
+ elthash = DatumGetUInt64(FunctionCallInvoke(&locfcinfo));
+ }
+
+ /*
+ * Combine hash values of successive elements by multiplying the
+ * current value by 31 and adding on the new element's hash value.
+ *
+ * The result is a sum in which each element's hash value is
+ * multiplied by a different power of 31. This is modulo 2^64
+ * arithmetic, and the powers of 31 modulo 2^64 form a cyclic group of
+ * order 2^59. So for arrays of up to 2^59 elements, each element's
+ * hash value is multiplied by a different (odd) number, resulting in
+ * a good mixing of all the elements' hash values.
+ */
+ result = (result << 5) - result + elthash;
+ }
+
+ /* Avoid leaking memory when handed toasted input. */
+ AARR_FREE_IF_COPY(array, 0);
+
+ return UInt64GetDatum(result);
+}
+
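A note on the combining step above: (result << 5) - result is
result * 31, so the loop evaluates a base-31 polynomial in the element
hashes modulo 2^64. Illustration only, not part of the patch:

    /*
     * Starting from 1, after elements with hashes h0, h1, h2 the
     * accumulator is ((1*31 + h0)*31 + h1)*31 + h2
     *              = 31^3 + h0*31^2 + h1*31 + h2   (mod 2^64).
     */
    static uint64
    combine31(uint64 acc, uint64 elthash)
    {
        return acc * 31 + elthash;  /* same value as (acc << 5) - acc + elthash */
    }
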
/*-----------------------------------------------------------------------------
* array overlap/containment comparisons
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 7d89d79..ad5cb2a 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -1509,6 +1509,12 @@ time_hash(PG_FUNCTION_ARGS)
}
Datum
+time_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
+
+Datum
time_larger(PG_FUNCTION_ARGS)
{
TimeADT time1 = PG_GETARG_TIMEADT(0);
@@ -2214,6 +2220,24 @@ timetz_hash(PG_FUNCTION_ARGS)
}
Datum
+timetz_hash_extended(PG_FUNCTION_ARGS)
+{
+ TimeTzADT *key = PG_GETARG_TIMETZADT_P(0);
+ Datum seed = PG_GETARG_DATUM(1);
+ uint64 thash;
+
+ /*
+ * To avoid any problems with padding bytes in the struct, we figure the
+ * field hashes separately and XOR them.
+ */
+ thash = DatumGetUInt64(DirectFunctionCall2(hashint8extended,
+ Int64GetDatumFast(key->time),
+ seed));
+ thash ^= DatumGetUInt64(hash_uint32_extended(key->zone, DatumGetInt64(seed)));
+ return UInt64GetDatum(thash);
+}
+
+Datum
timetz_larger(PG_FUNCTION_ARGS)
{
TimeTzADT *time1 = PG_GETARG_TIMETZADT_P(0);
diff --git a/src/backend/utils/adt/jsonb_op.c b/src/backend/utils/adt/jsonb_op.c
index d4c490e..4a0d147 100644
--- a/src/backend/utils/adt/jsonb_op.c
+++ b/src/backend/utils/adt/jsonb_op.c
@@ -291,3 +291,45 @@ jsonb_hash(PG_FUNCTION_ARGS)
PG_FREE_IF_COPY(jb, 0);
PG_RETURN_INT32(hash);
}
+
+Datum
+jsonb_hash_extended(PG_FUNCTION_ARGS)
+{
+ Jsonb *jb = PG_GETARG_JSONB(0);
+ JsonbIterator *it;
+ JsonbValue v;
+ JsonbIteratorToken r;
+ uint64 hash = 0;
+
+ if (JB_ROOT_COUNT(jb) == 0)
+ return UInt64GetDatum(0);
+
+ it = JsonbIteratorInit(&jb->root);
+
+ while ((r = JsonbIteratorNext(&it, &v, false)) != WJB_DONE)
+ {
+ switch (r)
+ {
+ /* Rotation is left to JsonbHashScalarValueExtended() */
+ case WJB_BEGIN_ARRAY:
+ hash ^= JB_FARRAY;
+ break;
+ case WJB_BEGIN_OBJECT:
+ hash ^= JB_FOBJECT;
+ break;
+ case WJB_KEY:
+ case WJB_VALUE:
+ case WJB_ELEM:
+ JsonbHashScalarValueExtended(&v, &hash, PG_GETARG_INT64(1));
+ break;
+ case WJB_END_ARRAY:
+ case WJB_END_OBJECT:
+ break;
+ default:
+ elog(ERROR, "invalid JsonbIteratorNext rc: %d", (int) r);
+ }
+ }
+
+ PG_FREE_IF_COPY(jb, 0);
+ return UInt64GetDatum(hash);
+}
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 4850569..59782f8 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -1249,6 +1249,50 @@ JsonbHashScalarValue(const JsonbValue *scalarVal, uint32 *hash)
*hash ^= tmp;
}
+/* Returns a uint64 value. Otherwise similar to JsonbHashScalarValue. */
+void
+JsonbHashScalarValueExtended(const JsonbValue *scalarVal, uint64 *hash,
+ uint64 seed)
+{
+ uint64 tmp;
+
+ /* Compute hash value for scalarVal */
+ switch (scalarVal->type)
+ {
+ case jbvNull:
+ tmp = 0x01;
+ break;
+ case jbvString:
+ tmp = DatumGetUInt64(hash_any_extended((const unsigned char *) scalarVal->val.string.val,
+ scalarVal->val.string.len,
+ seed));
+ break;
+ case jbvNumeric:
+ /* Must hash equal numerics to equal hash codes */
+ tmp = DatumGetUInt64(DirectFunctionCall2(hash_numeric_extended,
+ NumericGetDatum(scalarVal->val.numeric),
+ seed));
+ break;
+ case jbvBool:
+ tmp = DatumGetUInt64(DirectFunctionCall2(hashcharextended,
+ BoolGetDatum(scalarVal->val.boolean),
+ seed));
+
+ break;
+ default:
+ elog(ERROR, "invalid jsonb scalar type");
+ break;
+ }
+
+ /*
+ * Combine hash values of successive keys, values and elements by rotating
+ * the previous value left 1 bit, then XOR'ing in the new
+ * key/value/element's hash value.
+ */
+ *hash = (*hash << 1) | (*hash >> 63);
+ *hash ^= tmp;
+}
+
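Unlike a plain XOR, the rotate-then-XOR combination is order sensitive:
each key, value or element lands at a different rotation, so ["a","b"]
and ["b","a"] hash differently. The combining step in isolation,
illustration only:

    static uint64
    combine_rotxor(uint64 acc, uint64 elthash)
    {
        acc = (acc << 1) | (acc >> 63); /* rotate left one bit */
        return acc ^ elthash;
    }
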
/*
* Are two scalar JsonbValues of the same type a and b equal?
*/
diff --git a/src/backend/utils/adt/mac.c b/src/backend/utils/adt/mac.c
index d1c20c3..60521cc 100644
--- a/src/backend/utils/adt/mac.c
+++ b/src/backend/utils/adt/mac.c
@@ -271,6 +271,15 @@ hashmacaddr(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) key, sizeof(macaddr));
}
+Datum
+hashmacaddrextended(PG_FUNCTION_ARGS)
+{
+ macaddr *key = PG_GETARG_MACADDR_P(0);
+
+ return hash_any_extended((unsigned char *) key, sizeof(macaddr),
+ PG_GETARG_INT64(1));
+}
+
/*
* Arithmetic functions: bitwise NOT, AND, OR.
*/
diff --git a/src/backend/utils/adt/mac8.c b/src/backend/utils/adt/mac8.c
index 482d1fb..0410b98 100644
--- a/src/backend/utils/adt/mac8.c
+++ b/src/backend/utils/adt/mac8.c
@@ -407,6 +407,15 @@ hashmacaddr8(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) key, sizeof(macaddr8));
}
+Datum
+hashmacaddr8extended(PG_FUNCTION_ARGS)
+{
+ macaddr8 *key = PG_GETARG_MACADDR8_P(0);
+
+ return hash_any_extended((unsigned char *) key, sizeof(macaddr8),
+ PG_GETARG_INT64(1));
+}
+
/*
* Arithmetic functions: bitwise NOT, AND, OR.
*/
diff --git a/src/backend/utils/adt/network.c b/src/backend/utils/adt/network.c
index 5573c34..e1d7c8d 100644
--- a/src/backend/utils/adt/network.c
+++ b/src/backend/utils/adt/network.c
@@ -486,6 +486,17 @@ hashinet(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) VARDATA_ANY(addr), addrsize + 2);
}
+Datum
+hashinetextended(PG_FUNCTION_ARGS)
+{
+ inet *addr = PG_GETARG_INET_PP(0);
+ int addrsize = ip_addrsize(addr);
+
+ /* XXX this assumes there are no pad bytes in the data structure */
+ return hash_any_extended((unsigned char *) VARDATA_ANY(addr), addrsize + 2,
+ PG_GETARG_INT64(1));
+}
+
/*
* Boolean network-inclusion tests.
*/
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 3e5614e..d880ec0 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -2230,6 +2230,84 @@ hash_numeric(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
+Datum
+hash_numeric_extended(PG_FUNCTION_ARGS)
+{
+ Numeric key = PG_GETARG_NUMERIC(0);
+ Datum digit_hash;
+ Datum result;
+ int weight;
+ int start_offset;
+ int end_offset;
+ int i;
+ int hash_len;
+ NumericDigit *digits;
+
+ /* If it's NaN, don't try to hash the rest of the fields */
+ if (NUMERIC_IS_NAN(key))
+ return UInt64GetDatum(0);
+
+ weight = NUMERIC_WEIGHT(key);
+ start_offset = 0;
+ end_offset = 0;
+
+ /*
+ * Omit any leading or trailing zeros from the input to the hash. The
+ * numeric implementation *should* guarantee that leading and trailing
+ * zeros are suppressed, but we're paranoid. Note that we measure the
+ * starting and ending offsets in units of NumericDigits, not bytes.
+ */
+ digits = NUMERIC_DIGITS(key);
+ for (i = 0; i < NUMERIC_NDIGITS(key); i++)
+ {
+ if (digits[i] != (NumericDigit) 0)
+ break;
+
+ start_offset++;
+
+ /*
+ * The weight is effectively the # of digits before the decimal point,
+ * so decrement it for each leading zero we skip.
+ */
+ weight--;
+ }
+
+ /*
+ * If there are no non-zero digits, then the value of the number is zero,
+ * regardless of any other fields.
+ */
+ if (NUMERIC_NDIGITS(key) == start_offset)
+ return UInt64GetDatum(-1);
+
+ for (i = NUMERIC_NDIGITS(key) - 1; i >= 0; i--)
+ {
+ if (digits[i] != (NumericDigit) 0)
+ break;
+
+ end_offset++;
+ }
+
+ /* If we get here, there should be at least one non-zero digit */
+ Assert(start_offset + end_offset < NUMERIC_NDIGITS(key));
+
+ /*
+ * Note that we don't hash on the Numeric's scale, since two numerics can
+ * compare equal but have different scales. We also don't hash on the
+ * sign, although we could: since a sign difference implies inequality,
+ * this shouldn't affect correctness.
+ */
+ hash_len = NUMERIC_NDIGITS(key) - start_offset - end_offset;
+ digit_hash = hash_any_extended((unsigned char *) (NUMERIC_DIGITS(key)
+ + start_offset),
+ hash_len * sizeof(NumericDigit),
+ PG_GETARG_INT64(1));
+
+ /* Mix in the weight, via XOR */
+ result = digit_hash ^ weight;
+
+ PG_RETURN_DATUM(result);
+}
+
/* ----------------------------------------------------------------------
*
diff --git a/src/backend/utils/adt/pg_lsn.c b/src/backend/utils/adt/pg_lsn.c
index aefbb87..c1795df 100644
--- a/src/backend/utils/adt/pg_lsn.c
+++ b/src/backend/utils/adt/pg_lsn.c
@@ -179,6 +179,13 @@ pg_lsn_hash(PG_FUNCTION_ARGS)
return hashint8(fcinfo);
}
+Datum
+pg_lsn_hash_extended(PG_FUNCTION_ARGS)
+{
+ /* We can use hashint8extended directly */
+ return hashint8extended(fcinfo);
+}
+
/*----------------------------------------------------------
* Arithmetic operators on PostgreSQL LSNs.
diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 09a4f14..813e09b 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -1280,6 +1280,73 @@ hash_range(PG_FUNCTION_ARGS)
PG_RETURN_INT32(result);
}
+Datum
+hash_range_extended(PG_FUNCTION_ARGS)
+{
+ RangeType *r = PG_GETARG_RANGE(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ uint64 result;
+ TypeCacheEntry *typcache;
+ TypeCacheEntry *scache;
+ RangeBound lower;
+ RangeBound upper;
+ bool empty;
+ char flags;
+ uint64 lower_hash;
+ uint64 upper_hash;
+ Oid rngtypid = RangeTypeGetOid(r);
+
+ check_stack_depth(); /* recurses when subtype is a range type */
+
+ typcache = range_get_typcache(fcinfo, rngtypid);
+
+ /* deserialize */
+ range_deserialize(typcache, r, &lower, &upper, &empty);
+ flags = range_get_flags(r);
+
+ /*
+ * Look up the element type's extended hash function, if not done already.
+ */
+ scache = typcache->rngelemtype;
+ if (!OidIsValid(scache->hash_extended_proc_finfo.fn_oid))
+ {
+ scache = lookup_type_cache(scache->type_id,
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO);
+ if (!OidIsValid(scache->hash_extended_proc_finfo.fn_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify an extended hash function for type %s",
+ format_type_be(scache->type_id))));
+ }
+
+ /*
+ * Apply the hash function to each bound.
+ */
+ if (RANGE_HAS_LBOUND(flags))
+ lower_hash = DatumGetUInt64(FunctionCall2Coll(&scache->hash_extended_proc_finfo,
+ typcache->rng_collation,
+ lower.val,
+ Int64GetDatum(seed)));
+ else
+ lower_hash = 0;
+
+ if (RANGE_HAS_UBOUND(flags))
+ upper_hash = DatumGetUInt64(FunctionCall2Coll(&scache->hash_extended_proc_finfo,
+ typcache->rng_collation,
+ upper.val,
+ Int64GetDatum(seed)));
+ else
+ upper_hash = 0;
+
+ /* Merge hashes of flags and bounds */
+ result = DatumGetUInt64(hash_uint32_extended((uint32) flags, seed));
+ result ^= lower_hash;
+ result = (result << 1) | (result >> 63);
+ result ^= upper_hash;
+
+ return UInt64GetDatum(result);
+}
+
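One property worth noting: a missing bound contributes a zero hash, but
since the flags byte is hashed in first, ranges that differ only in bound
presence -- say [10,20) versus [10,) -- still feed distinct inputs to the
hash.
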
/*
*----------------------------------------------------------
* CANONICAL FUNCTIONS
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 6fa126d..81530bf 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -2113,6 +2113,11 @@ timestamp_hash(PG_FUNCTION_ARGS)
return hashint8(fcinfo);
}
+Datum
+timestamp_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
/*
* Cross-type comparison functions for timestamp vs timestamptz
@@ -2419,6 +2424,25 @@ interval_hash(PG_FUNCTION_ARGS)
return DirectFunctionCall1(hashint8, Int64GetDatumFast(span64));
}
+Datum
+interval_hash_extended(PG_FUNCTION_ARGS)
+{
+ Interval *interval = PG_GETARG_INTERVAL_P(0);
+ INT128 span = interval_cmp_value(interval);
+ int64 span64;
+
+ /*
+ * Use only the least significant 64 bits for hashing. The upper 64 bits
+ * seldom add any useful information, and besides we must do it like this
+ * for compatibility with hashes calculated before use of INT128 was
+ * introduced.
+ */
+ span64 = int128_to_int64(span);
+
+ return DirectFunctionCall2(hashint8extended, Int64GetDatumFast(span64),
+ PG_GETARG_DATUM(1));
+}
+
/* overlaps_timestamp() --- implements the SQL OVERLAPS operator.
*
* Algorithm is per SQL spec. This is much harder than you'd think
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5f15c8e..f73c695 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -408,3 +408,11 @@ uuid_hash(PG_FUNCTION_ARGS)
return hash_any(key->data, UUID_LEN);
}
+
+Datum
+uuid_hash_extended(PG_FUNCTION_ARGS)
+{
+ pg_uuid_t *key = PG_GETARG_UUID_P(0);
+
+ return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
+}
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index cbc62b0..c0198d4 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -947,6 +947,25 @@ hashbpchar(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashbpcharextended(PG_FUNCTION_ARGS)
+{
+ BpChar *key = PG_GETARG_BPCHAR_PP(0);
+ char *keydata;
+ int keylen;
+ Datum result;
+
+ keydata = VARDATA_ANY(key);
+ keylen = bcTruelen(key);
+
+ result = hash_any_extended((unsigned char *) keydata, keylen,
+ PG_GETARG_INT64(1));
+
+ /* Avoid leaking memory for toasted inputs */
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
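
Since bcTruelen() ignores trailing blanks, a char(n) value hashes the
same as the corresponding text value -- e.g.
hashbpcharextended('PostgreSQL'::char(13), 0) equals
hashtextextended('PostgreSQL', 0) -- which is what the regression test in
the second patch shows for the unpadded case.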
/*
* The following operators support character-by-character comparison
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 82763f8..b7a14dc 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -490,8 +490,8 @@ get_compatible_hash_operators(Oid opno,
/*
* get_op_hash_functions
- * Get the OID(s) of hash support function(s) compatible with the given
- * operator, operating on its LHS and/or RHS datatype as required.
+ * Get the OID(s) of the standard hash support function(s) compatible with
+ * the given operator, operating on its LHS and/or RHS datatype as required.
*
* A function for the LHS type is sought and returned into *lhs_procno if
* lhs_procno isn't NULL. Similarly, a function for the RHS type is sought
@@ -542,7 +542,7 @@ get_op_hash_functions(Oid opno,
*lhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoplefttype,
aform->amoplefttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*lhs_procno))
continue;
/* Matching LHS found, done if caller doesn't want RHS */
@@ -564,7 +564,7 @@ get_op_hash_functions(Oid opno,
*rhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoprighttype,
aform->amoprighttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*rhs_procno))
{
/* Forget any LHS function from this opfamily */
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 691d498..f161a7f 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -90,6 +90,7 @@ static TypeCacheEntry *firstDomainTypeEntry = NULL;
#define TCFLAGS_HAVE_FIELD_EQUALITY 0x1000
#define TCFLAGS_HAVE_FIELD_COMPARE 0x2000
#define TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS 0x4000
+#define TCFLAGS_CHECKED_HASH_EXTENDED_PROC 0x8000
/*
* Data stored about a domain type's constraints. Note that we do not create
@@ -307,6 +308,8 @@ lookup_type_cache(Oid type_id, int flags)
flags |= TYPECACHE_HASH_OPFAMILY;
if ((flags & (TYPECACHE_HASH_PROC | TYPECACHE_HASH_PROC_FINFO |
+ TYPECACHE_HASH_EXTENDED_PROC |
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO |
TYPECACHE_HASH_OPFAMILY)) &&
!(typentry->flags & TCFLAGS_CHECKED_HASH_OPCLASS))
{
@@ -329,6 +332,7 @@ lookup_type_cache(Oid type_id, int flags)
* decision is still good.
*/
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
typentry->flags |= TCFLAGS_CHECKED_HASH_OPCLASS;
}
@@ -377,6 +381,7 @@ lookup_type_cache(Oid type_id, int flags)
* matches the operator.
*/
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
typentry->flags |= TCFLAGS_CHECKED_EQ_OPR;
}
if ((flags & TYPECACHE_LT_OPR) &&
@@ -467,7 +472,7 @@ lookup_type_cache(Oid type_id, int flags)
hash_proc = get_opfamily_proc(typentry->hash_opf,
typentry->hash_opintype,
typentry->hash_opintype,
- HASHPROC);
+ HASHSTANDARD_PROC);
/*
* As above, make sure hash_array will succeed. We don't currently
@@ -485,6 +490,43 @@ lookup_type_cache(Oid type_id, int flags)
typentry->hash_proc = hash_proc;
typentry->flags |= TCFLAGS_CHECKED_HASH_PROC;
}
+ if ((flags & (TYPECACHE_HASH_EXTENDED_PROC |
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO)) &&
+ !(typentry->flags & TCFLAGS_CHECKED_HASH_EXTENDED_PROC))
+ {
+ Oid hash_extended_proc = InvalidOid;
+
+ /*
+ * We insist that the eq_opr, if one has been determined, match the
+ * hash opclass; else report there is no hash function.
+ */
+ if (typentry->hash_opf != InvalidOid &&
+ (!OidIsValid(typentry->eq_opr) ||
+ typentry->eq_opr == get_opfamily_member(typentry->hash_opf,
+ typentry->hash_opintype,
+ typentry->hash_opintype,
+ HTEqualStrategyNumber)))
+ hash_extended_proc = get_opfamily_proc(typentry->hash_opf,
+ typentry->hash_opintype,
+ typentry->hash_opintype,
+ HASHEXTENDED_PROC);
+
+ /*
+ * As above, make sure hash_array_extended will succeed. We don't
+ * currently support hashing for composite types, but when we do,
+ * we'll need more logic here to check that case too.
+ */
+ if (hash_extended_proc == F_HASH_ARRAY_EXTENDED &&
+ !array_element_has_hashing(typentry))
+ hash_extended_proc = InvalidOid;
+
+ /* Force update of hash_extended_proc_finfo only if we're changing state */
+ if (typentry->hash_extended_proc != hash_extended_proc)
+ typentry->hash_extended_proc_finfo.fn_oid = InvalidOid;
+
+ typentry->hash_extended_proc = hash_extended_proc;
+ typentry->flags |= TCFLAGS_CHECKED_HASH_EXTENDED_PROC;
+ }
/*
* Set up fmgr lookup info as requested
@@ -523,6 +565,14 @@ lookup_type_cache(Oid type_id, int flags)
fmgr_info_cxt(typentry->hash_proc, &typentry->hash_proc_finfo,
CacheMemoryContext);
}
+ if ((flags & TYPECACHE_HASH_EXTENDED_PROC_FINFO) &&
+ typentry->hash_extended_proc_finfo.fn_oid == InvalidOid &&
+ typentry->hash_extended_proc != InvalidOid)
+ {
+ fmgr_info_cxt(typentry->hash_extended_proc,
+ &typentry->hash_extended_proc_finfo,
+ CacheMemoryContext);
+ }
/*
* If it's a composite type (row type), get tupdesc if requested
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 72fce30..d112ba3 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -289,12 +289,16 @@ typedef HashMetaPageData *HashMetaPage;
#define HTMaxStrategyNumber 1
/*
- * When a new operator class is declared, we require that the user supply
- * us with an amproc procudure for hashing a key of the new type.
- * Since we only have one such proc in amproc, it's number 1.
+ * When a new operator class is declared, we require that the user supply
+ * us with an amproc procedure for hashing a key of the new type, returning
+ * a 32-bit hash value. We call this the "standard" hash procedure. We
+ * also allow an optional "extended" hash procedure which accepts a salt and
+ * returns a 64-bit hash value. This is highly recommended but, for reasons
+ * of backward compatibility, optional.
*/
-#define HASHPROC 1
-#define HASHNProcs 1
+#define HASHSTANDARD_PROC 1
+#define HASHEXTENDED_PROC 2
+#define HASHNProcs 2
/* public routines */
@@ -322,7 +326,11 @@ extern bytea *hashoptions(Datum reloptions, bool validate);
extern bool hashvalidate(Oid opclassoid);
extern Datum hash_any(register const unsigned char *k, register int keylen);
+extern Datum hash_any_extended(register const unsigned char *k,
+				register int keylen,
+				uint64 seed);
extern Datum hash_uint32(uint32 k);
+extern Datum hash_uint32_extended(uint32 k, uint64 seed);
/* private routines */
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 7d245b1..fb6a829 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -153,41 +153,77 @@ DATA(insert ( 4033 3802 3802 1 4044 ));
/* hash */
DATA(insert ( 427 1042 1042 1 1080 ));
+DATA(insert ( 427 1042 1042 2 972 ));
DATA(insert ( 431 18 18 1 454 ));
+DATA(insert ( 431 18 18 2 446 ));
DATA(insert ( 435 1082 1082 1 450 ));
+DATA(insert ( 435 1082 1082 2 425 ));
DATA(insert ( 627 2277 2277 1 626 ));
+DATA(insert ( 627 2277 2277 2 782 ));
DATA(insert ( 1971 700 700 1 451 ));
+DATA(insert ( 1971 700 700 2 443 ));
DATA(insert ( 1971 701 701 1 452 ));
+DATA(insert ( 1971 701 701 2 444 ));
DATA(insert ( 1975 869 869 1 422 ));
+DATA(insert ( 1975 869 869 2 779 ));
DATA(insert ( 1977 21 21 1 449 ));
+DATA(insert ( 1977 21 21 2 441 ));
DATA(insert ( 1977 23 23 1 450 ));
+DATA(insert ( 1977 23 23 2 425 ));
DATA(insert ( 1977 20 20 1 949 ));
+DATA(insert ( 1977 20 20 2 442 ));
DATA(insert ( 1983 1186 1186 1 1697 ));
+DATA(insert ( 1983 1186 1186 2 3418 ));
DATA(insert ( 1985 829 829 1 399 ));
+DATA(insert ( 1985 829 829 2 778 ));
DATA(insert ( 1987 19 19 1 455 ));
+DATA(insert ( 1987 19 19 2 447 ));
DATA(insert ( 1990 26 26 1 453 ));
+DATA(insert ( 1990 26 26 2 445 ));
DATA(insert ( 1992 30 30 1 457 ));
+DATA(insert ( 1992 30 30 2 776 ));
DATA(insert ( 1995 25 25 1 400 ));
+DATA(insert ( 1995 25 25 2 448 ));
DATA(insert ( 1997 1083 1083 1 1688 ));
+DATA(insert ( 1997 1083 1083 2 3409 ));
DATA(insert ( 1998 1700 1700 1 432 ));
+DATA(insert ( 1998 1700 1700 2 780 ));
DATA(insert ( 1999 1184 1184 1 2039 ));
+DATA(insert ( 1999 1184 1184 2 3411 ));
DATA(insert ( 2001 1266 1266 1 1696 ));
+DATA(insert ( 2001 1266 1266 2 3410 ));
DATA(insert ( 2040 1114 1114 1 2039 ));
+DATA(insert ( 2040 1114 1114 2 3411 ));
DATA(insert ( 2222 16 16 1 454 ));
+DATA(insert ( 2222 16 16 2 446 ));
DATA(insert ( 2223 17 17 1 456 ));
+DATA(insert ( 2223 17 17 2 772 ));
DATA(insert ( 2225 28 28 1 450 ));
+DATA(insert ( 2225 28 28 2 425 ));
DATA(insert ( 2226 29 29 1 450 ));
+DATA(insert ( 2226 29 29 2 425 ));
DATA(insert ( 2227 702 702 1 450 ));
+DATA(insert ( 2227 702 702 2 425 ));
DATA(insert ( 2228 703 703 1 450 ));
+DATA(insert ( 2228 703 703 2 425 ));
DATA(insert ( 2229 25 25 1 400 ));
+DATA(insert ( 2229 25 25 2 448 ));
DATA(insert ( 2231 1042 1042 1 1080 ));
+DATA(insert ( 2231 1042 1042 2 972 ));
DATA(insert ( 2235 1033 1033 1 329 ));
+DATA(insert ( 2235 1033 1033 2 777 ));
DATA(insert ( 2969 2950 2950 1 2963 ));
+DATA(insert ( 2969 2950 2950 2 3412 ));
DATA(insert ( 3254 3220 3220 1 3252 ));
+DATA(insert ( 3254 3220 3220 2 3413 ));
DATA(insert ( 3372 774 774 1 328 ));
+DATA(insert ( 3372 774 774 2 781 ));
DATA(insert ( 3523 3500 3500 1 3515 ));
+DATA(insert ( 3523 3500 3500 2 3414 ));
DATA(insert ( 3903 3831 3831 1 3902 ));
+DATA(insert ( 3903 3831 3831 2 3417 ));
DATA(insert ( 4034 3802 3802 1 4045 ));
+DATA(insert ( 4034 3802 3802 2 3416 ));
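
For readers not fluent in pg_amproc: the DATA columns here are
(amprocfamily, amproclefttype, amprocrighttype, amprocnum, amproc), so
each of the new rows registers support procedure number 2 -- the
extended, seeded variant -- alongside the existing procedure number 1 in
the same opfamily.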
/* gist */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8b33b4e..d820b56 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -668,36 +668,68 @@ DESCR("convert char(n) to name");
DATA(insert OID = 449 ( hashint2 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "21" _null_ _null_ _null_ _null_ _null_ hashint2 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 441 ( hashint2extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "21 20" _null_ _null_ _null_ _null_ _null_ hashint2extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 450 ( hashint4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "23" _null_ _null_ _null_ _null_ _null_ hashint4 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 425 ( hashint4extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "23 20" _null_ _null_ _null_ _null_ _null_ hashint4extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 949 ( hashint8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "20" _null_ _null_ _null_ _null_ _null_ hashint8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 442 ( hashint8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "20 20" _null_ _null_ _null_ _null_ _null_ hashint8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 451 ( hashfloat4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "700" _null_ _null_ _null_ _null_ _null_ hashfloat4 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 443 ( hashfloat4extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "700 20" _null_ _null_ _null_ _null_ _null_ hashfloat4extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 452 ( hashfloat8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "701" _null_ _null_ _null_ _null_ _null_ hashfloat8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 444 ( hashfloat8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "701 20" _null_ _null_ _null_ _null_ _null_ hashfloat8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 453 ( hashoid PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "26" _null_ _null_ _null_ _null_ _null_ hashoid _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 445 ( hashoidextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "26 20" _null_ _null_ _null_ _null_ _null_ hashoidextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 454 ( hashchar PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "18" _null_ _null_ _null_ _null_ _null_ hashchar _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 446 ( hashcharextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "18 20" _null_ _null_ _null_ _null_ _null_ hashcharextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 455 ( hashname PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "19" _null_ _null_ _null_ _null_ _null_ hashname _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 447 ( hashnameextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "19 20" _null_ _null_ _null_ _null_ _null_ hashnameextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 400 ( hashtext PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "25" _null_ _null_ _null_ _null_ _null_ hashtext _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 448 ( hashtextextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "25 20" _null_ _null_ _null_ _null_ _null_ hashtextextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 456 ( hashvarlena PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2281" _null_ _null_ _null_ _null_ _null_ hashvarlena _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 772 ( hashvarlenaextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2281 20" _null_ _null_ _null_ _null_ _null_ hashvarlenaextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 457 ( hashoidvector PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "30" _null_ _null_ _null_ _null_ _null_ hashoidvector _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 776 ( hashoidvectorextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "30 20" _null_ _null_ _null_ _null_ _null_ hashoidvectorextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 329 ( hash_aclitem PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1033" _null_ _null_ _null_ _null_ _null_ hash_aclitem _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 777 ( hash_aclitem_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1033 20" _null_ _null_ _null_ _null_ _null_ hash_aclitem_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 399 ( hashmacaddr PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "829" _null_ _null_ _null_ _null_ _null_ hashmacaddr _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 778 ( hashmacaddrextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "829 20" _null_ _null_ _null_ _null_ _null_ hashmacaddrextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 422 ( hashinet PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "869" _null_ _null_ _null_ _null_ _null_ hashinet _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 779 ( hashinetextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "869 20" _null_ _null_ _null_ _null_ _null_ hashinetextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 432 ( hash_numeric PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1700" _null_ _null_ _null_ _null_ _null_ hash_numeric _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 780 ( hash_numeric_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1700 20" _null_ _null_ _null_ _null_ _null_ hash_numeric_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 328 ( hashmacaddr8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "774" _null_ _null_ _null_ _null_ _null_ hashmacaddr8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 781 ( hashmacaddr8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "774 20" _null_ _null_ _null_ _null_ _null_ hashmacaddr8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 438 ( num_nulls PGNSP PGUID 12 1 0 2276 0 f f f f f f i s 1 0 23 "2276" "{2276}" "{v}" _null_ _null_ _null_ pg_num_nulls _null_ _null_ _null_ ));
DESCR("count the number of NULL arguments");
@@ -747,6 +779,8 @@ DESCR("convert float8 to int8");
DATA(insert OID = 626 ( hash_array PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2277" _null_ _null_ _null_ _null_ _null_ hash_array _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 782 ( hash_array_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2277 20" _null_ _null_ _null_ _null_ _null_ hash_array_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 652 ( float4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 700 "20" _null_ _null_ _null_ _null_ _null_ i8tof _null_ _null_ _null_ ));
DESCR("convert int8 to float4");
@@ -1155,6 +1189,8 @@ DATA(insert OID = 3328 ( bpchar_sortsupport PGNSP PGUID 12 1 0 0 0 f f f f t f i
DESCR("sort support");
DATA(insert OID = 1080 ( hashbpchar PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1042" _null_ _null_ _null_ _null_ _null_ hashbpchar _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 972 ( hashbpcharextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1042 20" _null_ _null_ _null_ _null_ _null_ hashbpcharextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1081 ( format_type PGNSP PGUID 12 1 0 0 0 f f f f f f s s 2 0 25 "26 23" _null_ _null_ _null_ _null_ _null_ format_type _null_ _null_ _null_ ));
DESCR("format a type oid and atttypmod to canonical SQL");
DATA(insert OID = 1084 ( date_in PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 1082 "2275" _null_ _null_ _null_ _null_ _null_ date_in _null_ _null_ _null_ ));
@@ -2286,10 +2322,16 @@ DESCR("less-equal-greater");
DATA(insert OID = 1688 ( time_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1083" _null_ _null_ _null_ _null_ _null_ time_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3409 ( time_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1083 20" _null_ _null_ _null_ _null_ _null_ time_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1696 ( timetz_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1266" _null_ _null_ _null_ _null_ _null_ timetz_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3410 ( timetz_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1266 20" _null_ _null_ _null_ _null_ _null_ timetz_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1697 ( interval_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1186" _null_ _null_ _null_ _null_ _null_ interval_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3418 ( interval_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1186 20" _null_ _null_ _null_ _null_ _null_ interval_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* OID's 1700 - 1799 NUMERIC data type */
@@ -3078,6 +3120,8 @@ DATA(insert OID = 2038 ( timezone PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0
DESCR("adjust time with time zone to new zone");
DATA(insert OID = 2039 ( timestamp_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1114" _null_ _null_ _null_ _null_ _null_ timestamp_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3411 ( timestamp_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1114 20" _null_ _null_ _null_ _null_ _null_ timestamp_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 2041 ( overlaps PGNSP PGUID 12 1 0 0 0 f f f f f f i s 4 0 16 "1114 1114 1114 1114" _null_ _null_ _null_ _null_ _null_ overlaps_timestamp _null_ _null_ _null_ ));
DESCR("intervals overlap?");
DATA(insert OID = 2042 ( overlaps PGNSP PGUID 14 1 0 0 0 f f f f f f i s 4 0 16 "1114 1186 1114 1186" _null_ _null_ _null_ _null_ _null_ "select ($1, ($1 + $2)) overlaps ($3, ($3 + $4))" _null_ _null_ _null_ ));
@@ -4543,6 +4587,8 @@ DATA(insert OID = 2962 ( uuid_send PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1
DESCR("I/O");
DATA(insert OID = 2963 ( uuid_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2950" _null_ _null_ _null_ _null_ _null_ uuid_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3412 ( uuid_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2950 20" _null_ _null_ _null_ _null_ _null_ uuid_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* pg_lsn */
DATA(insert OID = 3229 ( pg_lsn_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 3220 "2275" _null_ _null_ _null_ _null_ _null_ pg_lsn_in _null_ _null_ _null_ ));
@@ -4564,6 +4610,8 @@ DATA(insert OID = 3251 ( pg_lsn_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0
DESCR("less-equal-greater");
DATA(insert OID = 3252 ( pg_lsn_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3220" _null_ _null_ _null_ _null_ _null_ pg_lsn_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3413 ( pg_lsn_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3220 20" _null_ _null_ _null_ _null_ _null_ pg_lsn_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* enum related procs */
DATA(insert OID = 3504 ( anyenum_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 3500 "2275" _null_ _null_ _null_ _null_ _null_ anyenum_in _null_ _null_ _null_ ));
@@ -4584,6 +4632,8 @@ DATA(insert OID = 3514 ( enum_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 2
DESCR("less-equal-greater");
DATA(insert OID = 3515 ( hashenum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3500" _null_ _null_ _null_ _null_ _null_ hashenum _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3414 ( hashenumextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3500 20" _null_ _null_ _null_ _null_ _null_ hashenumextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 3524 ( enum_smaller PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3500 "3500 3500" _null_ _null_ _null_ _null_ _null_ enum_smaller _null_ _null_ _null_ ));
DESCR("smaller of two");
DATA(insert OID = 3525 ( enum_larger PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3500 "3500 3500" _null_ _null_ _null_ _null_ _null_ enum_larger _null_ _null_ _null_ ));
@@ -4981,6 +5031,8 @@ DATA(insert OID = 4044 ( jsonb_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2
DESCR("less-equal-greater");
DATA(insert OID = 4045 ( jsonb_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3802" _null_ _null_ _null_ _null_ _null_ jsonb_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3416 ( jsonb_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3802 20" _null_ _null_ _null_ _null_ _null_ jsonb_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 4046 ( jsonb_contains PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 3802" _null_ _null_ _null_ _null_ _null_ jsonb_contains _null_ _null_ _null_ ));
DATA(insert OID = 4047 ( jsonb_exists PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 25" _null_ _null_ _null_ _null_ _null_ jsonb_exists _null_ _null_ _null_ ));
DATA(insert OID = 4048 ( jsonb_exists_any PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 1009" _null_ _null_ _null_ _null_ _null_ jsonb_exists_any _null_ _null_ _null_ ));
@@ -5171,6 +5223,8 @@ DATA(insert OID = 3881 ( range_gist_same PGNSP PGUID 12 1 0 0 0 f f f f t f i
DESCR("GiST support");
DATA(insert OID = 3902 ( hash_range PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3831" _null_ _null_ _null_ _null_ _null_ hash_range _null_ _null_ _null_ ));
DESCR("hash a range");
+DATA(insert OID = 3417 ( hash_range_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3831 20" _null_ _null_ _null_ _null_ _null_ hash_range_extended _null_ _null_ _null_ ));
+DESCR("hash a range");
DATA(insert OID = 3916 ( range_typanalyze PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 16 "2281" _null_ _null_ _null_ _null_ _null_ range_typanalyze _null_ _null_ _null_ ));
DESCR("range typanalyze");
DATA(insert OID = 3169 ( rangesel PGNSP PGUID 12 1 0 0 0 f f f f t f s s 4 0 701 "2281 26 2281 23" _null_ _null_ _null_ _null_ _null_ rangesel _null_ _null_ _null_ ));
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index ea9dd17..24f4916 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -370,6 +370,8 @@ extern Jsonb *JsonbValueToJsonb(JsonbValue *val);
extern bool JsonbDeepContains(JsonbIterator **val,
JsonbIterator **mContained);
extern void JsonbHashScalarValue(const JsonbValue *scalarVal, uint32 *hash);
+extern void JsonbHashScalarValueExtended(const JsonbValue *scalarVal,
+ uint64 *hash, uint64 seed);
/* jsonb.c support functions */
extern char *JsonbToCString(StringInfo out, JsonbContainer *in,
diff --git a/src/include/utils/typcache.h b/src/include/utils/typcache.h
index c12631d..b4f7592 100644
--- a/src/include/utils/typcache.h
+++ b/src/include/utils/typcache.h
@@ -56,6 +56,7 @@ typedef struct TypeCacheEntry
Oid gt_opr; /* the greater-than operator */
Oid cmp_proc; /* the btree comparison function */
Oid hash_proc; /* the hash calculation function */
+ Oid hash_extended_proc; /* the extended hash calculation function */
/*
* Pre-set-up fmgr call info for the equality operator, the btree
@@ -67,6 +68,7 @@ typedef struct TypeCacheEntry
FmgrInfo eq_opr_finfo;
FmgrInfo cmp_proc_finfo;
FmgrInfo hash_proc_finfo;
+ FmgrInfo hash_extended_proc_finfo;
/*
* Tuple descriptor if it's a composite type (row type). NULL if not
@@ -120,6 +122,8 @@ typedef struct TypeCacheEntry
#define TYPECACHE_HASH_OPFAMILY 0x0400
#define TYPECACHE_RANGE_INFO 0x0800
#define TYPECACHE_DOMAIN_INFO 0x1000
+#define TYPECACHE_HASH_EXTENDED_PROC 0x2000
+#define TYPECACHE_HASH_EXTENDED_PROC_FINFO 0x4000
/*
* Callers wishing to maintain a long-lived reference to a domain's constraint
diff --git a/src/test/regress/expected/alter_generic.out b/src/test/regress/expected/alter_generic.out
index 9f6ad4d..767c09b 100644
--- a/src/test/regress/expected/alter_generic.out
+++ b/src/test/regress/expected/alter_generic.out
@@ -421,7 +421,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf13 USING hash;
CREATE FUNCTION fn_opf13 (int4) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf13 USING hash ADD FUNCTION 1 fn_opf13(int4);
-ERROR: hash procedures must return integer
+ERROR: hash procedure 1 must return integer
DROP OPERATOR FAMILY alt_opf13 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
@@ -439,7 +439,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf15 USING hash;
CREATE FUNCTION fn_opf15 (int4, int2) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf15 USING hash ADD FUNCTION 1 fn_opf15(int4, int2);
-ERROR: hash procedures must have one argument
+ERROR: hash procedure 1 must have one argument
DROP OPERATOR FAMILY alt_opf15 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
--
2.6.2
0002-test-Hash_functions_wip.patch
From 52a34a845ee6eeedee7741f54858d0970e6a5b40 Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Tue, 22 Aug 2017 14:06:50 +0530
Subject: [PATCH 2/2] test-Hash_functions
---
src/test/regress/expected/hash_func.out | 167 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/hash_func.sql | 48 +++++++++
3 files changed, 216 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/hash_func.out
create mode 100644 src/test/regress/sql/hash_func.sql
diff --git a/src/test/regress/expected/hash_func.out b/src/test/regress/expected/hash_func.out
new file mode 100644
index 0000000..b926bbf
--- /dev/null
+++ b/src/test/regress/expected/hash_func.out
@@ -0,0 +1,167 @@
+--
+-- Test hash functions
+--
+SELECT hashint2(837::int2), hashint2extended(837::int2, 0), hashint2extended(837::int2, 6877457);
+ hashint2 | hashint2extended | hashint2extended
+------------+----------------------+----------------------
+ -828119691 | -7711785034027113099 | -1911879269318603603
+(1 row)
+
+SELECT hashint4(837), hashint4extended(837, 0), hashint4extended(837, 6877457);
+ hashint4 | hashint4extended | hashint4extended
+------------+----------------------+----------------------
+ -828119691 | -7711785034027113099 | -1911879269318603603
+(1 row)
+
+SELECT hashint8(837), hashint8extended(837, 0), hashint8extended(837, 6877457);
+ hashint8 | hashint8extended | hashint8extended
+------------+----------------------+----------------------
+ -828119691 | -7711785034027113099 | -1911879269318603603
+(1 row)
+
+SELECT hashfloat4(837.576), hashfloat4extended(837.576, 0), hashfloat4extended(837.576, 6877457);
+ hashfloat4 | hashfloat4extended | hashfloat4extended
+-------------+---------------------+----------------------
+ -1248506800 | 5445423874677492816 | -7666025229379580075
+(1 row)
+
+SELECT hashfloat8(837.576), hashfloat8extended(837.576, 0), hashfloat8extended(837.576, 6877457);
+ hashfloat8 | hashfloat8extended | hashfloat8extended
+------------+---------------------+---------------------
+ 22809674 | 6692304374340521034 | 3015132189621834995
+(1 row)
+
+SELECT hashoid(445), hashoidextended(445, 0), hashoidextended(445, 6877457);
+ hashoid | hashoidextended | hashoidextended
+----------+----------------------+----------------------
+ 89651167 | -3499471517977675809 | -6835125710780429670
+(1 row)
+
+SELECT hashchar('x'), hashcharextended('x', 0), hashcharextended('x', 6877457);
+ hashchar | hashcharextended | hashcharextended
+-------------+---------------------+----------------------
+ -1072653310 | -300249585504183294 | -1689974060427644062
+(1 row)
+
+SELECT hashname('PostgreSQL'), hashnameextended('PostgreSQL', 0), hashnameextended('PostgreSQL', 6877457);
+ hashname | hashnameextended | hashnameextended
+-------------+----------------------+----------------------
+ -1696465276 | -5770657681951818108 | -6237257254085129499
+(1 row)
+
+SELECT hashtext('PostgreSQL'), hashtextextended('PostgreSQL', 0), hashtextextended('PostgreSQL', 6877457);
+ hashtext | hashtextextended | hashtextextended
+-------------+----------------------+----------------------
+ -1696465276 | -5770657681951818108 | -6237257254085129499
+(1 row)
+
+--SELECT hashvarlena(internal_type??), hashvarlenaextended(internal type??, 0), hashvarlenaextended(internal type??, 6877457);
+SELECT hashoidvector('1 2 3 4 5 6 7 8'), hashoidvectorextended('1 2 3 4 5 6 7 8', 0),
+ hashoidvectorextended('1 2 3 4 5 6 7 8', 6877457);
+ hashoidvector | hashoidvectorextended | hashoidvectorextended
+---------------+-----------------------+-----------------------
+ 1162769288 | -8958777976966712440 | 5137409721967698705
+(1 row)
+
+-- SELECT hash_aclitem(relacl[1]), hash_aclitem_extended(relacl[1], 0), hash_aclitem_extended(relacl[1], 6877457)
+-- FROM pg_class LIMIT 1;
+SELECT hashmacaddr('08:00:2b:01:02:04'), hashmacaddrextended('08:00:2b:01:02:04', 0),
+ hashmacaddrextended('08:00:2b:01:02:04', 6877457);
+ hashmacaddr | hashmacaddrextended | hashmacaddrextended
+-------------+----------------------+----------------------
+ 1310037952 | -1891888588126840896 | -7646251794294615557
+(1 row)
+
+SELECT hashmacaddr8('08:00:2b:01:02:03:04:05'), hashmacaddr8extended('08:00:2b:01:02:03:04:05', 0),
+ hashmacaddr8extended('08:00:2b:01:02:03:04:05', 6877457);
+ hashmacaddr8 | hashmacaddr8extended | hashmacaddr8extended
+--------------+----------------------+----------------------
+ -445665214 | -5218702474190278590 | -230507670101060516
+(1 row)
+
+SELECT hashinet('192.168.100.128/25'), hashinetextended('192.168.100.128/25', 0),
+ hashinetextended('192.168.100.128/25', 6877457);
+ hashinet | hashinetextended | hashinetextended
+------------+---------------------+---------------------
+ 1612896565 | 8936096150078019893 | 6997678326324099600
+(1 row)
+
+SELECT hash_numeric(149484958.204628457), hash_numeric_extended(149484958.204628457, 0),
+ hash_numeric_extended(149484958.204628457, 6877457);
+ hash_numeric | hash_numeric_extended | hash_numeric_extended
+--------------+-----------------------+-----------------------
+ -744679049 | -7302423428854900361 | 1216807249137458925
+(1 row)
+
+SELECT hash_array('{1,2,3,4,5,6,7,8}'::int2[]), hash_array_extended('{1,2,3,4,5,6,7,8}'::int2[], 0),
+ hash_array_extended('{1,2,3,4,5,6,7,8}'::int2[], 6877457);
+ hash_array | hash_array_extended | hash_array_extended
+------------+----------------------+----------------------
+ 1332756414 | -2139327289023774786 | -5977231353903978197
+(1 row)
+
+SELECT hashbpchar('PostgreSQL'), hashbpcharextended('PostgreSQL', 0), hashbpcharextended('PostgreSQL', 6877457);
+ hashbpchar | hashbpcharextended | hashbpcharextended
+-------------+----------------------+----------------------
+ -1696465276 | -5770657681951818108 | -6237257254085129499
+(1 row)
+
+SELECT time_hash('11:09:59'), time_hash_extended('11:09:59', 0),
+ time_hash_extended('11:09:59', 6877457);
+ time_hash | time_hash_extended | time_hash_extended
+------------+---------------------+---------------------
+ 1740227977 | 6564483993255396745 | 8740506789988024383
+(1 row)
+
+SELECT timetz_hash('2017-08-22 00:11:52.518762-07'), timetz_hash_extended('2017-08-22 00:11:52.518762-07', 0),
+ timetz_hash_extended('2017-08-22 00:11:52.518762-07', 6877457);
+ timetz_hash | timetz_hash_extended | timetz_hash_extended
+-------------+----------------------+----------------------
+ -360566912 | 9030192975979884416 | 1447257070792174569
+(1 row)
+
+SELECT timestamp_hash('2017-08-22 00:09:59'), timestamp_hash_extended('2017-08-22 00:09:59', 0),
+ timestamp_hash_extended('2017-08-22 00:09:59', 6877457);
+ timestamp_hash | timestamp_hash_extended | timestamp_hash_extended
+----------------+-------------------------+-------------------------
+ -1450627509 | 580235490235067979 | 3037149474310020228
+(1 row)
+
+SELECT interval_hash('5 month 7 day 46 minutes'), interval_hash_extended('5 month 7 day 46 minutes', 0),
+ interval_hash_extended('5 month 7 day 46 minutes', 6877457);
+ interval_hash | interval_hash_extended | interval_hash_extended
+---------------+------------------------+------------------------
+ -1650268900 | -2083448633614473956 | 7239542030696170388
+(1 row)
+
+SELECT uuid_hash('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'), uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 0),
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 6877457);
+ uuid_hash | uuid_hash_extended | uuid_hash_extended
+------------+---------------------+---------------------
+ 1154666245 | 7543054272012930821 | 5854430665682797134
+(1 row)
+
+SELECT pg_lsn_hash('16/B374D84'), pg_lsn_hash_extended('16/B374D84', 0), pg_lsn_hash_extended('16/B374D84', 6877457);
+ pg_lsn_hash | pg_lsn_hash_extended | pg_lsn_hash_extended
+-------------+----------------------+----------------------
+ 783723155 | 7832933829336017555 | -483403495645773502
+(1 row)
+
+--CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
+--SELECT hashenum('happy'::mood), hashenumextended('happy'::mood, 0), hashenumextended('happy'::mood, 6877457);
+--DROP TYPE mood;
+SELECT jsonb_hash('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'),
+ jsonb_hash_extended('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}', 0),
+ jsonb_hash_extended('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}', 6877457);
+ jsonb_hash | jsonb_hash_extended | jsonb_hash_extended
+------------+---------------------+----------------------
+ -258638552 | 6657421403376102465 | -5342236833681234663
+(1 row)
+
+SELECT hash_range(int4range(10, 20)), hash_range_extended(int4range(10, 20), 0),
+ hash_range_extended(int4range(10, 20), 6877457);
+ hash_range | hash_range_extended | hash_range_extended
+------------+----------------------+----------------------
+ 1202375768 | -4361814294641587112 | -5843541585236977674
+(1 row)
+
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index eefdeea..2fd3f2b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/sql/hash_func.sql b/src/test/regress/sql/hash_func.sql
new file mode 100644
index 0000000..7d751be
--- /dev/null
+++ b/src/test/regress/sql/hash_func.sql
@@ -0,0 +1,48 @@
+--
+-- Test hash functions
+--
+
+SELECT hashint2(837::int2), hashint2extended(837::int2, 0), hashint2extended(837::int2, 6877457);
+SELECT hashint4(837), hashint4extended(837, 0), hashint4extended(837, 6877457);
+SELECT hashint8(837), hashint8extended(837, 0), hashint8extended(837, 6877457);
+SELECT hashfloat4(837.576), hashfloat4extended(837.576, 0), hashfloat4extended(837.576, 6877457);
+SELECT hashfloat8(837.576), hashfloat8extended(837.576, 0), hashfloat8extended(837.576, 6877457);
+SELECT hashoid(445), hashoidextended(445, 0), hashoidextended(445, 6877457);
+SELECT hashchar('x'), hashcharextended('x', 0), hashcharextended('x', 6877457);
+SELECT hashname('PostgreSQL'), hashnameextended('PostgreSQL', 0), hashnameextended('PostgreSQL', 6877457);
+SELECT hashtext('PostgreSQL'), hashtextextended('PostgreSQL', 0), hashtextextended('PostgreSQL', 6877457);
+--SELECT hashvarlena(internal_type??), hashvarlenaextended(internal type??, 0), hashvarlenaextended(internal type??, 6877457);
+SELECT hashoidvector('1 2 3 4 5 6 7 8'), hashoidvectorextended('1 2 3 4 5 6 7 8', 0),
+ hashoidvectorextended('1 2 3 4 5 6 7 8', 6877457);
+-- SELECT hash_aclitem(relacl[1]), hash_aclitem_extended(relacl[1], 0), hash_aclitem_extended(relacl[1], 6877457)
+-- FROM pg_class LIMIT 1;
+SELECT hashmacaddr('08:00:2b:01:02:04'), hashmacaddrextended('08:00:2b:01:02:04', 0),
+ hashmacaddrextended('08:00:2b:01:02:04', 6877457);
+SELECT hashmacaddr8('08:00:2b:01:02:03:04:05'), hashmacaddr8extended('08:00:2b:01:02:03:04:05', 0),
+ hashmacaddr8extended('08:00:2b:01:02:03:04:05', 6877457);
+SELECT hashinet('192.168.100.128/25'), hashinetextended('192.168.100.128/25', 0),
+ hashinetextended('192.168.100.128/25', 6877457);
+SELECT hash_numeric(149484958.204628457), hash_numeric_extended(149484958.204628457, 0),
+ hash_numeric_extended(149484958.204628457, 6877457);
+SELECT hash_array('{1,2,3,4,5,6,7,8}'::int2[]), hash_array_extended('{1,2,3,4,5,6,7,8}'::int2[], 0),
+ hash_array_extended('{1,2,3,4,5,6,7,8}'::int2[], 6877457);
+SELECT hashbpchar('PostgreSQL'), hashbpcharextended('PostgreSQL', 0), hashbpcharextended('PostgreSQL', 6877457);
+SELECT time_hash('11:09:59'), time_hash_extended('11:09:59', 0),
+ time_hash_extended('11:09:59', 6877457);
+SELECT timetz_hash('2017-08-22 00:11:52.518762-07'), timetz_hash_extended('2017-08-22 00:11:52.518762-07', 0),
+ timetz_hash_extended('2017-08-22 00:11:52.518762-07', 6877457);
+SELECT timestamp_hash('2017-08-22 00:09:59'), timestamp_hash_extended('2017-08-22 00:09:59', 0),
+ timestamp_hash_extended('2017-08-22 00:09:59', 6877457);
+SELECT interval_hash('5 month 7 day 46 minutes'), interval_hash_extended('5 month 7 day 46 minutes', 0),
+ interval_hash_extended('5 month 7 day 46 minutes', 6877457);
+SELECT uuid_hash('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'), uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 0),
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 6877457);
+SELECT pg_lsn_hash('16/B374D84'), pg_lsn_hash_extended('16/B374D84', 0), pg_lsn_hash_extended('16/B374D84', 6877457);
+--CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
+--SELECT hashenum('happy'::mood), hashenumextended('happy'::mood, 0), hashenumextended('happy'::mood, 6877457);
+--DROP TYPE mood;
+SELECT jsonb_hash('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'),
+ jsonb_hash_extended('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}', 0),
+ jsonb_hash_extended('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}', 6877457);
+SELECT hash_range(int4range(10, 20)), hash_range_extended(int4range(10, 20), 0),
+ hash_range_extended(int4range(10, 20), 6877457);
--
2.6.2
On Tue, Aug 22, 2017 at 8:14 AM, amul sul <sulamul@gmail.com> wrote:
Attaching patch 0002 for the reviewer's testing.
I think that this 0002 is not something we can think of committing
because there's no guarantee that hash functions will return the same
results on all platforms. However, what we could and, I think, should
do is hash some representative values of each data type and verify
that hash(x) and hashextended(x, 0) come out the same at least as to
the low-order 32 bits -- and maybe that hashextended(x, 1) comes out
different as to the low-order 32 bits. The easiest way to do this
seems to be to cast to bit(32). For example:
SELECT v, hashint4(v)::bit(32) as standard,
hashint4extended(v, 0)::bit(32) as extended0,
hashint4extended(v, 1)::bit(32) as extended1
FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
I suggest adding a version of this for each data type.
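For instance, the analogous check for text might look like this (the sample
values here are arbitrary, chosen only for illustration):
SELECT v, hashtext(v)::bit(32) as standard,
       hashtextextended(v, 0)::bit(32) as extended0,
       hashtextextended(v, 1)::bit(32) as extended1
FROM (VALUES (''), ('a'), ('abcd'), ('PostgreSQL')) x(v)
WHERE hashtext(v)::bit(32) != hashtextextended(v, 0)::bit(32)
   OR hashtext(v)::bit(32) = hashtextextended(v, 1)::bit(32);
A query of this shape should return no rows when the extended function
behaves as intended.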
From your latest version of 0001, I get:
rangetypes.c:1297:8: error: unused variable 'rngtypid'
[-Werror,-Wunused-variable]
Oid rngtypid = RangeTypeGetOid(r);
I suggest not duplicating the comments from each regular function into
the extended function, but just writing /* Same approach as hashfloat8
*/ when the implementation is non-trivial (no need for this if the
extended function is a single line or the original function had no
comments anyway).
hash_aclitem seems embarrassingly stupid. I suggest that we make the
extended version slightly less stupid -- i.e. if the seed is non-zero,
actually call hash_any_extended on the sum and pass the seed through.
* Reset info about hash function whenever we pick up new info about
* equality operator. This is so we can ensure that the hash function
* matches the operator.
*/
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
Adjust comment: "about hash function" -> "about hash functions", "hash
functions matches" -> "hash functions match".
+extern Datum
+hash_any_extended(register const unsigned char *k, register int
+ keylen, uint64 seed);
Formatting.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 29, 2017 at 11:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 22, 2017 at 8:14 AM, amul sul <sulamul@gmail.com> wrote:
Attaching patch 0002 for the reviewer's testing.
I think that this 0002 is not something we can think of committing
because there's no guarantee that hash functions will return the same
results on all platforms. However, what we could and, I think, should
do is hash some representative values of each data type and verify
that hash(x) and hashextended(x, 0) come out the same at least as to
the low-order 32 bits -- and maybe that hashextended(x, 1) comes out
different as to the low-order 32 bits. The easiest way to do this
seems to be to cast to bit(32). For example:
SELECT v, hashint4(v)::bit(32) as standard,
hashint4extended(v, 0)::bit(32) as extended0,
hashint4extended(v, 1)::bit(32) as extended1
FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
I suggest adding a version of this for each data type.
Thanks for the suggestion, I have updated 0002-patch accordingly.
Using this I found some strange behaviours, as follows:
1) The standard and extended0 outputs for the jsonb_hash case are not the same.
2) The standard and extended0 outputs for the hash_range case are not the same
when the input is int4range(550274, 1550274); the other cases in the patch are
fine. This is reproducible with other values as well, e.g.
int8range(1570275, 208112489).
Will look into this tomorrow.
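For reference, the failing hash_range case can be rechecked with a query of
the same bit(32) shape (illustrative only; per the above, the two columns
come out different under the current patch):
SELECT hash_range(int4range(550274, 1550274))::bit(32) as standard,
       hash_range_extended(int4range(550274, 1550274), 0)::bit(32) as extended0;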
Another case I want to discuss: the extended and standard versions of the
hashfloat4, hashfloat8 & hash_numeric functions will have the same output for
a zero value irrespective of the seed value. Do you think we need a fix for
this?
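To illustrate, under this patch a query like the following returns the same
value in both columns, because the zero special case returns before the seed
is ever mixed in:
SELECT hashfloat8extended(0.0, 0) as seed0,
       hashfloat8extended(0.0, 6877457) as seed6877457;
-- both columns come out 0 here; hashfloat4extended behaves the same way,
-- and hash_numeric_extended returns -1 for a zero value regardless of seed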
From your latest version of 0001, I get:
rangetypes.c:1297:8: error: unused variable 'rngtypid'
[-Werror,-Wunused-variable]
Oid rngtypid = RangeTypeGetOid(r);
Fixed in the attached version.
I suggest not duplicating the comments from each regular function into
the extended function, but just writing /* Same approach as hashfloat8
*/ when the implementation is non-trivial (no need for this if the
extended function is a single line or the original function had no
comments anyway).
Fixed in the attached version.
hash_aclitem seems embarrassingly stupid. I suggest that we make the
extended version slightly less stupid -- i.e. if the seed is non-zero,
actually call hash_any_extended on the sum and pass the seed through.
* Reset info about hash function whenever we pick up new info about
* equality operator. This is so we can ensure that the hash function
* matches the operator.
*/
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
Adjust comment: "about hash function" -> "about hash functions", "hash
functions matches" -> "hash functions match".
Fixed in the attached version.
+extern Datum
+hash_any_extended(register const unsigned char *k, register int
+ keylen, uint64 seed);
Formatting.
Fixed in the attached version.
Thanks !
Regards,
Amul
Attachments:
0001-add-optional-second-hash-proc-v3.patch
From 300c0b356baeb02766eb7d33b477927648b65fb1 Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Fri, 18 Aug 2017 15:28:26 +0530
Subject: [PATCH 1/2] add-optional-second-hash-proc-v3
v3:
Updated w.r.t. Robert's suggestion in
message-id : CA%2BTgmoYPvoTMGJYkVBA%3D2j1o1wpZ-WZCYiS7_B-AqqZBkWT4HQ%40mail.gmail.com
v2:
Extended remaining hash function.
v1:
Patch by Robert Haas.
---
doc/src/sgml/xindex.sgml | 11 +-
src/backend/access/hash/hashfunc.c | 372 +++++++++++++++++++++++++++-
src/backend/access/hash/hashpage.c | 2 +-
src/backend/access/hash/hashutil.c | 6 +-
src/backend/access/hash/hashvalidate.c | 42 +++-
src/backend/commands/opclasscmds.c | 34 ++-
src/backend/utils/adt/acl.c | 15 ++
src/backend/utils/adt/arrayfuncs.c | 79 ++++++
src/backend/utils/adt/date.c | 21 ++
src/backend/utils/adt/jsonb_op.c | 42 ++++
src/backend/utils/adt/jsonb_util.c | 40 +++
src/backend/utils/adt/mac.c | 9 +
src/backend/utils/adt/mac8.c | 9 +
src/backend/utils/adt/network.c | 10 +
src/backend/utils/adt/numeric.c | 59 +++++
src/backend/utils/adt/pg_lsn.c | 6 +
src/backend/utils/adt/rangetypes.c | 63 +++++
src/backend/utils/adt/timestamp.c | 19 ++
src/backend/utils/adt/uuid.c | 8 +
src/backend/utils/adt/varchar.c | 18 ++
src/backend/utils/cache/lsyscache.c | 8 +-
src/backend/utils/cache/typcache.c | 58 ++++-
src/include/access/hash.h | 17 +-
src/include/catalog/pg_amproc.h | 36 +++
src/include/catalog/pg_proc.h | 54 ++++
src/include/utils/jsonb.h | 2 +
src/include/utils/typcache.h | 4 +
src/test/regress/expected/alter_generic.out | 4 +-
28 files changed, 1008 insertions(+), 40 deletions(-)
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 333a36c..0f3c46b 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -436,7 +436,8 @@
</table>
<para>
- Hash indexes require one support function, shown in <xref
+ Hash indexes require one support function, and allow a second one to be
+ supplied at the operator class author's option, as shown in <xref
linkend="xindex-hash-support-table">.
</para>
@@ -451,9 +452,15 @@
</thead>
<tbody>
<row>
- <entry>Compute the hash value for a key</entry>
+ <entry>Compute the 32-bit hash value for a key</entry>
<entry>1</entry>
</row>
+ <row>
+ <entry>
+ Compute the 64-bit hash value for a key given a 64-bit salt
+ </entry>
+ <entry>2</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index a127f3f..998f9fe 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -47,18 +47,36 @@ hashchar(PG_FUNCTION_ARGS)
}
Datum
+hashcharextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((int32) PG_GETARG_CHAR(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint2(PG_FUNCTION_ARGS)
{
return hash_uint32((int32) PG_GETARG_INT16(0));
}
Datum
+hashint2extended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((int32) PG_GETARG_INT16(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint4(PG_FUNCTION_ARGS)
{
return hash_uint32(PG_GETARG_INT32(0));
}
Datum
+hashint4extended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended(PG_GETARG_INT32(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint8(PG_FUNCTION_ARGS)
{
/*
@@ -79,18 +97,43 @@ hashint8(PG_FUNCTION_ARGS)
}
Datum
+hashint8extended(PG_FUNCTION_ARGS)
+{
+ /* Same approach as hashint8 */
+ int64 val = PG_GETARG_INT64(0);
+ uint32 lohalf = (uint32) val;
+ uint32 hihalf = (uint32) (val >> 32);
+
+ lohalf ^= (val >= 0) ? hihalf : ~hihalf;
+
+ return hash_uint32_extended(lohalf, PG_GETARG_INT64(1));
+}
+
+Datum
hashoid(PG_FUNCTION_ARGS)
{
return hash_uint32((uint32) PG_GETARG_OID(0));
}
Datum
+hashoidextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((uint32) PG_GETARG_OID(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashenum(PG_FUNCTION_ARGS)
{
return hash_uint32((uint32) PG_GETARG_OID(0));
}
Datum
+hashenumextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((uint32) PG_GETARG_OID(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashfloat4(PG_FUNCTION_ARGS)
{
float4 key = PG_GETARG_FLOAT4(0);
@@ -117,6 +160,21 @@ hashfloat4(PG_FUNCTION_ARGS)
}
Datum
+hashfloat4extended(PG_FUNCTION_ARGS)
+{
+ float4 key = PG_GETARG_FLOAT4(0);
+ float8 key8;
+
+ /* Same approach as hashfloat4 */
+ if (key == (float4) 0)
+ return UInt64GetDatum(0);
+ key8 = key;
+
+ return hash_any_extended((unsigned char *) &key8, sizeof(key8),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashfloat8(PG_FUNCTION_ARGS)
{
float8 key = PG_GETARG_FLOAT8(0);
@@ -133,6 +191,19 @@ hashfloat8(PG_FUNCTION_ARGS)
}
Datum
+hashfloat8extended(PG_FUNCTION_ARGS)
+{
+ float8 key = PG_GETARG_FLOAT8(0);
+
+ /* Same approach as hashfloat8 */
+ if (key == (float8) 0)
+ return UInt64GetDatum(0);
+
+ return hash_any_extended((unsigned char *) &key, sizeof(key),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashoidvector(PG_FUNCTION_ARGS)
{
oidvector *key = (oidvector *) PG_GETARG_POINTER(0);
@@ -141,6 +212,16 @@ hashoidvector(PG_FUNCTION_ARGS)
}
Datum
+hashoidvectorextended(PG_FUNCTION_ARGS)
+{
+ oidvector *key = (oidvector *) PG_GETARG_POINTER(0);
+
+ return hash_any_extended((unsigned char *) key->values,
+ key->dim1 * sizeof(Oid),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashname(PG_FUNCTION_ARGS)
{
char *key = NameStr(*PG_GETARG_NAME(0));
@@ -149,6 +230,15 @@ hashname(PG_FUNCTION_ARGS)
}
Datum
+hashnameextended(PG_FUNCTION_ARGS)
+{
+ char *key = NameStr(*PG_GETARG_NAME(0));
+
+ return hash_any_extended((unsigned char *) key, strlen(key),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashtext(PG_FUNCTION_ARGS)
{
text *key = PG_GETARG_TEXT_PP(0);
@@ -168,6 +258,22 @@ hashtext(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashtextextended(PG_FUNCTION_ARGS)
+{
+ text *key = PG_GETARG_TEXT_PP(0);
+ Datum result;
+
+ /* Same approach as hashtext */
+ result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
+ VARSIZE_ANY_EXHDR(key),
+ PG_GETARG_INT64(1));
+
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
+
/*
* hashvarlena() can be used for any varlena datatype in which there are
* no non-significant bits, ie, distinct bitpatterns never compare as equal.
@@ -187,6 +293,21 @@ hashvarlena(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashvarlenaextended(PG_FUNCTION_ARGS)
+{
+ struct varlena *key = PG_GETARG_VARLENA_PP(0);
+ Datum result;
+
+ result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
+ VARSIZE_ANY_EXHDR(key),
+ PG_GETARG_INT64(1));
+
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
+
/*
* This hash function was written by Bob Jenkins
* (bob_jenkins@burtleburtle.net), and superficially adapted
@@ -502,7 +623,227 @@ hash_any(register const unsigned char *k, register int keylen)
}
/*
- * hash_uint32() -- hash a 32-bit value
+ * hash_any_extended() -- hash into a 64-bit value, using an optional seed
+ * k : the key (the unaligned variable-length array of bytes)
+ * len : the length of the key, counting by bytes
+ * seed : a 64-bit seed (0 means no seed)
+ *
+ * Returns a uint64 value. Otherwise similar to hash_any.
+ */
+Datum
+hash_any_extended(register const unsigned char *k, register int keylen,
+ uint64 seed)
+{
+ register uint32 a,
+ b,
+ c,
+ len;
+
+ /* Set up the internal state */
+ len = keylen;
+ a = b = c = 0x9e3779b9 + len + 3923095;
+
+ /* If the seed is non-zero, use it to perturb the internal state. */
+ if (seed != 0)
+ {
+ /*
+ * In essence, the seed is treated as part of the data being hashed,
+ * but for simplicity, we pretend that it's padded with four bytes of
+ * zeroes so that the seed constitutes a 12-byte chunk.
+ */
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ /* If the source pointer is word-aligned, we use word-wide fetches */
+ if (((uintptr_t) k & UINT32_ALIGN_MASK) == 0)
+ {
+ /* Code path for aligned source data */
+ register const uint32 *ka = (const uint32 *) k;
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+ a += ka[0];
+ b += ka[1];
+ c += ka[2];
+ mix(a, b, c);
+ ka += 3;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+ k = (const unsigned char *) ka;
+#ifdef WORDS_BIGENDIAN
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 8);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 16);
+ /* fall through */
+ case 5:
+ b += ((uint32) k[4] << 24);
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 8);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 16);
+ /* fall through */
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 16);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 8);
+ /* fall through */
+ case 5:
+ b += k[4];
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 16);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 8);
+ /* fall through */
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+ else
+ {
+ /* Code path for non-aligned source data */
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+#ifdef WORDS_BIGENDIAN
+ a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));
+ b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));
+ c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));
+#else /* !WORDS_BIGENDIAN */
+ a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
+ b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
+ c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
+#endif /* WORDS_BIGENDIAN */
+ mix(a, b, c);
+ k += 12;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+#ifdef WORDS_BIGENDIAN
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += k[7];
+ case 7:
+ b += ((uint32) k[6] << 8);
+ case 6:
+ b += ((uint32) k[5] << 16);
+ case 5:
+ b += ((uint32) k[4] << 24);
+ case 4:
+ a += k[3];
+ case 3:
+ a += ((uint32) k[2] << 8);
+ case 2:
+ a += ((uint32) k[1] << 16);
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += ((uint32) k[7] << 24);
+ case 7:
+ b += ((uint32) k[6] << 16);
+ case 6:
+ b += ((uint32) k[5] << 8);
+ case 5:
+ b += k[4];
+ case 4:
+ a += ((uint32) k[3] << 24);
+ case 3:
+ a += ((uint32) k[2] << 16);
+ case 2:
+ a += ((uint32) k[1] << 8);
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+
+ final(a, b, c);
+
+ /* report the result */
+ return UInt64GetDatum(((uint64) b << 32) | c);
+}
+
+/*
+ * hash_uint32() -- hash a 32-bit value to a 32-bit value
*
* This has the same result as
* hash_any(&k, sizeof(uint32))
@@ -523,3 +864,32 @@ hash_uint32(uint32 k)
/* report the result */
return UInt32GetDatum(c);
}
+
+/*
+ * hash_uint32_extended() -- hash a 32-bit value to a 64-bit value, with a seed
+ *
+ * Like hash_uint32, this is a convenience function.
+ */
+Datum
+hash_uint32_extended(uint32 k, uint64 seed)
+{
+ register uint32 a,
+ b,
+ c;
+
+ a = b = c = 0x9e3779b9 + (uint32) sizeof(uint32) + 3923095;
+
+ if (seed != 0)
+ {
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ a += k;
+
+ final(a, b, c);
+
+ /* report the result */
+ return UInt64GetDatum(((uint64) b << 32) | c);
+}
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 7b2906b..0579841 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -373,7 +373,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
if (ffactor < 10)
ffactor = 10;
- procid = index_getprocid(rel, 1, HASHPROC);
+ procid = index_getprocid(rel, 1, HASHSTANDARD_PROC);
/*
* We initialize the metapage, the first N bucket pages, and the first
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 9b803af..869cbc1 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -85,7 +85,7 @@ _hash_datum2hashkey(Relation rel, Datum key)
Oid collation;
/* XXX assumes index has only one attribute */
- procinfo = index_getprocinfo(rel, 1, HASHPROC);
+ procinfo = index_getprocinfo(rel, 1, HASHSTANDARD_PROC);
collation = rel->rd_indcollation[0];
return DatumGetUInt32(FunctionCall1Coll(procinfo, collation, key));
@@ -108,10 +108,10 @@ _hash_datum2hashkey_type(Relation rel, Datum key, Oid keytype)
hash_proc = get_opfamily_proc(rel->rd_opfamily[0],
keytype,
keytype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!RegProcedureIsValid(hash_proc))
elog(ERROR, "missing support function %d(%u,%u) for index \"%s\"",
- HASHPROC, keytype, keytype,
+ HASHSTANDARD_PROC, keytype, keytype,
RelationGetRelationName(rel));
collation = rel->rd_indcollation[0];
diff --git a/src/backend/access/hash/hashvalidate.c b/src/backend/access/hash/hashvalidate.c
index 30b29cb..8b633c2 100644
--- a/src/backend/access/hash/hashvalidate.c
+++ b/src/backend/access/hash/hashvalidate.c
@@ -29,7 +29,7 @@
#include "utils/syscache.h"
-static bool check_hash_func_signature(Oid funcid, Oid restype, Oid argtype);
+static bool check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype);
/*
@@ -105,8 +105,9 @@ hashvalidate(Oid opclassoid)
/* Check procedure numbers and function signatures */
switch (procform->amprocnum)
{
- case HASHPROC:
- if (!check_hash_func_signature(procform->amproc, INT4OID,
+ case HASHSTANDARD_PROC:
+ case HASHEXTENDED_PROC:
+ if (!check_hash_func_signature(procform->amproc, procform->amprocnum,
procform->amproclefttype))
{
ereport(INFO,
@@ -264,19 +265,37 @@ hashvalidate(Oid opclassoid)
* hacks in the core hash opclass definitions.
*/
static bool
-check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
+check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype)
{
bool result = true;
+ Oid restype;
+ int16 nargs;
HeapTuple tp;
Form_pg_proc procform;
+ switch (amprocnum)
+ {
+ case HASHSTANDARD_PROC:
+ restype = INT4OID;
+ nargs = 1;
+ break;
+
+ case HASHEXTENDED_PROC:
+ restype = INT8OID;
+ nargs = 2;
+ break;
+
+ default:
+ elog(ERROR, "invalid amprocnum");
+ }
+
tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
if (!HeapTupleIsValid(tp))
elog(ERROR, "cache lookup failed for function %u", funcid);
procform = (Form_pg_proc) GETSTRUCT(tp);
if (procform->prorettype != restype || procform->proretset ||
- procform->pronargs != 1)
+ procform->pronargs != nargs)
result = false;
if (!IsBinaryCoercible(argtype, procform->proargtypes.values[0]))
@@ -290,24 +309,29 @@ check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
* identity, not just its input type, because hashvarlena() takes
* INTERNAL and allowing any such function seems too scary.
*/
- if (funcid == F_HASHINT4 &&
+ if ((funcid == F_HASHINT4 || funcid == F_HASHINT4EXTENDED) &&
(argtype == DATEOID ||
argtype == ABSTIMEOID || argtype == RELTIMEOID ||
argtype == XIDOID || argtype == CIDOID))
/* okay, allowed use of hashint4() */ ;
- else if (funcid == F_TIMESTAMP_HASH &&
+ else if ((funcid == F_TIMESTAMP_HASH ||
+ funcid == F_TIMESTAMP_HASH_EXTENDED) &&
argtype == TIMESTAMPTZOID)
/* okay, allowed use of timestamp_hash() */ ;
- else if (funcid == F_HASHCHAR &&
+ else if ((funcid == F_HASHCHAR || funcid == F_HASHCHAREXTENDED) &&
argtype == BOOLOID)
/* okay, allowed use of hashchar() */ ;
- else if (funcid == F_HASHVARLENA &&
+ else if ((funcid == F_HASHVARLENA || funcid == F_HASHVARLENAEXTENDED) &&
argtype == BYTEAOID)
/* okay, allowed use of hashvarlena() */ ;
else
result = false;
}
+ /* If function takes a second argument, it must be for a 64-bit salt. */
+ if (nargs == 2 && procform->proargtypes.values[1] != INT8OID)
+ result = false;
+
ReleaseSysCache(tp);
return result;
}
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index a31b1ac..d23e6d6 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -18,6 +18,7 @@
#include <limits.h>
#include "access/genam.h"
+#include "access/hash.h"
#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/htup_details.h"
@@ -1129,7 +1130,8 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
/*
* btree comparison procs must be 2-arg procs returning int4, while btree
* sortsupport procs must take internal and return void. hash support
- * procs must be 1-arg procs returning int4. Otherwise we don't know.
+ * proc 1 must be a 1-arg proc returning int4, while proc 2 must be a
+ * 2-arg proc returning int8. Otherwise we don't know.
*/
if (amoid == BTREE_AM_OID)
{
@@ -1172,14 +1174,28 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
}
else if (amoid == HASH_AM_OID)
{
- if (procform->pronargs != 1)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must have one argument")));
- if (procform->prorettype != INT4OID)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must return integer")));
+ if (member->number == HASHSTANDARD_PROC)
+ {
+ if (procform->pronargs != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must have one argument")));
+ if (procform->prorettype != INT4OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must return integer")));
+ }
+ else if (member->number == HASHEXTENDED_PROC)
+ {
+ if (procform->pronargs != 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must have two arguments")));
+ if (procform->prorettype != INT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must return bigint")));
+ }
/*
* If lefttype/righttype isn't specified, use the proc's input type
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 2efb6c9..917491e 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -16,6 +16,7 @@
#include <ctype.h>
+#include "access/hash.h"
#include "access/htup_details.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
@@ -717,6 +718,20 @@ hash_aclitem(PG_FUNCTION_ARGS)
PG_RETURN_UINT32((uint32) (a->ai_privs + a->ai_grantee + a->ai_grantor));
}
+/*
+ * Hash aclitems with a 64-bit seed, if the seed is non-zero.
+ *
+ * Returns a uint64 value. Otherwise similar to hash_aclitem.
+ */
+Datum
+hash_aclitem_extended(PG_FUNCTION_ARGS)
+{
+ AclItem *a = PG_GETARG_ACLITEM_P(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ uint32 sum = (uint32) (a->ai_privs + a->ai_grantee + a->ai_grantor);
+
+ return (seed == 0) ? UInt64GetDatum(sum) : hash_uint32_extended(sum, seed);
+}
/*
* acldefault() --- create an ACL describing default access permissions
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 34dadd6..b08614e 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -20,6 +20,7 @@
#endif
#include <math.h>
+#include "access/hash.h"
#include "access/htup_details.h"
#include "catalog/pg_type.h"
#include "funcapi.h"
@@ -4020,6 +4021,84 @@ hash_array(PG_FUNCTION_ARGS)
PG_RETURN_UINT32(result);
}
+/*
+ * Returns 64-bit value by hashing a value to a 64-bit value, with a seed.
+ * Otherwise, similar to hash_array.
+ */
+Datum
+hash_array_extended(PG_FUNCTION_ARGS)
+{
+ AnyArrayType *array = PG_GETARG_ANY_ARRAY(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ int ndims = AARR_NDIM(array);
+ int *dims = AARR_DIMS(array);
+ Oid element_type = AARR_ELEMTYPE(array);
+ uint64 result = 1;
+ int nitems;
+ TypeCacheEntry *typentry;
+ int typlen;
+ bool typbyval;
+ char typalign;
+ int i;
+ array_iter iter;
+ FunctionCallInfoData locfcinfo;
+
+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL ||
+ typentry->type_id != element_type)
+ {
+ typentry = lookup_type_cache(element_type,
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO);
+ if (!OidIsValid(typentry->hash_extended_proc_finfo.fn_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify an extended hash function for type %s",
+ format_type_be(element_type))));
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
+ typlen = typentry->typlen;
+ typbyval = typentry->typbyval;
+ typalign = typentry->typalign;
+
+ InitFunctionCallInfoData(locfcinfo, &typentry->hash_extended_proc_finfo, 2,
+ InvalidOid, NULL, NULL);
+
+ /* Loop over source data */
+ nitems = ArrayGetNItems(ndims, dims);
+ array_iter_setup(&iter, array);
+
+ for (i = 0; i < nitems; i++)
+ {
+ Datum elt;
+ bool isnull;
+ uint64 elthash;
+
+ /* Get element, checking for NULL */
+ elt = array_iter_next(&iter, &isnull, i, typlen, typbyval, typalign);
+
+ if (isnull)
+ {
+ elthash = 0;
+ }
+ else
+ {
+ /* Apply the hash function */
+ locfcinfo.arg[0] = elt;
+ locfcinfo.arg[1] = seed;
+ locfcinfo.argnull[0] = false;
+ locfcinfo.argnull[1] = false;
+ locfcinfo.isnull = false;
+ elthash = DatumGetUInt64(FunctionCallInvoke(&locfcinfo));
+ }
+
+ result = (result << 5) - result + elthash;
+ }
+
+ AARR_FREE_IF_COPY(array, 0);
+
+ return UInt64GetDatum(result);
+}
+
/*-----------------------------------------------------------------------------
* array overlap/containment comparisons
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 7d89d79..06934ed 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -1509,6 +1509,12 @@ time_hash(PG_FUNCTION_ARGS)
}
Datum
+time_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
+
+Datum
time_larger(PG_FUNCTION_ARGS)
{
TimeADT time1 = PG_GETARG_TIMEADT(0);
@@ -2214,6 +2220,21 @@ timetz_hash(PG_FUNCTION_ARGS)
}
Datum
+timetz_hash_extended(PG_FUNCTION_ARGS)
+{
+ TimeTzADT *key = PG_GETARG_TIMETZADT_P(0);
+ Datum seed = PG_GETARG_DATUM(1);
+ uint64 thash;
+
+ /* Same approach as timetz_hash */
+ thash = DatumGetUInt64(DirectFunctionCall2(hashint8extended,
+ Int64GetDatumFast(key->time),
+ seed));
+ thash ^= DatumGetUInt64(hash_uint32_extended(key->zone,
+ DatumGetInt64(seed)));
+ return UInt64GetDatum(thash);
+}
+
+Datum
timetz_larger(PG_FUNCTION_ARGS)
{
TimeTzADT *time1 = PG_GETARG_TIMETZADT_P(0);
diff --git a/src/backend/utils/adt/jsonb_op.c b/src/backend/utils/adt/jsonb_op.c
index d4c490e..4a0d147 100644
--- a/src/backend/utils/adt/jsonb_op.c
+++ b/src/backend/utils/adt/jsonb_op.c
@@ -291,3 +291,45 @@ jsonb_hash(PG_FUNCTION_ARGS)
PG_FREE_IF_COPY(jb, 0);
PG_RETURN_INT32(hash);
}
+
+Datum
+jsonb_hash_extended(PG_FUNCTION_ARGS)
+{
+ Jsonb *jb = PG_GETARG_JSONB(0);
+ JsonbIterator *it;
+ JsonbValue v;
+ JsonbIteratorToken r;
+ uint64 hash = 0;
+
+ if (JB_ROOT_COUNT(jb) == 0)
+ return UInt64GetDatum(0);
+
+ it = JsonbIteratorInit(&jb->root);
+
+ while ((r = JsonbIteratorNext(&it, &v, false)) != WJB_DONE)
+ {
+ switch (r)
+ {
+ /* Rotation is left to JsonbHashScalarValue() */
+ case WJB_BEGIN_ARRAY:
+ hash ^= JB_FARRAY;
+ break;
+ case WJB_BEGIN_OBJECT:
+ hash ^= JB_FOBJECT;
+ break;
+ case WJB_KEY:
+ case WJB_VALUE:
+ case WJB_ELEM:
+ JsonbHashScalarValueExtended(&v, &hash, PG_GETARG_INT64(1));
+ break;
+ case WJB_END_ARRAY:
+ case WJB_END_OBJECT:
+ break;
+ default:
+ elog(ERROR, "invalid JsonbIteratorNext rc: %d", (int) r);
+ }
+ }
+
+ PG_FREE_IF_COPY(jb, 0);
+ return UInt64GetDatum(hash);
+}
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 4850569..60986dd 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -1250,6 +1250,46 @@ JsonbHashScalarValue(const JsonbValue *scalarVal, uint32 *hash)
}
/*
+ * Hash a value to a 64-bit value, with a seed. Otherwise, similar to
+ * JsonbHashScalarValue.
+ */
+void
+JsonbHashScalarValueExtended(const JsonbValue *scalarVal, uint64 *hash,
+ uint64 seed)
+{
+ uint64 tmp;
+
+ switch (scalarVal->type)
+ {
+ case jbvNull:
+ tmp = 0x01;
+ break;
+ case jbvString:
+ tmp = DatumGetUInt64(hash_any_extended((const unsigned char *) scalarVal->val.string.val,
+ scalarVal->val.string.len,
+ seed));
+ break;
+ case jbvNumeric:
+ tmp = DatumGetUInt64(DirectFunctionCall2(hash_numeric_extended,
+ NumericGetDatum(scalarVal->val.numeric),
+ seed));
+ break;
+ case jbvBool:
+ tmp = DatumGetUInt64(DirectFunctionCall2(hashcharextended,
+ BoolGetDatum(scalarVal->val.boolean),
+ seed));
+
+ break;
+ default:
+ elog(ERROR, "invalid jsonb scalar type");
+ break;
+ }
+
+ *hash = (*hash << 1) | (*hash >> 63);
+ *hash ^= tmp;
+}
+
+/*
* Are two scalar JsonbValues of the same type a and b equal?
*/
static bool
diff --git a/src/backend/utils/adt/mac.c b/src/backend/utils/adt/mac.c
index d1c20c3..60521cc 100644
--- a/src/backend/utils/adt/mac.c
+++ b/src/backend/utils/adt/mac.c
@@ -271,6 +271,15 @@ hashmacaddr(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) key, sizeof(macaddr));
}
+Datum
+hashmacaddrextended(PG_FUNCTION_ARGS)
+{
+ macaddr *key = PG_GETARG_MACADDR_P(0);
+
+ return hash_any_extended((unsigned char *) key, sizeof(macaddr),
+ PG_GETARG_INT64(1));
+}
+
/*
* Arithmetic functions: bitwise NOT, AND, OR.
*/
diff --git a/src/backend/utils/adt/mac8.c b/src/backend/utils/adt/mac8.c
index 482d1fb..0410b98 100644
--- a/src/backend/utils/adt/mac8.c
+++ b/src/backend/utils/adt/mac8.c
@@ -407,6 +407,15 @@ hashmacaddr8(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) key, sizeof(macaddr8));
}
+Datum
+hashmacaddr8extended(PG_FUNCTION_ARGS)
+{
+ macaddr8 *key = PG_GETARG_MACADDR8_P(0);
+
+ return hash_any_extended((unsigned char *) key, sizeof(macaddr8),
+ PG_GETARG_INT64(1));
+}
+
/*
* Arithmetic functions: bitwise NOT, AND, OR.
*/
diff --git a/src/backend/utils/adt/network.c b/src/backend/utils/adt/network.c
index 5573c34..ec4ac20 100644
--- a/src/backend/utils/adt/network.c
+++ b/src/backend/utils/adt/network.c
@@ -486,6 +486,16 @@ hashinet(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) VARDATA_ANY(addr), addrsize + 2);
}
+Datum
+hashinetextended(PG_FUNCTION_ARGS)
+{
+ inet *addr = PG_GETARG_INET_PP(0);
+ int addrsize = ip_addrsize(addr);
+
+ return hash_any_extended((unsigned char *) VARDATA_ANY(addr), addrsize + 2,
+ PG_GETARG_INT64(1));
+}
+
/*
* Boolean network-inclusion tests.
*/
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 3e5614e..59f6041 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -2230,6 +2230,65 @@ hash_numeric(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
+/*
+ * Returns 64-bit value by hashing a value to a 64-bit value, with a seed.
+ * Otherwise, similar to hash_numeric.
+ */
+Datum
+hash_numeric_extended(PG_FUNCTION_ARGS)
+{
+ Numeric key = PG_GETARG_NUMERIC(0);
+ Datum digit_hash;
+ Datum result;
+ int weight;
+ int start_offset;
+ int end_offset;
+ int i;
+ int hash_len;
+ NumericDigit *digits;
+
+ if (NUMERIC_IS_NAN(key))
+ return UInt64GetDatum(0);
+
+ weight = NUMERIC_WEIGHT(key);
+ start_offset = 0;
+ end_offset = 0;
+
+ digits = NUMERIC_DIGITS(key);
+ for (i = 0; i < NUMERIC_NDIGITS(key); i++)
+ {
+ if (digits[i] != (NumericDigit) 0)
+ break;
+
+ start_offset++;
+
+ weight--;
+ }
+
+ if (NUMERIC_NDIGITS(key) == start_offset)
+ return UInt64GetDatum(-1);
+
+ for (i = NUMERIC_NDIGITS(key) - 1; i >= 0; i--)
+ {
+ if (digits[i] != (NumericDigit) 0)
+ break;
+
+ end_offset++;
+ }
+
+ Assert(start_offset + end_offset < NUMERIC_NDIGITS(key));
+
+ hash_len = NUMERIC_NDIGITS(key) - start_offset - end_offset;
+ digit_hash = hash_any_extended((unsigned char *) (NUMERIC_DIGITS(key)
+ + start_offset),
+ hash_len * sizeof(NumericDigit),
+ PG_GETARG_INT64(1));
+
+ result = digit_hash ^ weight;
+
+ PG_RETURN_DATUM(result);
+}
+
/* ----------------------------------------------------------------------
*
diff --git a/src/backend/utils/adt/pg_lsn.c b/src/backend/utils/adt/pg_lsn.c
index aefbb87..7ad30a2 100644
--- a/src/backend/utils/adt/pg_lsn.c
+++ b/src/backend/utils/adt/pg_lsn.c
@@ -179,6 +179,12 @@ pg_lsn_hash(PG_FUNCTION_ARGS)
return hashint8(fcinfo);
}
+Datum
+pg_lsn_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
+
/*----------------------------------------------------------
* Arithmetic operators on PostgreSQL LSNs.
diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 09a4f14..66cad0d 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -1281,6 +1281,69 @@ hash_range(PG_FUNCTION_ARGS)
}
/*
+ * Returns 64-bit value by hashing a value to a 64-bit value, with a seed.
+ * Otherwise, similar to hash_range.
+ */
+Datum
+hash_range_extended(PG_FUNCTION_ARGS)
+{
+ RangeType *r = PG_GETARG_RANGE(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ uint64 result;
+ TypeCacheEntry *typcache;
+ TypeCacheEntry *scache;
+ RangeBound lower;
+ RangeBound upper;
+ bool empty;
+ char flags;
+ uint64 lower_hash;
+ uint64 upper_hash;
+
+ check_stack_depth();
+
+ typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r));
+
+ range_deserialize(typcache, r, &lower, &upper, &empty);
+ flags = range_get_flags(r);
+
+ scache = typcache->rngelemtype;
+ if (!OidIsValid(scache->hash_extended_proc_finfo.fn_oid))
+ {
+ scache = lookup_type_cache(scache->type_id,
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO);
+ if (!OidIsValid(scache->hash_extended_proc_finfo.fn_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify a hash function for type %s",
+ format_type_be(scache->type_id))));
+ }
+
+ if (RANGE_HAS_LBOUND(flags))
+ lower_hash = DatumGetUInt64(FunctionCall2Coll(&scache->hash_extended_proc_finfo,
+ typcache->rng_collation,
+ lower.val,
+ seed));
+ else
+ lower_hash = 0;
+
+ if (RANGE_HAS_UBOUND(flags))
+ upper_hash = DatumGetUInt64(FunctionCall2Coll(&scache->hash_extended_proc_finfo,
+ typcache->rng_collation,
+ upper.val,
+ seed));
+ else
+ upper_hash = 0;
+
+ /* Same approach as hash_range */
+ result = hash_uint32_extended((uint32) flags, seed);
+ result ^= lower_hash;
+ result = (result << 1) | (result >> 63);
+ result ^= upper_hash;
+
+ return UInt64GetDatum(result);
+}
+
+/*
*----------------------------------------------------------
* CANONICAL FUNCTIONS
*
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 6fa126d..b11d452 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -2113,6 +2113,11 @@ timestamp_hash(PG_FUNCTION_ARGS)
return hashint8(fcinfo);
}
+Datum
+timestamp_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
/*
* Cross-type comparison functions for timestamp vs timestamptz
@@ -2419,6 +2424,20 @@ interval_hash(PG_FUNCTION_ARGS)
return DirectFunctionCall1(hashint8, Int64GetDatumFast(span64));
}
+Datum
+interval_hash_extended(PG_FUNCTION_ARGS)
+{
+ Interval *interval = PG_GETARG_INTERVAL_P(0);
+ INT128 span = interval_cmp_value(interval);
+ int64 span64;
+
+ /* Same approach as interval_hash */
+ span64 = int128_to_int64(span);
+
+ return DirectFunctionCall2(hashint8extended, Int64GetDatumFast(span64),
+ PG_GETARG_DATUM(1));
+}
+
/* overlaps_timestamp() --- implements the SQL OVERLAPS operator.
*
* Algorithm is per SQL spec. This is much harder than you'd think
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5f15c8e..f73c695 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -408,3 +408,11 @@ uuid_hash(PG_FUNCTION_ARGS)
return hash_any(key->data, UUID_LEN);
}
+
+Datum
+uuid_hash_extended(PG_FUNCTION_ARGS)
+{
+ pg_uuid_t *key = PG_GETARG_UUID_P(0);
+
+ return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
+}
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index cbc62b0..2df6f2c 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -947,6 +947,24 @@ hashbpchar(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashbpcharextended(PG_FUNCTION_ARGS)
+{
+ BpChar *key = PG_GETARG_BPCHAR_PP(0);
+ char *keydata;
+ int keylen;
+ Datum result;
+
+ keydata = VARDATA_ANY(key);
+ keylen = bcTruelen(key);
+
+ result = hash_any_extended((unsigned char *) keydata, keylen,
+ PG_GETARG_INT64(1));
+
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
/*
* The following operators support character-by-character comparison
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 82763f8..b7a14dc 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -490,8 +490,8 @@ get_compatible_hash_operators(Oid opno,
/*
* get_op_hash_functions
- * Get the OID(s) of hash support function(s) compatible with the given
- * operator, operating on its LHS and/or RHS datatype as required.
+ * Get the OID(s) of the standard hash support function(s) compatible with
+ * the given operator, operating on its LHS and/or RHS datatype as required.
*
* A function for the LHS type is sought and returned into *lhs_procno if
* lhs_procno isn't NULL. Similarly, a function for the RHS type is sought
@@ -542,7 +542,7 @@ get_op_hash_functions(Oid opno,
*lhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoplefttype,
aform->amoplefttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*lhs_procno))
continue;
/* Matching LHS found, done if caller doesn't want RHS */
@@ -564,7 +564,7 @@ get_op_hash_functions(Oid opno,
*rhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoprighttype,
aform->amoprighttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*rhs_procno))
{
/* Forget any LHS function from this opfamily */
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 691d498..2e633f0 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -90,6 +90,7 @@ static TypeCacheEntry *firstDomainTypeEntry = NULL;
#define TCFLAGS_HAVE_FIELD_EQUALITY 0x1000
#define TCFLAGS_HAVE_FIELD_COMPARE 0x2000
#define TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS 0x4000
+#define TCFLAGS_CHECKED_HASH_EXTENDED_PROC 0x8000
/*
* Data stored about a domain type's constraints. Note that we do not create
@@ -307,6 +308,8 @@ lookup_type_cache(Oid type_id, int flags)
flags |= TYPECACHE_HASH_OPFAMILY;
if ((flags & (TYPECACHE_HASH_PROC | TYPECACHE_HASH_PROC_FINFO |
+ TYPECACHE_HASH_EXTENDED_PROC |
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO |
TYPECACHE_HASH_OPFAMILY)) &&
!(typentry->flags & TCFLAGS_CHECKED_HASH_OPCLASS))
{
@@ -329,6 +332,7 @@ lookup_type_cache(Oid type_id, int flags)
* decision is still good.
*/
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
typentry->flags |= TCFLAGS_CHECKED_HASH_OPCLASS;
}
@@ -372,11 +376,12 @@ lookup_type_cache(Oid type_id, int flags)
typentry->eq_opr = eq_opr;
/*
- * Reset info about hash function whenever we pick up new info about
- * equality operator. This is so we can ensure that the hash function
- * matches the operator.
+ * Reset info about hash functions whenever we pick up new info about
+ * equality operator. This is so we can ensure that the hash functions
+ * match the operator.
*/
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
typentry->flags |= TCFLAGS_CHECKED_EQ_OPR;
}
if ((flags & TYPECACHE_LT_OPR) &&
@@ -467,7 +472,7 @@ lookup_type_cache(Oid type_id, int flags)
hash_proc = get_opfamily_proc(typentry->hash_opf,
typentry->hash_opintype,
typentry->hash_opintype,
- HASHPROC);
+ HASHSTANDARD_PROC);
/*
* As above, make sure hash_array will succeed. We don't currently
@@ -485,6 +490,43 @@ lookup_type_cache(Oid type_id, int flags)
typentry->hash_proc = hash_proc;
typentry->flags |= TCFLAGS_CHECKED_HASH_PROC;
}
+ if ((flags & (TYPECACHE_HASH_EXTENDED_PROC |
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO)) &&
+ !(typentry->flags & TCFLAGS_CHECKED_HASH_EXTENDED_PROC))
+ {
+ Oid hash_extended_proc = InvalidOid;
+
+ /*
+ * We insist that the eq_opr, if one has been determined, match the
+ * hash opclass; else report there is no hash function.
+ */
+ if (typentry->hash_opf != InvalidOid &&
+ (!OidIsValid(typentry->eq_opr) ||
+ typentry->eq_opr == get_opfamily_member(typentry->hash_opf,
+ typentry->hash_opintype,
+ typentry->hash_opintype,
+ HTEqualStrategyNumber)))
+ hash_extended_proc = get_opfamily_proc(typentry->hash_opf,
+ typentry->hash_opintype,
+ typentry->hash_opintype,
+ HASHEXTENDED_PROC);
+
+ /*
+ * As above, make sure hash_array_extended will succeed. We don't
+ * currently support hashing for composite types, but when we do,
+ * we'll need more logic here to check that case too.
+ */
+ if (hash_extended_proc == F_HASH_ARRAY_EXTENDED &&
+ !array_element_has_hashing(typentry))
+ hash_extended_proc = InvalidOid;
+
+ /* Force update of hash_proc_finfo only if we're changing state */
+ if (typentry->hash_extended_proc != hash_extended_proc)
+ typentry->hash_extended_proc_finfo.fn_oid = InvalidOid;
+
+ typentry->hash_extended_proc = hash_extended_proc;
+ typentry->flags |= TCFLAGS_CHECKED_HASH_EXTENDED_PROC;
+ }
/*
* Set up fmgr lookup info as requested
@@ -523,6 +565,14 @@ lookup_type_cache(Oid type_id, int flags)
fmgr_info_cxt(typentry->hash_proc, &typentry->hash_proc_finfo,
CacheMemoryContext);
}
+ if ((flags & TYPECACHE_HASH_EXTENDED_PROC_FINFO) &&
+ typentry->hash_extended_proc_finfo.fn_oid == InvalidOid &&
+ typentry->hash_extended_proc != InvalidOid)
+ {
+ fmgr_info_cxt(typentry->hash_extended_proc,
+ &typentry->hash_extended_proc_finfo,
+ CacheMemoryContext);
+ }
/*
* If it's a composite type (row type), get tupdesc if requested
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 72fce30..d89398d 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -289,12 +289,16 @@ typedef HashMetaPageData *HashMetaPage;
#define HTMaxStrategyNumber 1
/*
- * When a new operator class is declared, we require that the user supply
- * us with an amproc procudure for hashing a key of the new type.
- * Since we only have one such proc in amproc, it's number 1.
+ * When a new operator class is declared, we require that the user supply
+ * us with an amproc procedure for hashing a key of the new type, returning
+ * a 32-bit hash value. We call this the "standard" hash procedure. We
+ * also allow an optional "extended" hash procedure which accepts a salt and
+ * returns a 64-bit hash value. This is highly recommended but, for reasons
+ * of backward compatibility, optional.
*/
-#define HASHPROC 1
-#define HASHNProcs 1
+#define HASHSTANDARD_PROC 1
+#define HASHEXTENDED_PROC 2
+#define HASHNProcs 2
/* public routines */
@@ -322,7 +326,10 @@ extern bytea *hashoptions(Datum reloptions, bool validate);
extern bool hashvalidate(Oid opclassoid);
extern Datum hash_any(register const unsigned char *k, register int keylen);
+extern Datum hash_any_extended(register const unsigned char *k,
+ register int keylen, uint64 seed);
extern Datum hash_uint32(uint32 k);
+extern Datum hash_uint32_extended(uint32 k, uint64 seed);
/* private routines */
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 7d245b1..fb6a829 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -153,41 +153,77 @@ DATA(insert ( 4033 3802 3802 1 4044 ));
/* hash */
DATA(insert ( 427 1042 1042 1 1080 ));
+DATA(insert ( 427 1042 1042 2 972 ));
DATA(insert ( 431 18 18 1 454 ));
+DATA(insert ( 431 18 18 2 446 ));
DATA(insert ( 435 1082 1082 1 450 ));
+DATA(insert ( 435 1082 1082 2 425 ));
DATA(insert ( 627 2277 2277 1 626 ));
+DATA(insert ( 627 2277 2277 2 782 ));
DATA(insert ( 1971 700 700 1 451 ));
+DATA(insert ( 1971 700 700 2 443 ));
DATA(insert ( 1971 701 701 1 452 ));
+DATA(insert ( 1971 701 701 2 444 ));
DATA(insert ( 1975 869 869 1 422 ));
+DATA(insert ( 1975 869 869 2 779 ));
DATA(insert ( 1977 21 21 1 449 ));
+DATA(insert ( 1977 21 21 2 441 ));
DATA(insert ( 1977 23 23 1 450 ));
+DATA(insert ( 1977 23 23 2 425 ));
DATA(insert ( 1977 20 20 1 949 ));
+DATA(insert ( 1977 20 20 2 442 ));
DATA(insert ( 1983 1186 1186 1 1697 ));
+DATA(insert ( 1983 1186 1186 2 3418 ));
DATA(insert ( 1985 829 829 1 399 ));
+DATA(insert ( 1985 829 829 2 778 ));
DATA(insert ( 1987 19 19 1 455 ));
+DATA(insert ( 1987 19 19 2 447 ));
DATA(insert ( 1990 26 26 1 453 ));
+DATA(insert ( 1990 26 26 2 445 ));
DATA(insert ( 1992 30 30 1 457 ));
+DATA(insert ( 1992 30 30 2 776 ));
DATA(insert ( 1995 25 25 1 400 ));
DATA(insert ( 1995 25 25 2 448 ));
DATA(insert ( 1997 1083 1083 1 1688 ));
+DATA(insert ( 1997 1083 1083 2 3409 ));
DATA(insert ( 1998 1700 1700 1 432 ));
+DATA(insert ( 1998 1700 1700 2 780 ));
DATA(insert ( 1999 1184 1184 1 2039 ));
+DATA(insert ( 1999 1184 1184 2 3411 ));
DATA(insert ( 2001 1266 1266 1 1696 ));
+DATA(insert ( 2001 1266 1266 2 3410 ));
DATA(insert ( 2040 1114 1114 1 2039 ));
+DATA(insert ( 2040 1114 1114 2 3411 ));
DATA(insert ( 2222 16 16 1 454 ));
+DATA(insert ( 2222 16 16 2 446 ));
DATA(insert ( 2223 17 17 1 456 ));
+DATA(insert ( 2223 17 17 2 772 ));
DATA(insert ( 2225 28 28 1 450 ));
DATA(insert ( 2225 28 28 2 425 ));
DATA(insert ( 2226 29 29 1 450 ));
+DATA(insert ( 2226 29 29 2 425 ));
DATA(insert ( 2227 702 702 1 450 ));
+DATA(insert ( 2227 702 702 2 425 ));
DATA(insert ( 2228 703 703 1 450 ));
+DATA(insert ( 2228 703 703 2 425 ));
DATA(insert ( 2229 25 25 1 400 ));
+DATA(insert ( 2229 25 25 2 448 ));
DATA(insert ( 2231 1042 1042 1 1080 ));
+DATA(insert ( 2231 1042 1042 2 972 ));
DATA(insert ( 2235 1033 1033 1 329 ));
+DATA(insert ( 2235 1033 1033 2 777 ));
DATA(insert ( 2969 2950 2950 1 2963 ));
+DATA(insert ( 2969 2950 2950 2 3412 ));
DATA(insert ( 3254 3220 3220 1 3252 ));
+DATA(insert ( 3254 3220 3220 2 3413 ));
DATA(insert ( 3372 774 774 1 328 ));
+DATA(insert ( 3372 774 774 2 781 ));
DATA(insert ( 3523 3500 3500 1 3515 ));
+DATA(insert ( 3523 3500 3500 2 3414 ));
DATA(insert ( 3903 3831 3831 1 3902 ));
+DATA(insert ( 3903 3831 3831 2 3417 ));
DATA(insert ( 4034 3802 3802 1 4045 ));
DATA(insert ( 4034 3802 3802 2 3416 ));
/* gist */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8b33b4e..d820b56 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -668,36 +668,68 @@ DESCR("convert char(n) to name");
DATA(insert OID = 449 ( hashint2 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "21" _null_ _null_ _null_ _null_ _null_ hashint2 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 441 ( hashint2extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "21 20" _null_ _null_ _null_ _null_ _null_ hashint2extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 450 ( hashint4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "23" _null_ _null_ _null_ _null_ _null_ hashint4 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 425 ( hashint4extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "23 20" _null_ _null_ _null_ _null_ _null_ hashint4extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 949 ( hashint8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "20" _null_ _null_ _null_ _null_ _null_ hashint8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 442 ( hashint8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "20 20" _null_ _null_ _null_ _null_ _null_ hashint8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 451 ( hashfloat4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "700" _null_ _null_ _null_ _null_ _null_ hashfloat4 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 443 ( hashfloat4extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "700 20" _null_ _null_ _null_ _null_ _null_ hashfloat4extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 452 ( hashfloat8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "701" _null_ _null_ _null_ _null_ _null_ hashfloat8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 444 ( hashfloat8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "701 20" _null_ _null_ _null_ _null_ _null_ hashfloat8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 453 ( hashoid PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "26" _null_ _null_ _null_ _null_ _null_ hashoid _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 445 ( hashoidextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "26 20" _null_ _null_ _null_ _null_ _null_ hashoidextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 454 ( hashchar PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "18" _null_ _null_ _null_ _null_ _null_ hashchar _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 446 ( hashcharextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "18 20" _null_ _null_ _null_ _null_ _null_ hashcharextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 455 ( hashname PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "19" _null_ _null_ _null_ _null_ _null_ hashname _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 447 ( hashnameextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "19 20" _null_ _null_ _null_ _null_ _null_ hashnameextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 400 ( hashtext PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "25" _null_ _null_ _null_ _null_ _null_ hashtext _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 448 ( hashtextextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "25 20" _null_ _null_ _null_ _null_ _null_ hashtextextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 456 ( hashvarlena PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2281" _null_ _null_ _null_ _null_ _null_ hashvarlena _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 772 ( hashvarlenaextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2281 20" _null_ _null_ _null_ _null_ _null_ hashvarlenaextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 457 ( hashoidvector PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "30" _null_ _null_ _null_ _null_ _null_ hashoidvector _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 776 ( hashoidvectorextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "30 20" _null_ _null_ _null_ _null_ _null_ hashoidvectorextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 329 ( hash_aclitem PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1033" _null_ _null_ _null_ _null_ _null_ hash_aclitem _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 777 ( hash_aclitem_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1033 20" _null_ _null_ _null_ _null_ _null_ hash_aclitem_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 399 ( hashmacaddr PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "829" _null_ _null_ _null_ _null_ _null_ hashmacaddr _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 778 ( hashmacaddrextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "829 20" _null_ _null_ _null_ _null_ _null_ hashmacaddrextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 422 ( hashinet PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "869" _null_ _null_ _null_ _null_ _null_ hashinet _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 779 ( hashinetextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "869 20" _null_ _null_ _null_ _null_ _null_ hashinetextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 432 ( hash_numeric PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1700" _null_ _null_ _null_ _null_ _null_ hash_numeric _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 780 ( hash_numeric_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1700 20" _null_ _null_ _null_ _null_ _null_ hash_numeric_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 328 ( hashmacaddr8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "774" _null_ _null_ _null_ _null_ _null_ hashmacaddr8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 781 ( hashmacaddr8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "774 20" _null_ _null_ _null_ _null_ _null_ hashmacaddr8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 438 ( num_nulls PGNSP PGUID 12 1 0 2276 0 f f f f f f i s 1 0 23 "2276" "{2276}" "{v}" _null_ _null_ _null_ pg_num_nulls _null_ _null_ _null_ ));
DESCR("count the number of NULL arguments");
@@ -747,6 +779,8 @@ DESCR("convert float8 to int8");
DATA(insert OID = 626 ( hash_array PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2277" _null_ _null_ _null_ _null_ _null_ hash_array _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 782 ( hash_array_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2277 20" _null_ _null_ _null_ _null_ _null_ hash_array_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 652 ( float4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 700 "20" _null_ _null_ _null_ _null_ _null_ i8tof _null_ _null_ _null_ ));
DESCR("convert int8 to float4");
@@ -1155,6 +1189,8 @@ DATA(insert OID = 3328 ( bpchar_sortsupport PGNSP PGUID 12 1 0 0 0 f f f f t f i
DESCR("sort support");
DATA(insert OID = 1080 ( hashbpchar PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1042" _null_ _null_ _null_ _null_ _null_ hashbpchar _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 972 ( hashbpcharextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1042 20" _null_ _null_ _null_ _null_ _null_ hashbpcharextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1081 ( format_type PGNSP PGUID 12 1 0 0 0 f f f f f f s s 2 0 25 "26 23" _null_ _null_ _null_ _null_ _null_ format_type _null_ _null_ _null_ ));
DESCR("format a type oid and atttypmod to canonical SQL");
DATA(insert OID = 1084 ( date_in PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 1082 "2275" _null_ _null_ _null_ _null_ _null_ date_in _null_ _null_ _null_ ));
@@ -2286,10 +2322,16 @@ DESCR("less-equal-greater");
DATA(insert OID = 1688 ( time_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1083" _null_ _null_ _null_ _null_ _null_ time_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3409 ( time_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1083 20" _null_ _null_ _null_ _null_ _null_ time_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1696 ( timetz_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1266" _null_ _null_ _null_ _null_ _null_ timetz_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3410 ( timetz_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1266 20" _null_ _null_ _null_ _null_ _null_ timetz_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1697 ( interval_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1186" _null_ _null_ _null_ _null_ _null_ interval_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3418 ( interval_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1186 20" _null_ _null_ _null_ _null_ _null_ interval_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* OID's 1700 - 1799 NUMERIC data type */
@@ -3078,6 +3120,8 @@ DATA(insert OID = 2038 ( timezone PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0
DESCR("adjust time with time zone to new zone");
DATA(insert OID = 2039 ( timestamp_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1114" _null_ _null_ _null_ _null_ _null_ timestamp_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3411 ( timestamp_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1114 20" _null_ _null_ _null_ _null_ _null_ timestamp_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 2041 ( overlaps PGNSP PGUID 12 1 0 0 0 f f f f f f i s 4 0 16 "1114 1114 1114 1114" _null_ _null_ _null_ _null_ _null_ overlaps_timestamp _null_ _null_ _null_ ));
DESCR("intervals overlap?");
DATA(insert OID = 2042 ( overlaps PGNSP PGUID 14 1 0 0 0 f f f f f f i s 4 0 16 "1114 1186 1114 1186" _null_ _null_ _null_ _null_ _null_ "select ($1, ($1 + $2)) overlaps ($3, ($3 + $4))" _null_ _null_ _null_ ));
@@ -4543,6 +4587,8 @@ DATA(insert OID = 2962 ( uuid_send PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1
DESCR("I/O");
DATA(insert OID = 2963 ( uuid_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2950" _null_ _null_ _null_ _null_ _null_ uuid_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3412 ( uuid_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2950 20" _null_ _null_ _null_ _null_ _null_ uuid_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* pg_lsn */
DATA(insert OID = 3229 ( pg_lsn_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 3220 "2275" _null_ _null_ _null_ _null_ _null_ pg_lsn_in _null_ _null_ _null_ ));
@@ -4564,6 +4610,8 @@ DATA(insert OID = 3251 ( pg_lsn_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0
DESCR("less-equal-greater");
DATA(insert OID = 3252 ( pg_lsn_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3220" _null_ _null_ _null_ _null_ _null_ pg_lsn_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3413 ( pg_lsn_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3220 20" _null_ _null_ _null_ _null_ _null_ pg_lsn_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* enum related procs */
DATA(insert OID = 3504 ( anyenum_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 3500 "2275" _null_ _null_ _null_ _null_ _null_ anyenum_in _null_ _null_ _null_ ));
@@ -4584,6 +4632,8 @@ DATA(insert OID = 3514 ( enum_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 2
DESCR("less-equal-greater");
DATA(insert OID = 3515 ( hashenum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3500" _null_ _null_ _null_ _null_ _null_ hashenum _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3414 ( hashenumextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3500 20" _null_ _null_ _null_ _null_ _null_ hashenumextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 3524 ( enum_smaller PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3500 "3500 3500" _null_ _null_ _null_ _null_ _null_ enum_smaller _null_ _null_ _null_ ));
DESCR("smaller of two");
DATA(insert OID = 3525 ( enum_larger PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3500 "3500 3500" _null_ _null_ _null_ _null_ _null_ enum_larger _null_ _null_ _null_ ));
@@ -4981,6 +5031,8 @@ DATA(insert OID = 4044 ( jsonb_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2
DESCR("less-equal-greater");
DATA(insert OID = 4045 ( jsonb_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3802" _null_ _null_ _null_ _null_ _null_ jsonb_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3416 ( jsonb_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3802 20" _null_ _null_ _null_ _null_ _null_ jsonb_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 4046 ( jsonb_contains PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 3802" _null_ _null_ _null_ _null_ _null_ jsonb_contains _null_ _null_ _null_ ));
DATA(insert OID = 4047 ( jsonb_exists PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 25" _null_ _null_ _null_ _null_ _null_ jsonb_exists _null_ _null_ _null_ ));
DATA(insert OID = 4048 ( jsonb_exists_any PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 1009" _null_ _null_ _null_ _null_ _null_ jsonb_exists_any _null_ _null_ _null_ ));
@@ -5171,6 +5223,8 @@ DATA(insert OID = 3881 ( range_gist_same PGNSP PGUID 12 1 0 0 0 f f f f t f i
DESCR("GiST support");
DATA(insert OID = 3902 ( hash_range PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3831" _null_ _null_ _null_ _null_ _null_ hash_range _null_ _null_ _null_ ));
DESCR("hash a range");
+DATA(insert OID = 3417 ( hash_range_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3831 20" _null_ _null_ _null_ _null_ _null_ hash_range_extended _null_ _null_ _null_ ));
+DESCR("hash a range");
DATA(insert OID = 3916 ( range_typanalyze PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 16 "2281" _null_ _null_ _null_ _null_ _null_ range_typanalyze _null_ _null_ _null_ ));
DESCR("range typanalyze");
DATA(insert OID = 3169 ( rangesel PGNSP PGUID 12 1 0 0 0 f f f f t f s s 4 0 701 "2281 26 2281 23" _null_ _null_ _null_ _null_ _null_ rangesel _null_ _null_ _null_ ));
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index ea9dd17..24f4916 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -370,6 +370,8 @@ extern Jsonb *JsonbValueToJsonb(JsonbValue *val);
extern bool JsonbDeepContains(JsonbIterator **val,
JsonbIterator **mContained);
extern void JsonbHashScalarValue(const JsonbValue *scalarVal, uint32 *hash);
+extern void JsonbHashScalarValueExtended(const JsonbValue *scalarVal,
+ uint64 *hash, uint64 seed);
/* jsonb.c support functions */
extern char *JsonbToCString(StringInfo out, JsonbContainer *in,
diff --git a/src/include/utils/typcache.h b/src/include/utils/typcache.h
index c12631d..b4f7592 100644
--- a/src/include/utils/typcache.h
+++ b/src/include/utils/typcache.h
@@ -56,6 +56,7 @@ typedef struct TypeCacheEntry
Oid gt_opr; /* the greater-than operator */
Oid cmp_proc; /* the btree comparison function */
Oid hash_proc; /* the hash calculation function */
+ Oid hash_extended_proc; /* the extended hash calculation function */
/*
* Pre-set-up fmgr call info for the equality operator, the btree
@@ -67,6 +68,7 @@ typedef struct TypeCacheEntry
FmgrInfo eq_opr_finfo;
FmgrInfo cmp_proc_finfo;
FmgrInfo hash_proc_finfo;
+ FmgrInfo hash_extended_proc_finfo;
/*
* Tuple descriptor if it's a composite type (row type). NULL if not
@@ -120,6 +122,8 @@ typedef struct TypeCacheEntry
#define TYPECACHE_HASH_OPFAMILY 0x0400
#define TYPECACHE_RANGE_INFO 0x0800
#define TYPECACHE_DOMAIN_INFO 0x1000
+#define TYPECACHE_HASH_EXTENDED_PROC 0x2000
+#define TYPECACHE_HASH_EXTENDED_PROC_FINFO 0x4000
/*
* Callers wishing to maintain a long-lived reference to a domain's constraint
diff --git a/src/test/regress/expected/alter_generic.out b/src/test/regress/expected/alter_generic.out
index 9f6ad4d..767c09b 100644
--- a/src/test/regress/expected/alter_generic.out
+++ b/src/test/regress/expected/alter_generic.out
@@ -421,7 +421,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf13 USING hash;
CREATE FUNCTION fn_opf13 (int4) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf13 USING hash ADD FUNCTION 1 fn_opf13(int4);
-ERROR: hash procedures must return integer
+ERROR: hash procedure 1 must return integer
DROP OPERATOR FAMILY alt_opf13 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
@@ -439,7 +439,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf15 USING hash;
CREATE FUNCTION fn_opf15 (int4, int2) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf15 USING hash ADD FUNCTION 1 fn_opf15(int4, int2);
-ERROR: hash procedures must have one argument
+ERROR: hash procedure 1 must have one argument
DROP OPERATOR FAMILY alt_opf15 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
--
2.6.2
Attachment: 0002-test-Hash_functions_v2_wip.patch
From 503fae97b863269d29db7cf1573b729fe5f9810a Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Tue, 22 Aug 2017 14:06:50 +0530
Subject: [PATCH 2/2] test-Hash_functions_v2_wip
---
src/test/regress/expected/hash_func.out | 289 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/hash_func.sql | 179 ++++++++++++++++++++
3 files changed, 469 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/hash_func.out
create mode 100644 src/test/regress/sql/hash_func.sql
diff --git a/src/test/regress/expected/hash_func.out b/src/test/regress/expected/hash_func.out
new file mode 100644
index 0000000..41f1fc0
--- /dev/null
+++ b/src/test/regress/expected/hash_func.out
@@ -0,0 +1,289 @@
+--
+-- Test hash functions
+--
+SELECT v as value, hashint2(v)::bit(32) as standard,
+ hashint2extended(v, 0)::bit(32) as extended0,
+ hashint2extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0::int2), (1::int2), (17::int2), (42::int2)) x(v)
+WHERE hashint2(v)::bit(32) != hashint2extended(v, 0)::bit(32)
+ OR hashint2(v)::bit(32) = hashint2extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashint4(v)::bit(32) as standard,
+ hashint4extended(v, 0)::bit(32) as extended0,
+ hashint4extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
+ OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashint8(v)::bit(32) as standard,
+ hashint8extended(v, 0)::bit(32) as extended0,
+ hashint8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashint8(v)::bit(32) != hashint8extended(v, 0)::bit(32)
+ OR hashint8(v)::bit(32) = hashint8extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashfloat4(v)::bit(32) as standard,
+ hashfloat4extended(v, 0)::bit(32) as extended0,
+ hashfloat4extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashfloat4(v)::bit(32) != hashfloat4extended(v, 0)::bit(32)
+ OR hashfloat4(v)::bit(32) = hashfloat4extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------------------------------+----------------------------------+----------------------------------
+ 0 | 00000000000000000000000000000000 | 00000000000000000000000000000000 | 00000000000000000000000000000000
+(1 row)
+
+SELECT v as value, hashfloat8(v)::bit(32) as standard,
+ hashfloat8extended(v, 0)::bit(32) as extended0,
+ hashfloat8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashfloat8(v)::bit(32) != hashfloat8extended(v, 0)::bit(32)
+ OR hashfloat8(v)::bit(32) = hashfloat8extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------------------------------+----------------------------------+----------------------------------
+ 0 | 00000000000000000000000000000000 | 00000000000000000000000000000000 | 00000000000000000000000000000000
+(1 row)
+
+SELECT v as value, hashoid(v)::bit(32) as standard,
+ hashoidextended(v, 0)::bit(32) as extended0,
+ hashoidextended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashoid(v)::bit(32) != hashoidextended(v, 0)::bit(32)
+ OR hashoid(v)::bit(32) = hashoidextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashchar(v)::bit(32) as standard,
+ hashcharextended(v, 0)::bit(32) as extended0,
+ hashcharextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::"char"), ('1'), ('x'), ('X'), ('p'), ('N')) x(v)
+WHERE hashchar(v)::bit(32) != hashcharextended(v, 0)::bit(32)
+ OR hashchar(v)::bit(32) = hashcharextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashname(v)::bit(32) as standard,
+ hashnameextended(v, 0)::bit(32) as extended0,
+ hashnameextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashname(v)::bit(32) != hashnameextended(v, 0)::bit(32)
+ OR hashname(v)::bit(32) = hashnameextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashtext(v)::bit(32) as standard,
+ hashtextextended(v, 0)::bit(32) as extended0,
+ hashtextextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashtext(v)::bit(32) != hashtextextended(v, 0)::bit(32)
+ OR hashtext(v)::bit(32) = hashtextextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+--SELECT v as value, hashvarlena(internal_type??)::bit(32) as standard,
+-- hashvarlenaextended(internal type??, 0)::bit(32) as extended0,
+-- hashvarlenaextended(internal type??, 1)::bit(32) as extended1;
+SELECT v as value, hashoidvector(v)::bit(32) as standard,
+ hashoidvectorextended(v, 0)::bit(32) as extended0,
+ hashoidvectorextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::oidvector), ('0 1 2 3 4'), ('17 18 19 20'), ('42 43 42 45'),
+ ('550273 550273 570274'), ('207112489 207112499')) x(v)
+WHERE hashoidvector(v)::bit(32) != hashoidvectorextended(v, 0)::bit(32)
+ OR hashoidvector(v)::bit(32) = hashoidvectorextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hash_aclitem(v)::bit(32) as standard,
+ hash_aclitem_extended(v, 0)::bit(32) as extended0,
+ hash_aclitem_extended(v, 1)::bit(32) as extended1
+FROM (SELECT DISTINCT(relacl[1]) FROM pg_class LIMIT 10) x(v)
+WHERE hash_aclitem(v)::bit(32) != hash_aclitem_extended(v, 0)::bit(32)
+ OR hash_aclitem(v)::bit(32) = hash_aclitem_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashmacaddr(v)::bit(32) as standard,
+ hashmacaddrextended(v, 0)::bit(32) as extended0,
+ hashmacaddrextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::macaddr), ('08:00:2b:01:02:04'), ('08:00:2b:01:02:04'),
+ ('e2:7f:51:3e:70:49'), ('d6:a9:4a:78:1c:d5'), ('ea:29:b1:5e:1f:a5')) x(v)
+WHERE hashmacaddr(v)::bit(32) != hashmacaddrextended(v, 0)::bit(32)
+ OR hashmacaddr(v)::bit(32) = hashmacaddrextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashmacaddr8(v)::bit(32) as standard,
+ hashmacaddr8extended(v, 0)::bit(32) as extended0,
+ hashmacaddr8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::macaddr8), ('08:00:2b:01:02:04'), ('08:00:2b:01:02:04'),
+ ('e2:7f:51:3e:70:49'), ('d6:a9:4a:78:1c:d5'), ('ea:29:b1:5e:1f:a5')) x(v)
+WHERE hashmacaddr8(v)::bit(32) != hashmacaddr8extended(v, 0)::bit(32)
+ OR hashmacaddr8(v)::bit(32) = hashmacaddr8extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashinet('192.168.100.128/25')::bit(32) as standard,
+ hashinetextended('192.168.100.128/25', 0)::bit(32) as extended0,
+ hashinetextended('192.168.100.128/25', 1)::bit(32) as extended1
+FROM (VALUES (NULL::inet), ('192.168.100.128/25'), ('192.168.100.0/8'),
+ ('172.168.10.126/16'), ('172.18.103.126/24'), ('192.188.13.16/32')) x(v)
+WHERE hashinet(v)::bit(32) != hashinetextended(v, 0)::bit(32)
+ OR hashinet(v)::bit(32) = hashinetextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hash_numeric(149484958.204628457)::bit(32) as standard,
+ hash_numeric_extended(149484958.204628457, 0)::bit(32) as extended0,
+ hash_numeric_extended(149484958.204628457, 1)::bit(32) as extended1
+FROM (VALUES (0), (1.149484958), (17.149484958), (42.149484958), (149484958.550273), (2071124898672)) x(v)
+WHERE hash_numeric(v)::bit(32) != hash_numeric_extended(v, 0)::bit(32)
+ OR hash_numeric(v)::bit(32) = hash_numeric_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------------------------------+----------------------------------+----------------------------------
+ 0 | 11010011100111010001100101110111 | 11010011100111010001100101110111 | 11000110010011000001110000110111
+(1 row)
+
+SELECT v as value, hash_array(v)::bit(32) as standard,
+ hash_array_extended(v, 0)::bit(32) as extended0,
+ hash_array_extended(v, 1)::bit(32) as extended1
+FROM (VALUES ('{0}'::int4[]), ('{0,1,2,3,4}'), ('{17,18,19,20}'), ('{42,34,65,98}'),
+ ('{550273,590027, 870273}'), ('{207112489, 807112489}')) x(v)
+WHERE hash_array(v)::bit(32) != hash_array_extended(v, 0)::bit(32)
+ OR hash_array(v)::bit(32) = hash_array_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashbpchar(v)::bit(32) as standard,
+ hashbpcharextended(v, 0)::bit(32) as extended0,
+ hashbpcharextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashbpchar(v)::bit(32) != hashbpcharextended(v, 0)::bit(32)
+ OR hashbpchar(v)::bit(32) = hashbpcharextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, time_hash(v)::bit(32) as standard,
+ time_hash_extended(v, 0)::bit(32) as extended0,
+ time_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::time), ('11:09:59'), ('1:09:59'), ('11:59:59'), ('7:9:59'), ('5:15:59')) x(v)
+WHERE time_hash(v)::bit(32) != time_hash_extended(v, 0)::bit(32)
+ OR time_hash(v)::bit(32) = time_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, timetz_hash(v)::bit(32) as standard,
+ timetz_hash_extended(v, 0)::bit(32) as extended0,
+ timetz_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::timetz), ('00:11:52.518762-07'), ('00:11:52.51762-08'),
+ ('00:11:52.62-01'), ('00:11:52.62+01'), ('11:59:59+04')) x(v)
+WHERE timetz_hash(v)::bit(32) != timetz_hash_extended(v, 0)::bit(32)
+ OR timetz_hash(v)::bit(32) = timetz_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, timestamp_hash('2017-08-22 00:09:59')::bit(32) as standard,
+ timestamp_hash_extended('2017-08-22 00:09:59', 0)::bit(32) as extended0,
+ timestamp_hash_extended('2017-08-22 00:09:59', 1)::bit(32) as extended1
+FROM (VALUES (NULL::timestamp), ('2017-08-22 00:09:59.518762'), ('2015-08-20 00:11:52.51762-08'),
+ ('2017-05-22 00:11:52.62-01'), ('2013-08-22 00:11:52.62+01'), ('2013-08-22 11:59:59+04')) x(v)
+WHERE timestamp_hash(v)::bit(32) != timestamp_hash_extended(v, 0)::bit(32)
+ OR timestamp_hash(v)::bit(32) = timestamp_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, interval_hash(v)::bit(32) as standard,
+ interval_hash_extended(v, 0)::bit(32) as extended0,
+ interval_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::interval), ('5 month 7 day 46 minutes'), ('1 year 7 day 46 minutes'),
+ ('1 year 7 month 20 day 46 minutes'), ('5 month'),
+ ('17 year 11 month 7 day 9 hours 46 minutes 5 seconds')) x(v)
+WHERE interval_hash(v)::bit(32) != interval_hash_extended(v, 0)::bit(32)
+ OR interval_hash(v)::bit(32) = interval_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, uuid_hash('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11')::bit(32) as standard,
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 0)::bit(32) as extended0,
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 1)::bit(32) as extended1
+FROM (VALUES (NULL::uuid), ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'),
+ ('5a9ba4ac-8d6f-11e7-bb31-be2e44b06b34'), ('99c6705c-d939-461c-a3c9-1690ad64ed7b'),
+ ('7deed3ca-8d6f-11e7-bb31-be2e44b06b34'), ('9ad46d4f-6f2a-4edd-aadb-745993928e1e')) x(v)
+WHERE uuid_hash(v)::bit(32) != uuid_hash_extended(v, 0)::bit(32)
+ OR uuid_hash(v)::bit(32) = uuid_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, pg_lsn_hash('16/B374D84')::bit(32) as standard,
+ pg_lsn_hash_extended('16/B374D84', 0)::bit(32) as extended0,
+ pg_lsn_hash_extended('16/B374D84', 1)::bit(32) as extended1
+FROM (VALUES (NULL::pg_lsn), ('16/B374D84'), ('30/B374D84'),
+ ('255/B374D84'), ('25/B379D90'), ('900/F37FD90')) x(v)
+WHERE pg_lsn_hash(v)::bit(32) != pg_lsn_hash_extended(v, 0)::bit(32)
+ OR pg_lsn_hash(v)::bit(32) = pg_lsn_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
+SELECT v as value, hashenum(v)::bit(32) as standard,
+ hashenumextended(v, 0)::bit(32) as extended0,
+ hashenumextended(v, 1)::bit(32) as extended1
+FROM (VALUES ('sad'::mood), ('ok'), ('happy')) x(v)
+WHERE hashenum(v)::bit(32) != hashenumextended(v, 0)::bit(32)
+ OR hashenum(v)::bit(32) = hashenumextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+DROP TYPE mood;
+SELECT v as value, jsonb_hash(v)::bit(32) as standard,
+ jsonb_hash_extended(v, 0)::bit(32) as extended0,
+ jsonb_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::jsonb), ('{"a": "aaa bbb ddd ccc", "b": ["eee fff ggg"], "c": {"d": "hhh iii"}}'),
+ ('{"foo": [true, "bar"], "tags": {"e": 1, "f": null}}'), ('{"g": {"h": "value"}}')) x(v)
+WHERE jsonb_hash(v)::bit(32) != jsonb_hash_extended(v, 0)::bit(32)
+ OR jsonb_hash(v)::bit(32) = jsonb_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-----------------------------------------------------------------------+----------------------------------+----------------------------------+----------------------------------
+ {"a": "aaa bbb ddd ccc", "b": ["eee fff ggg"], "c": {"d": "hhh iii"}} | 11111001011011010100101000111111 | 11111001011011010100101000110010 | 00000001000000001100100010100001
+ {"foo": [true, "bar"], "tags": {"e": 1, "f": null}} | 01100000110111101101010110011111 | 11111100000110110000010011110011 | 00110110101000100101100001101110
+ {"g": {"h": "value"}} | 00011000010101110001111000110101 | 00011000010101110001111000110111 | 11110000000101100100001100001110
+(3 rows)
+
+SELECT v as value, hash_range(v)::bit(32) as standard,
+ hash_range_extended(v, 0)::bit(32) as extended0,
+ hash_range_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (int4range(10, 20)), (int4range(23, 43)), (int4range(5675, 550273)),
+ (int4range(550274, 1550274)), (int4range(1550275, 208112489))) x(v)
+WHERE hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
+ OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+------------------+----------------------------------+----------------------------------+----------------------------------
+ [550274,1550274) | 00111001100001000011001110101110 | 00111001100001000011001110101111 | 10000101010111101010110000001111
+(1 row)
+
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index eefdeea..2fd3f2b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/sql/hash_func.sql b/src/test/regress/sql/hash_func.sql
new file mode 100644
index 0000000..560a75b
--- /dev/null
+++ b/src/test/regress/sql/hash_func.sql
@@ -0,0 +1,179 @@
+--
+-- Test hash functions
+--
+
+SELECT v as value, hashint2(v)::bit(32) as standard,
+ hashint2extended(v, 0)::bit(32) as extended0,
+ hashint2extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0::int2), (1::int2), (17::int2), (42::int2)) x(v)
+WHERE hashint2(v)::bit(32) != hashint2extended(v, 0)::bit(32)
+ OR hashint2(v)::bit(32) = hashint2extended(v, 1)::bit(32);
+SELECT v as value, hashint4(v)::bit(32) as standard,
+ hashint4extended(v, 0)::bit(32) as extended0,
+ hashint4extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
+ OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
+SELECT v as value, hashint8(v)::bit(32) as standard,
+ hashint8extended(v, 0)::bit(32) as extended0,
+ hashint8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashint8(v)::bit(32) != hashint8extended(v, 0)::bit(32)
+ OR hashint8(v)::bit(32) = hashint8extended(v, 1)::bit(32);
+SELECT v as value, hashfloat4(v)::bit(32) as standard,
+ hashfloat4extended(v, 0)::bit(32) as extended0,
+ hashfloat4extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashfloat4(v)::bit(32) != hashfloat4extended(v, 0)::bit(32)
+ OR hashfloat4(v)::bit(32) = hashfloat4extended(v, 1)::bit(32);
+SELECT v as value, hashfloat8(v)::bit(32) as standard,
+ hashfloat8extended(v, 0)::bit(32) as extended0,
+ hashfloat8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashfloat8(v)::bit(32) != hashfloat8extended(v, 0)::bit(32)
+ OR hashfloat8(v)::bit(32) = hashfloat8extended(v, 1)::bit(32);
+SELECT v as value, hashoid(v)::bit(32) as standard,
+ hashoidextended(v, 0)::bit(32) as extended0,
+ hashoidextended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashoid(v)::bit(32) != hashoidextended(v, 0)::bit(32)
+ OR hashoid(v)::bit(32) = hashoidextended(v, 1)::bit(32);
+SELECT v as value, hashchar(v)::bit(32) as standard,
+ hashcharextended(v, 0)::bit(32) as extended0,
+ hashcharextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::"char"), ('1'), ('x'), ('X'), ('p'), ('N')) x(v)
+WHERE hashchar(v)::bit(32) != hashcharextended(v, 0)::bit(32)
+ OR hashchar(v)::bit(32) = hashcharextended(v, 1)::bit(32);
+SELECT v as value, hashname(v)::bit(32) as standard,
+ hashnameextended(v, 0)::bit(32) as extended0,
+ hashnameextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashname(v)::bit(32) != hashnameextended(v, 0)::bit(32)
+ OR hashname(v)::bit(32) = hashnameextended(v, 1)::bit(32);
+SELECT v as value, hashtext(v)::bit(32) as standard,
+ hashtextextended(v, 0)::bit(32) as extended0,
+ hashtextextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashtext(v)::bit(32) != hashtextextended(v, 0)::bit(32)
+ OR hashtext(v)::bit(32) = hashtextextended(v, 1)::bit(32);
+--SELECT v as value, hashvarlena(internal_type??)::bit(32) as standard,
+-- hashvarlenaextended(internal type??, 0)::bit(32) as extended0,
+-- hashvarlenaextended(internal type??, 1)::bit(32) as extended1;
+SELECT v as value, hashoidvector(v)::bit(32) as standard,
+ hashoidvectorextended(v, 0)::bit(32) as extended0,
+ hashoidvectorextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::oidvector), ('0 1 2 3 4'), ('17 18 19 20'), ('42 43 42 45'),
+ ('550273 550273 570274'), ('207112489 207112499')) x(v)
+WHERE hashoidvector(v)::bit(32) != hashoidvectorextended(v, 0)::bit(32)
+ OR hashoidvector(v)::bit(32) = hashoidvectorextended(v, 1)::bit(32);
+SELECT v as value, hash_aclitem(v)::bit(32) as standard,
+ hash_aclitem_extended(v, 0)::bit(32) as extended0,
+ hash_aclitem_extended(v, 1)::bit(32) as extended1
+FROM (SELECT DISTINCT(relacl[1]) FROM pg_class LIMIT 10) x(v)
+WHERE hash_aclitem(v)::bit(32) != hash_aclitem_extended(v, 0)::bit(32)
+ OR hash_aclitem(v)::bit(32) = hash_aclitem_extended(v, 1)::bit(32);
+SELECT v as value, hashmacaddr(v)::bit(32) as standard,
+ hashmacaddrextended(v, 0)::bit(32) as extended0,
+ hashmacaddrextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::macaddr), ('08:00:2b:01:02:04'), ('08:00:2b:01:02:04'),
+ ('e2:7f:51:3e:70:49'), ('d6:a9:4a:78:1c:d5'), ('ea:29:b1:5e:1f:a5')) x(v)
+WHERE hashmacaddr(v)::bit(32) != hashmacaddrextended(v, 0)::bit(32)
+ OR hashmacaddr(v)::bit(32) = hashmacaddrextended(v, 1)::bit(32);
+SELECT v as value, hashmacaddr8(v)::bit(32) as standard,
+ hashmacaddr8extended(v, 0)::bit(32) as extended0,
+ hashmacaddr8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::macaddr8), ('08:00:2b:01:02:04'), ('08:00:2b:01:02:04'),
+ ('e2:7f:51:3e:70:49'), ('d6:a9:4a:78:1c:d5'), ('ea:29:b1:5e:1f:a5')) x(v)
+WHERE hashmacaddr8(v)::bit(32) != hashmacaddr8extended(v, 0)::bit(32)
+ OR hashmacaddr8(v)::bit(32) = hashmacaddr8extended(v, 1)::bit(32);
+SELECT v as value, hashinet('192.168.100.128/25')::bit(32) as standard,
+ hashinetextended('192.168.100.128/25', 0)::bit(32) as extended0,
+ hashinetextended('192.168.100.128/25', 1)::bit(32) as extended1
+FROM (VALUES (NULL::inet), ('192.168.100.128/25'), ('192.168.100.0/8'),
+ ('172.168.10.126/16'), ('172.18.103.126/24'), ('192.188.13.16/32')) x(v)
+WHERE hashinet(v)::bit(32) != hashinetextended(v, 0)::bit(32)
+ OR hashinet(v)::bit(32) = hashinetextended(v, 1)::bit(32);
+SELECT v as value, hash_numeric(149484958.204628457)::bit(32) as standard,
+ hash_numeric_extended(149484958.204628457, 0)::bit(32) as extended0,
+ hash_numeric_extended(149484958.204628457, 1)::bit(32) as extended1
+FROM (VALUES (0), (1.149484958), (17.149484958), (42.149484958), (149484958.550273), (2071124898672)) x(v)
+WHERE hash_numeric(v)::bit(32) != hash_numeric_extended(v, 0)::bit(32)
+ OR hash_numeric(v)::bit(32) = hash_numeric_extended(v, 1)::bit(32);
+SELECT v as value, hash_array(v)::bit(32) as standard,
+ hash_array_extended(v, 0)::bit(32) as extended0,
+ hash_array_extended(v, 1)::bit(32) as extended1
+FROM (VALUES ('{0}'::int4[]), ('{0,1,2,3,4}'), ('{17,18,19,20}'), ('{42,34,65,98}'),
+ ('{550273,590027, 870273}'), ('{207112489, 807112489}')) x(v)
+WHERE hash_array(v)::bit(32) != hash_array_extended(v, 0)::bit(32)
+ OR hash_array(v)::bit(32) = hash_array_extended(v, 1)::bit(32);
+SELECT v as value, hashbpchar(v)::bit(32) as standard,
+ hashbpcharextended(v, 0)::bit(32) as extended0,
+ hashbpcharextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashbpchar(v)::bit(32) != hashbpcharextended(v, 0)::bit(32)
+ OR hashbpchar(v)::bit(32) = hashbpcharextended(v, 1)::bit(32);
+SELECT v as value, time_hash(v)::bit(32) as standard,
+ time_hash_extended(v, 0)::bit(32) as extended0,
+ time_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::time), ('11:09:59'), ('1:09:59'), ('11:59:59'), ('7:9:59'), ('5:15:59')) x(v)
+WHERE time_hash(v)::bit(32) != time_hash_extended(v, 0)::bit(32)
+ OR time_hash(v)::bit(32) = time_hash_extended(v, 1)::bit(32);
+SELECT v as value, timetz_hash(v)::bit(32) as standard,
+ timetz_hash_extended(v, 0)::bit(32) as extended0,
+ timetz_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::timetz), ('00:11:52.518762-07'), ('00:11:52.51762-08'),
+ ('00:11:52.62-01'), ('00:11:52.62+01'), ('11:59:59+04')) x(v)
+WHERE timetz_hash(v)::bit(32) != timetz_hash_extended(v, 0)::bit(32)
+ OR timetz_hash(v)::bit(32) = timetz_hash_extended(v, 1)::bit(32);
+SELECT v as value, timestamp_hash('2017-08-22 00:09:59')::bit(32) as standard,
+ timestamp_hash_extended('2017-08-22 00:09:59', 0)::bit(32) as extended0,
+ timestamp_hash_extended('2017-08-22 00:09:59', 1)::bit(32) as extended1
+FROM (VALUES (NULL::timestamp), ('2017-08-22 00:09:59.518762'), ('2015-08-20 00:11:52.51762-08'),
+ ('2017-05-22 00:11:52.62-01'), ('2013-08-22 00:11:52.62+01'), ('2013-08-22 11:59:59+04')) x(v)
+WHERE timestamp_hash(v)::bit(32) != timestamp_hash_extended(v, 0)::bit(32)
+ OR timestamp_hash(v)::bit(32) = timestamp_hash_extended(v, 1)::bit(32);
+SELECT v as value, interval_hash(v)::bit(32) as standard,
+ interval_hash_extended(v, 0)::bit(32) as extended0,
+ interval_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::interval), ('5 month 7 day 46 minutes'), ('1 year 7 day 46 minutes'),
+ ('1 year 7 month 20 day 46 minutes'), ('5 month'),
+ ('17 year 11 month 7 day 9 hours 46 minutes 5 seconds')) x(v)
+WHERE interval_hash(v)::bit(32) != interval_hash_extended(v, 0)::bit(32)
+ OR interval_hash(v)::bit(32) = interval_hash_extended(v, 1)::bit(32);
+SELECT v as value, uuid_hash('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11')::bit(32) as standard,
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 0)::bit(32) as extended0,
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 1)::bit(32) as extended1
+FROM (VALUES (NULL::uuid), ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'),
+ ('5a9ba4ac-8d6f-11e7-bb31-be2e44b06b34'), ('99c6705c-d939-461c-a3c9-1690ad64ed7b'),
+ ('7deed3ca-8d6f-11e7-bb31-be2e44b06b34'), ('9ad46d4f-6f2a-4edd-aadb-745993928e1e')) x(v)
+WHERE uuid_hash(v)::bit(32) != uuid_hash_extended(v, 0)::bit(32)
+ OR uuid_hash(v)::bit(32) = uuid_hash_extended(v, 1)::bit(32);
+SELECT v as value, pg_lsn_hash('16/B374D84')::bit(32) as standard,
+ pg_lsn_hash_extended('16/B374D84', 0)::bit(32) as extended0,
+ pg_lsn_hash_extended('16/B374D84', 1)::bit(32) as extended1
+FROM (VALUES (NULL::pg_lsn), ('16/B374D84'), ('30/B374D84'),
+ ('255/B374D84'), ('25/B379D90'), ('900/F37FD90')) x(v)
+WHERE pg_lsn_hash(v)::bit(32) != pg_lsn_hash_extended(v, 0)::bit(32)
+ OR pg_lsn_hash(v)::bit(32) = pg_lsn_hash_extended(v, 1)::bit(32);
+CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
+SELECT v as value, hashenum(v)::bit(32) as standard,
+ hashenumextended(v, 0)::bit(32) as extended0,
+ hashenumextended(v, 1)::bit(32) as extended1
+FROM (VALUES ('sad'::mood), ('ok'), ('happy')) x(v)
+WHERE hashenum(v)::bit(32) != hashenumextended(v, 0)::bit(32)
+ OR hashenum(v)::bit(32) = hashenumextended(v, 1)::bit(32);
+DROP TYPE mood;
+SELECT v as value, jsonb_hash(v)::bit(32) as standard,
+ jsonb_hash_extended(v, 0)::bit(32) as extended0,
+ jsonb_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::jsonb), ('{"a": "aaa bbb ddd ccc", "b": ["eee fff ggg"], "c": {"d": "hhh iii"}}'),
+ ('{"foo": [true, "bar"], "tags": {"e": 1, "f": null}}'), ('{"g": {"h": "value"}}')) x(v)
+WHERE jsonb_hash(v)::bit(32) != jsonb_hash_extended(v, 0)::bit(32)
+ OR jsonb_hash(v)::bit(32) = jsonb_hash_extended(v, 1)::bit(32);
+SELECT v as value, hash_range(v)::bit(32) as standard,
+ hash_range_extended(v, 0)::bit(32) as extended0,
+ hash_range_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (int4range(10, 20)), (int4range(23, 43)), (int4range(5675, 550273)),
+ (int4range(550274, 1550274)), (int4range(1550275, 208112489))) x(v)
+WHERE hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
+ OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);
--
2.6.2
On Wed, Aug 30, 2017 at 10:43 AM, amul sul <sulamul@gmail.com> wrote:
Thanks for the suggestion, I have updated 0002-patch accordingly.
Using this I found some strange behaviours as follows:
1) standard and extended0 output for the jsonb_hash case is not the same.
2) standard and extended0 output for the hash_range case is not the same
when the input is int4range(550274, 1550274); the other cases in the patch
are fine. This can be reproduced with other values as well, e.g.
int8range(1570275, 208112489).
Will look into this tomorrow.
Those sound like bugs in your patch. Specifically:
+ /* Same approach as hash_range */
+ result = hash_uint32_extended((uint32) flags, seed);
+ result ^= lower_hash;
+ result = (result << 1) | (result >> 63);
+ result ^= upper_hash;
That doesn't give compatible results. The easiest thing to do might
be to rotate the high 32 bits and the low 32 bits separately.
JsonbHashScalarValueExtended has the same problem. Maybe create a
helper function that does something like this (untested):
((x << 1) & UINT64CONST(0xfffffffefffffffe)) | ((x >> 31) &
UINT64CONST(0x100000001))
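Packaged as a helper macro, that expression would look something like the
sketch below (the macro name is only a suggestion, not something from the
patch). Rotating each half by itself keeps the low 32 bits of the 64-bit
result a function of the low 32 bits alone, which is what keeps the
seed == 0 output compatible with the standard 32-bit hash functions:

/*
 * Sketch: rotate the high and low 32 bits separately, so that the low
 * 32 bits of the result never depend on the high 32 bits.
 */
#define ROTATE_HIGH_AND_LOW_32BITS(v) \
	((((v) << 1) & UINT64CONST(0xfffffffefffffffe)) | \
	 (((v) >> 31) & UINT64CONST(0x100000001)))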
Another case which I want to discuss is, the extended and standard versions
of hashfloat4, hashfloat8 & hash_numeric will have the same output for a
zero value irrespective of the seed value. Do you think we need a fix for
this?
Yes, I think you should return the seed rather than 0 in the cases
where the current code hard-codes a 0 return. That will give the same
behavior in the seed == 0 case while cheaply getting us something a
bit different when there is a seed.
BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
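The one-line fmgr.h change in the v4 diffstat below is presumably just such
a macro. A minimal sketch, following the existing PG_RETURN_* pattern and
assuming postgres.h already provides UInt64GetDatum:

/* Sketch only -- mirrors the style of PG_RETURN_INT64 in fmgr.h */
#define PG_RETURN_UINT64(x)  return UInt64GetDatum(x)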
On Wed, Aug 30, 2017 at 9:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 30, 2017 at 10:43 AM, amul sul <sulamul@gmail.com> wrote:
Thanks for the suggestion, I have updated 0002-patch accordingly.
Using this I found some strange behaviours as follows:
1) standard and extended0 output for the jsonb_hash case is not the same.
2) standard and extended0 output for the hash_range case is not the same
when the input is int4range(550274, 1550274); the other cases in the patch
are fine. This can be reproduced with other values as well, e.g.
int8range(1570275, 208112489).
Will look into this tomorrow.
Those sound like bugs in your patch. Specifically:
+ /* Same approach as hash_range */
+ result = hash_uint32_extended((uint32) flags, seed);
+ result ^= lower_hash;
+ result = (result << 1) | (result >> 63);
+ result ^= upper_hash;
Yes, you are correct.
That doesn't give compatible results. The easiest thing to do might
be to rotate the high 32 bits and the low 32 bits separately.
JsonbHashScalarValueExtended has the same problem. Maybe create a
helper function that does something like this (untested):
((x << 1) & UINT64CONST(0xfffffffefffffffe)) | ((x >> 31) &
UINT64CONST(0x100000001))
This is working as expected; I also tested it by executing the following
SQL multiple times:
SELECT v as value, hash_range(v)::bit(32) as standard,
       hash_range_extended(v, 0)::bit(32) as extended0,
       hash_range_extended(v, 1)::bit(32) as extended1
FROM (VALUES
        (int8range(floor(random() * 100)::int8, floor(random() * 1000)::int8)),
        (int8range(floor(random() * 1000)::int8, floor(random() * 10000)::int8)),
        (int8range(floor(random() * 10000)::int8, floor(random() * 100000)::int8)),
        (int8range(floor(random() * 10000000)::int8, floor(random() * 100000000)::int8)),
        (int8range(floor(random() * 100000000)::int8, floor(random() * 1000000000)::int8))) x(v)
WHERE hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
   OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);
Another case which I want to discuss is, the extended and standard versions
of hashfloat4, hashfloat8 & hash_numeric will have the same output for a
zero value irrespective of the seed value. Do you think we need a fix for
this?

Yes, I think you should return the seed rather than 0 in the cases
where the current code hard-codes a 0 return. That will give the same
behavior in the seed == 0 case while cheaply getting us something a
bit different when there is a seed.

Fixed in the attached version.
BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this
patch.

Added in the attached version.
Thanks for all your suggestions.
Regards,
Amul
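The property all of these regression queries check -- with seed 0 the low
32 bits of each extended hash match the standard hash, and with a nonzero
seed the output changes -- can also be stated directly in C. A hypothetical
standalone check, sketched against the patch's hash_any interface (not part
of the patch itself):

#include "postgres.h"
#include "access/hash.h"

/* Must hold for every key: a zero seed keeps the low 32 bits compatible. */
static bool
low32_matches(const unsigned char *key, int keylen)
{
	uint32		std = DatumGetUInt32(hash_any(key, keylen));
	uint64		ext = DatumGetUInt64(hash_any_extended(key, keylen, 0));

	return std == (uint32) ext;
}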
Attachments:
0001-add-optional-second-hash-proc-v4.patch
From 0fc82b181df4274ba8a3e7703e0dd5e9d06969be Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Fri, 18 Aug 2017 15:28:26 +0530
Subject: [PATCH 1/2] add-optional-second-hash-proc-v4
v3:
Updated w.r.t. Robert's suggestion in
message-id : CA%2BTgmoYPvoTMGJYkVBA%3D2j1o1wpZ-WZCYiS7_B-AqqZBkWT4HQ%40mail.gmail.com
v2:
Extended remaining hash function.
v1:
Patch by Robert Haas.
---
doc/src/sgml/xindex.sgml | 11 +-
src/backend/access/hash/hashfunc.c | 372 +++++++++++++++++++++++++++-
src/backend/access/hash/hashpage.c | 2 +-
src/backend/access/hash/hashutil.c | 6 +-
src/backend/access/hash/hashvalidate.c | 42 +++-
src/backend/commands/opclasscmds.c | 34 ++-
src/backend/utils/adt/acl.c | 15 ++
src/backend/utils/adt/arrayfuncs.c | 79 ++++++
src/backend/utils/adt/date.c | 21 ++
src/backend/utils/adt/jsonb_op.c | 43 ++++
src/backend/utils/adt/jsonb_util.c | 43 ++++
src/backend/utils/adt/mac.c | 9 +
src/backend/utils/adt/mac8.c | 9 +
src/backend/utils/adt/network.c | 10 +
src/backend/utils/adt/numeric.c | 60 +++++
src/backend/utils/adt/pg_lsn.c | 6 +
src/backend/utils/adt/rangetypes.c | 63 +++++
src/backend/utils/adt/timestamp.c | 19 ++
src/backend/utils/adt/uuid.c | 8 +
src/backend/utils/adt/varchar.c | 18 ++
src/backend/utils/cache/lsyscache.c | 8 +-
src/backend/utils/cache/typcache.c | 58 ++++-
src/include/access/hash.h | 21 +-
src/include/catalog/pg_amproc.h | 36 +++
src/include/catalog/pg_proc.h | 54 ++++
src/include/fmgr.h | 1 +
src/include/utils/jsonb.h | 2 +
src/include/utils/typcache.h | 4 +
src/test/regress/expected/alter_generic.out | 4 +-
29 files changed, 1018 insertions(+), 40 deletions(-)
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 333a36c..0f3c46b 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -436,7 +436,8 @@
</table>
<para>
- Hash indexes require one support function, shown in <xref
+ Hash indexes require one support function, and allow a second one to be
+ supplied at the operator class author's option, as shown in <xref
linkend="xindex-hash-support-table">.
</para>
@@ -451,9 +452,15 @@
</thead>
<tbody>
<row>
- <entry>Compute the hash value for a key</entry>
+ <entry>Compute the 32-bit hash value for a key</entry>
<entry>1</entry>
</row>
+ <row>
+ <entry>
+ Compute the 64-bit hash value for a key given a 64-bit salt
+ </entry>
+ <entry>2</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index a127f3f..dcabf7e 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -47,18 +47,36 @@ hashchar(PG_FUNCTION_ARGS)
}
Datum
+hashcharextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((int32) PG_GETARG_CHAR(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint2(PG_FUNCTION_ARGS)
{
return hash_uint32((int32) PG_GETARG_INT16(0));
}
Datum
+hashint2extended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((int32) PG_GETARG_INT16(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint4(PG_FUNCTION_ARGS)
{
return hash_uint32(PG_GETARG_INT32(0));
}
Datum
+hashint4extended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended(PG_GETARG_INT32(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashint8(PG_FUNCTION_ARGS)
{
/*
@@ -79,18 +97,43 @@ hashint8(PG_FUNCTION_ARGS)
}
Datum
+hashint8extended(PG_FUNCTION_ARGS)
+{
+ /* Same approach as hashint8 */
+ int64 val = PG_GETARG_INT64(0);
+ uint32 lohalf = (uint32) val;
+ uint32 hihalf = (uint32) (val >> 32);
+
+ lohalf ^= (val >= 0) ? hihalf : ~hihalf;
+
+ return hash_uint32_extended(lohalf, PG_GETARG_INT64(1));
+}
+
+Datum
hashoid(PG_FUNCTION_ARGS)
{
return hash_uint32((uint32) PG_GETARG_OID(0));
}
Datum
+hashoidextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((uint32) PG_GETARG_OID(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashenum(PG_FUNCTION_ARGS)
{
return hash_uint32((uint32) PG_GETARG_OID(0));
}
Datum
+hashenumextended(PG_FUNCTION_ARGS)
+{
+ return hash_uint32_extended((uint32) PG_GETARG_OID(0), PG_GETARG_INT64(1));
+}
+
+Datum
hashfloat4(PG_FUNCTION_ARGS)
{
float4 key = PG_GETARG_FLOAT4(0);
@@ -117,6 +160,21 @@ hashfloat4(PG_FUNCTION_ARGS)
}
Datum
+hashfloat4extended(PG_FUNCTION_ARGS)
+{
+ float4 key = PG_GETARG_FLOAT4(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ float8 key8;
+
+ /* Same approach as hashfloat4 */
+ if (key == (float4) 0)
+ PG_RETURN_UINT64(seed);
+ key8 = key;
+
+ return hash_any_extended((unsigned char *) &key8, sizeof(key8), seed);
+}
+
+Datum
hashfloat8(PG_FUNCTION_ARGS)
{
float8 key = PG_GETARG_FLOAT8(0);
@@ -133,6 +191,19 @@ hashfloat8(PG_FUNCTION_ARGS)
}
Datum
+hashfloat8extended(PG_FUNCTION_ARGS)
+{
+ float8 key = PG_GETARG_FLOAT8(0);
+ uint64 seed = PG_GETARG_INT64(1);
+
+ /* Same approach as hashfloat8 */
+ if (key == (float8) 0)
+ PG_RETURN_UINT64(seed);
+
+ return hash_any_extended((unsigned char *) &key, sizeof(key), seed);
+}
+
+Datum
hashoidvector(PG_FUNCTION_ARGS)
{
oidvector *key = (oidvector *) PG_GETARG_POINTER(0);
@@ -141,6 +212,16 @@ hashoidvector(PG_FUNCTION_ARGS)
}
Datum
+hashoidvectorextended(PG_FUNCTION_ARGS)
+{
+ oidvector *key = (oidvector *) PG_GETARG_POINTER(0);
+
+ return hash_any_extended((unsigned char *) key->values,
+ key->dim1 * sizeof(Oid),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashname(PG_FUNCTION_ARGS)
{
char *key = NameStr(*PG_GETARG_NAME(0));
@@ -149,6 +230,15 @@ hashname(PG_FUNCTION_ARGS)
}
Datum
+hashnameextended(PG_FUNCTION_ARGS)
+{
+ char *key = NameStr(*PG_GETARG_NAME(0));
+
+ return hash_any_extended((unsigned char *) key, strlen(key),
+ PG_GETARG_INT64(1));
+}
+
+Datum
hashtext(PG_FUNCTION_ARGS)
{
text *key = PG_GETARG_TEXT_PP(0);
@@ -168,6 +258,22 @@ hashtext(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashtextextended(PG_FUNCTION_ARGS)
+{
+ text *key = PG_GETARG_TEXT_PP(0);
+ Datum result;
+
+ /* Same approach as hashtext */
+ result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
+ VARSIZE_ANY_EXHDR(key),
+ PG_GETARG_INT64(1));
+
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
+
/*
* hashvarlena() can be used for any varlena datatype in which there are
* no non-significant bits, ie, distinct bitpatterns never compare as equal.
@@ -187,6 +293,21 @@ hashvarlena(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashvarlenaextended(PG_FUNCTION_ARGS)
+{
+ struct varlena *key = PG_GETARG_VARLENA_PP(0);
+ Datum result;
+
+ result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
+ VARSIZE_ANY_EXHDR(key),
+ PG_GETARG_INT64(1));
+
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
+
/*
* This hash function was written by Bob Jenkins
* (bob_jenkins@burtleburtle.net), and superficially adapted
@@ -502,7 +623,227 @@ hash_any(register const unsigned char *k, register int keylen)
}
/*
- * hash_uint32() -- hash a 32-bit value
+ * hash_any_extended() -- hash into a 64-bit value, using an optional seed
+ * k : the key (the unaligned variable-length array of bytes)
+ * len : the length of the key, counting by bytes
+ * seed : a 64-bit seed (0 means no seed)
+ *
+ * Returns a uint64 value. Otherwise similar to hash_any.
+ */
+Datum
+hash_any_extended(register const unsigned char *k, register int keylen,
+ uint64 seed)
+{
+ register uint32 a,
+ b,
+ c,
+ len;
+
+ /* Set up the internal state */
+ len = keylen;
+ a = b = c = 0x9e3779b9 + len + 3923095;
+
+ /* If the seed is non-zero, use it to perturb the internal state. */
+ if (seed != 0)
+ {
+ /*
+ * In essence, the seed is treated as part of the data being hashed,
+ * but for simplicity, we pretend that it's padded with four bytes of
+ * zeroes so that the seed constitutes a 4-byte chunk.
+ */
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ /* If the source pointer is word-aligned, we use word-wide fetches */
+ if (((uintptr_t) k & UINT32_ALIGN_MASK) == 0)
+ {
+ /* Code path for aligned source data */
+ register const uint32 *ka = (const uint32 *) k;
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+ a += ka[0];
+ b += ka[1];
+ c += ka[2];
+ mix(a, b, c);
+ ka += 3;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+ k = (const unsigned char *) ka;
+#ifdef WORDS_BIGENDIAN
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 8);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 16);
+ /* fall through */
+ case 5:
+ b += ((uint32) k[4] << 24);
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 8);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 16);
+ /* fall through */
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len)
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ /* fall through */
+ case 10:
+ c += ((uint32) k[9] << 16);
+ /* fall through */
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ /* fall through */
+ case 8:
+ b += ka[1];
+ a += ka[0];
+ break;
+ case 7:
+ b += ((uint32) k[6] << 16);
+ /* fall through */
+ case 6:
+ b += ((uint32) k[5] << 8);
+ /* fall through */
+ case 5:
+ b += k[4];
+ /* fall through */
+ case 4:
+ a += ka[0];
+ break;
+ case 3:
+ a += ((uint32) k[2] << 16);
+ /* fall through */
+ case 2:
+ a += ((uint32) k[1] << 8);
+ /* fall through */
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+ else
+ {
+ /* Code path for non-aligned source data */
+
+ /* handle most of the key */
+ while (len >= 12)
+ {
+#ifdef WORDS_BIGENDIAN
+ a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));
+ b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));
+ c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));
+#else /* !WORDS_BIGENDIAN */
+ a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
+ b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
+ c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
+#endif /* WORDS_BIGENDIAN */
+ mix(a, b, c);
+ k += 12;
+ len -= 12;
+ }
+
+ /* handle the last 11 bytes */
+#ifdef WORDS_BIGENDIAN
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 8);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 24);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += k[7];
+ case 7:
+ b += ((uint32) k[6] << 8);
+ case 6:
+ b += ((uint32) k[5] << 16);
+ case 5:
+ b += ((uint32) k[4] << 24);
+ case 4:
+ a += k[3];
+ case 3:
+ a += ((uint32) k[2] << 8);
+ case 2:
+ a += ((uint32) k[1] << 16);
+ case 1:
+ a += ((uint32) k[0] << 24);
+ /* case 0: nothing left to add */
+ }
+#else /* !WORDS_BIGENDIAN */
+ switch (len) /* all the case statements fall through */
+ {
+ case 11:
+ c += ((uint32) k[10] << 24);
+ case 10:
+ c += ((uint32) k[9] << 16);
+ case 9:
+ c += ((uint32) k[8] << 8);
+ /* the lowest byte of c is reserved for the length */
+ case 8:
+ b += ((uint32) k[7] << 24);
+ case 7:
+ b += ((uint32) k[6] << 16);
+ case 6:
+ b += ((uint32) k[5] << 8);
+ case 5:
+ b += k[4];
+ case 4:
+ a += ((uint32) k[3] << 24);
+ case 3:
+ a += ((uint32) k[2] << 16);
+ case 2:
+ a += ((uint32) k[1] << 8);
+ case 1:
+ a += k[0];
+ /* case 0: nothing left to add */
+ }
+#endif /* WORDS_BIGENDIAN */
+ }
+
+ final(a, b, c);
+
+ /* report the result */
+ PG_RETURN_UINT64(((uint64) b << 32) | c);
+}
+
+/*
+ * hash_uint32() -- hash a 32-bit value to a 32-bit value
*
* This has the same result as
* hash_any(&k, sizeof(uint32))
@@ -523,3 +864,32 @@ hash_uint32(uint32 k)
/* report the result */
return UInt32GetDatum(c);
}
+
+/*
+ * hash_uint32_extended() -- hash a 32-bit value to a 64-bit value, with a seed
+ *
+ * Like hash_uint32, this is a convenience function.
+ */
+Datum
+hash_uint32_extended(uint32 k, uint64 seed)
+{
+ register uint32 a,
+ b,
+ c;
+
+ a = b = c = 0x9e3779b9 + (uint32) sizeof(uint32) + 3923095;
+
+ if (seed != 0)
+ {
+ a += (uint32) (seed >> 32);
+ b += (uint32) seed;
+ mix(a, b, c);
+ }
+
+ a += k;
+
+ final(a, b, c);
+
+ /* report the result */
+ PG_RETURN_UINT64(((uint64) b << 32) | c);
+}
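
With a zero seed, hash_uint32_extended and hash_any_extended are designed so
that the low 32 bits of the result match hash_uint32 and hash_any
respectively; a nonzero seed perturbs the internal state before mixing.
Assuming both patches are applied, a quick sanity check along the lines of
the regression tests in the second patch looks like this:

SELECT hashint4(42)::bit(32) = hashint4extended(42, 0)::bit(32) AS same_low_bits,
       hashint4(42)::bit(32) = hashint4extended(42, 1)::bit(32) AS perturbed;

The first comparison should come back true and the second false, which is
the invariant the hash_func tests below assert for every type.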
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 7b2906b..0579841 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -373,7 +373,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
if (ffactor < 10)
ffactor = 10;
- procid = index_getprocid(rel, 1, HASHPROC);
+ procid = index_getprocid(rel, 1, HASHSTANDARD_PROC);
/*
* We initialize the metapage, the first N bucket pages, and the first
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 9b803af..869cbc1 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -85,7 +85,7 @@ _hash_datum2hashkey(Relation rel, Datum key)
Oid collation;
/* XXX assumes index has only one attribute */
- procinfo = index_getprocinfo(rel, 1, HASHPROC);
+ procinfo = index_getprocinfo(rel, 1, HASHSTANDARD_PROC);
collation = rel->rd_indcollation[0];
return DatumGetUInt32(FunctionCall1Coll(procinfo, collation, key));
@@ -108,10 +108,10 @@ _hash_datum2hashkey_type(Relation rel, Datum key, Oid keytype)
hash_proc = get_opfamily_proc(rel->rd_opfamily[0],
keytype,
keytype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!RegProcedureIsValid(hash_proc))
elog(ERROR, "missing support function %d(%u,%u) for index \"%s\"",
- HASHPROC, keytype, keytype,
+ HASHSTANDARD_PROC, keytype, keytype,
RelationGetRelationName(rel));
collation = rel->rd_indcollation[0];
diff --git a/src/backend/access/hash/hashvalidate.c b/src/backend/access/hash/hashvalidate.c
index 30b29cb..8b633c2 100644
--- a/src/backend/access/hash/hashvalidate.c
+++ b/src/backend/access/hash/hashvalidate.c
@@ -29,7 +29,7 @@
#include "utils/syscache.h"
-static bool check_hash_func_signature(Oid funcid, Oid restype, Oid argtype);
+static bool check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype);
/*
@@ -105,8 +105,9 @@ hashvalidate(Oid opclassoid)
/* Check procedure numbers and function signatures */
switch (procform->amprocnum)
{
- case HASHPROC:
- if (!check_hash_func_signature(procform->amproc, INT4OID,
+ case HASHSTANDARD_PROC:
+ case HASHEXTENDED_PROC:
+ if (!check_hash_func_signature(procform->amproc, procform->amprocnum,
procform->amproclefttype))
{
ereport(INFO,
@@ -264,19 +265,37 @@ hashvalidate(Oid opclassoid)
* hacks in the core hash opclass definitions.
*/
static bool
-check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
+check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype)
{
bool result = true;
+ Oid restype;
+ int16 nargs;
HeapTuple tp;
Form_pg_proc procform;
+ switch (amprocnum)
+ {
+ case HASHSTANDARD_PROC:
+ restype = INT4OID;
+ nargs = 1;
+ break;
+
+ case HASHEXTENDED_PROC:
+ restype = INT8OID;
+ nargs = 2;
+ break;
+
+ default:
+ elog(ERROR, "invalid amprocnum");
+ }
+
tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
if (!HeapTupleIsValid(tp))
elog(ERROR, "cache lookup failed for function %u", funcid);
procform = (Form_pg_proc) GETSTRUCT(tp);
if (procform->prorettype != restype || procform->proretset ||
- procform->pronargs != 1)
+ procform->pronargs != nargs)
result = false;
if (!IsBinaryCoercible(argtype, procform->proargtypes.values[0]))
@@ -290,24 +309,29 @@ check_hash_func_signature(Oid funcid, Oid restype, Oid argtype)
* identity, not just its input type, because hashvarlena() takes
* INTERNAL and allowing any such function seems too scary.
*/
- if (funcid == F_HASHINT4 &&
+ if ((funcid == F_HASHINT4 || funcid == F_HASHINT4EXTENDED) &&
(argtype == DATEOID ||
argtype == ABSTIMEOID || argtype == RELTIMEOID ||
argtype == XIDOID || argtype == CIDOID))
/* okay, allowed use of hashint4() */ ;
- else if (funcid == F_TIMESTAMP_HASH &&
+ else if ((funcid == F_TIMESTAMP_HASH ||
+ funcid == F_TIMESTAMP_HASH_EXTENDED) &&
argtype == TIMESTAMPTZOID)
/* okay, allowed use of timestamp_hash() */ ;
- else if (funcid == F_HASHCHAR &&
+ else if ((funcid == F_HASHCHAR || funcid == F_HASHCHAREXTENDED) &&
argtype == BOOLOID)
/* okay, allowed use of hashchar() */ ;
- else if (funcid == F_HASHVARLENA &&
+ else if ((funcid == F_HASHVARLENA || funcid == F_HASHVARLENAEXTENDED) &&
argtype == BYTEAOID)
/* okay, allowed use of hashvarlena() */ ;
else
result = false;
}
+ /* If the function takes a second argument, it must be a 64-bit salt. */
+ if (nargs == 2 && procform->proargtypes.values[1] != INT8OID)
+ result = false;
+
ReleaseSysCache(tp);
return result;
}
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index a31b1ac..d23e6d6 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -18,6 +18,7 @@
#include <limits.h>
#include "access/genam.h"
+#include "access/hash.h"
#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/htup_details.h"
@@ -1129,7 +1130,8 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
/*
* btree comparison procs must be 2-arg procs returning int4, while btree
* sortsupport procs must take internal and return void. hash support
- * procs must be 1-arg procs returning int4. Otherwise we don't know.
+ * proc 1 must be a 1-arg proc returning int4, while proc 2 must be a
+ * 2-arg proc returning int8. Otherwise we don't know.
*/
if (amoid == BTREE_AM_OID)
{
@@ -1172,14 +1174,28 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
}
else if (amoid == HASH_AM_OID)
{
- if (procform->pronargs != 1)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must have one argument")));
- if (procform->prorettype != INT4OID)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("hash procedures must return integer")));
+ if (member->number == HASHSTANDARD_PROC)
+ {
+ if (procform->pronargs != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must have one argument")));
+ if (procform->prorettype != INT4OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 1 must return integer")));
+ }
+ else if (member->number == HASHEXTENDED_PROC)
+ {
+ if (procform->pronargs != 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must have two arguments")));
+ if (procform->prorettype != INT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("hash procedure 2 must return bigint")));
+ }
/*
* If lefttype/righttype isn't specified, use the proc's input type
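
The opclass machinery now accepts two support procedures per hash opfamily
member: number 1 with the existing (type) -> integer signature, and number 2
with a (type, int8) -> int8 signature. As a sketch (the names here are
hypothetical, not part of the patch), registering an extended hash function
for a user-defined opfamily would look like:

CREATE OPERATOR FAMILY my_hash_fam USING hash;
CREATE FUNCTION my_hash_extended(int4, int8) RETURNS int8
    AS 'SELECT NULL::int8;' LANGUAGE SQL;
ALTER OPERATOR FAMILY my_hash_fam USING hash
    ADD FUNCTION 2 my_hash_extended(int4, int8);

A function with the wrong argument count or return type should now fail with
the new "hash procedure 2 must ..." messages added above.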
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 2efb6c9..917491e 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -16,6 +16,7 @@
#include <ctype.h>
+#include "access/hash.h"
#include "access/htup_details.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
@@ -717,6 +718,20 @@ hash_aclitem(PG_FUNCTION_ARGS)
PG_RETURN_UINT32((uint32) (a->ai_privs + a->ai_grantee + a->ai_grantor));
}
+/*
+ * Returns a 64-bit hash of an aclitem. When the seed is 0, the low 32 bits
+ * match hash_aclitem; otherwise the 32-bit sum is mixed with the seed via
+ * hash_uint32_extended.
+ */
+Datum
+hash_aclitem_extended(PG_FUNCTION_ARGS)
+{
+ AclItem *a = PG_GETARG_ACLITEM_P(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ uint32 sum = (uint32) (a->ai_privs + a->ai_grantee + a->ai_grantor);
+
+ return (seed == 0) ? UInt64GetDatum(sum) : hash_uint32_extended(sum, seed);
+}
/*
* acldefault() --- create an ACL describing default access permissions
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 34dadd6..522af7a 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -20,6 +20,7 @@
#endif
#include <math.h>
+#include "access/hash.h"
#include "access/htup_details.h"
#include "catalog/pg_type.h"
#include "funcapi.h"
@@ -4020,6 +4021,84 @@ hash_array(PG_FUNCTION_ARGS)
PG_RETURN_UINT32(result);
}
+/*
+ * Returns a 64-bit hash of the array, computed with the given seed.
+ * Otherwise, similar to hash_array.
+ */
+Datum
+hash_array_extended(PG_FUNCTION_ARGS)
+{
+ AnyArrayType *array = PG_GETARG_ANY_ARRAY(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ int ndims = AARR_NDIM(array);
+ int *dims = AARR_DIMS(array);
+ Oid element_type = AARR_ELEMTYPE(array);
+ uint64 result = 1;
+ int nitems;
+ TypeCacheEntry *typentry;
+ int typlen;
+ bool typbyval;
+ char typalign;
+ int i;
+ array_iter iter;
+ FunctionCallInfoData locfcinfo;
+
+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL ||
+ typentry->type_id != element_type)
+ {
+ typentry = lookup_type_cache(element_type,
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO);
+ if (!OidIsValid(typentry->hash_extended_proc_finfo.fn_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify an extended hash function for type %s",
+ format_type_be(element_type))));
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
+ typlen = typentry->typlen;
+ typbyval = typentry->typbyval;
+ typalign = typentry->typalign;
+
+ InitFunctionCallInfoData(locfcinfo, &typentry->hash_extended_proc_finfo, 2,
+ InvalidOid, NULL, NULL);
+
+ /* Loop over source data */
+ nitems = ArrayGetNItems(ndims, dims);
+ array_iter_setup(&iter, array);
+
+ for (i = 0; i < nitems; i++)
+ {
+ Datum elt;
+ bool isnull;
+ uint64 elthash;
+
+ /* Get element, checking for NULL */
+ elt = array_iter_next(&iter, &isnull, i, typlen, typbyval, typalign);
+
+ if (isnull)
+ {
+ elthash = 0;
+ }
+ else
+ {
+ /* Apply the hash function */
+ locfcinfo.arg[0] = elt;
+ locfcinfo.arg[1] = Int64GetDatum(seed);
+ locfcinfo.argnull[0] = false;
+ locfcinfo.argnull[1] = false;
+ locfcinfo.isnull = false;
+ elthash = DatumGetUInt64(FunctionCallInvoke(&locfcinfo));
+ }
+
+ result = (result << 5) - result + elthash;
+ }
+
+ AARR_FREE_IF_COPY(array, 0);
+
+ PG_RETURN_UINT64(result);
+}
+
/*-----------------------------------------------------------------------------
* array overlap/containment comparisons
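
Note the combining step: result = (result << 5) - result + elthash is the
usual result * 31 + elthash recurrence that hash_array already uses, just
carried out in 64 bits, with NULL elements contributing 0. The seed is
pushed down into each element's extended hash function rather than applied
once at the end. For example, assuming the patch is applied:

SELECT hash_array_extended('{1,2,3}'::int4[], 0) AS seed0,
       hash_array_extended('{1,2,3}'::int4[], 1) AS seed1;

should produce two different values, since every element hash already
differs under the new seed.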
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 7d89d79..34c0b52 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -1509,6 +1509,12 @@ time_hash(PG_FUNCTION_ARGS)
}
Datum
+time_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
+
+Datum
time_larger(PG_FUNCTION_ARGS)
{
TimeADT time1 = PG_GETARG_TIMEADT(0);
@@ -2214,6 +2220,21 @@ timetz_hash(PG_FUNCTION_ARGS)
}
Datum
+timetz_hash_extended(PG_FUNCTION_ARGS)
+{
+ TimeTzADT *key = PG_GETARG_TIMETZADT_P(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ uint64 thash;
+
+ /* Same approach as timetz_hash */
+ thash = DatumGetUInt64(DirectFunctionCall2(hashint8extended,
+ Int64GetDatumFast(key->time),
+ seed));
+ thash ^= DatumGetUInt64(hash_uint32_extended(key->zone, seed));
+ PG_RETURN_UINT64(thash);
+}
+
+Datum
timetz_larger(PG_FUNCTION_ARGS)
{
TimeTzADT *time1 = PG_GETARG_TIMETZADT_P(0);
diff --git a/src/backend/utils/adt/jsonb_op.c b/src/backend/utils/adt/jsonb_op.c
index d4c490e..c4a7dc3 100644
--- a/src/backend/utils/adt/jsonb_op.c
+++ b/src/backend/utils/adt/jsonb_op.c
@@ -291,3 +291,46 @@ jsonb_hash(PG_FUNCTION_ARGS)
PG_FREE_IF_COPY(jb, 0);
PG_RETURN_INT32(hash);
}
+
+Datum
+jsonb_hash_extended(PG_FUNCTION_ARGS)
+{
+ Jsonb *jb = PG_GETARG_JSONB(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ JsonbIterator *it;
+ JsonbValue v;
+ JsonbIteratorToken r;
+ uint64 hash = 0;
+
+ if (JB_ROOT_COUNT(jb) == 0)
+ PG_RETURN_UINT64(seed);
+
+ it = JsonbIteratorInit(&jb->root);
+
+ while ((r = JsonbIteratorNext(&it, &v, false)) != WJB_DONE)
+ {
+ switch (r)
+ {
+ /* Rotation is left to JsonbHashScalarValueExtended() */
+ case WJB_BEGIN_ARRAY:
+ hash ^= (((uint64) JB_FARRAY << 32) | JB_FARRAY);
+ break;
+ case WJB_BEGIN_OBJECT:
+ hash ^= (((uint64) JB_FOBJECT << 32) | JB_FOBJECT);
+ break;
+ case WJB_KEY:
+ case WJB_VALUE:
+ case WJB_ELEM:
+ JsonbHashScalarValueExtended(&v, &hash, seed);
+ break;
+ case WJB_END_ARRAY:
+ case WJB_END_OBJECT:
+ break;
+ default:
+ elog(ERROR, "invalid JsonbIteratorNext rc: %d", (int) r);
+ }
+ }
+
+ PG_FREE_IF_COPY(jb, 0);
+ PG_RETURN_UINT64(hash);
+}
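
jsonb_hash_extended walks the document the same way jsonb_hash does, XORing
in container markers and delegating scalars to JsonbHashScalarValueExtended
(added below). One consequence visible in the code: an empty container
returns the seed unchanged. A quick sketch, assuming the patch is applied:

SELECT jsonb_hash_extended('{}'::jsonb, 7) AS empty_returns_seed,
       jsonb_hash_extended('{"a": 1}'::jsonb, 0) AS seed0,
       jsonb_hash_extended('{"a": 1}'::jsonb, 1) AS seed1;

The first column should be 7, and the last two should differ.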
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 4850569..702b006 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -1250,6 +1250,49 @@ JsonbHashScalarValue(const JsonbValue *scalarVal, uint32 *hash)
}
/*
+ * Hash a scalar value into a 64-bit hash, using the given seed. Otherwise,
+ * similar to JsonbHashScalarValue.
+ */
+void
+JsonbHashScalarValueExtended(const JsonbValue *scalarVal, uint64 *hash,
+ uint64 seed)
+{
+ uint64 tmp;
+
+ switch (scalarVal->type)
+ {
+ case jbvNull:
+ tmp = seed + 0x01;
+ break;
+ case jbvString:
+ tmp = DatumGetUInt64(hash_any_extended((const unsigned char *) scalarVal->val.string.val,
+ scalarVal->val.string.len,
+ seed));
+ break;
+ case jbvNumeric:
+ tmp = DatumGetUInt64(DirectFunctionCall2(hash_numeric_extended,
+ NumericGetDatum(scalarVal->val.numeric),
+ UInt64GetDatum(seed)));
+ break;
+ case jbvBool:
+ if (seed)
+ tmp = DatumGetUInt64(DirectFunctionCall2(hashcharextended,
+ BoolGetDatum(scalarVal->val.boolean),
+ UInt64GetDatum(seed)));
+ else
+ tmp = scalarVal->val.boolean ? 0x02 : 0x04;
+
+ break;
+ default:
+ elog(ERROR, "invalid jsonb scalar type");
+ break;
+ }
+
+ *hash = ROTATE_64BITS(*hash);
+ *hash ^= tmp;
+}
+
+/*
* Are two scalar JsonbValues of the same type a and b equal?
*/
static bool
diff --git a/src/backend/utils/adt/mac.c b/src/backend/utils/adt/mac.c
index d1c20c3..60521cc 100644
--- a/src/backend/utils/adt/mac.c
+++ b/src/backend/utils/adt/mac.c
@@ -271,6 +271,15 @@ hashmacaddr(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) key, sizeof(macaddr));
}
+Datum
+hashmacaddrextended(PG_FUNCTION_ARGS)
+{
+ macaddr *key = PG_GETARG_MACADDR_P(0);
+
+ return hash_any_extended((unsigned char *) key, sizeof(macaddr),
+ PG_GETARG_INT64(1));
+}
+
/*
* Arithmetic functions: bitwise NOT, AND, OR.
*/
diff --git a/src/backend/utils/adt/mac8.c b/src/backend/utils/adt/mac8.c
index 482d1fb..0410b98 100644
--- a/src/backend/utils/adt/mac8.c
+++ b/src/backend/utils/adt/mac8.c
@@ -407,6 +407,15 @@ hashmacaddr8(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) key, sizeof(macaddr8));
}
+Datum
+hashmacaddr8extended(PG_FUNCTION_ARGS)
+{
+ macaddr8 *key = PG_GETARG_MACADDR8_P(0);
+
+ return hash_any_extended((unsigned char *) key, sizeof(macaddr8),
+ PG_GETARG_INT64(1));
+}
+
/*
* Arithmetic functions: bitwise NOT, AND, OR.
*/
diff --git a/src/backend/utils/adt/network.c b/src/backend/utils/adt/network.c
index 5573c34..ec4ac20 100644
--- a/src/backend/utils/adt/network.c
+++ b/src/backend/utils/adt/network.c
@@ -486,6 +486,16 @@ hashinet(PG_FUNCTION_ARGS)
return hash_any((unsigned char *) VARDATA_ANY(addr), addrsize + 2);
}
+Datum
+hashinetextended(PG_FUNCTION_ARGS)
+{
+ inet *addr = PG_GETARG_INET_PP(0);
+ int addrsize = ip_addrsize(addr);
+
+ return hash_any_extended((unsigned char *) VARDATA_ANY(addr), addrsize + 2,
+ PG_GETARG_INT64(1));
+}
+
/*
* Boolean network-inclusion tests.
*/
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 3e5614e..22d5898 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -2230,6 +2230,66 @@ hash_numeric(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
+/*
+ * Returns a 64-bit hash of the numeric value, computed with the given seed.
+ * Otherwise, similar to hash_numeric.
+ */
+Datum
+hash_numeric_extended(PG_FUNCTION_ARGS)
+{
+ Numeric key = PG_GETARG_NUMERIC(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ Datum digit_hash;
+ Datum result;
+ int weight;
+ int start_offset;
+ int end_offset;
+ int i;
+ int hash_len;
+ NumericDigit *digits;
+
+ if (NUMERIC_IS_NAN(key))
+ PG_RETURN_UINT64(seed);
+
+ weight = NUMERIC_WEIGHT(key);
+ start_offset = 0;
+ end_offset = 0;
+
+ digits = NUMERIC_DIGITS(key);
+ for (i = 0; i < NUMERIC_NDIGITS(key); i++)
+ {
+ if (digits[i] != (NumericDigit) 0)
+ break;
+
+ start_offset++;
+
+ weight--;
+ }
+
+ if (NUMERIC_NDIGITS(key) == start_offset)
+ PG_RETURN_UINT64(seed - 1);
+
+ for (i = NUMERIC_NDIGITS(key) - 1; i >= 0; i--)
+ {
+ if (digits[i] != (NumericDigit) 0)
+ break;
+
+ end_offset++;
+ }
+
+ Assert(start_offset + end_offset < NUMERIC_NDIGITS(key));
+
+ hash_len = NUMERIC_NDIGITS(key) - start_offset - end_offset;
+ digit_hash = hash_any_extended((unsigned char *) (NUMERIC_DIGITS(key)
+ + start_offset),
+ hash_len * sizeof(NumericDigit),
+ seed);
+
+ result = digit_hash ^ weight;
+
+ PG_RETURN_DATUM(result);
+}
+
/* ----------------------------------------------------------------------
*
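
As with hash_numeric, leading and trailing zero digits are stripped before
hashing, so numerically equal values with different display scales hash
identically, and the special cases (NaN, zero) reduce to simple functions
of the seed. For instance, assuming the patch is applied:

SELECT hash_numeric_extended(1.0, 3) = hash_numeric_extended(1.000, 3) AS scale_insensitive,
       hash_numeric_extended('NaN'::numeric, 3) AS nan_returns_seed;

The first column should be true, and the second should be the seed, 3.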
diff --git a/src/backend/utils/adt/pg_lsn.c b/src/backend/utils/adt/pg_lsn.c
index aefbb87..7ad30a2 100644
--- a/src/backend/utils/adt/pg_lsn.c
+++ b/src/backend/utils/adt/pg_lsn.c
@@ -179,6 +179,12 @@ pg_lsn_hash(PG_FUNCTION_ARGS)
return hashint8(fcinfo);
}
+Datum
+pg_lsn_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
+
/*----------------------------------------------------------
* Arithmetic operators on PostgreSQL LSNs.
diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 09a4f14..8db2014 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -1281,6 +1281,69 @@ hash_range(PG_FUNCTION_ARGS)
}
/*
+ * Returns a 64-bit hash of the range, computed with the given seed.
+ * Otherwise, similar to hash_range.
+ */
+Datum
+hash_range_extended(PG_FUNCTION_ARGS)
+{
+ RangeType *r = PG_GETARG_RANGE(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ uint64 result;
+ TypeCacheEntry *typcache;
+ TypeCacheEntry *scache;
+ RangeBound lower;
+ RangeBound upper;
+ bool empty;
+ char flags;
+ uint64 lower_hash;
+ uint64 upper_hash;
+
+ check_stack_depth();
+
+ typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r));
+
+ range_deserialize(typcache, r, &lower, &upper, &empty);
+ flags = range_get_flags(r);
+
+ scache = typcache->rngelemtype;
+ if (!OidIsValid(scache->hash_extended_proc_finfo.fn_oid))
+ {
+ scache = lookup_type_cache(scache->type_id,
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO);
+ if (!OidIsValid(scache->hash_extended_proc_finfo.fn_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify a hash function for type %s",
+ format_type_be(scache->type_id))));
+ }
+
+ if (RANGE_HAS_LBOUND(flags))
+ lower_hash = DatumGetUInt64(FunctionCall2Coll(&scache->hash_extended_proc_finfo,
+ typcache->rng_collation,
+ lower.val,
+ seed));
+ else
+ lower_hash = 0;
+
+ if (RANGE_HAS_UBOUND(flags))
+ upper_hash = DatumGetUInt64(FunctionCall2Coll(&scache->hash_extended_proc_finfo,
+ typcache->rng_collation,
+ upper.val,
+ seed));
+ else
+ upper_hash = 0;
+
+ /* Merge hashes of flags and bounds */
+ result = hash_uint32_extended((uint32) flags, seed);
+ result ^= lower_hash;
+ result = ROTATE_64BITS(result);
+ result ^= upper_hash;
+
+ PG_RETURN_UINT64(result);
+}
+
+/*
*----------------------------------------------------------
* CANONICAL FUNCTIONS
*
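
The bound hashes are combined asymmetrically (XOR in the lower-bound hash,
rotate, then XOR in the upper-bound hash), so the rotation keeps the
combination order-sensitive and the two bound hashes cannot simply cancel.
The element type's extended hash function receives the caller's seed for
both bounds. A small sketch, assuming the patch is applied:

SELECT hash_range_extended(int4range(1, 10), 0) AS seed0,
       hash_range_extended(int4range(1, 10), 1) AS seed1;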
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 6fa126d..b11d452 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -2113,6 +2113,11 @@ timestamp_hash(PG_FUNCTION_ARGS)
return hashint8(fcinfo);
}
+Datum
+timestamp_hash_extended(PG_FUNCTION_ARGS)
+{
+ return hashint8extended(fcinfo);
+}
/*
* Cross-type comparison functions for timestamp vs timestamptz
@@ -2419,6 +2424,20 @@ interval_hash(PG_FUNCTION_ARGS)
return DirectFunctionCall1(hashint8, Int64GetDatumFast(span64));
}
+Datum
+interval_hash_extended(PG_FUNCTION_ARGS)
+{
+ Interval *interval = PG_GETARG_INTERVAL_P(0);
+ INT128 span = interval_cmp_value(interval);
+ int64 span64;
+
+ /* Same approach as interval_hash */
+ span64 = int128_to_int64(span);
+
+ return DirectFunctionCall2(hashint8extended, Int64GetDatumFast(span64),
+ PG_GETARG_DATUM(1));
+}
+
/* overlaps_timestamp() --- implements the SQL OVERLAPS operator.
*
* Algorithm is per SQL spec. This is much harder than you'd think
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5f15c8e..f73c695 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -408,3 +408,11 @@ uuid_hash(PG_FUNCTION_ARGS)
return hash_any(key->data, UUID_LEN);
}
+
+Datum
+uuid_hash_extended(PG_FUNCTION_ARGS)
+{
+ pg_uuid_t *key = PG_GETARG_UUID_P(0);
+
+ return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
+}
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index cbc62b0..2df6f2c 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -947,6 +947,24 @@ hashbpchar(PG_FUNCTION_ARGS)
return result;
}
+Datum
+hashbpcharextended(PG_FUNCTION_ARGS)
+{
+ BpChar *key = PG_GETARG_BPCHAR_PP(0);
+ char *keydata;
+ int keylen;
+ Datum result;
+
+ keydata = VARDATA_ANY(key);
+ keylen = bcTruelen(key);
+
+ result = hash_any_extended((unsigned char *) keydata, keylen,
+ PG_GETARG_INT64(1));
+
+ PG_FREE_IF_COPY(key, 0);
+
+ return result;
+}
/*
* The following operators support character-by-character comparison
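
hashbpcharextended ignores trailing blanks via bcTruelen(), matching
hashbpchar, so padded values of different declared lengths hash alike.
For example, assuming the patch is applied, this should return true:

SELECT hashbpcharextended('foo'::char(10), 0) =
       hashbpcharextended('foo '::char(20), 0);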
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 82763f8..b7a14dc 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -490,8 +490,8 @@ get_compatible_hash_operators(Oid opno,
/*
* get_op_hash_functions
- * Get the OID(s) of hash support function(s) compatible with the given
- * operator, operating on its LHS and/or RHS datatype as required.
+ * Get the OID(s) of the standard hash support function(s) compatible with
+ * the given operator, operating on its LHS and/or RHS datatype as required.
*
* A function for the LHS type is sought and returned into *lhs_procno if
* lhs_procno isn't NULL. Similarly, a function for the RHS type is sought
@@ -542,7 +542,7 @@ get_op_hash_functions(Oid opno,
*lhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoplefttype,
aform->amoplefttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*lhs_procno))
continue;
/* Matching LHS found, done if caller doesn't want RHS */
@@ -564,7 +564,7 @@ get_op_hash_functions(Oid opno,
*rhs_procno = get_opfamily_proc(aform->amopfamily,
aform->amoprighttype,
aform->amoprighttype,
- HASHPROC);
+ HASHSTANDARD_PROC);
if (!OidIsValid(*rhs_procno))
{
/* Forget any LHS function from this opfamily */
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 691d498..2e633f0 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -90,6 +90,7 @@ static TypeCacheEntry *firstDomainTypeEntry = NULL;
#define TCFLAGS_HAVE_FIELD_EQUALITY 0x1000
#define TCFLAGS_HAVE_FIELD_COMPARE 0x2000
#define TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS 0x4000
+#define TCFLAGS_CHECKED_HASH_EXTENDED_PROC 0x8000
/*
* Data stored about a domain type's constraints. Note that we do not create
@@ -307,6 +308,8 @@ lookup_type_cache(Oid type_id, int flags)
flags |= TYPECACHE_HASH_OPFAMILY;
if ((flags & (TYPECACHE_HASH_PROC | TYPECACHE_HASH_PROC_FINFO |
+ TYPECACHE_HASH_EXTENDED_PROC |
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO |
TYPECACHE_HASH_OPFAMILY)) &&
!(typentry->flags & TCFLAGS_CHECKED_HASH_OPCLASS))
{
@@ -329,6 +332,7 @@ lookup_type_cache(Oid type_id, int flags)
* decision is still good.
*/
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
typentry->flags |= TCFLAGS_CHECKED_HASH_OPCLASS;
}
@@ -372,11 +376,12 @@ lookup_type_cache(Oid type_id, int flags)
typentry->eq_opr = eq_opr;
/*
- * Reset info about hash function whenever we pick up new info about
- * equality operator. This is so we can ensure that the hash function
- * matches the operator.
+ * Reset info about hash functions whenever we pick up new info about
+ * equality operator. This is so we can ensure that the hash functions
+ * match the operator.
*/
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
typentry->flags |= TCFLAGS_CHECKED_EQ_OPR;
}
if ((flags & TYPECACHE_LT_OPR) &&
@@ -467,7 +472,7 @@ lookup_type_cache(Oid type_id, int flags)
hash_proc = get_opfamily_proc(typentry->hash_opf,
typentry->hash_opintype,
typentry->hash_opintype,
- HASHPROC);
+ HASHSTANDARD_PROC);
/*
* As above, make sure hash_array will succeed. We don't currently
@@ -485,6 +490,43 @@ lookup_type_cache(Oid type_id, int flags)
typentry->hash_proc = hash_proc;
typentry->flags |= TCFLAGS_CHECKED_HASH_PROC;
}
+ if ((flags & (TYPECACHE_HASH_EXTENDED_PROC |
+ TYPECACHE_HASH_EXTENDED_PROC_FINFO)) &&
+ !(typentry->flags & TCFLAGS_CHECKED_HASH_EXTENDED_PROC))
+ {
+ Oid hash_extended_proc = InvalidOid;
+
+ /*
+ * We insist that the eq_opr, if one has been determined, match the
+ * hash opclass; else report there is no hash function.
+ */
+ if (typentry->hash_opf != InvalidOid &&
+ (!OidIsValid(typentry->eq_opr) ||
+ typentry->eq_opr == get_opfamily_member(typentry->hash_opf,
+ typentry->hash_opintype,
+ typentry->hash_opintype,
+ HTEqualStrategyNumber)))
+ hash_extended_proc = get_opfamily_proc(typentry->hash_opf,
+ typentry->hash_opintype,
+ typentry->hash_opintype,
+ HASHEXTENDED_PROC);
+
+ /*
+ * As above, make sure hash_array_extended will succeed. We don't
+ * currently support hashing for composite types, but when we do,
+ * we'll need more logic here to check that case too.
+ */
+ if (hash_extended_proc == F_HASH_ARRAY_EXTENDED &&
+ !array_element_has_hashing(typentry))
+ hash_extended_proc = InvalidOid;
+
+ /* Force update of hash_proc_finfo only if we're changing state */
+ if (typentry->hash_extended_proc != hash_extended_proc)
+ typentry->hash_extended_proc_finfo.fn_oid = InvalidOid;
+
+ typentry->hash_extended_proc = hash_extended_proc;
+ typentry->flags |= TCFLAGS_CHECKED_HASH_EXTENDED_PROC;
+ }
/*
* Set up fmgr lookup info as requested
@@ -523,6 +565,14 @@ lookup_type_cache(Oid type_id, int flags)
fmgr_info_cxt(typentry->hash_proc, &typentry->hash_proc_finfo,
CacheMemoryContext);
}
+ if ((flags & TYPECACHE_HASH_EXTENDED_PROC_FINFO) &&
+ typentry->hash_extended_proc_finfo.fn_oid == InvalidOid &&
+ typentry->hash_extended_proc != InvalidOid)
+ {
+ fmgr_info_cxt(typentry->hash_extended_proc,
+ &typentry->hash_extended_proc_finfo,
+ CacheMemoryContext);
+ }
/*
* If it's a composite type (row type), get tupdesc if requested
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 72fce30..dcea1d1 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -38,6 +38,10 @@ typedef uint32 Bucket;
#define BUCKET_TO_BLKNO(metap,B) \
((BlockNumber) ((B) + ((B) ? (metap)->hashm_spares[_hash_spareindex((B)+1)-1] : 0)) + 1)
+/* Rotate the high 32 bits and the low 32 bits separately, each left by one. */
+#define ROTATE_64BITS(v) \
+ ((((v) << 1) & UINT64CONST(0xfffffffefffffffe)) | (((v) >> 31) & UINT64CONST(0x100000001)))
+
/*
* Special space for hash index pages.
*
@@ -289,12 +293,16 @@ typedef HashMetaPageData *HashMetaPage;
#define HTMaxStrategyNumber 1
/*
- * When a new operator class is declared, we require that the user supply
- * us with an amproc procudure for hashing a key of the new type.
- * Since we only have one such proc in amproc, it's number 1.
+ * When a new operator class is declared, we require that the user supply
+ * us with an amproc procedure for hashing a key of the new type, returning
+ * a 32-bit hash value. We call this the "standard" hash procedure. We
+ * also allow an optional "extended" hash procedure which accepts a salt and
+ * returns a 64-bit hash value. This is highly recommended but, for reasons
+ * of backward compatibility, optional.
*/
-#define HASHPROC 1
-#define HASHNProcs 1
+#define HASHSTANDARD_PROC 1
+#define HASHEXTENDED_PROC 2
+#define HASHNProcs 2
/* public routines */
@@ -322,7 +330,10 @@ extern bytea *hashoptions(Datum reloptions, bool validate);
extern bool hashvalidate(Oid opclassoid);
extern Datum hash_any(register const unsigned char *k, register int keylen);
+extern Datum hash_any_extended(register const unsigned char *k,
+ register int keylen, uint64 seed);
extern Datum hash_uint32(uint32 k);
+extern Datum hash_uint32_extended(uint32 k, uint64 seed);
/* private routines */
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 7d245b1..fb6a829 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -153,41 +153,77 @@ DATA(insert ( 4033 3802 3802 1 4044 ));
/* hash */
DATA(insert ( 427 1042 1042 1 1080 ));
+DATA(insert ( 427 1042 1042 2 972 ));
DATA(insert ( 431 18 18 1 454 ));
+DATA(insert ( 431 18 18 2 446 ));
DATA(insert ( 435 1082 1082 1 450 ));
+DATA(insert ( 435 1082 1082 2 425 ));
DATA(insert ( 627 2277 2277 1 626 ));
+DATA(insert ( 627 2277 2277 2 782 ));
DATA(insert ( 1971 700 700 1 451 ));
+DATA(insert ( 1971 700 700 2 443 ));
DATA(insert ( 1971 701 701 1 452 ));
+DATA(insert ( 1971 701 701 2 444 ));
DATA(insert ( 1975 869 869 1 422 ));
+DATA(insert ( 1975 869 869 2 779 ));
DATA(insert ( 1977 21 21 1 449 ));
+DATA(insert ( 1977 21 21 2 441 ));
DATA(insert ( 1977 23 23 1 450 ));
+DATA(insert ( 1977 23 23 2 425 ));
DATA(insert ( 1977 20 20 1 949 ));
+DATA(insert ( 1977 20 20 2 442 ));
DATA(insert ( 1983 1186 1186 1 1697 ));
+DATA(insert ( 1983 1186 1186 2 3418 ));
DATA(insert ( 1985 829 829 1 399 ));
+DATA(insert ( 1985 829 829 2 778 ));
DATA(insert ( 1987 19 19 1 455 ));
+DATA(insert ( 1987 19 19 2 447 ));
DATA(insert ( 1990 26 26 1 453 ));
+DATA(insert ( 1990 26 26 2 445 ));
DATA(insert ( 1992 30 30 1 457 ));
+DATA(insert ( 1992 30 30 2 776 ));
DATA(insert ( 1995 25 25 1 400 ));
DATA(insert ( 1995 25 25 2 448 ));
DATA(insert ( 1997 1083 1083 1 1688 ));
+DATA(insert ( 1997 1083 1083 2 3409 ));
DATA(insert ( 1998 1700 1700 1 432 ));
+DATA(insert ( 1998 1700 1700 2 780 ));
DATA(insert ( 1999 1184 1184 1 2039 ));
+DATA(insert ( 1999 1184 1184 2 3411 ));
DATA(insert ( 2001 1266 1266 1 1696 ));
+DATA(insert ( 2001 1266 1266 2 3410 ));
DATA(insert ( 2040 1114 1114 1 2039 ));
+DATA(insert ( 2040 1114 1114 2 3411 ));
DATA(insert ( 2222 16 16 1 454 ));
+DATA(insert ( 2222 16 16 2 446 ));
DATA(insert ( 2223 17 17 1 456 ));
+DATA(insert ( 2223 17 17 2 772 ));
DATA(insert ( 2225 28 28 1 450 ));
DATA(insert ( 2225 28 28 2 425 ));
DATA(insert ( 2226 29 29 1 450 ));
+DATA(insert ( 2226 29 29 2 425 ));
DATA(insert ( 2227 702 702 1 450 ));
+DATA(insert ( 2227 702 702 2 425 ));
DATA(insert ( 2228 703 703 1 450 ));
+DATA(insert ( 2228 703 703 2 425 ));
DATA(insert ( 2229 25 25 1 400 ));
+DATA(insert ( 2229 25 25 2 448 ));
DATA(insert ( 2231 1042 1042 1 1080 ));
+DATA(insert ( 2231 1042 1042 2 972 ));
DATA(insert ( 2235 1033 1033 1 329 ));
+DATA(insert ( 2235 1033 1033 2 777 ));
DATA(insert ( 2969 2950 2950 1 2963 ));
+DATA(insert ( 2969 2950 2950 2 3412 ));
DATA(insert ( 3254 3220 3220 1 3252 ));
+DATA(insert ( 3254 3220 3220 2 3413 ));
DATA(insert ( 3372 774 774 1 328 ));
+DATA(insert ( 3372 774 774 2 781 ));
DATA(insert ( 3523 3500 3500 1 3515 ));
+DATA(insert ( 3523 3500 3500 2 3414 ));
DATA(insert ( 3903 3831 3831 1 3902 ));
+DATA(insert ( 3903 3831 3831 2 3417 ));
DATA(insert ( 4034 3802 3802 1 4045 ));
DATA(insert ( 4034 3802 3802 2 3416 ));
/* gist */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8b33b4e..d820b56 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -668,36 +668,68 @@ DESCR("convert char(n) to name");
DATA(insert OID = 449 ( hashint2 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "21" _null_ _null_ _null_ _null_ _null_ hashint2 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 441 ( hashint2extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "21 20" _null_ _null_ _null_ _null_ _null_ hashint2extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 450 ( hashint4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "23" _null_ _null_ _null_ _null_ _null_ hashint4 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 425 ( hashint4extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "23 20" _null_ _null_ _null_ _null_ _null_ hashint4extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 949 ( hashint8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "20" _null_ _null_ _null_ _null_ _null_ hashint8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 442 ( hashint8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "20 20" _null_ _null_ _null_ _null_ _null_ hashint8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 451 ( hashfloat4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "700" _null_ _null_ _null_ _null_ _null_ hashfloat4 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 443 ( hashfloat4extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "700 20" _null_ _null_ _null_ _null_ _null_ hashfloat4extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 452 ( hashfloat8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "701" _null_ _null_ _null_ _null_ _null_ hashfloat8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 444 ( hashfloat8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "701 20" _null_ _null_ _null_ _null_ _null_ hashfloat8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 453 ( hashoid PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "26" _null_ _null_ _null_ _null_ _null_ hashoid _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 445 ( hashoidextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "26 20" _null_ _null_ _null_ _null_ _null_ hashoidextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 454 ( hashchar PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "18" _null_ _null_ _null_ _null_ _null_ hashchar _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 446 ( hashcharextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "18 20" _null_ _null_ _null_ _null_ _null_ hashcharextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 455 ( hashname PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "19" _null_ _null_ _null_ _null_ _null_ hashname _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 447 ( hashnameextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "19 20" _null_ _null_ _null_ _null_ _null_ hashnameextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 400 ( hashtext PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "25" _null_ _null_ _null_ _null_ _null_ hashtext _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 448 ( hashtextextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "25 20" _null_ _null_ _null_ _null_ _null_ hashtextextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 456 ( hashvarlena PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2281" _null_ _null_ _null_ _null_ _null_ hashvarlena _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 772 ( hashvarlenaextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2281 20" _null_ _null_ _null_ _null_ _null_ hashvarlenaextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 457 ( hashoidvector PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "30" _null_ _null_ _null_ _null_ _null_ hashoidvector _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 776 ( hashoidvectorextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "30 20" _null_ _null_ _null_ _null_ _null_ hashoidvectorextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 329 ( hash_aclitem PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1033" _null_ _null_ _null_ _null_ _null_ hash_aclitem _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 777 ( hash_aclitem_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1033 20" _null_ _null_ _null_ _null_ _null_ hash_aclitem_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 399 ( hashmacaddr PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "829" _null_ _null_ _null_ _null_ _null_ hashmacaddr _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 778 ( hashmacaddrextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "829 20" _null_ _null_ _null_ _null_ _null_ hashmacaddrextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 422 ( hashinet PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "869" _null_ _null_ _null_ _null_ _null_ hashinet _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 779 ( hashinetextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "869 20" _null_ _null_ _null_ _null_ _null_ hashinetextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 432 ( hash_numeric PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1700" _null_ _null_ _null_ _null_ _null_ hash_numeric _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 780 ( hash_numeric_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1700 20" _null_ _null_ _null_ _null_ _null_ hash_numeric_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 328 ( hashmacaddr8 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "774" _null_ _null_ _null_ _null_ _null_ hashmacaddr8 _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 781 ( hashmacaddr8extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "774 20" _null_ _null_ _null_ _null_ _null_ hashmacaddr8extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 438 ( num_nulls PGNSP PGUID 12 1 0 2276 0 f f f f f f i s 1 0 23 "2276" "{2276}" "{v}" _null_ _null_ _null_ pg_num_nulls _null_ _null_ _null_ ));
DESCR("count the number of NULL arguments");
@@ -747,6 +779,8 @@ DESCR("convert float8 to int8");
DATA(insert OID = 626 ( hash_array PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2277" _null_ _null_ _null_ _null_ _null_ hash_array _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 782 ( hash_array_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2277 20" _null_ _null_ _null_ _null_ _null_ hash_array_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 652 ( float4 PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 700 "20" _null_ _null_ _null_ _null_ _null_ i8tof _null_ _null_ _null_ ));
DESCR("convert int8 to float4");
@@ -1155,6 +1189,8 @@ DATA(insert OID = 3328 ( bpchar_sortsupport PGNSP PGUID 12 1 0 0 0 f f f f t f i
DESCR("sort support");
DATA(insert OID = 1080 ( hashbpchar PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1042" _null_ _null_ _null_ _null_ _null_ hashbpchar _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 972 ( hashbpcharextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1042 20" _null_ _null_ _null_ _null_ _null_ hashbpcharextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1081 ( format_type PGNSP PGUID 12 1 0 0 0 f f f f f f s s 2 0 25 "26 23" _null_ _null_ _null_ _null_ _null_ format_type _null_ _null_ _null_ ));
DESCR("format a type oid and atttypmod to canonical SQL");
DATA(insert OID = 1084 ( date_in PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 1082 "2275" _null_ _null_ _null_ _null_ _null_ date_in _null_ _null_ _null_ ));
@@ -2286,10 +2322,16 @@ DESCR("less-equal-greater");
DATA(insert OID = 1688 ( time_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1083" _null_ _null_ _null_ _null_ _null_ time_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3409 ( time_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1083 20" _null_ _null_ _null_ _null_ _null_ time_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1696 ( timetz_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1266" _null_ _null_ _null_ _null_ _null_ timetz_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3410 ( timetz_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1266 20" _null_ _null_ _null_ _null_ _null_ timetz_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 1697 ( interval_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1186" _null_ _null_ _null_ _null_ _null_ interval_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3418 ( interval_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1186 20" _null_ _null_ _null_ _null_ _null_ interval_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* OID's 1700 - 1799 NUMERIC data type */
@@ -3078,6 +3120,8 @@ DATA(insert OID = 2038 ( timezone PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0
DESCR("adjust time with time zone to new zone");
DATA(insert OID = 2039 ( timestamp_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "1114" _null_ _null_ _null_ _null_ _null_ timestamp_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3411 ( timestamp_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "1114 20" _null_ _null_ _null_ _null_ _null_ timestamp_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 2041 ( overlaps PGNSP PGUID 12 1 0 0 0 f f f f f f i s 4 0 16 "1114 1114 1114 1114" _null_ _null_ _null_ _null_ _null_ overlaps_timestamp _null_ _null_ _null_ ));
DESCR("intervals overlap?");
DATA(insert OID = 2042 ( overlaps PGNSP PGUID 14 1 0 0 0 f f f f f f i s 4 0 16 "1114 1186 1114 1186" _null_ _null_ _null_ _null_ _null_ "select ($1, ($1 + $2)) overlaps ($3, ($3 + $4))" _null_ _null_ _null_ ));
@@ -4543,6 +4587,8 @@ DATA(insert OID = 2962 ( uuid_send PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1
DESCR("I/O");
DATA(insert OID = 2963 ( uuid_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2950" _null_ _null_ _null_ _null_ _null_ uuid_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3412 ( uuid_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "2950 20" _null_ _null_ _null_ _null_ _null_ uuid_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* pg_lsn */
DATA(insert OID = 3229 ( pg_lsn_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 3220 "2275" _null_ _null_ _null_ _null_ _null_ pg_lsn_in _null_ _null_ _null_ ));
@@ -4564,6 +4610,8 @@ DATA(insert OID = 3251 ( pg_lsn_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0
DESCR("less-equal-greater");
DATA(insert OID = 3252 ( pg_lsn_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3220" _null_ _null_ _null_ _null_ _null_ pg_lsn_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3413 ( pg_lsn_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3220 20" _null_ _null_ _null_ _null_ _null_ pg_lsn_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
/* enum related procs */
DATA(insert OID = 3504 ( anyenum_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 3500 "2275" _null_ _null_ _null_ _null_ _null_ anyenum_in _null_ _null_ _null_ ));
@@ -4584,6 +4632,8 @@ DATA(insert OID = 3514 ( enum_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 2
DESCR("less-equal-greater");
DATA(insert OID = 3515 ( hashenum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3500" _null_ _null_ _null_ _null_ _null_ hashenum _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3414 ( hashenumextended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3500 20" _null_ _null_ _null_ _null_ _null_ hashenumextended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 3524 ( enum_smaller PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3500 "3500 3500" _null_ _null_ _null_ _null_ _null_ enum_smaller _null_ _null_ _null_ ));
DESCR("smaller of two");
DATA(insert OID = 3525 ( enum_larger PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3500 "3500 3500" _null_ _null_ _null_ _null_ _null_ enum_larger _null_ _null_ _null_ ));
@@ -4981,6 +5031,8 @@ DATA(insert OID = 4044 ( jsonb_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2
DESCR("less-equal-greater");
DATA(insert OID = 4045 ( jsonb_hash PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3802" _null_ _null_ _null_ _null_ _null_ jsonb_hash _null_ _null_ _null_ ));
DESCR("hash");
+DATA(insert OID = 3416 ( jsonb_hash_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3802 20" _null_ _null_ _null_ _null_ _null_ jsonb_hash_extended _null_ _null_ _null_ ));
+DESCR("hash");
DATA(insert OID = 4046 ( jsonb_contains PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 3802" _null_ _null_ _null_ _null_ _null_ jsonb_contains _null_ _null_ _null_ ));
DATA(insert OID = 4047 ( jsonb_exists PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 25" _null_ _null_ _null_ _null_ _null_ jsonb_exists _null_ _null_ _null_ ));
DATA(insert OID = 4048 ( jsonb_exists_any PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "3802 1009" _null_ _null_ _null_ _null_ _null_ jsonb_exists_any _null_ _null_ _null_ ));
@@ -5171,6 +5223,8 @@ DATA(insert OID = 3881 ( range_gist_same PGNSP PGUID 12 1 0 0 0 f f f f t f i
DESCR("GiST support");
DATA(insert OID = 3902 ( hash_range PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "3831" _null_ _null_ _null_ _null_ _null_ hash_range _null_ _null_ _null_ ));
DESCR("hash a range");
+DATA(insert OID = 3417 ( hash_range_extended PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 20 "3831 20" _null_ _null_ _null_ _null_ _null_ hash_range_extended _null_ _null_ _null_ ));
+DESCR("hash a range");
DATA(insert OID = 3916 ( range_typanalyze PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 16 "2281" _null_ _null_ _null_ _null_ _null_ range_typanalyze _null_ _null_ _null_ ));
DESCR("range typanalyze");
DATA(insert OID = 3169 ( rangesel PGNSP PGUID 12 1 0 0 0 f f f f t f s s 4 0 701 "2281 26 2281 23" _null_ _null_ _null_ _null_ _null_ rangesel _null_ _null_ _null_ ));
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 0216965..b604a5c 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -325,6 +325,7 @@ extern struct varlena *pg_detoast_datum_packed(struct varlena *datum);
#define PG_RETURN_FLOAT4(x) return Float4GetDatum(x)
#define PG_RETURN_FLOAT8(x) return Float8GetDatum(x)
#define PG_RETURN_INT64(x) return Int64GetDatum(x)
+#define PG_RETURN_UINT64(x) return UInt64GetDatum(x)
/* RETURN macros for other pass-by-ref types will typically look like this: */
#define PG_RETURN_BYTEA_P(x) PG_RETURN_POINTER(x)
#define PG_RETURN_TEXT_P(x) PG_RETURN_POINTER(x)
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index ea9dd17..24f4916 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -370,6 +370,8 @@ extern Jsonb *JsonbValueToJsonb(JsonbValue *val);
extern bool JsonbDeepContains(JsonbIterator **val,
JsonbIterator **mContained);
extern void JsonbHashScalarValue(const JsonbValue *scalarVal, uint32 *hash);
+extern void JsonbHashScalarValueExtended(const JsonbValue *scalarVal,
+ uint64 *hash, uint64 seed);
/* jsonb.c support functions */
extern char *JsonbToCString(StringInfo out, JsonbContainer *in,
diff --git a/src/include/utils/typcache.h b/src/include/utils/typcache.h
index c12631d..b4f7592 100644
--- a/src/include/utils/typcache.h
+++ b/src/include/utils/typcache.h
@@ -56,6 +56,7 @@ typedef struct TypeCacheEntry
Oid gt_opr; /* the greater-than operator */
Oid cmp_proc; /* the btree comparison function */
Oid hash_proc; /* the hash calculation function */
+ Oid hash_extended_proc; /* the extended hash calculation function */
/*
* Pre-set-up fmgr call info for the equality operator, the btree
@@ -67,6 +68,7 @@ typedef struct TypeCacheEntry
FmgrInfo eq_opr_finfo;
FmgrInfo cmp_proc_finfo;
FmgrInfo hash_proc_finfo;
+ FmgrInfo hash_extended_proc_finfo;
/*
* Tuple descriptor if it's a composite type (row type). NULL if not
@@ -120,6 +122,8 @@ typedef struct TypeCacheEntry
#define TYPECACHE_HASH_OPFAMILY 0x0400
#define TYPECACHE_RANGE_INFO 0x0800
#define TYPECACHE_DOMAIN_INFO 0x1000
+#define TYPECACHE_HASH_EXTENDED_PROC 0x2000
+#define TYPECACHE_HASH_EXTENDED_PROC_FINFO 0x4000
/*
* Callers wishing to maintain a long-lived reference to a domain's constraint
diff --git a/src/test/regress/expected/alter_generic.out b/src/test/regress/expected/alter_generic.out
index 9f6ad4d..767c09b 100644
--- a/src/test/regress/expected/alter_generic.out
+++ b/src/test/regress/expected/alter_generic.out
@@ -421,7 +421,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf13 USING hash;
CREATE FUNCTION fn_opf13 (int4) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf13 USING hash ADD FUNCTION 1 fn_opf13(int4);
-ERROR: hash procedures must return integer
+ERROR: hash procedure 1 must return integer
DROP OPERATOR FAMILY alt_opf13 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
@@ -439,7 +439,7 @@ BEGIN TRANSACTION;
CREATE OPERATOR FAMILY alt_opf15 USING hash;
CREATE FUNCTION fn_opf15 (int4, int2) RETURNS BIGINT AS 'SELECT NULL::BIGINT;' LANGUAGE SQL;
ALTER OPERATOR FAMILY alt_opf15 USING hash ADD FUNCTION 1 fn_opf15(int4, int2);
-ERROR: hash procedures must have one argument
+ERROR: hash procedure 1 must have one argument
DROP OPERATOR FAMILY alt_opf15 USING hash;
ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
--
2.6.2
0002-test-Hash_functions_v3_wip.patch
From 12c8eb0d962196b3dedd568e3b2289391645810a Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Tue, 22 Aug 2017 14:06:50 +0530
Subject: [PATCH 2/2] test-Hash_functions_v3_wip
---
src/test/regress/expected/hash_func.out | 282 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/hash_func.sql | 179 ++++++++++++++++++++
3 files changed, 462 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/hash_func.out
create mode 100644 src/test/regress/sql/hash_func.sql
diff --git a/src/test/regress/expected/hash_func.out b/src/test/regress/expected/hash_func.out
new file mode 100644
index 0000000..bdba873
--- /dev/null
+++ b/src/test/regress/expected/hash_func.out
@@ -0,0 +1,282 @@
+--
+-- Test hash functions
+--
+SELECT v as value, hashint2(v)::bit(32) as standard,
+ hashint2extended(v, 0)::bit(32) as extended0,
+ hashint2extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0::int2), (1::int2), (17::int2), (42::int2)) x(v)
+WHERE hashint2(v)::bit(32) != hashint2extended(v, 0)::bit(32)
+ OR hashint2(v)::bit(32) = hashint2extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashint4(v)::bit(32) as standard,
+ hashint4extended(v, 0)::bit(32) as extended0,
+ hashint4extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
+ OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashint8(v)::bit(32) as standard,
+ hashint8extended(v, 0)::bit(32) as extended0,
+ hashint8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashint8(v)::bit(32) != hashint8extended(v, 0)::bit(32)
+ OR hashint8(v)::bit(32) = hashint8extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashfloat4(v)::bit(32) as standard,
+ hashfloat4extended(v, 0)::bit(32) as extended0,
+ hashfloat4extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashfloat4(v)::bit(32) != hashfloat4extended(v, 0)::bit(32)
+ OR hashfloat4(v)::bit(32) = hashfloat4extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashfloat8(v)::bit(32) as standard,
+ hashfloat8extended(v, 0)::bit(32) as extended0,
+ hashfloat8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashfloat8(v)::bit(32) != hashfloat8extended(v, 0)::bit(32)
+ OR hashfloat8(v)::bit(32) = hashfloat8extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashoid(v)::bit(32) as standard,
+ hashoidextended(v, 0)::bit(32) as extended0,
+ hashoidextended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashoid(v)::bit(32) != hashoidextended(v, 0)::bit(32)
+ OR hashoid(v)::bit(32) = hashoidextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashchar(v)::bit(32) as standard,
+ hashcharextended(v, 0)::bit(32) as extended0,
+ hashcharextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::"char"), ('1'), ('x'), ('X'), ('p'), ('N')) x(v)
+WHERE hashchar(v)::bit(32) != hashcharextended(v, 0)::bit(32)
+ OR hashchar(v)::bit(32) = hashcharextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashname(v)::bit(32) as standard,
+ hashnameextended(v, 0)::bit(32) as extended0,
+ hashnameextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashname(v)::bit(32) != hashnameextended(v, 0)::bit(32)
+ OR hashname(v)::bit(32) = hashnameextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashtext(v)::bit(32) as standard,
+ hashtextextended(v, 0)::bit(32) as extended0,
+ hashtextextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashtext(v)::bit(32) != hashtextextended(v, 0)::bit(32)
+ OR hashtext(v)::bit(32) = hashtextextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+--SELECT v as value, hashvarlena(internal_type??)::bit(32) as standard,
+-- hashvarlenaextended(internal type??, 0)::bit(32) as extended0,
+-- hashvarlenaextended(internal type??, 1)::bit(32) as extended1;
+SELECT v as value, hashoidvector(v)::bit(32) as standard,
+ hashoidvectorextended(v, 0)::bit(32) as extended0,
+ hashoidvectorextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::oidvector), ('0 1 2 3 4'), ('17 18 19 20'), ('42 43 42 45'),
+ ('550273 550273 570274'), ('207112489 207112499')) x(v)
+WHERE hashoidvector(v)::bit(32) != hashoidvectorextended(v, 0)::bit(32)
+ OR hashoidvector(v)::bit(32) = hashoidvectorextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hash_aclitem(v)::bit(32) as standard,
+ hash_aclitem_extended(v, 0)::bit(32) as extended0,
+ hash_aclitem_extended(v, 1)::bit(32) as extended1
+FROM (SELECT DISTINCT(relacl[1]) FROM pg_class LIMIT 10) x(v)
+WHERE hash_aclitem(v)::bit(32) != hash_aclitem_extended(v, 0)::bit(32)
+ OR hash_aclitem(v)::bit(32) = hash_aclitem_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashmacaddr(v)::bit(32) as standard,
+ hashmacaddrextended(v, 0)::bit(32) as extended0,
+ hashmacaddrextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::macaddr), ('08:00:2b:01:02:04'), ('08:00:2b:01:02:04'),
+ ('e2:7f:51:3e:70:49'), ('d6:a9:4a:78:1c:d5'), ('ea:29:b1:5e:1f:a5')) x(v)
+WHERE hashmacaddr(v)::bit(32) != hashmacaddrextended(v, 0)::bit(32)
+ OR hashmacaddr(v)::bit(32) = hashmacaddrextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashmacaddr8(v)::bit(32) as standard,
+ hashmacaddr8extended(v, 0)::bit(32) as extended0,
+ hashmacaddr8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::macaddr8), ('08:00:2b:01:02:04'), ('08:00:2b:01:02:04'),
+ ('e2:7f:51:3e:70:49'), ('d6:a9:4a:78:1c:d5'), ('ea:29:b1:5e:1f:a5')) x(v)
+WHERE hashmacaddr8(v)::bit(32) != hashmacaddr8extended(v, 0)::bit(32)
+ OR hashmacaddr8(v)::bit(32) = hashmacaddr8extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashinet('192.168.100.128/25')::bit(32) as standard,
+ hashinetextended('192.168.100.128/25', 0)::bit(32) as extended0,
+ hashinetextended('192.168.100.128/25', 1)::bit(32) as extended1
+FROM (VALUES (NULL::inet), ('192.168.100.128/25'), ('192.168.100.0/8'),
+ ('172.168.10.126/16'), ('172.18.103.126/24'), ('192.188.13.16/32')) x(v)
+WHERE hashinet(v)::bit(32) != hashinetextended(v, 0)::bit(32)
+ OR hashinet(v)::bit(32) = hashinetextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hash_numeric(149484958.204628457)::bit(32) as standard,
+ hash_numeric_extended(149484958.204628457, 0)::bit(32) as extended0,
+ hash_numeric_extended(149484958.204628457, 1)::bit(32) as extended1
+FROM (VALUES (0), (1.149484958), (17.149484958), (42.149484958), (149484958.550273), (2071124898672)) x(v)
+WHERE hash_numeric(v)::bit(32) != hash_numeric_extended(v, 0)::bit(32)
+ OR hash_numeric(v)::bit(32) = hash_numeric_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hash_array(v)::bit(32) as standard,
+ hash_array_extended(v, 0)::bit(32) as extended0,
+ hash_array_extended(v, 1)::bit(32) as extended1
+FROM (VALUES ('{0}'::int4[]), ('{0,1,2,3,4}'), ('{17,18,19,20}'), ('{42,34,65,98}'),
+ ('{550273,590027, 870273}'), ('{207112489, 807112489}')) x(v)
+WHERE hash_array(v)::bit(32) != hash_array_extended(v, 0)::bit(32)
+ OR hash_array(v)::bit(32) = hash_array_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hashbpchar(v)::bit(32) as standard,
+ hashbpcharextended(v, 0)::bit(32) as extended0,
+ hashbpcharextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashbpchar(v)::bit(32) != hashbpcharextended(v, 0)::bit(32)
+ OR hashbpchar(v)::bit(32) = hashbpcharextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, time_hash(v)::bit(32) as standard,
+ time_hash_extended(v, 0)::bit(32) as extended0,
+ time_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::time), ('11:09:59'), ('1:09:59'), ('11:59:59'), ('7:9:59'), ('5:15:59')) x(v)
+WHERE time_hash(v)::bit(32) != time_hash_extended(v, 0)::bit(32)
+ OR time_hash(v)::bit(32) = time_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, timetz_hash(v)::bit(32) as standard,
+ timetz_hash_extended(v, 0)::bit(32) as extended0,
+ timetz_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::timetz), ('00:11:52.518762-07'), ('00:11:52.51762-08'),
+ ('00:11:52.62-01'), ('00:11:52.62+01'), ('11:59:59+04')) x(v)
+WHERE timetz_hash(v)::bit(32) != timetz_hash_extended(v, 0)::bit(32)
+ OR timetz_hash(v)::bit(32) = timetz_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, timestamp_hash('2017-08-22 00:09:59')::bit(32) as standard,
+ timestamp_hash_extended('2017-08-22 00:09:59', 0)::bit(32) as extended0,
+ timestamp_hash_extended('2017-08-22 00:09:59', 1)::bit(32) as extended1
+FROM (VALUES (NULL::timestamp), ('2017-08-22 00:09:59.518762'), ('2015-08-20 00:11:52.51762-08'),
+ ('2017-05-22 00:11:52.62-01'), ('2013-08-22 00:11:52.62+01'), ('2013-08-22 11:59:59+04')) x(v)
+WHERE timestamp_hash(v)::bit(32) != timestamp_hash_extended(v, 0)::bit(32)
+ OR timestamp_hash(v)::bit(32) = timestamp_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, interval_hash(v)::bit(32) as standard,
+ interval_hash_extended(v, 0)::bit(32) as extended0,
+ interval_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::interval), ('5 month 7 day 46 minutes'), ('1 year 7 day 46 minutes'),
+ ('1 year 7 month 20 day 46 minutes'), ('5 month'),
+ ('17 year 11 month 7 day 9 hours 46 minutes 5 seconds')) x(v)
+WHERE interval_hash(v)::bit(32) != interval_hash_extended(v, 0)::bit(32)
+ OR interval_hash(v)::bit(32) = interval_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, uuid_hash('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11')::bit(32) as standard,
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 0)::bit(32) as extended0,
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 1)::bit(32) as extended1
+FROM (VALUES (NULL::uuid), ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'),
+ ('5a9ba4ac-8d6f-11e7-bb31-be2e44b06b34'), ('99c6705c-d939-461c-a3c9-1690ad64ed7b'),
+ ('7deed3ca-8d6f-11e7-bb31-be2e44b06b34'), ('9ad46d4f-6f2a-4edd-aadb-745993928e1e')) x(v)
+WHERE uuid_hash(v)::bit(32) != uuid_hash_extended(v, 0)::bit(32)
+ OR uuid_hash(v)::bit(32) = uuid_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, pg_lsn_hash('16/B374D84')::bit(32) as standard,
+ pg_lsn_hash_extended('16/B374D84', 0)::bit(32) as extended0,
+ pg_lsn_hash_extended('16/B374D84', 1)::bit(32) as extended1
+FROM (VALUES (NULL::pg_lsn), ('16/B374D84'), ('30/B374D84'),
+ ('255/B374D84'), ('25/B379D90'), ('900/F37FD90')) x(v)
+WHERE pg_lsn_hash(v)::bit(32) != pg_lsn_hash_extended(v, 0)::bit(32)
+ OR pg_lsn_hash(v)::bit(32) = pg_lsn_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
+SELECT v as value, hashenum(v)::bit(32) as standard,
+ hashenumextended(v, 0)::bit(32) as extended0,
+ hashenumextended(v, 1)::bit(32) as extended1
+FROM (VALUES ('sad'::mood), ('ok'), ('happy')) x(v)
+WHERE hashenum(v)::bit(32) != hashenumextended(v, 0)::bit(32)
+ OR hashenum(v)::bit(32) = hashenumextended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+DROP TYPE mood;
+SELECT v as value, jsonb_hash(v)::bit(32) as standard,
+ jsonb_hash_extended(v, 0)::bit(32) as extended0,
+ jsonb_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::jsonb), ('{"a": "aaa bbb ddd ccc", "b": ["eee fff ggg"], "c": {"d": "hhh iii"}}'),
+ ('{"foo": [true, "bar"], "tags": {"e": 1, "f": null}}'), ('{"g": {"h": "value"}}')) x(v)
+WHERE jsonb_hash(v)::bit(32) != jsonb_hash_extended(v, 0)::bit(32)
+ OR jsonb_hash(v)::bit(32) = jsonb_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
+SELECT v as value, hash_range(v)::bit(32) as standard,
+ hash_range_extended(v, 0)::bit(32) as extended0,
+ hash_range_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (int4range(10, 20)), (int4range(23, 43)), (int4range(5675, 550273)),
+ (int4range(550274, 1550274)), (int4range(1550275, 208112489))) x(v)
+WHERE hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
+ OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index eefdeea..2fd3f2b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/sql/hash_func.sql b/src/test/regress/sql/hash_func.sql
new file mode 100644
index 0000000..560a75b
--- /dev/null
+++ b/src/test/regress/sql/hash_func.sql
@@ -0,0 +1,179 @@
+--
+-- Test hash functions
+--
+
+SELECT v as value, hashint2(v)::bit(32) as standard,
+ hashint2extended(v, 0)::bit(32) as extended0,
+ hashint2extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0::int2), (1::int2), (17::int2), (42::int2)) x(v)
+WHERE hashint2(v)::bit(32) != hashint2extended(v, 0)::bit(32)
+ OR hashint2(v)::bit(32) = hashint2extended(v, 1)::bit(32);
+SELECT v as value, hashint4(v)::bit(32) as standard,
+ hashint4extended(v, 0)::bit(32) as extended0,
+ hashint4extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
+ OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
+SELECT v as value, hashint8(v)::bit(32) as standard,
+ hashint8extended(v, 0)::bit(32) as extended0,
+ hashint8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashint8(v)::bit(32) != hashint8extended(v, 0)::bit(32)
+ OR hashint8(v)::bit(32) = hashint8extended(v, 1)::bit(32);
+SELECT v as value, hashfloat4(v)::bit(32) as standard,
+ hashfloat4extended(v, 0)::bit(32) as extended0,
+ hashfloat4extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashfloat4(v)::bit(32) != hashfloat4extended(v, 0)::bit(32)
+ OR hashfloat4(v)::bit(32) = hashfloat4extended(v, 1)::bit(32);
+SELECT v as value, hashfloat8(v)::bit(32) as standard,
+ hashfloat8extended(v, 0)::bit(32) as extended0,
+ hashfloat8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashfloat8(v)::bit(32) != hashfloat8extended(v, 0)::bit(32)
+ OR hashfloat8(v)::bit(32) = hashfloat8extended(v, 1)::bit(32);
+SELECT v as value, hashoid(v)::bit(32) as standard,
+ hashoidextended(v, 0)::bit(32) as extended0,
+ hashoidextended(v, 1)::bit(32) as extended1
+FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
+WHERE hashoid(v)::bit(32) != hashoidextended(v, 0)::bit(32)
+ OR hashoid(v)::bit(32) = hashoidextended(v, 1)::bit(32);
+SELECT v as value, hashchar(v)::bit(32) as standard,
+ hashcharextended(v, 0)::bit(32) as extended0,
+ hashcharextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::"char"), ('1'), ('x'), ('X'), ('p'), ('N')) x(v)
+WHERE hashchar(v)::bit(32) != hashcharextended(v, 0)::bit(32)
+ OR hashchar(v)::bit(32) = hashcharextended(v, 1)::bit(32);
+SELECT v as value, hashname(v)::bit(32) as standard,
+ hashnameextended(v, 0)::bit(32) as extended0,
+ hashnameextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashname(v)::bit(32) != hashnameextended(v, 0)::bit(32)
+ OR hashname(v)::bit(32) = hashnameextended(v, 1)::bit(32);
+SELECT v as value, hashtext(v)::bit(32) as standard,
+ hashtextextended(v, 0)::bit(32) as extended0,
+ hashtextextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashtext(v)::bit(32) != hashtextextended(v, 0)::bit(32)
+ OR hashtext(v)::bit(32) = hashtextextended(v, 1)::bit(32);
+--SELECT v as value, hashvarlena(internal_type??)::bit(32) as standard,
+-- hashvarlenaextended(internal type??, 0)::bit(32) as extended0,
+-- hashvarlenaextended(internal type??, 1)::bit(32) as extended1;
+SELECT v as value, hashoidvector(v)::bit(32) as standard,
+ hashoidvectorextended(v, 0)::bit(32) as extended0,
+ hashoidvectorextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::oidvector), ('0 1 2 3 4'), ('17 18 19 20'), ('42 43 42 45'),
+ ('550273 550273 570274'), ('207112489 207112499')) x(v)
+WHERE hashoidvector(v)::bit(32) != hashoidvectorextended(v, 0)::bit(32)
+ OR hashoidvector(v)::bit(32) = hashoidvectorextended(v, 1)::bit(32);
+SELECT v as value, hash_aclitem(v)::bit(32) as standard,
+ hash_aclitem_extended(v, 0)::bit(32) as extended0,
+ hash_aclitem_extended(v, 1)::bit(32) as extended1
+FROM (SELECT DISTINCT(relacl[1]) FROM pg_class LIMIT 10) x(v)
+WHERE hash_aclitem(v)::bit(32) != hash_aclitem_extended(v, 0)::bit(32)
+ OR hash_aclitem(v)::bit(32) = hash_aclitem_extended(v, 1)::bit(32);
+SELECT v as value, hashmacaddr(v)::bit(32) as standard,
+ hashmacaddrextended(v, 0)::bit(32) as extended0,
+ hashmacaddrextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::macaddr), ('08:00:2b:01:02:04'), ('08:00:2b:01:02:04'),
+ ('e2:7f:51:3e:70:49'), ('d6:a9:4a:78:1c:d5'), ('ea:29:b1:5e:1f:a5')) x(v)
+WHERE hashmacaddr(v)::bit(32) != hashmacaddrextended(v, 0)::bit(32)
+ OR hashmacaddr(v)::bit(32) = hashmacaddrextended(v, 1)::bit(32);
+SELECT v as value, hashmacaddr8(v)::bit(32) as standard,
+ hashmacaddr8extended(v, 0)::bit(32) as extended0,
+ hashmacaddr8extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::macaddr8), ('08:00:2b:01:02:04'), ('08:00:2b:01:02:04'),
+ ('e2:7f:51:3e:70:49'), ('d6:a9:4a:78:1c:d5'), ('ea:29:b1:5e:1f:a5')) x(v)
+WHERE hashmacaddr8(v)::bit(32) != hashmacaddr8extended(v, 0)::bit(32)
+ OR hashmacaddr8(v)::bit(32) = hashmacaddr8extended(v, 1)::bit(32);
+SELECT v as value, hashinet('192.168.100.128/25')::bit(32) as standard,
+ hashinetextended('192.168.100.128/25', 0)::bit(32) as extended0,
+ hashinetextended('192.168.100.128/25', 1)::bit(32) as extended1
+FROM (VALUES (NULL::inet), ('192.168.100.128/25'), ('192.168.100.0/8'),
+ ('172.168.10.126/16'), ('172.18.103.126/24'), ('192.188.13.16/32')) x(v)
+WHERE hashinet(v)::bit(32) != hashinetextended(v, 0)::bit(32)
+ OR hashinet(v)::bit(32) = hashinetextended(v, 1)::bit(32);
+SELECT v as value, hash_numeric(149484958.204628457)::bit(32) as standard,
+ hash_numeric_extended(149484958.204628457, 0)::bit(32) as extended0,
+ hash_numeric_extended(149484958.204628457, 1)::bit(32) as extended1
+FROM (VALUES (0), (1.149484958), (17.149484958), (42.149484958), (149484958.550273), (2071124898672)) x(v)
+WHERE hash_numeric(v)::bit(32) != hash_numeric_extended(v, 0)::bit(32)
+ OR hash_numeric(v)::bit(32) = hash_numeric_extended(v, 1)::bit(32);
+SELECT v as value, hash_array(v)::bit(32) as standard,
+ hash_array_extended(v, 0)::bit(32) as extended0,
+ hash_array_extended(v, 1)::bit(32) as extended1
+FROM (VALUES ('{0}'::int4[]), ('{0,1,2,3,4}'), ('{17,18,19,20}'), ('{42,34,65,98}'),
+ ('{550273,590027, 870273}'), ('{207112489, 807112489}')) x(v)
+WHERE hash_array(v)::bit(32) != hash_array_extended(v, 0)::bit(32)
+ OR hash_array(v)::bit(32) = hash_array_extended(v, 1)::bit(32);
+SELECT v as value, hashbpchar(v)::bit(32) as standard,
+ hashbpcharextended(v, 0)::bit(32) as extended0,
+ hashbpcharextended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'), ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE hashbpchar(v)::bit(32) != hashbpcharextended(v, 0)::bit(32)
+ OR hashbpchar(v)::bit(32) = hashbpcharextended(v, 1)::bit(32);
+SELECT v as value, time_hash(v)::bit(32) as standard,
+ time_hash_extended(v, 0)::bit(32) as extended0,
+ time_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::time), ('11:09:59'), ('1:09:59'), ('11:59:59'), ('7:9:59'), ('5:15:59')) x(v)
+WHERE time_hash(v)::bit(32) != time_hash_extended(v, 0)::bit(32)
+ OR time_hash(v)::bit(32) = time_hash_extended(v, 1)::bit(32);
+SELECT v as value, timetz_hash(v)::bit(32) as standard,
+ timetz_hash_extended(v, 0)::bit(32) as extended0,
+ timetz_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::timetz), ('00:11:52.518762-07'), ('00:11:52.51762-08'),
+ ('00:11:52.62-01'), ('00:11:52.62+01'), ('11:59:59+04')) x(v)
+WHERE timetz_hash(v)::bit(32) != timetz_hash_extended(v, 0)::bit(32)
+ OR timetz_hash(v)::bit(32) = timetz_hash_extended(v, 1)::bit(32);
+SELECT v as value, timestamp_hash('2017-08-22 00:09:59')::bit(32) as standard,
+ timestamp_hash_extended('2017-08-22 00:09:59', 0)::bit(32) as extended0,
+ timestamp_hash_extended('2017-08-22 00:09:59', 1)::bit(32) as extended1
+FROM (VALUES (NULL::timestamp), ('2017-08-22 00:09:59.518762'), ('2015-08-20 00:11:52.51762-08'),
+ ('2017-05-22 00:11:52.62-01'), ('2013-08-22 00:11:52.62+01'), ('2013-08-22 11:59:59+04')) x(v)
+WHERE timestamp_hash(v)::bit(32) != timestamp_hash_extended(v, 0)::bit(32)
+ OR timestamp_hash(v)::bit(32) = timestamp_hash_extended(v, 1)::bit(32);
+SELECT v as value, interval_hash(v)::bit(32) as standard,
+ interval_hash_extended(v, 0)::bit(32) as extended0,
+ interval_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::interval), ('5 month 7 day 46 minutes'), ('1 year 7 day 46 minutes'),
+ ('1 year 7 month 20 day 46 minutes'), ('5 month'),
+ ('17 year 11 month 7 day 9 hours 46 minutes 5 seconds')) x(v)
+WHERE interval_hash(v)::bit(32) != interval_hash_extended(v, 0)::bit(32)
+ OR interval_hash(v)::bit(32) = interval_hash_extended(v, 1)::bit(32);
+SELECT v as value, uuid_hash('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11')::bit(32) as standard,
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 0)::bit(32) as extended0,
+ uuid_hash_extended('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 1)::bit(32) as extended1
+FROM (VALUES (NULL::uuid), ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'),
+ ('5a9ba4ac-8d6f-11e7-bb31-be2e44b06b34'), ('99c6705c-d939-461c-a3c9-1690ad64ed7b'),
+ ('7deed3ca-8d6f-11e7-bb31-be2e44b06b34'), ('9ad46d4f-6f2a-4edd-aadb-745993928e1e')) x(v)
+WHERE uuid_hash(v)::bit(32) != uuid_hash_extended(v, 0)::bit(32)
+ OR uuid_hash(v)::bit(32) = uuid_hash_extended(v, 1)::bit(32);
+SELECT v as value, pg_lsn_hash('16/B374D84')::bit(32) as standard,
+ pg_lsn_hash_extended('16/B374D84', 0)::bit(32) as extended0,
+ pg_lsn_hash_extended('16/B374D84', 1)::bit(32) as extended1
+FROM (VALUES (NULL::pg_lsn), ('16/B374D84'), ('30/B374D84'),
+ ('255/B374D84'), ('25/B379D90'), ('900/F37FD90')) x(v)
+WHERE pg_lsn_hash(v)::bit(32) != pg_lsn_hash_extended(v, 0)::bit(32)
+ OR pg_lsn_hash(v)::bit(32) = pg_lsn_hash_extended(v, 1)::bit(32);
+CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
+SELECT v as value, hashenum(v)::bit(32) as standard,
+ hashenumextended(v, 0)::bit(32) as extended0,
+ hashenumextended(v, 1)::bit(32) as extended1
+FROM (VALUES ('sad'::mood), ('ok'), ('happy')) x(v)
+WHERE hashenum(v)::bit(32) != hashenumextended(v, 0)::bit(32)
+ OR hashenum(v)::bit(32) = hashenumextended(v, 1)::bit(32);
+DROP TYPE mood;
+SELECT v as value, jsonb_hash(v)::bit(32) as standard,
+ jsonb_hash_extended(v, 0)::bit(32) as extended0,
+ jsonb_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::jsonb), ('{"a": "aaa bbb ddd ccc", "b": ["eee fff ggg"], "c": {"d": "hhh iii"}}'),
+ ('{"foo": [true, "bar"], "tags": {"e": 1, "f": null}}'), ('{"g": {"h": "value"}}')) x(v)
+WHERE jsonb_hash(v)::bit(32) != jsonb_hash_extended(v, 0)::bit(32)
+ OR jsonb_hash(v)::bit(32) = jsonb_hash_extended(v, 1)::bit(32);
+SELECT v as value, hash_range(v)::bit(32) as standard,
+ hash_range_extended(v, 0)::bit(32) as extended0,
+ hash_range_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (int4range(10, 20)), (int4range(23, 43)), (int4range(5675, 550273)),
+ (int4range(550274, 1550274)), (int4range(1550275, 208112489))) x(v)
+WHERE hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
+ OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);
--
2.6.2
On Thu, Aug 31, 2017 at 8:40 AM, amul sul <sulamul@gmail.com> wrote:
Fixed in the attached version.
I fixed these up a bit and committed them. Thanks.
I think this takes care of adding not only the infrastructure but
support for all the core data types, but I'm not quite sure how to
handle upgrading types in contrib. It looks like citext, hstore, and
several data types provided by isn have hash opclasses, and I think
that there's no syntax for adding a support function to an existing
opclass. We could add that, but I'm not sure how safe it would be.
TBH, I really don't care much about fixing isn, but it seems like
fixing citext and hstore would be worthwhile.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I think this takes care of adding not only the infrastructure but
support for all the core data types, but I'm not quite sure how to
handle upgrading types in contrib. It looks like citext, hstore, and
several data types provided by isn have hash opclasses, and I think
that there's no syntax for adding a support function to an existing
opclass. We could add that, but I'm not sure how safe it would be.
ALTER OPERATOR FAMILY ADD FUNCTION ... ?
That would result in the functions being considered "loose" in the
family rather than bound into an operator class. I think that's
actually the right thing, because they shouldn't be considered
to be required.
regards, tom lane
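To make that concrete: for an extension whose hash opclass already exists, the upgrade script could create the extended support function and then attach it loosely at the family level as support function 2. A minimal sketch, assuming a hypothetical type "mytype" and opfamily "hash_mytype_ops" (the two-argument signature taking an int8 seed and returning int8 matches what the core types now use):

CREATE FUNCTION mytype_hash_extended(mytype, int8)
RETURNS int8
AS 'MODULE_PATHNAME', 'mytype_hash_extended'
LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;

-- support function 2 is the extended (seeded) hash; adding it at the
-- family level leaves it "loose" rather than bound into the opclass
ALTER OPERATOR FAMILY hash_mytype_ops USING hash
    ADD FUNCTION 2 mytype_hash_extended(mytype, int8);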
On Thu, Aug 31, 2017 at 10:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I think this takes care of adding not only the infrastructure but
support for all the core data types, but I'm not quite sure how to
handle upgrading types in contrib. It looks like citext, hstore, and
several data types provided by isn have hash opclasses, and I think
that there's no syntax for adding a support function to an existing
opclass. We could add that, but I'm not sure how safe it would be.
ALTER OPERATOR FAMILY ADD FUNCTION ... ?
That would result in the functions being considered "loose" in the
family rather than bound into an operator class. I think that's
actually the right thing, because they shouldn't be considered
to be required.
But wouldn't that result in a different effect than the core data type
changes I just did?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Aug 31, 2017 at 10:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
ALTER OPERATOR FAMILY ADD FUNCTION ... ?
That would result in the functions being considered "loose" in the
family rather than bound into an operator class. I think that's
actually the right thing, because they shouldn't be considered
to be required.
But wouldn't that result in a different effect than the core data type
changes I just did?
Possibly --- I have not read that patch. But considering that all core
functions are pinned anyway, it doesn't seem like it much matters whether
we consider them to be loosely or tightly bound to opclasses. That
should only matter if one tries to drop the function.
regards, tom lane
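The loose/tight distinction would only surface at drop time: a loosely-attached support function can be detached from the family directly, while one bound into an operator class goes away only with the class. A sketch, reusing the hypothetical names from above:

-- works for a loosely-attached support function
ALTER OPERATOR FAMILY hash_mytype_ops USING hash
    DROP FUNCTION 2 (mytype);

-- a class-bound support function can only be removed by dropping
-- the operator class itself (DROP OPERATOR CLASS ... CASCADE)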
On Fri, Sep 1, 2017 at 8:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 31, 2017 at 8:40 AM, amul sul <sulamul@gmail.com> wrote:
Fixed in the attached version.
I fixed these up a bit and committed them. Thanks.
I think this takes care of adding not only the infrastructure but
support for all the core data types, but I'm not quite sure how to
handle upgrading types in contrib. It looks like citext, hstore, and
several data types provided by isn have hash opclasses, and I think
that there's no syntax for adding a support function to an existing
opclass. We could add that, but I'm not sure how safe it would be.
TBH, I really don't care much about fixing isn, but it seems like
fixing citext and hstore would be worthwhile.
Attached are patches proposing the fix for the citext and hstore contribs.
To make them easy to follow, I've split each fix into two parts: 0001 adds
a new file for the contrib upgrade and renames the existing install script
to the higher version, and 0002 is the actual implementation of the
extended hash function for that contrib's data type.
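Schematically, the 0001 renaming is the usual extension version bump (a
sketch of the moving parts; the attached patch has the real files):

  hstore.control:       default_version = '1.4'  ->  '1.5'
  hstore--1.4.sql:      renamed to hstore--1.5.sql, contents unchanged
  hstore--1.4--1.5.sql: new upgrade script, initially just the psql guard:

\echo Use "ALTER EXTENSION hstore UPDATE TO '1.5'" to load this file. \quit

so fresh installs run the 1.5 script directly, while existing 1.4 installs
reach the same state via ALTER EXTENSION hstore UPDATE.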
Regards,
Amul
Attachments:
Attachment: 0001-hstore-File-renaming-v1.patch (application/octet-stream)
From d4ce1385c7fa80918831fc4cef78b21de78eaa6d Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Fri, 8 Sep 2017 12:29:07 +0530
Subject: [PATCH 1/2] hstore - File renaming v1
---
contrib/hstore/Makefile | 3 +-
contrib/hstore/hstore--1.4--1.5.sql | 4 +
contrib/hstore/hstore--1.4.sql | 550 ------------------------------------
contrib/hstore/hstore--1.5.sql | 550 ++++++++++++++++++++++++++++++++++++
contrib/hstore/hstore.control | 2 +-
5 files changed, 557 insertions(+), 552 deletions(-)
create mode 100644 contrib/hstore/hstore--1.4--1.5.sql
delete mode 100644 contrib/hstore/hstore--1.4.sql
create mode 100644 contrib/hstore/hstore--1.5.sql
diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index 311cc09..42fbdee 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -5,7 +5,8 @@ OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
$(WIN32RES)
EXTENSION = hstore
-DATA = hstore--1.4.sql hstore--1.3--1.4.sql hstore--1.2--1.3.sql \
+DATA = hstore--1.5.sql hstore--1.4--1.5.sql \
+ hstore--1.3--1.4.sql hstore--1.2--1.3.sql \
hstore--1.1--1.2.sql hstore--1.0--1.1.sql \
hstore--unpackaged--1.0.sql
PGFILEDESC = "hstore - key/value pair data type"
diff --git a/contrib/hstore/hstore--1.4--1.5.sql b/contrib/hstore/hstore--1.4--1.5.sql
new file mode 100644
index 0000000..443dc84
--- /dev/null
+++ b/contrib/hstore/hstore--1.4--1.5.sql
@@ -0,0 +1,4 @@
+/* contrib/hstore/hstore--1.4--1.5.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION hstore UPDATE TO '1.5'" to load this file. \quit
diff --git a/contrib/hstore/hstore--1.4.sql b/contrib/hstore/hstore--1.4.sql
deleted file mode 100644
index 4294d14..0000000
--- a/contrib/hstore/hstore--1.4.sql
+++ /dev/null
@@ -1,550 +0,0 @@
-/* contrib/hstore/hstore--1.4.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION hstore" to load this file. \quit
-
-CREATE TYPE hstore;
-
-CREATE FUNCTION hstore_in(cstring)
-RETURNS hstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hstore_out(hstore)
-RETURNS cstring
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hstore_recv(internal)
-RETURNS hstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hstore_send(hstore)
-RETURNS bytea
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE TYPE hstore (
- INTERNALLENGTH = -1,
- INPUT = hstore_in,
- OUTPUT = hstore_out,
- RECEIVE = hstore_recv,
- SEND = hstore_send,
- STORAGE = extended
-);
-
-CREATE FUNCTION hstore_version_diag(hstore)
-RETURNS integer
-AS 'MODULE_PATHNAME','hstore_version_diag'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION fetchval(hstore,text)
-RETURNS text
-AS 'MODULE_PATHNAME','hstore_fetchval'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR -> (
- LEFTARG = hstore,
- RIGHTARG = text,
- PROCEDURE = fetchval
-);
-
-CREATE FUNCTION slice_array(hstore,text[])
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_slice_to_array'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR -> (
- LEFTARG = hstore,
- RIGHTARG = text[],
- PROCEDURE = slice_array
-);
-
-CREATE FUNCTION slice(hstore,text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_slice_to_hstore'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION isexists(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION exist(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR ? (
- LEFTARG = hstore,
- RIGHTARG = text,
- PROCEDURE = exist,
- RESTRICT = contsel,
- JOIN = contjoinsel
-);
-
-CREATE FUNCTION exists_any(hstore,text[])
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists_any'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR ?| (
- LEFTARG = hstore,
- RIGHTARG = text[],
- PROCEDURE = exists_any,
- RESTRICT = contsel,
- JOIN = contjoinsel
-);
-
-CREATE FUNCTION exists_all(hstore,text[])
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists_all'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR ?& (
- LEFTARG = hstore,
- RIGHTARG = text[],
- PROCEDURE = exists_all,
- RESTRICT = contsel,
- JOIN = contjoinsel
-);
-
-CREATE FUNCTION isdefined(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_defined'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION defined(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_defined'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION delete(hstore,text)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION delete(hstore,text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete_array'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION delete(hstore,hstore)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete_hstore'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR - (
- LEFTARG = hstore,
- RIGHTARG = text,
- PROCEDURE = delete
-);
-
-CREATE OPERATOR - (
- LEFTARG = hstore,
- RIGHTARG = text[],
- PROCEDURE = delete
-);
-
-CREATE OPERATOR - (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = delete
-);
-
-CREATE FUNCTION hs_concat(hstore,hstore)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_concat'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR || (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hs_concat
-);
-
-CREATE FUNCTION hs_contains(hstore,hstore)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_contains'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hs_contained(hstore,hstore)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_contained'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR @> (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hs_contains,
- COMMUTATOR = '<@',
- RESTRICT = contsel,
- JOIN = contjoinsel
-);
-
-CREATE OPERATOR <@ (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hs_contained,
- COMMUTATOR = '@>',
- RESTRICT = contsel,
- JOIN = contjoinsel
-);
-
--- obsolete:
-CREATE OPERATOR @ (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hs_contains,
- COMMUTATOR = '~',
- RESTRICT = contsel,
- JOIN = contjoinsel
-);
-
-CREATE OPERATOR ~ (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hs_contained,
- COMMUTATOR = '@',
- RESTRICT = contsel,
- JOIN = contjoinsel
-);
-
-CREATE FUNCTION tconvert(text,text)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_from_text'
-LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; needs to allow (key,NULL)
-
-CREATE FUNCTION hstore(text,text)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_from_text'
-LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; needs to allow (key,NULL)
-
-CREATE FUNCTION hstore(text[],text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME', 'hstore_from_arrays'
-LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; allows (keys,null)
-
-CREATE FUNCTION hstore(text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME', 'hstore_from_array'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE CAST (text[] AS hstore)
- WITH FUNCTION hstore(text[]);
-
-CREATE FUNCTION hstore_to_json(hstore)
-RETURNS json
-AS 'MODULE_PATHNAME', 'hstore_to_json'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE CAST (hstore AS json)
- WITH FUNCTION hstore_to_json(hstore);
-
-CREATE FUNCTION hstore_to_json_loose(hstore)
-RETURNS json
-AS 'MODULE_PATHNAME', 'hstore_to_json_loose'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION hstore_to_jsonb(hstore)
-RETURNS jsonb
-AS 'MODULE_PATHNAME', 'hstore_to_jsonb'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE CAST (hstore AS jsonb)
- WITH FUNCTION hstore_to_jsonb(hstore);
-
-CREATE FUNCTION hstore_to_jsonb_loose(hstore)
-RETURNS jsonb
-AS 'MODULE_PATHNAME', 'hstore_to_jsonb_loose'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION hstore(record)
-RETURNS hstore
-AS 'MODULE_PATHNAME', 'hstore_from_record'
-LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; allows (null::recordtype)
-
-CREATE FUNCTION hstore_to_array(hstore)
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_to_array'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR %% (
- RIGHTARG = hstore,
- PROCEDURE = hstore_to_array
-);
-
-CREATE FUNCTION hstore_to_matrix(hstore)
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_to_matrix'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR %# (
- RIGHTARG = hstore,
- PROCEDURE = hstore_to_matrix
-);
-
-CREATE FUNCTION akeys(hstore)
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_akeys'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION avals(hstore)
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_avals'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION skeys(hstore)
-RETURNS setof text
-AS 'MODULE_PATHNAME','hstore_skeys'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION svals(hstore)
-RETURNS setof text
-AS 'MODULE_PATHNAME','hstore_svals'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION each(IN hs hstore,
- OUT key text,
- OUT value text)
-RETURNS SETOF record
-AS 'MODULE_PATHNAME','hstore_each'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION populate_record(anyelement,hstore)
-RETURNS anyelement
-AS 'MODULE_PATHNAME', 'hstore_populate_record'
-LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; allows (null::rectype,hstore)
-
-CREATE OPERATOR #= (
- LEFTARG = anyelement,
- RIGHTARG = hstore,
- PROCEDURE = populate_record
-);
-
--- btree support
-
-CREATE FUNCTION hstore_eq(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_eq'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hstore_ne(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_ne'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hstore_gt(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_gt'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hstore_ge(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_ge'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hstore_lt(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_lt'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hstore_le(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_le'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION hstore_cmp(hstore,hstore)
-RETURNS integer
-AS 'MODULE_PATHNAME','hstore_cmp'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR = (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hstore_eq,
- COMMUTATOR = =,
- NEGATOR = <>,
- RESTRICT = eqsel,
- JOIN = eqjoinsel,
- MERGES,
- HASHES
-);
-CREATE OPERATOR <> (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hstore_ne,
- COMMUTATOR = <>,
- NEGATOR = =,
- RESTRICT = neqsel,
- JOIN = neqjoinsel
-);
-
--- the comparison operators have funky names (and are undocumented)
--- in an attempt to discourage anyone from actually using them. they
--- only exist to support the btree opclass
-
-CREATE OPERATOR #<# (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hstore_lt,
- COMMUTATOR = #>#,
- NEGATOR = #>=#,
- RESTRICT = scalarltsel,
- JOIN = scalarltjoinsel
-);
-CREATE OPERATOR #<=# (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hstore_le,
- COMMUTATOR = #>=#,
- NEGATOR = #>#,
- RESTRICT = scalarltsel,
- JOIN = scalarltjoinsel
-);
-CREATE OPERATOR #># (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hstore_gt,
- COMMUTATOR = #<#,
- NEGATOR = #<=#,
- RESTRICT = scalargtsel,
- JOIN = scalargtjoinsel
-);
-CREATE OPERATOR #>=# (
- LEFTARG = hstore,
- RIGHTARG = hstore,
- PROCEDURE = hstore_ge,
- COMMUTATOR = #<=#,
- NEGATOR = #<#,
- RESTRICT = scalargtsel,
- JOIN = scalargtjoinsel
-);
-
-CREATE OPERATOR CLASS btree_hstore_ops
-DEFAULT FOR TYPE hstore USING btree
-AS
- OPERATOR 1 #<# ,
- OPERATOR 2 #<=# ,
- OPERATOR 3 = ,
- OPERATOR 4 #>=# ,
- OPERATOR 5 #># ,
- FUNCTION 1 hstore_cmp(hstore,hstore);
-
--- hash support
-
-CREATE FUNCTION hstore_hash(hstore)
-RETURNS integer
-AS 'MODULE_PATHNAME','hstore_hash'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE OPERATOR CLASS hash_hstore_ops
-DEFAULT FOR TYPE hstore USING hash
-AS
- OPERATOR 1 = ,
- FUNCTION 1 hstore_hash(hstore);
-
--- GiST support
-
-CREATE TYPE ghstore;
-
-CREATE FUNCTION ghstore_in(cstring)
-RETURNS ghstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION ghstore_out(ghstore)
-RETURNS cstring
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE TYPE ghstore (
- INTERNALLENGTH = -1,
- INPUT = ghstore_in,
- OUTPUT = ghstore_out
-);
-
-CREATE FUNCTION ghstore_compress(internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION ghstore_decompress(internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION ghstore_penalty(internal,internal,internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION ghstore_picksplit(internal, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION ghstore_union(internal, internal)
-RETURNS ghstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION ghstore_same(ghstore, ghstore, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION ghstore_consistent(internal,hstore,smallint,oid,internal)
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE OPERATOR CLASS gist_hstore_ops
-DEFAULT FOR TYPE hstore USING gist
-AS
- OPERATOR 7 @> ,
- OPERATOR 9 ?(hstore,text) ,
- OPERATOR 10 ?|(hstore,text[]) ,
- OPERATOR 11 ?&(hstore,text[]) ,
- --OPERATOR 8 <@ ,
- OPERATOR 13 @ ,
- --OPERATOR 14 ~ ,
- FUNCTION 1 ghstore_consistent (internal, hstore, smallint, oid, internal),
- FUNCTION 2 ghstore_union (internal, internal),
- FUNCTION 3 ghstore_compress (internal),
- FUNCTION 4 ghstore_decompress (internal),
- FUNCTION 5 ghstore_penalty (internal, internal, internal),
- FUNCTION 6 ghstore_picksplit (internal, internal),
- FUNCTION 7 ghstore_same (ghstore, ghstore, internal),
- STORAGE ghstore;
-
--- GIN support
-
-CREATE FUNCTION gin_extract_hstore(hstore, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION gin_extract_hstore_query(hstore, internal, int2, internal, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION gin_consistent_hstore(internal, int2, hstore, int4, internal, internal)
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE OPERATOR CLASS gin_hstore_ops
-DEFAULT FOR TYPE hstore USING gin
-AS
- OPERATOR 7 @>,
- OPERATOR 9 ?(hstore,text),
- OPERATOR 10 ?|(hstore,text[]),
- OPERATOR 11 ?&(hstore,text[]),
- FUNCTION 1 bttextcmp(text,text),
- FUNCTION 2 gin_extract_hstore(hstore, internal),
- FUNCTION 3 gin_extract_hstore_query(hstore, internal, int2, internal, internal),
- FUNCTION 4 gin_consistent_hstore(internal, int2, hstore, int4, internal, internal),
- STORAGE text;
diff --git a/contrib/hstore/hstore--1.5.sql b/contrib/hstore/hstore--1.5.sql
new file mode 100644
index 0000000..4ec3ec6
--- /dev/null
+++ b/contrib/hstore/hstore--1.5.sql
@@ -0,0 +1,550 @@
+/* contrib/hstore/hstore--1.5.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION hstore" to load this file. \quit
+
+CREATE TYPE hstore;
+
+CREATE FUNCTION hstore_in(cstring)
+RETURNS hstore
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hstore_out(hstore)
+RETURNS cstring
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hstore_recv(internal)
+RETURNS hstore
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hstore_send(hstore)
+RETURNS bytea
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE TYPE hstore (
+ INTERNALLENGTH = -1,
+ INPUT = hstore_in,
+ OUTPUT = hstore_out,
+ RECEIVE = hstore_recv,
+ SEND = hstore_send,
+ STORAGE = extended
+);
+
+CREATE FUNCTION hstore_version_diag(hstore)
+RETURNS integer
+AS 'MODULE_PATHNAME','hstore_version_diag'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION fetchval(hstore,text)
+RETURNS text
+AS 'MODULE_PATHNAME','hstore_fetchval'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR -> (
+ LEFTARG = hstore,
+ RIGHTARG = text,
+ PROCEDURE = fetchval
+);
+
+CREATE FUNCTION slice_array(hstore,text[])
+RETURNS text[]
+AS 'MODULE_PATHNAME','hstore_slice_to_array'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR -> (
+ LEFTARG = hstore,
+ RIGHTARG = text[],
+ PROCEDURE = slice_array
+);
+
+CREATE FUNCTION slice(hstore,text[])
+RETURNS hstore
+AS 'MODULE_PATHNAME','hstore_slice_to_hstore'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION isexists(hstore,text)
+RETURNS bool
+AS 'MODULE_PATHNAME','hstore_exists'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION exist(hstore,text)
+RETURNS bool
+AS 'MODULE_PATHNAME','hstore_exists'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR ? (
+ LEFTARG = hstore,
+ RIGHTARG = text,
+ PROCEDURE = exist,
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE FUNCTION exists_any(hstore,text[])
+RETURNS bool
+AS 'MODULE_PATHNAME','hstore_exists_any'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR ?| (
+ LEFTARG = hstore,
+ RIGHTARG = text[],
+ PROCEDURE = exists_any,
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE FUNCTION exists_all(hstore,text[])
+RETURNS bool
+AS 'MODULE_PATHNAME','hstore_exists_all'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR ?& (
+ LEFTARG = hstore,
+ RIGHTARG = text[],
+ PROCEDURE = exists_all,
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE FUNCTION isdefined(hstore,text)
+RETURNS bool
+AS 'MODULE_PATHNAME','hstore_defined'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION defined(hstore,text)
+RETURNS bool
+AS 'MODULE_PATHNAME','hstore_defined'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION delete(hstore,text)
+RETURNS hstore
+AS 'MODULE_PATHNAME','hstore_delete'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION delete(hstore,text[])
+RETURNS hstore
+AS 'MODULE_PATHNAME','hstore_delete_array'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION delete(hstore,hstore)
+RETURNS hstore
+AS 'MODULE_PATHNAME','hstore_delete_hstore'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR - (
+ LEFTARG = hstore,
+ RIGHTARG = text,
+ PROCEDURE = delete
+);
+
+CREATE OPERATOR - (
+ LEFTARG = hstore,
+ RIGHTARG = text[],
+ PROCEDURE = delete
+);
+
+CREATE OPERATOR - (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = delete
+);
+
+CREATE FUNCTION hs_concat(hstore,hstore)
+RETURNS hstore
+AS 'MODULE_PATHNAME','hstore_concat'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR || (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hs_concat
+);
+
+CREATE FUNCTION hs_contains(hstore,hstore)
+RETURNS bool
+AS 'MODULE_PATHNAME','hstore_contains'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hs_contained(hstore,hstore)
+RETURNS bool
+AS 'MODULE_PATHNAME','hstore_contained'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR @> (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hs_contains,
+ COMMUTATOR = '<@',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE OPERATOR <@ (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hs_contained,
+ COMMUTATOR = '@>',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+-- obsolete:
+CREATE OPERATOR @ (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hs_contains,
+ COMMUTATOR = '~',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE OPERATOR ~ (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hs_contained,
+ COMMUTATOR = '@',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE FUNCTION tconvert(text,text)
+RETURNS hstore
+AS 'MODULE_PATHNAME','hstore_from_text'
+LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; needs to allow (key,NULL)
+
+CREATE FUNCTION hstore(text,text)
+RETURNS hstore
+AS 'MODULE_PATHNAME','hstore_from_text'
+LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; needs to allow (key,NULL)
+
+CREATE FUNCTION hstore(text[],text[])
+RETURNS hstore
+AS 'MODULE_PATHNAME', 'hstore_from_arrays'
+LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; allows (keys,null)
+
+CREATE FUNCTION hstore(text[])
+RETURNS hstore
+AS 'MODULE_PATHNAME', 'hstore_from_array'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE CAST (text[] AS hstore)
+ WITH FUNCTION hstore(text[]);
+
+CREATE FUNCTION hstore_to_json(hstore)
+RETURNS json
+AS 'MODULE_PATHNAME', 'hstore_to_json'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE CAST (hstore AS json)
+ WITH FUNCTION hstore_to_json(hstore);
+
+CREATE FUNCTION hstore_to_json_loose(hstore)
+RETURNS json
+AS 'MODULE_PATHNAME', 'hstore_to_json_loose'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION hstore_to_jsonb(hstore)
+RETURNS jsonb
+AS 'MODULE_PATHNAME', 'hstore_to_jsonb'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE CAST (hstore AS jsonb)
+ WITH FUNCTION hstore_to_jsonb(hstore);
+
+CREATE FUNCTION hstore_to_jsonb_loose(hstore)
+RETURNS jsonb
+AS 'MODULE_PATHNAME', 'hstore_to_jsonb_loose'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION hstore(record)
+RETURNS hstore
+AS 'MODULE_PATHNAME', 'hstore_from_record'
+LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; allows (null::recordtype)
+
+CREATE FUNCTION hstore_to_array(hstore)
+RETURNS text[]
+AS 'MODULE_PATHNAME','hstore_to_array'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR %% (
+ RIGHTARG = hstore,
+ PROCEDURE = hstore_to_array
+);
+
+CREATE FUNCTION hstore_to_matrix(hstore)
+RETURNS text[]
+AS 'MODULE_PATHNAME','hstore_to_matrix'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR %# (
+ RIGHTARG = hstore,
+ PROCEDURE = hstore_to_matrix
+);
+
+CREATE FUNCTION akeys(hstore)
+RETURNS text[]
+AS 'MODULE_PATHNAME','hstore_akeys'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION avals(hstore)
+RETURNS text[]
+AS 'MODULE_PATHNAME','hstore_avals'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION skeys(hstore)
+RETURNS setof text
+AS 'MODULE_PATHNAME','hstore_skeys'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION svals(hstore)
+RETURNS setof text
+AS 'MODULE_PATHNAME','hstore_svals'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION each(IN hs hstore,
+ OUT key text,
+ OUT value text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME','hstore_each'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION populate_record(anyelement,hstore)
+RETURNS anyelement
+AS 'MODULE_PATHNAME', 'hstore_populate_record'
+LANGUAGE C IMMUTABLE PARALLEL SAFE; -- not STRICT; allows (null::rectype,hstore)
+
+CREATE OPERATOR #= (
+ LEFTARG = anyelement,
+ RIGHTARG = hstore,
+ PROCEDURE = populate_record
+);
+
+-- btree support
+
+CREATE FUNCTION hstore_eq(hstore,hstore)
+RETURNS boolean
+AS 'MODULE_PATHNAME','hstore_eq'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hstore_ne(hstore,hstore)
+RETURNS boolean
+AS 'MODULE_PATHNAME','hstore_ne'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hstore_gt(hstore,hstore)
+RETURNS boolean
+AS 'MODULE_PATHNAME','hstore_gt'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hstore_ge(hstore,hstore)
+RETURNS boolean
+AS 'MODULE_PATHNAME','hstore_ge'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hstore_lt(hstore,hstore)
+RETURNS boolean
+AS 'MODULE_PATHNAME','hstore_lt'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hstore_le(hstore,hstore)
+RETURNS boolean
+AS 'MODULE_PATHNAME','hstore_le'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION hstore_cmp(hstore,hstore)
+RETURNS integer
+AS 'MODULE_PATHNAME','hstore_cmp'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR = (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hstore_eq,
+ COMMUTATOR = =,
+ NEGATOR = <>,
+ RESTRICT = eqsel,
+ JOIN = eqjoinsel,
+ MERGES,
+ HASHES
+);
+CREATE OPERATOR <> (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hstore_ne,
+ COMMUTATOR = <>,
+ NEGATOR = =,
+ RESTRICT = neqsel,
+ JOIN = neqjoinsel
+);
+
+-- the comparison operators have funky names (and are undocumented)
+-- in an attempt to discourage anyone from actually using them. they
+-- only exist to support the btree opclass
+
+CREATE OPERATOR #<# (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hstore_lt,
+ COMMUTATOR = #>#,
+ NEGATOR = #>=#,
+ RESTRICT = scalarltsel,
+ JOIN = scalarltjoinsel
+);
+CREATE OPERATOR #<=# (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hstore_le,
+ COMMUTATOR = #>=#,
+ NEGATOR = #>#,
+ RESTRICT = scalarltsel,
+ JOIN = scalarltjoinsel
+);
+CREATE OPERATOR #># (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hstore_gt,
+ COMMUTATOR = #<#,
+ NEGATOR = #<=#,
+ RESTRICT = scalargtsel,
+ JOIN = scalargtjoinsel
+);
+CREATE OPERATOR #>=# (
+ LEFTARG = hstore,
+ RIGHTARG = hstore,
+ PROCEDURE = hstore_ge,
+ COMMUTATOR = #<=#,
+ NEGATOR = #<#,
+ RESTRICT = scalargtsel,
+ JOIN = scalargtjoinsel
+);
+
+CREATE OPERATOR CLASS btree_hstore_ops
+DEFAULT FOR TYPE hstore USING btree
+AS
+ OPERATOR 1 #<# ,
+ OPERATOR 2 #<=# ,
+ OPERATOR 3 = ,
+ OPERATOR 4 #>=# ,
+ OPERATOR 5 #># ,
+ FUNCTION 1 hstore_cmp(hstore,hstore);
+
+-- hash support
+
+CREATE FUNCTION hstore_hash(hstore)
+RETURNS integer
+AS 'MODULE_PATHNAME','hstore_hash'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR CLASS hash_hstore_ops
+DEFAULT FOR TYPE hstore USING hash
+AS
+ OPERATOR 1 = ,
+ FUNCTION 1 hstore_hash(hstore);
+
+-- GiST support
+
+CREATE TYPE ghstore;
+
+CREATE FUNCTION ghstore_in(cstring)
+RETURNS ghstore
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION ghstore_out(ghstore)
+RETURNS cstring
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE TYPE ghstore (
+ INTERNALLENGTH = -1,
+ INPUT = ghstore_in,
+ OUTPUT = ghstore_out
+);
+
+CREATE FUNCTION ghstore_compress(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION ghstore_decompress(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION ghstore_penalty(internal,internal,internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION ghstore_picksplit(internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION ghstore_union(internal, internal)
+RETURNS ghstore
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION ghstore_same(ghstore, ghstore, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION ghstore_consistent(internal,hstore,smallint,oid,internal)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OPERATOR CLASS gist_hstore_ops
+DEFAULT FOR TYPE hstore USING gist
+AS
+ OPERATOR 7 @> ,
+ OPERATOR 9 ?(hstore,text) ,
+ OPERATOR 10 ?|(hstore,text[]) ,
+ OPERATOR 11 ?&(hstore,text[]) ,
+ --OPERATOR 8 <@ ,
+ OPERATOR 13 @ ,
+ --OPERATOR 14 ~ ,
+ FUNCTION 1 ghstore_consistent (internal, hstore, smallint, oid, internal),
+ FUNCTION 2 ghstore_union (internal, internal),
+ FUNCTION 3 ghstore_compress (internal),
+ FUNCTION 4 ghstore_decompress (internal),
+ FUNCTION 5 ghstore_penalty (internal, internal, internal),
+ FUNCTION 6 ghstore_picksplit (internal, internal),
+ FUNCTION 7 ghstore_same (ghstore, ghstore, internal),
+ STORAGE ghstore;
+
+-- GIN support
+
+CREATE FUNCTION gin_extract_hstore(hstore, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION gin_extract_hstore_query(hstore, internal, int2, internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION gin_consistent_hstore(internal, int2, hstore, int4, internal, internal)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OPERATOR CLASS gin_hstore_ops
+DEFAULT FOR TYPE hstore USING gin
+AS
+ OPERATOR 7 @>,
+ OPERATOR 9 ?(hstore,text),
+ OPERATOR 10 ?|(hstore,text[]),
+ OPERATOR 11 ?&(hstore,text[]),
+ FUNCTION 1 bttextcmp(text,text),
+ FUNCTION 2 gin_extract_hstore(hstore, internal),
+ FUNCTION 3 gin_extract_hstore_query(hstore, internal, int2, internal, internal),
+ FUNCTION 4 gin_consistent_hstore(internal, int2, hstore, int4, internal, internal),
+ STORAGE text;
diff --git a/contrib/hstore/hstore.control b/contrib/hstore/hstore.control
index f99a937..8a71947 100644
--- a/contrib/hstore/hstore.control
+++ b/contrib/hstore/hstore.control
@@ -1,5 +1,5 @@
# hstore extension
comment = 'data type for storing sets of (key, value) pairs'
-default_version = '1.4'
+default_version = '1.5'
module_pathname = '$libdir/hstore'
relocatable = true
--
2.6.2
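For context, the hstore--1.5.sql script above appears to be carried over unchanged from the 1.4 script; the rename itself is what makes room for the additions in the follow-up patch. The GIN opclass it re-creates supports containment and key-existence lookups, e.g. (a minimal sketch, not part of the patch; the table name "docs" is made up for illustration):

CREATE TABLE docs (meta hstore);
CREATE INDEX docs_meta_gin ON docs USING gin (meta);
SELECT * FROM docs WHERE meta @> 'status=>active';  -- containment, OPERATOR 7
SELECT * FROM docs WHERE meta ? 'status';           -- key existence, OPERATOR 9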
Attachment: 0002-hstore-add-extended-hash-function-v1.patch (application/octet-stream)
From a1756489050ba5ff7e3ba441116cb4b6ef6d684e Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Fri, 8 Sep 2017 12:29:13 +0530
Subject: [PATCH 2/2] hstore - add extended hash function
---
contrib/hstore/expected/hstore.out | 12 ++++++++++++
contrib/hstore/hstore--1.4--1.5.sql | 8 ++++++++
contrib/hstore/hstore--1.5.sql | 8 +++++++-
contrib/hstore/hstore--unpackaged--1.0.sql | 1 +
contrib/hstore/hstore_op.c | 20 ++++++++++++++++++++
contrib/hstore/sql/hstore.sql | 9 +++++++++
6 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/contrib/hstore/expected/hstore.out b/contrib/hstore/expected/hstore.out
index f0d4216..4f1db01 100644
--- a/contrib/hstore/expected/hstore.out
+++ b/contrib/hstore/expected/hstore.out
@@ -1515,3 +1515,15 @@ select json_agg(q) from (select f1, hstore_to_json_loose(f2) as f2 from test_jso
{"f1":"rec2","f2":{"b": false, "c": "null", "d": -12345, "e": "012345.6", "f": -1.234, "g": 0.345e-4, "a key": 2}}]
(1 row)
+-- Check the hstore_hash() and hstore_hash_extended() function explicitly.
+SELECT v as value, hstore_hash(v)::bit(32) as standard,
+ hstore_hash_extended(v, 0)::bit(32) as extended0,
+ hstore_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::hstore), (''), ('"a key" =>1'), ('c => null'),
+ ('e => 012345'), ('g => 2.345e+4')) x(v)
+WHERE hstore_hash(v)::bit(32) != hstore_hash_extended(v, 0)::bit(32)
+ OR hstore_hash(v)::bit(32) = hstore_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
diff --git a/contrib/hstore/hstore--1.4--1.5.sql b/contrib/hstore/hstore--1.4--1.5.sql
index 443dc84..26610ce 100644
--- a/contrib/hstore/hstore--1.4--1.5.sql
+++ b/contrib/hstore/hstore--1.4--1.5.sql
@@ -2,3 +2,11 @@
-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION hstore UPDATE TO '1.5'" to load this file. \quit
+
+CREATE FUNCTION hstore_hash_extended(hstore, int8)
+RETURNS int8
+AS 'MODULE_PATHNAME','hstore_hash_extended'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+ALTER OPERATOR FAMILY hash_hstore_ops USING hash ADD
+ FUNCTION 2 hstore_hash_extended(hstore, int8);
diff --git a/contrib/hstore/hstore--1.5.sql b/contrib/hstore/hstore--1.5.sql
index 4ec3ec6..182275c 100644
--- a/contrib/hstore/hstore--1.5.sql
+++ b/contrib/hstore/hstore--1.5.sql
@@ -439,11 +439,17 @@ RETURNS integer
AS 'MODULE_PATHNAME','hstore_hash'
LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+CREATE FUNCTION hstore_hash_extended(hstore, int8)
+RETURNS int8
+AS 'MODULE_PATHNAME','hstore_hash_extended'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
CREATE OPERATOR CLASS hash_hstore_ops
DEFAULT FOR TYPE hstore USING hash
AS
OPERATOR 1 = ,
- FUNCTION 1 hstore_hash(hstore);
+ FUNCTION 1 hstore_hash(hstore),
+ FUNCTION 2 hstore_hash_extended(hstore, int8);
-- GiST support
diff --git a/contrib/hstore/hstore--unpackaged--1.0.sql b/contrib/hstore/hstore--unpackaged--1.0.sql
index 19a7802..2128165 100644
--- a/contrib/hstore/hstore--unpackaged--1.0.sql
+++ b/contrib/hstore/hstore--unpackaged--1.0.sql
@@ -71,6 +71,7 @@ ALTER EXTENSION hstore ADD operator #<=#(hstore,hstore);
ALTER EXTENSION hstore ADD operator family btree_hstore_ops using btree;
ALTER EXTENSION hstore ADD operator class btree_hstore_ops using btree;
ALTER EXTENSION hstore ADD function hstore_hash(hstore);
+ALTER EXTENSION hstore ADD function hstore_hash_extended(hstore,int8);
ALTER EXTENSION hstore ADD operator family hash_hstore_ops using hash;
ALTER EXTENSION hstore ADD operator class hash_hstore_ops using hash;
ALTER EXTENSION hstore ADD type ghstore;
diff --git a/contrib/hstore/hstore_op.c b/contrib/hstore/hstore_op.c
index 612be23..c962c49 100644
--- a/contrib/hstore/hstore_op.c
+++ b/contrib/hstore/hstore_op.c
@@ -1253,3 +1253,23 @@ hstore_hash(PG_FUNCTION_ARGS)
PG_FREE_IF_COPY(hs, 0);
PG_RETURN_DATUM(hval);
}
+
+PG_FUNCTION_INFO_V1(hstore_hash_extended);
+Datum
+hstore_hash_extended(PG_FUNCTION_ARGS)
+{
+ HStore *hs = PG_GETARG_HS(0);
+ Datum hval = hash_any_extended((unsigned char *) VARDATA(hs),
+ VARSIZE(hs) - VARHDRSZ,
+ PG_GETARG_INT64(1));
+
+ /* Same approach as hstore_hash */
+ Assert(VARSIZE(hs) ==
+ (HS_COUNT(hs) != 0 ?
+ CALCDATASIZE(HS_COUNT(hs),
+ HSE_ENDPOS(ARRPTR(hs)[2 * HS_COUNT(hs) - 1])) :
+ HSHRDSIZE));
+
+ PG_FREE_IF_COPY(hs, 0);
+ PG_RETURN_DATUM(hval);
+}
diff --git a/contrib/hstore/sql/hstore.sql b/contrib/hstore/sql/hstore.sql
index d64b9f7..76ac48b 100644
--- a/contrib/hstore/sql/hstore.sql
+++ b/contrib/hstore/sql/hstore.sql
@@ -350,3 +350,12 @@ insert into test_json_agg values ('rec1','"a key" =>1, b => t, c => null, d=> 12
('rec2','"a key" =>2, b => f, c => "null", d=> -12345, e => 012345.6, f=> -1.234, g=> 0.345e-4');
select json_agg(q) from test_json_agg q;
select json_agg(q) from (select f1, hstore_to_json_loose(f2) as f2 from test_json_agg) q;
+
+-- Check the hstore_hash() and hstore_hash_extended() function explicitly.
+SELECT v as value, hstore_hash(v)::bit(32) as standard,
+ hstore_hash_extended(v, 0)::bit(32) as extended0,
+ hstore_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::hstore), (''), ('"a key" =>1'), ('c => null'),
+ ('e => 012345'), ('g => 2.345e+4')) x(v)
+WHERE hstore_hash(v)::bit(32) != hstore_hash_extended(v, 0)::bit(32)
+ OR hstore_hash(v)::bit(32) = hstore_hash_extended(v, 1)::bit(32);
--
2.6.2
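The regression test above encodes the convention for extended hash support: with seed 0, hstore_hash_extended() must agree with hstore_hash() in its low 32 bits (which is what the ::bit(32) casts compare), and any nonzero seed must change the result. Once FUNCTION 2 is registered in hash_hstore_ops, hstore columns also become usable as hash partitioning keys, which is the motivation upthread. A minimal sketch, assuming a v11-devel server with hash partitioning; table names are hypothetical:

CREATE TABLE hstab (v hstore) PARTITION BY HASH (v);
CREATE TABLE hstab_p0 PARTITION OF hstab FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE hstab_p1 PARTITION OF hstab FOR VALUES WITH (MODULUS 2, REMAINDER 1);
INSERT INTO hstab VALUES ('a=>1'), ('b=>2');  -- rows are routed via hstore_hash_extended()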
Attachment: 0001-citext-File-renaming-v1.patch (application/octet-stream)
From c7f09ad73d4b98dea745d86c5e81c412ed0e5ed7 Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Fri, 8 Sep 2017 12:28:57 +0530
Subject: [PATCH 1/2] citext - File renaming v1
---
contrib/citext/Makefile | 7 +-
contrib/citext/citext--1.4--1.5.sql | 4 +
contrib/citext/citext--1.4.sql | 501 ------------------------------------
contrib/citext/citext--1.5.sql | 501 ++++++++++++++++++++++++++++++++++++
contrib/citext/citext.control | 2 +-
5 files changed, 510 insertions(+), 505 deletions(-)
create mode 100644 contrib/citext/citext--1.4--1.5.sql
delete mode 100644 contrib/citext/citext--1.4.sql
create mode 100644 contrib/citext/citext--1.5.sql
diff --git a/contrib/citext/Makefile b/contrib/citext/Makefile
index 563cd22..af26775 100644
--- a/contrib/citext/Makefile
+++ b/contrib/citext/Makefile
@@ -3,9 +3,10 @@
MODULES = citext
EXTENSION = citext
-DATA = citext--1.4.sql citext--1.3--1.4.sql \
- citext--1.2--1.3.sql citext--1.1--1.2.sql \
- citext--1.0--1.1.sql citext--unpackaged--1.0.sql
+DATA = citext--1.5.sql citext--1.4--1.5.sql \
+ citext--1.3--1.4.sql citext--1.2--1.3.sql \
+ citext--1.1--1.2.sql citext--1.0--1.1.sql \
+ citext--unpackaged--1.0.sql
PGFILEDESC = "citext - case-insensitive character string data type"
REGRESS = citext
diff --git a/contrib/citext/citext--1.4--1.5.sql b/contrib/citext/citext--1.4--1.5.sql
new file mode 100644
index 0000000..d69f7b1
--- /dev/null
+++ b/contrib/citext/citext--1.4--1.5.sql
@@ -0,0 +1,4 @@
+/* contrib/citext/citext--1.4--1.5.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION citext UPDATE TO '1.5'" to load this file. \quit
diff --git a/contrib/citext/citext--1.4.sql b/contrib/citext/citext--1.4.sql
deleted file mode 100644
index 7b06198..0000000
--- a/contrib/citext/citext--1.4.sql
+++ /dev/null
@@ -1,501 +0,0 @@
-/* contrib/citext/citext--1.4.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION citext" to load this file. \quit
-
---
--- PostgreSQL code for CITEXT.
---
--- Most I/O functions, and a few others, piggyback on the "text" type
--- functions via the implicit cast to text.
---
-
---
--- Shell type to keep things a bit quieter.
---
-
-CREATE TYPE citext;
-
---
--- Input and output functions.
---
-CREATE FUNCTION citextin(cstring)
-RETURNS citext
-AS 'textin'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citextout(citext)
-RETURNS cstring
-AS 'textout'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citextrecv(internal)
-RETURNS citext
-AS 'textrecv'
-LANGUAGE internal STABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citextsend(citext)
-RETURNS bytea
-AS 'textsend'
-LANGUAGE internal STABLE STRICT PARALLEL SAFE;
-
---
--- The type itself.
---
-
-CREATE TYPE citext (
- INPUT = citextin,
- OUTPUT = citextout,
- RECEIVE = citextrecv,
- SEND = citextsend,
- INTERNALLENGTH = VARIABLE,
- STORAGE = extended,
- -- make it a non-preferred member of string type category
- CATEGORY = 'S',
- PREFERRED = false,
- COLLATABLE = true
-);
-
---
--- Type casting functions for those situations where the I/O casts don't
--- automatically kick in.
---
-
-CREATE FUNCTION citext(bpchar)
-RETURNS citext
-AS 'rtrim1'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext(boolean)
-RETURNS citext
-AS 'booltext'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext(inet)
-RETURNS citext
-AS 'network_show'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
---
--- Implicit and assignment type casts.
---
-
-CREATE CAST (citext AS text) WITHOUT FUNCTION AS IMPLICIT;
-CREATE CAST (citext AS varchar) WITHOUT FUNCTION AS IMPLICIT;
-CREATE CAST (citext AS bpchar) WITHOUT FUNCTION AS ASSIGNMENT;
-CREATE CAST (text AS citext) WITHOUT FUNCTION AS ASSIGNMENT;
-CREATE CAST (varchar AS citext) WITHOUT FUNCTION AS ASSIGNMENT;
-CREATE CAST (bpchar AS citext) WITH FUNCTION citext(bpchar) AS ASSIGNMENT;
-CREATE CAST (boolean AS citext) WITH FUNCTION citext(boolean) AS ASSIGNMENT;
-CREATE CAST (inet AS citext) WITH FUNCTION citext(inet) AS ASSIGNMENT;
-
---
--- Operator Functions.
---
-
-CREATE FUNCTION citext_eq( citext, citext )
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext_ne( citext, citext )
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext_lt( citext, citext )
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext_le( citext, citext )
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext_gt( citext, citext )
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext_ge( citext, citext )
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
---
--- Operators.
---
-
-CREATE OPERATOR = (
- LEFTARG = CITEXT,
- RIGHTARG = CITEXT,
- COMMUTATOR = =,
- NEGATOR = <>,
- PROCEDURE = citext_eq,
- RESTRICT = eqsel,
- JOIN = eqjoinsel,
- HASHES,
- MERGES
-);
-
-CREATE OPERATOR <> (
- LEFTARG = CITEXT,
- RIGHTARG = CITEXT,
- NEGATOR = =,
- COMMUTATOR = <>,
- PROCEDURE = citext_ne,
- RESTRICT = neqsel,
- JOIN = neqjoinsel
-);
-
-CREATE OPERATOR < (
- LEFTARG = CITEXT,
- RIGHTARG = CITEXT,
- NEGATOR = >=,
- COMMUTATOR = >,
- PROCEDURE = citext_lt,
- RESTRICT = scalarltsel,
- JOIN = scalarltjoinsel
-);
-
-CREATE OPERATOR <= (
- LEFTARG = CITEXT,
- RIGHTARG = CITEXT,
- NEGATOR = >,
- COMMUTATOR = >=,
- PROCEDURE = citext_le,
- RESTRICT = scalarltsel,
- JOIN = scalarltjoinsel
-);
-
-CREATE OPERATOR >= (
- LEFTARG = CITEXT,
- RIGHTARG = CITEXT,
- NEGATOR = <,
- COMMUTATOR = <=,
- PROCEDURE = citext_ge,
- RESTRICT = scalargtsel,
- JOIN = scalargtjoinsel
-);
-
-CREATE OPERATOR > (
- LEFTARG = CITEXT,
- RIGHTARG = CITEXT,
- NEGATOR = <=,
- COMMUTATOR = <,
- PROCEDURE = citext_gt,
- RESTRICT = scalargtsel,
- JOIN = scalargtjoinsel
-);
-
---
--- Support functions for indexing.
---
-
-CREATE FUNCTION citext_cmp(citext, citext)
-RETURNS int4
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
-CREATE FUNCTION citext_hash(citext)
-RETURNS int4
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
-
---
--- The btree indexing operator class.
---
-
-CREATE OPERATOR CLASS citext_ops
-DEFAULT FOR TYPE CITEXT USING btree AS
- OPERATOR 1 < (citext, citext),
- OPERATOR 2 <= (citext, citext),
- OPERATOR 3 = (citext, citext),
- OPERATOR 4 >= (citext, citext),
- OPERATOR 5 > (citext, citext),
- FUNCTION 1 citext_cmp(citext, citext);
-
---
--- The hash indexing operator class.
---
-
-CREATE OPERATOR CLASS citext_ops
-DEFAULT FOR TYPE citext USING hash AS
- OPERATOR 1 = (citext, citext),
- FUNCTION 1 citext_hash(citext);
-
---
--- Aggregates.
---
-
-CREATE FUNCTION citext_smaller(citext, citext)
-RETURNS citext
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext_larger(citext, citext)
-RETURNS citext
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE AGGREGATE min(citext) (
- SFUNC = citext_smaller,
- STYPE = citext,
- SORTOP = <,
- PARALLEL = SAFE,
- COMBINEFUNC = citext_smaller
-);
-
-CREATE AGGREGATE max(citext) (
- SFUNC = citext_larger,
- STYPE = citext,
- SORTOP = >,
- PARALLEL = SAFE,
- COMBINEFUNC = citext_larger
-);
-
---
--- CITEXT pattern matching.
---
-
-CREATE FUNCTION texticlike(citext, citext)
-RETURNS bool AS 'texticlike'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION texticnlike(citext, citext)
-RETURNS bool AS 'texticnlike'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION texticregexeq(citext, citext)
-RETURNS bool AS 'texticregexeq'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION texticregexne(citext, citext)
-RETURNS bool AS 'texticregexne'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE OPERATOR ~ (
- PROCEDURE = texticregexeq,
- LEFTARG = citext,
- RIGHTARG = citext,
- NEGATOR = !~,
- RESTRICT = icregexeqsel,
- JOIN = icregexeqjoinsel
-);
-
-CREATE OPERATOR ~* (
- PROCEDURE = texticregexeq,
- LEFTARG = citext,
- RIGHTARG = citext,
- NEGATOR = !~*,
- RESTRICT = icregexeqsel,
- JOIN = icregexeqjoinsel
-);
-
-CREATE OPERATOR !~ (
- PROCEDURE = texticregexne,
- LEFTARG = citext,
- RIGHTARG = citext,
- NEGATOR = ~,
- RESTRICT = icregexnesel,
- JOIN = icregexnejoinsel
-);
-
-CREATE OPERATOR !~* (
- PROCEDURE = texticregexne,
- LEFTARG = citext,
- RIGHTARG = citext,
- NEGATOR = ~*,
- RESTRICT = icregexnesel,
- JOIN = icregexnejoinsel
-);
-
-CREATE OPERATOR ~~ (
- PROCEDURE = texticlike,
- LEFTARG = citext,
- RIGHTARG = citext,
- NEGATOR = !~~,
- RESTRICT = iclikesel,
- JOIN = iclikejoinsel
-);
-
-CREATE OPERATOR ~~* (
- PROCEDURE = texticlike,
- LEFTARG = citext,
- RIGHTARG = citext,
- NEGATOR = !~~*,
- RESTRICT = iclikesel,
- JOIN = iclikejoinsel
-);
-
-CREATE OPERATOR !~~ (
- PROCEDURE = texticnlike,
- LEFTARG = citext,
- RIGHTARG = citext,
- NEGATOR = ~~,
- RESTRICT = icnlikesel,
- JOIN = icnlikejoinsel
-);
-
-CREATE OPERATOR !~~* (
- PROCEDURE = texticnlike,
- LEFTARG = citext,
- RIGHTARG = citext,
- NEGATOR = ~~*,
- RESTRICT = icnlikesel,
- JOIN = icnlikejoinsel
-);
-
---
--- Matching citext to text.
---
-
-CREATE FUNCTION texticlike(citext, text)
-RETURNS bool AS 'texticlike'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION texticnlike(citext, text)
-RETURNS bool AS 'texticnlike'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION texticregexeq(citext, text)
-RETURNS bool AS 'texticregexeq'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION texticregexne(citext, text)
-RETURNS bool AS 'texticregexne'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE OPERATOR ~ (
- PROCEDURE = texticregexeq,
- LEFTARG = citext,
- RIGHTARG = text,
- NEGATOR = !~,
- RESTRICT = icregexeqsel,
- JOIN = icregexeqjoinsel
-);
-
-CREATE OPERATOR ~* (
- PROCEDURE = texticregexeq,
- LEFTARG = citext,
- RIGHTARG = text,
- NEGATOR = !~*,
- RESTRICT = icregexeqsel,
- JOIN = icregexeqjoinsel
-);
-
-CREATE OPERATOR !~ (
- PROCEDURE = texticregexne,
- LEFTARG = citext,
- RIGHTARG = text,
- NEGATOR = ~,
- RESTRICT = icregexnesel,
- JOIN = icregexnejoinsel
-);
-
-CREATE OPERATOR !~* (
- PROCEDURE = texticregexne,
- LEFTARG = citext,
- RIGHTARG = text,
- NEGATOR = ~*,
- RESTRICT = icregexnesel,
- JOIN = icregexnejoinsel
-);
-
-CREATE OPERATOR ~~ (
- PROCEDURE = texticlike,
- LEFTARG = citext,
- RIGHTARG = text,
- NEGATOR = !~~,
- RESTRICT = iclikesel,
- JOIN = iclikejoinsel
-);
-
-CREATE OPERATOR ~~* (
- PROCEDURE = texticlike,
- LEFTARG = citext,
- RIGHTARG = text,
- NEGATOR = !~~*,
- RESTRICT = iclikesel,
- JOIN = iclikejoinsel
-);
-
-CREATE OPERATOR !~~ (
- PROCEDURE = texticnlike,
- LEFTARG = citext,
- RIGHTARG = text,
- NEGATOR = ~~,
- RESTRICT = icnlikesel,
- JOIN = icnlikejoinsel
-);
-
-CREATE OPERATOR !~~* (
- PROCEDURE = texticnlike,
- LEFTARG = citext,
- RIGHTARG = text,
- NEGATOR = ~~*,
- RESTRICT = icnlikesel,
- JOIN = icnlikejoinsel
-);
-
---
--- Matching citext in string comparison functions.
--- XXX TODO Ideally these would be implemented in C.
---
-
-CREATE FUNCTION regexp_match( citext, citext ) RETURNS TEXT[] AS $$
- SELECT pg_catalog.regexp_match( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION regexp_match( citext, citext, text ) RETURNS TEXT[] AS $$
- SELECT pg_catalog.regexp_match( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION regexp_matches( citext, citext ) RETURNS SETOF TEXT[] AS $$
- SELECT pg_catalog.regexp_matches( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE ROWS 1;
-
-CREATE FUNCTION regexp_matches( citext, citext, text ) RETURNS SETOF TEXT[] AS $$
- SELECT pg_catalog.regexp_matches( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE ROWS 10;
-
-CREATE FUNCTION regexp_replace( citext, citext, text ) returns TEXT AS $$
- SELECT pg_catalog.regexp_replace( $1::pg_catalog.text, $2::pg_catalog.text, $3, 'i');
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION regexp_replace( citext, citext, text, text ) returns TEXT AS $$
- SELECT pg_catalog.regexp_replace( $1::pg_catalog.text, $2::pg_catalog.text, $3, CASE WHEN pg_catalog.strpos($4, 'c') = 0 THEN $4 || 'i' ELSE $4 END);
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION regexp_split_to_array( citext, citext ) RETURNS TEXT[] AS $$
- SELECT pg_catalog.regexp_split_to_array( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION regexp_split_to_array( citext, citext, text ) RETURNS TEXT[] AS $$
- SELECT pg_catalog.regexp_split_to_array( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION regexp_split_to_table( citext, citext ) RETURNS SETOF TEXT AS $$
- SELECT pg_catalog.regexp_split_to_table( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION regexp_split_to_table( citext, citext, text ) RETURNS SETOF TEXT AS $$
- SELECT pg_catalog.regexp_split_to_table( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION strpos( citext, citext ) RETURNS INT AS $$
- SELECT pg_catalog.strpos( pg_catalog.lower( $1::pg_catalog.text ), pg_catalog.lower( $2::pg_catalog.text ) );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION replace( citext, citext, citext ) RETURNS TEXT AS $$
- SELECT pg_catalog.regexp_replace( $1::pg_catalog.text, pg_catalog.regexp_replace($2::pg_catalog.text, '([^a-zA-Z_0-9])', E'\\\\\\1', 'g'), $3::pg_catalog.text, 'gi' );
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION split_part( citext, citext, int ) RETURNS TEXT AS $$
- SELECT (pg_catalog.regexp_split_to_array( $1::pg_catalog.text, pg_catalog.regexp_replace($2::pg_catalog.text, '([^a-zA-Z_0-9])', E'\\\\\\1', 'g'), 'i'))[$3];
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION translate( citext, citext, text ) RETURNS TEXT AS $$
- SELECT pg_catalog.translate( pg_catalog.translate( $1::pg_catalog.text, pg_catalog.lower($2::pg_catalog.text), $3), pg_catalog.upper($2::pg_catalog.text), $3);
-$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
diff --git a/contrib/citext/citext--1.5.sql b/contrib/citext/citext--1.5.sql
new file mode 100644
index 0000000..ce984e3
--- /dev/null
+++ b/contrib/citext/citext--1.5.sql
@@ -0,0 +1,501 @@
+/* contrib/citext/citext--1.5.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION citext" to load this file. \quit
+
+--
+-- PostgreSQL code for CITEXT.
+--
+-- Most I/O functions, and a few others, piggyback on the "text" type
+-- functions via the implicit cast to text.
+--
+
+--
+-- Shell type to keep things a bit quieter.
+--
+
+CREATE TYPE citext;
+
+--
+-- Input and output functions.
+--
+CREATE FUNCTION citextin(cstring)
+RETURNS citext
+AS 'textin'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citextout(citext)
+RETURNS cstring
+AS 'textout'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citextrecv(internal)
+RETURNS citext
+AS 'textrecv'
+LANGUAGE internal STABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citextsend(citext)
+RETURNS bytea
+AS 'textsend'
+LANGUAGE internal STABLE STRICT PARALLEL SAFE;
+
+--
+-- The type itself.
+--
+
+CREATE TYPE citext (
+ INPUT = citextin,
+ OUTPUT = citextout,
+ RECEIVE = citextrecv,
+ SEND = citextsend,
+ INTERNALLENGTH = VARIABLE,
+ STORAGE = extended,
+ -- make it a non-preferred member of string type category
+ CATEGORY = 'S',
+ PREFERRED = false,
+ COLLATABLE = true
+);
+
+--
+-- Type casting functions for those situations where the I/O casts don't
+-- automatically kick in.
+--
+
+CREATE FUNCTION citext(bpchar)
+RETURNS citext
+AS 'rtrim1'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citext(boolean)
+RETURNS citext
+AS 'booltext'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citext(inet)
+RETURNS citext
+AS 'network_show'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+--
+-- Implicit and assignment type casts.
+--
+
+CREATE CAST (citext AS text) WITHOUT FUNCTION AS IMPLICIT;
+CREATE CAST (citext AS varchar) WITHOUT FUNCTION AS IMPLICIT;
+CREATE CAST (citext AS bpchar) WITHOUT FUNCTION AS ASSIGNMENT;
+CREATE CAST (text AS citext) WITHOUT FUNCTION AS ASSIGNMENT;
+CREATE CAST (varchar AS citext) WITHOUT FUNCTION AS ASSIGNMENT;
+CREATE CAST (bpchar AS citext) WITH FUNCTION citext(bpchar) AS ASSIGNMENT;
+CREATE CAST (boolean AS citext) WITH FUNCTION citext(boolean) AS ASSIGNMENT;
+CREATE CAST (inet AS citext) WITH FUNCTION citext(inet) AS ASSIGNMENT;
+
+--
+-- Operator Functions.
+--
+
+CREATE FUNCTION citext_eq( citext, citext )
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citext_ne( citext, citext )
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citext_lt( citext, citext )
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citext_le( citext, citext )
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citext_gt( citext, citext )
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citext_ge( citext, citext )
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+--
+-- Operators.
+--
+
+CREATE OPERATOR = (
+ LEFTARG = CITEXT,
+ RIGHTARG = CITEXT,
+ COMMUTATOR = =,
+ NEGATOR = <>,
+ PROCEDURE = citext_eq,
+ RESTRICT = eqsel,
+ JOIN = eqjoinsel,
+ HASHES,
+ MERGES
+);
+
+CREATE OPERATOR <> (
+ LEFTARG = CITEXT,
+ RIGHTARG = CITEXT,
+ NEGATOR = =,
+ COMMUTATOR = <>,
+ PROCEDURE = citext_ne,
+ RESTRICT = neqsel,
+ JOIN = neqjoinsel
+);
+
+CREATE OPERATOR < (
+ LEFTARG = CITEXT,
+ RIGHTARG = CITEXT,
+ NEGATOR = >=,
+ COMMUTATOR = >,
+ PROCEDURE = citext_lt,
+ RESTRICT = scalarltsel,
+ JOIN = scalarltjoinsel
+);
+
+CREATE OPERATOR <= (
+ LEFTARG = CITEXT,
+ RIGHTARG = CITEXT,
+ NEGATOR = >,
+ COMMUTATOR = >=,
+ PROCEDURE = citext_le,
+ RESTRICT = scalarltsel,
+ JOIN = scalarltjoinsel
+);
+
+CREATE OPERATOR >= (
+ LEFTARG = CITEXT,
+ RIGHTARG = CITEXT,
+ NEGATOR = <,
+ COMMUTATOR = <=,
+ PROCEDURE = citext_ge,
+ RESTRICT = scalargtsel,
+ JOIN = scalargtjoinsel
+);
+
+CREATE OPERATOR > (
+ LEFTARG = CITEXT,
+ RIGHTARG = CITEXT,
+ NEGATOR = <=,
+ COMMUTATOR = <,
+ PROCEDURE = citext_gt,
+ RESTRICT = scalargtsel,
+ JOIN = scalargtjoinsel
+);
+
+--
+-- Support functions for indexing.
+--
+
+CREATE FUNCTION citext_cmp(citext, citext)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION citext_hash(citext)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+--
+-- The btree indexing operator class.
+--
+
+CREATE OPERATOR CLASS citext_ops
+DEFAULT FOR TYPE CITEXT USING btree AS
+ OPERATOR 1 < (citext, citext),
+ OPERATOR 2 <= (citext, citext),
+ OPERATOR 3 = (citext, citext),
+ OPERATOR 4 >= (citext, citext),
+ OPERATOR 5 > (citext, citext),
+ FUNCTION 1 citext_cmp(citext, citext);
+
+--
+-- The hash indexing operator class.
+--
+
+CREATE OPERATOR CLASS citext_ops
+DEFAULT FOR TYPE citext USING hash AS
+ OPERATOR 1 = (citext, citext),
+ FUNCTION 1 citext_hash(citext);
+
+--
+-- Aggregates.
+--
+
+CREATE FUNCTION citext_smaller(citext, citext)
+RETURNS citext
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION citext_larger(citext, citext)
+RETURNS citext
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE AGGREGATE min(citext) (
+ SFUNC = citext_smaller,
+ STYPE = citext,
+ SORTOP = <,
+ PARALLEL = SAFE,
+ COMBINEFUNC = citext_smaller
+);
+
+CREATE AGGREGATE max(citext) (
+ SFUNC = citext_larger,
+ STYPE = citext,
+ SORTOP = >,
+ PARALLEL = SAFE,
+ COMBINEFUNC = citext_larger
+);
+
+--
+-- CITEXT pattern matching.
+--
+
+CREATE FUNCTION texticlike(citext, citext)
+RETURNS bool AS 'texticlike'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION texticnlike(citext, citext)
+RETURNS bool AS 'texticnlike'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION texticregexeq(citext, citext)
+RETURNS bool AS 'texticregexeq'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION texticregexne(citext, citext)
+RETURNS bool AS 'texticregexne'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OPERATOR ~ (
+ PROCEDURE = texticregexeq,
+ LEFTARG = citext,
+ RIGHTARG = citext,
+ NEGATOR = !~,
+ RESTRICT = icregexeqsel,
+ JOIN = icregexeqjoinsel
+);
+
+CREATE OPERATOR ~* (
+ PROCEDURE = texticregexeq,
+ LEFTARG = citext,
+ RIGHTARG = citext,
+ NEGATOR = !~*,
+ RESTRICT = icregexeqsel,
+ JOIN = icregexeqjoinsel
+);
+
+CREATE OPERATOR !~ (
+ PROCEDURE = texticregexne,
+ LEFTARG = citext,
+ RIGHTARG = citext,
+ NEGATOR = ~,
+ RESTRICT = icregexnesel,
+ JOIN = icregexnejoinsel
+);
+
+CREATE OPERATOR !~* (
+ PROCEDURE = texticregexne,
+ LEFTARG = citext,
+ RIGHTARG = citext,
+ NEGATOR = ~*,
+ RESTRICT = icregexnesel,
+ JOIN = icregexnejoinsel
+);
+
+CREATE OPERATOR ~~ (
+ PROCEDURE = texticlike,
+ LEFTARG = citext,
+ RIGHTARG = citext,
+ NEGATOR = !~~,
+ RESTRICT = iclikesel,
+ JOIN = iclikejoinsel
+);
+
+CREATE OPERATOR ~~* (
+ PROCEDURE = texticlike,
+ LEFTARG = citext,
+ RIGHTARG = citext,
+ NEGATOR = !~~*,
+ RESTRICT = iclikesel,
+ JOIN = iclikejoinsel
+);
+
+CREATE OPERATOR !~~ (
+ PROCEDURE = texticnlike,
+ LEFTARG = citext,
+ RIGHTARG = citext,
+ NEGATOR = ~~,
+ RESTRICT = icnlikesel,
+ JOIN = icnlikejoinsel
+);
+
+CREATE OPERATOR !~~* (
+ PROCEDURE = texticnlike,
+ LEFTARG = citext,
+ RIGHTARG = citext,
+ NEGATOR = ~~*,
+ RESTRICT = icnlikesel,
+ JOIN = icnlikejoinsel
+);
+
+--
+-- Matching citext to text.
+--
+
+CREATE FUNCTION texticlike(citext, text)
+RETURNS bool AS 'texticlike'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION texticnlike(citext, text)
+RETURNS bool AS 'texticnlike'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION texticregexeq(citext, text)
+RETURNS bool AS 'texticregexeq'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION texticregexne(citext, text)
+RETURNS bool AS 'texticregexne'
+LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OPERATOR ~ (
+ PROCEDURE = texticregexeq,
+ LEFTARG = citext,
+ RIGHTARG = text,
+ NEGATOR = !~,
+ RESTRICT = icregexeqsel,
+ JOIN = icregexeqjoinsel
+);
+
+CREATE OPERATOR ~* (
+ PROCEDURE = texticregexeq,
+ LEFTARG = citext,
+ RIGHTARG = text,
+ NEGATOR = !~*,
+ RESTRICT = icregexeqsel,
+ JOIN = icregexeqjoinsel
+);
+
+CREATE OPERATOR !~ (
+ PROCEDURE = texticregexne,
+ LEFTARG = citext,
+ RIGHTARG = text,
+ NEGATOR = ~,
+ RESTRICT = icregexnesel,
+ JOIN = icregexnejoinsel
+);
+
+CREATE OPERATOR !~* (
+ PROCEDURE = texticregexne,
+ LEFTARG = citext,
+ RIGHTARG = text,
+ NEGATOR = ~*,
+ RESTRICT = icregexnesel,
+ JOIN = icregexnejoinsel
+);
+
+CREATE OPERATOR ~~ (
+ PROCEDURE = texticlike,
+ LEFTARG = citext,
+ RIGHTARG = text,
+ NEGATOR = !~~,
+ RESTRICT = iclikesel,
+ JOIN = iclikejoinsel
+);
+
+CREATE OPERATOR ~~* (
+ PROCEDURE = texticlike,
+ LEFTARG = citext,
+ RIGHTARG = text,
+ NEGATOR = !~~*,
+ RESTRICT = iclikesel,
+ JOIN = iclikejoinsel
+);
+
+CREATE OPERATOR !~~ (
+ PROCEDURE = texticnlike,
+ LEFTARG = citext,
+ RIGHTARG = text,
+ NEGATOR = ~~,
+ RESTRICT = icnlikesel,
+ JOIN = icnlikejoinsel
+);
+
+CREATE OPERATOR !~~* (
+ PROCEDURE = texticnlike,
+ LEFTARG = citext,
+ RIGHTARG = text,
+ NEGATOR = ~~*,
+ RESTRICT = icnlikesel,
+ JOIN = icnlikejoinsel
+);
+
+--
+-- Matching citext in string comparison functions.
+-- XXX TODO Ideally these would be implemented in C.
+--
+
+CREATE FUNCTION regexp_match( citext, citext ) RETURNS TEXT[] AS $$
+ SELECT pg_catalog.regexp_match( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION regexp_match( citext, citext, text ) RETURNS TEXT[] AS $$
+ SELECT pg_catalog.regexp_match( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION regexp_matches( citext, citext ) RETURNS SETOF TEXT[] AS $$
+ SELECT pg_catalog.regexp_matches( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE ROWS 1;
+
+CREATE FUNCTION regexp_matches( citext, citext, text ) RETURNS SETOF TEXT[] AS $$
+ SELECT pg_catalog.regexp_matches( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE ROWS 10;
+
+CREATE FUNCTION regexp_replace( citext, citext, text ) returns TEXT AS $$
+ SELECT pg_catalog.regexp_replace( $1::pg_catalog.text, $2::pg_catalog.text, $3, 'i');
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION regexp_replace( citext, citext, text, text ) returns TEXT AS $$
+ SELECT pg_catalog.regexp_replace( $1::pg_catalog.text, $2::pg_catalog.text, $3, CASE WHEN pg_catalog.strpos($4, 'c') = 0 THEN $4 || 'i' ELSE $4 END);
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION regexp_split_to_array( citext, citext ) RETURNS TEXT[] AS $$
+ SELECT pg_catalog.regexp_split_to_array( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION regexp_split_to_array( citext, citext, text ) RETURNS TEXT[] AS $$
+ SELECT pg_catalog.regexp_split_to_array( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION regexp_split_to_table( citext, citext ) RETURNS SETOF TEXT AS $$
+ SELECT pg_catalog.regexp_split_to_table( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION regexp_split_to_table( citext, citext, text ) RETURNS SETOF TEXT AS $$
+ SELECT pg_catalog.regexp_split_to_table( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION strpos( citext, citext ) RETURNS INT AS $$
+ SELECT pg_catalog.strpos( pg_catalog.lower( $1::pg_catalog.text ), pg_catalog.lower( $2::pg_catalog.text ) );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION replace( citext, citext, citext ) RETURNS TEXT AS $$
+ SELECT pg_catalog.regexp_replace( $1::pg_catalog.text, pg_catalog.regexp_replace($2::pg_catalog.text, '([^a-zA-Z_0-9])', E'\\\\\\1', 'g'), $3::pg_catalog.text, 'gi' );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION split_part( citext, citext, int ) RETURNS TEXT AS $$
+ SELECT (pg_catalog.regexp_split_to_array( $1::pg_catalog.text, pg_catalog.regexp_replace($2::pg_catalog.text, '([^a-zA-Z_0-9])', E'\\\\\\1', 'g'), 'i'))[$3];
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION translate( citext, citext, text ) RETURNS TEXT AS $$
+ SELECT pg_catalog.translate( pg_catalog.translate( $1::pg_catalog.text, pg_catalog.lower($2::pg_catalog.text), $3), pg_catalog.upper($2::pg_catalog.text), $3);
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
diff --git a/contrib/citext/citext.control b/contrib/citext/citext.control
index 17fce4e..4cd6e09 100644
--- a/contrib/citext/citext.control
+++ b/contrib/citext/citext.control
@@ -1,5 +1,5 @@
# citext extension
comment = 'data type for case-insensitive character strings'
-default_version = '1.4'
+default_version = '1.5'
module_pathname = '$libdir/citext'
relocatable = true
--
2.6.2
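As with hstore, citext--1.5.sql is a verbatim copy of the 1.4 script. One detail worth recalling from the regexp wrappers above: they force case-insensitive matching by appending the 'i' flag unless the caller already passed 'c'. A quick sketch of the resulting behavior (not part of the patch):

SELECT regexp_matches('FOO bar'::citext, 'foo'::citext);       -- one row: flags default to 'i'
SELECT regexp_matches('FOO bar'::citext, 'foo'::citext, 'c');  -- zero rows: 'c' keeps it case-sensitive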
Attachment: 0002-citext-add-extended-hash-function-v1.patch (application/octet-stream)
From 7e3a5330e1059b57e489b802abcc62676fd19ccc Mon Sep 17 00:00:00 2001
From: Amul Sul <sulamul@gmail.com>
Date: Fri, 8 Sep 2017 12:29:02 +0530
Subject: [PATCH 2/2] citext - add extended hash function
---
contrib/citext/citext--1.4--1.5.sql | 8 ++++++++
contrib/citext/citext--1.5.sql | 8 +++++++-
contrib/citext/citext--unpackaged--1.0.sql | 1 +
contrib/citext/citext.c | 20 ++++++++++++++++++++
contrib/citext/expected/citext.out | 12 ++++++++++++
contrib/citext/expected/citext_1.out | 12 ++++++++++++
contrib/citext/sql/citext.sql | 9 +++++++++
7 files changed, 69 insertions(+), 1 deletion(-)
diff --git a/contrib/citext/citext--1.4--1.5.sql b/contrib/citext/citext--1.4--1.5.sql
index d69f7b1..a06c42d 100644
--- a/contrib/citext/citext--1.4--1.5.sql
+++ b/contrib/citext/citext--1.4--1.5.sql
@@ -2,3 +2,11 @@
-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION citext UPDATE TO '1.5'" to load this file. \quit
+
+CREATE FUNCTION citext_hash_extended(citext, int8)
+RETURNS int8
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+ALTER OPERATOR FAMILY citext_ops USING hash ADD
+ FUNCTION 2 citext_hash_extended(citext, int8);
diff --git a/contrib/citext/citext--1.5.sql b/contrib/citext/citext--1.5.sql
index ce984e3..4d7ec4d 100644
--- a/contrib/citext/citext--1.5.sql
+++ b/contrib/citext/citext--1.5.sql
@@ -203,6 +203,11 @@ RETURNS int4
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+CREATE FUNCTION citext_hash_extended(citext, int8)
+RETURNS int8
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
--
-- The btree indexing operator class.
--
@@ -223,7 +228,8 @@ DEFAULT FOR TYPE CITEXT USING btree AS
CREATE OPERATOR CLASS citext_ops
DEFAULT FOR TYPE citext USING hash AS
OPERATOR 1 = (citext, citext),
- FUNCTION 1 citext_hash(citext);
+ FUNCTION 1 citext_hash(citext),
+ FUNCTION 2 citext_hash_extended(citext, int8);
--
-- Aggregates.
diff --git a/contrib/citext/citext--unpackaged--1.0.sql b/contrib/citext/citext--unpackaged--1.0.sql
index ef6d6b0..9b1161b 100644
--- a/contrib/citext/citext--unpackaged--1.0.sql
+++ b/contrib/citext/citext--unpackaged--1.0.sql
@@ -33,6 +33,7 @@ ALTER EXTENSION citext ADD operator <(citext,citext);
ALTER EXTENSION citext ADD operator <=(citext,citext);
ALTER EXTENSION citext ADD function citext_cmp(citext,citext);
ALTER EXTENSION citext ADD function citext_hash(citext);
+ALTER EXTENSION citext ADD function citext_hash_extended(citext,int8);
ALTER EXTENSION citext ADD operator family citext_ops using btree;
ALTER EXTENSION citext ADD operator class citext_ops using btree;
ALTER EXTENSION citext ADD operator family citext_ops using hash;
diff --git a/contrib/citext/citext.c b/contrib/citext/citext.c
index 0ba4782..a959446 100644
--- a/contrib/citext/citext.c
+++ b/contrib/citext/citext.c
@@ -100,6 +100,26 @@ citext_hash(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
+PG_FUNCTION_INFO_V1(citext_hash_extended);
+
+Datum
+citext_hash_extended(PG_FUNCTION_ARGS)
+{
+ text *txt = PG_GETARG_TEXT_PP(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ char *str;
+ Datum result;
+
+ str = str_tolower(VARDATA_ANY(txt), VARSIZE_ANY_EXHDR(txt), DEFAULT_COLLATION_OID);
+ result = hash_any_extended((unsigned char *) str, strlen(str), seed);
+ pfree(str);
+
+ /* Avoid leaking memory for toasted inputs */
+ PG_FREE_IF_COPY(txt, 0);
+
+ PG_RETURN_DATUM(result);
+}
+
/*
* ==================
* OPERATOR FUNCTIONS
diff --git a/contrib/citext/expected/citext.out b/contrib/citext/expected/citext.out
index 9cc94f4..8f03fc9 100644
--- a/contrib/citext/expected/citext.out
+++ b/contrib/citext/expected/citext.out
@@ -222,6 +222,18 @@ SELECT citext_cmp('B'::citext, 'a'::citext) > 0 AS true;
t
(1 row)
+-- Check the citext_hash() and citext_hash_extended() function explicitly.
+SELECT v as value, citext_hash(v)::bit(32) as standard,
+ citext_hash_extended(v, 0)::bit(32) as extended0,
+ citext_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::citext), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'),
+ ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE citext_hash(v)::bit(32) != citext_hash_extended(v, 0)::bit(32)
+ OR citext_hash(v)::bit(32) = citext_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
-- Do some tests using a table and index.
CREATE TEMP TABLE try (
name citext PRIMARY KEY
diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out
index d1fb1e1..b46be4d 100644
--- a/contrib/citext/expected/citext_1.out
+++ b/contrib/citext/expected/citext_1.out
@@ -222,6 +222,18 @@ SELECT citext_cmp('B'::citext, 'a'::citext) > 0 AS true;
t
(1 row)
+-- Check the citext_hash() and citext_hash_extended() function explicitly.
+SELECT v as value, citext_hash(v)::bit(32) as standard,
+ citext_hash_extended(v, 0)::bit(32) as extended0,
+ citext_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::citext), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'),
+ ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE citext_hash(v)::bit(32) != citext_hash_extended(v, 0)::bit(32)
+ OR citext_hash(v)::bit(32) = citext_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
-- Do some tests using a table and index.
CREATE TEMP TABLE try (
name citext PRIMARY KEY
diff --git a/contrib/citext/sql/citext.sql b/contrib/citext/sql/citext.sql
index f70f9eb..e97bb7e 100644
--- a/contrib/citext/sql/citext.sql
+++ b/contrib/citext/sql/citext.sql
@@ -89,6 +89,15 @@ SELECT citext_cmp('aardvark'::citext, 'aardVark'::citext) AS zero;
SELECT citext_cmp('AARDVARK'::citext, 'AARDVARK'::citext) AS zero;
SELECT citext_cmp('B'::citext, 'a'::citext) > 0 AS true;
+-- Check the citext_hash() and citext_hash_extended() function explicitly.
+SELECT v as value, citext_hash(v)::bit(32) as standard,
+ citext_hash_extended(v, 0)::bit(32) as extended0,
+ citext_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::citext), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'),
+ ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE citext_hash(v)::bit(32) != citext_hash_extended(v, 0)::bit(32)
+ OR citext_hash(v)::bit(32) = citext_hash_extended(v, 1)::bit(32);
+
-- Do some tests using a table and index.
CREATE TEMP TABLE try (
--
2.6.2
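Because citext_hash_extended() lowercases its input before hashing (the str_tolower() call in the C code above), values differing only in case hash identically for every seed, just as they compare equal. A quick sanity check, separate from the regression test:

SELECT citext_hash_extended('PostgreSQL'::citext, 42)
     = citext_hash_extended('postgresql'::citext, 42) AS same_hash;
-- expected: t, since both inputs lower-case to the same string before hashing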