64-bit hash function for hstore and citext data type

Started by Amul Sulover 7 years ago9 messageshackers
Jump to latest
#1Amul Sul
sulamul@gmail.com

Hi all,

Commit[1] has added 64-bit hash functions for core data types and in the same
discussion thread[2] Robert Haas suggested to have the similar extended hash
function for hstore and citext data type. Attaching patch proposes the same.

Regards,
Amul

1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=81c5e46c490e2426db243eada186995da5bb0ba7
2] /messages/by-id/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com

Attachments:

hstore-add-extended-hash-function-v1.patchapplication/octet-stream; name=hstore-add-extended-hash-function-v1.patchDownload+57-3
citext-add-extended-hash-function-v1.patchapplication/octet-stream; name=citext-add-extended-hash-function-v1.patchDownload+69-3
#2Hironobu SUZUKI
hironobu@interdb.jp
In reply to: Amul Sul (#1)
Re: 64-bit hash function for hstore and citext data type

On 2018/09/26 11:20, amul sul wrote:

Hi all,

Commit[1] has added 64-bit hash functions for core data types and in the same
discussion thread[2] Robert Haas suggested to have the similar extended hash
function for hstore and citext data type. Attaching patch proposes the same.

Regards,
Amul

1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=81c5e46c490e2426db243eada186995da5bb0ba7
2] /messages/by-id/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com

I reviewed citext-add-extended-hash-function-v1.patch and
hstore-add-extended-hash-function-v1.patch.

I could patch and test them without trouble and could not find any issues.

I think both patches are well.

Best regards,

#3Amul Sul
sulamul@gmail.com
In reply to: Hironobu SUZUKI (#2)
Re: 64-bit hash function for hstore and citext data type

On Wed, Nov 21, 2018 at 10:34 AM Hironobu SUZUKI <hironobu@interdb.jp> wrote:

On 2018/09/26 11:20, amul sul wrote:

Hi all,

Commit[1] has added 64-bit hash functions for core data types and in the same
discussion thread[2] Robert Haas suggested to have the similar extended hash
function for hstore and citext data type. Attaching patch proposes the same.

Regards,
Amul

1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=81c5e46c490e2426db243eada186995da5bb0ba7
2] /messages/by-id/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com

I reviewed citext-add-extended-hash-function-v1.patch and
hstore-add-extended-hash-function-v1.patch.

I could patch and test them without trouble and could not find any issues.

Thanks to looking at the patch.

Regards,
Amul

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Amul Sul (#1)
Re: 64-bit hash function for hstore and citext data type

On 9/26/18 12:20 PM, amul sul wrote:

Hi all,

Commit[1] has added 64-bit hash functions for core data types and in the same
discussion thread[2] Robert Haas suggested to have the similar extended hash
function for hstore and citext data type. Attaching patch proposes the same.

I wonder if the hstore hash function is actually correct. I see it
pretty much just computes hash on the varlena representation. The
important question is - can there be two different encodings for the
same hstore value? If that's possible, those two versions would end up
with a different hash value, breaking the hashing scheme.

I'm not very familiar with hstore internals so I don't know if that's
actually possible, but if you look at hstore_cmp, that seems to be far
more complex than just comparing the varlena values directly.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tomas Vondra (#4)
Re: 64-bit hash function for hstore and citext data type

"Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

Tomas> I wonder if the hstore hash function is actually correct. I see
Tomas> it pretty much just computes hash on the varlena representation.
Tomas> The important question is - can there be two different encodings
Tomas> for the same hstore value?

I was going to say "no", but in fact on closer examination there is an
edge case caused by the fact that hstoreUpgrade allows an _empty_ hstore
from pg_upgraded 8.4 data through without modifying it. (There's also a
vanishingly unlikely case involving the pgfoundry release of hstore-new.)

I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore data
in the wild that anyone wants to hash.

--
Andrew (irc:RhodiumToad)

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#5)
Re: 64-bit hash function for hstore and citext data type

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
Tomas> The important question is - can there be two different encodings
Tomas> for the same hstore value?

I was going to say "no", but in fact on closer examination there is an
edge case caused by the fact that hstoreUpgrade allows an _empty_ hstore
from pg_upgraded 8.4 data through without modifying it. (There's also a
vanishingly unlikely case involving the pgfoundry release of hstore-new.)

Ugh. Still, that's a pre-existing problem in hstore_hash, and so I don't
think it's a blocker for this patch.

I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore data
in the wild that anyone wants to hash.

Changing hstoreUpgrade at this point seems like wasted/misguided effort.
I don't doubt that there was a lot of 8.4 hstore data out there, but how
much remains unmigrated? If we're going to take this seriously at all,
my inclination would be to change hstore_hash[_extended] to test for
the empty-hstore case and force the same value it gets for such an
hstore made today.

In the meantime, I went ahead and pushed these patches. The only
non-cosmetic changes I made were to remove the changes in
citext--unpackaged--1.0.sql/hstore--unpackaged--1.0.sql; those
were wrong, because the point of those files is to migrate pre-9.1
databases into the extension system. Such a database would not
contain an extended hash function, and so adding an ALTER EXTENSION
command for that function would cause the script to fail.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: 64-bit hash function for hstore and citext data type

I wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore data
in the wild that anyone wants to hash.

Changing hstoreUpgrade at this point seems like wasted/misguided effort.

Oh, cancel that --- I was having a momentary brain fade about how that
function is used. I agree your proposal is sensible.

regards, tom lane

#8Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#7)
Re: 64-bit hash function for hstore and citext data type

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore
data in the wild that anyone wants to hash.

Changing hstoreUpgrade at this point seems like wasted/misguided effort.

Tom> Oh, cancel that --- I was having a momentary brain fade about how
Tom> that function is used. I agree your proposal is sensible.

Here's what I have queued up to push:

--
Andrew (irc:RhodiumToad)

Attachments:

0001-Fix-hstore-hash-function-for-empty-hstores-upgraded-.patchtext/x-patchDownload+19-29
#9Amul Sul
sulamul@gmail.com
In reply to: Tom Lane (#6)
Re: 64-bit hash function for hstore and citext data type

Thanks Tom for enhancing & committing these patches.

Regards,
Amul

Show quoted text

On Sat, Nov 24, 2018 at 12:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
Tomas> The important question is - can there be two different encodings
Tomas> for the same hstore value?

I was going to say "no", but in fact on closer examination there is an
edge case caused by the fact that hstoreUpgrade allows an _empty_ hstore
from pg_upgraded 8.4 data through without modifying it. (There's also a
vanishingly unlikely case involving the pgfoundry release of hstore-new.)

Ugh. Still, that's a pre-existing problem in hstore_hash, and so I don't
think it's a blocker for this patch.

I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore data
in the wild that anyone wants to hash.

Changing hstoreUpgrade at this point seems like wasted/misguided effort.
I don't doubt that there was a lot of 8.4 hstore data out there, but how
much remains unmigrated? If we're going to take this seriously at all,
my inclination would be to change hstore_hash[_extended] to test for
the empty-hstore case and force the same value it gets for such an
hstore made today.

In the meantime, I went ahead and pushed these patches. The only
non-cosmetic changes I made were to remove the changes in
citext--unpackaged--1.0.sql/hstore--unpackaged--1.0.sql; those
were wrong, because the point of those files is to migrate pre-9.1
databases into the extension system. Such a database would not
contain an extended hash function, and so adding an ALTER EXTENSION
command for that function would cause the script to fail.

regards, tom lane