64-bit hash function for hstore and citext data type
Hi all,
Commit[1] has added 64-bit hash functions for core data types and in the same
discussion thread[2] Robert Haas suggested to have the similar extended hash
function for hstore and citext data type. Attaching patch proposes the same.
Regards,
Amul
1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=81c5e46c490e2426db243eada186995da5bb0ba7
2] /messages/by-id/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com
Attachments:
hstore-add-extended-hash-function-v1.patchapplication/octet-stream; name=hstore-add-extended-hash-function-v1.patchDownload
From 2a3b3f96df7f52441ad13e574b8ce3f42b71740b Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 26 Sep 2018 05:00:36 -0400
Subject: [PATCH 1/2] hstore - add extended hash function v1
---
contrib/hstore/Makefile | 3 ++-
contrib/hstore/expected/hstore.out | 12 ++++++++++++
contrib/hstore/hstore--1.5--1.6.sql | 12 ++++++++++++
contrib/hstore/hstore--unpackaged--1.0.sql | 1 +
contrib/hstore/hstore.control | 2 +-
contrib/hstore/hstore_op.c | 20 ++++++++++++++++++++
contrib/hstore/sql/hstore.sql | 9 +++++++++
7 files changed, 57 insertions(+), 2 deletions(-)
create mode 100644 contrib/hstore/hstore--1.5--1.6.sql
diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index 46d26f8052..8170e307fe 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -5,7 +5,8 @@ OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
$(WIN32RES)
EXTENSION = hstore
-DATA = hstore--1.4.sql hstore--1.4--1.5.sql \
+DATA = hstore--1.4.sql hstore--1.5--1.6.sql \
+ hstore--1.4--1.5.sql \
hstore--1.3--1.4.sql hstore--1.2--1.3.sql \
hstore--1.1--1.2.sql hstore--1.0--1.1.sql \
hstore--unpackaged--1.0.sql
diff --git a/contrib/hstore/expected/hstore.out b/contrib/hstore/expected/hstore.out
index f0d421602d..4f1db01b3e 100644
--- a/contrib/hstore/expected/hstore.out
+++ b/contrib/hstore/expected/hstore.out
@@ -1515,3 +1515,15 @@ select json_agg(q) from (select f1, hstore_to_json_loose(f2) as f2 from test_jso
{"f1":"rec2","f2":{"b": false, "c": "null", "d": -12345, "e": "012345.6", "f": -1.234, "g": 0.345e-4, "a key": 2}}]
(1 row)
+-- Check the hstore_hash() and hstore_hash_extended() function explicitly.
+SELECT v as value, hstore_hash(v)::bit(32) as standard,
+ hstore_hash_extended(v, 0)::bit(32) as extended0,
+ hstore_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::hstore), (''), ('"a key" =>1'), ('c => null'),
+ ('e => 012345'), ('g => 2.345e+4')) x(v)
+WHERE hstore_hash(v)::bit(32) != hstore_hash_extended(v, 0)::bit(32)
+ OR hstore_hash(v)::bit(32) = hstore_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
diff --git a/contrib/hstore/hstore--1.5--1.6.sql b/contrib/hstore/hstore--1.5--1.6.sql
new file mode 100644
index 0000000000..c5a2bae02f
--- /dev/null
+++ b/contrib/hstore/hstore--1.5--1.6.sql
@@ -0,0 +1,12 @@
+/* contrib/hstore/hstore--1.5--1.6.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION hstore UPDATE TO '1.6'" to load this file. \quit
+
+CREATE FUNCTION hstore_hash_extended(hstore, int8)
+RETURNS int8
+AS 'MODULE_PATHNAME','hstore_hash_extended'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+ALTER OPERATOR FAMILY hash_hstore_ops USING hash ADD
+ FUNCTION 2 hstore_hash_extended(hstore, int8);
diff --git a/contrib/hstore/hstore--unpackaged--1.0.sql b/contrib/hstore/hstore--unpackaged--1.0.sql
index 19a7802805..2128165433 100644
--- a/contrib/hstore/hstore--unpackaged--1.0.sql
+++ b/contrib/hstore/hstore--unpackaged--1.0.sql
@@ -71,6 +71,7 @@ ALTER EXTENSION hstore ADD operator #<=#(hstore,hstore);
ALTER EXTENSION hstore ADD operator family btree_hstore_ops using btree;
ALTER EXTENSION hstore ADD operator class btree_hstore_ops using btree;
ALTER EXTENSION hstore ADD function hstore_hash(hstore);
+ALTER EXTENSION hstore ADD function hstore_hash_extended(hstore,int8);
ALTER EXTENSION hstore ADD operator family hash_hstore_ops using hash;
ALTER EXTENSION hstore ADD operator class hash_hstore_ops using hash;
ALTER EXTENSION hstore ADD type ghstore;
diff --git a/contrib/hstore/hstore.control b/contrib/hstore/hstore.control
index 8a719475b8..93688cdd83 100644
--- a/contrib/hstore/hstore.control
+++ b/contrib/hstore/hstore.control
@@ -1,5 +1,5 @@
# hstore extension
comment = 'data type for storing sets of (key, value) pairs'
-default_version = '1.5'
+default_version = '1.6'
module_pathname = '$libdir/hstore'
relocatable = true
diff --git a/contrib/hstore/hstore_op.c b/contrib/hstore/hstore_op.c
index 8f9277f8da..898324bbe9 100644
--- a/contrib/hstore/hstore_op.c
+++ b/contrib/hstore/hstore_op.c
@@ -1253,3 +1253,23 @@ hstore_hash(PG_FUNCTION_ARGS)
PG_FREE_IF_COPY(hs, 0);
PG_RETURN_DATUM(hval);
}
+
+PG_FUNCTION_INFO_V1(hstore_hash_extended);
+Datum
+hstore_hash_extended(PG_FUNCTION_ARGS)
+{
+ HStore *hs = PG_GETARG_HSTORE_P(0);
+ Datum hval = hash_any_extended((unsigned char *) VARDATA(hs),
+ VARSIZE(hs) - VARHDRSZ,
+ PG_GETARG_INT64(1));
+
+ /* Same approach as hstore_hash */
+ Assert(VARSIZE(hs) ==
+ (HS_COUNT(hs) != 0 ?
+ CALCDATASIZE(HS_COUNT(hs),
+ HSE_ENDPOS(ARRPTR(hs)[2 * HS_COUNT(hs) - 1])) :
+ HSHRDSIZE));
+
+ PG_FREE_IF_COPY(hs, 0);
+ PG_RETURN_DATUM(hval);
+}
diff --git a/contrib/hstore/sql/hstore.sql b/contrib/hstore/sql/hstore.sql
index d64b9f77c7..76ac48b021 100644
--- a/contrib/hstore/sql/hstore.sql
+++ b/contrib/hstore/sql/hstore.sql
@@ -350,3 +350,12 @@ insert into test_json_agg values ('rec1','"a key" =>1, b => t, c => null, d=> 12
('rec2','"a key" =>2, b => f, c => "null", d=> -12345, e => 012345.6, f=> -1.234, g=> 0.345e-4');
select json_agg(q) from test_json_agg q;
select json_agg(q) from (select f1, hstore_to_json_loose(f2) as f2 from test_json_agg) q;
+
+-- Check the hstore_hash() and hstore_hash_extended() function explicitly.
+SELECT v as value, hstore_hash(v)::bit(32) as standard,
+ hstore_hash_extended(v, 0)::bit(32) as extended0,
+ hstore_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::hstore), (''), ('"a key" =>1'), ('c => null'),
+ ('e => 012345'), ('g => 2.345e+4')) x(v)
+WHERE hstore_hash(v)::bit(32) != hstore_hash_extended(v, 0)::bit(32)
+ OR hstore_hash(v)::bit(32) = hstore_hash_extended(v, 1)::bit(32);
--
2.18.0
citext-add-extended-hash-function-v1.patchapplication/octet-stream; name=citext-add-extended-hash-function-v1.patchDownload
From 67231de5dcd1d2f896f941497d44ab14d0d5d352 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 26 Sep 2018 05:32:06 -0400
Subject: [PATCH] citext - add extended hash function v1
---
contrib/citext/Makefile | 3 ++-
contrib/citext/citext--1.5--1.6.sql | 12 ++++++++++++
contrib/citext/citext--unpackaged--1.0.sql | 1 +
contrib/citext/citext.c | 20 ++++++++++++++++++++
contrib/citext/citext.control | 2 +-
contrib/citext/expected/citext.out | 12 ++++++++++++
contrib/citext/expected/citext_1.out | 12 ++++++++++++
contrib/citext/sql/citext.sql | 9 +++++++++
8 files changed, 69 insertions(+), 2 deletions(-)
create mode 100644 contrib/citext/citext--1.5--1.6.sql
diff --git a/contrib/citext/Makefile b/contrib/citext/Makefile
index e32a7de946..144fcdd794 100644
--- a/contrib/citext/Makefile
+++ b/contrib/citext/Makefile
@@ -3,7 +3,8 @@
MODULES = citext
EXTENSION = citext
-DATA = citext--1.4.sql citext--1.4--1.5.sql \
+DATA = citext--1.4.sql citext--1.5--1.6.sql \
+ citext--1.4--1.5.sql \
citext--1.3--1.4.sql \
citext--1.2--1.3.sql citext--1.1--1.2.sql \
citext--1.0--1.1.sql citext--unpackaged--1.0.sql
diff --git a/contrib/citext/citext--1.5--1.6.sql b/contrib/citext/citext--1.5--1.6.sql
new file mode 100644
index 0000000000..32268983ae
--- /dev/null
+++ b/contrib/citext/citext--1.5--1.6.sql
@@ -0,0 +1,12 @@
+/* contrib/citext/citext--1.5--1.6.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION citext UPDATE TO '1.6'" to load this file. \quit
+
+CREATE FUNCTION citext_hash_extended(citext, int8)
+RETURNS int8
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+ALTER OPERATOR FAMILY citext_ops USING hash ADD
+ FUNCTION 2 citext_hash_extended(citext, int8);
diff --git a/contrib/citext/citext--unpackaged--1.0.sql b/contrib/citext/citext--unpackaged--1.0.sql
index ef6d6b0639..9b1161b239 100644
--- a/contrib/citext/citext--unpackaged--1.0.sql
+++ b/contrib/citext/citext--unpackaged--1.0.sql
@@ -33,6 +33,7 @@ ALTER EXTENSION citext ADD operator <(citext,citext);
ALTER EXTENSION citext ADD operator <=(citext,citext);
ALTER EXTENSION citext ADD function citext_cmp(citext,citext);
ALTER EXTENSION citext ADD function citext_hash(citext);
+ALTER EXTENSION citext ADD function citext_hash_extended(citext,int8);
ALTER EXTENSION citext ADD operator family citext_ops using btree;
ALTER EXTENSION citext ADD operator class citext_ops using btree;
ALTER EXTENSION citext ADD operator family citext_ops using hash;
diff --git a/contrib/citext/citext.c b/contrib/citext/citext.c
index 2c0e48e2bc..24ceeb11fc 100644
--- a/contrib/citext/citext.c
+++ b/contrib/citext/citext.c
@@ -153,6 +153,26 @@ citext_hash(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
+PG_FUNCTION_INFO_V1(citext_hash_extended);
+
+Datum
+citext_hash_extended(PG_FUNCTION_ARGS)
+{
+ text *txt = PG_GETARG_TEXT_PP(0);
+ uint64 seed = PG_GETARG_INT64(1);
+ char *str;
+ Datum result;
+
+ str = str_tolower(VARDATA_ANY(txt), VARSIZE_ANY_EXHDR(txt), DEFAULT_COLLATION_OID);
+ result = hash_any_extended((unsigned char *) str, strlen(str), seed);
+ pfree(str);
+
+ /* Avoid leaking memory for toasted inputs */
+ PG_FREE_IF_COPY(txt, 0);
+
+ PG_RETURN_DATUM(result);
+}
+
/*
* ==================
* OPERATOR FUNCTIONS
diff --git a/contrib/citext/citext.control b/contrib/citext/citext.control
index 4cd6e09331..a872a3f012 100644
--- a/contrib/citext/citext.control
+++ b/contrib/citext/citext.control
@@ -1,5 +1,5 @@
# citext extension
comment = 'data type for case-insensitive character strings'
-default_version = '1.5'
+default_version = '1.6'
module_pathname = '$libdir/citext'
relocatable = true
diff --git a/contrib/citext/expected/citext.out b/contrib/citext/expected/citext.out
index 99365c57b0..20890db859 100644
--- a/contrib/citext/expected/citext.out
+++ b/contrib/citext/expected/citext.out
@@ -222,6 +222,18 @@ SELECT citext_cmp('B'::citext, 'a'::citext) > 0 AS true;
t
(1 row)
+-- Check the citext_hash() and citext_hash_extended() function explicitly.
+SELECT v as value, citext_hash(v)::bit(32) as standard,
+ citext_hash_extended(v, 0)::bit(32) as extended0,
+ citext_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::citext), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'),
+ ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE citext_hash(v)::bit(32) != citext_hash_extended(v, 0)::bit(32)
+ OR citext_hash(v)::bit(32) = citext_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
-- Do some tests using a table and index.
CREATE TEMP TABLE try (
name citext PRIMARY KEY
diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out
index 8aac72e226..755baad8e2 100644
--- a/contrib/citext/expected/citext_1.out
+++ b/contrib/citext/expected/citext_1.out
@@ -222,6 +222,18 @@ SELECT citext_cmp('B'::citext, 'a'::citext) > 0 AS true;
t
(1 row)
+-- Check the citext_hash() and citext_hash_extended() function explicitly.
+SELECT v as value, citext_hash(v)::bit(32) as standard,
+ citext_hash_extended(v, 0)::bit(32) as extended0,
+ citext_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::citext), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'),
+ ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE citext_hash(v)::bit(32) != citext_hash_extended(v, 0)::bit(32)
+ OR citext_hash(v)::bit(32) = citext_hash_extended(v, 1)::bit(32);
+ value | standard | extended0 | extended1
+-------+----------+-----------+-----------
+(0 rows)
+
-- Do some tests using a table and index.
CREATE TEMP TABLE try (
name citext PRIMARY KEY
diff --git a/contrib/citext/sql/citext.sql b/contrib/citext/sql/citext.sql
index 2732be436d..0cc909eb52 100644
--- a/contrib/citext/sql/citext.sql
+++ b/contrib/citext/sql/citext.sql
@@ -89,6 +89,15 @@ SELECT citext_cmp('aardvark'::citext, 'aardVark'::citext) AS zero;
SELECT citext_cmp('AARDVARK'::citext, 'AARDVARK'::citext) AS zero;
SELECT citext_cmp('B'::citext, 'a'::citext) > 0 AS true;
+-- Check the citext_hash() and citext_hash_extended() function explicitly.
+SELECT v as value, citext_hash(v)::bit(32) as standard,
+ citext_hash_extended(v, 0)::bit(32) as extended0,
+ citext_hash_extended(v, 1)::bit(32) as extended1
+FROM (VALUES (NULL::citext), ('PostgreSQL'), ('eIpUEtqmY89'), ('AXKEJBTK'),
+ ('muop28x03'), ('yi3nm0d73')) x(v)
+WHERE citext_hash(v)::bit(32) != citext_hash_extended(v, 0)::bit(32)
+ OR citext_hash(v)::bit(32) = citext_hash_extended(v, 1)::bit(32);
+
-- Do some tests using a table and index.
CREATE TEMP TABLE try (
--
2.18.0
On 2018/09/26 11:20, amul sul wrote:
Hi all,
Commit[1] has added 64-bit hash functions for core data types and in the same
discussion thread[2] Robert Haas suggested to have the similar extended hash
function for hstore and citext data type. Attaching patch proposes the same.Regards,
Amul1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=81c5e46c490e2426db243eada186995da5bb0ba7
2] /messages/by-id/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com
I reviewed citext-add-extended-hash-function-v1.patch and
hstore-add-extended-hash-function-v1.patch.
I could patch and test them without trouble and could not find any issues.
I think both patches are well.
Best regards,
On Wed, Nov 21, 2018 at 10:34 AM Hironobu SUZUKI <hironobu@interdb.jp> wrote:
On 2018/09/26 11:20, amul sul wrote:
Hi all,
Commit[1] has added 64-bit hash functions for core data types and in the same
discussion thread[2] Robert Haas suggested to have the similar extended hash
function for hstore and citext data type. Attaching patch proposes the same.Regards,
Amul1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=81c5e46c490e2426db243eada186995da5bb0ba7
2] /messages/by-id/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.comI reviewed citext-add-extended-hash-function-v1.patch and
hstore-add-extended-hash-function-v1.patch.I could patch and test them without trouble and could not find any issues.
Thanks to looking at the patch.
Regards,
Amul
On 9/26/18 12:20 PM, amul sul wrote:
Hi all,
Commit[1] has added 64-bit hash functions for core data types and in the same
discussion thread[2] Robert Haas suggested to have the similar extended hash
function for hstore and citext data type. Attaching patch proposes the same.
I wonder if the hstore hash function is actually correct. I see it
pretty much just computes hash on the varlena representation. The
important question is - can there be two different encodings for the
same hstore value? If that's possible, those two versions would end up
with a different hash value, breaking the hashing scheme.
I'm not very familiar with hstore internals so I don't know if that's
actually possible, but if you look at hstore_cmp, that seems to be far
more complex than just comparing the varlena values directly.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
"Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
Tomas> I wonder if the hstore hash function is actually correct. I see
Tomas> it pretty much just computes hash on the varlena representation.
Tomas> The important question is - can there be two different encodings
Tomas> for the same hstore value?
I was going to say "no", but in fact on closer examination there is an
edge case caused by the fact that hstoreUpgrade allows an _empty_ hstore
from pg_upgraded 8.4 data through without modifying it. (There's also a
vanishingly unlikely case involving the pgfoundry release of hstore-new.)
I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore data
in the wild that anyone wants to hash.
--
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
Tomas> The important question is - can there be two different encodings
Tomas> for the same hstore value?
I was going to say "no", but in fact on closer examination there is an
edge case caused by the fact that hstoreUpgrade allows an _empty_ hstore
from pg_upgraded 8.4 data through without modifying it. (There's also a
vanishingly unlikely case involving the pgfoundry release of hstore-new.)
Ugh. Still, that's a pre-existing problem in hstore_hash, and so I don't
think it's a blocker for this patch.
I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore data
in the wild that anyone wants to hash.
Changing hstoreUpgrade at this point seems like wasted/misguided effort.
I don't doubt that there was a lot of 8.4 hstore data out there, but how
much remains unmigrated? If we're going to take this seriously at all,
my inclination would be to change hstore_hash[_extended] to test for
the empty-hstore case and force the same value it gets for such an
hstore made today.
In the meantime, I went ahead and pushed these patches. The only
non-cosmetic changes I made were to remove the changes in
citext--unpackaged--1.0.sql/hstore--unpackaged--1.0.sql; those
were wrong, because the point of those files is to migrate pre-9.1
databases into the extension system. Such a database would not
contain an extended hash function, and so adding an ALTER EXTENSION
command for that function would cause the script to fail.
regards, tom lane
I wrote:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore data
in the wild that anyone wants to hash.
Changing hstoreUpgrade at this point seems like wasted/misguided effort.
Oh, cancel that --- I was having a momentary brain fade about how that
function is used. I agree your proposal is sensible.
regards, tom lane
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore
data in the wild that anyone wants to hash.
Changing hstoreUpgrade at this point seems like wasted/misguided effort.
Tom> Oh, cancel that --- I was having a momentary brain fade about how
Tom> that function is used. I agree your proposal is sensible.
Here's what I have queued up to push:
--
Andrew (irc:RhodiumToad)
Attachments:
0001-Fix-hstore-hash-function-for-empty-hstores-upgraded-.patchtext/x-patchDownload
From d5890f49da6a77b1325a3f5822c6b092a2cd41ae Mon Sep 17 00:00:00 2001
From: Andrew Gierth <rhodiumtoad@postgresql.org>
Date: Sat, 24 Nov 2018 09:59:49 +0000
Subject: [PATCH] Fix hstore hash function for empty hstores upgraded from 8.4.
Hstore data generated on pg 8.4 and pg_upgraded to current versions
remains in its original on-disk format unless modified. The same goes
for values generated by the addon hstore-new module on pre-9.0
versions. (The hstoreUpgrade function converts old values on the fly
when read in, but the on-disk value is not modified by this.)
Since old-format empty hstores (and hstore-new hstores) have
representations compatible with the new format, hstoreUpgrade thought
it could get away without modifying such values; but this breaks
hstore_hash (and the new hstore_hash_extended) which assumes
bit-perfect matching between semantically identical hstore values.
Only one bit actually differs (the "new version" flag in the count
field) but that of course is enough to break the hash.
Fix by making hstoreUpgrade unconditionally convert all old values to
new format.
Backpatch all the way, even though this changes a hash value in some
cases, because in those cases the hash value is already failing - for
example, a hash join between old- and new-format empty hstores will be
failing to match, or a hash index on an hstore column containing an
old-format empty value will be failing to find the value since it will
be searching for a hash derived from a new-format datum. (There are no
known field reports of this happening, probably because hashing of
hstores has only been useful in limited circumstances and there
probably isn't much upgraded data being used this way.)
Per concerns arising from discussion of commit eb6f29141be. Original
bug is my fault.
Discussion: https://postgr.es/m/60b1fd3b-7332-40f0-7e7f-f2f04f777747%402ndquadrant.com
---
contrib/hstore/hstore_compat.c | 47 +++++++++++++++++-------------------------
1 file changed, 19 insertions(+), 28 deletions(-)
diff --git a/contrib/hstore/hstore_compat.c b/contrib/hstore/hstore_compat.c
index b95ce9b4aa..1d4e7484e4 100644
--- a/contrib/hstore/hstore_compat.c
+++ b/contrib/hstore/hstore_compat.c
@@ -238,34 +238,35 @@ hstoreUpgrade(Datum orig)
HStore *hs = (HStore *) PG_DETOAST_DATUM(orig);
int valid_new;
int valid_old;
- bool writable;
/* Return immediately if no conversion needed */
- if ((hs->size_ & HS_FLAG_NEWVERSION) ||
- hs->size_ == 0 ||
+ if (hs->size_ & HS_FLAG_NEWVERSION)
+ return hs;
+
+ /* Do we have a writable copy? If not, make one. */
+ if ((void *) hs == (void *) DatumGetPointer(orig))
+ hs = (HStore *) PG_DETOAST_DATUM_COPY(orig);
+
+ if (hs->size_ == 0 ||
(VARSIZE(hs) < 32768 && HSE_ISFIRST((ARRPTR(hs)[0]))))
+ {
+ HS_SETCOUNT(hs, HS_COUNT(hs));
+ HS_FIXSIZE(hs, HS_COUNT(hs));
return hs;
+ }
valid_new = hstoreValidNewFormat(hs);
valid_old = hstoreValidOldFormat(hs);
- /* Do we have a writable copy? */
- writable = ((void *) hs != (void *) DatumGetPointer(orig));
if (!valid_old || hs->size_ == 0)
{
if (valid_new)
{
/*
- * force the "new version" flag and the correct varlena length,
- * but only if we have a writable copy already (which we almost
- * always will, since short new-format values won't come through
- * here)
+ * force the "new version" flag and the correct varlena length.
*/
- if (writable)
- {
- HS_SETCOUNT(hs, HS_COUNT(hs));
- HS_FIXSIZE(hs, HS_COUNT(hs));
- }
+ HS_SETCOUNT(hs, HS_COUNT(hs));
+ HS_FIXSIZE(hs, HS_COUNT(hs));
return hs;
}
else
@@ -302,15 +303,10 @@ hstoreUpgrade(Datum orig)
elog(WARNING, "ambiguous hstore value resolved as hstore-new");
/*
- * force the "new version" flag and the correct varlena length, but
- * only if we have a writable copy already (which we almost always
- * will, since short new-format values won't come through here)
+ * force the "new version" flag and the correct varlena length.
*/
- if (writable)
- {
- HS_SETCOUNT(hs, HS_COUNT(hs));
- HS_FIXSIZE(hs, HS_COUNT(hs));
- }
+ HS_SETCOUNT(hs, HS_COUNT(hs));
+ HS_FIXSIZE(hs, HS_COUNT(hs));
return hs;
#else
elog(WARNING, "ambiguous hstore value resolved as hstore-old");
@@ -318,13 +314,8 @@ hstoreUpgrade(Datum orig)
}
/*
- * must have an old-style value. Overwrite it in place as a new-style one,
- * making sure we have a writable copy first.
+ * must have an old-style value. Overwrite it in place as a new-style one.
*/
-
- if (!writable)
- hs = (HStore *) PG_DETOAST_DATUM_COPY(orig);
-
{
int count = hs->size_;
HEntry *new_entries = ARRPTR(hs);
--
2.11.1
Thanks Tom for enhancing & committing these patches.
Regards,
Amul
Show quoted text
On Sat, Nov 24, 2018 at 12:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
Tomas> The important question is - can there be two different encodings
Tomas> for the same hstore value?I was going to say "no", but in fact on closer examination there is an
edge case caused by the fact that hstoreUpgrade allows an _empty_ hstore
from pg_upgraded 8.4 data through without modifying it. (There's also a
vanishingly unlikely case involving the pgfoundry release of hstore-new.)Ugh. Still, that's a pre-existing problem in hstore_hash, and so I don't
think it's a blocker for this patch.I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore data
in the wild that anyone wants to hash.Changing hstoreUpgrade at this point seems like wasted/misguided effort.
I don't doubt that there was a lot of 8.4 hstore data out there, but how
much remains unmigrated? If we're going to take this seriously at all,
my inclination would be to change hstore_hash[_extended] to test for
the empty-hstore case and force the same value it gets for such an
hstore made today.In the meantime, I went ahead and pushed these patches. The only
non-cosmetic changes I made were to remove the changes in
citext--unpackaged--1.0.sql/hstore--unpackaged--1.0.sql; those
were wrong, because the point of those files is to migrate pre-9.1
databases into the extension system. Such a database would not
contain an extended hash function, and so adding an ALTER EXTENSION
command for that function would cause the script to fail.regards, tom lane