[PATCH] btree_gist: fix union implementation for variable length columns
Hi hackers,
while I tried to debug 'gcc -fstack-protector -O3' problems in [1]https://bugzilla.redhat.com/show_bug.cgi?id=1544349 Pavel, I noticed
that gbt_var_union() mistreats the first vector element. Patch is attached.
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1544349 Pavel
Pavel
Attachments:
0001-Fix-btree_gist-union-for-variable-length-types.patchtext/x-patch; charset=UTF-8; name=0001-Fix-btree_gist-union-for-variable-length-types.patchDownload
From 4e4f0fe8d2f74a85f37abdf095ab8aecf9329596 Mon Sep 17 00:00:00 2001
From: Pavel Raiskup <praiskup@redhat.com>
Date: Wed, 4 Jul 2018 11:39:38 +0200
Subject: [PATCH] Fix btree_gist "union" for variable length types
The gbt_var_union() did not expect that first key in the entryvec
could represent leaf node, while it many times is. So try call
f_l2n() even for the first element, same as we do for all the
other elements in gbt_var_bin_union().
This bug could create wrong indexes, so we it's suggested to
REINDEX.
diff --git a/contrib/btree_gist/btree_bit.c b/contrib/btree_gist/btree_bit.c
index 2225244ded..432b788204 100644
*** a/contrib/btree_gist/btree_bit.c
--- b/contrib/btree_gist/btree_bit.c
***************
*** 75,81 **** gbt_bitcmp(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
static bytea *
gbt_bit_xfrm(bytea *leaf)
{
! bytea *out = leaf;
int sz = VARBITBYTES(leaf) + VARHDRSZ;
int padded_sz = INTALIGN(sz);
--- 75,81 ----
static bytea *
gbt_bit_xfrm(bytea *leaf)
{
! bytea *out;
int sz = VARBITBYTES(leaf) + VARHDRSZ;
int padded_sz = INTALIGN(sz);
diff --git a/contrib/btree_gist/btreeindex 670c879e77..6d8299d9b5 100644
*** a/contrib/btree_gist/btree_utils_var.c
--- b/contrib/btree_gist/btree_utils_var.c
***************
*** 230,252 **** gbt_var_node_truncate(const GBT_VARKEY *node, int32 cpf_length, const gbtree_vin
}
void
gbt_var_bin_union(Datum *u, GBT_VARKEY *e, Oid collation,
const gbtree_vinfo *tinfo, FmgrInfo *flinfo)
{
! GBT_VARKEY_R eo = gbt_var_key_readable(e);
GBT_VARKEY_R nr;
- if (eo.lower == eo.upper) /* leaf */
- {
- GBT_VARKEY *tmp;
-
- tmp = gbt_var_leaf2node(e, tinfo, flinfo);
- if (tmp != e)
- eo = gbt_var_key_readable(tmp);
- }
-
if (DatumGetPointer(*u))
{
GBT_VARKEY_R ro = gbt_var_key_readable((GBT_VARKEY *) DatumGetPointer(*u));
--- 230,258 ----
}
+ static GBT_VARKEY_R
+ gbt_var_key_readable_resolved(GBT_VARKEY *key, const gbtree_vinfo *tinfo,
+ FmgrInfo *flinfo)
+ {
+ GBT_VARKEY_R out = gbt_var_key_readable(key);
+ if (out.lower == out.upper) /* leaf */
+ {
+ GBT_VARKEY *tmp;
+ tmp = gbt_var_leaf2node(key, tinfo, flinfo);
+ if (tmp != key)
+ out = gbt_var_key_readable(tmp);
+ }
+ return out;
+ }
+
void
gbt_var_bin_union(Datum *u, GBT_VARKEY *e, Oid collation,
const gbtree_vinfo *tinfo, FmgrInfo *flinfo)
{
! GBT_VARKEY_R eo = gbt_var_key_readable_resolved(e, tinfo, flinfo);
GBT_VARKEY_R nr;
if (DatumGetPointer(*u))
{
GBT_VARKEY_R ro = gbt_var_key_readable((GBT_VARKEY *) DatumGetPointer(*u));
***************
*** 333,339 **** gbt_var_union(const GistEntryVector *entryvec, int32 *size, Oid collation,
*size = sizeof(GBT_VARKEY);
cur = (GBT_VARKEY *) DatumGetPointer(entryvec->vector[0].key);
! rk = gbt_var_key_readable(cur);
out = PointerGetDatum(gbt_var_key_copy(&rk));
for (i = 1; i < numranges; i++)
--- 339,345 ----
*size = sizeof(GBT_VARKEY);
cur = (GBT_VARKEY *) DatumGetPointer(entryvec->vector[0].key);
! rk = gbt_var_key_readable_resolved(cur, tinfo, flinfo);
out = PointerGetDatum(gbt_var_key_copy(&rk));
for (i = 1; i < numranges; i++)
Pavel Raiskup <praiskup@redhat.com> writes:
while I tried to debug 'gcc -fstack-protector -O3' problems in [1], I noticed
that gbt_var_union() mistreats the first vector element. Patch is attached.
Hi Pavel! For patches that purport to resolve bugs, we usually like to
add a regression test case that demonstrates the bug in unpatched code.
Can you provide a small test case that does so? (The BZ you pointed to
doesn't seem to address this...)
regards, tom lane
Hi Tom,
On Monday, July 9, 2018 7:41:59 PM CEST Tom Lane wrote:
Pavel Raiskup <praiskup@redhat.com> writes:
while I tried to debug 'gcc -fstack-protector -O3' problems in [1], I noticed
that gbt_var_union() mistreats the first vector element. Patch is attached.Hi Pavel! For patches that purport to resolve bugs, we usually like to
add a regression test case that demonstrates the bug in unpatched code.
Can you provide a small test case that does so? (The BZ you pointed to
doesn't seem to address this...)
I've been looking at the reproducer for a while now... and I probably
need a hint if that's necessary.
Turns out the problem is only related to bit/bit varying type (so the
patch comments need to be reworded properly, at least) since those are the
only types which have implemented the f_l2n() callback.
Considering testcase 'bit.sql' (bit(33) type), it's pretty clear that on
little endian boxes the problematic strings would be '00100001*' (int32
value for '33', because there's varbit header). Big endian boxes would
have problems with char[] like {0, 0, 0, 33, ...}. But I fail to
construct right order of correctly picked elements into 'bit.data'.
The problem probably is (a) that picksplit reorders the elements very
often, and probably that (b) random() brings non-determinism into the
final tree shape (if the reproducer was to be based on many equal keys in
the index). It's amazing to see how the index fixes itself :-) for this
bug.
Can you help me with the reproducer idea, resp. can we live without the
reproducer in this particular case?
Pavel
Pavel Raiskup <praiskup@redhat.com> writes:
On Monday, July 9, 2018 7:41:59 PM CEST Tom Lane wrote:
Hi Pavel! For patches that purport to resolve bugs, we usually like to
add a regression test case that demonstrates the bug in unpatched code.
Can you provide a small test case that does so? (The BZ you pointed to
doesn't seem to address this...)
Turns out the problem is only related to bit/bit varying type (so the
patch comments need to be reworded properly, at least) since those are the
only types which have implemented the f_l2n() callback.
What I'm failing to wrap my head around is why this code exists at all.
AFAICS, gbt_bit_xfrm just forces the bitstring to be zero-padded out to
an INTALIGN boundary, which it wouldn't necessarily be otherwise. But
why bother? That should have no effect on the behavior of bit_cmp().
So I'm speculating that the reason nobody has noticed a problem is that
there is no problem because this code is useless and could be ripped out.
It would be easier to figure this out if the btree_gist code weren't
so desperately undocumented. Teodor, do you remember why it's like
this?
regards, tom lane
On Wednesday, July 11, 2018 7:26:40 PM CEST Tom Lane wrote:
Pavel Raiskup <praiskup@redhat.com> writes:
On Monday, July 9, 2018 7:41:59 PM CEST Tom Lane wrote:
Hi Pavel! For patches that purport to resolve bugs, we usually like to
add a regression test case that demonstrates the bug in unpatched code.
Can you provide a small test case that does so? (The BZ you pointed to
doesn't seem to address this...)Turns out the problem is only related to bit/bit varying type (so the
patch comments need to be reworded properly, at least) since those are the
only types which have implemented the f_l2n() callback.What I'm failing to wrap my head around is why this code exists at all.
AFAICS, gbt_bit_xfrm just forces the bitstring to be zero-padded out to
an INTALIGN boundary, which it wouldn't necessarily be otherwise.
But why bother? That should have no effect on the behavior of bit_cmp().
The gbt_bit_xfrm() method actually also skips the varbit header (number of
bits, stored in VarBit.bit_len), the magic is done by VARBITS macro. If
the header is dropped, it's OK to just use existing byteacmp() for bit
array comparison.
Pavel
It would be easier to figure this out if the btree_gist code weren't
so desperately undocumented. Teodor, do you remember why it's like
this?
Will look.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Thursday, July 12, 2018 5:53:18 PM CET Teodor Sigaev wrote:
It would be easier to figure this out if the btree_gist code weren't
so desperately undocumented. Teodor, do you remember why it's like
this?Will look.
Ping on this issue. I guess the patch I proposed isn't wrong in this
case, I'm just not sure about the commit message.
Pavel