[PATCH] btree_gist: fix union implementation for variable length columns
Hi hackers,
while I tried to debug 'gcc -fstack-protector -O3' problems in [1]https://bugzilla.redhat.com/show_bug.cgi?id=1544349 Pavel, I noticed
that gbt_var_union() mistreats the first vector element. Patch is attached.
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1544349 Pavel
Pavel
Attachments:
0001-Fix-btree_gist-union-for-variable-length-types.patchtext/x-patch; charset=UTF-8; name=0001-Fix-btree_gist-union-for-variable-length-types.patchDownload+21-15
Pavel Raiskup <praiskup@redhat.com> writes:
while I tried to debug 'gcc -fstack-protector -O3' problems in [1], I noticed
that gbt_var_union() mistreats the first vector element. Patch is attached.
Hi Pavel! For patches that purport to resolve bugs, we usually like to
add a regression test case that demonstrates the bug in unpatched code.
Can you provide a small test case that does so? (The BZ you pointed to
doesn't seem to address this...)
regards, tom lane
Hi Tom,
On Monday, July 9, 2018 7:41:59 PM CEST Tom Lane wrote:
Pavel Raiskup <praiskup@redhat.com> writes:
while I tried to debug 'gcc -fstack-protector -O3' problems in [1], I noticed
that gbt_var_union() mistreats the first vector element. Patch is attached.Hi Pavel! For patches that purport to resolve bugs, we usually like to
add a regression test case that demonstrates the bug in unpatched code.
Can you provide a small test case that does so? (The BZ you pointed to
doesn't seem to address this...)
I've been looking at the reproducer for a while now... and I probably
need a hint if that's necessary.
Turns out the problem is only related to bit/bit varying type (so the
patch comments need to be reworded properly, at least) since those are the
only types which have implemented the f_l2n() callback.
Considering testcase 'bit.sql' (bit(33) type), it's pretty clear that on
little endian boxes the problematic strings would be '00100001*' (int32
value for '33', because there's varbit header). Big endian boxes would
have problems with char[] like {0, 0, 0, 33, ...}. But I fail to
construct right order of correctly picked elements into 'bit.data'.
The problem probably is (a) that picksplit reorders the elements very
often, and probably that (b) random() brings non-determinism into the
final tree shape (if the reproducer was to be based on many equal keys in
the index). It's amazing to see how the index fixes itself :-) for this
bug.
Can you help me with the reproducer idea, resp. can we live without the
reproducer in this particular case?
Pavel
Pavel Raiskup <praiskup@redhat.com> writes:
On Monday, July 9, 2018 7:41:59 PM CEST Tom Lane wrote:
Hi Pavel! For patches that purport to resolve bugs, we usually like to
add a regression test case that demonstrates the bug in unpatched code.
Can you provide a small test case that does so? (The BZ you pointed to
doesn't seem to address this...)
Turns out the problem is only related to bit/bit varying type (so the
patch comments need to be reworded properly, at least) since those are the
only types which have implemented the f_l2n() callback.
What I'm failing to wrap my head around is why this code exists at all.
AFAICS, gbt_bit_xfrm just forces the bitstring to be zero-padded out to
an INTALIGN boundary, which it wouldn't necessarily be otherwise. But
why bother? That should have no effect on the behavior of bit_cmp().
So I'm speculating that the reason nobody has noticed a problem is that
there is no problem because this code is useless and could be ripped out.
It would be easier to figure this out if the btree_gist code weren't
so desperately undocumented. Teodor, do you remember why it's like
this?
regards, tom lane
On Wednesday, July 11, 2018 7:26:40 PM CEST Tom Lane wrote:
Pavel Raiskup <praiskup@redhat.com> writes:
On Monday, July 9, 2018 7:41:59 PM CEST Tom Lane wrote:
Hi Pavel! For patches that purport to resolve bugs, we usually like to
add a regression test case that demonstrates the bug in unpatched code.
Can you provide a small test case that does so? (The BZ you pointed to
doesn't seem to address this...)Turns out the problem is only related to bit/bit varying type (so the
patch comments need to be reworded properly, at least) since those are the
only types which have implemented the f_l2n() callback.What I'm failing to wrap my head around is why this code exists at all.
AFAICS, gbt_bit_xfrm just forces the bitstring to be zero-padded out to
an INTALIGN boundary, which it wouldn't necessarily be otherwise.
But why bother? That should have no effect on the behavior of bit_cmp().
The gbt_bit_xfrm() method actually also skips the varbit header (number of
bits, stored in VarBit.bit_len), the magic is done by VARBITS macro. If
the header is dropped, it's OK to just use existing byteacmp() for bit
array comparison.
Pavel
It would be easier to figure this out if the btree_gist code weren't
so desperately undocumented. Teodor, do you remember why it's like
this?
Will look.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Thursday, July 12, 2018 5:53:18 PM CET Teodor Sigaev wrote:
It would be easier to figure this out if the btree_gist code weren't
so desperately undocumented. Teodor, do you remember why it's like
this?Will look.
Ping on this issue. I guess the patch I proposed isn't wrong in this
case, I'm just not sure about the commit message.
Pavel