Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

Started by Marc-Olaf Jaschke over 10 years ago · 77 messages · hackers, bugs

Marc-Olaf Jaschke

marc-olaf.jaschke@s24.com

over 10 years ago

hackersbugs

Hi,

PostgreSQL 9.5 ignores rows with the following test case:

=========================

\l+
…
Encoding | Collate | Ctype
UTF8 | de_DE.UTF-8 | de_DE.UTF-8
...

create table test (t) as values ('eai'), ('e aí');

select * from test where t = 'eai';
t
-----
eai
(1 row)

create index on test(t);

set enable_seqscan = false;

select * from test where t = 'eai';
t
---
(0 rows)

select t from test where t = 'eai' collate "C";
t
-----
eai
(1 row)

alter table test alter column t type text collate "C";
select * from test where t = 'eai';
t
-----
eai
(1 row)

alter table test alter column t type text collate "de_DE.utf8";
select * from test where t = 'eai';
t
---
(0 rows)

set enable_seqscan = true;

select * from test where t = 'eai';
t
-----
eai
(1 row)

=========================

I was able to reproduce this with

cat /etc/debian_version
6.0.1
PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
/lib/libc.so.6 > GNU C Library (Debian EGLIBC 2.11.3-3) stable release version 2.11.3, by Roland McGrath et al.

CentOS release 6.7 (Final)
PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
ldd --version
ldd (GNU libc) 2.12

I was not able to reproduce this with

OSX (10.11.3 (15D21))
PostgreSQL 9.5alpha1 on x86_64-apple-darwin14.3.0, compiled by Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn), 64-bit

OSX (10.11.3 (15D21))
PostgreSQL 9.5.1 on x86_64-apple-darwin14.5.0, compiled by Apple LLVM version 7.0.0 (clang-700.1.76), 64-bit

Ubuntu 12.04.5 LTS
PostgreSQL 9.3.11 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
ldd --version
ldd (Ubuntu EGLIBC 2.15-0ubuntu10.13) 2.15

CentOS release 6.7 (Final)
PostgreSQL 9.4.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
ldd --version
ldd (GNU libc) 2.12

Red Hat Enterprise Linux Server release 7.2 (Maipo)
PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
ldd --version
ldd (GNU libc) 2.17

Best regards,
Marc-Olaf Jaschke

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Marc-Olaf Jaschke (#1)

Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> writes:

> PostgreSQL 9.5 ignores rows with the following test case:

I can reproduce this in 9.5 and HEAD on RHEL6, but 9.4 works as expected.
I presume that that points the finger at the abbreviated-keys work.

BTW, what I'm seeing in 9.5/HEAD is that all three comparison senses fail:

u8=# set enable_seqscan TO 0;
SET
u8=# select * from test where t < 'eai';
t
---
(0 rows)

u8=# select * from test where t = 'eai';
t
---
(0 rows)

u8=# select * from test where t > 'eai';
t
---
(0 rows)
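The fact that `<`, `=` and `>` all come back empty fits a simple picture: the index entries were put in order by one comparison rule and are being searched with another. The toy model below is a sketch under that assumption, not PostgreSQL's btree code. `find_sorted()` and `reversed_strcmp()` are hypothetical names, and the reversed comparator only stands in for whatever disagreement exists between the build-time order and the lookup comparator.

```c
#include <assert.h>
#include <string.h>

/* Comparator signature shared by strcmp() and the stand-in below. */
typedef int (*cmp_fn)(const char *, const char *);

/*
 * Hypothetical helper: binary search for key in items[0..n), which must
 * already be sorted according to cmp. Returns the index of a match, or -1.
 */
long find_sorted(const char *const *items, size_t n, const char *key,
                 cmp_fn cmp)
{
    size_t lo = 0, hi = n;

    assert(items != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int r = cmp(key, items[mid]);

        if (r == 0)
            return (long) mid;
        if (r < 0)
            hi = mid;           /* key sorts before items[mid] */
        else
            lo = mid + 1;       /* key sorts after items[mid] */
    }
    return -1;                  /* the search path never met the key */
}

/* Stand-in for a lookup comparator that disagrees with the build order. */
int reversed_strcmp(const char *a, const char *b)
{
    return strcmp(b, a);
}
```

With two entries the search probes only `items[1]`; once the comparator's answer for that probe flips, the key stored in `items[0]` can never be reached. Going by the strxfrm() blobs Tom posts later in the thread, the two blobs already differ in their seventh byte (`08` against `09`), so a sort driven by 8-byte abbreviated keys would presumably place 'eai' before 'e aí', while strcoll() orders them the other way round.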

regards, tom lane

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#2)

On Mon, Mar 21, 2016 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> writes:

>> PostgreSQL 9.5 ignores rows with the following test case:

> I can reproduce this in 9.5 and HEAD on RHEL6, but 9.4 works as expected.
> I presume that that points the finger at the abbreviated-keys work.

> BTW, what I'm seeing in 9.5/HEAD is that all three comparison senses fail:

> u8=# set enable_seqscan TO 0;
> SET
> u8=# select * from test where t < 'eai';
> t
> ---
> (0 rows)

> u8=# select * from test where t = 'eai';
> t
> ---
> (0 rows)

> u8=# select * from test where t > 'eai';
> t
> ---
> (0 rows)

This could plausibly be a consequence of the abbreviated keys work if
strxfrm() and strcoll() return inconsistent results for those strings
for the same locale (say, one says +1 and the other says -1 given
those inputs). I don't have a RHEL6 system handy to test whether that
might be the case here.
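Robert's condition can be tested directly for a single pair of strings. A minimal sketch, assuming only the ISO C `strxfrm()`, `strcmp()` and `strcoll()` functions; `xfrm_agrees_with_coll()` and its helpers are hypothetical names. Results are reduced to their sign because `strcoll()` may return any positive or negative value rather than exactly ±1.

```c
#include <assert.h>
#include <locale.h>   /* setlocale() selects the LC_COLLATE used below */
#include <stdlib.h>
#include <string.h>

/* Collapse any comparison result to -1, 0 or +1. */
static int sign(int r)
{
    return (r > 0) - (r < 0);
}

/* Return a malloc'd strxfrm() blob for s under the current LC_COLLATE. */
static char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0);  /* blob length, excluding NUL */
    char *buf = malloc(need + 1);

    if (buf != NULL)
        strxfrm(buf, s, need + 1);
    return buf;
}

/*
 * Hypothetical check: 1 if strcmp() on the strxfrm() blobs agrees in sign
 * with strcoll() on the original strings, 0 if not, -1 on out-of-memory.
 */
int xfrm_agrees_with_coll(const char *a, const char *b)
{
    char *xa, *xb;
    int result = -1;

    assert(a != NULL && b != NULL);
    xa = xfrm_dup(a);
    xb = xfrm_dup(b);
    if (xa != NULL && xb != NULL)
        result = sign(strcmp(xa, xb)) == sign(strcoll(a, b));
    free(xa);
    free(xb);
    return result;
}
```

Call `setlocale(LC_COLLATE, ...)` with the locale under suspicion before checking a pair. In the `"C"` locale the blob is typically just a copy of the string, so every pair agrees there.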

If that is the case, I'd argue that's a glibc problem, not our
problem. Of course, we could provide an option to disable abbreviated
keys for the benefit of people who need to work around buggy libc
implementations.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Robert Haas (#3)

On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> If that is the case, I'd argue that's a glibc problem, not our
> problem. Of course, we could provide an option to disable abbreviated
> keys for the benefit of people who need to work around buggy libc
> implementations.

Conferred with Robert. This is my first suspicion. More in a little while.

--
Peter Geoghegan

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Marc-Olaf Jaschke (#1)

On Mon, Mar 21, 2016 at 1:40 PM, Marc-Olaf Jaschke
<marc-olaf.jaschke@s24.com> wrote:

> PostgreSQL 9.5 ignores rows with the following test case:

At one point, Robert wrote a small self-contained tool to show OS
strxfrm() blobs:

/messages/by-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

It would be great if you showed us the output for your test case
strings, both on an affected and on an unaffected system. As Robert
mentioned, our use of strxfrm() quite reasonably relies on it
producing blobs that compare with strcmp() in a way that gives the
same result as a strcoll() on the original strings, per ISO C90.
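Robert's tool is only linked here. As a rough, hypothetical stand-in (not his program), the core of such a dumper might look like this: ask `strxfrm()` for the blob length first, then transform into a buffer of exactly that size and render it as hex, so no fixed maximum blob size is assumed. `xfrm_hex()` is an illustrative name.

```c
#include <assert.h>
#include <locale.h>   /* setlocale() selects the LC_COLLATE used below */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a strxfrm() dumper: returns a malloc'd
 * lowercase hex rendering of the strxfrm() blob for s (current
 * LC_COLLATE) and stores the blob length in *nbytes. NULL on OOM.
 */
char *xfrm_hex(const char *s, size_t *nbytes)
{
    size_t n;
    unsigned char *blob;
    char *hex;

    assert(s != NULL && nbytes != NULL);
    n = strxfrm(NULL, s, 0);            /* ask for the size first */
    blob = malloc(n + 1);
    hex = malloc(2 * n + 1);
    if (blob == NULL || hex == NULL) {
        free(blob);
        free(hex);
        return NULL;
    }
    strxfrm((char *) blob, s, n + 1);
    for (size_t i = 0; i < n; i++)
        sprintf(hex + 2 * i, "%02x", blob[i]);  /* two digits per byte */
    hex[2 * n] = '\0';                          /* also covers n == 0 */
    free(blob);
    *nbytes = n;
    return hex;
}
```

A caller could then print `"%s" -> %s (%zu bytes)` with the input string, the hex text and the length to get lines in the same shape as the output posted further down the thread.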

Thanks
--
Peter Geoghegan

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Robert Haas (#3)

On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> If that is the case, I'd argue that's a glibc problem, not our
> problem. Of course, we could provide an option to disable abbreviated
> keys for the benefit of people who need to work around buggy libc
> implementations.

That would be an easy patch to write. We'd simply have a test within
bttextsortsupport() that had systems that disabled abbreviated keys
for text PG_RETURN_VOID(). Actually, to be more precise we'd put that
next to the Windows code within varstr_sortsupport() (the function is
called btsortsupport_worker in 9.5). It would look at a GUC, I
suppose.

--
Peter Geoghegan

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Marc-Olaf Jaschke (#1)

On Mon, Mar 21, 2016 at 1:40 PM, Marc-Olaf Jaschke
<marc-olaf.jaschke@s24.com> wrote:

> I was able to reproduce this with

> cat /etc/debian_version
> 6.0.1
> PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
> /lib/libc.so.6 > GNU C Library (Debian EGLIBC 2.11.3-3) stable release version 2.11.3, by Roland McGrath et al.

> CentOS release 6.7 (Final)
> PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
> ldd --version
> ldd (GNU libc) 2.12

I found this fairly recent bug report concerning glibc's strxfrm():

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803927

I'm not certain that this is the problem, but it's a good theory. Note
that this particular message talks about your exact affected version
of eglibc (eglibc-2.11.3):

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803927#27

Even if it isn't this exact issue, I have a really hard time imagining
that this is not a bug in the relevant Glibc versions. Abbreviated
keys are fundamentally a fairly simple idea, and it's hard to think of
any other possible explanation.
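A simplified model of the abbreviated-keys idea, written as a sketch and not taken from PostgreSQL: compare a fixed-size prefix of the `strxfrm()` blob as an integer, and call `strcoll()` only when the prefixes tie. The 8-byte key width, the function names and the `use_abbrev` flag are assumptions for illustration. The shortcut is only valid if `strcmp()` on the blobs always agrees with `strcoll()`, which is exactly the property in question here.

```c
#include <assert.h>
#include <locale.h>   /* setlocale() selects the LC_COLLATE used below */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical abbreviated key: the first 8 bytes of the strxfrm() blob,
 * packed big-endian into an integer and zero-padded. Integer comparison of
 * two keys then matches strcmp() on the blobs' first 8 bytes.
 */
uint64_t abbrev_key(const char *s)
{
    size_t n = strxfrm(NULL, s, 0);
    unsigned char *blob = malloc(n + 1);
    uint64_t key = 0;

    if (blob == NULL)
        abort();                          /* keep the sketch short */
    strxfrm((char *) blob, s, n + 1);
    for (size_t i = 0; i < 8; i++)
        key = (key << 8) | (i < n ? blob[i] : 0);
    free(blob);
    return key;
}

/*
 * Compare cheaply on abbreviated keys; only on a tie fall back to the
 * authoritative strcoll(). use_abbrev == 0 skips the shortcut, roughly
 * like the on/off switch discussed in the thread.
 */
int abbrev_cmp(const char *a, const char *b, int use_abbrev)
{
    assert(a != NULL && b != NULL);
    if (use_abbrev) {
        uint64_t ka = abbrev_key(a);
        uint64_t kb = abbrev_key(b);

        if (ka != kb)
            return ka < kb ? -1 : 1;      /* decided without strcoll() */
    }
    return strcoll(a, b);
}
```

A real sort would presumably compute each abbreviated key once per input value rather than on every comparison, as this sketch does for brevity.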

We'll know more when we see those strxfrm() blobs, from the tool I linked to.

--
Peter Geoghegan

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Peter Geoghegan (#6)

On Mon, Mar 21, 2016 at 5:44 PM, Peter Geoghegan <pg@heroku.com> wrote:

> On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:

>> If that is the case, I'd argue that's a glibc problem, not our
>> problem. Of course, we could provide an option to disable abbreviated
>> keys for the benefit of people who need to work around buggy libc
>> implementations.

> That would be an easy patch to write. We'd simply have a test within
> bttextsortsupport() that had systems that disabled abbreviated keys
> for text PG_RETURN_VOID(). Actually, to be more precise we'd put that
> next to the Windows code within varstr_sortsupport() (the function is
> called btsortsupport_worker in 9.5). It would look at a GUC, I
> suppose.

Actually, I suppose it isn't quite that simple, because abbreviated
keys did not introduce the use of strxfrm() by Postgres. That happened
much sooner. I guess we'd have to think about convert_string_datum(),
too.

Maybe we can write a test-case that lets check_strxfrm_bug() detect
this issue, which would be ideal. But, again, I need to see what's
going on with strxfrm() on affected systems before I can do anything.
Don't have one of my own close at hand.

--
Peter Geoghegan

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Peter Geoghegan (#8)

Peter Geoghegan <pg@heroku.com> writes:

> On Mon, Mar 21, 2016 at 5:44 PM, Peter Geoghegan <pg@heroku.com> wrote:

>> On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:

>>> If that is the case, I'd argue that's a glibc problem, not our
>>> problem. Of course, we could provide an option to disable abbreviated
>>> keys for the benefit of people who need to work around buggy libc
>>> implementations.

FWIW, I do not think you can dismiss it as "not our bug" if a large
fraction of existing glibc installations share the issue. It might
be a glibc bug, but we'll have to find a workaround.

> Maybe we can write a test-case that lets check_strxfrm_bug() detect
> this issue, which would be ideal. But, again, I need to see what's
> going on with strxfrm() on affected systems before I can do anything.

Happy to test if you can provide a test case.

regards, tom lane

#10

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Tom Lane (#9)

On Mon, Mar 21, 2016 at 7:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> FWIW, I do not think you can dismiss it as "not our bug" if a large
> fraction of existing glibc installations share the issue. It might
> be a glibc bug, but we'll have to find a workaround.

I didn't say that. I strongly agree.

>> Maybe we can write a test-case that lets check_strxfrm_bug() detect
>> this issue, which would be ideal. But, again, I need to see what's
>> going on with strxfrm() on affected systems before I can do anything.

> Happy to test if you can provide a test case.

Can you look at generating a textual representation of the strxfrm()
blobs in question, using Robert's tool?:

/messages/by-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

That would give me some basis for writing a test.

--
Peter Geoghegan

#11

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Peter Geoghegan (#5)

Peter Geoghegan <pg@heroku.com> writes:

> At one point, Robert wrote a small self-contained tool to show OS
> strxfrm() blobs:
> /messages/by-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

> It would be great if you showed us the output for your test case
> strings, both on an affected and on an unaffected system.

On RHEL6, I get

./strxfrm-binary de_DE.UTF-8 'eai' 'e aí'
"eai" -> 100c140108080801020202 (11 bytes)
"e aí" -> 100c140108080901020202010235 (14 bytes)

This seems a bit problematic, because these strings sort in the other
order ("e aí" before "eai") according to sort(1) as well as Postgres
sorting code.

It's possible I've copied-and-pasted these multibyte characters wrong.
But if I haven't, this says that the strxfrm-based optimization is
unusably broken on a very large fraction of reasonably-modern
installations. Quite aside from casting aspersions on the glibc guys,
how did we fail to notice this in our own testing?

regards, tom lane

#12

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Tom Lane (#11)

On Mon, Mar 21, 2016 at 9:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> On RHEL6, I get

> ./strxfrm-binary de_DE.UTF-8 'eai' 'e aí'
> "eai" -> 100c140108080801020202 (11 bytes)
> "e aí" -> 100c140108080901020202010235 (14 bytes)

As expected, ISTM that the "primary weights" here are the same.

Aligned comparison of this with correct en_US.UTF-8 blobs from my system:

Buggy version (Tom's de_DE.UTF-8 testcase):

"eai" -> 100c14 01 090909 01 090909 (11 bytes)
"e aí" -> 100c14 01 0b0909 01 090909010235 (14 bytes)

Correct version (though uses different locale):

"eai" -> 100c14 01 080808 01 020202 (11 bytes)
"e aí" -> 100c14 01 080809 01 020202010235 (14 bytes)

The low bytes, 0x01, separate the weight levels. I think that this
always happens with glibc. The space character is only represented at
the last level, which is why strcoll() typically weighs spaces as very
unimportant (you'll recall that we hear complaints about this from
time to time).

My guess is that the 0x0b byte in Tom's buggy de_DE.UTF-8 testcase is
the problem. Not sure why.
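To look at the levels side by side, a blob can be split at its separator bytes. This hypothetical helper assumes, as described above, that a `0x01` byte separates weight levels in glibc blobs; the test bytes are copied from Tom's RHEL6 `de_DE.UTF-8` output. Split that way, the two blobs first differ at the second level (`080808` against `080809`), and only "e aí" has a fourth level (`0235`), which would fit the remark that the space shows up only at the last level.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: split a glibc-style strxfrm() blob into weight
 * levels, assuming a 0x01 byte separates consecutive levels. Records the
 * offset and length of up to max_levels levels; returns how many levels
 * the blob contains.
 */
size_t split_levels(const unsigned char *blob, size_t n,
                    size_t *starts, size_t *lens, size_t max_levels)
{
    size_t count = 0, start = 0;

    assert(blob != NULL || n == 0);
    for (size_t i = 0; i <= n; i++) {
        if (i == n || blob[i] == 0x01) {      /* end of a level */
            if (count < max_levels) {
                starts[count] = start;
                lens[count] = i - start;
            }
            count++;
            start = i + 1;
        }
    }
    return count;
}
```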

I guess I'll look around here for further ideas tomorrow:
http://unicode.org/reports/tr10/#Well_Formedness_Examples

> This seems a bit problematic, because these strings sort in the other
> order ("e aí" before "eai") according to sort(1) as well as Postgres
> sorting code.

> It's possible I've copied-and-pasted these multibyte characters wrong.
> But if I haven't, this says that the strxfrm-based optimization is
> unusably broken on a very large fraction of reasonably-modern
> installations. Quite aside from casting aspersions on the glibc guys,
> how did we fail to notice this in our own testing?

Because we don't test every possible libc installation. And even if
we did, why should we be able to usefully nail down something that's
fundamentally not under our control? (I don't want to assume that that
bug is at fault, but it seems like a reasonable speculation,
especially based on your "strxfrm-binary" result.)

Let's not relitigate the debate about Postgres controlling its own
collations right now, though.

I think that amcheck will be able to provide reasonable smoke-testing
for these kinds of issues once it gets some buildfarm cycles. I intend
to write plenty of tests for external sorting to go with amcheck, too;
that code currently has no tests whatsoever. amcheck provides a nice
way of testing if strxfrm() agrees with strcoll(), without having to
"expect" any particular total ordering for a collatable type, which is
what a simple pg_regress approach would require. Portable testing of
strcoll() + strxfrm() will improve matters.
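The kind of invariant described here can be stated without any expected ordering: whatever order a structure claims to be in, the authoritative comparator must agree with it. A minimal sketch of that idea, not amcheck's actual code; `first_order_violation()` is a hypothetical name. Checking adjacent pairs is enough only if the comparator is itself transitive.

```c
#include <assert.h>
#include <locale.h>   /* setlocale() selects the LC_COLLATE used below */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical amcheck-style invariant check (not amcheck's code): items
 * are claimed to be in sorted order, e.g. as produced by a sort that used
 * strxfrm()-based shortcuts. Verify with the authoritative strcoll() that
 * each adjacent pair is non-decreasing. Returns the index i of the first
 * bad pair (items[i], items[i + 1]), or -1 if the order holds.
 */
long first_order_violation(const char *const *items, size_t n)
{
    assert(items != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++)
        if (strcoll(items[i], items[i + 1]) > 0)
            return (long) i;
    return -1;
}
```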

--
Peter Geoghegan

#13

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Peter Geoghegan (#12)

On Mon, Mar 21, 2016 at 10:16 PM, Peter Geoghegan <pg@heroku.com> wrote:

> "eai" -> 100c14 01 090909 01 090909 (11 bytes)
> "e aí" -> 100c14 01 0b0909 01 090909010235 (14 bytes)

> "eai" -> 100c14 01 080808 01 020202 (11 bytes)
> "e aí" -> 100c14 01 080809 01 020202010235 (14 bytes)

Sorry, I have that backwards. The latter output is Tom's de_DE.UTF-8
testcase, showing broken glibc behavior.

--
Peter Geoghegan

#14

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Peter Geoghegan (#10)

On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg@heroku.com> wrote:

> Can you look at generating a textual representation of the strxfrm()
> blobs in question, using Robert's tool?:

> /messages/by-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

I played with this tool myself, on an affected CentOS 6.7 VM:

[vagrant@localhost ~]$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

I now think that we have this backwards: This isn't a bug in glibc's
strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
modified tool, simplified to use ascii-safe strings:

[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6

If we assume for the sake of argument that this is a strxfrm() bug and
strcoll() is a reliable source of truth, then I find it very curious
that Germany's Austrian neighbors differ on this point about how text
should be collated:

[vagrant@localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1

This surely adds doubt to the idea that strxfrm() in particular is broken.

I find something else inconsistent with the strxfrm() theory: even the
de_DE collation gives strxfrm()/strcoll() self-consistent answers when
we move the rhs argument's space to the far side of its center 'x'
char:

[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
"xxx" -> 2323230108080801020202 (11 bytes)
"xx x" -> 2323230108080801020202010335 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1

It seems very unlikely that this is because of a legitimate
consideration that strcoll() makes about how German should be collated
(one that strxfrm() fails to make, say).

This is probably a worse situation for affected Postgres systems,
though, because now they have no scope to turn the faulty part of the
system off. I have a hard time believing that it's a good idea to
trust strcoll() to be wrong in a consistent way that has collatable
type opclasses at least follow "Notes to Operator Class Implementors".
I'd like to hear more opinions on that, though, because it's a tricky
thing to reason about.

--
Peter Geoghegan

#15

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Peter Geoghegan (#14)

On Tue, Mar 22, 2016 at 5:09 PM, Peter Geoghegan <pg@heroku.com> wrote:

> [...]

> This is probably a worse situation for affected Postgres systems,
> though, because now they have no scope to turn the faulty part of the
> system off. I have a hard time believing that it's a good idea to
> trust strcoll() to be wrong in a consistent way that has collatable
> type opclasses at least follow "Notes to Operator Class Implementors".
> I'd like to hear more opinions on that, though, because it's a tricky
> thing to reason about.

Well, if we implement a compatibility GUC that shuts off our
dependency on strxfrm(), people can go back to having 9.5 be no more
broken than 9.4 was. I vote we do that and go home.
Behavior-changing GUCs suck, but it seems clear that Tom is not going
to sit still for any solution that involves blaming the glibc vendor
no matter how well-justified that approach might be; and I don't have
a better idea. I was a little worried that it was too much to hope
for that all libc vendors on earth would ship a strxfrm()
implementation that was actually consistent with strcoll(), and here
we are. It's a good thing that operating systems manage to make
read() and getpid() several orders of magnitude more reliable than
strxfrm() and strcoll(), or we'd probably all be running Windows or
VMS or something now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#15)

Robert Haas <robertmhaas@gmail.com> writes:

> I was a little worried that it was too much to hope for that all libc
> vendors on earth would ship a strxfrm() implementation that was actually
> consistent with strcoll(), and here we are.

Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
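Tom's program is attached to the original message and is not reproduced here. The following is a much-simplified, hypothetical harness in the same spirit, not his code: it draws strings from a small single-byte alphabet (his generates random UTF-8), sorts one copy with `strcoll()` and one by `strcmp()` on the `strxfrm()` blobs, and counts positions where the two results hold strings that do not collate as equal. All names, the alphabet handling and the seeding are assumptions.

```c
#include <assert.h>
#include <locale.h>   /* setlocale() selects the LC_COLLATE used below */
#include <stdlib.h>
#include <string.h>

/* A generated string plus its strxfrm() blob. */
typedef struct {
    char *str;
    char *xfrm;
} Item;

static int by_strcoll(const void *a, const void *b)
{
    return strcoll(((const Item *) a)->str, ((const Item *) b)->str);
}

static int by_xfrm(const void *a, const void *b)
{
    return strcmp(((const Item *) a)->xfrm, ((const Item *) b)->xfrm);
}

static char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0);   /* blob length, excluding NUL */
    char *buf = malloc(need + 1);

    if (buf == NULL)
        abort();
    strxfrm(buf, s, need + 1);
    return buf;
}

/*
 * Hypothetical harness: make n random strings (length 0..maxlen, drawn
 * from alphabet), sort one copy with strcoll() and one by strcmp() on the
 * strxfrm() blobs, and count positions whose strings do not collate equal.
 */
size_t count_order_mismatches(size_t n, size_t maxlen,
                              const char *alphabet, unsigned seed)
{
    size_t k = strlen(alphabet), mismatches = 0;
    Item *a, *b;

    assert(k > 0);
    if (n == 0)
        return 0;
    a = malloc(n * sizeof *a);
    b = malloc(n * sizeof *b);
    if (a == NULL || b == NULL)
        abort();
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        size_t len = (size_t) rand() % (maxlen + 1);
        char *s = malloc(len + 1);

        if (s == NULL)
            abort();
        for (size_t j = 0; j < len; j++)
            s[j] = alphabet[(size_t) rand() % k];
        s[len] = '\0';
        a[i].str = s;
        a[i].xfrm = xfrm_dup(s);
        b[i] = a[i];                     /* b shares a's strings */
    }
    qsort(a, n, sizeof *a, by_strcoll);
    qsort(b, n, sizeof *b, by_xfrm);
    for (size_t i = 0; i < n; i++)
        if (strcoll(a[i].str, b[i].str) != 0)
            mismatches++;
    for (size_t i = 0; i < n; i++) {     /* a holds every item once */
        free(a[i].str);
        free(a[i].xfrm);
    }
    free(a);
    free(b);
    return mismatches;
}
```

For example, after `setlocale(LC_COLLATE, "de_DE.UTF-8")`, a call such as `count_order_mismatches(10000, 8, "aeix ", 1)` would be expected to return 0 on a libc whose strxfrm() and strcoll() agree. A single disagreement can shift many positions, so the count is only a rough signal.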

Please try this on as many platforms as you can get hold of ...

regards, tom lane

#17

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Peter Geoghegan (#14)

Peter Geoghegan <pg@heroku.com> writes:

> I now think that we have this backwards: This isn't a bug in glibc's
> strxfrm(); it's a bug in glibc's strcoll().

FWIW, the test program I just posted includes checks to see if the two
cases produce self-consistent sort orders. So far I've seen no evidence
that they don't; that is, strcoll() produces a consistent sort order,
and strxfrm() produces a consistent sort order, but not the same one.
That being the case, arguing about which one is wrong seems a bit
academic, not to mention well above my pay grade so far as the theoretical
behavior of locale-specific sort ordering is concerned.

regards, tom lane

#18

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Tom Lane (#17)

On Tue, Mar 22, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Peter Geoghegan <pg@heroku.com> writes:

>> I now think that we have this backwards: This isn't a bug in glibc's
>> strxfrm(); it's a bug in glibc's strcoll().

> FWIW, the test program I just posted includes checks to see if the two
> cases produce self-consistent sort orders. So far I've seen no evidence
> that they don't; that is, strcoll() produces a consistent sort order,
> and strxfrm() produces a consistent sort order, but not the same one.
> That being the case, arguing about which one is wrong seems a bit
> academic, not to mention well above my pay grade so far as the theoretical
> behavior of locale-specific sort ordering is concerned.

I hope you're right about it being academic.

--
Peter Geoghegan

#19

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#16)

On Tue, Mar 22, 2016 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Please try this on as many platforms as you can get hold of ...

On MacOS X 10.10.5, this fails because the strxfrm() blobs are far
longer than the maximum you defined (about 8n+8 bytes, IIRC). I fixed
that and ran this; all locales tested good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Tom Lane (#16)

Robert Haas <robertmhaas@gmail.com> writes:

> I was a little worried that it was too much to hope for that all libc
> vendors on earth would ship a strxfrm() implementation that was actually
> consistent with strcoll(), and here we are.

BTW, the glibc discussion starting here:
https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html
should put substantial fear in us about the advisability of putting strxfrm
results on-disk, as I understand we're now doing in btrees.

I was led to that while looking to see if there were any already-filed
glibc bug reports concerning this issue. AFAICS there are not, which
is odd if the bug is gone in more recent releases ...

regards, tom lane

#21

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#20)

#22

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Tom Lane (#20)

#23

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#21)

#24

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Robert Haas (#19)

#25

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#24)

#26

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#25)

#27

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#26)

#28

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Tom Lane (#16)

#29

Thomas Munro

thomas.munro@gmail.com

over 10 years ago

In reply to: Tom Lane (#16)

#30

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Tom Lane (#16)

#31

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Tom Lane (#16)

#32

Thomas Munro

thomas.munro@gmail.com

over 10 years ago

In reply to: Thomas Munro (#29)

#33

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Tom Lane (#16)

#34

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Tom Lane (#16)

#35

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Robert Haas (#15)

#36

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Peter Geoghegan (#35)

#37

Noah Misch

noah@leadboat.com

over 10 years ago

In reply to: Tom Lane (#16)

#38

Bernd Helmle

mailings@oopsware.de

over 10 years ago

In reply to: Tom Lane (#16)

#39

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Noah Misch (#37)

#40

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#39)

#41

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Tom Lane (#40)

#42

David G. Johnston

david.g.johnston@gmail.com

over 10 years ago

In reply to: Tom Lane (#41)

#43

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: David G. Johnston (#42)

#44

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#41)

#45

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#44)

#46

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Tom Lane (#45)

#47

Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Peter Geoghegan (#46)

#48

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Magnus Hagander (#47)

#49

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Peter Geoghegan (#48)

#50

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Robert Haas (#44)

#51

Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Peter Geoghegan (#48)

#52

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Magnus Hagander (#51)

#53

Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Peter Geoghegan (#52)

#54

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Peter Geoghegan (#48)

#55

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Magnus Hagander (#53)

#56

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Peter Geoghegan (#52)

#57

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Tom Lane (#56)

#58

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#56)

#59

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Robert Haas (#58)

#60

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Peter Geoghegan (#59)

#61

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Robert Haas (#60)

#62

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#58)

#63

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#62)

#64

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Tom Lane (#36)

#65

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Peter Geoghegan (#64)

#66

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Tom Lane (#65)

#67

Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Peter Geoghegan (#55)

#68

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Magnus Hagander (#67)

#69

Bernd Helmle

mailings@oopsware.de

over 10 years ago

In reply to: Magnus Hagander (#67)

#70

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Robert Haas (#68)

#71

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Peter Geoghegan (#70)

#72

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Tom Lane (#71)

#73

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Peter Geoghegan (#72)

#74

Peter Geoghegan

pg@bowt.ie

over 10 years ago

In reply to: Magnus Hagander (#67)

#75

Marc-Olaf Jaschke

marc-olaf.jaschke@s24.com

about 10 years ago

In reply to: Robert Haas (#63)

#76

Peter Geoghegan

pg@bowt.ie

almost 10 years ago

In reply to: Tom Lane (#45)

#77

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 10 years ago

In reply to: Peter Geoghegan (#76)
