Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Hi,
PostgreSQL 9.5 ignores rows with the following test case:
=========================
\l+
…
Encoding | Collate | Ctype
UTF8 | de_DE.UTF-8 | de_DE.UTF-8
...
create table test (t) as values ('eai'), ('e aí');
select * from test where t = 'eai';
t
-----
eai
(1 row)
create index on test(t);
set enable_seqscan = false;
select * from test where t = 'eai';
t
---
(0 rows)
select t from test where t = 'eai' collate "C";
t
-----
eai
(1 row)
alter table test alter column t type text collate "C";
select * from test where t = 'eai';
t
-----
eai
(1 row)
alter table test alter column t type text collate "de_DE.utf8";
select * from test where t = 'eai';
t
---
(0 rows)
set enable_seqscan = true;
select * from test where t = 'eai';
t
-----
eai
(1 row)
=========================
I was able to reproduce this with

cat /etc/debian_version
6.0.1
PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
/lib/libc.so.6 > GNU C Library (Debian EGLIBC 2.11.3-3) stable release version 2.11.3, by Roland McGrath et al.

CentOS release 6.7 (Final)
PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
ldd --version
ldd (GNU libc) 2.12

I was not able to reproduce this with

OSX (10.11.3 (15D21))
PostgreSQL 9.5alpha1 on x86_64-apple-darwin14.3.0, compiled by Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn), 64-bit

OSX (10.11.3 (15D21))
PostgreSQL 9.5.1 on x86_64-apple-darwin14.5.0, compiled by Apple LLVM version 7.0.0 (clang-700.1.76), 64-bit

Ubuntu 12.04.5 LTS
PostgreSQL 9.3.11 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
ldd --version
ldd (Ubuntu EGLIBC 2.15-0ubuntu10.13) 2.15

CentOS release 6.7 (Final)
PostgreSQL 9.4.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
ldd --version
ldd (GNU libc) 2.12

Red Hat Enterprise Linux Server release 7.2 (Maipo)
PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
ldd --version
ldd (GNU libc) 2.17
Best regards,
Marc-Olaf Jaschke
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> writes:
PostgreSQL 9.5 ignores rows with the following test case:
I can reproduce this in 9.5 and HEAD on RHEL6, but 9.4 works as expected.
I presume that that points the finger at the abbreviated-keys work.
BTW, what I'm seeing in 9.5/HEAD is that all three comparison senses fail:
u8=# set enable_seqscan TO 0;
SET
u8=# select * from test where t < 'eai';
t
---
(0 rows)
u8=# select * from test where t = 'eai';
t
---
(0 rows)
u8=# select * from test where t > 'eai';
t
---
(0 rows)
regards, tom lane
On Mon, Mar 21, 2016 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com> writes:
PostgreSQL 9.5 ignores rows with the following test case:
I can reproduce this in 9.5 and HEAD on RHEL6, but 9.4 works as expected.
I presume that that points the finger at the abbreviated-keys work.
BTW, what I'm seeing in 9.5/HEAD is that all three comparison senses fail:
u8=# set enable_seqscan TO 0;
SET
u8=# select * from test where t < 'eai';
t
---
(0 rows)
u8=# select * from test where t = 'eai';
t
---
(0 rows)
u8=# select * from test where t > 'eai';
t
---
(0 rows)
This could plausibly be a consequence of the abbreviated keys work if
strxfrm() and strcoll() return inconsistent results for those strings
for the same locale (say, one says +1 and the other says -1 given
those inputs). I don't have a RHEL6 system handy to test whether that
might be the case here.
If that is the case, I'd argue that's a glibc problem, not our
problem. Of course, we could provide an option to disable abbreviated
keys for the benefit of people who need to work around buggy libc
implementations.
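The disagreement described here can be probed for a single pair of strings. What follows is a minimal sketch in Python, assuming the locale under test is installed; `sign` and `check_pair` are illustrative names, not anything from PostgreSQL. One caveat: CPython's `locale.strcoll()` and `locale.strxfrm()` typically go through the wide-character `wcscoll()`/`wcsxfrm()` rather than the byte-oriented functions PostgreSQL calls, so a clean result here does not by itself clear a given libc.

```python
import locale


def sign(n):
    """Collapse any integer to -1, 0 or +1."""
    return (n > 0) - (n < 0)


def check_pair(a, b, collate="de_DE.UTF-8"):
    """Return (strcoll_sign, strxfrm_sign) for a and b under the given LC_COLLATE."""
    old = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        # Raises locale.Error if the locale is not installed.
        locale.setlocale(locale.LC_COLLATE, collate)
        by_strcoll = sign(locale.strcoll(a, b))
        key_a, key_b = locale.strxfrm(a), locale.strxfrm(b)
        by_strxfrm = (key_a > key_b) - (key_a < key_b)
        return by_strcoll, by_strxfrm
    finally:
        locale.setlocale(locale.LC_COLLATE, old)


if __name__ == "__main__":
    try:
        c, x = check_pair("eai", "e aí")
        print("strcoll:", c, "strxfrm:", x,
              "consistent" if c == x else "INCONSISTENT")
    except locale.Error:
        print("locale de_DE.UTF-8 is not installed here")
```

If the two signs differ for any pair, an ordering built from one function cannot safely be searched with the other.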
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
If that is the case, I'd argue that's a glibc problem, not our
problem. Of course, we could provide an option to disable abbreviated
keys for the benefit of people who need to work around buggy libc
implementations.
Conferred with Robert. This is my first suspicion. More in a little while.
--
Peter Geoghegan
On Mon, Mar 21, 2016 at 1:40 PM, Marc-Olaf Jaschke
<marc-olaf.jaschke@s24.com> wrote:
PostgreSQL 9.5 ignores rows with the following test case:
At one point, Robert wrote a small self-contained tool to show OS
strxfrm() blobs:
/messages/by-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
It would be great if you showed us the output for your test case
strings, both on an affected and on an unaffected system. As Robert
mentioned, our use of strxfrm() quite reasonably relies on it
producing blobs that compare with strcmp() in a way that gives the
same result as a strcoll() on the original strings, per ISO C90.
Thanks
--
Peter Geoghegan
On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
If that is the case, I'd argue that's a glibc problem, not our
problem. Of course, we could provide an option to disable abbreviated
keys for the benefit of people who need to work around buggy libc
implementations.
That would be an easy patch to write. We'd simply add a test within
bttextsortsupport() that makes systems with abbreviated keys disabled
for text PG_RETURN_VOID(). Actually, to be more precise we'd put that
next to the Windows code within varstr_sortsupport() (the function is
called btsortsupport_worker in 9.5). It would look at a GUC, I
suppose.
--
Peter Geoghegan
On Mon, Mar 21, 2016 at 1:40 PM, Marc-Olaf Jaschke
<marc-olaf.jaschke@s24.com> wrote:
I was able to reproduce this with
cat /etc/debian_version
6.0.1
PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
/lib/libc.so.6 > GNU C Library (Debian EGLIBC 2.11.3-3) stable release version 2.11.3, by Roland McGrath et al.
CentOS release 6.7 (Final)
PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
ldd --version
ldd (GNU libc) 2.12
I found this fairly recent bug report concerning glibc's strxfrm():
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803927
(See also https://sourceware.org/bugzilla/show_bug.cgi?id=16009)
I'm not certain that this is the problem, but it's a good theory. Note
that this particular message talks about your exact affected version
of eglibc (eglibc-2.11.3):
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803927#27
Even if it isn't this exact issue, I have a really hard time imagining
that this is not a bug in the relevant Glibc versions. Abbreviated
keys are fundamentally a fairly simple idea, and it's hard to think of
any other possible explanation.
We'll know more when we use those strxfrm() blobs, from the tool I linked to.
--
Peter Geoghegan
On Mon, Mar 21, 2016 at 5:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
If that is the case, I'd argue that's a glibc problem, not our
problem. Of course, we could provide an option to disable abbreviated
keys for the benefit of people who need to work around buggy libc
implementations.
That would be an easy patch to write. We'd simply add a test within
bttextsortsupport() that makes systems with abbreviated keys disabled
for text PG_RETURN_VOID(). Actually, to be more precise we'd put that
next to the Windows code within varstr_sortsupport() (the function is
called btsortsupport_worker in 9.5). It would look at a GUC, I
suppose.
Actually, I suppose it isn't quite that simple, because abbreviated
keys did not introduce the use of strxfrm() by Postgres. That happened
much sooner. I guess we'd have to think about convert_string_datum(),
too.
Maybe we can write a test-case that lets check_strxfrm_bug() detect
this issue, which would be ideal. But, again, I need to see what's
going on with strxfrm() on affected systems before I can do anything.
Don't have one of my own close at hand.
--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
On Mon, Mar 21, 2016 at 5:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Mar 21, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
If that is the case, I'd argue that's a glibc problem, not our
problem. Of course, we could provide an option to disable abbreviated
keys for the benefit of people who need to work around buggy libc
implementations.
FWIW, I do not think you can dismiss it as "not our bug" if a large
fraction of existing glibc installations share the issue. It might
be a glibc bug, but we'll have to find a workaround.
Maybe we can write a test-case that lets check_strxfrm_bug() detect
this issue, which would be ideal. But, again, I need to see what's
going on with strxfrm() on affected systems before I can do anything.
Happy to test if you can provide a test case.
regards, tom lane
On Mon, Mar 21, 2016 at 7:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
FWIW, I do not think you can dismiss it as "not our bug" if a large
fraction of existing glibc installations share the issue. It might
be a glibc bug, but we'll have to find a workaround.
I didn't say that. I strongly agree.
Maybe we can write a test-case that lets check_strxfrm_bug() detect
this issue, which would be ideal. But, again, I need to see what's
going on with strxfrm() on affected systems before I can do anything.
Happy to test if you can provide a test case.
Can you look at generating a textual representation of the strxfrm()
blobs in question, using Robert's tool?:
/messages/by-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
That would give me some basis for writing a test.
--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
At one point, Robert wrote a small self-contained tool to show OS
strxfrm() blobs:
/messages/by-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
It would be great if you showed us the output for your test case
strings, both on an affected and on an unaffected system.
On RHEL6, I get
./strxfrm-binary de_DE.UTF-8 'eai' 'e aí'
"eai" -> 100c140108080801020202 (11 bytes)
"e aí" -> 100c140108080901020202010235 (14 bytes)
This seems a bit problematic, because these strings sort in the other
order ("e aí" before "eai") according to sort(1) as well as Postgres
sorting code.
It's possible I've copied-and-pasted these multibyte characters wrong.
But if I haven't, this says that the strxfrm-based optimization is
unusably broken on a very large fraction of reasonably-modern
installations. Quite aside from casting aspersions on the glibc guys,
how did we fail to notice this in our own testing?
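The ordering implied by the two blobs can be checked without access to an affected system. A small sketch in Python that compares the hex strings reported above byte by byte, the way strcmp() or memcmp() would; `first_difference` is an illustrative helper, and the blobs are copied from this message rather than regenerated.

```python
# strxfrm() output reported above for de_DE.UTF-8 on RHEL6.
blob_eai = bytes.fromhex("100c140108080801020202")           # "eai"
blob_e_ai = bytes.fromhex("100c140108080901020202010235")    # "e aí"


def first_difference(x, y):
    """Return (index, byte_in_x, byte_in_y) at the first differing byte, or None."""
    for i, (bx, by) in enumerate(zip(x, y)):
        if bx != by:
            return i, bx, by
    return None


# Python compares bytes objects lexicographically as unsigned values,
# which matches strcmp() here because neither blob contains a NUL byte.
print(blob_eai < blob_e_ai)                    # True
print(first_difference(blob_eai, blob_e_ai))   # (6, 8, 9)
```

So the blobs place "eai" before "e aí", the opposite of the order that sort(1) and strcoll() are reported to give.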
regards, tom lane
On Mon, Mar 21, 2016 at 9:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
On RHEL6, I get
./strxfrm-binary de_DE.UTF-8 'eai' 'e aí'
"eai" -> 100c140108080801020202 (11 bytes)
"e aí" -> 100c140108080901020202010235 (14 bytes)
As expected, ISTM that the "primary weights" here are the same.
Aligned comparison of this with correct en_US.UTF-8 blobs from my system:
Buggy version (Tom's de_DE.UTF-8 testcase):
"eai" -> 100c14 01 090909 01 090909 (11 bytes)
"e aí" -> 100c14 01 0b0909 01 090909010235 (14 bytes)
Correct version (though uses different locale):
"eai" -> 100c14 01 080808 01 020202 (11 bytes)
"e aí" -> 100c14 01 080809 01 020202010235 (14 bytes)
The low bytes, 0x01, separate the weight levels. I think that this
always happens with glibc. The space character is only represented at
the last level, which is why strcoll() typically weighs spaces as very
unimportant (you'll recall that we hear complaints about this from
time to time).
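The level structure is easier to see once a blob is split at the separator. A minimal sketch in Python, assuming (per the description above) that 0x01 only ever appears as a level separator in these glibc blobs; `weight_levels` is an illustrative name, and the inputs are the RHEL6 de_DE.UTF-8 blobs Tom reported.

```python
def weight_levels(hex_blob):
    """Split a strxfrm() blob, given as hex, into its weight levels at 0x01 bytes."""
    return [level.hex() for level in bytes.fromhex(hex_blob).split(b"\x01")]


# "eai": three levels, nothing at the last (space-only) level.
print(weight_levels("100c140108080801020202"))
# ['100c14', '080808', '020202']

# "e aí": same first level, a different second level, plus a fourth level.
print(weight_levels("100c140108080901020202010235"))
# ['100c14', '080809', '020202', '0235']
```

The first level is identical for both strings; they first differ at the second level, and only "e aí" has a fourth level at all.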
My guess is that the 0x0b byte in Tom's buggy de_DE.UTF-8 testcase is
the problem. Not sure why.
I guess I'll look around here for further ideas tomorrow:
http://unicode.org/reports/tr10/#Well_Formedness_Examples
This seems a bit problematic, because these strings sort in the other
order ("e aí" before "eai") according to sort(1) as well as Postgres
sorting code.
It's possible I've copied-and-pasted these multibyte characters wrong.
But if I haven't, this says that the strxfrm-based optimization is
unusably broken on a very large fraction of reasonably-modern
installations. Quite aside from casting aspersions on the glibc guys,
how did we fail to notice this in our own testing?
Because we don't test every possible libc installation. And even if
we did, why should we be able to usefully nail down something that's
fundamentally not under our control? (I don't want to assume that that
bug is at fault, but it seems like a reasonable speculation,
especially based on your "strxfrm-binary" result.)
Let's not relitigate the debate about Postgres controlling its own
collations right now, though.
I think that amcheck will be able to provide reasonable smoke-testing
for these kinds of issues once it gets some buildfarm cycles. I intend
to write plenty of tests for external sorting to go with amcheck, too;
that code currently has no tests whatsoever. amcheck provides a nice
way of testing if strxfrm() agrees with strcoll(), without having to
"expect" any particular total ordering for a collatable type, which is
what a simple pg_regress approach would require. Portable testing of
strcoll() + strxfrm() will improve matters.
--
Peter Geoghegan
On Mon, Mar 21, 2016 at 10:16 PM, Peter Geoghegan <pg@heroku.com> wrote:
"eai" -> 100c14 01 090909 01 090909 (11 bytes)
"e aí" -> 100c14 01 0b0909 01 090909010235 (14 bytes)
"eai" -> 100c14 01 080808 01 020202 (11 bytes)
"e aí" -> 100c14 01 080809 01 020202010235 (14 bytes)
Sorry, I have that backwards. The latter output is Tom's de_DE.UTF-8
testcase, showing broken glibc behavior.
--
Peter Geoghegan
On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
Can you look at generating a textual representation of the strxfrm()
blobs in question, using Robert's tool?:
/messages/by-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
I played with this tool myself, on an affected CentOS 6.7 VM:
[vagrant@localhost ~]$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
I now think that we have this backwards: This isn't a bug in glibc's
strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
modified tool, simplified to use ascii-safe strings:
[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6
If we assume for the sake of argument that this is a strxfrm() bug and
strcoll() is a reliable source of truth, then I find it very curious
that Germany's Austrian neighbors differ on this point about how text
should be collated:
[vagrant@localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1
This surely adds doubt to the idea that strxfrm() in particular is broken.
I find something else inconsistent with the strxfrm() theory: even the
de_DE collation gives strxfrm()/strcoll() self-consistent answers when
we move the rhs argument's space to the far side of its center 'x'
char:
[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
"xxx" -> 2323230108080801020202 (11 bytes)
"xx x" -> 2323230108080801020202010335 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1
It seems very unlikely that this is because of a legitimate
consideration that strcoll() makes about how German should be collated
(one that strxfrm() fails to make, say).
This is probably a worse situation for affected Postgres systems,
though, because now they have no scope to turn the faulty part of the
system off. I have a hard time believing that it's a good idea to
trust strcoll() to be wrong in a consistent way that has collatable
type opclasses at least follow "Notes to Operator Class Implementors".
I'd like to hear more opinions on that, though, because it's a tricky
thing to reason about.
--
Peter Geoghegan
On Tue, Mar 22, 2016 at 5:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
Can you look at generating a textual representation of the strxfrm()
blobs in question, using Robert's tool?:
/messages/by-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
I played with this tool myself, on an affected CentOS 6.7 VM:
[vagrant@localhost ~]$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
I now think that we have this backwards: This isn't a bug in glibc's
strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
modified tool, simplified to use ascii-safe strings:
[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6
If we assume for the sake of argument that this is a strxfrm() bug and
strcoll() is a reliable source of truth, then I find it very curious
that Germany's Austrian neighbors differ on this point about how text
should be collated:
[vagrant@localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1
This surely adds doubt to the idea that strxfrm() in particular is broken.
I find something else inconsistent with the strxfrm() theory: even the
de_DE collation gives strxfrm()/strcoll() self-consistent answers when
we move the rhs argument's space to the far side of its center 'x'
char:
[vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
"xxx" -> 2323230108080801020202 (11 bytes)
"xx x" -> 2323230108080801020202010335 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1
It seems very unlikely that this is because of a legitimate
consideration that strcoll() makes about how German should be collated
(one that strxfrm() fails to make, say).
This is probably a worse situation for affected Postgres systems,
though, because now they have no scope to turn the faulty part of the
system off. I have a hard time believing that it's a good idea to
trust strcoll() to be wrong in a consistent way that has collatable
type opclasses at least follow "Notes to Operator Class Implementors".
I'd like to hear more opinions on that, though, because it's a tricky
thing to reason about.
Well, if we implement a compatibility GUC that shuts off our
dependency on strxfrm(), people can go back to having 9.5 be no more
broken than 9.4 was. I vote we do that and go home.
Behavior-changing GUCs suck, but it seems clear that Tom is not going
to sit still for any solution that involves blaming the glibc vendor
no matter how well-justified that approach might be; and I don't have
a better idea. I was a little worried that it was too much to hope
for that all libc vendors on earth would ship a strxfrm()
implementation that was actually consistent with strcoll(), and here
we are. It's a good thing that operating systems manage to make
read() and getpid() several orders of magnitude more reliable than
strxfrm() and strcoll(), or we'd probably all be running Windows or
VMS or something now.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
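The attachments are not reproduced in this archive. As a rough stand-in for the idea described above, here is a minimal sketch in Python that sorts a batch of pseudo-random strings once with strcoll() as the comparator and once with strxfrm() keys, then checks that both orders match. It is not Tom's program: `random_strings` and `sort_orders_agree` are illustrative names, the alphabet and locale list are assumptions, and CPython typically reaches the wide-character `wcscoll()`/`wcsxfrm()`, so results may not mirror what PostgreSQL sees.

```python
import functools
import locale
import random


def random_strings(count, alphabet="aeiíx ", max_len=8, seed=0):
    """Generate distinct pseudo-random test strings (sorted for determinism)."""
    rng = random.Random(seed)
    made = {
        "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        for _ in range(count)
    }
    return sorted(made)


def sort_orders_agree(strings, collate):
    """True if sorting via strcoll() and via strxfrm() keys gives the same order."""
    old = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        # Raises locale.Error if the locale is not installed.
        locale.setlocale(locale.LC_COLLATE, collate)
        by_strcoll = sorted(strings, key=functools.cmp_to_key(locale.strcoll))
        by_strxfrm = sorted(strings, key=locale.strxfrm)
        return by_strcoll == by_strxfrm
    finally:
        locale.setlocale(locale.LC_COLLATE, old)


if __name__ == "__main__":
    sample = random_strings(1000)
    for name in ("C", "de_DE.UTF-8", "en_US.UTF-8"):
        try:
            print(name, "good" if sort_orders_agree(sample, name) else "BAD")
        except locale.Error:
            print(name, "not installed")
```

Both sorts are stable and start from the same input order, so any difference between the two results points at the comparison functions rather than at the sort.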
regards, tom lane
Peter Geoghegan <pg@heroku.com> writes:
I now think that we have this backwards: This isn't a bug in glibc's
strxfrm(); it's a bug in glibc's strcoll().
FWIW, the test program I just posted includes checks to see if the two
cases produce self-consistent sort orders. So far I've seen no evidence
that they don't; that is, strcoll() produces a consistent sort order,
and strxfrm() produces a consistent sort order, but not the same one.
That being the case, arguing about which one is wrong seems a bit
academic, not to mention well above my pay grade so far as the theoretical
behavior of locale-specific sort ordering is concerned.
regards, tom lane
On Tue, Mar 22, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@heroku.com> writes:
I now think that we have this backwards: This isn't a bug in glibc's
strxfrm(); it's a bug in glibc's strcoll().
FWIW, the test program I just posted includes checks to see if the two
cases produce self-consistent sort orders. So far I've seen no evidence
that they don't; that is, strcoll() produces a consistent sort order,
and strxfrm() produces a consistent sort order, but not the same one.
That being the case, arguing about which one is wrong seems a bit
academic, not to mention well above my pay grade so far as the theoretical
behavior of locale-specific sort ordering is concerned.
I hope you're right about it being academic.
--
Peter Geoghegan
On Tue, Mar 22, 2016 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Please try this on as many platforms as you can get hold of ...
On MacOS X 10.10.5, this fails because the strxfrm() blobs are far
longer than the maximum you defined (about 8n+8 bytes, IIRC). I fixed
that and ran this; all locales tested good.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
BTW, the glibc discussion starting here:
https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html
should put substantial fear in us about the advisability of putting strxfrm
results on-disk, as I understand we're now doing in btrees.
I was led to that while looking to see if there were any already-filed
glibc bug reports concerning this issue. AFAICS there are not, which
is odd if the bug is gone in more recent releases ...
regards, tom lane
On Tue, Mar 22, 2016 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
BTW, the glibc discussion starting here:
https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html
should put substantial fear in us about the advisability of putting strxfrm
results on-disk, as I understand we're now doing in btrees.
No. Peter proposed that, but it hasn't actually been done. This
certainly makes that sound inadvisable, though.
We are, however, putting indexes on disk whose ordering was determined
partly by the result of strxfrm() comparisons.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Mar 22, 2016 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
BTW, the glibc discussion starting here:
https://sourceware.org/ml/libc-alpha/2015-09/msg00196.html
should put substantial fear in us about the advisability of putting strxfrm
results on-disk, as I understand we're now doing in btrees.
I was led to that while looking to see if there were any already-filed
glibc bug reports concerning this issue. AFAICS there are not, which
is odd if the bug is gone in more recent releases ...
I always knew it wouldn't fly to store strxfrm on disk, and we don't
do that. I actually quoted a paper saying just that at one point. I
specifically acknowledged that that was clearly a non-starter a couple
of times.
B-Trees are built based on strxfrm() comparisons at a point in time.
strxfrm() should be able to produce the same results as strcoll().
That is what it's documented to do, in C90. glibc has license to
change the strxfrm() representation while still producing answers
consistent with previous answers. Just not during an ongoing sort,
obviously.
It's not 100% clear that we have a contract with glibc to never change
collation rules, even for strcoll(), but our current use of strxfrm()
should not have made that any worse. Problems only cropped up because
of bugs in glibc.
--
Peter Geoghegan
Robert Haas <robertmhaas@gmail.com> writes:
We are, however, putting indexes on disk whose ordering was determined
partly by the result of strxfrm() comparisons.
Yeah. It appears to me that the originally-submitted test case creates
an index whose entries are ordered correctly according to strxfrm(),
but not so much according to strcoll().
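A toy model may help show why such an index returns no rows for the equality lookup. The sketch below, in Python, is not how nbtree actually descends a tree: it is a plain binary search over two entries, with the build-time order taken from the strxfrm() blobs posted earlier and the lookup comparator hard-coded to the order strcoll() is reported to give. `STRCOLL_RANK`, `strcoll_model` and `index_lookup` are all illustrative names.

```python
import functools

# Order the entries were stored in, per the strxfrm() blobs ("eai" first).
INDEX_ORDER = ["eai", "e aí"]

# Order reported for sort(1) and strcoll() on the affected system.
STRCOLL_RANK = {"e aí": 0, "eai": 1}


def strcoll_model(a, b):
    """Stand-in comparator returning -1, 0 or +1 in the reported strcoll() order."""
    return (STRCOLL_RANK[a] > STRCOLL_RANK[b]) - (STRCOLL_RANK[a] < STRCOLL_RANK[b])


def index_lookup(entries, target, cmp):
    """Binary search that trusts `entries` to be sorted according to `cmp`."""
    lo, hi = 0, len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        c = cmp(entries[mid], target)
        if c == 0:
            return mid
        if c < 0:
            lo = mid + 1   # target must be to the right of mid
        else:
            hi = mid       # target must be to the left of mid
    return None


# Built with one ordering, searched with the other: the row is missed.
print(index_lookup(INDEX_ORDER, "eai", strcoll_model))   # None

# Built and searched with the same ordering: the row is found.
consistent = sorted(INDEX_ORDER, key=functools.cmp_to_key(strcoll_model))
print(index_lookup(consistent, "eai", strcoll_model))    # 1
```

The search steps past "eai" because the comparator says the target lies to the right of "e aí", where nothing is stored.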
regards, tom lane
On Tue, Mar 22, 2016 at 7:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 22, 2016 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Please try this on as many platforms as you can get hold of ...
On MacOS X 10.10.5, this fails because the strxfrm() blobs are far
longer than the maximum you defined (about 8n+8 bytes, IIRC). I fixed
that and ran this; all locales tested good.
Here are the results on Fedora 16 and RHEL 7.1.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Here are the results on Fedora 16 and RHEL 7.1.
So much for the theory that it's fixed in RHEL7. I now think that the
glibc folk actually do not know about this, and have accordingly filed
https://bugzilla.redhat.com/show_bug.cgi?id=1320356
regards, tom lane
On Tue, Mar 22, 2016 at 8:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Here are the results on Fedora 16 and RHEL 7.1.
So much for the theory that it's fixed in RHEL7. I now think that the
glibc folk actually do not know about this, and have accordingly filed
https://bugzilla.redhat.com/show_bug.cgi?id=1320356
Good plan, but what do we do between now and when they fix it? This
seems quite bad.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Mar 22, 2016 at 8:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So much for the theory that it's fixed in RHEL7. I now think that the
glibc folk actually do not know about this, and have accordingly filed
https://bugzilla.redhat.com/show_bug.cgi?id=1320356
Good plan, but what do we do between now and when they fix it? This
seems quite bad.
At the moment I think we're still in information-gathering mode.
The upstream reaction to this will be valuable data. In the meantime,
I'd still like to find out which other platforms have similar issues.
I really kinda doubt the upthread report that Ubuntu doesn't have a
comparable problem, for instance, given the lack of any evidence that
this is a known/fixed issue in glibc.
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
Results for Ubuntu 15.10:
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_DE.utf8 good
Using LC_COLLATE = "en_AG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZM.utf8 good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZW.utf8 good
Will try on others.
Thanks!
Stephen
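For readers without the attachment, the kind of check described in the quoted message can be sketched roughly as follows. This is a minimal Python sketch, not the attached program: the function names, the alphabet, and the string lengths are assumptions made for illustration, and it compares pairs of strings directly instead of comparing two sorted orders.

```python
import locale
import random


def random_strings(count, seed=0, alphabet="aeio áíäöüß"):
    """Generate short random strings (hypothetical alphabet, chosen for illustration)."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 8)))
        for _ in range(count)
    ]


def sign(value):
    """Reduce a comparison result to -1, 0 or 1."""
    return (value > 0) - (value < 0)


def find_inconsistencies(strings):
    """Return the pairs that strcoll() and strxfrm() order differently.

    Both calls use whatever LC_COLLATE locale is currently set.
    """
    keys = {s: locale.strxfrm(s) for s in strings}
    mismatches = []
    for i, a in enumerate(strings):
        for b in strings[i + 1:]:
            by_strcoll = sign(locale.strcoll(a, b))
            by_strxfrm = sign((keys[a] > keys[b]) - (keys[a] < keys[b]))
            if by_strcoll != by_strxfrm:
                mismatches.append((a, b))
    return mismatches
```

To try it, call locale.setlocale(locale.LC_COLLATE, "de_DE.utf8") (or any other installed locale) and then find_inconsistencies(random_strings(500)); an empty list means no disagreement was found for that sample, which does not prove the locale is safe.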
On Wed, Mar 23, 2016 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
Failed on Debian 8.2, but only for de_DE.utf8. libc 2.19-18+deb8u1. Attached.
--
Thomas Munro
http://www.enterprisedb.com
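The quoted message mentions a shell script that runs the check over all available UTF8 locales. That script is not reproduced here; the following is only a sketch of the enumeration step in Python, assuming the system provides the POSIX `locale -a` utility. The function names are hypothetical.

```python
import subprocess


def utf8_locales(locale_a_output):
    """Pick the UTF-8 locale names out of `locale -a` output (one name per line)."""
    names = []
    for line in locale_a_output.splitlines():
        name = line.strip()
        # The codeset sits between "." and an optional "@modifier".
        codeset = name.partition(".")[2].partition("@")[0]
        # Accept both spellings seen in practice: "utf8" and "UTF-8".
        if codeset.lower().replace("-", "") == "utf8":
            names.append(name)
    return names


def installed_utf8_locales():
    """Run `locale -a` and return the installed UTF-8 locales."""
    result = subprocess.run(
        ["locale", "-a"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # some locale aliases are not valid UTF-8
        check=True,
    )
    return utf8_locales(result.stdout)
```

Each returned name can then be passed to locale.setlocale(locale.LC_COLLATE, name) before running a consistency check for that locale.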
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
Results for Ubuntu 14.04:
sfrost@dwemer:/home/sfrost> sh tryalllocales.sh
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (36) and strxfrm (35) orders
inconsistency between strcoll (35) and strxfrm (36) orders
inconsistency between strcoll (160) and strxfrm (159) orders
inconsistency between strcoll (159) and strxfrm (160) orders
inconsistency between strcoll (347) and strxfrm (346) orders
inconsistency between strcoll (348) and strxfrm (347) orders
inconsistency between strcoll (346) and strxfrm (348) orders
inconsistency between strcoll (355) and strxfrm (353) orders
inconsistency between strcoll (353) and strxfrm (354) orders
inconsistency between strcoll (354) and strxfrm (355) orders
inconsistency between strcoll (440) and strxfrm (439) orders
inconsistency between strcoll (441) and strxfrm (440) orders
inconsistency between strcoll (439) and strxfrm (441) orders
inconsistency between strcoll (450) and strxfrm (449) orders
inconsistency between strcoll (449) and strxfrm (450) orders
inconsistency between strcoll (454) and strxfrm (452) orders
inconsistency between strcoll (455) and strxfrm (453) orders
inconsistency between strcoll (452) and strxfrm (454) orders
inconsistency between strcoll (453) and strxfrm (455) orders
inconsistency between strcoll (521) and strxfrm (520) orders
inconsistency between strcoll (520) and strxfrm (521) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (528) and strxfrm (529) orders
inconsistency between strcoll (682) and strxfrm (681) orders
inconsistency between strcoll (681) and strxfrm (682) orders
inconsistency between strcoll (743) and strxfrm (742) orders
inconsistency between strcoll (742) and strxfrm (743) orders
inconsistency between strcoll (830) and strxfrm (829) orders
inconsistency between strcoll (829) and strxfrm (830) orders
inconsistency between strcoll (870) and strxfrm (869) orders
inconsistency between strcoll (869) and strxfrm (870) orders
inconsistency between strcoll (933) and strxfrm (931) orders
inconsistency between strcoll (931) and strxfrm (932) orders
inconsistency between strcoll (932) and strxfrm (933) orders
de_DE.utf8 BAD
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Thanks!
Stephen
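Reports like the one above get long once every installed locale is included. A small sketch for reducing such a report to its verdicts, assuming only the "NAME good" / "NAME BAD" line format visible in the pasted output (the function name is hypothetical):

```python
def summarize(report):
    """Split a pasted report into (good, bad) lists of locale names.

    Only two-word verdict lines such as "en_US.utf8 good" or
    "de_DE.utf8 BAD" are considered; every other line is ignored.
    """
    good, bad = [], []
    for line in report.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, verdict = parts
        if verdict == "good":
            good.append(name)
        elif verdict == "BAD":
            bad.append(name)
    return good, bad
```

Feeding it the saved output of a run yields the list of locales flagged BAD on that platform, which is easier to compare across machines than the raw log.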
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
I found the 'all' button on Debian 8.3:
sfrost@mahout:~$ sh tryalllocales.sh
Using LC_COLLATE = "aa_DJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_DJ.utf8 good
Using LC_COLLATE = "aa_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_ER.utf8 good
Using LC_COLLATE = "aa_ER.utf8@saaho"
Using LC_CTYPE = "en_US.UTF-8"
aa_ER.utf8@saaho good
Using LC_COLLATE = "aa_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_ET.utf8 good
Using LC_COLLATE = "af_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
af_ZA.utf8 good
Using LC_COLLATE = "ak_GH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ak_GH.utf8 good
Using LC_COLLATE = "am_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
am_ET.utf8 good
Using LC_COLLATE = "an_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
an_ES.utf8 good
Using LC_COLLATE = "anp_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
anp_IN.utf8 good
Using LC_COLLATE = "ar_AE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_AE.utf8 good
Using LC_COLLATE = "ar_BH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_BH.utf8 good
Using LC_COLLATE = "ar_DZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_DZ.utf8 good
Using LC_COLLATE = "ar_EG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_EG.utf8 good
Using LC_COLLATE = "ar_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_IN.utf8 good
Using LC_COLLATE = "ar_IQ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_IQ.utf8 good
Using LC_COLLATE = "ar_JO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_JO.utf8 good
Using LC_COLLATE = "ar_KW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_KW.utf8 good
Using LC_COLLATE = "ar_LB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_LB.utf8 good
Using LC_COLLATE = "ar_LY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_LY.utf8 good
Using LC_COLLATE = "ar_MA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_MA.utf8 good
Using LC_COLLATE = "ar_OM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_OM.utf8 good
Using LC_COLLATE = "ar_QA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_QA.utf8 good
Using LC_COLLATE = "ar_SA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SA.utf8 good
Using LC_COLLATE = "ar_SD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SD.utf8 good
Using LC_COLLATE = "ar_SS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SS.utf8 good
Using LC_COLLATE = "ar_SY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SY.utf8 good
Using LC_COLLATE = "ar_TN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_TN.utf8 good
Using LC_COLLATE = "ar_YE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_YE.utf8 good
Using LC_COLLATE = "as_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
as_IN.utf8 good
Using LC_COLLATE = "ast_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ast_ES.utf8 good
Using LC_COLLATE = "ayc_PE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ayc_PE.utf8 good
Using LC_COLLATE = "az_AZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
az_AZ.utf8 good
Using LC_COLLATE = "be_BY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
be_BY.utf8 good
Using LC_COLLATE = "be_BY.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
be_BY.utf8@latin good
Using LC_COLLATE = "bem_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bem_ZM.utf8 good
Using LC_COLLATE = "ber_DZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ber_DZ.utf8 good
Using LC_COLLATE = "ber_MA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ber_MA.utf8 good
Using LC_COLLATE = "bg_BG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bg_BG.utf8 good
Using LC_COLLATE = "bho_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bho_IN.utf8 good
Using LC_COLLATE = "bn_BD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bn_BD.utf8 good
Using LC_COLLATE = "bn_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bn_IN.utf8 good
Using LC_COLLATE = "bo_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bo_CN.utf8 good
Using LC_COLLATE = "bo_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bo_IN.utf8 good
Using LC_COLLATE = "br_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
br_FR.utf8 good
Using LC_COLLATE = "brx_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
brx_IN.utf8 good
Using LC_COLLATE = "bs_BA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bs_BA.utf8 good
Using LC_COLLATE = "byn_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
byn_ER.utf8 good
Using LC_COLLATE = "ca_AD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_AD.utf8 good
Using LC_COLLATE = "ca_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_ES.utf8 good
Using LC_COLLATE = "ca_ES.utf8@valencia"
Using LC_CTYPE = "en_US.UTF-8"
ca_ES.utf8@valencia good
Using LC_COLLATE = "ca_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_FR.utf8 good
Using LC_COLLATE = "ca_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_IT.utf8 good
Using LC_COLLATE = "cmn_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cmn_TW.utf8 good
Using LC_COLLATE = "crh_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
crh_UA.utf8 good
Using LC_COLLATE = "csb_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
csb_PL.utf8 good
Using LC_COLLATE = "cs_CZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cs_CZ.utf8 good
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "cv_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cv_RU.utf8 good
Using LC_COLLATE = "cy_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cy_GB.utf8 good
Using LC_COLLATE = "da_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
da_DK.utf8 good
Using LC_COLLATE = "de_AT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_AT.utf8 good
Using LC_COLLATE = "de_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_BE.utf8 good
Using LC_COLLATE = "de_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_CH.utf8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (72) and strxfrm (71) orders
inconsistency between strcoll (71) and strxfrm (72) orders
inconsistency between strcoll (136) and strxfrm (135) orders
inconsistency between strcoll (135) and strxfrm (136) orders
inconsistency between strcoll (135) and strxfrm (136) orders
inconsistency between strcoll (139) and strxfrm (137) orders
inconsistency between strcoll (140) and strxfrm (138) orders
inconsistency between strcoll (137) and strxfrm (139) orders
inconsistency between strcoll (138) and strxfrm (140) orders
inconsistency between strcoll (149) and strxfrm (148) orders
inconsistency between strcoll (148) and strxfrm (149) orders
inconsistency between strcoll (254) and strxfrm (252) orders
inconsistency between strcoll (252) and strxfrm (253) orders
inconsistency between strcoll (253) and strxfrm (254) orders
inconsistency between strcoll (274) and strxfrm (273) orders
inconsistency between strcoll (275) and strxfrm (274) orders
inconsistency between strcoll (273) and strxfrm (275) orders
inconsistency between strcoll (339) and strxfrm (338) orders
inconsistency between strcoll (338) and strxfrm (339) orders
inconsistency between strcoll (338) and strxfrm (339) orders
inconsistency between strcoll (390) and strxfrm (388) orders
inconsistency between strcoll (388) and strxfrm (389) orders
inconsistency between strcoll (389) and strxfrm (390) orders
inconsistency between strcoll (411) and strxfrm (410) orders
inconsistency between strcoll (410) and strxfrm (411) orders
inconsistency between strcoll (449) and strxfrm (448) orders
inconsistency between strcoll (448) and strxfrm (449) orders
inconsistency between strcoll (454) and strxfrm (453) orders
inconsistency between strcoll (453) and strxfrm (454) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (528) and strxfrm (529) orders
inconsistency between strcoll (543) and strxfrm (542) orders
inconsistency between strcoll (544) and strxfrm (543) orders
inconsistency between strcoll (542) and strxfrm (544) orders
inconsistency between strcoll (542) and strxfrm (544) orders
inconsistency between strcoll (567) and strxfrm (566) orders
inconsistency between strcoll (566) and strxfrm (567) orders
inconsistency between strcoll (589) and strxfrm (588) orders
inconsistency between strcoll (588) and strxfrm (589) orders
inconsistency between strcoll (592) and strxfrm (591) orders
inconsistency between strcoll (591) and strxfrm (592) orders
inconsistency between strcoll (594) and strxfrm (593) orders
inconsistency between strcoll (593) and strxfrm (594) orders
inconsistency between strcoll (597) and strxfrm (595) orders
inconsistency between strcoll (595) and strxfrm (596) orders
inconsistency between strcoll (596) and strxfrm (597) orders
inconsistency between strcoll (601) and strxfrm (600) orders
inconsistency between strcoll (600) and strxfrm (601) orders
inconsistency between strcoll (726) and strxfrm (724) orders
inconsistency between strcoll (724) and strxfrm (725) orders
inconsistency between strcoll (725) and strxfrm (726) orders
inconsistency between strcoll (743) and strxfrm (741) orders
inconsistency between strcoll (741) and strxfrm (742) orders
inconsistency between strcoll (741) and strxfrm (742) orders
inconsistency between strcoll (744) and strxfrm (743) orders
inconsistency between strcoll (742) and strxfrm (744) orders
inconsistency between strcoll (765) and strxfrm (764) orders
inconsistency between strcoll (764) and strxfrm (765) orders
inconsistency between strcoll (786) and strxfrm (784) orders
inconsistency between strcoll (784) and strxfrm (786) orders
inconsistency between strcoll (896) and strxfrm (895) orders
inconsistency between strcoll (895) and strxfrm (896) orders
inconsistency between strcoll (941) and strxfrm (939) orders
inconsistency between strcoll (942) and strxfrm (940) orders
inconsistency between strcoll (943) and strxfrm (941) orders
inconsistency between strcoll (939) and strxfrm (942) orders
inconsistency between strcoll (940) and strxfrm (943) orders
de_DE.utf8 BAD
Using LC_COLLATE = "de_LI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_LI.utf8 good
Using LC_COLLATE = "de_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_LU.utf8 good
Using LC_COLLATE = "doi_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
doi_IN.utf8 good
Using LC_COLLATE = "dv_MV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
dv_MV.utf8 good
Using LC_COLLATE = "dz_BT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
dz_BT.utf8 good
Using LC_COLLATE = "el_CY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
el_CY.utf8 good
Using LC_COLLATE = "el_GR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
el_GR.utf8 good
Using LC_COLLATE = "en_AG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZM.utf8 good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZW.utf8 good
Using LC_COLLATE = "eo.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eo.utf8 good
Using LC_COLLATE = "es_AR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_AR.utf8 good
Using LC_COLLATE = "es_BO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_BO.utf8 good
Using LC_COLLATE = "es_CL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CL.utf8 good
Using LC_COLLATE = "es_CO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CO.utf8 good
Using LC_COLLATE = "es_CR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CR.utf8 good
Using LC_COLLATE = "es_CU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CU.utf8 good
Using LC_COLLATE = "es_DO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_DO.utf8 good
Using LC_COLLATE = "es_EC.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_EC.utf8 good
Using LC_COLLATE = "es_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_ES.utf8 good
Using LC_COLLATE = "es_GT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_GT.utf8 good
Using LC_COLLATE = "es_HN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_HN.utf8 good
Using LC_COLLATE = "es_MX.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_MX.utf8 good
Using LC_COLLATE = "es_NI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_NI.utf8 good
Using LC_COLLATE = "es_PA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PA.utf8 good
Using LC_COLLATE = "es_PE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PE.utf8 good
Using LC_COLLATE = "es_PR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PR.utf8 good
Using LC_COLLATE = "es_PY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PY.utf8 good
Using LC_COLLATE = "es_SV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_SV.utf8 good
Using LC_COLLATE = "es_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_US.utf8 good
Using LC_COLLATE = "es_UY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_UY.utf8 good
Using LC_COLLATE = "es_VE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_VE.utf8 good
Using LC_COLLATE = "et_EE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
et_EE.utf8 good
Using LC_COLLATE = "eu_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eu_ES.utf8 good
Using LC_COLLATE = "eu_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eu_FR.utf8 good
Using LC_COLLATE = "fa_IR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fa_IR.utf8 good
Using LC_COLLATE = "ff_SN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ff_SN.utf8 good
Using LC_COLLATE = "fi_FI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fi_FI.utf8 good
Using LC_COLLATE = "fil_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fil_PH.utf8 good
Using LC_COLLATE = "fo_FO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fo_FO.utf8 good
Using LC_COLLATE = "fr_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_BE.utf8 good
Using LC_COLLATE = "fr_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_CA.utf8 good
Using LC_COLLATE = "fr_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_CH.utf8 good
Using LC_COLLATE = "fr_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_FR.utf8 good
Using LC_COLLATE = "fr_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_LU.utf8 good
Using LC_COLLATE = "fur_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fur_IT.utf8 good
Using LC_COLLATE = "fy_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fy_DE.utf8 good
Using LC_COLLATE = "fy_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fy_NL.utf8 good
Using LC_COLLATE = "ga_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ga_IE.utf8 good
Using LC_COLLATE = "gd_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gd_GB.utf8 good
Using LC_COLLATE = "gez_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gez_ER.utf8 good
Using LC_COLLATE = "gez_ER.utf8@abegede"
Using LC_CTYPE = "en_US.UTF-8"
gez_ER.utf8@abegede good
Using LC_COLLATE = "gez_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gez_ET.utf8 good
Using LC_COLLATE = "gez_ET.utf8@abegede"
Using LC_CTYPE = "en_US.UTF-8"
gez_ET.utf8@abegede good
Using LC_COLLATE = "gl_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gl_ES.utf8 good
Using LC_COLLATE = "gu_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gu_IN.utf8 good
Using LC_COLLATE = "gv_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gv_GB.utf8 good
Using LC_COLLATE = "hak_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hak_TW.utf8 good
Using LC_COLLATE = "ha_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ha_NG.utf8 good
Using LC_COLLATE = "he_IL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
he_IL.utf8 good
Using LC_COLLATE = "hi_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hi_IN.utf8 good
Using LC_COLLATE = "hne_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hne_IN.utf8 good
Using LC_COLLATE = "hr_HR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hr_HR.utf8 good
Using LC_COLLATE = "hsb_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hsb_DE.utf8 good
Using LC_COLLATE = "ht_HT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ht_HT.utf8 good
Using LC_COLLATE = "hu_HU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hu_HU.utf8 good
Using LC_COLLATE = "hy_AM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hy_AM.utf8 good
Using LC_COLLATE = "ia_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ia_FR.utf8 good
Using LC_COLLATE = "id_ID.utf8"
Using LC_CTYPE = "en_US.UTF-8"
id_ID.utf8 good
Using LC_COLLATE = "ig_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ig_NG.utf8 good
Using LC_COLLATE = "ik_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ik_CA.utf8 good
Using LC_COLLATE = "is_IS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
is_IS.utf8 good
Using LC_COLLATE = "it_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
it_CH.utf8 good
Using LC_COLLATE = "it_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
it_IT.utf8 good
Using LC_COLLATE = "iu_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
iu_CA.utf8 good
Using LC_COLLATE = "iw_IL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
iw_IL.utf8 good
Using LC_COLLATE = "ja_JP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ja_JP.utf8 good
Using LC_COLLATE = "ka_GE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ka_GE.utf8 good
Using LC_COLLATE = "kk_KZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kk_KZ.utf8 good
Using LC_COLLATE = "kl_GL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kl_GL.utf8 good
Using LC_COLLATE = "km_KH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
km_KH.utf8 good
Using LC_COLLATE = "kn_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kn_IN.utf8 good
Using LC_COLLATE = "kok_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kok_IN.utf8 good
Using LC_COLLATE = "ko_KR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ko_KR.utf8 good
Using LC_COLLATE = "ks_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ks_IN.utf8 good
Using LC_COLLATE = "ks_IN.utf8@devanagari"
Using LC_CTYPE = "en_US.UTF-8"
ks_IN.utf8@devanagari good
Using LC_COLLATE = "ku_TR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ku_TR.utf8 good
Using LC_COLLATE = "kw_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kw_GB.utf8 good
Using LC_COLLATE = "ky_KG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ky_KG.utf8 good
Using LC_COLLATE = "lb_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (137) and strxfrm (136) orders
inconsistency between strcoll (136) and strxfrm (137) orders
inconsistency between strcoll (171) and strxfrm (170) orders
inconsistency between strcoll (170) and strxfrm (171) orders
inconsistency between strcoll (351) and strxfrm (350) orders
inconsistency between strcoll (350) and strxfrm (351) orders
inconsistency between strcoll (350) and strxfrm (351) orders
inconsistency between strcoll (356) and strxfrm (353) orders
inconsistency between strcoll (353) and strxfrm (354) orders
inconsistency between strcoll (354) and strxfrm (355) orders
inconsistency between strcoll (355) and strxfrm (356) orders
inconsistency between strcoll (465) and strxfrm (464) orders
inconsistency between strcoll (464) and strxfrm (465) orders
inconsistency between strcoll (467) and strxfrm (466) orders
inconsistency between strcoll (466) and strxfrm (467) orders
inconsistency between strcoll (470) and strxfrm (469) orders
inconsistency between strcoll (469) and strxfrm (470) orders
inconsistency between strcoll (573) and strxfrm (572) orders
inconsistency between strcoll (574) and strxfrm (573) orders
inconsistency between strcoll (572) and strxfrm (574) orders
inconsistency between strcoll (572) and strxfrm (574) orders
inconsistency between strcoll (612) and strxfrm (611) orders
inconsistency between strcoll (611) and strxfrm (612) orders
inconsistency between strcoll (709) and strxfrm (708) orders
inconsistency between strcoll (710) and strxfrm (709) orders
inconsistency between strcoll (708) and strxfrm (710) orders
inconsistency between strcoll (771) and strxfrm (770) orders
inconsistency between strcoll (770) and strxfrm (771) orders
inconsistency between strcoll (789) and strxfrm (787) orders
inconsistency between strcoll (787) and strxfrm (788) orders
inconsistency between strcoll (788) and strxfrm (789) orders
inconsistency between strcoll (948) and strxfrm (947) orders
inconsistency between strcoll (947) and strxfrm (948) orders
lb_LU.utf8 BAD
Using LC_COLLATE = "lg_UG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lg_UG.utf8 good
Using LC_COLLATE = "li_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
li_BE.utf8 good
Using LC_COLLATE = "lij_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lij_IT.utf8 good
Using LC_COLLATE = "li_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
li_NL.utf8 good
Using LC_COLLATE = "lo_LA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lo_LA.utf8 good
Using LC_COLLATE = "lt_LT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lt_LT.utf8 good
Using LC_COLLATE = "lv_LV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lv_LV.utf8 good
Using LC_COLLATE = "lzh_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lzh_TW.utf8 good
Using LC_COLLATE = "mag_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mag_IN.utf8 good
Using LC_COLLATE = "mai_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mai_IN.utf8 good
Using LC_COLLATE = "mg_MG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mg_MG.utf8 good
Using LC_COLLATE = "mhr_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mhr_RU.utf8 good
Using LC_COLLATE = "mi_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mi_NZ.utf8 good
Using LC_COLLATE = "mk_MK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mk_MK.utf8 good
Using LC_COLLATE = "ml_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ml_IN.utf8 good
Using LC_COLLATE = "mni_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mni_IN.utf8 good
Using LC_COLLATE = "mn_MN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mn_MN.utf8 good
Using LC_COLLATE = "mr_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mr_IN.utf8 good
Using LC_COLLATE = "ms_MY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ms_MY.utf8 good
Using LC_COLLATE = "mt_MT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mt_MT.utf8 good
Using LC_COLLATE = "my_MM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
my_MM.utf8 good
Using LC_COLLATE = "nan_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nan_TW.utf8 good
Using LC_COLLATE = "nan_TW.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
nan_TW.utf8@latin good
Using LC_COLLATE = "nb_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nb_NO.utf8 good
Using LC_COLLATE = "nds_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nds_DE.utf8 good
Using LC_COLLATE = "nds_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nds_NL.utf8 good
Using LC_COLLATE = "ne_NP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ne_NP.utf8 good
Using LC_COLLATE = "nhn_MX.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nhn_MX.utf8 good
Using LC_COLLATE = "niu_NU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
niu_NU.utf8 good
Using LC_COLLATE = "niu_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
niu_NZ.utf8 good
Using LC_COLLATE = "nl_AW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_AW.utf8 good
Using LC_COLLATE = "nl_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_BE.utf8 good
Using LC_COLLATE = "nl_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_NL.utf8 good
Using LC_COLLATE = "nn_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nn_NO.utf8 good
Using LC_COLLATE = "nr_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nr_ZA.utf8 good
Using LC_COLLATE = "nso_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nso_ZA.utf8 good
Using LC_COLLATE = "oc_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
oc_FR.utf8 good
Using LC_COLLATE = "om_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
om_ET.utf8 good
Using LC_COLLATE = "om_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
om_KE.utf8 good
Using LC_COLLATE = "or_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
or_IN.utf8 good
Using LC_COLLATE = "os_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
os_RU.utf8 good
Using LC_COLLATE = "pa_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pa_IN.utf8 good
Using LC_COLLATE = "pap_AN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pap_AN.utf8 good
Using LC_COLLATE = "pap_AW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pap_AW.utf8 good
Using LC_COLLATE = "pap_CW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pap_CW.utf8 good
Using LC_COLLATE = "pa_PK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pa_PK.utf8 good
Using LC_COLLATE = "pl_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pl_PL.utf8 good
Using LC_COLLATE = "ps_AF.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ps_AF.utf8 good
Using LC_COLLATE = "pt_BR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pt_BR.utf8 good
Using LC_COLLATE = "pt_PT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pt_PT.utf8 good
Using LC_COLLATE = "quz_PE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
quz_PE.utf8 good
Using LC_COLLATE = "ro_RO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ro_RO.utf8 good
Using LC_COLLATE = "ru_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ru_RU.utf8 good
Using LC_COLLATE = "ru_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ru_UA.utf8 good
Using LC_COLLATE = "rw_RW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
rw_RW.utf8 good
Using LC_COLLATE = "sa_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sa_IN.utf8 good
Using LC_COLLATE = "sat_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sat_IN.utf8 good
Using LC_COLLATE = "sc_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sc_IT.utf8 good
Using LC_COLLATE = "sd_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sd_IN.utf8 good
Using LC_COLLATE = "sd_IN.utf8@devanagari"
Using LC_CTYPE = "en_US.UTF-8"
sd_IN.utf8@devanagari good
Using LC_COLLATE = "se_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
se_NO.utf8 good
Using LC_COLLATE = "shs_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
shs_CA.utf8 good
Using LC_COLLATE = "sid_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sid_ET.utf8 good
Using LC_COLLATE = "si_LK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
si_LK.utf8 good
Using LC_COLLATE = "sk_SK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sk_SK.utf8 good
Using LC_COLLATE = "sl_SI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sl_SI.utf8 good
Using LC_COLLATE = "so_DJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_DJ.utf8 good
Using LC_COLLATE = "so_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_ET.utf8 good
Using LC_COLLATE = "so_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_KE.utf8 good
Using LC_COLLATE = "so_SO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_SO.utf8 good
Using LC_COLLATE = "sq_AL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sq_AL.utf8 good
Using LC_COLLATE = "sq_MK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sq_MK.utf8 good
Using LC_COLLATE = "sr_ME.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sr_ME.utf8 good
Using LC_COLLATE = "sr_RS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sr_RS.utf8 good
Using LC_COLLATE = "sr_RS.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
sr_RS.utf8@latin good
Using LC_COLLATE = "ss_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ss_ZA.utf8 good
Using LC_COLLATE = "st_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
st_ZA.utf8 good
Using LC_COLLATE = "sv_FI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sv_FI.utf8 good
Using LC_COLLATE = "sv_SE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sv_SE.utf8 good
Using LC_COLLATE = "sw_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sw_KE.utf8 good
Using LC_COLLATE = "sw_TZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sw_TZ.utf8 good
Using LC_COLLATE = "szl_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
szl_PL.utf8 good
Using LC_COLLATE = "ta_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ta_IN.utf8 good
Using LC_COLLATE = "ta_LK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ta_LK.utf8 good
Using LC_COLLATE = "te_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
te_IN.utf8 good
Using LC_COLLATE = "tg_TJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tg_TJ.utf8 good
Using LC_COLLATE = "the_NP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
the_NP.utf8 good
Using LC_COLLATE = "th_TH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
th_TH.utf8 good
Using LC_COLLATE = "ti_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ti_ER.utf8 good
Using LC_COLLATE = "ti_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ti_ET.utf8 good
Using LC_COLLATE = "tig_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tig_ER.utf8 good
Using LC_COLLATE = "tk_TM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tk_TM.utf8 good
Using LC_COLLATE = "tl_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tl_PH.utf8 good
Using LC_COLLATE = "tn_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tn_ZA.utf8 good
Using LC_COLLATE = "tr_CY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tr_CY.utf8 good
Using LC_COLLATE = "tr_TR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tr_TR.utf8 good
Using LC_COLLATE = "ts_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ts_ZA.utf8 good
Using LC_COLLATE = "tt_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tt_RU.utf8 good
Using LC_COLLATE = "tt_RU.utf8@iqtelif"
Using LC_CTYPE = "en_US.UTF-8"
tt_RU.utf8@iqtelif good
Using LC_COLLATE = "ug_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ug_CN.utf8 good
Using LC_COLLATE = "uk_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
uk_UA.utf8 good
Using LC_COLLATE = "unm_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
unm_US.utf8 good
Using LC_COLLATE = "ur_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ur_IN.utf8 good
Using LC_COLLATE = "ur_PK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ur_PK.utf8 good
Using LC_COLLATE = "uz_UZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
uz_UZ.utf8 good
Using LC_COLLATE = "uz_UZ.utf8@cyrillic"
Using LC_CTYPE = "en_US.UTF-8"
uz_UZ.utf8@cyrillic good
Using LC_COLLATE = "ve_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ve_ZA.utf8 good
Using LC_COLLATE = "vi_VN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
vi_VN.utf8 good
Using LC_COLLATE = "wa_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wa_BE.utf8 good
Using LC_COLLATE = "wae_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wae_CH.utf8 good
Using LC_COLLATE = "wal_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wal_ET.utf8 good
Using LC_COLLATE = "wo_SN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wo_SN.utf8 good
Using LC_COLLATE = "xh_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
xh_ZA.utf8 good
Using LC_COLLATE = "yi_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yi_US.utf8 good
Using LC_COLLATE = "yo_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yo_NG.utf8 good
Using LC_COLLATE = "yue_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yue_HK.utf8 good
Using LC_COLLATE = "zh_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_CN.utf8 good
Using LC_COLLATE = "zh_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_HK.utf8 good
Using LC_COLLATE = "zh_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_SG.utf8 good
Using LC_COLLATE = "zh_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_TW.utf8 good
Using LC_COLLATE = "zu_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zu_ZA.utf8 good
Thanks!
Stephen
On Wed, Mar 23, 2016 at 2:18 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Wed, Mar 23, 2016 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
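For readers without the attachment, the check described above can be sketched roughly as follows. This is not the attached program (which is not reproduced in this thread); it is a minimal Python analogue, and the helper names random_strings, find_inconsistencies and check_locale are made up for illustration. One caveat, as far as I know: CPython's locale.strcoll and locale.strxfrm are normally built on the wide-character wcscoll()/wcsxfrm(), so a clean result here does not prove that the byte-oriented strcoll()/strxfrm() agree on the same platform.

```python
import locale
import random


def random_strings(count, max_len, alphabet, seed=0):
    """Generate reproducible random test strings (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        for _ in range(count)
    ]


def sign(n):
    """Collapse a comparison result to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def find_inconsistencies(strings):
    """Return the pairs that strcoll and strxfrm order differently.

    Uses whatever LC_COLLATE is currently in effect.
    """
    # Transform each string once; transformed keys are compared directly.
    keys = {s: locale.strxfrm(s) for s in strings}
    bad = []
    for i, a in enumerate(strings):
        for b in strings[i + 1:]:
            by_coll = sign(locale.strcoll(a, b))
            by_xfrm = (keys[a] > keys[b]) - (keys[a] < keys[b])
            if by_coll != by_xfrm:
                bad.append((a, b))
    return bad


def check_locale(name, strings):
    """Return "good", "BAD", or None if the locale is not installed."""
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    return "BAD" if find_inconsistencies(strings) else "good"


if __name__ == "__main__":
    # Alphabet chosen to resemble the 'eai' / 'e aí' case from the report.
    sample = random_strings(200, 10, "aei í")
    for loc in ("C", "de_DE.utf8", "en_US.utf8"):
        print(loc, check_locale(loc, sample))
```

The pairwise comparison is quadratic, which is fine for a few hundred strings; sorting twice and comparing the resulting orders, as the output in this thread suggests the original does, would scale better.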
Failed on Debian 8.2, but only for de_DE.utf8. libc 2.19-18+deb8u1. Attached.
Ran again after apt-get upgrade took me to 8.3 and libc6
2.19-18+deb8u2. Results similar, de_DE.utf8 has inconsistencies but
nothing else. So Debian stable is affected. (Just noticed that
Stephen Frost's output from the same OS reports a broken lb_LU.utf8
too, but after conferring on IRC it seems that may be because I
installed "locales-all" (precompiled) which didn't give me lb_LU.utf8,
and he generated all locales which apparently does.)
--
Thomas Munro
http://www.enterprisedb.com
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
Debian 7.9 results with all locales locally generated:
Using LC_COLLATE = "aa_DJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_DJ.utf8 good
Using LC_COLLATE = "aa_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_ER.utf8 good
Using LC_COLLATE = "aa_ER.utf8@saaho"
Using LC_CTYPE = "en_US.UTF-8"
aa_ER.utf8@saaho good
Using LC_COLLATE = "aa_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
aa_ET.utf8 good
Using LC_COLLATE = "af_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
af_ZA.utf8 good
Using LC_COLLATE = "am_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
am_ET.utf8 good
Using LC_COLLATE = "an_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
an_ES.utf8 good
Using LC_COLLATE = "ar_AE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_AE.utf8 good
Using LC_COLLATE = "ar_BH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_BH.utf8 good
Using LC_COLLATE = "ar_DZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_DZ.utf8 good
Using LC_COLLATE = "ar_EG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_EG.utf8 good
Using LC_COLLATE = "ar_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_IN.utf8 good
Using LC_COLLATE = "ar_IQ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_IQ.utf8 good
Using LC_COLLATE = "ar_JO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_JO.utf8 good
Using LC_COLLATE = "ar_KW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_KW.utf8 good
Using LC_COLLATE = "ar_LB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_LB.utf8 good
Using LC_COLLATE = "ar_LY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_LY.utf8 good
Using LC_COLLATE = "ar_MA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_MA.utf8 good
Using LC_COLLATE = "ar_OM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_OM.utf8 good
Using LC_COLLATE = "ar_QA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_QA.utf8 good
Using LC_COLLATE = "ar_SA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SA.utf8 good
Using LC_COLLATE = "ar_SD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SD.utf8 good
Using LC_COLLATE = "ar_SY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_SY.utf8 good
Using LC_COLLATE = "ar_TN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_TN.utf8 good
Using LC_COLLATE = "ar_YE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ar_YE.utf8 good
Using LC_COLLATE = "as_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
as_IN.utf8 good
Using LC_COLLATE = "ast_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ast_ES.utf8 good
Using LC_COLLATE = "az_AZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
az_AZ.utf8 good
Using LC_COLLATE = "be_BY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
be_BY.utf8 good
Using LC_COLLATE = "be_BY.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
be_BY.utf8@latin good
Using LC_COLLATE = "bem_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bem_ZM.utf8 good
Using LC_COLLATE = "ber_DZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ber_DZ.utf8 good
Using LC_COLLATE = "ber_MA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ber_MA.utf8 good
Using LC_COLLATE = "bg_BG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bg_BG.utf8 good
Using LC_COLLATE = "bn_BD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bn_BD.utf8 good
Using LC_COLLATE = "bn_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bn_IN.utf8 good
Using LC_COLLATE = "bo_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bo_CN.utf8 good
Using LC_COLLATE = "bo_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bo_IN.utf8 good
Using LC_COLLATE = "br_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
br_FR.utf8 good
Using LC_COLLATE = "bs_BA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
bs_BA.utf8 good
Using LC_COLLATE = "byn_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
byn_ER.utf8 good
Using LC_COLLATE = "ca_AD.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_AD.utf8 good
Using LC_COLLATE = "ca_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_ES.utf8 good
Using LC_COLLATE = "ca_ES.utf8@valencia"
Using LC_CTYPE = "en_US.UTF-8"
ca_ES.utf8@valencia good
Using LC_COLLATE = "ca_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_FR.utf8 good
Using LC_COLLATE = "ca_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ca_IT.utf8 good
Using LC_COLLATE = "crh_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
crh_UA.utf8 good
Using LC_COLLATE = "csb_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
csb_PL.utf8 good
Using LC_COLLATE = "cs_CZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cs_CZ.utf8 good
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "cv_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cv_RU.utf8 good
Using LC_COLLATE = "cy_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
cy_GB.utf8 good
Using LC_COLLATE = "da_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
da_DK.utf8 good
Using LC_COLLATE = "de_AT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_AT.utf8 good
Using LC_COLLATE = "de_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_BE.utf8 good
Using LC_COLLATE = "de_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_CH.utf8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (71) and strxfrm (70) orders
inconsistency between strcoll (70) and strxfrm (71) orders
inconsistency between strcoll (98) and strxfrm (97) orders
inconsistency between strcoll (97) and strxfrm (98) orders
inconsistency between strcoll (130) and strxfrm (128) orders
inconsistency between strcoll (131) and strxfrm (129) orders
inconsistency between strcoll (128) and strxfrm (130) orders
inconsistency between strcoll (129) and strxfrm (131) orders
inconsistency between strcoll (143) and strxfrm (142) orders
inconsistency between strcoll (142) and strxfrm (143) orders
inconsistency between strcoll (147) and strxfrm (146) orders
inconsistency between strcoll (146) and strxfrm (147) orders
inconsistency between strcoll (152) and strxfrm (150) orders
inconsistency between strcoll (150) and strxfrm (151) orders
inconsistency between strcoll (151) and strxfrm (152) orders
inconsistency between strcoll (155) and strxfrm (154) orders
inconsistency between strcoll (154) and strxfrm (155) orders
inconsistency between strcoll (154) and strxfrm (155) orders
inconsistency between strcoll (157) and strxfrm (156) orders
inconsistency between strcoll (156) and strxfrm (157) orders
inconsistency between strcoll (195) and strxfrm (194) orders
inconsistency between strcoll (194) and strxfrm (195) orders
inconsistency between strcoll (314) and strxfrm (313) orders
inconsistency between strcoll (315) and strxfrm (314) orders
inconsistency between strcoll (316) and strxfrm (315) orders
inconsistency between strcoll (313) and strxfrm (316) orders
inconsistency between strcoll (350) and strxfrm (349) orders
inconsistency between strcoll (351) and strxfrm (350) orders
inconsistency between strcoll (352) and strxfrm (351) orders
inconsistency between strcoll (353) and strxfrm (352) orders
inconsistency between strcoll (354) and strxfrm (353) orders
inconsistency between strcoll (349) and strxfrm (354) orders
inconsistency between strcoll (357) and strxfrm (356) orders
inconsistency between strcoll (356) and strxfrm (357) orders
inconsistency between strcoll (360) and strxfrm (359) orders
inconsistency between strcoll (359) and strxfrm (360) orders
inconsistency between strcoll (433) and strxfrm (432) orders
inconsistency between strcoll (432) and strxfrm (433) orders
inconsistency between strcoll (535) and strxfrm (534) orders
inconsistency between strcoll (534) and strxfrm (535) orders
inconsistency between strcoll (634) and strxfrm (632) orders
inconsistency between strcoll (635) and strxfrm (633) orders
inconsistency between strcoll (632) and strxfrm (634) orders
inconsistency between strcoll (633) and strxfrm (635) orders
inconsistency between strcoll (642) and strxfrm (641) orders
inconsistency between strcoll (641) and strxfrm (642) orders
inconsistency between strcoll (760) and strxfrm (758) orders
inconsistency between strcoll (758) and strxfrm (759) orders
inconsistency between strcoll (761) and strxfrm (760) orders
inconsistency between strcoll (759) and strxfrm (761) orders
inconsistency between strcoll (794) and strxfrm (793) orders
inconsistency between strcoll (795) and strxfrm (794) orders
inconsistency between strcoll (796) and strxfrm (795) orders
inconsistency between strcoll (797) and strxfrm (796) orders
inconsistency between strcoll (793) and strxfrm (797) orders
inconsistency between strcoll (799) and strxfrm (798) orders
inconsistency between strcoll (798) and strxfrm (799) orders
inconsistency between strcoll (803) and strxfrm (802) orders
inconsistency between strcoll (802) and strxfrm (803) orders
inconsistency between strcoll (880) and strxfrm (879) orders
inconsistency between strcoll (879) and strxfrm (880) orders
inconsistency between strcoll (879) and strxfrm (880) orders
inconsistency between strcoll (890) and strxfrm (889) orders
inconsistency between strcoll (889) and strxfrm (890) orders
de_DE.utf8 BAD
Using LC_COLLATE = "de_LI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_LI.utf8 good
Using LC_COLLATE = "de_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_LU.utf8 good
Using LC_COLLATE = "dv_MV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
dv_MV.utf8 good
Using LC_COLLATE = "dz_BT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
dz_BT.utf8 good
Using LC_COLLATE = "el_CY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
el_CY.utf8 good
Using LC_COLLATE = "el_GR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
el_GR.utf8 good
Using LC_COLLATE = "en_AG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZM.utf8 good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZW.utf8 good
Using LC_COLLATE = "eo.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eo.utf8 good
Using LC_COLLATE = "es_AR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_AR.utf8 good
Using LC_COLLATE = "es_BO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_BO.utf8 good
Using LC_COLLATE = "es_CL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CL.utf8 good
Using LC_COLLATE = "es_CO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CO.utf8 good
Using LC_COLLATE = "es_CR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_CR.utf8 good
Using LC_COLLATE = "es_DO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_DO.utf8 good
Using LC_COLLATE = "es_EC.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_EC.utf8 good
Using LC_COLLATE = "es_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_ES.utf8 good
Using LC_COLLATE = "es_GT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_GT.utf8 good
Using LC_COLLATE = "es_HN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_HN.utf8 good
Using LC_COLLATE = "es_MX.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_MX.utf8 good
Using LC_COLLATE = "es_NI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_NI.utf8 good
Using LC_COLLATE = "es_PA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PA.utf8 good
Using LC_COLLATE = "es_PE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PE.utf8 good
Using LC_COLLATE = "es_PR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PR.utf8 good
Using LC_COLLATE = "es_PY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_PY.utf8 good
Using LC_COLLATE = "es_SV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_SV.utf8 good
Using LC_COLLATE = "es_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_US.utf8 good
Using LC_COLLATE = "es_UY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_UY.utf8 good
Using LC_COLLATE = "es_VE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
es_VE.utf8 good
Using LC_COLLATE = "et_EE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
et_EE.utf8 good
Using LC_COLLATE = "eu_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eu_ES.utf8 good
Using LC_COLLATE = "eu_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
eu_FR.utf8 good
Using LC_COLLATE = "fa_IR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fa_IR.utf8 good
Using LC_COLLATE = "ff_SN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ff_SN.utf8 good
Using LC_COLLATE = "fi_FI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fi_FI.utf8 good
Using LC_COLLATE = "fil_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fil_PH.utf8 good
Using LC_COLLATE = "fo_FO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fo_FO.utf8 good
Using LC_COLLATE = "fr_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_BE.utf8 good
Using LC_COLLATE = "fr_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_CA.utf8 good
Using LC_COLLATE = "fr_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_CH.utf8 good
Using LC_COLLATE = "fr_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_FR.utf8 good
Using LC_COLLATE = "fr_LU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fr_LU.utf8 good
Using LC_COLLATE = "fur_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fur_IT.utf8 good
Using LC_COLLATE = "fy_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fy_DE.utf8 good
Using LC_COLLATE = "fy_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
fy_NL.utf8 good
Using LC_COLLATE = "ga_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ga_IE.utf8 good
Using LC_COLLATE = "gd_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gd_GB.utf8 good
Using LC_COLLATE = "gez_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gez_ER.utf8 good
Using LC_COLLATE = "gez_ER.utf8@abegede"
Using LC_CTYPE = "en_US.UTF-8"
gez_ER.utf8@abegede good
Using LC_COLLATE = "gez_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gez_ET.utf8 good
Using LC_COLLATE = "gez_ET.utf8@abegede"
Using LC_CTYPE = "en_US.UTF-8"
gez_ET.utf8@abegede good
Using LC_COLLATE = "gl_ES.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gl_ES.utf8 good
Using LC_COLLATE = "gu_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gu_IN.utf8 good
Using LC_COLLATE = "gv_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
gv_GB.utf8 good
Using LC_COLLATE = "ha_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ha_NG.utf8 good
Using LC_COLLATE = "he_IL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
he_IL.utf8 good
Using LC_COLLATE = "hi_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hi_IN.utf8 good
Using LC_COLLATE = "hne_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hne_IN.utf8 good
Using LC_COLLATE = "hr_HR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hr_HR.utf8 good
Using LC_COLLATE = "hsb_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hsb_DE.utf8 good
Using LC_COLLATE = "ht_HT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ht_HT.utf8 good
Using LC_COLLATE = "hu_HU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hu_HU.utf8 good
Using LC_COLLATE = "hy_AM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
hy_AM.utf8 good
Using LC_COLLATE = "ia.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ia.utf8 good
Using LC_COLLATE = "id_ID.utf8"
Using LC_CTYPE = "en_US.UTF-8"
id_ID.utf8 good
Using LC_COLLATE = "ig_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ig_NG.utf8 good
Using LC_COLLATE = "ik_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ik_CA.utf8 good
Using LC_COLLATE = "is_IS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
is_IS.utf8 good
Using LC_COLLATE = "it_CH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
it_CH.utf8 good
Using LC_COLLATE = "it_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
it_IT.utf8 good
Using LC_COLLATE = "iu_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
iu_CA.utf8 good
Using LC_COLLATE = "iw_IL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
iw_IL.utf8 good
Using LC_COLLATE = "ja_JP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ja_JP.utf8 good
Using LC_COLLATE = "ka_GE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ka_GE.utf8 good
Using LC_COLLATE = "kk_KZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kk_KZ.utf8 good
Using LC_COLLATE = "kl_GL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kl_GL.utf8 good
Using LC_COLLATE = "km_KH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
km_KH.utf8 good
Using LC_COLLATE = "kn_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kn_IN.utf8 good
Using LC_COLLATE = "kok_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kok_IN.utf8 good
Using LC_COLLATE = "ko_KR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ko_KR.utf8 good
Using LC_COLLATE = "ks_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ks_IN.utf8 good
Using LC_COLLATE = "ks_IN.utf8@devanagari"
Using LC_CTYPE = "en_US.UTF-8"
ks_IN.utf8@devanagari good
Using LC_COLLATE = "ku_TR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ku_TR.utf8 good
Using LC_COLLATE = "kw_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
kw_GB.utf8 good
Using LC_COLLATE = "ky_KG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ky_KG.utf8 good
Using LC_COLLATE = "lg_UG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lg_UG.utf8 good
Using LC_COLLATE = "li_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
li_BE.utf8 good
Using LC_COLLATE = "li_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
li_NL.utf8 good
Using LC_COLLATE = "lo_LA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lo_LA.utf8 good
Using LC_COLLATE = "lt_LT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lt_LT.utf8 good
Using LC_COLLATE = "lv_LV.utf8"
Using LC_CTYPE = "en_US.UTF-8"
lv_LV.utf8 good
Using LC_COLLATE = "mai_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mai_IN.utf8 good
Using LC_COLLATE = "mg_MG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mg_MG.utf8 good
Using LC_COLLATE = "mi_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mi_NZ.utf8 good
Using LC_COLLATE = "mk_MK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mk_MK.utf8 good
Using LC_COLLATE = "ml_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ml_IN.utf8 good
Using LC_COLLATE = "mn_MN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mn_MN.utf8 good
Using LC_COLLATE = "mr_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mr_IN.utf8 good
Using LC_COLLATE = "ms_MY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ms_MY.utf8 good
Using LC_COLLATE = "mt_MT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
mt_MT.utf8 good
Using LC_COLLATE = "my_MM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
my_MM.utf8 good
Using LC_COLLATE = "nan_TW.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
nan_TW.utf8@latin good
Using LC_COLLATE = "nb_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nb_NO.utf8 good
Using LC_COLLATE = "nds_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nds_DE.utf8 good
Using LC_COLLATE = "nds_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nds_NL.utf8 good
Using LC_COLLATE = "ne_NP.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ne_NP.utf8 good
Using LC_COLLATE = "nl_AW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_AW.utf8 good
Using LC_COLLATE = "nl_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_BE.utf8 good
Using LC_COLLATE = "nl_NL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nl_NL.utf8 good
Using LC_COLLATE = "nn_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nn_NO.utf8 good
Using LC_COLLATE = "nr_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nr_ZA.utf8 good
Using LC_COLLATE = "nso_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
nso_ZA.utf8 good
Using LC_COLLATE = "oc_FR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
oc_FR.utf8 good
Using LC_COLLATE = "om_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
om_ET.utf8 good
Using LC_COLLATE = "om_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
om_KE.utf8 good
Using LC_COLLATE = "or_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
or_IN.utf8 good
Using LC_COLLATE = "os_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (936) and strxfrm (935) orders
inconsistency between strcoll (935) and strxfrm (936) orders
os_RU.utf8 BAD
Using LC_COLLATE = "pa_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pa_IN.utf8 good
Using LC_COLLATE = "pap_AN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pap_AN.utf8 good
Using LC_COLLATE = "pa_PK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pa_PK.utf8 good
Using LC_COLLATE = "pl_PL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pl_PL.utf8 good
Using LC_COLLATE = "ps_AF.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ps_AF.utf8 good
Using LC_COLLATE = "pt_BR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pt_BR.utf8 good
Using LC_COLLATE = "pt_PT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
pt_PT.utf8 good
Using LC_COLLATE = "ro_RO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ro_RO.utf8 good
Using LC_COLLATE = "ru_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ru_RU.utf8 good
Using LC_COLLATE = "ru_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ru_UA.utf8 good
Using LC_COLLATE = "rw_RW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
rw_RW.utf8 good
Using LC_COLLATE = "sa_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sa_IN.utf8 good
Using LC_COLLATE = "sc_IT.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sc_IT.utf8 good
Using LC_COLLATE = "sd_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sd_IN.utf8 good
Using LC_COLLATE = "sd_IN.utf8@devanagari"
Using LC_CTYPE = "en_US.UTF-8"
sd_IN.utf8@devanagari good
Using LC_COLLATE = "se_NO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
se_NO.utf8 good
Using LC_COLLATE = "shs_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
strxfrm() result for 18-length string exceeded 100 bytes
shs_CA.utf8 BAD
Using LC_COLLATE = "sid_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sid_ET.utf8 good
Using LC_COLLATE = "si_LK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
si_LK.utf8 good
Using LC_COLLATE = "sk_SK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sk_SK.utf8 good
Using LC_COLLATE = "sl_SI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sl_SI.utf8 good
Using LC_COLLATE = "so_DJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_DJ.utf8 good
Using LC_COLLATE = "so_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_ET.utf8 good
Using LC_COLLATE = "so_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_KE.utf8 good
Using LC_COLLATE = "so_SO.utf8"
Using LC_CTYPE = "en_US.UTF-8"
so_SO.utf8 good
Using LC_COLLATE = "sq_AL.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sq_AL.utf8 good
Using LC_COLLATE = "sq_MK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sq_MK.utf8 good
Using LC_COLLATE = "sr_ME.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sr_ME.utf8 good
Using LC_COLLATE = "sr_RS.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sr_RS.utf8 good
Using LC_COLLATE = "sr_RS.utf8@latin"
Using LC_CTYPE = "en_US.UTF-8"
sr_RS.utf8@latin good
Using LC_COLLATE = "ss_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ss_ZA.utf8 good
Using LC_COLLATE = "st_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
st_ZA.utf8 good
Using LC_COLLATE = "sv_FI.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sv_FI.utf8 good
Using LC_COLLATE = "sv_SE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sv_SE.utf8 good
Using LC_COLLATE = "sw_KE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sw_KE.utf8 good
Using LC_COLLATE = "sw_TZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
sw_TZ.utf8 good
Using LC_COLLATE = "ta_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ta_IN.utf8 good
Using LC_COLLATE = "te_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
te_IN.utf8 good
Using LC_COLLATE = "tg_TJ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tg_TJ.utf8 good
Using LC_COLLATE = "th_TH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
th_TH.utf8 good
Using LC_COLLATE = "ti_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ti_ER.utf8 good
Using LC_COLLATE = "ti_ET.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ti_ET.utf8 good
Using LC_COLLATE = "tig_ER.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tig_ER.utf8 good
Using LC_COLLATE = "tk_TM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tk_TM.utf8 good
Using LC_COLLATE = "tl_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tl_PH.utf8 good
Using LC_COLLATE = "tn_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tn_ZA.utf8 good
Using LC_COLLATE = "tr_CY.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tr_CY.utf8 good
Using LC_COLLATE = "tr_TR.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tr_TR.utf8 good
Using LC_COLLATE = "ts_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ts_ZA.utf8 good
Using LC_COLLATE = "tt_RU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
tt_RU.utf8 good
Using LC_COLLATE = "tt_RU.utf8@iqtelif"
Using LC_CTYPE = "en_US.UTF-8"
tt_RU.utf8@iqtelif good
Using LC_COLLATE = "ug_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ug_CN.utf8 good
Using LC_COLLATE = "uk_UA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
uk_UA.utf8 good
Using LC_COLLATE = "ur_PK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ur_PK.utf8 good
Using LC_COLLATE = "uz_UZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
uz_UZ.utf8 good
Using LC_COLLATE = "uz_UZ.utf8@cyrillic"
Using LC_CTYPE = "en_US.UTF-8"
uz_UZ.utf8@cyrillic good
Using LC_COLLATE = "ve_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
ve_ZA.utf8 good
Using LC_COLLATE = "vi_VN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
vi_VN.utf8 good
Using LC_COLLATE = "wa_BE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wa_BE.utf8 good
Using LC_COLLATE = "wo_SN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
wo_SN.utf8 good
Using LC_COLLATE = "xh_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
xh_ZA.utf8 good
Using LC_COLLATE = "yi_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yi_US.utf8 good
Using LC_COLLATE = "yo_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
yo_NG.utf8 good
Using LC_COLLATE = "zh_CN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_CN.utf8 good
Using LC_COLLATE = "zh_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_HK.utf8 good
Using LC_COLLATE = "zh_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_SG.utf8 good
Using LC_COLLATE = "zh_TW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zh_TW.utf8 good
Using LC_COLLATE = "zu_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
zu_ZA.utf8 good
Thanks!
Stephen
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
From IRC (not mine), "debian testing, glibc 2.22-3":
Using LC_COLLATE = "aa_DJ.utf8"
Using LC_CTYPE = "aa_DJ.utf8"
aa_DJ.utf8 good
Using LC_COLLATE = "aa_ER"
Using LC_CTYPE = "aa_ER"
aa_ER good
Using LC_COLLATE = "aa_ER@saaho"
Using LC_CTYPE = "aa_ER@saaho"
aa_ER@saaho good
Using LC_COLLATE = "aa_ET"
Using LC_CTYPE = "aa_ET"
aa_ET good
Using LC_COLLATE = "af_ZA.utf8"
Using LC_CTYPE = "af_ZA.utf8"
af_ZA.utf8 good
Using LC_COLLATE = "ak_GH"
Using LC_CTYPE = "ak_GH"
ak_GH good
Using LC_COLLATE = "am_ET"
Using LC_CTYPE = "am_ET"
am_ET good
Using LC_COLLATE = "an_ES.utf8"
Using LC_CTYPE = "an_ES.utf8"
an_ES.utf8 good
Using LC_COLLATE = "anp_IN"
Using LC_CTYPE = "anp_IN"
anp_IN good
Using LC_COLLATE = "ar_AE.utf8"
Using LC_CTYPE = "ar_AE.utf8"
ar_AE.utf8 good
Using LC_COLLATE = "ar_BH.utf8"
Using LC_CTYPE = "ar_BH.utf8"
ar_BH.utf8 good
Using LC_COLLATE = "ar_DZ.utf8"
Using LC_CTYPE = "ar_DZ.utf8"
ar_DZ.utf8 good
Using LC_COLLATE = "ar_EG.utf8"
Using LC_CTYPE = "ar_EG.utf8"
ar_EG.utf8 good
Using LC_COLLATE = "ar_IN"
Using LC_CTYPE = "ar_IN"
ar_IN good
Using LC_COLLATE = "ar_IQ.utf8"
Using LC_CTYPE = "ar_IQ.utf8"
ar_IQ.utf8 good
Using LC_COLLATE = "ar_JO.utf8"
Using LC_CTYPE = "ar_JO.utf8"
ar_JO.utf8 good
Using LC_COLLATE = "ar_KW.utf8"
Using LC_CTYPE = "ar_KW.utf8"
ar_KW.utf8 good
Using LC_COLLATE = "ar_LB.utf8"
Using LC_CTYPE = "ar_LB.utf8"
ar_LB.utf8 good
Using LC_COLLATE = "ar_LY.utf8"
Using LC_CTYPE = "ar_LY.utf8"
ar_LY.utf8 good
Using LC_COLLATE = "ar_MA.utf8"
Using LC_CTYPE = "ar_MA.utf8"
ar_MA.utf8 good
Using LC_COLLATE = "ar_OM.utf8"
Using LC_CTYPE = "ar_OM.utf8"
ar_OM.utf8 good
Using LC_COLLATE = "ar_QA.utf8"
Using LC_CTYPE = "ar_QA.utf8"
ar_QA.utf8 good
Using LC_COLLATE = "ar_SA.utf8"
Using LC_CTYPE = "ar_SA.utf8"
ar_SA.utf8 good
Using LC_COLLATE = "ar_SD.utf8"
Using LC_CTYPE = "ar_SD.utf8"
ar_SD.utf8 good
Using LC_COLLATE = "ar_SS"
Using LC_CTYPE = "ar_SS"
ar_SS good
Using LC_COLLATE = "ar_SY.utf8"
Using LC_CTYPE = "ar_SY.utf8"
ar_SY.utf8 good
Using LC_COLLATE = "ar_TN.utf8"
Using LC_CTYPE = "ar_TN.utf8"
ar_TN.utf8 good
Using LC_COLLATE = "ar_YE.utf8"
Using LC_CTYPE = "ar_YE.utf8"
ar_YE.utf8 good
Using LC_COLLATE = "as_IN"
Using LC_CTYPE = "as_IN"
as_IN good
Using LC_COLLATE = "ast_ES.utf8"
Using LC_CTYPE = "ast_ES.utf8"
ast_ES.utf8 good
Using LC_COLLATE = "ayc_PE"
Using LC_CTYPE = "ayc_PE"
ayc_PE good
Using LC_COLLATE = "az_AZ"
Using LC_CTYPE = "az_AZ"
az_AZ good
Using LC_COLLATE = "be_BY@latin"
Using LC_CTYPE = "be_BY@latin"
be_BY@latin good
Using LC_COLLATE = "be_BY.utf8"
Using LC_CTYPE = "be_BY.utf8"
be_BY.utf8 good
Using LC_COLLATE = "bem_ZM"
Using LC_CTYPE = "bem_ZM"
bem_ZM good
Using LC_COLLATE = "ber_DZ"
Using LC_CTYPE = "ber_DZ"
ber_DZ good
Using LC_COLLATE = "ber_MA"
Using LC_CTYPE = "ber_MA"
ber_MA good
Using LC_COLLATE = "bg_BG.utf8"
Using LC_CTYPE = "bg_BG.utf8"
bg_BG.utf8 good
Using LC_COLLATE = "bhb_IN.utf8"
Using LC_CTYPE = "bhb_IN.utf8"
bhb_IN.utf8 good
Using LC_COLLATE = "bho_IN"
Using LC_CTYPE = "bho_IN"
bho_IN good
Using LC_COLLATE = "bn_BD"
Using LC_CTYPE = "bn_BD"
bn_BD good
Using LC_COLLATE = "bn_IN"
Using LC_CTYPE = "bn_IN"
bn_IN good
Using LC_COLLATE = "bo_CN"
Using LC_CTYPE = "bo_CN"
bo_CN good
Using LC_COLLATE = "bo_IN"
Using LC_CTYPE = "bo_IN"
bo_IN good
Using LC_COLLATE = "br_FR.utf8"
Using LC_CTYPE = "br_FR.utf8"
br_FR.utf8 good
Using LC_COLLATE = "brx_IN"
Using LC_CTYPE = "brx_IN"
brx_IN good
Using LC_COLLATE = "bs_BA.utf8"
Using LC_CTYPE = "bs_BA.utf8"
bs_BA.utf8 good
Using LC_COLLATE = "byn_ER"
Using LC_CTYPE = "byn_ER"
byn_ER good
Using LC_COLLATE = "ca_AD.utf8"
Using LC_CTYPE = "ca_AD.utf8"
ca_AD.utf8 good
Using LC_COLLATE = "ca_ES.utf8"
Using LC_CTYPE = "ca_ES.utf8"
ca_ES.utf8 good
Using LC_COLLATE = "ca_ES.utf8@valencia"
Using LC_CTYPE = "ca_ES.utf8@valencia"
ca_ES.utf8@valencia good
Using LC_COLLATE = "ca_FR.utf8"
Using LC_CTYPE = "ca_FR.utf8"
ca_FR.utf8 good
Using LC_COLLATE = "ca_IT.utf8"
Using LC_CTYPE = "ca_IT.utf8"
ca_IT.utf8 good
Using LC_COLLATE = "ce_RU"
Using LC_CTYPE = "ce_RU"
ce_RU good
Using LC_COLLATE = "cmn_TW"
Using LC_CTYPE = "cmn_TW"
cmn_TW good
Using LC_COLLATE = "crh_UA"
Using LC_CTYPE = "crh_UA"
crh_UA good
Using LC_COLLATE = "csb_PL"
Using LC_CTYPE = "csb_PL"
csb_PL good
Using LC_COLLATE = "cs_CZ.utf8"
Using LC_CTYPE = "cs_CZ.utf8"
cs_CZ.utf8 good
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "C.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "cv_RU"
Using LC_CTYPE = "cv_RU"
cv_RU good
Using LC_COLLATE = "cy_GB.utf8"
Using LC_CTYPE = "cy_GB.utf8"
cy_GB.utf8 good
Using LC_COLLATE = "da_DK.utf8"
Using LC_CTYPE = "da_DK.utf8"
da_DK.utf8 good
Using LC_COLLATE = "de_AT.utf8"
Using LC_CTYPE = "de_AT.utf8"
de_AT.utf8 good
Using LC_COLLATE = "de_BE.utf8"
Using LC_CTYPE = "de_BE.utf8"
de_BE.utf8 good
Using LC_COLLATE = "de_CH.utf8"
Using LC_CTYPE = "de_CH.utf8"
de_CH.utf8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "de_DE.utf8"
de_DE.utf8 good
Using LC_COLLATE = "de_LI.utf8"
Using LC_CTYPE = "de_LI.utf8"
de_LI.utf8 good
Using LC_COLLATE = "de_LU.utf8"
Using LC_CTYPE = "de_LU.utf8"
de_LU.utf8 good
Using LC_COLLATE = "doi_IN"
Using LC_CTYPE = "doi_IN"
doi_IN good
Using LC_COLLATE = "dv_MV"
Using LC_CTYPE = "dv_MV"
dv_MV good
Using LC_COLLATE = "dz_BT"
Using LC_CTYPE = "dz_BT"
dz_BT good
Using LC_COLLATE = "el_CY.utf8"
Using LC_CTYPE = "el_CY.utf8"
el_CY.utf8 good
Using LC_COLLATE = "el_GR.utf8"
Using LC_CTYPE = "el_GR.utf8"
el_GR.utf8 good
Using LC_COLLATE = "en_AG"
Using LC_CTYPE = "en_AG"
en_AG good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_AU.utf8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_BW.utf8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_CA.utf8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_DK.utf8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_GB.utf8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_HK.utf8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_IE.utf8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN"
Using LC_CTYPE = "en_IN"
en_IN good
Using LC_COLLATE = "en_NG"
Using LC_CTYPE = "en_NG"
en_NG good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_NZ.utf8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_PH.utf8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_SG.utf8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.utf8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_ZA.utf8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM"
Using LC_CTYPE = "en_ZM"
en_ZM good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_ZW.utf8"
en_ZW.utf8 good
Using LC_COLLATE = "eo.utf8"
Using LC_CTYPE = "eo.utf8"
eo.utf8 good
Using LC_COLLATE = "es_AR.utf8"
Using LC_CTYPE = "es_AR.utf8"
es_AR.utf8 good
Using LC_COLLATE = "es_BO.utf8"
Using LC_CTYPE = "es_BO.utf8"
es_BO.utf8 good
Using LC_COLLATE = "es_CL.utf8"
Using LC_CTYPE = "es_CL.utf8"
es_CL.utf8 good
Using LC_COLLATE = "es_CO.utf8"
Using LC_CTYPE = "es_CO.utf8"
es_CO.utf8 good
Using LC_COLLATE = "es_CR.utf8"
Using LC_CTYPE = "es_CR.utf8"
es_CR.utf8 good
Using LC_COLLATE = "es_CU"
Using LC_CTYPE = "es_CU"
es_CU good
Using LC_COLLATE = "es_DO.utf8"
Using LC_CTYPE = "es_DO.utf8"
es_DO.utf8 good
Using LC_COLLATE = "es_EC.utf8"
Using LC_CTYPE = "es_EC.utf8"
es_EC.utf8 good
Using LC_COLLATE = "es_ES.utf8"
Using LC_CTYPE = "es_ES.utf8"
es_ES.utf8 good
Using LC_COLLATE = "es_GT.utf8"
Using LC_CTYPE = "es_GT.utf8"
es_GT.utf8 good
Using LC_COLLATE = "es_HN.utf8"
Using LC_CTYPE = "es_HN.utf8"
es_HN.utf8 good
Using LC_COLLATE = "es_MX.utf8"
Using LC_CTYPE = "es_MX.utf8"
es_MX.utf8 good
Using LC_COLLATE = "es_NI.utf8"
Using LC_CTYPE = "es_NI.utf8"
es_NI.utf8 good
Using LC_COLLATE = "es_PA.utf8"
Using LC_CTYPE = "es_PA.utf8"
es_PA.utf8 good
Using LC_COLLATE = "es_PE.utf8"
Using LC_CTYPE = "es_PE.utf8"
es_PE.utf8 good
Using LC_COLLATE = "es_PR.utf8"
Using LC_CTYPE = "es_PR.utf8"
es_PR.utf8 good
Using LC_COLLATE = "es_PY.utf8"
Using LC_CTYPE = "es_PY.utf8"
es_PY.utf8 good
Using LC_COLLATE = "es_SV.utf8"
Using LC_CTYPE = "es_SV.utf8"
es_SV.utf8 good
Using LC_COLLATE = "es_US.utf8"
Using LC_CTYPE = "es_US.utf8"
es_US.utf8 good
Using LC_COLLATE = "es_UY.utf8"
Using LC_CTYPE = "es_UY.utf8"
es_UY.utf8 good
Using LC_COLLATE = "es_VE.utf8"
Using LC_CTYPE = "es_VE.utf8"
es_VE.utf8 good
Using LC_COLLATE = "et_EE.utf8"
Using LC_CTYPE = "et_EE.utf8"
et_EE.utf8 good
Using LC_COLLATE = "eu_ES.utf8"
Using LC_CTYPE = "eu_ES.utf8"
eu_ES.utf8 good
Using LC_COLLATE = "eu_FR.utf8"
Using LC_CTYPE = "eu_FR.utf8"
eu_FR.utf8 good
Using LC_COLLATE = "fa_IR"
Using LC_CTYPE = "fa_IR"
fa_IR good
Using LC_COLLATE = "ff_SN"
Using LC_CTYPE = "ff_SN"
ff_SN good
Using LC_COLLATE = "fi_FI.utf8"
Using LC_CTYPE = "fi_FI.utf8"
fi_FI.utf8 good
Using LC_COLLATE = "fil_PH"
Using LC_CTYPE = "fil_PH"
fil_PH good
Using LC_COLLATE = "fo_FO.utf8"
Using LC_CTYPE = "fo_FO.utf8"
fo_FO.utf8 good
Using LC_COLLATE = "fr_BE.utf8"
Using LC_CTYPE = "fr_BE.utf8"
fr_BE.utf8 good
Using LC_COLLATE = "fr_CA.utf8"
Using LC_CTYPE = "fr_CA.utf8"
fr_CA.utf8 good
Using LC_COLLATE = "fr_CH.utf8"
Using LC_CTYPE = "fr_CH.utf8"
fr_CH.utf8 good
Using LC_COLLATE = "fr_FR.utf8"
Using LC_CTYPE = "fr_FR.utf8"
fr_FR.utf8 good
Using LC_COLLATE = "fr_LU.utf8"
Using LC_CTYPE = "fr_LU.utf8"
fr_LU.utf8 good
Using LC_COLLATE = "fur_IT"
Using LC_CTYPE = "fur_IT"
fur_IT good
Using LC_COLLATE = "fy_DE"
Using LC_CTYPE = "fy_DE"
fy_DE good
Using LC_COLLATE = "fy_NL"
Using LC_CTYPE = "fy_NL"
fy_NL good
Using LC_COLLATE = "ga_IE.utf8"
Using LC_CTYPE = "ga_IE.utf8"
ga_IE.utf8 good
Using LC_COLLATE = "gd_GB.utf8"
Using LC_CTYPE = "gd_GB.utf8"
gd_GB.utf8 good
Using LC_COLLATE = "gez_ER"
Using LC_CTYPE = "gez_ER"
gez_ER good
Using LC_COLLATE = "gez_ER@abegede"
Using LC_CTYPE = "gez_ER@abegede"
gez_ER@abegede good
Using LC_COLLATE = "gez_ET"
Using LC_CTYPE = "gez_ET"
gez_ET good
Using LC_COLLATE = "gez_ET@abegede"
Using LC_CTYPE = "gez_ET@abegede"
gez_ET@abegede good
Using LC_COLLATE = "gl_ES.utf8"
Using LC_CTYPE = "gl_ES.utf8"
gl_ES.utf8 good
Using LC_COLLATE = "gu_IN"
Using LC_CTYPE = "gu_IN"
gu_IN good
Using LC_COLLATE = "gv_GB.utf8"
Using LC_CTYPE = "gv_GB.utf8"
gv_GB.utf8 good
Using LC_COLLATE = "hak_TW"
Using LC_CTYPE = "hak_TW"
hak_TW good
Using LC_COLLATE = "ha_NG"
Using LC_CTYPE = "ha_NG"
ha_NG good
Using LC_COLLATE = "he_IL.utf8"
Using LC_CTYPE = "he_IL.utf8"
he_IL.utf8 good
Using LC_COLLATE = "hi_IN"
Using LC_CTYPE = "hi_IN"
hi_IN good
Using LC_COLLATE = "hne_IN"
Using LC_CTYPE = "hne_IN"
hne_IN good
Using LC_COLLATE = "hr_HR.utf8"
Using LC_CTYPE = "hr_HR.utf8"
hr_HR.utf8 good
Using LC_COLLATE = "hsb_DE.utf8"
Using LC_CTYPE = "hsb_DE.utf8"
hsb_DE.utf8 good
Using LC_COLLATE = "ht_HT"
Using LC_CTYPE = "ht_HT"
ht_HT good
Using LC_COLLATE = "hu_HU.utf8"
Using LC_CTYPE = "hu_HU.utf8"
hu_HU.utf8 good
Using LC_COLLATE = "hy_AM"
Using LC_CTYPE = "hy_AM"
hy_AM good
Using LC_COLLATE = "ia_FR"
Using LC_CTYPE = "ia_FR"
ia_FR good
Using LC_COLLATE = "id_ID.utf8"
Using LC_CTYPE = "id_ID.utf8"
id_ID.utf8 good
Using LC_COLLATE = "ig_NG"
Using LC_CTYPE = "ig_NG"
ig_NG good
Using LC_COLLATE = "ik_CA"
Using LC_CTYPE = "ik_CA"
ik_CA good
Using LC_COLLATE = "is_IS.utf8"
Using LC_CTYPE = "is_IS.utf8"
is_IS.utf8 good
Using LC_COLLATE = "it_CH.utf8"
Using LC_CTYPE = "it_CH.utf8"
it_CH.utf8 good
Using LC_COLLATE = "it_IT.utf8"
Using LC_CTYPE = "it_IT.utf8"
it_IT.utf8 good
Using LC_COLLATE = "iu_CA"
Using LC_CTYPE = "iu_CA"
iu_CA good
Using LC_COLLATE = "iw_IL.utf8"
Using LC_CTYPE = "iw_IL.utf8"
iw_IL.utf8 good
Using LC_COLLATE = "ja_JP.utf8"
Using LC_CTYPE = "ja_JP.utf8"
ja_JP.utf8 good
Using LC_COLLATE = "ka_GE.utf8"
Using LC_CTYPE = "ka_GE.utf8"
ka_GE.utf8 good
Using LC_COLLATE = "kk_KZ.utf8"
Using LC_CTYPE = "kk_KZ.utf8"
kk_KZ.utf8 good
Using LC_COLLATE = "kl_GL.utf8"
Using LC_CTYPE = "kl_GL.utf8"
kl_GL.utf8 good
Using LC_COLLATE = "km_KH"
Using LC_CTYPE = "km_KH"
km_KH good
Using LC_COLLATE = "kn_IN"
Using LC_CTYPE = "kn_IN"
kn_IN good
Using LC_COLLATE = "kok_IN"
Using LC_CTYPE = "kok_IN"
kok_IN good
Using LC_COLLATE = "ko_KR.utf8"
Using LC_CTYPE = "ko_KR.utf8"
ko_KR.utf8 good
Using LC_COLLATE = "ks_IN"
Using LC_CTYPE = "ks_IN"
ks_IN good
Using LC_COLLATE = "ks_IN@devanagari"
Using LC_CTYPE = "ks_IN@devanagari"
ks_IN@devanagari good
Using LC_COLLATE = "ku_TR.utf8"
Using LC_CTYPE = "ku_TR.utf8"
ku_TR.utf8 good
Using LC_COLLATE = "kw_GB.utf8"
Using LC_CTYPE = "kw_GB.utf8"
kw_GB.utf8 good
Using LC_COLLATE = "ky_KG"
Using LC_CTYPE = "ky_KG"
ky_KG good
Using LC_COLLATE = "lb_LU"
Using LC_CTYPE = "lb_LU"
lb_LU good
Using LC_COLLATE = "lg_UG.utf8"
Using LC_CTYPE = "lg_UG.utf8"
lg_UG.utf8 good
Using LC_COLLATE = "li_BE"
Using LC_CTYPE = "li_BE"
li_BE good
Using LC_COLLATE = "lij_IT"
Using LC_CTYPE = "lij_IT"
lij_IT good
Using LC_COLLATE = "li_NL"
Using LC_CTYPE = "li_NL"
li_NL good
Using LC_COLLATE = "lo_LA"
Using LC_CTYPE = "lo_LA"
lo_LA good
Using LC_COLLATE = "lt_LT.utf8"
Using LC_CTYPE = "lt_LT.utf8"
lt_LT.utf8 good
Using LC_COLLATE = "lv_LV.utf8"
Using LC_CTYPE = "lv_LV.utf8"
lv_LV.utf8 good
Using LC_COLLATE = "lzh_TW"
Using LC_CTYPE = "lzh_TW"
lzh_TW good
Using LC_COLLATE = "mag_IN"
Using LC_CTYPE = "mag_IN"
mag_IN good
Using LC_COLLATE = "mai_IN"
Using LC_CTYPE = "mai_IN"
mai_IN good
Using LC_COLLATE = "mg_MG.utf8"
Using LC_CTYPE = "mg_MG.utf8"
mg_MG.utf8 good
Using LC_COLLATE = "mhr_RU"
Using LC_CTYPE = "mhr_RU"
mhr_RU good
Using LC_COLLATE = "mi_NZ.utf8"
Using LC_CTYPE = "mi_NZ.utf8"
mi_NZ.utf8 good
Using LC_COLLATE = "mk_MK.utf8"
Using LC_CTYPE = "mk_MK.utf8"
mk_MK.utf8 good
Using LC_COLLATE = "ml_IN"
Using LC_CTYPE = "ml_IN"
ml_IN good
Using LC_COLLATE = "mni_IN"
Using LC_CTYPE = "mni_IN"
mni_IN good
Using LC_COLLATE = "mn_MN"
Using LC_CTYPE = "mn_MN"
mn_MN good
Using LC_COLLATE = "mr_IN"
Using LC_CTYPE = "mr_IN"
mr_IN good
Using LC_COLLATE = "ms_MY.utf8"
Using LC_CTYPE = "ms_MY.utf8"
ms_MY.utf8 good
Using LC_COLLATE = "mt_MT.utf8"
Using LC_CTYPE = "mt_MT.utf8"
mt_MT.utf8 good
Using LC_COLLATE = "my_MM"
Using LC_CTYPE = "my_MM"
my_MM good
Using LC_COLLATE = "nan_TW"
Using LC_CTYPE = "nan_TW"
nan_TW good
Using LC_COLLATE = "nan_TW@latin"
Using LC_CTYPE = "nan_TW@latin"
nan_TW@latin good
Using LC_COLLATE = "nb_NO.utf8"
Using LC_CTYPE = "nb_NO.utf8"
nb_NO.utf8 good
Using LC_COLLATE = "nds_DE"
Using LC_CTYPE = "nds_DE"
nds_DE good
Using LC_COLLATE = "nds_NL"
Using LC_CTYPE = "nds_NL"
nds_NL good
Using LC_COLLATE = "ne_NP"
Using LC_CTYPE = "ne_NP"
ne_NP good
Using LC_COLLATE = "nhn_MX"
Using LC_CTYPE = "nhn_MX"
nhn_MX good
Using LC_COLLATE = "niu_NU"
Using LC_CTYPE = "niu_NU"
niu_NU good
Using LC_COLLATE = "niu_NZ"
Using LC_CTYPE = "niu_NZ"
niu_NZ good
Using LC_COLLATE = "nl_AW"
Using LC_CTYPE = "nl_AW"
nl_AW good
Using LC_COLLATE = "nl_BE.utf8"
Using LC_CTYPE = "nl_BE.utf8"
nl_BE.utf8 good
Using LC_COLLATE = "nl_NL.utf8"
Using LC_CTYPE = "nl_NL.utf8"
nl_NL.utf8 good
Using LC_COLLATE = "nn_NO.utf8"
Using LC_CTYPE = "nn_NO.utf8"
nn_NO.utf8 good
Using LC_COLLATE = "nr_ZA"
Using LC_CTYPE = "nr_ZA"
nr_ZA good
Using LC_COLLATE = "nso_ZA"
Using LC_CTYPE = "nso_ZA"
nso_ZA good
Using LC_COLLATE = "oc_FR.utf8"
Using LC_CTYPE = "oc_FR.utf8"
oc_FR.utf8 good
Using LC_COLLATE = "om_ET"
Using LC_CTYPE = "om_ET"
om_ET good
Using LC_COLLATE = "om_KE.utf8"
Using LC_CTYPE = "om_KE.utf8"
om_KE.utf8 good
Using LC_COLLATE = "or_IN"
Using LC_CTYPE = "or_IN"
or_IN good
Using LC_COLLATE = "os_RU"
Using LC_CTYPE = "os_RU"
os_RU good
Using LC_COLLATE = "pa_IN"
Using LC_CTYPE = "pa_IN"
pa_IN good
Using LC_COLLATE = "pap_AN"
Using LC_CTYPE = "pap_AN"
pap_AN good
Using LC_COLLATE = "pap_AW"
Using LC_CTYPE = "pap_AW"
pap_AW good
Using LC_COLLATE = "pap_CW"
Using LC_CTYPE = "pap_CW"
pap_CW good
Using LC_COLLATE = "pa_PK"
Using LC_CTYPE = "pa_PK"
pa_PK good
Using LC_COLLATE = "pl_PL.utf8"
Using LC_CTYPE = "pl_PL.utf8"
pl_PL.utf8 good
Using LC_COLLATE = "ps_AF"
Using LC_CTYPE = "ps_AF"
ps_AF good
Using LC_COLLATE = "pt_BR.utf8"
Using LC_CTYPE = "pt_BR.utf8"
pt_BR.utf8 good
Using LC_COLLATE = "pt_PT.utf8"
Using LC_CTYPE = "pt_PT.utf8"
pt_PT.utf8 good
Using LC_COLLATE = "quz_PE"
Using LC_CTYPE = "quz_PE"
quz_PE good
Using LC_COLLATE = "raj_IN"
Using LC_CTYPE = "raj_IN"
raj_IN good
Using LC_COLLATE = "ro_RO.utf8"
Using LC_CTYPE = "ro_RO.utf8"
ro_RO.utf8 good
Using LC_COLLATE = "ru_RU.utf8"
Using LC_CTYPE = "ru_RU.utf8"
ru_RU.utf8 good
Using LC_COLLATE = "ru_UA.utf8"
Using LC_CTYPE = "ru_UA.utf8"
ru_UA.utf8 good
Using LC_COLLATE = "rw_RW"
Using LC_CTYPE = "rw_RW"
rw_RW good
Using LC_COLLATE = "sa_IN"
Using LC_CTYPE = "sa_IN"
sa_IN good
Using LC_COLLATE = "sat_IN"
Using LC_CTYPE = "sat_IN"
sat_IN good
Using LC_COLLATE = "sc_IT"
Using LC_CTYPE = "sc_IT"
sc_IT good
Using LC_COLLATE = "sd_IN"
Using LC_CTYPE = "sd_IN"
sd_IN good
Using LC_COLLATE = "sd_IN@devanagari"
Using LC_CTYPE = "sd_IN@devanagari"
sd_IN@devanagari good
Using LC_COLLATE = "se_NO"
Using LC_CTYPE = "se_NO"
se_NO good
Using LC_COLLATE = "shs_CA"
Using LC_CTYPE = "shs_CA"
shs_CA good
Using LC_COLLATE = "sid_ET"
Using LC_CTYPE = "sid_ET"
sid_ET good
Using LC_COLLATE = "si_LK"
Using LC_CTYPE = "si_LK"
si_LK good
Using LC_COLLATE = "sk_SK.utf8"
Using LC_CTYPE = "sk_SK.utf8"
sk_SK.utf8 good
Using LC_COLLATE = "sl_SI.utf8"
Using LC_CTYPE = "sl_SI.utf8"
sl_SI.utf8 good
Using LC_COLLATE = "so_DJ.utf8"
Using LC_CTYPE = "so_DJ.utf8"
so_DJ.utf8 good
Using LC_COLLATE = "so_ET"
Using LC_CTYPE = "so_ET"
so_ET good
Using LC_COLLATE = "so_KE.utf8"
Using LC_CTYPE = "so_KE.utf8"
so_KE.utf8 good
Using LC_COLLATE = "so_SO.utf8"
Using LC_CTYPE = "so_SO.utf8"
so_SO.utf8 good
Using LC_COLLATE = "sq_AL.utf8"
Using LC_CTYPE = "sq_AL.utf8"
sq_AL.utf8 good
Using LC_COLLATE = "sq_MK"
Using LC_CTYPE = "sq_MK"
sq_MK good
Using LC_COLLATE = "sr_ME"
Using LC_CTYPE = "sr_ME"
sr_ME good
Using LC_COLLATE = "sr_RS"
Using LC_CTYPE = "sr_RS"
sr_RS good
Using LC_COLLATE = "sr_RS@latin"
Using LC_CTYPE = "sr_RS@latin"
sr_RS@latin good
Using LC_COLLATE = "ss_ZA"
Using LC_CTYPE = "ss_ZA"
ss_ZA good
Using LC_COLLATE = "st_ZA.utf8"
Using LC_CTYPE = "st_ZA.utf8"
st_ZA.utf8 good
Using LC_COLLATE = "sv_FI.utf8"
Using LC_CTYPE = "sv_FI.utf8"
sv_FI.utf8 good
Using LC_COLLATE = "sv_SE.utf8"
Using LC_CTYPE = "sv_SE.utf8"
sv_SE.utf8 good
Using LC_COLLATE = "sw_KE"
Using LC_CTYPE = "sw_KE"
sw_KE good
Using LC_COLLATE = "sw_TZ"
Using LC_CTYPE = "sw_TZ"
sw_TZ good
Using LC_COLLATE = "szl_PL"
Using LC_CTYPE = "szl_PL"
szl_PL good
Using LC_COLLATE = "ta_IN"
Using LC_CTYPE = "ta_IN"
ta_IN good
Using LC_COLLATE = "ta_LK"
Using LC_CTYPE = "ta_LK"
ta_LK good
Using LC_COLLATE = "tcy_IN.utf8"
Using LC_CTYPE = "tcy_IN.utf8"
tcy_IN.utf8 good
Using LC_COLLATE = "te_IN"
Using LC_CTYPE = "te_IN"
te_IN good
Using LC_COLLATE = "tg_TJ.utf8"
Using LC_CTYPE = "tg_TJ.utf8"
tg_TJ.utf8 good
Using LC_COLLATE = "the_NP"
Using LC_CTYPE = "the_NP"
the_NP good
Using LC_COLLATE = "th_TH.utf8"
Using LC_CTYPE = "th_TH.utf8"
th_TH.utf8 good
Using LC_COLLATE = "ti_ER"
Using LC_CTYPE = "ti_ER"
ti_ER good
Using LC_COLLATE = "ti_ET"
Using LC_CTYPE = "ti_ET"
ti_ET good
Using LC_COLLATE = "tig_ER"
Using LC_CTYPE = "tig_ER"
tig_ER good
Using LC_COLLATE = "tk_TM"
Using LC_CTYPE = "tk_TM"
tk_TM good
Using LC_COLLATE = "tl_PH.utf8"
Using LC_CTYPE = "tl_PH.utf8"
tl_PH.utf8 good
Using LC_COLLATE = "tn_ZA"
Using LC_CTYPE = "tn_ZA"
tn_ZA good
Using LC_COLLATE = "tr_CY.utf8"
Using LC_CTYPE = "tr_CY.utf8"
tr_CY.utf8 good
Using LC_COLLATE = "tr_TR.utf8"
Using LC_CTYPE = "tr_TR.utf8"
tr_TR.utf8 good
Using LC_COLLATE = "ts_ZA"
Using LC_CTYPE = "ts_ZA"
ts_ZA good
Using LC_COLLATE = "tt_RU"
Using LC_CTYPE = "tt_RU"
tt_RU good
Using LC_COLLATE = "tt_RU@iqtelif"
Using LC_CTYPE = "tt_RU@iqtelif"
tt_RU@iqtelif good
Using LC_COLLATE = "ug_CN"
Using LC_CTYPE = "ug_CN"
ug_CN good
Using LC_COLLATE = "uk_UA.utf8"
Using LC_CTYPE = "uk_UA.utf8"
uk_UA.utf8 good
Using LC_COLLATE = "unm_US"
Using LC_CTYPE = "unm_US"
unm_US good
Using LC_COLLATE = "ur_IN"
Using LC_CTYPE = "ur_IN"
ur_IN good
Using LC_COLLATE = "ur_PK"
Using LC_CTYPE = "ur_PK"
ur_PK good
Using LC_COLLATE = "uz_UZ@cyrillic"
Using LC_CTYPE = "uz_UZ@cyrillic"
uz_UZ@cyrillic good
Using LC_COLLATE = "uz_UZ.utf8"
Using LC_CTYPE = "uz_UZ.utf8"
uz_UZ.utf8 good
Using LC_COLLATE = "ve_ZA"
Using LC_CTYPE = "ve_ZA"
ve_ZA good
Using LC_COLLATE = "vi_VN"
Using LC_CTYPE = "vi_VN"
vi_VN good
Using LC_COLLATE = "wa_BE.utf8"
Using LC_CTYPE = "wa_BE.utf8"
wa_BE.utf8 good
Using LC_COLLATE = "wae_CH"
Using LC_CTYPE = "wae_CH"
wae_CH good
Using LC_COLLATE = "wal_ET"
Using LC_CTYPE = "wal_ET"
wal_ET good
Using LC_COLLATE = "wo_SN"
Using LC_CTYPE = "wo_SN"
wo_SN good
Using LC_COLLATE = "xh_ZA.utf8"
Using LC_CTYPE = "xh_ZA.utf8"
xh_ZA.utf8 good
Using LC_COLLATE = "yi_US.utf8"
Using LC_CTYPE = "yi_US.utf8"
yi_US.utf8 good
Using LC_COLLATE = "yo_NG"
Using LC_CTYPE = "yo_NG"
yo_NG good
Using LC_COLLATE = "yue_HK"
Using LC_CTYPE = "yue_HK"
yue_HK good
Using LC_COLLATE = "zh_CN.utf8"
Using LC_CTYPE = "zh_CN.utf8"
zh_CN.utf8 good
Using LC_COLLATE = "zh_HK.utf8"
Using LC_CTYPE = "zh_HK.utf8"
zh_HK.utf8 good
Using LC_COLLATE = "zh_SG.utf8"
Using LC_CTYPE = "zh_SG.utf8"
zh_SG.utf8 good
Using LC_COLLATE = "zh_TW.utf8"
Using LC_CTYPE = "zh_TW.utf8"
zh_TW.utf8 good
Using LC_COLLATE = "zu_ZA.utf8"
Using LC_CTYPE = "zu_ZA.utf8"
zu_ZA.utf8 good
Thanks!
Stephen
On Tue, Mar 22, 2016 at 3:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Well, if we implement a compatibility GUC that shuts off our
dependency on strxfrm(), people can go back to having 9.5 be no more
broken than 9.4 was. I vote we do that and go home.
I don't have a problem with that idea, but I fear "no more broken than
9.4 was" might be a very low bar for certain systems and collations.
Abbreviated key may have simply unmasked the problem in some cases.
Consider:
[vagrant@localhost ~]$ LC_COLLATE=en_us sort strings.txt <-- correct
x xx
x xx"
xxx
xxx"
[vagrant@localhost ~]$ LC_COLLATE=de_DE sort strings.txt <-- wrong
xxx
xxx"
x xx
x xx"
[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6
My concern was not merely "academic" (i.e. it was not limited in scope
to things that don't make B-Tree indexes corrupt). Pretty sure that we
need to start thinking of this as a problem with strcoll() that
strxfrm() does not have for more fundamental reasons, because
strcoll() says that the first string in the de_DE sorted list is
*greater* than the third string. That's wrong, and not just because
strxfrm() gives an intuitively correct answer -- it's wrong
specifically because the transitive law has been broken.
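The disagreement shown above can be checked for any pair of strings. What follows is only a minimal sketch in Python, not the strxfrm-binary program used above (which is not reproduced in this thread). It compares the sign of strcoll() with the sign obtained by comparing strxfrm() output. The helper names are hypothetical, and it assumes the de_DE.UTF-8 locale is installed. Python's locale module works on wide strings, so the transformed keys will not match the byte dumps above, but the two signs should still agree on a libc where the functions are consistent.

```python
import locale


def sign(n):
    """Collapse a comparison result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def compare_both(a, b):
    """Return (strcoll sign, strxfrm sign) for a and b under the current LC_COLLATE."""
    coll = sign(locale.strcoll(a, b))
    xa, xb = locale.strxfrm(a), locale.strxfrm(b)
    xfrm = (xa > xb) - (xa < xb)
    return coll, xfrm


def main():
    # Assumption: de_DE.UTF-8 is installed; bail out quietly if it is not.
    try:
        locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
    except locale.Error:
        print("de_DE.UTF-8 is not installed; nothing to check")
        return
    coll, xfrm = compare_both("xxx", "x xx")
    print("strcoll sign:", coll, "strxfrm sign:", xfrm)
    if coll != xfrm:
        print("strcoll() and strxfrm() disagree for this pair")


if __name__ == "__main__":
    main()
```

Whether the two signs differ depends entirely on the installed libc and locale data, so no particular output is claimed here.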
--
Peter Geoghegan
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Peter Geoghegan <pg@heroku.com> writes:
My concern was not merely "academic" (i.e. it was not limited in scope
to things that don't make B-Tree indexes corrupt). Pretty sure that we
need to start thinking of this as a problem with strcoll() that
strxfrm() does not have for more fundamental reasons, because
strcoll() says that the first string in the de_DE sorted list is
*greater* than the third string.
[ squint... ] I was looking specifically for that sort of misbehavior
in my test program, and I haven't seen it.
regards, tom lane
On Tue, Mar 22, 2016 at 07:19:44PM -0400, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
I, too, found MAXXFRMLEN insufficient; I raised it fourfold. Cygwin
2.2.1(0.289/5/3) caught fire; 10% of locales passed. (varstr_sortsupport()
already blacklists the UTF8/native Windows case.) The test passed on Solaris
10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
See attached tryalllocales.sh outputs. I did not test AIX, because the AIX
machines I use have no UTF8 locales installed.
--On 22 March 2016 19:19:44 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Please try this on as many platforms as you can get hold of ...
Since I have to work on SuSE/SLES platforms at the moment, here are some results from them
(openLeap/SLES12 are identical, but that isn't a surprise since SLES12 is
based on openLeap42.1):
SLES12:
grep BAD results_sles12.txt
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_BE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
hu_HU.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
sh_YU.utf8 BAD
sq_AL.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD
SLES11 SP4:
grep BAD results_sles11sp4.txt
az_AZ.utf8 BAD
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_BE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
hu_HU.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
se_NO.utf8 BAD
sh_YU.utf8 BAD
sq_AL.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD
tt_RU.utf8 BAD
tt_RU@iqtelif.UTF-8 BAD
openSuSE/openLeap 42.1:
grep BAD results_openleap421.txt
ca_AD.utf8 BAD
ca_ES.utf8 BAD
ca_FR.utf8 BAD
ca_IT.utf8 BAD
da_DK.utf8 BAD
de_DE.utf8 BAD
en_BE.utf8 BAD
en_CA.utf8 BAD
es_EC.utf8 BAD
es_US.utf8 BAD
fi_FI.utf8 BAD
fo_FO.utf8 BAD
fr_CA.utf8 BAD
hu_HU.utf8 BAD
kl_GL.utf8 BAD
ku_TR.utf8 BAD
nb_NO.utf8 BAD
nn_NO.utf8 BAD
no_NO.utf8 BAD
ro_RO.utf8 BAD
sh_YU.utf8 BAD
sq_AL.utf8 BAD
sv_FI.utf8 BAD
sv_SE.utf8 BAD
--
Thanks
Bernd
On Tue, Mar 22, 2016 at 10:44 PM, Noah Misch <noah@leadboat.com> wrote:
On Tue, Mar 22, 2016 at 07:19:44PM -0400, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was a little worried that it was too much to hope for that all libc
vendors on earth would ship a strxfrm() implementation that was actually
consistent with strcoll(), and here we are.
Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
Please try this on as many platforms as you can get hold of ...
I, too, found MAXXFRMLEN insufficient; I raised it fourfold. Cygwin
2.2.1(0.289/5/3) caught fire; 10% of locales passed. (varstr_sortsupport()
already blacklists the UTF8/native Windows case.) The test passed on Solaris
10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
See attached tryalllocales.sh outputs. I did not test AIX, because the AIX
machines I use have no UTF8 locales installed.
Wow, thanks for the extensive testing. This suggests that, apart from
Cygwin which apparently doesn't matter right now, the only thing that
is busted is glibc. I believe we have yet to see a single locale that
fails anywhere else (apart from Cygwin). Good thing so few of our
users run glibc!
Ha ha, little joke there.
So, options:
1. We could make it the user's problem to figure out whether they've
got a buggy glibc and add a GUC to shut this off, as previously
suggested.
2. We could add a blacklist (either hardcoded or a GUC) shutting this
off for locales known to be buggy anywhere.
3. We could write some test code that runs at startup time which
reliably detects all of the broken locales we've so far uncovered and
disables this if so.
4. We could shut this off for all Linux users in all locales and tell
everybody to REINDEX. That would be pretty sad, though.
Thoughts? Other ideas?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Mar 22, 2016 at 10:44 PM, Noah Misch <noah@leadboat.com> wrote:
I, too, found MAXXFRMLEN insufficient; I raised it fourfold. Cygwin
2.2.1(0.289/5/3) caught fire; 10% of locales passed. (varstr_sortsupport()
already blacklists the UTF8/native Windows case.) The test passed on Solaris
10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
See attached tryalllocales.sh outputs. I did not test AIX, because the AIX
machines I use have no UTF8 locales installed.
Wow, thanks for the extensive testing. This suggests that, apart from
Cygwin which apparently doesn't matter right now, the only thing that
is busted is glibc. I believe we have yet to see a single locale that
fails anywhere else (apart from Cygwin). Good thing so few of our
users run glibc!
I extended my test program to be able to check locales using ISO-8859-x
encodings. RHEL6 shows me failures in a set of locales that is remarkably
unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
in both encodings, as do a few others). I'm not sure what that implies
for the underlying bug(s).
So, options:
1. We could make it the user's problem to figure out whether they've
got a buggy glibc and add a GUC to shut this off, as previously
suggested.
2. We could add a blacklist (either hardcoded or a GUC) shutting this
off for locales known to be buggy anywhere.
3. We could write some test code that runs at startup time which
reliably detects all of the broken locales we've so far uncovered and
disables this if so.
4. We could shut this off for all Linux users in all locales and tell
everybody to REINDEX. That would be pretty sad, though.
TBH, I think #1 is right out, unless maybe the GUC defaults to off.
We aren't that cavalier with data consistency in other departments.
#2 and #3 presume a level of knowledge of the bug details that we
have not got, and probably can't get by Monday.
As far as #4 goes, we're going to have to tell people to REINDEX
no matter what the other aspects of the fix look like. On-disk
indexes are broken right now, if you're using one of the affected
locales.
regards, tom lane
I wrote:
I extended my test program to be able to check locales using ISO-8859-x
encodings. RHEL6 shows me failures in a set of locales that is remarkably
unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
in both encodings, as do a few others). I'm not sure what that implies
for the underlying bug(s).
Closer analysis says that all of the cases where only utf8 is reported to
fail are in fact because there is no iso8859 equivalent locale on my
machine. Many of the cases where only iso8859 is reported to fail are
just chance passes due to not having randomly generated a failure case;
you can reduce the odds of that by passing strcolltest a repeat count
larger than 1. There remain, however, a few locales in which it seems
that indeed iso8859 is broken and utf8 is not; ru_RU being the most
prominent example.
In short, the problem is actually worse in non-UTF8 locales.
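The effect of the repeat count can be illustrated with a small sketch. This is not the strcolltest program itself (its source is not shown in this thread), only a rough Python analogue under stated assumptions: the function names, the alphabet, and the batch size are all hypothetical choices. Each round sorts one random batch twice, once with strcoll() as the comparator and once with strxfrm() as the key, and counts the rounds in which the two orders differ.

```python
import functools
import locale
import random


def random_strings(rng, alphabet, count, max_len=8):
    """Generate `count` random non-empty strings over `alphabet`."""
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        for _ in range(count)
    ]


def count_bad_rounds(rng, alphabet, count=100, repeats=1):
    """Count rounds in which strcoll() and strxfrm() sort a random batch differently."""
    bad = 0
    for _ in range(repeats):
        batch = random_strings(rng, alphabet, count)
        # Both sorts are stable, so strings that tie keep their input order
        # in both results and do not cause a spurious mismatch.
        by_coll = sorted(batch, key=functools.cmp_to_key(locale.strcoll))
        by_xfrm = sorted(batch, key=locale.strxfrm)
        if by_coll != by_xfrm:
            bad += 1
    return bad
```

A single round can pass by luck if no troublesome pair happens to be generated, so calling count_bad_rounds() with a larger repeats value, after locale.setlocale(locale.LC_COLLATE, ...) has selected the locale under test, makes a false "good" less likely.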
regards, tom lane
On Wed, Mar 23, 2016 at 9:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
I extended my test program to be able to check locales using ISO-8859-x
encodings. RHEL6 shows me failures in a set of locales that isremarkably
unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
in both encodings, as do a few others). I'm not sure what that implies
for the underlying bug(s).Closer analysis says that all of the cases where only utf8 is reported to
fail are in fact because there is no iso8859 equivalent locale on my
machine. Many of the cases where only iso8859 is reported to fail are
just chance passes due to not having randomly generated a failure case;
you can reduce the odds of that by passing strcolltest a repeat count
larger than 1. There remain, however, a few locales in which it seems
that indeed iso8859 is broken and utf8 is not; ru_RU being the most
prominent example.
In short, the problem is actually worse in non-UTF8 locales.
Is the POSIX/C (non)-locale affected?
David J.
On Wed, Mar 23, 2016 at 12:19 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
Is the POSIX/C (non)-locale affected?
We don't use strxfrm() or strcoll() in that case, so I sure hope not.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 23, 2016 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
I extended my test program to be able to check locales using ISO-8859-x
encodings. RHEL6 shows me failures in a set of locales that is remarkably
unlike the set it fails on for UTF8 (though good ol de_DE manages to fail
in both encodings, as do a few others). I'm not sure what that implies
for the underlying bug(s).
Closer analysis says that all of the cases where only utf8 is reported to
fail are in fact because there is no iso8859 equivalent locale on my
machine. Many of the cases where only iso8859 is reported to fail are
just chance passes due to not having randomly generated a failure case;
you can reduce the odds of that by passing strcolltest a repeat count
larger than 1. There remain, however, a few locales in which it seems
that indeed iso8859 is broken and utf8 is not; ru_RU being the most
prominent example.
In short, the problem is actually worse in non-UTF8 locales.
I guess that's not terribly surprising. If the glibc maintainers
haven't managed to get this right for UTF-8 locales, I can't imagine
why they would have been more careful for non-UTF-8 locales that - I
would guess - get less use.
Are you still in information-gathering mode, or are you going to issue
a recommendation on how we should proceed here, or what?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Are you still in information-gathering mode, or are you going to issue
a recommendation on how we should proceed here, or what?
If I had to make a recommendation right now, I would go for your
option #4, ie shut 'em all down Scotty. We do not know the full extent
of the problem but it looks pretty bad, and I think our first priority
has to be to guarantee data integrity. I do not have a lot of faith in
the proposition that glibc's is the only buggy implementation, either.
regards, tom lane
On Wed, Mar 23, 2016 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
If I had to make a recommendation right now, I would go for your
option #4, ie shut 'em all down Scotty. We do not know the full extent
of the problem but it looks pretty bad, and I think our first priority
has to be to guarantee data integrity.
+1, but only for glibc, and configurable. The glibc default might
later be revisited in the stable 9.5 branch.
--
Peter Geoghegan
On Mar 23, 2016 18:53, "Peter Geoghegan" <pg@heroku.com> wrote:
On Wed, Mar 23, 2016 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
If I had to make a recommendation right now, I would go for your
option #4, ie shut 'em all down Scotty. We do not know the full extent
of the problem but it looks pretty bad, and I think our first priority
has to be to guarantee data integrity.
+1, but only for glibc, and configurable. The glibc default might
later be revisited in the stable 9.5 branch.
Are you talking about configurable at ./configure time, or guc?
Making it a compile time option makes sense I think. But turning it into a
guc will expose users to a lot of failure scenarios if they *change* the
value, and that seems risky.
Putting it in autoconf and default to off in the upcoming minor seems like
a good idea. Then once we have more information, we can consider if we want
to turn it back on in backbranches or just in 9.6 (when/if properly
fixed).
/Magnus
On Wed, Mar 23, 2016 at 10:56 AM, Magnus Hagander <magnus@hagander.net> wrote:
Are you talking about configurable at ./configure time, or guc?
I meant a GUC. I think a ./configure option is overkill.
What about the existing caller of strxfrm(), convert_string_datum()?
--
Peter Geoghegan
On Wed, Mar 23, 2016 at 10:58 AM, Peter Geoghegan <pg@heroku.com> wrote:
What about the existing caller of strxfrm(), convert_string_datum()?
I mean, the caller exists in all back-branches, not just 9.5.
--
Peter Geoghegan
On Wed, Mar 23, 2016 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I guess that's not terribly surprising. If the glibc maintainers
haven't managed to get this right for UTF-8 locales, I can't imagine
why they would have been more careful for non-UTF-8 locales that - I
would guess - get less use.
We don't want to suggest that locales are broken as such. My inability
to reproduce the original complaint on alternative German locales
(e.g. Austrian) suggests to me that it just "accidentally fails to
fail" for whatever reason (maybe they fail in other ways). I should
say "accidentally fails to not fail", because this is a failure of
strxfrm() to be bug-compatible with strcoll(), which I think needs to
not be forgotten.
--
Peter Geoghegan
On Wed, Mar 23, 2016 at 6:58 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Mar 23, 2016 at 10:56 AM, Magnus Hagander <magnus@hagander.net> wrote:
Are you talking about configurable at ./configure time, or guc?
I meant a GUC. I think a ./configure option is overkill.
We clearly have different views of the amount of kill effort required for
the different options :) I would've said that a ./configure option is the
easier way, and that doing a GUC is the one that's an overkill (being
significantly more effort).
That said, my main point is that I do not think the knob is something that
should be tuned by the average end user. For most people, that should be
left to the packagers for the platform, who can make an informed choice
about if it's safe to turn it on.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
That said, my main point is that I do not think the knob is something that
should be tuned by the average end user. For most people, that should be
left to the packagers for the platform, who can make an informed choice
about if it's safe to turn it on.
I could get behind that if we really make an effort to help them make
an informed choice. The abbreviated keys optimization is highly
valuable, and I put a lot of work into it, as did Robert.
--
Peter Geoghegan
On Wed, Mar 23, 2016 at 7:06 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
That said, my main point is that I do not think the knob is something that
should be tuned by the average end user. For most people, that should be
left to the packagers for the platform, who can make an informed choice
about if it's safe to turn it on.
I could get behind that if we really make an effort to help them make
an informed choice. The abbreviated keys optimization is highly
valuable, and I put a lot of work into it, as did Robert.
Oh, I totally appreciate that. It's one of the great improvements in 9.5,
and one of the best things is that it's an "automatic improvement" that
doesn't require the users to change their applications to benefit from it.
But it's also currently badly broken on some of our most common platforms.
We want to get it back to working. But short-term, it's more important to
limit the scope of the brokenness, since this is a version that people are
putting in production. Once we have enough info to safely say we've put a
workaround in place, we turn it back on.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Peter Geoghegan <pg@heroku.com> writes:
What about the existing caller of strxfrm(), convert_string_datum()?
convert_string_datum is, and always has been, used only for planner
estimation purposes. We do not care if it sometimes gets inaccurate
answers. Even if it's as wrong as it can possibly be, that will only
affect planner estimates to the extent of wrongly interpolating between
the endpoints of a histogram bin, so that the effects are no worse than
about 1/statistics_target. And there are bigger limitations on the
accuracy of those estimates anyway, notably that we use the same stats
regardless of the collation that applies to a particular WHERE-clause
operator.
regards, tom lane
On Wed, Mar 23, 2016 at 11:09 AM, Magnus Hagander <magnus@hagander.net> wrote:
We want to get it back to working. But short-term, it's more important to
limit the scope of the brokenness, since this is a version that people are
putting in production. Once we have enough info to safely say we've put a
workaround in place, we turn it back on.
Do you think it's possible that my amcheck tool might have a role to
play here? I wrote it for exactly this kind of scenario. If we could
get it reviewed, then a pre-release version compatible with 9.5 could
be made available. I'd be willing to work on that side of things if
core are receptive. Early prototypes of the tool were used to detect
collation incompatibility issues in production.
--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
That said, my main point is that I do not think the knob is something that
should be tuned by the average end user. For most people, that should be
left to the packagers for the platform, who can make an informed choice
about if it's safe to turn it on.
I could get behind that if we really make an effort to help them make
an informed choice. The abbreviated keys optimization is highly
valuable, and I put a lot of work into it, as did Robert.
I realize that, and I'm sympathetic, but I'm afraid it also means that
your judgment in this matter is rather biased.
I do not think that end users can be expected to know whether this is safe
to turn on, and TBH I do not think that most packagers will either. My
opinion is that our only guaranteed-safe option is to turn it off, period,
no exceptions for platforms that we've not yet found a failure case for.
We can consider turning it back on later, once we've done vastly more
study and testing than has evidently been done to date. One thing I'm
going to want to know is what was the root cause of glibc's bug, and what
is the reason to think that other implementations are going to be any more
reliable. At this point I'm disinclined to trust any implementation that
can't point to a structural reason (e.g., sharing code) to believe that
strcoll and strxfrm must yield equivalent answers.
(In other words, I want an #ifdef NOT_USED, which is even less effort
than either a GUC or a configure option ;-(. As well as being something
that we won't need to document and support indefinitely.)
regards, tom lane
On Wed, Mar 23, 2016 at 11:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I do not think that end users can be expected to know whether this is safe
to turn on, and TBH I do not think that most packagers will either. My
opinion is that our only guaranteed-safe option is to turn it off, period,
no exceptions for platforms that we've not yet found a failure case for.
We can consider turning it back on later, once we've done vastly more
study and testing than has evidently been done to date. One thing I'm
going to want to know is what was the root cause of glibc's bug, and what
is the reason to think that other implementations are going to be any more
reliable. At this point I'm disinclined to trust any implementation that
can't point to a structural reason (e.g., sharing code) to believe that
strcoll and strxfrm must yield equivalent answers.
The more I think about it, the more I agree that not trusting
strxfrm() across the board is the right move short-term. So, I'm not
going to be upset, provided we do actually follow through later with
an effort to turn it back on in 9.5 as and when it is known to be
reliable. All I'm asking for is that we actively work towards making
it safe, which evidently requires leg-work, that I can only do part
of. (For example, I'm not on the -packagers list, so cannot really
coordinate with packagers).
I think that that's a reasonable thing for me to expect.
--
Peter Geoghegan
On Wed, Mar 23, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@heroku.com> writes:
On Wed, Mar 23, 2016 at 11:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
That said, my main point is that I do not think the knob is something that
should be tuned by the average end user. For most people, that should be
left to the packagers for the platform, who can make an informed choice
about if it's safe to turn it on.
I could get behind that if we really make an effort to help them make
an informed choice. The abbreviated keys optimization is highly
valuable, and I put a lot of work into it, as did Robert.
I realize that, and I'm sympathetic, but I'm afraid it also means that
your judgment in this matter is rather biased.
I do not think that end users can be expected to know whether this is safe
to turn on, and TBH I do not think that most packagers will either. My
opinion is that our only guaranteed-safe option is to turn it off, period,
no exceptions for platforms that we've not yet found a failure case for.
We can consider turning it back on later, once we've done vastly more
study and testing than has evidently been done to date. One thing I'm
going to want to know is what was the root cause of glibc's bug, and what
is the reason to think that other implementations are going to be any more
reliable. At this point I'm disinclined to trust any implementation that
can't point to a structural reason (e.g., sharing code) to believe that
strcoll and strxfrm must yield equivalent answers.
(In other words, I want an #ifdef NOT_USED, which is even less effort
than either a GUC or a configure option ;-(. As well as being something
that we won't need to document and support indefinitely.)
I think that something like the attached would be a reasonable
approach to the problem. If we later decide this is altogether
hopeless, we can do a more thorough job removing the code that can be
reached when collate_c && abbreviate, but let's not do that right now.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment: dont-trust-strxfrm.patch (text/x-diff)
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 94599cc..b10027f 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1832,17 +1832,30 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
}
/*
- * It's possible that there are platforms where the use of abbreviated
- * keys should be disabled at compile time. Having only 4 byte datums
- * could make worst-case performance drastically more likely, for example.
- * Moreover, Darwin's strxfrm() implementations is known to not
- * effectively concentrate a significant amount of entropy from the
- * original string in earlier transformed blobs. It's possible that other
- * supported platforms are similarly encumbered. However, even in those
- * cases, the abbreviated keys optimization may win, and if it doesn't,
- * the "abort abbreviation" code may rescue us. So, for now, we don't
- * disable this anywhere on the basis of performance.
+ * Unfortunately, it seems that abbreviation for non-C collations is
+ * broken on many common platforms; testing of multiple versions of glibc
+ * reveals that, for many locales, strcoll() and strxfrm() do not return
+ * consistent results, which is fatal to this optimization. While no
+ * other libc other than Cygwin has so far been shown to have a problem,
+ * we take the conservative course of action for right now and disable
+ * this categorically. (Users who are certain this isn't a problem on
+ * their system can define TRUST_STRXFRM.)
+ *
+ * Even apart from the risk of broken locales, it's possible that there
+ * are platforms where the use of abbreviated keys should be disabled at
+ * compile time. Having only 4 byte datums could make worst-case
+ * performance drastically more likely, for example. Moreover, Darwin's
+ * strxfrm() implementations is known to not effectively concentrate a
+ * significant amount of entropy from the original string in earlier
+ * transformed blobs. It's possible that other supported platforms are
+ * similarly encumbered. So, if we ever get past disabling this
+ * categorically, we may still want or need to disable it for particular
+ * platforms.
*/
+#ifndef TRUST_STRXFRM
+	if (!collate_c)
+		abbreviate = false;
+#endif
/*
* If we're using abbreviated keys, or if we're using a locale-aware
On Wed, Mar 23, 2016 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I think that something like the attached would be a reasonable
approach to the problem. If we later decide this is altogether
hopeless, we can do a more thorough job removing the code that can be
reached when collate_c && abbreviate, but let's not do that right now.
This patch looks good to me.
I think that disabling abbreviation when the C collation is in use makes
no sense, though. This has nothing to do with abbreviation as such,
and everything to do with glibc.
--
Peter Geoghegan
On Wed, Mar 23, 2016 at 3:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Mar 23, 2016 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I think that something like the attached would be a reasonable
approach to the problem. If we later decide this is altogether
hopeless, we can do a more thorough job removing the code that can be
reached when collate_c && abbreviate, but let's not do that right now.
This patch looks good to me.
I think that disabling abbreviation when the C collation is in use makes
no sense, though.
But the patch doesn't do that, right?
This has nothing to do with abbreviation as such,
and everything to do with glibc.
Yes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 23, 2016 at 12:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think that disabling abbreviation when the C collation is in use makes
no sense, though.
But the patch doesn't do that, right?
Right, it doesn't. But I was surprised that you even mentioned it as a
possibility. That's all.
--
Peter Geoghegan
Robert Haas <robertmhaas@gmail.com> writes:
+#ifndef TRUST_STRXFRM
+	if (!collate_c)
+		abbreviate = false;
+#endif
Ah, I did not realize that abbreviation would be of any value in C locale.
If it is, then +1 for something like the above.
regards, tom lane
On Wed, Mar 23, 2016 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
+#ifndef TRUST_STRXFRM
+	if (!collate_c)
+		abbreviate = false;
+#endif
Ah, I did not realize that abbreviation would be of any value in C locale.
If it is, then +1 for something like the above.
It's actually more likely to help for a C locale than for a non-C locale.
I have committed this and back-patched it to 9.5.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
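For readers unfamiliar with the optimization being disabled, the idea behind abbreviated keys can be sketched roughly as below. This is a simplified illustration, not the varlena.c implementation: it assumes an 8-byte key taken from the front of the strxfrm() blob, and the names ABBREV_LEN, make_abbrev_key and abbrev_compare are hypothetical. The sketch makes the dependency visible: the cheap memcmp() on keys is only valid if strxfrm() orders strings exactly as strcoll() does.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

#define ABBREV_LEN 8    /* assumed key width: one 8-byte datum */

/* Fill key with the first ABBREV_LEN bytes of strxfrm(s), zero padded. */
void make_abbrev_key(const char *s, unsigned char key[ABBREV_LEN])
{
    size_t len = strxfrm(NULL, s, 0);   /* blob length without the NUL */
    char *blob = malloc(len + 1);

    assert(blob != NULL);
    strxfrm(blob, s, len + 1);
    memset(key, 0, ABBREV_LEN);
    memcpy(key, blob, len < ABBREV_LEN ? len : ABBREV_LEN);
    free(blob);
}

/*
 * Compare two strings using abbreviated keys first. Only when the keys
 * tie do we pay for a full strcoll() comparison.
 */
int abbrev_compare(const char *a, const char *b)
{
    unsigned char ka[ABBREV_LEN];
    unsigned char kb[ABBREV_LEN];
    int c;

    make_abbrev_key(a, ka);
    make_abbrev_key(b, kb);
    c = memcmp(ka, kb, ABBREV_LEN);
    if (c != 0)
        return c;               /* resolved without calling strcoll() */
    return strcoll(a, b);       /* authoritative tie-breaker */
}
```

If strxfrm() and strcoll() disagree for some pair, a sort driven by abbrev_compare() and a later lookup driven by plain strcoll() see different orderings, which is the index corruption discussed in this thread. A real sort would also compute each key once per tuple rather than on every comparison.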
On Tue, Mar 22, 2016 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@heroku.com> writes:
My concern was not merely "academic" (i.e. it was not limited in scope
to things that don't make B-Tree indexes corrupt). Pretty sure that we
need to start thinking of this as a problem with strcoll() that
strxfrm() does not have for more fundamental reasons, because
strcoll() says that the first string in the de_DE sorted list is
*greater* than the third string.
[ squint... ] I was looking specifically for that sort of misbehavior
in my test program, and I haven't seen it.
Sorry, I was in too much of a hurry to get to the bottom of this with
that example. I failed to notice that LC_COLLATE for sort was "de_DE",
not "de_DE.UTF-8". For my simple case it would not have mattered if
"de_DE" was specified instead of "de_DE.UTF-8" on a non-broken system.
But, this was a broken system.
Anyway, what prompted the misguided example was this:
[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'x xx"' 'xxx"'
"x xx"" -> 2323230108080801020202010235034b (16 bytes)
"xxx"" -> 232323010808080102020201044b (14 bytes)
strcmp(arg1, arg2) result: -2
strcoll(arg1, arg2) result: -6
[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'x xxf' 'xxxf'
"x xxf" -> 2323231101080808080102020202010235 (17 bytes)
"xxxf" -> 2323231101080808080102020202 (14 bytes)
strcmp(arg1, arg2) result: 1
strcoll(arg1, arg2) result: -6
Notice that in the case where a double-quote is used, strxfrm() and
strcoll() agree, whereas if that character is a letter from the
Latin alphabet instead, they disagree.
My intuition is that this is significant from the point of view of
fixing the glibc strcoll() bug. It feels like there is an incorrectly
applied optimization here, that occurs for strcoll() but not the
separate transformation process that strxfrm() does.
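The source of the strxfrm-binary tool is not included in this part of the thread. A helper that produces the same kind of hex dump of a strxfrm() blob might look like the sketch below; the function name xfrm_hex and its interface are assumptions made for illustration only.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Write the strxfrm() blob of s as lowercase hex into out (NUL terminated).
 * Returns the blob length in bytes, or (size_t) -1 if out is too small.
 */
size_t xfrm_hex(const char *s, char *out, size_t outsize)
{
    size_t len = strxfrm(NULL, s, 0);   /* blob length without the NUL */
    char *blob;
    size_t i;

    if (outsize < 2 * len + 1)
        return (size_t) -1;
    blob = malloc(len + 1);
    assert(blob != NULL);
    strxfrm(blob, s, len + 1);
    for (i = 0; i < len; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned char) blob[i]);
    out[2 * len] = '\0';
    free(blob);
    return len;
}
```

After calling setlocale(LC_COLLATE, "de_DE.UTF-8") on an affected system, dumping both strings of a failing pair this way lets the blobs be compared byte by byte against the sign that strcoll() reports.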
There seem to be at least a few instances of over-optimizing
strcoll() in the past few years. For example:
https://github.com/bminor/glibc/commit/87701a58e291bd7ac3b407d10a829dac52c9c16e
This bug looks like a possible candidate, given that complaints were
about de_DE:
https://github.com/bminor/glibc/commit/33a667def79c42e0befed1a4070798c58488170f
Is this bug of the right vintage? Seems like it might be a bit too
early for RHEL 6 to be affected, but I'm no expert.
--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
There seem to be at least a few instances of over-optimizing
strcoll() in the past few years. For example:
https://github.com/bminor/glibc/commit/87701a58e291bd7ac3b407d10a829dac52c9c16e
This bug looks like a possible candidate, given that complaints were
about de_DE:
https://github.com/bminor/glibc/commit/33a667def79c42e0befed1a4070798c58488170f
Is this bug of the right vintage? Seems like it might be a bit too
early for RHEL 6 to be affected, but I'm no expert.
It is too early. RHEL6 seems to be based off glibc 2.12, released 2010.
(By the same token, it's not got the other bug you mention ;-))
regards, tom lane
On Wed, Mar 23, 2016 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
It is too early. RHEL6 seems to be based off glibc 2.12, released 2010.
(By the same token, it's not got the other bug you mention ;-))
Well, it looked like everything was fine for "debian testing, glibc
2.22-3", including de_DE.UTF-8. In theory, it's only a matter of using
git-bisect to find what the fix was. That's just leg-work. I will find
time for it after the ongoing CF.
Mercifully, the situation with Glibc 2.22 suggests that the Glibc
people *aren't* fixing the strcoll() bugs in stable branches. But that
also means that it will take a long time to make non-C collation text
sorting use abbreviation on most systems, which is certainly
disappointing.
--
Peter Geoghegan
On Wed, Mar 23, 2016 at 7:14 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Mar 23, 2016 at 11:09 AM, Magnus Hagander <magnus@hagander.net> wrote:
We want to get it back to working. But short-term, it's more important to
limit the scope of the brokenness, since this is a version that people are
putting in production. Once we have enough info to safely say we've put a
workaround in place, we turn it back on.
Do you think it's possible that my amcheck tool might have a role to
play here? I wrote it for exactly this kind of scenario. If we could
get it reviewed, then a pre-release version compatible with 9.5 could
be made available. I'd be willing to work on that side of things if
core are receptive. Early prototypes of the tool were used to detect
collation incompatibility issues in production.
That's a good question? Would it catch corruption like this? I haven't
actually tested it :) My understanding is that the thing that can happen is
that while we don't actually store incorrect values in the indexes, we can
end up with index pointers in the wrong order in the indexes with this bug?
That does sound like one of those things that the amcheck tool is designed
to find?
And if not that one, can we find some other way for people to find out if
they need to REINDEX after the upgrade? It would be very nice not to have
to tell everybody to reindex everything, but to actually detect the cases
where it's needed. Or at least provide a supported way to do that, for
those where a cluster-wide reindex is really expensive.
Even if we can't sneak amcheck into 9.5, if we can show that it detects the
problem, then just being able to direct people to "get amcheck from 9.6 if
you want to check if the reindex is necessary" would still be a strong
improvement over nothing.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Thu, Mar 24, 2016 at 9:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
Even if we can't sneak amcheck into 9.5, if we can show that it detects the
problem, then just being able to direct people to "get amcheck from 9.6 if
you want to check if the reindex is necessary" would still be a strong
improvement over nothing.
I agree that back-patching amcheck into 9.5 would be unprecedented,
but it wouldn't be crazy: shipping an extra contrib module with no
additional dependencies shouldn't break anything for existing users.
However, the fact that the patch is not "Ready for Committer" at this
point means that it is not going to be available in time for next
week's maintenance releases, or very possibly, for 9.6. Time grows
very short.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 24 March 2016 14:04:22 +0100, Magnus Hagander <magnus@hagander.net> wrote:
That's a good question? Would it catch corruption like this? I haven't
actually tested it :) My understanding is that the thing that can happen
is that while we don't actually store incorrect values in the indexes, we
can end up with index pointers in the wrong order in the indexes with
this bug? That does sound like one of those things that the amcheck tool
is designed to find?
This is exactly where the prototype btreecheck helped a lot. The last time
I used it to track down problems, we got
WARNING: page order invariant violated for index
which pinned down the collation problems on that specific machine and
identified the indexes where we had the problem.
For example, if you take the bug report from Marc-Olaf and check the
affected table/index with the current amcheck patch, you get:
bernd@localhost:test #= SELECT bt_index_check('foo_val_idx');
ERROR: XX002: page order invariant violated for index "foo_val_idx"
DETAIL: Lower index tid=(1,1) (points to heap tid=(0,1)) higher index
tid=(1,2) (points to heap tid=(0,2)) page lsn=0/0.
LOCATION: bt_target_page_check, amcheck.c:687
STATEMENT: SELECT bt_index_check('foo_val_idx');
So if you ask me, this absolutely is a "must-have".
--
Thanks
Bernd
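The "page order invariant" in that error message says that adjacent items on a B-Tree page must be in non-decreasing order under the index's comparator. A toy version of that check over an in-memory array of strings could look like this; it is only a sketch of the invariant, not amcheck's code, and first_order_violation is a hypothetical name.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Scan items in their stored order and return the index i of the first
 * item that sorts after its right-hand neighbour under strcoll(), or -1
 * if every adjacent pair is in order.
 */
long first_order_violation(const char *const *items, size_t n)
{
    size_t i;

    assert(items != NULL || n == 0);
    for (i = 0; i + 1 < n; i++)
    {
        if (strcoll(items[i], items[i + 1]) > 0)
            return (long) i;
    }
    return -1;
}
```

An index whose entries were placed using strxfrm()-based comparisons can fail this kind of check when it is re-examined with strcoll(), which is why a page-level check is enough to detect the damage described in this thread.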
On Thu, Mar 24, 2016 at 7:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
However, the fact that the patch is not "Ready for Committer" at this
point means that it is not going to be available in time for next
week's maintenance releases, or very possibly, for 9.6. Time grows
very short.
The only people that are likely comfortable giving final sign-off on
it that are active this CF are Tom and Kevin. That is an awkward
situation.
I could produce a 9.5 variant that had even more limited scope than
what's in the CF. That would be strictly limited to checking page
order, and the high key invariant. It wouldn't check relationships
spanning multiple pages, either on the same level, or through
parent/child relationships. Then, I think significantly less expertise
is required for review, because locking protocols and so on don't
enter into it.
I think that the risk of getting something wrong with amcheck as
things stand is acceptable for 9.6, and maybe even 9.5. About the
worst case scenario is a false positive report of corruption. But with
the tool scoped at only looking at really obvious invariants at the
level of a single page, which is what I'd propose for 9.5, it seems
like the risk of bugs would be very well managed. That would still
catch issues caused by this glibc bug very reliably.
Keep in mind that in general, amcheck does nothing special with buffer
locks + pins -- it just acquires a pin + shared buffer lock on one
buffer/page at a time, copies it into local memory, then releases the
lock and drops the pin. So, all processing by amcheck happens outside any
critical path.
I could work hard to get that stripped down amcheck into 9.5. I'm
already behind on my CF reviews, and time is short, so it would be
good if we moved quickly on this, either way....
--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
On Thu, Mar 24, 2016 at 7:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
However, the fact that the patch is not "Ready for Committer" at this
point means that it is not going to be available in time for next
week's maintenance releases, or very possibly, for 9.6. Time grows
very short.
The only people that are likely comfortable giving final sign-off on
it that are active this CF are Tom and Kevin. That is an awkward
situation.
I would not be comfortable with reviewing an entire module with the
intention of shipping it in a stable branch on Monday, even if I had
nothing else to do between now and then. I think the only sane way
to get this into 9.5.2 would be to slip the release date, and that
seems rather counterproductive. We need to get this fix into the
hands of users ASAP.
I fear our only realistic course of action is to publish release
notes along the lines of "if you use any of list-of-affected-locales,
you should REINDEX btree indexes on text/varchar/bpchar columns".
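As a rough illustration of the selection rule in that proposed release note, the sketch below filters a list of index descriptions and emits REINDEX statements for the matching ones. Everything here is an assumption made for illustration: IndexInfo and reindex_statements are hypothetical names, the records would have to be filled in from the system catalogs by some other means, and the set of affected locales is supplied by the caller because the thread does not enumerate it.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class IndexInfo(NamedTuple):
    """Hypothetical description of one index (not a PostgreSQL structure)."""
    name: str
    access_method: str             # e.g. "btree", "gin"
    column_types: Tuple[str, ...]  # e.g. ("text", "int4")
    collation: str                 # collation used by the text-like columns


# Column types named in the proposed release note.
TEXT_LIKE_TYPES = {"text", "varchar", "bpchar"}


def reindex_statements(indexes: Iterable[IndexInfo],
                       affected_locales: Set[str]) -> List[str]:
    """Emit one REINDEX statement per index that matches the rule."""
    statements = []
    for idx in indexes:
        if idx.access_method != "btree":
            continue  # only btree indexes are in scope
        if idx.collation not in affected_locales:
            continue  # e.g. "C" is never in the affected set
        if not TEXT_LIKE_TYPES.intersection(idx.column_types):
            continue  # no text/varchar/bpchar column involved
        quoted = '"' + idx.name.replace('"', '""') + '"'
        statements.append(f"REINDEX INDEX {quoted};")
    return statements
```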
regards, tom lane
On Thu, Mar 24, 2016 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The only people that are likely comfortable giving final sign-off on
it that are active this CF are Tom and Kevin. That is an awkward
situation.
I would not be comfortable with reviewing an entire module with the
intention of shipping it in a stable branch on Monday, even if I had
nothing else to do between now and then. I think the only sane way
to get this into 9.5.2 would be to slip the release date, and that
seems rather counterproductive. We need to get this fix into the
hands of users ASAP.
That's fair. I didn't really imagine that we'd want to put the tool
into 9.5 myself. Still, I think that amcheck could have some role to
play in managing the problem. Even the near-term availability of
amcheck for 9.5 as a satellite project would count. That could happen
without blocking the point release. I just don't want to go over
anyone's head with that.
"REINDEX everything" isn't a realistic plan for a lot of users.
--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
That's fair. I didn't really imagine that we'd want to put the tool
into 9.5 myself. Still, I think that amcheck could have some role to
play in managing the problem. Even the near-term availability of
amcheck for 9.5 as a satellite project would count. That could happen
without blocking the point release. I just don't want to go over
anyone's head with that.
I have no objection to something like that happening.
regards, tom lane
On Thu, Mar 24, 2016 at 6:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
And if not that one, can we find some other way for people to find out if
they need to REINDEX after the upgrade? It would be very nice not to have to
tell everybody to reindex everything, but to actually detect the cases where
it's needed. Or at least provide a supported way to do that, for those where
a cluster-wide reindex is really expensive.
If amcheck were made to only verify pages in isolation, then it would have a
very strong chance of finding any issues, but not an iron-clad
guarantee -- it might be that the ordering was wrong across pages
(although that seems like a very small space for problems to hide).
Because we know that there is a sane total ordering for both strcoll()
and strxfrm() cases on affected systems, I'm pretty sure that the
version of amcheck in the ongoing CF (that checks child/parent, as
well as sibling relationships) would actually catch any problems of
that kind *reliably*. In other words, it would be okay that it didn't
check every item against every other item, because per Tom's analysis
the transitive law is not broken in either case, even if strcoll() is
buggy.
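To make the distinction concrete, here is a small sketch of how pages can each look fine in isolation while the ordering across sibling pages is still wrong. It is a toy model under stated assumptions, not the amcheck implementation: pages are plain lists, and page_sorted, sibling_violations and codepoint_cmp are hypothetical names. Comparing only adjacent items is sufficient here precisely because the comparator is assumed to be transitive.

```python
from typing import Callable, List, Sequence


def codepoint_cmp(a: str, b: str) -> int:
    """Stand-in comparator (hypothetical): plain code point order."""
    return (a > b) - (a < b)


def page_sorted(page: Sequence[str],
                cmp: Callable[[str, str], int]) -> bool:
    """True if the items of a single page are in non-decreasing order."""
    return all(cmp(page[i - 1], page[i]) <= 0 for i in range(1, len(page)))


def sibling_violations(pages: Sequence[Sequence[str]],
                       cmp: Callable[[str, str], int]) -> List[int]:
    """Return each i where the last item of page i sorts after the
    first item of its right sibling, page i + 1."""
    bad = []
    for i in range(len(pages) - 1):
        left, right = pages[i], pages[i + 1]
        if left and right and cmp(left[-1], right[0]) > 0:
            bad.append(i)
    return bad
```

With pages [["b", "c"], ["a", "d"]], every page passes page_sorted, yet sibling_violations reports the boundary between page 0 and page 1.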
Even if we can't sneak amcheck into 9.5, if we can show that it detects the
problem, then just being able to direct people to "get amcheck from 9.6 if
you want to check if the reindex is necessary" would still be a strong
improvement over nothing.
Agreed.
--
Peter Geoghegan
Thanks for the quick bug fix!
I've seen that a wiki page on the subject has been created. It may be useful to mention explicitly that 9.5.1 performance can be partly maintained by changing the collation of text columns to "C" when there is no need for special collation handling.
Best regards,
Marc-Olaf Jaschke
On 23.03.2016 at 21:07, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 23, 2016 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
+#ifndef TRUST_STRXFRM
+    if (!collate_c)
+        abbreviate = false;
+#endif
Ah, I did not realize that abbreviation would be of any value in C locale.
If it is, then +1 for something like the above.
It's actually more likely to help for a C locale than for a non-C locale.
I have committed this and back-patched it to 9.5.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 23, 2016 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Are you still in information-gathering mode, or are you going to issue
a recommendation on how we should proceed here, or what?
If I had to make a recommendation right now, I would go for your
option #4, ie shut 'em all down Scotty. We do not know the full extent
of the problem but it looks pretty bad, and I think our first priority
has to be to guarantee data integrity. I do not have a lot of faith in
the proposition that glibc's is the only buggy implementation, either.
For the record, I have been able to determine by using amcheck on the
Heroku platform that en_US.UTF-8 cases are sometimes affected by an
inconsistency between strcoll() and strxfrm() behavior, which was
previously an open question. I saw only two instances of this across
many thousands of servers. For some reason, both cases involved
strings with code points from the Arabic alphabet, even though each
case was from a totally unrelated customer database.
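The kind of inconsistency being described, where strcoll() and comparison of strxfrm() results disagree about the order of two strings, can be probed with a sketch like the one below. It is not the method used on the Heroku platform (that was amcheck run against real indexes); find_inconsistencies is a hypothetical helper built on Python's locale.strcoll and locale.strxfrm. CPython may go through the wide-character variants of these C functions, so this is an approximation and may not reproduce the exact glibc behaviour discussed in this thread.

```python
import locale
from itertools import combinations
from typing import Callable, Iterable, List, Tuple


def sign(n: int) -> int:
    """Collapse a comparison result to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def find_inconsistencies(
        strings: Iterable[str],
        coll: Callable[[str, str], int] = locale.strcoll,
        xfrm: Callable[[str], str] = locale.strxfrm,
) -> List[Tuple[str, str]]:
    """Return the pairs whose order under coll() differs from the order
    of their xfrm() keys."""
    bad = []
    for a, b in combinations(list(strings), 2):
        by_coll = sign(coll(a, b))
        key_a, key_b = xfrm(a), xfrm(b)
        by_xfrm = (key_a > key_b) - (key_a < key_b)
        if by_coll != by_xfrm:
            bad.append((a, b))
    return bad
```

To test a particular collation, call locale.setlocale(locale.LC_COLLATE, ...) with a locale that is installed on the machine before calling find_inconsistencies; without that call the process default collation is used.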
I'll go update the Wiki page for this [1] now.
[1]: https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 8/22/16 7:36 PM, Peter Geoghegan wrote:
For some reason, both cases involved
strings with code points from the Arabic alphabet, even though each
case was from a totally unrelated customer database.
Do those code points read right to left? Maybe that had an effect?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461