BUG #14112: sorting v and w is broken with et_EE locate

Started by Georg Kahest almost 10 years ago | 9 messages | bugs
#1 Georg Kahest
georg.kahest@internet.ee

The following bug has been logged on the website:

Bug reference: 14112
Logged by: Georg Kahest
Email address: georg.kahest@internet.ee
PostgreSQL version: 9.4.7
Operating system: Debian Jessie
Description:

It seems that sorting v and w with the et_EE locale is broken (other chars seem
to be okay):

select name COLLATE "et_EE" from test order by name;
name
--------------------
a1.ee
vvbwjbln7.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
vyz.ee
(5 rows)

select name COLLATE "en_US" from test order by name;
name
--------------------
a1.ee
vvbwjbln7.ee
vyz.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
(5 rows)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Georg Kahest (#1)
Re: BUG #14112: sorting v and w is broken with et_EE locate

On Tue, Apr 26, 2016 at 2:37 AM, <georg.kahest@internet.ee> wrote:

> The following bug has been logged on the website:
>
> Bug reference: 14112
> Logged by: Georg Kahest
> Email address: georg.kahest@internet.ee
> PostgreSQL version: 9.4.7
> Operating system: Debian Jessie
> Description:
>
> It seems that sorting v and w with the et_EE locale is broken (other chars seem
> to be okay):
>
> select name COLLATE "et_EE" from test order by name;
> name
> --------------------
> a1.ee
> vvbwjbln7.ee
> wwvl8.ee
> wxxezi6lkaq7eoi.ee
> vyz.ee
> (5 rows)
>
> select name COLLATE "en_US" from test order by name;
> name
> --------------------
> a1.ee
> vvbwjbln7.ee
> vyz.ee
> wwvl8.ee
> wxxezi6lkaq7eoi.ee
> (5 rows)

That does look odd. If that's not the correct way to sort Estonian,
then that should probably be reported to the Debian glibc maintainers
(or maybe the glibc project). Here's a Debian Jessie box
demonstrating that behaviour without any help from PostgreSQL:

munro@yoga:~/junk$ locale -a | grep et_EE
et_EE
et_EE.iso885915
et_EE.utf8
munro@yoga:~/junk$ cat input
a1.ee
vvbwjbln7.ee
vyz.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
a1.ee
vvbwjbln7.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
vyz.ee
munro@yoga:~/junk$ LC_COLLATE=en_US.utf8 sort < input
a1.ee
vvbwjbln7.ee
vyz.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee

--
Thomas Munro
http://www.enterprisedb.com
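
The terminal session above can be turned into a small script, as a rough sketch. The names `sort_under` and `have_locale` and the temporary file are illustrative and do not come from the thread. The script uses `LC_ALL` rather than `LC_COLLATE` so that an `LC_ALL` already set in the environment cannot override the chosen collation, and it skips any locale that `locale -a` does not list. It makes no claim about which order is correct for Estonian.

```shell
#!/bin/sh
# Sketch: sort the same input under several collations and print each result.
# sort_under, have_locale and the temp file are hypothetical names.

input=$(mktemp)
printf '%s\n' a1.ee vvbwjbln7.ee vyz.ee wwvl8.ee wxxezi6lkaq7eoi.ee > "$input"

# Sort a file with LC_ALL set for that one command only.
sort_under() {
    LC_ALL="$1" sort < "$2"
}

# Succeed only if the named locale appears in the output of `locale -a`.
have_locale() {
    command -v locale >/dev/null 2>&1 && locale -a 2>/dev/null | grep -Fqx "$1"
}

for loc in C en_US.utf8 et_EE.utf8; do
    if [ "$loc" = C ] || have_locale "$loc"; then
        echo "== $loc =="
        sort_under "$loc" "$input"
    else
        echo "== $loc == (not installed, skipped)"
    fi
done
```

Comparing the `et_EE.utf8` section against the `C` and `en_US.utf8` sections shows whether a given machine reproduces the ordering reported here.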

#3 Peter Geoghegan
In reply to: Thomas Munro (#2)
Re: BUG #14112: sorting v and w is broken with et_EE locate

On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

> That does look odd.

What happens if you replace the dot in each string with a single 'x'
character, Georg? Does the sort order look correct to you then?

--
Peter Geoghegan

#4 Peter Geoghegan
In reply to: Peter Geoghegan (#3)
Re: BUG #14112: sorting v and w is broken with et_EE locate

On Wed, Apr 27, 2016 at 9:22 PM, Peter Geoghegan <pg@heroku.com> wrote:

> On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>
>> That does look odd.
>
> What happens if you replace the dot in each string with a single 'x'
> character, Georg? Does the sort order look correct to you then?

I ask because I suspect that this might be the same strcoll() bug I
describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356

(In particular, see my remarks on Austria and Germany.)
--
Peter Geoghegan

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#4)
Re: BUG #14112: sorting v and w is broken with et_EE locate

Peter Geoghegan <pg@heroku.com> writes:

> I ask because I suspect that this might be the same strcoll() bug I
> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356

The report is against 9.4, though, so strcoll shouldn't matter.

regards, tom lane

#6 Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#4)
Re: BUG #14112: sorting v and w is broken with et_EE locate

On Thu, Apr 28, 2016 at 4:24 PM, Peter Geoghegan <pg@heroku.com> wrote:

> On Wed, Apr 27, 2016 at 9:22 PM, Peter Geoghegan <pg@heroku.com> wrote:
>
>> On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>
>>> That does look odd.
>>
>> What happens if you replace the dot in each string with a single 'x'
>> character, Georg? Does the sort order look correct to you then?
>
> I ask because I suspect that this might be the same strcoll() bug I
> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356
>
> (In particular, see my remarks on Austria and Germany.)

No change here. This system has locales-all ("GNU C Library:
Precompiled locale data") package version 2.19-18+deb8u4 (and same
libc6).

munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
a1.ee
vvbwjbln7.ee
wwvl8.ee
wxxezi6lkaq7eoi.ee
vyz.ee
munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input2
a1xee
vvbwjbln7xee
wwvl8xee
wxxezi6lkaq7eoixee
vyzxee

--
Thomas Munro
http://www.enterprisedb.com
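
The thread does not show how `input2` was produced. One plausible way, offered here only as a sketch, is a `sed` substitution that replaces the dot in each name with `x`. The temporary directory and the variable `workdir` are assumptions made so the example is self-contained.

```shell
#!/bin/sh
# Sketch: build input2 from input by replacing the dot in each name with 'x'.
# The thread does not say how input2 was made; using sed is an assumption.

workdir=$(mktemp -d)
printf '%s\n' a1.ee vvbwjbln7.ee vyz.ee wwvl8.ee wxxezi6lkaq7eoi.ee > "$workdir/input"

# Each name contains exactly one dot, so replacing the first match is enough.
sed 's/\./x/' "$workdir/input" > "$workdir/input2"

cat "$workdir/input2"
```

The resulting file holds the same five names as the `input2` listing above, so it can be fed to `sort` in the same way.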

#7 Peter Geoghegan
In reply to: Tom Lane (#5)
Re: BUG #14112: sorting v and w is broken with et_EE locate

On Wed, Apr 27, 2016 at 9:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Peter Geoghegan <pg@heroku.com> writes:
>
>> I ask because I suspect that this might be the same strcoll() bug I
>> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356
>
> The report is against 9.4, though, so strcoll shouldn't matter.

It shouldn't matter that it doesn't agree with strxfrm(), which is the
most important thing, but not the only thing. I think that it would be
interesting to know if this is a strcoll() problem. I have no
intention of pursuing a "fix" from the glibc people.

--
Peter Geoghegan

#8 Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#6)
Re: BUG #14112: sorting v and w is broken with et_EE locate

On Thu, Apr 28, 2016 at 4:43 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

> On Thu, Apr 28, 2016 at 4:24 PM, Peter Geoghegan <pg@heroku.com> wrote:
>
>> On Wed, Apr 27, 2016 at 9:22 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>
>>> On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
>>> <thomas.munro@enterprisedb.com> wrote:
>>>
>>>> That does look odd.
>>>
>>> What happens if you replace the dot in each string with a single 'x'
>>> character, Georg? Does the sort order look correct to you then?
>>
>> I ask because I suspect that this might be the same strcoll() bug I
>> describe here: https://bugzilla.redhat.com/show_bug.cgi?id=1320356
>>
>> (In particular, see my remarks on Austria and Germany.)
>
> No change here. This system has locales-all ("GNU C Library:
> Precompiled locale data") package version 2.19-18+deb8u4 (and same
> libc6).
>
> munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
> a1.ee
> vvbwjbln7.ee
> wwvl8.ee
> wxxezi6lkaq7eoi.ee
> vyz.ee
> munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input2
> a1xee
> vvbwjbln7xee
> wwvl8xee
> wxxezi6lkaq7eoixee
> vyzxee

Same result on a CentOS box. I think the OP should probably write to
bug-glibc-locales@gnu.org.

--
Thomas Munro
http://www.enterprisedb.com
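
For a report to the locale maintainers, it may help to reduce the case to a single pair of strings. The sketch below assumes a hypothetical helper, `first_of`, which prints whichever of two strings `sort` places first under a given locale. Nothing like it appears in the thread. The `et_EE.utf8` call runs only where `locale -a` lists that locale.

```shell
#!/bin/sh
# Sketch: reduce the report to one pair of strings.
# first_of is a hypothetical helper, not something used in the thread.

# Print whichever of the two strings sorts first under the given locale.
first_of() {
    printf '%s\n' "$2" "$3" | LC_ALL="$1" sort | head -n 1
}

# In the C locale the comparison is bytewise, so this prints vyz.ee.
first_of C vyz.ee wwvl8.ee

# The sessions in this thread suggest et_EE.utf8 puts wwvl8.ee first;
# check it only where that locale is actually installed.
if locale -a 2>/dev/null | grep -Fqx et_EE.utf8; then
    first_of et_EE.utf8 vyz.ee wwvl8.ee
fi
```

A two-line case like this, together with the installed glibc and locale package versions, is usually easier for maintainers to act on than a full table dump.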

#9 Georg Kahest
georg.kahest@internet.ee
In reply to: Thomas Munro (#8)
Re: BUG #14112: sorting v and w is broken with et_EE locate

On 04/28/2016 08:09 AM, Thomas Munro wrote:

> On Thu, Apr 28, 2016 at 4:43 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>
>> On Thu, Apr 28, 2016 at 4:24 PM, Peter Geoghegan <pg@heroku.com>
>> wrote:
>>
>>> On Wed, Apr 27, 2016 at 9:22 PM, Peter Geoghegan
>>> <pg@heroku.com> wrote:
>>>
>>>> On Wed, Apr 27, 2016 at 9:07 PM, Thomas Munro
>>>> <thomas.munro@enterprisedb.com> wrote:
>>>>
>>>>> That does look odd.
>>>>
>>>> What happens if you replace the dot in each string with a
>>>> single 'x' character, Georg? Does the sort order look correct
>>>> to you then?
>>>
>>> I ask because I suspect that this might be the same strcoll()
>>> bug I describe here:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1320356
>>>
>>> (In particular, see my remarks on Austria and Germany.)
>>
>> No change here. This system has locales-all ("GNU C Library:
>> Precompiled locale data") package version 2.19-18+deb8u4 (and
>> same libc6).
>>
>> munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input
>> a1.ee
>> vvbwjbln7.ee
>> wwvl8.ee
>> wxxezi6lkaq7eoi.ee
>> vyz.ee
>> munro@yoga:~/junk$ LC_COLLATE=et_EE.utf8 sort < input2
>> a1xee
>> vvbwjbln7xee
>> wwvl8xee
>> wxxezi6lkaq7eoixee
>> vyzxee
>
> Same result on a CentOS box. I think the OP should probably write
> to bug-glibc-locales@gnu.org.

Hello,

Indeed, the problem seems to be that glibc itself handles this
incorrectly.

Thank you for your time. I'll report the bug to glibc.

