BUG #1082: Order by doesn't sort correctly.

Started by PostgreSQL Bugs Listabout 22 years ago8 messagesbugs
Jump to latest
#1PostgreSQL Bugs List
pgsql-bugs@postgresql.org

The following bug has been logged online:

Bug reference: 1082
Logged by: Richard Neill

Email address: rn214@cam.ac.uk

PostgreSQL version: 7.3.4

Operating system: Linux

Description: Order by doesn't sort correctly.

Details:

ORDER BY sorts the following in this order:

Cymbal #1
Cymbal - 18 inch
Cymbal #2

It ought to be thus:

Cymbal #1
Cymbal #2
Cymbal - 18 inch

or possibly thus:

Cymbal - 18 inch
Cymbal #1
Cymbal #2

-------------------------------------------------
Here's an example sql script to reproduce the bug.

CREATE TABLE tbl_testinstruments(
instrumentid integer, instrument character varying(300)
);

INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (1,
'Antique Cymbals #1');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (2,
'Antique Cymbals #2');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (3, 'Clash
Cymbals, French - 20 inch');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (4,
'Cymbal #1');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (5,
'Cymbal #2');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (6,
'Cymbal - 18 inch');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (7,
'Cymbal, Sizzle');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (8,
'Cymbal, Splash');

SELECT instrument FROM tbl_testinstruments ORDER BY instrument;

------------------------------------------------
This is the output I get:

instrument
---------------------------------
Antique Cymbals #1
Antique Cymbals #2
Clash Cymbals, French - 20 inch
Cymbal #1
Cymbal - 18 inch
Cymbal #2
Cymbal, Sizzle
Cymbal, Splash
(8 rows)

-------------------------------------------------
I'm using version:

PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.1
(Mandrake Linux 9.2 3.3.1-1mdk)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PostgreSQL Bugs List (#1)
Re: BUG #1082: Order by doesn't sort correctly.

"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:

Description: Order by doesn't sort correctly.

It almost certainly is the correct sort order according to the locale
you're using. Use pg_controldata to check the database locale. You'll
probably want to re-initdb in C locale. Most non-C locales have weird
rules that try to approximate dictionary sort order.

regards, tom lane

#3Richard Neill
rn214@hermes.cam.ac.uk
In reply to: Tom Lane (#2)
Re: BUG #1082: Order by doesn't sort correctly.

Tom Lane wrote:

"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:

Description: Order by doesn't sort correctly.

It almost certainly is the correct sort order according to the locale
you're using. Use pg_controldata to check the database locale. You'll
probably want to re-initdb in C locale. Most non-C locales have weird
rules that try to approximate dictionary sort order.

regards, tom lane

Dear Tom,

Thanks for your email. I did check pg_controldata and found:
LC_COLLATE: en_GB
LC_CTYPE: en_GB

The bug isn't the particular ascii-betical (or other) order.
But what I am getting as a supposedly ordered list includes:

Cymbal #1
Cymbal - 18 inch
Cymbal #2

This ordering is perverse! No matter what the priority is of the
different characters, I cannot understand how the above can arise.
Whether '#' comes before or after '-', '#1' and '#2' should be adjacent.

Richard

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Neill (#3)
Re: BUG #1082: Order by doesn't sort correctly.

Richard Neill <rn214@hermes.cam.ac.uk> writes:

This ordering is perverse!

No kidding.

No matter what the priority is of the
different characters, I cannot understand how the above can arise.

You are assuming that it's a byte-at-a-time process. It's not. I
believe the first pass considers only letters and digits.

You can easily prove to yourself that it's not just Postgres. Here's
an example on my Linux laptop:

[tgl@g3 tgl]$ cat zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$ LC_ALL=C sort zzz
Cymbal #1
Cymbal #2
Cymbal - 18 inch
[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$

regards, tom lane

#5Richard Neill
rn214@hermes.cam.ac.uk
In reply to: Tom Lane (#4)
Re: BUG #1082: Order by doesn't sort correctly.

Dear Tom,

Thank you for your explanation. It's very helpful, although I was
extremely surprised! I agree, it's not a postgresql bug.

Can I suggest it might be worth a mention on the "Order By" part of the
documentation.

i.e. this page:
http://www.postgresql.org/docs/7.3/static/sql-select.html#SQL-ORDERBY

could possibly use a little more emphasis of this last paragraph:
Data of character types is sorted according to the locale-specific
collation order that was established when the database cluster was
initialized.

or perhaps a link to here:
http://www.postgresql.org/docs/7.3/static/charset.html#AEN21582

I did realise that the sort would be locale dependent, but failed to
realise it wasn't byte-at-a-time.

Best wishes

Richard

Tom Lane wrote:

Richard Neill <rn214@hermes.cam.ac.uk> writes:

This ordering is perverse!

No kidding.

No matter what the priority is of the
different characters, I cannot understand how the above can arise.

You are assuming that it's a byte-at-a-time process. It's not. I
believe the first pass considers only letters and digits.

You can easily prove to yourself that it's not just Postgres. Here's
an example on my Linux laptop:

[tgl@g3 tgl]$ cat zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$ LC_ALL=C sort zzz
Cymbal #1
Cymbal #2
Cymbal - 18 inch
[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$

regards, tom lane

--
rn214@hermes.cam.ac.uk ** http://www.richardneill.org
Richard Neill, Trinity College, Cambridge, CB21TQ, U.K.

#6Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#4)
Re: BUG #1082: Order by doesn't sort correctly.

Tom Lane wrote:

Richard Neill <rn214@hermes.cam.ac.uk> writes:

This ordering is perverse!

No kidding.

No matter what the priority is of the
different characters, I cannot understand how the above can arise.

You are assuming that it's a byte-at-a-time process. It's not. I
believe the first pass considers only letters and digits.

You can easily prove to yourself that it's not just Postgres. Here's
an example on my Linux laptop:

[tgl@g3 tgl]$ cat zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$ LC_ALL=C sort zzz
Cymbal #1
Cymbal #2
Cymbal - 18 inch
[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$

I verified this, and it's not GB specific as one might suggest... Same
with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense
to anybody?

Regards,
Andreas

#7Devrim GÜNDÜZ
devrim@gunduz.org
In reply to: Andreas Pflug (#6)
Re: BUG #1082: Order by doesn't sort correctly.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

On Tue, 24 Feb 2004, Andreas Pflug wrote:

<snip>

I verified this, and it's not GB specific as one might suggest... Same
with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense
to anybody?

Ummm, same with tr_TR (for a long time).

We digged glibc and found out that tr_TR locale is broken there. Trying to
fix it. Maybe in the next release...

Regards,
- --
Devrim GUNDUZ
devrim@gunduz.org devrim.gunduz@linux.org.tr
http://www.TDMSoft.com
http://www.gunduz.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFAO1Dctl86P3SPfQ4RAotmAJwJTUGRtYezYQDByGJ4f04+bJv3wgCgpz08
TxjQCMoYTYogCBSPYkjppLs=
=YyRq
-----END PGP SIGNATURE-----

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#6)
Re: BUG #1082: Order by doesn't sort correctly.

Andreas Pflug <pgadmin@pse-consulting.de> writes:

I verified this, and it's not GB specific as one might suggest... Same
with en_US, de_DE, fr_FR, af_ZA.

Yeah, most of the locales use dictionary ordering rules.

Does this behaviour really make sense to anybody?

You'd have to argue about it with the people who work on locales.
AFAIK it is a standard of some kind.

regards, tom lane