BUG #1082: Order by doesn't sort correctly.
The following bug has been logged online:
Bug reference: 1082
Logged by: Richard Neill
Email address: rn214@cam.ac.uk
PostgreSQL version: 7.3.4
Operating system: Linux
Description: Order by doesn't sort correctly.
Details:
ORDER BY sorts the following in this order:
Cymbal #1
Cymbal - 18 inch
Cymbal #2
It ought to be thus:
Cymbal #1
Cymbal #2
Cymbal - 18 inch
or possibly thus:
Cymbal - 18 inch
Cymbal #1
Cymbal #2
-------------------------------------------------
Here's an example sql script to reproduce the bug.
CREATE TABLE tbl_testinstruments(
instrumentid integer, instrument character varying(300)
);
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (1,
'Antique Cymbals #1');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (2,
'Antique Cymbals #2');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (3, 'Clash
Cymbals, French - 20 inch');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (4,
'Cymbal #1');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (5,
'Cymbal #2');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (6,
'Cymbal - 18 inch');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (7,
'Cymbal, Sizzle');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (8,
'Cymbal, Splash');
SELECT instrument FROM tbl_testinstruments ORDER BY instrument;
------------------------------------------------
This is the output I get:
instrument
---------------------------------
Antique Cymbals #1
Antique Cymbals #2
Clash Cymbals, French - 20 inch
Cymbal #1
Cymbal - 18 inch
Cymbal #2
Cymbal, Sizzle
Cymbal, Splash
(8 rows)
-------------------------------------------------
I'm using version:
PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.1
(Mandrake Linux 9.2 3.3.1-1mdk)
"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
Description: Order by doesn't sort correctly.
It almost certainly is the correct sort order according to the locale
you're using. Use pg_controldata to check the database locale. You'll
probably want to re-initdb in C locale. Most non-C locales have weird
rules that try to approximate dictionary sort order.
regards, tom lane
Tom Lane wrote:
"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
Description: Order by doesn't sort correctly.
It almost certainly is the correct sort order according to the locale
you're using. Use pg_controldata to check the database locale. You'll
probably want to re-initdb in C locale. Most non-C locales have weird
rules that try to approximate dictionary sort order.regards, tom lane
Dear Tom,
Thanks for your email. I did check pg_controldata and found:
LC_COLLATE: en_GB
LC_CTYPE: en_GB
The bug isn't the particular ascii-betical (or other) order.
But what I am getting as a supposedly ordered list includes:
Cymbal #1
Cymbal - 18 inch
Cymbal #2
This ordering is perverse! No matter what the priority is of the
different characters, I cannot understand how the above can arise.
Whether '#' comes before or after '-', '#1' and '#2' should be adjacent.
Richard
Richard Neill <rn214@hermes.cam.ac.uk> writes:
This ordering is perverse!
No kidding.
No matter what the priority is of the
different characters, I cannot understand how the above can arise.
You are assuming that it's a byte-at-a-time process. It's not. I
believe the first pass considers only letters and digits.
You can easily prove to yourself that it's not just Postgres. Here's
an example on my Linux laptop:
[tgl@g3 tgl]$ cat zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$ LC_ALL=C sort zzz
Cymbal #1
Cymbal #2
Cymbal - 18 inch
[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$
regards, tom lane
Dear Tom,
Thank you for your explanation. It's very helpful, although I was
extremely surprised! I agree, it's not a postgresql bug.
Can I suggest it might be worth a mention on the "Order By" part of the
documentation.
i.e. this page:
http://www.postgresql.org/docs/7.3/static/sql-select.html#SQL-ORDERBY
could possibly use a little more emphasis of this last paragraph:
Data of character types is sorted according to the locale-specific
collation order that was established when the database cluster was
initialized.
or perhaps a link to here:
http://www.postgresql.org/docs/7.3/static/charset.html#AEN21582
I did realise that the sort would be locale dependent, but failed to
realise it wasn't byte-at-a-time.
Best wishes
Richard
Tom Lane wrote:
Richard Neill <rn214@hermes.cam.ac.uk> writes:
This ordering is perverse!
No kidding.
No matter what the priority is of the
different characters, I cannot understand how the above can arise.You are assuming that it's a byte-at-a-time process. It's not. I
believe the first pass considers only letters and digits.You can easily prove to yourself that it's not just Postgres. Here's
an example on my Linux laptop:[tgl@g3 tgl]$ cat zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$ LC_ALL=C sort zzz
Cymbal #1
Cymbal #2
Cymbal - 18 inch
[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$regards, tom lane
--
rn214@hermes.cam.ac.uk ** http://www.richardneill.org
Richard Neill, Trinity College, Cambridge, CB21TQ, U.K.
Tom Lane wrote:
Richard Neill <rn214@hermes.cam.ac.uk> writes:
This ordering is perverse!
No kidding.
No matter what the priority is of the
different characters, I cannot understand how the above can arise.You are assuming that it's a byte-at-a-time process. It's not. I
believe the first pass considers only letters and digits.You can easily prove to yourself that it's not just Postgres. Here's
an example on my Linux laptop:[tgl@g3 tgl]$ cat zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$ LC_ALL=C sort zzz
Cymbal #1
Cymbal #2
Cymbal - 18 inch
[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$
I verified this, and it's not GB specific as one might suggest... Same
with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense
to anybody?
Regards,
Andreas
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
On Tue, 24 Feb 2004, Andreas Pflug wrote:
<snip>
I verified this, and it's not GB specific as one might suggest... Same
with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense
to anybody?
Ummm, same with tr_TR (for a long time).
We digged glibc and found out that tr_TR locale is broken there. Trying to
fix it. Maybe in the next release...
Regards,
- --
Devrim GUNDUZ
devrim@gunduz.org devrim.gunduz@linux.org.tr
http://www.TDMSoft.com
http://www.gunduz.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
iD8DBQFAO1Dctl86P3SPfQ4RAotmAJwJTUGRtYezYQDByGJ4f04+bJv3wgCgpz08
TxjQCMoYTYogCBSPYkjppLs=
=YyRq
-----END PGP SIGNATURE-----
Andreas Pflug <pgadmin@pse-consulting.de> writes:
I verified this, and it's not GB specific as one might suggest... Same
with en_US, de_DE, fr_FR, af_ZA.
Yeah, most of the locales use dictionary ordering rules.
Does this behaviour really make sense to anybody?
You'd have to argue about it with the people who work on locales.
AFAIK it is a standard of some kind.
regards, tom lane