BUG #2317: Wrong sorting order for (VW)
The following bug has been logged online:
Bug reference: 2317
Logged by: TomasKlockar
Email address: tomask@omicron.se
PostgreSQL version: 7.3.2/7.4.7
Operating system: linux(fedora)
Description: Wrong sorting order for (VW)
Details:
SELECT cname FROM clients ORDER BY cname;
sorts W before V and that is a mistake.
the result is
V
W
WHI
Vi
Wi
Volvo
Correct order would be to place all vV before all wW.
I think the database was initialized with UTF8 but it might have been
ISO_8859-1, however the error is easily repeatable and ill be happy to
provide a dump.
I tested this on 7.3.2 and 7.4.7 and the error was repeatable.
regards,
/Tomas
On Mon, 13 Mar 2006, TomasKlockar wrote:
The following bug has been logged online:
Bug reference: 2317
Logged by: TomasKlockar
Email address: tomask@omicron.se
PostgreSQL version: 7.3.2/7.4.7
Operating system: linux(fedora)
Description: Wrong sorting order for (VW)
Details:SELECT cname FROM clients ORDER BY cname;
sorts W before V and that is a mistake.
the result is
V
W
WHI
Vi
Wi
VolvoCorrect order would be to place all vV before all wW.
I think the database was initialized with UTF8 but it might have been
ISO_8859-1, however the error is easily repeatable and ill be happy to
provide a dump.I tested this on 7.3.2 and 7.4.7 and the error was repeatable.
I think the most important thing we'd need to know is what locale the
database was initialized with in order to try to reproduce.
If you put similar data in a file and use the unix "sort" command with the
same locale, do you get the same order?
"TomasKlockar" <tomask@omicron.se> writes:
I think the database was initialized with UTF8 but it might have been
ISO_8859-1, however the error is easily repeatable and ill be happy to
provide a dump.
If the database encoding doesn't match the database locale (server
LC_COLLATE setting), all kinds of very strange sorting behavior can
ensue. I suspect you've been bit by this gotcha.
regards, tom lane
The following bug has been logged online:
Bug reference: 2317
Logged by: TomasKlockar
Email address: tomask@omicron.se
PostgreSQL version: 7.3.2/7.4.7
Operating system: linux(fedora)
Description: Wrong sorting order for (VW)
Details:SELECT cname FROM clients ORDER BY cname;
sorts W before V and that is a mistake.
the result is
V
W
WHI
Vi
Wi
VolvoCorrect order would be to place all vV before all wW.
I think the database was initialized with UTF8 but it might
have been
ISO_8859-1, however the error is easily repeatable and ill
be happy to
provide a dump.
I tested this on 7.3.2 and 7.4.7 and the error was repeatable.
I think the most important thing we'd need to know is what
locale the database was initialized with in order to try to reproduce.If you put similar data in a file and use the unix "sort"
command with the same locale, do you get the same order?
That looks like the typical swedish locale, which sorts V and W as the
same character. Yes, that can be very annoying for some of us :-), but
that's the way it's defined.
//Magnus
Import Notes
Resolved by subject fallback
Magnus Hagander wrote:
The following bug has been logged online:
Bug reference: 2317
Logged by: TomasKlockar
Email address: tomask@omicron.se
PostgreSQL version: 7.3.2/7.4.7
Operating system: linux(fedora)
Description: Wrong sorting order for (VW)
Details:SELECT cname FROM clients ORDER BY cname;
sorts W before V and that is a mistake.
the result is
V
W
WHI
Vi
Wi
VolvoCorrect order would be to place all vV before all wW.
I think the database was initialized with UTF8 but it might
have been
ISO_8859-1, however the error is easily repeatable and ill
be happy to
provide a dump.
I tested this on 7.3.2 and 7.4.7 and the error was repeatable.
I think the most important thing we'd need to know is what
locale the database was initialized with in order to try to reproduce.If you put similar data in a file and use the unix "sort"
command with the same locale, do you get the same order?That looks like the typical swedish locale, which sorts V and W as the
same character. Yes, that can be very annoying for some of us :-), but
that's the way it's defined.//Magnus
I would say that sorting V and W as the same character may work in a
dictionary or sometimes when sorting names, however in that case why
don't it sort C and K as the same character? and in some cases C and S
as the same.
The best sorting algorithm should sort all characters as separate
characters, and if you want fancy sorting you should need to turn it on
for a table.
Fancy sorting, like sorting Carlsson and Karlsson together since they
are equal in the same way as Viktor=Wiktor=Victor=Wictor. Cesar and
Sesar would also have to be sorted together since they are pronounced
the same in swedish, and if you look in a phonebook you would find them
at the same place.
Now I get the english words was and vas sorted together when I need them
separated.
Currently the sortingfunction in postgreSQL have disqualified itself
from beeing used, and I do the sorting in java which treat them as
separate caracters.
This is my 5 cent,
/Tomas
The following bug has been logged online:
Bug reference: 2317
Logged by: TomasKlockar
Email address: tomask@omicron.se
PostgreSQL version: 7.3.2/7.4.7
Operating system: linux(fedora)
Description: Wrong sorting order for (VW)
Details:SELECT cname FROM clients ORDER BY cname;
sorts W before V and that is a mistake.
the result is
V
W
WHI
Vi
Wi
VolvoCorrect order would be to place all vV
before all wW.I think the database was initialized
with UTF8 but it mighthave been
ISO_8859-1, however the error is easily
repeatable and illbe happy to
provide a dump.
I tested this on 7.3.2 and 7.4.7 and
the error was repeatable.I think the most important thing we'd need to
know is what
locale the database was initialized with in
order to try to reproduce.If you put similar data in a file and use the
unix "sort"
command with the same locale, do you get the same order?That looks like the typical swedish locale, which sorts
V and W as the
same character. Yes, that can be very annoying for some
of us :-), but
that's the way it's defined.//Magnus
I would say that sorting V and W as the same character may
work in a dictionary or sometimes when sorting names, however
in that case why don't it sort C and K as the same character?
and in some cases C and S as the same.The best sorting algorithm should sort all characters as
separate characters, and if you want fancy sorting you should
need to turn it on for a table.Fancy sorting, like sorting Carlsson and Karlsson together
since they are equal in the same way as
Viktor=Wiktor=Victor=Wictor. Cesar and Sesar would also have
to be sorted together since they are pronounced the same in
swedish, and if you look in a phonebook you would find them
at the same place.Now I get the english words was and vas sorted together when
I need them separated.Currently the sortingfunction in postgreSQL have disqualified
itself from beeing used, and I do the sorting in java which
treat them as separate caracters.
Well, PostgreSQL uses the locale functionatlity provided by your OS, so you will need to talk to them. Perhaps they already ship a different locale definition that is more suitable for your needs that you can change to?
If you select locale=C you will get the sort all chars as separate characters. It will break your sorting of åäö, because they are not in the correct sequence in LATIN1 (or UTF8 for that matter), but you can certainly do that. It's a matter of picking the correct locale when you initdb your database.
//Magnus
Import Notes
Resolved by subject fallback
On Thu, Mar 16, 2006 at 11:03:59AM +0100, Tomas Klockar wrote:
[...]
I would say that sorting V and W as the same character may work in a
dictionary [...]
Currently the sortingfunction in postgreSQL have disqualified itself
from beeing used, and I do the sorting in java which treat them as
separate caracters.This is my 5 cent,
Tomas, as it has been said, PostgreSQL uses the operating system's
locale. So you can do two things: change it (e.b. to C , which is
neutral) or fix it. If you are on an unix-like OS, man 5 locale might
help with that.
Regards
-- tomás