Collation in ORDER BY not lexicographical
Hi!
We have big problems with collation in ORDER BY, which happens in
binary order, not alphabetic (lexicographical), like:.
A
B
Z
a
z
Ä
Ö
ä
ö
This is what we have done:
1. initdb -U myuser -E UTF-8 --locale=de_DE.UTF-8 -D /mydata
2. pg_ctl -U myuser -D /mydata -l logfile start
3. createdb -U myuser mydb
4. create table mytable(name text)
5. insert into mytable values('Adam'); ....
6. select * from mytable order by name
The result is:
"Adam"
"Berta"
"Mann"
"Zoo"
"Zoodirektor"
"adam"
"Äpfel"
"Öl"
"äpfel"
"locale -a" shows:
de_DE
de_DE.ISO8859-1
de_DE.ISO8859-15
de_DE.UTF-8
"psql -l" shows:
List of databases
Name | Owner | Encoding | Collation | Ctype
| Access privileges
----------+-----------+----------+-------------+-------------
+-----------------------------------
postgres | myuser | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 |
template0 | myuser | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 | =c/myuser
: myuser =CTc/myuser
template1 | myuser | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 | =c/myuser
:
myuser =CTc/myuser
mydb | myuser | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 |
PG is running on Mac OS X 10.5 and 10.6 Intel.
Any help is appreciated. Thanks very much in advance.
Paul
On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com> wrote:
Hi!
We have big problems with collation in ORDER BY, which happens in binary
order, not alphabetic (lexicographical), like:.A
B
Z
a
z
Ä
Ö
ä
ö
PG is running on Mac OS X 10.5 and 10.6 Intel.
I seem to recall there were some problem with Mac locales at some
point being broken. Could be you're running into that issue.
am 29.09.2009 11:21 Uhr schrieb Scott Marlowe unter scott.marlowe@gmail.com:
On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com> wrote:
Hi!
We have big problems with collation in ORDER BY, which happens in binary
order, not alphabetic (lexicographical), like:.A
B
Z
a
z
Ä
Ö
ä
öPG is running on Mac OS X 10.5 and 10.6 Intel.
I seem to recall there were some problem with Mac locales at some
point being broken. Could be you're running into that issue.
Yep, i ran into this as well. Here is my workaround: Create a function like
this:
CREATE OR REPLACE FUNCTION f_getorderbyfriendlyversion(texttoconvert text)
RETURNS text AS
$BODY$
select
replace(replace(replace(replace(replace(replace($1,'Ä','A'),'Ö','O'),'Ü','U'
),'ä','a'),'ö','o'),'ü','u');
$BODY$
LANGUAGE 'sql' IMMUTABLE STRICT
COST 100;
ALTER FUNCTION f_getorderbyfriendlyversion(text) OWNER TO postgres;
Then create an index like this:
create index idx_personen_nachname_orderByFriendly on personen
(f_getorderbyfriendlyversion(nachname))
Now you can do:
select * from personen order by f_getorderbyfriendlyversion(p.nachname)
Seems pretty fast.
Best,
Maximilian Tyrtania
On Tue, 2009-09-29 at 03:21 -0600, Scott Marlowe wrote:
On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com> wrote:
Hi!
We have big problems with collation in ORDER BY, which happens in binary
order, not alphabetic (lexicographical), like:.A
B
Z
a
z
Ä
Ö
ä
öPG is running on Mac OS X 10.5 and 10.6 Intel.
I seem to recall there were some problem with Mac locales at some
point being broken. Could be you're running into that issue.
Yes, the UTF8 locales on BSD systems (Mac OS X, FreeBSD, etc.) are
dysfunctional. Either switch to a non-UTF8 locale or a different
operating system.
Thank you all very much for your help.
Maximilian, we simplified your replacing code:
replace(replace(replace(replace(replace(replace
($1,'Ä','A'),'Ö','O'),'Ü','U'
),'ä','a'),'ö','o'),'ü','u');
to this:
translate(upper($1),'ÄÖÜ','AOU')
Paul
Am 29.09.2009 um 14:36 schrieb Maximilian Tyrtania:
Show quoted text
am 29.09.2009 11:21 Uhr schrieb Scott Marlowe unter scott.marlowe@gmail.com
:On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com
wrote:
Hi!We have big problems with collation in ORDER BY, which happens in
binary
order, not alphabetic (lexicographical), like:.PG is running on Mac OS X 10.5 and 10.6 Intel.
I seem to recall there were some problem with Mac locales at some
point being broken. Could be you're running into that issue.Yep, i ran into this as well. Here is my workaround: Create a
function like
this:CREATE OR REPLACE FUNCTION f_getorderbyfriendlyversion(texttoconvert
text)RETURNS text AS
$BODY$
select
replace(replace(replace(replace(replace(replace
($1,'Ä','A'),'Ö','O'),'Ü','U'
),'ä','a'),'ö','o'),'ü','u');$BODY$
LANGUAGE 'sql' IMMUTABLE STRICT
COST 100;ALTER FUNCTION f_getorderbyfriendlyversion(text) OWNER TO postgres;
Then create an index like this:
create index idx_personen_nachname_orderByFriendly on personen
(f_getorderbyfriendlyversion(nachname))Now you can do:
select * from personen order by f_getorderbyfriendlyversion
(p.nachname)Seems pretty fast.
Best,
Maximilian Tyrtania