Collation in ORDER BY not lexicographical

Started by Paul Gasparover 16 years ago5 messagesgeneral
Jump to latest
#1Paul Gaspar
devlist@revolversoft.com

Hi!

We have big problems with collation in ORDER BY, which happens in
binary order, not alphabetic (lexicographical), like:.

A
B
Z
a
z
Ä
Ö
ä
ö

This is what we have done:

1. initdb -U myuser -E UTF-8 --locale=de_DE.UTF-8 -D /mydata

2. pg_ctl -U myuser -D /mydata -l logfile start

3. createdb -U myuser mydb

4. create table mytable(name text)

5. insert into mytable values('Adam'); ....

6. select * from mytable order by name

The result is:

"Adam"
"Berta"
"Mann"
"Zoo"
"Zoodirektor"
"adam"
"Äpfel"
"Öl"
"äpfel"

"locale -a" shows:

de_DE
de_DE.ISO8859-1
de_DE.ISO8859-15
de_DE.UTF-8

"psql -l" shows:

                                            List of databases
    Name       |     Owner      | Encoding |  Collation  |    Ctype     
|         Access privileges
----------+-----------+----------+-------------+------------- 
+-----------------------------------
  postgres  | myuser | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |
  template0 | myuser | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 | =c/myuser
: myuser =CTc/myuser
  template1 | myuser | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 | =c/myuser
                                                                    :  
myuser =CTc/myuser
  mydb      | myuser | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |

PG is running on Mac OS X 10.5 and 10.6 Intel.

Any help is appreciated. Thanks very much in advance.

Paul

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: Paul Gaspar (#1)
Re: Collation in ORDER BY not lexicographical

On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com> wrote:

Hi!

We have big problems with collation in ORDER BY, which happens in binary
order, not alphabetic (lexicographical), like:.

A
B
Z
a
z
Ä
Ö
ä
ö

PG is running on Mac OS X 10.5 and 10.6 Intel.

I seem to recall there were some problem with Mac locales at some
point being broken. Could be you're running into that issue.

#3Maximilian Tyrtania
maximilian.tyrtania@onlinehome.de
In reply to: Scott Marlowe (#2)
Re: Collation in ORDER BY not lexicographical

am 29.09.2009 11:21 Uhr schrieb Scott Marlowe unter scott.marlowe@gmail.com:

On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com> wrote:

Hi!

We have big problems with collation in ORDER BY, which happens in binary
order, not alphabetic (lexicographical), like:.

A
B
Z
a
z
Ä
Ö
ä
ö

PG is running on Mac OS X 10.5 and 10.6 Intel.

I seem to recall there were some problem with Mac locales at some
point being broken. Could be you're running into that issue.

Yep, i ran into this as well. Here is my workaround: Create a function like
this:

CREATE OR REPLACE FUNCTION f_getorderbyfriendlyversion(texttoconvert text)

RETURNS text AS
$BODY$
select
replace(replace(replace(replace(replace(replace($1,'Ä','A'),'Ö','O'),'Ü','U'
),'ä','a'),'ö','o'),'ü','u');

$BODY$

LANGUAGE 'sql' IMMUTABLE STRICT
COST 100;

ALTER FUNCTION f_getorderbyfriendlyversion(text) OWNER TO postgres;

Then create an index like this:

create index idx_personen_nachname_orderByFriendly on personen
(f_getorderbyfriendlyversion(nachname))

Now you can do:

select * from personen order by f_getorderbyfriendlyversion(p.nachname)

Seems pretty fast.

Best,

Maximilian Tyrtania

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Scott Marlowe (#2)
Re: Collation in ORDER BY not lexicographical

On Tue, 2009-09-29 at 03:21 -0600, Scott Marlowe wrote:

On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com> wrote:

Hi!

We have big problems with collation in ORDER BY, which happens in binary
order, not alphabetic (lexicographical), like:.

A
B
Z
a
z
Ä
Ö
ä
ö

PG is running on Mac OS X 10.5 and 10.6 Intel.

I seem to recall there were some problem with Mac locales at some
point being broken. Could be you're running into that issue.

Yes, the UTF8 locales on BSD systems (Mac OS X, FreeBSD, etc.) are
dysfunctional. Either switch to a non-UTF8 locale or a different
operating system.

#5Paul Gaspar
devlist@revolversoft.com
In reply to: Maximilian Tyrtania (#3)
Re: Collation in ORDER BY not lexicographical

Thank you all very much for your help.

Maximilian, we simplified your replacing code:

replace(replace(replace(replace(replace(replace
($1,'Ä','A'),'Ö','O'),'Ü','U'
),'ä','a'),'ö','o'),'ü','u');

to this:

translate(upper($1),'ÄÖÜ','AOU')

Paul

Am 29.09.2009 um 14:36 schrieb Maximilian Tyrtania:

Show quoted text

am 29.09.2009 11:21 Uhr schrieb Scott Marlowe unter scott.marlowe@gmail.com
:

On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com

wrote:
Hi!

We have big problems with collation in ORDER BY, which happens in
binary
order, not alphabetic (lexicographical), like:.

PG is running on Mac OS X 10.5 and 10.6 Intel.

I seem to recall there were some problem with Mac locales at some
point being broken. Could be you're running into that issue.

Yep, i ran into this as well. Here is my workaround: Create a
function like
this:

CREATE OR REPLACE FUNCTION f_getorderbyfriendlyversion(texttoconvert
text)

RETURNS text AS
$BODY$
select
replace(replace(replace(replace(replace(replace
($1,'Ä','A'),'Ö','O'),'Ü','U'
),'ä','a'),'ö','o'),'ü','u');

$BODY$

LANGUAGE 'sql' IMMUTABLE STRICT
COST 100;

ALTER FUNCTION f_getorderbyfriendlyversion(text) OWNER TO postgres;

Then create an index like this:

create index idx_personen_nachname_orderByFriendly on personen
(f_getorderbyfriendlyversion(nachname))

Now you can do:

select * from personen order by f_getorderbyfriendlyversion
(p.nachname)

Seems pretty fast.

Best,

Maximilian Tyrtania