Issues with german 'Umlaute'

Started by Nicolaus Erichsen over 23 years ago | 3 messages | bugs
#1Nicolaus Erichsen
nico.erichsen@hsh-berlin.com

Hello everybody,

I recently found a problem with sorting German 'Umlaute'. I hope the encoding
of this mail works ;-) :

Postgres puts Umlaute (i.e., äöüÄÖÜ) at the very end of the alphabet, and
this is not the way it should be. I didn't check for the special character
'ß', but it's probably similar.

The canonical sort order for Umlaute is to treat them as two characters, like
this:
ä -> ae
ö -> oe
ü -> ue
ß -> ss
(and the same for upper case 'ÄÖÜ'; 'ß' does not have an upper case)
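
As a rough illustration of the two-character convention above, here is a minimal Python sketch that builds a sort key by expanding each Umlaut before comparing. The names `EXPANSIONS` and `german_sort_key` are made up for this example. It only mimics the ordering in application code and is not how PostgreSQL collates.

```python
# Hypothetical helper: expand Umlaute and 'ß' as listed above (ä -> ae, ...).
EXPANSIONS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def german_sort_key(word):
    """Return a key that sorts Umlaute as their two-character expansion."""
    expanded = word.translate(EXPANSIONS)
    # Compare case-insensitively; the original word only breaks ties.
    return (expanded.casefold(), word)


words = ["Zucker", "Ärger", "Apfel", "Öl", "Ofen"]

# Plain code-point order puts the Umlaute at the very end.
print(sorted(words))
# ['Apfel', 'Ofen', 'Zucker', 'Ärger', 'Öl']

# With the expansion key they sort next to their base letters.
print(sorted(words, key=german_sort_key))
# ['Ärger', 'Apfel', 'Öl', 'Ofen', 'Zucker']
```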

Well, I guess this might be difficult to implement and might have quite an
impact on performance. The solution I know from other databases consists of
inserting ä after a, ö after o, ü after u and ß after s. Afaik this is
generally accepted.

upper() does not handle Umlaute correctly either. It leaves äöü unchanged
instead of converting them to upper case.

All this happens with a database created with encoding ='latin1'. If there
are better results with a different encoding (I didn't try it yet), I'd
suggest adding some information about this in the documentation.

Thanks for your work,

N.Erichsen

--
HSH Soft-und Hardware Vertriebs GmbH
Rudolf-Diesel-Straße 2 - 16321 Lindenberg
Tel. (030) 94004 - 509 Fax (030) 94004 - 400

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nicolaus Erichsen (#1)
Re: Issues with german 'Umlaute'

Nicolaus Erichsen <nico.erichsen@hsh-berlin.com> writes:

I recently found a problem with sorting german 'Umlaute' .

Sounds like you did not set the right locale when creating the database.
You need to be careful to run initdb with LANG (or LC_ALL or at least
LC_COLLATE) set to what you want, probably "de_DE".
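
The effect of the locale on ordering can also be seen outside PostgreSQL. Below is a small Python sketch, assuming a locale named `de_DE.UTF-8` is installed on the machine. The function name `sort_with_locale` is hypothetical, and the exact order depends on the platform's locale data, so no output is shown for the locale-aware sort.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words by the collation rules of locale_name.

    Returns None if the locale is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that follows LC_COLLATE.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


words = ["Zebra", "Äpfel", "Apfel"]

# Code-point order, independent of any locale: 'Ä' sorts after 'Z'.
print(sorted(words))
# ['Apfel', 'Zebra', 'Äpfel']

# Locale-aware order; "de_DE.UTF-8" is an assumed locale name.
print(sort_with_locale(words, "de_DE.UTF-8"))
```

The strings and their encoding are identical in both calls; only the collation setting differs.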

All this happens with a database created with encoding ='latin1'.

Encoding is not the issue, locale is.

regards, tom lane

#3Iavor Raytchev
iavor.raytchev@verysmall.org
In reply to: Tom Lane (#2)
Re: Issues with german 'Umlaute'

Tom Lane wrote:

Nicolaus Erichsen <nico.erichsen@hsh-berlin.com> writes:

I recently found a problem with sorting german 'Umlaute' .

Sounds like you did not set the right locale when creating the database.
You need to be careful to run initdb with LANG (or LC_ALL or at least
LC_COLLATE) set to what you want, probably "de_DE".

All this happens with a database created with encoding ='latin1'.

Encoding is not the issue, locale is.

Then what about having German, English, Italian and French words in the
same database? Shall we create four databases and place each language in
a separate one?

Iavor

--
Iavor Raytchev
very small technologies (a company of CEE Solutions)

www.verysmall.org