BUG #4451: initcap() function capitalizes incorrectly

Started by Scott Vanderbilt · over 17 years ago · 5 messages · bugs
#1 Scott Vanderbilt
datagenic@gmail.com

The following bug has been logged online:

Bug reference: 4451
Logged by: Scott V
Email address: datagenic@gmail.com
PostgreSQL version: 8.3.1
Operating system: Mac OS X 10.5.4
Description: initcap() function capitalizes incorrectly
Details:

initcap() capitalizes incorrectly when passing strings containing certain
two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.

The function appears to be incorrectly interpreting the two-byte chars as
non-alphanumeric characters. They are in fact alphanumerics; they just have
diacritical markings.
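
The reported output is what one would expect if capitalization were done one byte at a time with only ASCII letters and digits counted as alphanumeric. The following is a minimal sketch of that hypothesis in Python, not PostgreSQL's actual code; the name `initcap_bytewise` is made up for illustration, and the input is assumed to use precomposed characters (U+0101, U+016B), each of which encodes to two bytes in UTF-8.

```python
def initcap_bytewise(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical model: capitalize per byte, ASCII-only classification."""
    out = bytearray()
    prev_alnum = False  # was the previous byte an ASCII letter or digit?
    for b in text.encode(encoding):
        ch = bytes([b])
        # Bytes >= 0x80 (every byte of a multibyte UTF-8 sequence) are
        # treated as non-alphanumeric, so they act as word boundaries.
        is_alnum = b < 0x80 and ch.isalnum()
        if is_alnum:
            ch = ch.lower() if prev_alnum else ch.upper()
        out += ch
        prev_alnum = is_alnum
    return out.decode(encoding)


# "m\u0101t\u016br\u0101te" is 'mātūrāte' written with explicit escapes.
print(initcap_bytewise("m\u0101t\u016br\u0101te"))
```

Under these assumptions every ASCII letter that follows an accented character is treated as the start of a new word, which reproduces the 'MāTūRāTe' pattern from the report.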

#2 Magnus Hagander
magnus@hagander.net
In reply to: Scott Vanderbilt (#1)
Re: BUG #4451: initcap() function capitalizes incorrectly

Scott V wrote:

> The following bug has been logged online:
>
> Bug reference: 4451
> Logged by: Scott V
> Email address: datagenic@gmail.com
> PostgreSQL version: 8.3.1
> Operating system: Mac OS X 10.5.4
> Description: initcap() function capitalizes incorrectly
> Details:
>
> initcap() capitalizes incorrectly when passing strings containing certain
> two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
>
> The function appears to be incorrectly interpreting the two-byte chars as
> non-alphanumeric characters. They are in fact alphanumerics; they just have
> diacritical markings.

What's your setting for lc_collate?

//Magnus

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#2)
Re: BUG #4451: initcap() function capitalizes incorrectly

Magnus Hagander <magnus@hagander.net> writes:

> Scott V wrote:
>
> > PostgreSQL version: 8.3.1
> > Operating system: Mac OS X 10.5.4
> >
> > initcap() capitalizes incorrectly when passing strings containing certain
> > two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
> > returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
>
> What's your setting for lc_collate?

I think actually it's lc_ctype that determines case-folding. But the
current theory is that Apple's locale support is simply broken for
UTF-8:
http://archives.postgresql.org/pgsql-general/2008-02/msg01072.php
which means that even if Scott had all his settings right, it wouldn't
work :-( A quick test on OS X here seems to confirm this.

regards, tom lane
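
For contrast, here is a small Python sketch of the result that character-aware classification would give. It relies on Python's built-in Unicode tables rather than any C library locale, so it only illustrates the behaviour a working UTF-8 lc_ctype is expected to provide; the function name `initcap_unicode` is an assumption made for this example.

```python
def initcap_unicode(text: str) -> str:
    """Capitalize the first letter of each alphanumeric run, per character."""
    out = []
    prev_alnum = False  # was the previous character a letter or digit?
    for ch in text:
        is_alnum = ch.isalnum()  # Unicode-aware: accented letters count
        if is_alnum:
            out.append(ch.lower() if prev_alnum else ch.upper())
        else:
            out.append(ch)
        prev_alnum = is_alnum
    return "".join(out)


# "m\u0101t\u016br\u0101te" is 'mātūrāte' written with explicit escapes.
print(initcap_unicode("m\u0101t\u016br\u0101te"))
```

Because the accented letters are classified as alphanumeric here, they no longer split the word, and only the leading 'm' is upper-cased.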

#4 Scott Vanderbilt
datagenic@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #4451: initcap() function capitalizes incorrectly

Not sure what the correct settings should be, but the output from SHOW
ALL in psql says:

lc_collate C
lc_ctype C

On Mon, Oct 6, 2008 at 5:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Magnus Hagander <magnus@hagander.net> writes:
>
> > Scott V wrote:
> >
> > > PostgreSQL version: 8.3.1
> > > Operating system: Mac OS X 10.5.4
> > >
> > > initcap() capitalizes incorrectly when passing strings containing certain
> > > two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
> > > returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
> >
> > What's your setting for lc_collate?
>
> I think actually it's lc_ctype that determines case-folding. But the
> current theory is that Apple's locale support is simply broken for
> UTF-8:
> http://archives.postgresql.org/pgsql-general/2008-02/msg01072.php
> which means that even if Scott had all his settings right, it wouldn't
> work :-( A quick test on OS X here seems to confirm this.
>
> regards, tom lane

#5 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Scott Vanderbilt (#4)
Re: BUG #4451: initcap() function capitalizes incorrectly

Scott Vanderbilt wrote:

> Not sure what the correct settings should be, but the output from SHOW
> ALL in psql says:
>
> lc_collate C
> lc_ctype C

There's a chapter on locale support in the user manual:

http://www.postgresql.org/docs/8.3/interactive/locale.html

The right setting depends on what language's collation rules you want to
follow. "locale -a" in a shell should list the available options.
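
As a rough sketch of how the "locale -a" listing could be narrowed to UTF-8 candidates, the Python snippet below filters the command's output by name. The helper names `utf8_locales` and `available_utf8_locales` are hypothetical, and matching on "utf8" in the locale name is only a naming heuristic: it says nothing about whether the platform's case-folding for that locale actually works.

```python
import subprocess
from typing import List


def utf8_locales(locale_a_output: str) -> List[str]:
    """Keep the locale names that mention UTF-8 (e.g. en_US.UTF-8, fi_FI.utf8)."""
    names = [line.strip() for line in locale_a_output.splitlines() if line.strip()]
    return [n for n in names if "utf8" in n.lower().replace("-", "")]


def available_utf8_locales() -> List[str]:
    """Run `locale -a` (assumed to be on PATH) and filter its output."""
    result = subprocess.run(
        ["locale", "-a"], capture_output=True, text=True, check=True
    )
    return utf8_locales(result.stdout)


# Sample listing, made up for illustration; real output varies by system.
sample = "C\nPOSIX\nen_US.UTF-8\nen_US.ISO8859-1\nfi_FI.utf8\n"
print(utf8_locales(sample))
```

Calling `available_utf8_locales()` on the database host would give the names actually installed there.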

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com