BUG #4451: initcap() function capitalizes incorrectly
The following bug has been logged online:
Bug reference: 4451
Logged by: Scott V
Email address: datagenic@gmail.com
PostgreSQL version: 8.3.1
Operating system: Mac OS X 10.5.4
Description: initcap() function capitalizes incorrectly
Details:
initcap() capitalizes incorrectly when passing strings containing certain
two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
The function appears to be incorrectly interpreting the two-byte chars as
non-alphanumeric characters. They are in fact alphanumerics; they just have
diacritical markings.
What's your setting for lc_collate?
//Magnus
Magnus Hagander <magnus@hagander.net> writes:
What's your setting for lc_collate?
I think actually it's lc_ctype that determines case-folding. But the
current theory is that Apple's locale support is simply broken for
UTF-8:
http://archives.postgresql.org/pgsql-general/2008-02/msg01072.php
which means that even if Scott had all his settings right, it wouldn't
work :-( A quick test on OS X here seems to confirm this.
regards, tom lane
Not sure what the correct settings should be, but output from SHOW
ALL in psql says:
lc_collate C
lc_ctype C
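
The reported output is consistent with a byte-wise scan in which only ASCII characters count as alphanumeric, which is what one would expect with lc_ctype set to C. The following is a minimal Python sketch of that idea, not PostgreSQL's actual code; the function names initcap_bytewise and initcap_unicode are hypothetical, and the byte-by-byte scan is an assumption made for illustration.

```python
def initcap_bytewise(text: str) -> str:
    """Model of initcap when only ASCII bytes are classified as alphanumeric.

    Assumption: the string is scanned one byte at a time, so every byte of a
    multi-byte UTF-8 character looks like a word separator.
    """
    data = bytearray(text.encode("utf-8"))
    prev_alnum = False
    for i, b in enumerate(data):
        is_ascii_alnum = b < 0x80 and chr(b).isalnum()
        if is_ascii_alnum:
            ch = chr(b)
            # Uppercase at the start of a "word", lowercase elsewhere.
            data[i] = ord(ch.lower() if prev_alnum else ch.upper())
        prev_alnum = is_ascii_alnum
    return data.decode("utf-8")


def initcap_unicode(text: str) -> str:
    """Character-aware model: letters with diacritics count as alphanumeric."""
    out = []
    prev_alnum = False
    for ch in text:
        is_alnum = ch.isalnum()
        if is_alnum:
            ch = ch.lower() if prev_alnum else ch.upper()
        out.append(ch)
        prev_alnum = is_alnum
    return "".join(out)


word = "m\u0101t\u016br\u0101te"  # 'mātūrāte' with precomposed two-byte characters
print(initcap_bytewise(word))  # reproduces the output from the report
print(initcap_unicode(word))   # the result the reporter expected
```

In the byte-wise model, each letter that follows a two-byte character is treated as the start of a new word, which matches the pattern of capitals in the report.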
On Mon, Oct 6, 2008 at 5:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[...]
Scott Vanderbilt wrote:
Not sure what the correct settings should be, but output from SHOW
ALL in psql says:
lc_collate C
lc_ctype C
There's a chapter on locale support in the user manual:
http://www.postgresql.org/docs/8.3/interactive/locale.html
The right setting depends on what language's collation rules you want to
follow. "locale -a" in a shell should list the available options.
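
To narrow the "locale -a" listing down to UTF-8 candidates, something like the following Python sketch could be used. The helper names utf8_locales and list_system_locales are hypothetical, and the filter assumes the usual `language_TERRITORY.codeset` naming; actual names vary by platform.

```python
import subprocess


def utf8_locales(names):
    """Keep only locale names whose codeset part is UTF-8 (e.g. en_US.UTF-8)."""
    wanted = {"utf8", "utf-8"}
    selected = []
    for name in names:
        if "." not in name:
            continue  # names such as C or POSIX carry no codeset
        # Drop an optional @modifier after the codeset before comparing.
        codeset = name.split(".", 1)[1].split("@")[0]
        if codeset.lower() in wanted:
            selected.append(name)
    return selected


def list_system_locales():
    """Run `locale -a` and return the locale names it prints (POSIX systems)."""
    result = subprocess.run(
        ["locale", "-a"],
        capture_output=True,
        text=True,
        errors="replace",  # some systems print names that are not valid UTF-8
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    for name in utf8_locales(list_system_locales()):
        print(name)
```

Whether a listed locale actually case-folds UTF-8 text correctly still depends on the operating system's locale support.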
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com