Bug #676: lower(), upper(), & initcap() do not work on utf-8 chars

Started by PostgreSQL Bugs List, almost 24 years ago, 5 messages, bugs
#1 PostgreSQL Bugs List
pgsql-bugs@postgresql.org

Henry House (hajhouse@houseag.com) reports a bug with a severity of 3
The lower the number the more severe it is.

Short Description
lower(), upper(), & initcap() do not work on utf-8 chars

Long Description
The string case manipulation functions lower(), upper(), & initcap()
have no effect on non-ASCII characters in the argument, such as �, �,
�, �, etc. ASCII chars in the argument are properly up- or down-cased.
The database encoding is UTF-8.

Sample Code
SELECT upper('�');

No file was uploaded with this report

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: PostgreSQL Bugs List (#1)
Re: Bug #676: lower(), upper(), & initcap() do not work on utf-8 chars

pgsql-bugs@postgresql.org writes:

The string case manipulation functions lower(), upper(), & initcap()
have no effect on non-ASCII characters in the argument, such as �, �,
�, �, etc. ASCII chars in the argument are properly up- or down-cased.
The database encoding is UTF-8.

lower/upper-casing is driven by locale, not encoding.

Unfortunately you didn't mention anything about your locale setup...

regards, tom lane

#3 Henry House
hajhouse@houseag.com
In reply to: Tom Lane (#2)
Re: Bug #676: lower(), upper(), & initcap() do not work on utf-8 chars

On Sat, May 25, 2002 at 12:56:06AM -0400, Tom Lane wrote:

pgsql-bugs@postgresql.org writes:

The string case manipulation functions lower(), upper(), & initcap()
have no effect on non-ASCII characters in the argument, such as �, �,
�, �, etc. ASCII chars in the argument are properly up- or down-cased.
The database encoding is UTF-8.

lower/upper-casing is driven by locale, not encoding.

Unfortunately you didn't mention anything about your locale setup...

The server locale is en_US.UTF-8. (At least I set it up as such when
installing PostgreSQL; I know no way to verify.) The server version is 7.2.1,
running on a IA32 and a DEC Alpha; both machines show the same behavior. Both
are Debian Linux. Perhaps the bug lies in the locale definition supplied by
Debian?

--
Henry House
The attached file is a digital signature. See <http://romana.hajhouse.org/pgp>
for information. My OpenPGP key: <http://romana.hajhouse.org/hajhouse.asc>.

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Henry House (#3)
Re: Bug #676: lower(), upper(), & initcap() do not work on utf-8 chars

Henry House <hajhouse@houseag.com> writes:

Unfortunately you didn't mention anything about your locale setup...

The server locale is en_US.UTF-8. (At least I set it up as such when
installing PostgreSQL; I know no way to verify.) The server version is 7.2.1,
running on a IA32 and a DEC Alpha; both machines show the same behavior. Both
are Debian Linux. Perhaps the bug lies in the locale definition supplied by
Debian?

Offhand I'd not necessarily expect an en_US locale to upcase/downcase
anything except a-z/A-Z. Perhaps you need to use a different locale.

I'd suggest taking this up with a locale expert, which I surely am
not.

regards, tom lane

#5 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Henry House (#3)
Re: Bug #676: lower(), upper(), & initcap() do not work on

lower/upper-casing is driven by locale, not encoding.

Unfortunately you didn't mention anything about your locale setup...

The server locale is en_US.UTF-8. (At least I set it up as such when
installing PostgreSQL; I know no way to verify.) The server version is 7.2.1,
running on a IA32 and a DEC Alpha; both machines show the same behavior. Both
are Debian Linux. Perhaps the bug lies in the locale definition supplied by
Debian?

I don't think the current locale support code works with multibyte
encodings such as UTF-8. See the thread titled "Bug #659:
lower()/upper() bug on" on pgsql-bugs and pgsql-hackers.

In the meantime, a workaround would be something like:

select convert(lower(convert('X', 'LATIN1')),'LATIN1','UNICODE');

That will convert UTF-8 'X' to its lower case if you are sure that 'X'
could be converted to ISO-8859-1.
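
The round trip in that query can be sketched outside the database. The following is only an illustration in Python, not PostgreSQL's implementation; the function name lower_via_latin1 is made up for this example. It assumes the input is valid UTF-8 and shows why the method only works when every character has an ISO-8859-1 representation.

```python
def lower_via_latin1(utf8_bytes: bytes) -> bytes:
    """Lower-case UTF-8 input by round-tripping through ISO-8859-1.

    Hypothetical helper that mirrors the three steps of the SQL workaround.
    """
    text = utf8_bytes.decode("utf-8")

    # Step 1: UTF-8 -> LATIN1. This raises UnicodeEncodeError if any
    # character cannot be represented in ISO-8859-1.
    latin1_bytes = text.encode("latin-1")

    # Step 2: lower-case in the single-byte encoding, where one byte is
    # exactly one character, so a per-byte mapping is safe.
    lowered = bytes(ord(chr(b).lower()) for b in latin1_bytes)

    # Step 3: LATIN1 -> UTF-8.
    return lowered.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    sample = "\u00c9COLE".encode("utf-8")   # capital E with acute, then "COLE"
    print(lower_via_latin1(sample).decode("utf-8"))

    try:
        lower_via_latin1("\u0416".encode("utf-8"))  # Cyrillic, not in LATIN1
    except UnicodeEncodeError:
        print("not representable in ISO-8859-1")
```

A character outside ISO-8859-1 makes the first step fail, which is the restriction stated above.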

Of course the problem with this method is:

Someone has suggested a fix to me that uses UTF-8 locales, but I'm worried
about relying on UTF-8 locales and am waiting for the test results with my
Japanese data.
--
Tatsuo Ishii