BUG #1721: multibyte character string comparison error
The following bug has been logged online:
Bug reference: 1721
Logged by: Chii-Tung Liu
Email address: cdliou@mail.cyut.edu.tw
PostgreSQL version: 8.0.3
Operating system: Windows XP SP2
Description: multibyte character string comparison error
Details:
When comparing two UTF-8 encoded strings that contain Chinese words, the
result is always TRUE
1. create a database test with encoding set to unicode
CREATE DATABASE test
WITH OWNER = postgres
ENCODING = 'UNICODE'
TABLESPACE = pg_default;
2. insert data with Chinese words
INSERT INTO node (title) VALUES ('1 中文');
3. SELECT title from node where title > '1.1 '
would return '1 中文'
4. Both SELECT '1 中文' > '1.1' and SELECT '1.1' > '1 中文' return
FALSE
"Chii-Tung Liu" <cdliou@mail.cyut.edu.tw> writes:
PostgreSQL version: 8.0.3
Operating system: Windows XP SP2
When comparing two UTF-8 encoded strings that contain Chinese words, the
result is always TRUE
Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
Use some other database encoding.
regards, tom lane
On Sun, 19 Jun 2005, Tom Lane wrote:
"Chii-Tung Liu" <cdliou@mail.cyut.edu.tw> writes:
PostgreSQL version: 8.0.3
Operating system: Windows XP SP2
When comparing two UTF-8 encoded strings that contain Chinese words, the
result is always TRUE
Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
Use some other database encoding.
Shouldn't we forbid its creation then? At least a strongly worded
warning? We see these complaints too often.
Kris Jurka
Kris Jurka <books@ejurka.com> writes:
On Sun, 19 Jun 2005, Tom Lane wrote:
Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
Use some other database encoding.
Shouldn't we forbid its creation then?
There was serious discussion of that before the 8.0 release, but
we decided not to forbid it. Check the archives; I don't recall
the reasoning at the moment.
We see these complaints too often.
There are lots of complaints we see way too often ;-) ... but
distressingly, there are still only 24 hours in a day.
regards, tom lane
The following bug has been logged online:
Bug reference: 1721
Logged by: Chii-Tung Liu
Email address: cdliou@mail.cyut.edu.tw
PostgreSQL version: 8.0.3
Operating system: Windows XP SP2
Description: multibyte character string comparison error
Details:
When comparing two UTF-8 encoded strings that contain Chinese words, the
result is always TRUE
1. create a database test with encoding set to unicode
CREATE DATABASE test
WITH OWNER = postgres
ENCODING = 'UNICODE'
TABLESPACE = pg_default;
2. insert data with Chinese words
INSERT INTO node (title) VALUES ('1 中文');
3. SELECT title from node where title > '1.1 '
would return '1 中文'
4. Both SELECT '1 中文' > '1.1' and SELECT '1.1' > '1 中文' return
FALSE
I think you need to use C locale.
--
Tatsuo Ishii
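To illustrate the C locale suggestion: a minimal sketch, assuming that under the C locale a text comparison reduces to a bytewise (memcmp-style) comparison of the encoded strings. It runs outside the database, and c_locale_cmp is a hypothetical helper name, not a PostgreSQL function.

```python
# Sketch only: emulate a C-locale (bytewise) comparison of UTF-8 strings.
# Assumption: under the C locale, ordering follows the raw encoded bytes.
# c_locale_cmp is a hypothetical helper, not part of PostgreSQL.

def c_locale_cmp(a: str, b: str) -> int:
    """Return -1, 0, or 1 by comparing the UTF-8 bytes of a and b."""
    a_bytes = a.encode("utf-8")
    b_bytes = b.encode("utf-8")
    return (a_bytes > b_bytes) - (a_bytes < b_bytes)

left, right = "1 中文", "1.1"

# The strings first differ at the second byte: 0x20 (space) vs 0x2E ('.').
print(c_locale_cmp(left, right))   # -1, so '1 中文' > '1.1' is false
print(c_locale_cmp(right, left))   # 1, so '1.1' > '1 中文' is true
```

With bytewise ordering exactly one of the two comparisons from step 4 is true, unlike the reported behaviour where both return FALSE.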
Tom Lane wrote:
Kris Jurka <books@ejurka.com> writes:
On Sun, 19 Jun 2005, Tom Lane wrote:
Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
Use some other database encoding.
Shouldn't we forbid its creation then?
There was serious discussion of that before the 8.0 release, but
we decided not to forbid it. Check the archives; I don't recall
the reasoning at the moment.
UTF8 encoding works with the C locale assuming you don't care about
ordering of the character set, e.g. Japanese.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
UTF8 encoding works with the C locale assuming you don't care
about ordering of the character set, e.g. Japanese.
Has anyone with the ability to compile postgresql on windows tested the
ICU patch?
... John
UTF8 encoding works with the C locale assuming you don't care about
ordering of the character set, e.g. Japanese.
Has anyone with the ability to compile postgresql on windows
tested the ICU patch?
Yes.
See http://archives.postgresql.org/pgsql-hackers/2005-05/msg00662.php
//Magnus
Tom Lane wrote:
Kris Jurka <books@ejurka.com> writes:
On Sun, 19 Jun 2005, Tom Lane wrote:
Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
Use some other database encoding.
Shouldn't we forbid its creation then?
There was serious discussion of that before the 8.0 release, but
we decided not to forbid it. Check the archives; I don't recall
the reasoning at the moment.
UTF8 encoding works with the C locale assuming you don't care about
ordering of the character set, e.g. Japanese.
No, sometimes Japanese needs character ordering too, and I think this is
not a Windows-only problem. The real problem is that Unicode defines
character order in a totally random manner, because Chinese/Japanese/Korean
Kanji characters are "Unified" in Unicode. To solve the problem, we can
convert UTF8 to EUC_JP using CONVERT. See the archives for more details.
Or you can use a Unicode locale, but only if your platform's locale
database is not broken and you only use a single locale.
--
Tatsuo Ishii
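The point about converting to EUC_JP before ordering can be sketched outside the database. The following is an illustration only, assuming Python's built-in "euc_jp" codec as a stand-in for PostgreSQL's CONVERT; sort_by_encoded_bytes is a hypothetical helper name, and the two sample Kanji are chosen only as an example.

```python
# Sketch only: sort strings by their encoded bytes in a chosen encoding.
# Assumption: Python's "euc_jp" codec stands in for converting UTF8 to
# EUC_JP in the database; sort_by_encoded_bytes is a hypothetical helper.

def sort_by_encoded_bytes(words, encoding):
    """Return words sorted by the bytes they encode to in `encoding`.

    Raises UnicodeEncodeError if a word cannot be represented in `encoding`.
    """
    return sorted(words, key=lambda w: w.encode(encoding))

kanji = ["一", "亜"]

# Bytewise order in UTF-8 follows Unicode code points; bytewise order in
# EUC_JP follows the JIS character set layout, so the results can differ.
print(sort_by_encoded_bytes(kanji, "utf-8"))
print(sort_by_encoded_bytes(kanji, "euc_jp"))
```

This only works for text that the target encoding can represent, which matches the restriction to a single language mentioned above.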