questionable item in HISTORY

Started by Tatsuo Ishii · almost 21 years ago · 2 messages · hackers

ishii@postgresql.org

almost 21 years ago

Following item in HISTORY:

* Add support for 3 and 4-byte UTF8 characters (John Hansen)
Previously only one and two-byte UTF8 characters were supported.
This is particularly important for support for some Chinese
characters.

is wrong, since 3-byte UTF-8 characters have been supported ever since
UTF-8 support was added to PostgreSQL. The correct description would be:

* Add support for 4-byte UTF8 characters (John Hansen)
Previously only up to three-byte UTF8 characters were supported.
This is particularly important for support for some Chinese
characters.
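
For reference, the 3-byte versus 4-byte boundary follows directly from the UTF-8 encoding ranges. The snippet below is only an illustrative sketch in Python, not PostgreSQL code; the helper name utf8_len is made up for this example. It shows that code points up to U+FFFF need at most three bytes, while anything above needs four.

```python
def utf8_len(cp: int) -> int:
    """Return the number of bytes UTF-8 uses for code point cp.

    Hypothetical helper for illustration only.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3   # the whole Basic Multilingual Plane fits in three bytes
    return 4       # supplementary planes, e.g. CJK Extension B


# Cross-check the ranges against Python's own UTF-8 encoder.
for cp in (0x41, 0xE9, 0x4E2D, 0x20000):
    assert utf8_len(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
```

So a common Chinese character such as U+4E2D is a 3-byte sequence, and only characters outside the Basic Multilingual Plane (for example U+20000) depend on 4-byte support.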

In the meantime, I wonder if we need to update the UTF-8 <--> locale
encoding maps. The author of the patches stated that "This is
particularly important for support for some Chinese characters". I
have no idea what encoding he is referring to, but I wonder if the
latest Chinese encoding standard, GB18030, needs 4-byte UTF-8 mappings.
If so, we surely need to update utf8_to_gb18030.map.

Anybody familiar with GB18030/UTF-8?
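
One rough way to probe the question is to compare byte lengths using an independent implementation. The sketch below assumes Python's built-in "gb18030" codec as a stand-in; it says nothing about what utf8_to_gb18030.map actually contains, and the helper name byte_lengths is invented for this example.

```python
def byte_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 length, GB18030 length) for a single character.

    Relies on Python's bundled codecs, not on PostgreSQL's map files.
    """
    utf8 = ch.encode("utf-8")
    gb = ch.encode("gb18030")
    # Round-trip check: the GB18030 bytes must decode back to the same character.
    assert gb.decode("gb18030") == ch
    return len(utf8), len(gb)


samples = {
    "U+4E2D (BMP)": "\u4e2d",
    "U+20000 (CJK Extension B)": "\U00020000",
}
for label, ch in samples.items():
    n_utf8, n_gb = byte_lengths(ch)
    print(f"{label}: UTF-8 {n_utf8} bytes, GB18030 {n_gb} bytes")
```

With that codec, a supplementary-plane character such as U+20000 has a GB18030 encoding and is four bytes long in UTF-8, which suggests that a complete GB18030 mapping would have to cover 4-byte UTF-8 sequences.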
--
SRA OSS, Inc. Japan
Tatsuo Ishii

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Tatsuo Ishii (#1)

Re: questionable item in HISTORY

Tatsuo Ishii wrote:

Following item in HISTORY:

* Add support for 3 and 4-byte UTF8 characters (John Hansen)
Previously only one and two-byte UTF8 characters were supported.
This is particularly important for support for some Chinese
characters.

is wrong, since 3-byte UTF-8 characters have been supported ever since
UTF-8 support was added to PostgreSQL. The correct description would be:

* Add support for 4-byte UTF8 characters (John Hansen)
Previously only up to three-byte UTF8 characters were supported.
This is particularly important for support for some Chinese
characters.

Release notes updated.

In the meantime, I wonder if we need to update the UTF-8 <--> locale
encoding maps. The author of the patches stated that "This is
particularly important for support for some Chinese characters". I
have no idea what encoding he is referring to, but I wonder if the
latest Chinese encoding standard, GB18030, needs 4-byte UTF-8 mappings.
If so, we surely need to update utf8_to_gb18030.map.

Anybody familiar with GB18030/UTF-8?

Good question. The report we got in the past was that some UTF-8
characters, mostly Chinese, were being rejected even though they
were valid. I have no idea how they map to the GB* character sets.
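
To illustrate the kind of rejection described in that report, here is a deliberately simplified sketch in Python. It is not the PostgreSQL validation code: the names seq_len and accepts and the max_len parameter are hypothetical, and the check skips overlong forms and surrogates. It only shows how a verifier capped at three bytes turns away a valid 4-byte character.

```python
def seq_len(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def accepts(data: bytes, max_len: int) -> bool:
    """Simplified check: accept data only if every sequence is at most max_len bytes.

    Does not detect overlong encodings or surrogates; illustration only.
    """
    i = 0
    while i < len(data):
        n = seq_len(data[i])
        if n == 0 or n > max_len or i + n > len(data):
            return False
        # Every trailing byte must look like 10xxxxxx.
        if any((b & 0xC0) != 0x80 for b in data[i + 1:i + n]):
            return False
        i += n
    return True


bmp_char = "\u4e2d".encode("utf-8")       # 3-byte sequence
ext_b_char = "\U00020000".encode("utf-8") # 4-byte sequence

print(accepts(bmp_char, max_len=3))    # accepted under a 3-byte cap
print(accepts(ext_b_char, max_len=3))  # rejected although it is valid UTF-8
print(accepts(ext_b_char, max_len=4))  # accepted once 4-byte sequences are allowed
```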

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073