BUG #4257: about unicode extend

Started by ArLi almost 18 years ago · 4 messages · bugs
#1 ArLi
program@163.com

The following bug has been logged online:

Bug reference: 4257
Logged by: arli weng
Email address: program@163.com
PostgreSQL version: 8.3
Operating system: gentoo linux
Description: about unicode extend
Details:

The command (Chinese text, UTF-8 encoded):
INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');

In SQLite's text type there is no problem.
In PostgreSQL it reports an error:

invalid byte sequence for encoding "UNICODE": 0xf0

The character 𪕨 is from Unicode CJK Extension B.
In UTF-8 its bytes are "f0 aa 95 a8"; every Extension B character
starts with 0xf0 in UTF-8.

Does PostgreSQL not support it?

The server, database, and client encodings are already set to Unicode.

Please help me, because I love PostgreSQL.
And sorry for my English.
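The byte sequence given in the report can be checked with a short Python sketch (Python is our choice for illustration; it is not used in the thread). It encodes the character as UTF-8, prints the bytes and the code point, and confirms that the code point lies in the CJK Unified Ideographs Extension B block (U+20000 to U+2A6DF), outside the 16-bit range, which is why a 4-byte sequence with lead byte 0xf0 is needed. The character is written as the escape `\U0002A568`, the code point implied by the bytes `f0 aa 95 a8`.

```python
# The character from the bug report, written as an escape so this file
# stays ASCII; U+2A568 is the code point implied by the bytes f0 aa 95 a8.
ch = "\U0002A568"

encoded = ch.encode("utf-8")
print(encoded.hex(" "))    # f0 aa 95 a8
print(f"U+{ord(ch):04X}")  # U+2A568

# CJK Unified Ideographs Extension B is the block U+20000..U+2A6DF.
assert 0x20000 <= ord(ch) <= 0x2A6DF

# Code points above U+FFFF need 4 UTF-8 bytes, and for U+10000..U+3FFFF
# (planes 1-3, which include Extension B) the lead byte is always 0xF0.
assert ord(ch) > 0xFFFF
assert len(encoded) == 4 and encoded[0] == 0xF0
```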

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: ArLi (#1)
Re: BUG #4257: about unicode extend

"arli weng" <program@163.com> writes:

> The command (Chinese text, UTF-8 encoded):
> INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
> In PostgreSQL it reports an error:
> invalid byte sequence for encoding "UNICODE": 0xf0

I don't believe this is actually an 8.3 server. In 8.1 or later that
encoding would be referred to as "UTF8"; also, 8.1 and later would show
all bytes of the complained-of character, not just the first one.

8.0 and before only support 16-bit Unicode code points (i.e., 3-byte
UTF-8 sequences). We have support for 4-byte sequences in 8.1 and
later. Also, there were some fixes in this area in Jan 2007, so
whichever branch you use, make sure you get a minor release that's
newer than that.

regards, tom lane
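The difference Tom describes can be modelled with a simplified UTF-8 validator whose maximum sequence length is a parameter: with a limit of 3 bytes (16-bit code points only) the Extension B character is rejected at its 0xf0 lead byte, matching the single byte shown in the reported error; with a limit of 4 the same input passes. This is a hypothetical sketch (`utf8_seq_len` and `first_invalid_byte` are names invented here), not PostgreSQL's validation code, and it deliberately skips overlong-form, surrogate, and above-U+10FFFF checks.

```python
from typing import Optional


def utf8_seq_len(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte (0x80-0xBF) or a never-valid lead byte


def first_invalid_byte(data: bytes, max_len: int) -> Optional[int]:
    """Offset of the first byte the validator rejects, or None if all bytes pass.

    max_len=3 models a server limited to 16-bit code points;
    max_len=4 also accepts supplementary characters such as Extension B.
    Simplified: overlong forms, surrogates and values above U+10FFFF
    are not checked.
    """
    i = 0
    while i < len(data):
        n = utf8_seq_len(data[i])
        # Reject unknown lead bytes, sequences longer than allowed,
        # and sequences truncated by the end of the input.
        if n == 0 or n > max_len or i + n > len(data):
            return i
        # Every byte after the lead must be a continuation byte 10xxxxxx.
        if any((b & 0xC0) != 0x80 for b in data[i + 1 : i + n]):
            return i
        i += n
    return None


# The string literal from the reported INSERT statement.
value = "酋鼠\U0002A568".encode("utf-8")
bad = first_invalid_byte(value, max_len=3)
print(bad, hex(value[bad]))                  # 6 0xf0
print(first_invalid_byte(value, max_len=4))  # None
```

The first two characters are 3-byte sequences inside the 16-bit range, so both limits accept them; only the 4-byte sequence at offset 6 separates the two behaviours.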

#3 Michael Fuhr
mike@fuhr.org
In reply to: ArLi (#1)
Re: BUG #4257: about unicode extend

On Sat, Jun 21, 2008 at 01:25:15PM +0000, arli weng wrote:

> PostgreSQL version: 8.3

What does "SELECT version()" return? I'm wondering if the server
isn't 8.3 but rather an earlier version (see below).

> The command (Chinese text, UTF-8 encoded):
> INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
>
> In SQLite's text type there is no problem.
> In PostgreSQL it reports an error:
>
> invalid byte sequence for encoding "UNICODE": 0xf0

Your INSERT statement works for me in 8.3.3, 8.2.9, and 8.1.13.
According to the release notes, version 8.1 changed UNICODE to UTF8
and added support for 4-byte characters, so the fact that the error
says "UNICODE" and your database doesn't appear to support 4-byte
characters makes me wonder if you're running 8.0 or earlier.

--
Michael Fuhr
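Michael's question boils down to comparing the release reported by `SELECT version()` against 8.1, the release that, per the thread, renamed UNICODE to UTF8 and added 4-byte support. A minimal sketch, assuming the output begins with "PostgreSQL <major>.<minor>"; the helper names `parse_pg_version` and `supports_4byte_utf8` are hypothetical, and the sample strings are trimmed and illustrative rather than captured server output.

```python
import re
from typing import Tuple


def parse_pg_version(version_output: str) -> Tuple[int, int]:
    """Return the first two version numbers from the text of SELECT version()."""
    m = re.match(r"PostgreSQL (\d+)\.(\d+)", version_output)
    if m is None:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    return int(m.group(1)), int(m.group(2))


def supports_4byte_utf8(version_output: str) -> bool:
    """True if the release line is 8.1 or later (threshold taken from this thread).

    Does not look at the minor release, so it cannot tell whether the
    later encoding fixes mentioned in the thread are present.
    """
    return parse_pg_version(version_output) >= (8, 1)


# Trimmed, illustrative strings; real output continues with platform details.
for s in ("PostgreSQL 8.0.15", "PostgreSQL 8.1.13", "PostgreSQL 8.3.3"):
    print(s, supports_4byte_utf8(s))
```

Tuple comparison keeps the check simple and also works for two-part version strings such as "PostgreSQL 10.5", where the first number alone is already above 8.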

#4 ArLi
program@163.com
In reply to: Michael Fuhr (#3)
Re: BUG #4257: about unicode extend

Very sorry, I was wrong.

The version is 8.0.15.

I just copied it from the wrong server terminal window. -_-!

Thank you for your help.

arli
