BUG #4257: about unicode extend
The following bug has been logged online:
Bug reference: 4257
Logged by: arli weng
Email address: program@163.com
PostgreSQL version: 8.3
Operating system: gentoo linux
Description: about unicode extend
Details:
the command (chinese by utf-8):
INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
in sqlite text type, no problem..
in postgres report error:
invalid byte sequence for encoding "UNICODE": 0xf0
The character 𪕨 (U+2A568) is in CJK Unified Ideographs Extension B.
In UTF-8 its bytes are "f0 aa 95 a8"; every Extension B character
is encoded in four bytes, the first of which is 0xf0.
Does Postgres not support this?
The server, database, and client encodings are already set to Unicode.
Please help me, because I love Postgres.
Sorry for my English.
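For reference, the byte sequence quoted in the report can be checked with a few lines of Python. This is an illustrative sketch added to the thread, not part of the original report, and the variable names are arbitrary.

```python
# Decode the four bytes quoted in the report and inspect the result.
raw = bytes.fromhex("f0aa95a8")
char = raw.decode("utf-8")

code_point = ord(char)
print(f"U+{code_point:04X}")   # code point of the decoded character
print(len(raw))                # number of bytes in the UTF-8 encoding

# The CJK Unified Ideographs Extension B block spans U+20000..U+2A6DF.
in_extension_b = 0x20000 <= code_point <= 0x2A6DF
print(in_extension_b)
```

The code point lies above U+FFFF, which is why its UTF-8 form needs four bytes rather than three.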
"arli weng" <program@163.com> writes:
the command (chinese by utf-8):
INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
in postgres report error:
invalid byte sequence for encoding "UNICODE": 0xf0
I don't believe this is actually an 8.3 server. In 8.1 or later that
encoding would be referred to as "UTF8"; also, 8.1 and later would show
all bytes of the complained-of character not just the first one.
8.0 and before only support 16-bit Unicode code points (ie, 3-byte
utf8 sequences). We have support for 4-byte sequences in 8.1 and
later. Also, there were some fixes in this area in Jan 2007, so
whichever branch you use, make sure you get a minor release that's
newer than that.
regards, tom lane
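The 3-byte versus 4-byte distinction described above can be illustrated with a small Python sketch. It only inspects the string from the INSERT statement; the helper names `utf8_len` and `four_byte_chars` are hypothetical, and the sketch says nothing about how the server itself validates input.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of one character."""
    return len(ch.encode("utf-8"))


def four_byte_chars(text: str) -> list:
    """Return the characters above U+FFFF, which need 4-byte UTF-8 sequences."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


text = "酋鼠𪕨"  # the value from the INSERT statement in the report
for ch in text:
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} bytes")

# Characters that a server limited to 16-bit code points would reject.
print(four_byte_chars(text))
```

Running a check like this on data before loading it into an 8.0 or earlier server would show which values contain characters outside the 16-bit range.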
On Sat, Jun 21, 2008 at 01:25:15PM +0000, arli weng wrote:
PostgreSQL version: 8.3
What does "SELECT version()" return? I'm wondering if the server
isn't 8.3 but rather an earlier version (see below).
the command (chinese by utf-8):
INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
in sqlite text type, no problem..
in postgres report error:
invalid byte sequence for encoding "UNICODE": 0xf0
Your INSERT statement works for me in 8.3.3, 8.2.9, and 8.1.13.
According to the release notes version 8.1 changed UNICODE to UTF8
and added support for 4-byte characters, so the fact that the error
says "UNICODE" and your database doesn't appear to support 4-byte
characters makes me wonder if you're running 8.0 or earlier.
--
Michael Fuhr
Very sorry, this was my mistake.
The version is 8.0.15.
I copied the version number from the wrong server terminal window.. -_-!
Thank you for the help.
arli