again: Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work
Hello,
I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
Now I upgraded to 7.3.3 and I'm not happy with this.
The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
Copy to table (DB has UTF-8 encoding) from file:
for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
for EUC_TW
WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored
Copy out to file from table (UTF-8 data):
to BIG5
WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored
to EUC_TW is ok!
Regards,
Michael
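For readers following the byte values: the UTF-8 sequences in the UtfToLocal warnings above can be turned into Unicode code points with the standard library alone. This is a minimal sketch in Python, not part of the original report, and the helper name `utf8_hex_to_codepoint` is made up for illustration.

```python
def utf8_hex_to_codepoint(hex_bytes: str) -> str:
    """Decode a hex string holding one UTF-8 encoded character; return 'U+XXXX'."""
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    if len(text) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(text):04X}"


# The four sequences reported in the "copy out to BIG5" warnings.
for hex_bytes in ("e7a281", "e98ab9", "e8a38f", "e7b2a7"):
    print(hex_bytes, "->", utf8_hex_to_codepoint(hex_bytes))
```

Knowing the code points makes it easier to look the characters up in a published BIG5 or CNS 11643 mapping table.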
Hello,
I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
Now I upgraded to 7.3.3 and I'm not happy with this.
The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
Copy to table (DB has UTF-8 encoding) from file:
for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
I see no problem here. The only standard conversion map I could find
in on-line form so far (see URL below) does not include entries 0xf9d6 or
above.
http://www.unicode.org/Public/UNIDATA/Unihan.txt
for EUC_TW
WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored
Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL
supports only:
CNS 11643-1993, plane 0
CNS 11643-1993, plane 1
CNS 11643-1993, plane 2
CNS 11643-1993, plane 15
Would you like to have support for the rest of the CNS 11643-1993 planes:
CNS 11643-1993, plane 3
CNS 11643-1993, plane 4
CNS 11643-1993, plane 5
CNS 11643-1993, plane 6
CNS 11643-1993, plane 7
in the upcoming 7.4?
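The plane can be read directly from the reported bytes. In the four-byte EUC_TW form, the lead byte is 0x8E, the second byte is 0xA0 plus the plane number, and the last two bytes are 0xA0 plus the row and column. The sketch below is illustrative Python, not PostgreSQL code, and `parse_euc_tw` is a hypothetical helper name.

```python
from typing import Tuple


def parse_euc_tw(code: int) -> Tuple[int, int, int]:
    """Split a 4-byte EUC_TW code into (plane, row, column)."""
    raw = code.to_bytes(4, "big")
    if raw[0] != 0x8E:
        raise ValueError("not a 4-byte EUC_TW sequence (lead byte must be 0x8E)")
    if not 0xA1 <= raw[1] <= 0xB0:
        raise ValueError("plane byte out of range 0xA1..0xB0")
    if not all(0xA1 <= b <= 0xFE for b in raw[2:]):
        raise ValueError("row/column byte out of range 0xA1..0xFE")
    return raw[1] - 0xA0, raw[2] - 0xA0, raw[3] - 0xA0


# The four codes from the LocalToUtf warnings.
for code in (0x8EA3C3B7, 0x8EA3CFD0, 0x8EA3C4CE, 0x8EA3BDFE):
    plane, row, col = parse_euc_tw(code)
    print(f"0x{code:08x}: plane {plane}, row {row}, column {col}")
```

All four codes carry 0xA3 as the second byte, which is why they fall in plane 3.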
Copy out to file from table (UTF-8 data):
to BIG5
WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored
to EUC_TW is ok!
BIG5 and EUC_TW have different code points. So this is not very strange.
--
Tatsuo Ishii
Tatsuo Ishii wrote:
Hello,
I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
Now I upgraded to 7.3.3 and I'm not happy with this.
The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
Copy to table (DB has UTF-8 encoding) from file:
for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
I see no problem here. The only standard conversion map I could find
in on-line form so far (see URL below) does not include entries 0xf9d6 or
above.
Sorry, I do not know anything about conversion maps and CNS 11643-1993 planes.
I only got a file in BIG5 encoding from Taiwan and found that it is not possible
to load all of the text into PostgreSQL 7.3.3.
But it is possible to convert it to UTF-8 with the iconv tool from glibc (Linux).
It would be good if the next release supported today's BIG5.
Michael
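As a possible workaround along the lines of the iconv remark above, the file could be transcoded to UTF-8 before it is loaded, so the server does no BIG5 conversion at all. The sketch below uses Python's `cp950` codec as a stand-in for the glibc converter. That substitution is an assumption, since the two tables are not guaranteed to agree for every character. The function name `big5_to_utf8` is hypothetical.

```python
def big5_to_utf8(data: bytes, source_encoding: str = "cp950") -> bytes:
    """Transcode BIG5-family bytes to UTF-8.

    Raises UnicodeDecodeError if a byte sequence is missing from the
    chosen table, instead of silently dropping the character.
    """
    return data.decode(source_encoding).encode("utf-8")


# 0xf9db is the code reported on line 4 of the BIG5 copy warnings.
sample = bytes.fromhex("f9db")
print(big5_to_utf8(sample).hex())
```

The resulting UTF-8 file would then be loaded with the client encoding set to match the database encoding.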
Copy to table (DB has UTF-8 encoding) from file:
for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
I see no problem here. The only standard conversion map I could find
in on-line form so far (see URL below) does not include entries 0xf9d6 or
above.
Sorry, I do not know anything about conversion maps and CNS 11643-1993 planes.
I only got a file in BIG5 encoding from Taiwan and found that it is not possible
to load all text to postgresql 7.3.3.
But it is possible to convert to UTF-8 with iconv tool from glibc (Linux).
It would be good if the next release supported today's BIG5.
I'm not looking forward to adding any conversion entries confirmed by
standards. Can someone explain to me the current status of the
conversion maps between BIG5 and Unicode? The only info I could find
so far is at www.unicode.org.
--
Tatsuo Ishii
Copy to table (DB has UTF-8 encoding) from file:
for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
I see no problem here. The only standard conversion map I could find
in on-line form so far (see URL below) does not include entries 0xf9d6 or
above.
Sorry, I do not know anything about conversion maps and CNS 11643-1993 planes.
I only got a file in BIG5 encoding from Taiwan and found that it is not possible
to load all text to postgresql 7.3.3.
But it is possible to convert to UTF-8 with iconv tool from glibc (Linux).
It would be good if the next release supported today's BIG5.
I'm not looking forward to adding any conversion entries confirmed by
standards. Can someone explain to me the current status of the
conversion maps between BIG5 and Unicode? The only info I could find
so far is at www.unicode.org.
Oops. The above should be:
I'm not looking forward to adding any conversion entries NOT confirmed by
standards.
--
Tatsuo Ishii
I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
Now I upgraded to 7.3.3 and I'm not happy with this.
The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
Copy to table (DB has UTF-8 encoding) from file:
for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
I see no problem here. The only standard conversion map I could find
in on-line form so far (see URL below) does not include entries 0xf9d6 or
above.
I found in this file:
U+F9D7 in line 604519
U+F9D8 in line 219540
U+F9D6...U+F9DB in lines 730707...730766.
No. U+F9D6 means *Unicode* code point, not BIG5 code point.
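The distinction can be checked with a short script. The sketch below is an illustration in Python, not taken from the thread. It uses Python's `cp950` codec, which is assumed here to follow the Microsoft CP950 table, to decode the BIG5 byte pair 0xF9D6, and compares the result with the Unicode code point U+F9D6.

```python
import unicodedata

# The two bytes 0xF9 0xD6 interpreted as a BIG5 (CP950) code.
big5_bytes = bytes.fromhex("f9d6")
via_cp950 = big5_bytes.decode("cp950")
print(f"BIG5 0xf9d6 decodes to U+{ord(via_cp950):04X}")
print("its UTF-8 form is 0x" + via_cp950.encode("utf-8").hex())

# The Unicode code point that happens to have the same number.
same_number = chr(0xF9D6)
print("U+F9D6 is named:", unicodedata.name(same_number, "<unnamed>"))
print("same character?", via_cp950 == same_number)
```

Under that table the BIG5 code decodes to the character whose UTF-8 form, 0xe7a281, appears in the UtfToLocal warnings, so the line numbers found for U+F9D6 in Unihan.txt describe a different character.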
Copy out to file from table (UTF-8 data):
to BIG5
WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored
to EUC_TW is ok!
BIG5 and EUC_TW have different code points. So this is not very strange.
But it is very strange that I can (for EUC_TW) copy to file without error but I can not copy from file without error.
Michael
Tatsuo Ishii wrote:
I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
Now I upgraded to 7.3.3 and I'm not happy with this.
The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
Copy to table (DB has UTF-8 encoding) from file:
for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
I see no problem here. The only standard conversion map I could find
in on-line form so far (see URL below) does not include entries 0xf9d6 or
above.
I found in this file:
U+F9D7 in line 604519
U+F9D8 in line 219540
U+F9D6...U+F9DB in lines 730707...730766.
No. U+F9D6 means *Unicode* code point, not BIG5 code point.
Ok.
I have looked into my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:
% Chinese charmap for BIG5 (CP950)
% version: 0.92
% Contact: Tung-Han Hsieh <thhsieh@linux.org.tw>
% Yuan-Chung Cheng <platin@ms31.hinet.net>
% Distribution and use is free, even for comercial purpose.
%
% This charmap is converted from:
% ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
% ...
There "my" characters are in.
Don't you agree that it is strange that I can (for EUC_TW) copy "to" file without error
but I can not copy "from" file without error?
Michael
I have looked into my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:
% Chinese charmap for BIG5 (CP950)
% version: 0.92
% Contact: Tung-Han Hsieh <thhsieh@linux.org.tw>
% Yuan-Chung Cheng <platin@ms31.hinet.net>
% Distribution and use is free, even for comercial purpose.
%
% This charmap is converted from:
% ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
% ...
There "my" characters are in.
That's a M$'s definition, not a standard. I think there should be a
reason why the Unicode org. does not use it.
Don't you agree that it is strange that I can (for EUC_TW) copy "to" file without error
but I can not copy "from" file without error?
I'm not quite sure what you are saying. Are you complaining that (for
example) 0xe7a281 in UTF-8 does not convert to EUC_TW?
BTW, what do you think about below?
FYI, CNS 11643-1993 is the standard character set and EUC_TW is
one of its encodings. That means your problem below will disappear.
WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored
Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL
supports only:
CNS 11643-1993, plane 0
CNS 11643-1993, plane 1
CNS 11643-1993, plane 2
CNS 11643-1993, plane 15
Would you like to have support for the rest of the CNS 11643-1993 planes:
CNS 11643-1993, plane 3
CNS 11643-1993, plane 4
CNS 11643-1993, plane 5
CNS 11643-1993, plane 6
CNS 11643-1993, plane 7
in the upcoming 7.4?
--
Tatsuo Ishii
Tatsuo Ishii wrote:
I have looked into my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:
% Chinese charmap for BIG5 (CP950)
% version: 0.92
% Contact: Tung-Han Hsieh <thhsieh@linux.org.tw>
% Yuan-Chung Cheng <platin@ms31.hinet.net>
% Distribution and use is free, even for comercial purpose.
%
% This charmap is converted from:
% ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
% ...
There "my" characters are in.
That's a M$'s definition, not a standard. I think there should be a
reason why the Unicode org. does not use it.
Ok, I do not know the reason. But since glibc also uses it, couldn't you use it too?
I believe the glibc developers have thought about this a lot, and they came to the
conclusion to use this definition. Why not PostgreSQL?
Don't you agree that it is strange that I can (for EUC_TW) copy "to" file without error
but I can not copy "from" file without error?
I'm not quite sure what you are saying. Are you complaining that (for
example) 0xe7a281 in UTF-8 does not convert to EUC_TW?
Yes exactly, since this value comes from a "copy to" with PGCLIENTENCODING=EUC_TW
BTW, what do you think about below?
FYI, CNS 11643-1993 is the standard character set and EUC_TW is
one of its encodings. That means your problem below will disappear.
Ok.
Regards,
Michael