XML Encoding problem

Started by Radosław Smogura about 15 years ago, 3 messages, general
#1 Radosław Smogura
rsmogura@softperience.eu

Hi,

I have a test database with UTF-8 encoding. I stored the XML
<a>ЁĄ¡</a> (U+0401, U+0104, U+00A1) in it. After changing the client encoding to
ISO-8859-2, a SELECT returned:
ERROR: character 0xd081 of encoding "UTF8" has no equivalent in
"LATIN2"
SQL state: 22P05.

I expected to get a result with character references (&#...;) for the
unconvertible characters.
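To make the report concrete, here is a rough analogy in Python. It is not PostgreSQL code; Python's codecs merely stand in for the server's UTF8-to-LATIN2 conversion. It shows that 0xd081 is the UTF-8 encoding of Ё, that a strict conversion fails on it, and what output with character references would look like:

```python
# Analogy only: Python codecs stand in for the server-side conversion.
xml_value = "<a>\u0401\u0104\u00a1</a>"  # <a>ЁĄ¡</a>

# U+0401 (Ё) is the byte pair D0 81 in UTF-8, matching "character 0xd081".
utf8_bytes = "\u0401".encode("utf-8")

# A strict conversion to ISO-8859-2 (LATIN2) fails, like the server does.
try:
    xml_value.encode("iso8859-2")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The requested behaviour: keep characters LATIN2 can represent (Ą is 0xA1)
# and replace the others (Ё, ¡) with numeric character references.
with_refs = xml_value.encode("iso8859-2", errors="xmlcharrefreplace")
print(with_refs)  # b'<a>&#1025;\xa1&#161;</a>'
```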

Kind regards,
Radosław Smogura

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Radosław Smogura (#1)
Re: XML Encoding problem

On Mon, 2011-02-07 at 12:44 +0100, rsmogura wrote:

> I have a test database with UTF-8 encoding. I stored the XML
> <a>ЁĄ¡</a> (U+0401, U+0104, U+00A1) in it. After changing the client encoding to
> ISO-8859-2, a SELECT returned:
> ERROR: character 0xd081 of encoding "UTF8" has no equivalent in
> "LATIN2"
> SQL state: 22P05.
>
> I expected to get a result with character references (&#...;) for the
> unconvertible characters.

Hehe, interesting idea, but it's not implemented that way. We don't
alter the XML data, except for the XML declaration.

#3 Radosław Smogura
rsmogura@softperience.eu
In reply to: Peter Eisentraut (#2)
Re: XML Encoding problem

I could write a patch. Text mode itself would not be affected, because it is
text mode, and the patch would fail if the client encoding is "richer" than the
server encoding (one possibility in that situation is to XML-encode into the
client encoding, then text-reencode into the server encoding).
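A minimal sketch of the input direction described above, assuming a client encoding that can represent more characters than the server encoding. The function name `client_to_server` is hypothetical and this is plain Python, not server code; it decodes the client bytes and re-encodes them for the server, emitting character references where the server encoding has no equivalent:

```python
def client_to_server(data: bytes, client_encoding: str, server_encoding: str) -> bytes:
    """Hypothetical input path: decode in the (richer) client encoding, then
    re-encode in the server encoding, writing characters the server encoding
    lacks as numeric character references instead of failing."""
    text = data.decode(client_encoding)
    return text.encode(server_encoding, errors="xmlcharrefreplace")


# A UTF-8 client sending Cyrillic text to a LATIN2 server.
payload = "<a>\u0401\u0104</a>".encode("utf-8")
print(client_to_server(payload, "utf-8", "iso8859-2"))  # b'<a>&#1025;\xa1</a>'
```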

But looking at the code, the same thing could occur with binary recv. I saw
text-based XML conversion there (it alters the XML in some way). According to
the documentation, I can store XML in any encoding using binary mode.
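The PostgreSQL documentation on XML encoding handling says that in binary mode an encoding declaration in the XML data is observed, and that the data is assumed to be UTF-8 when there is none. The sketch below (a hypothetical `recv_xml_to_server`, plain Python rather than server code) shows how a binary receive step could apply the same character-reference fallback. It assumes an ASCII-compatible declared encoding and finds the declaration with a simple regular expression instead of a real XML parser:

```python
import re

# Encoding pseudo-attribute of an XML declaration at the start of the data,
# e.g. <?xml version="1.0" encoding="ISO-8859-5"?>
_DECL_ENCODING = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def recv_xml_to_server(data: bytes, server_encoding: str) -> bytes:
    """Hypothetical binary-receive step: honour the declared encoding
    (UTF-8 when none is declared), rewrite the declaration to name the server
    encoding, and turn unrepresentable characters into character references."""
    match = _DECL_ENCODING.match(data)
    if match:
        declared = match.group(1).decode("ascii")
        start, end = match.span(1)
        # Assumes an ASCII-compatible encoding, so the declaration can be
        # patched at the byte level before decoding.
        data = data[:start] + server_encoding.encode("ascii") + data[end:]
    else:
        declared = "utf-8"
    text = data.decode(declared)
    return text.encode(server_encoding, errors="xmlcharrefreplace")


doc = '<?xml version="1.0" encoding="ISO-8859-5"?><a>\u0401</a>'.encode("iso8859-5")
print(recv_xml_to_server(doc, "ISO-8859-2"))
# b'<?xml version="1.0" encoding="ISO-8859-2"?><a>&#1025;</a>'
```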

I think that if the text conversion fails, the XML should be rewritten and every
unconvertible character should be converted to an XML character reference...
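The fallback policy proposed here can be sketched with a hypothetical helper `to_client_encoding`: attempt the ordinary conversion and rewrite only when it fails. This is an illustration in Python, not a patch:

```python
def to_client_encoding(xml_text: str, client_encoding: str) -> bytes:
    """Hypothetical output step: try the ordinary text conversion first and
    only rewrite the XML if that conversion fails."""
    try:
        # Plain conversion: the data goes out unchanged.
        return xml_text.encode(client_encoding)
    except UnicodeEncodeError:
        # Fallback: unconvertible characters become numeric character references.
        return xml_text.encode(client_encoding, errors="xmlcharrefreplace")


print(to_client_encoding("<a>\u0104</a>", "iso8859-2"))        # b'<a>\xa1</a>'
print(to_client_encoding("<a>\u0401\u0104</a>", "iso8859-2"))  # b'<a>&#1025;\xa1</a>'
```

With a plain codec both branches give the same bytes whenever the strict conversion succeeds; the split only matters once the fallback becomes a real XML rewrite. A blanket codec-level replacement is safe only in character data and attribute values: XML does not recognize character references in element or attribute names, comments, processing instructions, or CDATA sections, so a complete rewrite would have to go through an XML parser and serializer.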

Actually it's XML, not varchar with parsing :)

(In reply to Peter Eisentraut <peter_e@gmx.net>, Wednesday 09 February 2011 23:29:29.)
