XML element with special characters can be created, serialized, but not deserialized

Started by Sergiu Ignat, almost 3 years ago, 2 messages, bugs
#1 Sergiu Ignat
sergiu@bitsoftware.ro

Hello,

I am using PostgreSQL 13.8 and I think that I found an issue with XML
serialization and deserialization.

A text that has special characters cannot be converted to XML even if it
was created by serializing an XML element.

In our case a string contains a special character with the ASCII code 19,
placed between the letters i and p.
The simple statement that serializes an XML element works.
select xmlelement(name "street",'i p')::text

When the same text has to be converted back to XML, it fails with an error:

select xmlelement(name "street",'i p')::text::xml

The error message is

SQL Error [2200N]: ERROR: invalid XML content
Detail: line 1: PCDATA invalid Char value 19
<street>i p</street>
^
line 1: chunk is not well balanced
<street>i p</street>
^

The expected behaviour would be to successfully parse an XML element that
was created and serialized by the same engine.
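
The same rejection can be reproduced outside PostgreSQL. What follows is a minimal Python sketch, assuming the standard library's expat-based parser as a stand-in for the XML parser PostgreSQL uses. It only illustrates that a raw character with code 19 is not accepted in XML character data by that parser. The helper name parses_as_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# chr(19) is the control character (ASCII code 19) from the report,
# written explicitly because it is invisible when pasted as a literal.
with_control_char = "<street>i" + chr(19) + "p</street>"
without_control_char = "<street>i p</street>"

print(parses_as_xml(with_control_char))
print(parses_as_xml(without_control_char))
```

Building the string with chr(19) also makes the test case easier to share, since the literal character tends to get lost in mail and web rendering.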

Best regards,
--
Serghei Ignat

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sergiu Ignat (#1)
Re: XML element with special characters can be created, serialized, but not deserialized

Sergiu Ignat <sergiu@bitsoftware.ro> writes:

> I am using PostgreSQL 13.8 and I think that I found an issue with XML
> serialization and deserialization.

Hmm. The root cause here seems to be that escape_xml() thinks it
doesn't need to escape ASCII control characters, other than CR (\r).
Which is a bit backwards, because after some googling I conclude that
XML 1.1 requires all C0 and C1 control characters to be represented as
numeric escapes *except* CR, LF, and TAB [1].

What we probably ought to do is escape all except LF and TAB.
However, I'm a bit hesitant to back-patch such a behavioral change.
Maybe change this in HEAD (v16) only?
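
The rule proposed above (escape every ASCII control character except LF and TAB) can be sketched in Python as follows. This is an illustration only, not PostgreSQL's escape_xml() code: the function name escape_xml_sketch is hypothetical, it handles only the C0 range, and the set of markup characters it escapes is an assumption made for the example.

```python
def escape_xml_sketch(text: str) -> str:
    """Escape markup characters and C0 control characters, keeping LF and TAB."""
    # Assumed minimal set of markup characters for this illustration.
    replacements = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
    out = []
    for ch in text:
        if ch in replacements:
            out.append(replacements[ch])
        elif ord(ch) < 0x20 and ch not in "\n\t":
            # Control characters other than LF and TAB (CR included)
            # become numeric character references.
            out.append(f"&#x{ord(ch):02x};")
        else:
            out.append(ch)
    return "".join(out)


# The string from the report: 'i', the character with code 19, then 'p'.
print(escape_xml_sketch("i" + chr(19) + "p"))
```

This shows only the escaping side. Whether a parser accepts the resulting numeric references depends on the XML version it implements, which the sketch does not address.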

regards, tom lane

[1]: https://www.w3.org/International/questions/qa-controls