Bug in XPATH() produces invalid XML values and probably un-restorable dumps
Hi
While trying to figure out sensible semantics for XPATH() and scalar-value returning XPath expressions, I've stumbled upon a bug in XPATH() that allows invalid XML values to be produced. This is a serious problem because should such invalid values get inserted into an XML column, an un-restorable dump ensues.
Here's an example (REL9_0_STABLE as of a few days ago)
template1=# SELECT (XPATH('/*/text()', '<root><</root>'))[1];
xpath
-------
<
Since XPATH() returns XML[], this value has type XML, but clearly isn't well-formed. And behold, casting to TEXT and back to XML complains loudly.
template1=# SELECT (XPATH('/*/text()', '<root><</root>'))[1]::TEXT::XML;
ERROR: invalid XML content
DETAIL: Entity: line 1: parser error : StartTag: invalid element name
<
^
The culprit is xml_xmlnodetoxmltype() in backend/utils/adt/xml.c. For non-element nodes, it returns the result of xmlXPathCastNodeToString() verbatim, even though that function doesn't reverse the entity replacement that was done during parsing. Adding a call to escape_xml()
for non-element nodes fixes the problem
template1=# SELECT (XPATH('/*/text()', '<root><</root>'))[1];
xpath
-------
<
Patch is attached.
best regards,
Florian Pflug
Attachments:
pg_xpath_invalidxml.v1.patchapplication/octet-stream; name=pg_xpath_invalidxml.v1.patchDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index eaf5b4d..359cded 100644
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
*************** xml_xmlnodetoxmltype(xmlNodePtr cur)
*** 3278,3284 ****
str = xmlXPathCastNodeToString(cur);
PG_TRY();
{
! result = (xmltype *) cstring_to_text((char *) str);
}
PG_CATCH();
{
--- 3278,3284 ----
str = xmlXPathCastNodeToString(cur);
PG_TRY();
{
! result = (xmltype *) cstring_to_text(escape_xml((char *) str));
}
PG_CATCH();
{
On May31, 2011, at 20:50 , Florian Pflug wrote:
While trying to figure out sensible semantics for XPATH() and scalar-value returning XPath expressions, I've stumbled upon a bug in XPATH() that allows invalid XML values to be produced. This is a serious problem because should such invalid values get inserted into an XML column, an un-restorable dump ensues.
Here's an example (REL9_0_STABLE as of a few days ago)
template1=# SELECT (XPATH('/*/text()', '<root><</root>'))[1];
xpath
-------
<...
Patch is attached.
I've added two tests to the xml regression test which highlight the issue.
Updated patch attached.
best regards,
Florian Pflug
Attachments:
pg_xpath_invalidxml.v2.patchapplication/octet-stream; name=pg_xpath_invalidxml.v2.patchDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 702b9e3..fa584de 100644
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
*************** xml_xmlnodetoxmltype(xmlNodePtr cur)
*** 3280,3286 ****
str = xmlXPathCastNodeToString(cur);
PG_TRY();
{
! result = (xmltype *) cstring_to_text((char *) str);
}
PG_CATCH();
{
--- 3280,3286 ----
str = xmlXPathCastNodeToString(cur);
PG_TRY();
{
! result = (xmltype *) cstring_to_text(escape_xml((char *) str));
}
PG_CATCH();
{
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index eaa5a74..822f3b4 100644
*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
*************** SELECT xpath('//b', '<a>one <b>two</b> t
*** 502,507 ****
--- 502,519 ----
{<b>two</b>,<b>etc</b>}
(1 row)
+ SELECT xpath('//text()', '<root><</root>');
+ xpath
+ --------
+ {<}
+ (1 row)
+
+ SELECT xpath('//@value', '<root value="<"/>');
+ xpath
+ --------
+ {<}
+ (1 row)
+
-- Test xmlexists and xpath_exists
SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY REF '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns>');
xmlexists
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 711b435..d53131c 100644
*** a/src/test/regress/expected/xml_1.out
--- b/src/test/regress/expected/xml_1.out
*************** LINE 1: SELECT xpath('//b', '<a>one <b>t
*** 456,461 ****
--- 456,473 ----
^
DETAIL: This functionality requires the server to be built with libxml support.
HINT: You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xpath('//text()', '<root><</root>');
+ ERROR: unsupported XML feature
+ LINE 1: SELECT xpath('//text()', '<root><</root>'...
+ ^
+ DETAIL: This functionality requires the server to be built with libxml support.
+ HINT: You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xpath('//@value', '<root value="<"/>');
+ ERROR: unsupported XML feature
+ LINE 1: SELECT xpath('//@value', '<root value="<"/>'...
+ ^
+ DETAIL: This functionality requires the server to be built with libxml support.
+ HINT: You need to rebuild PostgreSQL using --with-libxml.
-- Test xmlexists and xpath_exists
SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY REF '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns>');
ERROR: unsupported XML feature
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 717a1e7..40e711a 100644
*** a/src/test/regress/sql/xml.sql
--- b/src/test/regress/sql/xml.sql
*************** SELECT xpath('', '<!-- error -->');
*** 163,168 ****
--- 163,170 ----
SELECT xpath('//text()', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
+ SELECT xpath('//text()', '<root><</root>');
+ SELECT xpath('//@value', '<root value="<"/>');
-- Test xmlexists and xpath_exists
SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY REF '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns>');