Fix xpath() to return namespace definitions
Hi all,
While developing some XML processing queries, i stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.
Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
definitions that declared in the node's parents. As per
https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
the behavior is intentional.
This patch uses function xmlCopyNode that copies a node, including its
namespace definitions as required (without unused namespace in the node or
its children). When the copy dumped, the resulting XML is complete with its
namespaces. Calling xmlCopyNode will need additional memory to execute, but
reimplementing its routine to handle namespace definition will introduce
much complexity to the code.
Note: This is my very first postgresql patch.
--
Ali Akbar
Attachments:
xpath-ns-fix.patchtext/x-patch; charset=US-ASCII; name=xpath-ns-fix.patchDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..93e335c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3602,19 +3602,28 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
if (cur->type == XML_ELEMENT_NODE)
{
xmlBufferPtr buf;
+ xmlNodePtr cur_copy;
buf = xmlBufferCreate();
+
+ /* the result of xmlNodeDump won't contain namespace definitions,
+ * but libxml2 has xmlCopyNode that duplicates a node, along
+ * with its required namespace definitions.
+ */
+ cur_copy = xmlCopyNode(cur, 1);
PG_TRY();
{
- xmlNodeDump(buf, NULL, cur, 0, 1);
+ xmlNodeDump(buf, NULL, cur_copy, 0, 1);
result = xmlBuffer_to_xmltype(buf);
}
PG_CATCH();
{
+ xmlFreeNode(cur_copy);
xmlBufferFree(buf);
PG_RE_THROW();
}
PG_END_TRY();
+ xmlFreeNode(cur_copy);
xmlBufferFree(buf);
}
else
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 382f9df..a6d26f7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -584,6 +584,12 @@ SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><loc
{1,2}
(1 row)
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ xpath
+------------------------------------------------------------------------------------------------------------------------------------------------
+ {"<local:piece xmlns:local=\"http://127.0.0.1\" id=\"1\">number one</local:piece>","<local:piece xmlns:local=\"http://127.0.0.1\" id=\"2\"/>"}
+(1 row)
+
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
xpath
-------------------------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index a34d1f4..c7bcf91 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -498,6 +498,12 @@ LINE 1: SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="ht...
^
DETAIL: This functionality requires the server to be built with libxml support.
HINT: You need to rebuild PostgreSQL using --with-libxml.
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ERROR: unsupported XML feature
+LINE 1: SELECT xpath('//loc:piece', '<local:data xmlns:local="http:/...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
ERROR: unsupported XML feature
LINE 1: SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>'...
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 90d4d67..241a5d6 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -174,6 +174,7 @@ SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
SELECT xpath('', '<!-- error -->');
SELECT xpath('//text()', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
SELECT xpath('//text()', '<root><</root>');
SELECT xpath('//@value', '<root value="<"/>');
On Fri, May 30, 2014 at 5:04 AM, Ali Akbar <the.apaan@gmail.com> wrote:
While developing some XML processing queries, i stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
definitions that declared in the node's parents. As per
https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
the behavior is intentional.This patch uses function xmlCopyNode that copies a node, including its
namespace definitions as required (without unused namespace in the node or
its children). When the copy dumped, the resulting XML is complete with its
namespaces. Calling xmlCopyNode will need additional memory to execute, but
reimplementing its routine to handle namespace definition will introduce
much complexity to the code.Note: This is my very first postgresql patch.
Please add your patch here so we don't forget about it:
https://commitfest.postgresql.org/action/commitfest_view/open
Please see also:
https://wiki.postgresql.org/wiki/Submitting_a_Patch
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2014-06-05 2:37 GMT+07:00 Robert Haas <robertmhaas@gmail.com>:
Please add your patch here so we don't forget about it:
https://commitfest.postgresql.org/action/commitfest_view/open
Added: https://commitfest.postgresql.org/action/patch_view?id=1461
Please see also:
Thanks, i've read it.
As for the suggestion there: "Start out by reviewing a patch or responding
to email on the lists. Even if it is unrelated to what you're doing", i'll
look around other posts in this list.
For back versions, i think because this patch changes xpath() behavior, we
will only apply this to future versions. The old behavior is wrong
(according to XPath standard) for not including namespaces, but maybe there
are some application that depends on the old behavior.
Thanks!
--
Ali Akbar
At 2014-05-30 16:04:33 +0700, the.apaan@gmail.com wrote:
While developing some XML processing queries, i stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.
Thanks for the patch, and welcome to Postgres development.
I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).
I'm marking this ready for committer.
-- Abhijit
Attachments:
xpath-ns-fix-2.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..f9ad598 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3602,19 +3602,28 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
if (cur->type == XML_ELEMENT_NODE)
{
xmlBufferPtr buf;
+ xmlNodePtr cur_copy;
buf = xmlBufferCreate();
+
+ /* the result of xmlNodeDump won't contain namespace definitions
+ * from parent nodes, but xmlCopyNode duplicates a node along
+ * with its required namespace definitions.
+ */
+ cur_copy = xmlCopyNode(cur, 1);
PG_TRY();
{
- xmlNodeDump(buf, NULL, cur, 0, 1);
+ xmlNodeDump(buf, NULL, cur_copy, 0, 1);
result = xmlBuffer_to_xmltype(buf);
}
PG_CATCH();
{
+ xmlFreeNode(cur_copy);
xmlBufferFree(buf);
PG_RE_THROW();
}
PG_END_TRY();
+ xmlFreeNode(cur_copy);
xmlBufferFree(buf);
}
else
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 382f9df..a6d26f7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -584,6 +584,12 @@ SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><loc
{1,2}
(1 row)
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ xpath
+------------------------------------------------------------------------------------------------------------------------------------------------
+ {"<local:piece xmlns:local=\"http://127.0.0.1\" id=\"1\">number one</local:piece>","<local:piece xmlns:local=\"http://127.0.0.1\" id=\"2\"/>"}
+(1 row)
+
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
xpath
-------------------------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index a34d1f4..c7bcf91 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -498,6 +498,12 @@ LINE 1: SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="ht...
^
DETAIL: This functionality requires the server to be built with libxml support.
HINT: You need to rebuild PostgreSQL using --with-libxml.
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ERROR: unsupported XML feature
+LINE 1: SELECT xpath('//loc:piece', '<local:data xmlns:local="http:/...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
ERROR: unsupported XML feature
LINE 1: SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>'...
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 90d4d67..241a5d6 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -174,6 +174,7 @@ SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
SELECT xpath('', '<!-- error -->');
SELECT xpath('//text()', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
SELECT xpath('//text()', '<root><</root>');
SELECT xpath('//@value', '<root value="<"/>');
2014-06-16 22:52 GMT+07:00 Abhijit Menon-Sen <ams@2ndquadrant.com>:
Thanks for the patch, and welcome to Postgres development.
I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).I'm marking this ready for committer.
Thanks for the review. Hope i will be able to contribute a little here and
there in the future.
--
Ali Akbar
Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:
I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).
I'm marking this ready for committer.
I don't know enough about XML/XPATH to know if this is a good idea or not,
but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument is
extended: if 1 do a recursive copy (properties, namespaces and children
when applicable) if 2 copy properties and namespaces (when applicable)
Do we really want it to "copy children"? Perhaps the value should be 2?
(I have no idea, I'm just raising the question.)
I also notice that it says
Returns: a new #xmlNodePtr, or NULL in case of error.
and the patch is failing to cover the error case. Most likely, that
would represent out-of-memory, so it definitely seems to be something
we should account for.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I don't know enough about XML/XPATH to know if this is a good idea or not,
Actually currently because of the namespace problem, xpath() returns wrong
result (even worse: invalid xml!). So this patch corrects the behavior of
xpath() to the correct one.
but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument isextended: if 1 do a recursive copy (properties, namespaces and children
when applicable) if 2 copy properties and namespaces (when applicable)Do we really want it to "copy children"? Perhaps the value should be 2?
(I have no idea, I'm just raising the question.)
If we put 1 as the parameter, the resulting Node will link to the existing
children. In this case, the namespace problem for the children might be
still exists. If any of the children uses different namespace(s) than the
parent, the namespace definition will not be copied correctly.
Just to be safe, in the patch 1 passed as the second argument.
Ideally, we can traverse the Node object and generate XML accordingly, but
it is a lot of work, and a lot of duplicating libxml's code. Maybe we
should push this upstream to libxml?
I also notice that it says
Returns: a new #xmlNodePtr, or NULL in case of error.
and the patch is failing to cover the error case. Most likely, that
would represent out-of-memory, so it definitely seems to be something
we should account for.
Ah, overlooked that.
Skimming through libxml2's source, it looks like there are some other cases
beside out-of-memory. Will update the patch according to these cases.
Thanks for the review.
--
Ali Akbar
Greetings,
Attached modified patch to handle xmlCopyNode returning NULL. The patch is
larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
is needed for calling xml_ereport).
From libxml2 source, it turns out that the other cases that xmlCopyNode
will return NULL will not happen. So in this patch i assume that the only
case is memory exhaustion.
But i found some bug in libxml2's code, because we call xmlCopyNode with 1
as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
check the return of xmlStaticCopyNode whether it's NULL. So if someday the
recursion returns NULL (maybe because of memory exhaustion), it will
SEGFAULT.
Knowing this but in libxml2 code, is this patch is still acceptable?
Thanks,
Ali Akbar
Attachments:
xpath-ns-fix-3.patchtext/x-diff; charset=US-ASCII; name=xpath-ns-fix-3.patchDownload
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***************
*** 141,149 **** static bool print_xml_decl(StringInfo buf, const xmlChar *version,
pg_enc encoding, int standalone);
static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! ArrayBuildState **astate);
#endif /* USE_LIBXML */
static StringInfo query_to_xml_internal(const char *query, char *tablename,
--- 141,151 ----
pg_enc encoding, int standalone);
static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
! PgXmlErrorContext *xmlerrcxt);
static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! ArrayBuildState **astate,
! PgXmlErrorContext *xmlerrcxt);
#endif /* USE_LIBXML */
static StringInfo query_to_xml_internal(const char *query, char *tablename,
***************
*** 3595,3620 **** SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
* return value otherwise)
*/
static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur)
{
xmltype *result;
if (cur->type == XML_ELEMENT_NODE)
{
xmlBufferPtr buf;
buf = xmlBufferCreate();
PG_TRY();
{
! xmlNodeDump(buf, NULL, cur, 0, 1);
result = xmlBuffer_to_xmltype(buf);
}
PG_CATCH();
{
xmlBufferFree(buf);
PG_RE_THROW();
}
PG_END_TRY();
xmlBufferFree(buf);
}
else
--- 3597,3636 ----
* return value otherwise)
*/
static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
{
xmltype *result;
if (cur->type == XML_ELEMENT_NODE)
{
xmlBufferPtr buf;
+ xmlNodePtr cur_copy;
buf = xmlBufferCreate();
+
+ /* the result of xmlNodeDump won't contain namespace definitions
+ * from parent nodes, but xmlCopyNode duplicates a node along
+ * with its required namespace definitions.
+ */
+ cur_copy = xmlCopyNode(cur, 1);
+
+ if (cur_copy == NULL)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ "could not serialize xml");
+
PG_TRY();
{
! xmlNodeDump(buf, NULL, cur_copy, 0, 1);
result = xmlBuffer_to_xmltype(buf);
}
PG_CATCH();
{
+ xmlFreeNode(cur_copy);
xmlBufferFree(buf);
PG_RE_THROW();
}
PG_END_TRY();
+ xmlFreeNode(cur_copy);
xmlBufferFree(buf);
}
else
***************
*** 3656,3662 **** xml_xmlnodetoxmltype(xmlNodePtr cur)
*/
static int
xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! ArrayBuildState **astate)
{
int result = 0;
Datum datum;
--- 3672,3679 ----
*/
static int
xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! ArrayBuildState **astate,
! PgXmlErrorContext *xmlerrcxt)
{
int result = 0;
Datum datum;
***************
*** 3678,3684 **** xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
for (i = 0; i < result; i++)
{
! datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i]));
*astate = accumArrayResult(*astate, datum,
false, XMLOID,
CurrentMemoryContext);
--- 3695,3702 ----
for (i = 0; i < result; i++)
{
! datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
! xmlerrcxt));
*astate = accumArrayResult(*astate, datum,
false, XMLOID,
CurrentMemoryContext);
***************
*** 3882,3890 **** xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
* Extract the results as requested.
*/
if (res_nitems != NULL)
! *res_nitems = xml_xpathobjtoxmlarray(xpathobj, astate);
else
! (void) xml_xpathobjtoxmlarray(xpathobj, astate);
}
PG_CATCH();
{
--- 3900,3908 ----
* Extract the results as requested.
*/
if (res_nitems != NULL)
! *res_nitems = xml_xpathobjtoxmlarray(xpathobj, astate, xmlerrcxt);
else
! (void) xml_xpathobjtoxmlarray(xpathobj, astate, xmlerrcxt);
}
PG_CATCH();
{
*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
***************
*** 584,589 **** SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><loc
--- 584,595 ----
{1,2}
(1 row)
+ SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ xpath
+ ------------------------------------------------------------------------------------------------------------------------------------------------
+ {"<local:piece xmlns:local=\"http://127.0.0.1\" id=\"1\">number one</local:piece>","<local:piece xmlns:local=\"http://127.0.0.1\" id=\"2\"/>"}
+ (1 row)
+
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
xpath
-------------------------
*** a/src/test/regress/expected/xml_1.out
--- b/src/test/regress/expected/xml_1.out
***************
*** 498,503 **** LINE 1: SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="ht...
--- 498,509 ----
^
DETAIL: This functionality requires the server to be built with libxml support.
HINT: You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ ERROR: unsupported XML feature
+ LINE 1: SELECT xpath('//loc:piece', '<local:data xmlns:local="http:/...
+ ^
+ DETAIL: This functionality requires the server to be built with libxml support.
+ HINT: You need to rebuild PostgreSQL using --with-libxml.
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
ERROR: unsupported XML feature
LINE 1: SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>'...
*** a/src/test/regress/sql/xml.sql
--- b/src/test/regress/sql/xml.sql
***************
*** 174,179 **** SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
--- 174,180 ----
SELECT xpath('', '<!-- error -->');
SELECT xpath('//text()', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
SELECT xpath('//text()', '<root><</root>');
SELECT xpath('//@value', '<root value="<"/>');
Greetings,
Because of the memory bug in xmlCopyNode, this is a new patch with
different method. In this patch, instead of using xmlCopyNode to bring the
namespace back, we added the required namespaces to the node before
dumping the node to string, and cleaning it up afterwards (because the same
node could be dumped again in next xpath result).
Thanks,
Ali Akbar
2014-07-11 15:36 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:
Greetings,
Attached modified patch to handle xmlCopyNode returning NULL. The patch is
larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
is needed for calling xml_ereport).From libxml2 source, it turns out that the other cases that xmlCopyNode
will return NULL will not happen. So in this patch i assume that the only
case is memory exhaustion.But i found some bug in libxml2's code, because we call xmlCopyNode with 1
as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
check the return of xmlStaticCopyNode whether it's NULL. So if someday the
recursion returns NULL (maybe because of memory exhaustion), it will
SEGFAULT.Knowing this but in libxml2 code, is this patch is still acceptable?
Thanks,
Ali Akbar
--
Ali Akbar
Attachments:
xpath-ns-fix-4.patchtext/x-patch; charset=US-ASCII; name=xpath-ns-fix-4.patchDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..f34c87e 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -141,9 +141,11 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version,
pg_enc encoding, int standalone);
static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
bool preserve_whitespace, int encoding);
-static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
+static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
+ PgXmlErrorContext *xmlerrcxt);
static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
- ArrayBuildState **astate);
+ ArrayBuildState **astate,
+ PgXmlErrorContext *xmlerrcxt);
#endif /* USE_LIBXML */
static StringInfo query_to_xml_internal(const char *query, char *tablename,
@@ -3590,27 +3592,109 @@ SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
#ifdef USE_LIBXML
+/* check ns definition of node and its childrens. If any one of ns is
+ * not defined in node and it's children, but in the node's parent,
+ * copy the definition to node.
+ */
+static void
+xml_checkandcopyns(xmlNodePtr root,
+ PgXmlErrorContext *xmlerrcxt,
+ xmlNodePtr node,
+ xmlNsPtr lastns_before)
+{
+ xmlNsPtr ns = NULL;
+ xmlNsPtr cur_ns;
+ xmlNodePtr cur;
+
+ if (node->ns != NULL)
+ {
+ /* check in new nses */
+ cur_ns = lastns_before == NULL ? node->nsDef : lastns_before->next;
+ while (cur_ns != NULL)
+ {
+ if (cur_ns->href != NULL)
+ {
+ if (((cur_ns->prefix == NULL) && (node->ns->prefix == NULL)) ||
+ ((cur_ns->prefix != NULL) && (node->ns->prefix != NULL) &&
+ xmlStrEqual(cur_ns->prefix, node->ns->prefix)))
+ {
+ ns = cur_ns;
+ break;
+ }
+ }
+ cur_ns = cur_ns->next;
+ }
+ if (ns == NULL) /* not in new nses */
+ {
+ ns = xmlSearchNs(NULL, node->parent, node->ns->prefix);
+
+ if (ns != NULL) {
+ ns = xmlNewNs(root, ns->href, ns->prefix);
+
+ if (ns == NULL && xmlerrcxt->err_occurred) {
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ "could not allocate xmlNs");
+ }
+ }
+ }
+ }
+ /* check and copy ns for children recursively */
+ cur = node->children;
+ while (cur != NULL) {
+ xml_checkandcopyns(root, xmlerrcxt, cur, lastns_before);
+ cur = cur->next;
+ }
+}
+
/*
* Convert XML node to text (dump subtree in case of element,
* return value otherwise)
*/
static text *
-xml_xmlnodetoxmltype(xmlNodePtr cur)
+xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
{
xmltype *result;
if (cur->type == XML_ELEMENT_NODE)
{
xmlBufferPtr buf;
+ xmlNsPtr lastns_before;
+ xmlNsPtr ns;
+ xmlNsPtr next;
buf = xmlBufferCreate();
+
PG_TRY();
{
+ lastns_before = cur->nsDef;
+ if (lastns_before != NULL) {
+ while (lastns_before->next != NULL) {
+ lastns_before = lastns_before->next;
+ }
+ }
+ xml_checkandcopyns(cur, xmlerrcxt, cur, lastns_before);
+
xmlNodeDump(buf, NULL, cur, 0, 1);
result = xmlBuffer_to_xmltype(buf);
+
+ /* delete and free new nses */
+ ns = lastns_before == NULL ? cur->nsDef : lastns_before->next;
+ while (ns != NULL) {
+ next = ns->next;
+ xmlFree(ns);
+ ns = next;
+ }
+ if (lastns_before == NULL) {
+ cur->nsDef = NULL;
+ } else {
+ lastns_before->next = NULL;
+ }
}
PG_CATCH();
{
+ /* new namespaces will be freed while free-ing the node, so we
+ * won't free it here
+ */
xmlBufferFree(buf);
PG_RE_THROW();
}
@@ -3656,7 +3740,8 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
*/
static int
xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
- ArrayBuildState **astate)
+ ArrayBuildState **astate,
+ PgXmlErrorContext *xmlerrcxt)
{
int result = 0;
Datum datum;
@@ -3678,7 +3763,8 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
for (i = 0; i < result; i++)
{
- datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i]));
+ datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
+ xmlerrcxt));
*astate = accumArrayResult(*astate, datum,
false, XMLOID,
CurrentMemoryContext);
@@ -3882,9 +3968,9 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
* Extract the results as requested.
*/
if (res_nitems != NULL)
- *res_nitems = xml_xpathobjtoxmlarray(xpathobj, astate);
+ *res_nitems = xml_xpathobjtoxmlarray(xpathobj, astate, xmlerrcxt);
else
- (void) xml_xpathobjtoxmlarray(xpathobj, astate);
+ (void) xml_xpathobjtoxmlarray(xpathobj, astate, xmlerrcxt);
}
PG_CATCH();
{
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 382f9df..866f299 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -584,6 +584,21 @@ SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><loc
{1,2}
(1 row)
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ xpath
+------------------------------------------------------------------------------------------------------------------------------------------------
+ {"<local:piece xmlns:local=\"http://127.0.0.1\" id=\"1\">number one</local:piece>","<local:piece xmlns:local=\"http://127.0.0.1\" id=\"2\"/>"}
+(1 row)
+
+SELECT xpath('//loc:*', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ xpath
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {"<local:data xmlns:local=\"http://127.0.0.1\"> +
+ <local:piece id=\"1\">number one</local:piece> +
+ <local:piece id=\"2\"/> +
+ </local:data>","<local:piece xmlns:local=\"http://127.0.0.1\" id=\"1\">number one</local:piece>","<local:piece xmlns:local=\"http://127.0.0.1\" id=\"2\"/>"}
+(1 row)
+
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
xpath
-------------------------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index a34d1f4..46aff5f 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -498,6 +498,18 @@ LINE 1: SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="ht...
^
DETAIL: This functionality requires the server to be built with libxml support.
HINT: You need to rebuild PostgreSQL using --with-libxml.
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ERROR: unsupported XML feature
+LINE 1: SELECT xpath('//loc:piece', '<local:data xmlns:local="http:/...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
+SELECT xpath('//loc:*', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ERROR: unsupported XML feature
+LINE 1: SELECT xpath('//loc:*', '<local:data xmlns:local="http://127...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
ERROR: unsupported XML feature
LINE 1: SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>'...
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 90d4d67..f436f97 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -174,6 +174,8 @@ SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
SELECT xpath('', '<!-- error -->');
SELECT xpath('//text()', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+SELECT xpath('//loc:*', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
SELECT xpath('//text()', '<root><</root>');
SELECT xpath('//@value', '<root value="<"/>');
On 7/11/14 4:36 AM, Ali Akbar wrote:
But i found some bug in libxml2's code, because we call xmlCopyNode with
1 as the second parameter, internally xmlCopyNode will call
xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And
xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode
whether it's NULL. So if someday the recursion returns NULL (maybe
because of memory exhaustion), it will SEGFAULT.Knowing this but in libxml2 code, is this patch is still acceptable?
This problem was fixed in libxml2 2.9.2 (see
https://bugzilla.gnome.org/show_bug.cgi?id=708681).
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 7/8/14 6:03 AM, Ali Akbar wrote:
If we put 1 as the parameter, the resulting Node will link to the
existing children. In this case, the namespace problem for the children
might be still exists. If any of the children uses different
namespace(s) than the parent, the namespace definition will not be
copied correctly.
This analysis might very well be right, but I think we should prove it
with a test case.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers