Fix xpath() to return namespace definitions

Started by Ali Akbaralmost 12 years ago11 messageshackers
Jump to latest
#1Ali Akbar
the.apaan@gmail.com

Hi all,

While developing some XML processing queries, i stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.

Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
definitions that declared in the node's parents. As per
https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
the behavior is intentional.

This patch uses function xmlCopyNode that copies a node, including its
namespace definitions as required (without unused namespace in the node or
its children). When the copy dumped, the resulting XML is complete with its
namespaces. Calling xmlCopyNode will need additional memory to execute, but
reimplementing its routine to handle namespace definition will introduce
much complexity to the code.

Note: This is my very first postgresql patch.

--
Ali Akbar

Attachments:

xpath-ns-fix.patchtext/x-patch; charset=US-ASCII; name=xpath-ns-fix.patchDownload+23-1
#2Robert Haas
robertmhaas@gmail.com
In reply to: Ali Akbar (#1)
Re: Fix xpath() to return namespace definitions

On Fri, May 30, 2014 at 5:04 AM, Ali Akbar <the.apaan@gmail.com> wrote:

While developing some XML processing queries, i stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.

Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
definitions that declared in the node's parents. As per
https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
the behavior is intentional.

This patch uses function xmlCopyNode that copies a node, including its
namespace definitions as required (without unused namespace in the node or
its children). When the copy dumped, the resulting XML is complete with its
namespaces. Calling xmlCopyNode will need additional memory to execute, but
reimplementing its routine to handle namespace definition will introduce
much complexity to the code.

Note: This is my very first postgresql patch.

Please add your patch here so we don't forget about it:

https://commitfest.postgresql.org/action/commitfest_view/open

Please see also:

https://wiki.postgresql.org/wiki/Submitting_a_Patch

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Ali Akbar
the.apaan@gmail.com
In reply to: Robert Haas (#2)
Re: Fix xpath() to return namespace definitions

2014-06-05 2:37 GMT+07:00 Robert Haas <robertmhaas@gmail.com>:

Please add your patch here so we don't forget about it:

https://commitfest.postgresql.org/action/commitfest_view/open

Added: https://commitfest.postgresql.org/action/patch_view?id=1461

Please see also:

https://wiki.postgresql.org/wiki/Submitting_a_Patch

Thanks, i've read it.
As for the suggestion there: "Start out by reviewing a patch or responding
to email on the lists. Even if it is unrelated to what you're doing", i'll
look around other posts in this list.

For back versions, i think because this patch changes xpath() behavior, we
will only apply this to future versions. The old behavior is wrong
(according to XPath standard) for not including namespaces, but maybe there
are some application that depends on the old behavior.

Thanks!
--
Ali Akbar

#4Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Ali Akbar (#1)
[REVIEW] Re: Fix xpath() to return namespace definitions

At 2014-05-30 16:04:33 +0700, the.apaan@gmail.com wrote:

While developing some XML processing queries, i stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.

Thanks for the patch, and welcome to Postgres development.

I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).

I'm marking this ready for committer.

-- Abhijit

Attachments:

xpath-ns-fix-2.patchtext/x-diff; charset=us-asciiDownload+23-1
#5Ali Akbar
the.apaan@gmail.com
In reply to: Abhijit Menon-Sen (#4)
Re: [REVIEW] Re: Fix xpath() to return namespace definitions

2014-06-16 22:52 GMT+07:00 Abhijit Menon-Sen <ams@2ndquadrant.com>:

Thanks for the patch, and welcome to Postgres development.

I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).

I'm marking this ready for committer.

Thanks for the review. Hope i will be able to contribute a little here and
there in the future.

--
Ali Akbar

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Abhijit Menon-Sen (#4)
Re: [REVIEW] Re: Fix xpath() to return namespace definitions

Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:

I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).

I'm marking this ready for committer.

I don't know enough about XML/XPATH to know if this is a good idea or not,
but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument is

extended: if 1 do a recursive copy (properties, namespaces and children
when applicable) if 2 copy properties and namespaces (when applicable)

Do we really want it to "copy children"? Perhaps the value should be 2?
(I have no idea, I'm just raising the question.)

I also notice that it says

Returns: a new #xmlNodePtr, or NULL in case of error.

and the patch is failing to cover the error case. Most likely, that
would represent out-of-memory, so it definitely seems to be something
we should account for.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Ali Akbar
the.apaan@gmail.com
In reply to: Tom Lane (#6)
Re: [REVIEW] Re: Fix xpath() to return namespace definitions

I don't know enough about XML/XPATH to know if this is a good idea or not,

Actually currently because of the namespace problem, xpath() returns wrong
result (even worse: invalid xml!). So this patch corrects the behavior of
xpath() to the correct one.

but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument is

extended: if 1 do a recursive copy (properties, namespaces and children
when applicable) if 2 copy properties and namespaces (when applicable)

Do we really want it to "copy children"? Perhaps the value should be 2?
(I have no idea, I'm just raising the question.)

If we put 1 as the parameter, the resulting Node will link to the existing
children. In this case, the namespace problem for the children might be
still exists. If any of the children uses different namespace(s) than the
parent, the namespace definition will not be copied correctly.

Just to be safe, in the patch 1 passed as the second argument.

Ideally, we can traverse the Node object and generate XML accordingly, but
it is a lot of work, and a lot of duplicating libxml's code. Maybe we
should push this upstream to libxml?

I also notice that it says

Returns: a new #xmlNodePtr, or NULL in case of error.

and the patch is failing to cover the error case. Most likely, that
would represent out-of-memory, so it definitely seems to be something
we should account for.

Ah, overlooked that.

Skimming through libxml2's source, it looks like there are some other cases
beside out-of-memory. Will update the patch according to these cases.

Thanks for the review.

--
Ali Akbar

#8Ali Akbar
the.apaan@gmail.com
In reply to: Ali Akbar (#7)
Re: [REVIEW] Re: Fix xpath() to return namespace definitions

Greetings,

Attached modified patch to handle xmlCopyNode returning NULL. The patch is
larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
is needed for calling xml_ereport).

From libxml2 source, it turns out that the other cases that xmlCopyNode
will return NULL will not happen. So in this patch i assume that the only
case is memory exhaustion.

But i found some bug in libxml2's code, because we call xmlCopyNode with 1
as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
check the return of xmlStaticCopyNode whether it's NULL. So if someday the
recursion returns NULL (maybe because of memory exhaustion), it will
SEGFAULT.

Knowing this but in libxml2 code, is this patch is still acceptable?

Thanks,
Ali Akbar

Attachments:

xpath-ns-fix-3.patchtext/x-diff; charset=US-ASCII; name=xpath-ns-fix-3.patchDownload+47-20
#9Ali Akbar
the.apaan@gmail.com
In reply to: Ali Akbar (#8)
Re: [REVIEW] Re: Fix xpath() to return namespace definitions

Greetings,

Because of the memory bug in xmlCopyNode, this is a new patch with
different method. In this patch, instead of using xmlCopyNode to bring the
namespace back, we added the required namespaces to the node before
dumping the node to string, and cleaning it up afterwards (because the same
node could be dumped again in next xpath result).

Thanks,
Ali Akbar

2014-07-11 15:36 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:

Greetings,

Attached modified patch to handle xmlCopyNode returning NULL. The patch is
larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
is needed for calling xml_ereport).

From libxml2 source, it turns out that the other cases that xmlCopyNode
will return NULL will not happen. So in this patch i assume that the only
case is memory exhaustion.

But i found some bug in libxml2's code, because we call xmlCopyNode with 1
as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
check the return of xmlStaticCopyNode whether it's NULL. So if someday the
recursion returns NULL (maybe because of memory exhaustion), it will
SEGFAULT.

Knowing this but in libxml2 code, is this patch is still acceptable?

Thanks,
Ali Akbar

--
Ali Akbar

Attachments:

xpath-ns-fix-4.patchtext/x-patch; charset=US-ASCII; name=xpath-ns-fix-4.patchDownload+122-7
#10Peter Eisentraut
peter_e@gmx.net
In reply to: Ali Akbar (#8)
Re: [REVIEW] Re: Fix xpath() to return namespace definitions

On 7/11/14 4:36 AM, Ali Akbar wrote:

But i found some bug in libxml2's code, because we call xmlCopyNode with
1 as the second parameter, internally xmlCopyNode will call
xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And
xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode
whether it's NULL. So if someday the recursion returns NULL (maybe
because of memory exhaustion), it will SEGFAULT.

Knowing this but in libxml2 code, is this patch is still acceptable?

This problem was fixed in libxml2 2.9.2 (see
https://bugzilla.gnome.org/show_bug.cgi?id=708681).

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Ali Akbar (#7)
Re: [REVIEW] Re: Fix xpath() to return namespace definitions

On 7/8/14 6:03 AM, Ali Akbar wrote:

If we put 1 as the parameter, the resulting Node will link to the
existing children. In this case, the namespace problem for the children
might be still exists. If any of the children uses different
namespace(s) than the parent, the namespace definition will not be
copied correctly.

This analysis might very well be right, but I think we should prove it
with a test case.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers