xpath processing brain dead
Andrew Gierth was just pointing out to me how badly broken our XPath
processing is.
For fear of passing an ill formed fragment of xml to the processor, we
strip the xml declaration if any and surround what's left with '<x>" and
'</x>' and prepend '/x' to the supposed xpath. This is just horrible. It
will break for every xpath expression that doesn't begin with a '/' and
probably for many that do.
This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression. If the
programmer passes an ill formed piece of xml to the processor that is
their lookout, but I think we should ensure that we give back correct
results on well formed input.
The only good piece of news is that the xpath procedures in contrib/xml2
don't apparently suffer these faults.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
For fear of passing an ill formed fragment of xml to the processor, we
strip the xml declaration if any and surround what's left with '<x>" and
'</x>' and prepend '/x' to the supposed xpath. This is just horrible.
I seem to recall having complained about that at the time, but I didn't
(and don't) know enough about xpath to do any better.
This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression.
I don't think we should change the behavior if it's just to arrive at
another less-than-desirable behavior. Whacking semantics around afresh
with each release does not endear us to users. If we know how to fix it
right, great; but if we can't then we should keep compatibility with 8.3
until we can.
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
For fear of passing an ill formed fragment of xml to the processor, we
strip the xml declaration if any and surround what's left with '<x>" and
'</x>' and prepend '/x' to the supposed xpath. This is just horrible.I seem to recall having complained about that at the time, but I didn't
(and don't) know enough about xpath to do any better.
Well, a few of us do. I guess I took my eye off the ball a bit back when
we were putting this into 8.3.
This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression.I don't think we should change the behavior if it's just to arrive at
another less-than-desirable behavior. Whacking semantics around afresh
with each release does not endear us to users. If we know how to fix it
right, great; but if we can't then we should keep compatibility with 8.3
until we can.
Honestly, this is a bug, pure and simple. There really can't be an
argument about that. For the stable branch, we could make the following
changes that should result in a Pareto improvement (nothing gets worse
while some things get better):
* only do the xml transformation if the xml is known not to be be
well formed
* if we need to mangle the xpath expression due to doing the xml
transformation, then unless the xpath expression begins with a
'/', prepend it with '/x//'. (two slashes corresponds to the
descendent axis in xpath - in effect it stands for any number of
descendent elements).
But that's just a holding operation. For 8.4 we should stop this
nonsense and simply say that it is up to the programmer to ensure that
the xml passed to the processor is well formed.
The thing that is so very bad about this is that if the programmer *has*
made sure that the inputs are correct, s/he can still end up with broken
results. If we're going to try to fix bad inputs, we must make damn sure
that we don't break on correct inputs as a result. However, I can't off
hand think of a lightning way to fix bad inputs that doesn't carry some
danger to good inputs. Until someone comes up with something tolerably
bulletproof, I suggest that we simply say that it is the programmer's
responsibility.
cheers
andrew
Andrew Dunstan wrote:
For fear of passing an ill formed fragment of xml to the processor, we
strip the xml declaration if any and surround what's left with '<x>" and
'</x>' and prepend '/x' to the supposed xpath. This is just horrible. It
will break for every xpath expression that doesn't begin with a '/' and
probably for many that do.This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression. If the
programmer passes an ill formed piece of xml to the processor that is
their lookout, but I think we should ensure that we give back correct
results on well formed input.
It's not about ill-formed pieces, it is about (well-formed) content
fragments that are not full documents (exactly one root element).
libxml2 doesn't support xpath on content fragments.
The tradeoff here is either supporting no xpath at all on content
fragments or supporting some xpath on both kinds of xml data. Whoever
made the initial implementation of this (Nikolay, Cc) decided for the
latter, but I can't say I'm happy with it either. I'd be OK with
changing it, but I haven't had time to get to it, unfortunately.
See also <http://wiki.postgresql.org/wiki/XPath_Todo#XPath>.
Peter Eisentraut wrote:
Andrew Dunstan wrote:
For fear of passing an ill formed fragment of xml to the processor,
we strip the xml declaration if any and surround what's left with
'<x>" and '</x>' and prepend '/x' to the supposed xpath. This is just
horrible. It will break for every xpath expression that doesn't begin
with a '/' and probably for many that do.This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression. If
the programmer passes an ill formed piece of xml to the processor
that is their lookout, but I think we should ensure that we give back
correct results on well formed input.It's not about ill-formed pieces, it is about (well-formed) content
fragments that are not full documents (exactly one root element).
libxml2 doesn't support xpath on content fragments.The tradeoff here is either supporting no xpath at all on content
fragments or supporting some xpath on both kinds of xml data. Whoever
made the initial implementation of this (Nikolay, Cc) decided for the
latter, but I can't say I'm happy with it either. I'd be OK with
changing it, but I haven't had time to get to it, unfortunately.See also <http://wiki.postgresql.org/wiki/XPath_Todo#XPath>.
I don't think it is our responsibility to remedy the deficiencies of
libxml2, especially if it involves breaking the processing of valid
xpath expressions on well formed XML.
If someone can come up with a correct scheme for handling fragments, I'm
all ears, but otherwise I think a) we should rip this out of 8.4 and b)
we should try to make 8.3 slightly less broken at least, along the lines
of my earlier suggestion.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
I don't think it is our responsibility to remedy the deficiencies of
libxml2, especially if it involves breaking the processing of valid
xpath expressions on well formed XML.
If someone can come up with a correct scheme for handling fragments, I'm
all ears, but otherwise I think a) we should rip this out of 8.4 and b)
we should try to make 8.3 slightly less broken at least, along the lines
of my earlier suggestion.
I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I don't think it is our responsibility to remedy the deficiencies of
libxml2, especially if it involves breaking the processing of valid
xpath expressions on well formed XML.If someone can come up with a correct scheme for handling fragments, I'm
all ears, but otherwise I think a) we should rip this out of 8.4 and b)
we should try to make 8.3 slightly less broken at least, along the lines
of my earlier suggestion.I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.
It's never too late to fix a bug. The current behaviour is just plain
nonsense ... if your xpath expression is 'foo/bar' it ends up searching
for '/xfoo/bar' which can't possibly be right. Surely we can at least
fix that.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.
It's never too late to fix a bug.
When the proposed fix involves breaking cases that used to behave
usefully, you need a much stronger argument than that.
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.It's never too late to fix a bug.
When the proposed fix involves breaking cases that used to behave
usefully, you need a much stronger argument than that.
What I have proposed for 8.3 should not break a single case that
currently behaves usefully. If anyone has a counter-example please show it.
What I have proposed for 8.4 possibly would break current "useful"
behaviour (FSVO "useful"), but should be done anyway on correctness grounds.
cheers
andrew
What I have proposed for 8.3 should not break a single case that currently
behaves usefully. If anyone has a counter-example please show it.What I have proposed for 8.4 possibly would break current "useful" behaviour
(FSVO "useful"), but should be done anyway on correctness grounds.
I dunno, aren't XML document fragments sort of a pretty common case?
...Robert
Robert Haas wrote:
What I have proposed for 8.3 should not break a single case that currently
behaves usefully. If anyone has a counter-example please show it.What I have proposed for 8.4 possibly would break current "useful" behaviour
(FSVO "useful"), but should be done anyway on correctness grounds.I dunno, aren't XML document fragments sort of a pretty common case?
They are. So what? How does that contradict either of the statements
made above?
Programmers using libxml2 are used to handling this problem. Why must
postgres alone use a totally brain dead and utterly incorrect wrapping
to solve a problem that everyone else leaves up to the programmer to handle?
People in this thread are not concentrating on the fact that what we do
now can break correct input. That makes it an unquestionable bug,
IMNSHO. There seems to be agreement about what to do for 8.4, so we seem
to be arguing about what to do for 8.3. Are you *really* arguing that
prepending the xpath expression with '/x' in all cases should be allowed
to continue? If you are I can only assume your use of xml/xpath is
probably fairly light.
cheers
andrew
On Thu, Feb 26, 2009 at 3:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
I dunno, aren't XML document fragments sort of a pretty common case?
They are. So what? How does that contradict either of the statements made
above?Programmers using libxml2 are used to handling this problem. Why must
postgres alone use a totally brain dead and utterly incorrect wrapping to
solve a problem that everyone else leaves up to the programmer to handle?
I can't think of any reason, especially not when you put it that way. :-)
People in this thread are not concentrating on the fact that what we do now
can break correct input. That makes it an unquestionable bug, IMNSHO. There
seems to be agreement about what to do for 8.4, so we seem to be arguing
about what to do for 8.3. Are you *really* arguing that prepending the xpath
expression with '/x' in all cases should be allowed to continue? If you are
I can only assume your use of xml/xpath is probably fairly light.
Mine is very light indeed. But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior, and therefore it seems
questionable to put that into a stable branch. The fact that the
application perhaps should not have been written that way to begin
with is neither here nor there. We don't want people to be afraid of
upgrading to the latest point release when we fix, say, a backend
crash or a data corruption bug.
...Robert
On Thu, 2009-02-26 at 15:37 -0500, Robert Haas wrote:
Mine is very light indeed. But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior,
That is what release notes are for.
Joshua D. Drake
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Robert Haas wrote:
But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior, and therefore it seems
questionable to put that into a stable branch. The fact that the
application perhaps should not have been written that way to begin
with is neither here nor there. We don't want people to be afraid of
upgrading to the latest point release when we fix, say, a backend
crash or a data corruption bug.
Let me reiterate the changes that I propose, and again challenge you to
provide a working counter-example if you believe it will break anything
not currently broken, even cases involving fragments.
First, I propose that we abandon this mangling, if, and only if, the xml
is in fact a well formed XML document. Since the whole point of the
mangling is to handle situations where the XML is not a well formed
document, that seems fairly straight-forward. If this change were to
upset any user, it must be because they are relying on undisputably
incorrect results.
Second, I propose that, in the remaining cases, where we do mangle the
XML, if the xpath expression does not begin with a '/', instead of
prepending it with '/x/, which can not possibly be correct under any
circumstance, we prepend it with '/x//' which has some possibility of
giving correct results.
IOW, these proposals will expand the number of correct results from the
code, without contributing any new incorrect cases. These changes are
*very* conservative, as is usual when we fix things on stable branches.
cheers
andrew
I wrote:
Second, I propose that, in the remaining cases, where we do mangle the
XML, if the xpath expression does not begin with a '/', instead of
prepending it with '/x/, which can not possibly be correct under any
circumstance, we prepend it with '/x//' which has some possibility of
giving correct results.
^^^
'/x'
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
First, I propose that we abandon this mangling, if, and only if, the xml
is in fact a well formed XML document. Since the whole point of the
mangling is to handle situations where the XML is not a well formed
document, that seems fairly straight-forward. If this change were to
upset any user, it must be because they are relying on undisputably
incorrect results.
Second, I propose that, in the remaining cases, where we do mangle the
XML, if the xpath expression does not begin with a '/', instead of
prepending it with '/x/, which can not possibly be correct under any
circumstance, we prepend it with '/x//' which has some possibility of
giving correct results.
Hmm, does this proposal require adding a test of well-formed-ness to
a code path that doesn't currently have one? If so, is that likely
to contribute any noticeable slowdown?
I can't offhand see an objection to this other than possible performance
impact.
regards, tom lane
On Thu, Feb 26, 2009 at 3:54 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
Robert Haas wrote:
But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior, and therefore it seems
questionable to put that into a stable branch. The fact that the
application perhaps should not have been written that way to begin
with is neither here nor there. We don't want people to be afraid of
upgrading to the latest point release when we fix, say, a backend
crash or a data corruption bug.Let me reiterate the changes that I propose, and again challenge you to
provide a working counter-example if you believe it will break anything not
currently broken, even cases involving fragments.First, I propose that we abandon this mangling, if, and only if, the xml is
in fact a well formed XML document. Since the whole point of the mangling is
to handle situations where the XML is not a well formed document, that seems
fairly straight-forward. If this change were to upset any user, it must be
because they are relying on undisputably incorrect results.Second, I propose that, in the remaining cases, where we do mangle the XML,
if the xpath expression does not begin with a '/', instead of prepending it
with '/x/, which can not possibly be correct under any circumstance, we
prepend it with '/x//' which has some possibility of giving correct results.IOW, these proposals will expand the number of correct results from the
code, without contributing any new incorrect cases. These changes are *very*
conservative, as is usual when we fix things on stable branches.cheers
andrew
You are right.
Nuff said,
...Robert
On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote:
It's not about ill-formed pieces, it is about (well-formed) content
fragments that are not full documents (exactly one root element).
libxml2 doesn't support xpath on content fragments.
exslt:node-set() to the rescue? Or is that/equivalent functionality
not easily accessed at the C-level with libxml2?
James Pye wrote:
On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote:
It's not about ill-formed pieces, it is about (well-formed) content
fragments that are not full documents (exactly one root element).
libxml2 doesn't support xpath on content fragments.exslt:node-set() to the rescue? Or is that/equivalent functionality
not easily accessed at the C-level with libxml2?
A node-set isn't a document. In any case, this functionality doesn't
appear to be in libxml2, it's in libxslt according to the reference you
provided.
cheers
andrew
On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote:
A node-set isn't a document.
yes.. :)
I guess what I'm saying is that if:
tinman=# SELECT XML'<foo/><bar/>';
xml
--------------
<foo/><bar/>
(1 row)
is considered to be valid per *SQL-XML*, then it should probably be
treated as a node-set in the context of xpath, not mangled with
<x>...</x>..
Certainly, I would expect an implicit "node-set() call" long before
wrapping the fragment in <x>...</x> and prefixing my xpath query.
However, I find the validity of the above, XML'<foo/><bar/>', a bit
disturbing to begin with. :P
In any case, this functionality doesn't appear to be in libxml2,
it's in libxslt according to the reference you provided.
I think that's *just* referencing the list of xslt implementations
that the extension function is known to be available in.. I doubt that
means to imply that the function or equivalent functionality is not
available in libxml2 itself. I'd wager that equivalent functionality
could be implemented if it weren't directly/already available.. =)