xpath processing brain dead

Started by Andrew Dunstanabout 17 years ago48 messageshackers
Jump to latest
#1Andrew Dunstan
andrew@dunslane.net

Andrew Gierth was just pointing out to me how badly broken our XPath
processing is.

For fear of passing an ill formed fragment of xml to the processor, we
strip the xml declaration if any and surround what's left with '<x>" and
'</x>' and prepend '/x' to the supposed xpath. This is just horrible. It
will break for every xpath expression that doesn't begin with a '/' and
probably for many that do.

This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression. If the
programmer passes an ill formed piece of xml to the processor that is
their lookout, but I think we should ensure that we give back correct
results on well formed input.

The only good piece of news is that the xpath procedures in contrib/xml2
don't apparently suffer these faults.

cheers

andrew

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: xpath processing brain dead

Andrew Dunstan <andrew@dunslane.net> writes:

For fear of passing an ill formed fragment of xml to the processor, we
strip the xml declaration if any and surround what's left with '<x>" and
'</x>' and prepend '/x' to the supposed xpath. This is just horrible.

I seem to recall having complained about that at the time, but I didn't
(and don't) know enough about xpath to do any better.

This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression.

I don't think we should change the behavior if it's just to arrive at
another less-than-desirable behavior. Whacking semantics around afresh
with each release does not endear us to users. If we know how to fix it
right, great; but if we can't then we should keep compatibility with 8.3
until we can.

regards, tom lane

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: xpath processing brain dead

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

For fear of passing an ill formed fragment of xml to the processor, we
strip the xml declaration if any and surround what's left with '<x>" and
'</x>' and prepend '/x' to the supposed xpath. This is just horrible.

I seem to recall having complained about that at the time, but I didn't
(and don't) know enough about xpath to do any better.

Well, a few of us do. I guess I took my eye off the ball a bit back when
we were putting this into 8.3.

This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression.

I don't think we should change the behavior if it's just to arrive at
another less-than-desirable behavior. Whacking semantics around afresh
with each release does not endear us to users. If we know how to fix it
right, great; but if we can't then we should keep compatibility with 8.3
until we can.

Honestly, this is a bug, pure and simple. There really can't be an
argument about that. For the stable branch, we could make the following
changes that should result in a Pareto improvement (nothing gets worse
while some things get better):

* only do the xml transformation if the xml is known not to be be
well formed
* if we need to mangle the xpath expression due to doing the xml
transformation, then unless the xpath expression begins with a
'/', prepend it with '/x//'. (two slashes corresponds to the
descendent axis in xpath - in effect it stands for any number of
descendent elements).

But that's just a holding operation. For 8.4 we should stop this
nonsense and simply say that it is up to the programmer to ensure that
the xml passed to the processor is well formed.

The thing that is so very bad about this is that if the programmer *has*
made sure that the inputs are correct, s/he can still end up with broken
results. If we're going to try to fix bad inputs, we must make damn sure
that we don't break on correct inputs as a result. However, I can't off
hand think of a lightning way to fix bad inputs that doesn't carry some
danger to good inputs. Until someone comes up with something tolerably
bulletproof, I suggest that we simply say that it is the programmer's
responsibility.

cheers

andrew

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Andrew Dunstan (#1)
Re: xpath processing brain dead

Andrew Dunstan wrote:

For fear of passing an ill formed fragment of xml to the processor, we
strip the xml declaration if any and surround what's left with '<x>" and
'</x>' and prepend '/x' to the supposed xpath. This is just horrible. It
will break for every xpath expression that doesn't begin with a '/' and
probably for many that do.

This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression. If the
programmer passes an ill formed piece of xml to the processor that is
their lookout, but I think we should ensure that we give back correct
results on well formed input.

It's not about ill-formed pieces, it is about (well-formed) content
fragments that are not full documents (exactly one root element).
libxml2 doesn't support xpath on content fragments.

The tradeoff here is either supporting no xpath at all on content
fragments or supporting some xpath on both kinds of xml data. Whoever
made the initial implementation of this (Nikolay, Cc) decided for the
latter, but I can't say I'm happy with it either. I'd be OK with
changing it, but I haven't had time to get to it, unfortunately.

See also <http://wiki.postgresql.org/wiki/XPath_Todo#XPath&gt;.

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Peter Eisentraut (#4)
Re: xpath processing brain dead

Peter Eisentraut wrote:

Andrew Dunstan wrote:

For fear of passing an ill formed fragment of xml to the processor,
we strip the xml declaration if any and surround what's left with
'<x>" and '</x>' and prepend '/x' to the supposed xpath. This is just
horrible. It will break for every xpath expression that doesn't begin
with a '/' and probably for many that do.

This whole thing is a mess, and I suspect the only fix for now is to
undo all the mangling of both the xml and the xpath expression. If
the programmer passes an ill formed piece of xml to the processor
that is their lookout, but I think we should ensure that we give back
correct results on well formed input.

It's not about ill-formed pieces, it is about (well-formed) content
fragments that are not full documents (exactly one root element).
libxml2 doesn't support xpath on content fragments.

The tradeoff here is either supporting no xpath at all on content
fragments or supporting some xpath on both kinds of xml data. Whoever
made the initial implementation of this (Nikolay, Cc) decided for the
latter, but I can't say I'm happy with it either. I'd be OK with
changing it, but I haven't had time to get to it, unfortunately.

See also <http://wiki.postgresql.org/wiki/XPath_Todo#XPath&gt;.

I don't think it is our responsibility to remedy the deficiencies of
libxml2, especially if it involves breaking the processing of valid
xpath expressions on well formed XML.

If someone can come up with a correct scheme for handling fragments, I'm
all ears, but otherwise I think a) we should rip this out of 8.4 and b)
we should try to make 8.3 slightly less broken at least, along the lines
of my earlier suggestion.

cheers

andrew

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#5)
Re: xpath processing brain dead

Andrew Dunstan <andrew@dunslane.net> writes:

I don't think it is our responsibility to remedy the deficiencies of
libxml2, especially if it involves breaking the processing of valid
xpath expressions on well formed XML.

If someone can come up with a correct scheme for handling fragments, I'm
all ears, but otherwise I think a) we should rip this out of 8.4 and b)
we should try to make 8.3 slightly less broken at least, along the lines
of my earlier suggestion.

I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.

regards, tom lane

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: xpath processing brain dead

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

I don't think it is our responsibility to remedy the deficiencies of
libxml2, especially if it involves breaking the processing of valid
xpath expressions on well formed XML.

If someone can come up with a correct scheme for handling fragments, I'm
all ears, but otherwise I think a) we should rip this out of 8.4 and b)
we should try to make 8.3 slightly less broken at least, along the lines
of my earlier suggestion.

I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.

It's never too late to fix a bug. The current behaviour is just plain
nonsense ... if your xpath expression is 'foo/bar' it ends up searching
for '/xfoo/bar' which can't possibly be right. Surely we can at least
fix that.

cheers

andrew

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#7)
Re: xpath processing brain dead

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.

It's never too late to fix a bug.

When the proposed fix involves breaking cases that used to behave
usefully, you need a much stronger argument than that.

regards, tom lane

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: xpath processing brain dead

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.

It's never too late to fix a bug.

When the proposed fix involves breaking cases that used to behave
usefully, you need a much stronger argument than that.

What I have proposed for 8.3 should not break a single case that
currently behaves usefully. If anyone has a counter-example please show it.

What I have proposed for 8.4 possibly would break current "useful"
behaviour (FSVO "useful"), but should be done anyway on correctness grounds.

cheers

andrew

#10Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#9)
Re: xpath processing brain dead

What I have proposed for 8.3 should not break a single case that currently
behaves usefully. If anyone has a counter-example please show it.

What I have proposed for 8.4 possibly would break current "useful" behaviour
(FSVO "useful"), but should be done anyway on correctness grounds.

I dunno, aren't XML document fragments sort of a pretty common case?

...Robert

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#10)
Re: xpath processing brain dead

Robert Haas wrote:

What I have proposed for 8.3 should not break a single case that currently
behaves usefully. If anyone has a counter-example please show it.

What I have proposed for 8.4 possibly would break current "useful" behaviour
(FSVO "useful"), but should be done anyway on correctness grounds.

I dunno, aren't XML document fragments sort of a pretty common case?

They are. So what? How does that contradict either of the statements
made above?

Programmers using libxml2 are used to handling this problem. Why must
postgres alone use a totally brain dead and utterly incorrect wrapping
to solve a problem that everyone else leaves up to the programmer to handle?

People in this thread are not concentrating on the fact that what we do
now can break correct input. That makes it an unquestionable bug,
IMNSHO. There seems to be agreement about what to do for 8.4, so we seem
to be arguing about what to do for 8.3. Are you *really* arguing that
prepending the xpath expression with '/x' in all cases should be allowed
to continue? If you are I can only assume your use of xml/xpath is
probably fairly light.

cheers

andrew

#12Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#11)
Re: xpath processing brain dead

On Thu, Feb 26, 2009 at 3:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

I dunno, aren't XML document fragments sort of a pretty common case?

They are. So what? How does that contradict either of the statements made
above?

Programmers using libxml2 are used to handling this problem. Why must
postgres alone use a totally brain dead and utterly incorrect wrapping to
solve a problem that everyone else leaves up to the programmer to handle?

I can't think of any reason, especially not when you put it that way. :-)

People in this thread are not concentrating on the fact that what we do now
can break correct input. That makes it an unquestionable bug, IMNSHO. There
seems to be agreement about what to do for 8.4, so we seem to be arguing
about what to do for 8.3. Are you *really* arguing that prepending the xpath
expression with '/x' in all cases should be allowed to continue? If you are
I can only assume your use of xml/xpath is probably fairly light.

Mine is very light indeed. But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior, and therefore it seems
questionable to put that into a stable branch. The fact that the
application perhaps should not have been written that way to begin
with is neither here nor there. We don't want people to be afraid of
upgrading to the latest point release when we fix, say, a backend
crash or a data corruption bug.

...Robert

#13Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Haas (#12)
Re: xpath processing brain dead

On Thu, 2009-02-26 at 15:37 -0500, Robert Haas wrote:

Mine is very light indeed. But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior,

That is what release notes are for.

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#12)
Re: xpath processing brain dead

Robert Haas wrote:

But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior, and therefore it seems
questionable to put that into a stable branch. The fact that the
application perhaps should not have been written that way to begin
with is neither here nor there. We don't want people to be afraid of
upgrading to the latest point release when we fix, say, a backend
crash or a data corruption bug.

Let me reiterate the changes that I propose, and again challenge you to
provide a working counter-example if you believe it will break anything
not currently broken, even cases involving fragments.

First, I propose that we abandon this mangling, if, and only if, the xml
is in fact a well formed XML document. Since the whole point of the
mangling is to handle situations where the XML is not a well formed
document, that seems fairly straight-forward. If this change were to
upset any user, it must be because they are relying on undisputably
incorrect results.

Second, I propose that, in the remaining cases, where we do mangle the
XML, if the xpath expression does not begin with a '/', instead of
prepending it with '/x/, which can not possibly be correct under any
circumstance, we prepend it with '/x//' which has some possibility of
giving correct results.

IOW, these proposals will expand the number of correct results from the
code, without contributing any new incorrect cases. These changes are
*very* conservative, as is usual when we fix things on stable branches.

cheers

andrew

#15Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#14)
Re: xpath processing brain dead

I wrote:

Second, I propose that, in the remaining cases, where we do mangle the
XML, if the xpath expression does not begin with a '/', instead of
prepending it with '/x/, which can not possibly be correct under any
circumstance, we prepend it with '/x//' which has some possibility of
giving correct results.

^^^
'/x'

cheers

andrew

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#14)
Re: xpath processing brain dead

Andrew Dunstan <andrew@dunslane.net> writes:

First, I propose that we abandon this mangling, if, and only if, the xml
is in fact a well formed XML document. Since the whole point of the
mangling is to handle situations where the XML is not a well formed
document, that seems fairly straight-forward. If this change were to
upset any user, it must be because they are relying on undisputably
incorrect results.

Second, I propose that, in the remaining cases, where we do mangle the
XML, if the xpath expression does not begin with a '/', instead of
prepending it with '/x/, which can not possibly be correct under any
circumstance, we prepend it with '/x//' which has some possibility of
giving correct results.

Hmm, does this proposal require adding a test of well-formed-ness to
a code path that doesn't currently have one? If so, is that likely
to contribute any noticeable slowdown?

I can't offhand see an objection to this other than possible performance
impact.

regards, tom lane

#17Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#14)
Re: xpath processing brain dead

On Thu, Feb 26, 2009 at 3:54 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Robert Haas wrote:

But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior, and therefore it seems
questionable to put that into a stable branch.  The fact that the
application perhaps should not have been written that way to begin
with is neither here nor there.  We don't want people to be afraid of
upgrading to the latest point release when we fix, say, a backend
crash or a data corruption bug.

Let me reiterate the changes that I propose, and again challenge you to
provide a working counter-example if you believe it will break anything not
currently broken, even cases involving fragments.

First, I propose that we abandon this mangling, if, and only if, the xml is
in fact a well formed XML document. Since the whole point of the mangling is
to handle situations where the XML is not a well formed document, that seems
fairly straight-forward. If this change were to upset any user, it must be
because they are relying on undisputably incorrect results.

Second, I propose that, in the remaining cases, where we do mangle the XML,
if the xpath expression does not begin with a '/', instead of prepending it
with '/x/, which can not possibly be correct under any circumstance, we
prepend it with '/x//' which has some possibility of giving correct results.

IOW, these proposals will expand the number of correct results from the
code, without contributing any new incorrect cases. These changes are *very*
conservative, as is usual when we fix things on stable branches.

cheers

andrew

You are right.

Nuff said,

...Robert

#18James William Pye
lists@jwp.name
In reply to: Peter Eisentraut (#4)
Re: xpath processing brain dead

On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote:

It's not about ill-formed pieces, it is about (well-formed) content
fragments that are not full documents (exactly one root element).
libxml2 doesn't support xpath on content fragments.

exslt:node-set() to the rescue? Or is that/equivalent functionality
not easily accessed at the C-level with libxml2?

http://www.exslt.org/exsl/functions/node-set/index.html

#19Andrew Dunstan
andrew@dunslane.net
In reply to: James William Pye (#18)
Re: xpath processing brain dead

James Pye wrote:

On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote:

It's not about ill-formed pieces, it is about (well-formed) content
fragments that are not full documents (exactly one root element).
libxml2 doesn't support xpath on content fragments.

exslt:node-set() to the rescue? Or is that/equivalent functionality
not easily accessed at the C-level with libxml2?

http://www.exslt.org/exsl/functions/node-set/index.html

A node-set isn't a document. In any case, this functionality doesn't
appear to be in libxml2, it's in libxslt according to the reference you
provided.

cheers

andrew

#20James William Pye
lists@jwp.name
In reply to: Andrew Dunstan (#19)
Re: xpath processing brain dead

On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote:

http://www.exslt.org/exsl/functions/node-set/index.html

A node-set isn't a document.

yes.. :)

I guess what I'm saying is that if:

tinman=# SELECT XML'<foo/><bar/>';
xml
--------------
<foo/><bar/>
(1 row)

is considered to be valid per *SQL-XML*, then it should probably be
treated as a node-set in the context of xpath, not mangled with
<x>...</x>..

Certainly, I would expect an implicit "node-set() call" long before
wrapping the fragment in <x>...</x> and prefixing my xpath query.

However, I find the validity of the above, XML'<foo/><bar/>', a bit
disturbing to begin with. :P

In any case, this functionality doesn't appear to be in libxml2,
it's in libxslt according to the reference you provided.

I think that's *just* referencing the list of xslt implementations
that the extension function is known to be available in.. I doubt that
means to imply that the function or equivalent functionality is not
available in libxml2 itself. I'd wager that equivalent functionality
could be implemented if it weren't directly/already available.. =)

#21Andrew Dunstan
andrew@dunslane.net
In reply to: James William Pye (#20)
#22Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#16)
#23Hannu Krosing
hannu@tm.ee
In reply to: Robert Haas (#10)
#24Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#23)
#25Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#22)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#25)
#27Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#26)
#28Hannu Krosing
hannu@tm.ee
In reply to: Andrew Dunstan (#24)
#29James William Pye
lists@jwp.name
In reply to: Andrew Dunstan (#21)
#30Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#28)
#31James William Pye
lists@jwp.name
In reply to: James William Pye (#29)
#32Andrew Dunstan
andrew@dunslane.net
In reply to: James William Pye (#31)
#33James William Pye
lists@jwp.name
In reply to: Andrew Dunstan (#32)
#34Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#27)
#35Hannu Krosing
hannu@tm.ee
In reply to: Andrew Dunstan (#30)
#36Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#28)
#37Hannu Krosing
hannu@tm.ee
In reply to: Andrew Dunstan (#36)
#38Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#37)
#39Peter Eisentraut
peter_e@gmx.net
In reply to: Andrew Dunstan (#21)
#40Simon Riggs
simon@2ndQuadrant.com
In reply to: Andrew Dunstan (#38)
#41Andrew Dunstan
andrew@dunslane.net
In reply to: Peter Eisentraut (#39)
#42Andrew Dunstan
andrew@dunslane.net
In reply to: Simon Riggs (#40)
#43Hannu Krosing
hannu@tm.ee
In reply to: Andrew Dunstan (#42)
#44Peter Eisentraut
peter_e@gmx.net
In reply to: Hannu Krosing (#43)
#45Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#43)
#46Hannu Krosing
hannu@tm.ee
In reply to: Peter Eisentraut (#44)
#47Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#45)
#48James William Pye
lists@jwp.name
In reply to: Andrew Dunstan (#47)