Fix XML handling with DOCTYPE

Started by Ryan Lambertalmost 7 years ago49 messages

ryan@rustprooflabs.com

almost 7 years ago

Hi all,

I'm investigating the issue I reported here:
/messages/by-id/153478795159.1302.9617586466368699403@wrigleys.postgresql.org

As Tom Lane mentioned there, the docs (8.13) indicate xmloption = CONTENT
should accept all valid XML. At this time, XML with a DOCTYPE declaration
is not accepted with this setting even though it is considered valid XML.
I'd like to work on a patch to address this issue and make it work as
advertised.

I traced the source of the error to line ~1500 in
/src/backend/utils/adt/xml.c

res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0, utf8string +
count, NULL);

It looks like it is xmlParseBalancedChunkMemory from libxml that doesn't
work when there's a DOCTYPE in the XML data. My assumption is the DOCTYPE
element makes the XML not well-balanced. From:

http://xmlsoft.org/html/libxml-parser.html#xmlParseBalancedChunkMemory

This function returns:

0 if the chunk is well balanced, -1 in case of args problem and the parser
error code otherwise

I see xmlParseBalancedChunkMemoryRecover that might provide the
functionality needed. That function returns:

0 if the chunk is well balanced, -1 in case of args problem and the parser

error code otherwise In case recover is set to 1, the nodelist will not be
empty even if the parsed chunk is not well balanced, assuming the parsing
succeeded to some extent.

I haven't tested yet to see if this parses the data w/ DOCTYPE successfully
yet. If it does, I don't think it would be difficult to update the check
on res_code to not fail. I'm making another assumption that there is a
distinct code from libxml to differentiate from other errors, but I
couldn't find those codes quickly. The current check is this:

if (res_code != 0 || xmlerrcxt->err_occurred)

Does this sound reasonable? Have I missed some major aspect? If this is
on the right track I can work on creating a patch to move this forward.

Thanks,

*Ryan Lambert*
RustProof Labs
www.rustprooflabs.com

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Ryan Lambert (#1)

Re: Fix XML handling with DOCTYPE

On 03/16/19 16:10, Ryan Lambert wrote:

As Tom Lane mentioned there, the docs (8.13) indicate xmloption = CONTENT
should accept all valid XML. At this time, XML with a DOCTYPE declaration
is not accepted with this setting even though it is considered valid XML.

Hello Ryan,

A patch for your issue is currently registered in the 2019-03 commitfest[1]https://commitfest.postgresql.org/22/1872/.

If it attracts somebody to review it before the end of the month, it might
make it into PG v12.

It is the xml-content-2006-2.patch found on the email thread [2]/messages/by-id/5C81F8C0.6090901@anastigmatix.net. (The other
patch found there is associated documentation fixes, and also needs to be
reviewed.)

Further conversation should probably be on that email thread so that it
stays associated with the commitfest entry.

Thanks for your interest in the issue!

Regards,
Chapman Flack

[1]: https://commitfest.postgresql.org/22/1872/
[2]: /messages/by-id/5C81F8C0.6090901@anastigmatix.net

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Ryan Lambert (#1)

Re: Fix XML handling with DOCTYPE

Ryan Lambert <ryan@rustprooflabs.com> writes:

I'm investigating the issue I reported here:
/messages/by-id/153478795159.1302.9617586466368699403@wrigleys.postgresql.org
I'd like to work on a patch to address this issue and make it work as
advertised.

Good idea, because it doesn't seem like anybody else cares ...

I see xmlParseBalancedChunkMemoryRecover that might provide the
functionality needed.

TBH, our experience with libxml has not been so positive that I'd think
adding dependencies on new parts of its API would be a good plan.

Experimenting with different inputs, it seems like removing the
"<!DOCTYPE ...>" tag is enough to make it work. So what I'm wondering
about is writing something like parse_xml_decl() to skip over that.

Bear in mind though that I know next to zip about XML. There may be
some good reason why we don't want to strip off the !DOCTYPE part
from what libxml sees.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#2)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

A patch for your issue is currently registered in the 2019-03 commitfest[1].

Oh! I apologize for saying nobody was working on this issue.

Taking a very quick look at your patch, though, I dunno --- it seems like
it adds a boatload of new assumptions about libxml's data structures and
error-reporting behavior. I'm sure it works for you, but will it work
across a wide range of libxml versions?

What do you think of the idea I just posted about parsing off the DOCTYPE
thing for ourselves, and not letting libxml see it?

regards, tom lane

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Tom Lane (#3)

Re: Fix XML handling with DOCTYPE

On 03/16/19 16:42, Tom Lane wrote:

Ryan Lambert <ryan@rustprooflabs.com> writes:

I'm investigating the issue I reported here:
/messages/by-id/153478795159.1302.9617586466368699403@wrigleys.postgresql.org
I'd like to work on a patch to address this issue and make it work as
advertised.

Good idea, because it doesn't seem like anybody else cares ...

ahem

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Tom Lane (#4)

Re: Fix XML handling with DOCTYPE

On 03/16/19 16:55, Tom Lane wrote:

What do you think of the idea I just posted about parsing off the DOCTYPE
thing for ourselves, and not letting libxml see it?

The principled way of doing that would be to pre-parse to find a DOCTYPE,
and if there is one, leave it there and parse the input as we do for
'document'. Per XML, if there is a DOCTYPE, the document must satisfy
the 'document' syntax requirements, and per SQL/XML:2006-and-later,
'content' is a proper superset of 'document', so if we were asked for
'content' and can successfully parse it as 'document', we're good,
and if we see a DOCTYPE and yet it incurs a parse error as 'document',
well, that's what needed to happen.

The DOCTYPE can appear arbitrarily far in, but the only things that
can precede it are the XML decl, whitespace, XML comments, and XML
processing instructions. None of those things nest, so the preceding
stuff makes a regular language, and a regular expression that matches
any amount of that stuff ending in <!DOCTYPE would be enough to detect
that the parse should be shunted off to get 'document' treatment.

The patch I submitted essentially relies on libxml to do that same
parsing up to that same point and detect the error, so it would add
no upfront cost in the majority of cases that aren't this corner one.

But keeping a little compiled regex around and testing the input with that
would hardly break the bank, either.

Regards,
-Chap

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#6)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

On 03/16/19 16:55, Tom Lane wrote:

What do you think of the idea I just posted about parsing off the DOCTYPE
thing for ourselves, and not letting libxml see it?

The principled way of doing that would be to pre-parse to find a DOCTYPE,
and if there is one, leave it there and parse the input as we do for
'document'. Per XML, if there is a DOCTYPE, the document must satisfy
the 'document' syntax requirements, and per SQL/XML:2006-and-later,
'content' is a proper superset of 'document', so if we were asked for
'content' and can successfully parse it as 'document', we're good,
and if we see a DOCTYPE and yet it incurs a parse error as 'document',
well, that's what needed to happen.

Hm, so, maybe just

(1) always try to parse as document. If successful, we're done.

(2) otherwise, if allowed by xmloption, try to parse using our
current logic for the CONTENT case.

This avoids adding any new assumptions about how libxml acts,
which is what I was hoping to achieve.

One interesting question is which error to report if both (1) and (2)
fail.

regards, tom lane

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Tom Lane (#7)

Re: Fix XML handling with DOCTYPE

On 03/16/19 17:21, Tom Lane wrote:

Chapman Flack <chap@anastigmatix.net> writes:

On 03/16/19 16:55, Tom Lane wrote:

What do you think of the idea I just posted about parsing off the DOCTYPE
thing for ourselves, and not letting libxml see it?

The principled way of doing that would be to pre-parse to find a DOCTYPE,
and if there is one, leave it there and parse the input as we do for
'document'. Per XML, if there is a DOCTYPE, the document must satisfy
the 'document' syntax requirements, and per SQL/XML:2006-and-later,
'content' is a proper superset of 'document', so if we were asked for
'content' and can successfully parse it as 'document', we're good,
and if we see a DOCTYPE and yet it incurs a parse error as 'document',
well, that's what needed to happen.

Hm, so, maybe just

(1) always try to parse as document. If successful, we're done.

(2) otherwise, if allowed by xmloption, try to parse using our
current logic for the CONTENT case.

What I don't like about that is that (a) the input could be
arbitrarily long and complex to parse (not that you can't imagine
a database populated with lots of short little XML snippets, but
at the same time, a query could quite plausibly deal in yooge ones),
and (b), step (1) could fail at the last byte of the input, followed
by total reparsing as (2).

I think the safer structure is clearly that of the current patch,
modulo whether the "has a DOCTYPE" test is done by libxml itself
(with the assumptions you don't like) or by a pre-scan.

So the current structure is:

restart:
asked for document?
parse as document, or fail
else asked for content:
parse as content
failed?
because DOCTYPE? restart as if document
else fail

and a pre-scan structure could be very similar:

restart:
asked for document?
parse as document, or fail
else asked for content:
pre-scan finds DOCTYPE?
restart as if document
else parse as content, or fail

The pre-scan is a simple linear search and will ordinarily say yes or no
within a couple dozen characters--you could *have* an input with 20k of
leading whitespace and comments, but it's hardly the norm. Just trying to
parse as 'document' first could easily parse a large fraction of the input
before discovering it's followed by something that can't follow a document
element.

Regards,
-Chap

Ryan Lambert

ryan@rustprooflabs.com

almost 7 years ago

In reply to: Chapman Flack (#8)

Re: Fix XML handling with DOCTYPE

Thank you both! I had glanced at that item in the commitfest but didn't
notice it would fix this issue.
I'll try to test/review this before the end of the month, much better than
starting from scratch myself. A quick glance at the patch looks logical
and looks like it should work for my use case.

Thanks,

Ryan Lambert

On Sat, Mar 16, 2019 at 4:33 PM Chapman Flack <chap@anastigmatix.net> wrote:

Show quoted text

On 03/16/19 17:21, Tom Lane wrote:

Chapman Flack <chap@anastigmatix.net> writes:

On 03/16/19 16:55, Tom Lane wrote:

What do you think of the idea I just posted about parsing off the

DOCTYPE

thing for ourselves, and not letting libxml see it?

The principled way of doing that would be to pre-parse to find a

DOCTYPE,

and if there is one, leave it there and parse the input as we do for
'document'. Per XML, if there is a DOCTYPE, the document must satisfy
the 'document' syntax requirements, and per SQL/XML:2006-and-later,
'content' is a proper superset of 'document', so if we were asked for
'content' and can successfully parse it as 'document', we're good,
and if we see a DOCTYPE and yet it incurs a parse error as 'document',
well, that's what needed to happen.

Hm, so, maybe just

(1) always try to parse as document. If successful, we're done.

(2) otherwise, if allowed by xmloption, try to parse using our
current logic for the CONTENT case.

What I don't like about that is that (a) the input could be
arbitrarily long and complex to parse (not that you can't imagine
a database populated with lots of short little XML snippets, but
at the same time, a query could quite plausibly deal in yooge ones),
and (b), step (1) could fail at the last byte of the input, followed
by total reparsing as (2).

I think the safer structure is clearly that of the current patch,
modulo whether the "has a DOCTYPE" test is done by libxml itself
(with the assumptions you don't like) or by a pre-scan.

So the current structure is:

restart:
asked for document?
parse as document, or fail
else asked for content:
parse as content
failed?
because DOCTYPE? restart as if document
else fail

and a pre-scan structure could be very similar:

restart:
asked for document?
parse as document, or fail
else asked for content:
pre-scan finds DOCTYPE?
restart as if document
else parse as content, or fail

The pre-scan is a simple linear search and will ordinarily say yes or no
within a couple dozen characters--you could *have* an input with 20k of
leading whitespace and comments, but it's hardly the norm. Just trying to
parse as 'document' first could easily parse a large fraction of the input
before discovering it's followed by something that can't follow a document
element.

Regards,
-Chap

#10

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Chapman Flack (#8)

Re: Fix XML handling with DOCTYPE

On 03/16/19 18:33, Chapman Flack wrote:

The pre-scan is a simple linear search and will ordinarily say yes or no
within a couple dozen characters--you could *have* an input with 20k of
leading whitespace and comments, but it's hardly the norm. Just trying to

If the available regexp functions want to start by munging the entire input
into a pg_wchar array, then it may be better to implement the pre-scan as
open code, the same way parse_xml_decl() is already implemented.

Given that parse_xml_decl() already covers the first optional thing that
can precede the doctype, the remaining scan routine would only need to
recognize comments, PIs, and whitespace. That would be pretty straightforward.

Regards,
-Chap

#11

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#8)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

On 03/16/19 17:21, Tom Lane wrote:

Hm, so, maybe just

(1) always try to parse as document. If successful, we're done.

(2) otherwise, if allowed by xmloption, try to parse using our
current logic for the CONTENT case.

What I don't like about that is that (a) the input could be
arbitrarily long and complex to parse (not that you can't imagine
a database populated with lots of short little XML snippets, but
at the same time, a query could quite plausibly deal in yooge ones),
and (b), step (1) could fail at the last byte of the input, followed
by total reparsing as (2).

That doesn't seem particularly likely to me: based on what's been
said here, I'd expect parsing with the wrong expectation to usually
fail near the start of the input. In any case, the other patch
also requires repeat parsing, no? It's just doing that in a different
set of cases.

The reason I'm pressing you for a simpler patch is that dump/reload
failures are pretty bad, so ideally we'd find a fix that we're
comfortable with back-patching into the released branches.
Personally I would never dare to back-patch the proposed patch:
it's too complex, so it's not real clear that it doesn't have unwanted
side-effects, and it's not at all certain that there aren't libxml
version dependencies in it. (Maybe another committer with more
familiarity with libxml would evaluate the risks differently, but
I doubt it.) But I think that something close to what I sketched
above would pass muster as safe-to-backpatch.

regards, tom lane

#12

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Tom Lane (#11)

Re: Fix XML handling with DOCTYPE

On 03/17/19 11:45, Tom Lane wrote:

Chapman Flack <chap@anastigmatix.net> writes:

On 03/16/19 17:21, Tom Lane wrote:

(1) always try to parse as document. If successful, we're done.
(2) otherwise, if allowed by xmloption, try to parse using our

What I don't like about that is that (a) the input could be
arbitrarily long and complex to parse (not that you can't imagine
a database populated with lots of short little XML snippets, but
at the same time, a query could quite plausibly deal in yooge ones),
and (b), step (1) could fail at the last byte of the input, followed
by total reparsing as (2).

That doesn't seem particularly likely to me: based on what's been
said here, I'd expect parsing with the wrong expectation to usually
fail near the start of the input. In any case, the other patch
also requires repeat parsing, no? It's just doing that in a different
set of cases.

I'll do up a version with the open-coded prescan I proposed last night.

Whether parsing with the wrong expectation is likely to fail near the
start of the input depends on both the input and the expectation. If
your expectation is DOCUMENT and the input is CONTENT, it's possible
for the determining difference to be something that follows the first
element, and a first element can be (and often is) nearly all of the input.

What I was doing in the patch is the reverse: parsing with the expectation
of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
element to precede a DTD, so that approach can be expected to fail fast
if the other branch needs to be taken.

But a quick pre-scan for the same thing would have the same property,
without the libxml dependencies that bother you here. Watch this space.

Regards,
-Chap

#13

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#12)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

What I was doing in the patch is the reverse: parsing with the expectation
of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
element to precede a DTD, so that approach can be expected to fail fast
if the other branch needs to be taken.

Ah, right. I don't have any problem with trying the CONTENT approach
before the DOCUMENT approach rather than vice-versa. What I was concerned
about was adding a lot of assumptions about exactly how libxml would
report the failure. IMO a maximally-safe patch would just rearrange
things we're already doing without adding new things.

But a quick pre-scan for the same thing would have the same property,
without the libxml dependencies that bother you here. Watch this space.

Do we need a pre-scan at all?

regards, tom lane

#14

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Tom Lane (#13)

Re: Fix XML handling with DOCTYPE

On 03/17/19 13:16, Tom Lane wrote:

Chapman Flack <chap@anastigmatix.net> writes:

What I was doing in the patch is the reverse: parsing with the expectation
of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
element to precede a DTD, so that approach can be expected to fail fast
if the other branch needs to be taken.

Ah, right. I don't have any problem with trying the CONTENT approach
before the DOCUMENT approach rather than vice-versa. What I was concerned
about was adding a lot of assumptions about exactly how libxml would
report the failure. IMO a maximally-safe patch would just rearrange
things we're already doing without adding new things.

But a quick pre-scan for the same thing would have the same property,
without the libxml dependencies that bother you here. Watch this space.

Do we need a pre-scan at all?

Without it, we double the time to a failure result in every case that
should actually fail, as well as in this one corner case that we want to
see succeed, and the question you posed earlier about which error message
to return becomes thornier.

If the query asked for CONTENT, any error result should be one you could get
when parsing as CONTENT. If we switch and try parsing as DOCUMENT _because
the input is claiming to have the form of a DOCUMENT_, then it's defensible
to return errors explaining why it's not a DOCUMENT ... but not in the
general case of just throwing DOCUMENT at it any time CONTENT parse fails.

Regards,
-Chap

#15

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#14)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

On 03/17/19 13:16, Tom Lane wrote:

Do we need a pre-scan at all?

Without it, we double the time to a failure result in every case that
should actually fail, as well as in this one corner case that we want to
see succeed, and the question you posed earlier about which error message
to return becomes thornier.

I have absolutely zero concern about whether it takes twice as long to
detect bad input; nobody should be sending bad input if they're concerned
about performance. (The costs of the ensuing transaction abort would
likely dwarf xml_in's runtime in any case.) Besides, with what we're
talking about doing here,

(1) the extra runtime is consumed only in cases that would fail up to now,
so nobody can complain about a performance regression;
(2) doing a pre-scan *would* be a performance regression for cases that
work today; not a large one we hope, but still...

The error message issue is indeed a concern, but I don't see why it's
complicated if you agree that

If the query asked for CONTENT, any error result should be one you could get
when parsing as CONTENT.

That just requires us to save the first error message and be sure to issue
that one not the DOCUMENT one, no? That's what we'd want to do from a
backwards-compatibility standpoint anyhow, since that's the error message
wording you'd get with today's code.

regards, tom lane

#16

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Tom Lane (#15)

2 attachment(s)

Re: Fix XML handling with DOCTYPE

On 03/17/19 15:06, Tom Lane wrote:

The error message issue is indeed a concern, but I don't see why it's
complicated if you agree that

If the query asked for CONTENT, any error result should be one you could get
when parsing as CONTENT.

That just requires us to save the first error message and be sure to issue
that one not the DOCUMENT one, no?

I confess I haven't looked hard yet at how to do that. The way errors come
out of libxml is more involved than "here's a message", so there's a choice
of (a) try to copy off that struct in a way that's sure to survive
re-executing the parser, and then use the copy, or (b) generate a message
right away from the structured information and save that, and I guess b
might not be too bad; a might not be too bad, or it might, and slide right
back into the kind of libxml-behavior-assumptions you're wanting to avoid.

Meanwhile, here is a patch on the lines I proposed earlier, with a
pre-check. Any performance hit that it could entail (which I'd really
expect to be de minimis, though I haven't benchmarked) ought to be
compensated by the strlen I changed to strnlen in parse_xml_decl (as
there's really no need to run off and count the whole rest of the input
just to know if 1, 2, 3, or 4 bytes are available to decode a UTF-8 char).

... and, yes, I know that could be an independent patch, and then the
performance effect here should be measured from there. But it was near
what I was doing anyway, so I included it here.

Attaching both still-outstanding patches (this one and docfix) so the
CF app doesn't lose one.

Regards,
-Chap

Attachments:

xml-functions-type-docfix-3.patchtext/x-patch; name=xml-functions-type-docfix-3.patchDownload

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b462c06..94b46a6 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4282,16 +4282,21 @@ SET XML OPTION { DOCUMENT | CONTENT };
 SET xmloption TO { DOCUMENT | CONTENT };
 </synopsis>
     The default is <literal>CONTENT</literal>, so all forms of XML
-    data are allowed.
+    data are allowed except as noted below.
    </para>
 
    <note>
     <para>
-     With the default XML option setting, you cannot directly cast
-     character strings to type <type>xml</type> if they contain a
-     document type declaration, because the definition of XML content
-     fragment does not accept them.  If you need to do that, either
-     use <literal>XMLPARSE</literal> or change the XML option.
+     In the SQL:2006 and later standard, the <literal>CONTENT</literal> form
+     is a proper superset of the <literal>DOCUMENT</literal> form, and so the
+     default XML option setting would allow casting to XML from character
+     strings in either form. <productname>PostgreSQL</productname>, however,
+     uses the SQL:2003 definition in which <literal>CONTENT</literal> form
+     cannot contain a document type declaration. Therefore, there is no one
+     setting of the XML option that will allow casting to XML from every valid
+     character string. The default will work almost always, but for a document
+     with a DTD, it will be necessary to change the XML option or
+     use <literal>XMLPARSE</literal> specifying <literal>DOCUMENT</literal>.
     </para>
    </note>
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 03859a7..0017aab 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10140,8 +10140,13 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 
 
  <sect1 id="functions-xml">
+
   <title>XML Functions</title>
 
+  <indexterm>
+   <primary>XML Functions</primary>
+  </indexterm>
+
   <para>
    The functions and function-like expressions described in this
    section operate on values of type <type>xml</type>.  Check <xref
@@ -10344,8 +10349,8 @@ SELECT xmlelement(name foo, xmlattributes('xyz' as bar),
      encoding, depending on the setting of the configuration parameter
      <xref linkend="guc-xmlbinary"/>.  The particular behavior for
      individual data types is expected to evolve in order to align the
-     SQL and PostgreSQL data types with the XML Schema specification,
-     at which point a more precise description will appear.
+     PostgreSQL mappings with those specified in SQL:2006 and later,
+     as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
    </sect3>
 
@@ -10587,10 +10592,12 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 </synopsis>
 
     <para>
-     The function <function>xmlexists</function> returns true if the
-     XPath expression in the first argument returns any nodes, and
-     false otherwise.  (If either argument is null, the result is
-     null.)
+     The function <function>xmlexists</function> evaluates an XPath 1.0
+     expression (the first argument), with the passed value as its context item.
+     The function returns false if the result of that evaluation yields an
+     empty nodeset, true if it yields any other value.  (The function returns
+     null if an argument is null.) A nonnull value passed as the context item
+     must be an XML document, not a content fragment or any non-XML value.
     </para>
 
     <para>
@@ -10607,24 +10614,12 @@ SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY VALUE '<towns><town>T
 
     <para>
      The <literal>BY REF</literal> or <literal>BY VALUE</literal> clauses
-     have no effect in <productname>PostgreSQL</productname>, but are allowed
-     for compatibility with other implementations.  Per the <acronym>SQL</acronym>
-     standard, the one that precedes any argument is required, and indicates
-     the default for arguments that follow, and one may follow any argument to
-     override the default.
-     <productname>PostgreSQL</productname> ignores <literal>BY REF</literal>
-     and passes by value always.
-    </para>
-
-    <para>
-     In the <acronym>SQL</acronym> standard, an <function>xmlexists</function>
-     construct evaluates an expression in the XQuery language, allows passing
-     values for named parameters in the expression as well as for the context
-     item, and does not require the passed values to be documents, or even of
-     XML type.
-     In <productname>PostgreSQL</productname>, this construct currently only
-     evaluates an XPath 1.0 expression, and allows passing only one value,
-     which must be an XML document, to be the context item.
+     are accepted in <productname>PostgreSQL</productname>, but ignored, as
+     discussed in <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, an <function>xmlexists</function> construct first
+     appears in SQL:2006 and evaluates an expression in the XML Query language,
+     but this implementation allows only an XPath 1.0 expression, as discussed
+     in <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
    </sect3>
 
@@ -10730,12 +10725,12 @@ SELECT xml_is_well_formed_document('<pg:foo xmlns:pg="http://postgresql.org/stuf
 </synopsis>
 
     <para>
-     The function <function>xpath</function> evaluates the XPath
+     The function <function>xpath</function> evaluates the XPath 1.0
      expression <replaceable>xpath</replaceable> (a <type>text</type> value)
      against the XML value
      <replaceable>xml</replaceable>.  It returns an array of XML values
-     corresponding to the node set produced by the XPath expression.
-     If the XPath expression returns a scalar value rather than a node set,
+     corresponding to the nodeset produced by the XPath expression.
+     If the XPath expression returns a scalar value rather than a nodeset,
      a single-element array is returned.
     </para>
 
@@ -10797,9 +10792,10 @@ SELECT xpath('//mydefns:b/text()', '<a xmlns="http://example.com"><b>test</b></a
     <para>
      The function <function>xpath_exists</function> is a specialized form
      of the <function>xpath</function> function.  Instead of returning the
-     individual XML values that satisfy the XPath, this function returns a
-     Boolean indicating whether the query was satisfied or not.  This
-     function is equivalent to the standard <literal>XMLEXISTS</literal> predicate,
+     individual XML values that satisfy the XPath 1.0 expression, this function
+     returns a Boolean indicating whether the query was satisfied or not
+     (specifically, whether it produced any value other than an empty nodeset).
+     This function is equivalent to the <literal>XMLEXISTS</literal> predicate,
      except that it also offers support for a namespace mapping argument.
     </para>
 
@@ -10840,8 +10836,8 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
 
     <para>
      The <function>xmltable</function> function produces a table based
-     on the given XML value, an XPath filter to extract rows, and an
-     optional set of column definitions.
+     on the given XML value, an XPath filter to extract rows, and a
+     set of column definitions.
     </para>
 
     <para>
@@ -10852,30 +10848,33 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     The required <replaceable>row_expression</replaceable> argument is an XPath
-     expression that is evaluated against the supplied XML document to
-     obtain an ordered sequence of XML nodes. This sequence is what
-     <function>xmltable</function> transforms into output rows.
+     The required <replaceable>row_expression</replaceable> argument is
+     an XPath 1.0 expression that is evaluated, passing the
+     <replaceable>document_expression</replaceable> as its context item, to
+     obtain a set of XML nodes. These nodes are what
+     <function>xmltable</function> transforms into output rows. (No rows
+     will be produced if the <replaceable>document_expression</replaceable>
+     is null, or the <replaceable>row_expression</replaceable> produces an
+     empty nodeset, or any value other than a nodeset.)
     </para>
 
     <para>
-     <replaceable>document_expression</replaceable> provides the XML document to
-     operate on.
-     The argument must be a well-formed XML document; fragments/forests
-     are not accepted.
+     <replaceable>document_expression</replaceable> provides the context item
+     for the <replaceable>row_expression</replaceable>. It must be a well-formed
+     XML document; fragments/forests are not accepted.
      The <literal>BY REF</literal> or <literal>BY VALUE</literal> clauses are
-     accepted, as described for the <function>xmlexists</function> predicate,
-     but ignored; PostgreSQL currently passes XML by value always.
+     accepted but ignored, as discussed in
+     <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, an <function>xmltable</function> construct first
+     appears in SQL:2006 and evaluates expressions in the XML Query language,
+     but this implementation allows only XPath 1.0 expressions, as discussed
+     in <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
 
     <para>
      The mandatory <literal>COLUMNS</literal> clause specifies the list
      of columns in the output table.
-     If the <literal>COLUMNS</literal> clause is omitted, the rows in the result
-     set contain a single column of type <literal>xml</literal> containing the
-     data matched by <replaceable>row_expression</replaceable>.
-     If <literal>COLUMNS</literal> is specified, each entry describes a
-     single column.
+     Each entry describes a single column.
      See the syntax summary above for the format.
      The column name and type are required; the path, default and
      nullability clauses are optional.
@@ -10889,42 +10888,78 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     The <literal>column_expression</literal> for a column is an XPath expression
-     that is evaluated for each row, relative to the result of the
-     <replaceable>row_expression</replaceable>, to find the value of the column.
+     The <literal>column_expression</literal> for a column is an XPath 1.0
+     expression
+     that is evaluated for each row, with the current node from the
+     <replaceable>row_expression</replaceable> result as its context item,
+     to find the value of the column.
      If no <literal>column_expression</literal> is given, then the column name
      is used as an implicit path.
     </para>
 
     <para>
-     If a column's XPath expression returns multiple elements, an error
-     is raised.
-     If the expression matches an empty tag, the result is an
-     empty string (not <literal>NULL</literal>).
-     Any <literal>xsi:nil</literal> attributes are ignored.
+     If a column's XPath expression returns a non-XML value (limited to
+     string, boolean, or double in XPath 1.0) and the column has a
+     PostgreSQL type other than <type>xml</type>, the column will be set
+     as if by assigning the value's string representation to the PostgreSQL
+     type (adjusting the "string representation" of a boolean to
+     <literal>1</literal> or <literal>0</literal> if the target column type
+     category is numeric, otherwise <literal>true</literal> or
+     <literal>false</literal>).
+    </para>
+
+    <para>
+     If the column's expression returns a non-empty set of XML nodes
+     and the target column's type is <type>xml</type>, the column will
+     be assigned the expression result exactly, if it is of document or
+     content form.
+     <footnote>
+      <para>
+       A result containing more than one element node at the top level, or
+       non-whitespace text outside of an element, is an example of content form.
+       An XPath result can be of neither form, for example if it returns an
+       attribute node selected from the element that contains it. Such a result
+       will be put into content form with each such disallowed node replaced by
+       its string value, as defined for the XPath 1.0
+       <function>string</function> function.
+      </para>
+     </footnote>
+    </para>
+
+    <para>
+     A non-XML result assigned to an <type>xml</type> output column produces
+     content, a single text node with the string value of the result.
+     An XML result assigned to a column of any other type may not have more than
+     one node, or an error is raised. If there is exactly one node, the column
+     will be set as if by assigning the node's string
+     value (as defined for the XPath 1.0 <function>string</function> function)
+     to the PostgreSQL type.
     </para>
 
     <para>
-     The text body of the XML matched by the <replaceable>column_expression</replaceable>
-     is used as the column value. Multiple <literal>text()</literal> nodes
-     within an element are concatenated in order. Any child elements,
-     processing instructions, and comments are ignored, but the text contents
-     of child elements are concatenated to the result.
+     The string value of an XML element is the concatenation, in document order,
+     of all text nodes contained in that element and its descendants. The string
+     value of an element with no descendant text nodes is an
+     empty string (not <literal>NULL</literal>).
+     Any <literal>xsi:nil</literal> attributes are ignored.
      Note that the whitespace-only <literal>text()</literal> node between two non-text
      elements is preserved, and that leading whitespace on a <literal>text()</literal>
      node is not flattened.
+     The XPath 1.0 <function>string</function> function may be consulted for the
+     rules defining the string value of other XML node types and non-XML values.
+    </para>
+
+    <para>
+     The conversion rules presented here are not exactly those of the SQL
+     standard, as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
 
     <para>
-     If the path expression does not match for a given row but
-     <replaceable>default_expression</replaceable> is specified, the value resulting
-     from evaluating that expression is used.
-     If no <literal>DEFAULT</literal> clause is given for the column,
-     the field will be set to <literal>NULL</literal>.
-     It is possible for a <replaceable>default_expression</replaceable> to reference
-     the value of output columns that appear prior to it in the column list,
-     so the default of one column may be based on the value of another
-     column.
+     If the path expression returns an empty nodeset
+     (typically, when it does not match)
+     for a given row, the column will be set to <literal>NULL</literal>, unless
+     a <replaceable>default_expression</replaceable> is specified; then the
+     value resulting from evaluating that expression is used.
     </para>
 
     <para>
@@ -10936,20 +10971,14 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     Unlike regular PostgreSQL functions, <replaceable>column_expression</replaceable>
-     and <replaceable>default_expression</replaceable> are not evaluated to a simple
-     value before calling the function.
-     <replaceable>column_expression</replaceable> is normally evaluated
-     exactly once per input row, and <replaceable>default_expression</replaceable>
-     is evaluated each time a default is needed for a field.
-     If the expression qualifies as stable or immutable the repeat
+     A <replaceable>default_expression</replaceable>, rather than being
+     evaluated immediately when <function>xmltable</function> is called,
+     is evaluated each time a default is needed for the column.
+     If the expression qualifies as stable or immutable, the repeat
      evaluation may be skipped.
-     Effectively <function>xmltable</function> behaves more like a subquery than a
-     function call.
      This means that you can usefully use volatile functions like
-     <function>nextval</function> in <replaceable>default_expression</replaceable>, and
-     <replaceable>column_expression</replaceable> may depend on other parts of the
-     XML document.
+     <function>nextval</function> in
+     <replaceable>default_expression</replaceable>.
     </para>
 
     <para>
@@ -11297,6 +11326,293 @@ table2-mapping
 ]]></programlisting>
    </figure>
   </sect2>
+
+  <sect2 id="functions-xml-limits-compatibility">
+   <title>Limits and Compatibility</title>
+
+   <indexterm>
+    <primary>XML Functions</primary>
+    <secondary>limits and compatibility</secondary>
+   </indexterm>
+
+   <sect3 id="functions-xml-limits-xpath1">
+    <title>Queries restricted to XPath 1.0</title>
+    
+    <para>
+     The <productname>PostgreSQL</productname>-specific functions
+     <function>xpath</function> and <function>xpath_exists</function> query
+     XML documents using the XPath language, and
+     <productname>PostgreSQL</productname> also provides XPath-only variants of
+     the standard functions <function>XMLEXISTS</function> and
+     <function>XMLTABLE</function>, which are defined in the SQL standard to use
+     the XQuery language. For all of these functions,
+     <productname>PostgreSQL</productname> relies on the
+     <productname>libxml</productname> library, which provides only XPath 1.0.
+    </para>
+
+    <para>
+     There is a strong connection between the XQuery language and XPath versions
+     2.0 and later: any expression that is syntactically valid and executes
+     successfully in both produces the same result (with a minor exception for
+     expressions containing numeric character references or predefined entity
+     references, which XQuery replaces with the corresponding character while
+     XPath leaves them alone). But there is no such connection between XPath 1.0
+     and XQuery or the later XPath versions; it was an earlier language and
+     differs in many respects.
+    </para>
+
+    <para>
+     There are two categories of limitation to keep in mind: the restriction
+     from XQuery to XPath for the functions specified in the SQL standard, and
+     the restriction of XPath to version 1.0 for both the standard and the
+     <productname>PostgreSQL</productname>-specific functions.
+    </para>
+    
+    <sect4>
+     <title>Restriction of XQuery to XPath</title>
+
+     <para>
+      Features of XQuery beyond those of XPath include:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         XQuery expressions can construct and return new XML nodes, in addition
+         to all possible XPath values. XPath can introduce and return values of
+         the atomic types (numbers, strings, and so on) but can only return XML
+         nodes already present in documents supplied as input to the expression.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery has control constructs for iteration, sorting, and grouping.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery allows the declaration and use of local functions.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      Recent XPath versions begin to offer overlapping capabilities
+      (such as the functional-style <function>for-each</function> and
+      <function>sort</function>, anonymous functions, and
+      <function>parse-xml</function> to create a node from a string),
+      but these were not available before XPath 3.0.
+     </para>
+    </sect4>
+
+    <sect4>
+     <title>Restriction of XPath to 1.0</title>
+
+     <para>
+      For developers familiar with XQuery and XPath 2.0 or later, or porting
+      queries from other systems, XPath 1.0 presents a number of differences to
+      contend with:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         The fundamental type of an XQuery/XPath expression, the
+         <type>sequence</type>, which can contain XML nodes, atomic values,
+         or both, does not exist in XPath 1.0. A 1.0 expression can only produce
+         a nodeset (possibly empty, or with one XML node or more), or a single
+         atomic value.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         Unlike an XQuery/XPath sequence, which can contain any desired items
+         in any desired order, an XPath 1.0 nodeset has no guaranteed order and,
+         like any set, can have no member appear more than once. (The
+         <productname>libxml</productname> library does seem to always return
+         nodesets to <productname>PostgreSQL</productname> with their members
+         in the same relative order they had in the input document; it does not
+         commit to this behavior, and an XPath 1.0 expression cannot control
+         it.)
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         While XQuery/XPath provides all of the types defined in XML Schema
+         and many operators and functions over those types, XPath 1.0 has only
+         nodesets and three atomic types, <type>boolean</type>,
+         <type>double</type>, and <type>string</type>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no conditional operator. An XQuery/XPath expression
+         such as <userinput>if ( hat ) then hat/@size else "no hat"</userinput>
+         has no XPath 1.0 equivalent.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no ordering comparison operator for strings. Both
+         <userinput>"cat" &lt; "dog"</userinput> and
+         <userinput>"cat" &gt; "dog"</userinput> are false, because each is a
+         numeric comparison of two <literal>NaN</literal>s. In contrast,
+         <literal>=</literal> and <literal>!=</literal> do compare the strings
+         as strings.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 blurs the distinction between
+         <firstterm>value comparisons</firstterm> and
+         <firstterm>general comparisons</firstterm> as XQuery/XPath define them.
+         Both <userinput>sale/@hatsize = 7</userinput> and
+         <userinput>sale/@customer = "alice"</userinput> are existentially
+         quantified comparisons, true if there is any sale with the given value
+         for the attribute, but <userinput>sale/@taxable = false()</userinput>
+         is a value comparison to the
+         <firstterm>effective boolean value</firstterm> of a whole nodeset,
+         and true only if no sale has a <literal>taxable</literal> attribute
+         at all.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         In the XQuery/XPath data model, a <firstterm>document node</firstterm>
+         can have either document form (exactly one top-level element, only
+         comments and processing instructions outside of it) or content form
+         (with those constraints relaxed). Its equivalent in XPath 1.0, the
+         <firstterm>root node</firstterm>, can only be in document form.
+         This is part of the reason an <type>xml</type> value passed as the
+         context item to any <productname>PostgreSQL</productname> XPath-based
+         function must be in document form.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      The differences highlighted here are not all of them. In XQuery and
+      the 2.0 and later versions of XPath, there is an XPath 1.0 compatibility
+      mode, and the W3C lists of function library
+      <ulink url='https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility'>changes</ulink>
+      and language
+      <ulink url='https://www.w3.org/TR/xpath20/#id-backwards-compatibility'>changes</ulink>
+      applied in that mode offer a more complete account of the
+      differences, if still not exhaustive; the compatibility mode cannot make
+      the later languages exactly equivalent to XPath 1.0.
+     </para>
+    </sect4>
+
+    <sect4 id="functions-xml-limits-casts">
+     <title>Mappings between SQL and XML data types and values</title>
+
+     <para>
+      In SQL:2006 and later, both directions of conversion between standard SQL
+      data types and the XML Schema types are specified precisely. However, the
+      rules are expressed using the types and semantics of XQuery/XPath, and
+      have no direct application to the different data model of XPath 1.0.
+     </para>
+
+     <para>
+      When <productname>PostgreSQL</productname> maps SQL data values to XML
+      (as in <function>xmlelement</function>), or XML to SQL (as in the output
+      columns of <function>xmltable</function>), except for the few cases
+      treated specially, <productname>PostgreSQL</productname> simply assumes
+      that the XML data type's XPath 1.0 string form will be valid as the
+      text-input form of the SQL datatype, and conversely. This rule has the
+      virtue of simplicity while producing, for many data types, results similar
+      to the mappings specified in the standard.
+     </para>
+
+     <para>
+      Where interoperability with other systems is a concern, for some data
+      types, it may be necessary to use available data type formatting functions
+      (such as those in <xref linkend="functions-formatting"/>) explicitly in
+      queries to produce the standard mappings.
+     </para>
+    </sect4>
+   </sect3>
+
+   <sect3 id="functions-xml-limits-postgresql">
+    <title>
+     Incidental limits of the implementation
+    </title>
+
+    <para>
+     This section concerns limits that are not inherent in the
+     <productname>libxml</productname> library, but apply to the current
+     implementation in <productname>PostgreSQL</productname>.
+    </para>
+
+    <sect4>
+     <title>Only <literal>BY VALUE</literal> passing mechanism supported</title>
+
+     <para>
+      The SQL standard defines two <firstterm>passing mechanisms</firstterm>
+      that apply when passing an XML argument from SQL to an XML function or
+      receiving a result: <literal>BY REF</literal>, in which a particular XML
+      value retains its node identity, and <literal>BY VALUE</literal>, in which
+      the content of the XML is passed but node identity is not preserved. A
+      mechanism can be specified before a list of parameters, as the default
+      mechanism for all of them, or after any parameter, to override the
+      default.
+     </para>
+
+     <para>
+      To illustrate the difference, if
+      <replaceable>x</replaceable> is an XML value, these two queries in
+      an SQL:2006 environment would produce true and false, respectively:
+
+      <screen><![CDATA[
+SELECT XMLQUERY('$a is $b' PASSING BY REF x AS a, x AS b NULL ON EMPTY);
+SELECT XMLQUERY('$a is $b' PASSING BY VALUE x AS a, x AS b NULL ON EMPTY);
+]]></screen>
+     </para>
+
+     <para>
+      <productname>PostgreSQL</productname> will accept either
+      <literal>BY VALUE</literal> or <literal>BY REF</literal> in an
+      <function>XMLEXISTS</function> or <function>XMLTABLE</function> construct,
+      but ignores them; the <type>xml</type> data type holds a character-string
+      serialized representation, so there is no node identity to preserve,
+      and passing is always <literal>BY VALUE</literal>.
+     </para>
+    </sect4>
+
+    <sect4>
+     <title>Cannot pass named parameters to queries</title>
+
+     <para>
+      The XPath-based functions support passing one parameter to serve as the
+      XPath expression's context item, but do not support passing additional
+      values to be available to the expression as named parameters.
+     </para>
+    </sect4>
+
+    <sect4>
+     <title>No <type>XML(SEQUENCE)</type> type</title>
+
+     <para>
+      The <productname>PostgreSQL</productname> <type>xml</type> can only hold
+      a value in <literal>DOCUMENT</literal> or <literal>CONTENT</literal> form.
+      An XQuery/XPath expression context item must be a single XML node
+      or atomic value, while XPath 1.0 further restricts it to only an XML node,
+      and has no node type allowing <literal>CONTENT</literal>. The upshot is
+      that a well-formed <literal>DOCUMENT</literal> is the only form of XML
+      value that <productname>PostgreSQL</productname> can supply as an XPath
+      context item.
+     </para>
+    </sect4>
+   </sect3>
+  </sect2>
  </sect1>
 
  <sect1 id="functions-json">
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index bade0fe..e947613 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -569,7 +569,7 @@ X056	Advanced table mapping: data mapping			YES
 X057	Advanced table mapping: metadata mapping			YES	
 X058	Advanced table mapping: base64 encoding of binary strings			YES	
 X059	Advanced table mapping: hex encoding of binary strings			YES	
-X060	XMLParse: character string input and CONTENT option			YES	
+X060	XMLParse: character string input and CONTENT option			YES	uses SQL:2003 definition, so a document with a DTD cannot be parsed as CONTENT
 X061	XMLParse: character string input and DOCUMENT option			YES	
 X065	XMLParse: BLOB input and CONTENT option			NO	
 X066	XMLParse: BLOB input and DOCUMENT option			NO	
@@ -593,7 +593,7 @@ X085	Predefined namespace prefixes			NO
 X086	XML namespace declarations in XMLTable			NO	
 X090	XML document predicate			YES	
 X091	XML content predicate			NO	
-X096	XMLExists			NO	XPath only
+X096	XMLExists			NO	XPath 1.0 only
 X100	Host language support for XML: CONTENT option			NO	
 X101	Host language support for XML: DOCUMENT option			NO	
 X110	Host language support for XML: VARCHAR mapping			NO	
@@ -661,11 +661,11 @@ X282	XMLValidate with CONTENT option			NO
 X283	XMLValidate with SEQUENCE option			NO	
 X284	XMLValidate: NAMESPACE without ELEMENT clause			NO	
 X286	XMLValidate: NO NAMESPACE with ELEMENT clause			NO	
-X300	XMLTable			NO	XPath only
+X300	XMLTable			NO	XPath 1.0 only
 X301	XMLTable: derived column list option			YES	
 X302	XMLTable: ordinality column option			YES	
 X303	XMLTable: column default option			YES	
-X304	XMLTable: passing a context item			YES	
+X304	XMLTable: passing a context item			YES	must be XML DOCUMENT
 X305	XMLTable: initializing an XQuery variable			NO	
 X400	Name and identifier mapping			YES	
 X410	Alter column data type: XML type			YES

xml-content-2006-3.patchtext/x-patch; name=xml-content-2006-3.patchDownload

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 94b46a6..752eadf 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4208,9 +4208,11 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
    <para>
     The <type>xml</type> type can store well-formed
     <quote>documents</quote>, as defined by the XML standard, as well
-    as <quote>content</quote> fragments, which are defined by the
-    production <literal>XMLDecl? content</literal> in the XML
-    standard.  Roughly, this means that content fragments can have
+    as <quote>content</quote> fragments, which are defined by reference to the
+    more permissive
+    <ulink url="https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode"><quote>document node</quote></ulink>
+    of the XQuery and XPath data model.
+    Roughly, this means that content fragments can have
     more than one top-level element or character node.  The expression
     <literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
     can be used to evaluate whether a particular <type>xml</type>
@@ -4282,24 +4284,9 @@ SET XML OPTION { DOCUMENT | CONTENT };
 SET xmloption TO { DOCUMENT | CONTENT };
 </synopsis>
     The default is <literal>CONTENT</literal>, so all forms of XML
-    data are allowed except as noted below.
+    data are allowed.
    </para>
 
-   <note>
-    <para>
-     In the SQL:2006 and later standard, the <literal>CONTENT</literal> form
-     is a proper superset of the <literal>DOCUMENT</literal> form, and so the
-     default XML option setting would allow casting to XML from character
-     strings in either form. <productname>PostgreSQL</productname>, however,
-     uses the SQL:2003 definition in which <literal>CONTENT</literal> form
-     cannot contain a document type declaration. Therefore, there is no one
-     setting of the XML option that will allow casting to XML from every valid
-     character string. The default will work almost always, but for a document
-     with a DTD, it will be necessary to change the XML option or
-     use <literal>XMLPARSE</literal> specifying <literal>DOCUMENT</literal>.
-    </para>
-   </note>
-
    </sect2>
 
    <sect2>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index e947613..32908c1 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -569,7 +569,7 @@ X056	Advanced table mapping: data mapping			YES
 X057	Advanced table mapping: metadata mapping			YES	
 X058	Advanced table mapping: base64 encoding of binary strings			YES	
 X059	Advanced table mapping: hex encoding of binary strings			YES	
-X060	XMLParse: character string input and CONTENT option			YES	uses SQL:2003 definition, so a document with a DTD cannot be parsed as CONTENT
+X060	XMLParse: character string input and CONTENT option			YES	
 X061	XMLParse: character string input and DOCUMENT option			YES	
 X065	XMLParse: BLOB input and CONTENT option			NO	
 X066	XMLParse: BLOB input and DOCUMENT option			NO	
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 28b3eaa..d326140 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -141,6 +141,7 @@ static int parse_xml_decl(const xmlChar *str, size_t *lenp,
 			   xmlChar **version, xmlChar **encoding, int *standalone);
 static bool print_xml_decl(StringInfo buf, const xmlChar *version,
 			   pg_enc encoding, int standalone);
+static bool xml_doctype_in_content(const xmlChar *str);
 static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
 		  bool preserve_whitespace, int encoding);
 static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt);
@@ -1243,8 +1244,19 @@ parse_xml_decl(const xmlChar *str, size_t *lenp,
 	if (xmlStrncmp(p, (xmlChar *) "<?xml", 5) != 0)
 		goto finished;
 
-	/* if next char is name char, it's a PI like <?xml-stylesheet ...?> */
-	utf8len = strlen((const char *) (p + 5));
+	/*
+	 * If next char is name char, it's a PI like <?xml-stylesheet ...?> rather
+	 * than an XMLDecl, so we have done what we came to do and found no XMLDecl.
+	 *
+	 * Use strnlen here because there is no need to go off counting all the rest
+	 * of the input just to find out if there are enough bytes to decode one
+	 * UTF-8 char. 6 is conservative: UTF-8 finally settled on 4 bytes as the
+	 * longest encoding, but could some input come from a source that (a) saw
+	 * the original drafts with up to 6 and (b) produces overlong encodings?
+	 * Hardly likely, but it costs nearly nothing to accommodate it, while using
+	 * 4 could be a change of behavior from when strlen was here.
+	 */
+	utf8len = strnlen((const char *) (p + 5), 6);
 	utf8char = xmlGetUTF8Char(p + 5, &utf8len);
 	if (PG_XMLISNAMECHAR(utf8char))
 		goto finished;
@@ -1415,6 +1427,133 @@ print_xml_decl(StringInfo buf, const xmlChar *version,
 		return false;
 }
 
+/*
+ * Test whether an input that is to be parsed as CONTENT contains a DTD.
+ *
+ * The SQL/XML:2003 definition of CONTENT ("XMLDecl? content") is not satisfied
+ * by a document with a DTD, which is a bit of a wart, as it means the CONTENT
+ * type is not a proper superset of DOCUMENT. SQL/XML:2006 and later fix that,
+ * by redefining content with reference to the "more permissive" Document Node
+ * of the XQuery/XPath Data Model, such that any DOCUMENT value is indeed also a
+ * CONTENT value. That definition is more useful, as CONTENT becomes usable for
+ * parsing input of unknown form (think pg_restore).
+ *
+ * As used below in parse_xml when parsing for CONTENT, libxml does not give us
+ * the 2006+ behavior, but only the 2003; it will choke if the input has a DTD.
+ * But without the full effort of bringing our whole implementation along to
+ * 2006+, we can provide the 2006+ definition of CONTENT easily enough, by
+ * detecting this case first and simply doing the parse as DOCUMENT.
+ *
+ * A DTD can be found arbitrarily far in, but that would be a contrived case; it
+ * will ordinarily start within a few dozen characters. We only need to see the
+ * start. The only things that can precede it are an XMLDecl (here, the caller
+ * will have called parse_xml_decl already), whitespace, comments, and
+ * processing instructions. This function need only return true if it sees a
+ * valid sequence of such things leading to <!DOCTYPE. It can simply return
+ * false in any other cases, including malformed input; that will mean the input
+ * gets parsed as CONTENT as originally planned, with libxml reporting any
+ * errors.
+ *
+ * By taking care to return false for malformed input, this ensures the user
+ * gets whatever error would be reported in a parse for CONTENT, as expected,
+ * rather than being surprised for errors about DOCUMENT form (except in the
+ * specific case we are here to recognize, where the input has a DTD and is
+ * therefore begging for DOCUMENT treatment).
+ *
+ * This is only to be called from xml_parse, when pg_xml_init has already
+ * been called. That the input is UTF-8 is heavily relied on.
+ */
+static bool
+xml_doctype_in_content(const xmlChar *str)
+{
+	const xmlChar *p = str;
+	const xmlChar *e;
+	const xmlChar *sp;
+	int			   utf8char;
+	int			   utf8len;
+
+	for ( ;; )
+	{
+		SKIP_XML_SPACE(p);
+
+		if ( '<' != *p )
+			return false;
+
+		++ p; /* point to char following < */
+
+		if ( '!' == *p )
+		{
+			++ p; /* point to char following <! */
+			if ( 0 == xmlStrncmp(p, (xmlChar *) "DOCTYPE", 7) )
+				return true;
+			if ( 0 != xmlStrncmp(p, (xmlChar *) "--", 2) )
+				return false;
+			/* consume remainder of a comment: find -- and a > must follow */
+			p = xmlStrstr(p + 2, (xmlChar *) "--");
+			if ( !p  ||  '>' != p[2] )
+				return false; /* didn't find -- or > didn't follow */
+			p += 3; /* point to char following --> */
+			continue;
+		}
+
+		if ( '?' != *p ) /* a PI <?target something?> is the only choice left */
+			return false;
+
+		++ p; /* point to char following <? */
+
+		/* The string ?> is forbidden within a PI; it can only mean the end. */
+		e = xmlStrstr(p, (xmlChar *) "?>");
+		if ( !e )
+			return false;
+
+		/*
+		 * The target is from p to the first blank (or end of PI if no blank).
+		 * The characters accepted by xmlIsBlank are all 7-bit; it will not be
+		 * confounded by any multibyte UTF-8 characters, though it will end up
+		 * seeing each byte individually.
+		 */
+		for ( sp = p; sp < e; ++ sp )
+			if ( xmlIsBlank_ch(*sp) )
+				break;
+
+		if ( sp == p )
+			return false; /* there should have been at least one non-blank */
+
+		/* The target must not be, case-insensitively, "xml". */
+		if ( sp - p == 3  &&  0 == xmlStrncasecmp(p, (xmlChar *) "xml", 3) )
+			return false;
+
+		utf8len = sp - p;
+		utf8char = xmlGetUTF8Char(p, &utf8len);
+		if ( ! PG_XMLISNAMECHAR(utf8char) )
+			return false;
+		/* certain chars are excluded as the FIRST name char */
+		if ( '-' == utf8char || '.' == utf8char || 0xB7 == utf8char ||
+			('0' <= utf8char && utf8char <= '9') ||
+			( 0x300 <= utf8char && utf8char <= 0x36F ) ||
+			0x203F == utf8char || 0x2040 == utf8char )
+			return false;
+
+		p += utf8len;
+
+		while ( p < sp ) /* the rest can be anything PG_XMLISNAMECHAR accepts */
+		{
+			utf8len = sp - p;
+			utf8char = xmlGetUTF8Char(p, &utf8len);
+			if ( ! PG_XMLISNAMECHAR(utf8char) )
+				return false;
+			p += utf8len;
+		}
+
+		/*
+		 * The target checks out as a NAME. There are no constraints on the rest
+		 * of the content in a PI (besides that it can't be ?> relied on above),
+		 * so it's good. Point at the char following the PI's ending ?>
+		 */
+		p = e + 2;
+	}
+}
+
 
 /*
  * Convert a C string to XML internal representation
@@ -1433,6 +1572,9 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
 	xmlChar    *string;
 	xmlChar    *utf8string;
 	PgXmlErrorContext *xmlerrcxt;
+	int			errcode_document_parse = ERRCODE_INVALID_XML_DOCUMENT;
+	char const *errmsg_document_parse = "invalid XML document";
+	char const *errmsg_content_parse = "invalid XML content";
 	volatile xmlParserCtxtPtr ctxt = NULL;
 	volatile xmlDocPtr doc = NULL;
 
@@ -1457,6 +1599,7 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
 			xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 						"could not allocate parser context");
 
+try_other_xmloption:
 		if (xmloption_arg == XMLOPTION_DOCUMENT)
 		{
 			/*
@@ -1472,8 +1615,8 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
 								 XML_PARSE_NOENT | XML_PARSE_DTDATTR
 								 | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS));
 			if (doc == NULL || xmlerrcxt->err_occurred)
-				xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
-							"invalid XML document");
+				xml_ereport(xmlerrcxt, ERROR, errcode_document_parse,
+							errmsg_document_parse);
 		}
 		else
 		{
@@ -1489,6 +1632,21 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
 									"invalid XML content: invalid XML declaration",
 									res_code);
 
+			/*
+			 * The way we use libxml below to parse CONTENT will not accept the
+			 * case (not allowed in SQL/XML:2003 but allowed in 2006+) where the
+			 * content is, in fact, a document with a DTD. By detecting that
+			 * case here and parsing as DOCUMENT instead, we provide the 2006+
+			 * behavior.
+			 */
+			if (xml_doctype_in_content(utf8string + count))
+			{
+				xmloption_arg = XMLOPTION_DOCUMENT;
+				errcode_document_parse = ERRCODE_INVALID_XML_CONTENT;
+				errmsg_document_parse = errmsg_content_parse;
+				goto try_other_xmloption;
+			}
+
 			doc = xmlNewDoc(version);
 			Assert(doc->encoding == NULL);
 			doc->encoding = xmlStrdup((const xmlChar *) "UTF-8");
@@ -1501,7 +1659,7 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
 													   utf8string + count, NULL);
 				if (res_code != 0 || xmlerrcxt->err_occurred)
 					xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_CONTENT,
-								"invalid XML content");
+								errmsg_content_parse);
 			}
 		}
 	}
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 2085fa0..0aae600 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -532,6 +532,13 @@ LINE 1: EXECUTE foo ('bad');
 DETAIL:  line 1: Start tag expected, '<' not found
 bad
 ^
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR:  invalid XML document
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+                   ^
+DETAIL:  line 1: Extra content at the end of the document
+<!DOCTYPE a><a/><b/>
+                ^
 SET XML OPTION CONTENT;
 EXECUTE foo ('<bar/>');
   xmlconcat   
@@ -545,6 +552,45 @@ EXECUTE foo ('good');
  <foo/>good
 (1 row)
 
+SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>';
+                                xml                                 
+--------------------------------------------------------------------
+ <!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/>';
+             xml              
+------------------------------
+  <!-- hi--> <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<!DOCTYPE a><a/>';
+       xml        
+------------------
+ <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+ERROR:  invalid XML content
+LINE 1: SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+                   ^
+DETAIL:  line 1: StartTag: invalid element name
+<!-- hi--> oops <!DOCTYPE a><a/>
+                 ^
+SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+ERROR:  invalid XML content
+LINE 1: SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+                   ^
+DETAIL:  line 1: StartTag: invalid element name
+<!-- hi--> <oops/> <!DOCTYPE a><a/>
+                    ^
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR:  invalid XML content
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+                   ^
+DETAIL:  line 1: Extra content at the end of the document
+<!DOCTYPE a><a/><b/>
+                ^
 -- Test backwards parsing
 CREATE VIEW xmlview1 AS SELECT xmlcomment('test');
 CREATE VIEW xmlview2 AS SELECT xmlconcat('hello', 'you');
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 7f86696..d1a03b5 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -429,11 +429,53 @@ EXECUTE foo ('<bar/>');
 ERROR:  prepared statement "foo" does not exist
 EXECUTE foo ('bad');
 ERROR:  prepared statement "foo" does not exist
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR:  unsupported XML feature
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+                   ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
 SET XML OPTION CONTENT;
 EXECUTE foo ('<bar/>');
 ERROR:  prepared statement "foo" does not exist
 EXECUTE foo ('good');
 ERROR:  prepared statement "foo" does not exist
+SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>';
+ERROR:  unsupported XML feature
+LINE 1: SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?...
+                   ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/>';
+ERROR:  unsupported XML feature
+LINE 1: SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/...
+                   ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<!DOCTYPE a><a/>';
+ERROR:  unsupported XML feature
+LINE 1: SELECT xml '<!DOCTYPE a><a/>';
+                   ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+ERROR:  unsupported XML feature
+LINE 1: SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+                   ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+ERROR:  unsupported XML feature
+LINE 1: SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+                   ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR:  unsupported XML feature
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+                   ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
 -- Test backwards parsing
 CREATE VIEW xmlview1 AS SELECT xmlcomment('test');
 CREATE VIEW xmlview2 AS SELECT xmlconcat('hello', 'you');
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 510b09b..9756b6e 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -512,6 +512,13 @@ LINE 1: EXECUTE foo ('bad');
 DETAIL:  line 1: Start tag expected, '<' not found
 bad
 ^
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR:  invalid XML document
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+                   ^
+DETAIL:  line 1: Extra content at the end of the document
+<!DOCTYPE a><a/><b/>
+                ^
 SET XML OPTION CONTENT;
 EXECUTE foo ('<bar/>');
   xmlconcat   
@@ -525,6 +532,45 @@ EXECUTE foo ('good');
  <foo/>good
 (1 row)
 
+SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>';
+                                xml                                 
+--------------------------------------------------------------------
+ <!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/>';
+             xml              
+------------------------------
+  <!-- hi--> <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<!DOCTYPE a><a/>';
+       xml        
+------------------
+ <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+ERROR:  invalid XML content
+LINE 1: SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+                   ^
+DETAIL:  line 1: StartTag: invalid element name
+<!-- hi--> oops <!DOCTYPE a><a/>
+                 ^
+SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+ERROR:  invalid XML content
+LINE 1: SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+                   ^
+DETAIL:  line 1: StartTag: invalid element name
+<!-- hi--> <oops/> <!DOCTYPE a><a/>
+                    ^
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR:  invalid XML content
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+                   ^
+DETAIL:  line 1: Extra content at the end of the document
+<!DOCTYPE a><a/><b/>
+                ^
 -- Test backwards parsing
 CREATE VIEW xmlview1 AS SELECT xmlcomment('test');
 CREATE VIEW xmlview2 AS SELECT xmlconcat('hello', 'you');
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 8057a46..71431d8 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -149,10 +149,17 @@ PREPARE foo (xml) AS SELECT xmlconcat('<foo/>', $1);
 SET XML OPTION DOCUMENT;
 EXECUTE foo ('<bar/>');
 EXECUTE foo ('bad');
+SELECT xml '<!DOCTYPE a><a/><b/>';
 
 SET XML OPTION CONTENT;
 EXECUTE foo ('<bar/>');
 EXECUTE foo ('good');
+SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>';
+SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/>';
+SELECT xml '<!DOCTYPE a><a/>';
+SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+SELECT xml '<!DOCTYPE a><a/><b/>';
 
 
 -- Test backwards parsing

#17

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Chapman Flack (#16)

Re: Fix XML handling with DOCTYPE

There might be too many different email threads on this with patches,
but in case it went under the radar, xml-content-2006-3.patch appeared
in my previous message on this thread[1]/messages/by-id/5C8ECAA4.3090301@anastigmatix.net.

It is based on a simple pre-check of the prefix of the input, determining
which form of parse to apply. That may or may not be simpler than parse-
once-save-error-parse-again-report-first-error, but IMV it's a more direct
solution and clearer (the logic is clearly about "how do I determine the way
this input should be parsed?" which is the problem on the table, rather
than "how should I save and regurgitate this libxml error?" which turns the
problem on the table to a different one).

I decided, for a first point of reference, to wear the green eyeshade and
write a pre-check that exactly implements the applicable rules. That gives
a starting point for simplifications that are probably safe.

For example, a bunch of lines at the end have to do with verifying the
content inside of a processing-instruction, after finding where it ends.
We could reasonably decide that, for the purpose of skipping it, knowing
where it ends is enough, as libxml will parse it next and report any errors
anyway.

That would slightly violate my intention of sending input to (the parser
that wasn't asked for) /only/ when it's completely clear (from the prefix
we've seen) that that's where it should go. The relaxed version could do
that in completely-clear cases and cases with an invalid PI ahead of what
looks like a DTD. But you'd pretty much expect both parsers to produce
the same message for a bad PI anyway.

That made me just want to try it now, and--surprise!--the messages from
libxml are not the same. So maybe I would lean to keeping the green-eyeshade
rules in the test, if you can stomach them, but I would understand taking
them out.

Regards,
-Chap

[1]: /messages/by-id/5C8ECAA4.3090301@anastigmatix.net

#18

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#17)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

I decided, for a first point of reference, to wear the green eyeshade and
write a pre-check that exactly implements the applicable rules. That gives
a starting point for simplifications that are probably safe.
For example, a bunch of lines at the end have to do with verifying the
content inside of a processing-instruction, after finding where it ends.
We could reasonably decide that, for the purpose of skipping it, knowing
where it ends is enough, as libxml will parse it next and report any errors
anyway.

Yeah, I did not like that code too much, particularly not all the magic
Unicode-code-point numbers. I removed that, made some other changes to
bring the patch more in line with PG coding style, and pushed it.

That made me just want to try it now, and--surprise!--the messages from
libxml are not the same. So maybe I would lean to keeping the green-eyeshade
rules in the test, if you can stomach them, but I would understand taking
them out.

I doubt anyone will care too much about whether error messages for bad
XML input are exactly like what they were before; and even if someone
does, I doubt that these extra tests would be enough to ensure that
the messages don't change. You're not really validating that the input
is something that libxml would accept, unless its processing of XML PIs
is far stupider than I would expect it to be.

regards, tom lane

#19

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Tom Lane (#18)

Re: Fix XML handling with DOCTYPE

On 03/23/19 16:59, Tom Lane wrote:

Unicode-code-point numbers. I removed that, made some other changes to
bring the patch more in line with PG coding style, and pushed it.

Thanks! It looks good. I'm content with the extra PI checking being gone.

The magic Unicode-code-point numbers come straight from the XML standard;
I couldn't make that stuff up. :)

You're not really validating that the input

is something that libxml would accept, unless its processing of XML PIs
is far stupider than I would expect it to be.

Out of curiosity, what further processing would you expect libxml to do?

XML parsers are supposed to be transparent PI-preservers, except in the
rare case of seeing a PI that actually means something to the embedding
application, which isn't going to be the case for a database simply
implementing an XML data type.

The standard literally requires that the target must be a NAME, and
can't match [Xx][Mm][Ll], and if there's whitespace and anything after
that, there can't be an embedded ?> ... and that's it.

Regards,
-Chap

#20

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#19)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

On 03/23/19 16:59, Tom Lane wrote:

You're not really validating that the input
is something that libxml would accept, unless its processing of XML PIs
is far stupider than I would expect it to be.

Out of curiosity, what further processing would you expect libxml to do?

Hm, I'd have thought it'd try to parse the arguments to some extent,
but maybe not. Does everybody reimplement attribute parsing for
themselves when using PIs?

regards, tom lane

#21

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Tom Lane (#20)

Re: Fix XML handling with DOCTYPE

On 03/23/19 18:22, Tom Lane wrote:

Out of curiosity, what further processing would you expect libxml to do?

Hm, I'd have thought it'd try to parse the arguments to some extent,
but maybe not. Does everybody reimplement attribute parsing for
themselves when using PIs?

Yeah, the content of a PI (whatever's after the target name) is left
all to be defined by whatever XML-using application might care about
that PI.

It could have an attribute=value syntax inspired by XML elements, or
some other form entirely, but there'd just better not be any ?> in it.

Regards,
-Chap

#22

Ryan Lambert

ryan@rustprooflabs.com

almost 7 years ago

In reply to: Chapman Flack (#21)

Re: Fix XML handling with DOCTYPE

I am unable to get latest patches I found [1]/messages/by-id/5C8ECAA4.3090301@anastigmatix.net to apply cleanly to current
branches. It's possible I missed the latest patches so if I'm using the
wrong ones please let me know. I tried against master, 11.2 stable and the
11.2 tag with similar results. It's quite possible it's user error on my
end, I am new to this process but didn't have issues with the previous
patches when I tested those a couple weeks ago.

[1]: /messages/by-id/5C8ECAA4.3090301@anastigmatix.net

Ryan Lambert

On Sat, Mar 23, 2019 at 5:07 PM Chapman Flack <chap@anastigmatix.net> wrote:

Show quoted text

On 03/23/19 18:22, Tom Lane wrote:

Out of curiosity, what further processing would you expect libxml to do?

Hm, I'd have thought it'd try to parse the arguments to some extent,
but maybe not. Does everybody reimplement attribute parsing for
themselves when using PIs?

Yeah, the content of a PI (whatever's after the target name) is left
all to be defined by whatever XML-using application might care about
that PI.

It could have an attribute=value syntax inspired by XML elements, or
some other form entirely, but there'd just better not be any ?> in it.

Regards,
-Chap

#23

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Ryan Lambert (#22)

Re: Fix XML handling with DOCTYPE

On 03/24/19 21:04, Ryan Lambert wrote:

I am unable to get latest patches I found [1] to apply cleanly to current
branches. It's possible I missed the latest patches so if I'm using the
wrong ones please let me know. I tried against master, 11.2 stable and the
11.2 tag with similar results.

Tom pushed the content-with-DOCTYPE patch, it's now included in master,
REL_11_STABLE, REL_10_STABLE, REL9_6_STABLE, REL9_5_STABLE, and
REL9_4_STABLE.

The only patch that's left to be reviewed and applied is the documentation
fix, latest in [1]/messages/by-id/5C96DBB5.2080103@anastigmatix.net.

If you were interested in giving a review opinion on some XML documentation.

Regards,
-Chap

[1]: /messages/by-id/5C96DBB5.2080103@anastigmatix.net

#24

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#23)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

On 03/24/19 21:04, Ryan Lambert wrote:

I am unable to get latest patches I found [1] to apply cleanly to current
branches. It's possible I missed the latest patches so if I'm using the
wrong ones please let me know. I tried against master, 11.2 stable and the
11.2 tag with similar results.

Tom pushed the content-with-DOCTYPE patch, it's now included in master,
REL_11_STABLE, REL_10_STABLE, REL9_6_STABLE, REL9_5_STABLE, and
REL9_4_STABLE.

Right. If you want to test (and please do!) you could grab the relevant
branch tip from our git repo, or download a nightly snapshot tarball from

https://www.postgresql.org/ftp/snapshot/

regards, tom lane

#25

Ryan Lambert

ryan@rustprooflabs.com

almost 7 years ago

In reply to: Tom Lane (#24)

Re: Fix XML handling with DOCTYPE

Perfect, thank you! I do remember seeing that message now, but hadn't
understood what it really meant.
I will test later today. Thanks

*Ryan*

On Sun, Mar 24, 2019 at 9:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Chapman Flack <chap@anastigmatix.net> writes:

On 03/24/19 21:04, Ryan Lambert wrote:

I am unable to get latest patches I found [1] to apply cleanly to

current

branches. It's possible I missed the latest patches so if I'm using the
wrong ones please let me know. I tried against master, 11.2 stable and

the

11.2 tag with similar results.

Tom pushed the content-with-DOCTYPE patch, it's now included in master,
REL_11_STABLE, REL_10_STABLE, REL9_6_STABLE, REL9_5_STABLE, and
REL9_4_STABLE.

Right. If you want to test (and please do!) you could grab the relevant
branch tip from our git repo, or download a nightly snapshot tarball from

https://www.postgresql.org/ftp/snapshot/

regards, tom lane

#26

Ryan Lambert

ryan@rustprooflabs.com

almost 7 years ago

In reply to: Ryan Lambert (#25)

Re: Fix XML handling with DOCTYPE

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

I tested the master branch (commit 8edd0e7), REL_11_STABLE (commit 24df866) and REL9_6_STABLE (commit 5097368) and verified functionality. This patch fixes the bug I had reported [1]/messages/by-id/153478795159.1302.9617586466368699403@wrigleys.postgresql.org previously.

With this in the stable branches is it safe to assume this will be included with the next minor releases? Thanks for your work on this!!

Ryan

[1]: /messages/by-id/153478795159.1302.9617586466368699403@wrigleys.postgresql.org

The new status of this patch is: Ready for Committer

#27

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Ryan Lambert (#26)

Re: Fix XML handling with DOCTYPE

On 03/25/19 18:03, Ryan Lambert wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hi,

Thanks for the review! Yes, that part of this commitfest entry has been
committed already and will appear in the next minor releases of those
branches.

That leaves only one patch in this commitfest entry that is still in
need of review, namely the update to the documentation.

If you happened to feel moved to look over a documentation patch, that
would be what this CF entry most needs in the waning days of the commitfest.

There seem to be community members reluctant to review it because of not
feeling sufficiently expert in XML to scrutinize every technical detail,
but there are other valuable angles for documentation review. (And the
reason there *is* a documentation patch is the plentiful room for
improvement in the documentation that's already there, so as far as
reviewing goes, the old yarn about the two guys, the running shoes, and
the bear comes to mind.)

I can supply pointers to specs, etc., for anyone who does see some technical
details in the patch and has questions about them.

Regards,
-Chap

#28

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#27)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

Thanks for the review! Yes, that part of this commitfest entry has been
committed already and will appear in the next minor releases of those
branches.

Indeed, thanks for verifying that this fixes your problem.

That leaves only one patch in this commitfest entry that is still in
need of review, namely the update to the documentation.

Yeah. Since it *is* in need of review, I changed the CF entry's
state back to Needs Review.

regards, tom lane

#29

Ryan Lambert

ryan@rustprooflabs.com

almost 7 years ago

In reply to: Chapman Flack (#27)

Re: Fix XML handling with DOCTYPE

Ok, I'll give it a go.

If you happened to feel moved to look over a documentation patch, that
would be what this CF entry most needs in the waning days of the
commitfest.

Is the xml-functions-type-docfix-4.patch [1]/messages/by-id/attachment/100016/xml-functions-type-docfix-4.patch the one needing review? I'll
test applying it and review the changes in better detail. Is there a
section in the docs that shows how to verify if the updated pages render
properly? I would assume the pages are build when installing from source.

Ryan

[1]: /messages/by-id/attachment/100016/xml-functions-type-docfix-4.patch
/messages/by-id/attachment/100016/xml-functions-type-docfix-4.patch

On Mon, Mar 25, 2019 at 4:52 PM Chapman Flack <chap@anastigmatix.net> wrote:

Show quoted text

On 03/25/19 18:03, Ryan Lambert wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hi,

Thanks for the review! Yes, that part of this commitfest entry has been
committed already and will appear in the next minor releases of those
branches.

That leaves only one patch in this commitfest entry that is still in
need of review, namely the update to the documentation.

If you happened to feel moved to look over a documentation patch, that
would be what this CF entry most needs in the waning days of the
commitfest.

There seem to be community members reluctant to review it because of not
feeling sufficiently expert in XML to scrutinize every technical detail,
but there are other valuable angles for documentation review. (And the
reason there *is* a documentation patch is the plentiful room for
improvement in the documentation that's already there, so as far as
reviewing goes, the old yarn about the two guys, the running shoes, and
the bear comes to mind.)

I can supply pointers to specs, etc., for anyone who does see some
technical
details in the patch and has questions about them.

Regards,
-Chap

#30

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Ryan Lambert (#29)

Re: Fix XML handling with DOCTYPE

Ryan Lambert <ryan@rustprooflabs.com> writes:

Is the xml-functions-type-docfix-4.patch [1] the one needing review? I'll
test applying it and review the changes in better detail. Is there a
section in the docs that shows how to verify if the updated pages render
properly? I would assume the pages are build when installing from source.

Plain old "make all" doesn't build the docs. See
https://www.postgresql.org/docs/devel/docguide.html
for tooling prerequisites and build instructions.

(Usually people just build the HTML docs and look at them
with a browser; the other doc formats are less interesting.)

regards, tom lane

#31

Ryan Lambert

ryan@rustprooflabs.com

almost 7 years ago

In reply to: Tom Lane (#30)

Re: Fix XML handling with DOCTYPE

Overall I think this patch [1]/messages/by-id/attachment/100016/xml-functions-type-docfix-4.patch improves the docs and help explain edge case functionality that many of us, myself included, don't fully understand. I can't verify technical accuracy for many of the details (nuances between XPath 1.0, et. al), but overall my experience with the XML functionality lines up with what has been documented here. Adding the clear declaration of "XPath 1.0" instead of the generic "XPath" helps make it clear of the functional differences and helps frame the differences for new users.

I have two recommendations for features.sgml. You state:

relies on the libxml library

Should this be clarified as the libxml2 library? That's what I installed to build postgres from source (Ubuntu 16/18). If it is the libxml library and the "2" is irrelevant, or it it works with either, it might be nice to have a clarifying note to indicate that.

There are a few places where the parenthesis around a block of text seem unnecessary. I don't think parens serve a purpose when a full sentence is contained within.

(The libxml library does seem to always return nodesets to PostgreSQL with their members in the same relative order they had in the input document; it does not commit to this behavior, and an XPath 1.0 expression cannot control it.)

It seems you are standardizing from "node set" to "nodeset", is that the preferred nomenclature or preference?

Hopefully this helps! Thanks,

Ryan Lambert

[1]: /messages/by-id/attachment/100016/xml-functions-type-docfix-4.patch

The new status of this patch is: Waiting on Author

#32

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Ryan Lambert (#31)

Re: Fix XML handling with DOCTYPE

Ryan Lambert <ryan@rustprooflabs.com> writes:

I have two recommendations for features.sgml. You state:

relies on the libxml library

Should this be clarified as the libxml2 library? That's what I installed to build postgres from source (Ubuntu 16/18). If it is the libxml library and the "2" is irrelevant, or it it works with either, it might be nice to have a clarifying note to indicate that.

Do we need to mention that at all? If you're not building from source,
it doesn't seem very interesting ... but maybe I'm missing some reason
why end users would care.

It seems you are standardizing from "node set" to "nodeset", is that the preferred nomenclature or preference?

That seemed a bit jargon-y to me too. If that's standard terminology
in the XML world, maybe it's fine; but I'd have stuck with "node set".

regards, tom lane

#33

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Ryan Lambert (#31)

Re: Fix XML handling with DOCTYPE

On 03/26/19 21:39, Ryan Lambert wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Thanks for the review!

I have two recommendations for features.sgml. You state:

relies on the libxml library

Should this be clarified as the libxml2 library? That's what I installed
to build postgres from source (Ubuntu 16/18). If it is the libxml library
and the "2" is irrelevant

That's a good catch. I'm not actually sure whether there is any "libxml"
library that isn't libxml2. Maybe there was once and nobody admits to
hanging out with it. Most Google hits on "libxml" seem to be modules
that have libxml in their names and libxml2 as their actual dependency.

Perl XML:LibXML: "This module is an interface to libxml2"
Haskell libxml: "Binding to libxml2"
libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
for the GNOME Libxml2 ..."

--with-libxml is the PostgreSQL configure option to make it use libxml2.

The very web page http://xmlsoft.org/index.html says "The XML C parser
and toolkit of Gnome: libxml" and is all about libxml2.

So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?

On 03/26/19 23:52, Tom Lane wrote:

Do we need to mention that at all? If you're not building from source,
it doesn't seem very interesting ... but maybe I'm missing some reason
why end users would care.

The three places I've mentioned it were the ones where I thought users
might care:

- why are we stuck at XPath 1.0? It's what we get from the library we use.

- in what order do we get things out from a (hmm) node-set? Per XPath 1.0,
it's indeterminate (it's a set!), unlike XPath 2.0/XQuery where there's
a sequence type and you have order control. Observable behavior from
libxml2 (and you could certainly want to know this) is you get things out
in document order, whether that's what you wanted or not, even though
this is undocumented, and even counter-documented[1]http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes in no particular order", libxml2 behavior.
So it's an example of something you would fundamentally like to know,
where the only available answer depends precariously on the library
we happen to be using.

- which limits in our implementation are inherent to the library, and
which are just current limits in our embedding of it? (Maybe this is
right at the border of what a user would care to know, but I know it's
a question that crosses my mind when I bonk into a limit I wasn't
expecting.)

There are a few places where the parenthesis around a block of text
seem unnecessary.

)blush( that's a long-standing wart in my writing ... seems I often think
in parentheses, then look and say "those aren't needed" and take them out,
only sometimes I don't.

I skimmed just now and found a few instances of parenthesized whole
sentence: the one you quoted, and some (if argument is null, the result
is null), and (No rows will be produced if ....). Shall I deparenthesize
them all? Did you have other instances in mind?

It seems you are standardizing from "node set" to "nodeset", is that
the preferred nomenclature or preference?

Another good catch. I remember consciously making a last pass to get them
all consistent, and I wanted them consistent with the spec, and I see now
I messed up.

XPath 1.0 [2]https://www.w3.org/TR/1999/REC-xpath-19991116/ has zero instances of "nodeset", two of "node set" and about
six dozen of "node-set". The only appearances of "node set" without the
hyphen are in a heading and its ToC entry. The stuff under that heading
consistently uses node-set. It seems that's the XPath 1.0 term for sure.

When I made my consistency pass, I must have been looking too recently
in libxml2 C source, rather than the spec.

On 03/26/19 23:52, Tom Lane wrote:

That seemed a bit jargon-y to me too. If that's standard terminology
in the XML world, maybe it's fine; but I'd have stuck with "node set".

It really was my intention (though I flubbed it) to use XPath 1.0's term
for XPath 1.0's concept; in my doc philosophy, that gives readers
the most breadcrumbs to follow for the rest of the details if they want
them. "Node set" might be some sort of squishy expository concept I'm
using, but node-set is a thing, in a spec.

If you agree, I should go through and fix my nodesets to be node-sets.

I do think the terminology matters here, especially because of the
differences between what you can do with a node-set (XPath 1.0 thing)
and with a sequence (XPath 2.0+,XQuery,SQL/XML thing).

Let me know what you'd like best on these points and I'll revise the patch.

Regards,
-Chap

[1]: http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes in no particular order"
in no particular order"

[2]: https://www.w3.org/TR/1999/REC-xpath-19991116/

#34

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Ryan Lambert (#31)

Re: Fix XML handling with DOCTYPE

On 03/26/19 21:39, Ryan Lambert wrote:

I can't verify technical accuracy for many of the details (nuances between
XPath 1.0, et. al), but overall my experience with the XML functionality
lines up with what has been documented here.

By the way, in case it's buried too far back in the email thread now,
much of the early drafting for this happened on the wiki page

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards

which includes a lot of reference links, including a nice paper by
Andrew Eisenberg and Jim Melton that introduced the major changes
from the SQL:2003 to :2006 editions of SQL/XML.

Cheers,
-Chap

#35

Ryan Lambert

ryan@rustprooflabs.com

almost 7 years ago

In reply to: Chapman Flack (#33)

Re: Fix XML handling with DOCTYPE

Thanks for putting up with a new reviewer!

--with-libxml is the PostgreSQL configure option to make it use libxml2.

The very web page http://xmlsoft.org/index.html says "The XML C parser
and toolkit of Gnome: libxml" and is all about libxml2.

So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?

I think leaving it as libxml makes sense with all that. Good point that
--with-libxml is used to build so I think staying with that works and is
consistent. I agree that having this point included does clarify the how
and why of the limitations of this implementation.

I also over-parenthesize so I'm used to looking for that in my own
writing. The full sentences were the ones that seemed excessive to me, I
think the others are ok and I won't nit-pick either way on those (unless
you want me to!).

If you agree, I should go through and fix my nodesets to be node-sets.

Yes, I like node-sets better, especially knowing it conforms to the spec's
language.

Thanks,

*Ryan Lambert*

On Tue, Mar 26, 2019 at 11:05 PM Chapman Flack <chap@anastigmatix.net>
wrote:

Show quoted text

On 03/26/19 21:39, Ryan Lambert wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Thanks for the review!

I have two recommendations for features.sgml. You state:

relies on the libxml library

Should this be clarified as the libxml2 library? That's what I installed
to build postgres from source (Ubuntu 16/18). If it is the libxml

library

and the "2" is irrelevant

That's a good catch. I'm not actually sure whether there is any "libxml"
library that isn't libxml2. Maybe there was once and nobody admits to
hanging out with it. Most Google hits on "libxml" seem to be modules
that have libxml in their names and libxml2 as their actual dependency.

Perl XML:LibXML: "This module is an interface to libxml2"
Haskell libxml: "Binding to libxml2"
libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
for the GNOME Libxml2 ..."

--with-libxml is the PostgreSQL configure option to make it use libxml2.

The very web page http://xmlsoft.org/index.html says "The XML C parser
and toolkit of Gnome: libxml" and is all about libxml2.

So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?

On 03/26/19 23:52, Tom Lane wrote:

Do we need to mention that at all? If you're not building from source,
it doesn't seem very interesting ... but maybe I'm missing some reason
why end users would care.

The three places I've mentioned it were the ones where I thought users
might care:

- why are we stuck at XPath 1.0? It's what we get from the library we use.

- in what order do we get things out from a (hmm) node-set? Per XPath 1.0,
it's indeterminate (it's a set!), unlike XPath 2.0/XQuery where there's
a sequence type and you have order control. Observable behavior from
libxml2 (and you could certainly want to know this) is you get things
out
in document order, whether that's what you wanted or not, even though
this is undocumented, and even counter-documented[1], libxml2 behavior.
So it's an example of something you would fundamentally like to know,
where the only available answer depends precariously on the library
we happen to be using.

- which limits in our implementation are inherent to the library, and
which are just current limits in our embedding of it? (Maybe this is
right at the border of what a user would care to know, but I know it's
a question that crosses my mind when I bonk into a limit I wasn't
expecting.)

There are a few places where the parenthesis around a block of text
seem unnecessary.

)blush( that's a long-standing wart in my writing ... seems I often think
in parentheses, then look and say "those aren't needed" and take them out,
only sometimes I don't.

I skimmed just now and found a few instances of parenthesized whole
sentence: the one you quoted, and some (if argument is null, the result
is null), and (No rows will be produced if ....). Shall I deparenthesize
them all? Did you have other instances in mind?

It seems you are standardizing from "node set" to "nodeset", is that
the preferred nomenclature or preference?

Another good catch. I remember consciously making a last pass to get them
all consistent, and I wanted them consistent with the spec, and I see now
I messed up.

XPath 1.0 [2] has zero instances of "nodeset", two of "node set" and about
six dozen of "node-set". The only appearances of "node set" without the
hyphen are in a heading and its ToC entry. The stuff under that heading
consistently uses node-set. It seems that's the XPath 1.0 term for sure.

When I made my consistency pass, I must have been looking too recently
in libxml2 C source, rather than the spec.

On 03/26/19 23:52, Tom Lane wrote:

That seemed a bit jargon-y to me too. If that's standard terminology
in the XML world, maybe it's fine; but I'd have stuck with "node set".

It really was my intention (though I flubbed it) to use XPath 1.0's term
for XPath 1.0's concept; in my doc philosophy, that gives readers
the most breadcrumbs to follow for the rest of the details if they want
them. "Node set" might be some sort of squishy expository concept I'm
using, but node-set is a thing, in a spec.

If you agree, I should go through and fix my nodesets to be node-sets.

I do think the terminology matters here, especially because of the
differences between what you can do with a node-set (XPath 1.0 thing)
and with a sequence (XPath 2.0+,XQuery,SQL/XML thing).

Let me know what you'd like best on these points and I'll revise the patch.

Regards,
-Chap

[1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
in no particular order"

[2] https://www.w3.org/TR/1999/REC-xpath-19991116/

#36

Alvaro Herrera

alvherre@2ndquadrant.com

almost 7 years ago

In reply to: Chapman Flack (#33)

Re: Fix XML handling with DOCTYPE

On 2019-Mar-27, Chapman Flack wrote:

On 03/26/19 21:39, Ryan Lambert wrote:

Should this be clarified as the libxml2 library? That's what I installed
to build postgres from source (Ubuntu 16/18). If it is the libxml library
and the "2" is irrelevant

That's a good catch. I'm not actually sure whether there is any "libxml"
library that isn't libxml2. Maybe there was once and nobody admits to
hanging out with it. Most Google hits on "libxml" seem to be modules
that have libxml in their names and libxml2 as their actual dependency.

Perl XML:LibXML: "This module is an interface to libxml2"
Haskell libxml: "Binding to libxml2"
libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
for the GNOME Libxml2 ..."

--with-libxml is the PostgreSQL configure option to make it use libxml2.

The very web page http://xmlsoft.org/index.html says "The XML C parser
and toolkit of Gnome: libxml" and is all about libxml2.

So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?

Daniel Veillard actually had libxml version 1 in that repository (mostly
of GNOME provenance, it seems, put together during some W3C meeting in
1998). The version number changed to 2 sometime during year 2000.
Version 1 was mostly abandoned at that point, and for some reason
everyone keeps using "libxml2" as the name as though it was a different
thing from "libxml". I suppose the latter name is just too generic, or
because they wanted to differentiate from the old (probably
incompatible API) code.
https://gitlab.gnome.org/GNOME/libxml2/tree/LIB_XML_1_BRANCH

Everyone calls it "libxml2" nowadays. Let's just use that and avoid any
possible confusion. If some libxml3 emerges one day, it's quite likely
we'll need to revise much more than our docs in order to use it.

On 03/26/19 23:52, Tom Lane wrote:

Do we need to mention that at all? If you're not building from source,
it doesn't seem very interesting ... but maybe I'm missing some reason
why end users would care.

The three places I've mentioned it were the ones where I thought users
might care:

These seem relevant details.

If you agree, I should go through and fix my nodesets to be node-sets.

[1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
in no particular order"

What this means is "we don't guarantee any specific order". It's like a
query without ORDER BY: you may currently always get document order, but
if you upgrade the library one day, it's quite possible to get the nodes
in another order and you'll not get a refund. So you (the user) should
not rely on the order, or at least be mindful that it may change in the
future.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#37

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Alvaro Herrera (#36)

Re: Fix XML handling with DOCTYPE

On 3/27/19 9:31 AM, Alvaro Herrera wrote:

Everyone calls it "libxml2" nowadays. Let's just use that and avoid any
possible confusion. If some libxml3 emerges one day, it's quite likely
we'll need to revise much more than our docs in order to use it.

That's persuasive to me. I'll change the references to say libxml2
and let a committer serve as tiebreaker.

[1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
in no particular order"

What this means is "we don't guarantee any specific order". It's like a
query without ORDER BY: you may currently always get document order, but
if you upgrade the library one day, it's quite possible to get the nodes
in another order and you'll not get a refund. So you (the user) should
not rely on the order, or at least be mindful that it may change in the
future.

Exactly. I called the behavior "counter-documented" to distinguish this
from the usual "undocumented" case, where you notice that a library is
behaving in a way you like, but its docs are utterly silent on the
matter, so you know you're going out on a limb to count on what you've
noticed.

In this case, you can notice the handy behavior but the doc *comes
right out and disclaims it* so if you count on it, you're going out
on a limb that has no bark left and looks punky.

And yet it seems worthwhile to mention how the library does in fact
seem to behave, because you might well be in the situation of porting
code over from SQL/XML:2006+ or XQuery or XPath 2+, or those are the
languages you've learned, so you may have order assumptions you've made,
and be surprised that XPath 1 doesn't let you make them, and at least
we can say "in a pinch, if you don't mind standing on this punky limb
here, you may be able to use the code you've got without having to
refactor every XMLTABLE() or xpath() into something wrapped in an
outer SQL query with ORDER BY. You just don't get your money back if
a later library upgrade changes the order."

The wiki page remembers[1]/messages/by-id/5C465A65.4030305@anastigmatix.net that I had tried some pretty gnarly XPath 1
queries to see if I could make libxml2 return things in a different
order, but no, got document order every time.

Regards,
-Chap

[1]: /messages/by-id/5C465A65.4030305@anastigmatix.net
/messages/by-id/5C465A65.4030305@anastigmatix.net

#38

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Alvaro Herrera (#36)

1 attachment(s)

Re: Fix XML handling with DOCTYPE

Hi,

xml-functions-type-docfix-5.patch attached, with node-sets instead of
nodesets, libxml2 instead of libxml, and parenthesized full sentences
now au naturel.

I ended up turning the formerly-parenthesized note about libxml2's
node-set ordering into a DocBook <note>: there is really something
parenthetical about it, with the official statement of node-set
element ordering being that there is none, and the description of
what the library happens to do being of possible interest, but set
apart, with the necessary caveats about relying on it.

Spotted and fixed a couple more typos in the process.

Regards,
-Chap

Attachments:

xml-functions-type-docfix-5.patchtext/x-patch; name=xml-functions-type-docfix-5.patchDownload

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 52c28e7..0aed14c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4219,6 +4219,12 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
     value is a full document or only a content fragment.
    </para>
 
+   <para>
+    Limits and compatibility notes for the <type>xml</type> data type
+    in <productname>PostgreSQL</productname> can be found in
+    <xref linkend="xml-limits-conformance"/>.
+   </para>
+
    <sect2>
     <title>Creating XML Values</title>
    <para>
diff --git a/doc/src/sgml/features.sgml b/doc/src/sgml/features.sgml
index 6c22d69..095fcb9 100644
--- a/doc/src/sgml/features.sgml
+++ b/doc/src/sgml/features.sgml
@@ -16,7 +16,8 @@
   Language SQL</quote>.  A revised version of the standard is released
   from time to time; the most recent update appearing in 2011.
   The 2011 version is referred to as ISO/IEC 9075:2011, or simply as SQL:2011.
-  The versions prior to that were SQL:2008, SQL:2003, SQL:1999, and SQL-92.  Each version
+  The versions prior to that were SQL:2008, SQL:2006, SQL:2003, SQL:1999,
+  and SQL-92.  Each version
   replaces the previous one, so claims of conformance to earlier
   versions have no official merit.
   <productname>PostgreSQL</productname> development aims for
@@ -155,4 +156,329 @@
    </para>
   </sect1>
 
+  <sect1 id="xml-limits-conformance">
+   <title>XML Limits and Conformance to SQL/XML</title>
+
+   <indexterm>
+    <primary>SQL/XML</primary>
+    <secondary>limits and conformance</secondary>
+   </indexterm>
+
+   <para>
+    Significant revisions to the ISO/IEC 9075-14 XML-related specifications
+    (SQL/XML) were introduced with SQL:2006. The
+    <productname>PostgreSQL</productname> implementation of the XML data type
+    and related functions largely follows the earlier, 2003 edition, with some
+    borrowing from the later editions. In particular:
+    <itemizedlist>
+     <listitem>
+      <para>
+       Where the current standard provides a family of XML data types
+       to hold <quote>document</quote> or <quote>content</quote> in
+       untyped or XML Schema-typed variants, and a type
+       <type>XML(SEQUENCE)</type> to hold arbitrary pieces of XML content,
+       <productname>PostgreSQL</productname> provides the single
+       <type>xml</type> type, which can hold <quote>document</quote> or
+       <quote>content</quote>, and no equivalent of the <quote>sequence</quote>
+       type.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       <productname>PostgreSQL</productname> provides two functions introduced
+       in SQL:2006, but in variants that use the language XPath 1.0, rather than
+       XML Query as specified for them in the standard.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    This section presents some of the resulting differences you may encounter.
+   </para>
+
+   <sect2 id="functions-xml-limits-xpath1">
+    <title>Queries restricted to XPath 1.0</title>
+    
+    <para>
+     The <productname>PostgreSQL</productname>-specific functions
+     <function>xpath</function> and <function>xpath_exists</function> query
+     XML documents using the XPath language, and
+     <productname>PostgreSQL</productname> also provides XPath-only variants of
+     the standard functions <function>XMLEXISTS</function> and
+     <function>XMLTABLE</function>, which officially use
+     the XQuery language. For all of these functions,
+     <productname>PostgreSQL</productname> relies on the
+     <productname>libxml2</productname> library, which provides only XPath 1.0.
+    </para>
+
+    <para>
+     There is a strong connection between the XQuery language and XPath versions
+     2.0 and later: any expression that is syntactically valid and executes
+     successfully in both produces the same result (with a minor exception for
+     expressions containing numeric character references or predefined entity
+     references, which XQuery replaces with the corresponding character while
+     XPath leaves them alone). But there is no such connection between XPath 1.0
+     and XQuery or the later XPath versions; it was an earlier language and
+     differs in many respects.
+    </para>
+
+    <para>
+     There are two categories of limitation to keep in mind: the restriction
+     from XQuery to XPath for the functions specified in the SQL standard, and
+     the restriction of XPath to version 1.0 for both the standard and the
+     <productname>PostgreSQL</productname>-specific functions.
+    </para>
+    
+    <sect3>
+     <title>Restriction of XQuery to XPath</title>
+
+     <para>
+      Features of XQuery beyond those of XPath include:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         XQuery expressions can construct and return new XML nodes, in addition
+         to all possible XPath values. XPath can introduce and return values of
+         the atomic types (numbers, strings, and so on) but can only return XML
+         nodes already present in documents supplied as input to the expression.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery has control constructs for iteration, sorting, and grouping.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery allows the declaration and use of local functions.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      Recent XPath versions begin to offer overlapping capabilities
+      (such as the functional-style <function>for-each</function> and
+      <function>sort</function>, anonymous functions, and
+      <function>parse-xml</function> to create a node from a string),
+      but these were not available before XPath 3.0.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>Restriction of XPath to 1.0</title>
+
+     <para>
+      For developers familiar with XQuery and XPath 2.0 or later, or porting
+      queries from other systems, XPath 1.0 presents a number of differences to
+      contend with:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         The fundamental type of an XQuery/XPath expression, the
+         <type>sequence</type>, which can contain XML nodes, atomic values,
+         or both, does not exist in XPath 1.0. A 1.0 expression can only produce
+         a node-set (possibly empty, or with one XML node or more), or a single
+         atomic value.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         Unlike an XQuery/XPath sequence, which can contain any desired items in
+         any desired order, an XPath 1.0 node-set has no guaranteed order and,
+         like any set, can have no member appear more than once.
+         <note>
+          <para>
+           The <productname>libxml2</productname> library does seem to always
+           return node-sets to <productname>PostgreSQL</productname> with their
+           members in the same relative order they had in the input document. It
+           does not commit to this behavior, and an XPath 1.0 expression cannot
+           control it.
+          </para>
+         </note>
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         While XQuery/XPath provides all of the types defined in XML Schema
+         and many operators and functions over those types, XPath 1.0 has only
+         node-sets and three atomic types, <type>boolean</type>,
+         <type>double</type>, and <type>string</type>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no conditional operator. An XQuery/XPath expression
+         such as <userinput>if ( hat ) then hat/@size else "no hat"</userinput>
+         has no XPath 1.0 equivalent.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no ordering comparison operator for strings. Both
+         <userinput>"cat" &lt; "dog"</userinput> and
+         <userinput>"cat" &gt; "dog"</userinput> are false, because each is a
+         numeric comparison of two <literal>NaN</literal>s. In contrast,
+         <literal>=</literal> and <literal>!=</literal> do compare the strings
+         as strings.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 blurs the distinction between
+         <firstterm>value comparisons</firstterm> and
+         <firstterm>general comparisons</firstterm> as XQuery/XPath define them.
+         Both <userinput>sale/@hatsize = 7</userinput> and
+         <userinput>sale/@customer = "alice"</userinput> are existentially
+         quantified comparisons, true if there is any sale with the given value
+         for the attribute, but <userinput>sale/@taxable = false()</userinput>
+         is a value comparison to the
+         <firstterm>effective boolean value</firstterm> of a whole node-set,
+         and true only if no sale has a <literal>taxable</literal> attribute
+         at all.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         In the XQuery/XPath data model, a <firstterm>document node</firstterm>
+         can have either document form (exactly one top-level element, only
+         comments and processing instructions outside of it) or content form
+         (with those constraints relaxed). Its equivalent in XPath 1.0, the
+         <firstterm>root node</firstterm>, can only be in document form.
+         This is part of the reason an <type>xml</type> value passed as the
+         context item to any <productname>PostgreSQL</productname> XPath-based
+         function must be in document form.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      The differences highlighted here are not all of them. In XQuery and
+      the 2.0 and later versions of XPath, there is an XPath 1.0 compatibility
+      mode, and the W3C lists of function library
+      <ulink url='https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility'>changes</ulink>
+      and language
+      <ulink url='https://www.w3.org/TR/xpath20/#id-backwards-compatibility'>changes</ulink>
+      applied in that mode offer a more complete account of the
+      differences, if still not exhaustive; the compatibility mode cannot make
+      the later languages exactly equivalent to XPath 1.0.
+     </para>
+    </sect3>
+
+    <sect3 id="functions-xml-limits-casts">
+     <title>Mappings between SQL and XML data types and values</title>
+
+     <para>
+      In SQL:2006 and later, both directions of conversion between standard SQL
+      data types and the XML Schema types are specified precisely. However, the
+      rules are expressed using the types and semantics of XQuery/XPath, and
+      have no direct application to the different data model of XPath 1.0.
+     </para>
+
+     <para>
+      When <productname>PostgreSQL</productname> maps SQL data values to XML
+      (as in <function>xmlelement</function>), or XML to SQL (as in the output
+      columns of <function>xmltable</function>), except for the few cases
+      treated specially, <productname>PostgreSQL</productname> simply assumes
+      that the XML data type's XPath 1.0 string form will be valid as the
+      text-input form of the SQL datatype, and conversely. This rule has the
+      virtue of simplicity while producing, for many data types, results similar
+      to the mappings specified in the standard.
+     </para>
+
+     <para>
+      Where interoperability with other systems is a concern, for some data
+      types, it may be necessary to use available data type formatting functions
+      (such as those in <xref linkend="functions-formatting"/>) explicitly in
+      queries to produce the standard mappings.
+     </para>
+    </sect3>
+   </sect2>
+
+   <sect2 id="functions-xml-limits-postgresql">
+    <title>
+     Incidental limits of the implementation
+    </title>
+
+    <para>
+     This section concerns limits that are not inherent in the
+     <productname>libxml2</productname> library, but apply to the current
+     implementation in <productname>PostgreSQL</productname>.
+    </para>
+
+    <sect3>
+     <title>Only <literal>BY VALUE</literal> passing mechanism supported</title>
+
+     <para>
+      The SQL standard defines two <firstterm>passing mechanisms</firstterm>
+      that apply when passing an XML argument from SQL to an XML function or
+      receiving a result: <literal>BY REF</literal>, in which a particular XML
+      value retains its node identity, and <literal>BY VALUE</literal>, in which
+      the content of the XML is passed but node identity is not preserved. A
+      mechanism can be specified before a list of parameters, as the default
+      mechanism for all of them, or after any parameter, to override the
+      default.
+     </para>
+
+     <para>
+      To illustrate the difference, if
+      <replaceable>x</replaceable> is an XML value, these two queries in
+      an SQL:2006 environment would produce true and false, respectively:
+
+      <screen><![CDATA[
+SELECT XMLQUERY('$a is $b' PASSING BY REF x AS a, x AS b NULL ON EMPTY);
+SELECT XMLQUERY('$a is $b' PASSING BY VALUE x AS a, x AS b NULL ON EMPTY);
+]]></screen>
+     </para>
+
+     <para>
+      <productname>PostgreSQL</productname> will accept either
+      <literal>BY VALUE</literal> or <literal>BY REF</literal> in an
+      <function>XMLEXISTS</function> or <function>XMLTABLE</function> construct,
+      but ignores them; the <type>xml</type> data type holds a character-string
+      serialized representation, so there is no node identity to preserve,
+      and passing is always <literal>BY VALUE</literal>.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>Cannot pass named parameters to queries</title>
+
+     <para>
+      The XPath-based functions support passing one parameter to serve as the
+      XPath expression's context item, but do not support passing additional
+      values to be available to the expression as named parameters.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>No <type>XML(SEQUENCE)</type> type</title>
+
+     <para>
+      The <productname>PostgreSQL</productname> <type>xml</type> can only hold
+      a value in <literal>DOCUMENT</literal> or <literal>CONTENT</literal> form.
+      An XQuery/XPath expression context item must be a single XML node
+      or atomic value, while XPath 1.0 further restricts it to only an XML node,
+      and has no node type allowing <literal>CONTENT</literal>. The upshot is
+      that a well-formed <literal>DOCUMENT</literal> is the only form of XML
+      value that <productname>PostgreSQL</productname> can supply as an XPath
+      context item.
+     </para>
+    </sect3>
+   </sect2>
+  </sect1>
+
  </appendix>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1a01473..214baa4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10249,8 +10249,13 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 
 
  <sect1 id="functions-xml">
+
   <title>XML Functions</title>
 
+  <indexterm>
+   <primary>XML Functions</primary>
+  </indexterm>
+
   <para>
    The functions and function-like expressions described in this
    section operate on values of type <type>xml</type>.  Check <xref
@@ -10453,8 +10458,8 @@ SELECT xmlelement(name foo, xmlattributes('xyz' as bar),
      encoding, depending on the setting of the configuration parameter
      <xref linkend="guc-xmlbinary"/>.  The particular behavior for
      individual data types is expected to evolve in order to align the
-     SQL and PostgreSQL data types with the XML Schema specification,
-     at which point a more precise description will appear.
+     PostgreSQL mappings with those specified in SQL:2006 and later,
+     as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
    </sect3>
 
@@ -10696,10 +10701,12 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 </synopsis>
 
     <para>
-     The function <function>xmlexists</function> returns true if the
-     XPath expression in the first argument returns any nodes, and
-     false otherwise.  (If either argument is null, the result is
-     null.)
+     The function <function>xmlexists</function> evaluates an XPath 1.0
+     expression (the first argument), with the passed value as its context item.
+     The function returns false if the result of that evaluation yields an
+     empty node-set, true if it yields any other value.  The function returns
+     null if an argument is null. A nonnull value passed as the context item
+     must be an XML document, not a content fragment or any non-XML value.
     </para>
 
     <para>
@@ -10716,24 +10723,12 @@ SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY VALUE '<towns><town>T
 
     <para>
      The <literal>BY REF</literal> or <literal>BY VALUE</literal> clauses
-     have no effect in <productname>PostgreSQL</productname>, but are allowed
-     for compatibility with other implementations.  Per the <acronym>SQL</acronym>
-     standard, the one that precedes any argument is required, and indicates
-     the default for arguments that follow, and one may follow any argument to
-     override the default.
-     <productname>PostgreSQL</productname> ignores <literal>BY REF</literal>
-     and passes by value always.
-    </para>
-
-    <para>
-     In the <acronym>SQL</acronym> standard, an <function>xmlexists</function>
-     construct evaluates an expression in the XQuery language, allows passing
-     values for named parameters in the expression as well as for the context
-     item, and does not require the passed values to be documents, or even of
-     XML type.
-     In <productname>PostgreSQL</productname>, this construct currently only
-     evaluates an XPath 1.0 expression, and allows passing only one value,
-     which must be an XML document, to be the context item.
+     are accepted in <productname>PostgreSQL</productname>, but ignored, as
+     discussed in <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, an <function>xmlexists</function> construct first
+     appears in SQL:2006 and evaluates an expression in the XML Query language,
+     but this implementation allows only an XPath 1.0 expression, as discussed
+     in <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
    </sect3>
 
@@ -10839,12 +10834,12 @@ SELECT xml_is_well_formed_document('<pg:foo xmlns:pg="http://postgresql.org/stuf
 </synopsis>
 
     <para>
-     The function <function>xpath</function> evaluates the XPath
+     The function <function>xpath</function> evaluates the XPath 1.0
      expression <replaceable>xpath</replaceable> (a <type>text</type> value)
      against the XML value
      <replaceable>xml</replaceable>.  It returns an array of XML values
-     corresponding to the node set produced by the XPath expression.
-     If the XPath expression returns a scalar value rather than a node set,
+     corresponding to the node-set produced by the XPath expression.
+     If the XPath expression returns a scalar value rather than a node-set,
      a single-element array is returned.
     </para>
 
@@ -10906,9 +10901,10 @@ SELECT xpath('//mydefns:b/text()', '<a xmlns="http://example.com"><b>test</b></a
     <para>
      The function <function>xpath_exists</function> is a specialized form
      of the <function>xpath</function> function.  Instead of returning the
-     individual XML values that satisfy the XPath, this function returns a
-     Boolean indicating whether the query was satisfied or not.  This
-     function is equivalent to the standard <literal>XMLEXISTS</literal> predicate,
+     individual XML values that satisfy the XPath 1.0 expression, this function
+     returns a Boolean indicating whether the query was satisfied or not
+     (specifically, whether it produced any value other than an empty node-set).
+     This function is equivalent to the <literal>XMLEXISTS</literal> predicate,
      except that it also offers support for a namespace mapping argument.
     </para>
 
@@ -10949,8 +10945,8 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
 
     <para>
      The <function>xmltable</function> function produces a table based
-     on the given XML value, an XPath filter to extract rows, and an
-     optional set of column definitions.
+     on the given XML value, an XPath filter to extract rows, and a
+     set of column definitions.
     </para>
 
     <para>
@@ -10961,30 +10957,33 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     The required <replaceable>row_expression</replaceable> argument is an XPath
-     expression that is evaluated against the supplied XML document to
-     obtain an ordered sequence of XML nodes. This sequence is what
-     <function>xmltable</function> transforms into output rows.
+     The required <replaceable>row_expression</replaceable> argument is
+     an XPath 1.0 expression that is evaluated, passing the
+     <replaceable>document_expression</replaceable> as its context item, to
+     obtain a set of XML nodes. These nodes are what
+     <function>xmltable</function> transforms into output rows. No rows
+     will be produced if the <replaceable>document_expression</replaceable>
+     is null, or the <replaceable>row_expression</replaceable> produces an
+     empty node-set, or any value other than a node-set.
     </para>
 
     <para>
-     <replaceable>document_expression</replaceable> provides the XML document to
-     operate on.
-     The argument must be a well-formed XML document; fragments/forests
-     are not accepted.
+     <replaceable>document_expression</replaceable> provides the context item
+     for the <replaceable>row_expression</replaceable>. It must be a well-formed
+     XML document; fragments/forests are not accepted.
      The <literal>BY REF</literal> or <literal>BY VALUE</literal> clauses are
-     accepted, as described for the <function>xmlexists</function> predicate,
-     but ignored; PostgreSQL currently passes XML by value always.
+     accepted but ignored, as discussed in
+     <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, an <function>xmltable</function> construct first
+     appears in SQL:2006 and evaluates expressions in the XML Query language,
+     but this implementation allows only XPath 1.0 expressions, as discussed
+     in <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
 
     <para>
      The mandatory <literal>COLUMNS</literal> clause specifies the list
      of columns in the output table.
-     If the <literal>COLUMNS</literal> clause is omitted, the rows in the result
-     set contain a single column of type <literal>xml</literal> containing the
-     data matched by <replaceable>row_expression</replaceable>.
-     If <literal>COLUMNS</literal> is specified, each entry describes a
-     single column.
+     Each entry describes a single column.
      See the syntax summary above for the format.
      The column name and type are required; the path, default and
      nullability clauses are optional.
@@ -10998,42 +10997,78 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     The <literal>column_expression</literal> for a column is an XPath expression
-     that is evaluated for each row, relative to the result of the
-     <replaceable>row_expression</replaceable>, to find the value of the column.
+     The <literal>column_expression</literal> for a column is an XPath 1.0
+     expression
+     that is evaluated for each row, with the current node from the
+     <replaceable>row_expression</replaceable> result as its context item,
+     to find the value of the column.
      If no <literal>column_expression</literal> is given, then the column name
      is used as an implicit path.
     </para>
 
     <para>
-     If a column's XPath expression returns multiple elements, an error
-     is raised.
-     If the expression matches an empty tag, the result is an
-     empty string (not <literal>NULL</literal>).
-     Any <literal>xsi:nil</literal> attributes are ignored.
+     If a column's XPath expression returns a non-XML value (limited to
+     string, boolean, or double in XPath 1.0) and the column has a
+     PostgreSQL type other than <type>xml</type>, the column will be set
+     as if by assigning the value's string representation to the PostgreSQL
+     type (adjusting the <quote>string representation</quote> of a boolean to
+     <literal>1</literal> or <literal>0</literal> if the target column type
+     category is numeric, otherwise <literal>true</literal> or
+     <literal>false</literal>).
+    </para>
+
+    <para>
+     If the column's expression returns a non-empty set of XML nodes
+     and the target column's type is <type>xml</type>, the column will
+     be assigned the expression result exactly, if it is of document or
+     content form.
+     <footnote>
+      <para>
+       A result containing more than one element node at the top level, or
+       non-whitespace text outside of an element, is an example of content form.
+       An XPath result can be of neither form, for example if it returns an
+       attribute node selected from the element that contains it. Such a result
+       will be put into content form with each such disallowed node replaced by
+       its string value, as defined for the XPath 1.0
+       <function>string</function> function.
+      </para>
+     </footnote>
+    </para>
+
+    <para>
+     A non-XML result assigned to an <type>xml</type> output column produces
+     content, a single text node with the string value of the result.
+     An XML result assigned to a column of any other type may not have more than
+     one node, or an error is raised. If there is exactly one node, the column
+     will be set as if by assigning the node's string
+     value (as defined for the XPath 1.0 <function>string</function> function)
+     to the PostgreSQL type.
     </para>
 
     <para>
-     The text body of the XML matched by the <replaceable>column_expression</replaceable>
-     is used as the column value. Multiple <literal>text()</literal> nodes
-     within an element are concatenated in order. Any child elements,
-     processing instructions, and comments are ignored, but the text contents
-     of child elements are concatenated to the result.
+     The string value of an XML element is the concatenation, in document order,
+     of all text nodes contained in that element and its descendants. The string
+     value of an element with no descendant text nodes is an
+     empty string (not <literal>NULL</literal>).
+     Any <literal>xsi:nil</literal> attributes are ignored.
      Note that the whitespace-only <literal>text()</literal> node between two non-text
      elements is preserved, and that leading whitespace on a <literal>text()</literal>
      node is not flattened.
+     The XPath 1.0 <function>string</function> function may be consulted for the
+     rules defining the string value of other XML node types and non-XML values.
+    </para>
+
+    <para>
+     The conversion rules presented here are not exactly those of the SQL
+     standard, as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
 
     <para>
-     If the path expression does not match for a given row but
-     <replaceable>default_expression</replaceable> is specified, the value resulting
-     from evaluating that expression is used.
-     If no <literal>DEFAULT</literal> clause is given for the column,
-     the field will be set to <literal>NULL</literal>.
-     It is possible for a <replaceable>default_expression</replaceable> to reference
-     the value of output columns that appear prior to it in the column list,
-     so the default of one column may be based on the value of another
-     column.
+     If the path expression returns an empty node-set
+     (typically, when it does not match)
+     for a given row, the column will be set to <literal>NULL</literal>, unless
+     a <replaceable>default_expression</replaceable> is specified; then the
+     value resulting from evaluating that expression is used.
     </para>
 
     <para>
@@ -11045,20 +11080,14 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     Unlike regular PostgreSQL functions, <replaceable>column_expression</replaceable>
-     and <replaceable>default_expression</replaceable> are not evaluated to a simple
-     value before calling the function.
-     <replaceable>column_expression</replaceable> is normally evaluated
-     exactly once per input row, and <replaceable>default_expression</replaceable>
-     is evaluated each time a default is needed for a field.
-     If the expression qualifies as stable or immutable the repeat
+     A <replaceable>default_expression</replaceable>, rather than being
+     evaluated immediately when <function>xmltable</function> is called,
+     is evaluated each time a default is needed for the column.
+     If the expression qualifies as stable or immutable, the repeat
      evaluation may be skipped.
-     Effectively <function>xmltable</function> behaves more like a subquery than a
-     function call.
      This means that you can usefully use volatile functions like
-     <function>nextval</function> in <replaceable>default_expression</replaceable>, and
-     <replaceable>column_expression</replaceable> may depend on other parts of the
-     XML document.
+     <function>nextval</function> in
+     <replaceable>default_expression</replaceable>.
     </para>
 
     <para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index bade0fe..32908c1 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -593,7 +593,7 @@ X085	Predefined namespace prefixes			NO
 X086	XML namespace declarations in XMLTable			NO	
 X090	XML document predicate			YES	
 X091	XML content predicate			NO	
-X096	XMLExists			NO	XPath only
+X096	XMLExists			NO	XPath 1.0 only
 X100	Host language support for XML: CONTENT option			NO	
 X101	Host language support for XML: DOCUMENT option			NO	
 X110	Host language support for XML: VARCHAR mapping			NO	
@@ -661,11 +661,11 @@ X282	XMLValidate with CONTENT option			NO
 X283	XMLValidate with SEQUENCE option			NO	
 X284	XMLValidate: NAMESPACE without ELEMENT clause			NO	
 X286	XMLValidate: NO NAMESPACE with ELEMENT clause			NO	
-X300	XMLTable			NO	XPath only
+X300	XMLTable			NO	XPath 1.0 only
 X301	XMLTable: derived column list option			YES	
 X302	XMLTable: ordinality column option			YES	
 X303	XMLTable: column default option			YES	
-X304	XMLTable: passing a context item			YES	
+X304	XMLTable: passing a context item			YES	must be XML DOCUMENT
 X305	XMLTable: initializing an XQuery variable			NO	
 X400	Name and identifier mapping			YES	
 X410	Alter column data type: XML type			YES

#39

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Chapman Flack (#38)

Re: Fix XML handling with DOCTYPE

On 03/27/19 19:07, Chapman Flack wrote:

xml-functions-type-docfix-5.patch attached, with node-sets instead of
nodesets, libxml2 instead of libxml, and parenthesized full sentences
now au naturel.

I ended up turning the formerly-parenthesized note about libxml2's
node-set ordering into a DocBook <note>: there is really something
parenthetical about it, with the official statement of node-set
element ordering being that there is none, and the description of
what the library happens to do being of possible interest, but set
apart, with the necessary caveats about relying on it.

I have just suffered a giant sinking feeling upon re-reading this
sentence in our XMLTABLE doc:

A column marked FOR ORDINALITY will be populated with row numbers
matching the order in which the output rows appeared in the original
input XML document.

I've been skimming right over it all this time, and that right there is
a glaring built-in reliance on the observable-but-disclaimed iteration
order of a libxml2 node-set.

I'm a bit unsure what any clarifying language should even say.

Regards,
-Chap

#40

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Chapman Flack (#39)

1 attachment(s)

Re: Fix XML handling with DOCTYPE

On 03/27/19 19:27, Chapman Flack wrote:

A column marked FOR ORDINALITY will be populated with row numbers
matching the order in which the output rows appeared in the original
input XML document.

I've been skimming right over it all this time, and that right there is
a glaring built-in reliance on the observable-but-disclaimed iteration
order of a libxml2 node-set.

So, xml-functions-type-docfix-6.patch.

I changed that language to say "populated with row numbers, starting
with 1, in the order of nodes retrieved from the row_expression's
result node-set."

That's not such a terrible thing to have to say; in fact, it's the
*correct* description for the standard, XQuery-based, XMLTABLE (where
the language gives you control of the result sequence's order).

I followed that with a short note saying since XPath 1.0 doesn't
specify that order, relying on it is implementation-dependent, and
linked to the existing Appendix D discussion.

I would have like to link directly to the <listitem>, but of course
<xref> doesn't know what to call that, so I linked to the <sect3>
instead.

Regards,
-Chap

Attachments:

xml-functions-type-docfix-6.patchtext/x-patch; name=xml-functions-type-docfix-6.patchDownload

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 52c28e7..0aed14c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4219,6 +4219,12 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
     value is a full document or only a content fragment.
    </para>
 
+   <para>
+    Limits and compatibility notes for the <type>xml</type> data type
+    in <productname>PostgreSQL</productname> can be found in
+    <xref linkend="xml-limits-conformance"/>.
+   </para>
+
    <sect2>
     <title>Creating XML Values</title>
    <para>
diff --git a/doc/src/sgml/features.sgml b/doc/src/sgml/features.sgml
index 6c22d69..e8015a9 100644
--- a/doc/src/sgml/features.sgml
+++ b/doc/src/sgml/features.sgml
@@ -16,7 +16,8 @@
   Language SQL</quote>.  A revised version of the standard is released
   from time to time; the most recent update appearing in 2011.
   The 2011 version is referred to as ISO/IEC 9075:2011, or simply as SQL:2011.
-  The versions prior to that were SQL:2008, SQL:2003, SQL:1999, and SQL-92.  Each version
+  The versions prior to that were SQL:2008, SQL:2006, SQL:2003, SQL:1999,
+  and SQL-92.  Each version
   replaces the previous one, so claims of conformance to earlier
   versions have no official merit.
   <productname>PostgreSQL</productname> development aims for
@@ -155,4 +156,329 @@
    </para>
   </sect1>
 
+  <sect1 id="xml-limits-conformance">
+   <title>XML Limits and Conformance to SQL/XML</title>
+
+   <indexterm>
+    <primary>SQL/XML</primary>
+    <secondary>limits and conformance</secondary>
+   </indexterm>
+
+   <para>
+    Significant revisions to the ISO/IEC 9075-14 XML-related specifications
+    (SQL/XML) were introduced with SQL:2006. The
+    <productname>PostgreSQL</productname> implementation of the XML data type
+    and related functions largely follows the earlier, 2003 edition, with some
+    borrowing from the later editions. In particular:
+    <itemizedlist>
+     <listitem>
+      <para>
+       Where the current standard provides a family of XML data types
+       to hold <quote>document</quote> or <quote>content</quote> in
+       untyped or XML Schema-typed variants, and a type
+       <type>XML(SEQUENCE)</type> to hold arbitrary pieces of XML content,
+       <productname>PostgreSQL</productname> provides the single
+       <type>xml</type> type, which can hold <quote>document</quote> or
+       <quote>content</quote>, and no equivalent of the <quote>sequence</quote>
+       type.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       <productname>PostgreSQL</productname> provides two functions introduced
+       in SQL:2006, but in variants that use the language XPath 1.0, rather than
+       XML Query as specified for them in the standard.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    This section presents some of the resulting differences you may encounter.
+   </para>
+
+   <sect2 id="functions-xml-limits-xpath1">
+    <title>Queries restricted to XPath 1.0</title>
+    
+    <para>
+     The <productname>PostgreSQL</productname>-specific functions
+     <function>xpath</function> and <function>xpath_exists</function> query
+     XML documents using the XPath language, and
+     <productname>PostgreSQL</productname> also provides XPath-only variants of
+     the standard functions <function>XMLEXISTS</function> and
+     <function>XMLTABLE</function>, which officially use
+     the XQuery language. For all of these functions,
+     <productname>PostgreSQL</productname> relies on the
+     <productname>libxml2</productname> library, which provides only XPath 1.0.
+    </para>
+
+    <para>
+     There is a strong connection between the XQuery language and XPath versions
+     2.0 and later: any expression that is syntactically valid and executes
+     successfully in both produces the same result (with a minor exception for
+     expressions containing numeric character references or predefined entity
+     references, which XQuery replaces with the corresponding character while
+     XPath leaves them alone). But there is no such connection between XPath 1.0
+     and XQuery or the later XPath versions; it was an earlier language and
+     differs in many respects.
+    </para>
+
+    <para>
+     There are two categories of limitation to keep in mind: the restriction
+     from XQuery to XPath for the functions specified in the SQL standard, and
+     the restriction of XPath to version 1.0 for both the standard and the
+     <productname>PostgreSQL</productname>-specific functions.
+    </para>
+    
+    <sect3>
+     <title>Restriction of XQuery to XPath</title>
+
+     <para>
+      Features of XQuery beyond those of XPath include:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         XQuery expressions can construct and return new XML nodes, in addition
+         to all possible XPath values. XPath can introduce and return values of
+         the atomic types (numbers, strings, and so on) but can only return XML
+         nodes already present in documents supplied as input to the expression.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery has control constructs for iteration, sorting, and grouping.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery allows the declaration and use of local functions.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      Recent XPath versions begin to offer overlapping capabilities
+      (such as the functional-style <function>for-each</function> and
+      <function>sort</function>, anonymous functions, and
+      <function>parse-xml</function> to create a node from a string),
+      but these were not available before XPath 3.0.
+     </para>
+    </sect3>
+
+    <sect3 id="xml-xpath-1-specifics">
+     <title>Restriction of XPath to 1.0</title>
+
+     <para>
+      For developers familiar with XQuery and XPath 2.0 or later, or porting
+      queries from other systems, XPath 1.0 presents a number of differences to
+      contend with:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         The fundamental type of an XQuery/XPath expression, the
+         <type>sequence</type>, which can contain XML nodes, atomic values,
+         or both, does not exist in XPath 1.0. A 1.0 expression can only produce
+         a node-set (possibly empty, or with one XML node or more), or a single
+         atomic value.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         Unlike an XQuery/XPath sequence, which can contain any desired items in
+         any desired order, an XPath 1.0 node-set has no guaranteed order and,
+         like any set, can have no member appear more than once.
+         <note>
+          <para>
+           The <productname>libxml2</productname> library does seem to always
+           return node-sets to <productname>PostgreSQL</productname> with their
+           members in the same relative order they had in the input document. It
+           does not commit to this behavior, and an XPath 1.0 expression cannot
+           control it.
+          </para>
+         </note>
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         While XQuery/XPath provides all of the types defined in XML Schema
+         and many operators and functions over those types, XPath 1.0 has only
+         node-sets and three atomic types, <type>boolean</type>,
+         <type>double</type>, and <type>string</type>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no conditional operator. An XQuery/XPath expression
+         such as <userinput>if ( hat ) then hat/@size else "no hat"</userinput>
+         has no XPath 1.0 equivalent.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no ordering comparison operator for strings. Both
+         <userinput>"cat" &lt; "dog"</userinput> and
+         <userinput>"cat" &gt; "dog"</userinput> are false, because each is a
+         numeric comparison of two <literal>NaN</literal>s. In contrast,
+         <literal>=</literal> and <literal>!=</literal> do compare the strings
+         as strings.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 blurs the distinction between
+         <firstterm>value comparisons</firstterm> and
+         <firstterm>general comparisons</firstterm> as XQuery/XPath define them.
+         Both <userinput>sale/@hatsize = 7</userinput> and
+         <userinput>sale/@customer = "alice"</userinput> are existentially
+         quantified comparisons, true if there is any sale with the given value
+         for the attribute, but <userinput>sale/@taxable = false()</userinput>
+         is a value comparison to the
+         <firstterm>effective boolean value</firstterm> of a whole node-set,
+         and true only if no sale has a <literal>taxable</literal> attribute
+         at all.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         In the XQuery/XPath data model, a <firstterm>document node</firstterm>
+         can have either document form (exactly one top-level element, only
+         comments and processing instructions outside of it) or content form
+         (with those constraints relaxed). Its equivalent in XPath 1.0, the
+         <firstterm>root node</firstterm>, can only be in document form.
+         This is part of the reason an <type>xml</type> value passed as the
+         context item to any <productname>PostgreSQL</productname> XPath-based
+         function must be in document form.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      The differences highlighted here are not all of them. In XQuery and
+      the 2.0 and later versions of XPath, there is an XPath 1.0 compatibility
+      mode, and the W3C lists of function library
+      <ulink url='https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility'>changes</ulink>
+      and language
+      <ulink url='https://www.w3.org/TR/xpath20/#id-backwards-compatibility'>changes</ulink>
+      applied in that mode offer a more complete account of the
+      differences, if still not exhaustive; the compatibility mode cannot make
+      the later languages exactly equivalent to XPath 1.0.
+     </para>
+    </sect3>
+
+    <sect3 id="functions-xml-limits-casts">
+     <title>Mappings between SQL and XML data types and values</title>
+
+     <para>
+      In SQL:2006 and later, both directions of conversion between standard SQL
+      data types and the XML Schema types are specified precisely. However, the
+      rules are expressed using the types and semantics of XQuery/XPath, and
+      have no direct application to the different data model of XPath 1.0.
+     </para>
+
+     <para>
+      When <productname>PostgreSQL</productname> maps SQL data values to XML
+      (as in <function>xmlelement</function>), or XML to SQL (as in the output
+      columns of <function>xmltable</function>), except for the few cases
+      treated specially, <productname>PostgreSQL</productname> simply assumes
+      that the XML data type's XPath 1.0 string form will be valid as the
+      text-input form of the SQL datatype, and conversely. This rule has the
+      virtue of simplicity while producing, for many data types, results similar
+      to the mappings specified in the standard.
+     </para>
+
+     <para>
+      Where interoperability with other systems is a concern, for some data
+      types, it may be necessary to use available data type formatting functions
+      (such as those in <xref linkend="functions-formatting"/>) explicitly in
+      queries to produce the standard mappings.
+     </para>
+    </sect3>
+   </sect2>
+
+   <sect2 id="functions-xml-limits-postgresql">
+    <title>
+     Incidental limits of the implementation
+    </title>
+
+    <para>
+     This section concerns limits that are not inherent in the
+     <productname>libxml2</productname> library, but apply to the current
+     implementation in <productname>PostgreSQL</productname>.
+    </para>
+
+    <sect3>
+     <title>Only <literal>BY VALUE</literal> passing mechanism supported</title>
+
+     <para>
+      The SQL standard defines two <firstterm>passing mechanisms</firstterm>
+      that apply when passing an XML argument from SQL to an XML function or
+      receiving a result: <literal>BY REF</literal>, in which a particular XML
+      value retains its node identity, and <literal>BY VALUE</literal>, in which
+      the content of the XML is passed but node identity is not preserved. A
+      mechanism can be specified before a list of parameters, as the default
+      mechanism for all of them, or after any parameter, to override the
+      default.
+     </para>
+
+     <para>
+      To illustrate the difference, if
+      <replaceable>x</replaceable> is an XML value, these two queries in
+      an SQL:2006 environment would produce true and false, respectively:
+
+      <screen><![CDATA[
+SELECT XMLQUERY('$a is $b' PASSING BY REF x AS a, x AS b NULL ON EMPTY);
+SELECT XMLQUERY('$a is $b' PASSING BY VALUE x AS a, x AS b NULL ON EMPTY);
+]]></screen>
+     </para>
+
+     <para>
+      <productname>PostgreSQL</productname> will accept either
+      <literal>BY VALUE</literal> or <literal>BY REF</literal> in an
+      <function>XMLEXISTS</function> or <function>XMLTABLE</function> construct,
+      but ignores them; the <type>xml</type> data type holds a character-string
+      serialized representation, so there is no node identity to preserve,
+      and passing is always <literal>BY VALUE</literal>.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>Cannot pass named parameters to queries</title>
+
+     <para>
+      The XPath-based functions support passing one parameter to serve as the
+      XPath expression's context item, but do not support passing additional
+      values to be available to the expression as named parameters.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>No <type>XML(SEQUENCE)</type> type</title>
+
+     <para>
+      The <productname>PostgreSQL</productname> <type>xml</type> can only hold
+      a value in <literal>DOCUMENT</literal> or <literal>CONTENT</literal> form.
+      An XQuery/XPath expression context item must be a single XML node
+      or atomic value, while XPath 1.0 further restricts it to only an XML node,
+      and has no node type allowing <literal>CONTENT</literal>. The upshot is
+      that a well-formed <literal>DOCUMENT</literal> is the only form of XML
+      value that <productname>PostgreSQL</productname> can supply as an XPath
+      context item.
+     </para>
+    </sect3>
+   </sect2>
+  </sect1>
+
  </appendix>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1a01473..7952f02 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10249,8 +10249,13 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 
 
  <sect1 id="functions-xml">
+
   <title>XML Functions</title>
 
+  <indexterm>
+   <primary>XML Functions</primary>
+  </indexterm>
+
   <para>
    The functions and function-like expressions described in this
    section operate on values of type <type>xml</type>.  Check <xref
@@ -10453,8 +10458,8 @@ SELECT xmlelement(name foo, xmlattributes('xyz' as bar),
      encoding, depending on the setting of the configuration parameter
      <xref linkend="guc-xmlbinary"/>.  The particular behavior for
      individual data types is expected to evolve in order to align the
-     SQL and PostgreSQL data types with the XML Schema specification,
-     at which point a more precise description will appear.
+     PostgreSQL mappings with those specified in SQL:2006 and later,
+     as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
    </sect3>
 
@@ -10696,10 +10701,12 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 </synopsis>
 
     <para>
-     The function <function>xmlexists</function> returns true if the
-     XPath expression in the first argument returns any nodes, and
-     false otherwise.  (If either argument is null, the result is
-     null.)
+     The function <function>xmlexists</function> evaluates an XPath 1.0
+     expression (the first argument), with the passed value as its context item.
+     The function returns false if the result of that evaluation yields an
+     empty node-set, true if it yields any other value.  The function returns
+     null if an argument is null. A nonnull value passed as the context item
+     must be an XML document, not a content fragment or any non-XML value.
     </para>
 
     <para>
@@ -10716,24 +10723,12 @@ SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY VALUE '<towns><town>T
 
     <para>
      The <literal>BY REF</literal> or <literal>BY VALUE</literal> clauses
-     have no effect in <productname>PostgreSQL</productname>, but are allowed
-     for compatibility with other implementations.  Per the <acronym>SQL</acronym>
-     standard, the one that precedes any argument is required, and indicates
-     the default for arguments that follow, and one may follow any argument to
-     override the default.
-     <productname>PostgreSQL</productname> ignores <literal>BY REF</literal>
-     and passes by value always.
-    </para>
-
-    <para>
-     In the <acronym>SQL</acronym> standard, an <function>xmlexists</function>
-     construct evaluates an expression in the XQuery language, allows passing
-     values for named parameters in the expression as well as for the context
-     item, and does not require the passed values to be documents, or even of
-     XML type.
-     In <productname>PostgreSQL</productname>, this construct currently only
-     evaluates an XPath 1.0 expression, and allows passing only one value,
-     which must be an XML document, to be the context item.
+     are accepted in <productname>PostgreSQL</productname>, but ignored, as
+     discussed in <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, an <function>xmlexists</function> construct first
+     appears in SQL:2006 and evaluates an expression in the XML Query language,
+     but this implementation allows only an XPath 1.0 expression, as discussed
+     in <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
    </sect3>
 
@@ -10839,12 +10834,12 @@ SELECT xml_is_well_formed_document('<pg:foo xmlns:pg="http://postgresql.org/stuf
 </synopsis>
 
     <para>
-     The function <function>xpath</function> evaluates the XPath
+     The function <function>xpath</function> evaluates the XPath 1.0
      expression <replaceable>xpath</replaceable> (a <type>text</type> value)
      against the XML value
      <replaceable>xml</replaceable>.  It returns an array of XML values
-     corresponding to the node set produced by the XPath expression.
-     If the XPath expression returns a scalar value rather than a node set,
+     corresponding to the node-set produced by the XPath expression.
+     If the XPath expression returns a scalar value rather than a node-set,
      a single-element array is returned.
     </para>
 
@@ -10906,9 +10901,10 @@ SELECT xpath('//mydefns:b/text()', '<a xmlns="http://example.com"><b>test</b></a
     <para>
      The function <function>xpath_exists</function> is a specialized form
      of the <function>xpath</function> function.  Instead of returning the
-     individual XML values that satisfy the XPath, this function returns a
-     Boolean indicating whether the query was satisfied or not.  This
-     function is equivalent to the standard <literal>XMLEXISTS</literal> predicate,
+     individual XML values that satisfy the XPath 1.0 expression, this function
+     returns a Boolean indicating whether the query was satisfied or not
+     (specifically, whether it produced any value other than an empty node-set).
+     This function is equivalent to the <literal>XMLEXISTS</literal> predicate,
      except that it also offers support for a namespace mapping argument.
     </para>
 
@@ -10949,8 +10945,8 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
 
     <para>
      The <function>xmltable</function> function produces a table based
-     on the given XML value, an XPath filter to extract rows, and an
-     optional set of column definitions.
+     on the given XML value, an XPath filter to extract rows, and a
+     set of column definitions.
     </para>
 
     <para>
@@ -10961,30 +10957,33 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     The required <replaceable>row_expression</replaceable> argument is an XPath
-     expression that is evaluated against the supplied XML document to
-     obtain an ordered sequence of XML nodes. This sequence is what
-     <function>xmltable</function> transforms into output rows.
+     The required <replaceable>row_expression</replaceable> argument is
+     an XPath 1.0 expression that is evaluated, passing the
+     <replaceable>document_expression</replaceable> as its context item, to
+     obtain a set of XML nodes. These nodes are what
+     <function>xmltable</function> transforms into output rows. No rows
+     will be produced if the <replaceable>document_expression</replaceable>
+     is null, or the <replaceable>row_expression</replaceable> produces an
+     empty node-set, or any value other than a node-set.
     </para>
 
     <para>
-     <replaceable>document_expression</replaceable> provides the XML document to
-     operate on.
-     The argument must be a well-formed XML document; fragments/forests
-     are not accepted.
+     <replaceable>document_expression</replaceable> provides the context item
+     for the <replaceable>row_expression</replaceable>. It must be a well-formed
+     XML document; fragments/forests are not accepted.
      The <literal>BY REF</literal> or <literal>BY VALUE</literal> clauses are
-     accepted, as described for the <function>xmlexists</function> predicate,
-     but ignored; PostgreSQL currently passes XML by value always.
+     accepted but ignored, as discussed in
+     <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, an <function>xmltable</function> construct first
+     appears in SQL:2006 and evaluates expressions in the XML Query language,
+     but this implementation allows only XPath 1.0 expressions, as discussed
+     in <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
 
     <para>
      The mandatory <literal>COLUMNS</literal> clause specifies the list
      of columns in the output table.
-     If the <literal>COLUMNS</literal> clause is omitted, the rows in the result
-     set contain a single column of type <literal>xml</literal> containing the
-     data matched by <replaceable>row_expression</replaceable>.
-     If <literal>COLUMNS</literal> is specified, each entry describes a
-     single column.
+     Each entry describes a single column.
      See the syntax summary above for the format.
      The column name and type are required; the path, default and
      nullability clauses are optional.
@@ -10992,48 +10991,93 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
 
     <para>
      A column marked <literal>FOR ORDINALITY</literal> will be populated
-     with row numbers matching the order in which the
-     output rows appeared in the original input XML document.
+     with row numbers, starting with 1, in the order of nodes retrieved from
+     the <replaceable>row_expression</replaceable>'s result node-set.
      At most one column may be marked <literal>FOR ORDINALITY</literal>.
     </para>
 
+    <note>
+     <para>
+      XPath 1.0 does not specify an order for nodes in a node-set, so code that
+      relies on a particular order here will be implementation-dependent.
+      Details for this implementation can be found in
+      <xref linkend="xml-xpath-1-specifics"/>.
+     </para>
+    </note>
+
     <para>
-     The <literal>column_expression</literal> for a column is an XPath expression
-     that is evaluated for each row, relative to the result of the
-     <replaceable>row_expression</replaceable>, to find the value of the column.
+     The <literal>column_expression</literal> for a column is an XPath 1.0
+     expression
+     that is evaluated for each row, with the current node from the
+     <replaceable>row_expression</replaceable> result as its context item,
+     to find the value of the column.
      If no <literal>column_expression</literal> is given, then the column name
      is used as an implicit path.
     </para>
 
     <para>
-     If a column's XPath expression returns multiple elements, an error
-     is raised.
-     If the expression matches an empty tag, the result is an
-     empty string (not <literal>NULL</literal>).
-     Any <literal>xsi:nil</literal> attributes are ignored.
+     If a column's XPath expression returns a non-XML value (limited to
+     string, boolean, or double in XPath 1.0) and the column has a
+     PostgreSQL type other than <type>xml</type>, the column will be set
+     as if by assigning the value's string representation to the PostgreSQL
+     type (adjusting the <quote>string representation</quote> of a boolean to
+     <literal>1</literal> or <literal>0</literal> if the target column type
+     category is numeric, otherwise <literal>true</literal> or
+     <literal>false</literal>).
     </para>
 
     <para>
-     The text body of the XML matched by the <replaceable>column_expression</replaceable>
-     is used as the column value. Multiple <literal>text()</literal> nodes
-     within an element are concatenated in order. Any child elements,
-     processing instructions, and comments are ignored, but the text contents
-     of child elements are concatenated to the result.
+     If the column's expression returns a non-empty set of XML nodes
+     and the target column's type is <type>xml</type>, the column will
+     be assigned the expression result exactly, if it is of document or
+     content form.
+     <footnote>
+      <para>
+       A result containing more than one element node at the top level, or
+       non-whitespace text outside of an element, is an example of content form.
+       An XPath result can be of neither form, for example if it returns an
+       attribute node selected from the element that contains it. Such a result
+       will be put into content form with each such disallowed node replaced by
+       its string value, as defined for the XPath 1.0
+       <function>string</function> function.
+      </para>
+     </footnote>
+    </para>
+
+    <para>
+     A non-XML result assigned to an <type>xml</type> output column produces
+     content, a single text node with the string value of the result.
+     An XML result assigned to a column of any other type may not have more than
+     one node, or an error is raised. If there is exactly one node, the column
+     will be set as if by assigning the node's string
+     value (as defined for the XPath 1.0 <function>string</function> function)
+     to the PostgreSQL type.
+    </para>
+
+    <para>
+     The string value of an XML element is the concatenation, in document order,
+     of all text nodes contained in that element and its descendants. The string
+     value of an element with no descendant text nodes is an
+     empty string (not <literal>NULL</literal>).
+     Any <literal>xsi:nil</literal> attributes are ignored.
      Note that the whitespace-only <literal>text()</literal> node between two non-text
      elements is preserved, and that leading whitespace on a <literal>text()</literal>
      node is not flattened.
+     The XPath 1.0 <function>string</function> function may be consulted for the
+     rules defining the string value of other XML node types and non-XML values.
+    </para>
+
+    <para>
+     The conversion rules presented here are not exactly those of the SQL
+     standard, as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
 
     <para>
-     If the path expression does not match for a given row but
-     <replaceable>default_expression</replaceable> is specified, the value resulting
-     from evaluating that expression is used.
-     If no <literal>DEFAULT</literal> clause is given for the column,
-     the field will be set to <literal>NULL</literal>.
-     It is possible for a <replaceable>default_expression</replaceable> to reference
-     the value of output columns that appear prior to it in the column list,
-     so the default of one column may be based on the value of another
-     column.
+     If the path expression returns an empty node-set
+     (typically, when it does not match)
+     for a given row, the column will be set to <literal>NULL</literal>, unless
+     a <replaceable>default_expression</replaceable> is specified; then the
+     value resulting from evaluating that expression is used.
     </para>
 
     <para>
@@ -11045,20 +11089,14 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     Unlike regular PostgreSQL functions, <replaceable>column_expression</replaceable>
-     and <replaceable>default_expression</replaceable> are not evaluated to a simple
-     value before calling the function.
-     <replaceable>column_expression</replaceable> is normally evaluated
-     exactly once per input row, and <replaceable>default_expression</replaceable>
-     is evaluated each time a default is needed for a field.
-     If the expression qualifies as stable or immutable the repeat
+     A <replaceable>default_expression</replaceable>, rather than being
+     evaluated immediately when <function>xmltable</function> is called,
+     is evaluated each time a default is needed for the column.
+     If the expression qualifies as stable or immutable, the repeat
      evaluation may be skipped.
-     Effectively <function>xmltable</function> behaves more like a subquery than a
-     function call.
      This means that you can usefully use volatile functions like
-     <function>nextval</function> in <replaceable>default_expression</replaceable>, and
-     <replaceable>column_expression</replaceable> may depend on other parts of the
-     XML document.
+     <function>nextval</function> in
+     <replaceable>default_expression</replaceable>.
     </para>
 
     <para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index bade0fe..32908c1 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -593,7 +593,7 @@ X085	Predefined namespace prefixes			NO
 X086	XML namespace declarations in XMLTable			NO	
 X090	XML document predicate			YES	
 X091	XML content predicate			NO	
-X096	XMLExists			NO	XPath only
+X096	XMLExists			NO	XPath 1.0 only
 X100	Host language support for XML: CONTENT option			NO	
 X101	Host language support for XML: DOCUMENT option			NO	
 X110	Host language support for XML: VARCHAR mapping			NO	
@@ -661,11 +661,11 @@ X282	XMLValidate with CONTENT option			NO
 X283	XMLValidate with SEQUENCE option			NO	
 X284	XMLValidate: NAMESPACE without ELEMENT clause			NO	
 X286	XMLValidate: NO NAMESPACE with ELEMENT clause			NO	
-X300	XMLTable			NO	XPath only
+X300	XMLTable			NO	XPath 1.0 only
 X301	XMLTable: derived column list option			YES	
 X302	XMLTable: ordinality column option			YES	
 X303	XMLTable: column default option			YES	
-X304	XMLTable: passing a context item			YES	
+X304	XMLTable: passing a context item			YES	must be XML DOCUMENT
 X305	XMLTable: initializing an XQuery variable			NO	
 X400	Name and identifier mapping			YES	
 X410	Alter column data type: XML type			YES

#41

Ryan Lambert

ryan@rustprooflabs.com

almost 7 years ago

In reply to: Chapman Flack (#40)

Re: Fix XML handling with DOCTYPE

I applied and reviewed xml-functions-type-docfix-6.patch. Looks good to me.

I like the standardization (e.g. libxml2, node-set) and I didn't catch any
spots that used the other versions. I agree that the <note> is appropriate
for that block.
It also looks like you incorporated Alvaro's feedback about sorting, or the
lack thereof.

Let me know if there's anything else I can do to help get this accepted.
Thanks,

Ryan

On Thu, Mar 28, 2019 at 5:45 PM Chapman Flack <chap@anastigmatix.net> wrote:

Show quoted text

On 03/27/19 19:27, Chapman Flack wrote:

A column marked FOR ORDINALITY will be populated with row numbers
matching the order in which the output rows appeared in the original
input XML document.

I've been skimming right over it all this time, and that right there is
a glaring built-in reliance on the observable-but-disclaimed iteration
order of a libxml2 node-set.

So, xml-functions-type-docfix-6.patch.

I changed that language to say "populated with row numbers, starting
with 1, in the order of nodes retrieved from the row_expression's
result node-set."

That's not such a terrible thing to have to say; in fact, it's the
*correct* description for the standard, XQuery-based, XMLTABLE (where
the language gives you control of the result sequence's order).

I followed that with a short note saying since XPath 1.0 doesn't
specify that order, relying on it is implementation-dependent, and
linked to the existing Appendix D discussion.

I would have like to link directly to the <listitem>, but of course
<xref> doesn't know what to call that, so I linked to the <sect3>
instead.

Regards,
-Chap

#42

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Chapman Flack (#40)

Re: Fix XML handling with DOCTYPE

Chapman Flack <chap@anastigmatix.net> writes:

So, xml-functions-type-docfix-6.patch.

Pushed with some light(?) copy-editing.

I believe this closes out everything discussed in

https://commitfest.postgresql.org/22/1872/

but I haven't gone through all three threads in detail.
Please confirm whether that CF entry can be closed or not.

regards, tom lane

#43

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Tom Lane (#42)

Re: Fix XML handling with DOCTYPE

On 4/1/19 4:22 PM, Tom Lane wrote:

Chapman Flack <chap@anastigmatix.net> writes:

So, xml-functions-type-docfix-6.patch.

Pushed with some light(?) copy-editing.

I believe this closes out everything discussed in

https://commitfest.postgresql.org/22/1872/

but I haven't gone through all three threads in detail.
Please confirm whether that CF entry can be closed or not.

I think that does wrap up everything in the CF entry. Thanks!
And thanks for the copy-edits; they do read better than what
I came up with.

When I get a moment, I'll update the PostgreSQL vs. SQL/XML wiki page
to reflect the things that were fixed.

Regards,
-Chap

#44

Alvaro Herrera

alvherre@2ndquadrant.com

almost 7 years ago

In reply to: Chapman Flack (#43)

Re: Fix XML handling with DOCTYPE

On 2019-Apr-01, Chapman Flack wrote:

When I get a moment, I'll update the PostgreSQL vs. SQL/XML wiki page
to reflect the things that were fixed.

I think there were some outright bugs in the docs, at least for
XMLTABLE, that maybe we should backpatch. If you have the energy to
cherry-pick a minimal doc update to 10/11, I offer to back-patch it.

Thanks everyone for taking care of this!

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#45

Chapman Flack

chap@anastigmatix.net

almost 7 years ago

In reply to: Alvaro Herrera (#44)

Re: Fix XML handling with DOCTYPE

On 04/01/19 17:34, Alvaro Herrera wrote:

I think there were some outright bugs in the docs, at least for
XMLTABLE, that maybe we should backpatch. If you have the energy to
cherry-pick a minimal doc update to 10/11, I offer to back-patch it.

I'll see what I can do. There's breathing room for that after the end of
the CF, right?

It seems to me that the conformance-appendix part is worth using,
along with all of the clarifications in datatype.sgml and func.sgml
except the ones clarifying fixed behavior, where the behavior fix
wasn't backpatched. That'll be where the cherry-picking effort lies.

Regards,
-Chap

#46

Chapman Flack

chap@anastigmatix.net

over 6 years ago

In reply to: Alvaro Herrera (#44)

2 attachment(s)

Re: Fix XML handling with DOCTYPE

On 04/01/19 17:34, Alvaro Herrera wrote:

I think there were some outright bugs in the docs, at least for
XMLTABLE, that maybe we should backpatch. If you have the energy to
cherry-pick a minimal doc update to 10/11, I offer to back-patch it.

I don't know if this fits your intention for "minimal". What I've done
is taken the doc commit made by Tom for 12 (12d46a), then revised it
so it describes the unfixed behavior for the bugs whose fixes weren't
backpatched to 11 or 10.

I don't know if it's too late to get in the upcoming minor releases,
but maybe it can, if it looks ok, or the next ones, if that's too rushed.

11.patch applies cleanly to 11, 10.patch to 10.

I've confirmed the 11 docs build successfully, but without sgml tools,
I haven't confirmed that for 10.

Regards,
-Chap

Attachments:

11.patchtext/x-patch; name=11.patchDownload

From f55fb4daa47ed249e87bc417b111e842403fc1a9 Mon Sep 17 00:00:00 2001
From: nobody <nobody@halvard.anastigmatix.net>
Date: Fri, 2 Aug 2019 22:47:10 -0400
Subject: [PATCH] Improve documentation about our XML functionality.

Add a section explaining how our XML features depart from current
versions of the SQL standard.  Update and clarify the descriptions
of some XML functions.

Chapman Flack, reviewed by Ryan Lambert

Discussion: https://postgr.es/m/5BD1284C.1010305@anastigmatix.net
Discussion: https://postgr.es/m/5C81F8C0.6090901@anastigmatix.net
Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com

This version for backpatching PG 11 and 10, taken from Tom's commit
for 12, then edited to correctly describe behaviors that are fixed
in 12 but still broken in 11 and 10.
---
 doc/src/sgml/datatype.sgml           |   5 +
 doc/src/sgml/features.sgml           | 381 ++++++++++++++++++++++++++++++++++-
 doc/src/sgml/func.sgml               | 184 +++++++++--------
 src/backend/catalog/sql_features.txt |   6 +-
 4 files changed, 490 insertions(+), 86 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 401a2f0..fa505e0 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4228,6 +4228,11 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
     value is a full document or only a content fragment.
    </para>
 
+   <para>
+    Limits and compatibility notes for the <type>xml</type> data type
+    can be found in <xref linkend="xml-limits-conformance"/>.
+   </para>
+
    <sect2>
     <title>Creating XML Values</title>
    <para>
diff --git a/doc/src/sgml/features.sgml b/doc/src/sgml/features.sgml
index 6c22d69..253ec87 100644
--- a/doc/src/sgml/features.sgml
+++ b/doc/src/sgml/features.sgml
@@ -16,7 +16,8 @@
   Language SQL</quote>.  A revised version of the standard is released
   from time to time; the most recent update appearing in 2011.
   The 2011 version is referred to as ISO/IEC 9075:2011, or simply as SQL:2011.
-  The versions prior to that were SQL:2008, SQL:2003, SQL:1999, and SQL-92.  Each version
+  The versions prior to that were SQL:2008, SQL:2006, SQL:2003, SQL:1999,
+  and SQL-92.  Each version
   replaces the previous one, so claims of conformance to earlier
   versions have no official merit.
   <productname>PostgreSQL</productname> development aims for
@@ -155,4 +156,382 @@
    </para>
   </sect1>
 
+  <sect1 id="xml-limits-conformance">
+   <title>XML Limits and Conformance to SQL/XML</title>
+
+   <indexterm>
+    <primary>SQL/XML</primary>
+    <secondary>limits and conformance</secondary>
+   </indexterm>
+
+   <para>
+    Significant revisions to the XML-related specifications in ISO/IEC 9075-14
+    (SQL/XML) were introduced with SQL:2006.
+    <productname>PostgreSQL</productname>'s implementation of the XML data
+    type and related functions largely follows the earlier 2003 edition,
+    with some borrowing from later editions.  In particular:
+    <itemizedlist>
+     <listitem>
+      <para>
+       Where the current standard provides a family of XML data types
+       to hold <quote>document</quote> or <quote>content</quote> in
+       untyped or XML Schema-typed variants, and a type
+       <type>XML(SEQUENCE)</type> to hold arbitrary pieces of XML content,
+       <productname>PostgreSQL</productname> provides the single
+       <type>xml</type> type, which can hold <quote>document</quote> or
+       <quote>content</quote>.  There is no equivalent of the
+       standard's <quote>sequence</quote> type.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       <productname>PostgreSQL</productname> provides two functions
+       introduced in SQL:2006, but in variants that use the XPath 1.0
+       language, rather than XML Query as specified for them in the
+       standard.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    This section presents some of the resulting differences you may encounter.
+   </para>
+
+   <sect2 id="functions-xml-limits-xpath1">
+    <title>Queries are restricted to XPath 1.0</title>
+
+    <para>
+     The <productname>PostgreSQL</productname>-specific functions
+     <function>xpath()</function> and <function>xpath_exists()</function>
+     query XML documents using the XPath language.
+     <productname>PostgreSQL</productname> also provides XPath-only variants
+     of the standard functions <function>XMLEXISTS</function> and
+     <function>XMLTABLE</function>, which officially use
+     the XQuery language. For all of these functions,
+     <productname>PostgreSQL</productname> relies on the
+     <application>libxml2</application> library, which provides only XPath 1.0.
+    </para>
+
+    <para>
+     There is a strong connection between the XQuery language and XPath
+     versions 2.0 and later: any expression that is syntactically valid and
+     executes successfully in both produces the same result (with a minor
+     exception for expressions containing numeric character references or
+     predefined entity references, which XQuery replaces with the
+     corresponding character while XPath leaves them alone).  But there is
+     no such connection between these languages and XPath 1.0; it was an
+     earlier language and differs in many respects.
+    </para>
+
+    <para>
+     There are two categories of limitation to keep in mind: the restriction
+     from XQuery to XPath for the functions specified in the SQL standard, and
+     the restriction of XPath to version 1.0 for both the standard and the
+     <productname>PostgreSQL</productname>-specific functions.
+    </para>
+
+    <sect3>
+     <title>Restriction of XQuery to XPath</title>
+
+     <para>
+      Features of XQuery beyond those of XPath include:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         XQuery expressions can construct and return new XML nodes, in
+         addition to all possible XPath values.  XPath can create and return
+         values of the atomic types (numbers, strings, and so on) but can
+         only return XML nodes that were already present in documents
+         supplied as input to the expression.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery has control constructs for iteration, sorting, and grouping.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery allows declaration and use of local functions.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      Recent XPath versions begin to offer capabilities overlapping with
+      these (such as functional-style <function>for-each</function> and
+      <function>sort</function>, anonymous functions, and
+      <function>parse-xml</function> to create a node from a string),
+      but such features were not available before XPath 3.0.
+     </para>
+    </sect3>
+
+    <sect3 id="xml-xpath-1-specifics">
+     <title>Restriction of XPath to 1.0</title>
+
+     <para>
+      For developers familiar with XQuery and XPath 2.0 or later, XPath 1.0
+      presents a number of differences to contend with:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         The fundamental type of an XQuery/XPath expression, the
+         <type>sequence</type>, which can contain XML nodes, atomic values,
+         or both, does not exist in XPath 1.0. A 1.0 expression can only
+         produce a node-set (containing zero or more XML nodes), or a single
+         atomic value.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          Unlike an XQuery/XPath sequence, which can contain any desired
+          items in any desired order, an XPath 1.0 node-set has no
+          guaranteed order and, like any set, does not allow multiple
+          appearances of the same item.
+         <note>
+          <para>
+           The <application>libxml2</application> library does seem to
+           always return node-sets to <productname>PostgreSQL</productname>
+           with their members in the same relative order they had in the
+           input document.  Its documentation does not commit to this
+           behavior, and an XPath 1.0 expression cannot control it.
+          </para>
+         </note>
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         While XQuery/XPath provides all of the types defined in XML Schema
+         and many operators and functions over those types, XPath 1.0 has only
+         node-sets and the three atomic types <type>boolean</type>,
+         <type>double</type>, and <type>string</type>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no conditional operator. An XQuery/XPath expression
+         such as <literal>if ( hat ) then hat/@size else "no hat"</literal>
+         has no XPath 1.0 equivalent.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no ordering comparison operator for strings. Both
+         <literal>"cat" &lt; "dog"</literal> and
+         <literal>"cat" &gt; "dog"</literal> are false, because each is a
+         numeric comparison of two <literal>NaN</literal>s. In contrast,
+         <literal>=</literal> and <literal>!=</literal> do compare the strings
+         as strings.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 blurs the distinction between
+         <firstterm>value comparisons</firstterm> and
+         <firstterm>general comparisons</firstterm> as XQuery/XPath define
+         them.  Both <literal>sale/@hatsize = 7</literal> and
+         <literal>sale/@customer = "alice"</literal> are existentially
+         quantified comparisons, true if there is
+         any <literal>sale</literal> with the given value for the
+         attribute, but <literal>sale/@taxable = false()</literal> is a
+         value comparison to the
+         <firstterm>effective boolean value</firstterm> of a whole node-set.
+         It is true only if no <literal>sale</literal> has
+         a <literal>taxable</literal> attribute at all.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         In the XQuery/XPath data model, a <firstterm>document
+         node</firstterm> can have either document form (i.e., exactly one
+         top-level element, with only comments and processing instructions
+         outside of it) or content form (with those constraints
+         relaxed). Its equivalent in XPath 1.0, the
+         <firstterm>root node</firstterm>, can only be in document form.
+         This is part of the reason an <type>xml</type> value passed as the
+         context item to any <productname>PostgreSQL</productname>
+         XPath-based function must be in document form.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      The differences highlighted here are not all of them. In XQuery and
+      the 2.0 and later versions of XPath, there is an XPath 1.0 compatibility
+      mode, and the W3C lists of
+      <ulink url='https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility'>function library changes</ulink>
+      and
+      <ulink url='https://www.w3.org/TR/xpath20/#id-backwards-compatibility'>language changes</ulink>
+      applied in that mode offer a more complete (but still not exhaustive)
+      account of the differences.  The compatibility mode cannot make the
+      later languages exactly equivalent to XPath 1.0.
+     </para>
+    </sect3>
+
+    <sect3 id="functions-xml-limits-casts">
+     <title>Mappings between SQL and XML data types and values</title>
+
+     <para>
+      In SQL:2006 and later, both directions of conversion between standard SQL
+      data types and the XML Schema types are specified precisely. However, the
+      rules are expressed using the types and semantics of XQuery/XPath, and
+      have no direct application to the different data model of XPath 1.0.
+     </para>
+
+     <para>
+      When <productname>PostgreSQL</productname> maps SQL data values to XML
+      (as in <function>xmlelement</function>), or XML to SQL (as in the output
+      columns of <function>xmltable</function>), except for a few cases
+      treated specially, <productname>PostgreSQL</productname> simply assumes
+      that the XML data type's XPath 1.0 string form will be valid as the
+      text-input form of the SQL datatype, and conversely. This rule has the
+      virtue of simplicity while producing, for many data types, results similar
+      to the mappings specified in the standard. In this release,
+      an explicit cast is needed if an <function>xmltable</function> column
+      expression produces a boolean or double value; see
+      <xref linkend="functions-xml-limits-postgresql"/>.
+     </para>
+
+     <para>
+      Where interoperability with other systems is a concern, for some data
+      types, it may be necessary to use data type formatting functions (such
+      as those in <xref linkend="functions-formatting"/>) explicitly to
+      produce the standard mappings.
+     </para>
+    </sect3>
+   </sect2>
+
+   <sect2 id="functions-xml-limits-postgresql">
+    <title>
+     Incidental limits of the implementation
+    </title>
+
+    <para>
+     This section concerns limits that are not inherent in the
+     <application>libxml2</application> library, but apply to the current
+     implementation in <productname>PostgreSQL</productname>.
+    </para>
+
+    <sect3>
+     <title>
+      Cast needed for <function>xmltable</function> column
+      of boolean or double type
+     </title>
+
+     <para>
+      An <function>xmltable</function> column expression evaluating to an XPath
+      boolean or number result will produce an <quote>unexpected XPath object
+      type</quote> error. The workaround is to rewrite the column expression to
+      be inside the XPath <function>string</function> function;
+      <productname>PostgreSQL</productname> will then assign the string value
+      successfully to an SQL output column of boolean or double type.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>
+      Column path result or SQL result column of XML type
+     </title>
+
+     <para>
+      In this release, a <function>xmltable</function> column expression
+      that evaluates to an XML node-set can be assigned to an SQL result
+      column of XML type, producing a concatenation of: for most types of
+      node in the node-set, a text node containing the XPath 1.0
+      <firstterm>string-value</firstterm> of the node, but for an element node,
+      a copy of the node itself. Such a node-set may be assigned to an SQL
+      column of non-XML type only if the node-set has a single node, with the
+      string-value of most node types replaced with an empty string, the
+      string-value of an element node replaced with a concatenation of only its
+      direct text-node children (excluding those of descendants), and the
+      string-value of a text or attribute node being as defined in XPath 1.0.
+      An XPath string value assigned to a result column of XML type must be
+      parsable as XML.
+     </para>
+
+     <para>
+      It is best not to develop code that relies on these behaviors, which have
+      little resemblance to the spec, and are changed in
+      <productname>PostgreSQL 12</productname>.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>Only <literal>BY VALUE</literal> passing mechanism is supported</title>
+
+     <para>
+      The SQL standard defines two <firstterm>passing mechanisms</firstterm>
+      that apply when passing an XML argument from SQL to an XML function or
+      receiving a result: <literal>BY REF</literal>, in which a particular XML
+      value retains its node identity, and <literal>BY VALUE</literal>, in which
+      the content of the XML is passed but node identity is not preserved. A
+      mechanism can be specified before a list of parameters, as the default
+      mechanism for all of them, or after any parameter, to override the
+      default.
+     </para>
+
+     <para>
+      To illustrate the difference, if
+      <replaceable>x</replaceable> is an XML value, these two queries in
+      an SQL:2006 environment would produce true and false, respectively:
+
+<programlisting>
+SELECT XMLQUERY('$a is $b' PASSING BY REF <replaceable>x</replaceable> AS a, <replaceable>x</replaceable> AS b NULL ON EMPTY);
+SELECT XMLQUERY('$a is $b' PASSING BY VALUE <replaceable>x</replaceable> AS a, <replaceable>x</replaceable> AS b NULL ON EMPTY);
+</programlisting>
+     </para>
+
+     <para>
+      In this release, <productname>PostgreSQL</productname> will accept
+      <literal>BY REF</literal> in an
+      <function>XMLEXISTS</function> or <function>XMLTABLE</function>
+      construct, but will ignore it.  The <type>xml</type> data type holds
+      a character-string serialized representation, so there is no node
+      identity to preserve, and passing is always effectively <literal>BY
+      VALUE</literal>.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>Cannot pass named parameters to queries</title>
+
+     <para>
+      The XPath-based functions support passing one parameter to serve as the
+      XPath expression's context item, but do not support passing additional
+      values to be available to the expression as named parameters.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>No <type>XML(SEQUENCE)</type> type</title>
+
+     <para>
+      The <productname>PostgreSQL</productname> <type>xml</type> data type
+      can only hold a value in <literal>DOCUMENT</literal>
+      or <literal>CONTENT</literal> form.  An XQuery/XPath expression
+      context item must be a single XML node or atomic value, but XPath 1.0
+      further restricts it to be only an XML node, and has no node type
+      allowing <literal>CONTENT</literal>.  The upshot is that a
+      well-formed <literal>DOCUMENT</literal> is the only form of XML value
+      that <productname>PostgreSQL</productname> can supply as an XPath
+      context item.
+     </para>
+    </sect3>
+   </sect2>
+  </sect1>
+
  </appendix>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cfa1e78..bec1b87 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10045,16 +10045,25 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 
 
  <sect1 id="functions-xml">
+
   <title>XML Functions</title>
 
+  <indexterm>
+   <primary>XML Functions</primary>
+  </indexterm>
+
   <para>
    The functions and function-like expressions described in this
-   section operate on values of type <type>xml</type>.  Check <xref
+   section operate on values of type <type>xml</type>.  See <xref
    linkend="datatype-xml"/> for information about the <type>xml</type>
    type.  The function-like expressions <function>xmlparse</function>
    and <function>xmlserialize</function> for converting to and from
-   type <type>xml</type> are not repeated here.  Use of most of these
-   functions requires the installation to have been built
+   type <type>xml</type> are documented there, not in this section.
+  </para>
+
+  <para>
+   Use of most of these functions
+   requires <productname>PostgreSQL</productname> to have been built
    with <command>configure --with-libxml</command>.
   </para>
 
@@ -10249,8 +10258,8 @@ SELECT xmlelement(name foo, xmlattributes('xyz' as bar),
      encoding, depending on the setting of the configuration parameter
      <xref linkend="guc-xmlbinary"/>.  The particular behavior for
      individual data types is expected to evolve in order to align the
-     SQL and PostgreSQL data types with the XML Schema specification,
-     at which point a more precise description will appear.
+     PostgreSQL mappings with those specified in SQL:2006 and later,
+     as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
    </sect3>
 
@@ -10492,10 +10501,13 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 </synopsis>
 
     <para>
-     The function <function>xmlexists</function> returns true if the
-     XPath expression in the first argument returns any nodes, and
-     false otherwise.  (If either argument is null, the result is
-     null.)
+     The function <function>xmlexists</function> evaluates an XPath 1.0
+     expression (the first argument), with the passed XML value as its context
+     item.  The function returns false if the result of that evaluation
+     yields an empty node-set, true if it yields any other value.  The
+     function returns null if any argument is null.  A nonnull value
+     passed as the context item must be an XML document, not a content
+     fragment or any non-XML value.
     </para>
 
     <para>
@@ -10511,14 +10523,14 @@ SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY REF '<towns><town>Tor
     </para>
 
     <para>
-     The <literal>BY REF</literal> clauses have no effect in
-     PostgreSQL, but are allowed for SQL conformance and compatibility
-     with other implementations.  Per SQL standard, the
-     first <literal>BY REF</literal> is required, the second is
-     optional.  Also note that the SQL standard specifies
-     the <function>xmlexists</function> construct to take an XQuery
-     expression as first argument, but PostgreSQL currently only
-     supports XPath, which is a subset of XQuery.
+     The <literal>BY REF</literal> clauses
+     are accepted in <productname>PostgreSQL</productname>, but are ignored,
+     as discussed in <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, the <function>xmlexists</function> function
+     evaluates an expression in the XML Query language,
+     but <productname>PostgreSQL</productname> allows only an XPath 1.0
+     expression, as discussed in
+     <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
    </sect3>
 
@@ -10624,12 +10636,12 @@ SELECT xml_is_well_formed_document('<pg:foo xmlns:pg="http://postgresql.org/stuf
 </synopsis>
 
     <para>
-     The function <function>xpath</function> evaluates the XPath
+     The function <function>xpath</function> evaluates the XPath 1.0
      expression <replaceable>xpath</replaceable> (a <type>text</type> value)
      against the XML value
      <replaceable>xml</replaceable>.  It returns an array of XML values
-     corresponding to the node set produced by the XPath expression.
-     If the XPath expression returns a scalar value rather than a node set,
+     corresponding to the node-set produced by the XPath expression.
+     If the XPath expression returns a scalar value rather than a node-set,
      a single-element array is returned.
     </para>
 
@@ -10691,9 +10703,10 @@ SELECT xpath('//mydefns:b/text()', '<a xmlns="http://example.com"><b>test</b></a
     <para>
      The function <function>xpath_exists</function> is a specialized form
      of the <function>xpath</function> function.  Instead of returning the
-     individual XML values that satisfy the XPath, this function returns a
-     Boolean indicating whether the query was satisfied or not.  This
-     function is equivalent to the standard <literal>XMLEXISTS</literal> predicate,
+     individual XML values that satisfy the XPath 1.0 expression, this function
+     returns a Boolean indicating whether the query was satisfied or not
+     (specifically, whether it produced any value other than an empty node-set).
+     This function is equivalent to the <literal>XMLEXISTS</literal> predicate,
      except that it also offers support for a namespace mapping argument.
     </para>
 
@@ -10734,8 +10747,8 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
 
     <para>
      The <function>xmltable</function> function produces a table based
-     on the given XML value, an XPath filter to extract rows, and an
-     optional set of column definitions.
+     on the given XML value, an XPath filter to extract rows, and a
+     set of column definitions.
     </para>
 
     <para>
@@ -10746,30 +10759,34 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     The required <replaceable>row_expression</replaceable> argument is an XPath
-     expression that is evaluated against the supplied XML document to
-     obtain an ordered sequence of XML nodes. This sequence is what
-     <function>xmltable</function> transforms into output rows.
+     The required <replaceable>row_expression</replaceable> argument is
+     an XPath 1.0 expression that is evaluated, passing the
+     <replaceable>document_expression</replaceable> as its context item, to
+     obtain a set of XML nodes. These nodes are what
+     <function>xmltable</function> transforms into output rows. No rows
+     will be produced if the <replaceable>document_expression</replaceable>
+     is null, nor if the <replaceable>row_expression</replaceable> produces
+     an empty node-set or any value other than a node-set.
     </para>
 
     <para>
-     <replaceable>document_expression</replaceable> provides the XML document to
-     operate on.
-     The <literal>BY REF</literal> clauses have no effect in PostgreSQL,
-     but are allowed for SQL conformance and compatibility with other
-     implementations.
-     The argument must be a well-formed XML document; fragments/forests
-     are not accepted.
+     <replaceable>document_expression</replaceable> provides the context
+     item for the <replaceable>row_expression</replaceable>. It must be a
+     well-formed XML document; fragments/forests are not accepted.
+     The <literal>BY REF</literal> clauses
+     are accepted but ignored, as discussed in
+     <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, the <function>xmltable</function> function
+     evaluates expressions in the XML Query language,
+     but <productname>PostgreSQL</productname> allows only XPath 1.0
+     expressions, as discussed in
+     <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
 
     <para>
      The mandatory <literal>COLUMNS</literal> clause specifies the list
      of columns in the output table.
-     If the <literal>COLUMNS</literal> clause is omitted, the rows in the result
-     set contain a single column of type <literal>xml</literal> containing the
-     data matched by <replaceable>row_expression</replaceable>.
-     If <literal>COLUMNS</literal> is specified, each entry describes a
-     single column.
+     Each entry describes a single column.
      See the syntax summary above for the format.
      The column name and type are required; the path, default and
      nullability clauses are optional.
@@ -10777,48 +10794,57 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
 
     <para>
      A column marked <literal>FOR ORDINALITY</literal> will be populated
-     with row numbers matching the order in which the
-     output rows appeared in the original input XML document.
+     with row numbers, starting with 1, in the order of nodes retrieved from
+     the <replaceable>row_expression</replaceable>'s result node-set.
      At most one column may be marked <literal>FOR ORDINALITY</literal>.
     </para>
 
+    <note>
+     <para>
+      XPath 1.0 does not specify an order for nodes in a node-set, so code
+      that relies on a particular order of the results will be
+      implementation-dependent.  Details can be found in
+      <xref linkend="xml-xpath-1-specifics"/>.
+     </para>
+    </note>
+
     <para>
-     The <literal>column_expression</literal> for a column is an XPath expression
-     that is evaluated for each row, relative to the result of the
-     <replaceable>row_expression</replaceable>, to find the value of the column.
-     If no <literal>column_expression</literal> is given, then the column name
-     is used as an implicit path.
+     The <replaceable>column_expression</replaceable> for a column is an
+     XPath 1.0 expression that is evaluated for each row, with the current
+     node from the <replaceable>row_expression</replaceable> result as its
+     context item, to find the value of the column.  If
+     no <replaceable>column_expression</replaceable> is given, then the
+     column name is used as an implicit path.
     </para>
 
     <para>
-     If a column's XPath expression returns multiple elements, an error
-     is raised.
-     If the expression matches an empty tag, the result is an
-     empty string (not <literal>NULL</literal>).
-     Any <literal>xsi:nil</literal> attributes are ignored.
+     If a column's XPath expression returns a non-XML value (limited to
+     string, boolean, or double in XPath 1.0) and the column has a
+     PostgreSQL type other than <type>xml</type>, the column will be set
+     as if by assigning the value's string representation to the PostgreSQL
+     type. In this release, an XPath boolean or double result must be explicitly
+     cast to string (that is, the XPath 1.0 <function>string</function> function
+     wrapped around the original column expression);
+     <productname>PostgreSQL</productname> can then successfully assign the
+     string to an SQL result column of boolean or double type.
+     These conversion rules differ from those of the SQL
+     standard, as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
 
     <para>
-     The text body of the XML matched by the <replaceable>column_expression</replaceable>
-     is used as the column value. Multiple <literal>text()</literal> nodes
-     within an element are concatenated in order. Any child elements,
-     processing instructions, and comments are ignored, but the text contents
-     of child elements are concatenated to the result.
-     Note that the whitespace-only <literal>text()</literal> node between two non-text
-     elements is preserved, and that leading whitespace on a <literal>text()</literal>
-     node is not flattened.
+     In this release, SQL result columns of <type>xml</type> type, or
+     column XPath expressions evaluating to an XML type, regardless of the
+     output column SQL type, are handled as described in
+     <xref linkend="functions-xml-limits-postgresql"/>; the behavior
+     changes significantly in <productname>PostgreSQL 12</productname>.
     </para>
 
     <para>
-     If the path expression does not match for a given row but
-     <replaceable>default_expression</replaceable> is specified, the value resulting
-     from evaluating that expression is used.
-     If no <literal>DEFAULT</literal> clause is given for the column,
-     the field will be set to <literal>NULL</literal>.
-     It is possible for a <replaceable>default_expression</replaceable> to reference
-     the value of output columns that appear prior to it in the column list,
-     so the default of one column may be based on the value of another
-     column.
+     If the path expression returns an empty node-set
+     (typically, when it does not match)
+     for a given row, the column will be set to <literal>NULL</literal>, unless
+     a <replaceable>default_expression</replaceable> is specified; then the
+     value resulting from evaluating that expression is used.
     </para>
 
     <para>
@@ -10830,20 +10856,14 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     Unlike regular PostgreSQL functions, <replaceable>column_expression</replaceable>
-     and <replaceable>default_expression</replaceable> are not evaluated to a simple
-     value before calling the function.
-     <replaceable>column_expression</replaceable> is normally evaluated
-     exactly once per input row, and <replaceable>default_expression</replaceable>
-     is evaluated each time a default is needed for a field.
-     If the expression qualifies as stable or immutable the repeat
+     A <replaceable>default_expression</replaceable>, rather than being
+     evaluated immediately when <function>xmltable</function> is called,
+     is evaluated each time a default is needed for the column.
+     If the expression qualifies as stable or immutable, the repeat
      evaluation may be skipped.
-     Effectively <function>xmltable</function> behaves more like a subquery than a
-     function call.
      This means that you can usefully use volatile functions like
-     <function>nextval</function> in <replaceable>default_expression</replaceable>, and
-     <replaceable>column_expression</replaceable> may depend on other parts of the
-     XML document.
+     <function>nextval</function> in
+     <replaceable>default_expression</replaceable>.
     </para>
 
     <para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index aeb262a..915696e 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -593,7 +593,7 @@ X085	Predefined namespace prefixes			NO
 X086	XML namespace declarations in XMLTable			NO	
 X090	XML document predicate			YES	
 X091	XML content predicate			NO	
-X096	XMLExists			NO	XPath only
+X096	XMLExists			NO	XPath 1.0 only
 X100	Host language support for XML: CONTENT option			NO	
 X101	Host language support for XML: DOCUMENT option			NO	
 X110	Host language support for XML: VARCHAR mapping			NO	
@@ -661,11 +661,11 @@ X282	XMLValidate with CONTENT option			NO
 X283	XMLValidate with SEQUENCE option			NO	
 X284	XMLValidate: NAMESPACE without ELEMENT clause			NO	
 X286	XMLValidate: NO NAMESPACE with ELEMENT clause			NO	
-X300	XMLTable			NO	XPath only
+X300	XMLTable			NO	XPath 1.0 only
 X301	XMLTable: derived column list option			YES	
 X302	XMLTable: ordinality column option			YES	
 X303	XMLTable: column default option			YES	
-X304	XMLTable: passing a context item			YES	
+X304	XMLTable: passing a context item			YES	must be XML DOCUMENT
 X305	XMLTable: initializing an XQuery variable			NO	
 X400	Name and identifier mapping			YES	
 X410	Alter column data type: XML type			YES	
-- 
2.7.3

10.patchtext/x-patch; name=10.patchDownload

From 387631f79b0a7824d053296965f768efca14e30d Mon Sep 17 00:00:00 2001
From: nobody <nobody@halvard.anastigmatix.net>
Date: Fri, 2 Aug 2019 22:47:10 -0400
Subject: [PATCH] Improve documentation about our XML functionality.

Add a section explaining how our XML features depart from current
versions of the SQL standard.  Update and clarify the descriptions
of some XML functions.

Chapman Flack, reviewed by Ryan Lambert

Discussion: https://postgr.es/m/5BD1284C.1010305@anastigmatix.net
Discussion: https://postgr.es/m/5C81F8C0.6090901@anastigmatix.net
Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com

This version for backpatching PG 10, taken from Tom's commit
for 12, then edited to correctly describe behaviors that are fixed
in 12 but still broken in 10.
---
 doc/src/sgml/datatype.sgml           |   5 +
 doc/src/sgml/features.sgml           | 381 ++++++++++++++++++++++++++++++++++-
 doc/src/sgml/func.sgml               | 190 +++++++++--------
 src/backend/catalog/sql_features.txt |   6 +-
 4 files changed, 493 insertions(+), 89 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index cd5f5f0..6b9010f 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4228,6 +4228,11 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
     value is a full document or only a content fragment.
    </para>
 
+   <para>
+    Limits and compatibility notes for the <type>xml</type> data type
+    can be found in <xref linkend="xml-limits-conformance"/>.
+   </para>
+
    <sect2>
     <title>Creating XML Values</title>
    <para>
diff --git a/doc/src/sgml/features.sgml b/doc/src/sgml/features.sgml
index 6c22d69..253ec87 100644
--- a/doc/src/sgml/features.sgml
+++ b/doc/src/sgml/features.sgml
@@ -16,7 +16,8 @@
   Language SQL</quote>.  A revised version of the standard is released
   from time to time; the most recent update appearing in 2011.
   The 2011 version is referred to as ISO/IEC 9075:2011, or simply as SQL:2011.
-  The versions prior to that were SQL:2008, SQL:2003, SQL:1999, and SQL-92.  Each version
+  The versions prior to that were SQL:2008, SQL:2006, SQL:2003, SQL:1999,
+  and SQL-92.  Each version
   replaces the previous one, so claims of conformance to earlier
   versions have no official merit.
   <productname>PostgreSQL</productname> development aims for
@@ -155,4 +156,382 @@
    </para>
   </sect1>
 
+  <sect1 id="xml-limits-conformance">
+   <title>XML Limits and Conformance to SQL/XML</title>
+
+   <indexterm>
+    <primary>SQL/XML</primary>
+    <secondary>limits and conformance</secondary>
+   </indexterm>
+
+   <para>
+    Significant revisions to the XML-related specifications in ISO/IEC 9075-14
+    (SQL/XML) were introduced with SQL:2006.
+    <productname>PostgreSQL</productname>'s implementation of the XML data
+    type and related functions largely follows the earlier 2003 edition,
+    with some borrowing from later editions.  In particular:
+    <itemizedlist>
+     <listitem>
+      <para>
+       Where the current standard provides a family of XML data types
+       to hold <quote>document</quote> or <quote>content</quote> in
+       untyped or XML Schema-typed variants, and a type
+       <type>XML(SEQUENCE)</type> to hold arbitrary pieces of XML content,
+       <productname>PostgreSQL</productname> provides the single
+       <type>xml</type> type, which can hold <quote>document</quote> or
+       <quote>content</quote>.  There is no equivalent of the
+       standard's <quote>sequence</quote> type.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       <productname>PostgreSQL</productname> provides two functions
+       introduced in SQL:2006, but in variants that use the XPath 1.0
+       language, rather than XML Query as specified for them in the
+       standard.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    This section presents some of the resulting differences you may encounter.
+   </para>
+
+   <sect2 id="functions-xml-limits-xpath1">
+    <title>Queries are restricted to XPath 1.0</title>
+
+    <para>
+     The <productname>PostgreSQL</productname>-specific functions
+     <function>xpath()</function> and <function>xpath_exists()</function>
+     query XML documents using the XPath language.
+     <productname>PostgreSQL</productname> also provides XPath-only variants
+     of the standard functions <function>XMLEXISTS</function> and
+     <function>XMLTABLE</function>, which officially use
+     the XQuery language. For all of these functions,
+     <productname>PostgreSQL</productname> relies on the
+     <application>libxml2</application> library, which provides only XPath 1.0.
+    </para>
+
+    <para>
+     There is a strong connection between the XQuery language and XPath
+     versions 2.0 and later: any expression that is syntactically valid and
+     executes successfully in both produces the same result (with a minor
+     exception for expressions containing numeric character references or
+     predefined entity references, which XQuery replaces with the
+     corresponding character while XPath leaves them alone).  But there is
+     no such connection between these languages and XPath 1.0; it was an
+     earlier language and differs in many respects.
+    </para>
+
+    <para>
+     There are two categories of limitation to keep in mind: the restriction
+     from XQuery to XPath for the functions specified in the SQL standard, and
+     the restriction of XPath to version 1.0 for both the standard and the
+     <productname>PostgreSQL</productname>-specific functions.
+    </para>
+
+    <sect3>
+     <title>Restriction of XQuery to XPath</title>
+
+     <para>
+      Features of XQuery beyond those of XPath include:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         XQuery expressions can construct and return new XML nodes, in
+         addition to all possible XPath values.  XPath can create and return
+         values of the atomic types (numbers, strings, and so on) but can
+         only return XML nodes that were already present in documents
+         supplied as input to the expression.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery has control constructs for iteration, sorting, and grouping.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XQuery allows declaration and use of local functions.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      Recent XPath versions begin to offer capabilities overlapping with
+      these (such as functional-style <function>for-each</function> and
+      <function>sort</function>, anonymous functions, and
+      <function>parse-xml</function> to create a node from a string),
+      but such features were not available before XPath 3.0.
+     </para>
+    </sect3>
+
+    <sect3 id="xml-xpath-1-specifics">
+     <title>Restriction of XPath to 1.0</title>
+
+     <para>
+      For developers familiar with XQuery and XPath 2.0 or later, XPath 1.0
+      presents a number of differences to contend with:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+         The fundamental type of an XQuery/XPath expression, the
+         <type>sequence</type>, which can contain XML nodes, atomic values,
+         or both, does not exist in XPath 1.0. A 1.0 expression can only
+         produce a node-set (containing zero or more XML nodes), or a single
+         atomic value.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          Unlike an XQuery/XPath sequence, which can contain any desired
+          items in any desired order, an XPath 1.0 node-set has no
+          guaranteed order and, like any set, does not allow multiple
+          appearances of the same item.
+         <note>
+          <para>
+           The <application>libxml2</application> library does seem to
+           always return node-sets to <productname>PostgreSQL</productname>
+           with their members in the same relative order they had in the
+           input document.  Its documentation does not commit to this
+           behavior, and an XPath 1.0 expression cannot control it.
+          </para>
+         </note>
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         While XQuery/XPath provides all of the types defined in XML Schema
+         and many operators and functions over those types, XPath 1.0 has only
+         node-sets and the three atomic types <type>boolean</type>,
+         <type>double</type>, and <type>string</type>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no conditional operator. An XQuery/XPath expression
+         such as <literal>if ( hat ) then hat/@size else "no hat"</literal>
+         has no XPath 1.0 equivalent.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 has no ordering comparison operator for strings. Both
+         <literal>"cat" &lt; "dog"</literal> and
+         <literal>"cat" &gt; "dog"</literal> are false, because each is a
+         numeric comparison of two <literal>NaN</literal>s. In contrast,
+         <literal>=</literal> and <literal>!=</literal> do compare the strings
+         as strings.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         XPath 1.0 blurs the distinction between
+         <firstterm>value comparisons</firstterm> and
+         <firstterm>general comparisons</firstterm> as XQuery/XPath define
+         them.  Both <literal>sale/@hatsize = 7</literal> and
+         <literal>sale/@customer = "alice"</literal> are existentially
+         quantified comparisons, true if there is
+         any <literal>sale</literal> with the given value for the
+         attribute, but <literal>sale/@taxable = false()</literal> is a
+         value comparison to the
+         <firstterm>effective boolean value</firstterm> of a whole node-set.
+         It is true only if no <literal>sale</literal> has
+         a <literal>taxable</literal> attribute at all.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         In the XQuery/XPath data model, a <firstterm>document
+         node</firstterm> can have either document form (i.e., exactly one
+         top-level element, with only comments and processing instructions
+         outside of it) or content form (with those constraints
+         relaxed). Its equivalent in XPath 1.0, the
+         <firstterm>root node</firstterm>, can only be in document form.
+         This is part of the reason an <type>xml</type> value passed as the
+         context item to any <productname>PostgreSQL</productname>
+         XPath-based function must be in document form.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <para>
+      The differences highlighted here are not all of them. In XQuery and
+      the 2.0 and later versions of XPath, there is an XPath 1.0 compatibility
+      mode, and the W3C lists of
+      <ulink url='https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility'>function library changes</ulink>
+      and
+      <ulink url='https://www.w3.org/TR/xpath20/#id-backwards-compatibility'>language changes</ulink>
+      applied in that mode offer a more complete (but still not exhaustive)
+      account of the differences.  The compatibility mode cannot make the
+      later languages exactly equivalent to XPath 1.0.
+     </para>
+    </sect3>
+
+    <sect3 id="functions-xml-limits-casts">
+     <title>Mappings between SQL and XML data types and values</title>
+
+     <para>
+      In SQL:2006 and later, both directions of conversion between standard SQL
+      data types and the XML Schema types are specified precisely. However, the
+      rules are expressed using the types and semantics of XQuery/XPath, and
+      have no direct application to the different data model of XPath 1.0.
+     </para>
+
+     <para>
+      When <productname>PostgreSQL</productname> maps SQL data values to XML
+      (as in <function>xmlelement</function>), or XML to SQL (as in the output
+      columns of <function>xmltable</function>), except for a few cases
+      treated specially, <productname>PostgreSQL</productname> simply assumes
+      that the XML data type's XPath 1.0 string form will be valid as the
+      text-input form of the SQL datatype, and conversely. This rule has the
+      virtue of simplicity while producing, for many data types, results similar
+      to the mappings specified in the standard. In this release,
+      an explicit cast is needed if an <function>xmltable</function> column
+      expression produces a boolean or double value; see
+      <xref linkend="functions-xml-limits-postgresql"/>.
+     </para>
+
+     <para>
+      Where interoperability with other systems is a concern, for some data
+      types, it may be necessary to use data type formatting functions (such
+      as those in <xref linkend="functions-formatting"/>) explicitly to
+      produce the standard mappings.
+     </para>
+    </sect3>
+   </sect2>
+
+   <sect2 id="functions-xml-limits-postgresql">
+    <title>
+     Incidental limits of the implementation
+    </title>
+
+    <para>
+     This section concerns limits that are not inherent in the
+     <application>libxml2</application> library, but apply to the current
+     implementation in <productname>PostgreSQL</productname>.
+    </para>
+
+    <sect3>
+     <title>
+      Cast needed for <function>xmltable</function> column
+      of boolean or double type
+     </title>
+
+     <para>
+      An <function>xmltable</function> column expression evaluating to an XPath
+      boolean or number result will produce an <quote>unexpected XPath object
+      type</quote> error. The workaround is to rewrite the column expression to
+      be inside the XPath <function>string</function> function;
+      <productname>PostgreSQL</productname> will then assign the string value
+      successfully to an SQL output column of boolean or double type.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>
+      Column path result or SQL result column of XML type
+     </title>
+
+     <para>
+      In this release, a <function>xmltable</function> column expression
+      that evaluates to an XML node-set can be assigned to an SQL result
+      column of XML type, producing a concatenation of: for most types of
+      node in the node-set, a text node containing the XPath 1.0
+      <firstterm>string-value</firstterm> of the node, but for an element node,
+      a copy of the node itself. Such a node-set may be assigned to an SQL
+      column of non-XML type only if the node-set has a single node, with the
+      string-value of most node types replaced with an empty string, the
+      string-value of an element node replaced with a concatenation of only its
+      direct text-node children (excluding those of descendants), and the
+      string-value of a text or attribute node being as defined in XPath 1.0.
+      An XPath string value assigned to a result column of XML type must be
+      parsable as XML.
+     </para>
+
+     <para>
+      It is best not to develop code that relies on these behaviors, which have
+      little resemblance to the spec, and are changed in
+      <productname>PostgreSQL 12</productname>.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>Only <literal>BY VALUE</literal> passing mechanism is supported</title>
+
+     <para>
+      The SQL standard defines two <firstterm>passing mechanisms</firstterm>
+      that apply when passing an XML argument from SQL to an XML function or
+      receiving a result: <literal>BY REF</literal>, in which a particular XML
+      value retains its node identity, and <literal>BY VALUE</literal>, in which
+      the content of the XML is passed but node identity is not preserved. A
+      mechanism can be specified before a list of parameters, as the default
+      mechanism for all of them, or after any parameter, to override the
+      default.
+     </para>
+
+     <para>
+      To illustrate the difference, if
+      <replaceable>x</replaceable> is an XML value, these two queries in
+      an SQL:2006 environment would produce true and false, respectively:
+
+<programlisting>
+SELECT XMLQUERY('$a is $b' PASSING BY REF <replaceable>x</replaceable> AS a, <replaceable>x</replaceable> AS b NULL ON EMPTY);
+SELECT XMLQUERY('$a is $b' PASSING BY VALUE <replaceable>x</replaceable> AS a, <replaceable>x</replaceable> AS b NULL ON EMPTY);
+</programlisting>
+     </para>
+
+     <para>
+      In this release, <productname>PostgreSQL</productname> will accept
+      <literal>BY REF</literal> in an
+      <function>XMLEXISTS</function> or <function>XMLTABLE</function>
+      construct, but will ignore it.  The <type>xml</type> data type holds
+      a character-string serialized representation, so there is no node
+      identity to preserve, and passing is always effectively <literal>BY
+      VALUE</literal>.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>Cannot pass named parameters to queries</title>
+
+     <para>
+      The XPath-based functions support passing one parameter to serve as the
+      XPath expression's context item, but do not support passing additional
+      values to be available to the expression as named parameters.
+     </para>
+    </sect3>
+
+    <sect3>
+     <title>No <type>XML(SEQUENCE)</type> type</title>
+
+     <para>
+      The <productname>PostgreSQL</productname> <type>xml</type> data type
+      can only hold a value in <literal>DOCUMENT</literal>
+      or <literal>CONTENT</literal> form.  An XQuery/XPath expression
+      context item must be a single XML node or atomic value, but XPath 1.0
+      further restricts it to be only an XML node, and has no node type
+      allowing <literal>CONTENT</literal>.  The upshot is that a
+      well-formed <literal>DOCUMENT</literal> is the only form of XML value
+      that <productname>PostgreSQL</productname> can supply as an XPath
+      context item.
+     </para>
+    </sect3>
+   </sect2>
+  </sect1>
+
  </appendix>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 198f9c2..a7514c0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9903,17 +9903,26 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 
 
  <sect1 id="functions-xml">
+
   <title>XML Functions</title>
 
+  <indexterm>
+   <primary>XML Functions</primary>
+  </indexterm>
+
   <para>
    The functions and function-like expressions described in this
-   section operate on values of type <type>xml</type>.  Check <xref
-   linkend="datatype-xml"> for information about the <type>xml</type>
+   section operate on values of type <type>xml</type>.  See <xref
+   linkend="datatype-xml"/> for information about the <type>xml</type>
    type.  The function-like expressions <function>xmlparse</function>
    and <function>xmlserialize</function> for converting to and from
-   type <type>xml</type> are not repeated here.  Use of most of these
-   functions requires the installation to have been built
-   with <command>configure --with-libxml</>.
+   type <type>xml</type> are documented there, not in this section.
+  </para>
+
+  <para>
+   Use of most of these functions
+   requires <productname>PostgreSQL</productname> to have been built
+   with <command>configure --with-libxml</command>.
   </para>
 
   <sect2 id="functions-producing-xml">
@@ -10107,8 +10116,8 @@ SELECT xmlelement(name foo, xmlattributes('xyz' as bar),
      encoding, depending on the setting of the configuration parameter
      <xref linkend="guc-xmlbinary">.  The particular behavior for
      individual data types is expected to evolve in order to align the
-     SQL and PostgreSQL data types with the XML Schema specification,
-     at which point a more precise description will appear.
+     PostgreSQL mappings with those specified in SQL:2006 and later,
+     as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
    </sect3>
 
@@ -10350,10 +10359,13 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 </synopsis>
 
     <para>
-     The function <function>xmlexists</function> returns true if the
-     XPath expression in the first argument returns any nodes, and
-     false otherwise.  (If either argument is null, the result is
-     null.)
+     The function <function>xmlexists</function> evaluates an XPath 1.0
+     expression (the first argument), with the passed XML value as its context
+     item.  The function returns false if the result of that evaluation
+     yields an empty node-set, true if it yields any other value.  The
+     function returns null if any argument is null.  A nonnull value
+     passed as the context item must be an XML document, not a content
+     fragment or any non-XML value.
     </para>
 
     <para>
@@ -10369,14 +10381,14 @@ SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY REF '<towns><town>Tor
     </para>
 
     <para>
-     The <literal>BY REF</literal> clauses have no effect in
-     PostgreSQL, but are allowed for SQL conformance and compatibility
-     with other implementations.  Per SQL standard, the
-     first <literal>BY REF</literal> is required, the second is
-     optional.  Also note that the SQL standard specifies
-     the <function>xmlexists</function> construct to take an XQuery
-     expression as first argument, but PostgreSQL currently only
-     supports XPath, which is a subset of XQuery.
+     The <literal>BY REF</literal> clauses
+     are accepted in <productname>PostgreSQL</productname>, but are ignored,
+     as discussed in <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, the <function>xmlexists</function> function
+     evaluates an expression in the XML Query language,
+     but <productname>PostgreSQL</productname> allows only an XPath 1.0
+     expression, as discussed in
+     <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
    </sect3>
 
@@ -10482,12 +10494,12 @@ SELECT xml_is_well_formed_document('<pg:foo xmlns:pg="http://postgresql.org/stuf
 </synopsis>
 
     <para>
-     The function <function>xpath</function> evaluates the XPath
-     expression <replaceable>xpath</replaceable> (a <type>text</> value)
+     The function <function>xpath</function> evaluates the XPath 1.0
+     expression <replaceable>xpath</replaceable> (a <type>text</type> value)
      against the XML value
      <replaceable>xml</replaceable>.  It returns an array of XML values
-     corresponding to the node set produced by the XPath expression.
-     If the XPath expression returns a scalar value rather than a node set,
+     corresponding to the node-set produced by the XPath expression.
+     If the XPath expression returns a scalar value rather than a node-set,
      a single-element array is returned.
     </para>
 
@@ -10549,9 +10561,10 @@ SELECT xpath('//mydefns:b/text()', '<a xmlns="http://example.com"><b>test</b></a
     <para>
      The function <function>xpath_exists</function> is a specialized form
      of the <function>xpath</function> function.  Instead of returning the
-     individual XML values that satisfy the XPath, this function returns a
-     Boolean indicating whether the query was satisfied or not.  This
-     function is equivalent to the standard <literal>XMLEXISTS</> predicate,
+     individual XML values that satisfy the XPath 1.0 expression, this function
+     returns a Boolean indicating whether the query was satisfied or not
+     (specifically, whether it produced any value other than an empty node-set).
+     This function is equivalent to the <literal>XMLEXISTS</literal> predicate,
      except that it also offers support for a namespace mapping argument.
     </para>
 
@@ -10592,8 +10605,8 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
 
     <para>
      The <function>xmltable</function> function produces a table based
-     on the given XML value, an XPath filter to extract rows, and an
-     optional set of column definitions.
+     on the given XML value, an XPath filter to extract rows, and a
+     set of column definitions.
     </para>
 
     <para>
@@ -10604,30 +10617,34 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     The required <replaceable>row_expression</> argument is an XPath
-     expression that is evaluated against the supplied XML document to
-     obtain an ordered sequence of XML nodes. This sequence is what
-     <function>xmltable</> transforms into output rows.
+     The required <replaceable>row_expression</replaceable> argument is
+     an XPath 1.0 expression that is evaluated, passing the
+     <replaceable>document_expression</replaceable> as its context item, to
+     obtain a set of XML nodes. These nodes are what
+     <function>xmltable</function> transforms into output rows. No rows
+     will be produced if the <replaceable>document_expression</replaceable>
+     is null, nor if the <replaceable>row_expression</replaceable> produces
+     an empty node-set or any value other than a node-set.
     </para>
 
     <para>
-     <replaceable>document_expression</> provides the XML document to
-     operate on.
-     The <literal>BY REF</literal> clauses have no effect in PostgreSQL,
-     but are allowed for SQL conformance and compatibility with other
-     implementations.
-     The argument must be a well-formed XML document; fragments/forests
-     are not accepted.
+     <replaceable>document_expression</replaceable> provides the context
+     item for the <replaceable>row_expression</replaceable>. It must be a
+     well-formed XML document; fragments/forests are not accepted.
+     The <literal>BY REF</literal> clauses
+     are accepted but ignored, as discussed in
+     <xref linkend="functions-xml-limits-postgresql"/>.
+     In the SQL standard, the <function>xmltable</function> function
+     evaluates expressions in the XML Query language,
+     but <productname>PostgreSQL</productname> allows only XPath 1.0
+     expressions, as discussed in
+     <xref linkend="functions-xml-limits-xpath1"/>.
     </para>
 
     <para>
      The mandatory <literal>COLUMNS</literal> clause specifies the list
      of columns in the output table.
-     If the <literal>COLUMNS</> clause is omitted, the rows in the result
-     set contain a single column of type <literal>xml</> containing the
-     data matched by <replaceable>row_expression</>.
-     If <literal>COLUMNS</literal> is specified, each entry describes a
-     single column.
+     Each entry describes a single column.
      See the syntax summary above for the format.
      The column name and type are required; the path, default and
      nullability clauses are optional.
@@ -10635,48 +10652,57 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
 
     <para>
      A column marked <literal>FOR ORDINALITY</literal> will be populated
-     with row numbers matching the order in which the
-     output rows appeared in the original input XML document.
+     with row numbers, starting with 1, in the order of nodes retrieved from
+     the <replaceable>row_expression</replaceable>'s result node-set.
      At most one column may be marked <literal>FOR ORDINALITY</literal>.
     </para>
 
+    <note>
+     <para>
+      XPath 1.0 does not specify an order for nodes in a node-set, so code
+      that relies on a particular order of the results will be
+      implementation-dependent.  Details can be found in
+      <xref linkend="xml-xpath-1-specifics"/>.
+     </para>
+    </note>
+
     <para>
-     The <literal>column_expression</> for a column is an XPath expression
-     that is evaluated for each row, relative to the result of the
-     <replaceable>row_expression</>, to find the value of the column.
-     If no <literal>column_expression</> is given, then the column name
-     is used as an implicit path.
+     The <replaceable>column_expression</replaceable> for a column is an
+     XPath 1.0 expression that is evaluated for each row, with the current
+     node from the <replaceable>row_expression</replaceable> result as its
+     context item, to find the value of the column.  If
+     no <replaceable>column_expression</replaceable> is given, then the
+     column name is used as an implicit path.
     </para>
 
     <para>
-     If a column's XPath expression returns multiple elements, an error
-     is raised.
-     If the expression matches an empty tag, the result is an
-     empty string (not <literal>NULL</>).
-     Any <literal>xsi:nil</> attributes are ignored.
+     If a column's XPath expression returns a non-XML value (limited to
+     string, boolean, or double in XPath 1.0) and the column has a
+     PostgreSQL type other than <type>xml</type>, the column will be set
+     as if by assigning the value's string representation to the PostgreSQL
+     type. In this release, an XPath boolean or double result must be explicitly
+     cast to string (that is, the XPath 1.0 <function>string</function> function
+     wrapped around the original column expression);
+     <productname>PostgreSQL</productname> can then successfully assign the
+     string to an SQL result column of boolean or double type.
+     These conversion rules differ from those of the SQL
+     standard, as discussed in <xref linkend="functions-xml-limits-casts"/>.
     </para>
 
     <para>
-     The text body of the XML matched by the <replaceable>column_expression</>
-     is used as the column value. Multiple <literal>text()</literal> nodes
-     within an element are concatenated in order. Any child elements,
-     processing instructions, and comments are ignored, but the text contents
-     of child elements are concatenated to the result.
-     Note that the whitespace-only <literal>text()</> node between two non-text
-     elements is preserved, and that leading whitespace on a <literal>text()</>
-     node is not flattened.
+     In this release, SQL result columns of <type>xml</type> type, or
+     column XPath expressions evaluating to an XML type, regardless of the
+     output column SQL type, are handled as described in
+     <xref linkend="functions-xml-limits-postgresql"/>; the behavior
+     changes significantly in <productname>PostgreSQL 12</productname>.
     </para>
 
     <para>
-     If the path expression does not match for a given row but
-     <replaceable>default_expression</> is specified, the value resulting
-     from evaluating that expression is used.
-     If no <literal>DEFAULT</> clause is given for the column,
-     the field will be set to <literal>NULL</>.
-     It is possible for a <replaceable>default_expression</> to reference
-     the value of output columns that appear prior to it in the column list,
-     so the default of one column may be based on the value of another
-     column.
+     If the path expression returns an empty node-set
+     (typically, when it does not match)
+     for a given row, the column will be set to <literal>NULL</literal>, unless
+     a <replaceable>default_expression</replaceable> is specified; then the
+     value resulting from evaluating that expression is used.
     </para>
 
     <para>
@@ -10688,20 +10714,14 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
     </para>
 
     <para>
-     Unlike regular PostgreSQL functions, <replaceable>column_expression</>
-     and <replaceable>default_expression</> are not evaluated to a simple
-     value before calling the function.
-     <replaceable>column_expression</> is normally evaluated
-     exactly once per input row, and <replaceable>default_expression</>
-     is evaluated each time a default is needed for a field.
-     If the expression qualifies as stable or immutable the repeat
+     A <replaceable>default_expression</replaceable>, rather than being
+     evaluated immediately when <function>xmltable</function> is called,
+     is evaluated each time a default is needed for the column.
+     If the expression qualifies as stable or immutable, the repeat
      evaluation may be skipped.
-     Effectively <function>xmltable</> behaves more like a subquery than a
-     function call.
      This means that you can usefully use volatile functions like
-     <function>nextval</> in <replaceable>default_expression</>, and
-     <replaceable>column_expression</> may depend on other parts of the
-     XML document.
+     <function>nextval</function> in
+     <replaceable>default_expression</replaceable>.
     </para>
 
     <para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 8e746f3..ccd3450 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -593,7 +593,7 @@ X085	Predefined namespace prefixes			NO
 X086	XML namespace declarations in XMLTable			NO	
 X090	XML document predicate			YES	
 X091	XML content predicate			NO	
-X096	XMLExists			NO	XPath only
+X096	XMLExists			NO	XPath 1.0 only
 X100	Host language support for XML: CONTENT option			NO	
 X101	Host language support for XML: DOCUMENT option			NO	
 X110	Host language support for XML: VARCHAR mapping			NO	
@@ -661,11 +661,11 @@ X282	XMLValidate with CONTENT option			NO
 X283	XMLValidate with SEQUENCE option			NO	
 X284	XMLValidate: NAMESPACE without ELEMENT clause			NO	
 X286	XMLValidate: NO NAMESPACE with ELEMENT clause			NO	
-X300	XMLTable			NO	XPath only
+X300	XMLTable			NO	XPath 1.0 only
 X301	XMLTable: derived column list option			YES	
 X302	XMLTable: ordinality column option			YES	
 X303	XMLTable: column default option			YES	
-X304	XMLTable: passing a context item			YES	
+X304	XMLTable: passing a context item			YES	must be XML DOCUMENT
 X305	XMLTable: initializing an XQuery variable			NO	
 X400	Name and identifier mapping			YES	
 X410	Alter column data type: XML type			YES	
-- 
2.7.3

#47

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Chapman Flack (#46)

Re: Fix XML handling with DOCTYPE

On 2019-Aug-03, Chapman Flack wrote:

I don't know if it's too late to get in the upcoming minor releases,
but maybe it can, if it looks ok, or the next ones, if that's too rushed.

Hmm, I'm travelling back home from a conference the weekend, so yeah I
think it would be rushed for me to handle for the upcoming set. But I
can look at it before the *next* set.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#48

Chapman Flack

chap@anastigmatix.net

over 6 years ago

In reply to: Alvaro Herrera (#47)

Re: Fix XML handling with DOCTYPE

Hi Alvaro,

On 08/03/19 12:15, Alvaro Herrera wrote:

I don't know if it's too late to get in the upcoming minor releases,
but maybe it can, if it looks ok, or the next ones, if that's too rushed.

Hmm, I'm travelling back home from a conference the weekend, so yeah I
think it would be rushed for me to handle for the upcoming set. But I
can look at it before the *next* set.

Are these on your radar to maybe backpatch in this round of activity?

The latest patches I did for 11 and 10 are in
/messages/by-id/5D45A44F.8010803@anastigmatix.net

Cheers,
-Chap

#49

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Chapman Flack (#48)

Re: Fix XML handling with DOCTYPE

Hi Chapman,

On 2019-Sep-05, Chapman Flack wrote:

Are these on your radar to maybe backpatch in this round of activity?

The latest patches I did for 11 and 10 are in
/messages/by-id/5D45A44F.8010803@anastigmatix.net

Thanks! I just pushed these to those branches.

I think we're finally done with these. Many thanks for your
persistence.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services