XMLDocument (SQL/XML X030)

Started by Jim Jonesover 1 year ago47 messages

jim.jones@uni-muenster.de

over 1 year ago

Hi,

I'd like to propose the implementation of XMLDocument (SQL/XML X030).It
basically returns an XML document from a given XML expression, e.g.

SELECT
xmldocument(
    xmlelement(NAME foo,
      xmlattributes(42 AS att),
      xmlelement(NAME bar,
        xmlconcat('va', 'lue'))
    )
);

xmldocument
--------------------------------------
<foo att="42"><bar>value</bar></foo>
(1 row)

XMLDocument doesn't do much. In fact, it might be reduced to a simple
xmlparse() call as XMLOPTION_DOCUMENT...

xmlparse(data, XMLOPTION_DOCUMENT, true)

... to make sure that the given XML expression is a valid document -
still need some more research there. One could argue that XMLDocument()
is in most cases unnecessary, but I believe it would facilitate the
migration of scripts from other database products.

Any thoughts?

Best, Jim

Jim Jones

jim.jones@uni-muenster.de

about 1 year ago

In reply to: Jim Jones (#1)

Re: XMLDocument (SQL/XML X030)

On 04.12.24 17:18, Jim Jones wrote:

I'd like to propose the implementation of XMLDocument (SQL/XML X030).
It basically returns an XML document from a given XML expression, e.g.

SELECT
xmldocument(
    xmlelement(NAME foo,
      xmlattributes(42 AS att),
      xmlelement(NAME bar,
        xmlconcat('va', 'lue'))
    )
);

             xmldocument
--------------------------------------
<foo att="42"><bar>value</bar></foo>
(1 row)

v1 attached attempts to implement XMLDocument() as described above.

Feedback welcome.

--
Jim

Andrew Dunstan

andrew@dunslane.net

about 1 year ago

In reply to: Jim Jones (#2)

Re: XMLDocument (SQL/XML X030)

On 2024-12-10 Tu 2:48 AM, Jim Jones wrote:

On 04.12.24 17:18, Jim Jones wrote:

I'd like to propose the implementation of XMLDocument (SQL/XML X030).
It basically returns an XML document from a given XML expression, e.g.

SELECT
xmldocument(
    xmlelement(NAME foo,
      xmlattributes(42 AS att),
      xmlelement(NAME bar,
        xmlconcat('va', 'lue'))
    )
);

             xmldocument
--------------------------------------
<foo att="42"><bar>value</bar></foo>
(1 row)

v1 attached attempts to implement XMLDocument() as described above.

Feedback welcome.

LGTM at a first glance.

Please add this to the next CommitFest if you haven't done already.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Jim Jones

jim.jones@uni-muenster.de

about 1 year ago

In reply to: Andrew Dunstan (#3)

Re: XMLDocument (SQL/XML X030)

Hi Andrew

On 10.12.24 14:59, Andrew Dunstan wrote:

LGTM at a first glance.

Please add this to the next CommitFest if you haven't done already.

Thanks!

This is the CF entry: https://commitfest.postgresql.org/51/5431/

Best, Jim

Pavel Stehule

pavel.stehule@gmail.com

about 1 year ago

In reply to: Jim Jones (#1)

Re: XMLDocument (SQL/XML X030)

út 14. 1. 2025 v 8:11 odesílatel Jim Jones <jim.jones@uni-muenster.de>
napsal:

Hi,

I'd like to propose the implementation of XMLDocument (SQL/XML X030).It
basically returns an XML document from a given XML expression, e.g.

SELECT
xmldocument(
xmlelement(NAME foo,
xmlattributes(42 AS att),
xmlelement(NAME bar,
xmlconcat('va', 'lue'))
)
);

xmldocument
--------------------------------------
<foo att="42"><bar>value</bar></foo>
(1 row)

XMLDocument doesn't do much. In fact, it might be reduced to a simple
xmlparse() call as XMLOPTION_DOCUMENT...

xmlparse(data, XMLOPTION_DOCUMENT, true)

... to make sure that the given XML expression is a valid document -
still need some more research there. One could argue that XMLDocument()
is in most cases unnecessary, but I believe it would facilitate the
migration of scripts from other database products.

Any thoughts?

I did some research and the design of this document is different

1. Oracle doesn't support this
2. DB2 has different implementations for z/OS (variadic) and for unix
(nonvariadic)
3. looks so db2 allows some concatenation of xml content when xmlexpr is
not the document already (not tested)
4. Your implementation just raise an exception

I didn't find a free downloadable SQL/XML standard with description of
XMLDOCUMENT so I read just the DB2 doc, but it isn't fully consistent and
it is different from your implementation.
So the argument of better compatibility for this patch doesn't look too
strong. But I found that the usage of XMLDOCUMENT is required for storing
XML, so it can be a frequently used function. Unfortunately, I do not have
any knowledge about db2. It is hard to understand the usage of this
function, because the sense is probably different than in DB2, and the
documentation doesn't explain well an usage and motivation for this
function. If it does a check, then it is not described in doc.

Regards

Pavel

Show quoted text

Best, Jim

Jim Jones

jim.jones@uni-muenster.de

about 1 year ago

In reply to: Pavel Stehule (#5)

Re: XMLDocument (SQL/XML X030)

Hi Pavel

On 14.01.25 09:14, Pavel Stehule wrote:

I did some research and the design of this document is different

1. Oracle doesn't support this
2. DB2 has different implementations for z/OS (variadic) and for unix
(nonvariadic)
3. looks so db2 allows some concatenation of xml content when xmlexpr
is not the document already (not tested)
4. Your implementation just raise an exception

I'm not entirely sure I follow. XMLDOCUMENT is designed to produce a well-formed XML document, and according to the XML specification, a well-formed document must have precisely one root element.

SELECT
xmlserialize(DOCUMENT
xmldocument(
xmlelement(NAME root,
xmlattributes(42 AS att),
xmlcomment('comment'),
xmlelement(NAME foo,'<foo&bar>'),
xmlelement(NAME bar,
xmlconcat('va', 'lue')),
xmlpi(name pi),
xmlelement(NAME txt, xmltext('<"&>'))
)) AS text INDENT) ;
xmlserialize
----------------------------------
<root att="42"> +
 +
<foo><foo&bar></foo>+
<bar>value</bar> +
<?pi?> +
<txt><"&></txt> +
</root>
(1 row)

Could you provide an example of this feature you're missing?

Malformed CONTENT xml strings will indeed raise an exception.

SELECT xmldocument('foo'::xml);
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
foo
^

I didn't find a free downloadable SQL/XML standard with description of
XMLDOCUMENT so I read just the DB2 doc, but it isn't fully consistent
and it is different from your implementation.

The main idea is to ensure that an xml string is a valid document (even
in CONTENT mode)

postgres=# SET xmloption TO DOCUMENT;
SET
postgres=# SELECT 'foo'::xml;
ERROR: invalid XML document
LINE 1: SELECT 'foo'::xml;
^
DETAIL: line 1: Start tag expected, '<' not found
foo
^
postgres=# SET xmloption TO CONTENT;
SET
postgres=# SELECT 'foo'::xml;
xml
-----
foo
(1 row)

postgres=# SELECT xmldocument('foo'::xml);
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
foo
^

So the argument of better compatibility for this patch doesn't look
too strong. But I found that the usage of XMLDOCUMENT is required for
storing XML, so it can be a frequently used function. Unfortunately,
I do not have any knowledge about db2. It is hard to understand the
usage of this function, because the sense is probably different than
in DB2, and the documentation doesn't explain well an usage and
motivation for this function. If it does a check, then it is not
described in doc.

Perhaps changing the documentation like this would make things clearer?

"The xmldocument function encapsulates the XML expression within a valid XML document structure. The expression passed as the argument must be a valid, single-rooted XML fragment. If the XML expression is NULL, the result will also be NULL."

Many thanks for the review!

Best, Jim

Pavel Stehule

pavel.stehule@gmail.com

about 1 year ago

In reply to: Jim Jones (#6)

Re: XMLDocument (SQL/XML X030)

st 15. 1. 2025 v 22:05 odesílatel Jim Jones <jim.jones@uni-muenster.de>
napsal:

Hi Pavel

On 14.01.25 09:14, Pavel Stehule wrote:

I did some research and the design of this document is different

1. Oracle doesn't support this
2. DB2 has different implementations for z/OS (variadic) and for unix
(nonvariadic)
3. looks so db2 allows some concatenation of xml content when xmlexpr
is not the document already (not tested)
4. Your implementation just raise an exception

I'm not entirely sure I follow. XMLDOCUMENT is designed to produce a
well-formed XML document, and according to the XML specification, a
well-formed document must have precisely one root element.

SELECT
xmlserialize(DOCUMENT
xmldocument(
xmlelement(NAME root,
xmlattributes(42 AS att),
xmlcomment('comment'),
xmlelement(NAME foo,'<foo&bar>'),
xmlelement(NAME bar,
xmlconcat('va', 'lue')),
xmlpi(name pi),
xmlelement(NAME txt, xmltext('<"&>'))
)) AS text INDENT) ;
xmlserialize
----------------------------------
<root att="42"> +
 +
<foo><foo&bar></foo>+
<bar>value</bar> +
<?pi?> +
<txt><"&></txt> +
</root>
(1 row)

Could you provide an example of this feature you're missing?

Malformed CONTENT xml strings will indeed raise an exception.

SELECT xmldocument('foo'::xml);
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
foo
^

I didn't find a free downloadable SQL/XML standard with description of
XMLDOCUMENT so I read just the DB2 doc, but it isn't fully consistent
and it is different from your implementation.

The main idea is to ensure that an xml string is a valid document (even
in CONTENT mode)

postgres=# SET xmloption TO DOCUMENT;
SET
postgres=# SELECT 'foo'::xml;
ERROR: invalid XML document
LINE 1: SELECT 'foo'::xml;
^
DETAIL: line 1: Start tag expected, '<' not found
foo
^
postgres=# SET xmloption TO CONTENT;
SET
postgres=# SELECT 'foo'::xml;
xml
-----
foo
(1 row)

postgres=# SELECT xmldocument('foo'::xml);
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
foo
^

So the argument of better compatibility for this patch doesn't look
too strong. But I found that the usage of XMLDOCUMENT is required for
storing XML, so it can be a frequently used function. Unfortunately,
I do not have any knowledge about db2. It is hard to understand the
usage of this function, because the sense is probably different than
in DB2, and the documentation doesn't explain well an usage and
motivation for this function. If it does a check, then it is not
described in doc.

Perhaps changing the documentation like this would make things clearer?

"The xmldocument function encapsulates the XML expression within a valid
XML document structure. The expression passed as the argument must be a
valid, single-rooted XML fragment. If the XML expression is NULL, the
result will also be NULL."

It is better.

My note was related to a very different description of this functionality
in DB2. So if you propose this function for better compatibility (and this
function is implemented only by db2), it is surprising to see that this
functionality is described (and probably implemented) very differently.
Because I do not have db2 and I miss db2 knowledge, I don't know if
differences in implementation and description are based on different
technology (XML like graph or XML like string) or if it is something that
is missing in this patch.

Regards

Pavel

Show quoted text

Many thanks for the review!

Best, Jim

Jim Jones

jim.jones@uni-muenster.de

about 1 year ago

In reply to: Pavel Stehule (#7)

Re: XMLDocument (SQL/XML X030)

On 16.01.25 07:11, Pavel Stehule wrote:

It is better.

v2 attached updates the documentation.

My note was related to a very different description of this
functionality in DB2. So if you propose this function for better
compatibility (and this function is implemented only by db2), it is
surprising to see that this functionality is described (and probably
implemented) very differently. Because I do not have db2 and I miss
db2 knowledge, I don't know if differences in implementation and
description are based on different technology (XML like graph or XML
like string) or if it is something that is missing in this patch.

I suppose it's mostly because PostgreSQL and DB2 have different
structures for the XML data type; DB2 stores it in its native
hierarchical format rather than as text.

Thanks for the review.

Best, Jim

Jim Jones

jim.jones@uni-muenster.de

about 1 year ago

In reply to: Jim Jones (#8)

Re: XMLDocument (SQL/XML X030)

On 16.01.25 08:21, Jim Jones wrote:

On 16.01.25 07:11, Pavel Stehule wrote:

It is better.

v2 attached updates the documentation.

My note was related to a very different description of this
functionality in DB2. So if you propose this function for better
compatibility (and this function is implemented only by db2), it is
surprising to see that this functionality is described (and probably
implemented) very differently. Because I do not have db2 and I miss
db2 knowledge, I don't know if differences in implementation and
description are based on different technology (XML like graph or XML
like string) or if it is something that is missing in this patch.

I suppose it's mostly because PostgreSQL and DB2 have different
structures for the XML data type; DB2 stores it in its native
hierarchical format rather than as text.

Thanks for the review.

Best, Jim

The DB2 "Document node constructors" might provide some insights into
its behavior regarding well-formed XML documents [1]:

"No validation is performed on the constructed document node. The XQuery
document node constructor does not enforce the XML 1.0 rules that govern
the structure of an XML document. For example, a document node is not
required to have exactly one child that is an element node."

This suggests that DB2's design reflects a different approach to
handling XML, focusing less on enforcing XML 1.0 constraints. It appears
to be more of a design philosophy regarding how XML is integrated into
the database system as a whole, rather than just a difference in the
implementation of the XMLDocument function.

PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT
clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in
functions that require it. Since RETURNING CONTENT implies that the
output is a well-formed XML document (e.g., single-rooted), I would
argue that the behavior of this patch is more intuitive and aligns best
with the expectations of XML document structure.

Any thoughts?

Best, Jim

1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node
2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html

#10

Chapman Flack

chap@anastigmatix.net

about 1 year ago

In reply to: Jim Jones (#9)

Re: XMLDocument (SQL/XML X030)

On 01/20/25 06:02, Jim Jones wrote:

The DB2 "Document node constructors" might provide some insights into
its behavior regarding well-formed XML documents [1]:

"No validation is performed on the constructed document node. The XQuery
document node constructor does not enforce the XML 1.0 rules that govern
the structure of an XML document. For example, a document node is not
required to have exactly one child that is an element node."

This suggests that DB2's design reflects a different approach to
handling XML, focusing less on enforcing XML 1.0 constraints. It appears
to be more of a design philosophy regarding how XML is integrated into
the database system as a whole, rather than just a difference in the
implementation of the XMLDocument function.

Indeed. ISO SQL/XML changed significantly between the 2003 edition
(largely followed by PostgreSQL) and the 2006 and all later editions.
There's a rundown of those changes at [3].

PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT
clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in
functions that require it. Since RETURNING CONTENT implies that the
output is a well-formed XML document (e.g., single-rooted),

In fact, you can't infer single-root-element-ness from RETURNING CONTENT,
according to the standard. Single-root-element-ness is checked by the
IS DOCUMENT predicate, and by XMLPARSE and XMLSERIALIZE when they specify
DOCUMENT. But it isn't checked or implied by the XMLDOCUMENT constructor.

That amounts to a bit of unfortunate punning on the word DOCUMENT,
but so help me that's what's in the standard.

It may help to think in terms of the hierarchy of XML types that the
2006 standard introduced (cribbed here from [3]):

SEQUENCE
|
(?sequence of length 1, a document node)
|
CONTENT(ANY)----------------.----------------(?every element
| | conforms to a
(?every element has (?no extraneous schema)
xdt:untyped and !nilled, nodes) |
every attribute has | |
xdt:untypedAtomic) DOCUMENT(ANY) CONTENT(XMLSCHEMA)
| |
CONTENT(UNTYPED) (?whole thing is valid
| according to schema)
(?no extraneous nodes) |
| DOCUMENT(XMLSCHEMA)
DOCUMENT(UNTYPED)

where the condition (?no extraneous nodes) is shorthand for SQL/XML's
more precise "whose `children` property has exactly one XQuery element
node, zero or more XQuery comment nodes, and zero or more XQuery
processing instruction nodes".

So that (?no extraneous nodes) condition is required for any of
the XML(DOCUMENT...) types. When you relax that condition, you have
an XML(CONTENT...) type.

The XMLDOCUMENT constructor is so named because it constructs what
corresponds to an XQuery document node—which actually corresponds to
the XML(CONTENT...) SQL/XML types, and does not enforce having a
single root element:

"This data model is more permissive: a Document Node may be empty,
it may have more than one Element Node as a child, and it also
permits Text Nodes as children."[4]

So in terms of the SQL/XML type hierarchy, what you get back from
XMLDOCUMENT ... RETURNING CONTENT will have one of the XML(CONTENT...)
types (whether it's CONTENT(ANY) or CONTENT(UNTYPED) is left to the
implementation).

If you then want to know if it is single-rooted, you can apply the
IS DOCUMENT predicate, or try to cast it to an XML(DOCUMENT...) type.

(And if you use XMLDOCUMENT ... RETURNING SEQUENCE, then you get a
value of type XML(SEQUENCE). The sequence has length 1, a document
node, making it safely castable to XML(CONTENT(ANY)), but whether
you can cast it to an XML(DOCUMENT...) type will depend on what
children that document node has.)

Long story short, an XMLDOCUMENT constructor that enforced having
a single root element would be nonconformant.

Regards,
-Chap

1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node
2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html

3 -
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#SQL.2FXML:2003_contrasted_with_SQL.2FXML_since_2006
4 - https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode

#11

Jim Jones

jim.jones@uni-muenster.de

about 1 year ago

In reply to: Chapman Flack (#10)

Re: XMLDocument (SQL/XML X030)

Hi Chap,

Thanks for the thorough explanation!

On 20.01.25 20:09, Chapman Flack wrote:

PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT
clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in
functions that require it. Since RETURNING CONTENT implies that the
output is a well-formed XML document (e.g., single-rooted),

In fact, you can't infer single-root-element-ness from RETURNING CONTENT,
according to the standard. Single-root-element-ness is checked by the
IS DOCUMENT predicate, and by XMLPARSE and XMLSERIALIZE when they specify
DOCUMENT. But it isn't checked or implied by the XMLDOCUMENT constructor.

That amounts to a bit of unfortunate punning on the word DOCUMENT,
but so help me that's what's in the standard.

Yeah, the term DOCUMENT seems a bit misleading in this context.

It may help to think in terms of the hierarchy of XML types that the
2006 standard introduced (cribbed here from [3]):

SEQUENCE
|
(?sequence of length 1, a document node)
|
CONTENT(ANY)----------------.----------------(?every element
| | conforms to a
(?every element has (?no extraneous schema)
xdt:untyped and !nilled, nodes) |
every attribute has | |
xdt:untypedAtomic) DOCUMENT(ANY) CONTENT(XMLSCHEMA)
| |
CONTENT(UNTYPED) (?whole thing is valid
| according to schema)
(?no extraneous nodes) |
| DOCUMENT(XMLSCHEMA)
DOCUMENT(UNTYPED)

where the condition (?no extraneous nodes) is shorthand for SQL/XML's
more precise "whose `children` property has exactly one XQuery element
node, zero or more XQuery comment nodes, and zero or more XQuery
processing instruction nodes".

So that (?no extraneous nodes) condition is required for any of
the XML(DOCUMENT...) types. When you relax that condition, you have
an XML(CONTENT...) type.

The XMLDOCUMENT constructor is so named because it constructs what
corresponds to an XQuery document node—which actually corresponds to
the XML(CONTENT...) SQL/XML types, and does not enforce having a
single root element:

"This data model is more permissive: a Document Node may be empty,
it may have more than one Element Node as a child, and it also
permits Text Nodes as children."[4]

Thanks a lot for pointing that out! I guess it's clear now.

So in terms of the SQL/XML type hierarchy, what you get back from
XMLDOCUMENT ... RETURNING CONTENT will have one of the XML(CONTENT...)
types (whether it's CONTENT(ANY) or CONTENT(UNTYPED) is left to the
implementation).

If you then want to know if it is single-rooted, you can apply the
IS DOCUMENT predicate, or try to cast it to an XML(DOCUMENT...) type.

(And if you use XMLDOCUMENT ... RETURNING SEQUENCE, then you get a
value of type XML(SEQUENCE). The sequence has length 1, a document
node, making it safely castable to XML(CONTENT(ANY)), but whether
you can cast it to an XML(DOCUMENT...) type will depend on what
children that document node has.)

Long story short, an XMLDOCUMENT constructor that enforced having
a single root element would be nonconformant.

If I understand correctly, the compliant approach would be to always
treat the input expression as CONTENT:

|PG_RETURN_XML_P(xmlparse((text *) data, XMLOPTION_DOCUMENT, true));|

Is that right?"

1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node
2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html

3 -
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#SQL.2FXML:2003_contrasted_with_SQL.2FXML_since_2006
4 - https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode

Best, Jim

#12

Andrew Dunstan

andrew@dunslane.net

about 1 year ago

In reply to: Chapman Flack (#10)

Re: XMLDocument (SQL/XML X030)

On 2025-01-20 Mo 2:09 PM, Chapman Flack wrote:

Long story short, an XMLDOCUMENT constructor that enforced having
a single root element would be nonconformant.

What an (expletive deleted) mess. I'm glad someone is on top of it.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#13

Chapman Flack

chap@anastigmatix.net

about 1 year ago

In reply to: Jim Jones (#11)

Re: XMLDocument (SQL/XML X030)

On 01/20/25 14:56, Jim Jones wrote:

If I understand correctly, the compliant approach would be to always
treat the input expression as CONTENT:

|PG_RETURN_XML_P(xmlparse((text *) data, XMLOPTION_DOCUMENT, true));|

Is that right?"

Did you mean XMLOPTION_CONTENT?

I think that would be right ... which, to be honest, leaves me
in doubt that there is any substantive work at all for an
XMLDOCUMENT constructor in PostgreSQL to do. Possibly that's why
it wasn't provided originally.

We don't have the XML(SEQUENCE) type, and our single XML type
essentially corresponds to XML(CONTENT(ANY)). Any value that has
already been accepted by the input function should already have
the same serialized form you'd expect from XMLDOCUMENT.

We don't have BY REF and passing XML node trees around, so there's
really no way to notice that you've applied XMLDOCUMENT to wrap
stuff in a 'document node' that isn't visible in the serialized
form that is all we store.

Likewise, I'm not thinking of any way to combine our other XML
constructors / aggregates to produce a result that isn't already
XML(CONTENT(ANY)). (If I could think of a way, I think it would mean
those weren't well-specified.)

Therefore I'm thinking that, given the specifics of our XML support,
a fully conformant and efficient XMLDOCUMENT could be implemented
just by returning its XML argument.

That opens a question of whether it's worth the effort to supply
it at all. Maybe it could reduce the surprise for people coming from
another DBMS and finding it missing, and/or be a placeholder in case
we ever implement enough more of the newer SQL/XML standard for it
to have a real effect.

(In a DBMS implementing the whole XML type hierarchy, you could
have an XML(SEQUENCE) containing a single element node, which
you might think you could easily cast to XML(CONTENT(ANY)) or even
XML(DOCUMENT(ANY)), but the cast would raise 2200U or 2200L, respectively:
the sequence would first need to be transformed into a sequence of
a single document node wrapping the original sequence, which is exactly
what you'd use XMLDOCUMENT to do. And of course you could put RETURNING
CONTENT on that XMLDOCUMENT to accomplish the cast in the same step.)

Regards,
-Chap

#14

Jim Jones

jim.jones@uni-muenster.de

about 1 year ago

In reply to: Chapman Flack (#13)

Re: XMLDocument (SQL/XML X030)

On 20.01.25 23:21, Chapman Flack wrote:

Therefore I'm thinking that, given the specifics of our XML support,
a fully conformant and efficient XMLDOCUMENT could be implemented
just by returning its XML argument.

After your explanation, I tend to agree.

v3, attached, incorporates these changes and updates the regression
tests accordingly.

That opens a question of whether it's worth the effort to supply
it at all. Maybe it could reduce the surprise for people coming from
another DBMS and finding it missing, and/or be a placeholder in case
we ever implement enough more of the newer SQL/XML standard for it
to have a real effect.

Although quite trivial, I believe this function could still be valuable
in facilitating the migration of scripts from other database systems --
improving SQL/XML conformance also isn't a bad thing :).

Thank you again for your help in interpreting the SQL/XML standard. Much
appreciated!

Best regards, Jim

#15

Robert Treat

xzilla@users.sourceforge.net

about 1 year ago

In reply to: Jim Jones (#14)

Re: XMLDocument (SQL/XML X030)

On Tue, Jan 21, 2025 at 5:58 AM Jim Jones <jim.jones@uni-muenster.de> wrote:

On 20.01.25 23:21, Chapman Flack wrote:

Therefore I'm thinking that, given the specifics of our XML support,
a fully conformant and efficient XMLDOCUMENT could be implemented
just by returning its XML argument.

After your explanation, I tend to agree.

v3, attached, incorporates these changes and updates the regression
tests accordingly.

That opens a question of whether it's worth the effort to supply
it at all. Maybe it could reduce the surprise for people coming from
another DBMS and finding it missing, and/or be a placeholder in case
we ever implement enough more of the newer SQL/XML standard for it
to have a real effect.

Although quite trivial, I believe this function could still be valuable
in facilitating the migration of scripts from other database systems --
improving SQL/XML conformance also isn't a bad thing :).

Is there some concrete use case you have seen that this would help
with? Not objecting to adding it, but you've mentioned this migration
idea twice but it seems to me this doesn't conform with existing
implementations, and I don't see much benefit in migration use cases
specifically, so I'm just curious if I am overlooking something?

Robert Treat
https://xzilla.net

#16

Jim Jones

jim.jones@uni-muenster.de

about 1 year ago

In reply to: Robert Treat (#15)

Re: XMLDocument (SQL/XML X030)

On 21.01.25 23:45, Robert Treat wrote:

Is there some concrete use case you have seen that this would help
with? Not objecting to adding it, but you've mentioned this migration
idea twice but it seems to me this doesn't conform with existing
implementations, and I don't see much benefit in migration use cases
specifically, so I'm just curious if I am overlooking something?

I wouldn’t frame it as a "migration idea". My point is that this would
be one less function to modify when migrating a script from another
database system to PostgreSQL.

For example, in DB2:

SELECT xmldocument(xmlforest(10 as x, 20 as y)) FROM SYSIBM.SYSDUMMY1;

----------------------------------------------------------
<X>10</X><Y>20</Y>

1 record(s) selected.

... This same query wouldn’t work in PostgreSQL, but the function makes
the query more compatible.

SELECT xmldocument(xmlforest(10 as x, 20 as y));

xmldocument
--------------------
<x>10</x><y>20</y>
(1 row)

Which compatibility issues with existing implementations are you
referring to?

Thanks!

Jim

#17

Robert Treat

xzilla@users.sourceforge.net

about 1 year ago

In reply to: Jim Jones (#16)

Re: XMLDocument (SQL/XML X030)

On Tue, Jan 21, 2025 at 6:36 PM Jim Jones <jim.jones@uni-muenster.de> wrote:

On 21.01.25 23:45, Robert Treat wrote:

Is there some concrete use case you have seen that this would help
with? Not objecting to adding it, but you've mentioned this migration
idea twice but it seems to me this doesn't conform with existing
implementations, and I don't see much benefit in migration use cases
specifically, so I'm just curious if I am overlooking something?

I wouldn’t frame it as a "migration idea". My point is that this would
be one less function to modify when migrating a script from another
database system to PostgreSQL.

<snip>

Which compatibility issues with existing implementations are you
referring to?

I'm mostly referring back to Pavel's statements:
| 1. Oracle doesn't support this
| 2. DB2 has different implementations for z/OS (variadic) and for
unix (nonvariadic)

With an added #3 which is AFAIK sql server doesn't implement this
either (I believe they implement something similar with
sp_xml_preparedocument)

So even if we are following the spec (which I think technically we may
not be), if no other database implements it the way we are, then it
really isn't one less function people will have to modify. Which
again, I'm not trying to argue against having it, I'm just trying to
figure out if there is some specific group that would benefit from it.

Robert Treat
https://xzilla.net

#18

Chapman Flack

chap@anastigmatix.net

about 1 year ago

In reply to: Robert Treat (#17)

Re: XMLDocument (SQL/XML X030)

On 01/22/25 13:41, Robert Treat wrote:

So even if we are following the spec (which I think technically we may
not be),

There are definite ways in which we're not following the SQL/XML spec,
which we document in an appendix[1]https://www.postgresql.org/docs/17/xml-limits-conformance.html. The one that matters here is that
we just have a single XML type instead of the hierarchy of them in the
spec, and ours corresponds to what the spec calls XML(CONTENT(ANY)).

With that divergence from the spec understood, I don't see any new
divergence in providing an XMLDOCUMENT function that just returns
its argument. That's the correct result to return for anything that's
a valid value of our XML type to begin with.

if no other database implements it the way we are

There may be other systems that don't implement it at all, for which
case I don't see any compatibility issue created because we have it
and they do not.

There may be systems that implement the SQL/XML type hierarchy more
completely than we do, so that it would be possible for their
XMLDOCUMENT to be called with an XML(SEQUENCE) argument, or with
RETURNING SEQUENCE, both of which are things that can't happen
in PostgreSQL. I don't see a problem in that either, as long as
theirs produces results matching ours in the RETURNING CONTENT,
passed an XML(CONTENT...) argument case.

If another system produces results that differ, in that restricted
domain corresponding to ours, I'd say something's nonconformant in
that implementation. In my opinion, that would only be a problem
for us if the system in question is an 800 lb gorilla and has many
users relying on the differing behavior.

Regarding the patch itself: I wonder if there is already, somewhere
in the code base, a generic fmgr identity function that the pg_proc
entry could point to. Or is there a strong project tradition in favor
of writing a dedicated new one to use here? I'm not sure it's critical
to have a version that tests USE_LIBXML or reports it's unsupported,
because without support I doubt there's any way to pass it a non-null
XML argument, and if declared STRICT it won't be called for a null one
anyway.

Regards,
-Chap

[1]: https://www.postgresql.org/docs/17/xml-limits-conformance.html

#19

Jim Jones

jim.jones@uni-muenster.de

about 1 year ago

In reply to: Chapman Flack (#18)

Re: XMLDocument (SQL/XML X030)

Hi Chapman & Robert

Many thanks for the input

On 22.01.25 22:35, Chapman Flack wrote:

On 01/22/25 13:41, Robert Treat wrote:

So even if we are following the spec (which I think technically we may
not be),

There are definite ways in which we're not following the SQL/XML spec,
which we document in an appendix[1]. The one that matters here is that
we just have a single XML type instead of the hierarchy of them in the
spec, and ours corresponds to what the spec calls XML(CONTENT(ANY)).

With that divergence from the spec understood, I don't see any new
divergence in providing an XMLDOCUMENT function that just returns
its argument. That's the correct result to return for anything that's
a valid value of our XML type to begin with.

if no other database implements it the way we are

There may be other systems that don't implement it at all, for which
case I don't see any compatibility issue created because we have it
and they do not.

Yeah, as long as we stick to the specification, I don’t see any issue
with including it.

There may be systems that implement the SQL/XML type hierarchy more
completely than we do, so that it would be possible for their
XMLDOCUMENT to be called with an XML(SEQUENCE) argument, or with
RETURNING SEQUENCE, both of which are things that can't happen
in PostgreSQL. I don't see a problem in that either, as long as
theirs produces results matching ours in the RETURNING CONTENT,
passed an XML(CONTENT...) argument case.

If another system produces results that differ, in that restricted
domain corresponding to ours, I'd say something's nonconformant in
that implementation. In my opinion, that would only be a problem
for us if the system in question is an 800 lb gorilla and has many
users relying on the differing behavior.

Regarding the patch itself: I wonder if there is already, somewhere
in the code base, a generic fmgr identity function that the pg_proc
entry could point to. Or is there a strong project tradition in favor
of writing a dedicated new one to use here? I'm not sure it's critical
to have a version that tests USE_LIBXML or reports it's unsupported,
because without support I doubt there's any way to pass it a non-null
XML argument, and if declared STRICT it won't be called for a null one
anyway.

Regards,
-Chap

[1] https://www.postgresql.org/docs/17/xml-limits-conformance.html

Regarding compatibility, here is an example of how both implementations
handle single-rooted XML, non-single-rooted XML, and NULL values.

== DB2 [1] ==

WITH t(x) AS (
VALUES
    (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
    (xmlforest(42 AS foo, 73 AS bar)),
    (NULL)
)
SELECT xmldocument(x) FROM t;

----------------------------
<root><foo>bar</foo></root>
<FOO>42</FOO><BAR>73</BAR>
-

3 record(s) selected.

== PostgreSQL ==

WITH t(x) AS (
VALUES
    (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
    (xmlforest(42 AS foo, 73 AS bar)),
    (NULL)
)
SELECT xmldocument(x) FROM t;

xmldocument
-----------------------------
<root><foo>bar</foo></root>
<foo>42</foo><bar>73</bar>

(3 rows)

To make it clear: ensuring this function is compatible with other
database products is IMHO beneficial (and is my primary motivation), but
it shouldn't come at the expense of violating the SQL/XML specification.
I mean, if in some edge case, another database system implemented
XMLDocument in a way that deviates from the standard, I'd argue it’s not
worth prioritizing compatibility -- assuming, of course, that we are
fully following the standard.

Best regards, Jim

1 - https://dbfiddle.uk/G9VoHKp7

#20

Pavel Stehule

pavel.stehule@gmail.com

about 1 year ago

In reply to: Jim Jones (#19)

Re: XMLDocument (SQL/XML X030)

čt 23. 1. 2025 v 0:55 odesílatel Jim Jones <jim.jones@uni-muenster.de>
napsal:

Hi Chapman & Robert

Many thanks for the input

On 22.01.25 22:35, Chapman Flack wrote:

On 01/22/25 13:41, Robert Treat wrote:

So even if we are following the spec (which I think technically we may
not be),

There are definite ways in which we're not following the SQL/XML spec,
which we document in an appendix[1]. The one that matters here is that
we just have a single XML type instead of the hierarchy of them in the
spec, and ours corresponds to what the spec calls XML(CONTENT(ANY)).

With that divergence from the spec understood, I don't see any new
divergence in providing an XMLDOCUMENT function that just returns
its argument. That's the correct result to return for anything that's
a valid value of our XML type to begin with.

if no other database implements it the way we are

There may be other systems that don't implement it at all, for which
case I don't see any compatibility issue created because we have it
and they do not.

Yeah, as long as we stick to the specification, I don’t see any issue
with including it.

There may be systems that implement the SQL/XML type hierarchy more
completely than we do, so that it would be possible for their
XMLDOCUMENT to be called with an XML(SEQUENCE) argument, or with
RETURNING SEQUENCE, both of which are things that can't happen
in PostgreSQL. I don't see a problem in that either, as long as
theirs produces results matching ours in the RETURNING CONTENT,
passed an XML(CONTENT...) argument case.

If another system produces results that differ, in that restricted
domain corresponding to ours, I'd say something's nonconformant in
that implementation. In my opinion, that would only be a problem
for us if the system in question is an 800 lb gorilla and has many
users relying on the differing behavior.

Regarding the patch itself: I wonder if there is already, somewhere
in the code base, a generic fmgr identity function that the pg_proc
entry could point to. Or is there a strong project tradition in favor
of writing a dedicated new one to use here? I'm not sure it's critical
to have a version that tests USE_LIBXML or reports it's unsupported,
because without support I doubt there's any way to pass it a non-null
XML argument, and if declared STRICT it won't be called for a null one
anyway.

Regards,
-Chap

[1] https://www.postgresql.org/docs/17/xml-limits-conformance.html

Regarding compatibility, here is an example of how both implementations
handle single-rooted XML, non-single-rooted XML, and NULL values.

== DB2 [1] ==

WITH t(x) AS (
VALUES
(xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
(xmlforest(42 AS foo, 73 AS bar)),
(NULL)
)
SELECT xmldocument(x) FROM t;

----------------------------
<root><foo>bar</foo></root>
<FOO>42</FOO><BAR>73</BAR>

I think so in this form (just forward input to output) I have no
objection.

There is a benefit with a) possible zero work with migration from db2, b)
nobody needs to repeat a work which is a correct implementation of
XMLDOCUMENT function.

Maybe opened question can be implementation like classic scalar function or
via XmlExpr

In this moment I prefer to use XmlExpr from consistency reasons

Regards

Pavel

Show quoted text

3 record(s) selected.

== PostgreSQL ==

WITH t(x) AS (
VALUES
(xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
(xmlforest(42 AS foo, 73 AS bar)),
(NULL)
)
SELECT xmldocument(x) FROM t;

xmldocument
-----------------------------
<root><foo>bar</foo></root>
<foo>42</foo><bar>73</bar>

(3 rows)

To make it clear: ensuring this function is compatible with other
database products is IMHO beneficial (and is my primary motivation), but
it shouldn't come at the expense of violating the SQL/XML specification.
I mean, if in some edge case, another database system implemented
XMLDocument in a way that deviates from the standard, I'd argue it’s not
worth prioritizing compatibility -- assuming, of course, that we are
fully following the standard.

Best regards, Jim

1 - https://dbfiddle.uk/G9VoHKp7