[PoC] XMLCast (SQL/XML X025)

Started by Jim Jonesalmost 2 years ago36 messageshackers
Jump to latest
#1Jim Jones
jim.jones@uni-muenster.de

Hi,

This is a PoC that implements XMLCast (SQL/XML X025), which enables
conversions between SQL and XML data type.

It basically does the following:

* When casting an XML value to a SQL data type, XML values containing
XSD literals will be converted to their equivalent SQL data type.
* When casting from a SQL data type to XML, the cast operand will be
translated to its corresponding XSD data type.

SELECT xmlcast(now() AS xml);
             xmlcast              
----------------------------------
 2024-07-02T17:03:11.189073+02:00
(1 row)

SELECT xmlcast('2024-07-02T17:03:11.189073+02:00'::xml AS timestamp with
time zone);
            xmlcast            
-------------------------------
 2024-07-02 17:03:11.189073+02
(1 row)

SELECT xmlcast('P1Y2M3DT4H5M6S'::xml AS interval);
            xmlcast            
-------------------------------
 1 year 2 mons 3 days 04:05:06
(1 row)

SELECT xmlcast('<foo&bar>'::xml AS text);
  xmlcast  
-----------
 <foo&bar>
(1 row)

SELECT xmlcast('1 year 2 months 3 days 4 hours 5 minutes 6
seconds'::interval AS xml) ;
    xmlcast     
----------------
 P1Y2M3DT4H5M6S
(1 row)

SELECT xmlcast('42.73'::xml AS numeric);
 xmlcast
---------
   42.73
(1 row)

SELECT xmlcast(42730102030405 AS xml);
    xmlcast     
----------------
 42730102030405
(1 row)

Is it starting in the right direction? Any feedback would be much
appreciated.

Best,
Jim

Attachments:

v1-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v1-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+1371-9
#2Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#1)
Re: [PoC] XMLCast (SQL/XML X025)

On 02.07.24 18:02, Jim Jones wrote:

It basically does the following:

* When casting an XML value to a SQL data type, XML values containing
XSD literals will be converted to their equivalent SQL data type.
* When casting from a SQL data type to XML, the cast operand will be
translated to its corresponding XSD data type.

v2 attached adds missing return for NO_XML_SUPPORT control path in
unescape_xml

--
Jim

Attachments:

v2-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v2-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+1372-9
#3Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#2)
Re: [PoC] XMLCast (SQL/XML X025)

On 05.07.24 16:18, Jim Jones wrote:

On 02.07.24 18:02, Jim Jones wrote:

It basically does the following:

* When casting an XML value to a SQL data type, XML values containing
XSD literals will be converted to their equivalent SQL data type.
* When casting from a SQL data type to XML, the cast operand will be
translated to its corresponding XSD data type.

v2 attached adds missing return for NO_XML_SUPPORT control path in
unescape_xml

v3 adds the missing XML passing mechanism BY VALUE and BY REF, as
described in the  XMLCast specification:

XMLCAST (<XML cast operand> AS <XML cast target> [ <XML passing
mechanism> ])

Tests and documentation were updated accordingly.

--
Jim

Attachments:

v3-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v3-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+1739-9
#4Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#3)
Re: [PoC] XMLCast (SQL/XML X025)

rebase.

--
Jim

Attachments:

v4-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v4-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+1748-9
#5Robert Haas
robertmhaas@gmail.com
In reply to: Jim Jones (#4)
Re: [PoC] XMLCast (SQL/XML X025)

On Sun, Nov 10, 2024 at 1:14 PM Jim Jones <jim.jones@uni-muenster.de> wrote:

rebase.

Hmm, this patch has gotten no responses for 4 months. That's kind of
unfortunate. Sadly, there's not a whole lot that I can do to better
the situation, because I know very little either about XML-related
standards or about how people make use of XML in practice. It's not
that much code, so if it does a useful thing that we actually want, we
can probably figure out how to verify that the code is correct, or fix
it. But I don't know whether it's a useful thing that we actually
want. Syntactically, XMLCAST() looks a lot like CAST(), so one might
ask whether the things that it does can already be accomplished using
CAST(); or whether, perhaps, we have some other existing method for
performing such conversions.

The only thing I found during a quick perusal of the documentation was
XMLTABLE(), which seems a bit baroque if you just want to convert one
value. Is this intended to plug that gap? Is there any other current
way of doing it?

Do we need to ensure some kind of consistency between XMLTABLE() and
XMLCAST() in terms of how they behave? The documentation at
https://www.postgresql.org/docs/current/xml-limits-conformance.html#FUNCTIONS-XML-LIMITS-CASTS
says that "When PostgreSQL maps SQL data values to XML (as in
xmlelement), or XML to SQL (as in the output columns of xmltable),
except for a few cases treated specially, PostgreSQL simply assumes
that the XML data type's XPath 1.0 string form will be valid as the
text-input form of the SQL datatype, and conversely." Unfortunately,
it does not specify what those cases treated specially are, and the
commit that added that documentation text is not the one that added
the underlying code, so I don't actually know where that code is, but
one would expect this function to conform to that general rule.

I emphasize again that if there are people other than the submitter
who are interested in this patch, they should really chime in. This
can't progress in a vacuum.

--
Robert Haas
EDB: http://www.enterprisedb.com

#6Jim Jones
jim.jones@uni-muenster.de
In reply to: Robert Haas (#5)
Re: [PoC] XMLCast (SQL/XML X025)

Hi Robert
Thanks for taking a look at it.

On 11.11.24 19:15, Robert Haas wrote:

Hmm, this patch has gotten no responses for 4 months. That's kind of
unfortunate. Sadly, there's not a whole lot that I can do to better
the situation, because I know very little either about XML-related
standards or about how people make use of XML in practice. It's not
that much code, so if it does a useful thing that we actually want, we
can probably figure out how to verify that the code is correct, or fix
it. But I don't know whether it's a useful thing that we actually
want. Syntactically, XMLCAST() looks a lot like CAST(), so one might
ask whether the things that it does can already be accomplished using
CAST(); or whether, perhaps, we have some other existing method for
performing such conversions.

It indeed has a huge overlap with CAST(), except for a few handy SQL <->
XML mappings, such as

SELECT xmlcast('foo &amp; &lt;&quot;bar&quot;&gt;'::xml AS text);

    xmlcast    
---------------
 foo & <"bar">
(1 row)

--

SELECT
  xmlcast('2024-05-29 12:04:10.703585+02'::timestamp without time zone
AS xml),
  xmlcast('2024-05-29T12:04:10.703585'::xml AS timestamp without time zone);
 
          xmlcast           |          xmlcast           
----------------------------+----------------------------
 2024-05-29T12:04:10.703585 | 2024-05-29 12:04:10.703585
(1 row)

--

SELECT
  xmlcast('P1Y2M3DT4H5M6S'::xml AS interval),
  xmlcast('1 year 2 mons 3 days 4 hours 5 minutes 6 seconds'::interval
AS xml);
 
            xmlcast            |    xmlcast     
-------------------------------+----------------
 1 year 2 mons 3 days 04:05:06 | P1Y2M3DT4H5M6S
(1 row)

--

SELECT CAST('42'::xml AS int);

ERROR:  cannot cast type xml to integer
LINE 1: SELECT CAST('42'::xml AS int);
               ^
--

SELECT XMLCAST('42'::xml AS int);
 xmlcast
---------
      42
(1 row)

The only thing I found during a quick perusal of the documentation was
XMLTABLE(), which seems a bit baroque if you just want to convert one
value. Is this intended to plug that gap? Is there any other current
way of doing it?

Do we need to ensure some kind of consistency between XMLTABLE() and
XMLCAST() in terms of how they behave?

I haven't considered any compatibility to XMLTABLE(), as it has a
different spec (X300-X305), but I can take a look at it! To implement
this function I just followed the SQL/XML spec "ISO/IEC IWD 9075-14" -
and from time to time I also took a look on how other databases
implemented it.[1]

The documentation at
https://www.postgresql.org/docs/current/xml-limits-conformance.html#FUNCTIONS-XML-LIMITS-CASTS
says that "When PostgreSQL maps SQL data values to XML (as in
xmlelement), or XML to SQL (as in the output columns of xmltable),
except for a few cases treated specially, PostgreSQL simply assumes
that the XML data type's XPath 1.0 string form will be valid as the
text-input form of the SQL datatype, and conversely." Unfortunately,
it does not specify what those cases treated specially are, and the
commit that added that documentation text is not the one that added
the underlying code, so I don't actually know where that code is, but
one would expect this function to conform to that general rule.

I agree. It would be nice to know which cases those are.
However, invalid inputs should normally return an error, e.g.

SELECT xmlcast('foo&bar'::xml AS text);

ERROR:  invalid XML content
LINE 1: SELECT xmlcast('foo&bar'::xml AS text);
                       ^
DETAIL:  line 1: EntityRef: expecting ';'
foo&bar
       ^
--

SELECT xmlcast('foo'::xml AS date);
ERROR:  invalid input syntax for type date: "foo"

--

.. but perhaps the text means something else?

Thanks!

Best, Jim

1 - https://dbfiddle.uk/ZSpsyIal

#7Robert Haas
robertmhaas@gmail.com
In reply to: Jim Jones (#6)
Re: [PoC] XMLCast (SQL/XML X025)

Hi Jim,

On Mon, Nov 11, 2024 at 2:43 PM Jim Jones <jim.jones@uni-muenster.de> wrote:

The only thing I found during a quick perusal of the documentation was
XMLTABLE(), which seems a bit baroque if you just want to convert one
value. Is this intended to plug that gap? Is there any other current
way of doing it?

Do we need to ensure some kind of consistency between XMLTABLE() and
XMLCAST() in terms of how they behave?

I haven't considered any compatibility to XMLTABLE(), as it has a
different spec (X300-X305), but I can take a look at it! To implement
this function I just followed the SQL/XML spec "ISO/IEC IWD 9075-14" -
and from time to time I also took a look on how other databases
implemented it.[1]

Those are good things to check, but we also need to consider how it
interacts with features PostgreSQL itself already has. In particular,
I'm concerned about the special handling you seem to have for times
and intervals. That handling might be different from what, say,
XMLTABLE() does. In a perfect world, we'd probably like the features
to share code, unless there is some good reason to do otherwise. But
at the very least we want them to work in compatible ways. For
example, if the way you convert a date into the JSON-preferred format
happened to use slightly different time zone handling than the way
that some other existing feature does it, that would be extremely sad.
Or if the existing features don't have interval handling and you do,
perhaps we ought to add that capability to the existing features and
then have your new feature call the same code so that it works the
same way. I haven't researched what the exact situation is here too
and these examples I'm giving you here are strictly hypothetical --
they're just the kind of thing that needs to be sorted out before we
can think about committing anything.

There's still also the question of desirability. I take it for granted
that you want this feature and consider it valuable, but sometimes
people submit patches for a feature that only the submitter wants and
nobody else cares about it (or even, other people actively dislike
it). I am in a very poor position to assess how important this feature
is or to what extent it complies with the relevant specification. Vik,
who I see you copied, is probably in a much better position to
interpret the spec than I am, and may or may not also know something
about whether people want this. I continue to hope that we'll get some
comments from others as well.

--
Robert Haas
EDB: http://www.enterprisedb.com

#8Jim Jones
jim.jones@uni-muenster.de
In reply to: Robert Haas (#7)
Re: [PoC] XMLCast (SQL/XML X025)

On 12.11.24 15:59, Robert Haas wrote:

Those are good things to check, but we also need to consider how it
interacts with features PostgreSQL itself already has.

I totally agree. It just didn't occur to me to check how XMLTABLE()
deals with these conversions :)

In particular,
I'm concerned about the special handling you seem to have for times
and intervals.

The spec dictates that SQL types should be converted to their xsd
equivalents, e.g.

6.7 <XML cast specification>: Syntax Rules
...
15 e)
 * i)   If the type designator of SQLT is DATE, then let XT be xs:date.
 * ii)  If the type designator of SQLT is TIME WITH TIME ZONE, then let
XT be xs:time.
 * iii) If the type designator of SQLT is TIME WITHOUT TIME ZONE, then
let XT be xs:time.
 * iv)  If the type designator of SQLT is TIMESTAMP WITH TIME ZONE, then
let XT be xs:dateTime.
 * v)   If the type designator of SQLT is TIMESTAMP WITHOUT TIME ZONE,
then let XT be xs:dateTime.

That handling might be different from what, say,
XMLTABLE() does.

XMLTABLE() does seem to have a similar behaviour (also regarding
intervals and timestamps):

WITH j (val) AS (
 SELECT
  '<foo>
    <interval>P1Y2M3DT4H5M6S</interval>
    <timestamp>2002-05-30T09:30:10</timestamp>
    <integer>42</integer>
    <numeric>-42.73</numeric>
    <text>foo &amp; &lt;&quot;bar&quot;&gt;</text>
    <boolean>false</boolean>
  </foo>'::xml
)
SELECT a, b, c, d, e, f
FROM j,
  XMLTABLE(
    '/foo'
    PASSING val
    COLUMNS
      a interval PATH 'interval',
      b timestamp PATH 'timestamp',
      c integer PATH 'integer',
      d numeric PATH 'numeric',
      e text PATH 'text',
      f boolean PATH 'boolean');
               a               |          b          | c  |   d   
|       e       | f
-------------------------------+---------------------+----+--------+---------------+---
 1 year 2 mons 3 days 04:05:06 | 2002-05-30 09:30:10 | 42 | -42.73 | foo
& <"bar"> | f
(1 row)

In a perfect world, we'd probably like the features
to share code, unless there is some good reason to do otherwise. But
at the very least we want them to work in compatible ways. For
example, if the way you convert a date into the JSON-preferred format
happened to use slightly different time zone handling than the way
that some other existing feature does it, that would be extremely sad.
Or if the existing features don't have interval handling and you do,
perhaps we ought to add that capability to the existing features and
then have your new feature call the same code so that it works the
same way.

At least XMLTABLE() does handle intervals in the same way. I'll do some
research to check if maybe other related XML features follow a different
path.

I haven't researched what the exact situation is here too
and these examples I'm giving you here are strictly hypothetical --
they're just the kind of thing that needs to be sorted out before we
can think about committing anything.

+1

There's still also the question of desirability. I take it for granted
that you want this feature and consider it valuable, but sometimes
people submit patches for a feature that only the submitter wants and
nobody else cares about it (or even, other people actively dislike
it).

I've been there a few times :)

I am in a very poor position to assess how important this feature
is or to what extent it complies with the relevant specification. Vik,
who I see you copied, is probably in a much better position to
interpret the spec than I am, and may or may not also know something
about whether people want this. I continue to hope that we'll get some
comments from others as well.

Thanks for taking a look at this patch. Much appreciated!

--
Jim

#9Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#8)
Re: [PoC] XMLCast (SQL/XML X025)

rebase.

v5 also attached removes the libxml2 dependency of unescape_xml().

Background: the existing function escape_xml() intentionally avoids
libxml2 dependency and the previously used libxml2 functions
xmlStringDecodeEntities() and xmlDecodeEntities() got deprecated.

--
Jim

Attachments:

v5-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v5-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+1812-9
#10Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#9)
Re: [PoC] XMLCast (SQL/XML X025)

On 30.12.24 10:29, Jim Jones wrote:

rebase.

v5 attached removes the libxml2 dependency of unescape_xml().

Background: the existing function escape_xml() intentionally avoids
libxml2 dependency and the previously used libxml2 functions
xmlStringDecodeEntities() and xmlDecodeEntities() got deprecated.

v6 attached adds missing regression tests for XMLCast backward parsing.

Jim

Attachments:

v6-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v6-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+2201-10
#11Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#10)
Re: [PoC] XMLCast (SQL/XML X025)

rebase due to changes in 984410b923263cac901fa81e0efbe523e9c36df3

Jim

Attachments:

v7-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v7-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+2201-10
#12Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#11)
Re: [PoC] XMLCast (SQL/XML X025)

rebase

Jim

Attachments:

v8-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v8-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+2201-10
#13Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#12)
Re: [PoC] XMLCast (SQL/XML X025)

rebase

--
Jim

Attachments:

v9-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v9-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+2201-10
#14Robert Haas
robertmhaas@gmail.com
In reply to: Jim Jones (#13)
Re: [PoC] XMLCast (SQL/XML X025)

On Mon, May 19, 2025 at 9:23 AM Jim Jones <jim.jones@uni-muenster.de> wrote:

rebase

Hi,

Well, this patch is now more than 10 months old, and it's still the
case that nobody other than the author has said that they want this.
Is it time to give up?

I still don't think it's very clear either from the patch or from the
thread how this differs from existing facilities. As far as I can see,
there are zero comments in the patch explaining the design decisions
that it makes, and nothing in the commit message, the comments, or
even the thread itself addressing my concern from my previous review:
how is this consistent, or inconsistent, with what we do in other
similar cases, and why?

To make that more concrete, the patch says:

+    Another option to convert character strings into xml is the
function <function>xmlcast</function>,
+    which is designed to cast SQL data types into <type>xml</type>,
and vice versa.

But while it explains the behavior of this option, it does not explain
how the behavior is the same or different from other options, or why.

In the comments it says:

+ /* These data types must be converted to their ISO 8601 representations */

To me this just begs the question "says who?". I think there should be
a bunch of comments in this referencing whatever document specifies
the behavior of XMLCAST, so that someone who is good at reading
specification documents (not me) can compare the implementation with
the spec and see if they agree with the decisions that were made.

+ default:
+ *op->resvalue = PointerGetDatum(DatumGetTextP(value));
+ break;

This doesn't seem very safe at all. If we know what type OIDs we
intend this to handle, then we could list them out explicitly as is
already done for TEXTOID, VARCHAROID, etc. If we don't, then why
should we believe that it's a data type for which DatumGetTextP will
produce a non-garbage return value? Maybe there's an answer to that
question, but there's no comment spelling it out; or maybe it's
actually just broken.

--
Robert Haas
EDB: http://www.enterprisedb.com

#15Jim Jones
jim.jones@uni-muenster.de
In reply to: Robert Haas (#14)
Re: [PoC] XMLCast (SQL/XML X025)

Hi Robert

On 19.05.25 16:26, Robert Haas wrote:

Well, this patch is now more than 10 months old, and it's still the
case that nobody other than the author has said that they want this.
Is it time to give up?

Not quite yet -- unless there is an expiration date that I am not aware
of :). If we decide we don't need XMLCast on Postgres after all, I'd
suggest to delete it from the todo list on the wiki [1] - I've already
added a link to this thread there.

I still don't think it's very clear either from the patch or from the
thread how this differs from existing facilities. As far as I can see,
there are zero comments in the patch explaining the design decisions
that it makes, and nothing in the commit message, the comments, or
even the thread itself addressing my concern from my previous review:
how is this consistent, or inconsistent, with what we do in other
similar cases, and why?

I guess I misunderstood your message last year. I was under the
impression that you were only concerned about the handling of intervals,
which I replied here [2]. Basically I said that XMLTable is doing pretty
much the same ...

WITH j (val) AS (
 SELECT
  '<foo>
    <interval>P1Y2M3DT4H5M6S</interval>
    <timestamp>2002-05-30T09:30:10</timestamp>
    <integer>42</integer>
    <numeric>-42.73</numeric>
    <text>foo &amp; &lt;&quot;bar&quot;&gt;</text>
    <boolean>false</boolean>
  </foo>'::xml
)
SELECT a, b, c, d, e, f
FROM j,
  XMLTABLE(
    '/foo'
    PASSING val
    COLUMNS
      a interval PATH 'interval',
      b timestamp PATH 'timestamp',
      c integer PATH 'integer',
      d numeric PATH 'numeric',
      e text PATH 'text',
      f boolean PATH 'boolean');
               a               |          b          | c  |   d   
|       e       | f
-------------------------------+---------------------+----+--------+---------------+---
 1 year 2 mons 3 days 04:05:06 | 2002-05-30 09:30:10 | 42 | -42.73 | foo
& <"bar"> | f
(1 row)

... and cited the spec:

6.7 <XML cast specification>: Syntax Rules
...
15 e)
 * i)   If the type designator of SQLT is DATE, then let XT be xs:date.
 * ii)  If the type designator of SQLT is TIME WITH TIME ZONE, then let
XT be xs:time.
 * iii) If the type designator of SQLT is TIME WITHOUT TIME ZONE, then
let XT be xs:time.
 * iv)  If the type designator of SQLT is TIMESTAMP WITH TIME ZONE, then
let XT be xs:dateTime.
 * v)   If the type designator of SQLT is TIMESTAMP WITHOUT TIME ZONE,
then let XT be xs:dateTime.

and since you didn't reply, I assumed I had already addressed your
comments. But now I see it was not the case.

To make that more concrete, the patch says:

+    Another option to convert character strings into xml is the
function <function>xmlcast</function>,
+    which is designed to cast SQL data types into <type>xml</type>,
and vice versa.

But while it explains the behavior of this option, it does not explain
how the behavior is the same or different from other options, or why.

I'm not entirely sure what you mean by "other options". Similar
alternatives are

* CAST: simply treats the input as a textual XML value and does not
perform type-aware formatting.
* XMLTABLE: extracts typed values from an XML document using
XQuery/XPath expressions, but does not itself convert SQL types into XML.

SELECT xmlcast('2 years 1 day'::interval AS xml);
 xmlcast
---------
 P2Y1D
(1 row)

SELECT cast('2 years 1 day'::interval AS xml);
ERROR:  cannot cast type interval to xml

SELECT xmlcast(now() AS xml);
             xmlcast             
---------------------------------
 2025-05-19T20:16:47.58815+02:00
(1 row)

SELECT cast(now() AS xml);
ERROR:  cannot cast type timestamp with time zone to xml

Is it what you meant?

In the comments it says:

+ /* These data types must be converted to their ISO 8601 representations */

To me this just begs the question "says who?".

Says the SQL/XML:2023 standard :)

SQL/XML:2023 (ISO/IEC 9075-14:2023) - “General Rules” of §6.7.3 (d.ii.1
and d.ii.2):

If SD is a year-month interval type, then let XSBT be the XQuery simple
type xs:yearMonthDuration.
If SD is a day-time interval type, then let XSBT be the XQuery simple
type xs:dayTimeDuration.

... and since xs:yearMonthDuration and xs:dayTimeDuration are subsets of
xs:duration, and the lexical representation of this type is ISO 8601, I
inferred that it is the expected behaviour.

I think there should be
a bunch of comments in this referencing whatever document specifies
the behavior of XMLCAST, so that someone who is good at reading
specification documents (not me) can compare the implementation with
the spec and see if they agree with the decisions that were made.

I re-wrote the commit message and some the code comments to try to make
things clearer.

+ default:
+ *op->resvalue = PointerGetDatum(DatumGetTextP(value));
+ break;

This doesn't seem very safe at all. If we know what type OIDs we
intend this to handle, then we could list them out explicitly as is
already done for TEXTOID, VARCHAROID, etc. If we don't, then why
should we believe that it's a data type for which DatumGetTextP will
produce a non-garbage return value? Maybe there's an answer to that
question, but there's no comment spelling it out; or maybe it's
actually just broken.

Given that XMLCast converts values between SQL and XML and vice versa,
my rationale was that if the target type is not a text type (such as
TEXTOID, VARCHAROID, or NAMEOID), then the cast operand must be of type
xml, which makes this default: safe.

SELECT xmlcast('2 years 1 day'::interval AS interval);
ERROR:  cannot cast from 'interval' to 'interval'

SELECT xmlcast(now() AS timestamp);
ERROR:  cannot cast from 'timestamp with time zone' to 'timestamp
without time zone'
               
see "static Node *transformXmlCast(ParseState *pstate, XmlCast *xc)" in
parse_target.c

But I can see it looks unsafe. Do you have something like this in mind?

case INT2OID:
case INT4OID:
case INT8OID:
case NUMERICOID:
case FLOAT4OID:
case FLOAT8OID:
case BOOLOID:
case TIMESTAMPOID:
case TIMESTAMPTZOID:
case TIMEOID:
case TIMETZOID:
case DATEOID:
case BYTEAOID:
case INTERVALOID:
    *op->resvalue = PointerGetDatum(DatumGetTextP(value));
    break;
default:
    elog(ERROR, "unsupported target data type for XMLCast");
}

Thanks for the review. Much appreciated.

v10 attached.

Best, Jim

1 - https://wiki.postgresql.org/wiki/Todo
2 - /messages/by-id/998465cb-2fda-4497-8194-87da56748186@uni-muenster.de

Attachments:

v10-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v10-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+2249-11
#16Robert Haas
robertmhaas@gmail.com
In reply to: Jim Jones (#15)
Re: [PoC] XMLCast (SQL/XML X025)

On Mon, May 19, 2025 at 7:11 PM Jim Jones <jim.jones@uni-muenster.de> wrote:

Not quite yet -- unless there is an expiration date that I am not aware
of :). If we decide we don't need XMLCast on Postgres after all, I'd
suggest to delete it from the todo list on the wiki [1] - I've already
added a link to this thread there.

Yeah. Just to be clear, I can't really think of committing a patch in
this area because I don't know the topic well enough. I can make some
general comments on what I see as issues with this patch but serious
review would really need to come from a committer who is more familiar
with the XML specifications than I am. If nobody like that shows up,
this proposal won't be able to advance.

and since you didn't reply, I assumed I had already addressed your
comments. But now I see it was not the case.

Hmm, sorry if I wasn't clear enough. I think there needs to be more
explanation of quite a few things in the patch itself.

To make that more concrete, the patch says:

+    Another option to convert character strings into xml is the
function <function>xmlcast</function>,
+    which is designed to cast SQL data types into <type>xml</type>,
and vice versa.

But while it explains the behavior of this option, it does not explain
how the behavior is the same or different from other options, or why.

I'm not entirely sure what you mean by "other options".

Well, the sentence begins with "Another option". Let's say we were
talking about making cookies. I could say "Another option, if you
don't have butter, is to substitute Crisco." But if I do that, I
should then go on to explain further: "However, if you do this, it may
affect the flavor of the cookies and they may brown differently in the
oven. Nevertheless, it's better than not having cookies." Your patch
seemed to me to be lacking any further explanation of this kind. When
we document that there are multiple options, we should try to give
some context to help the user choose between them. In my cookie-based
example, the additional text makes it clear why I would select the
Crisco option: I might be out of butter, and something is better than
nothing. In your case, it was not clear to me why someone should
choose XMLCAST over options or the other way around.

To be clear, I don't want you to explain it *to me*. I want you to
explain it to the reader of the documentation.

In the comments it says:

+ /* These data types must be converted to their ISO 8601 representations */

To me this just begs the question "says who?".

Says the SQL/XML:2023 standard :)

SQL/XML:2023 (ISO/IEC 9075-14:2023) - “General Rules” of §6.7.3 (d.ii.1
and d.ii.2):

Cool. You should put that in the patch.

+ default:
+ *op->resvalue = PointerGetDatum(DatumGetTextP(value));
+ break;

This doesn't seem very safe at all. If we know what type OIDs we
intend this to handle, then we could list them out explicitly as is
already done for TEXTOID, VARCHAROID, etc. If we don't, then why
should we believe that it's a data type for which DatumGetTextP will
produce a non-garbage return value? Maybe there's an answer to that
question, but there's no comment spelling it out; or maybe it's
actually just broken.

Given that XMLCast converts values between SQL and XML and vice versa,
my rationale was that if the target type is not a text type (such as
TEXTOID, VARCHAROID, or NAMEOID), then the cast operand must be of type
xml, which makes this default: safe.
[...]
But I can see it looks unsafe. Do you have something like this in mind?
[...]
default:
elog(ERROR, "unsupported target data type for XMLCast");
}

Yes, exactly.

--
Robert Haas
EDB: http://www.enterprisedb.com

#17Jim Jones
jim.jones@uni-muenster.de
In reply to: Robert Haas (#16)
Re: [PoC] XMLCast (SQL/XML X025)

Hi Robert

On 21.05.25 19:10, Robert Haas wrote:

Yeah. Just to be clear, I can't really think of committing a patch in
this area because I don't know the topic well enough. I can make some
general comments on what I see as issues with this patch but serious
review would really need to come from a committer who is more familiar
with the XML specifications than I am.

I understand. And I really do appreciate that you take the time to take
a look at it nevertheless.

If nobody like that shows up,
this proposal won't be able to advance.

So I'll keep my fingers crossed that somebody shows up :)

Well, the sentence begins with "Another option". Let's say we were
talking about making cookies. I could say "Another option, if you
don't have butter, is to substitute Crisco." But if I do that, I
should then go on to explain further: "However, if you do this, it may
affect the flavor of the cookies and they may brown differently in the
oven. Nevertheless, it's better than not having cookies." Your patch
seemed to me to be lacking any further explanation of this kind. When
we document that there are multiple options, we should try to give
some context to help the user choose between them. In my cookie-based
example, the additional text makes it clear why I would select the
Crisco option: I might be out of butter, and something is better than
nothing. In your case, it was not clear to me why someone should
choose XMLCAST over options or the other way around.

To be clear, I don't want you to explain it *to me*. I want you to
explain it to the reader of the documentation.

Got it.

In v10 I added this to the documentation to make the difference to CAST
clearer:

Similar to the SQL function <function>CAST</function>, this function
converts an <replaceable>expression into the specified
<replaceable>type</replaceable>. It is primarily used for converting
between SQL values and <type>xml</type> values in a standards-compliant way.

Unlike <function>CAST</function>, which may coerce SQL values into text
or XML without enforcing a specific lexical representation,
<function>xmlcast</function> ensures that the conversion produces or
expects a canonical XML Schema lexical form appropriate for the target
type. For example, an <type>interval</type> value is rendered as
<literal>P1Y2M</literal> (<type>xs:duration</type>), and a
<type>timestamp</type> as  <literal>2023-05-19T14:30:00Z</literal>
(xs:dateTime). Similarly, when converting from XML to SQL types,
<function>xmlcast</function> validates that the input string conforms to
the lexical format required by the corresponding SQL type.

Says the SQL/XML:2023 standard :)

SQL/XML:2023 (ISO/IEC 9075-14:2023) - “General Rules” of §6.7.3 (d.ii.1
and d.ii.2):

Cool. You should put that in the patch.

In v10 I changed these comments to:

/*
 * SQL date/time types must be mapped to XML Schema types when casting
to XML:
 *   - DATE                        -> xs:date
 *   - TIME [WITH/WITHOUT TZ]      -> xs:time
 *   - TIMESTAMP [WITH/WITHOUT TZ] -> xs:dateTime
 *
 * These mappings are defined in SQL/XML:2023 (ISO/IEC 9075-14:2023),
 * Subclause 6.7 "<XML cast specification>", item 15.e.i–v.
 *
 * The corresponding XML Schema lexical formats (e.g., "2023-05-19",
"14:30:00Z",
 * "2023-05-19T14:30:00+01:00") follow ISO 8601 and are specified in
 * W3C XML Schema Part 2: Primitive Datatypes §3.2.7 (dateTime) and
§3.2.9 (date).
 */

and

/*
 * SQL interval types must be mapped to XML Schema types when casting to
XML:
 *   - Year-month intervals -> xs:yearMonthDuration
 *   - Day-time intervals    -> xs:dayTimeDuration
 *
 * This behavior is required by SQL/XML:2023 (ISO/IEC 9075-14:2023),
 * Subclause 6.7 "<XML cast specification>", General Rules, item 3.d.ii.1–2.
 *
 * These XML Schema types require ISO 8601-compatible lexical
representations,
 * such as: "P1Y2M", "P3DT4H5M", or "P1Y2M3DT4H5M6S", as defined in
 * W3C XML Schema Part 2: Primitve Datatypes, §3.2.6 (duration)
 */

Given that XMLCast converts values between SQL and XML and vice versa,
my rationale was that if the target type is not a text type (such as
TEXTOID, VARCHAROID, or NAMEOID), then the cast operand must be of type
xml, which makes this default: safe.
[...]
But I can see it looks unsafe. Do you have something like this in mind?
[...]
default:
elog(ERROR, "unsupported target data type for XMLCast");
}

Yes, exactly.

Done in v10.

Thanks!

Jim

#18Robert Haas
robertmhaas@gmail.com
In reply to: Jim Jones (#17)
Re: [PoC] XMLCast (SQL/XML X025)

On Wed, May 21, 2025 at 2:22 PM Jim Jones <jim.jones@uni-muenster.de> wrote:

In v10 I added this to the documentation to make the difference to CAST
clearer:

Yes, that looks very helpful.

In v10 I changed these comments to:

That, too.

I don't have time to re-review this right now, but I encourage you to
look through the patch for other, similar places that could benefit
from a fuller explanation. And I hope somebody else shows up to
express interest in this so that your work is not wasted...

--
Robert Haas
EDB: http://www.enterprisedb.com

#19Jim Jones
jim.jones@uni-muenster.de
In reply to: Robert Haas (#18)
Re: [PoC] XMLCast (SQL/XML X025)

On 22.05.25 16:02, Robert Haas wrote:

I don't have time to re-review this right now, but I encourage you to
look through the patch for other, similar places that could benefit
from a fuller explanation. And I hope somebody else shows up to
express interest in this so that your work is not wasted...

Took another look at the code and added more references to the SQL/XML
spec where it made sense -- hope it's not too verbose now :)

v11 attached.

Best regards, Jim

Attachments:

v11-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v11-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+2276-11
#20Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#19)
Re: [PoC] XMLCast (SQL/XML X025)

rebased

Jim

Attachments:

v12-0001-Add-XMLCast-function-SQL-XML-X025.patchtext/x-patch; charset=UTF-8; name=v12-0001-Add-XMLCast-function-SQL-XML-X025.patchDownload+2276-11
#21Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#20)
#22Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#21)
#23Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#22)
#24Marcos Pegoraro
marcos@f10.com.br
In reply to: Jim Jones (#23)
#25Jim Jones
jim.jones@uni-muenster.de
In reply to: Marcos Pegoraro (#24)
#26Marcos Pegoraro
marcos@f10.com.br
In reply to: Jim Jones (#25)
#27Jim Jones
jim.jones@uni-muenster.de
In reply to: Marcos Pegoraro (#26)
#28Marcos Pegoraro
marcos@f10.com.br
In reply to: Jim Jones (#27)
#29Jim Jones
jim.jones@uni-muenster.de
In reply to: Marcos Pegoraro (#28)
#30Marcos Pegoraro
marcos@f10.com.br
In reply to: Jim Jones (#29)
#31Jim Jones
jim.jones@uni-muenster.de
In reply to: Marcos Pegoraro (#30)
#32Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#31)
#33Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#32)
#34Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#33)
#35Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#34)
#36Jim Jones
jim.jones@uni-muenster.de
In reply to: Jim Jones (#35)