XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

Started by Markus Winandalmost 8 years ago10 messages
#1Markus Winand
markus.winand@winand.at
2 attachment(s)

Hi!

I’ve found two issues with XML/XPath integration in PostgreSQL. Two patches are attached. I’ve just separated them because the second one is an incompatible change.

* Patch 0001: Accept TEXT and CDATA nodes in XMLTABLE’s column_expression.

Column_expressions that match TEXT or CDATA nodes must return
the contents of the node itself, not the content of the
non-existing childs (i.e. the empty string).

The following query returns a wrong result in master:

SELECT *
FROM (VALUES ('<xml>text</xml>'::xml)
, ('<xml><![CDATA[<cdata>]]></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" TEXT PATH 'node()'
) t

x | node()
--------------------------------+--------
<xml>text</xml> |
<xml><![CDATA[<cdata>]]></xml> |

(the “node()” column is empty)

The correct result can be obtained with patch 0001 applied:

x | node()
--------------------------------+---------
<xml>text</xml> | text
<xml><![CDATA[<cdata>]]></xml> | <cdata>

The patch adopts existing tests to guard against future regressions.

* Patch 0002: Set correct context for XPath evaluation.

According to the ISO standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML[1] - not the root node.

The respective code is shared with non-ISO functions such as xpath
and xpath_exists so that the change also affects these functions.

It’s an incompatible change - even the regression tests need to be adjusted because they
test for the “wrong” behaviour. The documentation, on the other hand, does neither
document the behaviour explicitly, no does it show any example that breaks due to this patch.

The alternatives to this patch are (1) documenting the current standard-breaking behaviour
and (2) fixing the context only for ISO functions. Personally, I don’t like either of them.

I’ve checked the libxml2 code to see what’s the proper way to set the context to the
document node. It’s indeed just “(xmlNodePtr)ctxt->doc”.

See: https://github.com/GNOME/libxml2/blob/7abec671473b837f99181442d59edd0cc2ee01d1/xpath.c#L14306

-markus

--
[1]: SQL/XML:2011, 6.18 General Rule 1aii2.

Attachments:

0001-Accept-TEXT-and-CDATA-nodes-in-XMLTABLE-s-column_exp.patchapplication/octet-stream; name=0001-Accept-TEXT-and-CDATA-nodes-in-XMLTABLE-s-column_exp.patchDownload
From fca715a250a9a49d23193b34a72b04e80e024d71 Mon Sep 17 00:00:00 2001
From: Markus Winand <markus.winand@winand.at>
Date: Tue, 27 Mar 2018 08:47:53 +0000
Subject: [PATCH 1/2] Accept TEXT and CDATA nodes in XMLTABLE's
 column_expression.

Column_expressions that match TEXT or CDATA nodes must return
the contents of the node itself, not the content of the
non-existing childs (i.e. the empty string).
---
 src/backend/utils/adt/xml.c         | 14 +++++++++-----
 src/test/regress/expected/xml_2.out | 12 ++++++------
 src/test/regress/sql/xml.sql        |  4 ++--
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 7cdb87e..3e8cc55 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4509,8 +4509,15 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 			{
 				xmlChar    *str;
 
+				/* Most nodes (elements and even attributes) store their data in child nodes.
+				 * If they don't have child nodes, it means that they are empty (e.g. <element/>).
+				 * Text nodes and CDATA sections are the exception: the don't have childs
+				 * but have content in the Text/CDATA node itself.
+				 */
 				str = xmlNodeListGetString(xtCxt->doc,
-										   xpathobj->nodesetval->nodeTab[0]->xmlChildrenNode,
+										   xpathobj->nodesetval->nodeTab[0]->type == XML_CDATA_SECTION_NODE ||
+										   xpathobj->nodesetval->nodeTab[0]->type == XML_TEXT_NODE ? xpathobj->nodesetval->nodeTab[0]
+																								   : xpathobj->nodesetval->nodeTab[0]->xmlChildrenNode,
 										   1);
 
 				if (str != NULL)
@@ -4531,10 +4538,7 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 				{
 					/*
 					 * This line ensure mapping of empty tags to PostgreSQL
-					 * value. Usually we would to map a empty tag to empty
-					 * string. But this mapping can create empty string when
-					 * user doesn't expect it - when empty tag is enforced by
-					 * libxml2 - when user uses a text() function for example.
+					 * values.
 					 */
 					cstr = "";
 				}
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 112ebe4..cb865a9 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -1004,7 +1004,7 @@ SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
@@ -1026,7 +1026,7 @@ CREATE VIEW xmltableview1 AS SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
@@ -1055,7 +1055,7 @@ CREATE OR REPLACE VIEW public.xmltableview1 AS
     "xmltable".premier_name
    FROM ( SELECT xmldata.data
            FROM xmldata) x,
-    LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
+    LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
 EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1;
                QUERY PLAN                
 -----------------------------------------
@@ -1065,15 +1065,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1;
 (3 rows)
 
 EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM xmltableview1;
-                                                                                                                                                                                                                        QUERY PLAN                                                                                                                                                                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                                                                                                                            QUERY PLAN                                                                                                                                                                                                                            
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop
    Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name
    ->  Seq Scan on public.xmldata
          Output: xmldata.data
    ->  Table Function Scan on "xmltable"
          Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name
-         Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
+         Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
 (7 rows)
 
 -- XMLNAMESPACES tests
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index cb96e18..c223603 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -349,7 +349,7 @@ SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
@@ -362,7 +362,7 @@ CREATE VIEW xmltableview1 AS SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
-- 
1.9.1

0002-Set-correct-context-for-XPath-evaluation.patchapplication/octet-stream; name=0002-Set-correct-context-for-XPath-evaluation.patchDownload
From 1ecc1322fdf94fc9f779122b5285aebae0f6a2ca Mon Sep 17 00:00:00 2001
From: Markus Winand <markus.winand@winand.at>
Date: Tue, 27 Mar 2018 08:36:59 +0000
Subject: [PATCH 2/2] Set correct context for XPath evaluation.

According to the ISO standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML - not the root node.

The respective code is shared with non-ISO functions such as xpath
and xpath_exists so that the change also affects these functions.
---
 src/backend/utils/adt/xml.c         | 10 ++--------
 src/test/regress/expected/xml_2.out |  8 +++++++-
 src/test/regress/sql/xml.sql        |  3 ++-
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 3e8cc55..c8b681f 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3934,10 +3934,7 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 		if (xpathctx == NULL || xmlerrcxt->err_occurred)
 			xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 						"could not allocate XPath context");
-		xpathctx->node = xmlDocGetRootElement(doc);
-		if (xpathctx->node == NULL || xmlerrcxt->err_occurred)
-			xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
-						"could not find root XML element");
+		xpathctx->node = (xmlNodePtr)doc;
 
 		/* register namespaces, if any */
 		if (ns_count > 0)
@@ -4276,10 +4273,7 @@ XmlTableSetDocument(TableFuncScanState *state, Datum value)
 		if (xpathcxt == NULL || xtCxt->xmlerrcxt->err_occurred)
 			xml_ereport(xtCxt->xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 						"could not allocate XPath context");
-		xpathcxt->node = xmlDocGetRootElement(doc);
-		if (xpathcxt->node == NULL || xtCxt->xmlerrcxt->err_occurred)
-			xml_ereport(xtCxt->xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
-						"could not find root XML element");
+		xpathcxt->node = (xmlNodePtr)doc;
 	}
 	PG_CATCH();
 	{
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index cb865a9..3ec56e4 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -650,6 +650,12 @@ SELECT xpath('/nosuchtag', '<root/>');
  {}
 (1 row)
 
+SELECT xpath('root', '<root/>');
+   xpath   
+-----------
+ {<root/>}
+(1 row)
+
 -- Round-trip non-ASCII data through xpath().
 DO $$
 DECLARE
@@ -1192,7 +1198,7 @@ SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaa
 SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text PATH 'element/text()'); -- should fail
 ERROR:  more than one value returned by column XPath expression
 -- CDATA test
-select * from xmltable('r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
+select * from xmltable('d/r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
             c            
 -------------------------
  <hello> &"<>!<a>foo</a>
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index c223603..3b91b56 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -188,6 +188,7 @@ SELECT xpath('count(//*)=0', '<root><sub/><sub/></root>');
 SELECT xpath('count(//*)=3', '<root><sub/><sub/></root>');
 SELECT xpath('name(/*)', '<root><sub/><sub/></root>');
 SELECT xpath('/nosuchtag', '<root/>');
+SELECT xpath('root', '<root/>');
 
 -- Round-trip non-ASCII data through xpath().
 DO $$
@@ -423,7 +424,7 @@ SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaa
 SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text PATH 'element/text()'); -- should fail
 
 -- CDATA test
-select * from xmltable('r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
+select * from xmltable('d/r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
 
 -- XML builtin entities
 SELECT * FROM xmltable('/x/a' PASSING '<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>' COLUMNS ent text);
-- 
1.9.1

#2Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Markus Winand (#1)
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

Markus Winand wrote:

Hi Markus,

I’ve found two issues with XML/XPath integration in PostgreSQL. Two
patches are attached. I’ve just separated them because the second one
is an incompatible change.

Thanks for these. I'll have a look at them after the commitfest is
over. In the meantime, if Peter or Pavel have comments, it'd be good to
hear them.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Markus Winand (#1)
1 attachment(s)
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

Hello Markus,

Markus Winand wrote:

* Patch 0001: Accept TEXT and CDATA nodes in XMLTABLE’s column_expression.

Column_expressions that match TEXT or CDATA nodes must return
the contents of the node itself, not the content of the
non-existing childs (i.e. the empty string).

The following query returns a wrong result in master:

SELECT *
FROM (VALUES ('<xml>text</xml>'::xml)
, ('<xml><![CDATA[<cdata>]]></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" TEXT PATH 'node()'
) t

The correct result can be obtained with patch 0001 applied:

x | node()
--------------------------------+---------
<xml>text</xml> | text
<xml><![CDATA[<cdata>]]></xml> | <cdata>

The patch adopts existing tests to guard against future regressions.

I remembered while reading this that Craig Ringer had commented that XML
text sections can have multiple children: just put XML comments amidst
the text. To test this, I added a comment in one of the text values, in
the regression database:

update xmldata set data = regexp_replace(data::text, 'HongKong', 'Hong<!-- a space -->Kong')::xml;

With the modified data, the new query in the regression tests fails:

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS id int PATH '@id',
_id FOR ORDINALITY,
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
country_id text PATH 'COUNTRY_ID',
region_id int PATH 'REGION_ID',
size float PATH 'SIZE',
unit text PATH 'SIZE/@unit',
premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');

ERROR: more than one value returned by column XPath expression

This seems really undesirable, so I looked into getting it fixed. Turns
out that this error is being thrown from inside the same function we're
modifying, line 4559. I didn't have a terribly high opinion of that
ereport() when working on this feature to begin with, so now that it's
proven to provide a bad experience, let's try removing it. With that
change (patch attached), the feature seems to work; I tried with this
query, which seems to give the expected output (notice we have some
columns of type xml, others of type text, with and without the text()
function in the path):

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
country_name_xml xml PATH 'COUNTRY_NAME/text()' NOT NULL,
country_name_n text PATH 'COUNTRY_NAME' NOT NULL,
country_name_xml_n xml PATH 'COUNTRY_NAME' NOT NULL);
country_name │ country_name_xml │ country_name_n │ country_name_xml_n
────────────────┼──────────────────┼────────────────┼───────────────────────────────────────────────────────
Australia │ Australia │ Australia │ <COUNTRY_NAME>Australia</COUNTRY_NAME>
China │ China │ China │ <COUNTRY_NAME>China</COUNTRY_NAME>
HongKong │ HongKong │ HongKong │ <COUNTRY_NAME>Hong<!-- a space -->Kong</COUNTRY_NAME>
India │ India │ India │ <COUNTRY_NAME>India</COUNTRY_NAME>
Japan │ Japan │ Japan │ <COUNTRY_NAME>Japan</COUNTRY_NAME>
Singapore │ Singapore │ Singapore │ <COUNTRY_NAME>Singapore</COUNTRY_NAME>
Czech Republic │ Czech Republic │ Czech Republic │ <COUNTRY_NAME>Czech Republic</COUNTRY_NAME>
Germany │ Germany │ Germany │ <COUNTRY_NAME>Germany</COUNTRY_NAME>
France │ France │ France │ <COUNTRY_NAME>France</COUNTRY_NAME>
Egypt │ Egypt │ Egypt │ <COUNTRY_NAME>Egypt</COUNTRY_NAME>
Sudan │ Sudan │ Sudan │ <COUNTRY_NAME>Sudan</COUNTRY_NAME>
(11 filas)

This means that the concatenation now works for all types, not just xml, so I
can do this also:

update xmldata set data = regexp_replace(data::text, '791', '7<!--small country-->91')::xml;

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
"SIZE" float, size_xml xml PATH 'SIZE');
country_name │ size_text │ SIZE
────────────────┼───────────┼──────
Australia │ │
China │ │
HongKong │ │
India │ │
Japan │ │
Singapore │ 791 │ 791
Czech Republic │ │
Germany │ │
France │ │
Egypt │ │
Sudan │ │
(11 filas)

I'm not sure if this concatenation of things that are not text or XML is
undesirable for whatever reason. Any clues?

Also, array indexes behave funny. First let's add more XML comments
inside that number, and query for the subscripts:

update xmldata set data = regexp_replace(data::text, '7<!--small country-->91', '<!--ah-->7<!--oh-->9<!--uh-->1')::xml;

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')
where size_text is not null;

country_name │ size_text │ size_text_1 │ size_text_2 │ size_text_3 │ SIZE │ size_xml
──────────────┼───────────┼─────────────┼─────────────┼─────────────┼──────┼───────────────────────────────────────────────────────
Singapore │ 791 │ 791 │ 91 │ 1 │ 791 │ <SIZE unit="km"><!--ah-->7<!--oh-->9<!--uh-->1</SIZE>
(1 fila)

I don't know what to make of that. Seems pretty broken.

After this, I looked for some examples of XPath syntax in the interwebs.
I came across the | operator (which apparently is a "union" of some
sort). Behold:

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text text PATH 'SIZE/text() | SIZE/@unit')
where size_text is not null ;

country_name │ size_text
──────────────┼───────────
Singapore │ km791
(1 fila)

The "units" property is ahead of the text(), which is pretty odd. But if I
remove the text() call, it puts the units after the text:

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text text PATH 'SIZE | SIZE/@unit')
where size_text is not null ;
country_name │ size_text
──────────────┼─────────────────────────────────────────────────────────
Singapore │ <SIZE unit="km"><!--ah-->7<!--oh-->9<!--uh-->1</SIZE>km
(1 fila)

Again, I don't know what to make of this. Anyway, this seems firmly in
the libxml side of things; the only conclusion I have is that nobody
ever uses libxml terribly much for complex XPath processing.

Basing another test on your original test case, look at the first row
here. Is this result correct?

SELECT *
FROM (VALUES ('<xml>te<!-- ahoy -->xt</xml>'::xml)
, ('<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" TEXT PATH 'node()'
) t;
x │ node()
───────────────────────────────────────────────────────────┼────────────────────────────────────
<xml>te<!-- ahoy -->xt</xml> │ te ahoy xt
<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml> │ some <!-- really --> weird <stuff>

Why was the comment contents expanded inside node()? Note what happens
if I change the type from text to xml in that column:

SELECT *
FROM (VALUES ('<xml>te<!-- ahoy -->xt</xml>'::xml)
, ('<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" xml PATH 'node()'
) t;

x │ node()
───────────────────────────────────────────────────────────┼────────────────────────────────────────────────
<xml>te<!-- ahoy -->xt</xml> │ te ahoy xt
<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml> │ some &lt;!-- really --&gt; weird &lt;stuff&gt;
(2 filas)

Further note that, per the standard, you can omit the PATH clause in
which case the column name is used as the path, which seems to work
correctly. Phew!

* Patch 0002: Set correct context for XPath evaluation.

According to the ISO standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML[1] - not the root node.

The respective code is shared with non-ISO functions such as xpath
and xpath_exists so that the change also affects these functions.

It’s an incompatible change - even the regression tests need to be adjusted because they
test for the “wrong” behaviour.

I'm really afraid to change the non-ISO functions in PG10, since
it's already released for quite some time with this long-standing
behavior; and I'm not sure it'll do much good to change XMLTABLE in PG10
and leave the non-iso functions unpatched. I think the easiest route is
to patch all functions only for PostgreSQL 11. Maybe if XMLTABLE is
considered unacceptably broken in PostgreSQL 10 we can patch only that
one.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-Accept-TEXT-and-CDATA-nodes-in-XMLTABLE-s-column_exp.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 233dd63e89..06bdbaf7c9 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4505,11 +4505,22 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 			else if (count == 1)
 			{
 				xmlChar    *str;
+				xmlNodePtr	object;
 
-				str = xmlNodeListGetString(xtCxt->doc,
-										   xpathobj->nodesetval->nodeTab[0]->xmlChildrenNode,
-										   1);
+				object = xpathobj->nodesetval->nodeTab[0];
 
+				/*
+				 * Most nodes (elements and even attributes) store their data
+				 * in child nodes.  If they don't have child nodes, it means
+				 * that they are empty (e.g. <element/>).  Text nodes and
+				 * CDATA sections are the exception: they don't have children
+				 * but the contents are in the Text/CDATA node itself.
+				 */
+				if (object->type != XML_CDATA_SECTION_NODE &&
+					object->type != XML_TEXT_NODE)
+					object = object->children;
+
+				str = xmlNodeListGetString(xtCxt->doc, object, 1);
 				if (str != NULL)
 				{
 					PG_TRY();
@@ -4528,10 +4539,7 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 				{
 					/*
 					 * This line ensure mapping of empty tags to PostgreSQL
-					 * value. Usually we would to map a empty tag to empty
-					 * string. But this mapping can create empty string when
-					 * user doesn't expect it - when empty tag is enforced by
-					 * libxml2 - when user uses a text() function for example.
+					 * values.
 					 */
 					cstr = "";
 				}
@@ -4543,16 +4551,6 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 
 				Assert(count > 1);
 
-				/*
-				 * When evaluating the XPath expression returns multiple
-				 * nodes, the result is the concatenation of them all. The
-				 * target type must be XML.
-				 */
-				if (typid != XMLOID)
-					ereport(ERROR,
-							(errcode(ERRCODE_CARDINALITY_VIOLATION),
-							 errmsg("more than one value returned by column XPath expression")));
-
 				/* Concatenate serialized values */
 				initStringInfo(&str);
 				for (i = 0; i < count; i++)
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 7fa1309108..3eb638ca25 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1024,7 +1024,7 @@ SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
@@ -1046,7 +1046,7 @@ CREATE VIEW xmltableview1 AS SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
@@ -1075,7 +1075,7 @@ CREATE OR REPLACE VIEW public.xmltableview1 AS
     "xmltable".premier_name
    FROM ( SELECT xmldata.data
            FROM xmldata) x,
-    LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
+    LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
 EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1;
                QUERY PLAN                
 -----------------------------------------
@@ -1085,15 +1085,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1;
 (3 rows)
 
 EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM xmltableview1;
-                                                                                                                                                                                                                        QUERY PLAN                                                                                                                                                                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                                                                                                                            QUERY PLAN                                                                                                                                                                                                                            
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop
    Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name
    ->  Seq Scan on public.xmldata
          Output: xmldata.data
    ->  Table Function Scan on "xmltable"
          Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name
-         Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
+         Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
 (7 rows)
 
 -- XMLNAMESPACES tests
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 112ebe47cd..cb865a9ef7 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -1004,7 +1004,7 @@ SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
@@ -1026,7 +1026,7 @@ CREATE VIEW xmltableview1 AS SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
@@ -1055,7 +1055,7 @@ CREATE OR REPLACE VIEW public.xmltableview1 AS
     "xmltable".premier_name
    FROM ( SELECT xmldata.data
            FROM xmldata) x,
-    LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
+    LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
 EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1;
                QUERY PLAN                
 -----------------------------------------
@@ -1065,15 +1065,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1;
 (3 rows)
 
 EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM xmltableview1;
-                                                                                                                                                                                                                        QUERY PLAN                                                                                                                                                                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                                                                                                                            QUERY PLAN                                                                                                                                                                                                                            
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop
    Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name
    ->  Seq Scan on public.xmldata
          Output: xmldata.data
    ->  Table Function Scan on "xmltable"
          Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name
-         Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
+         Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text))
 (7 rows)
 
 -- XMLNAMESPACES tests
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index cb96e18005..c223603a1f 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -349,7 +349,7 @@ SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
@@ -362,7 +362,7 @@ CREATE VIEW xmltableview1 AS SELECT  xmltable.*
                          PASSING data
                          COLUMNS id int PATH '@id',
                                   _id FOR ORDINALITY,
-                                  country_name text PATH 'COUNTRY_NAME' NOT NULL,
+                                  country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                   country_id text PATH 'COUNTRY_ID',
                                   region_id int PATH 'REGION_ID',
                                   size float PATH 'SIZE',
#4Markus Winand
markus.winand@winand.at
In reply to: Alvaro Herrera (#3)
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

Hi and thanks for your efforts.

On 2018-04-26, at 21:18 , Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Hello Markus,

Markus Winand wrote:

* Patch 0001: Accept TEXT and CDATA nodes in XMLTABLE’s column_expression.

Column_expressions that match TEXT or CDATA nodes must return
the contents of the node itself, not the content of the
non-existing childs (i.e. the empty string).

The following query returns a wrong result in master:

SELECT *
FROM (VALUES ('<xml>text</xml>'::xml)
, ('<xml><![CDATA[<cdata>]]></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" TEXT PATH 'node()'
) t

The correct result can be obtained with patch 0001 applied:

x | node()
--------------------------------+---------
<xml>text</xml> | text
<xml><![CDATA[<cdata>]]></xml> | <cdata>

The patch adopts existing tests to guard against future regressions.

I remembered while reading this that Craig Ringer had commented that XML
text sections can have multiple children: just put XML comments amidst
the text.

As per my understanding of XML, this is a sibling—not a child—of two text nodes.

To test this, I added a comment in one of the text values, in
the regression database:

update xmldata set data = regexp_replace(data::text, 'HongKong', 'Hong<!-- a space -->Kong')::xml;

With the modified data, the new query in the regression tests fails:

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS id int PATH '@id',
_id FOR ORDINALITY,
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
country_id text PATH 'COUNTRY_ID',
region_id int PATH 'REGION_ID',
size float PATH 'SIZE',
unit text PATH 'SIZE/@unit',
premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');

ERROR: more than one value returned by column Path expression

I’ve tried matching “node()” against XML that includes a mixture of text and
comment nodes in other database products. Two out of two failed with a
similar error message.

- ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence
- SQLCODE=-16003, SQLSTATE=10507, SQLERRMC=( item(), item()+ )
See: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.messages.sql.doc/doc/msql16003n.html

So I digged through the standard to figure out what the standard-mandated
behaviour is.

The bottom line is that an error is the standard mandated behavior because
SQL uses a XPath/XQuery “cast" in the procedure. The XPath/XQuery spec
says “If the result of atomization is a sequence of more than one atomic value,
a type error is raised [err:XPTY0004]” (compare that to the ORA- above).
https://www.w3.org/TR/xpath-31/#id-cast

For reference, how I came there:
The XPath/XQuery cast is triggered for non XML types in
XMLCAST (SQL-14:2011 6.6 GR 4 h iv.) which is in turn triggered by
XMLTABLE (SQL-14:2011 7.1 SR 4 e ii 8).
[Note: I only have SQL-14:2011, thus no references to :2016]

This seems really undesirable, so I looked into getting it fixed. Turns
out that this error is being thrown from inside the same function we're
modifying, line 4559. I didn't have a terribly high opinion of that
ereport() when working on this feature to begin with, so now that it's
proven to provide a bad experience, let's try removing it. With that
change (patch attached), the feature seems to work; I tried with this
query, which seems to give the expected output (notice we have some
columns of type xml, others of type text, with and without the text()
function in the path):

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
country_name_xml xml PATH 'COUNTRY_NAME/text()' NOT NULL,
country_name_n text PATH 'COUNTRY_NAME' NOT NULL,
country_name_xml_n xml PATH 'COUNTRY_NAME' NOT NULL);
country_name │ country_name_xml │ country_name_n │ country_name_xml_n
────────────────┼──────────────────┼────────────────┼───────────────────────────────────────────────────────
Australia │ Australia │ Australia │ <COUNTRY_NAME>Australia</COUNTRY_NAME>
China │ China │ China │ <COUNTRY_NAME>China</COUNTRY_NAME>
HongKong │ HongKong │ HongKong │ <COUNTRY_NAME>Hong<!-- a space -->Kong</COUNTRY_NAME>
India │ India │ India │ <COUNTRY_NAME>India</COUNTRY_NAME>
Japan │ Japan │ Japan │ <COUNTRY_NAME>Japan</COUNTRY_NAME>
Singapore │ Singapore │ Singapore │ <COUNTRY_NAME>Singapore</COUNTRY_NAME>
Czech Republic │ Czech Republic │ Czech Republic │ <COUNTRY_NAME>Czech Republic</COUNTRY_NAME>
Germany │ Germany │ Germany │ <COUNTRY_NAME>Germany</COUNTRY_NAME>
France │ France │ France │ <COUNTRY_NAME>France</COUNTRY_NAME>
Egypt │ Egypt │ Egypt │ <COUNTRY_NAME>Egypt</COUNTRY_NAME>
Sudan │ Sudan │ Sudan │ <COUNTRY_NAME>Sudan</COUNTRY_NAME>
(11 filas)

This means that the concatenation now works for all types, not just xml, so I
can do this also:

update xmldata set data = regexp_replace(data::text, '791', '7<!--small country-->91')::xml;

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
"SIZE" float, size_xml xml PATH 'SIZE');
country_name │ size_text │ SIZE
────────────────┼───────────┼──────
Australia │ │
China │ │
HongKong │ │
India │ │
Japan │ │
Singapore │ 791 │ 791
Czech Republic │ │
Germany │ │
France │ │
Egypt │ │
Sudan │ │
(11 filas)

I'm not sure if this concatenation of things that are not text or XML is
undesirable for whatever reason. Any clues?

As per above the reason against it is standard conformance.

Also, array indexes behave funny. First let's add more XML comments
inside that number, and query for the subscripts:

update xmldata set data = regexp_replace(data::text, '7<!--small country-->91', '<!--ah-->7<!--oh-->9<!--uh-->1')::xml;

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')
where size_text is not null;

country_name │ size_text │ size_text_1 │ size_text_2 │ size_text_3 │ SIZE │ size_xml
──────────────┼───────────┼─────────────┼─────────────┼─────────────┼──────┼───────────────────────────────────────────────────────
Singapore │ 791 │ 791 │ 91 │ 1 │ 791 │ <SIZE unit="km"><!--ah-->7<!--oh-->9<!--uh-->1</SIZE>
(1 fila)

I don't know what to make of that. Seems pretty broken.

Absolutely.

Also, node() matching comments or processing instructions
seems to be broken too:

SELECT *
FROM (VALUES ('<xml><!--comment--></xml>'::xml)
, ('<xml><?pi content?></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" TEXT PATH 'node()'
) t

x | node()
---------------------------+--------
<xml><!--comment--></xml> |
<xml><?pi content?></xml> |
(2 rows)

I can look into this, but it may take a while.

As per my understanding explained above, I’d base future work on my original patch.

After this, I looked for some examples of XPath syntax in the interwebs.
I came across the | operator (which apparently is a "union" of some
sort). Behold:

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text text PATH 'SIZE/text() | SIZE/@unit')
where size_text is not null ;

country_name │ size_text
──────────────┼───────────
Singapore │ km791
(1 fila)

The "units" property is ahead of the text(), which is pretty odd.

As per above, I think this is outside the spec (as it matches several nodes).

Other than that it is related to the “document order” in which XPath
generally returns matched nodes (except a few axis that return their matched
in reverse order—e.g. ancestor::)

But if I
remove the text() call, it puts the units after the text:

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text text PATH 'SIZE | SIZE/@unit')
where size_text is not null ;
country_name │ size_text
──────────────┼─────────────────────────────────────────────────────────
Singapore │ <SIZE unit="km"><!--ah-->7<!--oh-->9<!--uh-->1</SIZE>km
(1 fila)

Again, I don't know what to make of this.

As per above, the document order is:

1. <size> opening tag
2. the @unit attribute for size
3. the text() inside the size.

Now that you match (1) instead of (3), it is returned first.

Anyway, this seems firmly in
the libxml side of things; the only conclusion I have is that nobody
ever uses libxml terribly much for complex XPath processing.

Basing another test on your original test case, look at the first row
here. Is this result correct?

SELECT *
FROM (VALUES ('<xml>te<!-- ahoy -->xt</xml>'::xml)
, ('<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" TEXT PATH 'node()'
) t;
x │ node()
───────────────────────────────────────────────────────────┼────────────────────────────────────
<xml>te<!-- ahoy -->xt</xml> │ te ahoy xt
<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml> │ some <!-- really --> weird <stuff>

Why was the comment contents expanded inside node()?

Because node() matches also comments and your patches concatenates all matches.

Note what happens
if I change the type from text to xml in that column:

SELECT *
FROM (VALUES ('<xml>te<!-- ahoy -->xt</xml>'::xml)
, ('<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" xml PATH 'node()'
) t;

x │ node()
───────────────────────────────────────────────────────────┼────────────────────────────────────────────────
<xml>te<!-- ahoy -->xt</xml> │ te ahoy xt
<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml> │ some &lt;!-- really --&gt; weird &lt;stuff&gt;
(2 filas)

The comment seems to be wrong.

I guess it’s fine if the CDATA gets transformed in to an equivalent
string using the XML entities. Yet, it might be better avoiding it.

Will look into it, but may take a while.

Further note that, per the standard, you can omit the PATH clause in
which case the column name is used as the path, which seems to work
correctly. Phew!

* Patch 0002: Set correct context for XPath evaluation.

According to the ISO standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML[1] - not the root node.

The respective code is shared with non-ISO functions such as xpath
and xpath_exists so that the change also affects these functions.

It’s an incompatible change - even the regression tests need to be adjusted because they
test for the “wrong” behaviour.

I'm really afraid to change the non-ISO functions in PG10, since
it's already released for quite some time with this long-standing
behavior; and I'm not sure it'll do much good to change XMLTABLE in PG10
and leave the non-iso functions unpatched. I think the easiest route is
to patch all functions only for PostgreSQL 11. Maybe if XMLTABLE is
considered unacceptably broken in PostgreSQL 10 we can patch only that
one.

I’m in favour of doing all at once in PG11.

As the XMLTABLE examples in the docs consistently use absolute XPath
expressions, there is hope that people do it this way so that they won’t be
affected by the patch.

When I stumbled upon this issue, I first though that the context is not set
at all. I did not even come to my mind that the context could be there,
but wrongly set. I only found out that it actually is set when I made the patch.

In other words, I would not have made code relying on the wrong behaviour.
Unfortunately, this argument doesn’t hold for the non-ISO functions, which
are also longer in PostgreSQL.

-markus

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Markus Winand (#4)
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

Hello hackers,

Any objections fixing this on Pg11? My opinion is that it's better to
fix it now rather than waiting for pg12. It's already broken in pg10
(xmltable) and older (rest of the xml stuff). As Markus advised, I'd
not backpatch fixes for fear of breaking stuff, but I'd rather release
pg11 with the correct behavior.

I intend to get it done this week if there are no serious objections.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Markus Winand (#1)
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

Hi

On 2018-Mar-27, Markus Winand wrote:

I’ve found two issues with XML/XPath integration in PostgreSQL. Two
patches are attached. I’ve just separated them because the second one
is an incompatible change.

* Patch 0001: Accept TEXT and CDATA nodes in XMLTABLE’s column_expression.

I pushed 0001 to REL_10_STABLE and master -- since there is no
incompatible behavior change. (Well, there is a behavior change, but
it's pretty hard to imagine someone would be relying on the bogus old
output.)

I'll see about 0002 now -- only to master, as discussed.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#6)
1 attachment(s)
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

On 2018-Jun-20, Alvaro Herrera wrote:

I'll see about 0002 now -- only to master, as discussed.

Here's a patch.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Set-correct-context-for-XPath-evaluation.patchtext/plain; charset=iso-8859-1Download
From 6952930bc6d706a42736d9579f449bcbf598d201 Mon Sep 17 00:00:00 2001
From: Markus Winand <markus.winand@winand.at>
Date: Tue, 27 Mar 2018 08:36:59 +0000
Subject: [PATCH] Set correct context for XPath evaluation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

According to the SQL standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML input document, not the
root node.  This becomes visible when a relative path rather than
absolute is used as row expression.  Absolute paths is what was used in
original tests and docs (and the most common form used in examples
throughout the interwebs), which explains why this wasn't noticed
before.

Other functions such as xpath() and xpath_exists() also have this
problem.  While not specified by the SQL standard, it would be pretty
odd to leave those functions to behave differently than XMLTABLE, so
change them too.  However, this is a backwards-incompatible change.

Author: Markus Winand
Reported-By: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
---
 src/backend/utils/adt/xml.c         | 10 ++--------
 src/test/regress/expected/xml.out   |  8 +++++++-
 src/test/regress/expected/xml_1.out | 12 +++++++++---
 src/test/regress/expected/xml_2.out |  8 +++++++-
 src/test/regress/sql/xml.sql        |  3 ++-
 5 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 8307f1cf47..37d85f71f3 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3934,10 +3934,7 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 		if (xpathctx == NULL || xmlerrcxt->err_occurred)
 			xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 						"could not allocate XPath context");
-		xpathctx->node = xmlDocGetRootElement(doc);
-		if (xpathctx->node == NULL || xmlerrcxt->err_occurred)
-			xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
-						"could not find root XML element");
+		xpathctx->node = (xmlNodePtr) doc;
 
 		/* register namespaces, if any */
 		if (ns_count > 0)
@@ -4276,10 +4273,7 @@ XmlTableSetDocument(TableFuncScanState *state, Datum value)
 		if (xpathcxt == NULL || xtCxt->xmlerrcxt->err_occurred)
 			xml_ereport(xtCxt->xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 						"could not allocate XPath context");
-		xpathcxt->node = xmlDocGetRootElement(doc);
-		if (xpathcxt->node == NULL || xtCxt->xmlerrcxt->err_occurred)
-			xml_ereport(xtCxt->xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
-						"could not find root XML element");
+		xpathcxt->node = (xmlNodePtr) doc;
 	}
 	PG_CATCH();
 	{
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3eb638ca25..6e1f885112 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -670,6 +670,12 @@ SELECT xpath('/nosuchtag', '<root/>');
  {}
 (1 row)
 
+SELECT xpath('root', '<root/>');
+   xpath   
+-----------
+ {<root/>}
+(1 row)
+
 -- Round-trip non-ASCII data through xpath().
 DO $$
 DECLARE
@@ -1212,7 +1218,7 @@ SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaa
 SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text PATH 'element/text()'); -- should fail
 ERROR:  more than one value returned by column XPath expression
 -- CDATA test
-select * from xmltable('r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
+select * from xmltable('d/r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
             c            
 -------------------------
  <hello> &"<>!<a>foo</a>
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 2053734f65..0eba424346 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -576,6 +576,12 @@ LINE 1: SELECT xpath('/nosuchtag', '<root/>');
                                    ^
 DETAIL:  This functionality requires the server to be built with libxml support.
 HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xpath('root', '<root/>');
+ERROR:  unsupported XML feature
+LINE 1: SELECT xpath('root', '<root/>');
+                             ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
 -- Round-trip non-ASCII data through xpath().
 DO $$
 DECLARE
@@ -1067,10 +1073,10 @@ LINE 1: SELECT * FROM xmltable('/root' passing '<root><element>a1a<!...
 DETAIL:  This functionality requires the server to be built with libxml support.
 HINT:  You need to rebuild PostgreSQL using --with-libxml.
 -- CDATA test
-select * from xmltable('r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
+select * from xmltable('d/r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
 ERROR:  unsupported XML feature
-LINE 1: select * from xmltable('r' passing '<d><r><c><![CDATA[<hello...
-                                           ^
+LINE 1: select * from xmltable('d/r' passing '<d><r><c><![CDATA[<hel...
+                                             ^
 DETAIL:  This functionality requires the server to be built with libxml support.
 HINT:  You need to rebuild PostgreSQL using --with-libxml.
 -- XML builtin entities
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index cb865a9ef7..3ec56e4c54 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -650,6 +650,12 @@ SELECT xpath('/nosuchtag', '<root/>');
  {}
 (1 row)
 
+SELECT xpath('root', '<root/>');
+   xpath   
+-----------
+ {<root/>}
+(1 row)
+
 -- Round-trip non-ASCII data through xpath().
 DO $$
 DECLARE
@@ -1192,7 +1198,7 @@ SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaa
 SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text PATH 'element/text()'); -- should fail
 ERROR:  more than one value returned by column XPath expression
 -- CDATA test
-select * from xmltable('r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
+select * from xmltable('d/r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
             c            
 -------------------------
  <hello> &"<>!<a>foo</a>
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index c223603a1f..3b91b56d5a 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -188,6 +188,7 @@ SELECT xpath('count(//*)=0', '<root><sub/><sub/></root>');
 SELECT xpath('count(//*)=3', '<root><sub/><sub/></root>');
 SELECT xpath('name(/*)', '<root><sub/><sub/></root>');
 SELECT xpath('/nosuchtag', '<root/>');
+SELECT xpath('root', '<root/>');
 
 -- Round-trip non-ASCII data through xpath().
 DO $$
@@ -423,7 +424,7 @@ SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaa
 SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text PATH 'element/text()'); -- should fail
 
 -- CDATA test
-select * from xmltable('r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
+select * from xmltable('d/r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
 
 -- XML builtin entities
 SELECT * FROM xmltable('/x/a' PASSING '<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>' COLUMNS ent text);
-- 
2.11.0

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Markus Winand (#4)
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

I have pushed the patch now (in your original form rather than my later
formulation) -- let's see how the buildfarm likes it. There are (at
least) three issues remaining, as per below; Pavel, do you have any
insight on these?

First one is about array indexes not working sanely (I couldn't get this
to work in Oracle)

Also, array indexes behave funny. First let's add more XML comments
inside that number, and query for the subscripts:

update xmldata set data = regexp_replace(data::text, '7<!--small country-->91', '<!--ah-->7<!--oh-->9<!--uh-->1')::xml;

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')
where size_text is not null;

country_name │ size_text │ size_text_1 │ size_text_2 │ size_text_3 │ SIZE │ size_xml
──────────────┼───────────┼─────────────┼─────────────┼─────────────┼──────┼───────────────────────────────────────────────────────
Singapore │ 791 │ 791 │ 91 │ 1 │ 791 │ <SIZE unit="km"><!--ah-->7<!--oh-->9<!--uh-->1</SIZE>
(1 fila)

The second one is about (lack of!) processing instructions and comments:

Also, node() matching comments or processing instructions
seems to be broken too:

SELECT *
FROM (VALUES ('<xml><!--comment--></xml>'::xml)
, ('<xml><?pi content?></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" TEXT PATH 'node()'
) t

x | node()
---------------------------+--------
<xml><!--comment--></xml> |
<xml><?pi content?></xml> |
(2 rows)

I can look into this, but it may take a while.

Compare the empty second columns with oracle behavior, which returns the
contents of the PI and the comment. As a script for
http://rextester.com/l/oracle_online_compiler

create table xmltb (data xmltype) \\
insert into xmltb values ('<xml><!--the comment is here--></xml>') \\
insert into xmltb values ('<xml><?pi php_stuff(); do_stuff("hello"); ?></xml>') \\
SELECT * FROM xmltb, XMLTABLE('/xml' PASSING data COLUMNS node varchar2(100) PATH 'node()') t \\
drop table xmltb \\

The third issue is the way we output comments when they're in a column
of type XML:

Note what happens if I change the type from text to xml in that
column:

SELECT *
FROM (VALUES ('<xml>te<!-- ahoy -->xt</xml>'::xml)
, ('<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml>'::xml)
) d(x)
CROSS JOIN LATERAL
XMLTABLE('/xml'
PASSING x
COLUMNS "node()" xml PATH 'node()'
) t;

x │ node()
───────────────────────────────────────────────────────────┼────────────────────────────────────────────────
<xml>te<!-- ahoy -->xt</xml> │ te ahoy xt
<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml> │ some &lt;!-- really --&gt; weird &lt;stuff&gt;
(2 filas)

The comment seems to be wrong.

I guess it’s fine if the CDATA gets transformed in to an equivalent
string using the XML entities. Yet, it might be better avoiding it.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Markus Winand
markus.winand@winand.at
In reply to: Alvaro Herrera (#8)
2 attachment(s)
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

Hi!

I have pushed the patch now (in your original form rather than my later
formulation) -- let's see how the buildfarm likes it. There are (at
least) three issues remaining, as per below; Pavel, do you have any
insight on these?

Attached patch 0001 fixes all three issues and one more that is, again,
an incompatible change (see below).

Attachments:

0001-XML-XPath-comments-processing-instructions-array-ind.patchapplication/octet-stream; name=0001-XML-XPath-comments-processing-instructions-array-ind.patchDownload
From 733b6f8af5b3f07a2c55145c9a797955566bb5e3 Mon Sep 17 00:00:00 2001
From: Markus Winand <markus.winand@winand.at>
Date: Sun, 8 Jul 2018 17:24:05 +0000
Subject: [PATCH 1/2] XML/XPath: comments, processing-instructions, array
 indexes

For text output, use correct data for comments and processing-instructions
and return only the selected node if an array index is used (e.g. node()[2]).
Elements now return the concatenation of all descendant text() nodes, not
just the immediate children of the element.

For XML output, serialize comments, processing-instructions and CDATA
nodes as XML.
---
 src/backend/utils/adt/xml.c         | 122 +++++++++++++++++++++++++-----------
 src/test/regress/expected/xml_2.out |  10 +--
 src/test/regress/sql/xml.sql        |   4 +-
 3 files changed, 92 insertions(+), 44 deletions(-)

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 37d85f7..7cef817 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -143,6 +143,7 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version,
 static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
 		  bool preserve_whitespace, int encoding);
 static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt);
+static xmlChar *xml_xmlnodetostring(xmlDocPtr doc, const xmlNode const* element);
 static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 					   ArrayBuildState *astate,
 					   PgXmlErrorContext *xmlerrcxt);
@@ -3682,7 +3683,7 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 {
 	xmltype    *result;
 
-	if (cur->type == XML_ELEMENT_NODE)
+	if (cur->type == XML_ELEMENT_NODE || cur->type == XML_COMMENT_NODE || cur->type == XML_CDATA_SECTION_NODE || cur->type == XML_PI_NODE)
 	{
 		xmlBufferPtr buf;
 		xmlNodePtr	cur_copy;
@@ -3741,6 +3742,52 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 }
 
 /*
+ * Convert XML node to a concatenation of nested text() nodes.
+ * Inline CDATA, expand entity references.
+ *
+ * Returns NULL or a string that must be xmlFree()-ed by the caller.
+ */
+static xmlChar *
+xml_xmlnodetostring (xmlDocPtr doc, const xmlNode *const node) {
+	const xmlNode *n = node->xmlChildrenNode;
+	const xmlNode *deep;
+	const xmlEntity *ent;
+	xmlChar *ret = NULL;
+
+	while (n != NULL) {
+		if (n->type == XML_ELEMENT_NODE ||
+			n->type == XML_ENTITY_REF_NODE ||
+			n->type == XML_TEXT_NODE ||
+			n->type == XML_CDATA_SECTION_NODE)
+		{
+			deep = NULL;
+			if (n->type == XML_ELEMENT_NODE) {
+				deep = n;
+			} else if (n->type == XML_ENTITY_REF_NODE) {
+				ent = xmlGetDocEntity(doc, n->name);
+				if (ent != NULL) {
+					deep = ent->children;
+				}
+			}
+
+			if (deep == NULL) {
+				ret = xmlStrcat(ret, n->content);
+			} else {
+				xmlChar *buffer;
+				buffer = xml_xmlnodetostring(doc, deep);
+				if (buffer != NULL) {
+					ret = xmlStrcat(ret, buffer);
+					xmlFree(buffer);
+				}
+			}
+		}
+
+		n = n->next;
+	}
+	return ret;
+}
+
+/*
  * Convert an XML XPath object (the result of evaluating an XPath expression)
  * to an array of xml values, which are appended to astate.  The function
  * result value is the number of elements in the array.
@@ -4475,9 +4522,9 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 		/*
 		 * There are four possible cases, depending on the number of nodes
 		 * returned by the XPath expression and the type of the target column:
-		 * a) XPath returns no nodes.  b) One node is returned, and column is
-		 * of type XML.  c) One node, column type other than XML.  d) Multiple
-		 * nodes are returned.
+		 * a) XPath returns no nodes.  b) The target type is XML (return all
+		 * as XML). For non-XML types: c) One node (return content).
+		 * d) Multiple nodes (error).
 		 */
 		if (xpathobj->type == XPATH_NODESET)
 		{
@@ -4490,33 +4537,49 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 			{
 				*isnull = true;
 			}
-			else if (count == 1 && typid == XMLOID)
+			else if (typid == XMLOID)
 			{
-				text	   *textstr;
+				StringInfoData str;
+				int			i;
 
-				/* simple case, result is one value */
-				textstr = xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[0],
-											   xtCxt->xmlerrcxt);
-				cstr = text_to_cstring(textstr);
+				/* Concatenate serialized values */
+				initStringInfo(&str);
+				for (i = 0; i < count; i++)
+				{
+					appendStringInfoText(&str,
+										 xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
+															  xtCxt->xmlerrcxt));
+				}
+				cstr = str.data;
 			}
 			else if (count == 1)
 			{
 				xmlChar    *str;
 				xmlNodePtr	node;
 
-				/*
-				 * Most nodes (elements and even attributes) store their data
-				 * in children nodes. If they don't have children nodes, it
-				 * means that they are empty (e.g. <element/>). Text nodes and
-				 * CDATA sections are an exception: they don't have children
-				 * but have content in the Text/CDATA node itself.
-				 */
 				node = xpathobj->nodesetval->nodeTab[0];
-				if (node->type != XML_CDATA_SECTION_NODE &&
-					node->type != XML_TEXT_NODE)
-					node = node->xmlChildrenNode;
+				if (node->type == XML_ELEMENT_NODE)
+				{
+					/* XML elements need recursive processing */
+					str = xml_xmlnodetostring(xtCxt->doc, node);
+				}
+				else  if (node->type == XML_CDATA_SECTION_NODE ||
+					node->type == XML_TEXT_NODE ||
+					node->type == XML_PI_NODE ||
+					node->type == XML_COMMENT_NODE)
+				{
+					/* Node types that store their data in "content" */
+					str = xmlStrdup(node->content);
+				}
+				else
+				{
+					/*
+					 * The remaining types (e.g. attributes)
+					 * store their data in the child.
+					 */
+					str = xmlStrdup(node->xmlChildrenNode->content);
+				}
 
-				str = xmlNodeListGetString(xtCxt->doc, node, 1);
 				if (str != NULL)
 				{
 					PG_TRY();
@@ -4539,30 +4602,15 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 			}
 			else
 			{
-				StringInfoData str;
-				int			i;
-
-				Assert(count > 1);
-
 				/*
 				 * When evaluating the XPath expression returns multiple
-				 * nodes, the result is the concatenation of them all. The
-				 * target type must be XML.
+				 * nodes, and the tartget type is not XML, yield an error.
 				 */
 				if (typid != XMLOID)
 					ereport(ERROR,
 							(errcode(ERRCODE_CARDINALITY_VIOLATION),
 							 errmsg("more than one value returned by column XPath expression")));
 
-				/* Concatenate serialized values */
-				initStringInfo(&str);
-				for (i = 0; i < count; i++)
-				{
-					appendStringInfoText(&str,
-										 xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
-															  xtCxt->xmlerrcxt));
-				}
-				cstr = str.data;
 			}
 		}
 		else if (xpathobj->type == XPATH_STRING)
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 3ec56e4..ca1911f 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -1189,13 +1189,13 @@ SELECT xmltable.* FROM xmldata, LATERAL xmltable('/ROWS/ROW[COUNTRY_NAME="Japan"
   5 | Japan        |         3 | <COUNTRY_ID>JP</COUNTRY_ID><COUNTRY_NAME>Japan</COUNTRY_NAME><REGION_ID>3</REGION_ID><PREMIER_NAME>Sinzo Abe</PREMIER_NAME>
 (2 rows)
 
-SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text);
-      element      
--------------------
- a1aa2a   bbbbcccc
+SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text, cmnt text PATH 'element/comment()[2]', pi text PATH 'element/processing-instruction()', t1 text PATH 'element/text()[1]', x xml PATH 'element');
+       element        | cmnt | pi | t1  |                                        x                                        
+----------------------+------+----+-----+---------------------------------------------------------------------------------
+ a1aa2a   bbbbxxxcccc | z    | pi | a1a | <element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z-->  bbbb<x>xxx</x>cccc</element>
 (1 row)
 
-SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text PATH 'element/text()'); -- should fail
+SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text PATH 'element/text()'); -- should fail
 ERROR:  more than one value returned by column XPath expression
 -- CDATA test
 select * from xmltable('d/r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 3b91b56..e274e47 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -420,8 +420,8 @@ SELECT xmltable.* FROM xmldata, LATERAL xmltable('/ROWS/ROW[COUNTRY_NAME="Japan"
 SELECT xmltable.* FROM xmldata, LATERAL xmltable('/ROWS/ROW[COUNTRY_NAME="Japan" or COUNTRY_NAME="India"]' PASSING data COLUMNS id int PATH '@id', "COUNTRY_NAME" text, "REGION_ID" int, rawdata xml PATH '.');
 SELECT xmltable.* FROM xmldata, LATERAL xmltable('/ROWS/ROW[COUNTRY_NAME="Japan" or COUNTRY_NAME="India"]' PASSING data COLUMNS id int PATH '@id', "COUNTRY_NAME" text, "REGION_ID" int, rawdata xml PATH './*');
 
-SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text);
-SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text PATH 'element/text()'); -- should fail
+SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text, cmnt text PATH 'element/comment()[2]', pi text PATH 'element/processing-instruction()', t1 text PATH 'element/text()[1]', x xml PATH 'element');
+SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z-->  bbbb<x>xxx</x>cccc</element></root>' COLUMNS element text PATH 'element/text()'); -- should fail
 
 -- CDATA test
 select * from xmltable('d/r' passing '<d><r><c><![CDATA[<hello> &"<>!<a>foo</a>]]></c></r><r><c>2</c></r></d>' columns c text);
-- 
1.9.1

0002-XML-avoid-xmlStrdup-if-possible.patchapplication/octet-stream; name=0002-XML-avoid-xmlStrdup-if-possible.patchDownload
From a786e2cded217b36192dc54ac5f5291f862d0c39 Mon Sep 17 00:00:00 2001
From: Markus Winand <markus.winand@winand.at>
Date: Sun, 8 Jul 2018 18:53:35 +0000
Subject: [PATCH 2/2] XML: avoid xmlStrdup if possible

---
 src/backend/utils/adt/xml.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 7cef817..cc88a66 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4555,6 +4555,7 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 			else if (count == 1)
 			{
 				xmlChar    *str;
+				int        free_str = 0; /* xmlFree(str) required? */
 				xmlNodePtr	node;
 
 				node = xpathobj->nodesetval->nodeTab[0];
@@ -4562,6 +4563,7 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 				{
 					/* XML elements need recursive processing */
 					str = xml_xmlnodetostring(xtCxt->doc, node);
+					free_str = 1;
 				}
 				else  if (node->type == XML_CDATA_SECTION_NODE ||
 					node->type == XML_TEXT_NODE ||
@@ -4569,7 +4571,7 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 					node->type == XML_COMMENT_NODE)
 				{
 					/* Node types that store their data in "content" */
-					str = xmlStrdup(node->content);
+					str = node->content;
 				}
 				else
 				{
@@ -4577,7 +4579,7 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 					 * The remaining types (e.g. attributes)
 					 * store their data in the child.
 					 */
-					str = xmlStrdup(node->xmlChildrenNode->content);
+					str = node->xmlChildrenNode->content;
 				}
 
 				if (str != NULL)
@@ -4588,11 +4590,13 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 					}
 					PG_CATCH();
 					{
-						xmlFree(str);
+						if (free_str)
+							xmlFree(str);
 						PG_RE_THROW();
 					}
 					PG_END_TRY();
-					xmlFree(str);
+					if (free_str)
+						xmlFree(str);
 				}
 				else
 				{
-- 
1.9.1

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Markus Winand (#9)
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

On 2018-Jul-08, Markus Winand wrote:

Hi!

I have pushed the patch now (in your original form rather than my later
formulation) -- let's see how the buildfarm likes it. There are (at
least) three issues remaining, as per below; Pavel, do you have any
insight on these?

Attached patch 0001 fixes all three issues and one more that is, again,
an incompatible change (see below).

For closure: this was re-published as
/messages/by-id/3e8eab9e-7289-6c23-5e2c-153cccea2257@anastigmatix.net
and got pushed as commit 251cf2e27be.

Thank you for reporting this!

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services