[PATCH] Add pretty-printed XML output option
Hi,
This small patch introduces an XML pretty-print function, xmlpretty. It
uses the indentation feature of xmlDocDumpFormatMemory from libxml2 to
format XML strings.
postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
        xmlpretty
--------------------------
 <foo id="x">            +
   <bar id="y">          +
     <var id="z">42</var>+
   </bar>                +
 </foo>                  +

(1 row)
The patch also contains regression tests and documentation.
Feedback is very welcome!
Jim
Attachments:
v1-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v1] Add pretty-printed XML output option
This small patch introduces an XML pretty-print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++
src/backend/utils/adt/xml.c | 30 +++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 107 ++++++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 34 ++++++++++
5 files changed, 208 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlpretty">
+ <title><literal>xmlpretty</literal></title>
+
+ <indexterm>
+ <primary>xmlpretty</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlpretty</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlpretty('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlpretty
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-predicates">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *buf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document.
+ * xmlChar ** buf, # buffer where the formatted XML document will be stored.
+ * int *size, # this could store the size of the created buffer
+ * but as we do not need it, we can leave it NULL.
+ * int format) # 1 = node indenting.
+ */
+ xmlDocDumpFormatMemory(doc, &buf, NULL, 1);
+
+ xmlFreeDoc(doc);
+ PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+ { oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlpretty', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlpretty' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: invalid XML string (not well balanced)
+SELECT xmlpretty('<foo>')::xml;
+ERROR: invalid XML content
+LINE 1: SELECT xmlpretty('<foo>')::xml;
+ ^
+DETAIL: line 1: chunk is not well balanced
+<foo>
+ ^
+-- XML pretty print: invalid parameter
+SELECT xmlpretty(42)::xml;
+ERROR: function xmlpretty(integer) does not exist
+LINE 1: SELECT xmlpretty(42)::xml;
+ ^
+HINT: No function matches the given name and argument types. You might need to add explicit type casts.
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+ xmlpretty
+-----------
+
+(1 row)
+
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..2b40c90966 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,37 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: invalid XML string (not well balanced)
+SELECT xmlpretty('<foo>')::xml;
+
+-- XML pretty print: invalid parameter
+SELECT xmlpretty(42)::xml;
+
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+
--
2.25.1
The system returns different error messages on Linux than on
macOS/Windows, which was causing the cfbot to fail:
SELECT xmlpretty('<foo>')::xml;
^
-DETAIL: line 1: chunk is not well balanced
+DETAIL: line 1: Premature end of data in tag foo line 1
Test removed in v2.
On 02.02.23 21:35, Jim Jones wrote:
Hi,
This small patch introduces a XML pretty print function. It basically
takes advantage of the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings.
postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
        xmlpretty
--------------------------
 <foo id="x">            +
   <bar id="y">          +
     <var id="z">42</var>+
   </bar>                +
 </foo>                  +

(1 row)
The patch also contains regression tests and documentation.
Feedback is very welcome!
Jim
Attachments:
v2-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v2 1/2] Add pretty-printed XML output option
This small patch introduces an XML pretty-print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++
src/backend/utils/adt/xml.c | 30 +++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 107 ++++++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 34 ++++++++++
5 files changed, 208 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlpretty">
+ <title><literal>xmlpretty</literal></title>
+
+ <indexterm>
+ <primary>xmlpretty</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlpretty</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlpretty('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlpretty
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-predicates">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *buf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document.
+ * xmlChar ** buf, # buffer where the formatted XML document will be stored.
+ * int *size, # this could store the size of the created buffer
+ * but as we do not need it, we can leave it NULL.
+ * int format) # 1 = node indenting.
+ */
+ xmlDocDumpFormatMemory(doc, &buf, NULL, 1);
+
+ xmlFreeDoc(doc);
+ PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+ { oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlpretty', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlpretty' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: invalid XML string (not well balanced)
+SELECT xmlpretty('<foo>')::xml;
+ERROR: invalid XML content
+LINE 1: SELECT xmlpretty('<foo>')::xml;
+ ^
+DETAIL: line 1: chunk is not well balanced
+<foo>
+ ^
+-- XML pretty print: invalid parameter
+SELECT xmlpretty(42)::xml;
+ERROR: function xmlpretty(integer) does not exist
+LINE 1: SELECT xmlpretty(42)::xml;
+ ^
+HINT: No function matches the given name and argument types. You might need to add explicit type casts.
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+ xmlpretty
+-----------
+
+(1 row)
+
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..2b40c90966 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,37 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: invalid XML string (not well balanced)
+SELECT xmlpretty('<foo>')::xml;
+
+-- XML pretty print: invalid parameter
+SELECT xmlpretty(42)::xml;
+
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+
--
2.25.1
From ceb24fcbc55e94a69968432f7a0d93e9e240cd2d Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Fri, 3 Feb 2023 07:48:42 +0100
Subject: [PATCH v2 2/2] Remove unnecessary regression tests
The removed tests (corner cases) were unnecessary and were
causing the cfbot to fail, as the system delivers different
error messages on Linux (chunk is not well balanced) and on Windows /
macOS (Premature end of data in tag foo line 1).
---
src/test/regress/expected/xml.out | 14 --------------
src/test/regress/sql/xml.sql | 9 +--------
2 files changed, 1 insertion(+), 22 deletions(-)
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 98a338ad8d..afaa83941b 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1685,20 +1685,6 @@ SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xm
(1 row)
--- XML pretty print: invalid XML string (not well balanced)
-SELECT xmlpretty('<foo>')::xml;
-ERROR: invalid XML content
-LINE 1: SELECT xmlpretty('<foo>')::xml;
- ^
-DETAIL: line 1: chunk is not well balanced
-<foo>
- ^
--- XML pretty print: invalid parameter
-SELECT xmlpretty(42)::xml;
-ERROR: function xmlpretty(integer) does not exist
-LINE 1: SELECT xmlpretty(42)::xml;
- ^
-HINT: No function matches the given name and argument types. You might need to add explicit type casts.
-- XML pretty print: NULL parameter
SELECT xmlpretty(NULL)::xml;
xmlpretty
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 2b40c90966..6e9a7b2295 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -649,12 +649,5 @@ SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xm
<desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
--- XML pretty print: invalid XML string (not well balanced)
-SELECT xmlpretty('<foo>')::xml;
-
--- XML pretty print: invalid parameter
-SELECT xmlpretty(42)::xml;
-
-- XML pretty print: NULL parameter
-SELECT xmlpretty(NULL)::xml;
-
+SELECT xmlpretty(NULL)::xml;
\ No newline at end of file
--
2.25.1
Hi,
The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing
the regression tests with the error:
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
Is there anything I can do to enable libxml to run my regression tests?
v3 adds a missing xmlFree call.
Best, Jim
Attachments:
v3-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v3 1/3] Add pretty-printed XML output option
This small patch introduces an XML pretty-print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++
src/backend/utils/adt/xml.c | 30 +++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 107 ++++++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 34 ++++++++++
5 files changed, 208 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlpretty">
+ <title><literal>xmlpretty</literal></title>
+
+ <indexterm>
+ <primary>xmlpretty</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlpretty</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlpretty('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlpretty
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-predicates">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *buf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document.
+ * xmlChar ** buf, # buffer where the formatted XML document will be stored.
+ * int *size, # this could store the size of the created buffer
+ * but as we do not need it, we can leave it NULL.
+ * int format) # 1 = node indenting.
+ */
+ xmlDocDumpFormatMemory(doc, &buf, NULL, 1);
+
+ xmlFreeDoc(doc);
+ PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+ { oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlpretty', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlpretty' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: invalid XML string (not well balanced)
+SELECT xmlpretty('<foo>')::xml;
+ERROR: invalid XML content
+LINE 1: SELECT xmlpretty('<foo>')::xml;
+ ^
+DETAIL: line 1: chunk is not well balanced
+<foo>
+ ^
+-- XML pretty print: invalid parameter
+SELECT xmlpretty(42)::xml;
+ERROR: function xmlpretty(integer) does not exist
+LINE 1: SELECT xmlpretty(42)::xml;
+ ^
+HINT: No function matches the given name and argument types. You might need to add explicit type casts.
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+ xmlpretty
+-----------
+
+(1 row)
+
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..2b40c90966 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,37 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: invalid XML string (not well balanced)
+SELECT xmlpretty('<foo>')::xml;
+
+-- XML pretty print: invalid parameter
+SELECT xmlpretty(42)::xml;
+
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+
--
2.25.1
From ceb24fcbc55e94a69968432f7a0d93e9e240cd2d Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Fri, 3 Feb 2023 07:48:42 +0100
Subject: [PATCH v3 2/3] Remove unnecessary regression tests
The removed tests (corner cases) were unnecessary and were
causing the cfbot to fail, as the system delivers different
error messages on Linux (chunk is not well balanced) and Windows /
macOS (Premature end of data in tag foo line 1).
---
src/test/regress/expected/xml.out | 14 --------------
src/test/regress/sql/xml.sql | 9 +--------
2 files changed, 1 insertion(+), 22 deletions(-)
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 98a338ad8d..afaa83941b 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1685,20 +1685,6 @@ SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xm
(1 row)
--- XML pretty print: invalid XML string (not well balanced)
-SELECT xmlpretty('<foo>')::xml;
-ERROR: invalid XML content
-LINE 1: SELECT xmlpretty('<foo>')::xml;
- ^
-DETAIL: line 1: chunk is not well balanced
-<foo>
- ^
--- XML pretty print: invalid parameter
-SELECT xmlpretty(42)::xml;
-ERROR: function xmlpretty(integer) does not exist
-LINE 1: SELECT xmlpretty(42)::xml;
- ^
-HINT: No function matches the given name and argument types. You might need to add explicit type casts.
-- XML pretty print: NULL parameter
SELECT xmlpretty(NULL)::xml;
xmlpretty
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 2b40c90966..6e9a7b2295 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -649,12 +649,5 @@ SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xm
<desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
--- XML pretty print: invalid XML string (not well balanced)
-SELECT xmlpretty('<foo>')::xml;
-
--- XML pretty print: invalid parameter
-SELECT xmlpretty(42)::xml;
-
-- XML pretty print: NULL parameter
-SELECT xmlpretty(NULL)::xml;
-
+SELECT xmlpretty(NULL)::xml;
\ No newline at end of file
--
2.25.1
From f2b5a722c7ff3d7aa41ff20ae146af2477e590da Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Mon, 6 Feb 2023 16:51:13 +0100
Subject: [PATCH v3 3/3] Add missing xmlFree call for xml buffer
The indented XML string is now stored in a StringInfoData, and the
xmlChar* buffer is properly freed.
---
src/backend/utils/adt/xml.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 6409133137..4b6a9fde01 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -479,8 +479,9 @@ xmlpretty(PG_FUNCTION_ARGS)
#ifdef USE_LIBXML
xmlDocPtr doc;
- xmlChar *buf = NULL;
+ xmlChar *xmlbuf = NULL;
text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
@@ -492,10 +493,15 @@ xmlpretty(PG_FUNCTION_ARGS)
* but as we do not need it, we can leave it NULL.
* int format) # 1 = node indenting.
*/
- xmlDocDumpFormatMemory(doc, &buf, NULL, 1);
+ xmlDocDumpFormatMemory(doc, &xmlbuf, NULL, 1);
- xmlFreeDoc(doc);
- PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (char*)xmlbuf);
+
+ xmlFree(xmlbuf);
+ xmlFreeDoc(doc);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
#else
NO_XML_SUPPORT();
--
2.25.1
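The change in patch 3/3 follows a common pattern when two allocators meet: copy the library-owned buffer into memory the caller owns, then release the original with the library's own deallocator. Below is a minimal standalone sketch of that idea, not the patch itself. It assumes plain malloc/free as stand-ins for both libxml2's allocator and PostgreSQL's StringInfo, and the names lib_dump and copy_and_release are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a library call that hands back a buffer it
 * allocated itself (the role xmlDocDumpFormatMemory plays in the patch).
 * Here it simply copies the input and appends a newline.
 */
void
lib_dump(const char *doc, char **out)
{
    size_t len = strlen(doc);

    *out = malloc(len + 2);
    if (*out == NULL)
        return;
    memcpy(*out, doc, len);
    (*out)[len] = '\n';
    (*out)[len + 1] = '\0';
}

/*
 * Copy the library-owned string into caller-owned memory, then free the
 * library buffer. In the patch the copy goes into a StringInfoData and the
 * release is xmlFree(xmlbuf); malloc/free are assumptions of this sketch.
 */
char *
copy_and_release(char *libbuf)
{
    char   *own;
    size_t  len;

    assert(libbuf != NULL);

    len = strlen(libbuf);
    own = malloc(len + 1);
    if (own != NULL)
        memcpy(own, libbuf, len + 1);   /* include the terminating NUL */

    free(libbuf);                       /* library buffer is no longer needed */
    return own;
}
```

The point of the copy is that the result handed back to the caller no longer depends on the lifetime of the library's buffer, so the latter can be freed immediately instead of leaking.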
Jim Jones <jim.jones@uni-muenster.de> writes:
> The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing
> the regression tests with the error:
> ERROR: unsupported XML feature
> DETAIL: This functionality requires the server to be built with libxml
> support.
> Is there anything I can do to enable libxml to run my regression tests?
Your patch needs to also update expected/xml_1.out to match the output
the test produces when run without --with-libxml.
regards, tom lane
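As a rough illustration of why a second expected file helps: the regression driver accepts a test's output if it matches the primary expected file or one of its numbered alternates (xml.out, xml_1.out, and so on). The standalone C sketch below mimics that idea on in-memory strings only. It is a simplification, and alt_expected_name and matches_any_expected are hypothetical helpers, not pg_regress functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the expected-file name for a test: "<test>.out" for variant 0,
 * "<test>_<n>.out" for alternates. Returns 0 on success, -1 if the name
 * did not fit into the output buffer.
 */
int
alt_expected_name(const char *test, int variant, char *out, size_t outlen)
{
    int n;

    assert(test != NULL && out != NULL);

    if (variant == 0)
        n = snprintf(out, outlen, "%s.out", test);
    else
        n = snprintf(out, outlen, "%s_%d.out", test, variant);

    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}

/*
 * Return the index of the first expected output that equals the actual
 * output, or -1 if none matches. A real driver compares files with diff;
 * exact string comparison is an assumption of this sketch.
 */
int
matches_any_expected(const char *actual, const char *const *expected,
                     size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
    {
        if (strcmp(actual, expected[i]) == 0)
            return (int) i;
    }
    return -1;
}
```

With this model, a build without libxml produces the "unsupported XML feature" output, which fails against xml.out but passes against xml_1.out once that file contains the matching error text.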
On 06.02.23 17:23, Tom Lane wrote:
> Your patch needs to also update expected/xml_1.out to match the output
> the test produces when run without --with-libxml.
Thanks a lot! It seems to do the trick.
xml_1.out updated in v4.
Best, Jim
Attachments:
v4-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v4 1/4] Add pretty-printed XML output option
This small patch introduces an XML pretty-print function.
It takes advantage of the indentation feature of
xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++
src/backend/utils/adt/xml.c | 30 +++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 107 ++++++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 34 ++++++++++
5 files changed, 208 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlpretty">
+ <title><literal>xmlpretty</literal></title>
+
+ <indexterm>
+ <primary>xmlpretty</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlpretty</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlpretty('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlpretty
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-predicates">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *buf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document.
+ * xmlChar ** buf, # buffer where the formatted XML document will be stored.
+ * int *size, # this could store the size of the created buffer
+ * but as we do not need it, we can leave it NULL.
+ * int format) # 1 = node indenting.
+ */
+ xmlDocDumpFormatMemory(doc, &buf, NULL, 1);
+
+ xmlFreeDoc(doc);
+ PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+ { oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlpretty', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlpretty' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: invalid XML string (not well balanced)
+SELECT xmlpretty('<foo>')::xml;
+ERROR: invalid XML content
+LINE 1: SELECT xmlpretty('<foo>')::xml;
+ ^
+DETAIL: line 1: chunk is not well balanced
+<foo>
+ ^
+-- XML pretty print: invalid parameter
+SELECT xmlpretty(42)::xml;
+ERROR: function xmlpretty(integer) does not exist
+LINE 1: SELECT xmlpretty(42)::xml;
+ ^
+HINT: No function matches the given name and argument types. You might need to add explicit type casts.
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+ xmlpretty
+-----------
+
+(1 row)
+
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..2b40c90966 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,37 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: invalid XML string (not well balanced)
+SELECT xmlpretty('<foo>')::xml;
+
+-- XML pretty print: invalid parameter
+SELECT xmlpretty(42)::xml;
+
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+
--
2.25.1
From ceb24fcbc55e94a69968432f7a0d93e9e240cd2d Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Fri, 3 Feb 2023 07:48:42 +0100
Subject: [PATCH v4 2/4] Remove unnecessary regression tests
The removed tests (corner cases) were unnecessary and were
causing the cfbot to fail, as the system delivers different
error messages on Linux (chunk is not well balanced) and Windows /
macOS (Premature end of data in tag foo line 1).
---
src/test/regress/expected/xml.out | 14 --------------
src/test/regress/sql/xml.sql | 9 +--------
2 files changed, 1 insertion(+), 22 deletions(-)
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 98a338ad8d..afaa83941b 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1685,20 +1685,6 @@ SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xm
(1 row)
--- XML pretty print: invalid XML string (not well balanced)
-SELECT xmlpretty('<foo>')::xml;
-ERROR: invalid XML content
-LINE 1: SELECT xmlpretty('<foo>')::xml;
- ^
-DETAIL: line 1: chunk is not well balanced
-<foo>
- ^
--- XML pretty print: invalid parameter
-SELECT xmlpretty(42)::xml;
-ERROR: function xmlpretty(integer) does not exist
-LINE 1: SELECT xmlpretty(42)::xml;
- ^
-HINT: No function matches the given name and argument types. You might need to add explicit type casts.
-- XML pretty print: NULL parameter
SELECT xmlpretty(NULL)::xml;
xmlpretty
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 2b40c90966..6e9a7b2295 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -649,12 +649,5 @@ SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xm
<desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
--- XML pretty print: invalid XML string (not well balanced)
-SELECT xmlpretty('<foo>')::xml;
-
--- XML pretty print: invalid parameter
-SELECT xmlpretty(42)::xml;
-
-- XML pretty print: NULL parameter
-SELECT xmlpretty(NULL)::xml;
-
+SELECT xmlpretty(NULL)::xml;
\ No newline at end of file
--
2.25.1
From f2b5a722c7ff3d7aa41ff20ae146af2477e590da Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Mon, 6 Feb 2023 16:51:13 +0100
Subject: [PATCH v4 3/4] Add missing xmlFree call for xml buffer
The indented XML string is now stored in a StringInfoData, and the
xmlChar* buffer is properly freed.
---
src/backend/utils/adt/xml.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 6409133137..4b6a9fde01 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -479,8 +479,9 @@ xmlpretty(PG_FUNCTION_ARGS)
#ifdef USE_LIBXML
xmlDocPtr doc;
- xmlChar *buf = NULL;
+ xmlChar *xmlbuf = NULL;
text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
@@ -492,10 +493,15 @@ xmlpretty(PG_FUNCTION_ARGS)
* but as we do not need it, we can leave it NULL.
* int format) # 1 = node indenting.
*/
- xmlDocDumpFormatMemory(doc, &buf, NULL, 1);
+ xmlDocDumpFormatMemory(doc, &xmlbuf, NULL, 1);
- xmlFreeDoc(doc);
- PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (char*)xmlbuf);
+
+ xmlFree(xmlbuf);
+ xmlFreeDoc(doc);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
#else
NO_XML_SUPPORT();
--
2.25.1
From f89ffd258321c6bcbe65c24ef7318374deff0535 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Mon, 6 Feb 2023 18:52:40 +0100
Subject: [PATCH v4 4/4] Update xml_1.out
Add error messages to xml_1.out, so that the regression tests
do not fail when the system is built without --with-libxml
---
src/test/regress/expected/xml_1.out | 45 +++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..aecec39e05 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,48 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+ xmlpretty
+-----------
+
+(1 row)
+
--
2.25.1
While working on another item of the TODO list I realized that I should
be using a PG_TRY() block around the xmlDocDumpFormatMemory call.
Fixed in v5.
Best regards, Jim
Attachments:
v5-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From f503b25c7fd8d984d29536e78577741e5e7c5e9f Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v5] Add pretty-printed XML output option
This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 +++++++++++
src/backend/utils/adt/xml.c | 68 +++++++++++++++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 93 +++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 45 ++++++++++++++
src/test/regress/sql/xml.sql | 27 +++++++++
6 files changed, 270 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlpretty">
+ <title><literal>xmlpretty</literal></title>
+
+ <indexterm>
+ <primary>xmlpretty</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlpretty</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlpretty('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlpretty
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-predicates">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..9c7f5c85cb 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,74 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ PgXmlErrorContext *xmlerrcxt;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
+
+ PG_TRY();
+ {
+
+ int nbytes;
+
+ /**
+ * xmlDocDumpFormatMemory (()
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting
+ *)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ if(!nbytes || xmlerrcxt->err_occurred) {
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+ "could not indent the given XML document");
+ }
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ }
+ PG_CATCH();
+ {
+
+ if(doc!=NULL)
+ xmlFreeDoc(doc);
+ if(xmlbuf!=NULL)
+ xmlFree(xmlbuf);
+
+ pg_xml_done(xmlerrcxt, true);
+
+ PG_RE_THROW();
+
+ }
+ PG_END_TRY();
+
+ xmlFreeDoc(doc);
+ xmlFree(xmlbuf);
+
+ pg_xml_done(xmlerrcxt, false);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+ { oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlpretty', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlpretty' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..afaa83941b 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,96 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+ xmlpretty
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ xmlpretty
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+ xmlpretty
+-----------
+
+(1 row)
+
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..aecec39e05 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,48 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
+ xmlpretty
+-----------
+
+(1 row)
+
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..6e9a7b2295 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,30 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlpretty('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>')::xml;
+
+-- XML pretty print: NULL parameter
+SELECT xmlpretty(NULL)::xml;
\ No newline at end of file
--
2.25.1
On Thu, Feb 9, 2023 at 7:31 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
while working on another item of the TODO list I realized that I should
be using a PG_TRY() block in the xmlDocDumpFormatMemory call.
Fixed in v5.
I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
On 09.02.23 00:09, Peter Smith wrote:
I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?
Hi Peter,
My logic there was the following: if the program reaches that part of the
code, it means that xml_parse() and xmlDocDumpFormatMemory() worked,
which in turn means that the variables doc and xmlbuf are != NULL and
therefore do not need to be checked. Am I missing something?
Thanks a lot for the review!
Best, Jim
On Thu, Feb 9, 2023 at 10:42 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
On 09.02.23 00:09, Peter Smith wrote:
I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?
Hi Peter,
My logic there was the following: if the program reaches that part of the
code, it means that xml_parse() and xmlDocDumpFormatMemory() worked,
which in turn means that the variables doc and xmlbuf are != NULL and
therefore do not need to be checked. Am I missing something?
Thanks. I think I understand it better now -- I expect
xmlDocDumpFormatMemory will cope OK when passed a NULL doc (see this
source [1]): it will return nbytes of 0, and your code will then
throw an ERROR, meaning the guard for a NULL doc is necessary in the
PG_CATCH.
In that case, everything LGTM.
~
OTOH, if you are having to check for NULL doc anyway, maybe it's just
as easy only doing that up-front. Then you could quick-exit the
function without calling xmlDocDumpFormatMemory etc. in the first
place. For example:
doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
if (!doc)
return 0;
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
On 09.02.23 02:01, Peter Smith wrote:
OTOH, if you are having to check for NULL doc anyway, maybe it's just
as easy only doing that up-front. Then you could quick-exit the
function without calling xmlDocDumpFormatMemory etc. in the first
place. For example:
doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
if (!doc)
return 0;
I see your point. If I got it right, you're suggesting the following
change in the PG_TRY();
PG_TRY();
{
    int nbytes;
    if(!doc)
        xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                    "could not parse the given XML document");
    xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
    if(!nbytes || xmlerrcxt->err_occurred)
        xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                    "could not indent the given XML document");
    initStringInfo(&buf);
    appendStringInfoString(&buf, (const char *)xmlbuf);
}
... which will catch the doc == NULL case before calling xmlDocDumpFormatMemory.
Is that what you suggest?
Thanks a lot for the thorough review!
Best, Jim
Jim Jones <jim.jones@uni-muenster.de> writes:
I see your point. If I got it right, you're suggesting the following
change in the PG_TRY();
PG_TRY();
{
    int nbytes;
    if(!doc)
        xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                    "could not parse the given XML document");
    xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
    if(!nbytes || xmlerrcxt->err_occurred)
        xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                    "could not indent the given XML document");
    initStringInfo(&buf);
    appendStringInfoString(&buf, (const char *)xmlbuf);
}
.. which will catch the doc == NULL before calling xmlDocDumpFormatMemory.
Um ... why are you using PG_TRY here at all? It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.
regards, tom lane
On 09.02.23 08:23, Tom Lane wrote:
Um ... why are you using PG_TRY here at all? It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.
My intention was to catch any unexpected error from
xmlDocDumpFormatMemory and handle it properly. But I guess you're right,
I can control the likely error cases by checking doc and nbytes.
You suggest something along these lines?
xmlDocPtr doc;
xmlChar *xmlbuf = NULL;
text *arg = PG_GETARG_TEXT_PP(0);
StringInfoData buf;
int nbytes;
doc = xml_parse(arg, XMLOPTION_DOCUMENT, false,
                GetDatabaseEncoding(), NULL);
if(!doc)
    elog(ERROR, "could not parse the given XML document");
xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
xmlFreeDoc(doc);
if(!nbytes)
    elog(ERROR, "could not indent the given XML document");
initStringInfo(&buf);
appendStringInfoString(&buf, (const char *)xmlbuf);
xmlFree(xmlbuf);
PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
Thanks!
Best, Jim
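The ordering in the snippet above (parse, check, dump, free the document,
check the length, copy, free the buffer) can be tried outside the backend.
What follows is only a minimal standalone C sketch of that ordering; it uses
neither libxml2 nor PostgreSQL. FakeDoc, fake_parse, fake_dump, fake_free,
live_allocs and format_sketch are hypothetical stand-ins invented for
illustration, and failures are reported by returning NULL rather than through
elog(ERROR).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for xmlDocPtr; not a libxml2 type. */
typedef struct FakeDoc { const char *src; } FakeDoc;

/* Counts outstanding stand-in allocations so leaks are visible. */
int live_allocs = 0;

/* Stand-in for xml_parse(): "parses" anything that starts with '<'. */
static FakeDoc *fake_parse(const char *src)
{
    FakeDoc *doc;

    if (src == NULL || src[0] != '<')
        return NULL;
    doc = malloc(sizeof(FakeDoc));
    if (doc == NULL)
        return NULL;
    doc->src = src;
    live_allocs++;
    return doc;
}

/* Stand-in for xmlDocDumpFormatMemory(): copies the text, appends '\n'. */
static void fake_dump(FakeDoc *doc, char **out, int *nbytes)
{
    size_t len;

    assert(doc != NULL);
    len = strlen(doc->src);
    *out = malloc(len + 2);
    if (*out == NULL)
    {
        *nbytes = 0;            /* failure: NULL buffer, zero length */
        return;
    }
    memcpy(*out, doc->src, len);
    (*out)[len] = '\n';
    (*out)[len + 1] = '\0';
    *nbytes = (int) (len + 1);
    live_allocs++;
}

/* Stand-in for xmlFreeDoc()/xmlFree(). */
static void fake_free(void *p)
{
    if (p != NULL)
    {
        free(p);
        live_allocs--;
    }
}

/* Returns a malloc'd copy the caller must free(), or NULL on failure. */
char *format_sketch(const char *src)
{
    FakeDoc *doc;
    char *tmp = NULL;
    char *result;
    int nbytes = 0;

    doc = fake_parse(src);
    if (doc == NULL)
        return NULL;            /* nothing allocated yet */

    fake_dump(doc, &tmp, &nbytes);
    fake_free(doc);             /* document no longer needed */

    if (nbytes == 0)
        return NULL;            /* tmp is NULL in this stand-in */

    result = malloc((size_t) nbytes + 1);
    if (result != NULL)
        memcpy(result, tmp, (size_t) nbytes + 1);
    fake_free(tmp);             /* dump buffer released after the copy */
    return result;
}
```

Because each resource is released on the straight-line path as soon as it is
no longer needed, there is no error exit in this sketch that leaves the
document or the dump buffer allocated. Whether the same holds in the backend
depends on where elog(ERROR) can be raised, which this sketch does not model.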
On 02.02.23 21:35, Jim Jones wrote:
This small patch introduces a XML pretty print function. It basically
takes advantage of the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings.
I suggest we call it "xmlformat", which is an established term for this.
On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
On 09.02.23 08:23, Tom Lane wrote:
Um ... why are you using PG_TRY here at all? It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.
My intention was to catch any unexpected error from
xmlDocDumpFormatMemory and handle it properly. But I guess you're right,
I can control the likely error cases by checking doc and nbytes.
You suggest something along these lines?
xmlDocPtr doc;
xmlChar *xmlbuf = NULL;
text *arg = PG_GETARG_TEXT_PP(0);
StringInfoData buf;
int nbytes;
doc = xml_parse(arg, XMLOPTION_DOCUMENT, false,
                GetDatabaseEncoding(), NULL);
if(!doc)
    elog(ERROR, "could not parse the given XML document");
xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
xmlFreeDoc(doc);
if(!nbytes)
    elog(ERROR, "could not indent the given XML document");
initStringInfo(&buf);
appendStringInfoString(&buf, (const char *)xmlbuf);
xmlFree(xmlbuf);
PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
Thanks!
Something like that LGTM, but here are some other minor comments.
======
1.
FYI, there are some whitespace warnings applying the v5 patch
[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:26:
trailing whitespace.
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:29:
trailing whitespace.
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:33:
trailing whitespace.
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:37:
trailing whitespace.
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:41:
trailing whitespace.
warning: squelched 48 whitespace errors
warning: 53 lines add whitespace errors.
======
src/test/regress/sql/xml.sql
2.
The v5 patch was already testing NULL, but it might be good to add
more tests to verify the function is behaving how you want for edge
cases. For example,
+-- XML pretty print: NULL, empty string, spaces only...
SELECT xmlpretty(NULL);
SELECT xmlpretty('');
SELECT xmlpretty(' ');
~~
3.
The function is returning XML anyway, so is the '::xml' casting in
these tests necessary?
e.g.
SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);
======
src/include/catalog/pg_proc.dat
4.
+ { oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlpretty', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlpretty' },
Spurious leading space for this new entry.
======
doc/src/sgml/func.sgml
5.
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
A spurious blank line in the example after the "(1 row)"
~~~
6.
Do this function's docs belong in section 9.15.1 "Producing XML
Content"? Or (since it is not really producing content) should they be
moved to the 9.15.3 "Processing XML" section?
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On 10.02.23 02:10, Peter Smith wrote:
On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
1.
FYI, there are some whitespace warnings applying the v5 patch
Trailing whitespaces removed. The patch applies now without warnings.
======
src/test/regress/sql/xml.sql
2.
The v5 patch was already testing NULL, but it might be good to add
more tests to verify the function is behaving how you want for edge
cases. For example,
+-- XML pretty print: NULL, empty string, spaces only...
SELECT xmlpretty(NULL);
SELECT xmlpretty('');
SELECT xmlpretty(' ');
Test cases added.
3.
The function is returning XML anyway, so is the '::xml' casting in
these tests necessary?
e.g.
SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);
It is indeed not necessary. Most likely I used it for testing and forgot
to remove it afterwards. Now removed.
======
src/include/catalog/pg_proc.dat
4.
+ { oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlpretty', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlpretty' },
Spurious leading space for this new entry.
Removed.
======
doc/src/sgml/func.sgml
5.
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
A spurious blank line in the example after the "(1 row)"
Removed.
~~~
6.
Do this function's docs belong in section 9.15.1 "Producing XML
Content"? Or (since it is not really producing content) should they be
moved to the 9.15.3 "Processing XML" section?
Moved to the section 9.15.3
Following the suggestion of Peter Eisentraut I renamed the function to
xmlformat().
v6 attached.
Thanks a lot for the review.
Best, Jim
Attachments:
v6-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 5ca93fe69e8b62895b13cdd51d568d061362c912 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v6] Add pretty-printed XML output option
This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 +++++++++
src/backend/utils/adt/xml.c | 45 ++++++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 108 ++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 57 +++++++++++++++
src/test/regress/sql/xml.sql | 33 +++++++++
6 files changed, 280 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlformat">
+ <title><literal>xmlformat</literal></title>
+
+ <indexterm>
+ <primary>xmlformat</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlformat</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlformat('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlformat
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-mapping">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting
+ *)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlformat', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlformat' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..8e53c87a41 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,111 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+DETAIL: line 1: switching encoding : no input
+
+^
+line 1: Document is empty
+
+^
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+DETAIL: line 1: Start tag expected, '<' not found
+
+ ^
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..0657f839a7 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,60 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('');
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat(' ');
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..2517e84419 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,36 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+
+-- XML format: empty string
+SELECT xmlformat('');
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
\ No newline at end of file
--
2.25.1
Something is misbehaving.
Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.
But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.
My ./configure looks like this:
./configure --prefix=/usr/local/pg_oss --with-libxml --enable-debug
--enable-cassert --enable-tap-tests CFLAGS="-ggdb -O0 -g3
-fno-omit-frame-pointer"
resulting in:
checking whether to build with XML support... yes
checking for libxml-2.0 >= 2.6.23... yes
~
e.g. (these are a sample of the errors)
xml ... FAILED 2561 ms
@@ -344,8 +326,6 @@
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-<twoerrors>&idontexist;</unbalanced>
- ^
SELECT xmlparse(document '<nosuchprefix:tag/>');
xmlparse
---------------------
@@ -1696,14 +1676,8 @@
SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
-
-^
line 1: Document is empty
-
-^
-- XML format: invalid string (whitespaces)
SELECT xmlformat(' ');
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
-
- ^
~~
Separately (but maybe it's related?), the CF-bot test also reported a
failure [1] with strange error detail differences.
[1] https://api.cirrus-ci.com/v1/artifact/task/4802219812323328/testrun/build/testrun/regress/regress/regression.diffs
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-DETAIL: line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL: line 1: Document is empty
^
-- XML format: invalid string (whitespaces)
Kind Regards,
Peter Smith.
Fujitsu Australia
On 13.02.23 02:15, Peter Smith wrote:
Something is misbehaving.
Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.
But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.
Yes, I noticed it yesterday ... and I'm not sure how to solve it. It
seems that the system returns a different error message on the
FreeBSD patch tester, which causes a regression test to fail on this
particular OS.
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-DETAIL: line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL: line 1: Document is empty
^
-- XML format: invalid string (whitespaces)
Does anyone know if there is anything I can do to make the error
messages consistent across different operating systems?
On 13.02.23 13:15, Jim Jones wrote:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-DETAIL: line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL: line 1: Document is empty
^
-- XML format: invalid string (whitespaces)
I couldn't figure out why the error messages are different -- I'm
wondering if the issue is the test environment itself. For now I have
just removed the troublesome test case:
SELECT xmlformat('');
v7 attached.
Thanks for reviewing this patch!
Best, Jim
Attachments:
v7-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 9a1069e796eae892526fb08f7d7c7601fbcd341f Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v7] Add pretty-printed XML output option
This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++
src/backend/utils/adt/xml.c | 45 +++++++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 99 +++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 51 +++++++++++++++
src/test/regress/sql/xml.sql | 30 +++++++++
6 files changed, 262 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlformat">
+ <title><literal>xmlformat</literal></title>
+
+ <indexterm>
+ <primary>xmlformat</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlformat</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlformat('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlformat
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-mapping">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting
+ *)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlformat', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlformat' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..2f886f3efa 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,102 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+DETAIL: line 1: Start tag expected, '<' not found
+
+ ^
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..57e2df97ce 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,54 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat(' ');
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..fb6950fff7 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,33 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
\ No newline at end of file
--
2.25.1
On Wed, Feb 15, 2023 at 8:55 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
On 13.02.23 13:15, Jim Jones wrote:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-DETAIL: line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL: line 1: Document is empty
^
-- XML format: invalid string (whitespaces)
I couldn't figure out why the error messages are different -- I'm
wondering if the issue is the test environment itself. I just removed
the troubling test case for now
SELECT xmlformat('');
v7 attached.
Thanks for reviewing this patch!
Yesterday I looked at those cfbot configs and noticed all those
machines have different versions of libxml.
2.10.3
2.6.23
2.9.10
2.9.13
But I don't know whether the version numbers have any impact on the
different error details.
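For comparing those versions programmatically: libxml2 publishes its version at compile time as the integer macro LIBXML_VERSION, packed as major * 10000 + minor * 100 + micro. The following is only a minimal sketch in plain C of that packing. The helper names are hypothetical and are not part of libxml2 or PostgreSQL.

```c
#include <stdio.h>

/*
 * Hypothetical helper: pack major.minor.micro the same way the
 * LIBXML_VERSION macro does (major * 10000 + minor * 100 + micro).
 */
static int
libxml_version_encode(int major, int minor, int micro)
{
    return major * 10000 + minor * 100 + micro;
}

/*
 * Hypothetical helper: unpack such an integer into a dotted string.
 * The caller supplies the output buffer and its size.
 */
static void
libxml_version_format(int version, char *out, size_t outlen)
{
    snprintf(out, outlen, "%d.%d.%d",
             version / 10000, (version / 100) % 100, version % 100);
}
```

With this packing, 2.10.3 becomes 21003 and 2.9.13 becomes 20913, so an integer comparison orders them correctly where a plain string comparison would not.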
~
The thing that puzzled me most is that in MY environment (CentOS7;
libxml 20901; PG --with-libxml build) I get this behaviour.
- Without your v6 patch 'make check' is all OK.
- With your v6 patch other XML tests (not only yours) of 'make check'
failed with different error messages.
- Similarly, if I keep the v6 patch but just change (in xmlformat) the
#ifdef USE_LIBXML to be #if 0, then only the new xmlformat tests fail,
but the other XML tests are working OK again.
Those results implied to me that this function code (in my environment
anyway) is somehow introducing a side effect causing the *other* XML
tests to fail.
But so far I was unable to identify the reason. Sorry, I don't know
this XML API well enough to help more.
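A side note on comparing versions: "libxml 20901" above is the integer form that libxml2 exposes as LIBXML_VERSION (major * 10000 + minor * 100 + micro), so it corresponds to 2.9.1 and can be set next to the dotted versions seen on the cfbot machines. A minimal sketch of that decoding in plain C follows; the names LibxmlVersion, decode_libxml_version and format_libxml_version are made up for illustration and are not part of libxml2 or PostgreSQL.

```c
#include <stdio.h>

/* Hypothetical helper type: the three components of a libxml2 version. */
typedef struct
{
	int			major;
	int			minor;
	int			micro;
} LibxmlVersion;

/*
 * Split an integer in LIBXML_VERSION form
 * (major * 10000 + minor * 100 + micro) into its components.
 */
LibxmlVersion
decode_libxml_version(int v)
{
	LibxmlVersion out;

	out.major = v / 10000;
	out.minor = (v / 100) % 100;
	out.micro = v % 100;
	return out;
}

/* Write the decoded version as "major.minor.micro" into buf. */
void
format_libxml_version(int v, char *buf, size_t buflen)
{
	LibxmlVersion d = decode_libxml_version(v);

	snprintf(buf, buflen, "%d.%d.%d", d.major, d.minor, d.micro);
}
```

Calling format_libxml_version(20901, buf, sizeof buf) fills buf with "2.9.1". This only helps line up the environments; it says nothing about which version produces which error detail.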
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
On 14.02.23 23:45, Peter Smith wrote:
Those results implied to me that this function code (in my environment
anyway) is somehow introducing a side effect causing the *other* XML
tests to fail.
I believe I've found the issue. It is probably related to the XML OPTION
settings, as it seems to deliver different error messages when set to
DOCUMENT or CONTENT:
postgres=# SET XML OPTION CONTENT;
SET
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
^
line 1: Document is empty
^
postgres=# SET XML OPTION DOCUMENT;
SET
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
LINE 1: SELECT xmlformat('');
^
DETAIL: line 1: switching encoding : no input
^
line 1: Document is empty
^
v8 attached reintroduces the SELECT xmlformat('') test case and adds SET
XML OPTION DOCUMENT to the regression tests.
Best, Jim
Attachments:
v8-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 588bd8cbde42189117a429b6f588053ea8362fd8 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v8] Add pretty-printed XML output option
This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 +++++++++
src/backend/utils/adt/xml.c | 45 +++++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 113 ++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 58 ++++++++++++++
src/test/regress/sql/xml.sql | 34 +++++++++
6 files changed, 287 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlformat">
+ <title><literal>xmlformat</literal></title>
+
+ <indexterm>
+ <primary>xmlformat</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlformat</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlformat('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlformat
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-mapping">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting
+ *)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlformat', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlformat' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..70ccf3b0fb 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,116 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+SET XML OPTION DOCUMENT;
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+LINE 1: SELECT xmlformat(' ');
+ ^
+DETAIL: line 1: Start tag expected, '<' not found
+
+ ^
+ -- XML format: empty string
+ SELECT xmlformat('');
+ERROR: invalid XML document
+LINE 1: SELECT xmlformat('');
+ ^
+DETAIL: line 1: switching encoding : no input
+
+^
+line 1: Document is empty
+
+^
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..d54244ef90 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,61 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+SET XML OPTION DOCUMENT;
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat(' ');
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+ -- XML format: empty string
+ SELECT xmlformat('');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('');
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..e417234aa4 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,37 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+SET XML OPTION DOCUMENT;
+
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+
+ -- XML format: empty string
+ SELECT xmlformat('');
\ No newline at end of file
--
2.25.1
On Wed, Feb 15, 2023 at 11:05 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
On 14.02.23 23:45, Peter Smith wrote:
Those results implied to me that this function code (in my environment
anyway) is somehow introducing a side effect causing the *other* XML
tests to fail.
I believe I've found the issue. It is probably related to the XML OPTION
settings, as it seems to deliver different error messages when set to
DOCUMENT or CONTENT:
postgres=# SET XML OPTION CONTENT;
SET
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
^
line 1: Document is empty
^
postgres=# SET XML OPTION DOCUMENT;
SET
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
LINE 1: SELECT xmlformat('');
^
DETAIL: line 1: switching encoding : no input
^
line 1: Document is empty
^
v8 attached reintroduces the SELECT xmlformat('') test case and adds SET
XML OPTION DOCUMENT to the regression tests.
With v8, in my environment, in psql I see something slightly different:
test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
line 1: Document is empty
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# SELECT xmlformat('');
ERROR: invalid XML document
LINE 1: SELECT xmlformat('');
^
DETAIL: line 1: switching encoding : no input
line 1: Document is empty
~~
test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR: relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR: relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
~~
Because the extra detail lines that the expected output contains (the
blank lines and ^ markers) are missing here, the regression tests are
still failing for me.
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
On 15.02.23 02:09, Peter Smith wrote:
With v8, in my environment, in psql I see something slightly different:
test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
line 1: Document is empty
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# SELECT xmlformat('');
ERROR: invalid XML document
LINE 1: SELECT xmlformat('');
^
DETAIL: line 1: switching encoding : no input
line 1: Document is empty
~~
test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR: relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR: relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
~~
Yes, cfbot also complained about the same thing.
Setting the VERBOSITY to terse might solve this issue:
postgres=# \set VERBOSITY terse
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
postgres=# \set VERBOSITY default
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
^
line 1: Document is empty
^
v9 wraps the corner test cases with VERBOSITY terse to reduce the error
message output.
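To illustrate why this should help: with terse verbosity only the primary message line is printed, so the libxml2-dependent DETAIL text no longer shows up in the expected output. The sketch below is not psql code. It only mimics the effect by cutting an error report at its first line, and primary_line is a hypothetical helper name. Real terse output also appends the error position ("at character 18" in the new expected files), which this sketch ignores.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the first line of `report` (everything before the first newline)
 * into `out`, truncating if it does not fit. The result is always
 * NUL-terminated when outlen > 0. Returns the number of bytes copied.
 */
size_t
primary_line(const char *report, char *out, size_t outlen)
{
	size_t		len = strcspn(report, "\n");

	if (outlen == 0)
		return 0;
	if (len >= outlen)
		len = outlen - 1;
	memcpy(out, report, len);
	out[len] = '\0';
	return len;
}
```

Fed the two differing reports from this thread (one with the "switching encoding : no input" detail, one without), the helper yields the same "ERROR: invalid XML document" line for both, which is the property the regression test relies on.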
Thanks!
Best, Jim
Attachments:
v9-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 2545406a1494e71ca14dbad4ee6fca10e1668754 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v9] Add pretty-printed XML output option
This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++
src/backend/utils/adt/xml.c | 45 ++++++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 102 ++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 54 +++++++++++++++
src/test/regress/sql/xml.sql | 40 +++++++++++
6 files changed, 278 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlformat">
+ <title><literal>xmlformat</literal></title>
+
+ <indexterm>
+ <primary>xmlformat</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlformat</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlformat('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlformat
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-mapping">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting
+ *)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlformat', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlformat' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..3bc5f40142 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,105 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+SET XML OPTION DOCUMENT;
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document at character 18
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document at character 18
+\set VERBOSITY default
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..e18de278f8 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,57 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+SET XML OPTION DOCUMENT;
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: unsupported XML feature at character 18
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: unsupported XML feature at character 18
+\set VERBOSITY default
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..d2072be4b8 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,43 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+SET XML OPTION DOCUMENT;
+
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+
+
+
+\set VERBOSITY terse
+
+-- XML format: empty string
+SELECT xmlformat('');
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+
+\set VERBOSITY default
--
2.25.1
On Wed, Feb 15, 2023 at 6:10 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
On 15.02.23 02:09, Peter Smith wrote:
With v8, in my environment, in psql I see something slightly different:
test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
line 1: Document is empty
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# SELECT xmlformat('');
ERROR: invalid XML document
LINE 1: SELECT xmlformat('');
^
DETAIL: line 1: switching encoding : no input
line 1: Document is empty
test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR: relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR: relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
Yes... a cfbot also complained about the same thing.
Setting the VERBOSITY to terse might solve this issue:
postgres=# \set VERBOSITY terse
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
postgres=# \set VERBOSITY default
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
^
line 1: Document is empty
^
v9 wraps the corner test cases with VERBOSITY terse to reduce the error
message output.
Bingo!! Your v9 patch now passes all 'make check' tests for me.
But I'll leave it to a committer to decide if this VERBOSITY toggle is
the best fix.
(I don't understand, maybe someone can explain, how the patch managed
to mess up the verbosity of the existing tests.)
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
On 15.02.23 10:06, Peter Smith wrote:
Bingo!! Your v9 patch now passes all 'make check' tests for me.
Nice! It also passed all tests in the patch tester.
But I'll leave it to a committer to decide if this VERBOSITY toggle is
the best fix.
I see now other test cases in the xml.sql file that also use this
option, so it might be a known "issue".
Shall we now set the patch to "Ready for Committer"?
Thanks a lot for the review!
Best, Jim
On 2023-Feb-13, Peter Smith wrote:
Something is misbehaving.
Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.
But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.
Note that there's another file, xml_2.out, which does not contain the
additional part of the DETAIL message. I suspect in Peter's case it's
xml_2.out that was originally being used as a comparison basis (not
xml.out), but that one is not getting patched, and ultimately the diff
is reported for him against xml.out because none of them matches.
An easy way forward might be to manually apply the patch from xml.out to
xml_2.out, and edit it by hand to remove the additional lines.
See commit 085423e3e326 for a bit of background.
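
The selection described above can be modelled roughly as follows. This is a simplified sketch, not the regression driver's actual code: the function names and the sample diff counts are hypothetical, and it assumes the driver compares the output against each candidate file (xml.out, xml_1.out, xml_2.out) and reports against the one with the smallest diff.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of picking a comparison file. diff_lines[i] is the
 * number of differing lines between the actual output and candidate i
 * (for example 0 = xml.out, 1 = xml_1.out, 2 = xml_2.out). Returns the
 * index of the candidate with the smallest diff; on ties the earliest
 * candidate wins.
 */
size_t
best_expected_file(const int *diff_lines, size_t ncandidates)
{
	size_t		best = 0;
	size_t		i;

	assert(diff_lines != NULL && ncandidates > 0);

	for (i = 1; i < ncandidates; i++)
	{
		if (diff_lines[i] < diff_lines[best])
			best = i;
	}
	return best;
}

/* The test passes only if the best candidate matches exactly (0 diff lines). */
int
test_passes(const int *diff_lines, size_t ncandidates)
{
	return diff_lines[best_expected_file(diff_lines, ncandidates)] == 0;
}
```

With made-up counts such as {6, 57, 108}, the patched xml.out is the closest candidate but still differs, so the failure is reported against xml.out even though xml_2.out was the file that matched before the patch. Once xml_2.out is patched as well, for instance {12, 57, 0}, it matches exactly again and the test passes.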
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On 15.02.23 11:11, Alvaro Herrera wrote:
Note that there's another file, xml_2.out, which does not contain the
additional part of the DETAIL message. I suspect in Peter's case it's
xml_2.out that was originally being used as a comparison basis (not
xml.out), but that one is not getting patched, and ultimately the diff
is reported for him against xml.out because none of them matches.
An easy way forward might be to manually apply the patch from xml.out to
xml_2.out, and edit it by hand to remove the additional lines.
See commit 085423e3e326 for a bit of background.
Hi Álvaro,
As my test cases were not specifically about what the error messages look
like, I thought that suppressing part of the error messages by setting
VERBOSITY terse would suffice.[1] Is this approach not recommended?
Thanks!
Best, Jim
1 - v9 patch:
/messages/by-id/CAHut+PtoH8zkBHxv44XyO+o4kW_nZdGhNdVaJ_cpEjrckKDqtw@mail.gmail.com
On 2023-Feb-15, Jim Jones wrote:
Hi Álvaro,
As my test cases were not specifically about what the error messages look
like, I thought that suppressing part of the error messages by setting
VERBOSITY terse would suffice.[1] Is this approach not recommended?
Well, I don't see why we would depart from what we've been doing in the
rest of the XML tests. I did see that patch and I thought it was taking
the wrong approach.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)
On 15.02.23 12:11, Alvaro Herrera wrote:
Well, I don't see why we would depart from what we've been doing in the
rest of the XML tests. I did see that patch and I thought it was taking
the wrong approach.
Fair point.
v10 applies the xml.out changes to xml_2.out as well, manually removing
the additional lines.
Thanks for the review!
Best, Jim
Attachments:
v10-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 835c9ec18255adce9f9ae1e1e5d9e4287bac5452 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v10] Add pretty-printed XML output option
This small patch introduces an XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 +++++++++
src/backend/utils/adt/xml.c | 45 ++++++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 108 ++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 57 +++++++++++++++
src/test/regress/expected/xml_2.out | 105 +++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 32 +++++++++
7 files changed, 384 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlformat">
+ <title><literal>xmlformat</literal></title>
+
+ <indexterm>
+ <primary>xmlformat</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlformat</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlformat('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlformat
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-mapping">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting
+ *)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlformat', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlformat' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..8bc8919092 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,111 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+DETAIL: line 1: Start tag expected, '<' not found
+
+ ^
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+DETAIL: line 1: switching encoding : no input
+
+^
+line 1: Document is empty
+
+^
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..79c4721f4b 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,60 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat(' ');
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('');
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..73923e1e80 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -1579,3 +1579,108 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+DETAIL: line 1: Start tag expected, '<' not found
+
+ ^
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+DETAIL: line 1: Document is empty
+
+^
\ No newline at end of file
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..19c5b9d7a4 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,35 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+
+-- XML format: empty string
+SELECT xmlformat('');
\ No newline at end of file
--
2.25.1
Accidentally left the VERBOSITY settings out -- sorry!
Now it matches the approach used in an xpath test in xml.sql, xml.out,
xml_1.out and xml_2.out:
-- Since different libxml versions emit slightly different
-- error messages, we suppress the DETAIL in this test.
\set VERBOSITY terse
SELECT xpath('/*', '<invalidns xmlns=''<''/>');
ERROR: could not parse XML document
\set VERBOSITY default
v11 now correctly sets xml_2.out.
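
To illustrate why the terse setting makes the expected output independent of the libxml version, here is a minimal sketch. It is only a model of the idea, not psql's or libpq's implementation: the Verbosity enum, format_error() and error_has_detail() are hypothetical names, and the model ignores the position suffix that terse mode can append.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two verbosity levels used in the tests. */
typedef enum
{
	VERBOSITY_TERSE,
	VERBOSITY_DEFAULT
} Verbosity;

/*
 * Builds an error report in a static buffer (not thread-safe, sketch only).
 * Terse mode keeps just the primary message; default mode also appends the
 * DETAIL text, which is the part that varies between libxml versions.
 */
const char *
format_error(Verbosity v, const char *primary, const char *detail)
{
	static char buf[512];

	assert(primary != NULL);

	if (v == VERBOSITY_DEFAULT && detail != NULL)
		snprintf(buf, sizeof(buf), "ERROR: %s\nDETAIL: %s\n", primary, detail);
	else
		snprintf(buf, sizeof(buf), "ERROR: %s\n", primary);
	return buf;
}

/* Returns nonzero if the report carries a DETAIL line. */
int
error_has_detail(const char *report)
{
	return strstr(report, "\nDETAIL:") != NULL;
}
```

In this model, two libxml versions that produce different detail strings yield different reports under VERBOSITY_DEFAULT but byte-identical reports under VERBOSITY_TERSE, which is why a single expected line suffices for those test cases.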
Best, Jim
Attachments:
v11-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 473aab0a0028cd4de515c6a3698a1cda1c987067 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v11] Add pretty-printed XML output option
This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++
src/backend/utils/adt/xml.c | 45 +++++++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 101 ++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 53 +++++++++++++++
src/test/regress/expected/xml_2.out | 101 ++++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 33 +++++++++
7 files changed, 370 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlformat">
+ <title><literal>xmlformat</literal></title>
+
+ <indexterm>
+ <primary>xmlformat</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlformat</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlformat('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlformat
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-mapping">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting
+ *)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlformat', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlformat' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..e45116aaa7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+\set VERBOSITY default
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..dc3c241a3a 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,56 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: unsupported XML feature at character 18
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: unsupported XML feature at character 18
+\set VERBOSITY default
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..c04a57fe6d 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -1579,3 +1579,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+\set VERBOSITY default
\ No newline at end of file
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..68ac613475 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,36 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+
+-- XML format: empty string
+SELECT xmlformat('');
+\set VERBOSITY default
\ No newline at end of file
--
2.25.1
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Note that there's another file, xml_2.out, which does not contain the
additional part of the DETAIL message. I suspect in Peter's case it's
xml_2.out that was originally being used as a comparison basis (not
xml.out), but that one is not getting patched, and ultimately the diff
is reported for him against xml.out because none of them matches.
See commit 085423e3e326 for a bit of background.
Yeah. That's kind of sad, because it means there are still broken
libxml2s out there in 2023. Nonetheless, since there are, it is not
optional to fix all three expected-files.
regards, tom lane
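To illustrate the point about the three expected files, here is a minimal Python sketch of variant-file matching. It is not pg_regress's actual code: the helper names (diff_size, pick_expected) and the shortened file contents are hypothetical. It assumes the test passes if any variant matches exactly, and that otherwise the diff is reported against the closest variant.

```python
import difflib


def diff_size(expected: str, actual: str) -> int:
    """Count the lines that differ between two texts (0 means identical)."""
    a = expected.splitlines(keepends=True)
    b = actual.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def pick_expected(actual: str, candidates: dict) -> tuple:
    """Return (file name, diff size) of the variant closest to the output.

    A size of 0 means the test passes against that variant.
    """
    best = None
    for name, text in candidates.items():
        size = diff_size(text, actual)
        if size == 0:
            return name, 0
        if best is None or size < best[1]:
            best = (name, size)
    return best


# Hypothetical, heavily shortened stand-ins for the three expected files.
variants = {
    "xml.out": "ERROR: invalid XML document\nDETAIL: line 1\n<wrong\n ^\n",
    "xml_1.out": "ERROR: unsupported XML feature\n",
    "xml_2.out": "ERROR: invalid XML document\nDETAIL: line 1\n",
}
# Output from a libxml2 that omits the extra DETAIL context lines.
result = "ERROR: invalid XML document\nDETAIL: line 1\n"

print(pick_expected(result, variants))

# If xml_2.out is not kept in sync, no variant matches and the failure
# shows up as a diff against whichever remaining file is closest.
stale = {k: v for k, v in variants.items() if k != "xml_2.out"}
print(pick_expected(result, stale))
```

Under these assumptions, a patch that updates only xml.out fails on a machine whose output matches xml_2.out, and the reported diff is still against xml.out, which is the confusing symptom described above.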
On Thu, Feb 16, 2023 at 12:49 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
I accidentally left the VERBOSITY settings out -- sorry!
Now it matches the approach used in an xpath test in xml.sql, xml.out,
xml_1.out and xml_2.out:
-- Since different libxml versions emit slightly different
-- error messages, we suppress the DETAIL in this test.
\set VERBOSITY terse
SELECT xpath('/*', '<invalidns xmlns=''<''/>');
ERROR: could not parse XML document
\set VERBOSITY default
v11 now correctly sets xml_2.out.
Best, Jim
Firstly, sorry: it seems I made a mistake and was premature in
calling bingo above for v9.
- Today I repeated 'make check' with v9 and found it still failing.
- The new xmlformat tests are OK, but some pre-existing xmlparse tests
are broken.
- See the attached file pretty-v9-results.
----
OTOH, the absence of xml_2.out from this patch appears to be the
correct explanation for why my results have been differing.
----
Today I fetched and tried the latest v11.
It is failing too, but only just.
- see attached file pretty-v11-results
It looks like this is due only to a missing newline at EOF in xml_2.out:
@@ -1679,4 +1679,4 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default
------
The attached patch update (v12-0002) fixes xml_2.out for me.
------
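For anyone wanting to check for this kind of problem before posting, here is a minimal Python sketch. It is not part of the attached patches, the helper names are hypothetical, and the byte strings are shortened stand-ins for the tail of xml_2.out.

```python
def missing_final_newline(data: bytes) -> bool:
    """True if non-empty content does not end with a newline byte."""
    return len(data) > 0 and not data.endswith(b"\n")


def fix_final_newline(data: bytes) -> bytes:
    """Append a newline only when the content lacks one."""
    return data + b"\n" if missing_final_newline(data) else data


# Shortened stand-in for the tail of the expected file as shipped in v11.
expected_tail = b"ERROR: invalid XML document\n\\set VERBOSITY default"
# The regression run writes the same text with a trailing newline.
results_tail = b"ERROR: invalid XML document\n\\set VERBOSITY default\n"

print(missing_final_newline(expected_tail))
print(fix_final_newline(expected_tail) == results_tail)
```

The comparison is byte-exact, so the single missing newline is enough to make the expected file and the results file differ.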
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v12-0002-PS-fix-EOF-for-xml_2.out.patch (application/octet-stream)
From dc0ff9f8f8251d7ab9b5248bf1945341a036d76e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 16 Feb 2023 10:00:41 +1100
Subject: [PATCH v12] PS fix EOF for xml_2.out
---
src/test/regress/expected/xml_2.out | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index c04a57f..2bacbde 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -1679,4 +1679,4 @@ ERROR: invalid XML document
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default
--
1.8.3.1
pretty-v9-results (application/octet-stream)
diff -U3 /home/postgres/oss_postgres_misc/src/test/regress/expected/xml.out /home/postgres/oss_postgres_misc/src/test/regress/results/xml.out
--- /home/postgres/oss_postgres_misc/src/test/regress/expected/xml.out 2023-02-16 07:53:55.577490032 +1100
+++ /home/postgres/oss_postgres_misc/src/test/regress/results/xml.out 2023-02-16 08:19:25.588981111 +1100
@@ -9,8 +9,6 @@
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
DETAIL: line 1: Couldn't find end of Start Tag wrong line 1
-<wrong
- ^
SELECT * FROM xmltest;
id | data
----+--------------------
@@ -94,8 +92,6 @@
LINE 1: SELECT xmlconcat('bad', '<syntax');
^
DETAIL: line 1: Couldn't find end of Start Tag syntax line 1
-<syntax
- ^
SELECT xmlconcat('<foo/>', NULL, '<?xml version="1.1" standalone="no"?><bar/>');
xmlconcat
--------------
@@ -255,16 +251,12 @@
<invalidentity>&</invalidentity>
^
line 1: chunk is not well balanced
-<invalidentity>&</invalidentity>
- ^
SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
ERROR: invalid XML content
DETAIL: line 1: Entity 'idontexist' not defined
<undefinedentity>&idontexist;</undefinedentity>
^
line 1: chunk is not well balanced
-<undefinedentity>&idontexist;</undefinedentity>
- ^
SELECT xmlparse(content '<invalidns xmlns=''<''/>');
xmlparse
---------------------------
@@ -283,11 +275,7 @@
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-<twoerrors>&idontexist;</unbalanced>
- ^
line 1: chunk is not well balanced
-<twoerrors>&idontexist;</unbalanced>
- ^
SELECT xmlparse(content '<nosuchprefix:tag/>');
xmlparse
---------------------
@@ -297,8 +285,6 @@
SELECT xmlparse(document ' ');
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
-
- ^
SELECT xmlparse(document 'abc');
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
@@ -316,16 +302,12 @@
<invalidentity>&</abc>
^
line 1: Opening and ending tag mismatch: invalidentity line 1 and abc
-<invalidentity>&</abc>
- ^
SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
ERROR: invalid XML document
DETAIL: line 1: Entity 'idontexist' not defined
<undefinedentity>&idontexist;</abc>
^
line 1: Opening and ending tag mismatch: undefinedentity line 1 and abc
-<undefinedentity>&idontexist;</abc>
- ^
SELECT xmlparse(document '<invalidns xmlns=''<''/>');
xmlparse
---------------------------
@@ -344,8 +326,6 @@
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-<twoerrors>&idontexist;</unbalanced>
- ^
SELECT xmlparse(document '<nosuchprefix:tag/>');
xmlparse
---------------------
v12-0001-Add-pretty-printed-XML-output-option.patch (application/octet-stream)
From a4f811bc6397740ac757aec1f2014e661019dad4 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 16 Feb 2023 09:57:44 +1100
Subject: [PATCH v12] Add pretty-printed XML output option
This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++++
src/backend/utils/adt/xml.c | 45 ++++++++++++++++
src/include/catalog/pg_proc.dat | 3 ++
src/test/regress/expected/xml.out | 101 ++++++++++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 53 +++++++++++++++++++
src/test/regress/expected/xml_2.out | 101 ++++++++++++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 33 ++++++++++++
7 files changed, 370 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289..a621192 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlformat">
+ <title><literal>xmlformat</literal></title>
+
+ <indexterm>
+ <primary>xmlformat</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlformat</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlformat('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlformat
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-mapping">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1..ec12707 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting
+ *)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66b73c3..0e80347 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8851,6 +8851,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlformat', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlformat' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9..e45116a 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+\set VERBOSITY default
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412..dc3c241 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,56 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: unsupported XML feature at character 18
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: unsupported XML feature at character 18
+\set VERBOSITY default
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5..c04a57f 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -1579,3 +1579,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+\set VERBOSITY default
\ No newline at end of file
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459..68ac613 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,36 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+
+-- XML format: empty string
+SELECT xmlformat('');
+\set VERBOSITY default
\ No newline at end of file
--
1.8.3.1
pretty-v11-results (application/octet-stream)
diff -U3 /home/postgres/oss_postgres_misc/src/test/regress/expected/xml_2.out /home/postgres/oss_postgres_misc/src/test/regress/results/xml.out
--- /home/postgres/oss_postgres_misc/src/test/regress/expected/xml_2.out 2023-02-16 08:25:37.878022598 +1100
+++ /home/postgres/oss_postgres_misc/src/test/regress/results/xml.out 2023-02-16 08:38:08.674557466 +1100
@@ -1679,4 +1679,4 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
I suggest we call it "xmlformat", which is an established term for this.
A very old, rusty memory told me that there was something about this in
the standard – and indeed, it seems it describes an optional Feature X069,
“XMLSerialize: INDENT” for XMLSERIALIZE. So pretty-printing should probably
go there, in XMLSERIALIZE, to follow the standard?
Oracle also has an option for it in XMLSERIALIZE, although in a slightly
different form, with the ability to specify the number of spaces for
indentation:
https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231
On 16.02.23 05:38, Nikolay Samokhvalov wrote:
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
I suggest we call it "xmlformat", which is an established term for this.
A very old, rusty memory told me that there was something about this in
the standard – and indeed, it seems it describes an optional Feature X069,
“XMLSerialize: INDENT” for XMLSERIALIZE. So pretty-printing should probably
go there, in XMLSERIALIZE, to follow the standard?
Oracle also has an option for it in XMLSERIALIZE, although in a slightly
different form, with the ability to specify the number of spaces for
indentation:
https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231
Hi Nikolay,
My first thought was to call it xmlpretty, to make it consistent with
the jsonb equivalent "jsonb_pretty". But yes, that is a good
observation: xmlserialize seems to be a much better candidate.
I would be willing to refactor my patch if we agree on xmlserialize.
Thanks for the suggestion!
Jim
On 16.02.23 00:13, Peter Smith wrote:
Today I fetched and tried the latest v11.
It is failing too, but only just.
- see attached file pretty-v11-results
It looks only due to a whitespace EOF issue in the xml_2.out
@@ -1679,4 +1679,4 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default
------
The attached patch update (v12-0002) fixes the xml_2.out for me.
It works for me too.
Thanks for the quick fix!
Jim
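The end-of-file issue discussed above comes down to whether the last byte of the expected-output file is a newline. Below is a minimal sketch in C of such a check. The helper names `ends_with_newline` and `file_ends_with_newline` are hypothetical and are not part of the patch or of PostgreSQL. It only illustrates the condition that `diff` reports as "\ No newline at end of file".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true if the buffer is non-empty and its
 * last byte is a newline character.
 */
bool
ends_with_newline(const char *buf, size_t len)
{
	return len > 0 && buf[len - 1] == '\n';
}

/*
 * Hypothetical helper: returns 1 if the file at "path" ends with a
 * newline, 0 if it does not (or is empty), and -1 on an I/O error.
 */
int
file_ends_with_newline(const char *path)
{
	FILE	   *fp = fopen(path, "rb");
	int			last = EOF;
	int			c;

	if (fp == NULL)
		return -1;

	/* Remember the most recent byte until we hit end of file. */
	while ((c = fgetc(fp)) != EOF)
		last = c;

	if (ferror(fp))
	{
		fclose(fp);
		return -1;
	}

	fclose(fp);
	return last == '\n';
}
```

A check like this, run against an expected-output file such as xml_2.out, would return 0 for a file whose final line lacks the trailing newline.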
On 16.02.23 00:13, Peter Smith wrote:
Today I fetched and tried the latest v11.
It is failing too, but only just.
- see attached file pretty-v11-results
It looks only due to a whitespace EOF issue in the xml_2.out
@@ -1679,4 +1679,4 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default
------
The attached patch update (v12-0002) fixes the xml_2.out for me.
I'm squashing v12-0001 and v12-0002 (v13 attached). There is still an
open discussion on renaming the function to xmlserialize,[1] but it
shouldn't be too difficult to change it later in case we reach a
consensus :)
Thanks!
Jim
[1] /messages/by-id/CANNMO+Kwb4_87G8qDeN+Vk1B1vX3HvgoGW+13fJ-b6rj7qbAww@mail.gmail.com
Attachments:
v13-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From e28e9da7890d07e10f412ad61318d7a9ce4d058c Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 16 Feb 2023 22:36:19 +0100
Subject: [PATCH v13] Add pretty-printed XML output option
This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++
src/backend/utils/adt/xml.c | 45 +++++++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 101 ++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 53 +++++++++++++++
src/test/regress/expected/xml_2.out | 101 ++++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 33 +++++++++
7 files changed, 370 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlformat">
+ <title><literal>xmlformat</literal></title>
+
+ <indexterm>
+ <primary>xmlformat</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlformat</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlformat('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlformat
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-mapping">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /**
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting
+ *)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlformat', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlformat' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..e45116aaa7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+\set VERBOSITY default
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..dc3c241a3a 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,56 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: unsupported XML feature at character 18
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: unsupported XML feature at character 18
+\set VERBOSITY default
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..2bacbde0c6 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -1579,3 +1579,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+\set VERBOSITY default
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..68ac613475 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,36 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+
+-- XML format: empty string
+SELECT xmlformat('');
+\set VERBOSITY default
\ No newline at end of file
--
2.25.1
On Thu, Feb 16, 2023 at 2:12 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
I'm squashing v12-0001 and v12-0002 (v13 attached).
I've looked into the patch. The code looks to conform to usual expectations.
One nit: this comment should have just one asterisk.
+ /**
And I have a dumb question: is this function protected from using
external XML namespaces? What if the user passes some xmlns that will
force it to read namespace data from the server filesystem? Or is it
not possible? I see there are a lot of calls to xml_parse() anyway,
but still...
Best regards, Andrey Borodin.
On 16.02.23 05:38, Nikolay Samokhvalov wrote:
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
I suggest we call it "xmlformat", which is an established term for
this.
Some very old, rusty memory told me that there was something in the
standard, and indeed, it seems to describe an optional Feature X069,
“XMLSerialize: INDENT” for XMLSERIALIZE. So pretty-printing should
probably go there, into XMLSERIALIZE, to follow the standard?
Oracle also has an option for it in XMLSERIALIZE, although in a
slightly different form, with the ability to specify the number of spaces
for indentation:
https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.
After your comment, I'm studying the possibility of extending the existing
xmlserialize function with an indentation feature. If we go that way, what
do you think it should look like? An extra parameter? E.g.
SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::XML AS text,
true);
... or more like Oracle does it:
SELECT XMLSERIALIZE(DOCUMENT xmltype('<foo><bar>42</bar></foo>') AS BLOB
INDENT)
FROM dual;
Thanks!
Best, Jim
On 17.02.23 01:08, Andrey Borodin wrote:
On Thu, Feb 16, 2023 at 2:12 PM Jim Jones<jim.jones@uni-muenster.de> wrote:
I've looked into the patch. The code looks to conform to usual
expectations.
One nit: this comment should have just one asterisk.
+ /**
Thanks for reviewing! Asterisk removed in v14.
And I have a dumb question: is this function protected from using
external XML namespaces? What if the user passes some xmlns that will
force it to read namespace data from the server filesystem? Or is it
not possible? I see there are a lot of calls to xml_parse() anyway,
but still...
According to the documentation,[1] such validations are not supported:
"The xml type does not validate input values against a document type
declaration (DTD), even when the input value specifies a DTD. There is
also currently no built-in support for validating against other XML
schema languages such as XML Schema."
But I'll have a look at the code to be sure :)
Best, Jim
Attachments:
v14-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 44825f436e9c8f06a9bea3ed5966ef73bab208a9 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 16 Feb 2023 22:36:19 +0100
Subject: [PATCH v14] Add pretty-printed XML output option
This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
doc/src/sgml/func.sgml | 34 ++++++++++
src/backend/utils/adt/xml.c | 44 ++++++++++++
src/include/catalog/pg_proc.dat | 3 +
src/test/regress/expected/xml.out | 101 ++++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 53 +++++++++++++++
src/test/regress/expected/xml_2.out | 101 ++++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 33 +++++++++
7 files changed, 369 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
]]></screen>
</para>
</sect3>
+
+ <sect3 id="functions-xml-xmlformat">
+ <title><literal>xmlformat</literal></title>
+
+ <indexterm>
+ <primary>xmlformat</primary>
+ </indexterm>
+
+<synopsis>
+<function>xmlformat</function> ( <type>xml</type> ) <returnvalue>xml</returnvalue>
+</synopsis>
+
+ <para>
+ Converts the given XML value to pretty-printed, indented text.
+ </para>
+
+ <para>
+ Example:
+ <screen><![CDATA[
+SELECT xmlformat('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>');
+ xmlformat
+--------------------------
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
+ </para>
+ </sect3>
+
</sect2>
<sect2 id="functions-xml-mapping">
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..e96cbf65a7 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,50 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
}
#endif
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ text *arg = PG_GETARG_TEXT_PP(0);
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /*
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting)
+ */
+
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
Datum
xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
{ oid => '3053', descr => 'determine if a string is well formed XML content',
proname => 'xml_is_well_formed_content', prorettype => 'bool',
proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlformat', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlformat' },
# json
{ oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..e45116aaa7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+\set VERBOSITY default
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..dc3c241a3a 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -1268,3 +1268,56 @@ DETAIL: This functionality requires the server to be built with libxml support.
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="42"><food type="discou...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<breakfast_menu id="73"> <food type="organ...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fa...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: unsupported XML feature at character 18
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: unsupported XML feature at character 18
+\set VERBOSITY default
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..2bacbde0c6 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -1579,3 +1579,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
<foo/> | <foo/>
(1 row)
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="42"> +
+ <food type="discounter"> +
+ <name>Belgian Waffles</name> +
+ <price>$5.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+ xmlformat
+--------------------------------------------------------------------------------------------------
+ <breakfast_menu id="73"> +
+ <food type="organic" class="fancy"> +
+ <name>Belgian Waffles</name> +
+ <price>$15.95</price> +
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>+
+ <calories>650</calories> +
+ </food> +
+ </breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> +
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>+
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <!-- eat this --> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories>650</meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+ xmlformat
+-------------------------------------------------------------------------------------------------------------
+ <meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73">+
+ <meal:food type="organic" class="fancy"> +
+ <meal:name>Belgian Waffles</meal:name> +
+ <meal:price>$15.95</meal:price> +
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description> +
+ <meal:calories> +
+ <c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c> +
+ </meal:calories> +
+ </meal:food> +
+ </meal:breakfast_menu> +
+
+(1 row)
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+ xmlformat
+-----------
+
+(1 row)
+
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+ERROR: invalid XML document
+-- XML format: empty string
+SELECT xmlformat('');
+ERROR: invalid XML document
+\set VERBOSITY default
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..68ac613475 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -624,3 +624,36 @@ SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n
\x
SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH '"<foo/>"', b xml PATH '"<foo/>"');
+
+-- XML format: single line XML string
+SELECT xmlformat('<breakfast_menu id="42"><food type="discounter"><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat('<breakfast_menu id="73"> <food type="organic" class="fancy"> <name>Belgian Waffles</name> <price>$15.95</price>
+ <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
+<calories>650</calories> </food> </breakfast_menu> ');
+
+-- XML format: XML string with space, tabs and newline between nodes, using a namespace
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <meal:description>Two of our famous Belgian Waffles with plenty of real maple syrup</meal:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and a comment
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <!-- eat this --> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories>650</meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: XML string with space, tabs and newline between nodes, using multiple namespaces and CDATA
+SELECT xmlformat('<meal:breakfast_menu xmlns:meal="http://fancycafe.im/meal/" xmlns:desc="http://fancycafe.mn/meal/" id="73"> <meal:food type="organic" class="fancy"> <meal:name>Belgian Waffles</meal:name> <meal:price>$15.95</meal:price>
+ <desc:description>Two of our famous Belgian Waffles with plenty of real maple syrup</desc:description>
+<meal:calories><c><![CDATA[<unknown> &"<>!<a>foo</a>]]></c></meal:calories> </meal:food></meal:breakfast_menu>');
+
+-- XML format: NULL parameter
+SELECT xmlformat(NULL);
+\set VERBOSITY terse
+-- XML format: invalid string (whitespaces)
+SELECT xmlformat(' ');
+
+-- XML format: empty string
+SELECT xmlformat('');
+\set VERBOSITY default
\ No newline at end of file
--
2.25.1
On Fri, Feb 17, 2023 at 1:14 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
After your comment I'm studying the possibility of extending the existing
xmlserialize function to add the indentation feature. If so, what do you
think it should look like? An extra parameter? e.g.
SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::XML AS text, true);
.. or more like Oracle does it
SELECT XMLSERIALIZE(DOCUMENT xmltype('<foo><bar>42</bar></foo>') AS BLOB INDENT)
FROM dual;
My idea was to follow the SQL standard (part 14, SQL/XML); unfortunately,
there is no free version, but there are drafts at
http://www.wiscorp.com/SQLStandards.html.
<XML character string serialization> ::=
XMLSERIALIZE <left paren> [ <document or content> ]
<XML value expression> AS <data type>
[ <XML serialize bom> ]
[ <XML serialize version> ]
[ <XML declaration option> ]
[ <XML serialize indent> ]
<right paren>
<XML serialize indent> ::=
[ NO ] INDENT
Oracle's extension SIZE=n also seems interesting (including a special case
SIZE=0, which means using new-line characters without spaces for each line).
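To make the <XML serialize indent> production above a bit more concrete, here is a rough sketch of how the optional clause maps to a flag. The names parse_indent_clause and eat_keyword and the return convention are made up for illustration; in PostgreSQL this would be a grammar rule, not a hand-written parser. It assumes that leaving the clause out means NO INDENT.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: skip leading blanks, then match the upper-case
 * keyword kw case-insensitively at *p. The keyword must end at a word
 * boundary. On success *p is advanced past the keyword.
 */
static bool eat_keyword(const char **p, const char *kw)
{
    const char *s = *p;
    size_t n = strlen(kw);

    while (isspace((unsigned char) *s))
        s++;
    for (size_t i = 0; i < n; i++)
        if (toupper((unsigned char) s[i]) != kw[i])
            return false;       /* also stops at the terminating NUL */
    if (isalnum((unsigned char) s[n]) || s[n] == '_')
        return false;           /* e.g. "INDENTED" is not "INDENT" */
    *p = s + n;
    return true;
}

/*
 * Hypothetical parser for the clause  [ [ NO ] INDENT ].
 * Returns 1 for INDENT, 0 for NO INDENT or an empty clause
 * (assumed default), and -1 for anything else.
 */
int parse_indent_clause(const char *clause)
{
    const char *p = clause;
    int indent = 0;             /* assumption: absent clause = NO INDENT */

    if (eat_keyword(&p, "INDENT"))
        indent = 1;
    else if (eat_keyword(&p, "NO"))
    {
        if (!eat_keyword(&p, "INDENT"))
            return -1;          /* a bare NO is not valid */
    }

    while (isspace((unsigned char) *p))
        p++;
    return *p == '\0' ? indent : -1;
}
```

So parse_indent_clause("INDENT") gives 1, both "no indent" and the empty string give 0, and inputs such as "NO" or "INDENTED" are rejected with -1.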
On 17.02.23 23:24, Nikolay Samokhvalov wrote:
My idea was to follow the SQL standard (part 14, SQL/XML);
unfortunately, there is no free version, but there are drafts at
http://www.wiscorp.com/SQLStandards.html.
<XML character string serialization> ::=
XMLSERIALIZE <left paren> [ <document or content> ]
<XML value expression> AS <data type>
[ <XML serialize bom> ]
[ <XML serialize version> ]
[ <XML declaration option> ]
[ <XML serialize indent> ]
<right paren>
<XML serialize indent> ::=
[ NO ] INDENT
Good find. It would be better to use this standard syntax.
On 18.02.23 19:09, Peter Eisentraut wrote:
Good find. It would be better to use this standard syntax.
As suggested by Peter and Nikolay, v15 now removes the xmlformat
function from the catalog and adds the [NO] INDENT option to
xmlserialize, as described in X069.
postgres=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text INDENT);
xmlserialize
----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>+
<foo> +
<bar> +
<val x="y">42</val> +
</bar> +
</foo> +
(1 row)
postgres=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text NO INDENT);
xmlserialize
-------------------------------------------
<foo><bar><val x="y">42</val></bar></foo>
(1 row)
Although the indent feature is designed to work with XML values of type
DOCUMENT, this implementation also accepts CONTENT values as long as they
contain a well-formed XML document. Otherwise it raises an error.
Thanks!
Best, Jim
Attachments:
v15-0001-Add-pretty-printed-XML-output-option.patchtext/x-patch; charset=UTF-8; name=v15-0001-Add-pretty-printed-XML-output-option.patchDownload
From ba0bf68ab69b702b6dbe00e481e39b60580d8569 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v15] Add pretty-printed XML output option
This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.
This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
doc/src/sgml/datatype.sgml | 8 ++-
src/backend/catalog/sql_features.txt | 2 +-
src/backend/executor/execExprInterp.c | 13 +++-
src/backend/parser/gram.y | 12 +++-
src/backend/parser/parse_expr.c | 1 +
src/backend/utils/adt/xml.c | 41 ++++++++++++
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/primnodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/include/utils/xml.h | 1 +
src/test/regress/expected/xml.out | 93 +++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 63 ++++++++++++++++++
src/test/regress/expected/xml_2.out | 93 +++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 23 +++++++
14 files changed, 345 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..b579b521af 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ { NO INDENT | INDENT } ] )
</synopsis>
<replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias for one of those). Again, according
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
- you to simply cast the value.
+ you to simply cast the value. The option <type>INDENT</type> allows to
+ indent the serialized xml output - the default is <type>NO INDENT</type>.
+ It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+ be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+ contains a well-formed XML.
</para>
<para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: binary string input and CONTENT option NO
X066 XMLParse: binary string input and DOCUMENT option NO
X068 XMLSerialize: BOM NO
-X069 XMLSerialize: INDENT NO
+X069 XMLSerialize: INDENT YES
X070 XMLSerialize: character string serialization and CONTENT option YES
X071 XMLSerialize: character string serialization and DOCUMENT option YES
X072 XMLSerialize: character string serialization YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..d460c2b67a 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
-
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3838,15 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
*op->resnull = false;
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
+
}
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..2814f16082 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> xmltable_column_option_el
%type <list> xml_namespace_list
%type <target> xml_namespace_el
+%type <boolean> opt_xml_indent
%type <node> func_application func_expr_common_subexpr
%type <node> func_expr func_expr_windowless
@@ -702,7 +703,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -15532,13 +15533,14 @@ func_expr_common_subexpr:
$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
list_make3($3, $5, $6), @1);
}
- | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+ | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename opt_xml_indent ')'
{
XmlSerialize *n = makeNode(XmlSerialize);
n->xmloption = $3;
n->expr = $4;
n->typeName = $6;
+ n->indent = $7;
n->location = @1;
$$ = (Node *) n;
}
@@ -15617,6 +15619,11 @@ xmlexists_argument:
}
;
+opt_xml_indent: INDENT { $$ = true; }
+ | NO INDENT { $$ = false; }
+ | /*EMPTY*/ { $$ = false; }
+ ;
+
xml_passing_mech:
BY REF_P
| BY VALUE_P
@@ -16828,6 +16835,7 @@ unreserved_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 7ff41acb84..1f465d126a 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2332,6 +2332,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
xexpr->xmloption = xs->xmloption;
xexpr->location = xs->location;
+ xexpr->indent = xs->indent;
/* We actually only need these to be able to parse back the expression. */
xexpr->type = targetType;
xexpr->typmod = targetTypmod;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..a326b92336 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4818,3 +4818,44 @@ XmlTableDestroyOpaque(TableFuncScanState *state)
NO_XML_SUPPORT();
#endif /* not USE_LIBXML */
}
+
+xmltype *
+xmlformat(text *data)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(data, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /*
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting )
+ */
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ return stringinfo_to_xmltype(&buf);
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
+}
\ No newline at end of file
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..831206dbc0 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -842,6 +842,7 @@ typedef struct XmlSerialize
Node *expr;
TypeName *typeName;
int location; /* token location, or -1 if unknown */
+ bool indent; /* should the xml output be indented? */
} XmlSerialize;
/* Partitioning related definitions */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1be1642d92..17504a7d3d 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1491,6 +1491,7 @@ typedef struct XmlExpr
int32 typmod pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
int location;
+ bool indent;
} XmlExpr;
/* ----------------
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..aeda7cc9f1 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..551d9f6b05 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -90,4 +90,5 @@ extern PGDLLIMPORT int xmloption; /* XmlOptionType, but int for guc enum */
extern PGDLLIMPORT const TableFuncRoutine XmlTableRoutine;
+extern xmltype *xmlformat(text *data);
#endif /* XML_H */
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..3f5aea920f 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,99 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..d2fb208d3e 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,69 @@ ERROR: unsupported XML feature
LINE 1: SELECT xmlserialize(document 'bad' as text);
^
DETAIL: This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
ERROR: unsupported XML feature
LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..689f1bc831 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,99 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..7841ad95cd 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,29 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
--
2.25.1
On Mon, Feb 20, 2023 at 3:06 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
As suggested by Peter and Nikolay, v15 now removes the xmlformat
function from the catalog and adds the [NO] INDENT option to
xmlserialize, as described in X069.
Great. I'm checking this patch, and it seems indentation stops working if
we have a text node inside:
gitpod=# select xmlserialize(document '<xml><more>13</more></xml>' as text
indent);
xmlserialize
----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>+
<xml> +
<more>13</more> +
</xml> +
(1 row)
gitpod=# select xmlserialize(document '<xml>text<more>13</more></xml>' as
text indent);
xmlserialize
----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>+
<xml>text<more>13</more></xml> +
(1 row)
It is worth mentioning that Oracle behaves similarly -- indentation doesn't work:
https://dbfiddle.uk/hRz5sXdM.
But is this expected? Shouldn't the output look like this:
<xml>
text
<more>13</more>
</xml>
?
Here are some review comments for patch v15-0001
FYI, the patch applies clean and tests OK for me.
======
doc/src/sgml/datatype.sgml
1.
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
AS <replaceable>type</replaceable> [ { NO INDENT | INDENT } ] )
~
Another/shorter way to write that syntax is like below. For me, it is
easier to read. YMMV.
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
AS <replaceable>type</replaceable> [ [NO] INDENT ] )
======
src/backend/executor/execExprInterp.c
2. ExecEvalXmlExpr
@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
-
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations
~~~
3.
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
+
}
Unnecessary blank line at the end.
======
src/backend/utils/adt/xml.
4. xmlformat
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
Wrong indentation (return 0) in the indentation function? ;-)
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On 22.02.23 08:05, Nikolay Samokhvalov wrote:
But is this as expected? Shouldn't it be like this:
<xml>
text
<more>13</more>
</xml>
?
Oracle and other parsers I know of also do not handle mixed content
well.[1,2] I believe libxml2's parser does not know where to put the
newline, as mixed content can contain more than one text node:
<xml>text<more>13</more> text2 text3</xml> [3]
And applying this logic the output could look like this ..
<xml>text
<more>13</more>text2 text3
</xml>
or even this
<xml>
text
<more>13</more>
text2 text3
</xml>
.. which doesn't seem right either. Perhaps a note about mixed contents
in the docs would make things clearer?
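As far as I can tell, the behaviour seen above follows from a simple rule in libxml2's serializer: formatting is switched off for an element, and for everything below it, as soon as one of its children is a text node. Here is a minimal sketch of that rule. The Node type and the dump function are hypothetical stand-ins, not libxml2's own structures or API, and the two-space indent is an assumption taken from the psql output in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature tree; this is NOT libxml2's xmlNode. */
typedef enum { ELEMENT, TEXT } Kind;

typedef struct Node
{
    Kind kind;
    const char *value;              /* tag name (ELEMENT) or content (TEXT) */
    const struct Node *children;    /* array of children, ELEMENT only */
    int nchildren;
} Node;

/* Append s to the NUL-terminated buffer out without overflowing cap bytes. */
void append(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);

    if (used + 1 < cap)
        strncat(out, s, cap - used - 1);
}

/* Does this element have at least one text node among its children? */
int has_text_child(const Node *n)
{
    for (int i = 0; i < n->nchildren; i++)
        if (n->children[i].kind == TEXT)
            return 1;
    return 0;
}

/*
 * Serialize n into out. With format != 0 each child goes on its own line,
 * indented by two spaces per level, unless the element has mixed content.
 * Empty elements are written as <a></a> to keep the sketch short.
 */
void dump(const Node *n, int depth, int format, char *out, size_t cap)
{
    if (n->kind == TEXT)
    {
        append(out, cap, n->value);
        return;
    }

    /* Mixed content: stop formatting this element and its whole subtree. */
    if (format && has_text_child(n))
        format = 0;

    append(out, cap, "<");
    append(out, cap, n->value);
    append(out, cap, ">");

    for (int i = 0; i < n->nchildren; i++)
    {
        if (format)
        {
            append(out, cap, "\n");
            for (int d = 0; d < depth + 1; d++)
                append(out, cap, "  ");
        }
        dump(&n->children[i], depth + 1, format, out, cap);
    }

    if (format && n->nchildren > 0)
    {
        append(out, cap, "\n");
        for (int d = 0; d < depth; d++)
            append(out, cap, "  ");
    }

    append(out, cap, "</");
    append(out, cap, n->value);
    append(out, cap, ">");
}
```

With format set to 1, a tree for <xml><more>13</more></xml> comes out on three lines, while a tree for <xml>text<more>13</more></xml> stays on a single line. Apart from the XML declaration, that matches the two psql outputs shown earlier in the thread.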
Thanks for the review!
Jim
On 22.02.23 08:20, Peter Smith wrote:
Here are some review comments for patch v15-0001
FYI, the patch applies clean and tests OK for me.
======
doc/src/sgml/datatype.sgml
1.
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
AS <replaceable>type</replaceable> [ { NO INDENT | INDENT } ] )
~
Another/shorter way to write that syntax is like below. For me, it is
easier to read. YMMV.
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
AS <replaceable>type</replaceable> [ [NO] INDENT ] )
Indeed simpler to read.
======
src/backend/executor/execExprInterp.c
2. ExecEvalXmlExpr
@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
-
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations
Whitespace added.
~~~
3.
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
+
}
Unnecessary blank line at the end.
blank line removed.
======
src/backend/utils/adt/xml.
4. xmlformat
+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif
Wrong indentation (return 0) in the indentation function? ;-)
indentation corrected.
v16 attached.
Thanks a lot!
Jim
Attachments:
v16-0001-Add-pretty-printed-XML-output-option.patchtext/x-patch; charset=UTF-8; name=v16-0001-Add-pretty-printed-XML-output-option.patchDownload
From a4fef3cdadd3d2f7abe530f5b07bf8c536689d83 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v16] Add pretty-printed XML output option
This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.
This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
doc/src/sgml/datatype.sgml | 8 ++-
src/backend/catalog/sql_features.txt | 2 +-
src/backend/executor/execExprInterp.c | 12 +++-
src/backend/parser/gram.y | 12 +++-
src/backend/parser/parse_expr.c | 1 +
src/backend/utils/adt/xml.c | 41 ++++++++++++
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/primnodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/include/utils/xml.h | 1 +
src/test/regress/expected/xml.out | 93 +++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 63 ++++++++++++++++++
src/test/regress/expected/xml_2.out | 93 +++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 23 +++++++
14 files changed, 344 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
</synopsis>
<replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias for one of those). Again, according
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
- you to simply cast the value.
+ you to simply cast the value. The option <type>INDENT</type> allows to
+ indent the serialized xml output - the default is <type>NO INDENT</type>.
+ It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+ be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+ contains a well-formed XML.
</para>
<para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: binary string input and CONTENT option NO
X066 XMLParse: binary string input and DOCUMENT option NO
X068 XMLSerialize: BOM NO
-X069 XMLSerialize: INDENT NO
+X069 XMLSerialize: INDENT YES
X070 XMLSerialize: character string serialization and CONTENT option YES
X071 XMLSerialize: character string serialization and DOCUMENT option YES
X072 XMLSerialize: character string serialization YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..15393f83c8 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
-
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3838,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
*op->resnull = false;
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
}
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..2814f16082 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> xmltable_column_option_el
%type <list> xml_namespace_list
%type <target> xml_namespace_el
+%type <boolean> opt_xml_indent
%type <node> func_application func_expr_common_subexpr
%type <node> func_expr func_expr_windowless
@@ -702,7 +703,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -15532,13 +15533,14 @@ func_expr_common_subexpr:
$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
list_make3($3, $5, $6), @1);
}
- | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+ | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename opt_xml_indent ')'
{
XmlSerialize *n = makeNode(XmlSerialize);
n->xmloption = $3;
n->expr = $4;
n->typeName = $6;
+ n->indent = $7;
n->location = @1;
$$ = (Node *) n;
}
@@ -15617,6 +15619,11 @@ xmlexists_argument:
}
;
+opt_xml_indent: INDENT { $$ = true; }
+ | NO INDENT { $$ = false; }
+ | /*EMPTY*/ { $$ = false; }
+ ;
+
xml_passing_mech:
BY REF_P
| BY VALUE_P
@@ -16828,6 +16835,7 @@ unreserved_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 7ff41acb84..1f465d126a 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2332,6 +2332,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
xexpr->xmloption = xs->xmloption;
xexpr->location = xs->location;
+ xexpr->indent = xs->indent;
/* We actually only need these to be able to parse back the expression. */
xexpr->type = targetType;
xexpr->typmod = targetTypmod;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..47cdc7c339 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4818,3 +4818,44 @@ XmlTableDestroyOpaque(TableFuncScanState *state)
NO_XML_SUPPORT();
#endif /* not USE_LIBXML */
}
+
+xmltype *
+xmlformat(text *data)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(data, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /*
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting )
+ */
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ return stringinfo_to_xmltype(&buf);
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif
+}
\ No newline at end of file
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..831206dbc0 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -842,6 +842,7 @@ typedef struct XmlSerialize
Node *expr;
TypeName *typeName;
int location; /* token location, or -1 if unknown */
+ bool indent; /* should the xml output be indented? */
} XmlSerialize;
/* Partitioning related definitions */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1be1642d92..17504a7d3d 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1491,6 +1491,7 @@ typedef struct XmlExpr
int32 typmod pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
int location;
+ bool indent;
} XmlExpr;
/* ----------------
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..aeda7cc9f1 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..551d9f6b05 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -90,4 +90,5 @@ extern PGDLLIMPORT int xmloption; /* XmlOptionType, but int for guc enum */
extern PGDLLIMPORT const TableFuncRoutine XmlTableRoutine;
+extern xmltype *xmlformat(text *data);
#endif /* XML_H */
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..3f5aea920f 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,99 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..d2fb208d3e 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,69 @@ ERROR: unsupported XML feature
LINE 1: SELECT xmlserialize(document 'bad' as text);
^
DETAIL: This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
ERROR: unsupported XML feature
LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..689f1bc831 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,99 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..7841ad95cd 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,29 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
--
2.25.1
Here are some review comments for patch v16-0001.
======
src/backend/executor/execExprInterp.c
2. ExecEvalXmlExpr
@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
-
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations
Whitespace added.
~
Oh, I meant something different to that fix. I meant there is a
missing blank line after the last ('data') variable declaration.
======
Test code.
I wondered if there ought to be a test that demonstrates explicitly
saying NO INDENT will give the identical result to just omitting it.
For example:
test=# -- no indent is default
test=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
?column?
----------
t
(1 row)
test=# SELECT xmlserialize(CONTENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
?column?
----------
t
(1 row)
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On 22.02.23 23:45, Peter Smith wrote:
src/backend/executor/execExprInterp.c
2. ExecEvalXmlExpr
@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
-
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations
Whitespace added.
~
Oh, I meant something different to that fix. I meant there is a
missing blank line after the last ('data') variable declaration.
I believe I see it now (it took me a while) :)
======
Test code.
I wondered if there ought to be a test that demonstrates explicitly
saying NO INDENT will give the identical result to just omitting it.
For example:
test=# -- no indent is default
test=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
?column?
----------
t
(1 row)
test=# SELECT xmlserialize(CONTENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
?column?
----------
t
(1 row)
Actually, NO INDENT just skips this feature and doesn't call xmlformat
at all, so in this particular case the results will always be identical.
But yes, I totally agree that a test case for that is also important.
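The point about NO INDENT can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the patch code: IndentClause, clause_to_flag, fake_format and serialize are hypothetical names standing in for the three opt_xml_indent alternatives in gram.y, for xmlformat, and for the branch added to ExecEvalXmlExpr. It shows that an omitted clause and an explicit NO INDENT both map to the same false flag, so the formatter is never reached on either path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three alternatives of opt_xml_indent. */
typedef enum IndentClause
{
	CLAUSE_OMITTED,		/* XMLSERIALIZE(... AS text) */
	CLAUSE_NO_INDENT,	/* XMLSERIALIZE(... AS text NO INDENT) */
	CLAUSE_INDENT		/* XMLSERIALIZE(... AS text INDENT) */
} IndentClause;

/* Counts calls to the stand-in formatter so the skipped path is observable. */
static int	format_calls = 0;

/* Mirrors the grammar actions: only a bare INDENT yields true. */
static bool
clause_to_flag(IndentClause clause)
{
	return clause == CLAUSE_INDENT;
}

/*
 * Stand-in for xmlformat(). It does no real formatting; it only records
 * that it was called and hands the input back.
 */
static const char *
fake_format(const char *xml)
{
	format_calls++;
	return xml;
}

/* Mirrors the if (indent) / else branch in the executor. */
static const char *
serialize(const char *xml, IndentClause clause)
{
	assert(xml != NULL);

	if (clause_to_flag(clause))
		return fake_format(xml);

	return xml;			/* untouched: the formatter was never called */
}
```

Calling serialize with CLAUSE_OMITTED and with CLAUSE_NO_INDENT returns the very same pointer and leaves format_calls at zero, which is why the equality test suggested above can only ever return true. It is still worth having as a guard against a future change to the default.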
v17 attached.
Thanks!
Best, Jim
Attachments:
v17-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 98524ed5e39188c2ad177c3f22159d3aff301899 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v17] Add pretty-printed XML output option
This patch implements the SQL/XML:2011 feature 'X069, XMLSERIALIZE: INDENT'.
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with XML strings of type DOCUMENT, this implementation also accepts
CONTENT strings as long as they contain a well-formed XML document; note the
XMLOPTION_DOCUMENT in the xml_parse call.
This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
doc/src/sgml/datatype.sgml | 8 +-
src/backend/catalog/sql_features.txt | 2 +-
src/backend/executor/execExprInterp.c | 11 ++-
src/backend/parser/gram.y | 12 ++-
src/backend/parser/parse_expr.c | 1 +
src/backend/utils/adt/xml.c | 41 ++++++++++
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/primnodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/include/utils/xml.h | 1 +
src/test/regress/expected/xml.out | 106 ++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 74 ++++++++++++++++++
src/test/regress/expected/xml_2.out | 106 ++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 27 ++++++-
14 files changed, 384 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
</synopsis>
<replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias for one of those). Again, according
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
- you to simply cast the value.
+ you to simply cast the value. The option <type>INDENT</type> allows to
+ indent the serialized xml output - the default is <type>NO INDENT</type>.
+ It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+ be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+ contains a well-formed XML.
</para>
<para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: binary string input and CONTENT option NO
X066 XMLParse: binary string input and DOCUMENT option NO
X068 XMLSerialize: BOM NO
-X069 XMLSerialize: INDENT NO
+X069 XMLSerialize: INDENT YES
X070 XMLSerialize: character string serialization and CONTENT option YES
X071 XMLSerialize: character string serialization and DOCUMENT option YES
X072 XMLSerialize: character string serialization YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..7ba3131d92 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3839,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
*op->resnull = false;
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
}
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..2814f16082 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> xmltable_column_option_el
%type <list> xml_namespace_list
%type <target> xml_namespace_el
+%type <boolean> opt_xml_indent
%type <node> func_application func_expr_common_subexpr
%type <node> func_expr func_expr_windowless
@@ -702,7 +703,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -15532,13 +15533,14 @@ func_expr_common_subexpr:
$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
list_make3($3, $5, $6), @1);
}
- | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+ | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename opt_xml_indent ')'
{
XmlSerialize *n = makeNode(XmlSerialize);
n->xmloption = $3;
n->expr = $4;
n->typeName = $6;
+ n->indent = $7;
n->location = @1;
$$ = (Node *) n;
}
@@ -15617,6 +15619,11 @@ xmlexists_argument:
}
;
+opt_xml_indent: INDENT { $$ = true; }
+ | NO INDENT { $$ = false; }
+ | /*EMPTY*/ { $$ = false; }
+ ;
+
xml_passing_mech:
BY REF_P
| BY VALUE_P
@@ -16828,6 +16835,7 @@ unreserved_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 7ff41acb84..1f465d126a 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2332,6 +2332,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
xexpr->xmloption = xs->xmloption;
xexpr->location = xs->location;
+ xexpr->indent = xs->indent;
/* We actually only need these to be able to parse back the expression. */
xexpr->type = targetType;
xexpr->typmod = targetTypmod;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..47cdc7c339 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4818,3 +4818,44 @@ XmlTableDestroyOpaque(TableFuncScanState *state)
NO_XML_SUPPORT();
#endif /* not USE_LIBXML */
}
+
+xmltype *
+xmlformat(text *data)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(data, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /*
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting )
+ */
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ return stringinfo_to_xmltype(&buf);
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif
+}
\ No newline at end of file
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..831206dbc0 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -842,6 +842,7 @@ typedef struct XmlSerialize
Node *expr;
TypeName *typeName;
int location; /* token location, or -1 if unknown */
+ bool indent; /* should the xml output be indented? */
} XmlSerialize;
/* Partitioning related definitions */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1be1642d92..17504a7d3d 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1491,6 +1491,7 @@ typedef struct XmlExpr
int32 typmod pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
int location;
+ bool indent;
} XmlExpr;
/* ----------------
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..aeda7cc9f1 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..551d9f6b05 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -90,4 +90,5 @@ extern PGDLLIMPORT int xmloption; /* XmlOptionType, but int for guc enum */
extern PGDLLIMPORT const TableFuncRoutine XmlTableRoutine;
+extern xmltype *xmlformat(text *data);
#endif /* XML_H */
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..e5372c9e64 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,112 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..fa645f5963 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,80 @@ ERROR: unsupported XML feature
LINE 1: SELECT xmlserialize(document 'bad' as text);
^
DETAIL: This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
ERROR: unsupported XML feature
LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..ae7a04ebcc 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,112 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..4640691b90 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,7 +132,32 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
-
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
SELECT xml '<abc/>' IS NOT DOCUMENT;
--
2.25.1
Here are my review comments for patch v17-0001.
======
src/test/regress/sql/xml.sql
The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...
e.g.
-- indent different encoding (returns UTF-8)
SELECT xmlserialize(DOCUMENT '<?xml version="1.0"
encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS
text INDENT);
SELECT xmlserialize(CONTENT '<?xml version="1.0"
encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS
text INDENT);
-- 'no indent' = not using 'no indent'
SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xmlserialize(CONTENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
SELECT xml '<abc/>' IS NOT DOCUMENT;
SELECT xml 'abc' IS NOT DOCUMENT;
SELECT '<>' IS NOT DOCUMENT;
~~
Apart from that, patch v17 LGTM.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On 23.02.23 02:52, Peter Smith wrote:
> Here are my review comments for patch v17-0001.
> ======
> src/test/regress/sql/xml.sql
> The blank line(s) which previously separated the xmlserialize tests
> from the xml IS [NOT] DOCUMENT tests are now missing...
v18 adds a blank line to xml.sql to separate the xmlserialize
test cases from the tests that follow.
Thanks!
Best, Jim
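
For readers following along, here is a minimal usage sketch of the syntax this patch version adds. It is only an illustration, not part of the patch: the table xml_demo and its columns are made up, and it assumes a server built with libxml support that has the patch applied.

```sql
-- Hypothetical demo table; not part of the patch.
CREATE TEMP TABLE xml_demo (id int, doc xml);
INSERT INTO xml_demo VALUES
    (1, '<foo><bar><val x="y">42</val></bar></foo>');

-- Default behaviour and explicit NO INDENT: per the regression tests,
-- these two forms return the same text.
SELECT id, xmlserialize(DOCUMENT doc AS text) FROM xml_demo;
SELECT id, xmlserialize(DOCUMENT doc AS text NO INDENT) FROM xml_demo;

-- INDENT: nested elements are placed on separate, indented lines.
SELECT id, xmlserialize(DOCUMENT doc AS text INDENT) FROM xml_demo;

-- CONTENT is accepted with INDENT as well, provided the value is a
-- single well-formed document.
SELECT id, xmlserialize(CONTENT doc AS text INDENT) FROM xml_demo;
```

With this patch version, a CONTENT value that has more than one top-level element is rejected when INDENT is given ("invalid XML document" in the regression tests), because the formatting step parses the value as a document.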
Attachments:
v18-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From a37e8cea68e9e6032e29b555b986c28d12f4a16b Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v18] Add pretty-printed XML output option
This patch implements the SQL/XML:2011 feature 'X069, XMLSERIALIZE: INDENT'.
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with XML strings of type DOCUMENT, this implementation also allows
CONTENT type strings as long as they contain well-formed XML;
note the XMLOPTION_DOCUMENT in the xml_parse call.
This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
doc/src/sgml/datatype.sgml | 8 +-
src/backend/catalog/sql_features.txt | 2 +-
src/backend/executor/execExprInterp.c | 11 ++-
src/backend/parser/gram.y | 12 ++-
src/backend/parser/parse_expr.c | 1 +
src/backend/utils/adt/xml.c | 41 ++++++++++
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/primnodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/include/utils/xml.h | 1 +
src/test/regress/expected/xml.out | 106 ++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 74 ++++++++++++++++++
src/test/regress/expected/xml_2.out | 106 ++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 26 +++++++
14 files changed, 384 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
</synopsis>
<replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias for one of those). Again, according
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
- you to simply cast the value.
+ you to simply cast the value. The option <type>INDENT</type> allows to
+ indent the serialized xml output - the default is <type>NO INDENT</type>.
+ It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+ be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+ contains a well-formed XML.
</para>
<para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: binary string input and CONTENT option NO
X066 XMLParse: binary string input and DOCUMENT option NO
X068 XMLSerialize: BOM NO
-X069 XMLSerialize: INDENT NO
+X069 XMLSerialize: INDENT YES
X070 XMLSerialize: character string serialization and CONTENT option YES
X071 XMLSerialize: character string serialization and DOCUMENT option YES
X072 XMLSerialize: character string serialization YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..7ba3131d92 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3839,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
*op->resnull = false;
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
}
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..2814f16082 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> xmltable_column_option_el
%type <list> xml_namespace_list
%type <target> xml_namespace_el
+%type <boolean> opt_xml_indent
%type <node> func_application func_expr_common_subexpr
%type <node> func_expr func_expr_windowless
@@ -702,7 +703,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -15532,13 +15533,14 @@ func_expr_common_subexpr:
$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
list_make3($3, $5, $6), @1);
}
- | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+ | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename opt_xml_indent ')'
{
XmlSerialize *n = makeNode(XmlSerialize);
n->xmloption = $3;
n->expr = $4;
n->typeName = $6;
+ n->indent = $7;
n->location = @1;
$$ = (Node *) n;
}
@@ -15617,6 +15619,11 @@ xmlexists_argument:
}
;
+opt_xml_indent: INDENT { $$ = true; }
+ | NO INDENT { $$ = false; }
+ | /*EMPTY*/ { $$ = false; }
+ ;
+
xml_passing_mech:
BY REF_P
| BY VALUE_P
@@ -16828,6 +16835,7 @@ unreserved_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 7ff41acb84..1f465d126a 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2332,6 +2332,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
xexpr->xmloption = xs->xmloption;
xexpr->location = xs->location;
+ xexpr->indent = xs->indent;
/* We actually only need these to be able to parse back the expression. */
xexpr->type = targetType;
xexpr->typmod = targetTypmod;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..47cdc7c339 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4818,3 +4818,44 @@ XmlTableDestroyOpaque(TableFuncScanState *state)
NO_XML_SUPPORT();
#endif /* not USE_LIBXML */
}
+
+xmltype *
+xmlformat(text *data)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(data, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /*
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting )
+ */
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ return stringinfo_to_xmltype(&buf);
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif
+}
\ No newline at end of file
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..831206dbc0 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -842,6 +842,7 @@ typedef struct XmlSerialize
Node *expr;
TypeName *typeName;
int location; /* token location, or -1 if unknown */
+ bool indent; /* should the xml output be indented? */
} XmlSerialize;
/* Partitioning related definitions */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1be1642d92..17504a7d3d 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1491,6 +1491,7 @@ typedef struct XmlExpr
int32 typmod pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
int location;
+ bool indent;
} XmlExpr;
/* ----------------
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..aeda7cc9f1 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..551d9f6b05 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -90,4 +90,5 @@ extern PGDLLIMPORT int xmloption; /* XmlOptionType, but int for guc enum */
extern PGDLLIMPORT const TableFuncRoutine XmlTableRoutine;
+extern xmltype *xmlformat(text *data);
#endif /* XML_H */
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..e5372c9e64 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,112 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..fa645f5963 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,80 @@ ERROR: unsupported XML feature
LINE 1: SELECT xmlserialize(document 'bad' as text);
^
DETAIL: This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
ERROR: unsupported XML feature
LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..ae7a04ebcc 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,112 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..dcbbd2b23c 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,32 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
--
2.25.1
On 23.02.23 07:36, Jim Jones wrote:
On 23.02.23 02:52, Peter Smith wrote:
Here are my review comments for patch v17-0001.
======
src/test/regress/sql/xml.sql
The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...
v18 adds a new line in the xml.sql file to separate the xmlserialize
test cases from the rest.
In kwlist.h you have
PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)
but you can actually make it BARE_LABEL, which is preferable.
More importantly, you need to add the new keyword to the
bare_label_keyword production in gram.y. I thought we had some tooling
in the build system to catch this kind of omission, but it's apparently
not working right now.
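To illustrate the difference, here is a minimal sketch, assuming a server built with this patch so that INDENT is a keyword. A BARE_LABEL keyword can still be used as a column alias without AS, whereas an AS_LABEL keyword requires AS. Before the patch, indent was an ordinary identifier, so the first query already worked; classifying the new keyword as AS_LABEL would turn it into a syntax error.

```sql
-- Accepted when INDENT is BARE_LABEL (and when it is not a keyword at all);
-- a syntax error if INDENT were classified as AS_LABEL.
SELECT 42 indent;

-- Accepted under either classification.
SELECT 42 AS indent;
```

Keeping the keyword BARE_LABEL therefore avoids breaking existing queries that happen to use indent as an unadorned column alias.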
Elsewhere, let's rename the xmlformat() C function to xmlserialize() (or
maybe something like xmlserialize_indent()), so the association is clearer.
On 23.02.23 08:51, Peter Eisentraut wrote:
In kwlist.h you have
PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)
but you can actually make it BARE_LABEL, which is preferable.
More importantly, you need to add the new keyword to the
bare_label_keyword production in gram.y. I thought we had some
tooling in the build system to catch this kind of omission, but it's
apparently not working right now.
Entry in kwlist.h changed to BARE_LABEL.
Elsewhere, let's rename the xmlformat() C function to xmlserialize()
(or maybe something like xmlserialize_indent()), so the association is
clearer.
xmlserialize_indent sounds much better and makes the association indeed
clearer. Changed in v19.
v19 attached.
Thanks for the review!
Best, Jim
Attachments:
v19-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From ed1e4a9fc94a6b65a9be6b125ae5fa8af1aa9d68 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v19] Add pretty-printed XML output option
This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.
This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
doc/src/sgml/datatype.sgml | 8 +-
src/backend/catalog/sql_features.txt | 2 +-
src/backend/executor/execExprInterp.c | 11 ++-
src/backend/parser/gram.y | 13 +++-
src/backend/parser/parse_expr.c | 1 +
src/backend/utils/adt/xml.c | 41 ++++++++++
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/primnodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/include/utils/xml.h | 1 +
src/test/regress/expected/xml.out | 106 ++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 74 ++++++++++++++++++
src/test/regress/expected/xml_2.out | 106 ++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 26 +++++++
14 files changed, 385 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
</synopsis>
<replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias for one of those). Again, according
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
- you to simply cast the value.
+ you to simply cast the value. The option <type>INDENT</type> allows to
+ indent the serialized xml output - the default is <type>NO INDENT</type>.
+ It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+ be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+ contains a well-formed XML.
</para>
<para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: binary string input and CONTENT option NO
X066 XMLParse: binary string input and DOCUMENT option NO
X068 XMLSerialize: BOM NO
-X069 XMLSerialize: INDENT NO
+X069 XMLSerialize: INDENT YES
X070 XMLSerialize: character string serialization and CONTENT option YES
X071 XMLSerialize: character string serialization and DOCUMENT option YES
X072 XMLSerialize: character string serialization YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..0339862267 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3839,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
*op->resnull = false;
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlserialize_indent(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
}
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..014c547c5d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> xmltable_column_option_el
%type <list> xml_namespace_list
%type <target> xml_namespace_el
+%type <boolean> opt_xml_indent
%type <node> func_application func_expr_common_subexpr
%type <node> func_expr func_expr_windowless
@@ -702,7 +703,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -15532,13 +15533,14 @@ func_expr_common_subexpr:
$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
list_make3($3, $5, $6), @1);
}
- | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+ | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename opt_xml_indent ')'
{
XmlSerialize *n = makeNode(XmlSerialize);
n->xmloption = $3;
n->expr = $4;
n->typeName = $6;
+ n->indent = $7;
n->location = @1;
$$ = (Node *) n;
}
@@ -15617,6 +15619,11 @@ xmlexists_argument:
}
;
+opt_xml_indent: INDENT { $$ = true; }
+ | NO INDENT { $$ = false; }
+ | /*EMPTY*/ { $$ = false; }
+ ;
+
xml_passing_mech:
BY REF_P
| BY VALUE_P
@@ -16828,6 +16835,7 @@ unreserved_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
@@ -17384,6 +17392,7 @@ bare_label_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 7ff41acb84..1f465d126a 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2332,6 +2332,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
xexpr->xmloption = xs->xmloption;
xexpr->location = xs->location;
+ xexpr->indent = xs->indent;
/* We actually only need these to be able to parse back the expression. */
xexpr->type = targetType;
xexpr->typmod = targetTypmod;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..cad0586bba 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4818,3 +4818,44 @@ XmlTableDestroyOpaque(TableFuncScanState *state)
NO_XML_SUPPORT();
#endif /* not USE_LIBXML */
}
+
+xmltype *
+xmlserialize_indent(text *data)
+{
+#ifdef USE_LIBXML
+
+ xmlDocPtr doc;
+ xmlChar *xmlbuf = NULL;
+ StringInfoData buf;
+ int nbytes;
+
+ doc = xml_parse(data, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+ if(!doc)
+ elog(ERROR, "could not parse the given XML document");
+
+ /*
+ * xmlDocDumpFormatMemory (
+ * xmlDocPtr doc, # the XML document
+ * xmlChar **xmlbuf, # the memory pointer
+ * int *nbytes, # the memory length
+ * int format # 1 = node indenting )
+ */
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if(!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ initStringInfo(&buf);
+ appendStringInfoString(&buf, (const char *)xmlbuf);
+
+ xmlFree(xmlbuf);
+
+ return stringinfo_to_xmltype(&buf);
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif
+}
\ No newline at end of file
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..831206dbc0 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -842,6 +842,7 @@ typedef struct XmlSerialize
Node *expr;
TypeName *typeName;
int location; /* token location, or -1 if unknown */
+ bool indent; /* should the xml output be indented? */
} XmlSerialize;
/* Partitioning related definitions */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1be1642d92..17504a7d3d 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1491,6 +1491,7 @@ typedef struct XmlExpr
int32 typmod pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
int location;
+ bool indent;
} XmlExpr;
/* ----------------
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..753e9ee174 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..d72cd92f63 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -90,4 +90,5 @@ extern PGDLLIMPORT int xmloption; /* XmlOptionType, but int for guc enum */
extern PGDLLIMPORT const TableFuncRoutine XmlTableRoutine;
+extern xmltype *xmlserialize_indent(text *data);
#endif /* XML_H */
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..e5372c9e64 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,112 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..fa645f5963 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,80 @@ ERROR: unsupported XML feature
LINE 1: SELECT xmlserialize(document 'bad' as text);
^
DETAIL: This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
ERROR: unsupported XML feature
LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..ae7a04ebcc 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,112 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..dcbbd2b23c 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,32 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
--
2.25.1
The patch v19 LGTM.
- v19 applies cleanly for me
- Full clean build OK
- HTML docs build and render OK
- The 'make check' tests all pass for me
- Also cfbot reports latest patch has no errors -- http://cfbot.cputube.org/
So, I marked it as "Ready for Committer" in the CF --
https://commitfest.postgresql.org/42/4162/
------
Kind Regards,
Peter Smith.
Fujitsu Australia
While reviewing this patch, I started to wonder why we don't eliminate
the maintenance hassle of xml_1.out by putting in a short-circuit
at the top of the test, similar to those in some other scripts:
/* skip test if XML support not compiled in */
SELECT '<value>one</value>'::xml;
\if :ERROR
\quit
\endif
(and I guess xmlmap.sql could get the same treatment).
The only argument I can think of against it is that the current
approach ensures we produce a clean error (and not, say, a crash)
for all xml.c entry points not just xml_in. I'm not sure how much
that's worth though. The compiler/linker would tell us if we miss
compiling out every reference to libxml2.
Thoughts?
regards, tom lane
On 09.03.23 18:38, Tom Lane wrote:
While reviewing this patch, I started to wonder why we don't eliminate
the maintenance hassle of xml_1.out by putting in a short-circuit
at the top of the test, similar to those in some other scripts:
/* skip test if XML support not compiled in */
SELECT '<value>one</value>'::xml;
\if :ERROR
\quit
\endif
(and I guess xmlmap.sql could get the same treatment).
The only argument I can think of against it is that the current
approach ensures we produce a clean error (and not, say, a crash)
for all xml.c entry points not just xml_in. I'm not sure how much
that's worth though. The compiler/linker would tell us if we miss
compiling out every reference to libxml2.
Thoughts?
regards, tom lane
Hi Tom,
I agree it would make things easier and it could indeed save some time
(and some CI runs ;)).
However, checking in the absence of libxml2 if an error message is
raised, and checking if this error message is the one we expect, is IMHO
also a very nice test. But I guess I could also live with skipping the
whole thing.
Best, Jim
Peter Smith <smithpb2250@gmail.com> writes:
The patch v19 LGTM.
I've looked through this now, and have some minor complaints and a major
one. The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT. For example,
regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document;
?column?
----------
f
(1 row)
regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text);
xmlserialize
-------------------------------------------
<bar><val x="y">42</val></bar><foo></foo>
(1 row)
regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent);
ERROR: invalid XML document
DETAIL: line 1: Extra content at the end of the document
<bar><val x="y">42</val></bar><foo></foo>
^
This is not what the documentation promises, and I don't think it's
good enough --- the SQL spec has no restriction saying you can't
use INDENT with CONTENT. I tried adjusting things so that we call
xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
but all that got me was empty output (except for a document header).
It seems like xmlDocDumpFormatMemory is not the thing to use, at least
not in the CONTENT case. But libxml2 has a few other "dump"
functions, so maybe we can use a different one? I see we are using
xmlNodeDump elsewhere, and that has a format option, so maybe there's
a way forward there.
A lesser issue is that INDENT tacks on a document header (XML declaration)
whether there was one or not. I'm not sure whether that's an appropriate
thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
case. We have code that can strip off the header again, but we
need to figure out exactly when to apply it.
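To make the header question concrete, here is a minimal sketch in plain C of what stripping a leading XML declaration from the indented output could look like. The helper name skip_xml_declaration is hypothetical and this is not the code xml.c uses for the purpose; it assumes the declaration, if present, starts at the first byte and that "?>" does not occur inside its attribute values.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return a pointer to the first
 * character after a leading XML declaration, or s itself if there is none.
 */
const char *
skip_xml_declaration(const char *s)
{
    const char *end;

    assert(s != NULL);

    /* A declaration is "<?xml" followed by whitespace, not e.g. "<?xml-stylesheet". */
    if (strncmp(s, "<?xml", 5) != 0 || !isspace((unsigned char) s[5]))
        return s;

    end = strstr(s + 5, "?>");
    if (end == NULL)
        return s;               /* unterminated: leave the input untouched */
    end += 2;

    /* Also skip the line break that follows the declaration, if any. */
    while (*end == '\r' || *end == '\n')
        end++;

    return end;
}
```

Applied to an indented result that begins with an XML declaration line, the helper would return the text starting at the root element. Deciding in which of the DOCUMENT and CONTENT cases to apply it is the part that remains open.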
I also suspect that it's outright broken to attach a header claiming
the data is now in UTF8 encoding. If the database encoding isn't
UTF8, then either that's a lie or we now have an encoding violation.
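As an illustration of the encoding concern, the following plain C sketch checks whether a byte string is structurally valid UTF-8. The function name looks_like_utf8 is hypothetical, and the check is deliberately simplified: it only verifies lead and continuation byte patterns and does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: structural UTF-8 check (lead byte followed by the
 * right number of continuation bytes). Not a full validator.
 */
bool
looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t      i = 0;

    assert(s != NULL || len == 0);

    while (i < len)
    {
        size_t      extra;

        if (s[i] < 0x80)
            extra = 0;          /* ASCII */
        else if ((s[i] & 0xE0) == 0xC0)
            extra = 1;          /* 2-byte sequence */
        else if ((s[i] & 0xF0) == 0xE0)
            extra = 2;          /* 3-byte sequence */
        else if ((s[i] & 0xF8) == 0xF0)
            extra = 3;          /* 4-byte sequence */
        else
            return false;       /* stray continuation or invalid lead byte */

        if (extra > len - i - 1)
            return false;       /* sequence runs past the end */

        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;   /* expected a continuation byte */

        i += extra + 1;
    }
    return true;
}
```

In LATIN1 the character "é" is the single byte 0xE9, which this check rejects because no continuation bytes follow it. A header claiming UTF-8 on top of such bytes would therefore not describe the data that is actually stored.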
Another thing that's mildly irking me is that the current
factorization of this code will result in xml_parse'ing the data
twice, if you have both DOCUMENT and INDENT specified. We could
consider avoiding that if we merged the indentation functionality
into xmltotext_with_xmloption, but it's probably premature to do so
when we haven't figured out how to get the output right --- we might
end up needing two xml_parse calls anyway with different parameters,
perhaps.
I also had a bunch of cosmetic complaints (mostly around this having
a bad case of add-at-the-end-itis), which I've cleaned up in the
attached v20. This doesn't address any of the above, however.
regards, tom lane
Attachments:
v20-0001-Add-pretty-printed-XML-output-option.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
</synopsis>
<replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias for one of those). Again, according
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
- you to simply cast the value.
+ you to simply cast the value. The option <type>INDENT</type> allows to
+ indent the serialized xml output - the default is <type>NO INDENT</type>.
+ It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+ be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+ contains a well-formed XML.
</para>
<para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 0fb9ab7533..bb4c135a7f 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -621,7 +621,7 @@ X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: binary string input and CONTENT option NO
X066 XMLParse: binary string input and DOCUMENT option NO
X068 XMLSerialize: BOM NO
-X069 XMLSerialize: INDENT NO
+X069 XMLSerialize: INDENT YES
X070 XMLSerialize: character string serialization and CONTENT option YES
X071 XMLSerialize: character string serialization and DOCUMENT option YES
X072 XMLSerialize: character string serialization YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..3dcd15d5f0 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
+ text *result;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,12 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
+ result = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if (xexpr->indent)
+ result = xmlserialize_indent(result);
+
+ *op->resvalue = PointerGetDatum(result);
*op->resnull = false;
}
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..efe88ccf9d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -613,7 +613,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <node> xml_root_version opt_xml_root_standalone
%type <node> xmlexists_argument
%type <ival> document_or_content
-%type <boolean> xml_whitespace_option
+%type <boolean> xml_indent_option xml_whitespace_option
%type <list> xmltable_column_list xmltable_column_option_list
%type <node> xmltable_column_el
%type <defelt> xmltable_column_option_el
@@ -702,7 +702,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -15532,13 +15532,14 @@ func_expr_common_subexpr:
$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
list_make3($3, $5, $6), @1);
}
- | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+ | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename xml_indent_option ')'
{
XmlSerialize *n = makeNode(XmlSerialize);
n->xmloption = $3;
n->expr = $4;
n->typeName = $6;
+ n->indent = $7;
n->location = @1;
$$ = (Node *) n;
}
@@ -15592,6 +15593,11 @@ document_or_content: DOCUMENT_P { $$ = XMLOPTION_DOCUMENT; }
| CONTENT_P { $$ = XMLOPTION_CONTENT; }
;
+xml_indent_option: INDENT { $$ = true; }
+ | NO INDENT { $$ = false; }
+ | /*EMPTY*/ { $$ = false; }
+ ;
+
xml_whitespace_option: PRESERVE WHITESPACE_P { $$ = true; }
| STRIP_P WHITESPACE_P { $$ = false; }
| /*EMPTY*/ { $$ = false; }
@@ -16828,6 +16834,7 @@ unreserved_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
@@ -17384,6 +17391,7 @@ bare_label_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 78221d2e0f..2331417552 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2331,6 +2331,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
typenameTypeIdAndMod(pstate, xs->typeName, &targetType, &targetTypmod);
xexpr->xmloption = xs->xmloption;
+ xexpr->indent = xs->indent;
xexpr->location = xs->location;
/* We actually only need these to be able to parse back the expression. */
xexpr->type = targetType;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..4d2549ed03 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -631,6 +631,39 @@ xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg)
}
+text *
+xmlserialize_indent(text *data)
+{
+#ifdef USE_LIBXML
+ text *result;
+ xmlDocPtr doc;
+ xmlChar *xmlbuf;
+ int nbytes;
+
+ doc = xml_parse(data, XMLOPTION_DOCUMENT, false,
+ GetDatabaseEncoding(), NULL);
+ Assert(doc);
+
+ /* Reformat with indenting requested */
+ xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+ xmlFreeDoc(doc);
+
+ if (!nbytes)
+ elog(ERROR, "could not indent the given XML document");
+
+ result = cstring_to_text_with_len((const char *) xmlbuf, nbytes);
+
+ xmlFree(xmlbuf);
+
+ return result;
+#else
+ NO_XML_SUPPORT();
+ return NULL;
+#endif
+}
+
+
xmltype *
xmlelement(XmlExpr *xexpr,
Datum *named_argvalue, bool *named_argnull,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 371aa0ffc5..028588fb33 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -840,6 +840,7 @@ typedef struct XmlSerialize
XmlOptionType xmloption; /* DOCUMENT or CONTENT */
Node *expr;
TypeName *typeName;
+ bool indent; /* [NO] INDENT */
int location; /* token location, or -1 if unknown */
} XmlSerialize;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 4220c63ab7..8fb5b4b919 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1464,7 +1464,7 @@ typedef enum XmlExprOp
IS_XMLPARSE, /* XMLPARSE(text, is_doc, preserve_ws) */
IS_XMLPI, /* XMLPI(name [, args]) */
IS_XMLROOT, /* XMLROOT(xml, version, standalone) */
- IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval) */
+ IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval, indent) */
IS_DOCUMENT /* xmlval IS DOCUMENT */
} XmlExprOp;
@@ -1489,6 +1489,8 @@ typedef struct XmlExpr
List *args;
/* DOCUMENT or CONTENT */
XmlOptionType xmloption pg_node_attr(query_jumble_ignore);
+ /* INDENT option for XMLSERIALIZE */
+ bool indent;
/* target type/typmod for XMLSERIALIZE */
Oid type pg_node_attr(query_jumble_ignore);
int32 typmod pg_node_attr(query_jumble_ignore);
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..753e9ee174 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..a1dfe4c631 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -78,6 +78,7 @@ extern xmltype *xmlpi(const char *target, text *arg, bool arg_is_null, bool *res
extern xmltype *xmlroot(xmltype *data, text *version, int standalone);
extern bool xml_is_document(xmltype *arg);
extern text *xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg);
+extern text *xmlserialize_indent(text *data);
extern char *escape_xml(const char *str);
extern char *map_sql_identifier_to_xml_name(const char *ident, bool fully_escaped, bool escape_period);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index ad852dc2f7..ddbf0ca16b 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,112 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 70fe34a04f..2944f84103 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,80 @@ ERROR: unsupported XML feature
LINE 1: SELECT xmlserialize(document 'bad' as text);
^
DETAIL: This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
ERROR: unsupported XML feature
LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 4f029d0072..60dcb3d36a 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,112 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 24e40d2653..fea875adfd 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,32 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
Thanks a lot for the review!
On 09.03.23 21:21, Tom Lane wrote:
I've looked through this now, and have some minor complaints and a major
one. The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT. For example,
regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document;
?column?
----------
f
(1 row)
regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text);
xmlserialize
-------------------------------------------
<bar><val x="y">42</val></bar><foo></foo>
(1 row)
regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent);
ERROR: invalid XML document
DETAIL: line 1: Extra content at the end of the document
<bar><val x="y">42</val></bar><foo></foo>
^
I assumed it should fail because the XML string is not singly rooted.
Oracle has this feature implemented and it does not seem to allow
non-singly-rooted strings either [1]. Also, some of the tools I use
fail in this case [2,3].
What do you suggest the output should look like? Does the SQL spec also
define it? I can't find it online :(
This is not what the documentation promises, and I don't think it's
good enough --- the SQL spec has no restriction saying you can't
use INDENT with CONTENT. I tried adjusting things so that we call
xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
but all that got me was empty output (except for a document header).
It seems like xmlDocDumpFormatMemory is not the thing to use, at least
not in the CONTENT case. But libxml2 has a few other "dump"
functions, so maybe we can use a different one? I see we are using
xmlNodeDump elsewhere, and that has a format option, so maybe there's
a way forward there.
A lesser issue is that INDENT tacks on a document header (XML declaration)
whether there was one or not. I'm not sure whether that's an appropriate
thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
case. We have code that can strip off the header again, but we
need to figure out exactly when to apply it.
I replaced xmlDocDumpFormatMemory with xmlSaveToBuffer and used the
option XML_SAVE_NO_DECL for input docs without an XML declaration. It no
longer returns an XML declaration if the input doc does not contain one.
I also suspect that it's outright broken to attach a header claiming
the data is now in UTF8 encoding. If the database encoding isn't
UTF8, then either that's a lie or we now have an encoding violation.
I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
uses the encoding declared in the given doc, or UTF8 if none is provided.
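A rough sketch of that fallback rule, in plain C and independent of the patch: take the encoding named in a leading XML declaration, otherwise assume UTF-8. The function name declared_encoding_or_utf8 and the hand-rolled scanning are assumptions made for illustration only. It also simplifies by requiring a single space after "<?xml" and a quoted value directly after "encoding=".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the encoding named in a leading XML declaration
 * into buf, or "UTF-8" if there is no declaration or no encoding attribute.
 * Returns buf.
 */
char *
declared_encoding_or_utf8(const char *xml, char *buf, size_t buflen)
{
    const char *decl_end;
    const char *attr;
    const char *value_end;
    size_t      len;
    char        quote;

    assert(xml != NULL && buf != NULL && buflen > 0);

    /* Default used when the document does not name an encoding. */
    strncpy(buf, "UTF-8", buflen - 1);
    buf[buflen - 1] = '\0';

    if (strncmp(xml, "<?xml ", 6) != 0)
        return buf;             /* no declaration at all */
    decl_end = strstr(xml, "?>");
    if (decl_end == NULL)
        return buf;             /* unterminated declaration */

    /* Only accept an encoding attribute inside the declaration itself. */
    attr = strstr(xml, "encoding=");
    if (attr == NULL || attr > decl_end)
        return buf;

    attr += strlen("encoding=");
    quote = *attr;
    if (quote != '"' && quote != '\'')
        return buf;
    attr++;

    value_end = strchr(attr, quote);
    if (value_end == NULL || value_end > decl_end)
        return buf;

    len = (size_t) (value_end - attr);
    if (len == 0 || len >= buflen)
        return buf;             /* empty or too long: keep the default */

    memcpy(buf, attr, len);
    buf[len] = '\0';
    return buf;
}
```

With the ISO-8859-1 document from the regression tests this yields "ISO-8859-1", and for an input without a declaration it yields "UTF-8".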
Another thing that's mildly irking me is that the current
factorization of this code will result in xml_parse'ing the data
twice, if you have both DOCUMENT and INDENT specified. We could
consider avoiding that if we merged the indentation functionality
into xmltotext_with_xmloption, but it's probably premature to do so
when we haven't figured out how to get the output right --- we might
end up needing two xml_parse calls anyway with different parameters,
perhaps.
I also had a bunch of cosmetic complaints (mostly around this having
a bad case of add-at-the-end-itis), which I've cleaned up in the
attached v20. This doesn't address any of the above, however.
I swear to god I have no idea what "add-at-the-end-itis" means :)
regards, tom lane
Thanks a lot!
Best, Jim
1 - https://dbfiddle.uk/WUOWtjBU
Attachments:
v21-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 5d522d8ec1bd01731d0f75a4163f9a8ad435bee6 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Fri, 10 Mar 2023 13:47:16 +0100
Subject: [PATCH v21] Add pretty-printed XML output option
This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlSaveToBuffer
from libxml2 to indent XML strings - XML_SAVE_FORMAT. Although the INDENT
feature is designed to work with xml strings of type DOCUMENT, this
implementation also allows the usage of CONTENT type strings as long as it
contains a well-formed singly-rooted xml - note the XMLOPTION_DOCUMENT in
the xml_parse call.
This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
doc/src/sgml/datatype.sgml | 8 +-
src/backend/catalog/sql_features.txt | 2 +-
src/backend/executor/execExprInterp.c | 9 +-
src/backend/parser/gram.y | 14 ++-
src/backend/parser/parse_expr.c | 1 +
src/backend/utils/adt/xml.c | 74 +++++++++++++++
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/primnodes.h | 4 +-
src/include/parser/kwlist.h | 1 +
src/include/utils/xml.h | 1 +
src/test/regress/expected/xml.out | 129 ++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 85 +++++++++++++++++
src/test/regress/expected/xml_2.out | 129 ++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 29 ++++++
14 files changed, 478 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..14cbbdd71f 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
</synopsis>
<replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias for one of those). Again, according
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
- you to simply cast the value.
+ you to simply cast the value. The option <type>INDENT</type> allows to
+ indent the serialized xml output - the default is <type>NO INDENT</type>.
+ It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+ be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+ contains a well-formed singly-rooted XML.
</para>
<para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 0fb9ab7533..bb4c135a7f 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -621,7 +621,7 @@ X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: binary string input and CONTENT option NO
X066 XMLParse: binary string input and DOCUMENT option NO
X068 XMLSerialize: BOM NO
-X069 XMLSerialize: INDENT NO
+X069 XMLSerialize: INDENT YES
X070 XMLSerialize: character string serialization and CONTENT option YES
X071 XMLSerialize: character string serialization and DOCUMENT option YES
X072 XMLSerialize: character string serialization YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..3dcd15d5f0 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
+ text *result;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,12 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
+ result = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if (xexpr->indent)
+ result = xmlserialize_indent(result);
+
+ *op->resvalue = PointerGetDatum(result);
*op->resnull = false;
}
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..efe88ccf9d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -613,7 +613,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <node> xml_root_version opt_xml_root_standalone
%type <node> xmlexists_argument
%type <ival> document_or_content
-%type <boolean> xml_whitespace_option
+%type <boolean> xml_indent_option xml_whitespace_option
%type <list> xmltable_column_list xmltable_column_option_list
%type <node> xmltable_column_el
%type <defelt> xmltable_column_option_el
@@ -702,7 +702,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -15532,13 +15532,14 @@ func_expr_common_subexpr:
$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
list_make3($3, $5, $6), @1);
}
- | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+ | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename xml_indent_option ')'
{
XmlSerialize *n = makeNode(XmlSerialize);
n->xmloption = $3;
n->expr = $4;
n->typeName = $6;
+ n->indent = $7;
n->location = @1;
$$ = (Node *) n;
}
@@ -15592,6 +15593,11 @@ document_or_content: DOCUMENT_P { $$ = XMLOPTION_DOCUMENT; }
| CONTENT_P { $$ = XMLOPTION_CONTENT; }
;
+xml_indent_option: INDENT { $$ = true; }
+ | NO INDENT { $$ = false; }
+ | /*EMPTY*/ { $$ = false; }
+ ;
+
xml_whitespace_option: PRESERVE WHITESPACE_P { $$ = true; }
| STRIP_P WHITESPACE_P { $$ = false; }
| /*EMPTY*/ { $$ = false; }
@@ -16828,6 +16834,7 @@ unreserved_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
@@ -17384,6 +17391,7 @@ bare_label_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 78221d2e0f..2331417552 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2331,6 +2331,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
typenameTypeIdAndMod(pstate, xs->typeName, &targetType, &targetTypmod);
xexpr->xmloption = xs->xmloption;
+ xexpr->indent = xs->indent;
xexpr->location = xs->location;
/* We actually only need these to be able to parse back the expression. */
xexpr->type = targetType;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..174831e6b4 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -52,6 +52,7 @@
#include <libxml/tree.h>
#include <libxml/uri.h>
#include <libxml/xmlerror.h>
+#include <libxml/xmlsave.h>
#include <libxml/xmlversion.h>
#include <libxml/xmlwriter.h>
#include <libxml/xpath.h>
@@ -631,6 +632,79 @@ xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg)
}
+text *
+xmlserialize_indent(text *data)
+{
+#ifdef USE_LIBXML
+ text *result;
+ xmlDocPtr doc;
+ xmlSaveCtxtPtr ctxt = NULL;
+ xmlBufferPtr buf = NULL;
+ xmlChar *encodingStr = NULL;
+ xmlChar *version;
+ PgXmlErrorContext *xmlerrcxt;
+ int encoding;
+
+ parse_xml_decl(xml_text2xmlChar(data), NULL, &version, &encodingStr, NULL);
+ encoding = encodingStr ? xmlChar_to_encoding(encodingStr) : PG_UTF8;
+
+ doc = xml_parse(data, XMLOPTION_DOCUMENT, false,
+ encoding, NULL);
+ Assert(doc);
+
+ xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
+
+ PG_TRY();
+ {
+ buf = xmlBufferCreate();
+ if (buf == NULL || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ "could not allocate xmlBuffer");
+
+ if(!version)
+ /* Reformat with indenting requested without XML declaration */
+ ctxt = xmlSaveToBuffer(buf, (const char *) encodingStr,
+ XML_SAVE_NO_DECL|XML_SAVE_FORMAT);
+ else
+ ctxt = xmlSaveToBuffer(buf, (const char *) encodingStr,
+ XML_SAVE_FORMAT);
+
+ if (ctxt == NULL || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ "could not allocate parser context");
+
+ xmlSaveDoc(ctxt, doc);
+ xmlSaveClose(ctxt);
+ }
+ PG_CATCH();
+ {
+ if (buf)
+ xmlBufferFree(buf);
+ if(doc)
+ xmlFreeDoc(doc);
+ if(ctxt)
+ xmlSaveClose(ctxt);
+
+ pg_xml_done(xmlerrcxt, true);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ pg_xml_done(xmlerrcxt, false);
+ xmlFreeDoc(doc);
+
+ result = (text *) xmlBuffer_to_xmltype(buf);
+ xmlBufferFree(buf);
+
+ return result;
+#else
+ NO_XML_SUPPORT();
+ return NULL;
+#endif
+}
+
+
xmltype *
xmlelement(XmlExpr *xexpr,
Datum *named_argvalue, bool *named_argnull,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..fc5b89a698 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -841,6 +841,7 @@ typedef struct XmlSerialize
XmlOptionType xmloption; /* DOCUMENT or CONTENT */
Node *expr;
TypeName *typeName;
+ bool indent; /* [NO] INDENT */
int location; /* token location, or -1 if unknown */
} XmlSerialize;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index b4292253cc..2263dab8a1 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1461,7 +1461,7 @@ typedef enum XmlExprOp
IS_XMLPARSE, /* XMLPARSE(text, is_doc, preserve_ws) */
IS_XMLPI, /* XMLPI(name [, args]) */
IS_XMLROOT, /* XMLROOT(xml, version, standalone) */
- IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval) */
+ IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval, indent) */
IS_DOCUMENT /* xmlval IS DOCUMENT */
} XmlExprOp;
@@ -1486,6 +1486,8 @@ typedef struct XmlExpr
List *args;
/* DOCUMENT or CONTENT */
XmlOptionType xmloption pg_node_attr(query_jumble_ignore);
+ /* INDENT option for XMLSERIALIZE */
+ bool indent;
/* target type/typmod for XMLSERIALIZE */
Oid type pg_node_attr(query_jumble_ignore);
int32 typmod pg_node_attr(query_jumble_ignore);
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..753e9ee174 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..a1dfe4c631 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -78,6 +78,7 @@ extern xmltype *xmlpi(const char *target, text *arg, bool arg_is_null, bool *res
extern xmltype *xmlroot(xmltype *data, text *version, int standalone);
extern bool xml_is_document(xmltype *arg);
extern text *xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg);
+extern text *xmlserialize_indent(text *data);
extern char *escape_xml(const char *str);
extern char *map_sql_identifier_to_xml_name(const char *ident, bool fully_escaped, bool escape_period);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index ad852dc2f7..5f7b8f4827 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,135 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent encoding="ISO-8859-1"
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <?xml version="1.0" encoding="ISO-8859-1"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <?xml version="1.0" encoding="ISO-8859-1"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- indent encoding="UTF-8"
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>73</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>73</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 70fe34a04f..154f1e5297 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,91 @@ ERROR: unsupported XML feature
LINE 1: SELECT xmlserialize(document 'bad' as text);
^
DETAIL: This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent encoding="ISO-8859-1"
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- indent encoding="UTF-8"
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
ERROR: unsupported XML feature
LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 4f029d0072..42a6e8910a 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,135 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent encoding="ISO-8859-1"
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <?xml version="1.0" encoding="ISO-8859-1"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <?xml version="1.0" encoding="ISO-8859-1"?>+
+ <foo> +
+ <bar> +
+ <val>42</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- indent encoding="UTF-8"
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>73</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo> +
+ <bar> +
+ <val>73</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 24e40d2653..0349b624bc 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,35 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+-- indent encoding="ISO-8859-1"
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT);
+-- indent encoding="UTF-8"
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
--
2.25.1
Jim Jones <jim.jones@uni-muenster.de> writes:
On 09.03.23 21:21, Tom Lane wrote:
I've looked through this now, and have some minor complaints and a major
one. The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT. For example,
How do you suggest the output should look like?
I'd say a series of node trees, each starting on a separate line.
I also suspect that it's outright broken to attach a header claiming
the data is now in UTF8 encoding. If the database encoding isn't
UTF8, then either that's a lie or we now have an encoding violation.
I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
uses the encoding of the given doc and UTF8 if not provided.
Mmmm .... doing this differently from what we do elsewhere does not
sound like the right path forward. The input *is* (or had better be)
in the database encoding.
regards, tom lane
On 10.03.23 15:32, Tom Lane wrote:
Jim Jones <jim.jones@uni-muenster.de> writes:
On 09.03.23 21:21, Tom Lane wrote:
I've looked through this now, and have some minor complaints and a major
one. The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT. For example,
How do you suggest the output should look like?
I'd say a series of node trees, each starting on a separate line.
The attached v22 enables the use of INDENT with XML that is not singly rooted.
postgres=# SELECT xmlserialize (CONTENT '<bar><val
x="y">42</val></bar><foo>73</foo>' AS text INDENT);
     xmlserialize
-----------------------
 <bar>                +
   <val x="y">42</val>+
 </bar>               +
 <foo>73</foo>
(1 row)
I tried several libxml2 dump functions, and none of them coped well with
an XML string that has no root node. So I now attach the parsed nodes to a
temporary root node, iterate over its children, and write them one by one
(formatted) into the output buffer.
I slightly modified the existing xml_parse() function to return the list
of nodes parsed by xmlParseBalancedChunkMemory:
xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
int encoding, Node *escontext, xmlNodePtr *parsed_nodes)
res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
utf8string + count, parsed_nodes);
I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
uses the encoding of the given doc and UTF8 if not provided.
Mmmm .... doing this differently from what we do elsewhere does not
sound like the right path forward. The input *is* (or had better be)
in the database encoding.
I changed that behavior. It now uses GetDatabaseEncoding().
Thanks!
Best, Jim
Attachments:
v22-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 85873e505aa04dea4ed92267dd07160d39460a59 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Fri, 10 Mar 2023 13:47:16 +0100
Subject: [PATCH v22] Add pretty-printed XML output option
This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlSaveToBuffer
from libxml2 to indent XML strings - see option XML_SAVE_FORMAT.
Although the INDENT feature is designed to work with xml strings of type
DOCUMENT, this implementation also allows the usage of CONTENT type strings
as long as it contains a well balanced xml.
This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
doc/src/sgml/datatype.sgml | 8 +-
src/backend/catalog/sql_features.txt | 2 +-
src/backend/executor/execExprInterp.c | 9 +-
src/backend/parser/gram.y | 14 ++-
src/backend/parser/parse_expr.c | 1 +
src/backend/utils/adt/xml.c | 137 +++++++++++++++++++++--
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/primnodes.h | 4 +-
src/include/parser/kwlist.h | 1 +
src/include/utils/xml.h | 1 +
src/test/regress/expected/xml.out | 153 ++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 84 ++++++++++++++
src/test/regress/expected/xml_2.out | 153 ++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 32 ++++++
14 files changed, 582 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
</synopsis>
<replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias for one of those). Again, according
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
- you to simply cast the value.
+ you to simply cast the value. The option <type>INDENT</type> allows to
+ indent the serialized xml output - the default is <type>NO INDENT</type>.
+ It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+ be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+ contains a well-formed XML.
</para>
<para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 0fb9ab7533..bb4c135a7f 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -621,7 +621,7 @@ X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: binary string input and CONTENT option NO
X066 XMLParse: binary string input and DOCUMENT option NO
X068 XMLSerialize: BOM NO
-X069 XMLSerialize: INDENT NO
+X069 XMLSerialize: INDENT YES
X070 XMLSerialize: character string serialization and CONTENT option YES
X071 XMLSerialize: character string serialization and DOCUMENT option YES
X072 XMLSerialize: character string serialization YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..6e4425ca7c 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
+ text *result;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,12 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
+ result = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if (xexpr->indent)
+ result = xmlserialize_indent(result,xexpr->xmloption);
+
+ *op->resvalue = PointerGetDatum(result);
*op->resnull = false;
}
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..efe88ccf9d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -613,7 +613,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <node> xml_root_version opt_xml_root_standalone
%type <node> xmlexists_argument
%type <ival> document_or_content
-%type <boolean> xml_whitespace_option
+%type <boolean> xml_indent_option xml_whitespace_option
%type <list> xmltable_column_list xmltable_column_option_list
%type <node> xmltable_column_el
%type <defelt> xmltable_column_option_el
@@ -702,7 +702,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -15532,13 +15532,14 @@ func_expr_common_subexpr:
$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
list_make3($3, $5, $6), @1);
}
- | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+ | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename xml_indent_option ')'
{
XmlSerialize *n = makeNode(XmlSerialize);
n->xmloption = $3;
n->expr = $4;
n->typeName = $6;
+ n->indent = $7;
n->location = @1;
$$ = (Node *) n;
}
@@ -15592,6 +15593,11 @@ document_or_content: DOCUMENT_P { $$ = XMLOPTION_DOCUMENT; }
| CONTENT_P { $$ = XMLOPTION_CONTENT; }
;
+xml_indent_option: INDENT { $$ = true; }
+ | NO INDENT { $$ = false; }
+ | /*EMPTY*/ { $$ = false; }
+ ;
+
xml_whitespace_option: PRESERVE WHITESPACE_P { $$ = true; }
| STRIP_P WHITESPACE_P { $$ = false; }
| /*EMPTY*/ { $$ = false; }
@@ -16828,6 +16834,7 @@ unreserved_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
@@ -17384,6 +17391,7 @@ bare_label_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 78221d2e0f..2331417552 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2331,6 +2331,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
typenameTypeIdAndMod(pstate, xs->typeName, &targetType, &targetTypmod);
xexpr->xmloption = xs->xmloption;
+ xexpr->indent = xs->indent;
xexpr->location = xs->location;
/* We actually only need these to be able to parse back the expression. */
xexpr->type = targetType;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..4fe85519e2 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -52,6 +52,7 @@
#include <libxml/tree.h>
#include <libxml/uri.h>
#include <libxml/xmlerror.h>
+#include <libxml/xmlsave.h>
#include <libxml/xmlversion.h>
#include <libxml/xmlwriter.h>
#include <libxml/xpath.h>
@@ -146,7 +147,7 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version,
static bool xml_doctype_in_content(const xmlChar *str);
static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
bool preserve_whitespace, int encoding,
- Node *escontext);
+ Node *escontext, xmlNodePtr *parsed_nodes);
static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt);
static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
ArrayBuildState *astate,
@@ -273,7 +274,7 @@ xml_in(PG_FUNCTION_ARGS)
* Note: we don't need to worry about whether a soft error is detected.
*/
doc = xml_parse(vardata, xmloption, true, GetDatabaseEncoding(),
- fcinfo->context);
+ fcinfo->context, NULL);
if (doc != NULL)
xmlFreeDoc(doc);
@@ -400,7 +401,7 @@ xml_recv(PG_FUNCTION_ARGS)
* Parse the data to check if it is well-formed XML data. Assume that
* xml_parse will throw ERROR if not.
*/
- doc = xml_parse(result, xmloption, true, encoding, NULL);
+ doc = xml_parse(result, xmloption, true, encoding, NULL, NULL);
xmlFreeDoc(doc);
/* Now that we know what we're dealing with, convert to server encoding */
@@ -631,6 +632,122 @@ xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg)
}
+text *
+xmlserialize_indent(text *data, XmlOptionType xmloption_arg)
+{
+#ifdef USE_LIBXML
+ text *result;
+ xmlDocPtr doc;
+ xmlSaveCtxtPtr ctxt = NULL;
+ xmlBufferPtr buf = NULL;
+ xmlChar *version;
+ xmlNodePtr content_nodes = NULL;
+ PgXmlErrorContext *xmlerrcxt;
+
+ parse_xml_decl(xml_text2xmlChar(data), NULL, &version, NULL, NULL);
+
+ doc = xml_parse(data, xmloption_arg, true,
+ GetDatabaseEncoding(), NULL, &content_nodes);
+ Assert(doc);
+
+ xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
+
+ PG_TRY();
+ {
+ buf = xmlBufferCreate();
+
+ if (buf == NULL || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ "could not allocate xmlBuffer");
+
+ if(!version)
+ ctxt = xmlSaveToBuffer(buf, GetDatabaseEncodingName(),
+ XML_SAVE_NO_DECL | XML_SAVE_FORMAT | XML_SAVE_NO_EMPTY);
+ else
+ ctxt = xmlSaveToBuffer(buf, GetDatabaseEncodingName(),
+ XML_SAVE_FORMAT | XML_SAVE_NO_EMPTY);
+
+ if (ctxt == NULL || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ "could not allocate parser context");
+
+ if(xmloption_arg == XMLOPTION_DOCUMENT)
+ {
+ if (xmlSaveDoc(ctxt, doc) == -1 || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+ "could not save document to xmlBuffer");
+ }
+ else
+ {
+ if(content_nodes != NULL)
+ {
+ xmlNodePtr root = NULL;
+ xmlNodePtr node = NULL;
+
+ /* This creates a root node for returned content from xml_parse
+ * as it can contain a non singly-rooted XML. This is necessary
+ * to keep the dump functions from ignoring XML strings with
+ * multiple root nodes (content type). This new root node serves
+ * only as a container, so that we can iterate over its nodes
+ * and save each one of the formatted children into the buffer -
+ * separated by a newline.
+ */
+ root = xmlNewNode(NULL, BAD_CAST "content-root");
+ xmlDocSetRootElement(doc, root);
+ xmlAddChild(root, content_nodes);
+
+ for (node = root->children; node; node = node->next) {
+
+ if (node->type != XML_TEXT_NODE && node->prev != NULL)
+ {
+ xmlNodePtr newline = NULL;
+ newline = xmlNewDocText(doc, (const xmlChar *) "\n");
+
+ if (xmlSaveTree(ctxt, newline) == -1 || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+ "could not save content's line separator to xmlBuffer");
+ }
+
+ if (xmlSaveTree(ctxt, node) == -1 || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+ "could not save content to xmlBuffer");
+ }
+ }
+ }
+
+ if (xmlSaveClose(ctxt) == -1 || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+ "could not close xmlSaveCtxtPtr");
+ }
+ PG_CATCH();
+ {
+ if (buf)
+ xmlBufferFree(buf);
+ if(doc)
+ xmlFreeDoc(doc);
+ if(ctxt)
+ xmlSaveClose(ctxt);
+
+ pg_xml_done(xmlerrcxt, true);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ pg_xml_done(xmlerrcxt, false);
+ xmlFreeDoc(doc);
+
+ result = (text *) xmlBuffer_to_xmltype(buf);
+ xmlBufferFree(buf);
+
+ return result;
+#else
+ NO_XML_SUPPORT();
+ return NULL;
+#endif
+}
+
+
xmltype *
xmlelement(XmlExpr *xexpr,
Datum *named_argvalue, bool *named_argnull,
@@ -762,7 +879,7 @@ xmlparse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace)
xmlDocPtr doc;
doc = xml_parse(data, xmloption_arg, preserve_whitespace,
- GetDatabaseEncoding(), NULL);
+ GetDatabaseEncoding(), NULL, NULL);
xmlFreeDoc(doc);
return (xmltype *) data;
@@ -902,7 +1019,7 @@ xml_is_document(xmltype *arg)
* We'll report "true" if no soft error is reported by xml_parse().
*/
doc = xml_parse((text *) arg, XMLOPTION_DOCUMENT, true,
- GetDatabaseEncoding(), (Node *) &escontext);
+ GetDatabaseEncoding(), (Node *) &escontext, NULL);
if (doc)
xmlFreeDoc(doc);
@@ -1489,7 +1606,9 @@ xml_doctype_in_content(const xmlChar *str)
*
* data is the source data (must not be toasted!), encoding is its encoding,
* and xmloption_arg and preserve_whitespace are options for the
- * transformation.
+ * transformation. parsed_nodes will return the list of parsed nodes
+ * for XML of type XMLOPTION_CONTENT from the xmlParseBalancedChunkMemory
+ * call - it can be NULL.
*
* Errors normally result in ereport(ERROR), but if escontext is an
* ErrorSaveContext, then "safe" errors are reported there instead, and the
@@ -1504,7 +1623,7 @@ xml_doctype_in_content(const xmlChar *str)
*/
static xmlDocPtr
xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
- int encoding, Node *escontext)
+ int encoding, Node *escontext, xmlNodePtr *parsed_nodes)
{
int32 len;
xmlChar *string;
@@ -1620,7 +1739,7 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
if (*(utf8string + count))
{
res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
- utf8string + count, NULL);
+ utf8string + count, parsed_nodes);
if (res_code != 0 || xmlerrcxt->err_occurred)
{
xml_errsave(escontext, xmlerrcxt,
@@ -4305,7 +4424,7 @@ wellformed_xml(text *data, XmlOptionType xmloption_arg)
* We'll report "true" if no soft error is reported by xml_parse().
*/
doc = xml_parse(data, xmloption_arg, true,
- GetDatabaseEncoding(), (Node *) &escontext);
+ GetDatabaseEncoding(), (Node *) &escontext, NULL);
if (doc)
xmlFreeDoc(doc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..fc5b89a698 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -841,6 +841,7 @@ typedef struct XmlSerialize
XmlOptionType xmloption; /* DOCUMENT or CONTENT */
Node *expr;
TypeName *typeName;
+ bool indent; /* [NO] INDENT */
int location; /* token location, or -1 if unknown */
} XmlSerialize;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index b4292253cc..2263dab8a1 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1461,7 +1461,7 @@ typedef enum XmlExprOp
IS_XMLPARSE, /* XMLPARSE(text, is_doc, preserve_ws) */
IS_XMLPI, /* XMLPI(name [, args]) */
IS_XMLROOT, /* XMLROOT(xml, version, standalone) */
- IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval) */
+ IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval, indent) */
IS_DOCUMENT /* xmlval IS DOCUMENT */
} XmlExprOp;
@@ -1486,6 +1486,8 @@ typedef struct XmlExpr
List *args;
/* DOCUMENT or CONTENT */
XmlOptionType xmloption pg_node_attr(query_jumble_ignore);
+ /* INDENT option for XMLSERIALIZE */
+ bool indent;
/* target type/typmod for XMLSERIALIZE */
Oid type pg_node_attr(query_jumble_ignore);
int32 typmod pg_node_attr(query_jumble_ignore);
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..753e9ee174 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..ea14eae712 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -78,6 +78,7 @@ extern xmltype *xmlpi(const char *target, text *arg, bool arg_is_null, bool *res
extern xmltype *xmlroot(xmltype *data, text *version, int standalone);
extern bool xml_is_document(xmltype *arg);
extern text *xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg);
+extern text *xmlserialize_indent(text *data, XmlOptionType xmloption_arg);
extern char *escape_xml(const char *str);
extern char *map_sql_identifier_to_xml_name(const char *ident, bool fully_escaped, bool escape_period);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index ad852dc2f7..f5a8ed4e43 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,159 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent non singly-rooted xml
+SELECT xmlserialize(DOCUMENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ xmlserialize
+-----------------------
+ <foo>73</foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar>
+(1 row)
+
+-- indent non singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ xmlserialize
+------------------------
+ text node +
+ <foo>73</foo>text node+
+ <bar> +
+ <val x="y">42</val> +
+ </bar>
+(1 row)
+
+-- indent singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ <val x="y">text node<val>73</val></val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ <val x="y">text node<val>73</val></val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent with XML declaration
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------
+ <?xml version="1.0" encoding="UTF8"?>+
+ <foo> +
+ <bar> +
+ <val>73</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------
+ <foo> +
+ <bar> +
+ <val>73</val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 70fe34a04f..68ec380d6e 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,90 @@ ERROR: unsupported XML feature
LINE 1: SELECT xmlserialize(document 'bad' as text);
^
DETAIL: This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent non singly-rooted xml
+SELECT xmlserialize(DOCUMENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent non singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature at character 29
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent with XML declaration
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
ERROR: unsupported XML feature
LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 4f029d0072..0955214bb7 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,159 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent non singly-rooted xml
+SELECT xmlserialize(DOCUMENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ xmlserialize
+-----------------------
+ <foo>73</foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar>
+(1 row)
+
+-- indent non singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ xmlserialize
+------------------------
+ text node +
+ <foo>73</foo>text node+
+ <bar> +
+ <val x="y">42</val> +
+ </bar>
+(1 row)
+
+-- indent singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ <val x="y">text node<val>73</val></val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ <val x="y">text node<val>73</val></val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent with XML declaration
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------
+ <?xml version="1.0" encoding="UTF8"?>+
+ <foo> +
+ <bar> +
+ <val>73</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------
+ <foo> +
+ <bar> +
+ <val>73</val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 24e40d2653..4b7224baa3 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,38 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent non singly-rooted xml
+SELECT xmlserialize(DOCUMENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent non singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+-- indent with XML declaration
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
--
2.25.1
On 09.03.23 21:21, Tom Lane wrote:
Peter Smith <smithpb2250@gmail.com> writes:
The patch v19 LGTM.
Another thing that's mildly irking me is that the current
factorization of this code will result in xml_parse'ing the data
twice, if you have both DOCUMENT and INDENT specified. We could
consider avoiding that if we merged the indentation functionality
into xmltotext_with_xmloption, but it's probably premature to do so
when we haven't figured out how to get the output right --- we might
end up needing two xml_parse calls anyway with different parameters,
perhaps.
Just a thought: since xmlserialize_indent also calls xml_parse() to
build the xmlDocPtr, couldn't we simply bypass
xmltotext_with_xmloption() when INDENT is specified?
Something like this:
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe..ea808dd 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
+ text *result;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
+ if (xexpr->indent)
+ result = xmlserialize_indent(DatumGetXmlP(value),
+ xexpr->xmloption);
+ else
+ result = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+
+ *op->resvalue = PointerGetDatum(result);
*op->resnull = false;
}
break;
Jim Jones <jim.jones@uni-muenster.de> writes:
[ v22-0001-Add-pretty-printed-XML-output-option.patch ]
I poked at this for awhile and ran into a problem that I'm not sure
how to solve: it misbehaves for input with embedded DOCTYPE.
regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
xmlserialize
--------------
<!DOCTYPE a>+
<a></a> +
(1 row)
regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
xmlserialize
--------------
(1 row)
The bad result for CONTENT is because xml_parse() decides to
parse_as_document, but xmlserialize_indent has no idea that happened
and tries to use the content_nodes list anyway. I don't especially
care for the laissez faire "maybe we'll set *content_nodes and maybe
we won't" API you adopted for xml_parse, which seems to be contributing
to the mess. We could pass back more info so that xmlserialize_indent
knows what really happened. However, that won't fix the bogus output
for the DOCUMENT case. Are we perhaps passing incorrect flags to
xmlSaveToBuffer?
regards, tom lane
On 14.03.23 18:40, Tom Lane wrote:
Jim Jones <jim.jones@uni-muenster.de> writes:
[ v22-0001-Add-pretty-printed-XML-output-option.patch ]
I poked at this for awhile and ran into a problem that I'm not sure
how to solve: it misbehaves for input with embedded DOCTYPE.
regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
xmlserialize
--------------
<!DOCTYPE a>+
<a></a> +
(1 row)
The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements
to be serialized with start-end tag pairs. Removing it did the trick ...
postgres=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
xmlserialize
--------------
<!DOCTYPE a>+
<a/> +
(1 row)
... but as a side effect, empty start-end tag pairs will now be
serialized as empty-element tags:
postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text INDENT);
xmlserialize
--------------
<foo> +
<bar/> +
</foo>
(1 row)
This seems to be the standard behavior of other XML indentation tools
(including Oracle).
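To make the effect of the flag concrete, here is a small self-contained C sketch. It does not call libxml2: SKETCH_NO_EMPTY and write_empty_element are hypothetical stand-ins that only mimic what XML_SAVE_NO_EMPTY changes for an element without children, under the assumption that nothing else about the output differs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for libxml2's XML_SAVE_NO_EMPTY; not the real constant. */
#define SKETCH_NO_EMPTY 0x1

/*
 * Write an element that has no children into out.
 * With SKETCH_NO_EMPTY set, emit a start-end tag pair: <a></a>.
 * Without it, emit the empty-element form: <a/>.
 * Returns the number of characters snprintf wanted to write.
 */
static int
write_empty_element(char *out, size_t outlen, const char *name, int options)
{
    assert(out != NULL && name != NULL);

    if (options & SKETCH_NO_EMPTY)
        return snprintf(out, outlen, "<%s></%s>", name, name);
    return snprintf(out, outlen, "<%s/>", name);
}
```

Both forms are equivalent XML; the flag only picks the spelling, which is why dropping it changes `<bar></bar>` into `<bar/>` in the output above.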
regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
xmlserialize
--------------
(1 row)
The bad result for CONTENT is because xml_parse() decides to
parse_as_document, but xmlserialize_indent has no idea that happened
and tries to use the content_nodes list anyway. I don't especially
care for the laissez faire "maybe we'll set *content_nodes and maybe
we won't" API you adopted for xml_parse, which seems to be contributing
to the mess. We could pass back more info so that xmlserialize_indent
knows what really happened.
I added a new (nullable) parameter to the xml_parse function that will
return the actual XmlOptionType used to parse the xml data. Now
xmlserialize_indent knows how the data was really parsed:
postgres=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
xmlserialize
--------------
<!DOCTYPE a>+
<a/> +
(1 row)
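The idea of the nullable out-parameter can be sketched in isolation as follows. This is not the code from the patch: SketchOption, sketch_has_doctype and sketch_parse are hypothetical names, and the DOCTYPE check is deliberately crude. It only illustrates how a caller can learn which mode was really used while other callers keep passing NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of XmlOptionType, kept local so the sketch compiles alone. */
typedef enum { SKETCH_DOCUMENT, SKETCH_CONTENT } SketchOption;

/* Crude stand-in for the DOCTYPE check; only a leading "<!DOCTYPE" is recognized. */
static bool
sketch_has_doctype(const char *data)
{
    return strncmp(data, "<!DOCTYPE", 9) == 0;
}

/*
 * Hypothetical parse entry point. The caller requests a mode, but content
 * that carries a DOCTYPE is parsed as a document anyway. If parsed_as is
 * not NULL, the mode that was really used is stored there.
 */
static void
sketch_parse(const char *data, SketchOption requested, SketchOption *parsed_as)
{
    SketchOption actual = requested;

    assert(data != NULL);

    if (requested == SKETCH_CONTENT && sketch_has_doctype(data))
        actual = SKETCH_DOCUMENT;

    /* ... the real parsing would happen here ... */

    if (parsed_as != NULL)
        *parsed_as = actual;
}
```

A serializer built on such a function can branch on the reported mode instead of the requested one, so it does not go looking for a content node list that was never filled.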
I added test cases for these queries.
v23 attached.
Thanks!
Best, Jim
Attachments:
v23-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch; charset=UTF-8)
From 98fe15f07da345e046b8d29d5dde27ce191055a2 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Fri, 10 Mar 2023 13:47:16 +0100
Subject: [PATCH v23] Add pretty-printed XML output option
This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlSaveToBuffer
from libxml2 to indent XML strings - see option XML_SAVE_FORMAT.
Although the INDENT feature is designed to work with xml strings of type
DOCUMENT, this implementation also allows the usage of CONTENT type strings
as long as they contain well-balanced XML.
This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
doc/src/sgml/datatype.sgml | 8 +-
src/backend/catalog/sql_features.txt | 2 +-
src/backend/executor/execExprInterp.c | 9 +-
src/backend/parser/gram.y | 14 +-
src/backend/parser/parse_expr.c | 1 +
src/backend/utils/adt/xml.c | 154 +++++++++++++++++++--
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/primnodes.h | 4 +-
src/include/parser/kwlist.h | 1 +
src/include/utils/xml.h | 1 +
src/test/regress/expected/xml.out | 188 ++++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 106 +++++++++++++++
src/test/regress/expected/xml_2.out | 188 ++++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 38 ++++++
14 files changed, 697 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
</synopsis>
<replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias for one of those). Again, according
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
- you to simply cast the value.
+ you to simply cast the value. The option <type>INDENT</type> indents
+ the serialized XML output - the default is <type>NO INDENT</type>.
+ It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+ be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+ contains well-formed XML.
</para>
<para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 0fb9ab7533..bb4c135a7f 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -621,7 +621,7 @@ X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: binary string input and CONTENT option NO
X066 XMLParse: binary string input and DOCUMENT option NO
X068 XMLSerialize: BOM NO
-X069 XMLSerialize: INDENT NO
+X069 XMLSerialize: INDENT YES
X070 XMLSerialize: character string serialization and CONTENT option YES
X071 XMLSerialize: character string serialization and DOCUMENT option YES
X072 XMLSerialize: character string serialization YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..6e4425ca7c 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
+ text *result;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,12 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
return;
value = argvalue[0];
- *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
+ result = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if (xexpr->indent)
+ result = xmlserialize_indent(result, xexpr->xmloption);
+
+ *op->resvalue = PointerGetDatum(result);
*op->resnull = false;
}
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..efe88ccf9d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -613,7 +613,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <node> xml_root_version opt_xml_root_standalone
%type <node> xmlexists_argument
%type <ival> document_or_content
-%type <boolean> xml_whitespace_option
+%type <boolean> xml_indent_option xml_whitespace_option
%type <list> xmltable_column_list xmltable_column_option_list
%type <node> xmltable_column_el
%type <defelt> xmltable_column_option_el
@@ -702,7 +702,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -15532,13 +15532,14 @@ func_expr_common_subexpr:
$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
list_make3($3, $5, $6), @1);
}
- | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+ | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename xml_indent_option ')'
{
XmlSerialize *n = makeNode(XmlSerialize);
n->xmloption = $3;
n->expr = $4;
n->typeName = $6;
+ n->indent = $7;
n->location = @1;
$$ = (Node *) n;
}
@@ -15592,6 +15593,11 @@ document_or_content: DOCUMENT_P { $$ = XMLOPTION_DOCUMENT; }
| CONTENT_P { $$ = XMLOPTION_CONTENT; }
;
+xml_indent_option: INDENT { $$ = true; }
+ | NO INDENT { $$ = false; }
+ | /*EMPTY*/ { $$ = false; }
+ ;
+
xml_whitespace_option: PRESERVE WHITESPACE_P { $$ = true; }
| STRIP_P WHITESPACE_P { $$ = false; }
| /*EMPTY*/ { $$ = false; }
@@ -16828,6 +16834,7 @@ unreserved_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
@@ -17384,6 +17391,7 @@ bare_label_keyword:
| INCLUDE
| INCLUDING
| INCREMENT
+ | INDENT
| INDEX
| INDEXES
| INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 78221d2e0f..2331417552 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2331,6 +2331,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
typenameTypeIdAndMod(pstate, xs->typeName, &targetType, &targetTypmod);
xexpr->xmloption = xs->xmloption;
+ xexpr->indent = xs->indent;
xexpr->location = xs->location;
/* We actually only need these to be able to parse back the expression. */
xexpr->type = targetType;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..facd111f4f 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -52,6 +52,7 @@
#include <libxml/tree.h>
#include <libxml/uri.h>
#include <libxml/xmlerror.h>
+#include <libxml/xmlsave.h>
#include <libxml/xmlversion.h>
#include <libxml/xmlwriter.h>
#include <libxml/xpath.h>
@@ -146,7 +147,8 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version,
static bool xml_doctype_in_content(const xmlChar *str);
static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
bool preserve_whitespace, int encoding,
- Node *escontext);
+ Node *escontext, xmlNodePtr *parsed_nodes,
+ XmlOptionType *parsed_xmloptiontype);
static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt);
static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
ArrayBuildState *astate,
@@ -273,7 +275,7 @@ xml_in(PG_FUNCTION_ARGS)
* Note: we don't need to worry about whether a soft error is detected.
*/
doc = xml_parse(vardata, xmloption, true, GetDatabaseEncoding(),
- fcinfo->context);
+ fcinfo->context, NULL,NULL);
if (doc != NULL)
xmlFreeDoc(doc);
@@ -400,7 +402,7 @@ xml_recv(PG_FUNCTION_ARGS)
* Parse the data to check if it is well-formed XML data. Assume that
* xml_parse will throw ERROR if not.
*/
- doc = xml_parse(result, xmloption, true, encoding, NULL);
+ doc = xml_parse(result, xmloption, true, encoding, NULL, NULL,NULL);
xmlFreeDoc(doc);
/* Now that we know what we're dealing with, convert to server encoding */
@@ -631,6 +633,123 @@ xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg)
}
+text *
+xmlserialize_indent(text *data, XmlOptionType xmloption_arg)
+{
+#ifdef USE_LIBXML
+ text *result;
+ xmlDocPtr doc;
+ xmlSaveCtxtPtr ctxt = NULL;
+ xmlBufferPtr buf = NULL;
+ xmlChar *version;
+ xmlNodePtr content_nodes = NULL;
+ PgXmlErrorContext *xmlerrcxt;
+ XmlOptionType parsed_xmloptiontype;
+
+ parse_xml_decl(xml_text2xmlChar(data), NULL, &version, NULL, NULL);
+
+ doc = xml_parse(data, xmloption_arg, true,
+ GetDatabaseEncoding(), NULL, &content_nodes, &parsed_xmloptiontype);
+ Assert(doc);
+
+ xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
+
+ PG_TRY();
+ {
+ buf = xmlBufferCreate();
+
+ if (buf == NULL || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ "could not allocate xmlBuffer");
+
+ if (!version)
+ ctxt = xmlSaveToBuffer(buf, GetDatabaseEncodingName(),
+ XML_SAVE_NO_DECL | XML_SAVE_FORMAT);
+ else
+ ctxt = xmlSaveToBuffer(buf, GetDatabaseEncodingName(),
+ XML_SAVE_FORMAT);
+
+ if (ctxt == NULL || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ "could not allocate save context");
+
+ if (parsed_xmloptiontype == XMLOPTION_DOCUMENT)
+ {
+ if (xmlSaveDoc(ctxt, doc) == -1 || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+ "could not save document to xmlBuffer");
+ }
+ else
+ {
+ if (content_nodes != NULL)
+ {
+ xmlNodePtr root = NULL;
+ xmlNodePtr node = NULL;
+
+ /* This creates a root node for the content returned by xml_parse,
+ * as it can contain XML that is not singly rooted. This is necessary
+ * to keep the save functions from ignoring XML strings with
+ * multiple root nodes (CONTENT type). This new root node serves
+ * only as a container, so that we can iterate over its children
+ * and save each formatted child into the buffer. A newline is
+ * emitted before each non-text node that has a previous sibling.
+ */
+ root = xmlNewNode(NULL, BAD_CAST "content-root");
+ xmlDocSetRootElement(doc, root);
+ xmlAddChild(root, content_nodes);
+
+ for (node = root->children; node; node = node->next) {
+
+ if (node->type != XML_TEXT_NODE && node->prev != NULL)
+ {
+ xmlNodePtr newline = NULL;
+ newline = xmlNewDocText(doc, (const xmlChar *) "\n");
+
+ if (xmlSaveTree(ctxt, newline) == -1 || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+ "could not save content's line separator to xmlBuffer");
+ }
+
+ if (xmlSaveTree(ctxt, node) == -1 || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+ "could not save content to xmlBuffer");
+ }
+ }
+ }
+
+ if (xmlSaveClose(ctxt) == -1 || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+ "could not close xmlSaveCtxtPtr");
+ }
+ PG_CATCH();
+ {
+ if (ctxt)
+ xmlSaveClose(ctxt);
+ if (buf)
+ xmlBufferFree(buf);
+ if (doc)
+ xmlFreeDoc(doc);
+
+ pg_xml_done(xmlerrcxt, true);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ pg_xml_done(xmlerrcxt, false);
+ xmlFreeDoc(doc);
+
+ result = (text *) xmlBuffer_to_xmltype(buf);
+ xmlBufferFree(buf);
+
+ return result;
+#else
+ NO_XML_SUPPORT();
+ return NULL;
+#endif
+}
+
+
xmltype *
xmlelement(XmlExpr *xexpr,
Datum *named_argvalue, bool *named_argnull,
@@ -762,7 +881,7 @@ xmlparse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace)
xmlDocPtr doc;
doc = xml_parse(data, xmloption_arg, preserve_whitespace,
- GetDatabaseEncoding(), NULL);
+ GetDatabaseEncoding(), NULL, NULL,NULL);
xmlFreeDoc(doc);
return (xmltype *) data;
@@ -902,7 +1021,7 @@ xml_is_document(xmltype *arg)
* We'll report "true" if no soft error is reported by xml_parse().
*/
doc = xml_parse((text *) arg, XMLOPTION_DOCUMENT, true,
- GetDatabaseEncoding(), (Node *) &escontext);
+ GetDatabaseEncoding(), (Node *) &escontext, NULL,NULL);
if (doc)
xmlFreeDoc(doc);
@@ -1489,7 +1608,11 @@ xml_doctype_in_content(const xmlChar *str)
*
* data is the source data (must not be toasted!), encoding is its encoding,
* and xmloption_arg and preserve_whitespace are options for the
- * transformation.
+ * transformation. parsed_nodes will return the list of parsed nodes
+ * for XML of type XMLOPTION_CONTENT from the xmlParseBalancedChunkMemory
+ * call - it can be NULL. parsed_xmloptiontype will return the actual
+ * XmlOptionType used to parse the given data, as it may differ from
+ * xmloption_arg if the xml contains DOCTYPE declarations - it can be NULL.
*
* Errors normally result in ereport(ERROR), but if escontext is an
* ErrorSaveContext, then "safe" errors are reported there instead, and the
@@ -1504,7 +1627,8 @@ xml_doctype_in_content(const xmlChar *str)
*/
static xmlDocPtr
xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
- int encoding, Node *escontext)
+ int encoding, Node *escontext, xmlNodePtr *parsed_nodes,
+ XmlOptionType *parsed_xmloptiontype)
{
int32 len;
xmlChar *string;
@@ -1552,9 +1676,16 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate parser context");
+ if (parsed_xmloptiontype != NULL)
+ *parsed_xmloptiontype = XMLOPTION_CONTENT;
+
/* Decide whether to parse as document or content */
if (xmloption_arg == XMLOPTION_DOCUMENT)
+ {
parse_as_document = true;
+ if (parsed_xmloptiontype != NULL)
+ *parsed_xmloptiontype = XMLOPTION_DOCUMENT;
+ }
else
{
/* Parse and skip over the XML declaration, if any */
@@ -1571,7 +1702,12 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
/* Is there a DOCTYPE element? */
if (xml_doctype_in_content(utf8string + count))
+ {
parse_as_document = true;
+
+ if (parsed_xmloptiontype != NULL)
+ *parsed_xmloptiontype = XMLOPTION_DOCUMENT;
+ }
}
if (parse_as_document)
@@ -1620,7 +1756,7 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
if (*(utf8string + count))
{
res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
- utf8string + count, NULL);
+ utf8string + count, parsed_nodes);
if (res_code != 0 || xmlerrcxt->err_occurred)
{
xml_errsave(escontext, xmlerrcxt,
@@ -4305,7 +4441,7 @@ wellformed_xml(text *data, XmlOptionType xmloption_arg)
* We'll report "true" if no soft error is reported by xml_parse().
*/
doc = xml_parse(data, xmloption_arg, true,
- GetDatabaseEncoding(), (Node *) &escontext);
+ GetDatabaseEncoding(), (Node *) &escontext, NULL,NULL);
if (doc)
xmlFreeDoc(doc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..fc5b89a698 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -841,6 +841,7 @@ typedef struct XmlSerialize
XmlOptionType xmloption; /* DOCUMENT or CONTENT */
Node *expr;
TypeName *typeName;
+ bool indent; /* [NO] INDENT */
int location; /* token location, or -1 if unknown */
} XmlSerialize;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index b4292253cc..2263dab8a1 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1461,7 +1461,7 @@ typedef enum XmlExprOp
IS_XMLPARSE, /* XMLPARSE(text, is_doc, preserve_ws) */
IS_XMLPI, /* XMLPI(name [, args]) */
IS_XMLROOT, /* XMLROOT(xml, version, standalone) */
- IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval) */
+ IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval, indent) */
IS_DOCUMENT /* xmlval IS DOCUMENT */
} XmlExprOp;
@@ -1486,6 +1486,8 @@ typedef struct XmlExpr
List *args;
/* DOCUMENT or CONTENT */
XmlOptionType xmloption pg_node_attr(query_jumble_ignore);
+ /* INDENT option for XMLSERIALIZE */
+ bool indent;
/* target type/typmod for XMLSERIALIZE */
Oid type pg_node_attr(query_jumble_ignore);
int32 typmod pg_node_attr(query_jumble_ignore);
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..753e9ee174 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..ea14eae712 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -78,6 +78,7 @@ extern xmltype *xmlpi(const char *target, text *arg, bool arg_is_null, bool *res
extern xmltype *xmlroot(xmltype *data, text *version, int standalone);
extern bool xml_is_document(xmltype *arg);
extern text *xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg);
+extern text *xmlserialize_indent(text *data, XmlOptionType xmloption_arg);
extern char *escape_xml(const char *str);
extern char *map_sql_identifier_to_xml_name(const char *ident, bool fully_escaped, bool escape_period);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index ad852dc2f7..8f1f3c7e65 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,194 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent non singly-rooted xml
+SELECT xmlserialize(DOCUMENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ xmlserialize
+-----------------------
+ <foo>73</foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar>
+(1 row)
+
+-- indent non singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ xmlserialize
+------------------------
+ text node +
+ <foo>73</foo>text node+
+ <bar> +
+ <val x="y">42</val> +
+ </bar>
+(1 row)
+
+-- indent singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ <val x="y">text node<val>73</val></val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ <val x="y">text node<val>73</val></val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent with XML declaration
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------
+ <?xml version="1.0" encoding="UTF8"?>+
+ <foo> +
+ <bar> +
+ <val>73</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------
+ <foo> +
+ <bar> +
+ <val>73</val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- indent containing DOCTYPE declaration
+SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
+ xmlserialize
+--------------
+ <!DOCTYPE a>+
+ <a/> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
+ xmlserialize
+--------------
+ <!DOCTYPE a>+
+ <a/> +
+
+(1 row)
+
+-- indent xml with empty element
+SELECT xmlserialize(DOCUMENT '<foo><bar></bar></foo>' AS text INDENT);
+ xmlserialize
+--------------
+ <foo> +
+ <bar/> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text INDENT);
+ xmlserialize
+--------------
+ <foo> +
+ <bar/> +
+ </foo>
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 70fe34a04f..6e08f8587e 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,112 @@ ERROR: unsupported XML feature
LINE 1: SELECT xmlserialize(document 'bad' as text);
^
DETAIL: This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent non singly-rooted xml
+SELECT xmlserialize(DOCUMENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent non singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ERROR: unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent with XML declaration
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- indent containing DOCTYPE declaration
+SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDE...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDE...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- indent xml with empty element
+SELECT xmlserialize(DOCUMENT '<foo><bar></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar></bar></foo>' AS tex...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS tex...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR: unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
ERROR: unsupported XML feature
LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 4f029d0072..16fc787c18 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,194 @@ SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
ERROR: not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent non singly-rooted xml
+SELECT xmlserialize(DOCUMENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ xmlserialize
+-----------------------
+ <foo>73</foo> +
+ <bar> +
+ <val x="y">42</val>+
+ </bar>
+(1 row)
+
+-- indent non singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+ xmlserialize
+------------------------
+ text node +
+ <foo>73</foo>text node+
+ <bar> +
+ <val x="y">42</val> +
+ </bar>
+(1 row)
+
+-- indent singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ <val x="y">text node<val>73</val></val>+
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------------
+ <foo> +
+ <bar> +
+ <val x="y">42</val> +
+ <val x="y">text node<val>73</val></val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+ERROR: not an XML document
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent with XML declaration
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+---------------------------------------
+ <?xml version="1.0" encoding="UTF8"?>+
+ <foo> +
+ <bar> +
+ <val>73</val> +
+ </bar> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+ xmlserialize
+-------------------
+ <foo> +
+ <bar> +
+ <val>73</val>+
+ </bar> +
+ </foo>
+(1 row)
+
+-- indent containing DOCTYPE declaration
+SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
+ xmlserialize
+--------------
+ <!DOCTYPE a>+
+ <a/> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
+ xmlserialize
+--------------
+ <!DOCTYPE a>+
+ <a/> +
+
+(1 row)
+
+-- indent xml with empty element
+SELECT xmlserialize(DOCUMENT '<foo><bar></bar></foo>' AS text INDENT);
+ xmlserialize
+--------------
+ <foo> +
+ <bar/> +
+ </foo> +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text INDENT);
+ xmlserialize
+--------------
+ <foo> +
+ <bar/> +
+ </foo>
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ ?column?
+----------
+ t
+(1 row)
+
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
?column?
----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 24e40d2653..078e873bb2 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,44 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
SELECT xmlserialize(content 'good' as char(10));
SELECT xmlserialize(document 'bad' as text);
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent non singly-rooted xml
+SELECT xmlserialize(DOCUMENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent non singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent singly-rooted xml with mixed contents
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><val x="y">text node<val>73</val></val></bar></foo>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT ' ' AS text INDENT);
+SELECT xmlserialize(CONTENT ' ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT NULL AS text INDENT);
+-- indent with XML declaration
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="UTF-8"?><foo><bar><val>73</val></bar></foo>' AS text INDENT);
+-- indent containing DOCTYPE declaration
+SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
+-- indent xml with empty element
+SELECT xmlserialize(DOCUMENT '<foo><bar></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text INDENT);
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
--
2.25.1
Jim Jones <jim.jones@uni-muenster.de> writes:
On 14.03.23 18:40, Tom Lane wrote:
I poked at this for awhile and ran into a problem that I'm not sure
how to solve: it misbehaves for input with embedded DOCTYPE.
The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements
to be serialized with start-end tag pairs. Removing it did the trick ...
... but as a side effect empty start-end tags will be now serialized as
empty elements
postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text
INDENT);
xmlserialize
--------------
<foo> +
<bar/> +
</foo>
(1 row)
Huh, interesting. That is a legitimate pretty-fication of the input,
I suppose, but some people might think it goes beyond the charter of
"indentation". I'm okay with it personally; anyone want to object?
regards, tom lane
I wrote:
Huh, interesting. That is a legitimate pretty-fication of the input,
I suppose, but some people might think it goes beyond the charter of
"indentation". I'm okay with it personally; anyone want to object?
Hearing no objections to that, I moved ahead with this.
It occurred to me to test v23 for memory leaks, and it had bad ones:
* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.
* libxml, at least in the 2.9.7 version I have here, turns out to
leak memory if you pass a non-null encoding to xmlSaveToBuffer.
But AFAICS we don't really need to do that, because the last thing
we want is for libxml to try to do any encoding conversion.
After cleaning that up, I saw that we were indeed doing essentially
duplicative xml_parse calls for the DOCUMENT check and the indentation
work, so I refactored to allow just one call to serve.
Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!
(Now to keep an eye on the buildfarm, to see if other versions of
libxml work like mine ...)
BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like
do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
for i in 1..10000000 loop
t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;
That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad. That problem is not the fault of this patch,
I don't think. I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?
regards, tom lane
I wrote:
BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like
do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
for i in 1..10000000 loop
t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;
That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad. That problem is not the fault of this patch,
I don't think. I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?
After a bit of further testing: the leak is present in libxml2 2.9.7
which is what I have on this RHEL8 box, but it seems not to occur
in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
isn't carrying any relevant local patch).
So maybe it's worth working around that, or maybe it isn't.
regards, tom lane
On 15 Mar 2023, at 22:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
After a bit of further testing: the leak is present in libxml2 2.9.7
which is what I have on this RHEL8 box, but it seems not to occur
in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
isn't carrying any relevant local patch).
So maybe it's worth working around that, or maybe it isn't.
2.9.7 is from November 2017 and 2.10.3 is from October 2022, so depending on
when in that timespan the issue was fixed it might be in a release which will
be with us for quite some time. The lack of reports (that I was able to find)
indicates that it might be rare in production, though?
--
Daniel Gustafsson
On 15.03.23 22:13, Tom Lane wrote:
I wrote:
It occurred to me to test v23 for memory leaks, and it had bad ones:
* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.
Oh, I really did miss that one. Thanks!
Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!
Great! Thank you, Peter and Andrey for the very nice reviews.
BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like
do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
for i in 1..10000000 loop
t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;
That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad. That problem is not the fault of this patch,
I don't think. I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?
In my environment (libxml2 v2.9.10 and Ubuntu 22.04) I couldn't
reproduce this memory leak. It has most likely been fixed in later
libxml2 versions. Unfortunately, their GitLab page has no release notes
for versions prior to 2.9.13 :(
Best, Jim
Jim Jones <jim.jones@uni-muenster.de> writes:
In my environment (libxml2 v2.9.10 and Ubuntu 22.04) I couldn't
reproduce this memory leak.
Just when you thought it was safe to go back in the water ...
Experimenting with the improved valgrind leak detection code at [1] (/messages/by-id/1295385.1747847681@sss.pgh.pa.us),
I discovered that XMLSERIALIZE(... INDENT) has yet a different memory
leak problem. It turns out that xmlDocSetRootElement() doesn't
merely install the given root node: it unlinks the document's old
root node and returns it to you. If you don't free it, it's leaked
(for the session, since this is a malloc not palloc). The amount of
leakage isn't that large, seems to be a couple hundred bytes per
iteration, which may explain why this escaped our notice in the
previous testing. Still, it could add up under extensive usage.
So I think we need to apply the attached, back to PG 16.
regards, tom lane
Attachments:
fix-leakage-in-xmlserialize-indent.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index db8d0d6a7e8..73fd4fa090c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -754,6 +754,7 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent)
* content nodes, and then iterate over the nodes.
*/
xmlNodePtr root;
+ xmlNodePtr oldroot;
xmlNodePtr newline;
root = xmlNewNode(NULL, (const xmlChar *) "content-root");
@@ -761,8 +762,14 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate xml node");
- /* This attaches root to doc, so we need not free it separately. */
- xmlDocSetRootElement(doc, root);
+ /*
+ * This attaches root to doc, so we need not free it separately...
+ * but instead, we have to free the old root if there was one.
+ */
+ oldroot = xmlDocSetRootElement(doc, root);
+ if (oldroot != NULL)
+ xmlFreeNode(oldroot);
+
xmlAddChildList(root, content_nodes);
/*
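The ownership rule behind this fix can be illustrated without libxml2. What follows is only a minimal sketch in plain C, assuming hypothetical stand-in types (`Node`, `Doc`) and helpers (`node_new`, `node_free`, `doc_set_root`, `replace_root`) that model the documented contract of xmlDocSetRootElement(): install the new root and hand the unlinked old root back to the caller. None of these names come from libxml2 or from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for xmlNode / xmlDoc; these are NOT libxml2 types. */
typedef struct Node { const char *name; } Node;
typedef struct Doc  { Node *root; } Doc;

static int live_nodes = 0;      /* crude allocation counter for the demo */

static Node *node_new(const char *name)
{
    Node *n = malloc(sizeof(Node));

    if (n == NULL)
        abort();
    n->name = name;
    live_nodes++;
    return n;
}

static void node_free(Node *n)
{
    if (n != NULL)
    {
        free(n);
        live_nodes--;
    }
}

/*
 * Models the contract discussed in the thread: install the new root and
 * return the unlinked old root, or NULL if the document had none.
 */
static Node *doc_set_root(Doc *doc, Node *root)
{
    Node *old;

    assert(doc != NULL);
    old = doc->root;
    doc->root = root;
    return old;
}

/*
 * Replace the root of a document that already has one, then tear the
 * document down.  Returns the number of nodes still allocated afterwards,
 * i.e. the number of leaked nodes.
 */
int replace_root(int free_old)
{
    Doc   doc = { NULL };
    Node *oldroot;

    live_nodes = 0;

    /* The document starts without a root, so this call returns NULL. */
    (void) doc_set_root(&doc, node_new("parsed-root"));

    /* Swapping in a wrapper root hands the previous root back to us. */
    oldroot = doc_set_root(&doc, node_new("content-root"));
    if (free_old && oldroot != NULL)
        node_free(oldroot);

    /* Tearing down the document releases only its current root. */
    node_free(doc.root);
    return live_nodes;
}
```

Under these assumptions, replace_root(0) leaves one node allocated and replace_root(1) leaves none, which mirrors the before/after behavior the patch is aiming for.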
Hi Tom
On 21.05.25 22:20, Tom Lane wrote:
Just when you thought it was safe to go back in the water ...
Experimenting with the improved valgrind leak detection code at [1],
I discovered that XMLSERIALIZE(... INDENT) has yet a different memory
leak problem. It turns out that xmlDocSetRootElement() doesn't
merely install the given root node: it unlinks the document's old
root node and returns it to you. If you don't free it, it's leaked
(for the session, since this is a malloc not palloc).
Yeah, I just read the same in the docs
"returns the unlinked old root element or NULL if the document didn't
have a root element or a memory allocation failed."
The xmlsoft examples are a bit misleading though [1]
/*
* Creates a new document, a node and set it as a root node
*/
doc = xmlNewDoc(BAD_CAST "1.0");
root_node = xmlNewNode(NULL, BAD_CAST "root");
xmlDocSetRootElement(doc, root_node);
and [2]
/* Make ELEMENT the root node of the tree */
xmlDocSetRootElement(doc, node);
It seems that xml_parse has the same issue[3]
Should we attempt to free the result of xmlDocSetRootElement() there too? v2 attached.
The amount of
leakage isn't that large, seems to be a couple hundred bytes per
iteration, which may explain why this escaped our notice in the
previous testing. Still, it could add up under extensive usage.
So I think we need to apply the attached, back to PG 16.
Definitely. It could add up quickly under heavy usage.
Thanks for fixing it!
Best, Jim
1 - http://xmlsoft.org/examples/tree2.c
2 - http://xmlsoft.org/examples/testWriter.c
3 - https://github.com/postgres/postgres/blob/f3622b64762bb5ee5242937f0fadcacb1a10f30e/src/backend/utils/adt/xml.c#L1872
Attachments:
v2-0001-fix-leakage-in-xmlserialize-indent.patch (text/x-patch; charset=UTF-8)
From c01b19d747bb4e3ffff3f6eb2f4641c268fe2b21 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jones@uni-muenster.de>
Date: Thu, 22 May 2025 00:55:37 +0200
Subject: [PATCH v2] fix leakage in xmlserialize indent
---
src/backend/utils/adt/xml.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index db8d0d6a7e..a8d5eeda54 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -754,6 +754,7 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent)
* content nodes, and then iterate over the nodes.
*/
xmlNodePtr root;
+ xmlNodePtr oldroot;
xmlNodePtr newline;
root = xmlNewNode(NULL, (const xmlChar *) "content-root");
@@ -761,8 +762,14 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate xml node");
- /* This attaches root to doc, so we need not free it separately. */
- xmlDocSetRootElement(doc, root);
+ /*
+ * This attaches root to doc, so we need not free it separately...
+ * but instead, we have to free the old root if there was one.
+ */
+ oldroot = xmlDocSetRootElement(doc, root);
+ if (oldroot != NULL)
+ xmlFreeNode(oldroot);
+
xmlAddChildList(root, content_nodes);
/*
@@ -1850,6 +1857,7 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
else
{
xmlNodePtr root;
+ xmlNodePtr oldroot;
/* set up document with empty root node to be the context node */
doc = xmlNewDoc(version);
@@ -1868,8 +1876,13 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
if (root == NULL || xmlerrcxt->err_occurred)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate xml node");
- /* This attaches root to doc, so we need not free it separately. */
- xmlDocSetRootElement(doc, root);
+ /*
+ * This attaches root to doc, so we need not free it separately...
+ * but instead, we have to free the old root if there was one.
+ */
+ oldroot = xmlDocSetRootElement(doc, root);
+ if (oldroot != NULL)
+ xmlFreeNode(oldroot);
/* allow empty content */
if (*(utf8string + count))
--
2.34.1
Jim Jones <jim.jones@uni-muenster.de> writes:
On 21.05.25 22:20, Tom Lane wrote:
It turns out that xmlDocSetRootElement() doesn't
merely install the given root node: it unlinks the document's old
root node and returns it to you. If you don't free it, it's leaked
(for the session, since this is a malloc not palloc).
The xmlsoft examples are a bit misleading though [1]
Yeah. I also did some searching on http://codesearch.debian.net
and was hard put to it to find anything that pays attention to
xmlDocSetRootElement's result at all. I wonder how many of those
represent leaks.
It seems that xml_parse has the same issue[3]
I did look at that one too. I think it's fine, because we're
dealing with a newly-created document which can't have a root node
yet. (Reinforcing this, Valgrind sees no leaks after applying
my patch.) I considered adding an assertion that that call returns
NULL, but concluded that it wasn't worth the notational hassle.
I'm not strongly set on that conclusion, though, if you think
differently.
regards, tom lane
On 22.05.25 01:48, Tom Lane wrote:
I did look at that one too. I think it's fine, because we're
dealing with a newly-created document which can't have a root node
yet. (Reinforcing this, Valgrind sees no leaks after applying
my patch.) I considered adding an assertion that that call returns
NULL, but concluded that it wasn't worth the notational hassle.
I'm not strongly set on that conclusion, though, if you think
differently.
I see. In that case I believe that at least a different comment
explaining this decision would avoid confusion. Something like
/*
* This attaches root to doc, so we do not need to free it separately.
* The return value of xmlDocSetRootElement (xmlNodePtr) is intentionally
* ignored here, as it is guaranteed to be NULL in this specific context.
* When using this function elsewhere, be sure to handle the return value
* properly.
*/
Best regards, Jim
Jim Jones <jim.jones@uni-muenster.de> writes:
On 22.05.25 01:48, Tom Lane wrote:
... I considered adding an assertion that that call returns
NULL, but concluded that it wasn't worth the notational hassle.
I'm not strongly set on that conclusion, though, if you think
differently.
I see. In that case I believe that at least a different comment
explaining this decision would avoid confusion. Something like
Yeah, after sleeping on it I fear that leaving xml_parse entirely
alone will just be a recipe for future copy-and-paste errors.
The Assert solution seems like the way to go, approximately
xmlNodePtr root;
+ xmlNodePtr oldroot PG_USED_FOR_ASSERTS_ONLY;
...
/* This attaches root to doc, so we need not free it separately. */
- xmlDocSetRootElement(doc, root);
+ oldroot = xmlDocSetRootElement(doc, root);
+ /* ... and there can't yet be any old root to clean up. */
+ Assert(oldroot == NULL);
I'll make it so.
regards, tom lane
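The assertion approach sketched in this message can be tried outside PostgreSQL. The following is a rough stand-alone sketch, not the committed code: `Node`, `Doc`, `doc_set_root` and `attach_first_root` are hypothetical stand-ins for the libxml2 pieces, and `USED_FOR_ASSERTS_ONLY` is an assumed local equivalent of PG_USED_FOR_ASSERTS_ONLY that simply marks the variable as possibly unused on GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed local equivalent of PostgreSQL's PG_USED_FOR_ASSERTS_ONLY: it
 * suppresses "set but not used" warnings when assertions are compiled out
 * (for example with -DNDEBUG).  Only GCC and Clang are covered here.
 */
#if defined(__GNUC__)
#define USED_FOR_ASSERTS_ONLY __attribute__((unused))
#else
#define USED_FOR_ASSERTS_ONLY
#endif

/* Hypothetical stand-ins for xmlNode / xmlDoc; these are NOT libxml2 types. */
typedef struct Node { const char *name; } Node;
typedef struct Doc  { Node *root; } Doc;

/* Install the new root and return the previous one, or NULL if none. */
static Node *doc_set_root(Doc *doc, Node *root)
{
    Node *old = doc->root;

    doc->root = root;
    return old;
}

/*
 * Attach a root to a document that is known to be freshly created.
 * Returns the root that is now installed.
 */
Node *attach_first_root(Doc *doc, Node *root)
{
    Node *oldroot USED_FOR_ASSERTS_ONLY;

    /* This attaches root to doc, so it need not be freed separately. */
    oldroot = doc_set_root(doc, root);
    /* ... and a new document cannot yet have an old root to clean up. */
    assert(oldroot == NULL);

    return doc->root;
}
```

The point of the design is that the return value is still captured and checked, so a later copy of this code into a context where the document already has a root would trip the assertion in debug builds instead of leaking silently.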
On 22.05.25 17:00, Tom Lane wrote:
Yeah, after sleeping on it I fear that leaving xml_parse entirely
alone will just be a recipe for future copy-and-paste errors.
That's exactly my concern as well.
The Assert solution seems like the way to go, approximately
xmlNodePtr root;
+ xmlNodePtr oldroot PG_USED_FOR_ASSERTS_ONLY;
...
/* This attaches root to doc, so we need not free it separately. */
- xmlDocSetRootElement(doc, root);
+ oldroot = xmlDocSetRootElement(doc, root);
+ /* ... and there can't yet be any old root to clean up. */
+ Assert(oldroot == NULL);
I'll make it so.
+1
Thanks!
Best regards, Jim