[PATCH] Add pretty-printed XML output option
Hi,
This small patch introduces a XML pretty print function. It basically
takes advantage of the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings.
postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var
id="z">42</var></bar></foo>');
xmlpretty
--------------------------
<foo id="x"> +
<bar id="y"> +
<var id="z">42</var>+
</bar> +
</foo> +
(1 row)
The patch also contains regression tests and documentation.
Feedback is very welcome!
Jim
The system somehow returns different error messages in Linux and
MacOS/Windows, which was causing the cfbot to fail.
SELECT xmlpretty('<foo>')::xml;
^
-DETAIL: line 1: chunk is not well balanced
+DETAIL: line 1: Premature end of data in tag foo line 1
Test removed in v2.
Show quoted text
On 02.02.23 21:35, Jim Jones wrote:
Hi,
This small patch introduces a XML pretty print function. It basically
takes advantage of the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings.postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var
id="z">42</var></bar></foo>');
xmlpretty
--------------------------
<foo id="x"> +
<bar id="y"> +
<var id="z">42</var>+
</bar> +
</foo> +(1 row)
The patch also contains regression tests and documentation.
Feedback is very welcome!
Jim
Hi,
The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing
the regression tests with the error:
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml
support.
Is there anything I can do to enable libxml to run my regression tests?
v3 adds a missing xmlFree call.
Best, Jim
Attachments:
v3-0001-Add-pretty-printed-XML-output-option.patchtext/x-patch; charset=UTF-8; name=v3-0001-Add-pretty-printed-XML-output-option.patchDownload+219-25
Jim Jones <jim.jones@uni-muenster.de> writes:
The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing
the regression tests with the error:
ERROR: unsupported XML feature
DETAIL: This functionality requires the server to be built with libxml
support.
Is there anything I can do to enable libxml to run my regression tests?
Your patch needs to also update expected/xml_1.out to match the output
the test produces when run without --with-libxml.
regards, tom lane
On 06.02.23 17:23, Tom Lane wrote:
Your patch needs to also update expected/xml_1.out to match the output
the test produces when run without --with-libxml.
Thanks a lot! It seems to do the trick.
xml_1.out updated in v4.
Best, Jim
while working on another item of the TODO list I realized that I should
be using a PG_TRY() block in he xmlDocDumpFormatMemory call.
Fixed in v5.
Best regards, Jim
Attachments:
v5-0001-Add-pretty-printed-XML-output-option.patchtext/x-patch; charset=UTF-8; name=v5-0001-Add-pretty-printed-XML-output-option.patchDownload+270-1
On Thu, Feb 9, 2023 at 7:31 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
while working on another item of the TODO list I realized that I should
be using a PG_TRY() block in he xmlDocDumpFormatMemory call.Fixed in v5.
I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
On 09.02.23 00:09, Peter Smith wrote:
I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?
Hi Peter,
My logic there was the following: if program reached that part of the
code it means that the xml_parse() and xmlDocDumpFormatMemory() worked,
which consequently means that the variables doc and xmlbuf are != NULL,
therefore not needing to be checked. Am I missing something?
Thanks a lot for the review!
Best, Jim
Attachments:
On Thu, Feb 9, 2023 at 10:42 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
On 09.02.23 00:09, Peter Smith wrote:
I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?Hi Peter,
My logic there was the following: if program reached that part of the
code it means that the xml_parse() and xmlDocDumpFormatMemory() worked,
which consequently means that the variables doc and xmlbuf are != NULL,
therefore not needing to be checked. Am I missing something?
Thanks. I think I understand it better now -- I expect
xmlDocDumpFormatMemory will cope OK when passed a NULL doc (see this
source [1]), but it will return nbytes of 0, but your code will still
throw ERROR, meaning the guard for doc NULL is necessary for the
PG_CATCH.
In that case, everything LGTM.
~
OTOH, if you are having to check for NULL doc anyway, maybe it's just
as easy only doing that up-front. Then you could quick-exit the
function without calling xmlDocDumpFormatMemory etc. in the first
place. For example:
doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
if (!doc)
return 0;
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
On 09.02.23 02:01, Peter Smith wrote:
OTOH, if you are having to check for NULL doc anyway, maybe it's just
as easy only doing that up-front. Then you could quick-exit the
function without calling xmlDocDumpFormatMemory etc. in the first
place. For example:doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
if (!doc)
return 0;
I see your point. If I got it right, you're suggesting the following
change in the PG_TRY();
PG_TRY();
{
int nbytes;
if(!doc)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
"could not parse the given XML document");
xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
if(!nbytes || xmlerrcxt->err_occurred)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
"could not indent the given XML document");
initStringInfo(&buf);
appendStringInfoString(&buf, (const char *)xmlbuf);
}
.. which will catch the doc == NULL before calling xmlDocDumpFormatMemory.
Is it what you suggest?
Thanks a lot for the thorough review!
Best, Jim
Jim Jones <jim.jones@uni-muenster.de> writes:
I see your point. If I got it right, you're suggesting the following
change in the PG_TRY();
PG_TRY();
{
int nbytes;
if(!doc)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
"could not parse the given XML document");
xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
if(!nbytes || xmlerrcxt->err_occurred)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
"could not indent the given XML document");
initStringInfo(&buf);
appendStringInfoString(&buf, (const char *)xmlbuf);
}
.. which will catch the doc == NULL before calling xmlDocDumpFormatMemory.
Um ... why are you using PG_TRY here at all? It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.
regards, tom lane
On 09.02.23 08:23, Tom Lane wrote:
Um ... why are you using PG_TRY here at all? It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.
My intention was to catch any unexpected error from
xmlDocDumpFormatMemory and handle it properly. But I guess you're right,
I can control the likely error cases by checking doc and nbytes.
You suggest something along these lines?
xmlDocPtr doc;
xmlChar *xmlbuf = NULL;
text *arg = PG_GETARG_TEXT_PP(0);
StringInfoData buf;
int nbytes;
doc = xml_parse(arg, XMLOPTION_DOCUMENT, false,
GetDatabaseEncoding(), NULL);
if(!doc)
elog(ERROR, "could not parse the given XML document");
xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
xmlFreeDoc(doc);
if(!nbytes)
elog(ERROR, "could not indent the given XML document");
initStringInfo(&buf);
appendStringInfoString(&buf, (const char *)xmlbuf);
xmlFree(xmlbuf);
PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
Thanks!
Best, Jim
Attachments:
On 02.02.23 21:35, Jim Jones wrote:
This small patch introduces a XML pretty print function. It basically
takes advantage of the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings.
I suggest we call it "xmlformat", which is an established term for this.
On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
On 09.02.23 08:23, Tom Lane wrote:
Um ... why are you using PG_TRY here at all? It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.My intention was to catch any unexpected error from
xmlDocDumpFormatMemory and handle it properly. But I guess you're right,
I can control the likely error cases by checking doc and nbytes.You suggest something along these lines?
xmlDocPtr doc;
xmlChar *xmlbuf = NULL;
text *arg = PG_GETARG_TEXT_PP(0);
StringInfoData buf;
int nbytes;doc = xml_parse(arg, XMLOPTION_DOCUMENT, false,
GetDatabaseEncoding(), NULL);if(!doc)
elog(ERROR, "could not parse the given XML document");xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
xmlFreeDoc(doc);
if(!nbytes)
elog(ERROR, "could not indent the given XML document");initStringInfo(&buf);
appendStringInfoString(&buf, (const char *)xmlbuf);xmlFree(xmlbuf);
PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
Thanks!
Something like that LGTM, but here are some other minor comments.
======
1.
FYI, there are some whitespace warnings applying the v5 patch
[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:26:
trailing whitespace.
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:29:
trailing whitespace.
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:33:
trailing whitespace.
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:37:
trailing whitespace.
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:41:
trailing whitespace.
warning: squelched 48 whitespace errors
warning: 53 lines add whitespace errors.
======
src/test/regress/sql/xml.sql
2.
The v5 patch was already testing NULL, but it might be good to add
more tests to verify the function is behaving how you want for edge
cases. For example,
+-- XML pretty print: NULL, empty string, spaces only...
SELECT xmlpretty(NULL);
SELECT xmlpretty('');
SELECT xmlpretty(' ');
~~
3.
The function is returning XML anyway, so is the '::xml' casting in
these tests necessary?
e.g.
SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);
======
src/include/catalog/pg_proc.dat
4.
+ { oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlpretty', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlpretty' },
Spurious leading space for this new entry.
======
doc/src/sgml/func.sgml
5.
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>
A spurious blank line in the example after the "(1 row)"
~~~
6.
Does this function docs belong in section 9.15.1 "Producing XML
Content"? Or (since it is not really producing content) should it be
moved to the 9.15.3 "Processing XML" section?
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On 10.02.23 02:10, Peter Smith wrote:
On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
1.
FYI, there are some whitespace warnings applying the v5 patch
Trailing whitespaces removed. The patch applies now without warnings.
======
src/test/regress/sql/xml.sql2.
The v5 patch was already testing NULL, but it might be good to add
more tests to verify the function is behaving how you want for edge
cases. For example,+-- XML pretty print: NULL, empty string, spaces only...
SELECT xmlpretty(NULL);
SELECT xmlpretty('');
SELECT xmlpretty(' ');
Test cases added.
3.
The function is returning XML anyway, so is the '::xml' casting in
these tests necessary?e.g.
SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);
It is indeed not necessary. Most likely I used it for testing and forgot
to remove it afterwards. Now removed.
======
src/include/catalog/pg_proc.dat4.
+ { oid => '4642', descr => 'Indented text from xml', + proname => 'xmlpretty', prorettype => 'xml', + proargtypes => 'xml', prosrc => 'xmlpretty' },Spurious leading space for this new entry.
Removed.
======
doc/src/sgml/func.sgml5. + <foo id="x"> + <bar id="y"> + <var id="z">42</var> + </bar> + </foo> + +(1 row) + +]]></screen>A spurious blank line in the example after the "(1 row)"
Removed.
~~~
6.
Does this function docs belong in section 9.15.1 "Producing XML
Content"? Or (since it is not really producing content) should it be
moved to the 9.15.3 "Processing XML" section?
Moved to the section 9.15.3
Following the suggestion of Peter Eisentraut I renamed the function to
xmlformat().
v6 attached.
Thanks a lot for the review.
Best, Jim
Something is misbehaving.
Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.
But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.
My ./configure looks like this:
./configure --prefix=/usr/local/pg_oss --with-libxml --enable-debug
--enable-cassert --enable-tap-tests CFLAGS="-ggdb -O0 -g3
-fno-omit-frame-pointer"
resulting in:
checking whether to build with XML support... yes
checking for libxml-2.0 >= 2.6.23... yes
~
e.g.(these are a sample of errors)
xml ... FAILED 2561 ms
@@ -344,8 +326,6 @@
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-<twoerrors>&idontexist;</unbalanced>
- ^
SELECT xmlparse(document '<nosuchprefix:tag/>');
xmlparse
---------------------
@@ -1696,14 +1676,8 @@
SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
-
-^
line 1: Document is empty
-
-^
-- XML format: invalid string (whitespaces)
SELECT xmlformat(' ');
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
-
- ^
~~
Separately (but maybe it's related?), the CF-bot test also reported a
failure [1]https://api.cirrus-ci.com/v1/artifact/task/4802219812323328/testrun/build/testrun/regress/regress/regression.diffs with strange error detail differences.
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12
09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-DETAIL: line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL: line 1: Document is empty
^
-- XML format: invalid string (whitespaces)
Kind Regards,
Peter Smith.
Fujitsu Australia
On 13.02.23 02:15, Peter Smith wrote:
Something is misbehaving.
Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.
Yes, I noticed it yesterday ... and I'm not sure how to solve it. It
seems that in the system is returning a different error message in the
FreeBSD patch tester, which is causing a regression test in this
particular OS to fail.
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-DETAIL: line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL: line 1: Document is empty
^
-- XML format: invalid string (whitespaces)
Does anyone know if there is anything I can do to make the error
messages be consistent among different OS?
On 13.02.23 13:15, Jim Jones wrote:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000 +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000 @@ -1695,10 +1695,7 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -DETAIL: line 1: switching encoding : no input - -^ -line 1: Document is empty +DETAIL: line 1: Document is empty^
-- XML format: invalid string (whitespaces)
I couldn't figure out why the error messages are different -- I'm
wondering if the issue is the test environment itself. I just removed
the troubling test case for now
SELECT xmlformat('');
v7 attached.
Thanks for reviewing this patch!
Best, Jim
On Wed, Feb 15, 2023 at 8:55 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
On 13.02.23 13:15, Jim Jones wrote:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000 +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000 @@ -1695,10 +1695,7 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -DETAIL: line 1: switching encoding : no input - -^ -line 1: Document is empty +DETAIL: line 1: Document is empty^
-- XML format: invalid string (whitespaces)I couldn't figure out why the error messages are different -- I'm wondering if the issue is the test environment itself. I just removed the troubling test case for now
SELECT xmlformat('');
v7 attached.
Thanks for reviewing this patch!
Yesterday I looked at those cfbot configs and noticed all those
machines have different versions of libxml.
2.10.3
2.6.23
2.9.10
2.9.13
But I don't if version numbers have any impact on the different error
details or not.
~
The thing that puzzled me most is that in MY environment (CentOS7;
libxml 20901; PG --with-libxml build) I get this behaviour.
- Without your v6 patch 'make check' is all OK.
- With your v6 patch other XML tests (not only yours) of 'make check'
failed with different error messages.
- Similarly, if I keep the v6 patch but just change (in xmlformat) the
#ifdef USE_LIBXML to be #if 0, then only the new xmlformat tests fail,
but the other XML tests are working OK again.
Those results implied to me that this function code (in my environment
anyway) is somehow introducing a side effect causing the *other* XML
tests to fail.
But so far I was unable to identify the reason. Sorry, I don't know
this XML API well enough to help more.
------
Kind Regards,
Peter Smith.
Fujitsu Austalia.
On 14.02.23 23:45, Peter Smith wrote:
Those results implied to me that this function code (in my environment
anyway) is somehow introducing a side effect causing the *other* XML
tests to fail.
I believe I've found the issue. It is probably related to the XML OPTION
settings, as it seems to deliver different error messages when set to
DOCUMENT or CONTENT:
postgres=# SET XML OPTION CONTENT;
SET
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
^
line 1: Document is empty
^
postgres=# SET XML OPTION DOCUMENT;
SET
postgres=# SELECT xmlformat('');
ERROR: invalid XML document
LINE 1: SELECT xmlformat('');
^
DETAIL: line 1: switching encoding : no input
^
line 1: Document is empty
^
v8 attached reintroduces the SELECT xmlformat('') test case and adds SET
XML OPTION DOCUMENT to the regression tests.
Best, Jim