XML test error on Arch Linux
Hello hackers,
i compile postgresql just for fun.
My system is a Arch Linux.
I get after upgrade the libxml2 package (from 2.12.7-1 to 2.13.1-1) test errors for xml:
---
...
ok 200 + largeobject 553 ms
ok 201 + with 819 ms
not ok 202 + xml 1464 ms
# parallel group (15 tests): hash_part predicate partition_info explain reloptions memoize compression partition_merge stats partition_split partition_aggregate tuplesort partition_join partition_prune indexing
ok 203 + partition_merge 1282 ms
ok 204 + partition_split 1533 ms
ok 205 + partition_join 2229 ms
...
ok 221 - fast_default 230 ms
ok 222 - tablespace 498 ms
1..222
# 1 of 222 tests failed.
# The differences that caused some tests to fail can be viewed in the file "/home/frastr/devel/postgresql/src/test/regress/regression.diffs".
# A copy of the test summary that you see above is saved in the file "/home/frastr/devel/postgresql/src/test/regress/regression.out".
make[1]: *** [GNUmakefile:118: check] Fehler 1
make[1]: Verzeichnis „/home/frastr/devel/postgresql/src/test/regress“ wird verlassen
make: *** [GNUmakefile:69: check] Fehler 2
---
What can I do?
Who can help?
Best regards,
Frank
Attachments:
regression.diffsapplication/octet-streamDownload
diff -U3 /home/frastr/devel/postgresql/src/test/regress/expected/xml.out /home/frastr/devel/postgresql/src/test/regress/results/xml.out
--- /home/frastr/devel/postgresql/src/test/regress/expected/xml.out 2024-01-23 17:36:17.156017054 +0100
+++ /home/frastr/devel/postgresql/src/test/regress/results/xml.out 2024-07-05 15:15:23.908490451 +0200
@@ -254,23 +254,13 @@
DETAIL: line 1: xmlParseEntityRef: no name
<invalidentity>&</invalidentity>
^
-line 1: chunk is not well balanced
-<invalidentity>&</invalidentity>
- ^
SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
ERROR: invalid XML content
DETAIL: line 1: Entity 'idontexist' not defined
<undefinedentity>&idontexist;</undefinedentity>
^
-line 1: chunk is not well balanced
-<undefinedentity>&idontexist;</undefinedentity>
- ^
SELECT xmlparse(content '<invalidns xmlns=''<''/>');
- xmlparse
----------------------------
- <invalidns xmlns='<'/>
-(1 row)
-
+ERROR: invalid XML content
SELECT xmlparse(content '<relativens xmlns=''relative''/>');
xmlparse
--------------------------------
@@ -285,15 +275,8 @@
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
<twoerrors>&idontexist;</unbalanced>
^
-line 1: chunk is not well balanced
-<twoerrors>&idontexist;</unbalanced>
- ^
SELECT xmlparse(content '<nosuchprefix:tag/>');
- xmlparse
----------------------
- <nosuchprefix:tag/>
-(1 row)
-
+ERROR: invalid XML content
SELECT xmlparse(document ' ');
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
@@ -525,25 +508,18 @@
SELECT xmlserialize(DOCUMENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
ERROR: not an XML document
SELECT xmlserialize(CONTENT '<foo>73</foo><bar><val x="y">42</val></bar>' AS text INDENT);
- xmlserialize
------------------------
- <foo>73</foo> +
- <bar> +
- <val x="y">42</val>+
- </bar>
+ xmlserialize
+---------------
+ <foo>73</foo>
(1 row)
-- indent non singly-rooted xml with mixed contents
SELECT xmlserialize(DOCUMENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
ERROR: not an XML document
SELECT xmlserialize(CONTENT 'text node<foo>73</foo>text node<bar><val x="y">42</val></bar>' AS text INDENT);
- xmlserialize
-------------------------
- text node +
- <foo>73</foo>text node+
- <bar> +
- <val x="y">42</val> +
- </bar>
+ xmlserialize
+--------------
+ text node
(1 row)
-- indent singly-rooted xml with mixed contents
@@ -1209,15 +1185,13 @@
-- error messages, we suppress the DETAIL in this test.
\set VERBOSITY terse
SELECT xpath('/*', '<invalidns xmlns=''<''/>');
-ERROR: could not parse XML document
+ERROR: invalid XML content at character 20
\set VERBOSITY default
-- Again, the XML isn't well-formed for namespace purposes
SELECT xpath('/*', '<nosuchprefix:tag/>');
-ERROR: could not parse XML document
-DETAIL: line 1: Namespace prefix nosuchprefix on tag is not defined
-<nosuchprefix:tag/>
- ^
-CONTEXT: SQL function "xpath" statement 1
+ERROR: invalid XML content
+LINE 1: SELECT xpath('/*', '<nosuchprefix:tag/>');
+ ^
-- XPath deprecates relative namespaces, but they're not supposed to
-- throw an error, only a warning.
SELECT xpath('/*', '<relativens xmlns=''relative''/>');
On 2024-07-05 15:33 +0200, Frank Streitzig wrote:
My system is a Arch Linux.
I get after upgrade the libxml2 package (from 2.12.7-1 to 2.13.1-1)
test errors for xml:not ok 202 + xml 1464 ms
[...snip...]
# 1 of 222 tests failed.
# The differences that caused some tests to fail can be viewed in the file "/home/frastr/devel/postgresql/src/test/regress/regression.diffs".
# A copy of the test summary that you see above is saved in the file "/home/frastr/devel/postgresql/src/test/regress/regression.out".
make[1]: *** [GNUmakefile:118: check] Fehler 1
make[1]: Verzeichnis „/home/frastr/devel/postgresql/src/test/regress“ wird verlassen
make: *** [GNUmakefile:69: check] Fehler 2
Hmm, I did not get this error after upgrading libxml2 on my Arch machine
a couple of weeks ago. Did you just run make check after upgrading,
without recompiling? Please try make clean && make && make check
--
Erik
My system is a Arch Linux.
I get after upgrade the libxml2 package (from 2.12.7-1 to 2.13.1-1)
test errors for xml:not ok 202 + xml 1464 ms
[...snip...]
# 1 of 222 tests failed.
# The differences that caused some tests to fail can be viewed in the file "/home/frastr/devel/postgresql/src/test/regress/regression.diffs".
# A copy of the test summary that you see above is saved in the file "/home/frastr/devel/postgresql/src/test/regress/regression.out".
make[1]: *** [GNUmakefile:118: check] Fehler 1
make[1]: Verzeichnis „/home/frastr/devel/postgresql/src/test/regress“ wird verlassen
make: *** [GNUmakefile:69: check] Fehler 2Hmm, I did not get this error after upgrading libxml2 on my Arch machine
a couple of weeks ago. Did you just run make check after upgrading,
without recompiling? Please try make clean && make && make check
Hi,
i get same error.
Frank
Import Notes
Resolved by subject fallback
On 2024-07-06 11:57 +0200, Frank Streitzig wrote:
My system is a Arch Linux.
I get after upgrade the libxml2 package (from 2.12.7-1 to 2.13.1-1)
test errors for xml:not ok 202 + xml 1464 ms
[...snip...]
# 1 of 222 tests failed.
# The differences that caused some tests to fail can be viewed in the file "/home/frastr/devel/postgresql/src/test/regress/regression.diffs".
# A copy of the test summary that you see above is saved in the file "/home/frastr/devel/postgresql/src/test/regress/regression.out".
make[1]: *** [GNUmakefile:118: check] Fehler 1
make[1]: Verzeichnis „/home/frastr/devel/postgresql/src/test/regress“ wird verlassen
make: *** [GNUmakefile:69: check] Fehler 2Hmm, I did not get this error after upgrading libxml2 on my Arch machine
a couple of weeks ago. Did you just run make check after upgrading,
without recompiling? Please try make clean && make && make checki get same error.
Ah! I forgot to run ./configure --with-libxml. I thought it was
enabled by default if libxml2 is available. Now I get the same errors.
Also with 2.13.2-1 which is currently still in core-testing.
So, there must be breaking changes in 2.13.0:
https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.13.0
Maybe you can downgrade in the meantime, if you still have 2.12.7 in the
cache:
pacman -U /var/cache/pacman/pkg/libxml2-2.12.7-1-x86_64.pkg.tar.zst
--
Erik
Erik Wienhold <ewie@ewie.name> writes:
So, there must be breaking changes in 2.13.0:
https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.13.0
Yeah, apparently --- I get what look like the same diffs with
libxml2 2.13.0 recently supplied by MacPorts. Grumble.
Somebody's going to have to look into that.
regards, tom lane
On 2024-07-06 16:25 +0200, Tom Lane wrote:
Erik Wienhold <ewie@ewie.name> writes:
So, there must be breaking changes in 2.13.0:
https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.13.0Yeah, apparently --- I get what look like the same diffs with
libxml2 2.13.0 recently supplied by MacPorts. Grumble.
Somebody's going to have to look into that.
Here's a patch that fixes just the xmlserialize and namespace errors.
Use xmlAddChildList instead of xmlAddChild for xmlserialize. That also
works with 2.12.7, but I don't know about older libxml2 versions. Maybe
add a version check to be safe:
#if LIBXML_VERSION >= 21300
xmlAddChildList(root, content_nodes);
#else
xmlAddChild(root, content_nodes);
#endif
I don't know if using xmlAddChild in this context was ever correct.
The namespace errors are tricky because xmlParseBalancedChunkMemory now
returns res_code != 0 for invalid or unknown namespaces (probably other
errors as well). So I just added an additional check to ignore those
errors for >=2.13. But that's rather hackish. I don't know how to
handle it in xml_errorHandler where those error codes are already dealt
with in order to compensate for differences in error reporting across
different libxml2 versions. Looks like xmlerrcxt is ignored by
xmlParseBalancedChunkMemory.
No idea how to deal with the remaining errors for invalid and undefined
entities which appear to include less details now. That seems to be
expected, judging from the release notes:
A few error messages were improved and consolidated. Please update
downstream test suites accordingly.
How to deal with that in a manner that still works for pre-2.13, other
than filtering out those details that are no longer included in 2.13?
Or just \set VERBOSITY terse for those few test cases? But that omits
the entire error detail.
--
Erik
Attachments:
libxml2.13-fixes.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 3e4ca874d8..9eddafde01 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -757,7 +757,7 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent)
/* This attaches root to doc, so we need not free it separately. */
xmlDocSetRootElement(doc, root);
- xmlAddChild(root, content_nodes);
+ xmlAddChildList(root, content_nodes);
/*
* We use this node to insert newlines in the dump. Note: in at
@@ -1842,10 +1842,24 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
parsed_nodes);
if (res_code != 0 || xmlerrcxt->err_occurred)
{
- xml_errsave(escontext, xmlerrcxt,
- ERRCODE_INVALID_XML_CONTENT,
- "invalid XML content");
- goto fail;
+#if LIBXML_VERSION >= 21300
+ /*
+ * libxml2 2.13 reports namespace errors at this point via
+ * res_code instead of xmlerrcxt. Ignore them here so that
+ * we can report the same error message as for libxml2
+ * releases prior to 2.13.
+ */
+ if (res_code != XML_NS_ERR_UNDEFINED_NAMESPACE &&
+ res_code != XML_WAR_NS_URI)
+ {
+#endif
+ xml_errsave(escontext, xmlerrcxt,
+ ERRCODE_INVALID_XML_CONTENT,
+ "invalid XML content");
+ goto fail;
+#if LIBXML_VERSION >= 21300
+ }
+#endif
}
}
}
Erik Wienhold <ewie@ewie.name> writes:
On 2024-07-06 16:25 +0200, Tom Lane wrote:
Yeah, apparently --- I get what look like the same diffs with
libxml2 2.13.0 recently supplied by MacPorts. Grumble.
Somebody's going to have to look into that.
Here's a patch that fixes just the xmlserialize and namespace errors.
One angle that ought to be considered is that some of this stuff may
be flat-out bugs in 2.13.0. I see at
https://gitlab.gnome.org/GNOME/libxml2/-/releases
that both 2.13.1 and 2.13.2 contain fixes for "regressions" in 2.13.0.
I'm disinclined to spend much effort on working around dot-zero bugs.
Use xmlAddChildList instead of xmlAddChild for xmlserialize. That also
works with 2.12.7, but I don't know about older libxml2 versions. Maybe
...
I don't know if using xmlAddChild in this context was ever correct.
Good question. A look at
https://github.com/ConradIrwin/libxml2/blame/master/include/libxml/tree.h
shows that xmlAddChildList has been there for decades, and I also
see in the 2.13.0 notes
* tree: Align xmlAddChild with other node insertion functions
so it sort of looks like this is something that we got away with
but it wasn't right. I'm inclined to just replace the call and see
what the buildfarm says.
The namespace errors are tricky because xmlParseBalancedChunkMemory now
returns res_code != 0 for invalid or unknown namespaces (probably other
errors as well). So I just added an additional check to ignore those
errors for >=2.13. But that's rather hackish. I don't know how to
handle it in xml_errorHandler where those error codes are already dealt
with in order to compensate for differences in error reporting across
different libxml2 versions. Looks like xmlerrcxt is ignored by
xmlParseBalancedChunkMemory.
2.13.0 mentions having added some new error-handler-setting functions;
maybe we need to use one of those to get xmlParseBalancedChunkMemory's
attention now?
No idea how to deal with the remaining errors for invalid and undefined
entities which appear to include less details now.
That's fairly annoying. One answer is to create a variant
expected-file, but the long-term maintenance costs could be high.
On the other hand, our xml support is a bit of a backwater, so
maybe it wouldn't be that bad.
regards, tom lane
I wrote:
One angle that ought to be considered is that some of this stuff may
be flat-out bugs in 2.13.0. I see at
https://gitlab.gnome.org/GNOME/libxml2/-/releases
that both 2.13.1 and 2.13.2 contain fixes for "regressions" in 2.13.0.
I'm disinclined to spend much effort on working around dot-zero bugs.
Meh ... 2.13.1, at least, changes nothing here.
regards, tom lane
On 2024-07-06 20:43 +0200, Tom Lane wrote:
One angle that ought to be considered is that some of this stuff may
be flat-out bugs in 2.13.0. I see athttps://gitlab.gnome.org/GNOME/libxml2/-/releases
that both 2.13.1 and 2.13.2 contain fixes for "regressions" in 2.13.0.
I'm disinclined to spend much effort on working around dot-zero bugs.
Found an open issue about ABI compatibility that affects 2.12.7 and
possibly also 2.13: https://gitlab.gnome.org/GNOME/libxml2/-/issues/751.
Maybe just wait this one out, in case they'll bump SONAME.
2.13.0 mentions having added some new error-handler-setting functions;
maybe we need to use one of those to get xmlParseBalancedChunkMemory's
attention now?
I tried xmlCtxtSetErrorHandler but xmlParseBalancedChunkMemory doesn't
report to that. It's a known issue:
https://gitlab.gnome.org/GNOME/libxml2/-/issues/727
--
Erik
Erik Wienhold <ewie@ewie.name> writes:
On 2024-07-06 20:43 +0200, Tom Lane wrote:
I'm disinclined to spend much effort on working around dot-zero bugs.
Found an open issue about ABI compatibility that affects 2.12.7 and
possibly also 2.13: https://gitlab.gnome.org/GNOME/libxml2/-/issues/751.
Maybe just wait this one out, in case they'll bump SONAME.
Whether they bump SONAME or put back the removed functions, there's no
way that production-oriented distros are going to ship 2.13.x until
something's done about that. That's good news for us, in that it
gives us an excuse to not work around any dot-zero issues that get
fixed before that point. But we'd better analyze our concerns and
make sure anything we don't want to work around gets fixed by then.
I tried xmlCtxtSetErrorHandler but xmlParseBalancedChunkMemory doesn't
report to that. It's a known issue:
https://gitlab.gnome.org/GNOME/libxml2/-/issues/727
I saw that one. It would be good to have a replacement for
xmlParseBalancedChunkMemory, because after looking at the libxml2
sources I realize that that's classed as a SAX1 function, which means
it will likely go away at some point (maybe it's already not there in
some builds). That's a long-term consideration though.
In the short term, I dug in the libxml2 sources and found that
xmlParseBalancedChunkMemory got pretty heavily refactored between 2.12
and 2.13. The reason for our immediate problem is that the return
value in the old code is defined by
if (!ctxt->wellFormed) {
if (ctxt->errNo == 0)
ret = 1;
else
ret = ctxt->errNo;
} else {
ret = 0;
}
whereas the new code just returns ctxt->errNo unconditionally.
This does not agree with the phraseology in libxml2's documentation:
0 if the chunk is well balanced, -1 in case of args problem and
the parser error code otherwise
so I filed an issue at
https://gitlab.gnome.org/GNOME/libxml2/-/issues/765
We'll see what they say about that.
I think we could work around it as attached. This relies on seeing
that the 2.13 code will return a node list if and only if
ctxt->wellFormed is true (and we already eliminated the empty-input
case, so an empty node list shouldn't happen). But it's not a lot
less ugly than your proposal.
Also, this only fixes the two wrong-output-from-xmlserialize
test cases. I'm not so stressed about the cases where the errdetail
changes, but I think we need to find an answer for the places where
it fails and didn't before, like:
SELECT xmlparse(content '<invalidns xmlns=''<''/>');
- xmlparse
----------------------------
- <invalidns xmlns='<'/>
-(1 row)
-
+ERROR: invalid XML content
regards, tom lane
Attachments:
hacky-workaround-for-libxml2-2.13-bug.patchtext/x-diff; charset=us-ascii; name=hacky-workaround-for-libxml2-2.13-bug.patchDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index d75f765de0..d5fa83e736 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -1840,6 +1840,18 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
utf8string + count,
parsed_nodes);
+
+ /*
+ * libxml2 2.13.x incorrectly returns parser errors even if
+ * the chunk is well-balanced. As a workaround, ignore the
+ * return code if a node list was returned.
+ * https://gitlab.gnome.org/GNOME/libxml2/-/issues/765
+ */
+#if LIBXML_VERSION >= 21300
+ if (res_code > 0 && parsed_nodes != NULL)
+ res_code = 0;
+#endif
+
if (res_code != 0 || xmlerrcxt->err_occurred)
{
xml_errsave(escontext, xmlerrcxt,
I wrote:
I think we could work around it as attached. This relies on seeing
that the 2.13 code will return a node list if and only if
ctxt->wellFormed is true (and we already eliminated the empty-input
case, so an empty node list shouldn't happen). But it's not a lot
less ugly than your proposal.
Also, this only fixes the two wrong-output-from-xmlserialize
test cases. I'm not so stressed about the cases where the errdetail
changes, but I think we need to find an answer for the places where
it fails and didn't before, like:
SELECT xmlparse(content '<invalidns xmlns=''<''/>'); - xmlparse ---------------------------- - <invalidns xmlns='<'/> -(1 row) - +ERROR: invalid XML content
Oh! That's actually the same bug, and my patch was faulty because
I didn't think about the case where the caller of xml_parse passes
parsed_nodes = NULL. (And it wasn't doing the right thing in the
other case either :-(.) The attached works significantly better,
and cleans up these bogus errors.
We're still left with missing "chunk is not well balanced" errcontext
entries, which we could live without if we have to, but I wonder why
those are not there.
regards, tom lane
Attachments:
v2-hacky-workaround-for-libxml2-2.13-bug.patchtext/x-diff; charset=us-ascii; name=v2-hacky-workaround-for-libxml2-2.13-bug.patchDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index d75f765de0..311de91110 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -1837,16 +1837,36 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
/* allow empty content */
if (*(utf8string + count))
{
+ xmlNodePtr node_list = NULL;
+
res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
utf8string + count,
- parsed_nodes);
+ &node_list);
+
+ /*
+ * libxml2 2.13.x incorrectly returns parser errors even if
+ * the chunk is well-balanced. As a workaround, ignore the
+ * return code if a node list was returned.
+ * https://gitlab.gnome.org/GNOME/libxml2/-/issues/765
+ */
+#if LIBXML_VERSION >= 21300
+ if (res_code > 0 && node_list != NULL)
+ res_code = 0;
+#endif
+
if (res_code != 0 || xmlerrcxt->err_occurred)
{
+ xmlFreeNodeList(node_list);
xml_errsave(escontext, xmlerrcxt,
ERRCODE_INVALID_XML_CONTENT,
"invalid XML content");
goto fail;
}
+
+ if (parsed_nodes != NULL)
+ *parsed_nodes = node_list;
+ else
+ xmlFreeNodeList(node_list);
}
}
I wrote:
I saw that one. It would be good to have a replacement for
xmlParseBalancedChunkMemory, because after looking at the libxml2
sources I realize that that's classed as a SAX1 function, which means
it will likely go away at some point (maybe it's already not there in
some builds). That's a long-term consideration though.
Actually ... after nosing around in libxml2 some more, I noticed
xmlParseInNodeContext, which is the only other function specified
to parse a Well Balanced Chunk. It requires a context node,
but AFAICS we can just gin up a dummy root node and use that.
It's existed for plenty long enough for our purposes, and it's
not semi-deprecated, and it lacks the bug at hand. So I'm now
thinking about the attached.
As far as the errcontext changes go: I think we have to just bite
the bullet and accept them. It looks like 2.13 has a completely
different mechanism than prior versions for deciding when to issue
XML_ERR_NOT_WELL_BALANCED. And it's not even clear that it's wrong;
for example, in our first failing case
DETAIL: line 1: xmlParseEntityRef: no name
<invalidentity>&</invalidentity>
^
-line 1: chunk is not well balanced
-<invalidentity>&</invalidentity>
- ^
it's kind of hard to argue that the chunk isn't well-balanced.
So we can either suppress errdetails from the expected output,
or set up an additional expected-file. I'm leaning to the
"\set VERBOSITY terse" solution.
regards, tom lane
Attachments:
v3-hacky-workaround-for-libxml2-2.13-bug.patchtext/x-diff; charset=us-ascii; name=v3-hacky-workaround-for-libxml2-2.13-bug.patchDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index d75f765de0..4a5517fd75 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -1822,6 +1822,8 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
}
else
{
+ xmlNodePtr root;
+
doc = xmlNewDoc(version);
if (doc == NULL || xmlerrcxt->err_occurred)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
@@ -1834,19 +1836,39 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
"could not allocate XML document");
doc->standalone = standalone;
+ root = xmlNewNode(NULL, (const xmlChar *) "content-root");
+ if (root == NULL || xmlerrcxt->err_occurred)
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ "could not allocate xml node");
+ /* This attaches root to doc, so we need not free it separately. */
+ xmlDocSetRootElement(doc, root);
+
/* allow empty content */
if (*(utf8string + count))
{
- res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
- utf8string + count,
- parsed_nodes);
- if (res_code != 0 || xmlerrcxt->err_occurred)
+ xmlNodePtr node_list = NULL;
+ xmlParserErrors res;
+
+ res = xmlParseInNodeContext(root,
+ (char *) utf8string + count,
+ strlen((char *) utf8string + count),
+ XML_PARSE_NOENT | XML_PARSE_DTDATTR
+ | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS),
+ &node_list);
+
+ if (res != XML_ERR_OK || xmlerrcxt->err_occurred)
{
+ xmlFreeNodeList(node_list);
xml_errsave(escontext, xmlerrcxt,
ERRCODE_INVALID_XML_CONTENT,
"invalid XML content");
goto fail;
}
+
+ if (parsed_nodes != NULL)
+ *parsed_nodes = node_list;
+ else
+ xmlFreeNodeList(node_list);
}
}
On 2024-07-07 22:43 +0200, Tom Lane wrote:
As far as the errcontext changes go: I think we have to just bite
the bullet and accept them. It looks like 2.13 has a completely
different mechanism than prior versions for deciding when to issue
XML_ERR_NOT_WELL_BALANCED. And it's not even clear that it's wrong;
for example, in our first failing caseDETAIL: line 1: xmlParseEntityRef: no name
<invalidentity>&</invalidentity>
^
-line 1: chunk is not well balanced
-<invalidentity>&</invalidentity>
- ^it's kind of hard to argue that the chunk isn't well-balanced.
So we can either suppress errdetails from the expected output,
or set up an additional expected-file. I'm leaning to the
"\set VERBOSITY terse" solution.
+1 for \set VERBOSITY terse as a last resort.
But it looks to me as if "chunk is not well balanced" is just noise
because libxml2 reports more specific errors before that. For example:
SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
ERROR: invalid XML content
DETAIL: line 1: Entity 'idontexist' not defined
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
<twoerrors>&idontexist;</unbalanced>
^
line 1: chunk is not well balanced
<twoerrors>&idontexist;</unbalanced>
^
Here, "Opening and ending tag mismatch" already covers the unbalanced
closing tag.
So how about just ignoring XML_ERR_NOT_WELL_BALANCED like in the
attached? This also adds test cases for an unclosed tag because I
wanted to see if I can trigger just "chunk is not well balanced", but
without success.
SELECT xmlparse(content '<unclosed>');
ERROR: invalid XML content
DETAIL: line 1: Premature end of data in tag unclosed line 1
<unclosed>
^
line 1: chunk is not well balanced
<unclosed>
^
libxml2 2.13 doesn't report "chunk ..." here either.
There's also this more explicit test case for unbalanced tags:
<parent><child></parent></child>
But I'm not sure if that's really necessary if we already have:
<twoerrors>&idontexist;</unbalanced>
The error messages are the same, except for the additional entity error.
--
Erik
Attachments:
suppress-noisy-error.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 8893be5682..4f45c90f54 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -2114,6 +2114,17 @@ xml_errorHandler(void *data, PgXmlErrorPtr error)
switch (domain)
{
case XML_FROM_PARSER:
+ /*
+ * Suppress errors about not well-balanced elements because libxml2
+ * already reports more specific errors in those cases. So, this
+ * error is redundant noise. Also, libxml2 2.13 no longer reports
+ * this error in every case.
+ */
+ if (error->code == XML_ERR_NOT_WELL_BALANCED)
+ return;
+
+ /* fall through */
+
case XML_FROM_NONE:
case XML_FROM_MEMORY:
case XML_FROM_IO:
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 6500cff885..d6a51f9e38 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -254,17 +254,11 @@ ERROR: invalid XML content
DETAIL: line 1: xmlParseEntityRef: no name
<invalidentity>&</invalidentity>
^
-line 1: chunk is not well balanced
-<invalidentity>&</invalidentity>
- ^
SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
ERROR: invalid XML content
DETAIL: line 1: Entity 'idontexist' not defined
<undefinedentity>&idontexist;</undefinedentity>
^
-line 1: chunk is not well balanced
-<undefinedentity>&idontexist;</undefinedentity>
- ^
SELECT xmlparse(content '<invalidns xmlns=''<''/>');
xmlparse
---------------------------
@@ -283,9 +277,6 @@ DETAIL: line 1: Entity 'idontexist' not defined
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-<twoerrors>&idontexist;</unbalanced>
- ^
-line 1: chunk is not well balanced
<twoerrors>&idontexist;</unbalanced>
^
SELECT xmlparse(content '<nosuchprefix:tag/>');
@@ -294,6 +285,19 @@ SELECT xmlparse(content '<nosuchprefix:tag/>');
<nosuchprefix:tag/>
(1 row)
+SELECT xmlparse(content '<unclosed>');
+ERROR: invalid XML content
+DETAIL: line 1: Premature end of data in tag unclosed line 1
+<unclosed>
+ ^
+SELECT xmlparse(content '<parent><child></parent></child>');
+ERROR: invalid XML content
+DETAIL: line 1: Opening and ending tag mismatch: child line 1 and parent
+<parent><child></parent></child>
+ ^
+line 1: Opening and ending tag mismatch: parent line 1 and child
+<parent><child></parent></child>
+ ^
SELECT xmlparse(document ' ');
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
@@ -352,6 +356,19 @@ SELECT xmlparse(document '<nosuchprefix:tag/>');
<nosuchprefix:tag/>
(1 row)
+SELECT xmlparse(document '<unclosed>');
+ERROR: invalid XML document
+DETAIL: line 1: Premature end of data in tag unclosed line 1
+<unclosed>
+ ^
+SELECT xmlparse(document '<parent><child></parent></child>');
+ERROR: invalid XML document
+DETAIL: line 1: Opening and ending tag mismatch: child line 1 and parent
+<parent><child></parent></child>
+ ^
+line 1: Opening and ending tag mismatch: parent line 1 and child
+<parent><child></parent></child>
+ ^
SELECT xmlpi(name foo);
xmlpi
---------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 953bac09e4..15ccbe1d35 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -77,6 +77,8 @@ SELECT xmlparse(content '<invalidns xmlns=''<''/>');
SELECT xmlparse(content '<relativens xmlns=''relative''/>');
SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
SELECT xmlparse(content '<nosuchprefix:tag/>');
+SELECT xmlparse(content '<unclosed>');
+SELECT xmlparse(content '<parent><child></parent></child>');
SELECT xmlparse(document ' ');
SELECT xmlparse(document 'abc');
@@ -87,6 +89,8 @@ SELECT xmlparse(document '<invalidns xmlns=''<''/>');
SELECT xmlparse(document '<relativens xmlns=''relative''/>');
SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
SELECT xmlparse(document '<nosuchprefix:tag/>');
+SELECT xmlparse(document '<unclosed>');
+SELECT xmlparse(document '<parent><child></parent></child>');
SELECT xmlpi(name foo);
Erik Wienhold <ewie@ewie.name> writes:
So how about just ignoring XML_ERR_NOT_WELL_BALANCED like in the
attached?
Oh, that's a good idea. Given that we know they've changed the
behavior around this at least once, I'm not sure that it's safe
to unconditionally ignore this error --- but we could ignore it
as long as we've already recorded some libxml2 error.
I dug through the 2.13 sources and I think that in that version,
it is actually impossible to get XML_ERR_NOT_WELL_BALANCED without
any prior error. It's only issued if parsing stops short of the
end of input, and that only happens if PARSER_STOPPED() becomes
true, and that only happens if xmlHaltParser() is called, and
all the calls to that seem to follow other errors being issued.
But the 2.12 code looks quite different and I'm not sure that
there's no such code path there; much less that it can't happen
in older versions.
This also adds test cases for an unclosed tag because I
wanted to see if I can trigger just "chunk is not well balanced", but
without success.
No particular objection to adding these.
But I'm not sure if that's really necessary if we already have:
<twoerrors>&idontexist;</unbalanced>
The error messages are the same, except for the additional entity error.
I think the point of that test is different: it's showing that we
actually can report multiple errors out of a single libxml2 parsing
call. Before seeing your message I was thinking we'd have to
"\set VERBOSITY terse" that test, which was annoying me mightily
because it could no longer prove any such thing.
I've updated indri's host to current MacPorts including libxml2
2.13.1, and as expected it's now showing failures:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=indri&dt=2024-07-09%2018%3A17%3A23
I'll push this change in a little bit (still gotta write commit
message) and indri should go back to green. Unless one of the
other animals complains, I'll set about back-patching in a
day or two.
regards, tom lane
I wrote:
I'll push this change in a little bit (still gotta write commit
message) and indri should go back to green. Unless one of the
other animals complains, I'll set about back-patching in a
day or two.
Well, that didn't take long: several animals are reporting
different error text for one or both of those new test cases.
It looks like I did indeed guess wrong about the output for
the libxml2 versions that match xml_2.out; but more
interestingly we have
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=widowbird&dt=2024-07-09%2019%3A30%3A07
@@ -287,7 +287,7 @@
SELECT xmlparse(content '<unclosed>');
ERROR: invalid XML content
-DETAIL: line 1: Premature end of data in tag unclosed line 1
+DETAIL: line 1: chunk is not well balanced
<unclosed>
^
SELECT xmlparse(content '<parent><child></parent></child>');
@@ -358,7 +358,7 @@
SELECT xmlparse(document '<unclosed>');
ERROR: invalid XML document
-DETAIL: line 1: Premature end of data in tag unclosed line 1
+DETAIL: line 1: EndTag: '</' not found
<unclosed>
^
SELECT xmlparse(document '<parent><child></parent></child>');
which proves that whatever version widowbird is running is
indeed capable of emitting XML_ERR_NOT_WELL_BALANCED with
no other error. So my instinct to not suppress that
unconditionally was right.
At the moment I'm thinking that we should just remove those
new test cases again. They are not valuable enough to
justify a new variant expected-file, while suppressing the
errdetail would remove whatever value they do have.
regards, tom lane