Use \if/\endif to remove non-libxml2 expected output in regression tests
Hi all,
While dealing with libxml2 things today, I have been reminded that we
have a regression test output for the case where we don't compile with
libxml2, which is error-prone (I also forgot xml_1.out; and managed to
forget about xml_2.out until massasauga has reminded me about it).
The only location where we use USE_LIBXML is in the backend's xml.c,
so we need some imagination for an \if query that can trigger an early
exit. And it is simply possible to check if we have some data in the
table where the initial inserts are done at the top of the test, which
is what the attached patch is doing.
With this patch in place, we don't need to care about updating the
output file when not building with libxml2, leading to these numbers:
4 files changed, 16 insertions(+), 1482 deletions(-)
I'd like to backpatch that down to v14, to ease future maintenance.
Any comments and/or objections?
--
Michael
Attachments:
0001-Skip-xml-tests-when-not-compiling-with-libxml2.patchtext/plain; charset=us-asciiDownload+16-1388
On 12 Jun 2026, at 09:49, Michael Paquier <michael@paquier.xyz> wrote:
I'd like to backpatch that down to v14, to ease future maintenance.
Any comments and/or objections?
Skipping tests when the underlying library/platform support is missing is in
line with how the TAP tests already do it. We would lose coverage of invoking
XML functions without libxml support, which risk hiding bugs in the error-
handling (which is already the case with the TAP tests etc). For libxml that
might be acceptable since the error paths are quite small (and by now
welltested) and the XML code isn't really moving all that much.
I think there needs to be a big comment explaining the skip_test clause though.
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
On 12 Jun 2026, at 09:49, Michael Paquier <michael@paquier.xyz> wrote:
I'd like to backpatch that down to v14, to ease future maintenance.
Any comments and/or objections?
Skipping tests when the underlying library/platform support is missing is in
line with how the TAP tests already do it. We would lose coverage of invoking
XML functions without libxml support, which risk hiding bugs in the error-
handling (which is already the case with the TAP tests etc).
Yeah. Also, I think this isn't really moving the support goalposts
very far, because xml_2.out will still require maintenance whenever
we touch xml.out, and that's the harder file to update if you don't
have a suitable version of libxml2 at hand. So I'm not in favor of
this as it stands. If we could find a way to get rid of xml_2.out
as well, then maybe dropping coverage of the error paths would be a
good tradeoff.
Diff'ing xml.out and xml_2.out, I see the differences are omissions
of input fragments in some error reports, eg:
@@ -335,8 +323,6 @@
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-<twoerrors>&idontexist;</unbalanced>
- ^
SELECT xmlparse(document '<nosuchprefix:tag/>');
xmlparse
---------------------
I wonder if anyone's ever looked into exactly why that happens and
whether we could do something to restore the missing context.
regards, tom lane
I wrote:
Diff'ing xml.out and xml_2.out, I see the differences are omissions
of input fragments in some error reports, eg:
...
I wonder if anyone's ever looked into exactly why that happens and
whether we could do something to restore the missing context.
The commit that added xml_2.out said:
commit 085423e3e326da1b52f41aa86126f2a064a7db25
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Dec 11 19:08:40 2015 -0500
Add an expected-file to match behavior of latest libxml2.
Recent releases of libxml2 do not provide error context reports for errors
detected at the very end of the input string. This appears to be a bug, or
at least an infelicity, introduced by the fix for libxml2's CVE-2015-7499.
We can hope that this behavioral change will get undone before too long;
but the security patch is likely to spread a lot faster/further than any
follow-on cleanup, which means this behavior is likely to be present in the
wild for some time to come. As a stopgap, add a variant regression test
expected-file that matches what you get with a libxml2 that acts this way.
It's kind of sad that this issue is still present in the wild ten
years later. But anyway, I wonder if we could dodge the issue
simply by modifying these error-provoking cases to have some
whitespace at the end of the XML input? I don't think I have
a problematic version of libxml2 to test that with, though.
regards, tom lane
I wrote:
It's kind of sad that this issue is still present in the wild ten
years later. But anyway, I wonder if we could dodge the issue
simply by modifying these error-provoking cases to have some
whitespace at the end of the XML input? I don't think I have
a problematic version of libxml2 to test that with, though.
Hah, this idea does mostly work. Finding that no system I had at
hand had a libxml2 old enough to exhibit the problem, I installed
libxml2 2.9.3 from source. (That's the first version containing
the problematic CVE fix, and it does behave as xml_2.out expects.)
I found that four of the inconsistent messages could be fixed this
way. The other three problem cases have input like '<wrong' or
all-spaces, and evidently we need the error to be recognized before
reaching end-of-input.
Now, it seems to me that we are not especially interested in whether
libxml2 produces one error message or another one for given input;
what we need to test is only that Postgres reports the error
properly. So I don't have a problem with tweaking those bad inputs
a bit more aggressively to make them be something that different
libxml2 versions will report identically. The result of my
experiments is attached: it results in expected/xml.out that matches
the output of both 2.9.3 and recent versions. So if we do this
we could drop xml_2.out.
This patch is WIP because I've not updated xml_1.out to match.
But that's trivial, and dropping xml_2.out is equally boring,
so I left those parts out of this demo patch.
regards, tom lane
Attachments:
unify-libxml-error-results-wip.patchtext/x-diff; charset=us-ascii; name=unify-libxml-error-results-wip.patchDownload+32-32
On Fri, Jun 12, 2026 at 07:21:28PM -0400, Tom Lane wrote:
Hah, this idea does mostly work. Finding that no system I had at
hand had a libxml2 old enough to exhibit the problem, I installed
libxml2 2.9.3 from source. (That's the first version containing
the problematic CVE fix, and it does behave as xml_2.out expects.)
I found that four of the inconsistent messages could be fixed this
way. The other three problem cases have input like '<wrong' or
all-spaces, and evidently we need the error to be recognized before
reaching end-of-input.
Done that as well here, where I had to use a few switches on my Fedora
44 box to make something that I can use. For reference after checking
out at v2.9.3:
autoreconf -i
./configure --prefix=/path/to/libxml/2.9.3/ \
--without-threads --without-thread-alloc
make install
Now, it seems to me that we are not especially interested in whether
libxml2 produces one error message or another one for given input;
what we need to test is only that Postgres reports the error
properly. So I don't have a problem with tweaking those bad inputs
a bit more aggressively to make them be something that different
libxml2 versions will report identically. The result of my
experiments is attached: it results in expected/xml.out that matches
the output of both 2.9.3 and recent versions. So if we do this
we could drop xml_2.out.
Nice. Confirmed that this is working here.
This patch is WIP because I've not updated xml_1.out to match.
But that's trivial, and dropping xml_2.out is equally boring,
so I left those parts out of this demo patch.
I would also like to drop xml_1.out with the previous trick \if, but
at the end if you feel that there is still value in keeping some
coverage for the multiple NO_XML_SUPPORT() paths, that's fine by me.
Removing half of the regression output update pain is still better
than removing none of it.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
I would also like to drop xml_1.out with the previous trick \if, but
at the end if you feel that there is still value in keeping some
coverage for the multiple NO_XML_SUPPORT() paths, that's fine by me.
Removing half of the regression output update pain is still better
than removing none of it.
I think it's worth covering, say, one of those paths, but we don't
have to hit every one of the duplicate copies. So what do you think
of putting the early-exit \if after the first test that produces
one of those failures?
regards, tom lane
On Sun, Jun 14, 2026 at 07:12:24PM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
I would also like to drop xml_1.out with the previous trick \if, but
at the end if you feel that there is still value in keeping some
coverage for the multiple NO_XML_SUPPORT() paths, that's fine by me.
Removing half of the regression output update pain is still better
than removing none of it.I think it's worth covering, say, one of those paths, but we don't
have to hit every one of the duplicate copies. So what do you think
of putting the early-exit \if after the first test that produces
one of those failures?
Positioning the \if after the three initial INSERT queries (\if query
that counts the number of tuples inserted), like in the patch I have
posted upthread, is able to hit NO_XML_SUPPORT() in xmlin(). That
would be enough. Would you agree with that?
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Sun, Jun 14, 2026 at 07:12:24PM -0400, Tom Lane wrote:
I think it's worth covering, say, one of those paths, but we don't
have to hit every one of the duplicate copies. So what do you think
of putting the early-exit \if after the first test that produces
one of those failures?
Positioning the \if after the three initial INSERT queries (\if query
that counts the number of tuples inserted), like in the patch I have
posted upthread, is able to hit NO_XML_SUPPORT() in xmlin(). That
would be enough. Would you agree with that?
Works for me.
regards, tom lane
On Sun, Jun 14, 2026 at 07:33:54PM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
Positioning the \if after the three initial INSERT queries (\if query
that counts the number of tuples inserted), like in the patch I have
posted upthread, is able to hit NO_XML_SUPPORT() in xmlin(). That
would be enough. Would you agree with that?Works for me.
Okay. So fusioning my stuff, your stuff and Daniel's comment
suggestion, I get the attached. That's working here with my system's
libxml2 (2.12.10), libxml2 2.9.3 and without libxml2.
Does that look right?
--
Michael
Attachments:
v2-0001-Trim-regression-test-expected-output-for-xml.patchtext/plain; charset=us-asciiDownload+52-3054
Michael Paquier <michael@paquier.xyz> writes:
Okay. So fusioning my stuff, your stuff and Daniel's comment
suggestion, I get the attached. That's working here with my system's
libxml2 (2.12.10), libxml2 2.9.3 and without libxml2.
Does that look right?
Locally, I'd changed the line
+INSERT INTO xmltest VALUES (3, '<value>two</wrong> ');
to be
+INSERT INTO xmltest VALUES (3, '<value>three</wrong> ');
after sending my 1.0 patch; it was basically a copy-and-paste
error that it wasn't "three" already. That would have follow-on
effects in xml.out and xml_1.out, so if you don't feel like
adopting it, I won't complain hard. No other suggestions.
regards, tom lane
On Sun, Jun 14, 2026 at 08:56:29PM -0400, Tom Lane wrote:
after sending my 1.0 patch; it was basically a copy-and-paste
error that it wasn't "three" already. That would have follow-on
effects in xml.out and xml_1.out, so if you don't feel like
adopting it, I won't complain hard. No other suggestions.
Makes sense. Thanks for double-checking.
--
Michael