Re-add recently-removed tests for ltree and intarray
Hi all,
Some of you may have noticed that some regression tests have been
removed due to some noise in the buildfarm, as of commit 906ea101d0d5.
We did not have time to do something for this release, unfortunately.
It is possible to reproduce the incompatibility by setting
max_stack_depth to a low value, where the first new query of ltree and
intarray would fail, when written in their original shape.
Tom had the idea to switch these two unstable tests to use a balanced
binary tree instead, so that they don't eat the stack but are still able
to cover the recent fixes pushed into the tree.
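Tom's balanced-tree idea can be sketched in C. This is an illustration only, not the attached patch. It builds an OR-query over the integers lo..hi in a query_int-like syntax with explicit parentheses, once as a balanced binary tree and once as a right-deep chain, and then measures the parenthesis nesting. The helpers append(), balanced_query(), right_deep_query() and max_nesting() are hypothetical. The sketch also assumes that each parenthesized pair becomes one operator node in the parsed query tree, so nesting depth stands in for recursion depth.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append s to the NUL-terminated buf (truncates if full). */
static void
append(char *buf, size_t bufsize, const char *s)
{
    size_t len = strlen(buf);

    snprintf(buf + len, bufsize - len, "%s", s);
}

/* Balanced OR-query over lo..hi: halve the range at each level, depth ~ log2(n). */
static void
balanced_query(char *buf, size_t bufsize, int lo, int hi)
{
    char num[16];
    int  mid;

    assert(lo <= hi);
    if (lo == hi)
    {
        snprintf(num, sizeof(num), "%d", lo);
        append(buf, bufsize, num);
        return;
    }
    mid = lo + (hi - lo) / 2;
    append(buf, bufsize, "(");
    balanced_query(buf, bufsize, lo, mid);
    append(buf, bufsize, "|");
    balanced_query(buf, bufsize, mid + 1, hi);
    append(buf, bufsize, ")");
}

/* Right-deep OR-query over lo..hi: (lo|(lo+1|(...|hi))), depth n - 1. */
static void
right_deep_query(char *buf, size_t bufsize, int lo, int hi)
{
    char num[16];

    assert(lo <= hi);
    snprintf(num, sizeof(num), "%d", lo);
    if (lo == hi)
    {
        append(buf, bufsize, num);
        return;
    }
    append(buf, bufsize, "(");
    append(buf, bufsize, num);
    append(buf, bufsize, "|");
    right_deep_query(buf, bufsize, lo + 1, hi);
    append(buf, bufsize, ")");
}

/* Deepest parenthesis nesting found in a query string. */
static int
max_nesting(const char *q)
{
    int depth = 0;
    int deepest = 0;

    for (; *q != '\0'; q++)
    {
        if (*q == '(')
        {
            depth++;
            if (depth > deepest)
                deepest = depth;
        }
        else if (*q == ')')
            depth--;
    }
    return deepest;
}
```

For 1..4 this yields "((1|2)|(3|4))" versus "(1|(2|(3|4)))". With 1024 operands the balanced form nests only 10 levels deep, while the right-deep chain nests 1023 levels. The balanced shape therefore keeps recursion depth logarithmic while still exercising the same operators.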
And this investigation has led me to the attached, to-be-backpatched
down to v14. Even under a low max_stack_depth, these new tests are
stable. I could not see an issue with the two tests added at the
bottom of the ltree tests.
Opinions or comments?
Thanks,
--
Michael
Attachments:
0001-Re-add-regression-tests-for-ltree-and-intarray.patch (text/plain; charset=us-ascii, +72 -1)
Michael Paquier <michael@paquier.xyz> writes:
> Some of you may have noticed that some regression tests have been
> removed due to some noise in the buildfarm, as of commit 906ea101d0d5.
> We did not have time to do something for this release, unfortunately.
> It is possible to reproduce the incompatibility by setting
> max_stack_depth to a low value, where the first new query of ltree and
> intarray would fail, when written in their original shape.
Just to add a little more color to this --- what we discovered after
there was time for some investigation was that:
(a) the stack-overflow failure occurred in the findoprnd() function
of intarray/_int_bool.c or ltree/ltxtquery_io.c.
(b) the failure only appeared on buildfarm members running on ppc64
or s390x. I determined by examining assembly code that ppc64 uses
about 3X as much stack per call level in this function as x86_64;
probably s390x is similar. That was enough to overrun our default
max_stack_depth on these architectures, even though the same case
passed on the machines we'd tested on.
(c) even with minimum max_stack_depth, the test passed using gcc
but not clang. Again examining assembly code, gcc is smart enough
to collapse the tail-recursive calls in findoprnd() into a loop,
causing the original test case's right-deep query tree to consume
essentially zero stack space. clang doesn't do that, at least not
on those arches at default optimization level. You can make gcc
fail too with -O0.
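Point (c) can be illustrated with a simplified C model. It does not reproduce findoprnd() itself, and the Node type and helpers below are hypothetical. The model counts how many call frames are live at the deepest point of a recursive walk over a binary operator tree. It does this twice: once assuming every call pushes a frame (clang, or gcc at -O0), and once assuming the second call, on the right operand, is a tail call that the compiler turns into a jump. Which operand the real tail call handles is an assumption of this model. As a back-of-the-envelope check for (b), the tolerable depth is roughly max_stack_depth divided by the per-level frame size, ignoring the stack already in use when the recursion starts. A frame about 3X larger therefore cuts that depth to about a third.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical operator-tree node; a leaf has no children. */
typedef struct Node
{
    struct Node *left;
    struct Node *right;
} Node;

static Node *
new_node(Node *left, Node *right)
{
    Node *n = malloc(sizeof(Node));

    assert(n != NULL);
    n->left = left;
    n->right = right;
    return n;
}

/* Right-deep: each operator's right operand is the rest of the chain. */
static Node *
build_right_deep(int nleaves)
{
    assert(nleaves >= 1);
    if (nleaves == 1)
        return new_node(NULL, NULL);
    return new_node(new_node(NULL, NULL), build_right_deep(nleaves - 1));
}

/* Balanced: split the operands in half at every level. */
static Node *
build_balanced(int nleaves)
{
    assert(nleaves >= 1);
    if (nleaves == 1)
        return new_node(NULL, NULL);
    return new_node(build_balanced(nleaves / 2),
                    build_balanced(nleaves - nleaves / 2));
}

static void
free_tree(Node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

static int
max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Live frames at the deepest point when every call pushes a new frame. */
static int
frames_plain(const Node *n)
{
    if (n->left == NULL)
        return 1;
    return 1 + max_int(frames_plain(n->left), frames_plain(n->right));
}

/*
 * Live frames when the call on the right operand is a tail call compiled
 * into a jump: it reuses the caller's frame instead of stacking a new one.
 */
static int
frames_tail_call_elim(const Node *n)
{
    if (n->left == NULL)
        return 1;
    return max_int(1 + frames_tail_call_elim(n->left),
                   frames_tail_call_elim(n->right));
}
```

For 1024 operands, the right-deep tree needs 1024 live frames without tail-call elimination but only 2 with it. The balanced tree needs 11 either way. This is why a balanced test query should behave the same regardless of compiler or optimization level.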
So it'd be good to verify on a few oddball platforms that Michael's
new attempt is OK. It should theoretically work, but ...
regards, tom lane
On Fri, May 15, 2026 at 9:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (b) the failure only appeared on buildfarm members running on ppc64
> or s390x. I determined by examining assembly code that ppc64 uses
> about 3X as much stack per call level in this function as x86_64;
> probably s390x is similar. That was enough to overrun our default
> max_stack_depth on these architectures, even though the same case
> passed on the machines we'd tested on.
FWIW, I tried to reproduce with the former new tests un-reverted, and
didn't see stack overflow on the following, so unless I fat-fingered
that, I wonder if there's something more specific to the previously
failing members:
ppc64le / gcc 8.5 / Linux kernel 4.18
S390X / gcc 13.3 / Linux kernel 6.8
--
John Naylor
Amazon Web Services
John Naylor <johncnaylorls@gmail.com> writes:
> FWIW, I tried to reproduce with the former new tests un-reverted, and
> didn't see stack overflow on the following, so unless I fat-fingered
> that I wonder if there's something more specific on the previously
> failing members:
> ppc64le / gcc 8.5 / Linux kernel 4.18
> S390X / gcc 13.3 / Linux kernel 6.8
Hm, did you use -O0 ?
regards, tom lane
On Thu, May 14, 2026 at 10:49:38PM -0400, Tom Lane wrote:
> John Naylor <johncnaylorls@gmail.com> writes:
>> FWIW, I tried to reproduce with the former new tests un-reverted, and
>> didn't see stack overflow on the following, so unless I fat-fingered
>> that I wonder if there's something more specific on the previously
>> failing members:
>> ppc64le / gcc 8.5 / Linux kernel 4.18
>> S390X / gcc 13.3 / Linux kernel 6.8
> Hm, did you use -O0 ?
Yeah, that should matter. I don't immediately see why the new tests
would fail, and unfortunately I don't have these environments at hand
to double-check things, so I am going to take a bet on HEAD first.
If things work there, I'll do the backpatch.
--
Michael
On Fri, May 15, 2026 at 9:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> John Naylor <johncnaylorls@gmail.com> writes:
>> FWIW, I tried to reproduce with the former new tests un-reverted, and
>> didn't see stack overflow on the following, so unless I fat-fingered
>> that I wonder if there's something more specific on the previously
>> failing members:
>> ppc64le / gcc 8.5 / Linux kernel 4.18
>> S390X / gcc 13.3 / Linux kernel 6.8
> Hm, did you use -O0 ?
I just now tried -O0 on yesterday's master with ppc64le, with the
previous new tests re-added, and it did fail. Then, pulled in master
with the tests just now committed, and it passed.
--
John Naylor
Amazon Web Services
On Fri, May 15, 2026 at 01:33:53PM +0700, John Naylor wrote:
> I just now tried -O0 on yesterday's master with ppc64le, with the
> previous new tests re-added, and it did fail. Then, pulled in master
> with the tests just now committed, and it passed.
Yay!
Thanks for checking. So far, 3 buildfarm animals running on ppc have
run the new tests on HEAD, and all of them passed. They are not the
ones that failed previously, so I'm holding off on the backpatch a bit
longer.
--
Michael
On Fri, May 15, 2026 at 03:54:59PM +0900, Michael Paquier wrote:
> Thanks for checking. 3 animals running on ppc have currently
> passed with the new tests on HEAD, all passing. They are not the ones
> that have failed previously, so I'm still holding a bit longer..
The backpatch down to v14 was done a couple of hours ago, and the
buildfarm looks happy with all the ppc members across the board. We
are done here. Thanks all for the feedback.
--
Michael