sys_siglist[] is causing us trouble again
As of a couple days ago, buildfarm member caiman (Fedora rawhide)
is failing like this in all the pre-v12 branches:
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2 -DFRONTEND -I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o wait_error.o wait_error.c
wait_error.c: In function ‘wait_result_to_str’:
wait_error.c:71:6: error: ‘sys_siglist’ undeclared (first use in this function)
   71 | sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
      | ^~~~~~~~~~~
wait_error.c:71:6: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [<builtin>: wait_error.o] Error 1
We haven't changed anything, ergo something changed at the OS level.
Oddly, we'd not get to this code unless configure set
HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*. I suspect the root
issue here is some rearrangement of system header files combined with
wait_error.c (and maybe other places?) not including exactly the same
headers that configure tested.
Anyway, rather than installing rawhide and trying to debug this,
I'd like to make a modest proposal: let's back-patch the v12
patches that made us stop relying on sys_siglist[], viz. a73d08319
and cc92cca43. Per the discussions that led to those patches,
it's been decades since any platform didn't have POSIX-compliant
strsignal(), so we'd be much better off relying on that.
regards, tom lane
On Wed, Jul 15, 2020 at 7:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We haven't changed anything, ergo something changed at the OS level.
I believe it's related to these recent glibc changes in rawhide:
https://src.fedoraproject.org/rpms/glibc/c/0aab7eb58528999277c626fc16682da179de03d0?branch=master
- signal: Move sys_errlist to a compat symbol
- signal: Move sys_siglist to a compat symbol
SHA512 (glibc-2.31.9000-683-gffb17e7ba3.tar.xz) =
103ff3c04de5dc149df93e5399de1630f6fff1b8d7f127881d6e530492b8b953a8064205ceecb311a77c0a10de3a5ab2056121fd1fa833a30327c6b1f08beacc
On Thu, Jul 16, 2020 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We haven't changed anything, ergo something changed at the OS level.
> Oddly, we'd not get to this code unless configure set
> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*. I suspect the root
> issue here is some rearrangement of system header files combined with
> wait_error.c (and maybe other places?) not including exactly the same
> headers that configure tested.
It looks like glibc very recently decided[1] to hide the declaration,
but we're using a cached configure test result. I guess rawhide is
the RH thing that tracks the bleeding edge?
> Anyway, rather than installing rawhide and trying to debug this,
> I'd like to make a modest proposal: let's back-patch the v12
> patches that made us stop relying on sys_siglist[], viz. a73d08319
> and cc92cca43. Per the discussions that led to those patches,
> it's been decades since any platform didn't have POSIX-compliant
> strsignal(), so we'd be much better off relying on that.
Seems sensible. Despite the claims of the glibc manual[2], it's not
really a GNU extension, and the BSDs have had it for decades.
[1]: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=b1ccfc061feee9ce616444ded8e1cd5acf9fa97f
[2]: https://www.gnu.org/software/libc/manual/html_node/Signal-Messages.html
Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Oddly, we'd not get to this code unless configure set
>> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.
> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.
Ah, of course. I was thinking that Peter had just changed configure
in the last day or so, but that did not affect the back branches.
So it's unsurprising for buildfarm animals to be using cached configure
results.
> I guess rawhide is the RH thing that tracks the bleeding edge?
Yup. Possibly we should recommend that buildfarm owners running on
non-stable platforms disable autoconf result caching --- I believe
that's "use_accache => undef" in the configuration file.
Alternatively, maybe it'd be bright for the buildfarm script to
discard that cache after any failure (or at least configure or
build failures).
regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We haven't changed anything, ergo something changed at the OS level.
> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.
Right. So, modulo the mis-cached result, what would happen if we do
nothing is that the back branches would lose the ability to translate
signal numbers to strings on bleeding-edge glibc. I don't think we
want that, so we need to back-patch. Attached is a lightly tested
patch for v11. (This includes 7570df0f3 as well, so that
pgstrsignal.c will be the same in all branches.)
regards, tom lane
Attachments:
replace-sys_siglist-with-strsignal-v11.patch (text/x-diff, +94 -84)
On 7/15/20 7:36 PM, Tom Lane wrote:
>> I guess rawhide is the RH thing that tracks the bleeding edge?
> Yup. Possibly we should recommend that buildfarm owners running on
> non-stable platforms disable autoconf result caching --- I believe
> that's "use_accache => undef" in the configuration file.
> Alternatively, maybe it'd be bright for the buildfarm script to
> discard that cache after any failure (or at least configure or
> build failures).
Yeah, these lines will be added to run_build.pl in the upcoming client
code release. Search for 'obsolete' and you'll find where to put them if
you want to be ahead of the curve.
    my $last_stage = get_last_stage() || "";
    $obsolete ||=
      $last_stage =~ /^(Make|Configure|Contrib|.*-build)$/;
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services