pltcl crash on recent macOS

Started by Peter Eisentraut over 3 years ago, 11 messages
#1 Peter Eisentraut
peter.eisentraut@enterprisedb.com

A little while ago, the pltcl tests started crashing for me on macOS.
I don't know what had changed, but I suspect it was either an operating
system update or something like an xcode update.

Here is a backtrace:

* frame #0: 0x00007ff7b0e61853
frame #1: 0x00007ff803a28751 libsystem_c.dylib`hash_search + 215
frame #2: 0x0000000110357700
pltcl.so`compile_pltcl_function(fn_oid=16418, tgreloid=0,
is_event_trigger=false, pltrusted=true) at pltcl.c:1418:13
frame #3: 0x0000000110355d50
pltcl.so`pltcl_func_handler(fcinfo=0x00007fb6f1817028,
call_state=0x00007ff7b0e61b80, pltrusted=true) at pltcl.c:814:12
...

Note that the hash_search call goes into some system library, not postgres.

The command to link pltcl is:

gcc ... -ltcl8.6 -lz -lpthread -framework CoreFoundation -lc
-bundle_loader ../../../src/backend/postgres

Notice the -lc in there. If I remove that, it works again.

The -lc is explicitly added in src/pl/tcl/Makefile, so it's our own
doing. I tracked this back, and it's been moved and rearranged in that
makefile a number of times. The original addition was

commit e3909672f12e0ddf3e202b824fda068ad2195ef2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon Dec 14 00:46:49 1998

Build pltcl.so correctly on platforms that want dependent
shared libraries to be listed in the link command.

Has anyone else seen this?

Note, I'm using the tcl-tk package from Homebrew. The tcl installation
provided by macOS itself no longer appears to work for linking against.

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#1)
Re: pltcl crash on recent macOS

On Mon, Jun 13, 2022 at 6:53 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

frame #1: 0x00007ff803a28751 libsystem_c.dylib`hash_search + 215
frame #2: 0x0000000110357700
pltcl.so`compile_pltcl_function(fn_oid=16418, tgreloid=0,

Hmm, I can’t reproduce that…. although that symbol is present in my
libSystem.B.dylib according to dlsym() and callable from a simple
program not linked to anything else, pltcl.so is apparently reaching
postgres’s hash_search for me, based on the fact that make -C
src/pl/tcl check succeeds and nm -m on pltcl.so shows it as "from
executable". It would be interesting to see what nm -m shows for you.
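
As a rough stand-in for the dlsym() check described above, here is a minimal Python sketch. It assumes a POSIX system, where ctypes.CDLL(None) opens the running process itself and attribute lookup on that handle behaves like a dlsym() search. The helper name symbol_visible is hypothetical and appears nowhere in PostgreSQL or Tcl. Whether hash_search is found depends on the platform's libc, so no result is claimed here.

```python
import ctypes


def symbol_visible(name: str) -> bool:
    """Report whether `name` resolves in this process's global symbol namespace.

    Assumption: POSIX platform, where ctypes.CDLL(None) wraps dlopen(NULL).
    """
    process = ctypes.CDLL(None)
    try:
        # Attribute access triggers a dlsym()-style lookup on the handle.
        getattr(process, name)
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    # strlen is a sanity check; hash_search is the symbol from the backtrace.
    for sym in ("strlen", "hash_search"):
        print(sym, symbol_visible(sym))
```

This only shows that a symbol is exported by some library loaded into the process. It says nothing about which definition pltcl.so was bound to at link time, which is what nm -m reports.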

Archeological note: That hash_search stuff, header <strhash.h>, seems
to have been copied from ancient FreeBSD before it was dropped
upstream for the crime of polluting the global symbol namespace with
junk[1]. It's been languishing in Apple's libc for at least 19
years[2], though, so I'm not sure why it's showing up suddenly as a
problem for you now.

Note, I'm using the tcl-tk package from Homebrew. The tcl installation
provided by macOS itself no longer appears to work for linking against.

I’m using tcl 8.6.12 installed by MacPorts on macOS 12.4, though, hmm,
SDK 12.3. I see the explicit -lc when building pltcl.so, and I see
that libSystem.B.dylib is explicitly mentioned here, whether or not I
have -lc:

% otool -L ./tmp_install/Users/tmunro/install/lib/postgresql/pltcl.so
./tmp_install/Users/tmunro/install/lib/postgresql/pltcl.so:
/opt/local/lib/libtcl8.6.dylib (compatibility version 8.6.0, current
version 8.6.12)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
version 1311.100.3)
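
The otool -L listing above can be reduced to a plain list of dependent libraries. This makes it easier to compare builds with and without -lc. The following is a sketch only: linked_dylibs is a hypothetical helper, and the pattern is inferred from the output quoted in this message, not from any otool specification.

```python
import re


def linked_dylibs(otool_output: str) -> list[str]:
    """Extract library paths from `otool -L` style text."""
    # Collapse whitespace first so mail-wrapped entries become one line each.
    text = " ".join(otool_output.split())
    return re.findall(r"(\S+)\s\(compatibility version", text)


SAMPLE = """\
./tmp_install/Users/tmunro/install/lib/postgresql/pltcl.so:
/opt/local/lib/libtcl8.6.dylib (compatibility version 8.6.0, current
version 8.6.12)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
version 1311.100.3)
"""

if __name__ == "__main__":
    for lib in linked_dylibs(SAMPLE):
        print(lib)
```

Since libSystem.B.dylib appears in the list whether or not -lc is given, the dependency list alone does not reveal where hash_search gets bound.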

Here’s the complete link line:

ccache cc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla
-Werror=unguarded-availability-new -Wendif-labels
-Wmissing-format-attribute -Wcast-function-type -Wformat-security
-fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument
-Wno-compound-token-split-by-macro -g -O0 -bundle -multiply_defined
suppress -o pltcl.so pltcl.o -L../../../src/port
-L../../../src/common -isysroot
/Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk
-Wl,-dead_strip_dylibs -L/opt/local/lib -ltcl8.6 -lz -lpthread
-framework CoreFoundation -lc -bundle_loader
../../../src/backend/postgres

[1]: https://github.com/freebsd/freebsd-src/commit/dc196afb2e58dd05cd66e2da44872bb3d619910f
[2]: https://github.com/apple-open-source-mirror/Libc/blame/master/stdlib/FreeBSD/strhash.c

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#2)
Re: pltcl crash on recent macOS

Thomas Munro <thomas.munro@gmail.com> writes:

On Mon, Jun 13, 2022 at 6:53 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

frame #1: 0x00007ff803a28751 libsystem_c.dylib`hash_search + 215
frame #2: 0x0000000110357700
pltcl.so`compile_pltcl_function(fn_oid=16418, tgreloid=0,

Hmm, I can’t reproduce that….

I can't either, although I'm using the macOS-provided Tcl code,
which still works fine for me. (I grant that Apple might desupport
that someday, but they haven't yet.) sifaka and longfin aren't
unhappy either; although sifaka is close to identical to my laptop.

Having said that, I wonder whether the position of the -bundle_loader
switch in the command line is relevant to which way the hash_search
reference is resolved. Seems like we could put it in front of the
various -l options if that'd help.

regards, tom lane

#4 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Thomas Munro (#2)
Re: pltcl crash on recent macOS

On 13.06.22 13:27, Thomas Munro wrote:

On Mon, Jun 13, 2022 at 6:53 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

frame #1: 0x00007ff803a28751 libsystem_c.dylib`hash_search + 215
frame #2: 0x0000000110357700
pltcl.so`compile_pltcl_function(fn_oid=16418, tgreloid=0,

Hmm, I can’t reproduce that…. although that symbol is present in my
libSystem.B.dylib according to dlsym() and callable from a simple
program not linked to anything else, pltcl.so is apparently reaching
postgres’s hash_search for me, based on the fact that make -C
src/pl/tcl check succeeds and nm -m on pltcl.so shows it as "from
executable". It would be interesting to see what nm -m shows for you.

...
(undefined) external _get_call_result_type (from executable)
(undefined) external _getmissingattr (from executable)
(undefined) external _hash_create (from libSystem)
(undefined) external _hash_search (from libSystem)
...
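
To make this check repeatable, the nm -m lines can be scanned for symbols that should come from the postgres executable but are bound elsewhere. This is a minimal sketch: the helper names are hypothetical, and the regular expression assumes the line shape shown in the excerpt above.

```python
import re

# Matches lines such as: (undefined) external _hash_search (from libSystem)
NM_LINE = re.compile(
    r"\(undefined\)\s+(?:weak\s+)?external\s+(\S+)\s+\(from\s+([^)]+)\)"
)


def undefined_symbol_origins(nm_output: str) -> dict[str, str]:
    """Map each undefined external symbol to the image nm says it comes from."""
    return {m.group(1): m.group(2) for m in NM_LINE.finditer(nm_output)}


def bound_elsewhere(nm_output: str, names: list[str]) -> dict[str, str]:
    """Return the listed symbols that are bound to something other than the executable."""
    origins = undefined_symbol_origins(nm_output)
    return {n: origins[n] for n in names if n in origins and origins[n] != "executable"}


SAMPLE = """\
(undefined) external _get_call_result_type (from executable)
(undefined) external _getmissingattr (from executable)
(undefined) external _hash_create (from libSystem)
(undefined) external _hash_search (from libSystem)
"""

if __name__ == "__main__":
    print(bound_elsewhere(SAMPLE, ["_hash_create", "_hash_search", "_getmissingattr"]))
```

In practice the input would be the captured output of nm -m pltcl.so, and the list of names would be the backend functions that pltcl.c calls.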

I’m using tcl 8.6.12 installed by MacPorts on macOS 12.4, though, hmm,
SDK 12.3. I see the explicit -lc when building pltcl.so, and I see
that libSystem.B.dylib is explicitly mentioned here, whether or not I
have -lc:

% otool -L ./tmp_install/Users/tmunro/install/lib/postgresql/pltcl.so
./tmp_install/Users/tmunro/install/lib/postgresql/pltcl.so:
/opt/local/lib/libtcl8.6.dylib (compatibility version 8.6.0, current
version 8.6.12)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
version 1311.100.3)

Looks the same here:

pltcl.so:
/usr/local/opt/tcl-tk/lib/libtcl8.6.dylib (compatibility version 8.6.0,
current version 8.6.12)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
version 1311.100.3)

Here’s the complete link line:

ccache cc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla
-Werror=unguarded-availability-new -Wendif-labels
-Wmissing-format-attribute -Wcast-function-type -Wformat-security
-fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument
-Wno-compound-token-split-by-macro -g -O0 -bundle -multiply_defined
suppress -o pltcl.so pltcl.o -L../../../src/port
-L../../../src/common -isysroot
/Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk
-Wl,-dead_strip_dylibs -L/opt/local/lib -ltcl8.6 -lz -lpthread
-framework CoreFoundation -lc -bundle_loader
../../../src/backend/postgres

The difference is that I use CC=gcc-11. If I change to CC=cc, then it
works (nm output shows "from executable"). So it's gcc that gets thrown
off by the -lc.

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#4)
Re: pltcl crash on recent macOS

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

The difference is that I use CC=gcc-11. If I change to CC=cc, then it
works (nm output shows "from executable"). So it's gcc that gets thrown
off by the -lc.

Hah, that makes sense. So does changing the option order help?

regards, tom lane

#6 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Tom Lane (#3)
Re: pltcl crash on recent macOS

On 13.06.22 18:01, Tom Lane wrote:

Having said that, I wonder whether the position of the -bundle_loader
switch in the command line is relevant to which way the hash_search
reference is resolved. Seems like we could put it in front of the
various -l options if that'd help.

Switching the order of -bundle_loader and -lc did not help.

#7 Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#4)
Re: pltcl crash on recent macOS

On Tue, Jun 14, 2022 at 8:21 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

The difference is that I use CC=gcc-11. If I change to CC=cc, then it
works (nm output shows "from executable"). So it's gcc that gets thrown
off by the -lc.

Hrmph, I changed my CC to "ccache gcc-mp-11" (what MacPorts calls GCC
11), and I still can't reproduce the problem. I still get "(from
executable)". In your original quote you showed "gcc", not "gcc-11",
which (assuming it is found as /usr/bin/gcc) is just a little binary
that redirects to clang... trying that, this time without ccache in
the mix... and still no cigar. So something is different about GCC 11
from homebrew, or the linker invocation it produces under the covers,
or the linker it's using?

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#6)
Re: pltcl crash on recent macOS

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

Switching the order of -bundle_loader and -lc did not help.

Meh. Well, it was worth a try.

I'd be okay with just dropping the -lc from pl/tcl/Makefile and seeing
what the buildfarm says. The fact that we needed it in 1998 doesn't
mean that we still need it on supported versions of Tcl; nor was it
ever anything but a hack for us to be overriding what TCL_LIBS says.

As a quick check, I tried it on prairiedog's host (which has the oldest
Tcl installation I still have in captivity), and it seemed fine.

regards, tom lane

#9 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Thomas Munro (#7)
Re: pltcl crash on recent macOS

On 13.06.22 23:32, Thomas Munro wrote:

Hrmph, I changed my CC to "ccache gcc-mp-11" (what MacPorts calls GCC
11), and I still can't reproduce the problem. I still get "(from
executable)". In your original quote you showed "gcc", not "gcc-11",
which (assuming it is found as /usr/bin/gcc) is just a little binary
that redirects to clang... trying that, this time without ccache in
the mix... and still no cigar. So something is different about GCC 11
from homebrew, or the linker invocation it produces under the covers,
or the linker it's using?

The original quote said "gcc" but that was just me attempting to simplify.
I have now also figured out that it works with gcc-10 but not with
gcc-11 and gcc-12. For example, below are the underlying linker
invocations from gcc-10 and gcc-11. Note that some of the options are
ordered quite differently. I don't know what all of that means yet, but
it surely points to something in gcc or its packaging being the cause.

However, I think ultimately the use of -lc is an error and we should get
rid of it. This episode shows that it's very fragile in any case.

"/usr/local/Cellar/gcc@10/10.3.0/libexec/gcc/x86_64-apple-darwin20/10.3.0/collect2"
-dynamic -arch x86_64 -bundle -bundle_loader
../../../src/backend/postgres -macosx_version_min 11.4.0
-multiply_defined suppress -syslibroot
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk
-weak_reference_mismatches non-weak -o pltcl.so -L../../../src/port
-L../../../src/common -L/usr/local/lib -L/usr/local/opt/openldap/lib
"-L/usr/local/opt/openssl@1.1/lib" -L/usr/local/opt/readline/lib
-L/usr/local/opt/krb5/lib -L/usr/local/opt/icu4c/lib
-L/usr/local/opt/tcl-tk/lib -L/usr/local/Cellar/libxml2/2.9.14/lib
-L/usr/local/Cellar/lz4/1.9.3/lib -L/usr/local/Cellar/zstd/1.5.2/lib
-L/usr/local/Cellar/tcl-tk/8.6.12_1/lib
"-L/usr/local/Cellar/gcc@10/10.3.0/lib/gcc/10/gcc/x86_64-apple-darwin20/10.3.0"
"-L/usr/local/Cellar/gcc@10/10.3.0/lib/gcc/10/gcc/x86_64-apple-darwin20/10.3.0/../../.."
pltcl.o -dead_strip_dylibs -ltcl8.6 -lz -framework CoreFoundation -lc
-lSystem -lgcc_ext.10.5 -lgcc -lSystem -no_compact_unwind -idsym

/usr/local/Cellar/gcc/11.3.0_1/bin/../libexec/gcc/x86_64-apple-darwin21/11/collect2
-dynamic -arch x86_64 -syslibroot
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk
-macosx_version_min 12.4.0 -o pltcl.so -L../../../src/port
-L../../../src/common -L/usr/local/lib -L/usr/local/opt/openldap/lib
"-L/usr/local/opt/openssl@1.1/lib" -L/usr/local/opt/readline/lib
-L/usr/local/opt/krb5/lib -L/usr/local/opt/icu4c/lib
-L/usr/local/opt/tcl-tk/lib -L/usr/local/Cellar/libxml2/2.9.14/lib
-L/usr/local/Cellar/lz4/1.9.3/lib -L/usr/local/Cellar/zstd/1.5.2/lib
-L/usr/local/Cellar/tcl-tk/8.6.12_1/lib
-L/usr/local/Cellar/gcc/11.3.0_1/bin/../lib/gcc/11/gcc/x86_64-apple-darwin21/11
-L/usr/local/Cellar/gcc/11.3.0_1/bin/../lib/gcc/11/gcc
-L/usr/local/Cellar/gcc/11.3.0_1/bin/../lib/gcc/11/gcc/x86_64-apple-darwin21/11/../../..
pltcl.o -dead_strip_dylibs -ltcl8.6 -lz -lc -bundle_loader
../../../src/backend/postgres -bundle -framework CoreFoundation
-multiply_defined suppress -lemutls_w -lgcc -lSystem -no_compact_unwind
-idsym
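
One way to see the reordering at a glance is to compare where selected options land in each collect2 command line. The sketch below uses abridged copies of the two invocations above, with the -L, -syslibroot and version options dropped for brevity. The helper names are hypothetical. It only reports ordering; whether the position of -lc relative to -bundle_loader is what changes the symbol binding is not established in this thread.

```python
import shlex


def option_positions(linker_command: str, options: tuple[str, ...]) -> dict[str, int]:
    """Map each option of interest to the index of its first occurrence."""
    tokens = shlex.split(linker_command)
    return {opt: tokens.index(opt) for opt in options if opt in tokens}


def comes_before(linker_command: str, first: str, second: str) -> bool:
    """True if both options are present and `first` precedes `second`."""
    pos = option_positions(linker_command, (first, second))
    return first in pos and second in pos and pos[first] < pos[second]


# Abridged from the gcc-10 invocation quoted above.
GCC10 = (
    "collect2 -dynamic -arch x86_64 -bundle -bundle_loader "
    "../../../src/backend/postgres -multiply_defined suppress -o pltcl.so "
    "pltcl.o -dead_strip_dylibs -ltcl8.6 -lz -framework CoreFoundation -lc "
    "-lSystem -lgcc_ext.10.5 -lgcc -lSystem -no_compact_unwind -idsym"
)

# Abridged from the gcc-11 invocation quoted above.
GCC11 = (
    "collect2 -dynamic -arch x86_64 -o pltcl.so pltcl.o -dead_strip_dylibs "
    "-ltcl8.6 -lz -lc -bundle_loader ../../../src/backend/postgres -bundle "
    "-framework CoreFoundation -multiply_defined suppress -lemutls_w -lgcc "
    "-lSystem -no_compact_unwind -idsym"
)

if __name__ == "__main__":
    for label, cmd in (("gcc-10", GCC10), ("gcc-11", GCC11)):
        print(label, "-bundle_loader before -lc:",
              comes_before(cmd, "-bundle_loader", "-lc"))
```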

#10 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Tom Lane (#8)
1 attachment(s)
Re: pltcl crash on recent macOS

On 14.06.22 05:05, Tom Lane wrote:

I'd be okay with just dropping the -lc from pl/tcl/Makefile and seeing
what the buildfarm says. The fact that we needed it in 1998 doesn't
mean that we still need it on supported versions of Tcl; nor was it
ever anything but a hack for us to be overriding what TCL_LIBS says.

Ok, I propose to proceed with the attached patch (with a bit more
explanation added) for the master branch (for now) and see how it goes.

Attachments:

0001-PL-Tcl-Don-t-link-with-lc-explicitly.patch (text/plain; charset=UTF-8)
From 394ab358d7437768d2c2570381f2cdcef51dc2c4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 20 Jun 2022 12:34:13 +0200
Subject: [PATCH] PL/Tcl: Don't link with -lc explicitly

Discussion: https://www.postgresql.org/message-id/flat/a78c847a-4f79-9286-be99-e819e9e4139e%40enterprisedb.com
---
 src/pl/tcl/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/pl/tcl/Makefile b/src/pl/tcl/Makefile
index 25e65189b6..314f9b2eec 100644
--- a/src/pl/tcl/Makefile
+++ b/src/pl/tcl/Makefile
@@ -15,7 +15,7 @@ override CPPFLAGS := -I. -I$(srcdir) $(TCL_INCLUDE_SPEC) $(CPPFLAGS)
 
 # On Windows, we don't link directly with the Tcl library; see below
 ifneq ($(PORTNAME), win32)
-SHLIB_LINK = $(TCL_LIB_SPEC) $(TCL_LIBS) -lc
+SHLIB_LINK = $(TCL_LIB_SPEC) $(TCL_LIBS)
 endif
 
 PGFILEDESC = "PL/Tcl - procedural language"
-- 
2.36.1

#11 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Peter Eisentraut (#10)
Re: pltcl crash on recent macOS

On 20.06.22 12:36, Peter Eisentraut wrote:

On 14.06.22 05:05, Tom Lane wrote:

I'd be okay with just dropping the -lc from pl/tcl/Makefile and seeing
what the buildfarm says.  The fact that we needed it in 1998 doesn't
mean that we still need it on supported versions of Tcl; nor was it
ever anything but a hack for us to be overriding what TCL_LIBS says.

Ok, I propose to proceed with the attached patch (with a bit more
explanation added) for the master branch (for now) and see how it goes.

done