libpq crashing on macOS during connection startup

Started by John DeSoi over 2 years ago, 14 messages, general
#1 John DeSoi
john@desoi.dev

I have a macOS web server using Postgres that has been very stable until a month or two ago. If I restart the web server the problem seems to go away for a while, but starts happening again within days. I thought it was a PHP issue as discussed in the link below, but I just noticed in the crash report it seems to be something related to a call from libpq.

https://github.com/shivammathur/homebrew-php/issues/1862

Any ideas or suggestions appreciated.

John DeSoi, Ph.D.

-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process: httpd [54877]
Path: /opt/homebrew/*/httpd
Identifier: httpd
Version: ???
Code Type: ARM-64 (Native)
Parent Process: httpd [6040]
Responsible: httpd [6040]
User ID: 502

Date/Time: 2023-11-30 07:06:00.0651 -0600
OS Version: macOS 12.7 (21G816)
Report Version: 12
Anonymous UUID: 750F146C-B2B5-BECA-EC21-1FEC0471D5AC

Time Awake Since Boot: 1000000 seconds

System Integrity Protection: enabled

Crashed Thread: 0 Dispatch queue: com.apple.root.utility-qos

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000110
Exception Codes: 0x0000000000000001, 0x0000000000000110
Exception Note: EXC_CORPSE_NOTIFY

VM Region Info: 0x110 is not in any region. Bytes before following region: 105553518919408
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
MALLOC_NANO (reserved) 600018000000-600020000000 [128.0M] rw-/rwx SM=NUL ...(unallocated)

Application Specific Information:
*** multi-threaded process forked ***
crashed on child side of fork pre-exec

Kernel Triage:
VM - pmap_enter failed with resource shortage
VM - pmap_enter failed with resource shortage

Thread 0 Crashed:: Dispatch queue: com.apple.root.utility-qos
0 libdispatch.dylib 0x199dd825c _dispatch_apply_with_attr_f + 1136
1 libdispatch.dylib 0x199dd8234 _dispatch_apply_with_attr_f + 1096
2 libdispatch.dylib 0x199dd847c dispatch_apply + 108
3 CoreFoundation 0x19a172a80 __104-[CFPrefsSearchListSource synchronouslySendDaemonMessage:andAgentMessage:andDirectMessage:replyHandler:]_block_invoke.92 + 132
4 CoreFoundation 0x19a007e8c CFPREFERENCES_IS_WAITING_FOR_SYSTEM_AND_USER_CFPREFSDS + 100
5 CoreFoundation 0x19a007ccc -[CFPrefsSearchListSource synchronouslySendDaemonMessage:andAgentMessage:andDirectMessage:replyHandler:] + 232
6 CoreFoundation 0x19a00649c -[CFPrefsSearchListSource alreadylocked_generationCountFromListOfSources:count:] + 252
7 CoreFoundation 0x19a006178 -[CFPrefsSearchListSource alreadylocked_getDictionary:] + 468
8 CoreFoundation 0x19a005cec -[CFPrefsSearchListSource alreadylocked_copyValueForKey:] + 172
9 CoreFoundation 0x19a005c20 -[CFPrefsSource copyValueForKey:] + 60
10 CoreFoundation 0x19a005bcc __76-[_CFXPreferences copyAppValueForKey:identifier:container:configurationURL:]_block_invoke + 44
11 CoreFoundation 0x199ffe9e0 __108-[_CFXPreferences(SearchListAdditions) withSearchListForIdentifier:container:cloudConfigurationURL:perform:]_block_invoke + 384
12 CoreFoundation 0x19a173350 -[_CFXPreferences withSearchListForIdentifier:container:cloudConfigurationURL:perform:] + 384
13 CoreFoundation 0x199ffe394 -[_CFXPreferences copyAppValueForKey:identifier:container:configurationURL:] + 168
14 CoreFoundation 0x199ffe2b0 _CFPreferencesCopyAppValueWithContainerAndConfiguration + 128
15 Heimdal 0x1a5d4cb80 init_context_from_config_file + 2732
16 Heimdal 0x1a5d33944 krb5_set_config_files + 392
17 Heimdal 0x1a5d33284 krb5_init_context_flags + 308
18 Heimdal 0x1a5d33144 krb5_init_context + 32
19 Kerberos 0x1a7fc32e8 mshim_ctx + 64
20 Kerberos 0x1a7fc16e4 context_new_ccache_iterator + 92
21 libkrb5.3.3.dylib 0x1017accc8 api_macos_ptcursor_next + 220
22 libkrb5.3.3.dylib 0x1017a9f0c krb5_cccol_cursor_next + 76
23 libkrb5.3.3.dylib 0x1017aa1f4 krb5_cccol_have_content + 92
24 libgssapi_krb5.2.2.dylib 0x1016a1f58 acquire_cred_context + 1668
25 libgssapi_krb5.2.2.dylib 0x1016a185c acquire_cred_from + 688
26 libgssapi_krb5.2.2.dylib 0x101693b8c gss_add_cred_from + 1108
27 libgssapi_krb5.2.2.dylib 0x101693568 gss_acquire_cred_from + 308
28 libgssapi_krb5.2.2.dylib 0x101693428 gss_acquire_cred + 36
29 libpq.5.dylib 0x1012a9db8 pg_GSS_have_cred_cache + 60
30 libpq.5.dylib 0x10129927c PQconnectPoll + 5600
31 libpq.5.dylib 0x10129623c connectDBComplete + 304
32 libpq.5.dylib 0x1012963a8 PQconnectdb + 44
33 libphp.so 0x10229569c pdo_pgsql_handle_factory + 328
34 libphp.so 0x102282230 zim_PDO___construct + 1496
35 libphp.so 0x10249bd0c ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER + 304
36 libphp.so 0x102479868 execute_ex + 52
37 libphp.so 0x10244b314 zend_call_function + 1332
38 libphp.so 0x10236cef0 zif_call_user_func_array + 136
39 libphp.so 0x1024b83e4 ZEND_DO_FCALL_BY_NAME_SPEC_RETVAL_USED_HANDLER + 264
40 libphp.so 0x102479868 execute_ex + 52
41 libphp.so 0x102479a64 zend_execute + 288
42 libphp.so 0x102459d84 zend_execute_scripts + 156
43 libphp.so 0x1023ff9a8 php_execute_script + 460
44 libphp.so 0x10253efa8 php_handler + 1024
45 httpd 0x100cc61a4 ap_run_handler + 64
46 httpd 0x100cc687c ap_invoke_handler + 264
47 httpd 0x100cfe364 ap_internal_redirect + 60
48 mod_rewrite.so 0x10204b6d8 handler_redirect + 136
49 httpd 0x100cc61a4 ap_run_handler + 64
50 httpd 0x100cc687c ap_invoke_handler + 264
51 httpd 0x100cfdf3c ap_process_async_request + 792
52 httpd 0x100cfdfec ap_process_request + 24
53 httpd 0x100cfae64 ap_process_http_connection + 344
54 httpd 0x100cd785c ap_run_process_connection + 64
55 mod_mpm_prefork.so 0x1010e23ec child_main + 1092
56 mod_mpm_prefork.so 0x1010e1e74 make_child + 436
57 mod_mpm_prefork.so 0x1010e18b0 prefork_run + 2056
58 httpd 0x100cd9f30 ap_run_mpm + 84
59 httpd 0x100ccd3b4 main + 2260
60 dyld 0x100fd108c start + 520

#2 Joe Conway
mail@joeconway.com
In reply to: John DeSoi (#1)
Re: libpq crashing on macOS during connection startup

On 11/30/23 09:45, John DeSoi wrote:

I have a macOS web server using Postgres that has been very stable until a month or two ago. If I restart the web server the problem seems to go away for a while, but starts happening again within days. I thought it was a PHP issue as discussed in the link below, but I just noticed in the crash report it seems to be something related to a call from libpq.

https://github.com/shivammathur/homebrew-php/issues/1862

Any ideas or suggestions appreciated.

Did you recently get an OpenSSL upgrade to v3.2.0? This is a shot in the
dark, but perhaps related to the discussion here?

/messages/by-id/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#3 John DeSoi
john@desoi.dev
In reply to: Joe Conway (#2)
Re: libpq crashing on macOS during connection startup

On Nov 30, 2023, at 8:59 AM, Joe Conway <mail@joeconway.com> wrote:

Did you recently get an OpenSSL upgrade to v3.2.0? This is a shot in the dark, but perhaps related to the discussion here?

/messages/by-id/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com

No, this server is on openssl 3.1.4. But thanks for sending that, I'm about to set up a new server and I'm sure it will end up with the latest versions.

John DeSoi, Ph.D.

#4 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: John DeSoi (#1)
Re: libpq crashing on macOS during connection startup

On 11/30/23 06:45, John DeSoi wrote:

I have a macOS web server using Postgres that has been very stable until a month or two ago. If I restart the web server the problem seems to go away for a while, but starts happening again within days. I thought it was a PHP issue as discussed in the link below, but I just noticed in the crash report it seems to be something related to a call from libpq.

What starts happening?

Does the Postgres log show anything?

Postgres version?

How was Postgres installed?

https://github.com/shivammathur/homebrew-php/issues/1862

Any ideas or suggestions appreciated.

John DeSoi, Ph.D.

--
Adrian Klaver
adrian.klaver@aklaver.com

#5 John DeSoi
john@desoi.dev
In reply to: Adrian Klaver (#4)
Re: libpq crashing on macOS during connection startup

On Nov 30, 2023, at 9:36 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

What starts happening?

Random web process crashes when connecting to PostgreSQL.

Does the Postgres log show anything?

No.

Postgres version?

How was Postgres installed?

PostgreSQL 15.4 installed with Homebrew.

John DeSoi, Ph.D.

#6 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: John DeSoi (#5)
Re: libpq crashing on macOS during connection startup

On 11/30/23 07:49, John DeSoi wrote:

On Nov 30, 2023, at 9:36 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

What starts happening?

Random web process crashes when connecting to PostgreSQL.

Does the Postgres log show anything?

No.

To be clear, at the times the Web processes crash there are no traces
in the Postgres log of an issue on the Postgres side?

Is there evidence in the Postgres logs of what the Web process was doing
just before it crashed?

Postgres version?

How was Postgres installed?

PostgreSQL 15.4 installed with Homebrew.

John DeSoi, Ph.D.

--
Adrian Klaver
adrian.klaver@aklaver.com

#7 John DeSoi
john@desoi.dev
In reply to: Adrian Klaver (#6)
Re: libpq crashing on macOS during connection startup

On Nov 30, 2023, at 10:21 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

To be clear, at the times the Web processes crash there are no traces in the Postgres log of an issue on the Postgres side?

Is there evidence in the Postgres logs of what the Web process was doing just before it crashed?

No entry in the Postgres log that I can see. The backtrace I posted in the original message was today at 7:06am. There is nothing in the Postgres log around that time except for some checkpoint messages.

I think the backtrace shows that Postgres has just connected and is authenticating by calling Kerberos which calls Heimdal and then crashes in CoreFoundation. I also posted this issue on the Heimdal GitHub account.

John DeSoi, Ph.D.

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John DeSoi (#3)
Re: libpq crashing on macOS during connection startup

John DeSoi <john@desoi.dev> writes:

On Nov 30, 2023, at 8:59 AM, Joe Conway <mail@joeconway.com> wrote:
Did you recently get an OpenSSL upgrade to v3.2.0? This is a shot in the dark, but perhaps related to the discussion here?
/messages/by-id/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com

No, this server is on openssl 3.1.4. But thanks for sending that, I'm about to set up a new server and I'm sure it will end up with the latest versions.

The crash appears to be happening within GSSAPI authentication, which
presumably indicates that we're not using OpenSSL, so that isn't
where to look.

What troubles me about that stack trace is the references to Heimdal.
We gave up supporting Heimdal (and v16 explicitly rejects building
with it) because its support for Kerberos credentials was too
incomplete and flaky. So I'm inclined to guess that you are running
into some Heimdal bug. Try to rebuild libpq using MIT Kerberos
and see if things get better.

regards, tom lane

#9 John DeSoi
john@desoi.dev
In reply to: Tom Lane (#8)
Re: libpq crashing on macOS during connection startup

On Nov 30, 2023, at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What troubles me about that stack trace is the references to Heimdal.
We gave up supporting Heimdal (and v16 explicitly rejects building
with it) because its support for Kerberos credentials was too
incomplete and flaky. So I'm inclined to guess that you are running
into some Heimdal bug. Try to rebuild libpq using MIT Kerberos
and see if things get better.

I'm using v16 on my development machine and it is crashing on me at times with the same backtrace. Restarting the web server fixes it for a while for some reason.

Is there a way to simply disable GSSAPI authentication? I could not find it.

The builds are from homebrew (https://brew.sh/). I'll have to see if there is a way for me to override build options.

The otool output below shows that Apple's Kerberos is being used and, I assume by extension, their Heimdal library. The Heimdal project told me as much - Apple has a fork and would not pull from their project.

John DeSoi, Ph.D.

$ otool -L /usr/local/opt/postgresql@16/lib/libpq.5.dylib
/usr/local/opt/postgresql@16/lib/libpq.5.dylib:
/usr/local/opt/postgresql@16/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.16.0)
/usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 13.0.0, current version 13.0.0)
/usr/local/opt/openssl@3/lib/libssl.3.dylib (compatibility version 3.0.0, current version 3.0.0)
/usr/local/opt/openssl@3/lib/libcrypto.3.dylib (compatibility version 3.0.0, current version 3.0.0)
/usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib (compatibility version 2.0.0, current version 2.2.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)
/System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)

$ otool -L /usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib
/usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib:
/usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib (compatibility version 2.0.0, current version 2.2.0)
@loader_path/libkrb5.3.3.dylib (compatibility version 3.0.0, current version 3.3.0)
@loader_path/libk5crypto.3.1.dylib (compatibility version 3.0.0, current version 3.1.0)
@loader_path/libcom_err.3.0.dylib (compatibility version 3.0.0, current version 3.0.0)
@loader_path/libkrb5support.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
/usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)

$ otool -L /usr/local/opt/krb5/lib/libkrb5.3.3.dylib
/usr/local/opt/krb5/lib/libkrb5.3.3.dylib:
/usr/local/opt/krb5/lib/libkrb5.3.3.dylib (compatibility version 3.0.0, current version 3.3.0)
@loader_path/libk5crypto.3.1.dylib (compatibility version 3.0.0, current version 3.1.0)
@loader_path/libcom_err.3.0.dylib (compatibility version 3.0.0, current version 3.0.0)
@loader_path/libkrb5support.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
/System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
/usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)
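
The check done by eye on the otool output above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `linked_libraries` and `apple_kerberos_deps` are hypothetical, the sample text is the libkrb5 listing copied from this message, and matching on the substring "Kerberos.framework" is a simplification rather than an authoritative test.

```python
# Sample copied from the `otool -L` listing for libkrb5.3.3.dylib in this message.
OTOOL_OUTPUT = """\
/usr/local/opt/krb5/lib/libkrb5.3.3.dylib:
/usr/local/opt/krb5/lib/libkrb5.3.3.dylib (compatibility version 3.0.0, current version 3.3.0)
@loader_path/libk5crypto.3.1.dylib (compatibility version 3.0.0, current version 3.1.0)
@loader_path/libcom_err.3.0.dylib (compatibility version 3.0.0, current version 3.0.0)
@loader_path/libkrb5support.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
/System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
/usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)
"""


def linked_libraries(otool_text):
    """Return the library paths listed in `otool -L` style output."""
    paths = []
    for line in otool_text.splitlines():
        line = line.strip()
        # Skip blanks and the header line naming the inspected file ("path:").
        if not line or line.endswith(":"):
            continue
        # Each dependency line looks like "path (compatibility version ...)".
        paths.append(line.split(" (compatibility", 1)[0])
    return paths


def apple_kerberos_deps(otool_text):
    """Return dependencies that point into Apple's Kerberos framework."""
    return [p for p in linked_libraries(otool_text) if "Kerberos.framework" in p]


apple_deps = apple_kerberos_deps(OTOOL_OUTPUT)
print(apple_deps)
```

Running the same helper over the libpq and libgssapi_krb5 listings above would return an empty list, since only libkrb5 references the system framework directly.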

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John DeSoi (#9)
Re: libpq crashing on macOS during connection startup

John DeSoi <john@desoi.dev> writes:

On Nov 30, 2023, at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What troubles me about that stack trace is the references to Heimdal.
We gave up supporting Heimdal (and v16 explicitly rejects building
with it) because its support for Kerberos credentials was too
incomplete and flaky. So I'm inclined to guess that you are running
into some Heimdal bug. Try to rebuild libpq using MIT Kerberos
and see if things get better.

Is there a way to simply disable GSSAPI authentication? I could not find it.

gssencmode=disable in your connection options; but that's a tad
inconvenient probably.
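
For clients that can pass a libpq keyword/value connection string, the option might be added along these lines. This is a minimal sketch, not a recommendation for any particular driver: the helper names `quote_conninfo_value` and `build_conninfo` and the host, database, and user values are made up for illustration, and the quoting follows libpq's documented keyword/value rules (single quotes around empty values or values containing spaces, with backslash escaping).

```python
def quote_conninfo_value(value: str) -> str:
    """Quote a value for a libpq keyword/value connection string.

    Empty values and values containing whitespace, single quotes, or
    backslashes are wrapped in single quotes, with backslashes and single
    quotes escaped by a backslash.
    """
    if value and not any(c.isspace() or c in "'\\" for c in value):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(params: dict) -> str:
    """Build a conninfo string that always carries gssencmode=disable."""
    merged = {**params, "gssencmode": "disable"}
    return " ".join(f"{key}={quote_conninfo_value(val)}" for key, val in merged.items())


# Hypothetical connection settings, for illustration only.
conninfo = build_conninfo({"host": "localhost", "dbname": "app", "user": "web"})
print(conninfo)
```

The resulting string is what would be handed to PQconnectdb or to a driver that forwards connection strings to libpq unchanged.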

The otool output below shows that Apple's Kerberos is being used and, I assume by extension, their Heimdal library. The Heimdal project told me as much - Apple has a fork and would not pull from their project.

Ugh, not only Heimdal but a very obsolete version thereof? It borders
on negligence for the homebrew PG package to be building against that.
They should be pulling in homebrew's MIT Kerberos package and using
that, if they want to enable GSSAPI.

regards, tom lane

#11 John DeSoi
john@desoi.dev
In reply to: Tom Lane (#10)
Re: libpq crashing on macOS during connection startup

On Nov 30, 2023, at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

gssencmode=disable in your connection options; but that's a tad
inconvenient probably.

Yes, the application uses PHP PDO to connect to PostgreSQL. I don't see any way to specify that in the connection options.

Ugh, not only Heimdal but a very obsolete version thereof? It borders
on negligence for the homebrew PG package to be building against that.
They should be pulling in homebrew's MIT Kerberos package and using
that, if they want to enable GSSAPI.

I was looking at the homebrew source for the PostgreSQL package to see if there was a way to customize the build options. I did not find one but saw the comment below. Apparently this is a known issue and it was suggested to use the MIT Kerberos package 4 years ago. Instead they just added this comment in 2020.

# GSSAPI provided by Kerberos.framework crashes when forked.
# See https://github.com/Homebrew/homebrew-core/issues/47494.

John DeSoi, Ph.D.

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John DeSoi (#11)
Re: libpq crashing on macOS during connection startup

John DeSoi <john@desoi.dev> writes:

On Nov 30, 2023, at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ugh, not only Heimdal but a very obsolete version thereof? It borders
on negligence for the homebrew PG package to be building against that.
They should be pulling in homebrew's MIT Kerberos package and using
that, if they want to enable GSSAPI.

I was looking at the homebrew source for the PostgreSQL package to see if there was a way to customize the build options. I did not find one but saw the comment below. Apparently this is a known issue and it was suggested to use the MIT Kerberos package 4 years ago. Instead they just added this comment in 2020.

# GSSAPI provided by Kerberos.framework crashes when forked.
# See https://github.com/Homebrew/homebrew-core/issues/47494.

Oh, thanks for finding that. But you misinterpreted the outcome;
the commit that closed that thread did

+# GSSAPI provided by Kerberos.framework crashes when forked.
+# See https://github.com/Homebrew/homebrew-core/issues/47494.
+depends_on "krb5"

The "depends_on" was evidently meant to force building against krb5,
and I suppose it did have that effect when committed. Could they
have done something since then to break it?

Looking closer, your stack trace seems to show that libpq *is*
linked against MIT Kerberos: at least, control flows from
libpq.5.dylib to libgssapi_krb5.2.2.dylib, which is not a
library that Apple supplies. However, a few subroutines
deeper, we somehow end up in Apple's Kerberos framework,
and that eventually calls libdispatch which is the source of
the problem according to the discussion in issues/47494.
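
The control flow described above can be read off the backtrace mechanically. The sketch below is an illustration only: `library_chain` is a hypothetical helper, the frames are an excerpt copied from the crash report in message #1, and the parsing assumes the simple "index library address symbol" layout shown there.

```python
from itertools import groupby

# Excerpt of frames from the crash report in message #1 (innermost frame first).
FRAMES = """\
0 libdispatch.dylib 0x199dd825c _dispatch_apply_with_attr_f + 1136
2 libdispatch.dylib 0x199dd847c dispatch_apply + 108
14 CoreFoundation 0x199ffe2b0 _CFPreferencesCopyAppValueWithContainerAndConfiguration + 128
15 Heimdal 0x1a5d4cb80 init_context_from_config_file + 2732
18 Heimdal 0x1a5d33144 krb5_init_context + 32
19 Kerberos 0x1a7fc32e8 mshim_ctx + 64
20 Kerberos 0x1a7fc16e4 context_new_ccache_iterator + 92
21 libkrb5.3.3.dylib 0x1017accc8 api_macos_ptcursor_next + 220
23 libkrb5.3.3.dylib 0x1017aa1f4 krb5_cccol_have_content + 92
24 libgssapi_krb5.2.2.dylib 0x1016a1f58 acquire_cred_context + 1668
28 libgssapi_krb5.2.2.dylib 0x101693428 gss_acquire_cred + 36
29 libpq.5.dylib 0x1012a9db8 pg_GSS_have_cred_cache + 60
32 libpq.5.dylib 0x1012963a8 PQconnectdb + 44
"""


def library_chain(frames_text):
    """Return the libraries in call order (outermost caller first).

    Consecutive frames from the same library are collapsed into one entry.
    """
    # The second whitespace-separated field of each frame is the library name.
    libs = [line.split()[1] for line in frames_text.splitlines() if line.strip()]
    collapsed = [lib for lib, _group in groupby(libs)]
    # The report lists the innermost frame first, so reverse for call order.
    return list(reversed(collapsed))


chain = library_chain(FRAMES)
print(" -> ".join(chain))
```

On this excerpt the chain runs from libpq.5.dylib through the two Homebrew krb5 libraries, then into Apple's Kerberos and Heimdal, and ends in CoreFoundation and libdispatch.dylib.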

My guess at this point is that somebody at Homebrew put in a
hack (perhaps quite recently) that causes their build of MIT
Kerberos to sometimes call Apple's implementation, and that
ill-advised idea has re-opened the problem that issues/47494
meant to solve.

I'd suggest filing a bug against Homebrew's krb5 package.
Whatever this is, it seems pretty clear that it's not a
Postgres bug.

regards, tom lane

#13 John DeSoi
john@desoi.dev
In reply to: Tom Lane (#12)
Re: libpq crashing on macOS during connection startup

On Dec 1, 2023, at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'd suggest filing a bug against Homebrew's krb5 package.
Whatever this is, it seems pretty clear that it's not a
Postgres bug.

Will do, thank you and everyone else for the help and feedback.

John DeSoi, Ph.D.

#14 John DeSoi
john@desoi.dev
In reply to: Tom Lane (#10)
Re: libpq crashing on macOS during connection startup

On Nov 30, 2023, at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Is there a way to simply disable GSSAPI authentication? I could not find it.

gssencmode=disable in your connection options; but that's a tad
inconvenient probably.

I discovered there is a PGGSSENCMODE environment variable. I set it to 'disable' in the environment used to run the http server. Hopefully this will solve it.

https://www.postgresql.org/docs/current/libpq-envars.html
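
As a rough illustration of the environment-variable approach, the sketch below starts a child process with PGGSSENCMODE set and confirms the child sees it. It is an assumption-laden stand-in: `env_with_gss_disabled` is a hypothetical helper, and the child here is just a Python one-liner rather than the actual httpd launch, whose configuration is not shown in this thread.

```python
import os
import subprocess
import sys


def env_with_gss_disabled(base=None):
    """Return a copy of the environment with PGGSSENCMODE=disable added."""
    env = dict(os.environ if base is None else base)
    env["PGGSSENCMODE"] = "disable"
    return env


# Stand-in for the server process: a child that reports what it inherited.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('PGGSSENCMODE'))"],
    env=env_with_gss_disabled(),
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()
print(seen_by_child)
```

Any libpq client started with this environment would pick up the variable as its default for gssencmode unless the connection string sets the option explicitly.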

John DeSoi, Ph.D.