BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault
The following bug has been logged on the website:
Bug reference: 16041
Logged by: Mark Siemers
Email address: mark.siemers@gmail.com
PostgreSQL version: 12.0
Operating system: Mac OS X Mojave 10.14.6
Description:
For further details (including crash report) see bugs filed with
third-parties:
Ruby - https://bugs.ruby-lang.org/issues/16239
pgAdmin 4 - https://redmine.postgresql.org/issues/4813
A Ruby maintainer speculates that there is an issue with GSS
authentication on OS X.
Snippet of stack trace below:
7 ??? 0x0000000200000000 0 + 8589934592
8 com.apple.security 0x00007fff3f57c059 invocation function for block in Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) + 287
9 libdispatch.dylib 0x00007fff5fd6d63d _dispatch_client_callout + 8
10 libdispatch.dylib 0x00007fff5fd79129 _dispatch_lane_barrier_sync_invoke_and_complete + 60
11 com.apple.security 0x00007fff3f57be47 Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) + 441
12 com.apple.security 0x00007fff3f37cae2 Security::KeychainCore::KCCursorImpl::next(Security::KeychainCore::Item&) + 230
13 com.apple.security 0x00007fff3f523c98 Security::KeychainCore::IdentityCursor::next(Security::SecPointer<Security::KeychainCore::Identity>&) + 192
14 com.apple.security 0x00007fff3f545f2f SecIdentitySearchCopyNext + 145
15 com.apple.security 0x00007fff3f550956 SecItemCopyMatching_osx(__CFDictionary const*, void const**) + 238
16 com.apple.security 0x00007fff3f553fc5 SecItemCopyMatching + 316
17 com.apple.Heimdal 0x00007fff4feae830 0x7fff4fe5c000 + 337968
18 com.apple.Heimdal 0x00007fff4fead35e hx509_certs_find + 67
19 com.apple.Heimdal 0x00007fff4fe88a6c _krb5_pk_find_cert + 246
20 com.apple.GSS 0x00007fff364dbd8e _gsspku2u_acquire_cred + 386
21 com.apple.GSS 0x00007fff364cb0d8 gss_acquire_cred + 523
22 libpq.5.dylib 0x0000000112b4b77d pg_GSS_have_cred_cache + 54
23 libpq.5.dylib 0x0000000112b39edf PQconnectPoll + 6377
24 libpq.5.dylib 0x0000000112b36f8b connectDBComplete + 232
25 libpq.5.dylib 0x0000000112b37112 PQconnectdb + 36
26 pg_ext.bundle 0x000000011157ab01 gvl_PQconnectdb_skeleton + 17
27 ruby 0x000000010f1dfff9 call_without_gvl + 185
28 pg_ext.bundle 0x000000011157aadd gvl_PQconnectdb + 45
29 pg_ext.bundle 0x000000011157fcb9 pgconn_init + 121
30 ruby 0x000000010f221b1c vm_call0_body + 604
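To read the trace as a call path, it can help to collapse the frames into the sequence of binary images they pass through. The following is only a rough sketch: the regular expression assumes the "index image address symbol + offset" layout seen in the frames above, the sample is a subset of those frames, and the names FRAME, SAMPLE and call_path are made up for illustration.

```python
import re

# Assumed layout of one frame: index, image, address, symbol, "+", offset.
FRAME = re.compile(
    r"^(?P<index>\d+)\s+(?P<image>\S+)\s+(?P<address>0x[0-9a-f]+)\s+"
    r"(?P<symbol>.+?)\s+\+\s+(?P<offset>\d+)$"
)

# A subset of the frames from the report, one frame per line.
SAMPLE = """\
16 com.apple.security 0x00007fff3f553fc5 SecItemCopyMatching + 316
19 com.apple.Heimdal 0x00007fff4fe88a6c _krb5_pk_find_cert + 246
21 com.apple.GSS 0x00007fff364cb0d8 gss_acquire_cred + 523
22 libpq.5.dylib 0x0000000112b4b77d pg_GSS_have_cred_cache + 54
25 libpq.5.dylib 0x0000000112b37112 PQconnectdb + 36
29 pg_ext.bundle 0x000000011157fcb9 pgconn_init + 121
30 ruby 0x000000010f221b1c vm_call0_body + 604
"""


def call_path(trace: str) -> list[str]:
    """Return the images crossed by the trace, outermost caller first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append((int(match["index"]), match["image"]))
    path = []
    # Higher frame numbers are further from the crash, so walk them first.
    for _, image in sorted(frames, reverse=True):
        if not path or path[-1] != image:
            path.append(image)
    return path


if __name__ == "__main__":
    print(" -> ".join(call_path(SAMPLE)))
```

Read this way, the sample goes from ruby through pg_ext.bundle and libpq into the Apple GSS, Heimdal and security frameworks.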
Hi,
The issue is not reproducible on macOS 10.12 with the same PostgreSQL 12 server.
On Sat, Oct 5, 2019 at 3:43 AM PG Bug reporting form <noreply@postgresql.org>
wrote:
--
Fahar Abbas
QMG
EnterpriseDB Corporation
Phone Office: +92-51-835-8874
Phone Direct: +92-51-8466803
Mobile: +92-333-5409707
Skype ID: *live:fahar.abbas*
Website: www.enterprisedb.com
Attachments:
Not reproducible.png (image/png)
Hello,
I am able to reproduce this on macOS 10.14 (Mojave) in multiple versions
of Ruby and in a minimal C program.
Steps to reproduce:
1. Install libpq for PostgreSQL 12:
brew install postgresql@12
2. Install the pg gem:
gem install pg
3. Start a PostgreSQL server:
docker run --rm -d -p 127.0.0.1:5432:5432 postgres:12
4. Execute some GSS path before and after fork:
ruby -r pg -e '
PG.connect(host: "localhost")
Process.fork { PG.connect(host: "localhost") }
Process.wait
'
Note that host must be a TCP address (not a Unix socket) and gssencmode
must be "prefer" (the default). The version of the server doesn't appear
to matter; I tested 10, 11, and 12.
This can also happen in `rails console` if an application initializer
interacts with ActiveRecord or a descendant (i.e. opens a database
connection.) Any further interaction with ActiveRecord on the console
segfaults.
This has been reported in a variety of Ruby projects and often dismissed
as "a PostgreSQL issue."
I found a similar trace in a Python package that interacts with the
macOS keychain.[1] There they narrowed it to a single call, raised the
issue upstream, and were told, in short, "you can't use keychain after fork."
Based on that report, I crafted a minimal C program to make the same GSS
call as libpq. I compiled (with deprecation warnings) and tested with
the following:
gcc macos-gss-crash.c -o macos-gss-crash -lgssapi_krb5
./macos-gss-crash
It prints:
before gss_acquire_cred in main
after gss_acquire_cred in main
gss complete: true
before gss_acquire_cred in child
child signalled: 11
I've attached the C program and crash reports for it and the above Ruby
snippet.
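For readers without the attachment, the fork-and-wait pattern behind the output above can be sketched in Python. This is not the attached C program: the GSS call is replaced by an arbitrary callable, and run_in_child and crash_like_segfault are hypothetical names. The stand-in simply sends SIGSEGV to itself so the reporting path can be exercised on any POSIX system.

```python
import os
import signal
import sys


def run_in_child(action) -> str:
    """Fork, run `action` in the child, and describe how the child ended."""
    sys.stdout.flush()  # avoid duplicating buffered output in the child
    pid = os.fork()
    if pid == 0:
        # Child: run the action, then leave without running parent cleanup.
        try:
            action()
        except BaseException:
            os._exit(1)
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return f"child signalled: {os.WTERMSIG(status)}"
    return f"child exit code: {os.WEXITSTATUS(status)}"


def crash_like_segfault() -> None:
    """Stand-in for the crashing call: deliver SIGSEGV to this process."""
    os.kill(os.getpid(), signal.SIGSEGV)


if __name__ == "__main__":
    print(run_in_child(lambda: None))         # child exit code: 0
    print(run_in_child(crash_like_segfault))  # child signalled: 11 (Linux, macOS)
```

To approximate the Ruby snippet, one could perform a connection in the parent first and then pass a callable that connects again to run_in_child, assuming a driver built on libpq 12 is installed.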
Thanks!
Chris
[1]: https://github.com/jaraco/keyring/issues/281
Attachments:
macos-gss-crash.c (text/plain)
macos-gss-crash_2019-12-03-144923.crash (text/plain)
ruby_2019-12-03-123416.crash (text/plain)
On 12/3/19 3:33 PM, Chris Bandy wrote:
Hello,
I am able to reproduce this on macOS 10.14 (Mojave) in multiple versions
of Ruby and in a minimal C program.
I was also able to reproduce this with the attached Python program and
psycopg2 package.
Steps to reproduce:
1. Install libpq for PostgreSQL 12:
brew install postgresql@12
2. Install the psycopg2 package:
pip install psycopg2
3. Start a PostgreSQL server:
docker run --rm -d -p 127.0.0.1:5432:5432 postgres:12
4. Execute some GSS path before and after fork:
python macos-gss-crash.py
It generates a crash report and prints:
main ok
-11
In this and the previous tests I can work around the segfault by
specifying gssencmode=disable.
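As a sketch of that workaround, the helper below builds a libpq keyword/value connection string that always carries gssencmode=disable. The helper names are hypothetical. The quoting follows libpq's rule that values containing spaces, quotes or backslashes are wrapped in single quotes with backslash escapes. libpq 12 also reads the PGGSSENCMODE environment variable, which may be easier when the connection string is built elsewhere.

```python
import os


def quote_conninfo_value(value: str) -> str:
    """Quote a value for a libpq keyword/value connection string."""
    if value and not any(ch.isspace() or ch in "'\\" for ch in value):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def conninfo_without_gss(**params: str) -> str:
    """Build a connection string that forces gssencmode=disable."""
    merged = {**params, "gssencmode": "disable"}
    return " ".join(
        f"{key}={quote_conninfo_value(str(value))}" for key, value in merged.items()
    )


if __name__ == "__main__":
    # Prints: host=localhost dbname=postgres gssencmode=disable
    print(conninfo_without_gss(host="localhost", dbname="postgres"))
    # Alternative: set the environment variable before any connection is made.
    os.environ.setdefault("PGGSSENCMODE", "disable")
```

The resulting string can be passed wherever a driver accepts a libpq connection string, for example as the dsn argument of psycopg2.connect.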
Thanks!
Chris
Greetings,
* Chris Bandy (chris.bandy@crunchydata.com) wrote:
Notice that host must be a TCP address (not Unix) and gssencmode must be
"prefer" (default is "prefer".) The version of the server doesn't appear to
matter; I tested 10, 11, and 12.
So, gssencmode didn't exist in 10 or 11, but are you actually testing
those different versions of *libpq*? That's really what is relevant
here, I believe, if libpq is actually even relevant at all...
This has been reported in a variety of Ruby projects and often dismissed as
"a PostgreSQL issue."
I'm really inclined to say that this isn't a PG issue...
Based on that report, I crafted a minimal C program to make the same GSS
call as libpq. I compiled (with deprecation warnings) and tested with the
following:
gcc macos-gss-crash.c -o macos-gss-crash -lgssapi_krb5
./macos-gss-crash
Particularly since that isn't linking against libpq and it's still
crashing.
I took the liberty to update the C code version to run on a Linux
system, and sure enough, it works just fine:
before gss_acquire_cred in main
after gss_acquire_cred in main
gss complete: true
before gss_acquire_cred in child
after gss_acquire_cred in child
gss complete: true
child exit code: 0
(also tested w/o having GSS creds and it still worked without a crash)
The only difference I needed to get it to compile on my Ubuntu box was
to add:
#include <sys/types.h>
#include <sys/wait.h>
and then compile as:
gcc macos-gss-crash.c -o macos-gss-crash -I /usr/include/mit-krb5 -L /usr/lib/x86_64-linux-gnu/mit-krb5 -lgssapi_krb5
It prints:
before gss_acquire_cred in main
after gss_acquire_cred in main
gss complete: true
before gss_acquire_cred in child
child signalled: 11
I've attached the C program and crash reports for it and the above Ruby
snippet.
Unfortunately, MacOS is pretty well known to be terrible about less
commonly used libraries and maintaining them. I'd suggest building a
current version of the Kerberos libraries, making sure you're linking
against just those and not whatever is provided by MacOS, and see if you
still have an issue.
The other possibility is that this is a current bug in Heimdal, which
seems to be the Kerberos library being used on MacOS, in which case
you'd need to bring up the issue with them.
There seems to be some independent confirmation of this being an issue
with the Heimdal provided by MacOS:
https://github.com/zenchild/gssapi/issues/12
The docs for gss_acquire_cred() don't seem to say much about what
happens when there's a fork():
https://docs.oracle.com/cd/E19683-01/816-1331/overview-141/index.html
If there's something we should be doing differently with
gss_acquire_cred() to "fix" this then I'm certainly open to it but I'm
really not sure what we'd do here; it seems pretty clearly to be some
issue where the Kerberos/Heimdal library being used is maintaining its
own state and getting confused after a fork happens.
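To illustrate what "maintaining its own state and getting confused after a fork" can look like, here is a toy Python sketch. It is not how Heimdal, the macOS keychain or libpq actually work. CredentialCache is a hypothetical stand-in for per-process library state, and os.register_at_fork is shown as one way a library could rebuild such state in the child.

```python
import os


class CredentialCache:
    """Hypothetical stand-in for library state that is valid in one process only."""

    def __init__(self) -> None:
        self.owner_pid = os.getpid()
        self.handle = object()  # pretend this wraps a per-process OS resource

    def usable(self) -> bool:
        # State inherited through fork() still carries the parent's PID.
        return self.handle is not None and self.owner_pid == os.getpid()

    def reset(self) -> None:
        """Rebuild the state for the current process."""
        self.owner_pid = os.getpid()
        self.handle = object()


def child_sees_usable_cache(cache: CredentialCache) -> bool:
    """Fork and report whether the child considers the inherited cache usable."""
    pid = os.fork()
    if pid == 0:
        os._exit(0 if cache.usable() else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    stale = CredentialCache()
    print(child_sees_usable_cache(stale))  # False: the child inherited parent state

    fresh = CredentialCache()
    os.register_at_fork(after_in_child=fresh.reset)
    print(child_sees_usable_cache(fresh))  # True: the hook rebuilt the state
```

A library that never re-initializes such state in the child leaves the caller with only two options: avoid the call after fork, or avoid the library.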
Thanks,
Stephen
On 12/3/19 5:31 PM, Stephen Frost wrote:
Greetings,
* Chris Bandy (chris.bandy@crunchydata.com) wrote:
Notice that host must be a TCP address (not Unix) and gssencmode must be
"prefer" (default is "prefer".) The version of the server doesn't appear to
matter; I tested 10, 11, and 12.
So, gssencmode didn't exist in 10 or 11, but are you actually testing
those different versions of *libpq*?
No, the libpq version in my tests is always 12. I was trying to say that
it doesn't appear to be an issue with the protocol/negotiation of GSS
encryption.
That does make me wonder, though, if/how the _server_ built by `brew
install postgresql` might be impacted by the macOS GSSAPI? All my tests
targeted a linux server.
This has been reported in a variety of Ruby projects and often dismissed as
"a PostgreSQL issue."
I'm really inclined to say that this isn't a PG issue...
I agree, but at the same time the perception seems to be that
using/connecting to PostgreSQL crashes one's application. I think the
very reasonable default of gssencmode=prefer is partly responsible.
Users don't realize that by upgrading libpq they are opting in to new
security code paths (and library compatibility issues.)
Unfortunately, MacOS is pretty well known to be terrible about less
commonly used libraries and maintaining them. I'd suggest building a
current version of the Kerberos libraries, making sure you're linking
against just those and not whatever is provided by MacOS, and see if you
still have an issue.
Investigating this has been the deepest exposure I've had to this...
yes, "unfortunate" reality.
Homebrew provides a recent version of krb5 (1.17 at this time) so I set
out to use it. A small diff to the formula proved successful. I'll
submit a patch to Homebrew linking back to this thread.
Is there anything that can/should be done on PostgreSQL's end now that
we know about this situation? The most I can imagine is to issue a
warning when macOS's GSSAPI is detected during build/configure. I don't
know how to do the latter and won't be surprised if the answer to the
former is "no."
The other possibility is that this is a current bug in Heimdal, which
seems to be the Kerberos library being used on MacOS, in which case
you'd need to bring up the issue with them.
I'm out of my depth on this front. My impression from the traces is that
the incompatibility is in macOS keychain, and I'm willing to leave it at
that. While researching this topic, I found multiple cases where fork()
and the "dispatch queue" are incompatible.[1]
[1]: https://www.evanjones.ca/fork-is-dangerous.html
There seems to be some independent confirmation of this being an issue
with the Heimdal provided by MacOS:
I don't see any C level backtrace information in that thread, so I can't
tell if it's the same issue.
Thank you for your help!
Chris
Greetings,
* Chris Bandy (chris.bandy@crunchydata.com) wrote:
On 12/3/19 5:31 PM, Stephen Frost wrote:
* Chris Bandy (chris.bandy@crunchydata.com) wrote:
Notice that host must be a TCP address (not Unix) and gssencmode must be
"prefer" (default is "prefer".) The version of the server doesn't appear to
matter; I tested 10, 11, and 12.
So, gssencmode didn't exist in 10 or 11, but are you actually testing
those different versions of *libpq*?
No, the libpq version in my tests is always 12. I was trying to say that it
doesn't appear to be an issue with the protocol/negotiation of GSS
encryption.
No, I don't think it's got anything to do with that ... or largely to do
with PG, except that libpq with v12 now uses more of the GSSAPI library
than it used to.
That does make me wonder, though, if/how the _server_ built by `brew install
postgresql` might be impacted by the macOS GSSAPI? All my tests targeted a
linux server.
I wouldn't be at all surprised if there's other bugs lurking in the old
version of Heimdal that Apple hacked up and distributes with their base
OS.
This has been reported in a variety of Ruby projects and often dismissed as
"a PostgreSQL issue."
I'm really inclined to say that this isn't a PG issue...
I agree, but at the same time the perception seems to be that
using/connecting to PostgreSQL crashes one's application. I think the very
reasonable default of gssencmode=prefer is partly responsible. Users don't
realize that by upgrading libpq they are opting in to new security code
paths (and library compatibility issues.)
Perception isn't reality though and upgrading to a new major version of
libpq is going to pretty regularly involve new library calls or calls
being made in ways they weren't before. If that exposes a bug in
that library (particularly one that's been fixed in more recent versions
of the library), that's not on us to hack around or attempt to solve,
imv. Perhaps someone else has a differing opinion and wants to try and
figure out a way to solve this that doesn't materially make things worse
for users that are running with a modern library, which would be great,
but I can't get too worked up about it.
Unfortunately, MacOS is pretty well known to be terrible about less
commonly used libraries and maintaining them. I'd suggest building a
current version of the Kerberos libraries, making sure you're linking
against just those and not whatever is provided by MacOS, and see if you
still have an issue.
Investigating this has been the deepest exposure I've had to this... yes,
"unfortunate" reality.
Homebrew provides a recent version of krb5 (1.17 at this time) so I set out
to use it. A small diff to the formula proved successful. I'll submit a
patch to Homebrew linking back to this thread.
Great, that sounds like it's probably the right approach to addressing
this.
Is there anything that can/should be done on PostgreSQL's end now that we
know about this situation? The most I can imagine is to issue a warning when
macOS's GSSAPI is detected during build/configure. I don't know how to do
the latter and won't be surprised if the answer to the former is "no."
I wouldn't be against doing something here but I don't have a Mac myself
and I don't plan to spend time trying to hack around their broken
library. I'm also not entirely convinced that we should just throw an
error if we come across this busted library; psql doesn't fork and
hasn't got any problems, so it seems a bit overkill to just refuse to
work with the MacOS library.
The other possibility is that this is a current bug in Heimdal, which
seems to be the Kerberos library being used on MacOS, in which case
you'd need to bring up the issue with them.
I'm out of my depth on this front. My impression from the traces is that the
incompatibility is in macOS keychain, and I'm willing to leave it at that.
While researching this topic, I found multiple cases where fork() and the
"dispatch queue" are incompatible.[1]
I'm.. not terribly impressed by that blog's arguments around fork(),
particularly since it seems to be claiming things that are actually not
true about fork but which are true about threads. In fact, what it
seems to really be getting at is that running with threads and fork'ing
at the same time is awfully complicated to get right, and that's pretty
accurate, but that doesn't make just using fork() an issue.
That blog post aside, it looks like what it's getting at is that you
can't link to MacOS libraries and also fork() and expect things to be
sane, and while that's unfortunate, that isn't really our issue to go
figure out how to fix or address.
Thanks,
Stephen