Problems with GSS encryption and SSL in libpq in 12~
Hi all,
A quick make check with Postgres 11 and 12 for src/test/ssl/ shows a
lot of difference in run time, using the same set of options with SSL
and the same compilation flags (OpenSSL 1.1.1f, with debugging and
assertions enabled among other things FWIW), with 12 taking up to five
minutes to complete and 11 finishing as a matter of seconds for me.
I have spent a couple of hours on that, to find out that libpq tries
to initialize a GSS context where the client remains stuck:
#9 0x00007fcd839bf72c in krb5_expand_hostname () from
/usr/lib/x86_64-linux-gnu/libkrb5.so.3
#10 0x00007fcd839bf9e0 in krb5_sname_to_principal () from
/usr/lib/x86_64-linux-gnu/libkrb5.so.3
#11 0x00007fcd83ad55b4 in ?? () from
/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
#12 0x00007fcd83ac0a98 in ?? () from
/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
#13 0x00007fcd83ac200f in gss_init_sec_context () from
/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
#14 0x00007fcd8423b24d in pqsecure_open_gss (conn=0x5582fa8cad90) at
fe-secure-gssapi.c:626
#15 0x00007fcd8421cd2b in PQconnectPoll (conn=0x5582fa8cad90) at
fe-connect.c:3165
#16 0x00007fcd8421b311 in connectDBComplete (conn=0x5582fa8cad90) at
fe-connect.c:2182
#17 0x00007fcd84218c1f in PQconnectdbParams (keywords=0x5582fa8cacf0,
values=0x5582fa8cad40, expand_dbname=1) at fe-connect.c:647
#18 0x00005582f8a81c87 in main (argc=8, argv=0x7ffe5ddb9df8) at
startup.c:266
However this makes little sense, why would libpq do that in the
context of an OpenSSL connection? Well, makeEmptyPGconn() does that,
which means that libpq would try by default to use GSS just if libpq
is *built* with GSS:
#ifdef ENABLE_GSS
conn->try_gss = true;
#endif
It is possible to enforce this flag to false by using
gssencmode=disable, but that's not really user-friendly in my opinion
because nobody is going to remember that for connection strings with
SSL settings so a lot of application are taking a performance hit at
connection because of that in my opinion. I think that's also a bad
idea from the start to assume that we have to try GSS by default, as
any new code path opening a secured connection may fail into the trap
of attempting to use GSS if this flag is not reset. Shouldn't we try
to set this flag to false by default, and set it to true only if
necessary depending on gssencmode? A quick hack switching this flag
to false in makeEmptyPGconn() gives back the past performance to
src/test/ssl/, FWIW.
Looking around, it seems to me that there is a second issue as of
PQconnectPoll(), where we don't reset the state machine correctly for
try_gss within reset_connection_state_machine, and instead HEAD does
it in connectDBStart().
Also, I have noted a hack as of pqsecure_open_gss() which does that:
/*
* We're done - hooray! Kind of gross, but we need to disable SSL
* here so that we don't accidentally tunnel one over the other.
*/
#ifdef USE_SSL
conn->allow_ssl_try = false;
#endif
And that looks like a rather bad idea to me..
Thanks,
--
Michael
On Mon, Apr 06, 2020 at 04:25:57PM +0900, Michael Paquier wrote:
It is possible to enforce this flag to false by using
gssencmode=disable, but that's not really user-friendly in my opinion
because nobody is going to remember that for connection strings with
SSL settings so a lot of application are taking a performance hit at
connection because of that in my opinion. I think that's also a bad
idea from the start to assume that we have to try GSS by default, as
any new code path opening a secured connection may fail into the trap
of attempting to use GSS if this flag is not reset. Shouldn't we try
to set this flag to false by default, and set it to true only if
necessary depending on gssencmode? A quick hack switching this flag
to false in makeEmptyPGconn() gives back the past performance to
src/test/ssl/, FWIW.Looking around, it seems to me that there is a second issue as of
PQconnectPoll(), where we don't reset the state machine correctly for
try_gss within reset_connection_state_machine, and instead HEAD does
it in connectDBStart().
So, a lot of things come down to PQconnectPoll() here. Once the
connection state reached is CONNECTION_MADE, we first try a GSS
connection if try_gss is true, and a SSL connection attempt follows
just after. This makes me wonder about the following things:
- gssencmode is prefer by default, the same as sslmode. Shouldn't we
issue an error if any of them is not disabled to avoid any conflicts
in the client, making the choice of gssencmode=prefer by default a bad
choice? It seems to me that there could be an argument to make
gssencmode disabled by default, and issue an error if somebody
attempts a connection without at least gssencode or sslmode set as
disabled.
- The current code tries a GSS connection first, and then it follows
with SSL, which is annoying because gssencmode=prefer by default means
that any user would pay the cost of attempting a GSS connection for
nothing (with or even without SSL). Shouldn't we do the opposite
here, by trying SSL first, and then GSS?
For now, I am attaching a WIP patch which seems like a good angle of
attack for libpq, meaning that if sslmode and gssencmode are both set,
then we would attempt a SSL connection before attempting GSS, so as
any user of SSL does not pay a performance hit compared to past
versions (I know that src/test/kerberos/ fails with that because
sslmode=prefer is triggered first in PQconnectPoll(), but that's just
to show the idea I had in mind).
Any thoughts?
--
Michael
Attachments:
libpq-gssenc.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 0157c619aa..f1b74e8351 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2023,11 +2023,6 @@ connectDBStart(PGconn *conn)
*/
resetPQExpBuffer(&conn->errorMessage);
-#ifdef ENABLE_GSS
- if (conn->gssencmode[0] == 'd') /* "disable" */
- conn->try_gss = false;
-#endif
-
/*
* Set up to try to connect to the first host. (Setting whichhost = -1 is
* a bit of a cheat, but PQconnectPoll will advance it to 0 before
@@ -2464,6 +2459,9 @@ keep_going: /* We will come back to here until there is
conn->allow_ssl_try = (conn->sslmode[0] != 'd'); /* "disable" */
conn->wait_ssl_try = (conn->sslmode[0] == 'a'); /* "allow" */
#endif
+#ifdef ENABLE_GSS
+ conn->try_gss = (conn->gssencmode[0] != 'd'); /* disable */
+#endif
reset_connection_state_machine = false;
need_new_connection = true;
@@ -2861,6 +2859,38 @@ keep_going: /* We will come back to here until there is
#endif
}
+#ifdef USE_SSL
+
+ /*
+ * If SSL is enabled and we haven't already got it running,
+ * request it instead of sending the startup message.
+ */
+ if (conn->allow_ssl_try && !conn->wait_ssl_try &&
+ !conn->ssl_in_use)
+ {
+ ProtocolVersion pv;
+
+ /*
+ * Send the SSL request packet.
+ *
+ * Theoretically, this could block, but it really
+ * shouldn't since we only got here if the socket is
+ * write-ready.
+ */
+ pv = pg_hton32(NEGOTIATE_SSL_CODE);
+ if (pqPacketSend(conn, 0, &pv, sizeof(pv)) != STATUS_OK)
+ {
+ appendPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("could not send SSL negotiation packet: %s\n"),
+ SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+ goto error_return;
+ }
+ /* Ok, wait for response */
+ conn->status = CONNECTION_SSL_STARTUP;
+ return PGRES_POLLING_READING;
+ }
+#endif /* USE_SSL */
+
#ifdef ENABLE_GSS
/*
@@ -2897,38 +2927,6 @@ keep_going: /* We will come back to here until there is
}
#endif
-#ifdef USE_SSL
-
- /*
- * If SSL is enabled and we haven't already got it running,
- * request it instead of sending the startup message.
- */
- if (conn->allow_ssl_try && !conn->wait_ssl_try &&
- !conn->ssl_in_use)
- {
- ProtocolVersion pv;
-
- /*
- * Send the SSL request packet.
- *
- * Theoretically, this could block, but it really
- * shouldn't since we only got here if the socket is
- * write-ready.
- */
- pv = pg_hton32(NEGOTIATE_SSL_CODE);
- if (pqPacketSend(conn, 0, &pv, sizeof(pv)) != STATUS_OK)
- {
- appendPQExpBuffer(&conn->errorMessage,
- libpq_gettext("could not send SSL negotiation packet: %s\n"),
- SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
- goto error_return;
- }
- /* Ok, wait for response */
- conn->status = CONNECTION_SSL_STARTUP;
- return PGRES_POLLING_READING;
- }
-#endif /* USE_SSL */
-
/*
* Build the startup packet.
*/
@@ -3902,9 +3900,6 @@ makeEmptyPGconn(void)
conn->verbosity = PQERRORS_DEFAULT;
conn->show_context = PQSHOW_CONTEXT_ERRORS;
conn->sock = PGINVALID_SOCKET;
-#ifdef ENABLE_GSS
- conn->try_gss = true;
-#endif
/*
* We try to send at least 8K at a time, which is the usual size of pipe
Greetings,
* Michael Paquier (michael@paquier.xyz) wrote:
A quick make check with Postgres 11 and 12 for src/test/ssl/ shows a
lot of difference in run time, using the same set of options with SSL
and the same compilation flags (OpenSSL 1.1.1f, with debugging and
assertions enabled among other things FWIW), with 12 taking up to five
minutes to complete and 11 finishing as a matter of seconds for me.I have spent a couple of hours on that, to find out that libpq tries
to initialize a GSS context where the client remains stuck:
#9 0x00007fcd839bf72c in krb5_expand_hostname () from
/usr/lib/x86_64-linux-gnu/libkrb5.so.3
#10 0x00007fcd839bf9e0 in krb5_sname_to_principal () from
/usr/lib/x86_64-linux-gnu/libkrb5.so.3
#11 0x00007fcd83ad55b4 in ?? () from
/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
#12 0x00007fcd83ac0a98 in ?? () from
/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
#13 0x00007fcd83ac200f in gss_init_sec_context () from
/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
#14 0x00007fcd8423b24d in pqsecure_open_gss (conn=0x5582fa8cad90) at
fe-secure-gssapi.c:626
#15 0x00007fcd8421cd2b in PQconnectPoll (conn=0x5582fa8cad90) at
fe-connect.c:3165
#16 0x00007fcd8421b311 in connectDBComplete (conn=0x5582fa8cad90) at
fe-connect.c:2182
#17 0x00007fcd84218c1f in PQconnectdbParams (keywords=0x5582fa8cacf0,
values=0x5582fa8cad40, expand_dbname=1) at fe-connect.c:647
#18 0x00005582f8a81c87 in main (argc=8, argv=0x7ffe5ddb9df8) at
startup.c:266However this makes little sense, why would libpq do that in the
context of an OpenSSL connection? Well, makeEmptyPGconn() does that,
which means that libpq would try by default to use GSS just if libpq
is *built* with GSS:
#ifdef ENABLE_GSS
conn->try_gss = true;
#endif
Sure, but if you look at what is done with it:
/*
* If GSSAPI encryption is enabled, then call
* pg_GSS_have_cred_cache() which will return true if we can
* acquire credentials (and give us a handle to use in
* conn->gcred), and then send a packet to the server asking
* for GSSAPI Encryption (and skip past SSL negotiation and
* regular startup below).
*/
if (conn->try_gss && !conn->gctx)
conn->try_gss = pg_GSS_have_cred_cache(&conn->gcred);
In other words, it's trying because a call to gss_acquire_cred() (called
from pg_GSS_have_cred_cache()) returned without error, indicating that
GSS should be possible to attempt.
If you have GSS compiled in, and you've got a credential cache such that
gss_acquire_cred() returns true, it seems entirely reasonable that you'd
like to connect using GSS encryption.
It is possible to enforce this flag to false by using
gssencmode=disable, but that's not really user-friendly in my opinion
because nobody is going to remember that for connection strings with
SSL settings so a lot of application are taking a performance hit at
connection because of that in my opinion. I think that's also a bad
idea from the start to assume that we have to try GSS by default, as
any new code path opening a secured connection may fail into the trap
of attempting to use GSS if this flag is not reset. Shouldn't we try
to set this flag to false by default, and set it to true only if
necessary depending on gssencmode? A quick hack switching this flag
to false in makeEmptyPGconn() gives back the past performance to
src/test/ssl/, FWIW.
We don't just always try to do GSS, that certainly wouldn't make sense-
we only try when gss_acquire_cred() comes back without an error.
As this is part of the initial connection, it's also not possible to
decide to do it by "depending on gssencmode", as we haven't talked to
the server at all at this point and need to decide if we're going to do
GSS encryption or not with the initial packet. Note that this is
more-or-less identical to what we do with SSL, and, as you saw, we
default to 'prefer' with GSSENCMODE, but you can set it to 'disable' on
the client side if you don't want to try GSS, even when you have a
client compiled with GSS and you have a credential cache.
Looking around, it seems to me that there is a second issue as of
PQconnectPoll(), where we don't reset the state machine correctly for
try_gss within reset_connection_state_machine, and instead HEAD does
it in connectDBStart().
Not following exactly what you're referring to here, but I see you've
sent a follow-up email about this and will respond to that
independently.
Also, I have noted a hack as of pqsecure_open_gss() which does that:
/*
* We're done - hooray! Kind of gross, but we need to disable SSL
* here so that we don't accidentally tunnel one over the other.
*/
#ifdef USE_SSL
conn->allow_ssl_try = false;
#endif
And that looks like a rather bad idea to me..
Tunneling SSL over GSS encryption is definitely a bad idea, which is why
we prevent that from happening. I'm not sure what the issue here is-
are you suggesting that we should support tunneling SSL over GSS
encryption..?
Thanks,
Stephen
Greetings,
* Michael Paquier (michael@paquier.xyz) wrote:
On Mon, Apr 06, 2020 at 04:25:57PM +0900, Michael Paquier wrote:
It is possible to enforce this flag to false by using
gssencmode=disable, but that's not really user-friendly in my opinion
because nobody is going to remember that for connection strings with
SSL settings so a lot of application are taking a performance hit at
connection because of that in my opinion. I think that's also a bad
idea from the start to assume that we have to try GSS by default, as
any new code path opening a secured connection may fail into the trap
of attempting to use GSS if this flag is not reset. Shouldn't we try
to set this flag to false by default, and set it to true only if
necessary depending on gssencmode? A quick hack switching this flag
to false in makeEmptyPGconn() gives back the past performance to
src/test/ssl/, FWIW.Looking around, it seems to me that there is a second issue as of
PQconnectPoll(), where we don't reset the state machine correctly for
try_gss within reset_connection_state_machine, and instead HEAD does
it in connectDBStart().So, a lot of things come down to PQconnectPoll() here. Once the
connection state reached is CONNECTION_MADE, we first try a GSS
connection if try_gss is true, and a SSL connection attempt follows
just after. This makes me wonder about the following things:
- gssencmode is prefer by default, the same as sslmode. Shouldn't we
issue an error if any of them is not disabled to avoid any conflicts
in the client, making the choice of gssencmode=prefer by default a bad
choice? It seems to me that there could be an argument to make
gssencmode disabled by default, and issue an error if somebody
attempts a connection without at least gssencode or sslmode set as
disabled.
I don't see why it would make sense to throw an error and require that
one of them be disabled. I certainly don't agree that we should disable
GSS encryption by default, or that there's any reason to throw an error
if both GSS and SSL are set to 'prefer' (as our current default is).
- The current code tries a GSS connection first, and then it follows
with SSL, which is annoying because gssencmode=prefer by default means
that any user would pay the cost of attempting a GSS connection for
nothing (with or even without SSL). Shouldn't we do the opposite
here, by trying SSL first, and then GSS?
A GSS connection is only attempted, as mentioned, if your GSS library
claims that there's a possibility that credentials could be acquired for
the connection.
For now, I am attaching a WIP patch which seems like a good angle of
attack for libpq, meaning that if sslmode and gssencmode are both set,
then we would attempt a SSL connection before attempting GSS, so as
any user of SSL does not pay a performance hit compared to past
versions (I know that src/test/kerberos/ fails with that because
sslmode=prefer is triggered first in PQconnectPoll(), but that's just
to show the idea I had in mind).Any thoughts?
I don't agree with the assumption that, in the face of having GSS up and
running, which actually validates the client and the server, that we
should prefer SSL, where our default configuration for SSL does *not*
validate properly *either* the client or the server.
What it sounds like to me is that it'd be helpful for you to review why
your environment has a GSS credential cache (or, at least,
gss_acquire_cred() returns without any error) but GSS isn't properly
working. If you're using OSX, it's possible the issue here is actually
the broken and ridiculously ancient kerberos code that OSX ships with
(and which, in another thread, there's a discussion about detecting and
failing to build if it's detected as it's just outright broken). If
it's not that then it'd be great to understand more about the specific
environment and what things like 'klist' return.
Thanks,
Stephen