Direct SSL connection and ALPN loose ends
There's been a bunch of bugs, and discussion on the intended behavior of
sslnegotiation and ALPN. This email summarizes the current status:
## Status and loose ends for beta1
All reported bugs have now been fixed. We now enforce ALPN in all the
right places. Please let me know if I missed something.
There are two open items remaining that I intend to address in the next
few days, before beta1:
- I am going to rename sslnegotiation=requiredirect to
sslnegotiation=directonly. I acknowledge that there is still some debate
on this: Jacob (and Robert?) would prefer to change the behavior
instead, so that sslnegotiation=requiredirect would also imply or
require sslmode=require, while IMHO the settings should be orthogonal so
that sslmode controls whether SSL is used or not, and sslnegotiation
controls how the SSL layer is negotiated when SSL is used. Given that
they are orthogonal, "directonly" is a better name. I will also take
another look at the documentation, if it needs clarification on that
point. If you have more comments on whether this is a good idea or not
or how sslnegotiation should work, please reply on the other thread,
let's keep this one focused on the overall status. [1]/messages/by-id/CA+TgmobV9JEk4AFy61Xw+2+cCTBqdTsDopkeB+gb81kq3f-o6A@mail.gmail.com
- The registration of the ALPN name with IANA hasn't been finished yet
[2]: https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/
I changed the request to "postgresql". The string we have in 'master' is
currently "TBD-pgsql". I'm very confident that the registration will go
through with "postgresql", so my plan is to commit that change before
beta1, even if the IANA process hasn't completed by then.
## V18 material
- Add an option to disable traditional SSL negotiation in the server.
There was discussion on doing this via HBA rules or as a global option,
and the consensus seems to be for a global option. This would be just to
reduce the attach surface, there is no known vulnerabilities or other
issues with the traditional negotiation. And maybe to help with testing. [3]/messages/by-id/CA+TgmoaLpDVY2ywqQUfxvKEQZ+nwkabcw_f=i4Zyivt9CLjcmA@mail.gmail.com
These are not directly related to sslnegotiation, but came up in the
discussion:
- Clarify the situation with sslmode=require and gssencmode=require
combination, by replacing sslmode and gssencmode options with a single
"encryption=[ssl|gss|none], [...]" option. [4]/messages/by-id/3a6f126c-e1aa-4dcc-9252-9868308f6cf0@iki.fi
- Make sslmode=require the default. This is orthogonal to the SSL
negotiation, but I think the root cause for the disagreements on
sslnegotiation is actually that we'd like SSL to be the default. [5]/messages/by-id/CA+TgmoaNkRerEmB9JPgW0FhcJAe337AA=5kp6je9KekQhhRbmA@mail.gmail.com
The details of these need to be hashed out, in particular the
backwards-compatibility and migration aspects, but the consensus seems
to be that it's the right direction.
## V19 and beyond
In the future, once v17 is ubiquitous and the ecosystem (pgbouncer etc)
have added direct SSL support, we can change the default sslnegotiation
from 'postgres' to 'direct'. I'm thinking 3-5 years from now. In the
more distant future, we could remove the traditional SSLRequest
negotiation altogether and always use direct SSL negotiation.
There's no rush on these.
## Retrospective
There were a lot more cleanups required for this work than I expected,
given that there were little changes to the patches between January and
March commitfests. I was mostly worried about the refactoring of the
retry logic in libpq (and about the pre-existing logic too to be honest,
it was complicated before these changes already). That's why I added a
lot more tests for that. However, I did not foresee all the ALPN related
issues. In hindsight, it would have been good to commit most of the ALPN
changes first, and with more tests. Jacob wrote a python test suite; I
should've played more with that, that could have demonstrated the ALPN
issues earlier.
[1]: /messages/by-id/CA+TgmobV9JEk4AFy61Xw+2+cCTBqdTsDopkeB+gb81kq3f-o6A@mail.gmail.com
/messages/by-id/CA+TgmobV9JEk4AFy61Xw+2+cCTBqdTsDopkeB+gb81kq3f-o6A@mail.gmail.com
[2]: https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/
[3]: /messages/by-id/CA+TgmoaLpDVY2ywqQUfxvKEQZ+nwkabcw_f=i4Zyivt9CLjcmA@mail.gmail.com
/messages/by-id/CA+TgmoaLpDVY2ywqQUfxvKEQZ+nwkabcw_f=i4Zyivt9CLjcmA@mail.gmail.com
[4]: /messages/by-id/3a6f126c-e1aa-4dcc-9252-9868308f6cf0@iki.fi
/messages/by-id/3a6f126c-e1aa-4dcc-9252-9868308f6cf0@iki.fi
[5]: /messages/by-id/CA+TgmoaNkRerEmB9JPgW0FhcJAe337AA=5kp6je9KekQhhRbmA@mail.gmail.com
/messages/by-id/CA+TgmoaNkRerEmB9JPgW0FhcJAe337AA=5kp6je9KekQhhRbmA@mail.gmail.com
--
Heikki Linnakangas
Neon (https://neon.tech)
Hi,
With TLS 1.3 and others there is possibly a security flaw using ALPN [1]terminate-tlsv1-3-handshake-if-alpn-is-missing <https://stackoverflow.com/questions/77271498/terminate-tlsv1-3-handshake-if-alpn-is-missing>.
It seems to me that the ALPN protocol can be bypassed if the client does
not correctly inform the ClientHello header.
So, the suggestion is to check the ClientHello header in the server and
terminate the TLS handshake early.
Patch attached.
best regards,
Ranier Vilela
[1]: terminate-tlsv1-3-handshake-if-alpn-is-missing <https://stackoverflow.com/questions/77271498/terminate-tlsv1-3-handshake-if-alpn-is-missing>
<https://stackoverflow.com/questions/77271498/terminate-tlsv1-3-handshake-if-alpn-is-missing>
Attachments:
terminate-tls-handshake-if-no-alpn.patchapplication/octet-stream; name=terminate-tls-handshake-if-no-alpn.patchDownload+28-0
Import Notes
Resolved by subject fallback
On 29/04/2024 20:10, Ranier Vilela wrote:
Hi,
With TLS 1.3 and others there is possibly a security flaw using ALPN [1].
It seems to me that the ALPN protocol can be bypassed if the client does
not correctly inform the ClientHello header.So, the suggestion is to check the ClientHello header in the server and
terminate the TLS handshake early.
Sounds to me like it's working as designed. ALPN in general is optional;
if the client doesn't request it, then you proceed without it. We do
require ALPN for direct SSL connections though. We can, because direct
SSL connections is a new feature in Postgres. But we cannot require it
for the connections negotiated with SSLRequest, or we break
compatibility with old clients that don't use ALPN.
There is a check in direct SSL mode that ALPN was used
(ProcessSSLStartup in backend_startup.c):
if (!port->alpn_used)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
errmsg("received direct SSL connection request without ALPN protocol negotiation extension")));
goto reject;
}
That happens immediately after the SSL connection has been established.
Hmm. I guess it would be better to abort the connection earlier, without
completing the TLS handshake. Otherwise the client might send the first
message in wrong protocol to the PostgreSQL server. That's not a
security issue for the PostgreSQL server: the server disconnects without
reading the message. And I don't see any way for an ALPACA attack when
the server ignores the client's message. Nevertheless, from the point of
view of keeping the attack surface as small as possible, aborting
earlier seems better.
--
Heikki Linnakangas
Neon (https://neon.tech)
Em seg., 29 de abr. de 2024 às 14:56, Heikki Linnakangas <hlinnaka@iki.fi>
escreveu:
On 29/04/2024 20:10, Ranier Vilela wrote:
Hi,
With TLS 1.3 and others there is possibly a security flaw using ALPN [1].
It seems to me that the ALPN protocol can be bypassed if the client does
not correctly inform the ClientHello header.So, the suggestion is to check the ClientHello header in the server and
terminate the TLS handshake early.Sounds to me like it's working as designed. ALPN in general is optional;
if the client doesn't request it, then you proceed without it. We do
require ALPN for direct SSL connections though. We can, because direct
SSL connections is a new feature in Postgres. But we cannot require it
for the connections negotiated with SSLRequest, or we break
compatibility with old clients that don't use ALPN.
Ok.
But what if I have a server configured for TLS 1.3 and that requires ALPN
to allow access?
What about a client configured without ALPN requiring connection?
There is a check in direct SSL mode that ALPN was used
(ProcessSSLStartup in backend_startup.c):if (!port->alpn_used)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
errmsg("received direct SSL connectionrequest without ALPN protocol negotiation extension")));
goto reject;
}That happens immediately after the SSL connection has been established.
Hmm. I guess it would be better to abort the connection earlier, without
completing the TLS handshake. Otherwise the client might send the first
message in wrong protocol to the PostgreSQL server. That's not a
security issue for the PostgreSQL server: the server disconnects without
reading the message. And I don't see any way for an ALPACA attack when
the server ignores the client's message. Nevertheless, from the point of
view of keeping the attack surface as small as possible, aborting
earlier seems better.
So the ClientHello callback is the correct way to determine the end.
best regards,
Ranier Vilela
On 29/04/2024 21:06, Ranier Vilela wrote:
Em seg., 29 de abr. de 2024 às 14:56, Heikki Linnakangas
<hlinnaka@iki.fi <mailto:hlinnaka@iki.fi>> escreveu:On 29/04/2024 20:10, Ranier Vilela wrote:
Hi,
With TLS 1.3 and others there is possibly a security flaw using
ALPN [1].
It seems to me that the ALPN protocol can be bypassed if the
client does
not correctly inform the ClientHello header.
So, the suggestion is to check the ClientHello header in the
server and
terminate the TLS handshake early.
Sounds to me like it's working as designed. ALPN in general is
optional;
if the client doesn't request it, then you proceed without it. We do
require ALPN for direct SSL connections though. We can, because direct
SSL connections is a new feature in Postgres. But we cannot require it
for the connections negotiated with SSLRequest, or we break
compatibility with old clients that don't use ALPN.Ok.
But what if I have a server configured for TLS 1.3 and that requires
ALPN to allow access?
What about a client configured without ALPN requiring connection?
Sorry, I don't understand the questions. What about them?
--
Heikki Linnakangas
Neon (https://neon.tech)
Em seg., 29 de abr. de 2024 às 15:36, Heikki Linnakangas <hlinnaka@iki.fi>
escreveu:
On 29/04/2024 21:06, Ranier Vilela wrote:
Em seg., 29 de abr. de 2024 às 14:56, Heikki Linnakangas
<hlinnaka@iki.fi <mailto:hlinnaka@iki.fi>> escreveu:On 29/04/2024 20:10, Ranier Vilela wrote:
Hi,
With TLS 1.3 and others there is possibly a security flaw using
ALPN [1].
It seems to me that the ALPN protocol can be bypassed if the
client does
not correctly inform the ClientHello header.
So, the suggestion is to check the ClientHello header in the
server and
terminate the TLS handshake early.
Sounds to me like it's working as designed. ALPN in general is
optional;
if the client doesn't request it, then you proceed without it. We do
require ALPN for direct SSL connections though. We can, becausedirect
SSL connections is a new feature in Postgres. But we cannot require
it
for the connections negotiated with SSLRequest, or we break
compatibility with old clients that don't use ALPN.Ok.
But what if I have a server configured for TLS 1.3 and that requires
ALPN to allow access?
What about a client configured without ALPN requiring connection?Sorry, I don't understand the questions. What about them?
Sorry, I'll try to be clearer.
The way it is designed, can we impose TLS 1.3 and ALPN to allow access to a
public server?
And if on the other side we have a client, configured without ALPN,
when requesting access, the server will refuse?
best regards,
Ranier Vilela
On Mon, Apr 29, 2024 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I was mostly worried about the refactoring of the
retry logic in libpq (and about the pre-existing logic too to be honest,
it was complicated before these changes already).
Some changes in the v17 negotiation fallback order caught my eye:
1. For sslmode=prefer, a modern v3 error during negotiation now
results in a fallback to plaintext. For v16 this resulted in an
immediate failure. (v2 errors retain the v16 behavior.)
2. For gssencmode=prefer, a legacy v2 error during negotiation now
results in an immediate failure. In v16 it allowed fallback to SSL or
plaintext depending on sslmode.
Are both these changes intentional/desirable? Change #1 seems to
partially undo the decision made in a49fbaaf:
Don't assume that "E" response to NEGOTIATE_SSL_CODE means pre-7.0 server.
These days, such a response is far more likely to signify a server-side
problem, such as fork failure. [...]Hence, it seems best to just eliminate the assumption that backing off
to non-SSL/2.0 protocol is the way to recover from an "E" response, and
instead treat the server error the same as we would in non-SSL cases.
Thanks,
--Jacob
On 17/06/2024 17:11, Jacob Champion wrote:
On Mon, Apr 29, 2024 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I was mostly worried about the refactoring of the
retry logic in libpq (and about the pre-existing logic too to be honest,
it was complicated before these changes already).Some changes in the v17 negotiation fallback order caught my eye:
1. For sslmode=prefer, a modern v3 error during negotiation now
results in a fallback to plaintext. For v16 this resulted in an
immediate failure. (v2 errors retain the v16 behavior.)
2. For gssencmode=prefer, a legacy v2 error during negotiation now
results in an immediate failure. In v16 it allowed fallback to SSL or
plaintext depending on sslmode.Are both these changes intentional/desirable? Change #1 seems to
partially undo the decision made in a49fbaaf:Don't assume that "E" response to NEGOTIATE_SSL_CODE means pre-7.0 server.
These days, such a response is far more likely to signify a server-side
problem, such as fork failure. [...]Hence, it seems best to just eliminate the assumption that backing off
to non-SSL/2.0 protocol is the way to recover from an "E" response, and
instead treat the server error the same as we would in non-SSL cases.
They were not intentional. Let me think about the desirable part :-).
By "negotiation", which part of the protocol are we talking about
exactly? In the middle of the TLS handshake? After sending the startup
packet?
I think the behavior with v2 and v3 errors should be the same. And I
think an immediate failure is appropriate on any v2/v3 error during
negotiation, assuming we don't use those errors for things like "TLS not
supported", which would warrant a fallback.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Mon, Jun 17, 2024 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
By "negotiation", which part of the protocol are we talking about
exactly? In the middle of the TLS handshake? After sending the startup
packet?
By "negotiation" I mean the server's response to the startup packet.
I.e. "supported"/"not supported"/"error".
I think the behavior with v2 and v3 errors should be the same. And I
think an immediate failure is appropriate on any v2/v3 error during
negotiation, assuming we don't use those errors for things like "TLS not
supported", which would warrant a fallback.
For GSS encryption, it was my vague understanding that older servers
respond with an error rather than the "not supported" indication. For
TLS, though, the decision in a49fbaaf (immediate failure) seemed
reasonable.
Thanks,
--Jacob
Hi,
On 2024-04-29 18:24:04 +0300, Heikki Linnakangas wrote:
All reported bugs have now been fixed. We now enforce ALPN in all the right
places. Please let me know if I missed something.
Very minor and not really your responsibility:
If provided with the necessary key information, wireshark can decode TLS
exchanges when using sslnegotiation=postgres but not with direct. Presumably
it needs to be taught postgres' ALPN id or something.
Example with direct:
476 6513.310308457 192.168.0.113 → 192.168.0.200 48978 5432 142 TLSv1.3 Finished
477 6513.310341492 192.168.0.113 → 192.168.0.200 48978 5432 151 TLSv1.3 Application Data
478 6513.320730295 192.168.0.200 → 192.168.0.113 5432 48978 147 TLSv1.3 New Session Ticket
479 6513.320745684 192.168.0.200 → 192.168.0.113 5432 48978 147 TLSv1.3 New Session Ticket
480 6513.321175713 192.168.0.113 → 192.168.0.200 48978 5432 68 TCP 48978 → 5432 [ACK] Seq=915 Ack=1665 Win=62848 Len=0 TSval=3779915421 TSecr=3469016093
481 6513.323161553 192.168.0.200 → 192.168.0.113 5432 48978 518 TLSv1.3 Application Data
482 6513.323626180 192.168.0.113 → 192.168.0.200 48978 5432 125 TLSv1.3 Application Data
483 6513.333977769 192.168.0.200 → 192.168.0.113 5432 48978 273 TLSv1.3 Application Data
484 6513.334581920 192.168.0.113 → 192.168.0.200 48978 5432 95 TLSv1.3 Application Data
485 6513.334666116 192.168.0.113 → 192.168.0.200 48978 5432 92 TLSv1.3 Alert (Level: Warning, Description: Close Notify)
Example with postgres:
502 6544.752799560 192.168.0.113 → 192.168.0.200 46300 5432 142 TLSv1.3 Finished
503 6544.752842863 192.168.0.113 → 192.168.0.200 46300 5432 151 PGSQL >?
504 6544.763152222 192.168.0.200 → 192.168.0.113 5432 46300 147 TLSv1.3 New Session Ticket
505 6544.763163155 192.168.0.200 → 192.168.0.113 5432 46300 147 TLSv1.3 New Session Ticket
506 6544.763587595 192.168.0.113 → 192.168.0.200 46300 5432 68 TCP 46300 → 5432 [ACK] Seq=923 Ack=1666 Win=62848 Len=0 TSval=3779946864 TSecr=3469047536
507 6544.765024827 192.168.0.200 → 192.168.0.113 5432 46300 518 PGSQL <R/S/S/S/S/S/S/S/S/S/S/S/S/S/S/K/Z
508 6544.766288155 192.168.0.113 → 192.168.0.200 46300 5432 125 PGSQL >Q
509 6544.776974164 192.168.0.200 → 192.168.0.113 5432 46300 273 PGSQL <T/D/D/D/D/D/D/D/D/D/D/C/Z
510 6544.777597927 192.168.0.113 → 192.168.0.200 46300 5432 95 PGSQL >X
511 6544.777631520 192.168.0.113 → 192.168.0.200 46300 5432 92 TLSv1.3 Alert (Level: Warning, Description: Close Notify)
Note that in the second one it knows what's inside the "Application Data"
messages and decodes them (S: startup, authentication ok, parameters, cancel key,
ready for query, C: simple query, S: description, 10 rows, command complete,
ready for query).
In the GUI you can obviously go into the "postgres messages" in more detail
than I know how to do on the console.
A second aspect is that I'm not super happy about the hack of stashing data
into Port. I think medium term we'd be better off separating out the
buffering for unencrypted and encrypted data properly. It turns out that not
having any buffering *below* openssl (i.e. the encrypted data) hurts both for
the send and receive side, due to a) increased number of syscalls b) too many
small packets being sent, as we use TCP_NODELAY c) kernel memory copies being
slower due to the small increments.
Greetings,
Andres Freund
On Mon, Jun 17, 2024 at 9:23 AM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
I think the behavior with v2 and v3 errors should be the same. And I
think an immediate failure is appropriate on any v2/v3 error during
negotiation, assuming we don't use those errors for things like "TLS not
supported", which would warrant a fallback.For GSS encryption, it was my vague understanding that older servers
respond with an error rather than the "not supported" indication. For
TLS, though, the decision in a49fbaaf (immediate failure) seemed
reasonable.
Would an open item for this be appropriate?
Thanks,
--Jacob
On 20/06/2024 20:02, Jacob Champion wrote:
On Mon, Jun 17, 2024 at 9:23 AM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:I think the behavior with v2 and v3 errors should be the same. And I
think an immediate failure is appropriate on any v2/v3 error during
negotiation, assuming we don't use those errors for things like "TLS not
supported", which would warrant a fallback.For GSS encryption, it was my vague understanding that older servers
respond with an error rather than the "not supported" indication. For
TLS, though, the decision in a49fbaaf (immediate failure) seemed
reasonable.Would an open item for this be appropriate?
Added.
By "negotiation" I mean the server's response to the startup packet.
I.e. "supported"/"not supported"/"error".
Ok, I'm still a little confused, probably a terminology issue. The
server doesn't respond with "supported" or "not supported" to the
startup packet, that happens earlier. I think you mean the SSLRequst /
GSSRequest packet, which is sent *before* the startup packet?
I think the behavior with v2 and v3 errors should be the same. And I
think an immediate failure is appropriate on any v2/v3 error during
negotiation, assuming we don't use those errors for things like "TLS not
supported", which would warrant a fallback.For GSS encryption, it was my vague understanding that older servers
respond with an error rather than the "not supported" indication. For
TLS, though, the decision in a49fbaaf (immediate failure) seemed
reasonable.
Hmm, right, GSS encryption was introduced in v12, and older versions
respond with an error to a GSSRequest.
We probably could make the same assumption for GSS as we did for TLS in
a49fbaaf, i.e. that an error means that something's wrong with the
server, rather than that it's just very old and doesn't support GSS. But
the case for that is a lot weaker case than with TLS. There are still
pre-v12 servers out there in the wild.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Thu, Jun 20, 2024 at 4:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
By "negotiation" I mean the server's response to the startup packet.
I.e. "supported"/"not supported"/"error".Ok, I'm still a little confused, probably a terminology issue. The
server doesn't respond with "supported" or "not supported" to the
startup packet, that happens earlier. I think you mean the SSLRequst /
GSSRequest packet, which is sent *before* the startup packet?
Yes, sorry. (I'm used to referring to those as startup packets too, ha.)
Hmm, right, GSS encryption was introduced in v12, and older versions
respond with an error to a GSSRequest.We probably could make the same assumption for GSS as we did for TLS in
a49fbaaf, i.e. that an error means that something's wrong with the
server, rather than that it's just very old and doesn't support GSS. But
the case for that is a lot weaker case than with TLS. There are still
pre-v12 servers out there in the wild.
Right. Since we default to gssencmode=prefer, if you have Kerberos
creds in your environment, I think this could potentially break
existing software that connects to v11 servers once you upgrade libpq.
Thanks,
--Jacob
On 21/06/2024 02:32, Jacob Champion wrote:
On Thu, Jun 20, 2024 at 4:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
By "negotiation" I mean the server's response to the startup packet.
I.e. "supported"/"not supported"/"error".Ok, I'm still a little confused, probably a terminology issue. The
server doesn't respond with "supported" or "not supported" to the
startup packet, that happens earlier. I think you mean the SSLRequst /
GSSRequest packet, which is sent *before* the startup packet?Yes, sorry. (I'm used to referring to those as startup packets too, ha.)
Yeah I'm not sure what the right term would be.
Hmm, right, GSS encryption was introduced in v12, and older versions
respond with an error to a GSSRequest.We probably could make the same assumption for GSS as we did for TLS in
a49fbaaf, i.e. that an error means that something's wrong with the
server, rather than that it's just very old and doesn't support GSS. But
the case for that is a lot weaker case than with TLS. There are still
pre-v12 servers out there in the wild.Right. Since we default to gssencmode=prefer, if you have Kerberos
creds in your environment, I think this could potentially break
existing software that connects to v11 servers once you upgrade libpq.
When you connect to a V11 server and attempt to perform GSSAPI
authentication, it will respond with a V3 error that says: "unsupported
frontend protocol 1234.5680: server supports 2.0 to 3.0". That was a
surprise to me until I tested it just now. I thought that it would
respond with a protocol V2 error, but it is not so. The backend sets
FrontendProtocol to 1234.5680 before sending the error, and because it
is >= 3, the error is sent with protocol version 3.
Given that, I think it is a good thing to fail the connection completely
on receiving a V2 error.
Attached is a patch to fix the other issue, with falling back from SSL
to plaintext. And some tests and comment fixes I spotted while at it.
0001: A small comment fix
0002: This is the main patch that fixes the SSL fallback issue
0003: This adds fault injection tests to exercise these early error
codepaths. It is not ready to be merged, as it contains a hack to skip
locking. See thread at
/messages/by-id/e1ffb822-054e-4006-ac06-50532767f75b@iki.fi.
0004: More tests, for what happens if the server sends an error after
responding "yes" to the SSLRequest or GSSRequest, but before performing
the SSL/GSS handshake.
Attached is also a little stand-alone perl program that listens on a
socket, and when you connect to it, it immediately sends a V2 or V3
error, depending on the argument. That's useful for testing. It could be
used as an alternative strategy to the injection points I used in the
0003-0004 patches, but for now I just used it for manual testing.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
0001-Fix-outdated-comment-after-removal-of-direct-SSL-fal.patchtext/x-patch; charset=UTF-8; name=0001-Fix-outdated-comment-after-removal-of-direct-SSL-fal.patchDownload+2-3
0002-Fix-fallback-behavior-when-server-sends-an-ERROR-ear.patchtext/x-patch; charset=UTF-8; name=0002-Fix-fallback-behavior-when-server-sends-an-ERROR-ear.patchDownload+17-1
0003-WIP-Add-injection-points-and-test-for-early-backend-.patchtext/x-patch; charset=UTF-8; name=0003-WIP-Add-injection-points-and-test-for-early-backend-.patchDownload+92-3
0004-WIP-Add-tests-for-errors-during-ssl-or-gssapi-handsh.patchtext/x-patch; charset=UTF-8; name=0004-WIP-Add-tests-for-errors-during-ssl-or-gssapi-handsh.patchDownload+24-1
I reviewed the documentation for "direct ALPN connections' ', and it looks
like it could be improved.
Here's the link:
https://www.postgresql.org/docs/17/protocol-flow.html#PROTOCOL-FLOW-SSL
The currently suggested values for "sslnegotiations" are "direct" and
"postgres".
The project name is PostgreSQL and the ALPN name is postgresql. Is there a
reason why property value uses "postgres"?
Can the value be renamed to postgresql for consistency?
"SSL". Technically, the proper term is TLS, and even the document refers to
"IANA TLS ALPN Protocol IDs" (TLS, not SSL).
I would not die on that hill, however, going for tlsnegotiation would look
better than sslnegotiation.
Vladimir
On Tue, 25 Jun 2024 at 09:37, Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
wrote:
I reviewed the documentation for "direct ALPN connections' ', and it looks
like it could be improved.
Here's the link:
https://www.postgresql.org/docs/17/protocol-flow.html#PROTOCOL-FLOW-SSLThe currently suggested values for "sslnegotiations" are "direct" and
"postgres".
The project name is PostgreSQL and the ALPN name is postgresql. Is there a
reason why property value uses "postgres"?
Can the value be renamed to postgresql for consistency?
+1 I found it strange that we are not using postgresql
"SSL". Technically, the proper term is TLS, and even the document refers
to "IANA TLS ALPN Protocol IDs" (TLS, not SSL).
I would not die on that hill, however, going for tlsnegotiation would look
better than sslnegotiation.
+1 again, unusual to use SSL when this really is TLS.
Dave
On Thu, Jun 20, 2024 at 4:32 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
Thanks,
--Jacob
Hey Heikki,
[sending this to the list in case it's not just me]
I cannot for the life of me get GMail to deliver your latest message,
even though I see it on postgresql.org. It's not in spam; it's just
gone. I wonder if it's possibly the Perl server script causing
virus-scanner issues?
--Jacob
On Tue, Jun 25, 2024 at 7:20 AM Dave Cramer <davecramer@postgres.rocks> wrote:
On Tue, 25 Jun 2024 at 09:37, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
"SSL". Technically, the proper term is TLS, and even the document refers to "IANA TLS ALPN Protocol IDs" (TLS, not SSL).
I would not die on that hill, however, going for tlsnegotiation would look better than sslnegotiation.+1 again, unusual to use SSL when this really is TLS.
This was sort of litigated last ye-(checks notes) oh no, three years ago:
/messages/by-id/CE12DD5C-4BB3-4166-BC9A-39779568734C@yesql.se
I'm your side when it comes to the use of the TLS acronym, personally,
but I think introducing a brand new option that interfaces with
sslmode and sslrootcert and etc. while not being named like them would
be outright unhelpful. And the idea of switching everything to use TLS
in docs seemed to be met with a solid "meh" on the other thread.
--Jacob
On Mon, Jun 24, 2024 at 11:30:53PM +0300, Heikki Linnakangas wrote:
Given that, I think it is a good thing to fail the connection completely on
receiving a V2 error.Attached is a patch to fix the other issue, with falling back from SSL to
plaintext. And some tests and comment fixes I spotted while at it.0001: A small comment fix
Already committed as of cc68ca6d420e.
0002: This is the main patch that fixes the SSL fallback issue
+ conn->failed_enc_methods |= conn->allowed_enc_methods &
(~conn->current_enc_method);
Sounds reasonable to me.
It's a bit annoying to have to guess that current_enc_method is
tracking only one method at a time (aka these three fields are not
documented in libpq-int.h), while allowed_enc_methods and
failed_enc_methods is a bitwise combination of the methods that are
still allowed or that have already failed.
0003: This adds fault injection tests to exercise these early error
codepaths. It is not ready to be merged, as it contains a hack to skip
locking. See thread at
/messages/by-id/e1ffb822-054e-4006-ac06-50532767f75b@iki.fi.
Locking when running an injection point has been replaced by some
atomics in 86db52a5062a.
+ if (IsInjectionPointAttached("backend-initialize-v2-error"))
+ {
+ FrontendProtocol = PG_PROTOCOL(2,0);
+ elog(FATAL, "protocol version 2 error triggered");
+ }
This is an attempt to do stack manipulation with an injection point
set. FrontendProtocol is a global variable, so you could have a new
callback setting up this global variable directly, then FATAL (I
really don't mind is modules/injection_points finishes with a library
of callbacks).
Not sure to like much this new IsInjectionPointAttached() that does a
search in the existing injection point pool, though. This leads to
more code footprint in the core backend, and I'm trying to minimize
that. Not everybody agrees with this view, I'd guess, which is also
fine.
0004: More tests, for what happens if the server sends an error after
responding "yes" to the SSLRequest or GSSRequest, but before performing the
SSL/GSS handshake.
No objections to these two additions.
Attached is also a little stand-alone perl program that listens on a socket,
and when you connect to it, it immediately sends a V2 or V3 error, depending
on the argument. That's useful for testing. It could be used as an
alternative strategy to the injection points I used in the 0003-0004
patches, but for now I just used it for manual testing.
Nice toy.
--
Michael
On 16/07/2024 09:54, Michael Paquier wrote:
On Mon, Jun 24, 2024 at 11:30:53PM +0300, Heikki Linnakangas wrote:
0002: This is the main patch that fixes the SSL fallback issue
+ conn->failed_enc_methods |= conn->allowed_enc_methods &
(~conn->current_enc_method);Sounds reasonable to me.
It's a bit annoying to have to guess that current_enc_method is
tracking only one method at a time (aka these three fields are not
documented in libpq-int.h), while allowed_enc_methods and
failed_enc_methods is a bitwise combination of the methods that are
still allowed or that have already failed.
Yeah. In hindsight I'm still not very happy with the code structure with
"allowed_enc_methods" and "current_enc_methods" and all that. The
fallback logic is still complicated. It's better than in v16, IMHO, but
still not great. This patch seems like the best fix for v17, but I
wouldn't mind another round of refactoring for v18, if anyone's got some
good ideas on how to structure it better. All these new tests are a
great asset when refactoring this again.
+ if (IsInjectionPointAttached("backend-initialize-v2-error")) + { + FrontendProtocol = PG_PROTOCOL(2,0); + elog(FATAL, "protocol version 2 error triggered"); + }This is an attempt to do stack manipulation with an injection point
set. FrontendProtocol is a global variable, so you could have a new
callback setting up this global variable directly, then FATAL (I
really don't mind is modules/injection_points finishes with a library
of callbacks).Not sure to like much this new IsInjectionPointAttached() that does a
search in the existing injection point pool, though. This leads to
more code footprint in the core backend, and I'm trying to minimize
that. Not everybody agrees with this view, I'd guess, which is also
fine.
Yeah, I'm also not too excited about the additional code in the backend,
but I'm also not excited about writing another test C module just for
this. I'm inclined to commit this as it is, but we can certainly revisit
this later, since it's just test code.
Here's a new rebased version with some minor cleanup. Notably, I added
docs for the new IS_INJECTION_POINT_ATTACHED() macro.
--
Heikki Linnakangas
Neon (https://neon.tech)