Experiments with Postgres and SSL
I had a conversation a while back with Heikki where he expressed that
it was annoying that we negotiate SSL/TLS the way we do since it
introduces an extra round trip. Aside from the performance
optimization I think accepting standard TLS connections would open the
door to a number of other opportunities that would be worth it on
their own.
So I took a look into what it would take to do and I think it would
actually be quite feasible. The first byte of a standard TLS
connection can't look anything like the first byte of any flavour of
Postgres startup packet because it would be the high order bits of the
length so unless we start having multi-megabyte startup packets....
So I put together a POC patch and it's working quite well and didn't
require very much kludgery. Well, it required some but it's really not
bad. I do have a bug I'm still trying to work out and the code isn't
quite in committable form but I can send the POC patch.
Other things it would open the door to in order from least
controversial to most....
* Hiding Postgres behind a standard SSL proxy terminating SSL without
implementing the Postgres protocol.
* "Service Mesh" type tools that hide multiple services behind a
single host/port ("Service Mesh" is just a new buzzword for "proxy").
* Browser-based protocol implementations using websockets for things
like pgadmin or other tools to connect directly to postgres using
Postgres wire protocol but using native SSL implementations.
* Postgres could even implement an HTTP based version of its protocol
and enable things like queries or browser based tools using straight
up HTTP requests so they don't need to use websockets.
* Postgres could implement other protocols to serve up data like
status queries or monitoring metrics, using HTTP based standard
protocols instead of using our own protocol.
Incidentally I find the logic in ProcessStartupPacket incredibly
confusing. It took me a while before I realized it's using tail
recursion to implement the startup logic. I think it would be way more
straightforward and extensible if it used a much more common iterative
style. I think it would make it possible to keep more state than just
ssl_done and gss_done without changing the function signature every
time for example.
--
greg
On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote:
So I took a look into what it would take to do and I think it would
actually be quite feasible. The first byte of a standard TLS
connection can't look anything like the first byte of any flavour of
Postgres startup packet because it would be the high order bits of the
length so unless we start having multi-megabyte startup packets....
This is a fascinating idea! I like it a lot.
But..do we have to treat any unknown start sequence of bytes as a TLS
connection? Or is there some definite subset of possible first bytes
that clearly indicates that this is a TLS connection or not?
Best regards, Andrey Borodin.
On Thu, 19 Jan 2023 at 00:45, Andrey Borodin <amborodin86@gmail.com> wrote:
But..do we have to treat any unknown start sequence of bytes as a TLS
connection? Or is there some definite subset of possible first bytes
that clearly indicates that this is a TLS connection or not?
Absolutely not, there's only one MessageType that can initiate a
connection, ClientHello, so the initial byte has to be a specific
value. (0x16)
And probably to implement HTTP/Websocket it would probably only peek
at the first byte and check for things like G(ET) and H(EAD) and so
on, possibly only over SSL but in theory it could be over any
connection if the request comes before the startup packet.
Personally I'm motivated by wanting to implement status and monitoring
data for things like Prometheus and the like. For that it would just
be simple GET queries to recognize. But tunneling pg wire protocol
over websockets sounds cool but not really something I know a lot
about. I note that Neon is doing something similar with a proxy:
https://neon.tech/blog/serverless-driver-for-postgres
--
greg
It would be great if PostgreSQL supported 'start with TLS', however, how
could clients activate the feature?
I would like to refrain users from configuring the handshake mode, and I
would like to refrain from degrading performance when a new client talks to
an old database.
What if the server that supports 'fast TLS' added an extra notification in
case client connects with a classic TLS?
Then a capable client could remember host:port and try with newer TLS
appoach the next time it connects.
It would be transparent to the clients, and the users won't need to
configure 'prefer classic or fast TLS'
The old clients could discard the notification.
Vladimir
--
Vladimir
On Thu, 19 Jan 2023 at 15:49, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:
What if the server that supports 'fast TLS' added an extra notification in case client connects with a classic TLS?
Then a capable client could remember host:port and try with newer TLS appoach the next time it connects.It would be transparent to the clients, and the users won't need to configure 'prefer classic or fast TLS'
The old clients could discard the notification.
Hm. I hadn't really thought about the case of a new client connecting
to an old server. I don't think it's worth implementing a code path in
the server like this as it would then become cruft that would be hard
to ever get rid of.
I think you can do the same thing, more or less, in the client. Like
if the driver tries to connect via SSL and gets an error it remembers
that host/port and connects using negotiation in the future.
In practice though, by the time drivers support this it'll probably be
far enough in the future that they can just enable it and you can
disable it if you're connecting to an old server. The main benefit for
the near term is going to be clients that are specifically designed to
take advantage of it because it's necessary to enable the environment
they need -- like monitoring tools and proxies.
I've attached the POC. It's not near committable, mainly because of
the lack of any proper interface to the added fields in Port. I
actually had a whole API but ripped it out while debugging because it
wasn't working out.
But here's an example of psql connecting to the same server via
negotiated SSL or through stunnel where stunnel establishes the SSL
connection and psql is just doing plain text:
stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:9432/postgres'
psql (16devel)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384,
compression: off)
Type "help" for help.
postgres=# select * from pg_stat_ssl;
pid | ssl | version | cipher | bits | client_dn |
client_serial | issuer_dn
-------+-----+---------+------------------------+------+-----------+---------------+-----------
48771 | t | TLSv1.3 | TLS_AES_256_GCM_SHA384 | 256 | |
|
(1 row)
postgres=# \q
stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:8999/postgres'
psql (16devel)
Type "help" for help.
postgres=# select * from pg_stat_ssl;
pid | ssl | version | cipher | bits | client_dn |
client_serial | issuer_dn
-------+-----+---------+------------------------+------+-----------+---------------+-----------
48797 | t | TLSv1.3 | TLS_AES_256_GCM_SHA384 | 256 | |
|
(1 row)
--
greg
Attachments:
v1-0001-Direct-SSL-Connections.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Direct-SSL-Connections.patchDownload+134-34
On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote:
I had a conversation a while back with Heikki where he expressed that
it was annoying that we negotiate SSL/TLS the way we do since it
introduces an extra round trip. Aside from the performance
optimization I think accepting standard TLS connections would open the
door to a number of other opportunities that would be worth it on
their own.
Nice! I want this too, but for security reasons [1]/messages/by-id/fcc3ebeb7f05775b63f3207ed52a54ea5d17fb42.camel@vmware.com -- I want to be
able to turn off negotiated (explicit) TLS, to force (implicit)
TLS-only mode.
Other things it would open the door to in order from least
controversial to most....* Hiding Postgres behind a standard SSL proxy terminating SSL without
implementing the Postgres protocol.
+1
* "Service Mesh" type tools that hide multiple services behind a
single host/port ("Service Mesh" is just a new buzzword for "proxy").
If you want to multiplex protocols on a port, now is an excellent time
to require clients to use ALPN on implicit-TLS connections. (There are
no clients that can currently connect via implicit TLS, so you'll
never have another chance to force the issue without breaking
backwards compatibility.) That should hopefully make it harder to
ALPACA yourself or others [2]https://alpaca-attack.com/.
ALPN doesn't prevent cross-port attacks though, and speaking of those...
* Browser-based protocol implementations using websockets for things
like pgadmin or other tools to connect directly to postgres using
Postgres wire protocol but using native SSL implementations.* Postgres could even implement an HTTP based version of its protocol
and enable things like queries or browser based tools using straight
up HTTP requests so they don't need to use websockets.* Postgres could implement other protocols to serve up data like
status queries or monitoring metrics, using HTTP based standard
protocols instead of using our own protocol.
I see big red warning lights going off in my head -- in a previous
life, I got to fix vulnerabilities that resulted from bolting HTTP
onto existing protocol servers. Not only do you opt into the browser
security model forever, you also gain the ability to speak for any
other web server already running on the same host.
(I know you have PG committers who are also HTTP experts, and I think
you were hacking on mod_perl well before I knew web servers existed.
Just... please be careful. ;D )
Incidentally I find the logic in ProcessStartupPacket incredibly
confusing. It took me a while before I realized it's using tail
recursion to implement the startup logic. I think it would be way more
straightforward and extensible if it used a much more common iterative
style. I think it would make it possible to keep more state than just
ssl_done and gss_done without changing the function signature every
time for example.
+1. The complexity of the startup logic, both client- and server-side,
is a big reason why I want implicit TLS in the first place. That way,
bugs in that code can't be exploited before the TLS handshake
completes.
Thanks!
--Jacob
[1]: /messages/by-id/fcc3ebeb7f05775b63f3207ed52a54ea5d17fb42.camel@vmware.com
[2]: https://alpaca-attack.com/
I don't think it's worth implementing a code path in
the server like this as it would then become cruft that would be hard
to ever get rid of.
Do you think the server can de-support the old code path soon?
I think you can do the same thing, more or less, in the client. Like
if the driver tries to connect via SSL and gets an error it remembers
that host/port and connects using negotiation in the future.
Well, I doubt everybody would instantaneously upgrade to the database that
supports fast TLS,
so there will be a timeframe when there will be a lot of old databases, and
the clients will be new.
In that case, going with "try fast, ignore exception" would degrade
performance for old databases.
I see you suggest caching, however, "degrading one of the cases" might be
more painful than
"not improving one of the cases".
I would like to refrain from implementing "parallel connect both ways
and check which is faster" in
PG clients (e.g. https://en.wikipedia.org/wiki/Happy_Eyeballs ).
Just wondering: do you consider back-porting the feature to all supported
DB versions?
In practice though, by the time drivers support this it'll probably be
far enough in the future
I think drivers release more often than the database, and we can get driver
support even before the database releases.
I'm from pgjdbc Java driver team, and I think it is unfair to suggest that
"driver support is only far enough in the future".
Vladimir
On Fri, 20 Jan 2023 at 01:41, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:
Do you think the server can de-support the old code path soon?
I don't have any intention to de-support anything. I really only
picture it being an option in environments where the client and server
are all part of a stack controlled by a single group. User tools and
general purpose tools are better served by our current more flexible
setup.
Just wondering: do you consider back-porting the feature to all supported DB versions?
I can't see that, no.
In practice though, by the time drivers support this it'll probably be
far enough in the futureI think drivers release more often than the database, and we can get driver support even before the database releases.
I'm from pgjdbc Java driver team, and I think it is unfair to suggest that "driver support is only far enough in the future".
Interesting. I didn't realize this would be so attractive to regular
driver authors. I did think of the Happy Eyeballs technique too but I
agree I wouldn't want to go that way either :)
I guess the server doesn't really have to do anything specific to do
what you want. You could just hard code that servers newer than a
specific version would have this support. Or it could be done with a
"protocol option" -- which wouldn't actually change any behaviour but
would be rejected if the server doesn't support "fast ssl" giving you
the feedback you expect without having much extra legacy complexity.
I guess a lot depends on the way the driver works and the way the
application is structured. Applications that make a single connection
or don't have shared state across connections wouldn't think this way.
And interfaces like libpq would normally just leave it up to the
application to make choices like this. But I guess JVM based
applications are more likely to have long-lived systems that make many
connections and also more likely to make it the driver's
responsibility to manage such things.
--
greg
You could just hard code that servers newer than a
specific version would have this support
Suppose PostgreSQL 21 implements "fast TLS"
Suppose pgjdbc 43 supports "fast TLS"
Suppose PgBouncer 1.17.0 does not support "fast TLS" yet
If pgjdbc connects to the DB via balancer, then the server would
respond with "server_version=21".
The balancer would forward "server_version", so the driver would
assume "fast TLS is supported".
In practice, fast TLS can't be used in that configuration since the
connection will fail when the driver attempts to ask
"fast TLS" from the PgBouncer.
Or it could be done with a "protocol option"
Would you please clarify what you mean by "protocol option"?
I guess a lot depends on the way the driver works and the way the
application is structured
There are cases when applications pre-create connections on startup,
so the faster connections are created the better.
The same case happens when the admin issues "reset connection pool",
so it discards old connections and creates new ones.
People rarely know all the knobs, so I would like to have a "fast by
default" design (e.g. server sending a notification "you may use fast
mode the next time")
rather than "keep old behaviour and require everybody to add fast=true
to their configuration" (e.g. users having to configure
"try_fast_tls_first=true")
Vladimir
On 20/01/2023 03:28, Jacob Champion wrote:
On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote:
* "Service Mesh" type tools that hide multiple services behind a
single host/port ("Service Mesh" is just a new buzzword for "proxy").If you want to multiplex protocols on a port, now is an excellent time
to require clients to use ALPN on implicit-TLS connections. (There are
no clients that can currently connect via implicit TLS, so you'll
never have another chance to force the issue without breaking
backwards compatibility.) That should hopefully make it harder to
ALPACA yourself or others [2].
Good idea. Do we want to just require the protocol to be "postgres", or
perhaps "postgres/3.0"? Need to register that with IANA, I guess.
We implemented a protocol version negotiation mechanism in the libpq
protocol itself, how would this interact with it? If it's just
"postgres", then I guess we'd still negotiate the protocol version and
list of extensions after the TLS handshake.
Incidentally I find the logic in ProcessStartupPacket incredibly
confusing. It took me a while before I realized it's using tail
recursion to implement the startup logic. I think it would be way more
straightforward and extensible if it used a much more common iterative
style. I think it would make it possible to keep more state than just
ssl_done and gss_done without changing the function signature every
time for example.+1. The complexity of the startup logic, both client- and server-side,
is a big reason why I want implicit TLS in the first place. That way,
bugs in that code can't be exploited before the TLS handshake
completes.
+1. We need to support explicit TLS for a long time, so we can't
simplify by just removing it. But let's refactor the code somehow, to
make it more clear.
Looking at the patch, I think it accepts an SSLRequest packet even if
implicit TLS has already been established. That's surely wrong, and
shows how confusing the code is. (Or I'm reading it incorrectly, which
also shows how confusing it is :-) )
Regarding Vladimir's comments on how clients can migrate to this, I
don't have any great suggestions. To summarize, there are several options:
- Add an "fast_tls" option that the user can enable if they know the
server supports it
- First connect in old-fashioned way, and remember the server version.
Later, if you reconnect to the same server, use implicit TLS if the
server version was high enough. This would be most useful for connection
pools.
- Connect both ways at the same time, and continue with the fastest,
i.e. "happy eyeballs"
- Try implicit TLS first, and fall back to explicit TLS if it fails.
For libpq, we don't necessarily need to do anything right now. We can
add the implicit TLS support in a later version. Not having libpq
support makes it hard to test the server codepath, though. Maybe just
test it with 'stunnel' or 'openssl s_client'.
- Heikki
On Wed, Feb 22, 2023 at 4:26 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 20/01/2023 03:28, Jacob Champion wrote:
If you want to multiplex protocols on a port, now is an excellent time
to require clients to use ALPN on implicit-TLS connections. (There are
no clients that can currently connect via implicit TLS, so you'll
never have another chance to force the issue without breaking
backwards compatibility.) That should hopefully make it harder to
ALPACA yourself or others [2].Good idea. Do we want to just require the protocol to be "postgres", or
perhaps "postgres/3.0"? Need to register that with IANA, I guess.
Unless you plan to make the next minor protocol version fundamentally
incompatible, I don't think there's much reason to add '.0'. (And even
if that does happen, 'postgres/3.1' is still distinct from
'postgres/3'. Or 'postgres' for that matter.) The Expert Review
process might provide some additional guidance?
We implemented a protocol version negotiation mechanism in the libpq
protocol itself, how would this interact with it? If it's just
"postgres", then I guess we'd still negotiate the protocol version and
list of extensions after the TLS handshake.
Yeah. You could choose to replace major version negotiation completely
with ALPN, I suppose, but there might not be any maintenance benefit
if you still have to support plaintext negotiation. Maybe there are
performance implications to handling the negotiation earlier vs.
later?
Note that older versions of TLS will expose the ALPN in plaintext...
but that may not be a factor by the time a postgres/4 shows up, and if
the next protocol is incompatible then it may not be feasible to hide
the differences via transport encryption anyway.
Regarding Vladimir's comments on how clients can migrate to this, I
don't have any great suggestions. To summarize, there are several options:- Add an "fast_tls" option that the user can enable if they know the
server supports it
I like that such an option could eventually be leveraged for a
postgresqls:// URI scheme (which should not fall back, ever). There
would be other things we'd have to change first to make that a reality
-- postgresqls://example.com?host=evil.local is problematic, for
example -- but it'd be really nice to have an HTTPS equivalent.
--Jacob
On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 20/01/2023 03:28, Jacob Champion wrote:
On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote:
* "Service Mesh" type tools that hide multiple services behind a
single host/port ("Service Mesh" is just a new buzzword for "proxy").If you want to multiplex protocols on a port, now is an excellent time
to require clients to use ALPN on implicit-TLS connections. (There are
no clients that can currently connect via implicit TLS, so you'll
never have another chance to force the issue without breaking
backwards compatibility.) That should hopefully make it harder to
ALPACA yourself or others [2].Good idea. Do we want to just require the protocol to be "postgres", or
perhaps "postgres/3.0"? Need to register that with IANA, I guess.
I had never heard of this before, it does seem useful. But if I
understand it right it's entirely independent of this patch. We can
add it to all our Client/Server exchanges whether they're the initial
direct SSL connection or the STARTTLS negotiation?
We implemented a protocol version negotiation mechanism in the libpq
protocol itself, how would this interact with it? If it's just
"postgres", then I guess we'd still negotiate the protocol version and
list of extensions after the TLS handshake.Incidentally I find the logic in ProcessStartupPacket incredibly
confusing. It took me a while before I realized it's using tail
recursion to implement the startup logic. I think it would be way more
straightforward and extensible if it used a much more common iterative
style. I think it would make it possible to keep more state than just
ssl_done and gss_done without changing the function signature every
time for example.+1. The complexity of the startup logic, both client- and server-side,
is a big reason why I want implicit TLS in the first place. That way,
bugs in that code can't be exploited before the TLS handshake
completes.+1. We need to support explicit TLS for a long time, so we can't
simplify by just removing it. But let's refactor the code somehow, to
make it more clear.Looking at the patch, I think it accepts an SSLRequest packet even if
implicit TLS has already been established. That's surely wrong, and
shows how confusing the code is. (Or I'm reading it incorrectly, which
also shows how confusing it is :-) )
I'll double check it but I think I tested that that wasn't the case. I
think it accepts the SSL request packet and sends back an N which the
client libpq just interprets as the server not supporting SSL and does
an unencrypted connection (which is tunneled over stunnel unbeknownst
to libpq).
I agree I would want to flatten this logic to an iterative approach
but having wrapped my head around it now I'm not necessarily rushing
to do it now. The main advantage of flattening it would be to make it
easy to support other protocol types which I think could be really
interesting. It would be much clearer to document the state machine if
all the state is in one place and the code just loops through
processing startup packets and going to a new state until the
connection is established. That's true now but you have to understand
how the state is passed in the function parameters and notice that all
the recursion is tail recursive (I think). And extending that state
would require extending the function signature which would get awkward
quickly.
Regarding Vladimir's comments on how clients can migrate to this, I
don't have any great suggestions. To summarize, there are several options:- Add an "fast_tls" option that the user can enable if they know the
server supports it- First connect in old-fashioned way, and remember the server version.
Later, if you reconnect to the same server, use implicit TLS if the
server version was high enough. This would be most useful for connection
pools.
Vladimir pointed out that this doesn't necessarily work. The server
may be new enough to support it but it could be behind a proxy like
pgbouncer or something. The same would be true if the server reported
a "connection option" instead of depending on version.
- Connect both ways at the same time, and continue with the fastest,
i.e. "happy eyeballs"
That seems way too complex for us to bother with imho.
- Try implicit TLS first, and fall back to explicit TLS if it fails.
For libpq, we don't necessarily need to do anything right now. We can
add the implicit TLS support in a later version. Not having libpq
support makes it hard to test the server codepath, though. Maybe just
test it with 'stunnel' or 'openssl s_client'.
I think we should have an option to explicitly enable it in psql, if
only for testing. And then wait five years and switch the default on
it then. In the meantime users can just set it based on their setup.
That's not the way to the quickest adoption but imho the main
advantages of this option are the options it gives users, not the
latency improvement, so I'm not actually super concerned about
adoption rate.
I assume we'll keep the negotiated mode indefinitely because it can
handle any other protocols we might want. For instance, it currently
handles GSSAPI -- which raises the question, are we happy with GSSAPI
having this extra round trip? Is there a similar change we could make
for it? My understanding is that GSSAPI is an abstract interface and
the actual protocol it's invoking could be anything so we can't make
any assumptions about what the first packet looks like. Perhaps we can
do something about pipelining GSSAPI messages so if the negotiation
fails the server just closes the connection but if it accepts it it
does a similar trick with unreading the buffered data and processing
it through the GSSAPI calls.
--
greg
On Tue, Feb 28, 2023 at 10:33 AM Greg Stark <stark@mit.edu> wrote:
On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Good idea. Do we want to just require the protocol to be "postgres", or
perhaps "postgres/3.0"? Need to register that with IANA, I guess.I had never heard of this before, it does seem useful. But if I
understand it right it's entirely independent of this patch.
It can be. If you want to use it in the strongest possible way,
though, you'd have to require its use by clients. Introducing that
requirement later would break existing ones, so I think it makes sense
to do it at the same time as the initial implementation, if there's
interest.
We can
add it to all our Client/Server exchanges whether they're the initial
direct SSL connection or the STARTTLS negotiation?
I'm not sure it would buy you anything during the STARTTLS-style
opening. You already know what protocol you're speaking in that case.
(So with the ALPACA example, the damage is already done.)
Thanks,
--Jacob
Here's an updated patch for direct SSL connections.
I've added libpq client support with a new connection parameter. This
allows testing it easily with psql. It's still a bit hard to see
what's going on though. I'm thinking it would be good to have libpq
keep a string which describes what negotiations were attempted and
failed and what was eventually accepted which psql could print with
the SSL message or expose some other way.
In the end I didn't see how adding an API for this really helped any
more than just saying the API is to stuff the unread data into the
Port structure. So I just documented that. If anyone has any better
idea...
I added documentation for the libpq connection setting.
One thing, I *think* it's ok to replace the send(2) call with
secure_write() in the negotiation. It does mean it's possible for the
connection to fail with FATAL at that point instead of COMMERROR but I
don't think that's a problem.
I haven't added tests. I'm not sure how to test this since to test it
properly means running the server with every permutation of ssl and
gssapi configurations.
Incidentally, some of the configuration combinations -- namely
sslnegotiation=direct and default gssencmode and sslmode results in a
counter-intuitive behaviour. But I don't see a better option that
doesn't mean making the defaults less useful.
Attachments:
v2-0003-Direct-SSL-connections-documentation.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Direct-SSL-connections-documentation.patchDownload+93-10
v2-0001-Direct-SSL-connections-postmaster-support.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Direct-SSL-connections-postmaster-support.patchDownload+134-34
v2-0002-Direct-SSL-connections-client-support.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Direct-SSL-connections-client-support.patchDownload+88-7
Here's a first cut at ALPN support.
Currently it's using a hard coded "Postgres/3.0" protocol (hard coded
both in the client and the server...). And it's hard coded to be
required for direct connections and supported but not required for
regular connections.
IIRC I put a variable labeled a "GUC" but forgot to actually make it a
GUC. But I'm thinking of maybe removing that variable since I don't
see much of a use case for controlling this manually. I *think* ALPN
is supported by all the versions of OpenSSL we support.
The other patches are unchanged (modulo a free() that I missed in the
client before). They still have the semi-open issues I mentioned in
the previous email.
--
greg
Attachments:
v3-0001-Direct-SSL-connections-postmaster-support.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Direct-SSL-connections-postmaster-support.patchDownload+134-34
v3-0002-Direct-SSL-connections-client-support.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Direct-SSL-connections-client-support.patchDownload+89-7
v3-0003-Direct-SSL-connections-documentation.patchtext/x-patch; charset=US-ASCII; name=v3-0003-Direct-SSL-connections-documentation.patchDownload+93-10
v3-0004-alpn-support.patchtext/x-patch; charset=US-ASCII; name=v3-0004-alpn-support.patchDownload+129-3
Sorry, checking the cfbot apparently I had a typo in the #ifndef USE_SSL case.
Attachments:
v4-0004-alpn-support.patchtext/x-patch; charset=US-ASCII; name=v4-0004-alpn-support.patchDownload+129-3
v4-0003-Direct-SSL-connections-documentation.patchtext/x-patch; charset=US-ASCII; name=v4-0003-Direct-SSL-connections-documentation.patchDownload+93-10
v4-0002-Direct-SSL-connections-client-support.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Direct-SSL-connections-client-support.patchDownload+89-7
v4-0001-Direct-SSL-connections-postmaster-support.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Direct-SSL-connections-postmaster-support.patchDownload+134-34
On Mon, 20 Mar 2023 at 16:31, Greg Stark <stark@mit.edu> wrote:
Here's a first cut at ALPN support.
Currently it's using a hard coded "Postgres/3.0" protocol
Apparently that is explicitly disrecommended by the IETF folk. They
want something like "TBD" so people don't start using a string until
it's been added to the registry. So I've changed this for now (to
"TBD-pgsql")
Ok, I think this has pretty much everything I was hoping to do.
The one thing I'm not sure of is it seems some codepaths in postmaster
have ereport(COMMERROR) followed by returning an error whereas other
codepaths just have ereport(FATAL). And I don't actually see much
logic in which do which. (I get the principle behind COMMERR it just
seems like it doesn't really match the code).
I realized I had exactly the infrastructure needed to allow pipelining
the SSL ClientHello like Neon wanted to do so I added that too. It's
kind of redundant with direct SSL connections but seems like there may
be reasons to use that instead.
--
greg
Attachments:
v5-0002-Direct-SSL-connections-client-support.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Direct-SSL-connections-client-support.patchDownload+89-7
v5-0006-Some-added-docs.patchtext/x-patch; charset=US-ASCII; name=v5-0006-Some-added-docs.patchDownload+37-1
v5-0004-alpn-support.patchtext/x-patch; charset=US-ASCII; name=v5-0004-alpn-support.patchDownload+166-3
v5-0005-Allow-pipelining-data-after-ssl-request.patchtext/x-patch; charset=US-ASCII; name=v5-0005-Allow-pipelining-data-after-ssl-request.patchDownload+42-13
v5-0003-Direct-SSL-connections-documentation.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Direct-SSL-connections-documentation.patchDownload+93-10
v5-0001-Direct-SSL-connections-postmaster-support.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Direct-SSL-connections-postmaster-support.patchDownload+134-34
And the cfbot wants a small tweak
Attachments:
v6-0005-Allow-pipelining-data-after-ssl-request.patchtext/x-patch; charset=US-ASCII; name=v6-0005-Allow-pipelining-data-after-ssl-request.patchDownload+42-13
v6-0004-Direct-SSL-connections-ALPN-support.patchtext/x-patch; charset=US-ASCII; name=v6-0004-Direct-SSL-connections-ALPN-support.patchDownload+167-3
v6-0002-Direct-SSL-connections-client-support.patchtext/x-patch; charset=US-ASCII; name=v6-0002-Direct-SSL-connections-client-support.patchDownload+89-7
v6-0003-Direct-SSL-connections-documentation.patchtext/x-patch; charset=US-ASCII; name=v6-0003-Direct-SSL-connections-documentation.patchDownload+93-10
v6-0006-Direct-SSL-connections-some-additional-docs.patchtext/x-patch; charset=US-ASCII; name=v6-0006-Direct-SSL-connections-some-additional-docs.patchDownload+37-1
v6-0001-Direct-SSL-connections-postmaster-support.patchtext/x-patch; charset=US-ASCII; name=v6-0001-Direct-SSL-connections-postmaster-support.patchDownload+134-34
On 31/03/2023 10:59, Greg Stark wrote:
IIRC I put a variable labeled a "GUC" but forgot to actually make it a
GUC. But I'm thinking of maybe removing that variable since I don't
see much of a use case for controlling this manually. I *think* ALPN
is supported by all the versions of OpenSSL we support.
+1 on removing the variable. Let's make ALPN mandatory for direct SSL
connections, like Jacob suggested. And for old-style handshakes, accept
and check ALPN if it's given.
I don't see the point of the libpq 'sslalpn' option either. Let's send
ALPN always.
Admittedly having the options make testing different of combinations of
old and new clients and servers a little easier. But I don't think we
should add options for the sake of backwards compatibility tests.
--- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -1126,13 +1126,16 @@ pq_discardbytes(size_t len) /* -------------------------------- * pq_buffer_has_data - is any buffered data available to read? * - * This will *not* attempt to read more data. + * Actually returns the number of bytes in the buffer... + * + * This will *not* attempt to read more data. And reading up to that number of + * bytes should not cause reading any more data either. * -------------------------------- */ -bool +size_t pq_buffer_has_data(void) { - return (PqRecvPointer < PqRecvLength); + return (PqRecvLength - PqRecvPointer); }
Let's rename the function.
/* push unencrypted buffered data back through SSL setup */
len = pq_buffer_has_data();
if (len > 0)
{
buf = palloc(len);
if (pq_getbytes(buf, len) == EOF)
return STATUS_ERROR; /* shouldn't be possible */
port->raw_buf = buf;
port->raw_buf_remaining = len;
port->raw_buf_consumed = 0;
}Assert(pq_buffer_has_data() == 0);
if (secure_open_server(port) == -1)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
errmsg("SSL Protocol Error during direct SSL connection initiation")));
return STATUS_ERROR;
}if (port->raw_buf_remaining > 0)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
errmsg("received unencrypted data after SSL request"),
errdetail("This could be either a client-software bug or evidence of an attempted man-in-the-middle attack.")));
return STATUS_ERROR;
}
if (port->raw_buf)
pfree(port->raw_buf);
This pattern is repeated in both callers of secure_open_server(). Could
we move this into secure_open_server() itself? That would feel pretty
natural, be-secure.c already contains the secure_raw_read() function
that reads the 'raw_buf' field.
const char *
PQsslAttribute(PGconn *conn, const char *attribute_name)
{
...if (strcmp(attribute_name, "alpn") == 0)
{
const unsigned char *data;
unsigned int len;
static char alpn_str[256]; /* alpn doesn't support longer than 255 bytes */
SSL_get0_alpn_selected(conn->ssl, &data, &len);
if (data == NULL || len==0 || len > sizeof(alpn_str)-1)
return NULL;
memcpy(alpn_str, data, len);
alpn_str[len] = 0;
return alpn_str;
}
Using a static buffer doesn't look right. If you call PQsslAttribute on
two different connections from two different threads concurrently, they
will write to the same buffer. I see that you copied it from the
"key_bits" handlng, but it has the same issue.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Tue, Jul 04, 2023 at 05:15:49PM +0300, Heikki Linnakangas wrote:
I don't see the point of the libpq 'sslalpn' option either. Let's send ALPN
always.Admittedly having the options make testing different of combinations of old
and new clients and servers a little easier. But I don't think we should add
options for the sake of backwards compatibility tests.
Hmm. I would actually argue in favor of having these with tests in
core to stress the previous SSL hanshake protocol, as not having these
parameters would mean that we rely only on major version upgrades in
the buildfarm to test the backward-compatible code path, making issues
much harder to catch. And we still need to maintain the
backward-compatible path for 10 years based on what pg_dump and
pg_upgrade need to support.
--
Michael