[PATCH] Avoid mixing custom and OpenSSL BIO functions

Started by David Benjaminalmost 2 years ago21 messages
#1David Benjamin
davidben@google.com
1 attachment(s)

Hi all,

I've attached a patch for the master branch to fix up the custom BIOs used
by PostgreSQL, in light of the issues with the OpenSSL update recently.
While c82207a548db47623a2bfa2447babdaa630302b9 (switching from BIO_get_data
to BIO_get_app_data) resolved the immediate conflict, I don't think it
addressed the root cause, which is that PostgreSQL was mixing up two BIO
implementations, each of which expected to be driving the BIO.

It turns out the parts that came from the OpenSSL socket BIO were a no-op,
and in fact PostgreSQL is relying on it being a no-op. Instead, it's
cleaner to just define a custom BIO the normal way, which then leaves the
more standard BIO_get_data mechanism usable. This also avoids the risk that
a future OpenSSL will add a now BIO_ctrl to the socket type, with libssl
calling into it, and then break some assumptions made by PostgreSQL.

I've attached a patch which does that. The existing SSL tests pass with
it, tested on Debian stable. (Though it took me a few iterations to figure
out how to run the SSL tests, so it's possible I've missed something.)

The patch is not expected to change behavior, so nothing new to document,
nor any expected performance impact.

David

Attachments:

avoid-mixing-custom-and-openssl-bios-v1.patchapplication/octet-stream; name=avoid-mixing-custom-and-openssl-bios-v1.patchDownload
From dc16713c5afc952f4a68e9a3fae56aa6b2e9bd74 Mon Sep 17 00:00:00 2001
From: David Benjamin <davidben@google.com>
Date: Sun, 11 Feb 2024 10:42:25 -0500
Subject: [PATCH] Avoid mixing custom and OpenSSL BIO functions

This addresses the root cause of the BIO_set_data conflict that
c82207a548db47623a2bfa2447babdaa630302b9 attempted to address. That fix was
really just a workaround. The real root cause was that postgres was mixing up
two BIO implementations, each of which expected to be driving the BIO.

Mixing them up did not actually do any good. The Port and PGconn structures
already maintained the file descriptors. The socket BIO's copy, configured via
BIO_set_fd, wasn't even being used because my_BIO_s_socket replaced read and
write anyway. We've been essentially keeping extra state around, and relying
on it being unused:

- gets - Not implemented by sockets and not used by libssl.
- puts - Not used by libssl. If it were, it would break the special SIGPIPE and
  interrupt handling postgres aiming for.
- ctrl - (More on this later)
- create - This is just setting up state that we don't use.
- destroy - This is a no-op because we use BIO_NOCLOSE. In fact it's important
  that it's a no-op because otherwise OpenSSL would close the socket under
  postgres' feet!
- callback_ctrl - Not implemented by sockets.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a no-op. All
other operations are unused. It's once again good that they're unused because
otherwise OpenSSL might mess with postgres's socket, break the assumptions
around interrupt handling, etc.

Instead, simply implement a very basic ctrl ourselves and drop the other
functions. This avoids the risk that future OpenSSL upgrades add some feature
to BIO_s_socket's ctrl which conflicts with postgres.

Once we've done that, we're free to use BIO_set_data. While BIO_set_app_data
works fine, I've reverted back to BIO_set_data because it's more commonly used.
app_data depends on OpenSSL's "ex_data" mechanism, which is a tad heavier under
the hood.
---
 configure                                |  2 +-
 configure.ac                             |  2 +-
 meson.build                              |  1 +
 src/backend/libpq/be-secure-openssl.c    | 48 ++++++++++++++++--------
 src/include/pg_config.h.in               |  3 ++
 src/interfaces/libpq/fe-secure-openssl.c | 48 ++++++++++++++++--------
 6 files changed, 72 insertions(+), 32 deletions(-)

diff --git a/configure b/configure
index 2a1ee251f2..2b1f83b8b1 100755
--- a/configure
+++ b/configure
@@ -12870,7 +12870,7 @@ done
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  for ac_func in OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
+  for ac_func in OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 52fd7af446..be46d9a6fc 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1374,7 +1374,7 @@ if test "$with_ssl" = openssl ; then
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
+  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
   # OpenSSL versions before 1.1.0 required setting callback functions, for
   # thread-safety. In 1.1.0, it's no longer required, and CRYPTO_lock()
   # function was removed.
diff --git a/meson.build b/meson.build
index 8ed51b6aae..ba133b6e11 100644
--- a/meson.build
+++ b/meson.build
@@ -1290,6 +1290,7 @@ if sslopt in ['auto', 'openssl']
       # doesn't have these OpenSSL 1.1.0 functions. So check for individual
       # functions.
       ['OPENSSL_init_ssl'],
+      ['BIO_get_data'],
       ['BIO_meth_new'],
       ['ASN1_STRING_get0_data'],
       ['HMAC_CTX_new'],
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index e12b1cc9e3..68697eb4ce 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -59,7 +59,7 @@ openssl_tls_init_hook_typ openssl_tls_init_hook = default_openssl_tls_init;
 static int	my_sock_read(BIO *h, char *buf, int size);
 static int	my_sock_write(BIO *h, const char *buf, int size);
 static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(Port *port, int fd);
+static int	my_SSL_set_fd(Port *port);
 
 static DH  *load_dh_file(char *filename, bool isServerStart);
 static DH  *load_dh_buffer(const char *buffer, size_t len);
@@ -440,7 +440,7 @@ be_tls_open_server(Port *port)
 						SSLerrmessage(ERR_get_error()))));
 		return -1;
 	}
-	if (!my_SSL_set_fd(port, port->sock))
+	if (!my_SSL_set_fd(port))
 	{
 		ereport(COMMERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -849,6 +849,11 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
+#ifndef HAVE_BIO_GET_DATA
+#define BIO_get_data(bio) (bio->ptr)
+#define BIO_set_data(bio, data) (bio->ptr = data)
+#endif
+
 static BIO_METHOD *my_bio_methods = NULL;
 
 static int
@@ -858,7 +863,7 @@ my_sock_read(BIO *h, char *buf, int size)
 
 	if (buf != NULL)
 	{
-		res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
+		res = secure_raw_read(((Port *) BIO_get_data(h)), buf, size);
 		BIO_clear_retry_flags(h);
 		if (res <= 0)
 		{
@@ -878,7 +883,7 @@ my_sock_write(BIO *h, const char *buf, int size)
 {
 	int			res = 0;
 
-	res = secure_raw_write(((Port *) BIO_get_app_data(h)), buf, size);
+	res = secure_raw_write(((Port *) BIO_get_data(h)), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
@@ -892,12 +897,27 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+my_sock_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res = 0;
+
+	switch (cmd)
+	{
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
 my_BIO_s_socket(void)
 {
 	if (!my_bio_methods)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 #ifdef HAVE_BIO_METH_NEW
 		int			my_bio_index;
 
@@ -910,12 +930,7 @@ my_BIO_s_socket(void)
 			return NULL;
 		if (!BIO_meth_set_write(my_bio_methods, my_sock_write) ||
 			!BIO_meth_set_read(my_bio_methods, my_sock_read) ||
-			!BIO_meth_set_gets(my_bio_methods, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(my_bio_methods, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(my_bio_methods, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(my_bio_methods, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(my_bio_methods, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(my_bio_methods, BIO_meth_get_callback_ctrl(biom)))
+			!BIO_meth_set_ctrl(my_bio_methods, my_sock_ctrl))
 		{
 			BIO_meth_free(my_bio_methods);
 			my_bio_methods = NULL;
@@ -925,9 +940,12 @@ my_BIO_s_socket(void)
 		my_bio_methods = malloc(sizeof(BIO_METHOD));
 		if (!my_bio_methods)
 			return NULL;
-		memcpy(my_bio_methods, biom, sizeof(BIO_METHOD));
+		memset(my_bio_methods, 0, sizeof(BIO_METHOD));
+		my_bio_methods->type = BIO_TYPE_SOCKET;
+		my_bio_methods->name = "libpq socket";
 		my_bio_methods->bread = my_sock_read;
 		my_bio_methods->bwrite = my_sock_write;
+		my_bio_methods->ctrl = my_sock_ctrl;
 #endif
 	}
 	return my_bio_methods;
@@ -935,7 +953,7 @@ my_BIO_s_socket(void)
 
 /* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(Port *port, int fd)
+my_SSL_set_fd(Port *port)
 {
 	int			ret = 0;
 	BIO		   *bio;
@@ -954,9 +972,9 @@ my_SSL_set_fd(Port *port, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, port);
+	BIO_set_data(bio, port);
+	BIO_set_init(bio, 1);
 
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	SSL_set_bio(port->ssl, bio, bio);
 	ret = 1;
 err:
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 07e73567dc..c7b88945fa 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -66,6 +66,9 @@
 /* Define to 1 if you have the `backtrace_symbols' function. */
 #undef HAVE_BACKTRACE_SYMBOLS
 
+/* Define to 1 if you have the `BIO_get_data' function. */
+#undef HAVE_BIO_GET_DATA
+
 /* Define to 1 if you have the `BIO_meth_new' function. */
 #undef HAVE_BIO_METH_NEW
 
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 8110882262..52b23638f0 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -81,7 +81,7 @@ static int	PQssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 static int	my_sock_read(BIO *h, char *buf, int size);
 static int	my_sock_write(BIO *h, const char *buf, int size);
 static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(PGconn *conn, int fd);
+static int	my_SSL_set_fd(PGconn *conn);
 
 
 static bool pq_init_ssl_lib = true;
@@ -1191,7 +1191,7 @@ initialize_SSL(PGconn *conn)
 	 */
 	if (!(conn->ssl = SSL_new(SSL_context)) ||
 		!SSL_set_app_data(conn->ssl, conn) ||
-		!my_SSL_set_fd(conn, conn->sock))
+		!my_SSL_set_fd(conn))
 	{
 		char	   *err = SSLerrmessage(ERR_get_error());
 
@@ -1802,6 +1802,11 @@ PQsslAttribute(PGconn *conn, const char *attribute_name)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
+#ifndef HAVE_BIO_GET_DATA
+#define BIO_get_data(bio) (bio->ptr)
+#define BIO_set_data(bio, data) (bio->ptr = data)
+#endif
+
 /* protected by ssl_config_mutex */
 static BIO_METHOD *my_bio_methods;
 
@@ -1810,7 +1815,7 @@ my_sock_read(BIO *h, char *buf, int size)
 {
 	int			res;
 
-	res = pqsecure_raw_read((PGconn *) BIO_get_app_data(h), buf, size);
+	res = pqsecure_raw_read((PGconn *) BIO_get_data(h), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res < 0)
 	{
@@ -1840,7 +1845,7 @@ my_sock_write(BIO *h, const char *buf, int size)
 {
 	int			res;
 
-	res = pqsecure_raw_write((PGconn *) BIO_get_app_data(h), buf, size);
+	res = pqsecure_raw_write((PGconn *) BIO_get_data(h), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res < 0)
 	{
@@ -1865,6 +1870,22 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+my_sock_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res = 0;
+
+	switch (cmd)
+	{
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
 my_BIO_s_socket(void)
 {
@@ -1877,7 +1898,6 @@ my_BIO_s_socket(void)
 
 	if (!my_bio_methods)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 #ifdef HAVE_BIO_METH_NEW
 		int			my_bio_index;
 
@@ -1895,12 +1915,7 @@ my_BIO_s_socket(void)
 		 */
 		if (!BIO_meth_set_write(res, my_sock_write) ||
 			!BIO_meth_set_read(res, my_sock_read) ||
-			!BIO_meth_set_gets(res, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(res, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(res, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(res, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(res, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(res, BIO_meth_get_callback_ctrl(biom)))
+			!BIO_meth_set_ctrl(res, my_sock_ctrl))
 		{
 			goto err;
 		}
@@ -1908,9 +1923,12 @@ my_BIO_s_socket(void)
 		res = malloc(sizeof(BIO_METHOD));
 		if (!res)
 			goto err;
-		memcpy(res, biom, sizeof(BIO_METHOD));
+		memset(res, 0, sizeof(BIO_METHOD));
+		res->type = BIO_TYPE_SOCKET;
+		res->name = "libpq socket";
 		res->bread = my_sock_read;
 		res->bwrite = my_sock_write;
+		res->ctrl = my_sock_ctrl;
 #endif
 	}
 
@@ -1932,7 +1950,7 @@ err:
 
 /* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(PGconn *conn, int fd)
+my_SSL_set_fd(PGconn *conn)
 {
 	int			ret = 0;
 	BIO		   *bio;
@@ -1950,10 +1968,10 @@ my_SSL_set_fd(PGconn *conn, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, conn);
+	BIO_set_data(bio, conn);
+	BIO_set_init(bio, 1);
 
 	SSL_set_bio(conn->ssl, bio, bio);
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	ret = 1;
 err:
 	return ret;
-- 
2.39.2

#2Daniel Gustafsson
daniel@yesql.se
In reply to: David Benjamin (#1)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 11 Feb 2024, at 19:19, David Benjamin <davidben@google.com> wrote:

Hi all,

I've attached a patch for the master branch to fix up the custom BIOs used by PostgreSQL, in light of the issues with the OpenSSL update recently. While c82207a548db47623a2bfa2447babdaa630302b9 (switching from BIO_get_data to BIO_get_app_data) resolved the immediate conflict, I don't think it addressed the root cause, which is that PostgreSQL was mixing up two BIO implementations, each of which expected to be driving the BIO.

Thanks for your patch, I'm still digesting it so will provide more of a review
later.

It turns out the parts that came from the OpenSSL socket BIO were a no-op, and in fact PostgreSQL is relying on it being a no-op. Instead, it's cleaner to just define a custom BIO the normal way, which then leaves the more standard BIO_get_data mechanism usable. This also avoids the risk that a future OpenSSL will add a now BIO_ctrl to the socket type, with libssl calling into it, and then break some assumptions made by PostgreSQL.

+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;

Will this always be true? Returning 1 implies that we have flushed all data on
the socket, but what if we just before called BIO_set_retry..XX()?

I've attached a patch which does that. The existing SSL tests pass with it, tested on Debian stable. (Though it took me a few iterations to figure out how to run the SSL tests, so it's possible I've missed something.)

We've done a fair bit of work on making them easier to run, so I'm curious if
you saw any room for improvements there as someone coming to them for the first
time?

--
Daniel Gustafsson

#3David Benjamin
davidben@google.com
In reply to: Daniel Gustafsson (#2)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On Mon, Feb 12, 2024 at 9:38 AM Daniel Gustafsson <daniel@yesql.se> wrote:

On 11 Feb 2024, at 19:19, David Benjamin <davidben@google.com> wrote:
It turns out the parts that came from the OpenSSL socket BIO were a

no-op, and in fact PostgreSQL is relying on it being a no-op. Instead, it's
cleaner to just define a custom BIO the normal way, which then leaves the
more standard BIO_get_data mechanism usable. This also avoids the risk that
a future OpenSSL will add a now BIO_ctrl to the socket type, with libssl
calling into it, and then break some assumptions made by PostgreSQL.

+               case BIO_CTRL_FLUSH:
+                       /* libssl expects all BIOs to support BIO_flush. */
+                       res = 1;
+                       break;

Will this always be true? Returning 1 implies that we have flushed all
data on
the socket, but what if we just before called BIO_set_retry..XX()?

The real one is also just a no-op. :-)
https://github.com/openssl/openssl/blob/master/crypto/bio/bss_sock.c#L215

This is used in places like buffer BIO or the FILE* BIO, where BIO_write
might accept data, but stage it into a buffer, to be flushed later. libssl
ends up calling BIO_flush at the end of every flight, which in turn means
that BIOs used with libssl need to support it, even if to return true
because there's nothing to flush. (Arguably TCP sockets could have used a
flush concept, to help control Nagle's algorithm, but for better or worse,
that's a socket-wide TCP_NODELAY option, rather than an explicit flush
call.)

BIO_set_retry.. behaves like POSIX I/O, where a failed EWOULDBLOCK write is
as if you never wrote to the socket at all and doesn't impact socket state.
That is, the data hasn't been accepted yet. It's not expected for BIO_flush
to care about the rejected write data. (Also I don't believe libssl will
ever trigger this case.) It's confusing because unlike an EWOULDBLOCK
errno, BIO_set_retry.. *is* itself BIO state, but that's just because the
BIO calling convention is goofy and didn't just return the error out of the
return value. So OpenSSL just stashes the bit on the BIO itself, for you to
query out immediately afterwards.

I've attached a patch which does that. The existing SSL tests pass with

it, tested on Debian stable. (Though it took me a few iterations to figure
out how to run the SSL tests, so it's possible I've missed something.)

We've done a fair bit of work on making them easier to run, so I'm curious
if
you saw any room for improvements there as someone coming to them for the
first
time?

A lot of my time was just trying to figure out how to run the tests in the
first place, so perhaps documentation? But I may just have been looking in
the wrong spot and honestly didn't really know what I was doing. I can try
to summarize what I did (from memory), and perhaps that can point to
possible improvements?

- I looked in the repository for instructions on running the tests and
couldn't find any. At this point, I hadn't found src/test/README.
- I found
https://wiki.postgresql.org/wiki/Developer_FAQ#How_do_I_test_my_changes.3F,
linked from https://www.postgresql.org/developer/
- I ran the configure build with --enable-cassert, ran make check, tests
passed.
- I wrote my patch and then spent a while intentionally adding bugs to see
if the tests would catch it (I wasn't sure whether there was ssl test
coverage), finally concluding that I wasn't running any ssl tests
- I looked some more and found src/test/ssl/README
- I reconfigured with --enable-tap-tests and ran make check
PG_TEST_EXTRA=ssl per those instructions, but the SSL tests still weren't
running
- I grepped for PG_TEST_EXTRA and found references in the CI config, but
using the meson build
- I installed meson, mimicked a few commands from the CI. That seemed to
work.
- I tried running only the ssl tests, looking up how you specify individual
tests in meson, to make my compile/test cycles a bit faster, but they
failed.
- I noticed that the first couple "tests" were named like setup tasks, and
guessed that the ssl tests depended on this setup to run. But by then I
just gave up and waited out the whole test suite per run. :-)

Once I got it running, it was quite smooth. I just wasn't sure how to do it.

David

#4David Benjamin
davidben@google.com
In reply to: David Benjamin (#3)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

By the way, I'm unable to add the patch to the next commitfest due to the
cool off period for new accounts. How long is that period? I don't suppose
there's a way to avoid it?

On Mon, Feb 12, 2024 at 11:31 AM David Benjamin <davidben@google.com> wrote:

Show quoted text

On Mon, Feb 12, 2024 at 9:38 AM Daniel Gustafsson <daniel@yesql.se> wrote:

On 11 Feb 2024, at 19:19, David Benjamin <davidben@google.com> wrote:
It turns out the parts that came from the OpenSSL socket BIO were a

no-op, and in fact PostgreSQL is relying on it being a no-op. Instead, it's
cleaner to just define a custom BIO the normal way, which then leaves the
more standard BIO_get_data mechanism usable. This also avoids the risk that
a future OpenSSL will add a now BIO_ctrl to the socket type, with libssl
calling into it, and then break some assumptions made by PostgreSQL.

+               case BIO_CTRL_FLUSH:
+                       /* libssl expects all BIOs to support BIO_flush.
*/
+                       res = 1;
+                       break;

Will this always be true? Returning 1 implies that we have flushed all
data on
the socket, but what if we just before called BIO_set_retry..XX()?

The real one is also just a no-op. :-)
https://github.com/openssl/openssl/blob/master/crypto/bio/bss_sock.c#L215

This is used in places like buffer BIO or the FILE* BIO, where BIO_write
might accept data, but stage it into a buffer, to be flushed later. libssl
ends up calling BIO_flush at the end of every flight, which in turn means
that BIOs used with libssl need to support it, even if to return true
because there's nothing to flush. (Arguably TCP sockets could have used a
flush concept, to help control Nagle's algorithm, but for better or worse,
that's a socket-wide TCP_NODELAY option, rather than an explicit flush
call.)

BIO_set_retry.. behaves like POSIX I/O, where a failed EWOULDBLOCK write
is as if you never wrote to the socket at all and doesn't impact
socket state. That is, the data hasn't been accepted yet. It's not expected
for BIO_flush to care about the rejected write data. (Also I don't believe
libssl will ever trigger this case.) It's confusing because unlike an
EWOULDBLOCK errno, BIO_set_retry.. *is* itself BIO state, but that's just
because the BIO calling convention is goofy and didn't just return the
error out of the return value. So OpenSSL just stashes the bit on the BIO
itself, for you to query out immediately afterwards.

I've attached a patch which does that. The existing SSL tests pass with

it, tested on Debian stable. (Though it took me a few iterations to figure
out how to run the SSL tests, so it's possible I've missed something.)

We've done a fair bit of work on making them easier to run, so I'm
curious if
you saw any room for improvements there as someone coming to them for the
first
time?

A lot of my time was just trying to figure out how to run the tests in
the first place, so perhaps documentation? But I may just have been looking
in the wrong spot and honestly didn't really know what I was doing. I can
try to summarize what I did (from memory), and perhaps that can point to
possible improvements?

- I looked in the repository for instructions on running the tests and
couldn't find any. At this point, I hadn't found src/test/README.
- I found
https://wiki.postgresql.org/wiki/Developer_FAQ#How_do_I_test_my_changes.3F,
linked from https://www.postgresql.org/developer/
- I ran the configure build with --enable-cassert, ran make check, tests
passed.
- I wrote my patch and then spent a while intentionally adding bugs to see
if the tests would catch it (I wasn't sure whether there was ssl test
coverage), finally concluding that I wasn't running any ssl tests
- I looked some more and found src/test/ssl/README
- I reconfigured with --enable-tap-tests and ran make check
PG_TEST_EXTRA=ssl per those instructions, but the SSL tests still weren't
running
- I grepped for PG_TEST_EXTRA and found references in the CI config, but
using the meson build
- I installed meson, mimicked a few commands from the CI. That seemed to
work.
- I tried running only the ssl tests, looking up how you specify
individual tests in meson, to make my compile/test cycles a bit faster, but
they failed.
- I noticed that the first couple "tests" were named like setup tasks, and
guessed that the ssl tests depended on this setup to run. But by then I
just gave up and waited out the whole test suite per run. :-)

Once I got it running, it was quite smooth. I just wasn't sure how to do
it.

David

#5Daniel Gustafsson
daniel@yesql.se
In reply to: David Benjamin (#4)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 15 Feb 2024, at 04:58, David Benjamin <davidben@google.com> wrote:

By the way, I'm unable to add the patch to the next commitfest due to the cool off period for new accounts. How long is that period? I don't suppose there's a way to avoid it?

There is a way to expedite the cooling-off period (it's a SPAM prevention
measure), but I don't have access to it. In the meantime I've added the patch
for you, and once the cooling off is over we can add your name as author.

https://commitfest.postgresql.org/47/4835/

I'm currently reviewing it and will get back to you soon on that.

--
Daniel Gustafsson

#6Andres Freund
andres@anarazel.de
In reply to: David Benjamin (#1)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

Hi,

On 2024-02-11 13:19:00 -0500, David Benjamin wrote:

I've attached a patch for the master branch to fix up the custom BIOs used
by PostgreSQL, in light of the issues with the OpenSSL update recently.
While c82207a548db47623a2bfa2447babdaa630302b9 (switching from BIO_get_data
to BIO_get_app_data) resolved the immediate conflict, I don't think it
addressed the root cause, which is that PostgreSQL was mixing up two BIO
implementations, each of which expected to be driving the BIO.

Yea, that's certainly not nice - and I think we've been somewhat lucky it
hasn't caused more issues. There's some nasty possibilities, e.g. sock_ctrl()
partially enabling ktls without our initialization having called
ktls_enable(). Right now that just means ktls is unusable, but it's not hard
to imagine accidentally ending up sending unencrypted data.

I've in the past looked into not using a custom BIO [1]/messages/by-id/20210715021747.h2hih7mw56ivyntt@alap3.anarazel.de, but I have my doubts
about that being a good idea. I think medium term we want to be able to do
network IO asynchronously, which seems quite hard to do when using openssl's
socket BIO.

Once we've done that, we're free to use BIO_set_data. While BIO_set_app_data
works fine, I've reverted back to BIO_set_data because it's more commonly used.
app_data depends on OpenSSL's "ex_data" mechanism, which is a tad heavier under
the hood.

At first I was a bit wary of that, because it requires us to bring back the
fallback implementation. But you're right, it's noticeably heavier than
BIO_get_data(), and we do call BIO_get_app_data() fairly frequently.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a no-op. All
other operations are unused. It's once again good that they're unused because
otherwise OpenSSL might mess with postgres's socket, break the assumptions
around interrupt handling, etc.

How did you determine that only FLUSH is required? I didn't even really find
documentation about what the intended semantics actually are.

E.g. should we implement BIO_CTRL_EOF? Sure, it wasn't really supported so
far, because we never set it, but is that right? What about
BIO_CTRL_GET_CLOSE/BIO_CTRL_SET_CLOSE?

Another issue is that 0 doesn't actually seem like the universal error return
- e.g. BIO_C_GET_FD seems to return -1, because 0 is a valid fd.

As of your patch the bio doesn't actually have an FD anymore, should it still
set BIO_TYPE_DESCRIPTOR?

+static long
+my_sock_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res = 0;
+
+	switch (cmd)
+	{
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+	}
+
+	return res;
+}

I'd move the res = 0 into a default: block. That way the compiler can warn if
some case doesn't set it in all branches.

static BIO_METHOD *
my_BIO_s_socket(void)
{

Wonder if we should rename this. It's pretty odd that we still call it's not
really related to s_socket anymore, and doesn't even really implement the same
interface (e.g. get_fd doesn't work anymore). Similarly, my_SSL_set_fd()
doesn't actually call set_fd() anymore, which sure seems odd.

Greetings,

Andres Freund

[1]: /messages/by-id/20210715021747.h2hih7mw56ivyntt@alap3.anarazel.de

#7David Benjamin
davidben@google.com
In reply to: Andres Freund (#6)
1 attachment(s)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

Thanks for the very thorough comments! I've attached a new version of the
patch.

On Thu, Feb 15, 2024 at 4:17 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2024-02-11 13:19:00 -0500, David Benjamin wrote:

I've attached a patch for the master branch to fix up the custom BIOs

used

by PostgreSQL, in light of the issues with the OpenSSL update recently.
While c82207a548db47623a2bfa2447babdaa630302b9 (switching from

BIO_get_data

to BIO_get_app_data) resolved the immediate conflict, I don't think it
addressed the root cause, which is that PostgreSQL was mixing up two BIO
implementations, each of which expected to be driving the BIO.

Yea, that's certainly not nice - and I think we've been somewhat lucky it
hasn't caused more issues. There's some nasty possibilities, e.g.
sock_ctrl()
partially enabling ktls without our initialization having called
ktls_enable(). Right now that just means ktls is unusable, but it's not
hard
to imagine accidentally ending up sending unencrypted data.

Yeah. Even if, say, the ktls bits work, given you all care enough about I/O
to have wanted to wrap the BIO, I assume you'd want to pick up those
features on your own terms, e.g. by implementing the BIO_CTRLs yourself.

I've in the past looked into not using a custom BIO [1], but I have my
doubts
about that being a good idea. I think medium term we want to be able to do
network IO asynchronously, which seems quite hard to do when using
openssl's
socket BIO.

Once we've done that, we're free to use BIO_set_data. While
BIO_set_app_data

works fine, I've reverted back to BIO_set_data because it's more

commonly used.

app_data depends on OpenSSL's "ex_data" mechanism, which is a tad

heavier under

the hood.

At first I was a bit wary of that, because it requires us to bring back the
fallback implementation. But you're right, it's noticeably heavier than
BIO_get_data(), and we do call BIO_get_app_data() fairly frequently.

TBH, I doubt it makes any real difference perf-wise. But I think
BIO_get_data is a bit more common in this context.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a

no-op. All

other operations are unused. It's once again good that they're unused

because

otherwise OpenSSL might mess with postgres's socket, break the

assumptions

around interrupt handling, etc.

How did you determine that only FLUSH is required? I didn't even really
find
documentation about what the intended semantics actually are.

The unhelpful answer is that my day job is working on BoringSSL, so I've
spent a lot of time with this mess. :-) But, yeah, it's not well-documented
at all. OpenSSL ends up calling BIO_flush at the end of each batch of
writes in libssl. TBH, I suspect that was less intentional and more an
emergent property of them internally layering a buffer BIO at one point in
the process, but it's long been part of the (undocumented) API contract.
Conversely, I don't think OpenSSL can possibly make libssl *require* a new
BIO_CTRL because they'd break every custom BIO anyone has ever used with
the library.

E.g. should we implement BIO_CTRL_EOF? Sure, it wasn't really supported so
far, because we never set it, but is that right?

Ah hmm, BIO_CTRL_EOF is... a bit of a mess. OpenSSL kind of messed things
up. So, up until recently, I would have said that BIO_CTRL_EOF was not part
of the interface here. OpenSSL 1.0.x did not implement it for sockets, and
the BIO_read API *already* had a way to signal EOF: did you return zero or
-1?

Then OpenSSL 1.1.x introduced size_t-based BIO_read_ex APIs. However, in
the process, they *forgot that EOF and error are different things* and made
it impossible to detect EOFs if you use BIO_read_ex! They never noticed
this, because they didn't actually convert their own code to their new API.
See this discussion, which alas ended with OpenSSL deciding to ignore the
problem and not even document their current interface.
https://github.com/openssl/openssl/issues/8208

Though they never responded, they seem to have tacitly settled using the
out-of-band BIO_eof function (which is BIO_CTRL_EOF) as the way to signal
EOF for BIO_read_ex. This is kind of fiddly, but is at least a well-defined
option. But the problem is no one's BIO_METHODs, including their own, are
read_ex-based. They all implement the old read callback. But someone might
call BIO_read_ex on a read-based BIO_METHOD. IMO, BIO_read_ex should be
responsible for translating between the two EOF conventions. For example,
it could automatically set a flag when the read callback returns 0 and then
make BIO_ctrl check the flag and automatically implement BIO_CTRL_EOF for
BIOs that don't do it themselves. Alas, OpenSSL did not do this and instead
they just made their built-in BIOs implement BIO_CTRL_EOF, even though this
new expectation is a compatibility break. That leaves BIO_read, the one
everyone uses, in a pretty ambiguous state that they seem uninterested in
addressing.
https://github.com/openssl/openssl/pull/10882
https://github.com/openssl/openssl/pull/10907

Honestly, I suspect nothing in postgres actually cares about this, given
you all didn't notice things breaking around here in the early days of
1.1.x. Still, that is a good point. I've updated the patch to implement
BIO_CTRL_EOF for completeness' sake.

(For BoringSSL, we did not go down this route because, unlike OpenSSL
apparently, we do not think breaking backwards compatibility in the EOF
convention is OK. I do want to add the size_t APIs eventually, but I'm
thinking we'll do the automatic BIO_CTRL_EOF thing described above, to
avoid breaking compatibility with existing BIO_METHODs.)

Beyond BIO_CTRL_EOF, I assume what you're really asking here is "how do we
know OpenSSL won't change the interface again?". And, honestly, we don't.
They promise API and ABI stability, which in theory means they shouldn't.
But their track record with custom BIOs is not great. That said, if they do
break it, I think it will unambiguously be an API break on their part and
hopefully it'll be possible to get them to fix the issue. The EOF issue
snuck in because it mostly only impacted people who tried to migrate to
their new APIs. It broke existing APIs too, but the one project that
noticed (CPython) misdiagnosed the issue and worked around it.
(Incorrectly, but no one noticed. See
https://github.com/python/cpython/pull/95495.) So, I raised the issue,
they'd long since shipped it and evidently felt no burning need to fix
their regression or unclear APIs. :-(

But the alternative, picking up the real socket BIO's ctrl function, is
even more unsound, so I think this is the better place to be.

What about
BIO_CTRL_GET_CLOSE/BIO_CTRL_SET_CLOSE?

The close flag is pretty silly. All BIOs do slightly different things with
it, which means that libssl cannot possibly call it generically. So if your
BIO doesn't have any need for it, you don't need to bother. It's usually
used to indicate whether the BIO owns the underlying fd/socket/BUF_MEM/etc,

Another issue is that 0 doesn't actually seem like the universal error
return
- e.g. BIO_C_GET_FD seems to return -1, because 0 is a valid fd.

Yeah, that is... also a mess. All of OpenSSL's ctrl functions return zero
for unrecognized commands. It seems they just don't have a convention for
distinguishing unimplemented commands from zero returns. As you note, this
is a problem for BIO_C_GET_FD. OpenSSL's other non-fd BIOs have the same
problem though:
https://github.com/openssl/openssl/blob/master/crypto/bio/bss_file.c#L340

Realistically, type-specific ioctls like that don't matter much because
you're really only going to call them if you already know you have an
applicable BIO. Hopefully, between that, and removing BIO_TYPE_DESCRIPTOR
(below), it's mostly okay? Also happy to add a `ret = -1` implementation
of BIO_C_GET_FD if you'd prefer. But, in the general case, I suspect we
just have to live with the zero thing, and hopefully OpenSSL will stop
introducing ctrls where zero is an unsafe return value.

Perhaps BIO_ctrl in OpenSSL should have some logic like...
```
if (!(type & BIO_TYPE_DESCRIPTOR) && cmd == BIO_C_GET_FD) {
return -1;
}
```
Although I suspect there are people who implement BIO_C_GET_FD without
setting BIO_TYPE_DESCRIPTOR because none of this was ever documented.

As of your patch the bio doesn't actually have an FD anymore, should it
still
set BIO_TYPE_DESCRIPTOR?

Ah good call. Done. I don't think much code really cares, but it does mean
that, if you all call SSL_get_fd (why would you?), it won't get confused.
:-)

+static long

+my_sock_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+     long            res = 0;
+
+     switch (cmd)
+     {
+             case BIO_CTRL_FLUSH:
+                     /* libssl expects all BIOs to support BIO_flush. */
+                     res = 1;
+                     break;
+     }
+
+     return res;
+}

I'd move the res = 0 into a default: block. That way the compiler can warn
if
some case doesn't set it in all branches.

Done.

static BIO_METHOD *
my_BIO_s_socket(void)
{

Wonder if we should rename this. It's pretty odd that we still call it's
not
really related to s_socket anymore, and doesn't even really implement the
same
interface (e.g. get_fd doesn't work anymore). Similarly, my_SSL_set_fd()
doesn't actually call set_fd() anymore, which sure seems odd.

Done. I wasn't sure what to name them, or even what the naming convention
was, but I opted to name them after the Port and PGconn objects they wrap.
Happy to switch to another name if you have a better idea though.

David

Attachments:

avoid-mixing-custom-and-openssl-bios-v2.patchapplication/x-patch; name=avoid-mixing-custom-and-openssl-bios-v2.patchDownload
From e29b816dfb7ee5ed9f3389d7f06408f83783258a Mon Sep 17 00:00:00 2001
From: David Benjamin <davidben@google.com>
Date: Sun, 11 Feb 2024 10:42:25 -0500
Subject: [PATCH] Avoid mixing custom and OpenSSL BIO functions

This addresses the root cause of the BIO_set_data conflict that
c82207a548db47623a2bfa2447babdaa630302b9 attempted to address. That fix was
really just a workaround. The real root cause was that postgres was mixing up
two BIO implementations, each of which expected to be driving the BIO.

Mixing them up did not actually do any good. The Port and PGconn structures
already maintained the file descriptors. The socket BIO's copy, configured via
BIO_set_fd, wasn't even being used because my_BIO_s_socket replaced read and
write anyway. We've been essentially keeping extra state around, and relying
on it being unused:

- gets - Not implemented by sockets and not used by libssl.
- puts - Not used by libssl. If it were, it would break the special SIGPIPE and
  interrupt handling postgres aiming for.
- ctrl - (More on this later)
- create - This is just setting up state that we don't use.
- destroy - This is a no-op because we use BIO_NOCLOSE. In fact it's important
  that it's a no-op because otherwise OpenSSL would close the socket under
  postgres' feet!
- callback_ctrl - Not implemented by sockets.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a no-op. All
other operations are unused. It's once again good that they're unused because
otherwise OpenSSL might mess with postgres's socket, break the assumptions
around interrupt handling, etc.

Instead, simply implement a very basic ctrl ourselves and drop the other
functions. This avoids the risk that future OpenSSL upgrades add some feature
to BIO_s_socket's ctrl which conflicts with postgres.

Once we've done that, we're free to use BIO_set_data. While BIO_set_app_data
works fine, I've reverted back to BIO_set_data because it's more commonly used.
app_data depends on OpenSSL's "ex_data" mechanism, which is a tad heavier under
the hood.

As this is no longer related to BIO_s_socket or calling SSL_set_fd, I've
renamed the methods to reference the PGconn and Port types instead.
---
 configure                                |   2 +-
 configure.ac                             |   2 +-
 meson.build                              |   1 +
 src/backend/libpq/be-secure-openssl.c    | 106 +++++++++++++++--------
 src/include/libpq/libpq-be.h             |   1 +
 src/include/pg_config.h.in               |   3 +
 src/interfaces/libpq/fe-secure-openssl.c |  96 +++++++++++++-------
 src/interfaces/libpq/libpq-int.h         |   1 +
 8 files changed, 141 insertions(+), 71 deletions(-)

diff --git a/configure b/configure
index 6b87e5c9a8..cacda708b0 100755
--- a/configure
+++ b/configure
@@ -12870,7 +12870,7 @@ done
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  for ac_func in OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
+  for ac_func in OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 6e64ece11d..083bbabfbf 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1374,7 +1374,7 @@ if test "$with_ssl" = openssl ; then
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
+  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
   # OpenSSL versions before 1.1.0 required setting callback functions, for
   # thread-safety. In 1.1.0, it's no longer required, and CRYPTO_lock()
   # function was removed.
diff --git a/meson.build b/meson.build
index 8ed51b6aae..ba133b6e11 100644
--- a/meson.build
+++ b/meson.build
@@ -1290,6 +1290,7 @@ if sslopt in ['auto', 'openssl']
       # doesn't have these OpenSSL 1.1.0 functions. So check for individual
       # functions.
       ['OPENSSL_init_ssl'],
+      ['BIO_get_data'],
       ['BIO_meth_new'],
       ['ASN1_STRING_get0_data'],
       ['HMAC_CTX_new'],
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index e12b1cc9e3..05cfb8c5d3 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -56,10 +56,10 @@
 static void default_openssl_tls_init(SSL_CTX *context, bool isServerStart);
 openssl_tls_init_hook_typ openssl_tls_init_hook = default_openssl_tls_init;
 
-static int	my_sock_read(BIO *h, char *buf, int size);
-static int	my_sock_write(BIO *h, const char *buf, int size);
-static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(Port *port, int fd);
+static int	port_bio_read(BIO *h, char *buf, int size);
+static int	port_bio_write(BIO *h, const char *buf, int size);
+static BIO_METHOD *port_bio_method(void);
+static int	ssl_set_port_bio(Port *port);
 
 static DH  *load_dh_file(char *filename, bool isServerStart);
 static DH  *load_dh_buffer(const char *buffer, size_t len);
@@ -440,7 +440,7 @@ be_tls_open_server(Port *port)
 						SSLerrmessage(ERR_get_error()))));
 		return -1;
 	}
-	if (!my_SSL_set_fd(port, port->sock))
+	if (!ssl_set_port_bio(port))
 	{
 		ereport(COMMERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -849,17 +849,24 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
-static BIO_METHOD *my_bio_methods = NULL;
+#ifndef HAVE_BIO_GET_DATA
+#define BIO_get_data(bio) (bio->ptr)
+#define BIO_set_data(bio, data) (bio->ptr = data)
+#endif
+
+static BIO_METHOD *port_bio_method_ptr = NULL;
 
 static int
-my_sock_read(BIO *h, char *buf, int size)
+port_bio_read(BIO *h, char *buf, int size)
 {
 	int			res = 0;
+	Port	   *port = (Port *) BIO_get_data(h);
 
 	if (buf != NULL)
 	{
-		res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
+		res = secure_raw_read(port, buf, size);
 		BIO_clear_retry_flags(h);
+		port->last_read_was_eof = res == 0;
 		if (res <= 0)
 		{
 			/* If we were interrupted, tell caller to retry */
@@ -874,11 +881,11 @@ my_sock_read(BIO *h, char *buf, int size)
 }
 
 static int
-my_sock_write(BIO *h, const char *buf, int size)
+port_bio_write(BIO *h, const char *buf, int size)
 {
 	int			res = 0;
 
-	res = secure_raw_write(((Port *) BIO_get_app_data(h)), buf, size);
+	res = secure_raw_write(((Port *) BIO_get_data(h)), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
@@ -892,56 +899,81 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+port_bio_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res;
+	Port	   *port = (Port *) BIO_get_data(h);
+
+	switch (cmd)
+	{
+		case BIO_CTRL_EOF:
+			/*
+			 * This should not be needed. port_bio_read already has a way to
+			 * signal EOF to OpenSSL. However, OpenSSL made an undocumented,
+			 * backwards-incompatible change and now expects EOF via BIO_ctrl.
+			 * See https://github.com/openssl/openssl/issues/8208
+			 */
+			res = port->last_read_was_eof;
+			break;
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+		default:
+			res = 0;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
-my_BIO_s_socket(void)
+port_bio_method(void)
 {
-	if (!my_bio_methods)
+	if (!port_bio_method_ptr)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 #ifdef HAVE_BIO_METH_NEW
 		int			my_bio_index;
 
 		my_bio_index = BIO_get_new_index();
 		if (my_bio_index == -1)
 			return NULL;
-		my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
-		my_bio_methods = BIO_meth_new(my_bio_index, "PostgreSQL backend socket");
-		if (!my_bio_methods)
+		my_bio_index |= BIO_TYPE_SOURCE_SINK;
+		port_bio_method_ptr = BIO_meth_new(my_bio_index, "PostgreSQL backend socket");
+		if (!port_bio_method_ptr)
 			return NULL;
-		if (!BIO_meth_set_write(my_bio_methods, my_sock_write) ||
-			!BIO_meth_set_read(my_bio_methods, my_sock_read) ||
-			!BIO_meth_set_gets(my_bio_methods, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(my_bio_methods, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(my_bio_methods, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(my_bio_methods, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(my_bio_methods, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(my_bio_methods, BIO_meth_get_callback_ctrl(biom)))
+		if (!BIO_meth_set_write(port_bio_method_ptr, port_bio_write) ||
+			!BIO_meth_set_read(port_bio_method_ptr, port_bio_read) ||
+			!BIO_meth_set_ctrl(port_bio_method_ptr, port_bio_ctrl))
 		{
-			BIO_meth_free(my_bio_methods);
-			my_bio_methods = NULL;
+			BIO_meth_free(port_bio_method_ptr);
+			port_bio_method_ptr = NULL;
 			return NULL;
 		}
 #else
-		my_bio_methods = malloc(sizeof(BIO_METHOD));
-		if (!my_bio_methods)
+		port_bio_method_ptr = malloc(sizeof(BIO_METHOD));
+		if (!port_bio_method_ptr)
 			return NULL;
-		memcpy(my_bio_methods, biom, sizeof(BIO_METHOD));
-		my_bio_methods->bread = my_sock_read;
-		my_bio_methods->bwrite = my_sock_write;
+		memset(port_bio_method_ptr, 0, sizeof(BIO_METHOD));
+		port_bio_method_ptr->type = BIO_TYPE_SOURCE_SINK;
+		port_bio_method_ptr->name = "libpq socket";
+		port_bio_method_ptr->bread = port_bio_read;
+		port_bio_method_ptr->bwrite = port_bio_write;
+		port_bio_method_ptr->ctrl = port_bio_ctrl;
 #endif
 	}
-	return my_bio_methods;
+	return port_bio_method_ptr;
 }
 
-/* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(Port *port, int fd)
+ssl_set_port_bio(Port *port)
 {
 	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
-	bio_method = my_BIO_s_socket();
+	bio_method = port_bio_method();
 	if (bio_method == NULL)
 	{
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
@@ -954,9 +986,9 @@ my_SSL_set_fd(Port *port, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, port);
+	BIO_set_data(bio, port);
+	BIO_set_init(bio, 1);
 
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	SSL_set_bio(port->ssl, bio, bio);
 	ret = 1;
 err:
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 47d66d5524..014c067e81 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -218,6 +218,7 @@ typedef struct Port
 	char	   *peer_cn;
 	char	   *peer_dn;
 	bool		peer_cert_valid;
+	bool		last_read_was_eof;
 
 	/*
 	 * OpenSSL structures. (Keep these last so that the locations of other
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 07e73567dc..c7b88945fa 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -66,6 +66,9 @@
 /* Define to 1 if you have the `backtrace_symbols' function. */
 #undef HAVE_BACKTRACE_SYMBOLS
 
+/* Define to 1 if you have the `BIO_get_data' function. */
+#undef HAVE_BIO_GET_DATA
+
 /* Define to 1 if you have the `BIO_meth_new' function. */
 #undef HAVE_BIO_METH_NEW
 
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 8110882262..cadb5b32f4 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -78,10 +78,10 @@ static char *SSLerrmessage(unsigned long ecode);
 static void SSLerrfree(char *buf);
 static int	PQssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 
-static int	my_sock_read(BIO *h, char *buf, int size);
-static int	my_sock_write(BIO *h, const char *buf, int size);
-static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(PGconn *conn, int fd);
+static int	pgconn_bio_read(BIO *h, char *buf, int size);
+static int	pgconn_bio_write(BIO *h, const char *buf, int size);
+static BIO_METHOD *pgconn_bio_method(void);
+static int	ssl_set_pgconn_bio(PGconn *conn);
 
 
 static bool pq_init_ssl_lib = true;
@@ -1191,7 +1191,7 @@ initialize_SSL(PGconn *conn)
 	 */
 	if (!(conn->ssl = SSL_new(SSL_context)) ||
 		!SSL_set_app_data(conn->ssl, conn) ||
-		!my_SSL_set_fd(conn, conn->sock))
+		!ssl_set_pgconn_bio(conn))
 	{
 		char	   *err = SSLerrmessage(ERR_get_error());
 
@@ -1802,16 +1802,23 @@ PQsslAttribute(PGconn *conn, const char *attribute_name)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
+#ifndef HAVE_BIO_GET_DATA
+#define BIO_get_data(bio) (bio->ptr)
+#define BIO_set_data(bio, data) (bio->ptr = data)
+#endif
+
 /* protected by ssl_config_mutex */
-static BIO_METHOD *my_bio_methods;
+static BIO_METHOD *pgconn_bio_method_ptr;
 
 static int
-my_sock_read(BIO *h, char *buf, int size)
+pgconn_bio_read(BIO *h, char *buf, int size)
 {
 	int			res;
+	PGconn	   *conn = (PGconn *) BIO_get_data(h);
 
-	res = pqsecure_raw_read((PGconn *) BIO_get_app_data(h), buf, size);
+	res = pqsecure_raw_read(conn, buf, size);
 	BIO_clear_retry_flags(h);
+	conn->last_read_was_eof = res == 0;
 	if (res < 0)
 	{
 		/* If we were interrupted, tell caller to retry */
@@ -1836,11 +1843,11 @@ my_sock_read(BIO *h, char *buf, int size)
 }
 
 static int
-my_sock_write(BIO *h, const char *buf, int size)
+pgconn_bio_write(BIO *h, const char *buf, int size)
 {
 	int			res;
 
-	res = pqsecure_raw_write((PGconn *) BIO_get_app_data(h), buf, size);
+	res = pqsecure_raw_write((PGconn *) BIO_get_data(h), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res < 0)
 	{
@@ -1865,26 +1872,54 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+pgconn_bio_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res;
+	PGconn	   *conn = (PGconn *) BIO_get_data(h);
+
+	switch (cmd)
+	{
+		case BIO_CTRL_EOF:
+			/*
+			 * This should not be needed. pgconn_bio_read already has a way to
+			 * signal EOF to OpenSSL. However, OpenSSL made an undocumented,
+			 * backwards-incompatible change and now expects EOF via BIO_ctrl.
+			 * See https://github.com/openssl/openssl/issues/8208
+			 */
+			res = conn->last_read_was_eof;
+			break;
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+		default:
+			res = 0;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
-my_BIO_s_socket(void)
+pgconn_bio_method(void)
 {
 	BIO_METHOD *res;
 
 	if (pthread_mutex_lock(&ssl_config_mutex))
 		return NULL;
 
-	res = my_bio_methods;
+	res = pgconn_bio_method_ptr;
 
-	if (!my_bio_methods)
+	if (!pgconn_bio_method_ptr)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 #ifdef HAVE_BIO_METH_NEW
 		int			my_bio_index;
 
 		my_bio_index = BIO_get_new_index();
 		if (my_bio_index == -1)
 			goto err;
-		my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
+		my_bio_index |= BIO_TYPE_SOURCE_SINK;
 		res = BIO_meth_new(my_bio_index, "libpq socket");
 		if (!res)
 			goto err;
@@ -1893,14 +1928,9 @@ my_BIO_s_socket(void)
 		 * As of this writing, these functions never fail. But check anyway,
 		 * like OpenSSL's own examples do.
 		 */
-		if (!BIO_meth_set_write(res, my_sock_write) ||
-			!BIO_meth_set_read(res, my_sock_read) ||
-			!BIO_meth_set_gets(res, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(res, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(res, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(res, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(res, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(res, BIO_meth_get_callback_ctrl(biom)))
+		if (!BIO_meth_set_write(res, pgconn_bio_write) ||
+			!BIO_meth_set_read(res, pgconn_bio_read) ||
+			!BIO_meth_set_ctrl(res, pgconn_bio_ctrl))
 		{
 			goto err;
 		}
@@ -1908,13 +1938,16 @@ my_BIO_s_socket(void)
 		res = malloc(sizeof(BIO_METHOD));
 		if (!res)
 			goto err;
-		memcpy(res, biom, sizeof(BIO_METHOD));
-		res->bread = my_sock_read;
-		res->bwrite = my_sock_write;
+		memset(res, 0, sizeof(BIO_METHOD));
+		res->type = BIO_TYPE_SOURCE_SINK;
+		res->name = "libpq socket";
+		res->bread = pgconn_bio_read;
+		res->bwrite = pgconn_bio_write;
+		res->ctrl = pgconn_bio_ctrl;
 #endif
 	}
 
-	my_bio_methods = res;
+	pgconn_bio_method_ptr = res;
 	pthread_mutex_unlock(&ssl_config_mutex);
 	return res;
 
@@ -1930,15 +1963,14 @@ err:
 	return NULL;
 }
 
-/* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(PGconn *conn, int fd)
+ssl_set_pgconn_bio(PGconn *conn)
 {
 	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
-	bio_method = my_BIO_s_socket();
+	bio_method = pgconn_bio_method();
 	if (bio_method == NULL)
 	{
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
@@ -1950,10 +1982,10 @@ my_SSL_set_fd(PGconn *conn, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, conn);
+	BIO_set_data(bio, conn);
+	BIO_set_init(bio, 1);
 
 	SSL_set_bio(conn->ssl, bio, bio);
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	ret = 1;
 err:
 	return ret;
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 82c18f870d..a6c36073be 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -547,6 +547,7 @@ struct pg_conn
 	bool		ssl_in_use;
 	bool		ssl_cert_requested; /* Did the server ask us for a cert? */
 	bool		ssl_cert_sent;	/* Did we send one in reply? */
+	bool		last_read_was_eof;
 
 #ifdef USE_SSL
 	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
-- 
2.44.0.rc0.258.g7320e95886-goog

#8David Benjamin
davidben@google.com
In reply to: David Benjamin (#7)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

Circling back here, anything else needed from my end on this patch?

On Wed, Feb 21, 2024, 17:04 David Benjamin <davidben@google.com> wrote:

Show quoted text

Thanks for the very thorough comments! I've attached a new version of the
patch.

On Thu, Feb 15, 2024 at 4:17 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2024-02-11 13:19:00 -0500, David Benjamin wrote:

I've attached a patch for the master branch to fix up the custom BIOs

used

by PostgreSQL, in light of the issues with the OpenSSL update recently.
While c82207a548db47623a2bfa2447babdaa630302b9 (switching from

BIO_get_data

to BIO_get_app_data) resolved the immediate conflict, I don't think it
addressed the root cause, which is that PostgreSQL was mixing up two BIO
implementations, each of which expected to be driving the BIO.

Yea, that's certainly not nice - and I think we've been somewhat lucky it
hasn't caused more issues. There's some nasty possibilities, e.g.
sock_ctrl()
partially enabling ktls without our initialization having called
ktls_enable(). Right now that just means ktls is unusable, but it's not
hard
to imagine accidentally ending up sending unencrypted data.

Yeah. Even if, say, the ktls bits work, given you all care enough about
I/O to have wanted to wrap the BIO, I assume you'd want to pick up those
features on your own terms, e.g. by implementing the BIO_CTRLs yourself.

I've in the past looked into not using a custom BIO [1], but I have my
doubts
about that being a good idea. I think medium term we want to be able to do
network IO asynchronously, which seems quite hard to do when using
openssl's
socket BIO.

Once we've done that, we're free to use BIO_set_data. While
BIO_set_app_data

works fine, I've reverted back to BIO_set_data because it's more

commonly used.

app_data depends on OpenSSL's "ex_data" mechanism, which is a tad

heavier under

the hood.

At first I was a bit wary of that, because it requires us to bring back
the
fallback implementation. But you're right, it's noticeably heavier than
BIO_get_data(), and we do call BIO_get_app_data() fairly frequently.

TBH, I doubt it makes any real difference perf-wise. But I think
BIO_get_data is a bit more common in this context.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a

no-op. All

other operations are unused. It's once again good that they're unused

because

otherwise OpenSSL might mess with postgres's socket, break the

assumptions

around interrupt handling, etc.

How did you determine that only FLUSH is required? I didn't even really
find
documentation about what the intended semantics actually are.

The unhelpful answer is that my day job is working on BoringSSL, so I've
spent a lot of time with this mess. :-) But, yeah, it's not well-documented
at all. OpenSSL ends up calling BIO_flush at the end of each batch of
writes in libssl. TBH, I suspect that was less intentional and more an
emergent property of them internally layering a buffer BIO at one point in
the process, but it's long been part of the (undocumented) API contract.
Conversely, I don't think OpenSSL can possibly make libssl *require* a
new BIO_CTRL because they'd break every custom BIO anyone has ever used
with the library.

E.g. should we implement BIO_CTRL_EOF? Sure, it wasn't really supported so
far, because we never set it, but is that right?

Ah hmm, BIO_CTRL_EOF is... a bit of a mess. OpenSSL kind of messed things
up. So, up until recently, I would have said that BIO_CTRL_EOF was not part
of the interface here. OpenSSL 1.0.x did not implement it for sockets, and
the BIO_read API *already* had a way to signal EOF: did you return zero
or -1?

Then OpenSSL 1.1.x introduced size_t-based BIO_read_ex APIs. However, in
the process, they *forgot that EOF and error are different things* and
made it impossible to detect EOFs if you use BIO_read_ex! They never
noticed this, because they didn't actually convert their own code to their
new API. See this discussion, which alas ended with OpenSSL deciding to
ignore the problem and not even document their current interface.
https://github.com/openssl/openssl/issues/8208

Though they never responded, they seem to have tacitly settled using the
out-of-band BIO_eof function (which is BIO_CTRL_EOF) as the way to signal
EOF for BIO_read_ex. This is kind of fiddly, but is at least a well-defined
option. But the problem is no one's BIO_METHODs, including their own, are
read_ex-based. They all implement the old read callback. But someone might
call BIO_read_ex on a read-based BIO_METHOD. IMO, BIO_read_ex should be
responsible for translating between the two EOF conventions. For example,
it could automatically set a flag when the read callback returns 0 and then
make BIO_ctrl check the flag and automatically implement BIO_CTRL_EOF for
BIOs that don't do it themselves. Alas, OpenSSL did not do this and instead
they just made their built-in BIOs implement BIO_CTRL_EOF, even though this
new expectation is a compatibility break. That leaves BIO_read, the one
everyone uses, in a pretty ambiguous state that they seem uninterested in
addressing.
https://github.com/openssl/openssl/pull/10882
https://github.com/openssl/openssl/pull/10907

Honestly, I suspect nothing in postgres actually cares about this, given
you all didn't notice things breaking around here in the early days of
1.1.x. Still, that is a good point. I've updated the patch to implement
BIO_CTRL_EOF for completeness' sake.

(For BoringSSL, we did not go down this route because, unlike OpenSSL
apparently, we do not think breaking backwards compatibility in the EOF
convention is OK. I do want to add the size_t APIs eventually, but I'm
thinking we'll do the automatic BIO_CTRL_EOF thing described above, to
avoid breaking compatibility with existing BIO_METHODs.)

Beyond BIO_CTRL_EOF, I assume what you're really asking here is "how do we
know OpenSSL won't change the interface again?". And, honestly, we don't.
They promise API and ABI stability, which in theory means they shouldn't.
But their track record with custom BIOs is not great. That said, if they do
break it, I think it will unambiguously be an API break on their part and
hopefully it'll be possible to get them to fix the issue. The EOF issue
snuck in because it mostly only impacted people who tried to migrate to
their new APIs. It broke existing APIs too, but the one project that
noticed (CPython) misdiagnosed the issue and worked around it.
(Incorrectly, but no one noticed. See
https://github.com/python/cpython/pull/95495.) So, I raised the issue,
they'd long since shipped it and evidently felt no burning need to fix
their regression or unclear APIs. :-(

But the alternative, picking up the real socket BIO's ctrl function, is
even more unsound, so I think this is the better place to be.

What about
BIO_CTRL_GET_CLOSE/BIO_CTRL_SET_CLOSE?

The close flag is pretty silly. All BIOs do slightly different things with
it, which means that libssl cannot possibly call it generically. So if your
BIO doesn't have any need for it, you don't need to bother. It's usually
used to indicate whether the BIO owns the underlying fd/socket/BUF_MEM/etc,

Another issue is that 0 doesn't actually seem like the universal error
return
- e.g. BIO_C_GET_FD seems to return -1, because 0 is a valid fd.

Yeah, that is... also a mess. All of OpenSSL's ctrl functions return zero
for unrecognized commands. It seems they just don't have a convention for
distinguishing unimplemented commands from zero returns. As you note, this
is a problem for BIO_C_GET_FD. OpenSSL's other non-fd BIOs have the same
problem though:
https://github.com/openssl/openssl/blob/master/crypto/bio/bss_file.c#L340

Realistically, type-specific ioctls like that don't matter much because
you're really only going to call them if you already know you have an
applicable BIO. Hopefully, between that, and removing BIO_TYPE_DESCRIPTOR
(below), it's mostly okay? Also happy to add a `ret = -1` implementation
of BIO_C_GET_FD if you'd prefer. But, in the general case, I suspect we
just have to live with the zero thing, and hopefully OpenSSL will stop
introducing ctrls where zero is an unsafe return value.

Perhaps BIO_ctrl in OpenSSL should have some logic like...
```
if (!(type & BIO_TYPE_DESCRIPTOR) && cmd == BIO_C_GET_FD) {
return -1;
}
```
Although I suspect there are people who implement BIO_C_GET_FD without
setting BIO_TYPE_DESCRIPTOR because none of this was ever documented.

As of your patch the bio doesn't actually have an FD anymore, should it
still
set BIO_TYPE_DESCRIPTOR?

Ah good call. Done. I don't think much code really cares, but it does mean
that, if you all call SSL_get_fd (why would you?), it won't get confused.
:-)

+static long

+my_sock_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+     long            res = 0;
+
+     switch (cmd)
+     {
+             case BIO_CTRL_FLUSH:
+                     /* libssl expects all BIOs to support BIO_flush.

*/

+                     res = 1;
+                     break;
+     }
+
+     return res;
+}

I'd move the res = 0 into a default: block. That way the compiler can
warn if
some case doesn't set it in all branches.

Done.

static BIO_METHOD *
my_BIO_s_socket(void)
{

Wonder if we should rename this. It's pretty odd that we still call it's
not
really related to s_socket anymore, and doesn't even really implement the
same
interface (e.g. get_fd doesn't work anymore). Similarly, my_SSL_set_fd()
doesn't actually call set_fd() anymore, which sure seems odd.

Done. I wasn't sure what to name them, or even what the naming convention
was, but I opted to name them after the Port and PGconn objects they wrap.
Happy to switch to another name if you have a better idea though.

David

#9Daniel Gustafsson
daniel@yesql.se
In reply to: David Benjamin (#8)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 19 Apr 2024, at 15:13, David Benjamin <davidben@google.com> wrote:

Circling back here, anything else needed from my end on this patch?

Patience mostly, v17 very recently entered feature freeze so a lof of the
developers are busy finding and fixing bugs to stabilize the release. When the
window for new features opens again I am sure this patch will get reviews (I
have it on my personal TODO and hope to get to it).

--
Daniel Gustafsson

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#9)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

Daniel Gustafsson <daniel@yesql.se> writes:

On 19 Apr 2024, at 15:13, David Benjamin <davidben@google.com> wrote:

Circling back here, anything else needed from my end on this patch?

Patience mostly, v17 very recently entered feature freeze so a lof of the
developers are busy finding and fixing bugs to stabilize the
release.

Per the cfbot [1]http://commitfest.cputube.org/david-benjamin.html, this patch needs a rebase over the ALPN-related
commits. It still isn't likely to get human attention before the
July commitfest, but you can save some time by making sure it's
in a reviewable state before that.

regards, tom lane

[1]: http://commitfest.cputube.org/david-benjamin.html

#11David Benjamin
davidben@google.com
In reply to: Tom Lane (#10)
1 attachment(s)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On Sun, May 19, 2024 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Per the cfbot [1], this patch needs a rebase over the ALPN-related
commits. It still isn't likely to get human attention before the
July commitfest, but you can save some time by making sure it's
in a reviewable state before that.

Rebased version attached. (The conflict was pretty trivial. Both patches
add a field to some struct.)

David

Attachments:

avoid-mixing-custom-and-openssl-bios-v3.patchtext/x-patch; charset=US-ASCII; name=avoid-mixing-custom-and-openssl-bios-v3.patchDownload
From d949ff826aed2a7a9107be4b166fd48bcae38227 Mon Sep 17 00:00:00 2001
From: David Benjamin <davidben@google.com>
Date: Sun, 11 Feb 2024 10:42:25 -0500
Subject: [PATCH] Avoid mixing custom and OpenSSL BIO functions

This addresses the root cause of the BIO_set_data conflict that
c82207a548db47623a2bfa2447babdaa630302b9 attempted to address. That fix was
really just a workaround. The real root cause was that postgres was mixing up
two BIO implementations, each of which expected to be driving the BIO.

Mixing them up did not actually do any good. The Port and PGconn structures
already maintained the file descriptors. The socket BIO's copy, configured via
BIO_set_fd, wasn't even being used because my_BIO_s_socket replaced read and
write anyway. We've been essentially keeping extra state around, and relying
on it being unused:

- gets - Not implemented by sockets and not used by libssl.
- puts - Not used by libssl. If it were, it would break the special SIGPIPE and
  interrupt handling postgres aiming for.
- ctrl - (More on this later)
- create - This is just setting up state that we don't use.
- destroy - This is a no-op because we use BIO_NOCLOSE. In fact it's important
  that it's a no-op because otherwise OpenSSL would close the socket under
  postgres' feet!
- callback_ctrl - Not implemented by sockets.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a no-op. All
other operations are unused. It's once again good that they're unused because
otherwise OpenSSL might mess with postgres's socket, break the assumptions
around interrupt handling, etc.

Instead, simply implement a very basic ctrl ourselves and drop the other
functions. This avoids the risk that future OpenSSL upgrades add some feature
to BIO_s_socket's ctrl which conflicts with postgres.

Once we've done that, we're free to use BIO_set_data. While BIO_set_app_data
works fine, I've reverted back to BIO_set_data because it's more commonly used.
app_data depends on OpenSSL's "ex_data" mechanism, which is a tad heavier under
the hood.

As this is no longer related to BIO_s_socket or calling SSL_set_fd, I've
renamed the methods to reference the PGconn and Port types instead.
---
 configure                                |   2 +-
 configure.ac                             |   2 +-
 meson.build                              |   1 +
 src/backend/libpq/be-secure-openssl.c    | 106 +++++++++++++++--------
 src/include/libpq/libpq-be.h             |   1 +
 src/include/pg_config.h.in               |   3 +
 src/interfaces/libpq/fe-secure-openssl.c |  94 +++++++++++++-------
 src/interfaces/libpq/libpq-int.h         |   1 +
 8 files changed, 140 insertions(+), 70 deletions(-)

diff --git a/configure b/configure
index 8e7704d54b..538975530f 100755
--- a/configure
+++ b/configure
@@ -12564,7 +12564,7 @@ done
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  for ac_func in OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
+  for ac_func in OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index c7322e292c..4e34539ea2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1352,7 +1352,7 @@ if test "$with_ssl" = openssl ; then
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
+  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
   # OpenSSL versions before 1.1.0 required setting callback functions, for
   # thread-safety. In 1.1.0, it's no longer required, and CRYPTO_lock()
   # function was removed.
diff --git a/meson.build b/meson.build
index 1c0579d5a6..853e1aa9f8 100644
--- a/meson.build
+++ b/meson.build
@@ -1281,6 +1281,7 @@ if sslopt in ['auto', 'openssl']
       # doesn't have these OpenSSL 1.1.0 functions. So check for individual
       # functions.
       ['OPENSSL_init_ssl'],
+      ['BIO_get_data'],
       ['BIO_meth_new'],
       ['ASN1_STRING_get0_data'],
       ['HMAC_CTX_new'],
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 39b1a66236..05b995ff6c 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -56,10 +56,10 @@
 static void default_openssl_tls_init(SSL_CTX *context, bool isServerStart);
 openssl_tls_init_hook_typ openssl_tls_init_hook = default_openssl_tls_init;
 
-static int	my_sock_read(BIO *h, char *buf, int size);
-static int	my_sock_write(BIO *h, const char *buf, int size);
-static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(Port *port, int fd);
+static int	port_bio_read(BIO *h, char *buf, int size);
+static int	port_bio_write(BIO *h, const char *buf, int size);
+static BIO_METHOD *port_bio_method(void);
+static int	ssl_set_port_bio(Port *port);
 
 static DH  *load_dh_file(char *filename, bool isServerStart);
 static DH  *load_dh_buffer(const char *buffer, size_t len);
@@ -454,7 +454,7 @@ be_tls_open_server(Port *port)
 						SSLerrmessage(ERR_get_error()))));
 		return -1;
 	}
-	if (!my_SSL_set_fd(port, port->sock))
+	if (!ssl_set_port_bio(port))
 	{
 		ereport(COMMERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -891,17 +891,24 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
-static BIO_METHOD *my_bio_methods = NULL;
+#ifndef HAVE_BIO_GET_DATA
+#define BIO_get_data(bio) (bio->ptr)
+#define BIO_set_data(bio, data) (bio->ptr = data)
+#endif
+
+static BIO_METHOD *port_bio_method_ptr = NULL;
 
 static int
-my_sock_read(BIO *h, char *buf, int size)
+port_bio_read(BIO *h, char *buf, int size)
 {
 	int			res = 0;
+	Port	   *port = (Port *) BIO_get_data(h);
 
 	if (buf != NULL)
 	{
-		res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
+		res = secure_raw_read(port, buf, size);
 		BIO_clear_retry_flags(h);
+		port->last_read_was_eof = res == 0;
 		if (res <= 0)
 		{
 			/* If we were interrupted, tell caller to retry */
@@ -916,11 +923,11 @@ my_sock_read(BIO *h, char *buf, int size)
 }
 
 static int
-my_sock_write(BIO *h, const char *buf, int size)
+port_bio_write(BIO *h, const char *buf, int size)
 {
 	int			res = 0;
 
-	res = secure_raw_write(((Port *) BIO_get_app_data(h)), buf, size);
+	res = secure_raw_write(((Port *) BIO_get_data(h)), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
@@ -934,56 +941,81 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+port_bio_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res;
+	Port	   *port = (Port *) BIO_get_data(h);
+
+	switch (cmd)
+	{
+		case BIO_CTRL_EOF:
+			/*
+			 * This should not be needed. port_bio_read already has a way to
+			 * signal EOF to OpenSSL. However, OpenSSL made an undocumented,
+			 * backwards-incompatible change and now expects EOF via BIO_ctrl.
+			 * See https://github.com/openssl/openssl/issues/8208
+			 */
+			res = port->last_read_was_eof;
+			break;
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+		default:
+			res = 0;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
-my_BIO_s_socket(void)
+port_bio_method(void)
 {
-	if (!my_bio_methods)
+	if (!port_bio_method_ptr)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 #ifdef HAVE_BIO_METH_NEW
 		int			my_bio_index;
 
 		my_bio_index = BIO_get_new_index();
 		if (my_bio_index == -1)
 			return NULL;
-		my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
-		my_bio_methods = BIO_meth_new(my_bio_index, "PostgreSQL backend socket");
-		if (!my_bio_methods)
+		my_bio_index |= BIO_TYPE_SOURCE_SINK;
+		port_bio_method_ptr = BIO_meth_new(my_bio_index, "PostgreSQL backend socket");
+		if (!port_bio_method_ptr)
 			return NULL;
-		if (!BIO_meth_set_write(my_bio_methods, my_sock_write) ||
-			!BIO_meth_set_read(my_bio_methods, my_sock_read) ||
-			!BIO_meth_set_gets(my_bio_methods, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(my_bio_methods, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(my_bio_methods, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(my_bio_methods, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(my_bio_methods, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(my_bio_methods, BIO_meth_get_callback_ctrl(biom)))
+		if (!BIO_meth_set_write(port_bio_method_ptr, port_bio_write) ||
+			!BIO_meth_set_read(port_bio_method_ptr, port_bio_read) ||
+			!BIO_meth_set_ctrl(port_bio_method_ptr, port_bio_ctrl))
 		{
-			BIO_meth_free(my_bio_methods);
-			my_bio_methods = NULL;
+			BIO_meth_free(port_bio_method_ptr);
+			port_bio_method_ptr = NULL;
 			return NULL;
 		}
 #else
-		my_bio_methods = malloc(sizeof(BIO_METHOD));
-		if (!my_bio_methods)
+		port_bio_method_ptr = malloc(sizeof(BIO_METHOD));
+		if (!port_bio_method_ptr)
 			return NULL;
-		memcpy(my_bio_methods, biom, sizeof(BIO_METHOD));
-		my_bio_methods->bread = my_sock_read;
-		my_bio_methods->bwrite = my_sock_write;
+		memset(port_bio_method_ptr, 0, sizeof(BIO_METHOD));
+		port_bio_method_ptr->type = BIO_TYPE_SOURCE_SINK;
+		port_bio_method_ptr->name = "libpq socket";
+		port_bio_method_ptr->bread = port_bio_read;
+		port_bio_method_ptr->bwrite = port_bio_write;
+		port_bio_method_ptr->ctrl = port_bio_ctrl;
 #endif
 	}
-	return my_bio_methods;
+	return port_bio_method_ptr;
 }
 
-/* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(Port *port, int fd)
+ssl_set_port_bio(Port *port)
 {
 	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
-	bio_method = my_BIO_s_socket();
+	bio_method = port_bio_method();
 	if (bio_method == NULL)
 	{
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
@@ -996,9 +1028,9 @@ my_SSL_set_fd(Port *port, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, port);
+	BIO_set_data(bio, port);
+	BIO_set_init(bio, 1);
 
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	SSL_set_bio(port->ssl, bio, bio);
 	ret = 1;
 err:
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 05cb1874c5..d97d1e5f6b 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -204,6 +204,7 @@ typedef struct Port
 	char	   *peer_dn;
 	bool		peer_cert_valid;
 	bool		alpn_used;
+	bool		last_read_was_eof;
 
 	/*
 	 * OpenSSL structures. (Keep these last so that the locations of other
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index f8d3e3b6b8..01a76aef08 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -66,6 +66,9 @@
 /* Define to 1 if you have the `backtrace_symbols' function. */
 #undef HAVE_BACKTRACE_SYMBOLS
 
+/* Define to 1 if you have the `BIO_get_data' function. */
+#undef HAVE_BIO_GET_DATA
+
 /* Define to 1 if you have the `BIO_meth_new' function. */
 #undef HAVE_BIO_METH_NEW
 
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index bf9dfbf918..883e09afe9 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -78,10 +78,10 @@ static char *SSLerrmessage(unsigned long ecode);
 static void SSLerrfree(char *buf);
 static int	PQssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 
-static int	my_sock_read(BIO *h, char *buf, int size);
-static int	my_sock_write(BIO *h, const char *buf, int size);
-static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(PGconn *conn, int fd);
+static int	pgconn_bio_read(BIO *h, char *buf, int size);
+static int	pgconn_bio_write(BIO *h, const char *buf, int size);
+static BIO_METHOD *pgconn_bio_method(void);
+static int	ssl_set_pgconn_bio(PGconn *conn);
 
 
 static bool pq_init_ssl_lib = true;
@@ -1193,7 +1193,7 @@ initialize_SSL(PGconn *conn)
 	 */
 	if (!(conn->ssl = SSL_new(SSL_context)) ||
 		!SSL_set_app_data(conn->ssl, conn) ||
-		!my_SSL_set_fd(conn, conn->sock))
+		!ssl_set_pgconn_bio(conn))
 	{
 		char	   *err = SSLerrmessage(ERR_get_error());
 
@@ -1900,17 +1900,24 @@ PQsslAttribute(PGconn *conn, const char *attribute_name)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
+#ifndef HAVE_BIO_GET_DATA
+#define BIO_get_data(bio) (bio->ptr)
+#define BIO_set_data(bio, data) (bio->ptr = data)
+#endif
+
 /* protected by ssl_config_mutex */
-static BIO_METHOD *my_bio_methods;
+static BIO_METHOD *pgconn_bio_method_ptr;
 
 static int
-my_sock_read(BIO *h, char *buf, int size)
+pgconn_bio_read(BIO *h, char *buf, int size)
 {
 	PGconn	   *conn = (PGconn *) BIO_get_app_data(h);
 	int			res;
+	PGconn	   *conn = (PGconn *) BIO_get_data(h);
 
 	res = pqsecure_raw_read(conn, buf, size);
 	BIO_clear_retry_flags(h);
+	conn->last_read_was_eof = res == 0;
 	if (res < 0)
 	{
 		/* If we were interrupted, tell caller to retry */
@@ -1938,11 +1945,11 @@ my_sock_read(BIO *h, char *buf, int size)
 }
 
 static int
-my_sock_write(BIO *h, const char *buf, int size)
+pgconn_bio_write(BIO *h, const char *buf, int size)
 {
 	int			res;
 
-	res = pqsecure_raw_write((PGconn *) BIO_get_app_data(h), buf, size);
+	res = pqsecure_raw_write((PGconn *) BIO_get_data(h), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res < 0)
 	{
@@ -1967,26 +1974,54 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+pgconn_bio_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res;
+	PGconn	   *conn = (PGconn *) BIO_get_data(h);
+
+	switch (cmd)
+	{
+		case BIO_CTRL_EOF:
+			/*
+			 * This should not be needed. pgconn_bio_read already has a way to
+			 * signal EOF to OpenSSL. However, OpenSSL made an undocumented,
+			 * backwards-incompatible change and now expects EOF via BIO_ctrl.
+			 * See https://github.com/openssl/openssl/issues/8208
+			 */
+			res = conn->last_read_was_eof;
+			break;
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+		default:
+			res = 0;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
-my_BIO_s_socket(void)
+pgconn_bio_method(void)
 {
 	BIO_METHOD *res;
 
 	if (pthread_mutex_lock(&ssl_config_mutex))
 		return NULL;
 
-	res = my_bio_methods;
+	res = pgconn_bio_method_ptr;
 
-	if (!my_bio_methods)
+	if (!pgconn_bio_method_ptr)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 #ifdef HAVE_BIO_METH_NEW
 		int			my_bio_index;
 
 		my_bio_index = BIO_get_new_index();
 		if (my_bio_index == -1)
 			goto err;
-		my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
+		my_bio_index |= BIO_TYPE_SOURCE_SINK;
 		res = BIO_meth_new(my_bio_index, "libpq socket");
 		if (!res)
 			goto err;
@@ -1995,14 +2030,9 @@ my_BIO_s_socket(void)
 		 * As of this writing, these functions never fail. But check anyway,
 		 * like OpenSSL's own examples do.
 		 */
-		if (!BIO_meth_set_write(res, my_sock_write) ||
-			!BIO_meth_set_read(res, my_sock_read) ||
-			!BIO_meth_set_gets(res, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(res, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(res, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(res, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(res, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(res, BIO_meth_get_callback_ctrl(biom)))
+		if (!BIO_meth_set_write(res, pgconn_bio_write) ||
+			!BIO_meth_set_read(res, pgconn_bio_read) ||
+			!BIO_meth_set_ctrl(res, pgconn_bio_ctrl))
 		{
 			goto err;
 		}
@@ -2010,13 +2040,16 @@ my_BIO_s_socket(void)
 		res = malloc(sizeof(BIO_METHOD));
 		if (!res)
 			goto err;
-		memcpy(res, biom, sizeof(BIO_METHOD));
-		res->bread = my_sock_read;
-		res->bwrite = my_sock_write;
+		memset(res, 0, sizeof(BIO_METHOD));
+		res->type = BIO_TYPE_SOURCE_SINK;
+		res->name = "libpq socket";
+		res->bread = pgconn_bio_read;
+		res->bwrite = pgconn_bio_write;
+		res->ctrl = pgconn_bio_ctrl;
 #endif
 	}
 
-	my_bio_methods = res;
+	pgconn_bio_method_ptr = res;
 	pthread_mutex_unlock(&ssl_config_mutex);
 	return res;
 
@@ -2032,15 +2065,14 @@ err:
 	return NULL;
 }
 
-/* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(PGconn *conn, int fd)
+ssl_set_pgconn_bio(PGconn *conn)
 {
 	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
-	bio_method = my_BIO_s_socket();
+	bio_method = pgconn_bio_method();
 	if (bio_method == NULL)
 	{
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
@@ -2052,10 +2084,10 @@ my_SSL_set_fd(PGconn *conn, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, conn);
+	BIO_set_data(bio, conn);
+	BIO_set_init(bio, 1);
 
 	SSL_set_bio(conn->ssl, bio, bio);
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	ret = 1;
 err:
 	return ret;
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 3886204c70..c6c19e6a19 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -567,6 +567,7 @@ struct pg_conn
 	bool		ssl_handshake_started;
 	bool		ssl_cert_requested; /* Did the server ask us for a cert? */
 	bool		ssl_cert_sent;	/* Did we send one in reply? */
+	bool		last_read_was_eof;
 
 #ifdef USE_SSL
 #ifdef USE_OPENSSL
-- 
2.45.0.rc1.225.g2a3ae87e7f-goog

#12Peter Eisentraut
peter@eisentraut.org
In reply to: David Benjamin (#11)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 19.05.24 20:07, David Benjamin wrote:

On Sun, May 19, 2024 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

Per the cfbot [1], this patch needs a rebase over the ALPN-related
commits.  It still isn't likely to get human attention before the
July commitfest, but you can save some time by making sure it's
in a reviewable state before that.

Rebased version attached. (The conflict was pretty trivial. Both patches
add a field to some struct.)

Note that there is a concurrent plan to drop support for OpenSSL older
than 1.1.1 [0]/messages/by-id/ZG3JNursG69dz1lr@paquier.xyz, which would obsolete your configure test for
BIO_get_data. Whoever commits these should be sure to coordinate this.

[0]: /messages/by-id/ZG3JNursG69dz1lr@paquier.xyz

#13Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#12)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 19 May 2024, at 22:21, Peter Eisentraut <peter@eisentraut.org> wrote:

Whoever commits these should be sure to coordinate this.

Thanks for the reminder. I have this patchset, the OCSP stapling patchset and
the multiple-cert one on my radar for when I do the 1.1.0 removal work in the
July CF. Will do another scan for SSL related patches in flight at the time.

--
Daniel Gustafsson

#14Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#13)
1 attachment(s)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 19 May 2024, at 22:32, Daniel Gustafsson <daniel@yesql.se> wrote:

On 19 May 2024, at 22:21, Peter Eisentraut <peter@eisentraut.org> wrote:

Whoever commits these should be sure to coordinate this.

Thanks for the reminder. I have this patchset, the OCSP stapling patchset and
the multiple-cert one on my radar for when I do the 1.1.0 removal work in the
July CF. Will do another scan for SSL related patches in flight at the time.

Attached is a v4 rebase over the recent OpenSSL 1.0.2 removal which made this
patch no longer apply. I've just started to dig into it so have no comments on
it right now, but wanted to get a cleaned up version into the CFBot.

--
Daniel Gustafsson

Attachments:

v4-0001-Avoid-mixing-custom-and-OpenSSL-BIO-functions.patchapplication/octet-stream; name=v4-0001-Avoid-mixing-custom-and-OpenSSL-BIO-functions.patch; x-unix-mode=0644Download
From c33884464f5d8b5f7e336be5357fc5c8114a4b34 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Tue, 3 Sep 2024 14:13:05 +0200
Subject: [PATCH v4] Avoid mixing custom and OpenSSL BIO functions

This addresses the root cause of the BIO_set_data conflict that
c82207a548db47623a2bfa2447babdaa630302b9 attempted to address. That fix was
really just a workaround. The real root cause was that postgres was mixing up
two BIO implementations, each of which expected to be driving the BIO.

Mixing them up did not actually do any good. The Port and PGconn structures
already maintained the file descriptors. The socket BIO's copy, configured via
BIO_set_fd, wasn't even being used because my_BIO_s_socket replaced read and
write anyway. We've been essentially keeping extra state around, and relying
on it being unused:

- gets - Not implemented by sockets and not used by libssl.
- puts - Not used by libssl. If it were, it would break the special SIGPIPE and
  interrupt handling postgres aiming for.
- ctrl - (More on this later)
- create - This is just setting up state that we don't use.
- destroy - This is a no-op because we use BIO_NOCLOSE. In fact it's important
  that it's a no-op because otherwise OpenSSL would close the socket under
  postgres' feet!
- callback_ctrl - Not implemented by sockets.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a no-op. All
other operations are unused. It's once again good that they're unused because
otherwise OpenSSL might mess with postgres's socket, break the assumptions
around interrupt handling, etc.

Instead, simply implement a very basic ctrl ourselves and drop the other
functions. This avoids the risk that future OpenSSL upgrades add some feature
to BIO_s_socket's ctrl which conflicts with postgres.

Once we've done that, we're free to use BIO_set_data. While BIO_set_app_data
works fine, I've reverted back to BIO_set_data because it's more commonly used.
app_data depends on OpenSSL's "ex_data" mechanism, which is a tad heavier under
the hood.

As this is no longer related to BIO_s_socket or calling SSL_set_fd, I've
renamed the methods to reference the PGconn and Port types instead.
---
 src/backend/libpq/be-secure-openssl.c    | 94 +++++++++++++++---------
 src/include/libpq/libpq-be.h             |  1 +
 src/interfaces/libpq/fe-secure-openssl.c | 81 ++++++++++++--------
 src/interfaces/libpq/libpq-int.h         |  1 +
 4 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1ebd3f2e6d..1f86b0cd98 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -57,10 +57,10 @@
 static void default_openssl_tls_init(SSL_CTX *context, bool isServerStart);
 openssl_tls_init_hook_typ openssl_tls_init_hook = default_openssl_tls_init;
 
-static int	my_sock_read(BIO *h, char *buf, int size);
-static int	my_sock_write(BIO *h, const char *buf, int size);
-static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(Port *port, int fd);
+static int	port_bio_read(BIO *h, char *buf, int size);
+static int	port_bio_write(BIO *h, const char *buf, int size);
+static BIO_METHOD *port_bio_method(void);
+static int	ssl_set_port_bio(Port *port);
 
 static DH  *load_dh_file(char *filename, bool isServerStart);
 static DH  *load_dh_buffer(const char *buffer, size_t len);
@@ -453,7 +453,7 @@ be_tls_open_server(Port *port)
 						SSLerrmessage(ERR_get_error()))));
 		return -1;
 	}
-	if (!my_SSL_set_fd(port, port->sock))
+	if (!ssl_set_port_bio(port))
 	{
 		ereport(COMMERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -890,17 +890,19 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
-static BIO_METHOD *my_bio_methods = NULL;
+static BIO_METHOD *port_bio_method_ptr = NULL;
 
 static int
-my_sock_read(BIO *h, char *buf, int size)
+port_bio_read(BIO *h, char *buf, int size)
 {
 	int			res = 0;
+	Port	   *port = (Port *) BIO_get_data(h);
 
 	if (buf != NULL)
 	{
-		res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
+		res = secure_raw_read(port, buf, size);
 		BIO_clear_retry_flags(h);
+		port->last_read_was_eof = res == 0;
 		if (res <= 0)
 		{
 			/* If we were interrupted, tell caller to retry */
@@ -915,11 +917,11 @@ my_sock_read(BIO *h, char *buf, int size)
 }
 
 static int
-my_sock_write(BIO *h, const char *buf, int size)
+port_bio_write(BIO *h, const char *buf, int size)
 {
 	int			res = 0;
 
-	res = secure_raw_write(((Port *) BIO_get_app_data(h)), buf, size);
+	res = secure_raw_write(((Port *) BIO_get_data(h)), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
@@ -933,47 +935,69 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+port_bio_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res;
+	Port	   *port = (Port *) BIO_get_data(h);
+
+	switch (cmd)
+	{
+		case BIO_CTRL_EOF:
+			/*
+			 * This should not be needed. port_bio_read already has a way to
+			 * signal EOF to OpenSSL. However, OpenSSL made an undocumented,
+			 * backwards-incompatible change and now expects EOF via BIO_ctrl.
+			 * See https://github.com/openssl/openssl/issues/8208
+			 */
+			res = port->last_read_was_eof;
+			break;
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+		default:
+			res = 0;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
-my_BIO_s_socket(void)
+port_bio_method(void)
 {
-	if (!my_bio_methods)
+	if (!port_bio_method_ptr)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 		int			my_bio_index;
 
-		my_bio_index = BIO_get_new_index();
-		if (my_bio_index == -1)
-			return NULL;
-		my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
-		my_bio_methods = BIO_meth_new(my_bio_index, "PostgreSQL backend socket");
-		if (!my_bio_methods)
+ 		my_bio_index = BIO_get_new_index();
+ 		if (my_bio_index == -1)
+ 			return NULL;
+		my_bio_index |= BIO_TYPE_SOURCE_SINK;
+		port_bio_method_ptr = BIO_meth_new(my_bio_index, "PostgreSQL backend socket");
+		if (!port_bio_method_ptr)
 			return NULL;
-		if (!BIO_meth_set_write(my_bio_methods, my_sock_write) ||
-			!BIO_meth_set_read(my_bio_methods, my_sock_read) ||
-			!BIO_meth_set_gets(my_bio_methods, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(my_bio_methods, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(my_bio_methods, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(my_bio_methods, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(my_bio_methods, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(my_bio_methods, BIO_meth_get_callback_ctrl(biom)))
+		if (!BIO_meth_set_write(port_bio_method_ptr, port_bio_write) ||
+			!BIO_meth_set_read(port_bio_method_ptr, port_bio_read) ||
+			!BIO_meth_set_ctrl(port_bio_method_ptr, port_bio_ctrl))
 		{
-			BIO_meth_free(my_bio_methods);
-			my_bio_methods = NULL;
+			BIO_meth_free(port_bio_method_ptr);
+			port_bio_method_ptr = NULL;
 			return NULL;
 		}
 	}
-	return my_bio_methods;
+	return port_bio_method_ptr;
 }
 
-/* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(Port *port, int fd)
+ssl_set_port_bio(Port *port)
 {
 	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
-	bio_method = my_BIO_s_socket();
+	bio_method = port_bio_method();
 	if (bio_method == NULL)
 	{
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
@@ -986,9 +1010,9 @@ my_SSL_set_fd(Port *port, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, port);
+	BIO_set_data(bio, port);
+	BIO_set_init(bio, 1);
 
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	SSL_set_bio(port->ssl, bio, bio);
 	ret = 1;
 err:
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 05cb1874c5..d97d1e5f6b 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -204,6 +204,7 @@ typedef struct Port
 	char	   *peer_dn;
 	bool		peer_cert_valid;
 	bool		alpn_used;
+	bool		last_read_was_eof;
 
 	/*
 	 * OpenSSL structures. (Keep these last so that the locations of other
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index b5749d0292..420443a031 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -77,10 +77,10 @@ static char *SSLerrmessage(unsigned long ecode);
 static void SSLerrfree(char *buf);
 static int	PQssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 
-static int	my_sock_read(BIO *h, char *buf, int size);
-static int	my_sock_write(BIO *h, const char *buf, int size);
-static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(PGconn *conn, int fd);
+static int	pgconn_bio_read(BIO *h, char *buf, int size);
+static int	pgconn_bio_write(BIO *h, const char *buf, int size);
+static BIO_METHOD *pgconn_bio_method(void);
+static int	ssl_set_pgconn_bio(PGconn *conn);
 
 static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
 
@@ -989,7 +989,7 @@ initialize_SSL(PGconn *conn)
 	 */
 	if (!(conn->ssl = SSL_new(SSL_context)) ||
 		!SSL_set_app_data(conn->ssl, conn) ||
-		!my_SSL_set_fd(conn, conn->sock))
+		!ssl_set_pgconn_bio(conn))
 	{
 		char	   *err = SSLerrmessage(ERR_get_error());
 
@@ -1670,16 +1670,17 @@ PQsslAttribute(PGconn *conn, const char *attribute_name)
  */
 
 /* protected by ssl_config_mutex */
-static BIO_METHOD *my_bio_methods;
+static BIO_METHOD *pgconn_bio_method_ptr;
 
 static int
-my_sock_read(BIO *h, char *buf, int size)
+pgconn_bio_read(BIO *h, char *buf, int size)
 {
-	PGconn	   *conn = (PGconn *) BIO_get_app_data(h);
+	PGconn	   *conn = (PGconn *) BIO_get_data(h);
 	int			res;
 
 	res = pqsecure_raw_read(conn, buf, size);
 	BIO_clear_retry_flags(h);
+	conn->last_read_was_eof = res == 0;
 	if (res < 0)
 	{
 		/* If we were interrupted, tell caller to retry */
@@ -1707,11 +1708,11 @@ my_sock_read(BIO *h, char *buf, int size)
 }
 
 static int
-my_sock_write(BIO *h, const char *buf, int size)
+pgconn_bio_write(BIO *h, const char *buf, int size)
 {
 	int			res;
 
-	res = pqsecure_raw_write((PGconn *) BIO_get_app_data(h), buf, size);
+	res = pqsecure_raw_write((PGconn *) BIO_get_data(h), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res < 0)
 	{
@@ -1736,25 +1737,53 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+pgconn_bio_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res;
+	PGconn	   *conn = (PGconn *) BIO_get_data(h);
+
+	switch (cmd)
+	{
+		case BIO_CTRL_EOF:
+			/*
+			 * This should not be needed. pgconn_bio_read already has a way to
+			 * signal EOF to OpenSSL. However, OpenSSL made an undocumented,
+			 * backwards-incompatible change and now expects EOF via BIO_ctrl.
+			 * See https://github.com/openssl/openssl/issues/8208
+			 */
+			res = conn->last_read_was_eof;
+			break;
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+		default:
+			res = 0;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
-my_BIO_s_socket(void)
+pgconn_bio_method(void)
 {
 	BIO_METHOD *res;
 
 	if (pthread_mutex_lock(&ssl_config_mutex))
 		return NULL;
 
-	res = my_bio_methods;
+	res = pgconn_bio_method_ptr;
 
-	if (!my_bio_methods)
+	if (!pgconn_bio_method_ptr)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 		int			my_bio_index;
 
 		my_bio_index = BIO_get_new_index();
 		if (my_bio_index == -1)
 			goto err;
-		my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
+		my_bio_index |= BIO_TYPE_SOURCE_SINK;
 		res = BIO_meth_new(my_bio_index, "libpq socket");
 		if (!res)
 			goto err;
@@ -1763,20 +1792,15 @@ my_BIO_s_socket(void)
 		 * As of this writing, these functions never fail. But check anyway,
 		 * like OpenSSL's own examples do.
 		 */
-		if (!BIO_meth_set_write(res, my_sock_write) ||
-			!BIO_meth_set_read(res, my_sock_read) ||
-			!BIO_meth_set_gets(res, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(res, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(res, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(res, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(res, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(res, BIO_meth_get_callback_ctrl(biom)))
+		if (!BIO_meth_set_write(res, pgconn_bio_write) ||
+			!BIO_meth_set_read(res, pgconn_bio_read) ||
+			!BIO_meth_set_ctrl(res, pgconn_bio_ctrl))
 		{
 			goto err;
 		}
 	}
 
-	my_bio_methods = res;
+	pgconn_bio_method_ptr = res;
 	pthread_mutex_unlock(&ssl_config_mutex);
 	return res;
 
@@ -1787,15 +1811,14 @@ err:
 	return NULL;
 }
 
-/* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(PGconn *conn, int fd)
+ssl_set_pgconn_bio(PGconn *conn)
 {
 	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
-	bio_method = my_BIO_s_socket();
+	bio_method = pgconn_bio_method();
 	if (bio_method == NULL)
 	{
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
@@ -1807,10 +1830,10 @@ my_SSL_set_fd(PGconn *conn, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, conn);
+	BIO_set_data(bio, conn);
+	BIO_set_init(bio, 1);
 
 	SSL_set_bio(conn->ssl, bio, bio);
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	ret = 1;
 err:
 	return ret;
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 9579f80353..08cc391cbd 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -581,6 +581,7 @@ struct pg_conn
 	bool		ssl_handshake_started;
 	bool		ssl_cert_requested; /* Did the server ask us for a cert? */
 	bool		ssl_cert_sent;	/* Did we send one in reply? */
+	bool		last_read_was_eof;
 
 #ifdef USE_SSL
 #ifdef USE_OPENSSL
-- 
2.39.3 (Apple Git-146)

#15Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#14)
2 attachment(s)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 3 Sep 2024, at 14:18, Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a v4 rebase over the recent OpenSSL 1.0.2 removal which made this
patch no longer apply. I've just started to dig into it so have no comments on
it right now, but wanted to get a cleaned up version into the CFBot.

CFBot building green for this, I just have a few small questions/comments:

+ my_bio_index |= BIO_TYPE_SOURCE_SINK;

According to the OpenSSL docs we should set BIO_TYPE_DESCRIPTOR as well as this
BIO is socket based, but it's not entirely clear to me why. Is there a
specific reason it was removed?

+ bio_method = port_bio_method();
if (bio_method == NULL)
{
SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);

SSL_F_SSL_SET_FD is no longer the correct function context for this error
reporting. In OpenSSL 3.x it means nothing since SSLerr throws away the
function when calling ERR_raise_data, but we still support 1.1.0+. I think the
correct error would be BIOerr(BIO_F_BIO_METH_NEW..) but I wonder if we should
just remove it since BIO_meth_new and BIO_new already set errors for us to
consume? It doesn't seem to make sense to add more errors on the queue here?
The same goes for the frontend part.

The attached v5 is a fresh rebase with my comments from above as 0002 to
illustrate.

--
Daniel Gustafsson

Attachments:

v5-0001-Avoid-mixing-custom-and-OpenSSL-BIO-functions.patchapplication/octet-stream; name=v5-0001-Avoid-mixing-custom-and-OpenSSL-BIO-functions.patch; x-unix-mode=0644Download
From e0f64c6f81843afcf174774a7c64874bdeb06ba5 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Tue, 3 Sep 2024 14:13:05 +0200
Subject: [PATCH v5 1/2] Avoid mixing custom and OpenSSL BIO functions

This addresses the root cause of the BIO_set_data conflict that
c82207a548db47623a2bfa2447babdaa630302b9 attempted to address. That fix was
really just a workaround. The real root cause was that postgres was mixing up
two BIO implementations, each of which expected to be driving the BIO.

Mixing them up did not actually do any good. The Port and PGconn structures
already maintained the file descriptors. The socket BIO's copy, configured via
BIO_set_fd, wasn't even being used because my_BIO_s_socket replaced read and
write anyway. We've been essentially keeping extra state around, and relying
on it being unused:

- gets - Not implemented by sockets and not used by libssl.
- puts - Not used by libssl. If it were, it would break the special SIGPIPE and
  interrupt handling postgres aiming for.
- ctrl - (More on this later)
- create - This is just setting up state that we don't use.
- destroy - This is a no-op because we use BIO_NOCLOSE. In fact it's important
  that it's a no-op because otherwise OpenSSL would close the socket under
  postgres' feet!
- callback_ctrl - Not implemented by sockets.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a no-op. All
other operations are unused. It's once again good that they're unused because
otherwise OpenSSL might mess with postgres's socket, break the assumptions
around interrupt handling, etc.

Instead, simply implement a very basic ctrl ourselves and drop the other
functions. This avoids the risk that future OpenSSL upgrades add some feature
to BIO_s_socket's ctrl which conflicts with postgres.

Once we've done that, we're free to use BIO_set_data. While BIO_set_app_data
works fine, I've reverted back to BIO_set_data because it's more commonly used.
app_data depends on OpenSSL's "ex_data" mechanism, which is a tad heavier under
the hood.

As this is no longer related to BIO_s_socket or calling SSL_set_fd, I've
renamed the methods to reference the PGconn and Port types instead.
---
 src/backend/libpq/be-secure-openssl.c    | 94 +++++++++++++++---------
 src/include/libpq/libpq-be.h             |  1 +
 src/interfaces/libpq/fe-secure-openssl.c | 81 ++++++++++++--------
 src/interfaces/libpq/libpq-int.h         |  1 +
 4 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1ebd3f2e6d..1f86b0cd98 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -57,10 +57,10 @@
 static void default_openssl_tls_init(SSL_CTX *context, bool isServerStart);
 openssl_tls_init_hook_typ openssl_tls_init_hook = default_openssl_tls_init;
 
-static int	my_sock_read(BIO *h, char *buf, int size);
-static int	my_sock_write(BIO *h, const char *buf, int size);
-static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(Port *port, int fd);
+static int	port_bio_read(BIO *h, char *buf, int size);
+static int	port_bio_write(BIO *h, const char *buf, int size);
+static BIO_METHOD *port_bio_method(void);
+static int	ssl_set_port_bio(Port *port);
 
 static DH  *load_dh_file(char *filename, bool isServerStart);
 static DH  *load_dh_buffer(const char *buffer, size_t len);
@@ -453,7 +453,7 @@ be_tls_open_server(Port *port)
 						SSLerrmessage(ERR_get_error()))));
 		return -1;
 	}
-	if (!my_SSL_set_fd(port, port->sock))
+	if (!ssl_set_port_bio(port))
 	{
 		ereport(COMMERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -890,17 +890,19 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
-static BIO_METHOD *my_bio_methods = NULL;
+static BIO_METHOD *port_bio_method_ptr = NULL;
 
 static int
-my_sock_read(BIO *h, char *buf, int size)
+port_bio_read(BIO *h, char *buf, int size)
 {
 	int			res = 0;
+	Port	   *port = (Port *) BIO_get_data(h);
 
 	if (buf != NULL)
 	{
-		res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
+		res = secure_raw_read(port, buf, size);
 		BIO_clear_retry_flags(h);
+		port->last_read_was_eof = res == 0;
 		if (res <= 0)
 		{
 			/* If we were interrupted, tell caller to retry */
@@ -915,11 +917,11 @@ my_sock_read(BIO *h, char *buf, int size)
 }
 
 static int
-my_sock_write(BIO *h, const char *buf, int size)
+port_bio_write(BIO *h, const char *buf, int size)
 {
 	int			res = 0;
 
-	res = secure_raw_write(((Port *) BIO_get_app_data(h)), buf, size);
+	res = secure_raw_write(((Port *) BIO_get_data(h)), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
@@ -933,47 +935,69 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+port_bio_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res;
+	Port	   *port = (Port *) BIO_get_data(h);
+
+	switch (cmd)
+	{
+		case BIO_CTRL_EOF:
+			/*
+			 * This should not be needed. port_bio_read already has a way to
+			 * signal EOF to OpenSSL. However, OpenSSL made an undocumented,
+			 * backwards-incompatible change and now expects EOF via BIO_ctrl.
+			 * See https://github.com/openssl/openssl/issues/8208
+			 */
+			res = port->last_read_was_eof;
+			break;
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+		default:
+			res = 0;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
-my_BIO_s_socket(void)
+port_bio_method(void)
 {
-	if (!my_bio_methods)
+	if (!port_bio_method_ptr)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 		int			my_bio_index;
 
-		my_bio_index = BIO_get_new_index();
-		if (my_bio_index == -1)
-			return NULL;
-		my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
-		my_bio_methods = BIO_meth_new(my_bio_index, "PostgreSQL backend socket");
-		if (!my_bio_methods)
+ 		my_bio_index = BIO_get_new_index();
+ 		if (my_bio_index == -1)
+ 			return NULL;
+		my_bio_index |= BIO_TYPE_SOURCE_SINK;
+		port_bio_method_ptr = BIO_meth_new(my_bio_index, "PostgreSQL backend socket");
+		if (!port_bio_method_ptr)
 			return NULL;
-		if (!BIO_meth_set_write(my_bio_methods, my_sock_write) ||
-			!BIO_meth_set_read(my_bio_methods, my_sock_read) ||
-			!BIO_meth_set_gets(my_bio_methods, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(my_bio_methods, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(my_bio_methods, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(my_bio_methods, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(my_bio_methods, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(my_bio_methods, BIO_meth_get_callback_ctrl(biom)))
+		if (!BIO_meth_set_write(port_bio_method_ptr, port_bio_write) ||
+			!BIO_meth_set_read(port_bio_method_ptr, port_bio_read) ||
+			!BIO_meth_set_ctrl(port_bio_method_ptr, port_bio_ctrl))
 		{
-			BIO_meth_free(my_bio_methods);
-			my_bio_methods = NULL;
+			BIO_meth_free(port_bio_method_ptr);
+			port_bio_method_ptr = NULL;
 			return NULL;
 		}
 	}
-	return my_bio_methods;
+	return port_bio_method_ptr;
 }
 
-/* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(Port *port, int fd)
+ssl_set_port_bio(Port *port)
 {
 	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
-	bio_method = my_BIO_s_socket();
+	bio_method = port_bio_method();
 	if (bio_method == NULL)
 	{
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
@@ -986,9 +1010,9 @@ my_SSL_set_fd(Port *port, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, port);
+	BIO_set_data(bio, port);
+	BIO_set_init(bio, 1);
 
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	SSL_set_bio(port->ssl, bio, bio);
 	ret = 1;
 err:
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 05cb1874c5..d97d1e5f6b 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -204,6 +204,7 @@ typedef struct Port
 	char	   *peer_dn;
 	bool		peer_cert_valid;
 	bool		alpn_used;
+	bool		last_read_was_eof;
 
 	/*
 	 * OpenSSL structures. (Keep these last so that the locations of other
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index b5749d0292..420443a031 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -77,10 +77,10 @@ static char *SSLerrmessage(unsigned long ecode);
 static void SSLerrfree(char *buf);
 static int	PQssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 
-static int	my_sock_read(BIO *h, char *buf, int size);
-static int	my_sock_write(BIO *h, const char *buf, int size);
-static BIO_METHOD *my_BIO_s_socket(void);
-static int	my_SSL_set_fd(PGconn *conn, int fd);
+static int	pgconn_bio_read(BIO *h, char *buf, int size);
+static int	pgconn_bio_write(BIO *h, const char *buf, int size);
+static BIO_METHOD *pgconn_bio_method(void);
+static int	ssl_set_pgconn_bio(PGconn *conn);
 
 static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
 
@@ -989,7 +989,7 @@ initialize_SSL(PGconn *conn)
 	 */
 	if (!(conn->ssl = SSL_new(SSL_context)) ||
 		!SSL_set_app_data(conn->ssl, conn) ||
-		!my_SSL_set_fd(conn, conn->sock))
+		!ssl_set_pgconn_bio(conn))
 	{
 		char	   *err = SSLerrmessage(ERR_get_error());
 
@@ -1670,16 +1670,17 @@ PQsslAttribute(PGconn *conn, const char *attribute_name)
  */
 
 /* protected by ssl_config_mutex */
-static BIO_METHOD *my_bio_methods;
+static BIO_METHOD *pgconn_bio_method_ptr;
 
 static int
-my_sock_read(BIO *h, char *buf, int size)
+pgconn_bio_read(BIO *h, char *buf, int size)
 {
-	PGconn	   *conn = (PGconn *) BIO_get_app_data(h);
+	PGconn	   *conn = (PGconn *) BIO_get_data(h);
 	int			res;
 
 	res = pqsecure_raw_read(conn, buf, size);
 	BIO_clear_retry_flags(h);
+	conn->last_read_was_eof = res == 0;
 	if (res < 0)
 	{
 		/* If we were interrupted, tell caller to retry */
@@ -1707,11 +1708,11 @@ my_sock_read(BIO *h, char *buf, int size)
 }
 
 static int
-my_sock_write(BIO *h, const char *buf, int size)
+pgconn_bio_write(BIO *h, const char *buf, int size)
 {
 	int			res;
 
-	res = pqsecure_raw_write((PGconn *) BIO_get_app_data(h), buf, size);
+	res = pqsecure_raw_write((PGconn *) BIO_get_data(h), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res < 0)
 	{
@@ -1736,25 +1737,53 @@ my_sock_write(BIO *h, const char *buf, int size)
 	return res;
 }
 
+static long
+pgconn_bio_ctrl(BIO *h, int cmd, long num, void *ptr)
+{
+	long		res;
+	PGconn	   *conn = (PGconn *) BIO_get_data(h);
+
+	switch (cmd)
+	{
+		case BIO_CTRL_EOF:
+			/*
+			 * This should not be needed. pgconn_bio_read already has a way to
+			 * signal EOF to OpenSSL. However, OpenSSL made an undocumented,
+			 * backwards-incompatible change and now expects EOF via BIO_ctrl.
+			 * See https://github.com/openssl/openssl/issues/8208
+			 */
+			res = conn->last_read_was_eof;
+			break;
+		case BIO_CTRL_FLUSH:
+			/* libssl expects all BIOs to support BIO_flush. */
+			res = 1;
+			break;
+		default:
+			res = 0;
+			break;
+	}
+
+	return res;
+}
+
 static BIO_METHOD *
-my_BIO_s_socket(void)
+pgconn_bio_method(void)
 {
 	BIO_METHOD *res;
 
 	if (pthread_mutex_lock(&ssl_config_mutex))
 		return NULL;
 
-	res = my_bio_methods;
+	res = pgconn_bio_method_ptr;
 
-	if (!my_bio_methods)
+	if (!pgconn_bio_method_ptr)
 	{
-		BIO_METHOD *biom = (BIO_METHOD *) BIO_s_socket();
 		int			my_bio_index;
 
 		my_bio_index = BIO_get_new_index();
 		if (my_bio_index == -1)
 			goto err;
-		my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
+		my_bio_index |= BIO_TYPE_SOURCE_SINK;
 		res = BIO_meth_new(my_bio_index, "libpq socket");
 		if (!res)
 			goto err;
@@ -1763,20 +1792,15 @@ my_BIO_s_socket(void)
 		 * As of this writing, these functions never fail. But check anyway,
 		 * like OpenSSL's own examples do.
 		 */
-		if (!BIO_meth_set_write(res, my_sock_write) ||
-			!BIO_meth_set_read(res, my_sock_read) ||
-			!BIO_meth_set_gets(res, BIO_meth_get_gets(biom)) ||
-			!BIO_meth_set_puts(res, BIO_meth_get_puts(biom)) ||
-			!BIO_meth_set_ctrl(res, BIO_meth_get_ctrl(biom)) ||
-			!BIO_meth_set_create(res, BIO_meth_get_create(biom)) ||
-			!BIO_meth_set_destroy(res, BIO_meth_get_destroy(biom)) ||
-			!BIO_meth_set_callback_ctrl(res, BIO_meth_get_callback_ctrl(biom)))
+		if (!BIO_meth_set_write(res, pgconn_bio_write) ||
+			!BIO_meth_set_read(res, pgconn_bio_read) ||
+			!BIO_meth_set_ctrl(res, pgconn_bio_ctrl))
 		{
 			goto err;
 		}
 	}
 
-	my_bio_methods = res;
+	pgconn_bio_method_ptr = res;
 	pthread_mutex_unlock(&ssl_config_mutex);
 	return res;
 
@@ -1787,15 +1811,14 @@ err:
 	return NULL;
 }
 
-/* This should exactly match OpenSSL's SSL_set_fd except for using my BIO */
 static int
-my_SSL_set_fd(PGconn *conn, int fd)
+ssl_set_pgconn_bio(PGconn *conn)
 {
 	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
-	bio_method = my_BIO_s_socket();
+	bio_method = pgconn_bio_method();
 	if (bio_method == NULL)
 	{
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
@@ -1807,10 +1830,10 @@ my_SSL_set_fd(PGconn *conn, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_app_data(bio, conn);
+	BIO_set_data(bio, conn);
+	BIO_set_init(bio, 1);
 
 	SSL_set_bio(conn->ssl, bio, bio);
-	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	ret = 1;
 err:
 	return ret;
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 9579f80353..08cc391cbd 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -581,6 +581,7 @@ struct pg_conn
 	bool		ssl_handshake_started;
 	bool		ssl_cert_requested; /* Did the server ask us for a cert? */
 	bool		ssl_cert_sent;	/* Did we send one in reply? */
+	bool		last_read_was_eof;
 
 #ifdef USE_SSL
 #ifdef USE_OPENSSL
-- 
2.39.3 (Apple Git-146)

v5-0002-Review-comments.patchapplication/octet-stream; name=v5-0002-Review-comments.patch; x-unix-mode=0644Download
From d87de7ec139023b6e7f9c808e70913e919047b18 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Wed, 4 Sep 2024 15:20:00 +0200
Subject: [PATCH v5 2/2] Review comments

---
 src/backend/libpq/be-secure-openssl.c    | 18 +++++-------------
 src/interfaces/libpq/fe-secure-openssl.c | 17 +++++------------
 2 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1f86b0cd98..b248ab384c 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -993,30 +993,22 @@ port_bio_method(void)
 static int
 ssl_set_port_bio(Port *port)
 {
-	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
 	bio_method = port_bio_method();
 	if (bio_method == NULL)
-	{
-		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
-		goto err;
-	}
-	bio = BIO_new(bio_method);
+		return 0;
 
+	bio = BIO_new(bio_method);
 	if (bio == NULL)
-	{
-		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
-		goto err;
-	}
+		return 0;
+
 	BIO_set_data(bio, port);
 	BIO_set_init(bio, 1);
 
 	SSL_set_bio(port->ssl, bio, bio);
-	ret = 1;
-err:
-	return ret;
+	return 1;
 }
 
 /*
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 420443a031..6739e009bd 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1814,29 +1814,22 @@ err:
 static int
 ssl_set_pgconn_bio(PGconn *conn)
 {
-	int			ret = 0;
 	BIO		   *bio;
 	BIO_METHOD *bio_method;
 
 	bio_method = pgconn_bio_method();
 	if (bio_method == NULL)
-	{
-		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
-		goto err;
-	}
+		return 0;
+
 	bio = BIO_new(bio_method);
 	if (bio == NULL)
-	{
-		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
-		goto err;
-	}
+		return 0;
+
 	BIO_set_data(bio, conn);
 	BIO_set_init(bio, 1);
 
 	SSL_set_bio(conn->ssl, bio, bio);
-	ret = 1;
-err:
-	return ret;
+	return 1;
 }
 
 /*
-- 
2.39.3 (Apple Git-146)

#16David Benjamin
davidben@google.com
In reply to: Daniel Gustafsson (#15)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On Wed, Sep 4, 2024 at 9:22 AM Daniel Gustafsson <daniel@yesql.se> wrote:

On 3 Sep 2024, at 14:18, Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a v4 rebase over the recent OpenSSL 1.0.2 removal which made

this

patch no longer apply. I've just started to dig into it so have no

comments on

it right now, but wanted to get a cleaned up version into the CFBot.

CFBot building green for this, I just have a few small questions/comments:

+ my_bio_index |= BIO_TYPE_SOURCE_SINK;

According to the OpenSSL docs we should set BIO_TYPE_DESCRIPTOR as well as
this
BIO is socket based, but it's not entirely clear to me why. Is there a
specific reason it was removed?

Looking around at what uses it, it seems BIO_TYPE_DESCRIPTOR is how OpenSSL
decides whether the BIO is expected to respond to BIO_get_fd
(BIO_C_GET_FD). Since the custom BIO is not directly backed by an fd and
doesn't implement that control, I think we don't want to include that bit.
https://github.com/openssl/openssl/blob/openssl-3.3.2/ssl/ssl_lib.c#L1621-L1643

The other place I saw that cares about this bit is this debug callback.
That one's kinda amusing because it assumes every fd-backed BIO stores its
fd in bio->num, but bio->num is not even accessible to external BIOs. I
assume this is an oversight because no one cares about this function.
Perhaps that should be sampled from BIO_get_fd.
https://github.com/openssl/openssl/blob/openssl-3.3.2/crypto/bio/bio_cb.c#L45-L62

Practically speaking, though, I don't think it makes any difference whether
BIO_TYPE_DESCRIPTOR or even BIO_TYPE_SOURCE_SINK is set or unset. I
couldn't find any code that's sensitive to BIO_TYPE_SOURCE_SINK and
presumably Postgres is not calling SSL_get_rfd on an SSL object that it
already knows is backed by a PGconn. TBH if you just passed 0 in for the
index, it would probably work just as well.

+ bio_method = port_bio_method();
if (bio_method == NULL)
{
SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);

SSL_F_SSL_SET_FD is no longer the correct function context for this error
reporting. In OpenSSL 3.x it means nothing since SSLerr throws away the
function when calling ERR_raise_data, but we still support 1.1.0+. I
think the
correct error would be BIOerr(BIO_F_BIO_METH_NEW..) but I wonder if we
should
just remove it since BIO_meth_new and BIO_new already set errors for us to
consume? It doesn't seem to make sense to add more errors on the queue
here?
The same goes for the frontend part.

Ah yeah, +1 to removing them. I've always found external code adding to the
error queue to be a little goofy. OpenSSL's error queue is weird enough
without external additions! :-)

The attached v5 is a fresh rebase with my comments from above as 0002 to
illustrate.

LGTM

David

#17Daniel Gustafsson
daniel@yesql.se
In reply to: David Benjamin (#16)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 4 Sep 2024, at 23:19, David Benjamin <davidben@google.com> wrote:

On Wed, Sep 4, 2024 at 9:22 AM Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote:

On 3 Sep 2024, at 14:18, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote:

Attached is a v4 rebase over the recent OpenSSL 1.0.2 removal which made this
patch no longer apply. I've just started to dig into it so have no comments on
it right now, but wanted to get a cleaned up version into the CFBot.

CFBot building green for this, I just have a few small questions/comments:

+ my_bio_index |= BIO_TYPE_SOURCE_SINK;

According to the OpenSSL docs we should set BIO_TYPE_DESCRIPTOR as well as this
BIO is socket based, but it's not entirely clear to me why. Is there a
specific reason it was removed?

Looking around at what uses it, it seems BIO_TYPE_DESCRIPTOR is how OpenSSL decides whether the BIO is expected to respond to BIO_get_fd (BIO_C_GET_FD). Since the custom BIO is not directly backed by an fd and doesn't implement that control, I think we don't want to include that bit.
https://github.com/openssl/openssl/blob/openssl-3.3.2/ssl/ssl_lib.c#L1621-L1643

The other place I saw that cares about this bit is this debug callback. That one's kinda amusing because it assumes every fd-backed BIO stores its fd in bio->num, but bio->num is not even accessible to external BIOs. I assume this is an oversight because no one cares about this function. Perhaps that should be sampled from BIO_get_fd.
https://github.com/openssl/openssl/blob/openssl-3.3.2/crypto/bio/bio_cb.c#L45-L62

Practically speaking, though, I don't think it makes any difference whether BIO_TYPE_DESCRIPTOR or even BIO_TYPE_SOURCE_SINK is set or unset. I couldn't find any code that's sensitive to BIO_TYPE_SOURCE_SINK and presumably Postgres is not calling SSL_get_rfd on an SSL object that it already knows is backed by a PGconn. TBH if you just passed 0 in for the index, it would probably work just as well.

Following the bouncing ball around the code tonight I agree with that. I think
we should stick to setting BIO_TYPE_SOURCE_SINK though, if only for passing in
zero might seem incorrect enough that we get emails about that from future readers.

+ bio_method = port_bio_method();
if (bio_method == NULL)
{
SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);

SSL_F_SSL_SET_FD is no longer the correct function context for this error
reporting. In OpenSSL 3.x it means nothing since SSLerr throws away the
function when calling ERR_raise_data, but we still support 1.1.0+. I think the
correct error would be BIOerr(BIO_F_BIO_METH_NEW..) but I wonder if we should
just remove it since BIO_meth_new and BIO_new already set errors for us to
consume? It doesn't seem to make sense to add more errors on the queue here?
The same goes for the frontend part.

Ah yeah, +1 to removing them. I've always found external code adding to the error queue to be a little goofy. OpenSSL's error queue is weird enough without external additions! :-)

I wholeheartedly agree. I've previously gone on record saying that every day
with the OpenSSL API is an adventure, and the errorhandling code doubly so.

The attached v5 is a fresh rebase with my comments from above as 0002 to
illustrate.

LGTM

Thanks for reviewing, I plan on going ahead with this patch shortly.

--
Daniel Gustafsson

#18Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#17)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 5 Sep 2024, at 00:10, Daniel Gustafsson <daniel@yesql.se> wrote:

Thanks for reviewing, I plan on going ahead with this patch shortly.

That ended up not being shortly, but having spent a fair bit of time reading
the diff over and testing on multiple versions of OpenSSL and LibreSSL I've now
pushed it. The buildfarm has built green on multiple platforms tonight but
I'll keep monitoring it.

--
Daniel Gustafsson

#19David Benjamin
davidben@google.com
In reply to: Daniel Gustafsson (#18)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

Thanks! I got asked about release branches, so I thought I'd pass it along:
how do you all handle merges to release branches and would it make sense to
merge this change? On the one hand, nothing is actively on fire yet, but
the current setup does risk breakage if OpenSSL ever migrates BIO_s_socket
to their new size_t-clean internals.

On my end, I found some time to write up the compatibility riskiness to
OpenSSL upstream and they agreed with me that BIO_meth_get_* are
problematic:
https://github.com/openssl/openssl/issues/26047
https://github.com/openssl/openssl/pull/26056

I also sent them a documentation fix so the BIO_CTRL_FLUSH requirement is
clearly written down.
https://github.com/openssl/openssl/pull/26060
I didn't write down the expectations around BIO_CTRL_EOF yet because I'm
still not really sure what they are with
https://github.com/openssl/openssl/issues/8208 and all. (If we get around
to adding BIO_read_ex to BoringSSL, I'll see if we can do something better
there---have some half-baked ideas---and, if successful, I'll try to
convince OpenSSL to do the same.)

David

On Fri, Oct 11, 2024 at 5:20 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Show quoted text

On 5 Sep 2024, at 00:10, Daniel Gustafsson <daniel@yesql.se> wrote:

Thanks for reviewing, I plan on going ahead with this patch shortly.

That ended up not being shortly, but having spent a fair bit of time
reading
the diff over and testing on multiple versions of OpenSSL and LibreSSL
I've now
pushed it. The buildfarm has built green on multiple platforms tonight but
I'll keep monitoring it.

--
Daniel Gustafsson

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Benjamin (#19)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

David Benjamin <davidben@google.com> writes:

Thanks! I got asked about release branches, so I thought I'd pass it along:
how do you all handle merges to release branches and would it make sense to
merge this change? On the one hand, nothing is actively on fire yet, but
the current setup does risk breakage if OpenSSL ever migrates BIO_s_socket
to their new size_t-clean internals.

We theoretically could back-patch 6f782a2a1, as it doesn't appear to
introduce any ABI-breaking changes. (The new field in struct Port
could be an issue, but it looks like it fits into what was padding
space, so probably fine.) However, I'm not sure that it's attractive
to do so from a risk/benefit standpoint. That code's received only
minimal testing so far, and the problem it's fixing is as yet
hypothetical.

On balance I think I'd vote against a back-patch now. We could
reconsider next year once PG v18 has gotten a reasonable amount of
beta testing.

regards, tom lane

#21Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#20)
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

On 29 Nov 2024, at 19:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Benjamin <davidben@google.com> writes:

Thanks! I got asked about release branches, so I thought I'd pass it along:
how do you all handle merges to release branches and would it make sense to
merge this change? On the one hand, nothing is actively on fire yet, but
the current setup does risk breakage if OpenSSL ever migrates BIO_s_socket
to their new size_t-clean internals.

We theoretically could back-patch 6f782a2a1, as it doesn't appear to
introduce any ABI-breaking changes. (The new field in struct Port
could be an issue, but it looks like it fits into what was padding
space, so probably fine.) However, I'm not sure that it's attractive
to do so from a risk/benefit standpoint. That code's received only
minimal testing so far, and the problem it's fixing is as yet
hypothetical.

On balance I think I'd vote against a back-patch now. We could
reconsider next year once PG v18 has gotten a reasonable amount of
beta testing.

I agree with all of the above.

--
Daniel Gustafsson