SSL tests fail on OpenSSL v3.2.0

Started by Nazir Bilal Yavuzover 2 years ago34 messageshackers
Jump to latest
#1Nazir Bilal Yavuz
byavuz81@gmail.com

Hi,

SSL tests fail on OpenSSL v3.2.0. I tested both on macOS (CI) and
debian (my local) and both failed with the same errors. To trigger
these errors on CI, you may need to clear the repository cache;
otherwise macOS won't install the v3.2.0 of the OpenSSL.

001_ssltests:
psql exited with signal 6 (core dumped): 'psql: error: connection to
server at "127.0.0.1", port 56718 failed: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
SSL SYSCALL error: Connection reset by peer' while running 'psql -XAtq
-d sslkey=invalid sslcert=invalid sslrootcert=invalid sslcrl=invalid
sslcrldir=invalid user=ssltestuser dbname=trustdb hostaddr=127.0.0.1
host=common-name.pg-ssltest.test sslrootcert=invalid sslmode=require
-f - -w' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm

002_scram:
psql exited with signal 6 (core dumped): 'psql: error: connection to
server at "127.0.0.1", port 54531 failed: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
SSL SYSCALL error: Connection reset by peer' while running 'psql -XAtq
-d dbname=trustdb sslmode=require sslcert=invalid sslrootcert=invalid
hostaddr=127.0.0.1 host=localhost user=ssltestuser -f - -w' at
/Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 1997.

003_sslinfo:
psql exited with signal 6 (core dumped): 'psql: error: connection to
server at "127.0.0.1", port 59337 failed: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
SSL SYSCALL error: Connection reset by peer' while running 'psql -XAtq
-d sslkey=invalid sslcert=invalid sslrootcert=invalid sslcrl=invalid
sslcrldir=invalid sslrootcert=ssl/root+server_ca.crt sslmode=require
dbname=certdb hostaddr=127.0.0.1 host=localhost user=ssltestuser
sslcert=ssl/client_ext.crt
sslkey=/Users/admin/pgsql/build/testrun/ssl/003_sslinfo/data/tmp_test_q11O/client_ext.key
-f - -w' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
line 1997.

macOS CI run: https://cirrus-ci.com/task/5128008789393408

I couldn't find the cause yet but just wanted to inform you.

--
Regards,
Nazir Bilal Yavuz
Microsoft

#2Tristan Partin
tristan@neon.tech
In reply to: Nazir Bilal Yavuz (#1)
Re: SSL tests fail on OpenSSL v3.2.0

Nazir,

Thanks for opening a thread. Was just about to start one, here what we
came up with so far.

Homebrew users discovered a regression[0]https://github.com/Homebrew/homebrew-core/issues/155651 when using Postgres compiled
and linked against OpenSSL version 3.2.

$ psql "postgresql://$DB?sslmode=require"
psql: error: connection to server at "redacted" (redacted), port 5432 failed: ERROR: Parameter 'user' is missing in startup packet.
double free or corruption (out)
Aborted (core dumped)

Analyzing the backtrace, OpenSSL was overwriting heap-allocated data in
our PGconn struct because it thought BIO::ptr was a struct bss_sock_st
*. OpenSSL then called a memset() on a member of that struct, and we
zeroed out data in our PGconn struct.

BIO_get_data(3) says the following:

These functions are mainly useful when implementing a custom BIO.

The BIO_set_data() function associates the custom data pointed to by ptr
with the BIO a. This data can subsequently be retrieved via a call to
BIO_get_data(). This can be used by custom BIOs for storing
implementation specific information.

If you take a look at my_BIO_s_socket(), we create a partially custom
BIO, but for the most part are defaulting to the methods defined by
BIO_s_socket(). We need to set application-specific data and not BIO
private data, so that the BIO implementation we rely on, can properly
assert that its private data is what it expects.

The ssl test suite continues to pass with this patch. This patch should
be backported to every supported Postgres version most likely.

[0]: https://github.com/Homebrew/homebrew-core/issues/155651

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v1-0001-Use-BIO_-get-set-_app_data-instead-of-BIO_-get-se.patchtext/x-patch; charset=utf-8; name=v1-0001-Use-BIO_-get-set-_app_data-instead-of-BIO_-get-se.patchDownload+11-32
#3Tristan Partin
tristan@neon.tech
In reply to: Tristan Partin (#2)
Re: SSL tests fail on OpenSSL v3.2.0

Here is a v2 which adds back a comment that was not meant to be removed.

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v2-0001-Use-BIO_-get-set-_app_data-instead-of-BIO_-get-se.patchtext/x-patch; charset=utf-8; name=v2-0001-Use-BIO_-get-set-_app_data-instead-of-BIO_-get-se.patchDownload+11-23
#4Michael Paquier
michael@paquier.xyz
In reply to: Tristan Partin (#3)
Re: SSL tests fail on OpenSSL v3.2.0

On Mon, Nov 27, 2023 at 12:33:49PM -0600, Tristan Partin wrote:

-		res = secure_raw_read(((Port *) BIO_get_data(h)), buf, size);
+		res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
BIO_clear_retry_flags(h);
if (res <= 0)

Interesting. I have yet to look at that in details, but
BIO_get_app_data() exists down to 0.9.8, which is the oldest version
we need to support for stable branches. So that looks like a safe
bet.

-#ifndef HAVE_BIO_GET_DATA
-#define BIO_get_data(bio) (bio->ptr)
-#define BIO_set_data(bio, data) (bio->ptr = data)
-#endif

Shouldn't this patch do a refresh of configure.ac and remove the check
on BIO_get_data() if HAVE_BIO_GET_DATA is gone?
--
Michael

#5Tristan Partin
tristan@neon.tech
In reply to: Michael Paquier (#4)
Re: SSL tests fail on OpenSSL v3.2.0

On Mon Nov 27, 2023 at 5:53 PM CST, Michael Paquier wrote:

On Mon, Nov 27, 2023 at 12:33:49PM -0600, Tristan Partin wrote:

-#ifndef HAVE_BIO_GET_DATA
-#define BIO_get_data(bio) (bio->ptr)
-#define BIO_set_data(bio, data) (bio->ptr = data)
-#endif

Shouldn't this patch do a refresh of configure.ac and remove the check
on BIO_get_data() if HAVE_BIO_GET_DATA is gone?

See the attached v3. I am unfamiliar with autotools, so I just hand
edited the configure.ac script instead of whatever "refresh" means.

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v3-0001-Use-BIO_-get-set-_app_data-instead-of-BIO_-get-se.patchtext/x-patch; charset=utf-8; name=v3-0001-Use-BIO_-get-set-_app_data-instead-of-BIO_-get-se.patchDownload+13-25
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#4)
Re: SSL tests fail on OpenSSL v3.2.0

Michael Paquier <michael@paquier.xyz> writes:

Interesting. I have yet to look at that in details, but
BIO_get_app_data() exists down to 0.9.8, which is the oldest version
we need to support for stable branches. So that looks like a safe
bet.

What about LibreSSL? In general, I'm not too pleased with just assuming
that BIO_get_app_data exists. If we can do that, we can probably remove
most of the OpenSSL function probes that configure.ac has today. Even
if that's a good idea in HEAD, I doubt we want to do it all the way back.

I'd be inclined to form the patch more along the lines of
s/BIO_get_data/BIO_get_app_data/g, with a configure check for
BIO_get_app_data and falling back to the existing direct use of
bio->ptr if it's not there.

regards, tom lane

#7Bo Anderson
mail@boanderson.me
In reply to: Tom Lane (#6)
Re: SSL tests fail on OpenSSL v3.2.0

It was first added in SSLeay 0.8.1 which predates OpenSSL let alone the LibreSSL fork.

It probably doesn’t exist in BoringSSL but neither does a lot of things.

Show quoted text

On 28 Nov 2023, at 00:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael@paquier.xyz> writes:

Interesting. I have yet to look at that in details, but
BIO_get_app_data() exists down to 0.9.8, which is the oldest version
we need to support for stable branches. So that looks like a safe
bet.

What about LibreSSL? In general, I'm not too pleased with just assuming
that BIO_get_app_data exists. If we can do that, we can probably remove
most of the OpenSSL function probes that configure.ac has today. Even
if that's a good idea in HEAD, I doubt we want to do it all the way back.

I'd be inclined to form the patch more along the lines of
s/BIO_get_data/BIO_get_app_data/g, with a configure check for
BIO_get_app_data and falling back to the existing direct use of
bio->ptr if it's not there.

regards, tom lane

#8Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#6)
Re: SSL tests fail on OpenSSL v3.2.0

On Mon Nov 27, 2023 at 6:21 PM CST, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

Interesting. I have yet to look at that in details, but
BIO_get_app_data() exists down to 0.9.8, which is the oldest version
we need to support for stable branches. So that looks like a safe
bet.

What about LibreSSL? In general, I'm not too pleased with just assuming
that BIO_get_app_data exists. If we can do that, we can probably remove
most of the OpenSSL function probes that configure.ac has today. Even
if that's a good idea in HEAD, I doubt we want to do it all the way back.

As Bo said, this has been available since before LibreSSL forked off of
OpenSSL.

I'd be inclined to form the patch more along the lines of
s/BIO_get_data/BIO_get_app_data/g, with a configure check for
BIO_get_app_data and falling back to the existing direct use of
bio->ptr if it's not there.

Falling back to what existed before is invalid. BIO::ptr is private data
for the BIO implementation. BIO_{get,set}_app_data() does
something completely different than setting BIO::ptr. In Postgres we
call BIO_meth_set_create() with BIO_meth_get_create() from
BIO_s_socket(). The create function we pass allocates bi->ptr to
a struct bss_sock_st * as previously stated, and that's been the case
since March 10, 2022[0]https://github.com/openssl/openssl/commit/a3e53d56831adb60d6875297b3339a4251f735d2. Essentially Postgres only worked because the
BIO implementation didn't use the private data section until the linked
commit. I don't see any reason to keep compatibility with what only
worked by accident.

[0]: https://github.com/openssl/openssl/commit/a3e53d56831adb60d6875297b3339a4251f735d2

--
Tristan Partin
Neon (https://neon.tech)

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tristan Partin (#8)
Re: SSL tests fail on OpenSSL v3.2.0

"Tristan Partin" <tristan@neon.tech> writes:

On Mon Nov 27, 2023 at 6:21 PM CST, Tom Lane wrote:

What about LibreSSL? In general, I'm not too pleased with just assuming
that BIO_get_app_data exists.

Falling back to what existed before is invalid.

Well, sure it only worked by accident, but it did work with older
OpenSSL versions. If we assume that BIO_get_app_data exists, and
somebody tries to use it with a version that hasn't got that,
it won't work.

Having said that, my concern was mainly driven by the comments in
configure.ac claiming that this was an OpenSSL 1.1.0 addition.
Looking at the relevant commits, 593d4e47d and 5c6df67e0, it seems
that that was less about "the function doesn't exist before 1.1.0"
and more about "in 1.1.0 we have to use the function because we
can no longer directly access the ptr field". If the function
does exist in 0.9.8 then I concur that we don't need to test.

regards, tom lane

#10Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#9)
Re: SSL tests fail on OpenSSL v3.2.0

On Mon Nov 27, 2023 at 7:14 PM CST, Tom Lane wrote:

"Tristan Partin" <tristan@neon.tech> writes:

On Mon Nov 27, 2023 at 6:21 PM CST, Tom Lane wrote:

What about LibreSSL? In general, I'm not too pleased with just assuming
that BIO_get_app_data exists.

Falling back to what existed before is invalid.

Well, sure it only worked by accident, but it did work with older
OpenSSL versions. If we assume that BIO_get_app_data exists, and
somebody tries to use it with a version that hasn't got that,
it won't work.

Having said that, my concern was mainly driven by the comments in
configure.ac claiming that this was an OpenSSL 1.1.0 addition.
Looking at the relevant commits, 593d4e47d and 5c6df67e0, it seems
that that was less about "the function doesn't exist before 1.1.0"
and more about "in 1.1.0 we have to use the function because we
can no longer directly access the ptr field". If the function
does exist in 0.9.8 then I concur that we don't need to test.

I have gone back all the way to 1.0.0 and confirmed that the function
exists. Didn't choose to go further than that since Postgres doesn't
support it.

--
Tristan Partin
Neon (https://neon.tech)

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tristan Partin (#10)
Re: SSL tests fail on OpenSSL v3.2.0

"Tristan Partin" <tristan@neon.tech> writes:

On Mon Nov 27, 2023 at 7:14 PM CST, Tom Lane wrote:

... If the function
does exist in 0.9.8 then I concur that we don't need to test.

I have gone back all the way to 1.0.0 and confirmed that the function
exists. Didn't choose to go further than that since Postgres doesn't
support it.

Since this is something we'd need to back-patch, OpenSSL 0.9.8
and later are relevant: the v12 branch still supports those.
It's moot given Bo's claim about the origin of the function,
though.

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#9)
Re: SSL tests fail on OpenSSL v3.2.0

I can confirm that we also fail when using up-to-date MacPorts, which
seems to have started shipping 3.2.0 last week or so. I tried the v3
patch, and while that stops the crash, it looks like 3.2.0 has also
made some random changes in error messages:

# +++ tap check in src/test/ssl +++
t/001_ssltests.pl .. 163/?
# Failed test 'certificate authorization fails with revoked client cert: matches'
# at t/001_ssltests.pl line 775.
# 'psql: error: connection to server at "127.0.0.1", port 58332 failed: SSL error: ssl/tls alert certificate revoked'
# doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
# Failed test 'certificate authorization fails with revoked client cert with server-side CRL directory: matches'
# at t/001_ssltests.pl line 880.
# 'psql: error: connection to server at "127.0.0.1", port 58332 failed: SSL error: ssl/tls alert certificate revoked'
# doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
# Failed test 'certificate authorization fails with revoked UTF-8 client cert with server-side CRL directory: matches'
# at t/001_ssltests.pl line 893.
# 'psql: error: connection to server at "127.0.0.1", port 58332 failed: SSL error: ssl/tls alert certificate revoked'
# doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
# Looks like you failed 3 tests of 205.
t/001_ssltests.pl .. Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/205 subtests
t/002_scram.pl ..... ok
t/003_sslinfo.pl ... ok

Guess we'll need to adjust the test script a bit too.

regards, tom lane

#13Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#11)
Re: SSL tests fail on OpenSSL v3.2.0

On Mon, Nov 27, 2023 at 08:32:28PM -0500, Tom Lane wrote:

Since this is something we'd need to back-patch, OpenSSL 0.9.8
and later are relevant: the v12 branch still supports those.
It's moot given Bo's claim about the origin of the function,
though.

Yep, unfortunately this needs to be checked down to 0.9.8. I've just
done this exercise yesterday for another backpatch..
--
Michael

#14Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#12)
Re: SSL tests fail on OpenSSL v3.2.0

On Mon, Nov 27, 2023 at 09:04:23PM -0500, Tom Lane wrote:

I can confirm that we also fail when using up-to-date MacPorts, which
seems to have started shipping 3.2.0 last week or so. I tried the v3
patch, and while that stops the crash, it looks like 3.2.0 has also
made some random changes in error messages:

Failed 3/205 subtests
t/002_scram.pl ..... ok
t/003_sslinfo.pl ... ok

Guess we'll need to adjust the test script a bit too.

Sigh. We could use an extra check_pg_config() with a routine new in
3.2.0. Looking at CHANGES.md, SSL_get0_group_name() seems to be one
generic choice here.
--
Michael

#15Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#14)
Re: SSL tests fail on OpenSSL v3.2.0

On Tue, Nov 28, 2023 at 12:55:37PM +0900, Michael Paquier wrote:

Sigh. We could use an extra check_pg_config() with a routine new in
3.2.0. Looking at CHANGES.md, SSL_get0_group_name() seems to be one
generic choice here.

Or even simpler: plant a (ssl\/tls|sslv3) in these strings.
--
Michael

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#15)
Re: SSL tests fail on OpenSSL v3.2.0

Michael Paquier <michael@paquier.xyz> writes:

Or even simpler: plant a (ssl\/tls|sslv3) in these strings.

Yeah, weakening the pattern match was what I had in mind.
I was thinking of something like "ssl[a-z0-9/]*" but your
proposal works too.

regards, tom lane

#17Daniel Gustafsson
daniel@yesql.se
In reply to: Bo Anderson (#7)
Re: SSL tests fail on OpenSSL v3.2.0

On 28 Nov 2023, at 01:29, Bo Anderson <mail@boanderson.me> wrote:

It probably doesn’t exist in BoringSSL but neither does a lot of things.

Thats not an issue, we don't support building with BoringSSL.

--
Daniel Gustafsson

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#17)
Re: SSL tests fail on OpenSSL v3.2.0

Daniel Gustafsson <daniel@yesql.se> writes:

Thats not an issue, we don't support building with BoringSSL.

Right. I'll work on getting this pushed, unless someone else
is already on it?

regards, tom lane

#19Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#18)
Re: SSL tests fail on OpenSSL v3.2.0

On Tue Nov 28, 2023 at 9:00 AM CST, Tom Lane wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

Thats not an issue, we don't support building with BoringSSL.

Right. I'll work on getting this pushed, unless someone else
is already on it?

When you say "this" are you referring to the patch I sent or adding
support for BoringSSL?

--
Tristan Partin
Neon (https://neon.tech)

#20Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#16)
Re: SSL tests fail on OpenSSL v3.2.0

How are you guys running the tests? I have PG_TEST_EXTRA=ssl and
everything passes for me. Granted, I am using the Meson build.

--
Tristan Partin
Neon (https://neon.tech)

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tristan Partin (#20)
#22Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tristan Partin (#19)
#24Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tristan Partin (#24)
#26Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tristan Partin (#26)
#28Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#23)
#29Daniel Gustafsson
daniel@yesql.se
In reply to: Tristan Partin (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#29)
#31Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#30)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#30)
#33Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Alvaro Herrera (#32)
#34Tristan Partin
tristan@neon.tech
In reply to: Jelte Fennema-Nio (#33)