disabled SSL log_like tests
Back in 2022 in commit 55828a6b6084 we disabled a bunch of tests due to
a timing issue. Nothing seems to have been done since ... do we still
want to keep these commented out lines around? This "temporary" fix
seems to have stretched the definition of that term more than somewhat.
(noticed when looking into a different issue).
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
Back in 2022 in commit 55828a6b6084 we disabled a bunch of tests due to
a timing issue. Nothing seems to have been done since ... do we still
want to keep these commented out lines around? This "temporary" fix
seems to have stretched the definition of that term more than somewhat.
Per my coffee cup, "Nothing is as permanent as a temporary fix".
However, I wonder whether Andres' work at 8b886a4e3 could be used
to improve this, either directly or as inspiration?
regards, tom lane
On 2025-04-17 Th 10:56 AM, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Back in 2022 in commit 55828a6b6084 we disabled a bunch of tests due to
a timing issue. Nothing seems to have been done since ... do we still
want to keep these commented out lines around? This "temporary" fix
seems to have stretched the definition of that term more than somewhat.
Per my coffee cup, "Nothing is as permanent as a temporary fix".
However, I wonder whether Andres' work at 8b886a4e3 could be used
to improve this, either directly or as inspiration?
I don't think so - these tests are about checking log file contents, not
a psql problem.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
On 2025-04-17 Th 10:56 AM, Tom Lane wrote:
However, I wonder whether Andres' work at 8b886a4e3 could be used
to improve this, either directly or as inspiration?
I don't think so - these tests are about checking log file contents, not
a psql problem.
Well, I was mainly leaning on the "inspiration" part: the idea
is to wait for something that must come out after the text we
want to look for. After digging around a bit I noticed that
002_connection_limits.pl's connect_fails_wait is doing something
that's almost what we want: that test cranks log_min_messages
up to DEBUG2 and then waits for the postmaster's report of
backend exit before believing that it's done.
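The wait-for-exit idea can be sketched in a few lines. This is a minimal Python model of what connect_fails_wait does on the Perl side, not the real Cluster.pm API; the function name and parameters are illustrative. The key point is that with log_min_messages at DEBUG2 the postmaster logs backend exit, so seeing that line guarantees any earlier output from the backend has already been flushed to the log.

```python
import re
import time

def wait_for_log(path, pattern, start_offset=0, timeout=10.0, poll=0.1):
    """Poll a server log until `pattern` appears past `start_offset`.

    Hypothetical sketch of the connect_fails_wait idea: rather than
    checking the log immediately (and racing the backend), wait for a
    line that must come out *after* the text we want to look for.
    """
    deadline = time.monotonic() + timeout
    compiled = re.compile(pattern)
    while time.monotonic() < deadline:
        with open(path) as f:
            f.seek(start_offset)          # only look at log written since the test began
            if compiled.search(f.read()):
                return True
        time.sleep(poll)
    return False                           # timed out; caller should fail the test
```

A caller would record the log offset before initiating the failing connection, then wait for the postmaster's "client backend ... exited" report before running its log_like checks.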
Awhile later I had the attached patch. Some notes:
* The commented-out tests in 001_ssltests.pl contained hard-wired
expectations as to certificate serial numbers, which are obsolete now.
I just replaced them with "\d+", but if anyone feels like that's not
good enough, we could continue to check for exact serial numbers and
eat the ensuing maintenance effort.
* I was more than slightly surprised to find that there are a bunch of
other connect_fails callers that are testing log_like or log_unlike
and thereby risking the same type of race condition. Some of those
tests are relatively new and perhaps just haven't failed *yet*,
but I wonder if we changed something since 2022 that solves this
problem in a different way? Anyway, after this change any such
caller must set log_min_messages = debug2 or fail. I think I got
all the relevant test scripts in the attached.
regards, tom lane
Attachments:
v1-fix-connect_fails-races.patch (text/x-diff; +51 -41)
On Fri, Apr 18, 2025 at 12:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
* The commented-out tests in 001_ssltests.pl contained hard-wired
expectations as to certificate serial numbers, which are obsolete now.
I just replaced them with "\d+", but if anyone feels like that's not
good enough, we could continue to check for exact serial numbers and
eat the ensuing maintenance effort.
No, I think \d+ should be fine.
(I haven't stared closely at the racing code yet, but I like the
concept of the patch.)
Thanks!
--Jacob
I wrote:
* I was more than slightly surprised to find that there are a bunch of
other connect_fails callers that are testing log_like or log_unlike
and thereby risking the same type of race condition. Some of those
tests are relatively new and perhaps just haven't failed *yet*,
but I wonder if we changed something since 2022 that solves this
problem in a different way?
Nope, apparently not. The failure depends on the kernel's scheduler,
so unsurprisingly it's quite platform-dependent. But I can reproduce
it reliably on mamba's host (NetBSD 10/macppc) if I enable the
001_ssltests.pl log_like tests without applying the connect_fails
changes.
The fact that mamba hasn't been failing on the other affected
tests is a bit puzzling. mamba isn't running kerberos or ldap
or oauth_validator, so the lack of failures there is expected.
authentication/t/001_password.pl appears accidentally not vulnerable:
it's not using log_like in any connect_fails tests. (It is using
log_unlike, so those tests could be giving silent false negatives.)
But authentication/t/003_peer.pl has 8 test cases that look
vulnerable. I guess there must be some extra dollop of timing
weirdness in the ssl tests that's not there in 003_peer.pl.
Unfortunately ... it sometimes fails even with the connect_fails
changes, for example
# Failed test 'intermediate client certificate is missing: log matches'
# at /home/tgl/pgsql/src/test/ssl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 2666.
# '2025-04-18 17:59:19.358 EDT [1460] DEBUG: assigned pm child slot 2 for backend
# 2025-04-18 17:59:19.359 EDT [1460] DEBUG: forked new client backend, pid=599 socket=8
# 2025-04-18 17:59:19.369 EDT [599] [unknown] LOG: connection received: host=localhost port=63709
# 2025-04-18 17:59:19.436 EDT [1460] DEBUG: releasing pm child slot 1
# 2025-04-18 17:59:19.436 EDT [1460] DEBUG: client backend (PID 25401) exited with exit code 0
# '
# doesn't match '(?^:Client certificate verification failed at depth 0: unable to get local issuer certificate)'
What I think happened here is that a previous backend hadn't exited
yet when we start the test, and when its report does come out,
connect_fails prematurely stops waiting. (In the above, evidently
the child process we want to wait for is 599, but we're fooled by
a delayed report for 25401.) So my v1 patch needs work.
Maybe we can make the test compare the PIDs in the "forked new client
backend" and "client backend exited" log messages. Stay tuned...
regards, tom lane
I wrote:
What I think happened here is that a previous backend hadn't exited
yet when we start the test, and when its report does come out,
connect_fails prematurely stops waiting. (In the above, evidently
the child process we want to wait for is 599, but we're fooled by
a delayed report for 25401.) So my v1 patch needs work.
Maybe we can make the test compare the PIDs in the "forked new client
backend" and "client backend exited" log messages. Stay tuned...
Okay, this version seems more reliable.
regards, tom lane
Attachments:
v2-fix-connect_fails-races.patch (text/x-diff; +53 -42)
On 2025-04-18 Fr 7:26 PM, Tom Lane wrote:
I wrote:
What I think happened here is that a previous backend hadn't exited
yet when we start the test, and when its report does come out,
connect_fails prematurely stops waiting. (In the above, evidently
the child process we want to wait for is 599, but we're fooled by
a delayed report for 25401.) So my v1 patch needs work.
Maybe we can make the test compare the PIDs in the "forked new client
backend" and "client backend exited" log messages. Stay tuned...Okay, this version seems more reliable.
+See C<log_check(...)>. CAUTION: use of either option requires that
+the server's log_min_messages be at least DEBUG2, and that no other
+client backend is launched concurrently. These requirements allow
+C<connect_fails> to wait to see the postmaster-log report of backend
+exit, without which there is a race condition as to whether we will
+see the expected backend log output.
That seems a little fragile. I can imagine test authors easily
forgetting this. Is it worth sanity checking to make sure
log_min_messages is appropriately set?
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
On 2025-04-18 Fr 7:26 PM, Tom Lane wrote:
+See C<log_check(...)>. CAUTION: use of either option requires that
+the server's log_min_messages be at least DEBUG2, and that no other
+client backend is launched concurrently. These requirements allow
+C<connect_fails> to wait to see the postmaster-log report of backend
+exit, without which there is a race condition as to whether we will
+see the expected backend log output.
That seems a little fragile. I can imagine test authors easily
forgetting this. Is it worth sanity checking to make sure
log_min_messages is appropriately set?
Setting log_min_messages is not so easily forgotten, because
connect_fails will just hang until timeout if you didn't.
I'm more worried about the "no other backend" requirement.
I think v2 is reasonably proof against that, but whether it's
sufficiently bulletproof to withstand the buildfarm environment
remains to be seen. I wish there were a better way to
determine the backend PID for a failed connection...
regards, tom lane
I wrote:
The fact that mamba hasn't been failing on the other affected
tests is a bit puzzling. mamba isn't running kerberos or ldap
or oauth_validator, so the lack of failures there is expected.
authentication/t/001_password.pl appears accidentally not vulnerable:
it's not using log_like in any connect_fails tests. (It is using
log_unlike, so those tests could be giving silent false negatives.)
But authentication/t/003_peer.pl has 8 test cases that look
vulnerable. I guess there must be some extra dollop of timing
weirdness in the ssl tests that's not there in 003_peer.pl.
After pushing this patch, the probable explanation for the "extra
timing weirdness" finally penetrated my brain: the error reports
we are looking for are cases that are detected by OpenSSL. So
it's plausible that OpenSSL has told the connecting client to
go away before it returns the error indication to the backend's
calling code, which would be what would log the message. That is
what provides the window for a race condition. You still need
a bunch of assumptions about the kernel's subsequent scheduling
of psql and the TAP script versus the backend, but the whole
thing is certainly dependent on psql getting the word sooner
than the backend.
(Not all of the tests disabled by 55828a6b6 meet this description,
but a bunch of them do, and I believe that I just zapped every
log_like condition in the script rather than trying to be selective
about which ones were known to have failed.)
It seems at least plausible that the Kerberos tests could have
a similar problem. I'm less sure about LDAP or OAuth. But
authentication/t/003_peer.pl would not, because elog.c emits errors
to the log before sending them to the client. So that explains
the lack of buildfarm reports from mamba, and it's unclear that
we have any other animals with the right timing behavior to
see this.
regards, tom lane
If you run the not-yet-enabled-by-default OpenBSD CI task on master,
ssl/001_ssltests fails in "intermediate client certificate is
untrusted", recently uncommented by commit e0f373ee. I think it might
be telling us that LibreSSL's x509_store_ctx_get_current_cert() is
giving us the client certificate (ie chain depth 0) instead of the
intermediate certificate, even though X509_STORE_CTX_get_error_depth()
returned 1 as expected. I don't know why it would do that, given the
documentation:
X509_STORE_CTX_get_current_cert() returns the certificate in ctx which
caused the error or NULL if no certificate is relevant.
The explanation is probably in here somewhere, but I don't understand
these things:
https://github.com/openbsd/src/blob/master/lib/libcrypto/x509/x509_vfy.c
https://github.com/openssl/openssl/blob/master/crypto/x509/x509_vfy.c
[17:55:28.888] # Failed test 'intermediate client certificate is untrusted: log matches'
[17:55:28.888] # at /home/postgres/postgres/src/test/perl/PostgreSQL/Test/Cluster.pm line 2667.
[17:55:28.888] # '2025-05-05 17:55:28.353 UTC [10009][postmaster] DEBUG: assigned pm child slot 1 for backend
[17:55:28.888] # 2025-05-05 17:55:28.354 UTC [10009][postmaster] DEBUG: forked new client backend, pid=27624 socket=8
[17:55:28.888] # 2025-05-05 17:55:28.355 UTC [27624][not initialized] [[unknown]][:0] LOG: connection received: host=localhost port=11357
[17:55:28.888] # 2025-05-05 17:55:28.374 UTC [27624][not initialized] [[unknown]][:0] LOG: could not accept SSL connection: certificate verify failed
[17:55:28.888] # 2025-05-05 17:55:28.374 UTC [27624][not initialized] [[unknown]][:0] DETAIL: Client certificate verification failed at depth 1: unable to get local issuer certificate.
[17:55:28.888] # Failed certificate data (unverified): subject "/CN=ssltestuser", serial number 2315702411956921344, issuer "/CN=Test CA for PostgreSQL SSL regression test client certs".
[17:55:28.888] # 2025-05-05 17:55:28.374 UTC [27624][not initialized] [[unknown]][:0] DEBUG: SSL connection from DN:"(anonymous)" CN:"(anonymous)"
[17:55:28.888] # 2025-05-05 17:55:28.377 UTC [10009][postmaster] DEBUG: releasing pm child slot 1
[17:55:28.888] # 2025-05-05 17:55:28.377 UTC [10009][postmaster] DEBUG: client backend (PID 27624) exited with exit code 0
[17:55:28.888] # '
[17:55:28.888] # doesn't match '(?^:Failed certificate data \(unverified\): subject "/CN=Test CA for PostgreSQL SSL regression test client certs", serial number \d+, issuer "/CN=Test root CA for PostgreSQL SSL regression test suite")'
[17:55:28.888] # Looks like you failed 1 test of 240.
https://cirrus-ci.com/task/4708964002168832?logs=test_world#L345
https://api.cirrus-ci.com/v1/artifact/task/4708964002168832/testrun/build/testrun/ssl/001_ssltests/log/regress_log_001_ssltests
https://api.cirrus-ci.com/v1/artifact/task/4708964002168832/testrun/build/testrun/ssl/001_ssltests/log/001_ssltests_primary.log
Thomas Munro <thomas.munro@gmail.com> writes:
If you run the not-yet-enabled-by-default OpenBSD CI task on master,
ssl/001_ssltests fails in "intermediate client certificate is
untrusted", recently uncommented by commit e0f373ee.
Yeah, I see that too. But I also see three failures in 002_scram.pl,
which presumably were there before e0f373ee. (Tested on OpenBSD 7.6
and 7.7.) The buildfarm's OpenBSD animals haven't caught this
because they don't run this test suite :-(. Yes they build with
--with-openssl, but one of them lacks --enable-tap-tests and the
other two aren't filling PG_TEST_EXTRA.
The SCRAM failures are a bit discouraging ...
[18:16:33.259](0.565s) not ok 26 - SCRAM with SSL and channel_binding=require, server certificate uses 'rsassaPss'
[18:16:33.261](0.002s)
[18:16:33.261](0.000s) # Failed test 'SCRAM with SSL and channel_binding=require, server certificate uses 'rsassaPss''
# at t/002_scram.pl line 161.
[18:16:33.262](0.001s) # got: '2'
# expected: '0'
[18:16:33.264](0.002s) not ok 27 - SCRAM with SSL and channel_binding=require, server certificate uses 'rsassaPss': no stderr
[18:16:33.265](0.001s)
[18:16:33.265](0.000s) # Failed test 'SCRAM with SSL and channel_binding=require, server certificate uses 'rsassaPss': no stderr'
# at t/002_scram.pl line 161.
[18:16:33.266](0.001s) # got: 'psql: error: connection to server at "127.0.0.1", port 10442 failed: SSL error: sslv3 alert handshake failure'
# expected: ''
[18:16:33.268](0.002s) not ok 28 - SCRAM with SSL and channel_binding=require, server certificate uses 'rsassaPss': log matches
[18:16:33.269](0.001s)
[18:16:33.269](0.000s) # Failed test 'SCRAM with SSL and channel_binding=require, server certificate uses 'rsassaPss': log matches'
# at /home/tgl/pgsql/src/test/ssl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 2607.
[18:16:33.270](0.001s) # '2025-05-05 18:16:33.222 EDT [71478] [unknown] LOG: connection received: host=localhost port=42632
# 2025-05-05 18:16:33.244 EDT [71478] [unknown] LOG: could not accept SSL connection: missing rsa certificate
# '
# doesn't match '(?^:connection authenticated: identity="ssltestuser" method=scram-sha-256)'
regards, tom lane
I wrote:
Yeah, I see that too. But I also see three failures in 002_scram.pl,
which presumably were there before e0f373ee. (Tested on OpenBSD 7.6
and 7.7.) The buildfarm's OpenBSD animals haven't caught this
because they don't run this test suite :-(.
I dug into this with gdb, and it seems like it's an omission in
LibreSSL's RSA-PSS support. We're requesting a signature algorithm
with code 2054, which in their source is SIGALG_RSA_PSS_RSAE_SHA512,
and that leads them to this sigalgs[] table entry in
lib/libssl/ssl_sigalgs.c:
{
.value = SIGALG_RSA_PSS_RSAE_SHA512,
.key_type = EVP_PKEY_RSA,
.md = EVP_sha512,
.security_level = 5,
.flags = SIGALG_FLAG_RSA_PSS,
},
The problem is that the private key has key_type EVP_PKEY_RSA_PSS
which is not equal to EVP_PKEY_RSA, so ssl_sigalg_pkey_ok() in the
same file rejects this entry as not compatible. In fact, there
are *no* entries in sigalgs[] that can match a PSS private key
according to ssl_sigalg_pkey_ok(). So while perhaps
SIGALG_RSA_PSS_RSAE_SHA512 is the wrong thing for us to request,
I do not see a right thing.
Looking around a little more, there are places like
SSL_get_signature_type_nid() that do things like this:
*nid = sigalg->key_type;
if (sigalg->key_type == EVP_PKEY_RSA &&
(sigalg->flags & SIGALG_FLAG_RSA_PSS))
*nid = EVP_PKEY_RSA_PSS;
So it seems like this might be a simple oversight in
ssl_sigalg_pkey_ok(), which doesn't make any such correction:
if (sigalg->key_type != EVP_PKEY_id(pkey))
return 0;
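The suspected oversight and the shape of a fix can be modeled compactly. The following Python is only an illustration of the logic, not LibreSSL code; the NID values and the flag bit are stand-ins for constants defined in the OpenSSL/LibreSSL headers. It contrasts the direct comparison ssl_sigalg_pkey_ok() does today with the normalization that SSL_get_signature_type_nid() applies.

```python
# Illustrative constants; the real values live in the OpenSSL/LibreSSL headers.
EVP_PKEY_RSA = 6
EVP_PKEY_RSA_PSS = 912
SIGALG_FLAG_RSA_PSS = 0x1  # stand-in for LibreSSL's PSS flag bit

def pkey_ok_current(sigalg_key_type, sigalg_flags, pkey_id):
    # What ssl_sigalg_pkey_ok() effectively does today: a direct compare,
    # so an EVP_PKEY_RSA_PSS key never matches the EVP_PKEY_RSA entries.
    return sigalg_key_type == pkey_id

def pkey_ok_corrected(sigalg_key_type, sigalg_flags, pkey_id):
    # A possible correction, modeled on SSL_get_signature_type_nid():
    # an RSA entry carrying SIGALG_FLAG_RSA_PSS should also be considered
    # compatible with a PSS-typed key.
    if (sigalg_key_type == EVP_PKEY_RSA
            and (sigalg_flags & SIGALG_FLAG_RSA_PSS)
            and pkey_id == EVP_PKEY_RSA_PSS):
        return True
    return sigalg_key_type == pkey_id
```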
Anyone know anything about where to submit LibreSSL bugs?
regards, tom lane
On Wed, May 7, 2025 at 1:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
So it seems like this might be a simple oversight in
ssl_sigalg_pkey_ok(), which doesn't make any such correction:
if (sigalg->key_type != EVP_PKEY_id(pkey))
return 0;
Nice detective work.
Anyone know anything about where to submit LibreSSL bugs?
I think it's done with sendbug on an OpenBSD box, or perhaps you can
just write a normal email to the bugs@openbsd.org or
libressl@openbsd.org list, based on:
https://www.openbsd.org/mail.html
Thomas Munro <thomas.munro@gmail.com> writes:
On Wed, May 7, 2025 at 1:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anyone know anything about where to submit LibreSSL bugs?
I think it's done with sendbug on an OpenBSD box, or perhaps you can
just write a normal email to the bugs@openbsd.org or
libressl@openbsd.org list, based on:
https://www.openbsd.org/mail.html
Thanks, I'll look into reporting it tomorrow. In the meantime,
I couldn't help noticing that the backtraces went through
lib/libssl/tls13_legacy.c, which doesn't give a warm feeling
about how supported they think our usage is (and perhaps also
explains why they didn't detect this bug themselves). This is
evidently because we set up the SSL context with SSLv23_method(),
per this comment in be_tls_init():
* We use SSLv23_method() because it can negotiate use of the highest
* mutually supported protocol version, while alternatives like
* TLSv1_2_method() permit only one specific version. Note that we don't
* actually allow SSL v2 or v3, only TLS protocols (see below).
This choice seems to be more than 20 years old, though the above
comment defending it dates only to 2014. I wonder if it's time to
revisit that idea.
regards, tom lane
On Wed, May 7, 2025 at 4:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thanks, I'll look into reporting it tomorrow. In the meantime,
I couldn't help noticing that the backtraces went through
lib/libssl/tls13_legacy.c, which doesn't give a warm feeling
about how supported they think our usage is (and perhaps also
explains why they didn't detect this bug themselves). This is
evidently because we set up the SSL context with SSLv23_method(),
per this comment in be_tls_init():
Oh, interesting. I also wondered if the problem I reported might be
related to the separate legacy code paths in x509_vfy.c.
On 7 May 2025, at 06:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
On Wed, May 7, 2025 at 1:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anyone know anything about where to submit LibreSSL bugs?
I think it's done with sendbug on an OpenBSD box, or perhaps you can
just write a normal email to the bugs@openbsd.org or
libressl@openbsd.org list, based on:
https://www.openbsd.org/mail.html
Bugs are frequently reported, and dealt with, on the libressl mailing list.
Thanks, I'll look into reporting it tomorrow. In the meantime,
I couldn't help noticing that the backtraces went through
lib/libssl/tls13_legacy.c, which doesn't give a warm feeling
about how supported they think our usage is (and perhaps also
explains why they didn't detect this bug themselves). This is
evidently because we set up the SSL context with SSLv23_method(),
per this comment in be_tls_init():
* We use SSLv23_method() because it can negotiate use of the highest
* mutually supported protocol version, while alternatives like
* TLSv1_2_method() permit only one specific version. Note that we don't
* actually allow SSL v2 or v3, only TLS protocols (see below).
This choice seems to be more than 20 years old, though the above
comment defending it dates only to 2014. I wonder if it's time to
revisit that idea.
The use of SSLv23_method() comes from us supporting 1.0.2, the replacement
TLS_method() was introduced in 1.1.0 and SSLv23_method() was turned into an
alias of TLS_method() in OpenSSL commit 32ec41539b5.
Since we no longer support 1.0.2 we can apply something like the (lightly
tested) attached which should be a no-op as we already use TLS_method() but via
an alias.
There's likely more OpenSSL code we can modernize, hopefully we can make some
progress on that during the v19 cycle.
--
Daniel Gustafsson
Attachments:
tls_method.diff (application/octet-stream; +4 -5)
Daniel Gustafsson <daniel@yesql.se> writes:
On 7 May 2025, at 06:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I couldn't help noticing that the backtraces went through
lib/libssl/tls13_legacy.c, which doesn't give a warm feeling
about how supported they think our usage is (and perhaps also
explains why they didn't detect this bug themselves).
Since we no longer support 1.0.2 we can apply something like the (lightly
tested) attached which should be a no-op as we already use TLS_method() but via
an alias.
Yeah, I saw that SSLv23_method() was merely an alias for TLS_method()
in LibreSSL as well. That means unfortunately that your proposal is
just cosmetic and doesn't get us out of using code that they're
calling "legacy". I wonder what it would take to get to the "modern"
code paths.
regards, tom lane
On 7 May 2025, at 18:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
On 7 May 2025, at 06:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I couldn't help noticing that the backtraces went through
lib/libssl/tls13_legacy.c, which doesn't give a warm feeling
about how supported they think our usage is (and perhaps also
explains why they didn't detect this bug themselves).
Since we no longer support 1.0.2 we can apply something like the (lightly
tested) attached which should be a no-op as we already use TLS_method() but via
an alias.
Yeah, I saw that SSLv23_method() was merely an alias for TLS_method()
in LibreSSL as well. That means unfortunately that your proposal is
just cosmetic and doesn't get us out of using code that they're
calling "legacy". I wonder what it would take to get to the "modern"
code paths.
AFAICT (it's not documented as far as I can see) the Libressl folks consider code
inherited from OpenSSL legacy. Using current OpenSSL API's and moving away
from deprecated API's is probably our best bet. On that note, TLS_method is
the current API to use in both OpenSSL and Libressl according to their
respective documentations.
A separate be-secure-libressl.c could move to use their libtls instead of the
libssl OpenSSL compatibility library, which may be an interesting exercise but
a very different project from what is discussed here.
--
Daniel Gustafsson
I filed a report with the LibreSSL folks, and the answer was basically
"yeah, we know RSA-PSS doesn't work; there are multiple things that
need fixing" [1]https://marc.info/?l=libressl&m=174664225002441&w=2. So it seems like we'd better find a way to skip
that test case when using LibreSSL. Not sure what the most
appropriate check would look like.
regards, tom lane
[1]: https://marc.info/?l=libressl&m=174664225002441&w=2