OpenSSL randomness seeding

Started by Daniel Gustafssonover 5 years ago14 messages
#1Daniel Gustafsson
daniel@yesql.se
2 attachment(s)

After forking we call RAND_cleanup in fork_process.c to force a re-seed to
ensure that two backends cannot share sequence. OpenSSL 1.1.0 deprecated
RAND_cleanup, and contrary to how they usually leave deprecated APIs working
until removed, they decided to silently make this call a noop like below:

# define RAND_cleanup() while(0) continue

This leaves our defence against pool sharing seemingly useless, and also
against the recommendations of OpenSSL for versions > 1.1.0 and < 1.1.1 where
the RNG was rewritten:

https://wiki.openssl.org/index.php/Random_fork-safety

The silver lining here is that while OpenSSL nooped RAND_cleanup, they also
changed what is mixed into seeding so we are still not sharing a sequence. To
fix this, changing the RAND_cleanup call to RAND_poll should be enough to
ensure re-seeding after forking across all supported OpenSSL versions. Patch
0001 implements this along with a comment referencing when it can be removed
(which most likely won't be for quite some time).

Another thing that stood out when reviewing this code is that we optimize for
RAND_poll failing in pg_strong_random, when we already have RAND_status
checking for a sufficiently seeded RNG for us. ISTM that we can simplify the
code by letting RAND_status do the work as per 0002, and also (while unlikely)
survive any transient failures in RAND_poll by allowing all the retries we've
defined for the loop.

Also, as a disclaimer, this was brought up with the PostgreSQL security team
first whom have given permission to discuss this in public.

Thoughts on these?

cheers ./daniel

Attachments:

0002-Remove-optimization-for-RAND_poll-failing.patchapplication/octet-stream; name=0002-Remove-optimization-for-RAND_poll-failing.patch; x-unix-mode=0644Download
From 8d8d0216fc033c5576dd7cf55f442486dffc8cdc Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Tue, 21 Jul 2020 14:01:11 +0200
Subject: [PATCH 2/2] Remove optimization for RAND_poll failing

The loop to generate seed data will exit on RAND_status so we don't
need to handle the case of RAND_poll failing separately.
---
 src/port/pg_strong_random.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/src/port/pg_strong_random.c b/src/port/pg_strong_random.c
index eed8b87808..d3b4689f4f 100644
--- a/src/port/pg_strong_random.c
+++ b/src/port/pg_strong_random.c
@@ -109,6 +109,10 @@ pg_strong_random(void *buf, size_t len)
 	 * Check that OpenSSL's CSPRNG has been sufficiently seeded, and if not
 	 * add more seed data using RAND_poll().  With some older versions of
 	 * OpenSSL, it may be necessary to call RAND_poll() a number of times.
+	 * If RAND_poll() fails to generate seed data within the given amount of
+	 * retries, subsequent RAND_bytes() calls will fail but we allow that to
+	 * happen to let pg_strong_random callers handle that with appropriate
+	 * error handling.
 	 */
 #define NUM_RAND_POLL_RETRIES 8
 
@@ -120,16 +124,7 @@ pg_strong_random(void *buf, size_t len)
 			break;
 		}
 
-		if (RAND_poll() == 0)
-		{
-			/*
-			 * RAND_poll() failed to generate any seed data, which means that
-			 * RAND_bytes() will probably fail.  For now, just fall through
-			 * and let that happen.  XXX: maybe we could seed it some other
-			 * way.
-			 */
-			break;
-		}
+		RAND_poll();
 	}
 
 	if (RAND_bytes(buf, len) == 1)
-- 
2.21.1 (Apple Git-122.3)

0001-Use-RAND_poll-for-seeding-randomness-after-fork.patchapplication/octet-stream; name=0001-Use-RAND_poll-for-seeding-randomness-after-fork.patch; x-unix-mode=0644Download
From d50bc5d786a4c7f4c07ce1d3cfecbf69354c3874 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Tue, 21 Jul 2020 13:57:49 +0200
Subject: [PATCH 1/2] Use RAND_poll for seeding randomness after fork

RAND_cleanup has been deprecated by OpenSSL and made into a noop
in 1.1.0, which makes calling it useless.  Fix by instead using
RAND_poll which is the recommended function.
---
 src/backend/postmaster/fork_process.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index def3cee37e..15d6340800 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -109,10 +109,12 @@ fork_process(void)
 		}
 
 		/*
-		 * Make sure processes do not share OpenSSL randomness state.
+		 * Make sure processes do not share OpenSSL randomness state. This is
+		 * no longer required in OpenSSL 1.1.1 and later versions, but until
+		 * we drop support for version < 1.1.1 we need to do this.
 		 */
 #ifdef USE_OPENSSL
-		RAND_cleanup();
+		RAND_poll();
 #endif
 	}
 
-- 
2.21.1 (Apple Git-122.3)

#2David Steele
david@pgmasters.net
In reply to: Daniel Gustafsson (#1)
Re: OpenSSL randomness seeding

On 7/21/20 8:13 AM, Daniel Gustafsson wrote:

After forking we call RAND_cleanup in fork_process.c to force a re-seed to
ensure that two backends cannot share sequence. OpenSSL 1.1.0 deprecated
RAND_cleanup, and contrary to how they usually leave deprecated APIs working
until removed, they decided to silently make this call a noop like below:

# define RAND_cleanup() while(0) continue

This leaves our defence against pool sharing seemingly useless, and also
against the recommendations of OpenSSL for versions > 1.1.0 and < 1.1.1 where
the RNG was rewritten:

https://wiki.openssl.org/index.php/Random_fork-safety

The silver lining here is that while OpenSSL nooped RAND_cleanup, they also
changed what is mixed into seeding so we are still not sharing a sequence. To
fix this, changing the RAND_cleanup call to RAND_poll should be enough to
ensure re-seeding after forking across all supported OpenSSL versions. Patch
0001 implements this along with a comment referencing when it can be removed
(which most likely won't be for quite some time).

This looks reasonable to me based on your explanation and the OpenSSL wiki.

Another thing that stood out when reviewing this code is that we optimize for
RAND_poll failing in pg_strong_random, when we already have RAND_status
checking for a sufficiently seeded RNG for us. ISTM that we can simplify the
code by letting RAND_status do the work as per 0002, and also (while unlikely)
survive any transient failures in RAND_poll by allowing all the retries we've
defined for the loop.

I wonder how effective the retries are going to be if they happen
immediately. However, most of the code paths I followed ended in a hard
error when pg_strong_random() failed so it may not hurt to try. I just
worry that some caller is depending on a faster failure here.

Regards,
--
-David
david@pgmasters.net

#3Daniel Gustafsson
daniel@yesql.se
In reply to: David Steele (#2)
Re: OpenSSL randomness seeding

On 21 Jul 2020, at 17:31, David Steele <david@pgmasters.net> wrote:
On 7/21/20 8:13 AM, Daniel Gustafsson wrote:

Another thing that stood out when reviewing this code is that we optimize for
RAND_poll failing in pg_strong_random, when we already have RAND_status
checking for a sufficiently seeded RNG for us. ISTM that we can simplify the
code by letting RAND_status do the work as per 0002, and also (while unlikely)
survive any transient failures in RAND_poll by allowing all the retries we've
defined for the loop.

I wonder how effective the retries are going to be if they happen immediately. However, most of the code paths I followed ended in a hard error when pg_strong_random() failed so it may not hurt to try. I just worry that some caller is depending on a faster failure here.

There is that, but I'm not convinced that relying on specific timing for
anything RNG or similarly cryptographic-related is especially sane.

cheers ./daniel

#4David Steele
david@pgmasters.net
In reply to: Daniel Gustafsson (#3)
Re: OpenSSL randomness seeding

On 7/21/20 3:44 PM, Daniel Gustafsson wrote:

On 21 Jul 2020, at 17:31, David Steele <david@pgmasters.net> wrote:
On 7/21/20 8:13 AM, Daniel Gustafsson wrote:

Another thing that stood out when reviewing this code is that we optimize for
RAND_poll failing in pg_strong_random, when we already have RAND_status
checking for a sufficiently seeded RNG for us. ISTM that we can simplify the
code by letting RAND_status do the work as per 0002, and also (while unlikely)
survive any transient failures in RAND_poll by allowing all the retries we've
defined for the loop.

I wonder how effective the retries are going to be if they happen immediately. However, most of the code paths I followed ended in a hard error when pg_strong_random() failed so it may not hurt to try. I just worry that some caller is depending on a faster failure here.

There is that, but I'm not convinced that relying on specific timing for
anything RNG or similarly cryptographic-related is especially sane.

I wasn't thinking specific timing -- just that the caller might be
expecting it to give up quickly if it doesn't work. That's what the code
is trying to do and I wonder if there is a reason for it.

But you are probably correct and I'm just overthinking it.

Regards,
--
-David
david@pgmasters.net

#5Daniel Gustafsson
daniel@yesql.se
In reply to: David Steele (#4)
Re: OpenSSL randomness seeding

On 21 Jul 2020, at 22:00, David Steele <david@pgmasters.net> wrote:

On 7/21/20 3:44 PM, Daniel Gustafsson wrote:

On 21 Jul 2020, at 17:31, David Steele <david@pgmasters.net> wrote:
On 7/21/20 8:13 AM, Daniel Gustafsson wrote:

Another thing that stood out when reviewing this code is that we optimize for
RAND_poll failing in pg_strong_random, when we already have RAND_status
checking for a sufficiently seeded RNG for us. ISTM that we can simplify the
code by letting RAND_status do the work as per 0002, and also (while unlikely)
survive any transient failures in RAND_poll by allowing all the retries we've
defined for the loop.

I wonder how effective the retries are going to be if they happen immediately. However, most of the code paths I followed ended in a hard error when pg_strong_random() failed so it may not hurt to try. I just worry that some caller is depending on a faster failure here.

There is that, but I'm not convinced that relying on specific timing for
anything RNG or similarly cryptographic-related is especially sane.

I wasn't thinking specific timing -- just that the caller might be expecting it to give up quickly if it doesn't work. That's what the code is trying to do and I wonder if there is a reason for it.

I think the original intention was to handle older OpenSSL versions where
multiple successful RAND_poll calls were required for RAND_status to succeed,
the check working as an optimization since a failing RAND_poll would render all
efforts useless anyway. I'm not sure this is true for the OpenSSL versions we
support in HEAD, and/or for modern platforms, but without proof of it not being
useful I would opt for keeping it.

cheers ./daniel

#6Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#5)
Re: OpenSSL randomness seeding

On Tue, Jul 21, 2020 at 10:36:53PM +0200, Daniel Gustafsson wrote:

I think the original intention was to handle older OpenSSL versions where
multiple successful RAND_poll calls were required for RAND_status to succeed,
the check working as an optimization since a failing RAND_poll would render all
efforts useless anyway. I'm not sure this is true for the OpenSSL versions we
support in HEAD, and/or for modern platforms, but without proof of it not being
useful I would opt for keeping it.

Yeah, the retry loop refers to this part of the past discussion on the
matter:
/messages/by-id/CAEZATCWYs6rAp36VKm4W7Sb3EF_7tNcRuhcnJC1P8=8W9nBm9w@mail.gmail.com

During the rewrite of the RNG engines, there was also a retry logic
introduced in 75e2c87, then removed in c16de9d for 1.1.1. In short,
we may be able to live without that once we cut more support for
OpenSSL versions (minimum version support of 1.1.1 is a couple of
years ahead at least for us), but I see no reasons to not leave that
in place either. And this visibly solved one problem for us. I don't
see either a reason to not simplify the loop to fall back to
RAND_status() for the validation.

In short, the proposed patch set looks like a good idea to me to stick
with the recommendations of upstream's wiki to use RAND_poll() after a
fork, but only do that on HEAD (OpenSSL 1.1.0 mixes the current
timestamp and the PID in the random seed of the default engine, 1.0.2
only the PID but RAND_cleanup is a no-op only after 1.1.0).
--
Michael

#7Noah Misch
noah@leadboat.com
In reply to: Daniel Gustafsson (#1)
Re: OpenSSL randomness seeding

On Tue, Jul 21, 2020 at 02:13:32PM +0200, Daniel Gustafsson wrote:

The silver lining here is that while OpenSSL nooped RAND_cleanup, they also
changed what is mixed into seeding so we are still not sharing a sequence. To
fix this, changing the RAND_cleanup call to RAND_poll should be enough to
ensure re-seeding after forking across all supported OpenSSL versions. Patch
0001 implements this along with a comment referencing when it can be removed
(which most likely won't be for quite some time).

Another thing that stood out when reviewing this code is that we optimize for
RAND_poll failing in pg_strong_random, when we already have RAND_status
checking for a sufficiently seeded RNG for us. ISTM that we can simplify the
code by letting RAND_status do the work as per 0002, and also (while unlikely)
survive any transient failures in RAND_poll by allowing all the retries we've
defined for the loop.

Thoughts on these?

These look good. I'll push them on Saturday or later. I wondered whether to
do both RAND_cleanup() and RAND_poll(), to purge all traces of the old seed on
versions supporting both. Since that would strictly (albeit negligibly)
increase seed predictability, I like your decision here.

Do you happen to know how OpenSSL 1.1.1 changed its reaction to forks in order
to make the RAND_poll() superfluous? (No need to research it if you don't.)

#8Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#7)
Re: OpenSSL randomness seeding

On Tue, Jul 21, 2020 at 10:00:20PM -0700, Noah Misch wrote:

These look good. I'll push them on Saturday or later. I wondered whether to
do both RAND_cleanup() and RAND_poll(), to purge all traces of the old seed on
versions supporting both. Since that would strictly (albeit negligibly)
increase seed predictability, I like your decision here.

Thanks Noah for taking care of it. No plans for a backpatch, right?
--
Michael

#9Daniel Gustafsson
daniel@yesql.se
In reply to: Noah Misch (#7)
Re: OpenSSL randomness seeding

On 22 Jul 2020, at 07:00, Noah Misch <noah@leadboat.com> wrote:

On Tue, Jul 21, 2020 at 02:13:32PM +0200, Daniel Gustafsson wrote:

The silver lining here is that while OpenSSL nooped RAND_cleanup, they also
changed what is mixed into seeding so we are still not sharing a sequence. To
fix this, changing the RAND_cleanup call to RAND_poll should be enough to
ensure re-seeding after forking across all supported OpenSSL versions. Patch
0001 implements this along with a comment referencing when it can be removed
(which most likely won't be for quite some time).

Another thing that stood out when reviewing this code is that we optimize for
RAND_poll failing in pg_strong_random, when we already have RAND_status
checking for a sufficiently seeded RNG for us. ISTM that we can simplify the
code by letting RAND_status do the work as per 0002, and also (while unlikely)
survive any transient failures in RAND_poll by allowing all the retries we've
defined for the loop.

Thoughts on these?

These look good. I'll push them on Saturday or later.

Thanks for picking it up!

I wondered whether to
do both RAND_cleanup() and RAND_poll(), to purge all traces of the old seed on
versions supporting both. Since that would strictly (albeit negligibly)
increase seed predictability, I like your decision here.

That's a good question. I believe that if one actually do use RAND_cleanup as
a re-seeding mechanism then that can break FIPS enabled OpenSSL installations
as RAND_cleanup resets the RNG method from the FIPS RNG to the built-in one. I
would be inclined to follow the upstream recommendations of using RAND_poll
exclusively, but I'm far from an expert here.

Do you happen to know how OpenSSL 1.1.1 changed its reaction to forks in order
to make the RAND_poll() superfluous? (No need to research it if you don't.)

I'm not entirely sure, but I do believe that 1.1.1 ported over the RNG from the
FIPS module which re-seeds itself with fork() protection. There was however a
bug, fixed in 1.1.1d or thereabouts as CVE-2019-1549, where the fork protection
wasn't activated by default.. so there is that. Since that bug was found,
there has been tests introduced to catch any regression on that which is
comforting.

cheers ./daniel

#10Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#9)
Re: OpenSSL randomness seeding

On Wed, Jul 22, 2020 at 11:31:38PM +0200, Daniel Gustafsson wrote:

Thanks for picking it up!

For the archives, the patch set has been applied as ce4939f and
15e4419 on HEAD. Thanks, Noah.

That's a good question. I believe that if one actually do use RAND_cleanup as
a re-seeding mechanism then that can break FIPS enabled OpenSSL installations
as RAND_cleanup resets the RNG method from the FIPS RNG to the built-in one. I
would be inclined to follow the upstream recommendations of using RAND_poll
exclusively, but I'm far from an expert here.

RAND_cleanup() can cause a failure telling that the RNG state is not
initialized when attempting to use FIPS in 1.0.2. This is not
officially supported by upstream AFAIK, and those APIs have been
dropped later in 1.1.0. And FWIW, VMware's Photon actually applies
some custom patches in this area:
https://github.com/vmware/photon/tree/master/SPECS/openssl

openssl-drbg-default-read-system-fips.patch is used to enforce the
initialization state of FIPS for example. Anyway, I would just stick
with the wiki recommendation.

Do you happen to know how OpenSSL 1.1.1 changed its reaction to forks in order
to make the RAND_poll() superfluous? (No need to research it if you don't.)

I'm not entirely sure, but I do believe that 1.1.1 ported over the RNG from the
FIPS module which re-seeds itself with fork() protection. There was however a
bug, fixed in 1.1.1d or thereabouts as CVE-2019-1549, where the fork protection
wasn't activated by default.. so there is that. Since that bug was found,
there has been tests introduced to catch any regression on that which is
comforting.

No idea about this one actually.
--
Michael

#11Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#10)
Re: OpenSSL randomness seeding

On 26 Jul 2020, at 09:06, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jul 22, 2020 at 11:31:38PM +0200, Daniel Gustafsson wrote:

Thanks for picking it up!

For the archives, the patch set has been applied as ce4939f and
15e4419 on HEAD. Thanks, Noah.

Indeed, thanks!

Do you happen to know how OpenSSL 1.1.1 changed its reaction to forks in order
to make the RAND_poll() superfluous? (No need to research it if you don't.)

I'm not entirely sure, but I do believe that 1.1.1 ported over the RNG from the
FIPS module which re-seeds itself with fork() protection. There was however a
bug, fixed in 1.1.1d or thereabouts as CVE-2019-1549, where the fork protection
wasn't activated by default.. so there is that. Since that bug was found,
there has been tests introduced to catch any regression on that which is
comforting.

No idea about this one actually.

I did some more reading and AFAICT it won't be required in 1.1.1+, but it also
won't cause any harm so unless evidence of the latter emerge we may just as
well leave it as an extra safeguard.

Somewhat on topic though, 1.1.1 adds a RAND_priv_bytes function for random
numbers that are supposed to be private and extra protected via it's own DRBG.
Maybe we should use that for SCRAM salts etc in case we detect 1.1.1?

cheers ./daniel

#12Noah Misch
noah@leadboat.com
In reply to: Daniel Gustafsson (#11)
Re: OpenSSL randomness seeding

On Thu, Jul 30, 2020 at 11:42:16PM +0200, Daniel Gustafsson wrote:

Somewhat on topic though, 1.1.1 adds a RAND_priv_bytes function for random
numbers that are supposed to be private and extra protected via it's own DRBG.
Maybe we should use that for SCRAM salts etc in case we detect 1.1.1?

Maybe. Would you have a separate pg_private_random() function, or just use
RAND_priv_bytes() for pg_strong_random()? No pg_strong_random() caller is
clearly disinterested in privacy; gen_random_uuid() may come closest.

#13Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#12)
Re: OpenSSL randomness seeding

On Sat, Aug 01, 2020 at 11:48:23PM -0700, Noah Misch wrote:

On Thu, Jul 30, 2020 at 11:42:16PM +0200, Daniel Gustafsson wrote:

Somewhat on topic though, 1.1.1 adds a RAND_priv_bytes function for random
numbers that are supposed to be private and extra protected via it's own DRBG.
Maybe we should use that for SCRAM salts etc in case we detect 1.1.1?

Maybe. Would you have a separate pg_private_random() function, or just use
RAND_priv_bytes() for pg_strong_random()? No pg_strong_random() caller is
clearly disinterested in privacy; gen_random_uuid() may come closest.

FWIW, I am not sure that we need extra level of complexity when it
comes to random number generation, so having only one API to rule them
all sounds sensible to me, particularly if we know that the API used
has more private protections.
--
Michael

#14Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#13)
Re: OpenSSL randomness seeding

On 2 Aug 2020, at 09:05, Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Aug 01, 2020 at 11:48:23PM -0700, Noah Misch wrote:

On Thu, Jul 30, 2020 at 11:42:16PM +0200, Daniel Gustafsson wrote:

Somewhat on topic though, 1.1.1 adds a RAND_priv_bytes function for random
numbers that are supposed to be private and extra protected via it's own DRBG.
Maybe we should use that for SCRAM salts etc in case we detect 1.1.1?

Maybe. Would you have a separate pg_private_random() function, or just use
RAND_priv_bytes() for pg_strong_random()? No pg_strong_random() caller is
clearly disinterested in privacy; gen_random_uuid() may come closest.

FWIW, I am not sure that we need extra level of complexity when it
comes to random number generation, so having only one API to rule them
all sounds sensible to me, particularly if we know that the API used
has more private protections.

I would agree with that, especially since we might not be able to provide an
equivalent implementation of a pg_private_random() function in non-OpenSSL
builds.

Will do a bit more reading and poking and post a patch.

cheers ./daniel