Incorrect allocation handling for cryptohash functions with OpenSSL
Hi all,
As of the work done in 87ae9691, I have played with error injections
in the code paths using this code, but forgot to account for cases where
cascading resowner cleanups are involved. Like other resources (JIT,
DSM, etc.), this requires an allocation in TopMemoryContext to make
sure that nothing gets forgotten or cleaned up on the way until the
resowner that did the cryptohash allocation is handled.
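To make the shape of the fix clearer, here is roughly what the create()
path ends up looking like (a simplified sketch, not the actual patch;
the resowner helper names and exact signatures are assumptions here):

pg_cryptohash_ctx *
pg_cryptohash_create(pg_cryptohash_type type)
{
    pg_cryptohash_ctx *ctx;

#ifndef FRONTEND
    /*
     * Make sure the resource owner can register the reference before
     * allocating anything, then allocate in TopMemoryContext so the
     * context survives cascading resowner cleanups until the owner
     * that created it releases it.
     */
    ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
    ctx = MemoryContextAlloc(TopMemoryContext, sizeof(pg_cryptohash_ctx));
    ctx->resowner = CurrentResourceOwner;
    ResourceOwnerRememberCryptoHash(CurrentResourceOwner,
                                    PointerGetDatum(ctx));
#else
    ctx = malloc(sizeof(pg_cryptohash_ctx));
    if (ctx == NULL)
        return NULL;
#endif
    ctx->type = type;
    /* implementation-specific setup (EVP_MD_CTX_create(), etc.) follows */
    return ctx;
}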
Attached is a small extension I have played with by doing some error
injections, and a patch. If there are no objections, I would like to
commit this fix.
Thanks,
--
Michael
Attachments:
On 18/12/2020 09:35, Michael Paquier wrote:
Hi all,
As of the work done in 87ae9691, I have played with error injections
in the code paths using this code, but forgot to account for cases where
cascading resowner cleanups are involved. Like other resources (JIT,
DSM, etc.), this requires an allocation in TopMemoryContext to make
sure that nothing gets forgotten or cleaned up on the way until the
resowner that did the cryptohash allocation is handled.
Attached is a small extension I have played with by doing some error
injections, and a patch. If there are no objections, I would like to
commit this fix.
pg_cryptohash_create() is now susceptible to leaking memory in
TopMemoryContext, if the allocations fail. I think the attached should
fix it (but I haven't tested it at all).
BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we
need two structs? They're both allocated and controlled by the
cryptohash implementation. It would seem simpler to have just one.
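(For context, the split in question looks roughly like this; a
simplified sketch, not the exact definitions:)

/* shared part, visible to callers through cryptohash.h */
typedef struct pg_cryptohash_ctx
{
    pg_cryptohash_type type;
    void       *data;           /* opaque, owned by the implementation */
} pg_cryptohash_ctx;

/* private to cryptohash_openssl.c */
typedef struct pg_cryptohash_state
{
    EVP_MD_CTX *evpctx;
#ifndef FRONTEND
    ResourceOwner resowner;
#endif
} pg_cryptohash_state;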
- Heikki
Attachments:
On 18/12/2020 11:35, Heikki Linnakangas wrote:
BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we
need two structs? They're both allocated and controlled by the
cryptohash implementation. It would seem simpler to have just one.
Something like this. Passes regression tests, but otherwise untested.
- Heikki
Attachments:
On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
pg_cryptohash_create() is now susceptible to leaking memory in
TopMemoryContext, if the allocations fail. I think the attached should fix
it (but I haven't tested it at all).
Yeah, you are right here. If the second allocation fails the first
one would leak. I don't think that your suggested fix is completely
right though because it ignores that the callers of
pg_cryptohash_create() in the backend expect an error all the time, so
it could crash. Perhaps we should just bite the bullet and switch the
OpenSSL and fallback implementations to use allocation APIs that never
cause an error, and always return NULL? That would have the advantage
of being more consistent with the frontend that relies on malloc(), at
the cost of requiring more changes for the backend code where the
_create() call would need to handle the NULL case properly. The
backend calls are already aware of errors so that would not be
invasive except for the addition of some elog(ERROR) or similar, and
we could change the fallback implementation to use palloc_extended()
with MCXT_ALLOC_NO_OOM.
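As a sketch of what that would look like (illustrative only, not a
patch; the error message is just an example):

/* backend side of the fallback implementation: never elog() on OOM */
#define ALLOC(size) palloc_extended(size, MCXT_ALLOC_NO_OOM)

/* a backend caller would then need something like this */
ctx = pg_cryptohash_create(PG_SHA256);
if (ctx == NULL)
    elog(ERROR, "out of memory");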
BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we need
two structs? They're both allocated and controlled by the cryptohash
implementation. It would seem simpler to have just one.
Depending on the implementation, the data to track can be completely
different, and this split keeps the knowledge of the resowner dependency
only in the OpenSSL part of cryptohashes, without having to expose it in
either cryptohash.h or the fallback implementation, which can just use
palloc() in the backend.
--
Michael
On Fri, Dec 18, 2020 at 11:51:55AM +0200, Heikki Linnakangas wrote:
Something like this. Passes regression tests, but otherwise untested.
... And I wanted to keep track of the type of cryptohash directly in
the shared structure. ;)
--
Michael
On 18/12/2020 12:10, Michael Paquier wrote:
On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
pg_cryptohash_create() is now susceptible to leaking memory in
TopMemoryContext, if the allocations fail. I think the attached should fix
it (but I haven't tested it at all).
Yeah, you are right here. If the second allocation fails the first
one would leak. I don't think that your suggested fix is completely
right though because it ignores that the callers of
pg_cryptohash_create() in the backend expect an error all the time, so
it could crash.
Ah, right.
Perhaps we should just bite the bullet and switch the
OpenSSL and fallback implementations to use allocation APIs that never
cause an error, and always return NULL? That would have the advantage
of being more consistent with the frontend that relies on malloc(), at
the cost of requiring more changes for the backend code where the
_create() call would need to handle the NULL case properly. The
backend calls are already aware of errors so that would not be
invasive except for the addition of some elog(ERROR) or similar, and
we could change the fallback implementation to use palloc_extended()
with MCXT_ALLOC_NO_OOM.
-1. On the contrary, I think we should reduce the number of checks
needed in the callers, and prefer throwing errors, if possible. It's too
easy to forget the check, and it makes the code more verbose, too.
In fact, it might be better if pg_cryptohash_init() and
pg_cryptohash_update() didn't return errors either. If an error happens,
they could just set a flag in the pg_cryptohash_ctx, and
pg_cryptohash_final() function would return the error. That way, you
would only need to check for error return in the call to
pg_cryptohash_final().
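Something along these lines (a rough sketch; the "failed" flag is
hypothetical, and the OpenSSL call is just an example):

void
pg_cryptohash_update(pg_cryptohash_ctx *ctx, const uint8 *data, size_t len)
{
    if (ctx->failed)
        return;                 /* an earlier step already failed */

    /* OpenSSL returns 1 on success, 0 on failure */
    if (EVP_DigestUpdate(ctx->evpctx, data, len) <= 0)
        ctx->failed = true;     /* reported by pg_cryptohash_final() */
}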
BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we need
two structs? They're both allocated and controlled by the cryptohash
implementation. It would seem simpler to have just one.
Depending on the implementation, the data to track can be completely
different, and this split keeps the knowledge of the resowner dependency
only in the OpenSSL part of cryptohashes, without having to expose it in
either cryptohash.h or the fallback implementation, which can just use
palloc() in the backend.
... And I wanted to keep track of the type of cryptohash directly in
the shared structure. ;)
You could also define a shared header, with the rest of the struct being
implementation-specific:
typedef struct pg_cryptohash_ctx
{
pg_cryptohash_type type;
/* implementation-specific data follows */
} pg_cryptohash_ctx;
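The OpenSSL side could then wrap that header with its private data,
for example (a sketch; the struct name here is made up):

/* private to cryptohash_openssl.c; "header" must be the first member */
typedef struct pg_cryptohash_openssl_ctx
{
    pg_cryptohash_ctx header;
    EVP_MD_CTX *evpctx;
#ifndef FRONTEND
    ResourceOwner resowner;
#endif
} pg_cryptohash_openssl_ctx;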
- Heikki
On 18/12/2020 12:55, Heikki Linnakangas wrote:
On 18/12/2020 12:10, Michael Paquier wrote:
On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
pg_cryptohash_create() is now susceptible to leaking memory in
TopMemoryContext, if the allocations fail. I think the attached should fix
it (but I haven't tested it at all).
Yeah, you are right here. If the second allocation fails the first
one would leak. I don't think that your suggested fix is completely
right though because it ignores that the callers of
pg_cryptohash_create() in the backend expect an error all the time, so
it could crash.
Ah, right.
Perhaps we should just bite the bullet and switch the
OpenSSL and fallback implementations to use allocation APIs that never
cause an error, and always return NULL? That would have the advantage
of being more consistent with the frontend that relies on malloc(), at
the cost of requiring more changes for the backend code where the
_create() call would need to handle the NULL case properly. The
backend calls are already aware of errors so that would not be
invasive except for the addition of some elog(ERROR) or similar, and
we could change the fallback implementation to use palloc_extended()
with MCXT_ALLOC_NO_OOM.
-1. On the contrary, I think we should reduce the number of checks
needed in the callers, and prefer throwing errors, if possible. It's too
easy to forget the check, and it makes the code more verbose, too.
In fact, it might be better if pg_cryptohash_init() and
pg_cryptohash_update() didn't return errors either. If an error happens,
they could just set a flag in the pg_cryptohash_ctx, and
pg_cryptohash_final() function would return the error. That way, you
would only need to check for error return in the call to
pg_cryptohash_final().
BTW, it's a bit weird that the pg_cryptohash_init/update/final()
functions return success, if the ctx argument is NULL. It would seem
more sensible for them to return an error. That way, if a caller forgets
to check for NULL result from pg_cryptohash_create(), but correctly
checks the result of those other functions, it would catch the error. In
fact, if we documented that pg_cryptohash_create() can return NULL, and
pg_cryptohash_final() always returns error on NULL argument, then it
would be sufficient for the callers to only check the return value of
pg_cryptohash_final(). So the usage pattern would be:
ctx = pg_cryptohash_create(PG_MD5);
pg_cryptohash_init(ctx);
pg_cryptohash_update(ctx, data, size);
pg_cryptohash_update(ctx, moredata, size);
if (pg_cryptohash_final(ctx, &hash) < 0)
    elog(ERROR, "md5 calculation failed");
pg_cryptohash_free(ctx);
- Heikki
On Fri, Dec 18, 2020 at 01:04:27PM +0200, Heikki Linnakangas wrote:
BTW, it's a bit weird that the pg_cryptohash_init/update/final() functions
return success, if the ctx argument is NULL. It would seem more sensible for
them to return an error.
Okay.
That way, if a caller forgets to check for NULL
result from pg_cryptohash_create(), but correctly checks the result of those
other functions, it would catch the error. In fact, if we documented that
pg_cryptohash_create() can return NULL, and pg_cryptohash_final() always
returns error on NULL argument, then it would be sufficient for the callers
to only check the return value of pg_cryptohash_final(). So the usage
pattern would be:
ctx = pg_cryptohash_create(PG_MD5);
pg_cryptohash_init(ctx);
pg_cryptohash_update(ctx, data, size);
pg_cryptohash_update(ctx, moredata, size);
if (pg_cryptohash_final(ctx, &hash) < 0)
    elog(ERROR, "md5 calculation failed");
pg_cryptohash_free(ctx);
I'd rather keep the init and update routines returning an error code
directly. This is more consistent with OpenSSL (note that libnss does
not return error codes for init, update and final, but it is possible
to check for errors and react to them). And we even have code paths in
our tree, a la pgcrypto, that use callbacks for each phase with some
processing in between. HMAC also gets a bit cleaner by keeping this
flexibility IMO.
--
Michael
On Fri, Dec 18, 2020 at 11:51:55AM +0200, Heikki Linnakangas wrote:
On 18/12/2020 11:35, Heikki Linnakangas wrote:
BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we
need two structs? They're both allocated and controlled by the
cryptohash implementation. It would seem simpler to have just one.
Something like this. Passes regression tests, but otherwise untested.
Interesting. I have looked at that with a fresh mind, thanks for the
idea. This reduces the number of allocations to one, making the error
handling a no-brainer, at the cost of hiding the cryptohash type from
the caller. I originally thought that exposing the type would be
useful, as I recall reading cases in the OpenSSL code doing checks on
the hash type used, but perhaps that's just over-engineering on my
side. I have found a couple of small-ish issues, please see my
comments below.
+ /*
+ * FIXME: this always allocates enough space for the largest hash.
+ * A smaller allocation would be enough for md5, sha224 and sha256.
+ */
I am not sure that this is worth complicating further, and we are not
talking about a lot of memory (sha512 requires 208 bytes, sha224/256
104 bytes, md5 96 bytes with a quick measurement). This also keeps
free() simple. So just allocating the amount of memory based on the
max size in the union looks fine to me. I would add a memset(0) after
this allocation, though.
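In other words, something like this would be enough (a sketch; the
fallback state struct names are only an illustration):

/* fallback: one allocation sized for the largest digest state */
typedef struct pg_cryptohash_ctx
{
    pg_cryptohash_type type;
    union
    {
        pg_md5_ctx      md5;
        pg_sha224_ctx   sha224;
        pg_sha256_ctx   sha256;
        pg_sha384_ctx   sha384;
        pg_sha512_ctx   sha512;
    }           data;
} pg_cryptohash_ctx;

ctx = ALLOC(sizeof(pg_cryptohash_ctx));
if (ctx == NULL)
    return NULL;
memset(ctx, 0, sizeof(pg_cryptohash_ctx));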
-#define ALLOC(size) palloc(size)
+#define ALLOC(size) MemoryContextAllocExtended(TopMemoryContext, size, MCXT_ALLOC_NO_OOM)
As the only allocation in TopMemoryContext is for the context, it
would be fine to not use MCXT_ALLOC_NO_OOM here and to just fail, so
that callers in the backend don't need to worry about create()
returning NULL.
- state->evpctx = EVP_MD_CTX_create();
+ ctx->evpctx = EVP_MD_CTX_create();
- if (state->evpctx == NULL)
+ if (ctx->evpctx == NULL)
{
If EVP_MD_CTX_create() fails, you would leak memory with the context
allocated in TopMemoryContext. So this requires a free of the context
before the elog(ERROR).
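Roughly like this (a sketch of the error path only; the error message
is just an example):

ctx->evpctx = EVP_MD_CTX_create();
if (ctx->evpctx == NULL)
{
    /* do not leak the context allocated in TopMemoryContext */
#ifndef FRONTEND
    pfree(ctx);
    ereport(ERROR,
            (errcode(ERRCODE_OUT_OF_MEMORY),
             errmsg("out of memory")));
#else
    free(ctx);
    return NULL;
#endif
}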
+ /*
+ * Make sure that the resource owner has space to remember this
+ * reference. This can error out with "out of memory", so do this
+ * before any other allocation to avoid leaking.
+ */
#ifndef FRONTEND
ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
#endif
Right. Good point.
- /* OpenSSL internals return 1 on success, 0 on failure */
+ /* openssl internals return 1 on success, 0 on failure */
It seems to me that this was not wanted.
At the same time, I have taken care of your comment from upthread to
return a failure if the caller passes NULL for the context, and
adjusted some comments. What do you think of the attached?
--
Michael
Attachments:
On Fri, Dec 18, 2020 at 6:04 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
BTW, it's a bit weird that the pg_cryptohash_init/update/final()
functions return success, if the ctx argument is NULL. It would seem
more sensible for them to return an error. That way, if a caller forgets
to check for NULL result from pg_cryptohash_create(), but correctly
checks the result of those other functions, it would catch the error. In
fact, if we documented that pg_cryptohash_create() can return NULL, and
pg_cryptohash_final() always returns error on NULL argument, then it
would be sufficient for the callers to only check the return value of
pg_cryptohash_final(). So the usage pattern would be:ctx = pg_cryptohash_create(PG_MD5);
pg_cryptohash_inti(ctx);
pg_update(ctx, data, size);
pg_update(ctx, moredata, size);
if (pg_cryptohash_final(ctx, &hash) < 0)
elog(ERROR, "md5 calculation failed");
pg_cryptohash_free(ctx);
TBH, I think there's no point in returning an error here at all, because
it's totally non-specific. You have no idea what failed, just that
something failed. Blech. If we want to check that ctx is non-NULL, we
should do that with an Assert(). Complicating the code with error
checks that have to be added in multiple places that are far removed
from where the actual problem was detected stinks.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Dec 21, 2020 at 04:28:26PM -0500, Robert Haas wrote:
TBH, I think there's no point in returning an error here at all, because
it's totally non-specific. You have no idea what failed, just that
something failed. Blech. If we want to check that ctx is non-NULL, we
should do that with an Assert(). Complicating the code with error
checks that have to be added in multiple places that are far removed
from where the actual problem was detected stinks.
You could technically do that, but only for the backend at the cost of
painting the code of src/common/ with more #ifdef FRONTEND. Even if
we do that, enforcing an error in the backend could be a problem when
it comes to some code paths. One of them is the SCRAM mock
authentication where we had better generate a generic error message.
Using an Assert() or just letting the code go through is not good
either, as we should avoid incorrect computations or a crash on OOM, not
to mention that this would defeat the detection of bugs coming directly
from OpenSSL or any other SSL library this code plugs into. In short,
I think that there are more benefits in letting the caller control the
error handling.
--
Michael
On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote:
At the same time, I have taken care of your comment from upthread to
return a failure if the caller passes NULL for the context, and
adjusted some comments. What do you think of the attached?
I have looked again at this thread with a fresher mind and I did not
see a problem with the previous patch, except some indentation
issues. So if there are no objections, I'd like to commit the
attached.
--
Michael
Attachments:
On 06/01/2021 13:42, Michael Paquier wrote:
On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote:
At the same time, I have taken care of your comment from upthread to
return a failure if the caller passes NULL for the context, and
adjusted some comments. What do you think of the attached?
I have looked again at this thread with a fresher mind and I did not
see a problem with the previous patch, except some indentation
issues. So if there are no objections, I'd like to commit the
attached.
Looks fine to me.
- Heikki
On 25/12/2020 02:57, Michael Paquier wrote:
On Mon, Dec 21, 2020 at 04:28:26PM -0500, Robert Haas wrote:
TBH, I think there's no point in returning an error here at all, because
it's totally non-specific. You have no idea what failed, just that
something failed. Blech. If we want to check that ctx is non-NULL, we
should do that with an Assert(). Complicating the code with error
checks that have to be added in multiple places that are far removed
from where the actual problem was detected stinks.
You could technically do that, but only for the backend at the cost of
painting the code of src/common/ with more #ifdef FRONTEND. Even if
we do that, enforcing an error in the backend could be a problem when
it comes to some code paths.
Yeah, you would still need to remember to check for the error in
frontend code. Maybe it would still be a good idea, not sure. It would
be a nice backstop, if you forget to check for the error.
I had a quick look at the callers:
contrib/pgcrypto/internal-sha2.c and
src/backend/utils/adt/cryptohashfuncs.c: the call to
pg_cryptohash_create() is missing a check for a NULL result. With your
latest patch, that's OK because the subsequent pg_cryptohash_update()
calls will fail if passed a NULL context. But it seems sloppy.
contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions
are missing checks for error return codes.
contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows
the built-in implementation of SHA1 on some platforms. Should we add
support for SHA1 in pg_cryptohash and use that for consistency?
src/backend/libpq/auth-scram.c: SHA256 is used in the mock
authentication. If the pg_cryptohash functions fail, we throw a distinct
error "could not encode salt" that reveals that it was a mock
authentication. I don't think this is a big deal in practice, it would
be hard for an attacker to induce the SHA256 computation to fail, and
there are probably better ways to distinguish mock authentication from
real, like timing attacks. But still.
src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
still need separate fields for the different SHA variants?
- Heikki
On Wed, Jan 06, 2021 at 03:27:03PM +0200, Heikki Linnakangas wrote:
Looks fine to me.
Thanks, I have been able to get this part done as of 55fe26a.
--
Michael
On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote:
contrib/pgcrypto/internal-sha2.c and
src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create()
is missing a check for a NULL result. With your latest patch, that's OK because
the subsequent pg_cryptohash_update() calls will fail if passed a NULL
context. But it seems sloppy.
Is it? pg_cryptohash_create() will never return NULL for the backend.
contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions are
missing checks for error return codes.
Indeed, these are incorrect, thanks. I'll go fix that separately.
contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows the
built-in implementation of SHA1 on some platforms. Should we add support for
SHA1 in pg_cryptohash and use that for consistency?
Yeah, I have sent a separate patch for that:
https://commitfest.postgresql.org/31/2868/
The cleanups produced by this patch are kind of nice.
src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication.
If the pg_cryptohash functions fail, we throw a distinct error "could not
encode salt" that reveals that it was a mock authentication. I don't think
this is a big deal in practice, it would be hard for an attacker to induce
the SHA256 computation to fail, and there are probably better ways to
distinguish mock authentication from real, like timing attacks. But still.
This maps to the second error in the mock routine, so wouldn't it be
better to change both and back-patch? The last place where this error
message is used is pg_be_scram_build_secret(), for the generation of
what's stored in pg_authid. An idea would be to use "out of memory",
which can happen for any palloc() call.
src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
still need separate fields for the different SHA variants?
Using separate fields looked cleaner to me when it comes to debugging,
and the cleanup was rather minimal, except if we use more switch/case
blocks to set up the various variables. What about something like the
attached?
--
Michael
Attachments:
On 07/01/2021 06:15, Michael Paquier wrote:
On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote:
contrib/pgcrypto/internal-sha2.c and
src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create()
is missing a check for a NULL result. With your latest patch, that's OK because
the subsequent pg_cryptohash_update() calls will fail if passed a NULL
context. But it seems sloppy.
Is it? pg_cryptohash_create() will never return NULL for the backend.
Ah, you're right.
src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication.
If the pg_cryptohash functions fail, we throw a distinct error "could not
encode salt" that reveals that it was a mock authentication. I don't think
this is a big deal in practice, it would be hard for an attacker to induce
the SHA256 computation to fail, and there are probably better ways to
distinguish mock authentication from real, like timing attacks. But still.
This maps to the second error in the mock routine, so wouldn't it be
better to change both and back-patch? The last place where this error
message is used is pg_be_scram_build_secret(), for the generation of
what's stored in pg_authid. An idea would be to use "out of memory",
which can happen for any palloc() call.
Hmm. Perhaps it would be best to change all the errors in mock
authentication to COMMERROR. It'd be nice to have an accurate error
message in the log, but no need to send it to the client.
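For reference, that would look like the following (just an illustration
of the elevel; the errcode is an example). COMMERROR only writes to the
server log, so the calling code still has to bail out on its own:

/* reported to the server log only; nothing is sent to the client */
ereport(COMMERROR,
        (errcode(ERRCODE_INTERNAL_ERROR),
         errmsg("could not encode salt")));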
src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
still need separate fields for the different SHA variants?
Using separate fields looked cleaner to me when it comes to debugging,
and the cleanup was rather minimal, except if we use more switch/case
blocks to set up the various variables. What about something like the
attached?
+1
- Heikki
On Thu, Jan 07, 2021 at 09:51:00AM +0200, Heikki Linnakangas wrote:
Hmm. Perhaps it would be best to change all the errors in mock
authentication to COMMERROR. It'd be nice to have an accurate error message
in the log, but no need to send it to the client.
Yeah, we could do that. Still, this mode requires a hard failure
because COMMERROR is just a log, and with only COMMERROR we would still
expect a salt to be generated to send a challenge back to the client,
which would require a fallback for the salt if the one generated from
the mock nonce cannot be used. Need to think more about that.
Using separate fields looked cleaner to me when it comes to debugging,
and the cleanup was rather minimal, except if we use more switch/case
blocks to set up the various variables. What about something like the
attached?
+1
Thanks, I have committed this part.
--
Michael