libpq's multi-threaded SSL callback handling is busted

Started by Jan Urbańskiover 11 years ago17 messageshackers

wulczer@wulczer.org

over 11 years ago

I did some more digging on bug
/messages/by-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com
which describes a deadlock when using libpq with SSL in a multi-threaded
environment with other threads doing SSL independently.

Attached is a reproducing Python script in my experience is faster at showing
the problem. Run it with

python -u pg_lock.py

As Heikki correctly diagnosed, the problem is with libpq unsetting the OpenSSL
locking callback while another thread is holding one of the locks. The other
thread then never releases the lock and the next time anyone tries to take it,
the application deadlocks.

The exact way it goes down is like this:

T1 (libpq) T2 (Python)

start libpq connection
init ssl system
add locking callback

start ssl operation
take lock

finish libpq connection
destroy ssl system
remove locking callback

(!) release lock, noop since no callback

And the next time any thread tries to take the lock, it deadlocks.

We added unsetting the locking callback in
4e816286533dd34c10b368487d4079595a3e1418 due to this bug report:
/messages/by-id/48620925.6070806@pws.com.au

Indeed, commenting out the CRYPTO_set_locking_callback(NULL) call in
fe-secure-openssl.c gets rid of the deadlock. However, it makes php segfault
with the (attached) reproduction script from the original 2008 bug report. If
your php.ini loads both the pgsql and curl extensions, reproduce the segfault with:

php -f pg_segfault.php

The most difficult part about fixing this bug is to determine *who's at
fault*. I now lean towards the opinion that we shouldn't be messing with
OpenSSL callbacks *at all*.

First of all, the current behaviour is crazy. We're setting and unsetting the
locking callback every time a connection is made/closed, which is not how
OpenSSL is supposed to be used. The *application* using libpq should set a
callback before it starts threads, it's no business of the library's.

The old behaviour was slightly less insane (set callbacks first time we're
engaging OpenSSL code, never touch them again). The real sane solution is to
leave it up to the application.

I posit we should remove all CRYPTO_set_*_callback functions and associated
cruft from libpq. This unfortunately means that multi-threaded applications
using libpq and SSL will break if they haven't been setting their own callbacks
(if they have, well, tough luck! libpq will just stomp over them the first time
it connects to Postgres, but at least *some* callbacks are left present after
that).

However, AFAICS if your app is not in C, then runtimes already handle that for
you (as they should).

Python:

https://hg.python.org/cpython/file/dc820b44ce21/Modules/_ssl.c#l4284

PHP:

https://github.com/php/php-src/blob/master/ext/curl/interface.c#L1235

Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
libpq was setting them on its own. If we remove the callback handling from
libpq, PHP will need to add them. By the way, the MySQL extension for PHP also
does not set those callbacks.

Let me reiterate: I now believe the callbacks should be set by the application,
libraries should not touch them, since they don't know what will they be
stomping on. If the application is run through a VM like Python or PHP, it's
the VM that should make sure the callbacks are set.

I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in
libpq, but at the very least this will require a warning in the release notes
about how you can't assume that libpq will take care of making sure your app is
multi-threaded safe when using OpenSSL. I also don't know how far that's
back-patcheable.

I would very much like to have this change back-patched, since setting and
resetting the callback makes using libpq in a threaded OpenSSL-enabled app
arguably less safe than if it didn't use any locking. If the app is written
correctly, it will have set locking callbacks before starting. Then libpq will
happily stomp on them. If the app hasn't set callbacks, it wasn't written
correctly in the first place and it will get segfaults instead of deadlocks.

Thanks,
Jan

Jan Urbański

wulczer@wulczer.org

over 11 years ago

In reply to: Jan Urbański (#1)

Re: libpq's multi-threaded SSL callback handling is busted

Jan Urbański writes:

I did some more digging on bug
/messages/by-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com
which describes a deadlock when using libpq with SSL in a multi-threaded
environment with other threads doing SSL independently.

[reproducing instructions]

I posit we should remove all CRYPTO_set_*_callback functions and associated
cruft from libpq.

I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in
libpq, but at the very least this will require a warning in the release notes

Attached is a patch doing just that.

I would very much like to have this change back-patched, since setting and
resetting the callback makes using libpq in a threaded OpenSSL-enabled app
arguably less safe than if it didn't use any locking.

Also attached is a patch for 9.4 and all previous supported releases, which is
the same thing, but adjusted for when we didn't have a separate fe-secure.c and
fe-secure-openssl.c

If committed, this change will warrant a notice in the release notes. I could
try drafting it, if that'd be helpful.

Cheers,
Jan

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Jan Urbański (#1)

Re: libpq's multi-threaded SSL callback handling is busted

On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:

We added unsetting the locking callback in
4e816286533dd34c10b368487d4079595a3e1418 due to this bug report:
/messages/by-id/48620925.6070806@pws.com.au

Indeed, commenting out the CRYPTO_set_locking_callback(NULL) call in
fe-secure-openssl.c gets rid of the deadlock. However, it makes php segfault
with the (attached) reproduction script from the original 2008 bug report. If
your php.ini loads both the pgsql and curl extensions, reproduce the segfault with:

php -f pg_segfault.php

The most difficult part about fixing this bug is to determine *who's at
fault*. I now lean towards the opinion that we shouldn't be messing with
OpenSSL callbacks *at all*.

First of all, the current behaviour is crazy. We're setting and unsetting the
locking callback every time a connection is made/closed, which is not how
OpenSSL is supposed to be used. The *application* using libpq should set a
callback before it starts threads, it's no business of the library's.

I don't buy this argument at all. Delivering a libpq that's not
threadsafe in some circumstances is a really bad idea.

The old behaviour was slightly less insane (set callbacks first time we're
engaging OpenSSL code, never touch them again). The real sane solution is to
leave it up to the application.

The real solution would be for openssl to do it itself.

I posit we should remove all CRYPTO_set_*_callback functions and associated
cruft from libpq. This unfortunately means that multi-threaded applications
using libpq and SSL will break if they haven't been setting their own callbacks
(if they have, well, tough luck! libpq will just stomp over them the first time
it connects to Postgres, but at least *some* callbacks are left present after
that).

I think there's no chance in hell we can do that. Breaking a noticeable
amount of libpq users in a minor upgrade is imo a cure *FAR* worse than
the cure.

Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
libpq was setting them on its own. If we remove the callback handling from
libpq, PHP will need to add them. By the way, the MySQL extension for PHP also
does not set those callbacks.

I think this shows pretty clearly how insane it would be to change this
in a minor release. Do you really expect people to update libpq and php
in unison. After a minor release?

Note that we *already* provide applications with the ability to set the
callbacks themselves and prevent us from doing so: PQinitSSL(false).

Let me reiterate: I now believe the callbacks should be set by the application,
libraries should not touch them, since they don't know what will they be
stomping on. If the application is run through a VM like Python or PHP, it's
the VM that should make sure the callbacks are set.

I fail to see why it's any better to do it in the VM. That relies on
either always loading the VM's openssl module, even if you don't
actually need it because the library you use deals with openssl. It
prevents other implementations of openssl in the VM.

What I think we should do is the following:

* Emphasize the fact that it's important to use PQinitSSL(false) if you
did things yourself.
* If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.
* Assert or something if the callback when unregistering isn't the same
as it was when registering. That's pretty much guaranteed to cause
issues.

I would very much like to have this change back-patched, since setting and
resetting the callback makes using libpq in a threaded OpenSSL-enabled app
arguably less safe than if it didn't use any locking.

Meh^2

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jan Urbański

wulczer@wulczer.org

over 11 years ago

In reply to: Andres Freund (#3)

Re: libpq's multi-threaded SSL callback handling is busted

Andres Freund writes:

On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:

First of all, the current behaviour is crazy. We're setting and unsetting the
locking callback every time a connection is made/closed, which is not how
OpenSSL is supposed to be used. The *application* using libpq should set a
callback before it starts threads, it's no business of the library's.

I don't buy this argument at all. Delivering a libpq that's not
threadsafe in some circumstances is a really bad idea.

I knew this would be a hard sell :( What I know is that the current situation
is not good and leaving it as-is causes real grief.

The old behaviour was slightly less insane (set callbacks first time we're
engaging OpenSSL code, never touch them again). The real sane solution is to
leave it up to the application.

The real solution would be for openssl to do it itself.

I think that the OpenSSL API is the real culprit there, requiring threads to
register a callback is what's causing all the issues, but I don't think this
will get ever changed.

Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
libpq was setting them on its own. If we remove the callback handling from
libpq, PHP will need to add them. By the way, the MySQL extension for PHP also
does not set those callbacks.

I think this shows pretty clearly how insane it would be to change this
in a minor release. Do you really expect people to update libpq and php
in unison. After a minor release?

Well, I haven't found reports of threaded PHP+MySQL+SSL programs crashing, and
the MySQL extension also doesn't care about the callbacks. I think it's because
it's both because it's a rare combination, or because almost everyone loads the
cURL extension, which *does* set up callbacks. Like I said, it's not a good
situation.

Note that we *already* provide applications with the ability to set the
callbacks themselves and prevent us from doing so: PQinitSSL(false).

Ah, I only now saw that with PQinitOpenSSL(true, false) you can work around the
problem, if you set up your own callbacks first. That seems to at least provide
a way to make libpq not do the callbacks dance in released versions. Thank you.

Let me reiterate: I now believe the callbacks should be set by the application,
libraries should not touch them, since they don't know what will they be
stomping on. If the application is run through a VM like Python or PHP, it's
the VM that should make sure the callbacks are set.

I fail to see why it's any better to do it in the VM. That relies on
either always loading the VM's openssl module, even if you don't
actually need it because the library you use deals with openssl. It
prevents other implementations of openssl in the VM.

The callbacks are a thing that's owned by the application, so libraries have no
business in setting them up. The way OpenSSL's API is specified (very
unfortunately IMHO) is that before you use OpenSSL from threads, you need to
set the callbacks. If you don't control your application's startup (like when
you're a script executed through a VM), you assume the VM took care of it at
startup. If you're a library, you assume the user took care of it, as they
should.

Now I know that a lot of apps get this wrong. Python almost does the right
thing, but you're right, it only sets up callbacks if you load the 'ssl'
module. It still feels like setting them up in library code is stomping on
things not owned by the library - like libperl setting up signal handlers,
which caused problems in Postgres. They're resources not owned by the library!

What I think we should do is the following:

* Emphasize the fact that it's important to use PQinitSSL(false) if you
did things yourself.

PQinitOpenSSL(true, false) if anything. The reason that function got introduced
is precisely because PQinitSSL(false) isn't exactly right, see
/messages/by-id/49820D7D.7030902@esilo.com

That doesn't solve the problem of the Python deadlock, where you're not at
leisure to call a C function at the beginning of your module.

* If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.

So yeah, that was my initial approach - check if callbacks are set, don't do
the dance if they are. It felt like a crutch, though, and racy at that. There's
no atomic way to test-and-set those callbacks. The window for racyness is
small, though.

* Assert or something if the callback when unregistering isn't the same
as it was when registering. That's pretty much guaranteed to cause
issues.

So let me try another proposal and see if it doesn't set alarm bells off.

* When opening a connection, if there's a callback set, don't overwrite it
(small race window).

* When closing a connection, if the callback set is not
pq_lockingcallback/pq_threadidcallback, don't NULL it (small race window)

Asserts in frontend code are impossible, right? There's no way to warn, either.

That would be a ~8 line patch. Does it feel back-patcheable?

Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Jan Urbański (#4)

Re: libpq's multi-threaded SSL callback handling is busted

On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:

Andres Freund writes:

On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:

First of all, the current behaviour is crazy. We're setting and unsetting the
locking callback every time a connection is made/closed, which is not how
OpenSSL is supposed to be used. The *application* using libpq should set a
callback before it starts threads, it's no business of the library's.

I don't buy this argument at all. Delivering a libpq that's not
threadsafe in some circumstances is a really bad idea.

I knew this would be a hard sell :( What I know is that the current situation
is not good and leaving it as-is causes real grief.

It certainly causes less grief than just breaking working
applications. While really shitty the current situation works in a large
number of scenarios. It's not that common to have several users of
openssl in an application *and* unload libpq.

I fail to see why it's any better to do it in the VM. That relies on
either always loading the VM's openssl module, even if you don't
actually need it because the library you use deals with openssl. It
prevents other implementations of openssl in the VM.

The callbacks are a thing that's owned by the application, so libraries have no
business in setting them up. The way OpenSSL's API is specified (very
unfortunately IMHO) is that before you use OpenSSL from threads, you need to
set the callbacks. If you don't control your application's startup (like when
you're a script executed through a VM), you assume the VM took care of it at
startup. If you're a library, you assume the user took care of it, as they
should.

That's a bogus argument. The VM cannot setup up every library in the
world in a correct way. It'd be obviously be completely insane to load
the ssl module in every library just because some part of some random
application might need it. In fact, the ssl library for python does
pretty much the same thing as libpq does (although it's slightly more
careful). It's not the VM at all.

What I think we should do is the following:

* Emphasize the fact that it's important to use PQinitSSL(false) if you
did things yourself.

PQinitOpenSSL(true, false) if anything. The reason that function got introduced
is precisely because PQinitSSL(false) isn't exactly right, see
/messages/by-id/49820D7D.7030902@esilo.com

Well, that really depends on what the application is actually using...

That doesn't solve the problem of the Python deadlock, where you're not at
leisure to call a C function at the beginning of your module.

We could just never unload the hooks...

* If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.

So yeah, that was my initial approach - check if callbacks are set, don't do
the dance if they are. It felt like a crutch, though, and racy at that. There's
no atomic way to test-and-set those callbacks. The window for racyness is
small, though.

If you do that check during library initialization instead of every
connection it shouldn't be racy - if that part is run in a multithreaded
fashion you're doing something crazy.

* Assert or something if the callback when unregistering isn't the same
as it was when registering. That's pretty much guaranteed to cause
issues.

So let me try another proposal and see if it doesn't set alarm bells off.

* When opening a connection, if there's a callback set, don't overwrite it
(small race window).

* When closing a connection, if the callback set is not
pq_lockingcallback/pq_threadidcallback, don't NULL it (small race window)

If we do the tests once during initialization there shouldn't be a
race....

Asserts in frontend code are impossible, right? There's no way to warn, either.

You can write to stderr...

That would be a ~8 line patch. Does it feel back-patcheable?

I think we first need to find out what we all can agree on before
deciding about that.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jan Urbański

wulczer@wulczer.org

over 11 years ago

In reply to: Andres Freund (#5)

Re: libpq's multi-threaded SSL callback handling is busted

Andres Freund writes:

On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:

That doesn't solve the problem of the Python deadlock, where you're not at
leisure to call a C function at the beginning of your module.

We could just never unload the hooks...

That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
got changed after /messages/by-id/48620925.6070806@pws.com.au

* If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.

So yeah, that was my initial approach - check if callbacks are set, don't do
the dance if they are. It felt like a crutch, though, and racy at that. There's
no atomic way to test-and-set those callbacks. The window for racyness is
small, though.

If you do that check during library initialization instead of every
connection it shouldn't be racy - if that part is run in a multithreaded
fashion you're doing something crazy.

Yes, that's true. The problem is that there's no real libpq initialisation
function. The docs say that:

"If your application initializes libssl and/or libcrypto libraries and libpq is
built with SSL support, you should call PQinitOpenSSL"

So most apps will just not bother. The moment you know you'll need SSL is only
when you get an 'S' message from the server...

Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jan Urbański

wulczer@wulczer.org

over 11 years ago

In reply to: Jan Urbański (#6)

Re: libpq's multi-threaded SSL callback handling is busted

Jan Urbański writes:

Andres Freund writes:

On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:

That doesn't solve the problem of the Python deadlock, where you're not at
leisure to call a C function at the beginning of your module.

We could just never unload the hooks...

That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
got changed after /messages/by-id/48620925.6070806@pws.com.au

* If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.

So yeah, that was my initial approach - check if callbacks are set, don't do
the dance if they are. It felt like a crutch, though, and racy at that. There's
no atomic way to test-and-set those callbacks. The window for racyness is
small, though.

If you do that check during library initialization instead of every
connection it shouldn't be racy - if that part is run in a multithreaded
fashion you're doing something crazy.

Yes, that's true. The problem is that there's no real libpq initialisation
function. The docs say that:

"If your application initializes libssl and/or libcrypto libraries and libpq is
built with SSL support, you should call PQinitOpenSSL"

So most apps will just not bother. The moment you know you'll need SSL is only
when you get an 'S' message from the server...

For the sake of discussion, here's a patch to prevent stomping on
previously-set callbacks, racy as it looks.

FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

Peter Eisentraut

peter_e@gmx.net

over 11 years ago

In reply to: Jan Urbański (#7)

Re: libpq's multi-threaded SSL callback handling is busted

On 2/12/15 7:28 AM, Jan Urbański wrote:

* If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.

So yeah, that was my initial approach - check if callbacks are set, don't do
the dance if they are. It felt like a crutch, though, and racy at that. There's
no atomic way to test-and-set those callbacks. The window for racyness is
small, though.

If you do that check during library initialization instead of every
connection it shouldn't be racy - if that part is run in a multithreaded
fashion you're doing something crazy.

Yes, that's true. The problem is that there's no real libpq initialisation
function. The docs say that:

"If your application initializes libssl and/or libcrypto libraries and libpq is
built with SSL support, you should call PQinitOpenSSL"

So most apps will just not bother. The moment you know you'll need SSL is only
when you get an 'S' message from the server...

For the sake of discussion, here's a patch to prevent stomping on
previously-set callbacks, racy as it looks.

FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

I don't think this patch would actually fix the problem that was
described after the original bug report
(/messages/by-id/5436991B.5020708@vmware.com),
namely that another thread acquires a lock while the libpq callbacks are
set and then cannot release the lock if libpq has been shut down in the
meantime.

The only way to fix that is to never unset the callbacks. But we don't
want that or can't do that for other reasons.

I think the only way out is to declare that if there are multiple
threads and other threads might be using OpenSSL not through libpq, then
the callbacks need to be managed outside of libpq.

In environments like PHP or Python this would require some coordination
work across modules somehow, but I don't see a way around that.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jan Urbański

wulczer@wulczer.org

over 11 years ago

In reply to: Peter Eisentraut (#8)

Re: libpq's multi-threaded SSL callback handling is busted

Peter Eisentraut writes:

On 2/12/15 7:28 AM, Jan Urbański wrote:

For the sake of discussion, here's a patch to prevent stomping on
previously-set callbacks, racy as it looks.

FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

I don't think this patch would actually fix the problem that was
described after the original bug report
(/messages/by-id/5436991B.5020708@vmware.com),
namely that another thread acquires a lock while the libpq callbacks are
set and then cannot release the lock if libpq has been shut down in the
meantime.

I did test both the Python and the PHP repro scripts and the patch fixed both
the deadlock and the segfault.

What happens is that Python (for instance) stops over the callback
unconditionally. So when libpq gets unloaded, it sees that the currently set
callback is no the one it originally set and refrains from NULLing it.

There's a small race window there, to be sure, but it's a lot better than what
we have now.

Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Peter Eisentraut

peter_e@gmx.net

over 11 years ago

In reply to: Jan Urbański (#9)

Re: libpq's multi-threaded SSL callback handling is busted

On 4/2/15 4:32 AM, Jan Urbański wrote:

Peter Eisentraut writes:

On 2/12/15 7:28 AM, Jan Urbański wrote:

For the sake of discussion, here's a patch to prevent stomping on
previously-set callbacks, racy as it looks.

FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

I don't think this patch would actually fix the problem that was
described after the original bug report
(/messages/by-id/5436991B.5020708@vmware.com),
namely that another thread acquires a lock while the libpq callbacks are
set and then cannot release the lock if libpq has been shut down in the
meantime.

I did test both the Python and the PHP repro scripts and the patch fixed both
the deadlock and the segfault.

What happens is that Python (for instance) stops over the callback
unconditionally. So when libpq gets unloaded, it sees that the currently set
callback is no the one it originally set and refrains from NULLing it.

So this works because Python is just as broken as libpq right now. What
happens if Python is fixed as well? Then we'll have the problem I
described above.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Jan Urbański

wulczer@wulczer.org

over 11 years ago

In reply to: Peter Eisentraut (#10)

Re: libpq's multi-threaded SSL callback handling is busted

Peter Eisentraut writes:

On 4/2/15 4:32 AM, Jan Urbański wrote:

Peter Eisentraut writes:

I don't think this patch would actually fix the problem that was
described after the original bug report
(/messages/by-id/5436991B.5020708@vmware.com),
namely that another thread acquires a lock while the libpq callbacks are
set and then cannot release the lock if libpq has been shut down in the
meantime.

I did test both the Python and the PHP repro scripts and the patch fixed both
the deadlock and the segfault.

What happens is that Python (for instance) stops over the callback
unconditionally. So when libpq gets unloaded, it sees that the currently set
callback is no the one it originally set and refrains from NULLing it.

So this works because Python is just as broken as libpq right now. What
happens if Python is fixed as well? Then we'll have the problem I
described above.

Well, not really, libpq sets and unsets the callbacks every time an SSL
connection is opened and closed. Python sets the callbacks once when the ssl
module is imported and never touches them again.

Arguably, it should set them even earlier, but it's still saner than
flip-flopping them. AFAIK you can't "unload" Python, so setting the callback
and keeping it there is a sound strategy.

The change I'm proposing is not to set the callback in libpq if one is already
set and not remove it if the one that's set is not libpq's. That's as sane as
it gets.

With multiple libraries wanting to use OpenSSL in threaded code, the mechanism
seems to be "last one wins". It doesn't matter *who's* callbacks are used, as
long as they're there and they stay there and that's Python's stance. This
doesn't work if you want to be able to unload the library, so the next best
thing is not touching the callback if someone else set his in the meantime.

What's broken is OpenSSL for offering such a bad way of dealing with
concurrency.

To reiterate: I recognise that not handling the callbacks is not the right
answer. But not stomping on callbacks others might have set sounds like a
minimal and safe improvement.

Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Peter Eisentraut

peter_e@gmx.net

over 11 years ago

In reply to: Jan Urbański (#11)

Re: libpq's multi-threaded SSL callback handling is busted

On 4/3/15 7:44 AM, Jan Urbański wrote:

To reiterate: I recognise that not handling the callbacks is not the right
answer. But not stomping on callbacks others might have set sounds like a
minimal and safe improvement.

I think your patch is okay in that it is not a good idea to overwrite or
unset someone else's callbacks. But we shouldn't mistake that for
fixing the underlying problem. The only reason this patch appears to
fix the presented test cases is because other OpenSSL users are also
misbehaving and/or the OpenSSL interfaces are so stupid that they cannot
be worked with reasonably.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Jan Urbański

wulczer@wulczer.org

over 11 years ago

In reply to: Peter Eisentraut (#12)

Re: libpq's multi-threaded SSL callback handling is busted

Peter Eisentraut writes:

On 4/3/15 7:44 AM, Jan Urbański wrote:

To reiterate: I recognise that not handling the callbacks is not the right
answer. But not stomping on callbacks others might have set sounds like a
minimal and safe improvement.

I think your patch is okay in that it is not a good idea to overwrite or
unset someone else's callbacks. But we shouldn't mistake that for
fixing the underlying problem. The only reason this patch appears to
fix the presented test cases is because other OpenSSL users are also
misbehaving and/or the OpenSSL interfaces are so stupid that they cannot
be worked with reasonably.

Yeah, the underlying problem is OpenSSL's idea of handling threads by limiting
itself to providing a function pointer. That's what we have to work with,
sadly.

Faced by such madness, libpq should try to do the sanest possible thing, at
least then if it breaks it's not our fault.

Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Peter Eisentraut

peter_e@gmx.net

over 11 years ago

In reply to: Jan Urbański (#7)

Re: libpq's multi-threaded SSL callback handling is busted

On 2/12/15 7:28 AM, Jan Urbański wrote:

+#if OPENSSL_VERSION_NUMBER < 0x10000000
+/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and provides a
+ * default implementation, so there's no need for our own. */

I have some additional concerns about this. It is true that OpenSSL
1.0.0 deprecates CRYPTO_set_id_callback(), but it replaces it with
CRYPTO_THREADID_set_callback(). There is no indication that you don't
need to set a callback anymore. The man page
(https://www.openssl.org/docs/crypto/threads.html) still says you need
to set two callbacks, and points to the new interface.

It is true that there is a fallback implementation for some platforms,
but there is no indication that one is invited to rely on those. Let's
keep in mind that libpq is potentially used on obscure platforms, so I'd
rather stick with the documented approaches.

I suggest you remove this part from your patch and submit a separate
patch for consideration if you want to.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Jan Urbański

wulczer@wulczer.org

over 11 years ago

In reply to: Peter Eisentraut (#14)

Re: libpq's multi-threaded SSL callback handling is busted

Peter Eisentraut writes:

On 2/12/15 7:28 AM, Jan Urbański wrote:
+#if OPENSSL_VERSION_NUMBER < 0x10000000
+/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and provides a
+ * default implementation, so there's no need for our own. */
I have some additional concerns about this. It is true that OpenSSL
1.0.0 deprecates CRYPTO_set_id_callback(), but it replaces it with
CRYPTO_THREADID_set_callback(). There is no indication that you don't
need to set a callback anymore. The man page
(https://www.openssl.org/docs/crypto/threads.html) still says you need
to set two callbacks, and points to the new interface.

Truly, no good deed can ever go unpunished.

I spent around an hour tracking down why setting both callbacks
for OpenSSL >= 1.0.0 brought back the PHP segfaults. Turns out, in OpenSSL
there's *no way* to unregister a callback registered with
CRYPTO_THREADID_set_callback()!

Observe: https://github.com/openssl/openssl/blob/35a1cc90bc1795e8893c11e442790ee7f659fffb/crypto/thr_id.c#L174-L180

Once you set a callback, game over - unloading your library will cause a
segfault. I cannot express how joyful I felt when I discovered it.

I suggest you remove this part from your patch and submit a separate
patch for consideration if you want to.

At this point I will propose an even simpler patch (attached). I gave up on
trying to use the more modern CRYPTO_THREADID_* functions.

Right now, HEAD libpq won't compile against OpenSSL compiled with
OPENSSL_NO_DEPRECATED, which I think is fine, given the lack of complaints. So
let's just keep using the older, saner functions and ignore the THREADID crap.

By the way, might I take the opportunity to breach the subject of backpatching
this change, should it get accepted?

Cheers,
Jan

#16

Peter Eisentraut

peter_e@gmx.net

over 11 years ago

In reply to: Jan Urbański (#15)

Re: libpq's multi-threaded SSL callback handling is busted

On 4/9/15 3:54 PM, Jan Urbański wrote:

At this point I will propose an even simpler patch (attached). I gave up on
trying to use the more modern CRYPTO_THREADID_* functions.

Right now, HEAD libpq won't compile against OpenSSL compiled with
OPENSSL_NO_DEPRECATED, which I think is fine, given the lack of complaints. So
let's just keep using the older, saner functions and ignore the THREADID crap.

Committed.

By the way, might I take the opportunity to breach the subject of backpatching
this change, should it get accepted?

I'll do that if another committer will speak out in favor of it.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Jan Urbański

wulczer@wulczer.org

over 11 years ago

In reply to: Peter Eisentraut (#16)

Re: libpq's multi-threaded SSL callback handling is busted

Peter Eisentraut writes:

On 4/9/15 3:54 PM, Jan Urbański wrote:

At this point I will propose an even simpler patch (attached). I gave up on
trying to use the more modern CRYPTO_THREADID_* functions.

Right now, HEAD libpq won't compile against OpenSSL compiled with
OPENSSL_NO_DEPRECATED, which I think is fine, given the lack of complaints. So
let's just keep using the older, saner functions and ignore the THREADID crap.

Committed.

Thank you!

And thanks for the discussion, I hope my frustration did not come across as
brusqueness.

Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

libpq's multi-threaded SSL callback handling is busted

Attachments:

Attachments:

Attachments:

Attachments: