BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

Started by Nonameabout 11 years ago10 messagesbugs
Jump to latest
#1Noname
william.welter@4linux.com.br

The following bug has been logged on the website:

Bug reference: 12799
Logged by: William Felipe Welter
Email address: william.welter@4linux.com.br
PostgreSQL version: 9.4.1
Operating system: Ubuntu Linux
Description:

According to OpenSSL manual
(https://www.openssl.org/docs/ssl/SSL_get_error.html#DESCRIPTION) "The
current thread's error queue must be empty before the TLS/SSL I/O operation
is attempted, or SSL_get_error() will not work reliably"

But libpq in pgsecure_read()/pqsecure_write() on branch REL9_4_STABLE or in
pgtls_read()/pgtls_write() on branch MASTER there no calls to
ERR_clear_error() to clear error queue
(https://www.openssl.org/docs/crypto/ERR_clear_error.html) and avoid
unpredictable conditions.

We can reproduce problems with currently implementation on PHP scripts that
use pgsql extension (use libpq) and openssl extension as reported on PHP bug
track on "https://bugs.php.net/bug.php?id=68276" (firstly reported as memory
corruption error) when errors from previous php callings to openssl affect
libpq calls leading to fatal errors.

The solution is simple, see following patches:

Branch master
diff --git a/src/interfaces/libpq/fe-secure-openssl.c
b/src/interfaces/libpq/fe-secure-openssl.c
index 1b9f3a4..8cf0335 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -212,6 +212,7 @@ pgtls_read(PGconn *conn, void *ptr, size_t len)

rloop:
SOCK_ERRNO_SET(0);
+ ERR_clear_error();
n = SSL_read(conn->ssl, ptr, len);
err = SSL_get_error(conn->ssl, n);
switch (err)
@@ -320,6 +321,7 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
int err;

SOCK_ERRNO_SET(0);
+ ERR_clear_error();
n = SSL_write(conn->ssl, ptr, len);
err = SSL_get_error(conn->ssl, n);
switch (err)

Branch REL9_4_STABLE
diff --git a/src/interfaces/libpq/fe-secure.c
b/src/interfaces/libpq/fe-secure.c
index 2752d16..54001c0 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -350,6 +350,7 @@ pqsecure_read(PGconn *conn, void *ptr, size_t len)

rloop:
SOCK_ERRNO_SET(0);
+ ERR_clear_error();
n = SSL_read(conn->ssl, ptr, len);
err = SSL_get_error(conn->ssl, n);
switch (err)
@@ -511,6 +512,7 @@ pqsecure_write(PGconn *conn, const void *ptr, size_t
len)
DISABLE_SIGPIPE(conn, spinfo, return -1);

SOCK_ERRNO_SET(0);
+ ERR_clear_error();
n = SSL_write(conn->ssl, ptr, len);
err = SSL_get_error(conn->ssl, n);
switch (err)

Similar situation discussed on
stackoverflow:http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noname (#1)
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

On 02/24/2015 05:09 AM, william.welter@4linux.com.br wrote:

According to OpenSSL manual
(https://www.openssl.org/docs/ssl/SSL_get_error.html#DESCRIPTION) "The
current thread's error queue must be empty before the TLS/SSL I/O operation
is attempted, or SSL_get_error() will not work reliably"

But libpq in pgsecure_read()/pqsecure_write() on branch REL9_4_STABLE or in
pgtls_read()/pgtls_write() on branch MASTER there no calls to
ERR_clear_error() to clear error queue
(https://www.openssl.org/docs/crypto/ERR_clear_error.html) and avoid
unpredictable conditions.

<pedantic>
The OpenSSL manual doesn't directly require you to call
ERR_clear_error() before every SSL_* call. It just requires that you
ensure that the error queue is empty. Libpq ensures that by always
clearing the queue *after* an error happens, in SSLerrmessage().
</pedantic>

I'm actually not 100% sure we clear the error queue after every error.
Looking at the OpenSSL sources on SSL_get_error(), it seems possible
that it returns SSL_ERROR_SYSCALL, with an error in the error queue. The
libpq code does not call SSLerrmessag() when SSL_get_error() returns
SSL_ERROR_SYSCALL, so that would leave the error dangling in the queue.
Also, if there are multiple errors in the queue, we only return and
remove from the queue the first one.

So should we clear the error queue before each call, just to be sure?
The risk is that the application has performed some other OpenSSL
operations, outside libpq, but not checked the error queue yet, and when
we clear the queue, the original error is gone and isn't reported. An
application probably shouldn't do that, you should check the error after
each SSL call, and not leave the error dangling in the error queue as a
landmine for the next SSL call. OTOH, if the application is built so
that the error queue is cleared before each SSL call, not after, that's
incompatible with the way libpq currently works. And that's certainly a
much more common practice than the hypothetical application that would
miss the error if it's cleared unexpectedly.

Overall, I agree we should clear the error queue before each SSL call.
It makes the application as whole more robust.

The solution is simple, see following patches:

Yeah. Should also do the same in the backend, and also before
SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used.

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Noname
william.welter@4linux.com.br
In reply to: Heikki Linnakangas (#2)
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

<pedantic>
The OpenSSL manual doesn't directly require you to call
ERR_clear_error() before every SSL_* call. It just requires that you
ensure that the error queue is empty. Libpq ensures that by always
clearing the queue *after* an error happens, in SSLerrmessage().
</pedantic>

Ok, i understand.

I'm actually not 100% sure we clear the error queue after every error.
Looking at the OpenSSL sources on SSL_get_error(), it seems possible
that it returns SSL_ERROR_SYSCALL, with an error in the error queue. The
libpq code does not call SSLerrmessag() when SSL_get_error() returns
SSL_ERROR_SYSCALL, so that would leave the error dangling in the queue.
Also, if there are multiple errors in the queue, we only return and
remove from the queue the first one.

So should we clear the error queue before each call, just to be sure?
The risk is that the application has performed some other OpenSSL
operations, outside libpq, but not checked the error queue yet, and when
we clear the queue, the original error is gone and isn't reported. An
application probably shouldn't do that, you should check the error after
each SSL call, and not leave the error dangling in the error queue as a
landmine for the next SSL call. OTOH, if the application is built so
that the error queue is cleared before each SSL call, not after, that's
incompatible with the way libpq currently works. And that's certainly a
much more common practice than the hypothetical application that would
miss the error if it's cleared unexpectedly.

If i understand correctly:
- If libpq clear the error queue before each call this can break current applications that not check error queue immedialty after calls.
- But if libpq not clear the queue, application can leave errors on the queue which lead libpq throws unrelated errors.

On PHP case, the problem is that the queue is only cleared if the developer explicit call "openssl_error_string()" for each openssl function they called before. I think this behavior can be changed on PHP to prevent this kind of problem with libpq or other libs that use openssl.

Overall, I agree we should clear the error queue before each SSL call.
It makes the application as whole more robust.

The solution is simple, see following patches:

Yeah. Should also do the same in the backend, and also before
SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used.

Question: Is possible to fix this to next major release ? (im volunteering)

- Heikki

--
Message protected by MailGuard: e-mail anti-virus, anti-spam and content
filtering.http://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/1LubMvNYuN/6iGxWUBDMTyqo5LXNO4F30/0

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noname (#3)
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

On 02/27/2015 04:00 AM, william.welter@4linux.com.br wrote:

The solution is simple, see following patches:

Yeah. Should also do the same in the backend, and also before
SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used.

Question: Is possible to fix this to next major release ? (im volunteering)

Yeah. If you could write a complete patch, per above, I can commit it.

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Dave Vitek
dvitek@grammatech.com
In reply to: Heikki Linnakangas (#4)
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

On 3/1/2015 3:32 PM, Heikki Linnakangas wrote:

On 02/27/2015 04:00 AM, william.welter@4linux.com.br wrote:

The solution is simple, see following patches:

Yeah. Should also do the same in the backend, and also before
SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used.

Question: Is possible to fix this to next major release ? (im
volunteering)

Yeah. If you could write a complete patch, per above, I can commit it.

- Heikki

We independently ran into this, diagnosed it, and fixed it. Here is the
complete patch covering every use of SSL_get_error.

diff -ur src/backend/libpq/be-secure-openssl.c 
src/backend/libpq/be-secure-openssl.c
--- src/backend/libpq/be-secure-openssl.c    2016-02-18 
16:40:38.540898100 -0500
+++ src/backend/libpq/be-secure-openssl.c    2016-02-18 
16:59:07.331898100 -0500
@@ -353,6 +353,10 @@
      port->ssl_in_use = true;
  aloop:
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_accept below.
+     */
+    ERR_clear_error();
      r = SSL_accept(port->ssl);
      if (r <= 0)
      {
@@ -501,6 +505,9 @@
      int            err;
      errno = 0;
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_read below.
+     */
      n = SSL_read(port->ssl, ptr, len);
      err = SSL_get_error(port->ssl, n);
      switch (err)
@@ -558,6 +565,9 @@
      int            err;
      errno = 0;
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_write below.
+     */
      n = SSL_write(port->ssl, ptr, len);
      err = SSL_get_error(port->ssl, n);
      switch (err)
diff -ur src/interfaces/libpq/fe-secure-openssl.c 
src/interfaces/libpq/fe-secure-openssl.c
--- src/interfaces/libpq/fe-secure-openssl.c    2016-02-18 
16:40:03.575898100 -0500
+++ src/interfaces/libpq/fe-secure-openssl.c    2016-02-18 
16:58:01.711898100 -0500
@@ -212,6 +212,10 @@
  rloop:
      SOCK_ERRNO_SET(0);
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_read below.
+     */
+    ERR_clear_error();
      n = SSL_read(conn->ssl, ptr, len);
      err = SSL_get_error(conn->ssl, n);
      switch (err)
@@ -320,6 +324,10 @@
      int            err;
      SOCK_ERRNO_SET(0);
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_write below.
+     */
+    ERR_clear_error();
      n = SSL_write(conn->ssl, ptr, len);
      err = SSL_get_error(conn->ssl, n);
      switch (err)
@@ -1327,6 +1335,10 @@
  {
      int            r;
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_connect below.
+     */
+    ERR_clear_error();
      r = SSL_connect(conn->ssl);
      if (r <= 0)
      {

- Dave

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Dave Vitek (#5)
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

On Thu, Feb 18, 2016 at 2:11 PM, Dave Vitek <dvitek@grammatech.com> wrote:

We independently ran into this, diagnosed it, and fixed it. Here is the
complete patch covering every use of SSL_get_error.

I posted a patch for this issue independently, and quite recently:

https://commitfest.postgresql.org/9/520/

Do you happen to have any idea why there has been an uptick in problem
reports about this recently, despite the fact that it's been an issue
for a while (that's been the case within Heroku, at least)? Are you
aware that there is some specific trend behind that?

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Dave Vitek
dvitek@grammatech.com
In reply to: Peter Geoghegan (#6)
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

On 2/18/2016 5:16 PM, Peter Geoghegan wrote:

On Thu, Feb 18, 2016 at 2:11 PM, Dave Vitek <dvitek@grammatech.com> wrote:

We independently ran into this, diagnosed it, and fixed it. Here is the
complete patch covering every use of SSL_get_error.

I posted a patch for this issue independently, and quite recently:

https://commitfest.postgresql.org/9/520/

Do you happen to have any idea why there has been an uptick in problem
reports about this recently, despite the fact that it's been an issue
for a while (that's been the case within Heroku, at least)? Are you
aware that there is some specific trend behind that?

I can only speak for my case. postgres is part of our CodeSonar
product, and we are adding an option to have it use TLS sockets to talk
to postgres.

Maintainers: Peter's patch is better than mine, at least for the front
end side. I also adjusted be-secure-openssl.c, which perhaps is not
necessary, but then again it's hard to be sure. It might be worth doing
a patch for be-secure-openssl.c in the spirit of what Peter did for the
frontend (sorry, not volunteering :).

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Vitek (#7)
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

Dave Vitek <dvitek@grammatech.com> writes:

Maintainers: Peter's patch is better than mine, at least for the front
end side. I also adjusted be-secure-openssl.c, which perhaps is not
necessary, but then again it's hard to be sure. It might be worth doing
a patch for be-secure-openssl.c in the spirit of what Peter did for the
frontend (sorry, not volunteering :).

+1 for changing both sides. I'm fairly sure that you could provoke
problems of this ilk in the backend too, for example if client connection
is using SSL and we also establish an outgoing SSL connection using
postgres_fdw or dblink.

BTW, do we have a reproducible test case?

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Dave Vitek
dvitek@grammatech.com
In reply to: Tom Lane (#8)
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

On 2/18/2016 5:38 PM, Tom Lane wrote:

Dave Vitek <dvitek@grammatech.com> writes:

Maintainers: Peter's patch is better than mine, at least for the front
end side. I also adjusted be-secure-openssl.c, which perhaps is not
necessary, but then again it's hard to be sure. It might be worth doing
a patch for be-secure-openssl.c in the spirit of what Peter did for the
frontend (sorry, not volunteering :).

+1 for changing both sides. I'm fairly sure that you could provoke
problems of this ilk in the backend too, for example if client connection
is using SSL and we also establish an outgoing SSL connection using
postgres_fdw or dblink.

BTW, do we have a reproducible test case?

regards, tom lane

I can reproduce it, but I don't have a self contained unit test.

Such a test case might look like: call ERR_put_error with SSL_ERROR_SSL
and then cause libpq to invoke SSL_read. SSL_get_error after the read
will probably return SSL_ERROR_SSL even if the read goes fine, causing
postgres to conclude things have failed.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Tom Lane (#8)
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

On Thu, Feb 18, 2016 at 2:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

+1 for changing both sides. I'm fairly sure that you could provoke
problems of this ilk in the backend too, for example if client connection
is using SSL and we also establish an outgoing SSL connection using
postgres_fdw or dblink.

I didn't consider that, although I did note that Heikki was also in
favor of covering both sides. It should be fairly straightforward.

BTW, do we have a reproducible test case?

Yes, but that seems almost unnecessary. Basically, we trust that the
per-thread error queue doesn't have anything in it, even though it
clearly can. That assumption could be violated because malloc()
returns NULL in SSLerrmessage(), for example. That's a case that does
not even involve any non-libpq use of OpenSSL. Clearly we need to be
more careful about clearing the queue generally, but especially
because everyone else will get this wrong.

The SSL_get_error() man page says:

"""
In addition to ssl and ret, SSL_get_error() inspects the current thread's
OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread
that performed the TLS/SSL I/O operation, and no other OpenSSL function calls
should appear in between. The current thread's error queue must be empty
before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work
reliably.

"""

It is more or less that simple. We can't let this happen without
severe consequences. It's on us to coordinate how to prevent this
outcome, with no direction provided on how that should work out in the
real world.

There seem to be some really thin wrappers for OpenSSL for scripting
languages like Ruby and PHP around, that pass the buck on to users of
those languages. Naturally, they often get it wrong, because the
interface is so impractical.

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs