Bug #838: SSL problems in 7.3
Nathan Mueller (nate@cs.wisc.edu) reports a bug with a severity of 1
The lower the number the more severe it is.
Short Description
SSL problems in 7.3
Long Description
In 7.3 if a client exists without disconnecting from the database the backend dumps core. This wouldn't be so bad except that everyone else is kicked off. Additionally, pre-7.3 clients can't connect to a 7.3 database with SSL. Instead of failing right away they just hang for a minute or two and then report "psql: could not establish SSL connection: No SSL error reported". When people hit Ctrl-C during the hang the database crashes.
Here's the server output from trying to connect with a pre-7.3 client.
FATAL: failed to initialize SSL connection: wrong version number
Here's the output from a crash:
FATAL: SSL error: ssl handshake failure
<10935 identical lines deleted>
FATAL: SSL error: ssl handshake failure
LOG: server process (pid 9308) was terminated by signal 11
LOG: terminating any other active server processes
Both 7.3 (server) and 7.2.2 (client) are linked against the same build of openssl 0.9.6e. I'm pretty sure this isn't an SSL bug since I have lots of other apps linked agaist the same build that have never done this.
I wasn't sure if/how to post the core file. If you want it let me know how.
Sample Code
No file was uploaded with this report
pgsql-bugs@postgresql.org writes:
In 7.3 if a client exists without disconnecting from the database the
backend dumps core.
I can't reproduce that.
--
Peter Eisentraut peter_e@gmx.net
In 7.3 if a client exists without disconnecting from the
database the
backend dumps core.I can't reproduce that.
What version of openssl are you using?
--Nate
Import Notes
Resolved by subject fallback
In 7.3 if a client exists without disconnecting from the
database the
backend dumps core.I can't reproduce that.
I'm sorry but you're going to have to send me more info about your
setup. I just did a fresh build on my home machine against Red Hat's
openssl and the problems got even worse. The crashing still happens but
now I get the version error when I use a client and server from the
same build.
Has anyone else tested this? I find it hard to believe the new SSL is so
broken but I've tested this in three different environments with pretty
much the same result. I saw a message on [HACKERS] saying that the
author is MIA -- is that still the current state?
--Nate
Import Notes
Resolved by subject fallback
Nathan Mueller wrote:
In 7.3 if a client exists without disconnecting from the
database the
backend dumps core.I can't reproduce that.
I'm sorry but you're going to have to send me more info about your
setup. I just did a fresh build on my home machine against Red Hat's
openssl and the problems got even worse. The crashing still happens but
now I get the version error when I use a client and server from the
same build.Has anyone else tested this? I find it hard to believe the new SSL is so
broken but I've tested this in three different environments with pretty
much the same result. I saw a message on [HACKERS] saying that the
author is MIA -- is that still the current state?
I tested it with openssl 0.9.6e and it worked on BSD/OS 4.2. The author
is only involved intermittently. I worked with him to get it working on
7.3. It is certainly possible there are other bugs in there.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
tested it with openssl 0.9.6e and it worked on BSD/OS 4.2. The author
is only involved intermittently. I worked with him to get it
working on
7.3. It is certainly possible there are other bugs in there.
Slow night so I learned a little about SSL and figured this out. The
following patch does two things.
First it switches the ssl method back to SSLv23 so pre-7.3 SSL clients
will work with new databases. I made the switch in both the client and
the server, but the server change is all you really need. The second is
to ignore the SSL syscall error when n is 0 since that means EOF.
This fixes both of my problems, hope it works for everyone else too.
--Nate
diff -ur postgresql-7.3/src/backend/libpq/be-secure.c
postgresql-7.3.patched/src/backend/libpq/be-secure.c
--- postgresql-7.3/src/backend/libpq/be-secure.c Thu Nov 7 12:47:08 2002
+++ postgresql-7.3.patched/src/backend/libpq/be-secure.c Mon Dec 9
23:47:45 2002
@@ -288,7 +288,8 @@
case SSL_ERROR_WANT_READ:
break;
case SSL_ERROR_SYSCALL:
- elog(ERROR, "SSL SYSCALL error: %s",
strerror(errno));
+ if (n == -1)
+ elog(ERROR, "SSL SYSCALL error: %s",
strerror(errno));
break;
case SSL_ERROR_SSL:
elog(ERROR, "SSL error: %s",
SSLerrmessage());
@@ -585,7 +586,7 @@
{
SSL_library_init();
SSL_load_error_strings();
- SSL_context = SSL_CTX_new(TLSv1_method());
+ SSL_context = SSL_CTX_new(SSLv23_method());
if (!SSL_context)
{
postmaster_error("failed to create SSL
context: %s",
diff -ur postgresql-7.3/src/interfaces/libpq/fe-secure.c
postgresql-7.3.patched/src/interfaces/libpq/fe-secure.c
--- postgresql-7.3/src/interfaces/libpq/fe-secure.c Thu Nov 7
12:47:08 2002
+++ postgresql-7.3.patched/src/interfaces/libpq/fe-secure.c Mon Dec 9
23:42:40 2002
@@ -712,7 +712,7 @@
{
SSL_library_init();
SSL_load_error_strings();
- SSL_context = SSL_CTX_new(TLSv1_method());
+ SSL_context = SSL_CTX_new(SSLv23_method());
if (!SSL_context)
{
printfPQExpBuffer(&conn->errorMessage,
Only in postgresql-7.3.patched/src/interfaces/libpq: fe-secure.c~
Import Notes
Resolved by subject fallback
I am glad you found out the cause of your problems.
I am reluctant to apply this patch because the original author
recommended TLSv1 specifically because it was more secure, especially
compared to SSLv2.
There was a conscious decision in 7.3 to require only 7.3 clients when
using SSL. I don't remember how many people were involved in that
discussion, but I know it was made. In fact, there was so much new SSL
code in 7.3, I suspected we couldn't even make it work with pre-7.2
clients. I am surprised it works with your small change.
On the issue of checking if SSL_read() has returned -1, is that standard
OpenSSL coding practice that even if SSL_get_error() returns an error,
you have to check the SSL_read() return value too?
---------------------------------------------------------------------------
Nathan Mueller wrote:
tested it with openssl 0.9.6e and it worked on BSD/OS 4.2. The author
is only involved intermittently. I worked with him to get it
working on
7.3. It is certainly possible there are other bugs in there.Slow night so I learned a little about SSL and figured this out. The
following patch does two things.First it switches the ssl method back to SSLv23 so pre-7.3 SSL clients
will work with new databases. I made the switch in both the client and
the server, but the server change is all you really need. The second is
to ignore the SSL syscall error when n is 0 since that means EOF.This fixes both of my problems, hope it works for everyone else too.
--Nate
diff -ur postgresql-7.3/src/backend/libpq/be-secure.c postgresql-7.3.patched/src/backend/libpq/be-secure.c --- postgresql-7.3/src/backend/libpq/be-secure.c Thu Nov 7 12:47:08 2002 +++ postgresql-7.3.patched/src/backend/libpq/be-secure.c Mon Dec 9 23:47:45 2002 @@ -288,7 +288,8 @@ case SSL_ERROR_WANT_READ: break; case SSL_ERROR_SYSCALL: - elog(ERROR, "SSL SYSCALL error: %s", strerror(errno)); + if (n == -1) + elog(ERROR, "SSL SYSCALL error: %s", strerror(errno)); break; case SSL_ERROR_SSL: elog(ERROR, "SSL error: %s", SSLerrmessage()); @@ -585,7 +586,7 @@ { SSL_library_init(); SSL_load_error_strings(); - SSL_context = SSL_CTX_new(TLSv1_method()); + SSL_context = SSL_CTX_new(SSLv23_method()); if (!SSL_context) { postmaster_error("failed to create SSL context: %s", diff -ur postgresql-7.3/src/interfaces/libpq/fe-secure.c postgresql-7.3.patched/src/interfaces/libpq/fe-secure.c --- postgresql-7.3/src/interfaces/libpq/fe-secure.c Thu Nov 7 12:47:08 2002 +++ postgresql-7.3.patched/src/interfaces/libpq/fe-secure.c Mon Dec 9 23:42:40 2002 @@ -712,7 +712,7 @@ { SSL_library_init(); SSL_load_error_strings(); - SSL_context = SSL_CTX_new(TLSv1_method()); + SSL_context = SSL_CTX_new(SSLv23_method()); if (!SSL_context) { printfPQExpBuffer(&conn->errorMessage, Only in postgresql-7.3.patched/src/interfaces/libpq: fe-secure.c~---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
There was a conscious decision in 7.3 to require only 7.3 clients when
using SSL. I don't remember how many people were involved in that
discussion, but I know it was made. In fact, there was so much new SSL
code in 7.3, I suspected we couldn't even make it work with pre-7.2
clients. I am surprised it works with your small change.
Fair enough. Let me test the other patch with TLSv1 to make sure it
still works.
On the issue of checking if SSL_read() has returned -1, is that
standard
OpenSSL coding practice that even if SSL_get_error() returns an error,
you have to check the SSL_read() return value too?
From looking at the SSL_get_error man page I think you only need to do
it in the case of SSL_ERROR_SYSCALL.
--Nate
Import Notes
Resolved by subject fallback
Ok, I tested this out with TLSv1 and it worked fine. I found that the
same mistake was being made on the client side of things too so I
included a patch for that too.
--Nate
Index: src/backend/libpq/be-secure.c
===================================================================
RCS file: /s/postgresql-7.3.0/src/CVSROOT/postgresql-7.3.0/src/backend/-
libpq/be-secure.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 be-secure.c
--- src/backend/libpq/be-secure.c 2 Dec 2002 03:33:36 -0000 1.1.1.1
+++ src/backend/libpq/be-secure.c 10 Dec 2002 20:23:30 -0000
@@ -288,7 +288,8 @@
case SSL_ERROR_WANT_READ:
break;
case SSL_ERROR_SYSCALL:
- elog(ERROR, "SSL SYSCALL error: %s",
strerror(errno));
+ if (n == -1)
+ elog(ERROR, "SSL SYSCALL error: %s",
strerror(errno));
break;
case SSL_ERROR_SSL:
elog(ERROR, "SSL error: %s",
SSLerrmessage());
Index: src/interfaces/libpq/fe-secure.c
===================================================================
RCS file: /s/postgresql-7.3.0/src/CVSROOT/postgresql-7.3.0/src/interfac-
es/libpq/fe-secure.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 fe-secure.c
--- src/interfaces/libpq/fe-secure.c 2 Dec 2002 03:33:51 -0000 1.1.1.1
+++ src/interfaces/libpq/fe-secure.c 10 Dec 2002 20:24:36 -0000
@@ -270,7 +270,8 @@
case SSL_ERROR_WANT_READ:
break;
case SSL_ERROR_SYSCALL:
- printfPQExpBuffer(&conn->errorMessage,
+ if (n == -1)
+ printfPQExpBuffer(&conn->err-
orMessage,
libpq_g-
ettext(-
"SSL
SYSCALL
error:
%s\n"),
SOCK_-
STRER-
ROR(S-
OCK_E-
RRNO)-
);
break;
@@ -315,7 +316,8 @@
case SSL_ERROR_WANT_WRITE:
break;
case SSL_ERROR_SYSCALL:
- printfPQExpBuffer(&conn->errorMessage,
+ if (n == -1)
+ printfPQExpBuffer(&conn->err-
orMessage,
libpq_g-
ettext(-
"SSL
SYSCALL
error:
%s\n"),
SOCK_-
STRER-
ROR(S-
OCK_E-
RRNO)-
);
break;
Import Notes
Resolved by subject fallback
OK, I can apply this. One question I have is why the double strerror()
in the first patch chunk. Also, I will need to manually patch this
because your system has formatted the code quite unusually:
libpq_g-
ettext(-
"SSL
SYSCALL
error:
%s\n"),
SOCK_-
STRER-
ROR(S-
OCK_E-
RRNO)-
);
Also, I see in my documentation on SSL_get_error():
SSL_ERROR_SYSCALL
Some I/O error occurred. The OpenSSL error queue may
contain more information on the error. If the error
queue is empty (i.e. ERR_get_error() returns 0), ret
can be used to find out more about the error: If ret
== 0, an EOF was observed that violates the protocol.
If ret == -1, the underlying BIO reported an I/O error
(for socket I/O on Unix systems, consult errno for
details).
I assume this is the issue your patch is addressing, right?
---------------------------------------------------------------------------
Nathan Mueller wrote:
Ok, I tested this out with TLSv1 and it worked fine. I found that the
same mistake was being made on the client side of things too so I
included a patch for that too.--Nate
Index: src/backend/libpq/be-secure.c =================================================================== RCS file: /s/postgresql-7.3.0/src/CVSROOT/postgresql-7.3.0/src/backend/- libpq/be-secure.c,v retrieving revision 1.1.1.1 diff -u -r1.1.1.1 be-secure.c --- src/backend/libpq/be-secure.c 2 Dec 2002 03:33:36 -0000 1.1.1.1 +++ src/backend/libpq/be-secure.c 10 Dec 2002 20:23:30 -0000 @@ -288,7 +288,8 @@ case SSL_ERROR_WANT_READ: break; case SSL_ERROR_SYSCALL: - elog(ERROR, "SSL SYSCALL error: %s", strerror(errno)); + if (n == -1) + elog(ERROR, "SSL SYSCALL error: %s", strerror(errno)); break; case SSL_ERROR_SSL: elog(ERROR, "SSL error: %s", SSLerrmessage()); Index: src/interfaces/libpq/fe-secure.c =================================================================== RCS file: /s/postgresql-7.3.0/src/CVSROOT/postgresql-7.3.0/src/interfac- es/libpq/fe-secure.c,v retrieving revision 1.1.1.1 diff -u -r1.1.1.1 fe-secure.c --- src/interfaces/libpq/fe-secure.c 2 Dec 2002 03:33:51 -0000 1.1.1.1 +++ src/interfaces/libpq/fe-secure.c 10 Dec 2002 20:24:36 -0000 @@ -270,7 +270,8 @@ case SSL_ERROR_WANT_READ: break; case SSL_ERROR_SYSCALL: - printfPQExpBuffer(&conn->errorMessage, + if (n == -1) + printfPQExpBuffer(&conn->err- orMessage, libpq_g- ettext(- "SSL SYSCALL error: %s\n"), SOCK_- STRER- ROR(S- OCK_E- RRNO)- ); break; @@ -315,7 +316,8 @@ case SSL_ERROR_WANT_WRITE: break; case SSL_ERROR_SYSCALL: - printfPQExpBuffer(&conn->errorMessage, + if (n == -1) + printfPQExpBuffer(&conn->err- orMessage, libpq_g- ettext(- "SSL SYSCALL error: %s\n"), SOCK_- STRER- ROR(S- OCK_E- RRNO)- ); break;---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Ick, my email client wrapped that really weird. That's why the
double strerror is there too -- the first one should be part of the
previous line.
I've found that you can still get the server to crash in the other
error cases or when SSL_read/write returns -1. Since that happens
whenever you try to connect with a pre-7.3 client it's still a pretty
big deal. I'll keep looking at it but but I don't know how much further
I'll be able to get. It might be a good time to get the author involved
if at all possible.
--Nate
Import Notes
Resolved by subject fallback
I will manually apply this.
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
I will try to apply it within the next 48 hours.
---------------------------------------------------------------------------
Nathan Mueller wrote:
Ok, I tested this out with TLSv1 and it worked fine. I found that the
same mistake was being made on the client side of things too so I
included a patch for that too.--Nate
Index: src/backend/libpq/be-secure.c =================================================================== RCS file: /s/postgresql-7.3.0/src/CVSROOT/postgresql-7.3.0/src/backend/- libpq/be-secure.c,v retrieving revision 1.1.1.1 diff -u -r1.1.1.1 be-secure.c --- src/backend/libpq/be-secure.c 2 Dec 2002 03:33:36 -0000 1.1.1.1 +++ src/backend/libpq/be-secure.c 10 Dec 2002 20:23:30 -0000 @@ -288,7 +288,8 @@ case SSL_ERROR_WANT_READ: break; case SSL_ERROR_SYSCALL: - elog(ERROR, "SSL SYSCALL error: %s", strerror(errno)); + if (n == -1) + elog(ERROR, "SSL SYSCALL error: %s", strerror(errno)); break; case SSL_ERROR_SSL: elog(ERROR, "SSL error: %s", SSLerrmessage()); Index: src/interfaces/libpq/fe-secure.c =================================================================== RCS file: /s/postgresql-7.3.0/src/CVSROOT/postgresql-7.3.0/src/interfac- es/libpq/fe-secure.c,v retrieving revision 1.1.1.1 diff -u -r1.1.1.1 fe-secure.c --- src/interfaces/libpq/fe-secure.c 2 Dec 2002 03:33:51 -0000 1.1.1.1 +++ src/interfaces/libpq/fe-secure.c 10 Dec 2002 20:24:36 -0000 @@ -270,7 +270,8 @@ case SSL_ERROR_WANT_READ: break; case SSL_ERROR_SYSCALL: - printfPQExpBuffer(&conn->errorMessage, + if (n == -1) + printfPQExpBuffer(&conn->err- orMessage, libpq_g- ettext(- "SSL SYSCALL error: %s\n"), SOCK_- STRER- ROR(S- OCK_E- RRNO)- ); break; @@ -315,7 +316,8 @@ case SSL_ERROR_WANT_WRITE: break; case SSL_ERROR_SYSCALL: - printfPQExpBuffer(&conn->errorMessage, + if (n == -1) + printfPQExpBuffer(&conn->err- orMessage, libpq_g- ettext(- "SSL SYSCALL error: %s\n"), SOCK_- STRER- ROR(S- OCK_E- RRNO)- ); break;---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Nathan Mueller wrote:
Ok, I tested this out with TLSv1 and it worked fine. I found that the
same mistake was being made on the client side of things too so I
included a patch for that too.
OK, attached is the patch that I applied. It does strerror() but no
elog(ERROR) on the server side if SSL_get_error() returns
SSL_ERROR_SYSCALL and SSL_read() returns 0 rather than -1. This logic
matches the SSL_get_error() manual page.
I found a few cases you missed.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Attachments:
/bjm/difftext/plainDownload+16-16
Ok, I think I've gotten this figured out now. I saw this comment in pqcomm.c,
switched the ERROR logs to COMMERROR logs and it all works. I've attached a
patch to be-secure.c that fixes all my problems. Hopefully this is the right fix.
--Nate
/*
* Careful: an elog() that tries to write to the client would
* cause recursion to here, leading to stack overflow and core
* dump! This message must go *only* to the postmaster log.
*/
Attachments:
sslpatchapplication/octet-stream; name=sslpatchDownload+4-4
Import Notes
Resolved by subject fallback
I checked through the rest of the SSL code and caught a few more cases.
The strange part here is that COMMERROR is for cases where the client
might not exist, and obviously you are seeing that. The problem is that
these errors can happen when the client _does_ exist too. Not sure how
to handle that, but let me get this fix in now.
Attached patch applied to HEAD and 7.3.X. Thanks.
---------------------------------------------------------------------------
Nathan Mueller wrote:
Ok, I think I've gotten this figured out now. I saw this comment in pqcomm.c,
switched the ERROR logs to COMMERROR logs and it all works. I've attached a
patch to be-secure.c that fixes all my problems. Hopefully this is the right fix.--Nate
/*
* Careful: an elog() that tries to write to the client would
* cause recursion to here, leading to stack overflow and core
* dump! This message must go *only* to the postmaster log.
*/
[ Attachment, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073