SSL renegotiation and other related woes

Started by Andres Freundalmost 11 years ago9 messages
#1Andres Freund
andres@2ndquadrant.com

Hi,

When working on getting rid of ImmediateInterruptOK I wanted to verify
that ssl still works correctly. Turned out it didn't. But neither did it
in master.

Turns out there's two major things we do wrong:

1) We ignore the rule that once called and returning
SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
again with the same parameters. Unfortunately that rule doesn't mean
just that the same parameters have to be passed in, but also that we
can't just constantly switch between _read()/write(). Especially
nonblocking backend code (i.e. walsender) and the whole frontend code
violate this rule.

2) We start renegotiations in be_tls_write() while in nonblocking mode,
but don't properly retry to handle socket readyness. There's a loop
that retries handshakes twenty times (???), but what actually is
needed is to call SSL_get_error() and ensure that there's actually
data available.

2) is easy enough to fix, but 1) is pretty hard. Before anybody says
that 1) isn't an important rule: It reliably causes connection aborts
within a couple renegotiations. The higher the latency the higher the
likelihood of aborts. Even locally it doesn't take very long to
abort. Errors usually are something like "SSL connection has been closed
unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
of other similar messages. There's a couple reports of those in the
archives and I've seen many more in client logs.

As far as I can see the only realistic way to fix 1) is to change both
frontend and backend code to:
a) Always check for socket read/writeability before calling
SSL_read/write() when in nonblocking mode. That's a bit annoying
because it nearly doubles the amount of syscalls we do or client
communication, but I can't really se an alternative. That allows us
to avoid waiting inside after a WANT_READ/WRITE, or havin to setup a
larger state machine that keeps track what we tried last.

b) When SSL_read/write nonetheless returns WANT_READ/WRITE, even though
we tested for read/writeability, we're very likely doing
renegotiation. In that case we'll just have to block. There's already
code that busy loops (and thus waits) in the frontend
(c.f. pgtls_read's WANT_WRITE case, triggered during reneg). We can't
just return immediately to the upper layers as we'd otherwise likely
violate the rule about calling ssl with the same parameters again.

c) Add a somewhat hacky optimization whereas we allow to break out of a
WANT_READ condition in a nonblocking socket when ssl->state ==
SSL_ST_OK. That's the cases where it actually, at least by my reading
of the unreadable ssl code, safe to not wait. That case is somewhat
important because we otherwise can end up waiting on both sides due
to b), even when nonblocking calls where actually made. That
condition essentially means that we'll only block if renegotiation or
partial reads are in progress. Afaics at least.

d) Remove the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER hack - we don't
actually need it anymore.

These errors are much less frequent when using a plain frontend
(e.g. psql/pgbench) because they don't use copy both stuff - the way
these clients use the FE/BE protocol there's essentially natural
synchronization points where nothing but renegotiation happens. With
walsender (or pipelined queries!) both sides can write at the same time.

My testcase for this is just to setup a server with a low
ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and
receive data via pg_receivexlog -n. Usually it'll error out quickly.

I've done a preliminary implementation of the above steps and it
survives transferring 25GB of WAL via the replication protocol with a
ssl_renegotiation_limit=100kB - previously it failed much earlier.

Does anybody have a neater way to tackle this? I'm not happy about this
solution, but I really can't think of anything better (save ditching
openssl maybe). I'm willing to clean up my hacked up fix for this, but
not if we can't find agreement on the approach.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#1)
1 attachment(s)
Re: SSL renegotiation and other related woes

On 2015-01-26 11:14:05 +0100, Andres Freund wrote:

My testcase for this is just to setup a server with a low
ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and
receive data via pg_receivexlog -n. Usually it'll error out quickly.

...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

wal.sqltext/plain; charset=us-asciiDownload
#3Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andres Freund (#1)
1 attachment(s)
Re: SSL renegotiation and other related woes

On 01/26/2015 12:14 PM, Andres Freund wrote:

Hi,

When working on getting rid of ImmediateInterruptOK I wanted to verify
that ssl still works correctly. Turned out it didn't. But neither did it
in master.

Turns out there's two major things we do wrong:

1) We ignore the rule that once called and returning
SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
again with the same parameters. Unfortunately that rule doesn't mean
just that the same parameters have to be passed in, but also that we
can't just constantly switch between _read()/write(). Especially
nonblocking backend code (i.e. walsender) and the whole frontend code
violate this rule.

I don't buy that. Googling around, and looking at the man pages, I just
don't see anything to support that claim. I believe it is supported to
switch between reads and writes.

2) We start renegotiations in be_tls_write() while in nonblocking mode,
but don't properly retry to handle socket readyness. There's a loop
that retries handshakes twenty times (???), but what actually is
needed is to call SSL_get_error() and ensure that there's actually
data available.

Yeah, that's just crazy.

Actually, I don't think we need to call SSL_do_handshake() at all. At
least on modern OpenSSL versions. OpenSSL internally uses a state
machine, and SSL_read() and SSL_write() calls will complete the
handshake just as well.

2) is easy enough to fix, but 1) is pretty hard. Before anybody says
that 1) isn't an important rule: It reliably causes connection aborts
within a couple renegotiations. The higher the latency the higher the
likelihood of aborts. Even locally it doesn't take very long to
abort. Errors usually are something like "SSL connection has been closed
unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
of other similar messages. There's a couple reports of those in the
archives and I've seen many more in client logs.

Yeah, I can reproduce that. There's clearly something wrong.

I believe this is an OpenSSL bug. I traced the "unexpected message"
error to this code in OpenSSL's s3_read_bytes() function:

case SSL3_RT_APPLICATION_DATA:
/*
* At this point, we were expecting handshake data, but have
* application data. If the library was running inside ssl3_read()
* (i.e. in_read_app_data is set) and it makes sense to read
* application data at this point (session renegotiation not yet
* started), we will indulge it.
*/
if (s->s3->in_read_app_data &&
(s->s3->total_renegotiations != 0) &&
(((s->state & SSL_ST_CONNECT) &&
(s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
(s->state <= SSL3_ST_CR_SRVR_HELLO_A)
) || ((s->state & SSL_ST_ACCEPT) &&
(s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
(s->state >= SSL3_ST_SR_CLNT_HELLO_A)
)
)) {
s->s3->in_read_app_data = 2;
return (-1);
} else {
al = SSL_AD_UNEXPECTED_MESSAGE;
SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
goto f_err;
}

So that quite clearly throws an error, *unless* the original application
call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
SSL_write() is called when it's in renegotiation. I think that's
OpenSSL's fault and it should "indulge" the application data message
even in SSL_write().

Can we work-around that easily? I believe we can. The crucial part is
that we mustn't let SSL_write() to process any incoming application
data. We can achieve that if we always call SSL_read() to drain such
data, before calling SSL_write(). But that's not quite enough; if we're
in renegotiation, SSL_write() might still try to read more data from the
socket that has arrived after the SSL_read() call. Fortunately, we can
easily prevent that by hacking pqsecure_raw_read() to always return
EWOULDBLOCK, when it's called through SSL_write().

The attached patch does that. I've been running your pg_receivexlog test
case with this for about 15 minutes without any errors now, with
ssl_renegotiation_limit=50kB, while before it errored out within seconds.

In theory, I guess we should do similar hacks in the server, and always
call SSL_read() before SSL_write(), but it seems to work without it. Or
maybe not; OpenSSL server and client code is not symmetric, so perhaps
it works in server mode without that.

PS. The server code started renegotiation when it's 1kB from reaching
ssl_renegotiation_limit. I increased that to 32kB, because in testing, I
got some "SSL failed to renegotiate connection before limit expired"
errors in testing before I did that.

PPS. I did all testing of this patch with my sleeping logic
simplification patch applied
(/messages/by-id/54D3821E.9060004@vmware.com). It
applies without it too, and I don't think it matters, but I thought I'd
mention it.

- Heikki

Attachments:

0001-SSL-renegotiation-fixes.patchapplication/x-patch; name=0001-SSL-renegotiation-fixes.patchDownload
From 4b1c8ef107dc195c0b4a493d72c6ffe97077318c Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 5 Feb 2015 22:44:58 +0200
Subject: [PATCH 1/1] SSL renegotiation fixes.

Remove the ugly "retry twenty times" from server renegotiation code. In
fact, calling SSL_do_handshake() after SSL_renegotiate() is not necessary
at all. SSL_read() and SSL_write() will perform the handshake as well.

In the client-side, avoid letting SSL_write() to read data. That sometimes
makes OpenSSL throw "unexpected message" and other errors, if renegotiation
is happening. Before calling SSL_write(), set a flag that prevents
pqsecure_raw_read() from reading any more data from the socket, and call
SSL_read() to drain any remaining data from OpenSSL's buffer. That makes
ensures that SSL_read() always handles all reads, which seems to be
necessary to make renegotiations robust with OpenSSL.

Also start renegotiation a little bit earlier than before. We used to start
it 1kB before reaching ssl_renegotiation_limit. That's too little, there can
easily be 1kB of data in-flight from the client, before it gets our
renegotiation request.
---
 src/backend/libpq/be-secure-openssl.c    | 25 ++-----------------------
 src/interfaces/libpq/fe-misc.c           | 16 +++++++++++++++-
 src/interfaces/libpq/fe-secure-openssl.c |  1 -
 src/interfaces/libpq/fe-secure.c         | 15 +++++++++++++++
 src/interfaces/libpq/libpq-int.h         |  3 +++
 5 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 39587e8..e847f41 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -576,11 +576,11 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
 	/*
 	 * If SSL renegotiations are enabled and we're getting close to the
 	 * limit, start one now; but avoid it if there's one already in
-	 * progress.  Request the renegotiation 1kB before the limit has
+	 * progress.  Request the renegotiation 32kB before the limit has
 	 * actually expired.
 	 */
 	if (ssl_renegotiation_limit && !in_ssl_renegotiation &&
-		port->count > (ssl_renegotiation_limit - 1) * 1024L)
+		port->count > (ssl_renegotiation_limit - 32) * 1024L)
 	{
 		in_ssl_renegotiation = true;
 
@@ -601,27 +601,6 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
 			ereport(COMMERROR,
 					(errcode(ERRCODE_PROTOCOL_VIOLATION),
 					 errmsg("SSL failure during renegotiation start")));
-		else
-		{
-			int			retries;
-
-			/*
-			 * A handshake can fail, so be prepared to retry it, but only
-			 * a few times.
-			 */
-			for (retries = 0;; retries++)
-			{
-				if (SSL_do_handshake(port->ssl) > 0)
-					break;	/* done */
-				ereport(COMMERROR,
-						(errcode(ERRCODE_PROTOCOL_VIOLATION),
-						 errmsg("SSL handshake failure on renegotiation, retrying")));
-				if (retries >= 20)
-					ereport(FATAL,
-							(errcode(ERRCODE_PROTOCOL_VIOLATION),
-							 errmsg("could not complete SSL handshake on renegotiation, too many failures")));
-			}
-		}
 	}
 
 	errno = 0;
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 945e283..4c47821 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -834,7 +834,7 @@ pqSendSome(PGconn *conn, int len)
 {
 	char	   *ptr = conn->outBuffer;
 	int			remaining = conn->outCount;
-	int			result = 0;
+	int			result;
 
 	if (conn->sock == PGINVALID_SOCKET)
 	{
@@ -845,11 +845,24 @@ pqSendSome(PGconn *conn, int len)
 		return -1;
 	}
 
+#ifdef USE_OPENSSL
+	if (conn->ssl)
+	{
+		block_reads = true;
+		result = pqReadData(conn);
+		block_reads = false;
+		if (result < 0)
+			return -1;
+	}
+#endif
+
 	/* while there's still data to send */
+	result = 0;
 	while (len > 0)
 	{
 		int			sent;
 
+		block_reads = true;
 #ifndef WIN32
 		sent = pqsecure_write(conn, ptr, len);
 #else
@@ -861,6 +874,7 @@ pqSendSome(PGconn *conn, int len)
 		 */
 		sent = pqsecure_write(conn, ptr, Min(len, 65536));
 #endif
+		block_reads = false;
 
 		if (sent < 0)
 		{
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..38e7071 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -100,7 +100,6 @@ static long win32_ssl_create_mutex = 0;
 #endif
 #endif   /* ENABLE_THREAD_SAFETY */
 
-
 /* ------------------------------------------------------------ */
 /*			 Procedures common to all secure sessions			*/
 /* ------------------------------------------------------------ */
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index 57c572b..aac405b 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -124,6 +124,9 @@ struct sigpipe_info
 #define RESTORE_SIGPIPE(conn, spinfo)
 #endif   /* WIN32 */
 
+bool block_reads = false;
+bool block_writes = false;
+
 /* ------------------------------------------------------------ */
 /*			 Procedures common to all secure sessions			*/
 /* ------------------------------------------------------------ */
@@ -227,6 +230,12 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 	int			result_errno = 0;
 	char		sebuf[256];
 
+	if (block_reads)
+	{
+		errno = EWOULDBLOCK;
+		return -1;
+	}
+
 	n = recv(conn->sock, ptr, len, 0);
 
 	if (n < 0)
@@ -307,6 +316,12 @@ pqsecure_raw_write(PGconn *conn, const void *ptr, size_t len)
 
 	DECLARE_SIGPIPE_INFO(spinfo);
 
+	if (block_writes)
+	{
+		errno = EWOULDBLOCK;
+		return -1;
+	}
+
 #ifdef MSG_NOSIGNAL
 	if (conn->sigpipe_flag)
 		flags |= MSG_NOSIGNAL;
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 008fd67..ce55411 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -618,6 +618,9 @@ extern int	pqWriteReady(PGconn *conn);
 
 /* === in fe-secure.c === */
 
+extern bool block_reads;
+extern bool block_writes;
+
 extern int	pqsecure_initialize(PGconn *);
 extern void pqsecure_destroy(void);
 extern PostgresPollingStatusType pqsecure_open_client(PGconn *);
-- 
2.1.4

#4Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Heikki Linnakangas (#3)
1 attachment(s)
Re: SSL renegotiation and other related woes

On 02/05/2015 11:03 PM, Heikki Linnakangas wrote:

On 01/26/2015 12:14 PM, Andres Freund wrote:
Can we work-around that easily? I believe we can. The crucial part is
that we mustn't let SSL_write() to process any incoming application
data. We can achieve that if we always call SSL_read() to drain such
data, before calling SSL_write(). But that's not quite enough; if we're
in renegotiation, SSL_write() might still try to read more data from the
socket that has arrived after the SSL_read() call. Fortunately, we can
easily prevent that by hacking pqsecure_raw_read() to always return
EWOULDBLOCK, when it's called through SSL_write().

The attached patch does that. I've been running your pg_receivexlog test
case with this for about 15 minutes without any errors now, with
ssl_renegotiation_limit=50kB, while before it errored out within seconds.

Here is a simplified version of this patch. It seems to be enough to not
let SSL_write() to read any data from the socket. No need to call
SSL_read() first, and the server-side changes I made were only needed
because of the other patch I had applied.

Thoughts? Can you reproduce any errors with this?

In theory, I guess we should do similar hacks in the server, and always
call SSL_read() before SSL_write(), but it seems to work without it. Or
maybe not; OpenSSL server and client code is not symmetric, so perhaps
it works in server mode without that.

Not included in this patch, but I believe we apply a similar patch to
the server-side too.

- Heikki

Attachments:

0001-Fix-sslv3-alert-unexpected-message-errors-in-SSL-ren.patchapplication/x-patch; name=0001-Fix-sslv3-alert-unexpected-message-errors-in-SSL-ren.patchDownload
From b78bfbb70127cbe3f360458eb2a97dcc28194fbc Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 5 Feb 2015 22:44:58 +0200
Subject: [PATCH 1/1] Fix "sslv3 alert unexpected message" errors in SSL
 renegotiation.

There seems to be a bug in OpenSSL renegotiation, so that when SSL_write()
needs to read data to complete the handshake, but it receives application
data instead, it gets confused and throws an "unexpected message" error. To
work around that, don't let SSL_write() to read data from the socket, even
if some is available. Arrange so that SSL_read() processes all incoming
messages instead.
---
 src/interfaces/libpq/fe-misc.c           | 14 +++++++++-----
 src/interfaces/libpq/fe-secure-openssl.c | 20 ++++++++++++++++++++
 src/interfaces/libpq/fe-secure.c         |  9 +++++++++
 src/interfaces/libpq/libpq-int.h         |  2 ++
 4 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 945e283..9dd02d5 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -920,11 +920,15 @@ pqSendSome(PGconn *conn, int len)
 			 * to clear the channel eventually because it's blocked trying to
 			 * send data to us.  (This can happen when we are sending a large
 			 * amount of COPY data, and the server has generated lots of
-			 * NOTICE responses.)  To avoid a deadlock situation, we must be
-			 * prepared to accept and buffer incoming data before we try
-			 * again.  Furthermore, it is possible that such incoming data
-			 * might not arrive until after we've gone to sleep.  Therefore,
-			 * we wait for either read ready or write ready.
+			 * NOTICE responses.)  Another scenario is that SSL renegotiation
+			 * is in progress, and the SSL library needs to read a message
+			 * to complete the handshake, before it can send more data.
+			 *
+			 * To avoid a deadlock situation, we must be prepared to accept
+			 * and buffer incoming data before we try again.  Furthermore, it
+			 * is possible that such incoming data might not arrive until
+			 * after we've gone to sleep.  Therefore, we wait for either read
+			 * ready or write ready.
 			 */
 			if (pqReadData(conn) < 0)
 			{
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..2e75f3c 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -319,8 +319,28 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 	char		sebuf[256];
 	int			err;
 
+	/*
+	 * To work-around an issue with OpenSSL and renegotiation, we don't
+	 * let SSL_write() to read any incoming data.
+	 *
+	 * If SSL_write() is called, and renegotiation has just been inititated,
+	 * SSL_write() might try to read from the socket to complete the
+	 * handshake. If there was some application data in-flight, it might
+	 * receive the application data instead. That confuses it, and it throws
+	 * an "sslv3 alert unexpected message" error.
+	 *
+	 * We avoid that setting a kill-switch, pqsecure_block_raw_read, which
+	 * tells pqsecure_raw_read() to not return any data to the caller, even
+	 * if some is available.
+	 *
+	 * NB: This relies on the calling code to call pqsecure_read(), completing
+	 * the renegotiation handshake, if pqsecure_write() returns 0. Otherwise
+	 * we'll never make progress.
+	 */
 	SOCK_ERRNO_SET(0);
+	pqsecure_block_raw_read = true;
 	n = SSL_write(conn->ssl, ptr, len);
+	pqsecure_block_raw_read = false;
 	err = SSL_get_error(conn->ssl, n);
 	switch (err)
 	{
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index 57c572b..11d0f8b 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -124,6 +124,8 @@ struct sigpipe_info
 #define RESTORE_SIGPIPE(conn, spinfo)
 #endif   /* WIN32 */
 
+bool pqsecure_block_raw_read = false;
+
 /* ------------------------------------------------------------ */
 /*			 Procedures common to all secure sessions			*/
 /* ------------------------------------------------------------ */
@@ -227,6 +229,13 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 	int			result_errno = 0;
 	char		sebuf[256];
 
+	/* If the kill-switch is set, pretend that there is no data. */
+	if (pqsecure_block_raw_read)
+	{
+		errno = EWOULDBLOCK;
+		return -1;
+	}
+
 	n = recv(conn->sock, ptr, len, 0);
 
 	if (n < 0)
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 008fd67..1a94d16 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -618,6 +618,8 @@ extern int	pqWriteReady(PGconn *conn);
 
 /* === in fe-secure.c === */
 
+extern bool pqsecure_block_raw_read;
+
 extern int	pqsecure_initialize(PGconn *);
 extern void pqsecure_destroy(void);
 extern PostgresPollingStatusType pqsecure_open_client(PGconn *);
-- 
2.1.4

#5Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Heikki Linnakangas (#4)
Re: SSL renegotiation and other related woes

Heikki Linnakangas wrote:

Can we work-around that easily? I believe we can. The crucial part is
that we mustn't let SSL_write() to process any incoming application
data. We can achieve that if we always call SSL_read() to drain such
data, before calling SSL_write(). But that's not quite enough; if we're
in renegotiation, SSL_write() might still try to read more data from the
socket that has arrived after the SSL_read() call. Fortunately, we can
easily prevent that by hacking pqsecure_raw_read() to always return
EWOULDBLOCK, when it's called through SSL_write().

The attached patch does that. I've been running your pg_receivexlog test
case with this for about 15 minutes without any errors now, with
ssl_renegotiation_limit=50kB, while before it errored out within seconds.

Here is a simplified version of this patch. It seems to be enough to not
let SSL_write() to read any data from the socket. No need to call
SSL_read() first, and the server-side changes I made were only needed
because of the other patch I had applied.

Thoughts? Can you reproduce any errors with this?

In theory, I guess we should do similar hacks in the server, and always
call SSL_read() before SSL_write(), but it seems to work without it. Or
maybe not; OpenSSL server and client code is not symmetric, so perhaps
it works in server mode without that.

Not included in this patch, but I believe we apply a similar patch to
the server-side too.

It seems to me that there is a bug in the server part of your first patch
(not included in your second patch):

--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -576,11 +576,11 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
    /*
     * If SSL renegotiations are enabled and we're getting close to the
     * limit, start one now; but avoid it if there's one already in
-    * progress.  Request the renegotiation 1kB before the limit has
+    * progress.  Request the renegotiation 32kB before the limit has
     * actually expired.
     */
    if (ssl_renegotiation_limit && !in_ssl_renegotiation &&
-       port->count > (ssl_renegotiation_limit - 1) * 1024L)
+       port->count > (ssl_renegotiation_limit - 32) * 1024L)
    {
        in_ssl_renegotiation = true;

Experiment shows that for values of ssl_renegotiation_limit less than 32,
no renegotiation takes place at all with this change applied.
port->count is an unsigned long.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andres Freund
andres@2ndquadrant.com
In reply to: Heikki Linnakangas (#4)
Re: SSL renegotiation and other related woes

On 2015-02-11 14:54:03 +0200, Heikki Linnakangas wrote:

Thoughts? Can you reproduce any errors with this?

I'm on battery right now, so I can't really test much.

Did you test having an actual standby instead of pg_receivexlog? I saw
some slightly different errors there.

Does this patch alone work for you or did you test it together with the
earlier one to fix the renegotiation handling server side when
nonblocking? Because that frequently seemed to error out for me, at
least over actual network. A latch loop + checking for the states seemed
to fix that for me.

I'm pretty darn happy that you seem to have found a much less ugly
solution than mine. Although it's only pretty by comparison ;)

+	/*
+	 * To work-around an issue with OpenSSL and renegotiation, we don't
+	 * let SSL_write() to read any incoming data.

superflous "to".

+	 * NB: This relies on the calling code to call pqsecure_read(), completing
+	 * the renegotiation handshake, if pqsecure_write() returns 0. Otherwise
+	 * we'll never make progress.
+	 */

I think this should a) refer to the fact that pqSendSome does that in
blocking scenarios b) expand the comment around the pqReadData to
reference that fact.

How are we going to deal with callers using libpq in nonblocking mode. A
client doing a large COPY FROM STDIN to the server using a nonblocking
connection (not a stupid thing to do) could hit this IIUC.

Could we perhaps call SSL_peek() when SSL_write() hits WANTS_READ? In
combination with your fix I think that should fix the possibility of
such a deadlock? Especially on the serverside where the socket most of
the time is is in, although emulated, blocking mode that seems easier -
we can just retry afterwards.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andres Freund
andres@2ndquadrant.com
In reply to: Heikki Linnakangas (#3)
Re: SSL renegotiation and other related woes

Hi,

On 2015-02-05 23:03:02 +0200, Heikki Linnakangas wrote:

On 01/26/2015 12:14 PM, Andres Freund wrote:

Hi,

When working on getting rid of ImmediateInterruptOK I wanted to verify
that ssl still works correctly. Turned out it didn't. But neither did it
in master.

Turns out there's two major things we do wrong:

1) We ignore the rule that once called and returning
SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
again with the same parameters. Unfortunately that rule doesn't mean
just that the same parameters have to be passed in, but also that we
can't just constantly switch between _read()/write(). Especially
nonblocking backend code (i.e. walsender) and the whole frontend code
violate this rule.

I don't buy that. Googling around, and looking at the man pages, I just
don't see anything to support that claim. I believe it is supported to
switch between reads and writes.

There's at least two implementations that seem to have workaround to
achieve something like that. Apache's mod_ssl and the ssl layer for
libevent.

2) We start renegotiations in be_tls_write() while in nonblocking mode,
but don't properly retry to handle socket readyness. There's a loop
that retries handshakes twenty times (???), but what actually is
needed is to call SSL_get_error() and ensure that there's actually
data available.

Yeah, that's just crazy.

Actually, I don't think we need to call SSL_do_handshake() at all. At least
on modern OpenSSL versions. OpenSSL internally uses a state machine, and
SSL_read() and SSL_write() calls will complete the handshake just as well.

Some blaming seems to show that that's been the case for a long time.

2) is easy enough to fix, but 1) is pretty hard. Before anybody says
that 1) isn't an important rule: It reliably causes connection aborts
within a couple renegotiations. The higher the latency the higher the
likelihood of aborts. Even locally it doesn't take very long to
abort. Errors usually are something like "SSL connection has been closed
unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
of other similar messages. There's a couple reports of those in the
archives and I've seen many more in client logs.

Yeah, I can reproduce that. There's clearly something wrong.

I believe this is an OpenSSL bug. I traced the "unexpected message" error to
this code in OpenSSL's s3_read_bytes() function:

Yep, I got to that as well.

case SSL3_RT_APPLICATION_DATA:
/*
* At this point, we were expecting handshake data, but have
* application data. If the library was running inside ssl3_read()
* (i.e. in_read_app_data is set) and it makes sense to read
* application data at this point (session renegotiation not yet
* started), we will indulge it.
*/
if (s->s3->in_read_app_data &&
(s->s3->total_renegotiations != 0) &&
(((s->state & SSL_ST_CONNECT) &&
(s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
(s->state <= SSL3_ST_CR_SRVR_HELLO_A)
) || ((s->state & SSL_ST_ACCEPT) &&
(s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
(s->state >= SSL3_ST_SR_CLNT_HELLO_A)
)
)) {
s->s3->in_read_app_data = 2;
return (-1);
} else {
al = SSL_AD_UNEXPECTED_MESSAGE;
SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
goto f_err;
}

So that quite clearly throws an error, *unless* the original application
call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
SSL_write() is called when it's in renegotiation. I think that's OpenSSL's
fault and it should "indulge" the application data message even in
SSL_write().

That'd be nice, but I think there were multiple reports to them about
this and there wasn't much of a response. Maybe it's time to try again.

In theory, I guess we should do similar hacks in the server, and always call
SSL_read() before SSL_write(), but it seems to work without it. Or maybe
not; OpenSSL server and client code is not symmetric, so perhaps it works in
server mode without that.

I think we should do it on the server side - got some server side errors
when I cranked up the pg_receivexlog's status interval and set the wal
sender timeout very low.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andres Freund (#6)
Re: SSL renegotiation and other related woes

On 02/12/2015 01:33 AM, Andres Freund wrote:

On 2015-02-11 14:54:03 +0200, Heikki Linnakangas wrote:

Thoughts? Can you reproduce any errors with this?

I'm on battery right now, so I can't really test much.

Did you test having an actual standby instead of pg_receivexlog? I saw
some slightly different errors there.

Does this patch alone work for you or did you test it together with the
earlier one to fix the renegotiation handling server side when
nonblocking? Because that frequently seemed to error out for me, at
least over actual network. A latch loop + checking for the states seemed
to fix that for me.

This patch alone worked for me.

+	 * NB: This relies on the calling code to call pqsecure_read(), completing
+	 * the renegotiation handshake, if pqsecure_write() returns 0. Otherwise
+	 * we'll never make progress.
+	 */

I think this should a) refer to the fact that pqSendSome does that in
blocking scenarios b) expand the comment around the pqReadData to
reference that fact.

How are we going to deal with callers using libpq in nonblocking mode. A
client doing a large COPY FROM STDIN to the server using a nonblocking
connection (not a stupid thing to do) could hit this IIUC.

Hmm, that's problematic even without SSL. If you do a large COPY FROM
STDIN, and the server sends a lot of NOTICEs, the NOTICEs will be
accumulated in the TCP send and receive buffers until they fill up.
After that, the server will block on the send, and will stop reading the
tuple data the client sends, and you get a deadlock. In blocking mode,
pqSendSome calls pqReadData to avoid that.

I think pqSendSome should call pqReadData() after getting EWOULDBLOCK,
even in non-blocking mode. It won't completely prevent the problem: it's
possible that the receive buffer fills up, but the application doesn't
call pqSendSome() until the socket becomes writeable, which won't happen
until the client drains the incoming data is drained and unblocks the
server. But it greatly reduces the risk. In practice that will solve the
problem for SSL renegotiations.

We should also document that the application should always wait for the
socket readable condition, in addition to writeable, and call
pqConsumeInput(). That fixes the issue for good.

Could we perhaps call SSL_peek() when SSL_write() hits WANTS_READ? In
combination with your fix I think that should fix the possibility of
such a deadlock? Especially on the serverside where the socket most of
the time is is in, although emulated, blocking mode that seems easier -
we can just retry afterwards.

Hmm. Not sure what the semantics of SSL_peek() are. At least we'd need
to call it with a large enough buffer that it would read all the
incoming data from the OS buffer. I think it would be safer to just call
SSL_read(). Both the server and libpq have an input buffer that you can
easily read into at any time.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andres Freund (#7)
3 attachment(s)
Re: SSL renegotiation and other related woes

On 02/12/2015 01:41 AM, Andres Freund wrote:

Hi,

On 2015-02-05 23:03:02 +0200, Heikki Linnakangas wrote:

On 01/26/2015 12:14 PM, Andres Freund wrote:

Hi,

When working on getting rid of ImmediateInterruptOK I wanted to verify
that ssl still works correctly. Turned out it didn't. But neither did it
in master.

Turns out there's two major things we do wrong:

1) We ignore the rule that once called and returning
SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
again with the same parameters. Unfortunately that rule doesn't mean
just that the same parameters have to be passed in, but also that we
can't just constantly switch between _read()/write(). Especially
nonblocking backend code (i.e. walsender) and the whole frontend code
violate this rule.

I don't buy that. Googling around, and looking at the man pages, I just
don't see anything to support that claim. I believe it is supported to
switch between reads and writes.

There's at least two implementations that seem to have workaround to
achieve something like that. Apache's mod_ssl and the ssl layer for
libevent.

2) We start renegotiations in be_tls_write() while in nonblocking mode,
but don't properly retry to handle socket readyness. There's a loop
that retries handshakes twenty times (???), but what actually is
needed is to call SSL_get_error() and ensure that there's actually
data available.

Yeah, that's just crazy.

Actually, I don't think we need to call SSL_do_handshake() at all. At least
on modern OpenSSL versions. OpenSSL internally uses a state machine, and
SSL_read() and SSL_write() calls will complete the handshake just as well.

Some blaming seems to show that that's been the case for a long time.

2) is easy enough to fix, but 1) is pretty hard. Before anybody says
that 1) isn't an important rule: It reliably causes connection aborts
within a couple renegotiations. The higher the latency the higher the
likelihood of aborts. Even locally it doesn't take very long to
abort. Errors usually are something like "SSL connection has been closed
unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
of other similar messages. There's a couple reports of those in the
archives and I've seen many more in client logs.

Yeah, I can reproduce that. There's clearly something wrong.

I believe this is an OpenSSL bug. I traced the "unexpected message" error to
this code in OpenSSL's s3_read_bytes() function:

Yep, I got to that as well.

case SSL3_RT_APPLICATION_DATA:
/*
* At this point, we were expecting handshake data, but have
* application data. If the library was running inside ssl3_read()
* (i.e. in_read_app_data is set) and it makes sense to read
* application data at this point (session renegotiation not yet
* started), we will indulge it.
*/
if (s->s3->in_read_app_data &&
(s->s3->total_renegotiations != 0) &&
(((s->state & SSL_ST_CONNECT) &&
(s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
(s->state <= SSL3_ST_CR_SRVR_HELLO_A)
) || ((s->state & SSL_ST_ACCEPT) &&
(s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
(s->state >= SSL3_ST_SR_CLNT_HELLO_A)
)
)) {
s->s3->in_read_app_data = 2;
return (-1);
} else {
al = SSL_AD_UNEXPECTED_MESSAGE;
SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
goto f_err;
}

So that quite clearly throws an error, *unless* the original application
call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
SSL_write() is called when it's in renegotiation. I think that's OpenSSL's
fault and it should "indulge" the application data message even in
SSL_write().

That'd be nice, but I think there were multiple reports to them about
this and there wasn't much of a response. Maybe it's time to try again.

In theory, I guess we should do similar hacks in the server, and always call
SSL_read() before SSL_write(), but it seems to work without it. Or maybe
not; OpenSSL server and client code is not symmetric, so perhaps it works in
server mode without that.

I think we should do it on the server side - got some server side errors
when I cranked up the pg_receivexlog's status interval and set the wal
sender timeout very low.

I've not been able to reproduce any errors on the server side, with my
latest client-only patch. Not sure why. I would assume that the server
has the same problem, but perhaps there are some mitigating factors that
make it impossible or at least less likely there.

I'm planning to commit and backpatch the first two of the attached
patches. The first is essentially the same as what I've posted before,
with some minor cleanup. I realized that the hack can be isolated
completely in fe-secure-openssl.c by putting the "kill-switch" in the
custom BIO_read routine, instead of pqsecure_raw_read(), so I did that.
The second patch makes pqSendSome() call pqReadData() also in
non-blocking mode. Per discussion, that's necessary to avoid getting
stuck, if an application that uses non-blocking mode doesn't call
pqConsumeInput() between pqFlush() calls.

For the sake of completeness, the third one applies the same kill-switch
hack to the server, preventing SSL_write() from reading anything.
However, as it doesn't seem to actually be necessary, and it happens to
be a bit more messy to implement in the backend than in libpq, I'm not
going to commit that one.

- Heikki

Attachments:

0001-Fix-sslv3-alert-unexpected-message-errors-in-SSL-ren.patchapplication/x-patch; name=0001-Fix-sslv3-alert-unexpected-message-errors-in-SSL-ren.patchDownload
From 0c59577de27ec636026370ddf1ad9df1190883f5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 5 Feb 2015 22:44:58 +0200
Subject: [PATCH 1/3] Fix "sslv3 alert unexpected message" errors in SSL
 renegotiation.

There seems to be a bug in OpenSSL renegotiation, so that when SSL_write()
needs to read data to complete the handshake, but it receives application
data instead, it gets confused and throws an "unexpected message" error. To
work around that, don't let SSL_write() to read data from the socket, even
if some is available. Arrange so that SSL_read() processes all incoming
messages instead.

Reviewed by Andres Freund

diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 945e283..9dd02d5 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -920,11 +920,15 @@ pqSendSome(PGconn *conn, int len)
 			 * to clear the channel eventually because it's blocked trying to
 			 * send data to us.  (This can happen when we are sending a large
 			 * amount of COPY data, and the server has generated lots of
-			 * NOTICE responses.)  To avoid a deadlock situation, we must be
-			 * prepared to accept and buffer incoming data before we try
-			 * again.  Furthermore, it is possible that such incoming data
-			 * might not arrive until after we've gone to sleep.  Therefore,
-			 * we wait for either read ready or write ready.
+			 * NOTICE responses.)  Another scenario is that SSL renegotiation
+			 * is in progress, and the SSL library needs to read a message
+			 * to complete the handshake, before it can send more data.
+			 *
+			 * To avoid a deadlock situation, we must be prepared to accept
+			 * and buffer incoming data before we try again.  Furthermore, it
+			 * is possible that such incoming data might not arrive until
+			 * after we've gone to sleep.  Therefore, we wait for either read
+			 * ready or write ready.
 			 */
 			if (pqReadData(conn) < 0)
 			{
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..93b8184 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -78,6 +78,7 @@ static int my_sock_write(BIO *h, const char *buf, int size);
 static BIO_METHOD *my_BIO_s_socket(void);
 static int my_SSL_set_fd(PGconn *conn, int fd);
 
+static bool my_block_raw_read = false;
 
 static bool pq_init_ssl_lib = true;
 static bool pq_init_crypto_lib = true;
@@ -319,8 +320,28 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 	char		sebuf[256];
 	int			err;
 
+	/*
+	 * To work-around an issue with OpenSSL and renegotiation, don't let
+	 * SSL_write() read any incoming data.
+	 *
+	 * If SSL_write() is called, and renegotiation has just been inititated,
+	 * SSL_write() might try to read from the socket to complete the
+	 * handshake. If there was some application data in-flight, it might
+	 * receive the application data instead. That confuses it, and it throws
+	 * an "sslv3 alert unexpected message" error.
+	 *
+	 * We avoid that by setting a kill-switch, my_block_raw_read, which tells
+	 * my_sock_read() to not return any data to the caller, even if some is
+	 * available.
+	 *
+	 * NB: This relies on the calling code to call pqsecure_read(), completing
+	 * the renegotiation handshake, if pqsecure_write() returns 0. Otherwise
+	 * we'll never make progress.
+	 */
 	SOCK_ERRNO_SET(0);
+	my_block_raw_read = true;
 	n = SSL_write(conn->ssl, ptr, len);
+	my_block_raw_read = false;
 	err = SSL_get_error(conn->ssl, n);
 	switch (err)
 	{
@@ -1591,6 +1612,15 @@ my_sock_read(BIO *h, char *buf, int size)
 	int			res;
 	int			save_errno;
 
+	/* If the kill-switch is set, pretend that there is no data. */
+	if (my_block_raw_read)
+	{
+		errno = EWOULDBLOCK;
+		BIO_clear_retry_flags(h);
+		BIO_set_retry_read(h);
+		return -1;
+	}
+
 	res = pqsecure_raw_read((PGconn *) h->ptr, buf, size);
 	save_errno = errno;
 	BIO_clear_retry_flags(h);
-- 
2.1.4

0002-Also-drain-input-buffer-in-non-blocking-mode-if-send.patchapplication/x-patch; name=0002-Also-drain-input-buffer-in-non-blocking-mode-if-send.patchDownload
From b03488c0fb28b74a490f164b04705a24f759e779 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 12 Feb 2015 15:37:19 +0200
Subject: [PATCH 2/3] Also drain input buffer in non-blocking mode if send
 fails with EAGAIN.

In blocking mode, pqSendSome() calls pqReadData() to read any NOTICEs and
such that the server may have sent, before sleeping. That avoids a deadlock
e.g. in COPY FROM STDIN, because the server might block on sending the
NOTICEs and not receive our data until we read the NOTICEs. But
non-blocking mode has exactly the same issue, so let's call pqReadData() in
non-blocking mode too.

Reviewed by Andres Freund

diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 9dd02d5..f0ab6cd 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -905,16 +905,6 @@ pqSendSome(PGconn *conn, int len)
 			/*
 			 * We didn't send it all, wait till we can send more.
 			 *
-			 * If the connection is in non-blocking mode we don't wait, but
-			 * return 1 to indicate that data is still pending.
-			 */
-			if (pqIsnonblocking(conn))
-			{
-				result = 1;
-				break;
-			}
-
-			/*
 			 * There are scenarios in which we can't send data because the
 			 * communications channel is full, but we cannot expect the server
 			 * to clear the channel eventually because it's blocked trying to
@@ -935,6 +925,17 @@ pqSendSome(PGconn *conn, int len)
 				result = -1;	/* error message already set up */
 				break;
 			}
+
+			/*
+			 * If the connection is in non-blocking mode we don't wait, but
+			 * return 1 to indicate that data is still pending.
+			 */
+			if (pqIsnonblocking(conn))
+			{
+				result = 1;
+				break;
+			}
+
 			if (pqWait(TRUE, TRUE, conn))
 			{
 				result = -1;
-- 
2.1.4

0003-WIP-Do-the-same-block-raw-read-dance-in-backend.patchapplication/x-patch; name=0003-WIP-Do-the-same-block-raw-read-dance-in-backend.patchDownload
From 0cf438fca1d00d41f7358bd140346fa813d9ab1e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 12 Feb 2015 18:58:01 +0200
Subject: [PATCH 3/3] WIP: Do the same block-raw-read dance in backend.

This is more complicated than in the client, because the callers of
secure_write() will not automatically do a read, if the write fails.
Need to expose pq_recvbuf() and call it from secure_write(), which is
modularity violation.

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 37af6e4..4a20d80 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -93,6 +93,9 @@ static const char *SSLerrmessage(void);
 /* are we in the middle of a renegotiation? */
 static bool in_ssl_renegotiation = false;
 
+/* kill-switch to have my_sock_read pretend there's no data */
+static bool my_block_raw_read = false;
+
 static SSL_CTX *SSL_context = NULL;
 
 /* ------------------------------------------------------------ */
@@ -602,7 +605,9 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
 	}
 
 	errno = 0;
+	my_block_raw_read = true;
 	n = SSL_write(port->ssl, ptr, len);
+	my_block_raw_read = false;
 	err = SSL_get_error(port->ssl, n);
 	switch (err)
 	{
@@ -695,19 +700,28 @@ static BIO_METHOD my_bio_methods;
 static int
 my_sock_read(BIO *h, char *buf, int size)
 {
-	int			res = 0;
+	int			res;
 
-	if (buf != NULL)
+	if (buf == NULL)
+		return 0;	/* XXX: can this happen? */
+
+	/* If the kill-switch is set, pretend that there is no data. */
+	if (my_block_raw_read)
 	{
-		res = secure_raw_read(((Port *)h->ptr), buf, size);
+		errno = EWOULDBLOCK;
 		BIO_clear_retry_flags(h);
-		if (res <= 0)
+		BIO_set_retry_read(h);
+		return -1;
+	}
+
+	res = secure_raw_read(((Port *)h->ptr), buf, size);
+	BIO_clear_retry_flags(h);
+	if (res <= 0)
+	{
+		/* If we were interrupted, tell caller to retry */
+		if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
 		{
-			/* If we were interrupted, tell caller to retry */
-			if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
-			{
-				BIO_set_retry_read(h);
-			}
+			BIO_set_retry_read(h);
 		}
 	}
 
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 4e7acbe..e7bc52b 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -245,6 +245,9 @@ retry:
 			 * for the socket to become ready again.
 			 */
 		}
+		if (waitfor == WL_SOCKET_READABLE)
+			pq_recvbuf(false);
+
 		goto retry;
 	}
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 34efac4..bedc19f 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -843,11 +843,11 @@ socket_set_nonblocking(bool nonblocking)
 /* --------------------------------
  *		pq_recvbuf - load some bytes into the input buffer
  *
- *		returns 0 if OK, EOF if trouble
+ *		returns 0 if OK, EOF if	trouble
  * --------------------------------
  */
-static int
-pq_recvbuf(void)
+int
+pq_recvbuf(bool block)
 {
 	if (PqRecvPointer > 0)
 	{
@@ -863,8 +863,8 @@ pq_recvbuf(void)
 			PqRecvLength = PqRecvPointer = 0;
 	}
 
-	/* Ensure that we're in blocking mode */
-	socket_set_nonblocking(false);
+	/* Ensure that we're in blocking mode (or not) */
+	socket_set_nonblocking(!block);
 
 	/* Can fill buffer from PqRecvLength and upwards */
 	for (;;)
@@ -879,6 +879,12 @@ pq_recvbuf(void)
 			if (errno == EINTR)
 				continue;		/* Ok if interrupted */
 
+			if (!block)
+			{
+				if (errno == EWOULDBLOCK || errno == EAGAIN)
+					return 0;
+			}
+
 			/*
 			 * Careful: an ereport() that tries to write to the client would
 			 * cause recursion to here, leading to stack overflow and core
@@ -914,7 +920,7 @@ pq_getbyte(void)
 
 	while (PqRecvPointer >= PqRecvLength)
 	{
-		if (pq_recvbuf())		/* If nothing in buffer, then recv some */
+		if (pq_recvbuf(true))		/* If nothing in buffer, then recv some */
 			return EOF;			/* Failed to recv data */
 	}
 	return (unsigned char) PqRecvBuffer[PqRecvPointer++];
@@ -933,7 +939,7 @@ pq_peekbyte(void)
 
 	while (PqRecvPointer >= PqRecvLength)
 	{
-		if (pq_recvbuf())		/* If nothing in buffer, then recv some */
+		if (pq_recvbuf(true))	/* If nothing in buffer, then recv some */
 			return EOF;			/* Failed to recv data */
 	}
 	return (unsigned char) PqRecvBuffer[PqRecvPointer];
@@ -1012,7 +1018,7 @@ pq_getbytes(char *s, size_t len)
 	{
 		while (PqRecvPointer >= PqRecvLength)
 		{
-			if (pq_recvbuf())	/* If nothing in buffer, then recv some */
+			if (pq_recvbuf(true))	/* If nothing in buffer, then recv some */
 				return EOF;		/* Failed to recv data */
 		}
 		amount = PqRecvLength - PqRecvPointer;
@@ -1046,7 +1052,7 @@ pq_discardbytes(size_t len)
 	{
 		while (PqRecvPointer >= PqRecvLength)
 		{
-			if (pq_recvbuf())	/* If nothing in buffer, then recv some */
+			if (pq_recvbuf(true))	/* If nothing in buffer, then recv some */
 				return EOF;		/* Failed to recv data */
 		}
 		amount = PqRecvLength - PqRecvPointer;
@@ -1087,7 +1093,7 @@ pq_getstring(StringInfo s)
 	{
 		while (PqRecvPointer >= PqRecvLength)
 		{
-			if (pq_recvbuf())	/* If nothing in buffer, then recv some */
+			if (pq_recvbuf(true))	/* If nothing in buffer, then recv some */
 				return EOF;		/* Failed to recv data */
 		}
 
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index af4ba2a..152e30e 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -1,3 +1,4 @@
+
 /*-------------------------------------------------------------------------
  *
  * libpq.h
@@ -86,6 +87,7 @@ extern int	pq_getbyte(void);
 extern int	pq_peekbyte(void);
 extern int	pq_getbyte_if_available(unsigned char *c);
 extern int	pq_putbytes(const char *s, size_t len);
+extern int	pq_recvbuf(bool block);
 
 /*
  * prototypes for functions in be-secure.c
-- 
2.1.4