[patch] fe-connect.c doesn't handle EINTR correctly

Started by David Fordabout 24 years ago3 messageshackers
Jump to latest
#1David Ford
david+cert@blue-labs.org

Last year we had a drawn out discussion about this and I created a patch
for it. I never noticed that the patch didn't go in until I installed
7.2 the other day and realised that fe-connect.c never was fixed.

Here is the patch again. It is against CVS 3/16/2002. This time I only
rewrote the connect procedure at line 912, I leave it up to the regular
hackers to copy it's functionality to the SSL procedure just below it.

In summary, if a software writer implements timer events or other events
which generate a signal with a timing fast enough to occur while libpq
is inside connect(), then connect returns -EINTR. The code following
the connect call does not handle this and generates an error message.
The sum result is that the pg_connect() fails. If the timer or other
event is right on the window of the connect() completion time, the
pg_connect() may appear to work sporadically. If the event is too slow,
pg_connect() will appear to always work and if the event is too fast,
pg_connect() will always fail.

David

Attachments:

fe-connect.c-EINTR.difftext/plain; name=fe-connect.c-EINTR.diffDownload+40-21
#2Bruce Momjian
bruce@momjian.us
In reply to: David Ford (#1)
Re: [PATCHES] [patch] fe-connect.c doesn't handle EINTR correctly

David, sorry you patch didn't make it into 7.2.X. That whole EINTR
discussion was quite complicated so I am not surprised we missed it.

The attached patch implements your ENITR test in cases that seems to
need it. I have followed the method we used for ENITRY in fe-misc.c.

---------------------------------------------------------------------------

David Ford wrote:

Last year we had a drawn out discussion about this and I created a patch
for it. I never noticed that the patch didn't go in until I installed
7.2 the other day and realised that fe-connect.c never was fixed.

Here is the patch again. It is against CVS 3/16/2002. This time I only
rewrote the connect procedure at line 912, I leave it up to the regular
hackers to copy it's functionality to the SSL procedure just below it.

In summary, if a software writer implements timer events or other events
which generate a signal with a timing fast enough to occur while libpq
is inside connect(), then connect returns -EINTR. The code following
the connect call does not handle this and generates an error message.
The sum result is that the pg_connect() fails. If the timer or other
event is right on the window of the connect() completion time, the
pg_connect() may appear to work sporadically. If the event is too slow,
pg_connect() will appear to always work and if the event is too fast,
pg_connect() will always fail.

David

Index: src/interfaces/libpq/fe-connect.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.181
diff -u -r1.181 fe-connect.c
--- src/interfaces/libpq/fe-connect.c	2001/11/11 02:09:05	1.181
+++ src/interfaces/libpq/fe-connect.c	2002/03/16 05:17:47
@@ -909,29 +909,48 @@
* Thus, we have to make arrangements for all eventualities.
* ----------
*/
-	if (connect(conn->sock, &conn->raddr.sa, conn->raddr_len) < 0)
-	{
-		if (SOCK_ERRNO == EINPROGRESS || SOCK_ERRNO == EWOULDBLOCK || SOCK_ERRNO == 0)
-		{
-			/*
-			 * This is fine - we're in non-blocking mode, and the
-			 * connection is in progress.
-			 */
-			conn->status = CONNECTION_STARTED;
-		}
-		else
-		{
-			/* Something's gone wrong */
-			connectFailureMessage(conn, SOCK_ERRNO);
-			goto connect_errReturn;
+	do {
+		int e;
+		e=connect(conn->sock, &conn->raddr.sa, conn->raddr_len)
+
+		if(e < 0) {
+			switch (e) {
+				case EINTR:
+					/*
+					 * Interrupted by a signal, keep trying.  This handling is
+					 * required because the user may have turned on signals in
+					 * his program.  Previously, libpq would erronously fail to
+					 * connect if the user's timer event fired and interrupted
+					 * this syscall.  It is important that we don't try to sleep
+					 * here because this may cause havoc with the user program.
+					 */
+					continue;
+					break;
+				case 0:
+				case EINPROGRESS:
+				case EWOULDBLOCK:
+					/*
+					 * This is fine - we're in non-blocking mode, and the
+					 * connection is in progress.
+					 */
+					conn->status = CONNECTION_STARTED;
+					break;
+				default:
+					/* Something's gone wrong */
+					connectFailureMessage(conn, SOCK_ERRNO);
+					goto connect_errReturn;
+					break;
+			}
+		} else {
+			/* We're connected now */
+			conn->status = CONNECTION_MADE;
}
-	}
-	else
-	{
-		/* We're connected already */
-		conn->status = CONNECTION_MADE;
-	}
+		
+		if(conn->status == CONNECTION_STARTED || conn->status == CONNECTION_MADE)
+			break;

+ } while(1);
+
#ifdef USE_SSL
/* Attempt to negotiate SSL usage */
if (conn->allow_ssl_try)

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Attachments:

/bjm/difftext/plainDownload+40-22
#3Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#2)
Re: [PATCHES] [patch] fe-connect.c doesn't handle EINTR correctly

Fix applied. Thanks.

---------------------------------------------------------------------------

Bruce Momjian wrote:

David, sorry you patch didn't make it into 7.2.X. That whole EINTR
discussion was quite complicated so I am not surprised we missed it.

The attached patch implements your ENITR test in cases that seems to
need it. I have followed the method we used for ENITRY in fe-misc.c.

---------------------------------------------------------------------------

David Ford wrote:

Last year we had a drawn out discussion about this and I created a patch
for it. I never noticed that the patch didn't go in until I installed
7.2 the other day and realised that fe-connect.c never was fixed.

Here is the patch again. It is against CVS 3/16/2002. This time I only
rewrote the connect procedure at line 912, I leave it up to the regular
hackers to copy it's functionality to the SSL procedure just below it.

In summary, if a software writer implements timer events or other events
which generate a signal with a timing fast enough to occur while libpq
is inside connect(), then connect returns -EINTR. The code following
the connect call does not handle this and generates an error message.
The sum result is that the pg_connect() fails. If the timer or other
event is right on the window of the connect() completion time, the
pg_connect() may appear to work sporadically. If the event is too slow,
pg_connect() will appear to always work and if the event is too fast,
pg_connect() will always fail.

David

Index: src/interfaces/libpq/fe-connect.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.181
diff -u -r1.181 fe-connect.c
--- src/interfaces/libpq/fe-connect.c	2001/11/11 02:09:05	1.181
+++ src/interfaces/libpq/fe-connect.c	2002/03/16 05:17:47
@@ -909,29 +909,48 @@
* Thus, we have to make arrangements for all eventualities.
* ----------
*/
-	if (connect(conn->sock, &conn->raddr.sa, conn->raddr_len) < 0)
-	{
-		if (SOCK_ERRNO == EINPROGRESS || SOCK_ERRNO == EWOULDBLOCK || SOCK_ERRNO == 0)
-		{
-			/*
-			 * This is fine - we're in non-blocking mode, and the
-			 * connection is in progress.
-			 */
-			conn->status = CONNECTION_STARTED;
-		}
-		else
-		{
-			/* Something's gone wrong */
-			connectFailureMessage(conn, SOCK_ERRNO);
-			goto connect_errReturn;
+	do {
+		int e;
+		e=connect(conn->sock, &conn->raddr.sa, conn->raddr_len)
+
+		if(e < 0) {
+			switch (e) {
+				case EINTR:
+					/*
+					 * Interrupted by a signal, keep trying.  This handling is
+					 * required because the user may have turned on signals in
+					 * his program.  Previously, libpq would erronously fail to
+					 * connect if the user's timer event fired and interrupted
+					 * this syscall.  It is important that we don't try to sleep
+					 * here because this may cause havoc with the user program.
+					 */
+					continue;
+					break;
+				case 0:
+				case EINPROGRESS:
+				case EWOULDBLOCK:
+					/*
+					 * This is fine - we're in non-blocking mode, and the
+					 * connection is in progress.
+					 */
+					conn->status = CONNECTION_STARTED;
+					break;
+				default:
+					/* Something's gone wrong */
+					connectFailureMessage(conn, SOCK_ERRNO);
+					goto connect_errReturn;
+					break;
+			}
+		} else {
+			/* We're connected now */
+			conn->status = CONNECTION_MADE;
}
-	}
-	else
-	{
-		/* We're connected already */
-		conn->status = CONNECTION_MADE;
-	}
+		
+		if(conn->status == CONNECTION_STARTED || conn->status == CONNECTION_MADE)
+			break;

+ } while(1);
+
#ifdef USE_SSL
/* Attempt to negotiate SSL usage */
if (conn->allow_ssl_try)

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: src/interfaces/libpq/fe-connect.c
===================================================================
RCS file: /cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.182
diff -c -r1.182 fe-connect.c
*** src/interfaces/libpq/fe-connect.c	2 Mar 2002 00:49:22 -0000	1.182
--- src/interfaces/libpq/fe-connect.c	14 Apr 2002 04:40:24 -0000
***************
*** 913,920 ****
--- 913,925 ----
* Thus, we have to make arrangements for all eventualities.
* ----------
*/
+ retry1:
if (connect(conn->sock, &conn->raddr.sa, conn->raddr_len) < 0)
{
+ 		if (SOCK_ERRNO == EINTR)
+ 			/* Interrupted system call - we'll just try again */
+ 			goto retry1;
+ 
if (SOCK_ERRNO == EINPROGRESS || SOCK_ERRNO == EWOULDBLOCK || SOCK_ERRNO == 0)
{
/*
***************
*** 949,957 ****
--- 954,967 ----
SOCK_STRERROR(SOCK_ERRNO));
goto connect_errReturn;
}
+ retry2:
/* Now receive the postmasters response */
if (recv(conn->sock, &SSLok, 1, 0) != 1)
{
+ 			if (SOCK_ERRNO == EINTR)
+ 				/* Interrupted system call - we'll just try again */
+ 				goto retry2;
+ 
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("could not receive server response to SSL negotiation packet: %s\n"),
SOCK_STRERROR(SOCK_ERRNO));
***************
*** 2132,2139 ****
--- 2142,2153 ----
"PQrequestCancel() -- socket() failed: ");
goto cancel_errReturn;
}
+ retry3:
if (connect(tmpsock, &conn->raddr.sa, conn->raddr_len) < 0)
{
+ 		if (SOCK_ERRNO == EINTR)
+ 			/* Interrupted system call - we'll just try again */
+ 			goto retry3;
strcpy(conn->errorMessage.data,
"PQrequestCancel() -- connect() failed: ");
goto cancel_errReturn;
***************
*** 2150,2157 ****
--- 2164,2175 ----
crp.cp.backendPID = htonl(conn->be_pid);
crp.cp.cancelAuthCode = htonl(conn->be_key);
+ retry4:
if (send(tmpsock, (char *) &crp, sizeof(crp), 0) != (int) sizeof(crp))
{
+ 		if (SOCK_ERRNO == EINTR)
+ 			/* Interrupted system call - we'll just try again */
+ 			goto retry4;
strcpy(conn->errorMessage.data,
"PQrequestCancel() -- send() failed: ");
goto cancel_errReturn;
Index: src/interfaces/libpq/fe-misc.c
===================================================================
RCS file: /cvsroot/pgsql/src/interfaces/libpq/fe-misc.c,v
retrieving revision 1.68
diff -c -r1.68 fe-misc.c
*** src/interfaces/libpq/fe-misc.c	6 Mar 2002 06:10:42 -0000	1.68
--- src/interfaces/libpq/fe-misc.c	14 Apr 2002 04:40:25 -0000
***************
*** 361,367 ****
if (!conn || conn->sock < 0)
return -1;
! retry:
FD_ZERO(&input_mask);
FD_SET(conn->sock, &input_mask);
timeout.tv_sec = 0;
--- 361,367 ----
if (!conn || conn->sock < 0)
return -1;

! retry1:
FD_ZERO(&input_mask);
FD_SET(conn->sock, &input_mask);
timeout.tv_sec = 0;
***************
*** 371,377 ****
{
if (SOCK_ERRNO == EINTR)
/* Interrupted system call - we'll just try again */
! goto retry;

printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("select() failed: %s\n"),
--- 371,377 ----
{
if (SOCK_ERRNO == EINTR)
/* Interrupted system call - we'll just try again */
! 			goto retry1;

printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("select() failed: %s\n"),
***************
*** 395,401 ****
if (!conn || conn->sock < 0)
return -1;

! retry:
FD_ZERO(&input_mask);
FD_SET(conn->sock, &input_mask);
timeout.tv_sec = 0;
--- 395,401 ----
if (!conn || conn->sock < 0)
return -1;

! retry2:
FD_ZERO(&input_mask);
FD_SET(conn->sock, &input_mask);
timeout.tv_sec = 0;
***************
*** 405,411 ****
{
if (SOCK_ERRNO == EINTR)
/* Interrupted system call - we'll just try again */
! goto retry;

printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("select() failed: %s\n"),
--- 405,411 ----
{
if (SOCK_ERRNO == EINTR)
/* Interrupted system call - we'll just try again */
! 			goto retry2;

printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("select() failed: %s\n"),
***************
*** 478,484 ****
}

/* OK, try to read some data */
! tryAgain:
#ifdef USE_SSL
if (conn->ssl)
nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
--- 478,484 ----
}
/* OK, try to read some data */
! retry3:
#ifdef USE_SSL
if (conn->ssl)
nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
***************
*** 490,496 ****
if (nread < 0)
{
if (SOCK_ERRNO == EINTR)
! 			goto tryAgain;
/* Some systems return EAGAIN/EWOULDBLOCK for no data */
#ifdef EAGAIN
if (SOCK_ERRNO == EAGAIN)
--- 490,496 ----
if (nread < 0)
{
if (SOCK_ERRNO == EINTR)
! 			goto retry3;
/* Some systems return EAGAIN/EWOULDBLOCK for no data */
#ifdef EAGAIN
if (SOCK_ERRNO == EAGAIN)
***************
*** 531,537 ****
(conn->inBufSize - conn->inEnd) >= 8192)
{
someread = 1;
! 			goto tryAgain;
}
return 1;
}
--- 531,537 ----
(conn->inBufSize - conn->inEnd) >= 8192)
{
someread = 1;
! 			goto retry3;
}
return 1;
}
***************
*** 564,570 ****
* Still not sure that it's EOF, because some data could have just
* arrived.
*/
! tryAgain2:
#ifdef USE_SSL
if (conn->ssl)
nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
--- 564,570 ----
* Still not sure that it's EOF, because some data could have just
* arrived.
*/
! retry4:
#ifdef USE_SSL
if (conn->ssl)
nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
***************
*** 576,582 ****
if (nread < 0)
{
if (SOCK_ERRNO == EINTR)
! 			goto tryAgain2;
/* Some systems return EAGAIN/EWOULDBLOCK for no data */
#ifdef EAGAIN
if (SOCK_ERRNO == EAGAIN)
--- 576,582 ----
if (nread < 0)
{
if (SOCK_ERRNO == EINTR)
! 			goto retry4;
/* Some systems return EAGAIN/EWOULDBLOCK for no data */
#ifdef EAGAIN
if (SOCK_ERRNO == EAGAIN)
***************
*** 804,810 ****
if (forRead || forWrite)
{
! retry:
FD_ZERO(&input_mask);
FD_ZERO(&output_mask);
FD_ZERO(&except_mask);
--- 804,810 ----
if (forRead || forWrite)
{
! retry5:
FD_ZERO(&input_mask);
FD_ZERO(&output_mask);
FD_ZERO(&except_mask);
***************
*** 817,823 ****
(struct timeval *) NULL) < 0)
{
if (SOCK_ERRNO == EINTR)
! 				goto retry;
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("select() failed: %s\n"),
SOCK_STRERROR(SOCK_ERRNO));
--- 817,823 ----
(struct timeval *) NULL) < 0)
{
if (SOCK_ERRNO == EINTR)
! 				goto retry5;
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("select() failed: %s\n"),
SOCK_STRERROR(SOCK_ERRNO));

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026