psql casts aspersions on server reliability

Started by Robert Haasover 9 years ago12 messages
#1Robert Haas
robertmhaas@gmail.com

psql tends to do things like this:

rhaas=# select * from pg_stat_activity;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Basically everything psql has to say about this is a lie:

1. It says the server closed the connection unexpectedly, but it just
finished processing a FATAL message from the server. So how
unexpected is it that the connection would thereafter be closed?

2. It says the server probably terminated abnormally, but PostgreSQL
is rarely so unreliable as to just give up the ghost and die. The
ErrorResponse it just processed has an errcode of
ERRCODE_ADMIN_SHUTDOWN, so probably what happened is somebody either
shut down the server or killed the session.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: psql casts aspersions on server reliability

Robert Haas <robertmhaas@gmail.com> writes:

psql tends to do things like this:
rhaas=# select * from pg_stat_activity;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Basically everything psql has to say about this is a lie:

I cannot get terribly excited about this. What you seem to be proposing
is that psql try to intuit the reason for connection closure from the
last error message it got, but that seems likely to lead to worse lies
than printing a boilerplate message.

I could go along with just dropping the last sentence ("This probably...")
if the last error we got was FATAL level. I don't find "unexpectedly"
to be problematic here: from the point of view of psql, and probably
of its user, the shutdown *was* unexpected.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: psql casts aspersions on server reliability

On Wed, Sep 28, 2016 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

psql tends to do things like this:
rhaas=# select * from pg_stat_activity;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Basically everything psql has to say about this is a lie:

I cannot get terribly excited about this. What you seem to be proposing
is that psql try to intuit the reason for connection closure from the
last error message it got, but that seems likely to lead to worse lies
than printing a boilerplate message.

I could go along with just dropping the last sentence ("This probably...")
if the last error we got was FATAL level. I don't find "unexpectedly"
to be problematic here: from the point of view of psql, and probably
of its user, the shutdown *was* unexpected.

I don't care very much whether we try to intuit the reason for
connection closure or not; it could be done, but I don't feel that it
has to be done. My bigger point is that currently psql speculates
that the reason for *every* connection closure is abnormal server
termination, which is actually a very rare event.

It may have been common when that message was added.
1a17447be1186fdd36391c58a2a0209f613d89c4 changed the wording this
message in 2001, and the original message seems to date to
011ee13131f6fa2f6dbafd3827b70d051cb28f64 in 1996. And my guess is at
that time the server probably did just roll over and die with some
regularity. But today it usually doesn't. It's neither helpful nor
good PR for libpq to guess that the most likely cause of a server
disconnection is server unreliability.

I have seen actual instances of customers getting upset by this
message even though the server had been shut down quite cleanly. The
message got into a logfile and induced minor panic. Fortunately, I
have not seen this happen lately.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4David Steele
david@pgmasters.net
In reply to: Robert Haas (#3)
Re: psql casts aspersions on server reliability

On 9/28/16 10:22 AM, Robert Haas wrote:

On Wed, Sep 28, 2016 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

psql tends to do things like this:
rhaas=# select * from pg_stat_activity;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Basically everything psql has to say about this is a lie:

I cannot get terribly excited about this. What you seem to be proposing
is that psql try to intuit the reason for connection closure from the
last error message it got, but that seems likely to lead to worse lies
than printing a boilerplate message.

I could go along with just dropping the last sentence ("This probably...")
if the last error we got was FATAL level. I don't find "unexpectedly"
to be problematic here: from the point of view of psql, and probably
of its user, the shutdown *was* unexpected.

I don't care very much whether we try to intuit the reason for
connection closure or not; it could be done, but I don't feel that it
has to be done. My bigger point is that currently psql speculates
that the reason for *every* connection closure is abnormal server
termination, which is actually a very rare event.

It may have been common when that message was added.
1a17447be1186fdd36391c58a2a0209f613d89c4 changed the wording this
message in 2001, and the original message seems to date to
011ee13131f6fa2f6dbafd3827b70d051cb28f64 in 1996. And my guess is at
that time the server probably did just roll over and die with some
regularity. But today it usually doesn't. It's neither helpful nor
good PR for libpq to guess that the most likely cause of a server
disconnection is server unreliability.

I have seen actual instances of customers getting upset by this
message even though the server had been shut down quite cleanly. The
message got into a logfile and induced minor panic. Fortunately, I
have not seen this happen lately.

+1 for making this error message less frightening. I have also had to
explain it away on occasion.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Petr Jelinek
petr@2ndquadrant.com
In reply to: David Steele (#4)
Re: psql casts aspersions on server reliability

On 28/09/16 17:13, David Steele wrote:

On 9/28/16 10:22 AM, Robert Haas wrote:

On Wed, Sep 28, 2016 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

psql tends to do things like this:
rhaas=# select * from pg_stat_activity;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Basically everything psql has to say about this is a lie:

I cannot get terribly excited about this. What you seem to be proposing
is that psql try to intuit the reason for connection closure from the
last error message it got, but that seems likely to lead to worse lies
than printing a boilerplate message.

I could go along with just dropping the last sentence ("This probably...")
if the last error we got was FATAL level. I don't find "unexpectedly"
to be problematic here: from the point of view of psql, and probably
of its user, the shutdown *was* unexpected.

I don't care very much whether we try to intuit the reason for
connection closure or not; it could be done, but I don't feel that it
has to be done. My bigger point is that currently psql speculates
that the reason for *every* connection closure is abnormal server
termination, which is actually a very rare event.

It may have been common when that message was added.
1a17447be1186fdd36391c58a2a0209f613d89c4 changed the wording this
message in 2001, and the original message seems to date to
011ee13131f6fa2f6dbafd3827b70d051cb28f64 in 1996. And my guess is at
that time the server probably did just roll over and die with some
regularity. But today it usually doesn't. It's neither helpful nor
good PR for libpq to guess that the most likely cause of a server
disconnection is server unreliability.

I have seen actual instances of customers getting upset by this
message even though the server had been shut down quite cleanly. The
message got into a logfile and induced minor panic. Fortunately, I
have not seen this happen lately.

+1 for making this error message less frightening. I have also had to
explain it away on occasion.

+1 I've seen this being misleading way too often.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
1 attachment(s)
Re: [HACKERS] psql casts aspersions on server reliability

On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

psql tends to do things like this:
rhaas=# select * from pg_stat_activity;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Basically everything psql has to say about this is a lie:

I cannot get terribly excited about this. What you seem to be proposing
is that psql try to intuit the reason for connection closure from the
last error message it got, but that seems likely to lead to worse lies
than printing a boilerplate message.

I could go along with just dropping the last sentence ("This probably...")
if the last error we got was FATAL level. I don't find "unexpectedly"
to be problematic here: from the point of view of psql, and probably
of its user, the shutdown *was* unexpected.

I looked at this thread from 2016 and I think the problem is the
"abnormally" word, since if the server was shutdown by the administrator
(most likely), it isn't abnormal. Here is a patch to remove
"abnormally".

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

Attachments:

exit.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 660cdec93c..634708d716 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -749,7 +749,7 @@ retry4:
 	 */
 definitelyEOF:
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-							"\tThis probably means the server terminated abnormally\n"
+							"\tThis probably means the server terminated\n"
 							"\tbefore or while processing the request.");
 
 	/* Come here if lower-level code already set a suitable errorMessage */
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f1192d28f2..115776ce6c 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -206,7 +206,7 @@ rloop:
 				if (result_errno == EPIPE ||
 					result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
+											"\tThis probably means the server terminated\n"
 											"\tbefore or while processing the request.");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
@@ -306,7 +306,7 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 				result_errno = SOCK_ERRNO;
 				if (result_errno == EPIPE || result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
+											"\tThis probably means the server terminated\n"
 											"\tbefore or while processing the request.");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index bd72a87bbb..b972bd3ced 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -233,7 +233,7 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 			case EPIPE:
 			case ECONNRESET:
 				libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-										"\tThis probably means the server terminated abnormally\n"
+										"\tThis probably means the server terminated\n"
 										"\tbefore or while processing the request.");
 				break;
 
@@ -395,7 +395,7 @@ retry_masked:
 				/* (strdup failure is OK, we'll cope later) */
 				snprintf(msgbuf, sizeof(msgbuf),
 						 libpq_gettext("server closed the connection unexpectedly\n"
-									   "\tThis probably means the server terminated abnormally\n"
+									   "\tThis probably means the server terminated\n"
 									   "\tbefore or while processing the request."));
 				/* keep newline out of translated string */
 				strlcat(msgbuf, "\n", sizeof(msgbuf));
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: [HACKERS] psql casts aspersions on server reliability

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:

I could go along with just dropping the last sentence ("This probably...")
if the last error we got was FATAL level. I don't find "unexpectedly"
to be problematic here: from the point of view of psql, and probably
of its user, the shutdown *was* unexpected.

I looked at this thread from 2016 and I think the problem is the
"abnormally" word, since if the server was shutdown by the administrator
(most likely), it isn't abnormal. Here is a patch to remove
"abnormally".

I do not think this is an improvement. The places you are changing
are reacting to a connection closure. *If* we had previously gotten a
"FATAL: terminating connection due to administrator command" message,
then yeah the connection closure is expected; but if not, it isn't.
Your patch adds no check for that. (As I remarked in 2016, we could
probably condition this on the elevel being FATAL, rather than
checking for specific error messages.)

regards, tom lane

#8Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#7)
1 attachment(s)
Re: [HACKERS] psql casts aspersions on server reliability

On Wed, Nov 22, 2023 at 07:38:52PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:

I could go along with just dropping the last sentence ("This probably...")
if the last error we got was FATAL level. I don't find "unexpectedly"
to be problematic here: from the point of view of psql, and probably
of its user, the shutdown *was* unexpected.

I looked at this thread from 2016 and I think the problem is the
"abnormally" word, since if the server was shutdown by the administrator
(most likely), it isn't abnormal. Here is a patch to remove
"abnormally".

I do not think this is an improvement. The places you are changing
are reacting to a connection closure. *If* we had previously gotten a
"FATAL: terminating connection due to administrator command" message,
then yeah the connection closure is expected; but if not, it isn't.
Your patch adds no check for that. (As I remarked in 2016, we could
probably condition this on the elevel being FATAL, rather than
checking for specific error messages.)

Yes, you are correct. Here is a patch that implements the FATAL test,
though I am not sure I have the logic correct or backwards, and I don't
know how to test this. Thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

Attachments:

exit.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 660cdec93c..c541fd8b02 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -749,8 +749,10 @@ retry4:
 	 */
 definitelyEOF:
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-							"\tThis probably means the server terminated abnormally\n"
-							"\tbefore or while processing the request.");
+							"\tThis probably means the server terminated%s\n"
+							"\tbefore or while processing the request.",
+							conn->result->resultStatus == PGRES_FATAL_ERROR ?
+							"" : "abnormally");
 
 	/* Come here if lower-level code already set a suitable errorMessage */
 definitelyFailed:
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f1192d28f2..6c21f91817 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -206,8 +206,10 @@ rloop:
 				if (result_errno == EPIPE ||
 					result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
-											"\tbefore or while processing the request.");
+											  "\tThis probably means the server terminated%s\n"
+											  "\tbefore or while processing the request.",
+											  conn->result->resultStatus == PGRES_FATAL_ERROR ?
+											  "" : "abnormally");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 											SOCK_STRERROR(result_errno,
@@ -306,8 +308,10 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 				result_errno = SOCK_ERRNO;
 				if (result_errno == EPIPE || result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
-											"\tbefore or while processing the request.");
+											  "\tThis probably means the server terminated%s\n"
+											  "\tbefore or while processing the request.",
+											  conn->result->resultStatus == PGRES_FATAL_ERROR ?
+											  "" : "abnormally");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 											SOCK_STRERROR(result_errno,
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index bd72a87bbb..5e7136195a 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -233,8 +233,10 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 			case EPIPE:
 			case ECONNRESET:
 				libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-										"\tThis probably means the server terminated abnormally\n"
-										"\tbefore or while processing the request.");
+										"\tThis probably means the server terminated%s\n"
+										"\tbefore or while processing the request.",
+										conn->result->resultStatus == PGRES_FATAL_ERROR ?
+										"" : "abnormally");
 				break;
 
 			default:
@@ -395,8 +397,11 @@ retry_masked:
 				/* (strdup failure is OK, we'll cope later) */
 				snprintf(msgbuf, sizeof(msgbuf),
 						 libpq_gettext("server closed the connection unexpectedly\n"
-									   "\tThis probably means the server terminated abnormally\n"
-									   "\tbefore or while processing the request."));
+									   "\tThis probably means the server terminated%s\n"
+									   "\tbefore or while processing the request."),
+										conn->result->resultStatus == PGRES_FATAL_ERROR ?
+										"" : "abnormally");
+
 				/* keep newline out of translated string */
 				strlcat(msgbuf, "\n", sizeof(msgbuf));
 				conn->write_err_msg = strdup(msgbuf);
#9Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#8)
1 attachment(s)
Re: [HACKERS] psql casts aspersions on server reliability

On Wed, Nov 22, 2023 at 10:25:14PM -0500, Bruce Momjian wrote:

On Wed, Nov 22, 2023 at 07:38:52PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:

I could go along with just dropping the last sentence ("This probably...")
if the last error we got was FATAL level. I don't find "unexpectedly"
to be problematic here: from the point of view of psql, and probably
of its user, the shutdown *was* unexpected.

I looked at this thread from 2016 and I think the problem is the
"abnormally" word, since if the server was shutdown by the administrator
(most likely), it isn't abnormal. Here is a patch to remove
"abnormally".

I do not think this is an improvement. The places you are changing
are reacting to a connection closure. *If* we had previously gotten a
"FATAL: terminating connection due to administrator command" message,
then yeah the connection closure is expected; but if not, it isn't.
Your patch adds no check for that. (As I remarked in 2016, we could
probably condition this on the elevel being FATAL, rather than
checking for specific error messages.)

Yes, you are correct. Here is a patch that implements the FATAL test,
though I am not sure I have the logic correct or backwards, and I don't
know how to test this. Thanks.

I developed the attached patch which seems to work better. In testing
kill -3 on a backend or calling elog(FATAL) in the server for a
session, libpq's 'res' is NULL, meaning we don't have any status to
check for PGRES_FATAL_ERROR. It is very possible that libpq just isn't
stuctured to have the PGRES_FATAL_ERROR at the point where we issue this
message, and this is not worth improving.

test=> select pg_sleep(100);
--> FATAL: FATAL called

server closed the connection unexpectedly
--> This probably means the server terminated null
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

Attachments:

exit.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 660cdec93c..64faad19df 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -749,8 +749,11 @@ retry4:
 	 */
 definitelyEOF:
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-							"\tThis probably means the server terminated abnormally\n"
-							"\tbefore or while processing the request.");
+							"\tThis probably means the server terminated%s\n"
+							"\tbefore or while processing the request.",
+							(conn->result == NULL) ? " null" :
+							(conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+							"" : " abnormally");
 
 	/* Come here if lower-level code already set a suitable errorMessage */
 definitelyFailed:
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 5613c56b14..03914b97fc 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -2158,6 +2158,7 @@ pqFunctionCall3(PGconn *conn, Oid fnid,
 				if (pqGetErrorNotice3(conn, true))
 					continue;
 				status = PGRES_FATAL_ERROR;
+				fprintf(stderr, "Got 'E'\n");
 				break;
 			case 'A':			/* notify message */
 				/* handle notify and go back to processing return values */
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f1192d28f2..f4c7f51b0a 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -206,8 +206,11 @@ rloop:
 				if (result_errno == EPIPE ||
 					result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
-											"\tbefore or while processing the request.");
+											  "\tThis probably means the server terminated%s\n"
+											  "\tbefore or while processing the request.",
+											  (conn->result == NULL) ? " null" :
+											  (conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+											  "" : " abnormally");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 											SOCK_STRERROR(result_errno,
@@ -306,8 +309,11 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 				result_errno = SOCK_ERRNO;
 				if (result_errno == EPIPE || result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
-											"\tbefore or while processing the request.");
+											  "\tThis probably means the server terminated%s\n"
+											  "\tbefore or while processing the request.",
+											  (conn->result == NULL) ? " null" :
+											  (conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+											  "" : " abnormally");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 											SOCK_STRERROR(result_errno,
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index bd72a87bbb..be93c2c0f9 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -233,8 +233,11 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 			case EPIPE:
 			case ECONNRESET:
 				libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-										"\tThis probably means the server terminated abnormally\n"
-										"\tbefore or while processing the request.");
+										"\tThis probably means the server terminated%s\n"
+										"\tbefore or while processing the request.",
+										(conn->result == NULL) ? " null" :
+										(conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+										"" : " abnormally");
 				break;
 
 			default:
@@ -395,8 +398,12 @@ retry_masked:
 				/* (strdup failure is OK, we'll cope later) */
 				snprintf(msgbuf, sizeof(msgbuf),
 						 libpq_gettext("server closed the connection unexpectedly\n"
-									   "\tThis probably means the server terminated abnormally\n"
-									   "\tbefore or while processing the request."));
+									   "\tThis probably means the server terminated%s\n"
+									   "\tbefore or while processing the request."),
+									   (conn->result == NULL) ? " null" :
+									   (conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+									   "" : " abnormally");
+
 				/* keep newline out of translated string */
 				strlcat(msgbuf, "\n", sizeof(msgbuf));
 				conn->write_err_msg = strdup(msgbuf);
#10Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bruce Momjian (#9)
Re: [HACKERS] psql casts aspersions on server reliability

On Thu, 2023-11-23 at 11:12 -0500, Bruce Momjian wrote:

On Wed, Nov 22, 2023 at 10:25:14PM -0500, Bruce Momjian wrote:

Yes, you are correct. Here is a patch that implements the FATAL test,
though I am not sure I have the logic correct or backwards, and I don't
know how to test this. Thanks.

I developed the attached patch which seems to work better. In testing
kill -3 on a backend or calling elog(FATAL) in the server for a
session, libpq's 'res' is NULL, meaning we don't have any status to
check for PGRES_FATAL_ERROR. It is very possible that libpq just isn't
stuctured to have the PGRES_FATAL_ERROR at the point where we issue this
message, and this is not worth improving.

test=> select pg_sleep(100);
--> FATAL: FATAL called

server closed the connection unexpectedly
--> This probably means the server terminated null
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

I don't thing "terminated null" is a meaningful message.

libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-							"\tThis probably means the server terminated abnormally\n"
-							"\tbefore or while processing the request.");
+							"\tThis probably means the server terminated%s\n"
+							"\tbefore or while processing the request.",
+							(conn->result == NULL) ? " null" :
+							(conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+							"" : " abnormally");

Apart from the weird "null", will that work well for translation?

--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -2158,6 +2158,7 @@ pqFunctionCall3(PGconn *conn, Oid fnid,
if (pqGetErrorNotice3(conn, true))
continue;
status = PGRES_FATAL_ERROR;
+				fprintf(stderr, "Got 'E'\n");
break;
case 'A':			/* notify message */
/* handle notify and go back to processing return values */

That looks like a leftover debugging message.

Yours,
Laurenz Albe

#11Bruce Momjian
bruce@momjian.us
In reply to: Laurenz Albe (#10)
1 attachment(s)
Re: [HACKERS] psql casts aspersions on server reliability

On Fri, Nov 24, 2023 at 04:06:22AM +0100, Laurenz Albe wrote:

On Thu, 2023-11-23 at 11:12 -0500, Bruce Momjian wrote:

On Wed, Nov 22, 2023 at 10:25:14PM -0500, Bruce Momjian wrote:

Yes, you are correct. Here is a patch that implements the FATAL test,
though I am not sure I have the logic correct or backwards, and I don't
know how to test this. Thanks.

I developed the attached patch which seems to work better. In testing
kill -3 on a backend or calling elog(FATAL) in the server for a
session, libpq's 'res' is NULL, meaning we don't have any status to
check for PGRES_FATAL_ERROR. It is very possible that libpq just isn't
stuctured to have the PGRES_FATAL_ERROR at the point where we issue this
message, and this is not worth improving.

test=> select pg_sleep(100);
--> FATAL: FATAL called

server closed the connection unexpectedly
--> This probably means the server terminated null
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

I don't thing "terminated null" is a meaningful message.

Yes, this is just a debug build so we can see the values of 'res'.
Sorry for the confusion. This attached patch has the elog() added so
you can reproduce what I saw.

I am actually unclear which exits should be labled as "abnormal".

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

Attachments:

exit.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 5d78d6dc06..890fe3bd4a 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -372,6 +372,8 @@ pg_sleep(PG_FUNCTION_ARGS)
 	float8		secs = PG_GETARG_FLOAT8(0);
 	float8		endtime;
 
+	elog(FATAL, "FATAL from pg_sleep()");
+
 	/*
 	 * We sleep using WaitLatch, to ensure that we'll wake up promptly if an
 	 * important signal (such as SIGALRM or SIGINT) arrives.  Because
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 660cdec93c..64faad19df 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -749,8 +749,11 @@ retry4:
 	 */
 definitelyEOF:
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-							"\tThis probably means the server terminated abnormally\n"
-							"\tbefore or while processing the request.");
+							"\tThis probably means the server terminated%s\n"
+							"\tbefore or while processing the request.",
+							(conn->result == NULL) ? " null" :
+							(conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+							"" : " abnormally");
 
 	/* Come here if lower-level code already set a suitable errorMessage */
 definitelyFailed:
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 5613c56b14..03914b97fc 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -2158,6 +2158,7 @@ pqFunctionCall3(PGconn *conn, Oid fnid,
 				if (pqGetErrorNotice3(conn, true))
 					continue;
 				status = PGRES_FATAL_ERROR;
+				fprintf(stderr, "Got 'E'\n");
 				break;
 			case 'A':			/* notify message */
 				/* handle notify and go back to processing return values */
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f1192d28f2..f4c7f51b0a 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -206,8 +206,11 @@ rloop:
 				if (result_errno == EPIPE ||
 					result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
-											"\tbefore or while processing the request.");
+											  "\tThis probably means the server terminated%s\n"
+											  "\tbefore or while processing the request.",
+											  (conn->result == NULL) ? " null" :
+											  (conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+											  "" : " abnormally");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 											SOCK_STRERROR(result_errno,
@@ -306,8 +309,11 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 				result_errno = SOCK_ERRNO;
 				if (result_errno == EPIPE || result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
-											"\tbefore or while processing the request.");
+											  "\tThis probably means the server terminated%s\n"
+											  "\tbefore or while processing the request.",
+											  (conn->result == NULL) ? " null" :
+											  (conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+											  "" : " abnormally");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 											SOCK_STRERROR(result_errno,
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index bd72a87bbb..be93c2c0f9 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -233,8 +233,11 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 			case EPIPE:
 			case ECONNRESET:
 				libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-										"\tThis probably means the server terminated abnormally\n"
-										"\tbefore or while processing the request.");
+										"\tThis probably means the server terminated%s\n"
+										"\tbefore or while processing the request.",
+										(conn->result == NULL) ? " null" :
+										(conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+										"" : " abnormally");
 				break;
 
 			default:
@@ -395,8 +398,12 @@ retry_masked:
 				/* (strdup failure is OK, we'll cope later) */
 				snprintf(msgbuf, sizeof(msgbuf),
 						 libpq_gettext("server closed the connection unexpectedly\n"
-									   "\tThis probably means the server terminated abnormally\n"
-									   "\tbefore or while processing the request."));
+									   "\tThis probably means the server terminated%s\n"
+									   "\tbefore or while processing the request."),
+									   (conn->result == NULL) ? " null" :
+									   (conn->result->resultStatus == PGRES_FATAL_ERROR) ?
+									   "" : " abnormally");
+
 				/* keep newline out of translated string */
 				strlcat(msgbuf, "\n", sizeof(msgbuf));
 				conn->write_err_msg = strdup(msgbuf);
#12Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#11)
Re: [HACKERS] psql casts aspersions on server reliability

On Fri, Nov 24, 2023 at 10:19:29AM -0500, Bruce Momjian wrote:

On Fri, Nov 24, 2023 at 04:06:22AM +0100, Laurenz Albe wrote:

On Thu, 2023-11-23 at 11:12 -0500, Bruce Momjian wrote:

On Wed, Nov 22, 2023 at 10:25:14PM -0500, Bruce Momjian wrote:

Yes, you are correct. Here is a patch that implements the FATAL test,
though I am not sure I have the logic correct or backwards, and I don't
know how to test this. Thanks.

I developed the attached patch which seems to work better. In testing
kill -3 on a backend or calling elog(FATAL) in the server for a
session, libpq's 'res' is NULL, meaning we don't have any status to
check for PGRES_FATAL_ERROR. It is very possible that libpq just isn't
structured to have the PGRES_FATAL_ERROR at the point where we issue this
message, and this is not worth improving.

test=> select pg_sleep(100);
--> FATAL: FATAL called

server closed the connection unexpectedly
--> This probably means the server terminated null
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

I don't thing "terminated null" is a meaningful message.

Yes, this is just a debug build so we can see the values of 'res'.
Sorry for the confusion. This attached patch has the elog() added so
you can reproduce what I saw.

I am actually unclear which exits should be labeled as "abnormal".

There are five call sites which issue this message, so I looked at
adding "abnormally" just at the call sites where it made sense, but I
couldn't find a pattern. I don't plan to pursue this further.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.