Nicer error when connecting to standby with hot_standby=off

Started by James Colemanalmost 6 years ago44 messages
#1James Coleman
jtc331@gmail.com

I recently noticed while setting up a test environment that attempting to
connect to a standby running without hot_standby=on results in a fairly
generic error (I believe "the database system is starting up"). I don't
have my test setup running right now, so can't confirm with a repro case at
the moment, but with a little bit of spelunking I noticed that error text
only shows up in src/backend/postmaster/postmaster.c when
port->canAcceptConnections has the value CAC_STARTUP.

Ideally the error message would include something along the lines of "The
server is running as a standby but cannot accept connections with
hot_standby=off".

I wanted to get some initial feedback on the idea before writing a patch:
does that seem like a reasonable change? Is it actually plausible to
distinguish between this state and "still recovering" (i.e., when starting
up a hot standby but initial recovery hasn't completed so it legitimately
can't accept connections yet)? If so, should we include the possibility if
hot_standby isn't on, just in case?

The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h,
which makes me wonder if changing this value would result in a wire
protocol change/something the client wants to know about? If so, I assume
it's not reasonable to change the value, but would it still be reasonable
to change the error text?

Thanks,
James Coleman

#2Andres Freund
andres@anarazel.de
In reply to: James Coleman (#1)
Re: Nicer error when connecting to standby with hot_standby=off

Hi,

On 2020-03-08 20:12:21 -0400, James Coleman wrote:

I recently noticed while setting up a test environment that attempting to
connect to a standby running without hot_standby=on results in a fairly
generic error (I believe "the database system is starting up"). I don't
have my test setup running right now, so can't confirm with a repro case at
the moment, but with a little bit of spelunking I noticed that error text
only shows up in src/backend/postmaster/postmaster.c when
port->canAcceptConnections has the value CAC_STARTUP.

Ideally the error message would include something along the lines of "The
server is running as a standby but cannot accept connections with
hot_standby=off".

Yea, something roughly like that would be good.

I wanted to get some initial feedback on the idea before writing a patch:
does that seem like a reasonable change? Is it actually plausible to
distinguish between this state and "still recovering" (i.e., when starting
up a hot standby but initial recovery hasn't completed so it legitimately
can't accept connections yet)? If so, should we include the possibility if
hot_standby isn't on, just in case?

Yes, it is feasible to distinguish those cases. And we should, if we're
going to change things around.

The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h,
which makes me wonder if changing this value would result in a wire
protocol change/something the client wants to know about? If so, I assume
it's not reasonable to change the value, but would it still be reasonable
to change the error text?

The value shouldn't be visible to clients in any way. While not obvious
from the name, there's this comment at the top of the header:

* Note that this is backend-internal and is NOT exported to clients.
* Structs that need to be client-visible are in pqcomm.h.

Greetings,

Andres Freund

#3James Coleman
jtc331@gmail.com
In reply to: Andres Freund (#2)
Re: Nicer error when connecting to standby with hot_standby=off

On Mon, Mar 9, 2020 at 6:28 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-03-08 20:12:21 -0400, James Coleman wrote:

I recently noticed while setting up a test environment that attempting to
connect to a standby running without hot_standby=on results in a fairly
generic error (I believe "the database system is starting up"). I don't
have my test setup running right now, so can't confirm with a repro case at
the moment, but with a little bit of spelunking I noticed that error text
only shows up in src/backend/postmaster/postmaster.c when
port->canAcceptConnections has the value CAC_STARTUP.

Ideally the error message would include something along the lines of "The
server is running as a standby but cannot accept connections with
hot_standby=off".

Yea, something roughly like that would be good.

Awesome, thanks for the early feedback!

I wanted to get some initial feedback on the idea before writing a patch:
does that seem like a reasonable change? Is it actually plausible to
distinguish between this state and "still recovering" (i.e., when starting
up a hot standby but initial recovery hasn't completed so it legitimately
can't accept connections yet)? If so, should we include the possibility if
hot_standby isn't on, just in case?

Yes, it is feasible to distinguish those cases. And we should, if we're
going to change things around.

I'll look into this hopefully soon, but it's helpful to know that it's
possible. Is it basically along the lines of checking to see if the
LSN is past the minimum recovery point?

The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h,
which makes me wonder if changing this value would result in a wire
protocol change/something the client wants to know about? If so, I assume
it's not reasonable to change the value, but would it still be reasonable
to change the error text?

The value shouldn't be visible to clients in any way. While not obvious
from the name, there's this comment at the top of the header:

* Note that this is backend-internal and is NOT exported to clients.
* Structs that need to be client-visible are in pqcomm.h.

This is also helpful.

Thanks,
James

#4Andres Freund
andres@anarazel.de
In reply to: James Coleman (#3)
Re: Nicer error when connecting to standby with hot_standby=off

Hi,

On 2020-03-09 18:40:32 -0400, James Coleman wrote:

On Mon, Mar 9, 2020 at 6:28 PM Andres Freund <andres@anarazel.de> wrote:

I wanted to get some initial feedback on the idea before writing a patch:
does that seem like a reasonable change? Is it actually plausible to
distinguish between this state and "still recovering" (i.e., when starting
up a hot standby but initial recovery hasn't completed so it legitimately
can't accept connections yet)? If so, should we include the possibility if
hot_standby isn't on, just in case?

Yes, it is feasible to distinguish those cases. And we should, if we're
going to change things around.

I'll look into this hopefully soon, but it's helpful to know that it's
possible. Is it basically along the lines of checking to see if the
LSN is past the minimum recovery point?

No, I don't think that's the right approach. IIRC the startup process
(i.e. the one doing the WAL replay) signals postmaster once consistency
has been achieved. So you can just use that state.

Greetings,

Andres Freund

#5James Coleman
jtc331@gmail.com
In reply to: Andres Freund (#4)
1 attachment(s)
Re: Nicer error when connecting to standby with hot_standby=off

On Mon, Mar 9, 2020 at 8:06 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-03-09 18:40:32 -0400, James Coleman wrote:

On Mon, Mar 9, 2020 at 6:28 PM Andres Freund <andres@anarazel.de> wrote:

I wanted to get some initial feedback on the idea before writing a patch:
does that seem like a reasonable change? Is it actually plausible to
distinguish between this state and "still recovering" (i.e., when starting
up a hot standby but initial recovery hasn't completed so it legitimately
can't accept connections yet)? If so, should we include the possibility if
hot_standby isn't on, just in case?

Yes, it is feasible to distinguish those cases. And we should, if we're
going to change things around.

I'll look into this hopefully soon, but it's helpful to know that it's
possible. Is it basically along the lines of checking to see if the
LSN is past the minimum recovery point?

No, I don't think that's the right approach. IIRC the startup process
(i.e. the one doing the WAL replay) signals postmaster once consistency
has been achieved. So you can just use that state.

I've taken that approach in the attached patch (I'd expected to wait
until later to work on this...but it seemed pretty small so I ended up
hacking on it this evening).

I don't have tests included: I tried intentionally breaking the
existing behavior (returning no error when hot_standby=off), but
running make check-world (including tap tests) didn't find any
breakages. I can look into that more deeply at some point, but if you
happen to know a place we test similar things, then I'd be happy to
hear it.

One other question: how is error message translation handled? I
haven't added entries to the relevant files, but also I'm obviously
not qualified to write them.

Thanks,
James

Attachments:

v1-0001-Improve-standby-connection-denied-error-message.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Improve-standby-connection-denied-error-message.patchDownload
From 0389f9dd7b48b3058849960826a49079abb58e58 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Mon, 9 Mar 2020 21:45:15 -0400
Subject: [PATCH v1] Improve standby connection denied error message.

Currently when a standby is finished starting up but hot_standby is
configured to off, the error message when a client connects is "the
database system is starting up", which is needless confusing (and not
really all that accurate either).

Instead send a helpful error message so that the user immediately knows
that their server is configured to deny these connections.
---
 src/backend/postmaster/postmaster.c | 12 +++++++++---
 src/include/libpq/libpq-be.h        |  7 ++++++-
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 55187eb910..63258287f7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2287,6 +2287,12 @@ retry1:
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
 					 errmsg("the database system is starting up")));
 			break;
+		case CAC_STANDBY:
+			ereport(FATAL,
+					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+					 errmsg("the database system is up, but hot_standby is off")));
+			break;
+
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
@@ -2436,10 +2442,10 @@ canAcceptConnections(int backend_type)
 			result = CAC_WAITBACKUP;	/* allow superusers only */
 		else if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
-				 (pmState == PM_STARTUP ||
-				  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_STANDBY; /* connection disallowed on non-hot standby */
 		else if (!FatalError &&
 				 pmState == PM_HOT_STANDBY)
 			result = CAC_OK;	/* connection OK during hot standby */
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 82e57afc64..effa78a84a 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_STANDBY,
+	CAC_TOOMANY,
 	CAC_WAITBACKUP
 } CAC_state;
 

base-commit: 71e0d0a73773b3985db658d3c5366ce5ceef76ae
-- 
2.17.1

#6David Zhang
david.zhang@highgo.ca
In reply to: James Coleman (#5)
Re: Nicer error when connecting to standby with hot_standby=off

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

I applied the patch to the latest master branch and run a test below. The error messages have been separated. Below is the test steps.

### setup primary server
initdb -D /tmp/primary/data
mkdir /tmp/archive_dir
echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start

### setup host standby server
pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start

### keep trying to connect to hot standby server in order to get the error messages in different stages.
while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done

### before the patch
psql: error: could not connect to server: FATAL: the database system is starting up
...

### after the patch, got different messages, one message indicates hot_standby is off
psql: error: could not connect to server: FATAL: the database system is starting up
...
psql: error: could not connect to server: FATAL: the database system is up, but hot_standby is off
...

#7James Coleman
jtc331@gmail.com
In reply to: David Zhang (#6)
Re: Nicer error when connecting to standby with hot_standby=off

On Thu, Apr 2, 2020 at 5:53 PM David Zhang <david.zhang@highgo.ca> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

I applied the patch to the latest master branch and run a test below. The error messages have been separated. Below is the test steps.

### setup primary server
initdb -D /tmp/primary/data
mkdir /tmp/archive_dir
echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start

### setup host standby server
pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start

### keep trying to connect to hot standby server in order to get the error messages in different stages.
while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done

### before the patch
psql: error: could not connect to server: FATAL: the database system is starting up
...

### after the patch, got different messages, one message indicates hot_standby is off
psql: error: could not connect to server: FATAL: the database system is starting up
...
psql: error: could not connect to server: FATAL: the database system is up, but hot_standby is off
...

Thanks for the review and testing!

James

#8Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: James Coleman (#7)
Re: Nicer error when connecting to standby with hot_standby=off

On 2020/04/03 22:49, James Coleman wrote:

On Thu, Apr 2, 2020 at 5:53 PM David Zhang <david.zhang@highgo.ca> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

I applied the patch to the latest master branch and run a test below. The error messages have been separated. Below is the test steps.

### setup primary server
initdb -D /tmp/primary/data
mkdir /tmp/archive_dir
echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start

### setup host standby server
pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start

### keep trying to connect to hot standby server in order to get the error messages in different stages.
while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done

### before the patch
psql: error: could not connect to server: FATAL: the database system is starting up
...

### after the patch, got different messages, one message indicates hot_standby is off
psql: error: could not connect to server: FATAL: the database system is starting up
...
psql: error: could not connect to server: FATAL: the database system is up, but hot_standby is off
...

Thanks for the review and testing!

Thanks for the patch! Here is the comment from me.

+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_STANDBY; /* connection disallowed on non-hot standby */

Even if hot_standby is enabled, pmState seems to indicate PM_RECOVERY
until recovery has reached a consistent state. No? So, if my understanding
is right, "FATAL: the database system is up, but hot_standby is off" can
be logged even when hot_standby is on. Which sounds very confusing.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#9James Coleman
jtc331@gmail.com
In reply to: Fujii Masao (#8)
1 attachment(s)
Re: Nicer error when connecting to standby with hot_standby=off

On Wed, Jul 29, 2020 at 11:24 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/04/03 22:49, James Coleman wrote:

On Thu, Apr 2, 2020 at 5:53 PM David Zhang <david.zhang@highgo.ca> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

I applied the patch to the latest master branch and run a test below. The error messages have been separated. Below is the test steps.

### setup primary server
initdb -D /tmp/primary/data
mkdir /tmp/archive_dir
echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start

### setup host standby server
pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start

### keep trying to connect to hot standby server in order to get the error messages in different stages.
while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done

### before the patch
psql: error: could not connect to server: FATAL: the database system is starting up
...

### after the patch, got different messages, one message indicates hot_standby is off
psql: error: could not connect to server: FATAL: the database system is starting up
...
psql: error: could not connect to server: FATAL: the database system is up, but hot_standby is off
...

Thanks for the review and testing!

Thanks for the patch! Here is the comment from me.

+               else if (!FatalError && pmState == PM_RECOVERY)
+                       return CAC_STANDBY; /* connection disallowed on non-hot standby */

Even if hot_standby is enabled, pmState seems to indicate PM_RECOVERY
until recovery has reached a consistent state. No? So, if my understanding
is right, "FATAL: the database system is up, but hot_standby is off" can
be logged even when hot_standby is on. Which sounds very confusing.

That's a good point. I've attached a corrected version.

I still don't have a good idea for how to add a test for this change.
If a test for this warranted, I'd be interested in any ideas.

James

Attachments:

v2-0001-Improve-standby-connection-denied-error-message.patchapplication/octet-stream; name=v2-0001-Improve-standby-connection-denied-error-message.patchDownload
From e81ceaed041e6b84139071842e36d5445eceeeba Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Mon, 9 Mar 2020 21:45:15 -0400
Subject: [PATCH v2] Improve standby connection denied error message.

Currently when a standby is finished starting up but hot_standby is
configured to off, the error message when a client connects is "the
database system is starting up", which is needless confusing (and not
really all that accurate either).

Instead send a helpful error message so that the user immediately knows
that their server is configured to deny these connections.
---
 src/backend/postmaster/postmaster.c | 14 +++++++++++---
 src/include/libpq/libpq-be.h        |  7 ++++++-
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5b5fc97c72..456c0e466b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2305,6 +2305,12 @@ retry1:
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
 					 errmsg("the database system is starting up")));
 			break;
+		case CAC_STANDBY:
+			ereport(FATAL,
+					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+					 errmsg("the database system is up, but hot_standby is off")));
+			break;
+
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
@@ -2454,10 +2460,12 @@ canAcceptConnections(int backend_type)
 			result = CAC_WAITBACKUP;	/* allow superusers only */
 		else if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
-				 (pmState == PM_STARTUP ||
-				  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY && EnableHotStandby)
+			return CAC_STARTUP; /* hot standby is starting up */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_STANDBY; /* connection disallowed on non-hot standby */
 		else if (!FatalError &&
 				 pmState == PM_HOT_STANDBY)
 			result = CAC_OK;	/* connection OK during hot standby */
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 179ebaa104..e38fed5a1c 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_STANDBY,
+	CAC_TOOMANY,
 	CAC_WAITBACKUP
 } CAC_state;
 
-- 
2.20.1 (Apple Git-117)

#10Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: James Coleman (#9)
Re: Nicer error when connecting to standby with hot_standby=off

On 2020/08/01 5:18, James Coleman wrote:

On Wed, Jul 29, 2020 at 11:24 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/04/03 22:49, James Coleman wrote:

On Thu, Apr 2, 2020 at 5:53 PM David Zhang <david.zhang@highgo.ca> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

I applied the patch to the latest master branch and run a test below. The error messages have been separated. Below is the test steps.

### setup primary server
initdb -D /tmp/primary/data
mkdir /tmp/archive_dir
echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start

### setup host standby server
pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start

### keep trying to connect to hot standby server in order to get the error messages in different stages.
while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done

### before the patch
psql: error: could not connect to server: FATAL: the database system is starting up
...

### after the patch, got different messages, one message indicates hot_standby is off
psql: error: could not connect to server: FATAL: the database system is starting up
...
psql: error: could not connect to server: FATAL: the database system is up, but hot_standby is off
...

Thanks for the review and testing!

Thanks for the patch! Here is the comment from me.

+               else if (!FatalError && pmState == PM_RECOVERY)
+                       return CAC_STANDBY; /* connection disallowed on non-hot standby */

Even if hot_standby is enabled, pmState seems to indicate PM_RECOVERY
until recovery has reached a consistent state. No? So, if my understanding
is right, "FATAL: the database system is up, but hot_standby is off" can
be logged even when hot_standby is on. Which sounds very confusing.

That's a good point. I've attached a corrected version.

Thanks for updating the patch! But it failed to be applied to the master branch
cleanly because of the recent commit 0038f94387. So could you update the patch
again?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#11James Coleman
jtc331@gmail.com
In reply to: Fujii Masao (#10)
1 attachment(s)
Re: Nicer error when connecting to standby with hot_standby=off

On Tue, Aug 18, 2020 at 12:25 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

Thanks for updating the patch! But it failed to be applied to the master branch
cleanly because of the recent commit 0038f94387. So could you update the patch
again?

Updated patch attached.

James

Attachments:

v3-0001-Improve-standby-connection-denied-error-message.patchapplication/octet-stream; name=v3-0001-Improve-standby-connection-denied-error-message.patchDownload
From 1d0a2adee73e2e25863d0af4dd9055474ef7f4f4 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Mon, 9 Mar 2020 21:45:15 -0400
Subject: [PATCH v3] Improve standby connection denied error message.

Currently when a standby is finished starting up but hot_standby is
configured to off, the error message when a client connects is "the
database system is starting up", which is needless confusing (and not
really all that accurate either).

Instead send a helpful error message so that the user immediately knows
that their server is configured to deny these connections.
---
 src/backend/postmaster/postmaster.c | 14 +++++++++++---
 src/include/libpq/libpq-be.h        |  7 ++++++-
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 42223c0f61..9922d9a987 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2319,6 +2319,12 @@ retry1:
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
 					 errmsg("the database system is starting up")));
 			break;
+		case CAC_STANDBY:
+			ereport(FATAL,
+					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+					 errmsg("the database system is up, but hot_standby is off")));
+			break;
+
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
@@ -2460,10 +2466,12 @@ canAcceptConnections(int backend_type)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
-				 (pmState == PM_STARTUP ||
-				  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY && EnableHotStandby)
+			return CAC_STARTUP; /* hot standby is starting up */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_STANDBY; /* connection disallowed on non-hot standby */
 		else
 			return CAC_RECOVERY;	/* else must be crash recovery */
 	}
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 0a23281ad5..12f4326763 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_STANDBY,
+	CAC_TOOMANY,
 	CAC_SUPERUSER
 } CAC_state;
 
-- 
2.20.1

#12David Steele
david@pgmasters.net
In reply to: James Coleman (#11)
Re: Nicer error when connecting to standby with hot_standby=off

Hi Fujii,

On 9/8/20 1:17 PM, James Coleman wrote:

On Tue, Aug 18, 2020 at 12:25 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

Thanks for updating the patch! But it failed to be applied to the master branch
cleanly because of the recent commit 0038f94387. So could you update the patch
again?

Updated patch attached.

Any thoughts on the updated patch?

Regards,
--
-David
david@pgmasters.net

#13Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: David Steele (#12)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/05 22:45, David Steele wrote:

Hi Fujii,

On 9/8/20 1:17 PM, James Coleman wrote:

On Tue, Aug 18, 2020 at 12:25 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

Thanks for updating the patch! But it failed to be applied to the master branch
cleanly because of the recent commit 0038f94387. So could you update the patch
again?

Updated patch attached.

Any thoughts on the updated patch?

Thanks for the ping!

With the patch, if hot_standby is enabled, the message
"the database system is starting up" is output while the server is
in PM_RECOVERY state until it reaches the consistent recovery point.
On the other hand, if hot_standby is not enabled, the message
"the database system is up, but hot_standby is off" is output even
while the server is in that same situation. That is, opposite
messages can be output for the same situation based on the setting
of hot_standby. One message is "system is starting up", the other
is "system is up". Isn't this rather confusing?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#14James Coleman
jtc331@gmail.com
In reply to: Fujii Masao (#13)
Re: Nicer error when connecting to standby with hot_standby=off

On Fri, Mar 5, 2021 at 12:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/05 22:45, David Steele wrote:

Hi Fujii,

On 9/8/20 1:17 PM, James Coleman wrote:

On Tue, Aug 18, 2020 at 12:25 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

Thanks for updating the patch! But it failed to be applied to the master branch
cleanly because of the recent commit 0038f94387. So could you update the patch
again?

Updated patch attached.

Any thoughts on the updated patch?

Thanks for the ping!

With the patch, if hot_standby is enabled, the message
"the database system is starting up" is output while the server is
in PM_RECOVERY state until it reaches the consistent recovery point.
On the other hand, if hot_standby is not enabled, the message
"the database system is up, but hot_standby is off" is output even
while the server is in that same situation. That is, opposite
messages can be output for the same situation based on the setting
of hot_standby. One message is "system is starting up", the other
is "system is up". Isn't this rather confusing?

Do you have any thoughts on what you'd like to see the message be? I
could change the PM_RECOVERY (without hot standby enabled) to return
CAC_RECOVERY which would give us the message "the database system is
in recovery mode", but that would be a change from what that state
returns now in a way that's unrelated to the goal of the patch.

Thanks,
James

#15Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: James Coleman (#14)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021-Mar-05, James Coleman wrote:

Do you have any thoughts on what you'd like to see the message be? I
could change the PM_RECOVERY (without hot standby enabled) to return
CAC_RECOVERY which would give us the message "the database system is
in recovery mode", but that would be a change from what that state
returns now in a way that's unrelated to the goal of the patch.

Here's an idea:

* hot_standby=on, before reaching consistent state
FATAL: database is not accepting connections
DETAIL: Consistent state has not yet been reached.

* hot_standby=off, past consistent state
FATAL: database is not accepting connections
DETAIL: Hot standby mode is disabled.

* hot_standby=off, before reaching consistent state
FATAL: database is not accepting connections
DETAIL: Hot standby mode is disabled.
or maybe
DETAIL: Consistent state has not yet been reached, and hot standby mode is disabled.

--
�lvaro Herrera 39�49'30"S 73�17'W
"Ed is the standard text editor."
http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3

#16Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Alvaro Herrera (#15)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/06 5:37, Alvaro Herrera wrote:

On 2021-Mar-05, James Coleman wrote:

Do you have any thoughts on what you'd like to see the message be? I
could change the PM_RECOVERY (without hot standby enabled) to return
CAC_RECOVERY which would give us the message "the database system is
in recovery mode", but that would be a change from what that state
returns now in a way that's unrelated to the goal of the patch.

Here's an idea:

* hot_standby=on, before reaching consistent state
FATAL: database is not accepting connections
DETAIL: Consistent state has not yet been reached.

* hot_standby=off, past consistent state
FATAL: database is not accepting connections
DETAIL: Hot standby mode is disabled.

* hot_standby=off, before reaching consistent state
FATAL: database is not accepting connections

This idea looks good to me!

DETAIL: Hot standby mode is disabled.
or maybe
DETAIL: Consistent state has not yet been reached, and hot standby mode is disabled.

I prefer the former message. Because the latter message meams that
we need to output the different messages based on whether the consistent
state is reached or not, and the followings would be necessary to implement
that. This looks a bit overkill to me against the purpose, at least for me.

- The startup process needs to send new signal
(e.g., PMSIGNAL_RECOVERY_CONSISTENT) to postmaster when the consistent
state has been reached, to let postmaster know that state.

- When receiving that signal, postmaster needs to move its state to new state
(e.g., PM_CONSISTENT).

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#17Magnus Hagander
magnus@hagander.net
In reply to: Fujii Masao (#16)
Re: Nicer error when connecting to standby with hot_standby=off

On Sun, Mar 7, 2021 at 3:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/06 5:37, Alvaro Herrera wrote:

On 2021-Mar-05, James Coleman wrote:

Do you have any thoughts on what you'd like to see the message be? I
could change the PM_RECOVERY (without hot standby enabled) to return
CAC_RECOVERY which would give us the message "the database system is
in recovery mode", but that would be a change from what that state
returns now in a way that's unrelated to the goal of the patch.

Here's an idea:

* hot_standby=on, before reaching consistent state
FATAL: database is not accepting connections
DETAIL: Consistent state has not yet been reached.

* hot_standby=off, past consistent state
FATAL: database is not accepting connections
DETAIL: Hot standby mode is disabled.

* hot_standby=off, before reaching consistent state
FATAL: database is not accepting connections

This idea looks good to me!

+1.

DETAIL: Hot standby mode is disabled.
or maybe
DETAIL: Consistent state has not yet been reached, and hot standby mode is disabled.

I prefer the former message. Because the latter message meams that
we need to output the different messages based on whether the consistent
state is reached or not, and the followings would be necessary to implement
that. This looks a bit overkill to me against the purpose, at least for me.

Agreed. If hot standby is off, why would the admin care about whether
it's consistent yet or not?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#18Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Magnus Hagander (#17)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021-Mar-07, Magnus Hagander wrote:

On Sun, Mar 7, 2021 at 3:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Here's an idea:

* hot_standby=on, before reaching consistent state
FATAL: database is not accepting connections
DETAIL: Consistent state has not yet been reached.

* hot_standby=off, past consistent state
FATAL: database is not accepting connections
DETAIL: Hot standby mode is disabled.

* hot_standby=off, before reaching consistent state
FATAL: database is not accepting connections

[...]

DETAIL: Hot standby mode is disabled.

I prefer the former message. Because the latter message meams that
we need to output the different messages based on whether the consistent
state is reached or not, and the followings would be necessary to implement
that. This looks a bit overkill to me against the purpose, at least for me.

Agreed. If hot standby is off, why would the admin care about whether
it's consistent yet or not?

Great, so we're agreed on the messages to emit. James, are you updating
your patch, considering Fujii's note about the new signal and pmstate
that need to be added?

--
�lvaro Herrera 39�49'30"S 73�17'W

#19James Coleman
jtc331@gmail.com
In reply to: Alvaro Herrera (#18)
Re: Nicer error when connecting to standby with hot_standby=off

On Tue, Mar 9, 2021 at 8:47 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Mar-07, Magnus Hagander wrote:

On Sun, Mar 7, 2021 at 3:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Here's an idea:

* hot_standby=on, before reaching consistent state
FATAL: database is not accepting connections
DETAIL: Consistent state has not yet been reached.

* hot_standby=off, past consistent state
FATAL: database is not accepting connections
DETAIL: Hot standby mode is disabled.

* hot_standby=off, before reaching consistent state
FATAL: database is not accepting connections

[...]

DETAIL: Hot standby mode is disabled.

I prefer the former message. Because the latter message meams that
we need to output the different messages based on whether the consistent
state is reached or not, and the followings would be necessary to implement
that. This looks a bit overkill to me against the purpose, at least for me.

Agreed. If hot standby is off, why would the admin care about whether
it's consistent yet or not?

Great, so we're agreed on the messages to emit. James, are you updating
your patch, considering Fujii's note about the new signal and pmstate
that need to be added?

Perhaps I'm missing something, but I was under the impression the
"prefer the former message" meant we were not adding a new signal and
pmstate?

James

#20Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: James Coleman (#19)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021-Mar-09, James Coleman wrote:

On Tue, Mar 9, 2021 at 8:47 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Mar-07, Magnus Hagander wrote:

On Sun, Mar 7, 2021 at 3:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Great, so we're agreed on the messages to emit. James, are you updating
your patch, considering Fujii's note about the new signal and pmstate
that need to be added?

Perhaps I'm missing something, but I was under the impression the
"prefer the former message" meant we were not adding a new signal and
pmstate?

Eh, I read that differently. I was proposing two options for the DETAIL
line in that case:

DETAIL: Hot standby mode is disabled.
or maybe
DETAIL: Consistent state has not yet been reached, and hot standby mode is disabled.

and both Fujii and Magnus said they prefer the first option over the
second option. I don't read any of them as saying that they would like
to do something else (including not doing anything).

Maybe I misinterpreted them?

--
�lvaro Herrera Valdivia, Chile

#21James Coleman
jtc331@gmail.com
In reply to: Alvaro Herrera (#20)
Re: Nicer error when connecting to standby with hot_standby=off

On Tue, Mar 9, 2021 at 9:07 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Mar-09, James Coleman wrote:

On Tue, Mar 9, 2021 at 8:47 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Mar-07, Magnus Hagander wrote:

On Sun, Mar 7, 2021 at 3:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Great, so we're agreed on the messages to emit. James, are you updating
your patch, considering Fujii's note about the new signal and pmstate
that need to be added?

Perhaps I'm missing something, but I was under the impression the
"prefer the former message" meant we were not adding a new signal and
pmstate?

Eh, I read that differently. I was proposing two options for the DETAIL
line in that case:

DETAIL: Hot standby mode is disabled.
or maybe
DETAIL: Consistent state has not yet been reached, and hot standby mode is disabled.

and both Fujii and Magnus said they prefer the first option over the
second option. I don't read any of them as saying that they would like
to do something else (including not doing anything).

Maybe I misinterpreted them?

Yes, I think they both agreed on the "DETAIL: Hot standby mode is
disabled." message, but that alternative meant not needing to add any
new signals and pm states, correct?

Thanks,
James

#22Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#20)
Re: Nicer error when connecting to standby with hot_standby=off

On Tue, Mar 9, 2021 at 3:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Mar-09, James Coleman wrote:

On Tue, Mar 9, 2021 at 8:47 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Mar-07, Magnus Hagander wrote:

On Sun, Mar 7, 2021 at 3:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Great, so we're agreed on the messages to emit. James, are you updating
your patch, considering Fujii's note about the new signal and pmstate
that need to be added?

Perhaps I'm missing something, but I was under the impression the
"prefer the former message" meant we were not adding a new signal and
pmstate?

Eh, I read that differently. I was proposing two options for the DETAIL
line in that case:

DETAIL: Hot standby mode is disabled.
or maybe
DETAIL: Consistent state has not yet been reached, and hot standby mode is disabled.

and both Fujii and Magnus said they prefer the first option over the
second option. I don't read any of them as saying that they would like
to do something else (including not doing anything).

Maybe I misinterpreted them?

That is indeed what I meant as well.

The reference to "the former" as being the "first or the two new
options", not the "old option". That is, "DETAIL: Hot standby mode is
disabled.".

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#23Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: James Coleman (#21)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021-Mar-09, James Coleman wrote:

Yes, I think they both agreed on the "DETAIL: Hot standby mode is
disabled." message, but that alternative meant not needing to add any
new signals and pm states, correct?

Ah, I see! I was thinking that you still needed the state and signal in
order to print the correct message in hot-standby mode, but that's
(obviously!) wrong. So you're right that no signal/state are needed.

Thanks

--
�lvaro Herrera Valdivia, Chile
Si no sabes adonde vas, es muy probable que acabes en otra parte.

#24James Coleman
jtc331@gmail.com
In reply to: Alvaro Herrera (#23)
Re: Nicer error when connecting to standby with hot_standby=off

On Tue, Mar 9, 2021 at 9:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Mar-09, James Coleman wrote:

Yes, I think they both agreed on the "DETAIL: Hot standby mode is
disabled." message, but that alternative meant not needing to add any
new signals and pm states, correct?

Ah, I see! I was thinking that you still needed the state and signal in
order to print the correct message in hot-standby mode, but that's
(obviously!) wrong. So you're right that no signal/state are needed.

Cool. And yes, I'm planning to update the patch soon.

Thanks,
James

#25Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: James Coleman (#24)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/09 23:19, James Coleman wrote:

On Tue, Mar 9, 2021 at 9:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Mar-09, James Coleman wrote:

Yes, I think they both agreed on the "DETAIL: Hot standby mode is
disabled." message, but that alternative meant not needing to add any
new signals and pm states, correct?

Ah, I see! I was thinking that you still needed the state and signal in
order to print the correct message in hot-standby mode, but that's
(obviously!) wrong. So you're right that no signal/state are needed.

Cool. And yes, I'm planning to update the patch soon.

+1. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#26James Coleman
jtc331@gmail.com
In reply to: Fujii Masao (#25)
1 attachment(s)
Re: Nicer error when connecting to standby with hot_standby=off

On Tue, Mar 9, 2021 at 9:27 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/09 23:19, James Coleman wrote:

On Tue, Mar 9, 2021 at 9:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Mar-09, James Coleman wrote:

Yes, I think they both agreed on the "DETAIL: Hot standby mode is
disabled." message, but that alternative meant not needing to add any
new signals and pm states, correct?

Ah, I see! I was thinking that you still needed the state and signal in
order to print the correct message in hot-standby mode, but that's
(obviously!) wrong. So you're right that no signal/state are needed.

Cool. And yes, I'm planning to update the patch soon.

+1. Thanks!

Here's an updated patch; I think I've gotten what Alvaro suggested.

Thanks,
James

Attachments:

v4-0001-Improve-standby-connection-denied-error-message.patchapplication/octet-stream; name=v4-0001-Improve-standby-connection-denied-error-message.patchDownload
From d70ca01ed896c7b87dc5b73de382f924828f8bac Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Mon, 9 Mar 2020 21:45:15 -0400
Subject: [PATCH v4] Improve standby connection denied error message.

Currently when a standby is finished starting up but hot_standby is
configured to off, the error message when a client connects is "the
database system is starting up", which is needless confusing (and not
really all that accurate either).

Instead send a helpful error message so that the user immediately knows
that their server is configured to deny these connections.
---
 src/backend/postmaster/postmaster.c | 16 ++++++++++++----
 src/include/libpq/libpq-be.h        |  7 ++++++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index edab95a19e..ac2877bdc9 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2289,8 +2289,14 @@ retry1:
 		case CAC_STARTUP:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
-					 errmsg("the database system is starting up")));
+					 errmsg("consistent state has not yet been reached")));
 			break;
+		case CAC_STANDBY:
+			ereport(FATAL,
+					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+					 errmsg("hot standby mode is disabled")));
+			break;
+
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
@@ -2432,10 +2438,12 @@ canAcceptConnections(int backend_type)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
-				 (pmState == PM_STARTUP ||
-				  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY && EnableHotStandby)
+			return CAC_STARTUP; /* hot standby is starting up */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_STANDBY; /* connection disallowed on non-hot standby */
 		else
 			return CAC_RECOVERY;	/* else must be crash recovery */
 	}
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 30fb4e613d..908eeb5469 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_STANDBY,
+	CAC_TOOMANY,
 	CAC_SUPERUSER
 } CAC_state;
 
-- 
2.20.1

#27Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: James Coleman (#26)
1 attachment(s)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/19 23:35, James Coleman wrote:

Here's an updated patch; I think I've gotten what Alvaro suggested.

Thanks for updating the patch! But I was thinking that our consensus is
something like the attached patch. Could you check this patch?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

v5-0001-Improve-standby-connection-denied-error-message.patchtext/plain; charset=UTF-8; name=v5-0001-Improve-standby-connection-denied-error-message.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ef0be4ca38..a800568d14 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2294,6 +2294,19 @@ retry1:
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
 					 errmsg("the database system is starting up")));
 			break;
+		case CAC_NOCONSISTENT:
+			if (EnableHotStandby)
+				ereport(FATAL,
+						(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+						 errmsg("the database system is not accepting connections"),
+						 errdetail("Consistent recovery state has not been yet reached.")));
+			else
+				ereport(FATAL,
+						(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+						 errmsg("the database system is not accepting connections"),
+						 errdetail("Hot standby mode is disabled.")));
+			break;
+
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
@@ -2435,10 +2448,11 @@ canAcceptConnections(int backend_type)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
-				 (pmState == PM_STARTUP ||
-				  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_NOCONSISTENT;	/* consistent recovery state has not
+										 * been yet reached */
 		else
 			return CAC_RECOVERY;	/* else must be crash recovery */
 	}
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 30fb4e613d..299528255c 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_NOCONSISTENT,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_TOOMANY,
 	CAC_SUPERUSER
 } CAC_state;
 
#28James Coleman
jtc331@gmail.com
In reply to: Fujii Masao (#27)
1 attachment(s)
Re: Nicer error when connecting to standby with hot_standby=off

On Mon, Mar 22, 2021 at 1:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/19 23:35, James Coleman wrote:

Here's an updated patch; I think I've gotten what Alvaro suggested.

Thanks for updating the patch! But I was thinking that our consensus is
something like the attached patch. Could you check this patch?

As far as I can tell (I might be missing something) your v5 patch does
the same thing, albeit with different code organization. It did catch
though that I'd neglected to add the DETAIL line as separate from the
errmsg line.

Is the attached (in the style of my v4, since I'm not following why we
need to move the standby determination logic into a new
CAC_NOCONSISTENT block) what you're thinking? Or is there something
else I'm missing?

Thanks,
James

Attachments:

v6-0001-Improve-standby-connection-denied-error-message.patchapplication/octet-stream; name=v6-0001-Improve-standby-connection-denied-error-message.patchDownload
From 67ebf3b8f78814e727f2cfdb08ab41ff9ffd03ad Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Mon, 9 Mar 2020 21:45:15 -0400
Subject: [PATCH v6] Improve standby connection denied error message.

Currently when a standby is finished starting up but hot_standby is
configured to off, the error message when a client connects is "the
database system is starting up", which is needless confusing (and not
really all that accurate either).

Instead send a helpful error message so that the user immediately knows
that their server is configured to deny these connections.
---
 src/backend/postmaster/postmaster.c | 18 ++++++++++++++----
 src/include/libpq/libpq-be.h        |  7 ++++++-
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index edab95a19e..b23a83e6b7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2289,8 +2289,16 @@ retry1:
 		case CAC_STARTUP:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
-					 errmsg("the database system is starting up")));
+					 errmsg("the database system is not accepting connections"),
+					 errdetail("Consistent recovery state has not been yet reached.")));
 			break;
+		case CAC_STANDBY:
+			ereport(FATAL,
+					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+					 errmsg("the database system is not accepting connections"),
+					 errdetail("Hot standby mode is disabled.")));
+			break;
+
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
@@ -2432,10 +2440,12 @@ canAcceptConnections(int backend_type)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
-				 (pmState == PM_STARTUP ||
-				  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY && EnableHotStandby)
+			return CAC_STARTUP; /* hot standby is starting up */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_STANDBY; /* connection disallowed on non-hot standby */
 		else
 			return CAC_RECOVERY;	/* else must be crash recovery */
 	}
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 30fb4e613d..908eeb5469 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_STANDBY,
+	CAC_TOOMANY,
 	CAC_SUPERUSER
 } CAC_state;
 
-- 
2.20.1

#29Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: James Coleman (#28)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/23 3:25, James Coleman wrote:

On Mon, Mar 22, 2021 at 1:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/19 23:35, James Coleman wrote:

Here's an updated patch; I think I've gotten what Alvaro suggested.

Thanks for updating the patch! But I was thinking that our consensus is
something like the attached patch. Could you check this patch?

As far as I can tell (I might be missing something) your v5 patch does
the same thing, albeit with different code organization. It did catch
though that I'd neglected to add the DETAIL line as separate from the
errmsg line.

Is the attached (in the style of my v4, since I'm not following why we
need to move the standby determination logic into a new
CAC_NOCONSISTENT block) what you're thinking? Or is there something
else I'm missing?

I just did that to avoid adding more CAC_state. But basically it's
ok to check hot standby at either canAcceptConnections() or
ProcessStartupPacket().

  		case CAC_STARTUP:
  			ereport(FATAL,
  					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
-					 errmsg("the database system is starting up")));
+					 errmsg("the database system is not accepting connections"),
+					 errdetail("Consistent recovery state has not been yet reached.")));

Do you want to report this message even in crash recovery? Since crash
recovery is basically not so much related to "consistent recovery state",
at least for me the original message seems more suitable for crash recovery.

Also if we adopt this message, the server with hot_standby=off reports
"Consistent recovery state has not been yet reached." in PM_STARTUP,
but stops reporting this message at PM_RECOVERY even if the consistent
recovery state has not been reached yet. Instead, it reports "Hot standby
mode is disabled." at PM_RECOVERY. Isn't this transition of message confusing?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#30James Coleman
jtc331@gmail.com
In reply to: Fujii Masao (#29)
Re: Nicer error when connecting to standby with hot_standby=off

On Mon, Mar 22, 2021 at 2:52 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/23 3:25, James Coleman wrote:

On Mon, Mar 22, 2021 at 1:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/19 23:35, James Coleman wrote:

Here's an updated patch; I think I've gotten what Alvaro suggested.

Thanks for updating the patch! But I was thinking that our consensus is
something like the attached patch. Could you check this patch?

As far as I can tell (I might be missing something) your v5 patch does
the same thing, albeit with different code organization. It did catch
though that I'd neglected to add the DETAIL line as separate from the
errmsg line.

Is the attached (in the style of my v4, since I'm not following why we
need to move the standby determination logic into a new
CAC_NOCONSISTENT block) what you're thinking? Or is there something
else I'm missing?

I just did that to avoid adding more CAC_state. But basically it's
ok to check hot standby at either canAcceptConnections() or
ProcessStartupPacket().

case CAC_STARTUP:
ereport(FATAL,
(errcode(ERRCODE_CANNOT_CONNECT_NOW),
-                                        errmsg("the database system is starting up")));
+                                        errmsg("the database system is not accepting connections"),
+                                        errdetail("Consistent recovery state has not been yet reached.")));

Do you want to report this message even in crash recovery? Since crash
recovery is basically not so much related to "consistent recovery state",
at least for me the original message seems more suitable for crash recovery.

Also if we adopt this message, the server with hot_standby=off reports
"Consistent recovery state has not been yet reached." in PM_STARTUP,
but stops reporting this message at PM_RECOVERY even if the consistent
recovery state has not been reached yet. Instead, it reports "Hot standby
mode is disabled." at PM_RECOVERY. Isn't this transition of message confusing?

Are you saying we should only change the message for a single case:
the case where we'd otherwise allow connections but EnableHotStandby
is false? I believe that's what the original patch did, but then
Alvaro's proposal included changing additional messages.

James Coleman

#31Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: James Coleman (#30)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/23 3:59, James Coleman wrote:

Are you saying we should only change the message for a single case:
the case where we'd otherwise allow connections but EnableHotStandby
is false?

No. Let me clarify my opinion.

At PM_STARTUP, "the database system is starting up" should be logged
whatever the setting of hot_standby is. This is the same as the original
behavior. During crash recovery, this message is output. Also at archive
recovery or standby server, until the startup process sends
PMSIGNAL_RECOVERY_STARTED, this message is logged.

At PM_RECOVERY, originally "the database system is starting up" was logged
whatever the setting of hot_standby is. My opinion is the same as our
consensus, i.e., "the database system is not accepting connections" and
"Hot standby mode is disabled." are logged if hot_standby is disabled.
"the database system is not accepting connections" and "Consistent
recovery state has not been yet reached." are logged if hot_standby is
enabled.

After the consistent recovery state is reached, if hot_standby is disabled,
the postmaster state is still PM_RECOVERY. So "Hot standby mode is disabled."
is still logged in this case. This is also different behavior from the original.
If hot_standby is enabled, read-only connections can be accepted because
the consistent state is reached. So no message needs to be logged.

Therefore for now what we've not reached the consensus is what message
should be logged at PM_STARTUP. I'm thinking it's better to log
"the database system is starting up" in that case because of the reasons
that I explained upthread.

Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#32James Coleman
jtc331@gmail.com
In reply to: Fujii Masao (#31)
1 attachment(s)
Re: Nicer error when connecting to standby with hot_standby=off

On Tue, Mar 23, 2021 at 1:46 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/23 3:59, James Coleman wrote:

Are you saying we should only change the message for a single case:
the case where we'd otherwise allow connections but EnableHotStandby
is false?

No. Let me clarify my opinion.

At PM_STARTUP, "the database system is starting up" should be logged
whatever the setting of hot_standby is. This is the same as the original
behavior. During crash recovery, this message is output. Also at archive
recovery or standby server, until the startup process sends
PMSIGNAL_RECOVERY_STARTED, this message is logged.

At PM_RECOVERY, originally "the database system is starting up" was logged
whatever the setting of hot_standby is. My opinion is the same as our
consensus, i.e., "the database system is not accepting connections" and
"Hot standby mode is disabled." are logged if hot_standby is disabled.
"the database system is not accepting connections" and "Consistent
recovery state has not been yet reached." are logged if hot_standby is
enabled.

After the consistent recovery state is reached, if hot_standby is disabled,
the postmaster state is still PM_RECOVERY. So "Hot standby mode is disabled."
is still logged in this case. This is also different behavior from the original.
If hot_standby is enabled, read-only connections can be accepted because
the consistent state is reached. So no message needs to be logged.

Therefore for now what we've not reached the consensus is what message
should be logged at PM_STARTUP. I'm thinking it's better to log
"the database system is starting up" in that case because of the reasons
that I explained upthread.

Thought?

I understand your point now, and I agree, that makes sense.

The attached takes a similar approach to your v5, but I've used
CAC_NOTCONSISTENT instead of CAC_NOCONSISTENT because I think it reads
better (CAC_INCONSISTENT would technically be better English,
but...also it doesn't parallel the code and error message).

Thoughts?

James Coleman

Attachments:

v7-0001-Improve-standby-connection-denied-error-message.patchapplication/octet-stream; name=v7-0001-Improve-standby-connection-denied-error-message.patchDownload
From aa5f72c17e1fe80603414ca05418d93c66b6160f Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Mon, 9 Mar 2020 21:45:15 -0400
Subject: [PATCH v7] Improve standby connection denied error message.

Currently when a standby is finished starting up but hot_standby is
configured to off, the error message when a client connects is "the
database system is starting up", which is needless confusing (and not
really all that accurate either).

Instead send a helpful error message so that the user immediately knows
that their server is configured to deny these connections.
---
 src/backend/postmaster/postmaster.c | 20 ++++++++++++++++----
 src/include/libpq/libpq-be.h        |  7 ++++++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index edab95a19e..646ffca66d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2289,7 +2289,19 @@ retry1:
 		case CAC_STARTUP:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
-					 errmsg("the database system is starting up")));
+					 errmsg("the database system is not accepting connections")));
+			break;
+		case CAC_NOTCONSISTENT:
+			if (EnableHotStandby)
+				ereport(FATAL,
+						(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+						 errmsg("the database system is not accepting connections"),
+						 errdetail("Consistent recovery state has not been yet reached.")));
+			else
+				ereport(FATAL,
+						(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+						 errmsg("the database system is not accepting connections"),
+						 errdetail("Hot standby mode is disabled.")));
 			break;
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
@@ -2432,10 +2444,10 @@ canAcceptConnections(int backend_type)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
-				 (pmState == PM_STARTUP ||
-				  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_NOTCONSISTENT; /* not yet at consistent recovery state */
 		else
 			return CAC_RECOVERY;	/* else must be crash recovery */
 	}
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 30fb4e613d..891394b0c3 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_NOTCONSISTENT,
+	CAC_TOOMANY,
 	CAC_SUPERUSER
 } CAC_state;
 
-- 
2.20.1

#33Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: James Coleman (#32)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021-Mar-23, James Coleman wrote:

On Tue, Mar 23, 2021 at 1:46 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Therefore for now what we've not reached the consensus is what message
should be logged at PM_STARTUP. I'm thinking it's better to log
"the database system is starting up" in that case because of the reasons
that I explained upthread.

I understand your point now, and I agree, that makes sense.

Please note that PM_STARTUP mode is very very short-lived. It only
starts happening when postmaster launches the startup process, and
before the startup process begins WAL replay (as changed by
sigusr1_handler in postmaster.c). Once WAL replay begins, the PM status
changes to PM_RECOVERY. So I don't think we really care all that much
what message is logged in this case. It changes very quickly into the
CAC_NOTCONSISTENT message anyway. For this state, it seems okay with
either what James submitted in v7, or what Fujii said.

However, for this one

+       case CAC_NOTCONSISTENT:
+           if (EnableHotStandby)
+               ereport(FATAL,
+                       (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+                        errmsg("the database system is not accepting connections"),
+                        errdetail("Consistent recovery state has not been yet reached.")));

Maybe it makes sense to say "... is not accepting connections *yet*".
That'd be a tad redundant with what the DETAIL says, but that seems
acceptable.

--
�lvaro Herrera 39�49'30"S 73�17'W

#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#33)
Re: Nicer error when connecting to standby with hot_standby=off

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

However, for this one

+       case CAC_NOTCONSISTENT:
+           if (EnableHotStandby)
+               ereport(FATAL,
+                       (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+                        errmsg("the database system is not accepting connections"),
+                        errdetail("Consistent recovery state has not been yet reached.")));

Maybe it makes sense to say "... is not accepting connections *yet*".

+1, but I think "... is not yet accepting connections" is slightly
better style.

regards, tom lane

#35James Coleman
jtc331@gmail.com
In reply to: Tom Lane (#34)
1 attachment(s)
Re: Nicer error when connecting to standby with hot_standby=off

On Tue, Mar 23, 2021 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

However, for this one

+       case CAC_NOTCONSISTENT:
+           if (EnableHotStandby)
+               ereport(FATAL,
+                       (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+                        errmsg("the database system is not accepting connections"),
+                        errdetail("Consistent recovery state has not been yet reached.")));

Maybe it makes sense to say "... is not accepting connections *yet*".

+1, but I think "... is not yet accepting connections" is slightly
better style.

All right, see attached v8.

James Coleman

Attachments:

v8-0001-Improve-standby-connection-denied-error-message.patchapplication/octet-stream; name=v8-0001-Improve-standby-connection-denied-error-message.patchDownload
From 4f59013636e3ade1c14132f24a1b27eed2d67ca4 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Mon, 9 Mar 2020 21:45:15 -0400
Subject: [PATCH v8] Improve standby connection denied error message.

Currently when a standby is finished starting up but hot_standby is
configured to off, the error message when a client connects is "the
database system is starting up", which is needless confusing (and not
really all that accurate either).

Instead send a helpful error message so that the user immediately knows
that their server is configured to deny these connections.
---
 src/backend/postmaster/postmaster.c | 20 ++++++++++++++++----
 src/include/libpq/libpq-be.h        |  7 ++++++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index edab95a19e..7f5a573f03 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2289,7 +2289,19 @@ retry1:
 		case CAC_STARTUP:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
-					 errmsg("the database system is starting up")));
+					 errmsg("the database system is not accepting connections")));
+			break;
+		case CAC_NOTCONSISTENT:
+			if (EnableHotStandby)
+				ereport(FATAL,
+						(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+						 errmsg("the database system is not yet accepting connections"),
+						 errdetail("Consistent recovery state has not been yet reached.")));
+			else
+				ereport(FATAL,
+						(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+						 errmsg("the database system is not accepting connections"),
+						 errdetail("Hot standby mode is disabled.")));
 			break;
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
@@ -2432,10 +2444,10 @@ canAcceptConnections(int backend_type)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
-				 (pmState == PM_STARTUP ||
-				  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_NOTCONSISTENT; /* not yet at consistent recovery state */
 		else
 			return CAC_RECOVERY;	/* else must be crash recovery */
 	}
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 30fb4e613d..891394b0c3 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_NOTCONSISTENT,
+	CAC_TOOMANY,
 	CAC_SUPERUSER
 } CAC_state;
 
-- 
2.20.1

#36Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Alvaro Herrera (#33)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/24 1:20, Alvaro Herrera wrote:

On 2021-Mar-23, James Coleman wrote:

On Tue, Mar 23, 2021 at 1:46 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Therefore for now what we've not reached the consensus is what message
should be logged at PM_STARTUP. I'm thinking it's better to log
"the database system is starting up" in that case because of the reasons
that I explained upthread.

I understand your point now, and I agree, that makes sense.

Please note that PM_STARTUP mode is very very short-lived. It only
starts happening when postmaster launches the startup process, and
before the startup process begins WAL replay (as changed by
sigusr1_handler in postmaster.c). Once WAL replay begins, the PM status
changes to PM_RECOVERY.

True if archive recovery or standby server. But during crash recovery
postmaster sits in PM_STARTUP mode. So I guess that we still see
the log messages for PM_STARTUP lots of times.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#37Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Fujii Masao (#36)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021-Mar-24, Fujii Masao wrote:

On 2021/03/24 1:20, Alvaro Herrera wrote:

Please note that PM_STARTUP mode is very very short-lived. It only
starts happening when postmaster launches the startup process, and
before the startup process begins WAL replay (as changed by
sigusr1_handler in postmaster.c). Once WAL replay begins, the PM status
changes to PM_RECOVERY.

True if archive recovery or standby server. But during crash recovery
postmaster sits in PM_STARTUP mode. So I guess that we still see
the log messages for PM_STARTUP lots of times.

Hmm ... true, and I had missed that this is what you had already said
upthread. In this case, should we add a DETAIL line for this message?

FATAL: the database system is starting up
DETAIL: WAL is being applied to recover from a system crash.
or
DETAIL: The system is applying WAL to recover from a system crash.
or
DETAIL: The startup process is applying WAL to recover from a system crash.

--
�lvaro Herrera 39�49'30"S 73�17'W
"La conclusi�n que podemos sacar de esos estudios es que
no podemos sacar ninguna conclusi�n de ellos" (Tanenbaum)

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#37)
Re: Nicer error when connecting to standby with hot_standby=off

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

FATAL: the database system is starting up
DETAIL: WAL is being applied to recover from a system crash.
or
DETAIL: The system is applying WAL to recover from a system crash.
or
DETAIL: The startup process is applying WAL to recover from a system crash.

I don't think the postmaster has enough context to know if that's
actually true. It just launches the startup process and waits for
results. If somebody saw this during a normal (non-crash) startup,
they'd be justifiably alarmed.

regards, tom lane

#39Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Tom Lane (#38)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/24 5:59, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

FATAL: the database system is starting up
DETAIL: WAL is being applied to recover from a system crash.
or
DETAIL: The system is applying WAL to recover from a system crash.
or
DETAIL: The startup process is applying WAL to recover from a system crash.

I don't think the postmaster has enough context to know if that's
actually true. It just launches the startup process and waits for
results. If somebody saw this during a normal (non-crash) startup,
they'd be justifiably alarmed.

Yes, so logging "the database system is starting up" seems enough to me.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#40Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Fujii Masao (#39)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021-Mar-24, Fujii Masao wrote:

On 2021/03/24 5:59, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

FATAL: the database system is starting up
DETAIL: WAL is being applied to recover from a system crash.
or
DETAIL: The system is applying WAL to recover from a system crash.
or
DETAIL: The startup process is applying WAL to recover from a system crash.

I don't think the postmaster has enough context to know if that's
actually true. It just launches the startup process and waits for
results. If somebody saw this during a normal (non-crash) startup,
they'd be justifiably alarmed.

Yes, so logging "the database system is starting up" seems enough to me.

No objection.

--
�lvaro Herrera Valdivia, Chile
"Cuando no hay humildad las personas se degradan" (A. Christie)

#41Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Alvaro Herrera (#40)
1 attachment(s)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/24 16:59, Alvaro Herrera wrote:

On 2021-Mar-24, Fujii Masao wrote:

On 2021/03/24 5:59, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

FATAL: the database system is starting up
DETAIL: WAL is being applied to recover from a system crash.
or
DETAIL: The system is applying WAL to recover from a system crash.
or
DETAIL: The startup process is applying WAL to recover from a system crash.

I don't think the postmaster has enough context to know if that's
actually true. It just launches the startup process and waits for
results. If somebody saw this during a normal (non-crash) startup,
they'd be justifiably alarmed.

Yes, so logging "the database system is starting up" seems enough to me.

No objection.

Thanks! So I changed the message reported at PM_STARTUP to that one,
based on v8 patch that James posted upthread. I also ran pgindent for
the patch. Attached is the updated version of the patch.

Barring any objection, I will commit this.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

v9-0001-Improve-standby-connection-denied-error-message.patchtext/plain; charset=UTF-8; name=v9-0001-Improve-standby-connection-denied-error-message.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ef0be4ca38..4a3ca78c1b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2294,6 +2294,18 @@ retry1:
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
 					 errmsg("the database system is starting up")));
 			break;
+		case CAC_NOTCONSISTENT:
+			if (EnableHotStandby)
+				ereport(FATAL,
+						(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+						 errmsg("the database system is not yet accepting connections"),
+						 errdetail("Consistent recovery state has not been yet reached.")));
+			else
+				ereport(FATAL,
+						(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+						 errmsg("the database system is not accepting connections"),
+						 errdetail("Hot standby mode is disabled.")));
+			break;
 		case CAC_SHUTDOWN:
 			ereport(FATAL,
 					(errcode(ERRCODE_CANNOT_CONNECT_NOW),
@@ -2435,10 +2447,11 @@ canAcceptConnections(int backend_type)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
-		else if (!FatalError &&
-				 (pmState == PM_STARTUP ||
-				  pmState == PM_RECOVERY))
+		else if (!FatalError && pmState == PM_STARTUP)
 			return CAC_STARTUP; /* normal startup */
+		else if (!FatalError && pmState == PM_RECOVERY)
+			return CAC_NOTCONSISTENT;	/* not yet at consistent recovery
+										 * state */
 		else
 			return CAC_RECOVERY;	/* else must be crash recovery */
 	}
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 30fb4e613d..891394b0c3 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,7 +70,12 @@ typedef struct
 
 typedef enum CAC_state
 {
-	CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
+	CAC_OK,
+	CAC_STARTUP,
+	CAC_SHUTDOWN,
+	CAC_RECOVERY,
+	CAC_NOTCONSISTENT,
+	CAC_TOOMANY,
 	CAC_SUPERUSER
 } CAC_state;
 
#42James Coleman
jtc331@gmail.com
In reply to: Fujii Masao (#41)
Re: Nicer error when connecting to standby with hot_standby=off

On Wed, Mar 24, 2021 at 5:55 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/24 16:59, Alvaro Herrera wrote:

On 2021-Mar-24, Fujii Masao wrote:

On 2021/03/24 5:59, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

FATAL: the database system is starting up
DETAIL: WAL is being applied to recover from a system crash.
or
DETAIL: The system is applying WAL to recover from a system crash.
or
DETAIL: The startup process is applying WAL to recover from a system crash.

I don't think the postmaster has enough context to know if that's
actually true. It just launches the startup process and waits for
results. If somebody saw this during a normal (non-crash) startup,
they'd be justifiably alarmed.

Yes, so logging "the database system is starting up" seems enough to me.

No objection.

Thanks! So I changed the message reported at PM_STARTUP to that one,
based on v8 patch that James posted upthread. I also ran pgindent for
the patch. Attached is the updated version of the patch.

Barring any objection, I will commit this.

That looks good to me. Thanks for working on this.

James Coleman

#43Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: James Coleman (#42)
Re: Nicer error when connecting to standby with hot_standby=off

On 2021/03/24 22:06, James Coleman wrote:

That looks good to me. Thanks for working on this.

Thanks! I pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#44James Coleman
jtc331@gmail.com
In reply to: Fujii Masao (#43)
Re: Nicer error when connecting to standby with hot_standby=off

On Wed, Mar 24, 2021 at 9:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/03/24 22:06, James Coleman wrote:

That looks good to me. Thanks for working on this.

Thanks! I pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Thanks for reviewing and committing!

James