Allow some recovery parameters to be changed with reload

Started by Peter Eisentraut almost 7 years ago, 30 messages
#1Peter Eisentraut
peter.eisentraut@2ndquadrant.com
1 attachment(s)

I think the recovery parameters

archive_cleanup_command
promote_trigger_file
recovery_end_command
recovery_min_apply_delay

can be changed from PGC_POSTMASTER to PGC_SIGHUP without any further
complications (unlike for example primary_conninfo, which is being
discussed elsewhere).

See attached patch.
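
As a rough illustration of what the new classification means in practice: a PGC_SIGHUP parameter can be edited in the configuration file and picked up by a reload, with no restart. The sketch below is only an assumption-laden example; the helper name set_conf_param and the scratch file are hypothetical, and the reload itself is left as a comment so the snippet runs without a server.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): append a setting to a
# configuration file. When a parameter appears more than once in
# postgresql.conf, the last entry wins, so appending is enough here.
set_conf_param() {
    conf_file=$1; name=$2; value=$3
    printf "%s = '%s'\n" "$name" "$value" >> "$conf_file"
}

# Demo against a scratch file so the sketch runs without a server.
demo_conf=$(mktemp)
set_conf_param "$demo_conf" recovery_min_apply_delay 5min
cat "$demo_conf"

# On a real standby one would target "$PGDATA/postgresql.conf" instead
# and then signal a reload, for example:
#   pg_ctl reload -D "$PGDATA"
# With the patch applied, that reload is sufficient for the four
# parameters listed above.
```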

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Allow-some-recovery-parameters-to-be-changed-with-re.patch (text/plain; charset=UTF-8)
From 6e0777f4799bce027aa339629539cc101ed0f862 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 4 Feb 2019 09:28:17 +0100
Subject: [PATCH] Allow some recovery parameters to be changed with reload

Change

archive_cleanup_command
promote_trigger_file
recovery_end_command
recovery_min_apply_delay

from PGC_POSTMASTER to PGC_SIGHUP.  This did not require any further
changes.
---
 doc/src/sgml/config.sgml                      | 21 +++++++++++++++----
 src/backend/utils/misc/guc.c                  |  8 +++----
 src/backend/utils/misc/postgresql.conf.sample |  4 ----
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9b7a7388d5..7e208a4b81 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3081,8 +3081,7 @@ <title>Archive Recovery</title>
     <para>
      This section describes the settings that apply only for the duration of
      the recovery.  They must be reset for any subsequent recovery you wish to
-     perform.  They can only be set at server start and cannot be changed once
-     recovery has begun.
+     perform.
     </para>
 
     <para>
@@ -3161,6 +3160,10 @@ <title>Archive Recovery</title>
         database server shutdown) or an error by the shell (such as command
         not found), then recovery will abort and the server will not start up.
        </para>
+
+       <para>
+        This parameter can only be set at server start.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -3202,6 +3205,10 @@ <title>Archive Recovery</title>
         terminated by a signal or an error by the shell (such as command not
         found), a fatal error will be raised.
        </para>
+       <para>
+        This parameter can only be set in the <filename>postgresql.conf</filename>
+        file or on the server command line.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -3227,6 +3234,10 @@ <title>Archive Recovery</title>
         signal or an error by the shell (such as command not found), the
         database will not proceed with startup.
        </para>
+       <para>
+        This parameter can only be set in the <filename>postgresql.conf</filename>
+        file or on the server command line.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -3863,7 +3874,8 @@ <title>Standby Servers</title>
           standby.  Even if this value is not set, you can still promote
           the standby using <command>pg_ctl promote</command> or calling
           <function>pg_promote</function>.
-          This parameter can only be set at server start.
+          This parameter can only be set in the <filename>postgresql.conf</filename>
+          file or on the server command line.
          </para>
         </listitem>
        </varlistentry>
@@ -4117,7 +4129,8 @@ <title>Standby Servers</title>
         </warning>
        </para>
        <para>
-        This parameter can only be set at server start.
+        This parameter can only be set in the <filename>postgresql.conf</filename>
+        file or on the server command line.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8681ada33a..ea5444c6f1 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2047,7 +2047,7 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		{"recovery_min_apply_delay", PGC_POSTMASTER, REPLICATION_STANDBY,
+		{"recovery_min_apply_delay", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the minimum delay for applying changes during recovery."),
 			NULL,
 			GUC_UNIT_MS
@@ -3398,7 +3398,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+		{"archive_cleanup_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be executed at every restart point."),
 			NULL
 		},
@@ -3408,7 +3408,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"recovery_end_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+		{"recovery_end_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be executed once at the end of recovery."),
 			NULL
 		},
@@ -3474,7 +3474,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"promote_trigger_file", PGC_POSTMASTER, REPLICATION_STANDBY,
+		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
 			NULL
 		},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c7f53470df..ad6c436f93 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -244,9 +244,7 @@
 				# e.g. 'cp /mnt/server/archivedir/%f %p'
 				# (change requires restart)
 #archive_cleanup_command = ''	# command to execute at every restartpoint
-				# (change requires restart)
 #recovery_end_command = ''	# command to execute at completion of recovery
-				# (change requires restart)
 
 # - Recovery Target -
 
@@ -310,7 +308,6 @@
 #primary_slot_name = ''			# replication slot on sending server
 					# (change requires restart)
 #promote_trigger_file = ''		# file name whose presence ends recovery
-					# (change requires restart)
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
@@ -329,7 +326,6 @@
 #wal_retrieve_retry_interval = 5s	# time to wait before retrying to
 					# retrieve WAL after a failed attempt
 #recovery_min_apply_delay = 0		# minimum delay for applying changes during recovery
-					# (change requires restart)
 
 # - Subscribers -
 
-- 
2.20.1

#2Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#1)
Re: Allow some recovery parameters to be changed with reload

On Mon, Feb 04, 2019 at 11:58:28AM +0100, Peter Eisentraut wrote:

I think the recovery parameters

archive_cleanup_command

Only triggered by the checkpointer.

promote_trigger_file
recovery_end_command
recovery_min_apply_delay

Only looked at by the startup process.

can be changed from PGC_POSTMASTER to PGC_SIGHUP without any further
complications (unlike for example primary_conninfo, which is being
discussed elsewhere).

I agree that this subset is straight-forward and safe to switch. The
documentation changes look right.
--
Michael

#3Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Michael Paquier (#2)
Re: Allow some recovery parameters to be changed with reload

On 05/02/2019 04:35, Michael Paquier wrote:

On Mon, Feb 04, 2019 at 11:58:28AM +0100, Peter Eisentraut wrote:

I think the recovery parameters

archive_cleanup_command

Only triggered by the checkpointer.

promote_trigger_file
recovery_end_command
recovery_min_apply_delay

Only looked at by the startup process.

can be changed from PGC_POSTMASTER to PGC_SIGHUP without any further
complications (unlike for example primary_conninfo, which is being
discussed elsewhere).

I agree that this subset is straight-forward and safe to switch. The
documentation changes look right.

Committed, thanks.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Sergei Kornilov
sk@zsrv.org
In reply to: Michael Paquier (#2)
Re: Allow some recovery parameters to be changed with reload

Hello

 I think the recovery parameters

     archive_cleanup_command

Only triggered by the checkpointer.

     promote_trigger_file
     recovery_end_command
     recovery_min_apply_delay

Only looked at by the startup process.

Is there any possible trouble with restore_command? As far as I know, it is also only looked at by the startup process.

regards, Sergei

#5Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Sergei Kornilov (#4)
Re: Allow some recovery parameters to be changed with reload

On 07/02/2019 09:14, Sergei Kornilov wrote:

Is there any possible trouble with restore_command? As far as I know, it is also only looked at by the startup process.

Probably right. I figured it would be useful to see what the outcome is
with primary_conninfo, so they can be treated similarly.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#5)
Re: Allow some recovery parameters to be changed with reload

On Thu, Feb 07, 2019 at 11:06:27PM +0100, Peter Eisentraut wrote:

Probably right. I figured it would be useful to see what the outcome is
with primary_conninfo, so they can be treated similarly.

The interactions with waiting for WAL to be available and the WAL
receiver stresses me a bit for restore_command, as you could finish
with the startup process switching to use restore_command with a WAL
receiver still working behind and overwriting partially the recovered
segment, which could lead to corruption. We should be *very* careful
about that.
--
Michael

#7Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#6)
Re: Allow some recovery parameters to be changed with reload

Hi,

On 2019-02-08 09:19:31 +0900, Michael Paquier wrote:

On Thu, Feb 07, 2019 at 11:06:27PM +0100, Peter Eisentraut wrote:

Probably right. I figured it would be useful to see what the outcome is
with primary_conninfo, so they can be treated similarly.

The interactions with waiting for WAL to be available and the WAL
receiver stresses me a bit for restore_command, as you could finish
with the startup process switching to use restore_command with a WAL
receiver still working behind and overwriting partially the recovered
segment, which could lead to corruption. We should be *very* careful
about that.

I'm not clear on the precise mechanics you're imagining here, could you
expand a bit? We kill the walreceiver when switching from receiver to
restore command, and wait for it to acknowledge that, no?
C.F. ShutdownWalRcv() call in the lastSourceFailed branch of
WaitForWALToBecomeAvailable().

Greetings,

Andres Freund

#8Sergei Kornilov
sk@zsrv.org
In reply to: Andres Freund (#7)
1 attachment(s)
Re: Allow some recovery parameters to be changed with reload

Hello

I want to return to this discussion, since primary_conninfo is now PGC_SIGHUP (and I hope will not be reverted)

On 2019-02-08 09:19:31 +0900, Michael Paquier wrote:

 On Thu, Feb 07, 2019 at 11:06:27PM +0100, Peter Eisentraut wrote:
 > Probably right. I figured it would be useful to see what the outcome is
 > with primary_conninfo, so they can be treated similarly.

 The interactions with waiting for WAL to be available and the WAL
 receiver stresses me a bit for restore_command, as you could finish
 with the startup process switching to use restore_command with a WAL
 receiver still working behind and overwriting partially the recovered
 segment, which could lead to corruption. We should be *very* careful
 about that.

I'm not clear on the precise mechanics you're imagining here, could you
expand a bit? We kill the walreceiver when switching from receiver to
restore command, and wait for it to acknowledge that, no?
C.F. ShutdownWalRcv() call in the lastSourceFailed branch of
WaitForWALToBecomeAvailable().

So...
We call restore_command only when the walreceiver is stopped.
We use restore_command only in the startup process, so we have no race condition between processes.
Do we have any issues here? Or can we just make restore_command reloadable as attached?

regards, Sergei

Attachments:

v1_restore_command_reload.patch (text/x-diff)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2de21903a1..454bf95d9b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3303,7 +3303,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
        </para>
 
        <para>
-        This parameter can only be set at server start.
+        This parameter can only be set in the <filename>postgresql.conf</filename>
+        file or on the server command line.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 79bc7ac8ca..f340369dcf 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3641,7 +3641,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"restore_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will retrieve an archived WAL file."),
 			NULL
 		},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e9f8ca775d..8078249e1f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -247,7 +247,6 @@
 				# placeholders: %p = path of file to restore
 				#               %f = file name only
 				# e.g. 'cp /mnt/server/archivedir/%f %p'
-				# (change requires restart)
 #archive_cleanup_command = ''	# command to execute at every restartpoint
 #recovery_end_command = ''	# command to execute at completion of recovery
 
#9Robert Haas
robertmhaas@gmail.com
In reply to: Sergei Kornilov (#8)
Re: Allow some recovery parameters to be changed with reload

On Sat, Mar 28, 2020 at 7:21 AM Sergei Kornilov <sk@zsrv.org> wrote:

So...
We call restore_command only when the walreceiver is stopped.
We use restore_command only in the startup process, so we have no race condition between processes.
Do we have any issues here? Or can we just make restore_command reloadable as attached?

I don't see the problem here, either. Does anyone else see a problem,
or some reason not to press forward with this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#9)
Re: Allow some recovery parameters to be changed with reload

On Wed, Aug 05, 2020 at 11:41:49AM -0400, Robert Haas wrote:

On Sat, Mar 28, 2020 at 7:21 AM Sergei Kornilov <sk@zsrv.org> wrote:

So...
We call restore_command only when the walreceiver is stopped.
We use restore_command only in the startup process, so we have no race condition between processes.
Do we have any issues here? Or can we just make restore_command reloadable as attached?

I don't see the problem here, either. Does anyone else see a problem,
or some reason not to press forward with this?

Sorry for the late reply. I have been looking at that stuff again,
and restore_command can be called in the context of a WAL sender
process within the page_read callback of logical decoding via
XLogReadDetermineTimeline(), as readTimeLineHistory() could look for a
timeline history file. So restore_command is not used only in the
startup process.
--
Michael

#11Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#10)
Re: Allow some recovery parameters to be changed with reload

On Sun, Aug 9, 2020 at 1:21 AM Michael Paquier <michael@paquier.xyz> wrote:

Sorry for the late reply. I have been looking at that stuff again,
and restore_command can be called in the context of a WAL sender
process within the page_read callback of logical decoding via
XLogReadDetermineTimeline(), as readTimeLineHistory() could look for a
timeline history file. So restore_command is not used only in the
startup process.

Hmm, interesting. But, does that make this change wrong, apart from
the comments? Like, in the case of primary_conninfo, maybe some
confusion could result if the startup process decided whether to ask
for a WAL receiver based on thinking primary_conninfo being set, while
that process thought that it wasn't actually set after all, as
previously discussed in
/messages/by-id/CA+TgmoZVmJX1+QTWw2tSnPHrnkwKZxC3ZsRynFB-Fpzm1Oxuhw@mail.gmail.com
... but what's the corresponding hazard here, exactly? It doesn't seem
that there's any way in which the decision one process makes affects
the decision the other process makes. There's still a race condition:
it's possible for a walsender to use the old restore_command after the
startup process had already used the new one, or the other way around.
However, it doesn't seem like that should confuse anything inside the
server, and therefore I'm not sure we need to code around it.

If you or someone else thinks we do, then it'd be nice to hear why,
and what guarantees you think we should be aiming to achieve.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Robert Haas (#11)
Re: Allow some recovery parameters to be changed with reload

On 10.08.2020 23:20, Robert Haas wrote:

On Sun, Aug 9, 2020 at 1:21 AM Michael Paquier <michael@paquier.xyz> wrote:

Sorry for the late reply. I have been looking at that stuff again,
and restore_command can be called in the context of a WAL sender
process within the page_read callback of logical decoding via
XLogReadDetermineTimeline(), as readTimeLineHistory() could look for a
timeline history file. So restore_command is not used only in the
startup process.

Hmm, interesting. But, does that make this change wrong, apart from
the comments? Like, in the case of primary_conninfo, maybe some
confusion could result if the startup process decided whether to ask
for a WAL receiver based on thinking primary_conninfo being set, while
that process thought that it wasn't actually set after all, as
previously discussed in
/messages/by-id/CA+TgmoZVmJX1+QTWw2tSnPHrnkwKZxC3ZsRynFB-Fpzm1Oxuhw@mail.gmail.com
... but what's the corresponding hazard here, exactly? It doesn't seem
that there's any way in which the decision one process makes affects
the decision the other process makes. There's still a race condition:
it's possible for a walsender

Did you mean walreceiver here?

to use the old restore_command after the
startup process had already used the new one, or the other way around.
However, it doesn't seem like that should confuse anything inside the
server, and therefore I'm not sure we need to code around it.

I came up with the following scenario. Let's say we have xlog files 1,2,3 in
dir1 and files 4,5 in dir2. If the startup process had only handled files 1
and 2 before we switched restore_command from reading dir1 to reading
dir2, it will fail to find the next file. IIUC, it will assume that recovery
is done, start the server and walreceiver. The walreceiver will fail as
well. I don't know how realistic this case is, though.
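
The scenario can be reproduced in miniature with a toy model. This is only a sketch under stated assumptions: segments are plain files named 1..5, restore_segment is a hypothetical stand-in for restore_command, and the "reload" is just a change of a shell variable. It models the archive lookup only, not what the server does after the lookup fails.

```shell
#!/bin/sh
# Toy model: two archive directories holding different ranges of
# (fake) WAL segments, named 1..5 for brevity.
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2" "$work/pg_wal"
for seg in 1 2 3; do echo "segment $seg" > "$work/dir1/$seg"; done
for seg in 4 5;   do echo "segment $seg" > "$work/dir2/$seg"; done

# Hypothetical stand-in for restore_command: copy segment $2 from archive
# directory $1. Returns non-zero when the segment is absent, as cp does.
restore_segment() {
    cp "$1/$2" "$work/pg_wal/$2" 2>/dev/null
}

archive="$work/dir1"
restore_segment "$archive" 1 && echo "restored 1"
restore_segment "$archive" 2 && echo "restored 2"

# The "reload": restore_command now points at dir2 before segment 3
# has been fetched.
archive="$work/dir2"
if restore_segment "$archive" 3; then
    echo "restored 3"
else
    echo "segment 3 not found in $archive"
fi
```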

In general, this feature looks useful and consistent with previous
changes, so I am interested in pushing it forward.
Sergey, could you please attach this thread to the upcoming CF, if
you're going to continue working on it.

 A few more questions:
- RestoreArchivedFile() is also used by pg_rewind. I don't see any
particular problem with it, just want to remind that we should test it too.
- How will it interact with possible future optimizations of archive
restore? For example, WAL prefetch [1].

 [1]
/messages/by-id/601EE1F5-0B78-47E1-9AAE-C15F74A1C21D@postgrespro.ru

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#13Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Anastasia Lubennikova (#12)
Re: Allow some recovery parameters to be changed with reload

At Thu, 22 Oct 2020 01:59:07 +0300, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote in

On 10.08.2020 23:20, Robert Haas wrote:

On Sun, Aug 9, 2020 at 1:21 AM Michael Paquier <michael@paquier.xyz>
wrote:

Sorry for the late reply. I have been looking at that stuff again,
and restore_command can be called in the context of a WAL sender
process within the page_read callback of logical decoding via
XLogReadDetermineTimeline(), as readTimeLineHistory() could look for a
timeline history file. So restore_command is not used only in the
startup process.

Hmm, interesting. But, does that make this change wrong, apart from
the comments? Like, in the case of primary_conninfo, maybe some
confusion could result if the startup process decided whether to ask
for a WAL receiver based on thinking primary_conninfo being set, while
that process thought that it wasn't actually set after all, as
previously discussed in
/messages/by-id/CA+TgmoZVmJX1+QTWw2tSnPHrnkwKZxC3ZsRynFB-Fpzm1Oxuhw@mail.gmail.com
... but what's the corresponding hazard here, exactly? It doesn't seem
that there's any way in which the decision one process makes affects
the decision the other process makes. There's still a race condition:
it's possible for a walsender

Did you mean walreceiver here?

It's logical walsender. restore_command is used within
logical_read_xlog_page() via XLogReadDetermineTimeline().

to use the old restore_command after the
startup process had already used the new one, or the other way around.
However, it doesn't seem like that should confuse anything inside the
server, and therefore I'm not sure we need to code around it.

I came up with the following scenario. Let's say we have xlog files 1,2,3
in dir1 and files 4,5 in dir2. If the startup process had only handled
files 1 and 2 before we switched restore_command from reading dir1 to
reading dir2, it will fail to find the next file. IIUC, it will assume
that recovery is done, start the server and walreceiver. The walreceiver
will fail as well. I don't know how realistic this case is, though.

That operation is somewhat bogus, if the server is not in standby
mode. In standby mode, startup waits for the next segment safely.

In general, this feature looks useful and consistent with previous
changes, so I am interested in pushing it forward.

Agreed. The feature seems to work fine as long as we don't make a
change of restore_command that moves to another history. Otherwise
recovery doesn't work correctly regardless of whether it is PGC_SIGHUP
or not.

Sergey, could you please attach this thread to the upcoming CF, if
you're going to continue working on it.

 A few more questions:
- RestoreArchivedFile() is also used by pg_rewind. I don't see any
particular problem with it, just want to remind that we should test it
too.
- How will it interact with possible future optimizations of archive
restore? For example, WAL prefetch [1].

 [1]
/messages/by-id/601EE1F5-0B78-47E1-9AAE-C15F74A1C21D@postgrespro.ru

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#14Sergei Kornilov
sk@zsrv.org
In reply to: Kyotaro Horiguchi (#13)
Re: Allow some recovery parameters to be changed with reload

Hello

Sorry for late response.

 > ... but what's the corresponding hazard here, exactly? It doesn't seem
 > that there's any way in which the decision one process makes affects
 > the decision the other process makes. There's still a race condition:
 > it's possible for a walsender
 Did you mean walreceiver here?

It's logical walsender. restore_command is used within
logical_read_xlog_page() via XLogReadDetermineTimeline().

I still have no idea what the corresponding hazard is here.

 > to use the old restore_command after the
 > startup process had already used the new one, or the other way around.
 > However, it doesn't seem like that should confuse anything inside the
 > server, and therefore I'm not sure we need to code around it.
 I came up with the following scenario. Let's say we have xlog files 1,2,3
 in dir1 and files 4,5 in dir2. If the startup process had only handled
 files 1 and 2 before we switched restore_command from reading dir1 to
 reading dir2, it will fail to find the next file. IIUC, it will assume
 that recovery is done, start the server and walreceiver. The walreceiver
 will fail as well. I don't know how realistic this case is, though.

That operation is somewhat bogus, if the server is not in standby
mode. In standby mode, startup waits for the next segment safely.

I think it's pilot error. It is already possible to change anything in restore_command by wrapping the real command in a script:

restore_command = '/bin/restore_wal.sh "%f" "%p"'

And one can simply replace this file with something else that has different logic, or even use a command that has its own separate settings. Real-world example ( https://github.com/wal-g/wal-g ):

restore_command = '. /etc/wal-g/WALG_AWS_ENV; wal-g wal-fetch "%f" "%p"'

And it is possible to change the real WAL source in the ENV script without changing the restore_command. We can't track this, so I do not see any new issues here.
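
The wrapper-script point can be sketched as follows. Everything here is hypothetical and for illustration only: the function restore_wal, the ARCHIVE_DIR variable and the settings file are assumptions, not the contents of any real restore_wal.sh. The sketch shows the WAL source changing between two calls while the command line itself stays the same.

```shell
#!/bin/sh
# Hypothetical wrapper in the spirit of: restore_wal.sh "%f" "%p".
# It reads the archive location from a separate settings file on every
# call, so the WAL source can change while restore_command stays the same.
settings_file=$(mktemp)

restore_wal() {
    wal_name=$1      # %f: file name of the requested WAL file
    target_path=$2   # %p: path to copy it to
    . "$settings_file"   # sets ARCHIVE_DIR (assumed variable name)
    cp "$ARCHIVE_DIR/$wal_name" "$target_path"
}

# Demo with scratch directories standing in for two archives.
scratch=$(mktemp -d)
mkdir "$scratch/old" "$scratch/new"
echo "from old" > "$scratch/old/000000010000000000000001"
echo "from new" > "$scratch/new/000000010000000000000001"

echo "ARCHIVE_DIR='$scratch/old'" > "$settings_file"
restore_wal 000000010000000000000001 "$scratch/restored_a"

# Change only the settings file; the command line is untouched.
echo "ARCHIVE_DIR='$scratch/new'" > "$settings_file"
restore_wal 000000010000000000000001 "$scratch/restored_b"

cat "$scratch/restored_a" "$scratch/restored_b"
```

From the server's point of view both calls are the same restore_command, which is the sense in which such a change cannot be tracked.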

 Sergey, could you please attach this thread to the upcoming CF, if
 you're going to continue working on it.

Sure, I created one: https://commitfest.postgresql.org/30/2802/

 - How will it interact with possible future optimizations of archive
 restore? For example, WAL prefetch [1].

Shouldn't we ask the author of such a patch and not me? In particular, does this patch rely on the restore_command not being changed? Probably some form of synchronisation would be necessary in the infrastructure for executing restore commands in parallel. On/off handling of restore_command will most likely be required. I did not review this patch.

regards, Sergei

#15Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Sergei Kornilov (#14)
Re: Allow some recovery parameters to be changed with reload

On 2020/10/28 21:02, Sergei Kornilov wrote:

Hello

Sorry for late response.

 > ... but what's the corresponding hazard here, exactly? It doesn't seem
 > that there's any way in which the decision one process makes affects
 > the decision the other process makes. There's still a race condition:
 > it's possible for a walsender
 Did you mean walreceiver here?

It's logical walsender. restore_command is used within
logical_read_xlog_page() via XLogReadDetermineTimeline().

I still have no idea what the corresponding hazard is here.

 > to use the old restore_command after the
 > startup process had already used the new one, or the other way around.
 > However, it doesn't seem like that should confuse anything inside the
 > server, and therefore I'm not sure we need to code around it.
 I came up with the following scenario. Let's say we have xlog files 1,2,3
 in dir1 and files 4,5 in dir2. If the startup process had only handled
 files 1 and 2 before we switched restore_command from reading dir1 to
 reading dir2, it will fail to find the next file. IIUC, it will assume
 that recovery is done, start the server and walreceiver. The walreceiver
 will fail as well. I don't know how realistic this case is, though.

That operation is somewhat bogus, if the server is not in standby
mode. In standby mode, startup waits for the next segment safely.

I think it's pilot error. It is already possible to change anything in restore_command by wrapping the real command in a script:

restore_command = '/bin/restore_wal.sh "%f" "%p"'

And one can simply replace this file with something else that has different logic, or even use a command that has its own separate settings. Real-world example ( https://github.com/wal-g/wal-g ):

restore_command = '. /etc/wal-g/WALG_AWS_ENV; wal-g wal-fetch "%f" "%p"'

And it is possible to change the real WAL source in the ENV script without changing the restore_command. We can't track this, so I do not see any new issues here.

 Sergey, could you please attach this thread to the upcoming CF, if
 you're going to continue working on it.

Sure, I created one: https://commitfest.postgresql.org/30/2802/

+1 to mark restore_command as PGC_SIGHUP.

Currently when restore_command is not set, archive recovery fails
at the beginning. With the patch, how should we treat the case where
restore_command is reset to empty during archive recovery? Should we
reject that change of restore_command?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#16Sergei Kornilov
sk@zsrv.org
In reply to: Fujii Masao (#15)
1 attachment(s)
Re: Allow some recovery parameters to be changed with reload

Hello

Currently when restore_command is not set, archive recovery fails
at the beginning. With the patch, how should we treat the case where
restore_command is reset to empty during archive recovery? Should we
reject that change of restore_command?

Good point. I think we should reject that change. But (AFAICS) I cannot use a GUC check callback for this purpose, as only the startup process knows StandbyModeRequested. I think it would be appropriate to call validateRecoveryParameters from StartupRereadConfig. As a side effect, this adds the warning/hint "specified neither primary_conninfo nor restore_command" in standby mode in the appropriate configuration state. I am not sure about the remaining checks in validateRecoveryParameters; maybe it's a wrong idea to recheck them here and I need to separate these checks into another function.

regards, Sergei

Attachments:

v2_restore_command_reload.patch (text/x-diff)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f043433e31..ec74cb43ad 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3542,7 +3542,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
        </para>
 
        <para>
-        This parameter can only be set at server start.
+        This parameter can only be set in the <filename>postgresql.conf</filename>
+        file or on the server command line.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a1078a7cfc..148dd34633 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -883,7 +883,6 @@ static MemoryContext walDebugCxt = NULL;
 #endif
 
 static void readRecoverySignalFile(void);
-static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
@@ -5458,7 +5457,7 @@ readRecoverySignalFile(void)
 				 errmsg("standby mode is not supported by single-user servers")));
 }
 
-static void
+void
 validateRecoveryParameters(void)
 {
 	if (!ArchiveRecoveryRequested)
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index eab9c8c4ed..bf7dc866db 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -114,6 +114,11 @@ StartupRereadConfig(void)
 
 	if (conninfoChanged || slotnameChanged || tempSlotChanged)
 		StartupRequestWalReceiverRestart();
+
+	/*
+	 * Check the combination of new parameters
+	 */
+	validateRecoveryParameters();
 }
 
 /* Handle various signals that might be sent to the startup process */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..87fd593924 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3699,7 +3699,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"restore_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
 			NULL
 		},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..9c9091e601 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -253,7 +253,6 @@
 				# placeholders: %p = path of file to restore
 				#               %f = file name only
 				# e.g. 'cp /mnt/server/archivedir/%f %p'
-				# (change requires restart)
 #archive_cleanup_command = ''	# command to execute at every restartpoint
 #recovery_end_command = ''	# command to execute at completion of recovery
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 221af87e71..965ed109b3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -347,6 +347,7 @@ extern void WakeupRecovery(void);
 extern void SetWalWriterSleeping(bool sleeping);
 
 extern void StartupRequestWalReceiverRestart(void);
+extern void validateRecoveryParameters(void);
 extern void XLogRequestWalReceiverReply(void);
 
 extern void assign_max_wal_size(int newval, void *extra);
#17Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Sergei Kornilov (#16)
Re: Allow some recovery parameters to be changed with reload

On 2020/11/06 21:36, Sergei Kornilov wrote:

Hello

Currently when restore_command is not set, archive recovery fails
at the beginning. With the patch, how should we treat the case where
restore_command is reset to empty during archive recovery? Should we
reject that change of restore_command?

Good point. I think we should reject that change. But (AFAICS) I cannot use a GUC check callback for this purpose, as only the startup process knows StandbyModeRequested. I think it would be appropriate to call validateRecoveryParameters from StartupRereadConfig.

I don't think this idea is ok, because emptying restore_command and
reloading the configuration file could cause a server doing archive
recovery to shut down with a FATAL error.

I'm wondering if it's safe to allow restore_command to be emptied during
archive recovery. Even when it's emptied, archive recovery can proceed
by reading WAL files from pg_wal directory. This is the same behavior as
when restore_command is set to, e.g., /bin/false. So maybe we don't need
to treat an empty restore_command as a special case?

OTOH, we should not remove the check of restore_command in
validateRecoveryParameters(). Otherwise, when users forget to specify
restore_command when starting archive recovery, recovery could
wrongly proceed and the database could get corrupted.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#18Sergei Kornilov
sk@zsrv.org
In reply to: Fujii Masao (#17)
Re: Allow some recovery parameters to be changed with reload

Hello

I'm wondering if it's safe to allow restore_command to be emptied during
archive recovery. Even when it's emptied, archive recovery can proceed
by reading WAL files from pg_wal directory. This is the same behavior as
when restore_command is set to, e.g., /bin/false.

I am always confused by this implementation detail. restore_command fails? Fine, let's just read the file from pg_wal. But this is a different topic...

I do not know the history of this FATAL ereport. It looks like the "must specify restore_command when standby mode is not enabled" check is only intended to protect the user from misconfiguration, and the rest of the code will treat an empty restore_command correctly, just like /bin/false. I did not notice anything around the StandbyMode conditions.

regards, Sergei

#19Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Sergei Kornilov (#18)
Re: Allow some recovery parameters to be changed with reload

At Sat, 07 Nov 2020 00:36:33 +0300, Sergei Kornilov <sk@zsrv.org> wrote in

Hello

I'm wondering if it's safe to allow restore_command to be emptied during
archive recovery. Even when it's emptied, archive recovery can proceed
by reading WAL files from pg_wal directory. This is the same behavior as
when restore_command is set to, e.g., /bin/false.

I am always confused by this implementation detail. restore_command fails? Fine, let's just read the file from pg_wal. But this is a different topic...

--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -114,6 +114,11 @@ StartupRereadConfig(void)
 	if (conninfoChanged || slotnameChanged || tempSlotChanged)
 		StartupRequestWalReceiverRestart();
+
+	/*
+	 * Check the combination of new parameters
+	 */
+	validateRecoveryParameters();

If someone changes restore_command to '' and then reloads while crash
recovery is running, the server stops for no valid reason. If
restore_command is set to 'hoge' (literally :p, that is, anything
unexecutable) and SIGHUP is sent while archive recovery is running, the
server stops. I think we need to handle these cases more gracefully.
That said, I think we should keep the current behavior that
the server stops if the same happens just after server start.

If someone changes restore_command by mistake to something executable
that fails to provide the specified file even though it exists, the running
archive recovery finishes and then switches timeline unexpectedly. By
the same reasoning as in the discussion about inexecutable contents just
above, that behavior seems valid when the variable has not changed
since startup, but I'm not sure what to do if that happens by a reload
while (archive|crash) recovery is proceeding.

I do not know the history of this FATAL ereport. It looks like the "must specify restore_command when standby mode is not enabled" check is only intended to protect the user from misconfiguration, and the rest of the code will treat an empty restore_command correctly, just like /bin/false. I did not notice anything around the StandbyMode conditions.

If restore_command is not changeable after server start, it would be
valid for startup to stop on inexecutable content in the variable,
since there's no way to proceed with recovery.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#20Sergei Kornilov
sk@zsrv.org
In reply to: Kyotaro Horiguchi (#19)
Re: Allow some recovery parameters to be changed with reload

Hello

If someone changes restore_command to '' and then reloads while crash
recovery is running, the server stops for no valid reason.

While *crash* recovery is running? It's possible only during Point-in-Time Recovery, no?
At the beginning of validateRecoveryParameters we check ArchiveRecoveryRequested, which can be set in two cases:

* if recovery.signal is found - the same check as on recovery start. Otherwise recovery could end early because of an empty restore_command. So do we want to protect the user from such a misconfiguration? I am fine if we decide that no additional handling is needed.
* if standby.signal is found - this FATAL is not reachable because StandbyModeRequested is also set.

During crash recovery validateRecoveryParameters does nothing.
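
A minimal sketch of the three cases above, written as a standalone C model rather than as PostgreSQL code. The names `CheckResult`, `guc_is_empty` and `model_validate_recovery_parameters` are hypothetical and exist only for illustration; the real function reads global state and raises ereport(FATAL) instead of returning a value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not a PostgreSQL type. */
typedef enum
{
	CHECK_SKIPPED,				/* archive recovery not requested */
	CHECK_OK,					/* parameters acceptable */
	CHECK_FATAL					/* "must specify restore_command ..." */
} CheckResult;

/* True when a string setting is unset or empty. */
static bool
guc_is_empty(const char *value)
{
	return value == NULL || strcmp(value, "") == 0;
}

/*
 * Simplified model of the restore_command check discussed in the thread.
 * Only the branches mentioned in the discussion are modelled.
 */
CheckResult
model_validate_recovery_parameters(bool archive_recovery_requested,
								   bool standby_mode_requested,
								   const char *restore_command)
{
	/* Plain crash recovery: the function does nothing. */
	if (!archive_recovery_requested)
		return CHECK_SKIPPED;

	/* standby.signal case: the FATAL is not reachable. */
	if (standby_mode_requested)
		return CHECK_OK;

	/* recovery.signal case: an empty restore_command is rejected. */
	if (guc_is_empty(restore_command))
		return CHECK_FATAL;

	return CHECK_OK;
}
```

Under these assumptions, calling the model with archive recovery requested, standby mode off, and an empty command is the only combination that yields CHECK_FATAL.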

If restore_command is set to 'hoge' (literally :p, that is, anything
unexecutable) and SIGHUP is sent while archive recovery is running, the
server stops. I think we need to handle these cases more gracefully.

I think we cannot perform such a check reliably. As in my earlier example:

restore_command = '. /etc/wal-g/WALG_AWS_ENV; wal-g wal-fetch "%f" "%p"'

How would we even find the commands? For any shell? And even if we learn that the binary is not executable, what do we do next?

If someone changes restore_command by mistake to something executable
that fails to provide the specified file even though it exists, the running
archive recovery finishes and then switches timeline unexpectedly.

Or the executable file was just removed, which is clearly a pilot error. Does this differ from changing restore_command?

I do not know the history of this FATAL ereport. It looks like the "must specify restore_command when standby mode is not enabled" check is only intended to protect the user from misconfiguration, and the rest of the code will treat an empty restore_command correctly, just like /bin/false. I did not notice anything around the StandbyMode conditions.

If restore_command is not changeable after server start, it would be
valid for startup to stop on inexecutable content in the variable,
since there's no way to proceed with recovery.

Why not use local pg_wal? There may be already enough WAL.

regards, Sergei

#21Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Sergei Kornilov (#20)
Re: Allow some recovery parameters to be changed with reload

Hello.

At Tue, 10 Nov 2020 14:13:17 +0300, Sergei Kornilov <sk@zsrv.org> wrote in

Hello

If someone changes restore_command to '' and then reloads while crash
recovery is running, the server stops for no valid reason.

While *crash* recovery is running? It's possible only during Point-in-Time Recovery, no?

Even if PITR is commanded, crash recovery can run before starting
archive recovery if the server was not gracefully shut down.
A parameter reload can happen during crash recovery, and
validateRecoveryParameters() calls "ereport(FATAL" in that case.

At the beginning of validateRecoveryParameters we check ArchiveRecoveryRequested, which can be set in two cases:

That does not prevent crash recovery from running.

* if recovery.signal is found - the same check as on recovery start. Otherwise recovery could end early because of an empty restore_command. So do we want to protect the user from such a misconfiguration? I am fine if we decide that no additional handling is needed.
* if standby.signal is found - this FATAL is not reachable because StandbyModeRequested is also set.

During crash recovery validateRecoveryParameters does nothing.

If restore_command is set to 'hoge' (literally :p, that is, anything
unexecutable) and SIGHUP is sent while archive recovery is running, the
server stops. I think we need to handle these cases more gracefully.

I think we cannot perform such a check reliably. As in my earlier example:

restore_command = '. /etc/wal-g/WALG_AWS_ENV; wal-g wal-fetch "%f" "%p"'

How would we even find the commands? For any shell? And even if we learn that the binary is not executable, what do we do next?

I am not suggesting that we check whether the command actually works. I
suggested that we avoid a server stop even if the command fails to run
after a config reload.

If someone changes restore_command by mistake to something executable
that fails to provide the specified file even though it exists, the running
archive recovery finishes and then switches timeline unexpectedly.

Or the executable file was just removed, which is clearly a pilot error. Does this differ from changing restore_command?

I don't know. I just think that it is not proper that "ALTER SYSTEM" +
config-reload causes a server stop.

I do not know the history of this FATAL ereport. It looks like the "must specify restore_command when standby mode is not enabled" check is only intended to protect the user from misconfiguration, and the rest of the code will treat an empty restore_command correctly, just like /bin/false. I did not notice anything around the StandbyMode conditions.

If restore_command is not changeable after server start, it would be
valid for startup to stop on inexecutable content in the variable,
since there's no way to proceed with recovery.

Why not use local pg_wal? There may be already enough WAL.

Mmm. If the file to read is in pg_wal, restore_command won't be
executed in the first place?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#22Sergei Kornilov
sk@zsrv.org
In reply to: Kyotaro Horiguchi (#21)
Re: Allow some recovery parameters to be changed with reload

Hello

Even if PITR is commanded, crash recovery can run before starting
archive recovery if the server was not gracefully shut down.

Hmm... I'm still not sure how that's possible. Both readRecoverySignalFile and validateRecoveryParameters are called early in StartupXLOG. If PITR was commanded, we follow the PITR logic. If the requested recovery stop point is before the consistent recovery point, we shut down the database with another FATAL.
I mean this place:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=9d3f1c12fc56f61da4d2b9bf08c54d31b9757ef7;hb=29be9983a64c011eac0b9ee29895cce71e15ea77#l6891
If we start recovery for any reason and archive recovery was requested, we start archive recovery instead of crash recovery.

I don't know. I just think that it is not proper that "ALTER SYSTEM" +
config-reload causes a server stop.

I got your point. How about pausing the recovery process? Like proposed in https://commitfest.postgresql.org/30/2489/
For example,
* restore_command becomes empty on SIGHUP while PITR was requested
* we set recovery to pause
* if the user calls pg_wal_replay_resume and restore_command is still empty - we shut down the database
* if the user fixes restore_command - we continue the restore.
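
The proposed flow can be sketched as a small state machine. This is only a model of the idea in the list above, not existing PostgreSQL behavior; `RecoveryState`, `command_is_empty`, `model_on_reload` and `model_on_resume` are hypothetical names, and the sketch assumes that recovery continues only once the user asks to resume after fixing the setting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for this sketch. */
typedef enum
{
	RECOVERY_RUNNING,
	RECOVERY_PAUSED,
	RECOVERY_SHUTDOWN
} RecoveryState;

/* True when restore_command is unset or empty. */
static bool
command_is_empty(const char *cmd)
{
	return cmd == NULL || cmd[0] == '\0';
}

/* SIGHUP during PITR: an emptied restore_command pauses recovery. */
RecoveryState
model_on_reload(RecoveryState state, const char *restore_command)
{
	if (state == RECOVERY_RUNNING && command_is_empty(restore_command))
		return RECOVERY_PAUSED;
	return state;
}

/*
 * The user asks to resume: shut down if the command is still empty,
 * otherwise continue the restore.
 */
RecoveryState
model_on_resume(RecoveryState state, const char *restore_command)
{
	if (state != RECOVERY_PAUSED)
		return state;
	return command_is_empty(restore_command) ? RECOVERY_SHUTDOWN
											  : RECOVERY_RUNNING;
}
```

In this model a reload never stops the server by itself; only an explicit resume with a still-empty command does.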

But that seems overly complicated if we don't actually need special handling here. We still require restore_command to be set to start recovery. If the user later wants to set restore_command to empty, let's assume that's intentional (FATAL if the PITR target is after the end of local pg_wal, promote otherwise).

 Why not use local pg_wal? There may be already enough WAL.

Mmm. If the file to read is in pg_wal, restore_command won't be
executed in the first place?

The startup process will call restore_command in any case, regardless of pg_wal content (xlogarchive.c, RestoreArchivedFile):

* When doing archive recovery, we always prefer an archived log file even
* if a file of the same name exists in XLOGDIR. The reason is that the
* file in XLOGDIR could be an old, un-filled or partly-filled version
* that was copied and restored as part of backing up $PGDATA.
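
A minimal sketch of the lookup order described in this thread: the archive is tried first, and pg_wal is used when restore_command is empty or fails. This is a standalone model, not the code in xlogarchive.c; `WalSource` and `model_locate_segment` are hypothetical names, and the two boolean parameters stand in for running the command and checking the directory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: where the WAL segment was obtained from. */
typedef enum
{
	WAL_FROM_ARCHIVE,
	WAL_FROM_PG_WAL,
	WAL_NOT_FOUND
} WalSource;

/*
 * restore_succeeds:  would the configured command fetch the segment?
 * present_in_pg_wal: does a file of the same name exist in pg_wal?
 */
WalSource
model_locate_segment(const char *restore_command,
					 bool restore_succeeds,
					 bool present_in_pg_wal)
{
	bool		have_command;

	have_command = restore_command != NULL && restore_command[0] != '\0';

	/* The archived copy is preferred even if pg_wal has the file. */
	if (have_command && restore_succeeds)
		return WAL_FROM_ARCHIVE;

	/* Empty or failing command behaves alike: fall back to pg_wal. */
	if (present_in_pg_wal)
		return WAL_FROM_PG_WAL;

	return WAL_NOT_FOUND;
}
```

The model treats an empty command and a command such as /bin/false identically, which is the equivalence discussed earlier in the thread. What recovery does on WAL_NOT_FOUND is deliberately left out.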

regards, Sergei

#23Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Sergei Kornilov (#18)
Re: Allow some recovery parameters to be changed with reload

On 2020/11/07 6:36, Sergei Kornilov wrote:

Hello

I'm wondering if it's safe to allow restore_command to be emptied during
archive recovery. Even when it's emptied, archive recovery can proceed
by reading WAL files from pg_wal directory. This is the same behavior as
when restore_command is set to, e.g., /bin/false.

I am always confused by this implementation detail. restore_command fails? Fine, let's just read the file from pg_wal. But this is a different topic...

I do not know the history of this FATAL ereport. It looks like the "must specify restore_command when standby mode is not enabled" check is only intended to protect the user from misconfiguration, and the rest of the code will treat an empty restore_command correctly, just like /bin/false.

Maybe.

Anyway, for now I think that your first patch would be enough, i.e.,
just change the context of restore_command to PGC_SIGHUP.
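
To illustrate what the context change means on reload, here is a minimal sketch. It models only the difference between the two contexts named in the patch: a postmaster-context setting keeps its old value until restart, while a sighup-context setting picks up the new value. `GucContextModel`, `GucModel` and `model_reload` are hypothetical names, not part of guc.c.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for PGC_POSTMASTER and PGC_SIGHUP. */
typedef enum
{
	CTX_POSTMASTER,				/* change requires restart */
	CTX_SIGHUP					/* change applied on reload */
} GucContextModel;

typedef struct
{
	const char *name;
	GucContextModel context;
	const char *value;			/* value currently in effect, never NULL */
} GucModel;

/*
 * Model of a configuration reload for one string setting.
 * Returns true if new_value took effect.
 */
bool
model_reload(GucModel *guc, const char *new_value)
{
	if (strcmp(guc->value, new_value) == 0)
		return false;			/* nothing changed */

	if (guc->context == CTX_POSTMASTER)
		return false;			/* old value stays until restart */

	guc->value = new_value;
	return true;
}
```

With restore_command modelled as CTX_SIGHUP, emptying it and reloading takes effect immediately, which is the case the earlier messages debate.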

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#24Sergei Kornilov
sk@zsrv.org
In reply to: Fujii Masao (#23)
1 attachment(s)
Re: Allow some recovery parameters to be changed with reload

Hello

Anyway, for now I think that your first patch would be enough, i.e.,
just change the context of restore_command to PGC_SIGHUP.

Glad to hear. Attached a rebased version of the original proposal.

regards, Sergei

Attachments:

v3_restore_command_reload.patch (text/x-diff)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f043433e31..ec74cb43ad 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3542,7 +3542,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
        </para>
 
        <para>
-        This parameter can only be set at server start.
+        This parameter can only be set in the <filename>postgresql.conf</filename>
+        file or on the server command line.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bb34630e8e..a732279f52 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3699,7 +3699,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"restore_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
 			NULL
 		},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..9c9091e601 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -253,7 +253,6 @@
 				# placeholders: %p = path of file to restore
 				#               %f = file name only
 				# e.g. 'cp /mnt/server/archivedir/%f %p'
-				# (change requires restart)
 #archive_cleanup_command = ''	# command to execute at every restartpoint
 #recovery_end_command = ''	# command to execute at completion of recovery
 
#25Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Sergei Kornilov (#24)
Re: Allow some recovery parameters to be changed with reload

On 2020/11/12 4:38, Sergei Kornilov wrote:

Hello

Anyway, for now I think that your first patch would be enough, i.e.,
just change the context of restore_command to PGC_SIGHUP.

Glad to hear. Attached a rebased version of the original proposal.

Thanks for rebasing the patch!

This parameter is required for archive recovery,

I found the above description in config.sgml. I was just wondering
if it should be updated so that the actual specification is described or not.
The actual spec is that restore_command is required to start archive
recovery, but optional (i.e., the parameter can be reset to empty)
after archive recovery has started. But this updated version of the
description would be rather confusing to users. So I'm now thinking
not to update that.

Does anyone object to the patch? If not, I'm thinking to commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#26Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#25)
Re: Allow some recovery parameters to be changed with reload

At Thu, 26 Nov 2020 22:43:48 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/11/12 4:38, Sergei Kornilov wrote:

Hello

Anyway, for now I think that your first patch would be enough, i.e.,
just change the context of restore_command to PGC_SIGHUP.

Glad to hear. Attached a rebased version of the original proposal.

Thanks for rebasing the patch!

This parameter is required for archive recovery,

I found the above description in config.sgml. I was just wondering
if it should be updated so that the actual specification is described
or not.
The actual spec is that restore_command is required to start archive
recovery, but optional (i.e., the parameter can be reset to empty)
after archive recovery has started. But this updated version of the
description would be rather confusing to users. So I'm now thinking
not to update that.

Does anyone object to the patch? If not, I'm thinking to commit the
patch.

Although I don't object to making the parameter reloadable, I think it
needs to be documented that the server could stop after reloading if the
server failed to execute the new command line.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#27Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kyotaro Horiguchi (#26)
Re: Allow some recovery parameters to be changed with reload

On 2020/11/27 9:30, Kyotaro Horiguchi wrote:

At Thu, 26 Nov 2020 22:43:48 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/11/12 4:38, Sergei Kornilov wrote:

Hello

Anyway, for now I think that your first patch would be enough, i.e.,
just change the context of restore_command to PGC_SIGHUP.

Glad to hear. Attached a rebased version of the original proposal.

Thanks for rebasing the patch!

This parameter is required for archive recovery,

I found the above description in config.sgml. I was just wondering
if it should be updated so that the actual specification is described
or not.
The actual spec is that restore_command is required to start archive
recovery, but optional (i.e., the parameter can be reset to empty)
after archive recovery has started. But this updated version of the
description would be rather confusing to users. So I'm now thinking
not to update that.

Does anyone object to the patch? If not, I'm thinking to commit the
patch.

Although I don't object to making the parameter reloadable, I think it
needs to be documented that the server could stop after reloading if the
server failed to execute the new command line.

You mean that we should document that if restore_command is set to an improper command by mistake, archive recovery may fail to restore some archived WAL files and finish without replaying those WAL? But isn't this true even without applying the patch?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#28Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#27)
Re: Allow some recovery parameters to be changed with reload

At Fri, 27 Nov 2020 09:48:25 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/11/27 9:30, Kyotaro Horiguchi wrote:

At Thu, 26 Nov 2020 22:43:48 +0900, Fujii Masao
<masao.fujii@oss.nttdata.com> wrote in

On 2020/11/12 4:38, Sergei Kornilov wrote:

Hello

Anyway, for now I think that your first patch would be enough, i.e.,
just change the context of restore_command to PGC_SIGHUP.

Glad to hear. Attached a rebased version of the original proposal.

Thanks for rebasing the patch!

This parameter is required for archive recovery,

I found the above description in config.sgml. I was just wondering
if it should be updated so that the actual specification is described
or not.
The actual spec is that restore_command is required to start archive
recovery, but optional (i.e., the parameter can be reset to empty)
after archive recovery has started. But this updated version of the
description would be rather confusing to users. So I'm now thinking
not to update that.

Does anyone object to the patch? If not, I'm thinking to commit the
patch.

Although I don't object to making the parameter reloadable, I think it
needs to be documented that the server could stop after reloading if the
server failed to execute the new command line.

You mean that we should document that if restore_command is set to
an improper command by mistake, archive recovery may fail to restore some
archived WAL files and finish without replaying those WAL? But isn't
this true even without applying the patch?

If we set a wrong command in .conf and start the server in recovery
mode, the server immediately stops. It's fine. If we change
restore_command the wrong way on a running server, "pg_ctl reload" stops
the server. If it is a hot standby, the server stops unexpectedly.

However, after rechecking, I found that recovery_end_command with
wrong content causes a server stop at the end of recovery, or at
promotion. And that variable is PGC_SIGHUP.

So I agree not to document that. Sorry for the noise.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#29Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kyotaro Horiguchi (#28)
Re: Allow some recovery parameters to be changed with reload

On 2020/11/27 12:05, Kyotaro Horiguchi wrote:

At Fri, 27 Nov 2020 09:48:25 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/11/27 9:30, Kyotaro Horiguchi wrote:

At Thu, 26 Nov 2020 22:43:48 +0900, Fujii Masao
<masao.fujii@oss.nttdata.com> wrote in

On 2020/11/12 4:38, Sergei Kornilov wrote:

Hello

Anyway, for now I think that your first patch would be enough, i.e.,
just change the context of restore_command to PGC_SIGHUP.

Glad to hear. Attached a rebased version of the original proposal.

Thanks for rebasing the patch!

This parameter is required for archive recovery,

I found the above description in config.sgml. I was just wondering
if it should be updated so that the actual specification is described
or not.
The actual spec is that restore_command is required to start archive
recovery, but optional (i.e., the parameter can be reset to empty)
after archive recovery has started. But this updated version of the
description would be rather confusing to users. So I'm now thinking
not to update that.

Does anyone object to the patch? If not, I'm thinking to commit the
patch.

Although I don't object to making the parameter reloadable, I think it
needs to be documented that the server could stop after reloading if the
server failed to execute the new command line.

You mean that we should document that if restore_command is set to
an improper command by mistake, archive recovery may fail to restore some
archived WAL files and finish without replaying those WAL? But isn't
this true even without applying the patch?

If we set a wrong command in .conf and start the server in recovery
mode, the server immediately stops. It's fine. If we change
restore_command the wrong way on a running server, "pg_ctl reload" stops
the server. If it is a hot standby, the server stops unexpectedly.

However, after rechecking, I found that recovery_end_command with
wrong content causes a server stop at the end of recovery, or at
promotion. And that variable is PGC_SIGHUP.

So I agree not to document that. Sorry for the noise.

OK, so I pushed the patch. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#30Sergei Kornilov
sk@zsrv.org
In reply to: Fujii Masao (#29)
Re: Allow some recovery parameters to be changed with reload

Hello

OK, so I pushed the patch. Thanks!

Thank you!

regards, Sergei