replay pause vs. standby promotion

Started by Fujii Masao almost 6 years ago, 18 messages
#1 Fujii Masao
masao.fujii@oss.nttdata.com

Hi,

Currently, if pg_wal_replay_pause() is called after the standby
promotion is triggered but before the promotion has successfully
finished, WAL replay is paused. That is, the replay pause is
preferred over the promotion. Is this desirable behavior?

ISTM that most users, including me, want the recovery to complete
as soon as possible and the server to become the master when
they request the promotion. So I'm thinking of changing
the recovery so that it ignores the pause request after the promotion
is triggered. Thoughts?

I want to start this discussion because this is related to the patch
(proposed at the thread [1]) that I'm reviewing. It does that partially,
i.e., it prefers the promotion only when the pause is requested by
recovery_target_action=pause. But I think that it's reasonable and
more consistent to do that regardless of whether the pause is requested
by pg_wal_replay_pause() or recovery_target_action.

BTW, regarding "replay pause vs. delayed standby", no wait for
recovery_min_apply_delay happens after the promotion
is triggered. IMO "pause" should be treated similarly.

[1]: /messages/by-id/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

In reply to: Fujii Masao (#1)
Re: replay pause vs. standby promotion

Hello

I want to start this discussion because this is related to the patch
(proposed at the thread [1]) that I'm reviewing. It does that partially,
i.e., it prefers the promotion only when the pause is requested by
recovery_target_action=pause. But I think that it's reasonable and
more consistent to do that regardless of whether the pause is requested
by pg_wal_replay_pause() or recovery_target_action.

+1.
I'm just not sure if this is safe for replay logic, so I did not touch this behavior in my proposal. (hmm, I wanted to mention this, but apparently forgot)

regards, Sergei

In reply to: Sergei Kornilov (#2)
Re: replay pause vs. standby promotion

On Wed, 04 Mar 2020 15:00:54 +0300
Sergei Kornilov <sk@zsrv.org> wrote:

Hello

I want to start this discussion because this is related to the patch
(proposed at the thread [1]) that I'm reviewing. It does that partially,
i.e., it prefers the promotion only when the pause is requested by
recovery_target_action=pause. But I think that it's reasonable and
more consistent to do that regardless of whether the pause is requested
by pg_wal_replay_pause() or recovery_target_action.

+1.

+1

And pg_wal_replay_pause() should probably raise an error explaining that the
standby ignores the pause because of the ongoing promotion.

#4 Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Jehan-Guillaume de Rorthais (#3)
1 attachment(s)
Re: replay pause vs. standby promotion

On 2020/03/04 23:40, Jehan-Guillaume de Rorthais wrote:

On Wed, 04 Mar 2020 15:00:54 +0300
Sergei Kornilov <sk@zsrv.org> wrote:

Hello

I want to start this discussion because this is related to the patch
(proposed at the thread [1]) that I'm reviewing. It does that partially,
i.e., it prefers the promotion only when the pause is requested by
recovery_target_action=pause. But I think that it's reasonable and
more consistent to do that regardless of whether the pause is requested
by pg_wal_replay_pause() or recovery_target_action.

+1.

+1

And pg_wal_replay_pause() should probably raise an error explaining that the
standby ignores the pause because of the ongoing promotion.

OK, so patch attached.

With this patch, if a promotion is triggered while recovery is paused,
the paused state ends and the promotion continues. OTOH, this patch
makes pg_wal_replay_pause() and _resume() throw an error if they are executed
while a promotion is ongoing.

Regarding recovery_target_action, if the recovery target is reached
while a promotion is ongoing, the "pause" setting will act the same as "promote",
i.e., recovery will finish and the server will start to accept connections.
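As a reading aid, the rules above can be modeled as two small decision functions. This is a simplified sketch, not the patch's code: the enum and function names are hypothetical. It also assumes that when hot_standby is off and a promotion is ongoing at the same time, the documented "pause acts as shutdown" rule takes precedence; the patch text does not spell out that combination.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; these are not PostgreSQL's own definitions. */
typedef enum
{
    ACTION_PAUSE,
    ACTION_PROMOTE,
    ACTION_SHUTDOWN
} TargetAction;

typedef enum
{
    CTL_OK,
    CTL_ERR_NOT_IN_RECOVERY,    /* "recovery is not in progress" */
    CTL_ERR_PROMOTION_ONGOING   /* "standby promotion is ongoing" */
} CtlResult;

/* Admission check shared by pg_wal_replay_pause() and _resume(). */
static CtlResult
replay_control_allowed(bool in_recovery, bool promote_triggered)
{
    if (!in_recovery)
        return CTL_ERR_NOT_IN_RECOVERY;
    if (promote_triggered)
        return CTL_ERR_PROMOTION_ONGOING;
    return CTL_OK;
}

/* What recovery_target_action effectively does once the target is reached. */
static TargetAction
effective_target_action(TargetAction configured, bool hot_standby,
                        bool promote_triggered)
{
    if (configured != ACTION_PAUSE)
        return configured;          /* promote/shutdown are unaffected */
    if (!hot_standby)
        return ACTION_SHUTDOWN;     /* assumption: this rule takes precedence */
    if (promote_triggered)
        return ACTION_PROMOTE;      /* new behavior: pause acts as promote */
    return ACTION_PAUSE;
}
```

For example, effective_target_action(ACTION_PAUSE, true, true) yields ACTION_PROMOTE, while replay_control_allowed(true, true) yields CTL_ERR_PROMOTION_ONGOING.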

To implement the above, I added a new shared variable indicating whether
a promotion has been triggered. Only the startup process can update this shared
variable. Other processes, like read-only backends, can check whether
a promotion is ongoing via this variable.

I added a new function PromoteIsTriggered() that returns true if a promotion
has been triggered. Since the name of this function and the existing function
IsPromoteTriggered() are confusingly similar, I renamed
IsPromoteTriggered() to IsPromoteSignaled(), a more appropriate name.

I'd like to apply the change of log message that Sergei proposed at [1]
after committing this patch, if that's OK.

[1]: /messages/by-id/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Attachments:

prefer_promote_than_pause_v1.patch (text/plain)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1128f89ec..86726376f1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3570,6 +3570,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         This setting has no effect if no recovery target is set.
         If <xref linkend="guc-hot-standby"/> is not enabled, a setting of
         <literal>pause</literal> will act the same as <literal>shutdown</literal>.
+        If the recovery target is reached while a promotion is ongoing,
+        a setting of <literal>pause</literal> will act the same as
+        <literal>promote</literal>.
        </para>
        <para>
         In any case, if a recovery target is configured but the archive
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 323366feb6..bcd456f52d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20154,6 +20154,13 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
     recovery is resumed.
    </para>
 
+   <para>
+    <function>pg_wal_replay_pause</function> and
+    <function>pg_wal_replay_resume</function> cannot be executed while
+    a promotion is ongoing. If a promotion is triggered while recovery
+    is paused, the paused state ends and a promotion continues.
+   </para>
+
    <para>
     If streaming replication is disabled, the paused state may continue
     indefinitely without problem. While streaming replication is in
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4361568882..c545aeffa3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -229,6 +229,12 @@ static bool LocalRecoveryInProgress = true;
  */
 static bool LocalHotStandbyActive = false;
 
+/*
+ * Local copy of SharedPromoteIsTriggered variable. False actually means "not
+ * known, need to check the shared state".
+ */
+static bool LocalPromoteIsTriggered = false;
+
 /*
  * Local state for XLogInsertAllowed():
  *		1: unconditionally allowed to insert XLOG
@@ -654,6 +660,12 @@ typedef struct XLogCtlData
 	 */
 	bool		SharedHotStandbyActive;
 
+	/*
+	 * SharedPromoteIsTriggered indicates if a standby promotion has been
+	 * triggered.  Protected by info_lck.
+	 */
+	bool		SharedPromoteIsTriggered;
+
 	/*
 	 * WalWriterSleeping indicates whether the WAL writer is currently in
 	 * low-power mode (and hence should be nudged if an async commit occurs).
@@ -912,6 +924,7 @@ static void InitControlFile(uint64 sysidentifier);
 static void WriteControlFile(void);
 static void ReadControlFile(void);
 static char *str_time(pg_time_t tnow);
+static void SetPromoteIsTriggered(void);
 static bool CheckForStandbyTrigger(void);
 
 #ifdef WAL_DEBUG
@@ -5112,6 +5125,7 @@ XLOGShmemInit(void)
 	XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
 	XLogCtl->SharedRecoveryInProgress = true;
 	XLogCtl->SharedHotStandbyActive = false;
+	XLogCtl->SharedPromoteIsTriggered = false;
 	XLogCtl->WalWriterSleeping = false;
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
@@ -5940,14 +5954,20 @@ recoveryPausesHere(void)
 	if (!LocalHotStandbyActive)
 		return;
 
+	/* Don't pause after standby promotion has been triggered */
+	if (LocalPromoteIsTriggered)
+		return;
+
 	ereport(LOG,
 			(errmsg("recovery has paused"),
 			 errhint("Execute pg_wal_replay_resume() to continue.")));
 
 	while (RecoveryIsPaused())
 	{
+		HandleStartupProcInterrupts();
+		if (CheckForStandbyTrigger())
+			return;
 		pg_usleep(1000000L);	/* 1000 ms */
-		HandleStartupProcInterrupts();
 	}
 }
 
@@ -12252,6 +12272,40 @@ emode_for_corrupt_record(int emode, XLogRecPtr RecPtr)
 	return emode;
 }
 
+/*
+ * Has a standby promotion already been triggered?
+ *
+ * Unlike CheckForStandbyTrigger(), this works in any process
+ * that's connected to shared memory.
+ */
+bool
+PromoteIsTriggered(void)
+{
+	/*
+	 * We check shared state each time only until a standby promotion is
+	 * triggered. We can't trigger a promotion again, so there's no need to
+	 * keep checking after the shared variable has once been seen true.
+	 */
+	if (LocalPromoteIsTriggered)
+		return true;
+
+	SpinLockAcquire(&XLogCtl->info_lck);
+	LocalPromoteIsTriggered = XLogCtl->SharedPromoteIsTriggered;
+	SpinLockRelease(&XLogCtl->info_lck);
+
+	return LocalPromoteIsTriggered;
+}
+
+static void
+SetPromoteIsTriggered(void)
+{
+	SpinLockAcquire(&XLogCtl->info_lck);
+	XLogCtl->SharedPromoteIsTriggered = true;
+	SpinLockRelease(&XLogCtl->info_lck);
+
+	LocalPromoteIsTriggered = true;
+}
+
 /*
  * Check to see whether the user-specified trigger file exists and whether a
  * promote request has arrived.  If either condition holds, return true.
@@ -12260,12 +12314,11 @@ static bool
 CheckForStandbyTrigger(void)
 {
 	struct stat stat_buf;
-	static bool triggered = false;
 
-	if (triggered)
+	if (LocalPromoteIsTriggered)
 		return true;
 
-	if (IsPromoteTriggered())
+	if (IsPromoteSignaled())
 	{
 		/*
 		 * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
@@ -12288,8 +12341,8 @@ CheckForStandbyTrigger(void)
 
 		ereport(LOG, (errmsg("received promote request")));
 
-		ResetPromoteTriggered();
-		triggered = true;
+		ResetPromoteSignaled();
+		SetPromoteIsTriggered();
 		return true;
 	}
 
@@ -12301,7 +12354,7 @@ CheckForStandbyTrigger(void)
 		ereport(LOG,
 				(errmsg("promote trigger file found: %s", PromoteTriggerFile)));
 		unlink(PromoteTriggerFile);
-		triggered = true;
+		SetPromoteIsTriggered();
 		fast_promote = true;
 		return true;
 	}
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 20316539b6..b84ba57259 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -531,6 +531,13 @@ pg_wal_replay_pause(PG_FUNCTION_ARGS)
 				 errmsg("recovery is not in progress"),
 				 errhint("Recovery control functions can only be executed during recovery.")));
 
+	if (PromoteIsTriggered())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("standby promotion is ongoing"),
+				 errhint("%s cannot be executed after promotion is triggered.",
+						 "pg_wal_replay_pause()")));
+
 	SetRecoveryPause(true);
 
 	PG_RETURN_VOID();
@@ -551,6 +558,13 @@ pg_wal_replay_resume(PG_FUNCTION_ARGS)
 				 errmsg("recovery is not in progress"),
 				 errhint("Recovery control functions can only be executed during recovery.")));
 
+	if (PromoteIsTriggered())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("standby promotion is ongoing"),
+				 errhint("%s cannot be executed after promotion is triggered.",
+						 "pg_wal_replay_resume()")));
+
 	SetRecoveryPause(false);
 
 	PG_RETURN_VOID();
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index c2250d7d4e..8952676765 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -39,7 +39,7 @@
  */
 static volatile sig_atomic_t got_SIGHUP = false;
 static volatile sig_atomic_t shutdown_requested = false;
-static volatile sig_atomic_t promote_triggered = false;
+static volatile sig_atomic_t promote_signaled = false;
 
 /*
  * Flag set when executing a restore command, to tell SIGTERM signal handler
@@ -63,7 +63,7 @@ StartupProcTriggerHandler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
-	promote_triggered = true;
+	promote_signaled = true;
 	WakeupRecovery();
 
 	errno = save_errno;
@@ -197,13 +197,13 @@ PostRestoreCommand(void)
 }
 
 bool
-IsPromoteTriggered(void)
+IsPromoteSignaled(void)
 {
-	return promote_triggered;
+	return promote_signaled;
 }
 
 void
-ResetPromoteTriggered(void)
+ResetPromoteSignaled(void)
 {
-	promote_triggered = false;
+	promote_signaled = false;
 }
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 98b033fc20..331497bcfb 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -313,6 +313,7 @@ extern XLogRecPtr GetFlushRecPtr(void);
 extern XLogRecPtr GetLastImportantRecPtr(void);
 extern void RemovePromoteSignalFiles(void);
 
+extern bool PromoteIsTriggered(void);
 extern bool CheckPromoteSignal(void);
 extern void WakeupRecovery(void);
 extern void SetWalWriterSleeping(bool sleeping);
diff --git a/src/include/postmaster/startup.h b/src/include/postmaster/startup.h
index 9f59c1ffa3..bec313764a 100644
--- a/src/include/postmaster/startup.h
+++ b/src/include/postmaster/startup.h
@@ -16,7 +16,7 @@ extern void HandleStartupProcInterrupts(void);
 extern void StartupProcessMain(void) pg_attribute_noreturn();
 extern void PreRestoreCommand(void);
 extern void PostRestoreCommand(void);
-extern bool IsPromoteTriggered(void);
-extern void ResetPromoteTriggered(void);
+extern bool IsPromoteSignaled(void);
+extern void ResetPromoteSignaled(void);
 
 #endif							/* _STARTUP_H */
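The reworked wait loop in recoveryPausesHere() can be exercised outside the server with a small simulation. It is a sketch under stated assumptions: PauseSim and its fields are hypothetical stand-ins for shared memory and signal delivery, each loop iteration represents one pg_usleep(1000000L) interval, and HandleStartupProcInterrupts() is reduced to a comment. As a simplification, the same predicate plays the role of both LocalPromoteIsTriggered (before the loop) and CheckForStandbyTrigger() (inside it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulation state; not part of PostgreSQL. */
typedef struct
{
    int ticks_until_promote;    /* promotion arrives after this many sleeps; -1 = never */
    int ticks_until_resume;     /* pg_wal_replay_resume() after this many sleeps; -1 = never */
    int sleeps;                 /* 1-second sleeps taken so far */
} PauseSim;

typedef enum
{
    PAUSE_RESULT_SKIPPED,       /* hot standby not active: never pauses */
    PAUSE_RESULT_RESUMED,       /* left the loop because replay was resumed */
    PAUSE_RESULT_PROMOTED       /* left (or skipped) the loop because of promotion */
} PauseResult;

static bool
sim_is_paused(const PauseSim *s)
{
    return s->ticks_until_resume < 0 || s->sleeps < s->ticks_until_resume;
}

static bool
sim_promote_triggered(const PauseSim *s)
{
    return s->ticks_until_promote >= 0 && s->sleeps >= s->ticks_until_promote;
}

/* Mirrors the control flow of the patched recoveryPausesHere(). */
static PauseResult
model_recovery_pauses_here(PauseSim *s, bool hot_standby_active)
{
    assert(s != NULL);

    if (!hot_standby_active)
        return PAUSE_RESULT_SKIPPED;

    /* "Don't pause after standby promotion has been triggered" */
    if (sim_promote_triggered(s))
        return PAUSE_RESULT_PROMOTED;

    while (sim_is_paused(s))
    {
        /* HandleStartupProcInterrupts() would run here, before the check */
        if (sim_promote_triggered(s))
            return PAUSE_RESULT_PROMOTED;
        s->sleeps++;            /* stands in for pg_usleep(1000000L) */
    }
    return PAUSE_RESULT_RESUMED;
}
```

With ticks_until_promote = 3 and no resume, the model returns PAUSE_RESULT_PROMOTED after three sleeps. The pre-patch loop had no trigger check, so in the same situation it would have kept sleeping until pg_wal_replay_resume() was called.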
#5 Atsushi Torikoshi
atorik@gmail.com
In reply to: Fujii Masao (#4)
Re: replay pause vs. standby promotion

On Fri, Mar 6, 2020 at 10:18 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

OK, so patch attached.

With this patch, if a promotion is triggered while recovery is paused,
the paused state ends and the promotion continues. OTOH, this patch
makes pg_wal_replay_pause() and _resume() throw an error if they are executed
while a promotion is ongoing.

Regarding recovery_target_action, if the recovery target is reached
while a promotion is ongoing, the "pause" setting will act the same as "promote",
i.e., recovery will finish and the server will start to accept connections.

To implement the above, I added a new shared variable indicating whether
a promotion has been triggered. Only the startup process can update this shared
variable. Other processes, like read-only backends, can check whether
a promotion is ongoing via this variable.

I added a new function PromoteIsTriggered() that returns true if a promotion
has been triggered. Since the name of this function and the existing function
IsPromoteTriggered() are confusingly similar, I renamed
IsPromoteTriggered() to IsPromoteSignaled(), a more appropriate name.

I've confirmed the patch works as you described above.
And I also poked around it a little bit but found no problems.

Regards,

--
Atsushi Torikoshi

#6 Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Atsushi Torikoshi (#5)
Re: replay pause vs. standby promotion

On 2020/03/20 15:22, Atsushi Torikoshi wrote:

On Fri, Mar 6, 2020 at 10:18 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

OK, so patch attached.

With this patch, if a promotion is triggered while recovery is paused,
the paused state ends and the promotion continues. OTOH, this patch
makes pg_wal_replay_pause() and _resume() throw an error if they are executed
while a promotion is ongoing.

Regarding recovery_target_action, if the recovery target is reached
while a promotion is ongoing, the "pause" setting will act the same as "promote",
i.e., recovery will finish and the server will start to accept connections.

To implement the above, I added a new shared variable indicating whether
a promotion has been triggered. Only the startup process can update this shared
variable. Other processes, like read-only backends, can check whether
a promotion is ongoing via this variable.

I added a new function PromoteIsTriggered() that returns true if a promotion
has been triggered. Since the name of this function and the existing function
IsPromoteTriggered() are confusingly similar, I renamed
IsPromoteTriggered() to IsPromoteSignaled(), a more appropriate name.

I've confirmed the patch works as you described above.
And I also poked around it a little bit but found no problems.

Thanks for the review!
Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

In reply to: Fujii Masao (#6)
Re: replay pause vs. standby promotion

Hello

(I am trying to find an opportunity to review this patch...)

Consider test case with streaming replication:

on primary: create table foo (i int);
on standby:

postgres=# select pg_wal_replay_pause();
pg_wal_replay_pause
---------------------

(1 row)

postgres=# select pg_is_wal_replay_paused();
pg_is_wal_replay_paused
-------------------------
t
(1 row)

postgres=# table foo;
i
---
(0 rows)

Execute "insert into foo values (1);" on primary

postgres=# select pg_promote ();
pg_promote
------------
t
(1 row)

postgres=# table foo;
i
---
1

So we replayed one additional change during the promotion. I think this is wrong behavior. Possibly it can be fixed by

+ if (PromoteIsTriggered()) break;
/* Setup error traceback support for ereport() */
errcallback.callback = rm_redo_error_callback;

regards, Sergei

#8 Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Sergei Kornilov (#7)
Re: replay pause vs. standby promotion

On 2020/03/23 22:46, Sergei Kornilov wrote:

Hello

(I am trying to find an opportunity to review this patch...)

Thanks for the review! It's really helpful!

Consider test case with streaming replication:

on primary: create table foo (i int);
on standby:

postgres=# select pg_wal_replay_pause();
pg_wal_replay_pause
---------------------

(1 row)

postgres=# select pg_is_wal_replay_paused();
pg_is_wal_replay_paused
-------------------------
t
(1 row)

postgres=# table foo;
i
---
(0 rows)

Execute "insert into foo values (1);" on primary

postgres=# select pg_promote ();
pg_promote
------------
t
(1 row)

postgres=# table foo;
i
---
1

So we replayed one additional change during the promotion. I think this is wrong behavior. Possibly it can be fixed by

+ if (PromoteIsTriggered()) break;
/* Setup error traceback support for ereport() */
errcallback.callback = rm_redo_error_callback;

You meant that the promotion request should cause the recovery
to finish immediately even if there are still outstanding WAL records,
and cause the standby to become the master? I don't think that
is the expected (or the existing) behavior of the promotion. That is,
the promotion request should cause the recovery to replay as many
WAL records as possible, to the end, in order to avoid data loss. No?

If we want a promotion method that finishes recovery
immediately, IMO we should implement something like
"pg_ctl promote -m fast". That is, we need to add a new
method to the promotion.
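To make the difference concrete, here is a toy replay loop contrasting the existing behavior (replay everything already available, then finish recovery) with a hypothetical mode that stops at the current position, which is roughly what the proposed `if (PromoteIsTriggered()) break;` would do. PromoteMode and replay_after_promote_request() are invented for illustration; no such mode exists in the patch, and the option name is only being discussed here.

```c
#include <assert.h>

/* Hypothetical promotion modes, for illustration only. */
typedef enum
{
    PROMOTE_REPLAY_ALL,         /* existing: replay all WAL already available */
    PROMOTE_STOP_AT_CURRENT     /* proposed: finish recovery where replay is now */
} PromoteMode;

/*
 * Models replay after a promotion request.  Records [next, available) are
 * already present in pg_wal or the archive.  Returns the record index at
 * which recovery ends.
 */
static long
replay_after_promote_request(long next, long available, PromoteMode mode)
{
    assert(next <= available);

    while (next < available)
    {
        /* Where a check like "if (PromoteIsTriggered()) break;" would sit */
        if (mode == PROMOTE_STOP_AT_CURRENT)
            break;
        next++;                 /* stands in for applying one WAL record */
    }
    return next;
}
```

In Sergei's test, one INSERT record was available but not yet replayed when pg_promote() was called: replay_after_promote_request(0, 1, PROMOTE_REPLAY_ALL) returns 1 (the row appears after promotion), while PROMOTE_STOP_AT_CURRENT returns 0.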

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#8)
Re: replay pause vs. standby promotion

On Mon, Mar 23, 2020 at 10:36 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

If we want a promotion method that finishes recovery
immediately, IMO we should implement something like
"pg_ctl promote -m fast". That is, we need to add a new
method to the promotion.

I think 'immediate' would be a better choice. One reason is that we've
used the term 'fast promotion' in the past for a different feature.
Another is that 'immediate' might sound slightly scary to people who
are familiar with what 'pg_ctl stop -mimmediate' does. And you want
people doing this to be just a little bit scared: not too scared, but
a little scared.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In reply to: Fujii Masao (#8)
Re: replay pause vs. standby promotion

Hello

You meant that the promotion request should cause the recovery
to finish immediately even if there are still outstanding WAL records,
and cause the standby to become the master?

Oh, I get your point. But yes, I expect that when a promotion is requested during a pause, the user (me too) will want exactly the current state, not the latest state available in WAL.

A real use case from my experience:
The user wants to update a third-party application. In case of problems, he wants to return to the old version of the application and the unchanged replica. So he pauses replay on the standby and performs the update. If all is OK, he will resume replay. In case of problems, he plans to promote the standby.
But oops: the standby will ignore promote signals during the pause, and we need to get the current LSN from the standby and restart it with recovery_target_lsn = ? and recovery_target_action = promote to reach this state.

regards, Sergei

#11 Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Robert Haas (#9)
Re: replay pause vs. standby promotion

On 2020/03/23 23:55, Robert Haas wrote:

On Mon, Mar 23, 2020 at 10:36 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

If we want a promotion method that finishes recovery
immediately, IMO we should implement something like
"pg_ctl promote -m fast". That is, we need to add a new
method to the promotion.

I think 'immediate' would be a better choice. One reason is that we've
used the term 'fast promotion' in the past for a different feature.
Another is that 'immediate' might sound slightly scary to people who
are familiar with what 'pg_ctl stop -mimmediate' does. And you want
people doing this to be just a little bit scared: not too scared, but
a little scared.

+1

When I proposed the feature five years ago, I used "immediate"
as the option value.
/messages/by-id/CAHGQGwHtvyDqKZaYWYA9zyyLEcAKiF5P0KpcpuNE_tsrGTFtQw@mail.gmail.com

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

#12 Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Sergei Kornilov (#10)
Re: replay pause vs. standby promotion

On 2020/03/24 0:17, Sergei Kornilov wrote:

Hello

You meant that the promotion request should cause the recovery
to finish immediately even if there are still outstanding WAL records,
and cause the standby to become the master?

Oh, I get your point. But yes, I expect that when a promotion is requested during a pause, the user (me too) will want exactly the current state, not the latest state available in WAL.

Basically, I'd like the promotion to make the standby replay all the WAL
even if it's requested while recovery is paused. OTOH, I understand there
are use cases where immediate promotion is useful, as you explained.
So, +1 to add something like "pg_ctl promote -m immediate".

But I'm afraid that it's now too late to add such a feature to v13.
Probably it's an item for v14...

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

#13 Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Fujii Masao (#12)
Re: replay pause vs. standby promotion

On 2020/03/24 0:57, Fujii Masao wrote:

On 2020/03/24 0:17, Sergei Kornilov wrote:

Hello

You meant that the promotion request should cause the recovery
to finish immediately even if there are still outstanding WAL records,
and cause the standby to become the master?

Oh, I get your point. But yes, I expect that when a promotion is requested during a pause, the user (me too) will want exactly the current state, not the latest state available in WAL.

Basically, I'd like the promotion to make the standby replay all the WAL
even if it's requested while recovery is paused. OTOH, I understand there
are use cases where immediate promotion is useful, as you explained.
So, +1 to add something like "pg_ctl promote -m immediate".

But I'm afraid that it's now too late to add such a feature to v13.
Probably it's an item for v14...

I pushed the latest version of the patch. If you have further opinion
about immediate promotion, let's keep discussing that!

Also we need to go back to the original patch posted at [1].

[1]: /messages/by-id/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

In reply to: Fujii Masao (#13)
Re: replay pause vs. standby promotion

Hello

I pushed the latest version of the patch. If you have further opinion
about immediate promotion, let's keep discussing that!

Thank you!

Honestly, I forgot that the promotion is documented in high-availability.sgml as:

Before failover, any WAL immediately available in the archive or in pg_wal will be
restored, but no attempt is made to connect to the master.

I mistakenly thought that promotion should be "immediate"...

If a promotion is triggered while recovery is paused, the paused state ends and a promotion continues.

Could we add a few words in func.sgml to clarify the behavior? Especially for users from my example above. Something like:

If a promotion is triggered while recovery is paused, the paused state ends, replay of any WAL immediately available in the archive or in pg_wal will be continued and then a promotion will be completed.

regards, Sergei

#15 Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Sergei Kornilov (#14)
Re: replay pause vs. standby promotion

On 2020/03/25 0:17, Sergei Kornilov wrote:

Hello

I pushed the latest version of the patch. If you have further opinion
about immediate promotion, let's keep discussing that!

Thank you!

Honestly, I forgot that the promotion is documented in high-availability.sgml as:

Before failover, any WAL immediately available in the archive or in pg_wal will be
restored, but no attempt is made to connect to the master.

I mistakenly thought that promotion should be "immediate"...

If a promotion is triggered while recovery is paused, the paused state ends and a promotion continues.

Could we add a few words in func.sgml to clarify the behavior? Especially for users from my example above. Something like:

If a promotion is triggered while recovery is paused, the paused state ends, replay of any WAL immediately available in the archive or in pg_wal will be continued and then a promotion will be completed.

This description is true if the pause is requested by pg_wal_replay_pause(),
but not if the recovery target is reached and the pause is requested by
recovery_target_action=pause. In the latter case, even if there is WAL data
available in pg_wal or the archive, it is not replayed, i.e., the promotion
completes immediately. Probably we should document those two cases
explicitly to avoid confusion about promotion and recovery pause?
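The two cases can be summarized as a function returning how much extra WAL gets replayed when a promotion is requested during a pause. This is a simplified sketch: PauseSource and records_replayed_after_promote() are hypothetical names, and "available" stands for WAL already present in pg_wal or obtainable through restore_command, since no attempt is made to stream from the master at that point.

```c
#include <assert.h>

/* Hypothetical labels for where the pause came from. */
typedef enum
{
    PAUSE_BY_FUNCTION,          /* pg_wal_replay_pause() */
    PAUSE_AT_RECOVERY_TARGET    /* recovery_target_action = pause */
} PauseSource;

/* Records replayed after a promotion is requested while recovery is paused. */
static long
records_replayed_after_promote(PauseSource source, long paused_at,
                               long available)
{
    assert(paused_at <= available);

    if (source == PAUSE_AT_RECOVERY_TARGET)
        return 0;                   /* target reached: promotion completes at once */
    return available - paused_at;   /* pause ends, replay continues to the end */
}
```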

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

In reply to: Fujii Masao (#15)
Re: replay pause vs. standby promotion

Hi

Could we add a few words in func.sgml to clarify the behavior? Especially for users from my example above. Something like:

If a promotion is triggered while recovery is paused, the paused state ends, replay of any WAL immediately available in the archive or in pg_wal will be continued and then a promotion will be completed.

This description is true if the pause is requested by pg_wal_replay_pause(),
but not if the recovery target is reached and the pause is requested by
recovery_target_action=pause. In the latter case, even if there is WAL data
available in pg_wal or the archive, it is not replayed, i.e., the promotion
completes immediately. Probably we should document those two cases
explicitly to avoid confusion about promotion and recovery pause?

This is the description for pg_wal_replay_pause, but we actually suggest calling pg_wal_replay_resume with recovery_target_action=pause... So I agree, we need to document both cases.

PS: I think we have inconsistent behavior here... Reading WAL during promotion from local pg_wal AND calling restore_command, but ignoring the walreceiver, also seems strange with my DBA hat on...

regards, Sergei

#17 Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Sergei Kornilov (#16)
Re: replay pause vs. standby promotion

On 2020/03/25 19:42, Sergei Kornilov wrote:

Hi

Could we add a few words in func.sgml to clarify the behavior? Especially for users from my example above. Something like:

If a promotion is triggered while recovery is paused, the paused state ends, replay of any WAL immediately available in the archive or in pg_wal will be continued and then a promotion will be completed.

This description is true if the pause is requested by pg_wal_replay_pause(),
but not if the recovery target is reached and the pause is requested by
recovery_target_action=pause. In the latter case, even if there is WAL data
available in pg_wal or the archive, it is not replayed, i.e., the promotion
completes immediately. Probably we should document those two cases
explicitly to avoid confusion about promotion and recovery pause?

This is the description for pg_wal_replay_pause, but we actually suggest calling pg_wal_replay_resume with recovery_target_action=pause... So I agree, we need to document both cases.

PS: I think we have inconsistent behavior here... Reading WAL during promotion from local pg_wal AND calling restore_command, but ignoring the walreceiver, also seems strange with my DBA hat on...

If we don't ignore the walreceiver and do try to connect to the master,
promotion and recovery may never end, since new WAL data can keep
being streamed. Do you think this behavior is more consistent?

IMO it's valid to replay all the WAL data available to avoid data loss
before a promotion completes.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

In reply to: Fujii Masao (#17)
Re: replay pause vs. standby promotion

Hello

If we don't ignore the walreceiver and do try to connect to the master,
promotion and recovery may never end, since new WAL data can keep
being streamed. Do you think this behavior is more consistent?

We have no simple point at which to stop replay.
Well, except for "immediately": just one easy stopping point. But I agree that this is not the best option: simple and clear, but not the best for the data when we want to replay as much as possible from the archive.

IMO it's valid to replay all the WAL data available to avoid data loss
before a promotion completes.

But with a still-working primary (with archive_command), we promote at a fairly random moment: whenever the primary has not yet archived the next WAL segment,
or even when a temporary restore_command error occurs? We mention just a cp command in the docs, and I know users who use cp (e.g. from NFS) without further error handling.

regards, Sergei