RecoveryWalAll and RecoveryWalStream wait events

Started by Fujii Masaoalmost 6 years ago9 messages
#1Fujii Masao
masao.fujii@oss.nttdata.com

Hi,

RecoveryWalAll and RecoveryWalStream wait events are documented as follows.

RecoveryWalAll
Waiting for WAL from any kind of source (local, archive or stream) at recovery.

RecoveryWalStream
Waiting for WAL from a stream at recovery.

But as far as I read the code, RecoveryWalAll is reported only when waiting
for WAL from a stream. So the current description looks incorrect. What's
described now for RecoveryWalStream seems rather fit to RecoveryWalAll.
I'd like to change the description of RecoveryWalAll to "Waiting for WAL
from a stream at recovery".

Regarding RecoveryWalStream, as far as I read the code, while this event is
being reported, the startup process is waiting for next trial to retrieve
WAL data when WAL data is not available from any sources, based on
wal_retrieve_retry_interval. So this current description looks also
incorrect. I'd like to change it to "Waiting when WAL data is not available
from any kind of sources (local, archive or stream) before trying again
to retrieve WAL data".

Thought?

Also the current names of these wait events sound confusing. I think
that RecoveryWalAll should be changed to RecoveryWalStream.
RecoveryWalStream should be RecoveryRetrieveRetryInterval or
something.

Another problem is that the current wait event types of them also look
strange. Currently the type of them is Activity, but IMO it's better to
use IPC for RecoveryWalAll because it's waiting for walreceiver to
receive new WAL. Also it's better to use Timeout for RecoveryWalStream
because it's waiting depending on wal_retrieve_retry_interval.

The changes of wait event types and names would break the compatibility
of wait events in pg_stat_activity. So this change should not be applied
to the back branches, but it's ok to apply in the master. Right?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#1)
Re: RecoveryWalAll and RecoveryWalStream wait events

Hello.

At Tue, 18 Feb 2020 12:25:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

Hi,

RecoveryWalAll and RecoveryWalStream wait events are documented as
follows.

RecoveryWalAll
Waiting for WAL from any kind of source (local, archive or stream) at
recovery.

RecoveryWalStream
Waiting for WAL from a stream at recovery.

But as far as I read the code, RecoveryWalAll is reported only when
waiting
for WAL from a stream. So the current description looks
incorrect. What's
described now for RecoveryWalStream seems rather fit to
RecoveryWalAll.
I'd like to change the description of RecoveryWalAll to "Waiting for
WAL
from a stream at recovery".

Good catch!

Regarding RecoveryWalStream, as far as I read the code, while this
event is
being reported, the startup process is waiting for next trial to
retrieve
WAL data when WAL data is not available from any sources, based on
wal_retrieve_retry_interval. So this current description looks also
incorrect. I'd like to change it to "Waiting when WAL data is not
available
from any kind of sources (local, archive or stream) before trying
again
to retrieve WAL data".

Thought?

I agree that the corrected description sound correct in meaning. The
latter seems a bit lengthy, though.

Also the current names of these wait events sound confusing. I think
that RecoveryWalAll should be changed to RecoveryWalStream.
RecoveryWalStream should be RecoveryRetrieveRetryInterval or
something.

I agree to the former, I think RecoveryWalInterval works well enough.

Another problem is that the current wait event types of them also look
strange. Currently the type of them is Activity, but IMO it's better
to
use IPC for RecoveryWalAll because it's waiting for walreceiver to
receive new WAL. Also it's better to use Timeout for RecoveryWalStream
because it's waiting depending on wal_retrieve_retry_interval.

Do you mean condition variable by the "IPC"? But the WaitLatch waits
not only for new WAL but also for trigger, SIGHUP, shutdown and
walreceiver events other than new WAL. I'm not sure that condition
variable fits for the purpose.

The changes of wait event types and names would break the
compatibility
of wait events in pg_stat_activity. So this change should not be
applied
to the back branches, but it's ok to apply in the master. Right?

FWIW, It seems right.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kyotaro Horiguchi (#2)
2 attachment(s)
Re: RecoveryWalAll and RecoveryWalStream wait events

On 2020/02/18 14:20, Kyotaro Horiguchi wrote:

Hello.

At Tue, 18 Feb 2020 12:25:51 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

Hi,

RecoveryWalAll and RecoveryWalStream wait events are documented as
follows.

RecoveryWalAll
Waiting for WAL from any kind of source (local, archive or stream) at
recovery.

RecoveryWalStream
Waiting for WAL from a stream at recovery.

But as far as I read the code, RecoveryWalAll is reported only when
waiting
for WAL from a stream. So the current description looks
incorrect. What's
described now for RecoveryWalStream seems rather fit to
RecoveryWalAll.
I'd like to change the description of RecoveryWalAll to "Waiting for
WAL
from a stream at recovery".

Good catch!

Regarding RecoveryWalStream, as far as I read the code, while this
event is
being reported, the startup process is waiting for next trial to
retrieve
WAL data when WAL data is not available from any sources, based on
wal_retrieve_retry_interval. So this current description looks also
incorrect. I'd like to change it to "Waiting when WAL data is not
available
from any kind of sources (local, archive or stream) before trying
again
to retrieve WAL data".

Thought?

I agree that the corrected description sound correct in meaning. The
latter seems a bit lengthy, though.

Yeah, so better idea?

Anyway, attached is the patch (fix_recovery_wait_event_doc_v1.patch)
that fixes the descriptions of those wait events. This should be
back-patched to v9.5.

Also the current names of these wait events sound confusing. I think
that RecoveryWalAll should be changed to RecoveryWalStream.
RecoveryWalStream should be RecoveryRetrieveRetryInterval or
something.

I agree to the former, I think RecoveryWalInterval works well enough.

RecoveryWalInterval sounds confusing to me...

Attached is the patch (improve_recovery_wait_event_for_master_v1.patch) that
changes the names and types of wait events. This patch uses
RecoveryRetrieveRetryInterval, but if there is better name,
I will adopt that.

Note that this patch needs to be applied after
fix_recovery_wait_event_doc_v1.patch is applied.

Another problem is that the current wait event types of them also look
strange. Currently the type of them is Activity, but IMO it's better
to
use IPC for RecoveryWalAll because it's waiting for walreceiver to
receive new WAL. Also it's better to use Timeout for RecoveryWalStream
because it's waiting depending on wal_retrieve_retry_interval.

Do you mean condition variable by the "IPC"? But the WaitLatch waits
not only for new WAL but also for trigger, SIGHUP, shutdown and
walreceiver events other than new WAL. I'm not sure that condition
variable fits for the purpose.

OK, I didn't change the type of RecoveryWalStream to IPC, in the patch.
Its type is still Activity.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Attachments:

fix_recovery_wait_event_doc_v1.patchtext/plain; charset=UTF-8; name=fix_recovery_wait_event_doc_v1.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 87586a7b06..c641361418 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1270,12 +1270,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
         </row>
         <row>
          <entry><literal>RecoveryWalAll</literal></entry>
-         <entry>Waiting for WAL from any kind of source (local, archive or stream) at recovery.</entry>
-        </row>
-        <row>
-         <entry><literal>RecoveryWalStream</literal></entry>
          <entry>Waiting for WAL from a stream at recovery.</entry>
         </row>
+        <row>
+         <entry><literal>RecoveryWalStream</literal></entry>
+         <entry>
+          Waiting when WAL data is not available from any kind of sources
+          (local, archive or stream) before trying again to retrieve WAL data,
+          at recovery.
+         </entry>
+        </row>
         <row>
          <entry><literal>SysLoggerMain</literal></entry>
          <entry>Waiting in main loop of syslogger process.</entry>
improve_recovery_wait_event_for_master_v1.patchtext/plain; charset=UTF-8; name=improve_recovery_wait_event_for_master_v1.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c641361418..8fbaad197b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1236,7 +1236,7 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting to acquire a pin on a buffer.</entry>
         </row>
         <row>
-         <entry morerows="13"><literal>Activity</literal></entry>
+         <entry morerows="12"><literal>Activity</literal></entry>
          <entry><literal>ArchiverMain</literal></entry>
          <entry>Waiting in main loop of the archiver process.</entry>
         </row>
@@ -1268,17 +1268,9 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>PgStatMain</literal></entry>
          <entry>Waiting in main loop of the statistics collector process.</entry>
         </row>
-        <row>
-         <entry><literal>RecoveryWalAll</literal></entry>
-         <entry>Waiting for WAL from a stream at recovery.</entry>
-        </row>
         <row>
          <entry><literal>RecoveryWalStream</literal></entry>
-         <entry>
-          Waiting when WAL data is not available from any kind of sources
-          (local, archive or stream) before trying again to retrieve WAL data,
-          at recovery.
-         </entry>
+         <entry>Waiting for WAL from a stream at recovery.</entry>
         </row>
         <row>
          <entry><literal>SysLoggerMain</literal></entry>
@@ -1488,7 +1480,7 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
-         <entry morerows="2"><literal>Timeout</literal></entry>
+         <entry morerows="3"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
         </row>
@@ -1500,6 +1492,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>RecoveryApplyDelay</literal></entry>
          <entry>Waiting to apply WAL at recovery because it is delayed.</entry>
         </row>
+        <row>
+         <entry><literal>RecoveryRetrieveRetryInterval</literal></entry>
+         <entry>
+          Waiting when WAL data is not available from any kind of sources
+          (local, archive or stream) before trying again to retrieve WAL data,
+          at recovery.
+         </entry>
+        </row>
         <row>
          <entry morerows="68"><literal>IO</literal></entry>
          <entry><literal>BufFileRead</literal></entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3813eadfb4..a9659223e0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11981,7 +11981,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 										 WL_LATCH_SET | WL_TIMEOUT |
 										 WL_EXIT_ON_PM_DEATH,
 										 wait_time,
-										 WAIT_EVENT_RECOVERY_WAL_STREAM);
+										 WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
 						ResetLatch(&XLogCtl->recoveryWakeupLatch);
 						now = GetCurrentTimestamp();
 					}
@@ -12165,7 +12165,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					(void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_ALL);
+									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 59dc4f31ab..aba23f5d23 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3654,9 +3654,6 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_PGSTAT_MAIN:
 			event_name = "PgStatMain";
 			break;
-		case WAIT_EVENT_RECOVERY_WAL_ALL:
-			event_name = "RecoveryWalAll";
-			break;
 		case WAIT_EVENT_RECOVERY_WAL_STREAM:
 			event_name = "RecoveryWalStream";
 			break;
@@ -3876,6 +3873,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
 		case WAIT_EVENT_RECOVERY_APPLY_DELAY:
 			event_name = "RecoveryApplyDelay";
 			break;
+		case WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL:
+			event_name = "RecoveryRetrieveRetryInterval";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3a65a51696..45496738f0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -779,7 +779,6 @@ typedef enum
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
-	WAIT_EVENT_RECOVERY_WAL_ALL,
 	WAIT_EVENT_RECOVERY_WAL_STREAM,
 	WAIT_EVENT_SYSLOGGER_MAIN,
 	WAIT_EVENT_WAL_RECEIVER_MAIN,
@@ -866,7 +865,8 @@ typedef enum
 {
 	WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
 	WAIT_EVENT_PG_SLEEP,
-	WAIT_EVENT_RECOVERY_APPLY_DELAY
+	WAIT_EVENT_RECOVERY_APPLY_DELAY,
+	WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL
 } WaitEventTimeout;
 
 /* ----------
#4Atsushi Torikoshi
atorik@gmail.com
In reply to: Fujii Masao (#3)
Re: RecoveryWalAll and RecoveryWalStream wait events

On 2020/02/19 21:46 Fujii Masao <masao.fujii@oss.nttdata.com>:

I agree to the former, I think RecoveryWalInterval works well enough.

RecoveryWalInterval sounds confusing to me...

IMHO as a user, I prefer RecoveryRetrieveRetryInterval because
it's easy to understand this wait_event is related to the
parameter 'wal_retrieve_retry_interval'.

Also from the point of balance, the explanation of
RecoveryRetrieveRetryInterval is lengthy, but I
sometimes feel explanations of wait_events in the
manual are so simple that it's hard to understand
well.

Waiting when WAL data is not available from any kind of sources
(local, archive or stream) before trying again to retrieve WAL data,

I think 'local' means pg_wal here, but the comment on
WaitForWALToBecomeAvailable() says checking pg_wal in
standby mode is 'not documented', so I'm a little bit worried
that users may be confused.

Regards,
--
Torikoshi Atsushi

#5Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Atsushi Torikoshi (#4)
Re: RecoveryWalAll and RecoveryWalStream wait events

On 2020/03/15 0:06, Atsushi Torikoshi wrote:

On 2020/02/19 21:46 Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>:

I agree to the former, I think RecoveryWalInterval works well enough.

RecoveryWalInterval sounds confusing to me...

IMHO as a user, I prefer RecoveryRetrieveRetryInterval because
it's easy to understand this wait_event is related to the
parameter 'wal_retrieve_retry_interval'.

Also from the point of balance, the explanation of
RecoveryRetrieveRetryInterval is lengthy, but I
sometimes feel explanations of wait_events in the
manual are so simple that it's hard to understand
well.

+1 to document them more. It's not easy task, though..

   Waiting when WAL data is not available from any kind of sources
   (local, archive or stream) before trying again to retrieve WAL data,

I think 'local' means pg_wal here, but the comment on
WaitForWALToBecomeAvailable() says checking pg_wal in
standby mode is 'not documented', so I'm a little bit worried
that users may be confused.

This logic seems to be documented in high-availability.sgml.
But, anyway, you think that "pg_wal" should be used instead of "local" here?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

#6Atsushi Torikoshi
atorik@gmail.com
In reply to: Fujii Masao (#5)
Re: RecoveryWalAll and RecoveryWalStream wait events

On Tue, Mar 17, 2020 at 11:55 AM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:

Waiting when WAL data is not available from any kind of sources
(local, archive or stream) before trying again to retrieve WAL

data,

I think 'local' means pg_wal here, but the comment on
WaitForWALToBecomeAvailable() says checking pg_wal in
standby mode is 'not documented', so I'm a little bit worried
that users may be confused.

This logic seems to be documented in high-availability.sgml.

Thanks! I didn't notice it.
I think you mean the below sentence.

The standby server will also attempt to restore any WAL found in the

standby cluster's pg_wal directory.

It seems the comment on WaitForWALToBecomeAvailable()
does not go along with the high-availability.sgml, do we need
modification on the comment on the function?
Or do I misunderstand something?

But, anyway, you think that "pg_wal" should be used instead

of "local" here?

I don't have special opinion here.
It might be better because high-availability.sgml does not use
"local" but "pg_wal" for the explanation, but I also feel it's
obvious in this context.

Regards,

--
Torikoshi Atsushi

#7Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Atsushi Torikoshi (#6)
1 attachment(s)
Re: RecoveryWalAll and RecoveryWalStream wait events

On 2020/03/18 17:56, Atsushi Torikoshi wrote:

On Tue, Mar 17, 2020 at 11:55 AM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:

  >    Waiting when WAL data is not available from any kind of sources
  >    (local, archive or stream) before trying again to retrieve WAL data,

I think 'local' means pg_wal here, but the comment on
WaitForWALToBecomeAvailable() says checking pg_wal in
standby mode is 'not documented', so I'm a little bit worried
that users may be confused.

This logic seems to be documented in high-availability.sgml.

Thanks! I didn't notice it.
I think you mean the below sentence.

  The standby server will also attempt to restore any WAL found in the standby cluster's pg_wal directory.

I meant the following part in the doc.

---------------------
At startup, the standby begins by restoring all WAL available in the archive
location, calling restore_command. Once it reaches the end of WAL available
there and restore_command fails, it tries to restore any WAL available in the
pg_wal directory. If that fails, and streaming replication has been configured,
the standby tries to connect to the primary server and start streaming WAL from
the last valid record found in archive or pg_wal. If that fails or streaming
replication is not configured, or if the connection is later disconnected,
the standby goes back to step 1 and tries to restore the file from the archive
again. This loop of retries from the archive, pg_wal, and via streaming
replication goes on until the server is stopped or failover is triggered by a
trigger file.
---------------------

It seems the comment on WaitForWALToBecomeAvailable()
does not go along with the high-availability.sgml, do we need
modification on the comment on the function?

No, I think for now. But you'd like to improve the docs?

But, anyway, you think that "pg_wal" should be used instead

of "local" here?

I don't have special opinion here.
It might be better because high-availability.sgml does not use
"local" but "pg_wal" for the explanation,  but I also feel it's
obvious in this context.

Ok, I changed that from "local" to "pg_wal" in the patch for
the master. Attached is the updated version of the patch.
If you're OK with this, I'd like to commit two patches that I posted
in this thread.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Attachments:

improve_recovery_wait_event_for_master_v2.patchtext/plain; charset=UTF-8; name=improve_recovery_wait_event_for_master_v2.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7626987808..89853a16d8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1244,7 +1244,7 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting to acquire a pin on a buffer.</entry>
         </row>
         <row>
-         <entry morerows="13"><literal>Activity</literal></entry>
+         <entry morerows="12"><literal>Activity</literal></entry>
          <entry><literal>ArchiverMain</literal></entry>
          <entry>Waiting in main loop of the archiver process.</entry>
         </row>
@@ -1276,17 +1276,9 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>PgStatMain</literal></entry>
          <entry>Waiting in main loop of the statistics collector process.</entry>
         </row>
-        <row>
-         <entry><literal>RecoveryWalAll</literal></entry>
-         <entry>Waiting for WAL from a stream at recovery.</entry>
-        </row>
         <row>
          <entry><literal>RecoveryWalStream</literal></entry>
-         <entry>
-          Waiting when WAL data is not available from any kind of sources
-          (local, archive or stream) before trying again to retrieve WAL data,
-          at recovery.
-         </entry>
+         <entry>Waiting for WAL from a stream at recovery.</entry>
         </row>
         <row>
          <entry><literal>SysLoggerMain</literal></entry>
@@ -1496,7 +1488,7 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
-         <entry morerows="2"><literal>Timeout</literal></entry>
+         <entry morerows="3"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
         </row>
@@ -1508,6 +1500,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>RecoveryApplyDelay</literal></entry>
          <entry>Waiting to apply WAL at recovery because it is delayed.</entry>
         </row>
+        <row>
+         <entry><literal>RecoveryRetrieveRetryInterval</literal></entry>
+         <entry>
+          Waiting when WAL data is not available from any kind of sources
+          (<filename>pg_wal</filename>, archive or stream) before trying
+          again to retrieve WAL data, at recovery.
+         </entry>
+        </row>
         <row>
          <entry morerows="68"><literal>IO</literal></entry>
          <entry><literal>BufFileRead</literal></entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index de2d4ee582..793c076da6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12031,7 +12031,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 										 WL_LATCH_SET | WL_TIMEOUT |
 										 WL_EXIT_ON_PM_DEATH,
 										 wait_time,
-										 WAIT_EVENT_RECOVERY_WAL_STREAM);
+										 WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
 						ResetLatch(&XLogCtl->recoveryWakeupLatch);
 						now = GetCurrentTimestamp();
 					}
@@ -12221,7 +12221,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					(void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_ALL);
+									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f9287b7942..d29c211a76 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3602,9 +3602,6 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_PGSTAT_MAIN:
 			event_name = "PgStatMain";
 			break;
-		case WAIT_EVENT_RECOVERY_WAL_ALL:
-			event_name = "RecoveryWalAll";
-			break;
 		case WAIT_EVENT_RECOVERY_WAL_STREAM:
 			event_name = "RecoveryWalStream";
 			break;
@@ -3824,6 +3821,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
 		case WAIT_EVENT_RECOVERY_APPLY_DELAY:
 			event_name = "RecoveryApplyDelay";
 			break;
+		case WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL:
+			event_name = "RecoveryRetrieveRetryInterval";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1a19921f80..851d0a7246 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -761,7 +761,6 @@ typedef enum
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
-	WAIT_EVENT_RECOVERY_WAL_ALL,
 	WAIT_EVENT_RECOVERY_WAL_STREAM,
 	WAIT_EVENT_SYSLOGGER_MAIN,
 	WAIT_EVENT_WAL_RECEIVER_MAIN,
@@ -848,7 +847,8 @@ typedef enum
 {
 	WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
 	WAIT_EVENT_PG_SLEEP,
-	WAIT_EVENT_RECOVERY_APPLY_DELAY
+	WAIT_EVENT_RECOVERY_APPLY_DELAY,
+	WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL
 } WaitEventTimeout;
 
 /* ----------
#8Atsushi Torikoshi
atorik@gmail.com
In reply to: Fujii Masao (#7)
Re: RecoveryWalAll and RecoveryWalStream wait events

On Wed, Mar 18, 2020 at 6:59 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:

I meant the following part in the doc.

---------------------
At startup, the standby begins by restoring all WAL available in the
archive
location, calling restore_command. Once it reaches the end of WAL available
there and restore_command fails, it tries to restore any WAL available in
the
pg_wal directory. If that fails, and streaming replication has been
configured,
the standby tries to connect to the primary server and start streaming WAL
from
the last valid record found in archive or pg_wal. If that fails or
streaming
replication is not configured, or if the connection is later disconnected,
the standby goes back to step 1 and tries to restore the file from the
archive
again. This loop of retries from the archive, pg_wal, and via streaming
replication goes on until the server is stopped or failover is triggered
by a
trigger file.
---------------------

Thanks!

It seems the comment on WaitForWALToBecomeAvailable()
does not go along with the high-availability.sgml, do we need
modification on the comment on the function?

No, I think for now. But you'd like to improve the docs?

I'll do it.

But, anyway, you think that "pg_wal" should be used instead

of "local" here?

I don't have special opinion here.
It might be better because high-availability.sgml does not use
"local" but "pg_wal" for the explanation, but I also feel it's
obvious in this context.

Ok, I changed that from "local" to "pg_wal" in the patch for
the master. Attached is the updated version of the patch.
If you're OK with this, I'd like to commit two patches that I posted
in this thread.

Thanks for your modification and it looks good to me.

Regards,

--
Torikoshi Atsushi

#9Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Atsushi Torikoshi (#8)
Re: RecoveryWalAll and RecoveryWalStream wait events

On 2020/03/18 22:37, Atsushi Torikoshi wrote:

On Wed, Mar 18, 2020 at 6:59 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:

I meant the following part in the doc.

---------------------
At startup, the standby begins by restoring all WAL available in the archive
location, calling restore_command. Once it reaches the end of WAL available
there and restore_command fails, it tries to restore any WAL available in the
pg_wal directory. If that fails, and streaming replication has been configured,
the standby tries to connect to the primary server and start streaming WAL from
the last valid record found in archive or pg_wal. If that fails or streaming
replication is not configured, or if the connection is later disconnected,
the standby goes back to step 1 and tries to restore the file from the archive
again. This loop of retries from the archive, pg_wal, and via streaming
replication goes on until the server is stopped or failover is triggered by a
trigger file.
---------------------

Thanks!

It seems the comment on WaitForWALToBecomeAvailable()
does not go along with the high-availability.sgml, do we need
modification on the comment on the function?

No, I think for now. But you'd like to improve the docs?

I'll do it.

     But, anyway, you think that "pg_wal" should be used instead

     of "local" here?

I don't have special opinion here.
It might be better because high-availability.sgml does not use
"local" but "pg_wal" for the explanation,  but I also feel it's
obvious in this context.

Ok, I changed that from "local" to "pg_wal" in the patch for
the master. Attached is the updated version of the patch.
If you're OK with this, I'd like to commit two patches that I posted
in this thread.

 Thanks for your modification and it looks good to me.

Pushed! Thanks a lot!

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters