Wait event that should be reported while waiting for WAL archiving to finish
Hi,
When I saw pg_stat_activity.wait_event while pg_basebackup -X none
is waiting for WAL archiving to finish, it was either NULL or
CheckpointDone. I think this is confusing. What about introducing
new wait_event like WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
(BackupWaitWalArchive) and reporting it during that period?
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On Thu, Feb 13, 2020 at 02:29:20AM +0900, Fujii Masao wrote:
When I saw pg_stat_activity.wait_event while pg_basebackup -X none
is waiting for WAL archiving to finish, it was either NULL or
CheckpointDone. I think this is confusing. What about introducing
new wait_event like WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
(BackupWaitWalArchive) and reporting it during that period?
Sounds like a good idea to me. You need to be careful that this does
not overwrite more low-level wait event registration though, so that
could be more tricky than it looks at first sight.
--
Michael
On 2020/02/13 12:28, Michael Paquier wrote:
On Thu, Feb 13, 2020 at 02:29:20AM +0900, Fujii Masao wrote:
When I saw pg_stat_activity.wait_event while pg_basebackup -X none
is waiting for WAL archiving to finish, it was either NULL or
CheckpointDone. I think this is confusing. What about introducing
new wait_event like WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
(BackupWaitWalArchive) and reporting it during that period?Sounds like a good idea to me. You need to be careful that this does
not overwrite more low-level wait event registration though, so that
could be more tricky than it looks at first sight.
Thanks for the advise! Patch attached.
I found that the wait events "LogicalRewriteTruncate" and
"GSSOpenServer" are not documented. I'm thinking to add
them into doc separately if ok.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Attachments:
wait_event_wal_archive_v1.patchtext/plain; charset=UTF-8; name=wait_event_wal_archive_v1.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9129f79bbf..1e2f81d994 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1479,6 +1479,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>SyncRep</literal></entry>
<entry>Waiting for confirmation from remote server during synchronous replication.</entry>
</row>
+ <row>
+ <entry><literal>BackupWaitWalArchive</literal></entry>
+ <entry>Waiting for WAL files required for the backup to be successfully archived.</entry>
+ </row>
<row>
<entry morerows="2"><literal>Timeout</literal></entry>
<entry><literal>BaseBackupThrottle</literal></entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3813eadfb4..d79417b443 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11095,6 +11095,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
seconds_before_warning = 60;
waits = 0;
+ pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
while (XLogArchiveIsBusy(lastxlogfilename) ||
XLogArchiveIsBusy(histfilename))
{
@@ -11120,6 +11121,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
"but the database backup will not be usable without all the WAL segments.")));
}
}
+ pgstat_report_wait_end();
ereport(NOTICE,
(errmsg("all required WAL segments have been archived")));
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7169509a79..6114ab44ee 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3848,6 +3848,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
case WAIT_EVENT_SYNC_REP:
event_name = "SyncRep";
break;
+ case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
+ event_name = "BackupWaitWalArchive";
+ break;
/* no default case, so that compiler will warn */
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index aecb6013f0..7c40ebc3dc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -853,7 +853,8 @@ typedef enum
WAIT_EVENT_REPLICATION_ORIGIN_DROP,
WAIT_EVENT_REPLICATION_SLOT_DROP,
WAIT_EVENT_SAFE_SNAPSHOT,
- WAIT_EVENT_SYNC_REP
+ WAIT_EVENT_SYNC_REP,
+ WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
} WaitEventIPC;
/* ----------
On Thu, Feb 13, 2020 at 03:35:50PM +0900, Fujii Masao wrote:
I found that the wait events "LogicalRewriteTruncate" and
"GSSOpenServer" are not documented. I'm thinking to add
them into doc separately if ok.
Nice catch. The ordering of the entries is not respected either for
GSSOpenServer in pgstat.h. The portion for the code and the docs can
be fixed in back-branches, but not the enum list in WaitEventClient or
we would have an ABI breakage. But this can be fixed on HEAD. Can
you take care of it? If you need help, please feel free to poke me. I
think that this should be fixed first, before adding the new event.
<entry><literal>SyncRep</literal></entry> <entry>Waiting for confirmation from remote server during synchronous replication.</entry> </row> + <row> + <entry><literal>BackupWaitWalArchive</literal></entry> + <entry>Waiting for WAL files required for the backup to be successfully archived.</entry> + </row>
The category IPC is adapted. You forgot to update the markup morerows
from "36" to "37", causing the table of the wait events to have a
weird format (the bottom should be incorrect).
+ pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE); while (XLogArchiveIsBusy(lastxlogfilename) || XLogArchiveIsBusy(histfilename)) { @@ -11120,6 +11121,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p) "but the database backup will not be usable without all the WAL segments."))); } } + pgstat_report_wait_end();
Okay, that position is right.
@@ -3848,6 +3848,9 @@ pgstat_get_wait_ipc(WaitEventIPC w) case WAIT_EVENT_SYNC_REP: event_name = "SyncRep"; break; + case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE: + event_name = "BackupWaitWalArchive"; + break; /* no default case, so that compiler will warn */ [...] @@ -853,7 +853,8 @@ typedef enum WAIT_EVENT_REPLICATION_ORIGIN_DROP, WAIT_EVENT_REPLICATION_SLOT_DROP, WAIT_EVENT_SAFE_SNAPSHOT, - WAIT_EVENT_SYNC_REP + WAIT_EVENT_SYNC_REP, + WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE } WaitEventIPC;
It would be good to keep entries in alphabetical order in the header,
the code and in the docs (see the effort from 5ef037c), and your patch
is missing that concept for all three places where it matters for this
new event.
--
Michael
On 2020/02/13 16:30, Michael Paquier wrote:
On Thu, Feb 13, 2020 at 03:35:50PM +0900, Fujii Masao wrote:
I found that the wait events "LogicalRewriteTruncate" and
"GSSOpenServer" are not documented. I'm thinking to add
them into doc separately if ok.Nice catch. The ordering of the entries is not respected either for
GSSOpenServer in pgstat.h. The portion for the code and the docs can
be fixed in back-branches, but not the enum list in WaitEventClient or
we would have an ABI breakage. But this can be fixed on HEAD. Can
you take care of it?
Yes. Patch attached.
logical_rewrite_truncate_v1.patch adds the description of
LogicalRewriteTruncate into the doc. This needs to be
back-patched to v10 where commit 249cf070e3 introduced
LogicalRewriteTruncate event.
gss_open_server_v1.patch adds the description of GSSOpenServer
into the doc and update the code in pgstat_get_wait_client().
This needs to be applied in v12 where commit b0b39f72b9 introduced
GSSOpenServer event.
gss_open_server_for_master_v1.patch does not only what the above
patch does but also update wait event enum into alphabetical order.
This needs to be applied in the master.
<entry><literal>SyncRep</literal></entry> <entry>Waiting for confirmation from remote server during synchronous replication.</entry> </row> + <row> + <entry><literal>BackupWaitWalArchive</literal></entry> + <entry>Waiting for WAL files required for the backup to be successfully archived.</entry> + </row>The category IPC is adapted. You forgot to update the markup morerows
from "36" to "37", causing the table of the wait events to have a
weird format (the bottom should be incorrect).
Fixed. Thanks for the review!
+ pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE); while (XLogArchiveIsBusy(lastxlogfilename) || XLogArchiveIsBusy(histfilename)) { @@ -11120,6 +11121,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p) "but the database backup will not be usable without all the WAL segments."))); } } + pgstat_report_wait_end();Okay, that position is right.
@@ -3848,6 +3848,9 @@ pgstat_get_wait_ipc(WaitEventIPC w) case WAIT_EVENT_SYNC_REP: event_name = "SyncRep"; break; + case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE: + event_name = "BackupWaitWalArchive"; + break; /* no default case, so that compiler will warn */ [...] @@ -853,7 +853,8 @@ typedef enum WAIT_EVENT_REPLICATION_ORIGIN_DROP, WAIT_EVENT_REPLICATION_SLOT_DROP, WAIT_EVENT_SAFE_SNAPSHOT, - WAIT_EVENT_SYNC_REP + WAIT_EVENT_SYNC_REP, + WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE } WaitEventIPC;It would be good to keep entries in alphabetical order in the header,
the code and in the docs (see the effort from 5ef037c), and your patch
is missing that concept for all three places where it matters for this
new event.
Fixed. Patch attached.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Attachments:
gss_open_server_for_master_v1.patchtext/plain; charset=UTF-8; name=gss_open_server_for_master_v1.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9129f79bbf..027df985ac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1293,7 +1293,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting in main loop of WAL writer process.</entry>
</row>
<row>
- <entry morerows="7"><literal>Client</literal></entry>
+ <entry morerows="8"><literal>Client</literal></entry>
<entry><literal>ClientRead</literal></entry>
<entry>Waiting to read data from the client.</entry>
</row>
@@ -1301,6 +1301,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>ClientWrite</literal></entry>
<entry>Waiting to write data to the client.</entry>
</row>
+ <row>
+ <entry><literal>GSSOpenServer</literal></entry>
+ <entry>Waiting to read data from the client while establishing the GSSAPI session.</entry>
+ </row>
<row>
<entry><literal>LibPQWalReceiverConnect</literal></entry>
<entry>Waiting in WAL receiver to establish connection to remote server.</entry>
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7169509a79..59dc4f31ab 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3697,6 +3697,9 @@ pgstat_get_wait_client(WaitEventClient w)
case WAIT_EVENT_CLIENT_WRITE:
event_name = "ClientWrite";
break;
+ case WAIT_EVENT_GSS_OPEN_SERVER:
+ event_name = "GSSOpenServer";
+ break;
case WAIT_EVENT_LIBPQWALRECEIVER_CONNECT:
event_name = "LibPQWalReceiverConnect";
break;
@@ -3715,9 +3718,6 @@ pgstat_get_wait_client(WaitEventClient w)
case WAIT_EVENT_WAL_SENDER_WRITE_DATA:
event_name = "WalSenderWriteData";
break;
- case WAIT_EVENT_GSS_OPEN_SERVER:
- event_name = "GSSOpenServer";
- break;
/* no default case, so that compiler will warn */
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index aecb6013f0..3a65a51696 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -799,13 +799,13 @@ typedef enum
{
WAIT_EVENT_CLIENT_READ = PG_WAIT_CLIENT,
WAIT_EVENT_CLIENT_WRITE,
+ WAIT_EVENT_GSS_OPEN_SERVER,
WAIT_EVENT_LIBPQWALRECEIVER_CONNECT,
WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE,
WAIT_EVENT_SSL_OPEN_SERVER,
WAIT_EVENT_WAL_RECEIVER_WAIT_START,
WAIT_EVENT_WAL_SENDER_WAIT_WAL,
WAIT_EVENT_WAL_SENDER_WRITE_DATA,
- WAIT_EVENT_GSS_OPEN_SERVER,
} WaitEventClient;
/* ----------
gss_open_server_v1.patchtext/plain; charset=UTF-8; name=gss_open_server_v1.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9129f79bbf..027df985ac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1293,7 +1293,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting in main loop of WAL writer process.</entry>
</row>
<row>
- <entry morerows="7"><literal>Client</literal></entry>
+ <entry morerows="8"><literal>Client</literal></entry>
<entry><literal>ClientRead</literal></entry>
<entry>Waiting to read data from the client.</entry>
</row>
@@ -1301,6 +1301,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>ClientWrite</literal></entry>
<entry>Waiting to write data to the client.</entry>
</row>
+ <row>
+ <entry><literal>GSSOpenServer</literal></entry>
+ <entry>Waiting to read data from the client while establishing the GSSAPI session.</entry>
+ </row>
<row>
<entry><literal>LibPQWalReceiverConnect</literal></entry>
<entry>Waiting in WAL receiver to establish connection to remote server.</entry>
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7169509a79..59dc4f31ab 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3697,6 +3697,9 @@ pgstat_get_wait_client(WaitEventClient w)
case WAIT_EVENT_CLIENT_WRITE:
event_name = "ClientWrite";
break;
+ case WAIT_EVENT_GSS_OPEN_SERVER:
+ event_name = "GSSOpenServer";
+ break;
case WAIT_EVENT_LIBPQWALRECEIVER_CONNECT:
event_name = "LibPQWalReceiverConnect";
break;
@@ -3715,9 +3718,6 @@ pgstat_get_wait_client(WaitEventClient w)
case WAIT_EVENT_WAL_SENDER_WRITE_DATA:
event_name = "WalSenderWriteData";
break;
- case WAIT_EVENT_GSS_OPEN_SERVER:
- event_name = "GSSOpenServer";
- break;
/* no default case, so that compiler will warn */
}
wait_event_wal_archive_v2.patchtext/plain; charset=UTF-8; name=wait_event_wal_archive_v2.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9129f79bbf..411076aaf9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1331,7 +1331,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting in an extension.</entry>
</row>
<row>
- <entry morerows="36"><literal>IPC</literal></entry>
+ <entry morerows="37"><literal>IPC</literal></entry>
+ <entry><literal>BackupWaitWalArchive</literal></entry>
+ <entry>Waiting for WAL files required for the backup to be successfully archived.</entry>
+ </row>
+ <row>
<entry><literal>BgWorkerShutdown</literal></entry>
<entry>Waiting for background worker to shut down.</entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3813eadfb4..d79417b443 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11095,6 +11095,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
seconds_before_warning = 60;
waits = 0;
+ pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
while (XLogArchiveIsBusy(lastxlogfilename) ||
XLogArchiveIsBusy(histfilename))
{
@@ -11120,6 +11121,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
"but the database backup will not be usable without all the WAL segments.")));
}
}
+ pgstat_report_wait_end();
ereport(NOTICE,
(errmsg("all required WAL segments have been archived")));
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7169509a79..0b2c7ee916 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3737,6 +3737,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
switch (w)
{
+ case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
+ event_name = "BackupWaitWalArchive";
+ break;
case WAIT_EVENT_BGWORKER_SHUTDOWN:
event_name = "BgWorkerShutdown";
break;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index aecb6013f0..348dcdc21f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -817,7 +817,8 @@ typedef enum
*/
typedef enum
{
- WAIT_EVENT_BGWORKER_SHUTDOWN = PG_WAIT_IPC,
+ WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE = PG_WAIT_IPC,
+ WAIT_EVENT_BGWORKER_SHUTDOWN,
WAIT_EVENT_BGWORKER_STARTUP,
WAIT_EVENT_BTREE_PAGE,
WAIT_EVENT_CLOG_GROUP_UPDATE,
logical_rewrite_truncate_v1.patchtext/plain; charset=UTF-8; name=logical_rewrite_truncate_v1.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9129f79bbf..0e5a4a0b86 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1493,7 +1493,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting to apply WAL at recovery because it is delayed.</entry>
</row>
<row>
- <entry morerows="67"><literal>IO</literal></entry>
+ <entry morerows="68"><literal>IO</literal></entry>
<entry><literal>BufFileRead</literal></entry>
<entry>Waiting for a read from a buffered file.</entry>
</row>
@@ -1609,6 +1609,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>LogicalRewriteSync</literal></entry>
<entry>Waiting for logical rewrite mappings to reach stable storage.</entry>
</row>
+ <row>
+ <entry><literal>LogicalRewriteTruncate</literal></entry>
+ <entry>Waiting for truncate of mapping data during a logical rewrite.</entry>
+ </row>
<row>
<entry><literal>LogicalRewriteWrite</literal></entry>
<entry>Waiting for a write of logical rewrite mappings.</entry>
On Fri, Feb 14, 2020 at 12:47:19PM +0900, Fujii Masao wrote:
logical_rewrite_truncate_v1.patch adds the description of
LogicalRewriteTruncate into the doc. This needs to be
back-patched to v10 where commit 249cf070e3 introduced
LogicalRewriteTruncate event.
Indeed. You just be careful about the number of fields for morerows,
as that's not the same across branches.
gss_open_server_v1.patch adds the description of GSSOpenServer
into the doc and update the code in pgstat_get_wait_client().
This needs to be applied in v12 where commit b0b39f72b9 introduced
GSSOpenServer event.gss_open_server_for_master_v1.patch does not only what the above
patch does but also update wait event enum into alphabetical order.
This needs to be applied in the master.
Thanks for splitting things. All that looks correct to me.
--
Michael
On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
Fixed. Thanks for the review!
I think it would be safer to just report the wait event during
pg_usleep(1000000L) rather than putting those calls around the whole
loop. It does not seem impossible that ereport() or
CHECK_FOR_INTERRUPTS() could do something that reports a wait event
internally.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2020/02/14 15:45, Michael Paquier wrote:
On Fri, Feb 14, 2020 at 12:47:19PM +0900, Fujii Masao wrote:
logical_rewrite_truncate_v1.patch adds the description of
LogicalRewriteTruncate into the doc. This needs to be
back-patched to v10 where commit 249cf070e3 introduced
LogicalRewriteTruncate event.Indeed. You just be careful about the number of fields for morerows,
as that's not the same across branches.gss_open_server_v1.patch adds the description of GSSOpenServer
into the doc and update the code in pgstat_get_wait_client().
This needs to be applied in v12 where commit b0b39f72b9 introduced
GSSOpenServer event.gss_open_server_for_master_v1.patch does not only what the above
patch does but also update wait event enum into alphabetical order.
This needs to be applied in the master.Thanks for splitting things. All that looks correct to me.
Thanks for the review! Pushed the patches for
LogicalRewriteTruncate and GSSOpenServer.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On 2020/02/14 23:43, Robert Haas wrote:
On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:Fixed. Thanks for the review!
I think it would be safer to just report the wait event during
pg_usleep(1000000L) rather than putting those calls around the whole
loop. It does not seem impossible that ereport() or
CHECK_FOR_INTERRUPTS() could do something that reports a wait event
internally.
OK, so I attached the updated version of the patch.
Thanks for the review!
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Attachments:
wait_event_wal_archive_v3.patchtext/plain; charset=UTF-8; name=wait_event_wal_archive_v3.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a9f6ee6e32..2f88b41529 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1335,7 +1335,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting in an extension.</entry>
</row>
<row>
- <entry morerows="36"><literal>IPC</literal></entry>
+ <entry morerows="37"><literal>IPC</literal></entry>
+ <entry><literal>BackupWaitWalArchive</literal></entry>
+ <entry>Waiting for WAL files required for the backup to be successfully archived.</entry>
+ </row>
+ <row>
<entry><literal>BgWorkerShutdown</literal></entry>
<entry>Waiting for background worker to shut down.</entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3813eadfb4..dce0f3b041 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11107,7 +11107,9 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
reported_waiting = true;
}
+ pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
pg_usleep(1000000L);
+ pgstat_report_wait_end();
if (++waits >= seconds_before_warning)
{
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 59dc4f31ab..6e7a57b1b7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3737,6 +3737,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
switch (w)
{
+ case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
+ event_name = "BackupWaitWalArchive";
+ break;
case WAIT_EVENT_BGWORKER_SHUTDOWN:
event_name = "BgWorkerShutdown";
break;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3a65a51696..20313f1e32 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -817,7 +817,8 @@ typedef enum
*/
typedef enum
{
- WAIT_EVENT_BGWORKER_SHUTDOWN = PG_WAIT_IPC,
+ WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE = PG_WAIT_IPC,
+ WAIT_EVENT_BGWORKER_SHUTDOWN,
WAIT_EVENT_BGWORKER_STARTUP,
WAIT_EVENT_BTREE_PAGE,
WAIT_EVENT_CLOG_GROUP_UPDATE,
On Mon, Feb 17, 2020 at 04:30:00PM +0900, Fujii Masao wrote:
On 2020/02/14 23:43, Robert Haas wrote:
On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:Fixed. Thanks for the review!
I think it would be safer to just report the wait event during
pg_usleep(1000000L) rather than putting those calls around the whole
loop. It does not seem impossible that ereport() or
CHECK_FOR_INTERRUPTS() could do something that reports a wait event
internally.
CHECK_FOR_INTERRUPTS() would reset the event wait state. Hm.. You
may be right about the WARNING and it would be better to not rely on
that. Do you remember the states which may be triggered?
OK, so I attached the updated version of the patch.
Thanks for the review!
Actually, I have some questions:
1) Should a new wait event be added in recoveryPausesHere()? That
would be IMO useful.
2) Perhaps those two points should be replaced with WaitLatch(), where
we would use the new wait events introduced?
--
Michael
On 2020/02/17 18:48, Michael Paquier wrote:
On Mon, Feb 17, 2020 at 04:30:00PM +0900, Fujii Masao wrote:
On 2020/02/14 23:43, Robert Haas wrote:
On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:Fixed. Thanks for the review!
I think it would be safer to just report the wait event during
pg_usleep(1000000L) rather than putting those calls around the whole
loop. It does not seem impossible that ereport() or
CHECK_FOR_INTERRUPTS() could do something that reports a wait event
internally.CHECK_FOR_INTERRUPTS() would reset the event wait state. Hm.. You
may be right about the WARNING and it would be better to not rely on
that. Do you remember the states which may be triggered?OK, so I attached the updated version of the patch.
Thanks for the review!Actually, I have some questions:
1) Should a new wait event be added in recoveryPausesHere()? That
would be IMO useful.
Yes, it's useful, I think. But it's better to implement that
as a separate patch.
2) Perhaps those two points should be replaced with WaitLatch(), where
we would use the new wait events introduced?
For what? Maybe it should, but I'm not sure it's worth the trouble.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On Mon, Feb 17, 2020 at 10:21:23PM +0900, Fujii Masao wrote:
On 2020/02/17 18:48, Michael Paquier wrote:
Actually, I have some questions:
1) Should a new wait event be added in recoveryPausesHere()? That
would be IMO useful.Yes, it's useful, I think. But it's better to implement that
as a separate patch.
No problem for me.
2) Perhaps those two points should be replaced with WaitLatch(), where
we would use the new wait events introduced?For what? Maybe it should, but I'm not sure it's worth the trouble.
I don't have more to offer than signal handling consistency for both
without relying on pg_usleep()'s behavior depending on the platform,
and power consumption. For the recovery pause, the second argument
may not be worth carrying, but we never had this argument for the
archiving wait, did we? For both, on top of it you don't need to
worry about concurrent issues with the wait events attached around.
--
Michael
On 2020/02/18 12:39, Michael Paquier wrote:
On Mon, Feb 17, 2020 at 10:21:23PM +0900, Fujii Masao wrote:
On 2020/02/17 18:48, Michael Paquier wrote:
Actually, I have some questions:
1) Should a new wait event be added in recoveryPausesHere()? That
would be IMO useful.Yes, it's useful, I think. But it's better to implement that
as a separate patch.No problem for me.
On second thought, it's OK to add that event into the patch.
Attached is the updated version of the patch. This patch adds
two wait events for WAL archiving and recovery pause.
2) Perhaps those two points should be replaced with WaitLatch(), where
we would use the new wait events introduced?For what? Maybe it should, but I'm not sure it's worth the trouble.
I don't have more to offer than signal handling consistency for both
without relying on pg_usleep()'s behavior depending on the platform,
and power consumption. For the recovery pause, the second argument
may not be worth carrying, but we never had this argument for the
archiving wait, did we?
I have no idea about this. But I wonder how much that change
is helpful to reduce the power consumption because waiting
for WAL archive during the backup basically not so frequently
happens.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Attachments:
wait_event_wal_archive_v4.patchtext/plain; charset=UTF-8; name=wait_event_wal_archive_v4.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 87586a7b06..6c0c93ffdc 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1335,7 +1335,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting in an extension.</entry>
</row>
<row>
- <entry morerows="36"><literal>IPC</literal></entry>
+ <entry morerows="38"><literal>IPC</literal></entry>
+ <entry><literal>BackupWaitWalArchive</literal></entry>
+ <entry>Waiting for WAL files required for the backup to be successfully archived.</entry>
+ </row>
+ <row>
<entry><literal>BgWorkerShutdown</literal></entry>
<entry>Waiting for background worker to shut down.</entry>
</row>
@@ -1467,6 +1471,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>Promote</literal></entry>
<entry>Waiting for standby promotion.</entry>
</row>
+ <row>
+ <entry><literal>RecoveryPause</literal></entry>
+ <entry>Waiting for recovery to be resumed.</entry>
+ </row>
<row>
<entry><literal>ReplicationOriginDrop</literal></entry>
<entry>Waiting for a replication origin to become inactive to be dropped.</entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..5569606183 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5945,7 +5945,9 @@ recoveryPausesHere(void)
while (RecoveryIsPaused())
{
+ pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
pg_usleep(1000000L); /* 1000 ms */
+ pgstat_report_wait_end();
HandleStartupProcInterrupts();
}
}
@@ -11129,7 +11131,9 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
reported_waiting = true;
}
+ pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
pg_usleep(1000000L);
+ pgstat_report_wait_end();
if (++waits >= seconds_before_warning)
{
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 462b4d7e06..2400bf31ee 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3740,6 +3740,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
switch (w)
{
+ case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
+ event_name = "BackupWaitWalArchive";
+ break;
case WAIT_EVENT_BGWORKER_SHUTDOWN:
event_name = "BgWorkerShutdown";
break;
@@ -3839,6 +3842,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
case WAIT_EVENT_PROMOTE:
event_name = "Promote";
break;
+ case WAIT_EVENT_RECOVERY_PAUSE:
+ event_name = "RecoveryPause";
+ break;
case WAIT_EVENT_REPLICATION_ORIGIN_DROP:
event_name = "ReplicationOriginDrop";
break;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3a65a51696..6d8332163c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -817,7 +817,8 @@ typedef enum
*/
typedef enum
{
- WAIT_EVENT_BGWORKER_SHUTDOWN = PG_WAIT_IPC,
+ WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE = PG_WAIT_IPC,
+ WAIT_EVENT_BGWORKER_SHUTDOWN,
WAIT_EVENT_BGWORKER_STARTUP,
WAIT_EVENT_BTREE_PAGE,
WAIT_EVENT_CLOG_GROUP_UPDATE,
@@ -850,6 +851,7 @@ typedef enum
WAIT_EVENT_PARALLEL_FINISH,
WAIT_EVENT_PROCARRAY_GROUP_UPDATE,
WAIT_EVENT_PROMOTE,
+ WAIT_EVENT_RECOVERY_PAUSE,
WAIT_EVENT_REPLICATION_ORIGIN_DROP,
WAIT_EVENT_REPLICATION_SLOT_DROP,
WAIT_EVENT_SAFE_SNAPSHOT,
On Wed, Feb 26, 2020 at 9:19 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
I have no idea about this. But I wonder how much that change
is helpful to reduce the power consumption because waiting
for WAL archive during the backup basically not so frequently
happens.
+1.
And as far as I reviewed the patch, I didn't find any problems.
Regards,
--
Atsushi Torikoshi
On 2020/03/19 19:39, Atsushi Torikoshi wrote:
On Wed, Feb 26, 2020 at 9:19 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
I have no idea about this. But I wonder how much that change
is helpful to reduce the power consumption because waiting
for WAL archive during the backup basically not so frequently
happens.+1.
And as far as I reviewed the patch, I didn't find any problems.
Thanks for the review!
Barring any objection, I will commit this patch.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On 2020/03/23 15:56, Fujii Masao wrote:
On 2020/03/19 19:39, Atsushi Torikoshi wrote:
On Wed, Feb 26, 2020 at 9:19 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
I have no idea about this. But I wonder how much that change
is helpful to reduce the power consumption because waiting
for WAL archive during the backup basically not so frequently
happens.+1.
And as far as I reviewed the patch, I didn't find any problems.Thanks for the review!
Barring any objection, I will commit this patch.
Pushed! Thanks!
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters