add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

bossartn@amazon.com

about 4 years ago

In reply to: Bharath Rupireddy (#1)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:

As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level? They are currently being set into ps display
which is good if the postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
the postgres is being run on a cloud environment where users don't
have access to ps display output. Moreover, the ps display is
transient and will not help to analyze after an issue occurs.

Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.

I think this would make the logs far too noisy for many servers. For
archiving alone, this could cause tens of thousands more log messages
per hour on a busy system. I think you can already see such
information at a debug level, anyway.

Nathan

Euler Taveira

euler@eulerto.com

about 4 years ago

In reply to: Bharath Rupireddy (#1)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Wed, Nov 10, 2021, at 2:38 PM, Bharath Rupireddy wrote:

As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level? They are currently being set into ps display
which is good if the postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
the postgres is being run on a cloud environment where users don't
have access to ps display output. Moreover, the ps display is
transient and will not help to analyze after an issue occurs.

Besides recovery, the other activities already provide information through
views. I might be missing something but it seems the current views already
provide the information that ps displays.

Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.

It could be useful for a root cause analysis, however, you are also increasing
the number of LOG messages in a system that doesn't/won't require such
information. It is fine to add additional DEBUG messages if there isn't a
similar one yet. If at least the message level were module-controlled, you
could modify a setting to gather more messages from a specific module.
Unfortunately, that's not possible so we should avoid superfluous messages.

--
Euler Taveira
EDB https://www.enterprisedb.com/

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Bossart, Nathan (#2)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Nov 11, 2021 at 12:01 AM Bossart, Nathan <bossartn@amazon.com> wrote:

On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:

As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level? They are currently being set into ps display
which is good if the postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
the postgres is being run on a cloud environment where users don't
have access to ps display output. Moreover, the ps display is
transient and will not help to analyze after an issue occurs.

Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.

I think this would make the logs far too noisy for many servers. For
archiving alone, this could cause tens of thousands more log messages
per hour on a busy system. I think you can already see such
information at a debug level, anyway.

You are right, having them at LOG level may blow up the log files and
disk storage, having them at DEBUG1 like other messages would at least
help than having them as activitymsg in ps display.

And not all of the activitymsgs are logged at DEBUGX level, at least I
don't see them(on a quick search).
snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
snprintf(activitymsg, sizeof(activitymsg), "recovering %s",
snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
snprintf(activitymsg, sizeof(activitymsg), "sending backup \"%s\"",
snprintf(activitymsg, sizeof(activitymsg), "restarting at %X/%X",
snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",

But for the snprintf(activitymsg, sizeof(activitymsg), "archiving %s",
xlog); we have elog(DEBUG1, "archived write-ahead log file \"%s\"",
xlog); after the archiving command. It is also good to have a similar
debug message before archive command execution, we can get to know
whether archive command is executed or not, if yes how much time did
it take etc.?

I'm not sure whether it is okay to do the following, for every of the
activitymsg, before the command execution, we have an elog(DEBUG1
message and after the command execution another elog(DEBUG1 message.

Thoughts?

Regards,
Bharath Rupireddy.

[1]: /messages/by-id/CALj2ACXVwnHhek9bo9wgamtW9SBinj4ohF1f1Yf3Su2o5WarWg@mail.gmail.com

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Euler Taveira (#3)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Nov 11, 2021 at 1:40 AM Euler Taveira <euler@eulerto.com> wrote:

On Wed, Nov 10, 2021, at 2:38 PM, Bharath Rupireddy wrote:

As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level? They are currently being set into ps display
which is good if the postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
the postgres is being run on a cloud environment where users don't
have access to ps display output. Moreover, the ps display is
transient and will not help to analyze after an issue occurs.

Besides recovery, the other activities already provide information through
views. I might be missing something but it seems the current views already
provide the information that ps displays.

Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.

It could be useful for a root cause analysis, however, you are also increasing
the number of LOG messages in a system that doesn't/won't require such
information. It is fine to add additional DEBUG messages if there isn't a
similar one yet. If at least the message level were module-controlled, you
could modify a setting to gather more messages from a specific module.
Unfortunately, that's not possible so we should avoid superfluous messages.

Yes, at least having the messages at DEBUG1 level would help do the
RCA after an issue occurs. Please see my response upthread at [1]/messages/by-id/CALj2ACXVwnHhek9bo9wgamtW9SBinj4ohF1f1Yf3Su2o5WarWg@mail.gmail.com.

Regards,
Bharath Rupireddy.

alvherre@alvh.no-ip.org

about 4 years ago

In reply to: Bossart, Nathan (#2)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On 2021-Nov-10, Bossart, Nathan wrote:

On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:

As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level?

I think this would make the logs far too noisy for many servers. For
archiving alone, this could cause tens of thousands more log messages
per hour on a busy system. I think you can already see such
information at a debug level, anyway.

Yeah. If we had some sort of ring buffer in which to store these
messages, the user could examine them through a view; they would still
be accessible in a running server, but they would not be written to the
server log.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

bossartn@amazon.com

about 4 years ago

In reply to: Alvaro Herrera (#6)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On 11/10/21, 9:51 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:

But for the snprintf(activitymsg, sizeof(activitymsg), "archiving %s",
xlog); we have elog(DEBUG1, "archived write-ahead log file \"%s\"",
xlog); after the archiving command. It is also good to have a similar
debug message before archive command execution, we can get to know
whether archive command is executed or not, if yes how much time did
it take etc.?

There is a DEBUG3 that emits "executing archive command..." prior to
executing the archive command.

I'm not sure whether it is okay to do the following, for every of the
activitymsg, before the command execution, we have an elog(DEBUG1
message and after the command execution another elog(DEBUG1 message.

The general idea seems reasonable to me.

Nathan

bossartn@amazon.com

about 4 years ago

In reply to: Alvaro Herrera (#6)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On 11/11/21, 4:38 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:

Yeah. If we had some sort of ring buffer in which to store these
messages, the user could examine them through a view; they would still
be accessible in a running server, but they would not be written to the
server log.

That's an interesting idea. It'd be nice to have a way to see more
debugging information without exploding the server log.

Nathan

Euler Taveira

euler@eulerto.com

about 4 years ago

In reply to: Bharath Rupireddy (#4)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Nov 11, 2021, at 2:50 AM, Bharath Rupireddy wrote:

But for the snprintf(activitymsg, sizeof(activitymsg), "archiving %s",
xlog); we have elog(DEBUG1, "archived write-ahead log file \"%s\"",
xlog); after the archiving command. It is also good to have a similar
debug message before archive command execution, we can get to know
whether archive command is executed or not, if yes how much time did
it take etc.?

If you are only interested in how much time it took to archive a WAL file, you
don't need 2 messages (before and after); one message is sufficient. If you
calculate the difference and then print it as part of the existent message.
Calculate the difference iif the log level is at least DEBUG1 (same as the log
message).

Messages related to system() e.g. archive and recovery, you could measure the
execution time in your own script/program.

I like Alvaro's idea of implementing a ring buffer for this kind of activity.
This implementation could be applied to checkpoints, archiving, recovery,
replication, base backup, WAL usage, connections and others.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#10

Bruce Momjian

bruce@momjian.us

about 4 years ago

In reply to: Bossart, Nathan (#8)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Nov 11, 2021 at 03:18:16PM +0000, Bossart, Nathan wrote:

On 11/11/21, 4:38 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:

Yeah. If we had some sort of ring buffer in which to store these
messages, the user could examine them through a view; they would still
be accessible in a running server, but they would not be written to the
server log.

That's an interesting idea. It'd be nice to have a way to see more
debugging information without exploding the server log.

Uh, that's what dmesg does, which I mentioned earlier.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#11

Ashutosh Sharma

ashu.coek88@gmail.com

about 4 years ago

In reply to: Alvaro Herrera (#6)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Nov 11, 2021 at 6:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

On 2021-Nov-10, Bossart, Nathan wrote:

On 11/10/21, 9:43 AM, "Bharath Rupireddy" <

bharath.rupireddyforpostgres@gmail.com> wrote:

As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level?

I think this would make the logs far too noisy for many servers. For
archiving alone, this could cause tens of thousands more log messages
per hour on a busy system. I think you can already see such
information at a debug level, anyway.

Yeah. If we had some sort of ring buffer in which to store these
messages, the user could examine them through a view; they would still
be accessible in a running server, but they would not be written to the
server log.

That's a good idea. How about also adding some GUC(s) to the log archive,
recovery related log messages just like we have for checkpoints, autovacuum
etc? Maybe something like log_archive, log_recovery etc.

--
With Regards,
Ashutosh Sharma.

#12

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Bossart, Nathan (#7)

1 attachment(s)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Nov 11, 2021 at 8:41 PM Bossart, Nathan <bossartn@amazon.com> wrote:

I'm not sure whether it is okay to do the following, for every of the
activitymsg, before the command execution, we have an elog(DEBUG1
message and after the command execution another elog(DEBUG1 message.

The general idea seems reasonable to me.

Here's a simple patch that I could come up with. Please have a look.

Regards,
Bharath Rupireddy.

Attachments:

v1-0001-add-log-messages-that-are-equivalent-of-ps-displa.patchapplication/octet-stream; name=v1-0001-add-log-messages-that-are-equivalent-of-ps-displa.patchDownload

From 15deaab0b364102bd33b6fe926959599dd07b12b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sat, 13 Nov 2021 13:58:53 +0000
Subject: [PATCH v1] add log messages that are equivalent of ps display
 activitymsgs

---
 src/backend/access/transam/xlog.c     |  6 ++++++
 src/backend/postmaster/pgarch.c       |  2 ++
 src/backend/replication/basebackup.c  | 12 ++++++------
 src/backend/replication/walreceiver.c | 26 ++++++++++++++------------
 src/backend/replication/walsender.c   | 12 ++++++------
 5 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e073121a7e..00eb71db79 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3791,6 +3791,8 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
+			ereport(DEBUG1, (errmsg_internal("%s", activitymsg)));
+
 			if (!RestoreArchivedFile(path, xlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
@@ -3833,6 +3835,8 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
+		ereport(DEBUG1, (errmsg_internal("%s", activitymsg)));
+
 		/* Track source of data in assorted state variables */
 		readSource = source;
 		XLogReceiptSource = source;
@@ -9067,6 +9071,8 @@ update_checkpoint_display(int flags, bool restartpoint, bool reset)
 				 (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
 				 restartpoint ? "restartpoint" : "checkpoint");
 		set_ps_display(activitymsg);
+
+		ereport(DEBUG1, (errmsg_internal("%s", activitymsg)));
 	}
 }
 
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3b33e01d95..9020b786b1 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -555,6 +555,8 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
+	ereport(DEBUG1, (errmsg_internal("%s", activitymsg)));
+
 	rc = system(xlogarchcmd);
 	if (rc != 0)
 	{
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ec0485705d..4a15ea33b3 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -867,19 +867,19 @@ SendBaseBackup(BaseBackupCmd *cmd)
 {
 	basebackup_options opt;
 	bbsink	   *sink;
+	char		activitymsg[50];
 
 	parse_basebackup_options(cmd->options, &opt);
 
 	WalSndSetState(WALSNDSTATE_BACKUP);
 
-	if (update_process_title)
-	{
-		char		activitymsg[50];
+	snprintf(activitymsg, sizeof(activitymsg), "sending backup \"%s\"",
+			 opt.label);
 
-		snprintf(activitymsg, sizeof(activitymsg), "sending backup \"%s\"",
-				 opt.label);
+	if (update_process_title)
 		set_ps_display(activitymsg);
-	}
+
+	ereport(DEBUG1, (errmsg_internal("%s", activitymsg)));
 
 	/* Create a basic basebackup sink. */
 	sink = bbsink_copytblspc_new();
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 7a7eb3784e..aa995cda56 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -643,6 +643,7 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
 {
 	WalRcvData *walrcv = WalRcv;
 	int			state;
+	char		activitymsg[50];
 
 	SpinLockAcquire(&walrcv->mutex);
 	state = walrcv->walRcvState;
@@ -704,14 +705,13 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
 						 WAIT_EVENT_WAL_RECEIVER_WAIT_START);
 	}
 
-	if (update_process_title)
-	{
-		char		activitymsg[50];
+	snprintf(activitymsg, sizeof(activitymsg), "restarting at %X/%X",
+			 LSN_FORMAT_ARGS(*startpoint));
 
-		snprintf(activitymsg, sizeof(activitymsg), "restarting at %X/%X",
-				 LSN_FORMAT_ARGS(*startpoint));
+	if (update_process_title)
 		set_ps_display(activitymsg);
-	}
+
+	ereport(DEBUG1, (errmsg_internal("%s", activitymsg)));
 }
 
 /*
@@ -965,6 +965,8 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, TimeLineID tli)
 static void
 XLogWalRcvFlush(bool dying, TimeLineID tli)
 {
+	char		activitymsg[50];
+
 	Assert(tli != 0);
 
 	if (LogstreamResult.Flush < LogstreamResult.Write)
@@ -990,15 +992,15 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
 		if (AllowCascadeReplication())
 			WalSndWakeup();
 
+
+		snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
+				 LSN_FORMAT_ARGS(LogstreamResult.Write));
+
 		/* Report XLOG streaming progress in PS display */
 		if (update_process_title)
-		{
-			char		activitymsg[50];
-
-			snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
-					 LSN_FORMAT_ARGS(LogstreamResult.Write));
 			set_ps_display(activitymsg);
-		}
+
+		ereport(DEBUG1, (errmsg_internal("%s", activitymsg)));
 
 		/* Also let the primary know that we made some progress */
 		if (!dying)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 7950afb173..273ce0cc8a 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2662,6 +2662,7 @@ XLogSendPhysical(void)
 	Size		nbytes;
 	XLogSegNo	segno;
 	WALReadError errinfo;
+	char		activitymsg[50];
 
 	/* If requested switch the WAL sender to the stopping state. */
 	if (got_STOPPING)
@@ -2939,15 +2940,14 @@ retry:
 		SpinLockRelease(&walsnd->mutex);
 	}
 
+	snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
+			 LSN_FORMAT_ARGS(sentPtr));
+
 	/* Report progress of XLOG streaming in PS display */
 	if (update_process_title)
-	{
-		char		activitymsg[50];
-
-		snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
-				 LSN_FORMAT_ARGS(sentPtr));
 		set_ps_display(activitymsg);
-	}
+
+	ereport(DEBUG1, (errmsg_internal("%s", activitymsg)));
 }
 
 /*
-- 
2.25.1

#13

alvherre@alvh.no-ip.org

about 4 years ago

In reply to: Bharath Rupireddy (#12)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

A bunch of these now execute snprintf()s unnecessarily. I think these
should be made conditional on message_level_is_interesting(DEBUG1) to
avoid that overhead.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick." (Andrew Sullivan)

#14

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Alvaro Herrera (#13)

1 attachment(s)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Sat, Nov 13, 2021 at 9:04 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

A bunch of these now execute snprintf()s unnecessarily. I think these
should be made conditional on message_level_is_interesting(DEBUG1) to
avoid that overhead.

Thanks. Attaching the v2 to avoid that by directly using the message in
ereport instead of activitymsg.

Regards,
Bharath Rupireddy.

Attachments:

v2-0001-add-log-messages-equivalents-of-ps-display-activi.patchapplication/x-patch; name=v2-0001-add-log-messages-equivalents-of-ps-display-activi.patchDownload

From 6eea044a734fc53353deaa973df6f70096160435 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sun, 14 Nov 2021 02:29:04 +0000
Subject: [PATCH v2] add log messages equivalents of ps display activitymsgs

---
 src/backend/access/transam/xlog.c     | 6 ++++++
 src/backend/postmaster/pgarch.c       | 8 ++++----
 src/backend/replication/basebackup.c  | 3 +++
 src/backend/replication/walreceiver.c | 8 ++++++++
 src/backend/replication/walsender.c   | 4 ++++
 5 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e073121a7e..3a62b4c745 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3791,6 +3791,9 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
+			ereport(DEBUG1,
+					(errmsg_internal("waiting for %s", xlogfname)));
+
 			if (!RestoreArchivedFile(path, xlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
@@ -3833,6 +3836,9 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
+		ereport(DEBUG1,
+				(errmsg_internal("recovering %s", xlogfname)));
+
 		/* Track source of data in assorted state variables */
 		readSource = source;
 		XLogReceiptSource = source;
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3b33e01d95..abfe451dbe 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -547,14 +547,14 @@ pgarch_archiveXlog(char *xlog)
 	}
 	*dp = '\0';
 
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
+	ereport(DEBUG1,
+			(errmsg_internal("executing archive command \"%s\" for archving %s",
+							 xlogarchcmd, xlog)));
+
 	rc = system(xlogarchcmd);
 	if (rc != 0)
 	{
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ec0485705d..bc173bfe8e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -881,6 +881,9 @@ SendBaseBackup(BaseBackupCmd *cmd)
 		set_ps_display(activitymsg);
 	}
 
+	ereport(DEBUG1,
+			(errmsg_internal("sending backup \"%s\"", opt.label)));
+
 	/* Create a basic basebackup sink. */
 	sink = bbsink_copytblspc_new();
 
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 7a7eb3784e..f0062aa615 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -712,6 +712,10 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
 				 LSN_FORMAT_ARGS(*startpoint));
 		set_ps_display(activitymsg);
 	}
+
+	ereport(DEBUG1,
+			(errmsg_internal("restarting at %X/%X",
+							 LSN_FORMAT_ARGS(*startpoint))));
 }
 
 /*
@@ -1000,6 +1004,10 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
 			set_ps_display(activitymsg);
 		}
 
+		ereport(DEBUG1,
+				(errmsg_internal("streaming %X/%X",
+								  LSN_FORMAT_ARGS(LogstreamResult.Write))));
+
 		/* Also let the primary know that we made some progress */
 		if (!dying)
 		{
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 7950afb173..e04bab54c9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2948,6 +2948,10 @@ retry:
 				 LSN_FORMAT_ARGS(sentPtr));
 		set_ps_display(activitymsg);
 	}
+
+	ereport(DEBUG1,
+			(errmsg_internal("streaming %X/%X",
+							  LSN_FORMAT_ARGS(sentPtr))));
 }
 
 /*
-- 
2.25.1

#15

bossartn@amazon.com

about 4 years ago

In reply to: Bharath Rupireddy (#14)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On 11/14/21, 4:18 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks. Attaching the v2 to avoid that by directly using the message
in ereport instead of activitymsg.

+            ereport(DEBUG1,
+                    (errmsg_internal("waiting for %s", xlogfname)));

As a general comment, I think we should include more detail than what
goes in the ps title. While we probably want to keep the ps title,
brief, I don't see any issue with including a bit more context in
debug statements.

+    ereport(DEBUG1,
+            (errmsg_internal("executing archive command \"%s\" for archving %s",
+                             xlogarchcmd, xlog)));

I wonder if it's necessary to include the "for archiving" part in this
one. Typically, the xlogarchcmd will include it somewhere.

Nathan

#16

michael@paquier.xyz

about 4 years ago

In reply to: Bossart, Nathan (#15)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Mon, Nov 15, 2021 at 10:36:09PM +0000, Bossart, Nathan wrote:

+    ereport(DEBUG1,
+            (errmsg_internal("executing archive command \"%s\" for archving %s",
+                             xlogarchcmd, xlog)));
I wonder if it's necessary to include the "for archiving" part in this
one. Typically, the xlogarchcmd will include it somewhere.

Don't see any need to add that either (s/archving/archiving/, btw).

@@ -1000,6 +1004,10 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
+               ereport(DEBUG1,
+                               (errmsg_internal("streaming %X/%X",
+ LSN_FORMAT_ARGS(LogstreamResult.Write))));
Having the same wording for the WAL receiver and the WAL sender could
be confusing when it comes to a cascading standby node, particularly
if log_line_prefix does not include the process PID.

+       ereport(DEBUG1,
+                       (errmsg_internal("streaming %X/%X",
+                                                         LSN_FORMAT_ARGS(sentPtr))));
Anyway, those two ones are going to make the logs much more noisy, no?
The same could be said about XLogFileRead(), joining the point of
Nathan's upthread.  So I cannot get excited by this change.
--
Michael

#17

bossartn@amazon.com

about 4 years ago

In reply to: Michael Paquier (#16)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On 11/15/21, 7:14 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

+       ereport(DEBUG1,
+                       (errmsg_internal("streaming %X/%X",
+                                                         LSN_FORMAT_ARGS(sentPtr))));
Anyway, those two ones are going to make the logs much more noisy, no?
The same could be said about XLogFileRead(), joining the point of
Nathan's upthread.  So I cannot get excited by this change.

Yeah, this might even be too noisy for DEBUG5.

Nathan

#18

alvherre@alvh.no-ip.org

about 4 years ago

In reply to: Bossart, Nathan (#17)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On 2021-Nov-16, Bossart, Nathan wrote:

On 11/15/21, 7:14 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

+       ereport(DEBUG1,
+                       (errmsg_internal("streaming %X/%X",
+                                                         LSN_FORMAT_ARGS(sentPtr))));
Anyway, those two ones are going to make the logs much more noisy, no?
The same could be said about XLogFileRead(), joining the point of
Nathan's upthread.  So I cannot get excited by this change.

Yeah, this might even be too noisy for DEBUG5.

Nod. And if you're at DEBUG5, the log contains so much other crap that
it is unusable for any purposes anyway.

My opinion is that adding these things willy-nilly is not a solution to
any actual problem. Adding a few additional log lines that are
low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
not good for anything other than specific development, IMO. At least
this particular one for streaming replication I think we should not
include.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#19

michael@paquier.xyz

about 4 years ago

In reply to: Alvaro Herrera (#18)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Tue, Nov 16, 2021 at 01:26:49PM -0300, Alvaro Herrera wrote:

My opinion is that adding these things willy-nilly is not a solution to
any actual problem. Adding a few additional log lines that are
low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
not good for anything other than specific development, IMO. At least
this particular one for streaming replication I think we should not
include.

Looking at v2, I think that this leaves the additions of the DEBUG1
entries in SendBaseBackup() and WalRcvWaitForStartPosition(), then.
The one in pgarch.c does not provide any additional information as the
segment to-be-archived should be part of the command.
--
Michael

#20

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Michael Paquier (#19)

1 attachment(s)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Wed, Nov 17, 2021 at 8:01 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Nov 16, 2021 at 01:26:49PM -0300, Alvaro Herrera wrote:

My opinion is that adding these things willy-nilly is not a solution to
any actual problem. Adding a few additional log lines that are
low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
not good for anything other than specific development, IMO. At least
this particular one for streaming replication I think we should not
include.

Looking at v2, I think that this leaves the additions of the DEBUG1
entries in SendBaseBackup() and WalRcvWaitForStartPosition(), then.
The one in pgarch.c does not provide any additional information as the
segment to-be-archived should be part of the command.

Thank you all for the inputs. Here's the patch that I've come up with.

Upon thinking further, having at least the messages at LOG level [1]ereport(LOG, (errmsg("waiting for WAL segment \"%s\" from archive", xlogfname)));
would be helpful to know what's happening with the system while in
recovery. Although these messages at LOG level seem to be filling up
the server logs, having a good log consumption and rotation mechanism
(I'm sure every major postgres vendor would have one) would be
sufficient to allay that concern.

These LOG messages would help us know how much time a restore command
takes to fetch the WAL file and what is the current WAL file the
server is recovering and where is it recovering from. The customer
often asks questions like: when will my server come up? how much time
does the recovery of my server take?

As a developer or admin, one can monitor these logs and do bunch of things:
1) see how many WAL files left to be recovered by looking at the WAL
files in the archive location or pg_wal directory or from primary
2) provide an approximate estimation of when the server will come up
or how much more the recovery takes by looking at these previous LOG
messages, one can know the number of WAL files that server recovered
over a minute and with the help of left-over WAL files calculated from
(1).

[1]: ereport(LOG, (errmsg("waiting for WAL segment \"%s\" from archive", xlogfname)));
ereport(LOG,
(errmsg("waiting for WAL segment \"%s\" from archive",
xlogfname)));

ereport(LOG,
(errmsg("restored WAL segment \"%s\" from archive",
xlogfname)));

ereport(LOG,
(errmsg("recovering WAL segment \"%s\" from source \"%s\"",
xlogfname, srcname)));

Regards,
Bharath Rupireddy.

Attachments:

v3-0001-add-log-messages-for-recovery.patchapplication/octet-stream; name=v3-0001-add-log-messages-for-recovery.patchDownload

From fa9fafb42f6ed98594ffd74973a55c409ab36dc6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 7 Dec 2021 12:40:34 +0000
Subject: [PATCH v3] add log messages for recovery

---
 src/backend/access/transam/xlog.c        | 15 ++++++++++++++-
 src/backend/access/transam/xlogarchive.c |  2 +-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d894af310a..bbdb93db61 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3776,6 +3776,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 	char		activitymsg[MAXFNAMELEN + 16];
 	char		path[MAXPGPATH];
 	int			fd;
+	char		srcname[MAXFNAMELEN];
 
 	XLogFileName(xlogfname, tli, segno, wal_segment_size);
 
@@ -3787,16 +3788,25 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
+			ereport(LOG,
+					(errmsg("waiting for WAL segment \"%s\" from archive",
+							xlogfname)));
+
 			if (!RestoreArchivedFile(path, xlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
 									 InRedo))
 				return -1;
+
+			snprintf(srcname, MAXFNAMELEN, "archive");
 			break;
 
 		case XLOG_FROM_PG_WAL:
+			snprintf(srcname, MAXFNAMELEN, "pg_wal");
+			/* fall through */
 		case XLOG_FROM_STREAM:
 			XLogFilePath(path, tli, segno, wal_segment_size);
+			snprintf(srcname, MAXFNAMELEN, "stream");
 			break;
 
 		default:
@@ -3829,6 +3839,10 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
+		ereport(LOG,
+				(errmsg("recovering WAL segment \"%s\" from source \"%s\"",
+								 xlogfname, srcname)));
+
 		/* Track source of data in assorted state variables */
 		readSource = source;
 		XLogReceiptSource = source;
@@ -3919,7 +3933,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 							  XLOG_FROM_ARCHIVE, true);
 			if (fd != -1)
 			{
-				elog(DEBUG1, "got WAL segment from archive");
 				if (!expectedTLEs)
 					expectedTLEs = tles;
 				return fd;
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 4ddeac1fb9..1f724f4acf 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -214,7 +214,7 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			else
 			{
 				ereport(LOG,
-						(errmsg("restored log file \"%s\" from archive",
+						(errmsg("restored WAL segment \"%s\" from archive",
 								xlogfname)));
 				strcpy(path, xlogpath);
 				return true;
-- 
2.25.1

#21

michael@paquier.xyz

about 4 years ago

In reply to: Bharath Rupireddy (#20)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Tue, Dec 07, 2021 at 06:28:24PM +0530, Bharath Rupireddy wrote:

Upon thinking further, having at least the messages at LOG level [1]
would be helpful to know what's happening with the system while in
recovery. Although these messages at LOG level seem to be filling up
the server logs, having a good log consumption and rotation mechanism
(I'm sure every major postgres vendor would have one) would be
sufficient to allay that concern.

+      ereport(LOG,
+              (errmsg("recovering WAL segment \"%s\" from source \"%s\"",
+               xlogfname, srcname)));
Isn't this approach an issue for translations?  It seems to me that
terms like "stream" or "archive" had better be translated properly,
meaning that this needs a fully-constructed sentence.
--
Michael

#22

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Michael Paquier (#21)

1 attachment(s)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Wed, Dec 8, 2021 at 10:05 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Dec 07, 2021 at 06:28:24PM +0530, Bharath Rupireddy wrote:

Upon thinking further, having at least the messages at LOG level [1]
would be helpful to know what's happening with the system while in
recovery. Although these messages at LOG level seem to be filling up
the server logs, having a good log consumption and rotation mechanism
(I'm sure every major postgres vendor would have one) would be
sufficient to allay that concern.
+      ereport(LOG,
+              (errmsg("recovering WAL segment \"%s\" from source \"%s\"",
+               xlogfname, srcname)));
Isn't this approach an issue for translations?  It seems to me that
terms like "stream" or "archive" had better be translated properly,
meaning that this needs a fully-constructed sentence.

Thanks for taking a look at the patch. How about the attached v4?

I added a CF entry - https://commitfest.postgresql.org/36/3443/

Regards,
Bharath Rupireddy.

Attachments:

v4-0001-add-log-messages-for-recovery.patchapplication/octet-stream; name=v4-0001-add-log-messages-for-recovery.patchDownload

From 66288a9afcb682876c7c056aa56cfa8da91fc2d2 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 8 Dec 2021 05:15:56 +0000
Subject: [PATCH v4] add log messages for recovery

The messages at LOG level would be helpful to know what's happening
with the system while in recovery. Although these messages seem to
be filling up the server logs, but the advantages we gain with them
are plenty. Of course, having a good log consumption and rotation
mechanism will help the server logs not fill up the disk.

These LOG messages will help us know how much time a restore command
takes to fetch the WAL file, what is the current WAL file the server
is recovering and where is it recovering from.

As a developer or admin, one can monitor these logs and do bunch of
things:
1) see how many WAL files left to be recovered by looking at the WAL
files in the archive location or pg_wal directory or from primary
2) provide an approximate estimation of when the server will come up
or how much more the recovery takes by looking at these previous LOG
messages, one can know the number of WAL files that server recovered
over a minute and with the help of left-over WAL files calculated
---
 src/backend/access/transam/xlog.c        | 18 +++++++++++++++++-
 src/backend/access/transam/xlogarchive.c |  2 +-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d894af310a..56b40884f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3787,6 +3787,10 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
+			ereport(LOG,
+					(errmsg("waiting for WAL segment \"%s\" from archive",
+							xlogfname)));
+
 			if (!RestoreArchivedFile(path, xlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
@@ -3829,6 +3833,19 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
+		if (source == XLOG_FROM_ARCHIVE)
+			ereport(LOG,
+					(errmsg("recovering WAL segment \"%s\" received from archive",
+							xlogfname)));
+		else if (source == XLOG_FROM_PG_WAL)
+			ereport(LOG,
+					(errmsg("recovering WAL segment \"%s\" received from pg_wal",
+							xlogfname)));
+		else if (source == XLOG_FROM_STREAM)
+			ereport(LOG,
+					(errmsg("recovering WAL segment \"%s\" received from primary",
+							xlogfname)));
+
 		/* Track source of data in assorted state variables */
 		readSource = source;
 		XLogReceiptSource = source;
@@ -3919,7 +3936,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 							  XLOG_FROM_ARCHIVE, true);
 			if (fd != -1)
 			{
-				elog(DEBUG1, "got WAL segment from archive");
 				if (!expectedTLEs)
 					expectedTLEs = tles;
 				return fd;
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 4ddeac1fb9..1f724f4acf 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -214,7 +214,7 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			else
 			{
 				ereport(LOG,
-						(errmsg("restored log file \"%s\" from archive",
+						(errmsg("restored WAL segment \"%s\" from archive",
 								xlogfname)));
 				strcpy(path, xlogpath);
 				return true;
-- 
2.25.1

#23

michael@paquier.xyz

about 4 years ago

In reply to: Bharath Rupireddy (#22)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Wed, Dec 08, 2021 at 10:47:57AM +0530, Bharath Rupireddy wrote:

Thanks for taking a look at the patch. How about the attached v4?

I added a CF entry - https://commitfest.postgresql.org/36/3443/

+       else if (source == XLOG_FROM_STREAM)
+           ereport(LOG,
+                   (errmsg("recovering WAL segment \"%s\" received from primary",
+                           xlogfname)));
This is incorrect when using a cascading standby.  And perhaps this
could use a switch/case?

While quickly testing, I got reminded that the LOG for a segment
retrieved from the local pg_wal would generate some noise when running
for example the bootstrap process. Is this one really interesting
compared to the two others?
--
Michael

#24

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Michael Paquier (#23)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Wed, Dec 8, 2021 at 12:52 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 08, 2021 at 10:47:57AM +0530, Bharath Rupireddy wrote:

Thanks for taking a look at the patch. How about the attached v4?

I added a CF entry - https://commitfest.postgresql.org/36/3443/
+       else if (source == XLOG_FROM_STREAM)
+           ereport(LOG,
+                   (errmsg("recovering WAL segment \"%s\" received from primary",
+                           xlogfname)));
This is incorrect when using a cascading standby.

I visualize the setup like this: primary -> standby -> cascading standby

For standby, "received from primary" makes sense. For cascading
standby too it makes sense because the standby still acts as primary
for cascading standby, no? And we don't distinguish any other existing
messages for a standby acting as a cascading standby right? The
comments around the enum XLOG_FROM_STREAM and usage of it still says
from "primary".

Do you want me to add "received from stream" in general?

And perhaps this could use a switch/case?

Yes, I will change it to switch-case in the next version.

While quickly testing, I got reminded that the LOG for a segment
retrieved from the local pg_wal would generate some noise when running
for example the bootstrap process. Is this one really interesting
compared to the two others?

Yes. I observed that for a simple primary-standby setup, inserting
10million rows generated around 40 WAL files and so 40 LOG messages in
the standby server logs, in a typical production environment with
heavy-write workload, the amount of LOG generation is going to be much
more. And we don't want to have them as DEBUG1 messages and enable
DEBUG1 just for recovery messages that one is interested in.

How about we invent a new GUC log_recovery, with default set to false,
similar to log_checkpoints? And, these messages can be like:
ereport(log_recovery ? LOG : DEBUG1, .....)? This GUC can be useful to
add some recovery stats in future as well. Thoughts?

Regards,
Bharath Rupireddy.

#25

michael@paquier.xyz

about 4 years ago

In reply to: Bharath Rupireddy (#24)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Wed, Dec 08, 2021 at 02:00:12PM +0530, Bharath Rupireddy wrote:

For standby, "received from primary" makes sense. For cascading
standby too it makes sense because the standby still acts as primary
for cascading standby, no? And we don't distinguish any other existing
messages for a standby acting as a cascading standby right? The
comments around the enum XLOG_FROM_STREAM and usage of it still says
from "primary".

Quoting the docs:
https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-PRIMARY-SERVER
And a standby is no such thing.

Do you want me to add "received from stream" in general?

"from stream" would be fine, IMHO.

How about we invent a new GUC log_recovery, with default set to false,
similar to log_checkpoints? And, these messages can be like:
ereport(log_recovery ? LOG : DEBUG1, .....)? This GUC can be useful to
add some recovery stats in future as well. Thoughts?

This discussion does not justify a new GUC at this stage IMO.
--
Michael

#26

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Michael Paquier (#25)

1 attachment(s)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Wed, Dec 8, 2021 at 2:08 PM Michael Paquier <michael@paquier.xyz> wrote:

Do you want me to add "received from stream" in general?

"from stream" would be fine, IMHO.

Done in the attached v5 patch.

How about we invent a new GUC log_recovery, with default set to false,
similar to log_checkpoints? And, these messages can be like:
ereport(log_recovery ? LOG : DEBUG1, .....)? This GUC can be useful to
add some recovery stats in future as well. Thoughts?

This discussion does not justify a new GUC at this stage IMO.

Having "log_recovery" GUC gives the flexibility to the users to enable
the recovery logs added by this patch if they are concerned otherwise
the logs just be at DEBUG1 level. Setting log_min_messages level to
DEBUG1 would definitely flood the server logs.

While quickly testing, I got reminded that the LOG for a segment
retrieved from the local pg_wal would generate some noise when running
for example the bootstrap process. Is this one really interesting
compared to the two others?

If we just have the archive related message at LOG level and others at
DEBUG1 level, it is a kind of partial recovery log information that
one has in the log files, missing recovery messages from the pg_wal
directory in the logs.

Another option is to just have all the messages (added in the patch)
at DEBUG1 level, let's users deal with it.

IMO, the new GUC "log_recovery" is way better as it can be extended to
include some recovery stats info as well.

Thoughts?

Regards,
Bharath Rupireddy.

Attachments:

v5-0001-add-log-messages-for-recovery.patchapplication/x-patch; name=v5-0001-add-log-messages-for-recovery.patchDownload

From a486f3643cc7b84bae3cd6a72cd4d3323c224043 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 8 Dec 2021 09:01:49 +0000
Subject: [PATCH v5] add log messages for recovery

The messages at LOG level would be helpful to know what's happening
with the system while in recovery. Although these messages seem to
be filling up the server logs, but the advantages we gain with them
are plenty. Of course, having a good log consumption and rotation
mechanism will help the server logs not fill up the disk.

These LOG messages will help us know how much time a restore command
takes to fetch the WAL file, what is the current WAL file the server
is recovering and where is it recovering from.

As a developer or admin, one can monitor these logs and do bunch of
things:
1) see how many WAL files left to be recovered by looking at the WAL
files in the archive location or pg_wal directory or from primary
2) provide an approximate estimation of when the server will come up
or how much more the recovery takes by looking at these previous LOG
messages, one can know the number of WAL files that server recovered
over a minute and with the help of left-over WAL files calculated.
---
 src/backend/access/transam/xlog.c        | 30 +++++++++++++++++++++++-
 src/backend/access/transam/xlogarchive.c |  2 +-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d894af310a..d799e12a3a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3787,6 +3787,10 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
+			ereport(LOG,
+					(errmsg("waiting for WAL segment \"%s\" from archive",
+							xlogfname)));
+
 			if (!RestoreArchivedFile(path, xlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
@@ -3829,6 +3833,31 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
+		switch (source)
+		{
+			case XLOG_FROM_ARCHIVE:
+				ereport(LOG,
+						(errmsg("recovering WAL segment \"%s\" received from archive",
+								xlogfname)));
+				break;
+			case XLOG_FROM_PG_WAL:
+				ereport(LOG,
+						(errmsg("recovering WAL segment \"%s\" received from pg_wal",
+								xlogfname)));
+				break;
+			case XLOG_FROM_STREAM:
+				ereport(LOG,
+						(errmsg("recovering WAL segment \"%s\" received from stream",
+								xlogfname)));
+				break;
+			default:
+				/*
+				 * Intentionally doing nothing here as invalid source check has
+				 * already been perofrmed at the beginning of this function.
+				 */
+				break;
+		}
+
 		/* Track source of data in assorted state variables */
 		readSource = source;
 		XLogReceiptSource = source;
@@ -3919,7 +3948,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 							  XLOG_FROM_ARCHIVE, true);
 			if (fd != -1)
 			{
-				elog(DEBUG1, "got WAL segment from archive");
 				if (!expectedTLEs)
 					expectedTLEs = tles;
 				return fd;
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 4ddeac1fb9..1f724f4acf 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -214,7 +214,7 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			else
 			{
 				ereport(LOG,
-						(errmsg("restored log file \"%s\" from archive",
+						(errmsg("restored WAL segment \"%s\" from archive",
 								xlogfname)));
 				strcpy(path, xlogpath);
 				return true;
-- 
2.25.1

#27

michael@paquier.xyz

about 4 years ago

In reply to: Bharath Rupireddy (#26)

1 attachment(s)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Wed, Dec 08, 2021 at 04:57:49PM +0530, Bharath Rupireddy wrote:

Done in the attached v5 patch.

Thanks for the new version. FWIW, as the information logged when
recovering a WAL segment from the local pg_wal could be rather
confusing at bootstrap phase, I would suggest to simplify things as of
the attached and call it a day.
--
Michael

Attachments:

v6-0001-add-log-messages-for-recovery.patchtext/x-diff; charset=us-asciiDownload

From 0ab1a0ecdc54272942afbaf523abeff249071e8c Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 9 Dec 2021 11:23:42 +0900
Subject: [PATCH v6] add log messages for recovery

---
 src/backend/access/transam/xlog.c        | 22 +++++++++++++++++++++-
 src/backend/access/transam/xlogarchive.c |  2 +-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d894af310a..3a98ee6c3b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3787,6 +3787,10 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
+			ereport(LOG,
+					(errmsg("waiting for WAL segment \"%s\" from archive",
+							xlogfname)));
+
 			if (!RestoreArchivedFile(path, xlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
@@ -3829,6 +3833,23 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
+		switch (source)
+		{
+			case XLOG_FROM_ARCHIVE:
+				ereport(LOG,
+						(errmsg("recovering WAL segment \"%s\" from archive",
+								xlogfname)));
+				break;
+			case XLOG_FROM_STREAM:
+				ereport(LOG,
+						(errmsg("recovering WAL segment \"%s\" from stream",
+								xlogfname)));
+				break;
+			case XLOG_FROM_PG_WAL:
+			case XLOG_FROM_ANY:
+				break;
+		}
+
 		/* Track source of data in assorted state variables */
 		readSource = source;
 		XLogReceiptSource = source;
@@ -3919,7 +3940,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 							  XLOG_FROM_ARCHIVE, true);
 			if (fd != -1)
 			{
-				elog(DEBUG1, "got WAL segment from archive");
 				if (!expectedTLEs)
 					expectedTLEs = tles;
 				return fd;
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 4ddeac1fb9..1f724f4acf 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -214,7 +214,7 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			else
 			{
 				ereport(LOG,
-						(errmsg("restored log file \"%s\" from archive",
+						(errmsg("restored WAL segment \"%s\" from archive",
 								xlogfname)));
 				strcpy(path, xlogpath);
 				return true;
-- 
2.34.1

#28

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Michael Paquier (#27)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Dec 9, 2021 at 7:56 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 08, 2021 at 04:57:49PM +0530, Bharath Rupireddy wrote:

Done in the attached v5 patch.

Thanks for the new version. FWIW, as the information logged when
recovering a WAL segment from the local pg_wal could be rather
confusing at bootstrap phase, I would suggest to simplify things as of
the attached and call it a day.

Thanks for the patch. It looks good to me. I have a suggestion: It's
better if we specify why we are not emitting any message if the WAL
segment is found in pg_wal directory, something like "we are
intentionally not emitting any messages in case if the required WAL
segment is found in the pg_wal directory to avoid noise in the server
logs".

+ case XLOG_FROM_PG_WAL:
+ case XLOG_FROM_ANY:
+ break;

Regards,
Bharath Rupireddy.

#29

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Bharath Rupireddy (#28)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Dec 9, 2021 at 11:28 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Dec 9, 2021 at 7:56 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 08, 2021 at 04:57:49PM +0530, Bharath Rupireddy wrote:

Done in the attached v5 patch.

Thanks for the new version. FWIW, as the information logged when
recovering a WAL segment from the local pg_wal could be rather
confusing at bootstrap phase, I would suggest to simplify things as of
the attached and call it a day.

Thanks for the patch. It looks good to me. I have a suggestion: It's
better if we specify why we are not emitting any message if the WAL
segment is found in pg_wal directory, something like "we are
intentionally not emitting any messages in case if the required WAL
segment is found in the pg_wal directory to avoid noise in the server
logs".
+ case XLOG_FROM_PG_WAL:
+ case XLOG_FROM_ANY:
+ break;

I just want to call this out: an insertion of 10 million rows in the
primary generates a bunch of messages in the standby [1]2021-12-08 08:11:53.906 UTC [1219898] LOG: recovering WAL segment "000000010000000000000004" from stream 2021-12-08 08:11:55.864 UTC [1219898] LOG: recovering WAL segment "000000010000000000000005" from stream 2021-12-08 08:11:56.713 UTC [1219898] LOG: recovering WAL segment "000000010000000000000006" from stream 2021-12-08 08:11:57.495 UTC [1219898] LOG: recovering WAL segment "000000010000000000000007" from stream 2021-12-08 08:11:58.301 UTC [1219898] LOG: recovering WAL segment "000000010000000000000008" from stream 2021-12-08 08:11:59.062 UTC [1219898] LOG: recovering WAL segment "000000010000000000000009" from stream 2021-12-08 08:11:59.853 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000A" from stream 2021-12-08 08:12:00.573 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000B" from stream 2021-12-08 08:12:01.346 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000C" from stream 2021-12-08 08:12:02.048 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000D" from stream 2021-12-08 08:12:02.798 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000E" from stream 2021-12-08 08:12:03.544 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000F" from stream 2021-12-08 08:12:04.310 UTC [1219898] LOG: recovering WAL segment "000000010000000000000010" from stream 2021-12-08 08:12:05.096 UTC [1219898] LOG: recovering WAL segment "000000010000000000000011" from stream 2021-12-08 08:12:06.003 UTC [1219898] LOG: recovering WAL segment "000000010000000000000012" from stream 2021-12-08 08:12:06.762 UTC [1219898] LOG: recovering WAL segment "000000010000000000000013" from stream 2021-12-08 08:12:07.595 UTC [1219898] LOG: recovering WAL segment "000000010000000000000014" from stream 2021-12-08 08:12:08.431 UTC [1219898] LOG: recovering WAL segment "000000010000000000000015" from stream 2021-12-08 08:12:09.246 UTC [1219898] LOG: recovering WAL segment "000000010000000000000016" from stream 2021-12-08 08:12:10.144 UTC [1219898] LOG: recovering WAL segment "000000010000000000000017" from stream 2021-12-08 08:12:14.944 UTC [1219898] LOG: recovering WAL segment "000000010000000000000018" from stream 2021-12-08 08:12:15.784 UTC [1219898] LOG: recovering WAL segment "000000010000000000000019" from stream 2021-12-08 08:12:16.757 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001A" from stream 2021-12-08 08:12:17.571 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001B" from stream 2021-12-08 08:12:18.375 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001C" from stream 2021-12-08 08:12:19.179 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001D" from stream 2021-12-08 08:12:19.964 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001E" from stream 2021-12-08 08:12:20.839 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001F" from stream 2021-12-08 08:12:21.782 UTC [1219898] LOG: recovering WAL segment "000000010000000000000020" from stream 2021-12-08 08:12:24.422 UTC [1219898] LOG: recovering WAL segment "000000010000000000000021" from stream 2021-12-08 08:12:25.164 UTC [1219898] LOG: recovering WAL segment "000000010000000000000022" from stream 2021-12-08 08:12:25.989 UTC [1219898] LOG: recovering WAL segment "000000010000000000000023" from stream 2021-12-08 08:12:26.874 UTC [1219898] LOG: recovering WAL segment "000000010000000000000024" from stream 2021-12-08 08:12:27.592 UTC [1219898] LOG: recovering WAL segment "000000010000000000000025" from stream 2021-12-08 08:12:28.471 UTC [1219898] LOG: recovering WAL segment "000000010000000000000026" from stream 2021-12-08 08:12:29.158 UTC [1219898] LOG: recovering WAL segment "000000010000000000000027" from stream 2021-12-08 08:12:30.017 UTC [1219898] LOG: recovering WAL segment "000000010000000000000028" from stream 2021-12-08 08:12:30.778 UTC [1219898] LOG: recovering WAL segment "000000010000000000000029" from stream 2021-12-08 08:12:31.859 UTC [1219898] LOG: recovering WAL segment "00000001000000000000002A" from stream within 40 sec
(size of the standby server log grows by 5K).

[1]: 2021-12-08 08:11:53.906 UTC [1219898] LOG: recovering WAL segment "000000010000000000000004" from stream 2021-12-08 08:11:55.864 UTC [1219898] LOG: recovering WAL segment "000000010000000000000005" from stream 2021-12-08 08:11:56.713 UTC [1219898] LOG: recovering WAL segment "000000010000000000000006" from stream 2021-12-08 08:11:57.495 UTC [1219898] LOG: recovering WAL segment "000000010000000000000007" from stream 2021-12-08 08:11:58.301 UTC [1219898] LOG: recovering WAL segment "000000010000000000000008" from stream 2021-12-08 08:11:59.062 UTC [1219898] LOG: recovering WAL segment "000000010000000000000009" from stream 2021-12-08 08:11:59.853 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000A" from stream 2021-12-08 08:12:00.573 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000B" from stream 2021-12-08 08:12:01.346 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000C" from stream 2021-12-08 08:12:02.048 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000D" from stream 2021-12-08 08:12:02.798 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000E" from stream 2021-12-08 08:12:03.544 UTC [1219898] LOG: recovering WAL segment "00000001000000000000000F" from stream 2021-12-08 08:12:04.310 UTC [1219898] LOG: recovering WAL segment "000000010000000000000010" from stream 2021-12-08 08:12:05.096 UTC [1219898] LOG: recovering WAL segment "000000010000000000000011" from stream 2021-12-08 08:12:06.003 UTC [1219898] LOG: recovering WAL segment "000000010000000000000012" from stream 2021-12-08 08:12:06.762 UTC [1219898] LOG: recovering WAL segment "000000010000000000000013" from stream 2021-12-08 08:12:07.595 UTC [1219898] LOG: recovering WAL segment "000000010000000000000014" from stream 2021-12-08 08:12:08.431 UTC [1219898] LOG: recovering WAL segment "000000010000000000000015" from stream 2021-12-08 08:12:09.246 UTC [1219898] LOG: recovering WAL segment "000000010000000000000016" from stream 2021-12-08 08:12:10.144 UTC [1219898] LOG: recovering WAL segment "000000010000000000000017" from stream 2021-12-08 08:12:14.944 UTC [1219898] LOG: recovering WAL segment "000000010000000000000018" from stream 2021-12-08 08:12:15.784 UTC [1219898] LOG: recovering WAL segment "000000010000000000000019" from stream 2021-12-08 08:12:16.757 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001A" from stream 2021-12-08 08:12:17.571 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001B" from stream 2021-12-08 08:12:18.375 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001C" from stream 2021-12-08 08:12:19.179 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001D" from stream 2021-12-08 08:12:19.964 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001E" from stream 2021-12-08 08:12:20.839 UTC [1219898] LOG: recovering WAL segment "00000001000000000000001F" from stream 2021-12-08 08:12:21.782 UTC [1219898] LOG: recovering WAL segment "000000010000000000000020" from stream 2021-12-08 08:12:24.422 UTC [1219898] LOG: recovering WAL segment "000000010000000000000021" from stream 2021-12-08 08:12:25.164 UTC [1219898] LOG: recovering WAL segment "000000010000000000000022" from stream 2021-12-08 08:12:25.989 UTC [1219898] LOG: recovering WAL segment "000000010000000000000023" from stream 2021-12-08 08:12:26.874 UTC [1219898] LOG: recovering WAL segment "000000010000000000000024" from stream 2021-12-08 08:12:27.592 UTC [1219898] LOG: recovering WAL segment "000000010000000000000025" from stream 2021-12-08 08:12:28.471 UTC [1219898] LOG: recovering WAL segment "000000010000000000000026" from stream 2021-12-08 08:12:29.158 UTC [1219898] LOG: recovering WAL segment "000000010000000000000027" from stream 2021-12-08 08:12:30.017 UTC [1219898] LOG: recovering WAL segment "000000010000000000000028" from stream 2021-12-08 08:12:30.778 UTC [1219898] LOG: recovering WAL segment "000000010000000000000029" from stream 2021-12-08 08:12:31.859 UTC [1219898] LOG: recovering WAL segment "00000001000000000000002A" from stream
2021-12-08 08:11:53.906 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000004" from stream
2021-12-08 08:11:55.864 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000005" from stream
2021-12-08 08:11:56.713 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000006" from stream
2021-12-08 08:11:57.495 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000007" from stream
2021-12-08 08:11:58.301 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000008" from stream
2021-12-08 08:11:59.062 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000009" from stream
2021-12-08 08:11:59.853 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000000A" from stream
2021-12-08 08:12:00.573 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000000B" from stream
2021-12-08 08:12:01.346 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000000C" from stream
2021-12-08 08:12:02.048 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000000D" from stream
2021-12-08 08:12:02.798 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000000E" from stream
2021-12-08 08:12:03.544 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000000F" from stream
2021-12-08 08:12:04.310 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000010" from stream
2021-12-08 08:12:05.096 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000011" from stream
2021-12-08 08:12:06.003 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000012" from stream
2021-12-08 08:12:06.762 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000013" from stream
2021-12-08 08:12:07.595 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000014" from stream
2021-12-08 08:12:08.431 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000015" from stream
2021-12-08 08:12:09.246 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000016" from stream
2021-12-08 08:12:10.144 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000017" from stream
2021-12-08 08:12:14.944 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000018" from stream
2021-12-08 08:12:15.784 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000019" from stream
2021-12-08 08:12:16.757 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000001A" from stream
2021-12-08 08:12:17.571 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000001B" from stream
2021-12-08 08:12:18.375 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000001C" from stream
2021-12-08 08:12:19.179 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000001D" from stream
2021-12-08 08:12:19.964 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000001E" from stream
2021-12-08 08:12:20.839 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000001F" from stream
2021-12-08 08:12:21.782 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000020" from stream
2021-12-08 08:12:24.422 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000021" from stream
2021-12-08 08:12:25.164 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000022" from stream
2021-12-08 08:12:25.989 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000023" from stream
2021-12-08 08:12:26.874 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000024" from stream
2021-12-08 08:12:27.592 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000025" from stream
2021-12-08 08:12:28.471 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000026" from stream
2021-12-08 08:12:29.158 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000027" from stream
2021-12-08 08:12:30.017 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000028" from stream
2021-12-08 08:12:30.778 UTC [1219898] LOG: recovering WAL segment
"000000010000000000000029" from stream
2021-12-08 08:12:31.859 UTC [1219898] LOG: recovering WAL segment
"00000001000000000000002A" from stream

Regards,
Bharath Rupireddy.

#30

alvherre@alvh.no-ip.org

about 4 years ago

In reply to: Bharath Rupireddy (#29)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On 2021-Dec-09, Bharath Rupireddy wrote:

I just want to call this out: an insertion of 10 million rows in the
primary generates a bunch of messages in the standby [1] within 40 sec
(size of the standby server log grows by 5K).

Hmm, that does sound excessive to me in terms of log bloat increase.
Remember the discussion about turning log_checkpoints on by default?

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Las mujeres son como hondas: mientras más resistencia tienen,
más lejos puedes llegar con ellas" (Jonas Nightingale, Leap of Faith)

#31

[1]: /messages/by-id/CA+TgmoaDFpFmNQuaWj6+78CPVBrF_WPT1wFHBTvio=tRmxzUcQ@mail.gmail.com

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Alvaro Herrera (#30)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Dec 9, 2021 at 6:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Dec-09, Bharath Rupireddy wrote:

I just want to call this out: an insertion of 10 million rows in the
primary generates a bunch of messages in the standby [1] within 40 sec
(size of the standby server log grows by 5K).

Hmm, that does sound excessive to me in terms of log bloat increase.
Remember the discussion about turning log_checkpoints on by default?

The amount of LOG messages generated when the log_checkpoints GUC is
set to on, are quite less, hardly 4 messages per-checkpoint (5min). I
think enabling log_checkpoints is still acceptable as most of the
hackers agreed in another thread [1]/messages/by-id/CA+TgmoaDFpFmNQuaWj6+78CPVBrF_WPT1wFHBTvio=tRmxzUcQ@mail.gmail.com that the advantages with it are
more and it doesn't seem to be bloating the server logs (in a day at
max 1152 messages).

I'm not sure if it is worth having a GUC log_recovery if enabled the
recovery messages can be emitted at LOG level otherwise DEBUG1 level.
log_recovery GUC can also be used to collect and emit some recovery
stats like log_checkpoints.

Regards,
Bharath Rupireddy.

#32

alvherre@alvh.no-ip.org

about 4 years ago

In reply to: Bharath Rupireddy (#31)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On 2021-Dec-09, Bharath Rupireddy wrote:

On Thu, Dec 9, 2021 at 6:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Dec-09, Bharath Rupireddy wrote:

I just want to call this out: an insertion of 10 million rows in the
primary generates a bunch of messages in the standby [1] within 40 sec
(size of the standby server log grows by 5K).

Hmm, that does sound excessive to me in terms of log bloat increase.
Remember the discussion about turning log_checkpoints on by default?

The amount of LOG messages generated when the log_checkpoints GUC is
set to on, are quite less, hardly 4 messages per-checkpoint (5min). I
think enabling log_checkpoints is still acceptable as most of the
hackers agreed in another thread [1] that the advantages with it are
more and it doesn't seem to be bloating the server logs (in a day at
max 1152 messages).

My point is that it was argued, in that thread, that setting
log_checkpoints to on by default would cause too much additional log
volume. That argument was shot down, but in this thread we're arguing
that we want five kilobytes of messages in forty seconds. That sounds
much less acceptable to me, so making it default behavior or
unconditional behavior is not going to fly very high.

I'm not sure if it is worth having a GUC log_recovery if enabled the
recovery messages can be emitted at LOG level otherwise DEBUG1 level.

Maybe some tunable like
log_wal_traffic = { none, medium, high }
where "none" is current behavior of no noise, "medium" gets (say) once
every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
you one message per segment.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#33

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Alvaro Herrera (#32)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Dec-09, Bharath Rupireddy wrote:

On Thu, Dec 9, 2021 at 6:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Dec-09, Bharath Rupireddy wrote:

I just want to call this out: an insertion of 10 million rows in the
primary generates a bunch of messages in the standby [1] within 40 sec
(size of the standby server log grows by 5K).

Hmm, that does sound excessive to me in terms of log bloat increase.
Remember the discussion about turning log_checkpoints on by default?

The amount of LOG messages generated when the log_checkpoints GUC is
set to on, are quite less, hardly 4 messages per-checkpoint (5min). I
think enabling log_checkpoints is still acceptable as most of the
hackers agreed in another thread [1] that the advantages with it are
more and it doesn't seem to be bloating the server logs (in a day at
max 1152 messages).

My point is that it was argued, in that thread, that setting
log_checkpoints to on by default would cause too much additional log
volume. That argument was shot down, but in this thread we're arguing
that we want five kilobytes of messages in forty seconds. That sounds
much less acceptable to me, so making it default behavior or
unconditional behavior is not going to fly very high.

I'm not sure if it is worth having a GUC log_recovery if enabled the
recovery messages can be emitted at LOG level otherwise DEBUG1 level.

Maybe some tunable like
log_wal_traffic = { none, medium, high }
where "none" is current behavior of no noise, "medium" gets (say) once
every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
you one message per segment.

Yeah, that can be an option.

Another idea could be to use the infrastructure laid out by the commit
9ce346e [1]commit 9ce346eabf350a130bba46be3f8c50ba28506969 Author: Robert Haas <rhaas@postgresql.org> Date: Mon Oct 25 11:51:57 2021 -0400. With ereport_startup_progress, we can emit the LOGs(of
current recovering WAL file) for every log_startup_progress_interval
seconds/milliseconds.

One problem is that ereport_startup_progress doesn't work on
StandbyMode, maybe we can remove this restriction unless we have a
major reason for not allowing it on the standby.
/* Prepare to report progress of the redo phase. */
if (!StandbyMode)
begin_startup_progress_phase();

Thoughts?

[1]: commit 9ce346eabf350a130bba46be3f8c50ba28506969 Author: Robert Haas <rhaas@postgresql.org> Date: Mon Oct 25 11:51:57 2021 -0400
commit 9ce346eabf350a130bba46be3f8c50ba28506969
Author: Robert Haas <rhaas@postgresql.org>
Date: Mon Oct 25 11:51:57 2021 -0400

Report progress of startup operations that take a long time.

Discussion:
/messages/by-id/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
Discussion:
/messages/by-id/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com

Regards,
Bharath Rupireddy.

#34

Justin Pryzby

pryzby@telsasoft.com

about 4 years ago

In reply to: Bharath Rupireddy (#33)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Sun, Dec 12, 2021 at 06:08:05PM +0530, Bharath Rupireddy wrote:

Another idea could be to use the infrastructure laid out by the commit
9ce346e [1]. With ereport_startup_progress, we can emit the LOGs(of
current recovering WAL file) for every log_startup_progress_interval
seconds/milliseconds.

One problem is that ereport_startup_progress doesn't work on
StandbyMode, maybe we can remove this restriction unless we have a
major reason for not allowing it on the standby.
/* Prepare to report progress of the redo phase. */
if (!StandbyMode)
begin_startup_progress_phase();

The relevant conversation starts here.
/messages/by-id/20210814214700.GO10479@telsasoft.com

There was a lot of confusion in that thread, though.

The understanding was that it didn't make sense for that feature to
continuously log messages on a standby (every 10sec by default). That seems
like too much - the issue of a checkpointed logged every 5min was enough of a
hurdle.

If you're talking about a new feature that uses the infrastructre from 9ce3,
but is controlled by a separate GUC like log_wal_traffic, that could be okay.

--
Justin

#35

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Justin Pryzby (#34)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Sun, Dec 12, 2021 at 06:08:05PM +0530, Bharath Rupireddy wrote:

Another idea could be to use the infrastructure laid out by the commit
9ce346e [1]. With ereport_startup_progress, we can emit the LOGs(of
current recovering WAL file) for every log_startup_progress_interval
seconds/milliseconds.

One problem is that ereport_startup_progress doesn't work on
StandbyMode, maybe we can remove this restriction unless we have a
major reason for not allowing it on the standby.
/* Prepare to report progress of the redo phase. */
if (!StandbyMode)
begin_startup_progress_phase();

The relevant conversation starts here.
/messages/by-id/20210814214700.GO10479@telsasoft.com

There was a lot of confusion in that thread, though.

The understanding was that it didn't make sense for that feature to
continuously log messages on a standby (every 10sec by default). That seems
like too much - the issue of a checkpointed logged every 5min was enough of a
hurdle.

If you're talking about a new feature that uses the infrastructre from 9ce3,
but is controlled by a separate GUC like log_wal_traffic, that could be okay.

Do you see any problems using the same GUC
log_startup_progress_interval and ereport_startup_progress API
introduced by 9ce346e? I prefer this instead of a new GUC. Thoughts?

Regards,
Bharath Rupireddy.

#36

Justin Pryzby

pryzby@telsasoft.com

about 4 years ago

In reply to: Bharath Rupireddy (#35)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Fri, Dec 24, 2021 at 08:04:08PM +0530, Bharath Rupireddy wrote:

On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Sun, Dec 12, 2021 at 06:08:05PM +0530, Bharath Rupireddy wrote:

Another idea could be to use the infrastructure laid out by the commit
9ce346e [1]. With ereport_startup_progress, we can emit the LOGs(of
current recovering WAL file) for every log_startup_progress_interval
seconds/milliseconds.

One problem is that ereport_startup_progress doesn't work on
StandbyMode, maybe we can remove this restriction unless we have a
major reason for not allowing it on the standby.
/* Prepare to report progress of the redo phase. */
if (!StandbyMode)
begin_startup_progress_phase();

The relevant conversation starts here.
/messages/by-id/20210814214700.GO10479@telsasoft.com

There was a lot of confusion in that thread, though.

The understanding was that it didn't make sense for that feature to
continuously log messages on a standby (every 10sec by default). That seems
like too much - the issue of a checkpointed logged every 5min was enough of a
hurdle.

If you're talking about a new feature that uses the infrastructre from 9ce3,
but is controlled by a separate GUC like log_wal_traffic, that could be okay.

Do you see any problems using the same GUC
log_startup_progress_interval and ereport_startup_progress API
introduced by 9ce346e? I prefer this instead of a new GUC. Thoughts?

I referenced the thread where we avoided writing progress logs during normal
operation of a standby, since the logs would be both useless and excessive.

If your patch were to use the same infrastructure, you'd still want to avoid
disturbing that behavior and writing useless logs.

--
Justin

#37

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Justin Pryzby (#34)

1 attachment(s)

Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Maybe some tunable like
log_wal_traffic = { none, medium, high }
where "none" is current behavior of no noise, "medium" gets (say) once
every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
you one message per segment.

On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

If you're talking about a new feature that uses the infrastructre from 9ce3,
but is controlled by a separate GUC like log_wal_traffic, that could be okay.

Thanks for the suggestion. I've added a new GUC log_wal_traffic as
suggested above. PSA v7 patch.

Regards,
Bharath Rupireddy.

Attachments:

v7-0001-Add-WAL-recovery-messages-with-log_wal_traffic-GU.patchapplication/octet-stream; name=v7-0001-Add-WAL-recovery-messages-with-log_wal_traffic-GU.patchDownload

From 7626192b2832959fd12a09edc134382501a4e3be Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 29 Dec 2021 01:19:41 +0000
Subject: [PATCH v7] Add WAL recovery messages with log_wal_traffic GUC

---
 doc/src/sgml/config.sgml                      | 18 +++++
 src/backend/access/transam/xlog.c             | 80 ++++++++++++++++++-
 src/backend/access/transam/xlogarchive.c      |  2 +-
 src/backend/utils/misc/guc.c                  | 11 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     |  9 +++
 6 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..7d8e2caa09 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3365,6 +3365,24 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-log-wal-traffic" xreflabel="log_wal_traffic">
+      <term><varname>log_wal_traffic</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>log_wal_traffic</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of WAL processing information written to
+        the server log. Valid values are <literal>none</literal> (which is the
+        default), <literal>medium</literal> and <literal>high</literal>. The
+        value <literal>none</literal> logs no information, <literal>medium</literal>
+        logs messages per each 128 WAL segments processed and <literal>high</literal>
+        logs messages per each WAL segment processed.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
      </sect2>
      <sect2 id="runtime-config-wal-checkpoints">
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1e1fbe957f..c77d3c4daa 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -91,6 +91,8 @@ extern uint32 bootstrap_data_checksum_version;
 /* timeline ID to be used when bootstrapping */
 #define BootstrapTimeLineID		1
 
+#define LOG_WAL_TRAFFIC_PER_SEGMENTS 128
+
 /* User-settable parameters */
 int			max_wal_size_mb = 1024; /* 1 GB */
 int			min_wal_size_mb = 80;	/* 80 MB */
@@ -115,6 +117,7 @@ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
 bool		track_wal_io_timing = false;
+int			log_wal_traffic = LOG_WAL_TRAFFIC_NONE;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -184,6 +187,13 @@ const struct config_enum_entry recovery_target_action_options[] = {
 	{NULL, 0, false}
 };
 
+const struct config_enum_entry log_wal_traffic_options[] = {
+	{"none", LOG_WAL_TRAFFIC_NONE, false},
+	{"medium", LOG_WAL_TRAFFIC_MEDIUM, false},
+	{"high", LOG_WAL_TRAFFIC_HIGH, false},
+	{NULL, 0, false}
+};
+
 /*
  * Statistics for current checkpoint are collected in this global struct.
  * Because only the checkpointer or a stand-alone backend can perform
@@ -999,6 +1009,8 @@ static void WALInsertLockAcquireExclusive(void);
 static void WALInsertLockRelease(void);
 static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
 
+static bool IsLogWALTraffic(char *last_logged_xlogfname, char *xlogfname);
+
 /*
  * Insert an XLOG record represented by an already-constructed chain of data
  * chunks.  This is a low-level routine; to construct the WAL record header
@@ -3788,6 +3800,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 	char		activitymsg[MAXFNAMELEN + 16];
 	char		path[MAXPGPATH];
 	int			fd;
+	static	char last_logged_xlogfname[MAXFNAMELEN] = {'\0'};
 
 	XLogFileName(xlogfname, tli, segno, wal_segment_size);
 
@@ -3799,6 +3812,11 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
+			if (IsLogWALTraffic(last_logged_xlogfname, xlogfname))
+				ereport(LOG,
+						(errmsg("waiting for WAL segment \"%s\" from archive",
+								xlogfname)));
+
 			if (!RestoreArchivedFile(path, xlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
@@ -3841,6 +3859,32 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
+		if (IsLogWALTraffic(last_logged_xlogfname, xlogfname))
+		{
+			switch (source)
+			{
+				case XLOG_FROM_ARCHIVE:
+					ereport(LOG,
+							(errmsg("recovering WAL segment \"%s\" from archive",
+									xlogfname)));
+					break;
+				case XLOG_FROM_STREAM:
+					ereport(LOG,
+							(errmsg("recovering WAL segment \"%s\" from stream",
+									xlogfname)));
+					break;
+				case XLOG_FROM_PG_WAL:
+					ereport(LOG,
+							(errmsg("recovering WAL segment \"%s\" from pg_wal",
+									xlogfname)));
+					break;
+				case XLOG_FROM_ANY:
+					break;
+			}
+
+			memcpy(last_logged_xlogfname, xlogfname, MAXFNAMELEN);
+		}
+
 		/* Track source of data in assorted state variables */
 		readSource = source;
 		XLogReceiptSource = source;
@@ -3931,7 +3975,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 							  XLOG_FROM_ARCHIVE, true);
 			if (fd != -1)
 			{
-				elog(DEBUG1, "got WAL segment from archive");
 				if (!expectedTLEs)
 					expectedTLEs = tles;
 				return fd;
@@ -13243,3 +13286,38 @@ XLogRequestWalReceiverReply(void)
 {
 	doRequestWalReceiverReply = true;
 }
+
+/*
+ * Returns true if logging of WAL traffic is allowed, otherwise false.
+ */
+static bool
+IsLogWALTraffic(char *last_logged_xlogfname, char *xlogfname)
+{
+	bool emit_log = false;
+
+	if (log_wal_traffic == LOG_WAL_TRAFFIC_NONE)
+		return false;
+	else if (log_wal_traffic == LOG_WAL_TRAFFIC_MEDIUM)
+	{
+		/* first time, log the messages */
+		if (strlen(last_logged_xlogfname) == 0)
+			emit_log = true;
+		else if (IsXLogFileName(last_logged_xlogfname))
+		{
+			uint32		l_tli;
+			uint32		tli;
+			XLogSegNo	l_segno;
+			XLogSegNo	segno;
+
+			XLogFromFileName(last_logged_xlogfname, &l_tli, &l_segno, wal_segment_size);
+			XLogFromFileName(xlogfname, &tli, &segno, wal_segment_size);
+
+			if ((segno - l_segno) >= LOG_WAL_TRAFFIC_PER_SEGMENTS)
+				emit_log = true;
+		}
+	}
+	else if (log_wal_traffic == LOG_WAL_TRAFFIC_HIGH)
+		return true;
+
+	return emit_log;
+}
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 4ddeac1fb9..1f724f4acf 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -214,7 +214,7 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			else
 			{
 				ereport(LOG,
-						(errmsg("restored log file \"%s\" from archive",
+						(errmsg("restored WAL segment \"%s\" from archive",
 								xlogfname)));
 				strcpy(path, xlogpath);
 				return true;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f9504d3aec..e45a263a51 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -565,6 +565,7 @@ extern const struct config_enum_entry archive_mode_options[];
 extern const struct config_enum_entry recovery_target_action_options[];
 extern const struct config_enum_entry sync_method_options[];
 extern const struct config_enum_entry dynamic_shared_memory_options[];
+extern const struct config_enum_entry log_wal_traffic_options[];
 
 /*
  * GUC option variables that are exported from this module
@@ -4885,6 +4886,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"log_wal_traffic", PGC_USERSET, WAL_SETTINGS,
+			gettext_noop("Sets the amount of WAL processing information written to the server log."),
+			NULL
+		},
+		&log_wal_traffic,
+		LOG_WAL_TRAFFIC_NONE, log_wal_traffic_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"dynamic_shared_memory_type", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Selects the dynamic shared memory implementation used."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a1acd46b61..c0ea0d847b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -231,6 +231,7 @@
 
 #commit_delay = 0			# range 0-100000, in microseconds
 #commit_siblings = 5			# range 1-1000
+#log_wal_traffic = none			# none, medium, or high
 
 # - Checkpoints -
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34f6c89f06..9a7f6b27be 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -88,6 +88,7 @@ extern char *PrimaryConnInfo;
 extern char *PrimarySlotName;
 extern bool wal_receiver_create_temp_slot;
 extern bool track_wal_io_timing;
+extern int log_wal_traffic;
 
 /* indirectly set via GUC system */
 extern TransactionId recoveryTargetXid;
@@ -147,6 +148,14 @@ typedef enum RecoveryPauseState
 	RECOVERY_PAUSED				/* recovery is paused */
 } RecoveryPauseState;
 
+/* Log WAL traffic states */
+typedef enum LogWalTraffic
+{
+	LOG_WAL_TRAFFIC_NONE = 0,
+	LOG_WAL_TRAFFIC_MEDIUM,
+	LOG_WAL_TRAFFIC_HIGH
+} LogWalTraffic;
+
 extern PGDLLIMPORT int wal_level;
 
 /* Is WAL archiving enabled (always or only while server is running normally)? */
-- 
2.25.1

#38

https://commitfest.postgresql.org/37/3443/

bharath.rupireddyforpostgres@gmail.com

almost 4 years ago

In reply to: Bharath Rupireddy (#37)

1 attachment(s)

Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

On Wed, Dec 29, 2021 at 6:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Maybe some tunable like
log_wal_traffic = { none, medium, high }
where "none" is current behavior of no noise, "medium" gets (say) once
every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
you one message per segment.

On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

If you're talking about a new feature that uses the infrastructre from 9ce3,
but is controlled by a separate GUC like log_wal_traffic, that could be okay.

Thanks for the suggestion. I've added a new GUC log_wal_traffic as
suggested above. PSA v7 patch.

Here's the rebased v8 patch, please review it.

Regards,
Bharath Rupireddy.

Attachments:

v8-0001-Add-WAL-recovery-messages-with-log_wal_traffic-GU.patchapplication/octet-stream; name=v8-0001-Add-WAL-recovery-messages-with-log_wal_traffic-GU.patchDownload

From f262eec240c4071e3128468d9d604c5ea958948c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sun, 27 Feb 2022 07:06:17 +0000
Subject: [PATCH v8] Add WAL recovery messages with log_wal_traffic GUC

Having the WAL file info (the server is recovering/replaying)
along with source (where it is being fetched from archive or
stream or pg_wal) in server log messages will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.

Added a GUC to control the log traffic i.e. log frequently or once
in a while per every 128 recovered WAL files or log nothing.
---
 doc/src/sgml/config.sgml                      | 18 ++++++++
 src/backend/access/transam/xlog.c             | 43 +++++++++++++++++++
 src/backend/access/transam/xlogarchive.c      |  2 +-
 src/backend/access/transam/xlogrecovery.c     | 33 +++++++++++++-
 src/backend/utils/misc/guc.c                  | 11 +++++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     | 11 +++++
 7 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..480d1f7acd 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3364,6 +3364,24 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-log-wal-traffic" xreflabel="log_wal_traffic">
+      <term><varname>log_wal_traffic</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>log_wal_traffic</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of WAL processing information written to
+        the server log. Valid values are <literal>none</literal> (which is the
+        default), <literal>medium</literal> and <literal>high</literal>. The
+        value <literal>none</literal> logs no information, <literal>medium</literal>
+        logs messages per each 128 WAL segments processed and <literal>high</literal>
+        logs messages per each WAL segment processed.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
      </sect2>
      <sect2 id="runtime-config-wal-checkpoints">
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0d2bd7a357..ad946a18af 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -139,6 +139,7 @@ bool		XLOG_DEBUG = false;
 #endif
 
 int			wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
+int			log_wal_traffic = LOG_WAL_TRAFFIC_NONE;
 
 /*
  * Number of WAL insertion locks to use. A higher value allows more insertions
@@ -195,6 +196,13 @@ const struct config_enum_entry archive_mode_options[] = {
 	{NULL, 0, false}
 };
 
+const struct config_enum_entry log_wal_traffic_options[] = {
+	{"none", LOG_WAL_TRAFFIC_NONE, false},
+	{"medium", LOG_WAL_TRAFFIC_MEDIUM, false},
+	{"high", LOG_WAL_TRAFFIC_HIGH, false},
+	{NULL, 0, false}
+};
+
 /*
  * Statistics for current checkpoint are collected in this global struct.
  * Because only the checkpointer or a stand-alone backend can perform
@@ -9197,3 +9205,38 @@ SetWalWriterSleeping(bool sleeping)
 	XLogCtl->WalWriterSleeping = sleeping;
 	SpinLockRelease(&XLogCtl->info_lck);
 }
+
+/*
+ * Returns true if logging of WAL traffic is allowed, otherwise false.
+ */
+bool
+LogWALTraffic(char *last_logged_xlogfname, char *xlogfname)
+{
+	bool emit_log = false;
+
+	if (log_wal_traffic == LOG_WAL_TRAFFIC_NONE)
+		return false;
+	else if (log_wal_traffic == LOG_WAL_TRAFFIC_MEDIUM)
+	{
+		/* first time, log the messages */
+		if (strlen(last_logged_xlogfname) == 0)
+			emit_log = true;
+		else if (IsXLogFileName(last_logged_xlogfname))
+		{
+			uint32		l_tli;
+			uint32		tli;
+			XLogSegNo	l_segno;
+			XLogSegNo	segno;
+
+			XLogFromFileName(last_logged_xlogfname, &l_tli, &l_segno, wal_segment_size);
+			XLogFromFileName(xlogfname, &tli, &segno, wal_segment_size);
+
+			if ((segno - l_segno) >= LOG_WAL_TRAFFIC_PER_SEGMENTS)
+				emit_log = true;
+		}
+	}
+	else if (log_wal_traffic == LOG_WAL_TRAFFIC_HIGH)
+		return true;
+
+	return emit_log;
+}
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index a2657a2005..b6040f9b9d 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -214,7 +214,7 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			else
 			{
 				ereport(LOG,
-						(errmsg("restored log file \"%s\" from archive",
+						(errmsg("restored WAL segment \"%s\" from archive",
 								xlogfname)));
 				strcpy(path, xlogpath);
 				return true;
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index f9f212680b..bc1dd1b096 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3985,6 +3985,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 	char		activitymsg[MAXFNAMELEN + 16];
 	char		path[MAXPGPATH];
 	int			fd;
+	static	char last_logged_xlogfname[MAXFNAMELEN] = {'\0'};
 
 	XLogFileName(xlogfname, tli, segno, wal_segment_size);
 
@@ -3996,6 +3997,11 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
+			if (LogWALTraffic(last_logged_xlogfname, xlogfname))
+				ereport(LOG,
+						(errmsg("waiting for WAL segment \"%s\" from archive",
+								xlogfname)));
+
 			if (!RestoreArchivedFile(path, xlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
@@ -4038,6 +4044,32 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
+		if (LogWALTraffic(last_logged_xlogfname, xlogfname))
+		{
+			switch (source)
+			{
+				case XLOG_FROM_ARCHIVE:
+					ereport(LOG,
+							(errmsg("recovering WAL segment \"%s\" from archive",
+									xlogfname)));
+					break;
+				case XLOG_FROM_STREAM:
+					ereport(LOG,
+							(errmsg("recovering WAL segment \"%s\" from stream",
+									xlogfname)));
+					break;
+				case XLOG_FROM_PG_WAL:
+					ereport(LOG,
+							(errmsg("recovering WAL segment \"%s\" from pg_wal",
+									xlogfname)));
+					break;
+				case XLOG_FROM_ANY:
+					break;
+			}
+
+			memcpy(last_logged_xlogfname, xlogfname, MAXFNAMELEN);
+		}
+
 		/* Track source of data in assorted state variables */
 		readSource = source;
 		XLogReceiptSource = source;
@@ -4128,7 +4160,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 							  XLOG_FROM_ARCHIVE, true);
 			if (fd != -1)
 			{
-				elog(DEBUG1, "got WAL segment from archive");
 				if (!expectedTLEs)
 					expectedTLEs = tles;
 				return fd;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1e3650184b..5dc956f9ef 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -570,6 +570,7 @@ extern const struct config_enum_entry archive_mode_options[];
 extern const struct config_enum_entry recovery_target_action_options[];
 extern const struct config_enum_entry sync_method_options[];
 extern const struct config_enum_entry dynamic_shared_memory_options[];
+extern const struct config_enum_entry log_wal_traffic_options[];
 
 /*
  * GUC option variables that are exported from this module
@@ -4911,6 +4912,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"log_wal_traffic", PGC_USERSET, WAL_SETTINGS,
+			gettext_noop("Sets the amount of WAL processing information written to the server log."),
+			NULL
+		},
+		&log_wal_traffic,
+		LOG_WAL_TRAFFIC_NONE, log_wal_traffic_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"dynamic_shared_memory_type", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Selects the dynamic shared memory implementation used."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4a094bb38b..cfeebd7cfa 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -231,6 +231,7 @@
 
 #commit_delay = 0			# range 0-100000, in microseconds
 #commit_siblings = 5			# range 1-1000
+#log_wal_traffic = none			# none, medium, or high
 
 # - Checkpoints -
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4b45ac64db..2420cd77f3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -50,6 +50,7 @@ extern bool *wal_consistency_checking;
 extern char *wal_consistency_checking_string;
 extern bool log_checkpoints;
 extern bool track_wal_io_timing;
+extern int log_wal_traffic;
 
 extern int	CheckPointSegments;
 
@@ -86,6 +87,14 @@ typedef enum RecoveryState
 	RECOVERY_STATE_DONE			/* currently in production */
 } RecoveryState;
 
+/* Log WAL traffic states */
+typedef enum LogWalTraffic
+{
+	LOG_WAL_TRAFFIC_NONE = 0,
+	LOG_WAL_TRAFFIC_MEDIUM,
+	LOG_WAL_TRAFFIC_HIGH
+} LogWalTraffic;
+
 extern PGDLLIMPORT int wal_level;
 
 /* Is WAL archiving enabled (always or only while server is running normally)? */
@@ -149,6 +158,7 @@ extern bool XLOG_DEBUG;
 #define XLOG_INCLUDE_ORIGIN		0x01	/* include the replication origin */
 #define XLOG_MARK_UNIMPORTANT	0x02	/* record not important for durability */
 
+#define LOG_WAL_TRAFFIC_PER_SEGMENTS 128
 
 /* Checkpoint statistics */
 typedef struct CheckpointStatsData
@@ -246,6 +256,7 @@ extern void SetWalWriterSleeping(bool sleeping);
 
 extern void assign_max_wal_size(int newval, void *extra);
 extern void assign_checkpoint_completion_target(double newval, void *extra);
+extern bool LogWALTraffic(char *last_logged_xlogfname, char *xlogfname);
 
 /*
  * Routines used by xlogrecovery.c to call back into xlog.c during recovery.
-- 
2.25.1

#39

bharath.rupireddyforpostgres@gmail.com

over 3 years ago

In reply to: Bharath Rupireddy (#38)

1 attachment(s)

Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

On Sun, Feb 27, 2022 at 12:50 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Dec 29, 2021 at 6:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Maybe some tunable like
log_wal_traffic = { none, medium, high }
where "none" is current behavior of no noise, "medium" gets (say) once
every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
you one message per segment.

On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

If you're talking about a new feature that uses the infrastructre from 9ce3,
but is controlled by a separate GUC like log_wal_traffic, that could be okay.

Thanks for the suggestion. I've added a new GUC log_wal_traffic as
suggested above. PSA v7 patch.

Here's the rebased v8 patch, please review it.

https://commitfest.postgresql.org/37/3443/

Here's the rebased v9 patch.

Regards,
Bharath Rupireddy.

Attachments:

v9-0001-Add-WAL-recovery-messages-with-log_wal_traffic-GU.patchapplication/octet-stream; name=v9-0001-Add-WAL-recovery-messages-with-log_wal_traffic-GU.patchDownload

From 199f72d93173207e41dd1b71e4bde254befa7635 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 29 Apr 2022 09:07:44 +0000
Subject: [PATCH v9] Add WAL recovery messages with log_wal_traffic GUC

Having the WAL file info (the server is recovering/replaying)
along with source (where it is being fetched from archive or
stream or pg_wal) in server log messages will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.

Added a GUC to control the log traffic i.e. log frequently or once
in a while per every 128 recovered WAL files or log nothing.
---
 doc/src/sgml/config.sgml                      | 18 ++++++++
 src/backend/access/transam/xlog.c             | 43 +++++++++++++++++++
 src/backend/access/transam/xlogarchive.c      |  2 +-
 src/backend/access/transam/xlogrecovery.c     | 33 +++++++++++++-
 src/backend/utils/misc/guc.c                  | 11 +++++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     | 11 +++++
 7 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 03986946a8..1d69b34575 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3380,6 +3380,24 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-log-wal-traffic" xreflabel="log_wal_traffic">
+      <term><varname>log_wal_traffic</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>log_wal_traffic</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of WAL processing information written to
+        the server log. Valid values are <literal>none</literal> (which is the
+        default), <literal>medium</literal> and <literal>high</literal>. The
+        value <literal>none</literal> logs no information, <literal>medium</literal>
+        logs messages per each 128 WAL segments processed and <literal>high</literal>
+        logs messages per each WAL segment processed.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
      </sect2>
      <sect2 id="runtime-config-wal-checkpoints">
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61cda56c6f..20495e5e75 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -142,6 +142,7 @@ bool		XLOG_DEBUG = false;
 #endif
 
 int			wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
+int			log_wal_traffic = LOG_WAL_TRAFFIC_NONE;
 
 /*
  * Number of WAL insertion locks to use. A higher value allows more insertions
@@ -198,6 +199,13 @@ const struct config_enum_entry archive_mode_options[] = {
 	{NULL, 0, false}
 };
 
+const struct config_enum_entry log_wal_traffic_options[] = {
+	{"none", LOG_WAL_TRAFFIC_NONE, false},
+	{"medium", LOG_WAL_TRAFFIC_MEDIUM, false},
+	{"high", LOG_WAL_TRAFFIC_HIGH, false},
+	{NULL, 0, false}
+};
+
 /*
  * Statistics for current checkpoint are collected in this global struct.
  * Because only the checkpointer or a stand-alone backend can perform
@@ -8868,3 +8876,38 @@ SetWalWriterSleeping(bool sleeping)
 	XLogCtl->WalWriterSleeping = sleeping;
 	SpinLockRelease(&XLogCtl->info_lck);
 }
+
+/*
+ * Returns true if logging of WAL traffic is allowed, otherwise false.
+ */
+bool
+LogWALTraffic(char *last_logged_xlogfname, char *xlogfname)
+{
+	bool emit_log = false;
+
+	if (log_wal_traffic == LOG_WAL_TRAFFIC_NONE)
+		return false;
+	else if (log_wal_traffic == LOG_WAL_TRAFFIC_MEDIUM)
+	{
+		/* first time, log the messages */
+		if (strlen(last_logged_xlogfname) == 0)
+			emit_log = true;
+		else if (IsXLogFileName(last_logged_xlogfname))
+		{
+			uint32		l_tli;
+			uint32		tli;
+			XLogSegNo	l_segno;
+			XLogSegNo	segno;
+
+			XLogFromFileName(last_logged_xlogfname, &l_tli, &l_segno, wal_segment_size);
+			XLogFromFileName(xlogfname, &tli, &segno, wal_segment_size);
+
+			if ((segno - l_segno) >= LOG_WAL_TRAFFIC_PER_SEGMENTS)
+				emit_log = true;
+		}
+	}
+	else if (log_wal_traffic == LOG_WAL_TRAFFIC_HIGH)
+		return true;
+
+	return emit_log;
+}
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index a2657a2005..b6040f9b9d 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -214,7 +214,7 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			else
 			{
 				ereport(LOG,
-						(errmsg("restored log file \"%s\" from archive",
+						(errmsg("restored WAL segment \"%s\" from archive",
 								xlogfname)));
 				strcpy(path, xlogpath);
 				return true;
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 39ef865ed9..6e95877961 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4047,6 +4047,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 	char		activitymsg[MAXFNAMELEN + 16];
 	char		path[MAXPGPATH];
 	int			fd;
+	static	char last_logged_xlogfname[MAXFNAMELEN] = {'\0'};
 
 	XLogFileName(xlogfname, tli, segno, wal_segment_size);
 
@@ -4058,6 +4059,11 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
+			if (LogWALTraffic(last_logged_xlogfname, xlogfname))
+				ereport(LOG,
+						(errmsg("waiting for WAL segment \"%s\" from archive",
+								xlogfname)));
+
 			if (!RestoreArchivedFile(path, xlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
@@ -4100,6 +4106,32 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
+		if (LogWALTraffic(last_logged_xlogfname, xlogfname))
+		{
+			switch (source)
+			{
+				case XLOG_FROM_ARCHIVE:
+					ereport(LOG,
+							(errmsg("recovering WAL segment \"%s\" from archive",
+									xlogfname)));
+					break;
+				case XLOG_FROM_STREAM:
+					ereport(LOG,
+							(errmsg("recovering WAL segment \"%s\" from stream",
+									xlogfname)));
+					break;
+				case XLOG_FROM_PG_WAL:
+					ereport(LOG,
+							(errmsg("recovering WAL segment \"%s\" from pg_wal",
+									xlogfname)));
+					break;
+				case XLOG_FROM_ANY:
+					break;
+			}
+
+			memcpy(last_logged_xlogfname, xlogfname, MAXFNAMELEN);
+		}
+
 		/* Track source of data in assorted state variables */
 		readSource = source;
 		XLogReceiptSource = source;
@@ -4190,7 +4222,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 							  XLOG_FROM_ARCHIVE, true);
 			if (fd != -1)
 			{
-				elog(DEBUG1, "got WAL segment from archive");
 				if (!expectedTLEs)
 					expectedTLEs = tles;
 				return fd;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8e9b71375c..8cf74d549a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -604,6 +604,7 @@ extern const struct config_enum_entry archive_mode_options[];
 extern const struct config_enum_entry recovery_target_action_options[];
 extern const struct config_enum_entry sync_method_options[];
 extern const struct config_enum_entry dynamic_shared_memory_options[];
+extern const struct config_enum_entry log_wal_traffic_options[];
 
 /*
  * GUC option variables that are exported from this module
@@ -4981,6 +4982,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"log_wal_traffic", PGC_USERSET, WAL_SETTINGS,
+			gettext_noop("Sets the amount of WAL processing information written to the server log."),
+			NULL
+		},
+		&log_wal_traffic,
+		LOG_WAL_TRAFFIC_NONE, log_wal_traffic_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"dynamic_shared_memory_type", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Selects the dynamic shared memory implementation used."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 94270eb0ec..d49d757298 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -231,6 +231,7 @@
 
 #commit_delay = 0			# range 0-100000, in microseconds
 #commit_siblings = 5			# range 1-1000
+#log_wal_traffic = none			# none, medium, or high
 
 # - Checkpoints -
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d9f2487a96..f91c52a93e 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -51,6 +51,7 @@ extern PGDLLIMPORT char *wal_consistency_checking_string;
 extern PGDLLIMPORT bool log_checkpoints;
 extern PGDLLIMPORT bool track_wal_io_timing;
 extern PGDLLIMPORT int wal_decode_buffer_size;
+extern PGDLLIMPORT int log_wal_traffic;
 
 extern PGDLLIMPORT int CheckPointSegments;
 
@@ -88,6 +89,14 @@ typedef enum RecoveryState
 	RECOVERY_STATE_DONE			/* currently in production */
 } RecoveryState;
 
+/* Log WAL traffic states */
+typedef enum LogWalTraffic
+{
+	LOG_WAL_TRAFFIC_NONE = 0,
+	LOG_WAL_TRAFFIC_MEDIUM,
+	LOG_WAL_TRAFFIC_HIGH
+} LogWalTraffic;
+
 extern PGDLLIMPORT int wal_level;
 
 /* Is WAL archiving enabled (always or only while server is running normally)? */
@@ -151,6 +160,7 @@ extern PGDLLIMPORT bool XLOG_DEBUG;
 #define XLOG_INCLUDE_ORIGIN		0x01	/* include the replication origin */
 #define XLOG_MARK_UNIMPORTANT	0x02	/* record not important for durability */
 
+#define LOG_WAL_TRAFFIC_PER_SEGMENTS 128
 
 /* Checkpoint statistics */
 typedef struct CheckpointStatsData
@@ -248,6 +258,7 @@ extern void SetWalWriterSleeping(bool sleeping);
 
 extern void assign_max_wal_size(int newval, void *extra);
 extern void assign_checkpoint_completion_target(double newval, void *extra);
+extern bool LogWALTraffic(char *last_logged_xlogfname, char *xlogfname);
 
 /*
  * Routines used by xlogrecovery.c to call back into xlog.c during recovery.
-- 
2.25.1

#40

alvherre@alvh.no-ip.org

over 3 years ago

In reply to: Bharath Rupireddy (#39)

Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

Did we ever consider the idea of using a new pg_stat_wal_activity_progress
view or something like that, using the backend_progress.c functionality?
I don't see it mentioned in the thread.

It's not an *exact* fit, since this is not about one "command" being
executed by a "backend", but it seems a near enough fit, and it would
unobtrusive enough that we don't need to worry about the progress-update
rate or have a GUC to determine how frequently to update (with a gauge
that's heavily workload-dependant and is likely to make little sense to
some users -- consider differently-sized WAL segments, for one thing).

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#41

bharath.rupireddyforpostgres@gmail.com

over 3 years ago

In reply to: Alvaro Herrera (#40)

Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Did we ever consider the idea of using a new pg_stat_wal_activity_progress
view or something like that, using the backend_progress.c functionality?
I don't see it mentioned in the thread.

IMO, progress reporting works well on a running server and at the
moment. The WAL recovery/replay can happen even before the server
opens up for connections and the progress report view can't be used
for later analysis like how much time the restoring WAL files from
archive location took and also the WAL file names can't be reported in
progress reporting mechanism (only integers columns, of course if
required we can add text columns to pg_stat_get_progress_info). Having
the recovery info in server logs might help.

It's not an *exact* fit, since this is not about one "command" being
executed by a "backend", but it seems a near enough fit, and it would
unobtrusive enough that we don't need to worry about the progress-update
rate or have a GUC to determine how frequently to update (with a gauge
that's heavily workload-dependant and is likely to make little sense to
some users -- consider differently-sized WAL segments, for one thing).

I think reporting a long-running file processing operation (removing
or syncing) within postgres is a generic problem (for snapshot,
mapping, temporary (pgsql_tmp), temp relation files, old WAL file
processing, WAL file processing during recovery etc.) and needs to be
solved in two ways: 1) logging progress into server logs (which helps
for analysis and report when the server isn't available for
connections, crash recovery), a generic GUC
log_file_processing_traffic = {none, medium, high} might help here
(also proposed in [1]) and 2) pg_stat_file_processing_progress
(extending progress reporting pg_stat_get_progress_info to have few
text columns for current file name and directory path).

Thoughts?

Regards,
Bharath Rupireddy.

#42

alvherre@alvh.no-ip.org

over 3 years ago

In reply to: Bharath Rupireddy (#41)

Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

On 2022-May-05, Bharath Rupireddy wrote:

On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Did we ever consider the idea of using a new pg_stat_wal_activity_progress
view or something like that, using the backend_progress.c functionality?
I don't see it mentioned in the thread.

IMO, progress reporting works well on a running server and at the
moment. The WAL recovery/replay can happen even before the server
opens up for connections

It's definitely true that you wouldn't be able to use it for when the
server is not accepting connections.

and the progress report view can't be used
for later analysis like how much time the restoring WAL files from
archive location took

This is true too -- progress report doesn't store historical data, only
current status.

and also the WAL file names can't be reported in progress reporting
mechanism

Also true.

(only integers columns, of course if required we can add text columns
to pg_stat_get_progress_info).

Yeah, I don't think adding text columns is terribly easy, because the
whole point of the progress reporting infrastructure is that it can be
updated very cheaply as atomic operations, and if you want to transmit
text columns, that's no longer possible.

Having the recovery info in server logs might help.

I suppose it might.

I think reporting a long-running file processing operation (removing
or syncing) within postgres is a generic problem (for snapshot,
mapping, temporary (pgsql_tmp), temp relation files, old WAL file
processing, WAL file processing during recovery etc.) and needs to be
solved

I agree up to here.

in two ways: 1) logging progress into server logs (which helps
for analysis and report when the server isn't available for
connections, crash recovery), a generic GUC
log_file_processing_traffic = {none, medium, high} might help here
(also proposed in [1]) and 2) pg_stat_file_processing_progress
(extending progress reporting pg_stat_get_progress_info to have few
text columns for current file name and directory path).

I think using the server log to store telemetry data is not a great fit.
It bloats the log files and can be so slow as to block other operations
in the server. Server logs should normally be considered critical info
that's not okay to lose; telemetry tends to be of secondary importance
and in a pinch you can drop a few messages without hurting too much.

We've done moderately okay so far with having some system views where
some telemetry readings can be obtained, but there several drawbacks to
that approach that we should at some point solve. My opinion on this is
that we need to bite the bullet and develop separate infrastructure for
reporting server metrics.

That said, I'm not opposed to having a patch somewhat as posted. I just
think that we should look into a new mechanism going forward.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#43

[1]: /messages/by-id/CALj2ACW-ELOF5JT2zPavs95wbZ0BrLPrqvSZ7Ac+pjxCkmXtEQ@mail.gmail.com

bharath.rupireddyforpostgres@gmail.com

over 3 years ago

In reply to: Alvaro Herrera (#42)

Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

On Thu, May 5, 2022 at 2:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-May-05, Bharath Rupireddy wrote:

On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Did we ever consider the idea of using a new pg_stat_wal_activity_progress
view or something like that, using the backend_progress.c functionality?
I don't see it mentioned in the thread.

IMO, progress reporting works well on a running server and at the
moment. The WAL recovery/replay can happen even before the server
opens up for connections

It's definitely true that you wouldn't be able to use it for when the
server is not accepting connections.

and the progress report view can't be used
for later analysis like how much time the restoring WAL files from
archive location took

This is true too -- progress report doesn't store historical data, only
current status.

and also the WAL file names can't be reported in progress reporting
mechanism

Also true.

(only integers columns, of course if required we can add text columns
to pg_stat_get_progress_info).

Yeah, I don't think adding text columns is terribly easy, because the
whole point of the progress reporting infrastructure is that it can be
updated very cheaply as atomic operations, and if you want to transmit
text columns, that's no longer possible.

Having the recovery info in server logs might help.

I suppose it might.

I think reporting a long-running file processing operation (removing
or syncing) within postgres is a generic problem (for snapshot,
mapping, temporary (pgsql_tmp), temp relation files, old WAL file
processing, WAL file processing during recovery etc.) and needs to be
solved

I agree up to here.

in two ways: 1) logging progress into server logs (which helps
for analysis and report when the server isn't available for
connections, crash recovery), a generic GUC
log_file_processing_traffic = {none, medium, high} might help here
(also proposed in [1]) and 2) pg_stat_file_processing_progress
(extending progress reporting pg_stat_get_progress_info to have few
text columns for current file name and directory path).

I think using the server log to store telemetry data is not a great fit.
It bloats the log files and can be so slow as to block other operations
in the server. Server logs should normally be considered critical info
that's not okay to lose; telemetry tends to be of secondary importance
and in a pinch you can drop a few messages without hurting too much.

We've done moderately okay so far with having some system views where
some telemetry readings can be obtained, but there several drawbacks to
that approach that we should at some point solve. My opinion on this is
that we need to bite the bullet and develop separate infrastructure for
reporting server metrics.

I just
think that we should look into a new mechanism going forward.

I completely agree that we must have new ways to report important
server metrics in an easily consumable fashion - it could be something
like server computing metrics (time taken, the file being processed
etc.) important/time-taking operations such as execution of archive
command, restore command, recovery, checkpoint, old WAL files
removal/recycling, processing of various files - snapshot, mapping,
pgsql_tmp files, temp relation files etc. and writing these metrics to
a file or a stat table on the database itself (this table can have an
automatic mechanism of clean up or limit on number of rows or size of
the table so that the clients can read those metrics and use it in
whichever way they want. To start with, a simple structure of the
metrics can be {OPERATION text, START_TIME timestamp, END_TIME
timestamp, few text, double and/or integer flexible columns that each
operation can use to store metrics}.

I can start a discussion in a separate thread to seek more thoughts on
the server metrics part.

That said, I'm not opposed to having a patch somewhat as posted.

Thanks. I'm thinking if we should have a generic GUC
log_file_processing_traffic = {none, medium, high} that might help
reporting other time taking file processing operations such as [1]/messages/by-id/CALj2ACW-ELOF5JT2zPavs95wbZ0BrLPrqvSZ7Ac+pjxCkmXtEQ@mail.gmail.com.

Regards,
Bharath Rupireddy.

#44

Robert Haas

robertmhaas@gmail.com

over 3 years ago

In reply to: Bharath Rupireddy (#39)

Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

On Fri, Apr 29, 2022 at 5:11 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Here's the rebased v9 patch.

This seems like it has enormous overlap with the existing
functionality that we have from log_startup_progress_interval.

I think that facility is also better-designed than this one. It prints
out a message based on elapsed time, whereas this patch prints out a
message based progress through the WAL. That means that if WAL replay
isn't actually advancing for some reason, you just won't get any log
messages and you don't know whether it's advancing slowly or not at
all or the server is just hung. With that facility you can distinguish
those cases.

Also, if for some reason we do think that amount of WAL replayed is
the right metric, rather than time, why would we only allow high=1
segment and low=128 segments, rather than say any number of MB or GB
that the user would like to configure?

I suggest that if log_startup_progress_interval doesn't meet your
needs here, we should try to understand why not and maybe enhance it,
instead of adding a separate facility.

--
Robert Haas
EDB: http://www.enterprisedb.com

#45

[1]: /messages/by-id/CALj2ACW-ELOF5JT2zPavs95wbZ0BrLPrqvSZ7Ac+pjxCkmXtEQ@mail.gmail.com

bharath.rupireddyforpostgres@gmail.com

over 3 years ago

In reply to: Robert Haas (#44)

Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

On Fri, May 13, 2022 at 6:41 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Apr 29, 2022 at 5:11 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Here's the rebased v9 patch.

This seems like it has enormous overlap with the existing
functionality that we have from log_startup_progress_interval.

I think that facility is also better-designed than this one. It prints
out a message based on elapsed time, whereas this patch prints out a
message based progress through the WAL. That means that if WAL replay
isn't actually advancing for some reason, you just won't get any log
messages and you don't know whether it's advancing slowly or not at
all or the server is just hung. With that facility you can distinguish
those cases.

Also, if for some reason we do think that amount of WAL replayed is
the right metric, rather than time, why would we only allow high=1
segment and low=128 segments, rather than say any number of MB or GB
that the user would like to configure?

I suggest that if log_startup_progress_interval doesn't meet your
needs here, we should try to understand why not and maybe enhance it,
instead of adding a separate facility.

Thanks Robert!

In a production environment (of course with a better management of
server logs) one can set log_wal_traffic to "high" and emit the
required info to answer some of the customer questions like - "How far
the server is in recovery? How much time recovery of each WAL file
approximately took? How much time will it take to recover all the WAL
files? What's the rate of recovery - time per WAL file? etc."

Whereas ereport_startup_progress facility will help to emit log
messages only if "some" operation takes longer than set
log_startup_progress_interval time which may not serve the above
purpose.

Actually, IMO a generic GUC log_file_processing_traffic = {none,
medium, high} to help server emit logs for all the critical file
processing operations - WAL file recovery (as proposed here in this
thread), temp file processing during server startup or restarts
(log_startup_progress_interval can't be used here as postmaster
doesn't register for any timeouts) [1]/messages/by-id/CALj2ACW-ELOF5JT2zPavs95wbZ0BrLPrqvSZ7Ac+pjxCkmXtEQ@mail.gmail.com, snapshot and mapping file
processing during checkpoint, temp relation files, removing old WAL
files and so on.

Thoughts?

Regards,
Bharath Rupireddy.

#46

Robert Haas

robertmhaas@gmail.com

over 3 years ago

In reply to: Bharath Rupireddy (#45)

Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

On Tue, May 17, 2022 at 1:32 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

In a production environment (of course with a better management of
server logs) one can set log_wal_traffic to "high" and emit the
required info to answer some of the customer questions like - "How far
the server is in recovery? How much time recovery of each WAL file
approximately took? How much time will it take to recover all the WAL
files? What's the rate of recovery - time per WAL file? etc."

Whereas ereport_startup_progress facility will help to emit log
messages only if "some" operation takes longer than set
log_startup_progress_interval time which may not serve the above
purpose.

I think you are misunderstanding what an "operation" is here. If you
take the time to test out that facility, I think you will find that it
is quite capable of answering all of those questions.

--
Robert Haas
EDB: http://www.enterprisedb.com

#47

Nathan Bossart

nathandbossart@gmail.com

over 3 years ago

In reply to: Robert Haas (#44)

Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

On Fri, May 13, 2022 at 09:10:52AM -0400, Robert Haas wrote:

I suggest that if log_startup_progress_interval doesn't meet your
needs here, we should try to understand why not and maybe enhance it,
instead of adding a separate facility.

+1. AFAICT it should be possible to make the log_startup_progress_interval
machinery extensible enough to be reused for several other tasks that can
take a while but have little visibility.

Since this thread has been inactive for over 2 months, I'm marking the
commitfest entry as Waiting on Author.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#48