add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display
Hi,
As discussed in [1]/messages/by-id/CALj2ACWE6D+bbKnA9tQ0d-pPN1n4xrsFPOmiURm2my3qt0w3OA@mail.gmail.com, isn't it a better idea to add some of activity
messages [2]snprintf(activitymsg, sizeof(activitymsg), "waiting for %s", snprintf(activitymsg, sizeof(activitymsg), "recovering %s", snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s", snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog); snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog); elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog); snprintf(activitymsg, sizeof(activitymsg), "sending backup \"%s\"", snprintf(activitymsg, sizeof(activitymsg), "restarting at %X/%X", snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X", snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X", such as recovery, archive, backup, streaming etc. to
server logs at LOG level? They are currently being set into ps display
which is good if the postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
the postgres is being run on a cloud environment where users don't
have access to ps display output. Moreover, the ps display is
transient and will not help to analyze after an issue occurs.
Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.
Thoughts?
[1]: /messages/by-id/CALj2ACWE6D+bbKnA9tQ0d-pPN1n4xrsFPOmiURm2my3qt0w3OA@mail.gmail.com
[2]: snprintf(activitymsg, sizeof(activitymsg), "waiting for %s", snprintf(activitymsg, sizeof(activitymsg), "recovering %s", snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s", snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog); snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog); elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog); snprintf(activitymsg, sizeof(activitymsg), "sending backup \"%s\"", snprintf(activitymsg, sizeof(activitymsg), "restarting at %X/%X", snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X", snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
snprintf(activitymsg, sizeof(activitymsg), "recovering %s",
snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
snprintf(activitymsg, sizeof(activitymsg), "sending backup \"%s\"",
snprintf(activitymsg, sizeof(activitymsg), "restarting at %X/%X",
snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
Regards,
Bharath Rupireddy.
On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level? They are currently being set into ps display
which is good if the postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
the postgres is being run on a cloud environment where users don't
have access to ps display output. Moreover, the ps display is
transient and will not help to analyze after an issue occurs.Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.
I think this would make the logs far too noisy for many servers. For
archiving alone, this could cause tens of thousands more log messages
per hour on a busy system. I think you can already see such
information at a debug level, anyway.
Nathan
On Wed, Nov 10, 2021, at 2:38 PM, Bharath Rupireddy wrote:
As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level? They are currently being set into ps display
which is good if the postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
the postgres is being run on a cloud environment where users don't
have access to ps display output. Moreover, the ps display is
transient and will not help to analyze after an issue occurs.
Besides recovery, the other activities already provide information through
views. I might be missing something but it seems the current views already
provide the information that ps displays.
Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.
It could be useful for a root cause analysis, however, you are also increasing
the number of LOG messages in a system that doesn't/won't require such
information. It is fine to add additional DEBUG messages if there isn't a
similar one yet. If at least the message level were module-controlled, you
could modify a setting to gather more messages from a specific module.
Unfortunately, that's not possible so we should avoid superfluous messages.
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Thu, Nov 11, 2021 at 12:01 AM Bossart, Nathan <bossartn@amazon.com> wrote:
On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level? They are currently being set into ps display
which is good if the postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
the postgres is being run on a cloud environment where users don't
have access to ps display output. Moreover, the ps display is
transient and will not help to analyze after an issue occurs.Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.I think this would make the logs far too noisy for many servers. For
archiving alone, this could cause tens of thousands more log messages
per hour on a busy system. I think you can already see such
information at a debug level, anyway.
You are right, having them at LOG level may blow up the log files and
disk storage, having them at DEBUG1 like other messages would at least
help than having them as activitymsg in ps display.
And not all of the activitymsgs are logged at DEBUGX level, at least I
don't see them(on a quick search).
snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
snprintf(activitymsg, sizeof(activitymsg), "recovering %s",
snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
snprintf(activitymsg, sizeof(activitymsg), "sending backup \"%s\"",
snprintf(activitymsg, sizeof(activitymsg), "restarting at %X/%X",
snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
But for the snprintf(activitymsg, sizeof(activitymsg), "archiving %s",
xlog); we have elog(DEBUG1, "archived write-ahead log file \"%s\"",
xlog); after the archiving command. It is also good to have a similar
debug message before archive command execution, we can get to know
whether archive command is executed or not, if yes how much time did
it take etc.?
I'm not sure whether it is okay to do the following, for every of the
activitymsg, before the command execution, we have an elog(DEBUG1
message and after the command execution another elog(DEBUG1 message.
Thoughts?
Regards,
Bharath Rupireddy.
On Thu, Nov 11, 2021 at 1:40 AM Euler Taveira <euler@eulerto.com> wrote:
On Wed, Nov 10, 2021, at 2:38 PM, Bharath Rupireddy wrote:
As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level? They are currently being set into ps display
which is good if the postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
the postgres is being run on a cloud environment where users don't
have access to ps display output. Moreover, the ps display is
transient and will not help to analyze after an issue occurs.Besides recovery, the other activities already provide information through
views. I might be missing something but it seems the current views already
provide the information that ps displays.Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.It could be useful for a root cause analysis, however, you are also increasing
the number of LOG messages in a system that doesn't/won't require such
information. It is fine to add additional DEBUG messages if there isn't a
similar one yet. If at least the message level were module-controlled, you
could modify a setting to gather more messages from a specific module.
Unfortunately, that's not possible so we should avoid superfluous messages.
Yes, at least having the messages at DEBUG1 level would help do the
RCA after an issue occurs. Please see my response upthread at [1]/messages/by-id/CALj2ACXVwnHhek9bo9wgamtW9SBinj4ohF1f1Yf3Su2o5WarWg@mail.gmail.com.
[1]: /messages/by-id/CALj2ACXVwnHhek9bo9wgamtW9SBinj4ohF1f1Yf3Su2o5WarWg@mail.gmail.com
Regards,
Bharath Rupireddy.
On 2021-Nov-10, Bossart, Nathan wrote:
On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level?
I think this would make the logs far too noisy for many servers. For
archiving alone, this could cause tens of thousands more log messages
per hour on a busy system. I think you can already see such
information at a debug level, anyway.
Yeah. If we had some sort of ring buffer in which to store these
messages, the user could examine them through a view; they would still
be accessible in a running server, but they would not be written to the
server log.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On 11/10/21, 9:51 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
But for the snprintf(activitymsg, sizeof(activitymsg), "archiving %s",
xlog); we have elog(DEBUG1, "archived write-ahead log file \"%s\"",
xlog); after the archiving command. It is also good to have a similar
debug message before archive command execution, we can get to know
whether archive command is executed or not, if yes how much time did
it take etc.?
There is a DEBUG3 that emits "executing archive command..." prior to
executing the archive command.
I'm not sure whether it is okay to do the following, for every of the
activitymsg, before the command execution, we have an elog(DEBUG1
message and after the command execution another elog(DEBUG1 message.
The general idea seems reasonable to me.
Nathan
On 11/11/21, 4:38 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
Yeah. If we had some sort of ring buffer in which to store these
messages, the user could examine them through a view; they would still
be accessible in a running server, but they would not be written to the
server log.
That's an interesting idea. It'd be nice to have a way to see more
debugging information without exploding the server log.
Nathan
On Thu, Nov 11, 2021, at 2:50 AM, Bharath Rupireddy wrote:
But for the snprintf(activitymsg, sizeof(activitymsg), "archiving %s",
xlog); we have elog(DEBUG1, "archived write-ahead log file \"%s\"",
xlog); after the archiving command. It is also good to have a similar
debug message before archive command execution, we can get to know
whether archive command is executed or not, if yes how much time did
it take etc.?
If you are only interested in how much time it took to archive a WAL file, you
don't need 2 messages (before and after); one message is sufficient. If you
calculate the difference and then print it as part of the existent message.
Calculate the difference iif the log level is at least DEBUG1 (same as the log
message).
Messages related to system() e.g. archive and recovery, you could measure the
execution time in your own script/program.
I like Alvaro's idea of implementing a ring buffer for this kind of activity.
This implementation could be applied to checkpoints, archiving, recovery,
replication, base backup, WAL usage, connections and others.
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Thu, Nov 11, 2021 at 03:18:16PM +0000, Bossart, Nathan wrote:
On 11/11/21, 4:38 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
Yeah. If we had some sort of ring buffer in which to store these
messages, the user could examine them through a view; they would still
be accessible in a running server, but they would not be written to the
server log.That's an interesting idea. It'd be nice to have a way to see more
debugging information without exploding the server log.
Uh, that's what dmesg does, which I mentioned earlier.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Thu, Nov 11, 2021 at 6:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:
On 2021-Nov-10, Bossart, Nathan wrote:
On 11/10/21, 9:43 AM, "Bharath Rupireddy" <
bharath.rupireddyforpostgres@gmail.com> wrote:
As discussed in [1], isn't it a better idea to add some of activity
messages [2] such as recovery, archive, backup, streaming etc. to
server logs at LOG level?I think this would make the logs far too noisy for many servers. For
archiving alone, this could cause tens of thousands more log messages
per hour on a busy system. I think you can already see such
information at a debug level, anyway.Yeah. If we had some sort of ring buffer in which to store these
messages, the user could examine them through a view; they would still
be accessible in a running server, but they would not be written to the
server log.
That's a good idea. How about also adding some GUC(s) to the log archive,
recovery related log messages just like we have for checkpoints, autovacuum
etc? Maybe something like log_archive, log_recovery etc.
--
With Regards,
Ashutosh Sharma.
On Thu, Nov 11, 2021 at 8:41 PM Bossart, Nathan <bossartn@amazon.com> wrote:
I'm not sure whether it is okay to do the following, for every of the
activitymsg, before the command execution, we have an elog(DEBUG1
message and after the command execution another elog(DEBUG1 message.The general idea seems reasonable to me.
Here's a simple patch that I could come up with. Please have a look.
Regards,
Bharath Rupireddy.
Attachments:
v1-0001-add-log-messages-that-are-equivalent-of-ps-displa.patchapplication/octet-stream; name=v1-0001-add-log-messages-that-are-equivalent-of-ps-displa.patchDownload+34-25
A bunch of these now execute snprintf()s unnecessarily. I think these
should be made conditional on message_level_is_interesting(DEBUG1) to
avoid that overhead.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick." (Andrew Sullivan)
On Sat, Nov 13, 2021 at 9:04 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:
A bunch of these now execute snprintf()s unnecessarily. I think these
should be made conditional on message_level_is_interesting(DEBUG1) to
avoid that overhead.
Thanks. Attaching the v2 to avoid that by directly using the message in
ereport instead of activitymsg.
Regards,
Bharath Rupireddy.
Attachments:
v2-0001-add-log-messages-equivalents-of-ps-display-activi.patchapplication/x-patch; name=v2-0001-add-log-messages-equivalents-of-ps-display-activi.patchDownload+25-5
On 11/14/21, 4:18 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks. Attaching the v2 to avoid that by directly using the message
in ereport instead of activitymsg.
+ ereport(DEBUG1,
+ (errmsg_internal("waiting for %s", xlogfname)));
As a general comment, I think we should include more detail than what
goes in the ps title. While we probably want to keep the ps title,
brief, I don't see any issue with including a bit more context in
debug statements.
+ ereport(DEBUG1,
+ (errmsg_internal("executing archive command \"%s\" for archving %s",
+ xlogarchcmd, xlog)));
I wonder if it's necessary to include the "for archiving" part in this
one. Typically, the xlogarchcmd will include it somewhere.
Nathan
On Mon, Nov 15, 2021 at 10:36:09PM +0000, Bossart, Nathan wrote:
+ ereport(DEBUG1, + (errmsg_internal("executing archive command \"%s\" for archving %s", + xlogarchcmd, xlog)));I wonder if it's necessary to include the "for archiving" part in this
one. Typically, the xlogarchcmd will include it somewhere.
Don't see any need to add that either (s/archving/archiving/, btw).
@@ -1000,6 +1004,10 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
+ ereport(DEBUG1,
+ (errmsg_internal("streaming %X/%X",
+ LSN_FORMAT_ARGS(LogstreamResult.Write))));
Having the same wording for the WAL receiver and the WAL sender could
be confusing when it comes to a cascading standby node, particularly
if log_line_prefix does not include the process PID.
+ ereport(DEBUG1,
+ (errmsg_internal("streaming %X/%X",
+ LSN_FORMAT_ARGS(sentPtr))));
Anyway, those two ones are going to make the logs much more noisy, no?
The same could be said about XLogFileRead(), joining the point of
Nathan's upthread. So I cannot get excited by this change.
--
Michael
On 11/15/21, 7:14 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
+ ereport(DEBUG1, + (errmsg_internal("streaming %X/%X", + LSN_FORMAT_ARGS(sentPtr)))); Anyway, those two ones are going to make the logs much more noisy, no? The same could be said about XLogFileRead(), joining the point of Nathan's upthread. So I cannot get excited by this change.
Yeah, this might even be too noisy for DEBUG5.
Nathan
On 2021-Nov-16, Bossart, Nathan wrote:
On 11/15/21, 7:14 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
+ ereport(DEBUG1, + (errmsg_internal("streaming %X/%X", + LSN_FORMAT_ARGS(sentPtr)))); Anyway, those two ones are going to make the logs much more noisy, no? The same could be said about XLogFileRead(), joining the point of Nathan's upthread. So I cannot get excited by this change.Yeah, this might even be too noisy for DEBUG5.
Nod. And if you're at DEBUG5, the log contains so much other crap that
it is unusable for any purposes anyway.
My opinion is that adding these things willy-nilly is not a solution to
any actual problem. Adding a few additional log lines that are
low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
not good for anything other than specific development, IMO. At least
this particular one for streaming replication I think we should not
include.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Tue, Nov 16, 2021 at 01:26:49PM -0300, Alvaro Herrera wrote:
My opinion is that adding these things willy-nilly is not a solution to
any actual problem. Adding a few additional log lines that are
low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
not good for anything other than specific development, IMO. At least
this particular one for streaming replication I think we should not
include.
Looking at v2, I think that this leaves the additions of the DEBUG1
entries in SendBaseBackup() and WalRcvWaitForStartPosition(), then.
The one in pgarch.c does not provide any additional information as the
segment to-be-archived should be part of the command.
--
Michael
On Wed, Nov 17, 2021 at 8:01 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Nov 16, 2021 at 01:26:49PM -0300, Alvaro Herrera wrote:
My opinion is that adding these things willy-nilly is not a solution to
any actual problem. Adding a few additional log lines that are
low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
not good for anything other than specific development, IMO. At least
this particular one for streaming replication I think we should not
include.Looking at v2, I think that this leaves the additions of the DEBUG1
entries in SendBaseBackup() and WalRcvWaitForStartPosition(), then.
The one in pgarch.c does not provide any additional information as the
segment to-be-archived should be part of the command.
Thank you all for the inputs. Here's the patch that I've come up with.
Upon thinking further, having at least the messages at LOG level [1]ereport(LOG, (errmsg("waiting for WAL segment \"%s\" from archive", xlogfname)));
would be helpful to know what's happening with the system while in
recovery. Although these messages at LOG level seem to be filling up
the server logs, having a good log consumption and rotation mechanism
(I'm sure every major postgres vendor would have one) would be
sufficient to allay that concern.
These LOG messages would help us know how much time a restore command
takes to fetch the WAL file and what is the current WAL file the
server is recovering and where is it recovering from. The customer
often asks questions like: when will my server come up? how much time
does the recovery of my server take?
As a developer or admin, one can monitor these logs and do bunch of things:
1) see how many WAL files left to be recovered by looking at the WAL
files in the archive location or pg_wal directory or from primary
2) provide an approximate estimation of when the server will come up
or how much more the recovery takes by looking at these previous LOG
messages, one can know the number of WAL files that server recovered
over a minute and with the help of left-over WAL files calculated from
(1).
[1]: ereport(LOG, (errmsg("waiting for WAL segment \"%s\" from archive", xlogfname)));
ereport(LOG,
(errmsg("waiting for WAL segment \"%s\" from archive",
xlogfname)));
ereport(LOG,
(errmsg("restored WAL segment \"%s\" from archive",
xlogfname)));
ereport(LOG,
(errmsg("recovering WAL segment \"%s\" from source \"%s\"",
xlogfname, srcname)));
Regards,
Bharath Rupireddy.