please update ps display for recovery checkpoint

Started by Justin Pryzbyover 5 years ago30 messages
#1Justin Pryzby
Justin Pryzby
pryzby@telsasoft.com
3 attachment(s)

During crash recovery, the server writes this to log:

< 2020-08-16 08:46:08.601 -03 >LOG: redo done at 2299C/1EC6BA00
< 2020-08-16 08:46:08.877 -03 >LOG: checkpoint starting: end-of-recovery immediate

But runs a checkpoint, which can take a long time, while the "ps" display still
says "recovering NNNNNNNN".

Please change to say "recovery checkpoint" or similar, as I mentioned here.
/messages/by-id/20200118201111.GP26045@telsasoft.com

--
Justin

Attachments:

0001-Update-PS-display-following-replay-of-last-xlog.patchtext/x-diff; charset=us-ascii
0002-Document-that-checkpoint_flush_after-applies-to-end-.patchtext/x-diff; charset=us-ascii
0003-Avoid-ambiguous-that.patchtext/x-diff; charset=us-ascii
#2k.jamison@fujitsu.com
k.jamison@fujitsu.com
k.jamison@fujitsu.com
In reply to: Justin Pryzby (#1)
RE: please update ps display for recovery checkpoint

On Wednesday, August 19, 2020 7:53 AM (GMT+9), Justin Pryzby wrote:

Hi,

All the patches apply, although when applying them the following appears:
(Stripping trailing CRs from patch; use --binary to disable.)

During crash recovery, the server writes this to log:

< 2020-08-16 08:46:08.601 -03 >LOG: redo done at 2299C/1EC6BA00 <
2020-08-16 08:46:08.877 -03 >LOG: checkpoint starting: end-of-recovery
immediate

But runs a checkpoint, which can take a long time, while the "ps" display still says
"recovering NNNNNNNN".

Please change to say "recovery checkpoint" or similar, as I mentioned here.
/messages/by-id/20200118201111.GP26045@telsasoft.c
om

Yes, I agree that it is helpful to tell users about that.

About 0003 patch, there are similar phrases in bgwriter_flush_after and
backend_flush_after. Should those be updated too?

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3170,7 +3170,7 @@ include_dir 'conf.d'
         limit the amount of dirty data in the kernel's page cache, reducing
         the likelihood of stalls when an <function>fsync</function> is issued at the end of the
         checkpoint, or when the OS writes data back in larger batches in the
-        background.  Often that will result in greatly reduced transaction
+        background.  This feature will often result in greatly reduced transaction
         latency, but there also are some cases, especially with workloads
         that are bigger than <xref linkend="guc-shared-buffers"/>, but smaller
         than the OS's page cache, where performance might degrade.  This

Regards,
Kirk Jamison

#3Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: k.jamison@fujitsu.com (#2)
Re: please update ps display for recovery checkpoint

On Wed, Aug 19, 2020 at 12:20:50AM +0000, k.jamison@fujitsu.com wrote:

On Wednesday, August 19, 2020 7:53 AM (GMT+9), Justin Pryzby wrote:

During crash recovery, the server writes this to log:
Please change to say "recovery checkpoint" or similar, as I mentioned here.
/messages/by-id/20200118201111.GP26045@telsasoft.c
om

Yes, I agree that it is helpful to tell users about that.

That could be helpful. Wouldn't it be better to use "end-of-recovery
checkpoint" instead? That's the common wording in the code comments.

I don't see the point of patch 0002. In the same paragraph, we
already know that this applies to any checkpoints.

About 0003 patch, there are similar phrases in bgwriter_flush_after and
backend_flush_after. Should those be updated too?

Yep, makes sense.
--
Michael

#4Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#3)
Re: please update ps display for recovery checkpoint

On Thu, Aug 20, 2020 at 05:09:05PM +0900, Michael Paquier wrote:

That could be helpful. Wouldn't it be better to use "end-of-recovery
checkpoint" instead? That's the common wording in the code comments.

I don't see the point of patch 0002. In the same paragraph, we
already know that this applies to any checkpoints.

Thinking more about this.. Could it be better to just add some calls
to set_ps_display() directly in CreateCheckPoint()? This way, both
the checkpointer as well as the startup process at the end of recovery
would benefit from the change.
--
Michael

#5Justin Pryzby
Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#4)
Re: please update ps display for recovery checkpoint

On Mon, Aug 31, 2020 at 03:52:44PM +0900, Michael Paquier wrote:

On Thu, Aug 20, 2020 at 05:09:05PM +0900, Michael Paquier wrote:

That could be helpful. Wouldn't it be better to use "end-of-recovery
checkpoint" instead? That's the common wording in the code comments.

I don't see the point of patch 0002. In the same paragraph, we
already know that this applies to any checkpoints.

Thinking more about this.. Could it be better to just add some calls
to set_ps_display() directly in CreateCheckPoint()? This way, both
the checkpointer as well as the startup process at the end of recovery
would benefit from the change.

What would you want the checkpointer's ps to say ?

Normally it just says:
postgres 3468 3151 0 Aug27 ? 00:20:57 postgres: checkpointer

Or do you mean do the same thing as now, but one layer lower, like:

@@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags)
+       if (flags & CHECKPOINT_END_OF_RECOVERY)
+               set_ps_display("recovery checkpoint");

--
Justin

#6Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#5)
Re: please update ps display for recovery checkpoint

On Wed, Sep 09, 2020 at 09:00:50PM -0500, Justin Pryzby wrote:

What would you want the checkpointer's ps to say ?

Normally it just says:
postgres 3468 3151 0 Aug27 ? 00:20:57 postgres: checkpointer

Note that CreateCheckPoint() can also be called from the startup
process if the bgwriter has not been launched once recovery finishes.

Or do you mean do the same thing as now, but one layer lower, like:

@@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags)
+       if (flags & CHECKPOINT_END_OF_RECOVERY)
+               set_ps_display("recovery checkpoint");

For the use-case discussed here, that would be fine. Now the
difficult point is how much information we can actually display
without bloating ps while still have something meaningful. Showing
all the information from LogCheckpointStart() would bloat the output a
lot for example. So, thinking about that, my take would be to have ps
display the following at the beginning of CreateCheckpoint() and
CreateRestartPoint():
- restartpoint or checkpoint
- shutdown
- end-of-recovery

The output also needs to be cleared once the routines finish or if
there is a skip, of course.
--
Michael

#7Justin Pryzby
Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#6)
4 attachment(s)
Re: please update ps display for recovery checkpoint

On Thu, Sep 10, 2020 at 01:37:10PM +0900, Michael Paquier wrote:

On Wed, Sep 09, 2020 at 09:00:50PM -0500, Justin Pryzby wrote:

What would you want the checkpointer's ps to say ?

Normally it just says:
postgres 3468 3151 0 Aug27 ? 00:20:57 postgres: checkpointer

Note that CreateCheckPoint() can also be called from the startup
process if the bgwriter has not been launched once recovery finishes.

Or do you mean do the same thing as now, but one layer lower, like:

@@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags)
+       if (flags & CHECKPOINT_END_OF_RECOVERY)
+               set_ps_display("recovery checkpoint");

For the use-case discussed here, that would be fine. Now the
difficult point is how much information we can actually display
without bloating ps while still have something meaningful. Showing
all the information from LogCheckpointStart() would bloat the output a
lot for example. So, thinking about that, my take would be to have ps
display the following at the beginning of CreateCheckpoint() and
CreateRestartPoint():
- restartpoint or checkpoint
- shutdown
- end-of-recovery

The output also needs to be cleared once the routines finish or if
there is a skip, of course.

In my inital request, I *only* care about the startup process' recovery
checkpoint. AFAIK, this exits when it's done, so there may be no need to
"revert" to the previous "ps". However, one could argue that it's currently a
bug that the "recovering NNN" portion isn't updated after finishing the WAL
files.

StartupXLOG -> xlogreader -> XLogPageRead -> WaitForWALToBecomeAvailable -> XLogFileReadAnyTLI -> XLogFileRead
-> CreateCheckPoint

Maybe it's a bad idea if the checkpointer is continuously changing its display.
I don't see the utility in it, since log_checkpoints does more than ps could
ever do. I'm concerned that would break things for someone using something
like pgrep.
|$ ps -wwf `pgrep -f 'checkpointer *$'`
|UID PID PPID C STIME TTY STAT TIME CMD
|postgres 9434 9418 0 Aug20 ? Ss 214:25 postgres: checkpointer

|pryzbyj 23010 23007 0 10:33 ? 00:00:00 postgres: checkpointer checkpoint

I think this one is by far the most common, but somewhat confusing, since it's
only one word. This led me to put parenthesis around it:

|pryzbyj 26810 26809 82 10:53 ? 00:00:12 postgres: startup (end-of-recovery checkpoint)

Related: I have always thought that this message meant "recovery will complete
Real Soon", but I now understand it to mean "beginning the recovery checkpoint,
which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time).

|2020-09-19 10:53:26.345 CDT [26810] LOG: checkpoint starting: end-of-recovery immediate

--
Justin

Attachments:

v2-0001-Clear-PS-display-after-recovery.patchtext/x-diff; charset=us-ascii
v2-0002-Update-PS-display-following-replay-of-last-xlog.patchtext/x-diff; charset=us-ascii
v2-0003-Avoid-ambiguous-that.patchtext/x-diff; charset=us-ascii
v2-0004-Document-that-checkpoint_flush_after-applies-to-e.patchtext/x-diff; charset=us-ascii
#8Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#7)
1 attachment(s)
Re: please update ps display for recovery checkpoint

On Sat, Sep 19, 2020 at 11:00:31AM -0500, Justin Pryzby wrote:

Maybe it's a bad idea if the checkpointer is continuously changing its display.
I don't see the utility in it, since log_checkpoints does more than ps could
ever do. I'm concerned that would break things for someone using something
like pgrep.

At the end of recovery, there is a code path where the startup process
triggers the checkpoint by itself if the bgwriter is not launched, but
there is also a second code path where, if the bgwriter is started and
if the cluster not promoted, the startup process would request for an
immediate checkpoint and then wait for it. It is IMO equally
important to update the display of the checkpointer in this case to
show that the checkpointer is running an end-of-recovery checkpoint.

Related: I have always thought that this message meant "recovery will complete
Real Soon", but I now understand it to mean "beginning the recovery checkpoint,
which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time).

Yep. And at the end of crash recovery seconds feel like minutes.

I agree that "checkpointer checkpoint" is not the best fit. Using
parenthesis would also be inconsistent with the other usages of this
API in the backend code. What about adding "running" then? This
would give "checkpointer running end-of-recovery checkpoint".

While looking at this patch, I got tempted to use a StringInfo to fill
in the string to display as that would make the addition of any extra
information easier, giving the attached.
--
Michael

Attachments:

checkpoint-ps-display-v3.patchtext/x-diff; charset=us-ascii
#9Justin Pryzby
Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#8)
Re: please update ps display for recovery checkpoint

On Fri, Oct 02, 2020 at 04:28:14PM +0900, Michael Paquier wrote:

Related: I have always thought that this message meant "recovery will complete
Real Soon", but I now understand it to mean "beginning the recovery checkpoint,
which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time).

Yep. And at the end of crash recovery seconds feel like minutes.

I agree that "checkpointer checkpoint" is not the best fit. Using
parenthesis would also be inconsistent with the other usages of this
API in the backend code.

I think maybe I got the idea for parenthesis from these:
src/backend/tcop/postgres.c: set_ps_display("idle in transaction (aborted)");
src/backend/postmaster/postmaster.c- if (port->remote_port[0] != '\0')
src/backend/postmaster/postmaster.c- appendStringInfo(&ps_data, "(%s)", port->remote_port);

What about adding "running" then? This
would give "checkpointer running end-of-recovery checkpoint".

What about one of these?
"checkpointer: running end-of-recovery checkpoint"
"checkpointer running: end-of-recovery checkpoint"

Thanks for looking.

--
Justin

#10Bossart, Nathan
Bossart, Nathan
bossartn@amazon.com
In reply to: Justin Pryzby (#9)
1 attachment(s)
Re: please update ps display for recovery checkpoint

Hi,

I like the idea behind this patch. I wrote a new version with a
couple of changes:

1. Instead of using StringInfoData, just use a small buffer along
with snprintf(). This is similar to what the WAL receiver
process does.
2. If the process is the checkpointer, reset the display to "idle"
once the checkpoint or restartpoint is finished. It's easy
enough to discover the backend type, and I think adding a bit
more clarity to the checkpointer display is a nice touch.
3. Instead of "running," use "creating." IMO that's a bit more
accurate.

I considered also checking that update_process_title was enabled, but
I figured that these ps display updates should happen sparsely enough
that it wouldn't make much of an impact.

What do you think?

Nathan

Attachments:

v4-0001-Add-checkpoint-restartpoint-status-to-ps-display.patchapplication/octet-stream; name=v4-0001-Add-checkpoint-restartpoint-status-to-ps-display.patch
#11Justin Pryzby
Justin Pryzby
pryzby@telsasoft.com
In reply to: Bossart, Nathan (#10)
Re: please update ps display for recovery checkpoint

On Thu, Dec 03, 2020 at 09:18:07PM +0000, Bossart, Nathan wrote:

I considered also checking that update_process_title was enabled, but
I figured that these ps display updates should happen sparsely enough
that it wouldn't make much of an impact.

Since bf68b79e5, update_ps_display is responsible for checking
update_process_title. Its other, remaining uses are apparently just acting as
minor optimizations to guard against useless snprintf's.

See also /messages/by-id/1288021.1600178478@sss.pgh.pa.us
in which (I just saw) Tom wrote:

Seems like a good argument, but you'd have to be careful about the
final state when you stop overriding update_process_title --- it can't
be left looking like it's still-in-progress on some random WAL file.

I think that's a live problem, not just a concern for that patch.
It was exactly my complaint leading to this thread:

But runs a checkpoint, which can take a long time, while the "ps" display still
says "recovering NNNNNNNN".

--
Justin

#12Bossart, Nathan
Bossart, Nathan
bossartn@amazon.com
In reply to: Justin Pryzby (#11)
Re: please update ps display for recovery checkpoint

On 12/3/20, 1:58 PM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:

On Thu, Dec 03, 2020 at 09:18:07PM +0000, Bossart, Nathan wrote:

I considered also checking that update_process_title was enabled, but
I figured that these ps display updates should happen sparsely enough
that it wouldn't make much of an impact.

Since bf68b79e5, update_ps_display is responsible for checking
update_process_title. Its other, remaining uses are apparently just acting as
minor optimizations to guard against useless snprintf's.

See also /messages/by-id/1288021.1600178478@sss.pgh.pa.us
in which (I just saw) Tom wrote:

Seems like a good argument, but you'd have to be careful about the
final state when you stop overriding update_process_title --- it can't
be left looking like it's still-in-progress on some random WAL file.

I think that's a live problem, not just a concern for that patch.
It was exactly my complaint leading to this thread:

But runs a checkpoint, which can take a long time, while the "ps" display still
says "recovering NNNNNNNN".

Ah, I see. Thanks for pointing this out.

Nathan

#13Bossart, Nathan
Bossart, Nathan
bossartn@amazon.com
In reply to: Justin Pryzby (#11)
1 attachment(s)
Re: please update ps display for recovery checkpoint

On 12/3/20, 1:19 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

I like the idea behind this patch. I wrote a new version with a
couple of changes:

I missed an #include in this patch. Here's a new one with that fixed.

Nathan

Attachments:

v5-0001-Add-checkpoint-restartpoint-status-to-ps-display.patchapplication/octet-stream; name=v5-0001-Add-checkpoint-restartpoint-status-to-ps-display.patch
#14Fujii Masao
Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Bossart, Nathan (#13)
Re: please update ps display for recovery checkpoint

On 2020/12/05 2:17, Bossart, Nathan wrote:

On 12/3/20, 1:19 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

I like the idea behind this patch. I wrote a new version with a
couple of changes:

I missed an #include in this patch. Here's a new one with that fixed.

I agree it might be helpful to display something like "checkpointing" or "waiting for checkpoint" in PS title for the startup process.

But, at least for me, it seems strange to display "idle" only for checkpointer. *If* we want to monitor the current status of checkpointer, it should be shown as wait event in pg_stat_activity, instead?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#15Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#14)
Re: please update ps display for recovery checkpoint

On Wed, Dec 09, 2020 at 02:00:44AM +0900, Fujii Masao wrote:

I agree it might be helpful to display something like
"checkpointing" or "waiting for checkpoint" in PS title for the
startup process.

Thanks.

But, at least for me, it seems strange to display "idle" only for
checkpointer.

Yeah, I'd rather leave out the part of the patch where we have the
checkpointer say "idle". So I still think that what v3 did upthread,
by just resetting the ps display in all cases, feels more natural. So
we should remove the parts of v5 from checkpointer.c.

+        * Reset the ps status display.  We set the status to "idle" for the
+        * checkpointer process, and we clear it for anything else that runs this
+        * code.
+        */
+       if (MyBackendType == B_CHECKPOINTER)
+               set_ps_display("idle");
+       else
+               set_ps_display("");
Regarding this part, this just needs a reset with an empty string.
The second sentence is confusing (it partially comes fronm v3, from
me..).  Do we lose anything by removing the second sentence of this
comment?
+   snprintf(activitymsg, sizeof(activitymsg), "creating %s%scheckpoint",
[...]
+   snprintf(activitymsg, sizeof(activitymsg), "creating %srestartpoint",
FWIW, I still fing "running" to sound better than "creating" here.  An
extra term I can think of that could be adapted is "performing".

*If* we want to monitor the current status of
checkpointer, it should be shown as wait event in pg_stat_activity,
instead?

That would be WAIT_EVENT_CHECKPOINTER_MAIN, now there has been also on
this thread an argument that you would not have access to
pg_stat_activity until crash recovery completes and consistency is
restored.
--
Michael

#16Bossart, Nathan
Bossart, Nathan
bossartn@amazon.com
In reply to: Michael Paquier (#15)
1 attachment(s)
Re: please update ps display for recovery checkpoint

On 12/8/20, 10:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

On Wed, Dec 09, 2020 at 02:00:44AM +0900, Fujii Masao wrote:

I agree it might be helpful to display something like
"checkpointing" or "waiting for checkpoint" in PS title for the
startup process.

Thanks.

But, at least for me, it seems strange to display "idle" only for
checkpointer.

Yeah, I'd rather leave out the part of the patch where we have the
checkpointer say "idle". So I still think that what v3 did upthread,
by just resetting the ps display in all cases, feels more natural. So
we should remove the parts of v5 from checkpointer.c.

That seems fine to me. I think it is most important that the end-of-
recovery and shutdown checkpoints are shown. I'm not sure there's
much value in updating the process title for automatic checkpoints and
checkpoints triggered via the CHECKPOINT command, so IMO we could skip
those, too. I made these changes in the new version of the patch.

+        * Reset the ps status display.  We set the status to "idle" for the
+        * checkpointer process, and we clear it for anything else that runs this
+        * code.
+        */
+       if (MyBackendType == B_CHECKPOINTER)
+               set_ps_display("idle");
+       else
+               set_ps_display("");
Regarding this part, this just needs a reset with an empty string.
The second sentence is confusing (it partially comes fronm v3, from
me..).  Do we lose anything by removing the second sentence of this
comment?

I've fixed this in the new version of the patch.

+   snprintf(activitymsg, sizeof(activitymsg), "creating %s%scheckpoint",
[...]
+   snprintf(activitymsg, sizeof(activitymsg), "creating %srestartpoint",
FWIW, I still fing "running" to sound better than "creating" here.  An
extra term I can think of that could be adapted is "performing".

I think I prefer "performing" over "running" because that's what the
docs use [0]https://www.postgresql.org/docs/devel/wal-configuration.html. I've changed it to "performing" in the new version of
the patch.

Nathan

[0]: https://www.postgresql.org/docs/devel/wal-configuration.html

Attachments:

v6-0001-Add-checkpoint-restartpoint-status-to-ps-display.patchapplication/octet-stream; name=v6-0001-Add-checkpoint-restartpoint-status-to-ps-display.patch
#17Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Bossart, Nathan (#16)
1 attachment(s)
Re: please update ps display for recovery checkpoint

On Wed, Dec 09, 2020 at 06:37:22PM +0000, Bossart, Nathan wrote:

On 12/8/20, 10:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

Yeah, I'd rather leave out the part of the patch where we have the
checkpointer say "idle". So I still think that what v3 did upthread,
by just resetting the ps display in all cases, feels more natural. So
we should remove the parts of v5 from checkpointer.c.

That seems fine to me. I think it is most important that the end-of-
recovery and shutdown checkpoints are shown. I'm not sure there's
much value in updating the process title for automatic checkpoints and
checkpoints triggered via the CHECKPOINT command, so IMO we could skip
those, too. I made these changes in the new version of the patch.

It would be possible to use pg_stat_activity in most cases here, so I
agree to settle down to the minimum we can agree on for now, and maybe
discuss separately if this should be extended in some or another in
the future if there is a use-case for that. So I'd say that what you
have here is logically fine.

+   snprintf(activitymsg, sizeof(activitymsg), "creating %s%scheckpoint",
[...]
+   snprintf(activitymsg, sizeof(activitymsg), "creating %srestartpoint",
FWIW, I still fing "running" to sound better than "creating" here.  An
extra term I can think of that could be adapted is "performing".

I think I prefer "performing" over "running" because that's what the
docs use [0]. I've changed it to "performing" in the new version of
the patch.

That's also used in the code comments, FWIW.

@@ -9282,6 +9296,7 @@ CreateRestartPoint(int flags)
    XLogRecPtr  endptr;
    XLogSegNo   _logSegNo;
    TimestampTz xtime;
+   bool        shutdown = (flags & CHECKPOINT_IS_SHUTDOWN);
You are right that CHECKPOINT_END_OF_RECOVERY should not be called for
a restart point, so that's correct.

However, I think that it would be better to have all those four code
paths controlled by a single routine, where we pass down the
checkpoint flags and fill in the ps string directly depending on what
the caller wants to do. This way, we will avoid any inconsistent
updates if this stuff gets extended in the future, and there will be
all the information at hand to extend the logic. So I have simplified
your patch as per the attached. Thoughts?
--
Michael

Attachments:

v7-0001-Add-checkpoint-restartpoint-status-to-ps-display.patchtext/x-diff; charset=us-ascii
#18Justin Pryzby
Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#17)
Re: please update ps display for recovery checkpoint

Isn't the sense of "reset" inverted ?

I think maybe you mean to do set_ps_display(""); in the "if reset".

Show quoted text

On Fri, Dec 11, 2020 at 12:54:22PM +0900, Michael Paquier wrote:

+update_checkpoint_display(int flags, bool restartpoint, bool reset)
+{
+	if (reset)
+	{
+		char activitymsg[128];
+
+		snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
+				 (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
+				 (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
+				 restartpoint ? "restartpoint" : "checkpoint");
+		set_ps_display(activitymsg);
+	}
+	else
+		set_ps_display("");
+}
+
+
/*
* Perform a checkpoint --- either during shutdown, or on-the-fly
*
@@ -8905,6 +8937,9 @@ CreateCheckPoint(int flags)
if (log_checkpoints)
LogCheckpointStart(flags, false);
+	/* Update the process title */
+	update_checkpoint_display(flags, false, false);
+
TRACE_POSTGRESQL_CHECKPOINT_START(flags);

/*
@@ -9120,6 +9155,9 @@ CreateCheckPoint(int flags)
/* Real work is done, but log and update stats before releasing lock. */
LogCheckpointEnd(false);

+	/* Reset the process title */
+	update_checkpoint_display(flags, false, true);
+
TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
NBuffers,
CheckpointStats.ckpt_segs_added,
@@ -9374,6 +9412,9 @@ CreateRestartPoint(int flags)
if (log_checkpoints)
LogCheckpointStart(flags, true);
+	/* Update the process title */
+	update_checkpoint_display(flags, true, false);
+
CheckPointGuts(lastCheckPoint.redo, flags);

/*
@@ -9492,6 +9533,9 @@ CreateRestartPoint(int flags)
/* Real work is done, but log and update before releasing lock. */
LogCheckpointEnd(true);

+	/* Reset the process title */
+	update_checkpoint_display(flags, true, true);
+
xtime = GetLatestXTime();
ereport((log_checkpoints ? LOG : DEBUG2),
(errmsg("recovery restart point at %X/%X",
#19Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#18)
Re: please update ps display for recovery checkpoint

On Thu, Dec 10, 2020 at 10:02:10PM -0600, Justin Pryzby wrote:

Isn't the sense of "reset" inverted ?

It is ;p
--
Michael

#20Bossart, Nathan
Bossart, Nathan
bossartn@amazon.com
In reply to: Michael Paquier (#19)
1 attachment(s)
Re: please update ps display for recovery checkpoint

On 12/10/20, 7:54 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

However, I think that it would be better to have all those four code
paths controlled by a single routine, where we pass down the
checkpoint flags and fill in the ps string directly depending on what
the caller wants to do. This way, we will avoid any inconsistent
updates if this stuff gets extended in the future, and there will be
all the information at hand to extend the logic. So I have simplified
your patch as per the attached. Thoughts?

This approach seems reasonable to me. I've attached my take on it.

Nathan

Attachments:

v8-0001-Add-checkpoint-restartpoint-status-to-ps-display.patchapplication/octet-stream; name=v8-0001-Add-checkpoint-restartpoint-status-to-ps-display.patch
#21Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Bossart, Nathan (#20)
1 attachment(s)
Re: please update ps display for recovery checkpoint

On Fri, Dec 11, 2020 at 06:54:42PM +0000, Bossart, Nathan wrote:

This approach seems reasonable to me. I've attached my take on it.

+ /* Reset the process title */
+ set_ps_display("");
I would still recommend to avoid calling set_ps_display() if there is
no need to so as we avoid useless system calls, so I think that this
stuff had better use a common path for the set and reset logic.

My counter-proposal is like the attached, with the set/reset part not
reversed this time, and the code indented :p
--
Michael

Attachments:

v9-0001-Add-checkpoint-restartpoint-status-to-ps-display.patchtext/x-diff; charset=us-ascii
#22Bossart, Nathan
Bossart, Nathan
bossartn@amazon.com
In reply to: Michael Paquier (#21)
Re: please update ps display for recovery checkpoint

On 12/11/20, 4:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

On Fri, Dec 11, 2020 at 06:54:42PM +0000, Bossart, Nathan wrote:

This approach seems reasonable to me. I've attached my take on it.

+ /* Reset the process title */
+ set_ps_display("");
I would still recommend to avoid calling set_ps_display() if there is
no need to so as we avoid useless system calls, so I think that this
stuff had better use a common path for the set and reset logic.

My counter-proposal is like the attached, with the set/reset part not
reversed this time, and the code indented :p

Haha. LGTM.

Nathan

#23Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Bossart, Nathan (#22)
Re: please update ps display for recovery checkpoint

On Sat, Dec 12, 2020 at 12:41:25AM +0000, Bossart, Nathan wrote:

On 12/11/20, 4:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

My counter-proposal is like the attached, with the set/reset part not
reversed this time, and the code indented :p

Haha. LGTM.

Thanks. I have applied this one, then.
--
Michael

#24Justin Pryzby
Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#23)
Re: please update ps display for recovery checkpoint

On Mon, Dec 14, 2020 at 12:01:33PM +0900, Michael Paquier wrote:

On Sat, Dec 12, 2020 at 12:41:25AM +0000, Bossart, Nathan wrote:

On 12/11/20, 4:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

My counter-proposal is like the attached, with the set/reset part not
reversed this time, and the code indented :p

Haha. LGTM.

Thanks. I have applied this one, then.

Thank you.

I'm not sure, but we could consider backpatching something to clear the
"recovering NNN" that's currently displayed during checkpoint, even though
recovery of NNN has already completed. Possibly just calling
set_ps_display(""); after "Real work is done".

--
Justin

#25Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#24)
Re: please update ps display for recovery checkpoint

On Sun, Dec 13, 2020 at 09:22:24PM -0600, Justin Pryzby wrote:

I'm not sure, but we could consider backpatching something to clear the
"recovering NNN" that's currently displayed during checkpoint, even though
recovery of NNN has already completed. Possibly just calling
set_ps_display(""); after "Real work is done".

This behavior exists for ages and there were not a lot of complaints
on this matter, so I see no reason to touch back-branches more than
necessary.
--
Michael

#26Justin Pryzby
Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#23)
1 attachment(s)
Re: please update ps display for recovery checkpoint

On Mon, Dec 14, 2020 at 12:01:33PM +0900, Michael Paquier wrote:

On Sat, Dec 12, 2020 at 12:41:25AM +0000, Bossart, Nathan wrote:

On 12/11/20, 4:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

My counter-proposal is like the attached, with the set/reset part not
reversed this time, and the code indented :p

Haha. LGTM.

Thanks. I have applied this one, then.

To refresh: commit df9274adf adds "checkpoint" info to "ps", which previously
continued to say "recovering NNNNN" even after finishing WAL replay, and
throughout the checkpoint.

Now, I wonder whether the startup process should also include some detail about
"syncing data dir". It's possible to strace the process to see what it's
doing, but most DBA would probably not know that, and it's helpful to know the
status of recovery and what part of recovery is slow: sync, replay, or
checkpoint. commit df9274adf improved the situation between replay and
ckpoint, but it's still not clear what "postgres: startup" means before replay
starts.

There's some interaction between Thomas' commit 61752afb2 and
recovery_init_sync_method - if we include a startup message, it should
distinguish between "syncing data dirs (fsync)" and (syncfs).

Putting this into fd.c seems to assume that we can clobber "ps", which is fine
when called by StartupXLOG(), but it's a public interface, so I'm not sure if
it's okay to assume that's the only caller. Maybe it should check if
MyAuxProcType == B_STARTUP.

--
Justin

Attachments:

0001-Show-syncing-data-directories-in-ps-display.patchtext/x-diff; charset=us-ascii
#27Bossart, Nathan
Bossart, Nathan
bossartn@amazon.com
In reply to: Justin Pryzby (#26)
Re: please update ps display for recovery checkpoint

On 6/6/21, 7:14 PM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:

Now, I wonder whether the startup process should also include some detail about
"syncing data dir". It's possible to strace the process to see what it's
doing, but most DBA would probably not know that, and it's helpful to know the
status of recovery and what part of recovery is slow: sync, replay, or
checkpoint. commit df9274adf improved the situation between replay and
ckpoint, but it's still not clear what "postgres: startup" means before replay
starts.

I've seen a few functions cause lengthy startups, including
SyncDataDirectory() (for which I was grateful to see 61752afb),
StartupReorderBuffer(), and RemovePgTempFiles(). I like the idea of
adding additional information in the ps title, but I also think it is
worth exploring additional ways to improve on these O(n) startup
tasks.

Nathan

#28Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#27)
Re: please update ps display for recovery checkpoint

On Mon, Jun 7, 2021 at 12:02 PM Bossart, Nathan <bossartn@amazon.com> wrote:

On 6/6/21, 7:14 PM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:

Now, I wonder whether the startup process should also include some detail about
"syncing data dir". It's possible to strace the process to see what it's
doing, but most DBA would probably not know that, and it's helpful to know the
status of recovery and what part of recovery is slow: sync, replay, or
checkpoint. commit df9274adf improved the situation between replay and
ckpoint, but it's still not clear what "postgres: startup" means before replay
starts.

I've seen a few functions cause lengthy startups, including
SyncDataDirectory() (for which I was grateful to see 61752afb),
StartupReorderBuffer(), and RemovePgTempFiles(). I like the idea of
adding additional information in the ps title, but I also think it is
worth exploring additional ways to improve on these O(n) startup
tasks.

See also the nearby thread entitled "when the startup process doesn't"
which touches on this same issue.

--
Robert Haas
EDB: http://www.enterprisedb.com

#29Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#28)
Re: please update ps display for recovery checkpoint

On Mon, Jun 07, 2021 at 01:28:06PM -0400, Robert Haas wrote:

On Mon, Jun 7, 2021 at 12:02 PM Bossart, Nathan <bossartn@amazon.com> wrote:

I've seen a few functions cause lengthy startups, including
SyncDataDirectory() (for which I was grateful to see 61752afb),
StartupReorderBuffer(), and RemovePgTempFiles(). I like the idea of
adding additional information in the ps title, but I also think it is
worth exploring additional ways to improve on these O(n) startup
tasks.

+1. I also agree with doing something for the ps output of the
startup process when these are happening in crash recovery.

See also the nearby thread entitled "when the startup process doesn't"
which touches on this same issue.

Here is a link to the thread:
/messages/by-id/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
--
Michael

#30Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#26)
Re: please update ps display for recovery checkpoint

On Sun, Jun 06, 2021 at 09:13:48PM -0500, Justin Pryzby wrote:

Putting this into fd.c seems to assume that we can clobber "ps", which is fine
when called by StartupXLOG(), but it's a public interface, so I'm not sure if
it's okay to assume that's the only caller. Maybe it should check if
MyAuxProcType == B_STARTUP.

I would be tempted to just add that into StartupXLOG() rather than
implying that callers of SyncDataDirectory() are fine to get their ps
output enforced all the time.
--
Michael