Report checkpoint progress in server logs

Started by Bharath Rupireddyover 4 years ago97 messageshackers
Jump to latest
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com

Hi,

At times, some of the checkpoint operations such as removing old WAL
files, dealing with replication snapshot or mapping files etc. may
take a while during which the server doesn't emit any logs or
information, the only logs emitted are LogCheckpointStart and
LogCheckpointEnd. Many times this isn't a problem if the checkpoint is
quicker, but there can be extreme situations which require the users
to know what's going on with the current checkpoint.

Given that the commit 9ce346ea [1]commit 9ce346eabf350a130bba46be3f8c50ba28506969 Author: Robert Haas <rhaas@postgresql.org> Date: Mon Oct 25 11:51:57 2021 -0400 introduced a nice mechanism to
report the long running operations of the startup process in the
server logs, I'm thinking we can have a similar progress mechanism for
the checkpoint as well. There's another idea suggested in a couple of
other threads to have a pg_stat_progress_checkpoint similar to
pg_stat_progress_analyze/vacuum/etc. But the problem with this idea is
during the end-of-recovery or shutdown checkpoints, the
pg_stat_progress_checkpoint view isn't accessible as it requires a
connection to the server which isn't allowed.

Therefore, reporting the checkpoint progress in the server logs, much
like [1]commit 9ce346eabf350a130bba46be3f8c50ba28506969 Author: Robert Haas <rhaas@postgresql.org> Date: Mon Oct 25 11:51:57 2021 -0400, seems to be the best way IMO. We can 1) either make
ereport_startup_progress and log_startup_progress_interval more
generic (something like ereport_log_progress and
log_progress_interval), move the code to elog.c, use it for
checkpoint progress and if required for other time-consuming
operations 2) or have an entirely different GUC and API for checkpoint
progress.

IMO, option (1) i.e. ereport_log_progress and log_progress_interval
(better names are welcome) seems a better idea.

Thoughts?

[1]: commit 9ce346eabf350a130bba46be3f8c50ba28506969 Author: Robert Haas <rhaas@postgresql.org> Date: Mon Oct 25 11:51:57 2021 -0400
commit 9ce346eabf350a130bba46be3f8c50ba28506969
Author: Robert Haas <rhaas@postgresql.org>
Date: Mon Oct 25 11:51:57 2021 -0400

Report progress of startup operations that take a long time.

Regards,
Bharath Rupireddy.

#2Magnus Hagander
magnus@hagander.net
In reply to: Bharath Rupireddy (#1)
Re: Report checkpoint progress in server logs

On Wed, Dec 29, 2021 at 3:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Hi,

At times, some of the checkpoint operations such as removing old WAL
files, dealing with replication snapshot or mapping files etc. may
take a while during which the server doesn't emit any logs or
information, the only logs emitted are LogCheckpointStart and
LogCheckpointEnd. Many times this isn't a problem if the checkpoint is
quicker, but there can be extreme situations which require the users
to know what's going on with the current checkpoint.

Given that the commit 9ce346ea [1] introduced a nice mechanism to
report the long running operations of the startup process in the
server logs, I'm thinking we can have a similar progress mechanism for
the checkpoint as well. There's another idea suggested in a couple of
other threads to have a pg_stat_progress_checkpoint similar to
pg_stat_progress_analyze/vacuum/etc. But the problem with this idea is
during the end-of-recovery or shutdown checkpoints, the
pg_stat_progress_checkpoint view isn't accessible as it requires a
connection to the server which isn't allowed.

Therefore, reporting the checkpoint progress in the server logs, much
like [1], seems to be the best way IMO. We can 1) either make
ereport_startup_progress and log_startup_progress_interval more
generic (something like ereport_log_progress and
log_progress_interval), move the code to elog.c, use it for
checkpoint progress and if required for other time-consuming
operations 2) or have an entirely different GUC and API for checkpoint
progress.

IMO, option (1) i.e. ereport_log_progress and log_progress_interval
(better names are welcome) seems a better idea.

Thoughts?

I find progress reporting in the logfile to generally be a terrible
way of doing things, and the fact that we do it for the startup
process is/should be only because we have no other choice, not because
it's the right choice.

I think the right choice to solve the *general* problem is the
mentioned pg_stat_progress_checkpoints.

We may want to *additionally* have the ability to log the progress
specifically for the special cases when we're not able to use that
view. And in those case, we can perhaps just use the existing
log_startup_progress_interval parameter for this as well -- at least
for the startup checkpoint.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#2)
Re: Report checkpoint progress in server logs

Magnus Hagander <magnus@hagander.net> writes:

Therefore, reporting the checkpoint progress in the server logs, much
like [1], seems to be the best way IMO.

I find progress reporting in the logfile to generally be a terrible
way of doing things, and the fact that we do it for the startup
process is/should be only because we have no other choice, not because
it's the right choice.

I'm already pretty seriously unhappy about the log-spamming effects of
64da07c41 (default to log_checkpoints=on), and am willing to lay a side
bet that that gets reverted after we have some field experience with it.
This proposal seems far worse from that standpoint. Keep in mind that
our out-of-the-box logging configuration still doesn't have any log
rotation ability, which means that the noisier the server is in normal
operation, the sooner you fill your disk.

I think the right choice to solve the *general* problem is the
mentioned pg_stat_progress_checkpoints.

+1

regards, tom lane

#4SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Tom Lane (#3)
Re: Report checkpoint progress in server logs

Coincidentally, I was thinking about the same yesterday after tired of
waiting for the checkpoint completion on a server.

On Wed, Dec 29, 2021 at 7:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

Therefore, reporting the checkpoint progress in the server logs, much
like [1], seems to be the best way IMO.

I find progress reporting in the logfile to generally be a terrible
way of doing things, and the fact that we do it for the startup
process is/should be only because we have no other choice, not because
it's the right choice.

I'm already pretty seriously unhappy about the log-spamming effects of
64da07c41 (default to log_checkpoints=on), and am willing to lay a side
bet that that gets reverted after we have some field experience with it.
This proposal seems far worse from that standpoint. Keep in mind that
our out-of-the-box logging configuration still doesn't have any log
rotation ability, which means that the noisier the server is in normal
operation, the sooner you fill your disk.

Server is not open up for the queries while running the end of recovery
checkpoint and a catalog view may not help here but the process title
change or logging would be helpful in such cases. When the server is
running the recovery, anxious customers ask several times the ETA for
recovery completion, and not having visibility into these operations makes
life difficult for the DBA/operations.

I think the right choice to solve the *general* problem is the
mentioned pg_stat_progress_checkpoints.

+1

+1 to this. We need at least a trace of the number of buffers to sync
(num_to_scan) before the checkpoint start, instead of just emitting the
stats at the end.

Bharat, it would be good to show the buffers synced counter and the total
buffers to sync, checkpointer pid, substep it is running, whether it is on
target for completion, checkpoint_Reason (manual/times/forced). BufferSync
has several variables tracking the sync progress locally, and we may need
some refactoring here.

Show quoted text

regards, tom lane

#5Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#3)
Re: Report checkpoint progress in server logs

On Wed, Dec 29, 2021 at 10:40:59AM -0500, Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

I think the right choice to solve the *general* problem is the
mentioned pg_stat_progress_checkpoints.

+1

Agreed. I don't see why this would not work as there are
PgBackendStatus entries for each auxiliary process.
--
Michael

#6Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: Report checkpoint progress in server logs

On Wed, Dec 29, 2021 at 10:40:59AM -0500, Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Therefore, reporting the checkpoint progress in the server logs, much
like [1], seems to be the best way IMO.

I find progress reporting in the logfile to generally be a terrible
way of doing things, and the fact that we do it for the startup
process is/should be only because we have no other choice, not because
it's the right choice.

I'm already pretty seriously unhappy about the log-spamming effects of
64da07c41 (default to log_checkpoints=on), and am willing to lay a side
bet that that gets reverted after we have some field experience with it.
This proposal seems far worse from that standpoint. Keep in mind that
our out-of-the-box logging configuration still doesn't have any log
rotation ability, which means that the noisier the server is in normal
operation, the sooner you fill your disk.

I think we are looking at three potential observable behaviors people
might care about:

* the current activity/progress of checkpoints
* the historical reporting of checkpoint completion, mixed in with other
log messages for later analysis
* the aggregate behavior of checkpoint operation

I think it is clear that checkpoint progress activity isn't useful for
the server logs because that information has little historical value,
but does fit for a progress view. As Tom already expressed, we will
have to wait to see if non-progress checkpoint information in the logs
has sufficient historical value.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#7Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Bruce Momjian (#6)
Re: Report checkpoint progress in server logs

I think the right choice to solve the *general* problem is the
mentioned pg_stat_progress_checkpoints.

We may want to *additionally* have the ability to log the progress
specifically for the special cases when we're not able to use that
view. And in those case, we can perhaps just use the existing
log_startup_progress_interval parameter for this as well -- at least
for the startup checkpoint.

+1

We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.

Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
(manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide above mentioned information as part of showing the
progress of current checkpoint operation. I am currently looking into
the code to know if any other information can be added.

Thanks & Regards,
Nitin Jadhav

Show quoted text

On Thu, Jan 6, 2022 at 5:12 AM Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Dec 29, 2021 at 10:40:59AM -0500, Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Therefore, reporting the checkpoint progress in the server logs, much
like [1], seems to be the best way IMO.

I find progress reporting in the logfile to generally be a terrible
way of doing things, and the fact that we do it for the startup
process is/should be only because we have no other choice, not because
it's the right choice.

I'm already pretty seriously unhappy about the log-spamming effects of
64da07c41 (default to log_checkpoints=on), and am willing to lay a side
bet that that gets reverted after we have some field experience with it.
This proposal seems far worse from that standpoint. Keep in mind that
our out-of-the-box logging configuration still doesn't have any log
rotation ability, which means that the noisier the server is in normal
operation, the sooner you fill your disk.

I think we are looking at three potential observable behaviors people
might care about:

* the current activity/progress of checkpoints
* the historical reporting of checkpoint completion, mixed in with other
log messages for later analysis
* the aggregate behavior of checkpoint operation

I think it is clear that checkpoint progress activity isn't useful for
the server logs because that information has little historical value,
but does fit for a progress view. As Tom already expressed, we will
have to wait to see if non-progress checkpoint information in the logs
has sufficient historical value.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#8Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nitin Jadhav (#7)
Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

I think the right choice to solve the *general* problem is the
mentioned pg_stat_progress_checkpoints.

We may want to *additionally* have the ability to log the progress
specifically for the special cases when we're not able to use that
view. And in those case, we can perhaps just use the existing
log_startup_progress_interval parameter for this as well -- at least
for the startup checkpoint.

+1

We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.

Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
(manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide above mentioned information as part of showing the
progress of current checkpoint operation. I am currently looking into
the code to know if any other information can be added.

As suggested in the other thread by Julien, I'm changing the subject
of this thread to reflect the discussion.

Regards,
Bharath Rupireddy.

#9Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Bharath Rupireddy (#8)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.

Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
(manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide above mentioned information as part of showing the
progress of current checkpoint operation. I am currently looking into
the code to know if any other information can be added.

Here is the initial patch to show the progress of checkpoint through
pg_stat_progress_checkpoint view. Please find the attachment.

The information added to this view are pid - process ID of a
CHECKPOINTER process, kind - kind of checkpoint indicates the reason
for checkpoint (values can be wal, time or force), phase - indicates
the current phase of checkpoint operation, total_buffer_writes - total
number of buffers to be written, buffers_processed - number of buffers
processed, buffers_written - number of buffers written,
total_file_syncs - total number of files to be synced, files_synced -
number of files synced.

There are many operations happen as part of checkpoint. For each of
the operation I am updating the phase field of
pg_stat_progress_checkpoint view. The values supported for this field
are initializing, checkpointing replication slots, checkpointing
snapshots, checkpointing logical rewrite mappings, checkpointing CLOG
pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages,
checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing
buffers, performing sync requests, performing two phase checkpoint,
recycling old XLOG files and Finalizing. In case of checkpointing
buffers phase, the fields total_buffer_writes, buffers_processed and
buffers_written shows the detailed progress of writing buffers. In
case of performing sync requests phase, the fields total_file_syncs
and files_synced shows the detailed progress of syncing files. In
other phases, only the phase field is getting updated and it is
difficult to show the progress because we do not get the total number
of files count without traversing the directory. It is not worth to
calculate that as it affects the performance of the checkpoint. I also
gave a thought to just mention the number of files processed, but this
wont give a meaningful progress information (It can be treated as
statistics). Hence just updating the phase field in those scenarios.

Apart from above fields, I am planning to add few more fields to the
view in the next patch. That is, process ID of the backend process
which triggered a CHECKPOINT command, checkpoint start location, filed
to indicate whether it is a checkpoint or restartpoint and elapsed
time of the checkpoint operation. Please share your thoughts. I would
be happy to add any other information that contributes to showing the
progress of checkpoint.

As per the discussion in this thread, there should be some mechanism
to show the progress of checkpoint during shutdown and end-of-recovery
cases as we cannot access pg_stat_progress_checkpoint in those cases.
I am working on this to use log_startup_progress_interval mechanism to
log the progress in the server logs.

Kindly review the patch and share your thoughts.

On Fri, Jan 28, 2022 at 12:24 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Show quoted text

On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

I think the right choice to solve the *general* problem is the
mentioned pg_stat_progress_checkpoints.

We may want to *additionally* have the ability to log the progress
specifically for the special cases when we're not able to use that
view. And in those case, we can perhaps just use the existing
log_startup_progress_interval parameter for this as well -- at least
for the startup checkpoint.

+1

We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.

Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
(manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide above mentioned information as part of showing the
progress of current checkpoint operation. I am currently looking into
the code to know if any other information can be added.

As suggested in the other thread by Julien, I'm changing the subject
of this thread to reflect the discussion.

Regards,
Bharath Rupireddy.

Attachments:

0001-pg_stat_progress_checkpoint-view.patchapplication/octet-stream; name=0001-pg_stat_progress_checkpoint-view.patchDownload+473-5
#10Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#9)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

Apart from above fields, I am planning to add few more fields to the
view in the next patch. That is, process ID of the backend process
which triggered a CHECKPOINT command, checkpoint start location, filed
to indicate whether it is a checkpoint or restartpoint and elapsed
time of the checkpoint operation. Please share your thoughts. I would
be happy to add any other information that contributes to showing the
progress of checkpoint.

The progress reporting mechanism of postgres uses the
'st_progress_param' array of 'PgBackendStatus' structure to hold the
information related to the progress. There is a function
'pgstat_progress_update_param()' which takes 'index' and 'val' as
arguments and updates the 'val' to corresponding 'index' in the
'st_progress_param' array. This mechanism works fine when all the
progress information is of type integer as the data type of
'st_progress_param' is of type integer. If the progress data is of
different type than integer, then there is no easy way to do so. In my
understanding, define a new structure with additional fields. Add this
as part of the 'PgBackendStatus' structure and support the necessary
function to update and fetch the data from this structure. This
becomes very ugly as it will not match the existing mechanism of
progress reporting. Kindly let me know if there is any better way to
handle this. If there are any changes to the existing mechanism to
make it generic to support basic data types, I would like to discuss
this in the new thread.

On Thu, Feb 10, 2022 at 12:22 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

Show quoted text

We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.

Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
(manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide above mentioned information as part of showing the
progress of current checkpoint operation. I am currently looking into
the code to know if any other information can be added.

Here is the initial patch to show the progress of checkpoint through
pg_stat_progress_checkpoint view. Please find the attachment.

The information added to this view are pid - process ID of a
CHECKPOINTER process, kind - kind of checkpoint indicates the reason
for checkpoint (values can be wal, time or force), phase - indicates
the current phase of checkpoint operation, total_buffer_writes - total
number of buffers to be written, buffers_processed - number of buffers
processed, buffers_written - number of buffers written,
total_file_syncs - total number of files to be synced, files_synced -
number of files synced.

There are many operations happen as part of checkpoint. For each of
the operation I am updating the phase field of
pg_stat_progress_checkpoint view. The values supported for this field
are initializing, checkpointing replication slots, checkpointing
snapshots, checkpointing logical rewrite mappings, checkpointing CLOG
pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages,
checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing
buffers, performing sync requests, performing two phase checkpoint,
recycling old XLOG files and Finalizing. In case of checkpointing
buffers phase, the fields total_buffer_writes, buffers_processed and
buffers_written shows the detailed progress of writing buffers. In
case of performing sync requests phase, the fields total_file_syncs
and files_synced shows the detailed progress of syncing files. In
other phases, only the phase field is getting updated and it is
difficult to show the progress because we do not get the total number
of files count without traversing the directory. It is not worth to
calculate that as it affects the performance of the checkpoint. I also
gave a thought to just mention the number of files processed, but this
wont give a meaningful progress information (It can be treated as
statistics). Hence just updating the phase field in those scenarios.

Apart from above fields, I am planning to add few more fields to the
view in the next patch. That is, process ID of the backend process
which triggered a CHECKPOINT command, checkpoint start location, filed
to indicate whether it is a checkpoint or restartpoint and elapsed
time of the checkpoint operation. Please share your thoughts. I would
be happy to add any other information that contributes to showing the
progress of checkpoint.

As per the discussion in this thread, there should be some mechanism
to show the progress of checkpoint during shutdown and end-of-recovery
cases as we cannot access pg_stat_progress_checkpoint in those cases.
I am working on this to use log_startup_progress_interval mechanism to
log the progress in the server logs.

Kindly review the patch and share your thoughts.

On Fri, Jan 28, 2022 at 12:24 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

I think the right choice to solve the *general* problem is the
mentioned pg_stat_progress_checkpoints.

We may want to *additionally* have the ability to log the progress
specifically for the special cases when we're not able to use that
view. And in those case, we can perhaps just use the existing
log_startup_progress_interval parameter for this as well -- at least
for the startup checkpoint.

+1

We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.

Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
(manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide above mentioned information as part of showing the
progress of current checkpoint operation. I am currently looking into
the code to know if any other information can be added.

As suggested in the other thread by Julien, I'm changing the subject
of this thread to reflect the discussion.

Regards,
Bharath Rupireddy.

#11Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Nitin Jadhav (#10)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

On Tue, 15 Feb 2022 at 13:16, Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

Apart from above fields, I am planning to add few more fields to the
view in the next patch. That is, process ID of the backend process
which triggered a CHECKPOINT command, checkpoint start location, filed
to indicate whether it is a checkpoint or restartpoint and elapsed
time of the checkpoint operation. Please share your thoughts. I would
be happy to add any other information that contributes to showing the
progress of checkpoint.

The progress reporting mechanism of postgres uses the
'st_progress_param' array of 'PgBackendStatus' structure to hold the
information related to the progress. There is a function
'pgstat_progress_update_param()' which takes 'index' and 'val' as
arguments and updates the 'val' to corresponding 'index' in the
'st_progress_param' array. This mechanism works fine when all the
progress information is of type integer as the data type of
'st_progress_param' is of type integer. If the progress data is of
different type than integer, then there is no easy way to do so.

Progress parameters are int64, so all of the new 'checkpoint start
location' (lsn = uint64), 'triggering backend PID' (int), 'elapsed
time' (store as start time in stat_progress, timestamp fits in 64
bits) and 'checkpoint or restartpoint?' (boolean) would each fit in a
current stat_progress parameter. Some processing would be required at
the view, but that's not impossible to overcome.

Kind regards,

Matthias van de Meent

#12Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Nitin Jadhav (#9)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

On Thu, 10 Feb 2022 at 07:53, Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.

Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
(manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide above mentioned information as part of showing the
progress of current checkpoint operation. I am currently looking into
the code to know if any other information can be added.

Here is the initial patch to show the progress of checkpoint through
pg_stat_progress_checkpoint view. Please find the attachment.

The information added to this view are pid - process ID of a
CHECKPOINTER process, kind - kind of checkpoint indicates the reason
for checkpoint (values can be wal, time or force), phase - indicates
the current phase of checkpoint operation, total_buffer_writes - total
number of buffers to be written, buffers_processed - number of buffers
processed, buffers_written - number of buffers written,
total_file_syncs - total number of files to be synced, files_synced -
number of files synced.

There are many operations happen as part of checkpoint. For each of
the operation I am updating the phase field of
pg_stat_progress_checkpoint view. The values supported for this field
are initializing, checkpointing replication slots, checkpointing
snapshots, checkpointing logical rewrite mappings, checkpointing CLOG
pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages,
checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing
buffers, performing sync requests, performing two phase checkpoint,
recycling old XLOG files and Finalizing. In case of checkpointing
buffers phase, the fields total_buffer_writes, buffers_processed and
buffers_written shows the detailed progress of writing buffers. In
case of performing sync requests phase, the fields total_file_syncs
and files_synced shows the detailed progress of syncing files. In
other phases, only the phase field is getting updated and it is
difficult to show the progress because we do not get the total number
of files count without traversing the directory. It is not worth to
calculate that as it affects the performance of the checkpoint. I also
gave a thought to just mention the number of files processed, but this
wont give a meaningful progress information (It can be treated as
statistics). Hence just updating the phase field in those scenarios.

Apart from above fields, I am planning to add few more fields to the
view in the next patch. That is, process ID of the backend process
which triggered a CHECKPOINT command, checkpoint start location, filed
to indicate whether it is a checkpoint or restartpoint and elapsed
time of the checkpoint operation. Please share your thoughts. I would
be happy to add any other information that contributes to showing the
progress of checkpoint.

As per the discussion in this thread, there should be some mechanism
to show the progress of checkpoint during shutdown and end-of-recovery
cases as we cannot access pg_stat_progress_checkpoint in those cases.
I am working on this to use log_startup_progress_interval mechanism to
log the progress in the server logs.

Kindly review the patch and share your thoughts.

Interesting idea, and overall a nice addition to the
pg_stat_progress_* reporting infrastructure.

Could you add your patch to the current commitfest at
https://commitfest.postgresql.org/37/?

See below for some comments on the patch:

xlog.c @ checkpoint_progress_start, checkpoint_progress_update_param, checkpoint_progress_end
+    /* In bootstrap mode, we don't actually record anything. */
+    if (IsBootstrapProcessingMode())
+        return;

Why do you check against the state of the system?
pgstat_progress_update_* already provides protections against updating
the progress tables if the progress infrastructure is not loaded; and
otherwise (in the happy path) the cost of updating the progress fields
will be quite a bit higher than normal. Updating stat_progress isn't
very expensive (quite cheap, really), so I don't quite get why you
guard against reporting stats when you expect no other client to be
listening.

I think you can simplify this a lot by directly using
pgstat_progress_update_param() instead.

xlog.c @ checkpoint_progress_start
+        pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid);
+        checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE,
+                                         PROGRESS_CHECKPOINT_PHASE_INIT);
+        if (flags & CHECKPOINT_CAUSE_XLOG)
+            checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                             PROGRESS_CHECKPOINT_KIND_WAL);
+        else if (flags & CHECKPOINT_CAUSE_TIME)
+            checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                             PROGRESS_CHECKPOINT_KIND_TIME);
+ [...]

Could you assign the kind of checkpoint to a local variable, and then
update the "phase" and "kind" parameters at the same time through
pgstat_progress_update_multi_param(2, ...)? See
BuildRelationExtStatistics in extended_stats.c for an example usage.
Note that regardless of whether checkpoint_progress_update* will
remain, the checks done in that function already have been checked in
this function as well, so you can use the pgstat_* functions directly.

monitoring.sgml
+   <structname>pg_stat_progress_checkpoint</structname> view will contain a
+   single row indicating the progress of checkpoint operation.

... add "if a checkpoint is currently active".

+       <structfield>total_buffer_writes</structfield> <type>bigint</type>
+       <structfield>total_file_syncs</structfield> <type>bigint</type>

The other progress tables use [type]_total as column names for counter
targets (e.g. backup_total for backup_streamed, heap_blks_total for
heap_blks_scanned, etc.). I think that `buffers_total` and
`files_total` would be better column names.

+ The checkpoint operation is requested due to XLOG filling.

+ The checkpoint was started because >max_wal_size< of WAL was written.

+ The checkpoint operation is requested due to timeout.

+ The checkpoint was started due to the expiration of a

checkpoint_timeout< interval

+       The checkpoint operation is forced even if no XLOG activity has occurred
+       since the last one.

+ Some operation forced a checkpoint.

+ <entry><literal>checkpointing CommitTs pages</literal></entry>

CommitTs -> Commit time stamp

Thanks for working on this.

Kind regards,

Matthias van de Meent

#13Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Matthias van de Meent (#12)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

The progress reporting mechanism of postgres uses the
'st_progress_param' array of 'PgBackendStatus' structure to hold the
information related to the progress. There is a function
'pgstat_progress_update_param()' which takes 'index' and 'val' as
arguments and updates the 'val' to corresponding 'index' in the
'st_progress_param' array. This mechanism works fine when all the
progress information is of type integer as the data type of
'st_progress_param' is of type integer. If the progress data is of
different type than integer, then there is no easy way to do so.

Progress parameters are int64, so all of the new 'checkpoint start
location' (lsn = uint64), 'triggering backend PID' (int), 'elapsed
time' (store as start time in stat_progress, timestamp fits in 64
bits) and 'checkpoint or restartpoint?' (boolean) would each fit in a
current stat_progress parameter. Some processing would be required at
the view, but that's not impossible to overcome.

Thank you for sharing the information. 'triggering backend PID' (int)
- can be stored without any problem. 'checkpoint or restartpoint?'
(boolean) - can be stored as a integer value like
PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and
PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as
start time in stat_progress, timestamp fits in 64 bits) - As
Timestamptz is of type int64 internally, so we can store the timestamp
value in the progres parameter and then expose a function like
'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not
Timestamptz) as argument and then returns string representing the
elapsed time. This function can be called in the view. Is it
safe/advisable to use int64 type here rather than Timestamptz for this
purpose? 'checkpoint start location' (lsn = uint64) - I feel we
cannot use progress parameters for this case. As assigning uint64 to
int64 type would be an issue for larger values and can lead to hidden
bugs.

Thoughts?

Thanks & Regards,
Nitin Jadhav

On Thu, Feb 17, 2022 at 1:33 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Show quoted text

On Thu, 10 Feb 2022 at 07:53, Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.

Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
(manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide above mentioned information as part of showing the
progress of current checkpoint operation. I am currently looking into
the code to know if any other information can be added.

Here is the initial patch to show the progress of checkpoint through
pg_stat_progress_checkpoint view. Please find the attachment.

The information added to this view are pid - process ID of a
CHECKPOINTER process, kind - kind of checkpoint indicates the reason
for checkpoint (values can be wal, time or force), phase - indicates
the current phase of checkpoint operation, total_buffer_writes - total
number of buffers to be written, buffers_processed - number of buffers
processed, buffers_written - number of buffers written,
total_file_syncs - total number of files to be synced, files_synced -
number of files synced.

There are many operations happen as part of checkpoint. For each of
the operation I am updating the phase field of
pg_stat_progress_checkpoint view. The values supported for this field
are initializing, checkpointing replication slots, checkpointing
snapshots, checkpointing logical rewrite mappings, checkpointing CLOG
pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages,
checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing
buffers, performing sync requests, performing two phase checkpoint,
recycling old XLOG files and Finalizing. In case of checkpointing
buffers phase, the fields total_buffer_writes, buffers_processed and
buffers_written shows the detailed progress of writing buffers. In
case of performing sync requests phase, the fields total_file_syncs
and files_synced shows the detailed progress of syncing files. In
other phases, only the phase field is getting updated and it is
difficult to show the progress because we do not get the total number
of files count without traversing the directory. It is not worth to
calculate that as it affects the performance of the checkpoint. I also
gave a thought to just mention the number of files processed, but this
wont give a meaningful progress information (It can be treated as
statistics). Hence just updating the phase field in those scenarios.

Apart from above fields, I am planning to add few more fields to the
view in the next patch. That is, process ID of the backend process
which triggered a CHECKPOINT command, checkpoint start location, filed
to indicate whether it is a checkpoint or restartpoint and elapsed
time of the checkpoint operation. Please share your thoughts. I would
be happy to add any other information that contributes to showing the
progress of checkpoint.

As per the discussion in this thread, there should be some mechanism
to show the progress of checkpoint during shutdown and end-of-recovery
cases as we cannot access pg_stat_progress_checkpoint in those cases.
I am working on this to use log_startup_progress_interval mechanism to
log the progress in the server logs.

Kindly review the patch and share your thoughts.

Interesting idea, and overall a nice addition to the
pg_stat_progress_* reporting infrastructure.

Could you add your patch to the current commitfest at
https://commitfest.postgresql.org/37/?

See below for some comments on the patch:

xlog.c @ checkpoint_progress_start, checkpoint_progress_update_param, checkpoint_progress_end
+    /* In bootstrap mode, we don't actually record anything. */
+    if (IsBootstrapProcessingMode())
+        return;

Why do you check against the state of the system?
pgstat_progress_update_* already provides protections against updating
the progress tables if the progress infrastructure is not loaded; and
otherwise (in the happy path) the cost of updating the progress fields
will be quite a bit higher than normal. Updating stat_progress isn't
very expensive (quite cheap, really), so I don't quite get why you
guard against reporting stats when you expect no other client to be
listening.

I think you can simplify this a lot by directly using
pgstat_progress_update_param() instead.

xlog.c @ checkpoint_progress_start
+        pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid);
+        checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE,
+                                         PROGRESS_CHECKPOINT_PHASE_INIT);
+        if (flags & CHECKPOINT_CAUSE_XLOG)
+            checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                             PROGRESS_CHECKPOINT_KIND_WAL);
+        else if (flags & CHECKPOINT_CAUSE_TIME)
+            checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                             PROGRESS_CHECKPOINT_KIND_TIME);
+ [...]

Could you assign the kind of checkpoint to a local variable, and then
update the "phase" and "kind" parameters at the same time through
pgstat_progress_update_multi_param(2, ...)? See
BuildRelationExtStatistics in extended_stats.c for an example usage.
Note that regardless of whether checkpoint_progress_update* will
remain, the checks done in that function already have been checked in
this function as well, so you can use the pgstat_* functions directly.

monitoring.sgml
+   <structname>pg_stat_progress_checkpoint</structname> view will contain a
+   single row indicating the progress of checkpoint operation.

... add "if a checkpoint is currently active".

+       <structfield>total_buffer_writes</structfield> <type>bigint</type>
+       <structfield>total_file_syncs</structfield> <type>bigint</type>

The other progress tables use [type]_total as column names for counter
targets (e.g. backup_total for backup_streamed, heap_blks_total for
heap_blks_scanned, etc.). I think that `buffers_total` and
`files_total` would be better column names.

+ The checkpoint operation is requested due to XLOG filling.

+ The checkpoint was started because >max_wal_size< of WAL was written.

+ The checkpoint operation is requested due to timeout.

+ The checkpoint was started due to the expiration of a

checkpoint_timeout< interval

+       The checkpoint operation is forced even if no XLOG activity has occurred
+       since the last one.

+ Some operation forced a checkpoint.

+ <entry><literal>checkpointing CommitTs pages</literal></entry>

CommitTs -> Commit time stamp

Thanks for working on this.

Kind regards,

Matthias van de Meent

#14Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#13)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

Hi,

On Thu, Feb 17, 2022 at 12:26:07PM +0530, Nitin Jadhav wrote:

Thank you for sharing the information. 'triggering backend PID' (int)
- can be stored without any problem.

There can be multiple processes triggering a checkpoint, or at least wanting it
to happen or happen faster.

'checkpoint or restartpoint?'

Do you actually need to store that? Can't it be inferred from
pg_is_in_recovery()?

#15Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Nitin Jadhav (#13)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

On Thu, 17 Feb 2022 at 07:56, Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

Progress parameters are int64, so all of the new 'checkpoint start
location' (lsn = uint64), 'triggering backend PID' (int), 'elapsed
time' (store as start time in stat_progress, timestamp fits in 64
bits) and 'checkpoint or restartpoint?' (boolean) would each fit in a
current stat_progress parameter. Some processing would be required at
the view, but that's not impossible to overcome.

Thank you for sharing the information. 'triggering backend PID' (int)
- can be stored without any problem. 'checkpoint or restartpoint?'
(boolean) - can be stored as a integer value like
PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and
PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as
start time in stat_progress, timestamp fits in 64 bits) - As
Timestamptz is of type int64 internally, so we can store the timestamp
value in the progres parameter and then expose a function like
'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not
Timestamptz) as argument and then returns string representing the
elapsed time.

No need to use a string there; I think exposing the checkpoint start
time is good enough. The conversion of int64 to timestamp[tz] can be
done in SQL (although I'm not sure that exposing the internal bitwise
representation of Interval should be exposed to that extent) [0]Assuming we don't care about the years past 294246 CE (2942467 is when int64 overflows into negatives), the following works without any precision losses: SELECT to_timestamp((stat.my_int64::bigint/1000000)::float8) + make_interval(0, 0, 0, 0, 0, 0, MOD(stat.my_int64, 1000000)::float8 / 1000000::float8) FROM (SELECT 1::bigint) AS stat(my_int64);.
Users can then extract the duration interval using now() - start_time,
which also allows the user to use their own preferred formatting.

This function can be called in the view. Is it
safe/advisable to use int64 type here rather than Timestamptz for this
purpose?

Yes, this must be exposed through int64, as the sql-callable
pg_stat_get_progress_info only exposes bigint columns. Any
transformation function may return other types (see
pg_indexam_progress_phasename for an example of that).

'checkpoint start location' (lsn = uint64) - I feel we
cannot use progress parameters for this case. As assigning uint64 to
int64 type would be an issue for larger values and can lead to hidden
bugs.

Not necessarily - we can (without much trouble) do a bitwise cast from
uint64 to int64, and then (in SQL) cast it back to a pg_lsn [1]SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) + stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE */ AS my_bigint_lsn) AS stat(my_int64);. Not
very elegant, but it works quite well.

Kind regards,

Matthias van de Meent

[0]: Assuming we don't care about the years past 294246 CE (2942467 is when int64 overflows into negatives), the following works without any precision losses: SELECT to_timestamp((stat.my_int64::bigint/1000000)::float8) + make_interval(0, 0, 0, 0, 0, 0, MOD(stat.my_int64, 1000000)::float8 / 1000000::float8) FROM (SELECT 1::bigint) AS stat(my_int64);
when int64 overflows into negatives), the following works without any
precision losses: SELECT
to_timestamp((stat.my_int64::bigint/1000000)::float8) +
make_interval(0, 0, 0, 0, 0, 0, MOD(stat.my_int64, 1000000)::float8 /
1000000::float8) FROM (SELECT 1::bigint) AS stat(my_int64);
[1]: SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) + stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE */ AS my_bigint_lsn) AS stat(my_int64);
pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) +
stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE
*/ AS my_bigint_lsn) AS stat(my_int64);

#16Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#14)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

Thank you for sharing the information. 'triggering backend PID' (int)
- can be stored without any problem.

There can be multiple processes triggering a checkpoint, or at least wanting it
to happen or happen faster.

Yes. There can be multiple processes but there will be one checkpoint
operation at a time. So the backend PID corresponds to the current
checkpoint operation. Let me know if I am missing something.

'checkpoint or restartpoint?'

Do you actually need to store that? Can't it be inferred from
pg_is_in_recovery()?

AFAIK we cannot use pg_is_in_recovery() to predict whether it is a
checkpoint or restartpoint because if the system exits from recovery
mode during restartpoint then any query to pg_stat_progress_checkpoint
view will return it as a checkpoint which is ideally not correct. Please
correct me if I am wrong.

Thanks & Regards,
Nitin Jadhav

Show quoted text

On Thu, Feb 17, 2022 at 4:35 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

On Thu, Feb 17, 2022 at 12:26:07PM +0530, Nitin Jadhav wrote:

Thank you for sharing the information. 'triggering backend PID' (int)
- can be stored without any problem.

There can be multiple processes triggering a checkpoint, or at least wanting it
to happen or happen faster.

'checkpoint or restartpoint?'

Do you actually need to store that? Can't it be inferred from
pg_is_in_recovery()?

#17Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#16)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

Hi,

On Thu, Feb 17, 2022 at 10:39:02PM +0530, Nitin Jadhav wrote:

Thank you for sharing the information. 'triggering backend PID' (int)
- can be stored without any problem.

There can be multiple processes triggering a checkpoint, or at least wanting it
to happen or happen faster.

Yes. There can be multiple processes but there will be one checkpoint
operation at a time. So the backend PID corresponds to the current
checkpoint operation. Let me know if I am missing something.

If there's a checkpoint timed triggered and then someone calls
pg_start_backup() which then wait for the end of the current checkpoint
(possibly after changing the flags), I think the view should reflect that in
some way. Maybe storing an array of (pid, flags) is too much, but at least a
counter with the number of processes actively waiting for the end of the
checkpoint.

'checkpoint or restartpoint?'

Do you actually need to store that? Can't it be inferred from
pg_is_in_recovery()?

AFAIK we cannot use pg_is_in_recovery() to predict whether it is a
checkpoint or restartpoint because if the system exits from recovery
mode during restartpoint then any query to pg_stat_progress_checkpoint
view will return it as a checkpoint which is ideally not correct. Please
correct me if I am wrong.

Recovery ends with an end-of-recovery checkpoint that has to finish before the
promotion can happen, so I don't think that a restart can still be in progress
if pg_is_in_recovery() returns false.

#18Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Matthias van de Meent (#12)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

Interesting idea, and overall a nice addition to the
pg_stat_progress_* reporting infrastructure.

Could you add your patch to the current commitfest at
https://commitfest.postgresql.org/37/?

See below for some comments on the patch:

Thanks you for reviewing.
I have added it to the commitfest - https://commitfest.postgresql.org/37/3545/

xlog.c @ checkpoint_progress_start, checkpoint_progress_update_param, checkpoint_progress_end
+    /* In bootstrap mode, we don't actually record anything. */
+    if (IsBootstrapProcessingMode())
+        return;

Why do you check against the state of the system?
pgstat_progress_update_* already provides protections against updating
the progress tables if the progress infrastructure is not loaded; and
otherwise (in the happy path) the cost of updating the progress fields
will be quite a bit higher than normal. Updating stat_progress isn't
very expensive (quite cheap, really), so I don't quite get why you
guard against reporting stats when you expect no other client to be
listening.

Nice point. I agree that the extra guards(IsBootstrapProcessingMode()
and (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) ==
0) are not needed as the progress reporting mechanism handles that
internally (It only updates when there is an access to the
pg_stat_progress_activity view). I am planning to add the progress of
checkpoint during shutdown and end-of-recovery cases in server logs as
we don't have access to the view. In this case these guards are
necessary. checkpoint_progress_update_param() is a generic function to
report progress to the view or server logs. Thoughts?

I think you can simplify this a lot by directly using
pgstat_progress_update_param() instead.

xlog.c @ checkpoint_progress_start
+        pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid);
+        checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE,
+                                         PROGRESS_CHECKPOINT_PHASE_INIT);
+        if (flags & CHECKPOINT_CAUSE_XLOG)
+            checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                             PROGRESS_CHECKPOINT_KIND_WAL);
+        else if (flags & CHECKPOINT_CAUSE_TIME)
+            checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                             PROGRESS_CHECKPOINT_KIND_TIME);
+ [...]

Could you assign the kind of checkpoint to a local variable, and then
update the "phase" and "kind" parameters at the same time through
pgstat_progress_update_multi_param(2, ...)? See
BuildRelationExtStatistics in extended_stats.c for an example usage.

I will make use of pgstat_progress_update_multi_param() in the next
patch to replace multiple calls to checkpoint_progress_update_param().

Note that regardless of whether checkpoint_progress_update* will
remain, the checks done in that function already have been checked in
this function as well, so you can use the pgstat_* functions directly.

As I mentioned before I am planning to add progress reporting in the
server logs, checkpoint_progress_update_param() is required and it
makes the job easier.

monitoring.sgml
+   <structname>pg_stat_progress_checkpoint</structname> view will contain a
+   single row indicating the progress of checkpoint operation.

... add "if a checkpoint is currently active".

I feel adding extra words here to indicate "if a checkpoint is
currently active" is not necessary as the view description provides
that information and also it aligns with the documentation of existing
progress views.

+       <structfield>total_buffer_writes</structfield> <type>bigint</type>
+       <structfield>total_file_syncs</structfield> <type>bigint</type>

The other progress tables use [type]_total as column names for counter
targets (e.g. backup_total for backup_streamed, heap_blks_total for
heap_blks_scanned, etc.). I think that `buffers_total` and
`files_total` would be better column names.

I agree and I will update this in the next patch.

+ The checkpoint operation is requested due to XLOG filling.

+ The checkpoint was started because >max_wal_size< of WAL was written.

How about this "The checkpoint is started because max_wal_size is reached".

+ The checkpoint operation is requested due to timeout.

+ The checkpoint was started due to the expiration of a

checkpoint_timeout< interval

"The checkpoint is started because checkpoint_timeout expired".

+       The checkpoint operation is forced even if no XLOG activity has occurred
+       since the last one.

+ Some operation forced a checkpoint.

"The checkpoint is started because some operation forced a checkpoint".

+ <entry><literal>checkpointing CommitTs pages</literal></entry>

CommitTs -> Commit time stamp

I will handle this in the next patch.

Thanks & Regards,
Nitin Jadhav

Show quoted text

On Thu, 10 Feb 2022 at 07:53, Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.

Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
(manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide above mentioned information as part of showing the
progress of current checkpoint operation. I am currently looking into
the code to know if any other information can be added.

Here is the initial patch to show the progress of checkpoint through
pg_stat_progress_checkpoint view. Please find the attachment.

The information added to this view are pid - process ID of a
CHECKPOINTER process, kind - kind of checkpoint indicates the reason
for checkpoint (values can be wal, time or force), phase - indicates
the current phase of checkpoint operation, total_buffer_writes - total
number of buffers to be written, buffers_processed - number of buffers
processed, buffers_written - number of buffers written,
total_file_syncs - total number of files to be synced, files_synced -
number of files synced.

There are many operations happen as part of checkpoint. For each of
the operation I am updating the phase field of
pg_stat_progress_checkpoint view. The values supported for this field
are initializing, checkpointing replication slots, checkpointing
snapshots, checkpointing logical rewrite mappings, checkpointing CLOG
pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages,
checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing
buffers, performing sync requests, performing two phase checkpoint,
recycling old XLOG files and Finalizing. In case of checkpointing
buffers phase, the fields total_buffer_writes, buffers_processed and
buffers_written shows the detailed progress of writing buffers. In
case of performing sync requests phase, the fields total_file_syncs
and files_synced shows the detailed progress of syncing files. In
other phases, only the phase field is getting updated and it is
difficult to show the progress because we do not get the total number
of files count without traversing the directory. It is not worth to
calculate that as it affects the performance of the checkpoint. I also
gave a thought to just mention the number of files processed, but this
wont give a meaningful progress information (It can be treated as
statistics). Hence just updating the phase field in those scenarios.

Apart from above fields, I am planning to add few more fields to the
view in the next patch. That is, process ID of the backend process
which triggered a CHECKPOINT command, checkpoint start location, filed
to indicate whether it is a checkpoint or restartpoint and elapsed
time of the checkpoint operation. Please share your thoughts. I would
be happy to add any other information that contributes to showing the
progress of checkpoint.

As per the discussion in this thread, there should be some mechanism
to show the progress of checkpoint during shutdown and end-of-recovery
cases as we cannot access pg_stat_progress_checkpoint in those cases.
I am working on this to use log_startup_progress_interval mechanism to
log the progress in the server logs.

Kindly review the patch and share your thoughts.

Interesting idea, and overall a nice addition to the
pg_stat_progress_* reporting infrastructure.

Could you add your patch to the current commitfest at
https://commitfest.postgresql.org/37/?

See below for some comments on the patch:

xlog.c @ checkpoint_progress_start, checkpoint_progress_update_param, checkpoint_progress_end
+    /* In bootstrap mode, we don't actually record anything. */
+    if (IsBootstrapProcessingMode())
+        return;

Why do you check against the state of the system?
pgstat_progress_update_* already provides protections against updating
the progress tables if the progress infrastructure is not loaded; and
otherwise (in the happy path) the cost of updating the progress fields
will be quite a bit higher than normal. Updating stat_progress isn't
very expensive (quite cheap, really), so I don't quite get why you
guard against reporting stats when you expect no other client to be
listening.

I think you can simplify this a lot by directly using
pgstat_progress_update_param() instead.

xlog.c @ checkpoint_progress_start
+        pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid);
+        checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE,
+                                         PROGRESS_CHECKPOINT_PHASE_INIT);
+        if (flags & CHECKPOINT_CAUSE_XLOG)
+            checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                             PROGRESS_CHECKPOINT_KIND_WAL);
+        else if (flags & CHECKPOINT_CAUSE_TIME)
+            checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                             PROGRESS_CHECKPOINT_KIND_TIME);
+ [...]

Could you assign the kind of checkpoint to a local variable, and then
update the "phase" and "kind" parameters at the same time through
pgstat_progress_update_multi_param(2, ...)? See
BuildRelationExtStatistics in extended_stats.c for an example usage.
Note that regardless of whether checkpoint_progress_update* will
remain, the checks done in that function already have been checked in
this function as well, so you can use the pgstat_* functions directly.

monitoring.sgml
+   <structname>pg_stat_progress_checkpoint</structname> view will contain a
+   single row indicating the progress of checkpoint operation.

... add "if a checkpoint is currently active".

+       <structfield>total_buffer_writes</structfield> <type>bigint</type>
+       <structfield>total_file_syncs</structfield> <type>bigint</type>

The other progress tables use [type]_total as column names for counter
targets (e.g. backup_total for backup_streamed, heap_blks_total for
heap_blks_scanned, etc.). I think that `buffers_total` and
`files_total` would be better column names.

+ The checkpoint operation is requested due to XLOG filling.

+ The checkpoint was started because >max_wal_size< of WAL was written.

+ The checkpoint operation is requested due to timeout.

+ The checkpoint was started due to the expiration of a

checkpoint_timeout< interval

+       The checkpoint operation is forced even if no XLOG activity has occurred
+       since the last one.

+ Some operation forced a checkpoint.

+ <entry><literal>checkpointing CommitTs pages</literal></entry>

CommitTs -> Commit time stamp

Thanks for working on this.

Kind regards,

Matthias van de Meent

#19Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#17)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

Thank you for sharing the information. 'triggering backend PID' (int)
- can be stored without any problem.

There can be multiple processes triggering a checkpoint, or at least wanting it
to happen or happen faster.

Yes. There can be multiple processes but there will be one checkpoint
operation at a time. So the backend PID corresponds to the current
checkpoint operation. Let me know if I am missing something.

If there's a checkpoint timed triggered and then someone calls
pg_start_backup() which then wait for the end of the current checkpoint
(possibly after changing the flags), I think the view should reflect that in
some way. Maybe storing an array of (pid, flags) is too much, but at least a
counter with the number of processes actively waiting for the end of the
checkpoint.

Okay. I feel this can be added as additional field but it will not
replace backend_pid field as this represents the pid of the backend
which triggered the current checkpoint. Probably a new field named
'processes_wiating' or 'events_waiting' can be added for this purpose.
Thoughts?

'checkpoint or restartpoint?'

Do you actually need to store that? Can't it be inferred from
pg_is_in_recovery()?

AFAIK we cannot use pg_is_in_recovery() to predict whether it is a
checkpoint or restartpoint because if the system exits from recovery
mode during restartpoint then any query to pg_stat_progress_checkpoint
view will return it as a checkpoint which is ideally not correct. Please
correct me if I am wrong.

Recovery ends with an end-of-recovery checkpoint that has to finish before the
promotion can happen, so I don't think that a restart can still be in progress
if pg_is_in_recovery() returns false.

Probably writing of buffers or syncing files may complete before
pg_is_in_recovery() returns false. But there are some cleanup
operations happen as part of the checkpoint. During this scenario, we
may get false value for pg_is_in_recovery(). Please refer following
piece of code which is present in CreateRestartpoint().

if (!RecoveryInProgress())
replayTLI = XLogCtl->InsertTimeLineID;

Thanks & Regards,
Nitin Jadhav

Show quoted text

On Thu, Feb 17, 2022 at 10:57 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

On Thu, Feb 17, 2022 at 10:39:02PM +0530, Nitin Jadhav wrote:

Thank you for sharing the information. 'triggering backend PID' (int)
- can be stored without any problem.

There can be multiple processes triggering a checkpoint, or at least wanting it
to happen or happen faster.

Yes. There can be multiple processes but there will be one checkpoint
operation at a time. So the backend PID corresponds to the current
checkpoint operation. Let me know if I am missing something.

If there's a checkpoint timed triggered and then someone calls
pg_start_backup() which then wait for the end of the current checkpoint
(possibly after changing the flags), I think the view should reflect that in
some way. Maybe storing an array of (pid, flags) is too much, but at least a
counter with the number of processes actively waiting for the end of the
checkpoint.

'checkpoint or restartpoint?'

Do you actually need to store that? Can't it be inferred from
pg_is_in_recovery()?

AFAIK we cannot use pg_is_in_recovery() to predict whether it is a
checkpoint or restartpoint because if the system exits from recovery
mode during restartpoint then any query to pg_stat_progress_checkpoint
view will return it as a checkpoint which is ideally not correct. Please
correct me if I am wrong.

Recovery ends with an end-of-recovery checkpoint that has to finish before the
promotion can happen, so I don't think that a restart can still be in progress
if pg_is_in_recovery() returns false.

#20Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#19)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

Hi,

On Fri, Feb 18, 2022 at 12:20:26PM +0530, Nitin Jadhav wrote:

If there's a checkpoint timed triggered and then someone calls
pg_start_backup() which then wait for the end of the current checkpoint
(possibly after changing the flags), I think the view should reflect that in
some way. Maybe storing an array of (pid, flags) is too much, but at least a
counter with the number of processes actively waiting for the end of the
checkpoint.

Okay. I feel this can be added as additional field but it will not
replace backend_pid field as this represents the pid of the backend
which triggered the current checkpoint.

I don't think that's true. Requesting a checkpoint means telling the
checkpointer that it should wake up and start a checkpoint (or restore point)
if it's not already doing so, so the pid will always be the checkpointer pid.
The only exception is a standalone backend, but in that case you won't be able
to query that view anyway.

And also while looking at the patch I see there's the same problem that I
mentioned in the previous thread, which is that the effective flags can be
updated once the checkpoint started, and as-is the view won't reflect that. It
also means that you can't simply display one of wal, time or force but a
possible combination of the flags (including the one not handled in v1).

Probably a new field named 'processes_wiating' or 'events_waiting' can be
added for this purpose.

Maybe num_process_waiting?

'checkpoint or restartpoint?'

Do you actually need to store that? Can't it be inferred from
pg_is_in_recovery()?

AFAIK we cannot use pg_is_in_recovery() to predict whether it is a
checkpoint or restartpoint because if the system exits from recovery
mode during restartpoint then any query to pg_stat_progress_checkpoint
view will return it as a checkpoint which is ideally not correct. Please
correct me if I am wrong.

Recovery ends with an end-of-recovery checkpoint that has to finish before the
promotion can happen, so I don't think that a restart can still be in progress
if pg_is_in_recovery() returns false.

Probably writing of buffers or syncing files may complete before
pg_is_in_recovery() returns false. But there are some cleanup
operations happen as part of the checkpoint. During this scenario, we
may get false value for pg_is_in_recovery(). Please refer following
piece of code which is present in CreateRestartpoint().

if (!RecoveryInProgress())
replayTLI = XLogCtl->InsertTimeLineID;

Then maybe we could store the timeline rather then then kind of checkpoint?
You should still be able to compute the information while giving a bit more
information for the same memory usage.

#21Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#20)
#22Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#21)
#23Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#22)
#24Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Nitin Jadhav (#9)
#25Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Nitin Jadhav (#23)
#26Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#23)
#27Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Ashutosh Sharma (#24)
#28Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Matthias van de Meent (#25)
#29Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Nitin Jadhav (#26)
#30Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nitin Jadhav (#26)
#31Justin Pryzby
pryzby@telsasoft.com
In reply to: Nitin Jadhav (#26)
#32Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Nitin Jadhav (#28)
#33Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Nitin Jadhav (#26)
#34Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Matthias van de Meent (#33)
#35Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#34)
#36Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#34)
#37Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Ashutosh Sharma (#29)
#38Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#35)
#39Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#38)
#40Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Julien Rouhaud (#39)
#41Julien Rouhaud
rjuju123@gmail.com
In reply to: Matthias van de Meent (#40)
#42Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#41)
#43Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nitin Jadhav (#37)
#44Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#43)
#45Julien Rouhaud
rjuju123@gmail.com
In reply to: Bharath Rupireddy (#44)
#46Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Julien Rouhaud (#45)
#47Julien Rouhaud
rjuju123@gmail.com
In reply to: Bharath Rupireddy (#46)
#48Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Bharath Rupireddy (#43)
#49Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Matthias van de Meent (#48)
#50Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#49)
#51Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nitin Jadhav (#50)
#52Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Nitin Jadhav (#50)
#53Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#50)
#54Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#53)
#55Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Nitin Jadhav (#54)
#56Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Bharath Rupireddy (#51)
#57Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Ashutosh Sharma (#52)
#58Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#56)
#59Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#57)
#60Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#53)
#61Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Nitin Jadhav (#59)
#62Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#60)
#63Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Ashutosh Sharma (#61)
#64Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#62)
#65Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#64)
#66Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#65)
#67Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#66)
#68Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#67)
#69Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#68)
#70Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Julien Rouhaud (#69)
#71Andres Freund
andres@anarazel.de
In reply to: Nitin Jadhav (#58)
#72Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#71)
#73Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andres Freund (#71)
In reply to: Matthias van de Meent (#73)
#75Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Dagfinn Ilmari Mannsåker (#74)
#76Andres Freund
andres@anarazel.de
In reply to: Matthias van de Meent (#73)
#77Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Nitin Jadhav (#70)
#78Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Andres Freund (#71)
#79Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Michael Paquier (#72)
#80Andres Freund
andres@anarazel.de
In reply to: Nitin Jadhav (#78)
#81Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Andres Freund (#80)
#82Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nitin Jadhav (#81)
#83Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Bertrand Drouvot (#82)
#84Robert Haas
robertmhaas@gmail.com
In reply to: Bertrand Drouvot (#82)
#85Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#82)
#86Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#84)
#87Andres Freund
andres@anarazel.de
In reply to: Bharath Rupireddy (#86)
#88Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#86)
#89Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#88)
#90Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#88)
#91Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#89)
#92Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#91)
#93Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#92)
#94Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#92)
#95Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#82)
#96vignesh C
vignesh21@gmail.com
In reply to: Andres Freund (#95)
#97Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: vignesh C (#96)