commit 4acf7f57f1476611b70f027b4fddd7cc276204af
Author: Anton A. Melnikov
Date:   Mon Dec 4 04:19:47 2023 +0300

    Add docs about restartpoints related counters in the pg_stat_checkpointer view.

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 42509042ad..45cc091ea7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2982,6 +2982,33 @@ description | Waiting for a newly initialized WAL file to reach durable storage
+
+
+ restartpoints_timed bigint
+
+
+ Number of scheduled restartpoints due to timeout or after a failed attempt to perform one
+
+
+
+
+
+ restartpoints_req bigint
+
+
+ Number of requested restartpoints
+
+
+
+
+
+ restartpoints_done bigint
+
+
+ Number of restartpoints that have been performed
+
+
+
  write_time double precision
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 2ed4eb659d..678b0489d3 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -655,14 +655,41 @@
  directory.
  Restartpoints can't be performed more frequently than checkpoints on the
  primary because restartpoints can only be performed at checkpoint records.
- A restartpoint is triggered when a checkpoint record is reached if at
- least checkpoint_timeout seconds have passed since the last
- restartpoint, or if WAL size is about to exceed
- max_wal_size. However, because of limitations on when a
- restartpoint can be performed, max_wal_size is often exceeded
- during recovery, by up to one checkpoint cycle's worth of WAL.
+ A restartpoint can be triggered either by a schedule or by an external request.
+ The restartpoints_timed counter in the
+ pg_stat_checkpointer
+ view counts the former, while restartpoints_req
+ counts the latter.
+ A scheduled restartpoint is triggered when a checkpoint record is reached
+ if at least checkpoint_timeout seconds have passed since
+ the last performed restartpoint, or when the previous attempt to perform
+ a restartpoint has failed. In the latter case, the next restartpoint
+ will be scheduled in 15 seconds.
+ A requested restartpoint is triggered for reasons similar to those for a checkpoint,
+ but most often because WAL size is about to exceed max_wal_size.
+ However, because of limitations on when a restartpoint can be performed,
+ max_wal_size is often exceeded during recovery,
+ by up to one checkpoint cycle's worth of WAL.
  (max_wal_size is never a hard limit anyway, so you should
  always leave plenty of headroom to avoid running out of disk space.)
+ The restartpoints_done counter in the
+ pg_stat_checkpointer
+ view counts the restartpoints that have actually been performed.
+
+
+
+ In some cases, when the WAL size on the primary increases quickly,
+ for instance during a massive INSERT,
+ the restartpoints_req counter on the standby
+ may show a sharp spike.
+ This occurs because requests to create a new restartpoint due to increased
+ WAL consumption cannot be performed while a safe checkpoint record
+ since the last restartpoint has not yet been replayed on the standby.
+ This behavior is normal and does not lead to an increase in system resource
+ consumption.
+ Only the restartpoints_done
+ counter among the restartpoint-related ones indicates that noticeable system
+ resources have been spent.
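
The relationship between the three counters described in the wal.sgml hunk can be illustrated with a small Python sketch. It is not part of the patch. The class and function names (RestartpointCounters, summarize_interval) are hypothetical, the sample values are invented for illustration, and the sketch assumes the statistics were not reset between the two samples. It also assumes, as an interpretation of the text above, that triggers which did not end in a performed restartpoint can be estimated as timed + requested - done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartpointCounters:
    """One sample of the restartpoint counters from pg_stat_checkpointer.

    A sample could be taken on a standby with, for example:
      SELECT restartpoints_timed, restartpoints_req, restartpoints_done
      FROM pg_stat_checkpointer;
    """
    restartpoints_timed: int
    restartpoints_req: int
    restartpoints_done: int


def summarize_interval(before: RestartpointCounters,
                       after: RestartpointCounters) -> dict:
    """Return counter deltas between two samples.

    Assumption (not stated by the patch): the counters were not reset
    between the samples, so each delta is non-negative.
    """
    timed = after.restartpoints_timed - before.restartpoints_timed
    requested = after.restartpoints_req - before.restartpoints_req
    done = after.restartpoints_done - before.restartpoints_done
    return {
        "timed": timed,
        "requested": requested,
        "done": done,
        # Estimated triggers that did not result in a performed restartpoint.
        "not_performed": max(timed + requested - done, 0),
    }


if __name__ == "__main__":
    # Invented numbers: a burst of requests during a bulk load on the primary.
    before = RestartpointCounters(10, 5, 12)
    after = RestartpointCounters(11, 45, 13)
    print(summarize_interval(before, after))
```

With these invented samples, the requested delta is large while the done delta is 1. Per the documentation text above, only the done delta reflects noticeable resource usage.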