Change default of checkpoint_completion_target

Started by Stephen Frost about 5 years ago (30 messages)
#1 Stephen Frost
sfrost@snowman.net
1 attachment(s)

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

You keep making this statement, and I don't necessarily disagree, but if
that is the case, please explain why don't we have
checkpoint_completion_target set to 0.9 by default? Should we change
that?

Yes, I do think we should change that..

Agreed. FWIW, no idea for others, but it is one of those parameters I
keep telling to update after a default installation.

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

Passes regression tests and doc build. Will register in the January
commitfest as Needs Review.

Thanks,

Stephen

Attachments:

cct_def_v1.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8cd3d6901c..47db860dd0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3260,7 +3260,13 @@ include_dir 'conf.d'
       <listitem>
        <para>
         Specifies the target of checkpoint completion, as a fraction of
-        total time between checkpoints. The default is 0.5.
+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across the entire checkpoint timeout period of time,
+        providing a consistent amount of I/O during the entire checkpoint.
+        Reducing this parameter is not recommended as that causes the I/O from
+        the checkpoint to have to complete faster, resulting in a higher I/O
+        rate, while then having a period of less I/O between the completion of
+        the checkpoint and the start of the next scheduled checkpoint.
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.
        </para>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d1c3893b14..8ac3f971b7 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -524,22 +524,28 @@
    writing dirty buffers during a checkpoint is spread over a period of time.
    That period is controlled by
    <xref linkend="guc-checkpoint-completion-target"/>, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   <varname>checkpoint_timeout</varname>).
    The I/O rate is adjusted so that the checkpoint finishes when the
    given fraction of
    <varname>checkpoint_timeout</varname> seconds have elapsed, or before
    <varname>max_wal_size</varname> is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
    <productname>PostgreSQL</productname> can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase <varname>checkpoint_completion_target</varname>
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
+   a bit before the next scheduled checkpoint.  This spreads out the I/O as much as
+   possible to have the I/O load be consistent during the checkpoint.  The
+   disadvantage of this is that prolonging checkpoints affects recovery time,
+   because more WAL segments will need to be kept around for possible use in recovery.
+   A user concerned about the amount of time required to recover might wish to reduce
+   <varname>checkpoint_timeout</varname>, causing checkpoints to happen more
+   frequently while still spreading out the I/O from each checkpoint.  Alternatively,
+   <varname>checkpoint_completion_target</varname> could be reduced, but this would
+   result in times of more intense I/O (during the checkpoint) and times of less I/O
+   (after the checkpoint completed but before the next scheduled checkpoint) and
+   therefore is not recommended.
+   Although <varname>checkpoint_completion_target</varname> could be set as high as
+   1.0, it is best to keep it less than that (such as at the default of 0.9, at most)
+   since checkpoints include some other activities besides writing dirty buffers.
    A setting of 1.0 is quite likely to result in checkpoints not being
    completed on time, which would result in performance loss due to
    unexpected variation in the number of WAL segments needed.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 635d91d50a..c0b2f99c11 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3643,7 +3643,7 @@ static struct config_real ConfigureNamesReal[] =
 			NULL
 		},
 		&CheckPointCompletionTarget,
-		0.5, 0.0, 1.0,
+		0.9, 0.0, 1.0,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9c9091e601..91d759dc61 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -230,7 +230,7 @@
 #checkpoint_timeout = 5min		# range 30s-1d
 #max_wal_size = 1GB
 #min_wal_size = 80MB
-#checkpoint_completion_target = 0.5	# checkpoint target duration, 0.0 - 1.0
+#checkpoint_completion_target = 0.9	# checkpoint target duration, 0.0 - 1.0
 #checkpoint_flush_after = 0		# measured in pages, 0 disables
 #checkpoint_warning = 30s		# 0 disables
 
diff --git a/src/test/recovery/t/015_promotion_pages.pl b/src/test/recovery/t/015_promotion_pages.pl
index 6fb70b5001..25a9e4764a 100644
--- a/src/test/recovery/t/015_promotion_pages.pl
+++ b/src/test/recovery/t/015_promotion_pages.pl
@@ -26,7 +26,6 @@ my $bravo = get_new_node('bravo');
 $bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
 $bravo->append_conf('postgresql.conf', <<EOF);
 checkpoint_timeout=1h
-checkpoint_completion_target=0.9
 EOF
 $bravo->start;
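
The trade-off described in the documentation hunk above can be worked through with a little arithmetic. What follows is a minimal sketch, not PostgreSQL source: the function names are hypothetical, and it assumes writes are paced evenly over `checkpoint_timeout * checkpoint_completion_target` seconds, ignoring the `max_wal_size` trigger.

```c
#include <assert.h>

/* Seconds over which the checkpoint's buffer writes are spread. */
double
write_window_secs(double checkpoint_timeout_secs, double completion_target)
{
	assert(checkpoint_timeout_secs > 0.0);
	assert(completion_target > 0.0 && completion_target <= 1.0);
	return checkpoint_timeout_secs * completion_target;
}

/* Seconds left between the end of writing and the next scheduled checkpoint. */
double
slack_secs(double checkpoint_timeout_secs, double completion_target)
{
	return checkpoint_timeout_secs
		- write_window_secs(checkpoint_timeout_secs, completion_target);
}

/* Average write rate needed to flush dirty_pages within the write window. */
double
required_pages_per_sec(double dirty_pages, double checkpoint_timeout_secs,
					   double completion_target)
{
	assert(dirty_pages >= 0.0);
	return dirty_pages
		/ write_window_secs(checkpoint_timeout_secs, completion_target);
}
```

With `checkpoint_timeout` at 300 seconds (the `5min` shown in `postgresql.conf.sample`) and a made-up figure of 27,000 dirty pages, a target of 0.5 needs about 180 pages/s for 150 seconds and then leaves 150 seconds with no checkpoint writes. A target of 0.9 needs about 100 pages/s for 270 seconds and leaves roughly 30 seconds of slack.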
 
#2 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Stephen Frost (#1)
Re: Change default of checkpoint_completion_target

On 2020-12-07 18:53, Stephen Frost wrote:

* Michael Paquier (michael@paquier.xyz) wrote:

On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

You keep making this statement, and I don't necessarily disagree, but if
that is the case, please explain why don't we have
checkpoint_completion_target set to 0.9 by default? Should we change
that?

Yes, I do think we should change that..

Agreed. FWIW, no idea for others, but it is one of those parameters I
keep telling to update after a default installation.

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

I agree with considering this change, but I wonder why the value 0.9.
Why not, say, 0.95, 0.99, or 1.0?

#3 Stephen Frost
sfrost@snowman.net
In reply to: Peter Eisentraut (#2)
Re: Change default of checkpoint_completion_target

Greetings,

* Peter Eisentraut (peter.eisentraut@enterprisedb.com) wrote:

On 2020-12-07 18:53, Stephen Frost wrote:

* Michael Paquier (michael@paquier.xyz) wrote:

On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

You keep making this statement, and I don't necessarily disagree, but if
that is the case, please explain why don't we have
checkpoint_completion_target set to 0.9 by default? Should we change
that?

Yes, I do think we should change that..

Agreed. FWIW, no idea for others, but it is one of those parameters I
keep telling to update after a default installation.

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

I agree with considering this change, but I wonder why the value 0.9. Why
not, say, 0.95, 0.99, or 1.0?

The documentation (which my patch updates to match the new default)
covers this pretty well here:

https://www.postgresql.org/docs/current/wal-configuration.html

"Although checkpoint_completion_target can be set as high as 1.0, it is
best to keep it less than that (perhaps 0.9 at most) since checkpoints
include some other activities besides writing dirty buffers. A setting
of 1.0 is quite likely to result in checkpoints not being completed on
time, which would result in performance loss due to unexpected variation
in the number of WAL segments needed."

Thanks,

Stephen

#4 Bossart, Nathan
bossartn@amazon.com
In reply to: Stephen Frost (#3)
Re: Change default of checkpoint_completion_target

On 12/7/20, 9:53 AM, "Stephen Frost" <sfrost@snowman.net> wrote:

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

+1 to setting checkpoint_completion_target to 0.9 by default.

Nathan

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bossart, Nathan (#4)
Re: Change default of checkpoint_completion_target

"Bossart, Nathan" <bossartn@amazon.com> writes:

On 12/7/20, 9:53 AM, "Stephen Frost" <sfrost@snowman.net> wrote:

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

+1 to setting checkpoint_completion_target to 0.9 by default.

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that? If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

regards, tom lane

#6 Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#5)
Re: Change default of checkpoint_completion_target

On Tue, Dec 8, 2020 at 6:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Bossart, Nathan" <bossartn@amazon.com> writes:

On 12/7/20, 9:53 AM, "Stephen Frost" <sfrost@snowman.net> wrote:

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

+1 to setting checkpoint_completion_target to 0.9 by default.

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that? If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

+1.

I think there are plenty of cases where the value doesn't really matter,
but where it does, I'm not sure in what situation something else would
actually be better.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#7 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bossart, Nathan (#4)
Re: Change default of checkpoint_completion_target

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

While we are at it, could we change the default of "log_lock_waits" to "on"?

Yours,
Laurenz Albe

#8 Stephen Frost
sfrost@snowman.net
In reply to: Laurenz Albe (#7)
1 attachment(s)
Re: Change default of checkpoint_completion_target

Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

While we are at it, could we change the default of "log_lock_waits" to "on"?

While I agree that it'd be good to change quite a few of the log_X items
to be 'on' by default, I'm not planning to work on this.

Thanks,

Stephen

Attachments:

cct_def_v2.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 42a8ed328d..ed2b27164b 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -857,10 +857,9 @@ SELECT pg_start_backup('label', false, false);
     <para>
      By default, <function>pg_start_backup</function> can take a long time to finish.
      This is because it performs a checkpoint, and the I/O
-     required for the checkpoint will be spread out over a significant
-     period of time, by default half your inter-checkpoint interval
-     (see the configuration parameter
-     <xref linkend="guc-checkpoint-completion-target"/>).  This is
+     required for the checkpoint will be spread out over the inter-checkpoint
+     interval (see the configuration parameter
+     <xref linkend="guc-checkpoint-timeout"/>).  This is
      usually what you want, because it minimizes the impact on query
      processing.  If you want to start the backup as soon as
      possible, change the second parameter to <literal>true</literal>, which will
@@ -1000,10 +999,9 @@ SELECT pg_start_backup('label');
     <para>
      By default, <function>pg_start_backup</function> can take a long time to finish.
      This is because it performs a checkpoint, and the I/O
-     required for the checkpoint will be spread out over a significant
-     period of time, by default half your inter-checkpoint interval
-     (see the configuration parameter
-     <xref linkend="guc-checkpoint-completion-target"/>).  This is
+     required for the checkpoint will be spread out over the inter-checkpoint
+     interval (see the configuration parameter
+     <xref linkend="guc-checkpoint-timeout"/>).  This is
      usually what you want, because it minimizes the impact on query
      processing.  If you want to start the backup as soon as
      possible, use:
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8cd3d6901c..39f9701959 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3251,22 +3251,6 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-checkpoint-completion-target" xreflabel="checkpoint_completion_target">
-      <term><varname>checkpoint_completion_target</varname> (<type>floating point</type>)
-      <indexterm>
-       <primary><varname>checkpoint_completion_target</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Specifies the target of checkpoint completion, as a fraction of
-        total time between checkpoints. The default is 0.5.
-        This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.
-       </para>
-      </listitem>
-     </varlistentry>
-
      <varlistentry id="guc-checkpoint-flush-after" xreflabel="checkpoint_flush_after">
       <term><varname>checkpoint_flush_after</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d1c3893b14..735d0c0661 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -521,28 +521,19 @@
 
   <para>
    To avoid flooding the I/O system with a burst of page writes,
-   writing dirty buffers during a checkpoint is spread over a period of time.
-   That period is controlled by
-   <xref linkend="guc-checkpoint-completion-target"/>, which is
-   given as a fraction of the checkpoint interval.
-   The I/O rate is adjusted so that the checkpoint finishes when the
-   given fraction of
-   <varname>checkpoint_timeout</varname> seconds have elapsed, or before
+   writing dirty buffers during a checkpoint is spread out across the time between
+   when checkpoints are scheduled to begin, as configured by 
+   <xref linkend="guc-checkpoint-timeout"/>.
+   The I/O rate is adjusted so that the checkpoint finishes at approximately the
+   time when the next checkpoint is scheduled to begin, or before 
    <varname>max_wal_size</varname> is exceeded, whichever is sooner.
-   With the default value of 0.5,
-   <productname>PostgreSQL</productname> can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase <varname>checkpoint_completion_target</varname>
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
-   A setting of 1.0 is quite likely to result in checkpoints not being
-   completed on time, which would result in performance loss due to
-   unexpected variation in the number of WAL segments needed.
+   This spreads out the I/O as much as possible to have the I/O load be consistent
+   during the checkpoint and generally throughout the operation of the system.  The
+   disadvantage of this is that prolonging checkpoints affects recovery time,
+   because more WAL segments will need to be kept around for possible use in recovery.
+   A user concerned about the amount of time required to recover might wish to reduce
+   <varname>checkpoint_timeout</varname>, causing checkpoints to happen more
+   frequently.
   </para>
 
   <para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7e81ce4f17..f027cfe171 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2289,7 +2289,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 
 /*
  * Calculate CheckPointSegments based on max_wal_size_mb and
- * checkpoint_completion_target.
+ * CheckPointCompletionTarget.
  */
 static void
 CalculateCheckpointSegments(void)
@@ -2306,7 +2306,7 @@ CalculateCheckpointSegments(void)
 	 *    only did this on the primary anyway, not on standby. Keeping just
 	 *    one checkpoint simplifies processing and reduces disk space in
 	 *    many smaller databases.)
-	 * b) during checkpoint, we consume checkpoint_completion_target *
+	 * b) during checkpoint, we consume CheckPointCompletionTarget *
 	 *	  number of segments consumed between checkpoints.
 	 *-------
 	 */
@@ -2327,13 +2327,6 @@ assign_max_wal_size(int newval, void *extra)
 	CalculateCheckpointSegments();
 }
 
-void
-assign_checkpoint_completion_target(double newval, void *extra)
-{
-	CheckPointCompletionTarget = newval;
-	CalculateCheckpointSegments();
-}
-
 /*
  * At a checkpoint, how many WAL segments to recycle as preallocated future
  * XLOG segments? Returns the highest segment that should be preallocated.
@@ -8694,7 +8687,7 @@ UpdateCheckPointDistanceEstimate(uint64 nbytes)
  *	CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
  *	CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
  *	CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
- *		ignoring checkpoint_completion_target parameter.
+ *		ignoring the CheckPointCompletionTarget.
  *	CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
  *		since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
  *		CHECKPOINT_END_OF_RECOVERY).
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 429c8010ef..4f6e843146 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -145,7 +145,6 @@ static CheckpointerShmemStruct *CheckpointerShmem;
  */
 int			CheckPointTimeout = 300;
 int			CheckPointWarning = 30;
-double		CheckPointCompletionTarget = 0.5;
 
 /*
  * Private state
@@ -671,7 +670,7 @@ ImmediateCheckpointRequested(void)
  *
  * This function is called after each page write performed by BufferSync().
  * It is responsible for throttling BufferSync()'s write rate to hit
- * checkpoint_completion_target.
+ * CheckPointCompletionTarget.
  *
  * The checkpoint request flags should be passed in; currently the only one
  * examined is CHECKPOINT_IMMEDIATE, which disables delays between writes.
@@ -757,7 +756,7 @@ IsCheckpointOnSchedule(double progress)
 
 	Assert(ckpt_active);
 
-	/* Scale progress according to checkpoint_completion_target. */
+	/* Scale progress according to CheckPointCompletionTarget. */
 	progress *= CheckPointCompletionTarget;
 
 	/*
@@ -786,7 +785,7 @@ IsCheckpointOnSchedule(double progress)
 	 * be a large gap between a checkpoint's redo-pointer and the checkpoint
 	 * record itself, and we only start the restartpoint after we've seen the
 	 * checkpoint record. (The gap is typically up to CheckPointSegments *
-	 * checkpoint_completion_target where checkpoint_completion_target is the
+	 * CheckPointCompletionTarget where CheckPointCompletionTarget is the
 	 * value that was in effect when the WAL was generated).
 	 */
 	if (RecoveryInProgress())
@@ -903,7 +902,7 @@ CheckpointerShmemInit(void)
  *	CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
  *	CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
  *	CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
- *		ignoring checkpoint_completion_target parameter.
+ *		ignoring the CheckPointCompletionTarget.
  *	CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
  *		since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
  *		CHECKPOINT_END_OF_RECOVERY).
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 635d91d50a..e501e525eb 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3637,16 +3637,6 @@ static struct config_real ConfigureNamesReal[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"checkpoint_completion_target", PGC_SIGHUP, WAL_CHECKPOINTS,
-			gettext_noop("Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval."),
-			NULL
-		},
-		&CheckPointCompletionTarget,
-		0.5, 0.0, 1.0,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"vacuum_cleanup_index_scale_factor", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Number of tuple inserts prior to index cleanup as a fraction of reltuples."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9c9091e601..39049c0e48 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -230,7 +230,6 @@
 #checkpoint_timeout = 5min		# range 30s-1d
 #max_wal_size = 1GB
 #min_wal_size = 80MB
-#checkpoint_completion_target = 0.5	# checkpoint target duration, 0.0 - 1.0
 #checkpoint_flush_after = 0		# measured in pages, 0 disables
 #checkpoint_warning = 30s		# 0 disables
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 221af87e71..e7d93c27bd 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -350,7 +350,6 @@ extern void StartupRequestWalReceiverRestart(void);
 extern void XLogRequestWalReceiverReply(void);
 
 extern void assign_max_wal_size(int newval, void *extra);
-extern void assign_checkpoint_completion_target(double newval, void *extra);
 
 /*
  * Routines to start, stop, and get status of a base backup.
diff --git a/src/include/postmaster/bgwriter.h b/src/include/postmaster/bgwriter.h
index 0a5708b32e..5e13c3bd2a 100644
--- a/src/include/postmaster/bgwriter.h
+++ b/src/include/postmaster/bgwriter.h
@@ -20,12 +20,31 @@
 #include "storage/smgr.h"
 #include "storage/sync.h"
 
+/*
+ * CheckPointCompletionTarget is the percentage of time between when
+ * checkpoints are scheduled to start that we wish to spend writing
+ * out dirty buffers.  Our goal is to spread the I/O out over as much of the
+ * checkpoint interval as possible, while not finishing the checkpoint late,
+ * so that the amount of I/O is consistent.
+ *
+ * It might be tempting to set this to '1.0', but there are a few other
+ * things that happen during a checkpoint and we don't want to mistakenly
+ * end up not finishing the checkpoint on time as that could lead to
+ * unexpected variation in the number of WAL segments needed and reduced
+ * performance.
+ *
+ * CheckPointCompletionTarget used to be exposed as a GUC named
+ * checkpoint_completion_target, but there's little evidence to suggest that
+ * there's actually a case for it being a different value, so it's no longer
+ * exposed as a GUC to be configured.
+ */
+
+#define CheckPointCompletionTarget	0.9
 
 /* GUC options */
 extern int	BgWriterDelay;
 extern int	CheckPointTimeout;
 extern int	CheckPointWarning;
-extern double CheckPointCompletionTarget;
 
 extern void BackgroundWriterMain(void) pg_attribute_noreturn();
 extern void CheckpointerMain(void) pg_attribute_noreturn();
diff --git a/src/test/recovery/t/015_promotion_pages.pl b/src/test/recovery/t/015_promotion_pages.pl
index 6fb70b5001..25a9e4764a 100644
--- a/src/test/recovery/t/015_promotion_pages.pl
+++ b/src/test/recovery/t/015_promotion_pages.pl
@@ -26,7 +26,6 @@ my $bravo = get_new_node('bravo');
 $bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
 $bravo->append_conf('postgresql.conf', <<EOF);
 checkpoint_timeout=1h
-checkpoint_completion_target=0.9
 EOF
 $bravo->start;
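
The comment touched in `CalculateCheckpointSegments()` implies a simple relationship between `max_wal_size` and the completion target. If N segments are consumed between checkpoints and a further target * N are consumed while a checkpoint is in progress, then N * (1 + target) has to fit within `max_wal_size`. Below is a minimal sketch of that arithmetic. It is an illustration rather than the actual `xlog.c` code: the function name, the parameters in megabytes, and the clamp to at least one segment are all assumptions made for the example.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of WAL segments that may be consumed between
 * checkpoints so that N * (1 + completion_target) stays within max_wal_size.
 */
int
checkpoint_segments(int max_wal_size_mb, int wal_segment_size_mb,
					double completion_target)
{
	double		target;
	int			segs;

	assert(wal_segment_size_mb > 0);
	assert(completion_target >= 0.0 && completion_target <= 1.0);

	/* Total segment budget divided by (1 + fraction used during checkpoint). */
	target = (double) (max_wal_size_mb / wal_segment_size_mb)
		/ (1.0 + completion_target);

	segs = (int) target;		/* truncate toward zero */
	return segs < 1 ? 1 : segs; /* assumed floor of one segment */
}
```

Assuming 16MB segments and the `max_wal_size = 1GB` shown in `postgresql.conf.sample`, the budget is 64 segments. A target of 0.5 yields 42 segments between checkpoints, and 0.9 yields 33. So hard-coding 0.9 would make WAL-triggered checkpoints start somewhat earlier for the same `max_wal_size`.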
 
#9 Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Stephen Frost (#8)
Re: Change default of checkpoint_completion_target

Howdy,

On 2020-Dec-10, Stephen Frost wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

I think we should leave a doc stub or at least an <indexterm>, to let
people know the GUC has been removed rather than just making it
completely invisible. (Maybe piggyback on the stuff in [1]?)

[1]: /messages/by-id/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@mail.gmail.com

#10 Stephen Frost
sfrost@snowman.net
In reply to: Alvaro Herrera (#9)
Re: Change default of checkpoint_completion_target

Greetings,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

On 2020-Dec-10, Stephen Frost wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

I think we should leave a doc stub or at least an <indexterm>, to let
people know the GUC has been removed rather than just making it
completely invisible. (Maybe piggyback on the stuff in [1]?)

[1] /messages/by-id/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@mail.gmail.com

Yes, I agree, and am involved in that thread as well; currently awaiting
feedback from others about the proposed approach.

Getting a few more people looking at that thread and commenting on it
would really help us be able to move forward.

Thanks,

Stephen

#11 Stephen Frost
sfrost@snowman.net
In reply to: Stephen Frost (#10)
Re: Change default of checkpoint_completion_target

Greetings,

* Stephen Frost (sfrost@snowman.net) wrote:

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

On 2020-Dec-10, Stephen Frost wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

I think we should leave a doc stub or at least an <indexterm>, to let
people know the GUC has been removed rather than just making it
completely invisible. (Maybe piggyback on the stuff in [1]?)

[1] /messages/by-id/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@mail.gmail.com

Yes, I agree, and am involved in that thread as well; currently awaiting
feedback from others about the proposed approach.

I've tried to push that forward. I'm happy to update this patch once
we've got agreement to move forward on that, to wit, adding to an
'obsolete' section in the documentation information about this
particular GUC and how it's been removed due to not being sensible or
necessary to continue to have.

Getting a few more people looking at that thread and commenting on it
would really help us be able to move forward.

This is still the case though..

Thanks!

Stephen

#12 Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#8)
Re: Change default of checkpoint_completion_target

On Thu, Dec 10, 2020 at 12:16:02PM -0500, Stephen Frost wrote:

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

All the references to checkpoint_completion_target are removed (except
for bgwriter.h as per the patch).

This is because it performs a checkpoint, and the I/O
-     required for the checkpoint will be spread out over a significant
-     period of time, by default half your inter-checkpoint interval
-     (see the configuration parameter
-     <xref linkend="guc-checkpoint-completion-target"/>).  This is
+     required for the checkpoint will be spread out over the inter-checkpoint
+     interval (see the configuration parameter
+     <xref linkend="guc-checkpoint-timeout"/>).  This is

It may be worth mentioning that this is spread across 90% of the last
checkpoint's duration instead.

-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase <varname>checkpoint_completion_target</varname>
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
-   A setting of 1.0 is quite likely to result in checkpoints not being
-   completed on time, which would result in performance loss due to
-   unexpected variation in the number of WAL segments needed.
+   This spreads out the I/O as much as possible to have the I/O load be consistent
+   during the checkpoint and generally throughout the operation of the system.  The
+   disadvantage of this is that prolonging checkpoints affects recovery time,
+   because more WAL segments will need to be kept around for possible use in recovery.
+   A user concerned about the amount of time required to recover might wish to reduce
+   <varname>checkpoint_timeout</varname>, causing checkpoints to happen more
+   frequently.
</para>

<para>

Again, this makes the description of the I/O spread more general,
removing the portion where half the time is used by default. Should
this stuff also mention the spread value of 90% instead?

* At a checkpoint, how many WAL segments to recycle as preallocated future
* XLOG segments? Returns the highest segment that should be preallocated.
@@ -8694,7 +8687,7 @@ UpdateCheckPointDistanceEstimate(uint64 nbytes)
*	CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
*	CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
*	CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
- *		ignoring checkpoint_completion_target parameter.
+ *		ignoring the CheckPointCompletionTarget.

s/the//?

* be a large gap between a checkpoint's redo-pointer and the checkpoint
* record itself, and we only start the restartpoint after we've seen the
* checkpoint record. (The gap is typically up to CheckPointSegments *
-	 * checkpoint_completion_target where checkpoint_completion_target is the
+	 * CheckPointCompletionTarget where CheckPointCompletionTarget is the
* value that was in effect when the WAL was generated).

The last part of this sentence does not make sense.
CheckPointCompletionTarget becomes a constant with this patch.

if (RecoveryInProgress())
@@ -903,7 +902,7 @@ CheckpointerShmemInit(void)
*	CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
*	CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
*	CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
- *		ignoring checkpoint_completion_target parameter.
+ *		ignoring the CheckPointCompletionTarget.

s/the//?

+ * CheckPointCompletionTarget used to be exposed as a GUC named
+ * checkpoint_completion_target, but there's little evidence to suggest that
+ * there's actually a case for it being a different value, so it's no longer
+ * exposed as a GUC to be configured.

I would just remove this paragraph.
--
Michael

#13Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#5)
Re: Change default of checkpoint_completion_target

Hi,

On 2020-12-08 12:41:35 -0500, Tom Lane wrote:

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that? If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

I like the idea of getting rid of it too, but I think we should consider
evaluating the concrete hard-coded value a bit more carefully than just
going for 0.9 based on some old recommendations in the docs. It not
being changeable afterwards...

I think it might be a good idea to immediately change the default to
0.9, and concurrently try to evaluate whether it's really the best value
(vs 0.95, 1 or ...).

FWIW I have seen a few cases in the past where setting the target to
something very small helped, but I think that was mostly because we
didn't yet tell the kernel to flush dirty data more aggressively.

Greetings,

Andres Freund

#14Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Andres Freund (#13)
Re: Change default of checkpoint_completion_target

On 1/15/21 10:51 PM, Andres Freund wrote:

Hi,

On 2020-12-08 12:41:35 -0500, Tom Lane wrote:

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that? If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

I like the idea of getting rid of it too, but I think we should consider
evaluating the concrete hard-coded value a bit more carefully than just
going for 0.9 based on some old recommendations in the docs. It not
being changeable afterwards...

I think it might be a good idea to immediately change the default to
0.9, and concurrently try to evaluate whether it's really the best value
(vs 0.95, 1 or ...).

FWIW I have seen a few cases in the past where setting the target to
something very small helped, but I think that was mostly because we
didn't yet tell the kernel to flush dirty data more aggressively.

Yeah. The flushing probably makes that mostly unnecessary, but we still
allow disabling that. I'm not really convinced replacing it with a
compile-time #define is a good idea, exactly because it can't be changed
if needed.

As for the exact value, maybe the right solution is to make it dynamic.
The usual approach is to leave "enough time" for the kernel to flush
dirty data, so we could say 60 seconds and calculate the exact target
depending on the checkpoint_timeout.
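
A minimal sketch of the dynamic calculation suggested above, written in Python purely for illustration. The function name `dynamic_completion_target` and the `reserve_s` parameter are hypothetical; the 60-second reserve comes from the wording of this message, not from PostgreSQL code. The result is clamped to the 0.0 - 1.0 range that the GUC accepts.

```python
def dynamic_completion_target(checkpoint_timeout_s: float,
                              reserve_s: float = 60.0) -> float:
    """Return the fraction of the checkpoint interval to spend writing.

    Hypothetical helper: leaves reserve_s seconds at the end of each
    interval for the kernel to flush dirty data.
    """
    if checkpoint_timeout_s <= 0:
        raise ValueError("checkpoint_timeout_s must be positive")
    target = (checkpoint_timeout_s - reserve_s) / checkpoint_timeout_s
    # Clamp to the range accepted by checkpoint_completion_target.
    return max(0.0, min(1.0, target))


if __name__ == "__main__":
    # 30s is the documented minimum for checkpoint_timeout, 300s the default.
    for timeout_s in (30, 300, 3600):
        print(timeout_s, round(dynamic_completion_target(timeout_s), 3))
```

Under these assumptions the default 5min timeout yields a target of about 0.8, a 1h timeout about 0.98, and any timeout at or below the reserve collapses to 0, which suggests a fixed reserve would need a lower bound for short timeouts.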

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#14)
Re: Change default of checkpoint_completion_target

Hi,

On 2021-01-15 23:05:02 +0100, Tomas Vondra wrote:

Yeah. The flushing probably makes that mostly unnecessary, but we still
allow disabling that. I'm not really convinced replacing it with a
compile-time #define is a good idea, exactly because it can't be changed
if needed.

It's also not available everywhere...

As for the exact value, maybe the right solution is to make it dynamic.
The usual approach is to leave "enough time" for the kernel to flush
dirty data, so we could say 60 seconds and calculate the exact target
depending on the checkpoint_timeout.

IME the kernel flushing at some later time precisely is the problem,
because of the latency spikes that happen when it decides to do so. That
commonly starts to happen well before the fsyncs. The reason that
setting a very small checkpoint_completion_target can help is that it
condenses the period of unreliable performance into one short time,
rather than spreading it over the whole checkpoint...
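
A rough sketch of that trade-off, in Python for illustration only. It assumes a fixed amount of dirty data per checkpoint and purely time-based pacing (ignoring max_wal_size and kernel writeback behaviour); the function name `write_window` and the 2700 MB figure are made up for the example.

```python
def write_window(dirty_mb: float, checkpoint_timeout_s: float,
                 completion_target: float) -> tuple[float, float]:
    """Return (window length in seconds, average write rate in MB/s).

    Hypothetical model: all dirty_mb are written evenly across
    checkpoint_timeout_s * completion_target seconds.
    """
    if not 0.0 < completion_target <= 1.0:
        raise ValueError("completion_target must be in (0.0, 1.0]")
    window_s = checkpoint_timeout_s * completion_target
    return window_s, dirty_mb / window_s


if __name__ == "__main__":
    for target in (0.1, 0.5, 0.9):
        window_s, rate = write_window(2700.0, 300.0, target)
        print(f"target={target}: {window_s:.0f}s window, {rate:.1f} MB/s")
```

In this model a target of 0.1 squeezes the writes into roughly 30 seconds at about nine times the rate of a 0.9 target, which is the "one short time" of disruption described above, as opposed to a lower rate sustained over about 270 seconds.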

Greetings,

Andres Freund

#16Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Stephen Frost (#11)
Re: Change default of checkpoint_completion_target

On 2021-01-13 23:10, Stephen Frost wrote:

Yes, I agree, and am involved in that thread as well; currently awaiting
feedback from others about the proposed approach.

I've tried to push that forward. I'm happy to update this patch once
we've got agreement to move forward on that, to wit, adding to an
'obsolete' section in the documentation information about this
particular GUC and how it's been removed due to not being sensible or
necessary to continue to have.

Some discussion a few days ago was arguing that it was still necessary
in some cases as a way to counteract the possible lack of tuning in the
kernel flushing behavior. I think in light of that we should go with
your first patch that just changes the default, possibly with the
documentation updated a bit.

#17Stephen Frost
sfrost@snowman.net
In reply to: Peter Eisentraut (#16)
1 attachment(s)
Re: Change default of checkpoint_completion_target

Greetings,

* Peter Eisentraut (peter.eisentraut@enterprisedb.com) wrote:

On 2021-01-13 23:10, Stephen Frost wrote:

Yes, I agree, and am involved in that thread as well; currently awaiting
feedback from others about the proposed approach.

I've tried to push that forward. I'm happy to update this patch once
we've got agreement to move forward on that, to wit, adding to an
'obsolete' section in the documentation information about this
particular GUC and how it's been removed due to not being sensible or
necessary to continue to have.

Some discussion a few days ago was arguing that it was still necessary in
some cases as a way to counteract the possible lack of tuning in the kernel
flushing behavior. I think in light of that we should go with your first
patch that just changes the default, possibly with the documentation updated
a bit.

Rebased and updated patch attached which moves back to just changing the
default instead of removing the option, with a more explicit call-out of
the '90%', as suggested by Michael on the other patch.

Any further comments or thoughts on this one?

Thanks,

Stephen

Attachments:

cct_def_v3.patch (text/x-diff; charset=us-ascii)
From 335b8e630fae6c229f27f70f85847e29dfc1b783 Mon Sep 17 00:00:00 2001
From: Stephen Frost <sfrost@snowman.net>
Date: Tue, 19 Jan 2021 13:53:34 -0500
Subject: [PATCH] Change the default of checkpoint_completion_target to 0.9

Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
---
 doc/src/sgml/config.sgml                      |  8 ++++-
 doc/src/sgml/wal.sgml                         | 29 ++++++++++++-------
 src/backend/utils/misc/guc.c                  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/test/recovery/t/015_promotion_pages.pl    |  1 -
 5 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 82864bbb24..7e06d0febb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3266,7 +3266,13 @@ include_dir 'conf.d'
       <listitem>
        <para>
         Specifies the target of checkpoint completion, as a fraction of
-        total time between checkpoints. The default is 0.5.
+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across the entire checkpoint timeout period of time,
+        providing a consistent amount of I/O during the entire checkpoint.
+        Reducing this parameter is not recommended as that causes the I/O from
+        the checkpoint to have to complete faster, resulting in a higher I/O
+        rate, while then having a period of less I/O between the completion of
+        the checkpoint and the start of the next scheduled checkpoint.
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.
        </para>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..733eba22db 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -571,22 +571,29 @@
    writing dirty buffers during a checkpoint is spread over a period of time.
    That period is controlled by
    <xref linkend="guc-checkpoint-completion-target"/>, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   <varname>checkpoint_timeout</varname>).
    The I/O rate is adjusted so that the checkpoint finishes when the
    given fraction of
    <varname>checkpoint_timeout</varname> seconds have elapsed, or before
    <varname>max_wal_size</varname> is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
    <productname>PostgreSQL</productname> can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase <varname>checkpoint_completion_target</varname>
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
+   a bit before the next scheduled checkpoint (at around 90% of the last checkpoint's
+   duration).  This spreads out the I/O as much as possible to have the I/O load be
+   consistent during the checkpoint.  The disadvantage of this is that prolonging
+   checkpoints affects recovery time, because more WAL segments will need to be kept
+   around for possible use in recovery.  A user concerned about the amount of time
+   required to recover might wish to reduce <varname>checkpoint_timeout</varname>,
+   causing checkpoints to happen more frequently while still spreading out the I/O
+   from each checkpoint.  Alternatively,
+   <varname>checkpoint_completion_target</varname> could be reduced, but this would
+   result in times of more intense I/O (during the checkpoint) and times of less I/O
+   (after the checkpoint completed but before the next scheduled checkpoint) and
+   therefore is not recommended.
+   Although <varname>checkpoint_completion_target</varname> could be set as high as
+   1.0, it is best to keep it less than that (such as at the default of 0.9, at most)
+   since checkpoints include some other activities besides writing dirty buffers.
    A setting of 1.0 is quite likely to result in checkpoints not being
    completed on time, which would result in performance loss due to
    unexpected variation in the number of WAL segments needed.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..39d32542d2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3689,7 +3689,7 @@ static struct config_real ConfigureNamesReal[] =
 			NULL
 		},
 		&CheckPointCompletionTarget,
-		0.5, 0.0, 1.0,
+		0.9, 0.0, 1.0,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 8930a94fff..4964134c8c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -230,7 +230,7 @@
 #checkpoint_timeout = 5min		# range 30s-1d
 #max_wal_size = 1GB
 #min_wal_size = 80MB
-#checkpoint_completion_target = 0.5	# checkpoint target duration, 0.0 - 1.0
+#checkpoint_completion_target = 0.9	# checkpoint target duration, 0.0 - 1.0
 #checkpoint_flush_after = 0		# measured in pages, 0 disables
 #checkpoint_warning = 30s		# 0 disables
 
diff --git a/src/test/recovery/t/015_promotion_pages.pl b/src/test/recovery/t/015_promotion_pages.pl
index 6fb70b5001..25a9e4764a 100644
--- a/src/test/recovery/t/015_promotion_pages.pl
+++ b/src/test/recovery/t/015_promotion_pages.pl
@@ -26,7 +26,6 @@ my $bravo = get_new_node('bravo');
 $bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
 $bravo->append_conf('postgresql.conf', <<EOF);
 checkpoint_timeout=1h
-checkpoint_completion_target=0.9
 EOF
 $bravo->start;
 
-- 
2.25.1

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#17)
Re: Change default of checkpoint_completion_target

Stephen Frost <sfrost@snowman.net> writes:

Any further comments or thoughts on this one?

This:

+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across the entire checkpoint timeout period of time,

is confusing because 0.9 is obviously not 1.0; people will wonder
whether the scale is something strange or the text is just wrong.
They will also wonder why not use 1.0 instead. So perhaps more like

... The default is 0.9, which spreads the checkpoint across almost
all the available interval, providing fairly consistent I/O load
while also leaving some slop for checkpoint completion overhead.

The other chunk of text seems accurate, but there's no reason to let
this one be misleading.
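
To make the "almost all the available interval" wording concrete, here is a small Python sketch of the arithmetic. The function name `checkpoint_windows` is hypothetical, and the model only covers the time-based case (it ignores max_wal_size triggering a checkpoint early).

```python
def checkpoint_windows(checkpoint_timeout_s: float,
                       completion_target: float) -> tuple[float, float]:
    """Split one checkpoint interval into (write seconds, slack seconds).

    Hypothetical helper: write seconds is the span over which buffer
    writes are paced; slack is what remains for completion overhead.
    """
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("completion_target must be within 0.0 - 1.0")
    write_s = checkpoint_timeout_s * completion_target
    return write_s, checkpoint_timeout_s - write_s


if __name__ == "__main__":
    for target in (0.5, 0.9, 1.0):
        write_s, slack_s = checkpoint_windows(300.0, target)
        print(f"target={target}: write {write_s:.0f}s, slack {slack_s:.0f}s")
```

With the default 5min checkpoint_timeout, 0.9 paces the writes over about 270 seconds and leaves about 30 seconds of slack, whereas 1.0 leaves none, which is why 1.0 risks checkpoints not finishing on time.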

regards, tom lane

#19Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#18)
1 attachment(s)
Re: Change default of checkpoint_completion_target

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

Any further comments or thoughts on this one?

This:

+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across the entire checkpoint timeout period of time,

is confusing because 0.9 is obviously not 1.0; people will wonder
whether the scale is something strange or the text is just wrong.
They will also wonder why not use 1.0 instead. So perhaps more like

... The default is 0.9, which spreads the checkpoint across almost
all the available interval, providing fairly consistent I/O load
while also leaving some slop for checkpoint completion overhead.

The other chunk of text seems accurate, but there's no reason to let
this one be misleading.

Good point, updated along those lines.

In passing, I noticed that we have a lot of documentation like:

This parameter can only be set in the postgresql.conf file or on the
server command line.

... which hasn't been true since the introduction of ALTER SYSTEM. I
don't really think it's this patch's job to clean that up but it doesn't
seem quite right that we don't include ALTER SYSTEM in that list either.
If this was C code, maybe we could get away with just changing such
references as we find them, but I don't think we'd want the
documentation to be in an inconsistent state regarding that.

Anyone want to opine about what to do with that? Should we consider
changing those to mention ALTER SYSTEM? Or perhaps have a way of saying
"at server start" that then links to "how to set options at server
start", perhaps..

Thanks,

Stephen

Attachments:

cct_def_v4.patch (text/x-diff; charset=us-ascii)
From 97c24d92e4ae470a257aa2ac9501032aba5edd82 Mon Sep 17 00:00:00 2001
From: Stephen Frost <sfrost@snowman.net>
Date: Tue, 19 Jan 2021 13:53:34 -0500
Subject: [PATCH] Change the default of checkpoint_completion_target to 0.9

Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
---
 doc/src/sgml/config.sgml                      | 12 ++++++--
 doc/src/sgml/wal.sgml                         | 29 ++++++++++++-------
 src/backend/utils/misc/guc.c                  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/test/recovery/t/015_promotion_pages.pl    |  1 -
 5 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 82864bbb24..666b467eda 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3266,9 +3266,15 @@ include_dir 'conf.d'
       <listitem>
        <para>
         Specifies the target of checkpoint completion, as a fraction of
-        total time between checkpoints. The default is 0.5.
-        This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.
+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across almost all of the available interval, providing fairly
+        consistent I/O load while also leaving some slop for checkpoint
+        completion overhead.  Reducing this parameter is not recommended as that
+        causes the I/O from the checkpoint to have to complete faster, resulting
+        in a higher I/O rate, while then having a period of less I/O between the
+        completion of the checkpoint and the start of the next scheduled
+        checkpoint.  This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..733eba22db 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -571,22 +571,29 @@
    writing dirty buffers during a checkpoint is spread over a period of time.
    That period is controlled by
    <xref linkend="guc-checkpoint-completion-target"/>, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   <varname>checkpoint_timeout</varname>).
    The I/O rate is adjusted so that the checkpoint finishes when the
    given fraction of
    <varname>checkpoint_timeout</varname> seconds have elapsed, or before
    <varname>max_wal_size</varname> is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
    <productname>PostgreSQL</productname> can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase <varname>checkpoint_completion_target</varname>
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
+   a bit before the next scheduled checkpoint (at around 90% of the last checkpoint's
+   duration).  This spreads out the I/O as much as possible to have the I/O load be
+   consistent during the checkpoint.  The disadvantage of this is that prolonging
+   checkpoints affects recovery time, because more WAL segments will need to be kept
+   around for possible use in recovery.  A user concerned about the amount of time
+   required to recover might wish to reduce <varname>checkpoint_timeout</varname>,
+   causing checkpoints to happen more frequently while still spreading out the I/O
+   from each checkpoint.  Alternatively,
+   <varname>checkpoint_completion_target</varname> could be reduced, but this would
+   result in times of more intense I/O (during the checkpoint) and times of less I/O
+   (after the checkpoint completed but before the next scheduled checkpoint) and
+   therefore is not recommended.
+   Although <varname>checkpoint_completion_target</varname> could be set as high as
+   1.0, it is best to keep it less than that (such as at the default of 0.9, at most)
+   since checkpoints include some other activities besides writing dirty buffers.
    A setting of 1.0 is quite likely to result in checkpoints not being
    completed on time, which would result in performance loss due to
    unexpected variation in the number of WAL segments needed.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..39d32542d2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3689,7 +3689,7 @@ static struct config_real ConfigureNamesReal[] =
 			NULL
 		},
 		&CheckPointCompletionTarget,
-		0.5, 0.0, 1.0,
+		0.9, 0.0, 1.0,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 8930a94fff..4964134c8c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -230,7 +230,7 @@
 #checkpoint_timeout = 5min		# range 30s-1d
 #max_wal_size = 1GB
 #min_wal_size = 80MB
-#checkpoint_completion_target = 0.5	# checkpoint target duration, 0.0 - 1.0
+#checkpoint_completion_target = 0.9	# checkpoint target duration, 0.0 - 1.0
 #checkpoint_flush_after = 0		# measured in pages, 0 disables
 #checkpoint_warning = 30s		# 0 disables
 
diff --git a/src/test/recovery/t/015_promotion_pages.pl b/src/test/recovery/t/015_promotion_pages.pl
index 6fb70b5001..25a9e4764a 100644
--- a/src/test/recovery/t/015_promotion_pages.pl
+++ b/src/test/recovery/t/015_promotion_pages.pl
@@ -26,7 +26,6 @@ my $bravo = get_new_node('bravo');
 $bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
 $bravo->append_conf('postgresql.conf', <<EOF);
 checkpoint_timeout=1h
-checkpoint_completion_target=0.9
 EOF
 $bravo->start;
 
-- 
2.25.1

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#19)
Re: Change default of checkpoint_completion_target

Stephen Frost <sfrost@snowman.net> writes:

In passing, I noticed that we have a lot of documentation like:

This parameter can only be set in the postgresql.conf file or on the
server command line.

... which hasn't been true since the introduction of ALTER SYSTEM.

Well, it's still true if you understand "the postgresql.conf file"
to cover whatever's included by postgresql.conf, notably
postgresql.auto.conf (and the include facility existed long before
that, too, so you needed the expanded interpretation even then).
Still, I take your point that it's confusing.

I like your suggestion of shortening all of these to be "can only be set
at server start", or maybe better "cannot be changed after server start".
I'm not sure whether or not we really need new text elsewhere; I think
section 20.1 is pretty long already.

regards, tom lane

#21japin
japinli@hotmail.com
In reply to: Stephen Frost (#19)
Re: Change default of checkpoint_completion_target

On Wed, 20 Jan 2021 at 03:47, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

Any further comments or thoughts on this one?

This:

+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across the entire checkpoint timeout period of time,

is confusing because 0.9 is obviously not 1.0; people will wonder
whether the scale is something strange or the text is just wrong.
They will also wonder why not use 1.0 instead. So perhaps more like

... The default is 0.9, which spreads the checkpoint across almost
all the available interval, providing fairly consistent I/O load
while also leaving some slop for checkpoint completion overhead.

The other chunk of text seems accurate, but there's no reason to let
this one be misleading.

Good point, updated along those lines.

In passing, I noticed that we have a lot of documentation like:

This parameter can only be set in the postgresql.conf file or on the
server command line.

... which hasn't been true since the introduction of ALTER SYSTEM. I
don't really think it's this patch's job to clean that up but it doesn't
seem quite right that we don't include ALTER SYSTEM in that list either.
If this was C code, maybe we could get away with just changing such
references as we find them, but I don't think we'd want the
documentation to be in an inconsistent state regarding that.

I have already mentioned this in [1]; however, it seems unattractive.

[1]: /messages/by-id/199703E4-A795-4FB8-911C-D0DE9F51519C@hotmail.com

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#22David Steele
david@pgmasters.net
In reply to: Stephen Frost (#19)
Re: Change default of checkpoint_completion_target

On 1/19/21 2:47 PM, Stephen Frost wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

Any further comments or thoughts on this one?

This:

+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across the entire checkpoint timeout period of time,

is confusing because 0.9 is obviously not 1.0; people will wonder
whether the scale is something strange or the text is just wrong.
They will also wonder why not use 1.0 instead. So perhaps more like

... The default is 0.9, which spreads the checkpoint across almost
all the available interval, providing fairly consistent I/O load
while also leaving some slop for checkpoint completion overhead.

The other chunk of text seems accurate, but there's no reason to let
this one be misleading.

Good point, updated along those lines.

I had a look at the patch and the change and new documentation seem
sensible to me.

I think this phrase may be a bit too idiomatic:

+ consistent I/O load while also leaving some slop for checkpoint

Perhaps just:

+ consistent I/O load while also leaving some time for checkpoint

It seems to me that the discussion about changing the wording for GUCs
not changeable after server start should be saved for another patch as long as
this patch follows the current convention.

Regards,
--
-David
david@pgmasters.net

#23Stephen Frost
sfrost@snowman.net
In reply to: David Steele (#22)
1 attachment(s)
Re: Change default of checkpoint_completion_target

Greetings,

* David Steele (david@pgmasters.net) wrote:

I had a look at the patch and the change and new documentation seem sensible
to me.

Thanks!

I think this phrase may be a bit too idiomatic:

+ consistent I/O load while also leaving some slop for checkpoint

Perhaps just:

+ consistent I/O load while also leaving some time for checkpoint

Yeah, good thought, updated.

It seems to me that the discussion about changing the wording for GUCs not
changeable after server start should be saved for another patch as long as this
patch follows the current convention.

Agreed.

Unless there's anything further on this, I'll plan to commit it tomorrow
or Wednesday.

Thanks!

Stephen

Attachments:

cct_def_v5.patch (text/x-diff; charset=us-ascii)
From 3ebe08dee4b9dfe2dff51fd1bad2eb36834e82ed Mon Sep 17 00:00:00 2001
From: Stephen Frost <sfrost@snowman.net>
Date: Tue, 19 Jan 2021 13:53:34 -0500
Subject: [PATCH] Change the default of checkpoint_completion_target to 0.9

Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
---
 doc/src/sgml/config.sgml                      | 12 ++++++--
 doc/src/sgml/wal.sgml                         | 29 ++++++++++++-------
 src/backend/utils/misc/guc.c                  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/test/recovery/t/015_promotion_pages.pl    |  1 -
 5 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5679b40dd5..44763f0180 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3302,9 +3302,15 @@ include_dir 'conf.d'
       <listitem>
        <para>
         Specifies the target of checkpoint completion, as a fraction of
-        total time between checkpoints. The default is 0.5.
-        This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.
+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across almost all of the available interval, providing fairly
+        consistent I/O load while also leaving some time for checkpoint
+        completion overhead.  Reducing this parameter is not recommended as that
+        causes the I/O from the checkpoint to have to complete faster, resulting
+        in a higher I/O rate, while then having a period of less I/O between the
+        completion of the checkpoint and the start of the next scheduled
+        checkpoint.  This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index ae4a3c1293..4354051c7b 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -571,22 +571,29 @@
    writing dirty buffers during a checkpoint is spread over a period of time.
    That period is controlled by
    <xref linkend="guc-checkpoint-completion-target"/>, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   <varname>checkpoint_timeout</varname>).
    The I/O rate is adjusted so that the checkpoint finishes when the
    given fraction of
    <varname>checkpoint_timeout</varname> seconds have elapsed, or before
    <varname>max_wal_size</varname> is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
    <productname>PostgreSQL</productname> can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase <varname>checkpoint_completion_target</varname>
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
+   a bit before the next scheduled checkpoint (at around 90% of the last checkpoint's
+   duration).  This spreads out the I/O as much as possible to have the I/O load be
+   consistent during the checkpoint.  The disadvantage of this is that prolonging
+   checkpoints affects recovery time, because more WAL segments will need to be kept
+   around for possible use in recovery.  A user concerned about the amount of time
+   required to recover might wish to reduce <varname>checkpoint_timeout</varname>,
+   causing checkpoints to happen more frequently while still spreading out the I/O
+   from each checkpoint.  Alternatively,
+   <varname>checkpoint_completion_target</varname> could be reduced, but this would
+   result in times of more intense I/O (during the checkpoint) and times of less I/O
+   (after the checkpoint completed but before the next scheduled checkpoint) and
+   therefore is not recommended.
+   Although <varname>checkpoint_completion_target</varname> could be set as high as
+   1.0, it is best to keep it less than that (such as at the default of 0.9, at most)
+   since checkpoints include some other activities besides writing dirty buffers.
    A setting of 1.0 is quite likely to result in checkpoints not being
    completed on time, which would result in performance loss due to
    unexpected variation in the number of WAL segments needed.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3b36a31a47..5fe7f2fa01 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3727,7 +3727,7 @@ static struct config_real ConfigureNamesReal[] =
 			NULL
 		},
 		&CheckPointCompletionTarget,
-		0.5, 0.0, 1.0,
+		0.9, 0.0, 1.0,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 86425965d0..68f95bdfff 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -231,7 +231,7 @@
 #checkpoint_timeout = 5min		# range 30s-1d
 #max_wal_size = 1GB
 #min_wal_size = 80MB
-#checkpoint_completion_target = 0.5	# checkpoint target duration, 0.0 - 1.0
+#checkpoint_completion_target = 0.9	# checkpoint target duration, 0.0 - 1.0
 #checkpoint_flush_after = 0		# measured in pages, 0 disables
 #checkpoint_warning = 30s		# 0 disables
 
diff --git a/src/test/recovery/t/015_promotion_pages.pl b/src/test/recovery/t/015_promotion_pages.pl
index 6fb70b5001..25a9e4764a 100644
--- a/src/test/recovery/t/015_promotion_pages.pl
+++ b/src/test/recovery/t/015_promotion_pages.pl
@@ -26,7 +26,6 @@ my $bravo = get_new_node('bravo');
 $bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
 $bravo->append_conf('postgresql.conf', <<EOF);
 checkpoint_timeout=1h
-checkpoint_completion_target=0.9
 EOF
 $bravo->start;
 
-- 
2.27.0

#24Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#23)
Re: Change default of checkpoint_completion_target

On Mon, Mar 22, 2021 at 01:11:00PM -0400, Stephen Frost wrote:

Unless there's anything further on this, I'll plan to commit it tomorrow
or Wednesday.

Cool, looks fine to me.

This version of the patch has forgotten to update one spot:
src/backend/postmaster/checkpointer.c:double CheckPointCompletionTarget = 0.5;
--
Michael

#25Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#24)
1 attachment(s)
Re: Change default of checkpoint_completion_target

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Mon, Mar 22, 2021 at 01:11:00PM -0400, Stephen Frost wrote:

Unless there's anything further on this, I'll plan to commit it tomorrow
or Wednesday.

Cool, looks fine to me.

This version of the patch has forgotten to update one spot:
src/backend/postmaster/checkpointer.c:double CheckPointCompletionTarget = 0.5;

Hah! Indeed!

Fixed in the attached.

Thanks!

Stephen

Attachments:

cct_def_v6.patch (text/x-diff; charset=us-ascii)
From 1c69cca6fc9bbd921f873cb208ffcdbd68bde586 Mon Sep 17 00:00:00 2001
From: Stephen Frost <sfrost@snowman.net>
Date: Tue, 19 Jan 2021 13:53:34 -0500
Subject: [PATCH] Change checkpoint_completion_target default to 0.9

Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
---
 doc/src/sgml/config.sgml                      | 12 ++++++--
 doc/src/sgml/wal.sgml                         | 29 ++++++++++++-------
 src/backend/postmaster/checkpointer.c         |  2 +-
 src/backend/utils/misc/guc.c                  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/test/recovery/t/015_promotion_pages.pl    |  1 -
 6 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5679b40dd5..44763f0180 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3302,9 +3302,15 @@ include_dir 'conf.d'
       <listitem>
        <para>
         Specifies the target of checkpoint completion, as a fraction of
-        total time between checkpoints. The default is 0.5.
-        This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.
+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across almost all of the available interval, providing fairly
+        consistent I/O load while also leaving some time for checkpoint
+        completion overhead.  Reducing this parameter is not recommended as that
+        causes the I/O from the checkpoint to have to complete faster, resulting
+        in a higher I/O rate, while then having a period of less I/O between the
+        completion of the checkpoint and the start of the next scheduled
+        checkpoint.  This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index ae4a3c1293..4354051c7b 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -571,22 +571,29 @@
    writing dirty buffers during a checkpoint is spread over a period of time.
    That period is controlled by
    <xref linkend="guc-checkpoint-completion-target"/>, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   <varname>checkpoint_timeout</varname>).
    The I/O rate is adjusted so that the checkpoint finishes when the
    given fraction of
    <varname>checkpoint_timeout</varname> seconds have elapsed, or before
    <varname>max_wal_size</varname> is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
    <productname>PostgreSQL</productname> can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase <varname>checkpoint_completion_target</varname>
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
+   a bit before the next scheduled checkpoint (at around 90% of the last checkpoint's
+   duration).  This spreads out the I/O as much as possible to have the I/O load be
+   consistent during the checkpoint.  The disadvantage of this is that prolonging
+   checkpoints affects recovery time, because more WAL segments will need to be kept
+   around for possible use in recovery.  A user concerned about the amount of time
+   required to recover might wish to reduce <varname>checkpoint_timeout</varname>,
+   causing checkpoints to happen more frequently while still spreading out the I/O
+   from each checkpoint.  Alternatively,
+   <varname>checkpoint_completion_target</varname> could be reduced, but this would
+   result in times of more intense I/O (during the checkpoint) and times of less I/O
+   (after the checkpoint completed but before the next scheduled checkpoint) and
+   therefore is not recommended.
+   Although <varname>checkpoint_completion_target</varname> could be set as high as
+   1.0, it is best to keep it less than that (such as at the default of 0.9, at most)
+   since checkpoints include some other activities besides writing dirty buffers.
    A setting of 1.0 is quite likely to result in checkpoints not being
    completed on time, which would result in performance loss due to
    unexpected variation in the number of WAL segments needed.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5907a7befc..e7e6a2a459 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -145,7 +145,7 @@ static CheckpointerShmemStruct *CheckpointerShmem;
  */
 int			CheckPointTimeout = 300;
 int			CheckPointWarning = 30;
-double		CheckPointCompletionTarget = 0.5;
+double		CheckPointCompletionTarget = 0.9;
 
 /*
  * Private state
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3b36a31a47..5fe7f2fa01 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3727,7 +3727,7 @@ static struct config_real ConfigureNamesReal[] =
 			NULL
 		},
 		&CheckPointCompletionTarget,
-		0.5, 0.0, 1.0,
+		0.9, 0.0, 1.0,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 86425965d0..68f95bdfff 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -231,7 +231,7 @@
 #checkpoint_timeout = 5min		# range 30s-1d
 #max_wal_size = 1GB
 #min_wal_size = 80MB
-#checkpoint_completion_target = 0.5	# checkpoint target duration, 0.0 - 1.0
+#checkpoint_completion_target = 0.9	# checkpoint target duration, 0.0 - 1.0
 #checkpoint_flush_after = 0		# measured in pages, 0 disables
 #checkpoint_warning = 30s		# 0 disables
 
diff --git a/src/test/recovery/t/015_promotion_pages.pl b/src/test/recovery/t/015_promotion_pages.pl
index 6fb70b5001..25a9e4764a 100644
--- a/src/test/recovery/t/015_promotion_pages.pl
+++ b/src/test/recovery/t/015_promotion_pages.pl
@@ -26,7 +26,6 @@ my $bravo = get_new_node('bravo');
 $bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
 $bravo->append_conf('postgresql.conf', <<EOF);
 checkpoint_timeout=1h
-checkpoint_completion_target=0.9
 EOF
 $bravo->start;
 
-- 
2.27.0

#26Bossart, Nathan
bossartn@amazon.com
In reply to: Stephen Frost (#25)
Re: Change default of checkpoint_completion_target

LGTM. I just have a few small wording suggestions.

+        completion overhead.  Reducing this parameter is not recommended as that
+        causes the I/O from the checkpoint to have to complete faster, resulting
+        in a higher I/O rate, while then having a period of less I/O between the
+        completion of the checkpoint and the start of the next scheduled
+        checkpoint.  This parameter can only be set in the

Reducing this parameter is not recommended because it forces the
checkpoint to complete faster. This results in a higher rate of I/O
during the checkpoint followed by a period of less I/O between
checkpoint completion and the next scheduled checkpoint.

+   duration).  This spreads out the I/O as much as possible to have the I/O load be
+   consistent during the checkpoint.  The disadvantage of this is that prolonging

This spreads out the I/O as much as possible so that the checkpoint
I/O load is consistent throughout the checkpoint interval.

+   around for possible use in recovery.  A user concerned about the amount of time
+   required to recover might wish to reduce <varname>checkpoint_timeout</varname>,
+   causing checkpoints to happen more frequently while still spreading out the I/O
+   from each checkpoint.  Alternatively,

A user concerned about the amount of time required to recover might
wish to reduce checkpoint_timeout so that checkpoints occur more
frequently but still spread the I/O across the checkpoint interval.

+   Although <varname>checkpoint_completion_target</varname> could be set as high as
+   1.0, it is best to keep it less than that (such as at the default of 0.9, at most)
+   since checkpoints include some other activities besides writing dirty buffers.

Although checkpoint_completion_target can be set as high as 1.0, it is
typically recommended to set it to no higher than 0.9 (the default)
since checkpoints include some other activities besides writing dirty
buffers.

Nathan

#27Bruce Momjian
bruce@momjian.us
In reply to: Bossart, Nathan (#26)
Re: Change default of checkpoint_completion_target

On Tue, Mar 23, 2021 at 06:24:07PM +0000, Bossart, Nathan wrote:

LGTM. I just have a few small wording suggestions.

+        completion overhead.  Reducing this parameter is not recommended as that
+        causes the I/O from the checkpoint to have to complete faster, resulting
+        in a higher I/O rate, while then having a period of less I/O between the
+        completion of the checkpoint and the start of the next scheduled
+        checkpoint.  This parameter can only be set in the

Reducing this parameter is not recommended because it forces the
checkpoint to complete faster. This results in a higher rate of I/O
during the checkpoint followed by a period of less I/O between
checkpoint completion and the next scheduled checkpoint.

FYI, I am very happy this issue is being addressed for PG 14. :-)

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#28Stephen Frost
sfrost@snowman.net
In reply to: Bossart, Nathan (#26)
1 attachment(s)
Re: Change default of checkpoint_completion_target

Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:

LGTM. I just have a few small wording suggestions.

Agreed, those looked like good suggestions and so I've incorporated
them.

Updated patch attached.

Thanks!

Stephen

Attachments:

cct_def_v7.patch (text/x-diff; charset=us-ascii)
From 40a529bc0a129e90c9917c1a3df2297ac7f2e073 Mon Sep 17 00:00:00 2001
From: Stephen Frost <sfrost@snowman.net>
Date: Tue, 19 Jan 2021 13:53:34 -0500
Subject: [PATCH] Change checkpoint_completion_target default to 0.9

Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele,
Nathan Bossart
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
---
 doc/src/sgml/config.sgml                      | 12 ++++++--
 doc/src/sgml/wal.sgml                         | 29 ++++++++++++-------
 src/backend/postmaster/checkpointer.c         |  2 +-
 src/backend/utils/misc/guc.c                  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/test/recovery/t/015_promotion_pages.pl    |  1 -
 6 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5679b40dd5..0d101f65f6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3302,9 +3302,15 @@ include_dir 'conf.d'
       <listitem>
        <para>
         Specifies the target of checkpoint completion, as a fraction of
-        total time between checkpoints. The default is 0.5.
-        This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.
+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across almost all of the available interval, providing fairly
+        consistent I/O load while also leaving some time for checkpoint
+        completion overhead.  Reducing this parameter is not recommended because
+        it causes the checkpoint to complete faster.  This results in a higher
+        rate of I/O during the checkpoint followed by a period of less I/O between
+        the checkpoint completion and the next scheduled checkpoint.  This
+        parameter can only be set in the <filename>postgresql.conf</filename> file
+        or on the server command line.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index ae4a3c1293..7d48f42710 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -571,22 +571,29 @@
    writing dirty buffers during a checkpoint is spread over a period of time.
    That period is controlled by
    <xref linkend="guc-checkpoint-completion-target"/>, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   <varname>checkpoint_timeout</varname>).
    The I/O rate is adjusted so that the checkpoint finishes when the
    given fraction of
    <varname>checkpoint_timeout</varname> seconds have elapsed, or before
    <varname>max_wal_size</varname> is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
    <productname>PostgreSQL</productname> can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase <varname>checkpoint_completion_target</varname>
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
+   a bit before the next scheduled checkpoint (at around 90% of the last checkpoint's
+   duration).  This spreads out the I/O as much as possible so that the checkpoint
+   I/O load is consistent throughout the checkpoint interval.  The disadvantage of
+   this is that prolonging checkpoints affects recovery time, because more WAL
+   segments will need to be kept around for possible use in recovery.  A user
+   concerned about the amount of time required to recover might wish to reduce
+   <varname>checkpoint_timeout</varname> so that checkpoints occur more frequently
+   but still spread the I/O across the checkpoint interval.  Alternatively,
+   <varname>checkpoint_completion_target</varname> could be reduced, but this would
+   result in times of more intense I/O (during the checkpoint) and times of less I/O
+   (after the checkpoint completed but before the next scheduled checkpoint) and
+   therefore is not recommended.
+   Although <varname>checkpoint_completion_target</varname> could be set as high as
+   1.0, it is typically recommended to set it to no higher than 0.9 (the default)
+   since checkpoints include some other activities besides writing dirty buffers.
    A setting of 1.0 is quite likely to result in checkpoints not being
    completed on time, which would result in performance loss due to
    unexpected variation in the number of WAL segments needed.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5907a7befc..e7e6a2a459 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -145,7 +145,7 @@ static CheckpointerShmemStruct *CheckpointerShmem;
  */
 int			CheckPointTimeout = 300;
 int			CheckPointWarning = 30;
-double		CheckPointCompletionTarget = 0.5;
+double		CheckPointCompletionTarget = 0.9;
 
 /*
  * Private state
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3b36a31a47..5fe7f2fa01 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3727,7 +3727,7 @@ static struct config_real ConfigureNamesReal[] =
 			NULL
 		},
 		&CheckPointCompletionTarget,
-		0.5, 0.0, 1.0,
+		0.9, 0.0, 1.0,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 86425965d0..68f95bdfff 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -231,7 +231,7 @@
 #checkpoint_timeout = 5min		# range 30s-1d
 #max_wal_size = 1GB
 #min_wal_size = 80MB
-#checkpoint_completion_target = 0.5	# checkpoint target duration, 0.0 - 1.0
+#checkpoint_completion_target = 0.9	# checkpoint target duration, 0.0 - 1.0
 #checkpoint_flush_after = 0		# measured in pages, 0 disables
 #checkpoint_warning = 30s		# 0 disables
 
diff --git a/src/test/recovery/t/015_promotion_pages.pl b/src/test/recovery/t/015_promotion_pages.pl
index 6fb70b5001..25a9e4764a 100644
--- a/src/test/recovery/t/015_promotion_pages.pl
+++ b/src/test/recovery/t/015_promotion_pages.pl
@@ -26,7 +26,6 @@ my $bravo = get_new_node('bravo');
 $bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
 $bravo->append_conf('postgresql.conf', <<EOF);
 checkpoint_timeout=1h
-checkpoint_completion_target=0.9
 EOF
 $bravo->start;
 
-- 
2.27.0

#29Bossart, Nathan
bossartn@amazon.com
In reply to: Stephen Frost (#28)
Re: Change default of checkpoint_completion_target

On 3/23/21, 12:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

* Bossart, Nathan (bossartn@amazon.com) wrote:

LGTM. I just have a few small wording suggestions.

Agreed, those looked like good suggestions and so I've incorporated
them.

Updated patch attached.

Looks good!

Nathan

#30Stephen Frost
sfrost@snowman.net
In reply to: Bossart, Nathan (#29)
Re: Change default of checkpoint_completion_target

Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:

On 3/23/21, 12:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

* Bossart, Nathan (bossartn@amazon.com) wrote:

LGTM. I just have a few small wording suggestions.

Agreed, those looked like good suggestions and so I've incorporated
them.

Updated patch attached.

Looks good!

Great, pushed! Thanks to everyone for your thoughts, comments,
suggestions, and improvements.

Stephen
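
For readers following the documentation wording above, the arithmetic behind
"a higher rate of I/O during the checkpoint followed by a period of less I/O"
can be sketched as below. This is only an illustration, not PostgreSQL code:
the function names and the 1024 MB of dirty data are made up for the example,
and the model assumes checkpoints are triggered purely by checkpoint_timeout
(it ignores max_wal_size and the non-write work a checkpoint also does).

```python
def checkpoint_windows(checkpoint_timeout_s, completion_target):
    """Split one checkpoint interval into (write window, quiet window) in seconds.

    Hypothetical helper: models only the timeout-driven case described in
    the patched documentation.
    """
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("checkpoint_completion_target must be within 0.0 - 1.0")
    write_s = checkpoint_timeout_s * completion_target
    return write_s, checkpoint_timeout_s - write_s


def average_write_rate(dirty_mb, checkpoint_timeout_s, completion_target):
    """Average MB/s needed to flush dirty_mb inside the write window."""
    write_s, _ = checkpoint_windows(checkpoint_timeout_s, completion_target)
    if write_s == 0:
        raise ValueError("a target of 0.0 leaves no write window to average over")
    return dirty_mb / write_s


if __name__ == "__main__":
    timeout_s = 300      # checkpoint_timeout = 5min, the sample-file default
    dirty_mb = 1024      # assumed amount of dirty data, purely illustrative
    for target in (0.5, 0.9):
        write_s, quiet_s = checkpoint_windows(timeout_s, target)
        rate = average_write_rate(dirty_mb, timeout_s, target)
        print(f"target={target}: write for {write_s:.0f}s, "
              f"quiet for {quiet_s:.0f}s, about {rate:.2f} MB/s")
```

Under these assumptions, the old default of 0.5 needs an average write rate
1.8 times that of 0.9 for the same amount of dirty data, and then leaves half
of the interval with no checkpoint writes at all.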