Change default of checkpoint_completion_target

Started by Stephen Frost over 5 years ago; 30 messages; hackers
#1Stephen Frost
sfrost@snowman.net

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

You keep making this statement, and I don't necessarily disagree, but if
that is the case, please explain why don't we have
checkpoint_completion_target set to 0.9 by default? Should we change
that?

Yes, I do think we should change that..

Agreed. FWIW, no idea for others, but it is one of those parameters I
keep telling to update after a default installation.

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

Passes regression tests and doc build. Will register in the January
commitfest as Needs Review.

Thanks,

Stephen

Attachments:

cct_def_v1.patch (text/x-diff; charset=us-ascii), +26 -15
#2Peter Eisentraut
peter_e@gmx.net
In reply to: Stephen Frost (#1)
Re: Change default of checkpoint_completion_target

On 2020-12-07 18:53, Stephen Frost wrote:

* Michael Paquier (michael@paquier.xyz) wrote:

On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

You keep making this statement, and I don't necessarily disagree, but if
that is the case, please explain why don't we have
checkpoint_completion_target set to 0.9 by default? Should we change
that?

Yes, I do think we should change that..

Agreed. FWIW, no idea for others, but it is one of those parameters I
keep telling to update after a default installation.

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

I agree with considering this change, but I wonder why the value 0.9.
Why not, say, 0.95, 0.99, or 1.0?

#3Stephen Frost
sfrost@snowman.net
In reply to: Peter Eisentraut (#2)
Re: Change default of checkpoint_completion_target

Greetings,

* Peter Eisentraut (peter.eisentraut@enterprisedb.com) wrote:

On 2020-12-07 18:53, Stephen Frost wrote:

* Michael Paquier (michael@paquier.xyz) wrote:

On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

You keep making this statement, and I don't necessarily disagree, but if
that is the case, please explain why don't we have
checkpoint_completion_target set to 0.9 by default? Should we change
that?

Yes, I do think we should change that..

Agreed. FWIW, no idea for others, but it is one of those parameters I
keep telling to update after a default installation.

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

I agree with considering this change, but I wonder why the value 0.9. Why
not, say, 0.95, 0.99, or 1.0?

The documentation (which my patch updates to match the new default)
covers this pretty well here:

https://www.postgresql.org/docs/current/wal-configuration.html

"Although checkpoint_completion_target can be set as high as 1.0, it is
best to keep it less than that (perhaps 0.9 at most) since checkpoints
include some other activities besides writing dirty buffers. A setting
of 1.0 is quite likely to result in checkpoints not being completed on
time, which would result in performance loss due to unexpected variation
in the number of WAL segments needed."

Thanks,

Stephen
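
As a rough illustration of the arithmetic behind the documentation quoted above (this is not code from the patch), the sketch below computes how long the checkpointer would aim to spread its writes for a purely time-driven checkpoint. The helper name `write_window_seconds` is hypothetical, 300 seconds is assumed for checkpoint_timeout (the stock 5min default), and checkpoints triggered by WAL volume are ignored.

```python
def write_window_seconds(checkpoint_timeout_s: float, completion_target: float) -> float:
    """Seconds over which a timed checkpoint aims to spread its buffer writes.

    Hypothetical helper for illustration only: it models timeout-driven
    checkpoints and ignores checkpoints triggered by WAL volume.
    """
    if checkpoint_timeout_s <= 0:
        raise ValueError("checkpoint_timeout_s must be positive")
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("completion_target must be between 0.0 and 1.0")
    return checkpoint_timeout_s * completion_target


if __name__ == "__main__":
    timeout_s = 300.0  # assumption: checkpoint_timeout = 5min
    for target in (0.5, 0.9, 1.0):
        window = write_window_seconds(timeout_s, target)
        slack = timeout_s - window  # time left for the other checkpoint work
        print(f"target={target}: writes spread over {window:.0f}s, {slack:.0f}s left over")
```

Under these assumed numbers, a target of 0.5 finishes the writes about halfway through the interval, 0.9 leaves roughly 30 seconds for the "other activities" the documentation mentions, and 1.0 leaves no slack at all.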

#4Nathan Bossart
nathandbossart@gmail.com
In reply to: Stephen Frost (#3)
Re: Change default of checkpoint_completion_target

On 12/7/20, 9:53 AM, "Stephen Frost" <sfrost@snowman.net> wrote:

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

+1 to setting checkpoint_completion_target to 0.9 by default.

Nathan

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#4)
Re: Change default of checkpoint_completion_target

"Bossart, Nathan" <bossartn@amazon.com> writes:

On 12/7/20, 9:53 AM, "Stephen Frost" <sfrost@snowman.net> wrote:

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

+1 to setting checkpoint_completion_target to 0.9 by default.

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that? If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

regards, tom lane

#6Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#5)
Re: Change default of checkpoint_completion_target

On Tue, Dec 8, 2020 at 6:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Bossart, Nathan" <bossartn@amazon.com> writes:

On 12/7/20, 9:53 AM, "Stephen Frost" <sfrost@snowman.net> wrote:

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

+1 to setting checkpoint_completion_target to 0.9 by default.

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that? If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

+1.

I think there are plenty of cases where the value doesn't really matter,
but when it does matter, I'm not sure what the case would be where something
else would actually be better.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Nathan Bossart (#4)
Re: Change default of checkpoint_completion_target

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

While we are at it, could we change the default of "log_lock_waits" to "on"?

Yours,
Laurenz Albe

#8Stephen Frost
sfrost@snowman.net
In reply to: Laurenz Albe (#7)
Re: Change default of checkpoint_completion_target

Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

While we are at it, could we change the default of "log_lock_waits" to "on"?

While I agree that it'd be good to change quite a few of the log_X items
to be 'on' by default, I'm not planning to work on this.

Thanks,

Stephen

Attachments:

cct_def_v2.patch (text/x-diff; charset=us-ascii), +45 -74
#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Stephen Frost (#8)
Re: Change default of checkpoint_completion_target

Howdy,

On 2020-Dec-10, Stephen Frost wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

I think we should leave a doc stub or at least an <indexterm>, to let
people know the GUC has been removed rather than just making it
completely invisible. (Maybe piggyback on the stuff in [1]?)

[1]: /messages/by-id/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@mail.gmail.com

#10Stephen Frost
sfrost@snowman.net
In reply to: Alvaro Herrera (#9)
Re: Change default of checkpoint_completion_target

Greetings,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

On 2020-Dec-10, Stephen Frost wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

I think we should leave a doc stub or at least an <indexterm>, to let
people know the GUC has been removed rather than just making it
completely invisible. (Maybe piggyback on the stuff in [1]?)

[1] /messages/by-id/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@mail.gmail.com

Yes, I agree, and am involved in that thread as well- currently waiting
feedback from others about the proposed approach.

Getting a few more people looking at that thread and commenting on it
would really help us be able to move forward.

Thanks,

Stephen

#11Stephen Frost
sfrost@snowman.net
In reply to: Stephen Frost (#10)
Re: Change default of checkpoint_completion_target

Greetings,

* Stephen Frost (sfrost@snowman.net) wrote:

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

On 2020-Dec-10, Stephen Frost wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:

+1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

I think we should leave a doc stub or at least an <indexterm>, to let
people know the GUC has been removed rather than just making it
completely invisible. (Maybe piggyback on the stuff in [1]?)

[1] /messages/by-id/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@mail.gmail.com

Yes, I agree, and am involved in that thread as well- currently waiting
feedback from others about the proposed approach.

I've tried to push that forward. I'm happy to update this patch once
we've got agreement to move forward on that, to wit, adding to an
'obsolete' section in the documentation information about this
particular GUC and how it's been removed due to not being sensible or
necessary to continue to have.

Getting a few more people looking at that thread and commenting on it
would really help us be able to move forward.

This is still the case though..

Thanks!

Stephen

#12Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#8)
Re: Change default of checkpoint_completion_target

On Thu, Dec 10, 2020 at 12:16:02PM -0500, Stephen Frost wrote:

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

All the references to checkpoint_completion_target are removed (except
for bgwriter.h as per the patch).

This is because it performs a checkpoint, and the I/O
-     required for the checkpoint will be spread out over a significant
-     period of time, by default half your inter-checkpoint interval
-     (see the configuration parameter
-     <xref linkend="guc-checkpoint-completion-target"/>).  This is
+     required for the checkpoint will be spread out over the inter-checkpoint
+     interval (see the configuration parameter
+     <xref linkend="guc-checkpoint-timeout"/>).  This is

It may be worth mentioning that this is spread across 90% of the last
checkpoint's duration instead.

-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase <varname>checkpoint_completion_target</varname>
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
-   A setting of 1.0 is quite likely to result in checkpoints not being
-   completed on time, which would result in performance loss due to
-   unexpected variation in the number of WAL segments needed.
+   This spreads out the I/O as much as possible to have the I/O load be consistent
+   during the checkpoint and generally throughout the operation of the system.  The
+   disadvantage of this is that prolonging checkpoints affects recovery time,
+   because more WAL segments will need to be kept around for possible use in recovery.
+   A user concerned about the amount of time required to recover might wish to reduce
+   <varname>checkpoint_timeout</varname>, causing checkpoints to happen more
+   frequently.
</para>

<para>

Again, this makes the description of the I/O spread more general,
removing the portion where half the time is used by default. Should
this stuff also mention the spread value of 90% instead?

* At a checkpoint, how many WAL segments to recycle as preallocated future
* XLOG segments? Returns the highest segment that should be preallocated.
@@ -8694,7 +8687,7 @@ UpdateCheckPointDistanceEstimate(uint64 nbytes)
*	CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
*	CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
*	CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
- *		ignoring checkpoint_completion_target parameter.
+ *		ignoring the CheckPointCompletionTarget.

s/the//?

* be a large gap between a checkpoint's redo-pointer and the checkpoint
* record itself, and we only start the restartpoint after we've seen the
* checkpoint record. (The gap is typically up to CheckPointSegments *
-	 * checkpoint_completion_target where checkpoint_completion_target is the
+	 * CheckPointCompletionTarget where CheckPointCompletionTarget is the
* value that was in effect when the WAL was generated).

The last part of this sentence does not make sense.
CheckPointCompletionTarget becomes a constant with this patch.

if (RecoveryInProgress())
@@ -903,7 +902,7 @@ CheckpointerShmemInit(void)
*	CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
*	CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
*	CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
- *		ignoring checkpoint_completion_target parameter.
+ *		ignoring the CheckPointCompletionTarget.

s/the//?

+ * CheckPointCompletionTarget used to be exposed as a GUC named
+ * checkpoint_completion_target, but there's little evidence to suggest that
+ * there's actually a case for it being a different value, so it's no longer
+ * exposed as a GUC to be configured.

I would just remove this paragraph.
--
Michael

#13Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#5)
Re: Change default of checkpoint_completion_target

Hi,

On 2020-12-08 12:41:35 -0500, Tom Lane wrote:

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that? If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

I like the idea of getting rid of it too, but I think we should consider
evaluating the concrete hard-coded value a bit more carefully than just
going for 0.9 based on some old recommendations in the docs. It not
being changeable afterwards...

I think it might be a good idea to immediately change the default to
0.9, and concurrently try to evaluate whether it's really the best value
(vs 0.95, 1 or ...).

FWIW I have seen a few cases in the past where setting the target to
something very small helped, but I think that was mostly because we
didn't yet tell the kernel to flush dirty data more aggressively.

Greetings,

Andres Freund

#14Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#13)
Re: Change default of checkpoint_completion_target

On 1/15/21 10:51 PM, Andres Freund wrote:

Hi,

On 2020-12-08 12:41:35 -0500, Tom Lane wrote:

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that? If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

I like the idea of getting rid of it too, but I think we should consider
evaluating the concrete hard-coded value a bit more carefully than just
going for 0.9 based on some old recommendations in the docs. It not
being changeable afterwards...

I think it might be a good idea to immediately change the default to
0.9, and concurrently try to evaluate whether it's really the best value
(vs 0.95, 1 or ...).

FWIW I have seen a few cases in the past where setting the target to
something very small helped, but I think that was mostly because we
didn't yet tell the kernel to flush dirty data more aggressively.

Yeah. The flushing probably makes that mostly unnecessary, but we still
allow disabling that. I'm not really convinced replacing it with a
compile-time #define is a good idea, exactly because it can't be changed
if needed.

As for the exact value, maybe the right solution is to make it dynamic.
The usual approach is to leave "enough time" for the kernel to flush
dirty data, so we could say 60 seconds and calculate the exact target
depending on the checkpoint_timeout.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
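
One possible reading of the "leave enough time" idea above can be sketched as follows. Nothing like this exists in PostgreSQL or in the patches in this thread: the function name `dynamic_completion_target` and the clamping to the range 0.0 to 1.0 are assumptions made purely for illustration, with the 60-second reserve taken from the message.

```python
def dynamic_completion_target(checkpoint_timeout_s: float, reserve_s: float = 60.0) -> float:
    """Completion target that leaves reserve_s seconds free at the end of the interval.

    Hypothetical sketch, not PostgreSQL code: the target is derived from
    checkpoint_timeout instead of being a fixed fraction such as 0.9.
    """
    if checkpoint_timeout_s <= 0:
        raise ValueError("checkpoint_timeout_s must be positive")
    if reserve_s < 0:
        raise ValueError("reserve_s must not be negative")
    target = (checkpoint_timeout_s - reserve_s) / checkpoint_timeout_s
    # Clamp (an assumption): a timeout shorter than the reserve yields 0.0.
    return max(0.0, min(1.0, target))


if __name__ == "__main__":
    for timeout_s in (30.0, 300.0, 1800.0):
        target = dynamic_completion_target(timeout_s)
        print(f"checkpoint_timeout={timeout_s:.0f}s -> target={target:.3f}")
```

With a fixed 60-second reserve, a 5-minute timeout works out to a target of 0.8 and a 30-minute timeout to roughly 0.97, so under this scheme the effective target would vary quite a bit with checkpoint_timeout.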

#15Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#14)
Re: Change default of checkpoint_completion_target

Hi,

On 2021-01-15 23:05:02 +0100, Tomas Vondra wrote:

Yeah. The flushing probably makes that mostly unnecessary, but we still
allow disabling that. I'm not really convinced replacing it with a
compile-time #define is a good idea, exactly because it can't be changed
if needed.

It's also not available everywhere...

As for the exact value, maybe the right solution is to make it dynamic.
The usual approach is to leave "enough time" for the kernel to flush
dirty data, so we could say 60 seconds and calculate the exact target
depending on the checkpoint_timeout.

IME the kernel flushing at some later time precisely is the problem,
because of the latency spikes that happen when it decides to do so. That
commonly starts to happen well before the fsyncs. The reason that
setting a very small checkpoint_completion_target can help is that it
condenses the period of unreliable performance into one short time,
rather than spreading it over the whole checkpoint...

Greetings,

Andres Freund

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Stephen Frost (#11)
Re: Change default of checkpoint_completion_target

On 2021-01-13 23:10, Stephen Frost wrote:

Yes, I agree, and am involved in that thread as well- currently waiting
feedback from others about the proposed approach.

I've tried to push that forward. I'm happy to update this patch once
we've got agreement to move forward on that, to wit, adding to an
'obsolete' section in the documentation information about this
particular GUC and how it's been removed due to not being sensible or
necessary to continue to have.

Some discussion a few days ago was arguing that it was still necessary
in some cases as a way to counteract the possible lack of tuning in the
kernel flushing behavior. I think in light of that we should go with
your first patch that just changes the default, possibly with the
documentation updated a bit.

#17Stephen Frost
sfrost@snowman.net
In reply to: Peter Eisentraut (#16)
Re: Change default of checkpoint_completion_target

Greetings,

* Peter Eisentraut (peter.eisentraut@enterprisedb.com) wrote:

On 2021-01-13 23:10, Stephen Frost wrote:

Yes, I agree, and am involved in that thread as well- currently waiting
feedback from others about the proposed approach.

I've tried to push that forward. I'm happy to update this patch once
we've got agreement to move forward on that, to wit, adding to an
'obsolete' section in the documentation information about this
particular GUC and how it's been removed due to not being sensible or
necessary to continue to have.

Some discussion a few days ago was arguing that it was still necessary in
some cases as a way to counteract the possible lack of tuning in the kernel
flushing behavior. I think in light of that we should go with your first
patch that just changes the default, possibly with the documentation updated
a bit.

Rebased and updated patch attached which moves back to just changing the
default instead of removing the option, with a more explicit call-out of
the '90%', as suggested by Michael on the other patch.

Any further comments or thoughts on this one?

Thanks,

Stephen

Attachments:

cct_def_v3.patch (text/x-diff; charset=us-ascii), +27 -16
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#17)
Re: Change default of checkpoint_completion_target

Stephen Frost <sfrost@snowman.net> writes:

Any further comments or thoughts on this one?

This:

+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across the entire checkpoint timeout period of time,

is confusing because 0.9 is obviously not 1.0; people will wonder
whether the scale is something strange or the text is just wrong.
They will also wonder why not use 1.0 instead. So perhaps more like

... The default is 0.9, which spreads the checkpoint across almost
all the available interval, providing fairly consistent I/O load
while also leaving some slop for checkpoint completion overhead.

The other chunk of text seems accurate, but there's no reason to let
this one be misleading.

regards, tom lane

#19Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#18)
Re: Change default of checkpoint_completion_target

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

Any further comments or thoughts on this one?

This:

+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across the entire checkpoint timeout period of time,

is confusing because 0.9 is obviously not 1.0; people will wonder
whether the scale is something strange or the text is just wrong.
They will also wonder why not use 1.0 instead. So perhaps more like

... The default is 0.9, which spreads the checkpoint across almost
all the available interval, providing fairly consistent I/O load
while also leaving some slop for checkpoint completion overhead.

The other chunk of text seems accurate, but there's no reason to let
this one be misleading.

Good point, updated along those lines.

In passing, I noticed that we have a lot of documentation like:

This parameter can only be set in the postgresql.conf file or on the
server command line.

... which hasn't been true since the introduction of ALTER SYSTEM. I
don't really think it's this patch's job to clean that up but it doesn't
seem quite right that we don't include ALTER SYSTEM in that list either.
If this was C code, maybe we could get away with just changing such
references as we find them, but I don't think we'd want the
documentation to be in an inconsistent state regarding that.

Anyone want to opine about what to do with that? Should we consider
changing those to mention ALTER SYSTEM? Or perhaps have a way of saying
"at server start" that then links to "how to set options at server
start", perhaps..

Thanks,

Stephen

Attachments:

cct_def_v4.patch (text/x-diff; charset=us-ascii), +29 -18
#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#19)
Re: Change default of checkpoint_completion_target

Stephen Frost <sfrost@snowman.net> writes:

In passing, I noticed that we have a lot of documentation like:

This parameter can only be set in the postgresql.conf file or on the
server command line.

... which hasn't been true since the introduction of ALTER SYSTEM.

Well, it's still true if you understand "the postgresql.conf file"
to cover whatever's included by postgresql.conf, notably
postgresql.auto.conf (and the include facility existed long before
that, too, so you needed the expanded interpretation even then).
Still, I take your point that it's confusing.

I like your suggestion of shortening all of these to be "can only be set
at server start", or maybe better "cannot be changed after server start".
I'm not sure whether or not we really need new text elsewhere; I think
section 20.1 is pretty long already.

regards, tom lane

#21Japin Li
japinli@hotmail.com
In reply to: Stephen Frost (#19)
#22David Steele
david@pgmasters.net
In reply to: Stephen Frost (#19)
#23Stephen Frost
sfrost@snowman.net
In reply to: David Steele (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#23)
#25Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#24)
#26Nathan Bossart
nathandbossart@gmail.com
In reply to: Stephen Frost (#25)
#27Bruce Momjian
bruce@momjian.us
In reply to: Nathan Bossart (#26)
#28Stephen Frost
sfrost@snowman.net
In reply to: Nathan Bossart (#26)
#29Nathan Bossart
nathandbossart@gmail.com
In reply to: Stephen Frost (#28)
#30Stephen Frost
sfrost@snowman.net
In reply to: Nathan Bossart (#29)