CHECKPOINT unlogged data

Started by Christoph Berg9 months ago34 messages

myon@debian.org

9 months ago

A customer reported to use CHECKPOINT before shutdowns to make the
shutdowns themselves faster and asked if it was possible to make
CHECKPOINT optionally also write out unlogged table data for that
purpose. I think the idea makes sense, so there's the patch.

Christoph

Andres Freund

andres@anarazel.de

9 months ago

In reply to: Christoph Berg (#1)

Re: CHECKPOINT unlogged data

Hi,

On 2025-05-30 18:17:45 +0200, Christoph Berg wrote:

A customer reported to use CHECKPOINT before shutdowns to make the
shutdowns themselves faster and asked if it was possible to make
CHECKPOINT optionally also write out unlogged table data for that
purpose. I think the idea makes sense, so there's the patch.

I've seen the need for this quite a few times, fwiw. So +1 to the idea.

Greetings,

Andres Freund

Christoph Berg

myon@debian.org

9 months ago

In reply to: Andres Freund (#2)

Re: CHECKPOINT unlogged data

Re: Andres Freund

Hi,

On 2025-05-30 18:17:45 +0200, Christoph Berg wrote:

A customer reported to use CHECKPOINT before shutdowns to make the
shutdowns themselves faster and asked if it was possible to make
CHECKPOINT optionally also write out unlogged table data for that
purpose. I think the idea makes sense, so there's the patch.

I've seen the need for this quite a few times, fwiw. So +1 to the idea.

Do we want to officially mention this use case in checkpoint.sgml? I've
already replaced the "is not intended for use during normal operation"
wording by "not required during normal operation", but we might go one
step further and endorse it.

Christoph

Andres Freund

andres@anarazel.de

9 months ago

In reply to: Christoph Berg (#3)

Re: CHECKPOINT unlogged data

Hi,

On May 30, 2025 12:55:28 PM EDT, Christoph Berg <myon@debian.org> wrote:

Re: Andres Freund

Hi,

On 2025-05-30 18:17:45 +0200, Christoph Berg wrote:

A customer reported to use CHECKPOINT before shutdowns to make the
shutdowns themselves faster and asked if it was possible to make
CHECKPOINT optionally also write out unlogged table data for that
purpose. I think the idea makes sense, so there's the patch.

I've seen the need for this quite a few times, fwiw. So +1 to the idea.

Do we want to officially mention this use case in checkpoint.sgml? I've
already replaced the "is not intended for use during normal operation"
wording by "not required during normal operation", but we might go one
step further and endorse it.

Yes, I think it's good to mention what it is useful for.

Greetings,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Nathan Bossart

nathandbossart@gmail.com

9 months ago

In reply to: Andres Freund (#2)

Re: CHECKPOINT unlogged data

On Fri, May 30, 2025 at 12:39:02PM -0400, Andres Freund wrote:

On 2025-05-30 18:17:45 +0200, Christoph Berg wrote:

A customer reported to use CHECKPOINT before shutdowns to make the
shutdowns themselves faster and asked if it was possible to make
CHECKPOINT optionally also write out unlogged table data for that
purpose. I think the idea makes sense, so there's the patch.

I've seen the need for this quite a few times, fwiw. So +1 to the idea.

This patch also adds an IMMEDIATE option, which I proposed some time ago
[0]: /messages/by-id/17A03557-CF5C-4D4B-B719-A1D98DD75E75@amazon.com
usefulness. FWIW I have no concerns about adding a few retail options to
CHECKPOINT, but others might balk at options without solid use-cases. The
unlogged table one seems reasonable enough.

[0]: /messages/by-id/17A03557-CF5C-4D4B-B719-A1D98DD75E75@amazon.com

--
nathan

Christoph Berg

myon@debian.org

9 months ago

In reply to: Nathan Bossart (#5)

Re: CHECKPOINT unlogged data

Re: Nathan Bossart

This patch also adds an IMMEDIATE option, which I proposed some time ago
[0]. I ended up withdrawing it due to general skepticism about its

Thanks for the pointer, I did not go that far back when looking for
older threads.

When writing the patch, I was also thinking about naming the option
"fast" or "spread" but ultimately went with "immediate" because that's
what the log message is using:

=# checkpoint;
2025-05-30 18:23:17.433 CEST [579834] LOG: Checkpoint beginnt: immediate force wait

SQL command "(options)" tend to be booleans, hence "immediate {on|off}".
Introducing two separate keywords "fast" and "spread" seemed
confusing, and there is no precedent for "fast=on" in other tools or
the replication protocol.

usefulness. FWIW I have no concerns about adding a few retail options to
CHECKPOINT, but others might balk at options without solid use-cases. The
unlogged table one seems reasonable enough.

I think the two options immediate and flush_all are actually useful in
combination for the shutdown case. If operation is to continue
normally until just before the shutdown, it might make sense to run
these 3 commands (or just #1 and #3):

checkpoint (flush_all, immediate false);
checkpoint (flush_all);
pg_ctl stop

(I also thought about a VERBOSE option, but since the checkpoint
messages are generated by a different process, it's probably harder
than I initially thought.)

Christoph

Andres Freund

andres@anarazel.de

9 months ago

In reply to: Christoph Berg (#6)

Re: CHECKPOINT unlogged data

On 2025-05-30 19:23:04 +0200, Christoph Berg wrote:

Re: Nathan Bossart

This patch also adds an IMMEDIATE option, which I proposed some time ago
[0]. I ended up withdrawing it due to general skepticism about its

Thanks for the pointer, I did not go that far back when looking for
older threads.

When writing the patch, I was also thinking about naming the option
"fast" or "spread" but ultimately went with "immediate" because that's
what the log message is using:

=# checkpoint;
2025-05-30 18:23:17.433 CEST [579834] LOG: Checkpoint beginnt: immediate force wait

SQL command "(options)" tend to be booleans, hence "immediate {on|off}".
Introducing two separate keywords "fast" and "spread" seemed
confusing, and there is no precedent for "fast=on" in other tools or
the replication protocol.

I'd add a 'mode' that can be set to an arbitrary string, which then can be
validated in C code. That seems more future proof.

Christoph Berg

myon@debian.org

9 months ago

In reply to: Andres Freund (#7)

Re: CHECKPOINT unlogged data

Re: Andres Freund

I'd add a 'mode' that can be set to an arbitrary string, which then can be
validated in C code. That seems more future proof.

Changed in the attached v2, thanks.

Christoph

Fujii Masao

masao.fujii@gmail.com

9 months ago

In reply to: Christoph Berg (#8)

Re: CHECKPOINT unlogged data

On 2025/06/06 19:03, Christoph Berg wrote:

Re: Andres Freund

I'd add a 'mode' that can be set to an arbitrary string, which then can be
validated in C code. That seems more future proof.

Changed in the attached v2, thanks.

When I applied the patch and compiled it, I got the following warnings:

utility.c:946:4: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
946 | CheckPointStmt *stmt = (CheckPointStmt *) parsetree;
| ^
utility.c:947:16: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
947 | ListCell *lc;
| ^
2 warnings generated.

			RequestCheckpoint(CHECKPOINT_WAIT |
+							  (immediate ? CHECKPOINT_IMMEDIATE : 0) |
+							  (flush_all ? CHECKPOINT_FLUSH_ALL : 0) |

Some users might want to trigger a spread checkpoint but not wait for
it to finish, since it could take a long time? If that's a valid use case,
maybe we should add a WAIT option to let users choose whether to wait for
the checkpoint to complete or not?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

#10

Christoph Berg

myon@debian.org

9 months ago

In reply to: Fujii Masao (#9)

Re: CHECKPOINT unlogged data

Re: Fujii Masao

utility.c:946:4: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]

Thanks, my compiler didn't throw that. { } block added in v3.

RequestCheckpoint(CHECKPOINT_WAIT |
+							  (immediate ? CHECKPOINT_IMMEDIATE : 0) |
+							  (flush_all ? CHECKPOINT_FLUSH_ALL : 0) |
Some users might want to trigger a spread checkpoint but not wait for
it to finish, since it could take a long time? If that's a valid use case,
maybe we should add a WAIT option to let users choose whether to wait for
the checkpoint to complete or not?

Do we want that? The checkpoint is only effective when it's finished,
and running `psql -c "checkpoint (wait false)"` might make people
shoot themselves into the foot.

Christoph

#11

Nathan Bossart

nathandbossart@gmail.com

9 months ago

In reply to: Christoph Berg (#10)

Re: CHECKPOINT unlogged data

On Fri, Jun 06, 2025 at 04:26:56PM +0200, Christoph Berg wrote:

Re: Fujii Masao

Some users might want to trigger a spread checkpoint but not wait for
it to finish, since it could take a long time? If that's a valid use case,
maybe we should add a WAIT option to let users choose whether to wait for
the checkpoint to complete or not?

Do we want that? The checkpoint is only effective when it's finished,
and running `psql -c "checkpoint (wait false)"` might make people
shoot themselves into the foot.

*shrug*

I imagine the documentation will pretty clearly indicate that setting WAIT
to "false" will cause CHECKPOINT to not wait for it to finish.

+    <term><literal>MODE</literal></term>
+    <listitem>
+     <para>
+      The <command>CHECKPOINT</command> command forces an immediate
+      checkpoint by default when the command is issued, without waiting for a
+      regular checkpoint scheduled by the system. <literal>FAST</literal> is a
+      synonym for <literal>IMMEDIATE</literal>.
+     </para>
+     <para>
+      A <literal>SPREAD</literal> checkpoint will instead spread out the write load
+      as determined by the <xref linkend="guc-checkpoint-completion-target"/>
+      setting, like the system-scheduled checkpoints.
+     </para>
+    </listitem>
+   </varlistentry>

I don't understand why we need to add both FAST and IMMEDIATE.

+   <varlistentry>
+    <term><literal>FLUSH_ALL</literal></term>
+    <listitem>
+     <para>
+      Requests the checkpoint to also flush data of <literal>UNLOGGED</literal>
+      relations. Defaults to off.
+     </para>
+    </listitem>
+   </varlistentry>

Could we rename this to something like FLUSH_UNLOGGED or INCLUDE_UNLOGGED?
IMHO that's more descriptive.

My attempt at this patch back in 2020 included the following note, which
seems relevant here:

+   Note that the server may consolidate concurrently requested checkpoints or
+   restartpoints.  Such consolidated requests will contain a combined set of
+   options.  For example, if one session requested an immediate checkpoint and
+   another session requested a non-immediate checkpoint, the server may combine
+   these requests and perform one immediate checkpoint.

We might also want to make sure it's clear that CHECKPOINT does nothing if
there's been no database activity since the last one (or, in the case of a
restartpoint, if there hasn't been a checkpoint record).

--
nathan

#12

Christoph Berg

myon@debian.org

9 months ago

In reply to: Nathan Bossart (#11)

Re: CHECKPOINT unlogged data

Re: Nathan Bossart

I imagine the documentation will pretty clearly indicate that setting WAIT
to "false" will cause CHECKPOINT to not wait for it to finish.

I can add it, it's easy enough...

I don't understand why we need to add both FAST and IMMEDIATE.

We have both:

=# checkpoint ;
2025-06-06 18:09:25.743 CEST [872379] LOG: checkpoint starting: immediate force wait

pg_basebackup --checkpoint=fast

Could we settle for one official name for that? Then we could use that
name in all contexts.

+ <term><literal>FLUSH_ALL</literal></term>

Could we rename this to something like FLUSH_UNLOGGED or INCLUDE_UNLOGGED?
IMHO that's more descriptive.

That's again coming from what the log message is saying:

=# checkpoint (flush_all);
2025-06-06 18:12:46.298 CEST [873436] LOG: checkpoint starting: immediate force wait flush-all

I think we should be consistent there.

#define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those
* belonging to unlogged tables */

Maybe CHECKPOINT_FLUSH_UNLOGGED would be more explicit?

My attempt at this patch back in 2020 included the following note, which
seems relevant here:

+   Note that the server may consolidate concurrently requested checkpoints or
+   restartpoints.  Such consolidated requests will contain a combined set of
+   options.  For example, if one session requested an immediate checkpoint and
+   another session requested a non-immediate checkpoint, the server may combine
+   these requests and perform one immediate checkpoint.

The CHECKPOINT documentation links to `28.5. WAL Configuration`,
should this be mentioned there instead?

We might also want to make sure it's clear that CHECKPOINT does nothing if
there's been no database activity since the last one (or, in the case of a
restartpoint, if there hasn't been a checkpoint record).

That's taken care of by "force":

#define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */

Christoph

#13

Nathan Bossart

nathandbossart@gmail.com

9 months ago

In reply to: Christoph Berg (#12)

Re: CHECKPOINT unlogged data

On Fri, Jun 06, 2025 at 06:20:21PM +0200, Christoph Berg wrote:

Re: Nathan Bossart

I don't understand why we need to add both FAST and IMMEDIATE.

We have both:

=# checkpoint ;
2025-06-06 18:09:25.743 CEST [872379] LOG: checkpoint starting: immediate force wait

pg_basebackup --checkpoint=fast

Could we settle for one official name for that? Then we could use that
name in all contexts.

That seems like a good idea to me. I'm tempted to say that "fast" more
accurately describes what's happening than "immediate." "Immediate" sounds
like it happens instantaneously, but it's actually just happening "fast,"
i.e., as fast as possible.

+ <term><literal>FLUSH_ALL</literal></term>

Could we rename this to something like FLUSH_UNLOGGED or INCLUDE_UNLOGGED?
IMHO that's more descriptive.

That's again coming from what the log message is saying:

=# checkpoint (flush_all);
2025-06-06 18:12:46.298 CEST [873436] LOG: checkpoint starting: immediate force wait flush-all

I think we should be consistent there.

#define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those
* belonging to unlogged tables */

Maybe CHECKPOINT_FLUSH_UNLOGGED would be more explicit?

WFM.

My attempt at this patch back in 2020 included the following note, which
seems relevant here:
+   Note that the server may consolidate concurrently requested checkpoints or
+   restartpoints.  Such consolidated requests will contain a combined set of
+   options.  For example, if one session requested an immediate checkpoint and
+   another session requested a non-immediate checkpoint, the server may combine
+   these requests and perform one immediate checkpoint.
The CHECKPOINT documentation links to `28.5. WAL Configuration`,
should this be mentioned there instead?

I thought it would make sense to put it closer to where these options are
described, since it'll be most evident for manually-initiated checkpoints.

We might also want to make sure it's clear that CHECKPOINT does nothing if
there's been no database activity since the last one (or, in the case of a
restartpoint, if there hasn't been a checkpoint record).

That's taken care of by "force":

#define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */

Oh, I see that we always specify that for CHECKPOINT commands, except for
restartpoints. IIRC even if you do specify CHECKPOINT_FORCE for a
restartpoint, it'll have no effect. It's proably worth mentioning that
case, at least.

--
nathan

#14

Christoph Berg

myon@debian.org

9 months ago

In reply to: Nathan Bossart (#13)

Re: CHECKPOINT unlogged data

Re: Nathan Bossart

That seems like a good idea to me. I'm tempted to say that "fast" more
accurately describes what's happening than "immediate." "Immediate" sounds
like it happens instantaneously, but it's actually just happening "fast,"
i.e., as fast as possible.

Ack.

#define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those
* belonging to unlogged tables */

Maybe CHECKPOINT_FLUSH_UNLOGGED would be more explicit?

WFM.

Do we want to change the checkpoint log message (and the new options)
only, or include the CHECKPOINT_* flags? (I would guess there aren't
many external users of these flags, but mmmv.)

I thought it would make sense to put it closer to where these options are
described, since it'll be most evident for manually-initiated checkpoints.

Ack, I'll add that.

We might also want to make sure it's clear that CHECKPOINT does nothing if
there's been no database activity since the last one (or, in the case of a
restartpoint, if there hasn't been a checkpoint record).

That's taken care of by "force":

#define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */

Oh, I see that we always specify that for CHECKPOINT commands, except for
restartpoints. IIRC even if you do specify CHECKPOINT_FORCE for a
restartpoint, it'll have no effect. It's proably worth mentioning that
case, at least.

Right, will do.

Christoph

#15

Nathan Bossart

nathandbossart@gmail.com

9 months ago

In reply to: Christoph Berg (#14)

Re: CHECKPOINT unlogged data

On Wed, Jun 11, 2025 at 03:45:46PM +0200, Christoph Berg wrote:

Do we want to change the checkpoint log message (and the new options)
only, or include the CHECKPOINT_* flags? (I would guess there aren't
many external users of these flags, but mmmv.)

IMO we should try to make the terminology consistent everywhere. I'd
suggest putting the renaming stuff in separate prerequisite patches for
your new CHECKPOINT option.

--
nathan

#16

Christoph Berg

myon@debian.org

9 months ago

In reply to: Nathan Bossart (#15)

Re: CHECKPOINT unlogged data

Re: Nathan Bossart

IMO we should try to make the terminology consistent everywhere. I'd
suggest putting the renaming stuff in separate prerequisite patches for
your new CHECKPOINT option.

Ack, done in v4.

I haven't implemented a WAIT option yet since I didn't want to decide
that without more votes in either direction.

Christoph

#17

Nathan Bossart

nathandbossart@gmail.com

9 months ago

In reply to: Christoph Berg (#16)

Re: CHECKPOINT unlogged data

On Wed, Jun 11, 2025 at 05:53:15PM +0200, Christoph Berg wrote:

Ack, done in v4.

Thanks! The overall shape of these patches looks pretty good to me. I'll
aim to give them a deeper review in the near future.

--
nathan

#18

Laurenz Albe

laurenz.albe@cybertec.at

9 months ago

In reply to: Christoph Berg (#16)

Re: CHECKPOINT unlogged data

On Wed, 2025-06-11 at 17:53 +0200, Christoph Berg wrote:

I haven't implemented a WAIT option yet since I didn't want to decide
that without more votes in either direction.

I had a look at it, and I have suggestions for the documentation.

--- a/doc/src/sgml/ref/checkpoint.sgml
+++ b/doc/src/sgml/ref/checkpoint.sgml
[...]
<para>
-   The <command>CHECKPOINT</command> command forces a fast
-   checkpoint when the command is issued, without waiting for a
-   regular checkpoint scheduled by the system (controlled by the settings in
-   <xref linkend="runtime-config-wal-checkpoints"/>).
-   <command>CHECKPOINT</command> is not intended for use during normal
-   operation.
+   Running <command>CHECKPOINT</command> is not required during normal
+   operation; the system schedules checkpoints automatically (controlled by
+   the settings in <xref linkend="runtime-config-wal-checkpoints"/>).
+   <command>CHECKPOINT</command> can be beneficial to run before operations
+   such as binary backups or shutting down <productname>PostgreSQL</productname>
+   as this can reduce the amount of data to be flushed for the next checkpoint
+   within these operations. Specifically, <literal>UNLOGGED</literal> table
+   data is otherwise only flushed by the shutdown checkpoint.
</para>

How about the following for added clarity:

Running an explicit <command>CHECKPOINT</command> is not required during
normal operation; the system schedules checkpoints automatically (controlled
by the settings in <xref linkend="runtime-config-wal-checkpoints"/>).
However, it can be useful to perform an explicit checkpoint immediately
before shutting down the server or performing an online file system backup,
if you want the checkpoint implicit in these operations to be as fast as
possible. In particular, <literal>UNLOGGED</literal> table data only get
flushed to disk during a shutdown checkpoint, so you should use the option
<literal>FLUSH_UNLOGGED</literal> for an explicit checkpoint right before a
shutdown.

<para>
If executed during recovery, the <command>CHECKPOINT</command> command
-   will force a restartpoint (see <xref linkend="wal-configuration"/>)
-   rather than writing a new checkpoint.
+   will force a restartpoint rather than writing a new checkpoint.  (The
+   operation will be a no-op if there is no corresponding checkpoint in the
+   write-ahead log.)  If a checkpoint is already being written when a
+   <command>CHECKPOINT</command> is issued, the running checkpoint is upgraded
+   with this command's <literal>MODE</literal> and
+   <literal>FLUSH_UNLOGGED</literal> options.
</para>

Oh, interesting; I wasn't aware of that. That means that running CHECKPOINT
(FLUSH_UNLOGGED) is less useful than I thought, since there is often already
a spread checkpoint running. Would it be useful to recommend that you should
run the command twice when aiming for a fast shutdown?

@@ -58,6 +69,54 @@ CHECKPOINT
</para>
</refsect1>

+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><literal>MODE</literal></term>
+    <listitem>
+     <para>
+      The <command>CHECKPOINT</command> command forces a fast checkpoint by
+      default when the command is issued, without waiting for a regular
+      checkpoint scheduled by the system.
+     </para>
+     <para>
+      A <literal>SPREAD</literal> checkpoint will instead spread out the write
+      load as determined by the
+      <xref linkend="guc-checkpoint-completion-target"/> setting, like the
+      system-scheduled checkpoints.
+     </para>
+    </listitem>
+   </varlistentry>

I find this somewhat confusing; how about

<para>
The default <literal>FAST</literal> mode causes the checkpoint to be
performed as fast as possible. A <literal>SPREAD</literal> checkpoint
will instead spread out the write load as determined by the
<xref linkend="guc-checkpoint-completion-target"/> setting, like the
system-scheduled checkpoints.
</para>

+   <varlistentry>
+    <term><literal>FLUSH_UNLOGGED</literal></term>
+    <listitem>
+     <para>
+      All data files except for those marked as <literal>UNLOGGED</literal>
+      are flushed to disk during a checkpoint.  Enabling this option will also
+      flush <literal>UNLOGGED</literal> relations; the default is off.
+     </para>
+    </listitem>
+   </varlistentry>

How about:

<varlistentry>
<term><literal>FLUSH_UNLOGGED</literal></term>
<listitem>
<para>
Normally, data files marked as <literal>UNLOGGED</literal> are not
flushed to disk during a checkpoint. Enabling this option will also
flush <literal>UNLOGGED</literal> relations. This option is disabled
by default.
</para>
</listitem>
</varlistentry>

Yours,
Laurenz Albe

#19

Christoph Berg

myon@debian.org

9 months ago

In reply to: Laurenz Albe (#18)

Re: CHECKPOINT unlogged data

Re: Laurenz Albe

How about the following for added clarity:

Running an explicit <command>CHECKPOINT</command> is not required during
normal operation; the system schedules checkpoints automatically (controlled
by the settings in <xref linkend="runtime-config-wal-checkpoints"/>).
However, it can be useful to perform an explicit checkpoint immediately
before shutting down the server or performing an online file system backup,
if you want the checkpoint implicit in these operations to be as fast as
possible. In particular, <literal>UNLOGGED</literal> table data only get
flushed to disk during a shutdown checkpoint, so you should use the option
<literal>FLUSH_UNLOGGED</literal> for an explicit checkpoint right before a
shutdown.

Thanks for the suggestions.

I've taken this one, but replaced "be as fast as possible" by "have to
write out less data" to avoid confusion with the FAST mode.

Oh, interesting; I wasn't aware of that. That means that running CHECKPOINT
(FLUSH_UNLOGGED) is less useful than I thought, since there is often already
a spread checkpoint running. Would it be useful to recommend that you should
run the command twice when aiming for a fast shutdown?

I spent some time digging through the code, but I'm still not entirely
sure what's happening. There are several parts to it:

1) the list of buffers to flush is determined at the beginning of the
checkpoint, so running a 2nd FLUSH_UNLOGGED checkpoint will not make
the running checkpoint write these

2) running CHECKPOINT updates the checkpoint flags in shared memory so
I think the currently running checkpoint picks "MODE FAST" up and
speeds up. (But I'm not entirely sure, the call stack is quite deep
there.)

3) running CHECKPOINT (at least when waiting for it) seems to actually
start a new checkpoint, so FLUSH_UNLOGGED should still be effective.
(See the code arount "start_cv" in checkpointer.c)

Admittedly, adding these points together raises some question marks
about the flag handling, so I would welcome clarification by someone
more knowledgeable in this area.

I find this somewhat confusing; how about
How about:

Taken unmodified, thanks!

Christoph

#20

Nathan Bossart

nathandbossart@gmail.com

9 months ago

In reply to: Christoph Berg (#19)

Re: CHECKPOINT unlogged data

On Mon, Jun 16, 2025 at 04:36:59PM +0200, Christoph Berg wrote:

I spent some time digging through the code, but I'm still not entirely
sure what's happening. There are several parts to it:

1) the list of buffers to flush is determined at the beginning of the
checkpoint, so running a 2nd FLUSH_UNLOGGED checkpoint will not make
the running checkpoint write these

2) running CHECKPOINT updates the checkpoint flags in shared memory so
I think the currently running checkpoint picks "MODE FAST" up and
speeds up. (But I'm not entirely sure, the call stack is quite deep
there.)

3) running CHECKPOINT (at least when waiting for it) seems to actually
start a new checkpoint, so FLUSH_UNLOGGED should still be effective.
(See the code arount "start_cv" in checkpointer.c)

Admittedly, adding these points together raises some question marks
about the flag handling, so I would welcome clarification by someone
more knowledgeable in this area.

I think you've got it right. With CHECKPOINT_WAIT set, RequestCheckpoint()
will wait for a new checkpoint to start, at which point we know that the
new flags have been seen by the checkpointer. If an immediate checkpoint
is pending, CheckpointWriteDelay() will skip sleeping in the
currently-running one, so the current checkpoint will be "upgraded" to
immediate in some sense, but IIUC there will still be another immediate
checkpoint after it completes. But AFAICT it doesn't pick up
FLUSH_UNLOGGED until the next checkpoint begins.

Another thing to note is what I mentioned earlier:

+   Note that the server may consolidate concurrently requested checkpoints or
+   restartpoints.  Such consolidated requests will contain a combined set of
+   options.  For example, if one session requested an immediate checkpoint and
+   another session requested a non-immediate checkpoint, the server may combine
+   these requests and perform one immediate checkpoint.

--
nathan

#21

Christoph Berg

myon@debian.org

9 months ago

In reply to: Nathan Bossart (#20)

#22

Nathan Bossart

nathandbossart@gmail.com

9 months ago

In reply to: Christoph Berg (#21)

#23