Refactor CheckpointWriteDelay()

Started by Bharath Rupireddyalmost 4 years ago3 messages

bharath.rupireddyforpostgres@gmail.com

almost 4 years ago

1 attachment(s)

Hi,

Given that CheckpointWriteDelay() is really a hot code path i.e. gets
called for every dirty buffer written to disk, here's an opportunity
for us to improve it by avoiding some function calls.

Firstly, the AmCheckpointerProcess() macro check can be placed outside
of the CheckpointWriteDelay() which avoids function calls (dirty
buffers written to disk * one function call cost) for a
non-checkpointer process(standalone backend) performing checkpoint.

Secondly, remove the function ImmediateCheckpointRequested() and read
the ckpt_flags from the CheckpointerShmem with volatile qualifier.
This saves some LOC and cost = dirty buffers written to disk * one
function call cost. The ImmediateCheckpointRequested() really does
nothing great, it just reads from the shared memory without lock and
checks whether there's any immediate checkpoint request pending behind
the current one.

Attaching a patch with the above changes. Please have a look at it.

I did a small experiment[1]see "write delay" at the end of "checkpoint complete" message: HEAD: 2022-02-08 05:56:45.551 UTC [651784] LOG: checkpoint starting: time 2022-02-08 05:57:39.154 UTC [651784] LOG: checkpoint complete: wrote 14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled; write=53.461 s, sync=0.027 s, total=53.604 s; sync files=22, longest=0.016 s, average=0.002 s; distance=438865 kB, estimate=438865 kB; write delay=53104.194 ms with a use case [2]checkpoint_timeout = 60s create table t1(a1 int, b1 int); /* inserted 7mn rows */ insert into t1 select i, i*2 from generate_series(1, 7000000) i; on my dev system where
I measured the total time spent in CheckpointWriteDelay() with and
without patch, to my surprise, I didn't see much difference. It may be
my experiment is wrong, my dev box doesn't show much diff and others
may or may not notice the difference. Despite this, the patch proposed
still helps IMO as it saves a few LOC (16) and I'm sure it will also
save some time for standalone backend checkpoints.

Other hackers may not agree with me on the readability (IMO, the patch
doesn't make it unreadable) or the diff that it creates with the
previous versions and so on. I'd rather argue that
CheckpointWriteDelay() is really a hot code path in production
environments and the patch proposed has some benefits.

Thoughts?

[1]: see "write delay" at the end of "checkpoint complete" message: HEAD: 2022-02-08 05:56:45.551 UTC [651784] LOG: checkpoint starting: time 2022-02-08 05:57:39.154 UTC [651784] LOG: checkpoint complete: wrote 14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled; write=53.461 s, sync=0.027 s, total=53.604 s; sync files=22, longest=0.016 s, average=0.002 s; distance=438865 kB, estimate=438865 kB; write delay=53104.194 ms
HEAD:
2022-02-08 05:56:45.551 UTC [651784] LOG: checkpoint starting: time
2022-02-08 05:57:39.154 UTC [651784] LOG: checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.461 s, sync=0.027 s, total=53.604 s; sync files=22,
longest=0.016 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53104.194 ms

2022-02-08 05:59:24.173 UTC [652589] LOG: checkpoint starting: time
2022-02-08 06:00:18.166 UTC [652589] LOG: checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.848 s, sync=0.030 s, total=53.993 s; sync files=22,
longest=0.017 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53603.159 ms

PATCHED:
2022-02-08 06:07:26.286 UTC [662667] LOG: checkpoint starting: time
2022-02-08 06:08:20.152 UTC [662667] LOG: checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.732 s, sync=0.026 s, total=53.867 s; sync files=22,
longest=0.016 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53399.582 ms

2022-02-08 06:10:17.554 UTC [663393] LOG: checkpoint starting: time
2022-02-08 06:11:11.163 UTC [663393] LOG: checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.488 s, sync=0.023 s, total=53.610 s; sync files=22,
longest=0.018 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53099.114 ms

[2]: checkpoint_timeout = 60s create table t1(a1 int, b1 int); /* inserted 7mn rows */ insert into t1 select i, i*2 from generate_series(1, 7000000) i;
checkpoint_timeout = 60s
create table t1(a1 int, b1 int);
/* inserted 7mn rows */
insert into t1 select i, i*2 from generate_series(1, 7000000) i;

Regards,
Bharath Rupireddy.

Attachments:

v1-0001-Refactor-CheckpointWriteDelay.patchapplication/x-patch; name=v1-0001-Refactor-CheckpointWriteDelay.patchDownload

From d9d58fc7d006f8fdb37d5ab9119062c5e0083c14 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 8 Feb 2022 05:19:37 +0000
Subject: [PATCH v1] Refactor CheckpointWriteDelay()

---
 src/backend/postmaster/checkpointer.c | 36 +++++++--------------------
 src/backend/storage/buffer/bufmgr.c   |  6 +++--
 2 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 23f691cd47..e1f0ccd561 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -162,7 +162,6 @@ static pg_time_t last_xlog_switch_time;
 static void HandleCheckpointerInterrupts(void);
 static void CheckArchiveTimeout(void);
 static bool IsCheckpointOnSchedule(double progress);
-static bool ImmediateCheckpointRequested(void);
 static bool CompactCheckpointerRequestQueue(void);
 static void UpdateSharedMemoryConfig(void);
 
@@ -651,25 +650,6 @@ CheckArchiveTimeout(void)
 	}
 }
 
-/*
- * Returns true if an immediate checkpoint request is pending.  (Note that
- * this does not check the *current* checkpoint's IMMEDIATE flag, but whether
- * there is one pending behind it.)
- */
-static bool
-ImmediateCheckpointRequested(void)
-{
-	volatile CheckpointerShmemStruct *cps = CheckpointerShmem;
-
-	/*
-	 * We don't need to acquire the ckpt_lck in this case because we're only
-	 * looking at a single flag bit.
-	 */
-	if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
-		return true;
-	return false;
-}
-
 /*
  * CheckpointWriteDelay -- control rate of checkpoint
  *
@@ -688,17 +668,19 @@ CheckpointWriteDelay(int flags, double progress)
 {
 	static int	absorb_counter = WRITES_PER_ABSORB;
 
-	/* Do nothing if checkpoint is being executed by non-checkpointer process */
-	if (!AmCheckpointerProcess())
-		return;
-
 	/*
-	 * Perform the usual duties and take a nap, unless we're behind schedule,
-	 * in which case we just try to catch up as quickly as possible.
+	 * Perform the usual duties and take a nap, unless we're behind schedule or
+	 * an immediate checkpoint request is pending, in which case we just try to
+	 * catch up as quickly as possible.
+	 *
+	 * We don't need to acquire the ckpt_lck while reading ckpt_flags from
+	 * checkpointer shared memory, because we're only looking at a single flag
+	 * bit.
 	 */
 	if (!(flags & CHECKPOINT_IMMEDIATE) &&
 		!ShutdownRequestPending &&
-		!ImmediateCheckpointRequested() &&
+		!(((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags &
+		  CHECKPOINT_IMMEDIATE) &&
 		IsCheckpointOnSchedule(progress))
 	{
 		if (ConfigReloadPending)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f5459c68f8..859aa0c867 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2172,11 +2172,13 @@ BufferSync(int flags)
 		}
 
 		/*
-		 * Sleep to throttle our I/O rate.
+		 * Sleep to throttle our I/O rate. Do nothing if checkpoint is being
+		 * executed by non-checkpointer process.
 		 *
 		 * (This will check for barrier events even if it doesn't sleep.)
 		 */
-		CheckpointWriteDelay(flags, (double) num_processed / num_to_scan);
+		if (AmCheckpointerProcess())
+			CheckpointWriteDelay(flags, (double) num_processed / num_to_scan);
 	}
 
 	/* issue all pending flushes */
-- 
2.25.1

Nitin Jadhav

nitinjadhavpostgres@gmail.com

almost 4 years ago

In reply to: Bharath Rupireddy (#1)

Re: Refactor CheckpointWriteDelay()

Given that CheckpointWriteDelay() is really a hot code path i.e. gets
called for every dirty buffer written to disk, here's an opportunity
for us to improve it by avoiding some function calls.

I have reviewed the patch and I agree that the patch improves the code.

Firstly, the AmCheckpointerProcess() macro check can be placed outside
of the CheckpointWriteDelay() which avoids function calls (dirty
buffers written to disk * one function call cost) for a
non-checkpointer process(standalone backend) performing checkpoint.

It definitely improves the code as it reduces one function call per
dirty buffer write during non-checkpointer process even though we may
not see huge performance improvement during testing. But I am not in
favour of this change as many existing functions use this kind of
check in the initial part of the function. The other reason is, if we
call this function in multiple places in future then we may end up
having multiple checks or we may have to revert back the code.

Secondly, remove the function ImmediateCheckpointRequested() and read
the ckpt_flags from the CheckpointerShmem with volatile qualifier.
This saves some LOC and cost = dirty buffers written to disk * one
function call cost. The ImmediateCheckpointRequested() really does
nothing great, it just reads from the shared memory without lock and
checks whether there's any immediate checkpoint request pending behind
the current one.

+1 for this change.

- * Perform the usual duties and take a nap, unless we're behind schedule,
- * in which case we just try to catch up as quickly as possible.
+ * Perform the usual duties and take a nap, unless we're behind schedule or
+ * an immediate checkpoint request is pending, in which case we just try to
+ * catch up as quickly as possible.
+ *
+ * We don't need to acquire the ckpt_lck while reading ckpt_flags from
+ * checkpointer shared memory, because we're only looking at a single flag
+ * bit.

+1 for the additional information. I feel the information related to
the pending shutdown request should also be added.

Thanks & Regards,
Nitin Jadhav

On Thu, Feb 10, 2022 at 5:32 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Show quoted text

Hi,

Given that CheckpointWriteDelay() is really a hot code path i.e. gets
called for every dirty buffer written to disk, here's an opportunity
for us to improve it by avoiding some function calls.

Firstly, the AmCheckpointerProcess() macro check can be placed outside
of the CheckpointWriteDelay() which avoids function calls (dirty
buffers written to disk * one function call cost) for a
non-checkpointer process(standalone backend) performing checkpoint.

Secondly, remove the function ImmediateCheckpointRequested() and read
the ckpt_flags from the CheckpointerShmem with volatile qualifier.
This saves some LOC and cost = dirty buffers written to disk * one
function call cost. The ImmediateCheckpointRequested() really does
nothing great, it just reads from the shared memory without lock and
checks whether there's any immediate checkpoint request pending behind
the current one.

Attaching a patch with the above changes. Please have a look at it.

I did a small experiment[1] with a use case [2] on my dev system where
I measured the total time spent in CheckpointWriteDelay() with and
without patch, to my surprise, I didn't see much difference. It may be
my experiment is wrong, my dev box doesn't show much diff and others
may or may not notice the difference. Despite this, the patch proposed
still helps IMO as it saves a few LOC (16) and I'm sure it will also
save some time for standalone backend checkpoints.

Other hackers may not agree with me on the readability (IMO, the patch
doesn't make it unreadable) or the diff that it creates with the
previous versions and so on. I'd rather argue that
CheckpointWriteDelay() is really a hot code path in production
environments and the patch proposed has some benefits.

Thoughts?

[1] see "write delay" at the end of "checkpoint complete" message:
HEAD:
2022-02-08 05:56:45.551 UTC [651784] LOG: checkpoint starting: time
2022-02-08 05:57:39.154 UTC [651784] LOG: checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.461 s, sync=0.027 s, total=53.604 s; sync files=22,
longest=0.016 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53104.194 ms

2022-02-08 05:59:24.173 UTC [652589] LOG: checkpoint starting: time
2022-02-08 06:00:18.166 UTC [652589] LOG: checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.848 s, sync=0.030 s, total=53.993 s; sync files=22,
longest=0.017 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53603.159 ms

PATCHED:
2022-02-08 06:07:26.286 UTC [662667] LOG: checkpoint starting: time
2022-02-08 06:08:20.152 UTC [662667] LOG: checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.732 s, sync=0.026 s, total=53.867 s; sync files=22,
longest=0.016 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53399.582 ms

2022-02-08 06:10:17.554 UTC [663393] LOG: checkpoint starting: time
2022-02-08 06:11:11.163 UTC [663393] LOG: checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.488 s, sync=0.023 s, total=53.610 s; sync files=22,
longest=0.018 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53099.114 ms

[2]
checkpoint_timeout = 60s
create table t1(a1 int, b1 int);
/* inserted 7mn rows */
insert into t1 select i, i*2 from generate_series(1, 7000000) i;

Regards,
Bharath Rupireddy.

Andres Freund

andres@anarazel.de

almost 4 years ago

In reply to: Bharath Rupireddy (#1)

Re: Refactor CheckpointWriteDelay()

Hi,

On 2022-02-10 17:31:55 +0530, Bharath Rupireddy wrote:

Secondly, remove the function ImmediateCheckpointRequested() and read
the ckpt_flags from the CheckpointerShmem with volatile qualifier.
This saves some LOC and cost = dirty buffers written to disk * one
function call cost.

The compiler can just inline ImmediateCheckpointRequested(). We don't gain
anything by inlining it ourselves.

I did a small experiment[1] with a use case [2] on my dev system where
I measured the total time spent in CheckpointWriteDelay() with and
without patch, to my surprise, I didn't see much difference. It may be
my experiment is wrong, my dev box doesn't show much diff and others
may or may not notice the difference. Despite this, the patch proposed
still helps IMO as it saves a few LOC (16) and I'm sure it will also
save some time for standalone backend checkpoints.

I am not surprised that you're not seeing a benefit. Compared to the cost of
writing out a buffer the cost of the function call here is neglegible.

I don't think this is an improvement at all. Imagine what happens if we add a
second callsite to CheckpointWriteDelay() or
ImmediateCheckpointRequested()...

Greetings,

Andres Freund