skip replication slot snapshot/map file removal during end-of-recovery checkpoint

Started by Bharath Rupireddy, about 4 years ago (4 messages)

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

Hi,

Currently the end-of-recovery checkpoint can be much slower, impacting
the server availability, if there are many replication slot files
XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
the .snap and map- file handling during the end-of-recovery
checkpoint? It makes the server available faster and the next regular
checkpoint can deal with these files. If required, we can add a GUC
(skip_replication_slot_file_handling or some other better name) to
control this, with the default being the existing behavior.

Thoughts?

Regards,
Bharath Rupireddy.
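
To make the proposal above concrete, here is a minimal sketch of the decision it describes: skip the .snap and map- file handling only when the checkpoint is an end-of-recovery checkpoint and the proposed GUC is enabled. Everything in it is an assumption for illustration: the flag value is a stand-in rather than a copy of the server's definition, the GUC exists only as a suggestion in this thread, and should_process_slot_files() is a hypothetical helper, not a PostgreSQL function.

```c
#include <stdbool.h>

/* Stand-in for the server's checkpoint flag; the value is illustrative only. */
#define CHECKPOINT_END_OF_RECOVERY 0x0002

/*
 * Hypothetical GUC from the proposal. false keeps the existing behavior,
 * i.e. every checkpoint handles the .snap and map- files.
 */
bool skip_replication_slot_file_handling = false;

/*
 * Hypothetical helper: returns true when this checkpoint should enumerate
 * and remove the replication slot snapshot and mapping files.
 */
bool
should_process_slot_files(int flags)
{
    bool end_of_recovery = (flags & CHECKPOINT_END_OF_RECOVERY) != 0;

    /* Skip only when both conditions hold; otherwise behave as today. */
    if (end_of_recovery && skip_replication_slot_file_handling)
        return false;

    return true;
}
```

With the default setting nothing changes. With the setting enabled, only the end-of-recovery checkpoint skips the files and the next regular checkpoint still handles them.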

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Bharath Rupireddy (#1)

1 attachment(s)

Re: skip replication slot snapshot/map file removal during end-of-recovery checkpoint

On Thu, Dec 23, 2021 at 4:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> Hi,
>
> Currently the end-of-recovery checkpoint can be much slower, impacting
> the server availability, if there are many replication slot files
> XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
> the .snap and map- file handling during the end-of-recovery
> checkpoint? It makes the server available faster and the next regular
> checkpoint can deal with these files. If required, we can have a GUC
> (skip_replication_slot_file_handling or some other better name) to
> control this default being the existing behavior.
>
> Thoughts?

Here's the v1 patch, please review it.

Regards,
Bharath Rupireddy.

Attachments:

v1-0001-Skip-processing-snapshot-mapping-files-during-end.patch (application/octet-stream)

From e3cb06f6712323debc9553e79955501437948794 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 31 Dec 2021 05:52:24 +0000
Subject: [PATCH v1] Skip processing snapshot, mapping files during
 end-of-recovery checkpoint

This makes the server available faster. However the regular
checkpoints can process these files.
---
 src/backend/access/transam/xlog.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 87cd05c945..9d1331dd0d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9570,8 +9570,21 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
 	CheckPointReplicationSlots();
-	CheckPointSnapBuild();
-	CheckPointLogicalRewriteHeap();
+
+	/*
+	 * Let's not process snapshot and mapping files during end-of-recovery
+	 * checkpoint to make the server available faster. However, the regular
+	 * checkpoints can process these files.
+	 */
+	if (flags & CHECKPOINT_END_OF_RECOVERY)
+		ereport((log_checkpoints ? LOG : DEBUG2),
+				(errmsg("skipped processing of replication slot snapshot and mapping files during end-of-recovery checkpoint")));
+	else
+	{
+		CheckPointSnapBuild();
+		CheckPointLogicalRewriteHeap();
+	}
+
 	CheckPointReplicationOrigin();
 
 	/* Write out all dirty data in SLRUs and the main buffer pool */
-- 
2.25.1
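
For readers wondering why the two skipped calls can be slow, the sketch below models the kind of per-file work involved: every directory entry has to be examined, and only those that parse as snapshot names and fall before some cutoff are removed. It is not the PostgreSQL implementation. The name format (two hexadecimal halves of a WAL position followed by ".snap"), the cutoff rule, and both function names are assumptions made for illustration, and the sketch works on an in-memory list of names instead of touching the filesystem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumption: a snapshot file name encodes a 64-bit WAL position as two
 * 32-bit hexadecimal halves, e.g. "0-1A2B3C.snap". Returns true and stores
 * the position in *lsn only if the whole name matches that pattern.
 */
bool
parse_snap_name(const char *name, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;
    int          consumed = 0;

    /* %n records how many characters matched, so trailing junk is rejected. */
    if (sscanf(name, "%X-%X.snap%n", &hi, &lo, &consumed) != 2)
        return false;
    if (consumed == 0 || name[consumed] != '\0')
        return false;

    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/*
 * Counts the names a cleanup pass would remove: those that parse as
 * snapshot names and lie strictly before the cutoff position. The loop
 * visits every entry, which is the cost that grows with the file count.
 */
size_t
count_removable(const char *const *names, size_t n, uint64_t cutoff)
{
    size_t removable = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t lsn;

        if (parse_snap_name(names[i], &lsn) && lsn < cutoff)
            removable++;
    }
    return removable;
}
```

In a real cleanup pass each counted entry would also cost a file removal, so a directory holding many such files adds directly to the time the checkpoint takes.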

Bossart, Nathan

bossartn@amazon.com

about 4 years ago

In reply to: Bharath Rupireddy (#2)

Re: skip replication slot snapshot/map file removal during end-of-recovery checkpoint

On 12/23/21, 3:17 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:

> Currently the end-of-recovery checkpoint can be much slower, impacting
> the server availability, if there are many replication slot files
> XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
> the .snap and map- file handling during the end-of-recovery
> checkpoint? It makes the server available faster and the next regular
> checkpoint can deal with these files. If required, we can have a GUC
> (skip_replication_slot_file_handling or some other better name) to
> control this default being the existing behavior.

I suggested something similar as a possibility in the other thread
where these tasks are being discussed [0]. I think it is worth
considering, but IMO it is not a complete solution to the problem. If
there are frequently many such files to delete and regular checkpoints
are taking longer, the shutdown/end-of-recovery checkpoint could still
take a while. I think it would be better to separate these tasks from
checkpointing instead.

Nathan

[0]: /messages/by-id/A285A823-0AF2-4376-838E-847FA4710F9A@amazon.com

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

about 4 years ago

In reply to: Bossart, Nathan (#3)

Re: skip replication slot snapshot/map file removal during end-of-recovery checkpoint

On Thu, Jan 6, 2022 at 5:04 AM Bossart, Nathan <bossartn@amazon.com> wrote:

> On 12/23/21, 3:17 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > Currently the end-of-recovery checkpoint can be much slower, impacting
> > the server availability, if there are many replication slot files
> > XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
> > the .snap and map- file handling during the end-of-recovery
> > checkpoint? It makes the server available faster and the next regular
> > checkpoint can deal with these files. If required, we can have a GUC
> > (skip_replication_slot_file_handling or some other better name) to
> > control this default being the existing behavior.
>
> I suggested something similar as a possibility in the other thread
> where these tasks are being discussed [0]. I think it is worth
> considering, but IMO it is not a complete solution to the problem. If
> there are frequently many such files to delete and regular checkpoints
> are taking longer, the shutdown/end-of-recovery checkpoint could still
> take a while. I think it would be better to separate these tasks from
> checkpointing instead.
>
> [0] /messages/by-id/A285A823-0AF2-4376-838E-847FA4710F9A@amazon.com

Thanks. I agree that this should be solved as part of the other
thread, so I'm closing this thread here.

Regards,
Bharath Rupireddy.