skip replication slot snapshot/map file removal during end-of-recovery checkpoint
Hi,
Currently the end-of-recovery checkpoint can be much slower, hurting
server availability, when there are many replication slot files
(XXXX.snap or map-XXXX) to enumerate and delete. How about skipping
the .snap and map- file handling during the end-of-recovery
checkpoint? That makes the server available sooner, and the next regular
checkpoint can deal with these files. If required, we can add a GUC
(skip_replication_slot_file_handling, or some better name) to
control this, with the default being the existing behavior.
Thoughts?
Regards,
Bharath Rupireddy.
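The GUC-controlled variant described above could be sketched roughly as follows. This is only an illustration, not code from any patch: the macro SKETCH_CHECKPOINT_END_OF_RECOVERY is a stand-in for the real checkpoint flag, the function sketch_should_process_slot_files() is a hypothetical helper, and the GUC name is just the placeholder suggested in the message.

```c
#include <stdbool.h>

/* Stand-in for the end-of-recovery checkpoint flag bit (illustrative value). */
#define SKETCH_CHECKPOINT_END_OF_RECOVERY 0x0002

/*
 * Hypothetical GUC from the proposal. It defaults to false so that the
 * existing behavior (always process the files) is preserved.
 */
static bool skip_replication_slot_file_handling = false;

/*
 * Hypothetical helper: returns true if this checkpoint should enumerate and
 * delete the .snap and map- files, false if that work should be left to the
 * next regular checkpoint.
 */
static bool
sketch_should_process_slot_files(int flags)
{
	if ((flags & SKETCH_CHECKPOINT_END_OF_RECOVERY) != 0 &&
		skip_replication_slot_file_handling)
		return false;

	return true;
}
```

With the GUC left at its default, every checkpoint still processes the files; only an end-of-recovery checkpoint with the GUC enabled would skip them.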
On Thu, Dec 23, 2021 at 4:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Hi,
>
> Currently the end-of-recovery checkpoint can be much slower, impacting
> the server availability, if there are many replication slot files
> XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
> the .snap and map- file handling during the end-of-recovery
> checkpoint? It makes the server available faster and the next regular
> checkpoint can deal with these files. If required, we can have a GUC
> (skip_replication_slot_file_handling or some other better name) to
> control this default being the existing behavior.
>
> Thoughts?
Here's the v1 patch, please review it.
Regards,
Bharath Rupireddy.
Attachments:
v1-0001-Skip-processing-snapshot-mapping-files-during-end.patch (application/octet-stream)
From e3cb06f6712323debc9553e79955501437948794 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 31 Dec 2021 05:52:24 +0000
Subject: [PATCH v1] Skip processing snapshot, mapping files during
end-of-recovery checkpoint
This makes the server available faster. However the regular
checkpoints can process these files.
---
src/backend/access/transam/xlog.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 87cd05c945..9d1331dd0d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9570,8 +9570,21 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
 	CheckPointReplicationSlots();
-	CheckPointSnapBuild();
-	CheckPointLogicalRewriteHeap();
+
+	/*
+	 * Let's not process snapshot and mapping files during end-of-recovery
+	 * checkpoint to make the server available faster. However, the regular
+	 * checkpoints can process these files.
+	 */
+	if (flags & CHECKPOINT_END_OF_RECOVERY)
+		ereport((log_checkpoints ? LOG : DEBUG2),
+				(errmsg("skipped processing of replication slot snapshot and mapping files during end-of-recovery checkpoint")));
+	else
+	{
+		CheckPointSnapBuild();
+		CheckPointLogicalRewriteHeap();
+	}
+
 	CheckPointReplicationOrigin();
 
 	/* Write out all dirty data in SLRUs and the main buffer pool */
--
2.25.1
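For context on the per-file work being skipped, here is a simplified sketch of the kind of check done for each snapshot file. It assumes file names encode an LSN as two hexadecimal halves, for example 0-16B3748.snap, and that a file is removable once its LSN is older than a cutoff. The function name sketch_snapshot_is_removable() is hypothetical; the real code also scans the directory and unlinks the files, which is where the time goes when there are many of them.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: decide whether a snapshot file named "<hi>-<lo>.snap"
 * (both halves in hex) is older than cutoff_lsn and can be removed.
 * Names that do not match the pattern exactly are never removable.
 */
static bool
sketch_snapshot_is_removable(const char *name, uint64_t cutoff_lsn)
{
	unsigned int hi;
	unsigned int lo;
	int			consumed = 0;
	uint64_t	lsn;

	/* %n records how many characters matched, so trailing junk is rejected. */
	if (sscanf(name, "%X-%X.snap%n", &hi, &lo, &consumed) != 2 ||
		consumed == 0 || name[consumed] != '\0')
		return false;

	/* Combine the two 32-bit halves into one 64-bit LSN. */
	lsn = ((uint64_t) hi << 32) | lo;

	return lsn < cutoff_lsn;
}
```

Each directory entry needs a parse and a comparison like this, plus an unlink for every match, so the cost grows with the number of files present at checkpoint time.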
On 12/23/21, 3:17 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> Currently the end-of-recovery checkpoint can be much slower, impacting
> the server availability, if there are many replication slot files
> XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
> the .snap and map- file handling during the end-of-recovery
> checkpoint? It makes the server available faster and the next regular
> checkpoint can deal with these files. If required, we can have a GUC
> (skip_replication_slot_file_handling or some other better name) to
> control this default being the existing behavior.
I suggested something similar as a possibility in the other thread
where these tasks are being discussed [0]. I think it is worth
considering, but IMO it is not a complete solution to the problem. If
there are frequently many such files to delete and regular checkpoints
are taking longer, the shutdown/end-of-recovery checkpoint could still
take a while. I think it would be better to separate these tasks from
checkpointing instead.
Nathan
[0]: /messages/by-id/A285A823-0AF2-4376-838E-847FA4710F9A@amazon.com
On Thu, Jan 6, 2022 at 5:04 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> On 12/23/21, 3:17 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
>> Currently the end-of-recovery checkpoint can be much slower, impacting
>> the server availability, if there are many replication slot files
>> XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
>> the .snap and map- file handling during the end-of-recovery
>> checkpoint? It makes the server available faster and the next regular
>> checkpoint can deal with these files. If required, we can have a GUC
>> (skip_replication_slot_file_handling or some other better name) to
>> control this default being the existing behavior.
>
> I suggested something similar as a possibility in the other thread
> where these tasks are being discussed [0]. I think it is worth
> considering, but IMO it is not a complete solution to the problem. If
> there are frequently many such files to delete and regular checkpoints
> are taking longer, the shutdown/end-of-recovery checkpoint could still
> take a while. I think it would be better to separate these tasks from
> checkpointing instead.
>
> [0] /messages/by-id/A285A823-0AF2-4376-838E-847FA4710F9A@amazon.com
Thanks. I agree that this should be solved as part of the other thread,
so I am closing this thread here.
Regards,
Bharath Rupireddy.