Slot's restart_lsn may unexpectedly point to a removed WAL segment after a hard restart
Dear Hackers,
I'd like to discuss a problem with replication slots' restart LSN. Physical slots are saved to disk at the beginning of a checkpoint. At the end of the checkpoint, old WAL segments are recycled or removed from disk if they are not kept by the slots' restart_lsn values.
If an existing physical slot is advanced in the middle of checkpoint execution, the WAL segments related to the restart LSN saved on disk may be removed. This is because the minimal LSN of the replication slots is calculated at the end of the checkpoint, prior to old WAL segment removal. If the postgres instance is hard-stopped (pg_ctl -m immediate) right after the checkpoint and then restarted, the slot's restart_lsn may point to a removed WAL segment. I believe such behaviour is not good.
The documentation [0] describes that restart_lsn may be set to some past value after a reload. There is a discussion [1] on pgsql-hackers where such behaviour is discussed. The main reason for not flushing physical slots on advancing is performance. I'm OK with such behaviour, except that the corresponding WAL segments should not be removed.
I propose to keep WAL segments based on the saved-on-disk (flushed) restart_lsn of slots: add a new field, restart_lsn_flushed, to the ReplicationSlot structure, and copy restart_lsn to restart_lsn_flushed in SaveSlotToPath. This doesn't change the on-disk format of the slot contents. I attached a patch. It is not yet complete, but it demonstrates a way to solve the problem.
I reproduced the problem in the following way:
* Add some delay in CheckPointBuffers (pg_usleep) to emulate long checkpoint execution.
* Execute a checkpoint, and run pg_replication_slot_advance from another connection right after the checkpoint starts.
* Hard-restart the server right after the checkpoint completes.
* After restart, the slot's restart_lsn may point to a removed WAL segment.
The proposed patch fixes it.
[0]: https://www.postgresql.org/docs/current/logicaldecoding-explanation.html
[1]: /messages/by-id/059cc53a-8b14-653a-a24d-5f867503b0ee@postgrespro.ru
Sorry, attached the missed patch.
Attachments:
0001-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.patch (text/x-patch)
From acae6b55fc766d2fe1bfe85eb8af85110f55dcc8 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Thu, 31 Oct 2024 12:29:12 +0300
Subject: [PATCH] Keep WAL segments by slot's flushed restart LSN
---
src/backend/replication/slot.c | 9 +++++++--
src/include/replication/slot.h | 4 ++++
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 6828100cf1..ee7ab3678e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1148,7 +1148,9 @@ ReplicationSlotsComputeRequiredLSN(void)
continue;
SpinLockAcquire(&s->mutex);
- restart_lsn = s->data.restart_lsn;
+ restart_lsn = s->restart_lsn_flushed != InvalidXLogRecPtr ?
+ s->restart_lsn_flushed :
+ s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
SpinLockRelease(&s->mutex);
@@ -1207,7 +1209,9 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
/* read once, it's ok if it increases while we're checking */
SpinLockAcquire(&s->mutex);
- restart_lsn = s->data.restart_lsn;
+ restart_lsn = s->restart_lsn_flushed != InvalidXLogRecPtr ?
+ s->restart_lsn_flushed :
+ s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
SpinLockRelease(&s->mutex);
@@ -2097,6 +2101,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
SpinLockAcquire(&slot->mutex);
memcpy(&cp.slotdata, &slot->data, sizeof(ReplicationSlotPersistentData));
+ slot->restart_lsn_flushed = slot->data.restart_lsn;
SpinLockRelease(&slot->mutex);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 45582cf9d8..ca4c3aab3b 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -207,6 +207,10 @@ typedef struct ReplicationSlot
/* The time since the slot has become inactive */
TimestampTz inactive_since;
+
+ /* Latest restart LSN that was flushed to disk */
+ XLogRecPtr restart_lsn_flushed;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
--
2.34.1
Dear Hackers,
I'd like to introduce an improved version of my patch (see the attached file). My original idea was to take into account the restart_lsn saved on disk (slot->restart_lsn_flushed) for persistent slots when removing WAL segment files. It helps tackle errors like: ERROR: requested WAL segment 000...0AA has already been removed.
Improvements:
* restart_lsn_flushed is used only for RS_PERSISTENT slots.
* Save the physical slot to disk when advancing only once - if restart_lsn_flushed is invalid. This is needed because slots with an invalid restart LSN are not used when calculating the oldest LSN for WAL truncation. Once restart_lsn becomes valid, it should be saved to disk immediately to update restart_lsn_flushed.
Regression tests seem to be OK except:
* recovery/t/001_stream_rep.pl (a checkpoint is needed)
* recovery/t/019_replslot_limit.pl (it seems the slot was invalidated; some adjustments are needed)
* pg_basebackup/t/020_pg_receivewal.pl (not sure about it)
There are some problems:
* More WAL segments may be kept. It may lead to invalidations of slots in some tests (recovery/t/019_replslot_limit.pl). A couple of tests should be adjusted.
With best regards,
Vitaly Davydov
Attachments:
0001-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.patch (text/x-patch)
From d52e254c558e665bc41389e02e026c1069b29861 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Thu, 31 Oct 2024 12:29:12 +0300
Subject: [PATCH] Keep WAL segments by slot's flushed restart LSN
---
src/backend/replication/slot.c | 27 ++++++++++++++++++++++++++-
src/backend/replication/walsender.c | 13 +++++++++++++
src/include/replication/slot.h | 4 ++++
3 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 6828100cf1..e6aef1f9a3 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -409,6 +409,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->candidate_restart_valid = InvalidXLogRecPtr;
slot->candidate_restart_lsn = InvalidXLogRecPtr;
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
+ slot->restart_lsn_flushed = InvalidXLogRecPtr;
slot->inactive_since = 0;
/*
@@ -1142,20 +1143,28 @@ ReplicationSlotsComputeRequiredLSN(void)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
XLogRecPtr restart_lsn;
+ XLogRecPtr restart_lsn_flushed;
bool invalidated;
+ ReplicationSlotPersistency persistency;
if (!s->in_use)
continue;
SpinLockAcquire(&s->mutex);
+ persistency = s->data.persistency;
restart_lsn = s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ restart_lsn_flushed = s->restart_lsn_flushed;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /* truncate WAL for persistent slots by flushed restart_lsn */
+ if (persistency == RS_PERSISTENT)
+ restart_lsn = restart_lsn_flushed;
+
if (restart_lsn != InvalidXLogRecPtr &&
(min_required == InvalidXLogRecPtr ||
restart_lsn < min_required))
@@ -1193,7 +1202,9 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
{
ReplicationSlot *s;
XLogRecPtr restart_lsn;
+ XLogRecPtr restart_lsn_flushed;
bool invalidated;
+ ReplicationSlotPersistency persistency;
s = &ReplicationSlotCtl->replication_slots[i];
@@ -1207,14 +1218,20 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
/* read once, it's ok if it increases while we're checking */
SpinLockAcquire(&s->mutex);
- restart_lsn = s->data.restart_lsn;
+ persistency = s->data.persistency;
+ restart_lsn = s->restart_lsn_flushed;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ restart_lsn_flushed = s->restart_lsn_flushed;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /* truncate WAL for persistent slots by flushed restart_lsn */
+ if (persistency == RS_PERSISTENT)
+ restart_lsn = restart_lsn_flushed;
+
if (restart_lsn == InvalidXLogRecPtr)
continue;
@@ -1432,6 +1449,7 @@ ReplicationSlotReserveWal(void)
Assert(slot != NULL);
Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
+ Assert(slot->restart_lsn_flushed == InvalidXLogRecPtr);
/*
* The replication slot mechanism is used to prevent removal of required
@@ -1607,6 +1625,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
*/
SpinLockAcquire(&s->mutex);
+ Assert(s->data.restart_lsn >= s->restart_lsn_flushed);
+
restart_lsn = s->data.restart_lsn;
/* we do nothing if the slot is already invalid */
@@ -1691,7 +1711,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
* just rely on .invalidated.
*/
if (invalidation_cause == RS_INVAL_WAL_REMOVED)
+ {
s->data.restart_lsn = InvalidXLogRecPtr;
+ s->restart_lsn_flushed = InvalidXLogRecPtr;
+ }
/* Let caller know */
*invalidated = true;
@@ -2189,6 +2212,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
if (!slot->just_dirtied)
slot->dirty = false;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->restart_lsn_flushed = cp.slotdata.restart_lsn;
SpinLockRelease(&slot->mutex);
LWLockRelease(&slot->io_in_progress_lock);
@@ -2386,6 +2410,7 @@ RestoreSlotFromDisk(const char *name)
slot->effective_xmin = cp.slotdata.xmin;
slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->restart_lsn_flushed = cp.slotdata.restart_lsn;
slot->candidate_catalog_xmin = InvalidTransactionId;
slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 371eef3ddd..03cdce23f0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2329,6 +2329,7 @@ static void
PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool changed = false;
+ XLogRecPtr restart_lsn_flushed;
ReplicationSlot *slot = MyReplicationSlot;
Assert(lsn != InvalidXLogRecPtr);
@@ -2336,6 +2337,7 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (slot->data.restart_lsn != lsn)
{
changed = true;
+ restart_lsn_flushed = slot->restart_lsn_flushed;
slot->data.restart_lsn = lsn;
}
SpinLockRelease(&slot->mutex);
@@ -2343,6 +2345,17 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (changed)
{
ReplicationSlotMarkDirty();
+
+ /* Save the replication slot on disk in case of its flushed restart_lsn
+ * is invalid. Slots with invalid restart lsn are ignored when
+ * calculating required LSN. Once we started to keep the WAL by flushed
+ * restart LSN, we should save to disk an initial valid value.
+ */
+ if (slot->data.persistency == RS_PERSISTENT) {
+ if (restart_lsn_flushed == InvalidXLogRecPtr && lsn != InvalidXLogRecPtr)
+ ReplicationSlotSave();
+ }
+
ReplicationSlotsComputeRequiredLSN();
PhysicalWakeupLogicalWalSnd();
}
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 45582cf9d8..ca4c3aab3b 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -207,6 +207,10 @@ typedef struct ReplicationSlot
/* The time since the slot has become inactive */
TimestampTz inactive_since;
+
+ /* Latest restart LSN that was flushed to disk */
+ XLogRecPtr restart_lsn_flushed;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
--
2.34.1
Dear Hackers,
To ping the topic, I'd like to clarify what may be wrong with the idea described here, because I do not see any interest from the community. The topic is related to physical replication. The primary idea is to define the horizon of WAL segment (file) removal based on the restart LSN values saved on disk. Currently, the WAL segment removal horizon is calculated based on the slots' current restart LSN values, which may not yet be saved on disk at the time the horizon is calculated. This case takes place when a slot is advanced during a checkpoint, as described earlier in the thread.
Such behaviour is not a problem when slots are used only for physical replication in the conventional way. But it may be a problem when a physical slot is used for some other goal. For example, I have an extension which keeps WAL using physical replication slots. It creates a new physical slot and advances it as needed. After a restart, it uses the slot's restart_lsn to read WAL from that LSN. In this case, there is no guarantee that restart_lsn will point to an existing WAL segment.
The advantage of the current behaviour is that it requires a little less WAL to be kept. The disadvantage is that physical slots do not, in general, guarantee that WAL is kept starting from their restart_lsn.
I would be happy to get some advice on whether I am on the right track. Thank you in advance.
With best regards,
Vitaly
On 10/31/24 11:18, Vitaly Davydov wrote:
Dear Hackers,
I'd like to discuss a problem with replication slots' restart LSN.
Physical slots are saved to disk at the beginning of checkpoint. At the
end of checkpoint, old WAL segments are recycled or removed from disk,
if they are not kept by slot's restart_lsn values.
I agree that if we can lose WAL still needed for a replication slot,
that is a bug. Retaining the WAL is the primary purpose of slots, and we
just fixed a similar issue for logical replication.
If an existing physical slot is advanced in the middle of checkpoint
execution, the WAL segments related to the restart LSN saved on disk
may be removed. This is because the minimal LSN of the replication
slots is calculated at the end of the checkpoint, prior to old WAL
segment removal. If the postgres instance is hard-stopped (pg_ctl -m
immediate) right after the checkpoint and then restarted, the slot's
restart_lsn may point to a removed WAL segment. I believe such
behaviour is not good.
Not sure I 100% follow, but let me rephrase, just so that we're on the
same page. CreateCheckPoint() does this:
... something ...
CheckPointGuts(checkPoint.redo, flags);
... something ...
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
checkPoint.ThisTimeLineID);
The slots get synced in CheckPointGuts(), so IIUC you're saying if the
slot gets advanced shortly after the sync, the RemoveOldXlogFiles() may
remove still-needed WAL, because we happen to consider a fresh
restart_lsn when calculating the logSegNo. Is that right?
The doc [0] describes that restart_lsn may be set to some past value
after a reload. There is a discussion [1] on pgsql-hackers where such
behaviour is discussed. The main reason for not flushing physical slots
on advancing is performance. I'm OK with such behaviour, except
that the corresponding WAL segments should not be removed.
I don't know which part of [0] you refer to, but I guess you're
referring to this:
The current position of each slot is persisted only at checkpoint,
so in the case of a crash the slot may return to an earlier LSN,
which will then cause recent changes to be sent again when the
server restarts. Logical decoding clients are responsible for
avoiding ill effects from handling the same message more than once.
Yes, it's fine if we discard the new in-memory restart_lsn value, and we
do this for performance reasons - flushing the slot on every advance
would be very expensive. I haven't read [1] as it's quite long, but I
guess that's what it says.
But we must not make any "permanent" actions based on the unflushed
value, I think. Like, we should not remove WAL segments, for example.
I propose to keep WAL segments based on the saved-on-disk (flushed)
restart_lsn of slots: add a new field, restart_lsn_flushed, to the
ReplicationSlot structure, and copy restart_lsn to restart_lsn_flushed
in SaveSlotToPath. This doesn't change the on-disk format of the slot
contents. I attached a patch. It is not yet complete, but it
demonstrates a way to solve the problem.
That seems like a possible way to fix this, yes. And maybe it's the right one.
I reproduced the problem in the following way:
* Add some delay in CheckPointBuffers (pg_usleep) to emulate long
checkpoint execution.
* Execute a checkpoint, and run pg_replication_slot_advance from another
connection right after the checkpoint starts.
* Hard-restart the server right after the checkpoint completes.
* After restart, the slot's restart_lsn may point to a removed WAL segment.
The proposed patch fixes it.
I tried to reproduce the issue using a stress test (checkpoint+restart
in a loop), but so far without success :-(
Can you clarify where exactly you added the pg_usleep(), and how long
are the waits you added? I wonder if the sleep is really needed,
considering the checkpoints are spread anyway. Also, what do you mean by
"hard reset"?
What confuses me a bit is that we update the restart_lsn (and call
ReplicationSlotsComputeRequiredLSN() to recalculate the global value)
all the time. Walsender does that in PhysicalConfirmReceivedLocation for
example. So we actually see the required LSN to move during checkpoint
very often. So how come we don't see the issues much more often? Surely
I miss something important.
Another option might be that pg_replication_slot_advance() doesn't do
something it should be doing. For example, shouldn't it be marking the slot
as dirty?
regards
--
Tomas Vondra
On 11/20/24 14:40, Vitaly Davydov wrote:
Dear Hackers,
To ping the topic, I'd like to clarify what may be wrong with the idea
described here, because I do not see any interest from the community.
The topic is related to physical replication. The primary idea is to
define the horizon of WAL segment (file) removal based on the restart
LSN values saved on disk. Now, the WAL segment removal horizon is
calculated based on the current restart LSN values of slots, which may
not be saved on disk at the time of the horizon calculation. This case
takes place when a slot is advanced during a checkpoint, as described
earlier in the thread.
Yeah, a simple way to fix this might be to make sure we don't use the
required LSN value set after CheckPointReplicationSlots() to remove WAL.
AFAICS the problem is KeepLogSeg() gets the new LSN value by:
keep = XLogGetReplicationSlotMinimumLSN();
Let's say we get the LSN before calling CheckPointGuts(), and then pass
it to KeepLogSeg, so that it doesn't need to get the fresh value.
Wouldn't that fix the issue?
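For illustration, roughly how that change could look inside CreateCheckPoint() (sketch only, not a tested patch; the extra parameter to KeepLogSeg() is an assumed signature change, the current function re-reads the slot minimum itself):

    XLogRecPtr  slot_min_lsn;

    /* capture the slot minimum before the slots are synced to disk */
    slot_min_lsn = XLogGetReplicationSlotMinimumLSN();

    CheckPointGuts(checkPoint.redo, flags);

    ... something ...

    /* use the pre-CheckPointGuts value instead of re-reading it */
    KeepLogSeg(recptr, slot_min_lsn, &_logSegNo);   /* hypothetical signature */
    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
                       checkPoint.ThisTimeLineID);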
Such behaviour is not a problem when slots are used only for physical
replication in a conventional way. But it may be a problem when a physical
slot is used for some other goal. For example, I have an extension
which keeps the WAL using physical replication slots. It creates a new
physical slot and advances it as needed. After restart, it can use
restart lsn of the slot to read WAL from this LSN. In this case, there
is no guarantee that restart lsn will point to an existing WAL segment.
Yeah.
The advantage of the current behaviour is that it requires a little bit
less WAL to keep. The disadvantage is that physical slots do not
guarantee, in general, that WAL is kept starting from their restart LSNs.
If it's wrong, it doesn't really matter that it has some advantages.
regards
--
Tomas Vondra
On 11/20/24 18:24, Tomas Vondra wrote:
...
What confuses me a bit is that we update the restart_lsn (and call
ReplicationSlotsComputeRequiredLSN() to recalculate the global value)
all the time. Walsender does that in PhysicalConfirmReceivedLocation for
example. So we actually see the required LSN to move during checkpoint
very often. So how come we don't see the issues much more often? Surely
I miss something important.
This question "How come we don't see this more often?" kept bugging me,
and the answer is actually pretty simple.
The restart_lsn can move backwards after a hard restart (for the reasons
explained), but physical replication does not actually rely on that. The
replica keeps track of the LSN it received (well, it uses the same LSN),
and on reconnect it sends the startpoint to the primary. And the primary
just proceeds to use that instead of the (stale) restart LSN for the slot.
And the startpoint is guaranteed (I think) to be at least restart_lsn.
AFAICS this would work for pg_replication_slot_advance() too, that is if
you remember the last LSN the slot advanced to, it should be possible to
advance to it just fine. Of course, it requires a way to remember that
LSN, which for a replica is not an issue. But this just highlights we
can't rely on restart_lsn for this purpose.
(Apologies if this repeats something obvious, or something you already
said, Vitaly.)
regards
--
Tomas Vondra
On 11/20/24 23:19, Tomas Vondra wrote:
On 11/20/24 18:24, Tomas Vondra wrote:
...
What confuses me a bit is that we update the restart_lsn (and call
ReplicationSlotsComputeRequiredLSN() to recalculate the global value)
all the time. Walsender does that in PhysicalConfirmReceivedLocation for
example. So we actually see the required LSN to move during checkpoint
very often. So how come we don't see the issues much more often? Surely
I miss something important.
This question "How come we don't see this more often?" kept bugging me,
and the answer is actually pretty simple.
The restart_lsn can move backwards after a hard restart (for the reasons
explained), but physical replication does not actually rely on that. The
replica keeps track of the LSN it received (well, it uses the same LSN),
and on reconnect it sends the startpoint to the primary. And the primary
just proceeds use that instead of the (stale) restart LSN for the slot.
And the startpoint is guaranteed (I think) to be at least restart_lsn.
AFAICS this would work for pg_replication_slot_advance() too, that is if
you remember the last LSN the slot advanced to, it should be possible to
advance to it just fine. Of course, it requires a way to remember that
LSN, which for a replica is not an issue. But this just highlights we
can't rely on restart_lsn for this purpose.
I kept thinking about this (sorry it's so incremental), particularly
if this applies to logical replication too. And AFAICS it does not, or
at least not to this extent.
For streaming, the subscriber sends the startpoint (just like physical
replication), so it should be protected too.
But then there's the SQL API - pg_logical_slot_get_changes(). And it
turns out it ends up syncing the slot to disk pretty often, because for
RUNNING_XACTS we call LogicalDecodingProcessRecord() + standby_decode(),
which ends up calling SaveSlotToDisk(). And at the end we call
LogicalConfirmReceivedLocation() for good measure, which saves the slot
too, just to be sure.
FWIW I suspect this still is not perfectly safe, because we may still
crash / restart before the updated data.restart_lsn makes it to disk,
but after it was already used to remove old WAL, although that's
probably harder to hit. With streaming the subscriber will still send us
the new startpoint, so that should not fail I think. But with the SQL
API we probably can get into the "segment already removed" issues.
I haven't tried reproducing this yet, I guess it should be possible
using the injection points. Not sure when I get to this, though.
In any case, doesn't this suggest SaveSlotToDisk() really is not that
expensive, if we do it pretty often for logical replication? Which was
presented as the main reason why pg_replication_slot_advance() doesn't
do that. Maybe it should?
If the advance is substantial I don't think it really matters, because
there simply can't be that many large advances. It amortizes, in a
way. But even with smaller advances it should be fine, I think - if the
goal is to not remove WAL prematurely, it's enough to flush when we move
to the next segment.
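A rough sketch of that "flush only when crossing a segment boundary" idea for the advance path (the oldlsn/newlsn variables are assumptions standing for the slot's restart_lsn before and after the advance; this is not part of the attached patches):

    XLogSegNo   oldseg,
                newseg;

    XLByteToSeg(oldlsn, oldseg, wal_segment_size);
    XLByteToSeg(newlsn, newseg, wal_segment_size);

    ReplicationSlotMarkDirty();

    /* persist the slot before its new restart_lsn can influence WAL removal */
    if (oldseg != newseg)
        ReplicationSlotSave();

    ReplicationSlotsComputeRequiredLSN();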
regards
--
Tomas Vondra
Hi Tomas,
Thank you for the reply and your interest to the investigation.
On Wednesday, November 20, 2024 20:24 MSK, Tomas Vondra <tomas@vondra.me> wrote:
If an existing physical slot is advanced in the middle of checkpoint
execution, the WAL segments related to the restart LSN saved on disk
may be removed. This is because the minimal LSN of the replication
slots is calculated at the end of the checkpoint, prior to old WAL
segment removal. If the postgres instance is hard-stopped (pg_ctl -m
immediate) right after the checkpoint and then restarted, the slot's
restart_lsn may point to a removed WAL segment. I believe such
behaviour is not good.
Not sure I 100% follow, but let me rephrase, just so that we're on the
same page. CreateCheckPoint() does this:
... something ...
CheckPointGuts(checkPoint.redo, flags);
... something ...
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
checkPoint.ThisTimeLineID);
The slots get synced in CheckPointGuts(), so IIUC you're saying if the
slot gets advanced shortly after the sync, the RemoveOldXlogFiles() may
remove still-needed WAL, because we happen to consider a fresh
restart_lsn when calculating the logSegNo. Is that right?
The key action here is to restart the instance with -m immediate (or kill it and start it again) right after the checkpoint. After restart, the slot's restart_lsn will be read from disk and will point to a removed WAL segment, if its LSN was advanced enough to switch to a new WAL segment.
I tried to reproduce the issue using a stress test (checkpoint+restart
in a loop), but so far without success :-(
Can you clarify where exactly you added the pg_usleep(), and how long
are the waits you added? I wonder if the sleep is really needed,
considering the checkpoints are spread anyway. Also, what you mean by
"hard reset"?
I added pg_usleep as shown below (in the CheckPointBuffers function):
CheckPointBuffers(int flags)
{
BufferSync(flags);
+ pg_usleep(10000000);
}
Below are the instructions for how I run my test (the pg_usleep should be added to the code first):
CONSOLE> initdb -D pgdata
CONSOLE> pg_ctl -D pgdata -l logfile start
... open two psql terminals and connect to the database (let's call them PSQL-1 and PSQL-2)
PSQL-1> select pg_create_physical_replication_slot('myslot', true, false);
CONSOLE> pgbench -i -s 10 postgres # create some WAL records
PSQL-1> checkpoint; -- press ENTER, then switch to the PSQL-2 console and execute the next line within 1 second
PSQL-2> select pg_replication_slot_advance('myslot', pg_current_wal_lsn()); -- advance repslot during checkpoint
... wait for checkpoint to complete
CONSOLE> pg_ctl -D pgdata -m immediate stop
CONSOLE> pg_ctl -D pgdata start
PSQL-1> \c
PSQL-1> create extension pg_walinspect;
PSQL-1> select pg_get_wal_record_info(restart_lsn) from pg_replication_slots where slot_name = 'myslot';
ERROR: requested WAL segment pg_wal/000000010000000000000001 has already been removed
I'm trying to create a perl test to reproduce it. Please, give me some time to create the test script.
I kept thinking about this (sorry it's this incremental), particularly
if this applies to logical replication too. And AFAICS it does not, or
at least not to this extent.
Yes, it does not apply to logical replication, because a logical slot is synced when advancing.
Yeah, a simple way to fix this might be to make sure we don't use the
required LSN value set after CheckPointReplicationSlots() to remove WAL.
AFAICS the problem is KeepLogSeg() gets the new LSN value by:
keep = XLogGetReplicationSlotMinimumLSN();
Let's say we get the LSN before calling CheckPointGuts(), and then pass
it to KeepLogSeg, so that it doesn't need to get the fresh value.
Yes, it is another solution and it can fix the problem. The question is which solution to choose. I prefer to add a new in-memory state variable to the slot structure. Such a variable may also be useful if we want to check whether the slot data is synced or not. Calculating the keep value before CheckPointGuts(), IMHO, requires changing the signatures of a number of functions. I may prepare a new patch where your solution is implemented.
I'm sorry if I missed answering some other questions. I will answer them later.
With best regards,
Vitaly
On Thursday, November 21, 2024 17:56 MSK, "Vitaly Davydov" <v.davydov@postgrespro.ru> wrote:
I'm trying to create a perl test to reproduce it. Please, give me some time to create the test script.
Attached is the test script which reproduces my problem. It should be run on a patched PostgreSQL with the following change (see below); this is the easiest way to emulate a long checkpoint under high load.
CheckPointBuffers(int flags)
{
BufferSync(flags);
+ pg_usleep(10000000);
}
I used the following command line to run the script, where <postgresqldir> is the directory with the PostgreSQL sources. The IPC::Run module should be installed, and PATH and LD_LIBRARY_PATH should point to the proper PostgreSQL binaries and libraries.
perl -I <postgresqldir>/src/test/perl/ restartlsn.pl
Finally, it should produce the following error in the log:
error running SQL: 'psql:<stdin>:1: ERROR: requested WAL segment pg_wal/000000010000000000000001 has already been removed'
With best regards,
Vitaly
Attachments:
On 11/21/24 14:59, Tomas Vondra wrote:
...
But then there's the SQL API - pg_logical_slot_get_changes(). And it
turns out it ends up syncing the slot to disk pretty often, because for
RUNNING_XACTS we call LogicalDecodingProcessRecord() + standby_decode(),
which ends up calling SaveSlotToDisk(). And at the end we call
LogicalConfirmReceivedLocation() for good measure, which saves the slot
too, just to be sure.
FWIW I suspect this still is not perfectly safe, because we may still
crash / restart before the updated data.restart_lsn makes it to disk,
but after it was already used to remove old WAL, although that's
probably harder to hit. With streaming the subscriber will still send us
the new startpoint, so that should not fail I think. But with the SQL
API we probably can get into the "segment already removed" issues.
I haven't tried reproducing this yet, I guess it should be possible
using the injection points. Not sure when I get to this, though.
I kept pulling on this loose thread, and the deeper I look the more I'm
convinced ReplicationSlotsComputeRequiredLSN() is fundamentally unsafe.
I may be missing something, of course, in which case I'd be grateful if
someone could correct me.
I believe the main problem is that ReplicationSlotsComputeRequiredLSN()
operates on data that may not be on-disk yet. It just iterates over
slots in shared memory, looks at the data.restart_lsn, and rolls with
that. So some of the data may be lost after a crash or "immediate"
restart, which for restart_lsn means it can move backwards by some
unknown amount.
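For context, the relevant part of ReplicationSlotsComputeRequiredLSN() has roughly the following shape (simplified here; locking of the slot control array and the invalidation check are omitted), which is why the computed minimum can be based on values that never reached disk:

    XLogRecPtr  min_required = InvalidXLogRecPtr;

    for (int i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
        XLogRecPtr  restart_lsn;

        if (!s->in_use)
            continue;

        SpinLockAcquire(&s->mutex);
        restart_lsn = s->data.restart_lsn;  /* in-memory value, possibly unflushed */
        SpinLockRelease(&s->mutex);

        if (restart_lsn != InvalidXLogRecPtr &&
            (min_required == InvalidXLogRecPtr || restart_lsn < min_required))
            min_required = restart_lsn;
    }

    XLogSetReplicationSlotMinimumLSN(min_required);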
Unfortunately, some of the callers use the value as if it was durable,
and do irreversible actions based on it. This whole thread is about
checkpointer using the value to discard WAL supposedly not required by
any slot, only to find out we're missing WAL.
That seems like a rather fundamental problem, and the only reason why we
don't see this causing trouble more often is that (a) abrupt restarts
are not very common, (b) most slots are likely not lagging very much,
and thus not in danger of actually losing WAL, and (c) the streaming
replication tracks startpoint, which masks the issue.
But with the SQL API it's quite simple to cause issues with the right
timing, as I'll show in a bit.
There's an interesting difference in how different places update the
slot. For example LogicalConfirmReceivedLocation() does this:
1) update slot->data.restart_lsn
2) mark slot dirty: ReplicationSlotMarkDirty()
3) save slot to disk: ReplicationSlotSave()
4) recalculate required LSN: ReplicationSlotsComputeRequiredLSN()
while pg_replication_slot_advance() does only this:
1) update slot->data.restart_lsn
2) mark slot dirty: ReplicationSlotMarkDirty()
3) recalculate required LSN: ReplicationSlotsComputeRequiredLSN()
That is, it doesn't save the slot to disk. It just updates the LSN and
then proceeds to recalculate the "required LSN" for all slots. That
makes it very easy to hit the issue, as demonstrated by Vitaly.
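In code form, the tail of the advance path described above boils down to something like this (simplified sketch, locking omitted; moveto stands for the target LSN):

    MyReplicationSlot->data.restart_lsn = moveto;   /* 1) update in memory */
    ReplicationSlotMarkDirty();                     /* 2) mark dirty */
    /* no ReplicationSlotSave() here */
    ReplicationSlotsComputeRequiredLSN();           /* 3) recompute the global minimum */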
However, it doesn't mean LogicalConfirmReceivedLocation() is safe. It
would be safe without concurrency, but it can happen that the logical
decoding does (1) and maybe (2), but before the slot gets persisted,
some other session gets to call ReplicationSlotsComputeRequiredLSN().
It might be logical decoding on another slot, or advance of a physical
slot. I haven't checked what else can trigger that.
So ultimately logical slots have exactly the same issue.
Attached are two patches, demonstrating the issue. 0001 adds injection
points into two places - before (2) in LogicalConfirmReceivedLocation,
and before removal of old WAL in a checkpoint. 0002 then adds a simple
TAP test triggering the issue in pg_logical_slot_get_changes(), leading to:
ERROR: requested WAL segment pg_wal/000000010000000000000001 has
already been removed
The same issue could be demonstrated on a physical slot - it would
actually be simpler, I think.
I've been unable to cause issues for streaming replication (both
physical and logical), because the subscriber sends startpoint which
adjusts the restart_lsn to a "good" value. But I'm not sure if that's
reliable in all cases, or if the replication could break too.
It's entirely possible this behavior is common knowledge, but it was a
surprise for me. Even if the streaming replication is safe, it does seem
to make using the SQL functions less reliable (not that it doesn't have
other challenges, e.g. with Ctrl-C). But maybe it could be made safer?
I don't have a great idea how to improve this. It seems wrong for
ReplicationSlotsComputeRequiredLSN() to calculate the LSN using values
from dirty slots, so maybe it should simply retry if any slot is dirty?
Or retry on that one slot? But various places update the restart_lsn
before marking the slot as dirty, so right now this won't work.
regards
--
Tomas Vondra
Attachments:
0002-TAP-test.patch (text/x-patch)
From 79a045728b09237234f23130a2a710e1bdde7870 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 21 Nov 2024 23:07:22 +0100
Subject: [PATCH 2/2] TAP test
---
src/test/modules/test_required_lsn/Makefile | 18 +++
.../modules/test_required_lsn/meson.build | 15 +++
.../test_required_lsn/t/001_logical_slot.pl | 126 ++++++++++++++++++
3 files changed, 159 insertions(+)
create mode 100644 src/test/modules/test_required_lsn/Makefile
create mode 100644 src/test/modules/test_required_lsn/meson.build
create mode 100644 src/test/modules/test_required_lsn/t/001_logical_slot.pl
diff --git a/src/test/modules/test_required_lsn/Makefile b/src/test/modules/test_required_lsn/Makefile
new file mode 100644
index 00000000000..3eb2b02d38f
--- /dev/null
+++ b/src/test/modules/test_required_lsn/Makefile
@@ -0,0 +1,18 @@
+# src/test/modules/test_required_lsn/Makefile
+
+EXTRA_INSTALL=src/test/modules/injection_points \
+ contrib/test_decoding
+
+export enable_injection_points
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_required_lsn
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_required_lsn/meson.build b/src/test/modules/test_required_lsn/meson.build
new file mode 100644
index 00000000000..99ef3a60a4e
--- /dev/null
+++ b/src/test/modules/test_required_lsn/meson.build
@@ -0,0 +1,15 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+tests += {
+ 'name': 'test_required_lsn',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
+ 'tests': [
+ 't/001_logical_slot.pl'
+ ],
+ },
+}
diff --git a/src/test/modules/test_required_lsn/t/001_logical_slot.pl b/src/test/modules/test_required_lsn/t/001_logical_slot.pl
new file mode 100644
index 00000000000..41261f4aa6b
--- /dev/null
+++ b/src/test/modules/test_required_lsn/t/001_logical_slot.pl
@@ -0,0 +1,126 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# This test verifies that a checkpoint does not remove WAL that is still
+# needed by a replication slot: the checkpoint is paused right before old
+# WAL removal (via an injection point), a slot is advanced concurrently,
+# and after an immediate restart the logical slot must still be readable.
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf',
+ "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# create the two slots we'll need
+$node->safe_psql('postgres',
+ q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')});
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance both to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)});
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# generate transactions to get RUNNING_XACTS
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(qr/run_xacts/,
+q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# insert 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point), so that the checkpoint stops right before
+# removing old WAL segments
+print('starting checkpoint\n');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(q(select injection_points_attach('checkpoint-before-old-wal-removal','wait')));
+$checkpoint->query_until(qr/starting_checkpoint/,
+q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+print('waiting for injection_point\n');
+# wait until the checkpoint stops right before removing WAL segments
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+
+
+# try to advance the logical slot, but make it stop when it moves to the
+# next WAL segment (has to happen in the background too)
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(q{select injection_points_attach('logical-replication-slot-advance-segment','wait');});
+$logical->query_until(qr/get_changes/,
+q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+
+# wait until the logical slot advance stops right before moving to the
+# next WAL segment
+$node->wait_for_event('client backend', 'logical-replication-slot-advance-segment');
+
+
+# OK, we're in the right situation, time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the logical slot
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# abruptly stop the server (the checkpoint should have finished by now;
+# waiting for it explicitly would be better)
+$node->stop('immediate');
+
+$node->start;
+
+$node->safe_psql('postgres', q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);});
+
+$node->stop;
+
+# If we reached this point - everything is OK.
+ok(1);
+done_testing();
--
2.47.0
0001-injection-points.patch (text/x-patch)
From eef5f02a5c22ccc520c20623d70eaf093a039f09 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 21 Nov 2024 20:37:00 +0100
Subject: [PATCH 1/2] injection points
---
src/backend/access/transam/xlog.c | 4 ++++
src/backend/replication/logical/logical.c | 18 ++++++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f58412bcab..8f9629866c3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7310,6 +7310,10 @@ CreateCheckPoint(int flags)
if (PriorRedoPtr != InvalidXLogRecPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("checkpoint-before-old-wal-removal");
+#endif
+
/*
* Delete old log files, those no longer needed for last checkpoint to
* prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..569c1925ecc 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -30,6 +30,7 @@
#include "access/xact.h"
#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "pgstat.h"
@@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/memutils.h"
@@ -1844,9 +1846,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool updated_xmin = false;
bool updated_restart = false;
+ XLogRecPtr restart_lsn;
SpinLockAcquire(&MyReplicationSlot->mutex);
+ /* remember the old restart lsn */
+ restart_lsn = MyReplicationSlot->data.restart_lsn;
+
MyReplicationSlot->data.confirmed_flush = lsn;
/* if we're past the location required for bumping xmin, do so */
@@ -1888,6 +1894,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
/* first write new xmin to disk, so we know what's up after a crash */
if (updated_xmin || updated_restart)
{
+#ifdef USE_INJECTION_POINTS
+ XLogSegNo seg1,
+ seg2;
+
+ XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+ XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+ /* trigger injection point, but only if segment changes */
+ if (seg1 != seg2)
+ INJECTION_POINT("logical-replication-slot-advance-segment");
+#endif
+
ReplicationSlotMarkDirty();
ReplicationSlotSave();
elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
--
2.47.0
On 11/21/24 14:59, Tomas Vondra wrote:
I don't have a great idea how to improve this. It seems wrong for
ReplicationSlotsComputeRequiredLSN() to calculate the LSN using values
from dirty slots, so maybe it should simply retry if any slot is dirty?
Or retry on that one slot? But various places update the restart_lsn
before marking the slot as dirty, so right now this won't work.
To ping the topic, I would like to propose a new version of my patch. All the check-world tests seem to pass OK.
The idea of the patch is pretty simple - keep the flushed restart_lsn in memory and use this value to calculate the required LSN in ReplicationSlotsComputeRequiredLSN().
One note - if restart_lsn_flushed is invalid, the restart_lsn value is used instead. If we took the invalid restart_lsn_flushed instead of the valid restart_lsn, the slot would be skipped entirely. At the moment I have no other ideas for how to deal with an invalid restart_lsn_flushed.
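In code form, the note corresponds to a selection rule along these lines (a hypothetical helper just to illustrate the fallback; the attached patch inlines the equivalent check):

    /* Pick the LSN used for WAL retention; caller holds the slot's spinlock. */
    static XLogRecPtr
    slot_wal_retention_lsn(ReplicationSlot *s)
    {
        /*
         * Fall back to the in-memory value if nothing was flushed yet,
         * so the slot is not skipped entirely.
         */
        if (s->restart_lsn_flushed == InvalidXLogRecPtr)
            return s->data.restart_lsn;

        return s->restart_lsn_flushed;
    }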
With best regards,
Vitaly
Attachments:
0001-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.patch (text/x-patch)
From a6fb33969213e5f5dd994853ac052df64372b85f Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Thu, 31 Oct 2024 12:29:12 +0300
Subject: [PATCH 1/2] Keep WAL segments by slot's flushed restart LSN
---
src/backend/replication/slot.c | 37 +++++++++++++++++++++++++++++
src/backend/replication/walsender.c | 13 ++++++++++
src/include/replication/slot.h | 4 ++++
3 files changed, 54 insertions(+)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 4a206f9527..c3c44fe9f8 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -409,6 +409,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->candidate_restart_valid = InvalidXLogRecPtr;
slot->candidate_restart_lsn = InvalidXLogRecPtr;
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
+ slot->restart_lsn_flushed = InvalidXLogRecPtr;
slot->inactive_since = 0;
/*
@@ -1142,20 +1143,34 @@ ReplicationSlotsComputeRequiredLSN(void)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
XLogRecPtr restart_lsn;
+ XLogRecPtr restart_lsn_flushed;
bool invalidated;
+ ReplicationSlotPersistency persistency;
if (!s->in_use)
continue;
SpinLockAcquire(&s->mutex);
+ persistency = s->data.persistency;
restart_lsn = s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ restart_lsn_flushed = s->restart_lsn_flushed;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /* truncate WAL for persistent slots by flushed restart_lsn */
+ if (persistency == RS_PERSISTENT)
+ {
+ if (restart_lsn_flushed != InvalidXLogRecPtr &&
+ restart_lsn > restart_lsn_flushed)
+ {
+ restart_lsn = restart_lsn_flushed;
+ }
+ }
+
if (restart_lsn != InvalidXLogRecPtr &&
(min_required == InvalidXLogRecPtr ||
restart_lsn < min_required))
@@ -1193,7 +1208,9 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
{
ReplicationSlot *s;
XLogRecPtr restart_lsn;
+ XLogRecPtr restart_lsn_flushed;
bool invalidated;
+ ReplicationSlotPersistency persistency;
s = &ReplicationSlotCtl->replication_slots[i];
@@ -1207,14 +1224,26 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
/* read once, it's ok if it increases while we're checking */
SpinLockAcquire(&s->mutex);
+ persistency = s->data.persistency;
restart_lsn = s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ restart_lsn_flushed = s->restart_lsn_flushed;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /* truncate WAL for persistent slots by flushed restart_lsn */
+ if (persistency == RS_PERSISTENT)
+ {
+ if (restart_lsn_flushed != InvalidXLogRecPtr &&
+ restart_lsn > restart_lsn_flushed)
+ {
+ restart_lsn = restart_lsn_flushed;
+ }
+ }
+
if (restart_lsn == InvalidXLogRecPtr)
continue;
@@ -1432,6 +1461,7 @@ ReplicationSlotReserveWal(void)
Assert(slot != NULL);
Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
+ Assert(slot->restart_lsn_flushed == InvalidXLogRecPtr);
/*
* The replication slot mechanism is used to prevent removal of required
@@ -1607,6 +1637,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
*/
SpinLockAcquire(&s->mutex);
+ Assert(s->data.restart_lsn >= s->restart_lsn_flushed);
+
restart_lsn = s->data.restart_lsn;
/* we do nothing if the slot is already invalid */
@@ -1691,7 +1723,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
* just rely on .invalidated.
*/
if (invalidation_cause == RS_INVAL_WAL_REMOVED)
+ {
s->data.restart_lsn = InvalidXLogRecPtr;
+ s->restart_lsn_flushed = InvalidXLogRecPtr;
+ }
/* Let caller know */
*invalidated = true;
@@ -2189,6 +2224,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
if (!slot->just_dirtied)
slot->dirty = false;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->restart_lsn_flushed = cp.slotdata.restart_lsn;
SpinLockRelease(&slot->mutex);
LWLockRelease(&slot->io_in_progress_lock);
@@ -2386,6 +2422,7 @@ RestoreSlotFromDisk(const char *name)
slot->effective_xmin = cp.slotdata.xmin;
slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->restart_lsn_flushed = cp.slotdata.restart_lsn;
slot->candidate_catalog_xmin = InvalidTransactionId;
slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 371eef3ddd..03cdce23f0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2329,6 +2329,7 @@ static void
PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool changed = false;
+ XLogRecPtr restart_lsn_flushed;
ReplicationSlot *slot = MyReplicationSlot;
Assert(lsn != InvalidXLogRecPtr);
@@ -2336,6 +2337,7 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (slot->data.restart_lsn != lsn)
{
changed = true;
+ restart_lsn_flushed = slot->restart_lsn_flushed;
slot->data.restart_lsn = lsn;
}
SpinLockRelease(&slot->mutex);
@@ -2343,6 +2345,17 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (changed)
{
ReplicationSlotMarkDirty();
+
+ /* Save the replication slot to disk if its flushed restart_lsn is
+ * invalid. Slots with an invalid restart_lsn are ignored when
+ * calculating the required LSN. Once we start keeping WAL by the
+ * flushed restart LSN, an initial valid value must be saved to disk.
+ */
+ if (slot->data.persistency == RS_PERSISTENT) {
+ if (restart_lsn_flushed == InvalidXLogRecPtr && lsn != InvalidXLogRecPtr)
+ ReplicationSlotSave();
+ }
+
ReplicationSlotsComputeRequiredLSN();
PhysicalWakeupLogicalWalSnd();
}
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index d2cf786fd5..e66248b82d 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -211,6 +211,10 @@ typedef struct ReplicationSlot
* recently stopped.
*/
TimestampTz inactive_since;
+
+ /* Latest restart LSN that was flushed to disk */
+ XLogRecPtr restart_lsn_flushed;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
--
2.34.1
0002-Fix-src-recovery-t-001_stream_rep.pl-after-changes-i.patch
From f950bb109563ce407a5972abbd2fc931ef18dbeb Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Fri, 13 Dec 2024 16:02:14 +0300
Subject: [PATCH 2/2] Fix src/recovery/t/001_stream_rep.pl after changes in
restart lsn
---
src/test/recovery/t/001_stream_rep.pl | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index f3ea45ac4a..95f61cabfc 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -553,6 +553,9 @@ chomp($phys_restart_lsn_post);
ok( ($phys_restart_lsn_pre cmp $phys_restart_lsn_post) == 0,
"physical slot advance persists across restarts");
+# Cleanup unused WAL segments
+$node_primary->safe_psql('postgres', "CHECKPOINT;");
+
# Check if the previous segment gets correctly recycled after the
# server stopped cleanly, causing a shutdown checkpoint to be generated.
my $primary_data = $node_primary->data_dir;
--
2.34.1
Dear Hackers,
Please let me introduce a new version of the patch.
Patch description:
The slot data is flushed to disk at the beginning of a checkpoint. If
an existing slot is advanced in the middle of checkpoint execution, its
advanced restart LSN is used to calculate the oldest LSN for WAL
segment removal at the end of the checkpoint. If the node is restarted
just after the checkpoint, the slot data is read back from disk at
recovery with the older, not yet advanced restart LSN, which can refer
to already removed WAL segments.
The patch introduces a new in-memory state for slots,
restart_lsn_flushed, which is used to calculate the oldest LSN for WAL
segment removal. It is updated with the current value of restart_lsn
whenever the slot is saved to disk.
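To illustrate, here is a condensed sketch of the per-slot logic added
to ReplicationSlotsComputeRequiredLSN() by the attached patch (not a
standalone function; variable names as in the patch):

    /* For persistent slots, never let the in-memory restart_lsn move the
     * WAL-removal horizon past the value that has been flushed to disk. */
    if (persistency == RS_PERSISTENT &&
        restart_lsn_flushed != InvalidXLogRecPtr &&
        restart_lsn > restart_lsn_flushed)
        restart_lsn = restart_lsn_flushed;

    if (restart_lsn != InvalidXLogRecPtr &&
        (min_required == InvalidXLogRecPtr || restart_lsn < min_required))
        min_required = restart_lsn;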
With best regards,
Vitaly
Attachments:
0001-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.patch
From 480ab108499d95c8befd95911524c4d77cec6e2e Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Mon, 3 Mar 2025 17:02:15 +0300
Subject: [PATCH 1/2] Keep WAL segments by slot's flushed restart LSN
The slot data is flushed to the disk at the beginning of checkpoint. If
an existing slot is advanced in the middle of checkpoint execution, its
advanced restart LSN is taken to calculate the oldest LSN for WAL
segments removal at the end of checkpoint. If the node is restarted just
after the checkpoint, the slots data will be read from the disk at
recovery with the oldest restart LSN which can refer to removed WAL
segments.
The patch introduces a new in-memory state for slots -
flushed_restart_lsn which is used to calculate the oldest LSN for WAL
segments removal. This state is updated every time with the current
restart_lsn at the moment, when the slot is saving to disk.
---
src/backend/replication/slot.c | 41 ++++++++++++++++++++++++++++++++++
src/include/replication/slot.h | 7 ++++++
2 files changed, 48 insertions(+)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 719e531eb90..294418df217 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -424,6 +424,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->candidate_restart_valid = InvalidXLogRecPtr;
slot->candidate_restart_lsn = InvalidXLogRecPtr;
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
+ slot->restart_lsn_flushed = InvalidXLogRecPtr;
slot->inactive_since = 0;
/*
@@ -1165,20 +1166,36 @@ ReplicationSlotsComputeRequiredLSN(void)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
XLogRecPtr restart_lsn;
+ XLogRecPtr restart_lsn_flushed;
bool invalidated;
+ ReplicationSlotPersistency persistency;
if (!s->in_use)
continue;
SpinLockAcquire(&s->mutex);
+ persistency = s->data.persistency;
restart_lsn = s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ restart_lsn_flushed = s->restart_lsn_flushed;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /* Get the flushed restart_lsn for the persistent slot to compute
+ * the oldest LSN for WAL segments removals.
+ */
+ if (persistency == RS_PERSISTENT)
+ {
+ if (restart_lsn_flushed != InvalidXLogRecPtr &&
+ restart_lsn > restart_lsn_flushed)
+ {
+ restart_lsn = restart_lsn_flushed;
+ }
+ }
+
if (restart_lsn != InvalidXLogRecPtr &&
(min_required == InvalidXLogRecPtr ||
restart_lsn < min_required))
@@ -1216,7 +1233,9 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
{
ReplicationSlot *s;
XLogRecPtr restart_lsn;
+ XLogRecPtr restart_lsn_flushed;
bool invalidated;
+ ReplicationSlotPersistency persistency;
s = &ReplicationSlotCtl->replication_slots[i];
@@ -1230,14 +1249,28 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
/* read once, it's ok if it increases while we're checking */
SpinLockAcquire(&s->mutex);
+ persistency = s->data.persistency;
restart_lsn = s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ restart_lsn_flushed = s->restart_lsn_flushed;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /* Get the flushed restart_lsn for the persistent slot to compute
+ * the oldest LSN for WAL segments removals.
+ */
+ if (persistency == RS_PERSISTENT)
+ {
+ if (restart_lsn_flushed != InvalidXLogRecPtr &&
+ restart_lsn > restart_lsn_flushed)
+ {
+ restart_lsn = restart_lsn_flushed;
+ }
+ }
+
if (restart_lsn == InvalidXLogRecPtr)
continue;
@@ -1455,6 +1488,7 @@ ReplicationSlotReserveWal(void)
Assert(slot != NULL);
Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
+ Assert(slot->restart_lsn_flushed == InvalidXLogRecPtr);
/*
* The replication slot mechanism is used to prevent removal of required
@@ -1766,6 +1800,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
*/
SpinLockAcquire(&s->mutex);
+ Assert(s->data.restart_lsn >= s->restart_lsn_flushed);
+
restart_lsn = s->data.restart_lsn;
/* we do nothing if the slot is already invalid */
@@ -1835,7 +1871,10 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
* just rely on .invalidated.
*/
if (invalidation_cause == RS_INVAL_WAL_REMOVED)
+ {
s->data.restart_lsn = InvalidXLogRecPtr;
+ s->restart_lsn_flushed = InvalidXLogRecPtr;
+ }
/* Let caller know */
*invalidated = true;
@@ -2354,6 +2393,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
if (!slot->just_dirtied)
slot->dirty = false;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->restart_lsn_flushed = cp.slotdata.restart_lsn;
SpinLockRelease(&slot->mutex);
LWLockRelease(&slot->io_in_progress_lock);
@@ -2569,6 +2609,7 @@ RestoreSlotFromDisk(const char *name)
slot->effective_xmin = cp.slotdata.xmin;
slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->restart_lsn_flushed = cp.slotdata.restart_lsn;
slot->candidate_catalog_xmin = InvalidTransactionId;
slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index f5a24ccfbf2..b04d2401d6e 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -215,6 +215,13 @@ typedef struct ReplicationSlot
* recently stopped.
*/
TimestampTz inactive_since;
+
+ /* Latest restart_lsn that has been flushed to disk. For persistent slots
+ * the flushed LSN should be taken into account when calculating the oldest
+ * LSN for WAL segments removal.
+ */
+ XLogRecPtr restart_lsn_flushed;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
--
2.34.1
0002-Fix-src-recovery-t-001_stream_rep.pl.patch
From 2413ab4468b94280d19316b203848912ed6d713f Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Fri, 13 Dec 2024 16:02:14 +0300
Subject: [PATCH 2/2] Fix src/recovery/t/001_stream_rep.pl
---
src/test/recovery/t/001_stream_rep.pl | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index ee57d234c86..eae9d00b9b4 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -553,6 +553,9 @@ chomp($phys_restart_lsn_post);
ok( ($phys_restart_lsn_pre cmp $phys_restart_lsn_post) == 0,
"physical slot advance persists across restarts");
+# Cleanup unused WAL segments
+$node_primary->safe_psql('postgres', "CHECKPOINT;");
+
# Check if the previous segment gets correctly recycled after the
# server stopped cleanly, causing a shutdown checkpoint to be generated.
my $primary_data = $node_primary->data_dir;
--
2.34.1
Hi, Vitaly!
On Mon, Mar 3, 2025 at 5:12 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
The slot data is flushed to the disk at the beginning of checkpoint. If
an existing slot is advanced in the middle of checkpoint execution, its
advanced restart LSN is taken to calculate the oldest LSN for WAL
segments removal at the end of checkpoint. If the node is restarted just
after the checkpoint, the slots data will be read from the disk at
recovery with the oldest restart LSN which can refer to removed WAL
segments.
The patch introduces a new in-memory state for slots -
flushed_restart_lsn which is used to calculate the oldest LSN for WAL
segments removal. This state is updated every time with the current
restart_lsn at the moment, when the slot is saving to disk.
Thank you for your work on this subject. I think your approach is
generally correct. When we're truncating the WAL, we need to rely on
the position that would be used in the case of a server crash, that
is, the position flushed to disk.
While your patch generally looks good, I'd like to make the following notes:
1) As ReplicationSlotsComputeRequiredLSN() is called each time we need
to advance the position of WAL needed by replication slots, the usage
pattern probably has to change. That is, we probably need to call
ReplicationSlotsComputeRequiredLSN() somewhere after a change of
restart_lsn_flushed while restart_lsn is not changed, and we can
probably skip ReplicationSlotsComputeRequiredLSN() in some cases when
only restart_lsn is changed.
2) I think it's essential to include in the patch test cases which
fail without the patch. You could start by integrating the test from
[1] into your patch, and then add more similar tests for different
situations.
Links.
1. /messages/by-id/e3ac0535-e7a2-4a96-9b36-9f765e9cfec5@vondra.me
------
Regards,
Alexander Korotkov
Supabase
Hi Alexander,
Thank you for the review. I apologize for the late reply; I missed your email.
1) As ReplicationSlotsComputeRequiredLSN() is called each time we need
to advance the position of WAL needed by replication slots, the usage
pattern probably could be changed. Thus, we probably need to call
ReplicationSlotsComputeRequiredLSN() somewhere after change of
restart_lsn_flushed while restart_lsn is not changed. And probably
can skip ReplicationSlotsComputeRequiredLSN() in some cases when only
restart_lsn is changed.
Yes, it is a good idea to investigate, thank you! I guess it may work
for persistent slots, but I'm not sure about other types of slots
(ephemeral and temporary). I have no clear understanding of the
consequences at the moment. I propose to postpone it for the future,
because the proposed changes would be more invasive.
2) I think it's essential to include into the patch test caches which
fail without patch. You could start from integrating [1] test into
your patch, and then add more similar tests for different situations.
The problem with TAP tests is that the issue is hard to reproduce
without injection points. Tomas Vondra's tests create two new
injection points, and I would have to add more injection points for
new tests as well. Injection points help to test the code but make it
less readable. I'm not sure what the policy is for creating injection
points. Personally, I would not like to add new injection points only
to check this particular rare case. I'm trying to create a test
without injection points that fails at least occasionally, but I
haven't succeeded so far.
I have a question - is there any interest in backporting the solution
to the existing major releases? I can prepare a patch where
restart_lsn_flushed is stored outside of the ReplicationSlot structure
and doesn't affect the existing API.
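To show the shape of such a backward-compatible variant, a purely
hypothetical sketch (the names below are invented for illustration):

    /* A separate shared-memory array, sized to max_replication_slots and
     * indexed like ReplicationSlotCtl->replication_slots, so that the
     * ReplicationSlot struct and the on-disk format stay untouched. */
    static XLogRecPtr *SlotFlushedRestartLSN;   /* hypothetical name */

    /* after SaveSlotToPath() succeeds for the slot with index i: */
    SlotFlushedRestartLSN[i] = cp.slotdata.restart_lsn;

ReplicationSlotsComputeRequiredLSN() would then clamp a persistent
slot's restart_lsn to this array entry, just as the current patch does
with the new struct field.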
With best regards,
Vitaly
On Friday, April 04, 2025 06:22 MSK, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Apr 24, 2025 at 5:32 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
Thank you for the review. I apologize for a late reply. I missed your email.
1) As ReplicationSlotsComputeRequiredLSN() is called each time we need
to advance the position of WAL needed by replication slots, the usage
pattern probably could be changed. Thus, we probably need to call
ReplicationSlotsComputeRequiredLSN() somewhere after change of
restart_lsn_flushed while restart_lsn is not changed. And probably
can skip ReplicationSlotsComputeRequiredLSN() in some cases when only
restart_lsn is changed.
Yes, it is a good idea for investigation, thank you! I guess, It may work for
persistent slots but I'm not sure about other types of slots (ephemeral and
temporary). I have no clear understanding of consequences at the moment. I
propose to postpone it for future, because the proposed changes will me more
invasive.
Yes, that's different for different types of slots. So, removing
ReplicationSlotsComputeRequiredLSN() doesn't look safe. But at least
we need to analyze whether we need to add extra calls.
2) I think it's essential to include into the patch test caches which
fail without patch. You could start from integrating [1] test into
your patch, and then add more similar tests for different situations.
The problem with TAP tests - it is hard to reproduce without injection points.
The Tomas Vondra's tests create two new injection points. I have to add more
injection points for new tests as well. Injection points help to test the code
but make the code unreadable. I'm not sure, what is the policy of creating
injection points? Personally, I would not like to add new injection points
only to check this particular rare case. I'm trying to create a test without
injection points that should fail occasionally, but I haven't succeeded at
the moment.
I don't know if there is an explicit policy. I think we just add them
as needed to reproduce important situations in TAP tests. So, feel
free to add as many as you need to reproduce all the problematic
situations. During review we can decide whether there are too many,
but don't worry about that at the present stage.
I have a question - is there any interest to backport the solution into
existing major releases?
As long as this is a bug, it should be backpatched to all supported
affected releases.
I can prepare a patch where restart_lsn_flushed stored
outside of ReplicationSlot structure and doesn't affect the existing API.
Yes, please!
------
Regards,
Alexander Korotkov
Supabase
On Mon, Apr 28, 2025 at 8:17 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I have a question - is there any interest to backport the solution into
existing major releases?
As long as this is the bug, it should be backpatched to all supported
affected releases.
Yes, but I think we cannot back-patch the proposed fix to back
branches as it changes the ReplicationSlot struct defined in slot.h,
which breaks ABI compatibility.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Tue, Apr 29, 2025 at 4:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Apr 28, 2025 at 8:17 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I have a question - is there any interest to backport the solution into
existing major releases?
As long as this is the bug, it should be backpatched to all supported
affected releases.
Yes, but I think we cannot back-patch the proposed fix to back
branches as it changes the ReplicationSlot struct defined in slot.h,
which breaks ABI compatibility.
Yes, and I think Vitaly already proposed to address this issue. This
aspect also needs to be carefully reviewed for sure.
On Thu, Apr 24, 2025 at 5:32 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
I can prepare a patch where restart_lsn_flushed stored
outside of ReplicationSlot structure and doesn't affect the existing API.
------
Regards,
Alexander Korotkov
Supabase
On Mon, Apr 28, 2025 at 6:39 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Tue, Apr 29, 2025 at 4:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Apr 28, 2025 at 8:17 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I have a question - is there any interest to backport the solution into
existing major releases?
As long as this is the bug, it should be backpatched to all supported
affected releases.
Yes, but I think we cannot back-patch the proposed fix to back
branches as it changes the ReplicationSlot struct defined in slot.h,
which breaks ABI compatibility.
Yes, and I think Vitaly already proposed to address this issue. This
aspect also needs to be carefully reviewed for sure.
On Thu, Apr 24, 2025 at 5:32 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
I can prepare a patch where restart_lsn_flushed stored
outside of ReplicationSlot structure and doesn't affect the existing API.
Oh, I totally missed this part. Sorry for the noise. I'll review the
patch once it's submitted.
Regarding the proposed patch, I think we can somewhat follow the
last_saved_confirmed_flush field of ReplicationSlot. For example, we
can set restart_lsn_flushed when restoring the slot from disk, etc.
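For instance, alongside the existing last_saved_confirmed_flush
updates, that would look roughly like this (as the v1 patch earlier in
this thread already does):

    /* SaveSlotToPath(): remember what was actually written out */
    slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
    slot->restart_lsn_flushed = cp.slotdata.restart_lsn;

    /* RestoreSlotFromDisk(): the on-disk value is flushed by definition */
    slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
    slot->restart_lsn_flushed = cp.slotdata.restart_lsn;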
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Dear All,
Thank you for the attention to the patch. I updated the patch with a
better solution for the master branch, which can be easily backported
to the other branches once we agree on the final solution.
Two tests are introduced, based on Tomas Vondra's test for logical
slots with injection points from the discussion (patches [1], [2],
[3]). The tests are implemented as module tests in the
src/test/modules/test_replslot_required_lsn directory. I slightly
modified the original test for logical slots and created a new,
simpler test for physical slots without any additional injection
points.
I prepared a new solution (patch [4]) which is also based on Tomas
Vondra's proposal. With a fresh eye, I realized that it can fix the
issue as well, and it is easier and less invasive to implement. The
new solution differs from my original one in that it is backward
compatible (it doesn't require any changes in the ReplicationSlot
structure). My original solution could be made backward compatible as
well by allocating restart_lsn_flushed in a separate array in shmem
rather than in the ReplicationSlot structure, but I believe the new
solution is the better one. If you still think that my previous
solution is better (I don't think so), I will prepare a
backward-compatible patch based on it.
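In outline, the change to CreateCheckPoint() looks like this
(condensed from the attached patch [4]):

    XLogRecPtr  slotsMinReqLSN;

    /* Snapshot the slots' minimum required LSN before the slot data is
     * flushed to disk later in the checkpoint (CheckPointGuts). */
    slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();

    /* ... the checkpoint proceeds and saves the slots to disk ... */

    /* At the end of the checkpoint, compute the WAL-removal horizon from
     * that snapshot rather than from a freshly recomputed (and possibly
     * already advanced) value, right before RemoveOldXlogFiles(). */
    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
    KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);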
I also propose one more commit (patch [5]) which removes calls of the
ReplicationSlotsComputeRequiredLSN function that seem to be redundant.
This function updates the oldest required LSN for slots and is called
every time a slot's restart_lsn is changed. Since we use the oldest
required LSN in CreateCheckPoint/CreateRestartPoint to remove old WAL
segments, I believe there is no need to recalculate the oldest value
immediately when a slot is advanced or in other cases when restart_lsn
is changed. It may affect the GetWALAvailability function because the
oldest required LSN will not be up to date, but this function seems to
be used only in the pg_get_replication_slots system view and doesn't
affect the logic of old WAL segment removal. I also have some doubts
concerning the advancing of logical replication slots: the call of
ReplicationSlotsComputeRequiredLSN was removed there, and I'm not sure
how that can affect ReplicationSlotsComputeRequiredXmin. This commit
is not necessary for the fix, but I think it is worth considering. It
may be dropped or applied only to the master branch.
This patch can be easily backported to the major release branches. I
will quickly prepare patches for the major releases once we agree on
the final solution.
I apologize for such a late change to the patch when it is already in
the commitfest. I'm not yet well experienced in the internals of
PostgreSQL; sometimes the better solution needs some time to grow. In
doing we learn :)
[1]: 0001-Add-injection-points-to-test-replication-slot-advanc.v2.patch
[2]: 0002-Add-TAP-test-to-check-logical-repl-slot-advance-duri.v2.patch
[3]: 0003-Add-TAP-test-to-check-physical-repl-slot-advance-dur.v2.patch
[4]: 0004-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.v2.patch
[5]: 0005-Remove-redundant-ReplicationSlotsComputeRequiredLSN-.v2.patch
With best regards,
Vitaly
Attachments:
0003-Add-TAP-test-to-check-physical-repl-slot-advance-dur.v2.patch
From a8693c3003df7f9850af0be5284bb6f0e7a82fa6 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Wed, 30 Apr 2025 12:48:27 +0300
Subject: [PATCH 3/5] Add TAP test to check physical repl slot advance during
checkpoint
The test verifies that the physical replication slot is still valid
after immediate restart on checkpoint completion in case when the slot
was advanced during checkpoint.
Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
.../test_replslot_required_lsn/meson.build | 3 +-
.../t/002_physical_slot.pl | 126 ++++++++++++++++++
2 files changed, 128 insertions(+), 1 deletion(-)
create mode 100644 src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl
diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
index 999c16201fb..44d2546632b 100644
--- a/src/test/modules/test_replslot_required_lsn/meson.build
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -9,7 +9,8 @@ tests += {
'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
},
'tests': [
- 't/001_logical_slot.pl'
+ 't/001_logical_slot.pl',
+ 't/002_physical_slot.pl'
],
},
}
diff --git a/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl b/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl
new file mode 100644
index 00000000000..f89aec1da32
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl
@@ -0,0 +1,126 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existing WAL segment after immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init();
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf',
+ "wal_level = 'replica'");
+$node->start();
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# create a physical replication slot
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# insert 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)});
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'});
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')});
+$checkpoint->query_until(qr/starting_checkpoint/,
+q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# wait until the checkpoint stops right before removing WAL segments
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation, time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the logical slot
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# Continue checkpoint
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'});
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# abruptly stop the server (1 second should be enough for the checkpoint
+# to finish, would be better )
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting
+my $restart_lsn = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'});
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get wal segment name for slot's restart_lsn
+my $restart_lsn_segment = $node->safe_psql('postgres',
+ "SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok(-f "$datadir/pg_wal/$restart_lsn_segment",
+ "WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists");
+
+done_testing();
--
2.34.1
0004-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.v2.patch
From f17ba2642b7c7ef13163c3e411f3d9218a1faa11 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Wed, 30 Apr 2025 14:09:21 +0300
Subject: [PATCH 4/5] Keep WAL segments by slot's flushed restart LSN
The patch fixes the issue with unexpected removal of old WAL segments
after checkpoint followed by immediate restart. The issue occurs when a
slot is advanced after the start of checkpoint and before old WAL
segments removal at end of checkpoint.
The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when
calculating oldest LSN for WAL segments removal at the end of
checkpoint. This idea was proposed by Tomas Vondra in the discussion.
Discussion:
https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
src/backend/access/transam/xlog.c | 37 ++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1f2256a3b86..79a21e2d088 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -677,7 +677,8 @@ static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
XLogRecPtr pagePtr,
TimeLineID newTLI);
static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
+static void KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinLSN,
+ XLogSegNo *logSegNo);
static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
@@ -7087,6 +7088,7 @@ CreateCheckPoint(int flags)
VirtualTransactionId *vxids;
int nvxids;
int oldXLogAllowed = 0;
+ XLogRecPtr slotsMinReqLSN;
/*
* An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7315,6 +7317,11 @@ CreateCheckPoint(int flags)
*/
END_CRIT_SECTION();
+ /*
+ * Get the current minimum LSN to be used later in WAL segments cleanup.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
/*
* In some cases there are groups of actions that must all occur on one
* side or the other of a checkpoint record. Before flushing the
@@ -7507,17 +7514,20 @@ CreateCheckPoint(int flags)
* prevent the disk holding the xlog from growing full.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(shutdown);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
@@ -7792,6 +7802,7 @@ CreateRestartPoint(int flags)
XLogRecPtr endptr;
XLogSegNo _logSegNo;
TimestampTz xtime;
+ XLogRecPtr slotsMinReqLSN;
/* Concurrent checkpoint/restartpoint cannot happen */
Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7874,6 +7885,11 @@ CreateRestartPoint(int flags)
MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
+ /*
+ * Get the current minimum LSN to be used later in WAL segments cleanup.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
if (log_checkpoints)
LogCheckpointStart(flags, true);
@@ -7962,17 +7978,20 @@ CreateRestartPoint(int flags)
receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
replayPtr = GetXLogReplayRecPtr(&replayTLI);
endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
@@ -8067,6 +8086,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */
XLogSegNo oldestSlotSeg; /* oldest segid kept by slot */
uint64 keepSegs;
+ XLogRecPtr slotsMinReqLSN;
/*
* slot does not reserve WAL. Either deactivated, or has never been active
@@ -8080,8 +8100,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
* oldestSlotSeg to the current segment.
*/
currpos = GetXLogWriteRecPtr();
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
- KeepLogSeg(currpos, &oldestSlotSeg);
+ KeepLogSeg(currpos, slotsMinReqLSN, &oldestSlotSeg);
/*
* Find the oldest extant segment file. We get 1 until checkpoint removes
@@ -8142,7 +8163,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
* invalidation is optionally done here, instead.
*/
static void
-KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
+KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinReqLSN, XLogSegNo *logSegNo)
{
XLogSegNo currSegNo;
XLogSegNo segno;
@@ -8155,7 +8176,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
* Calculate how many segments are kept by slots first, adjusting for
* max_slot_wal_keep_size.
*/
- keep = XLogGetReplicationSlotMinimumLSN();
+ keep = slotsMinReqLSN;
if (keep != InvalidXLogRecPtr && keep < recptr)
{
XLByteToSeg(keep, segno, wal_segment_size);
--
2.34.1
0001-Add-injection-points-to-test-replication-slot-advanc.v2.patch
From 68b16da5448ec64661319bca07939e07066fe2a6 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 21 Nov 2024 20:37:00 +0100
Subject: [PATCH 1/5] Add injection points to test replication slot advance
New injection points:
* checkpoint-before-old-wal-removal - triggered in the checkpointer
process just before old WAL segments cleanup.
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation when restart_lsn was changed enough to
point to a next WAL segment.
Original patch by: Tomas Vondra <tomas@vondra.me>
Modified by: Vitaly Davydov <v.davydov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
src/backend/access/transam/xlog.c | 4 ++++
src/backend/replication/logical/logical.c | 18 ++++++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2d4c346473b..1f2256a3b86 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7498,6 +7498,10 @@ CreateCheckPoint(int flags)
if (PriorRedoPtr != InvalidXLogRecPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("checkpoint-before-old-wal-removal");
+#endif
+
/*
* Delete old log files, those no longer needed for last checkpoint to
* prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index a8d2e024d34..2163dc5e275 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -30,6 +30,7 @@
#include "access/xact.h"
#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "pgstat.h"
@@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/memutils.h"
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool updated_xmin = false;
bool updated_restart = false;
+ XLogRecPtr restart_lsn pg_attribute_unused();
SpinLockAcquire(&MyReplicationSlot->mutex);
+ /* remember the old restart lsn */
+ restart_lsn = MyReplicationSlot->data.restart_lsn;
+
MyReplicationSlot->data.confirmed_flush = lsn;
/* if we're past the location required for bumping xmin, do so */
@@ -1869,6 +1875,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
/* first write new xmin to disk, so we know what's up after a crash */
if (updated_xmin || updated_restart)
{
+#ifdef USE_INJECTION_POINTS
+ XLogSegNo seg1,
+ seg2;
+
+ XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+ XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+ /* trigger injection point, but only if segment changes */
+ if (seg1 != seg2)
+ INJECTION_POINT("logical-replication-slot-advance-segment");
+#endif
+
ReplicationSlotMarkDirty();
ReplicationSlotSave();
elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
--
2.34.1
0005-Remove-redundant-ReplicationSlotsComputeRequiredLSN-.v2.patch
From 5836ce690d6167d4f79181cc3af0531245969df6 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Thu, 1 May 2025 12:18:52 +0300
Subject: [PATCH 5/5] Remove redundant ReplicationSlotsComputeRequiredLSN calls
The function ReplicationSlotsComputeRequiredLSN is used to calculate the
oldest slots' required LSN. It is called every time when restart_lsn
value of any slot is changed (for example, when a slot is advancing).
The slot's oldest required LSN is used to remove old WAL segments in two
places - when checkpoint or restart point is created (CreateCheckPoint,
CreateRestartPoint functions). Old WAL segments seem to be truncated in
these two functions only.
The idea of the patch is to call ReplicationSlotsComputeRequiredLSN in
CreateCheckPoint or CreateRestartPoint functions only, before call of
RemoveOldXlogFiles function where old WAL segments are removed. There
is no obvious need to recalculate oldest required LSN every time when a
slot's restart_lsn is changed.
The value of the oldest required LSN can affect slot invalidation.
The function InvalidateObsoleteReplicationSlots with non zero second
parameter (oldestSegno) is called in CreateCheckPoint,
CreateRestartPoint functions only where slot invalidation occurs with
reason RS_INVAL_WAL_REMOVED. Once we update the oldest slots' required
lsn in the beginning of these functions, the proposed patch should not
break the behaviour of slot invalidation function in this case.
---
src/backend/access/transam/xlog.c | 4 ++++
src/backend/replication/logical/logical.c | 1 -
src/backend/replication/logical/slotsync.c | 4 ----
src/backend/replication/slot.c | 5 -----
src/backend/replication/slotfuncs.c | 2 --
src/backend/replication/walsender.c | 1 -
6 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 79a21e2d088..5875b5f7b9c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7320,6 +7320,7 @@ CreateCheckPoint(int flags)
/*
* Get the current minimum LSN to be used later in WAL segments cleanup.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
/*
@@ -7519,6 +7520,7 @@ CreateCheckPoint(int flags)
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(shutdown);
@@ -7888,6 +7890,7 @@ CreateRestartPoint(int flags)
/*
* Get the current minimum LSN to be used later in WAL segments cleanup.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
if (log_checkpoints)
@@ -7983,6 +7986,7 @@ CreateRestartPoint(int flags)
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 2163dc5e275..e796023033c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1905,7 +1905,6 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
SpinLockRelease(&MyReplicationSlot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
}
else
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e0ae0..30662c09275 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -335,7 +335,6 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
SpinLockRelease(&slot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return updated_config || updated_xmin_or_lsn;
@@ -502,9 +501,6 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* Prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..dd18fe10f7d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1008,7 +1008,6 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
* limits.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
/*
* If removing the directory fails, the worst thing that will happen is
@@ -1494,9 +1493,6 @@ ReplicationSlotReserveWal(void)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
/*
* If all required WAL is still there, great, otherwise retry. The
* slot should prevent further removal of WAL, unless there's a
@@ -2014,7 +2010,6 @@ restart:
if (invalidated)
{
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return invalidated;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed4e44..3300fb9b1c9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -583,7 +583,6 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
* advancing potentially done.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotRelease();
@@ -819,7 +818,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
ReplicationSlotMarkDirty();
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotSave();
#ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..0767c2803d9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2384,7 +2384,6 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (changed)
{
ReplicationSlotMarkDirty();
- ReplicationSlotsComputeRequiredLSN();
PhysicalWakeupLogicalWalSnd();
}
--
2.34.1
0002-Add-TAP-test-to-check-logical-repl-slot-advance-duri.v2.patch
From 2a5ead4a45c9624eace2dbad63f18ca76c307db6 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 21 Nov 2024 23:07:22 +0100
Subject: [PATCH 2/5] Add TAP test to check logical repl slot advance during
checkpoint
The test verifies that logical replication slot is still valid after
immediate restart on checkpoint completion in case when the slot was
advanced during checkpoint.
Original patch by: Tomas Vondra <tomas@vondra.me>
Modified by: Vitaly Davydov <v.davydov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
src/test/modules/Makefile | 4 +-
src/test/modules/meson.build | 1 +
.../test_replslot_required_lsn/Makefile | 18 +++
.../test_replslot_required_lsn/meson.build | 15 +++
.../t/001_logical_slot.pl | 124 ++++++++++++++++++
5 files changed, 160 insertions(+), 2 deletions(-)
create mode 100644 src/test/modules/test_replslot_required_lsn/Makefile
create mode 100644 src/test/modules/test_replslot_required_lsn/meson.build
create mode 100644 src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index aa1d27bbed3..53d3dd8e0ed 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -46,9 +46,9 @@ SUBDIRS = \
ifeq ($(enable_injection_points),yes)
-SUBDIRS += injection_points gin typcache
+SUBDIRS += injection_points gin typcache test_replslot_required_lsn
else
-ALWAYS_SUBDIRS += injection_points gin typcache
+ALWAYS_SUBDIRS += injection_points gin typcache test_replslot_required_lsn
endif
ifeq ($(with_ssl),openssl)
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 9de0057bd1d..ac0dbd1f10f 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -43,3 +43,4 @@ subdir('typcache')
subdir('unsafe_tests')
subdir('worker_spi')
subdir('xid_wraparound')
+subdir('test_replslot_required_lsn')
diff --git a/src/test/modules/test_replslot_required_lsn/Makefile b/src/test/modules/test_replslot_required_lsn/Makefile
new file mode 100644
index 00000000000..e5ff8af255b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/Makefile
@@ -0,0 +1,18 @@
+# src/test/modules/test_replslot_required_lsn/Makefile
+
+EXTRA_INSTALL=src/test/modules/injection_points \
+ contrib/test_decoding
+
+export enable_injection_points
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_replslot_required_lsn
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
new file mode 100644
index 00000000000..999c16201fb
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -0,0 +1,15 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+tests += {
+ 'name': 'test_replslot_required_lsn',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
+ 'tests': [
+ 't/001_logical_slot.pl'
+ ],
+ },
+}
diff --git a/src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl b/src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl
new file mode 100644
index 00000000000..ff13c741ad0
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl
@@ -0,0 +1,124 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existing WAL segment after immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf',
+ "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# create the two slots we'll need
+$node->safe_psql('postgres',
+ q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')});
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance both to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)});
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# generate transactions to get RUNNING_XACTS
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(qr/run_xacts/,
+q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# insert 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments
+print('starting checkpoint\n');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(q(select injection_points_attach('checkpoint-before-old-wal-removal','wait')));
+$checkpoint->query_until(qr/starting_checkpoint/,
+q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+print('waiting for injection_point\n');
+# wait until the checkpoint stops right before removing WAL segments
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+
+
+# try to advance the logical slot, but make it stop when it moves to the
+# next WAL segment (has to happen in the background too)
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(q{select injection_points_attach('logical-replication-slot-advance-segment','wait');});
+$logical->query_until(qr/get_changes/,
+q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# wait until the checkpoint stops right before removing WAL segments
+$node->wait_for_event('client backend', 'logical-replication-slot-advance-segment');
+
+# OK, we're in the right situation, time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the logical slot
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# abruptly stop the server (1 second should be enough for the checkpoint
+# to finish, would be better )
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+ $node->safe_psql('postgres', q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);});
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
--
2.34.1
Hi Vitaly!
On Fri, May 2, 2025 at 8:47 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
Thank you for the attention to the patch. I updated a patch with a better
solution for the master branch which can be easily backported to the other
branches as we agree on the final solution.
Two tests are introduced which are based on Tomas Vondra's test for logical slots
with injection points from the discussion (patches: [1], [2], [3]). Tests
are implemented as module tests in src/test/modules/test_replslot_required_lsn
directory. I slightly modified the original test for logical slots and created a
new simpler test for physical slots without any additional injection points.
The patchset doesn't seem to build after 371f2db8b0, which adjusted
the signature of the INJECTION_POINT() macro. Could you please update
the patchset accordingly?
I prepared a new solution (patch [4]) which is also based on Tomas Vondra's
proposal. With a fresh eye, I realized that it can fix the issue as well. It is
easy and less invasive to implement. The new solution differs from my original
solution: it is backward compatible (doesn't require any changes in ReplicationSlot
structure). My original solution can be backward compatible as well if to
allocate flushed_restart_lsn in a separate array in shmem, not in the
ReplicationSlot structure, but I believe the new solution is the better one. If
you still think that my previous solution is the better (I don't think so), I
will prepare a backward compatible patch with my previous solution.
I see that in the 0004 patch we're calling
XLogGetReplicationSlotMinimumLSN() before slot synchronization and
then using the value for WAL truncation. Generally this looks good.
But what about the "if (InvalidateObsoleteReplicationSlots(...))"
branch? It calls XLogGetReplicationSlotMinimumLSN() again. Why would
the value obtained from the latter call reflect slots as they are
synchronized to disk?
------
Regards,
Alexander Korotkov
Supabase
Hi Alexander,
Thank you very much for the review!
The patchset doesn't seem to build after 371f2db8b0, which adjusted
the signature of the INJECTION_POINT() macro. Could you, please,
update the patchset accordingly.
I've updated the patch (see attached). Thanks.
I see in 0004 patch we're calling XLogGetReplicationSlotMinimumLSN()
before slots synchronization then use it for WAL truncation.
Generally looks good. But what about the "if
(InvalidateObsoleteReplicationSlots(...))" branch? It calls
XLogGetReplicationSlotMinimumLSN() again. Why would the value
obtained from the latter call reflect slots as they are synchronized
to the disk?
In patch 0004 I call XLogGetReplicationSlotMinimumLSN() again to keep the old
behaviour - this function was called in KeepLogSeg prior to my change. I also
call CheckPointReplicationSlots on the next line to save the invalidated and
other dirty slots to disk again, to make sure the new oldest LSN is in sync.
The problem I tried to solve in this if-branch is to fix the test
src/test/recovery/t/019_replslot_limit.pl, which failed because the WAL was
not truncated enough for the test to pass. In general, this branch is not
necessary and we could fix the test by calling checkpoint twice (please see the
alternative.rej patch for that approach). If you think we should incorporate this
new change, I'm fine with doing so, but then WAL will be truncated more lazily.
Furthermore, I think we can save slots to disk right after invalidation, instead
of in CheckPointGuts, to avoid saving invalidated slots twice.
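To make the ordering concrete, here is a condensed sketch of the end-of-checkpoint
sequence with the attached 0004 patch applied. It is assembled from the diff hunks
below and is not standalone, compilable code; unrelated details are elided:

    /*
     * Captured near the start of CreateCheckPoint(), before CheckPointGuts()
     * flushes the slots, so it is never newer than the restart_lsn values
     * that end up on disk.
     */
    slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
    ...
    /* compute the old-segment horizon from the captured minimum */
    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
    KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
    if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
                                           _logSegNo, InvalidOid,
                                           InvalidTransactionId))
    {
        /*
         * Some slots were invalidated: refresh the minimum and flush the
         * dirty slots, so the refreshed value is again backed by on-disk
         * state before it is trusted.
         */
        slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
        CheckPointReplicationSlots(shutdown);

        /* recalculate the horizon, starting again from RedoRecPtr */
        XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
        KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
    }
    _logSegNo--;
    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr, ...);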
With best regards,
Vitaly
Attachments:
0002-Add-TAP-test-to-check-logical-repl-slot-advance-duri.v3.patch (text/x-patch)
From 41eed2a90d68f4d9ac1ee3d00c89879358d19fd1 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 21 Nov 2024 23:07:22 +0100
Subject: [PATCH 2/5] Add TAP test to check logical repl slot advance during
checkpoint
The test verifies that a logical replication slot is still valid after an
immediate restart on checkpoint completion, in the case when the slot was
advanced during the checkpoint.
Original patch by: Tomas Vondra <tomas@vondra.me>
Modified by: Vitaly Davydov <v.davydov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
src/test/modules/Makefile | 4 +-
src/test/modules/meson.build | 1 +
.../test_replslot_required_lsn/Makefile | 18 +++
.../test_replslot_required_lsn/meson.build | 15 +++
.../t/001_logical_slot.pl | 124 ++++++++++++++++++
5 files changed, 160 insertions(+), 2 deletions(-)
create mode 100644 src/test/modules/test_replslot_required_lsn/Makefile
create mode 100644 src/test/modules/test_replslot_required_lsn/meson.build
create mode 100644 src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index aa1d27bbed3..53d3dd8e0ed 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -46,9 +46,9 @@ SUBDIRS = \
ifeq ($(enable_injection_points),yes)
-SUBDIRS += injection_points gin typcache
+SUBDIRS += injection_points gin typcache test_replslot_required_lsn
else
-ALWAYS_SUBDIRS += injection_points gin typcache
+ALWAYS_SUBDIRS += injection_points gin typcache test_replslot_required_lsn
endif
ifeq ($(with_ssl),openssl)
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 9de0057bd1d..ac0dbd1f10f 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -43,3 +43,4 @@ subdir('typcache')
subdir('unsafe_tests')
subdir('worker_spi')
subdir('xid_wraparound')
+subdir('test_replslot_required_lsn')
diff --git a/src/test/modules/test_replslot_required_lsn/Makefile b/src/test/modules/test_replslot_required_lsn/Makefile
new file mode 100644
index 00000000000..e5ff8af255b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/Makefile
@@ -0,0 +1,18 @@
+# src/test/modules/test_replslot_required_lsn/Makefile
+
+EXTRA_INSTALL=src/test/modules/injection_points \
+ contrib/test_decoding
+
+export enable_injection_points
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_replslot_required_lsn
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
new file mode 100644
index 00000000000..999c16201fb
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -0,0 +1,15 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+tests += {
+ 'name': 'test_replslot_required_lsn',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
+ 'tests': [
+ 't/001_logical_slot.pl'
+ ],
+ },
+}
diff --git a/src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl b/src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl
new file mode 100644
index 00000000000..ff13c741ad0
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl
@@ -0,0 +1,124 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf',
+ "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# create the two slots we'll need
+$node->safe_psql('postgres',
+ q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')});
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance both to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)});
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# generate transactions to get RUNNING_XACTS
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(qr/run_xacts/,
+q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# insert 1M rows, which generates several WAL segments' worth of data
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 1M rows, several more WAL segments' worth of data
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(q(select injection_points_attach('checkpoint-before-old-wal-removal','wait')));
+$checkpoint->query_until(qr/starting_checkpoint/,
+q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+note('waiting for injection_point');
+# wait until the checkpoint stops right before removing WAL segments
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+
+
+# try to advance the logical slot, but make it stop when it moves to the
+# next WAL segment (has to happen in the background too)
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(q{select injection_points_attach('logical-replication-slot-advance-segment','wait');});
+$logical->query_until(qr/get_changes/,
+q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# wait until the logical slot advance blocks on its injection point
+$node->wait_for_event('client backend', 'logical-replication-slot-advance-segment');
+
+# OK, we're in the right situation, time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the logical slot
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# abruptly stop the server (1 second should be enough for the checkpoint
+# to finish, but an explicit wait for its completion would be better)
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+ $node->safe_psql('postgres', q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);});
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
--
2.34.1
0004-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.v3.patch (text/x-patch)
From 88078156f67aabeaa91d226ca4967c827dee977f Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Wed, 30 Apr 2025 14:09:21 +0300
Subject: [PATCH 4/5] Keep WAL segments by slot's flushed restart LSN
The patch fixes the issue with the unexpected removal of old WAL segments
after a checkpoint followed by an immediate restart. The issue occurs when a
slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.
The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when
calculating the oldest LSN for WAL segment removal at the end of the
checkpoint. This idea was proposed by Tomas Vondra in the discussion.
Discussion:
https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
src/backend/access/transam/xlog.c | 37 ++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47ffc0a2307..9c0f9a0af28 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -677,7 +677,8 @@ static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
XLogRecPtr pagePtr,
TimeLineID newTLI);
static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
+static void KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinLSN,
+ XLogSegNo *logSegNo);
static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
@@ -7087,6 +7088,7 @@ CreateCheckPoint(int flags)
VirtualTransactionId *vxids;
int nvxids;
int oldXLogAllowed = 0;
+ XLogRecPtr slotsMinReqLSN;
/*
* An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7315,6 +7317,11 @@ CreateCheckPoint(int flags)
*/
END_CRIT_SECTION();
+ /*
+ * Get the current minimum LSN to be used later in WAL segments cleanup.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
/*
* In some cases there are groups of actions that must all occur on one
* side or the other of a checkpoint record. Before flushing the
@@ -7507,17 +7514,20 @@ CreateCheckPoint(int flags)
* prevent the disk holding the xlog from growing full.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(shutdown);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
@@ -7792,6 +7802,7 @@ CreateRestartPoint(int flags)
XLogRecPtr endptr;
XLogSegNo _logSegNo;
TimestampTz xtime;
+ XLogRecPtr slotsMinReqLSN;
/* Concurrent checkpoint/restartpoint cannot happen */
Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7874,6 +7885,11 @@ CreateRestartPoint(int flags)
MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
+ /*
+ * Get the current minimum LSN to be used later in WAL segments cleanup.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
if (log_checkpoints)
LogCheckpointStart(flags, true);
@@ -7962,17 +7978,20 @@ CreateRestartPoint(int flags)
receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
replayPtr = GetXLogReplayRecPtr(&replayTLI);
endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
@@ -8067,6 +8086,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */
XLogSegNo oldestSlotSeg; /* oldest segid kept by slot */
uint64 keepSegs;
+ XLogRecPtr slotsMinReqLSN;
/*
* slot does not reserve WAL. Either deactivated, or has never been active
@@ -8080,8 +8100,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
* oldestSlotSeg to the current segment.
*/
currpos = GetXLogWriteRecPtr();
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
- KeepLogSeg(currpos, &oldestSlotSeg);
+ KeepLogSeg(currpos, slotsMinReqLSN, &oldestSlotSeg);
/*
* Find the oldest extant segment file. We get 1 until checkpoint removes
@@ -8142,7 +8163,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
* invalidation is optionally done here, instead.
*/
static void
-KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
+KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinReqLSN, XLogSegNo *logSegNo)
{
XLogSegNo currSegNo;
XLogSegNo segno;
@@ -8155,7 +8176,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
* Calculate how many segments are kept by slots first, adjusting for
* max_slot_wal_keep_size.
*/
- keep = XLogGetReplicationSlotMinimumLSN();
+ keep = slotsMinReqLSN;
if (keep != InvalidXLogRecPtr && keep < recptr)
{
XLByteToSeg(keep, segno, wal_segment_size);
--
2.34.1
alternative.rej (text/x-reject)
From 114c91d8dd8b9070222b1a59abfa61a9daece4f0 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Tue, 20 May 2025 18:07:19 +0300
Subject: [PATCH] Alternative solution with 019_replslot_limit.pl fix
---
src/backend/access/transam/xlog.c | 16 ++++------------
src/test/recovery/t/019_replslot_limit.pl | 16 ++++++++++++++++
2 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 624be87a609..4c3b68a94dc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7520,16 +7520,12 @@ CreateCheckPoint(int flags)
_logSegNo, InvalidOid,
InvalidTransactionId))
{
- ReplicationSlotsComputeRequiredLSN();
- slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
- CheckPointReplicationSlots(shutdown);
-
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
- XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
+ /* XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); */
+ /* KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo); */
}
_logSegNo--;
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
@@ -7986,16 +7982,12 @@ CreateRestartPoint(int flags)
_logSegNo, InvalidOid,
InvalidTransactionId))
{
- ReplicationSlotsComputeRequiredLSN();
- slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
- CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
-
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
- XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
+ /* XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); */
+ /* KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo); */
}
_logSegNo--;
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 6468784b83d..2ce1882cc59 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -212,6 +212,22 @@ for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
}
ok($checkpoint_ended, 'waited for checkpoint to end');
+# Execute one more checkpoint to advance the old-segment horizon after slot
+# invalidation. Slots are invalidated in the checkpoint after the old segment
+# horizon is calculated.
+$node_primary->safe_psql('postgres', "CHECKPOINT;");
+$checkpoint_ended = 0;
+for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+{
+ if ($node_primary->log_contains("checkpoint complete: ", $logstart))
+ {
+ $checkpoint_ended = 1;
+ last;
+ }
+ usleep(100_000);
+}
+ok($checkpoint_ended, 'waited for checkpoint to end');
+
# The invalidated slot shouldn't keep the old-segment horizon back;
# see bug #17103: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org
# Test for this by creating a new slot and comparing its restart LSN
--
2.34.1
0003-Add-TAP-test-to-check-physical-repl-slot-advance-dur.v3.patch (text/x-patch)
From 2131771f97fc3497906568612f9fdda027238d42 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Wed, 30 Apr 2025 12:48:27 +0300
Subject: [PATCH 3/5] Add TAP test to check physical repl slot advance during
checkpoint
The test verifies that the physical replication slot is still valid
after an immediate restart on checkpoint completion, in the case when the
slot was advanced during the checkpoint.
Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
.../test_replslot_required_lsn/meson.build | 3 +-
.../t/002_physical_slot.pl | 126 ++++++++++++++++++
2 files changed, 128 insertions(+), 1 deletion(-)
create mode 100644 src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl
diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
index 999c16201fb..44d2546632b 100644
--- a/src/test/modules/test_replslot_required_lsn/meson.build
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -9,7 +9,8 @@ tests += {
'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
},
'tests': [
- 't/001_logical_slot.pl'
+ 't/001_logical_slot.pl',
+ 't/002_physical_slot.pl'
],
},
}
diff --git a/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl b/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl
new file mode 100644
index 00000000000..f89aec1da32
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl
@@ -0,0 +1,126 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init();
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf',
+ "wal_level = 'replica'");
+$node->start();
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# create a physical replication slot
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# insert 100k rows to generate some WAL
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)});
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 1M rows, several WAL segments' worth of data
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'});
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')});
+$checkpoint->query_until(qr/starting_checkpoint/,
+q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# wait until the checkpoint stops right before removing WAL segments
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation, time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the physical slot
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# Continue checkpoint
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'});
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# abruptly stop the server (1 second should be enough for the checkpoint
+# to finish, but an explicit wait for its completion would be better)
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting
+my $restart_lsn = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'});
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get wal segment name for slot's restart_lsn
+my $restart_lsn_segment = $node->safe_psql('postgres',
+ "SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok(-f "$datadir/pg_wal/$restart_lsn_segment",
+ "WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists");
+
+done_testing();
--
2.34.1
0005-Remove-redundant-ReplicationSlotsComputeRequiredLSN-.v3.patch (text/x-patch)
From 7a7f0c8404bee31291f7cd0c07b307b04805aab1 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Thu, 1 May 2025 12:18:52 +0300
Subject: [PATCH 5/5] Remove redundant ReplicationSlotsComputeRequiredLSN calls
The function ReplicationSlotsComputeRequiredLSN is used to calculate the
oldest slots' required LSN. It is called every time the restart_lsn
value of any slot is changed (for example, when a slot is advanced).
The slots' oldest required LSN is used to remove old WAL segments in two
places - when a checkpoint or restart point is created (the CreateCheckPoint
and CreateRestartPoint functions). Old WAL segments seem to be removed in
these two functions only.

The idea of the patch is to call ReplicationSlotsComputeRequiredLSN only in
the CreateCheckPoint and CreateRestartPoint functions, before the call to the
RemoveOldXlogFiles function where old WAL segments are removed. There
is no obvious need to recalculate the oldest required LSN every time a
slot's restart_lsn is changed.

The value of the oldest required LSN can affect slot invalidation.
The function InvalidateObsoleteReplicationSlots with a non-zero second
parameter (oldestSegno) is called only in the CreateCheckPoint and
CreateRestartPoint functions, where slot invalidation occurs with
reason RS_INVAL_WAL_REMOVED. Once we update the slots' oldest required
LSN at the beginning of these functions, the proposed patch should not
break the behaviour of the slot invalidation function in this case.
---
src/backend/access/transam/xlog.c | 4 ++++
src/backend/replication/logical/logical.c | 1 -
src/backend/replication/logical/slotsync.c | 4 ----
src/backend/replication/slot.c | 5 -----
src/backend/replication/slotfuncs.c | 2 --
src/backend/replication/walsender.c | 1 -
6 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9c0f9a0af28..624be87a609 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7320,6 +7320,7 @@ CreateCheckPoint(int flags)
/*
* Get the current minimum LSN to be used later in WAL segments cleanup.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
/*
@@ -7519,6 +7520,7 @@ CreateCheckPoint(int flags)
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(shutdown);
@@ -7888,6 +7890,7 @@ CreateRestartPoint(int flags)
/*
* Get the current minimum LSN to be used later in WAL segments cleanup.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
if (log_checkpoints)
@@ -7983,6 +7986,7 @@ CreateRestartPoint(int flags)
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index de4d86afa22..bc88ccb0207 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1917,7 +1917,6 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
SpinLockRelease(&MyReplicationSlot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
}
else
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e0ae0..30662c09275 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -335,7 +335,6 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
SpinLockRelease(&slot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return updated_config || updated_xmin_or_lsn;
@@ -502,9 +501,6 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* Prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..dd18fe10f7d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1008,7 +1008,6 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
* limits.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
/*
* If removing the directory fails, the worst thing that will happen is
@@ -1494,9 +1493,6 @@ ReplicationSlotReserveWal(void)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
/*
* If all required WAL is still there, great, otherwise retry. The
* slot should prevent further removal of WAL, unless there's a
@@ -2014,7 +2010,6 @@ restart:
if (invalidated)
{
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return invalidated;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed4e44..3300fb9b1c9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -583,7 +583,6 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
* advancing potentially done.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotRelease();
@@ -819,7 +818,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
ReplicationSlotMarkDirty();
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotSave();
#ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..0767c2803d9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2384,7 +2384,6 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (changed)
{
ReplicationSlotMarkDirty();
- ReplicationSlotsComputeRequiredLSN();
PhysicalWakeupLogicalWalSnd();
}
--
2.34.1
0001-Add-injection-points-to-test-replication-slot-advanc.v3.patch (text/x-patch)
From 372dca207ea1275aa21b95c7bc5b8e07f71b075e Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 21 Nov 2024 20:37:00 +0100
Subject: [PATCH 1/5] Add injection points to test replication slot advance
New injection points:
* checkpoint-before-old-wal-removal - triggered in the checkpointer
process just before old WAL segments cleanup.
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation when restart_lsn was changed enough to
point to the next WAL segment.
Original patch by: Tomas Vondra <tomas@vondra.me>
Modified by: Vitaly Davydov <v.davydov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
src/backend/access/transam/xlog.c | 4 ++++
src/backend/replication/logical/logical.c | 18 ++++++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1914859b2ee..47ffc0a2307 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7498,6 +7498,10 @@ CreateCheckPoint(int flags)
if (PriorRedoPtr != InvalidXLogRecPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
+#endif
+
/*
* Delete old log files, those no longer needed for last checkpoint to
* prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1d56d0c4ef3..de4d86afa22 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -30,6 +30,7 @@
#include "access/xact.h"
#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "pgstat.h"
@@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/memutils.h"
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool updated_xmin = false;
bool updated_restart = false;
+ XLogRecPtr restart_lsn pg_attribute_unused();
SpinLockAcquire(&MyReplicationSlot->mutex);
+ /* remember the old restart lsn */
+ restart_lsn = MyReplicationSlot->data.restart_lsn;
+
/*
* Prevent moving the confirmed_flush backwards, as this could lead to
* data duplication issues caused by replicating already replicated
@@ -1881,6 +1887,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
/* first write new xmin to disk, so we know what's up after a crash */
if (updated_xmin || updated_restart)
{
+#ifdef USE_INJECTION_POINTS
+ XLogSegNo seg1,
+ seg2;
+
+ XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+ XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+ /* trigger injection point, but only if segment changes */
+ if (seg1 != seg2)
+ INJECTION_POINT("logical-replication-slot-advance-segment", NULL);
+#endif
+
ReplicationSlotMarkDirty();
ReplicationSlotSave();
elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
--
2.34.1
Hi, Vitaly!
On Tue, May 20, 2025 at 6:44 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
Thank you very much for the review!
The patchset doesn't seem to build after 371f2db8b0, which adjusted
the signature of the INJECTION_POINT() macro. Could you please
update the patchset accordingly?

I've updated the patch (see attached). Thanks.

I see that in the 0004 patch we're calling XLogGetReplicationSlotMinimumLSN()
before the slots are synchronized to disk and then using that value for WAL
truncation. Generally this looks good. But what about the "if
(InvalidateObsoleteReplicationSlots(...))" branch? It calls
XLogGetReplicationSlotMinimumLSN() again. Why would the value
obtained from the latter call reflect the slots as they are synchronized
to disk?

In patch 0004 I call XLogGetReplicationSlotMinimumLSN() again to keep the old
behaviour - this function was called in KeepLogSeg prior to my change. I also
call CheckPointReplicationSlots on the next line to save the invalidated and
other dirty slots to disk again, to make sure the new oldest LSN is in sync.

The problem I tried to solve in this if-branch is to fix the test
src/test/recovery/t/019_replslot_limit.pl, which failed because the WAL was
not truncated enough for the test to pass. In general, this branch is not
necessary and we could fix the test by calling checkpoint twice (please see the
alternative.rej patch for that approach). If you think we should incorporate this
new change, I'm fine with doing so, but then WAL will be truncated more lazily.

Furthermore, I think we can save slots to disk right after invalidation, instead
of in CheckPointGuts, to avoid saving invalidated slots twice.
Thank you for the clarification. It's all good. I just missed that
CheckPointReplicationSlots() syncs slots inside the "if" branch.

I've reordered the patchset. The fix should come first and the tests
second, so that the tests pass after each commit. I've also joined both
tests and the injection points into a single commit. I don't see a reason to
place the tests into src/test/modules, because there is no module. I've
moved them into src/test/recovery.

I also improved some comments and commit messages. I think 0001
should go to all supported releases as it fixes a material bug, while
0002 should be backpatched to 17, where injection points first appeared.
0003 should go to pg19 after branching. I'm continuing to review
this.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v4-0001-Keep-WAL-segments-by-the-flushed-value-of-the-slo.patch (application/x-patch)
From c409a441be6487063d49c2671d3a3aecb9ba6994 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Wed, 30 Apr 2025 14:09:21 +0300
Subject: [PATCH v4 1/3] Keep WAL segments by the flushed value of the slot's
restart LSN
The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart. The issue occurs when
a slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.
The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when calculating
the oldest LSN for WAL segments removal at the end of checkpoint. This idea
was proposed by Tomas Vondra in the discussion.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 13
---
src/backend/access/transam/xlog.c | 37 ++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1914859b2ee..30ae65fce53 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -677,7 +677,8 @@ static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
XLogRecPtr pagePtr,
TimeLineID newTLI);
static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
+static void KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinLSN,
+ XLogSegNo *logSegNo);
static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
@@ -7087,6 +7088,7 @@ CreateCheckPoint(int flags)
VirtualTransactionId *vxids;
int nvxids;
int oldXLogAllowed = 0;
+ XLogRecPtr slotsMinReqLSN;
/*
* An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7315,6 +7317,11 @@ CreateCheckPoint(int flags)
*/
END_CRIT_SECTION();
+ /*
+ * Get the current minimum LSN to be used later in WAL segments cleanup.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
/*
* In some cases there are groups of actions that must all occur on one
* side or the other of a checkpoint record. Before flushing the
@@ -7503,17 +7510,20 @@ CreateCheckPoint(int flags)
* prevent the disk holding the xlog from growing full.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(shutdown);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
@@ -7788,6 +7798,7 @@ CreateRestartPoint(int flags)
XLogRecPtr endptr;
XLogSegNo _logSegNo;
TimestampTz xtime;
+ XLogRecPtr slotsMinReqLSN;
/* Concurrent checkpoint/restartpoint cannot happen */
Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7870,6 +7881,11 @@ CreateRestartPoint(int flags)
MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
+ /*
+ * Get the current minimum LSN to be used later in WAL segments cleanup.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
if (log_checkpoints)
LogCheckpointStart(flags, true);
@@ -7958,17 +7974,20 @@ CreateRestartPoint(int flags)
receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
replayPtr = GetXLogReplayRecPtr(&replayTLI);
endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
@@ -8063,6 +8082,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */
XLogSegNo oldestSlotSeg; /* oldest segid kept by slot */
uint64 keepSegs;
+ XLogRecPtr slotsMinReqLSN;
/*
* slot does not reserve WAL. Either deactivated, or has never been active
@@ -8076,8 +8096,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
* oldestSlotSeg to the current segment.
*/
currpos = GetXLogWriteRecPtr();
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
- KeepLogSeg(currpos, &oldestSlotSeg);
+ KeepLogSeg(currpos, slotsMinReqLSN, &oldestSlotSeg);
/*
* Find the oldest extant segment file. We get 1 until checkpoint removes
@@ -8138,7 +8159,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
* invalidation is optionally done here, instead.
*/
static void
-KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
+KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinReqLSN, XLogSegNo *logSegNo)
{
XLogSegNo currSegNo;
XLogSegNo segno;
@@ -8151,7 +8172,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
* Calculate how many segments are kept by slots first, adjusting for
* max_slot_wal_keep_size.
*/
- keep = XLogGetReplicationSlotMinimumLSN();
+ keep = slotsMinReqLSN;
if (keep != InvalidXLogRecPtr && keep < recptr)
{
XLByteToSeg(keep, segno, wal_segment_size);
--
2.39.5 (Apple Git-154)
v4-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch (application/x-patch)
From 6f2afd8239cdefc62a519825a670db2eb8a4e111 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 21 Nov 2024 20:37:00 +0100
Subject: [PATCH v4 2/3] Add TAP tests to check replication slot advance during
the checkpoint
The new tests verify that logical and physical replication slots are still
valid after an immediate restart on checkpoint completion when the slot was
advanced during the checkpoint.
This commit introduces two new injection points to make these tests possible:
* checkpoint-before-old-wal-removal - triggered in the checkpointer process
just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
point to the next WAL segment.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 17
---
src/backend/access/transam/xlog.c | 4 +
src/backend/replication/logical/logical.c | 18 +++
.../test_replslot_required_lsn/Makefile | 18 +++
.../test_replslot_required_lsn/meson.build | 16 ++
src/test/recovery/meson.build | 2 +
src/test/recovery/t/046_logical_slot.pl | 139 ++++++++++++++++++
src/test/recovery/t/047_physical_slot.pl | 136 +++++++++++++++++
7 files changed, 333 insertions(+)
create mode 100644 src/test/modules/test_replslot_required_lsn/Makefile
create mode 100644 src/test/modules/test_replslot_required_lsn/meson.build
create mode 100644 src/test/recovery/t/046_logical_slot.pl
create mode 100644 src/test/recovery/t/047_physical_slot.pl
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 30ae65fce53..9c0f9a0af28 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7505,6 +7505,10 @@ CreateCheckPoint(int flags)
if (PriorRedoPtr != InvalidXLogRecPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
+#endif
+
/*
* Delete old log files, those no longer needed for last checkpoint to
* prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1d56d0c4ef3..f1eb798f3e9 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
#include "postgres.h"
#include "access/xact.h"
+#include "access/xlog_internal.h"
#include "access/xlogutils.h"
#include "fmgr.h"
#include "miscadmin.h"
@@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/memutils.h"
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool updated_xmin = false;
bool updated_restart = false;
+ XLogRecPtr restart_lsn pg_attribute_unused();
SpinLockAcquire(&MyReplicationSlot->mutex);
+ /* remember the old restart lsn */
+ restart_lsn = MyReplicationSlot->data.restart_lsn;
+
/*
* Prevent moving the confirmed_flush backwards, as this could lead to
* data duplication issues caused by replicating already replicated
@@ -1881,6 +1887,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
/* first write new xmin to disk, so we know what's up after a crash */
if (updated_xmin || updated_restart)
{
+#ifdef USE_INJECTION_POINTS
+ XLogSegNo seg1,
+ seg2;
+
+ XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+ XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+ /* trigger injection point, but only if segment changes */
+ if (seg1 != seg2)
+ INJECTION_POINT("logical-replication-slot-advance-segment", NULL);
+#endif
+
ReplicationSlotMarkDirty();
ReplicationSlotSave();
elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
diff --git a/src/test/modules/test_replslot_required_lsn/Makefile b/src/test/modules/test_replslot_required_lsn/Makefile
new file mode 100644
index 00000000000..e5ff8af255b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/Makefile
@@ -0,0 +1,18 @@
+# src/test/modules/test_replslot_required_lsn/Makefile
+
+EXTRA_INSTALL=src/test/modules/injection_points \
+ contrib/test_decoding
+
+export enable_injection_points
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_replslot_required_lsn
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
new file mode 100644
index 00000000000..44d2546632b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -0,0 +1,16 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+tests += {
+ 'name': 'test_replslot_required_lsn',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
+ 'tests': [
+ 't/001_logical_slot.pl',
+ 't/002_physical_slot.pl'
+ ],
+ },
+}
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index cb983766c67..5ee41c3cd4d 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -54,6 +54,8 @@ tests += {
't/043_no_contrecord_switch.pl',
't/044_invalidate_inactive_slots.pl',
't/045_archive_restartpoint.pl',
+ 't/046_logical_slot.pl',
+ 't/047_physical_slot.pl'
],
},
}
diff --git a/src/test/recovery/t/046_logical_slot.pl b/src/test/recovery/t/046_logical_slot.pl
new file mode 100644
index 00000000000..e78375178aa
--- /dev/null
+++ b/src/test/recovery/t/046_logical_slot.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# create the two slots we'll need
+$node->safe_psql('postgres',
+ q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')}
+);
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance both to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)}
+);
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# generate transactions to get RUNNING_XACTS
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(
+ qr/run_xacts/,
+ q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# insert 1M rows, which generates several WAL segments' worth of data
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 1M rows, several more WAL segments' worth of data
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q(select injection_points_attach('checkpoint-before-old-wal-removal','wait'))
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+note('waiting for injection_point');
+# wait until the checkpoint stops right before removing WAL segments
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+
+
+# try to advance the logical slot, but make it stop when it moves to the
+# next WAL segment (has to happen in the background too)
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(
+ q{select injection_points_attach('logical-replication-slot-advance-segment','wait');}
+);
+$logical->query_until(
+ qr/get_changes/,
+ q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# wait until the logical slot advance blocks on its injection point
+$node->wait_for_event('client backend',
+ 'logical-replication-slot-advance-segment');
+
+# OK, we're in the right situation, time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the logical slot
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# abruptly stop the server (1 second should be enough for the checkpoint
+# to finish, but an explicit wait for its completion would be better)
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+ $node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);}
+ );
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
diff --git a/src/test/recovery/t/047_physical_slot.pl b/src/test/recovery/t/047_physical_slot.pl
new file mode 100644
index 00000000000..f2cf096b308
--- /dev/null
+++ b/src/test/recovery/t/047_physical_slot.pl
@@ -0,0 +1,136 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init();
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'replica'");
+$node->start();
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# create a physical replication slot
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# insert 100k rows to generate some WAL
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+);
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 1M rows, several WAL segments' worth of data
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')}
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# wait until the checkpoint stops right before removing WAL segments
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation, time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the physical slot
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue checkpoint
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# abruptly stop the server; by now the checkpoint should have had enough
+# time to finish
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting
+my $restart_lsn = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get wal segment name for slot's restart_lsn
+my $restart_lsn_segment = $node->safe_psql('postgres',
+ "SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok( -f "$datadir/pg_wal/$restart_lsn_segment",
+ "WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists"
+);
+
+done_testing();
--
2.39.5 (Apple Git-154)
v4-0003-Remove-redundant-ReplicationSlotsComputeRequiredL.patch
From cf5b10f1cbb400e7ff0e596fe962af5847e96c2e Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Thu, 1 May 2025 12:18:52 +0300
Subject: [PATCH v4 3/3] Remove redundant ReplicationSlotsComputeRequiredLSN
calls
The function ReplicationSlotsComputeRequiredLSN is used to calculate the
oldest required LSN across all slots. It is called every time the
restart_lsn value of any slot is changed (for example, when a slot is
advanced). The oldest required LSN is used to remove old WAL segments in
two places - when a checkpoint or restart point is created (the
CreateCheckPoint and CreateRestartPoint functions). Old WAL segments seem
to be truncated in these two functions only.
The idea of the patch is to call ReplicationSlotsComputeRequiredLSN only in
the CreateCheckPoint and CreateRestartPoint functions, before the call to
RemoveOldXlogFiles where old WAL segments are removed. There is no obvious
need to recalculate the oldest required LSN every time a slot's restart_lsn
is changed.
The value of the oldest required LSN can affect slot invalidation. The
function InvalidateObsoleteReplicationSlots with a non-zero second parameter
(oldestSegno) is called only in the CreateCheckPoint and CreateRestartPoint
functions, where slot invalidation occurs with the reason
RS_INVAL_WAL_REMOVED. Since we update the oldest required LSN at the
beginning of these functions, the proposed patch should not break the
behaviour of the slot invalidation function in this case.
---
src/backend/access/transam/xlog.c | 4 ++++
src/backend/replication/logical/logical.c | 1 -
src/backend/replication/logical/slotsync.c | 4 ----
src/backend/replication/slot.c | 5 -----
src/backend/replication/slotfuncs.c | 2 --
src/backend/replication/walsender.c | 1 -
6 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9c0f9a0af28..624be87a609 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7320,6 +7320,7 @@ CreateCheckPoint(int flags)
/*
* Get the current minimum LSN to be used later in WAL segments cleanup.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
/*
@@ -7519,6 +7520,7 @@ CreateCheckPoint(int flags)
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(shutdown);
@@ -7888,6 +7890,7 @@ CreateRestartPoint(int flags)
/*
* Get the current minimum LSN to be used later in WAL segments cleanup.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
if (log_checkpoints)
@@ -7983,6 +7986,7 @@ CreateRestartPoint(int flags)
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index f1eb798f3e9..7d136213777 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1917,7 +1917,6 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
SpinLockRelease(&MyReplicationSlot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
}
else
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e0ae0..30662c09275 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -335,7 +335,6 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
SpinLockRelease(&slot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return updated_config || updated_xmin_or_lsn;
@@ -502,9 +501,6 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* Prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..dd18fe10f7d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1008,7 +1008,6 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
* limits.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
/*
* If removing the directory fails, the worst thing that will happen is
@@ -1494,9 +1493,6 @@ ReplicationSlotReserveWal(void)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
/*
* If all required WAL is still there, great, otherwise retry. The
* slot should prevent further removal of WAL, unless there's a
@@ -2014,7 +2010,6 @@ restart:
if (invalidated)
{
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return invalidated;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed4e44..3300fb9b1c9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -583,7 +583,6 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
* advancing potentially done.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotRelease();
@@ -819,7 +818,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
ReplicationSlotMarkDirty();
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotSave();
#ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..0767c2803d9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2384,7 +2384,6 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (changed)
{
ReplicationSlotMarkDirty();
- ReplicationSlotsComputeRequiredLSN();
PhysicalWakeupLogicalWalSnd();
}
--
2.39.5 (Apple Git-154)
On Fri, May 23, 2025 at 12:10 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
Hi, Vitaly!
On Tue, May 20, 2025 at 6:44 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
Thank you very much for the review!

The patchset doesn't seem to build after 371f2db8b0, which adjusted
the signature of the INJECTION_POINT() macro. Could you please
update the patchset accordingly?

I've updated the patch (see attached). Thanks.

I see in the 0004 patch we're calling XLogGetReplicationSlotMinimumLSN()
before slot synchronization and then use it for WAL truncation.
Generally looks good. But what about the "if
(InvalidateObsoleteReplicationSlots(...))" branch? It calls
XLogGetReplicationSlotMinimumLSN() again. Why would the value
obtained from the latter call reflect slots as they are synchronized
to the disk?

In patch 0004 I call XLogGetReplicationSlotMinimumLSN() again to keep the old
behaviour - this function was called in KeepLogSeg prior to my change. I also
call CheckPointReplicationSlots at the next line to save the invalidated and
other dirty slots on disk again, to make sure the new oldest LSN is in sync.

The problem I tried to solve in this if-branch is to fix the test
src/test/recovery/t/019_replslot_limit.pl, which failed because the WAL was
not truncated enough for the test to pass. In general, this branch is not
necessary and we may fix the test by calling checkpoint twice (please see
the alternative.rej patch for this case). If you think we should incorporate
this new change, I'm OK with doing it, but the WAL will be truncated more
lazily.

Furthermore, I think we can save slots on disk right after invalidation, not
in CheckPointGuts, to avoid saving invalidated slots twice.

Thank you for the clarification. It's all good. I just missed that
CheckPointReplicationSlots() syncs slots inside the "if" branch.
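As an aside, here is a minimal, self-contained C sketch of why that
recomputation matters; the slot array and helper names below are made up for
illustration, the real logic lives in ReplicationSlotsComputeRequiredLSN:

#include <stdio.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef struct
{
	int			in_use;
	XLogRecPtr	restart_lsn;
} FakeSlot;

/* minimum restart_lsn over all slots that still reserve WAL */
static XLogRecPtr
compute_required_lsn(const FakeSlot *slots, int n)
{
	XLogRecPtr	min = InvalidXLogRecPtr;
	int			i;

	for (i = 0; i < n; i++)
	{
		if (!slots[i].in_use || slots[i].restart_lsn == InvalidXLogRecPtr)
			continue;
		if (min == InvalidXLogRecPtr || slots[i].restart_lsn < min)
			min = slots[i].restart_lsn;
	}
	return min;
}

int
main(void)
{
	FakeSlot	slots[] = {{1, 1000}, {1, 4000}};

	printf("required LSN before invalidation: %llu\n",
		   (unsigned long long) compute_required_lsn(slots, 2));

	/*
	 * Invalidating the lagging slot releases its WAL reservation, so the
	 * horizon moves forward and must be recomputed before removing WAL.
	 */
	slots[0].in_use = 0;
	printf("required LSN after invalidation:  %llu\n",
		   (unsigned long long) compute_required_lsn(slots, 2));
	return 0;
}

In the attached patches that recomputed value is then made safe by
synchronizing the slots again (CheckPointReplicationSlots) before it is used
for WAL removal.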
I've reordered the patchset: the fix comes first and the tests come
second, so tests pass after each commit. Also, I've joined both
tests and injection points into a single commit. I don't see a reason to
place the tests into src/test/modules, because there is no module. I've
moved them into src/test/recovery.

I also improved some comments and commit messages. I think 0001
should go to all supported releases as it fixes a material bug, while
0002 should be backpatched to 17, where injection points first appeared.
0003 should go to pg19 after branching. I'm continuing to review
this.
I spent more time on this. The next revision is attached. It
contains revised comments and other cosmetic changes. I'm going to
backpatch 0001 to all supported branches, and 0002 to 17 where
injection points were introduced.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v5-0003-Remove-redundant-ReplicationSlotsComputeRequiredL.patch
From db20268922142f8b8cbff893b771d6ced81970c6 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 13:34:36 +0300
Subject: [PATCH v5 3/3] Remove redundant ReplicationSlotsComputeRequiredLSN
calls
The function ReplicationSlotsComputeRequiredLSN is used to calculate the
oldest required LSN across all slots. It is called every time the
restart_lsn value of any slot is changed (for example, when a slot is
advanced). The oldest required LSN is used to remove old WAL segments in
two places - when a checkpoint or restart point is created (the
CreateCheckPoint and CreateRestartPoint functions). Old WAL segments seem
to be truncated in these two functions only.
The idea of the patch is to call ReplicationSlotsComputeRequiredLSN only in
the CreateCheckPoint and CreateRestartPoint functions, before the call to
RemoveOldXlogFiles where old WAL segments are removed. There is no obvious
need to recalculate the oldest required LSN every time a slot's restart_lsn
is changed.
The value of the oldest required LSN can affect slot invalidation. The
function InvalidateObsoleteReplicationSlots with a non-zero second parameter
(oldestSegno) is called only in the CreateCheckPoint and CreateRestartPoint
functions, where slot invalidation occurs with the reason
RS_INVAL_WAL_REMOVED. Since we update the oldest required LSN at the
beginning of these functions, the proposed patch should not break the
behaviour of the slot invalidation function in this case.
---
src/backend/access/transam/xlog.c | 4 ++++
src/backend/replication/logical/logical.c | 1 -
src/backend/replication/logical/slotsync.c | 4 ----
src/backend/replication/slot.c | 5 -----
src/backend/replication/slotfuncs.c | 2 --
src/backend/replication/walsender.c | 1 -
6 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0a7f7a71d8b..bdfd0a59ab7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7324,6 +7324,7 @@ CreateCheckPoint(int flags)
* might be advanced concurrently, so we call this before
* CheckPointReplicationSlots() synchronizes replication slots.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
/*
@@ -7528,6 +7529,7 @@ CreateCheckPoint(int flags)
* cleanup. Then, we must synchronize the replication slots again in
* order to make this LSN safe to use.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(shutdown);
@@ -7901,6 +7903,7 @@ CreateRestartPoint(int flags)
* might be advanced concurrently, so we call this before
* CheckPointReplicationSlots() synchronizes replication slots.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
if (log_checkpoints)
@@ -8001,6 +8004,7 @@ CreateRestartPoint(int flags)
* cleanup. Then, we must synchronize the replication slots again in
* order to make this LSN safe to use.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index f1eb798f3e9..7d136213777 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1917,7 +1917,6 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
SpinLockRelease(&MyReplicationSlot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
}
else
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e0ae0..30662c09275 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -335,7 +335,6 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
SpinLockRelease(&slot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return updated_config || updated_xmin_or_lsn;
@@ -502,9 +501,6 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* Prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..dd18fe10f7d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1008,7 +1008,6 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
* limits.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
/*
* If removing the directory fails, the worst thing that will happen is
@@ -1494,9 +1493,6 @@ ReplicationSlotReserveWal(void)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
/*
* If all required WAL is still there, great, otherwise retry. The
* slot should prevent further removal of WAL, unless there's a
@@ -2014,7 +2010,6 @@ restart:
if (invalidated)
{
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return invalidated;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed4e44..3300fb9b1c9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -583,7 +583,6 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
* advancing potentially done.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotRelease();
@@ -819,7 +818,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
ReplicationSlotMarkDirty();
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotSave();
#ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..0767c2803d9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2384,7 +2384,6 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (changed)
{
ReplicationSlotMarkDirty();
- ReplicationSlotsComputeRequiredLSN();
PhysicalWakeupLogicalWalSnd();
}
--
2.39.5 (Apple Git-154)
v5-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch
From 5dd13070f2d7f9da7b335fa9077d7fb992106db8 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 13:26:28 +0300
Subject: [PATCH v5 2/3] Add TAP tests to check replication slot advance during
the checkpoint
The new tests verify that logical and physical replication slots remain
valid after an immediate restart following checkpoint completion, when the
slot was advanced during the checkpoint.
This commit introduces two new injection points to make these tests possible:
* checkpoint-before-old-wal-removal - triggered in the checkpointer process
just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
point to the next WAL segment.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 17
---
src/backend/access/transam/xlog.c | 4 +
src/backend/replication/logical/logical.c | 18 +++
.../test_replslot_required_lsn/Makefile | 18 +++
.../test_replslot_required_lsn/meson.build | 16 ++
src/test/recovery/meson.build | 2 +
src/test/recovery/t/046_logical_slot.pl | 139 ++++++++++++++++++
src/test/recovery/t/047_physical_slot.pl | 133 +++++++++++++++++
7 files changed, 330 insertions(+)
create mode 100644 src/test/modules/test_replslot_required_lsn/Makefile
create mode 100644 src/test/modules/test_replslot_required_lsn/meson.build
create mode 100644 src/test/recovery/t/046_logical_slot.pl
create mode 100644 src/test/recovery/t/047_physical_slot.pl
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a0e589e9c4b..0a7f7a71d8b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7509,6 +7509,10 @@ CreateCheckPoint(int flags)
if (PriorRedoPtr != InvalidXLogRecPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
+#endif
+
/*
* Delete old log files, those no longer needed for last checkpoint to
* prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1d56d0c4ef3..f1eb798f3e9 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
#include "postgres.h"
#include "access/xact.h"
+#include "access/xlog_internal.h"
#include "access/xlogutils.h"
#include "fmgr.h"
#include "miscadmin.h"
@@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/memutils.h"
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool updated_xmin = false;
bool updated_restart = false;
+ XLogRecPtr restart_lsn pg_attribute_unused();
SpinLockAcquire(&MyReplicationSlot->mutex);
+ /* remember the old restart lsn */
+ restart_lsn = MyReplicationSlot->data.restart_lsn;
+
/*
* Prevent moving the confirmed_flush backwards, as this could lead to
* data duplication issues caused by replicating already replicated
@@ -1881,6 +1887,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
/* first write new xmin to disk, so we know what's up after a crash */
if (updated_xmin || updated_restart)
{
+#ifdef USE_INJECTION_POINTS
+ XLogSegNo seg1,
+ seg2;
+
+ XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+ XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+ /* trigger injection point, but only if segment changes */
+ if (seg1 != seg2)
+ INJECTION_POINT("logical-replication-slot-advance-segment", NULL);
+#endif
+
ReplicationSlotMarkDirty();
ReplicationSlotSave();
elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
diff --git a/src/test/modules/test_replslot_required_lsn/Makefile b/src/test/modules/test_replslot_required_lsn/Makefile
new file mode 100644
index 00000000000..e5ff8af255b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/Makefile
@@ -0,0 +1,18 @@
+# src/test/modules/test_replslot_required_lsn/Makefile
+
+EXTRA_INSTALL=src/test/modules/injection_points \
+ contrib/test_decoding
+
+export enable_injection_points
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_replslot_required_lsn
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
new file mode 100644
index 00000000000..44d2546632b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -0,0 +1,16 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+tests += {
+ 'name': 'test_replslot_required_lsn',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
+ 'tests': [
+ 't/001_logical_slot.pl',
+ 't/002_physical_slot.pl'
+ ],
+ },
+}
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index cb983766c67..5ee41c3cd4d 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -54,6 +54,8 @@ tests += {
't/043_no_contrecord_switch.pl',
't/044_invalidate_inactive_slots.pl',
't/045_archive_restartpoint.pl',
+ 't/046_logical_slot.pl',
+ 't/047_physical_slot.pl'
],
},
}
diff --git a/src/test/recovery/t/046_logical_slot.pl b/src/test/recovery/t/046_logical_slot.pl
new file mode 100644
index 00000000000..b4265c4a6a5
--- /dev/null
+++ b/src/test/recovery/t/046_logical_slot.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create the two slots we'll need.
+$node->safe_psql('postgres',
+ q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')}
+);
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance both slots to the current position just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)}
+);
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Generate some transactions to get RUNNING_XACTS.
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(
+ qr/run_xacts/,
+ q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# Insert 1M rows to generate a large amount of WAL (multiple segments).
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 1M rows, to generate more WAL segments.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q(select injection_points_attach('checkpoint-before-old-wal-removal','wait'))
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# Try to advance the logical slot, but make it stop when it moves to the next
+# WAL segment (this has to happen in the background, too).
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(
+ q{select injection_points_attach('logical-replication-slot-advance-segment','wait');}
+);
+$logical->query_until(
+ qr/get_changes/,
+ q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# Wait until the slot's restart_lsn points to the next WAL segment.
+note('waiting for injection_point');
+$node->wait_for_event('client backend',
+ 'logical-replication-slot-advance-segment');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN, and then unblock the checkpoint, which
+# removes the WAL still needed by the logical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# Abruptly stop the server; by now the checkpoint should have had enough
+# time to finish.
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+ $node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);}
+ );
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
diff --git a/src/test/recovery/t/047_physical_slot.pl b/src/test/recovery/t/047_physical_slot.pl
new file mode 100644
index 00000000000..454e56b9bd2
--- /dev/null
+++ b/src/test/recovery/t/047_physical_slot.pl
@@ -0,0 +1,133 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'replica'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create a physical replication slot.
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Insert 100k rows to generate some WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+);
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 1M rows, to generate more WAL segments.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')}
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN and then unblock the checkpoint, which
+# removes the WAL still needed by the physical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# Abruptly stop the server; by now the checkpoint should have had enough
+# time to finish.
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting.
+my $restart_lsn = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get the WAL segment name for the slot's restart_lsn.
+my $restart_lsn_segment = $node->safe_psql('postgres',
+ "SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists.
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok( -f "$datadir/pg_wal/$restart_lsn_segment",
+ "WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists"
+);
+
+done_testing();
--
2.39.5 (Apple Git-154)
v5-0001-Keep-WAL-segments-by-the-flushed-value-of-the-slo.patch
From b3f853589e55f78878923434810b0c6fb82c230e Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 13:09:16 +0300
Subject: [PATCH v5 1/3] Keep WAL segments by the flushed value of the slot's
restart LSN
The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart. The issue occurs when
a slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.
The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when calculating
the oldest LSN for WAL segments removal at the end of checkpoint. This idea
was proposed by Tomas Vondra in the discussion.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 13
---
src/backend/access/transam/xlog.c | 55 ++++++++++++++++++++++++++-----
1 file changed, 47 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1914859b2ee..a0e589e9c4b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -677,7 +677,8 @@ static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
XLogRecPtr pagePtr,
TimeLineID newTLI);
static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
+static void KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinLSN,
+ XLogSegNo *logSegNo);
static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
@@ -7087,6 +7088,7 @@ CreateCheckPoint(int flags)
VirtualTransactionId *vxids;
int nvxids;
int oldXLogAllowed = 0;
+ XLogRecPtr slotsMinReqLSN;
/*
* An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7315,6 +7317,15 @@ CreateCheckPoint(int flags)
*/
END_CRIT_SECTION();
+ /*
+ * Get the current minimum LSN to be used later in the WAL segment
+ * cleanup. We may clean up only WAL segments, which are not needed
+ * according to synchronized LSNs of replication slots. The slot's LSN
+ * might be advanced concurrently, so we call this before
+ * CheckPointReplicationSlots() synchronizes replication slots.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
/*
* In some cases there are groups of actions that must all occur on one
* side or the other of a checkpoint record. Before flushing the
@@ -7503,17 +7514,25 @@ CreateCheckPoint(int flags)
* prevent the disk holding the xlog from growing full.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ /*
+ * Recalculate the current minimum LSN to be used in the WAL segment
+ * cleanup. Then, we must synchronize the replication slots again in
+ * order to make this LSN safe to use.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(shutdown);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
@@ -7788,6 +7807,7 @@ CreateRestartPoint(int flags)
XLogRecPtr endptr;
XLogSegNo _logSegNo;
TimestampTz xtime;
+ XLogRecPtr slotsMinReqLSN;
/* Concurrent checkpoint/restartpoint cannot happen */
Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7870,6 +7890,15 @@ CreateRestartPoint(int flags)
MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
+ /*
+ * Get the current minimum LSN to be used later in the WAL segment
+ * cleanup. We may clean up only WAL segments, which are not needed
+ * according to synchronized LSNs of replication slots. The slot's LSN
+ * might be advanced concurrently, so we call this before
+ * CheckPointReplicationSlots() synchronizes replication slots.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
if (log_checkpoints)
LogCheckpointStart(flags, true);
@@ -7958,17 +7987,25 @@ CreateRestartPoint(int flags)
receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
replayPtr = GetXLogReplayRecPtr(&replayTLI);
endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ /*
+ * Recalculate the current minimum LSN to be used in the WAL segment
+ * cleanup. Then, we must synchronize the replication slots again in
+ * order to make this LSN safe to use.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
@@ -8063,6 +8100,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */
XLogSegNo oldestSlotSeg; /* oldest segid kept by slot */
uint64 keepSegs;
+ XLogRecPtr slotsMinReqLSN;
/*
* slot does not reserve WAL. Either deactivated, or has never been active
@@ -8076,8 +8114,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
* oldestSlotSeg to the current segment.
*/
currpos = GetXLogWriteRecPtr();
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
- KeepLogSeg(currpos, &oldestSlotSeg);
+ KeepLogSeg(currpos, slotsMinReqLSN, &oldestSlotSeg);
/*
* Find the oldest extant segment file. We get 1 until checkpoint removes
@@ -8138,7 +8177,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
* invalidation is optionally done here, instead.
*/
static void
-KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
+KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinReqLSN, XLogSegNo *logSegNo)
{
XLogSegNo currSegNo;
XLogSegNo segno;
@@ -8151,7 +8190,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
* Calculate how many segments are kept by slots first, adjusting for
* max_slot_wal_keep_size.
*/
- keep = XLogGetReplicationSlotMinimumLSN();
+ keep = slotsMinReqLSN;
if (keep != InvalidXLogRecPtr && keep < recptr)
{
XLByteToSeg(keep, segno, wal_segment_size);
--
2.39.5 (Apple Git-154)
On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I spent more time on this. The next revision is attached. It
contains revised comments and other cosmetic changes. I'm going to
backpatch 0001 to all supported branches,
Is my understanding correct that we need 0001 because
PhysicalConfirmReceivedLocation() doesn't save the slot to disk after
changing the slot's restart_lsn? If so, shouldn't the comments (One
could argue that the slot should be saved to disk now, but that'd be
energy wasted - the worst thing lost information could cause here is
to give wrong information in a statistics view) in
PhysicalConfirmReceivedLocation() be changed to mention the risk of
not saving the slot?
Also, after 0001, even the same solution will be true for logical
slots as well, where we are already careful to save the slot to disk
if its restart_lsn is changed, see LogicalConfirmReceivedLocation().
So, won't that effort be wasted? Even if we don't want to do anything
about it (which doesn't sound like a good idea), we should note that
in comments somewhere.
--
With Regards,
Amit Kapila.
Hi, Amit!
Thank you for your attention to this patchset!
On Sat, May 24, 2025 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I spent more time on this. The next revision is attached. It
contains revised comments and other cosmetic changes. I'm going to
backpatch 0001 to all supported branches,

Is my understanding correct that we need 0001 because
PhysicalConfirmReceivedLocation() doesn't save the slot to disk after
changing the slot's restart_lsn?
Yes. Also, even if it did save the slot to disk, there would still be a
race condition: a concurrent checkpoint could use the updated value from
shared memory to clean up old WAL segments, and then a crash could happen
before we managed to write the slot to disk.
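To make that interleaving concrete, here is a minimal, self-contained C
sketch of the race; the variable names and segment numbers are hypothetical,
and this is only an illustration of the ordering, not PostgreSQL code:

#include <stdio.h>
#include <stdint.h>

typedef uint64_t XLogSegNo;

int
main(void)
{
	/* the slot's restart_lsn, expressed as a WAL segment number */
	XLogSegNo	flushed_restart_seg = 10;	/* value already saved on disk */
	XLogSegNo	inmem_restart_seg = 10;		/* value in shared memory */

	/* checkpoint starts: only the flushed value survives a hard restart */
	XLogSegNo	horizon_at_start = flushed_restart_seg;

	/* the slot is advanced concurrently, in shared memory only */
	inmem_restart_seg = 30;

	/*
	 * Buggy ordering: the removal horizon is taken from shared memory at
	 * the end of the checkpoint, so segments 10..29 get removed.  After an
	 * immediate restart the slot still claims segment 10, which is gone.
	 */
	XLogSegNo	buggy_horizon = inmem_restart_seg;

	/* Fixed ordering: use the value captured before the slots were synced */
	XLogSegNo	fixed_horizon = horizon_at_start;

	printf("buggy: remove WAL below segment %llu, slot needs %llu after crash\n",
		   (unsigned long long) buggy_horizon,
		   (unsigned long long) flushed_restart_seg);
	printf("fixed: remove WAL below segment %llu\n",
		   (unsigned long long) fixed_horizon);
	return 0;
}

After the buggy removal a hard restart leaves the slot's restart_lsn pointing
into WAL that no longer exists, which is exactly what the attached TAP tests
reproduce with injection points.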
If so, shouldn't the comments (One
could argue that the slot should be saved to disk now, but that'd be
energy wasted - the worst thing lost information could cause here is
to give wrong information in a statistics view) in
PhysicalConfirmReceivedLocation() be changed to mention the risk of
not saving the slot?

Also, after 0001, the same will be true for logical
slots as well, where we are already careful to save the slot to disk
if its restart_lsn is changed, see LogicalConfirmReceivedLocation().
So, won't that effort be wasted? Even if we don't want to do anything
about it (which doesn't sound like a good idea), we should note that
in comments somewhere.
I have added the comments about both points in the attached revision
of the patchset.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v6-0003-Remove-redundant-ReplicationSlotsComputeRequiredL.patch
From 7737daab8bc713dad04181878ee6d8e0ac9d935c Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 13:34:36 +0300
Subject: [PATCH v6 3/3] Remove redundant ReplicationSlotsComputeRequiredLSN
calls
The function ReplicationSlotsComputeRequiredLSN is used to calculate the
oldest required LSN across all slots. It is called every time the
restart_lsn value of any slot is changed (for example, when a slot is
advanced). The oldest required LSN is used to remove old WAL segments in
two places - when a checkpoint or restart point is created (the
CreateCheckPoint and CreateRestartPoint functions). Old WAL segments seem
to be truncated in these two functions only.
The idea of the patch is to call ReplicationSlotsComputeRequiredLSN only in
the CreateCheckPoint and CreateRestartPoint functions, before the call to
RemoveOldXlogFiles where old WAL segments are removed. There is no obvious
need to recalculate the oldest required LSN every time a slot's restart_lsn
is changed.
The value of the oldest required LSN can affect slot invalidation. The
function InvalidateObsoleteReplicationSlots with a non-zero second parameter
(oldestSegno) is called only in the CreateCheckPoint and CreateRestartPoint
functions, where slot invalidation occurs with the reason
RS_INVAL_WAL_REMOVED. Since we update the oldest required LSN at the
beginning of these functions, the proposed patch should not break the
behaviour of the slot invalidation function in this case.
---
src/backend/access/transam/xlog.c | 4 ++++
src/backend/replication/logical/logical.c | 1 -
src/backend/replication/logical/slotsync.c | 4 ----
src/backend/replication/slot.c | 5 -----
src/backend/replication/slotfuncs.c | 2 --
src/backend/replication/walsender.c | 1 -
6 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0a7f7a71d8b..bdfd0a59ab7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7324,6 +7324,7 @@ CreateCheckPoint(int flags)
* might be advanced concurrently, so we call this before
* CheckPointReplicationSlots() synchronizes replication slots.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
/*
@@ -7528,6 +7529,7 @@ CreateCheckPoint(int flags)
* cleanup. Then, we must synchronize the replication slots again in
* order to make this LSN safe to use.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(shutdown);
@@ -7901,6 +7903,7 @@ CreateRestartPoint(int flags)
* might be advanced concurrently, so we call this before
* CheckPointReplicationSlots() synchronizes replication slots.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
if (log_checkpoints)
@@ -8001,6 +8004,7 @@ CreateRestartPoint(int flags)
* cleanup. Then, we must synchronize the replication slots again in
* order to make this LSN safe to use.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 081e6593722..34e973393c2 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1925,7 +1925,6 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
SpinLockRelease(&MyReplicationSlot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
}
else
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e0ae0..30662c09275 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -335,7 +335,6 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
SpinLockRelease(&slot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return updated_config || updated_xmin_or_lsn;
@@ -502,9 +501,6 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* Prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..dd18fe10f7d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1008,7 +1008,6 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
* limits.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
/*
* If removing the directory fails, the worst thing that will happen is
@@ -1494,9 +1493,6 @@ ReplicationSlotReserveWal(void)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
/*
* If all required WAL is still there, great, otherwise retry. The
* slot should prevent further removal of WAL, unless there's a
@@ -2014,7 +2010,6 @@ restart:
if (invalidated)
{
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return invalidated;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed4e44..3300fb9b1c9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -583,7 +583,6 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
* advancing potentially done.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotRelease();
@@ -819,7 +818,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
ReplicationSlotMarkDirty();
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotSave();
#ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index d751d34295d..dff749f00a8 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2384,7 +2384,6 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (changed)
{
ReplicationSlotMarkDirty();
- ReplicationSlotsComputeRequiredLSN();
PhysicalWakeupLogicalWalSnd();
}
--
2.39.5 (Apple Git-154)
v6-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch
From 519fc0357ddc1ea0b12cd2df1deae840d56ea6ba Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 13:26:28 +0300
Subject: [PATCH v6 2/3] Add TAP tests to check replication slot advance during
the checkpoint
The new tests verify that logical and physical replication slots remain
valid after an immediate restart following checkpoint completion, when the
slot was advanced during the checkpoint.
This commit introduces two new injection points to make these tests possible:
* checkpoint-before-old-wal-removal - triggered in the checkpointer process
just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
point to the next WAL segment.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 17
---
src/backend/access/transam/xlog.c | 4 +
src/backend/replication/logical/logical.c | 18 +++
.../test_replslot_required_lsn/Makefile | 18 +++
.../test_replslot_required_lsn/meson.build | 16 ++
src/test/recovery/meson.build | 2 +
src/test/recovery/t/046_logical_slot.pl | 139 ++++++++++++++++++
src/test/recovery/t/047_physical_slot.pl | 133 +++++++++++++++++
7 files changed, 330 insertions(+)
create mode 100644 src/test/modules/test_replslot_required_lsn/Makefile
create mode 100644 src/test/modules/test_replslot_required_lsn/meson.build
create mode 100644 src/test/recovery/t/046_logical_slot.pl
create mode 100644 src/test/recovery/t/047_physical_slot.pl
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a0e589e9c4b..0a7f7a71d8b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7509,6 +7509,10 @@ CreateCheckPoint(int flags)
if (PriorRedoPtr != InvalidXLogRecPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
+#endif
+
/*
* Delete old log files, those no longer needed for last checkpoint to
* prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 6b3995133e2..081e6593722 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
#include "postgres.h"
#include "access/xact.h"
+#include "access/xlog_internal.h"
#include "access/xlogutils.h"
#include "fmgr.h"
#include "miscadmin.h"
@@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/memutils.h"
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool updated_xmin = false;
bool updated_restart = false;
+ XLogRecPtr restart_lsn pg_attribute_unused();
SpinLockAcquire(&MyReplicationSlot->mutex);
+ /* remember the old restart lsn */
+ restart_lsn = MyReplicationSlot->data.restart_lsn;
+
/*
* Prevent moving the confirmed_flush backwards, as this could lead to
* data duplication issues caused by replicating already replicated
@@ -1889,6 +1895,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
*/
if (updated_xmin || updated_restart)
{
+#ifdef USE_INJECTION_POINTS
+ XLogSegNo seg1,
+ seg2;
+
+ XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+ XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+ /* trigger injection point, but only if segment changes */
+ if (seg1 != seg2)
+ INJECTION_POINT("logical-replication-slot-advance-segment", NULL);
+#endif
+
ReplicationSlotMarkDirty();
ReplicationSlotSave();
elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
diff --git a/src/test/modules/test_replslot_required_lsn/Makefile b/src/test/modules/test_replslot_required_lsn/Makefile
new file mode 100644
index 00000000000..e5ff8af255b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/Makefile
@@ -0,0 +1,18 @@
+# src/test/modules/test_replslot_required_lsn/Makefile
+
+EXTRA_INSTALL=src/test/modules/injection_points \
+ contrib/test_decoding
+
+export enable_injection_points
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_replslot_required_lsn
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
new file mode 100644
index 00000000000..44d2546632b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -0,0 +1,16 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+tests += {
+ 'name': 'test_replslot_required_lsn',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
+ 'tests': [
+ 't/001_logical_slot.pl',
+ 't/002_physical_slot.pl'
+ ],
+ },
+}
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index cb983766c67..5ee41c3cd4d 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -54,6 +54,8 @@ tests += {
't/043_no_contrecord_switch.pl',
't/044_invalidate_inactive_slots.pl',
't/045_archive_restartpoint.pl',
+ 't/046_logical_slot.pl',
+ 't/047_physical_slot.pl'
],
},
}
diff --git a/src/test/recovery/t/046_logical_slot.pl b/src/test/recovery/t/046_logical_slot.pl
new file mode 100644
index 00000000000..b4265c4a6a5
--- /dev/null
+++ b/src/test/recovery/t/046_logical_slot.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create the two slots we'll need.
+$node->safe_psql('postgres',
+ q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')}
+);
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance both slots to the current position just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)}
+);
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Generate some transactions to get RUNNING_XACTS.
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(
+ qr/run_xacts/,
+ q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# Insert 1M rows to generate a large amount of WAL (multiple segments).
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 1M rows, to generate more WAL segments.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint\n');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q(select injection_points_attach('checkpoint-before-old-wal-removal','wait'))
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point\n');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# Try to advance the logical slot, but make it stop when it moves to the next
+# WAL segment (this has to happen in the background, too).
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(
+ q{select injection_points_attach('logical-replication-slot-advance-segment','wait');}
+);
+$logical->query_until(
+ qr/get_changes/,
+ q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# Wait until the slot's restart_lsn points to the next WAL segment.
+note('waiting for injection_point\n');
+$node->wait_for_event('client backend',
+ 'logical-replication-slot-advance-segment');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN, and then unblock the checkpoint, which
+# removes the WAL still needed by the logical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# Abruptly stop the server (1 second should be enough for the checkpoint
+# to finish; waiting for it explicitly would be better).
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+ $node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);}
+ );
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
diff --git a/src/test/recovery/t/047_physical_slot.pl b/src/test/recovery/t/047_physical_slot.pl
new file mode 100644
index 00000000000..454e56b9bd2
--- /dev/null
+++ b/src/test/recovery/t/047_physical_slot.pl
@@ -0,0 +1,133 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'replica'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create a physical replication slot.
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Insert 100k rows to generate some WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+);
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 1M rows to generate more WAL (multiple segments).
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')}
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN and then unblock the checkpoint, which
+# removes the WAL still needed by the physical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# Abruptly stop the server (1 second should be enough for the checkpoint
+# to finish; waiting for it explicitly would be better).
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting.
+my $restart_lsn = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get the WAL segment name for the slot's restart_lsn.
+my $restart_lsn_segment = $node->safe_psql('postgres',
+ "SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists.
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok( -f "$datadir/pg_wal/$restart_lsn_segment",
+ "WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists"
+);
+
+done_testing();
--
2.39.5 (Apple Git-154)
v6-0001-Keep-WAL-segments-by-the-flushed-value-of-the-slo.patch
From cea89c4c09e3f558094beefd2a614f8bc1b30fe1 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 16:26:27 +0300
Subject: [PATCH v6 1/3] Keep WAL segments by the flushed value of the slot's
restart LSN
The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart. The issue occurs when
a slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.
The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when calculating
the oldest LSN for WAL segments removal at the end of checkpoint. This idea
was proposed by Tomas Vondra in the discussion.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 13
---
src/backend/access/transam/xlog.c | 55 +++++++++++++++++++----
src/backend/replication/logical/logical.c | 10 ++++-
src/backend/replication/walsender.c | 4 ++
3 files changed, 60 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1914859b2ee..a0e589e9c4b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -677,7 +677,8 @@ static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
XLogRecPtr pagePtr,
TimeLineID newTLI);
static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
+static void KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinLSN,
+ XLogSegNo *logSegNo);
static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
@@ -7087,6 +7088,7 @@ CreateCheckPoint(int flags)
VirtualTransactionId *vxids;
int nvxids;
int oldXLogAllowed = 0;
+ XLogRecPtr slotsMinReqLSN;
/*
* An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7315,6 +7317,15 @@ CreateCheckPoint(int flags)
*/
END_CRIT_SECTION();
+ /*
+ * Get the current minimum LSN to be used later in the WAL segment
+ * cleanup. We may clean up only WAL segments, which are not needed
+ * according to synchronized LSNs of replication slots. The slot's LSN
+ * might be advanced concurrently, so we call this before
+ * CheckPointReplicationSlots() synchronizes replication slots.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
/*
* In some cases there are groups of actions that must all occur on one
* side or the other of a checkpoint record. Before flushing the
@@ -7503,17 +7514,25 @@ CreateCheckPoint(int flags)
* prevent the disk holding the xlog from growing full.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ /*
+ * Recalculate the current minimum LSN to be used in the WAL segment
+ * cleanup. Then, we must synchronize the replication slots again in
+ * order to make this LSN safe to use.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(shutdown);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
@@ -7788,6 +7807,7 @@ CreateRestartPoint(int flags)
XLogRecPtr endptr;
XLogSegNo _logSegNo;
TimestampTz xtime;
+ XLogRecPtr slotsMinReqLSN;
/* Concurrent checkpoint/restartpoint cannot happen */
Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7870,6 +7890,15 @@ CreateRestartPoint(int flags)
MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
+ /*
+ * Get the current minimum LSN to be used later in the WAL segment
+ * cleanup. We may clean up only WAL segments, which are not needed
+ * according to synchronized LSNs of replication slots. The slot's LSN
+ * might be advanced concurrently, so we call this before
+ * CheckPointReplicationSlots() synchronizes replication slots.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
if (log_checkpoints)
LogCheckpointStart(flags, true);
@@ -7958,17 +7987,25 @@ CreateRestartPoint(int flags)
receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
replayPtr = GetXLogReplayRecPtr(&replayTLI);
endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ /*
+ * Recalculate the current minimum LSN to be used in the WAL segment
+ * cleanup. Then, we must synchronize the replication slots again in
+ * order to make this LSN safe to use.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
@@ -8063,6 +8100,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */
XLogSegNo oldestSlotSeg; /* oldest segid kept by slot */
uint64 keepSegs;
+ XLogRecPtr slotsMinReqLSN;
/*
* slot does not reserve WAL. Either deactivated, or has never been active
@@ -8076,8 +8114,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
* oldestSlotSeg to the current segment.
*/
currpos = GetXLogWriteRecPtr();
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
- KeepLogSeg(currpos, &oldestSlotSeg);
+ KeepLogSeg(currpos, slotsMinReqLSN, &oldestSlotSeg);
/*
* Find the oldest extant segment file. We get 1 until checkpoint removes
@@ -8138,7 +8177,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
* invalidation is optionally done here, instead.
*/
static void
-KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
+KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinReqLSN, XLogSegNo *logSegNo)
{
XLogSegNo currSegNo;
XLogSegNo segno;
@@ -8151,7 +8190,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
* Calculate how many segments are kept by slots first, adjusting for
* max_slot_wal_keep_size.
*/
- keep = XLogGetReplicationSlotMinimumLSN();
+ keep = slotsMinReqLSN;
if (keep != InvalidXLogRecPtr && keep < recptr)
{
XLByteToSeg(keep, segno, wal_segment_size);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1d56d0c4ef3..6b3995133e2 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1878,7 +1878,15 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
SpinLockRelease(&MyReplicationSlot->mutex);
- /* first write new xmin to disk, so we know what's up after a crash */
+ /*
+ * First, write new xmin and restart_lsn to disk so we know what's up
+ * after a crash. Even when we do this, the checkpointer can see the
+ * updated restart_lsn value in the shared memory; then, a crash can
+ * happen before we manage to write that value to the disk. Thus,
+ * checkpointer still needs to make special efforts to keep WAL
+ * segments required by the restart_lsn written to the disk. See
+ * CreateCheckPoint() and CreateRestartPoint() for details.
+ */
if (updated_xmin || updated_restart)
{
ReplicationSlotMarkDirty();
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..d751d34295d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2393,6 +2393,10 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
* be energy wasted - the worst thing lost information could cause here is
* to give wrong information in a statistics view - we'll just potentially
* be more conservative in removing files.
+ *
+ * Checkpointer makes special efforts to keep the WAL segments required by
+ * the restart_lsn written to the disk. See CreateCheckPoint() and
+ * CreateRestartPoint() for details.
*/
}
--
2.39.5 (Apple Git-154)
On Sat, May 24, 2025 at 6:59 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Amit!
Thank you for your attention to this patchset!
On Sat, May 24, 2025 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I spent more time on this. The next revision is attached. It contains revised comments and other cosmetic changes. I'm going to backpatch 0001 to all supported branches, and 0002 to 17 where injection points were introduced.
Is my understanding correct that we need 0001 because PhysicalConfirmReceivedLocation() doesn't save the slot to disk after changing the slot's restart_lsn?
Yes. Also, even if it did save the slot to disk, there would still be a race condition: a concurrent checkpoint could use the updated value from shared memory to clean old WAL segments, and then a crash could happen before we manage to write the slot to disk.
How can that happen, if we first write the updated value to disk and
then update the shared memory as we do in
LogicalConfirmReceivedLocation?
BTW, won't there be a similar problem with physical slot's xmin
computation as well? In PhysicalReplicationSlotNewXmin(), after
updating the slot's xmin computation, we mark it as dirty and update
shared memory values. Now, say after checkpointer writes these xmin
values to disk, walsender receives another feedback message and
updates the slot's xmin values. Now using these updated shared memory
values, vacuum removes the rows, however, a restart will show the
older xmin values in the slot, which means vacuum would have removed
the required rows before restart.
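For reference, the slot-side values in this scenario are all visible from SQL; one illustrative way to observe whether the in-memory horizons survive an immediate restart is to compare the output of a query like the following (a sketch only, not part of the patches) before and after the restart:
-- xmin is the data-removal horizon driven by standby feedback for physical
-- slots; catalog_xmin and restart_lsn are the horizons relevant to logical
-- decoding.
SELECT slot_name, slot_type, xmin, catalog_xmin, restart_lsn
  FROM pg_replication_slots;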
As per my understanding, neither the xmin nor the LSN problem exists
for logical slots. I am pointing this out to indicate we may need to
think of a different solution for physical slots, if these are
problems only for physical slots.
--
With Regards,
Amit Kapila.
On Mon, May 26, 2025 at 9:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, May 24, 2025 at 6:59 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Amit!
Thank you for your attention to this patchset!
On Sat, May 24, 2025 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I spent more time on this. The next revision is attached. It contains revised comments and other cosmetic changes. I'm going to backpatch 0001 to all supported branches, and 0002 to 17 where injection points were introduced.
Is my understanding correct that we need 0001 because PhysicalConfirmReceivedLocation() doesn't save the slot to disk after changing the slot's restart_lsn?
Yes. Also, even if it did save the slot to disk, there would still be a race condition: a concurrent checkpoint could use the updated value from shared memory to clean old WAL segments, and then a crash could happen before we manage to write the slot to disk.
How can that happen, if we first write the updated value to disk and then update the shared memory as we do in LogicalConfirmReceivedLocation?
I don't think this is true for LogicalConfirmReceivedLocation() and restart_lsn. We clearly don't update restart_lsn in shared memory only after the flush. It was previously proposed to resolve the restart_lsn problem that way, but that approach breaks ABI compatibility and couldn't be backpatched.
BTW, won't there be a similar problem with physical slot's xmin
computation as well? In PhysicalReplicationSlotNewXmin(), after
updating the slot's xmin computation, we mark it as dirty and update
shared memory values. Now, say after checkpointer writes these xmin
values to disk, walsender receives another feedback message and
updates the slot's xmin values. Now using these updated shared memory
values, vacuum removes the rows, however, a restart will show the
older xmin values in the slot, which means vacuum would have removed
the required rows before restart.
I don't yet see why this should be a problem. feedbackXmin provides a barrier for vacuum, but unlike restart_lsn it doesn't refer to removed tuples as a resource. If vacuum removes some tuples based on the last feedback message, those tuples are not needed by the replica. If, after a restart, we end up with an outdated feedbackXmin, that would just make vacuum more conservative for a while, and I don't see how it would lead to a material problem. In contrast, you can see in 046_logical_slot.pl how the lack of restart_lsn synchronization leads to an error while attempting to decode changes, because the current code expects the WAL at restart_lsn to exist.
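Concretely, the failure mode exercised by 046_logical_slot.pl boils down to the following call issued after the immediate restart (slot name taken from the test); if the segment at the slot's restart_lsn has already been removed, the call errors out instead of returning the remaining changes:
-- Decoding has to start reading WAL at the slot's restart_lsn; with that
-- segment gone, this fails rather than returning changes.
SELECT count(*)
  FROM pg_logical_slot_get_changes('slot_logical', NULL, NULL);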
As per my understanding, neither the xmin nor the LSN problem exists
for logical slots. I am pointing this out to indicate we may need to
think of a different solution for physical slots, if these are
problems only for physical slots.
There is indeed a problem for logical slots: apply the 0002 patch alone and the test for logical slots will fail.
------
Regards,
Alexander Korotkov
Supabase
Dear Alexander, Amit, All
Amit wrote:
Is my understanding correct that we need 0001 because
PhysicalConfirmReceivedLocation() doesn't save the slot to disk after
changing the slot's restart_lsn?
Yes. Also, even if it did save the slot to disk, there would still be a race condition: a concurrent checkpoint could use the updated value from shared memory to clean old WAL segments, and then a crash could happen before we manage to write the slot to disk.
How can that happen, if we first write the updated value to disk and then update the shared memory as we do in LogicalConfirmReceivedLocation?
I guess the problem with logical slots still exists. Please see the TAP test src/test/recovery/t/046_logical_slot.pl from the v6 version of the patch. A race condition may happen when a logical slot's restart_lsn has been changed but not yet written to disk. Imagine another physical slot being advanced at this moment: it recomputes the oldest minimum LSN and takes into account the changed, but not yet saved, restart_lsn of the logical slot. We end up in a situation where the WAL segment for the logical slot's restart_lsn may be removed after an immediate restart.
I'm not sure what may happen with two checkpoints executing in parallel, but I would say that patch 0001 guarantees that every checkpoint run trims WAL segments based only on the slots' restart LSNs that are already saved on disk. The rule of trimming WAL by a slot's saved restart_lsn will not be violated.
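To double-check that rule on a running server after the restart, a query along the following lines can be used (a minimal sketch, assuming a role allowed to call pg_ls_waldir(), e.g. a superuser or a member of pg_monitor); it reports, for each slot, whether the WAL segment that its restart_lsn points to is still present in pg_wal, which is essentially what 047_physical_slot.pl verifies through Perl:
-- Map each slot's restart_lsn to its WAL segment file name and check
-- whether that file still exists in pg_wal.
SELECT slot_name,
       restart_lsn,
       pg_walfile_name(restart_lsn) AS restart_segment,
       EXISTS (SELECT 1
                 FROM pg_ls_waldir() AS w
                WHERE w.name = pg_walfile_name(restart_lsn)) AS segment_present
  FROM pg_replication_slots
 WHERE restart_lsn IS NOT NULL;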
Amit wrote:
As per my understanding, neither the xmin nor the LSN problem exists
for logical slots. I am pointing this out to indicate we may need to
think of a different solution for physical slots, if these are
problems only for physical slots.
As I've already mentioned, it indirectly affects logical slots as well.
Alexander wrote:
I spent more time on this. The next revision is attached. It
contains revised comments and other cosmetic changes. I'm going to
backpatch 0001 to all supported branches, and 0002 to 17 where
injection points were introduced.
Alexander, thank you for polishing the patch. In my opinion, I would prefer to put the tests before the fix, because then you can reproduce the problem simply by checking out the commit with the tests. Once the tests come after the fix, that is no longer possible. Anyway, I'm OK with your changes. Thank you!
I made some changes in the patch (v7 is attached):
* Removed the modules/test_replslot_required_lsn directory. It is not needed anymore, once the test files have been moved to another directory.
* Renamed the tests to 046_checkpoint_logical_slot.pl and 047_checkpoint_physical_slot.pl. I believe such names are more descriptive.
Please consider these changes.
With best regards,
Vitaly
Attachments:
v7-0003-Remove-redundant-ReplicationSlotsComputeRequiredLSN-.patch
From b85ffd56eb1e28d5e61e6221ba97e7e3bea7a982 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 13:34:36 +0300
Subject: [PATCH 3/3] Remove redundant ReplicationSlotsComputeRequiredLSN calls
The function ReplicationSlotsComputeRequiredLSN is used to calculate the
oldest required LSN across slots. It is called every time the restart_lsn
value of any slot is changed (for example, when a slot is advanced).
The slots' oldest required LSN is used to remove old WAL segments in two
places - when a checkpoint or restart point is created (the CreateCheckPoint
and CreateRestartPoint functions). Old WAL segments seem to be removed in
these two functions only.
The idea of the patch is to call ReplicationSlotsComputeRequiredLSN only in
the CreateCheckPoint and CreateRestartPoint functions, before the call to
the RemoveOldXlogFiles function where old WAL segments are removed. There
is no obvious need to recalculate the oldest required LSN every time a
slot's restart_lsn is changed.
The value of the oldest required LSN can affect slot invalidation.
The function InvalidateObsoleteReplicationSlots with a non-zero second
parameter (oldestSegno) is called only in the CreateCheckPoint and
CreateRestartPoint functions, where slot invalidation occurs with the
reason RS_INVAL_WAL_REMOVED. Since we update the oldest slots' required
LSN at the beginning of these functions, the proposed patch should not
break the behaviour of the slot invalidation function in this case.
---
src/backend/access/transam/xlog.c | 4 ++++
src/backend/replication/logical/logical.c | 1 -
src/backend/replication/logical/slotsync.c | 4 ----
src/backend/replication/slot.c | 5 -----
src/backend/replication/slotfuncs.c | 2 --
src/backend/replication/walsender.c | 1 -
6 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0a7f7a71d8b..bdfd0a59ab7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7324,6 +7324,7 @@ CreateCheckPoint(int flags)
* might be advanced concurrently, so we call this before
* CheckPointReplicationSlots() synchronizes replication slots.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
/*
@@ -7528,6 +7529,7 @@ CreateCheckPoint(int flags)
* cleanup. Then, we must synchronize the replication slots again in
* order to make this LSN safe to use.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(shutdown);
@@ -7901,6 +7903,7 @@ CreateRestartPoint(int flags)
* might be advanced concurrently, so we call this before
* CheckPointReplicationSlots() synchronizes replication slots.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
if (log_checkpoints)
@@ -8001,6 +8004,7 @@ CreateRestartPoint(int flags)
* cleanup. Then, we must synchronize the replication slots again in
* order to make this LSN safe to use.
*/
+ ReplicationSlotsComputeRequiredLSN();
slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 081e6593722..34e973393c2 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1925,7 +1925,6 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
SpinLockRelease(&MyReplicationSlot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
}
else
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e0ae0..30662c09275 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -335,7 +335,6 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
SpinLockRelease(&slot->mutex);
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return updated_config || updated_xmin_or_lsn;
@@ -502,9 +501,6 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* Prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..dd18fe10f7d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1008,7 +1008,6 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
* limits.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
/*
* If removing the directory fails, the worst thing that will happen is
@@ -1494,9 +1493,6 @@ ReplicationSlotReserveWal(void)
slot->data.restart_lsn = restart_lsn;
SpinLockRelease(&slot->mutex);
- /* prevent WAL removal as fast as possible */
- ReplicationSlotsComputeRequiredLSN();
-
/*
* If all required WAL is still there, great, otherwise retry. The
* slot should prevent further removal of WAL, unless there's a
@@ -2014,7 +2010,6 @@ restart:
if (invalidated)
{
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
}
return invalidated;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed4e44..3300fb9b1c9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -583,7 +583,6 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
* advancing potentially done.
*/
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotRelease();
@@ -819,7 +818,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
ReplicationSlotMarkDirty();
ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
ReplicationSlotSave();
#ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index d751d34295d..dff749f00a8 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2384,7 +2384,6 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (changed)
{
ReplicationSlotMarkDirty();
- ReplicationSlotsComputeRequiredLSN();
PhysicalWakeupLogicalWalSnd();
}
--
2.34.1
v7-0002-Add-TAP-tests-to-check-replication-slot-advance-duri.patch
From c978cc88848615670fce667c83cda3fe874d80c0 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 13:26:28 +0300
Subject: [PATCH 2/3] Add TAP tests to check replication slot advance during
the checkpoint
The new tests verify that logical and physical replication slots are still
valid after an immediate restart on checkpoint completion when the slot was
advanced during the checkpoint.
This commit introduces two new injection points to make these tests possible:
* checkpoint-before-old-wal-removal - triggered in the checkpointer process
just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
point to the next WAL segment.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 17
---
src/backend/access/transam/xlog.c | 4 +
src/backend/replication/logical/logical.c | 18 +++
src/test/recovery/meson.build | 2 +
.../recovery/t/046_checkpoint_logical_slot.pl | 139 ++++++++++++++++++
.../t/047_checkpoint_physical_slot.pl | 133 +++++++++++++++++
5 files changed, 296 insertions(+)
create mode 100644 src/test/recovery/t/046_checkpoint_logical_slot.pl
create mode 100644 src/test/recovery/t/047_checkpoint_physical_slot.pl
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a0e589e9c4b..0a7f7a71d8b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7509,6 +7509,10 @@ CreateCheckPoint(int flags)
if (PriorRedoPtr != InvalidXLogRecPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
+#endif
+
/*
* Delete old log files, those no longer needed for last checkpoint to
* prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 6b3995133e2..081e6593722 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
#include "postgres.h"
#include "access/xact.h"
+#include "access/xlog_internal.h"
#include "access/xlogutils.h"
#include "fmgr.h"
#include "miscadmin.h"
@@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/memutils.h"
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool updated_xmin = false;
bool updated_restart = false;
+ XLogRecPtr restart_lsn pg_attribute_unused();
SpinLockAcquire(&MyReplicationSlot->mutex);
+ /* remember the old restart lsn */
+ restart_lsn = MyReplicationSlot->data.restart_lsn;
+
/*
* Prevent moving the confirmed_flush backwards, as this could lead to
* data duplication issues caused by replicating already replicated
@@ -1889,6 +1895,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
*/
if (updated_xmin || updated_restart)
{
+#ifdef USE_INJECTION_POINTS
+ XLogSegNo seg1,
+ seg2;
+
+ XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+ XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+ /* trigger injection point, but only if segment changes */
+ if (seg1 != seg2)
+ INJECTION_POINT("logical-replication-slot-advance-segment", NULL);
+#endif
+
ReplicationSlotMarkDirty();
ReplicationSlotSave();
elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index cb983766c67..92429d28402 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -54,6 +54,8 @@ tests += {
't/043_no_contrecord_switch.pl',
't/044_invalidate_inactive_slots.pl',
't/045_archive_restartpoint.pl',
+ 't/046_checkpoint_logical_slot.pl',
+ 't/047_checkpoint_physical_slot.pl'
],
},
}
diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
new file mode 100644
index 00000000000..b4265c4a6a5
--- /dev/null
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create the two slots we'll need.
+$node->safe_psql('postgres',
+ q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')}
+);
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance both slots to the current position just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)}
+);
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Generate some transactions to get RUNNING_XACTS.
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(
+ qr/run_xacts/,
+ q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# Insert 1M rows to generate a significant amount of WAL (multiple segments).
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 1M rows to generate more WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint\n');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q(select injection_points_attach('checkpoint-before-old-wal-removal','wait'))
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point\n');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# Try to advance the logical slot, but make it stop when it moves to the next
+# WAL segment (this has to happen in the background, too).
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(
+ q{select injection_points_attach('logical-replication-slot-advance-segment','wait');}
+);
+$logical->query_until(
+ qr/get_changes/,
+ q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# Wait until the slot's restart_lsn points to the next WAL segment.
+note('waiting for injection_point\n');
+$node->wait_for_event('client backend',
+ 'logical-replication-slot-advance-segment');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN, and then unblock the checkpoint, which
+# removes the WAL still needed by the logical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# Abruptly stop the server (1 second should be enough for the checkpoint
+# to finish; waiting for it explicitly would be better).
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+ $node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);}
+ );
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
diff --git a/src/test/recovery/t/047_checkpoint_physical_slot.pl b/src/test/recovery/t/047_checkpoint_physical_slot.pl
new file mode 100644
index 00000000000..454e56b9bd2
--- /dev/null
+++ b/src/test/recovery/t/047_checkpoint_physical_slot.pl
@@ -0,0 +1,133 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'replica'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create a physical replication slot.
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Insert 100k rows to generate some WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+);
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 1M rows to generate more WAL (multiple segments).
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')}
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN and then unblock the checkpoint, which
+# removes the WAL still needed by the physical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# Abruptly stop the server (1 second should be enough for the checkpoint
+# to finish; waiting for it explicitly would be better).
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting.
+my $restart_lsn = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get the WAL segment name for the slot's restart_lsn.
+my $restart_lsn_segment = $node->safe_psql('postgres',
+ "SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists.
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok( -f "$datadir/pg_wal/$restart_lsn_segment",
+ "WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists"
+);
+
+done_testing();
--
2.34.1
v7-0001-Keep-WAL-segments-by-the-flushed-value-of-the-slot-s.patch
From 1e0629efc65f190a58ec729db6f3ada4f8b83897 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 16:26:27 +0300
Subject: [PATCH 1/3] Keep WAL segments by the flushed value of the slot's
restart LSN
The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart. The issue occurs when
a slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.
The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when calculating
the oldest LSN for WAL segments removal at the end of checkpoint. This idea
was proposed by Tomas Vondra in the discussion.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 13
---
src/backend/access/transam/xlog.c | 55 +++++++++++++++++++----
src/backend/replication/logical/logical.c | 10 ++++-
src/backend/replication/walsender.c | 4 ++
3 files changed, 60 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1914859b2ee..a0e589e9c4b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -677,7 +677,8 @@ static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
XLogRecPtr pagePtr,
TimeLineID newTLI);
static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
+static void KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinLSN,
+ XLogSegNo *logSegNo);
static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
@@ -7087,6 +7088,7 @@ CreateCheckPoint(int flags)
VirtualTransactionId *vxids;
int nvxids;
int oldXLogAllowed = 0;
+ XLogRecPtr slotsMinReqLSN;
/*
* An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7315,6 +7317,15 @@ CreateCheckPoint(int flags)
*/
END_CRIT_SECTION();
+ /*
+ * Get the current minimum LSN to be used later in the WAL segment
+ * cleanup. We may clean up only WAL segments, which are not needed
+ * according to synchronized LSNs of replication slots. The slot's LSN
+ * might be advanced concurrently, so we call this before
+ * CheckPointReplicationSlots() synchronizes replication slots.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
/*
* In some cases there are groups of actions that must all occur on one
* side or the other of a checkpoint record. Before flushing the
@@ -7503,17 +7514,25 @@ CreateCheckPoint(int flags)
* prevent the disk holding the xlog from growing full.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ /*
+ * Recalculate the current minimum LSN to be used in the WAL segment
+ * cleanup. Then, we must synchronize the replication slots again in
+ * order to make this LSN safe to use.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(shutdown);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
+ KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
@@ -7788,6 +7807,7 @@ CreateRestartPoint(int flags)
XLogRecPtr endptr;
XLogSegNo _logSegNo;
TimestampTz xtime;
+ XLogRecPtr slotsMinReqLSN;
/* Concurrent checkpoint/restartpoint cannot happen */
Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7870,6 +7890,15 @@ CreateRestartPoint(int flags)
MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
+ /*
+ * Get the current minimum LSN to be used later in the WAL segment
+ * cleanup. We may clean up only WAL segments, which are not needed
+ * according to synchronized LSNs of replication slots. The slot's LSN
+ * might be advanced concurrently, so we call this before
+ * CheckPointReplicationSlots() synchronizes replication slots.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
if (log_checkpoints)
LogCheckpointStart(flags, true);
@@ -7958,17 +7987,25 @@ CreateRestartPoint(int flags)
receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
replayPtr = GetXLogReplayRecPtr(&replayTLI);
endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
_logSegNo, InvalidOid,
InvalidTransactionId))
{
+ /*
+ * Recalculate the current minimum LSN to be used in the WAL segment
+ * cleanup. Then, we must synchronize the replication slots again in
+ * order to make this LSN safe to use.
+ */
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+ CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
+
/*
* Some slots have been invalidated; recalculate the old-segment
* horizon, starting again from RedoRecPtr.
*/
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(endptr, &_logSegNo);
+ KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
}
_logSegNo--;
@@ -8063,6 +8100,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */
XLogSegNo oldestSlotSeg; /* oldest segid kept by slot */
uint64 keepSegs;
+ XLogRecPtr slotsMinReqLSN;
/*
* slot does not reserve WAL. Either deactivated, or has never been active
@@ -8076,8 +8114,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
* oldestSlotSeg to the current segment.
*/
currpos = GetXLogWriteRecPtr();
+ slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
- KeepLogSeg(currpos, &oldestSlotSeg);
+ KeepLogSeg(currpos, slotsMinReqLSN, &oldestSlotSeg);
/*
* Find the oldest extant segment file. We get 1 until checkpoint removes
@@ -8138,7 +8177,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
* invalidation is optionally done here, instead.
*/
static void
-KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
+KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinReqLSN, XLogSegNo *logSegNo)
{
XLogSegNo currSegNo;
XLogSegNo segno;
@@ -8151,7 +8190,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
* Calculate how many segments are kept by slots first, adjusting for
* max_slot_wal_keep_size.
*/
- keep = XLogGetReplicationSlotMinimumLSN();
+ keep = slotsMinReqLSN;
if (keep != InvalidXLogRecPtr && keep < recptr)
{
XLByteToSeg(keep, segno, wal_segment_size);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1d56d0c4ef3..6b3995133e2 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1878,7 +1878,15 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
SpinLockRelease(&MyReplicationSlot->mutex);
- /* first write new xmin to disk, so we know what's up after a crash */
+ /*
+ * First, write new xmin and restart_lsn to disk so we know what's up
+ * after a crash. Even when we do this, the checkpointer can see the
+ * updated restart_lsn value in the shared memory; then, a crash can
+ * happen before we manage to write that value to the disk. Thus,
+ * checkpointer still needs to make special efforts to keep WAL
+ * segments required by the restart_lsn written to the disk. See
+ * CreateCheckPoint() and CreateRestartPoint() for details.
+ */
if (updated_xmin || updated_restart)
{
ReplicationSlotMarkDirty();
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..d751d34295d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2393,6 +2393,10 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
* be energy wasted - the worst thing lost information could cause here is
* to give wrong information in a statistics view - we'll just potentially
* be more conservative in removing files.
+ *
+ * Checkpointer makes special efforts to keep the WAL segments required by
+ * the restart_lsn written to the disk. See CreateCheckPoint() and
+ * CreateRestartPoint() for details.
*/
}
--
2.34.1
On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
Dear Alexander, Amit, All
Amit wrote:
Is my understanding correct that we need 0001 because
PhysicalConfirmReceivedLocation() doesn't save the slot to disk after
changing the slot's restart_lsn?
Yes. Also, even if it did save the slot to disk, there would still be a race condition: a concurrent checkpoint could use the updated value from shared memory to clean old WAL segments, and then a crash could happen before we manage to write the slot to disk.
How can that happen, if we first write the updated value to disk and then update the shared memory as we do in LogicalConfirmReceivedLocation?
I guess the problem with logical slots still exists. Please see the TAP test src/test/recovery/t/046_logical_slot.pl from the v6 version of the patch. A race condition may happen when a logical slot's restart_lsn has been changed but not yet written to disk. Imagine another physical slot being advanced at this moment: it recomputes the oldest minimum LSN and takes into account the changed, but not yet saved, restart_lsn of the logical slot. We end up in a situation where the WAL segment for the logical slot's restart_lsn may be removed after an immediate restart.
Okay, so I was missing the point that the physical slots can consider
the updated value of the logical slot's restart_lsn. The point I was
advocating for logical slots sanctity was when no physical slots are
involved. When updating replicationSlotMinLSN value in shared memory,
the logical slot machinery took care that the value we use should be
flushed to disk. One can argue that we should improve physical slots
machinery so that it also takes care to write the slot to disk before
updating the replicationSlotMinLSN, which is used to remove WAL. I
understand that the downside is physical slots will be written to disk
with a greater frequency, which will not be good from the performance
point of view, but can we think of doing it for the period when a
checkpoint is in progress? OTOH, if we don't want to adjust physical
slot machinery, it seems saving the logical slots to disk immediately
when its restart_lsn is updated is a waste of effort after your patch,
no? If so, why are we okay with that?
I understand that your proposed patch fixes the reported problem but I
am slightly afraid that the proposed solution is not a good idea w.r.t
logical slots so I am trying to see if there are any other alternative
ideas to fix this issue.
--
With Regards,
Amit Kapila.
Dear Amit,
OTOH, if we don't want to adjust physical
slot machinery, it seems saving the logical slots to disk immediately
when its restart_lsn is updated is a waste of effort after your patch,
no? If so, why are we okay with that?
I agree that saving logical slots on advance is a possible waste of effort, but I don't understand the original ideas behind it. I haven't touched it, in order to keep the patch minimal and avoid breaking existing functionality.
We trim WAL only in the checkpoint (or restart point) operation. The slots' restart_lsn is what keeps WAL from being truncated. I believe we need to compute the slots' oldest LSN, as the minimum of the restart_lsn values, only when executing a checkpoint (or restart point). I guess this doesn't depend on the slot's type (logical or physical). Patch 0003 fixes exactly that.
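For illustration only: the value that ReplicationSlotsComputeRequiredLSN maintains corresponds to the minimum restart_lsn across all slots, which at the SQL level is roughly what the following computes (a sketch; min() over pg_lsn is available in recent PostgreSQL releases):
-- The WAL-retention horizon contributed by slots is just the minimum of
-- their restart_lsn values; the checkpoint only needs this aggregate at
-- checkpoint/restart point time.
SELECT min(restart_lsn) AS oldest_required_lsn
  FROM pg_replication_slots
 WHERE restart_lsn IS NOT NULL;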
I haven't deeply investigated the slots' xmin values yet, but I guess xmin is a different story than restart_lsn. It is used to prevent tuple removal by vacuum and is updated in a different way. I can't say whether LogicalConfirmReceivedLocation is the right place to update the on-disk xmin values. I would propose to update these values in SaveSlotToPath under some lock, to avoid concurrent reads of unsaved values, or to do it in a checkpoint, as for restart_lsn. We may investigate and improve this in another patch.
With best regards,
Vitaly
On Mon, May 26, 2025 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
Dear Alexander, Amit, All
Amit wrote:
Is my understanding correct that we need 0001 because
PhysicalConfirmReceivedLocation() doesn't save the slot to disk after
changing the slot's restart_lsn?
Yes. Also, even if it did save the slot to disk, there is still a
race condition: a concurrent checkpoint could use the updated value from
shared memory to clean old WAL segments, and then a crash happens
before we manage to write the slot to disk.
How can that happen, if we first write the updated value to disk and
then update the shared memory as we do in
LogicalConfirmReceivedLocation?
I guess the problem with logical slots still exists. Please see the TAP
test: src/test/recovery/t/046_logical_slot.pl from the v6 version of the patch.
A race condition may happen when the logical slot's restart_lsn was changed but not
yet written to disk. Imagine there is another physical slot which is
advanced at this moment. It recomputes the oldest required LSN and takes into account
the changed but not yet saved to disk restart_lsn of the logical slot. We come to a
situation where the WAL segment for the logical slot's restart_lsn may be
truncated after an immediate restart.
Okay, so I was missing the point that the physical slots can consider
the updated value of the logical slot's restart_lsn. The point I was
advocating for logical slots sanctity was when no physical slots are
involved. When updating replicationSlotMinLSN value in shared memory,
the logical slot machinery took care that the value we use should be
flushed to disk. One can argue that we should improve physical slots
machinery so that it also takes care to write the slot to disk before
updating the replicationSlotMinLSN, which is used to remove WAL. I
understand that the downside is physical slots will be written to disk
with a greater frequency, which will not be good from the performance
point of view, but can we think of doing it for the period when a
checkpoint is in progress?
That could cause replication slowdown while checkpointing is
in progress. This is certainly better than slowing down the
replication permanently, but still doesn't look good.
OTOH, if we don't want to adjust physical
slot machinery, it seems saving the logical slots to disk immediately
when its restart_lsn is updated is a waste of effort after your patch,
no? If so, why are we okay with that?
I don't think so. I think the reason why logical slots are synced to
disk immediately after update is that logical changes are not
idempotent (you can't safely apply the same change twice) unlike
physical block-level changes. This is why logical slots need to be
synced to prevent double replication of the same changes, which could
lead, for example, to double insertion.
I understand that your proposed patch fixes the reported problem but I
am slightly afraid that the proposed solution is not a good idea w.r.t
logical slots so I am trying to see if there are any other alternative
ideas to fix this issue.
I don't understand the exact concerns about this fix. For sure, we can
try to implement a fix by hacking LogicalConfirmReceivedLocation() and
PhysicalConfirmReceivedLocation(). But that would be way more
cumbersome, especially if we have to keep ABI compatibility. Also, it
doesn't seem to me that either LogicalConfirmReceivedLocation() or
PhysicalConfirmReceivedLocation() currently tries to address this issue:
LogicalConfirmReceivedLocation() implements immediate sync for
different reasons.
------
Regards,
Alexander Korotkov
Supabase
On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Mon, May 26, 2025 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
OTOH, if we don't want to adjust physical
slot machinery, it seems saving the logical slots to disk immediately
when its restart_lsn is updated is a waste of effort after your patch,
no? If so, why are we okay with that?
I don't think so. I think the reason why logical slots are synced to
disk immediately after update is that logical changes are not
idempotent (you can't safely apply the same change twice) unlike
physical block-level changes. This is why logical slots need to be
synced to prevent double replication of same changes, which could
lead, for example, to double insertion.
Hmm, if this has to be true, then even in the else branch of
LogicalConfirmReceivedLocation [1], we should have saved the slot.
AFAIU, whether the logical changes are sent to the client is decided
based on two things: (a) the replication origins, which tracks
replication progress and are maintained by clients (which for built-in
replication are subscriber nodes), see [2]; and (b) confirmed_flush
LSN maintained in the slot by the server. Now, for each ack by the
client after applying/processing changes, we update the
confirmed_flush LSN of the slot but don't immediately flush it. This
shouldn't let us send the changes again because even if the system
crashes and restarts, the client will send the server the location to
start sending the changes from based on its origin tracking. There is
more to it, like there are cases when confirm_flush LSN in the slot
could be ahead the origin's LSN, and we handle all such cases, but I
don't think those are directly related here, so I am skipping those
details for now.
Note that LogicalConfirmReceivedLocation won't save the slot to disk
if it updates only confirmed_flush LSN, which is used to decide
whether to send the changes.
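To illustrate the flow described above, a simplified sketch of my understanding (illustration only, not actual code or an exhaustive description of all cases):
/*
 * 1. The client applies changes up to LSN X, records X in its
 *    replication origin, and acks X to the server.
 * 2. The server sets the slot's confirmed_flush to X in shared memory
 *    but does not flush the slot to disk.
 * 3. The server crashes and restarts; the slot comes back with an
 *    older confirmed_flush value read from disk.
 * 4. The client reconnects and asks to start streaming from X, taken
 *    from its own origin tracking, so already-applied changes below X
 *    are not sent again even though the on-disk confirmed_flush was
 *    stale.
 */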
LogicalConfirmReceivedLocation() implements immediate sync for
different reasons.
I may be missing something, but let's discuss some more before we conclude this.
[1]:
else
{
SpinLockAcquire(&MyReplicationSlot->mutex);
/*
* Prevent moving the confirmed_flush backwards. See comments above
* for the details.
*/
if (lsn > MyReplicationSlot->data.confirmed_flush)
MyReplicationSlot->data.confirmed_flush = lsn;
SpinLockRelease(&MyReplicationSlot->mutex);
}
[2]: https://www.postgresql.org/docs/devel/replication-origins.html
--
With Regards,
Amit Kapila.
On Tue, May 27, 2025 at 7:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Mon, May 26, 2025 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
OTOH, if we don't want to adjust physical
slot machinery, it seems saving the logical slots to disk immediately
when its restart_lsn is updated is a waste of effort after your patch,
no? If so, why are we okay with that?
I don't think so. I think the reason why logical slots are synced to
disk immediately after update is that logical changes are not
idempotent (you can't safely apply the same change twice) unlike
physical block-level changes. This is why logical slots need to be
synced to prevent double replication of same changes, which could
lead, for example, to double insertion.
Hmm, if this has to be true, then even in the else branch of
LogicalConfirmReceivedLocation [1], we should have saved the slot.
AFAIU, whether the logical changes are sent to the client is decided
based on two things: (a) the replication origins, which tracks
replication progress and are maintained by clients (which for built-in
replication are subscriber nodes), see [2]; and (b) confirmed_flush
LSN maintained in the slot by the server. Now, for each ack by the
client after applying/processing changes, we update the
confirmed_flush LSN of the slot but don't immediately flush it. This
shouldn't let us send the changes again because even if the system
crashes and restarts, the client will send the server the location to
start sending the changes from based on its origin tracking. There is
more to it, like there are cases when confirm_flush LSN in the slot
could be ahead the origin's LSN, and we handle all such cases, but I
don't think those are directly related here, so I am skipping those
details for now.
Note that LogicalConfirmReceivedLocation won't save the slot to disk
if it updates only confirmed_flush LSN, which is used to decide
whether to send the changes.
You're right, I didn't study these aspects carefully enough.
LogicalConfirmReceivedLocation() implements immediate sync for
different reasons.
I may be missing something, but let's discuss some more before we conclude this.
So, yes, LogicalConfirmReceivedLocation() probably tries to take care of
keeping all WAL segments after the synchronized value of restart_lsn.
But it just doesn't account for a concurrent
ReplicationSlotsComputeRequiredLSN(). In order to fix that logic, we
need an effective_restart_lsn field by analogy with effective_catalog_xmin
(a similar approach was discussed in this thread before). But that
would require an ABI compatibility break.
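To be concrete, roughly what I have in mind, by analogy with effective_catalog_xmin (a sketch only; the field name and comment are illustrative, not a finished design):
/* sketch: an addition to struct ReplicationSlot */
typedef struct ReplicationSlot
{
	/* ... existing fields ... */
	/*
	 * restart_lsn value known to be flushed to disk.  WAL removal
	 * (ReplicationSlotsComputeRequiredLSN()) would look only at this
	 * field, the same way catalog tuple removal is guarded by
	 * effective_catalog_xmin rather than by data.catalog_xmin.
	 */
	XLogRecPtr	effective_restart_lsn;
} ReplicationSlot;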
So, I'd like to propose the following: backpatch 0001 and 0002, but
implement the effective_restart_lsn field for pg19. What do you think?
------
Regards,
Alexander Korotkov
Supabase
On Tue, May 27, 2025 at 12:12 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Tue, May 27, 2025 at 7:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Mon, May 26, 2025 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
OTOH, if we don't want to adjust physical
slot machinery, it seems saving the logical slots to disk immediately
when its restart_lsn is updated is a waste of effort after your patch,
no? If so, why are we okay with that?
I don't think so. I think the reason why logical slots are synced to
disk immediately after update is that logical changes are not
idempotent (you can't safely apply the same change twice) unlike
physical block-level changes. This is why logical slots need to be
synced to prevent double replication of same changes, which could
lead, for example, to double insertion.
Hmm, if this has to be true, then even in the else branch of
LogicalConfirmReceivedLocation [1], we should have saved the slot.
AFAIU, whether the logical changes are sent to the client is decided
based on two things: (a) the replication origins, which tracks
replication progress and are maintained by clients (which for built-in
replication are subscriber nodes), see [2]; and (b) confirmed_flush
LSN maintained in the slot by the server. Now, for each ack by the
client after applying/processing changes, we update the
confirmed_flush LSN of the slot but don't immediately flush it. This
shouldn't let us send the changes again because even if the system
crashes and restarts, the client will send the server the location to
start sending the changes from based on its origin tracking. There is
more to it, like there are cases when confirm_flush LSN in the slot
could be ahead the origin's LSN, and we handle all such cases, but I
don't think those are directly related here, so I am skipping those
details for now.
Note that LogicalConfirmReceivedLocation won't save the slot to disk
if it updates only confirmed_flush LSN, which is used to decide
whether to send the changes.
You're right, I didn't study these aspects carefully enough.
LogicalConfirmReceivedLocation() implements immediate sync for
different reasons.
I may be missing something, but let's discuss some more before we conclude this.
So, yes probably LogicalConfirmReceivedLocation() tries to care about
keeping all WAL segments after the synchronized value of restart_lsn.
But it just doesn't care about concurrent
ReplicationSlotsComputeRequiredLSN(). In order to fix that logic, we
need effective_restart_lsn field by analogy to effective_catalog_xmin
(similar approach was discussed in this thread before). But that
would require ABI compatibility breakage.
So, I'd like to propose following: backpatch 0001 and 0002, but
implement effective_restart_lsn field for pg19. What do you think?
Possibly we could implement effective_restart_lsn even for pg18. As far as I
know, keeping ABI compatibility is not required for beta.
------
Regards,
Alexander Korotkov
Supabase
On Tue, May 27, 2025 at 2:48 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Tue, May 27, 2025 at 12:12 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Tue, May 27, 2025 at 7:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Mon, May 26, 2025 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
OTOH, if we don't want to adjust physical
slot machinery, it seems saving the logical slots to disk immediately
when its restart_lsn is updated is a waste of effort after your patch,
no? If so, why are we okay with that?
I don't think so. I think the reason why logical slots are synced to
disk immediately after update is that logical changes are not
idempotent (you can't safely apply the same change twice) unlike
physical block-level changes. This is why logical slots need to be
synced to prevent double replication of same changes, which could
lead, for example, to double insertion.
Hmm, if this has to be true, then even in the else branch of
LogicalConfirmReceivedLocation [1], we should have saved the slot.
AFAIU, whether the logical changes are sent to the client is decided
based on two things: (a) the replication origins, which tracks
replication progress and are maintained by clients (which for built-in
replication are subscriber nodes), see [2]; and (b) confirmed_flush
LSN maintained in the slot by the server. Now, for each ack by the
client after applying/processing changes, we update the
confirmed_flush LSN of the slot but don't immediately flush it. This
shouldn't let us send the changes again because even if the system
crashes and restarts, the client will send the server the location to
start sending the changes from based on its origin tracking. There is
more to it, like there are cases when confirm_flush LSN in the slot
could be ahead the origin's LSN, and we handle all such cases, but I
don't think those are directly related here, so I am skipping those
details for now.
Note that LogicalConfirmReceivedLocation won't save the slot to disk
if it updates only confirmed_flush LSN, which is used to decide
whether to send the changes.
You're right, I didn't study these aspects carefully enough.
LogicalConfirmReceivedLocation() implements immediate sync for
different reasons.
I may be missing something, but let's discuss some more before we conclude this.
So, yes probably LogicalConfirmReceivedLocation() tries to care about
keeping all WAL segments after the synchronized value of restart_lsn.
But it just doesn't care about concurrent
ReplicationSlotsComputeRequiredLSN(). In order to fix that logic, we
need effective_restart_lsn field by analogy to effective_catalog_xmin
(similar approach was discussed in this thread before). But that
would require ABI compatibility breakage.
So, I'd like to propose following: backpatch 0001 and 0002, but
implement effective_restart_lsn field for pg19. What do you think?
Possibly we could implement effective_restart_lsn even for pg18. As I
know, keeping ABI compatibility is not required for beta.
Yeah, we should be able to change ABI during beta, but I can't comment
on the idea of effective_restart_lsn without seeing the patch or a
detailed explanation of this idea.
Now, you see my point related to restart_lsn computation for logical
slots, it is better to also do some analysis of the problem related to
xmin I have highlighted in one of my previous emails [1]. I see your
response to it, but I feel someone needs to give it a try by writing a
test and see the behavior. I am saying because logical slots took
precaution of flushing to disk before updating shared values of xmin
for a reason, whereas similar precautions are not taken for physical
slots, so there could be a problem with that computation as well.
[1]: /messages/by-id/CAA4eK1KMaPA5jir_SFu+qr3qu55OOdFWVZpuUkqTSGZ9fyPpHA@mail.gmail.com
--
With Regards,
Amit Kapila.
On Tue, May 27, 2025 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, we should be able to change ABI during beta, but I can't comment
on the idea of effective_restart_lsn without seeing the patch or a
detailed explanation of this idea.
Could you please check the patch [1]? It implements this idea,
except that it names the new field restart_lsn_flushed instead of
effective_restart_lsn.
Now, you see my point related to restart_lsn computation for logical
slots, it is better to also do some analysis of the problem related to
xmin I have highlighted in one of my previous emails [1]. I see your
response to it, but I feel someone needs to give it a try by writing a
test and see the behavior. I am saying because logical slots took
precaution of flushing to disk before updating shared values of xmin
for a reason, whereas similar precautions are not taken for physical
slots, so there could be a problem with that computation as well.
I see that LogicalConfirmReceivedLocation() behaves correctly by
updating effective_catalog_xmin only after syncing the slot to
disk. I don't see how effective_xmin gets updated with the logical
replication progress, though. Could you give me some clue on this,
please?
Links.
1. /messages/by-id/1538a2-67c5c700-7-77ec5a80@179382871
------
Regards,
Alexander Korotkov
Supabase
On Thu, May 29, 2025 at 5:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Tue, May 27, 2025 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, we should be able to change ABI during beta, but I can't comment
on the idea of effective_restart_lsn without seeing the patch or a
detailed explanation of this idea.
Could you, please, check the patch [1]. It implements this idea
except it names new field restart_lsn_flushed instead of
effective_restart_lsn.
This appears to be a better direction than the other patch, at least
for HEAD. I noticed a few points while looking at the patch.
1. restart_lsn_flushed: Can we name it as last_saved_restart_lsn based
on existing variable last_saved_confirmed_flush?
2. There are no comments as to why this is considered only for
persistent slots when CheckPointReplicationSlots doesn't have any such
check.
3. Please see if it makes sense to copy it in the copy_replication_slot.
Apart from these, I am not sure if there are still any pending
comments in the thread to be handled for this patch, so please see to
avoid missing anything.
Now, you see my point related to restart_lsn computation for logical
slots, it is better to also do some analysis of the problem related to
xmin I have highlighted in one of my previous emails [1]. I see your
response to it, but I feel someone needs to give it a try by writing a
test and see the behavior. I am saying because logical slots took
precaution of flushing to disk before updating shared values of xmin
for a reason, whereas similar precautions are not taken for physical
slots, so there could be a problem with that computation as well.
I see LogicalConfirmReceivedLocation() performs correctly while
updating effective_catalog_xmin only after syncing the slot to the
disk. I don't see how effective_xmin gets updates with the logical
replication progress though. Could you get me some clue on this,
please?
As per my understanding, for logical slots, effective_xmin is only set
during the initial copy phase (or say if one has to export a
snapshot), after that, its value won't change. Please read the
comments in CreateInitDecodingContext() where we set its value. If you
still have questions about it, we can discuss further.
--
With Regards,
Amit Kapila.
On Mon, Jun 2, 2025 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, May 29, 2025 at 5:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Tue, May 27, 2025 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, we should be able to change ABI during beta, but I can't comment
on the idea of effective_restart_lsn without seeing the patch or a
detailed explanation of this idea.
Could you, please, check the patch [1]. It implements this idea
except it names new field restart_lsn_flushed instead of
effective_restart_lsn.
This appears to be a better direction than the other patch, at least
for HEAD. I noticed a few points while looking at the patch.
1. restart_lsn_flushed: Can we name it as last_saved_restart_lsn based
on existing variable last_saved_confirmed_flush?
Good point, renamed.
2. There are no comments as to why this is considered only for
persistent slots when CheckPointReplicationSlots doesn't have any such
check.
Relevant comments added.
3. Please see if it makes sense to copy it in the copy_replication_slot.
Thank you for pointing that out, but I don't think this is necessary.
copy_replication_slot() calls ReplicationSlotSave(), which updates
last_saved_restart_lsn.
Also, I've added a ReplicationSlotsComputeRequiredLSN() call to
CheckPointReplicationSlots() to update the required LSN after
SaveSlotToPath() has updated last_saved_restart_lsn. This helps to pass
the checks in 001_stream_rep.pl without additional hacks.
Apart from these, I am not sure if there are still any pending
comments in the thread to be handled for this patch, so please see to
avoid missing anything.
Now, you see my point related to restart_lsn computation for logical
slots, it is better to also do some analysis of the problem related to
xmin I have highlighted in one of my previous emails [1]. I see your
response to it, but I feel someone needs to give it a try by writing a
test and see the behavior. I am saying because logical slots took
precaution of flushing to disk before updating shared values of xmin
for a reason, whereas similar precautions are not taken for physical
slots, so there could be a problem with that computation as well.
I see LogicalConfirmReceivedLocation() performs correctly while
updating effective_catalog_xmin only after syncing the slot to the
disk. I don't see how effective_xmin gets updates with the logical
replication progress though. Could you get me some clue on this,
please?
As per my understanding, for logical slots, effective_xmin is only set
during the initial copy phase (or say if one has to export a
snapshot), after that, its value won't change. Please read the
comments in CreateInitDecodingContext() where we set its value. If you
still have questions about it, we can discuss further.
OK, thank you for the clarification. I've read the comments in
CreateInitDecodingContext() as you suggested. All of the above makes me
think the *_xmin fields are handled properly.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v2-0001-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.patch
From 0fe25ecf89396c26842d5889a1c4625002d08a3e Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Mon, 3 Mar 2025 17:02:15 +0300
Subject: [PATCH v2 1/2] Keep WAL segments by slot's flushed restart LSN
The slot data is flushed to disk at the beginning of checkpoint. If
an existing slot is advanced in the middle of checkpoint execution, its
advanced restart LSN is taken to calculate the oldest LSN for WAL
segment removal at the end of checkpoint. If the node is restarted just
after the checkpoint, the slot data will be read from disk at
recovery with the old restart LSN, which can refer to removed WAL
segments.
The patch introduces a new in-memory state for slots -
flushed_restart_lsn - which is used to calculate the oldest LSN for WAL
segment removal. This state is updated with the current restart_lsn
every time the slot is saved to disk.
---
src/backend/replication/slot.c | 57 ++++++++++++++++++++++++++++++++++
src/include/replication/slot.h | 7 +++++
2 files changed, 64 insertions(+)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..c64f020742f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -424,6 +424,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->candidate_restart_valid = InvalidXLogRecPtr;
slot->candidate_restart_lsn = InvalidXLogRecPtr;
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
+ slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
/*
@@ -1165,20 +1166,41 @@ ReplicationSlotsComputeRequiredLSN(void)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
XLogRecPtr restart_lsn;
+ XLogRecPtr last_saved_restart_lsn;
bool invalidated;
+ ReplicationSlotPersistency persistency;
if (!s->in_use)
continue;
SpinLockAcquire(&s->mutex);
+ persistency = s->data.persistency;
restart_lsn = s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ last_saved_restart_lsn = s->last_saved_restart_lsn;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /*
+ * For persistent slot use last_saved_restart_lsn to compute the
+ * oldest LSN for removal of WAL segments. The segments between
+ * last_saved_restart_lsn and restart_lsn might be needed by a
+ * persistent slot in the case of database crash. Non-persistent
+ * slots can't survive the database crash, so we don't care about
+ * last_saved_restart_lsn for them.
+ */
+ if (persistency == RS_PERSISTENT)
+ {
+ if (last_saved_restart_lsn != InvalidXLogRecPtr &&
+ restart_lsn > last_saved_restart_lsn)
+ {
+ restart_lsn = last_saved_restart_lsn;
+ }
+ }
+
if (restart_lsn != InvalidXLogRecPtr &&
(min_required == InvalidXLogRecPtr ||
restart_lsn < min_required))
@@ -1216,7 +1238,9 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
{
ReplicationSlot *s;
XLogRecPtr restart_lsn;
+ XLogRecPtr last_saved_restart_lsn;
bool invalidated;
+ ReplicationSlotPersistency persistency;
s = &ReplicationSlotCtl->replication_slots[i];
@@ -1230,14 +1254,33 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
/* read once, it's ok if it increases while we're checking */
SpinLockAcquire(&s->mutex);
+ persistency = s->data.persistency;
restart_lsn = s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ last_saved_restart_lsn = s->last_saved_restart_lsn;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /*
+ * For persistent slot use last_saved_restart_lsn to compute the
+ * oldest LSN for removal of WAL segments. The segments between
+ * last_saved_restart_lsn and restart_lsn might be needed by a
+ * persistent slot in the case of database crash. Non-persistent
+ * slots can't survive the database crash, so we don't care about
+ * last_saved_restart_lsn for them.
+ */
+ if (persistency == RS_PERSISTENT)
+ {
+ if (last_saved_restart_lsn != InvalidXLogRecPtr &&
+ restart_lsn > last_saved_restart_lsn)
+ {
+ restart_lsn = last_saved_restart_lsn;
+ }
+ }
+
if (restart_lsn == InvalidXLogRecPtr)
continue;
@@ -1455,6 +1498,7 @@ ReplicationSlotReserveWal(void)
Assert(slot != NULL);
Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
+ Assert(slot->last_saved_restart_lsn == InvalidXLogRecPtr);
/*
* The replication slot mechanism is used to prevent removal of required
@@ -1766,6 +1810,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
*/
SpinLockAcquire(&s->mutex);
+ Assert(s->data.restart_lsn >= s->last_saved_restart_lsn);
+
restart_lsn = s->data.restart_lsn;
/* we do nothing if the slot is already invalid */
@@ -1835,7 +1881,10 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
* just rely on .invalidated.
*/
if (invalidation_cause == RS_INVAL_WAL_REMOVED)
+ {
s->data.restart_lsn = InvalidXLogRecPtr;
+ s->last_saved_restart_lsn = InvalidXLogRecPtr;
+ }
/* Let caller know */
*invalidated = true;
@@ -2079,6 +2128,12 @@ CheckPointReplicationSlots(bool is_shutdown)
SaveSlotToPath(s, path, LOG);
}
LWLockRelease(ReplicationSlotAllocationLock);
+
+ /*
+ * Recompute the required LSN as SaveSlotToPath() updated
+ * last_saved_restart_lsn for slots.
+ */
+ ReplicationSlotsComputeRequiredLSN();
}
/*
@@ -2354,6 +2409,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
if (!slot->just_dirtied)
slot->dirty = false;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->last_saved_restart_lsn = cp.slotdata.restart_lsn;
SpinLockRelease(&slot->mutex);
LWLockRelease(&slot->io_in_progress_lock);
@@ -2569,6 +2625,7 @@ RestoreSlotFromDisk(const char *name)
slot->effective_xmin = cp.slotdata.xmin;
slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->last_saved_restart_lsn = cp.slotdata.restart_lsn;
slot->candidate_catalog_xmin = InvalidTransactionId;
slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index eb0b93b1114..e6fa9a4b5ab 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -215,6 +215,13 @@ typedef struct ReplicationSlot
* recently stopped.
*/
TimestampTz inactive_since;
+
+ /* Latest restart_lsn that has been flushed to disk. For persistent slots
+ * the flushed LSN should be taken into account when calculating the oldest
+ * LSN for WAL segments removal.
+ */
+ XLogRecPtr last_saved_restart_lsn;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
--
2.39.5 (Apple Git-154)
v2-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch
From 5aa06332d2cccae44e25583638796742346dd462 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 13:26:28 +0300
Subject: [PATCH v2 2/2] Add TAP tests to check replication slot advance during
the checkpoint
The new tests verify that logical and physical replication slots are still
valid after an immediate restart on checkpoint completion when the slot was
advanced during the checkpoint.
This commit introduces two new injection points to make these tests possible:
* checkpoint-before-old-wal-removal - triggered in the checkpointer process
just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
point to the next WAL segment.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 17
---
src/backend/access/transam/xlog.c | 4 +
src/backend/replication/logical/logical.c | 18 +++
src/test/recovery/meson.build | 2 +
.../recovery/t/046_checkpoint_logical_slot.pl | 139 ++++++++++++++++++
.../t/047_checkpoint_physical_slot.pl | 133 +++++++++++++++++
5 files changed, 296 insertions(+)
create mode 100644 src/test/recovery/t/046_checkpoint_logical_slot.pl
create mode 100644 src/test/recovery/t/047_checkpoint_physical_slot.pl
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1914859b2ee..47ffc0a2307 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7498,6 +7498,10 @@ CreateCheckPoint(int flags)
if (PriorRedoPtr != InvalidXLogRecPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
+#endif
+
/*
* Delete old log files, those no longer needed for last checkpoint to
* prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1d56d0c4ef3..f1eb798f3e9 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
#include "postgres.h"
#include "access/xact.h"
+#include "access/xlog_internal.h"
#include "access/xlogutils.h"
#include "fmgr.h"
#include "miscadmin.h"
@@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/memutils.h"
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool updated_xmin = false;
bool updated_restart = false;
+ XLogRecPtr restart_lsn pg_attribute_unused();
SpinLockAcquire(&MyReplicationSlot->mutex);
+ /* remember the old restart lsn */
+ restart_lsn = MyReplicationSlot->data.restart_lsn;
+
/*
* Prevent moving the confirmed_flush backwards, as this could lead to
* data duplication issues caused by replicating already replicated
@@ -1881,6 +1887,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
/* first write new xmin to disk, so we know what's up after a crash */
if (updated_xmin || updated_restart)
{
+#ifdef USE_INJECTION_POINTS
+ XLogSegNo seg1,
+ seg2;
+
+ XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+ XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+ /* trigger injection point, but only if segment changes */
+ if (seg1 != seg2)
+ INJECTION_POINT("logical-replication-slot-advance-segment", NULL);
+#endif
+
ReplicationSlotMarkDirty();
ReplicationSlotSave();
elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index cb983766c67..92429d28402 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -54,6 +54,8 @@ tests += {
't/043_no_contrecord_switch.pl',
't/044_invalidate_inactive_slots.pl',
't/045_archive_restartpoint.pl',
+ 't/046_checkpoint_logical_slot.pl',
+ 't/047_checkpoint_physical_slot.pl'
],
},
}
diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
new file mode 100644
index 00000000000..b4265c4a6a5
--- /dev/null
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existing WAL segment after immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create the two slots we'll need.
+$node->safe_psql('postgres',
+ q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')}
+);
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance both slots to the current position just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)}
+);
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Generate some transactions to get RUNNING_XACTS.
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(
+ qr/run_xacts/,
+ q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint\n');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q(select injection_points_attach('checkpoint-before-old-wal-removal','wait'))
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point\n');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# Try to advance the logical slot, but make it stop when it moves to the next
+# WAL segment (this has to happen in the background, too).
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(
+ q{select injection_points_attach('logical-replication-slot-advance-segment','wait');}
+);
+$logical->query_until(
+ qr/get_changes/,
+ q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# Wait until the slot's restart_lsn points to the next WAL segment.
+note('waiting for injection_point\n');
+$node->wait_for_event('client backend',
+ 'logical-replication-slot-advance-segment');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN, and then unblock the checkpoint, which
+# removes the WAL still needed by the logical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# Abruptly stop the server (1 second should be enough for the checkpoint
+# to finish; it would be better).
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+ $node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);}
+ );
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
diff --git a/src/test/recovery/t/047_checkpoint_physical_slot.pl b/src/test/recovery/t/047_checkpoint_physical_slot.pl
new file mode 100644
index 00000000000..454e56b9bd2
--- /dev/null
+++ b/src/test/recovery/t/047_checkpoint_physical_slot.pl
@@ -0,0 +1,133 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existing WAL segment after immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'replica'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create a physical replication slot.
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+);
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')}
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN and then unblock the checkpoint, which
+# removes the WAL still needed by the physical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# Abruptly stop the server (1 second should be enough for the checkpoint
+# to finish; it would be better).
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting.
+my $restart_lsn = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get the WAL segment name for the slot's restart_lsn.
+my $restart_lsn_segment = $node->safe_psql('postgres',
+ "SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists.
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok( -f "$datadir/pg_wal/$restart_lsn_segment",
+ "WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists"
+);
+
+done_testing();
--
2.39.5 (Apple Git-154)
Dear Alexander, Amit
Alexander Korotkov wrote:
Also, I've changed ReplicationSlotsComputeRequiredLSN() call to
CheckPointReplicationSlots() to update required LSN after
SaveSlotToPath() updated last_saved_restart_lsn. This helps to pass
checks in 001_stream_rep.pl without additional hacks.
Thank you for the improvement and patch preparation. I confirm that the test
now passes without additional hacks.
I still do not understand why this solution is favored. It is, in my opinion,
not a backward-compatible solution. In any case, I'm ok to go with this patch.
If needed, I may prepare a backward-compatible solution where the
last_saved_restart_lsn values live in another place in shmem, rather
than in the ReplicationSlot struct.
I still would like to add my 5 cents to the discussion.
The purpose of the xmin value is to prevent tuples from being vacuumed. Slots'
restart_lsn values are used to calculate the oldest LSN that keeps WAL segments
from removal at checkpoint. These processes are pretty independent.
The logical slots are advanced in two steps. In the first step, the logical
decoding machinery periodically sets consistent candidate values for catalog_xmin and
restart_lsn. In the second step, when LogicalConfirmReceivedLocation is called,
the candidate values are assigned to catalog_xmin and restart_lsn based
on the confirmed LSN value. The slot is saved with these consistent values.
It is important that the candidate values are consistent; decoding guarantees
it. In case of a crash, we should guarantee that the catalog_xmin and restart_lsn
values loaded from disk are consistent and valid for logical slots.
The LogicalConfirmReceivedLocation function keeps this consistency by updating them
from consistent candidate values in a single operation.
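Roughly, the two steps look like this (a simplified sketch based on my reading of logical.c; locking and the updated_xmin/updated_restart bookkeeping are omitted):
/*
 * Step 1, during decoding (LogicalIncreaseXminForSlot() /
 * LogicalIncreaseRestartDecodingForSlot()): remember consistent
 * candidates and the LSN from which they may be applied:
 *
 *     slot->candidate_catalog_xmin  = <candidate xmin>;
 *     slot->candidate_xmin_lsn      = <LSN to apply it at>;
 *     slot->candidate_restart_lsn   = <candidate restart_lsn>;
 *     slot->candidate_restart_valid = <LSN to apply it at>;
 *
 * Step 2, on confirmation (LogicalConfirmReceivedLocation(lsn)), once
 * lsn has passed the corresponding "valid" points:
 *
 *     slot->data.catalog_xmin = slot->candidate_catalog_xmin;
 *     slot->data.restart_lsn  = slot->candidate_restart_lsn;
 *     ReplicationSlotMarkDirty();
 *     ReplicationSlotSave();   (flush before exposing the new xmin)
 *     slot->effective_catalog_xmin = slot->data.catalog_xmin;
 */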
We have to guarantee that we use the saved-to-disk values to calculate the xmin horizon
and the slots' oldest LSN. For this purpose, effective_catalog_xmin is used. We
update effective_catalog_xmin in LogicalConfirmReceivedLocation just
after saving the slot to disk. Another place where we update effective_catalog_xmin
is when the walsender receives a hot standby feedback message.
Since we have two independent processes (vacuuming, checkpoint), we can calculate
the xmin horizon and the oldest WAL LSN values independently (at different times) from
the saved-to-disk values. Note that these values are updated in a non-atomic way.
The xmin value is set when the node receives hot standby feedback, and it is used
to keep tuples from being vacuumed, as catalog_xmin is for the decoding machinery. I'm not
sure xmin is applicable to logical replication.
The confirmed flush LSN is used as a start point when a peer node doesn't provide
a start LSN, and to check that the start LSN is not older than the latest
confirmed flush LSN. Saving the slot to disk on each call of
LogicalConfirmReceivedLocation doesn't avoid conflicts completely, but
it decreases the probability of conflicts. So, I'm still not sure we
need to save logical slots on each advance to avoid conflicts, because it
doesn't help in general. The conflicts should be resolved by other means.
Since we truncate old WAL segments only at checkpoint, I believe it is ok if we
calculate the oldest LSN only at the beginning of the checkpoint, as it was in
the alternative solution. I think we could update the xmin horizon only at checkpoint
as well, but horizon advancement would be lazier in this case.
Taking these thoughts into account, I can't see any problems with the alternative
patch where the oldest WAL LSN is calculated only at checkpoint.
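To illustrate the ordering I mean, a rough sketch of the checkpoint flow under the alternative patch (simplified; not the actual code):
/*
 * CreateCheckPoint() / CreateRestartPoint(), alternative ordering:
 *
 *   1. capture oldest_lsn = minimum of the slots' restart_lsn values,
 *      once, at the beginning of the checkpoint;
 *   2. flush dirty slots to disk (CheckPointReplicationSlots());
 *   3. write the checkpoint record, flush buffers, etc.;
 *   4. remove old WAL segments using oldest_lsn from step 1, so a slot
 *      advanced while the checkpoint was running cannot move the cutoff
 *      past what has already been made durable on disk.
 */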
With best regards,
Vitaly
On Thu, Jun 5, 2025 at 8:51 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
Dear Alexander, Amit
Alexander Korotkov wrote:
Also, I've changed ReplicationSlotsComputeRequiredLSN() call to
CheckPointReplicationSlots() to update required LSN after
SaveSlotToPath() updated last_saved_restart_lsn. This helps to pass
checks in 001_stream_rep.pl without additional hacks.
Thank you for the improvement and patch preparation. I confirm the test is
passed without additional hacks now.
I still do not understand why this solution is favored. It is, in my opinion,
a non backward-compatible solution. In any case, I'm ok to go with this patch.
If needed, I may prepare a backward-compatible solution where
last_saved_restart_lsn values will be in an another place of the shmem, rather
than in ReplicationSlot struct.
I think we can use this approach for HEAD and probably keep the
previous idea for backbranches. Keeping some value in shared_memory
per slot sounds risky to me in terms of introducing new bugs.
I still would like to add my 5 cents to the discussion.
The purpose of the xmin value is to prevent tuples from vacuuming. Slots'
restart_lsn values are used to calculate the oldest lsn to keep WAL segments
from removal in checkpoint. These processes are pretty independent.
The logical slots are advanced in 2 steps. At the first step, the logical
decoding stuff periodically sets consistent candidate values for catalog_xmin and
restart_lsn. At the second step, when LogicalConfirmReceivedLocation is called,
the candidate values are assigned on catalog_xmin and restart_lsn values based
on the confirmed lsn value. The slot is saved with these consistent values.
It is important, that the candidate values are consistent, decoding guarantees
it. In case of a crash, we should guarantee that the loaded from the disk
catalog_xmin and restart_lsn values are consistent and valid for logical slots.
LogicalConfirmReceivedLocation function keeps this consistency by updating them
from consistent candidate values in a single operation.
We have to guarantee that we use saved to disk values to calculate xmin horizon
and slots' oldest lsn. For this purpose, effective_catalog_xmin is used. We
update effective_catalog_xmin in LogicalConfirmReceivedLocation just
after saving slot to disk. Another place where we update effective_catalog_xmin
is when walsender receives hot standby feedback message.
Once, we have two independent processes (vacuuming, checkpoint), we can calculate
xmin horizon and oldest WAL lsn values independently (at different times) from
the saved to disk values. Note, these values are updated in a non atomic way.
The xmin value is set when the node receives hot standby feedback and it is used
to keep tuples from vacuuming as well as catalog_xmin for decoding stuff.
Yeah, but with physical slots, it is possible that the slot's xmin
value is pointing to some value, say 700 (after restart), but vacuum
would have removed tuples from transaction IDs greater than 700 as
explained in email [1].
Not
sure, xmin is applicable for logical replication.
The confirmed flush lsn is used as a startpoint when a peer node doesn't provide
the start lsn and to check that the start lsn is not older than the latest
confirmed flush lsn. The saving of the slot on disk at each call of
LogicalConfirmReceivedLocation doesn't help to avoid conflicts completely, but
it helps to decrease the probability of conflicts.
We don't save slots at each call of LogicalConfirmReceivedLocation(),
and even when we do save, it is not to avoid conflicts but to avoid
removing required WAL segments and tuples.
So, i'm still not sure, we
need to save logical slots on each advance to avoid conflicts, because it
doesn't help in general. The conflicts should be resolved by other means.
Once, we truncate old wal segments in checkpoint only. I believe, it is ok if we
calculate the oldest lsn only at the beginning of the checkpoint, as it was in
the alternative solution. I think, we can update xmin horizon in checkpoint only
but the horizon advancing will be more lazy in this case.
Taking into account these thoughts, I can't see any problems with the alternative
patch where oldest wal lsn is calculated only in checkpoint.
The alternative will needlessly prevent removing WAL segments in some
cases when logical slots are in use.
[1]: /messages/by-id/CAA4eK1KMaPA5jir_SFu+qr3qu55OOdFWVZpuUkqTSGZ9fyPpHA@mail.gmail.com
--
With Regards,
Amit Kapila.
On Tue, Jun 3, 2025 at 6:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
As per my understanding, for logical slots, effective_xmin is only set
during the initial copy phase (or say if one has to export a
snapshot), after that, its value won't change. Please read the
comments in CreateInitDecodingContext() where we set its value. If you
still have questions about it, we can discuss further.
OK, thank you for the clarification. I've read the comments in
CreateInitDecodingContext() as you suggested. All of above makes me
think *_xmin fields are handled properly.
Yes, they are handled properly for logical slots, but there is no similar
safety mechanism for physical slots.
One minor comment:
+
+ /* Latest restart_lsn that has been flushed to disk. For persistent slots
+ * the flushed LSN should be taken into account when calculating the oldest
This doesn't follow our practice for multi-line comments.
--
With Regards,
Amit Kapila.
Hi Amit,
I think we can use this approach for HEAD and probably keep the
previous idea for backbranches. Keeping some value in shared_memory
per slot sounds risky to me in terms of introducing new bugs.
I'm not sure what kind of problems may occur. I propose to allocate in shmem an
array of last_saved_restart_lsn values, as shown below, which is not part of the
public API. It would be allocated and deallocated in shmem the same way as
ReplicationSlotCtlData. I can prepare a patch if needed.
typedef struct ReplicationSlotCtlDataExt {
XLogRecPtr last_saved_restart_lsn[1];
} ReplicationSlotCtlDataExt;
Yeah, but with physical slots, it is possible that the slot's xmin
value is pointing to some value, say 700 (after restart), but vacuum
would have removed tuples from transaction IDs greater than 700 as
explained in email [1].
I think we have no xmin problem for physical slots. The xmin values of
physical slots are used to process HSF messages. If I understood you correctly,
you are talking about the problem which is solved by hot standby
feedback messages. Such a message is used to disable vacuuming of tuples on the
primary to avoid delete conflicts with queries on the replica (some queries may
select tuples which were vacuumed on the primary, with the deletions then
replicated to the standby). If the primary receives an HSF message after slot
saving, I believe it is acceptable if autovacuum cleans tuples with xmin later
than the last saved value. If the primary restarts, the older value will be
loaded, but the replica has already confirmed the newer value. As for the replica,
it is the replica's obligation to send an HSF xmin that will survive the
replica's immediate restart.
Taking into account these thoughts, I can't see any problems with the alternative
patch where oldest wal lsn is calculated only in checkpoint.
The alternative will needlessly prevent removing WAL segments in some
cases when logical slots are in use.
IMHO, I'm not sure it will significantly impact WAL removal. We remove WAL
segments only at checkpoint. The alternate solution gets the oldest WAL segment
at the beginning of the checkpoint, then saves dirty slots to disk, and removes old
WAL segments at the end of the checkpoint using the oldest WAL segment obtained at
the beginning. The alternate solution may not be as effective
in terms of WAL segment removal if a logical slot is advanced during the
checkpoint, but I do not think that is a significant issue. On the other hand,
the alternate solution simplifies the logic of WAL removal, is backward compatible
(avoids adding new in-memory state), and decreases the number of locks taken in
ReplicationSlotsComputeRequiredLSN - there is no need to recalculate the slots'
oldest restart_lsn every time a slot is advanced.
With best regards,
Vitaly
On Mon, Jun 9, 2025 at 7:09 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
I think we can use this approach for HEAD and probably keep the
previous idea for backbranches. Keeping some value in shared_memory
per slot sounds risky to me in terms of introducing new bugs.
Not sure, what kind of problems may occur. I propose to allocate in shmem an
array of last_saved_restart_lsn like below which is not a part of the public
api (see below). It will be allocated and deallocated in shmem the same way as
ReplicationSlotCtlData. I can prepare a patch, if needed.
typedef struct ReplicationSlotCtlDataExt {
XLogRecPtr last_saved_restart_lsn[1];
} ReplicationSlotCtlDataExt;
This could work, but I think this is not a solution for HEAD anyway.
In HEAD, it would be better to keep everything inside the
ReplicationSlot struct. At the same time, I don't like the idea of having
different shared memory structs between branches if we can avoid that.
Yeah, but with physical slots, it is possible that the slot's xmin
value is pointing to some value, say 700 (after restart), but vacuum
would have removed tuples from transaction IDs greater than 700 as
explained in email [1].
I think, we have no xmin problem for physical slots. The xmin values of
physical slots are used to process HSF messages. If I correctly understood what
you mean, you are telling about the problem which is solved by hot standby
feedback messages. This message is used to disable tuples vacuuming on the
primary to avoid delete conflicts on the replica in queries (some queries may
select some tuples which were vacuumed on the primary and deletions are
replicated to the standby). If the primary receives a HSF message after slot
saving, I believe, it is allowable if autovacuum cleans tuples with xmin later
than the last saved value. If the primary restarts, the older value will be
loaded but the replica already confirmed the newer value. Concerning replica,
it is the obligation of the replica to send such HSF xmin that will survive
replica's immediate restart.
+1
Taking into account these thoughts, I can't see any problems with the alternative
patch where oldest wal lsn is calculated only in checkpoint.

The alternative will needlessly prevent removing WAL segments in some
cases when logical slots are in use.

IMHO, I'm not sure, it will significantly impact the wal removal. We remove WAL
segments only in checkpoint. The alternate solution gets the oldest WAL segment
at the beginning of checkpoint, then saves dirty slots to disk, and removes old
WAL segments at the end of checkpoint using the oldest WAL segment obtained at
the beginning of checkpoint. The alternate solution may not be so effective
in terms of WAL segments removal, if a logical slot is advanced during
checkpoint, but I do not think it is a significant issue. From the other hand,
the alternate solution simplifies the logic of WAL removal, backward compatible
(avoids addition new in-memory states), decreases the number of locks in
ReplicationSlotsComputeRequiredLSN - no need to recalculate oldest slots'
restart lsn every time when a slot is advanced.
So, my proposal is to commit the attached patchset to the HEAD, and
commit [1] to the back branches. Any objections?
Links.
1. /messages/by-id/CAPpHfdutKQxpm-gJgiZRb2ouKC9+HZx3fG3F00zd=xdxDidm_g@mail.gmail.com
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v3-0001-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.patch
From 643ad0762e3aec45be5ff233a788a2659a8c0852 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 10 Jun 2025 23:05:48 +0300
Subject: [PATCH v3 1/2] Keep WAL segments by slot's flushed restart LSN
The slot data is flushed to the disk at the beginning of checkpoint. If
an existing slot is advanced in the middle of checkpoint execution, its
advanced restart LSN is taken to calculate the oldest LSN for WAL
segments removal at the end of checkpoint. If the node is restarted just
after the checkpoint, the slots data will be read from the disk at
recovery with the oldest restart LSN which can refer to removed WAL
segments.
The patch introduces a new in-memory state for slots,
last_saved_restart_lsn, which is used to calculate the oldest LSN for WAL
segment removal. This field is updated with the current restart_lsn every
time the slot is saved to disk.
Discussion: https://postgr.es/m/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
---
src/backend/replication/slot.c | 57 ++++++++++++++++++++++++++++++++++
src/include/replication/slot.h | 8 +++++
2 files changed, 65 insertions(+)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..c64f020742f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -424,6 +424,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->candidate_restart_valid = InvalidXLogRecPtr;
slot->candidate_restart_lsn = InvalidXLogRecPtr;
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
+ slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
/*
@@ -1165,20 +1166,41 @@ ReplicationSlotsComputeRequiredLSN(void)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
XLogRecPtr restart_lsn;
+ XLogRecPtr last_saved_restart_lsn;
bool invalidated;
+ ReplicationSlotPersistency persistency;
if (!s->in_use)
continue;
SpinLockAcquire(&s->mutex);
+ persistency = s->data.persistency;
restart_lsn = s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ last_saved_restart_lsn = s->last_saved_restart_lsn;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /*
+ * For persistent slot use last_saved_restart_lsn to compute the
+ * oldest LSN for removal of WAL segments. The segments between
+ * last_saved_restart_lsn and restart_lsn might be needed by a
+ * persistent slot in the case of database crash. Non-persistent
+ * slots can't survive the database crash, so we don't care about
+ * last_saved_restart_lsn for them.
+ */
+ if (persistency == RS_PERSISTENT)
+ {
+ if (last_saved_restart_lsn != InvalidXLogRecPtr &&
+ restart_lsn > last_saved_restart_lsn)
+ {
+ restart_lsn = last_saved_restart_lsn;
+ }
+ }
+
if (restart_lsn != InvalidXLogRecPtr &&
(min_required == InvalidXLogRecPtr ||
restart_lsn < min_required))
@@ -1216,7 +1238,9 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
{
ReplicationSlot *s;
XLogRecPtr restart_lsn;
+ XLogRecPtr last_saved_restart_lsn;
bool invalidated;
+ ReplicationSlotPersistency persistency;
s = &ReplicationSlotCtl->replication_slots[i];
@@ -1230,14 +1254,33 @@ ReplicationSlotsComputeLogicalRestartLSN(void)
/* read once, it's ok if it increases while we're checking */
SpinLockAcquire(&s->mutex);
+ persistency = s->data.persistency;
restart_lsn = s->data.restart_lsn;
invalidated = s->data.invalidated != RS_INVAL_NONE;
+ last_saved_restart_lsn = s->last_saved_restart_lsn;
SpinLockRelease(&s->mutex);
/* invalidated slots need not apply */
if (invalidated)
continue;
+ /*
+ * For persistent slot use last_saved_restart_lsn to compute the
+ * oldest LSN for removal of WAL segments. The segments between
+ * last_saved_restart_lsn and restart_lsn might be needed by a
+ * persistent slot in the case of database crash. Non-persistent
+ * slots can't survive the database crash, so we don't care about
+ * last_saved_restart_lsn for them.
+ */
+ if (persistency == RS_PERSISTENT)
+ {
+ if (last_saved_restart_lsn != InvalidXLogRecPtr &&
+ restart_lsn > last_saved_restart_lsn)
+ {
+ restart_lsn = last_saved_restart_lsn;
+ }
+ }
+
if (restart_lsn == InvalidXLogRecPtr)
continue;
@@ -1455,6 +1498,7 @@ ReplicationSlotReserveWal(void)
Assert(slot != NULL);
Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
+ Assert(slot->last_saved_restart_lsn == InvalidXLogRecPtr);
/*
* The replication slot mechanism is used to prevent removal of required
@@ -1766,6 +1810,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
*/
SpinLockAcquire(&s->mutex);
+ Assert(s->data.restart_lsn >= s->last_saved_restart_lsn);
+
restart_lsn = s->data.restart_lsn;
/* we do nothing if the slot is already invalid */
@@ -1835,7 +1881,10 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
* just rely on .invalidated.
*/
if (invalidation_cause == RS_INVAL_WAL_REMOVED)
+ {
s->data.restart_lsn = InvalidXLogRecPtr;
+ s->last_saved_restart_lsn = InvalidXLogRecPtr;
+ }
/* Let caller know */
*invalidated = true;
@@ -2079,6 +2128,12 @@ CheckPointReplicationSlots(bool is_shutdown)
SaveSlotToPath(s, path, LOG);
}
LWLockRelease(ReplicationSlotAllocationLock);
+
+ /*
+ * Recompute the required LSN as SaveSlotToPath() updated
+ * last_saved_restart_lsn for slots.
+ */
+ ReplicationSlotsComputeRequiredLSN();
}
/*
@@ -2354,6 +2409,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
if (!slot->just_dirtied)
slot->dirty = false;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->last_saved_restart_lsn = cp.slotdata.restart_lsn;
SpinLockRelease(&slot->mutex);
LWLockRelease(&slot->io_in_progress_lock);
@@ -2569,6 +2625,7 @@ RestoreSlotFromDisk(const char *name)
slot->effective_xmin = cp.slotdata.xmin;
slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
+ slot->last_saved_restart_lsn = cp.slotdata.restart_lsn;
slot->candidate_catalog_xmin = InvalidTransactionId;
slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index eb0b93b1114..ffacba9d2ae 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -215,6 +215,14 @@ typedef struct ReplicationSlot
* recently stopped.
*/
TimestampTz inactive_since;
+
+ /*
+ * Latest restart_lsn that has been flushed to disk. For persistent slots
+ * the flushed LSN should be taken into account when calculating the
+ * oldest LSN for WAL segments removal.
+ */
+ XLogRecPtr last_saved_restart_lsn;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
--
2.39.5 (Apple Git-154)
v3-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch
From dfb7510590a9c17295e0026ec076ba02b5f5ead3 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 24 May 2025 13:26:28 +0300
Subject: [PATCH v3 2/2] Add TAP tests to check replication slot advance during
the checkpoint
The new tests verify that logical and physical replication slots are still
valid after an immediate restart on checkpoint completion when the slot was
advanced during the checkpoint.
This commit introduces two new injection points to make these tests possible:
* checkpoint-before-old-wal-removal - triggered in the checkpointer process
just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
point to the next WAL segment.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 17
---
src/backend/access/transam/xlog.c | 4 +
src/backend/replication/logical/logical.c | 18 +++
src/test/recovery/meson.build | 2 +
.../recovery/t/046_checkpoint_logical_slot.pl | 139 ++++++++++++++++++
.../t/047_checkpoint_physical_slot.pl | 133 +++++++++++++++++
5 files changed, 296 insertions(+)
create mode 100644 src/test/recovery/t/046_checkpoint_logical_slot.pl
create mode 100644 src/test/recovery/t/047_checkpoint_physical_slot.pl
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1914859b2ee..47ffc0a2307 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7498,6 +7498,10 @@ CreateCheckPoint(int flags)
if (PriorRedoPtr != InvalidXLogRecPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
+#endif
+
/*
* Delete old log files, those no longer needed for last checkpoint to
* prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1d56d0c4ef3..f1eb798f3e9 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
#include "postgres.h"
#include "access/xact.h"
+#include "access/xlog_internal.h"
#include "access/xlogutils.h"
#include "fmgr.h"
#include "miscadmin.h"
@@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/memutils.h"
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
bool updated_xmin = false;
bool updated_restart = false;
+ XLogRecPtr restart_lsn pg_attribute_unused();
SpinLockAcquire(&MyReplicationSlot->mutex);
+ /* remember the old restart lsn */
+ restart_lsn = MyReplicationSlot->data.restart_lsn;
+
/*
* Prevent moving the confirmed_flush backwards, as this could lead to
* data duplication issues caused by replicating already replicated
@@ -1881,6 +1887,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
/* first write new xmin to disk, so we know what's up after a crash */
if (updated_xmin || updated_restart)
{
+#ifdef USE_INJECTION_POINTS
+ XLogSegNo seg1,
+ seg2;
+
+ XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+ XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+ /* trigger injection point, but only if segment changes */
+ if (seg1 != seg2)
+ INJECTION_POINT("logical-replication-slot-advance-segment", NULL);
+#endif
+
ReplicationSlotMarkDirty();
ReplicationSlotSave();
elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index cb983766c67..92429d28402 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -54,6 +54,8 @@ tests += {
't/043_no_contrecord_switch.pl',
't/044_invalidate_inactive_slots.pl',
't/045_archive_restartpoint.pl',
+ 't/046_checkpoint_logical_slot.pl',
+ 't/047_checkpoint_physical_slot.pl'
],
},
}
diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
new file mode 100644
index 00000000000..b4265c4a6a5
--- /dev/null
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create the two slots we'll need.
+$node->safe_psql('postgres',
+ q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')}
+);
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance both slots to the current position just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)}
+);
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Generate some transactions to get RUNNING_XACTS.
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(
+ qr/run_xacts/,
+ q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point) so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint\n');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q(select injection_points_attach('checkpoint-before-old-wal-removal','wait'))
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point\n');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# Try to advance the logical slot, but make it stop when it moves to the next
+# WAL segment (this has to happen in the background, too).
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(
+ q{select injection_points_attach('logical-replication-slot-advance-segment','wait');}
+);
+$logical->query_until(
+ qr/get_changes/,
+ q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# Wait until the slot's restart_lsn points to the next WAL segment.
+note('waiting for injection_point\n');
+$node->wait_for_event('client backend',
+ 'logical-replication-slot-advance-segment');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN, and then unblock the checkpoint, which
+# removes the WAL still needed by the logical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# Abruptly stop the server (1 second should be enough for the checkpoint
+# to finish; it would be better).
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+ $node->safe_psql('postgres',
+ q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);}
+ );
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
diff --git a/src/test/recovery/t/047_checkpoint_physical_slot.pl b/src/test/recovery/t/047_checkpoint_physical_slot.pl
new file mode 100644
index 00000000000..454e56b9bd2
--- /dev/null
+++ b/src/test/recovery/t/047_checkpoint_physical_slot.pl
@@ -0,0 +1,133 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existing WAL segment after an immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'replica'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create a simple table to generate data into.
+$node->safe_psql('postgres',
+ q{create table t (id serial primary key, b text)});
+
+# Create a physical replication slot.
+$node->safe_psql('postgres',
+ q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+);
+
+# Advance slot to the current position, just to have everything "valid".
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+$node->safe_psql('postgres',
+ q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point) so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+ q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')}
+);
+$checkpoint->query_until(
+ qr/starting_checkpoint/,
+ q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN and then unblock the checkpoint, which
+# removes the WAL still needed by the physical slot.
+$node->safe_psql('postgres',
+ q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+ q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# Abruptly stop the server (1 second should be enough for the checkpoint
+# to finish; it would be better).
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting.
+my $restart_lsn = $node->safe_psql('postgres',
+ q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get the WAL segment name for the slot's restart_lsn.
+my $restart_lsn_segment = $node->safe_psql('postgres',
+ "SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists.
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok( -f "$datadir/pg_wal/$restart_lsn_segment",
+ "WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists"
+);
+
+done_testing();
--
2.39.5 (Apple Git-154)
On Wed, Jun 11, 2025 at 1:44 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Jun 9, 2025 at 7:09 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
I think we can use this approach for HEAD and probably keep the
previous idea for backbranches. Keeping some value in shared_memory
per slot sounds risky to me in terms of introducing new bugs.

Not sure, what kind of problems may occur. I propose to allocate in shmem an
array of last_saved_restart_lsn like below which is not a part of the public
api (see below). It will be allocated and deallocated in shmem the same way as
ReplicationSlotCtlData. I can prepare a patch, if needed.

typedef struct ReplicationSlotCtlDataExt {
    XLogRecPtr last_saved_restart_lsn[1];
} ReplicationSlotCtlDataExt;

This could work, but I think this is not a solution for HEAD anyway.
In the HEAD, it would be better to keep everything inside the
ReplicationSlot struct. In the same time, I don't like idea to have
different shared memory structs between branches if we can avoid that.

Yeah, but with physical slots, it is possible that the slot's xmin
value is pointing to some value, say 700 (after restart), but vacuum
would have removed tuples from transaction IDs greater than 700 as
explained in email [1].

I think, we have no xmin problem for physical slots. The xmin values of
physical slots are used to process HSF messages. If I correctly understood what
you mean, you are telling about the problem which is solved by hot standby
feedback messages. This message is used to disable tuples vacuuming on the
primary to avoid delete conflicts on the replica in queries (some queries may
select some tuples which were vacuumed on the primary and deletions are
replicated to the standby). If the primary receives a HSF message after slot
saving, I believe, it is allowable if autovacuum cleans tuples with xmin later
than the last saved value. If the primary restarts, the older value will be
loaded but the replica already confirmed the newer value. Concerning replica,
it is the obligation of the replica to send such HSF xmin that will survive
replica's immediate restart.

+1
The point is about the general principle of slot's xmin values, which
is that the rows with xid greater than slot's xmin should be available
(or can't be removed by vacuum). But here, such a principle could be
violated after a restart. I don't have a test to show what harm it can
cause, but will try to think/investigate more on it.
Taking into account these thoughts, I can't see any problems with the alternative
patch where oldest wal lsn is calculated only in checkpoint.

The alternative will needlessly prevent removing WAL segments in some
cases when logical slots are in use.

IMHO, I'm not sure, it will significantly impact the wal removal. We remove WAL
segments only in checkpoint. The alternate solution gets the oldest WAL segment
at the beginning of checkpoint, then saves dirty slots to disk, and removes old
WAL segments at the end of checkpoint using the oldest WAL segment obtained at
the beginning of checkpoint. The alternate solution may not be so effective
in terms of WAL segments removal, if a logical slot is advanced during
checkpoint, but I do not think it is a significant issue. From the other hand,
the alternate solution simplifies the logic of WAL removal, backward compatible
(avoids addition new in-memory states), decreases the number of locks in
ReplicationSlotsComputeRequiredLSN - no need to recalculate oldest slots'
restart lsn every time when a slot is advanced.

So, my proposal is to commit the attached patchset to the HEAD, and
commit [1] to the back branches. Any objections?
No objections. I think we can keep discussing if slot's xmin
computation has any issues or not, but you can proceed with the LSN
stuff.
--
With Regards,
Amit Kapila.
Hello Alexander,
10.06.2025 23:14, Alexander Korotkov wrote:
So, my proposal is to commit the attached patchset to the HEAD, and
commit [1] to the back branches. Any objections?
As the buildfarm animal prion shows [1], the 046_checkpoint_logical_slot
test fails with "-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE":
# poll_query_until timed out executing this query:
#
# SELECT count(*) > 0 FROM pg_stat_activity
# WHERE backend_type = 'client backend' AND wait_event = 'logical-replication-slot-advance-segment'
#
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
[04:16:27] t/046_checkpoint_logical_slot.pl ......
Dubious, test returned 29 (wstat 7424, 0x1d00)
No subtests run
[04:20:58] t/047_checkpoint_physical_slot.pl ..... ok 271294 ms ( 0.00 usr 0.00 sys + 0.37 cusr 0.26 csys = 0.63 CPU)
I'm able to reproduce this locally as well. Though the test passes for me
with the increased timeout; that is, it's not stuck:
PG_TEST_TIMEOUT_DEFAULT=360 PROVE_TESTS="t/046*" make -s check -C src/test/recovery/
# +++ tap check in src/test/recovery +++
t/046_checkpoint_logical_slot.pl .. ok
All tests successful.
Files=1, Tests=1, 533 wallclock secs ( 0.01 usr 0.00 sys + 4.70 cusr 9.61 csys = 14.32 CPU)
Result: PASS
Could you have a look?
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-06-14%2001%3A58%3A06
Best regards,
Alexander
Hi, Alexander!
On Sun, Jun 15, 2025 at 12:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
Hello Alexander,
10.06.2025 23:14, Alexander Korotkov wrote:
So, my proposal is to commit the attached patchset to the HEAD, and
commit [1] to the back branches. Any objections?

As the buildfarm animal prion shows [1], the 046_checkpoint_logical_slot
test fails with "-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE":
# poll_query_until timed out executing this query:
#
# SELECT count(*) > 0 FROM pg_stat_activity
# WHERE backend_type = 'client backend' AND wait_event = 'logical-replication-slot-advance-segment'
#
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
[04:16:27] t/046_checkpoint_logical_slot.pl ......
Dubious, test returned 29 (wstat 7424, 0x1d00)
No subtests run
[04:20:58] t/047_checkpoint_physical_slot.pl ..... ok 271294 ms ( 0.00 usr 0.00 sys + 0.37 cusr 0.26 csys = 0.63 CPU)

I'm able to reproduce this locally as well. Though the test passes for me
with the increased timeout, that is it's not stuck:
PG_TEST_TIMEOUT_DEFAULT=360 PROVE_TESTS="t/046*" make -s check -C src/test/recovery/
# +++ tap check in src/test/recovery +++
t/046_checkpoint_logical_slot.pl .. ok
All tests successful.
Files=1, Tests=1, 533 wallclock secs ( 0.01 usr 0.00 sys + 4.70 cusr 9.61 csys = 14.32 CPU)
Result: PASS

Could you have a look?
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-06-14%2001%3A58%3A06
Hmm... It seems to take too long to advance the segment with these
options on. Sure, I'll check this!
------
Regards,
Alexander Korotkov
Supabase
On Sun, Jun 15, 2025 at 12:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
Hello Alexander,
10.06.2025 23:14, Alexander Korotkov wrote:
So, my proposal is to commit the attached patchset to the HEAD, and
commit [1] to the back branches. Any objections?

As the buildfarm animal prion shows [1], the 046_checkpoint_logical_slot
test fails with "-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE":
# poll_query_until timed out executing this query:
#
# SELECT count(*) > 0 FROM pg_stat_activity
# WHERE backend_type = 'client backend' AND wait_event = 'logical-replication-slot-advance-segment'
#
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
[04:16:27] t/046_checkpoint_logical_slot.pl ......
Dubious, test returned 29 (wstat 7424, 0x1d00)
No subtests run
[04:20:58] t/047_checkpoint_physical_slot.pl ..... ok 271294 ms ( 0.00 usr 0.00 sys + 0.37 cusr 0.26 csys = 0.63 CPU)

I'm able to reproduce this locally as well. Though the test passes for me
with the increased timeout, that is it's not stuck:
PG_TEST_TIMEOUT_DEFAULT=360 PROVE_TESTS="t/046*" make -s check -C src/test/recovery/
# +++ tap check in src/test/recovery +++
t/046_checkpoint_logical_slot.pl .. ok
All tests successful.
Files=1, Tests=1, 533 wallclock secs ( 0.01 usr 0.00 sys + 4.70 cusr 9.61 csys = 14.32 CPU)
Result: PASS

Could you have a look?
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-06-14%2001%3A58%3A06
Could you, please, check this patch? On my system it makes 046 and
047 execute in 140 secs with -O0 and -DRELCACHE_FORCE_RELEASE
-DCATCACHE_FORCE_RELEASE.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v1-0001-Fix-046_checkpoint_-logical-physical-_slot.pl-exe.patch
From db591b3c0b9ae760395f79df68ca78922108e7f4 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 15 Jun 2025 13:54:09 +0300
Subject: [PATCH v1] Fix 046_checkpoint_(logical/physical)_slot.pl execution
time
Reported-by:
Bug:
Discussion:
Author:
Reviewed-by:
Tested-by:
Backpatch-through:
---
src/test/recovery/t/046_checkpoint_logical_slot.pl | 8 ++++----
src/test/recovery/t/047_checkpoint_physical_slot.pl | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
index b4265c4a6a5..4fe928210dd 100644
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -58,17 +58,17 @@ SELECT 1 \watch 0.1
\q
));
-# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Insert 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
# Run another checkpoint to set a new restore LSN.
$node->safe_psql('postgres', q{checkpoint});
-# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Another 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
# Run another checkpoint, this time in the background, and make it wait
diff --git a/src/test/recovery/t/047_checkpoint_physical_slot.pl b/src/test/recovery/t/047_checkpoint_physical_slot.pl
index 454e56b9bd2..3141b256bf3 100644
--- a/src/test/recovery/t/047_checkpoint_physical_slot.pl
+++ b/src/test/recovery/t/047_checkpoint_physical_slot.pl
@@ -43,9 +43,9 @@ $node->safe_psql('postgres',
# Run checkpoint to flush current state to disk and set a baseline.
$node->safe_psql('postgres', q{checkpoint});
-# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Insert 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
# Advance slot to the current position, just to have everything "valid".
@@ -56,9 +56,9 @@ $node->safe_psql('postgres',
# Run another checkpoint to set a new restore LSN.
$node->safe_psql('postgres', q{checkpoint});
-# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Another 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
my $restart_lsn_init = $node->safe_psql('postgres',
--
2.39.5 (Apple Git-154)
15.06.2025 14:02, Alexander Korotkov wrote:
Could you, please, check this patch? On my system it makes 046 and
047 execute in 140 secs with -O0 and -DRELCACHE_FORCE_RELEASE
-DCATCACHE_FORCE_RELEASE.
Thank you for the patch!
It decreases the test's duration significantly:
# +++ tap check in src/test/recovery +++
t/046_checkpoint_logical_slot.pl .. ok
All tests successful.
Files=1, Tests=1, 29 wallclock secs ( 0.01 usr 0.00 sys + 0.23 cusr 0.56 csys = 0.80 CPU)
Without the patch:
t/046_checkpoint_logical_slot.pl .. ok
All tests successful.
Files=1, Tests=1, 519 wallclock secs ( 0.01 usr 0.00 sys + 3.05 cusr 7.64 csys = 10.70 CPU)
Result: PASS
Best regards,
Alexander
BTW, while you're cleaning up this commit, could you remove the
excess newlines in some of the "note" commands in 046 and 047, like
note('starting checkpoint\n');
This produces bizarre output, as shown in the buildfarm logs:
[04:04:38.953](603.550s) # starting checkpoint\n
regards, tom lane
Hi, Tom!
On Sun, Jun 15, 2025 at 7:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
BTW, while you're cleaning up this commit, could you remove the
excess newlines in some of the "note" commands in 046 and 047, like

note('starting checkpoint\n');
This produces bizarre output, as shown in the buildfarm logs:
Thank you for reporting this. The revised patch is attached. In
addition to reducing test runtime, it removes excess newlines from
some note() calls. The commit message is included. I'm going to push
this if there are no objections.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v2-0001-Improve-runtime-and-output-of-tests-for-replicati.patch
From 5e332ece97a6aefcec569af7e16a4251fc79071c Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 15 Jun 2025 13:54:09 +0300
Subject: [PATCH v2] Improve runtime and output of tests for replication slots
checkpointing.
The TAP tests that verify logical and physical replication slot behavior
during checkpoints (046_checkpoint_logical_slot.pl and
047_checkpoint_physical_slot.pl) inserted two batches of 2 million rows each,
generating approximately 520 MB of WAL. On slow machines, or when compiled
with '-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE', this caused the
tests to run for 8-9 minutes and occasionally time out, as seen on the
buildfarm animal prion.
Reduce each INSERT to 50k rows of wider data, which yields approximately
5 segments of WAL. This volume is still sufficient to advance the slot to
the next segment and exercise the code paths under test, but it cuts
the total wall-clock run time.
While here, remove superfluous '\n' characters from several note() calls;
these appeared literally in the build-farm logs and looked odd.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/fbc5d94e-6fbd-4a64-85d4-c9e284a58eb2%40gmail.com
---
src/test/recovery/t/046_checkpoint_logical_slot.pl | 14 +++++++-------
.../recovery/t/047_checkpoint_physical_slot.pl | 8 ++++----
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
index b4265c4a6a5..964bc34fbe8 100644
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -58,23 +58,23 @@ SELECT 1 \watch 0.1
\q
));
-# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Insert 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
# Run another checkpoint to set a new restore LSN.
$node->safe_psql('postgres', q{checkpoint});
-# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Another 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
# Run another checkpoint, this time in the background, and make it wait
# on the injection point) so that the checkpoint stops right before
# removing old WAL segments.
-note('starting checkpoint\n');
+note('starting checkpoint');
my $checkpoint = $node->background_psql('postgres');
$checkpoint->query_safe(
@@ -88,7 +88,7 @@ checkpoint;
));
# Wait until the checkpoint stops right before removing WAL segments.
-note('waiting for injection_point\n');
+note('waiting for injection_point');
$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
note('injection_point is reached');
@@ -107,7 +107,7 @@ select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \wa
));
# Wait until the slot's restart_lsn points to the next WAL segment.
-note('waiting for injection_point\n');
+note('waiting for injection_point');
$node->wait_for_event('client backend',
'logical-replication-slot-advance-segment');
note('injection_point is reached');
diff --git a/src/test/recovery/t/047_checkpoint_physical_slot.pl b/src/test/recovery/t/047_checkpoint_physical_slot.pl
index 454e56b9bd2..3141b256bf3 100644
--- a/src/test/recovery/t/047_checkpoint_physical_slot.pl
+++ b/src/test/recovery/t/047_checkpoint_physical_slot.pl
@@ -43,9 +43,9 @@ $node->safe_psql('postgres',
# Run checkpoint to flush current state to disk and set a baseline.
$node->safe_psql('postgres', q{checkpoint});
-# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Insert 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
# Advance slot to the current position, just to have everything "valid".
@@ -56,9 +56,9 @@ $node->safe_psql('postgres',
# Run another checkpoint to set a new restore LSN.
$node->safe_psql('postgres', q{checkpoint});
-# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Another 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
my $restart_lsn_init = $node->safe_psql('postgres',
--
2.39.5 (Apple Git-154)
Dear Alexander,
Thanks for pushing the fix patch! BTW, I have a few comments on your commits.
Could you check and include them if needed?
01.
```
$node->append_conf('postgresql.conf',
"shared_preload_libraries = 'injection_points'");
```
No need to set shared_preload_libraries in 046/047. ISTM it must be set only
when we enable the statistics.
02.
We should also check whether the injection_points extension can be installed.
See check_extension() and its callers.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Kuroda-san,
On Mon, Jun 16, 2025 at 12:11 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Thanks for pushing the fix patch! BTW, I have few comments for your commits.
Can you check and include them if needed?

01.
```
$node->append_conf('postgresql.conf',
"shared_preload_libraries = 'injection_points'");
```

No need to set shared_preload_libraries in 046/047. ISTM it must be set when we
enable the statistics.

02.
We should also check whether the injection_points can be installed or not.
You can check check_extension() and callers.
Thank you! All of these totally make sense. The updated patch is attached.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v3-0001-Improve-runtime-and-output-of-tests-for-replicati.patch
From 7b02905e9bad091a269cae3194f4715ed41dffe0 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 15 Jun 2025 13:54:09 +0300
Subject: [PATCH v3] Improve runtime and output of tests for replication slots
checkpointing.
The TAP tests that verify logical and physical replication slot behavior
during checkpoints (046_checkpoint_logical_slot.pl and
047_checkpoint_physical_slot.pl) inserted two batches of 2 million rows each,
generating approximately 520 MB of WAL. On slow machines, or when compiled
with '-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE', this caused the
tests to run for 8-9 minutes and occasionally time out, as seen on the
buildfarm animal prion.
Reduce each INSERT to 50k rows of wider data, which yields approximately
5 segments of WAL. This volume is still sufficient to advance the slot to
the next segment and exercise the code paths under test, but it cuts
the total wall-clock run time.
While here, remove superfluous '\n' characters from several note() calls;
these appeared literally in the build-farm logs and looked odd. Also, remove
excessive 'shared_preload_libraries' GUC from the config and add a check for
'injection_points' extension availability.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/fbc5d94e-6fbd-4a64-85d4-c9e284a58eb2%40gmail.com
---
.../recovery/t/046_checkpoint_logical_slot.pl | 25 ++++++++++++-------
.../t/047_checkpoint_physical_slot.pl | 19 +++++++++-----
2 files changed, 29 insertions(+), 15 deletions(-)
diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
index b4265c4a6a5..86cdc1a2d2c 100644
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -21,10 +21,17 @@ my ($node, $result);
$node = PostgreSQL::Test::Cluster->new('mike');
$node->init;
-$node->append_conf('postgresql.conf',
- "shared_preload_libraries = 'injection_points'");
$node->append_conf('postgresql.conf', "wal_level = 'logical'");
$node->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$node->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
# Create a simple table to generate data into.
@@ -58,23 +65,23 @@ SELECT 1 \watch 0.1
\q
));
-# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Insert 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
# Run another checkpoint to set a new restore LSN.
$node->safe_psql('postgres', q{checkpoint});
-# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Another 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
# Run another checkpoint, this time in the background, and make it wait
# on the injection point) so that the checkpoint stops right before
# removing old WAL segments.
-note('starting checkpoint\n');
+note('starting checkpoint');
my $checkpoint = $node->background_psql('postgres');
$checkpoint->query_safe(
@@ -88,7 +95,7 @@ checkpoint;
));
# Wait until the checkpoint stops right before removing WAL segments.
-note('waiting for injection_point\n');
+note('waiting for injection_point');
$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
note('injection_point is reached');
@@ -107,7 +114,7 @@ select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \wa
));
# Wait until the slot's restart_lsn points to the next WAL segment.
-note('waiting for injection_point\n');
+note('waiting for injection_point');
$node->wait_for_event('client backend',
'logical-replication-slot-advance-segment');
note('injection_point is reached');
diff --git a/src/test/recovery/t/047_checkpoint_physical_slot.pl b/src/test/recovery/t/047_checkpoint_physical_slot.pl
index 454e56b9bd2..5ff7ecdb905 100644
--- a/src/test/recovery/t/047_checkpoint_physical_slot.pl
+++ b/src/test/recovery/t/047_checkpoint_physical_slot.pl
@@ -21,10 +21,17 @@ my ($node, $result);
$node = PostgreSQL::Test::Cluster->new('mike');
$node->init;
-$node->append_conf('postgresql.conf',
- "shared_preload_libraries = 'injection_points'");
$node->append_conf('postgresql.conf', "wal_level = 'replica'");
$node->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$node->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
# Create a simple table to generate data into.
@@ -43,9 +50,9 @@ $node->safe_psql('postgres',
# Run checkpoint to flush current state to disk and set a baseline.
$node->safe_psql('postgres', q{checkpoint});
-# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Insert 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
# Advance slot to the current position, just to have everything "valid".
@@ -56,9 +63,9 @@ $node->safe_psql('postgres',
# Run another checkpoint to set a new restore LSN.
$node->safe_psql('postgres', q{checkpoint});
-# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Another 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
my $restart_lsn_init = $node->safe_psql('postgres',
--
2.39.5 (Apple Git-154)
Dear Alexander,
Thank you! All of these totally make sense. The updated patch is attached.
Thanks for the update. I found another point.
```
-# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
+# Another 50K rows; that's about 86MB (~5 segments) worth of WAL.
$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+ q{insert into t (b) select repeat(md5(i::text),50) from generate_series(1,50000) s(i)}
);
```
I think the Perl function advance_wal() can be used instead of doing actual INSERT
commands, because nobody reads the replicated result. The same applies to both
046 and 047.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hi Alexander,
While tracking the buildfarm for one of the other commits, I noticed this failure:
TRAP: failed Assert("s->data.restart_lsn >=
s->last_saved_restart_lsn"), File:
"../pgsql/src/backend/replication/slot.c", Line: 1813, PID: 3945797
postgres: standby: checkpointer (ExceptionalCondition+0x83) [0x55fa69b79f5e]
postgres: standby: checkpointer
(InvalidateObsoleteReplicationSlots+0x53c) [0x55fa69982171]
postgres: standby: checkpointer (CreateCheckPoint+0x9ad) [0x55fa6971feb2]
postgres: standby: checkpointer (CheckpointerMain+0x4b1) [0x55fa6996431c]
postgres: standby: checkpointer (postmaster_child_launch+0x130) [0x55fa69964b41]
postgres: standby: checkpointer (+0x40a1a7) [0x55fa699671a7]
postgres: standby: checkpointer (PostmasterMain+0x1563) [0x55fa6996aed6]
postgres: standby: checkpointer (main+0x7f0) [0x55fa6989f798]
/lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7f1876a54ca8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f1876a54d65]
postgres: standby: checkpointer (_start+0x21) [0x55fa696421a1]
Scorpion is failing for pg_basebackup's 020_pg_receivewal test at [1].
[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=scorpion&dt=2025-06-17%2000%3A40%3A46&stg=pg_basebackup-check
Regards,
Vignesh
Hi, Vitaly!
On Tue, Jun 17, 2025 at 6:02 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
Thank you for reporting the issue.
While tracking buildfarm for one of other commits, I noticed this failure:
TRAP: failed Assert("s->data.restart_lsn >=
s->last_saved_restart_lsn"), File:
"../pgsql/src/backend/replication/slot.c", Line: 1813, PID: 3945797
postgres: standby: checkpointer (ExceptionalCondition+0x83) [0x55fa69b79f5e]
postgres: standby: checkpointer
(InvalidateObsoleteReplicationSlots+0x53c) [0x55fa69982171]
postgres: standby: checkpointer (CreateCheckPoint+0x9ad) [0x55fa6971feb2]

This assert was introduced in the patch. Now, I think, it is a wrong one. Let me
please explain one of the possible scenarios when it can be triggered. In case
of physical replication, when the walsender receives a standby reply message, it
calls the PhysicalConfirmReceivedLocation function, which updates the slot's
restart_lsn from the received flush_lsn value. This value may be older than the
saved value. If this happens during a checkpoint, after the slot has been saved
to disk, this assert will be triggered, because the new restart_lsn value
received from the walsender may be less than last_saved_restart_lsn.

I propose to remove this assert.
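For reference, the relevant walsender logic looks roughly like this (a
simplified sketch, not the verbatim source): the flush LSN reported by the
standby is stored as the new restart_lsn without any forward-only check.

```c
/*
 * Simplified sketch of PhysicalConfirmReceivedLocation() (paraphrased):
 * whatever LSN the standby reports becomes the slot's restart_lsn, so it
 * can be older than the value a concurrent checkpoint just wrote to disk.
 */
static void
PhysicalConfirmReceivedLocationSketch(XLogRecPtr lsn)
{
    ReplicationSlot *slot = MyReplicationSlot;
    bool        changed = false;

    SpinLockAcquire(&slot->mutex);
    if (slot->data.restart_lsn != lsn)
    {
        slot->data.restart_lsn = lsn;   /* no check that it moves forward */
        changed = true;
    }
    SpinLockRelease(&slot->mutex);

    if (changed)
    {
        ReplicationSlotMarkDirty();
        ReplicationSlotsComputeRequiredLSN();
    }
}
```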
Yes, but is it OK for restart_lsn to move backward? That might mean
that if a checkpoint happens faster than
PhysicalConfirmReceivedLocation(), then a crash could leave this WAL
location unavailable. Is that true?
Also, what do you think about proposed changes in [1]? I wonder if it
could somehow decrease the coverage.
Links.
1. /messages/by-id/OSCPR01MB149665B3F0629D10731B18E5AF570A@OSCPR01MB14966.jpnprd01.prod.outlook.com
------
Regards,
Alexander Korotkov
Supabase
vignesh C <vignesh21@gmail.com> writes:
While tracking buildfarm for one of other commits, I noticed this failure:
TRAP: failed Assert("s->data.restart_lsn >=
s->last_saved_restart_lsn"), File:
"../pgsql/src/backend/replication/slot.c", Line: 1813, PID: 3945797
My animal mamba is also showing this assertion failure, but in a
different test (recovery/t/040_standby_failover_slots_sync.pl).
It's failed in two out of its three runs since ca307d5ce went in,
so it's more reproducible than scorpion's report, though still not
perfectly so.
I suspect that mamba is prone to this simply because it's slow,
although perhaps there's a different reason. Anyway, happy to
investigate manually if there's something you'd like me to
check for.
regards, tom lane
Dear Vitaly,
I've been working on the bug...
This assert was introduced in the patch. Now, I think, it is a wrong one. Let me
please explain one of the possible scenarios when it can be triggered. In case
of physical replication, when walsender receives a standby reply message, it
calls PhysicalConfirmReceivedLocation function which updates slots' restart_lsn
from received flush_lsn value. This value may be older than the saved value.
To confirm, can you explain the theory of why the walsender received an old LSN?
It is sent by the walreceiver, so is there a case where LogstreamResult.Flush can go backward?
I'm not sure we can accept that situation.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, 18 Jun 2025 at 14:35, Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
Dear Hayato,
To confirm, can you tell me the theory why the walsender received old LSN?
It is sent by the walreceiver, so is there a case that LogstreamResult.Flush can go backward?
Not sure we can accept the situation.

I can't say anything about the origin of the issue, but it can be easily reproduced
on the master branch:

1. Add an assert in PhysicalConfirmReceivedLocation (apply the attached patch)
2. Compile & install with tap tests and assertions enabled
3. cd src/bin/pg_basebackup/
4. PROVE_TESTS=t/020_pg_receivewal.pl gmake check
Thanks for the steps, I was able to reproduce the issue with the
suggested steps.
The test will fail because of the assertion. I plan to investigate the issue,
but I need some more time for it. Since it happens on the original master
branch, I think this problem already exists. The proposed patch does not seem
to be at fault.
This issue occurs even prior to this commit; I was able to reproduce
it on a version just before it. I’ll also look into analyzing the root
cause further.
It may be the same problem as discussed in:
/messages/by-id/CALDaNm2uQbhEVJzvnja6rw7Q9AYu9FpVmET=TbwLjV3DcPRhLw@mail.gmail.com
This issue was related to confirmed_flush and was addressed in commit
d1ffcc7fa3c54de8b2a677a3e503fc808c7b419c. It is not related to
restart_lsn.
Regards,
Vignesh
On Wed, Jun 18, 2025 at 6:50 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
I think it is a good idea. Since we do not use the generated data, it is OK
just to generate WAL segments using the proposed function. I've tested this
function. The tests worked as expected with and without the fix. The attached
patch does the change.

Sorry, forgot to attach the patch. It is created on the current master branch.
It may conflict with your corrections. I hope it can be useful.
Thank you. I've integrated this into a patch to improve these tests.
Regarding the assertion failure, I've found that the assert in
PhysicalConfirmReceivedLocation() conflicts with the restart_lsn
previously set by ReplicationSlotReserveWal(). As I can see,
ReplicationSlotReserveWal() just picks the fresh XLogCtl->RedoRecPtr LSN.
So, it doesn't seem there is a guarantee that restart_lsn never goes
backward. The comment in ReplicationSlotReserveWal() even states there
is a "chance that we have to retry". Thus, I propose to remove the
assertion introduced by ca307d5cec90.

Any objections to backpatching 0001 through 17 and pushing 0002 to the head?
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v4-0002-Remove-excess-assert-from-InvalidatePossiblyObsol.patch (application/octet-stream)
From a805b5fc6320069b33c3f60037f32f3e679db5cd Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 18 Jun 2025 19:34:00 +0300
Subject: [PATCH v4 2/2] Remove excess assert from
InvalidatePossiblyObsoleteSlot()
ca307d5cec90 introduced keeping WAL segments by slot's last saved restart
LSN. It also added an assertion that the slot's restart LSN never goes
backward. As stated in the ReplicationSlotReserveWal() comment, this is not
always true. Additionally, this issue has been spotted by some buildfarm
members.
Vitaly Davydov <v.davydov@postgrespro.ru> proposed the fix idea.
Reported-by: Vignesh C <vignesh21@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALDaNm3s-jpQTe1MshsvQ8GO%3DTLj233JCdkQ7uZ6pwqRVpxAdw%40mail.gmail.com
---
src/backend/replication/slot.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index c64f020742f..c11e588d632 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1810,8 +1810,6 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
*/
SpinLockAcquire(&s->mutex);
- Assert(s->data.restart_lsn >= s->last_saved_restart_lsn);
-
restart_lsn = s->data.restart_lsn;
/* we do nothing if the slot is already invalid */
--
2.39.5 (Apple Git-154)
v4-0001-Improve-runtime-and-output-of-tests-for-replicati.patch (application/octet-stream)
From 0b28994d5780bb6ca4b381de9dd8b96ea5dda3c0 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 18 Jun 2025 19:32:05 +0300
Subject: [PATCH v4 1/2] Improve runtime and output of tests for replication
slots checkpointing.
The TAP tests that verify logical and physical replication slot behavior
during checkpoints (046_checkpoint_logical_slot.pl and
047_checkpoint_physical_slot.pl) inserted two batches of 2 million rows each,
generating approximately 520 MB of WAL. On slow machines, or when compiled
with '-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE', this caused the
tests to run for 8-9 minutes and occasionally time out, as seen on the
buildfarm animal prion.
This commit modifies the mentioned tests to utilize the $node->advance_wal()
function, thereby reducing runtime. Since the generated data is not used,
the proposed function is a good alternative that cuts the total wall-clock
run time.
While here, remove superfluous '\n' characters from several note() calls;
these appeared literally in the build-farm logs and looked odd. Also, remove
excessive 'shared_preload_libraries' GUC from the config and add a check for
'injection_points' extension availability.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/fbc5d94e-6fbd-4a64-85d4-c9e284a58eb2%40gmail.com
Backpatch-through: 17
---
.../recovery/t/046_checkpoint_logical_slot.pl | 31 +++++++++----------
.../t/047_checkpoint_physical_slot.pl | 23 +++++++-------
2 files changed, 25 insertions(+), 29 deletions(-)
diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
index b4265c4a6a5..d67c5108d78 100644
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -21,15 +21,18 @@ my ($node, $result);
$node = PostgreSQL::Test::Cluster->new('mike');
$node->init;
-$node->append_conf('postgresql.conf',
- "shared_preload_libraries = 'injection_points'");
$node->append_conf('postgresql.conf', "wal_level = 'logical'");
$node->start;
-$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
-# Create a simple table to generate data into.
-$node->safe_psql('postgres',
- q{create table t (id serial primary key, b text)});
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$node->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
# Create the two slots we'll need.
$node->safe_psql('postgres',
@@ -58,23 +61,17 @@ SELECT 1 \watch 0.1
\q
));
-# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
-$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
-);
+$node->advance_wal(20);
# Run another checkpoint to set a new restore LSN.
$node->safe_psql('postgres', q{checkpoint});
-# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
-$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
-);
+$node->advance_wal(20);
# Run another checkpoint, this time in the background, and make it wait
# on the injection point) so that the checkpoint stops right before
# removing old WAL segments.
-note('starting checkpoint\n');
+note('starting checkpoint');
my $checkpoint = $node->background_psql('postgres');
$checkpoint->query_safe(
@@ -88,7 +85,7 @@ checkpoint;
));
# Wait until the checkpoint stops right before removing WAL segments.
-note('waiting for injection_point\n');
+note('waiting for injection_point');
$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
note('injection_point is reached');
@@ -107,7 +104,7 @@ select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \wa
));
# Wait until the slot's restart_lsn points to the next WAL segment.
-note('waiting for injection_point\n');
+note('waiting for injection_point');
$node->wait_for_event('client backend',
'logical-replication-slot-advance-segment');
note('injection_point is reached');
diff --git a/src/test/recovery/t/047_checkpoint_physical_slot.pl b/src/test/recovery/t/047_checkpoint_physical_slot.pl
index 454e56b9bd2..a1332b5d44c 100644
--- a/src/test/recovery/t/047_checkpoint_physical_slot.pl
+++ b/src/test/recovery/t/047_checkpoint_physical_slot.pl
@@ -21,15 +21,18 @@ my ($node, $result);
$node = PostgreSQL::Test::Cluster->new('mike');
$node->init;
-$node->append_conf('postgresql.conf',
- "shared_preload_libraries = 'injection_points'");
$node->append_conf('postgresql.conf', "wal_level = 'replica'");
$node->start;
-$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
-# Create a simple table to generate data into.
-$node->safe_psql('postgres',
- q{create table t (id serial primary key, b text)});
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$node->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
# Create a physical replication slot.
$node->safe_psql('postgres',
@@ -44,9 +47,7 @@ $node->safe_psql('postgres',
$node->safe_psql('postgres', q{checkpoint});
# Insert 2M rows; that's about 260MB (~20 segments) worth of WAL.
-$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
-);
+$node->advance_wal(20);
# Advance slot to the current position, just to have everything "valid".
$node->safe_psql('postgres',
@@ -57,9 +58,7 @@ $node->safe_psql('postgres',
$node->safe_psql('postgres', q{checkpoint});
# Another 2M rows; that's about 260MB (~20 segments) worth of WAL.
-$node->safe_psql('postgres',
- q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
-);
+$node->advance_wal(20);
my $restart_lsn_init = $node->safe_psql('postgres',
q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
--
2.39.5 (Apple Git-154)
On Wed, Jun 18, 2025 at 10:17 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Wed, Jun 18, 2025 at 6:50 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
I think, it is a good idea. Once we do not use the generated data, it is ok
just to generate WAL segments using the proposed function. I've tested this
function. The tests worked as expected with and without the fix. The attached
patch does the change.Sorry, forgot to attach the patch. It is created on the current master branch.
It may conflict with your corrections. I hope, it could be useful.Thank you. I've integrated this into a patch to improve these tests.
Regarding assertion failure, I've found that assert in
PhysicalConfirmReceivedLocation() conflicts with restart_lsn
previously set by ReplicationSlotReserveWal(). As I can see,
ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn.
So, it doesn't seems there is a guarantee that restart_lsn never goes
backward. The commit in ReplicationSlotReserveWal() even states there
is a "chance that we have to retry".
I don't see how this theory can lead to a restart_lsn of a slot going
backwards. The retry mentioned there is just a retry to reserve the
slot's position again if the required WAL is already removed. Such a
retry can only get the position later than the previous restart_lsn.
Thus, I propose to remove the
assertion introduced by ca307d5cec90.
If what I said above is correct, then the following part of the commit
message will be incorrect:
"As stated in the ReplicationSlotReserveWal() comment, this is not
always true. Additionally, this issue has been spotted by some
buildfarm
members."
--
With Regards,
Amit Kapila.
Dear Amit, Alexander,
Regarding assertion failure, I've found that assert in
PhysicalConfirmReceivedLocation() conflicts with restart_lsn
previously set by ReplicationSlotReserveWal(). As I can see,
ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn.
So, it doesn't seems there is a guarantee that restart_lsn never goes
backward. The commit in ReplicationSlotReserveWal() even states there
is a "chance that we have to retry".I don't see how this theory can lead to a restart_lsn of a slot going
backwards. The retry mentioned there is just a retry to reserve the
slot's position again if the required WAL is already removed. Such a
retry can only get the position later than the previous restart_lsn.
We analyzed the assertion failure that happened in pg_basebackup/020_pg_receivewal
and confirmed that restart_lsn can go backward. This means the Assert() added
by ca307d5 can cause a crash.
Background
===========
When pg_receivewal starts replication using a replication slot, it uses the
beginning of the segment containing restart_lsn as the startpoint.
E.g., if the slot's restart_lsn is 0/B000D0, pg_receivewal requests WAL
from 0/B00000.
For more detail on this behavior, see f61e1dd2 and d9bae531.
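To make the rounding concrete, here is a minimal standalone sketch. It assumes
1 MB WAL segments so that the numbers match the example above; the names are
illustrative and this is not the actual pg_receivewal code.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

int
main(void)
{
	uint64_t	wal_segment_size = 1024 * 1024;	/* assumed 1 MB segments */
	XLogRecPtr	restart_lsn = 0xB000D0;			/* 0/B000D0 from the example */

	/* round down to the start of the segment that contains restart_lsn */
	XLogRecPtr	startpoint = restart_lsn - (restart_lsn % wal_segment_size);

	/* prints "request streaming from 0/B00000" */
	printf("request streaming from %X/%X\n",
		   (unsigned) (startpoint >> 32), (unsigned) startpoint);
	return 0;
}

With the default 16 MB segments the same arithmetic applies; only the segment
boundary changes.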
What happened here
==================
Based on the above, the walsender started sending from the beginning of the
segment (0/B00000). When the walreceiver received data, it tried to send a
reply; at that point the flushed WAL location was 0/B00000. The walsender set
the received LSN as restart_lsn in PhysicalConfirmReceivedLocation(), so the
restart_lsn went backward (0/B000D0 -> 0/B00000).
The assertion failure could then happen if a CHECKPOINT occurred at that time:
the slot's last_saved_restart_lsn was 0/B000D0, but data.restart_lsn was
0/B00000, which could not satisfy the assertion added in
InvalidatePossiblyObsoleteSlot().
Note
====
1.
In this case, starting from the beginning of the segment is not a problem,
because the checkpoint process only removes WAL files from segments that
precede the restart_lsn's WAL segment. The current segment (0/B00000) will not
be removed, so there is no risk of data loss or inconsistency (see the sketch
after these notes).
2.
A similar pattern applies to pg_basebackup. Both use logic that adjusts the
requested streaming position to the start of the segment, and the client
reports the received LSN as flushed.
3.
I considered the theory above, but I could not reproduce
040_standby_failover_slots_sync because it is a timing issue. Has anyone else
reproduced it?
We are still investigating the failure in 040_standby_failover_slots_sync.
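As a rough illustration of the cutoff described in note 1 (again assuming
1 MB segments; the names are illustrative, not the checkpointer's actual
code):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

int
main(void)
{
	uint64_t	wal_segment_size = 1024 * 1024;
	XLogRecPtr	restart_lsn = 0xB00000;	/* after it moved back to the segment start */
	XLogSegNo	cutoff_segno = restart_lsn / wal_segment_size;	/* segment 0xB */

	/* only segments strictly older than the cutoff may be removed */
	for (XLogSegNo segno = 0x9; segno <= 0xC; segno++)
		printf("segment %llX removable: %s\n",
			   (unsigned long long) segno,
			   segno < cutoff_segno ? "yes" : "no");
	return 0;
}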
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Thu, Jun 19, 2025 at 1:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't see how this theory can lead to a restart_lsn of a slot going
backwards. The retry mentioned there is just a retry to reserve the
slot's position again if the required WAL is already removed. Such a
retry can only get the position later than the previous restart_lsn.
Yes, if retry is needed, then the new position must be later for sure.
What I mean is that ReplicationSlotReserveWal() can reserve something
later than what standby is going to read (and correspondingly report
with PhysicalConfirmReceivedLocation()).
Thus, I propose to remove the assertion introduced by ca307d5cec90.
If what I said above is correct, then the following part of the commit
message will be incorrect:
"As stated in the ReplicationSlotReserveWal() comment, this is not
always true. Additionally, this issue has been spotted by some
buildfarm
members."
I agree, this comment needs improvement in terms of clarity.
Meanwhile I've pushed the patch for TAP tests, which I think didn't
get any objections.
------
Regards,
Alexander Korotkov
Supabase
Dear Kuroda-san,
On Thu, Jun 19, 2025 at 2:05 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
We analyzed the assertion failure that happened in pg_basebackup/020_pg_receivewal
and confirmed that restart_lsn can go backward there (0/B000D0 -> 0/B00000 when
streaming restarts from the beginning of the segment), which means the Assert()
added by ca307d5 can cause a crash.
Thank you for your detailed explanation!
I considered the theory above, but I could not reproduce
040_standby_failover_slots_sync because it is a timing issue. Has anyone else
reproduced it? We are still investigating the failure in
040_standby_failover_slots_sync.
I didn't manage to reproduce this. But as I can see from the logs [1] on
mamba, the START_REPLICATION command was issued just before the assert trap.
Could it be something similar to what I described in [2]?
Namely:
1. ReplicationSlotReserveWal() sets restart_lsn for the slot.
2. Concurrent checkpoint flushes that restart_lsn to the disk.
3. PhysicalConfirmReceivedLocation() sets restart_lsn of the slot to
the beginning of the segment.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=mamba&dt=2025-06-17%2005%3A10%3A36&stg=recovery-check
[2]: /messages/by-id/CAPpHfdv3UEUBjsLhB_CwJT0xX9LmN6U+__myYopq4KcgvCSbTg@mail.gmail.com
------
Regards,
Alexander Korotkov
Supabase
On Fri, Jun 20, 2025 at 5:48 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
If what I said above is correct, then the following part of the commit
message will be incorrect:
"As stated in the ReplicationSlotReserveWal() comment, this is not
always true. Additionally, this issue has been spotted by some
buildfarm
members."I agree, this comment needs improvement in terms of clarity.
Meanwhile I've pushed the patch for TAP tests, which I think didn't
get any objections.
Sounds reasonable. As per the analysis so far, it seems removal of the new
assert is correct; we just need to figure out, for all failure cases, why the
physical slot's restart_lsn goes backward, and then add a comment somewhere to
ensure that we don't repeat a similar mistake in the future.
--
With Regards,
Amit Kapila.
On Fri, 20 Jun 2025 at 05:54, Alexander Korotkov <aekorotkov@gmail.com> wrote:
I didn't manage to reproduce this. But as I can see from the logs [1] on
mamba, the START_REPLICATION command was issued just before the assert trap.
Could it be something similar to what I described in [2]?
Namely:
1. ReplicationSlotReserveWal() sets restart_lsn for the slot.
2. Concurrent checkpoint flushes that restart_lsn to the disk.
3. PhysicalConfirmReceivedLocation() sets restart_lsn of the slot to
the beginning of the segment.
Here is my analysis of the 040_standby_failover_slots_sync test failure,
wherein the physical replication slot can point to a backward LSN:
In certain rare cases the restart LSN can go backwards. This scenario can be
reproduced by performing checkpoints continuously and slowing down WAL apply.
I have a patch with changes to handle this.
Here's a summary of the sequence of events:
1) Standby confirms a new LSN (0/40369C8) when the primary sends some WAL contents:
After the standby writes the received WAL contents in XLogWalRcvWrite, it
sends this LSN 0/40369C8 as the confirmed flush location. The primary
acknowledges this and updates the replication slot's restart_lsn accordingly:
2025-06-20 14:33:21.777 IST [134998] standby1 LOG:
PhysicalConfirmReceivedLocation replication slot "sb1_slot" set
restart_lsn to 0/40369C8
2025-06-20 14:33:21.777 IST [134998] standby1 STATEMENT:
START_REPLICATION SLOT "sb1_slot" 0/3000000 TIMELINE 1
2) Checkpoint persists the new restart_lsn:
Shortly after receiving the new LSN, a checkpoint occurs which saves this
restart_lsn:
2025-06-20 14:33:21.780 IST [134913] LOG: checkpoint complete: wrote
0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0
removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.002 s; sync
files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0
kB; lsn=0/4036A20, redo lsn=0/40369C8
3) Streaming replication is restarted because of a primary_conninfo change
and reload:
The WAL receiver process is restarted:
2025-06-20 14:33:21.779 IST [134997] FATAL: terminating walreceiver
process due to administrator command
4) Standby sends an older flush pointer after restart:
Upon restart, the WAL receiver sends a flush location (0/401D448) derived from
XLogRecoveryCtl->lastReplayedEndRecPtr, which is older than the previously
confirmed restart_lsn. It is important to note that we are sending
lastReplayedEndRecPtr, which is the last successfully replayed LSN in this
case:
2025-06-20 14:33:21.796 IST [135135] LOG: WalReceiverMain
LogstreamResult.Flush initialized to 0/401D448
2025-06-20 14:33:21.796 IST [135135] LOG: sending write 0/401D448
flush 0/401D448 apply 0/401D448
This is done from here:
....
/* Initialize LogstreamResult and buffers for processing messages */
LogstreamResult.Write = LogstreamResult.Flush = GetXLogReplayRecPtr(NULL);
initStringInfo(&reply_message);
/* Initialize nap wakeup times. */
now = GetCurrentTimestamp();
for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
WalRcvComputeNextWakeup(i, now);
/* Send initial reply/feedback messages. */
XLogWalRcvSendReply(true, false);
...
In step 1 we were sending the LSN of the WAL that had been written; here,
since we have slowed down WAL apply, the replay location is lower, and it is
the replay location that is sent.
5) I have added logs to detect this inconsistency:
This leads to a scenario where the standby tries to confirm a
restart_lsn older than the one currently held by the primary:
2025-06-20 14:33:21.797 IST [135136] standby1 LOG: crash scenario -
slot sb1_slot, cannot confirm a restart LSN (0/401D448) that is older
than the current one (0/40369C8)
If a checkpoint happens concurrently during this condition, it may
trigger an assertion failure on the primary due to the restart_lsn
being less than the last_saved_restart_lsn.
Currently this does not break physical replication, but I'm not sure how it
will behave if the gap grows to many WAL files and those WAL files get
deleted. (A small standalone sketch below models the backward step.)
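To make the backward step easy to see, here is a tiny standalone model of the
sequence above. The values are taken from the logs; this is only a model, not
the real walreceiver code.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

int
main(void)
{
	XLogRecPtr	confirmed_flush = 0x40369C8;	/* step 1: primary set restart_lsn here */
	XLogRecPtr	last_replayed = 0x401D448;		/* apply is slow, so replay lags the flush */

	/* step 4: the restarted walreceiver initializes Flush from the replay pointer */
	XLogRecPtr	reported_flush = last_replayed;

	if (reported_flush < confirmed_flush)
		printf("flush %X/%X reported after restart is older than restart_lsn %X/%X\n",
			   (unsigned) (reported_flush >> 32), (unsigned) reported_flush,
			   (unsigned) (confirmed_flush >> 32), (unsigned) confirmed_flush);
	return 0;
}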
I have attached the patch changes with which you can reproduce this; grep for
"crash scenario" in the logs. For me it occurs on every run. The reproduced
logs are attached as well.
This proves that restart_lsn can go backward when the standby is applying WAL
slowly. But since this is not caused by the patch in this thread, I feel you
can commit the assert removal patch. I will continue the analysis further to
see whether there is any impact, and we can add comments accordingly later.
Regards,
Vignesh
Attachments:
restart_lsn_backup_repro_v1.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47ffc0a2307..1eea132bfcc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7160,6 +7160,7 @@ CreateCheckPoint(int flags)
if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
CHECKPOINT_FORCE)) == 0)
{
+ #if 0
if (last_important_lsn == ControlFile->checkPoint)
{
END_CRIT_SECTION();
@@ -7167,6 +7168,7 @@ CreateCheckPoint(int flags)
(errmsg_internal("checkpoint skipped because system is idle")));
return false;
}
+ #endif
}
/*
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 6ce979f2d8b..c3804939780 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -2009,7 +2009,7 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord *record, TimeLineID *repl
/* Pop the error context stack */
error_context_stack = errcallback.previous;
-
+ pg_usleep(1000L);
/*
* Update lastReplayedEndRecPtr after this record has been successfully
* replayed.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..885d17ebde7 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -345,7 +345,7 @@ CheckpointerMain(const void *startup_data, size_t startup_data_len)
*/
for (;;)
{
- bool do_checkpoint = false;
+ bool do_checkpoint = true;
int flags = 0;
pg_time_t now;
int elapsed_secs;
@@ -573,11 +573,14 @@ CheckpointerMain(const void *startup_data, size_t startup_data_len)
continue; /* no sleep for us ... */
cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
}
+ pg_usleep(1000L);
+ #if 0
(void) WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
cur_timeout * 1000L /* convert to ms */ ,
WAIT_EVENT_CHECKPOINTER_MAIN);
+ #endif
}
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index c64f020742f..c1f1241309d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1810,7 +1810,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
*/
SpinLockAcquire(&s->mutex);
- Assert(s->data.restart_lsn >= s->last_saved_restart_lsn);
+ //Assert(s->data.restart_lsn >= s->last_saved_restart_lsn);
restart_lsn = s->data.restart_lsn;
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 8c4d0fd9aed..d705cba5ae5 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -396,6 +396,9 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
/* Initialize LogstreamResult and buffers for processing messages */
LogstreamResult.Write = LogstreamResult.Flush = GetXLogReplayRecPtr(NULL);
+ elog(LOG, "WalReceiverMain LogstreamResult.Flush initialized to %X/%X",
+ LSN_FORMAT_ARGS(LogstreamResult.Flush));
+
initStringInfo(&reply_message);
/* Initialize nap wakeup times. */
@@ -960,6 +963,8 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, TimeLineID tli)
buf += byteswritten;
LogstreamResult.Write = recptr;
+ elog(LOG, "XLogWalRcvFlush LogstreamResult.Write set to %X/%X",
+ LSN_FORMAT_ARGS(LogstreamResult.Write));
}
/* Update shared-memory status */
@@ -994,6 +999,9 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
LogstreamResult.Flush = LogstreamResult.Write;
+ elog(LOG, "XLogWalRcvFlush LogstreamResult.Flush initialized to %X/%X",
+ LSN_FORMAT_ARGS(LogstreamResult.Flush));
+
/* Update shared-memory status */
SpinLockAcquire(&walrcv->mutex);
if (walrcv->flushedUpto < LogstreamResult.Flush)
@@ -1138,7 +1146,7 @@ XLogWalRcvSendReply(bool force, bool requestReply)
pq_sendbyte(&reply_message, requestReply ? 1 : 0);
/* Send it */
- elog(DEBUG2, "sending write %X/%X flush %X/%X apply %X/%X%s",
+ elog(LOG, "sending write %X/%X flush %X/%X apply %X/%X%s",
LSN_FORMAT_ARGS(writePtr),
LSN_FORMAT_ARGS(flushPtr),
LSN_FORMAT_ARGS(applyPtr),
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index f2c33250e8b..3192581cb4c 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2377,7 +2377,14 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
if (slot->data.restart_lsn != lsn)
{
changed = true;
+
+ if (lsn < slot->data.restart_lsn)
+ elog(LOG, "crash scenario - slot %s, cannot confirm a restart LSN (%X/%X) that is older than the current one (%X/%X)",
+ NameStr(slot->data.name), LSN_FORMAT_ARGS(lsn), LSN_FORMAT_ARGS(slot->data.restart_lsn));
+
slot->data.restart_lsn = lsn;
+ elog(LOG, "PhysicalConfirmReceivedLocation replication slot \"%s\" set restart_lsn to %X/%X",
+ NameStr(slot->data.name), LSN_FORMAT_ARGS(slot->data.restart_lsn));
}
SpinLockRelease(&slot->mutex);
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 9c8b49e942d..a23d0561a2f 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -401,570 +401,4 @@ $primary->safe_psql('postgres',
# the failover slots.
$primary->wait_for_replay_catchup($standby1);
-$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
-
-# Two xl_running_xacts logs are generated here. When decoding the first log, it
-# only serializes the snapshot, without advancing the restart_lsn to the latest
-# position. This is because if a transaction is running, the restart_lsn can
-# only move to a position before that transaction. Hence, the second
-# xl_running_xacts log is needed, the decoding for which allows the restart_lsn
-# to advance to the last serialized snapshot's position (the first log).
-$primary->safe_psql(
- 'postgres', qq(
- BEGIN;
- SELECT txid_current();
- SELECT pg_log_standby_snapshot();
- COMMIT;
- BEGIN;
- SELECT txid_current();
- SELECT pg_log_standby_snapshot();
- COMMIT;
-));
-
-# Advance the restart_lsn to the position of the first xl_running_xacts log
-# generated above. Note that there might be concurrent xl_running_xacts logs
-# written by the bgwriter, which could cause the position to be advanced to an
-# unexpected point, but that would be a rare scenario and doesn't affect the
-# test results.
-$primary->safe_psql('postgres',
- "SELECT pg_replication_slot_advance('snap_test_slot', pg_current_wal_lsn());"
-);
-
-# Wait for the standby to catch up so that the standby is not lagging behind
-# the failover slots.
-$primary->wait_for_replay_catchup($standby1);
-
-# Log a message that will be consumed on the standby after promotion using the
-# synced slot. See the test where we promote standby (Promote the standby1 to
-# primary.)
-$primary->safe_psql('postgres',
- "SELECT pg_logical_emit_message(false, 'test', 'test');");
-
-# Get the confirmed_flush_lsn for the logical slot snap_test_slot on the primary
-my $confirmed_flush_lsn = $primary->safe_psql('postgres',
- "SELECT confirmed_flush_lsn from pg_replication_slots WHERE slot_name = 'snap_test_slot';"
-);
-
-$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
-
-# Verify that confirmed_flush_lsn of snap_test_slot slot is synced to the standby
-ok( $standby1->poll_query_until(
- 'postgres',
- "SELECT '$confirmed_flush_lsn' = confirmed_flush_lsn from pg_replication_slots WHERE slot_name = 'snap_test_slot' AND synced AND NOT temporary;"
- ),
- 'confirmed_flush_lsn of slot snap_test_slot synced to standby');
-
-##################################################
-# Test to confirm that the slot synchronization is protected from malicious
-# users.
-##################################################
-
-$primary->psql('postgres', "CREATE DATABASE slotsync_test_db");
-$primary->wait_for_replay_catchup($standby1);
-
-$standby1->stop;
-
-# On the primary server, create '=' operator in another schema mapped to
-# inequality function and redirect the queries to use new operator by setting
-# search_path. The new '=' operator is created with leftarg as 'bigint' and
-# right arg as 'int' to redirect 'count(*) = 1' in slot sync's query to use
-# new '=' operator.
-$primary->safe_psql(
- 'slotsync_test_db', q{
-
-CREATE ROLE repl_role REPLICATION LOGIN;
-CREATE SCHEMA myschema;
-
-CREATE FUNCTION myschema.myintne(bigint, int) RETURNS bool as $$
- BEGIN
- RETURN $1 <> $2;
- END;
- $$ LANGUAGE plpgsql immutable;
-
-CREATE OPERATOR myschema.= (
- leftarg = bigint,
- rightarg = int,
- procedure = myschema.myintne);
-
-ALTER DATABASE slotsync_test_db SET SEARCH_PATH TO myschema,pg_catalog;
-GRANT USAGE on SCHEMA myschema TO repl_role;
-});
-
-# Start the standby with changed primary_conninfo.
-$standby1->append_conf('postgresql.conf',
- "primary_conninfo = '$connstr_1 dbname=slotsync_test_db user=repl_role'");
-$standby1->start;
-
-# Run the synchronization function. If the sync flow was not prepared
-# to handle such attacks, it would have failed during the validation
-# of the primary_slot_name itself resulting in
-# ERROR: slot synchronization requires valid primary_slot_name
-$standby1->safe_psql('slotsync_test_db',
- "SELECT pg_sync_replication_slots();");
-
-# Reset the dbname and user in primary_conninfo to the earlier values.
-$standby1->append_conf('postgresql.conf',
- "primary_conninfo = '$connstr_1 dbname=postgres'");
-$standby1->reload;
-
-# Drop the newly created database.
-$primary->psql('postgres', q{DROP DATABASE slotsync_test_db;});
-
-##################################################
-# Test to confirm that the slot sync worker exits on invalid GUC(s) and
-# get started again on valid GUC(s).
-##################################################
-
-$log_offset = -s $standby1->logfile;
-
-# Enable slot sync worker.
-$standby1->append_conf('postgresql.conf', qq(sync_replication_slots = on));
-$standby1->reload;
-
-# Confirm that the slot sync worker is able to start.
-$standby1->wait_for_log(qr/slot sync worker started/, $log_offset);
-
-$log_offset = -s $standby1->logfile;
-
-# Disable another GUC required for slot sync.
-$standby1->append_conf('postgresql.conf', qq(hot_standby_feedback = off));
-$standby1->reload;
-
-# Confirm that slot sync worker acknowledge the GUC change and logs the msg
-# about wrong configuration.
-$standby1->wait_for_log(
- qr/slot synchronization worker will restart because of a parameter change/,
- $log_offset);
-$standby1->wait_for_log(
- qr/slot synchronization requires "hot_standby_feedback" to be enabled/,
- $log_offset);
-
-$log_offset = -s $standby1->logfile;
-
-# Re-enable the required GUC
-$standby1->append_conf('postgresql.conf', "hot_standby_feedback = on");
-$standby1->reload;
-
-# Confirm that the slot sync worker is able to start now.
-$standby1->wait_for_log(qr/slot sync worker started/, $log_offset);
-
-##################################################
-# Test to confirm that confirmed_flush_lsn of the logical slot on the primary
-# is synced to the standby via the slot sync worker.
-##################################################
-
-# Insert data on the primary
-$primary->safe_psql(
- 'postgres', qq[
- CREATE TABLE tab_int (a int PRIMARY KEY);
- INSERT INTO tab_int SELECT generate_series(1, 10);
-]);
-
-# Subscribe to the new table data and wait for it to arrive
-$subscriber1->safe_psql(
- 'postgres', qq[
- CREATE TABLE tab_int (a int PRIMARY KEY);
- CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, create_slot = false);
-]);
-
-$subscriber1->wait_for_subscription_sync;
-
-# Do not allow any further advancement of the confirmed_flush_lsn for the
-# lsub1_slot.
-$subscriber1->safe_psql('postgres',
- "ALTER SUBSCRIPTION regress_mysub1 DISABLE");
-
-# Wait for the replication slot to become inactive on the publisher
-$primary->poll_query_until(
- 'postgres',
- "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'lsub1_slot' AND active='f'",
- 1);
-
-# Get the confirmed_flush_lsn for the logical slot lsub1_slot on the primary
-my $primary_flush_lsn = $primary->safe_psql('postgres',
- "SELECT confirmed_flush_lsn from pg_replication_slots WHERE slot_name = 'lsub1_slot';"
-);
-
-# Confirm that confirmed_flush_lsn of lsub1_slot slot is synced to the standby
-ok( $standby1->poll_query_until(
- 'postgres',
- "SELECT '$primary_flush_lsn' = confirmed_flush_lsn from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;"
- ),
- 'confirmed_flush_lsn of slot lsub1_slot synced to standby');
-
-##################################################
-# Test that logical failover replication slots wait for the specified
-# physical replication slots to receive the changes first. It uses the
-# following set up:
-#
-# (physical standbys)
-# | ----> standby1 (primary_slot_name = sb1_slot)
-# | ----> standby2 (primary_slot_name = sb2_slot)
-# primary ----- |
-# (logical replication)
-# | ----> subscriber1 (failover = true, slot_name = lsub1_slot)
-# | ----> subscriber2 (failover = false, slot_name = lsub2_slot)
-#
-# synchronized_standby_slots = 'sb1_slot'
-#
-# The setup is configured in such a way that the logical slot of subscriber1 is
-# enabled for failover, and thus the subscriber1 will wait for the physical
-# slot of standby1(sb1_slot) to catch up before receiving the decoded changes.
-##################################################
-
-$backup_name = 'backup3';
-
-$primary->psql('postgres',
- q{SELECT pg_create_physical_replication_slot('sb2_slot');});
-
-$primary->backup($backup_name);
-
-# Create another standby
-my $standby2 = PostgreSQL::Test::Cluster->new('standby2');
-$standby2->init_from_backup(
- $primary, $backup_name,
- has_streaming => 1,
- has_restoring => 1);
-$standby2->append_conf(
- 'postgresql.conf', qq(
-primary_slot_name = 'sb2_slot'
-));
-$standby2->start;
-$primary->wait_for_replay_catchup($standby2);
-
-# Configure primary to disallow any logical slots that have enabled failover
-# from getting ahead of the specified physical replication slot (sb1_slot).
-$primary->append_conf(
- 'postgresql.conf', qq(
-synchronized_standby_slots = 'sb1_slot'
-));
-$primary->reload;
-
-# Create another subscriber node without enabling failover, wait for sync to
-# complete
-my $subscriber2 = PostgreSQL::Test::Cluster->new('subscriber2');
-$subscriber2->init;
-$subscriber2->start;
-$subscriber2->safe_psql(
- 'postgres', qq[
- CREATE TABLE tab_int (a int PRIMARY KEY);
- CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub2_slot);
-]);
-
-$subscriber2->wait_for_subscription_sync;
-
-$subscriber1->safe_psql('postgres',
- "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
-
-my $offset = -s $primary->logfile;
-
-# Stop the standby associated with the specified physical replication slot
-# (sb1_slot) so that the logical replication slot (lsub1_slot) won't receive
-# changes until the standby comes up.
-$standby1->stop;
-
-# Create some data on the primary
-my $primary_row_count = 20;
-$primary->safe_psql('postgres',
- "INSERT INTO tab_int SELECT generate_series(11, $primary_row_count);");
-
-# Wait until the standby2 that's still running gets the data from the primary
-$primary->wait_for_replay_catchup($standby2);
-$result = $standby2->safe_psql('postgres',
- "SELECT count(*) = $primary_row_count FROM tab_int;");
-is($result, 't', "standby2 gets data from primary");
-
-# Wait for regress_mysub2 to get the data from the primary. This subscription
-# was not enabled for failover so it gets the data without waiting for any
-# standbys.
-$primary->wait_for_catchup('regress_mysub2');
-$result = $subscriber2->safe_psql('postgres',
- "SELECT count(*) = $primary_row_count FROM tab_int;");
-is($result, 't', "subscriber2 gets data from primary");
-
-# Wait until the primary server logs a warning indicating that it is waiting
-# for the sb1_slot to catch up.
-$primary->wait_for_log(
- qr/replication slot \"sb1_slot\" specified in parameter "synchronized_standby_slots" does not have active_pid/,
- $offset);
-
-# The regress_mysub1 was enabled for failover so it doesn't get the data from
-# primary and keeps waiting for the standby specified in synchronized_standby_slots
-# (sb1_slot aka standby1).
-$result =
- $subscriber1->safe_psql('postgres',
- "SELECT count(*) <> $primary_row_count FROM tab_int;");
-is($result, 't',
- "subscriber1 doesn't get data from primary until standby1 acknowledges changes"
-);
-
-# Start the standby specified in synchronized_standby_slots (sb1_slot aka standby1) and
-# wait for it to catch up with the primary.
-$standby1->start;
-$primary->wait_for_replay_catchup($standby1);
-$result = $standby1->safe_psql('postgres',
- "SELECT count(*) = $primary_row_count FROM tab_int;");
-is($result, 't', "standby1 gets data from primary");
-
-# Now that the standby specified in synchronized_standby_slots is up and running, the
-# primary can send the decoded changes to the subscription enabled for failover
-# (i.e. regress_mysub1). While the standby was down, regress_mysub1 didn't
-# receive any data from the primary. i.e. the primary didn't allow it to go
-# ahead of standby.
-$primary->wait_for_catchup('regress_mysub1');
-$result = $subscriber1->safe_psql('postgres',
- "SELECT count(*) = $primary_row_count FROM tab_int;");
-is($result, 't',
- "subscriber1 gets data from primary after standby1 acknowledges changes");
-
-##################################################
-# Verify that when using pg_logical_slot_get_changes to consume changes from a
-# logical failover slot, it will also wait for the slots specified in
-# synchronized_standby_slots to catch up.
-##################################################
-
-# Stop the standby associated with the specified physical replication slot so
-# that the logical replication slot won't receive changes until the standby
-# slot's restart_lsn is advanced or the slot is removed from the
-# synchronized_standby_slots list.
-$primary->safe_psql('postgres', "TRUNCATE tab_int;");
-$primary->wait_for_catchup('regress_mysub1');
-$standby1->stop;
-
-# Disable the regress_mysub1 to prevent the logical walsender from generating
-# more warnings.
-$subscriber1->safe_psql('postgres',
- "ALTER SUBSCRIPTION regress_mysub1 DISABLE");
-
-# Wait for the replication slot to become inactive on the publisher
-$primary->poll_query_until(
- 'postgres',
- "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'lsub1_slot' AND active = 'f'",
- 1);
-
-# Create a logical 'test_decoding' replication slot with failover enabled
-$primary->safe_psql('postgres',
- "SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, false, true);"
-);
-
-my $back_q = $primary->background_psql(
- 'postgres',
- on_error_stop => 0,
- timeout => $PostgreSQL::Test::Utils::timeout_default);
-
-# pg_logical_slot_get_changes will be blocked until the standby catches up,
-# hence it needs to be executed in a background session.
-$offset = -s $primary->logfile;
-$back_q->query_until(
- qr/logical_slot_get_changes/, q(
- \echo logical_slot_get_changes
- SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
-));
-
-# Wait until the primary server logs a warning indicating that it is waiting
-# for the sb1_slot to catch up.
-$primary->wait_for_log(
- qr/replication slot \"sb1_slot\" specified in parameter "synchronized_standby_slots" does not have active_pid/,
- $offset);
-
-# Remove the standby from the synchronized_standby_slots list and reload the
-# configuration.
-$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
-$primary->reload;
-
-# Since there are no slots in synchronized_standby_slots, the function
-# pg_logical_slot_get_changes should now return, and the session can be
-# stopped.
-$back_q->quit;
-
-$primary->safe_psql('postgres',
- "SELECT pg_drop_replication_slot('test_slot');");
-
-# Add the physical slot (sb1_slot) back to the synchronized_standby_slots for further
-# tests.
-$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots',
- "'sb1_slot'");
-$primary->reload;
-
-# Enable the regress_mysub1 for further tests
-$subscriber1->safe_psql('postgres',
- "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
-
-##################################################
-# Test that logical replication will wait for the user-created inactive
-# physical slot to catch up until we remove the slot from synchronized_standby_slots.
-##################################################
-
-$offset = -s $primary->logfile;
-
-# Create some data on the primary
-$primary_row_count = 10;
-$primary->safe_psql('postgres',
- "INSERT INTO tab_int SELECT generate_series(1, $primary_row_count);");
-
-# Wait until the primary server logs a warning indicating that it is waiting
-# for the sb1_slot to catch up.
-$primary->wait_for_log(
- qr/replication slot \"sb1_slot\" specified in parameter "synchronized_standby_slots" does not have active_pid/,
- $offset);
-
-# The regress_mysub1 doesn't get the data from primary because the specified
-# standby slot (sb1_slot) in synchronized_standby_slots is inactive.
-$result =
- $subscriber1->safe_psql('postgres', "SELECT count(*) = 0 FROM tab_int;");
-is($result, 't',
- "subscriber1 doesn't get data as the sb1_slot doesn't catch up");
-
-# Remove the standby from the synchronized_standby_slots list and reload the
-# configuration.
-$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
-$primary->reload;
-
-# Since there are no slots in synchronized_standby_slots, the primary server should now
-# send the decoded changes to the subscription.
-$primary->wait_for_catchup('regress_mysub1');
-$result = $subscriber1->safe_psql('postgres',
- "SELECT count(*) = $primary_row_count FROM tab_int;");
-is($result, 't',
- "subscriber1 gets data from primary after standby1 is removed from the synchronized_standby_slots list"
-);
-
-# Add the physical slot (sb1_slot) back to the synchronized_standby_slots for further
-# tests.
-$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots',
- "'sb1_slot'");
-$primary->reload;
-
-##################################################
-# Test the synchronization of the two_phase setting for a subscription with the
-# standby. Additionally, prepare a transaction before enabling the two_phase
-# option; subsequent tests will verify if it can be correctly replicated to the
-# subscriber after committing it on the promoted standby.
-##################################################
-
-$standby1->start;
-
-# Prepare a transaction
-$primary->safe_psql(
- 'postgres', qq[
- BEGIN;
- INSERT INTO tab_int values(0);
- PREPARE TRANSACTION 'test_twophase_slotsync';
-]);
-
-$primary->wait_for_replay_catchup($standby1);
-$primary->wait_for_catchup('regress_mysub1');
-
-# Disable the subscription to allow changing the two_phase option.
-$subscriber1->safe_psql('postgres',
- "ALTER SUBSCRIPTION regress_mysub1 DISABLE");
-
-# Wait for the replication slot to become inactive on the publisher
-$primary->poll_query_until(
- 'postgres',
- "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'lsub1_slot' AND active='f'",
- 1);
-
-# Set two_phase to true and enable the subscription
-$subscriber1->safe_psql(
- 'postgres', qq[
- ALTER SUBSCRIPTION regress_mysub1 SET (two_phase = true);
- ALTER SUBSCRIPTION regress_mysub1 ENABLE;
-]);
-
-$primary->wait_for_catchup('regress_mysub1');
-
-my $two_phase_at = $primary->safe_psql('postgres',
- "SELECT two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot';"
-);
-
-# Confirm that two_phase setting of lsub1_slot slot is synced to the standby
-ok( $standby1->poll_query_until(
- 'postgres',
- "SELECT two_phase AND '$two_phase_at' = two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;"
- ),
- 'two_phase setting of slot lsub1_slot synced to standby');
-
-# Confirm that the prepared transaction is not yet replicated to the
-# subscriber.
-$result = $subscriber1->safe_psql('postgres',
- "SELECT count(*) = 0 FROM pg_prepared_xacts;");
-is($result, 't',
- "the prepared transaction is not replicated to the subscriber");
-
-##################################################
-# Promote the standby1 to primary. Confirm that:
-# a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
-# b) logical replication for regress_mysub1 is resumed successfully after failover
-# c) changes from the transaction prepared 'test_twophase_slotsync' can be
-# consumed from the synced slot 'snap_test_slot' once committed on the new
-# primary.
-# d) changes can be consumed from the synced slot 'snap_test_slot'
-##################################################
-$primary->wait_for_replay_catchup($standby1);
-
-# Capture the time before the standby is promoted
-my $promotion_time_on_primary = $standby1->safe_psql(
- 'postgres', qq[
- SELECT current_timestamp;
-]);
-
-$standby1->promote;
-
-# Capture the inactive_since of the synced slot after the promotion.
-# The expectation here is that the slot gets its inactive_since as part of the
-# promotion. We do this check before the slot is enabled on the new primary
-# below, otherwise, the slot gets active setting inactive_since to NULL.
-my $inactive_since_on_new_primary =
- $standby1->validate_slot_inactive_since('lsub1_slot',
- $promotion_time_on_primary);
-
-is( $standby1->safe_psql(
- 'postgres',
- "SELECT '$inactive_since_on_new_primary'::timestamptz > '$inactive_since_on_primary'::timestamptz"
- ),
- "t",
- 'synchronized slot has got its own inactive_since on the new primary after promotion'
-);
-
-# Update subscription with the new primary's connection info
-my $standby1_conninfo = $standby1->connstr . ' dbname=postgres';
-$subscriber1->safe_psql('postgres',
- "ALTER SUBSCRIPTION regress_mysub1 CONNECTION '$standby1_conninfo';");
-
-# Confirm the synced slot 'lsub1_slot' is retained on the new primary
-is( $standby1->safe_psql(
- 'postgres',
- q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
- ),
- 't',
- 'synced slot retained on the new primary');
-
-# Commit the prepared transaction
-$standby1->safe_psql('postgres',
- "COMMIT PREPARED 'test_twophase_slotsync';");
-$standby1->wait_for_catchup('regress_mysub1');
-
-# Confirm that the prepared transaction is replicated to the subscriber
-is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
- "11", 'prepared data replicated from the new primary');
-
-# Insert data on the new primary
-$standby1->safe_psql('postgres',
- "INSERT INTO tab_int SELECT generate_series(11, 20);");
-$standby1->wait_for_catchup('regress_mysub1');
-
-# Confirm that data in tab_int replicated on the subscriber
-is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
- "21", 'data replicated from the new primary');
-
-# Consume the data from the snap_test_slot. The synced slot should reach a
-# consistent point by restoring the snapshot at the restart_lsn serialized
-# during slot synchronization.
-$result = $standby1->safe_psql('postgres',
- "SELECT count(*) FROM pg_logical_slot_get_changes('snap_test_slot', NULL, NULL) WHERE data ~ 'message*';"
-);
-
-is($result, '1', "data can be consumed using snap_test_slot");
-
done_testing();
040_standby_failover_slots_sync_logs.7z (application/x-compressed)
0��o}LYR��)�pDS��lh@W ��Gm��g��k��s����������G��0qq~70��f`d>����Cg���i[���r#����2v��6�]]w��c�\����&�uxb/efv�P�t<�m����@�Zc�A��]�"��=��kULCg�0��8�B����2
;�3�9yh�G$��6����'��M&�\B�����<{h��H���V(�H99���3����;�U?��?������W#V2�k������E���M)����e�<�W=�j�����8�� sQ�������v�'�.��/���A,5���P b%�"K���e�H�w������?%P�dEE��V@�>lU�1��S�������������
N���Tx�|��[���������j�!��.�����;��f��3R����s ����/����)�[�=� ~������P�����l<��U'4�U���7x)�������1�j>�%��'J@��<����o&E�\��Q?/u=���M���� �GS
��
.G����r58��� �C����q��!��3�+�yW�_�Fcx��)�/r~F���3��&"���t�b�������J LeJ����Ha���I��8y�6�JH����G����R�g~���_59�Jr��;la��z�?
����9�5�t,�i ������Z:�A�:�}������%�����|"[���f�d9�w�c��� �#�6��A;;Q�(�U�����&q�(q��*Rj����E��k���Mi��<��Q�sae����E%�J_��iA�0�^^�ov������"�D���~C���k�S�|�d�*����d>��=����{)�� ���[O���\��lD}���P���Yg?�������� K�����S��9�#�/X����Y���W�:�gA�huu��� f�Z�����2*_)y��D�HW���o�����)��2��-��>��`��=�����T�
�B���$^U��p��(}���������������vGl����:5��g�Y fQ\`>�������h�@I��p8Wy��Vv+����TE�<
�r��������{�o
q�����#u�FXD�*��6�R��}/b�U��V)�0[3��������=�u�I���?��Q0E��x$������)�1�=��%x&j�W+@ ����W�Vo�#-��1�rP��v6�q���J�J��c�E�0K���RE��_}��]�Ei�����W=�8j��������Ds�;^�F�8�-�iz��L�Y;�`A~z�9K���,��$p�n���z���l��g%��5nh�>a��v7�3a�?��"�d��aR�V�&#�*�cG}�p|t���O*�x�����.���A��5u}�
��o�?��T��#G�/���k�JQ���"&:�92,g��S��W������"]T�VG����,��2m���������#�����m1�0v������T�Pv�A����t��G�w4N���_���x������c��U��r�Wao��� T[�GJ��p���
���*P����E�L��g`-�-<0
�61��rD`/fl�kL5����.x
����7��1�?��p�i�x����B����iJoS/_'��%'�S����,���zM�� �������2!ZA������!����>+b�`��[����>���e&��vb��+ �!�!�^&�F�RBDV���FS���s ������E��7�1�����N�#\��Ne�U�B!���\j��������H�)�
�Y�f:����_�j��g�����\��� ��7�V�8�
���b�~Qj�j����_h�7�r1H����%���v�/���+t��_��� ��O3��{�����FfP����r:�4S
C��,����*�w�_V�C`�@d��K=���
?v3d�
:��<�w�k�#�D���!17k'L4U�Yp���@2� T���P�w����:-������,?�C�0N~g��A�)�O8U����K�e�������YC�o��LCB�y���{���Z���A���� �q��\%=KR7sc��p!5�Hl����s:,o�����L�V����C���G'�}������x�PZp�Y��Y�nU+#��]�1h`z�ZM��X�"t2�@������>���u��GL=f2������ ���tA��|����Y��e�k(����a��2��1�t�����pQ8�A������-f�.�S��Ii�c��*G����R4��_�}�����]���)O�+y����&YR+���?!�L�*��u��]c�uC�������0
6y]�P�Xo?���
}����#]7�^Ly�H���h�:O��P.B��Km�t]��%�5���B��7��<Lq\_�O��m����4��E��O_��r���z����>#���h�����O�nx��d>����A"~�.�>�9o����{ZM������-����:���@�������uQ7�����<�F1�vr�
�o(�r,-�T/T��6$0^ �����l�^v:l��
�afh1��:G�#���m���^@�k�r&�'k��]�j����\X�P&��gb
��;-� ����o�oGx�6h�B#�8������������M�9
gY}���F
���Ew���E�D�
%a�+���!������������������8��S�O�����?���L��0��2�u�z� oc(�`^G�6���"�\~\1�7���id�KWx�}��zzLHmg��;������a�y�b�r���c��V|��������-��lpPFm7�!�����>��&� /��n%c�M.O���=.{�QR
^h���/6�"���}�+��~���yg����;_���r�L���1i�
:�r����^�u�u�fN���>�\���#��X�}��g.�7����g���4����������#��?�Zd����<�]��,�b )����^�=rt��r������_���`�}*W��S��
K3��d�^7����N��qWS0�}y6e
��0?\�;g��[���ER���NeS�������WsZ��,I_�X����Qs�$H�0�w"�]�U�j�C�G���|�2�������}��h;���*�a�N�;]{�Z�����i�����"o��][��7�s\_B����gw!�uey�|��{
��f�sP��^�u�C��&�^_����������3���mQ��kj�_�i���p�s����l}<� {�X��.w�����F�=k���Nl)4
�B��f�
m[n������n�~���G#(v^��'�����I�.>D��RX~�Rz���� Y�N��=z�SNR�E'�_����!��YS���Nq��f�~�E���)�B�P?p�I�5C��K�Nx�z����Y{��� X;�r�B��1/=D������ZdG�������T+(+�o�c�8T�
�C��vC����;�:��oa�������v���
�=�pv�zt�]��hJ�J����m���(���CJ�l�����v_�ch�����S>_���|7v?b[So,��N�*����������+@���K�q��T�qz��g��)�Rr�(�)�� ���xa����A���}�;���5-^���C[w��J8 �Py)=�E�4'�O��P�C#��Ejfq��s6Xd�Q3���57=� ������|#e��b�G9��%��r������$Xs����>�U�I��b�5TN��=(����1��$�z�ovLo
�������P�;����R�3Qz{�$����|�hJAYjGj&Cb���XL�t�#�����j����$��!t�8P����}!����}K�2H���|��y��+b�����D>��CA"��?�z��C'U��KB�P�4�;�0����xE�`����:G*-�_��f�q�����t�������Of5@���p�F���R766�-��q�c�v�lg��e�fE������iK����$n�����s�Gw+%�!U=�B]>q�$��F�����I�w�m&���w�U���i�~"��������=`����sF�0}����� M�����66[�;�Qe>1t��M�^�m4�����u�$�D��gC��Qxl����Q��c���r�V�/�)��l�9�e���X�v�W4�^�lu��"`.%��Z:Z9��v����c����z%^^,?�Sy�YzW�4���
r(��MG��k��|�@�����y 1��e�q��9]��2���f5<C���O;K6���e���\5����{��H�����[q� 5�?^��~5qq�Ro��O9�D����C����z����Y"J7\�K����c�1C&P^����9�����+xv�N��<&���h�����?�|�67�!�FW��6X����O(j�1�7�|����{�6���Sb1�h�Ou��+n�����������Vq.�K'VM�^>qm8C�T�K|~7�Z�<-\����L�9h)
���Z��[�@����u��������%�Eu.�T������������3Gq>:���y~�����K�_����^�� /���H��sL�����{N�js*���h�T^�� �@3����u�lr�-�j��e��+,k6i��a�@b�9.�4RQ\'v����&����^����"]������[6M�������p)�p��`�I�����F������(.�{KI��t�4Iy1C i������Tb�J��������B���a(����� wCX��I�����5���]���r�5���''�J��zW�N�2��4�����qR���t��1�H��� W7-�-Ucx_j����/�-�OD�5��D:B� �$�g�O`��kv
� ��YT|�����H���g��zw���`�����{��d�������}M���?���9O9%�x�`��:����f��������\���?}�
C��(�6 ��J���Z�h�X|~t/����7�Ha��yK�B��tT4:a-�U�O��n�|�������K���X�0@�n�N2-�D��&���
���{��<A�N �R*��8U����&�,�ak�*����6kXF���J[�J���t@��,�iWg �0��u����@F1X�.�\���<��6J2=�F��|�L���=�X������x����j.��J^5BE*yo���]m�����������H��4x��R�U0����;(������)� ����F� �?����?S��l��x�N@U�I�^{�����{���s��7w������z�*�_A�iG�Q���0�b�*����|g����_���W�]qn����/�c���?R���� ����p5XK<"L�a1m �v��H_oK��v�;|���k5�'ty,�-y�>
�/��+��������Q&Y^����)���S�k~;U�"K%�;,��=�g)�}��n������G�'����������� �:��-��d��SN�r��w����g!�W}��D>ds�w��%�Y{��<�����t��,��:M�{����9^��|k�%�w��x��7�h��_'�h��`�IB��9���0��h��r� x������4�;}_!/��5����&sA��������;U>������rX�'$����\�>���?3��w-��\��i�Y�c��s�7�k���� JF�{�'�A���H�`��5�L�����J�&�����&�����aa���;��I:m��}���0D���)OT�+��Yn�?���>�`���s�����uw���c�+���B{������2���5�.t�S����A���bS��'*��B|��cw��D&;��[������I�Uq�m���>�V�dwy���q�?\��-h
\=��c=�A�[N\�&�U>�jt�l�����������Du�Ji8��6cCz�M<����k���r��|�~V3y&��D���v���8������{i�������G�:��Xt�7�
����U�� m�s��(>Q
�Y>%����Z��&���($�^���� ����6��+���T�2�@�C�w.���X�.���]��&A�zc�\�HB>������9t�����H$�q��Q ��2(.1TB!'6|F�&bqb���I�g0�F�]�D��H-�F@~���N
��N0��!�����o�eh��z�U8��fc�F��0e[bp3�o��5�`>���^��@��W��?�Ao���
��������H(�8�]����l%c �+�����X�X��� d&��������`�E��[if#�K���%.����]��P8X6������z��"��������x����}>�A�������.�Ipv
��KA������
��:��a���c������gK�]�V9V��������4Gp�>M�'�����%����'n2:��&3������0i��o����/7^���d��~�u_,&z����b'w��������ir�=�8�����~���]Y�_��7���FeHS���J��������m��� ��*��0���+}@\Ms9�2�N! b�f�BK=1@�FYv����0Q�������<Bf����0=����������,�2( ���4�!A/\2A������&�~�r�t�����S���Y�g�\����9�����>Y��\����g�"A3�6��.\pV���!7������f}
�0JA��i��t�7�����!���&0��=���1�B�o
��u9����;e��p� <��By�I#_'*���(u������ ;����3��c�������v�XF�FJ�����p����d�OEa9��!I����������Rn�u��RrH�[� ��p$�����������ix6z 8������N�#J ������Zr��i���|K2����b����e�ey"��������~@hU$�(D�4��T�6�FJ��T���)��l���|��[�P�L"����#�'�������G���K�!,M�-��%v����pVQ�`�
`��������oKkv�|=zVh�@�_� BUW~)j����Q�'����K�-�*�q��]���;�(/��j��F2U�*X���O*����?���Z������~��s�M���F��6��P�C�xN����pSB�v��X��d_��{�J1>�������-J�?���Ix�,�T���'
AT���;Y<���1KJ�N���2��0��O���Q�uL�\�i��H��^Q`h��������ux� ��n
h��L�n�{5�`�LF����A:#��:�7�;���|���ob�\`�"�&��0��K3��}Z�?<�A���>��b�����5d���"`�l��e����.�gg:/g5�'����
����a|�w.6��������3t���� +`��J������D�����3�U������a,<�lC�������!Z��#J�
��������3��� x]5^�����0S���e,�K!�����K"-��U�p!N������Z���g��������?j���X)#�KU�����:�d��5�a'=�&�L��X@1�eLJ�Ij�K�,���f'|�#���7�`�pd5� L�=��_?���� :������NO�:��m1�����A::+�J�����U�KR�Gb`��I|�� ���6���%�W3�� 2�w�h.c���a; p�Y[��jv,<��r�����71���`B,�1���?�0�t��[r��{2����x� ����z&U�c��>����Z�����W�C0����mg��q�4���]O)"Y�[��.ZF���t��z6/p�-^y�t��z�[�L����j���1[t� ����1�:&�z��?�O���C}:`������)v��*�e��+o���p����]r'���[�Bm��Z�1� f��i"-,�l����%�X �� ���2TG9������6��)`�!�_�_�=v,�4x�����B�� ���s!�iV��<U8b8�_���0E��tT�s�b��.����M{��DV���O�Ot�����I&u�e]�G������R�,��7�����9�� },Wo�(���vb���a ���;z?��1�4)��'�%�&��+�X�-rp4L�����sQJ��y�v�I��$X��E����Yg���1�gM�����A���4��N����.��x��y���sA�����mv��t��������"f�7"�~iM����kU�Q�"�XE*�Y����L���B�&f����Ni��,��
��z y�I�9�#�E��A��s�(�La WI���w��;��}I��m���$5�k����*f�J�k�����p-���l��s�O�0��C�*��O����)\��],��xP6�3u�b��O4a ��RS�7g{�M��T-����XU���(nY�������Z����]����Zt9�;���������Ovw�����&�v4cqa)QCe�_J���k��������$|Dz��u��gz�]
����%I��cv�mG����p�3�5���I�#f��Vs�m���:�9#� )��g��fU_��������~����f7�*���THU� �j�O�3����� ?T�83�S�c9��~@���P`eSq��9"�[-�R�!;eN��Dy8�1�q��W��C����K���*���\fU�9Obu������������i�)^J�J5K�w+�����)Q��w�v�A5�!�>q�'��v1]R�,6P4*$��@����tl
�!�C>�-�M����l�5_�XGd�
�J5�}��Rln��a��g��W �(]
�*���.%���d:���p�R�D���1aGN��b]���Q�dsY�-��%�+yd���c{��/���d�%���Nz�
v-��Q����P:JmH�� ��TT��^����N�Q#�������r: !T)I���M�{c���X��$����U�v��`������2��w����
�Q�L�h[�d�#V�*.%L`�b�Xc-�F�#Q�WK������������b����.�1:�;����{�hQ�qr��Xg<?*�?-3(�O`z�$E:R���������c���z�O��&O����Gm'Z�0�$'p0�g�����'�������e�kDv�v�.w���N�q���t�� r6�[����o ��kFD�����)B~8<�j�_��-�S�8<�������:"����]L�-6��=h���]�k�_zn�s�~�S������y,��d��u�<�(��2�����P�����<���.X�8K������[y�*���b B�y��$J��{���
&��N���g�P����y;2/��Cf?��~�9���N�p��n��S�\��T��)q!s8>��3�2����A��������'1�T6n�ZeT��k��L�6�������'^�!�lR�$����,�u�[�G05\.�����`�y�����t���#$z�v���f2��w!n�25= �C�k���X��#q��1������!���� H%�M�����kn�7D*B��0�]����s|j�>z�vv������� ����{�8O�z:��l�Q�*�WK����U�R�|��?������F����o������V��Q�F�:�����|�)b��=��E�V9�7���T�e���4M�F��C�1�e��jx�_H"�so��\�k9Q����@�+<�#�pi���� �[8���.�a�[���Lk+m&[��������S.���M�g��(��9�0z��bI��u|CVVt��iu�R���� V�����Cm��,\}<j�V���g��,����������� 1K�z���G8j��P F$;�����V�*4���h�dK�����'�(��9<��H�������w�&3��O=���7�����G��c����i�i>s0d�(�+���h�=E+4�
cd�;F��]�l�?��|5����\�*u&3�l����\1�����>�kgA��;h���bj���y }
?��[�������j�E����/�Y�=���O�-��|�m9esZ���x�;��V��z8�NA�����"E���j m"�plv�F��b"T�����������6��/����p�����KU�2;Z"8�����N
@���a���x��p������ �O�x�����n��3)�� 4q���s,������o _�tX��G�u���u���t�>