logical: fix recomputation required LSN on restart_lsn-only advancement
Hi,
While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation().
In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is marked dirty and saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only called inside "if (updated_xmin)”.
So for the restart-only case:
* updated_restart = true
* updated_xmin = false
* ReplicationSlotSave() runs
* ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is false
That means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later unrelated event recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot restat_lsn, skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat.
This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”.
Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth back-patching.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
v1-0001-logical-recompute-required-LSN-when-restart_lsn-a.patchapplication/octet-stream; name=v1-0001-logical-recompute-required-LSN-when-restart_lsn-a.patch; x-unix-mode=0644Download+11-6
On Tue, Apr 21, 2026 at 10:16 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi,
While reading logical replication code, I found an issue in
LogicalConfirmReceivedLocation().In LogicalConfirmReceivedLocation(), updated_restart is tracked
independently from updated_xmin, and the slot is marked dirty and saved
when either one changed. But after that,
ReplicationSlotsComputeRequiredLSN() is still only called inside "if
(updated_xmin)”.So for the restart-only case:
* updated_restart = true
* updated_xmin = false
* ReplicationSlotSave() runs
* ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin
is falseThat means the global retention point managed by
XLogSetReplicationSlotMinimumLSN() can stay stale until some later
unrelated event recomputes it. Since ReplicationSlotsComputeRequiredLSN()
derives the global minimum from slot restat_lsn, skipping it after a
restart-only advance can retain excess WAL and may lead to WAL bloat.This patch fixes the problem by moving
ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”.Looks like this issue has been there for a long time, so if this analysis
is correct, it may also be worth back-patching.Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
I think this change is reasonable.
This makes the recomputation condition match the state it actually depends
on.
Regards,
Xunqi Hu
Chao Li <li.evan.chao@gmail.com> wrote:
While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation().
In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is marked dirty and saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only called inside "if (updated_xmin)”.
So for the restart-only case:
* updated_restart = true
* updated_xmin = false
* ReplicationSlotSave() runs
* ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is falseThat means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later unrelated event recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot restat_lsn, skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat.
This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”.
FYI, this overlaps with another post in the REPACK thread [1]/messages/by-id/TYRPR01MB14195633567DA00ABD42570B794592@TYRPR01MB14195.jpnprd01.prod.outlook.com.
Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth back-patching.
As REPACK in PG 19 does not let xmin advance (that should be fixed in the
future), I think makes sense to apply [1]/messages/by-id/TYRPR01MB14195633567DA00ABD42570B794592@TYRPR01MB14195.jpnprd01.prod.outlook.com to v19. However, during logical
replication, xmin (IMO) gets updated rather often, so the problem should not
be that severe in earlier versions.
[1]: /messages/by-id/TYRPR01MB14195633567DA00ABD42570B794592@TYRPR01MB14195.jpnprd01.prod.outlook.com
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Apr 21, 2026, at 15:09, Antonin Houska <ah@cybertec.at> wrote:
Chao Li <li.evan.chao@gmail.com> wrote:
While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation().
In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is marked dirty and saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only called inside "if (updated_xmin)”.
So for the restart-only case:
* updated_restart = true
* updated_xmin = false
* ReplicationSlotSave() runs
* ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is falseThat means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later unrelated event recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot restat_lsn, skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat.
This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”.
FYI, this overlaps with another post in the REPACK thread [1].
Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth back-patching.
As REPACK in PG 19 does not let xmin advance (that should be fixed in the
future), I think makes sense to apply [1] to v19. However, during logical
replication, xmin (IMO) gets updated rather often, so the problem should not
be that severe in earlier versions.[1] /messages/by-id/TYRPR01MB14195633567DA00ABD42570B794592@TYRPR01MB14195.jpnprd01.prod.outlook.com
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Thanks for pointing out that. I will review that patch.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/