From 0e06c5fe451832276f26cb3a86111228e78525a4 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy
Date: Tue, 3 Mar 2026 17:42:40 +0100
Subject: Fix stuck shutdown due to unflushed records

The shutdown sequence may get stuck indefinitely under the following
circumstances:
- Data checksums are enabled
- A logical replication walsender is running
- A select in an explicit transaction tries to prune a full heap page and
  writes a FPI_FOR_HINT record which crosses the page boundary
- The select is rolled back (or killed)
- 'pg_ctl stop' is sent

The FPI_FOR_HINT record is likely to span two pages, with its tail stored
as a contrecord at the start of a new page. However, as the select is
rolled back, XLogSetAsyncXactLSN isn't called to advance the async LSN to
include this record.

When the checkpointer starts ShutdownXLOG(), all walsenders are notified
to stop. However, the logical replication walsender gets stuck in the
following infinite loop:
- It tries to read the last FPI_FOR_HINT record
- The page with the record header is read
- tot_len > len, so the record needs to be reassembled
- It tries to read the next page containing the rest of the record. This
  fails since the page was never written.
- The xlog reader state is reset with XLogReaderInvalReadState
- It goes back to the start of WalSndLoop's loop

The walsender does make some attempts to flush the WAL using
XLogBackgroundFlush. However, XLogBackgroundFlush only writes completed
blocks, or up to the latest known async LSN. Since the select was rolled
back, XLogBackgroundFlush doesn't flush the next partial page.

This patch fixes the issue by replacing XLogBackgroundFlush() with
XLogFlush(GetXLogInsertRecPtr()), which flushes all pending records
without depending on the async LSN being up to date.
---
 src/backend/replication/walsender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2cde8ebc729..5a6a618678d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1886,7 +1886,7 @@ WalSndWaitForWal(XLogRecPtr loc)
 		 * written, because walwriter has shut down already.
 		 */
 		if (got_STOPPING)
-			XLogBackgroundFlush();
+			XLogFlush(GetXLogInsertRecPtr());
 
 		/*
 		 * To avoid the scenario where standbys need to catch up to a newer
-- 
2.52.0
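
Illustration only, not part of the patch: the difference between the two
flush strategies described in the commit message can be sketched with a toy
model. Everything below (ToyWal, the toy_* functions, TOY_PAGE_SIZE) is
hypothetical and is not PostgreSQL code. It only mirrors the behaviour as the
commit message states it, and it ignores page headers and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 8192u		/* assumed stand-in for the WAL block size */

/* Toy WAL state; hypothetical, not a PostgreSQL structure. */
typedef struct ToyWal
{
	uint64_t	insert_pos;		/* end of the last inserted record */
	uint64_t	async_lsn;		/* last position reported by an async commit */
	uint64_t	flushed;		/* bytes below this position are on disk */
} ToyWal;

/*
 * Models the behaviour the commit message attributes to
 * XLogBackgroundFlush(): write only completed pages, or up to the latest
 * known async LSN, whichever is further.
 */
static void
toy_background_flush(ToyWal *wal)
{
	uint64_t	target = wal->insert_pos - (wal->insert_pos % TOY_PAGE_SIZE);

	if (wal->async_lsn > target)
		target = wal->async_lsn;
	if (target > wal->flushed)
		wal->flushed = target;
}

/*
 * Models XLogFlush(GetXLogInsertRecPtr()): flush everything that has been
 * inserted, including a trailing partial page.
 */
static void
toy_flush_to_insert(ToyWal *wal)
{
	if (wal->insert_pos > wal->flushed)
		wal->flushed = wal->insert_pos;
}

/* A reader can reassemble a record only once all of its bytes are flushed. */
static bool
toy_record_readable(const ToyWal *wal, uint64_t start, uint32_t tot_len)
{
	assert(tot_len > 0);
	return start + tot_len <= wal->flushed;
}
```

In this model, take a 500-byte record that starts at offset 8000 and
therefore ends at 8500, with async_lsn left at 0 because the transaction was
rolled back. toy_background_flush() stops at the page boundary (8192), so
toy_record_readable() stays false no matter how often the reader retries.
After toy_flush_to_insert() the flushed position reaches 8500 and the record
becomes readable.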