Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

Started by Anthonin Bonnefoy3 months ago21 messageshackers

anthonin.bonnefoy@datadoghq.com

3 months ago

Hi,

Shutdown may be indefinitely stuck under the following circumstances:
- Data checksum is enabled (needed to generate FPI_FOR_HINT record)
- A logical replication walsender is running
- A select in an explicit ongoing transaction pruned a heap page and
logged a FPI_FOR_HINT record. This record is likely going to be a
contrecord and start a new page.

Starting the shutdown will kill this ongoing transaction. Since the
transaction doesn't have an allocated xid, the FPI_FOR_HINT record
will be left unflushed.

When the checkpointer calls ShutdownXLOG(), all walsenders will be
notified to stop. However, the logical replication walsender will be
stuck in an infinite loop, trying to read this unflushed record and
never reaching the stop state, blocking the whole shutdown sequence.

This can be reproduced with the following script (this assumes
`pgbench -i` was run to create pgbench_accounts and a running logical
replication walsender):

TRUNCATE pgbench_accounts;
-- Completely fill the first heap page
INSERT INTO pgbench_accounts SELECT *, *, *, '' FROM generate_series(0, 62);
-- This should tag the page's metadata as full
BEGIN;
UPDATE pgbench_accounts SET bid=4 where aid=1;
ROLLBACK;
-- Force checkpoint so next change will be a FPW
CHECKPOINT;
-- Open an explicit transaction
BEGIN;
-- Select will do an opportunistic pruning, find nothing to prune but
will still unset the page full flag, writing a FPI_FOR_HINT
SELECT ctid, * FROM pgbench_accounts WHERE aid=2;

Then shutdown the database with 'pg_ctl stop' with the transaction
left opened. The shutdown will be stuck and the logical replication
walsender will be stuck at 100% CPU.

I've managed to reproduce this issue on 14 and the current HEAD.

The attached (tentative) patch fixes the issue by flushing all records
before signaling walsenders to stop. At that point, all backends
should have been killed, so flushing leftover records felt like a
correct approach.

Regards,
Anthonin Bonnefoy

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

3 months ago

In reply to: Anthonin Bonnefoy (#1)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

Some additional informations:

XLogSendLogical is stuck in the following infinite loop:
- It attempt to read the next record with XLogReadAhead + XLogDecodeNextRecord
- The page with the record header is read
- It has the record header, it goes back to XLogDecodeNextRecord
- tot_len > len, the record needs to be reassembled
- The next page containing the rest of the record is read with
ReadPageInternal. It fails since this page was never written.
- It jumps to the err label, XLogReaderInvalReadState(state) is called
and reset the reader state
- It goes back to the start of WalSndLoop's loop

There are some attempts done by the walsender to flush the WAL using
XLogBackgroundFlush:
/*
* If we're shutting down, trigger pending WAL to be written out,
* otherwise we'd possibly end up waiting for WAL that never gets
* written, because walwriter has shut down already.
*/
if (got_STOPPING)
XLogBackgroundFlush();

However, XLogBackgroundFlush only writes completed blocks or the
latest async xact known. With the issue triggered, I have the
following state:

XLogCtl->LogwrtRqst: (Write = 39128056, Flush = 39124992)
LogwrtResult: (Write = 39124992, Flush = 39124992)
XLogCtl->asyncXactLSN: 39119776

There are 3064 bytes (39128056 - 39124992) that contain the next page
with the rest of the cont record that still needs to be written.
However, XLogBackgroundFlush backs off to the previous page boundary:
/* back off to last completed page boundary */
WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
Meaning WriteRqst.Write will be 39124992, which is already written and
flushed and asyncXactLSN is behind both write and flush.

So, it looks like the root issue is more that the async LSN isn't
updated when a transaction without xid is rollbacked.
When going through CommitTransaction, such a transaction would still
go through XLogSetAsyncXactLSN.

I've updated the patch with this new approach: XLogSetAsyncXactLSN is
now called in RecordTransactionAbort even when a xid wasn't assigned.
With this, the logical walsender is able to force the flush of the
last partial page using XLogBackgroundFlush.

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

3 months ago

In reply to: Anthonin Bonnefoy (#2)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

And here's a script reproducing the issue. It creates the clusters,
sets up the logical replication and runs the necessary query to leave
FPI_FOR_HINT as the last written record.

If successful, the script should have pg_stop stuck with 'waiting for
server to shut down.......', with the walsender stuck at 100% CPU.

Fujii Masao

masao.fujii@gmail.com

3 months ago

In reply to: Anthonin Bonnefoy (#2)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

Shutdown may be indefinitely stuck under the following circumstances:

Thanks for reporting this issue and for the patch!

I was able to reproduce the problem on master.

On Tue, Feb 24, 2026 at 6:47 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

So, it looks like the root issue is more that the async LSN isn't
updated when a transaction without xid is rollbacked.
When going through CommitTransaction, such a transaction would still
go through XLogSetAsyncXactLSN.

I've updated the patch with this new approach: XLogSetAsyncXactLSN is
now called in RecordTransactionAbort even when a xid wasn't assigned.
With this, the logical walsender is able to force the flush of the
last partial page using XLogBackgroundFlush.

I'm a bit concerned that this approach could cause walwriter to attempt
WAL writes more aggressively in some cases, adding overhead during
normal processing. If the goal is only to prevent shutdown from getting stuck,
instead, would it be sufficient for ShutdownXLOG() to call
XLogSetAsyncXactLSN() just before WalSndWaitStopping()?

Regards,

--
Fujii Masao

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

3 months ago

In reply to: Fujii Masao (#4)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Thu, Feb 26, 2026 at 5:20 PM Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for reporting this issue and for the patch!

I was able to reproduce the problem on master.

Thanks for the confirmation!

I'm a bit concerned that this approach could cause walwriter to attempt
WAL writes more aggressively in some cases, adding overhead during
normal processing. If the goal is only to prevent shutdown from getting stuck,
instead, would it be sufficient for ShutdownXLOG() to call
XLogSetAsyncXactLSN() just before WalSndWaitStopping()?

I did something similar with the first version using
XLogFlush(WriteRqstPtr), and it indeed fixes the shutdown issue.

Though I was under the impression that calling XLogSetAsyncXactLSN()
in RecordTransactionAbort was a better approach, as it is similar to
reporting the latest async abort. You may have a large backlog of
records, like selects pruning a lot of pages and timing out, which the
next commit will have to flush. Notifying the walwriter allows it to
flush those records it can.

Maybe for the context of a backport, patching ShutdownXLOG has the
benefit of minimising the amount of changes and risks?

I've updated the RecordTransactionAbort approach for now, with a small
optimization. XactLastRecEnd may be 0 if nothing was written, and we
can skip the unnecessary spinlocks in this case.

Regards,
Anthonin Bonnefoy

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

3 months ago

In reply to: Anthonin Bonnefoy (#5)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

Here's a small updated version of the patch, with the 2 different approaches.

- 0001: This is the XLogSetAsyncXactLSN call in RecordTransactionAbort
approach. The small difference with v3 is that the 'XactLastRecEnd !=
0' condition is now merged with !isSubXact:
+if (!isSubXact && XactLastRecEnd != 0)
+{
+        XLogSetAsyncXactLSN(XactLastRecEnd);
         XactLastRecEnd = 0;
+}

- 0002: This is the ShutdownXLOG approach. I've used
XLogFlush(WriteRqstPtr) instead of updating the async LSN. It feels
like if we're going to stop the walsenders, we may as well flush
everything and get the WAL in a good state.
The spinlock to access XLogCtl->LogwrtRqst.Write is probably
unnecessary since we're at a point where no additional WAL records
should be written, but it doesn't hurt to keep consistency.

Regards,
Anthonin Bonnefoy

Fujii Masao

masao.fujii@gmail.com

3 months ago

In reply to: Anthonin Bonnefoy (#6)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Wed, Mar 4, 2026 at 2:11 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

Here's a small updated version of the patch, with the 2 different approaches.

Thanks for the patches!

- 0001: This is the XLogSetAsyncXactLSN call in RecordTransactionAbort
approach. The small difference with v3 is that the 'XactLastRecEnd !=
0' condition is now merged with !isSubXact:
+if (!isSubXact && XactLastRecEnd != 0)
+{
+        XLogSetAsyncXactLSN(XactLastRecEnd);
XactLastRecEnd = 0;
+}

The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
more like an improvement than a bug fix. Since it changes
RecordTransactionAbort(), it could have unintended impact on the system.

It may be a reasonable idea (though I'm not certain yet), but for a bug fix
I believe we should first apply the minimal change necessary to resolve
the issue. If needed, this approach could then be proposed later separately as
an improvement for the next major version.

- 0002: This is the ShutdownXLOG approach. I've used
XLogFlush(WriteRqstPtr) instead of updating the async LSN. It feels
like if we're going to stop the walsenders, we may as well flush
everything and get the WAL in a good state.
The spinlock to access XLogCtl->LogwrtRqst.Write is probably
unnecessary since we're at a point where no additional WAL records
should be written, but it doesn't hurt to keep consistency.

As a simpler alternative, would it make sense for walsender to call
XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
If there isn't a clear reason for that choice, directly calling XLogFlush()
might be the simpler solution. Thought?

Regards,

--
Fujii Masao

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

3 months ago

In reply to: Fujii Masao (#7)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Tue, Mar 3, 2026 at 6:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:

The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
more like an improvement than a bug fix. Since it changes
RecordTransactionAbort(), it could have unintended impact on the system.

It may be a reasonable idea (though I'm not certain yet), but for a bug fix
I believe we should first apply the minimal change necessary to resolve
the issue. If needed, this approach could then be proposed later separately as
an improvement for the next major version.

Agreed, that's definitely a change that can have a large impact. I
will open a separate thread later.

As a simpler alternative, would it make sense for walsender to call
XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
If there isn't a clear reason for that choice, directly calling XLogFlush()
might be the simpler solution. Thought?

That sounds like a good solution. I've tried it and it fixes the
issue. And this only changes the shutdown behaviour in the walsender.

The use of XLogBackgroundFlush() has been introduced with
c6c333436491, but there's no mention why it was specifically used. I
guess the assumption was that a change would either be flushed with a
commit, or tracked by async LSN through rollback, so
XLogBackgroundFlush() would always write pending records. But this
turns out to be false in the case of this bug.

I've updated the patch with this approach.

Regards,
Anthonin Bonnefoy

Fujii Masao

masao.fujii@gmail.com

3 months ago

In reply to: Anthonin Bonnefoy (#8)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Wed, Mar 4, 2026 at 5:52 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Tue, Mar 3, 2026 at 6:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:

The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
more like an improvement than a bug fix. Since it changes
RecordTransactionAbort(), it could have unintended impact on the system.

It may be a reasonable idea (though I'm not certain yet), but for a bug fix
I believe we should first apply the minimal change necessary to resolve
the issue. If needed, this approach could then be proposed later separately as
an improvement for the next major version.

Agreed, that's definitely a change that can have a large impact. I
will open a separate thread later.

As a simpler alternative, would it make sense for walsender to call
XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
If there isn't a clear reason for that choice, directly calling XLogFlush()
might be the simpler solution. Thought?

That sounds like a good solution. I've tried it and it fixes the
issue. And this only changes the shutdown behaviour in the walsender.

The use of XLogBackgroundFlush() has been introduced with
c6c333436491, but there's no mention why it was specifically used. I
guess the assumption was that a change would either be flushed with a
commit, or tracked by async LSN through rollback, so
XLogBackgroundFlush() would always write pending records. But this
turns out to be false in the case of this bug.

I've updated the patch with this approach.

Thanks for updating the patch!

Barring any objections, I will commit the patch and backpatch it to
all supported branches.

Regards,

--
Fujii Masao

#10

Alexander Lakhin

exclusion@gmail.com

3 months ago

In reply to: Fujii Masao (#9)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

Hello Anthonin and Masao-san,

04.03.2026 19:46, Fujii Masao wrote:

On Wed, Mar 4, 2026 at 5:52 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

I've updated the patch with this approach.

Thanks for updating the patch!

Barring any objections, I will commit the patch and backpatch it to
all supported branches.

Thank you for working on this!

It looks like the same issue was discovered and discussed before, but that
time without a final fix: [1]/messages/by-id/f15d665f-4cd1-4894-037c-afdbe369287e@gmail.com. I tried v5 patch with my
099_walsender_stop.pl test and it executed 100 internal iterations
successfully, while without the patch it failed for me on iterations 6, 8, 5.

[1]: /messages/by-id/f15d665f-4cd1-4894-037c-afdbe369287e@gmail.com

Best regards,
Alexander

#11

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

3 months ago

In reply to: Alexander Lakhin (#10)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

Hi Alexander,

On Thu, Mar 5, 2026 at 7:30 AM Alexander Lakhin <exclusion@gmail.com> wrote:

Hello Anthonin and Masao-san,
Thank you for working on this!

It looks like the same issue was discovered and discussed before, but that
time without a final fix: [1]. I tried v5 patch with my
099_walsender_stop.pl test and it executed 100 internal iterations
successfully, while without the patch it failed for me on iterations 6, 8, 5.

[1] /messages/by-id/f15d665f-4cd1-4894-037c-afdbe369287e@gmail.com

Thanks for the tests and the additional context.

Looking at the the thread, the latest patch provided a similar solution using:
+ XLogFlush(GetInsertRecPtr());

So it was relying on GetInsertRecPtr() instead of
GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
only returns the position of the last full xlog page, meaning it
doesn't fix the issue we have where the last partial page contains a
continuation record.

Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
get the shutdown stuck issue.

Using GetXLogInsertRecPtr() is required to make sure the last partial
page is correctly flushed.

Regards,
Anthonin Bonnefoy

#12

Fujii Masao

masao.fujii@gmail.com

3 months ago

In reply to: Anthonin Bonnefoy (#11)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

So it was relying on GetInsertRecPtr() instead of
GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
only returns the position of the last full xlog page, meaning it
doesn't fix the issue we have where the last partial page contains a
continuation record.

Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
get the shutdown stuck issue.

Using GetXLogInsertRecPtr() is required to make sure the last partial
page is correctly flushed.

Since GetXLogInsertRecPtr() returns a bogus LSN and XLogFlush() does
almost nothing during recovery, I added a !RecoveryInProgress() check
as follows. I've attached the latest version of the patch and updated
the commit message.

- if (got_STOPPING)
- XLogBackgroundFlush();
+ if (got_STOPPING && !RecoveryInProgress())
+ XLogFlush(GetXLogInsertRecPtr());

Regards,

--
Fujii Masao

#13

Fujii Masao

masao.fujii@gmail.com

3 months ago

In reply to: Fujii Masao (#12)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Fri, Mar 6, 2026 at 8:46 AM Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

So it was relying on GetInsertRecPtr() instead of
GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
only returns the position of the last full xlog page, meaning it
doesn't fix the issue we have where the last partial page contains a
continuation record.

Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
get the shutdown stuck issue.

Using GetXLogInsertRecPtr() is required to make sure the last partial
page is correctly flushed.

Since GetXLogInsertRecPtr() returns a bogus LSN and XLogFlush() does
almost nothing during recovery, I added a !RecoveryInProgress() check
as follows. I've attached the latest version of the patch and updated
the commit message.
- if (got_STOPPING)
- XLogBackgroundFlush();
+ if (got_STOPPING && !RecoveryInProgress())
+ XLogFlush(GetXLogInsertRecPtr());

I've pushed the patch. Thanks!

Regards,

--
Fujii Masao

#14

Andres Freund

andres@anarazel.de

3 months ago

In reply to: Fujii Masao (#13)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

Hi,

On 2026-03-06 16:48:06 +0900, Fujii Masao wrote:

On Fri, Mar 6, 2026 at 8:46 AM Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

So it was relying on GetInsertRecPtr() instead of
GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
only returns the position of the last full xlog page, meaning it
doesn't fix the issue we have where the last partial page contains a
continuation record.

Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
get the shutdown stuck issue.

Using GetXLogInsertRecPtr() is required to make sure the last partial
page is correctly flushed.

Since GetXLogInsertRecPtr() returns a bogus LSN and XLogFlush() does
almost nothing during recovery, I added a !RecoveryInProgress() check
as follows. I've attached the latest version of the patch and updated
the commit message.
- if (got_STOPPING)
- XLogBackgroundFlush();
+ if (got_STOPPING && !RecoveryInProgress())
+ XLogFlush(GetXLogInsertRecPtr());
I've pushed the patch. Thanks!

I'm pretty sure this is not correct as-is, it suffers from the same issue as
/messages/by-id/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx@e4rvwavtjswo
I.e. it is not safe to use GetXLogInsertRecPtr() to determine up to where to
flush to, due to page boundaries.

Greetings,

Andres Freund

#15

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

3 months ago

In reply to: Andres Freund (#14)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Tue, Mar 10, 2026 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:

I'm pretty sure this is not correct as-is, it suffers from the same issue as
/messages/by-id/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx@e4rvwavtjswo
I.e. it is not safe to use GetXLogInsertRecPtr() to determine up to where to
flush to, due to page boundaries.

I've managed to reproduce this issue by ensuring the FPI_FOR_HINT
record finishes at the end of a page with the following script (might
need some adjustment if the record sizes are different):

DROP TABLE IF EXISTS test_insert_rec_ptr;
CREATE TABLE test_insert_rec_ptr(aid int, data text) WITH
(autovacuum_enabled = false);
INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 100) FROM
generate_series(0, 57);
-- This should tag the page as full
BEGIN; UPDATE test_insert_rec_ptr SET aid=2 where aid=1; ROLLBACK;
CHECKPOINT;
-- Start with a fresh file
SELECT pg_switch_wal();
-- Our FPI_FOR_HINT writes 8193 bytes
-- With the long header, the first page has 8152 bytes available
-- With the short header, the second page has 8168 bytes available
-- We want our FPI_FOR_HINT to finish at the end of the second page
(+/- 8 bytes of alignment)
-- We need to write the first 25 bytes (or 32 with alignment) in the first page
-- For that, we need to write 8120 bytes of WAL records
BEGIN;
-- 264 bytes of FPW
INSERT INTO test_insert_rec_ptr VALUES(1);
-- 74 * 104 bytes
INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 44) FROM
generate_series(1, 74);
-- 108 bytes
INSERT INTO test_insert_rec_ptr VALUES(1, repeat('a', 48));
-- 46 bytes
COMMIT;
-- 264 + 74 * 104 + 46 + 108 = 8114 bytes, which will round up to 8120
with alignment
-- FPI_FOR_HINT record should be at 0x1FE0
BEGIN; SELECT * FROM test_insert_rec_ptr WHERE aid=2; ROLLBACK;

As far as I can tell, the only impact it has is to complain about the
write request being too far:
LOG: request to flush past end of generated WAL; request 0/01604018,
current position 0/01604000
ERROR: xlog flush request 0/01604018 is not satisfied --- flushed
only to 0/01604000

To avoid this issue, it sounds like we need something to use
XLogBytePosToEndRecPtr instead of XLogBytePosToRecPtr to convert the
byte position? With XLogBytePosToRecPtr(), the flush request would
stop at 01604000 instead of going to the next page with 01604018.

In the attached patch, I've added a GetXLogInsertEndRecPtr() function
which is similar to GetXLogInsertRecPtr(), except it uses
XLogBytePosToEndRecPtr() to stop at the page boundary.
There was also another XLogFlush(GetXLogWriteRecPtr()) call in
syncutils.c, so I replaced both calls with
XLogFlush(GetXLogInsertEndRecPtr()).

Regards,
Anthonin Bonnefoy

#16

Fujii Masao

masao.fujii@gmail.com

3 months ago

In reply to: Anthonin Bonnefoy (#15)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Thu, Mar 12, 2026 at 11:08 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Tue, Mar 10, 2026 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:

I'm pretty sure this is not correct as-is, it suffers from the same issue as
/messages/by-id/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx@e4rvwavtjswo
I.e. it is not safe to use GetXLogInsertRecPtr() to determine up to where to
flush to, due to page boundaries.

Thanks for the report!

I've managed to reproduce this issue by ensuring the FPI_FOR_HINT
record finishes at the end of a page with the following script (might
need some adjustment if the record sizes are different):

DROP TABLE IF EXISTS test_insert_rec_ptr;
CREATE TABLE test_insert_rec_ptr(aid int, data text) WITH
(autovacuum_enabled = false);
INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 100) FROM
generate_series(0, 57);
-- This should tag the page as full
BEGIN; UPDATE test_insert_rec_ptr SET aid=2 where aid=1; ROLLBACK;
CHECKPOINT;
-- Start with a fresh file
SELECT pg_switch_wal();
-- Our FPI_FOR_HINT writes 8193 bytes
-- With the long header, the first page has 8152 bytes available
-- With the short header, the second page has 8168 bytes available
-- We want our FPI_FOR_HINT to finish at the end of the second page
(+/- 8 bytes of alignment)
-- We need to write the first 25 bytes (or 32 with alignment) in the first page
-- For that, we need to write 8120 bytes of WAL records
BEGIN;
-- 264 bytes of FPW
INSERT INTO test_insert_rec_ptr VALUES(1);
-- 74 * 104 bytes
INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 44) FROM
generate_series(1, 74);
-- 108 bytes
INSERT INTO test_insert_rec_ptr VALUES(1, repeat('a', 48));
-- 46 bytes
COMMIT;
-- 264 + 74 * 104 + 46 + 108 = 8114 bytes, which will round up to 8120
with alignment
-- FPI_FOR_HINT record should be at 0x1FE0
BEGIN; SELECT * FROM test_insert_rec_ptr WHERE aid=2; ROLLBACK;

As far as I can tell, the only impact it has is to complain about the
write request being too far:
LOG: request to flush past end of generated WAL; request 0/01604018,
current position 0/01604000
ERROR: xlog flush request 0/01604018 is not satisfied --- flushed
only to 0/01604000

To avoid this issue, it sounds like we need something to use
XLogBytePosToEndRecPtr instead of XLogBytePosToRecPtr to convert the
byte position? With XLogBytePosToRecPtr(), the flush request would
stop at 01604000 instead of going to the next page with 01604018.

In the attached patch, I've added a GetXLogInsertEndRecPtr() function
which is similar to GetXLogInsertRecPtr(), except it uses
XLogBytePosToEndRecPtr() to stop at the page boundary.
There was also another XLogFlush(GetXLogWriteRecPtr()) call in
syncutils.c, so I replaced both calls with
XLogFlush(GetXLogInsertEndRecPtr()).

Thanks for investigating the issue and making the patch!
It looks good to me.

Andres,

Do you have any comments on the proposed patch?

Regards,

--
Fujii Masao

#17

Fujii Masao

masao.fujii@gmail.com

3 months ago

In reply to: Fujii Masao (#16)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Fri, Mar 13, 2026 at 2:24 AM Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for investigating the issue and making the patch!
It looks good to me.

Since Tomas added GetXLogInsertEndRecPtr() in commit b1f14c96720,
I updated the patch to use it. Patch attached.
Barring any objections, I will commit it.

-       XLogFlush(GetXLogWriteRecPtr());
+       XLogFlush(GetXLogInsertEndRecPtr());

I excluded the above change from the patch because it seems like a separate
issue. I also wonder whether this code could cause an error in XLogFlush()
even when GetXLogWriteRecPtr() is used.

Regards,

--
Fujii Masao

#18

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

2 months ago

In reply to: Fujii Masao (#17)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Mon, Mar 16, 2026 at 6:39 AM Fujii Masao <masao.fujii@gmail.com> wrote:

Since Tomas added GetXLogInsertEndRecPtr() in commit b1f14c96720,
I updated the patch to use it. Patch attached.
Barring any objections, I will commit it.
-       XLogFlush(GetXLogWriteRecPtr());
+       XLogFlush(GetXLogInsertEndRecPtr());

Thanks for the updated patch! I've run my test script against the
patch and there's no more "xlog flush request xxx is not satisfied"
errors reported.

I excluded the above change from the patch because it seems like a separate
issue. I also wonder whether this code could cause an error in XLogFlush()
even when GetXLogWriteRecPtr() is used.

Ha right, I've mixed Insert and Write and thought that
FinishSyncWorker was also doing a XlogFlush(GetXLogInsertRecPtr())
when writing the patch. If I try to trigger the partial record issue,
GetXLogWriteRecPtr() points at the end of the WAL page containing the
beginning of the FPI_FOR_HINT, there's no attempt to flush in the
future. So FinishSyncWorker doesn't seem impacted by the issue.

Regards,
Anthonin Bonnefoy

#19

Fujii Masao

masao.fujii@gmail.com

2 months ago

In reply to: Anthonin Bonnefoy (#18)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Mon, Mar 16, 2026 at 6:45 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Mon, Mar 16, 2026 at 6:39 AM Fujii Masao <masao.fujii@gmail.com> wrote:
Since Tomas added GetXLogInsertEndRecPtr() in commit b1f14c96720,
I updated the patch to use it. Patch attached.
Barring any objections, I will commit it.
-       XLogFlush(GetXLogWriteRecPtr());
+       XLogFlush(GetXLogInsertEndRecPtr());
Thanks for the updated patch! I've run my test script against the
patch and there's no more "xlog flush request xxx is not satisfied"
errors reported.

Thanks for the test! I've pushed the patch.

I excluded the above change from the patch because it seems like a separate
issue. I also wonder whether this code could cause an error in XLogFlush()
even when GetXLogWriteRecPtr() is used.

Ha right, I've mixed Insert and Write and thought that
FinishSyncWorker was also doing a XlogFlush(GetXLogInsertRecPtr())
when writing the patch. If I try to trigger the partial record issue,
GetXLogWriteRecPtr() points at the end of the WAL page containing the
beginning of the FPI_FOR_HINT, there's no attempt to flush in the
future. So FinishSyncWorker doesn't seem impacted by the issue.

Understood. Thanks for the investigation!

Regards,

--
Fujii Masao

#20

Michael Paquier

michael@paquier.xyz

2 months ago

In reply to: Fujii Masao (#19)

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

On Tue, Mar 17, 2026 at 08:16:08AM +0900, Fujii Masao wrote:

On Mon, Mar 16, 2026 at 6:45 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

Thanks for the updated patch! I've run my test script against the
patch and there's no more "xlog flush request xxx is not satisfied"
errors reported.

Thanks for the test! I've pushed the patch.

This stuff seems sensible enough that I think we should at least have
a test, no? It does not have to be absolutely perfect in terms of
reproducibility, just good enough to be able to detect it across the
buildfarm. We already do various things with page boundaries in WAL
during recovery, and a shutdown could be perhaps timed to increase the
reproducibility rate of the issues discussed?
--
Michael

#21

Anthonin Bonnefoy

anthonin.bonnefoy@datadoghq.com

2 months ago

In reply to: Michael Paquier (#20)

Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: