Return value of XLogInsertRecord() for XLOG_SWITCH record

Started by cca550711 days ago4 messageshackers

Jump to latest

cca5507

cca5507@qq.com

11 days ago

Hi,

I find that the return value of XLogInsertRecord() for XLOG_SWITCH record
is inconsistent with other records.

For XLOG_SWITCH record:

```
/*
* Even though we reserved the rest of the segment for us, which is
* reflected in EndPos, we return a pointer to just the end of the
* xlog-switch record.
*/
if (inserted)
{
EndPos = StartPos + SizeOfXLogRecord;
if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
{
uint64 offset = XLogSegmentOffset(EndPos, wal_segment_size);

if (offset == EndPos % XLOG_BLCKSZ)
EndPos += SizeOfXLogLongPHD;
else
EndPos += SizeOfXLogShortPHD;
}
}
```

It is equivalent to XLogBytePosToRecPtr(), but all other records use XLogBytePosToEndRecPtr().

No actual problem found yet, but I think it's better to keep them consistent. Thoughts?

--
Regards,
ChangAo Chen

반지현

rring0727@gmail.com

3 days ago

In reply to: cca5507 (#1)

Re: Return value of XLogInsertRecord() for XLOG_SWITCH record

Hi ChangAo,

I tested v1 of this patch.

Environment:
- PostgreSQL master (commit 9b43e6793b0)
- Ubuntu 24.04, gcc 13.3.0, x86_64
- Configured with --enable-cassert --enable-debug --enable-tap-tests

The patch applies cleanly and builds without new warnings.

Testing performed:
- make check: PASS
- make check-world: PASS
- make -C src/test/recovery check: PASS (599 tests)
In particular, t/043_no_contrecord_switch.pl and t/039_end_of_wal.pl
both pass, which exercise the XLOG_SWITCH boundary handling.

Manual verification:
I ran pg_switch_wal() twice with some activity in between and observed
that the returned LSN and subsequent pg_current_wal_lsn() values are
consistent with segment boundaries and page headers:

SELECT pg_switch_wal(); -- 0/0178FE38
SELECT pg_current_wal_lsn(); -- 0/02000000 (new segment)

-- ... some DDL and INSERT ...

SELECT pg_switch_wal(); -- 0/02026528
SELECT pg_current_wal_lsn(); -- 0/03000060 (new segment + 0x60
-- == SizeOfXLogLongPHD)

The 0x60 offset matches the long page header size, which is what the
original code was computing via the explicit SizeOfXLogLongPHD /
SizeOfXLogShortPHD branches. The refactored version using
XLogBytePosToEndRecPtr(XLogRecPtrToBytePos(StartPos) +
MAXALIGN(SizeOfXLogRecord)) produces the same result while matching
the pattern used elsewhere in XLogInsertRecord().

One question:
The original code did not apply MAXALIGN() to SizeOfXLogRecord before
adding it. In practice SizeOfXLogRecord is likely already MAXALIGN'd
(given typical record header layout), but could you confirm whether
MAXALIGN() here is a correctness fix, a defensive no-op, or something
that requires a separate note in the commit message?

Otherwise the change looks good to me, and I think it's a reasonable
cleanup.

Tested-by: Jihyun Bahn <rring0727@gmail.com>

Regards,
Jihyun Bahn

cca5507

cca5507@qq.com

3 days ago

In reply to: 반지현 (#2)

Re: Return value of XLogInsertRecord() for XLOG_SWITCH record

Hi, thanks for the test!

One question:
The original code did not apply MAXALIGN() to SizeOfXLogRecord before
adding it. In practice SizeOfXLogRecord is likely already MAXALIGN'd
(given typical record header layout), but could you confirm whether
MAXALIGN() here is a correctness fix, a defensive no-op, or something
that requires a separate note in the commit message?

Otherwise the change looks good to me, and I think it's a reasonable
cleanup.

I apply the MAXALIGN() to keep it consistent with ReserveXLogSwitch(), the
value seems unchanged though.

Attach v2 which is more efficient.

--
Regards,
ChangAo Chen

반지현

rring0727@gmail.com

3 days ago

In reply to: cca5507 (#3)

Re: Return value of XLogInsertRecord() for XLOG_SWITCH record

Hi ChangAo,

Thanks for the explanation and v2. I re-tested the new version and
also ran a comparison against unpatched master to better understand
the behavior change.

## Test environment

Commit: 64b2b421248 (today's master)
OS: Ubuntu 24.04 LTS on AWS EC2 (t3.xlarge, 4 vCPU, 16 GB)
Build: --enable-cassert --enable-debug --enable-tap-tests
CFLAGS="-O0 -ggdb -fno-omit-frame-pointer"

## Test results for v2

make check: All 248 tests passed
make -C src/test/recovery check: All 599 tests passed (262s)

Nothing new from the build side — clean compile, no new warnings.

## Manual verification

A few observations from exercising pg_switch_wal() repeatedly:

-- After some activity that lands mid-segment:
SELECT pg_switch_wal(); -> 0/03000078
SELECT pg_current_wal_lsn(); -> 0/04000000

-- Immediately calling switch again (now exactly on segment boundary):
SELECT pg_switch_wal(); -> 0/04000000 (no-op, same LSN)
SELECT pg_switch_wal(); -> 0/04000000 (still no-op)

So pg_switch_wal() is idempotent at segment boundaries — which is
what one would hope for.

## Comparison with unpatched master

I stashed v2, rebuilt, and re-ran the same sequence on current
master. The externally observable LSN behavior is identical:
repeated pg_switch_wal() calls at a segment boundary all return
the same LSN with no further WAL written.

This makes sense: when ReserveXLogSwitch() determines there's no
space worth switching, XLogInsertRecord() enters the branch with
inserted == false, so the EndPos adjustment logic on the
`if (inserted)` path isn't actually reached at segment boundaries
in current code.

## My take on v2

The new `EndPos % XLOG_BLCKSZ != 0` guard doesn't fix an
externally observable bug in current master — ReserveXLogSwitch()
already prevents the problematic case from reaching this code path.
But the guard makes the invariant local to XLogInsertRecord()
rather than depending on the caller's contract, and the added
comment referencing XLogBytePosToEndRecPtr() ties the two paths
together nicely. I think that's a worthwhile defensive improvement.

Going back to the structure of the original code (rather than the
v1 helper-based rewrite) also seems reasonable — no function-call
overhead, and the arithmetic is clear enough with the new comment.

The MAXALIGN() consistency argument with ReserveXLogSwitch() makes
sense to me.

Tested-by: Jihyun Bahn <rring0727@gmail.com>

Regards,
Jihyun Bahn

2026년 4월 21일 (화) 오후 8:36, cca5507 <cca5507@qq.com>님이 작성:

Show quoted text

Hi, thanks for the test!

One question:
The original code did not apply MAXALIGN() to SizeOfXLogRecord before
adding it. In practice SizeOfXLogRecord is likely already MAXALIGN'd
(given typical record header layout), but could you confirm whether
MAXALIGN() here is a correctness fix, a defensive no-op, or something
that requires a separate note in the commit message?

Otherwise the change looks good to me, and I think it's a reasonable
cleanup.

I apply the MAXALIGN() to keep it consistent with ReserveXLogSwitch(), the
value seems unchanged though.

Attach v2 which is more efficient.

--
Regards,
ChangAo Chen

Return value of XLogInsertRecord() for XLOG_SWITCH record

Attachments:

Attachments: