Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

Started by Andreas Karlsson · 5 months ago · 12 messages · pgsql-hackers
#1 Andreas Karlsson
andreas.karlsson@percona.com

Hi,

Andres pointed out this possible optimization on Discord so I hacked up
a quick patch which avoids taking a lock when reading the LSN from a
page on architectures where we can be sure to not get a torn value. It
is always nice to remove a lock from a reasonably hot code path.

I thought about using our functions for atomics but did not do so since
I did not want to introduce any extra overhead on platforms which do not
support 64-bit atomic operations.

I decided to just remove the struct to make the code simpler and more
consistent but I can also see an argument for keeping it to get some
degree of type safety.

I have not properly benchmarked it yet but plan to do so when I am back
from my vacation.

I have also included a cleanup patch where I change a macro into an
inline function which I think improves code readability. Feel free to
ignore that one if you want.

--
Andreas
Percona

Attachments:

v1-0001-Do-not-lock-in-BufferGetLSNAtomic-on-archs-with-8.patch (text/x-patch; +33/-13)
v1-0002-Convert-PageXLogRecPtrSet-from-macro-to-inline-fu.patch (text/x-patch; +13/-8)
#2 Peter Geoghegan
pg@bowt.ie
In reply to: Andreas Karlsson (#1)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

On Sun, Nov 23, 2025 at 6:10 PM Andreas Karlsson <andreas@proxel.se> wrote:

Andres pointed out this possible optimization on Discord so I hacked up
a quick patch which avoids taking a lock when reading the LSN from a
page on architectures where we can be sure to not get a torn value. It
is always nice to remove a lock from a reasonably hot code path.

This seems very useful.

One reason why I'm interested in this work is that it will facilitate
other work (from Tomas Vondra and myself) on the proposed new
amgetbatch interface, which enables optimizations such as index
prefetching. That work will replace the use of the amgettuple
interface with a index page/batch oriented amgetbatch interface. It
generalizes nbtree's dropPin optimization (originally added under a
different name by commit 2ed5b87f) to work with any index AM that uses
the new amgetbatch interface -- which will include both nbtree and
hash in the initial version (and possibly other index AMs in the
future).

This means that there'll be quite a few new BufferGetLSNAtomic calls
during scans of hash indexes, where before there were none -- we need
to stash an LSN so that we have an alternative way of detecting unsafe
concurrent TID recycling when LP_DEAD-marking index tuples as a batch
is released (since there'll have been no buffer pin held on the index
leaf page while we accessed the heap when the dropPin optimization is
applied).

With page-level checksums enabled, on the recently posted v4 of our
patch set, I find that there's a regression of about 5% of transaction
throughput with a variant of pgbench SELECT that uses hash indexes.
But with your patch applied on top of our own, that regression
completely goes away.

FWIW the improvement I see with regular nbtree index scans is still
visible, but not quite as good as with hash indexes + the amgetbatch
patch set. It should be easy enough to see this for yourself. Standard
pgbench SELECT should work well enough -- just make sure that you test
with page-level checksums enabled.
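
For reference, a minimal reproduction along these lines (data directory path and scale factor are arbitrary; the important part is initializing with checksums):

```
initdb --data-checksums -D ./pgdata
pg_ctl -D ./pgdata -l logfile start
createdb bench
pgbench -i -s 100 bench               # initialize the pgbench tables
pgbench -S -c 16 -j 16 -T 60 bench    # SELECT-only workload
```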

--
Peter Geoghegan

#3 Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Andreas Karlsson (#1)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

On 11/24/25 12:10 AM, Andreas Karlsson wrote:

Andres pointed out this possible optimization on Discord so I hacked up
a quick patch which avoids taking a lock when reading the LSN from a
page on architectures where we can be sure to not get a torn value. It
is always nice to remove a lock from a reasonably hot code path.

I thought about using our functions for atomics but did not do so since
I did not want to introduce any extra overhead on platforms which do not
support 64-bit atomic operations.

I decided to just remove the struct to make the code simpler and more
consistent but I can also see an argument for keeping it to get some
degree of type safety.

Noticed that the pageinspect tests in the CI did not like that I
increased the alignment requirement of PageHeaderData from 4 to 8 bytes,
presumably because a bytea's data is only 4-byte aligned. I will need to
think about how best to fix this. Maybe copying the struct is fine in the
pageinspect code, since being able to check for this alignment seems
like a nice property for code outside of pageinspect.

It gives us the following backtrace:

../src/include/storage/bufpage.h:397:16: runtime error: member access
within misaligned address 0x55d6721e69bc for type 'const struct
PageHeaderData', which requires 8 byte alignment
0x55d6721e69bc: note: pointer points here
10 80 00 00 00 00 00 00 00 00 00 00 00 00 04 00 1c 00 e0 1f 00 20
04 20 00 00 00 00 e0 9f 40 00
^
#0 0x7fb39deb9b07 in PageGetMaxOffsetNumber
../src/include/storage/bufpage.h:397
#1 0x7fb39debac3e in heap_page_items
../contrib/pageinspect/heapfuncs.c:168
#2 0x55d63ff127c7 in ExecMakeTableFunctionResult
../src/backend/executor/execSRF.c:234
#3 0x55d63ff42aa7 in FunctionNext
../src/backend/executor/nodeFunctionscan.c:94
#4 0x55d63ff14157 in ExecScanFetch
../src/include/executor/execScan.h:134
#5 0x55d63ff14157 in ExecScanExtended
../src/include/executor/execScan.h:195
#6 0x55d63ff14157 in ExecScan ../src/backend/executor/execScan.c:59
#7 0x55d63ff427b6 in ExecFunctionScan
../src/backend/executor/nodeFunctionscan.c:269
#8 0x55d63ff0c38c in ExecProcNodeFirst
../src/backend/executor/execProcnode.c:469
#9 0x55d63fef5cee in ExecProcNode
../src/include/executor/executor.h:319
#10 0x55d63fef6465 in ExecutePlan
../src/backend/executor/execMain.c:1707
#11 0x55d63fef7c00 in standard_ExecutorRun
../src/backend/executor/execMain.c:366
#12 0x55d63fef7c60 in ExecutorRun
../src/backend/executor/execMain.c:303
#13 0x55d640322295 in PortalRunSelect ../src/backend/tcop/pquery.c:916
#14 0x55d64032534e in PortalRun ../src/backend/tcop/pquery.c:760
#15 0x55d64031e34a in exec_simple_query
../src/backend/tcop/postgres.c:1279
#16 0x55d6403215ce in PostgresMain ../src/backend/tcop/postgres.c:4775
#17 0x55d640316c6c in BackendMain
../src/backend/tcop/backend_startup.c:124
#18 0x55d6401b2711 in postmaster_child_launch
../src/backend/postmaster/launch_backend.c:268
#19 0x55d6401b7b19 in BackendStartup
../src/backend/postmaster/postmaster.c:3598
#20 0x55d6401ba061 in ServerLoop
../src/backend/postmaster/postmaster.c:1713
#21 0x55d6401bb972 in PostmasterMain
../src/backend/postmaster/postmaster.c:1403
#22 0x55d63ffde07b in main ../src/backend/main/main.c:231
#23 0x7fb39e033ca7 (/lib/x86_64-linux-gnu/libc.so.6+0x29ca7)
(BuildId: def5460e3cee00bfee25b429c97bcc4853e5b3a8)
#24 0x7fb39e033d64 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x29d64) (BuildId:
def5460e3cee00bfee25b429c97bcc4853e5b3a8)
#25 0x55d63fb70260 in _start
(/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres+0x8b0260)
(BuildId: d2e585ffc320b6de4bfce11bb801bb923e7d3484)

Links

- https://cirrus-ci.com/task/5353983260229632
- https://api.cirrus-ci.com/v1/artifact/task/5353983260229632/testrun/build/testrun/pageinspect/regress/log/postmaster.log

Andreas

#4 Andres Freund
andres@anarazel.de
In reply to: Andreas Karlsson (#1)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

Hi,

On 2025-11-24 00:10:03 +0100, Andreas Karlsson wrote:

Andres pointed out this possible optimization on Discord so I hacked up a
quick patch which avoids taking a lock when reading the LSN from a page on
architectures where we can be sure to not get a torn value. It is always
nice to remove a lock from a reasonably hot code path.

Nice.

static inline XLogRecPtr
PageXLogRecPtrGet(PageXLogRecPtr val)
{
-	return (uint64) val.xlogid << 32 | val.xrecoff;
+	return val;
}
#define PageXLogRecPtrSet(ptr, lsn) \
-	((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
+	((ptr) = (lsn))
+
+#else
+
+static inline XLogRecPtr
+PageXLogRecPtrGet(volatile PageXLogRecPtr val)
+{
+	return (val << 32) | (val >> 32);
+}

A volatile on a non-pointer won't do you much good, I'm afraid. You need to
make sure that the underlying value is read as a single 8 byte read, I don't
see how this guarantees that, unfortunately.

Greetings,

Andres Freund

#5 Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Andres Freund (#4)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

On 1/2/26 4:27 PM, Andres Freund wrote:

A volatile on a non-pointer won't do you much good, I'm afraid. You need to
make sure that the underlying value is read as a single 8 byte read, I don't
see how this guarantees that, unfortunately.

Yeah, that was quite a big thinko. I have attached a patch with the
thinko fixed, but I am still not happy with it. I think I will try to use
atomics.h and see if that makes the code nicer to read.

Also will after that see what I can do about pageinspect.

Andreas

Attachments:

v2-0001-Do-not-lock-in-BufferGetLSNAtomic-on-archs-with-8.patch (text/x-patch; +46/-19)
#6 Peter Geoghegan
pg@bowt.ie
In reply to: Andreas Karlsson (#5)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:

Yeah, that was a quite big thinko. I have a attached a patch with the
thinko fixed but I am still not happy with it. I think I will try to use
atomics.h and see if that makes the code nicer to read.

Also will after that see what I can do about pageinspect.

I believe that pageinspect's heap_page_items function needs to use
get_page_from_raw -- see commit 14e9b18fed.

Attached patch 0002-* shows what's required. I'm including your
original 0001-* patch from your v2 here, to keep CFTester happy.

We (Tomas Vondra and I) are treating this as a dependency for our
index prefetching patch. It'd be good to get this done soon.
--
Peter Geoghegan

Attachments:

v3-0002-Make-pageinspect-s-heap_page_items-use-get_page_f.patch (application/x-patch; +6/-16)
v3-0001-Do-not-lock-in-BufferGetLSNAtomic-on-archs-with-8.patch (application/x-patch; +47/-20)
#7 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Peter Geoghegan (#6)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

On 2/5/26 16:38, Peter Geoghegan wrote:

On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:

Yeah, that was a quite big thinko. I have a attached a patch with the
thinko fixed but I am still not happy with it. I think I will try to use
atomics.h and see if that makes the code nicer to read.

Also will after that see what I can do about pageinspect.

I believe that pageinspect's heap_page_items function needs to use
get_page_from_raw -- see commit 14e9b18fed.

Attached patch 0002-* shows what's required. I'm including your
original 0001-* patch from your v2 here, to keep CFTester happy.

We (Tomas Vondra and I) are treating this as a dependency for our
index prefetching patch. It'd be good to get this done soon.
--

Here's a v4 with some very minor adjustments and updated commit message.
Barring objections, I'm going to push this shortly - in a day or two, to
get it out of the way for the main index prefetch patch.

The only real change is about the asserts in BufferGetLSNAtomic() - the
new "ifdef" branch only called PageGetLSN(), so it lost the checks that
the buffer is valid and pinned, which seems undesirable.

In fact, looking at the existing code, shouldn't the asserts be before
the "fastpath" exit (for cases without hint bits or local buffers)?

The v4 addresses both points: it adds the asserts to the new ifdef block,
and moves them up in the existing one.

regards

--
Tomas Vondra

Attachments:

v4-0001-Do-not-lock-in-BufferGetLSNAtomic-on-archs-with-8.patch (text/x-patch; +42/-39)
#8 Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#7)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

Hi,

On 2026-03-10 19:41:54 +0100, Tomas Vondra wrote:

On 2/5/26 16:38, Peter Geoghegan wrote:

On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:

Yeah, that was a quite big thinko. I have a attached a patch with the
thinko fixed but I am still not happy with it. I think I will try to use
atomics.h and see if that makes the code nicer to read.

Also will after that see what I can do about pageinspect.

I believe that pageinspect's heap_page_items function needs to use
get_page_from_raw -- see commit 14e9b18fed.

Maybe I'm slow today, but how's that related to the patch at hand?

Regardless of that, it seems a bit confusing to fold the pageinspect changes
into the main commit.

The only real change is about asserts in BufferGetLSNAtomic( - the new
"ifdef" branch only called PageGetLSN(), so it lost the checks that the
buffer is valid and pinned. Which seems not desirable.

In fact, looking at the existing code, shouldn't the asserts be before
the "fastpath" exit (for cases without hint bits or local buffers)?

The v4 changes both points. It adds the asserts to the new ifdef block,
and moves it up in the existing one.

regards

--
Tomas Vondra

From 4d6479b37e60015cc4cf55579a8ec51337675996 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 21 Nov 2025 10:15:06 +0100
Subject: [PATCH v4] Do not lock in BufferGetLSNAtomic() on archs with 8 byte
atomic reads

On platforms where we can read or write the whole LSN atomically, we do
not need to lock the buffer header to prevent torn LSNs. We can do this
only on platforms with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY, and when the
pd_lsn field is properly aligned.

For historical reasons the PageXLogRecPtr was defined as a struct with
two uint32 fields. This replaces it with a single uint64 value, to make
the intent clearer.

Idea by Andres Freund. Initial patch by Andreas Karlsson, improved by
Peter Geoghegan. Minor tweaks by me.

Author: Andreas Karlsson <andreas@proxel.se>
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: /messages/by-id/b6610c3b-3f59-465a-bdbb-8e9259f0abc4@proxel.se

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 5f3d083e938..efcf64169aa 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4628,33 +4628,42 @@ BufferIsPermanent(Buffer buffer)
/*
* BufferGetLSNAtomic
- *		Retrieves the LSN of the buffer atomically using a buffer header lock.
- *		This is necessary for some callers who may not have an exclusive lock
- *		on the buffer.
+ *		Retrieves the LSN of the buffer atomically.
+ *
+ *		On platforms without 8 byte atomic reads/write we need to take a
+ *		header lock. This is necessary for some callers who may not have an
+ *		exclusive lock on the buffer.
*/
XLogRecPtr
BufferGetLSNAtomic(Buffer buffer)
{
+#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
+	Assert(BufferIsValid(buffer));
+	Assert(BufferIsPinned(buffer));
+
+	return PageGetLSN(BufferGetPage(buffer));
+#else
char	   *page = BufferGetPage(buffer);
BufferDesc *bufHdr;
XLogRecPtr	lsn;
+	/* Make sure we've got a real buffer, and that we hold a pin on it. */
+	Assert(BufferIsValid(buffer));
+	Assert(BufferIsPinned(buffer));
+

Seems a bit silly to have these asserts duplicated. I'd probably just put the
body of the #else into an {} and then have the asserts before the ifdef.
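
I.e. something like (sketch only, the slow-path body elided):

```c
XLogRecPtr
BufferGetLSNAtomic(Buffer buffer)
{
	Assert(BufferIsValid(buffer));
	Assert(BufferIsPinned(buffer));

#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
	return PageGetLSN(BufferGetPage(buffer));
#else
	{
		/* slow path: take the buffer header lock, read the LSN, release */
		...
	}
#endif
}
```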

diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 92a6bb9b0c0..e994526ca52 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -91,24 +91,27 @@ typedef uint16 LocationIndex;
/*
- * For historical reasons, the 64-bit LSN value is stored as two 32-bit
- * values.
+ * For historical reasons, the storage of 64-bit LSN values depends on CPU
+ * endianness; PageXLogRecPtr used to be a struct consisting of two 32-bit
+ * values.  When reading (and writing) the pd_lsn field from page headers, the
+ * caller must convert from (and convert to) the platform's native endianness.
*/
-typedef struct
-{
-	uint32		xlogid;			/* high bits */
-	uint32		xrecoff;		/* low bits */
-} PageXLogRecPtr;
+typedef uint64 PageXLogRecPtr;

I think this should explain why we are storing it as a 64bit value (i.e. that
we need to be able to read it without tearing).

I suspect this should continue to be a struct (just containing a 64bit uint),
otherwise it'll be way way too easy to omit the conversion, due to C's weak
typedefs.

-static inline XLogRecPtr
-PageXLogRecPtrGet(PageXLogRecPtr val)
+/*
+ * Convert a  pd_lsn taken from a page header into its native
+ * uint64/PageXLogRecPtr representation
+ */

Odd double space before pd_lsn.

+static inline PageXLogRecPtr
+PageXLogRecPtrGet(PageXLogRecPtr pd_lsn)
{
-	return (uint64) val.xlogid << 32 | val.xrecoff;
+#ifdef WORDS_BIGENDIAN
+	return pd_lsn;
+#else
+	return (pd_lsn << 32) | (pd_lsn >> 32);
+#endif
}
-#define PageXLogRecPtrSet(ptr, lsn) \
-	((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
-
/*
* disk page organization
*
@@ -387,10 +390,11 @@ PageGetLSN(const PageData *page)
{
return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
}

I think this may need to actually use a volatile to force the read to happen
as the 64bit value, otherwise the compiler would be entirely free to implement
it as two 32bit reads.

static inline void
PageSetLSN(Page page, XLogRecPtr lsn)
{
-	PageXLogRecPtrSet(((PageHeader) page)->pd_lsn, lsn);
+	((PageHeader) page)->pd_lsn = PageXLogRecPtrGet(lsn);
}

Dito.

Seems also a bit misleading to document PageXLogRecPtrSet as "Convert a pd_lsn
taken from a page header into its native ..." when we use it for both
directions.

I think this probably also ought to assert that the page this is being set on
is properly aligned.

Greetings,

Andres Freund

#9 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#8)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

On 3/10/26 21:41, Andres Freund wrote:

Hi,

On 2026-03-10 19:41:54 +0100, Tomas Vondra wrote:

On 2/5/26 16:38, Peter Geoghegan wrote:

On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:

Yeah, that was a quite big thinko. I have a attached a patch with the
thinko fixed but I am still not happy with it. I think I will try to use
atomics.h and see if that makes the code nicer to read.

Also will after that see what I can do about pageinspect.

I believe that pageinspect's heap_page_items function needs to use
get_page_from_raw -- see commit 14e9b18fed.

Maybe I'm slow today, but how's that related to the patch at hand?

Regardless of that, it seems a bit confusing to fold the pageinspect changes
into the main commit.

That's because of alignment requirements, per this comment:

* On machines with MAXALIGN = 8, the payload of a bytea is not
* maxaligned, since it will start 4 bytes into a palloc'd value.
* PageHeaderData requires 8 byte alignment, so always use this
* function when accessing page header fields from a raw page bytea.

AFAIK proper alignment is required to make the load atomic.

XLogRecPtr
BufferGetLSNAtomic(Buffer buffer)
{
+#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
+	Assert(BufferIsValid(buffer));
+	Assert(BufferIsPinned(buffer));
+
+	return PageGetLSN(BufferGetPage(buffer));
+#else
char	   *page = BufferGetPage(buffer);
BufferDesc *bufHdr;
XLogRecPtr	lsn;
+	/* Make sure we've got a real buffer, and that we hold a pin on it. */
+	Assert(BufferIsValid(buffer));
+	Assert(BufferIsPinned(buffer));
+

Seems a bit silly to have these asserts duplicated. I'd probably just put the
body of the #else into an {} and then have the asserts before the ifdef.

OK, WFM.

diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 92a6bb9b0c0..e994526ca52 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -91,24 +91,27 @@ typedef uint16 LocationIndex;
/*
- * For historical reasons, the 64-bit LSN value is stored as two 32-bit
- * values.
+ * For historical reasons, the storage of 64-bit LSN values depends on CPU
+ * endianness; PageXLogRecPtr used to be a struct consisting of two 32-bit
+ * values.  When reading (and writing) the pd_lsn field from page headers, the
+ * caller must convert from (and convert to) the platform's native endianness.
*/
-typedef struct
-{
-	uint32		xlogid;			/* high bits */
-	uint32		xrecoff;		/* low bits */
-} PageXLogRecPtr;
+typedef uint64 PageXLogRecPtr;

I think this should explain why we are storing it as a 64bit value (i.e. that
we need to be able to read it without tearing).

I suspect this should continue to be a struct (just containing a 64bit uint),
otherwise it'll be way way too easy to omit the conversion, due to C's weak
typedefs.

I agree. It'd be trivial to call it with plain XLogRecPtr (especially
now, with a function handling both directions).

-static inline XLogRecPtr
-PageXLogRecPtrGet(PageXLogRecPtr val)
+/*
+ * Convert a  pd_lsn taken from a page header into its native
+ * uint64/PageXLogRecPtr representation
+ */

Odd double space before pd_lsn.

+static inline PageXLogRecPtr
+PageXLogRecPtrGet(PageXLogRecPtr pd_lsn)
{
-	return (uint64) val.xlogid << 32 | val.xrecoff;
+#ifdef WORDS_BIGENDIAN
+	return pd_lsn;
+#else
+	return (pd_lsn << 32) | (pd_lsn >> 32);
+#endif
}
-#define PageXLogRecPtrSet(ptr, lsn) \
-	((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
-
/*
* disk page organization
*
@@ -387,10 +390,11 @@ PageGetLSN(const PageData *page)
{
return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
}

I think this may need to actually use a volatile to force the read to happen
as the 64bit value, otherwise the compiler would be entirely free to implement
it as two 32bit reads.

I did that in v5 (I think). But at the same time I'm now rather confused
about the meaning of PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY. Because if a
well-aligned load/store of 8B can be implemented as two 32-bit reads
(with a "sequence point" in between), then how is that atomic?

I'm also a bit ... unsure about "volatile" in general. Is that actually
something the specification says is needed here, or is it just an
attempt to "disable optimizations" (like the split into two stores)?

static inline void
PageSetLSN(Page page, XLogRecPtr lsn)
{
-	PageXLogRecPtrSet(((PageHeader) page)->pd_lsn, lsn);
+	((PageHeader) page)->pd_lsn = PageXLogRecPtrGet(lsn);
}

Dito.

Seems also a bit misleading to document PageXLogRecPtrSet as "Convert a pd_lsn
taken from a page header into its native ..." when we use it for both
directions.

I agree, I found that confusing too. In the attached v5 I went back to
the separate get/set functions from v3. I think that's better/clearer.

regards

--
Tomas Vondra

Attachments:

v5-0001-Do-not-lock-in-BufferGetLSNAtomic-on-archs-with-8.patch (text/x-patch; +72/-47)
#10 Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#9)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

Hi,

On 2026-03-11 01:00:34 +0100, Tomas Vondra wrote:

On 3/10/26 21:41, Andres Freund wrote:

On 2026-03-10 19:41:54 +0100, Tomas Vondra wrote:

On 2/5/26 16:38, Peter Geoghegan wrote:

On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:

Yeah, that was a quite big thinko. I have a attached a patch with the
thinko fixed but I am still not happy with it. I think I will try to use
atomics.h and see if that makes the code nicer to read.

Also will after that see what I can do about pageinspect.

I believe that pageinspect's heap_page_items function needs to use
get_page_from_raw -- see commit 14e9b18fed.

Maybe I'm slow today, but how's that related to the patch at hand?

Regardless of that, it seems a bit confusing to fold the pageinspect changes
into the main commit.

That's because of alignment requirements, per this comment:

* On machines with MAXALIGN = 8, the payload of a bytea is not
* maxaligned, since it will start 4 bytes into a palloc'd value.
* PageHeaderData requires 8 byte alignment, so always use this
* function when accessing page header fields from a raw page bytea.

I guess we are now reading 8 bytes as one.

AFAIK proper alignment is required to make the load atomic.

Sure, but it's a copy of the page, so that'd itself would not matter.

+static inline PageXLogRecPtr
+PageXLogRecPtrGet(PageXLogRecPtr pd_lsn)
{
-	return (uint64) val.xlogid << 32 | val.xrecoff;
+#ifdef WORDS_BIGENDIAN
+	return pd_lsn;
+#else
+	return (pd_lsn << 32) | (pd_lsn >> 32);
+#endif
}
-#define PageXLogRecPtrSet(ptr, lsn) \
-	((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
-
/*
* disk page organization
*
@@ -387,10 +390,11 @@ PageGetLSN(const PageData *page)
{
return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
}

I think this may need to actually use a volatile to force the read to happen
as the 64bit value, otherwise the compiler would be entirely free to implement
it as two 32bit reads.

I did that in v5 (I think). But at the same time I'm now rather confused
about the meaning of PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY. Because if a
well-aligned load/store of 8B can be implemented as two 32-bit reads
(with a "sequence point" in between), then how is that atomic?

All PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY says is that the hardware can do 8
byte reads/writes without tearing. But that still requires 8 bytes
reads/writes to be done as one, not multiple instructions.

I'm also a bit ... unsure about "volatile" in general. Is that actually
something the specification says is needed here, or is it just an
attempt to "disable optimizations" (like the split into two stores)?

It's definitely suboptimal. We should use something like C11's
atomic_load()/atomic_store(). But we can't rely on that yet (it's not enabled
without extra flags in at least msvc).

The volatile does prevent the compiler from just doing a read/write twice or
never, just because it'd be nicer for code generation. But it doesn't
actually guarantee that the right instructions for reading/writing are
generated :(

Greetings,

Andres Freund

#11 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#10)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

On 3/11/26 01:21, Andres Freund wrote:

Hi,

On 2026-03-11 01:00:34 +0100, Tomas Vondra wrote:

On 3/10/26 21:41, Andres Freund wrote:

On 2026-03-10 19:41:54 +0100, Tomas Vondra wrote:

On 2/5/26 16:38, Peter Geoghegan wrote:

On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:

Yeah, that was a quite big thinko. I have a attached a patch with the
thinko fixed but I am still not happy with it. I think I will try to use
atomics.h and see if that makes the code nicer to read.

Also will after that see what I can do about pageinspect.

I believe that pageinspect's heap_page_items function needs to use
get_page_from_raw -- see commit 14e9b18fed.

Maybe I'm slow today, but how's that related to the patch at hand?

Regardless of that, it seems a bit confusing to fold the pageinspect changes
into the main commit.

That's because of alignment requirements, per this comment:

* On machines with MAXALIGN = 8, the payload of a bytea is not
* maxaligned, since it will start 4 bytes into a palloc'd value.
* PageHeaderData requires 8 byte alignment, so always use this
* function when accessing page header fields from a raw page bytea.

I guess we are now reading 8 bytes as one.

Not sure I understand. Are you saying it's OK or not?

AFAIK proper alignment is required to make the load atomic.

Sure, but it's a copy of the page, so that'd itself would not matter.

It does seem a bit unnecessary, in the sense that there shouldn't be
anything changing the data.

+static inline PageXLogRecPtr
+PageXLogRecPtrGet(PageXLogRecPtr pd_lsn)
{
-	return (uint64) val.xlogid << 32 | val.xrecoff;
+#ifdef WORDS_BIGENDIAN
+	return pd_lsn;
+#else
+	return (pd_lsn << 32) | (pd_lsn >> 32);
+#endif
}
-#define PageXLogRecPtrSet(ptr, lsn) \
-	((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
-
/*
* disk page organization
*
@@ -387,10 +390,11 @@ PageGetLSN(const PageData *page)
{
return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
}

I think this may need to actually use a volatile to force the read to happen
as the 64bit value, otherwise the compiler would be entirely free to implement
it as two 32bit reads.

I did that in v5 (I think). But at the same time I'm now rather confused
about the meaning of PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY. Because if a
well-aligned load/store of 8B can be implemented as two 32-bit reads
(with a "sequence point" in between), then how is that atomic?

All PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY says is that the hardware can do 8
byte reads/writes without tearing. But that still requires 8 bytes
reads/writes to be done as one, not multiple instructions.

OK

I'm also a bit ... unsure about "volatile" in general. Is that actually
something the specification says is needed here, or is it just an
attempt to "disable optimizations" (like the split into two stores)?

It's definitely suboptimal. We should use something C11's
atomic_load()/atomic_store(). But we can't rely on that yet (it's not enabled
without extra flags in at least msvc).

The volatile does prevent the compiler from just doing a read/write twice or
never, just because it'd be nicer for code generation. But it doesn't
actually guarantee that the right instructions for reading writing are
generated :(

That sounds a bit scary, TBH.

regards

--
Tomas Vondra

#12 Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#11)
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations

Hi,

On 2026-03-11 01:43:02 +0100, Tomas Vondra wrote:

Maybe I'm slow today, but how's that related to the patch at hand?

Regardless of that, it seems a bit confusing to fold the pageinspect changes
into the main commit.

That's because of alignment requirements, per this comment:

* On machines with MAXALIGN = 8, the payload of a bytea is not
* maxaligned, since it will start 4 bytes into a palloc'd value.
* PageHeaderData requires 8 byte alignment, so always use this
* function when accessing page header fields from a raw page bytea.

I guess we are now reading 8 bytes as one.

Not sure I understand. Are you saying it's OK or not?

I mean that this adds an 8 byte read, which alignment picky hardware could
complain about, in theory. So I guess I kinda see the point of the change
now.

I'm also a bit ... unsure about "volatile" in general. Is that actually
something the specification says is needed here, or is it just an
attempt to "disable optimizations" (like the split into two stores)?

It's definitely suboptimal. We should use something C11's
atomic_load()/atomic_store(). But we can't rely on that yet (it's not enabled
without extra flags in at least msvc).

The volatile does prevent the compiler from just doing a read/write twice or
never, just because it'd be nicer for code generation. But it doesn't
actually guarantee that the right instructions for reading writing are
generated :(

That sounds a bit scary, TBH.

Indeed. It's how atomic reads / writes have worked for quite a while though,
so I think we'll just have to live with it until we can rely on C11 atomics on
msvc.

From 794fb844266dcd8d97217da09b61c5c9cafc01d7 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 21 Nov 2025 10:15:06 +0100
Subject: [PATCH v5] Do not lock in BufferGetLSNAtomic() on archs with 8 byte
atomic reads

On platforms where we can read or write the whole LSN atomically, we do
not need to lock the buffer header to prevent torn LSNs. We can do this
only on platforms with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY, and when the
pd_lsn field is properly aligned.

For historical reasons the PageXLogRecPtr was defined as a struct with
two uint32 fields. This replaces it with a single uint64 value, to make
the intent clearer.

Idea by Andres Freund. Initial patch by Andreas Karlsson, improved by
Peter Geoghegan. Minor tweaks by me.

I'd add a note explaining why heapfuncs is updated.

/*
* disk page organization
@@ -385,12 +411,13 @@ PageGetMaxOffsetNumber(const PageData *page)
static inline XLogRecPtr
PageGetLSN(const PageData *page)
{
-	return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
+	return PageXLogRecPtrGet(&((const volatile PageHeaderData *) page)->pd_lsn);
}

I don't think this volatile is needed; the one in PageXLogRecPtrGet's
declaration should do the work.

Greetings,

Andres Freund