Re: Atomic ops for unlogged LSN

Started by John Morris · about 2 years ago · 20 messages
#1 John Morris
john.morris@crunchydata.com
1 attachment(s)

Keeping things up to date. Here is a rebased patch with no changes from the previous one.

* John Morris

Attachments:

atomic-lsn.patch (application/octet-stream)
From 982900a2c78840db7853c1bf61037c49af3e2db3 Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 25 Oct 2023 21:41:16 -0700
Subject: [PATCH] atomic lsn squashed

---
 src/backend/access/transam/xlog.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 40461923ea..03daa18e4c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -469,9 +469,8 @@ typedef struct XLogCtlData
 
 	XLogSegNo	lastRemovedSegNo;	/* latest removed/recycled XLOG segment */
 
-	/* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
-	XLogRecPtr	unloggedLSN;
-	slock_t		ulsn_lck;
+	/* Fake LSN counter for unlogged relations. Updated atomically. */
+	pg_atomic_uint64 unloggedLSN;
 
 	/* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
 	pg_time_t	lastSegSwitchTime;
@@ -4294,14 +4293,7 @@ DataChecksumsEnabled(void)
 XLogRecPtr
 GetFakeLSNForUnloggedRel(void)
 {
-	XLogRecPtr	nextUnloggedLSN;
-
-	/* increment the unloggedLSN counter, need SpinLock */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	nextUnloggedLSN = XLogCtl->unloggedLSN++;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
-
-	return nextUnloggedLSN;
+	return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
 }
 
 /*
@@ -4714,7 +4706,6 @@ XLOGShmemInit(void)
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 	SpinLockInit(&XLogCtl->info_lck);
-	SpinLockInit(&XLogCtl->ulsn_lck);
 }
 
 /*
@@ -5319,9 +5310,9 @@ StartupXLOG(void)
 	 * the unlogged LSN counter can be reset too.
 	 */
 	if (ControlFile->state == DB_SHUTDOWNED)
-		XLogCtl->unloggedLSN = ControlFile->unloggedLSN;
+		pg_atomic_init_u64(&XLogCtl->unloggedLSN, ControlFile->unloggedLSN);
 	else
-		XLogCtl->unloggedLSN = FirstNormalUnloggedLSN;
+		pg_atomic_init_u64(&XLogCtl->unloggedLSN, FirstNormalUnloggedLSN);
 
 	/*
 	 * Copy any missing timeline history files between 'now' and the recovery
@@ -6903,9 +6894,7 @@ CreateCheckPoint(int flags)
 	 * unused on non-shutdown checkpoints, but seems useful to store it always
 	 * for debugging purposes.
 	 */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
+	ControlFile->unloggedLSN = pg_atomic_read_u64(&XLogCtl->unloggedLSN);
 
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
-- 
2.33.0

#2 Nathan Bossart
nathandbossart@gmail.com
In reply to: John Morris (#1)

On Thu, Oct 26, 2023 at 03:00:58PM +0000, John Morris wrote:

Keeping things up to date. Here is a rebased patch with no changes from the previous one.

This patch looks a little different than the last version I see posted [0].
That last version of the patch (which appears to be just about committable)
still applies for me, too.

[0]: /messages/by-id/BYAPR13MB2677ED1797C81779D17B414CA03EA@BYAPR13MB2677.namprd13.prod.outlook.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#3 John Morris
john.morris@crunchydata.com
In reply to: Nathan Bossart (#2)

* This patch looks a little different than the last version I see posted [0].
* That last version of the patch (which appears to be just about committable)
* still applies for me, too.

My oops – I was looking at the wrong commit. The newer patch has already been committed, so pretend that last message didn’t happen. Thanks,
John

#4 John Morris
john.morris@crunchydata.com
In reply to: John Morris (#3)
1 attachment(s)

Here is what I meant to do earlier. As it turns out, this patch has not been merged yet.

This is a rebased version. Even though I labelled it “v3”, there should be no changes from “v2”.

Attachments:

atomicLSN-v3-Use-atomic-ops-for-unloggedLSNs.patch (application/octet-stream)
From 479bfdc34f6dc79cf31cdaac355465117b7a1992 Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 1 Nov 2023 13:16:38 -0700
Subject: [PATCH] Use atomic increment instead of a lock when advancing the
 unlogged LSN

---
 src/backend/access/transam/xlog.c | 48 +++++++++++++++----------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b541be8eec..d903f7280f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -469,9 +469,8 @@ typedef struct XLogCtlData
 
 	XLogSegNo	lastRemovedSegNo;	/* latest removed/recycled XLOG segment */
 
-	/* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
-	XLogRecPtr	unloggedLSN;
-	slock_t		ulsn_lck;
+	/* Fake LSN counter for unlogged relations. Updated atomically. */
+	pg_atomic_uint64 unloggedLSN;
 
 	/* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
 	pg_time_t	lastSegSwitchTime;
@@ -4287,21 +4286,16 @@ DataChecksumsEnabled(void)
  *
  * Each call generates an LSN that is greater than any previous value
  * returned. The current counter value is saved and restored across clean
- * shutdowns, but like unlogged relations, does not survive a crash. This can
- * be used in lieu of real LSN values returned by XLogInsert, if you need an
- * LSN-like increasing sequence of numbers without writing any WAL.
+ * shutdowns, but it is reset after a crash. Consequently, it should
+ * only be used for unlogged relations or for data which does not survive
+ * a crash. An unlogged LSN may be used in lieu of real LSN values
+ * returned by XLogInsert, if you need an LSN-like increasing sequence
+ * of numbers without writing any WAL.
  */
 XLogRecPtr
 GetFakeLSNForUnloggedRel(void)
 {
-	XLogRecPtr	nextUnloggedLSN;
-
-	/* increment the unloggedLSN counter, need SpinLock */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	nextUnloggedLSN = XLogCtl->unloggedLSN++;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
-
-	return nextUnloggedLSN;
+	return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
 }
 
 /*
@@ -4714,7 +4708,6 @@ XLOGShmemInit(void)
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 	SpinLockInit(&XLogCtl->info_lck);
-	SpinLockInit(&XLogCtl->ulsn_lck);
 }
 
 /*
@@ -5319,9 +5312,9 @@ StartupXLOG(void)
 	 * the unlogged LSN counter can be reset too.
 	 */
 	if (ControlFile->state == DB_SHUTDOWNED)
-		XLogCtl->unloggedLSN = ControlFile->unloggedLSN;
+		pg_atomic_init_u64(&XLogCtl->unloggedLSN, ControlFile->unloggedLSN);
 	else
-		XLogCtl->unloggedLSN = FirstNormalUnloggedLSN;
+		pg_atomic_init_u64(&XLogCtl->unloggedLSN, FirstNormalUnloggedLSN);
 
 	/*
 	 * Copy any missing timeline history files between 'now' and the recovery
@@ -6899,13 +6892,20 @@ CreateCheckPoint(int flags)
 	ControlFile->minRecoveryPointTLI = 0;
 
 	/*
-	 * Persist unloggedLSN value. It's reset on crash recovery, so this goes
-	 * unused on non-shutdown checkpoints, but seems useful to store it always
-	 * for debugging purposes.
-	 */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
+	 * Persist the unloggedLSN.
+	 * We have two cases to consider:
+	 *   1) We are shutting down. All other backends have finished, and
+	 *      the unloggedLSN isn't being updated any more. There are no race
+	 *      conditions. We just save the current value and restore it
+	 *      when we are restarting.
+	 *   2) We are actively running, say during a checkpoint.
+	 *      The unloggedLSN is being updated by other backends,
+	 *      but the value we write out doesn't matter.
+	 *      The unloggedLSN will be reset on crash recovery.
+	 *      We could simply write a zero, but the current value
+	 *      might be useful for debugging.
+	 */
+	ControlFile->unloggedLSN = pg_atomic_read_u64(&XLogCtl->unloggedLSN);
 
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
-- 
2.33.0

#5 Nathan Bossart
nathandbossart@gmail.com
In reply to: John Morris (#4)

On Wed, Nov 01, 2023 at 09:15:20PM +0000, John Morris wrote:

This is a rebased version. Even though I labelled it “v3”, there should be no changes from “v2”.

Thanks. I think this is almost ready, but I have to harp on the
pg_atomic_read_u64() business once more. The relevant comment in atomics.h
has this note:

* The read is guaranteed to return a value as it has been written by this or
* another process at some point in the past. There's however no cache
* coherency interaction guaranteeing the value hasn't since been written to
* again.

However unlikely, this seems to suggest that CreateCheckPoint() could see
an old value with your patch. Out of an abundance of caution, I'd
recommend changing this to pg_atomic_compare_exchange_u64() like
pg_atomic_read_u64_impl() does in generic.h.
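
Roughly, the generic implementation reads a 64-bit value like this (a
paraphrased sketch of pg_atomic_read_u64_impl() from generic.h, not the
verbatim source):

static inline uint64
pg_atomic_read_u64_impl(volatile pg_atomic_uint64 *ptr)
{
	uint64		old = 0;

	/*
	 * A compare/exchange with 0 always yields the current value: if the
	 * value is 0, the exchange succeeds and stores 0 back (a no-op);
	 * otherwise it fails and writes the current value into "old".
	 * Either way, the operation carries full barrier semantics.
	 */
	pg_atomic_compare_exchange_u64_impl(ptr, &old, 0);

	return old;
}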

@@ -4635,7 +4629,6 @@ XLOGShmemInit(void)

SpinLockInit(&XLogCtl->Insert.insertpos_lck);
SpinLockInit(&XLogCtl->info_lck);
- SpinLockInit(&XLogCtl->ulsn_lck);
}

Shouldn't we do the pg_atomic_init_u64() here? We can still set the
initial value in StartupXLOG(), but it might be safer to initialize the
variable where we are initializing the other shared memory stuff.

Since this isn't a tremendously performance-sensitive area, IMHO we should
code defensively to eliminate any doubts about correctness and to make it
easier to reason about.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#6 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#5)

On Thu, Nov 2, 2023 at 9:10 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Wed, Nov 01, 2023 at 09:15:20PM +0000, John Morris wrote:

This is a rebased version. Even though I labelled it “v3”, there should be no changes from “v2”.

Thanks. I think this is almost ready, but I have to harp on the
pg_atomic_read_u64() business once more. The relevant comment in atomics.h
has this note:

* The read is guaranteed to return a value as it has been written by this or
* another process at some point in the past. There's however no cache
* coherency interaction guaranteeing the value hasn't since been written to
* again.

However unlikely, this seems to suggest that CreateCheckPoint() could see
an old value with your patch. Out of an abundance of caution, I'd
recommend changing this to pg_atomic_compare_exchange_u64() like
pg_atomic_read_u64_impl() does in generic.h.

+1. pg_atomic_read_u64 provides no barrier semantics, whereas
pg_atomic_compare_exchange_u64 does. Without the barrier, we might read
a stale value while another backend is changing it. I think something
like the following, which provides full barrier semantics, looks fine
to me:

XLogRecPtr ulsn = 0;

/* on failure, the compare/exchange writes the current value into ulsn */
pg_atomic_compare_exchange_u64(&XLogCtl->unloggedLSN, &ulsn, 0);
ControlFile->unloggedLSN = ulsn;

@@ -4635,7 +4629,6 @@ XLOGShmemInit(void)

SpinLockInit(&XLogCtl->Insert.insertpos_lck);
SpinLockInit(&XLogCtl->info_lck);
- SpinLockInit(&XLogCtl->ulsn_lck);
}

Shouldn't we do the pg_atomic_init_u64() here? We can still set the
initial value in StartupXLOG(), but it might be safer to initialize the
variable where we are initializing the other shared memory stuff.

I think no one accesses the unloggedLSN in between
CreateSharedMemoryAndSemaphores -> XLOGShmemInit and StartupXLOG.
However, I see nothing wrong in doing
pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr); in
XLOGShmemInit.

Since this isn't a tremendously performance-sensitive area, IMHO we should
code defensively to eliminate any doubts about correctness and to make it
easier to reason about.

Right.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#7 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#5)
1 attachment(s)

On Wed, Nov 01, 2023 at 10:40:06PM -0500, Nathan Bossart wrote:

Since this isn't a tremendously performance-sensitive area, IMHO we should
code defensively to eliminate any doubts about correctness and to make it
easier to reason about.

Concretely, like this.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

unlogged_lsn_v4.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b541be8eec..1068f96c2d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -469,9 +469,8 @@ typedef struct XLogCtlData
 
 	XLogSegNo	lastRemovedSegNo;	/* latest removed/recycled XLOG segment */
 
-	/* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
-	XLogRecPtr	unloggedLSN;
-	slock_t		ulsn_lck;
+	/* Fake LSN counter, for unlogged relations. */
+	pg_atomic_uint64 unloggedLSN;
 
 	/* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
 	pg_time_t	lastSegSwitchTime;
@@ -4294,14 +4293,7 @@ DataChecksumsEnabled(void)
 XLogRecPtr
 GetFakeLSNForUnloggedRel(void)
 {
-	XLogRecPtr	nextUnloggedLSN;
-
-	/* increment the unloggedLSN counter, need SpinLock */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	nextUnloggedLSN = XLogCtl->unloggedLSN++;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
-
-	return nextUnloggedLSN;
+	return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
 }
 
 /*
@@ -4714,7 +4706,7 @@ XLOGShmemInit(void)
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 	SpinLockInit(&XLogCtl->info_lck);
-	SpinLockInit(&XLogCtl->ulsn_lck);
+	pg_atomic_init_u64(&XLogCtl->unloggedLSN, 0);
 }
 
 /*
@@ -5319,9 +5311,9 @@ StartupXLOG(void)
 	 * the unlogged LSN counter can be reset too.
 	 */
 	if (ControlFile->state == DB_SHUTDOWNED)
-		XLogCtl->unloggedLSN = ControlFile->unloggedLSN;
+		pg_atomic_write_u64(&XLogCtl->unloggedLSN, ControlFile->unloggedLSN);
 	else
-		XLogCtl->unloggedLSN = FirstNormalUnloggedLSN;
+		pg_atomic_write_u64(&XLogCtl->unloggedLSN, FirstNormalUnloggedLSN);
 
 	/*
 	 * Copy any missing timeline history files between 'now' and the recovery
@@ -6902,10 +6894,12 @@ CreateCheckPoint(int flags)
 	 * Persist unloggedLSN value. It's reset on crash recovery, so this goes
 	 * unused on non-shutdown checkpoints, but seems useful to store it always
 	 * for debugging purposes.
+	 *
+	 * pg_atomic_read_u64() isn't guaranteed to return the most up-to-date
+	 * value, so this is implemented via a compare/exchange with 0 to ensure
+	 * the necessary cache coherency interactions.
 	 */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
+	pg_atomic_compare_exchange_u64(&XLogCtl->unloggedLSN, &ControlFile->unloggedLSN, 0);
 
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
#8 John Morris
john.morris@crunchydata.com
In reply to: Nathan Bossart (#7)
1 attachment(s)

I incorporated your suggestions and added a few more. The changes are mainly related to catching potential errors if some basic assumptions aren’t met.

There are basically three assumptions, stated here as conditions we want to avoid:

* We should not get an unlogged LSN before reading the control file.
* We should not get an unlogged LSN when shutting down.
* The unlogged LSN written out during a checkpoint shouldn’t be used.

Your suggestion addressed the first problem, and it took only minor changes to address the other two.

The essential idea is that we set a value of zero in each of the three situations. Then we throw an Assert() if somebody tries to allocate an unlogged LSN with the value zero.

I found the comment about cache coherency a bit confusing. We are dealing with a single address, so there should be no memory ordering or coherency issues. (Did I misunderstand?) I see it more as a race condition. Rather than merely explaining why it shouldn’t happen, the new version verifies the assumptions and throws an Assert() if something goes wrong.

Attachments:

atomic_lsn_v5.patch (application/octet-stream)
From fa7d62f088e90b3d1c3a2e656dacd174299f538f Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Mon, 6 Nov 2023 15:53:07 -0800
Subject: [PATCH] atomic_lsn_v5

---
 src/backend/access/transam/xlog.c | 36 +++++++++++++++----------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b541be8eec..8f033749b3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -469,9 +469,8 @@ typedef struct XLogCtlData
 
 	XLogSegNo	lastRemovedSegNo;	/* latest removed/recycled XLOG segment */
 
-	/* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
-	XLogRecPtr	unloggedLSN;
-	slock_t		ulsn_lck;
+	/* Fake LSN counter, for unlogged relations. */
+	pg_atomic_uint64 unloggedLSN;
 
 	/* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
 	pg_time_t	lastSegSwitchTime;
@@ -4294,13 +4293,8 @@ DataChecksumsEnabled(void)
 XLogRecPtr
 GetFakeLSNForUnloggedRel(void)
 {
-	XLogRecPtr	nextUnloggedLSN;
-
-	/* increment the unloggedLSN counter, need SpinLock */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	nextUnloggedLSN = XLogCtl->unloggedLSN++;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
-
+	XLogRecPtr nextUnloggedLSN = pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
+	Assert(nextUnloggedLSN != 0);
 	return nextUnloggedLSN;
 }
 
@@ -4714,7 +4708,7 @@ XLOGShmemInit(void)
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 	SpinLockInit(&XLogCtl->info_lck);
-	SpinLockInit(&XLogCtl->ulsn_lck);
+	pg_atomic_init_u64(&XLogCtl->unloggedLSN, 0);
 }
 
 /*
@@ -5319,9 +5313,9 @@ StartupXLOG(void)
 	 * the unlogged LSN counter can be reset too.
 	 */
 	if (ControlFile->state == DB_SHUTDOWNED)
-		XLogCtl->unloggedLSN = ControlFile->unloggedLSN;
+		pg_atomic_write_u64(&XLogCtl->unloggedLSN, ControlFile->unloggedLSN);
 	else
-		XLogCtl->unloggedLSN = FirstNormalUnloggedLSN;
+		pg_atomic_write_u64(&XLogCtl->unloggedLSN, FirstNormalUnloggedLSN);
 
 	/*
 	 * Copy any missing timeline history files between 'now' and the recovery
@@ -6899,13 +6893,17 @@ CreateCheckPoint(int flags)
 	ControlFile->minRecoveryPointTLI = 0;
 
 	/*
-	 * Persist unloggedLSN value. It's reset on crash recovery, so this goes
-	 * unused on non-shutdown checkpoints, but seems useful to store it always
-	 * for debugging purposes.
+	 * On shutdown, persist the unlogged LSN and reset the counter to zero.
+	 * If somebody tries to allocate another unlogged LSN, we should get an Assert() error.
+	 * On a regular checkpoint, write a zero to the control file so we
+	 * get an Assert() error if somebody accidentally uses the value in the control file.
+	 * The unlogged LSN counter should be quiescent, but we use an atomic exchange
+	 * just in case it isn't.
 	 */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
+	if (shutdown)
+		ControlFile->unloggedLSN = pg_atomic_exchange_u64(&XLogCtl->unloggedLSN, 0);
+	else
+		ControlFile->unloggedLSN = 0;
 
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
-- 
2.33.0

#9 Nathan Bossart
nathandbossart@gmail.com
In reply to: John Morris (#8)

On Tue, Nov 07, 2023 at 12:57:32AM +0000, John Morris wrote:

I incorporated your suggestions and added a few more. The changes are
mainly related to catching potential errors if some basic assumptions
aren’t met.

Hm. Could we move that to a separate patch? We've lived without these
extra checks for a very long time, and I'm not aware of any related issues,
so I'm not sure it's worth the added complexity. And IMO it'd be better to
keep it separate from the initial atomics conversion, anyway.

I found the comment about cache coherency a bit confusing. We are dealing
with a single address, so there should be no memory ordering or coherency
issues. (Did I misunderstand?) I see it more as a race condition. Rather
than merely explaining why it shouldn’t happen, the new version verifies
the assumptions and throws an Assert() if something goes wrong.

I was thinking of the comment for pg_atomic_read_u32() that I cited earlier
[0]. This comment also notes that pg_atomic_read_u32/64() has no barrier
semantics. My interpretation of that comment is that these functions
provide no guarantee that the value returned is the most up-to-date value.
But my interpretation could be wrong, and maybe this is meant to highlight
that the value might change before we can use the return value in a
compare/exchange or something.

I spent a little time earlier today reviewing the various underlying
implementations, but apparently I need to spend some more time looking at
those...

[0]: /messages/by-id/20231102034006.GA85609@nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#10 Stephen Frost
sfrost@snowman.net
In reply to: Nathan Bossart (#9)

Greetings,

* Nathan Bossart (nathandbossart@gmail.com) wrote:

On Tue, Nov 07, 2023 at 12:57:32AM +0000, John Morris wrote:

I incorporated your suggestions and added a few more. The changes are
mainly related to catching potential errors if some basic assumptions
aren’t met.

Hm. Could we move that to a separate patch? We've lived without these
extra checks for a very long time, and I'm not aware of any related issues,
so I'm not sure it's worth the added complexity. And IMO it'd be better to
keep it separate from the initial atomics conversion, anyway.

I do see the value in adding in an Assert, though I don't want to throw
away the info about what the recent unlogged LSN was when we crash. As
that basically boils down to a one-line addition, I don't think it
really needs to be in a separate patch.

I found the comment about cache coherency a bit confusing. We are dealing
with a single address, so there should be no memory ordering or coherency
issues. (Did I misunderstand?) I see it more as a race condition. Rather
than merely explaining why it shouldn’t happen, the new version verifies
the assumptions and throws an Assert() if something goes wrong.

I was thinking of the comment for pg_atomic_read_u32() that I cited earlier
[0]. This comment also notes that pg_atomic_read_u32/64() has no barrier
semantics. My interpretation of that comment is that these functions
provide no guarantee that the value returned is the most up-to-date value.

There seems to be some serious misunderstanding about what is happening
here. The value written into the control file for unlogged LSN during
normal operation does *not* need to be the most up-to-date value and
talking about it as if it needs to be the absolutely most up-to-date and
correct value is, if anything, adding to the confusion, not reducing it.
The reason to write in anything other than a zero during
these routine checkpoints for unlogged LSN is entirely for forensics
purposes, not because we'll ever actually use the value- during crash
recovery and backup/restore, we're going to reset the unlogged LSN
counter anyway and we're going to throw away all of unlogged table
contents across the entire system.

We only care about the value of the unlogged LSN being correct during
normal shutdown when we're writing out the shutdown checkpoint, but by
that time everything else has been shut down and the value absolutely
should not be changing.

Thanks,

Stephen

#11 Nathan Bossart
nathandbossart@gmail.com
In reply to: Stephen Frost (#10)

On Tue, Nov 07, 2023 at 11:47:46AM -0500, Stephen Frost wrote:

We only care about the value of the unlogged LSN being correct during
normal shutdown when we're writing out the shutdown checkpoint, but by
that time everything else has been shut down and the value absolutely
should not be changing.

I agree that's all true. I'm trying to connect how this scenario ensures
we see the most up-to-date value in light of this comment above
pg_atomic_read_u32():

* The read is guaranteed to return a value as it has been written by this or
* another process at some point in the past. There's however no cache
* coherency interaction guaranteeing the value hasn't since been written to
* again.

Is there something special about all other backends being shut down that
ensures this returns the most up-to-date value and not something from "some
point in the past" as the stated contract for this function seems to
suggest?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#12 Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#11)

Hi,

On 2023-11-07 11:02:49 -0600, Nathan Bossart wrote:

On Tue, Nov 07, 2023 at 11:47:46AM -0500, Stephen Frost wrote:

We only care about the value of the unlogged LSN being correct during
normal shutdown when we're writing out the shutdown checkpoint, but by
that time everything else has been shut down and the value absolutely
should not be changing.

I agree that's all true. I'm trying to connect how this scenario ensures
we see the most up-to-date value in light of this comment above
pg_atomic_read_u32():

* The read is guaranteed to return a value as it has been written by this or
* another process at some point in the past. There's however no cache
* coherency interaction guaranteeing the value hasn't since been written to
* again.

Is there something special about all other backends being shut down that
ensures this returns the most up-to-date value and not something from "some
point in the past" as the stated contract for this function seems to
suggest?

Practically yes - getting to the point of writing the shutdown checkpoint
implies having gone through a bunch of code that implies memory barriers
(spinlocks, lwlocks).

However, even if there's likely some other implied memory barrier that we
could piggyback on, the patch is much simpler to understand if it doesn't change
coherency rules. There's no way the overhead could matter.

Greetings,

Andres Freund

#13 Andres Freund
andres@anarazel.de
In reply to: John Morris (#8)

Hi,

On 2023-11-07 00:57:32 +0000, John Morris wrote:

I found the comment about cache coherency a bit confusing. We are dealing
with a single address, so there should be no memory ordering or coherency
issues. (Did I misunderstand?) I see it more as a race condition.

IMO cache coherency covers the value a single variable has in different
threads / processes.

In fact, the only reason there effectively is a guarantee that you're not
seeing an outdated unloggedLSN value during shutdown checkpoints, even without
the spinlock or full barrier atomic op, is that the LWLockAcquire(), a few
lines above this, would prevent both the compiler and CPU from moving the read
of unloggedLSN to much earlier. Obviously that lwlock has a different
address...

If the patch had just done the minimal conversion, it'd already have been
committed... Even if there'd be a performance reason to get rid of the memory
barrier around reading unloggedLSN in CreateCheckPoint(), I'd do the
conversion in a separate commit.

Greetings,

Andres Freund

#14 Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#12)

On Tue, Nov 07, 2023 at 04:58:16PM -0800, Andres Freund wrote:

On 2023-11-07 11:02:49 -0600, Nathan Bossart wrote:

Is there something special about all other backends being shut down that
ensures this returns the most up-to-date value and not something from "some
point in the past" as the stated contract for this function seems to
suggest?

Practically yes - getting to the point of writing the shutdown checkpoint
implies having gone through a bunch of code that implies memory barriers
(spinlocks, lwlocks).

Sure.

However, even if there's likely some other implied memory barrier that we
could piggyback on, the patch much simpler to understand if it doesn't change
coherency rules. There's no way the overhead could matter.

I wonder if it's worth providing a set of "locked read" functions. Those
could just do a compare/exchange with 0 in the generic implementation. For
patches like this one where the overhead really shouldn't matter, I'd
encourage folks to use those to make it easy to reason about correctness.
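
For the generic implementation, a sketch might look like the following
(hypothetical code to illustrate the idea; real versions would live in
atomics.h with per-platform variants):

static inline uint64
pg_atomic_read_membarrier_u64(volatile pg_atomic_uint64 *ptr)
{
	uint64		expected = 0;

	/* on failure, the compare/exchange writes the current value here */
	pg_atomic_compare_exchange_u64(ptr, &expected, 0);

	return expected;
}

static inline void
pg_atomic_write_membarrier_u64(volatile pg_atomic_uint64 *ptr, uint64 val)
{
	/* an unconditional exchange provides full barrier semantics */
	(void) pg_atomic_exchange_u64(ptr, val);
}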

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#15 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#14)

On Thu, Nov 09, 2023 at 03:27:33PM -0600, Nathan Bossart wrote:

I wonder if it's worth providing a set of "locked read" functions. Those
could just do a compare/exchange with 0 in the generic implementation. For
patches like this one where the overhead really shouldn't matter, I'd
encourage folks to use those to make it easy to reason about correctness.

I moved this proposal to a new thread [0].

[0]: /messages/by-id/20231110205128.GB1315705@nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#16 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#15)
1 attachment(s)

Here is a new version of the patch that uses the new atomic read/write
functions with full barriers that were added in commit bd5132d. Thoughts?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-Convert-unloggedLSN-to-an-atomic-variable.patch (text/x-diff; charset=us-ascii)
From c40594c62ccfbf75cb0d3787cb9367d15ae37de8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Thu, 29 Feb 2024 10:25:46 -0600
Subject: [PATCH v6 1/1] Convert unloggedLSN to an atomic variable.

Currently, this variable is an XLogRecPtr protected by a spinlock.
By converting it to an atomic variable, we can remove the spinlock,
which saves a small amount of shared memory space.  Since this code
is not performance-critical, we use atomic operations with full
barrier semantics to make it easy to reason about correctness.

Author: John Morris
Reviewed-by: Michael Paquier, Robert Haas, Andres Freund, Stephen Frost, Bharath Rupireddy
Discussion: https://postgr.es/m/BYAPR13MB26772534335255E50318C574A0409%40BYAPR13MB2677.namprd13.prod.outlook.com
Discussion: https://postgr.es/m/MN2PR13MB2688FD8B757316CB5C54C8A2A0DDA%40MN2PR13MB2688.namprd13.prod.outlook.com
---
 src/backend/access/transam/xlog.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c1162d55bf..eb02e3b6a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -470,9 +470,8 @@ typedef struct XLogCtlData
 
 	XLogSegNo	lastRemovedSegNo;	/* latest removed/recycled XLOG segment */
 
-	/* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
-	XLogRecPtr	unloggedLSN;
-	slock_t		ulsn_lck;
+	/* Fake LSN counter, for unlogged relations. */
+	pg_atomic_uint64 unloggedLSN;
 
 	/* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
 	pg_time_t	lastSegSwitchTime;
@@ -4498,14 +4497,7 @@ DataChecksumsEnabled(void)
 XLogRecPtr
 GetFakeLSNForUnloggedRel(void)
 {
-	XLogRecPtr	nextUnloggedLSN;
-
-	/* increment the unloggedLSN counter, need SpinLock */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	nextUnloggedLSN = XLogCtl->unloggedLSN++;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
-
-	return nextUnloggedLSN;
+	return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
 }
 
 /*
@@ -4921,7 +4913,7 @@ XLOGShmemInit(void)
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 	SpinLockInit(&XLogCtl->info_lck);
-	SpinLockInit(&XLogCtl->ulsn_lck);
+	pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr);
 }
 
 /*
@@ -5526,9 +5518,11 @@ StartupXLOG(void)
 	 * the unlogged LSN counter can be reset too.
 	 */
 	if (ControlFile->state == DB_SHUTDOWNED)
-		XLogCtl->unloggedLSN = ControlFile->unloggedLSN;
+		pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
+									   ControlFile->unloggedLSN);
 	else
-		XLogCtl->unloggedLSN = FirstNormalUnloggedLSN;
+		pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
+									   FirstNormalUnloggedLSN);
 
 	/*
 	 * Copy any missing timeline history files between 'now' and the recovery
@@ -7110,9 +7104,7 @@ CreateCheckPoint(int flags)
 	 * unused on non-shutdown checkpoints, but seems useful to store it always
 	 * for debugging purposes.
 	 */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
+	ControlFile->unloggedLSN = pg_atomic_read_membarrier_u64(&XLogCtl->unloggedLSN);
 
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
-- 
2.25.1

#17 Stephen Frost
sfrost@snowman.net
In reply to: Nathan Bossart (#16)

Greetings,

* Nathan Bossart (nathandbossart@gmail.com) wrote:

Here is a new version of the patch that uses the new atomic read/write
functions with full barriers that were added in commit bd5132d. Thoughts?

Saw that commit go in; glad to see it. Thanks for updating this patch
too. The changes look good to me.

Thanks again,

Stephen

#18 John Morris
john.morris@crunchydata.com
In reply to: Nathan Bossart (#16)

Looks good to me.

* John


#19 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#16)

On Thu, Feb 29, 2024 at 10:04 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

Here is a new version of the patch that uses the new atomic read/write
functions with full barriers that were added in commit bd5132d. Thoughts?

Thanks for getting the other patch in. The attached v6 patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20 Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#19)

Committed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com