DSA_ALLOC_NO_OOM doesn't work

Started by Heikki Linnakangas · almost 2 years ago · 9 messages
#1 Heikki Linnakangas
hlinnaka@iki.fi
1 attachment(s)

If you call dsa_allocate_extended(DSA_ALLOC_NO_OOM), it will still
ereport an error if you run out of space (originally reported at [0]).

Attached patch adds code to test_dsa.c to demonstrate that:

postgres=# select test_dsa_basic();
ERROR: could not resize shared memory segment "/PostgreSQL.1312700148"
to 1075843072 bytes: No space left on device

[0]: https://github.com/pgvector/pgvector/issues/434#issuecomment-1912744489
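
For context, the intended contract, seen from the caller's side, is roughly the following minimal sketch; dsa_allocate_extended(), DsaPointerIsValid() and InvalidDsaPointer are the existing DSA API, while the function name and the fallback branch are only illustrative:

#include "postgres.h"
#include "utils/dsa.h"

/* Allocate 'size' bytes from 'area', falling back gracefully on OOM. */
static dsa_pointer
alloc_or_give_up(dsa_area *area, size_t size)
{
	dsa_pointer dp;

	/* With DSA_ALLOC_NO_OOM, running out of memory should not ereport()... */
	dp = dsa_allocate_extended(area, size, DSA_ALLOC_NO_OOM);

	/* ...the call should return InvalidDsaPointer instead, so we can recover. */
	if (!DsaPointerIsValid(dp))
	{
		/* e.g. evict caches and retry, or report a soft failure to the caller */
		return InvalidDsaPointer;
	}

	return dp;
}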

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

test-DSA_ALLOC_NO_OOM.patch (text/x-patch)
diff --git a/src/test/modules/test_dsa/test_dsa.c b/src/test/modules/test_dsa/test_dsa.c
index 844316dec2b..f37eb57e99d 100644
--- a/src/test/modules/test_dsa/test_dsa.c
+++ b/src/test/modules/test_dsa/test_dsa.c
@@ -51,6 +51,19 @@ test_dsa_basic(PG_FUNCTION_ARGS)
 	for (int i = 0; i < 100; i++)
 	{
 		dsa_free(a, p[i]);
+		p[i] = InvalidDsaPointer;
+	}
+
+	for (int i = 0; i < 100; i++)
+		p[i] = dsa_allocate_extended(a, 1024*1024*1024, DSA_ALLOC_NO_OOM | DSA_ALLOC_HUGE);
+
+	for (int i = 0; i < 100; i++)
+	{
+		if (p[i] != InvalidDsaPointer)
+		{
+			dsa_free(a, p[i]);
+			p[i] = InvalidDsaPointer;
+		}
 	}
 
 	dsa_detach(a);
#2 Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Heikki Linnakangas (#1)
2 attachment(s)
Re: DSA_ALLOC_NO_OOM doesn't work

(moving to pgsql-hackers)

On 29/01/2024 14:06, Heikki Linnakangas wrote:

> If you call dsa_allocate_extended(DSA_ALLOC_NO_OOM), it will still
> ereport an error if you run out of space (originally reported at [0]).
>
> Attached patch adds code to test_dsa.c to demonstrate that:
>
> postgres=# select test_dsa_basic();
> ERROR: could not resize shared memory segment "/PostgreSQL.1312700148"
> to 1075843072 bytes: No space left on device
>
> [0] https://github.com/pgvector/pgvector/issues/434#issuecomment-1912744489

I wrote the attached patch to address this, in a fairly straightforward
or naive way. The problem was that even though dsa_allocate() had code
to check the return value of dsm_create(), and return NULL instead of
ereport(ERROR) if the DSA_ALLOC_NO_OOM flag was set, dsm_create() does not
in fact return NULL on OOM. To fix, I added a DSM_CREATE_NO_OOM option to
dsm_create(), and I also had to punch that through to dsm_impl_op().

This is a little unpolished, but if we want to backpatch something
narrow now, this would probably be the right approach.

However, I must say that the dsm_impl_op() interface is absolutely
insane. It's like someone looked at ioctl() and thought, "hey that's a
great idea!". It mixes all different operations like creating or
destroying a DSM segment together into one function call, and the return
value is just a boolean, even though the function could fail for many
different reasons, and the callers do in fact care about the reason. In
a more natural interface, the different operations would have very
different function signatures.
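
For illustration only, per-operation entry points could look roughly like this; these prototypes are hypothetical and exist neither in the tree nor in the attached patches:

/* Hypothetical per-operation prototypes, for illustration only. */
extern dsm_handle dsm_impl_create(Size request_size, void **impl_private,
								  void **mapped_address, Size *mapped_size,
								  int elevel);
extern bool dsm_impl_attach(dsm_handle handle, void **impl_private,
							void **mapped_address, Size *mapped_size,
							int elevel);
extern bool dsm_impl_detach(dsm_handle handle, void **impl_private,
							void **mapped_address, Size *mapped_size,
							int elevel);
extern bool dsm_impl_destroy(dsm_handle handle, int elevel);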

I think we must refactor that. It might be best to leave this
DSA_ALLOC_NO_OOM bug unfixed in backpatches, and fix it on top of the
refactorings on master only. Later, we can backpatch the refactorings
too if we're comfortable with it; extensions shouldn't be using the
dsm_impl_op() interface directly.

(I skimmed through the thread where the DSM code was added, but didn't
see any mention of why the dsm_impl_op() signature is like that:
https://www.postgresql.org/message-id/CA+TgmoaDqDUgt=4Zs_QPOnBt=EstEaVNP+5t+m=FPNWshiPR3A@mail.gmail.com)

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Add-test.patch (text/x-patch)
From b60f877ad67e70b60915b2b25d0dbae6972c3536 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 13 Feb 2024 14:22:10 +0200
Subject: [PATCH 1/2] Add test

Discussion: https://www.postgresql.org/message-id/5efa4a5e-2b8b-42dd-80ed-f920718cf5c0@iki.fi
---
 src/test/modules/test_dsa/test_dsa.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/test/modules/test_dsa/test_dsa.c b/src/test/modules/test_dsa/test_dsa.c
index 844316dec2b..f37eb57e99d 100644
--- a/src/test/modules/test_dsa/test_dsa.c
+++ b/src/test/modules/test_dsa/test_dsa.c
@@ -51,6 +51,19 @@ test_dsa_basic(PG_FUNCTION_ARGS)
 	for (int i = 0; i < 100; i++)
 	{
 		dsa_free(a, p[i]);
+		p[i] = InvalidDsaPointer;
+	}
+
+	for (int i = 0; i < 100; i++)
+		p[i] = dsa_allocate_extended(a, 1024*1024*1024, DSA_ALLOC_NO_OOM | DSA_ALLOC_HUGE);
+
+	for (int i = 0; i < 100; i++)
+	{
+		if (p[i] != InvalidDsaPointer)
+		{
+			dsa_free(a, p[i]);
+			p[i] = InvalidDsaPointer;
+		}
 	}
 
 	dsa_detach(a);
-- 
2.39.2

0002-Fix-DSA_ALLOC_NO_OOM.patch (text/x-patch)
From fb20d55111d726dca247b2111a175f30d412e35a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 13 Feb 2024 15:52:48 +0200
Subject: [PATCH 2/2] Fix DSA_ALLOC_NO_OOM

---
 src/backend/access/common/session.c   |  2 +-
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/storage/ipc/dsm.c         | 43 +++++++++-----
 src/backend/storage/ipc/dsm_impl.c    | 84 ++++++++++++++++++++-------
 src/backend/utils/mmgr/dsa.c          |  2 +-
 src/include/storage/dsm.h             |  1 +
 src/include/storage/dsm_impl.h        |  4 +-
 7 files changed, 97 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/common/session.c b/src/backend/access/common/session.c
index 3f2256f4915..5d4fe6dbb7a 100644
--- a/src/backend/access/common/session.c
+++ b/src/backend/access/common/session.c
@@ -102,7 +102,7 @@ GetSessionDsmHandle(void)
 
 	/* Set up segment and TOC. */
 	size = shm_toc_estimate(&estimator);
-	seg = dsm_create(size, DSM_CREATE_NULL_IF_MAXSEGMENTS);
+	seg = dsm_create(size, DSM_CREATE_NULL_IF_MAXSEGMENTS | DSM_CREATE_NO_OOM);
 	if (seg == NULL)
 	{
 		MemoryContextSwitchTo(old_context);
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 849a03e4b65..df1e5e7145d 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -312,7 +312,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	 */
 	segsize = shm_toc_estimate(&pcxt->estimator);
 	if (pcxt->nworkers > 0)
-		pcxt->seg = dsm_create(segsize, DSM_CREATE_NULL_IF_MAXSEGMENTS);
+		pcxt->seg = dsm_create(segsize, DSM_CREATE_NULL_IF_MAXSEGMENTS | DSM_CREATE_NO_OOM);
 	if (pcxt->seg != NULL)
 		pcxt->toc = shm_toc_create(PARALLEL_MAGIC,
 								   dsm_segment_address(pcxt->seg),
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 6b12108dd10..2d2596fbe2d 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -214,7 +214,7 @@ dsm_postmaster_startup(PGShmemHeader *shim)
 			continue;
 		if (dsm_impl_op(DSM_OP_CREATE, dsm_control_handle, segsize,
 						&dsm_control_impl_private, &dsm_control_address,
-						&dsm_control_mapped_size, ERROR))
+						&dsm_control_mapped_size, ERROR, 0))
 			break;
 	}
 	dsm_control = dsm_control_address;
@@ -255,7 +255,7 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 	 * out quietly.
 	 */
 	if (!dsm_impl_op(DSM_OP_ATTACH, old_control_handle, 0, &impl_private,
-					 &mapped_address, &mapped_size, DEBUG1))
+					 &mapped_address, &mapped_size, DEBUG1, 0))
 		return;
 
 	/*
@@ -266,7 +266,7 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 	if (!dsm_control_segment_sane(old_control, mapped_size))
 	{
 		dsm_impl_op(DSM_OP_DETACH, old_control_handle, 0, &impl_private,
-					&mapped_address, &mapped_size, LOG);
+					&mapped_address, &mapped_size, LOG, 0);
 		return;
 	}
 
@@ -296,7 +296,7 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 
 		/* Destroy the referenced segment. */
 		dsm_impl_op(DSM_OP_DESTROY, handle, 0, &junk_impl_private,
-					&junk_mapped_address, &junk_mapped_size, LOG);
+					&junk_mapped_address, &junk_mapped_size, LOG, 0);
 	}
 
 	/* Destroy the old control segment, too. */
@@ -304,7 +304,7 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 		 "cleaning up dynamic shared memory control segment with ID %u",
 		 old_control_handle);
 	dsm_impl_op(DSM_OP_DESTROY, old_control_handle, 0, &impl_private,
-				&mapped_address, &mapped_size, LOG);
+				&mapped_address, &mapped_size, LOG, 0);
 }
 
 /*
@@ -400,7 +400,7 @@ dsm_postmaster_shutdown(int code, Datum arg)
 
 		/* Destroy the segment. */
 		dsm_impl_op(DSM_OP_DESTROY, handle, 0, &junk_impl_private,
-					&junk_mapped_address, &junk_mapped_size, LOG);
+					&junk_mapped_address, &junk_mapped_size, LOG, 0);
 	}
 
 	/* Remove the control segment itself. */
@@ -410,7 +410,7 @@ dsm_postmaster_shutdown(int code, Datum arg)
 	dsm_control_address = dsm_control;
 	dsm_impl_op(DSM_OP_DESTROY, dsm_control_handle, 0,
 				&dsm_control_impl_private, &dsm_control_address,
-				&dsm_control_mapped_size, LOG);
+				&dsm_control_mapped_size, LOG, 0);
 	dsm_control = dsm_control_address;
 	shim->dsm_control = 0;
 }
@@ -512,6 +512,8 @@ dsm_shmem_init(void)
  * remains attached until explicitly detached or the session ends.
  * Creating with a NULL CurrentResourceOwner is equivalent to creating
  * with a non-NULL CurrentResourceOwner and then calling dsm_pin_mapping.
+ *
+ * XXX: explain flags
  */
 dsm_segment *
 dsm_create(Size size, int flags)
@@ -569,14 +571,25 @@ dsm_create(Size size, int flags)
 			LWLockRelease(DynamicSharedMemoryControlLock);
 		for (;;)
 		{
+			int			impl_flags = ((flags & DSM_CREATE_NO_OOM) != 0) ? DSM_OP_CREATE_NO_OOM : 0;
+
 			Assert(seg->mapped_address == NULL && seg->mapped_size == 0);
 			/* Use even numbers only */
 			seg->handle = pg_prng_uint32(&pg_global_prng_state) << 1;
 			if (seg->handle == DSM_HANDLE_INVALID)	/* Reserve sentinel */
 				continue;
+			errno = 0;
 			if (dsm_impl_op(DSM_OP_CREATE, seg->handle, size, &seg->impl_private,
-							&seg->mapped_address, &seg->mapped_size, ERROR))
+							&seg->mapped_address, &seg->mapped_size, ERROR, impl_flags))
 				break;
+			if ((flags & DSM_CREATE_NO_OOM) != 0 && (errno == ENOMEM || errno == ENOSPC))
+			{
+				if (seg->resowner != NULL)
+					ResourceOwnerForgetDSM(seg->resowner, seg);
+				dlist_delete(&seg->node);
+				pfree(seg);
+				return NULL;
+			}
 		}
 		LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
 	}
@@ -614,13 +627,13 @@ dsm_create(Size size, int flags)
 		LWLockRelease(DynamicSharedMemoryControlLock);
 		if (!using_main_dsm_region)
 			dsm_impl_op(DSM_OP_DESTROY, seg->handle, 0, &seg->impl_private,
-						&seg->mapped_address, &seg->mapped_size, WARNING);
+						&seg->mapped_address, &seg->mapped_size, WARNING, 0);
 		if (seg->resowner != NULL)
 			ResourceOwnerForgetDSM(seg->resowner, seg);
 		dlist_delete(&seg->node);
 		pfree(seg);
 
-		if ((flags & DSM_CREATE_NULL_IF_MAXSEGMENTS) != 0)
+		if ((flags & (DSM_CREATE_NULL_IF_MAXSEGMENTS | DSM_CREATE_NO_OOM)) != 0)
 			return NULL;
 		ereport(ERROR,
 				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
@@ -744,7 +757,7 @@ dsm_attach(dsm_handle h)
 	/* Here's where we actually try to map the segment. */
 	if (!is_main_region_dsm_handle(seg->handle))
 		dsm_impl_op(DSM_OP_ATTACH, seg->handle, 0, &seg->impl_private,
-					&seg->mapped_address, &seg->mapped_size, ERROR);
+					&seg->mapped_address, &seg->mapped_size, ERROR, 0);
 
 	return seg;
 }
@@ -788,7 +801,7 @@ dsm_detach_all(void)
 	if (control_address != NULL)
 		dsm_impl_op(DSM_OP_DETACH, dsm_control_handle, 0,
 					&dsm_control_impl_private, &control_address,
-					&dsm_control_mapped_size, ERROR);
+					&dsm_control_mapped_size, ERROR, 0);
 }
 
 /*
@@ -841,7 +854,7 @@ dsm_detach(dsm_segment *seg)
 	{
 		if (!is_main_region_dsm_handle(seg->handle))
 			dsm_impl_op(DSM_OP_DETACH, seg->handle, 0, &seg->impl_private,
-						&seg->mapped_address, &seg->mapped_size, WARNING);
+						&seg->mapped_address, &seg->mapped_size, WARNING, 0);
 		seg->impl_private = NULL;
 		seg->mapped_address = NULL;
 		seg->mapped_size = 0;
@@ -883,7 +896,7 @@ dsm_detach(dsm_segment *seg)
 			 */
 			if (is_main_region_dsm_handle(seg->handle) ||
 				dsm_impl_op(DSM_OP_DESTROY, seg->handle, 0, &seg->impl_private,
-							&seg->mapped_address, &seg->mapped_size, WARNING))
+							&seg->mapped_address, &seg->mapped_size, WARNING, 0))
 			{
 				LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
 				if (is_main_region_dsm_handle(seg->handle))
@@ -1055,7 +1068,7 @@ dsm_unpin_segment(dsm_handle handle)
 		 */
 		if (is_main_region_dsm_handle(handle) ||
 			dsm_impl_op(DSM_OP_DESTROY, handle, 0, &junk_impl_private,
-						&junk_mapped_address, &junk_mapped_size, WARNING))
+						&junk_mapped_address, &junk_mapped_size, WARNING, 0))
 		{
 			LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
 			if (is_main_region_dsm_handle(handle))
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 03aa47a1049..693caf8a64b 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -72,23 +72,23 @@
 #ifdef USE_DSM_POSIX
 static bool dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 						   void **impl_private, void **mapped_address,
-						   Size *mapped_size, int elevel);
+						   Size *mapped_size, int elevel, int flags);
 static int	dsm_impl_posix_resize(int fd, off_t size);
 #endif
 #ifdef USE_DSM_SYSV
 static bool dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 						  void **impl_private, void **mapped_address,
-						  Size *mapped_size, int elevel);
+						  Size *mapped_size, int elevel, int flags);
 #endif
 #ifdef USE_DSM_WINDOWS
 static bool dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
 							 void **impl_private, void **mapped_address,
-							 Size *mapped_size, int elevel);
+							 Size *mapped_size, int elevel, int flags);
 #endif
 #ifdef USE_DSM_MMAP
 static bool dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
 						  void **impl_private, void **mapped_address,
-						  Size *mapped_size, int elevel);
+						  Size *mapped_size, int elevel, int flags);
 #endif
 static int	errcode_for_dynamic_shared_memory(void);
 
@@ -153,38 +153,45 @@ int			min_dynamic_shared_memory;
  * a message should first be logged at the specified elevel, except in the
  * case where DSM_OP_CREATE experiences a name collision, which should
  * silently return false.
+ *
+ * If op is DSM_OP_CREATE and the DSM_OP_CREATE_NO_OOM flag is set, returns
+ * false on a failure caused by running out of memory or disk space,
+ * regardless of elevel. That can be distinguished from a name collision by
+ * checking if 'errno' is ENOSPC or ENOMEM.
  *-----
  */
 bool
 dsm_impl_op(dsm_op op, dsm_handle handle, Size request_size,
 			void **impl_private, void **mapped_address, Size *mapped_size,
-			int elevel)
+			int elevel, int flags)
 {
 	Assert(op == DSM_OP_CREATE || request_size == 0);
 	Assert((op != DSM_OP_CREATE && op != DSM_OP_ATTACH) ||
 		   (*mapped_address == NULL && *mapped_size == 0));
+	Assert((op == DSM_OP_CREATE && (flags & ~DSM_OP_CREATE_NO_OOM) == 0) ||
+		   flags == 0);
 
 	switch (dynamic_shared_memory_type)
 	{
 #ifdef USE_DSM_POSIX
 		case DSM_IMPL_POSIX:
 			return dsm_impl_posix(op, handle, request_size, impl_private,
-								  mapped_address, mapped_size, elevel);
+								  mapped_address, mapped_size, elevel, flags);
 #endif
 #ifdef USE_DSM_SYSV
 		case DSM_IMPL_SYSV:
 			return dsm_impl_sysv(op, handle, request_size, impl_private,
-								 mapped_address, mapped_size, elevel);
+								 mapped_address, mapped_size, elevel, flags);
 #endif
 #ifdef USE_DSM_WINDOWS
 		case DSM_IMPL_WINDOWS:
 			return dsm_impl_windows(op, handle, request_size, impl_private,
-									mapped_address, mapped_size, elevel);
+									mapped_address, mapped_size, elevel, flags);
 #endif
 #ifdef USE_DSM_MMAP
 		case DSM_IMPL_MMAP:
 			return dsm_impl_mmap(op, handle, request_size, impl_private,
-								 mapped_address, mapped_size, elevel);
+								 mapped_address, mapped_size, elevel, flags);
 #endif
 		default:
 			elog(ERROR, "unexpected dynamic shared memory type: %d",
@@ -211,10 +218,10 @@ dsm_impl_op(dsm_op op, dsm_handle handle, Size request_size,
 static bool
 dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 			   void **impl_private, void **mapped_address, Size *mapped_size,
-			   int elevel)
+			   int elevel, int flags)
 {
 	char		name[64];
-	int			flags;
+	int			open_flags;
 	int			fd;
 	char	   *address;
 
@@ -255,10 +262,13 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 	 */
 	ReserveExternalFD();
 
-	flags = O_RDWR | (op == DSM_OP_CREATE ? O_CREAT | O_EXCL : 0);
-	if ((fd = shm_open(name, flags, PG_FILE_MODE_OWNER)) == -1)
+	open_flags = O_RDWR | (op == DSM_OP_CREATE ? O_CREAT | O_EXCL : 0);
+	if ((fd = shm_open(name, open_flags, PG_FILE_MODE_OWNER)) == -1)
 	{
 		ReleaseExternalFD();
+		if (op == DSM_OP_CREATE && (flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+			(errno == ENOMEM || errno == ENOSPC))
+			return false;
 		if (op == DSM_OP_ATTACH || errno != EEXIST)
 			ereport(elevel,
 					(errcode_for_dynamic_shared_memory(),
@@ -304,6 +314,9 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 		shm_unlink(name);
 		errno = save_errno;
 
+		if (op == DSM_OP_CREATE && (flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+			(errno == ENOMEM || errno == ENOSPC))
+			return false;
 		ereport(elevel,
 				(errcode_for_dynamic_shared_memory(),
 				 errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
@@ -326,6 +339,9 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 			shm_unlink(name);
 		errno = save_errno;
 
+		if (op == DSM_OP_CREATE && (flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+			(errno == ENOMEM || errno == ENOSPC))
+			return false;
 		ereport(elevel,
 				(errcode_for_dynamic_shared_memory(),
 				 errmsg("could not map shared memory segment \"%s\": %m",
@@ -422,7 +438,7 @@ dsm_impl_posix_resize(int fd, off_t size)
 static bool
 dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 			  void **impl_private, void **mapped_address, Size *mapped_size,
-			  int elevel)
+			  int elevel, int flags)
 {
 	key_t		key;
 	int			ident;
@@ -484,13 +500,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 	}
 	else
 	{
-		int			flags = IPCProtection;
+		int			ipc_flags = IPCProtection;
 		size_t		segsize;
 
 		/*
 		 * Allocate the memory BEFORE acquiring the resource, so that we don't
 		 * leak the resource if memory allocation fails.
 		 */
+		// FIXME: NO_OOM
 		ident_cache = MemoryContextAlloc(TopMemoryContext, sizeof(int));
 
 		/*
@@ -502,16 +519,20 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 
 		if (op == DSM_OP_CREATE)
 		{
-			flags |= IPC_CREAT | IPC_EXCL;
+			ipc_flags |= IPC_CREAT | IPC_EXCL;
 			segsize = request_size;
 		}
 
-		if ((ident = shmget(key, segsize, flags)) == -1)
+		if ((ident = shmget(key, segsize, ipc_flags)) == -1)
 		{
+			if (op == DSM_OP_CREATE && (flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+				(errno == ENOMEM || errno == ENOSPC))
+				return false;
 			if (op == DSM_OP_ATTACH || errno != EEXIST)
 			{
 				int			save_errno = errno;
 
+				// FIXME: do we leak 'ident_cache' otherwise?
 				pfree(ident_cache);
 				errno = save_errno;
 				ereport(elevel,
@@ -579,6 +600,9 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 			shmctl(ident, IPC_RMID, NULL);
 		errno = save_errno;
 
+		if (op == DSM_OP_CREATE && (flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+			(errno == ENOMEM || errno == ENOSPC))
+			return false;
 		ereport(elevel,
 				(errcode_for_dynamic_shared_memory(),
 				 errmsg("could not map shared memory segment \"%s\": %m",
@@ -609,7 +633,7 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 static bool
 dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
 				 void **impl_private, void **mapped_address,
-				 Size *mapped_size, int elevel)
+				 Size *mapped_size, int elevel, int flags)
 {
 	char	   *address;
 	HANDLE		hmap;
@@ -702,6 +726,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
 		if (!hmap)
 		{
 			_dosmaperr(errcode);
+
+			if ((flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+				(errno == ENOMEM || errno == ENOSPC))
+				return false;
 			ereport(elevel,
 					(errcode_for_dynamic_shared_memory(),
 					 errmsg("could not create shared memory segment \"%s\": %m",
@@ -738,6 +766,9 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
 		CloseHandle(hmap);
 		errno = save_errno;
 
+		if (op == DSM_OP_CREATE && (flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+			(errno == ENOMEM || errno == ENOSPC))
+			return false;
 		ereport(elevel,
 				(errcode_for_dynamic_shared_memory(),
 				 errmsg("could not map shared memory segment \"%s\": %m",
@@ -791,10 +822,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
 static bool
 dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
 			  void **impl_private, void **mapped_address, Size *mapped_size,
-			  int elevel)
+			  int elevel, int flags)
 {
 	char		name[64];
-	int			flags;
+	int			open_flags;
 	int			fd;
 	char	   *address;
 
@@ -827,9 +858,12 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
 	}
 
 	/* Create new segment or open an existing one for attach. */
-	flags = O_RDWR | (op == DSM_OP_CREATE ? O_CREAT | O_EXCL : 0);
-	if ((fd = OpenTransientFile(name, flags)) == -1)
+	open_flags = O_RDWR | (op == DSM_OP_CREATE ? O_CREAT | O_EXCL : 0);
+	if ((fd = OpenTransientFile(name, open_flags)) == -1)
 	{
+		if (op == DSM_OP_CREATE && (flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+			(errno == ENOMEM || errno == ENOSPC))
+			return false;
 		if (op == DSM_OP_ATTACH || errno != EEXIST)
 			ereport(elevel,
 					(errcode_for_dynamic_shared_memory(),
@@ -906,6 +940,9 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
 			unlink(name);
 			errno = save_errno ? save_errno : ENOSPC;
 
+			if (op == DSM_OP_CREATE && (flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+				(errno == ENOMEM || errno == ENOSPC))
+				return false;
 			ereport(elevel,
 					(errcode_for_dynamic_shared_memory(),
 					 errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
@@ -928,6 +965,9 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
 			unlink(name);
 		errno = save_errno;
 
+		if (op == DSM_OP_CREATE && (flags & DSM_OP_CREATE_NO_OOM) != 0 &&
+			(errno == ENOMEM || errno == ENOSPC))
+			return false;
 		ereport(elevel,
 				(errcode_for_dynamic_shared_memory(),
 				 errmsg("could not map shared memory segment \"%s\": %m",
diff --git a/src/backend/utils/mmgr/dsa.c b/src/backend/utils/mmgr/dsa.c
index a6b728ba9ff..29b6f975be7 100644
--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -2164,7 +2164,7 @@ make_new_segment(dsa_area *area, size_t requested_pages)
 	/* Create the segment. */
 	oldowner = CurrentResourceOwner;
 	CurrentResourceOwner = area->resowner;
-	segment = dsm_create(total_size, 0);
+	segment = dsm_create(total_size, DSM_CREATE_NULL_IF_MAXSEGMENTS | DSM_CREATE_NO_OOM);
 	CurrentResourceOwner = oldowner;
 	if (segment == NULL)
 		return NULL;
diff --git a/src/include/storage/dsm.h b/src/include/storage/dsm.h
index 1a22b32df1a..7e2ec8fc560 100644
--- a/src/include/storage/dsm.h
+++ b/src/include/storage/dsm.h
@@ -18,6 +18,7 @@
 typedef struct dsm_segment dsm_segment;
 
 #define DSM_CREATE_NULL_IF_MAXSEGMENTS			0x0001
+#define DSM_CREATE_NO_OOM						0x0002
 
 /* Startup and shutdown functions. */
 struct PGShmemHeader;			/* avoid including pg_shmem.h */
diff --git a/src/include/storage/dsm_impl.h b/src/include/storage/dsm_impl.h
index 882269603da..2097876f756 100644
--- a/src/include/storage/dsm_impl.h
+++ b/src/include/storage/dsm_impl.h
@@ -66,10 +66,12 @@ typedef enum
 	DSM_OP_DESTROY,
 } dsm_op;
 
+#define DSM_OP_CREATE_NO_OOM	0x01
+
 /* Create, attach to, detach from, resize, or destroy a segment. */
 extern bool dsm_impl_op(dsm_op op, dsm_handle handle, Size request_size,
 						void **impl_private, void **mapped_address, Size *mapped_size,
-						int elevel);
+						int elevel, int flags);
 
 /* Implementation-dependent actions required to keep segment until shutdown. */
 extern void dsm_impl_pin_segment(dsm_handle handle, void *impl_private,
-- 
2.39.2

#3 Thomas Munro
thomas.munro@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: DSA_ALLOC_NO_OOM doesn't work

On Wed, Feb 14, 2024 at 3:23 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> On 29/01/2024 14:06, Heikki Linnakangas wrote:
>
>> If you call dsa_allocate_extended(DSA_ALLOC_NO_OOM), it will still
>> ereport an error if you run out of space (originally reported at [0]).
>>
>> Attached patch adds code to test_dsa.c to demonstrate that:
>>
>> postgres=# select test_dsa_basic();
>> ERROR: could not resize shared memory segment "/PostgreSQL.1312700148"
>> to 1075843072 bytes: No space left on device
>>
>> [0] https://github.com/pgvector/pgvector/issues/434#issuecomment-1912744489

Right, DSA_ALLOC_NO_OOM handles the case where there aren't any more
DSM slots (which used to be more common before we ramped up some
constants) and the case where max_total_segment_size (a self-imposed
limit) would be exceeded, but there is nothing to deal with failure to
allocate at the DSM level, and yeah, that just isn't a thing it can do.
It's not surprising that this observation comes from a Docker user: its
64MB default size limit on the /dev/shm mountpoint breaks parallel
query, as discussed on the list a few times (see also
https://github.com/docker-library/postgres/issues/416).

This is my mistake, introduced in commit 16be2fd10019, where I failed
to pass that down into the DSM code. The only user of DSA_ALLOC_NO_OOM in
core code so far is in dshash.c, where we just re-throw after some
cleanup (commit 4569715b), so you could leak that control object due to
this phenomenon.
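
The shape of that pattern, paraphrased rather than quoted from dshash.c: the cleanup is meant to run before re-throwing, but because the NO_OOM flag is not honored at the DSM level, the ereport() currently fires inside dsa_allocate_extended() and the cleanup never runs.

#include "postgres.h"
#include "utils/dsa.h"

/* Paraphrased sketch, not the actual dshash.c source. */
static void
create_buckets(dsa_area *area, dsa_pointer control, size_t bucket_size)
{
	dsa_pointer buckets;

	buckets = dsa_allocate_extended(area, bucket_size,
									DSA_ALLOC_NO_OOM | DSA_ALLOC_ZERO);
	if (!DsaPointerIsValid(buckets))
	{
		dsa_free(area, control);	/* cleanup that is skipped today */
		ereport(ERROR,
				(errcode(ERRCODE_OUT_OF_MEMORY),
				 errmsg("out of memory")));
	}
	/* ... go on to initialize the buckets ... */
}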

> I wrote the attached patch to address this, in a fairly straightforward
> or naive way. The problem was that even though dsa_allocate() had code
> to check the return value of dsm_create(), and return NULL instead of
> ereport(ERROR) if the DSA_ALLOC_NO_OOM flag was set, dsm_create() does not
> in fact return NULL on OOM. To fix, I added a DSM_CREATE_NO_OOM option to
> dsm_create(), and I also had to punch that through to dsm_impl_op().

Yeah, makes total sense.

> This is a little unpolished, but if we want to backpatch something
> narrow now, this would probably be the right approach.
>
> However, I must say that the dsm_impl_op() interface is absolutely
> insane. It's like someone looked at ioctl() and thought, "hey that's a
> great idea!". It mixes all different operations like creating or
> destroying a DSM segment together into one function call, and the return
> value is just a boolean, even though the function could fail for many
> different reasons, and the callers do in fact care about the reason. In
> a more natural interface, the different operations would have very
> different function signatures.

Yeah. It also manages to channel some of shmat() et al's negative beauty.

Anyway, that leads to this treatment of errnos in your patch:

+			errno = 0;
 			if (dsm_impl_op(DSM_OP_CREATE, seg->handle, size, &seg->impl_private,
-							&seg->mapped_address, &seg->mapped_size, ERROR))
+							&seg->mapped_address, &seg->mapped_size, ERROR, impl_flags))
 				break;
+			if ((flags & DSM_CREATE_NO_OOM) != 0 && (errno == ENOMEM || errno == ENOSPC))

... which seems reasonable given the constraints. Another idea might
be to write something different in one of those output parameters to
distinguish out-of-memory, but nothing really seems to fit...

> I think we must refactor that. It might be best to leave this
> DSA_ALLOC_NO_OOM bug unfixed in backpatches, and fix it on top of the
> refactorings on master only. Later, we can backpatch the refactorings
> too if we're comfortable with it; extensions shouldn't be using the
> dsm_impl_op() interface directly.

Yeah, that sounds true. We don't do a good job of nailing down the
public API of PostgreSQL but dsm_impl_op() is a rare case that is very
obviously not intended to be called by anyone else. On the other
hand, if we really want to avoid changing the function prototype on
principle, perhaps we could make a new operation DSM_OP_CREATE_NO_OOM
instead?
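
For concreteness, that alternative might look roughly like the following; DSM_OP_CREATE_NO_OOM is hypothetical and appears in none of the patches in this thread:

/* Hypothetical: an extra operation code instead of a new 'flags' argument. */
typedef enum
{
	DSM_OP_CREATE,
	DSM_OP_CREATE_NO_OOM,		/* like DSM_OP_CREATE, but return false on ENOMEM/ENOSPC */
	DSM_OP_ATTACH,
	DSM_OP_DETACH,
	DSM_OP_DESTROY,
} dsm_op;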

#4 Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: DSA_ALLOC_NO_OOM doesn't work

On Tue, Feb 13, 2024 at 7:53 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> However, I must say that the dsm_impl_op() interface is absolutely
> insane. It's like someone looked at ioctl() and thought, "hey that's a
> great idea!".

As the person who wrote that code, this made me laugh.

I agree it's not the prettiest interface, but I thought that was OK
considering that it should only ever have a very limited number of
callers. I believe I did it this way in the interest of code
compactness. Since there are four DSM implementations, I wanted the
implementation-specific code to be short and all in one place, and
jamming it all into one function served that purpose. Also, there's a
bunch of logic that is shared by multiple operations - detach and
destroy tend to be similar, and so do create and attach, and there are
even things that are shared across all operations, like the snprintf
at the top of dsm_impl_posix() or the slightly larger amount of
boilerplate at the top of dsm_impl_sysv().

I'm not particularly opposed to refactoring this to make it nicer, but
my judgement was that splitting it up into one function per operation
per implementation, say, would have involved a lot of duplication of
small bits of code that might then get out of sync with each other
over time. By doing it this way, the logic is a bit tangled -- or
maybe more than a bit -- but there's very little duplication because
each implementation gets jammed into the smallest possible box. I'm OK
with somebody deciding that I got the trade-offs wrong there, but I
will be interested to see the number of lines added vs. removed in any
future refactoring patch.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5 Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Robert Haas (#4)
2 attachment(s)
Re: DSA_ALLOC_NO_OOM doesn't work

On 14/02/2024 09:23, Robert Haas wrote:

> On Tue, Feb 13, 2024 at 7:53 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
>> However, I must say that the dsm_impl_op() interface is absolutely
>> insane. It's like someone looked at ioctl() and thought, "hey that's a
>> great idea!".
>
> As the person who wrote that code, this made me laugh.
>
> I agree it's not the prettiest interface, but I thought that was OK
> considering that it should only ever have a very limited number of
> callers. I believe I did it this way in the interest of code
> compactness. Since there are four DSM implementations, I wanted the
> implementation-specific code to be short and all in one place, and
> jamming it all into one function served that purpose. Also, there's a
> bunch of logic that is shared by multiple operations - detach and
> destroy tend to be similar, and so do create and attach, and there are
> even things that are shared across all operations, like the snprintf
> at the top of dsm_impl_posix() or the slightly larger amount of
> boilerplate at the top of dsm_impl_sysv().
>
> I'm not particularly opposed to refactoring this to make it nicer, but
> my judgement was that splitting it up into one function per operation
> per implementation, say, would have involved a lot of duplication of
> small bits of code that might then get out of sync with each other
> over time. By doing it this way, the logic is a bit tangled -- or
> maybe more than a bit -- but there's very little duplication because
> each implementation gets jammed into the smallest possible box. I'm OK
> with somebody deciding that I got the trade-offs wrong there, but I
> will be interested to see the number of lines added vs. removed in any
> future refactoring patch.

That's fair; I can see those reasons. Nevertheless, I do think it was a
bad tradeoff. A little bit of repetition would be better here, or we
could extract the common parts into smaller functions.

I came up with the attached:

25 files changed, 1710 insertions(+), 1113 deletions(-)

So yeah, it's more code, and there's some repetition, but I think this
is more readable. Some of that is extra boilerplate because I split the
implementations into separate files, and I also added tests.

I'm not 100% wedded to all of the decisions here, but I think this is
the right direction overall. For example, I decided to separate the
dsm_handle used by the high-level interface in dsm.c from the
dsm_impl_handle used by the low-level interface in dsm_impl_*.c (more on
that below). That feels slightly better to me, but it could be left out
if we don't want that.

Notable changes:

- Split the single multiplexed dsm_impl_op() function into multiple
functions for different operations. This allows more natural function
signatures for the different operations.

- The create() function is now responsible for generating the handle,
instead of having the caller generate it. Those implementations that
need to generate a random handle and retry if it's already in use now
do that retry within the implementation.

- The destroy() function no longer detaches the segment; you must call
detach() first if the segment is still attached. This avoids having to
pass "junk" values when destroying a segment that's not attached, and in
case of error, makes it more clear what failed.

- Separate dsm_handle, used by backend code to interact with the
high-level interface in dsm.c, from dsm_impl_handle, which is used to
interact with the low-level functions in dsm_impl.c. This gets rid of
the convention in dsm.c of reserving odd numbers for DSM segments stored
in the main shmem area. There is now an explicit flag for that in the
control slot. For generating dsm_handles, we now use the same scheme we
used to use for main-area shm segments for all DSM segments, which
includes the slot number in the dsm_handle. The implementations use
their own mechanisms for generating the low-level dsm_impl_handles (all
but the SysV implementation generate a random handle and retry on
collision).

- Use IPC_PRIVATE in the SysV implementation to have the OS create a
unique identifier for us. Use the shmid directly as the (low-level)
handle, so that we don't need to use shmget() to convert a key to shmid,
and don't need the "cache" for that.

- create() no longer returns the mapped_size. The old Windows
implementation had some code to read the actual mapped size after
creating the mapping, and returned that in *mapped_size. Others just
returned the requested size. In principle reading the actual size might
be useful; the caller might be able to make use of the whole mapped size
when it's larger than requested. In practice, the callers didn't do
that. Also, POSIX shmem on FreeBSD has similar round-up-to-page-size
behavior but the implementation did not query the actual mapped size
after creating the segment, so you could not rely on it.

- Added a test that exercises basic create, detach, attach functionality
using all the different implementations supported on the current platform.

- Change the datatype of the opaque types in dsm_impl.c from "void *" to
typedefs over uintptr_t. It's easy to make mistakes with "void *", as
you can pass any pointer without getting warnings from the compiler.
Dedicated typedefs give a bit more type checking; see the small
illustration after this list. (This is in the first commit; all the
other changes are bundled together in the second commit.)
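
As a small standalone illustration of that last point (the called functions are hypothetical, not taken from the patches): a void * parameter accepts an unrelated pointer silently, while the uintptr_t typedef forces the caller to write an explicit cast and think about what is being passed.

#include <stdint.h>

typedef uintptr_t dsm_impl_private;	/* as introduced by the first patch */

extern void takes_void_pointer(void *p);
extern void takes_impl_private(dsm_impl_private p);

static void
example(int *unrelated)
{
	takes_void_pointer(unrelated);				/* accepted silently; possibly a bug */
	/* takes_impl_private(unrelated);			   would draw a compiler diagnostic */
	takes_impl_private((uintptr_t) unrelated);	/* the cast makes the intent explicit */
}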

Overall, I don't think this is backpatchable. The handle changes and the
use of IPC_PRIVATE in particular could lead to a failure to clean up old
segments if you upgraded the binary without a clean shutdown. A slightly
different version of this might be backpatchable, but I'd like to focus
on what's best for master for now.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v1-0001-Change-datatype-of-the-opaque-types-in-dsm_impl.c.patch (text/x-patch)
From 7b29e4818bb3b780664fde2e8f5726b4f0b54b7b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 21 Feb 2024 17:15:22 +0200
Subject: [PATCH v1 1/2] Change datatype of the opaque types in dsm_impl.c

It's easy to make mistakes with "void *", as you can pass any pointer
without getting warnings from the compiler. Switch to a typedef over
uintptr_t to get a bit more type checking.
---
 src/backend/storage/ipc/dsm.c      | 28 +++++++++++------------
 src/backend/storage/ipc/dsm_impl.c | 36 +++++++++++++++---------------
 src/include/storage/dsm.h          |  2 +-
 src/include/storage/dsm_impl.h     | 21 +++++++++++++----
 4 files changed, 50 insertions(+), 37 deletions(-)

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 6b12108dd10..d3f982c5b2a 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -70,7 +70,7 @@ struct dsm_segment
 	ResourceOwner resowner;		/* Resource owner. */
 	dsm_handle	handle;			/* Segment name. */
 	uint32		control_slot;	/* Slot in control segment. */
-	void	   *impl_private;	/* Implementation-specific private data. */
+	dsm_impl_private impl_private;	/* Implementation-specific private data. */
 	void	   *mapped_address; /* Mapping address, or NULL if unmapped. */
 	Size		mapped_size;	/* Size of our mapping. */
 	slist_head	on_detach;		/* On-detach callbacks. */
@@ -83,7 +83,7 @@ typedef struct dsm_control_item
 	uint32		refcnt;			/* 2+ = active, 1 = moribund, 0 = gone */
 	size_t		first_page;
 	size_t		npages;
-	void	   *impl_private_pm_handle; /* only needed on Windows */
+	dsm_impl_private_pm_handle impl_private_pm_handle; /* only needed on Windows */
 	bool		pinned;
 } dsm_control_item;
 
@@ -140,7 +140,7 @@ static dlist_head dsm_segment_list = DLIST_STATIC_INIT(dsm_segment_list);
 static dsm_handle dsm_control_handle;
 static dsm_control_header *dsm_control;
 static Size dsm_control_mapped_size = 0;
-static void *dsm_control_impl_private = NULL;
+static dsm_impl_private dsm_control_impl_private = 0;
 
 
 /* ResourceOwner callbacks to hold DSM segments */
@@ -240,8 +240,8 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 {
 	void	   *mapped_address = NULL;
 	void	   *junk_mapped_address = NULL;
-	void	   *impl_private = NULL;
-	void	   *junk_impl_private = NULL;
+	dsm_impl_private impl_private = 0;
+	dsm_impl_private junk_impl_private = 0;
 	Size		mapped_size = 0;
 	Size		junk_mapped_size = 0;
 	uint32		nitems;
@@ -362,7 +362,7 @@ dsm_postmaster_shutdown(int code, Datum arg)
 	uint32		i;
 	void	   *dsm_control_address;
 	void	   *junk_mapped_address = NULL;
-	void	   *junk_impl_private = NULL;
+	dsm_impl_private junk_impl_private = 0;
 	Size		junk_mapped_size = 0;
 	PGShmemHeader *shim = (PGShmemHeader *) DatumGetPointer(arg);
 
@@ -598,7 +598,7 @@ dsm_create(Size size, int flags)
 			dsm_control->item[i].handle = seg->handle;
 			/* refcnt of 1 triggers destruction, so start at 2 */
 			dsm_control->item[i].refcnt = 2;
-			dsm_control->item[i].impl_private_pm_handle = NULL;
+			dsm_control->item[i].impl_private_pm_handle = 0;
 			dsm_control->item[i].pinned = false;
 			seg->control_slot = i;
 			LWLockRelease(DynamicSharedMemoryControlLock);
@@ -637,7 +637,7 @@ dsm_create(Size size, int flags)
 	dsm_control->item[nitems].handle = seg->handle;
 	/* refcnt of 1 triggers destruction, so start at 2 */
 	dsm_control->item[nitems].refcnt = 2;
-	dsm_control->item[nitems].impl_private_pm_handle = NULL;
+	dsm_control->item[nitems].impl_private_pm_handle = 0;
 	dsm_control->item[nitems].pinned = false;
 	seg->control_slot = nitems;
 	dsm_control->nitems++;
@@ -842,7 +842,7 @@ dsm_detach(dsm_segment *seg)
 		if (!is_main_region_dsm_handle(seg->handle))
 			dsm_impl_op(DSM_OP_DETACH, seg->handle, 0, &seg->impl_private,
 						&seg->mapped_address, &seg->mapped_size, WARNING);
-		seg->impl_private = NULL;
+		seg->impl_private = 0;
 		seg->mapped_address = NULL;
 		seg->mapped_size = 0;
 	}
@@ -955,7 +955,7 @@ dsm_unpin_mapping(dsm_segment *seg)
 void
 dsm_pin_segment(dsm_segment *seg)
 {
-	void	   *handle = NULL;
+	dsm_impl_private_pm_handle pm_handle = 0;
 
 	/*
 	 * Bump reference count for this segment in shared memory. This will
@@ -967,10 +967,10 @@ dsm_pin_segment(dsm_segment *seg)
 	if (dsm_control->item[seg->control_slot].pinned)
 		elog(ERROR, "cannot pin a segment that is already pinned");
 	if (!is_main_region_dsm_handle(seg->handle))
-		dsm_impl_pin_segment(seg->handle, seg->impl_private, &handle);
+		dsm_impl_pin_segment(seg->handle, seg->impl_private, &pm_handle);
 	dsm_control->item[seg->control_slot].pinned = true;
 	dsm_control->item[seg->control_slot].refcnt++;
-	dsm_control->item[seg->control_slot].impl_private_pm_handle = handle;
+	dsm_control->item[seg->control_slot].impl_private_pm_handle = pm_handle;
 	LWLockRelease(DynamicSharedMemoryControlLock);
 }
 
@@ -1039,7 +1039,7 @@ dsm_unpin_segment(dsm_handle handle)
 	/* Clean up resources if that was the last reference. */
 	if (destroy)
 	{
-		void	   *junk_impl_private = NULL;
+		dsm_impl_private junk_impl_private = 0;
 		void	   *junk_mapped_address = NULL;
 		Size		junk_mapped_size = 0;
 
@@ -1211,7 +1211,7 @@ dsm_create_descriptor(void)
 
 	/* seg->handle must be initialized by the caller */
 	seg->control_slot = INVALID_CONTROL_SLOT;
-	seg->impl_private = NULL;
+	seg->impl_private = 0;
 	seg->mapped_address = NULL;
 	seg->mapped_size = 0;
 
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 8dd669e0ce9..4478c58bb72 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -71,23 +71,23 @@
 
 #ifdef USE_DSM_POSIX
 static bool dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
-						   void **impl_private, void **mapped_address,
+						   dsm_impl_private *impl_private, void **mapped_address,
 						   Size *mapped_size, int elevel);
 static int	dsm_impl_posix_resize(int fd, off_t size);
 #endif
 #ifdef USE_DSM_SYSV
 static bool dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
-						  void **impl_private, void **mapped_address,
+						  dsm_impl_private *impl_private, void **mapped_address,
 						  Size *mapped_size, int elevel);
 #endif
 #ifdef USE_DSM_WINDOWS
 static bool dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
-							 void **impl_private, void **mapped_address,
+							 dsm_impl_private *impl_private, void **mapped_address,
 							 Size *mapped_size, int elevel);
 #endif
 #ifdef USE_DSM_MMAP
 static bool dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
-						  void **impl_private, void **mapped_address,
+						  dsm_impl_private *impl_private, void **mapped_address,
 						  Size *mapped_size, int elevel);
 #endif
 static int	errcode_for_dynamic_shared_memory(void);
@@ -157,7 +157,7 @@ int			min_dynamic_shared_memory;
  */
 bool
 dsm_impl_op(dsm_op op, dsm_handle handle, Size request_size,
-			void **impl_private, void **mapped_address, Size *mapped_size,
+			dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
 			int elevel)
 {
 	Assert(op == DSM_OP_CREATE || request_size == 0);
@@ -210,7 +210,7 @@ dsm_impl_op(dsm_op op, dsm_handle handle, Size request_size,
  */
 static bool
 dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
-			   void **impl_private, void **mapped_address, Size *mapped_size,
+			   dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
 			   int elevel)
 {
 	char		name[64];
@@ -421,7 +421,7 @@ dsm_impl_posix_resize(int fd, off_t size)
  */
 static bool
 dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
-			  void **impl_private, void **mapped_address, Size *mapped_size,
+			  dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
 			  int elevel)
 {
 	key_t		key;
@@ -477,9 +477,9 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 	 * the shared memory key to a shared memory identifier using shmget(). To
 	 * avoid repeated lookups, we store the key using impl_private.
 	 */
-	if (*impl_private != NULL)
+	if (*impl_private != 0)
 	{
-		ident_cache = *impl_private;
+		ident_cache = (int *) *impl_private;
 		ident = *ident_cache;
 	}
 	else
@@ -522,14 +522,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 		}
 
 		*ident_cache = ident;
-		*impl_private = ident_cache;
+		*impl_private = (uintptr_t) ident_cache;
 	}
 
 	/* Handle teardown cases. */
 	if (op == DSM_OP_DETACH || op == DSM_OP_DESTROY)
 	{
 		pfree(ident_cache);
-		*impl_private = NULL;
+		*impl_private = 0;
 		if (*mapped_address != NULL && shmdt(*mapped_address) != 0)
 		{
 			ereport(elevel,
@@ -608,7 +608,7 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
  */
 static bool
 dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
-				 void **impl_private, void **mapped_address,
+				 dsm_impl_private *impl_private, void **mapped_address,
 				 Size *mapped_size, int elevel)
 {
 	char	   *address;
@@ -790,7 +790,7 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
  */
 static bool
 dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
-			  void **impl_private, void **mapped_address, Size *mapped_size,
+			  dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
 			  int elevel)
 {
 	char		name[64];
@@ -960,8 +960,8 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
  * do anything to receive the handle; Windows transfers it automatically.
  */
 void
-dsm_impl_pin_segment(dsm_handle handle, void *impl_private,
-					 void **impl_private_pm_handle)
+dsm_impl_pin_segment(dsm_handle handle, dsm_impl_private impl_private,
+					 dsm_impl_private_pm_handle *pm_handle)
 {
 	switch (dynamic_shared_memory_type)
 	{
@@ -992,7 +992,7 @@ dsm_impl_pin_segment(dsm_handle handle, void *impl_private,
 				 * matter.  We're just holding onto it so that, if the segment
 				 * is unpinned, dsm_impl_unpin_segment can close it.
 				 */
-				*impl_private_pm_handle = hmap;
+				*pm_handle = hmap;
 			}
 			break;
 #endif
@@ -1011,7 +1011,7 @@ dsm_impl_pin_segment(dsm_handle handle, void *impl_private,
  * postmaster's process space.
  */
 void
-dsm_impl_unpin_segment(dsm_handle handle, void **impl_private)
+dsm_impl_unpin_segment(dsm_handle handle, dsm_impl_private_pm_handle *pm_handle)
 {
 	switch (dynamic_shared_memory_type)
 	{
@@ -1034,7 +1034,7 @@ dsm_impl_unpin_segment(dsm_handle handle, void **impl_private)
 									name)));
 				}
 
-				*impl_private = NULL;
+				*pm_handle = 0;
 			}
 			break;
 #endif
diff --git a/src/include/storage/dsm.h b/src/include/storage/dsm.h
index 1a22b32df1a..35ae4eb164e 100644
--- a/src/include/storage/dsm.h
+++ b/src/include/storage/dsm.h
@@ -13,7 +13,7 @@
 #ifndef DSM_H
 #define DSM_H
 
-#include "storage/dsm_impl.h"
+#include "dsm_impl.h"
 
 typedef struct dsm_segment dsm_segment;
 
diff --git a/src/include/storage/dsm_impl.h b/src/include/storage/dsm_impl.h
index 882269603da..f2bfb2a1a5c 100644
--- a/src/include/storage/dsm_impl.h
+++ b/src/include/storage/dsm_impl.h
@@ -66,14 +66,27 @@ typedef enum
 	DSM_OP_DESTROY,
 } dsm_op;
 
+/*
+ * When a segment is created or attached, the caller provides this space to
+ * hold implementation-specific information about the attachment. It is opaque
+ * to the caller, and is passed back to the implementation when detaching.
+ */
+typedef uintptr_t dsm_impl_private;
+
+/*
+ * Similar caller-provided space for implementation-specific information held
+ * when a segment is pinned.
+ */
+typedef uintptr_t dsm_impl_private_pm_handle;
+
 /* Create, attach to, detach from, resize, or destroy a segment. */
 extern bool dsm_impl_op(dsm_op op, dsm_handle handle, Size request_size,
-						void **impl_private, void **mapped_address, Size *mapped_size,
+						dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
 						int elevel);
 
 /* Implementation-dependent actions required to keep segment until shutdown. */
-extern void dsm_impl_pin_segment(dsm_handle handle, void *impl_private,
-								 void **impl_private_pm_handle);
-extern void dsm_impl_unpin_segment(dsm_handle handle, void **impl_private);
+extern void dsm_impl_pin_segment(dsm_handle handle, dsm_impl_private impl_private,
+								 dsm_impl_private_pm_handle *impl_private_pm_handle);
+extern void dsm_impl_unpin_segment(dsm_handle handle, dsm_impl_private_pm_handle *pm_handle);
 
 #endif							/* DSM_IMPL_H */
-- 
2.39.2

v1-0002-Rewrite-the-dsm_impl-interface.patch (text/x-patch)
From ce0b0ae4ca4cc6046ec20baaf627bba1a9cc7112 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 21 Feb 2024 21:09:59 +0200
Subject: [PATCH v1 2/2] Rewrite the dsm_impl interface

Notable changes:

- Split the single multiplexed dsm_impl_op() function into multiple
  functions for different operations. This allows more natural function
  signatures for the different operations.

- The create() function is now responsible for generating the handle,
  instead of having the caller generate it. Those implementations that
  need to generate a random handle and retry if it's already in use now
  do that retry within the implementation.

- The destroy() function no longer detaches the segment; you must call
  detach() first, if the segment is still attached. This avoids having
  to pass "junk" values when destroying a segment that's not attached,
  and in case of error, makes it more clear what failed.

- Separate dsm_handle, used by backend code to interact with the
  high-level interface in dsm.c, from dsm_impl_handle, which is used to
  interact with the low-level functions in dsm_impl.c. This gets rid of
  the convention in dsm.c of reserving odd numbers for DSM segments
  stored in the main shmem area. There is now an explicit flag for that
  the control slot. For generating dsm_handles, we now use the same
  scheme we used to use for main-area shm segments for all DSM segments,
  which includes the slot number in the dsm_handle. The implementations
  use their own mechanisms for generating the low-level
  dsm_impl_handles (all but the SysV implementation generate a random
  handle and retry on collision).

- Use IPC_PRIVATE in the SysV implementation to have the OS create a
  unique identifier for us. Use the shmid directly as the (low-level)
  handle, so that we don't need to use shmget() to convert a key to
  shmid, and don't need the cache for that.

- create() no longer returns the mapped_size. The old Windows
  implementation had some code to read the actual mapped size after
  creating the mapping, and returned that in *mapped_size. In principle
  that might be useful; the caller might be able to make use of the
  whole mapped size when it's larger than requested. In practice, the
  callers didn't do that. Also, POSIX shmem on FreeBSD has similar
  round-up-to-page-size behavior but the implementation did not query
  the actual mapped size after creating the segment, so you could not
  rely on it.

- Added a test that exercises basic create, detach, attach
  functionality using all the different implementations supported on the
  current platform.

Discussion: https://www.postgresql.org/message-id/6030bdec-0de1-4da2-b0b3-335007eae97f@iki.fi
---
 src/backend/port/sysv_shmem.c                |   4 +-
 src/backend/port/win32_shmem.c               |   2 +-
 src/backend/storage/ipc/Makefile             |  10 +
 src/backend/storage/ipc/dsm.c                | 273 ++---
 src/backend/storage/ipc/dsm_impl.c           | 984 +------------------
 src/backend/storage/ipc/dsm_impl_mmap.c      | 300 ++++++
 src/backend/storage/ipc/dsm_impl_posix.c     | 341 +++++++
 src/backend/storage/ipc/dsm_impl_sysv.c      | 224 +++++
 src/backend/storage/ipc/dsm_impl_windows.c   | 345 +++++++
 src/backend/storage/ipc/meson.build          |  12 +
 src/backend/utils/misc/guc_tables.c          |   2 +-
 src/include/storage/dsm.h                    |  10 +-
 src/include/storage/dsm_impl.h               |  99 +-
 src/include/storage/pg_shmem.h               |   2 +-
 src/include/utils/guc_hooks.h                |   2 +
 src/test/modules/Makefile                    |   1 +
 src/test/modules/meson.build                 |   1 +
 src/test/modules/test_dsm/.gitignore         |   3 +
 src/test/modules/test_dsm/Makefile           |  27 +
 src/test/modules/test_dsm/meson.build        |  33 +
 src/test/modules/test_dsm/t/001_dsm_basic.pl |  61 ++
 src/test/modules/test_dsm/test_dsm--1.0.sql  |   9 +
 src/test/modules/test_dsm/test_dsm.c         |  75 ++
 src/test/modules/test_dsm/test_dsm.control   |   4 +
 src/tools/pgindent/typedefs.list             |   3 +
 25 files changed, 1714 insertions(+), 1113 deletions(-)
 create mode 100644 src/backend/storage/ipc/dsm_impl_mmap.c
 create mode 100644 src/backend/storage/ipc/dsm_impl_posix.c
 create mode 100644 src/backend/storage/ipc/dsm_impl_sysv.c
 create mode 100644 src/backend/storage/ipc/dsm_impl_windows.c
 create mode 100644 src/test/modules/test_dsm/.gitignore
 create mode 100644 src/test/modules/test_dsm/Makefile
 create mode 100644 src/test/modules/test_dsm/meson.build
 create mode 100644 src/test/modules/test_dsm/t/001_dsm_basic.pl
 create mode 100644 src/test/modules/test_dsm/test_dsm--1.0.sql
 create mode 100644 src/test/modules/test_dsm/test_dsm.c
 create mode 100644 src/test/modules/test_dsm/test_dsm.control

diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9a96329bf25..b9bf2979f5d 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -827,7 +827,7 @@ PGSharedMemoryCreate(Size size,
 				 * if some other process creates the same shmem key before we
 				 * do, in which case we'll try the next key.
 				 */
-				if (oldhdr->dsm_control != 0)
+				if (oldhdr->dsm_control != DSM_IMPL_HANDLE_INVALID)
 					dsm_cleanup_using_control_segment(oldhdr->dsm_control);
 				if (shmctl(shmid, IPC_RMID, NULL) < 0)
 					NextShmemSegID++;
@@ -842,7 +842,7 @@ PGSharedMemoryCreate(Size size,
 	hdr = (PGShmemHeader *) memAddress;
 	hdr->creatorPID = getpid();
 	hdr->magic = PGShmemMagic;
-	hdr->dsm_control = 0;
+	hdr->dsm_control = DSM_IMPL_HANDLE_INVALID;
 
 	/* Fill in the data directory ID info, too */
 	hdr->device = statbuf.st_dev;
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index 90bed0146dd..b23686dd49d 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -390,7 +390,7 @@ retry:
 	 */
 	hdr->totalsize = size;
 	hdr->freeoffset = MAXALIGN(sizeof(PGShmemHeader));
-	hdr->dsm_control = 0;
+	hdr->dsm_control = DSM_IMPL_HANDLE_INVALID;
 
 	/* Save info for possible future use */
 	UsedShmemSegAddr = memAddress;
diff --git a/src/backend/storage/ipc/Makefile b/src/backend/storage/ipc/Makefile
index d8a1653eb6a..4e600f77d1c 100644
--- a/src/backend/storage/ipc/Makefile
+++ b/src/backend/storage/ipc/Makefile
@@ -27,4 +27,14 @@ OBJS = \
 	sinvaladt.o \
 	standby.o
 
+ifeq ($(PORTNAME), win32)
+OBJS += dsm_impl_windows.o
+else
+# TODO: Only build the POSIX and System V implementations on
+# platforms where they are available. Currently there are large #ifdef
+# blocks in the source files, but would be nicer to skip compiling
+# them altogether.
+OBJS += dsm_impl_posix.o dsm_impl_sysv.o dsm_impl_mmap.o
+endif
+
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index d3f982c5b2a..e0775d8a984 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -14,6 +14,18 @@
  * hard postmaster crash, remaining segments will be removed, if they
  * still exist, at the next postmaster startup.
  *
+ * These services manage two kinds of segments:
+ *
+ * 1. Segments carved out of one "main" DSM segment
+ * 2. Segments backed by a separate low-level DSM segments
+ *
+ * Each segment is identified by a 32-bit integer handle (dsm_handle),
+ * whether it's carved out of the main region or created as a separate
+ * segment.  The handle can be used by other processes to find and attach
+ * to the segment.  Segments that are backed by a dedicated low-level
+ * segment also have a separate dsm_impl_handle, which is used with the
+ * low-level functions in dsm_impl.c.
+ *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
@@ -68,7 +80,10 @@ struct dsm_segment
 {
 	dlist_node	node;			/* List link in dsm_segment_list. */
 	ResourceOwner resowner;		/* Resource owner. */
-	dsm_handle	handle;			/* Segment name. */
+	dsm_handle	handle;			/* Handle for other backends to find this
+								 * segment. */
+	dsm_impl_handle impl_handle;	/* Implementation-specific handle. */
+	bool		in_main_region; /* Is this stored in the main shm region? */
 	uint32		control_slot;	/* Slot in control segment. */
 	dsm_impl_private impl_private;	/* Implementation-specific private data. */
 	void	   *mapped_address; /* Mapping address, or NULL if unmapped. */
@@ -80,10 +95,13 @@ struct dsm_segment
 typedef struct dsm_control_item
 {
 	dsm_handle	handle;
+	dsm_impl_handle impl_handle;
+	bool		in_main_region;
 	uint32		refcnt;			/* 2+ = active, 1 = moribund, 0 = gone */
 	size_t		first_page;
 	size_t		npages;
-	dsm_impl_private_pm_handle impl_private_pm_handle; /* only needed on Windows */
+	dsm_impl_private_pm_handle impl_private_pm_handle;	/* only needed on
+														 * Windows */
 	bool		pinned;
 } dsm_control_item;
 
@@ -102,8 +120,7 @@ static dsm_segment *dsm_create_descriptor(void);
 static bool dsm_control_segment_sane(dsm_control_header *control,
 									 Size mapped_size);
 static uint64 dsm_control_bytes_needed(uint32 nitems);
-static inline dsm_handle make_main_region_dsm_handle(int slot);
-static inline bool is_main_region_dsm_handle(dsm_handle handle);
+static inline dsm_handle make_dsm_handle(int slot);
 
 /* Has this backend initialized the dynamic shared memory system yet? */
 static bool dsm_init_done = false;
@@ -137,7 +154,7 @@ static dlist_head dsm_segment_list = DLIST_STATIC_INIT(dsm_segment_list);
  * reference counted; instead, it lasts for the postmaster's entire
  * life cycle.  For simplicity, it doesn't have a dsm_segment object either.
  */
-static dsm_handle dsm_control_handle;
+static dsm_impl_handle dsm_control_handle;
 static dsm_control_header *dsm_control;
 static Size dsm_control_mapped_size = 0;
 static dsm_impl_private dsm_control_impl_private = 0;
@@ -199,32 +216,19 @@ dsm_postmaster_startup(PGShmemHeader *shim)
 		 maxitems);
 	segsize = dsm_control_bytes_needed(maxitems);
 
-	/*
-	 * Loop until we find an unused identifier for the new control segment. We
-	 * sometimes use DSM_HANDLE_INVALID as a sentinel value indicating "no
-	 * control segment", so avoid generating that value for a real handle.
-	 */
-	for (;;)
-	{
-		Assert(dsm_control_address == NULL);
-		Assert(dsm_control_mapped_size == 0);
-		/* Use even numbers only */
-		dsm_control_handle = pg_prng_uint32(&pg_global_prng_state) << 1;
-		if (dsm_control_handle == DSM_HANDLE_INVALID)
-			continue;
-		if (dsm_impl_op(DSM_OP_CREATE, dsm_control_handle, segsize,
-						&dsm_control_impl_private, &dsm_control_address,
-						&dsm_control_mapped_size, ERROR))
-			break;
-	}
+	/* Create the control segment. */
+	dsm_control_handle = dsm_impl->create(segsize,
+										  &dsm_control_impl_private, &dsm_control_address,
+										  ERROR);
 	dsm_control = dsm_control_address;
+	dsm_control_mapped_size = segsize;
 	on_shmem_exit(dsm_postmaster_shutdown, PointerGetDatum(shim));
 	elog(DEBUG2,
 		 "created dynamic shared memory control segment %u (%zu bytes)",
 		 dsm_control_handle, segsize);
 	shim->dsm_control = dsm_control_handle;
 
-	/* Initialize control segment. */
+	/* Initialize it. */
 	dsm_control->magic = PG_DYNSHMEM_CONTROL_MAGIC;
 	dsm_control->nitems = 0;
 	dsm_control->maxitems = maxitems;
@@ -236,14 +240,11 @@ dsm_postmaster_startup(PGShmemHeader *shim)
  * segments to which it refers, and then the control segment itself.
  */
 void
-dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
+dsm_cleanup_using_control_segment(dsm_impl_handle old_control_handle)
 {
 	void	   *mapped_address = NULL;
-	void	   *junk_mapped_address = NULL;
 	dsm_impl_private impl_private = 0;
-	dsm_impl_private junk_impl_private = 0;
 	Size		mapped_size = 0;
-	Size		junk_mapped_size = 0;
 	uint32		nitems;
 	uint32		i;
 	dsm_control_header *old_control;
@@ -254,8 +255,8 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 	 * exists, or an unrelated process has used the same shm ID.  So just fall
 	 * out quietly.
 	 */
-	if (!dsm_impl_op(DSM_OP_ATTACH, old_control_handle, 0, &impl_private,
-					 &mapped_address, &mapped_size, DEBUG1))
+	if (!dsm_impl->attach(old_control_handle, &impl_private,
+						  &mapped_address, &mapped_size, DEBUG1))
 		return;
 
 	/*
@@ -265,8 +266,8 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 	old_control = (dsm_control_header *) mapped_address;
 	if (!dsm_control_segment_sane(old_control, mapped_size))
 	{
-		dsm_impl_op(DSM_OP_DETACH, old_control_handle, 0, &impl_private,
-					&mapped_address, &mapped_size, LOG);
+		dsm_impl->detach(old_control_handle, impl_private,
+						 mapped_address, mapped_size, LOG);
 		return;
 	}
 
@@ -277,7 +278,7 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 	nitems = old_control->nitems;
 	for (i = 0; i < nitems; ++i)
 	{
-		dsm_handle	handle;
+		dsm_impl_handle handle;
 		uint32		refcnt;
 
 		/* If the reference count is 0, the slot is actually unused. */
@@ -286,8 +287,8 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 			continue;
 
 		/* If it was using the main shmem area, there is nothing to do. */
-		handle = old_control->item[i].handle;
-		if (is_main_region_dsm_handle(handle))
+		handle = old_control->item[i].impl_handle;
+		if (old_control->item[i].in_main_region)
 			continue;
 
 		/* Log debugging information. */
@@ -295,16 +296,16 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 			 handle, refcnt);
 
 		/* Destroy the referenced segment. */
-		dsm_impl_op(DSM_OP_DESTROY, handle, 0, &junk_impl_private,
-					&junk_mapped_address, &junk_mapped_size, LOG);
+		dsm_impl->destroy(handle, LOG);
 	}
 
 	/* Destroy the old control segment, too. */
 	elog(DEBUG2,
 		 "cleaning up dynamic shared memory control segment with ID %u",
 		 old_control_handle);
-	dsm_impl_op(DSM_OP_DESTROY, old_control_handle, 0, &impl_private,
-				&mapped_address, &mapped_size, LOG);
+	dsm_impl->detach(old_control_handle, impl_private,
+					 mapped_address, mapped_size, LOG);
+	dsm_impl->destroy(old_control_handle, LOG);
 }
 
 /*
@@ -361,9 +362,6 @@ dsm_postmaster_shutdown(int code, Datum arg)
 	uint32		nitems;
 	uint32		i;
 	void	   *dsm_control_address;
-	void	   *junk_mapped_address = NULL;
-	dsm_impl_private junk_impl_private = 0;
-	Size		junk_mapped_size = 0;
 	PGShmemHeader *shim = (PGShmemHeader *) DatumGetPointer(arg);
 
 	/*
@@ -384,23 +382,23 @@ dsm_postmaster_shutdown(int code, Datum arg)
 	/* Remove any remaining segments. */
 	for (i = 0; i < nitems; ++i)
 	{
-		dsm_handle	handle;
+		dsm_impl_handle handle;
 
 		/* If the reference count is 0, the slot is actually unused. */
 		if (dsm_control->item[i].refcnt == 0)
 			continue;
 
-		handle = dsm_control->item[i].handle;
-		if (is_main_region_dsm_handle(handle))
+		if (dsm_control->item[i].in_main_region)
 			continue;
 
+		handle = dsm_control->item[i].impl_handle;
+
 		/* Log debugging information. */
 		elog(DEBUG2, "cleaning up orphaned dynamic shared memory with ID %u",
 			 handle);
 
 		/* Destroy the segment. */
-		dsm_impl_op(DSM_OP_DESTROY, handle, 0, &junk_impl_private,
-					&junk_mapped_address, &junk_mapped_size, LOG);
+		dsm_impl->destroy(handle, LOG);
 	}
 
 	/* Remove the control segment itself. */
@@ -408,11 +406,13 @@ dsm_postmaster_shutdown(int code, Datum arg)
 		 "cleaning up dynamic shared memory control segment with ID %u",
 		 dsm_control_handle);
 	dsm_control_address = dsm_control;
-	dsm_impl_op(DSM_OP_DESTROY, dsm_control_handle, 0,
-				&dsm_control_impl_private, &dsm_control_address,
-				&dsm_control_mapped_size, LOG);
-	dsm_control = dsm_control_address;
-	shim->dsm_control = 0;
+	dsm_impl->detach(dsm_control_handle,
+					 dsm_control_impl_private, dsm_control_address,
+					 dsm_control_mapped_size, LOG);
+	dsm_impl->destroy(dsm_control_handle, LOG);
+	dsm_control = NULL;
+	dsm_control_mapped_size = 0;
+	shim->dsm_control = DSM_IMPL_HANDLE_INVALID;
 }
 
 /*
@@ -429,17 +429,17 @@ dsm_backend_startup(void)
 		void	   *control_address = NULL;
 
 		/* Attach control segment. */
-		Assert(dsm_control_handle != 0);
-		dsm_impl_op(DSM_OP_ATTACH, dsm_control_handle, 0,
-					&dsm_control_impl_private, &control_address,
-					&dsm_control_mapped_size, ERROR);
+		Assert(dsm_control_handle != DSM_HANDLE_INVALID);
+		dsm_impl->attach(dsm_control_handle,
+						 &dsm_control_impl_private, &control_address,
+						 &dsm_control_mapped_size, ERROR);
 		dsm_control = control_address;
 		/* If control segment doesn't look sane, something is badly wrong. */
 		if (!dsm_control_segment_sane(dsm_control, dsm_control_mapped_size))
 		{
-			dsm_impl_op(DSM_OP_DETACH, dsm_control_handle, 0,
-						&dsm_control_impl_private, &control_address,
-						&dsm_control_mapped_size, WARNING);
+			dsm_impl->detach(dsm_control_handle,
+							 dsm_control_impl_private, control_address,
+							 dsm_control_mapped_size, WARNING);
 			ereport(FATAL,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					 errmsg("dynamic shared memory control segment is not valid")));
@@ -459,7 +459,8 @@ dsm_backend_startup(void)
 void
 dsm_set_control_handle(dsm_handle h)
 {
-	Assert(dsm_control_handle == 0 && h != 0);
+	Assert(dsm_control_handle == DSM_HANDLE_INVALID);
+	Assert(h != DSM_HANDLE_INVALID);
 	dsm_control_handle = h;
 }
 #endif
@@ -567,19 +568,15 @@ dsm_create(Size size, int flags)
 		 */
 		if (dsm_main_space_fpm)
 			LWLockRelease(DynamicSharedMemoryControlLock);
-		for (;;)
-		{
-			Assert(seg->mapped_address == NULL && seg->mapped_size == 0);
-			/* Use even numbers only */
-			seg->handle = pg_prng_uint32(&pg_global_prng_state) << 1;
-			if (seg->handle == DSM_HANDLE_INVALID)	/* Reserve sentinel */
-				continue;
-			if (dsm_impl_op(DSM_OP_CREATE, seg->handle, size, &seg->impl_private,
-							&seg->mapped_address, &seg->mapped_size, ERROR))
-				break;
-		}
+
+		seg->impl_handle = dsm_impl->create(size, &seg->impl_private,
+											&seg->mapped_address, ERROR);
+		seg->mapped_size = size;
 		LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
 	}
+	else
+		seg->impl_handle = DSM_IMPL_HANDLE_INVALID;
+	seg->in_main_region = using_main_dsm_region;
 
 	/* Search the control segment for an unused slot. */
 	nitems = dsm_control->nitems;
@@ -587,15 +584,16 @@ dsm_create(Size size, int flags)
 	{
 		if (dsm_control->item[i].refcnt == 0)
 		{
+			seg->handle = make_dsm_handle(i);
+			seg->in_main_region = using_main_dsm_region;
+			dsm_control->item[i].in_main_region = using_main_dsm_region;
+			dsm_control->item[i].handle = seg->handle;
 			if (using_main_dsm_region)
 			{
-				seg->handle = make_main_region_dsm_handle(i);
 				dsm_control->item[i].first_page = first_page;
 				dsm_control->item[i].npages = npages;
 			}
-			else
-				Assert(!is_main_region_dsm_handle(seg->handle));
-			dsm_control->item[i].handle = seg->handle;
+			dsm_control->item[i].impl_handle = seg->impl_handle;
 			/* refcnt of 1 triggers destruction, so start at 2 */
 			dsm_control->item[i].refcnt = 2;
 			dsm_control->item[i].impl_private_pm_handle = 0;
@@ -613,8 +611,13 @@ dsm_create(Size size, int flags)
 			FreePageManagerPut(dsm_main_space_fpm, first_page, npages);
 		LWLockRelease(DynamicSharedMemoryControlLock);
 		if (!using_main_dsm_region)
-			dsm_impl_op(DSM_OP_DESTROY, seg->handle, 0, &seg->impl_private,
-						&seg->mapped_address, &seg->mapped_size, WARNING);
+		{
+			dsm_impl->detach(seg->impl_handle, seg->impl_private,
+							 seg->mapped_address, seg->mapped_size, WARNING);
+			seg->mapped_address = NULL;
+			seg->mapped_size = 0;
+			dsm_impl->destroy(seg->impl_handle, WARNING);
+		}
 		if (seg->resowner != NULL)
 			ResourceOwnerForgetDSM(seg->resowner, seg);
 		dlist_delete(&seg->node);
@@ -628,13 +631,15 @@ dsm_create(Size size, int flags)
 	}
 
 	/* Enter the handle into a new array slot. */
+	seg->handle = make_dsm_handle(nitems);
+	dsm_control->item[nitems].in_main_region = using_main_dsm_region;
+	dsm_control->item[nitems].handle = seg->handle;
 	if (using_main_dsm_region)
 	{
-		seg->handle = make_main_region_dsm_handle(nitems);
-		dsm_control->item[i].first_page = first_page;
-		dsm_control->item[i].npages = npages;
+		dsm_control->item[nitems].first_page = first_page;
+		dsm_control->item[nitems].npages = npages;
 	}
-	dsm_control->item[nitems].handle = seg->handle;
+	dsm_control->item[nitems].impl_handle = seg->impl_handle;
 	/* refcnt of 1 triggers destruction, so start at 2 */
 	dsm_control->item[nitems].refcnt = 2;
 	dsm_control->item[nitems].impl_private_pm_handle = 0;
@@ -719,7 +724,9 @@ dsm_attach(dsm_handle h)
 		/* Otherwise we've found a match. */
 		dsm_control->item[i].refcnt++;
 		seg->control_slot = i;
-		if (is_main_region_dsm_handle(seg->handle))
+		seg->in_main_region = dsm_control->item[i].in_main_region;
+		seg->impl_handle = dsm_control->item[i].impl_handle;
+		if (seg->in_main_region)
 		{
 			seg->mapped_address = (char *) dsm_main_space_begin +
 				dsm_control->item[i].first_page * FPM_PAGE_SIZE;
@@ -742,9 +749,9 @@ dsm_attach(dsm_handle h)
 	}
 
 	/* Here's where we actually try to map the segment. */
-	if (!is_main_region_dsm_handle(seg->handle))
-		dsm_impl_op(DSM_OP_ATTACH, seg->handle, 0, &seg->impl_private,
-					&seg->mapped_address, &seg->mapped_size, ERROR);
+	if (!seg->in_main_region)
+		dsm_impl->attach(seg->impl_handle, &seg->impl_private,
+						 &seg->mapped_address, &seg->mapped_size, ERROR);
 
 	return seg;
 }
@@ -786,9 +793,13 @@ dsm_detach_all(void)
 	}
 
 	if (control_address != NULL)
-		dsm_impl_op(DSM_OP_DETACH, dsm_control_handle, 0,
-					&dsm_control_impl_private, &control_address,
-					&dsm_control_mapped_size, ERROR);
+	{
+		dsm_impl->detach(dsm_control_handle,
+						 dsm_control_impl_private, control_address,
+						 dsm_control_mapped_size, ERROR);
+		dsm_control = NULL;
+		dsm_control_mapped_size = 0;
+	}
 }
 
 /*
@@ -839,9 +850,9 @@ dsm_detach(dsm_segment *seg)
 	 */
 	if (seg->mapped_address != NULL)
 	{
-		if (!is_main_region_dsm_handle(seg->handle))
-			dsm_impl_op(DSM_OP_DETACH, seg->handle, 0, &seg->impl_private,
-						&seg->mapped_address, &seg->mapped_size, WARNING);
+		if (!seg->in_main_region)
+			dsm_impl->detach(seg->impl_handle, seg->impl_private,
+							 seg->mapped_address, seg->mapped_size, WARNING);
 		seg->impl_private = 0;
 		seg->mapped_address = NULL;
 		seg->mapped_size = 0;
@@ -855,6 +866,8 @@ dsm_detach(dsm_segment *seg)
 
 		LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
 		Assert(dsm_control->item[control_slot].handle == seg->handle);
+		Assert(dsm_control->item[control_slot].in_main_region == seg->in_main_region);
+		Assert(dsm_control->item[control_slot].impl_handle == seg->impl_handle);
 		Assert(dsm_control->item[control_slot].refcnt > 1);
 		refcnt = --dsm_control->item[control_slot].refcnt;
 		seg->control_slot = INVALID_CONTROL_SLOT;
@@ -881,16 +894,17 @@ dsm_detach(dsm_segment *seg)
 			 * other reason, the postmaster may not have any better luck than
 			 * we did.  There's not much we can do about that, though.
 			 */
-			if (is_main_region_dsm_handle(seg->handle) ||
-				dsm_impl_op(DSM_OP_DESTROY, seg->handle, 0, &seg->impl_private,
-							&seg->mapped_address, &seg->mapped_size, WARNING))
+			if (seg->in_main_region ||
+				dsm_impl->destroy(seg->impl_handle, WARNING))
 			{
 				LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
-				if (is_main_region_dsm_handle(seg->handle))
+				if (seg->in_main_region)
 					FreePageManagerPut((FreePageManager *) dsm_main_space_begin,
 									   dsm_control->item[control_slot].first_page,
 									   dsm_control->item[control_slot].npages);
 				Assert(dsm_control->item[control_slot].handle == seg->handle);
+				Assert(dsm_control->item[control_slot].in_main_region == seg->in_main_region);
+				Assert(dsm_control->item[control_slot].impl_handle == seg->impl_handle);
 				Assert(dsm_control->item[control_slot].refcnt == 1);
 				dsm_control->item[control_slot].refcnt = 0;
 				LWLockRelease(DynamicSharedMemoryControlLock);
@@ -966,8 +980,8 @@ dsm_pin_segment(dsm_segment *seg)
 	LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
 	if (dsm_control->item[seg->control_slot].pinned)
 		elog(ERROR, "cannot pin a segment that is already pinned");
-	if (!is_main_region_dsm_handle(seg->handle))
-		dsm_impl_pin_segment(seg->handle, seg->impl_private, &pm_handle);
+	if (!seg->in_main_region)
+		dsm_impl->pin_segment(seg->impl_handle, seg->impl_private, &pm_handle);
 	dsm_control->item[seg->control_slot].pinned = true;
 	dsm_control->item[seg->control_slot].refcnt++;
 	dsm_control->item[seg->control_slot].impl_private_pm_handle = pm_handle;
@@ -991,6 +1005,8 @@ dsm_unpin_segment(dsm_handle handle)
 	uint32		control_slot = INVALID_CONTROL_SLOT;
 	bool		destroy = false;
 	uint32		i;
+	bool		in_main_region = false;
+	dsm_impl_handle impl_handle = DSM_IMPL_HANDLE_INVALID;
 
 	/* Find the control slot for the given handle. */
 	LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
@@ -1004,6 +1020,8 @@ dsm_unpin_segment(dsm_handle handle)
 		if (dsm_control->item[i].handle == handle)
 		{
 			control_slot = i;
+			in_main_region = dsm_control->item[i].in_main_region;
+			impl_handle = dsm_control->item[i].impl_handle;
 			break;
 		}
 	}
@@ -1024,9 +1042,12 @@ dsm_unpin_segment(dsm_handle handle)
 	 * releasing the lock, because impl_private_pm_handle may get modified by
 	 * dsm_impl_unpin_segment.
 	 */
-	if (!is_main_region_dsm_handle(handle))
-		dsm_impl_unpin_segment(handle,
-							   &dsm_control->item[control_slot].impl_private_pm_handle);
+	if (!in_main_region)
+	{
+		dsm_impl->unpin_segment(impl_handle,
+								dsm_control->item[control_slot].impl_private_pm_handle);
+		dsm_control->item[control_slot].impl_private_pm_handle = 0;
+	}
 
 	/* Note that 1 means no references (0 means unused slot). */
 	if (--dsm_control->item[control_slot].refcnt == 1)
@@ -1039,26 +1060,15 @@ dsm_unpin_segment(dsm_handle handle)
 	/* Clean up resources if that was the last reference. */
 	if (destroy)
 	{
-		dsm_impl_private junk_impl_private = 0;
-		void	   *junk_mapped_address = NULL;
-		Size		junk_mapped_size = 0;
-
 		/*
 		 * For an explanation of how error handling works in this case, see
-		 * comments in dsm_detach.  Note that if we reach this point, the
-		 * current process certainly does not have the segment mapped, because
-		 * if it did, the reference count would have still been greater than 1
-		 * even after releasing the reference count held by the pin.  The fact
-		 * that there can't be a dsm_segment for this handle makes it OK to
-		 * pass the mapped size, mapped address, and private data as NULL
-		 * here.
+		 * comments in dsm_detach.
 		 */
-		if (is_main_region_dsm_handle(handle) ||
-			dsm_impl_op(DSM_OP_DESTROY, handle, 0, &junk_impl_private,
-						&junk_mapped_address, &junk_mapped_size, WARNING))
+		if (in_main_region ||
+			dsm_impl->destroy(impl_handle, WARNING))
 		{
 			LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
-			if (is_main_region_dsm_handle(handle))
+			if (in_main_region)
 				FreePageManagerPut((FreePageManager *) dsm_main_space_begin,
 								   dsm_control->item[control_slot].first_page,
 								   dsm_control->item[control_slot].npages);
@@ -1209,7 +1219,10 @@ dsm_create_descriptor(void)
 	seg = MemoryContextAlloc(TopMemoryContext, sizeof(dsm_segment));
 	dlist_push_head(&dsm_segment_list, &seg->node);
 
-	/* seg->handle must be initialized by the caller */
+	/*
+	 * seg->handle, seg->in_main_region, and seg->impl_handle must be
+	 * initialized by the caller.
+	 */
 	seg->control_slot = INVALID_CONTROL_SLOT;
 	seg->impl_private = 0;
 	seg->mapped_address = NULL;
@@ -1259,29 +1272,25 @@ dsm_control_bytes_needed(uint32 nitems)
 		+ sizeof(dsm_control_item) * (uint64) nitems;
 }
 
+/*
+ * Generate a new handle that can be used by any backend process to identify a
+ * DSM segment.
+ */
 static inline dsm_handle
-make_main_region_dsm_handle(int slot)
+make_dsm_handle(int slot)
 {
 	dsm_handle	handle;
 
 	/*
-	 * We need to create a handle that doesn't collide with any existing extra
-	 * segment created by dsm_impl_op(), so we'll make it odd.  It also
-	 * mustn't collide with any other main area pseudo-segment, so we'll
-	 * include the slot number in some of the bits.  We also want to make an
-	 * effort to avoid newly created and recently destroyed handles from being
-	 * confused, so we'll make the rest of the bits random.
+	 * It mustn't collide with any other segment, so we include the slot
+	 * number in some of the bits.  We also make an effort to keep newly
+	 * created and recently destroyed handles from being confused, so we
+	 * make the rest of the bits random.
 	 */
-	handle = 1;
-	handle |= slot << 1;
-	handle |= pg_prng_uint32(&pg_global_prng_state) << (pg_leftmost_one_pos32(dsm_control->maxitems) + 1);
-	return handle;
-}
+	handle = slot;
+	handle |= pg_prng_uint32(&pg_global_prng_state) << (pg_leftmost_one_pos32(dsm_control->maxitems));
 
-static inline bool
-is_main_region_dsm_handle(dsm_handle handle)
-{
-	return handle & 1;
+	return handle;
 }
 
 /* ResourceOwner callbacks */
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 4478c58bb72..ced298a8726 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -3,7 +3,7 @@
  * dsm_impl.c
  *	  manage dynamic shared memory segments
  *
- * This file provides low-level APIs for creating and destroying shared
+ * 'dsm_impl' provides low-level APIs for creating and destroying shared
  * memory segments using several different possible techniques.  We refer
  * to these segments as dynamic because they can be created, altered, and
  * destroyed at any point during the server life cycle.  This is unlike
@@ -36,6 +36,11 @@
  *
  * As ever, Windows requires its own implementation.
  *
+ * The different implementations are in separate source files in this
+ * directory.  This file contains a few functions shared by the different
+ * implementations, plus the GUC support that routes calls to the currently
+ * active implementation.
+ *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
@@ -48,49 +53,8 @@
 
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <unistd.h>
-#ifndef WIN32
-#include <sys/mman.h>
-#include <sys/ipc.h>
-#include <sys/shm.h>
-#include <sys/stat.h>
-#endif
-
-#include "common/file_perm.h"
-#include "libpq/pqsignal.h"
-#include "miscadmin.h"
-#include "pgstat.h"
-#include "portability/mem.h"
-#include "postmaster/postmaster.h"
 #include "storage/dsm_impl.h"
-#include "storage/fd.h"
-#include "utils/guc.h"
-#include "utils/memutils.h"
-
-#ifdef USE_DSM_POSIX
-static bool dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
-						   dsm_impl_private *impl_private, void **mapped_address,
-						   Size *mapped_size, int elevel);
-static int	dsm_impl_posix_resize(int fd, off_t size);
-#endif
-#ifdef USE_DSM_SYSV
-static bool dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
-						  dsm_impl_private *impl_private, void **mapped_address,
-						  Size *mapped_size, int elevel);
-#endif
-#ifdef USE_DSM_WINDOWS
-static bool dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
-							 dsm_impl_private *impl_private, void **mapped_address,
-							 Size *mapped_size, int elevel);
-#endif
-#ifdef USE_DSM_MMAP
-static bool dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
-						  dsm_impl_private *impl_private, void **mapped_address,
-						  Size *mapped_size, int elevel);
-#endif
-static int	errcode_for_dynamic_shared_memory(void);
+#include "utils/guc_hooks.h"
 
 const struct config_enum_entry dynamic_shared_memory_options[] = {
 #ifdef USE_DSM_POSIX
@@ -114,940 +78,60 @@ int			dynamic_shared_memory_type = DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE;
 /* Amount of space reserved for DSM segments in the main area. */
 int			min_dynamic_shared_memory;
 
-/* Size of buffer to be used for zero-filling. */
-#define ZBUFFER_SIZE				8192
+const dsm_impl_ops *dsm_impl;
 
-#define SEGMENT_NAME_PREFIX			"Global/PostgreSQL"
-
-/*------
- * Perform a low-level shared memory operation in a platform-specific way,
- * as dictated by the selected implementation.  Each implementation is
- * required to implement the following primitives.
- *
- * DSM_OP_CREATE.  Create a segment whose size is the request_size and
- * map it.
- *
- * DSM_OP_ATTACH.  Map the segment, whose size must be the request_size.
- *
- * DSM_OP_DETACH.  Unmap the segment.
- *
- * DSM_OP_DESTROY.  Unmap the segment, if it is mapped.  Destroy the
- * segment.
- *
- * Arguments:
- *	 op: The operation to be performed.
- *	 handle: The handle of an existing object, or for DSM_OP_CREATE, the
- *	   identifier for the new handle the caller wants created.
- *	 request_size: For DSM_OP_CREATE, the requested size.  Otherwise, 0.
- *	 impl_private: Private, implementation-specific data.  Will be a pointer
- *	   to NULL for the first operation on a shared memory segment within this
- *	   backend; thereafter, it will point to the value to which it was set
- *	   on the previous call.
- *	 mapped_address: Pointer to start of current mapping; pointer to NULL
- *	   if none.  Updated with new mapping address.
- *	 mapped_size: Pointer to size of current mapping; pointer to 0 if none.
- *	   Updated with new mapped size.
- *	 elevel: Level at which to log errors.
- *
- * Return value: true on success, false on failure.  When false is returned,
- * a message should first be logged at the specified elevel, except in the
- * case where DSM_OP_CREATE experiences a name collision, which should
- * silently return false.
- *-----
- */
-bool
-dsm_impl_op(dsm_op op, dsm_handle handle, Size request_size,
-			dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
-			int elevel)
+int
+errcode_for_dynamic_shared_memory(void)
 {
-	Assert(op == DSM_OP_CREATE || request_size == 0);
-	Assert((op != DSM_OP_CREATE && op != DSM_OP_ATTACH) ||
-		   (*mapped_address == NULL && *mapped_size == 0));
+	if (errno == EFBIG || errno == ENOMEM)
+		return errcode(ERRCODE_OUT_OF_MEMORY);
+	else
+		return errcode_for_file_access();
+}
 
-	switch (dynamic_shared_memory_type)
+void
+assign_dynamic_shared_memory_type(int new_dynamic_shared_memory_type, void *extra)
+{
+	switch (new_dynamic_shared_memory_type)
 	{
 #ifdef USE_DSM_POSIX
 		case DSM_IMPL_POSIX:
-			return dsm_impl_posix(op, handle, request_size, impl_private,
-								  mapped_address, mapped_size, elevel);
+			dsm_impl = &dsm_impl_posix_ops;
+			break;
 #endif
 #ifdef USE_DSM_SYSV
 		case DSM_IMPL_SYSV:
-			return dsm_impl_sysv(op, handle, request_size, impl_private,
-								 mapped_address, mapped_size, elevel);
+			dsm_impl = &dsm_impl_sysv_ops;
+			break;
 #endif
 #ifdef USE_DSM_WINDOWS
 		case DSM_IMPL_WINDOWS:
-			return dsm_impl_windows(op, handle, request_size, impl_private,
-									mapped_address, mapped_size, elevel);
+			dsm_impl = &dsm_impl_windows_ops;
+			break;
 #endif
 #ifdef USE_DSM_MMAP
 		case DSM_IMPL_MMAP:
-			return dsm_impl_mmap(op, handle, request_size, impl_private,
-								 mapped_address, mapped_size, elevel);
+			dsm_impl = &dsm_impl_mmap_ops;
+			break;
 #endif
 		default:
 			elog(ERROR, "unexpected dynamic shared memory type: %d",
-				 dynamic_shared_memory_type);
-			return false;
+				 new_dynamic_shared_memory_type);
 	}
 }
 
-#ifdef USE_DSM_POSIX
 /*
- * Operating system primitives to support POSIX shared memory.
- *
- * POSIX shared memory segments are created and attached using shm_open()
- * and shm_unlink(); other operations, such as sizing or mapping the
- * segment, are performed as if the shared memory segments were files.
- *
- * Indeed, on some platforms, they may be implemented that way.  While
- * POSIX shared memory segments seem intended to exist in a flat namespace,
- * some operating systems may implement them as files, even going so far
- * to treat a request for /xyz as a request to create a file by that name
- * in the root directory.  Users of such broken platforms should select
- * a different shared memory implementation.
- */
-static bool
-dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
-			   dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
-			   int elevel)
-{
-	char		name[64];
-	int			flags;
-	int			fd;
-	char	   *address;
-
-	snprintf(name, 64, "/PostgreSQL.%u", handle);
-
-	/* Handle teardown cases. */
-	if (op == DSM_OP_DETACH || op == DSM_OP_DESTROY)
-	{
-		if (*mapped_address != NULL
-			&& munmap(*mapped_address, *mapped_size) != 0)
-		{
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not unmap shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		*mapped_address = NULL;
-		*mapped_size = 0;
-		if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
-		{
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not remove shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		return true;
-	}
-
-	/*
-	 * Create new segment or open an existing one for attach.
-	 *
-	 * Even though we will close the FD before returning, it seems desirable
-	 * to use Reserve/ReleaseExternalFD, to reduce the probability of EMFILE
-	 * failure.  The fact that we won't hold the FD open long justifies using
-	 * ReserveExternalFD rather than AcquireExternalFD, though.
-	 */
-	ReserveExternalFD();
-
-	flags = O_RDWR | (op == DSM_OP_CREATE ? O_CREAT | O_EXCL : 0);
-	if ((fd = shm_open(name, flags, PG_FILE_MODE_OWNER)) == -1)
-	{
-		ReleaseExternalFD();
-		if (op == DSM_OP_ATTACH || errno != EEXIST)
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not open shared memory segment \"%s\": %m",
-							name)));
-		return false;
-	}
-
-	/*
-	 * If we're attaching the segment, determine the current size; if we are
-	 * creating the segment, set the size to the requested value.
-	 */
-	if (op == DSM_OP_ATTACH)
-	{
-		struct stat st;
-
-		if (fstat(fd, &st) != 0)
-		{
-			int			save_errno;
-
-			/* Back out what's already been done. */
-			save_errno = errno;
-			close(fd);
-			ReleaseExternalFD();
-			errno = save_errno;
-
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not stat shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		request_size = st.st_size;
-	}
-	else if (dsm_impl_posix_resize(fd, request_size) != 0)
-	{
-		int			save_errno;
-
-		/* Back out what's already been done. */
-		save_errno = errno;
-		close(fd);
-		ReleaseExternalFD();
-		shm_unlink(name);
-		errno = save_errno;
-
-		ereport(elevel,
-				(errcode_for_dynamic_shared_memory(),
-				 errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
-						name, request_size)));
-		return false;
-	}
-
-	/* Map it. */
-	address = mmap(NULL, request_size, PROT_READ | PROT_WRITE,
-				   MAP_SHARED | MAP_HASSEMAPHORE | MAP_NOSYNC, fd, 0);
-	if (address == MAP_FAILED)
-	{
-		int			save_errno;
-
-		/* Back out what's already been done. */
-		save_errno = errno;
-		close(fd);
-		ReleaseExternalFD();
-		if (op == DSM_OP_CREATE)
-			shm_unlink(name);
-		errno = save_errno;
-
-		ereport(elevel,
-				(errcode_for_dynamic_shared_memory(),
-				 errmsg("could not map shared memory segment \"%s\": %m",
-						name)));
-		return false;
-	}
-	*mapped_address = address;
-	*mapped_size = request_size;
-	close(fd);
-	ReleaseExternalFD();
-
-	return true;
-}
-
-/*
- * Set the size of a virtual memory region associated with a file descriptor.
- * If necessary, also ensure that virtual memory is actually allocated by the
- * operating system, to avoid nasty surprises later.
- *
- * Returns non-zero if either truncation or allocation fails, and sets errno.
- */
-static int
-dsm_impl_posix_resize(int fd, off_t size)
-{
-	int			rc;
-	int			save_errno;
-	sigset_t	save_sigmask;
-
-	/*
-	 * Block all blockable signals, except SIGQUIT.  posix_fallocate() can run
-	 * for quite a long time, and is an all-or-nothing operation.  If we
-	 * allowed SIGUSR1 to interrupt us repeatedly (for example, due to
-	 * recovery conflicts), the retry loop might never succeed.
-	 */
-	if (IsUnderPostmaster)
-		sigprocmask(SIG_SETMASK, &BlockSig, &save_sigmask);
-
-	pgstat_report_wait_start(WAIT_EVENT_DSM_ALLOCATE);
-#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
-
-	/*
-	 * On Linux, a shm_open fd is backed by a tmpfs file.  If we were to use
-	 * ftruncate, the file would contain a hole.  Accessing memory backed by a
-	 * hole causes tmpfs to allocate pages, which fails with SIGBUS if there
-	 * is no more tmpfs space available.  So we ask tmpfs to allocate pages
-	 * here, so we can fail gracefully with ENOSPC now rather than risking
-	 * SIGBUS later.
-	 *
-	 * We still use a traditional EINTR retry loop to handle SIGCONT.
-	 * posix_fallocate() doesn't restart automatically, and we don't want this
-	 * to fail if you attach a debugger.
-	 */
-	do
-	{
-		rc = posix_fallocate(fd, 0, size);
-	} while (rc == EINTR);
-
-	/*
-	 * The caller expects errno to be set, but posix_fallocate() doesn't set
-	 * it.  Instead it returns error numbers directly.  So set errno, even
-	 * though we'll also return rc to indicate success or failure.
-	 */
-	errno = rc;
-#else
-	/* Extend the file to the requested size. */
-	do
-	{
-		rc = ftruncate(fd, size);
-	} while (rc < 0 && errno == EINTR);
-#endif
-	pgstat_report_wait_end();
-
-	if (IsUnderPostmaster)
-	{
-		save_errno = errno;
-		sigprocmask(SIG_SETMASK, &save_sigmask, NULL);
-		errno = save_errno;
-	}
-
-	return rc;
-}
-
-#endif							/* USE_DSM_POSIX */
-
-#ifdef USE_DSM_SYSV
-/*
- * Operating system primitives to support System V shared memory.
- *
- * System V shared memory segments are manipulated using shmget(), shmat(),
- * shmdt(), and shmctl().  As the default allocation limits for System V
- * shared memory are usually quite low, the POSIX facilities may be
- * preferable; but those are not supported everywhere.
- */
-static bool
-dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
-			  dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
-			  int elevel)
-{
-	key_t		key;
-	int			ident;
-	char	   *address;
-	char		name[64];
-	int		   *ident_cache;
-
-	/*
-	 * POSIX shared memory and mmap-based shared memory identify segments with
-	 * names.  To avoid needless error message variation, we use the handle as
-	 * the name.
-	 */
-	snprintf(name, 64, "%u", handle);
-
-	/*
-	 * The System V shared memory namespace is very restricted; names are of
-	 * type key_t, which is expected to be some sort of integer data type, but
-	 * not necessarily the same one as dsm_handle.  Since we use dsm_handle to
-	 * identify shared memory segments across processes, this might seem like
-	 * a problem, but it's really not.  If dsm_handle is bigger than key_t,
-	 * the cast below might truncate away some bits from the handle the
-	 * user-provided, but it'll truncate exactly the same bits away in exactly
-	 * the same fashion every time we use that handle, which is all that
-	 * really matters.  Conversely, if dsm_handle is smaller than key_t, we
-	 * won't use the full range of available key space, but that's no big deal
-	 * either.
-	 *
-	 * We do make sure that the key isn't negative, because that might not be
-	 * portable.
-	 */
-	key = (key_t) handle;
-	if (key < 1)				/* avoid compiler warning if type is unsigned */
-		key = -key;
-
-	/*
-	 * There's one special key, IPC_PRIVATE, which can't be used.  If we end
-	 * up with that value by chance during a create operation, just pretend it
-	 * already exists, so that caller will retry.  If we run into it anywhere
-	 * else, the caller has passed a handle that doesn't correspond to
-	 * anything we ever created, which should not happen.
-	 */
-	if (key == IPC_PRIVATE)
-	{
-		if (op != DSM_OP_CREATE)
-			elog(DEBUG4, "System V shared memory key may not be IPC_PRIVATE");
-		errno = EEXIST;
-		return false;
-	}
-
-	/*
-	 * Before we can do anything with a shared memory segment, we have to map
-	 * the shared memory key to a shared memory identifier using shmget(). To
-	 * avoid repeated lookups, we store the key using impl_private.
-	 */
-	if (*impl_private != 0)
-	{
-		ident_cache = (int *) *impl_private;
-		ident = *ident_cache;
-	}
-	else
-	{
-		int			flags = IPCProtection;
-		size_t		segsize;
-
-		/*
-		 * Allocate the memory BEFORE acquiring the resource, so that we don't
-		 * leak the resource if memory allocation fails.
-		 */
-		ident_cache = MemoryContextAlloc(TopMemoryContext, sizeof(int));
-
-		/*
-		 * When using shmget to find an existing segment, we must pass the
-		 * size as 0.  Passing a non-zero size which is greater than the
-		 * actual size will result in EINVAL.
-		 */
-		segsize = 0;
-
-		if (op == DSM_OP_CREATE)
-		{
-			flags |= IPC_CREAT | IPC_EXCL;
-			segsize = request_size;
-		}
-
-		if ((ident = shmget(key, segsize, flags)) == -1)
-		{
-			if (op == DSM_OP_ATTACH || errno != EEXIST)
-			{
-				int			save_errno = errno;
-
-				pfree(ident_cache);
-				errno = save_errno;
-				ereport(elevel,
-						(errcode_for_dynamic_shared_memory(),
-						 errmsg("could not get shared memory segment: %m")));
-			}
-			return false;
-		}
-
-		*ident_cache = ident;
-		*impl_private = (uintptr_t) ident_cache;
-	}
-
-	/* Handle teardown cases. */
-	if (op == DSM_OP_DETACH || op == DSM_OP_DESTROY)
-	{
-		pfree(ident_cache);
-		*impl_private = 0;
-		if (*mapped_address != NULL && shmdt(*mapped_address) != 0)
-		{
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not unmap shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		*mapped_address = NULL;
-		*mapped_size = 0;
-		if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
-		{
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not remove shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		return true;
-	}
-
-	/* If we're attaching it, we must use IPC_STAT to determine the size. */
-	if (op == DSM_OP_ATTACH)
-	{
-		struct shmid_ds shm;
-
-		if (shmctl(ident, IPC_STAT, &shm) != 0)
-		{
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not stat shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		request_size = shm.shm_segsz;
-	}
-
-	/* Map it. */
-	address = shmat(ident, NULL, PG_SHMAT_FLAGS);
-	if (address == (void *) -1)
-	{
-		int			save_errno;
-
-		/* Back out what's already been done. */
-		save_errno = errno;
-		if (op == DSM_OP_CREATE)
-			shmctl(ident, IPC_RMID, NULL);
-		errno = save_errno;
-
-		ereport(elevel,
-				(errcode_for_dynamic_shared_memory(),
-				 errmsg("could not map shared memory segment \"%s\": %m",
-						name)));
-		return false;
-	}
-	*mapped_address = address;
-	*mapped_size = request_size;
-
-	return true;
-}
-#endif
-
-#ifdef USE_DSM_WINDOWS
-/*
- * Operating system primitives to support Windows shared memory.
- *
- * Windows shared memory implementation is done using file mapping
- * which can be backed by either physical file or system paging file.
- * Current implementation uses system paging file as other effects
- * like performance are not clear for physical file and it is used in similar
- * way for main shared memory in windows.
- *
- * A memory mapping object is a kernel object - they always get deleted when
- * the last reference to them goes away, either explicitly via a CloseHandle or
- * when the process containing the reference exits.
- */
-static bool
-dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
-				 dsm_impl_private *impl_private, void **mapped_address,
-				 Size *mapped_size, int elevel)
-{
-	char	   *address;
-	HANDLE		hmap;
-	char		name[64];
-	MEMORY_BASIC_INFORMATION info;
-
-	/*
-	 * Storing the shared memory segment in the Global\ namespace, can allow
-	 * any process running in any session to access that file mapping object
-	 * provided that the caller has the required access rights. But to avoid
-	 * issues faced in main shared memory, we are using the naming convention
-	 * similar to main shared memory. We can change here once issue mentioned
-	 * in GetSharedMemName is resolved.
-	 */
-	snprintf(name, 64, "%s.%u", SEGMENT_NAME_PREFIX, handle);
-
-	/*
-	 * Handle teardown cases.  Since Windows automatically destroys the object
-	 * when no references remain, we can treat it the same as detach.
-	 */
-	if (op == DSM_OP_DETACH || op == DSM_OP_DESTROY)
-	{
-		if (*mapped_address != NULL
-			&& UnmapViewOfFile(*mapped_address) == 0)
-		{
-			_dosmaperr(GetLastError());
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not unmap shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		if (*impl_private != NULL
-			&& CloseHandle(*impl_private) == 0)
-		{
-			_dosmaperr(GetLastError());
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not remove shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-
-		*impl_private = NULL;
-		*mapped_address = NULL;
-		*mapped_size = 0;
-		return true;
-	}
-
-	/* Create new segment or open an existing one for attach. */
-	if (op == DSM_OP_CREATE)
-	{
-		DWORD		size_high;
-		DWORD		size_low;
-		DWORD		errcode;
-
-		/* Shifts >= the width of the type are undefined. */
-#ifdef _WIN64
-		size_high = request_size >> 32;
-#else
-		size_high = 0;
-#endif
-		size_low = (DWORD) request_size;
-
-		/* CreateFileMapping might not clear the error code on success */
-		SetLastError(0);
-
-		hmap = CreateFileMapping(INVALID_HANDLE_VALUE,	/* Use the pagefile */
-								 NULL,	/* Default security attrs */
-								 PAGE_READWRITE,	/* Memory is read/write */
-								 size_high, /* Upper 32 bits of size */
-								 size_low,	/* Lower 32 bits of size */
-								 name);
-
-		errcode = GetLastError();
-		if (errcode == ERROR_ALREADY_EXISTS || errcode == ERROR_ACCESS_DENIED)
-		{
-			/*
-			 * On Windows, when the segment already exists, a handle for the
-			 * existing segment is returned.  We must close it before
-			 * returning.  However, if the existing segment is created by a
-			 * service, then it returns ERROR_ACCESS_DENIED. We don't do
-			 * _dosmaperr here, so errno won't be modified.
-			 */
-			if (hmap)
-				CloseHandle(hmap);
-			return false;
-		}
-
-		if (!hmap)
-		{
-			_dosmaperr(errcode);
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not create shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-	}
-	else
-	{
-		hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
-							   FALSE,	/* do not inherit the name */
-							   name);	/* name of mapping object */
-		if (!hmap)
-		{
-			_dosmaperr(GetLastError());
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not open shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-	}
-
-	/* Map it. */
-	address = MapViewOfFile(hmap, FILE_MAP_WRITE | FILE_MAP_READ,
-							0, 0, 0);
-	if (!address)
-	{
-		int			save_errno;
-
-		_dosmaperr(GetLastError());
-		/* Back out what's already been done. */
-		save_errno = errno;
-		CloseHandle(hmap);
-		errno = save_errno;
-
-		ereport(elevel,
-				(errcode_for_dynamic_shared_memory(),
-				 errmsg("could not map shared memory segment \"%s\": %m",
-						name)));
-		return false;
-	}
-
-	/*
-	 * VirtualQuery gives size in page_size units, which is 4K for Windows. We
-	 * need size only when we are attaching, but it's better to get the size
-	 * when creating new segment to keep size consistent both for
-	 * DSM_OP_CREATE and DSM_OP_ATTACH.
-	 */
-	if (VirtualQuery(address, &info, sizeof(info)) == 0)
-	{
-		int			save_errno;
-
-		_dosmaperr(GetLastError());
-		/* Back out what's already been done. */
-		save_errno = errno;
-		UnmapViewOfFile(address);
-		CloseHandle(hmap);
-		errno = save_errno;
-
-		ereport(elevel,
-				(errcode_for_dynamic_shared_memory(),
-				 errmsg("could not stat shared memory segment \"%s\": %m",
-						name)));
-		return false;
-	}
-
-	*mapped_address = address;
-	*mapped_size = info.RegionSize;
-	*impl_private = hmap;
-
-	return true;
-}
-#endif
-
-#ifdef USE_DSM_MMAP
-/*
- * Operating system primitives to support mmap-based shared memory.
- *
- * Calling this "shared memory" is somewhat of a misnomer, because what
- * we're really doing is creating a bunch of files and mapping them into
- * our address space.  The operating system may feel obliged to
- * synchronize the contents to disk even if nothing is being paged out,
- * which will not serve us well.  The user can relocate the pg_dynshmem
- * directory to a ramdisk to avoid this problem, if available.
- */
-static bool
-dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
-			  dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
-			  int elevel)
-{
-	char		name[64];
-	int			flags;
-	int			fd;
-	char	   *address;
-
-	snprintf(name, 64, PG_DYNSHMEM_DIR "/" PG_DYNSHMEM_MMAP_FILE_PREFIX "%u",
-			 handle);
-
-	/* Handle teardown cases. */
-	if (op == DSM_OP_DETACH || op == DSM_OP_DESTROY)
-	{
-		if (*mapped_address != NULL
-			&& munmap(*mapped_address, *mapped_size) != 0)
-		{
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not unmap shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		*mapped_address = NULL;
-		*mapped_size = 0;
-		if (op == DSM_OP_DESTROY && unlink(name) != 0)
-		{
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not remove shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		return true;
-	}
-
-	/* Create new segment or open an existing one for attach. */
-	flags = O_RDWR | (op == DSM_OP_CREATE ? O_CREAT | O_EXCL : 0);
-	if ((fd = OpenTransientFile(name, flags)) == -1)
-	{
-		if (op == DSM_OP_ATTACH || errno != EEXIST)
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not open shared memory segment \"%s\": %m",
-							name)));
-		return false;
-	}
-
-	/*
-	 * If we're attaching the segment, determine the current size; if we are
-	 * creating the segment, set the size to the requested value.
-	 */
-	if (op == DSM_OP_ATTACH)
-	{
-		struct stat st;
-
-		if (fstat(fd, &st) != 0)
-		{
-			int			save_errno;
-
-			/* Back out what's already been done. */
-			save_errno = errno;
-			CloseTransientFile(fd);
-			errno = save_errno;
-
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not stat shared memory segment \"%s\": %m",
-							name)));
-			return false;
-		}
-		request_size = st.st_size;
-	}
-	else
-	{
-		/*
-		 * Allocate a buffer full of zeros.
-		 *
-		 * Note: palloc zbuffer, instead of just using a local char array, to
-		 * ensure it is reasonably well-aligned; this may save a few cycles
-		 * transferring data to the kernel.
-		 */
-		char	   *zbuffer = (char *) palloc0(ZBUFFER_SIZE);
-		Size		remaining = request_size;
-		bool		success = true;
-
-		/*
-		 * Zero-fill the file. We have to do this the hard way to ensure that
-		 * all the file space has really been allocated, so that we don't
-		 * later seg fault when accessing the memory mapping.  This is pretty
-		 * pessimal.
-		 */
-		while (success && remaining > 0)
-		{
-			Size		goal = remaining;
-
-			if (goal > ZBUFFER_SIZE)
-				goal = ZBUFFER_SIZE;
-			pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
-			if (write(fd, zbuffer, goal) == goal)
-				remaining -= goal;
-			else
-				success = false;
-			pgstat_report_wait_end();
-		}
-
-		if (!success)
-		{
-			int			save_errno;
-
-			/* Back out what's already been done. */
-			save_errno = errno;
-			CloseTransientFile(fd);
-			unlink(name);
-			errno = save_errno ? save_errno : ENOSPC;
-
-			ereport(elevel,
-					(errcode_for_dynamic_shared_memory(),
-					 errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
-							name, request_size)));
-			return false;
-		}
-	}
-
-	/* Map it. */
-	address = mmap(NULL, request_size, PROT_READ | PROT_WRITE,
-				   MAP_SHARED | MAP_HASSEMAPHORE | MAP_NOSYNC, fd, 0);
-	if (address == MAP_FAILED)
-	{
-		int			save_errno;
-
-		/* Back out what's already been done. */
-		save_errno = errno;
-		CloseTransientFile(fd);
-		if (op == DSM_OP_CREATE)
-			unlink(name);
-		errno = save_errno;
-
-		ereport(elevel,
-				(errcode_for_dynamic_shared_memory(),
-				 errmsg("could not map shared memory segment \"%s\": %m",
-						name)));
-		return false;
-	}
-	*mapped_address = address;
-	*mapped_size = request_size;
-
-	if (CloseTransientFile(fd) != 0)
-	{
-		ereport(elevel,
-				(errcode_for_file_access(),
-				 errmsg("could not close shared memory segment \"%s\": %m",
-						name)));
-		return false;
-	}
-
-	return true;
-}
-#endif
-
-/*
- * Implementation-specific actions that must be performed when a segment is to
- * be preserved even when no backend has it attached.
- *
- * Except on Windows, we don't need to do anything at all.  But since Windows
- * cleans up segments automatically when no references remain, we duplicate
- * the segment handle into the postmaster process.  The postmaster needn't
- * do anything to receive the handle; Windows transfers it automatically.
+ * No-op implementation of the pin_segment interface.  Most implementations
+ * (all but Windows) don't need to do anything here.
  */
 void
-dsm_impl_pin_segment(dsm_handle handle, dsm_impl_private impl_private,
-					 dsm_impl_private_pm_handle *pm_handle)
+dsm_impl_noop_pin_segment(dsm_impl_handle handle, dsm_impl_private impl_private,
+						  dsm_impl_private_pm_handle *pm_handle)
 {
-	switch (dynamic_shared_memory_type)
-	{
-#ifdef USE_DSM_WINDOWS
-		case DSM_IMPL_WINDOWS:
-			if (IsUnderPostmaster)
-			{
-				HANDLE		hmap;
-
-				if (!DuplicateHandle(GetCurrentProcess(), impl_private,
-									 PostmasterHandle, &hmap, 0, FALSE,
-									 DUPLICATE_SAME_ACCESS))
-				{
-					char		name[64];
-
-					snprintf(name, 64, "%s.%u", SEGMENT_NAME_PREFIX, handle);
-					_dosmaperr(GetLastError());
-					ereport(ERROR,
-							(errcode_for_dynamic_shared_memory(),
-							 errmsg("could not duplicate handle for \"%s\": %m",
-									name)));
-				}
-
-				/*
-				 * Here, we remember the handle that we created in the
-				 * postmaster process.  This handle isn't actually usable in
-				 * any process other than the postmaster, but that doesn't
-				 * matter.  We're just holding onto it so that, if the segment
-				 * is unpinned, dsm_impl_unpin_segment can close it.
-				 */
-				*pm_handle = hmap;
-			}
-			break;
-#endif
-		default:
-			break;
-	}
 }
 
-/*
- * Implementation-specific actions that must be performed when a segment is no
- * longer to be preserved, so that it will be cleaned up when all backends
- * have detached from it.
- *
- * Except on Windows, we don't need to do anything at all.  For Windows, we
- * close the extra handle that dsm_impl_pin_segment created in the
- * postmaster's process space.
- */
+/* no-op implementation of the unpin_segment interface */
 void
-dsm_impl_unpin_segment(dsm_handle handle, dsm_impl_private_pm_handle *pm_handle)
-{
-	switch (dynamic_shared_memory_type)
-	{
-#ifdef USE_DSM_WINDOWS
-		case DSM_IMPL_WINDOWS:
-			if (IsUnderPostmaster)
-			{
-				if (*impl_private &&
-					!DuplicateHandle(PostmasterHandle, *impl_private,
-									 NULL, NULL, 0, FALSE,
-									 DUPLICATE_CLOSE_SOURCE))
-				{
-					char		name[64];
-
-					snprintf(name, 64, "%s.%u", SEGMENT_NAME_PREFIX, handle);
-					_dosmaperr(GetLastError());
-					ereport(ERROR,
-							(errcode_for_dynamic_shared_memory(),
-							 errmsg("could not duplicate handle for \"%s\": %m",
-									name)));
-				}
-
-				*pm_handle = 0;
-			}
-			break;
-#endif
-		default:
-			break;
-	}
-}
-
-static int
-errcode_for_dynamic_shared_memory(void)
+dsm_impl_noop_unpin_segment(dsm_impl_handle handle, dsm_impl_private_pm_handle pm_handle)
 {
-	if (errno == EFBIG || errno == ENOMEM)
-		return errcode(ERRCODE_OUT_OF_MEMORY);
-	else
-		return errcode_for_file_access();
 }
diff --git a/src/backend/storage/ipc/dsm_impl_mmap.c b/src/backend/storage/ipc/dsm_impl_mmap.c
new file mode 100644
index 00000000000..8dd7d1eeb7f
--- /dev/null
+++ b/src/backend/storage/ipc/dsm_impl_mmap.c
@@ -0,0 +1,300 @@
+/*-------------------------------------------------------------------------
+ *
+ * dsm_impl_mmap.c
+ *	  Operating system primitives to support mmap-based shared memory.
+ *
+ * Calling this "shared memory" is somewhat of a misnomer, because what
+ * we're really doing is creating a bunch of files and mapping them into
+ * our address space.  The operating system may feel obliged to
+ * synchronize the contents to disk even if nothing is being paged out,
+ * which will not serve us well.  The user can relocate the pg_dynshmem
+ * directory to a ramdisk to avoid this problem, if available.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/ipc/dsm_impl_mmap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <fcntl.h>
+#include <unistd.h>
+#ifndef WIN32
+#include <sys/mman.h>
+#include <sys/stat.h>
+#endif
+
+#include "common/file_perm.h"
+#include "common/pg_prng.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "portability/mem.h"
+#include "storage/dsm_impl.h"
+#include "storage/fd.h"
+
+#ifdef USE_DSM_MMAP
+
+/* Size of buffer to be used for zero-filling. */
+#define ZBUFFER_SIZE				8192
+
+static dsm_impl_handle
+dsm_impl_mmap_create(Size request_size, dsm_impl_private *impl_private,
+					 void **mapped_address, int elevel)
+{
+	char		name[64];
+	int			flags;
+	int			fd;
+	char	   *address;
+	dsm_impl_handle handle;
+
+	/*
+	 * Create new segment, with a random name.  If the name is already in use,
+	 * retry until we find an unused name.
+	 */
+	for (;;)
+	{
+		do
+		{
+			handle = pg_prng_uint32(&pg_global_prng_state);
+		} while (handle == DSM_IMPL_HANDLE_INVALID);
+
+		snprintf(name, 64, PG_DYNSHMEM_DIR "/" PG_DYNSHMEM_MMAP_FILE_PREFIX "%u",
+				 handle);
+
+		/* Try to create a new segment with this name. */
+		flags = O_RDWR | O_CREAT | O_EXCL;
+		if ((fd = OpenTransientFile(name, flags)) == -1)
+		{
+			if (errno == EEXIST)
+				continue;
+			ereport(elevel,
+					(errcode_for_dynamic_shared_memory(),
+					 errmsg("could not open shared memory segment \"%s\": %m",
+							name)));
+			return DSM_IMPL_HANDLE_INVALID;
+		}
+		break;
+	}
+
+	/* Enlarge it to the requested size. */
+	{
+		/*
+		 * Allocate a buffer full of zeros.
+		 *
+		 * Note: palloc zbuffer, instead of just using a local char array, to
+		 * ensure it is reasonably well-aligned; this may save a few cycles
+		 * transferring data to the kernel.
+		 */
+		char	   *zbuffer = (char *) palloc0(ZBUFFER_SIZE);
+		Size		remaining = request_size;
+		bool		success = true;
+
+		/*
+		 * Zero-fill the file. We have to do this the hard way to ensure that
+		 * all the file space has really been allocated, so that we don't
+		 * later seg fault when accessing the memory mapping.  This is pretty
+		 * pessimal.
+		 */
+		while (success && remaining > 0)
+		{
+			Size		goal = remaining;
+
+			if (goal > ZBUFFER_SIZE)
+				goal = ZBUFFER_SIZE;
+			pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
+			if (write(fd, zbuffer, goal) == goal)
+				remaining -= goal;
+			else
+				success = false;
+			pgstat_report_wait_end();
+		}
+
+		if (!success)
+		{
+			int			save_errno;
+
+			/* Back out what's already been done. */
+			save_errno = errno;
+			CloseTransientFile(fd);
+			unlink(name);
+			errno = save_errno ? save_errno : ENOSPC;
+
+			ereport(elevel,
+					(errcode_for_dynamic_shared_memory(),
+					 errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
+							name, request_size)));
+			return DSM_IMPL_HANDLE_INVALID;
+		}
+	}
+
+	/* Map it to this process's address space. */
+	address = mmap(NULL, request_size, PROT_READ | PROT_WRITE,
+				   MAP_SHARED | MAP_HASSEMAPHORE | MAP_NOSYNC, fd, 0);
+	if (address == MAP_FAILED)
+	{
+		int			save_errno;
+
+		/* Back out what's already been done. */
+		save_errno = errno;
+		CloseTransientFile(fd);
+		unlink(name);
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not map shared memory segment \"%s\": %m",
+						name)));
+		return DSM_IMPL_HANDLE_INVALID;
+	}
+
+	/* Once it's mapped, we don't need the file descriptor for it anymore. */
+	if (CloseTransientFile(fd) != 0)
+	{
+		ereport(elevel,
+				(errcode_for_file_access(),
+				 errmsg("could not close shared memory segment \"%s\": %m",
+						name)));
+		return DSM_IMPL_HANDLE_INVALID;
+	}
+
+	*mapped_address = address;
+	*impl_private = 0;			/* not used by this implementation */
+	return handle;
+}
+
+static bool
+dsm_impl_mmap_attach(dsm_impl_handle handle,
+					 dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
+					 int elevel)
+{
+	char		name[64];
+	int			flags;
+	int			fd;
+	char	   *address;
+	Size		size;
+	struct stat st;
+
+	snprintf(name, 64, PG_DYNSHMEM_DIR "/" PG_DYNSHMEM_MMAP_FILE_PREFIX "%u",
+			 handle);
+
+	/* Open an existing file for attach. */
+	flags = O_RDWR;
+	if ((fd = OpenTransientFile(name, flags)) == -1)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not open shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+
+	/* Determine the current size. */
+	if (fstat(fd, &st) != 0)
+	{
+		int			save_errno;
+
+		/* Back out what's already been done. */
+		save_errno = errno;
+		CloseTransientFile(fd);
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not stat shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	size = st.st_size;
+
+	/* Map it to this process's address space. */
+	address = mmap(NULL, size, PROT_READ | PROT_WRITE,
+				   MAP_SHARED | MAP_HASSEMAPHORE | MAP_NOSYNC, fd, 0);
+	if (address == MAP_FAILED)
+	{
+		int			save_errno;
+
+		/* Back out what's already been done. */
+		save_errno = errno;
+		CloseTransientFile(fd);
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not map shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+
+	/* Once it's mapped, we don't need the file descriptor for it anymore */
+	if (CloseTransientFile(fd) != 0)
+	{
+		ereport(elevel,
+				(errcode_for_file_access(),
+				 errmsg("could not close shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+
+	*mapped_address = address;
+	*mapped_size = size;
+	*impl_private = 0;			/* not used by this implementation */
+	return true;
+}
+
+static bool
+dsm_impl_mmap_detach(dsm_impl_handle handle,
+					 dsm_impl_private impl_private, void *mapped_address, Size mapped_size,
+					 int elevel)
+{
+	char		name[64];
+
+	Assert(mapped_address != NULL);
+	Assert(impl_private == 0);	/* not used by this implementation */
+
+	snprintf(name, 64, PG_DYNSHMEM_DIR "/" PG_DYNSHMEM_MMAP_FILE_PREFIX "%u",
+			 handle);
+
+	if (munmap(mapped_address, mapped_size) != 0)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not unmap shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	return true;
+}
+
+static bool
+dsm_impl_mmap_destroy(dsm_impl_handle handle, int elevel)
+{
+	char		name[64];
+
+	snprintf(name, 64, PG_DYNSHMEM_DIR "/" PG_DYNSHMEM_MMAP_FILE_PREFIX "%u",
+			 handle);
+
+	if (unlink(name) != 0)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not remove shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	return true;
+}
+
+const dsm_impl_ops dsm_impl_mmap_ops = {
+	.create = dsm_impl_mmap_create,
+	.attach = dsm_impl_mmap_attach,
+	.detach = dsm_impl_mmap_detach,
+	.destroy = dsm_impl_mmap_destroy,
+	.pin_segment = dsm_impl_noop_pin_segment,
+	.unpin_segment = dsm_impl_noop_unpin_segment,
+};
+
+#endif
diff --git a/src/backend/storage/ipc/dsm_impl_posix.c b/src/backend/storage/ipc/dsm_impl_posix.c
new file mode 100644
index 00000000000..f9c873cd961
--- /dev/null
+++ b/src/backend/storage/ipc/dsm_impl_posix.c
@@ -0,0 +1,341 @@
+/*-------------------------------------------------------------------------
+ *
+ * dsm_impl_posix.c
+ *	  Operating system primitives to support POSIX shared memory.
+ *
+ * POSIX shared memory segments are created and attached using shm_open()
+ * and shm_unlink(); other operations, such as sizing or mapping the
+ * segment, are performed as if the shared memory segments were files.
+ *
+ * Indeed, on some platforms, they may be implemented that way.  While
+ * POSIX shared memory segments seem intended to exist in a flat namespace,
+ * some operating systems may implement them as files, even going so far
+ * as to treat a request for /xyz as a request to create a file by that
+ * name in the root directory.  Users of such broken platforms should select
+ * a different shared memory implementation.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/ipc/dsm_impl_posix.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <fcntl.h>
+#include <signal.h>
+#include <unistd.h>
+#ifndef WIN32
+#include <sys/mman.h>
+#include <sys/stat.h>
+#endif
+
+#include "common/file_perm.h"
+#include "common/pg_prng.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "portability/mem.h"
+#include "storage/dsm_impl.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+
+#ifdef USE_DSM_POSIX
+
+static int	dsm_impl_posix_resize(int fd, off_t size);
+
+static dsm_impl_handle
+dsm_impl_posix_create(Size request_size, dsm_impl_private *impl_private,
+					  void **mapped_address, int elevel)
+{
+	char		name[64];
+	int			flags;
+	int			fd;
+	char	   *address;
+	dsm_impl_handle handle;
+
+	/*
+	 * Even though we will close the FD before returning, it seems desirable
+	 * to use Reserve/ReleaseExternalFD, to reduce the probability of EMFILE
+	 * failure.  The fact that we won't hold the FD open long justifies using
+	 * ReserveExternalFD rather than AcquireExternalFD, though.
+	 */
+	ReserveExternalFD();
+
+	/*
+	 * Create new segment, with a random name.  If the name is already in use,
+	 * retry until we find an unused name.
+	 */
+	for (;;)
+	{
+		do
+		{
+			handle = pg_prng_uint32(&pg_global_prng_state);
+		} while (handle == DSM_IMPL_HANDLE_INVALID);
+
+		snprintf(name, 64, "/PostgreSQL.%u", handle);
+
+		flags = O_RDWR | O_CREAT | O_EXCL;
+		if ((fd = shm_open(name, flags, PG_FILE_MODE_OWNER)) == -1)
+		{
+			ReleaseExternalFD();
+			if (errno == EEXIST)
+				continue;
+			ereport(elevel,
+					(errcode_for_dynamic_shared_memory(),
+					 errmsg("could not open shared memory segment \"%s\": %m",
+							name)));
+			return DSM_IMPL_HANDLE_INVALID;
+		}
+		break;
+	}
+
+	/* Enlarge it to the requested size. */
+	if (dsm_impl_posix_resize(fd, request_size) != 0)
+	{
+		int			save_errno;
+
+		/* Back out what's already been done. */
+		save_errno = errno;
+		close(fd);
+		ReleaseExternalFD();
+		shm_unlink(name);
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
+						name, request_size)));
+		return DSM_IMPL_HANDLE_INVALID;
+	}
+
+	/* Map it to this process's address space. */
+	address = mmap(NULL, request_size, PROT_READ | PROT_WRITE,
+				   MAP_SHARED | MAP_HASSEMAPHORE | MAP_NOSYNC, fd, 0);
+	if (address == MAP_FAILED)
+	{
+		int			save_errno;
+
+		/* Back out what's already been done. */
+		save_errno = errno;
+		close(fd);
+		ReleaseExternalFD();
+		shm_unlink(name);
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not map shared memory segment \"%s\": %m",
+						name)));
+		return DSM_IMPL_HANDLE_INVALID;
+	}
+
+	/* Once it's mapped, we don't need the file descriptor for it anymore. */
+	close(fd);
+	ReleaseExternalFD();
+
+	*mapped_address = address;
+	*impl_private = 0;			/* not used by this implementation */
+	return handle;
+}
+
+static bool
+dsm_impl_posix_attach(dsm_impl_handle handle, dsm_impl_private *impl_private,
+					  void **mapped_address, Size *mapped_size,
+					  int elevel)
+{
+	char		name[64];
+	int			flags;
+	int			fd;
+	char	   *address;
+	Size		size;
+	struct stat st;
+
+	snprintf(name, 64, "/PostgreSQL.%u", handle);
+
+	/* Like in dsm_impl_posix_create, make sure an FD is available */
+	ReserveExternalFD();
+
+	flags = O_RDWR;
+	if ((fd = shm_open(name, flags, PG_FILE_MODE_OWNER)) == -1)
+	{
+		ReleaseExternalFD();
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not open shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+
+	/* Determine the current size */
+	if (fstat(fd, &st) != 0)
+	{
+		int			save_errno;
+
+		/* Back out what's already been done. */
+		save_errno = errno;
+		close(fd);
+		ReleaseExternalFD();
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not stat shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	size = st.st_size;
+
+	/* Map it to this process's address space. */
+	address = mmap(NULL, size, PROT_READ | PROT_WRITE,
+				   MAP_SHARED | MAP_HASSEMAPHORE | MAP_NOSYNC, fd, 0);
+	if (address == MAP_FAILED)
+	{
+		int			save_errno;
+
+		/* Back out what's already been done. */
+		save_errno = errno;
+		close(fd);
+		ReleaseExternalFD();
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not map shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+
+	/* Once it's mapped, we don't need the file descriptor for it anymore */
+	close(fd);
+	ReleaseExternalFD();
+
+	*mapped_address = address;
+	*mapped_size = size;
+	*impl_private = 0;			/* not used by this implementation */
+	return true;
+}
+
+static bool
+dsm_impl_posix_detach(dsm_impl_handle handle, dsm_impl_private impl_private,
+					  void *mapped_address, Size mapped_size,
+					  int elevel)
+{
+	char		name[64];
+
+	Assert(mapped_address != NULL);
+	Assert(impl_private == 0);	/* not used by this implementation */
+
+	snprintf(name, 64, "/PostgreSQL.%u", handle);
+
+	if (munmap(mapped_address, mapped_size) != 0)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not unmap shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	return true;
+}
+
+static bool
+dsm_impl_posix_destroy(dsm_impl_handle handle, int elevel)
+{
+	char		name[64];
+
+	snprintf(name, 64, "/PostgreSQL.%u", handle);
+
+	if (shm_unlink(name) != 0)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not remove shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	return true;
+}
+
+/*
+ * Set the size of a virtual memory region associated with a file descriptor.
+ * If necessary, also ensure that virtual memory is actually allocated by the
+ * operating system, to avoid nasty surprises later.
+ *
+ * Returns non-zero if either truncation or allocation fails, and sets errno.
+ */
+static int
+dsm_impl_posix_resize(int fd, off_t size)
+{
+	int			rc;
+	int			save_errno;
+	sigset_t	save_sigmask;
+
+	/*
+	 * Block all blockable signals, except SIGQUIT.  posix_fallocate() can run
+	 * for quite a long time, and is an all-or-nothing operation.  If we
+	 * allowed SIGUSR1 to interrupt us repeatedly (for example, due to
+	 * recovery conflicts), the retry loop might never succeed.
+	 */
+	if (IsUnderPostmaster)
+		sigprocmask(SIG_SETMASK, &BlockSig, &save_sigmask);
+
+	pgstat_report_wait_start(WAIT_EVENT_DSM_ALLOCATE);
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+
+	/*
+	 * On Linux, a shm_open fd is backed by a tmpfs file.  If we were to use
+	 * ftruncate, the file would contain a hole.  Accessing memory backed by a
+	 * hole causes tmpfs to allocate pages, which fails with SIGBUS if there
+	 * is no more tmpfs space available.  So we ask tmpfs to allocate pages
+	 * here, so we can fail gracefully with ENOSPC now rather than risking
+	 * SIGBUS later.
+	 *
+	 * We still use a traditional EINTR retry loop to handle SIGCONT.
+	 * posix_fallocate() doesn't restart automatically, and we don't want this
+	 * to fail if you attach a debugger.
+	 */
+	do
+	{
+		rc = posix_fallocate(fd, 0, size);
+	} while (rc == EINTR);
+
+	/*
+	 * The caller expects errno to be set, but posix_fallocate() doesn't set
+	 * it.  Instead it returns error numbers directly.  So set errno, even
+	 * though we'll also return rc to indicate success or failure.
+	 */
+	errno = rc;
+#else
+	/* Extend the file to the requested size. */
+	do
+	{
+		rc = ftruncate(fd, size);
+	} while (rc < 0 && errno == EINTR);
+#endif
+	pgstat_report_wait_end();
+
+	if (IsUnderPostmaster)
+	{
+		save_errno = errno;
+		sigprocmask(SIG_SETMASK, &save_sigmask, NULL);
+		errno = save_errno;
+	}
+
+	return rc;
+}
+
+const dsm_impl_ops dsm_impl_posix_ops = {
+	.create = dsm_impl_posix_create,
+	.attach = dsm_impl_posix_attach,
+	.detach = dsm_impl_posix_detach,
+	.destroy = dsm_impl_posix_destroy,
+	.pin_segment = dsm_impl_noop_pin_segment,
+	.unpin_segment = dsm_impl_noop_unpin_segment,
+};
+
+#endif							/* USE_DSM_POSIX */
diff --git a/src/backend/storage/ipc/dsm_impl_sysv.c b/src/backend/storage/ipc/dsm_impl_sysv.c
new file mode 100644
index 00000000000..cde742d4854
--- /dev/null
+++ b/src/backend/storage/ipc/dsm_impl_sysv.c
@@ -0,0 +1,224 @@
+/*-------------------------------------------------------------------------
+ *
+ * dsm_impl_sysv.c
+ *	  Operating system primitives to support System V shared memory.
+ *
+ * System V shared memory segments are manipulated using shmget(), shmat(),
+ * shmdt(), and shmctl().  As the default allocation limits for System V
+ * shared memory are usually quite low, the POSIX facilities may be
+ * preferable; but those are not supported everywhere.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/ipc/dsm_impl_sysv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <fcntl.h>
+#include <signal.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/ipc.h>
+#include <sys/shm.h>
+#include <sys/stat.h>
+
+#include "common/file_perm.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "portability/mem.h"
+#include "postmaster/postmaster.h"
+#include "storage/dsm_impl.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+
+#ifdef USE_DSM_SYSV
+
+/*
+ * Return the segment's shmid as a handle.
+ *
+ * Because 0 is a valid shmid, but not a valid dsm_impl_handle, add one to
+ * avoid it.  The valid range for shmid is from 0 to INT_MAX, so it fits in
+ * uint32 this way.
+ */
+static inline dsm_impl_handle
+handle_from_shmid(int shmid)
+{
+	return ((uint32) shmid) + 1;
+}
+
+static inline int
+shmid_from_handle(dsm_impl_handle handle)
+{
+	return handle - 1;
+}
+
+static dsm_impl_handle
+dsm_impl_sysv_create(Size request_size,
+					 dsm_impl_private *impl_private, void **mapped_address,
+					 int elevel)
+{
+	int			ident;
+	char	   *address;
+	char		name[64];
+
+	/*
+	 * Create a new shared memory segment.  We let shmget() pick a unique
+	 * identifier for us.
+	 */
+	if ((ident = shmget(IPC_PRIVATE, request_size, IPCProtection | IPC_CREAT | IPC_EXCL)) == -1)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not create shared memory segment: %m")));
+		return DSM_IMPL_HANDLE_INVALID;
+	}
+
+	/*
+	 * POSIX shared memory and mmap-based shared memory identify segments with
+	 * names.  To avoid needless error message variation, we use the shmid as
+	 * the name.
+	 */
+	snprintf(name, 64, "shmid %d", ident);
+
+	/* Map it to this process's address space. */
+	address = shmat(ident, NULL, PG_SHMAT_FLAGS);
+	if (address == (void *) -1)
+	{
+		int			save_errno;
+
+		/* Back out what's already been done. */
+		save_errno = errno;
+		shmctl(ident, IPC_RMID, NULL);
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not map shared memory segment \"%s\": %m",
+						name)));
+		return DSM_IMPL_HANDLE_INVALID;
+	}
+	*mapped_address = address;
+	*impl_private = 0;			/* not used by this implementation */
+
+	return handle_from_shmid(ident);
+}
+
+static bool
+dsm_impl_sysv_attach(dsm_impl_handle handle,
+					 dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
+					 int elevel)
+{
+	int			ident = shmid_from_handle(handle);
+	char	   *address;
+	char		name[64];
+	struct shmid_ds shm;
+	Size		size;
+
+	/*
+	 * POSIX shared memory and mmap-based shared memory identify segments with
+	 * names.  To avoid needless error message variation, we use the shmid as
+	 * the name.
+	 */
+	snprintf(name, 64, "shmid %d", ident);
+
+	/* Use IPC_STAT to determine the size. */
+	if (shmctl(ident, IPC_STAT, &shm) != 0)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not stat shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	size = shm.shm_segsz;
+
+	/* Map it to this process's address space. */
+	address = shmat(ident, NULL, PG_SHMAT_FLAGS);
+	if (address == (void *) -1)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not map shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	*mapped_address = address;
+	*mapped_size = size;
+	*impl_private = 0;			/* not used by this implementation */
+
+	return true;
+}
+
+
+static bool
+dsm_impl_sysv_detach(dsm_impl_handle handle,
+					 dsm_impl_private impl_private, void *mapped_address, Size mapped_size,
+					 int elevel)
+{
+	int			ident = shmid_from_handle(handle);
+	char		name[64];
+
+	Assert(mapped_address != NULL);
+	Assert(impl_private == 0);	/* not used by this implementation */
+
+	/*
+	 * POSIX shared memory and mmap-based shared memory identify segments with
+	 * names.  To avoid needless error message variation, we use the shmid as
+	 * the name.
+	 */
+	snprintf(name, 64, "shmid %d", ident);
+
+	if (shmdt(mapped_address) != 0)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not unmap shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	return true;
+}
+
+static bool
+dsm_impl_sysv_destroy(dsm_impl_handle handle, int elevel)
+{
+	int			ident = shmid_from_handle(handle);
+	char		name[64];
+
+	/*
+	 * POSIX shared memory and mmap-based shared memory identify segments with
+	 * names.  To avoid needless error message variation, we use the shmid as
+	 * the name.
+	 */
+	snprintf(name, 64, "shmid %d", ident);
+
+	/* Remove the segment. */
+	if (shmctl(ident, IPC_RMID, NULL) < 0)
+	{
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not remove shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	return true;
+}
+
+const dsm_impl_ops dsm_impl_sysv_ops = {
+	.create = dsm_impl_sysv_create,
+	.attach = dsm_impl_sysv_attach,
+	.detach = dsm_impl_sysv_detach,
+	.destroy = dsm_impl_sysv_destroy,
+	.pin_segment = dsm_impl_noop_pin_segment,
+	.unpin_segment = dsm_impl_noop_unpin_segment,
+};
+
+#endif
diff --git a/src/backend/storage/ipc/dsm_impl_windows.c b/src/backend/storage/ipc/dsm_impl_windows.c
new file mode 100644
index 00000000000..f4b62f92c35
--- /dev/null
+++ b/src/backend/storage/ipc/dsm_impl_windows.c
@@ -0,0 +1,345 @@
+/*-------------------------------------------------------------------------
+ *
+ * dsm_impl_windows.c
+ *	  Operating system primitives to support Windows shared memory.
+ *
+ * The Windows shared memory implementation uses a file mapping, which can
+ * be backed by either a physical file or the system paging file.  The
+ * current implementation uses the system paging file, because the effects
+ * (performance and otherwise) of a physical file are unclear, and the main
+ * shared memory on Windows is handled the same way.
+ *
+ * A memory mapping object is a kernel object; it is deleted when the last
+ * reference to it goes away, either explicitly via CloseHandle() or when
+ * the process holding the reference exits.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/ipc/dsm_impl_windows.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/file_perm.h"
+#include "common/pg_prng.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "portability/mem.h"
+#include "postmaster/postmaster.h"
+#include "storage/dsm_impl.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+#define SEGMENT_NAME_PREFIX			"Global/PostgreSQL"
+
+static dsm_impl_handle
+dsm_impl_windows_create(Size request_size,
+						dsm_impl_private *impl_private, void **mapped_address,
+						int elevel)
+{
+	char	   *address;
+	HANDLE		hmap;
+	char		name[64];
+	DWORD		size_high;
+	DWORD		size_low;
+	DWORD		errcode;
+	dsm_impl_handle handle;
+
+	/* Shifts >= the width of the type are undefined. */
+#ifdef _WIN64
+	size_high = request_size >> 32;
+#else
+	size_high = 0;
+#endif
+	size_low = (DWORD) request_size;
+
+	/*
+	 * Create new segment, with a random name.  If the name is already in use,
+	 * retry until we find an unused name.
+	 */
+	for (;;)
+	{
+		do
+		{
+			handle = pg_prng_uint32(&pg_global_prng_state);
+		} while (handle == DSM_IMPL_HANDLE_INVALID);
+
+		/*
+		 * Storing the shared memory segment in the Global\ namespace allows
+		 * any process running in any session to access the file mapping
+		 * object, provided the caller has the required access rights.  But
+		 * to avoid the issues seen with the main shared memory, we use a
+		 * similar naming convention.  We can change this once the issue
+		 * mentioned in GetSharedMemName is resolved.
+		 */
+		snprintf(name, 64, "%s.%u", SEGMENT_NAME_PREFIX, handle);
+
+		/* CreateFileMapping might not clear the error code on success */
+		SetLastError(0);
+
+		hmap = CreateFileMapping(INVALID_HANDLE_VALUE,	/* Use the pagefile */
+								 NULL,	/* Default security attrs */
+								 PAGE_READWRITE,	/* Memory is read/write */
+								 size_high, /* Upper 32 bits of size */
+								 size_low,	/* Lower 32 bits of size */
+								 name);
+
+		errcode = GetLastError();
+		if (errcode == ERROR_ALREADY_EXISTS || errcode == ERROR_ACCESS_DENIED)
+		{
+			/*
+			 * On Windows, when the segment already exists, a handle for the
+			 * existing segment is returned.  We must close it before
+			 * retrying.  However, if the existing segment is created by a
+			 * service, then it returns ERROR_ACCESS_DENIED. We don't do
+			 * _dosmaperr here, so errno won't be modified.
+			 */
+			if (hmap)
+				CloseHandle(hmap);
+			continue;
+		}
+		break;
+	}
+
+	if (!hmap)
+	{
+		_dosmaperr(errcode);
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not create shared memory segment \"%s\": %m",
+						name)));
+		return DSM_IMPL_HANDLE_INVALID;
+	}
+
+	/* Map it. */
+	address = MapViewOfFile(hmap, FILE_MAP_WRITE | FILE_MAP_READ,
+							0, 0, 0);
+	if (!address)
+	{
+		int			save_errno;
+
+		_dosmaperr(GetLastError());
+		/* Back out what's already been done. */
+		save_errno = errno;
+		CloseHandle(hmap);
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not map shared memory segment \"%s\": %m",
+						name)));
+		return DSM_IMPL_HANDLE_INVALID;
+	}
+
+	*mapped_address = address;
+	*impl_private = (uintptr_t) hmap;
+
+	return handle;
+}
+
+static bool
+dsm_impl_windows_attach(dsm_impl_handle handle,
+						dsm_impl_private *impl_private, void **mapped_address,
+						Size *mapped_size, int elevel)
+{
+	char	   *address;
+	HANDLE		hmap;
+	char		name[64];
+	MEMORY_BASIC_INFORMATION info;
+
+	snprintf(name, 64, "%s.%u", SEGMENT_NAME_PREFIX, handle);
+
+	/* Open the existing mapping object. */
+	hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
+						   FALSE,	/* do not inherit the name */
+						   name);	/* name of mapping object */
+	if (!hmap)
+	{
+		_dosmaperr(GetLastError());
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not open shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+
+	/* Map it to this process's address space. */
+	address = MapViewOfFile(hmap, FILE_MAP_WRITE | FILE_MAP_READ,
+							0, 0, 0);
+	if (!address)
+	{
+		int			save_errno;
+
+		_dosmaperr(GetLastError());
+		/* Back out what's already been done. */
+		save_errno = errno;
+		CloseHandle(hmap);
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not map shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+
+	/*
+	 * Determine the size of the mapping.
+	 *
+	 * Note: Windows rounds up the mapping size to the 4K page size, so this
+	 * can be larger than what was requested when the segment was created.
+	 */
+	if (VirtualQuery(address, &info, sizeof(info)) == 0)
+	{
+		int			save_errno;
+
+		_dosmaperr(GetLastError());
+		/* Back out what's already been done. */
+		save_errno = errno;
+		UnmapViewOfFile(address);
+		CloseHandle(hmap);
+		errno = save_errno;
+
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not stat shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+
+	*mapped_address = address;
+	*mapped_size = info.RegionSize;
+	*impl_private = (uintptr_t) hmap;
+
+	return true;
+}
+
+static bool
+dsm_impl_windows_detach(dsm_impl_handle handle,
+						dsm_impl_private impl_private, void *mapped_address,
+						Size mapped_size, int elevel)
+{
+	HANDLE		hmap;
+	char		name[64];
+
+	Assert(mapped_address != NULL);
+	hmap = (HANDLE) impl_private;
+
+	snprintf(name, 64, "%s.%u", SEGMENT_NAME_PREFIX, handle);
+
+	/*
+	 * Unmap the view and close our handle to the mapping object.
+	 */
+	if (UnmapViewOfFile(mapped_address) == 0)
+	{
+		_dosmaperr(GetLastError());
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not unmap shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+	if (CloseHandle(hmap) == 0)
+	{
+		_dosmaperr(GetLastError());
+		ereport(elevel,
+				(errcode_for_dynamic_shared_memory(),
+				 errmsg("could not remove shared memory segment \"%s\": %m",
+						name)));
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * Since Windows automatically destroys the object when no references remain,
+ * this is a no-op.
+ */
+static bool
+dsm_impl_windows_destroy(dsm_impl_handle handle, int elevel)
+{
+	return true;
+}
+
+/*
+ * Windows cleans up segments automatically when no references remain. That's
+ * handy, but if a segment needs to be preserved even when no backend has it
+ * attached, we need to take measures to prevent it from being cleaned up.  To
+ * prevent it, we duplicate the segment handle into the postmaster process.
+ * The postmaster needn't do anything to receive the handle; Windows transfers
+ * it automatically.
+ */
+static void
+dsm_impl_windows_pin_segment(dsm_impl_handle handle, dsm_impl_private impl_private,
+							 dsm_impl_private_pm_handle *pm_handle)
+{
+	if (IsUnderPostmaster)
+	{
+		HANDLE		hmap = (HANDLE) impl_private;
+		HANDLE		pm_hmap;
+
+		if (!DuplicateHandle(GetCurrentProcess(), hmap,
+							 PostmasterHandle, &pm_hmap, 0, FALSE,
+							 DUPLICATE_SAME_ACCESS))
+		{
+			char		name[64];
+
+			snprintf(name, 64, "%s.%u", SEGMENT_NAME_PREFIX, handle);
+			_dosmaperr(GetLastError());
+			ereport(ERROR,
+					(errcode_for_dynamic_shared_memory(),
+					 errmsg("could not duplicate handle for \"%s\": %m",
+							name)));
+		}
+
+		/*
+		 * Here, we remember the handle that we created in the postmaster
+		 * process.  This handle isn't actually usable in any process other
+		 * than the postmaster, but that doesn't matter.  We're just holding
+		 * onto it so that, if the segment is unpinned, dsm_unpin_segment can
+		 * close it.
+		 */
+		*pm_handle = (uintptr_t) pm_hmap;
+	}
+}
+
+/*
+ * Close the extra handle that pin_segment created in the postmaster's process
+ * space.
+ */
+static void
+dsm_impl_windows_unpin_segment(dsm_impl_handle handle, dsm_impl_private_pm_handle pm_handle)
+{
+	if (IsUnderPostmaster)
+	{
+		HANDLE		pm_hmap = (HANDLE) pm_handle;
+
+		if (!DuplicateHandle(PostmasterHandle, pm_hmap,
+							 NULL, NULL, 0, FALSE,
+							 DUPLICATE_CLOSE_SOURCE))
+		{
+			char		name[64];
+
+			snprintf(name, 64, "%s.%u", SEGMENT_NAME_PREFIX, handle);
+			_dosmaperr(GetLastError());
+			ereport(ERROR,
+					(errcode_for_dynamic_shared_memory(),
+					 errmsg("could not duplicate handle for \"%s\": %m",
+							name)));
+		}
+	}
+}
+
+const dsm_impl_ops dsm_impl_windows_ops = {
+	.create = dsm_impl_windows_create,
+	.attach = dsm_impl_windows_attach,
+	.detach = dsm_impl_windows_detach,
+	.destroy = dsm_impl_windows_destroy,
+	.pin_segment = dsm_impl_windows_pin_segment,
+	.unpin_segment = dsm_impl_windows_unpin_segment,
+};
diff --git a/src/backend/storage/ipc/meson.build b/src/backend/storage/ipc/meson.build
index 5a936171f73..5acea5af250 100644
--- a/src/backend/storage/ipc/meson.build
+++ b/src/backend/storage/ipc/meson.build
@@ -20,3 +20,15 @@ backend_sources += files(
   'standby.c',
 
 )
+
+if host_system == 'windows'
+  backend_sources += files(
+    'dsm_impl_windows.c',
+  )
+else
+  backend_sources += files(
+    'dsm_impl_posix.c',
+    'dsm_impl_sysv.c',
+    'dsm_impl_mmap.c',
+  )
+endif
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 70652f0a3fc..1d809ba2c29 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -4885,7 +4885,7 @@ struct config_enum ConfigureNamesEnum[] =
 		},
 		&dynamic_shared_memory_type,
 		DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE, dynamic_shared_memory_options,
-		NULL, NULL, NULL
+		NULL, assign_dynamic_shared_memory_type, NULL
 	},
 
 	{
diff --git a/src/include/storage/dsm.h b/src/include/storage/dsm.h
index 35ae4eb164e..5d46361e4a0 100644
--- a/src/include/storage/dsm.h
+++ b/src/include/storage/dsm.h
@@ -17,11 +17,19 @@
 
 typedef struct dsm_segment dsm_segment;
 
+/*
+ * A "name" for a DSM segment; must remain meaningful even across restarts.
+ */
+typedef uint32 dsm_handle;
+
+/* Sentinel value to use for invalid DSM handles. */
+#define DSM_HANDLE_INVALID 0
+
 #define DSM_CREATE_NULL_IF_MAXSEGMENTS			0x0001
 
 /* Startup and shutdown functions. */
 struct PGShmemHeader;			/* avoid including pg_shmem.h */
-extern void dsm_cleanup_using_control_segment(dsm_handle old_control_handle);
+extern void dsm_cleanup_using_control_segment(dsm_impl_handle old_control_handle);
 extern void dsm_postmaster_startup(struct PGShmemHeader *);
 extern void dsm_backend_shutdown(void);
 extern void dsm_detach_all(void);
diff --git a/src/include/storage/dsm_impl.h b/src/include/storage/dsm_impl.h
index f2bfb2a1a5c..0f351776773 100644
--- a/src/include/storage/dsm_impl.h
+++ b/src/include/storage/dsm_impl.h
@@ -13,6 +13,10 @@
 #ifndef DSM_IMPL_H
 #define DSM_IMPL_H
 
+typedef uint32 dsm_impl_handle;
+
+#define DSM_IMPL_HANDLE_INVALID 0
+
 /* Dynamic shared memory implementations. */
 #define DSM_IMPL_POSIX			1
 #define DSM_IMPL_SYSV			2
@@ -51,25 +55,11 @@ extern PGDLLIMPORT int min_dynamic_shared_memory;
 #define PG_DYNSHMEM_DIR					"pg_dynshmem"
 #define PG_DYNSHMEM_MMAP_FILE_PREFIX	"mmap."
 
-/* A "name" for a dynamic shared memory segment. */
-typedef uint32 dsm_handle;
-
-/* Sentinel value to use for invalid DSM handles. */
-#define DSM_HANDLE_INVALID ((dsm_handle) 0)
-
-/* All the shared-memory operations we know about. */
-typedef enum
-{
-	DSM_OP_CREATE,
-	DSM_OP_ATTACH,
-	DSM_OP_DETACH,
-	DSM_OP_DESTROY,
-} dsm_op;
-
 /*
  * When a segment is created or attached, the caller provides this space to
- * hold implementation-specific information about the attachment. It is opaque
- * to the caller, and is passed back to the implementation when detaching.
+ * hold implementation-specific information about the attachment.  It is
+ * opaque to the caller, and is passed back to the implementation when
+ * detaching.
  */
 typedef uintptr_t dsm_impl_private;
 
@@ -79,14 +69,73 @@ typedef uintptr_t dsm_impl_private;
  */
 typedef uintptr_t dsm_impl_private_pm_handle;
 
-/* Create, attach to, detach from, resize, or destroy a segment. */
-extern bool dsm_impl_op(dsm_op op, dsm_handle handle, Size request_size,
-						dsm_impl_private *impl_private, void **mapped_address, Size *mapped_size,
-						int elevel);
+/* All the shared-memory operations we know about. */
+typedef struct
+{
+	/*
+	 * Create a new DSM segment, and map it into the current process's
+	 * address space.
+	 *
+	 * Note: the underlying mapping may be larger than the requested size
+	 * (currently only on Windows); attach() reports the actual mapped
+	 * size.
+	 */
+	dsm_impl_handle(*create) (Size request_size, dsm_impl_private *impl_private,
+							  void **mapped_address, int elevel);
+
+	/*
+	 * Map an existing DSM segment into the current process's address space.
+	 */
+	bool		(*attach) (dsm_impl_handle handle, dsm_impl_private *impl_private,
+						   void **mapped_address, Size *mapped_size, int elevel);
+
+	/*
+	 * Unmap a segment from the current process's address space.
+	 */
+	bool		(*detach) (dsm_impl_handle handle, dsm_impl_private impl_private,
+						   void *mapped_address, Size mapped_size, int elevel);
+
+	/*
+	 * Destroy a DSM segment.
+	 *
+	 * Note: the caller must have detached the segment first.
+	 */
+	bool		(*destroy) (dsm_impl_handle handle, int elevel);
+
+	/*
+	 * Implementation-specific actions that must be performed when a segment
+	 * is to be preserved even when no backend has it attached.
+	 */
+	void		(*pin_segment) (dsm_impl_handle handle, dsm_impl_private impl_private,
+								dsm_impl_private_pm_handle *pm_handle);
+
+	/*
+	 * Implementation-specific actions that must be performed when a segment
+	 * is no longer to be preserved, so that it will be cleaned up when all
+	 * backends have detached from it.
+	 */
+	void		(*unpin_segment) (dsm_impl_handle handle, dsm_impl_private_pm_handle pm_handle);
+} dsm_impl_ops;
+
+extern PGDLLIMPORT const dsm_impl_ops *dsm_impl;
+
+#ifdef USE_DSM_POSIX
+extern PGDLLIMPORT const dsm_impl_ops dsm_impl_posix_ops;
+#endif
+#ifdef USE_DSM_SYSV
+extern PGDLLIMPORT const dsm_impl_ops dsm_impl_sysv_ops;
+#endif
+#ifdef USE_DSM_WINDOWS
+extern PGDLLIMPORT const dsm_impl_ops dsm_impl_windows_ops;
+#endif
+#ifdef USE_DSM_MMAP
+extern PGDLLIMPORT const dsm_impl_ops dsm_impl_mmap_ops;
+#endif
+
+extern void dsm_impl_noop_pin_segment(dsm_impl_handle handle, dsm_impl_private impl_private,
+									  dsm_impl_private_pm_handle *pm_handle);
+extern void dsm_impl_noop_unpin_segment(dsm_impl_handle handle, dsm_impl_private_pm_handle pm_handle);
 
-/* Implementation-dependent actions required to keep segment until shutdown. */
-extern void dsm_impl_pin_segment(dsm_handle handle, dsm_impl_private impl_private,
-								 dsm_impl_private_pm_handle *impl_private_pm_handle);
-extern void dsm_impl_unpin_segment(dsm_handle handle, dsm_impl_private_pm_handle *pm_handle);
+extern int	errcode_for_dynamic_shared_memory(void);
 
 #endif							/* DSM_IMPL_H */
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 3065ff5be71..c319ad6208f 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -33,7 +33,7 @@ typedef struct PGShmemHeader	/* standard header for all Postgres shmem */
 	pid_t		creatorPID;		/* PID of creating process (set but unread) */
 	Size		totalsize;		/* total size of segment */
 	Size		freeoffset;		/* offset to first free space */
-	dsm_handle	dsm_control;	/* ID of dynamic shared memory control seg */
+	dsm_impl_handle dsm_control;	/* ID of dynamic shared memory control seg */
 	void	   *index;			/* pointer to ShmemIndex table */
 #ifndef WIN32					/* Windows doesn't have useful inode#s */
 	dev_t		device;			/* device data directory is on */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 339c490300e..289b5c7addc 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -59,6 +59,8 @@ extern bool check_default_text_search_config(char **newval, void **extra, GucSou
 extern void assign_default_text_search_config(const char *newval, void *extra);
 extern bool check_default_with_oids(bool *newval, void **extra,
 									GucSource source);
+extern void assign_dynamic_shared_memory_type(int new_dynamic_shared_memory_type,
+											  void *extra);
 extern bool check_effective_io_concurrency(int *newval, void **extra,
 										   GucSource source);
 extern bool check_huge_page_size(int *newval, void **extra, GucSource source);
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 89aa41b5e31..8750eab03de 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -18,6 +18,7 @@ SUBDIRS = \
 		  test_custom_rmgrs \
 		  test_ddl_deparse \
 		  test_dsa \
+		  test_dsm \
 		  test_dsm_registry \
 		  test_extensions \
 		  test_ginpostinglist \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 8fbe742d385..54b1c26a7fa 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -17,6 +17,7 @@ subdir('test_copy_callbacks')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
+subdir('test_dsm')
 subdir('test_dsm_registry')
 subdir('test_extensions')
 subdir('test_ginpostinglist')
diff --git a/src/test/modules/test_dsm/.gitignore b/src/test/modules/test_dsm/.gitignore
new file mode 100644
index 00000000000..9a737823e16
--- /dev/null
+++ b/src/test/modules/test_dsm/.gitignore
@@ -0,0 +1,3 @@
+# Generated subdirectories
+/log/
+/tmp_check/
diff --git a/src/test/modules/test_dsm/Makefile b/src/test/modules/test_dsm/Makefile
new file mode 100644
index 00000000000..12168490652
--- /dev/null
+++ b/src/test/modules/test_dsm/Makefile
@@ -0,0 +1,27 @@
+# src/test/modules/test_dsm/Makefile
+
+MODULE_big = test_dsm
+OBJS = \
+	$(WIN32RES) \
+	test_dsm.o
+PGFILEDESC = "test_dsm - test code for dynamic shared memory"
+
+EXTENSION = test_dsm
+DATA = test_dsm--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_dsm
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/src/test/modules/test_dsm/meson.build b/src/test/modules/test_dsm/meson.build
new file mode 100644
index 00000000000..4321cabe6c4
--- /dev/null
+++ b/src/test/modules/test_dsm/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+test_dsm_sources = files(
+  'test_dsm.c',
+)
+
+if host_system == 'windows'
+  test_dsm_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_dsm',
+    '--FILEDESC', 'test_dsm - test code for dynamic shared memory',])
+endif
+
+test_dsm = shared_module('test_dsm',
+  test_dsm_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_dsm
+
+test_install_data += files(
+  'test_dsm.control',
+  'test_dsm--1.0.sql',
+)
+
+tests += {
+  'name': 'test_dsm',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'tests': [
+      't/001_dsm_basic.pl',
+    ],
+  },
+}
diff --git a/src/test/modules/test_dsm/t/001_dsm_basic.pl b/src/test/modules/test_dsm/t/001_dsm_basic.pl
new file mode 100644
index 00000000000..dacc4e75ce1
--- /dev/null
+++ b/src/test/modules/test_dsm/t/001_dsm_basic.pl
@@ -0,0 +1,61 @@
+
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Exercise basic DSM operations with each dynamic shared memory implementation
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use File::Basename;
+
+# Initialize the test node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init();
+
+# Start the node with the given DSM settings and check that test_dsm_basic() succeeds
+sub run_dsm_basic_test
+{
+	local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+	my $dynamic_shared_memory_type = shift;
+	my $min_dynamic_shared_memory = shift;
+
+	$node->adjust_conf('postgresql.conf', 'dynamic_shared_memory_type', $dynamic_shared_memory_type);
+	$node->adjust_conf('postgresql.conf', 'min_dynamic_shared_memory', $min_dynamic_shared_memory);
+
+	$node->start;
+
+	$node->safe_psql('postgres', 'CREATE EXTENSION IF NOT EXISTS test_dsm');
+
+	my $stderr;
+	my $cmdret = $node->psql('postgres', 'SELECT test_dsm_basic()', stderr => \$stderr);
+	ok($cmdret == 0, "$dynamic_shared_memory_type with minsize $min_dynamic_shared_memory");
+	is($stderr, '', "$dynamic_shared_memory_type with minsize $min_dynamic_shared_memory");
+
+	$node->stop('fast');
+}
+
+# Test all the DSM implementations
+
+SKIP:
+{
+	skip "Skipping posix, sysv, mmap test on Windows", 2 if ($windows_os);
+
+	run_dsm_basic_test('posix', 0);
+	run_dsm_basic_test('posix', 1000000);
+	run_dsm_basic_test('sysv', 0);
+	run_dsm_basic_test('sysv', 1000000);
+	run_dsm_basic_test('mmap', 0);
+	run_dsm_basic_test('mmap', 1000000);
+}
+
+SKIP:
+{
+	skip "Windows dsm support", 2 if (!$windows_os);
+
+	run_dsm_basic_test('windows', 0);
+	run_dsm_basic_test('windows', 1000000);
+}
+
+done_testing();
diff --git a/src/test/modules/test_dsm/test_dsm--1.0.sql b/src/test/modules/test_dsm/test_dsm--1.0.sql
new file mode 100644
index 00000000000..4dbdc811b81
--- /dev/null
+++ b/src/test/modules/test_dsm/test_dsm--1.0.sql
@@ -0,0 +1,9 @@
+/* src/test/modules/test_dsm/test_dsm--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_dsm" to load this file. \quit
+
+CREATE FUNCTION test_dsm_basic()
+	RETURNS pg_catalog.void
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
diff --git a/src/test/modules/test_dsm/test_dsm.c b/src/test/modules/test_dsm/test_dsm.c
new file mode 100644
index 00000000000..3b64d8a2f02
--- /dev/null
+++ b/src/test/modules/test_dsm/test_dsm.c
@@ -0,0 +1,75 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_dsm.c
+ *		Test dynamic shared memory
+ *
+ * Copyright (c) 2022-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_dsm/test_dsm.c
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "storage/dsm.h"
+
+PG_MODULE_MAGIC;
+
+
+/* Test basic DSM functionality */
+PG_FUNCTION_INFO_V1(test_dsm_basic);
+Datum
+test_dsm_basic(PG_FUNCTION_ARGS)
+{
+	dsm_segment *seg;
+	unsigned char *p;
+	Size		requested_size;
+	Size		created_size;
+	Size		attached_size;
+	dsm_handle	handle;
+
+	/* Create a small DSM segment */
+	requested_size = 1000;
+	seg = dsm_create(requested_size, 0);
+
+	handle = dsm_segment_handle(seg);
+	created_size = dsm_segment_map_length(seg);
+	if (requested_size != created_size)
+		elog(ERROR, "DSM size mismatch, requested %lu but created as %lu", requested_size, created_size);
+
+	/* Fill it with data */
+	p = dsm_segment_address(seg);
+	memset(p, 0x12, requested_size);
+
+	/* Pin the segment so that it's not destroyed when we detach it */
+	dsm_pin_segment(seg);
+
+	/* Detach and re-attach it */
+	dsm_detach(seg);
+	seg = dsm_attach(handle);
+
+	/*
+	 * Check the size after re-attaching.  It can be larger than what was
+	 * requested originally, because some implementations round it up to the
+	 * nearest page size. Tolerate that.
+	 */
+	attached_size = dsm_segment_map_length(seg);
+	if (attached_size < created_size)
+		elog(ERROR, "DSM size mismatch, created %lu but attached %lu", created_size, attached_size);
+	if (requested_size + 100000 < created_size)
+		elog(ERROR, "unexpectdly large size after attach: requested %lu but got %lu", requested_size, created_size);
+
+	/* check contents */
+	p = dsm_segment_address(seg);
+	for (Size i = 0; i < created_size; i++)
+	{
+		if (p[i] != 0x12)
+			elog(ERROR, "DSM segment has unexpected content %u at offset %zu", p[i], i);
+	}
+
+	dsm_detach(seg);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/test/modules/test_dsm/test_dsm.control b/src/test/modules/test_dsm/test_dsm.control
new file mode 100644
index 00000000000..2c25c4d55f2
--- /dev/null
+++ b/src/test/modules/test_dsm/test_dsm.control
@@ -0,0 +1,4 @@
+comment = 'Test code for dynamic shared memory'
+default_version = '1.0'
+module_pathname = '$libdir/test_dsm'
+relocatable = true
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d808aad8b05..39b11c88c9d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3322,6 +3322,9 @@ dshash_table_item
 dsm_control_header
 dsm_control_item
 dsm_handle
+dsm_impl_ops
+dsm_impl_private
+dsm_impl_private_pm_handle
 dsm_op
 dsm_segment
 dsm_segment_detach_callback
-- 
2.39.2

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Heikki Linnakangas (#5)
Re: DSA_ALLOC_NO_OOM doesn't work

On Thu, Feb 22, 2024 at 8:19 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

- Separate dsm_handle, used by backend code to interact with the high
level interface in dsm.c, from dsm_impl_handle, which is used to
interact with the low-level functions in dsm_impl.c. This gets rid of
the convention in dsm.c of reserving odd numbers for DSM segments stored
in the main shmem area. There is now an explicit flag for that in the
control slot. For generating dsm_handles, we now use the same scheme we
used to use for main-area shm segments for all DSM segments, which
includes the slot number in the dsm_handle. The implementations use
their own mechanisms for generating the low-level dsm_impl_handles (all
but the SysV implementation generate a random handle and retry on
collision).

Could we use slot number and generation number, instead of slot number
and random number? I have never liked the random number thing, which
IIUC was needed because of SysV key space management problems 'leaking'
up to the handle level (yuck). With generations, you could keep
collisions arbitrarily far apart (just decide how many bits to use).
Collisions aren't exactly likely, but if there is no need for that
approach, I'm not sure why we'd keep it. (I remember dealing with
actual collisions in the wild due to lack of PRNG seeding in workers,
which admittedly should be vanishingly rare now).

If the slot number is encoded into the handle, why do we still need a
linear search for the slot?
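
To be concrete, here is an untested sketch of the kind of encoding I have
in mind (the bit split and the helper names are invented for illustration,
not taken from your patch):

#include "postgres.h"

/* Hypothetical split: low bits = control slot, high bits = generation. */
#define DSM_HANDLE_SLOT_BITS	12	/* up to 4096 slots */
#define DSM_HANDLE_SLOT_MASK	((1U << DSM_HANDLE_SLOT_BITS) - 1)

static inline uint32
make_dsm_handle(uint32 slot, uint32 generation)
{
	/*
	 * Generations are assumed to be kept nonzero by the caller (e.g. by
	 * skipping 0 on wraparound), so that slot 0 never yields handle 0,
	 * which stays reserved for DSM_HANDLE_INVALID.
	 */
	return (generation << DSM_HANDLE_SLOT_BITS) | slot;
}

static inline uint32
dsm_handle_slot(uint32 handle)
{
	return handle & DSM_HANDLE_SLOT_MASK;
}

static inline uint32
dsm_handle_generation(uint32 handle)
{
	return handle >> DSM_HANDLE_SLOT_BITS;
}

With 20 generation bits, a slot would have to be recycled about a million
times before the same handle value could reappear, and attaching could go
straight to dsm_handle_slot(handle) and just verify that the slot's stored
generation matches.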

- create() no longer returns the mapped_size. The old Windows
implementation had some code to read the actual mapped size after
creating the mapping, and returned that in *mapped_size. Others just
returned the requested size. In principle reading the actual size might
be useful; the caller might be able to make use of the whole mapped size
when it's larger than requested. In practice, the callers didn't do
that. Also, POSIX shmem on FreeBSD has similar round-up-to-page-size
behavior but the implementation did not query the actual mapped size
after creating the segment, so you could not rely on it.

I think that is an interesting issue with the main shmem area. There,
we can set huge_page_size to fantastically large sizes up to 16GB on
some architectures, but we have nothing to make sure we don't waste
some or most of the final page. But I agree that there's not much
point in worrying about this for DSM.

- Added a test that exercises basic create, detach, attach functionality
using all the different implementations supported on the current platform.

I wonder how we could test the cleanup-after-crash behaviour.

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#6)
Re: DSA_ALLOC_NO_OOM doesn't work

On Thu, Feb 22, 2024 at 10:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:

collisions arbitrarily far apart (just decide how many bits to use).

. o O ( Perhaps if you also allocated slots using a FIFO freelist,
instead of the current linear search for the first free slot, you
could maximise the time before a slot is reused, improving the
collision-avoiding power of a generation scheme? )
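
Roughly like this, say; an untested sketch with invented struct and field
names, assuming the caller already holds whatever lock protects the control
segment:

#include "postgres.h"

/* Hypothetical FIFO of free control-segment slot numbers. */
typedef struct dsm_slot_freelist
{
	uint32		head;			/* next free slot to hand out */
	uint32		tail;			/* where newly freed slots are appended */
	uint32		nitems;			/* number of free slots currently queued */
	uint32		maxitems;		/* capacity, i.e. total number of slots */
	uint32		slots[FLEXIBLE_ARRAY_MEMBER];
} dsm_slot_freelist;

/* Hand out the slot that was freed longest ago; false if none are free. */
static bool
freelist_pop(dsm_slot_freelist *fl, uint32 *slot)
{
	if (fl->nitems == 0)
		return false;
	*slot = fl->slots[fl->head];
	fl->head = (fl->head + 1) % fl->maxitems;
	fl->nitems--;
	return true;
}

/* Put a freed slot at the tail, so it is reused as late as possible. */
static void
freelist_push(dsm_slot_freelist *fl, uint32 slot)
{
	Assert(fl->nitems < fl->maxitems);
	fl->slots[fl->tail] = slot;
	fl->tail = (fl->tail + 1) % fl->maxitems;
	fl->nitems++;
}

Popping from the head and pushing to the tail means a freed slot has to
wait behind every other free slot before it can be handed out again, which
is exactly what you want if a generation counter is doing the collision
avoidance.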

#8Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Thomas Munro (#7)
Re: DSA_ALLOC_NO_OOM doesn't work

On 22/02/2024 01:03, Thomas Munro wrote:

On Thu, Feb 22, 2024 at 10:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:

collisions arbitrarily far apart (just decide how many bits to use).

. o O ( Perhaps if you also allocated slots using a FIFO freelist,
instead of the current linear search for the first free slot, you
could maximise the time before a slot is reused, improving the
collision-avoiding power of a generation scheme? )

We could also enlarge dsm_handle from 32-bits to 64-bits, if we're
worried about collisions.

I actually experimented with something like that too: I encoded the "is
this in main region" in one of the high bits and let the implementation
use the low bits. One small issue with that is that we have a few places
that pass a DSM handle as the 'bgw_main' argument when launching a
worker process, and on 32-bit platforms that would not be wide enough.
Those could be changed to use the wider 'bgw_extra' field instead, though.

--
Heikki Linnakangas
Neon (https://neon.tech)

#9Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#5)
Re: DSA_ALLOC_NO_OOM doesn't work

On Thu, Feb 22, 2024 at 12:49 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

That's fair, I can see those reasons. Nevertheless, I do think it was a
bad tradeoff. A little bit of repetition would be better here, or we can
extract the common parts to smaller functions.

I came up with the attached:

25 files changed, 1710 insertions(+), 1113 deletions(-)

So yeah, it's more code, and there's some repetition, but I think this
is more readable. Some of that is extra boilerplate because I split the
implementations into separate files, and I also added tests.

Adding tests is great. I'm unenthusiastic about the rest, but I don't
really care enough to argue.

What's the goal here, anyway? I mean, why bother?

--
Robert Haas
EDB: http://www.enterprisedb.com