Re: [PATCH] BUG #13416: Postgres >= 9.3 doesn't use optimized shared memory on Solaris anymore

Started by Sean Chittendenover 8 years ago4 messages
#1Sean Chittenden
seanc@joyent.com
1 attachment(s)

Hello.  We identified the same problem.  Sam Gwydir and Josh Clulow were able to put together the appropriate fix after.

The breakage in src/backend/port/sysv_shmem.c and src/include/storage/dsm_impl.h should be back ported to all supported versions (the regression was introduced between the 9.2 and 9.3 branches).

The only real question remaining is: do we want to change the default behavior, as detected by initdb(1), to use ISM on Illumos/Solaris derived systems?  Given the memory bloat experienced when using POSIX shared memory (potentially very significant for systems with larger shared_buffers), we think it's probably prudent to change the default from dynamic_shared_memory_type=posix to sysv.  Unfortunately I think it's not worth changing the default behavior of initdb(1) in the current supported branches, but it should be changed for 10.

Please let us know if there are any questions.  -sc

--
Sean Chittenden
seanc@joyent.com

Attachments:

ism.patchapplication/octet-streamDownload
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 850eafddc3..0a8e1bb342 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -55,15 +55,20 @@
  * false, we might need to add compile and/or run-time tests here and do this
  * only if the running kernel supports it.
  *
- * However, we must always disable this logic in the EXEC_BACKEND case, and
+ * There are two cases in which we disable this logic. On systems that support
+ * intimate shared memory (e.g. illumos and Solaris), we prefer the usage of
+ * System V shared memory in order to take advantage of the performance
+ * improvements that sharing page tables provides.
+ *
+ * In addition, we must always disable this logic in the EXEC_BACKEND case, and
  * fall back to the old method of allocating the entire segment using System V
  * shared memory, because there's no way to attach an anonymous mmap'd segment
  * to a process after exec().  Since EXEC_BACKEND is intended only for
  * developer use, this shouldn't be a big problem.  Because of this, we do
  * not worry about supporting anonymous shmem in the EXEC_BACKEND cases below.
  */
-#ifndef EXEC_BACKEND
-#define USE_ANONYMOUS_SHMEM
+#if !defined(EXEC_BACKEND) && !defined(SHM_SHARE_MMU)
+#  define USE_ANONYMOUS_SHMEM
 #endif
 
 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 27d134e17d..329ed88c8b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1050,12 +1050,13 @@ set_null_conf(void)
  * attempt to reproduce what the postmaster will do when allocating a POSIX
  * segment in dsm_impl.c; if it doesn't work, we assume it won't work for
  * the postmaster either, and configure the cluster for System V shared
- * memory instead.
+ * memory instead. However, on Solaris-derived systems, System V shared
+ * memory is preferred.
  */
 static char *
 choose_dsm_implementation(void)
 {
-#ifdef HAVE_SHM_OPEN
+#if defined(HAVE_SHM_OPEN) && !defined(__sun)
 	int			ntries = 10;
 
 	while (ntries > 0)
diff --git a/src/include/storage/dsm_impl.h b/src/include/storage/dsm_impl.h
index ec05e22a6b..af0d32fcb5 100644
--- a/src/include/storage/dsm_impl.h
+++ b/src/include/storage/dsm_impl.h
@@ -23,20 +23,22 @@
 /*
  * Determine which dynamic shared memory implementations will be supported
  * on this platform, and which one will be the default.
+ * Note for systems which feature SHM_SHARE_MMU (e.g. illumos and Solaris)
+ * we prefer System V Shared Memory instead of POSIX Shared Memory.
  */
 #ifdef WIN32
-#define USE_DSM_WINDOWS
-#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE		DSM_IMPL_WINDOWS
-#else
-#ifdef HAVE_SHM_OPEN
-#define USE_DSM_POSIX
-#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE		DSM_IMPL_POSIX
-#endif
-#define USE_DSM_SYSV
-#ifndef DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE
-#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE		DSM_IMPL_SYSV
-#endif
-#define USE_DSM_MMAP
+#    define USE_DSM_WINDOWS
+#    define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE		DSM_IMPL_WINDOWS
+#  else
+#    ifdef HAVE_SHM_OPEN
+#      define USE_DSM_POSIX
+#      define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE	DSM_IMPL_POSIX
+#    endif
+#    define USE_DSM_SYSV
+#    if !defined(DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE) || defined(SHM_SHARE_MMU)
+#      define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE	DSM_IMPL_SYSV
+#    endif
+#    define USE_DSM_MMAP
 #endif
 
 /* GUC. */
#2Sean Chittenden
seanc@joyent.com
In reply to: Sean Chittenden (#1)

Hello.  We identified the same problem.  Sam Gwydir and Josh Clulow were able to put together the appropriate fix after.

The breakage in src/backend/port/sysv_shmem.c and src/include/storage/dsm_impl.h should be back ported to all supported versions (the regression was introduced between the 9.2 and 9.3 branches).

The only real question remaining is: do we want to change the default behavior, as detected by initdb(1), to use ISM on Illumos/Solaris derived systems?  Given the memory bloat experienced when using POSIX shared memory (potentially very significant for systems with larger shared_buffers), we think it's probably prudent to change the default from dynamic_shared_memory_type=posix to sysv.  Unfortunately I think it's not worth changing the default behavior of initdb(1) in the current supported branches, but it should be changed for 10.

Please let us know if there are any questions.  -sc

--
Sean Chittenden
seanc@joyent.com

#3Andres Freund
andres@anarazel.de
In reply to: Sean Chittenden (#2)
Re: Re: [PATCH] BUG #13416: Postgres >= 9.3 doesn't use optimized shared memory on Solaris anymore

Hi,

On 2017-10-04 10:47:06 -0700, Sean Chittenden wrote:

Hello. �We identified the same problem. �Sam Gwydir and Josh Clulow were able to put together the appropriate fix after.

The breakage in�src/backend/port/sysv_shmem.c�and�src/include/storage/dsm_impl.h�should be back ported to all supported versions (the regression was introduced between the 9.2 and 9.3 branches).

Personally I don't think "breakage" is quite the right work.

I also don't like that we're unconditionally not using
USE_ANONYMOUS_SHMEM - doesn't that run into similar config limits on
solaris based stuff as it does on linux etc?

I think if we want to do this, we should rather go with a patch like
/messages/by-id/20140422121921.GD4449@awork2.anarazel.de

Greetings,

Andres Freund

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Sean Chittenden
seanc@joyent.com
In reply to: Andres Freund (#3)
1 attachment(s)
Re: Re: [PATCH] BUG #13416: Postgres >= 9.3 doesn't use optimized shared memory on Solaris anymore

Fair enough.  We observed a ~4x amplification in memory usage so it was rather severe in our case.

The patch you referenced was a much nicer approach and Sam updated it to match that style (thank you Sam!).  We debated this internally and feel strongly that this should be exposed as a runtime GUC as suggested.  What's your take on the attached patch?

-sc

--
Sean Chittenden
seanc@joyent.com

Show quoted text

On Oct 4, 2017, 10:56 AM -0700, Andres Freund <andres@anarazel.de>, wrote:

Hi,

On 2017-10-04 10:47:06 -0700, Sean Chittenden wrote:

Hello.  We identified the same problem.  Sam Gwydir and Josh Clulow were able to put together the appropriate fix after.

The breakage in src/backend/port/sysv_shmem.c and src/include/storage/dsm_impl.h should be back ported to all supported versions (the regression was introduced between the 9.2 and 9.3 branches).

Personally I don't think "breakage" is quite the right work.

I also don't like that we're unconditionally not using
USE_ANONYMOUS_SHMEM - doesn't that run into similar config limits on
solaris based stuff as it does on linux etc?

I think if we want to do this, we should rather go with a patch like
/messages/by-id/20140422121921.GD4449@awork2.anarazel.de

Greetings,

Andres Freund

Attachments:

ism2.patchapplication/octet-streamDownload
diff --git c/src/backend/port/sysv_shmem.c w/src/backend/port/sysv_shmem.c
index 850eafddc3..bbc3cbbd2a 100644
--- c/src/backend/port/sysv_shmem.c
+++ w/src/backend/port/sysv_shmem.c
@@ -55,15 +55,20 @@
  * false, we might need to add compile and/or run-time tests here and do this
  * only if the running kernel supports it.
  *
- * However, we must always disable this logic in the EXEC_BACKEND case, and
+ * There are two cases in which we disable this logic. On systems that support
+ * intimate shared memory (e.g. illumos and Solaris), we prefer the usage of
+ * System V shared memory in order to take advantage of the performance
+ * improvements that sharing page tables provides.
+ *
+ * In addition, we must always disable this logic in the EXEC_BACKEND case, and
  * fall back to the old method of allocating the entire segment using System V
  * shared memory, because there's no way to attach an anonymous mmap'd segment
  * to a process after exec().  Since EXEC_BACKEND is intended only for
  * developer use, this shouldn't be a big problem.  Because of this, we do
  * not worry about supporting anonymous shmem in the EXEC_BACKEND cases below.
  */
-#ifndef EXEC_BACKEND
-#define USE_ANONYMOUS_SHMEM
+#if !defined(EXEC_BACKEND) && !defined(SHMEM_TYPE_SYSV)
+#  define USE_ANONYMOUS_SHMEM
 #endif
 
 
@@ -576,17 +581,23 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port,
 	Assert(size > MAXALIGN(sizeof(PGShmemHeader)));
 
 #ifdef USE_ANONYMOUS_SHMEM
-	AnonymousShmem = CreateAnonymousSegment(&size);
-	AnonymousShmemSize = size;
+	if (shared_memory_type == SHMEM_TYPE_MMAP)
+	{
+		AnonymousShmem = CreateAnonymousSegment(&size);
+		AnonymousShmemSize = size;
 
-	/* Register on-exit routine to unmap the anonymous segment */
-	on_shmem_exit(AnonymousShmemDetach, (Datum) 0);
+		/* Register on-exit routine to unmap the anonymous segment */
+		on_shmem_exit(AnonymousShmemDetach, (Datum) 0);
 
-	/* Now we need only allocate a minimal-sized SysV shmem block. */
-	sysvsize = sizeof(PGShmemHeader);
-#else
-	sysvsize = size;
+		/* Now we need only allocate a minimal-sized SysV shmem block. */
+		sysvsize = sizeof(PGShmemHeader);
+	}
+	else
 #endif
+	{
+		Assert(shared_memory_type == SHMEM_TYPE_SYSV);
+		sysvsize = size;
+	}
 
 	/* Make sure PGSharedMemoryAttach doesn't fail without need */
 	UsedShmemSegAddr = NULL;
diff --git c/src/backend/storage/ipc/ipci.c w/src/backend/storage/ipc/ipci.c
index c04b17fa8e..44c3a28aa2 100644
--- c/src/backend/storage/ipc/ipci.c
+++ w/src/backend/storage/ipc/ipci.c
@@ -45,6 +45,8 @@
 #include "storage/spin.h"
 #include "utils/snapmgr.h"
 
+/* GUCs */
+int	shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
 
 shmem_startup_hook_type shmem_startup_hook = NULL;
 
diff --git c/src/backend/utils/misc/guc.c w/src/backend/utils/misc/guc.c
index 4f1891f844..9527cc028a 100644
--- c/src/backend/utils/misc/guc.c
+++ w/src/backend/utils/misc/guc.c
@@ -393,6 +393,19 @@ static const struct config_enum_entry force_parallel_mode_options[] = {
 	{NULL, 0, false}
 };
 
+static struct config_enum_entry shared_memory_options[] = {
+#ifndef WIN32
+	{ "sysv", SHMEM_TYPE_SYSV, false},
+#endif
+#ifndef EXEC_BACKEND
+	{ "mmap", SHMEM_TYPE_MMAP, false},
+#endif
+#ifdef WIN32
+	{ "windows", SHMEM_TYPE_WINDOWS, false},
+#endif
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -3748,6 +3761,16 @@ static struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
+		{"shared_memory_type", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Selects the shared memory implementation used."),
+			NULL
+		},
+		&shared_memory_type,
+		DEFAULT_SHARED_MEMORY_TYPE, shared_memory_options,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"dynamic_shared_memory_type", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Selects the dynamic shared memory implementation used."),
 			NULL
diff --git c/src/backend/utils/misc/postgresql.conf.sample w/src/backend/utils/misc/postgresql.conf.sample
index fa6c0eaa80..8f59028ea2 100644
--- c/src/backend/utils/misc/postgresql.conf.sample
+++ w/src/backend/utils/misc/postgresql.conf.sample
@@ -124,6 +124,11 @@
 #replacement_sort_tuples = 150000	# limits use of replacement selection sort
 #autovacuum_work_mem = -1		# min 1MB, or -1 to use maintenance_work_mem
 #max_stack_depth = 2MB			# min 100kB
+#shared_memory_type = mmap		# the default is the first option
+					# supported by the operating system:
+					#   mmap
+					#   sysv
+					#   windows
 #dynamic_shared_memory_type = posix	# the default is the first option
 					# supported by the operating system:
 					#   posix
diff --git c/src/bin/initdb/initdb.c w/src/bin/initdb/initdb.c
index 27d134e17d..329ed88c8b 100644
--- c/src/bin/initdb/initdb.c
+++ w/src/bin/initdb/initdb.c
@@ -1050,12 +1050,13 @@ set_null_conf(void)
  * attempt to reproduce what the postmaster will do when allocating a POSIX
  * segment in dsm_impl.c; if it doesn't work, we assume it won't work for
  * the postmaster either, and configure the cluster for System V shared
- * memory instead.
+ * memory instead. However, on Solaris-derived systems, System V shared
+ * memory is preferred.
  */
 static char *
 choose_dsm_implementation(void)
 {
-#ifdef HAVE_SHM_OPEN
+#if defined(HAVE_SHM_OPEN) && !defined(__sun)
 	int			ntries = 10;
 
 	while (ntries > 0)
diff --git c/src/include/storage/pg_shmem.h w/src/include/storage/pg_shmem.h
index 6bf803ab88..761fd48df0 100644
--- c/src/include/storage/pg_shmem.h
+++ w/src/include/storage/pg_shmem.h
@@ -41,9 +41,18 @@ typedef struct PGShmemHeader	/* standard header for all Postgres shmem */
 #endif
 } PGShmemHeader;
 
-/* GUC variable */
+/* GUC variables */
+extern int	shared_memory_type;
 extern int	huge_pages;
 
+/* Possible values for shared_memory_type */
+typedef enum
+{
+	SHMEM_TYPE_WINDOWS,
+	SHMEM_TYPE_SYSV,
+	SHMEM_TYPE_MMAP
+}	PGShmemType;
+
 /* Possible values for huge_pages */
 typedef enum
 {
@@ -59,6 +68,14 @@ extern HANDLE UsedShmemSegID;
 #endif
 extern void *UsedShmemSegAddr;
 
+#if !defined(WIN32) && !defined(EXEC_BACKEND)
+#define DEFAULT_SHARED_MEMORY_TYPE SHMEM_TYPE_MMAP
+#elif !defined(WIN32)
+#define DEFAULT_SHARED_MEMORY_TYPE SHMEM_TYPE_SYSV
+#else
+#define DEFAULT_SHARED_MEMORY_TYPE SHMEM_TYPE_WINDOWS
+#endif
+
 #ifdef EXEC_BACKEND
 extern void PGSharedMemoryReAttach(void);
 extern void PGSharedMemoryNoReAttach(void);