Estimating HugePages Requirements?
Good day,
I'm trying to set up a chef recipe to reserve enough HugePages on a linux
system for our PG servers. A given VM will only host one PG cluster and
that will be the only thing on that host that uses HugePages. Blogs that
I've seen suggest that it would be as simple as taking the shared_buffers
setting and dividing that by 2MB (huge page size), however I found that I
needed some more.
In my test case, shared_buffers is set to 4003MB (calculated by chef) but
PG failed to start until I reserved a few hundred more MB. When I checked
VmPeak, it was 4321MB, so I ended up having to reserve over 2161 huge
pages, over a hundred more than I had originally thought.
I'm told other factors contribute to this additional memory requirement,
such as max_connections, wal_buffers, etc. I'm wondering if anyone has been
able to come up with a reliable method for determining the HugePages
requirements for a PG cluster based on the GUC values (that would be known
at deployment time).
Thanks,
Don.
--
Don Seiler
www.seiler.us
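A minimal C sketch of the arithmetic in the question, assuming 2MB huge pages. `hugepages_for_mb` and `compare_estimates` are hypothetical names and are not part of PostgreSQL. VmPeak covers the whole postmaster address space (binary, libraries, private memory), not just the shared memory segment, so it is an upper bound rather than an exact requirement.

```c
#include <stdio.h>

/* Assumption: 2MB huge pages, as in the question. */
#define HUGEPAGE_MB 2L

/* Hypothetical helper: huge pages needed to cover size_mb, rounded up,
 * because a partially used huge page still has to be reserved. */
long hugepages_for_mb(long size_mb)
{
    return (size_mb + HUGEPAGE_MB - 1) / HUGEPAGE_MB;
}

/* Print the naive estimate (shared_buffers only) next to the estimate
 * derived from the observed VmPeak, and the difference between them. */
void compare_estimates(long shared_buffers_mb, long vmpeak_mb)
{
    long naive = hugepages_for_mb(shared_buffers_mb);
    long observed = hugepages_for_mb(vmpeak_mb);

    printf("shared_buffers only: %ld pages\n", naive);
    printf("VmPeak-based:        %ld pages\n", observed);
    printf("difference:          %ld pages\n", observed - naive);
}
```

With the numbers from the question, `compare_estimates(4003, 4321)` reports 2002 pages for shared_buffers alone, 2161 pages based on VmPeak, and a difference of 159 pages.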
On Thu, Jun 10, 2021 at 12:42 AM Don Seiler <don@seiler.us> wrote:
I'm told other factors contribute to this additional memory requirement, such as max_connections, wal_buffers, etc. I'm wondering if anyone has been able to come up with a reliable method for determining the HugePages requirements for a PG cluster based on the GUC values (that would be known at deployment time).
It also depends on modules like pg_stat_statements and their own
configuration. I think that you can find the required size that your
current configuration will allocate with:
SELECT sum(allocated_size) FROM pg_shmem_allocations ;
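If the server is already running, the query's result (in bytes) can be turned into a page count. The sketch below assumes Linux and reads the default huge page size from the `Hugepagesize:` line of `/proc/meminfo` instead of hard-coding 2MB. All function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: parse a /proc/meminfo line such as
 * "Hugepagesize:       2048 kB" and return the size in kB, or -1. */
long parse_hugepagesize_kb(const char *line)
{
    long kb;

    if (sscanf(line, "Hugepagesize: %ld kB", &kb) == 1)
        return kb;
    return -1;
}

/* Scan /proc/meminfo (Linux only) for the default huge page size.
 * Returns -1 if the file or the line is missing. */
long read_hugepagesize_kb(void)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        if ((kb = parse_hugepagesize_kb(line)) > 0)
            break;
    fclose(f);
    return kb;
}

/* Convert the byte total returned by
 *   SELECT sum(allocated_size) FROM pg_shmem_allocations;
 * into huge pages, rounding up. */
long long hugepages_for_shmem(long long bytes, long hugepage_kb)
{
    long long page = (long long) hugepage_kb * 1024;

    return (bytes + page - 1) / page;
}
```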
Please ignore this if you have already read the blog post below; if not, at the end of it
there is a GitHub repo with memory specs for various TPC-C benchmarks.
Of course, your workload expectations may vary from the test scenarios used,
but just in case.
Settling the Myth of Transparent HugePages for Databases - Percona Database
Performance Blog
<https://www.percona.com/blog/2019/03/06/settling-the-myth-of-transparent-hugepages-for-databases/>
On Wed, Jun 9, 2021 at 1:45 PM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
That blog post is about transparent huge pages, which are different from the
HugePages I'm looking at here. We already disable THP as a matter of course.
--
Don Seiler
www.seiler.us
On Wed, Jun 9, 2021 at 01:52:19PM -0500, Don Seiler wrote:
This blog post talks about sizing huge pages too:
https://momjian.us/main/blogs/pgblog/2021.html#April_12_2021
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Wed, Jun 9, 2021 at 7:23 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I wonder how hard it would be to for example expose that through a
commandline switch or tool.
The point being that in order to run the query you suggest, the server
must already be running. There is no way to use this to estimate the
size that you're going to need after changing the value of
shared_buffers, which is a very common scenario. (You can change it,
restart without using huge pages because it fails, run that query,
change huge pages, and restart again -- but that's not exactly...
convenient)
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
I wonder how hard it would be to for example expose that through a
commandline switch or tool.
Just try to start the server and see if it complains.
For instance, with shared_buffers=10000000 I get
2021-06-09 15:08:56.821 EDT [1428121] FATAL: could not map anonymous shared memory: Cannot allocate memory
2021-06-09 15:08:56.821 EDT [1428121] HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 83720568832 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
Of course, if it *does* start, you can do the other thing.
Admittedly, we could make that easier somehow; but if it took
25 years for somebody to ask for this, I'm not sure it's
worth creating a feature to make it a shade easier.
regards, tom lane
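The HINT above reports the full request size in bytes, so a deployment script could extract it and convert it to huge pages. This is a minimal sketch that assumes the English message wording shown (server messages are translated when lc_messages is not English, so this is fragile) and 2MB pages. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the byte count that follows
 * "(currently " in the HINT logged when the shared memory request
 * cannot be satisfied. Returns -1 if it is not found. */
long long request_bytes_from_hint(const char *hint)
{
    const char *marker = "(currently ";
    const char *p = strstr(hint, marker);
    long long bytes;

    if (p == NULL)
        return -1;
    if (sscanf(p + strlen(marker), "%lld bytes", &bytes) != 1)
        return -1;
    return bytes;
}

/* Round a byte count up to whole huge pages of hugepage_bytes each. */
long long hugepages_for_bytes(long long bytes, long long hugepage_bytes)
{
    return (bytes + hugepage_bytes - 1) / hugepage_bytes;
}
```

For the HINT shown above, `request_bytes_from_hint` yields 83720568832, and `hugepages_for_bytes(83720568832, 2097152)` yields 39922 pages.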
On Wed, Jun 9, 2021 at 9:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Well, I have to *stop* the existing one first, most likely, otherwise
there won't be enough huge pages (or indeed memory) available. And if it
then doesn't start, you're looking at extended downtime.
You can automate this to minimize it (set the value in the conf, stop
old, start new, if new doesn't start then stop new, reconfigure, start
old again), but it's *far* from friendly.
This process works when you're setting up a brand new server with
nobody using it. It doesn't work well, or at all, when you actually
have active users on it.
Admittedly, we could make that easier somehow; but if it took
25 years for somebody to ask for this, I'm not sure it's
worth creating a feature to make it a shade easier.
We haven't had huge page support for 25 years, "only" since 9.4, so
about 7 years.
And for every year that passes, huge pages become more interesting in
that in general memory sizes increase so the payoff of using them is
increased.
Using huge pages *should* be a trivial improvement to set up. But it's
in my experience complicated enough that many just skip it simply for
that reason.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
On Wed, Jun 9, 2021 at 9:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Just try to start the server and see if it complains.
Well, I have to *stop* the existing one first, most likely, otherwise
there won't be enough huge pages (or indeed memory) available.
I'm not following. If you have a production server running, its
pg_shmem_allocations total should already be a pretty good guide
to what you need to configure HugePages for. You need to know to
round that up, of course --- but if you aren't building a lot of
slop into the HugePages configuration anyway, you'll get burned
down the road.
regards, tom lane
On Wed, Jun 9, 2021 at 9:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm talking about the case when you want to *change* the value for
shared_buffers (or other parameters that would change the amount of
required huge pages), on a system where you're using huge pages.
pg_shmem_allocations will tell you what you need with the current
value, not what you need with the new value.
But yes, you can do some math around it and make a well educated
guess. But it would be very convenient to have the system able to do
that for you.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
moving to pgsql-hackers@
On 6/9/21, 9:41 AM, "Don Seiler" <don@seiler.us> wrote:
In RDS, we've added a pg_ctl option that returns the amount of shared
memory required. Basically, we start up postmaster just enough to get
an accurate value from CreateSharedMemoryAndSemaphores() and then shut
down. The patch is quite battle-tested at this point (we first
started using it in 2017, and we've been enabling huge pages by
default since v10). I'd be happy to clean it up and submit it for
discussion in pgsql-hackers@ if there is interest.
Nathan
On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
I'd be happy to clean it up and submit it for
discussion in pgsql-hackers@ if there is interest.
Yes, I'd like to see it. Thanks for offering.
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I agree, it's confusing for many, and that confusion arises from the fact
that you usually talk of shared_buffers in MB or GB whereas hugepages have
to be configured in units of 2MB. But once they understand, they realize it's
pretty simple.
Don, we have experienced the same not just with postgres but also with
Oracle. I haven't been able to get to the root of it, but what we usually do
is add another 100-200 pages, and that works for us. If the SGA or
shared_buffers is high, e.g. 96GB, then we add 250-500 pages. Those few
hundred MBs may be wasted (because the moment you configure hugepages, the
operating system considers them used and does not use them any more), but
nowadays servers easily have 64 or 128GB of RAM, and wasting 500MB to
1GB does not really hurt.
HTH
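A sketch of the padding rule of thumb described above, assuming 2MB huge pages. The 96GB threshold and taking the upper end of each quoted range (200 and 500 pages) are illustrative choices, not a tested policy, and `padded_hugepages` is a hypothetical name.

```c
/* Base requirement from shared_buffers alone, plus a fixed pad.
 * Assumptions: 2MB pages; pad 500 pages at or above 96GB, else 200. */
long padded_hugepages(long shared_buffers_mb)
{
    long base = (shared_buffers_mb + 1) / 2;   /* ceil(MB / 2MB) */
    long pad = shared_buffers_mb >= 96L * 1024 ? 500 : 200;

    return base + pad;
}
```

For shared_buffers = 4003MB this gives 2002 + 200 = 2202 pages. A flat 512MB pad, as Don suggests later in the thread, would instead be 256 pages.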
On 6/9/21, 3:51 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
Here's the general idea. It still needs a bit of polishing, but I'm
hoping this is enough to spark some discussion on the approach.
Nathan
Attachments:
v1-0001-add-pg_ctl-option-for-retreiving-shmem-size.patch
From a3f9d75237c14326cc0f96506f26f887d142e633 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Thu, 10 Jun 2021 03:00:49 +0000
Subject: [PATCH v1 1/1] add pg_ctl option for retreiving shmem size
---
src/backend/postmaster/postmaster.c | 31 ++++++++++++++++++++++++-------
src/backend/storage/ipc/ipci.c | 9 +++++++--
src/backend/tcop/postgres.c | 6 +++++-
src/backend/utils/init/postinit.c | 2 +-
src/bin/pg_ctl/pg_ctl.c | 23 ++++++++++++++++++++++-
src/include/storage/ipc.h | 2 +-
6 files changed, 60 insertions(+), 13 deletions(-)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5a050898fe..b2fc67f4a2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -586,6 +586,7 @@ PostmasterMain(int argc, char *argv[])
bool listen_addr_saved = false;
int i;
char *output_config_variable = NULL;
+ bool output_shmem = false;
InitProcessGlobals();
@@ -702,7 +703,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bc:C:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bc:C:D:d:EeFf:h:ijk:lMN:nOPp:r:S:sTt:W:-:")) != -1)
{
switch (opt)
{
@@ -768,6 +769,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("ssl", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'M':
+ output_shmem = true;
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -1019,6 +1024,18 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+ if (output_shmem)
+ {
+ char output[64];
+ Size size;
+
+ size = CreateSharedMemoryAndSemaphores(true);
+ sprintf(output, "%zu", size);
+
+ puts(output);
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
@@ -2660,7 +2677,7 @@ reset_shared(void)
* (if using SysV shmem and/or semas). This helps ensure that we will
* clean up dead IPC objects if the postmaster crashes and is restarted.
*/
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
}
@@ -5014,7 +5031,7 @@ SubPostmasterMain(int argc, char *argv[])
InitProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
/* And run the backend */
BackendRun(&port); /* does not return */
@@ -5028,7 +5045,7 @@ SubPostmasterMain(int argc, char *argv[])
InitAuxiliaryProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
AuxiliaryProcessMain(argc - 2, argv + 2); /* does not return */
}
@@ -5041,7 +5058,7 @@ SubPostmasterMain(int argc, char *argv[])
InitProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
AutoVacLauncherMain(argc - 2, argv + 2); /* does not return */
}
@@ -5054,7 +5071,7 @@ SubPostmasterMain(int argc, char *argv[])
InitProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
AutoVacWorkerMain(argc - 2, argv + 2); /* does not return */
}
@@ -5072,7 +5089,7 @@ SubPostmasterMain(int argc, char *argv[])
InitProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
/* Fetch MyBgworkerEntry from shared memory */
shmem_slot = atoi(argv[1] + 15);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..0202e59748 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -91,8 +91,8 @@ RequestAddinShmemSpace(Size size)
* check IsUnderPostmaster, rather than EXEC_BACKEND, to detect this case.
* This is a bit code-wasteful and could be cleaned up.)
*/
-void
-CreateSharedMemoryAndSemaphores(void)
+Size
+CreateSharedMemoryAndSemaphores(bool size_only)
{
PGShmemHeader *shim = NULL;
@@ -161,6 +161,9 @@ CreateSharedMemoryAndSemaphores(void)
/* might as well round it off to a multiple of a typical page size */
size = add_size(size, 8192 - (size % 8192));
+ if (size_only)
+ return size;
+
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
@@ -288,4 +291,6 @@ CreateSharedMemoryAndSemaphores(void)
*/
if (shmem_startup_hook)
shmem_startup_hook();
+
+ return 0;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..1ef3fdf8d1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3711,7 +3711,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bc:C:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bc:C:D:d:EeFf:h:ijk:lMN:nOPp:r:S:sTt:v:W:-:")) != -1)
{
switch (flag)
{
@@ -3777,6 +3777,10 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'M':
+ /* ignored for consistency with postmaster */
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 51d1bbef30..7c329009d2 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -437,7 +437,7 @@ InitCommunication(void)
* We're running a postgres bootstrap process or a standalone backend,
* so we need to set up shmem.
*/
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
}
}
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 7985da0a94..9678185930 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -67,7 +67,8 @@ typedef enum
KILL_COMMAND,
REGISTER_COMMAND,
UNREGISTER_COMMAND,
- RUN_AS_SERVICE_COMMAND
+ RUN_AS_SERVICE_COMMAND,
+ OUTPUT_SHARED_MEMORY_COMMAND
} CtlCommand;
#define DEFAULT_WAIT 60
@@ -898,6 +899,9 @@ do_start(void)
pm_pid = start_postmaster();
+ if (ctl_command == OUTPUT_SHARED_MEMORY_COMMAND)
+ return;
+
if (do_wait)
{
/*
@@ -2469,6 +2473,20 @@ main(int argc, char **argv)
else if (strcmp(argv[optind], "runservice") == 0)
ctl_command = RUN_AS_SERVICE_COMMAND;
#endif
+ else if (strcmp(argv[optind], "output_shared_memory") == 0)
+ {
+ ctl_command = OUTPUT_SHARED_MEMORY_COMMAND;
+
+ if (!post_opts)
+ post_opts = pstrdup("-M");
+ else
+ {
+ char *old_post_opts = post_opts;
+
+ post_opts = psprintf("%s %s", old_post_opts, "-M");
+ free(old_post_opts);
+ }
+ }
else
{
write_stderr(_("%s: unrecognized operation mode \"%s\"\n"), progname, argv[optind]);
@@ -2572,6 +2590,9 @@ main(int argc, char **argv)
pgwin32_doRunAsService();
break;
#endif
+ case OUTPUT_SHARED_MEMORY_COMMAND:
+ do_start();
+ break;
default:
break;
}
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..cdb1dd266d 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,6 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
-extern void CreateSharedMemoryAndSemaphores(void);
+extern Size CreateSharedMemoryAndSemaphores(bool size_only);
#endif /* IPC_H */
--
2.16.6
On Wed, Jun 9, 2021, 21:03 P C <puravc@gmail.com> wrote:
I don't have a problem with the math; I just wanted to know if it was
possible to better estimate what the actual requirements would be at
deployment time. My fallback will probably be what you did: just pad with an
extra 512MB by default.
Don.
On Wed, Jun 09, 2021 at 10:55:08PM -0500, Don Seiler wrote:
It's because the huge allocation isn't just shared_buffers, but also
wal_buffers:
| The amount of shared memory used for WAL data that has not yet been written to disk.
| The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, ...
... and other stuff:
src/backend/storage/ipc/ipci.c
/*
* Size of the Postgres shared-memory block is estimated via
* moderately-accurate estimates for the big hogs, plus 100K for the
* stuff that's too small to bother with estimating.
*
* We take some care during this phase to ensure that the total size
* request doesn't overflow size_t. If this gets through, we don't
* need to be so careful during the actual allocation phase.
*/
size = 100000;
size = add_size(size, PGSemaphoreShmemSize(numSemas));
size = add_size(size, SpinlockSemaSize());
size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
sizeof(ShmemIndexEnt)));
size = add_size(size, dsm_estimate_size());
size = add_size(size, BufferShmemSize());
size = add_size(size, LockShmemSize());
size = add_size(size, PredicateLockShmemSize());
size = add_size(size, ProcGlobalShmemSize());
size = add_size(size, XLOGShmemSize());
size = add_size(size, CLOGShmemSize());
size = add_size(size, CommitTsShmemSize());
size = add_size(size, SUBTRANSShmemSize());
size = add_size(size, TwoPhaseShmemSize());
size = add_size(size, BackgroundWorkerShmemSize());
size = add_size(size, MultiXactShmemSize());
size = add_size(size, LWLockShmemSize());
size = add_size(size, ProcArrayShmemSize());
size = add_size(size, BackendStatusShmemSize());
size = add_size(size, SInvalShmemSize());
size = add_size(size, PMSignalShmemSize());
size = add_size(size, ProcSignalShmemSize());
size = add_size(size, CheckpointerShmemSize());
size = add_size(size, AutoVacuumShmemSize());
size = add_size(size, ReplicationSlotsShmemSize());
size = add_size(size, ReplicationOriginShmemSize());
size = add_size(size, WalSndShmemSize());
size = add_size(size, WalRcvShmemSize());
size = add_size(size, PgArchShmemSize());
size = add_size(size, ApplyLauncherShmemSize());
size = add_size(size, SnapMgrShmemSize());
size = add_size(size, BTreeShmemSize());
size = add_size(size, SyncScanShmemSize());
size = add_size(size, AsyncShmemSize());
#ifdef EXEC_BACKEND
size = add_size(size, ShmemBackendArraySize());
#endif
/* freeze the addin request size and include it */
addin_request_allowed = false;
size = add_size(size, total_addin_request);
/* might as well round it off to a multiple of a typical page size */
size = add_size(size, 8192 - (size % 8192));
BTW, I think it'd be nice if this were a NOTICE:
| elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m", allocsize);
--
Justin
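A simplified model of the excerpt's final steps, as a sketch. The per-subsystem estimates (BufferShmemSize(), XLOGShmemSize(), and so on) are taken as inputs rather than computed, because the real functions depend on GUCs such as shared_buffers and max_connections. Add-in requests (e.g. from pg_stat_statements) are also an input, and add_size()'s overflow check is omitted. Note that the rounding line adds a full 8192 bytes even when the size is already a multiple of 8192. The function names are hypothetical.

```c
#include <stddef.h>

/* Mirrors: size = add_size(size, 8192 - (size % 8192)); */
size_t round_like_ipci(size_t size)
{
    return size + (8192 - (size % 8192));
}

/* 100000-byte allowance for small structures, plus the component
 * estimates, plus add-in requests, then rounded as in ipci.c. */
size_t estimate_request(const size_t *components, size_t ncomponents,
                        size_t addin_request)
{
    size_t size = 100000;

    for (size_t i = 0; i < ncomponents; i++)
        size += components[i];
    size += addin_request;
    return round_like_ipci(size);
}

/* Huge pages needed to back the request, rounded up. */
size_t hugepages_needed(size_t request, size_t hugepage_size)
{
    return (request + hugepage_size - 1) / hugepage_size;
}
```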
On Thu, Jun 10, 2021 at 7:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Great detail. I did some trial and error around just a few variables
(shared_buffers, wal_buffers, max_connections) and came up with a formula
that seems to be "good enough" for at least a rough default estimate.
The pseudo-code is basically:
ceiling((shared_buffers + 200 + (25 * shared_buffers/1024) +
10*(max_connections-100)/200 + wal_buffers-16)/2)
This assumes that all values are in MB and, obviously, that wal_buffers is set
to a value other than the default of -1. I decided to default wal_buffers to
16MB in our environments, since that's what -1 should resolve to, based on the
description in the documentation, for an instance with shared_buffers of the
sizes in our deployments.
This formula did come up a little short (2MB) when I had a low
shared_buffers value at 2GB. Raising that starting 200 value to something
like 250 would take care of that. The limited testing I did based on
different values we see across our production deployments worked otherwise.
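Don's pseudo-code, transcribed into C as a sketch. It uses floating-point arithmetic (the pseudo-code doesn't say whether the divisions are integer divisions) and exposes the 200MB constant as a parameter so that the suggested 250 can be tried. It also adds a hypothetical helper that resolves wal_buffers = -1 using the documented auto-tuning rule: 1/32 of shared_buffers, at least 64kB and at most one WAL segment (a 16MB segment is assumed here). The formula is empirical and not derived from PostgreSQL's code.

```c
/* All values in MB. base_mb is 200 in the original formula:
 *   ceiling((shared_buffers + 200 + 25*shared_buffers/1024
 *            + 10*(max_connections-100)/200 + wal_buffers-16) / 2) */
long don_estimate_hugepages(double shared_buffers_mb, int max_connections,
                            double wal_buffers_mb, double base_mb)
{
    double mb = shared_buffers_mb
              + base_mb
              + 25.0 * shared_buffers_mb / 1024.0
              + 10.0 * (max_connections - 100) / 200.0
              + (wal_buffers_mb - 16.0);
    double pages = mb / 2.0;                /* 2MB huge pages */
    long whole = (long) pages;

    if (pages > (double) whole)             /* ceiling without libm */
        whole++;
    return whole;
}

/* wal_buffers = -1 auto-tunes to 1/32 of shared_buffers, clamped to
 * [64kB, one WAL segment]; a 16MB segment size is assumed. */
double resolve_wal_buffers_mb(double wal_buffers_mb, double shared_buffers_mb)
{
    double auto_mb;

    if (wal_buffers_mb >= 0)
        return wal_buffers_mb;
    auto_mb = shared_buffers_mb / 32.0;
    if (auto_mb < 64.0 / 1024.0)
        auto_mb = 64.0 / 1024.0;
    if (auto_mb > 16.0)
        auto_mb = 16.0;
    return auto_mb;
}
```

For shared_buffers = 4003MB, max_connections = 100 and wal_buffers = 16MB, `don_estimate_hugepages(4003, 100, 16, 200)` returns 2151.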
Please let me know what you folks think. I know I'm ignoring a lot of other
factors, especially given what Justin recently shared.
The remaining trick for me now is to calculate this in Chef, since the
shared_buffers and wal_buffers attributes are strings with the unit ("MB")
in them rather than just numerical values. I'm thinking of changing that
attribute to be just the number and assuming/requiring MB to make the
calculations easier.
--
Don Seiler
www.seiler.us
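Chef recipes are written in Ruby, so this C version only illustrates the parsing rules. The sketch handles the kB, MB, GB and TB units, which PostgreSQL treats as multiples of 1024. For shared_buffers and wal_buffers a bare number means 8kB blocks, so the sketch rejects unitless values rather than guess. The special value -1 (wal_buffers auto-tuning) is passed through. `setting_to_mb` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Convert a setting such as "4003MB", "4GB" or "512kB" to MB.
 * Returns 0 on success and -1 for unitless or unrecognised input. */
int setting_to_mb(const char *value, double *mb)
{
    char *end;
    long long n = strtoll(value, &end, 10);

    if (end == value)
        return -1;                      /* no number at all */
    while (*end == ' ')
        end++;                          /* tolerate "16 MB" */
    if (n == -1 && *end == '\0') {
        *mb = -1;                       /* e.g. wal_buffers = -1 (auto) */
        return 0;
    }
    if (n < 0)
        return -1;
    if (strcmp(end, "kB") == 0)
        *mb = n / 1024.0;
    else if (strcmp(end, "MB") == 0)
        *mb = (double) n;
    else if (strcmp(end, "GB") == 0)
        *mb = n * 1024.0;
    else if (strcmp(end, "TB") == 0)
        *mb = n * 1024.0 * 1024.0;
    else
        return -1;                      /* unitless (8kB blocks) or unknown */
    return 0;
}
```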
On 6/9/21, 8:09 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
On 6/9/21, 3:51 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
I'd be happy to clean it up and submit it for
discussion in pgsql-hackers@ if there is interest.
Yes, I'd like to see it. Thanks for offering.
Here's the general idea. It still needs a bit of polishing, but I'm
hoping this is enough to spark some discussion on the approach.
Here's a rebased version of the patch.
Nathan
Attachments:
v2-0001-add-pg_ctl-option-for-retreiving-shmem-size.patch (application/octet-stream)
From f17f5862c2dd5c01a41143eb4d9d7eafae83618f Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 9 Aug 2021 22:48:48 +0000
Subject: [PATCH v2 1/1] add pg_ctl option for retreiving shmem size
---
src/backend/bootstrap/bootstrap.c | 2 +-
src/backend/postmaster/postmaster.c | 31 ++++++++++++++++++++++++-------
src/backend/storage/ipc/ipci.c | 9 +++++++--
src/backend/tcop/postgres.c | 8 ++++++--
src/bin/pg_ctl/pg_ctl.c | 23 ++++++++++++++++++++++-
src/include/storage/ipc.h | 2 +-
6 files changed, 61 insertions(+), 14 deletions(-)
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 48615c0ebc..9e57591add 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -324,7 +324,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
InitializeMaxBackends();
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
/*
* XXX: It might make sense to move this into its own function at some
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index fc0bc8d99e..5227ce372f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -586,6 +586,7 @@ PostmasterMain(int argc, char *argv[])
bool listen_addr_saved = false;
int i;
char *output_config_variable = NULL;
+ bool output_shmem = false;
InitProcessGlobals();
@@ -709,7 +710,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bc:C:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bc:C:D:d:EeFf:h:ijk:lMN:nOPp:r:S:sTt:W:-:")) != -1)
{
switch (opt)
{
@@ -775,6 +776,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("ssl", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'M':
+ output_shmem = true;
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -1026,6 +1031,18 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+ if (output_shmem)
+ {
+ char output[64];
+ Size size;
+
+ size = CreateSharedMemoryAndSemaphores(true);
+ sprintf(output, "%zu", size);
+
+ puts(output);
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
@@ -2673,7 +2690,7 @@ reset_shared(void)
* (if using SysV shmem and/or semas). This helps ensure that we will
* clean up dead IPC objects if the postmaster crashes and is restarted.
*/
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
}
@@ -5017,7 +5034,7 @@ SubPostmasterMain(int argc, char *argv[])
InitProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
/* And run the backend */
BackendRun(&port); /* does not return */
@@ -5035,7 +5052,7 @@ SubPostmasterMain(int argc, char *argv[])
InitAuxiliaryProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
auxtype = atoi(argv[3]);
AuxiliaryProcessMain(auxtype); /* does not return */
@@ -5049,7 +5066,7 @@ SubPostmasterMain(int argc, char *argv[])
InitProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
AutoVacLauncherMain(argc - 2, argv + 2); /* does not return */
}
@@ -5062,7 +5079,7 @@ SubPostmasterMain(int argc, char *argv[])
InitProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
AutoVacWorkerMain(argc - 2, argv + 2); /* does not return */
}
@@ -5080,7 +5097,7 @@ SubPostmasterMain(int argc, char *argv[])
InitProcess();
/* Attach process to shared data structures */
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
/* Fetch MyBgworkerEntry from shared memory */
shmem_slot = atoi(argv[1] + 15);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..0202e59748 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -91,8 +91,8 @@ RequestAddinShmemSpace(Size size)
* check IsUnderPostmaster, rather than EXEC_BACKEND, to detect this case.
* This is a bit code-wasteful and could be cleaned up.)
*/
-void
-CreateSharedMemoryAndSemaphores(void)
+Size
+CreateSharedMemoryAndSemaphores(bool size_only)
{
PGShmemHeader *shim = NULL;
@@ -161,6 +161,9 @@ CreateSharedMemoryAndSemaphores(void)
/* might as well round it off to a multiple of a typical page size */
size = add_size(size, 8192 - (size % 8192));
+ if (size_only)
+ return size;
+
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
@@ -288,4 +291,6 @@ CreateSharedMemoryAndSemaphores(void)
*/
if (shmem_startup_hook)
shmem_startup_hook();
+
+ return 0;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 58b5960e27..eee4307fec 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3711,7 +3711,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bc:C:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bc:C:D:d:EeFf:h:ijk:lMN:nOPp:r:S:sTt:v:W:-:")) != -1)
{
switch (flag)
{
@@ -3777,6 +3777,10 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'M':
+ /* ignored for consistency with postmaster */
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
@@ -4051,7 +4055,7 @@ PostgresMain(int argc, char *argv[],
/* Initialize MaxBackends (if under postmaster, was done already) */
InitializeMaxBackends();
- CreateSharedMemoryAndSemaphores();
+ (void) CreateSharedMemoryAndSemaphores(false);
}
/*
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 7985da0a94..9678185930 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -67,7 +67,8 @@ typedef enum
KILL_COMMAND,
REGISTER_COMMAND,
UNREGISTER_COMMAND,
- RUN_AS_SERVICE_COMMAND
+ RUN_AS_SERVICE_COMMAND,
+ OUTPUT_SHARED_MEMORY_COMMAND
} CtlCommand;
#define DEFAULT_WAIT 60
@@ -898,6 +899,9 @@ do_start(void)
pm_pid = start_postmaster();
+ if (ctl_command == OUTPUT_SHARED_MEMORY_COMMAND)
+ return;
+
if (do_wait)
{
/*
@@ -2469,6 +2473,20 @@ main(int argc, char **argv)
else if (strcmp(argv[optind], "runservice") == 0)
ctl_command = RUN_AS_SERVICE_COMMAND;
#endif
+ else if (strcmp(argv[optind], "output_shared_memory") == 0)
+ {
+ ctl_command = OUTPUT_SHARED_MEMORY_COMMAND;
+
+ if (!post_opts)
+ post_opts = pstrdup("-M");
+ else
+ {
+ char *old_post_opts = post_opts;
+
+ post_opts = psprintf("%s %s", old_post_opts, "-M");
+ free(old_post_opts);
+ }
+ }
else
{
write_stderr(_("%s: unrecognized operation mode \"%s\"\n"), progname, argv[optind]);
@@ -2572,6 +2590,9 @@ main(int argc, char **argv)
pgwin32_doRunAsService();
break;
#endif
+ case OUTPUT_SHARED_MEMORY_COMMAND:
+ do_start();
+ break;
default:
break;
}
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..cdb1dd266d 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,6 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
-extern void CreateSharedMemoryAndSemaphores(void);
+extern Size CreateSharedMemoryAndSemaphores(bool size_only);
#endif /* IPC_H */
--
2.16.6
On Mon, Aug 9, 2021 at 3:57 PM Bossart, Nathan <bossartn@amazon.com> wrote:
On 6/9/21, 8:09 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
On 6/9/21, 3:51 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
On Jun 9, 2021, at 1:52 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
I'd be happy to clean it up and submit it for
discussion in pgsql-hackers@ if there is interest.
Yes, I'd like to see it. Thanks for offering.
Here's the general idea. It still needs a bit of polishing, but I'm
hoping this is enough to spark some discussion on the approach.
Here's a rebased version of the patch.
Nathan
Hi,
-extern void CreateSharedMemoryAndSemaphores(void);
+extern Size CreateSharedMemoryAndSemaphores(bool size_only);
Should the parameter be an enum or bitmask, so that future additions would
not change the method signature?
Cheers
On 8/9/21, 4:05 PM, "Zhihong Yu" <zyu@yugabyte.com> wrote:
-extern void CreateSharedMemoryAndSemaphores(void);
+extern Size CreateSharedMemoryAndSemaphores(bool size_only);
Should the parameter be an enum or bitmask, so that future additions would not change the method signature?
I don't have a strong opinion about this. I don't feel that it's
really necessary, but if reviewers want a bitmask instead, I can
change it.
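For comparison, Zhihong's suggestion might look roughly like the following. Everything here is hypothetical. The flag names, the stand-in size function, and the function name are invented for illustration and do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t Size;

/* Hypothetical flag bits; PostgreSQL does not define these. */
#define SHMEM_FLAG_NONE       0x0u
#define SHMEM_FLAG_SIZE_ONLY  0x1u   /* compute the size, allocate nothing */

/* Stand-in for the real size calculation. */
static Size compute_shmem_size(void)
{
    return (Size) 4 * 8192;
}

/*
 * Hypothetical flags-based variant of CreateSharedMemoryAndSemaphores():
 * new behaviors become new bits instead of new parameters.
 */
static Size create_shmem_with_flags(unsigned int flags)
{
    Size size;

    assert((flags & ~SHMEM_FLAG_SIZE_ONLY) == 0);  /* reject unknown bits */
    size = compute_shmem_size();

    if (flags & SHMEM_FLAG_SIZE_ONLY)
        return size;

    /* ... the real function would create the segment and semaphores here ... */
    return 0;
}
```

The trade-off is mostly stylistic. A single bool is simpler to read at call sites, and a flags word only pays off if more modes are actually expected.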
Nathan
On Thu, Jun 10, 2021 at 07:23:33PM -0500, Justin Pryzby wrote:
On Wed, Jun 09, 2021 at 10:55:08PM -0500, Don Seiler wrote:
On Wed, Jun 9, 2021, 21:03 P C <puravc@gmail.com> wrote:
I agree, it's confusing for many, and that confusion arises from the fact
that you usually talk of shared_buffers in MB or GB whereas hugepages have
to be configured in units of 2MB. But once they understand, they realize it's
pretty simple.
Don, we have experienced the same not just with postgres but also with
Oracle. I haven't been able to get to the root of it, but what we usually do
is add another 100-200 pages, and that works for us. If the SGA or
shared_buffers is high, e.g. 96GB, then we add 250-500 pages. Those few
hundred MBs may be wasted (because the moment you configure hugepages, the
operating system considers that memory used and no longer uses it for anything else), but
nowadays servers easily have 64 or 128GB of RAM, and wasting that 500MB to
1GB does not really hurt.
I don't have a problem with the math, just wanted to know if it was
possible to better estimate what the actual requirements would be at
deployment time. My fallback will probably be what you did: just pad with an
extra 512MB by default.
It's because the huge allocation isn't just shared_buffers, but also
wal_buffers:
| The amount of shared memory used for WAL data that has not yet been written to disk.
| The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, ...
...and other stuff:
I wonder if this shouldn't be solved the other way around:
Define shared_buffers as the exact size to be allocated/requested from the OS
(regardless of whether they're huge pages or not), and have postgres compute
everything else based on that. So shared_buffers=2GB would end up being 1950MB
(or so) of buffer cache. We'd have to check that after the other allocations,
there's still at least 128kB left for the buffer cache. Maybe we'd have to
bump the minimum value of shared_buffers.
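The arithmetic behind this proposal can be sketched under a few assumptions. The helper and constant names are hypothetical, and an 8kB block size is assumed. The 98MB of "other" shared memory used below is only the figure implied by the 2GB-to-1950MB example, not a measured value.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE        ((size_t) 8192)        /* assumed default BLCKSZ */
#define MIN_BUFFER_CACHE  ((size_t) 128 * 1024)  /* 128kB floor from the proposal */

/*
 * Hypothetical sketch: treat the configured value as the exact total request
 * and give the buffer cache whatever remains after the other allocations,
 * rounded down to whole blocks. Returns 0 if less than 128kB would remain.
 */
static size_t buffer_cache_bytes(size_t total_request, size_t other_shmem)
{
    size_t remaining;

    if (total_request <= other_shmem)
        return 0;
    remaining = total_request - other_shmem;
    remaining -= remaining % BLOCK_SIZE;     /* whole buffers only */
    assert(remaining % BLOCK_SIZE == 0);
    return remaining >= MIN_BUFFER_CACHE ? remaining : 0;
}
```

With a 2GB total request and 98MB of other allocations, this yields 1950MB of buffer cache.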
src/backend/storage/ipc/ipci.c:
/*
* Size of the Postgres shared-memory block is estimated via
* moderately-accurate estimates for the big hogs, plus 100K for the
* stuff that's too small to bother with estimating.
*
* We take some care during this phase to ensure that the total size
* request doesn't overflow size_t. If this gets through, we don't
* need to be so careful during the actual allocation phase.
*/
size = 100000;
size = add_size(size, PGSemaphoreShmemSize(numSemas));
size = add_size(size, SpinlockSemaSize());
size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
sizeof(ShmemIndexEnt)));
size = add_size(size, dsm_estimate_size());
size = add_size(size, BufferShmemSize());
size = add_size(size, LockShmemSize());
size = add_size(size, PredicateLockShmemSize());
size = add_size(size, ProcGlobalShmemSize());
size = add_size(size, XLOGShmemSize());
size = add_size(size, CLOGShmemSize());
size = add_size(size, CommitTsShmemSize());
size = add_size(size, SUBTRANSShmemSize());
size = add_size(size, TwoPhaseShmemSize());
size = add_size(size, BackgroundWorkerShmemSize());
size = add_size(size, MultiXactShmemSize());
size = add_size(size, LWLockShmemSize());
size = add_size(size, ProcArrayShmemSize());
size = add_size(size, BackendStatusShmemSize());
size = add_size(size, SInvalShmemSize());
size = add_size(size, PMSignalShmemSize());
size = add_size(size, ProcSignalShmemSize());
size = add_size(size, CheckpointerShmemSize());
size = add_size(size, AutoVacuumShmemSize());
size = add_size(size, ReplicationSlotsShmemSize());
size = add_size(size, ReplicationOriginShmemSize());
size = add_size(size, WalSndShmemSize());
size = add_size(size, WalRcvShmemSize());
size = add_size(size, PgArchShmemSize());
size = add_size(size, ApplyLauncherShmemSize());
size = add_size(size, SnapMgrShmemSize());
size = add_size(size, BTreeShmemSize());
size = add_size(size, SyncScanShmemSize());
size = add_size(size, AsyncShmemSize());
#ifdef EXEC_BACKEND
size = add_size(size, ShmemBackendArraySize());
#endif
/* freeze the addin request size and include it */
addin_request_allowed = false;
size = add_size(size, total_addin_request);
/* might as well round it off to a multiple of a typical page size */
size = add_size(size, 8192 - (size % 8192));
BTW, I think it'd be nice if this were a NOTICE:
| elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m", allocsize);
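The two size-handling steps in the ipci.c listing quoted above are the overflow-checked add_size() calls and the final rounding. Both can be illustrated with a standalone sketch. The helper names are hypothetical. Where PostgreSQL's add_size() raises an ERROR on overflow, this version signals failure through a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Overflow-checked addition in the spirit of add_size(). Unsigned
 * arithmetic wraps on overflow, so a result smaller than an operand
 * means the sum did not fit in size_t.
 */
static size_t checked_add(size_t a, size_t b, bool *ok)
{
    size_t result = a + b;

    if (result < a)
    {
        *ok = false;
        return 0;
    }
    return result;
}

/* The final step from the listing: always adds 8192 - (size % 8192). */
static size_t pad_to_next_8k(size_t size, bool *ok)
{
    assert(ok != NULL);
    return checked_add(size, 8192 - (size % 8192), ok);
}
```

A size that is already a multiple of 8192 still gains a full extra 8192 bytes, because 8192 - 0 is 8192. That is harmless here, since the 100000-byte fudge factor is already far larger.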
Hi,
On 2021-08-09 18:58:53 -0500, Justin Pryzby wrote:
Define shared_buffers as the exact size to be allocated/requested from the OS
(regardless of whether they're huge pages or not), and have postgres compute
everything else based on that. So shared_buffers=2GB would end up being 1950MB
(or so) of buffer cache. We'd have to check that after the other allocations,
there's still at least 128kB left for the buffer cache. Maybe we'd have to
bump the minimum value of shared_buffers.
I don't like that. How much "other" shared memory we're going to need is
very hard to predict, and it depends to a significant degree on extensions
and on configuration options like max_locks_per_transaction and
max_connections. This way the user ends up needing to guess at least as much as
before to get to a sensible shared_buffers.
Greetings,
Andres Freund
Hi,
On 2021-08-09 22:57:18 +0000, Bossart, Nathan wrote:
@@ -1026,6 +1031,18 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+ if (output_shmem)
+ {
+ char output[64];
+ Size size;
+
+ size = CreateSharedMemoryAndSemaphores(true);
+ sprintf(output, "%zu", size);
+
+ puts(output);
+ ExitPostmaster(0);
+ }
I don't like putting this into PostmasterMain(). Either BootstrapMain()
(specifically checker mode) or GucInfoMain() seem like better places.
-void
-CreateSharedMemoryAndSemaphores(void)
+Size
+CreateSharedMemoryAndSemaphores(bool size_only)
{
PGShmemHeader *shim = NULL;
@@ -161,6 +161,9 @@ CreateSharedMemoryAndSemaphores(void)
/* might as well round it off to a multiple of a typical page size */
size = add_size(size, 8192 - (size % 8192));
+ if (size_only)
+ return size;
+
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
@@ -288,4 +291,6 @@ CreateSharedMemoryAndSemaphores(void)
*/
if (shmem_startup_hook)
shmem_startup_hook();
+
+ return 0;
}
That seems like an ugly API to me. Why don't we split the size
determination and shmem creation functions into two?
Greetings,
Andres Freund
On 8/9/21, 8:43 PM, "Andres Freund" <andres@anarazel.de> wrote:
I don't like putting this into PostmasterMain(). Either BootstrapMain()
(specifically checker mode) or GucInfoMain() seem like better places.
I think BootstrapModeMain() makes the most sense. It fits in nicely
with the --check logic that's already there. With v3, the following
command can be used to retrieve the amount of shared memory required.
postgres --output-shmem -D dir
While testing this new option, I noticed that you can achieve similar
results today with the following command, although this one will
actually try to create the shared memory, too.
postgres --check -D dir -c log_min_messages=debug3 2> >(grep IpcMemoryCreate)
IMO the new option is still handy, but I can see the argument that it
might not be necessary.
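Whatever produces the byte count (the proposed `--output-shmem` option or the DEBUG3 `IpcMemoryCreate` log line), the remaining step is to divide by the huge page size and round up. Below is a minimal C sketch of that step. It assumes the default huge page size is read from the `Hugepagesize` line of `/proc/meminfo`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a "Hugepagesize:    2048 kB" line; returns bytes, or 0 if no match. */
static unsigned long long parse_hugepagesize_line(const char *line)
{
    unsigned long long kb;

    if (sscanf(line, "Hugepagesize: %llu kB", &kb) != 1)
        return 0;
    return kb * 1024ULL;
}

/* Scan a meminfo-style file (e.g. /proc/meminfo) for the default huge page size. */
static unsigned long long read_hugepagesize(const char *path)
{
    char line[256];
    unsigned long long bytes = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return 0;
    while (bytes == 0 && fgets(line, sizeof line, f) != NULL)
        bytes = parse_hugepagesize_line(line);
    fclose(f);
    return bytes;
}

/* Round the shared memory size up to a whole number of huge pages. */
static unsigned long long huge_pages_needed(unsigned long long shmem_bytes,
                                            unsigned long long page_bytes)
{
    assert(page_bytes > 0);
    return (shmem_bytes + page_bytes - 1) / page_bytes;
}
```

With 6646198272 bytes of shared memory and 2048 kB pages, this returns 3170. Any other programs on the host that also use huge pages still need their own headroom on top of that.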
That seems like an ugly API to me. Why don't we split the size
determination and shmem creation functions into two?
I did it this way in v3.
Nathan
Attachments:
v3-0001-introduce-option-for-retreiving-shmem-size.patch (application/octet-stream)
From 6ff02fbc9f4b47456ccb367b21d2b489845620ce Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 11 Aug 2021 22:43:03 +0000
Subject: [PATCH v3 1/1] introduce option for retreiving shmem size
---
src/backend/bootstrap/bootstrap.c | 18 +++--
src/backend/main/main.c | 6 +-
src/backend/storage/ipc/ipci.c | 142 ++++++++++++++++++++++----------------
src/include/bootstrap/bootstrap.h | 2 +-
src/include/storage/ipc.h | 1 +
5 files changed, 102 insertions(+), 67 deletions(-)
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 48615c0ebc..c9b2284ad2 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -198,9 +198,13 @@ CheckerModeMain(void)
* the current configuration, particularly the passed in options pertaining
* to shared memory sizing, options work (or at least do not cause an error
* up to shared memory creation).
+ *
+ * When output_shmem is true, startup is done only far enough to calculate the
+ * amount of shared memory required for the current configuration. The result
+ * of this calculation is printed.
*/
void
-BootstrapModeMain(int argc, char *argv[], bool check_only)
+BootstrapModeMain(int argc, char *argv[], bool check_only, bool output_shmem)
{
int i;
char *progname = argv[0];
@@ -214,10 +218,11 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
/* Set defaults, to be overridden by explicit options below */
InitializeGUCOptions();
- /* an initial --boot or --check should be present */
+ /* an initial --boot, --check, or --output-shmem should be present */
Assert(argc > 1
&& (strcmp(argv[1], "--boot") == 0
- || strcmp(argv[1], "--check") == 0));
+ || strcmp(argv[1], "--check") == 0
+ || strcmp(argv[1], "--output-shmem") == 0));
argv++;
argc--;
@@ -324,14 +329,17 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
InitializeMaxBackends();
- CreateSharedMemoryAndSemaphores();
+ if (output_shmem)
+ printf("%zu bytes\n", CalculateShmemSize(NULL));
+ else
+ CreateSharedMemoryAndSemaphores();
/*
* XXX: It might make sense to move this into its own function at some
* point. Right now it seems like it'd cause more code duplication than
* it's worth.
*/
- if (check_only)
+ if (check_only || output_shmem)
{
SetProcessingMode(NormalProcessing);
CheckerModeMain();
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 3a2a0d598c..c141ae3d1c 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -182,9 +182,11 @@ main(int argc, char *argv[])
*/
if (argc > 1 && strcmp(argv[1], "--check") == 0)
- BootstrapModeMain(argc, argv, true);
+ BootstrapModeMain(argc, argv, true, false);
+ else if (argc > 1 && strcmp(argv[1], "--output-shmem") == 0)
+ BootstrapModeMain(argc, argv, false, true);
else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
- BootstrapModeMain(argc, argv, false);
+ BootstrapModeMain(argc, argv, false, false);
#ifdef EXEC_BACKEND
else if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
SubPostmasterMain(argc, argv);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
+ Size size;
+ int numSemas;
+
+ /* Compute number of semaphores we'll need */
+ numSemas = ProcGlobalSemas();
+ numSemas += SpinlockSemas();
+
+ /* Return the number of semaphores if requested by the caller */
+ if (num_semaphores)
+ *num_semaphores = numSemas;
+
+ /*
+ * Size of the Postgres shared-memory block is estimated via moderately-
+ * accurate estimates for the big hogs, plus 100K for the stuff that's too
+ * small to bother with estimating.
+ *
+ * We take some care to ensure that the total size request doesn't overflow
+ * size_t. If this gets through, we don't need to be so careful during the
+ * actual allocation phase.
+ */
+ size = 100000;
+ size = add_size(size, PGSemaphoreShmemSize(numSemas));
+ size = add_size(size, SpinlockSemaSize());
+ size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
+ sizeof(ShmemIndexEnt)));
+ size = add_size(size, dsm_estimate_size());
+ size = add_size(size, BufferShmemSize());
+ size = add_size(size, LockShmemSize());
+ size = add_size(size, PredicateLockShmemSize());
+ size = add_size(size, ProcGlobalShmemSize());
+ size = add_size(size, XLOGShmemSize());
+ size = add_size(size, CLOGShmemSize());
+ size = add_size(size, CommitTsShmemSize());
+ size = add_size(size, SUBTRANSShmemSize());
+ size = add_size(size, TwoPhaseShmemSize());
+ size = add_size(size, BackgroundWorkerShmemSize());
+ size = add_size(size, MultiXactShmemSize());
+ size = add_size(size, LWLockShmemSize());
+ size = add_size(size, ProcArrayShmemSize());
+ size = add_size(size, BackendStatusShmemSize());
+ size = add_size(size, SInvalShmemSize());
+ size = add_size(size, PMSignalShmemSize());
+ size = add_size(size, ProcSignalShmemSize());
+ size = add_size(size, CheckpointerShmemSize());
+ size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, ReplicationSlotsShmemSize());
+ size = add_size(size, ReplicationOriginShmemSize());
+ size = add_size(size, WalSndShmemSize());
+ size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, PgArchShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
+ size = add_size(size, SnapMgrShmemSize());
+ size = add_size(size, BTreeShmemSize());
+ size = add_size(size, SyncScanShmemSize());
+ size = add_size(size, AsyncShmemSize());
+#ifdef EXEC_BACKEND
+ size = add_size(size, ShmemBackendArraySize());
+#endif
+
+ /* freeze the addin request size and include it */
+ addin_request_allowed = false;
+ size = add_size(size, total_addin_request);
+
+ /* might as well round it off to a multiple of a typical page size */
+ size = add_size(size, 8192 - (size % 8192));
+
+ return size;
+}
/*
* CreateSharedMemoryAndSemaphores
@@ -102,65 +183,8 @@ CreateSharedMemoryAndSemaphores(void)
Size size;
int numSemas;
- /* Compute number of semaphores we'll need */
- numSemas = ProcGlobalSemas();
- numSemas += SpinlockSemas();
-
- /*
- * Size of the Postgres shared-memory block is estimated via
- * moderately-accurate estimates for the big hogs, plus 100K for the
- * stuff that's too small to bother with estimating.
- *
- * We take some care during this phase to ensure that the total size
- * request doesn't overflow size_t. If this gets through, we don't
- * need to be so careful during the actual allocation phase.
- */
- size = 100000;
- size = add_size(size, PGSemaphoreShmemSize(numSemas));
- size = add_size(size, SpinlockSemaSize());
- size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
- sizeof(ShmemIndexEnt)));
- size = add_size(size, dsm_estimate_size());
- size = add_size(size, BufferShmemSize());
- size = add_size(size, LockShmemSize());
- size = add_size(size, PredicateLockShmemSize());
- size = add_size(size, ProcGlobalShmemSize());
- size = add_size(size, XLOGShmemSize());
- size = add_size(size, CLOGShmemSize());
- size = add_size(size, CommitTsShmemSize());
- size = add_size(size, SUBTRANSShmemSize());
- size = add_size(size, TwoPhaseShmemSize());
- size = add_size(size, BackgroundWorkerShmemSize());
- size = add_size(size, MultiXactShmemSize());
- size = add_size(size, LWLockShmemSize());
- size = add_size(size, ProcArrayShmemSize());
- size = add_size(size, BackendStatusShmemSize());
- size = add_size(size, SInvalShmemSize());
- size = add_size(size, PMSignalShmemSize());
- size = add_size(size, ProcSignalShmemSize());
- size = add_size(size, CheckpointerShmemSize());
- size = add_size(size, AutoVacuumShmemSize());
- size = add_size(size, ReplicationSlotsShmemSize());
- size = add_size(size, ReplicationOriginShmemSize());
- size = add_size(size, WalSndShmemSize());
- size = add_size(size, WalRcvShmemSize());
- size = add_size(size, PgArchShmemSize());
- size = add_size(size, ApplyLauncherShmemSize());
- size = add_size(size, SnapMgrShmemSize());
- size = add_size(size, BTreeShmemSize());
- size = add_size(size, SyncScanShmemSize());
- size = add_size(size, AsyncShmemSize());
-#ifdef EXEC_BACKEND
- size = add_size(size, ShmemBackendArraySize());
-#endif
-
- /* freeze the addin request size and include it */
- addin_request_allowed = false;
- size = add_size(size, total_addin_request);
-
- /* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
-
+ /* Compute the size of the shared-memory block */
+ size = CalculateShmemSize(&numSemas);
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
diff --git a/src/include/bootstrap/bootstrap.h b/src/include/bootstrap/bootstrap.h
index 7d3b78e374..8e420574f5 100644
--- a/src/include/bootstrap/bootstrap.h
+++ b/src/include/bootstrap/bootstrap.h
@@ -32,7 +32,7 @@ extern Form_pg_attribute attrtypes[MAXATTR];
extern int numattr;
-extern void BootstrapModeMain(int argc, char *argv[], bool check_only) pg_attribute_noreturn();
+extern void BootstrapModeMain(int argc, char *argv[], bool check_only, bool output_shmem) pg_attribute_noreturn();
extern void closerel(char *name);
extern void boot_openrel(char *name);
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..80e191d407 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
+extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
#endif /* IPC_H */
--
2.16.6
On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
I think BootstrapModeMain() makes the most sense. It fits in nicely
with the --check logic that's already there. With v3, the following
command can be used to retrieve the amount of shared memory required.
postgres --output-shmem -D dir
While testing this new option, I noticed that you can achieve similar
results today with the following command, although this one will
actually try to create the shared memory, too.
That may not be the best option.
IMO the new option is still handy, but I can see the argument that it
might not be necessary.
A separate option looks handy. Wouldn't it be better to document it
in postgres-ref.sgml then?
--
Michael
On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
I think BootstrapModeMain() makes the most sense. It fits in nicely
with the --check logic that's already there. With v3, the following
command can be used to retrieve the amount of shared memory required.
postgres --output-shmem -D dir
While testing this new option, I noticed that you can achieve similar
results today with the following command, although this one will
actually try to create the shared memory, too.
That may not be the best option.
I would say that can be a disastrous option.
First of all, it would probably not work if you already have something
running, especially when using huge pages. And if it does work, in
that or other scenarios, suddenly allocating many more GB of memory can
potentially have a significant impact on a running cluster...
IMO the new option is still handy, but I can see the argument that it
might not be necessary.
A separate option looks handy. Wouldn't it be better to document it
in postgres-ref.sgml then?
I'd say a lot more than just handy. I don't think the workaround is
really all that useful.
(haven't looked at the actual patch yet, just commenting on the principle)
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On 8/27/21, 7:41 AM, "Magnus Hagander" <magnus@hagander.net> wrote:
On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
While testing this new option, I noticed that you can achieve similar
results today with the following command, although this one will
actually try to create the shared memory, too.
That may not be the best option.
I would say that can be a disastrous option.
First of all, it would probably not work if you already have something
running, especially when using huge pages. And if it does work, in
that or other scenarios, suddenly allocating many more GB of memory can
potentially have a significant impact on a running cluster...
The v3 patch actually didn't work if the server was already running.
I removed that restriction in v4.
IMO the new option is still handy, but I can see the argument that it
might not be necessary.
A separate option looks handy. Wouldn't it be better to document it
in postgres-ref.sgml then?
I'd say a lot more than just handy. I don't think the workaround is
really all that useful.
I added some documentation in v4.
Nathan
Attachments:
v4-0001-introduce-option-for-retreiving-shmem-size.patch (application/octet-stream)
From dff1d2b57a65ada412386b3dce2f1cf6d8409cac Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 27 Aug 2021 17:54:26 +0000
Subject: [PATCH v4 1/1] introduce option for retreiving shmem size
---
doc/src/sgml/ref/postgres-ref.sgml | 17 +++++
doc/src/sgml/runtime.sgml | 17 ++---
src/backend/bootstrap/bootstrap.c | 25 +++++--
src/backend/main/main.c | 6 +-
src/backend/storage/ipc/ipci.c | 142 ++++++++++++++++++++++---------------
src/include/bootstrap/bootstrap.h | 2 +-
src/include/storage/ipc.h | 1 +
7 files changed, 132 insertions(+), 78 deletions(-)
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 4aaa7abe1a..da20fb559a 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -280,6 +280,23 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--output-shmem</option></term>
+ <listitem>
+ <para>
+ Prints the amount of shared memory required for the current
+ configuration and exits. This can be used on a running server.
+ This must be the first argument on the command line.
+ </para>
+
+ <para>
+ This option is useful for determining the number of huge pages needed
+ for the server. For more information, see
+ <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-p <replaceable class="parameter">port</replaceable></option></term>
<listitem>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..535eacadb2 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,17 +1442,14 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to determine the amount of shared memory
+ needed, and use the <filename>/proc</filename> and <filename>/sys</filename>
+ file systems to find the system's default and supported huge page sizes.
This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
+$ <userinput>postgres --output-shmem -D $PGDATA</userinput>
+6646198272 bytes
$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
Hugepagesize: 2048 kB
$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
@@ -1463,7 +1460,7 @@ hugepages-1048576kB hugepages-2048kB
either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
+ <literal>6646198272</literal> / <literal>2097152</literal> gives approximately
<literal>3169.154</literal>, so in this example we need at
least <literal>3170</literal> huge pages. A larger setting would be
appropriate if other programs on the machine also need huge pages.
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 48615c0ebc..5bb491da8b 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -198,9 +198,13 @@ CheckerModeMain(void)
* the current configuration, particularly the passed in options pertaining
* to shared memory sizing, options work (or at least do not cause an error
* up to shared memory creation).
+ *
+ * When output_shmem is true, startup is done only far enough to calculate the
+ * amount of shared memory required for the current configuration. The result
+ * of this calculation is printed.
*/
void
-BootstrapModeMain(int argc, char *argv[], bool check_only)
+BootstrapModeMain(int argc, char *argv[], bool check_only, bool output_shmem)
{
int i;
char *progname = argv[0];
@@ -214,10 +218,11 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
/* Set defaults, to be overridden by explicit options below */
InitializeGUCOptions();
- /* an initial --boot or --check should be present */
+ /* an initial --boot, --check, or --output-shmem should be present */
Assert(argc > 1
&& (strcmp(argv[1], "--boot") == 0
- || strcmp(argv[1], "--check") == 0));
+ || strcmp(argv[1], "--check") == 0
+ || strcmp(argv[1], "--output-shmem") == 0));
argv++;
argc--;
@@ -317,21 +322,29 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
checkDataDir();
ChangeToDataDir();
- CreateDataDirLockFile(false);
+ /*
+ * We do not create the lock file when --output-shmem is specified so
+ * that it can be used while the server is running.
+ */
+ if (!output_shmem)
+ CreateDataDirLockFile(false);
SetProcessingMode(BootstrapProcessing);
IgnoreSystemIndexes = true;
InitializeMaxBackends();
- CreateSharedMemoryAndSemaphores();
+ if (output_shmem)
+ printf("%zu bytes\n", CalculateShmemSize(NULL));
+ else
+ CreateSharedMemoryAndSemaphores();
/*
* XXX: It might make sense to move this into its own function at some
* point. Right now it seems like it'd cause more code duplication than
* it's worth.
*/
- if (check_only)
+ if (check_only || output_shmem)
{
SetProcessingMode(NormalProcessing);
CheckerModeMain();
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 3a2a0d598c..c141ae3d1c 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -182,9 +182,11 @@ main(int argc, char *argv[])
*/
if (argc > 1 && strcmp(argv[1], "--check") == 0)
- BootstrapModeMain(argc, argv, true);
+ BootstrapModeMain(argc, argv, true, false);
+ else if (argc > 1 && strcmp(argv[1], "--output-shmem") == 0)
+ BootstrapModeMain(argc, argv, false, true);
else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
- BootstrapModeMain(argc, argv, false);
+ BootstrapModeMain(argc, argv, false, false);
#ifdef EXEC_BACKEND
else if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
SubPostmasterMain(argc, argv);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
+ Size size;
+ int numSemas;
+
+ /* Compute number of semaphores we'll need */
+ numSemas = ProcGlobalSemas();
+ numSemas += SpinlockSemas();
+
+ /* Return the number of semaphores if requested by the caller */
+ if (num_semaphores)
+ *num_semaphores = numSemas;
+
+ /*
+ * Size of the Postgres shared-memory block is estimated via moderately-
+ * accurate estimates for the big hogs, plus 100K for the stuff that's too
+ * small to bother with estimating.
+ *
+ * We take some care to ensure that the total size request doesn't overflow
+ * size_t. If this gets through, we don't need to be so careful during the
+ * actual allocation phase.
+ */
+ size = 100000;
+ size = add_size(size, PGSemaphoreShmemSize(numSemas));
+ size = add_size(size, SpinlockSemaSize());
+ size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
+ sizeof(ShmemIndexEnt)));
+ size = add_size(size, dsm_estimate_size());
+ size = add_size(size, BufferShmemSize());
+ size = add_size(size, LockShmemSize());
+ size = add_size(size, PredicateLockShmemSize());
+ size = add_size(size, ProcGlobalShmemSize());
+ size = add_size(size, XLOGShmemSize());
+ size = add_size(size, CLOGShmemSize());
+ size = add_size(size, CommitTsShmemSize());
+ size = add_size(size, SUBTRANSShmemSize());
+ size = add_size(size, TwoPhaseShmemSize());
+ size = add_size(size, BackgroundWorkerShmemSize());
+ size = add_size(size, MultiXactShmemSize());
+ size = add_size(size, LWLockShmemSize());
+ size = add_size(size, ProcArrayShmemSize());
+ size = add_size(size, BackendStatusShmemSize());
+ size = add_size(size, SInvalShmemSize());
+ size = add_size(size, PMSignalShmemSize());
+ size = add_size(size, ProcSignalShmemSize());
+ size = add_size(size, CheckpointerShmemSize());
+ size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, ReplicationSlotsShmemSize());
+ size = add_size(size, ReplicationOriginShmemSize());
+ size = add_size(size, WalSndShmemSize());
+ size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, PgArchShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
+ size = add_size(size, SnapMgrShmemSize());
+ size = add_size(size, BTreeShmemSize());
+ size = add_size(size, SyncScanShmemSize());
+ size = add_size(size, AsyncShmemSize());
+#ifdef EXEC_BACKEND
+ size = add_size(size, ShmemBackendArraySize());
+#endif
+
+ /* freeze the addin request size and include it */
+ addin_request_allowed = false;
+ size = add_size(size, total_addin_request);
+
+ /* might as well round it off to a multiple of a typical page size */
+ size = add_size(size, 8192 - (size % 8192));
+
+ return size;
+}
/*
* CreateSharedMemoryAndSemaphores
@@ -102,65 +183,8 @@ CreateSharedMemoryAndSemaphores(void)
Size size;
int numSemas;
- /* Compute number of semaphores we'll need */
- numSemas = ProcGlobalSemas();
- numSemas += SpinlockSemas();
-
- /*
- * Size of the Postgres shared-memory block is estimated via
- * moderately-accurate estimates for the big hogs, plus 100K for the
- * stuff that's too small to bother with estimating.
- *
- * We take some care during this phase to ensure that the total size
- * request doesn't overflow size_t. If this gets through, we don't
- * need to be so careful during the actual allocation phase.
- */
- size = 100000;
- size = add_size(size, PGSemaphoreShmemSize(numSemas));
- size = add_size(size, SpinlockSemaSize());
- size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
- sizeof(ShmemIndexEnt)));
- size = add_size(size, dsm_estimate_size());
- size = add_size(size, BufferShmemSize());
- size = add_size(size, LockShmemSize());
- size = add_size(size, PredicateLockShmemSize());
- size = add_size(size, ProcGlobalShmemSize());
- size = add_size(size, XLOGShmemSize());
- size = add_size(size, CLOGShmemSize());
- size = add_size(size, CommitTsShmemSize());
- size = add_size(size, SUBTRANSShmemSize());
- size = add_size(size, TwoPhaseShmemSize());
- size = add_size(size, BackgroundWorkerShmemSize());
- size = add_size(size, MultiXactShmemSize());
- size = add_size(size, LWLockShmemSize());
- size = add_size(size, ProcArrayShmemSize());
- size = add_size(size, BackendStatusShmemSize());
- size = add_size(size, SInvalShmemSize());
- size = add_size(size, PMSignalShmemSize());
- size = add_size(size, ProcSignalShmemSize());
- size = add_size(size, CheckpointerShmemSize());
- size = add_size(size, AutoVacuumShmemSize());
- size = add_size(size, ReplicationSlotsShmemSize());
- size = add_size(size, ReplicationOriginShmemSize());
- size = add_size(size, WalSndShmemSize());
- size = add_size(size, WalRcvShmemSize());
- size = add_size(size, PgArchShmemSize());
- size = add_size(size, ApplyLauncherShmemSize());
- size = add_size(size, SnapMgrShmemSize());
- size = add_size(size, BTreeShmemSize());
- size = add_size(size, SyncScanShmemSize());
- size = add_size(size, AsyncShmemSize());
-#ifdef EXEC_BACKEND
- size = add_size(size, ShmemBackendArraySize());
-#endif
-
- /* freeze the addin request size and include it */
- addin_request_allowed = false;
- size = add_size(size, total_addin_request);
-
- /* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
-
+ /* Compute the size of the shared-memory block */
+ size = CalculateShmemSize(&numSemas);
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
diff --git a/src/include/bootstrap/bootstrap.h b/src/include/bootstrap/bootstrap.h
index 7d3b78e374..8e420574f5 100644
--- a/src/include/bootstrap/bootstrap.h
+++ b/src/include/bootstrap/bootstrap.h
@@ -32,7 +32,7 @@ extern Form_pg_attribute attrtypes[MAXATTR];
extern int numattr;
-extern void BootstrapModeMain(int argc, char *argv[], bool check_only) pg_attribute_noreturn();
+extern void BootstrapModeMain(int argc, char *argv[], bool check_only, bool output_shmem) pg_attribute_noreturn();
extern void closerel(char *name);
extern void boot_openrel(char *name);
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..80e191d407 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
+extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
#endif /* IPC_H */
--
2.16.6
On 2021-08-27 16:40:27 +0200, Magnus Hagander wrote:
On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <michael@paquier.xyz> wrote:
I'd say a lot more than just handy. I don't think the workaround is
really all that useful.
+1
On 8/27/21, 11:16 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
I added some documentation in v4.
I realized that my attempt at documenting this new option was missing
some important context about the meaning of the return value when used
against a running server. I added that in v5.
Nathan
Attachments:
v5-0001-introduce-option-for-retreiving-shmem-size.patch
From 083cc81509c5d26b0dd431c58cf040bf311e9387 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 27 Aug 2021 19:24:38 +0000
Subject: [PATCH v5 1/1] introduce option for retreiving shmem size
---
doc/src/sgml/ref/postgres-ref.sgml | 20 ++++++
doc/src/sgml/runtime.sgml | 17 ++---
src/backend/bootstrap/bootstrap.c | 25 +++++--
src/backend/main/main.c | 6 +-
src/backend/storage/ipc/ipci.c | 142 ++++++++++++++++++++++---------------
src/include/bootstrap/bootstrap.h | 2 +-
src/include/storage/ipc.h | 1 +
7 files changed, 135 insertions(+), 78 deletions(-)
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 4aaa7abe1a..c28b6470c2 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -280,6 +280,26 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--output-shmem</option></term>
+ <listitem>
+ <para>
+ Prints the amount of shared memory required for the given
+ configuration and exits. This can be used on a running server, but
+ the return value reflects the amount of shared memory needed based
+ on the current invocation. It does not return the amount of shared
+ memory in use by the running server. This must be the first
+ argument on the command line.
+ </para>
+
+ <para>
+ This option is useful for determining the number of huge pages
+ needed for the server. For more information, see
+ <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-p <replaceable class="parameter">port</replaceable></option></term>
<listitem>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..535eacadb2 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,17 +1442,14 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to determine the amount of shared memory
+ needed, and use the <filename>/proc</filename> and <filename>/sys</filename>
+ file systems to find the system's default and supported huge page sizes.
This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
+$ <userinput>postgres --output-shmem -D $PGDATA</userinput>
+6646198272 bytes
$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
Hugepagesize: 2048 kB
$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
@@ -1463,7 +1460,7 @@ hugepages-1048576kB hugepages-2048kB
either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
+ <literal>6646198272</literal> / <literal>2097152</literal> gives approximately
<literal>3169.154</literal>, so in this example we need at
least <literal>3170</literal> huge pages. A larger setting would be
appropriate if other programs on the machine also need huge pages.
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 48615c0ebc..5bb491da8b 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -198,9 +198,13 @@ CheckerModeMain(void)
* the current configuration, particularly the passed in options pertaining
* to shared memory sizing, options work (or at least do not cause an error
* up to shared memory creation).
+ *
+ * When output_shmem is true, startup is done only far enough to calculate the
+ * amount of shared memory required for the current configuration. The result
+ * of this calculation is printed.
*/
void
-BootstrapModeMain(int argc, char *argv[], bool check_only)
+BootstrapModeMain(int argc, char *argv[], bool check_only, bool output_shmem)
{
int i;
char *progname = argv[0];
@@ -214,10 +218,11 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
/* Set defaults, to be overridden by explicit options below */
InitializeGUCOptions();
- /* an initial --boot or --check should be present */
+ /* an initial --boot, --check, or --output-shmem should be present */
Assert(argc > 1
&& (strcmp(argv[1], "--boot") == 0
- || strcmp(argv[1], "--check") == 0));
+ || strcmp(argv[1], "--check") == 0
+ || strcmp(argv[1], "--output-shmem") == 0));
argv++;
argc--;
@@ -317,21 +322,29 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
checkDataDir();
ChangeToDataDir();
- CreateDataDirLockFile(false);
+ /*
+ * We do not create the lock file when --output-shmem is specified so
+ * that it can be used while the server is running.
+ */
+ if (!output_shmem)
+ CreateDataDirLockFile(false);
SetProcessingMode(BootstrapProcessing);
IgnoreSystemIndexes = true;
InitializeMaxBackends();
- CreateSharedMemoryAndSemaphores();
+ if (output_shmem)
+ printf("%zu bytes\n", CalculateShmemSize(NULL));
+ else
+ CreateSharedMemoryAndSemaphores();
/*
* XXX: It might make sense to move this into its own function at some
* point. Right now it seems like it'd cause more code duplication than
* it's worth.
*/
- if (check_only)
+ if (check_only || output_shmem)
{
SetProcessingMode(NormalProcessing);
CheckerModeMain();
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 3a2a0d598c..c141ae3d1c 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -182,9 +182,11 @@ main(int argc, char *argv[])
*/
if (argc > 1 && strcmp(argv[1], "--check") == 0)
- BootstrapModeMain(argc, argv, true);
+ BootstrapModeMain(argc, argv, true, false);
+ else if (argc > 1 && strcmp(argv[1], "--output-shmem") == 0)
+ BootstrapModeMain(argc, argv, false, true);
else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
- BootstrapModeMain(argc, argv, false);
+ BootstrapModeMain(argc, argv, false, false);
#ifdef EXEC_BACKEND
else if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
SubPostmasterMain(argc, argv);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
+ Size size;
+ int numSemas;
+
+ /* Compute number of semaphores we'll need */
+ numSemas = ProcGlobalSemas();
+ numSemas += SpinlockSemas();
+
+ /* Return the number of semaphores if requested by the caller */
+ if (num_semaphores)
+ *num_semaphores = numSemas;
+
+ /*
+ * Size of the Postgres shared-memory block is estimated via moderately-
+ * accurate estimates for the big hogs, plus 100K for the stuff that's too
+ * small to bother with estimating.
+ *
+ * We take some care to ensure that the total size request doesn't overflow
+ * size_t. If this gets through, we don't need to be so careful during the
+ * actual allocation phase.
+ */
+ size = 100000;
+ size = add_size(size, PGSemaphoreShmemSize(numSemas));
+ size = add_size(size, SpinlockSemaSize());
+ size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
+ sizeof(ShmemIndexEnt)));
+ size = add_size(size, dsm_estimate_size());
+ size = add_size(size, BufferShmemSize());
+ size = add_size(size, LockShmemSize());
+ size = add_size(size, PredicateLockShmemSize());
+ size = add_size(size, ProcGlobalShmemSize());
+ size = add_size(size, XLOGShmemSize());
+ size = add_size(size, CLOGShmemSize());
+ size = add_size(size, CommitTsShmemSize());
+ size = add_size(size, SUBTRANSShmemSize());
+ size = add_size(size, TwoPhaseShmemSize());
+ size = add_size(size, BackgroundWorkerShmemSize());
+ size = add_size(size, MultiXactShmemSize());
+ size = add_size(size, LWLockShmemSize());
+ size = add_size(size, ProcArrayShmemSize());
+ size = add_size(size, BackendStatusShmemSize());
+ size = add_size(size, SInvalShmemSize());
+ size = add_size(size, PMSignalShmemSize());
+ size = add_size(size, ProcSignalShmemSize());
+ size = add_size(size, CheckpointerShmemSize());
+ size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, ReplicationSlotsShmemSize());
+ size = add_size(size, ReplicationOriginShmemSize());
+ size = add_size(size, WalSndShmemSize());
+ size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, PgArchShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
+ size = add_size(size, SnapMgrShmemSize());
+ size = add_size(size, BTreeShmemSize());
+ size = add_size(size, SyncScanShmemSize());
+ size = add_size(size, AsyncShmemSize());
+#ifdef EXEC_BACKEND
+ size = add_size(size, ShmemBackendArraySize());
+#endif
+
+ /* freeze the addin request size and include it */
+ addin_request_allowed = false;
+ size = add_size(size, total_addin_request);
+
+ /* might as well round it off to a multiple of a typical page size */
+ size = add_size(size, 8192 - (size % 8192));
+
+ return size;
+}
/*
* CreateSharedMemoryAndSemaphores
@@ -102,65 +183,8 @@ CreateSharedMemoryAndSemaphores(void)
Size size;
int numSemas;
- /* Compute number of semaphores we'll need */
- numSemas = ProcGlobalSemas();
- numSemas += SpinlockSemas();
-
- /*
- * Size of the Postgres shared-memory block is estimated via
- * moderately-accurate estimates for the big hogs, plus 100K for the
- * stuff that's too small to bother with estimating.
- *
- * We take some care during this phase to ensure that the total size
- * request doesn't overflow size_t. If this gets through, we don't
- * need to be so careful during the actual allocation phase.
- */
- size = 100000;
- size = add_size(size, PGSemaphoreShmemSize(numSemas));
- size = add_size(size, SpinlockSemaSize());
- size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
- sizeof(ShmemIndexEnt)));
- size = add_size(size, dsm_estimate_size());
- size = add_size(size, BufferShmemSize());
- size = add_size(size, LockShmemSize());
- size = add_size(size, PredicateLockShmemSize());
- size = add_size(size, ProcGlobalShmemSize());
- size = add_size(size, XLOGShmemSize());
- size = add_size(size, CLOGShmemSize());
- size = add_size(size, CommitTsShmemSize());
- size = add_size(size, SUBTRANSShmemSize());
- size = add_size(size, TwoPhaseShmemSize());
- size = add_size(size, BackgroundWorkerShmemSize());
- size = add_size(size, MultiXactShmemSize());
- size = add_size(size, LWLockShmemSize());
- size = add_size(size, ProcArrayShmemSize());
- size = add_size(size, BackendStatusShmemSize());
- size = add_size(size, SInvalShmemSize());
- size = add_size(size, PMSignalShmemSize());
- size = add_size(size, ProcSignalShmemSize());
- size = add_size(size, CheckpointerShmemSize());
- size = add_size(size, AutoVacuumShmemSize());
- size = add_size(size, ReplicationSlotsShmemSize());
- size = add_size(size, ReplicationOriginShmemSize());
- size = add_size(size, WalSndShmemSize());
- size = add_size(size, WalRcvShmemSize());
- size = add_size(size, PgArchShmemSize());
- size = add_size(size, ApplyLauncherShmemSize());
- size = add_size(size, SnapMgrShmemSize());
- size = add_size(size, BTreeShmemSize());
- size = add_size(size, SyncScanShmemSize());
- size = add_size(size, AsyncShmemSize());
-#ifdef EXEC_BACKEND
- size = add_size(size, ShmemBackendArraySize());
-#endif
-
- /* freeze the addin request size and include it */
- addin_request_allowed = false;
- size = add_size(size, total_addin_request);
-
- /* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
-
+ /* Compute the size of the shared-memory block */
+ size = CalculateShmemSize(&numSemas);
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
diff --git a/src/include/bootstrap/bootstrap.h b/src/include/bootstrap/bootstrap.h
index 7d3b78e374..8e420574f5 100644
--- a/src/include/bootstrap/bootstrap.h
+++ b/src/include/bootstrap/bootstrap.h
@@ -32,7 +32,7 @@ extern Form_pg_attribute attrtypes[MAXATTR];
extern int numattr;
-extern void BootstrapModeMain(int argc, char *argv[], bool check_only) pg_attribute_noreturn();
+extern void BootstrapModeMain(int argc, char *argv[], bool check_only, bool output_shmem) pg_attribute_noreturn();
extern void closerel(char *name);
extern void boot_openrel(char *name);
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..80e191d407 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
+extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
#endif /* IPC_H */
--
2.16.6
Hi,
On 2021-08-27 19:27:18 +0000, Bossart, Nathan wrote:
+ <varlistentry>
+ <term><option>--output-shmem</option></term>
+ <listitem>
+ <para>
+ Prints the amount of shared memory required for the given
+ configuration and exits. This can be used on a running server, but
+ the return value reflects the amount of shared memory needed based
+ on the current invocation. It does not return the amount of shared
+ memory in use by the running server. This must be the first
+ argument on the command line.
+ </para>
+
+ <para>
+ This option is useful for determining the number of huge pages
+ needed for the server. For more information, see
+ <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
One thing I wonder is whether this wouldn't be better dealt with in a more generic
way. While this is the most problematic runtime-computed GUC, it's not the
only one. What if we introduced a new shared_memory_size GUC, and made
--describe-config output it? Perhaps adding --describe-config=guc-name?
I also wonder if we should output the number of hugepages needed instead of
the "raw" bytes of shared memory. The whole business about figuring out the
huge page size, dividing the shared memory size by that and then rounding up
could be removed in that case. Due to huge_page_size it's not even immediately
obvious which huge page size one should use...
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 3a2a0d598c..c141ae3d1c 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -182,9 +182,11 @@ main(int argc, char *argv[])
*/
if (argc > 1 && strcmp(argv[1], "--check") == 0)
- BootstrapModeMain(argc, argv, true);
+ BootstrapModeMain(argc, argv, true, false);
+ else if (argc > 1 && strcmp(argv[1], "--output-shmem") == 0)
+ BootstrapModeMain(argc, argv, false, true);
else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
- BootstrapModeMain(argc, argv, false);
+ BootstrapModeMain(argc, argv, false, false);
#ifdef EXEC_BACKEND
else if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
SubPostmasterMain(argc, argv);
help() needs an update too.
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
Can you split this into a separate commit? It feels fairly uncontroversial to
me, so I think we could just apply it soon?
Greetings,
Andres Freund
On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
One thing I wonder is whether this wouldn't be better dealt with in a more generic
way. While this is the most problematic runtime-computed GUC, it's not the
only one. What if we introduced a new shared_memory_size GUC, and made
--describe-config output it? Perhaps adding --describe-config=guc-name?
I also wonder if we should output the number of hugepages needed instead of
the "raw" bytes of shared memory. The whole business about figuring out the
huge page size, dividing the shared memory size by that and then rounding up
could be removed in that case. Due to huge_page_size it's not even immediately
obvious which huge page size one should use...
I like both of these ideas.
Can you split this into a separate commit? It feels fairly uncontroversial to
me, so I think we could just apply it soon?
I attached a patch for just the uncontroversial part, which is
unfortunately all I have time for today.
Nathan
Attachments:
0001-Move-the-shared-memory-size-calculation-to-its-own-f.patch
From 499e969ef82c5d93353be10610c801640099774f Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 27 Aug 2021 20:03:01 +0000
Subject: [PATCH 1/1] Move the shared memory size calculation to its own
function.
This change refactors the shared memory size calculation in
CreateSharedMemoryAndSemaphores() to its own function. This is
intended for use in a future change that will simplify the steps
for setting up huge pages.
---
src/backend/storage/ipc/ipci.c | 142 ++++++++++++++++++++++++-----------------
src/include/storage/ipc.h | 1 +
2 files changed, 84 insertions(+), 59 deletions(-)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
+ Size size;
+ int numSemas;
+
+ /* Compute number of semaphores we'll need */
+ numSemas = ProcGlobalSemas();
+ numSemas += SpinlockSemas();
+
+ /* Return the number of semaphores if requested by the caller */
+ if (num_semaphores)
+ *num_semaphores = numSemas;
+
+ /*
+ * Size of the Postgres shared-memory block is estimated via moderately-
+ * accurate estimates for the big hogs, plus 100K for the stuff that's too
+ * small to bother with estimating.
+ *
+ * We take some care to ensure that the total size request doesn't overflow
+ * size_t. If this gets through, we don't need to be so careful during the
+ * actual allocation phase.
+ */
+ size = 100000;
+ size = add_size(size, PGSemaphoreShmemSize(numSemas));
+ size = add_size(size, SpinlockSemaSize());
+ size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
+ sizeof(ShmemIndexEnt)));
+ size = add_size(size, dsm_estimate_size());
+ size = add_size(size, BufferShmemSize());
+ size = add_size(size, LockShmemSize());
+ size = add_size(size, PredicateLockShmemSize());
+ size = add_size(size, ProcGlobalShmemSize());
+ size = add_size(size, XLOGShmemSize());
+ size = add_size(size, CLOGShmemSize());
+ size = add_size(size, CommitTsShmemSize());
+ size = add_size(size, SUBTRANSShmemSize());
+ size = add_size(size, TwoPhaseShmemSize());
+ size = add_size(size, BackgroundWorkerShmemSize());
+ size = add_size(size, MultiXactShmemSize());
+ size = add_size(size, LWLockShmemSize());
+ size = add_size(size, ProcArrayShmemSize());
+ size = add_size(size, BackendStatusShmemSize());
+ size = add_size(size, SInvalShmemSize());
+ size = add_size(size, PMSignalShmemSize());
+ size = add_size(size, ProcSignalShmemSize());
+ size = add_size(size, CheckpointerShmemSize());
+ size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, ReplicationSlotsShmemSize());
+ size = add_size(size, ReplicationOriginShmemSize());
+ size = add_size(size, WalSndShmemSize());
+ size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, PgArchShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
+ size = add_size(size, SnapMgrShmemSize());
+ size = add_size(size, BTreeShmemSize());
+ size = add_size(size, SyncScanShmemSize());
+ size = add_size(size, AsyncShmemSize());
+#ifdef EXEC_BACKEND
+ size = add_size(size, ShmemBackendArraySize());
+#endif
+
+ /* freeze the addin request size and include it */
+ addin_request_allowed = false;
+ size = add_size(size, total_addin_request);
+
+ /* might as well round it off to a multiple of a typical page size */
+ size = add_size(size, 8192 - (size % 8192));
+
+ return size;
+}
/*
* CreateSharedMemoryAndSemaphores
@@ -102,65 +183,8 @@ CreateSharedMemoryAndSemaphores(void)
Size size;
int numSemas;
- /* Compute number of semaphores we'll need */
- numSemas = ProcGlobalSemas();
- numSemas += SpinlockSemas();
-
- /*
- * Size of the Postgres shared-memory block is estimated via
- * moderately-accurate estimates for the big hogs, plus 100K for the
- * stuff that's too small to bother with estimating.
- *
- * We take some care during this phase to ensure that the total size
- * request doesn't overflow size_t. If this gets through, we don't
- * need to be so careful during the actual allocation phase.
- */
- size = 100000;
- size = add_size(size, PGSemaphoreShmemSize(numSemas));
- size = add_size(size, SpinlockSemaSize());
- size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
- sizeof(ShmemIndexEnt)));
- size = add_size(size, dsm_estimate_size());
- size = add_size(size, BufferShmemSize());
- size = add_size(size, LockShmemSize());
- size = add_size(size, PredicateLockShmemSize());
- size = add_size(size, ProcGlobalShmemSize());
- size = add_size(size, XLOGShmemSize());
- size = add_size(size, CLOGShmemSize());
- size = add_size(size, CommitTsShmemSize());
- size = add_size(size, SUBTRANSShmemSize());
- size = add_size(size, TwoPhaseShmemSize());
- size = add_size(size, BackgroundWorkerShmemSize());
- size = add_size(size, MultiXactShmemSize());
- size = add_size(size, LWLockShmemSize());
- size = add_size(size, ProcArrayShmemSize());
- size = add_size(size, BackendStatusShmemSize());
- size = add_size(size, SInvalShmemSize());
- size = add_size(size, PMSignalShmemSize());
- size = add_size(size, ProcSignalShmemSize());
- size = add_size(size, CheckpointerShmemSize());
- size = add_size(size, AutoVacuumShmemSize());
- size = add_size(size, ReplicationSlotsShmemSize());
- size = add_size(size, ReplicationOriginShmemSize());
- size = add_size(size, WalSndShmemSize());
- size = add_size(size, WalRcvShmemSize());
- size = add_size(size, PgArchShmemSize());
- size = add_size(size, ApplyLauncherShmemSize());
- size = add_size(size, SnapMgrShmemSize());
- size = add_size(size, BTreeShmemSize());
- size = add_size(size, SyncScanShmemSize());
- size = add_size(size, AsyncShmemSize());
-#ifdef EXEC_BACKEND
- size = add_size(size, ShmemBackendArraySize());
-#endif
-
- /* freeze the addin request size and include it */
- addin_request_allowed = false;
- size = add_size(size, total_addin_request);
-
- /* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
-
+ /* Compute the size of the shared-memory block */
+ size = CalculateShmemSize(&numSemas);
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..80e191d407 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
+extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
#endif /* IPC_H */
--
2.16.6
On Fri, Aug 27, 2021 at 08:16:40PM +0000, Bossart, Nathan wrote:
On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
One thing I wonder is if this wouldn't better be dealt with in a more generic
way. While this is the most problematic runtime computed GUC, it's not the
only one. What if we introduced a new shared_memory_size GUC, and made
--describe-config output it? Perhaps adding --describe-config=guc-name?
I also wonder if we should output the number of hugepages needed instead of
the "raw" bytes of shared memory. The whole business about figuring out the
huge page size, dividing the shared memory size by that and then rounding up
could be removed in that case. Due to huge_page_size it's not even immediately
obvious which huge page size one should use...
I like both of these ideas.
That pretty much looks like -C in concept, doesn't it? Except that you
cannot get the actual total shared memory value, because we'd do this
operation before loading shared_preload_libraries and miss any amount
requested by extensions. There is a similar problem when attempting
postgres -C data_checksums, for example, which would output an
incorrect value even if the cluster has data checksums enabled.
--
Michael
On Sat, Aug 28, 2021 at 11:00:11AM +0900, Michael Paquier wrote:
On Fri, Aug 27, 2021 at 08:16:40PM +0000, Bossart, Nathan wrote:
On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
One thing I wonder is if this wouldn't better be dealt with in a more generic
way. While this is the most problematic runtime computed GUC, it's not the
only one. What if we introduced a new shared_memory_size GUC, and made
--describe-config output it? Perhaps adding --describe-config=guc-name?
I also wonder if we should output the number of hugepages needed instead of
the "raw" bytes of shared memory. The whole business about figuring out the
huge page size, dividing the shared memory size by that and then rounding up
could be removed in that case. Due to huge_page_size it's not even immediately
obvious which huge page size one should use...
I like both of these ideas.
That pretty much looks like -C in concept, doesn't it? Except that you
cannot get the actual total shared memory value, because we'd do this
operation before loading shared_preload_libraries and miss any amount
requested by extensions. There is a similar problem when attempting
postgres -C data_checksums, for example, which would output an
incorrect value even if the cluster has data checksums enabled.
Since we don't want to try to allocate the huge pages, and we also don't want
to compute based on shared_buffers alone, has anyone considered whether
pg_controldata is the right place to put this?
It includes a lot of related stuff:
max_connections setting: 100
max_worker_processes setting: 8
- (added in 2013: 6bc8ef0b7f1f1df3998745a66e1790e27424aa0c)
max_wal_senders setting: 10
max_prepared_xacts setting: 2
max_locks_per_xact setting: 64
I'm not sure if there's any reason these aren't also shown (?)
autovacuum_max_workers - added in 2007: e2a186b03
max_predicate_locks_per_xact - added in 2011: dafaa3efb
max_logical_replication_workers
max_replication_slots
--
Justin
On 8/27/21, 7:01 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
On Fri, Aug 27, 2021 at 08:16:40PM +0000, Bossart, Nathan wrote:
On 8/27/21, 12:39 PM, "Andres Freund" <andres@anarazel.de> wrote:
One thing I wonder is if this wouldn't better be dealt with in a more generic
way. While this is the most problematic runtime computed GUC, it's not the
only one. What if we introduced a new shared_memory_size GUC, and made
--describe-config output it? Perhaps adding --describe-config=guc-name?
I also wonder if we should output the number of hugepages needed instead of
the "raw" bytes of shared memory. The whole business about figuring out the
huge page size, dividing the shared memory size by that and then rounding up
could be removed in that case. Due to huge_page_size it's not even immediately
obvious which huge page size one should use...
I like both of these ideas.
That pretty much looks like -C in concept, doesn't it? Except that you
cannot get the actual total shared memory value, because we'd do this
operation before loading shared_preload_libraries and miss any amount
requested by extensions. There is a similar problem when attempting
postgres -C data_checksums, for example, which would output an
incorrect value even if the cluster has data checksums enabled.
Attached is a hacky attempt at adding a shared_memory_size GUC in a
way that could be used with -C. This should include the amount of
shared memory requested by extensions, too. As long as huge_page_size
is nonzero, it seems easy enough to provide the number of huge pages
needed as well.
Nathan
Attachments:
0001-add-shared_memory_size-GUC.patch (application/octet-stream)
From 2c6ed729d951621e34214abd1c532539530479ea Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Sat, 28 Aug 2021 05:02:55 +0000
Subject: [PATCH 1/1] add shared_memory_size GUC
---
src/backend/postmaster/postmaster.c | 25 +++++++++++++++++++++++--
src/backend/utils/misc/guc.c | 13 +++++++++++++
2 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..38e28b3f8f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -893,7 +893,8 @@ PostmasterMain(int argc, char *argv[])
if (!SelectConfigFiles(userDoption, progname))
ExitPostmaster(2);
- if (output_config_variable != NULL)
+ if (output_config_variable != NULL &&
+ strcmp(output_config_variable, "shared_memory_size") != 0)
{
/*
* "-C guc" was specified, so print GUC's value and exit. No extra
@@ -983,7 +984,8 @@ PostmasterMain(int argc, char *argv[])
* so it must happen before opening sockets so that at exit, the socket
* lockfiles go away after CloseServerPorts runs.
*/
- CreateDataDirLockFile(true);
+ if (output_config_variable == NULL)
+ CreateDataDirLockFile(true);
/*
* Read the control file (for error checking and config info).
@@ -1026,6 +1028,25 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+ {
+ char buf[64];
+ Size size;
+
+ size = CalculateShmemSize(NULL);
+ size += (1024 * 1024) - 1;
+ size /= (1024 * 1024);
+
+ snprintf(buf, sizeof(buf), "%zu MB", size);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+ }
+
+ if (output_config_variable != NULL)
+ {
+ const char *val = GetConfigOption(output_config_variable, false, false);
+ puts(val);
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 467b0fd6fe..7c88dbeae9 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -620,6 +620,8 @@ char *pgstat_temp_directory;
char *application_name;
+int shmem_size_mb;
+
int tcp_keepalives_idle;
int tcp_keepalives_interval;
int tcp_keepalives_count;
@@ -2337,6 +2339,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the amount of shared memory allocated by the server (rounded up to the nearest MB)."),
+ NULL,
+ GUC_UNIT_MB
+ },
+ &shmem_size_mb,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
--
2.16.6
On Sat, Aug 28, 2021 at 05:36:37AM +0000, Bossart, Nathan wrote:
Attached is a hacky attempt at adding a shared_memory_size GUC in a
way that could be used with -C. This should include the amount of
shared memory requested by extensions, too. As long as huge_page_size
is nonzero, it seems easy enough to provide the number of huge pages
needed as well.
Yes, the implementation is not good. The key thing is that, to support
shared_memory_size with the -C switch of postgres, we need to call
process_shared_preload_libraries() before output_config_variable is
handled. This additionally means calling ApplyLauncherRegister() before
that, so that the bgworker slots are not all taken first. Going through
_PG_init() also means that we'd better call ChangeToDataDir()
beforehand.
Attached is a WIP to show how the order of the operations could be
changed, as that's easier to grasp. Even if we don't do that, having
the GUC and the refactoring of CalculateShmemSize() would still be
useful, as one could just query an existing instance for an estimation
of huge pages for a cloned one.
The GUC shared_memory_size should have GUC_NOT_IN_SAMPLE and
GUC_DISALLOW_IN_FILE, with some documentation, of course. I added the
flags to the GUC, not the docs. The code setting up the GUC is not
good either. It would make sense to just have that in a small wrapper
of ipci.c, perhaps.
--
Michael
Attachments:
v2-0001-Add-shared_memory_size-GUC.patch (text/plain; charset=us-ascii)
From e2345206dacec8b4454c3b429d0cfac01d5a0a58 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 30 Aug 2021 16:24:59 +0900
Subject: [PATCH] Add shared_memory_size GUC
---
src/include/storage/ipc.h | 1 +
src/backend/postmaster/postmaster.c | 60 +++++++-----
src/backend/storage/ipc/ipci.c | 142 ++++++++++++++++------------
src/backend/utils/misc/guc.c | 13 +++
4 files changed, 135 insertions(+), 81 deletions(-)
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..80e191d407 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
+extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
#endif /* IPC_H */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..0d3b4b1be0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -893,6 +893,44 @@ PostmasterMain(int argc, char *argv[])
if (!SelectConfigFiles(userDoption, progname))
ExitPostmaster(2);
+ /* Verify that DataDir looks reasonable */
+ checkDataDir();
+
+ /* Check that pg_control exists */
+ checkControlFile();
+
+ /* And switch working directory into it */
+ ChangeToDataDir();
+
+ /*
+ * Register the apply launcher. Since it registers a background worker,
+ * it needs to be called before InitializeMaxBackends(), and it's probably
+ * a good idea to call it before any modules had chance to take the
+ * background worker slots.
+ */
+ ApplyLauncherRegister();
+
+ /*
+ * Process any libraries that should be preloaded at postmaster start.
+ * This happens before running -C, so that it is possible to get an
+ * estimation of the total shared memory size allocated to this system,
+ * accounting for the portion from loaded libraries.
+ */
+ process_shared_preload_libraries();
+
+ /* XXX: this should be moved around */
+ {
+ char buf[64];
+ Size size;
+
+ size = CalculateShmemSize(NULL);
+ size += (1024 * 1024) - 1;
+ size /= (1024 * 1024);
+
+ snprintf(buf, sizeof(buf), "%zu MB", size);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+ }
+
if (output_config_variable != NULL)
{
/*
@@ -907,15 +945,6 @@ PostmasterMain(int argc, char *argv[])
ExitPostmaster(0);
}
- /* Verify that DataDir looks reasonable */
- checkDataDir();
-
- /* Check that pg_control exists */
- checkControlFile();
-
- /* And switch working directory into it */
- ChangeToDataDir();
-
/*
* Check for invalid combinations of GUC settings.
*/
@@ -996,19 +1025,6 @@ PostmasterMain(int argc, char *argv[])
*/
LocalProcessControlFile(false);
- /*
- * Register the apply launcher. Since it registers a background worker,
- * it needs to be called before InitializeMaxBackends(), and it's probably
- * a good idea to call it before any modules had chance to take the
- * background worker slots.
- */
- ApplyLauncherRegister();
-
- /*
- * process any libraries that should be preloaded at postmaster start
- */
- process_shared_preload_libraries();
-
/*
* Initialize SSL library, if specified.
*/
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
+ Size size;
+ int numSemas;
+
+ /* Compute number of semaphores we'll need */
+ numSemas = ProcGlobalSemas();
+ numSemas += SpinlockSemas();
+
+ /* Return the number of semaphores if requested by the caller */
+ if (num_semaphores)
+ *num_semaphores = numSemas;
+
+ /*
+ * Size of the Postgres shared-memory block is estimated via moderately-
+ * accurate estimates for the big hogs, plus 100K for the stuff that's too
+ * small to bother with estimating.
+ *
+ * We take some care to ensure that the total size request doesn't overflow
+ * size_t. If this gets through, we don't need to be so careful during the
+ * actual allocation phase.
+ */
+ size = 100000;
+ size = add_size(size, PGSemaphoreShmemSize(numSemas));
+ size = add_size(size, SpinlockSemaSize());
+ size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
+ sizeof(ShmemIndexEnt)));
+ size = add_size(size, dsm_estimate_size());
+ size = add_size(size, BufferShmemSize());
+ size = add_size(size, LockShmemSize());
+ size = add_size(size, PredicateLockShmemSize());
+ size = add_size(size, ProcGlobalShmemSize());
+ size = add_size(size, XLOGShmemSize());
+ size = add_size(size, CLOGShmemSize());
+ size = add_size(size, CommitTsShmemSize());
+ size = add_size(size, SUBTRANSShmemSize());
+ size = add_size(size, TwoPhaseShmemSize());
+ size = add_size(size, BackgroundWorkerShmemSize());
+ size = add_size(size, MultiXactShmemSize());
+ size = add_size(size, LWLockShmemSize());
+ size = add_size(size, ProcArrayShmemSize());
+ size = add_size(size, BackendStatusShmemSize());
+ size = add_size(size, SInvalShmemSize());
+ size = add_size(size, PMSignalShmemSize());
+ size = add_size(size, ProcSignalShmemSize());
+ size = add_size(size, CheckpointerShmemSize());
+ size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, ReplicationSlotsShmemSize());
+ size = add_size(size, ReplicationOriginShmemSize());
+ size = add_size(size, WalSndShmemSize());
+ size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, PgArchShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
+ size = add_size(size, SnapMgrShmemSize());
+ size = add_size(size, BTreeShmemSize());
+ size = add_size(size, SyncScanShmemSize());
+ size = add_size(size, AsyncShmemSize());
+#ifdef EXEC_BACKEND
+ size = add_size(size, ShmemBackendArraySize());
+#endif
+
+ /* freeze the addin request size and include it */
+ addin_request_allowed = false;
+ size = add_size(size, total_addin_request);
+
+ /* might as well round it off to a multiple of a typical page size */
+ size = add_size(size, 8192 - (size % 8192));
+
+ return size;
+}
/*
* CreateSharedMemoryAndSemaphores
@@ -102,65 +183,8 @@ CreateSharedMemoryAndSemaphores(void)
Size size;
int numSemas;
- /* Compute number of semaphores we'll need */
- numSemas = ProcGlobalSemas();
- numSemas += SpinlockSemas();
-
- /*
- * Size of the Postgres shared-memory block is estimated via
- * moderately-accurate estimates for the big hogs, plus 100K for the
- * stuff that's too small to bother with estimating.
- *
- * We take some care during this phase to ensure that the total size
- * request doesn't overflow size_t. If this gets through, we don't
- * need to be so careful during the actual allocation phase.
- */
- size = 100000;
- size = add_size(size, PGSemaphoreShmemSize(numSemas));
- size = add_size(size, SpinlockSemaSize());
- size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
- sizeof(ShmemIndexEnt)));
- size = add_size(size, dsm_estimate_size());
- size = add_size(size, BufferShmemSize());
- size = add_size(size, LockShmemSize());
- size = add_size(size, PredicateLockShmemSize());
- size = add_size(size, ProcGlobalShmemSize());
- size = add_size(size, XLOGShmemSize());
- size = add_size(size, CLOGShmemSize());
- size = add_size(size, CommitTsShmemSize());
- size = add_size(size, SUBTRANSShmemSize());
- size = add_size(size, TwoPhaseShmemSize());
- size = add_size(size, BackgroundWorkerShmemSize());
- size = add_size(size, MultiXactShmemSize());
- size = add_size(size, LWLockShmemSize());
- size = add_size(size, ProcArrayShmemSize());
- size = add_size(size, BackendStatusShmemSize());
- size = add_size(size, SInvalShmemSize());
- size = add_size(size, PMSignalShmemSize());
- size = add_size(size, ProcSignalShmemSize());
- size = add_size(size, CheckpointerShmemSize());
- size = add_size(size, AutoVacuumShmemSize());
- size = add_size(size, ReplicationSlotsShmemSize());
- size = add_size(size, ReplicationOriginShmemSize());
- size = add_size(size, WalSndShmemSize());
- size = add_size(size, WalRcvShmemSize());
- size = add_size(size, PgArchShmemSize());
- size = add_size(size, ApplyLauncherShmemSize());
- size = add_size(size, SnapMgrShmemSize());
- size = add_size(size, BTreeShmemSize());
- size = add_size(size, SyncScanShmemSize());
- size = add_size(size, AsyncShmemSize());
-#ifdef EXEC_BACKEND
- size = add_size(size, ShmemBackendArraySize());
-#endif
-
- /* freeze the addin request size and include it */
- addin_request_allowed = false;
- size = add_size(size, total_addin_request);
-
- /* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
-
+ /* Compute the size of the shared-memory block */
+ size = CalculateShmemSize(&numSemas);
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 467b0fd6fe..872bdc1c8e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -620,6 +620,8 @@ char *pgstat_temp_directory;
char *application_name;
+int shmem_size_mb;
+
int tcp_keepalives_idle;
int tcp_keepalives_interval;
int tcp_keepalives_count;
@@ -2337,6 +2339,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the amount of shared memory allocated by the server (rounded up to the nearest MB)."),
+ NULL,
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ },
+ &shmem_size_mb,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
--
2.33.0
On 8/30/21, 12:29 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
Attached is a WIP to show how the order of the operations could be
changed, as that's easier to grasp. Even if we don't do that, having
the GUC and the refactoring of CalculateShmemSize() would still be
useful, as one could just query an existing instance for an estimation
of huge pages for a cloned one.
The GUC shared_memory_size should have GUC_NOT_IN_SAMPLE and
GUC_DISALLOW_IN_FILE, with some documentation, of course. I added the
flags to the GUC, not the docs. The code setting up the GUC is not
good either. It would make sense to just have that in a small wrapper
of ipci.c, perhaps.
I moved the GUC calculation to ipci.c, adjusted the docs, and added a
huge_pages_required GUC. It's still a little rough around the edges,
and I haven't tested it on Windows, but this seems like the direction
the patch is headed.
Nathan
Attachments:
v3-0001-Move-the-shared-memory-size-calculation-to-its-ow.patch (application/octet-stream)
From 68a6974c73ef512ceb8e35649bf0add4b3547fa3 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 27 Aug 2021 20:03:01 +0000
Subject: [PATCH v3 1/2] Move the shared memory size calculation to its own
function.
This change refactors the shared memory size calculation in
CreateSharedMemoryAndSemaphores() to its own function. This is
intended for use in a future change that will simplify the steps
for setting up huge pages.
---
src/backend/storage/ipc/ipci.c | 142 ++++++++++++++++++++++++-----------------
src/include/storage/ipc.h | 1 +
2 files changed, 84 insertions(+), 59 deletions(-)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
+ Size size;
+ int numSemas;
+
+ /* Compute number of semaphores we'll need */
+ numSemas = ProcGlobalSemas();
+ numSemas += SpinlockSemas();
+
+ /* Return the number of semaphores if requested by the caller */
+ if (num_semaphores)
+ *num_semaphores = numSemas;
+
+ /*
+ * Size of the Postgres shared-memory block is estimated via moderately-
+ * accurate estimates for the big hogs, plus 100K for the stuff that's too
+ * small to bother with estimating.
+ *
+ * We take some care to ensure that the total size request doesn't overflow
+ * size_t. If this gets through, we don't need to be so careful during the
+ * actual allocation phase.
+ */
+ size = 100000;
+ size = add_size(size, PGSemaphoreShmemSize(numSemas));
+ size = add_size(size, SpinlockSemaSize());
+ size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
+ sizeof(ShmemIndexEnt)));
+ size = add_size(size, dsm_estimate_size());
+ size = add_size(size, BufferShmemSize());
+ size = add_size(size, LockShmemSize());
+ size = add_size(size, PredicateLockShmemSize());
+ size = add_size(size, ProcGlobalShmemSize());
+ size = add_size(size, XLOGShmemSize());
+ size = add_size(size, CLOGShmemSize());
+ size = add_size(size, CommitTsShmemSize());
+ size = add_size(size, SUBTRANSShmemSize());
+ size = add_size(size, TwoPhaseShmemSize());
+ size = add_size(size, BackgroundWorkerShmemSize());
+ size = add_size(size, MultiXactShmemSize());
+ size = add_size(size, LWLockShmemSize());
+ size = add_size(size, ProcArrayShmemSize());
+ size = add_size(size, BackendStatusShmemSize());
+ size = add_size(size, SInvalShmemSize());
+ size = add_size(size, PMSignalShmemSize());
+ size = add_size(size, ProcSignalShmemSize());
+ size = add_size(size, CheckpointerShmemSize());
+ size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, ReplicationSlotsShmemSize());
+ size = add_size(size, ReplicationOriginShmemSize());
+ size = add_size(size, WalSndShmemSize());
+ size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, PgArchShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
+ size = add_size(size, SnapMgrShmemSize());
+ size = add_size(size, BTreeShmemSize());
+ size = add_size(size, SyncScanShmemSize());
+ size = add_size(size, AsyncShmemSize());
+#ifdef EXEC_BACKEND
+ size = add_size(size, ShmemBackendArraySize());
+#endif
+
+ /* freeze the addin request size and include it */
+ addin_request_allowed = false;
+ size = add_size(size, total_addin_request);
+
+ /* might as well round it off to a multiple of a typical page size */
+ size = add_size(size, 8192 - (size % 8192));
+
+ return size;
+}
/*
* CreateSharedMemoryAndSemaphores
@@ -102,65 +183,8 @@ CreateSharedMemoryAndSemaphores(void)
Size size;
int numSemas;
- /* Compute number of semaphores we'll need */
- numSemas = ProcGlobalSemas();
- numSemas += SpinlockSemas();
-
- /*
- * Size of the Postgres shared-memory block is estimated via
- * moderately-accurate estimates for the big hogs, plus 100K for the
- * stuff that's too small to bother with estimating.
- *
- * We take some care during this phase to ensure that the total size
- * request doesn't overflow size_t. If this gets through, we don't
- * need to be so careful during the actual allocation phase.
- */
- size = 100000;
- size = add_size(size, PGSemaphoreShmemSize(numSemas));
- size = add_size(size, SpinlockSemaSize());
- size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
- sizeof(ShmemIndexEnt)));
- size = add_size(size, dsm_estimate_size());
- size = add_size(size, BufferShmemSize());
- size = add_size(size, LockShmemSize());
- size = add_size(size, PredicateLockShmemSize());
- size = add_size(size, ProcGlobalShmemSize());
- size = add_size(size, XLOGShmemSize());
- size = add_size(size, CLOGShmemSize());
- size = add_size(size, CommitTsShmemSize());
- size = add_size(size, SUBTRANSShmemSize());
- size = add_size(size, TwoPhaseShmemSize());
- size = add_size(size, BackgroundWorkerShmemSize());
- size = add_size(size, MultiXactShmemSize());
- size = add_size(size, LWLockShmemSize());
- size = add_size(size, ProcArrayShmemSize());
- size = add_size(size, BackendStatusShmemSize());
- size = add_size(size, SInvalShmemSize());
- size = add_size(size, PMSignalShmemSize());
- size = add_size(size, ProcSignalShmemSize());
- size = add_size(size, CheckpointerShmemSize());
- size = add_size(size, AutoVacuumShmemSize());
- size = add_size(size, ReplicationSlotsShmemSize());
- size = add_size(size, ReplicationOriginShmemSize());
- size = add_size(size, WalSndShmemSize());
- size = add_size(size, WalRcvShmemSize());
- size = add_size(size, PgArchShmemSize());
- size = add_size(size, ApplyLauncherShmemSize());
- size = add_size(size, SnapMgrShmemSize());
- size = add_size(size, BTreeShmemSize());
- size = add_size(size, SyncScanShmemSize());
- size = add_size(size, AsyncShmemSize());
-#ifdef EXEC_BACKEND
- size = add_size(size, ShmemBackendArraySize());
-#endif
-
- /* freeze the addin request size and include it */
- addin_request_allowed = false;
- size = add_size(size, total_addin_request);
-
- /* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
-
+ /* Compute the size of the shared-memory block */
+ size = CalculateShmemSize(&numSemas);
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..80e191d407 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
+extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
#endif /* IPC_H */
--
2.16.6
v3-0002-Introduce-shared_memory_size-and-huge_pages_requi.patch
From c7e8d7937a940ecc5d22c4cc757d452bb536ac3e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Tue, 31 Aug 2021 05:05:43 +0000
Subject: [PATCH v3 2/2] Introduce shared_memory_size and huge_pages_required
GUCs.
These parameters are intended to simplify huge pages setup.
Instead of manually calculating the number of huge pages required
for the main shared memory segment, a command like the following
can be used to determine how many are needed:
postgres -D $PGDATA -C huge_pages_required
---
doc/src/sgml/config.sgml | 30 ++++++++++++++++++++
doc/src/sgml/runtime.sgml | 33 +++++++---------------
src/backend/port/sysv_shmem.c | 2 +-
src/backend/postmaster/postmaster.c | 55 ++++++++++++++++++++++---------------
src/backend/storage/ipc/ipci.c | 48 ++++++++++++++++++++++++++++++++
src/backend/utils/misc/guc.c | 25 +++++++++++++++++
src/include/storage/ipc.h | 1 +
src/include/storage/pg_shmem.h | 4 +++
8 files changed, 152 insertions(+), 46 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2c31c35a6b..e586427640 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10101,6 +10101,22 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-huge-pages-required" xreflabel="huge_pages_required">
+ <term><varname>huge_pages_required</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>huge_pages_required</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are required for the main
+ shared memory segment based on the specified
+ <xref linkend="guc-huge-page-size"/>. If the huge page size cannot
+ be determined, this will be <literal>-1</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-integer-datetimes" xreflabel="integer_datetimes">
<term><varname>integer_datetimes</varname> (<type>boolean</type>)
<indexterm>
@@ -10275,6 +10291,20 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-shared-memory-size" xreflabel="shared_memory_size">
+ <term><varname>shared_memory_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>shared_memory_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the size of the main shared memory segment, rounded up to
+ the nearest megabyte.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-ssl-library" xreflabel="ssl_library">
<term><varname>ssl_library</varname> (<type>string</type>)
<indexterm>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..28bc36283e 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,32 +1442,19 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
- This might look like:
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-huge-pages-required"/>. This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C huge_pages_required</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
-
- Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
- We can set this with:
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>. While we need at least
+ <literal>3170</literal> huge pages in this example, a larger setting
+ would be appropriate if other programs on the machine also need huge
+ pages. We can allocate the huge pages with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
</programlisting>
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..f42f1ac171 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -478,7 +478,7 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* Returns the (real, assumed or config provided) page size into *hugepagesize,
* and the hugepage-related mmap flags to use into *mmap_flags.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
Size default_hugepagesize = 0;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..c32c21d632 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -893,6 +893,39 @@ PostmasterMain(int argc, char *argv[])
if (!SelectConfigFiles(userDoption, progname))
ExitPostmaster(2);
+ /* Verify that DataDir looks reasonable */
+ checkDataDir();
+
+ /* Check that pg_control exists */
+ checkControlFile();
+
+ /* And switch working directory into it */
+ ChangeToDataDir();
+
+ /*
+ * Register the apply launcher. Since it registers a background worker,
+ * it needs to be called before InitializeMaxBackends(), and it's probably
+ * a good idea to call it before any modules had chance to take the
+ * background worker slots.
+ */
+ ApplyLauncherRegister();
+
+ /*
+ * Process any libraries that should be preloaded at postmaster start.
+ * This happens before running -C, so that it is possible to get an
+ * estimation of the total shared memory size allocated to this system,
+ * accounting for the portion from loaded libraries.
+ */
+ process_shared_preload_libraries();
+
+ /*
+ * Determine the value of any runtime-computed GUCs that depend on the
+ * amount of shared memory required. It is important to do this after
+ * preloaded libraries have had a chance to request additional shared
+ * memory.
+ */
+ InitializeShmemGUCs();
+
if (output_config_variable != NULL)
{
/*
@@ -907,15 +940,6 @@ PostmasterMain(int argc, char *argv[])
ExitPostmaster(0);
}
- /* Verify that DataDir looks reasonable */
- checkDataDir();
-
- /* Check that pg_control exists */
- checkControlFile();
-
- /* And switch working directory into it */
- ChangeToDataDir();
-
/*
* Check for invalid combinations of GUC settings.
*/
@@ -996,19 +1020,6 @@ PostmasterMain(int argc, char *argv[])
*/
LocalProcessControlFile(false);
- /*
- * Register the apply launcher. Since it registers a background worker,
- * it needs to be called before InitializeMaxBackends(), and it's probably
- * a good idea to call it before any modules had chance to take the
- * background worker slots.
- */
- ApplyLauncherRegister();
-
- /*
- * process any libraries that should be preloaded at postmaster start
- */
- process_shared_preload_libraries();
-
/*
* Initialize SSL library, if specified.
*/
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b225b1ee70..86061da1bc 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -14,6 +14,10 @@
*/
#include "postgres.h"
+#ifndef WIN32
+#include <sys/mman.h>
+#endif
+
#include "access/clog.h"
#include "access/commit_ts.h"
#include "access/heapam.h"
@@ -313,3 +317,47 @@ CreateSharedMemoryAndSemaphores(void)
if (shmem_startup_hook)
shmem_startup_hook();
}
+
+/*
+ * InitializeShmemGUCs
+ *
+ * This function initializes runtime-computed GUCs related to the amount of
+ * shared memory required for the current configuration.
+ */
+void
+InitializeShmemGUCs(void)
+{
+ char buf[64];
+ Size size_b;
+ Size size_mb;
+#if defined(MAP_HUGETLB) || defined(WIN32)
+ Size hp_size;
+ Size hp_required;
+#endif
+#ifdef MAP_HUGETLB
+ int unused;
+#endif
+
+ /*
+ * Calculate the shared memory size in bytes and in megabytes (rounded
+ * up to the nearest megabyte).
+ */
+ size_b = CalculateShmemSize(NULL);
+ size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
+
+ sprintf(buf, "%lu MB", size_mb);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+ /* Calculate the number of huge_pages required */
+#if defined(MAP_HUGETLB)
+ GetHugePageSize(&hp_size, &unused);
+#elif defined(WIN32)
+ hp_size = GetLargePageMinimum();
+#endif
+
+#if defined(MAP_HUGETLB) || defined(WIN32)
+ hp_required = (size_b / hp_size) + 1;
+ sprintf(buf, "%lu", hp_required);
+ SetConfigOption("huge_pages_required", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+#endif
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 467b0fd6fe..0d4dd27394 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -620,6 +620,9 @@ char *pgstat_temp_directory;
char *application_name;
+int shmem_size_mb;
+int huge_pages_required;
+
int tcp_keepalives_idle;
int tcp_keepalives_interval;
int tcp_keepalives_count;
@@ -2223,6 +2226,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"huge_pages_required", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the huge page size could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ },
+ &huge_pages_required,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
/* This is PGC_SUSET to prevent hiding from log_lock_waits. */
{"deadlock_timeout", PGC_SUSET, LOCK_MANAGEMENT,
@@ -2337,6 +2351,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the amount of shared memory allocated by the server (rounded up to the nearest MB)."),
+ NULL,
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ },
+ &shmem_size_mb,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 80e191d407..7a1ebc8559 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -79,5 +79,6 @@ extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
+extern void InitializeShmemGUCs(void);
#endif /* IPC_H */
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..c44403ed6a 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -88,4 +88,8 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+#ifdef MAP_HUGETLB
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
+#endif
+
#endif /* PG_SHMEM_H */
--
2.16.6
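InitializeShmemGUCs() derives both new GUCs from a single byte count returned by CalculateShmemSize(). What follows is a minimal standalone sketch of that arithmetic only. calc_shmem_size_mb() and calc_huge_pages_required() are hypothetical helpers and are not PostgreSQL functions. The hp_size == 0 check, which returns -1, is an assumption added here to match the documented meaning of -1 and to avoid dividing by zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes rounded up to the nearest megabyte. */
static size_t
calc_shmem_size_mb(size_t size_b)
{
    const size_t mb = 1024 * 1024;

    /* The patch uses add_size(), which raises an ERROR on overflow;
     * this sketch asserts instead. */
    assert(size_b <= SIZE_MAX - (mb - 1));
    return (size_b + mb - 1) / mb;
}

/*
 * Hypothetical helper: the patch's formula, (size_b / hp_size) + 1.
 * Returns -1 when the huge page size is unknown (hp_size == 0), which
 * would otherwise be a division by zero.
 */
static long
calc_huge_pages_required(size_t size_b, size_t hp_size)
{
    if (hp_size == 0)
        return -1;
    return (long) (size_b / hp_size) + 1;
}
```

With a 6339 MB segment and 2 MB huge pages, this returns 3170, which matches the example value in the documentation patch. The formula is (size_b / hp_size) + 1 rather than a true ceiling. As a result, a segment that is an exact multiple of the huge page size is reported as needing one extra page. Note also that the patch prints Size values with %lu. The portable conversion for size_t is %zu, because unsigned long is only 32 bits on 64-bit Windows.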
On Tue, Aug 31, 2021 at 05:37:52AM +0000, Bossart, Nathan wrote:
I moved the GUC calculation to ipci.c, adjusted the docs, and added a
huge_pages_required GUC. It's still a little rough around the edges,
and I haven't tested it on Windows, but this seems like the direction
the patch is headed.
Hmm. I am not sure about the addition of huge_pages_required, knowing
that we would have shared_memory_size. I'd rather leave the calculation
to the user, with a scan of /proc/meminfo.
+#elif defined(WIN32)
+ hp_size = GetLargePageMinimum();
+#endif
+
+#if defined(MAP_HUGETLB) || defined(WIN32)
+ hp_required = (size_b / hp_size) + 1;
According to [1]:
"If the processor does not support large pages, the return value is
zero."
So there is a problem here.
[1]: https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-getlargepageminimum
--
Michael
On 8/31/21, 11:54 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
Hmm. I am not sure about the addition of huge_pages_required, knowing
that we would have shared_memory_size. I'd rather leave the calculation
to the user, with a scan of /proc/meminfo.
I included this based on some feedback from Andres upthread [0]. I
went ahead and split the patch set into 3 pieces in case we end up
leaving it out.
+#elif defined(WIN32)
+ hp_size = GetLargePageMinimum();
+#endif
+
+#if defined(MAP_HUGETLB) || defined(WIN32)
+ hp_required = (size_b / hp_size) + 1;
According to [1]:
"If the processor does not support large pages, the return value is
zero."
So there is a problem here.
I've fixed this in v4.
Nathan
[0]: /messages/by-id/20210827193813.oqo5lamvyzahs35o@alap3.anarazel.de
Attachments:
v4-0001-Move-the-shared-memory-size-calculation-to-its-ow.patch
From ca71493463921ccffa080f353da8c1d47442c644 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 27 Aug 2021 20:03:01 +0000
Subject: [PATCH v4 1/3] Move the shared memory size calculation to its own
function.
This change refactors the shared memory size calculation in
CreateSharedMemoryAndSemaphores() to its own function. This is
intended for use in a future change that will simplify the steps
for setting up huge pages.
---
src/backend/storage/ipc/ipci.c | 142 ++++++++++++++++++++++++-----------------
src/include/storage/ipc.h | 1 +
2 files changed, 84 insertions(+), 59 deletions(-)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
+ Size size;
+ int numSemas;
+
+ /* Compute number of semaphores we'll need */
+ numSemas = ProcGlobalSemas();
+ numSemas += SpinlockSemas();
+
+ /* Return the number of semaphores if requested by the caller */
+ if (num_semaphores)
+ *num_semaphores = numSemas;
+
+ /*
+ * Size of the Postgres shared-memory block is estimated via moderately-
+ * accurate estimates for the big hogs, plus 100K for the stuff that's too
+ * small to bother with estimating.
+ *
+ * We take some care to ensure that the total size request doesn't overflow
+ * size_t. If this gets through, we don't need to be so careful during the
+ * actual allocation phase.
+ */
+ size = 100000;
+ size = add_size(size, PGSemaphoreShmemSize(numSemas));
+ size = add_size(size, SpinlockSemaSize());
+ size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
+ sizeof(ShmemIndexEnt)));
+ size = add_size(size, dsm_estimate_size());
+ size = add_size(size, BufferShmemSize());
+ size = add_size(size, LockShmemSize());
+ size = add_size(size, PredicateLockShmemSize());
+ size = add_size(size, ProcGlobalShmemSize());
+ size = add_size(size, XLOGShmemSize());
+ size = add_size(size, CLOGShmemSize());
+ size = add_size(size, CommitTsShmemSize());
+ size = add_size(size, SUBTRANSShmemSize());
+ size = add_size(size, TwoPhaseShmemSize());
+ size = add_size(size, BackgroundWorkerShmemSize());
+ size = add_size(size, MultiXactShmemSize());
+ size = add_size(size, LWLockShmemSize());
+ size = add_size(size, ProcArrayShmemSize());
+ size = add_size(size, BackendStatusShmemSize());
+ size = add_size(size, SInvalShmemSize());
+ size = add_size(size, PMSignalShmemSize());
+ size = add_size(size, ProcSignalShmemSize());
+ size = add_size(size, CheckpointerShmemSize());
+ size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, ReplicationSlotsShmemSize());
+ size = add_size(size, ReplicationOriginShmemSize());
+ size = add_size(size, WalSndShmemSize());
+ size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, PgArchShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
+ size = add_size(size, SnapMgrShmemSize());
+ size = add_size(size, BTreeShmemSize());
+ size = add_size(size, SyncScanShmemSize());
+ size = add_size(size, AsyncShmemSize());
+#ifdef EXEC_BACKEND
+ size = add_size(size, ShmemBackendArraySize());
+#endif
+
+ /* freeze the addin request size and include it */
+ addin_request_allowed = false;
+ size = add_size(size, total_addin_request);
+
+ /* might as well round it off to a multiple of a typical page size */
+ size = add_size(size, 8192 - (size % 8192));
+
+ return size;
+}
/*
* CreateSharedMemoryAndSemaphores
@@ -102,65 +183,8 @@ CreateSharedMemoryAndSemaphores(void)
Size size;
int numSemas;
- /* Compute number of semaphores we'll need */
- numSemas = ProcGlobalSemas();
- numSemas += SpinlockSemas();
-
- /*
- * Size of the Postgres shared-memory block is estimated via
- * moderately-accurate estimates for the big hogs, plus 100K for the
- * stuff that's too small to bother with estimating.
- *
- * We take some care during this phase to ensure that the total size
- * request doesn't overflow size_t. If this gets through, we don't
- * need to be so careful during the actual allocation phase.
- */
- size = 100000;
- size = add_size(size, PGSemaphoreShmemSize(numSemas));
- size = add_size(size, SpinlockSemaSize());
- size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
- sizeof(ShmemIndexEnt)));
- size = add_size(size, dsm_estimate_size());
- size = add_size(size, BufferShmemSize());
- size = add_size(size, LockShmemSize());
- size = add_size(size, PredicateLockShmemSize());
- size = add_size(size, ProcGlobalShmemSize());
- size = add_size(size, XLOGShmemSize());
- size = add_size(size, CLOGShmemSize());
- size = add_size(size, CommitTsShmemSize());
- size = add_size(size, SUBTRANSShmemSize());
- size = add_size(size, TwoPhaseShmemSize());
- size = add_size(size, BackgroundWorkerShmemSize());
- size = add_size(size, MultiXactShmemSize());
- size = add_size(size, LWLockShmemSize());
- size = add_size(size, ProcArrayShmemSize());
- size = add_size(size, BackendStatusShmemSize());
- size = add_size(size, SInvalShmemSize());
- size = add_size(size, PMSignalShmemSize());
- size = add_size(size, ProcSignalShmemSize());
- size = add_size(size, CheckpointerShmemSize());
- size = add_size(size, AutoVacuumShmemSize());
- size = add_size(size, ReplicationSlotsShmemSize());
- size = add_size(size, ReplicationOriginShmemSize());
- size = add_size(size, WalSndShmemSize());
- size = add_size(size, WalRcvShmemSize());
- size = add_size(size, PgArchShmemSize());
- size = add_size(size, ApplyLauncherShmemSize());
- size = add_size(size, SnapMgrShmemSize());
- size = add_size(size, BTreeShmemSize());
- size = add_size(size, SyncScanShmemSize());
- size = add_size(size, AsyncShmemSize());
-#ifdef EXEC_BACKEND
- size = add_size(size, ShmemBackendArraySize());
-#endif
-
- /* freeze the addin request size and include it */
- addin_request_allowed = false;
- size = add_size(size, total_addin_request);
-
- /* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
-
+ /* Compute the size of the shared-memory block */
+ size = CalculateShmemSize(&numSemas);
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..80e191d407 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
+extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
#endif /* IPC_H */
--
2.16.6
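CalculateShmemSize() relies on add_size() to detect size_t overflow while it sums the per-subsystem estimates. It then pads the total to a multiple of 8192. What follows is a standalone sketch of just those two steps. checked_add_size() and round_off_8k() are hypothetical names. The real add_size() raises an ERROR on overflow, whereas this version simply exits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for add_size(): refuse to wrap around. */
static size_t
checked_add_size(size_t s1, size_t s2)
{
    size_t result = s1 + s2;

    /* Unsigned addition wrapped around, so the request is too large. */
    if (result < s1)
    {
        fprintf(stderr, "requested shared memory size overflows size_t\n");
        exit(EXIT_FAILURE);
    }
    return result;
}

/*
 * Mirrors the last step of CalculateShmemSize().  The padding term is
 * 8192 - (size % 8192), so it always adds between 1 and 8192 bytes.
 */
static size_t
round_off_8k(size_t size)
{
    size_t rounded = checked_add_size(size, 8192 - (size % 8192));

    assert(rounded % 8192 == 0);
    return rounded;
}
```

For example, the 100000-byte allowance on its own rounds to 106496. Because of how the padding term is written, a size that is already aligned still grows by one full 8192-byte block. The estimate therefore errs on the high side, which is harmless for sizing a segment.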
v4-0003-Introduce-huge_pages_required-GUC.patch
From e51d8d40dd21697e5a4e2b79be8125bec3707b49 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 1 Sep 2021 18:12:28 +0000
Subject: [PATCH v4 3/3] Introduce huge_pages_required GUC.
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area. It can be viewed with
'postgres -C' so that the huge pages can be allocated prior to
startup.
---
doc/src/sgml/config.sgml | 16 ++++++++++++++++
doc/src/sgml/runtime.sgml | 27 +++++++++------------------
src/backend/port/sysv_shmem.c | 2 +-
src/backend/storage/ipc/ipci.c | 26 ++++++++++++++++++++++++++
src/backend/utils/misc/guc.c | 12 ++++++++++++
src/include/storage/pg_shmem.h | 4 ++++
6 files changed, 68 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ef0e2a7746..781147f719 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10101,6 +10101,22 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-huge-pages-required" xreflabel="huge_pages_required">
+ <term><varname>huge_pages_required</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>huge_pages_required</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are required for the main
+ shared memory area based on the specified
+ <xref linkend="guc-huge-page-size"/>. If the huge page size cannot
+ be determined, this will be <literal>-1</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-integer-datetimes" xreflabel="integer_datetimes">
<term><varname>integer_datetimes</varname> (<type>boolean</type>)
<indexterm>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 2144c0abad..d955639900 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1444,28 +1444,19 @@ export PG_OOM_ADJUST_VALUE=0
the operating system to provide enough huge pages of the desired size.
To estimate the number of huge pages needed, use the
<command>postgres</command> command to see the value of
- <xref linkend="guc-shared-memory-size"/>, and use the
- <filename>/proc</filename> and <filename>/sys</filename> file systems
- to find the system's default and supported huge page sizes. This might
- look like:
+ <xref linkend="guc-huge-pages-required"/>. This might look like:
<programlisting>
-$ <userinput>postgres -D $PGDATA -C shared_memory_size</userinput>
-6339
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C huge_pages_required</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6339</literal> / <literal>2</literal> gives
- <literal>3169.5</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
- We can set this with:
+ While we need at least <literal>3170</literal> huge pages in this
+ example, a larger setting would be appropriate if other programs on
+ the machine also need huge pages. We can allocate the huge pages
+ with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
</programlisting>
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..f42f1ac171 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -478,7 +478,7 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* Returns the (real, assumed or config provided) page size into *hugepagesize,
* and the hugepage-related mmap flags to use into *mmap_flags.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
Size default_hugepagesize = 0;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f5736703a8..a625dd7701 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -14,6 +14,10 @@
*/
#include "postgres.h"
+#ifndef WIN32
+#include <sys/mman.h>
+#endif
+
#include "access/clog.h"
#include "access/commit_ts.h"
#include "access/heapam.h"
@@ -326,6 +330,13 @@ InitializeShmemGUCs(void)
char buf[64];
Size size_b;
Size size_mb;
+#if defined(MAP_HUGETLB) || defined(WIN32)
+ Size hp_size;
+ Size hp_required;
+#endif
+#ifdef MAP_HUGETLB
+ int unused;
+#endif
/*
* Calculate the shared memory size and round up to the nearest
@@ -336,4 +347,19 @@ InitializeShmemGUCs(void)
sprintf(buf, "%lu MB", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+ /* Calculate the number of huge pages required */
+#if defined(MAP_HUGETLB)
+ GetHugePageSize(&hp_size, &unused);
+#elif defined(WIN32)
+ hp_size = GetLargePageMinimum();
+#endif
+
+#if defined(MAP_HUGETLB) || defined(WIN32)
+ if (hp_size == 0)
+ return;
+ hp_required = (size_b / hp_size) + 1;
+ sprintf(buf, "%lu", hp_required);
+ SetConfigOption("huge_pages_required", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+#endif
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a11387c5ce..3e9a01746e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -621,6 +621,7 @@ char *pgstat_temp_directory;
char *application_name;
int shmem_size_mb;
+int huge_pages_required;
int tcp_keepalives_idle;
int tcp_keepalives_interval;
@@ -2225,6 +2226,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"huge_pages_required", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the huge page size could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ },
+ &huge_pages_required,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
/* This is PGC_SUSET to prevent hiding from log_lock_waits. */
{"deadlock_timeout", PGC_SUSET, LOCK_MANAGEMENT,
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..c44403ed6a 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -88,4 +88,8 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+#ifdef MAP_HUGETLB
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
+#endif
+
#endif /* PG_SHMEM_H */
--
2.16.6
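Michael's alternative is to leave the huge page arithmetic to the user. That approach needs two inputs: the shared_memory_size value reported by `postgres -D $PGDATA -C shared_memory_size` (in MB), and the Hugepagesize line of /proc/meminfo (in kB). What follows is a minimal sketch under those assumptions. parse_hugepagesize_kb() and pages_for_shmem_mb() are hypothetical names, and the caller is assumed to have already read /proc/meminfo into a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the "Hugepagesize:" value (in kB) from text in /proc/meminfo
 * format.  Returns 0 if the line is missing or malformed.
 */
static unsigned long
parse_hugepagesize_kb(const char *meminfo)
{
    const char *line;
    unsigned long kb = 0;

    assert(meminfo != NULL);
    line = strstr(meminfo, "Hugepagesize:");
    if (line == NULL || sscanf(line, "Hugepagesize: %lu kB", &kb) != 1)
        return 0;
    return kb;
}

/*
 * Huge pages needed for a segment of shmem_mb megabytes, rounded up.
 * Returns -1 if the huge page size is unknown (hugepage_kb == 0).
 */
static long
pages_for_shmem_mb(unsigned long shmem_mb, unsigned long hugepage_kb)
{
    unsigned long shmem_kb = shmem_mb * 1024;

    if (hugepage_kb == 0)
        return -1;
    return (long) ((shmem_kb + hugepage_kb - 1) / hugepage_kb);
}
```

pages_for_shmem_mb(6339, 2048) returns 3170, which reproduces the documentation example (6339 / 2 = 3169.5, rounded up). This version uses a true ceiling, so a segment that is an exact multiple of the huge page size does not get an extra page.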
v4-0002-Introduce-shared_memory_size-GUC.patch
From bda20ec9080b739ee6698b2b5f34d04fbe1b86f6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 1 Sep 2021 18:22:11 +0000
Subject: [PATCH v4 2/3] Introduce shared_memory_size GUC.
This runtime-computed GUC shows the size of the server's main
shared memory area. It can also be viewed with 'postgres -C',
which is useful for calculating the number of huge pages required
prior to startup.
---
doc/src/sgml/config.sgml | 14 ++++++++++
doc/src/sgml/runtime.sgml | 22 +++++++--------
src/backend/postmaster/postmaster.c | 55 ++++++++++++++++++++++---------------
src/backend/storage/ipc/ipci.c | 24 ++++++++++++++++
src/backend/utils/misc/guc.c | 13 +++++++++
src/include/storage/ipc.h | 1 +
6 files changed, 95 insertions(+), 34 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2c31c35a6b..ef0e2a7746 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10275,6 +10275,20 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-shared-memory-size" xreflabel="shared_memory_size">
+ <term><varname>shared_memory_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>shared_memory_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the size of the main shared memory area, rounded up to the
+ nearest megabyte.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-ssl-library" xreflabel="ssl_library">
<term><varname>ssl_library</varname> (<type>string</type>)
<indexterm>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..2144c0abad 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,17 +1442,15 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
- This might look like:
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-shared-memory-size"/>, and use the
+ <filename>/proc</filename> and <filename>/sys</filename> file systems
+ to find the system's default and supported huge page sizes. This might
+ look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
+$ <userinput>postgres -D $PGDATA -C shared_memory_size</userinput>
+6339
$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
Hugepagesize: 2048 kB
$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
@@ -1463,8 +1461,8 @@ hugepages-1048576kB hugepages-2048kB
either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
+ <literal>6339</literal> / <literal>2</literal> gives
+ <literal>3169.5</literal>, so in this example we need at
least <literal>3170</literal> huge pages. A larger setting would be
appropriate if other programs on the machine also need huge pages.
We can set this with:
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..c32c21d632 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -893,6 +893,39 @@ PostmasterMain(int argc, char *argv[])
if (!SelectConfigFiles(userDoption, progname))
ExitPostmaster(2);
+ /* Verify that DataDir looks reasonable */
+ checkDataDir();
+
+ /* Check that pg_control exists */
+ checkControlFile();
+
+ /* And switch working directory into it */
+ ChangeToDataDir();
+
+ /*
+ * Register the apply launcher. Since it registers a background worker,
+ * it needs to be called before InitializeMaxBackends(), and it's probably
+ * a good idea to call it before any modules had chance to take the
+ * background worker slots.
+ */
+ ApplyLauncherRegister();
+
+ /*
+ * Process any libraries that should be preloaded at postmaster start.
+ * This happens before running -C, so that it is possible to get an
+ * estimate of the total shared memory size allocated to this system,
+ * including the portion requested by loaded libraries.
+ */
+ process_shared_preload_libraries();
+
+ /*
+ * Determine the value of any runtime-computed GUCs that depend on the
+ * amount of shared memory required. It is important to do this after
+ * preloaded libraries have had a chance to request additional shared
+ * memory.
+ */
+ InitializeShmemGUCs();
+
if (output_config_variable != NULL)
{
/*
@@ -907,15 +940,6 @@ PostmasterMain(int argc, char *argv[])
ExitPostmaster(0);
}
- /* Verify that DataDir looks reasonable */
- checkDataDir();
-
- /* Check that pg_control exists */
- checkControlFile();
-
- /* And switch working directory into it */
- ChangeToDataDir();
-
/*
* Check for invalid combinations of GUC settings.
*/
@@ -996,19 +1020,6 @@ PostmasterMain(int argc, char *argv[])
*/
LocalProcessControlFile(false);
- /*
- * Register the apply launcher. Since it registers a background worker,
- * it needs to be called before InitializeMaxBackends(), and it's probably
- * a good idea to call it before any modules had chance to take the
- * background worker slots.
- */
- ApplyLauncherRegister();
-
- /*
- * process any libraries that should be preloaded at postmaster start
- */
- process_shared_preload_libraries();
-
/*
* Initialize SSL library, if specified.
*/
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b225b1ee70..f5736703a8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -313,3 +313,27 @@ CreateSharedMemoryAndSemaphores(void)
if (shmem_startup_hook)
shmem_startup_hook();
}
+
+/*
+ * InitializeShmemGUCs
+ *
+ * This function initializes runtime-computed GUCs related to the amount of
+ * shared memory required for the current configuration.
+ */
+void
+InitializeShmemGUCs(void)
+{
+ char buf[64];
+ Size size_b;
+ Size size_mb;
+
+ /*
+ * Calculate the shared memory size and round up to the nearest
+ * megabyte.
+ */
+ size_b = CalculateShmemSize(NULL);
+ size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
+
+ sprintf(buf, "%lu MB", size_mb);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 467b0fd6fe..a11387c5ce 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -620,6 +620,8 @@ char *pgstat_temp_directory;
char *application_name;
+int shmem_size_mb;
+
int tcp_keepalives_idle;
int tcp_keepalives_interval;
int tcp_keepalives_count;
@@ -2337,6 +2339,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
+ NULL,
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ },
+ &shmem_size_mb,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 80e191d407..7a1ebc8559 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -79,5 +79,6 @@ extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
+extern void InitializeShmemGUCs(void);
#endif /* IPC_H */
--
2.16.6
On Wed, Sep 01, 2021 at 06:28:21PM +0000, Bossart, Nathan wrote:
On 8/31/21, 11:54 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
Hmm. I am not sure about the addition of huge_pages_required, knowing
that we would have shared_memory_size. I'd rather leave the calculation
to the user, with a scan of /proc/meminfo.
I included this based on some feedback from Andres upthread [0]. I
went ahead and split the patch set into 3 pieces in case we end up
leaving it out.
Thanks. Anyway, we don't really need huge_pages_required on Windows,
do we? The following Windows documentation describes what to do when
using large pages:
https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
The backend code already does that in PGSharedMemoryCreate(), now that I
look at it. And there is no way to change the minimum large page size
there as far as I can see because that's decided by the processor, no?
There is a case for shared_memory_size on Windows to be able to adjust
the sizing of the memory of the host, though.
+#elif defined(WIN32)
+ hp_size = GetLargePageMinimum();
+#endif
+
+#if defined(MAP_HUGETLB) || defined(WIN32)
+ hp_required = (size_b / hp_size) + 1;
As of [1], there is the following description:
"If the processor does not support large pages, the return value is zero."
So there is a problem here.
I've fixed this in v4.
At the end it would be nice to not finish with two GUCs. Both depend
on the reordering of the actions done by the postmaster, so I'd be
curious to hear the thoughts of others on this particular point.
--
Michael
On 9/2/21, 12:54 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
Thanks. Anyway, we don't really need huge_pages_required on Windows,
do we? The following Windows documentation describes what to do when
using large pages:
https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
The backend code already does that in PGSharedMemoryCreate(), now that I
look at it. And there is no way to change the minimum large page size
there as far as I can see because that's decided by the processor, no?
There is a case for shared_memory_size on Windows to be able to adjust
the sizing of the memory of the host, though.
Yeah, huge_pages_required might not serve much purpose for Windows.
We could always set it to -1 for Windows if it seems like it'll do
more harm than good.
At the end it would be nice to not finish with two GUCs. Both depend
on the reordering of the actions done by the postmaster, so I'd be
curious to hear the thoughts of others on this particular point.
Of course. It'd be great to hear others' thoughts on this stuff.
Nathan
On Thu, Sep 02, 2021 at 04:46:56PM +0000, Bossart, Nathan wrote:
Yeah, huge_pages_required might not serve much purpose for Windows.
We could always set it to -1 for Windows if it seems like it'll do
more harm than good.
I'd be fine with this setup on environments where there is no need for
it.
At the end it would be nice to not finish with two GUCs. Both depend
on the reordering of the actions done by the postmaster, so I'd be
curious to hear the thoughts of others on this particular point.
Of course. It'd be great to hear others' thoughts on this stuff.
Just to be clear, on HEAD the postmaster does the following, in order:
- Load configuration.
- Handle -C config_param.
- checkDataDir(), to check permissions of the data dir, etc.
- checkControlFile(), to see if the control file exists.
- Switch to data directory as work dir.
- Lock file creation.
- Initial read of the control file (where the GUC data_checksums is
set).
- Register apply launcher
- shared_preload_libraries
With 0002, the order becomes:
- Load configuration.
- checkDataDir(), to check permissions of the data dir, etc.
- checkControlFile(), to see if the control file exists.
- Switch to data directory as work dir.
- Register apply launcher
- shared_preload_libraries
- Calculate the shmem GUCs (new step)
- Handle -C config_param.
- Lock file creation.
- Initial read of the control file (where the GUC data_checksums is
set).
On further review, one thing that would still be incorrect is that
data_checksums would be wrong with -C, meaning that the initial read of
the control file should be moved further up; it would also be better to
get the control file checks done before registering workers. Keeping
the lock file at the end would be fine AFAIK, but should we worry
about the interactions with _PG_init() here?
0001, which refactors the calculation of the shmem size into a
different routine, is fine as-is, so I'd like to move on and apply
it.
--
Michael
At Thu, 2 Sep 2021 16:46:56 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
On 9/2/21, 12:54 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
Thanks. Anyway, we don't really need huge_pages_required on Windows,
do we? The following Windows documentation describes what to do when
using large pages:
https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
The backend code already does that in PGSharedMemoryCreate(), now that I
look at it. And there is no way to change the minimum large page size
there as far as I can see because that's decided by the processor, no?
There is a case for shared_memory_size on Windows to be able to adjust
the sizing of the memory of the host, though.
Yeah, huge_pages_required might not serve much purpose for Windows.
We could always set it to -1 for Windows if it seems like it'll do
more harm than good.
I agree with this.
At the end it would be nice to not finish with two GUCs. Both depend
on the reordering of the actions done by the postmaster, so I'd be
curious to hear the thoughts of others on this particular point.
Of course. It'd be great to hear others' thoughts on this stuff.
Honestly, I would be satisfied if the following error message
included the number of huge pages required.
FATAL: could not map anonymous shared memory: Cannot allocate memory
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 148897792 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
Or emit a different message if huge_pages=on.
FATAL: could not map anonymous shared memory from huge pages
HINT: This usually means that PostgreSQL's request for huge pages exceeded the number available. The number of 2048kB huge pages required for the requested memory size (currently 148897792 bytes) is 71.
Returning to this feature: even if that information is available via a
GUC, I won't add memory by looking at shared_memory_size. Since
shared_buffers occupies almost all of the shared memory allocated to
postgres, we shouldn't need such a precise adjustment of the required
size of shared memory. On the other hand, the number of available huge
pages is configurable, and we need to set it as required. That said, it
might seem a bit strange to have only huge_pages_required and not
shared_memory_size, from the standpoint of completeness. So my feeling
at this point is "I need only huge_pages_required, but might want
shared_memory_size just for completeness".
By the way, I noticed that postgres -C huge_page_size shows 0. I think
it should show the value actually used for the calculation if we show
huge_pages_required.
I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
whereas the error message above showed 148897792 bytes (142MB). So it
seems that something is missed when calculating shared_memory_size.
As a consequence, starting postgres with vm.nr_hugepages set to
huge_pages_required (69 pages) failed with the "could not map anonymous
shared memory" error.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 9/2/21, 6:46 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
On Thu, Sep 02, 2021 at 04:46:56PM +0000, Bossart, Nathan wrote:
Yeah, huge_pages_required might not serve much purpose for Windows.
We could always set it to -1 for Windows if it seems like it'll do
more harm than good.
I'd be fine with this setup on environments where there is no need for
it.
I did this in v5.
On further review, one thing that would still be incorrect is that
data_checksums would be wrong with -C, meaning that the initial read of
the control file should be moved further up; it would also be better to
get the control file checks done before registering workers. Keeping
the lock file at the end would be fine AFAIK, but should we worry
about the interactions with _PG_init() here?
I think we can avoid most of this reordering by moving the -C handling
instead. That should also fix things like data_checksums. I split
the reordering part out into its own patch in v5.
You bring up an interesting point about _PG_init(). Presently, you
can safely assume that the data directory is locked during _PG_init(),
so there's no need to worry about breaking something on a running
server. I don't know how important this is. Most _PG_init()
functions that I've seen will define some GUCs, request some shared
memory, register some background workers, and/or install some hooks.
Those all seem like safe things to do, but I wouldn't be at all
surprised to hear examples to the contrary. In any case, it looks
like the current ordering of these two steps has been there for 15+
years.
If this is a concern, one option would be to disallow running "-C
shared_memory_size" on running servers. That would have to extend to
GUCs like data_checksums and huge_pages_required, too.
0001, which refactors the calculation of the shmem size into a
different routine, is fine as-is, so I'd like to move on and apply
it.
Sounds good to me.
Nathan
Attachments:
v5-0001-Move-the-shared-memory-size-calculation-to-its-ow.patch
From 7939bf84c276cc507ff14d0284dfb3fb7db705d6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 27 Aug 2021 20:03:01 +0000
Subject: [PATCH v5 1/4] Move the shared memory size calculation to its own
function.
This change refactors the shared memory size calculation in
CreateSharedMemoryAndSemaphores() to its own function. This is
intended for use in a future change that will simplify the steps
for setting up huge pages.
---
src/backend/storage/ipc/ipci.c | 142 ++++++++++++++++++++++++-----------------
src/include/storage/ipc.h | 1 +
2 files changed, 84 insertions(+), 59 deletions(-)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
+ Size size;
+ int numSemas;
+
+ /* Compute number of semaphores we'll need */
+ numSemas = ProcGlobalSemas();
+ numSemas += SpinlockSemas();
+
+ /* Return the number of semaphores if requested by the caller */
+ if (num_semaphores)
+ *num_semaphores = numSemas;
+
+ /*
+ * Size of the Postgres shared-memory block is estimated via moderately-
+ * accurate estimates for the big hogs, plus 100K for the stuff that's too
+ * small to bother with estimating.
+ *
+ * We take some care to ensure that the total size request doesn't overflow
+ * size_t. If this gets through, we don't need to be so careful during the
+ * actual allocation phase.
+ */
+ size = 100000;
+ size = add_size(size, PGSemaphoreShmemSize(numSemas));
+ size = add_size(size, SpinlockSemaSize());
+ size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
+ sizeof(ShmemIndexEnt)));
+ size = add_size(size, dsm_estimate_size());
+ size = add_size(size, BufferShmemSize());
+ size = add_size(size, LockShmemSize());
+ size = add_size(size, PredicateLockShmemSize());
+ size = add_size(size, ProcGlobalShmemSize());
+ size = add_size(size, XLOGShmemSize());
+ size = add_size(size, CLOGShmemSize());
+ size = add_size(size, CommitTsShmemSize());
+ size = add_size(size, SUBTRANSShmemSize());
+ size = add_size(size, TwoPhaseShmemSize());
+ size = add_size(size, BackgroundWorkerShmemSize());
+ size = add_size(size, MultiXactShmemSize());
+ size = add_size(size, LWLockShmemSize());
+ size = add_size(size, ProcArrayShmemSize());
+ size = add_size(size, BackendStatusShmemSize());
+ size = add_size(size, SInvalShmemSize());
+ size = add_size(size, PMSignalShmemSize());
+ size = add_size(size, ProcSignalShmemSize());
+ size = add_size(size, CheckpointerShmemSize());
+ size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, ReplicationSlotsShmemSize());
+ size = add_size(size, ReplicationOriginShmemSize());
+ size = add_size(size, WalSndShmemSize());
+ size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, PgArchShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
+ size = add_size(size, SnapMgrShmemSize());
+ size = add_size(size, BTreeShmemSize());
+ size = add_size(size, SyncScanShmemSize());
+ size = add_size(size, AsyncShmemSize());
+#ifdef EXEC_BACKEND
+ size = add_size(size, ShmemBackendArraySize());
+#endif
+
+ /* freeze the addin request size and include it */
+ addin_request_allowed = false;
+ size = add_size(size, total_addin_request);
+
+ /* might as well round it off to a multiple of a typical page size */
+ size = add_size(size, 8192 - (size % 8192));
+
+ return size;
+}
/*
* CreateSharedMemoryAndSemaphores
@@ -102,65 +183,8 @@ CreateSharedMemoryAndSemaphores(void)
Size size;
int numSemas;
- /* Compute number of semaphores we'll need */
- numSemas = ProcGlobalSemas();
- numSemas += SpinlockSemas();
-
- /*
- * Size of the Postgres shared-memory block is estimated via
- * moderately-accurate estimates for the big hogs, plus 100K for the
- * stuff that's too small to bother with estimating.
- *
- * We take some care during this phase to ensure that the total size
- * request doesn't overflow size_t. If this gets through, we don't
- * need to be so careful during the actual allocation phase.
- */
- size = 100000;
- size = add_size(size, PGSemaphoreShmemSize(numSemas));
- size = add_size(size, SpinlockSemaSize());
- size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
- sizeof(ShmemIndexEnt)));
- size = add_size(size, dsm_estimate_size());
- size = add_size(size, BufferShmemSize());
- size = add_size(size, LockShmemSize());
- size = add_size(size, PredicateLockShmemSize());
- size = add_size(size, ProcGlobalShmemSize());
- size = add_size(size, XLOGShmemSize());
- size = add_size(size, CLOGShmemSize());
- size = add_size(size, CommitTsShmemSize());
- size = add_size(size, SUBTRANSShmemSize());
- size = add_size(size, TwoPhaseShmemSize());
- size = add_size(size, BackgroundWorkerShmemSize());
- size = add_size(size, MultiXactShmemSize());
- size = add_size(size, LWLockShmemSize());
- size = add_size(size, ProcArrayShmemSize());
- size = add_size(size, BackendStatusShmemSize());
- size = add_size(size, SInvalShmemSize());
- size = add_size(size, PMSignalShmemSize());
- size = add_size(size, ProcSignalShmemSize());
- size = add_size(size, CheckpointerShmemSize());
- size = add_size(size, AutoVacuumShmemSize());
- size = add_size(size, ReplicationSlotsShmemSize());
- size = add_size(size, ReplicationOriginShmemSize());
- size = add_size(size, WalSndShmemSize());
- size = add_size(size, WalRcvShmemSize());
- size = add_size(size, PgArchShmemSize());
- size = add_size(size, ApplyLauncherShmemSize());
- size = add_size(size, SnapMgrShmemSize());
- size = add_size(size, BTreeShmemSize());
- size = add_size(size, SyncScanShmemSize());
- size = add_size(size, AsyncShmemSize());
-#ifdef EXEC_BACKEND
- size = add_size(size, ShmemBackendArraySize());
-#endif
-
- /* freeze the addin request size and include it */
- addin_request_allowed = false;
- size = add_size(size, total_addin_request);
-
- /* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
-
+ /* Compute the size of the shared-memory block */
+ size = CalculateShmemSize(&numSemas);
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..80e191d407 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
+extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
#endif /* IPC_H */
--
2.16.6
v5-0004-Introduce-huge_pages_required-GUC.patch
From e65d7c001b6fb6ae6e903eaffa155935459bb2d8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 3 Sep 2021 17:11:51 +0000
Subject: [PATCH v5 4/4] Introduce huge_pages_required GUC.
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area. It can be viewed with
'postgres -C' so that the huge pages can be allocated prior to
startup.
---
doc/src/sgml/config.sgml | 21 +++++++++++++++++++++
doc/src/sgml/runtime.sgml | 27 +++++++++------------------
src/backend/port/sysv_shmem.c | 2 +-
src/backend/storage/ipc/ipci.c | 20 ++++++++++++++++++++
src/backend/utils/misc/guc.c | 12 ++++++++++++
src/include/storage/pg_shmem.h | 4 ++++
6 files changed, 67 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ef0e2a7746..b27d8aff15 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10101,6 +10101,27 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-huge-pages-required" xreflabel="huge_pages_required">
+ <term><varname>huge_pages_required</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>huge_pages_required</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are required for the main
+ shared memory area based on the specified
+ <xref linkend="guc-huge-page-size"/>. If huge pages are not supported,
+ this will be <literal>-1</literal>.
+ </para>
+ <para>
+ This setting is supported only on Linux. It is always set to
+ <literal>-1</literal> on other platforms. For more details about using
+ huge pages on Linux, see <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-integer-datetimes" xreflabel="integer_datetimes">
<term><varname>integer_datetimes</varname> (<type>boolean</type>)
<indexterm>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 2144c0abad..d955639900 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1444,28 +1444,19 @@ export PG_OOM_ADJUST_VALUE=0
the operating system to provide enough huge pages of the desired size.
To estimate the number of huge pages needed, use the
<command>postgres</command> command to see the value of
- <xref linkend="guc-shared-memory-size"/>, and use the
- <filename>/proc</filename> and <filename>/sys</filename> file systems
- to find the system's default and supported huge page sizes. This might
- look like:
+ <xref linkend="guc-huge-pages-required"/>. This might look like:
<programlisting>
-$ <userinput>postgres -D $PGDATA -C shared_memory_size</userinput>
-6339
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C huge_pages_required</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6339</literal> / <literal>2</literal> gives
- <literal>3169.5</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
- We can set this with:
+ While we need at least <literal>3170</literal> huge pages in this
+ example, a larger setting would be appropriate if other programs on
+ the machine also need huge pages. We can allocate the huge pages
+ with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
</programlisting>
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..f42f1ac171 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -478,7 +478,7 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* Returns the (real, assumed or config provided) page size into *hugepagesize,
* and the hugepage-related mmap flags to use into *mmap_flags.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
Size default_hugepagesize = 0;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f5736703a8..19ff9dff26 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -14,6 +14,10 @@
*/
#include "postgres.h"
+#ifndef WIN32
+#include <sys/mman.h>
+#endif
+
#include "access/clog.h"
#include "access/commit_ts.h"
#include "access/heapam.h"
@@ -326,6 +330,11 @@ InitializeShmemGUCs(void)
char buf[64];
Size size_b;
Size size_mb;
+#if defined(MAP_HUGETLB)
+ Size hp_size;
+ Size hp_required;
+ int unused;
+#endif
/*
* Calculate the shared memory size and round up to the nearest
@@ -336,4 +345,15 @@ InitializeShmemGUCs(void)
sprintf(buf, "%lu MB", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+#if defined(MAP_HUGETLB)
+
+ /* Calculate the number of huge pages required */
+ GetHugePageSize(&hp_size, &unused);
+ hp_required = (size_b / hp_size) + 1;
+
+ sprintf(buf, "%lu", hp_required);
+ SetConfigOption("huge_pages_required", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+#endif
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a11387c5ce..1a30d635fc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -621,6 +621,7 @@ char *pgstat_temp_directory;
char *application_name;
int shmem_size_mb;
+int huge_pages_required;
int tcp_keepalives_idle;
int tcp_keepalives_interval;
@@ -2225,6 +2226,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"huge_pages_required", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the value could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ },
+ &huge_pages_required,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
/* This is PGC_SUSET to prevent hiding from log_lock_waits. */
{"deadlock_timeout", PGC_SUSET, LOCK_MANAGEMENT,
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..c44403ed6a 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -88,4 +88,8 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+#ifdef MAP_HUGETLB
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
+#endif
+
#endif /* PG_SHMEM_H */
--
2.16.6
v5-0003-Introduce-shared_memory_size-GUC.patch
From 2e2638b2c5ad167620dd5b41eec10aaa790b8c5a Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 3 Sep 2021 16:49:45 +0000
Subject: [PATCH v5 3/4] Introduce shared_memory_size GUC.
This runtime-computed GUC shows the size of the server's main
shared memory area. It can also be viewed with 'postgres -C',
which is useful for calculating the number of huge pages required
prior to startup.
---
doc/src/sgml/config.sgml | 14 ++++++++++++++
doc/src/sgml/runtime.sgml | 22 ++++++++++------------
src/backend/postmaster/postmaster.c | 7 +++++++
src/backend/storage/ipc/ipci.c | 24 ++++++++++++++++++++++++
src/backend/utils/misc/guc.c | 13 +++++++++++++
src/include/storage/ipc.h | 1 +
6 files changed, 69 insertions(+), 12 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2c31c35a6b..ef0e2a7746 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10275,6 +10275,20 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-shared-memory-size" xreflabel="shared_memory_size">
+ <term><varname>shared_memory_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>shared_memory_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the size of the main shared memory area, rounded up to the
+ nearest megabyte.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-ssl-library" xreflabel="ssl_library">
<term><varname>ssl_library</varname> (<type>string</type>)
<indexterm>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..2144c0abad 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,17 +1442,15 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
- This might look like:
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-shared-memory-size"/>, and use the
+ <filename>/proc</filename> and <filename>/sys</filename> file systems
+ to find the system's default and supported huge page sizes. This might
+ look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
+$ <userinput>postgres -D $PGDATA -C shared_memory_size</userinput>
+6339
$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
Hugepagesize: 2048 kB
$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
@@ -1463,8 +1461,8 @@ hugepages-1048576kB hugepages-2048kB
either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
+ <literal>6339</literal> / <literal>2</literal> gives
+ <literal>3169.5</literal>, so in this example we need at
least <literal>3170</literal> huge pages. A larger setting would be
appropriate if other programs on the machine also need huge pages.
We can set this with:
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 0fa2f41ddc..c0fbc83539 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1016,6 +1016,13 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+ /*
+ * Now that loadable modules have had their chance to request additional
+ * shared memory, determine the value of any runtime-computed GUCs that
+ * depend on the amount of shared memory required.
+ */
+ InitializeShmemGUCs();
+
/*
* If user requested that we print a GUC's value and exit (via "-C guc"),
* do that now since we've loaded all configurations.
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b225b1ee70..f5736703a8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -313,3 +313,27 @@ CreateSharedMemoryAndSemaphores(void)
if (shmem_startup_hook)
shmem_startup_hook();
}
+
+/*
+ * InitializeShmemGUCs
+ *
+ * This function initializes runtime-computed GUCs related to the amount of
+ * shared memory required for the current configuration.
+ */
+void
+InitializeShmemGUCs(void)
+{
+ char buf[64];
+ Size size_b;
+ Size size_mb;
+
+ /*
+ * Calculate the shared memory size and round up to the nearest
+ * megabyte.
+ */
+ size_b = CalculateShmemSize(NULL);
+ size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
+
+ sprintf(buf, "%lu MB", size_mb);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 467b0fd6fe..a11387c5ce 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -620,6 +620,8 @@ char *pgstat_temp_directory;
char *application_name;
+int shmem_size_mb;
+
int tcp_keepalives_idle;
int tcp_keepalives_interval;
int tcp_keepalives_count;
@@ -2337,6 +2339,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
+ NULL,
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ },
+ &shmem_size_mb,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 80e191d407..7a1ebc8559 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -79,5 +79,6 @@ extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
+extern void InitializeShmemGUCs(void);
#endif /* IPC_H */
--
2.16.6
v5-0002-Move-postmaster-C-logic-to-after-processing-prelo.patch
From ba43a45c5ace5573589735514284d2dd1a6aec42 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 3 Sep 2021 16:41:31 +0000
Subject: [PATCH v5 2/4] Move postmaster "-C" logic to after processing preload
libraries.
This change allows us to return correct values for runtime-computed
GUCs such as data_checksums and for future GUCs related to the
amount of shared memory required for the server.
One notable behavior change is that "-C" will now run all preload
libraries' _PG_init() functions without locking the data directory.
If a library's _PG_init() function might do something that would
break a running server, it will need to be adjusted to be safe.
---
src/backend/postmaster/postmaster.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..0fa2f41ddc 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -893,20 +893,6 @@ PostmasterMain(int argc, char *argv[])
if (!SelectConfigFiles(userDoption, progname))
ExitPostmaster(2);
- if (output_config_variable != NULL)
- {
- /*
- * "-C guc" was specified, so print GUC's value and exit. No extra
- * permission check is needed because the user is reading inside the
- * data dir.
- */
- const char *config_val = GetConfigOption(output_config_variable,
- false, false);
-
- puts(config_val ? config_val : "");
- ExitPostmaster(0);
- }
-
/* Verify that DataDir looks reasonable */
checkDataDir();
@@ -982,8 +968,12 @@ PostmasterMain(int argc, char *argv[])
* is responsible for removing both data directory and socket lockfiles;
* so it must happen before opening sockets so that at exit, the socket
* lockfiles go away after CloseServerPorts runs.
+ *
+ * We skip this step if we are just going to print a GUC's value and exit
+ * a few steps down.
*/
- CreateDataDirLockFile(true);
+ if (output_config_variable == NULL)
+ CreateDataDirLockFile(true);
/*
* Read the control file (for error checking and config info).
@@ -1026,6 +1016,19 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+ /*
+ * If user requested that we print a GUC's value and exit (via "-C guc"),
+ * do that now since we've loaded all configurations.
+ */
+ if (output_config_variable != NULL)
+ {
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
--
2.16.6
On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
By the way I noticed that postgres -C huge_page_size shows 0, which I
think should have the number used for the calculation if we show
huge_pages_required.
I would agree with this if huge_page_size was a runtime-computed GUC,
but since it's intended for users to explicitly request the huge page
size, it might be slightly confusing. Perhaps another option would be
to create a new GUC for this. Or maybe it's enough to note that the
value will be changed from 0 at runtime if huge pages are supported.
In any case, it might be best to handle this separately.
I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
whereas the error message above showed 148897792 bytes (142MB). So it
seems that something is forgotten while calculating
shared_memory_size. As the consequence, launching postgres setting
huge_pages_required (69 pages) as vm.nr_hugepages ended up in the
"could not map anonymous shared memory" error.
Hm. I'm not seeing this with the v5 patch set, so maybe I
inadvertently fixed it already. Can you check this again with v5?
Nathan
Hi,
On 2021-09-01 15:53:52 +0900, Michael Paquier wrote:
Hmm. I am not sure about the addition of huge_pages_required, knowing
that we would have shared_memory_size. I'd rather leave the calculation
part to the user with a scan of /proc/meminfo.
-1. We can easily do better; what do we gain by making the user do this stuff?
Especially because the right value also depends on huge_page_size.
Greetings,
Andres Freund
On Fri, Sep 03, 2021 at 05:36:43PM +0000, Bossart, Nathan wrote:
On 9/2/21, 6:46 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
You bring up an interesting point about _PG_init(). Presently, you
can safely assume that the data directory is locked during _PG_init(),
so there's no need to worry about breaking something on a running
server. I don't know how important this is. Most _PG_init()
functions that I've seen will define some GUCs, request some shared
memory, register some background workers, and/or install some hooks.
Those all seem like safe things to do, but I wouldn't be at all
surprised to hear examples to the contrary. In any case, it looks
like the current ordering of these two steps has been there for 15+
years.
Yeah. What you are describing here matches what I have seen in the
past and what we do in core for _PG_init(). Now extensions developers
could do more fancy things, like dropping things on-disk to check the
load state, for whatever reasons. And things could break in such
cases. Perhaps people should not do that, but it is no fun either to
break code that has been working for years even if that's just a major
upgrade.
+ * We skip this step if we are just going to print a GUC's value and exit
+ * a few steps down.
*/
- CreateDataDirLockFile(true);
+ if (output_config_variable == NULL)
+ CreateDataDirLockFile(true);
Anyway, 0002 gives me shivers.
If this is a concern, one option would be to disallow running "-C
shared_memory_size" on running servers. That would have to extend to
GUCs like data_checksums and huge_pages_required, too.
Just noting this bit from 0003 that would break without 0002:
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
+$ <userinput>postgres -D $PGDATA -C shared_memory_size</userinput>
+6339
0001, that refactors the calculation of the shmem size into a
different routine, is fine as-is, so I'd like to move on and apply
it.
Sounds good to me.
Applied this one.
Without concluding on 0002 yet, another thing that we could do is to
just add the GUCs. These sound rather useful on their own (mixed
feelings about huge_pages_required but I can see why it is useful to
avoid the setup steps and the backend already grabs this information),
particularly when it comes to cloned setups that share a lot of
properties.
--
Michael
On 9/5/21, 7:28 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
On Fri, Sep 03, 2021 at 05:36:43PM +0000, Bossart, Nathan wrote:
On 9/2/21, 6:46 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
0001, that refactors the calculation of the shmem size into a
different routine, is fine as-is, so I'd like to move on and apply
it.
Sounds good to me.
Applied this one.
Thanks!
Without concluding on 0002 yet, another thing that we could do is to
just add the GUCs. These sound rather useful on their own (mixed
feelings about huge_pages_required but I can see why it is useful to
avoid the setup steps and the backend already grabs this information),
particularly when it comes to cloned setups that share a lot of
properties.
I think this is a good starting point, but I'd like to follow up on
making them visible without completely starting the server. The main
purpose for adding these GUCs is to be able to set up huge pages
before server startup. Disallowing "-C huge_pages_required" on a
running server to enable this use-case seems like a modest tradeoff.
Anyway, I'll restructure the remaining patches to add the GUCs first
and then address the 0002 business separately.
Nathan
On 9/5/21, 9:26 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
On 9/5/21, 7:28 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
Without concluding on 0002 yet, another thing that we could do is to
just add the GUCs. These sound rather useful on their own (mixed
feelings about huge_pages_required but I can see why it is useful to
avoid the setup steps and the backend already grabs this information),
particularly when it comes to cloned setups that share a lot of
properties.
I think this is a good starting point, but I'd like to follow up on
making them visible without completely starting the server. The main
purpose for adding these GUCs is to be able to set up huge pages
before server startup. Disallowing "-C huge_pages_required" on a
running server to enable this use-case seems like a modest tradeoff.
Anyway, I'll restructure the remaining patches to add the GUCs first
and then address the 0002 business separately.
Attached is a new patch set. The first two patches just add the new
GUCs, and the third is an attempt at providing useful values for those
GUCs via -C.
Nathan
Attachments:
v6-0003-Provide-useful-values-for-postgres-C-with-runtime.patch
From bce822ac162a63be8ece7ee035539d7951170b3e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 6 Sep 2021 23:31:41 +0000
Subject: [PATCH v6 3/3] Provide useful values for 'postgres -C' with
runtime-computed GUCs.
The -C option is handled before a small subset of GUCs that are
computed at runtime are initialized. Unfortunately, we cannot move
this handling to after they are initialized without disallowing
'postgres -C' on a running server. One notable reason for this is
that loadable libraries' _PG_init() functions are called before all
runtime-computed GUCs are initialized, and this is not guaranteed
to be safe to do on running servers.
In order to provide useful values for 'postgres -C' for runtime-
computed GUCs, this change adds a new section for handling just
these GUCs just before shared memory is initialized. While users
won't be able to use -C for runtime-computed GUCs on running
servers, providing a useful value with this restriction seems
better than not providing a useful value at all.
---
doc/src/sgml/ref/postgres-ref.sgml | 10 ++++++--
doc/src/sgml/runtime.sgml | 33 ++++++++----------------
src/backend/postmaster/postmaster.c | 50 +++++++++++++++++++++++++++++++------
src/backend/utils/misc/guc.c | 10 ++++----
src/include/utils/guc.h | 6 +++++
5 files changed, 73 insertions(+), 36 deletions(-)
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 4aaa7abe1a..f6f2246bff 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -133,13 +133,19 @@ PostgreSQL documentation
<listitem>
<para>
Prints the value of the named run-time parameter, and exits.
- (See the <option>-c</option> option above for details.) This can
- be used on a running server, and returns values from
+ (See the <option>-c</option> option above for details.) This
+ returns values from
<filename>postgresql.conf</filename>, modified by any parameters
supplied in this invocation. It does not reflect parameters
supplied when the cluster was started.
</para>
+ <para>
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-huge-pages-required"/>).
+ </para>
+
<para>
This option is meant for other programs that interact with a server
instance, such as <xref linkend="app-pg-ctl"/>, to query configuration
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..d955639900 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,32 +1442,21 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
- This might look like:
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-huge-pages-required"/>. This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C huge_pages_required</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
- We can set this with:
+ While we need at least <literal>3170</literal> huge pages in this
+ example, a larger setting would be appropriate if other programs on
+ the machine also need huge pages. We can allocate the huge pages
+ with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
</programlisting>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 963c03fa93..d3cbb14d33 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -896,15 +896,31 @@ PostmasterMain(int argc, char *argv[])
if (output_config_variable != NULL)
{
/*
- * "-C guc" was specified, so print GUC's value and exit. No extra
- * permission check is needed because the user is reading inside the
- * data dir.
+ * If this is a runtime-computed GUC, it hasn't yet been initialized,
+ * and the present value is not useful. However, this is a convenient
+ * place to print the value for most GUCs because it is safe to run
+ * postmaster startup to this point even if the server is already
+ * running. For the handful of runtime-computed GUCs that we can't
+ * provide meaningful values for yet, we wait until later in postmaster
+ * startup to print the value. We won't be able to use -C on running
+ * servers for those GUCs, but otherwise this option is unusable for
+ * them.
*/
- const char *config_val = GetConfigOption(output_config_variable,
- false, false);
+ int flags = GetConfigOptionFlags(output_config_variable, true);
- puts(config_val ? config_val : "");
- ExitPostmaster(0);
+ if ((flags & GUC_RUNTIME_COMPUTED) == 0)
+ {
+ /*
+ * "-C guc" was specified, so print GUC's value and exit. No extra
+ * permission check is needed because the user is reading inside
+ * the data dir.
+ */
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
}
/* Verify that DataDir looks reasonable */
@@ -1033,6 +1049,26 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeShmemGUCs();
+ /*
+ * If -C was specified with a runtime-computed GUC, we held off printing
+ * the value earlier, as the GUC was not yet initialized. We handle -C for
+ * most GUCs before we lock the data directory so that the option may be
+ * used on a running server. However, a handful of GUCs are runtime-
+ * computed and do not have meaningful values until after locking the data
+ * directory, and we cannot safely calculate their values earlier on a
+ * running server. At this point, such GUCs should be properly
+ * initialized, and we haven't yet set up shared memory, so this is a good
+ * time to handle the -C option for these special GUCs.
+ */
+ if (output_config_variable != NULL)
+ {
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1a30d635fc..cabbd083d9 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1985,7 +1985,7 @@ static struct config_bool ConfigureNamesBool[] =
{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows whether data checksums are turned on for this cluster."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_checksums,
false,
@@ -2230,7 +2230,7 @@ static struct config_int ConfigureNamesInt[] =
{"huge_pages_required", PGC_INTERNAL, RESOURCES_MEM,
gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
gettext_noop("-1 indicates that the value could not be determined."),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&huge_pages_required,
-1, -1, INT_MAX,
@@ -2355,7 +2355,7 @@ static struct config_int ConfigureNamesInt[] =
{"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB | GUC_RUNTIME_COMPUTED
},
&shmem_size_mb,
0, 0, INT_MAX,
@@ -2420,7 +2420,7 @@ static struct config_int ConfigureNamesInt[] =
"in the form accepted by the chmod and umask system "
"calls. (To use the customary octal format the number "
"must start with a 0 (zero).)"),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_directory_mode,
0700, 0000, 0777,
@@ -3244,7 +3244,7 @@ static struct config_int ConfigureNamesInt[] =
{"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of write ahead log segments."),
NULL,
- GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&wal_segment_size,
DEFAULT_XLOG_SEG_SIZE,
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index a7c3a4958e..aa18d304ac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -229,6 +229,12 @@ typedef enum
#define GUC_EXPLAIN 0x100000 /* include in explain */
+/*
+ * GUC_RUNTIME_COMPUTED is intended for runtime-computed GUCs that are only
+ * available via 'postgres -C' if the server is not running.
+ */
+#define GUC_RUNTIME_COMPUTED 0x200000
+
#define GUC_UNIT (GUC_UNIT_MEMORY | GUC_UNIT_TIME)
--
2.16.6
v6-0002-Introduce-huge_pages_required-GUC.patch
From 6010e5bc8c1574a982d4cec2cc323f972f553071 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 6 Sep 2021 18:04:59 +0000
Subject: [PATCH v6 2/3] Introduce huge_pages_required GUC.
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area. Like shared_memory_size,
it cannot be viewed with 'postgres -C' yet.
---
doc/src/sgml/config.sgml | 21 +++++++++++++++++++++
src/backend/port/sysv_shmem.c | 2 +-
src/backend/storage/ipc/ipci.c | 20 ++++++++++++++++++++
src/backend/utils/misc/guc.c | 12 ++++++++++++
src/include/storage/pg_shmem.h | 4 ++++
5 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ef0e2a7746..b27d8aff15 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10101,6 +10101,27 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-huge-pages-required" xreflabel="huge_pages_required">
+ <term><varname>huge_pages_required</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>huge_pages_required</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are required for the main
+ shared memory area based on the specified
+ <xref linkend="guc-huge-page-size"/>. If huge pages are not supported,
+ this will be <literal>-1</literal>.
+ </para>
+ <para>
+ This setting is supported only on Linux. It is always set to
+ <literal>-1</literal> on other platforms. For more details about using
+ huge pages on Linux, see <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-integer-datetimes" xreflabel="integer_datetimes">
<term><varname>integer_datetimes</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..f42f1ac171 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -478,7 +478,7 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* Returns the (real, assumed or config provided) page size into *hugepagesize,
* and the hugepage-related mmap flags to use into *mmap_flags.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
Size default_hugepagesize = 0;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 614b8e92c4..e5f94f9e69 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -14,6 +14,10 @@
*/
#include "postgres.h"
+#ifndef WIN32
+#include <sys/mman.h>
+#endif
+
#include "access/clog.h"
#include "access/commit_ts.h"
#include "access/heapam.h"
@@ -326,6 +330,11 @@ InitializeShmemGUCs(void)
char buf[64];
Size size_b;
Size size_mb;
+#if defined(MAP_HUGETLB)
+ Size hp_size;
+ Size hp_required;
+ int unused;
+#endif
/*
* Calculate the shared memory size and round up to the nearest
@@ -336,4 +345,15 @@ InitializeShmemGUCs(void)
sprintf(buf, "%lu MB", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+#if defined(MAP_HUGETLB)
+
+ /* Calculate the number of huge pages required */
+ GetHugePageSize(&hp_size, &unused);
+ hp_required = (size_b / hp_size) + 1;
+
+ sprintf(buf, "%lu", hp_required);
+ SetConfigOption("huge_pages_required", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+#endif
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a11387c5ce..1a30d635fc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -621,6 +621,7 @@ char *pgstat_temp_directory;
char *application_name;
int shmem_size_mb;
+int huge_pages_required;
int tcp_keepalives_idle;
int tcp_keepalives_interval;
@@ -2225,6 +2226,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"huge_pages_required", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the value could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ },
+ &huge_pages_required,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
/* This is PGC_SUSET to prevent hiding from log_lock_waits. */
{"deadlock_timeout", PGC_SUSET, LOCK_MANAGEMENT,
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..c44403ed6a 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -88,4 +88,8 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+#ifdef MAP_HUGETLB
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
+#endif
+
#endif /* PG_SHMEM_H */
--
2.16.6
v6-0001-Introduce-shared_memory_size-GUC.patch
From fedabb6b8fec47ba783812b3408f3e6eeb122f74 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 6 Sep 2021 18:00:02 +0000
Subject: [PATCH v6 1/3] Introduce shared_memory_size GUC.
This runtime-computed GUC shows the size of the server's main
shared memory area. While this is intended to be useful for
calculating the number of huge pages required prior to startup, it
cannot be viewed via 'postgres -C' yet. This may be addressed in a
future change.
---
doc/src/sgml/config.sgml | 14 ++++++++++++++
src/backend/postmaster/postmaster.c | 7 +++++++
src/backend/storage/ipc/ipci.c | 24 ++++++++++++++++++++++++
src/backend/utils/misc/guc.c | 13 +++++++++++++
src/include/storage/ipc.h | 1 +
5 files changed, 59 insertions(+)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2c31c35a6b..ef0e2a7746 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10275,6 +10275,20 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-shared-memory-size" xreflabel="shared_memory_size">
+ <term><varname>shared_memory_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>shared_memory_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the size of the main shared memory area, rounded up to the
+ nearest megabyte.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-ssl-library" xreflabel="ssl_library">
<term><varname>ssl_library</varname> (<type>string</type>)
<indexterm>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..963c03fa93 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1026,6 +1026,13 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+ /*
+ * Now that loadable modules have had their chance to request additional
+ * shared memory, determine the value of any runtime-computed GUCs that
+ * depend on the amount of shared memory required.
+ */
+ InitializeShmemGUCs();
+
/*
* Set up shared memory and semaphores.
*/
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 64bc16fa84..614b8e92c4 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -313,3 +313,27 @@ CreateSharedMemoryAndSemaphores(void)
if (shmem_startup_hook)
shmem_startup_hook();
}
+
+/*
+ * InitializeShmemGUCs
+ *
+ * This function initializes runtime-computed GUCs related to the amount of
+ * shared memory required for the current configuration.
+ */
+void
+InitializeShmemGUCs(void)
+{
+ char buf[64];
+ Size size_b;
+ Size size_mb;
+
+ /*
+ * Calculate the shared memory size and round up to the nearest
+ * megabyte.
+ */
+ size_b = CalculateShmemSize(NULL);
+ size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
+
+ sprintf(buf, "%lu MB", size_mb);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 467b0fd6fe..a11387c5ce 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -620,6 +620,8 @@ char *pgstat_temp_directory;
char *application_name;
+int shmem_size_mb;
+
int tcp_keepalives_idle;
int tcp_keepalives_interval;
int tcp_keepalives_count;
@@ -2337,6 +2339,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
+ NULL,
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ },
+ &shmem_size_mb,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 80e191d407..7a1ebc8559 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -79,5 +79,6 @@ extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
+extern void InitializeShmemGUCs(void);
#endif /* IPC_H */
--
2.16.6
On Mon, Sep 06, 2021 at 11:55:42PM +0000, Bossart, Nathan wrote:
Attached is a new patch set. The first two patches just add the new
GUCs, and the third is an attempt at providing useful values for those
GUCs via -C.
+ sprintf(buf, "%lu MB", size_mb);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
One small-ish comment about 0002: there is no need to add the unit
into the buffer set as GUC_UNIT_MB would take care of that. The patch
looks fine.
+#ifndef WIN32
+#include <sys/mman.h>
+#endif
So, this is needed in ipci.c to check for MAP_HUGETLB. I am not much
of a fan of moving around platform-specific checks when these have
remained local to each shmem implementation. Could it be cleaner to
add GetHugePageSize() to win32_shmem.c and make it always declared in
the SysV implementation?
--
Michael
At Fri, 3 Sep 2021 17:46:05 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
By the way I noticed that postgres -C huge_page_size shows 0, which I
think should have the number used for the calculation if we show
huge_pages_required.
I would agree with this if huge_page_size was a runtime-computed GUC,
but since it's intended for users to explicitly request the huge page
size, it might be slightly confusing. Perhaps another option would be
to create a new GUC for this. Or maybe it's enough to note that the
value will be changed from 0 at runtime if huge pages are supported.
In any case, it might be best to handle this separately.
(Sorry, I was confused, but) yeah, agreed.
I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
whereas the error message above showed 148897792 bytes (142MB). So it
seems that something is forgotten while calculating
shared_memory_size. As a consequence, launching postgres with
huge_pages_required (69 pages) set as vm.nr_hugepages ended up in the
"could not map anonymous shared memory" error.
Hm. I'm not seeing this with the v5 patch set, so maybe I
inadvertently fixed it already. Can you check this again with v5?
Thanks! I confirmed that the numbers match with v5.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 9/6/21, 9:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
+ sprintf(buf, "%lu MB", size_mb);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
One small-ish comment about 0002: there is no need to add the unit
into the buffer set as GUC_UNIT_MB would take care of that. The patch
looks fine.
I fixed this in v7.
+#ifndef WIN32
+#include <sys/mman.h>
+#endif
So, this is needed in ipci.c to check for MAP_HUGETLB. I am not much
a fan of moving around platform-specific checks when these have
remained local to each shmem implementation. Could it be cleaner to
add GetHugePageSize() to win32_shmem.c and make it always declared in
the SysV implementation?
I don't know if it's really all that much cleaner, but I did it this
way in v7. IMO it's a little weird that GetHugePageSize() doesn't
return the value from GetLargePageMinimum(), but that's what we'd need
to do to avoid setting huge_pages_required for Windows without any
platform-specific checks.
Nathan
Attachments:
v7-0003-Provide-useful-values-for-postgres-C-with-runtime.patch (application/octet-stream)
From 535a46135b2e836882197cb5e4ee4eadd1a945a0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 6 Sep 2021 23:31:41 +0000
Subject: [PATCH v7 3/3] Provide useful values for 'postgres -C' with
runtime-computed GUCs.
The -C option is handled before a small subset of GUCs that are
computed at runtime are initialized. Unfortunately, we cannot move
this handling to after they are initialized without disallowing
'postgres -C' on a running server. One notable reason for this is
that loadable libraries' _PG_init() functions are called before all
runtime-computed GUCs are initialized, and this is not guaranteed
to be safe to do on running servers.
In order to provide useful values for 'postgres -C' for runtime-
computed GUCs, this change adds a new section for handling just
these GUCs just before shared memory is initialized. While users
won't be able to use -C for runtime-computed GUCs on running
servers, providing a useful value with this restriction seems
better than not providing a useful value at all.
---
doc/src/sgml/ref/postgres-ref.sgml | 10 ++++++--
doc/src/sgml/runtime.sgml | 33 ++++++++----------------
src/backend/postmaster/postmaster.c | 50 +++++++++++++++++++++++++++++++------
src/backend/utils/misc/guc.c | 10 ++++----
src/include/utils/guc.h | 6 +++++
5 files changed, 73 insertions(+), 36 deletions(-)
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 4aaa7abe1a..f6f2246bff 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -133,13 +133,19 @@ PostgreSQL documentation
<listitem>
<para>
Prints the value of the named run-time parameter, and exits.
- (See the <option>-c</option> option above for details.) This can
- be used on a running server, and returns values from
+ (See the <option>-c</option> option above for details.) This
+ returns values from
<filename>postgresql.conf</filename>, modified by any parameters
supplied in this invocation. It does not reflect parameters
supplied when the cluster was started.
</para>
+ <para>
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-huge-pages-required"/>).
+ </para>
+
<para>
This option is meant for other programs that interact with a server
instance, such as <xref linkend="app-pg-ctl"/>, to query configuration
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..d955639900 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,32 +1442,21 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
- This might look like:
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-huge-pages-required"/>. This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C huge_pages_required</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
- We can set this with:
+ While we need at least <literal>3170</literal> huge pages in this
+ example, a larger setting would be appropriate if other programs on
+ the machine also need huge pages. We can allocate the huge pages
+ with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
</programlisting>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 963c03fa93..d3cbb14d33 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -896,15 +896,31 @@ PostmasterMain(int argc, char *argv[])
if (output_config_variable != NULL)
{
/*
- * "-C guc" was specified, so print GUC's value and exit. No extra
- * permission check is needed because the user is reading inside the
- * data dir.
+ * If this is a runtime-computed GUC, it hasn't yet been initialized,
+ * and the present value is not useful. However, this is a convenient
+ * place to print the value for most GUCs because it is safe to run
+ * postmaster startup to this point even if the server is already
+ * running. For the handful of runtime-computed GUCs that we can't
+ * provide meaningful values for yet, we wait until later in postmaster
+ * startup to print the value. We won't be able to use -C on running
+ * servers for those GUCs, but otherwise this option is unusable for
+ * them.
*/
- const char *config_val = GetConfigOption(output_config_variable,
- false, false);
+ int flags = GetConfigOptionFlags(output_config_variable, true);
- puts(config_val ? config_val : "");
- ExitPostmaster(0);
+ if ((flags & GUC_RUNTIME_COMPUTED) == 0)
+ {
+ /*
+ * "-C guc" was specified, so print GUC's value and exit. No extra
+ * permission check is needed because the user is reading inside
+ * the data dir.
+ */
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
}
/* Verify that DataDir looks reasonable */
@@ -1033,6 +1049,26 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeShmemGUCs();
+ /*
+ * If -C was specified with a runtime-computed GUC, we held off printing
+ * the value earlier, as the GUC was not yet initialized. We handle -C for
+ * most GUCs before we lock the data directory so that the option may be
+ * used on a running server. However, a handful of GUCs are runtime-
+ * computed and do not have meaningful values until after locking the data
+ * directory, and we cannot safely calculate their values earlier on a
+ * running server. At this point, such GUCs should be properly
+ * initialized, and we haven't yet set up shared memory, so this is a good
+ * time to handle the -C option for these special GUCs.
+ */
+ if (output_config_variable != NULL)
+ {
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1a30d635fc..cabbd083d9 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1985,7 +1985,7 @@ static struct config_bool ConfigureNamesBool[] =
{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows whether data checksums are turned on for this cluster."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_checksums,
false,
@@ -2230,7 +2230,7 @@ static struct config_int ConfigureNamesInt[] =
{"huge_pages_required", PGC_INTERNAL, RESOURCES_MEM,
gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
gettext_noop("-1 indicates that the value could not be determined."),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&huge_pages_required,
-1, -1, INT_MAX,
@@ -2355,7 +2355,7 @@ static struct config_int ConfigureNamesInt[] =
{"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB | GUC_RUNTIME_COMPUTED
},
&shmem_size_mb,
0, 0, INT_MAX,
@@ -2420,7 +2420,7 @@ static struct config_int ConfigureNamesInt[] =
"in the form accepted by the chmod and umask system "
"calls. (To use the customary octal format the number "
"must start with a 0 (zero).)"),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_directory_mode,
0700, 0000, 0777,
@@ -3244,7 +3244,7 @@ static struct config_int ConfigureNamesInt[] =
{"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of write ahead log segments."),
NULL,
- GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&wal_segment_size,
DEFAULT_XLOG_SEG_SIZE,
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index a7c3a4958e..aa18d304ac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -229,6 +229,12 @@ typedef enum
#define GUC_EXPLAIN 0x100000 /* include in explain */
+/*
+ * GUC_RUNTIME_COMPUTED is intended for runtime-computed GUCs that are only
+ * available via 'postgres -C' if the server is not running.
+ */
+#define GUC_RUNTIME_COMPUTED 0x200000
+
#define GUC_UNIT (GUC_UNIT_MEMORY | GUC_UNIT_TIME)
--
2.16.6
v7-0002-Introduce-huge_pages_required-GUC.patch (application/octet-stream)
From 46c6ada322517c4246ea16597475c392d0b33005 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Tue, 7 Sep 2021 16:46:52 +0000
Subject: [PATCH v7 2/3] Introduce huge_pages_required GUC.
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area. Like shared_memory_size,
it cannot be viewed with 'postgres -C' yet.
---
doc/src/sgml/config.sgml | 21 +++++++++++++++++++++
src/backend/port/sysv_shmem.c | 16 +++++++++++-----
src/backend/port/win32_shmem.c | 14 ++++++++++++++
src/backend/storage/ipc/ipci.c | 12 ++++++++++++
src/backend/utils/misc/guc.c | 12 ++++++++++++
src/include/storage/pg_shmem.h | 1 +
6 files changed, 71 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ef0e2a7746..b27d8aff15 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10101,6 +10101,27 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-huge-pages-required" xreflabel="huge_pages_required">
+ <term><varname>huge_pages_required</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>huge_pages_required</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are required for the main
+ shared memory area based on the specified
+ <xref linkend="guc-huge-page-size"/>. If huge pages are not supported,
+ this will be <literal>-1</literal>.
+ </para>
+ <para>
+ This setting is supported only on Linux. It is always set to
+ <literal>-1</literal> on other platforms. For more details about using
+ huge pages on Linux, see <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-integer-datetimes" xreflabel="integer_datetimes">
<term><varname>integer_datetimes</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..125e2d47ec 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -456,8 +456,6 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
return shmStat.shm_nattch == 0 ? SHMSTATE_UNATTACHED : SHMSTATE_ATTACHED;
}
-#ifdef MAP_HUGETLB
-
/*
* Identify the huge page size to use, and compute the related mmap flags.
*
@@ -476,11 +474,19 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* such as increasing shared_buffers to absorb the extra space.
*
* Returns the (real, assumed or config provided) page size into *hugepagesize,
- * and the hugepage-related mmap flags to use into *mmap_flags.
+ * and the hugepage-related mmap flags to use into *mmap_flags. If huge pages
+ * is not supported, *hugepagesize and *mmap_flags will be set to 0.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
+#ifndef MAP_HUGETLB
+
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+
+#else
+
Size default_hugepagesize = 0;
/*
@@ -553,9 +559,9 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
*mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
}
#endif
-}
#endif /* MAP_HUGETLB */
+}
/*
* Creates an anonymous mmap()ed shared memory segment.
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index d7a71992d8..90de2ab4e1 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -605,3 +605,17 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
return true;
}
+
+/*
+ * This function is provided for consistency with sysv_shmem.c and does not
+ * provide any useful information for Windows. To obtain the large page size,
+ * use GetLargePageMinimum() instead.
+ *
+ * This always sets *hugepagesize and *mmap_flags to 0.
+ */
+void
+GetHugePageSize(Size *hugepagesize, int *mmap_flags)
+{
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 4b7b2faa4c..14de7afe0a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -326,6 +326,9 @@ InitializeShmemGUCs(void)
char buf[64];
Size size_b;
Size size_mb;
+ Size hp_size;
+ Size hp_required;
+ int unused;
/*
* Calculate the shared memory size and round up to the nearest
@@ -335,4 +338,13 @@ InitializeShmemGUCs(void)
size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
sprintf(buf, "%lu", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+ /* Calculate the number of huge pages required */
+ GetHugePageSize(&hp_size, &unused);
+ if (hp_size != 0)
+ {
+ hp_required = (size_b / hp_size) + 1;
+ sprintf(buf, "%lu", hp_required);
+ SetConfigOption("huge_pages_required", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+ }
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a11387c5ce..1a30d635fc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -621,6 +621,7 @@ char *pgstat_temp_directory;
char *application_name;
int shmem_size_mb;
+int huge_pages_required;
int tcp_keepalives_idle;
int tcp_keepalives_interval;
@@ -2225,6 +2226,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"huge_pages_required", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the value could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ },
+ &huge_pages_required,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
/* This is PGC_SUSET to prevent hiding from log_lock_waits. */
{"deadlock_timeout", PGC_SUSET, LOCK_MANAGEMENT,
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..518eb86065 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -87,5 +87,6 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
PGShmemHeader **shim);
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
#endif /* PG_SHMEM_H */
--
2.16.6
v7-0001-Introduce-shared_memory_size-GUC.patch (application/octet-stream)
From 96d9d14dcc1626c3ff604bfa60b8c9549532a1ba Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Tue, 7 Sep 2021 16:42:46 +0000
Subject: [PATCH v7 1/3] Introduce shared_memory_size GUC.
This runtime-computed GUC shows the size of the server's main
shared memory area. While this is intended to be useful for
calculating the number of huge pages required prior to startup, it
cannot be viewed via 'postgres -C' yet. This may be addressed in a
future change.
---
doc/src/sgml/config.sgml | 14 ++++++++++++++
src/backend/postmaster/postmaster.c | 7 +++++++
src/backend/storage/ipc/ipci.c | 23 +++++++++++++++++++++++
src/backend/utils/misc/guc.c | 13 +++++++++++++
src/include/storage/ipc.h | 1 +
5 files changed, 58 insertions(+)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2c31c35a6b..ef0e2a7746 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10275,6 +10275,20 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-shared-memory-size" xreflabel="shared_memory_size">
+ <term><varname>shared_memory_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>shared_memory_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the size of the main shared memory area, rounded up to the
+ nearest megabyte.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-ssl-library" xreflabel="ssl_library">
<term><varname>ssl_library</varname> (<type>string</type>)
<indexterm>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..963c03fa93 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1026,6 +1026,13 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+ /*
+ * Now that loadable modules have had their chance to request additional
+ * shared memory, determine the value of any runtime-computed GUCs that
+ * depend on the amount of shared memory required.
+ */
+ InitializeShmemGUCs();
+
/*
* Set up shared memory and semaphores.
*/
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 64bc16fa84..4b7b2faa4c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -313,3 +313,26 @@ CreateSharedMemoryAndSemaphores(void)
if (shmem_startup_hook)
shmem_startup_hook();
}
+
+/*
+ * InitializeShmemGUCs
+ *
+ * This function initializes runtime-computed GUCs related to the amount of
+ * shared memory required for the current configuration.
+ */
+void
+InitializeShmemGUCs(void)
+{
+ char buf[64];
+ Size size_b;
+ Size size_mb;
+
+ /*
+ * Calculate the shared memory size and round up to the nearest
+ * megabyte.
+ */
+ size_b = CalculateShmemSize(NULL);
+ size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
+ sprintf(buf, "%lu", size_mb);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 467b0fd6fe..a11387c5ce 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -620,6 +620,8 @@ char *pgstat_temp_directory;
char *application_name;
+int shmem_size_mb;
+
int tcp_keepalives_idle;
int tcp_keepalives_interval;
int tcp_keepalives_count;
@@ -2337,6 +2339,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
+ gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
+ NULL,
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ },
+ &shmem_size_mb,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 80e191d407..7a1ebc8559 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -79,5 +79,6 @@ extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
+extern void InitializeShmemGUCs(void);
#endif /* IPC_H */
--
2.16.6
On 9/6/21, 11:24 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
At Fri, 3 Sep 2021 17:46:05 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
whereas the error message above showed 148897792 bytes (142MB). So it
seems that something is forgotten while calculating
shared_memory_size. As a consequence, launching postgres with
huge_pages_required (69 pages) set as vm.nr_hugepages ended up in the
"could not map anonymous shared memory" error.
Hm. I'm not seeing this with the v5 patch set, so maybe I
inadvertently fixed it already. Can you check this again with v5?
Thanks! I confirmed that the numbers match with v5.
Thanks for confirming.
Nathan
On Tue, Sep 07, 2021 at 05:08:43PM +0000, Bossart, Nathan wrote:
On 9/6/21, 9:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
+ sprintf(buf, "%lu MB", size_mb);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
One small-ish comment about 0002: there is no need to add the unit
into the buffer set as GUC_UNIT_MB would take care of that. The patch
looks fine.
I fixed this in v7.
Switched the variable name to shared_memory_size_mb for easier
grepping, moved it to a more correct location with the other read-only
GUCs, and applied 0002. Well, 0001 here.
+#ifndef WIN32
+#include <sys/mman.h>
+#endif
So, this is needed in ipci.c to check for MAP_HUGETLB. I am not much
a fan of moving around platform-specific checks when these have
remained local to each shmem implementation. Could it be cleaner to
add GetHugePageSize() to win32_shmem.c and make it always declared in
the SysV implementation?
I don't know if it's really all that much cleaner, but I did it this
way in v7. IMO it's a little weird that GetHugePageSize() doesn't
return the value from GetLargePageMinimum(), but that's what we'd need
to do to avoid setting huge_pages_required for Windows without any
platform-specific checks.
Thanks. Keeping MAP_HUGETLB within the SysV portions is an
improvement IMO. That's subject to one's taste, perhaps.
After sleeping on it, I'd be fine living with the logic based on the
new GUC flag called GUC_RUNTIME_COMPUTED, which controls whether a
parameter is looked at during an earlier or a later stage of the
startup sequence, with the later stage meaning that such parameters
cannot be checked while a server is running. This case was originally
broken anyway, so it does not make it worse, just better.
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-huge-pages-required"/>).
Perhaps we should add a couple of extra parameters here, like
shared_memory_size, and perhaps wal_segment_size? The list does not
have to be complete, just meaningful enough.
--
Michael
On 2021/09/08 12:50, Michael Paquier wrote:
On Tue, Sep 07, 2021 at 05:08:43PM +0000, Bossart, Nathan wrote:
On 9/6/21, 9:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
+ sprintf(buf, "%lu MB", size_mb);
+ SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
One small-ish comment about 0002: there is no need to add the unit
into the buffer set as GUC_UNIT_MB would take care of that. The patch
looks fine.
I fixed this in v7.
Switched the variable name to shared_memory_size_mb for easier
grepping, moved it to a more correct location with the other read-only
GUCs, and applied 0002. Well, 0001 here.
Thanks for adding useful feature!
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
When reading the applied code, I found that the category of shared_memory_size
is RESOURCES_MEM. Why? That seems reasonable because the parameter is related
to memory resources, but since its context is PGC_INTERNAL, wouldn't
PRESET_OPTIONS be a more appropriate category? BTW, all other
PGC_INTERNAL parameters seem to use PRESET_OPTIONS as their category.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 9/7/21, 8:50 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
Switched the variable name to shared_memory_size_mb for easier
grepping, moved it to a more correct location with the other read-only
GUCs, and applied 0002. Well, 0001 here.
Thanks! And thanks for cleaning up the small mistake in aa37a43.
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-huge-pages-required"/>).
Perhaps we should add a couple of extra parameters here, like
shared_memory_size, and perhaps wal_segment_size? The list does not
have to be complete, just meaningful enough.
Good idea.
Nathan
On 9/8/21, 12:11 AM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote:
Thanks for adding useful feature!
:)
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
When reading the applied code, I found that the category of shared_memory_size
is RESOURCES_MEM. Why? That seems reasonable because the parameter is related
to memory resources, but since its context is PGC_INTERNAL, wouldn't
PRESET_OPTIONS be a more appropriate category? BTW, all other
PGC_INTERNAL parameters seem to use PRESET_OPTIONS as their category.
Yeah, I did wonder about this. We're even listing it in the "Preset
Options" section in the docs. I updated this in the new patch set,
which is attached.
Nathan
Attachments:
v8-0002-Provide-useful-values-for-postgres-C-with-runtime.patch (application/octet-stream)
From 3edf8cfb42c2d16352685f2bed8c39ac04ae1a2a Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 8 Sep 2021 17:23:14 +0000
Subject: [PATCH v8 2/2] Provide useful values for 'postgres -C' with
runtime-computed GUCs.
The -C option is handled before a small subset of GUCs that are
computed at runtime are initialized. Unfortunately, we cannot move
this handling to after they are initialized without disallowing
'postgres -C' on a running server. One notable reason for this is
that loadable libraries' _PG_init() functions are called before all
runtime-computed GUCs are initialized, and this is not guaranteed
to be safe to do on running servers.
In order to provide useful values for 'postgres -C' for runtime-
computed GUCs, this change adds a new section for handling just
these GUCs just before shared memory is initialized. While users
won't be able to use -C for runtime-computed GUCs on running
servers, providing a useful value with this restriction seems
better than not providing a useful value at all.
---
doc/src/sgml/ref/postgres-ref.sgml | 12 +++++++--
doc/src/sgml/runtime.sgml | 33 ++++++++----------------
src/backend/postmaster/postmaster.c | 50 +++++++++++++++++++++++++++++++------
src/backend/utils/misc/guc.c | 10 ++++----
src/include/utils/guc.h | 6 +++++
5 files changed, 75 insertions(+), 36 deletions(-)
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 4aaa7abe1a..89cc3cdb4e 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -133,13 +133,21 @@ PostgreSQL documentation
<listitem>
<para>
Prints the value of the named run-time parameter, and exits.
- (See the <option>-c</option> option above for details.) This can
- be used on a running server, and returns values from
+ (See the <option>-c</option> option above for details.) This
+ returns values from
<filename>postgresql.conf</filename>, modified by any parameters
supplied in this invocation. It does not reflect parameters
supplied when the cluster was started.
</para>
+ <para>
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-huge-pages-required"/>,
+ <xref linkend="guc-shared-memory-size"/>, and
+ <xref linkend="guc-wal-segment-size"/>).
+ </para>
+
<para>
This option is meant for other programs that interact with a server
instance, such as <xref linkend="app-pg-ctl"/>, to query configuration
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..d955639900 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,32 +1442,21 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
- This might look like:
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-huge-pages-required"/>. This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C huge_pages_required</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
- We can set this with:
+ While we need at least <literal>3170</literal> huge pages in this
+ example, a larger setting would be appropriate if other programs on
+ the machine also need huge pages. We can allocate the huge pages
+ with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
</programlisting>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b2fe420c3c..f601b9e5d1 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -896,15 +896,31 @@ PostmasterMain(int argc, char *argv[])
if (output_config_variable != NULL)
{
/*
- * "-C guc" was specified, so print GUC's value and exit. No extra
- * permission check is needed because the user is reading inside the
- * data dir.
+ * If this is a runtime-computed GUC, it hasn't yet been initialized,
+ * and the present value is not useful. However, this is a convenient
+ * place to print the value for most GUCs because it is safe to run
+ * postmaster startup to this point even if the server is already
+ * running. For the handful of runtime-computed GUCs that we can't
+ * provide meaningful values for yet, we wait until later in postmaster
+ * startup to print the value. We won't be able to use -C on running
+ * servers for those GUCs, but otherwise this option is unusable for
+ * them.
*/
- const char *config_val = GetConfigOption(output_config_variable,
- false, false);
+ int flags = GetConfigOptionFlags(output_config_variable, true);
- puts(config_val ? config_val : "");
- ExitPostmaster(0);
+ if ((flags & GUC_RUNTIME_COMPUTED) == 0)
+ {
+ /*
+ * "-C guc" was specified, so print GUC's value and exit. No extra
+ * permission check is needed because the user is reading inside
+ * the data dir.
+ */
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
}
/* Verify that DataDir looks reasonable */
@@ -1033,6 +1049,26 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeShmemGUCs();
+ /*
+ * If -C was specified with a runtime-computed GUC, we held off printing
+ * the value earlier, as the GUC was not yet initialized. We handle -C for
+ * most GUCs before we lock the data directory so that the option may be
+ * used on a running server. However, a handful of GUCs are runtime-
+ * computed and do not have meaningful values until after locking the data
+ * directory, and we cannot safely calculate their values earlier on a
+ * running server. At this point, such GUCs should be properly
+ * initialized, and we haven't yet set up shared memory, so this is a good
+ * time to handle the -C option for these special GUCs.
+ */
+ if (output_config_variable != NULL)
+ {
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8edfd07340..50e60b270b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1984,7 +1984,7 @@ static struct config_bool ConfigureNamesBool[] =
{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows whether data checksums are turned on for this cluster."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_checksums,
false,
@@ -2229,7 +2229,7 @@ static struct config_int ConfigureNamesInt[] =
{"huge_pages_required", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
gettext_noop("-1 indicates that the value could not be determined."),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&huge_pages_required,
-1, -1, INT_MAX,
@@ -2354,7 +2354,7 @@ static struct config_int ConfigureNamesInt[] =
{"shared_memory_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB | GUC_RUNTIME_COMPUTED
},
&shared_memory_size_mb,
0, 0, INT_MAX,
@@ -2419,7 +2419,7 @@ static struct config_int ConfigureNamesInt[] =
"in the form accepted by the chmod and umask system "
"calls. (To use the customary octal format the number "
"must start with a 0 (zero).)"),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_directory_mode,
0700, 0000, 0777,
@@ -3243,7 +3243,7 @@ static struct config_int ConfigureNamesInt[] =
{"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of write ahead log segments."),
NULL,
- GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&wal_segment_size,
DEFAULT_XLOG_SEG_SIZE,
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index a7c3a4958e..aa18d304ac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -229,6 +229,12 @@ typedef enum
#define GUC_EXPLAIN 0x100000 /* include in explain */
+/*
+ * GUC_RUNTIME_COMPUTED is intended for runtime-computed GUCs that are only
+ * available via 'postgres -C' if the server is not running.
+ */
+#define GUC_RUNTIME_COMPUTED 0x200000
+
#define GUC_UNIT (GUC_UNIT_MEMORY | GUC_UNIT_TIME)
--
2.16.6
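The postmaster.c hunk answers -C in one of two places, depending on a single flag bit. The sketch below models only that dispatch decision outside PostgreSQL. The table, the struct, and the function names are hypothetical; only the GUC_RUNTIME_COMPUTED value comes from the guc.h hunk. It assumes that an unknown name yields flags of 0 and is therefore handled in the early phase, which is what the patch's call to GetConfigOptionFlags(name, true) suggests.

```c
#include <assert.h>
#include <string.h>

/* Flag value taken from the guc.h hunk in this patch. */
#define GUC_RUNTIME_COMPUTED 0x200000

/* Hypothetical stand-in for a GUC table entry. */
struct guc_entry
{
	const char *name;
	int			flags;
};

/* Hypothetical two-entry table: one ordinary GUC, one runtime-computed GUC. */
static const struct guc_entry guc_table[] = {
	{"log_min_messages", 0},
	{"shared_memory_size", GUC_RUNTIME_COMPUTED},
};

/* Return the entry's flags, or 0 for an unknown name (assumed missing_ok behavior). */
static int
lookup_flags(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(guc_table) / sizeof(guc_table[0]); i++)
	{
		if (strcmp(guc_table[i].name, name) == 0)
			return guc_table[i].flags;
	}
	return 0;
}

/*
 * Phase 1: answered before the data directory is locked, so -C works on a
 * running server.  Phase 2: answered after the runtime-computed GUCs are
 * initialized, so the server must be shut down.
 */
static int
c_option_phase(const char *name)
{
	return (lookup_flags(name) & GUC_RUNTIME_COMPUTED) != 0 ? 2 : 1;
}
```

Because the early phase remains the default, callers such as pg_ctl, which query ordinary parameters on a running server, should see no change in behavior.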
v8-0001-Introduce-huge_pages_required-GUC.patch
From b7da8e594f52cd45487c5637c70b0cc11087fb2f Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 8 Sep 2021 17:18:39 +0000
Subject: [PATCH v8 1/2] Introduce huge_pages_required GUC.
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area.
---
doc/src/sgml/config.sgml | 21 +++++++++++++++++++++
src/backend/port/sysv_shmem.c | 16 +++++++++++-----
src/backend/port/win32_shmem.c | 14 ++++++++++++++
src/backend/storage/ipc/ipci.c | 14 ++++++++++++++
src/backend/utils/misc/guc.c | 14 +++++++++++++-
src/include/storage/pg_shmem.h | 1 +
6 files changed, 74 insertions(+), 6 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ef0e2a7746..b27d8aff15 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10101,6 +10101,27 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-huge-pages-required" xreflabel="huge_pages_required">
+ <term><varname>huge_pages_required</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>huge_pages_required</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are required for the main
+ shared memory area based on the specified
+ <xref linkend="guc-huge-page-size"/>. If huge pages are not supported,
+ this will be <literal>-1</literal>.
+ </para>
+ <para>
+ This setting is supported only on Linux. It is always set to
+ <literal>-1</literal> on other platforms. For more details about using
+ huge pages on Linux, see <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-integer-datetimes" xreflabel="integer_datetimes">
<term><varname>integer_datetimes</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..125e2d47ec 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -456,8 +456,6 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
return shmStat.shm_nattch == 0 ? SHMSTATE_UNATTACHED : SHMSTATE_ATTACHED;
}
-#ifdef MAP_HUGETLB
-
/*
* Identify the huge page size to use, and compute the related mmap flags.
*
@@ -476,11 +474,19 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* such as increasing shared_buffers to absorb the extra space.
*
* Returns the (real, assumed or config provided) page size into *hugepagesize,
- * and the hugepage-related mmap flags to use into *mmap_flags.
+ * and the hugepage-related mmap flags to use into *mmap_flags. If huge pages
+ * are not supported, *hugepagesize and *mmap_flags will be set to 0.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
+#ifndef MAP_HUGETLB
+
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+
+#else
+
Size default_hugepagesize = 0;
/*
@@ -553,9 +559,9 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
*mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
}
#endif
-}
#endif /* MAP_HUGETLB */
+}
/*
* Creates an anonymous mmap()ed shared memory segment.
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index d7a71992d8..90de2ab4e1 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -605,3 +605,17 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
return true;
}
+
+/*
+ * This function is provided for consistency with sysv_shmem.c and does not
+ * provide any useful information for Windows. To obtain the large page size,
+ * use GetLargePageMinimum() instead.
+ *
+ * This always sets *hugepagesize and *mmap_flags to 0.
+ */
+void
+GetHugePageSize(Size *hugepagesize, int *mmap_flags)
+{
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 13f3926ff6..91490653cf 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -326,6 +326,9 @@ InitializeShmemGUCs(void)
char buf[64];
Size size_b;
Size size_mb;
+ Size hp_size;
+ Size hp_required;
+ int unused;
/*
* Calculate the shared memory size and round up to the nearest megabyte.
@@ -334,4 +337,15 @@ InitializeShmemGUCs(void)
size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
sprintf(buf, "%zu", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+ /*
+ * Calculate the number of huge pages required.
+ */
+ GetHugePageSize(&hp_size, &unused);
+ if (hp_size != 0)
+ {
+ hp_required = (size_b / hp_size) + 1;
+ sprintf(buf, "%zu", hp_required);
+ SetConfigOption("huge_pages_required", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+ }
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c339acf067..8edfd07340 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -665,6 +665,7 @@ static int max_identifier_length;
static int block_size;
static int segment_size;
static int shared_memory_size_mb;
+static int huge_pages_required;
static int wal_block_size;
static bool data_checksums;
static bool integer_datetimes;
@@ -2224,6 +2225,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"huge_pages_required", PGC_INTERNAL, PRESET_OPTIONS,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the value could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ },
+ &huge_pages_required,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
/* This is PGC_SUSET to prevent hiding from log_lock_waits. */
{"deadlock_timeout", PGC_SUSET, LOCK_MANAGEMENT,
@@ -2339,7 +2351,7 @@ static struct config_int ConfigureNamesInt[] =
},
{
- {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
+ {"shared_memory_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
NULL,
GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..518eb86065 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -87,5 +87,6 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
PGShmemHeader **shim);
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
#endif /* PG_SHMEM_H */
--
2.16.6
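You can check the two computations in the InitializeShmemGUCs() hunk by hand. The sketch below reproduces them with plain size_t arithmetic. In the patch, add_size() additionally guards against overflow. The sketch also adds an exact ceiling division for comparison. The helper names are hypothetical, and a 64-bit size_t is assumed. With the numbers from the runtime.sgml example (a 6490428 kB segment and 2 MB pages), both formulas give 3170 pages. They differ only when the segment is an exact multiple of the page size. In that case, the patch's truncate-plus-one form reserves one spare page.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_MB ((size_t) 1024 * 1024)

/* shared_memory_size: round the byte count up to whole megabytes. */
static size_t
shmem_size_mb(size_t size_b)
{
	return (size_b + BYTES_PER_MB - 1) / BYTES_PER_MB;
}

/* huge_pages_required as written in the ipci.c hunk: truncate, then add one. */
static size_t
huge_pages_patch(size_t size_b, size_t hp_size)
{
	assert(hp_size > 0);		/* the patch only sets the GUC when hp_size != 0 */
	return size_b / hp_size + 1;
}

/* Exact ceiling division, for comparison. */
static size_t
huge_pages_ceil(size_t size_b, size_t hp_size)
{
	assert(hp_size > 0);
	return (size_b + hp_size - 1) / hp_size;
}
```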
On Wed, Sep 08, 2021 at 04:10:41PM +0900, Fujii Masao wrote:
+ {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
When reading the applied code, I found the category of shared_memory_size
is RESOURCES_MEM. Why? This seems right because the parameter is related
to memory resource. But since its context is PGC_INTERNAL, PRESET_OPTIONS
is more proper as the category? BTW, the category of any other
PGC_INTERNAL parameters seems to be PRESET_OPTIONS.
Yes, that's an oversight from me. I was looking at that yesterday,
noticed some exceptions in the GUC list with things not allowed in
files and just concluded that RESOURCES_MEM should be fine, but the
docs tell a different story. Thanks, fixed.
--
Michael
On Wed, Sep 08, 2021 at 05:52:33PM +0000, Bossart, Nathan wrote:
Yeah, I did wonder about this. We're even listing it in the "Preset
Options" section in the docs. I updated this in the new patch set,
which is attached.
I broke that again, so rebased as v9 attached.
FWIW, I don't have an environment at hand these days to test properly
0001, so this will have to wait a bit. I really like the approach
taken by 0002, and it is independent of the other patch while
extending support for postgres -C to provide the correct runtime
values. So let's wrap this part first. No need to send a reorganized
patch set.
--
Michael
On Thu, Sep 09, 2021 at 01:19:14PM +0900, Michael Paquier wrote:
I broke that again, so rebased as v9 attached.
Well.
--
Michael
Attachments:
v9-0001-Introduce-huge_pages_required-GUC.patch
From 09b1f5a5e0e88a6d84bfc2f8a1bc410e21fb3775 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 9 Sep 2021 13:10:24 +0900
Subject: [PATCH v9 1/2] Introduce huge_pages_required GUC.
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area.
---
src/include/storage/pg_shmem.h | 1 +
src/backend/port/sysv_shmem.c | 16 +++++++++++-----
src/backend/port/win32_shmem.c | 14 ++++++++++++++
src/backend/storage/ipc/ipci.c | 14 ++++++++++++++
src/backend/utils/misc/guc.c | 12 ++++++++++++
doc/src/sgml/config.sgml | 21 +++++++++++++++++++++
6 files changed, 73 insertions(+), 5 deletions(-)
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..518eb86065 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -87,5 +87,6 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
PGShmemHeader **shim);
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
#endif /* PG_SHMEM_H */
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..125e2d47ec 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -456,8 +456,6 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
return shmStat.shm_nattch == 0 ? SHMSTATE_UNATTACHED : SHMSTATE_ATTACHED;
}
-#ifdef MAP_HUGETLB
-
/*
* Identify the huge page size to use, and compute the related mmap flags.
*
@@ -476,11 +474,19 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* such as increasing shared_buffers to absorb the extra space.
*
* Returns the (real, assumed or config provided) page size into *hugepagesize,
- * and the hugepage-related mmap flags to use into *mmap_flags.
+ * and the hugepage-related mmap flags to use into *mmap_flags. If huge pages
+ * are not supported, *hugepagesize and *mmap_flags will be set to 0.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
+#ifndef MAP_HUGETLB
+
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+
+#else
+
Size default_hugepagesize = 0;
/*
@@ -553,9 +559,9 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
*mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
}
#endif
-}
#endif /* MAP_HUGETLB */
+}
/*
* Creates an anonymous mmap()ed shared memory segment.
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index d7a71992d8..90de2ab4e1 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -605,3 +605,17 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
return true;
}
+
+/*
+ * This function is provided for consistency with sysv_shmem.c and does not
+ * provide any useful information for Windows. To obtain the large page size,
+ * use GetLargePageMinimum() instead.
+ *
+ * This always sets *hugepagesize and *mmap_flags to 0.
+ */
+void
+GetHugePageSize(Size *hugepagesize, int *mmap_flags)
+{
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 13f3926ff6..91490653cf 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -326,6 +326,9 @@ InitializeShmemGUCs(void)
char buf[64];
Size size_b;
Size size_mb;
+ Size hp_size;
+ Size hp_required;
+ int unused;
/*
* Calculate the shared memory size and round up to the nearest megabyte.
@@ -334,4 +337,15 @@ InitializeShmemGUCs(void)
size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
sprintf(buf, "%zu", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+ /*
+ * Calculate the number of huge pages required.
+ */
+ GetHugePageSize(&hp_size, &unused);
+ if (hp_size != 0)
+ {
+ hp_required = (size_b / hp_size) + 1;
+ sprintf(buf, "%zu", hp_required);
+ SetConfigOption("huge_pages_required", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+ }
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index fd4ca83be1..8edfd07340 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -665,6 +665,7 @@ static int max_identifier_length;
static int block_size;
static int segment_size;
static int shared_memory_size_mb;
+static int huge_pages_required;
static int wal_block_size;
static bool data_checksums;
static bool integer_datetimes;
@@ -2224,6 +2225,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"huge_pages_required", PGC_INTERNAL, PRESET_OPTIONS,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the value could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ },
+ &huge_pages_required,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
/* This is PGC_SUSET to prevent hiding from log_lock_waits. */
{"deadlock_timeout", PGC_SUSET, LOCK_MANAGEMENT,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ef0e2a7746..b27d8aff15 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10101,6 +10101,27 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-huge-pages-required" xreflabel="huge_pages_required">
+ <term><varname>huge_pages_required</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>huge_pages_required</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are required for the main
+ shared memory area based on the specified
+ <xref linkend="guc-huge-page-size"/>. If huge pages are not supported,
+ this will be <literal>-1</literal>.
+ </para>
+ <para>
+ This setting is supported only on Linux. It is always set to
+ <literal>-1</literal> on other platforms. For more details about using
+ huge pages on Linux, see <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-integer-datetimes" xreflabel="integer_datetimes">
<term><varname>integer_datetimes</varname> (<type>boolean</type>)
<indexterm>
--
2.33.0
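The removed runtime.sgml steps read the default huge page size from the Hugepagesize line of /proc/meminfo. Below is a minimal sketch that parses that line format. It is not the body of GetHugePageSize(), which this hunk elides, and the function names are hypothetical. Like the patched GetHugePageSize(), it uses 0 to mean that no usable huge page size was found. It accepts only kB units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse one meminfo line such as "Hugepagesize:       2048 kB".
 * Returns the size in bytes, or 0 if this is not a Hugepagesize line in kB.
 */
static size_t
parse_hugepagesize_line(const char *line)
{
	unsigned int sz;
	char		unit;

	assert(line != NULL);
	if (sscanf(line, "Hugepagesize: %u %c", &sz, &unit) == 2 && unit == 'k')
		return (size_t) sz * 1024;
	return 0;
}

/* Scan an open meminfo-style stream; returns 0 if no usable line is found. */
static size_t
read_default_hugepagesize(FILE *fp)
{
	char		buf[128];

	assert(fp != NULL);
	while (fgets(buf, sizeof(buf), fp) != NULL)
	{
		size_t		sz = parse_hugepagesize_line(buf);

		if (sz != 0)
			return sz;
	}
	return 0;
}
```

On a real system, the stream would come from fopen("/proc/meminfo", "r").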
v9-0002-Provide-useful-values-for-postgres-C-with-runtime.patch
From f4763273a7f4649d4c50699b590c0b4b8077cb95 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 9 Sep 2021 13:10:38 +0900
Subject: [PATCH v9 2/2] Provide useful values for 'postgres -C' with
runtime-computed GUCs.
The -C option is handled before a small subset of GUCs that are
computed at runtime are initialized. Unfortunately, we cannot move
this handling to after they are initialized without disallowing
'postgres -C' on a running server. One notable reason for this is
that loadable libraries' _PG_init() functions are called before all
runtime-computed GUCs are initialized, and this is not guaranteed
to be safe to do on running servers.
In order to provide useful values for 'postgres -C' for runtime-
computed GUCs, this change adds a new section for handling just
these GUCs just before shared memory is initialized. While users
won't be able to use -C for runtime-computed GUCs on running
servers, providing a useful value with this restriction seems
better than not providing a useful value at all.
---
src/include/utils/guc.h | 6 ++++
src/backend/postmaster/postmaster.c | 50 +++++++++++++++++++++++++----
src/backend/utils/misc/guc.c | 10 +++---
doc/src/sgml/ref/postgres-ref.sgml | 12 +++++--
doc/src/sgml/runtime.sgml | 33 +++++++------------
5 files changed, 75 insertions(+), 36 deletions(-)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index a7c3a4958e..aa18d304ac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -229,6 +229,12 @@ typedef enum
#define GUC_EXPLAIN 0x100000 /* include in explain */
+/*
+ * GUC_RUNTIME_COMPUTED is intended for runtime-computed GUCs that are only
+ * available via 'postgres -C' if the server is not running.
+ */
+#define GUC_RUNTIME_COMPUTED 0x200000
+
#define GUC_UNIT (GUC_UNIT_MEMORY | GUC_UNIT_TIME)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b2fe420c3c..f601b9e5d1 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -896,15 +896,31 @@ PostmasterMain(int argc, char *argv[])
if (output_config_variable != NULL)
{
/*
- * "-C guc" was specified, so print GUC's value and exit. No extra
- * permission check is needed because the user is reading inside the
- * data dir.
+ * If this is a runtime-computed GUC, it hasn't yet been initialized,
+ * and the present value is not useful. However, this is a convenient
+ * place to print the value for most GUCs because it is safe to run
+ * postmaster startup to this point even if the server is already
+ * running. For the handful of runtime-computed GUCs that we can't
+ * provide meaningful values for yet, we wait until later in postmaster
+ * startup to print the value. We won't be able to use -C on running
+ * servers for those GUCs, but otherwise this option is unusable for
+ * them.
*/
- const char *config_val = GetConfigOption(output_config_variable,
- false, false);
+ int flags = GetConfigOptionFlags(output_config_variable, true);
- puts(config_val ? config_val : "");
- ExitPostmaster(0);
+ if ((flags & GUC_RUNTIME_COMPUTED) == 0)
+ {
+ /*
+ * "-C guc" was specified, so print GUC's value and exit. No extra
+ * permission check is needed because the user is reading inside
+ * the data dir.
+ */
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
}
/* Verify that DataDir looks reasonable */
@@ -1033,6 +1049,26 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeShmemGUCs();
+ /*
+ * If -C was specified with a runtime-computed GUC, we held off printing
+ * the value earlier, as the GUC was not yet initialized. We handle -C for
+ * most GUCs before we lock the data directory so that the option may be
+ * used on a running server. However, a handful of GUCs are runtime-
+ * computed and do not have meaningful values until after locking the data
+ * directory, and we cannot safely calculate their values earlier on a
+ * running server. At this point, such GUCs should be properly
+ * initialized, and we haven't yet set up shared memory, so this is a good
+ * time to handle the -C option for these special GUCs.
+ */
+ if (output_config_variable != NULL)
+ {
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8edfd07340..50e60b270b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1984,7 +1984,7 @@ static struct config_bool ConfigureNamesBool[] =
{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows whether data checksums are turned on for this cluster."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_checksums,
false,
@@ -2229,7 +2229,7 @@ static struct config_int ConfigureNamesInt[] =
{"huge_pages_required", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
gettext_noop("-1 indicates that the value could not be determined."),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&huge_pages_required,
-1, -1, INT_MAX,
@@ -2354,7 +2354,7 @@ static struct config_int ConfigureNamesInt[] =
{"shared_memory_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB | GUC_RUNTIME_COMPUTED
},
&shared_memory_size_mb,
0, 0, INT_MAX,
@@ -2419,7 +2419,7 @@ static struct config_int ConfigureNamesInt[] =
"in the form accepted by the chmod and umask system "
"calls. (To use the customary octal format the number "
"must start with a 0 (zero).)"),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_directory_mode,
0700, 0000, 0777,
@@ -3243,7 +3243,7 @@ static struct config_int ConfigureNamesInt[] =
{"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of write ahead log segments."),
NULL,
- GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&wal_segment_size,
DEFAULT_XLOG_SEG_SIZE,
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 4aaa7abe1a..89cc3cdb4e 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -133,13 +133,21 @@ PostgreSQL documentation
<listitem>
<para>
Prints the value of the named run-time parameter, and exits.
- (See the <option>-c</option> option above for details.) This can
- be used on a running server, and returns values from
+ (See the <option>-c</option> option above for details.) This
+ returns values from
<filename>postgresql.conf</filename>, modified by any parameters
supplied in this invocation. It does not reflect parameters
supplied when the cluster was started.
</para>
+ <para>
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-huge-pages-required"/>,
+ <xref linkend="guc-shared-memory-size"/>, and
+ <xref linkend="guc-wal-segment-size"/>).
+ </para>
+
<para>
This option is meant for other programs that interact with a server
instance, such as <xref linkend="app-pg-ctl"/>, to query configuration
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..d955639900 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,32 +1442,21 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
- This might look like:
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-huge-pages-required"/>. This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C huge_pages_required</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
- We can set this with:
+ While we need at least <literal>3170</literal> huge pages in this
+ example, a larger setting would be appropriate if other programs on
+ the machine also need huge pages. We can allocate the huge pages
+ with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
</programlisting>
--
2.33.0
On 9/8/21, 9:19 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
FWIW, I don't have an environment at hand these days to test properly
0001, so this will have to wait a bit. I really like the approach
taken by 0002, and it is independent of the other patch while
extending support for postgres -C to provide the correct runtime
values. So let's wrap this part first. No need to send a reorganized
patch set.
Sounds good.
For 0001, the biggest thing on my mind at the moment is the name of
the GUC. "huge_pages_required" feels kind of ambiguous. From the
name alone, it could mean either "the number of huge pages required"
or "huge pages are required for the server to run." Also, the number
of huge pages required is not actually required if you don't want to
run the server with huge pages. I think it might be clearer to
somehow indicate that the value is essentially the size of the main
shared memory area in terms of the huge page size, but I'm not sure
how to do that concisely. Perhaps it is enough to just make sure the
description of "huge_pages_required" is detailed enough.
For 0002, I have two small concerns. My first concern is that it
might be confusing to customers when the runtime GUCs cannot be
returned for a running server. We have the note in the docs, but if
you're encountering it on the command line, it's not totally clear
what the problem is.
$ postgres -D . -C log_min_messages
warning
$ postgres -D . -C shared_memory_size
2021-09-09 18:51:21.617 UTC [7924] FATAL: lock file "postmaster.pid" already exists
2021-09-09 18:51:21.617 UTC [7924] HINT: Is another postmaster (PID 7912) running in data directory "/local/home/bossartn/pgdata"?
My other concern is that by default, viewing the runtime-computed GUCs
will also emit a LOG.
$ postgres -D . -C shared_memory_size
142
2021-09-09 18:53:25.194 UTC [8006] LOG: database system is shut down
Running these commands with log_min_messages=debug5 emits way more
information for the runtime-computed GUCs than for others, but IMO
that is alright. However, perhaps we should adjust the logging in
0002 to improve the default user experience. I attached an attempt at
that.
With the attached patch, trying to view a runtime-computed GUC on a
running server will look like this:
$ postgres -D . -C shared_memory_size
2021-09-09 21:24:21.552 UTC [6224] FATAL: lock file "postmaster.pid" already exists
2021-09-09 21:24:21.552 UTC [6224] DETAIL: Runtime-computed GUC "shared_memory_size" cannot be viewed on a running server.
2021-09-09 21:24:21.552 UTC [6224] HINT: Is another postmaster (PID 3628) running in data directory "/local/home/bossartn/pgdata"?
And viewing a runtime-computed GUC on a server that is shut down will
look like this:
$ postgres -D . -C shared_memory_size
142
I'm not tremendously happy with the patch, but I hope that it at least
helps with the discussion.
Nathan
Attachments:
0001-Provide-useful-values-for-postgres-C-with-runtime-co.patch (application/octet-stream)
From aba09219bbcbdb5d9747c63dbf75380c774873e7 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Thu, 9 Sep 2021 21:42:41 +0000
Subject: [PATCH 1/1] Provide useful values for 'postgres -C' with
runtime-computed GUCs.
The -C option is handled before a small subset of GUCs that are
computed at runtime are initialized. Unfortunately, we cannot move
this handling to after they are initialized without disallowing
'postgres -C' on a running server. One notable reason for this is
that loadable libraries' _PG_init() functions are called before all
runtime-computed GUCs are initialized, and this is not guaranteed
to be safe to do on running servers.
In order to provide useful values for 'postgres -C' for runtime-
computed GUCs, this change adds a new section for handling just
these GUCs just before shared memory is initialized. While users
won't be able to use -C for runtime-computed GUCs on running
servers, providing a useful value with this restriction seems
better than not providing a useful value at all.
---
doc/src/sgml/ref/postgres-ref.sgml | 12 ++++++--
doc/src/sgml/runtime.sgml | 33 +++++++-------------
src/backend/bootstrap/bootstrap.c | 2 +-
src/backend/postmaster/postmaster.c | 52 +++++++++++++++++++++++++++-----
src/backend/tcop/postgres.c | 2 +-
src/backend/utils/init/miscinit.c | 60 ++++++++++++++++++++++++++++---------
src/backend/utils/misc/guc.c | 8 ++---
src/include/miscadmin.h | 3 +-
src/include/utils/guc.h | 6 ++++
9 files changed, 125 insertions(+), 53 deletions(-)
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 4aaa7abe1a..89cc3cdb4e 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -133,13 +133,21 @@ PostgreSQL documentation
<listitem>
<para>
Prints the value of the named run-time parameter, and exits.
- (See the <option>-c</option> option above for details.) This can
- be used on a running server, and returns values from
+ (See the <option>-c</option> option above for details.) This
+ returns values from
<filename>postgresql.conf</filename>, modified by any parameters
supplied in this invocation. It does not reflect parameters
supplied when the cluster was started.
</para>
+ <para>
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-huge-pages-required"/>,
+ <xref linkend="guc-shared-memory-size"/>, and
+ <xref linkend="guc-wal-segment-size"/>).
+ </para>
+
<para>
This option is meant for other programs that interact with a server
instance, such as <xref linkend="app-pg-ctl"/>, to query configuration
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..d955639900 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,32 +1442,21 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
- This might look like:
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-huge-pages-required"/>. This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C huge_pages_required</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
- We can set this with:
+ While we need at least <literal>3170</literal> huge pages in this
+ example, a larger setting would be appropriate if other programs on
+ the machine also need huge pages. We can allocate the huge pages
+ with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
</programlisting>
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 48615c0ebc..2c9e832951 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -317,7 +317,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
checkDataDir();
ChangeToDataDir();
- CreateDataDirLockFile(false);
+ CreateDataDirLockFile(false, NULL);
SetProcessingMode(BootstrapProcessing);
IgnoreSystemIndexes = true;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b2fe420c3c..58136aac09 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -896,15 +896,31 @@ PostmasterMain(int argc, char *argv[])
if (output_config_variable != NULL)
{
/*
- * "-C guc" was specified, so print GUC's value and exit. No extra
- * permission check is needed because the user is reading inside the
- * data dir.
+ * If this is a runtime-computed GUC, it hasn't yet been initialized,
+ * and the present value is not useful. However, this is a convenient
+ * place to print the value for most GUCs because it is safe to run
+ * postmaster startup to this point even if the server is already
+ * running. For the handful of runtime-computed GUCs that we can't
+ * provide meaningful values for yet, we wait until later in postmaster
+ * startup to print the value. We won't be able to use -C on running
+ * servers for those GUCs, but otherwise this option is unusable for
+ * them.
*/
- const char *config_val = GetConfigOption(output_config_variable,
- false, false);
+ int flags = GetConfigOptionFlags(output_config_variable, true);
- puts(config_val ? config_val : "");
- ExitPostmaster(0);
+ if ((flags & GUC_RUNTIME_COMPUTED) == 0)
+ {
+ /*
+ * "-C guc" was specified, so print GUC's value and exit. No extra
+ * permission check is needed because the user is reading inside
+ * the data dir.
+ */
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
}
/* Verify that DataDir looks reasonable */
@@ -983,7 +999,7 @@ PostmasterMain(int argc, char *argv[])
* so it must happen before opening sockets so that at exit, the socket
* lockfiles go away after CloseServerPorts runs.
*/
- CreateDataDirLockFile(true);
+ CreateDataDirLockFile(true, output_config_variable);
/*
* Read the control file (for error checking and config info).
@@ -1033,6 +1049,26 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeShmemGUCs();
+ /*
+ * If -C was specified with a runtime-computed GUC, we held off printing
+ * the value earlier, as the GUC was not yet initialized. We handle -C for
+ * most GUCs before we lock the data directory so that the option may be
+ * used on a running server. However, a handful of GUCs are runtime-
+ * computed and do not have meaningful values until after locking the data
+ * directory, and we cannot safely calculate their values earlier on a
+ * running server. At this point, such GUCs should be properly
+ * initialized, and we haven't yet set up shared memory, so this is a good
+ * time to handle the -C option for these special GUCs.
+ */
+ if (output_config_variable != NULL)
+ {
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 58b5960e27..d84759ae59 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4043,7 +4043,7 @@ PostgresMain(int argc, char *argv[],
/*
* Create lockfile for data directory.
*/
- CreateDataDirLockFile(false);
+ CreateDataDirLockFile(false, NULL);
/* read control file (error checking and contains config ) */
LocalProcessControlFile(false);
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..a63aba6f57 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -952,6 +952,7 @@ static void
UnlinkLockFiles(int status, Datum arg)
{
ListCell *l;
+ bool logShutDown = DatumGetBool(arg);
foreach(l, lock_files)
{
@@ -968,10 +969,10 @@ UnlinkLockFiles(int status, Datum arg)
* of a postmaster or standalone backend, while we won't come here at all
* when exiting postmaster child processes. Therefore, this is a good
* place to log completion of shutdown. We could alternatively teach
- * proc_exit() to do it, but that seems uglier. In a standalone backend,
- * use NOTICE elevel to be less chatty.
+ * proc_exit() to do it, but that seems uglier. In a standalone backend
+ * and when logShutDown is false, use NOTICE elevel to be less chatty.
*/
- ereport(IsPostmasterEnvironment ? LOG : NOTICE,
+ ereport(IsPostmasterEnvironment && logShutDown ? LOG : NOTICE,
(errmsg("database system is shut down")));
}
@@ -986,7 +987,8 @@ UnlinkLockFiles(int status, Datum arg)
static void
CreateLockFile(const char *filename, bool amPostmaster,
const char *socketDir,
- bool isDDLock, const char *refName)
+ bool isDDLock, const char *refName,
+ const char *output_config_variable)
{
int fd;
char buffer[MAXPGPATH * 2 + 256];
@@ -999,6 +1001,18 @@ CreateLockFile(const char *filename, bool amPostmaster,
my_gp_pid;
const char *envvar;
+ /*
+ * If output_config_variable is not NULL, we try to add some useful context
+ * to the FATAL messages to make it clear why 'postgres -C' is failing for
+ * this particular GUC (only a small set of runtime-computed GUCs require
+ * the server to be shut down).
+ */
+#define LOCK_FILE_ERRDETAIL (output_config_variable ? \
+ errdetail("Runtime-computed GUC \"%s\" cannot " \
+ "be viewed on a running server.", \
+ output_config_variable) : \
+ 0)
+
/*
* If the PID in the lockfile is our own PID or our parent's or
* grandparent's PID, then the file must be stale (probably left over from
@@ -1060,7 +1074,8 @@ CreateLockFile(const char *filename, bool amPostmaster,
ereport(FATAL,
(errcode_for_file_access(),
errmsg("could not create lock file \"%s\": %m",
- filename)));
+ filename),
+ LOCK_FILE_ERRDETAIL));
/*
* Read the file to get the old owner's PID. Note race condition
@@ -1074,14 +1089,16 @@ CreateLockFile(const char *filename, bool amPostmaster,
ereport(FATAL,
(errcode_for_file_access(),
errmsg("could not open lock file \"%s\": %m",
- filename)));
+ filename),
+ LOCK_FILE_ERRDETAIL));
}
pgstat_report_wait_start(WAIT_EVENT_LOCK_FILE_CREATE_READ);
if ((len = read(fd, buffer, sizeof(buffer) - 1)) < 0)
ereport(FATAL,
(errcode_for_file_access(),
errmsg("could not read lock file \"%s\": %m",
- filename)));
+ filename),
+ LOCK_FILE_ERRDETAIL));
pgstat_report_wait_end();
close(fd);
@@ -1090,6 +1107,7 @@ CreateLockFile(const char *filename, bool amPostmaster,
ereport(FATAL,
(errcode(ERRCODE_LOCK_FILE_EXISTS),
errmsg("lock file \"%s\" is empty", filename),
+ LOCK_FILE_ERRDETAIL,
errhint("Either another server is starting, or the lock file is the remnant of a previous server startup crash.")));
}
@@ -1136,6 +1154,7 @@ CreateLockFile(const char *filename, bool amPostmaster,
(errcode(ERRCODE_LOCK_FILE_EXISTS),
errmsg("lock file \"%s\" already exists",
filename),
+ LOCK_FILE_ERRDETAIL,
isDDLock ?
(encoded_pid < 0 ?
errhint("Is another postgres (PID %d) running in data directory \"%s\"?",
@@ -1183,6 +1202,7 @@ CreateLockFile(const char *filename, bool amPostmaster,
(errcode(ERRCODE_LOCK_FILE_EXISTS),
errmsg("pre-existing shared memory block (key %lu, ID %lu) is still in use",
id1, id2),
+ LOCK_FILE_ERRDETAIL,
errhint("Terminate any old server processes associated with data directory \"%s\".",
refName)));
}
@@ -1198,6 +1218,7 @@ CreateLockFile(const char *filename, bool amPostmaster,
(errcode_for_file_access(),
errmsg("could not remove old lock file \"%s\": %m",
filename),
+ LOCK_FILE_ERRDETAIL,
errhint("The file seems accidentally left over, but "
"it could not be removed. Please remove the file "
"by hand and try again.")));
@@ -1235,7 +1256,8 @@ CreateLockFile(const char *filename, bool amPostmaster,
errno = save_errno ? save_errno : ENOSPC;
ereport(FATAL,
(errcode_for_file_access(),
- errmsg("could not write lock file \"%s\": %m", filename)));
+ errmsg("could not write lock file \"%s\": %m", filename),
+ LOCK_FILE_ERRDETAIL));
}
pgstat_report_wait_end();
@@ -1249,7 +1271,8 @@ CreateLockFile(const char *filename, bool amPostmaster,
errno = save_errno;
ereport(FATAL,
(errcode_for_file_access(),
- errmsg("could not write lock file \"%s\": %m", filename)));
+ errmsg("could not write lock file \"%s\": %m", filename),
+ LOCK_FILE_ERRDETAIL));
}
pgstat_report_wait_end();
if (close(fd) != 0)
@@ -1260,16 +1283,21 @@ CreateLockFile(const char *filename, bool amPostmaster,
errno = save_errno;
ereport(FATAL,
(errcode_for_file_access(),
- errmsg("could not write lock file \"%s\": %m", filename)));
+ errmsg("could not write lock file \"%s\": %m", filename),
+ LOCK_FILE_ERRDETAIL));
}
/*
* Arrange to unlink the lock file(s) at proc_exit. If this is the first
* one, set up the on_proc_exit function to do it; then add this lock file
* to the list of files to unlink.
+ *
+ * If output_config_variable is not NULL, we ask UnlinkLockFiles to be less
+ * chatty to improve the user experience with 'postgres -C'.
*/
if (lock_files == NIL)
- on_proc_exit(UnlinkLockFiles, 0);
+ on_proc_exit(UnlinkLockFiles,
+ BoolGetDatum(output_config_variable == NULL));
/*
* Use lcons so that the lock files are unlinked in reverse order of
@@ -1287,11 +1315,15 @@ CreateLockFile(const char *filename, bool amPostmaster,
*
* Note that the socket directory path line is initially written as empty.
* postmaster.c will rewrite it upon creating the first Unix socket.
+ *
+ * If output_config_variable is not NULL (i.e., we are running 'postgres
+ * -C'), the logs we emit will be slightly adjusted to provide a better
+ * user experience.
*/
void
-CreateDataDirLockFile(bool amPostmaster)
+CreateDataDirLockFile(bool amPostmaster, const char *output_config_variable)
{
- CreateLockFile(DIRECTORY_LOCK_FILE, amPostmaster, "", true, DataDir);
+ CreateLockFile(DIRECTORY_LOCK_FILE, amPostmaster, "", true, DataDir, output_config_variable);
}
/*
@@ -1304,7 +1336,7 @@ CreateSocketLockFile(const char *socketfile, bool amPostmaster,
char lockfile[MAXPGPATH];
snprintf(lockfile, sizeof(lockfile), "%s.lock", socketfile);
- CreateLockFile(lockfile, amPostmaster, socketDir, false, socketfile);
+ CreateLockFile(lockfile, amPostmaster, socketDir, false, socketfile, NULL);
}
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 23236fa4c3..a6e4fcc24e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1983,7 +1983,7 @@ static struct config_bool ConfigureNamesBool[] =
{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows whether data checksums are turned on for this cluster."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_checksums,
false,
@@ -2342,7 +2342,7 @@ static struct config_int ConfigureNamesInt[] =
{"shared_memory_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB | GUC_RUNTIME_COMPUTED
},
&shared_memory_size_mb,
0, 0, INT_MAX,
@@ -2407,7 +2407,7 @@ static struct config_int ConfigureNamesInt[] =
"in the form accepted by the chmod and umask system "
"calls. (To use the customary octal format the number "
"must start with a 0 (zero).)"),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_directory_mode,
0700, 0000, 0777,
@@ -3231,7 +3231,7 @@ static struct config_int ConfigureNamesInt[] =
{"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of write ahead log segments."),
NULL,
- GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&wal_segment_size,
DEFAULT_XLOG_SEG_SIZE,
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 2e2e9a364a..d630ffa310 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -466,7 +466,8 @@ extern char *session_preload_libraries_string;
extern char *shared_preload_libraries_string;
extern char *local_preload_libraries_string;
-extern void CreateDataDirLockFile(bool amPostmaster);
+extern void CreateDataDirLockFile(bool amPostmaster,
+ const char *output_config_variable);
extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster,
const char *socketDir);
extern void TouchSocketLockFiles(void);
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index a7c3a4958e..aa18d304ac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -229,6 +229,12 @@ typedef enum
#define GUC_EXPLAIN 0x100000 /* include in explain */
+/*
+ * GUC_RUNTIME_COMPUTED is intended for runtime-computed GUCs that are only
+ * available via 'postgres -C' if the server is not running.
+ */
+#define GUC_RUNTIME_COMPUTED 0x200000
+
#define GUC_UNIT (GUC_UNIT_MEMORY | GUC_UNIT_TIME)
--
2.16.6
On Thu, Sep 09, 2021 at 09:53:22PM +0000, Bossart, Nathan wrote:
For 0002, I have two small concerns. My first concern is that it
might be confusing to customers when the runtime GUCs cannot be
returned for a running server. We have the note in the docs, but if
you're encountering it on the command line, it's not totally clear
what the problem is.
Yeah, that's true. There are more unlikely-to-happen errors that
could be triggered and prevent the command from working. I have never
tried using error_context_stack in a code path as early as that, to be
honest.
Running these commands with log_min_messages=debug5 emits way more
information for the runtime-computed GUCs than for others, but IMO
that is alright. However, perhaps we should adjust the logging in
0002 to improve the default user experience. I attached an attempt at
that.
Registered bgworkers would generate a DEBUG entry, for one.
I'm not tremendously happy with the patch, but I hope that it at least
helps with the discussion.
As long as the behavior is documented, I'd be fine with the approach of
keeping the code in its simplest shape. I agree that the message is
confusing, but it is not wrong either, as we try to query a run-time
parameter and we need the lock for that.
--
Michael
On 9/9/21, 7:03 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
As long as the behavior is documented, I'd be fine with the approach of
keeping the code in its simplest shape. I agree that the message is
confusing, but it is not wrong either, as we try to query a run-time
parameter and we need the lock for that.
That seems alright to me.
Nathan
On Thu, Sep 9, 2021 at 5:53 PM Bossart, Nathan <bossartn@amazon.com> wrote:
For 0001, the biggest thing on my mind at the moment is the name of
the GUC. "huge_pages_required" feels kind of ambiguous. From the
name alone, it could mean either "the number of huge pages required"
or "huge pages are required for the server to run." Also, the number
of huge pages required is not actually required if you don't want to
run the server with huge pages.
+1 to all of that.
I think it might be clearer to
somehow indicate that the value is essentially the size of the main
shared memory area in terms of the huge page size, but I'm not sure
how to do that concisely. Perhaps it is enough to just make sure the
description of "huge_pages_required" is detailed enough.
shared_memory_size_in_huge_pages? It's kinda long, but a long name
that you can understand without reading the docs is better than a
short one where you can't.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 9/10/21, 1:02 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Thu, Sep 9, 2021 at 5:53 PM Bossart, Nathan <bossartn@amazon.com> wrote:
I think it might be clearer to
somehow indicate that the value is essentially the size of the main
shared memory area in terms of the huge page size, but I'm not sure
how to do that concisely. Perhaps it is enough to just make sure the
description of "huge_pages_required" is detailed enough.
shared_memory_size_in_huge_pages? It's kinda long, but a long name
that you can understand without reading the docs is better than a
short one where you can't.
I think that's an improvement. The only other idea I have at the
moment is num_huge_pages_required_for_shared_memory.
Nathan
On Fri, Sep 10, 2021 at 7:43 PM Bossart, Nathan <bossartn@amazon.com> wrote:
shared_memory_size_in_huge_pages? It's kinda long, but a long name
that you can understand without reading the docs is better than a
short one where you can't.
I think that's an improvement. The only other idea I have at the
moment is num_huge_pages_required_for_shared_memory.
Hmm, that to me sounds like maybe only part of shared memory uses huge
pages and maybe we're just giving you the number required for that
part. I realize that it doesn't work that way but I don't know if
everyone will.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 9/13/21, 8:59 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Fri, Sep 10, 2021 at 7:43 PM Bossart, Nathan <bossartn@amazon.com> wrote:
I think that's an improvement. The only other idea I have at the
moment is num_huge_pages_required_for_shared_memory.
Hmm, that to me sounds like maybe only part of shared memory uses huge
pages and maybe we're just giving you the number required for that
part. I realize that it doesn't work that way but I don't know if
everyone will.
Yeah, I agree. What about
huge_pages_needed_for_shared_memory_size or
huge_pages_needed_for_main_shared_memory? I'm still not stoked about
using "required" or "needed" in the name, as it sounds like huge pages
must be allocated for the server to run, which is only true if
huge_pages=on. I haven't thought of a better word to use, though.
Nathan
On Mon, Sep 13, 2021 at 2:49 PM Bossart, Nathan <bossartn@amazon.com> wrote:
Yeah, I agree. What about
huge_pages_needed_for_shared_memory_size or
huge_pages_needed_for_main_shared_memory? I'm still not stoked about
using "required" or "needed" in the name, as it sounds like huge pages
must be allocated for the server to run, which is only true if
huge_pages=on. I haven't thought of a better word to use, though.
I prefer the first of those to the second. I don't find it
particularly better or worse than my previous suggestion of
shared_memory_size_in_huge_pages.
--
Robert Haas
EDB: http://www.enterprisedb.com
"Bossart, Nathan" <bossartn@amazon.com> writes:
Yeah, I agree. What about
huge_pages_needed_for_shared_memory_size or
huge_pages_needed_for_main_shared_memory?
Seems like "huge_pages_needed_for_shared_memory" would be sufficient.
regards, tom lane
On Mon, Sep 13, 2021 at 04:20:00PM -0400, Robert Haas wrote:
On Mon, Sep 13, 2021 at 2:49 PM Bossart, Nathan <bossartn@amazon.com> wrote:
Yeah, I agree. What about
huge_pages_needed_for_shared_memory_size or
huge_pages_needed_for_main_shared_memory? I'm still not stoked about
using "required" or "needed" in the name, as it sounds like huge pages
must be allocated for the server to run, which is only true if
huge_pages=on. I haven't thought of a better word to use, though.
I prefer the first of those to the second. I don't find it
particularly better or worse than my previous suggestion of
shared_memory_size_in_huge_pages.
I am not particularly fond of the use of "needed" in this context, so I'd
be fine with your suggestion of "shared_memory_size_in_huge_pages".
Some other ideas I could think of:
- shared_memory_size_as_huge_pages
- huge_pages_for_shared_memory_size
Having shared_memory_size in the GUC name is kind of appealing though
in terms of grepping, and one gets the relationship with
shared_memory_size immediately. If the consensus is
huge_pages_needed_for_shared_memory_size, I won't fight it, but IMO
that's too long.
--
Michael
On 9/13/21, 1:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Seems like "huge_pages_needed_for_shared_memory" would be sufficient.
I think we are down to either shared_memory_size_in_huge_pages or
huge_pages_needed_for_shared_memory. Robert's argument against
huge_pages_needed_for_shared_memory was that it might sound like only
part of shared memory uses huge pages and we're only giving the number
required for that. Speaking of which, isn't that technically true?
For shared_memory_size_in_huge_pages, the intent is to make it sound
like we are providing shared_memory_size in terms of the huge page
size, but I think it could also be interpreted as "the amount of
shared memory that is currently stored in huge pages."
I personally lean towards huge_pages_needed_for_shared_memory because
it feels the most clear and direct to me. I'm not vehemently opposed
to shared_memory_size_in_huge_pages, though. I don't think either one
is too misleading.
Nathan
At Tue, 14 Sep 2021 00:30:22 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
On 9/13/21, 1:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Seems like "huge_pages_needed_for_shared_memory" would be sufficient.
I think we are down to either shared_memory_size_in_huge_pages or
huge_pages_needed_for_shared_memory. Robert's argument against
huge_pages_needed_for_shared_memory was that it might sound like only
part of shared memory uses huge pages and we're only giving the number
required for that. Speaking of which, isn't that technically true?
For shared_memory_size_in_huge_pages, the intent is to make it sound
like we are providing shared_memory_size in terms of the huge page
size, but I think it could also be interpreted as "the amount of
shared memory that is currently stored in huge pages."
I personally lean towards huge_pages_needed_for_shared_memory because
it feels the most clear and direct to me. I'm not vehemently opposed
to shared_memory_size_in_huge_pages, though. I don't think either one
is too misleading.
I like 'in' slightly more than 'for' in this context. I agree with Michael
that the name looks somewhat too long, especially considering that
it won't be completed on shell command lines, but I won't fight
it either. On the other hand, a fully spelled-out name may be easier
to recall from memory than a half-shortened one.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 9/13/21, 5:49 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
At Tue, 14 Sep 2021 00:30:22 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
On 9/13/21, 1:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Seems like "huge_pages_needed_for_shared_memory" would be sufficient.
I think we are down to either shared_memory_size_in_huge_pages or
huge_pages_needed_for_shared_memory. Robert's argument against
huge_pages_needed_for_shared_memory was that it might sound like only
part of shared memory uses huge pages and we're only giving the number
required for that. Speaking of which, isn't that technically true?
For shared_memory_size_in_huge_pages, the intent is to make it sound
like we are providing shared_memory_size in terms of the huge page
size, but I think it could also be interpreted as "the amount of
shared memory that is currently stored in huge pages."
I personally lean towards huge_pages_needed_for_shared_memory because
it feels the most clear and direct to me. I'm not vehemently opposed
to shared_memory_size_in_huge_pages, though. I don't think either one
is too misleading.
I like 'in' slightly more than 'for' in this context. I agree with Michael
that the name looks somewhat too long, especially considering that
it won't be completed on shell command lines, but I won't fight
it either. On the other hand, a fully spelled-out name may be easier
to recall from memory than a half-shortened one.
I think I see more support for shared_memory_size_in_huge_pages than
for huge_pages_needed_for_shared_memory at the moment. I'll update
the patch set in the next day or two to use
shared_memory_size_in_huge_pages unless something changes in the
meantime.
Nathan
On Tue, Sep 14, 2021 at 06:00:44PM +0000, Bossart, Nathan wrote:
I think I see more support for shared_memory_size_in_huge_pages than
for huge_pages_needed_for_shared_memory at the moment. I'll update
the patch set in the next day or two to use
shared_memory_size_in_huge_pages unless something changes in the
meantime.
I have been looking at the patch to add the new GUC flag and the new
sequence for postgres -C, and we could have some TAP tests. There
were two places that made sense to me: pg_checksums/t/002_actions.pl
and recovery/t/017_shm.pl. I have chosen the former as it has
coverage across more platforms, and used data_checksums for this
purpose, with an extra positive test to check for the case where a GUC
can be queried while the server is running.
There are four parameters that are updated here:
- shared_memory_size
- wal_segment_size
- data_checksums
- data_directory_mode
That looks sensible, after looking at the full set of GUCs.
Attached is a refreshed patch (commit message is the same as v9 for
now), with some minor tweaks and the tests.
Thoughts?
--
Michael
Attachments:
v10-0001-Provide-useful-values-for-postgres-C-with-runtim.patch (text/plain; charset=us-ascii)
From e24d151efe45e1bc7048ec72ea66097c94f633be Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 15 Sep 2021 12:02:32 +0900
Subject: [PATCH v10] Provide useful values for 'postgres -C' with
runtime-computed GUCs
The -C option is handled before a small subset of GUCs that are
computed at runtime are initialized. Unfortunately, we cannot move
this handling to after they are initialized without disallowing
'postgres -C' on a running server. One notable reason for this is
that loadable libraries' _PG_init() functions are called before all
runtime-computed GUCs are initialized, and this is not guaranteed
to be safe to do on running servers.
In order to provide useful values for 'postgres -C' for runtime-
computed GUCs, this change adds a new section for handling just
these GUCs just before shared memory is initialized. While users
won't be able to use -C for runtime-computed GUCs on running
servers, providing a useful value with this restriction seems
better than not providing a useful value at all.
---
src/include/utils/guc.h | 6 ++++
src/backend/postmaster/postmaster.c | 50 +++++++++++++++++++++++----
src/backend/utils/misc/guc.c | 8 ++---
src/bin/pg_checksums/t/002_actions.pl | 21 ++++++++++-
doc/src/sgml/ref/postgres-ref.sgml | 11 ++++--
5 files changed, 82 insertions(+), 14 deletions(-)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index a7c3a4958e..aa18d304ac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -229,6 +229,12 @@ typedef enum
#define GUC_EXPLAIN 0x100000 /* include in explain */
+/*
+ * GUC_RUNTIME_COMPUTED is intended for runtime-computed GUCs that are only
+ * available via 'postgres -C' if the server is not running.
+ */
+#define GUC_RUNTIME_COMPUTED 0x200000
+
#define GUC_UNIT (GUC_UNIT_MEMORY | GUC_UNIT_TIME)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b2fe420c3c..f5b91c7af7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -896,15 +896,31 @@ PostmasterMain(int argc, char *argv[])
if (output_config_variable != NULL)
{
/*
- * "-C guc" was specified, so print GUC's value and exit. No extra
- * permission check is needed because the user is reading inside the
- * data dir.
+ * If this is a runtime-computed GUC, it hasn't yet been initialized,
+ * and the present value is not useful. However, this is a convenient
+ * place to print the value for most GUCs because it is safe to run
+ * postmaster startup to this point even if the server is already
+ * running. For the handful of runtime-computed GUCs that we can't
+ * provide meaningful values for yet, we wait until later in
+ * postmaster startup to print the value. We won't be able to use -C
+ * on running servers for those GUCs, but otherwise this option is
+ * unusable for them.
*/
- const char *config_val = GetConfigOption(output_config_variable,
- false, false);
+ int flags = GetConfigOptionFlags(output_config_variable, true);
- puts(config_val ? config_val : "");
- ExitPostmaster(0);
+ if ((flags & GUC_RUNTIME_COMPUTED) == 0)
+ {
+ /*
+ * "-C guc" was specified, so print GUC's value and exit. No
+ * extra permission check is needed because the user is reading
+ * inside the data dir.
+ */
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
}
/* Verify that DataDir looks reasonable */
@@ -1033,6 +1049,26 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeShmemGUCs();
+ /*
+ * If -C was specified with a runtime-computed GUC, we held off printing
+ * the value earlier, as the GUC was not yet initialized. We handle -C
+ * for most GUCs before we lock the data directory so that the option may
+ * be used on a running server. However, a handful of GUCs are runtime-
+ * computed and do not have meaningful values until after locking the data
+ * directory, and we cannot safely calculate their values earlier on a
+ * running server. At this point, such GUCs should be properly
+ * initialized, and we haven't yet set up shared memory, so this is a good
+ * time to handle the -C option for these special GUCs.
+ */
+ if (output_config_variable != NULL)
+ {
+ const char *config_val = GetConfigOption(output_config_variable,
+ false, false);
+
+ puts(config_val ? config_val : "");
+ ExitPostmaster(0);
+ }
+
/*
* Set up shared memory and semaphores.
*/
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 23236fa4c3..a6e4fcc24e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1983,7 +1983,7 @@ static struct config_bool ConfigureNamesBool[] =
{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows whether data checksums are turned on for this cluster."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_checksums,
false,
@@ -2342,7 +2342,7 @@ static struct config_int ConfigureNamesInt[] =
{"shared_memory_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of the server's main shared memory area (rounded up to the nearest MB)."),
NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_UNIT_MB | GUC_RUNTIME_COMPUTED
},
&shared_memory_size_mb,
0, 0, INT_MAX,
@@ -2407,7 +2407,7 @@ static struct config_int ConfigureNamesInt[] =
"in the form accepted by the chmod and umask system "
"calls. (To use the customary octal format the number "
"must start with a 0 (zero).)"),
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&data_directory_mode,
0700, 0000, 0777,
@@ -3231,7 +3231,7 @@ static struct config_int ConfigureNamesInt[] =
{"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows the size of write ahead log segments."),
NULL,
- GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+ GUC_UNIT_BYTE | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
},
&wal_segment_size,
DEFAULT_XLOG_SEG_SIZE,
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index a18c104a94..e98586c3f8 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -10,7 +10,7 @@ use PostgresNode;
use TestLib;
use Fcntl qw(:seek);
-use Test::More tests => 63;
+use Test::More tests => 69;
# Utility routine to create and check a table with corrupted checksums
@@ -178,11 +178,30 @@ command_fails(
[ 'pg_checksums', '--enable', '--filenode', '1234', '-D', $pgdata ],
"fails when relfilenodes are requested and action is --enable");
+# Test postgres -C for an offline cluster.
+# Run-time GUCs are safe to query here. Note that a lock file is created,
+# then unlinked, leading to an extra LOG entry showing in stderr.
+command_checks_all(
+ [ 'postgres', '-D', $pgdata, '-C', 'data_checksums' ],
+ 0,
+ [qr/^on$/],
+ # LOG entry when unlinking lock file.
+ [qr/database system is shut down/],
+ 'data_checksums=on is reported on an offline cluster');
+
# Checks cannot happen with an online cluster
$node->start;
command_fails([ 'pg_checksums', '--check', '-D', $pgdata ],
"fails with online cluster");
+# Test postgres -C on an online cluster.
+command_fails_like(
+ [ 'postgres', '-D', $pgdata, '-C', 'data_checksums' ],
+ qr/lock file .* already exists/,
+ 'data_checksums is not reported on an online cluster');
+command_ok([ 'postgres', '-D', $pgdata, '-C', 'work_mem' ],
+ 'non-runtime parameter is reported on an online cluster');
+
# Check corruption of table on default tablespace.
check_relation_corruption($node, 'corrupt1', 'pg_default');
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 4aaa7abe1a..02142ffab2 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -133,13 +133,20 @@ PostgreSQL documentation
<listitem>
<para>
Prints the value of the named run-time parameter, and exits.
- (See the <option>-c</option> option above for details.) This can
- be used on a running server, and returns values from
+ (See the <option>-c</option> option above for details.) This
+ returns values from
<filename>postgresql.conf</filename>, modified by any parameters
supplied in this invocation. It does not reflect parameters
supplied when the cluster was started.
</para>
+ <para>
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-shared-memory-size"/>, and
+ <xref linkend="guc-wal-segment-size"/>).
+ </para>
+
<para>
This option is meant for other programs that interact with a server
instance, such as <xref linkend="app-pg-ctl"/>, to query configuration
--
2.33.0
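A minimal sketch of the two-phase -C handling in the v10 patch, written as standalone C. Everything here is hypothetical: struct demo_guc, DEMO_RUNTIME_COMPUTED and demo_postgres_C() only model the control flow (ordinary GUCs are printed before the data directory is locked, runtime-computed ones only after their values are computed, and the lock fails if a server is already running). They are not PostgreSQL's real GUC machinery.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag mirroring GUC_RUNTIME_COMPUTED (0x200000) from the patch. */
#define DEMO_RUNTIME_COMPUTED 0x200000

/* Hypothetical, heavily simplified GUC entry; not PostgreSQL's struct. */
struct demo_guc
{
    const char *name;
    int         flags;
    const char *value;          /* NULL until computed */
};

static struct demo_guc demo_gucs[] = {
    {"work_mem", 0, "4MB"},
    {"shared_memory_size", DEMO_RUNTIME_COMPUTED, NULL},
};

enum demo_result
{
    DEMO_PRINTED_EARLY,         /* handled before locking the data dir */
    DEMO_PRINTED_LATE,          /* handled after computing runtime GUCs */
    DEMO_FAILED_LOCKED          /* data dir lock held by a running server */
};

static struct demo_guc *
demo_find_guc(const char *name)
{
    for (size_t i = 0; i < sizeof(demo_gucs) / sizeof(demo_gucs[0]); i++)
    {
        if (strcmp(demo_gucs[i].name, name) == 0)
            return &demo_gucs[i];
    }
    return NULL;
}

static enum demo_result
demo_postgres_C(const char *name, int server_running, char *out, size_t outlen)
{
    struct demo_guc *g = demo_find_guc(name);

    assert(out != NULL && outlen > 0);

    /*
     * Phase 1: ordinary GUCs are printed right away, even on a live server.
     * Unknown names are treated like ordinary GUCs here (a simplification).
     */
    if (g == NULL || (g->flags & DEMO_RUNTIME_COMPUTED) == 0)
    {
        snprintf(out, outlen, "%s",
                 (g != NULL && g->value != NULL) ? g->value : "");
        return DEMO_PRINTED_EARLY;
    }

    /* Locking the data directory fails when a server already holds it. */
    if (server_running)
        return DEMO_FAILED_LOCKED;

    /* Phase 2: stand-in for InitializeShmemGUCs(); "computed" is a placeholder. */
    g->value = "computed";
    snprintf(out, outlen, "%s", g->value);
    return DEMO_PRINTED_LATE;
}
```

With the server running, work_mem is still reported while shared_memory_size fails on the lock, mirroring the online-cluster TAP tests in the patch (which use data_checksums as the runtime-computed example).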
On 9/14/21, 8:06 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
Attached is a refreshed patch (commit message is the same as v9 for
now), with some minor tweaks and the tests.
Thoughts?
LGTM
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-shared-memory-size"/>, and
+ <xref linkend="guc-wal-segment-size"/>).
nitpick: I think you can remove the comma before the "and" in the list
of examples.
Nathan
On Wed, Sep 15, 2021 at 10:31:20PM +0000, Bossart, Nathan wrote:
+ This can be used on a running server for most parameters. However,
+ the server must be shut down for some runtime-computed parameters
+ (e.g., <xref linkend="guc-shared-memory-size"/>, and
+ <xref linkend="guc-wal-segment-size"/>).
nitpick: I think you can remove the comma before the "and" in the list
of examples.
Fixed that, and applied. Could you rebase the last patch with the
name suggested for the last GUC, including the docs? It looks like we
are heading for shared_memory_size_in_huge_pages.
--
Michael
On 9/15/21, 7:42 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
Fixed that, and applied. Could you rebase the last patch with the
name suggested for the last GUC, including the docs? It looks like we
are heading for shared_memory_size_in_huge_pages.
Thanks! And done.
For the huge pages setup documentation, I considered sending stderr to
/dev/null to eliminate the LOG from the output, but I opted against
that. That would've looked like this:
postgres -D $PGDATA -C shared_memory_size_in_huge_pages 2> /dev/null
Otherwise, there aren't any significant changes in this version of the
patch besides the name change.
Nathan
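For a deployment tool such as the chef recipe that started this thread, a hypothetical helper like the one below could turn the command's stdout into a vm.nr_hugepages value. The LOG line mentioned above goes to stderr (the TAP test in the v10 patch matches it there), so stdout should hold only the number, and -1 means the value could not be determined, per the attached v11 patch. The helper name, the result codes, and the extra_pages headroom parameter are assumptions for illustration; the command itself still has to run while the server is shut down.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
enum hp_parse
{
    HP_OK,              /* *nr_hugepages holds a value for vm.nr_hugepages */
    HP_UNSUPPORTED,     /* server printed -1: huge pages not supported */
    HP_BAD_OUTPUT       /* stdout was not a single integer */
};

/*
 * Parse the stdout of "postgres -D $PGDATA -C shared_memory_size_in_huge_pages"
 * and add extra_pages of headroom for other programs that use huge pages.
 */
static enum hp_parse
parse_huge_pages_output(const char *line, long extra_pages, long *nr_hugepages)
{
    char       *end;
    long        val;

    assert(line != NULL && nr_hugepages != NULL && extra_pages >= 0);

    errno = 0;
    val = strtol(line, &end, 10);
    if (errno != 0 || end == line)
        return HP_BAD_OUTPUT;

    /* Allow trailing whitespace such as the newline written by puts(). */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return HP_BAD_OUTPUT;

    if (val == -1)
        return HP_UNSUPPORTED;
    if (val < 0 || val > LONG_MAX - extra_pages)
        return HP_BAD_OUTPUT;

    *nr_hugepages = val + extra_pages;
    return HP_OK;
}
```

Rejecting anything other than a lone integer keeps a stray log line from silently turning into a bogus page count.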
Attachments:
v11-0001-Introduce-shared_memory_size_in_huge_pages-GUC.patch (application/octet-stream)
From 4ef008bd7fb4ad7a6c7c0fbd070ef571429c046c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Thu, 16 Sep 2021 16:35:55 +0000
Subject: [PATCH v11 1/1] Introduce shared_memory_size_in_huge_pages GUC.
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area.
---
doc/src/sgml/config.sgml | 20 ++++++++++++++++++++
doc/src/sgml/ref/postgres-ref.sgml | 3 ++-
doc/src/sgml/runtime.sgml | 30 ++++++++++--------------------
src/backend/port/sysv_shmem.c | 16 +++++++++++-----
src/backend/port/win32_shmem.c | 14 ++++++++++++++
src/backend/storage/ipc/ipci.c | 14 ++++++++++++++
src/backend/utils/misc/guc.c | 12 ++++++++++++
src/include/storage/pg_shmem.h | 1 +
8 files changed, 84 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ef0e2a7746..6e8839cd1b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10289,6 +10289,26 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-shared-memory-size-in-huge-pages" xreflabel="shared_memory_size_in_huge_pages">
+ <term><varname>shared_memory_size_in_huge_pages</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>shared_memory_size_in_huge_pages</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are needed for the main shared
+ memory area based on the specified <xref linkend="guc-huge-page-size"/>.
+ If huge pages are not supported, this will be <literal>-1</literal>.
+ </para>
+ <para>
+ This setting is supported only on Linux. It is always set to
+ <literal>-1</literal> on other platforms. For more details about using
+ huge pages on Linux, see <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-ssl-library" xreflabel="ssl_library">
<term><varname>ssl_library</varname> (<type>string</type>)
<indexterm>
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index f72c3b04e4..55a3f6c69d 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -143,7 +143,8 @@ PostgreSQL documentation
<para>
This can be used on a running server for most parameters. However,
the server must be shut down for some runtime-computed parameters
- (e.g., <xref linkend="guc-shared-memory-size"/> and
+ (e.g., <xref linkend="guc-shared-memory-size"/>,
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>, and
<xref linkend="guc-wal-segment-size"/>).
</para>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..9310859506 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,31 +1442,21 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
+ To determine the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>.
This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C shared_memory_size_in_huge_pages</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
+ While we need at least <literal>3170</literal> huge pages in this example,
+ a larger setting would be appropriate if other programs on the machine
+ also need huge pages.
We can set this with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..125e2d47ec 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -456,8 +456,6 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
return shmStat.shm_nattch == 0 ? SHMSTATE_UNATTACHED : SHMSTATE_ATTACHED;
}
-#ifdef MAP_HUGETLB
-
/*
* Identify the huge page size to use, and compute the related mmap flags.
*
@@ -476,11 +474,19 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* such as increasing shared_buffers to absorb the extra space.
*
* Returns the (real, assumed or config provided) page size into *hugepagesize,
- * and the hugepage-related mmap flags to use into *mmap_flags.
+ * and the hugepage-related mmap flags to use into *mmap_flags. If huge pages
+ * is not supported, *hugepagesize and *mmap_flags will be set to 0.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
+#ifndef MAP_HUGETLB
+
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+
+#else
+
Size default_hugepagesize = 0;
/*
@@ -553,9 +559,9 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
*mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
}
#endif
-}
#endif /* MAP_HUGETLB */
+}
/*
* Creates an anonymous mmap()ed shared memory segment.
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index d7a71992d8..90de2ab4e1 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -605,3 +605,17 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
return true;
}
+
+/*
+ * This function is provided for consistency with sysv_shmem.c and does not
+ * provide any useful information for Windows. To obtain the large page size,
+ * use GetLargePageMinimum() instead.
+ *
+ * This always sets *hugepagesize and *mmap_flags to 0.
+ */
+void
+GetHugePageSize(Size *hugepagesize, int *mmap_flags)
+{
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 13f3926ff6..805d807906 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -326,6 +326,9 @@ InitializeShmemGUCs(void)
char buf[64];
Size size_b;
Size size_mb;
+ Size hp_size;
+ Size hp_required;
+ int unused;
/*
* Calculate the shared memory size and round up to the nearest megabyte.
@@ -334,4 +337,15 @@ InitializeShmemGUCs(void)
size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
sprintf(buf, "%zu", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+ /*
+ * Calculate the number of huge pages required.
+ */
+ GetHugePageSize(&hp_size, &unused);
+ if (hp_size != 0)
+ {
+ hp_required = add_size(size_b / hp_size, 1);
+ sprintf(buf, "%zu", hp_required);
+ SetConfigOption("shared_memory_size_in_huge_pages", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+ }
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a6e4fcc24e..d2ce4a8450 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -665,6 +665,7 @@ static int max_identifier_length;
static int block_size;
static int segment_size;
static int shared_memory_size_mb;
+static int shared_memory_size_in_huge_pages;
static int wal_block_size;
static bool data_checksums;
static bool integer_datetimes;
@@ -2349,6 +2350,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size_in_huge_pages", PGC_INTERNAL, PRESET_OPTIONS,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the value could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
+ },
+ &shared_memory_size_in_huge_pages,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..518eb86065 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -87,5 +87,6 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
PGShmemHeader **shim);
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
#endif /* PG_SHMEM_H */
--
2.16.6
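The huge page count in the InitializeShmemGUCs() hunk is size_b / hp_size + 1, computed only when GetHugePageSize() reports a nonzero size; otherwise the GUC keeps its boot value of -1. The sketch below reproduces that arithmetic next to an exact ceiling division, using plain integers and omitting add_size()'s overflow check; the demo_* names are hypothetical. With the 6490428K segment from the old runtime.sgml example and 2MB pages, both give 3170. For a size that is an exact multiple of the page size (1GB here), the patch's formula reports one page more than strictly needed, which errs on the safe side.

```c
#include <stdint.h>

/*
 * Mirrors the arithmetic in the v11 InitializeShmemGUCs() hunk:
 * -1 when the huge page size is unknown (the GUC's boot value), otherwise
 * whole pages plus one. Overflow checking (add_size()) is omitted here.
 */
static int64_t
demo_huge_pages_patch(uint64_t size_b, uint64_t hp_size)
{
    if (hp_size == 0)
        return -1;
    return (int64_t) (size_b / hp_size + 1);
}

/* Exact ceiling division, for comparison only. */
static int64_t
demo_huge_pages_ceil(uint64_t size_b, uint64_t hp_size)
{
    if (hp_size == 0)
        return -1;
    return (int64_t) (size_b / hp_size + (size_b % hp_size != 0));
}
```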
+ * and the hugepage-related mmap flags to use into *mmap_flags. If huge pages
+ * is not supported, *hugepagesize and *mmap_flags will be set to 0.
nitpick: *are* not supported, as you say elsewhere.
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
Maybe this was already discussed, but "main" could be misleading.
To me that sounds like there might be huge pages needed for something other
than the "main" area and the reported value might turn out to be inadequate
(which is exactly the issue these patches are trying to address).
In particular, this sounds like it's just going to report
shared_buffers/huge_page_size (since shared buffers is usually the "main" use
of shared memory) - rather than reporting the size of the entire shared memory
in units of huge_pages.
--
Justin
On 9/16/21, 10:17 AM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:
+ * and the hugepage-related mmap flags to use into *mmap_flags. If huge pages
+ * is not supported, *hugepagesize and *mmap_flags will be set to 0.
nitpick: *are* not supported, as you say elsewhere.
Updated. I think I originally chose "is" because I was referring to
Huge Pages as a singular feature, but that sounds a bit awkward, and I
don't think there's any substantial difference either way.
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
Maybe this was already discussed, but "main" could be misleading.
To me that sounds like there might be huge pages needed for something other
than the "main" area and the reported value might turn out to be inadequate
(which is exactly the issue these patches are trying to address).
In particular, this sounds like it's just going to report
shared_buffers/huge_page_size (since shared buffers is usually the "main" use
of shared memory) - rather than reporting the size of the entire shared memory
in units of huge_pages.
I'm not sure I agree on this one. The documentation for huge_pages
[0] and shared_memory_type [1] uses the same phrasing multiple times,
and the new shared_memory_size GUC uses it as well [2]. I don't see
anything in the documentation that suggests that shared_buffers is the
only thing in the main shared memory area, and the documentation for
setting up huge pages makes no mention of any extra memory that needs
to be considered, either.
Furthermore, I'm not sure what else we'd call it. I don't think it's
accurate to suggest that it's the entirety of shared memory for the
server, as it's possible to dynamically allocate more as needed (which
IIUC won't use any explicitly allocated huge pages).
Nathan
[0]: https://www.postgresql.org/docs/devel/runtime-config-resource.html#GUC-HUGE-PAGES
[1]: https://www.postgresql.org/docs/devel/runtime-config-resource.html#GUC-SHARED-MEMORY-TYPE
[2]: https://www.postgresql.org/docs/devel/runtime-config-preset.html#GUC-SHARED-MEMORY-SIZE
Attachments:
v12-0001-Introduce-shared_memory_size_in_huge_pages-GUC.patch (application/octet-stream)
From de31b31eb48f1c918a266b1a902ca6ab553a5675 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Thu, 16 Sep 2021 21:23:04 +0000
Subject: [PATCH v12 1/1] Introduce shared_memory_size_in_huge_pages GUC.
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area.
---
doc/src/sgml/config.sgml | 20 ++++++++++++++++++++
doc/src/sgml/ref/postgres-ref.sgml | 3 ++-
doc/src/sgml/runtime.sgml | 30 ++++++++++--------------------
src/backend/port/sysv_shmem.c | 16 +++++++++++-----
src/backend/port/win32_shmem.c | 14 ++++++++++++++
src/backend/storage/ipc/ipci.c | 14 ++++++++++++++
src/backend/utils/misc/guc.c | 12 ++++++++++++
src/include/storage/pg_shmem.h | 1 +
8 files changed, 84 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ef0e2a7746..6e8839cd1b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10289,6 +10289,26 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-shared-memory-size-in-huge-pages" xreflabel="shared_memory_size_in_huge_pages">
+ <term><varname>shared_memory_size_in_huge_pages</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>shared_memory_size_in_huge_pages</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are needed for the main shared
+ memory area based on the specified <xref linkend="guc-huge-page-size"/>.
+ If huge pages are not supported, this will be <literal>-1</literal>.
+ </para>
+ <para>
+ This setting is supported only on Linux. It is always set to
+ <literal>-1</literal> on other platforms. For more details about using
+ huge pages on Linux, see <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-ssl-library" xreflabel="ssl_library">
<term><varname>ssl_library</varname> (<type>string</type>)
<indexterm>
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index f72c3b04e4..55a3f6c69d 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -143,7 +143,8 @@ PostgreSQL documentation
<para>
This can be used on a running server for most parameters. However,
the server must be shut down for some runtime-computed parameters
- (e.g., <xref linkend="guc-shared-memory-size"/> and
+ (e.g., <xref linkend="guc-shared-memory-size"/>,
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>, and
<xref linkend="guc-wal-segment-size"/>).
</para>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..9310859506 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,31 +1442,21 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
+ To determine the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>.
This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C shared_memory_size_in_huge_pages</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
+ While we need at least <literal>3170</literal> huge pages in this example,
+ a larger setting would be appropriate if other programs on the machine
+ also need huge pages.
We can set this with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..bc15e2649c 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -456,8 +456,6 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
return shmStat.shm_nattch == 0 ? SHMSTATE_UNATTACHED : SHMSTATE_ATTACHED;
}
-#ifdef MAP_HUGETLB
-
/*
* Identify the huge page size to use, and compute the related mmap flags.
*
@@ -476,11 +474,19 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* such as increasing shared_buffers to absorb the extra space.
*
* Returns the (real, assumed or config provided) page size into *hugepagesize,
- * and the hugepage-related mmap flags to use into *mmap_flags.
+ * and the hugepage-related mmap flags to use into *mmap_flags. If huge pages
+ * are not supported, *hugepagesize and *mmap_flags will be set to 0.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
+#ifndef MAP_HUGETLB
+
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+
+#else
+
Size default_hugepagesize = 0;
/*
@@ -553,9 +559,9 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
*mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
}
#endif
-}
#endif /* MAP_HUGETLB */
+}
/*
* Creates an anonymous mmap()ed shared memory segment.
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index d7a71992d8..90de2ab4e1 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -605,3 +605,17 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
return true;
}
+
+/*
+ * This function is provided for consistency with sysv_shmem.c and does not
+ * provide any useful information for Windows. To obtain the large page size,
+ * use GetLargePageMinimum() instead.
+ *
+ * This always sets *hugepagesize and *mmap_flags to 0.
+ */
+void
+GetHugePageSize(Size *hugepagesize, int *mmap_flags)
+{
+ *hugepagesize = 0;
+ *mmap_flags = 0;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 13f3926ff6..805d807906 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -326,6 +326,9 @@ InitializeShmemGUCs(void)
char buf[64];
Size size_b;
Size size_mb;
+ Size hp_size;
+ Size hp_required;
+ int unused;
/*
* Calculate the shared memory size and round up to the nearest megabyte.
@@ -334,4 +337,15 @@ InitializeShmemGUCs(void)
size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
sprintf(buf, "%zu", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+ /*
+ * Calculate the number of huge pages required.
+ */
+ GetHugePageSize(&hp_size, &unused);
+ if (hp_size != 0)
+ {
+ hp_required = add_size(size_b / hp_size, 1);
+ sprintf(buf, "%zu", hp_required);
+ SetConfigOption("shared_memory_size_in_huge_pages", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+ }
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a6e4fcc24e..d2ce4a8450 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -665,6 +665,7 @@ static int max_identifier_length;
static int block_size;
static int segment_size;
static int shared_memory_size_mb;
+static int shared_memory_size_in_huge_pages;
static int wal_block_size;
static bool data_checksums;
static bool integer_datetimes;
@@ -2349,6 +2350,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size_in_huge_pages", PGC_INTERNAL, PRESET_OPTIONS,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the value could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
+ },
+ &shared_memory_size_in_huge_pages,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..518eb86065 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -87,5 +87,6 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
PGShmemHeader **shim);
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
#endif /* PG_SHMEM_H */
--
2.16.6
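The arithmetic in the ipci.c hunk above can be checked on its own. The following C sketch mirrors it under two simplifying assumptions: it uses plain uint64_t math instead of PostgreSQL's overflow-checked add_size(), and the helper names are hypothetical. Take the 6490428 kB segment from the documentation example in this thread. There, size_b / hp_size + 1 agrees with a true ceiling (3170 pages of 2 MB). When the size is an exact multiple of the huge page size, the same formula asks for one extra page.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per megabyte, as used by InitializeShmemGUCs(). */
#define SKETCH_MB ((uint64_t) 1024 * 1024)

/*
 * Round a byte count up to whole megabytes, like the shared_memory_size
 * computation.  Overflow checking (add_size() in PostgreSQL) is omitted.
 */
static uint64_t
size_in_mb_rounded_up(uint64_t size_b)
{
	return (size_b + SKETCH_MB - 1) / SKETCH_MB;
}

/* Huge page count as written in this patch: truncating division plus one. */
static uint64_t
huge_pages_patch_formula(uint64_t size_b, uint64_t hp_size)
{
	assert(hp_size != 0);
	return size_b / hp_size + 1;
}

/* Strict ceiling division, shown only for comparison. */
static uint64_t
huge_pages_ceiling(uint64_t size_b, uint64_t hp_size)
{
	assert(hp_size != 0);
	return size_b / hp_size + (size_b % hp_size != 0 ? 1 : 0);
}
```

One surplus page is harmless when reserving vm.nr_hugepages, so the simpler formula is arguably a reasonable trade-off. The strict version only shows where the two differ.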
On Thu, Sep 16, 2021 at 09:26:56PM +0000, Bossart, Nathan wrote:
I'm not sure I agree on this one. The documentation for huge_pages
[0] and shared_memory_type [1] uses the same phrasing multiple times,
and the new shared_memory_size GUC uses it as well [2]. I don't see
anything in the documentation that suggests that shared_buffers is the
only thing in the main shared memory area, and the documentation for
setting up huge pages makes no mention of any extra memory that needs
to be considered, either.
Looks rather sane to me, FWIW. I have not tested on Linux properly
yet (not tempted to take my bets on the buildfarm on a Friday,
either), but I should be able to handle that at the beginning of next
week.
+ GetHugePageSize(&hp_size, &unused);
+ if (hp_size != 0)
I'd rather change GetHugePageSize() to be able to accept NULL for the
parameter values, rather than declaring such variables.
+ To determine the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>.
We may want to say as well here that the server should be offline?
It would not hurt to duplicate this information with
postgres-ref.sgml.
+ This setting is supported only on Linux. It is always set to
Nit: This paragraph is missing two <productname>s for Linux. The docs
are random about that, but these are new entries.
--
Michael
On 9/16/21, 7:21 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
+ GetHugePageSize(&hp_size, &unused);
+ if (hp_size != 0)
I'd rather change GetHugePageSize() to be able to accept NULL for the
parameter values, rather than declaring such variables.
Done.
+ To determine the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>.
We may want to say as well here that the server should be offline?
It would not hurt to duplicate this information with
postgres-ref.sgml.
Done.
+ This setting is supported only on Linux. It is always set to
Nit: This paragraph is missing two <productname>s for Linux. The docs
are random about that, but these are new entries.
Done.
Nathan
Attachments:
v13-0001-Introduce-shared_memory_size_in_huge_pages-GUC.patch (application/octet-stream)
From 4a86d93c6c75d3e692bc66d026b68338452c8ead Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 17 Sep 2021 16:23:59 +0000
Subject: [PATCH v13 1/1] Introduce shared_memory_size_in_huge_pages GUC.
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area.
---
doc/src/sgml/config.sgml | 21 ++++++++++++++++++
doc/src/sgml/ref/postgres-ref.sgml | 3 ++-
doc/src/sgml/runtime.sgml | 31 ++++++++++-----------------
src/backend/port/sysv_shmem.c | 44 ++++++++++++++++++++++++--------------
src/backend/port/win32_shmem.c | 17 +++++++++++++++
src/backend/storage/ipc/ipci.c | 14 ++++++++++++
src/backend/utils/misc/guc.c | 12 +++++++++++
src/include/storage/pg_shmem.h | 1 +
8 files changed, 106 insertions(+), 37 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ef0e2a7746..0a8e35c59f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10289,6 +10289,27 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
+ <varlistentry id="guc-shared-memory-size-in-huge-pages" xreflabel="shared_memory_size_in_huge_pages">
+ <term><varname>shared_memory_size_in_huge_pages</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>shared_memory_size_in_huge_pages</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Reports the number of huge pages that are needed for the main shared
+ memory area based on the specified <xref linkend="guc-huge-page-size"/>.
+ If huge pages are not supported, this will be <literal>-1</literal>.
+ </para>
+ <para>
+ This setting is supported only on <productname>Linux</productname>. It
+ is always set to <literal>-1</literal> on other platforms. For more
+ details about using huge pages on <productname>Linux</productname>, see
+ <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-ssl-library" xreflabel="ssl_library">
<term><varname>ssl_library</varname> (<type>string</type>)
<indexterm>
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index f72c3b04e4..55a3f6c69d 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -143,7 +143,8 @@ PostgreSQL documentation
<para>
This can be used on a running server for most parameters. However,
the server must be shut down for some runtime-computed parameters
- (e.g., <xref linkend="guc-shared-memory-size"/> and
+ (e.g., <xref linkend="guc-shared-memory-size"/>,
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>, and
<xref linkend="guc-wal-segment-size"/>).
</para>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..aa2291a326 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,31 +1442,22 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
+ To determine the number of huge pages needed, use the
+ <command>postgres</command> command to see the value of
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>. Note that the
+ server must be shut down to view this runtime-computed parameter.
This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
-$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
-Hugepagesize: 2048 kB
-$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
-hugepages-1048576kB hugepages-2048kB
+$ <userinput>postgres -D $PGDATA -C shared_memory_size_in_huge_pages</userinput>
+3170
</programlisting>
- In this example the default is 2MB, but you can also explicitly request
- either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+ Note that you can explicitly request either 2MB or 1GB huge pages with
+ <xref linkend="guc-huge-page-size"/>.
- Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
- <literal>3169.154</literal>, so in this example we need at
- least <literal>3170</literal> huge pages. A larger setting would be
- appropriate if other programs on the machine also need huge pages.
+ While we need at least <literal>3170</literal> huge pages in this example,
+ a larger setting would be appropriate if other programs on the machine
+ also need huge pages.
We can set this with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..1ad70a3600 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -456,8 +456,6 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
return shmStat.shm_nattch == 0 ? SHMSTATE_UNATTACHED : SHMSTATE_ATTACHED;
}
-#ifdef MAP_HUGETLB
-
/*
* Identify the huge page size to use, and compute the related mmap flags.
*
@@ -476,11 +474,22 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* such as increasing shared_buffers to absorb the extra space.
*
* Returns the (real, assumed or config provided) page size into *hugepagesize,
- * and the hugepage-related mmap flags to use into *mmap_flags.
+ * and the hugepage-related mmap flags to use into *mmap_flags. If huge pages
+ * are not supported, *hugepagesize and *mmap_flags will be set to 0. It is
+ * safe to set mmap_flags to NULL if its value is not needed.
*/
-static void
+void
GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
+#ifndef MAP_HUGETLB
+
+ *hugepagesize = 0;
+
+ if (mmap_flags != NULL)
+ *mmap_flags = 0;
+
+#else
+
Size default_hugepagesize = 0;
/*
@@ -539,23 +548,26 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
*hugepagesize = 2 * 1024 * 1024;
}
- *mmap_flags = MAP_HUGETLB;
+ /* If requested, determine appropriate mmap flags. */
+ if (mmap_flags != NULL)
+ {
+ *mmap_flags = MAP_HUGETLB;
- /*
- * On recent enough Linux, also include the explicit page size, if
- * necessary.
- */
+ /*
+ * On recent enough Linux, also include the explicit page size, if
+ * necessary.
+ */
#if defined(MAP_HUGE_MASK) && defined(MAP_HUGE_SHIFT)
- if (*hugepagesize != default_hugepagesize)
- {
- int shift = pg_ceil_log2_64(*hugepagesize);
+ if (*hugepagesize != default_hugepagesize)
+ {
+ int shift = pg_ceil_log2_64(*hugepagesize);
- *mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
- }
+ *mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
+ }
#endif
-}
-
+ }
#endif /* MAP_HUGETLB */
+}
/*
* Creates an anonymous mmap()ed shared memory segment.
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index d7a71992d8..1d5d902eb8 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -605,3 +605,20 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
return true;
}
+
+/*
+ * This function is provided for consistency with sysv_shmem.c and does not
+ * provide any useful information for Windows. To obtain the large page size,
+ * use GetLargePageMinimum() instead.
+ *
+ * This function sets *hugepagesize to 0. If mmap_flags is not NULL, it also
+ * sets *mmap_flags to 0.
+ */
+void
+GetHugePageSize(Size *hugepagesize, int *mmap_flags)
+{
+ *hugepagesize = 0;
+
+ if (mmap_flags != NULL)
+ *mmap_flags = 0;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 13f3926ff6..9fa3e0631e 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -326,6 +326,7 @@ InitializeShmemGUCs(void)
char buf[64];
Size size_b;
Size size_mb;
+ Size hp_size;
/*
* Calculate the shared memory size and round up to the nearest megabyte.
@@ -334,4 +335,17 @@ InitializeShmemGUCs(void)
size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
sprintf(buf, "%zu", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+
+ /*
+ * Calculate the number of huge pages required.
+ */
+ GetHugePageSize(&hp_size, NULL);
+ if (hp_size != 0)
+ {
+ Size hp_required;
+
+ hp_required = add_size(size_b / hp_size, 1);
+ sprintf(buf, "%zu", hp_required);
+ SetConfigOption("shared_memory_size_in_huge_pages", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
+ }
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a6e4fcc24e..d2ce4a8450 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -665,6 +665,7 @@ static int max_identifier_length;
static int block_size;
static int segment_size;
static int shared_memory_size_mb;
+static int shared_memory_size_in_huge_pages;
static int wal_block_size;
static bool data_checksums;
static bool integer_datetimes;
@@ -2349,6 +2350,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"shared_memory_size_in_huge_pages", PGC_INTERNAL, PRESET_OPTIONS,
+ gettext_noop("Shows the number of huge pages needed for the main shared memory area."),
+ gettext_noop("-1 indicates that the value could not be determined."),
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
+ },
+ &shared_memory_size_in_huge_pages,
+ -1, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..518eb86065 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -87,5 +87,6 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,
PGShmemHeader **shim);
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
#endif /* PG_SHMEM_H */
--
2.16.6
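The mmap flag computation that the sysv_shmem.c hunk re-indents is easy to misread. The sketch below reproduces it in plain C under stated assumptions. SKETCH_MAP_HUGE_SHIFT and SKETCH_MAP_HUGE_MASK are local stand-ins for Linux's MAP_HUGE_SHIFT (26) and MAP_HUGE_MASK (0x3f). ceil_log2_u64() is a hypothetical replacement for PostgreSQL's pg_ceil_log2_64(). The page size is encoded as its base-2 logarithm in the bits starting at MAP_HUGE_SHIFT, so 1 GB pages yield 30 << 26, the value Linux defines as MAP_HUGE_1GB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-ins for Linux's MAP_HUGE_SHIFT and MAP_HUGE_MASK, renamed so
 * that this sketch compiles even where <sys/mman.h> defines the originals.
 */
#define SKETCH_MAP_HUGE_SHIFT 26
#define SKETCH_MAP_HUGE_MASK 0x3fu

/* Smallest shift such that ((uint64_t) 1 << shift) >= x; 0 for x <= 1. */
static int
ceil_log2_u64(uint64_t x)
{
	int			shift = 0;

	while (shift < 64 && ((uint64_t) 1 << shift) < x)
		shift++;
	return shift;
}

/*
 * Extra page-size bits ORed into the mmap flags alongside MAP_HUGETLB.
 * As in the hunk, nothing is added when the requested size equals the
 * system default huge page size.
 */
static uint32_t
huge_page_size_flag_bits(uint64_t hugepagesize, uint64_t default_hugepagesize)
{
	uint32_t	shift;

	assert(hugepagesize != 0);
	if (hugepagesize == default_hugepagesize)
		return 0;
	shift = (uint32_t) ceil_log2_u64(hugepagesize);
	return (shift & SKETCH_MAP_HUGE_MASK) << SKETCH_MAP_HUGE_SHIFT;
}
```

For example, requesting 1 GB pages on a system whose default is 2 MB adds 30 << 26. Requesting 2 MB pages on the same system adds nothing, and plain MAP_HUGETLB is used.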
Should we also initialize the shared memory GUCs in bootstrap and
single-user mode? I think I missed this in bd17880.
Nathan
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 48615c0ebc..4c4cf44871 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -324,6 +324,12 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
InitializeMaxBackends();
+ /*
+ * Initialize runtime-computed GUCs that depend on the amount of shared
+ * memory required.
+ */
+ InitializeShmemGUCs();
+
CreateSharedMemoryAndSemaphores();
/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0775abe35d..cae0b079b9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3978,6 +3978,12 @@ PostgresSingleUserMain(int argc, char *argv[],
/* Initialize MaxBackends */
InitializeMaxBackends();
+ /*
+ * Initialize runtime-computed GUCs that depend on the amount of shared
+ * memory required.
+ */
+ InitializeShmemGUCs();
+
CreateSharedMemoryAndSemaphores();
/*
On Fri, Sep 17, 2021 at 04:31:44PM +0000, Bossart, Nathan wrote:
Done.
Thanks. I have gone through the last patch this morning, did some
tests on all the platforms I have at hand (including Linux) and
finished by applying it after doing some small tweaks. First, I have
finished by extending GetHugePageSize() to accept NULL for its first
argument hugepagesize. A second thing was in the docs, where it is
still useful IMO to keep the reference to /proc/meminfo and
/sys/kernel/mm/hugepages to let users know how the system impacts the
calculation of the new GUC.
Let's see what the buildfarm thinks about it.
--
Michael
On Tue, Sep 21, 2021 at 12:08:22AM +0000, Bossart, Nathan wrote:
Should we also initialize the shared memory GUCs in bootstrap and
single-user mode? I think I missed this in bd17880.
Why would we need that for the bootstrap mode?
While looking at the patch for shared_memory_size, I have looked at
those code paths to note that some of the runtime GUCs would be set
thanks to the load of the control file, but supporting this case
sounded rather limited to me for --single when it came to shared
memory and huge page estimation and we don't load
shared_preload_libraries in this context either, which could lead to
wrong estimations. Anyway, I am not going to fight hard if people
would like that for the --single mode, even if it may lead to an
underestimation of the shmem allocated.
--
Michael
On 9/20/21, 6:48 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
Thanks. I have gone through the last patch this morning, did some
tests on all the platforms I have at hand (including Linux) and
finished by applying it after doing some small tweaks. First, I have
finished by extending GetHugePageSize() to accept NULL for its first
argument hugepagesize. A second thing was in the docs, where it is
still useful IMO to keep the reference to /proc/meminfo and
/sys/kernel/mm/hugepages to let users know how the system impacts the
calculation of the new GUC.
Let's see what the buildfarm thinks about it.
Thanks!
Nathan
On 9/20/21, 7:29 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
On Tue, Sep 21, 2021 at 12:08:22AM +0000, Bossart, Nathan wrote:
Should we also initialize the shared memory GUCs in bootstrap and
single-user mode? I think I missed this in bd17880.
Why would we need that for the bootstrap mode?
While looking at the patch for shared_memory_size, I have looked at
those code paths to note that some of the runtime GUCs would be set
thanks to the load of the control file, but supporting this case
sounded rather limited to me for --single when it came to shared
memory and huge page estimation and we don't load
shared_preload_libraries in this context either, which could lead to
wrong estimations. Anyway, I am not going to fight hard if people
would like that for the --single mode, even if it may lead to an
underestimation of the shmem allocated.
I was looking at this from the standpoint of keeping the startup steps
consistent between the modes. Looking again, I can't think of
a strong reason to add it to bootstrap mode. I think the case for
adding it to single-user mode is a bit stronger, as commands like
"SHOW shared_memory_size;" currently return 0. I lean in favor of
adding it for single-user mode, but it's probably fine either way.
Nathan
On Tue, Sep 21, 2021 at 04:06:38PM +0000, Bossart, Nathan wrote:
I was looking at this from the standpoint of keeping the startup steps
consistent between the modes. Looking again, I can't think of
a strong reason to add it to bootstrap mode. I think the case for
adding it to single-user mode is a bit stronger, as commands like
"SHOW shared_memory_size;" currently return 0. I lean in favor of
adding it for single-user mode, but it's probably fine either way.
I am not sure either as that's a tradeoff between an underestimation
and no information. The argument of consistency indeed matters.
Let's see if others have any opinion to share on this point.
--
Michael
On Tue, Sep 21, 2021 at 11:53 PM Michael Paquier <michael@paquier.xyz> wrote:
I am not sure either as that's a tradeoff between an underestimation
and no information. The argument of consistency indeed matters.
Let's see if others have any opinion to share on this point.
Well, if we think the information won't be safe to use, it's better to
report nothing than a wrong value, I think.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Sep 9, 2021 at 11:53 PM Bossart, Nathan <bossartn@amazon.com> wrote:
On 9/8/21, 9:19 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
FWIW, I don't have an environment at hand these days to test properly
0001, so this will have to wait a bit. I really like the approach
taken by 0002, and it is independent of the other patch while
extending support for postgres -c to provide the correct runtime
values. So let's wrap this part first. No need to send a reorganized
patch set.
Sounds good.
For 0001, the biggest thing on my mind at the moment is the name of
the GUC. "huge_pages_required" feels kind of ambiguous. From the
name alone, it could mean either "the number of huge pages required"
or "huge pages are required for the server to run." Also, the number
of huge pages required is not actually required if you don't want to
run the server with huge pages. I think it might be clearer to
somehow indicate that the value is essentially the size of the main
shared memory area in terms of the huge page size, but I'm not sure
how to do that concisely. Perhaps it is enough to just make sure the
description of "huge_pages_required" is detailed enough.
For 0002, I have two small concerns. My first concern is that it
might be confusing to customers when the runtime GUCs cannot be
returned for a running server. We have the note in the docs, but if
you're encountering it on the command line, it's not totally clear
what the problem is.
$ postgres -D . -C log_min_messages
warning
$ postgres -D . -C shared_memory_size
2021-09-09 18:51:21.617 UTC [7924] FATAL: lock file "postmaster.pid" already exists
2021-09-09 18:51:21.617 UTC [7924] HINT: Is another postmaster (PID 7912) running in data directory "/local/home/bossartn/pgdata"?
My other concern is that by default, viewing the runtime-computed GUCs
will also emit a LOG.
$ postgres -D . -C shared_memory_size
142
2021-09-09 18:53:25.194 UTC [8006] LOG: database system is shut down
Running these commands with log_min_messages=debug5 emits way more
information for the runtime-computed GUCs than for others, but IMO
that is alright. However, perhaps we should adjust the logging in
0002 to improve the default user experience. I attached an attempt at
that.
With the attached patch, trying to view a runtime-computed GUC on a
running server will look like this:
$ postgres -D . -C shared_memory_size
2021-09-09 21:24:21.552 UTC [6224] FATAL: lock file "postmaster.pid" already exists
2021-09-09 21:24:21.552 UTC [6224] DETAIL: Runtime-computed GUC "shared_memory_size" cannot be viewed on a running server.
2021-09-09 21:24:21.552 UTC [6224] HINT: Is another postmaster (PID 3628) running in data directory "/local/home/bossartn/pgdata"?
And viewing a runtime-computed GUC on a server that is shut down will
look like this:
$ postgres -D . -C shared_memory_size
142
Nothing fixing this ended up actually getting committed, right? That
is, we still get the extra log output?
And in fact, the command documented on
https://www.postgresql.org/docs/devel/kernel-resources.html doesn't
actually produce the output that the docs show, it also shows the log
line, in the default config? If we can't fix the extra logging we
should at least have our docs represent reality -- maybe by adding a
"2>/dev/null" entry? But it'd be better to have it not output that log
in the first place...
(Of course what I'd really want is to be able to run it on a cluster
that's running, but that was discussed downthread so I'm not bringing
that one up for changes now)
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Mon, Mar 14, 2022 at 04:26:43PM +0100, Magnus Hagander wrote:
Nothing fixing this ended up actually getting committed, right? That
is, we still get the extra log output?
Correct.
And in fact, the command documented on
https://www.postgresql.org/docs/devel/kernel-resources.html doesn't
actually produce the output that the docs show, it also shows the log
line, in the default config? If we can't fix the extra logging we
should at least have our docs represent reality -- maybe by adding a
"2>/dev/null" entry? But it'd be better to have it not output that log
in the first place...
I attached a patch to adjust the documentation for now. This apparently
crossed my mind earlier [0], but I didn't follow through with it for some
reason.
(Of course what I'd really want is to be able to run it on a cluster
that's running, but that was discussed downthread so I'm not bringing
that one up for changes now)
I think it is worth revisiting the extra logging and the ability to view
runtime-computed GUCs on a running server. Should this be an open item for
v15, or do you think it's alright to leave it for the v16 development
cycle?
[0]: /messages/by-id/C45224E1-29C8-414C-A8E6-B718C07ACB94@amazon.com
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-send-stderr-to-dev-null-in-Linux-Huge-Pages-docum.patch (text/x-diff; charset=us-ascii)
From 7ee7b176c5280349631426ff047a9df394e26d59 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Mon, 14 Mar 2022 10:24:48 -0700
Subject: [PATCH v1 1/1] send stderr to /dev/null in Linux Huge Pages
documentation
---
doc/src/sgml/runtime.sgml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f77ed24204..85b3ffcd71 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1448,7 +1448,7 @@ export PG_OOM_ADJUST_VALUE=0
server must be shut down to view this runtime-computed parameter.
This might look like:
<programlisting>
-$ <userinput>postgres -D $PGDATA -C shared_memory_size_in_huge_pages</userinput>
+$ <userinput>postgres -D $PGDATA -C shared_memory_size_in_huge_pages 2> /dev/null</userinput>
3170
$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
Hugepagesize: 2048 kB
--
2.25.1
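A deployment tool, such as the Chef recipe that started this thread, still needs to validate the value printed on standard output before writing it to vm.nr_hugepages. Below is a minimal C sketch with hypothetical function names. It treats -1 as failure, since the GUC description says that value means the count could not be determined, and it rejects any non-numeric text the same way. It also adds an optional allowance for other programs on the host that need huge pages, following the documentation's note that a larger setting may be appropriate in that case.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse the text printed by
 *     postgres -D "$PGDATA" -C shared_memory_size_in_huge_pages
 * Returns the page count, or -1 if the server printed -1 (value could not
 * be determined) or the text is not a non-negative integer.
 */
static long
parse_huge_pages_value(const char *text)
{
	char	   *end;
	long		value;

	assert(text != NULL);
	errno = 0;
	value = strtol(text, &end, 10);
	if (end == text || errno != 0 || value < 0)
		return -1;

	/* Accept only trailing whitespace, such as the newline from puts(). */
	while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
		end++;
	return *end == '\0' ? value : -1;
}

/*
 * Hypothetical policy helper: the value to write to vm.nr_hugepages when
 * other programs on the host need extra_pages huge pages of the same size.
 */
static long
nr_hugepages_to_reserve(long postgres_pages, long extra_pages)
{
	if (postgres_pages < 0 || extra_pages < 0 ||
		extra_pages > LONG_MAX - postgres_pages)
		return -1;
	return postgres_pages + extra_pages;
}
```

With the documentation's output of 3170 and no other huge page users, nr_hugepages_to_reserve(parse_huge_pages_value("3170\n"), 0) yields 3170.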
On Mon, Mar 14, 2022 at 10:34:17AM -0700, Nathan Bossart wrote:
On Mon, Mar 14, 2022 at 04:26:43PM +0100, Magnus Hagander wrote:
And in fact, the command documented on
https://www.postgresql.org/docs/devel/kernel-resources.html doesn't
actually produce the output that the docs show, it also shows the log
line, in the default config? If we can't fix the extra logging we
should at least have our docs represent reality -- maybe by adding a
"2>/dev/null" entry? But it'd be better to have it not output that log
in the first place...
I attached a patch to adjust the documentation for now. This apparently
crossed my mind earlier [0], but I didn't follow through with it for some
reason.
Another thing that we can add is -c log_min_messages=fatal, but my
method is more complicated than a simple redirection, of course :)
(Of course what I'd really want is to be able to run it on a cluster
that's running, but that was discussed downthread so I'm not bringing
that one up for changes now)
I think it is worth revisiting the extra logging and the ability to view
runtime-computed GUCs on a running server. Should this be an open item for
v15, or do you think it's alright to leave it for the v16 development
cycle?
Well, this is a completely new problem as it opens the door of
potential concurrent access issues with the data directory lock file
while reading values from the control file. And that's not mandatory
to be able to get those estimations without having to allocate a large
chunk of memory, which was the primary goal discussed upthread as far
as I recall. So I would leave that as an item to potentially tackle
in future versions.
--
Michael
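Runtime-computed parameters can only be read with postgres -C while the server is shut down, so a deployment script may want to detect a running postmaster before trying. The following is a hypothetical pre-flight check in POSIX C, not something PostgreSQL provides. It reads the PID from the first line of postmaster.pid (as in the older documentation's head -1 $PGDATA/postmaster.pid example) and probes it with kill(pid, 0). A stale pid file whose PID has been reused by an unrelated process would give a false positive, so the server's own lock-file check remains authoritative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the PID on the first line of the given postmaster.pid file
 * belongs to a live process, 0 if the file is missing or the PID is not
 * running, and -1 if the file cannot be opened or parsed.
 */
static int
pidfile_owner_alive(const char *pidfile_path)
{
	FILE	   *fp;
	long		pid;
	int			matched;

	assert(pidfile_path != NULL);
	fp = fopen(pidfile_path, "r");
	if (fp == NULL)
		return errno == ENOENT ? 0 : -1;

	matched = fscanf(fp, "%ld", &pid);
	fclose(fp);
	if (matched != 1 || pid <= 0)
		return -1;

	/* Signal 0 only checks that the process exists and may be signaled. */
	if (kill((pid_t) pid, 0) == 0)
		return 1;
	/* EPERM: the process exists but belongs to another user. */
	return errno == EPERM ? 1 : 0;
}
```

A script would call this with "$PGDATA/postmaster.pid" and run postgres -C only when it returns 0.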
On Tue, Mar 15, 2022 at 3:41 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Mar 14, 2022 at 10:34:17AM -0700, Nathan Bossart wrote:
On Mon, Mar 14, 2022 at 04:26:43PM +0100, Magnus Hagander wrote:
And in fact, the command documented on
https://www.postgresql.org/docs/devel/kernel-resources.html doesn't
actually produce the output that the docs show, it also shows the log
line, in the default config? If we can't fix the extra logging we
should at least have our docs represent reality -- maybe by adding a
"2>/dev/null" entry? But it'd be better to have it not output that log
in the first place...
I attached a patch to adjust the documentation for now. This apparently
crossed my mind earlier [0], but I didn't follow through with it for some
reason.
Another thing that we can add is -c log_min_messages=fatal, but my
method is more complicated than a simple redirection, of course :)
Either does work, but yours has more characters :)
(Of course what I'd really want is to be able to run it on a cluster
that's running, but that was discussed downthread so I'm not bringing
that one up for changes now)
I think it is worth revisiting the extra logging and the ability to view
runtime-computed GUCs on a running server. Should this be an open item for
v15, or do you think it's alright to leave it for the v16 development
cycle?
Well, this is a completely new problem as it opens the door of
potential concurrent access issues with the data directory lock file
while reading values from the control file. And that's not mandatory
to be able to get those estimations without having to allocate a large
chunk of memory, which was the primary goal discussed upthread as far
as I recall. So I would leave that as an item to potentially tackle
in future versions.
I think we're talking about two different things here.
I think the "avoid extra logging" would be worth seeing if we can
address for 15.
The "able to run on a live cluster" seems a lot bigger and more scary
and not 15 material.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Tue, Mar 15, 2022 at 11:02:37PM +0100, Magnus Hagander wrote:
I think we're talking about two different things here.
I think the "avoid extra logging" would be worth seeing if we can
address for 15.
A simple approach could be to just set log_min_messages to PANIC before
exiting. I've attached a patch for this. With this patch, we'll still see
a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
running server, and there will be no extra output as long as the user sets
log_min_messages to INFO or higher (i.e., not a DEBUG* value). For
comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
extra output as long as the user sets log_min_messages to DEBUG2 or higher.
The "able to run on a live cluster" seems a lot bigger and more scary
and not 15 material.
+1
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-don-t-emit-shutdown-messages-for-postgres-C-with-.patch (text/x-diff; charset=us-ascii)
From cdfd1ad00ca1d8afbfcbafc1f684b5ba9cc43eb6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Tue, 15 Mar 2022 15:36:41 -0700
Subject: [PATCH v2 1/1] don't emit shutdown messages for 'postgres -C' with
runtime-computed GUCs
---
src/backend/postmaster/postmaster.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 80bb269599..bf48bc6326 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1066,6 +1066,10 @@ PostmasterMain(int argc, char *argv[])
false, false);
puts(config_val ? config_val : "");
+
+ /* don't emit shutdown messages */
+ SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE);
+
ExitPostmaster(0);
}
--
2.25.1
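Why does the shutdown LOG line appear even with the default log_min_messages of warning? For the server log, PostgreSQL documents LOG as ranking between ERROR and FATAL, unlike its rank for client_min_messages. The simplified model below uses hypothetical enum and function names to encode that documented ordering. It shows that raising the threshold to PANIC, as the v2 patch does just before ExitPostmaster(0), suppresses the LOG line. It would equally hide any FATAL raised after that point.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical names for the log_min_messages severities, listed in the
 * order the documentation gives for the server log.  LOG sits between
 * ERROR and FATAL here.
 */
typedef enum
{
	SK_DEBUG5, SK_DEBUG4, SK_DEBUG3, SK_DEBUG2, SK_DEBUG1,
	SK_INFO, SK_NOTICE, SK_WARNING, SK_ERROR, SK_LOG, SK_FATAL, SK_PANIC
} SketchServerLogLevel;

/* True if a message at msg_level is written to the server log. */
static bool
reaches_server_log(SketchServerLogLevel msg_level,
				   SketchServerLogLevel log_min_messages)
{
	assert((int) log_min_messages <= (int) SK_PANIC);
	return msg_level >= log_min_messages;
}
```

Under this model, reaches_server_log(SK_LOG, SK_WARNING) is true, which matches the extra "database system is shut down" line reported upthread.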
On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
A simple approach could be to just set log_min_messages to PANIC before
exiting. I've attached a patch for this. With this patch, we'll still see
a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
running server, and there will be no extra output as long as the user sets
log_min_messages to INFO or higher (i.e., not a DEBUG* value). For
comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
extra output as long as the user sets log_min_messages to DEBUG2 or higher.
I created a commitfest entry for this:
https://commitfest.postgresql.org/38/3596/
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
A simple approach could be to just set log_min_messages to PANIC before
exiting. I've attached a patch for this. With this patch, we'll still see
a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
running server, and there will be no extra output as long as the user sets
log_min_messages to INFO or higher (i.e., not a DEBUG* value). For
comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
extra output as long as the user sets log_min_messages to DEBUG2 or higher.
puts(config_val ? config_val : ""); + + /* don't emit shutdown messages */ + SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE); + ExitPostmaster(0);
That's fancy, but I don't like that much. And this would not protect
against any messages generated before this code path, either,
even if that should be enough for the current HEAD.
My solution for the docs is perhaps too confusing for the end-user,
and we are talking about a Linux-only thing here anyway. So, at the
end, I am tempted to just add the "2> /dev/null" as suggested upthread
by Nathan and call it a day. Does that sound fine?
--
Michael
On Wed, Mar 23, 2022 at 03:25:48PM +0900, Michael Paquier wrote:
My solution for the docs is perhaps too confusing for the end-user,
and we are talking about a Linux-only thing here anyway. So, at the
end, I am tempted to just add the "2> /dev/null" as suggested upthread
by Nathan and call it a day.
This still sounds like the best way to go for now, so done this way as
of bbd4951.
--
Michael
On Wed, Mar 23, 2022 at 7:25 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
A simple approach could be to just set log_min_messages to PANIC before
exiting. I've attached a patch for this. With this patch, we'll still see
a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
running server, and there will be no extra output as long as the user sets
log_min_messages to INFO or higher (i.e., not a DEBUG* value). For
comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
extra output as long as the user sets log_min_messages to DEBUG2 or higher.
puts(config_val ? config_val : "");
+
+/* don't emit shutdown messages */
+SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE);
+
ExitPostmaster(0);
That's fancy, but I don't like that much. And this would not protect
against any messages generated before this code path either,
But neither would the suggestion of redirecting stderr to /dev/null.
In fact, doing the redirect will *also* throw away any FATAL that
happens. Using the 2>/dev/null method, we *also* remove the
message that says there's another postmaster running in this
directory, which is strictly worse than the override of
log_min_messages.
That said, the redirect can be removed without recompiling postgres,
so it is probably still the better choice as a temporary workaround.
But we should really look into getting a better solution in place once
we start on 16.
My solution for the docs is perhaps too confusing for the end-user,
and we are talking about a Linux-only thing here anyway. So, at the
end, I am tempted to just add the "2> /dev/null" as suggested upthread
by Nathan and call it a day. Does that sound fine?
What would be a Linux-only thing?
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote:
On Wed, Mar 23, 2022 at 7:25 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
A simple approach could be to just set log_min_messages to PANIC before
exiting. I've attached a patch for this. With this patch, we'll still see
a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
running server, and there will be no extra output as long as the user sets
log_min_messages to INFO or higher (i.e., not a DEBUG* value). For
comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
extra output as long as the user sets log_min_messages to DEBUG2 or higher.
puts(config_val ? config_val : "");
+
+/* don't emit shutdown messages */
+SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, PGC_S_OVERRIDE);
+
ExitPostmaster(0);
That's fancy, but I don't like that much. And this would not protect
against any messages generated before this code path either,
But neither would the suggestion of redirecting stderr to /dev/null.
In fact, doing the redirect will *also* throw away any FATAL that
happens. Using the 2>/dev/null method, we *also* remove the
message that says there's another postmaster running in this
directory, which is strictly worse than the override of
log_min_messages.
That said, the redirect can be removed without recompiling postgres,
so it is probably still the better choice as a temporary workaround.
But we should really look into getting a better solution in place once
we start on 16.
A couple of other options to consider:
1) Always set log_min_messages to WARNING/ERROR/FATAL for 'postgres -C'.
We might need some special logic for handling the case where the user is
inspecting the log_min_messages parameter. With this approach, you'd
probably never get extra output unless something was wrong (e.g., database
already running when inspecting a runtime-computed GUC). Also, this would
silence any extra output that you might see today with non-runtime-computed
GUCs.
2) Add some way to skip just the shutdown message (e.g., a variable set
when output_config_variable is true). With this approach, you wouldn't get
extra output by default, but you still might if log_min_messages is set to
something like DEBUG3. This wouldn't impact any extra output that you see
today with non-runtime-computed GUCs.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Mar 24, 2022 at 01:31:08PM -0700, Nathan Bossart wrote:
A couple of other options to consider:
1) Always set log_min_messages to WARNING/ERROR/FATAL for 'postgres -C'.
We might need some special logic for handling the case where the user is
inspecting the log_min_messages parameter. With this approach, you'd
probably never get extra output unless something was wrong (e.g., database
already running when inspecting a runtime-computed GUC). Also, this would
silence any extra output that you might see today with non-runtime-computed
GUCs.
2) Add some way to skip just the shutdown message (e.g., a variable set
when output_config_variable is true). With this approach, you wouldn't get
extra output by default, but you still might if log_min_messages is set to
something like DEBUG3. This wouldn't impact any extra output that you see
today with non-runtime-computed GUCs.
I've attached a first attempt at option 1.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Silence-extra-logging-with-postgres-C.patch (text/x-diff; charset=us-ascii)
From f36d04f5d19c673a7be788d6d4955b148f580095 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Mon, 28 Mar 2022 10:12:23 -0700
Subject: [PATCH v3 1/1] Silence extra logging with 'postgres -C'.
Presently, the server may emit a variety of extra log messages when inspecting
GUC values. For example, when inspecting a runtime-computed GUC, the server
will always emit a "database system is shut down" LOG (unless the user has set
log_min_messages higher than LOG). To avoid these extra log messages, this
change sets log_min_messages to FATAL when -C is used (unless the user has set
it to PANIC). At FATAL, the user will still receive messages explaining why a
GUC's value cannot be inspected.
---
src/backend/postmaster/postmaster.c | 34 +++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 80bb269599..7ee20d7c69 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -897,6 +897,26 @@ PostmasterMain(int argc, char *argv[])
if (output_config_variable != NULL)
{
+ const char *logging_setting;
+ int flags;
+
+ /*
+ * Silence any extra log messages when -C is used. Before adjusting
+ * log_min_messages, we save its value in case the user is inspecting
+ * it. If we are inspecting a runtime-computed GUC, it is possible that
+ * a preloaded library's _PG_init() will override our custom setting for
+ * log_min_messages, but trying to deal with that is probably more
+ * trouble than it's worth.
+ *
+ * If the user has set log_min_messages to PANIC, we leave it at PANIC.
+ * This has the side effect of silencing helpful FATAL messages (e.g.,
+ * when trying to inspect a runtime-computed GUC on a running server),
+ * but presumably the user has set it there for good reason.
+ */
+ logging_setting = GetConfigOption("log_min_messages", false, false);
+ if (log_min_messages < FATAL)
+ SetConfigOption("log_min_messages", "FATAL", PGC_INTERNAL, PGC_S_OVERRIDE);
+
/*
* If this is a runtime-computed GUC, it hasn't yet been initialized,
* and the present value is not useful. However, this is a convenient
@@ -908,17 +928,23 @@ PostmasterMain(int argc, char *argv[])
* on running servers for those GUCs, but using this option now would
* lead to incorrect results for them.
*/
- int flags = GetConfigOptionFlags(output_config_variable, true);
+ flags = GetConfigOptionFlags(output_config_variable, true);
if ((flags & GUC_RUNTIME_COMPUTED) == 0)
{
+ const char *config_val;
+
/*
* "-C guc" was specified, so print GUC's value and exit. No
* extra permission check is needed because the user is reading
- * inside the data dir.
+ * inside the data dir. If we are inspecting log_min_messages,
+ * print the saved value.
*/
- const char *config_val = GetConfigOption(output_config_variable,
- false, false);
+ if (pg_strcasecmp(output_config_variable, "log_min_messages") == 0)
+ config_val = logging_setting;
+ else
+ config_val = GetConfigOption(output_config_variable,
+ false, false);
puts(config_val ? config_val : "");
ExitPostmaster(0);
--
2.25.1
On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote:
But neither would the suggestion of redirecting stderr to /dev/null.
In fact, doing the redirect will *also* throw away any FATAL that
happens. Using the 2>/dev/null method, we *also* remove the
message that says there's another postmaster running in this
directory, which is strictly worse than the override of
log_min_messages.
Well, we could also tweak the command further, redirecting
stderr to a log file or such, and tell users to look at it for errors.
That said, the redirect can be removed without recompiling postgres,
so it is probably still the better choice as a temporary workaround.
But we should really look into getting a better solution in place once
we start on 16.
But do we really need a better or more invasive solution for already
running servers, though? A SHOW command would be able to do the work
already in this case. This would lack consistency compared to the
offline case, but we are not without options either. That leaves the
case where the server is running and has allocated memory but is not
ready to accept connections, like during crash recovery, but this use
case looks rather thin to me.
My solution for the docs is perhaps too confusing for the end-user,
and we are talking about a Linux-only thing here anyway. So, at the
end, I am tempted to just add the "2> /dev/null" as suggested upthread
by Nathan and call it a day. Does that sound fine?
What would be a Linux-only thing?
Perhaps not at some point in the future, but for now it is in a section
of the docs that applies only to Linux.
--
Michael
On Wed, Apr 20, 2022, 00:12 Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote:
But neither would the suggestion of redirecting stderr to /dev/null.
In fact, doing the redirect will *also* throw away any FATAL that
happens. Using the 2>/dev/null method, we *also* remove the
message that says there's another postmaster running in this
directory, which is strictly worse than the override of
log_min_messages.
Well, we could also tweak the command further, redirecting
stderr to a log file or such, and tell users to look at it for errors.
That would be a pretty terrible UX, though.
That said, the redirect can be removed without recompiling postgres,
so it is probably still the better choice as a temporary workaround.
But we should really look into getting a better solution in place once
we start on 16.
But do we really need a better or more invasive solution for already
running servers, though? A SHOW command would be able to do the work
already in this case. This would lack consistency compared to the
offline case, but we are not without options either. That leaves the
case where the server is running and has allocated memory but is not
ready to accept connections, like during crash recovery, but this use
case looks rather thin to me.
I agree that that's a very narrow use case. And I'm not sure the use case of
a running server is even that important here - it's really the offline one
that's important. Or rather, the really compelling one is when there is a
server running but I want to check the value offline because it will
change. SHOW doesn't help there because it shows the value based on the
currently running configuration, not the new one after a restart.
I don't agree that the redirect is a solution. It's a workaround.
My solution for the docs is perhaps too confusing for the end-user,
and we are talking about a Linux-only thing here anyway. So, at the
end, I am tempted to just add the "2> /dev/null" as suggested upthread
by Nathan and call it a day. Does that sound fine?
What would be a Linux-only thing?
Perhaps not at some point in the future, but for now it is in a section
of the docs that applies only to Linux.
Hmm. So what's the solution on Windows? I guess maybe it's not as important
there because there is no limit on huge pages, but generally getting the
expected shared memory usage might be useful? Just significantly less
important.
/Magnus
On Fri, Apr 22, 2022 at 09:49:34AM +0200, Magnus Hagander wrote:
I agree that that's a very narrow use case. And I'm not sure the use case of
a running server is even that important here - it's really the offline one
that's important. Or rather, the really compelling one is when there is a
server running but I want to check the value offline because it will
change. SHOW doesn't help there because it shows the value based on the
currently running configuration, not the new one after a restart.
You mean the case of a server where one would directly change
postgresql.conf on a running server, and use postgres -C to see how
much the kernel settings need to be changed before the restart?
Hmm. So what's the solution on Windows? I guess maybe it's not as important
there because there is no limit on huge pages, but generally getting the
expected shared memory usage might be useful? Just significantly less
important.
Contrary to Linux, we don't need to care about the number of large
pages that are necessary because there is no equivalent of
vm.nr_hugepages on Windows (see [1]), do we? If that were the case,
we'd have a use case for huge_page_size, additionally.
That's the case where shared_memory_size_in_huge_pages comes in
handy, as much as does huge_page_size, and note that
shared_memory_size works on WIN32.
[1]: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
--
Michael
On Mon, Apr 25, 2022 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Apr 22, 2022 at 09:49:34AM +0200, Magnus Hagander wrote:
I agree that that's a very narrow use case. And I'm not sure the use case of
a running server is even that important here - it's really the offline one
that's important. Or rather, the really compelling one is when there is a
server running but I want to check the value offline because it will
change. SHOW doesn't help there because it shows the value based on the
currently running configuration, not the new one after a restart.
You mean the case of a server where one would directly change
postgresql.conf on a running server, and use postgres -C to see how
much the kernel settings need to be changed before the restart?
Yes.
AIUI that was the original use-case for this feature. It certainly was for
me :)
Hmm. So what's the solution on Windows? I guess maybe it's not as important
there because there is no limit on huge pages, but generally getting the
expected shared memory usage might be useful? Just significantly less
important.
Contrary to Linux, we don't need to care about the number of large
pages that are necessary because there is no equivalent of
vm.nr_hugepages on Windows (see [1]), do we? If that were the case,
we'd have a use case for huge_page_size, additionally.
Right, for this one in particular -- that's what I meant with my comment
about there not being a limit. But this feature works for other settings as
well, not just the huge pages one. Exactly what the use-cases are can
vary, but surely they would have the same problems wrt redirects?
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Mon, Apr 25, 2022 at 04:55:25PM +0200, Magnus Hagander wrote:
AIUI that was the original use-case for this feature. It certainly was for
me :)
Perhaps we'd be fine with relaxing the requirements here knowing that
the control file should never be larger than PG_CONTROL_MAX_SAFE_SIZE
(aka the read should be atomic so it could be made lockless). At the
end of the day, to be absolutely correct in the shmem size estimation,
I think that we are going to need what's proposed here or the sizing
may not be right depending on how extensions adjust GUCs when their
_PG_init() runs:
/messages/by-id/20220419154658.GA2487941@nathanxps13
That's a bit independent, but not completely unrelated either
depending on how exact you want your number of estimated huge pages to
be. Just wanted to mention it.
Contrary to Linux, we don't need to care about the number of large
pages that are necessary because there is no equivalent of
vm.nr_hugepages on Windows (see [1]), do we? If that were the case,
we'd have a use case for huge_page_size, additionally.
Right, for this one in particular -- that's what I meant with my comment
about there not being a limit. But this feature works for other settings as
well, not just the huge pages one. Exactly what the use-cases are can
vary, but surely they would have the same problems wrt redirects?
Yes, the redirection issue would apply to all the run-time GUCs.
--
Michael
On Tue, Apr 26, 2022 at 10:34:06AM +0900, Michael Paquier wrote:
Yes, the redirection issue would apply to all the run-time GUCs.
Should this be tracked as an open item for v15? There was another recent
report about the extra log output [0].
[0]: /messages/by-id/YnARlI5nvbziobR4@momjian.us
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, May 06, 2022 at 10:13:18AM -0700, Nathan Bossart wrote:
On Tue, Apr 26, 2022 at 10:34:06AM +0900, Michael Paquier wrote:
Yes, the redirection issue would apply to all the run-time GUCs.
Should this be tracked as an open item for v15? There was another recent
report about the extra log output [0].
That makes two complaints on two separate threads, so an open item
to adjust this behavior seems appropriate.
I have looked at the patch posted at [1], and I don't quite understand
why you need the extra dance with log_min_messages. Why don't you
just set the GUC at the end of the code path in PostmasterMain() where
we print non-runtime-computed parameters? I am not really worried
about users deciding to set log_min_messages to PANIC in
postgresql.conf when it comes to postgres -C, TBH, as they'd miss the
FATAL messages if the command is attempted on a server already
starting.
Something like the attached.
[1]: /messages/by-id/20220328173503.GA137769@nathanxps13
--
Michael
Attachments:
v4-0001-Silence-extra-logging-with-postgres-C.patch (text/x-diff; charset=us-ascii)
From 3b8a7f8079955cd59a5a318adf6938cdd3c29c6b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 9 May 2022 15:50:19 +0900
Subject: [PATCH v4] Silence extra logging with 'postgres -C'.
Presently, the server may emit a variety of extra log messages when
inspecting GUC values. For example, when inspecting a runtime-computed
GUC, the server will always emit a "database system is shut down" LOG
(unless the user has set log_min_messages higher than LOG). To avoid
these extra log messages, this change sets log_min_messages to FATAL
when -C is used (even if set to PANIC in postgresql.conf). At FATAL,
the user will still receive messages explaining why a GUC's value cannot
be inspected.
---
src/backend/postmaster/postmaster.c | 10 ++++++++++
doc/src/sgml/runtime.sgml | 2 +-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ce4007bb2c..38b63bc215 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -913,6 +913,16 @@ PostmasterMain(int argc, char *argv[])
puts(config_val ? config_val : "");
ExitPostmaster(0);
}
+
+ /*
+ * A runtime-computed GUC will be printed later on. As we initialize
+ * a server startup sequence, silence any log messages that may show up
+ * in the output generated. FATAL and more severe messages are useful
+ * to show, even if one would only expect at least PANIC. LOG entries
+ * are hidden.
+ */
+ SetConfigOption("log_min_messages", "FATAL", PGC_INTERNAL,
+ PGC_S_OVERRIDE);
}
/* Verify that DataDir looks reasonable */
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 4465c876b1..62cec614d3 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1448,7 +1448,7 @@ export PG_OOM_ADJUST_VALUE=0
server must be shut down to view this runtime-computed parameter.
This might look like:
<programlisting>
-$ <userinput>postgres -D $PGDATA -C shared_memory_size_in_huge_pages 2> /dev/null</userinput>
+$ <userinput>postgres -D $PGDATA -C shared_memory_size_in_huge_pages</userinput>
3170
$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
Hugepagesize: 2048 kB
--
2.36.0
On Mon, May 09, 2022 at 03:53:24PM +0900, Michael Paquier wrote:
I have looked at the patch posted at [1], and I don't quite understand
why you need the extra dance with log_min_messages. Why don't you
just set the GUC at the end of the code path in PostmasterMain() where
we print non-runtime-computed parameters?
The log_min_messages dance avoids extra output when inspecting
non-runtime-computed GUCs, like this:
~/pgdata$ postgres -D . -C log_min_messages -c log_min_messages=debug5
debug5
2022-05-10 09:06:04.728 PDT [3715607] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2022-05-10 09:06:04.728 PDT [3715607] DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2022-05-10 09:06:04.728 PDT [3715607] DEBUG: proc_exit(0): 0 callbacks to make
2022-05-10 09:06:04.728 PDT [3715607] DEBUG: exit(0)
AFAICT you need to set log_min_messages to at least DEBUG3 to see extra
output for the non-runtime-computed GUCs, so it might not be worth the
added complexity.
I am not really worrying
about users deciding to set log_min_messages to PANIC in
postgresql.conf when it comes to postgres -C, TBH, as they'd miss the
FATAL messages if the command is attempted on a server already
starting.
I don't have a strong opinion on this one.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, May 10, 2022 at 09:12:49AM -0700, Nathan Bossart wrote:
AFAICT you need to set log_min_messages to at least DEBUG3 to see extra
output for the non-runtime-computed GUCs, so it might not be worth the
added complexity.
This set of messages has been showing up for ages with zero complaints from
the field AFAIK, and nobody would use this level of logging except
developers. One thing that overriding log_min_messages to FATAL does,
however, is to no longer show those debug3 messages when querying a
runtime-computed GUC, but those are exactly the kind of messages we'd hide. Your
patch would hide those entries in both cases. Perhaps we could do
that, but in the end, I don't really see any need to complicate this
code path more than necessary, and this is enough to silence the logs
in the cases we care about basically all the time, even if the log
levels are reduced a bit on a given cluster. Hence, I have applied
the simplest solution to just enforce a log_min_messages=FATAL when
requesting a runtime GUC.
--
Michael
On Wed, May 11, 2022 at 02:34:25PM +0900, Michael Paquier wrote:
Hence, I have applied
the simplest solution to just enforce a log_min_messages=FATAL when
requesting a runtime GUC.
Thanks!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com