OpenBSD versus semaphores

Started by Tom Lane about 7 years ago, 7 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

I've been toying with OpenBSD lately, and soon noticed a seriously
annoying problem for running Postgres on it: by default, its limits
for SysV semaphores are only SEMMNS=60, SEMMNI=10. Not only does that
greatly constrain the number of connections for a single installation,
it means that our TAP tests fail because you can't start two postmasters
concurrently (cf [1]).
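
To make the constraint concrete, here is a rough sketch of how many process slots fit under a given pair of SysV limits. It assumes PostgreSQL allocates semaphores in sets of 16 usable ones plus one extra per set (17 in total), and that nothing else on the machine uses semaphores; the function name is made up for illustration.

```python
# Sketch only: capacity under system-wide SysV semaphore limits.
# Assumption: each PostgreSQL semaphore set holds 16 usable semaphores
# and costs 17 against SEMMNS (one extra per set).
SEMAS_PER_SET = 16
SEMAS_ALLOCATED_PER_SET = SEMAS_PER_SET + 1


def max_process_slots(semmns: int, semmni: int) -> int:
    """Process slots (backends plus workers and auxiliary processes)
    that fit under the given limits, summed over all postmasters."""
    sets_by_count = semmns // SEMAS_ALLOCATED_PER_SET  # limited by SEMMNS
    usable_sets = min(semmni, sets_by_count)           # limited by SEMMNI
    return usable_sets * SEMAS_PER_SET


print(max_process_slots(semmns=60, semmni=10))  # prints 48
```

Under these assumptions SEMMNS, not SEMMNI, is the binding limit: only three sets fit, and they have to be shared by every postmaster running on the machine.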

Raising the annoyance factor considerably, AFAICT the only way to
increase these settings is to build your own custom kernel.

So I looked around for an alternative, and found out that modern
OpenBSD releases support named POSIX semaphores (though not unnamed
ones, at least not shared unnamed ones). What's more, it appears that
in this implementation, named semaphores don't eat open file descriptors
as they do on macOS, removing our major objection to using them.

I don't have any OpenBSD installation on hardware that I'd take very
seriously for performance testing, but some light testing with
"pgbench -S" suggests that a build with PREFERRED_SEMAPHORES=NAMED_POSIX
has just about the same performance as a build with SysV semaphores.

This all leads to the thought that maybe we should be selecting
PREFERRED_SEMAPHORES=NAMED_POSIX on OpenBSD. At the very least,
our docs ought to recommend it as a credible alternative for
people who don't want to get into building custom kernels.

I've checked that this works back to OpenBSD 6.0, and scanning
their man pages suggests that the feature appeared in 5.5.
5.5 isn't that old (2014) so possibly people are still running
older versions, but we could easily put in version-specific
default logic similar to what's in src/template/darwin.

Thoughts?

regards, tom lane

[1]: /messages/by-id/e6ecf989-9d5a-9dc5-12de-96985b6e5a5f@mksoft.nu

#2 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#1)
Re: OpenBSD versus semaphores

On Tue, Jan 8, 2019 at 7:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been toying with OpenBSD lately, and soon noticed a seriously
annoying problem for running Postgres on it: by default, its limits
for SysV semaphores are only SEMMNS=60, SEMMNI=10. Not only does that
greatly constrain the number of connections for a single installation,
it means that our TAP tests fail because you can't start two postmasters
concurrently (cf [1]).

Raising the annoyance factor considerably, AFAICT the only way to
increase these settings is to build your own custom kernel.

So I looked around for an alternative, and found out that modern
OpenBSD releases support named POSIX semaphores (though not unnamed
ones, at least not shared unnamed ones). What's more, it appears that
in this implementation, named semaphores don't eat open file descriptors
as they do on macOS, removing our major objection to using them.

I don't have any OpenBSD installation on hardware that I'd take very
seriously for performance testing, but some light testing with
"pgbench -S" suggests that a build with PREFERRED_SEMAPHORES=NAMED_POSIX
has just about the same performance as a build with SysV semaphores.

This all leads to the thought that maybe we should be selecting
PREFERRED_SEMAPHORES=NAMED_POSIX on OpenBSD. At the very least,
our docs ought to recommend it as a credible alternative for
people who don't want to get into building custom kernels.

I've checked that this works back to OpenBSD 6.0, and scanning
their man pages suggests that the feature appeared in 5.5.
5.5 isn't that old (2014) so possibly people are still running
older versions, but we could easily put in version-specific
default logic similar to what's in src/template/darwin.

Thoughts?

No OpenBSD here, but I was curious enough to peek at their
implementation. Like others, they create a tiny file under /tmp for
each one, mmap() and close the fd straight away. Apparently don't
support shared sem_init() yet (EPERM). So your plan seems good to me.
CC'ing Pierre-Emmanuel (OpenBSD PostgreSQL port maintainer) in case he
is interested.

Wild speculation: I wouldn't be surprised if POSIX named semas
perform better than SysV semas on a large enough system, since they'll
live on different pages. At a glance, their sys_semget apparently
allocates arrays of struct sem without padding and I think they
probably get about 4 to a cacheline; see our experience with an 8
socket box leading to commit 2d306759 where we added our own padding.

--
Thomas Munro
http://www.enterprisedb.com

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#2)
Re: OpenBSD versus semaphores

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Tue, Jan 8, 2019 at 7:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So I looked around for an alternative, and found out that modern
OpenBSD releases support named POSIX semaphores (though not unnamed
ones, at least not shared unnamed ones). What's more, it appears that
in this implementation, named semaphores don't eat open file descriptors
as they do on macOS, removing our major objection to using them.

No OpenBSD here, but I was curious enough to peek at their
implementation. Like others, they create a tiny file under /tmp for
each one, mmap() and close the fd straight away.

Oh, yeah, I can see a bunch of tiny mappings with procmap. I wonder
whether that scales any better than an open FD per semaphore, when
it comes to forking a bunch of child processes that will inherit
all those mappings or FDs. I've not tried to benchmark child process
launch as such --- as I said, I'm not running this on hardware that
would support serious benchmarking.

BTW, I just finished finding out that recent NetBSD (8.99.25) has
working code paths for *both* named and unnamed POSIX semaphores.
However, it appears that both code paths involve an open FD per
semaphore, so it's likely not something we want to recommend using.

regards, tom lane

#4 Mikael Kjellström
mikael.kjellstrom@mksoft.nu
In reply to: Tom Lane (#1)
Re: OpenBSD versus semaphores

On 2019-01-08 07:14, Tom Lane wrote:

I've been toying with OpenBSD lately, and soon noticed a seriously
annoying problem for running Postgres on it: by default, its limits
for SysV semaphores are only SEMMNS=60, SEMMNI=10. Not only does that
greatly constrain the number of connections for a single installation,
it means that our TAP tests fail because you can't start two postmasters
concurrently (cf [1]).

Raising the annoyance factor considerably, AFAICT the only way to
increase these settings is to build your own custom kernel.

You don't need to build your custom kernel to change those settings.

Just add:

kern.seminfo.semmni=20

to /etc/sysctl.conf and reboot

/Mikael

In reply to: Tom Lane (#1)
Re: OpenBSD versus semaphores

On Tue, Jan 8, 2019 at 12:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been toying with OpenBSD lately, and soon noticed a seriously
annoying problem for running Postgres on it: by default, its limits
for SysV semaphores are only SEMMNS=60, SEMMNI=10. Not only does that
greatly constrain the number of connections for a single installation,
it means that our TAP tests fail because you can't start two postmasters
concurrently (cf [1]).

Raising the annoyance factor considerably, AFAICT the only way to
increase these settings is to build your own custom kernel.

This is not accurate: you can change these values via sysctl(1). Extracted
from the OpenBSD postgresql port:

Tuning for busy servers
=======================
The default sizes in the GENERIC kernel for SysV semaphores are only
just large enough for a database with the default configuration
(max_connections 40) if no other running processes use semaphores.
In other cases you will need to increase the limits. Adding the
following in /etc/sysctl.conf will be reasonable for many systems:

kern.seminfo.semmni=60
kern.seminfo.semmns=1024

To serve a large number of connections (>250), you may need higher
values for the above.

http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/ports/databases/postgresql/pkg/README-server?rev=1.25&content-type=text/plain
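
As a rough cross-check of the README values, the sketch below estimates what a single postmaster needs. It follows the formula in the PostgreSQL "Managing Kernel Resources" documentation as I recall it; the constant for auxiliary processes has varied between releases, and the default settings used here (3 autovacuum workers, 8 worker processes, 10 WAL senders) are assumptions, as are the function names.

```python
import math

# Assumption: 16 usable semaphores per set, 17 counted against SEMMNS.
SEMAS_PER_SET = 16


def semaphore_needs(max_connections, autovacuum_max_workers=3,
                    max_worker_processes=8, max_wal_senders=10,
                    extra_processes=5):
    """Return (sets, semaphores) one postmaster would request.
    extra_processes stands in for auxiliary processes; its value
    is release-dependent and assumed here."""
    procs = (max_connections + autovacuum_max_workers
             + max_worker_processes + max_wal_senders + extra_processes)
    sets = math.ceil(procs / SEMAS_PER_SET)
    return sets, sets * (SEMAS_PER_SET + 1)


def fits(max_connections, semmni, semmns):
    """True if one postmaster fits, with no other semaphore users."""
    sets, semas = semaphore_needs(max_connections)
    return sets <= semmni and semas <= semmns


print(semaphore_needs(100))                  # prints (8, 136)
print(fits(100, semmni=10, semmns=60))       # prints False
print(fits(250, semmni=60, semmns=1024))     # prints True
```

With these assumptions, semmns=1024 allows exactly 60 sets of 17, which matches the suggested semmni=60.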

So I looked around for an alternative, and found out that modern
OpenBSD releases support named POSIX semaphores (though not unnamed
ones, at least not shared unnamed ones). What's more, it appears that
in this implementation, named semaphores don't eat open file descriptors
as they do on macOS, removing our major objection to using them.

I don't have any OpenBSD installation on hardware that I'd take very
seriously for performance testing, but some light testing with
"pgbench -S" suggests that a build with PREFERRED_SEMAPHORES=NAMED_POSIX
has just about the same performance as a build with SysV semaphores.

This all leads to the thought that maybe we should be selecting
PREFERRED_SEMAPHORES=NAMED_POSIX on OpenBSD. At the very least,
our docs ought to recommend it as a credible alternative for
people who don't want to get into building custom kernels.

I've checked that this works back to OpenBSD 6.0, and scanning
their man pages suggests that the feature appeared in 5.5.
5.5 isn't that old (2014) so possibly people are still running
older versions, but we could easily put in version-specific
default logic similar to what's in src/template/darwin.

Thoughts?

regards, tom lane

[1]
/messages/by-id/e6ecf989-9d5a-9dc5-12de-96985b6e5a5f@mksoft.nu

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mikael Kjellström (#4)
Re: OpenBSD versus semaphores

Mikael Kjellström <mikael.kjellstrom@mksoft.nu> writes:

On 2019-01-08 07:14, Tom Lane wrote:

Raising the annoyance factor considerably, AFAICT the only way to
increase these settings is to build your own custom kernel.

You don't need to build your custom kernel to change those settings.
Just add:
kern.seminfo.semmni=20
to /etc/sysctl.conf and reboot

Hm, I wonder when that came in? Our documentation doesn't know about it.

regards, tom lane

#7 Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#3)
1 attachment(s)
Re: OpenBSD versus semaphores

On Fri, Apr 2, 2021 at 9:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Tue, Jan 8, 2019 at 7:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So I looked around for an alternative, and found out that modern
OpenBSD releases support named POSIX semaphores (though not unnamed
ones, at least not shared unnamed ones). What's more, it appears that
in this implementation, named semaphores don't eat open file descriptors
as they do on macOS, removing our major objection to using them.

No OpenBSD here, but I was curious enough to peek at their
implementation. Like others, they create a tiny file under /tmp for
each one, mmap() and close the fd straight away.

Oh, yeah, I can see a bunch of tiny mappings with procmap. I wonder
whether that scales any better than an open FD per semaphore, when
it comes to forking a bunch of child processes that will inherit
all those mappings or FDs. I've not tried to benchmark child process
launch as such --- as I said, I'm not running this on hardware that
would support serious benchmarking.

I also have no ability to benchmark on a real OpenBSD system, but once
a year or so when I spin up a little OpenBSD VM to test some patch or
other, it annoys me that our tests fail out of the box and then I have
to look up how to change the sysctls, so here's a patch. I also
checked the release notes to confirm that 5.5 is the right release to
look for [1]; by now that's EOL and probably not even worth bothering
with the test but doesn't cost much to be cautious about that. 4.x is
surely too old to waste electrons on. I guess the question for
OpenBSD experts is whether having (say) a thousand tiny mappings is
bad. On the plus side, we know from other OSes that having semas
spread out is good for reducing false sharing on large systems.

[1]: https://www.openbsd.org/55.html

Attachments:

0001-Use-POSIX_NAMED_SEMAPHORES-on-OpenBSD.patch (text/x-patch; charset=US-ASCII)
From e7bcfa563e568ca8d3474432fd094a38205c78f0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 2 Apr 2021 03:43:26 +1300
Subject: [PATCH] Use POSIX_NAMED_SEMAPHORES on OpenBSD.

For PostgreSQL to work out of the box without sysctl changes, let's use
POSIX semaphores instead of System V ones.  The "unnamed" kind aren't
supported in shared memory, so we use the "named" kind.

Discussion: https://postgr.es/m/27582.1546928073%40sss.pgh.pa.us
---
 doc/src/sgml/runtime.sgml |  6 +-----
 src/template/openbsd      | 11 +++++++++++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index bf877c0e0c..9aa3129aba 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -998,11 +998,7 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such
        <para>
         The default shared memory settings are usually good enough, unless
         you have set <literal>shared_memory_type</literal> to <literal>sysv</literal>.
-        You will usually want to
-        increase <literal>kern.seminfo.semmni</literal>
-        and <literal>kern.seminfo.semmns</literal>,
-        as <systemitem class="osname">OpenBSD</systemitem>'s default settings
-        for these are uncomfortably small.
+        System V semaphores are not used on this platform (since OpenBSD 5.5).
        </para>
 
        <para>
diff --git a/src/template/openbsd b/src/template/openbsd
index 365268c489..d3b6bf464a 100644
--- a/src/template/openbsd
+++ b/src/template/openbsd
@@ -2,3 +2,14 @@
 
 # Extra CFLAGS for code that will go into a shared library
 CFLAGS_SL="-fPIC -DPIC"
+
+# OpenBSD 5.5 (2014) gained named POSIX semaphores.  They work out of the box
+# without changing any sysctl settings, unlike System V semaphores.
+case $host_os in
+  openbsd5.[01234]*)
+    USE_SYSV_SEMAPHORES=1
+    ;;
+  *)
+    USE_NAMED_POSIX_SEMAPHORES=1
+    ;;
+esac
-- 
2.30.1
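
For readers who want to see which branch of the template's case statement a given release would take, here is a small sketch that mimics the glob match with Python's fnmatch. The host_os strings are example values of the kind configure derives; the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def semaphore_api_for(host_os: str) -> str:
    """Mimic the case statement in the patched src/template/openbsd."""
    # Same glob as the patch: matches OpenBSD 5.0 through 5.4 only.
    if fnmatchcase(host_os, "openbsd5.[01234]*"):
        return "USE_SYSV_SEMAPHORES"
    # Everything else falls through to the default branch.
    return "USE_NAMED_POSIX_SEMAPHORES"


for host_os in ("openbsd5.4", "openbsd5.5", "openbsd6.8", "openbsd4.9"):
    print(host_os, semaphore_api_for(host_os))
```

Note that a 4.x host_os also lands in the default branch, which is consistent with the message above treating 4.x as too old to cater for.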