two servers on the same port

Started by Eric Haszlakiewicz, 9 messages
#1 Eric Haszlakiewicz
erh@swapsimple.com
2 attachment(s)

I just spent a couple of days trying to figure out why I couldn't start
two servers on the same port, even though I was configuring separate
listen_address values. I kept getting errors about shmget failing with
"could not create shared memory segment: Invalid argument".

I finally noticed that the shared memory key mentioned in the error when
starting the second server was the same as what the first server was
using, which appeared to be generated based off of the port number.

Sure enough, when I changed the port, it used a different shared memory
key and started right up. After searching around on the web a bit
I found some pages that suggested running under different userids
might be necessary. So, I tried that, and changed the port back to the
standard 5432, and it started up.

Anyway, everything seems to be working fine, but I figured this info
should be a bit easier to find, so here are a couple of patches to the
documentation to mention how this works.

eric

Attachments:

config.sgml.diff (text/plain; charset=us-ascii)
--- doc/src/sgml/config.sgml.orig	2008-10-18 00:08:50.000000000 -0500
+++ doc/src/sgml/config.sgml	2008-10-18 00:10:58.000000000 -0500
@@ -337,6 +337,12 @@
         same port number is used for all IP addresses the server listens on.
         This parameter can only be set at server start.
        </para>
+       <para>
+        This setting also determines the key used for the shared memory
+        segment.  Because of that, two servers can not be started on the
+        same port, even if they have different listen_addresses, unless
+        they are also running under two different userids.
+       </para>
       </listitem>
      </varlistentry>
 
runtime.sgml.diff (text/plain; charset=us-ascii)
--- doc/src/sgml/runtime.sgml.orig	2008-10-18 00:05:37.000000000 -0500
+++ doc/src/sgml/runtime.sgml	2008-10-18 00:08:37.000000000 -0500
@@ -401,6 +401,15 @@
     </para>
 
     <para>
+    You can also get this error if you try to start two servers on the
+    same machine, on the same port, even if you specify different
+    listen_address values.  In order for that configuration to work,
+    you'll need to run the servers under different userids, which will
+    cause <productname>PostgreSQL</productname> to use different
+    shared memory keys.
+	</para>
+
+    <para>
      An error like
 <screen>
 FATAL:  could not create semaphores: No space left on device
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Haszlakiewicz (#1)
Re: two servers on the same port

Eric Haszlakiewicz <erh@swapsimple.com> writes:

> I just spent a couple of days trying to figure out why I couldn't start
> two servers on the same port, even though I was configuring separate
> listen_address values.

That's already documented not to work, and not for any hidden
implementation reason: you'd have a conflict on the Unix-domain socket
name.

regards, tom lane

#3 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: two servers on the same port

Tom Lane wrote:

> Eric Haszlakiewicz <erh@swapsimple.com> writes:

>> I just spent a couple of days trying to figure out why I couldn't start
>> two servers on the same port, even though I was configuring separate
>> listen_address values.

> That's already documented not to work, and not for any hidden
> implementation reason: you'd have a conflict on the Unix-domain socket
> name.

unless you use a different socket directory.

cheers

andrew

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: two servers on the same port

Andrew Dunstan <andrew@dunslane.net> writes:

> Tom Lane wrote:

>> That's already documented not to work, and not for any hidden
>> implementation reason: you'd have a conflict on the Unix-domain socket
>> name.

> unless you use a different socket directory.

Hmm ... but the OP didn't mention any such thing. In any case I think
he's misdiagnosed his problem, because the shmem code *should* ignore
pre-existing shmem segments that are already in use --- see the loop in
PGSharedMemoryCreate.

regards, tom lane

#5 Eric Haszlakiewicz
erh@swapsimple.com
In reply to: Tom Lane (#2)
Re: two servers on the same port

On Sat, Oct 18, 2008 at 12:48:13PM -0400, Tom Lane wrote:

> Eric Haszlakiewicz <erh@swapsimple.com> writes:

>> I just spent a couple of days trying to figure out why I couldn't start
>> two servers on the same port, even though I was configuring separate
>> listen_address values.

> That's already documented not to work, and not for any hidden
> implementation reason: you'd have a conflict on the Unix-domain socket
> name.

er.. but I didn't get any kind of error about a conflict on a unix domain
socket, I got an error about shmget. I don't even think it's possible
to have a conflict like that since the two servers were running in
different chroot directories.

eric

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Haszlakiewicz (#5)
Re: two servers on the same port

Eric Haszlakiewicz <erh@swapsimple.com> writes:

> On Sat, Oct 18, 2008 at 12:48:13PM -0400, Tom Lane wrote:

>> That's already documented not to work, and not for any hidden
>> implementation reason: you'd have a conflict on the Unix-domain socket
>> name.

> er.. but I didn't get any kind of error about a conflict on a unix domain
> socket, I got an error about shmget. I don't even think it's possible
> to have a conflict like that since the two servers were running in
> different chroot directories.

Well, different chroot would do it, but you didn't mention that ;-)

Anyway, I still think that the proposed documentation patches are wrong,
because the code ought to work as long as you don't have a direct
conflict on TCP or Unix sockets. It's true that the port number is used
as a seed for picking shmem keys, but it should try the next key if it
hits an already-in-use shmem segment. Can you poke at it a bit more
closely and see what's happening? What platform is this, anyway?

regards, tom lane

#7 Eric Haszlakiewicz
erh@swapsimple.com
In reply to: Tom Lane (#6)
Re: two servers on the same port

On Sun, Oct 19, 2008 at 10:15:22PM -0400, Tom Lane wrote:

> Eric Haszlakiewicz <erh@swapsimple.com> writes:

>> On Sat, Oct 18, 2008 at 12:48:13PM -0400, Tom Lane wrote:

>>> That's already documented not to work, and not for any hidden
>>> implementation reason: you'd have a conflict on the Unix-domain socket
>>> name.

>> er.. but I didn't get any kind of error about a conflict on a unix domain
>> socket, I got an error about shmget. I don't even think it's possible
>> to have a conflict like that since the two servers were running in
>> different chroot directories.

> Well, different chroot would do it, but you didn't mention that ;-)

er.. why does a chroot matter? I don't see any mention of chroot in
the docs.

> Anyway, I still think that the proposed documentation patches are wrong,
> because the code ought to work as long as you don't have a direct
> conflict on TCP or Unix sockets. It's true that the port number is used

I don't understand how the configuration I have contains a conflict. Because
of the chroot, the unix socket can't conflict, and because I set different
IP addresses the tcp socket shouldn't conflict either.

> as a seed for picking shmem keys, but it should try the next key if it
> hits an already-in-use shmem segment. Can you poke at it a bit more
> closely and see what's happening? What platform is this, anyway?

I'm running on NetBSD 4.

Well, it seems that something doesn't work right with the "try the next key"
code when the userids are the same. I'm not really sure what I should try
here.

eric

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Haszlakiewicz (#7)
Re: two servers on the same port

Eric Haszlakiewicz <erh@swapsimple.com> writes:

> On Sun, Oct 19, 2008 at 10:15:22PM -0400, Tom Lane wrote:

>> Well, different chroot would do it, but you didn't mention that ;-)

> er.. why does a chroot matter?

Putting the servers in different chroots would mean that they see two
different /tmp directories, thus no conflict from both trying to open
Unix-domain sockets at /tmp/.s.PGSQL.5432.

>> Anyway, I still think that the proposed documentation patches are wrong,
>> because the code ought to work as long as you don't have a direct
>> conflict on TCP or Unix sockets. It's true that the port number is used

> I don't understand how the configuration I have contains a conflict.

It doesn't. So the question is why do you have a problem?

>> What platform is this, anyway?

> I'm running on NetBSD 4.

> Well, it seems that something doesn't work right with the "try the next key"
> code when the userids are the same. I'm not really sure what I should try
> here.

I read the code and the shmget spec a bit more. It looks to me like the
issue may be about the ordering of error checks in the kernel. The
Single Unix Spec quoth

    The shmget() function will fail if:

    [EEXIST]
    A shared memory identifier exists for the argument key but
    (shmflg&IPC_CREAT)&&(shmflg&IPC_EXCL) is non-zero.

    [EINVAL]
    The value of size is less than the system-imposed minimum or greater
    than the system-imposed maximum, or a shared memory identifier exists
    for the argument key but the size of the segment associated with it is
    less than size and size is not 0.

    [ and some other error cases that aren't interesting here ]

If you are starting the two servers with different shmem sizing
parameters then it is possible that the second reason for giving EINVAL
applies. Now our code is expecting to get EEXIST if there's a shmem
conflict, and it treats EINVAL as fatal because of the first reason for
giving EINVAL. I wonder whether NetBSD is coded so that it kicks out
EINVAL in this situation. It would be within its rights according to
SUS I suppose (since the spec quoth "If more than one error occurs in
processing a function call, any one of the possible errors may be
returned, as the order of detection is undefined.") but I would still
argue that this is a kernel bug because that behavior is useless.
The EINVAL error is sufficiently ambiguous that it should not be
returned if there is a less ambiguous reason to fail.

For comparison, the Linux manpage for shmget says in so many words

    If shmflg specifies both IPC_CREAT and IPC_EXCL and a shared
    memory segment already exists for key, then shmget() fails with
    errno set to EEXIST.

and the Darwin (some-BSD-derived) manpage also gives EEXIST priority,
saying

    [EINVAL] No shared memory segment is to be created, and a
    shared memory segment exists for key, but the size of
    the segment associated with it is less than size,
    which is non-zero.

So the first question for you is did you give the two servers different
shmem sizing parameters? If so, does the behavior change if you start
them in the opposite order? If the answer to both is "yes" then I think
you ought to file a bug against NetBSD kernel. They're returning an
error code that is uselessly confusing and out of step with other
implementations.

regards, tom lane

#9 Eric Haszlakiewicz
erh@swapsimple.com
In reply to: Tom Lane (#8)
Re: two servers on the same port

On Sun, Oct 19, 2008 at 11:21:09PM -0400, Tom Lane wrote:

> Eric Haszlakiewicz <erh@swapsimple.com> writes:

>> On Sun, Oct 19, 2008 at 10:15:22PM -0400, Tom Lane wrote:

>>> What platform is this, anyway?

>> I'm running on NetBSD 4.

>> Well, it seems that something doesn't work right with the "try the next key"
>> code when the userids are the same. I'm not really sure what I should try
>> here.

> I read the code and the shmget spec a bit more. It looks to me like the
> issue may be about the ordering of error checks in the kernel. The
> Single Unix Spec quoth

...snip...

> If you are starting the two servers with different shmem sizing
> parameters then it is possible that the second reason for giving EINVAL
> applies. Now our code is expecting to get EEXIST if there's a shmem

...snip...

> So the first question for you is did you give the two servers different
> shmem sizing parameters? If so, does the behavior change if you start
> them in the opposite order? If the answer to both is "yes" then I think
> you ought to file a bug against NetBSD kernel. They're returning an
> error code that is uselessly confusing and out of step with other
> implementations.

Yes, and yes. The error checking order in NetBSD put the EEXIST return
last, so the "different size" check was taking precedence. I fixed that,
and now starting two pg servers, even in different chroots, behaves as
expected. Thanks for the suggestion of where to look!

eric