shmctl EIDRM preventing startup
One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
appears to be in the same state after a reboot as the server in the
"Restart after poweroutage" thread from a few months ago:
http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php
As in the thread, "ipcs -a" shows no postgres-owned shared memory
segments and strace shows shmctl() failing with EIDRM.
http://archives.postgresql.org/pgsql-general/2007-03/msg00743.php
I have only limited access to the box and I haven't found out why
it was rebooted. I don't think it was a scheduled reboot so it
might have been due to a power outage.
Has anybody figured out if this is a Linux kernel bug? I might
have until Monday morning if anybody can suggest something to look
at; after that the admins will probably reboot and/or remove
postmaster.pid to get the database running again.
Thanks.
--
Michael Fuhr
Michael Fuhr <mike@fuhr.org> writes:
One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
appears to be in the same state after a reboot as the server in the
"Restart after poweroutage" thread from a few months ago:
http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php
Interesting indeed. Lapham's report was on FC6, which uses a kernel
(2.6.20) vastly newer than RHEL4's, but his was also x86_64, which might
be relevant. I recall trying a little bit to reproduce the problem
after updating my own x86_64 box to FC6, but without success.
Has anybody figured out if this is a Linux kernel bug? I might
have until Monday morning if anybody can suggest something to look
at; after that the admins will probably reboot and/or remove
postmaster.pid to get the database running again.
Is it possible/reasonable/practical to (a) hold off longer than that
and (b) get me access to the box? On Monday I'd have a chance to
involve some Red Hat kernel folk in looking at it.
regards, tom lane
Michael Fuhr wrote:
One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
appears to be in the same state after a reboot as the server in the
"Restart after poweroutage" thread from a few months ago:
http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php
As in the thread, "ipcs -a" shows no postgres-owned shared memory
segments and strace shows shmctl() failing with EIDRM.
http://archives.postgresql.org/pgsql-general/2007-03/msg00743.php
Maybe what is happening is that an entirely unrelated process created a
segment with that ID, attached to it, and then it was deleted. I don't
know how to check however.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
Maybe what is happening is that an entirely unrelated process created a
segment with that ID, attached to it, and then it was deleted. I don't
know how to check however.
AFAIK, EIDRM should imply that the segment has been IPC_RMID'd but still
exists because there are still processes attached to it. So the thing
to look for is processes still attached. Not 100% sure how to do that,
but I'm sure the info is exposed under /proc somehow...
regards, tom lane
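A minimal sketch of one way to look for such a segment, assuming the
util-linux "ipcs -m" column layout (key, shmid, owner, perms, bytes,
nattch, status), where a segment that has been marked for removal but is
still attached shows "dest" in the status column. The helper name
dest_segments is made up for illustration, and the layout may differ on
other systems.

```shell
# Read "ipcs -m" output on stdin and print "shmid nattch" for every
# segment whose status column contains "dest" (marked for destruction).
# Assumes the column order: key shmid owner perms bytes nattch status.
dest_segments() {
    awk '$1 ~ /^0x/ {
        for (i = 7; i <= NF; i++)
            if ($i == "dest") { print $2, $6; break }
    }'
}

# Only run the real command if ipcs is installed.
if command -v ipcs >/dev/null 2>&1; then
    ipcs -m | dest_segments
fi
```

A nonzero nattch on such a line would suggest something is still attached.
ipcs only reports segments the calling user has read access to, so this is
best run as root.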
On Sun, Jul 01, 2007 at 10:06:58PM -0400, Tom Lane wrote:
Michael Fuhr <mike@fuhr.org> writes:
Has anybody figured out if this is a Linux kernel bug? I might
have until Monday morning if anybody can suggest something to look
at; after that the admins will probably reboot and/or remove
postmaster.pid to get the database running again.
Is it possible/reasonable/practical to (a) hold off longer than that
and (b) get me access to the box? On Monday I'd have a chance to
involve some Red Hat kernel folk in looking at it.
Possibly; I'll see what I can do. How early Monday do you think
everybody would be available?
--
Michael Fuhr
On Sun, Jul 01, 2007 at 10:39:01PM -0400, Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Maybe what is happening is that an entirely unrelated process created a
segment with that ID, attached to it, and then it was deleted. I don't
know how to check however.
AFAIK, EIDRM should imply that the segment has been IPC_RMID'd but still
exists because there are still processes attached to it. So the thing
to look for is processes still attached. Not 100% sure how to do that,
but I'm sure the info is exposed under /proc somehow...
If it's installed, this:
lsof |grep SYSV
Will list all processes attached to a SHM segment on the system. I
think ipcs can do the same. You can grep /proc/*/maps for the same
info.
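As a rough sketch of the /proc approach: on Linux, a System V segment
appears in /proc/<pid>/maps under a name of the form /SYSV followed by the
key as eight hex digits. The key value 5432001 and the helper name
maps_name below are placeholders for illustration; substitute the key the
postmaster reports.

```shell
# Placeholder key for illustration only; use the key from the error message.
key=5432001

# Build the name Linux shows in /proc/<pid>/maps for a SysV segment:
# "SYSV" followed by the key as eight zero-padded hex digits.
maps_name() {
    printf 'SYSV%08x' "$1"
}

# Print every maps line (prefixed with its file name, hence the pid) that
# refers to the segment. Unreadable maps files are skipped silently, and
# finding no match is not treated as an error.
grep -H "$(maps_name "$key")" /proc/[0-9]*/maps 2>/dev/null || true
```

Other users' maps files are generally readable only by root, so an empty
result from an unprivileged account does not prove nothing is attached.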
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
On Mon, Jul 02, 2007 at 01:05:35PM +0200, Martijn van Oosterhout wrote:
If it's installed, this:
lsof |grep SYSV
Will list all processes attached to a SHM segment on the system. I
think ipcs can do the same. You can grep /proc/*/maps for the same
info.
I already tried those; none show the shared memory key that the
postmaster is complaining about.
--
Michael Fuhr