shmctl EIDRM preventing startup

Started by Michael Fuhr, almost 19 years ago, 7 messages, general

mike@fuhr.org

almost 19 years ago

One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
appears to be in the same state after a reboot as the server in the
"Restart after poweroutage" thread from a few months ago:

http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php

As in the thread, "ipcs -a" shows no postgres-owned shared memory
segments and strace shows shmctl() failing with EIDRM.

http://archives.postgresql.org/pgsql-general/2007-03/msg00743.php

I have only limited access to the box and I haven't found out why
it was rebooted. I don't think it was a scheduled reboot so it
might have been due to a power outage.

Has anybody figured out if this is a Linux kernel bug? I might
have until Monday morning if anybody can suggest something to look
at; after that the admins will probably reboot and/or remove
postmaster.pid to get the database running again.

Thanks.

--
Michael Fuhr

Tom Lane

tgl@sss.pgh.pa.us

almost 19 years ago

In reply to: Michael Fuhr (#1)

Re: shmctl EIDRM preventing startup

Michael Fuhr <mike@fuhr.org> writes:

One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
appears to be in the same state after a reboot as the server in the
"Restart after poweroutage" thread from a few months ago:

http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php

Interesting indeed. Lapham's report was on FC6 which uses a kernel
vastly newer than RHEL4 (2.6.20) but his was also x86_64, which might
be relevant. I recall trying a little bit to reproduce the problem
after updating my own x86_64 box to FC6, but without success.

Has anybody figured out if this is a Linux kernel bug? I might
have until Monday morning if anybody can suggest something to look
at; after that the admins will probably reboot and/or remove
postmaster.pid to get the database running again.

Is it possible/reasonable/practical to (a) hold off longer than that
and (b) get me access to the box? On Monday I'd have a chance to
involve some Red Hat kernel folk in looking at it.

regards, tom lane

Alvaro Herrera

alvherre@2ndquadrant.com

almost 19 years ago

In reply to: Michael Fuhr (#1)

Re: shmctl EIDRM preventing startup

Michael Fuhr wrote:

One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
appears to be in the same state after a reboot as the server in the
"Restart after poweroutage" thread from a few months ago:

http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php

As in the thread, "ipcs -a" shows no postgres-owned shared memory
segments and strace shows shmctl() failing with EIDRM.

http://archives.postgresql.org/pgsql-general/2007-03/msg00743.php

Maybe what is happening is that an entirely unrelated process created a
segment with that ID, attached to it, and then it was deleted. I don't
know how to check however.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Tom Lane

tgl@sss.pgh.pa.us

almost 19 years ago

In reply to: Alvaro Herrera (#3)

Re: shmctl EIDRM preventing startup

Alvaro Herrera <alvherre@commandprompt.com> writes:

Maybe what is happening is that an entirely unrelated process created a
segment with that ID, attached to it, and then it was deleted. I don't
know how to check however.

AFAIK, EIDRM should imply that the segment has been IPC_RMID'd but still
exists because there are still processes attached to it. So the thing
to look for is processes still attached. Not 100% sure how to do that,
but I'm sure the info is exposed under /proc somehow...

regards, tom lane
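
To illustrate the "look for processes still attached" suggestion above: on Linux the kernel lists System V segments in /proc/sysvipc/shm, including an attach count (nattch). The following is only a minimal sketch, assuming that file is present and that its header row names the columns (key, shmid, perms, nattch, ...); the function names are hypothetical, and the column layout can differ between kernel versions, so the parser reads the names from the header instead of hard-coding positions.

```python
def parse_sysvipc_shm(text):
    """Parse /proc/sysvipc/shm-style text into a list of dicts of ints."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()  # column names come from the file itself
    rows = []
    for ln in lines[1:]:
        row = {}
        for name, value in zip(header, ln.split()):
            # Assumption: perms is printed in octal, all other columns in decimal.
            row[name] = int(value, 8 if name == "perms" else 10)
        rows.append(row)
    return rows


def still_attached(rows):
    """Return only the segments that at least one process is attached to."""
    return [r for r in rows if r.get("nattch", 0) > 0]


if __name__ == "__main__":
    try:
        with open("/proc/sysvipc/shm") as f:
            segments = parse_sysvipc_shm(f.read())
    except OSError as exc:
        print(f"cannot read /proc/sysvipc/shm: {exc}")
    else:
        for seg in still_attached(segments):
            print(f"shmid={seg['shmid']} key={seg['key']} nattch={seg['nattch']}")
```

If the shmid recorded in postmaster.pid does not appear in this output at all, no process is attached to it as far as this file shows, which would not match the usual explanation for EIDRM.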

Michael Fuhr

mike@fuhr.org

almost 19 years ago

In reply to: Tom Lane (#2)

Re: shmctl EIDRM preventing startup

On Sun, Jul 01, 2007 at 10:06:58PM -0400, Tom Lane wrote:

Michael Fuhr <mike@fuhr.org> writes:

Has anybody figured out if this is a Linux kernel bug? I might
have until Monday morning if anybody can suggest something to look
at; after that the admins will probably reboot and/or remove
postmaster.pid to get the database running again.

Is it possible/reasonable/practical to (a) hold off longer than that
and (b) get me access to the box? On Monday I'd have a chance to
involve some Red Hat kernel folk in looking at it.

Possibly; I'll see what I can do. How early Monday do you think
everybody would be available?

--
Michael Fuhr

Martijn van Oosterhout

kleptog@svana.org

almost 19 years ago

In reply to: Tom Lane (#4)

Re: shmctl EIDRM preventing startup

On Sun, Jul 01, 2007 at 10:39:01PM -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Maybe what is happening is that an entirely unrelated process created a
segment with that ID, attached to it, and then it was deleted. I don't
know how to check however.

AFAIK, EIDRM should imply that the segment has been IPC_RMID'd but still
exists because there are still processes attached to it. So the thing
to look for is processes still attached. Not 100% sure how to do that,
but I'm sure the info is exposed under /proc somehow...

If it's installed, this:

lsof |grep SYSV

will list all processes attached to a SHM segment on the system. I
think ipcs can do the same. You can grep /proc/*/maps for the same
info.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


From each according to his ability. To each according to his ability to litigate.
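
The /proc/*/maps check mentioned above can be scripted. This is a rough sketch, not a definitive tool: it assumes that Linux shows an attached System V segment in a process's maps file with a pathname of the form /SYSV followed by the key as eight hex digits, and that the inode column of that line holds the shmid. The function names are hypothetical. Reading other users' maps files generally requires root.

```python
import glob
import re

# Matches a maps line for a System V segment, for example:
# 2aaaaab000-2aaaab5000 rw-s 00000000 00:06 32769 /SYSV0052e2c1 (deleted)
SYSV_LINE = re.compile(
    r"^\S+\s+\S+\s+\S+\s+\S+\s+(?P<inode>\d+)\s+/SYSV(?P<key>[0-9a-fA-F]{8})\b"
)


def sysv_mappings(maps_text):
    """Return a sorted list of (key, shmid) pairs found in maps-format text."""
    found = set()
    for line in maps_text.splitlines():
        m = SYSV_LINE.match(line)
        if m:
            found.add((int(m.group("key"), 16), int(m.group("inode"))))
    return sorted(found)


def scan_all_processes():
    """Map pid -> [(key, shmid), ...] for every process whose maps is readable."""
    result = {}
    for path in glob.glob("/proc/[0-9]*/maps"):
        pid = int(path.split("/")[2])
        try:
            with open(path) as f:
                pairs = sysv_mappings(f.read())
        except OSError:
            continue  # process exited meanwhile, or permission denied
        if pairs:
            result[pid] = pairs
    return result


if __name__ == "__main__":
    for pid, pairs in sorted(scan_all_processes().items()):
        for key, shmid in pairs:
            print(f"pid={pid} key=0x{key:08x} shmid={shmid}")
```

Comparing the printed shmid values against the one the postmaster complains about shows whether any running process still has that segment mapped.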

Michael Fuhr

mike@fuhr.org

almost 19 years ago

In reply to: Martijn van Oosterhout (#6)

Re: shmctl EIDRM preventing startup

On Mon, Jul 02, 2007 at 01:05:35PM +0200, Martijn van Oosterhout wrote:

If it's installed, this:

lsof |grep SYSV

will list all processes attached to a SHM segment on the system. I
think ipcs can do the same. You can grep /proc/*/maps for the same
info.

I already tried those; none show the shared memory key that the
postmaster is complaining about.

--
Michael Fuhr