BUG #5628: 9.0beta4 failed automatic crash recovery

Started by Itagaki Takahiro, over 15 years ago (6 messages, bugs)
#1 Itagaki Takahiro
itagaki.takahiro@gmail.com

The following bug has been logged online:

Bug reference: 5628
Logged by: Itagaki Takahiro
Email address: itagaki.takahiro@gmail.com
PostgreSQL version: 9.0b4 (32bit)
Operating system: Windows 7 (64bit)
Description: 9.0beta4 failed automatic crash recovery
Details:

9.0beta4 seems to fail automatic crash recovery after
some backend processes crashed, though 8.2 recovered
successfully. This is a rare error case, but some shared
memory logic might have been broken between versions.

I crashed a backend as a test manually with "pg_ctl kill":
pg_ctl kill QUIT <backend-pid>

The 9.0 server shut down with the following log output:
----
WARNING: terminating connection because of crash of another server process
...
LOG: all server processes terminated; reinitializing
FATAL: pre-existing shared memory block is still in use
HINT: Check if there are any old server processes still running, and
terminate them.
----

But 8.2 can recover as expected:
----
WARNING: terminating connection because of crash of another server process
...
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted at <timestamp>
----

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Itagaki Takahiro (#1)
Re: BUG #5628: 9.0beta4 failed automatic crash recovery

"Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:

> 9.0beta4 seems to fail automatic crash recovery after
> some backend processes crashed,

Works for me, and always has worked for me (and I crash backend
processes regularly ;-)). Maybe something Windows-specific?

regards, tom lane

#3 Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #5628: 9.0beta4 failed automatic crash recovery

On Tue, Aug 24, 2010 at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:
>
>> 9.0beta4 seems to fail automatic crash recovery after
>> some backend processes crashed,
>
> Works for me, and always has worked for me (and I crash backend
> processes regularly ;-)).

Me too!

> Maybe something Windows-specific?

Sure. I didn't see any problems on a Linux machine.
There might be issues with detaching/reattaching shared memory on Windows.

--
Itagaki Takahiro

#4 Magnus Hagander
magnus@hagander.net
In reply to: Itagaki Takahiro (#3)
Re: BUG #5628: 9.0beta4 failed automatic crash recovery

On Tue, Aug 24, 2010 at 2:59 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

> On Tue, Aug 24, 2010 at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> "Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:
>>
>>> 9.0beta4 seems to fail automatic crash recovery after
>>> some backend processes crashed,
>>
>> Works for me, and always has worked for me (and I crash backend
>> processes regularly ;-)).
>
> Me too!
>
>> Maybe something Windows-specific?
>
> Sure. I didn't see any problems on a Linux machine.
> There might be issues with detaching/reattaching shared memory on Windows.

We've seen this on and off before. Are you saying it's fully reproducible?

I don't recall if we did any specific changes around this for 9.0, did we?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#5 Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Magnus Hagander (#4)
Re: BUG #5628: 9.0beta4 failed automatic crash recovery

On Tue, Aug 24, 2010 at 5:25 PM, Magnus Hagander <magnus@hagander.net> wrote:

>> There might be issues with detaching/reattaching shared memory on Windows.
>
> We've seen this on and off before. Are you saying it's fully reproducible?
>
> I don't recall if we did any specific changes around this for 9.0, did we?

Yes, it is reproducible. I tested 8.3 and 8.4, and found 8.3 and newer
versions failed to recover.
* 8.2.17 => OK
* 8.3.11, 8.4.4, 9.0b4 => FAILED

The same error messages were logged in the failed cases:
FATAL: pre-existing shared memory block is still in use
HINT: Check if there are any old server processes still running,
and terminate them.

The change behind this issue might have been introduced between 8.2 and 8.3,
or in bug fixes applied only to 8.3 and newer versions.

--
Itagaki Takahiro

#6 Magnus Hagander
magnus@hagander.net
In reply to: Itagaki Takahiro (#5)
Re: BUG #5628: 9.0beta4 failed automatic crash recovery

On Tue, Aug 24, 2010 at 11:01 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

> On Tue, Aug 24, 2010 at 5:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
>
>>> There might be issues with detaching/reattaching shared memory on Windows.
>>
>> We've seen this on and off before. Are you saying it's fully reproducible?
>>
>> I don't recall if we did any specific changes around this for 9.0, did we?
>
> Yes, it is reproducible. I tested 8.3 and 8.4, and found 8.3 and newer
> versions failed to recover.
>  * 8.2.17 => OK
>  * 8.3.11, 8.4.4, 9.0b4 => FAILED
>
> The same error messages were logged in the failed cases:
>  FATAL:  pre-existing shared memory block is still in use
>  HINT:  Check if there are any old server processes still running,
> and terminate them.

Interesting. It certainly doesn't happen for everybody, or we
would've heard a lot more about this. We have seen a couple of reports
of it, IIRC, but nothing easily reproducible.

Could you try increasing either the Sleep() call or the loop counter
in PGSharedMemoryCreate (win32_shmem.c) to some very high value, and
then when it's trying to restart check:
1) Is there more than one postgres.exe running (there should be only
the postmaster)?
2) With Process Explorer, see if the postmaster has an open handle to the
shared memory segment (and is thus basically conflicting with itself).
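
For readers following along, the retry logic being discussed can be sketched roughly as below. This is a portable illustration only, not the actual code in win32_shmem.c: the names SegResult, TryCreateFn, SleepFn, create_segment_with_retry and the fake_* helpers are all hypothetical, and the attempt count and sleep interval are plain parameters rather than the values the server uses. The idea is that creation is retried while a segment with the same name still exists, and the FATAL case from this thread corresponds to running out of attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result codes for one attempt to create the named segment. */
typedef enum
{
    SEG_CREATED,        /* a fresh segment was created */
    SEG_ALREADY_EXISTS, /* a segment with that name still exists */
    SEG_FAILED          /* creation failed for some other reason */
} SegResult;

/* Hypothetical hooks so the sketch runs without any Windows API. */
typedef SegResult (*TryCreateFn)(void *ctx);
typedef void (*SleepFn)(unsigned int millis);

/*
 * Try to create the segment up to max_attempts times, sleeping after each
 * attempt that finds the old segment still present. Returns true on success.
 */
bool
create_segment_with_retry(TryCreateFn try_create, void *ctx,
                          SleepFn do_sleep, int max_attempts,
                          unsigned int sleep_millis)
{
    assert(max_attempts > 0);

    for (int i = 0; i < max_attempts; i++)
    {
        SegResult r = try_create(ctx);

        if (r == SEG_CREATED)
            return true;
        if (r == SEG_FAILED)
            return false;

        /* Old segment still exists: give its last holder time to go away. */
        do_sleep(sleep_millis);
    }

    /* Out of attempts: this is the situation reported in this thread. */
    fprintf(stderr, "pre-existing shared memory block is still in use\n");
    return false;
}

/* Simulated segment: "already exists" for the first busy_attempts tries. */
typedef struct
{
    int busy_attempts;
    int tries;
} FakeSegment;

SegResult
fake_try_create(void *ctx)
{
    FakeSegment *seg = ctx;

    seg->tries++;
    return (seg->tries > seg->busy_attempts) ? SEG_CREATED : SEG_ALREADY_EXISTS;
}

/* Counts sleeps instead of actually waiting. */
int fake_sleep_calls = 0;

void
fake_sleep(unsigned int millis)
{
    (void) millis;
    fake_sleep_calls++;
}
```

With a simulated segment that stays busy for three attempts, the call succeeds on the fourth try. With one that never frees up, it gives up after max_attempts, which is why raising the loop counter or the sleep interval leaves more time to inspect the running processes and open handles.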

> The change behind this issue might have been introduced between 8.2 and 8.3,
> or in bug fixes applied only to 8.3 and newer versions.

Yes, the shared memory stuff was basically rewritten for 8.3.

We also have http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=0ad6b8dd7cee13cde693571f20b10b364e52dd23,
which has been backpatched to 8.2 but is not yet in any released
version. Could you try the current tip of the 8.2 branch?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/