Weird postmaster crashes

Started by Dmitry Tkach almost 23 years ago. 7 messages. Lists: bugs, general
#1 Dmitry Tkach
dmitry@openratings.com

I am experiencing database server crashes quite frequently (sometimes
*daily*), and I am having a hard time identifying what could possibly be
causing them :-(
They seem to be happening kinda randomly; I was unable to attribute
them to any specific database activity going on at the time...
The postgres log looks like:

2003-06-10 13:53:32 [14522] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [16915] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [14523] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [17095] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without a PROC structure
2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without a PROC structure
2003-06-10 13:53:32 [14527] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [14685] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [17093] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [17092] DEBUG: pq_recvbuf: unexpected EOF on client connection
.... <snip a few identical messages (with different pids)>

2003-06-10 13:53:33 [14072] DEBUG: server process (pid 14551) exited with exit code 1
2003-06-10 13:53:33 [14072] DEBUG: terminating any other active server processes
2003-06-10 13:53:33 [1609] NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
.....

It does not even produce a core file after this - just silently exits,
and restarts itself.

Could somebody please point me to any clue what could possibly be wrong
with it?

This is 7.2.1 - I know, I need to upgrade.
Working on it, but it is going to take a while, and for the time being I
would greatly appreciate any ideas on what I can do about this thing.

Thanks a lot!

Dima

#2 Dennis Gearon
gearond@cvc.net
In reply to: Dmitry Tkach (#1)
Re: [GENERAL] Weird postmaster crashes

the mantra is to always check hardware first. Do a disk and memory check.

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Tkach (#1)
Re: [GENERAL] Weird postmaster crashes

Dmitry Tkach <dmitry@openratings.com> writes:

> I am experiencing database server crashes quite frequently

> 2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without
> a PROC structure

> This is 7.2.1 - I know, I need to upgrade.

Yes, you do. This is a known bug that was fixed in .3 or .4.

regards, tom lane

#4 Dmitry Tkach
dmitry@openratings.com
In reply to: Tom Lane (#3)
Re: [GENERAL] Weird postmaster crashes

Thanks, Tom!

That's kinda what I suspected....
Could you give me some idea on what circumstances cause this to happen?

Thanks again!

Dima

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Tkach (#4)
Re: [GENERAL] Weird postmaster crashes

Dmitry Tkach <dmitry@openratings.com> writes:

> 2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without
> a PROC structure

> Could you give me some idea on what circumstances cause this to happen?

IIRC, it's an order-of-operations mistake during backend shutdown: the
proc structure is deallocated while it's still possible to receive an
interrupt from another backend --- and if you get such an interrupt, you
need the proc. So from the user's point of view it's pretty
unpredictable.
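
The ordering problem described above can be illustrated with a toy model. This is only a minimal sketch, not PostgreSQL source: the Backend class, its proc slot, and the shutdown_buggy / shutdown_fixed methods are hypothetical names made up for the illustration. The "fixed" ordering shown is just one plausible way to close the window, not necessarily what the 7.2.x fix actually did.

```python
class Backend:
    """Toy stand-in for a server process (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.proc = {"pid": 14551}       # hypothetical per-process slot
        self.interrupts_enabled = True

    def on_interrupt(self):
        """Another process signals us; handling the signal needs the proc slot."""
        if not self.interrupts_enabled:
            return "ignored"
        if self.proc is None:
            return "fatal: no proc structure"
        return "handled"

    def shutdown_buggy(self, interrupt_arrives=False):
        """Release the slot first, block interrupts afterwards (wrong order)."""
        self.proc = None
        # Window: an interrupt can still arrive, but the slot is already gone.
        result = self.on_interrupt() if interrupt_arrives else "clean"
        self.interrupts_enabled = False
        return result

    def shutdown_fixed(self, interrupt_arrives=False):
        """Block interrupts first, then release the slot (assumed ordering)."""
        self.interrupts_enabled = False
        result = self.on_interrupt() if interrupt_arrives else "clean"
        self.proc = None
        return result


if __name__ == "__main__":
    print(Backend().shutdown_buggy(interrupt_arrives=False))
    print(Backend().shutdown_buggy(interrupt_arrives=True))
    print(Backend().shutdown_fixed(interrupt_arrives=True))
```

In the buggy ordering the outcome depends entirely on whether an interrupt happens to land inside the window, which matches the observation that the crashes do not correlate with any particular database activity.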

Short answer: upgrade. This is not the only nasty bug in 7.2.1.

regards, tom lane

#6 Dmitry Tkach
dmitry@openratings.com
In reply to: Tom Lane (#5)
Re: [GENERAL] Weird postmaster crashes

Makes sense. Thanks!

One more thing to clarify - when you said it was fixed in .3 and .4 did
you mean 7.3 or 7.2.3?

Thanks!

Dima

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Tkach (#6)
Re: [GENERAL] Weird postmaster crashes

Dmitry Tkach <dmitry@openratings.com> writes:

> One more thing to clarify - when you said it was fixed in .3 and .4 did
> you mean 7.3 or 7.2.3?

I meant I couldn't remember whether it was first fixed in 7.2.3 or 7.2.4.
Doesn't matter for your purposes --- as long as you're updating, you
should go to 7.2.4.

7.3.* has the fix also of course, but updating to 7.3 is a much bigger
task.

regards, tom lane