postmaster crash

Started by Michael Beckstette (3 messages, bugs list)
#1 Michael Beckstette
mbeckste@TechFak.Uni-Bielefeld.DE

Hi,

From time to time my postmaster crashes with the error message below.

I am using PostgreSQL 7.1.2 on sparc-sun-solaris2.5.1, compiled by GCC 2.95, on
a 4-CPU, 4 GB UltraSPARC II system.

The crash happened during a VACUUM ANALYZE, which I run every 30 minutes.
At the time of the crash I had several open client connections that were
simultaneously filling one table (table1) with 'COPY table1 FROM stdin.....'.

LOG:
DEBUG: --Relation cluster_nodes--
DEBUG: Pages 52: Changed 0, reaped 52, Empty 0, New 0; Tup 23: Vac 1914,
Keep/VTL 0/0, Crash 0, UnUsed 28, MinLen 120, MaxLen 372; Re-using: Free/Avail.
Space 411044/405288; EndEmpty/Avail. Pages 0/51. CPU 0.00s/0.00u sec.
DEBUG: Index node_index: Pages 14; Tuples 23: Deleted 1914. CPU 0.00s/0.08u
sec.
DEBUG: Rel cluster_nodes: Pages: 52 --> 1; Tuple(s) moved: 23. CPU 0.00s/0.05u
sec.
DEBUG: Index node_index: Pages 14; Tuples 23: Deleted 23. CPU 0.00s/0.00u sec.
DEBUG: MoveOfflineLogs: remove 0000000D000000A3
FATAL 2: MoveOfflineLogs: cannot read xlog dir: Invalid argument
Server process (pid 11560) exited with status 512 at Mon Mar 25 22:00:53 2002
Terminating any active server processes...
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally
and possibly corrupted shared memory.
I have rolled back the current transaction and am going to
terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally
and possibly corrupted shared memory.
I have rolled back the current transaction and am going to
terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
...
Server processes were terminated at Mon Mar 25 22:00:53 2002
Reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2002-03-25 22:00:53 MET
DEBUG: CheckPoint record at (13, 2751970248)
DEBUG: Redo record at (13, 2751936936); Undo record at (0, 0); Shutdown FALSE
DEBUG: NextTransactionId: 254232066; NextOid: 52967926
DEBUG: database system was not properly shut down; automatic recovery in
progress...
DEBUG: redo starts at (13, 2751936936)
DEBUG: ReadRecord: record with zero len at (13, 2751970312)
DEBUG: redo done at (13, 2751970248)
DEBUG: database system is in production state

Thanks for an almost :) excellent product.

Regards
Michael Beckstette

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Beckstette (#1)
Re: postmaster crash

"Michael Beckstette" <mbeckste@TechFak.Uni-Bielefeld.DE> writes:

> from time to time my postmaster crashes with the below mentioned errormessage.
>
> DEBUG: MoveOfflineLogs: remove 0000000D000000A3
> FATAL 2: MoveOfflineLogs: cannot read xlog dir: Invalid argument

Hmm. I cannot see a reason for the xlog directory to be unreadable,
especially not if it's readable most of the time.

The code that is reporting this failure is in MoveOfflineLogs() in
src/backend/access/transam/xlog.c. You could maybe add some
additional debug logging there to try to understand what is going wrong
... but I sure don't see any reason for readdir() to fail, especially
not if it's succeeded on previous calls --- and your log indicates it's
succeeded at least once.
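Debug logging of the kind suggested above would need to tell a genuine readdir() failure apart from the normal end of the directory, because readdir() returns NULL in both cases. The sketch below is not the PostgreSQL source; it only illustrates the general POSIX idiom under that assumption. The function name scan_dir is hypothetical, and it merely counts entries instead of removing old log segments.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from PostgreSQL): count the entries in a
 * directory, reporting a read error only when readdir() itself sets errno.
 * Returns the number of entries (excluding "." and ".."), or -1 on error.
 */
long scan_dir(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    long count = 0;

    if (dir == NULL) {
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return -1;
    }

    for (;;) {
        /* readdir() returns NULL both at end of directory and on error.
         * At end of directory it leaves errno unchanged, so clear errno
         * before every call to make the two cases distinguishable. */
        errno = 0;
        de = readdir(dir);
        if (de == NULL)
            break;

        /* Work done here (unlink(), logging, ...) may modify errno,
         * which is why errno is reset before each readdir(), not just
         * once before the loop. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
    }

    if (errno != 0) {
        /* A real readdir() failure, e.g. EINVAL ("Invalid argument"). */
        fprintf(stderr, "cannot read %s: %s\n", path, strerror(errno));
        closedir(dir);
        return -1;
    }

    closedir(dir);
    return count;
}
```

The placement of `errno = 0` is the design point: if errno were cleared only once before the loop and the loop body called something like unlink(), a stale errno left by the body could later be misreported as a directory read error. Whether that applies to the 7.1.2 code is not established here; the sketch only shows what a correct check looks like.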

Are you perhaps running Postgres over an NFS mount? That's widely
considered unreliable.

regards, tom lane

#3 Michael Beckstette
mbeckste@TechFak.Uni-Bielefeld.DE
In reply to: Michael Beckstette (#1)
Re: postmaster crash

Hi Tom,

thanks for your quick response.
Yes, the DB is on an NFS-mounted volume, but I have checked my logs: there are no
NFS error messages. As I said, this error occurs only from time to time (3
times in the last 6 months, I guess). The only thing I noticed every time
(apart from the same error message about the xlog dir) was that it happened
while transferring large amounts of data (10 clients simultaneously, each loading
about 100 MB into one single table using COPY).

Michael