postmaster crash

Started by Michael Beckstette, about 24 years ago, 3 messages, bugs

Michael Beckstette

mbeckste@TechFak.Uni-Bielefeld.DE

about 24 years ago

Hi,

From time to time my postmaster crashes with the error message shown below.

I am using PostgreSQL 7.1.2 on sparc-sun-solaris2.5.1, compiled by GCC 2.95 on
a 4CPU/4GB UltraSparc II system.

The crash happened during a VACUUM ANALYZE, which I run every 30 minutes.
At the time of the crash I had several open client connections, all trying to
fill one table (table1) simultaneously with 'COPY table1 FROM stdin.....'.

LOG:
DEBUG: --Relation cluster_nodes--
DEBUG: Pages 52: Changed 0, reaped 52, Empty 0, New 0; Tup 23: Vac 1914,
Keep/VTL 0/0, Crash 0, UnUsed 28, MinLen 120, MaxLen 372; Re-using: Free/Avail.
Space 411044/405288; EndEmpty/Avail. Pages 0/51. CPU 0.00s/0.00u sec.
DEBUG: Index node_index: Pages 14; Tuples 23: Deleted 1914. CPU 0.00s/0.08u
sec.
DEBUG: Rel cluster_nodes: Pages: 52 --> 1; Tuple(s) moved: 23. CPU 0.00s/0.05u
sec.
DEBUG: Index node_index: Pages 14; Tuples 23: Deleted 23. CPU 0.00s/0.00u sec.
DEBUG: MoveOfflineLogs: remove 0000000D000000A3
FATAL 2: MoveOfflineLogs: cannot read xlog dir: Invalid argument
Server process (pid 11560) exited with status 512 at Mon Mar 25 22:00:53 2002
Terminating any active server processes...
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally
and possibly corrupted shared memory.
I have rolled back the current transaction and am going to
terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally
and possibly corrupted shared memory.
I have rolled back the current transaction and am going to
terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
'
'
'
'
'
'
Server processes were terminated at Mon Mar 25 22:00:53 2002
Reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2002-03-25 22:00:53 MET
DEBUG: CheckPoint record at (13, 2751970248)
DEBUG: Redo record at (13, 2751936936); Undo record at (0, 0); Shutdown FALSE
DEBUG: NextTransactionId: 254232066; NextOid: 52967926
DEBUG: database system was not properly shut down; automatic recovery in
progress...
DEBUG: redo starts at (13, 2751936936)
DEBUG: ReadRecord: record with zero len at (13, 2751970312)
DEBUG: redo done at (13, 2751970248)
DEBUG: database system is in production state

Thanks for an almost :) excellent product.

Regards
Michael Beckstette

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Michael Beckstette (#1)

Re: postmaster crash

"Michael Beckstette" <mbeckste@TechFak.Uni-Bielefeld.DE> writes:

> from time to time my postmaster crashes with the below mentioned errormessage.
>
> DEBUG: MoveOfflineLogs: remove 0000000D000000A3
> FATAL 2: MoveOfflineLogs: cannot read xlog dir: Invalid argument

Hmm. I cannot see a reason for the xlog directory to be unreadable,
especially not if it's readable most of the time.

The code that is reporting this failure is in MoveOfflineLogs() in
src/backend/access/transam/xlog.c. You could maybe add some
additional debug logging there to try to understand what is going wrong
... but I sure don't see any reason for readdir() to fail, especially
not if it's succeeded on previous calls --- and your log indicates it's
succeeded at least once.
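
For anyone who wants to try the extra logging suggested above, here is a
minimal sketch of what instrumented directory scanning could look like. It is
not the actual MoveOfflineLogs() code: the function name scan_dir_logged() and
the use of fprintf() instead of the backend's own logging are assumptions made
for illustration. The one thing it relies on is documented POSIX behaviour:
readdir() returns NULL both at end of directory and on error, and only sets
errno in the error case, so errno has to be zeroed before every call. Otherwise
a stale errno left by some unrelated call inside the loop could be mistaken for
a readdir() failure.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): scan a directory and log
 * exactly which call failed and with which errno.
 * Returns the number of entries seen, or -1 on error.
 */
int
scan_dir_logged(const char *path)
{
    DIR           *dir;
    struct dirent *de;
    int            count = 0;
    int            saved_errno;

    assert(path != NULL);

    dir = opendir(path);
    if (dir == NULL)
    {
        fprintf(stderr, "opendir(\"%s\") failed: errno=%d (%s)\n",
                path, errno, strerror(errno));
        return -1;
    }

    for (;;)
    {
        /* Reset before every call: NULL + errno == 0 means end of directory. */
        errno = 0;
        de = readdir(dir);
        if (de == NULL)
            break;

        count++;
        /* Per-entry work would go here; it may clobber errno freely. */
    }

    /* Save errno now, because closedir() may overwrite it. */
    saved_errno = errno;
    closedir(dir);

    if (saved_errno != 0)
    {
        fprintf(stderr, "readdir(\"%s\") failed after %d entries: errno=%d (%s)\n",
                path, count, saved_errno, strerror(saved_errno));
        errno = saved_errno;
        return -1;
    }

    return count;
}
```

Calling scan_dir_logged() on the xlog directory would then report how many
entries were read before the failure and the errno seen immediately after the
failing readdir() call, which is the information the FATAL message alone does
not give.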

Are you perhaps running Postgres over an NFS mount? That's widely
considered unreliable.

regards, tom lane

Michael Beckstette

mbeckste@TechFak.Uni-Bielefeld.DE

about 24 years ago

In reply to: Michael Beckstette (#1)

Re: postmaster crash

Hi Tom,

thanks for your quick response.
Yes, the DB is on an NFS-mounted volume, but I have checked my logs: there are
no NFS error messages. As I said, this error occurs only from time to time (3
times in the last 6 months, I guess). The only thing I noticed each time, apart
from the same error message (xlog dir), was that it happened while
transferring large amounts of data (10 clients simultaneously, each loading
ca. 100MB into one single table using COPY).

Michael
