switching txlog file in 7.1beta

Started by Zeugswetter Andreas SB about 25 years ago, 4 messages
#1 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at

When doing txlog switches, there seems to be a problem with remembering the
correct (i.e. active) log file when the postmaster crashes.

This is one of the problems I tried to point out previously:
you cannot rely on writes to any file other than the txlog itself!!!

Thus the current way of recording the active txlog seg and position in
pg_control is busted and must be avoided. I would try not to use pg_control
for this at all, and instead scan the pg_xlog directory for this purpose.

cusejoua=# update journaleintrag set txt_funktion=trim(txt_funktion);
FATAL 2: write(logfile 0 seg 2 off 4612096) failed: No such file or directory
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Andreas

PS: I am using -F, i.e. with fsync disabled (bad boy that I am :-)

#2 Vadim Mikheev
vmikheev@sectorbase.com
In reply to: Zeugswetter Andreas SB (#1)
Re: switching txlog file in 7.1beta

> When doing txlog switches, there seems to be a problem with remembering the
> correct (i.e. active) log file when the postmaster crashes.
>
> This is one of the problems I tried to point out previously:
> you cannot rely on writes to any file other than the txlog itself!!!

Why? You can, if you handle those files specially, the same way as the txlog itself.

> Thus the current way of recording the active txlog seg and position in
> pg_control is busted and must be avoided. I would try not to use pg_control
> for this at all, and instead scan the pg_xlog directory for this purpose.
>
> cusejoua=# update journaleintrag set txt_funktion=trim(txt_funktion);
> FATAL 2: write(logfile 0 seg 2 off 4612096) failed: No such file or directory
> pqReadData() -- backend closed the channel unexpectedly.
> This probably means the backend terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Can you start up the db with --wal_debug=1 and send me the output?

Vadim

#3 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Vadim Mikheev (#2)
Re: switching txlog file in 7.1beta

>> cusejoua=# update journaleintrag set txt_funktion=trim(txt_funktion);
>> FATAL 2: write(logfile 0 seg 2 off 4612096) failed: No such file or directory
>> pqReadData() -- backend closed the channel unexpectedly.
>> This probably means the backend terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.

> Can you start up the db with --wal_debug=1 and send me the output?

Sorry, I can't reproduce this exact case. The last log was actually number 3.
To reproduce a similar state, simply fill up the data filesystem using any SQL
(it may need to be one that writes more than 16 MB of txlog).

The log, abbreviated, looks something like this:

INSERT @ 0/31853840: prev 0/31853648; xprev 0/31853648; xid 596: Heap - update: node 18719/18720; cid 0; tid 1005/26; new 3511/48
XLogFlush: rqst 0/22738576; wrt 0/31842304; flsh 0/24307040
ERROR: cannot extend journaleintrag: No space left on device.
Check free disk space.
INSERT @ 0/31854040: prev 0/31853840; xprev 0/31853840; xid 596: Transaction - abort: 2000-12-15 11:30:38
XLogFlush: rqst 0/29779696; wrt 0/0; flsh 0/0
FATAL 2: write(logfile 0 seg 1 off 15065088) failed: No space left on device
Server process (pid 23444) exited with status 512 at Fri Dec 15 11:33:12 2000
Terminating any active server processes...

Server processes were terminated at Fri Dec 15 11:33:12 2000
Reinitializing shared memory and semaphores
DEBUG: starting up
DEBUG: database system was interrupted at 2000-12-15 11:33:12
DEBUG: CheckPoint record at (0, 316272)
DEBUG: Redo record at (0, 316272); Undo record at (0, 0); Shutdown TRUE
DEBUG: NextTransactionId: 590; NextOid: 18719
DEBUG: database system was not properly shut down; automatic recovery in progress...
DEBUG: redo starts at (0, 316328)
REDO @ 0/316328; LSN 0/316360: prev 0/316272; xprev 0/0; xid 590: XLOG - nextOid: 26911
..........
REDO @ 0/327312; LSN 0/327560: prev 0/327064; xprev 0/327064; xid 594: Heap - insert: node 18719/18720; cid 0; tid 0/18
DEBUG: ReadRecord: there is no subrecord flag in logfile 0 seg 0 off 40
DEBUG: Formatting logfile 0 seg 0 block 39 at offset 8072
DEBUG: The last logId/logSeg is (0, 0)
DEBUG: Set logId/logSeg in control file
DEBUG: redo done at (0, 327312)
XLogFlush: rqst 0/0; wrt 0/327560; flsh 0/327560
...........
INSERT @ 0/327560: prev 0/327312; xprev 0/0; xid 0: XLOG - checkpoint: redo 0/327560; undo 0/0; sui 21; xid 595; oid 26911; shutdown
XLogFlush: rqst 0/327616; wrt 0/327560; flsh 0/327560
DEBUG: database system is in production state

It seems ReadRecord should switch to seg 1 above, and not to seg 0.
Then txlog file 0000001 somehow gets deleted.

(All rows from table journaleintrag are lost. Test server only, of course :-)

Andreas

#4 Vadim Mikheev
vmikheev@sectorbase.com
In reply to: Zeugswetter Andreas SB (#3)
Re: switching txlog file in 7.1beta

> REDO @ 0/327312; LSN 0/327560: prev 0/327064; xprev 0/327064; xid 594: Heap - insert: node 18719/18720; cid 0; tid 0/18
> DEBUG: ReadRecord: there is no subrecord flag in logfile 0 seg 0 off 40

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> DEBUG: Formatting logfile 0 seg 0 block 39 at offset 8072
> DEBUG: The last logId/logSeg is (0, 0)
> DEBUG: Set logId/logSeg in control file
> DEBUG: redo done at (0, 327312)
> XLogFlush: rqst 0/0; wrt 0/327560; flsh 0/327560
> ...........
> INSERT @ 0/327560: prev 0/327312; xprev 0/0; xid 0: XLOG - checkpoint: redo 0/327560; undo 0/0; sui 21; xid 595; oid 26911; shutdown
> XLogFlush: rqst 0/327616; wrt 0/327560; flsh 0/327560
> DEBUG: database system is in production state

> It seems ReadRecord should switch to seg 1 above, and not to seg 0.
> Then txlog file 0000001 somehow gets deleted.

Something wrong was written into seg 0. ReadRecord assumes that this is the end of the log and
so removes everything after this point (including seg 1, of course).
It doesn't look related to txlog switching; something is wrong in XLogInsert/XLogWrite.
Thanks.

Vadim