BUG #5176: database recovery produces infinite loop

Started by Helge Mildeover 16 years ago3 messagesbugs
Jump to latest
#1Helge Milde
helge@monsternett.no

The following bug has been logged online:

Bug reference: 5176
Logged by: Helge Milde
Email address: helge@monsternett.no
PostgreSQL version: 8.1.18
Operating system: Debian 2.6.18-6-686
Description: database recovery produces infinite loop
Details:

Our database recently crashed, and since then, we haven't been able to start
it again.

Here's the postmaster log with debug level 5:

--- log start ---
LOG:  could not load root certificate file "root.crt": No SSL error
reported
DETAIL:  Will not verify client certificates.
DEBUG:  invoking IpcMemoryCreate(size=10469376)
DEBUG:  max_safe_fds = 985, usable_fds = 1000, already_open = 5
LOG:  database system was interrupted while in recovery at 2009-11-10
13:07:47 CET
HINT:  This probably means that some data is corrupted and you will have to
use the last backup for recovery.
LOG:  checkpoint record is at 8/931B8558
LOG:  redo record is at 8/931B3544; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 566148231; next OID: 25210
LOG:  next MultiXactId: 18; next MultiXactOffset: 37
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 8/931B3544
--- log end ---

Nothing else of interest follows.

It seems the recovery process is failing; doing a strace on the 'postgres:
startup process' pid, I see this:

-- strace log start ---
Process 29888 attached - interrupt to quit
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192)
= 8192
fsync(14) = 0
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192)
= 8192
fsync(14) = 0
close(14) = 0
-- strace log end ---

This repeats infinetly.

We do have some database dumps, but they are not very recent, so any help on
fixing this would be much appreciated.

Thanks,
Helge Milde

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Helge Milde (#1)
Re: BUG #5176: database recovery produces infinite loop

"Helge Milde" <helge@monsternett.no> writes:

It seems the recovery process is failing; doing a strace on the 'postgres:
startup process' pid, I see this:

-- strace log start ---
Process 29888 attached - interrupt to quit
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192)
= 8192
fsync(14) = 0
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192)
= 8192
fsync(14) = 0
close(14) = 0
-- strace log end ---

This repeats infinetly.

Hmm, can you attach to that process with gdb and collect a few stack
traces to show where it's looping? This isn't a symptom we've seen
before, AFAIR.

regards, tom lane

#3Helge Milde
helge@monsternett.no
In reply to: Tom Lane (#2)
Re: BUG #5176: database recovery produces infinite loop

I'm sorry, but the issue has been fixed now; we've upgraded to postgres-8.3, so I can't reproduce this anymore..
I'm not completely sure how it got fixed (it wasn't me), but I can ask the person who fixed it and get back to you tomorrow; perhaps that might help narrow the field a little bit.

On Tue, Nov 10, 2009 at 09:31:06AM -0500, Tom Lane wrote:

"Helge Milde" <helge@monsternett.no> writes:

It seems the recovery process is failing; doing a strace on the 'postgres:
startup process' pid, I see this:

-- strace log start ---
Process 29888 attached - interrupt to quit
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192)
= 8192
fsync(14) = 0
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192)
= 8192
fsync(14) = 0
close(14) = 0
-- strace log end ---

This repeats infinetly.

Hmm, can you attach to that process with gdb and collect a few stack
traces to show where it's looping? This isn't a symptom we've seen
before, AFAIR.

regards, tom lane

--
Helge Milde, 69701808
www.monsternett.no