CRITICAL HELP NEEDED! DEAD DB!
Sep 24 10:22:37 snafu postgres[18306]: [2-1] LOG: database system was
interrupted while in recovery at 2004-09-24 10:21:41 MST
Sep 24 10:22:37 snafu postgres[18306]: [2-2] HINT: This probably means
that some data is corrupted and you will have to use the last backup for
recovery.
Sep 24 10:22:37 snafu postgres[18306]: [3-1] LOG: checkpoint record is
at 9A/C2022368
Sep 24 10:22:37 snafu postgres[18306]: [4-1] LOG: redo record is at
9A/C2022368; undo record is at 0/0; shutdown FALSE
Sep 24 10:22:37 snafu postgres[18306]: [5-1] LOG: next transaction ID:
197841225; next OID: 715436086
Sep 24 10:22:37 snafu postgres[18306]: [6-1] LOG: database system was
not properly shut down; automatic recovery in progress
Sep 24 10:22:37 snafu postgres[18306]: [7-1] LOG: redo starts at
9A/C20223B0
Sep 24 10:22:37 snafu postgres[18306]: [8-1] PANIC: btree_insert_redo:
failed to add item
Sep 24 10:22:37 snafu postgres[18299]: [2-1] LOG: startup process (PID
18306) was terminated by signal 6
Sep 24 10:22:37 snafu postgres[18299]: [3-1] LOG: aborting startup due
to startup process failure
Any suggestions to recover?! I'm dead in the water! Please!!!
For starters, a little more detail would be helpful. For example:
What version of PostgreSQL? What OS? What compiler? What happened that
caused this? A server crash?
Matthew
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Cott Lang
Sent: Friday, September 24, 2004 10:21 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] CRITICAL HELP NEEDED! DEAD DB!
When did you do your last backup?
This message is a clue:
"HINT: This probably means that some data is corrupted and you will
have to use the last backup for recovery."
If you do a restore from your last backup, you will lose the data
between that time and the time of the problem. Any other solution will
be fraught with peril, I think.
Otherwise, maybe something here will help:
http://svana.org/kleptog/pgsql/pgfsck.html
Cott Lang <cott@internetstaff.com> writes:
Any suggestions to recover?! I'm dead in the water! Please!!!
I think your only chance is pg_resetxlog. Be aware that you won't
necessarily have a consistent database afterwards --- in particular,
whichever index that failure is about is certainly broken. I'd
recommend a dump and reload, plus as much manual verification of data
consistency as you can manage.
How did you get into this state, anyway?
regards, tom lane
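The advice above (pg_resetxlog, then a dump and reload) can be sketched as an ordered command list. This is only an illustration, not a procedure confirmed anywhere in the thread: the function name, all paths, and the up-front file-level copy of the damaged data directory are assumptions, and the script prints the commands rather than running them. Note that pg_resetxlog was renamed pg_resetwal in PostgreSQL 10.

```python
import shlex


def recovery_plan(pgdata, pgdata_copy, new_pgdata, dump_file):
    """Return the shell commands for a last-resort recovery, in order.

    Nothing is executed here. All four arguments are hypothetical paths
    chosen by the caller; the initial copy step is an added precaution,
    not something the thread prescribes.
    """
    q = shlex.quote
    return [
        # Keep an untouched copy of the damaged cluster before changing it.
        f"cp -a {q(pgdata)} {q(pgdata_copy)}",
        # Discard the WAL that cannot be replayed (may leave inconsistent data).
        f"pg_resetxlog {q(pgdata)}",
        # Start the old cluster only long enough to dump it.
        f"pg_ctl start -w -D {q(pgdata)}",
        f"pg_dumpall > {q(dump_file)}",
        f"pg_ctl stop -D {q(pgdata)}",
        # Reload into a freshly initialized cluster so every index is rebuilt.
        f"initdb -D {q(new_pgdata)}",
        f"pg_ctl start -w -D {q(new_pgdata)}",
        f"psql -f {q(dump_file)} template1",
    ]


if __name__ == "__main__":
    for step in recovery_plan(
        "/srv/pg/data", "/srv/pg/data.broken", "/srv/pg/data.new", "/srv/pg/dump.sql"
    ):
        print(step)
```

The reload into a new cluster matters because a dump does not read user indexes, so the broken index is recreated from table data instead of being carried over. Manual consistency checks against the last backup are still needed afterwards.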
On Fri, 2004-09-24 at 11:43, Tom Lane wrote:
I think your only chance is pg_resetxlog. Be aware that you won't
necessarily have a consistent database afterwards --- in particular,
whichever index that failure is about is certainly broken. I'd
recommend a dump and reload, plus as much manual verification of data
consistency as you can manage.
That's what I've done, so far so good, although we are still checking
consistency against the last backup. Thanks for the info. Luckily this
was one of our smaller databases ...
How did you get into this state, anyway?
I wish I knew - this is what appeared to start it:
Sep 24 10:19:41 snafu postgres[18176]: [464-1] ERROR: could not open
segment 1 of relation "idx_ordl_id" (target block 1719234412): No such
file or
Sep 24 10:19:41 snafu postgres[18176]: [464-2] directory
I can't figure out what the exact problem is; there were no I/O errors
or any other relevant messages at the time, the box was empty, and
nothing remarkable was going on. :(
thanks,
Cott
PS: No, I don't think it's a PG problem. :)
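A quick calculation helps in reading the "target block 1719234412" in the error above. The sketch below assumes the default build settings of 8 kB blocks and 1 GB segment files (131072 blocks per segment); the thread does not say how this server was built, and the function name is made up for illustration.

```python
BLCKSZ = 8192          # assumed default block size in bytes
RELSEG_SIZE = 131072   # assumed blocks per segment file (1 GB / 8 kB)


def locate_block(block_number, blcksz=BLCKSZ, relseg_size=RELSEG_SIZE):
    """Map a relation block number to (segment, block within segment, byte offset)."""
    segment, block_in_segment = divmod(block_number, relseg_size)
    byte_offset = block_number * blcksz
    return segment, block_in_segment, byte_offset


segment, block_in_segment, byte_offset = locate_block(1719234412)
print(f"segment {segment}, block {block_in_segment} within it")
print(f"byte offset {byte_offset} (about {byte_offset / 2**40:.1f} TiB)")
```

Under those assumptions the block would sit in segment 13116, roughly 12.8 TiB into the index. An index on one of the smaller databases is very unlikely to be that large, so the block number itself looks bogus, which would be consistent with a corrupted pointer rather than a genuinely missing file.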
Does pgfsck work on 7.4.x?
Cott Lang wrote:
I wish I knew - this is what appeared to start it:
Sep 24 10:19:41 snafu postgres[18176]: [464-1] ERROR: could not open
segment 1 of relation "idx_ordl_id" (target block 1719234412): No such
file or
Sep 24 10:19:41 snafu postgres[18176]: [464-2] directory
I can't figure out what the exact problem is; there were no I/O errors
or any other relevant messages at the time, the box was empty, and
nothing remarkable was going on. :(
I saw that exact error message, with no logged I/O system errors, when
using SAN-attached storage a month or so ago. It turned out to be the
SAN silently corrupting files. We did eventually start to see SCSI
errors, but not at the beginning.
Joe