crash / data recovery issues

Started by Robert Treat about 18 years ago | 4 messages | hackers
#1 Robert Treat
xzilla@users.sourceforge.net

I'm trying to do some data recovery on an 8.1.9 system. The brief history is
that the system crashed and attempted xlog replay, but that failed. I ran
pg_resetxlog to get something that would start up, and it looks as if the
indexes on pg_class have become corrupt (i.e. REINDEX claims there are
duplicate rows, which do not show up when I run count() queries on the
data). As it turns out, I can't drop these indexes either (the system
refuses, saying the indexes are needed by the system). This has left the
system in an unworkable state.

I've tried to run pg_dump, but get a "schema with OID 96568 does not exist"
error. The database has a number (~100) of temp schemas in it, so I
suspected the problem was some object referencing a temp schema with
broken dependencies, but I looked through pg_depend for referencing
objects and found none. I also checked the *namespace fields of pg_type,
pg_proc, pg_class, pg_constraint, pg_operator, pg_opclass and pg_conversion
and found no matches there either. Any suggestions on what else might
cause this, or how to get past it?

I also did some digging to find the original error on xlog replay, and it
was "failed to re-find parent key in "763769" for split pages 21032/21033".
I'm wondering whether this is actually something you can push past with
pg_resetxlog, or whether I need to run pg_resetxlog and pass in values from
before that error point (I guess essentially letting pg_resetxlog do a
lookup)... thoughts?

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Treat (#1)
Re: crash / data recovery issues

Robert Treat wrote:

I'm trying to do some data recovery on an 8.1.9 system. The brief history is
that the system crashed and attempted xlog replay, but that failed. I ran
pg_resetxlog to get something that would start up, and it looks as if the
indexes on pg_class have become corrupt (i.e. REINDEX claims there are
duplicate rows, which do not show up when I run count() queries on the
data). As it turns out, I can't drop these indexes either (the system
refuses, saying the indexes are needed by the system). This has left the
system in an unworkable state.

You can work your way out of it by starting a standalone server with system
indexes disabled (postgres -O -P, I think) and running a REINDEX on it (the
form that reindexes all system indexes; I think it's REINDEX DATABASE).
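The procedure suggested above can be sketched as follows, assuming an 8.1-era installation where running `postgres` directly (with the postmaster stopped) starts a single-user backend that reads commands from stdin. The function names are hypothetical; the flags (`-d 1 -P -O -D`) are the ones quoted in this thread, and the REINDEX form is a parameter because the thread mentions both REINDEX DATABASE and REINDEX SYSTEM. This is a sketch under those assumptions, not a tested recovery tool.

```python
import subprocess

# Hypothetical wrapper around an 8.1-era single-user backend.
# -P: ignore system indexes when reading system tables
# -O: allow modification of system table structure
# The postmaster must NOT be running against data_dir while this runs.


def standalone_argv(data_dir, dbname, debug_level=1):
    """Build the argv for a single-user backend, using the flags from the thread."""
    return ["postgres", "-d", str(debug_level), "-P", "-O", "-D", data_dir, dbname]


def reindex_sql(dbname, form="DATABASE"):
    """Build the REINDEX command; form is "DATABASE" or "SYSTEM" as discussed."""
    # The single-user backend treats a newline as end of command by default.
    return f"REINDEX {form} {dbname};\n"


def run_reindex(data_dir, dbname, form="DATABASE"):
    """Feed one REINDEX command to the standalone backend; return its exit code."""
    result = subprocess.run(
        standalone_argv(data_dir, dbname),
        input=reindex_sql(dbname, form),
        text=True,
    )
    return result.returncode
```

Keeping the argv and SQL construction separate from the `subprocess` call makes it easy to print and review the exact command before pointing it at a damaged cluster.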

I also did some digging to find the original error on xlog replay, and it
was "failed to re-find parent key in "763769" for split pages 21032/21033".
I'm wondering whether this is actually something you can push past with
pg_resetxlog, or whether I need to run pg_resetxlog and pass in values from
before that error point (I guess essentially letting pg_resetxlog do a
lookup)... thoughts?

You should be able to get out of that by reindexing that index.
(Actually, after running pg_resetxlog, I think the best course is to
pg_dump the whole thing and reload it. That at least gives you some
assurance that your FKs are not b0rked.)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Treat (#1)
Re: crash / data recovery issues

Robert Treat <xzilla@users.sourceforge.net> writes:

I'm trying to do some data recovery on an 8.1.9 system.
...
I also did some digging to find the original error on xlog replay and it
was "failed to re-find parent key in "763769" for split pages 21032/21033".

Hmm, the only known cause of that was fixed in 8.1.6. Don't suppose you made
a copy of everything before destroying the evidence with pg_resetxlog?
If you did, any chance I could get access to it?

regards, tom lane

#4 Robert Treat
xzilla@users.sourceforge.net
In reply to: Alvaro Herrera (#2)
Re: crash / data recovery issues

On Wednesday 06 February 2008 13:56, Alvaro Herrera wrote:

Robert Treat wrote:

it looks as if the indexes on pg_class have become corrupt (i.e. REINDEX
claims there are duplicate rows, which do not show up when I run count()
queries on the data). As it turns out, I can't drop these indexes either
(the system refuses, saying the indexes are needed by the system). This
has left the system in an unworkable state.

You can work your way out of it by starting a standalone server with system
indexes disabled (postgres -O -P, I think) and running a REINDEX on it (the
form that reindexes all system indexes; I think it's REINDEX DATABASE).

Sorry, I should have mentioned that I tried the above under
postgres -d 1 -P -O -D /path/to/data, but the reindex complains (whether I
reindex the pg_class indexes directly or run REINDEX SYSTEM).

Personally, I was surprised to find that it wouldn't let me drop the indexes
in this mode, but that's a different story. Oh, it's probably worth noting
that I am able to reindex other system tables this way, just not pg_class.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL