Recovering database from crashed HD (bad sectors)
Hi
A development box hard disk failed which was running a PG instance with
multiple databases on it. I got the data recovered with some bad sector
errors. Ran another instance of PG (same version), and was to able to take
dump (using pg_dump) of all but one database. For one database I am getting
the following error:
pg_dump -Fc alpha_45 > alpha_45.dump
pg_dump: [archiver (db)] connection to database "alpha_45" failed: FATAL:
could not open file "base/525035/11678": No such file or directory
These are the only two files in the directory similar to the one above:
/usr/local/pgsql/data/base/525035/11678_fsm
/usr/local/pgsql/data/base/525035/11678_vm
Is there any hope of recovering this DB, or should I start looking into
restoring from old backups? Data loss is not a concern, I just would like
to know if I should even try working on it. I looked into this page
http://www.postgresql.org/docs/9.1/static/runtime-config-developer.html,
and tried using "zero_damaged_pages = on" in postgresql.conf, but it was of
no help.
OS: FreeBSD 9.3
PG Version: 9.1
With regards
Amitabh
On Sat, Jul 18, 2015 at 6:59 PM, Amitabh Kant <amitabhkant@gmail.com> wrote:
A development box hard disk failed which was running a PG instance with
multiple databases on it. I got the data recovered with some bad sector
errors. Ran another instance of PG (same version), and was to able to take
dump (using pg_dump) of all but one database. For one database I am getting
the following error:pg_dump -Fc alpha_45 > alpha_45.dump
pg_dump: [archiver (db)] connection to database "alpha_45" failed: FATAL:
could not open file "base/525035/11678": No such file or directoryThese are the only two files in the directory similar to the one above:
/usr/local/pgsql/data/base/525035/11678_fsm
/usr/local/pgsql/data/base/525035/11678_vmIs there any hope of recovering this DB, or should I start looking into
restoring from old backups? Data loss is not a concern, I just would like to
know if I should even try working on it. I looked into this page
http://www.postgresql.org/docs/9.1/static/runtime-config-developer.html, and
tried using "zero_damaged_pages = on" in postgresql.conf, but it was of no
help.
Visibly your data is already lost in alpha_45... Hence if you can
recover this database from an older backup you had better do it. Now
perhaps other folks here have other recommendations and strategies
though :)
--
Michael
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Michael Paquier wrote:
On Sat, Jul 18, 2015 at 6:59 PM, Amitabh Kant <amitabhkant@gmail.com> wrote:
A development box hard disk failed which was running a PG instance with
multiple databases on it. I got the data recovered with some bad sector
errors. Ran another instance of PG (same version), and was to able to take
dump (using pg_dump) of all but one database. For one database I am getting
the following error:pg_dump -Fc alpha_45 > alpha_45.dump
pg_dump: [archiver (db)] connection to database "alpha_45" failed: FATAL:
could not open file "base/525035/11678": No such file or directoryThese are the only two files in the directory similar to the one above:
/usr/local/pgsql/data/base/525035/11678_fsm
/usr/local/pgsql/data/base/525035/11678_vmIs there any hope of recovering this DB, or should I start looking into
restoring from old backups? Data loss is not a concern, I just would like to
know if I should even try working on it. I looked into this page
http://www.postgresql.org/docs/9.1/static/runtime-config-developer.html, and
tried using "zero_damaged_pages = on" in postgresql.conf, but it was of no
help.Visibly your data is already lost in alpha_45... Hence if you can
recover this database from an older backup you had better do it. Now
perhaps other folks here have other recommendations and strategies
though :)
If the data since the last backup is very valuable, there are certainly
other avenues to explore, but it's an ad-hoc job that takes time and
someone knowledgeable enough directly in charge of the job. It's not
something I would attempt over a mailing list, for sure.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Amitabh Kant <amitabhkant@gmail.com> writes:
A development box hard disk failed which was running a PG instance with
multiple databases on it. I got the data recovered with some bad sector
errors. Ran another instance of PG (same version), and was to able to take
dump (using pg_dump) of all but one database. For one database I am getting
the following error:
pg_dump -Fc alpha_45 > alpha_45.dump
pg_dump: [archiver (db)] connection to database "alpha_45" failed: FATAL:
could not open file "base/525035/11678": No such file or directory
These are the only two files in the directory similar to the one above:
/usr/local/pgsql/data/base/525035/11678_fsm
/usr/local/pgsql/data/base/525035/11678_vm
Is there any hope of recovering this DB, or should I start looking into
restoring from old backups?
Well, as others have said, if the data is worth money to you then it
would be worthwhile hiring a pro. If you just want to experiment,
the first thing to do is to find out which system catalog or index
that is, which you should be able to do by connecting to any of the
non-broken databases in the same installation and issuing
select * from pg_class where pg_relation_filenode(oid) = 11678;
If you're lucky and it's only an index, you could try connecting to the
busted database with PGOPTIONS=-P (ignore_system_indexes) and reindexing
the broken index. If it's a catalog, whether recovery is possible would
depend on which one.
In any case, it would be prudent to make a tar-style copy of the whole
$PGDATA tree (with the postmaster shut down) before experimenting,
so that you can undo any catastrophic mistakes.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I am just experimenting with the crashed DB. Before I tried anything else,
I did make a complete copy of the recovered directory tree.
As for running the sql command as suggested by Tom, here is the result:
template1=# select * from pg_class where pg_relation_filenode(oid) = 11678;
relname | relnamespace | reltype | reloftype | relowner | relam |
relfilenode | reltablespace | relpages | reltuples | reltoastrelid |
reltoastidxid | relhasindex | relisshared | relpersistence | relkind |
relnatts | relchecks | relhasoids | relhaspkey | relhasrules |
relhastriggers | relhassubclass | relfrozenxid | relacl | reloptions
----------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+------------+------------
pg_class | 11 | 83 | 0 | 10 | 0 |
0 | 0 | 8 | 281 | 0 | 0
| t | f | p | r | 26 |
0 | t | f | f | f | f
| 662 | {=r/pgsql} |
(1 row)
From what I can understand, this is not a index as reltype is non-zero . Is
there anything else that I can try?
With regards
Amitabh
On Sat, Jul 18, 2015 at 7:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Amitabh Kant <amitabhkant@gmail.com> writes:
A development box hard disk failed which was running a PG instance with
multiple databases on it. I got the data recovered with some bad sector
errors. Ran another instance of PG (same version), and was to able totake
dump (using pg_dump) of all but one database. For one database I am
getting
the following error:
pg_dump -Fc alpha_45 > alpha_45.dump
pg_dump: [archiver (db)] connection to database "alpha_45" failed: FATAL:
could not open file "base/525035/11678": No such file or directoryThese are the only two files in the directory similar to the one above:
/usr/local/pgsql/data/base/525035/11678_fsm
/usr/local/pgsql/data/base/525035/11678_vmIs there any hope of recovering this DB, or should I start looking into
restoring from old backups?Well, as others have said, if the data is worth money to you then it
would be worthwhile hiring a pro. If you just want to experiment,
the first thing to do is to find out which system catalog or index
that is, which you should be able to do by connecting to any of the
non-broken databases in the same installation and issuing
select * from pg_class where pg_relation_filenode(oid) = 11678;If you're lucky and it's only an index, you could try connecting to the
busted database with PGOPTIONS=-P (ignore_system_indexes) and reindexing
the broken index. If it's a catalog, whether recovery is possible would
depend on which one.In any case, it would be prudent to make a tar-style copy of the whole
$PGDATA tree (with the postmaster shut down) before experimenting,
so that you can undo any catastrophic mistakes.regards, tom lane
Amitabh Kant <amitabhkant@gmail.com> writes:
As for running the sql command as suggested by Tom, here is the result:
template1=# select * from pg_class where pg_relation_filenode(oid) = 11678;
pg_class | 11 | 83 | 0 | 10 | 0 |
0 | 0 | 8 | 281 | 0 | 0
| t | f | p | r | 26 |
0 | t | f | f | f | f
| 662 | {=r/pgsql} |
That's about the worst possible answer :-(. Without pg_class, you have
little hope of telling which is which among the other files; and there
would be no real commonality with the contents of pg_class from other
databases in the installation, so no way to jury-rig something. Moreover,
because pg_class is consulted *very* early in backend startup, it seems
entirely likely that the failure you're seeing is only the tip of the
iceberg; there very possibly are other files that are also missing or
badly damaged.
It's possible that a professional data recovery team could extract
something from the wreckage, but it would take a lot of time and money.
Personally I'd say it's time to go to your backups.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Tom Lane wrote:
Amitabh Kant <amitabhkant@gmail.com> writes:
As for running the sql command as suggested by Tom, here is the result:
template1=# select * from pg_class where pg_relation_filenode(oid) = 11678;pg_class | 11 | 83 | 0 | 10 | 0 |
0 | 0 | 8 | 281 | 0 | 0
| t | f | p | r | 26 |
0 | t | f | f | f | f
| 662 | {=r/pgsql} |That's about the worst possible answer :-(. Without pg_class, you have
little hope of telling which is which among the other files; and there
would be no real commonality with the contents of pg_class from other
databases in the installation, so no way to jury-rig something. Moreover,
because pg_class is consulted *very* early in backend startup, it seems
entirely likely that the failure you're seeing is only the tip of the
iceberg; there very possibly are other files that are also missing or
badly damaged.
I would look for the file in the partition's lost+found directory; with
luck, the file is there. pg_filedump can identify the number of
attributes each tuple has in each file, if there are many such files;
you need to look for a file whose tuples have 26 attributes (relnatts in
the above row).
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Sun, Jul 19, 2015 at 4:10 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Tom Lane wrote:
Amitabh Kant <amitabhkant@gmail.com> writes:
As for running the sql command as suggested by Tom, here is the result:
template1=# select * from pg_class where pg_relation_filenode(oid) =11678;
pg_class | 11 | 83 | 0 | 10 | 0 |
0 | 0 | 8 | 281 | 0 |0
| t | f | p | r | 26 |
0 | t | f | f | f | f
| 662 | {=r/pgsql} |That's about the worst possible answer :-(. Without pg_class, you have
little hope of telling which is which among the other files; and there
would be no real commonality with the contents of pg_class from other
databases in the installation, so no way to jury-rig something.Moreover,
because pg_class is consulted *very* early in backend startup, it seems
entirely likely that the failure you're seeing is only the tip of the
iceberg; there very possibly are other files that are also missing or
badly damaged.I would look for the file in the partition's lost+found directory; with
luck, the file is there. pg_filedump can identify the number of
attributes each tuple has in each file, if there are many such files;
you need to look for a file whose tuples have 26 attributes (relnatts in
the above row).--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
There is no lost+found directory in the recovered files. Since the DB was
not critical, and can easily be replaced, we are not inclined towards
sending it for professional recovery. As Tom suggested, this would just be
the tip iceberg, as even during recovery, there were lots of read errors
from the disk. I will recreate the DB from the backups.
Thanks for the help and insights though.
With regards
Amitabh