how to recover after harddisk error
Hi,
Yesterday at about 8pm the hard disk subsystem of our web application
crashed because of a SCSI error. The system could be restarted this
morning, but the database would not come up again. The following
messages were in the log file:
2003-02-26 09:03:06 [1291] DEBUG: database system was interrupted at 2003-02-25 20:19:22 CET
2003-02-26 09:03:06 [1291] DEBUG: open of /usr/local/pgsql/data/pg_xlog/0000001A000000C9 (log file 26, segment 201) failed: No such file or directory
2003-02-26 09:03:06 [1291] DEBUG: invalid primary checkpoint record
2003-02-26 09:03:06 [1291] DEBUG: open of /usr/local/pgsql/data/pg_xlog/0000001A000000C8 (log file 26, segment 200) failed: No such file or directory
2003-02-26 09:03:06 [1291] DEBUG: invalid secondary checkpoint record
2003-02-26 09:03:06 [1291] FATAL 2: unable to locate a valid checkpoint record
2003-02-26 09:03:06 [1277] DEBUG: startup process (pid 1291) exited with exit code 2
2003-02-26 09:03:06 [1277] DEBUG: aborting startup due to startup process failure
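The file names in the log follow from the numbers in parentheses: 0000001A000000C9 is log file 26 (hex 1A) and segment 201 (hex C9), each written as eight uppercase hex digits. A minimal Python sketch of that mapping, assuming the naming scheme visible in this log (the function names are made up for illustration, and other PostgreSQL versions may name segments differently):

```python
def xlog_segment_name(log_id: int, segment: int) -> str:
    """Build a segment file name from the two numbers shown in the log."""
    # Two fields, each zero-padded to 8 uppercase hex digits.
    return f"{log_id:08X}{segment:08X}"


def parse_xlog_segment_name(name: str):
    """Split a 16-character segment file name back into (log_id, segment)."""
    if len(name) != 16:
        raise ValueError(f"expected 16 hex digits, got {name!r}")
    return int(name[:8], 16), int(name[8:], 16)


if __name__ == "__main__":
    print(xlog_segment_name(26, 201))
    print(parse_xlog_segment_name("0000001A000000C8"))
```

This makes it easy to see which segments the startup process was looking for and to compare them against whatever is actually in pg_xlog.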
I did the following steps to get the system running again:
- a new initdb in another data-directory
- create the database again
- restore the data from the last available nightly dump
Is there a better way to get the system running again? Would there have
been any way to access the old database again? The steps I took needed
about 45 minutes, which is quite long (because the database dump is
rather large), and any important data written since the last dump would
have been lost...
TIA, peter
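The three steps above (new initdb, recreate the database, restore the nightly dump) can be sketched as a command plan. This is only an illustration in Python that builds the command lines without running them. The directory, database name and dump path are hypothetical, the dump is assumed to be a plain SQL file as written by pg_dump in its default format, and the pg_ctl start step is an addition that the steps in the post imply but do not list.

```python
from typing import List


def rebuild_commands(new_data_dir: str, db_name: str, dump_file: str) -> List[List[str]]:
    """Return the commands for rebuilding a cluster from a plain SQL dump."""
    return [
        # 1. Create a fresh cluster in a separate data directory.
        ["initdb", "-D", new_data_dir],
        # Assumed extra step: the new cluster must be running before createdb.
        ["pg_ctl", "-D", new_data_dir, "start"],
        # 2. Create the (empty) database again.
        ["createdb", db_name],
        # 3. Replay the nightly dump into it.
        ["psql", "-d", db_name, "-f", dump_file],
    ]


if __name__ == "__main__":
    # All three arguments are placeholders, not values from the thread.
    for cmd in rebuild_commands("/usr/local/pgsql/data.new", "webapp", "/backup/nightly.sql"):
        print(" ".join(cmd))
```

Leaving the old data directory untouched, as this plan does, keeps it available for later inspection.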
pg_resetxlog from contrib
Regards,
Bjoern
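Since pg_resetxlog changes the data directory in place, a cautious sequence might be to copy the damaged directory first, then look at what the tool reports with -n (print values without modifying anything), and only then run it for real. The Python sketch below only builds that command plan and executes nothing. The backup path is hypothetical, and it assumes the installed pg_resetxlog supports -n.

```python
from typing import List


def reset_plan(data_dir: str, backup_dir: str) -> List[List[str]]:
    """Return a conservative command sequence around pg_resetxlog."""
    return [
        # Keep an untouched copy of the broken cluster for later analysis.
        ["cp", "-Rp", data_dir, backup_dir],
        # Dry run: show the values that would be written, change nothing.
        ["pg_resetxlog", "-n", data_dir],
        # The actual reset, to be run only after reviewing the dry run.
        ["pg_resetxlog", data_dir],
    ]


if __name__ == "__main__":
    for cmd in reset_plan("/usr/local/pgsql/data", "/usr/local/pgsql/data.broken"):
        print(" ".join(cmd))
```

After a reset, dumping the database and reloading it into a fresh cluster is a common way to check that the data is still usable.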
Thanks a lot Bjoern.
Just wanted to mention that I found pg_resetxlog to be included by
default in PostgreSQL 7.3.2.
-----Original Message-----
From: Björn Metzdorf [mailto:bm@turtle-entertainment.de]
Sent: Wednesday, 26 February 2003 10:25
To: Peter Alberer; pgsql-general@postgresql.org
Subject: Re: [GENERAL] how to recover after harddisk error
pg_resetxlog from contrib
Regards,
Bjoern
"Peter Alberer" <h9351252@obelix.wu-wien.ac.at> writes:
2003-02-26 09:03:06 [1291] DEBUG: open of /usr/local/pgsql/data/pg_xlog/0000001A000000C9 (log file 26, segment 201) failed: No such file or directory
2003-02-26 09:03:06 [1291] DEBUG: invalid primary checkpoint record
2003-02-26 09:03:06 [1291] DEBUG: open of /usr/local/pgsql/data/pg_xlog/0000001A000000C8 (log file 26, segment 200) failed: No such file or directory
2003-02-26 09:03:06 [1291] DEBUG: invalid secondary checkpoint record
2003-02-26 09:03:06 [1291] FATAL 2: unable to locate a valid checkpoint record
Assuming you haven't wiped the old database directory yet...
What file name(s) are actually present in
/usr/local/pgsql/data/pg_xlog/? What does pg_controldata show --- do
the other fields of pg_control look sane?
pg_resetxlog would have allowed you to restart, but at the price of
losing any consistency guarantees about the results of
recently-committed transactions. So I consider it a very last resort.
What I'd like to understand first is why the system couldn't restart
normally.
regards, tom lane
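The two checks asked for above (which files are present in pg_xlog, and what pg_controldata reports) could be scripted roughly as follows. This is a sketch in Python with made-up function names. It only lists the directory and builds the pg_controldata command line, and it assumes the data directory is still on disk.

```python
import os
from typing import Iterable, List


def missing_segments(xlog_dir: str, expected: Iterable[str]) -> List[str]:
    """Return the expected segment file names that are absent from xlog_dir."""
    present = set(os.listdir(xlog_dir))
    return [name for name in expected if name not in present]


def controldata_command(data_dir: str) -> List[str]:
    """Command line that prints the contents of pg_control for data_dir."""
    return ["pg_controldata", data_dir]


if __name__ == "__main__":
    data_dir = "/usr/local/pgsql/data"  # path taken from the log messages
    wanted = ["0000001A000000C8", "0000001A000000C9"]
    print(sorted(os.listdir(os.path.join(data_dir, "pg_xlog"))))
    print(missing_segments(os.path.join(data_dir, "pg_xlog"), wanted))
    print(" ".join(controldata_command(data_dir)))
```

Comparing the segment names that exist with the ones the startup process asked for is a first hint at whether files were lost or pg_control points at the wrong location.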
Too bad. I had intended to keep the old database instance around, but I
had to remove the files a few hours ago after running low on hard disk
space...
ciao, peter