recovery after interrupt in the middle of a previous recovery
Hello,
I am using postgres 8.3.1, and have implemented warm standby very much like
the one described in the high availability documentation on this site.
It seems to work well except for this problem: I've had a case where the
postgresql server was interrupted while in recovery (I think it was a user
interrupt). The log says:
LOG: received fast shutdown request
LOG: archive recovery complete
FATAL: terminating connection due to administrator command
LOG: startup process (PID 6033) exited with exit code 1
LOG: aborting startup due to startup process failure
And after that, pg doesn't go through the recovery script provided in
recovery.conf, and doesn't manage to come up. The log says:
LOG: database system was interrupted while in recovery at log time
2010-05-26 02:00:03 IDT
HINT: If this has occurred more than once some data might be corrupted and
you might need to choose an earlier recovery target.
LOG: could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10,
segment 109): No such file or directory
LOG: invalid primary checkpoint record
LOG: could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10,
segment 109): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 8081) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
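For reference, the segment file name in the log above can be decoded by hand. The following is a minimal sketch, assuming the usual 24-hex-digit WAL naming (8 digits each for timeline, log file, and segment); the function name is made up for illustration. It shows how "000000CA0000000A0000006D" corresponds to the "log file 10, segment 109" that the server reports.

```python
def parse_wal_segment_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, segment).

    Assumes the layout TTTTTTTTLLLLLLLLSSSSSSSS, each part in hexadecimal.
    """
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)   # int() raises ValueError on non-hex input
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id, segment


if __name__ == "__main__":
    # The file the startup process could not open in the log above.
    print(parse_wal_segment_name("000000CA0000000A0000006D"))  # (202, 10, 109)
```

So the missing file belongs to timeline 202 (0xCA), which is the segment holding the checkpoint record the server needs before it can start.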
But usually it just goes into my recovery script and there I provide the
WAL archive files and put them in the pg_xlog directory.
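To make the setup concrete, here is a rough sketch of what such a recovery script might look like. It is not the poster's actual script. It assumes the script is called from restore_command in recovery.conf with the %f (file name) and %p (destination path) placeholders, and that the archive directory and trigger file path are hypothetical locations chosen by the administrator. It does not guard against segments that are still being written into the archive.

```python
import os
import shutil
import time


def restore_segment(archive_dir, wal_name, dest_path, trigger_path,
                    poll_seconds=5.0):
    """Wait for wal_name to appear in archive_dir, then copy it to dest_path.

    Returns 0 once the file has been copied, or 1 if the (hypothetical)
    trigger file shows up first, which signals that the standby should
    stop waiting and finish recovery.
    """
    source = os.path.join(archive_dir, wal_name)
    while True:
        if os.path.exists(source):
            # Copy under a temporary name, then rename, so the server
            # never sees a half-copied segment at dest_path.
            tmp_path = dest_path + ".tmp"
            shutil.copyfile(source, tmp_path)
            os.rename(tmp_path, dest_path)
            return 0
        if os.path.exists(trigger_path):
            return 1
        time.sleep(poll_seconds)
```

A wrapper would call this with the two arguments passed by the server and exit with the returned value. The point relevant to this thread is that the script only runs while recovery.conf is present in the data directory.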
Do you know if I have to configure someplace what script to use when PG is
recovering from a failed recovery? Or is this a bug?
Thanks!
Or Kroyzer <orkroyzer@gmail.com> writes:
> I am using postgres 8.3.1,

... you really ought to be using 8.3.something-recent ...

> and have implemented warm standby very much like
> the one described in the high availability documentation on this site.
> It seems to work well except for this problem: I've had a case where the
> postgresql server was interrupted while in recovery (I think it was a user
> interrupt). The log says:
> LOG: received fast shutdown request
> LOG: archive recovery complete
> FATAL: terminating connection due to administrator command
> LOG: startup process (PID 6033) exited with exit code 1
> LOG: aborting startup due to startup process failure
> And after that, pg doesn't go through the recovery script provided in
> recovery.conf, and doesn't manage to come up. The log says:
> LOG: database system was interrupted while in recovery at log time
> 2010-05-26 02:00:03 IDT
> HINT: If this has occurred more than once some data might be corrupted and
> you might need to choose an earlier recovery target.
> LOG: could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10,
> segment 109): No such file or directory
> LOG: invalid primary checkpoint record
> LOG: could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10,
> segment 109): No such file or directory
> LOG: invalid secondary checkpoint record
> PANIC: could not locate a valid checkpoint record
> LOG: startup process (PID 8081) was terminated by signal 6: Aborted
> LOG: aborting startup due to startup process failure

Hmm. Try putting back your recovery.conf file --- it will have been
renamed at the point where "archive recovery complete" was printed.
This example suggests that we might be doing that too early.
regards, tom lane
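
The suggestion above can be sketched as a small helper. This is only an illustration, assuming the server is stopped and that the renamed file is recovery.done in the data directory (the name the server uses when it finishes archive recovery); the function name and the data directory argument are hypothetical.

```python
import os


def put_back_recovery_conf(data_dir):
    """Rename recovery.done back to recovery.conf in data_dir.

    Returns True if the file was renamed, False if recovery.conf is
    already in place. Raises FileNotFoundError if neither file exists.
    Run this only while the server is stopped.
    """
    conf_path = os.path.join(data_dir, "recovery.conf")
    done_path = os.path.join(data_dir, "recovery.done")
    if os.path.exists(conf_path):
        return False  # nothing to do, recovery.conf was never renamed
    if not os.path.exists(done_path):
        raise FileNotFoundError("no recovery.done found in %s" % data_dir)
    os.rename(done_path, conf_path)
    return True
```

With recovery.conf back in place, the next startup should run the restore script again and fetch the missing segment from the archive instead of looking only in pg_xlog.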
Thanks.
2010/5/26 Tom Lane <tgl@sss.pgh.pa.us>