recovery after interrupt in the middle of a previous recovery

Started by Or Kroyzer | about 16 years ago | 3 messages | general

orkroyzer@gmail.com

about 16 years ago

Hello,
I am using postgres 8.3.1, and have implemented warm standby very much like
the one described in the high availability documentation on this site.
It seems to work well except for this problem: I've had a case where the
PostgreSQL server was interrupted while in recovery (I think it was a user
interrupt). The log says:

LOG: received fast shutdown request
LOG: archive recovery complete
FATAL: terminating connection due to administrator command
LOG: startup process (PID 6033) exited with exit code 1
LOG: aborting startup due to startup process failure

After that, PG doesn't run the recovery script provided in
recovery.conf, and doesn't manage to come up. The log says:

LOG: database system was interrupted while in recovery at log time 2010-05-26 02:00:03 IDT
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG: could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10, segment 109): No such file or directory
LOG: invalid primary checkpoint record
LOG: could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10, segment 109): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 8081) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure

Usually it just runs my recovery script, where I provide the
WAL archive files and put them in the pg_xlog directory.

Do I have to configure somewhere which script to use when PG is
recovering from a failed recovery? Or is this a bug?

Thanks!
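
For readers following the log above: the missing file name can be checked against the "(log file 10, segment 109)" text in the same message. The sketch below assumes the 8.3-era naming scheme of 24 hexadecimal digits (8 for the timeline ID, 8 for the log file number, 8 for the segment number). The function name `parse_wal_segment_name` is hypothetical and is not part of any PostgreSQL tool.

```python
import string


def parse_wal_segment_name(name):
    """Split a 24-hex-digit WAL segment file name into its three fields.

    Assumed layout: TTTTTTTTLLLLLLLLSSSSSSSS
    (timeline ID, log file number, segment number, each 8 hex digits).
    """
    if len(name) != 24 or any(c not in string.hexdigits for c in name):
        raise ValueError("expected 24 hexadecimal digits, got %r" % name)
    timeline = int(name[0:8], 16)
    log_file = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_file, segment


if __name__ == "__main__":
    # File name taken from the log in the message above.
    print(parse_wal_segment_name("000000CA0000000A0000006D"))
    # prints (202, 10, 109)
```

The decoded log file and segment numbers (10 and 109) agree with the server's own message, and the first field shows the file belongs to timeline 202 (hex CA).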

Tom Lane

tgl@sss.pgh.pa.us

about 16 years ago

In reply to: Or Kroyzer (#1)

Re: recovery after interrupt in the middle of a previous recovery

Or Kroyzer <orkroyzer@gmail.com> writes:

I am using postgres 8.3.1,

... you really ought to be using 8.3.something-recent ...

and have implemented warm standby very much like
the one described in the high availability documentation on this site.
It seems to work well except for this problem: I've had a case where the
PostgreSQL server was interrupted while in recovery (I think it was a user
interrupt). The log says:

LOG: received fast shutdown request
LOG: archive recovery complete
FATAL: terminating connection due to administrator command
LOG: startup process (PID 6033) exited with exit code 1
LOG: aborting startup due to startup process failure

After that, PG doesn't run the recovery script provided in
recovery.conf, and doesn't manage to come up. The log says:

LOG: database system was interrupted while in recovery at log time 2010-05-26 02:00:03 IDT
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG: could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10, segment 109): No such file or directory
LOG: invalid primary checkpoint record
LOG: could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10, segment 109): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 8081) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure

Hmm. Try putting back your recovery.conf file --- it will have been
renamed at the point where "archive recovery complete" was printed.
This example suggests that we might be doing that too early.

regards, tom lane
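
A minimal sketch of the "put back recovery.conf" step suggested above, for anyone who wants to script it. It assumes the server is stopped and that the renamed file is `recovery.done` in the data directory, which is the name 8.3 uses when it finishes archive recovery. The helper name `restore_recovery_conf` is hypothetical, and the data directory path is whatever your installation uses.

```python
import os


def restore_recovery_conf(data_dir):
    """Rename recovery.done back to recovery.conf if the latter is missing.

    Returns True if the file was put back, False if recovery.conf was
    already present. Run this only while the server is stopped.
    """
    conf = os.path.join(data_dir, "recovery.conf")
    done = os.path.join(data_dir, "recovery.done")  # assumed renamed copy

    if os.path.exists(conf):
        # Nothing to do: the next startup will enter archive recovery anyway.
        return False
    if not os.path.exists(done):
        raise FileNotFoundError(
            "neither recovery.conf nor recovery.done found in %s" % data_dir
        )
    os.rename(done, conf)
    return True
```

After the file is back in place, starting the server should make it enter archive recovery again and call the configured restore script, which can then supply the missing segment.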

Or Kroyzer

orkroyzer@gmail.com

about 16 years ago

In reply to: Tom Lane (#2)

Re: recovery after interrupt in the middle of a previous recovery

Thanks.

2010/5/26 Tom Lane <tgl@sss.pgh.pa.us>

Hmm. Try putting back your recovery.conf file --- it will have been
renamed at the point where "archive recovery complete" was printed.
This example suggests that we might be doing that too early.

regards, tom lane