Unable to start postgres in recovery mode.
I am trying to put my database in recovery mode and I get the following error:
===========
LOG: starting archive recovery
LOG: restore_command = "/myrestore/pg_restore.sh %f %p"
00000001.history pg_xlog/RECOVERYHISTORY
[Main]: Server requested for 00000001.history to be copied to
pg_xlog/RECOVERYHISTORY
[pg_restore::isWALFileReady]: 1 Available WAL Files
[Main]: Request to copy 00000001.history to pg_xlog/RECOVERYHISTORY
[pg_restore]::copyWALFile: Moving
/mybackup/000000010000000000000001.009352E8.backup to
pg_xlog/RECOVERYHISTORY
LOG: restored log file "00000001.history" from archive
FATAL: syntax error in history file: START WAL LOCATION: 0/19352E8
(file 000000010000000000000001)
HINT: Expected a numeric timeline ID.
LOG: startup process (PID 12323) exited with exit code 1
LOG: aborting startup due to startup process failure
===========
In the log above, the lines tagged [Main] or [pg_restore] come from my
script, which is called through restore_command in recovery.conf.
The postgres server is asking for the file 00000001.history and I do not
have that file. All I have is 000000010000000000000001.009352E8.backup
and other WAL files starting from 000000010000000000000001. In the above
case, I move 000000010000000000000001.009352E8.backup to
pg_xlog/RECOVERYHISTORY. Note that my backup is in a staging area, so I
can safely move files out of it.
What am I doing wrong?
If I indicate that I do not have the concerned file by returning error
code 1, I get the following error in the log:
============
LOG: starting archive recovery
LOG: restore_command = "/myrestore/pg_restore.sh %f %p"
00000001.history pg_xlog/RECOVERYHISTORY
[Main]: Server requested for 00000001.history to be copied to
pg_xlog/RECOVERYHISTORY
00000001000000000000004F pg_xlog/RECOVERYXLOG
[Main]: Server requested for 00000001000000000000004F to be copied to
pg_xlog/RECOVERYXLOG
LOG: could not open file "pg_xlog/00000001000000000000004F" (log file
0, segment 79): No such file or directory
LOG: invalid primary checkpoint record
00000001000000000000004F pg_xlog/RECOVERYXLOG
[Main]: Server requested for 00000001000000000000004F to be copied to
pg_xlog/RECOVERYXLOG
LOG: could not open file "pg_xlog/00000001000000000000004F" (log file
0, segment 79): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 12222) was terminated by signal 6
LOG: aborting startup due to startup process failure
LOG: database system was shut down at 2007-03-19 03:33:05 PDT
==============
So what am I doing wrong here? Any help in the above matter is greatly
appreciated.
Regards
Dhaval
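The "(log file 0, segment 79)" in the messages above is the requested file name read as hexadecimal fields. A minimal sketch of that decoding, assuming the usual 24-hex-digit layout of 8 digits each for timeline, log file and segment (parse_wal_name is a hypothetical helper, not part of PostgreSQL):

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)    # first 8 hex digits: timeline ID
    log_id = int(name[8:16], 16)     # next 8 hex digits: log file number
    segment = int(name[16:24], 16)   # last 8 hex digits: segment number
    return timeline, log_id, segment

# The file the server asked for in the second log:
print(parse_wal_name("00000001000000000000004F"))  # (1, 0, 79)
```

So the server is looking for timeline 1, log file 0, segment 0x4F = 79, which is the segment that holds the checkpoint record it wants to start from.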
"Dhaval Shah" <dhaval.shah.m@gmail.com> writes:
> What am I doing wrong?
Lying to the server. If you don't have the requested file, return
failure, don't invent something. There are a number of cases where
the recovery process asks for files that are quite likely not to exist.
> If I indicate that I do not have the concerned file by returning error
> code 1, I get the following error in the log:
This may indicate that you have an incomplete backup :-(. It's hard to
tell from this much info though. What is in pg_control (use
pg_controldata to dump) and what is in the backup_label file (that's
plain text)? What WAL segment files do you actually have?
regards, tom lane
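The advice above (hand over exactly the requested file, or report failure) could be sketched as follows. This is only an illustration in Python, not the poster's actual script: the function name restore and the ARCHIVE_DIR constant are hypothetical, and /mybackup is simply the staging path that appears in the first log. It copies rather than moves, which is a design choice so that a file can be handed over more than once.

```python
import os
import shutil
import sys

ARCHIVE_DIR = "/mybackup"  # assumption: staging area path seen in the log

def restore(wal_name, dest_path, archive_dir=ARCHIVE_DIR):
    """Copy exactly the requested file; return a shell-style exit code."""
    src = os.path.join(archive_dir, wal_name)
    if not os.path.isfile(src):
        # The requested file (for example 00000001.history) is not in the
        # archive: report failure instead of substituting another file.
        return 1
    shutil.copy(src, dest_path)  # copy, not move, so the archive stays intact
    return 0

# Called the way restore_command would call it: <script> %f %p
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(restore(sys.argv[1], sys.argv[2]))
```

With this shape, a request for 00000001.history simply returns 1 when no such file was archived, and the server moves on to the next file it needs.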
Thanks for the email. It helped. After going through the email and
the doc, I realized that the "backup" file had the wrong information,
or rather that I had the wrong backup files. That explains the kind of
errors I have seen.
However, I do have one question. I am setting this up as part of the
HA process. The standby is a "hot" standby. Now, if the primary fails,
how do I tell the secondary to come out of recovery mode, move
recovery.conf to recovery.done, and start the db? I mean, what
exit code should my script return?
If I return a non-zero exit code, I get the following result [from
serverlog]:
====
00000001000000000000001B pg_xlog/RECOVERYXLOG
LOG: restored log file "00000001000000000000001B" from archive
00000001000000000000001C pg_xlog/RECOVERYXLOG
[Main: Triggering Recovery!!!] <---- My script detected that it needs
to trigger recovery...
LOG: could not open file "pg_xlog/00000001000000000000001C" (log file
0, segment 28): No such file or directory
LOG: redo done at 0/1B000070
00000001000000000000001B pg_xlog/RECOVERYXLOG
Main: Triggering Recovery!!! <--- My script is called again and the
script says trigger recovery
PANIC: could not open file "pg_xlog/00000001000000000000001B" (log
file 0, segment 27): No such file or directory
LOG: startup process (PID 32167) was terminated by signal 6
LOG: aborting startup due to startup process failure
====
This is what my script is doing:
if ( triggerRecovery() ) {
    print "Main: Triggering Recovery!!! \n";
    return 1;
}
So, the question is: once my script detects that the primary is down
and wants to trigger recovery, what exit code should it return? Or do I
have to move recovery.conf to recovery.done myself and restart the db?
Regards
Dhaval
--
Dhaval Shah
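In the last log above, the server asks for 00000001000000000000001B a second time after "redo done" and panics when the script refuses it. One possible reading, which this thread does not confirm, is that the script should keep handing over files it still has even after the trigger fires, and report failure only for a file that is missing. A hedged Python sketch of that idea follows; restore_or_failover, trigger_path and poll_seconds are hypothetical names, and the rule that only 24-character names are worth waiting for is an assumption.

```python
import os
import shutil
import time

def restore_or_failover(wal_name, dest_path, archive_dir, trigger_path,
                        poll_seconds=1.0):
    """Return 0 after copying wal_name, or 1 if it cannot be supplied."""
    src = os.path.join(archive_dir, wal_name)
    # Assumption: only plain WAL segments (24 hex digits) are worth
    # waiting for; .history and .backup requests fail at once if absent.
    is_segment = len(wal_name) == 24
    while not os.path.isfile(src):
        if not is_segment or os.path.exists(trigger_path):
            return 1  # file is missing and will not be waited for
        time.sleep(poll_seconds)
    # The file exists, so copy it even if the trigger has already fired.
    shutil.copy(src, dest_path)
    return 0
```

Under this sketch a triggered standby would get exit code 1 for 00000001000000000000001C, which never arrived, but would still receive 00000001000000000000001B when it is requested again. That requires copying files out of the archive rather than moving them.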