Unable to start postgres in recovery mode.

Started by Dhaval Shah about 19 years ago | 3 messages | general

dhaval.shah.m@gmail.com

about 19 years ago

I am trying to put my database in recovery mode and I get the following error:

===========
LOG: starting archive recovery
LOG: restore_command = "/myrestore/pg_restore.sh %f %p"
00000001.history pg_xlog/RECOVERYHISTORY
[Main]: Server requested for 00000001.history to be copied to
pg_xlog/RECOVERYHISTORY
[pg_restore::isWALFileReady]: 1 Available WAL Files
[Main]: Request to copy 00000001.history to pg_xlog/RECOVERYHISTORY
[pg_restore]::copyWALFile: Moving
/mybackup/000000010000000000000001.009352E8.backup to
pg_xlog/RECOVERYHISTORY
LOG: restored log file "00000001.history" from archive
FATAL: syntax error in history file: START WAL LOCATION: 0/19352E8
(file 000000010000000000000001)

HINT: Expected a numeric timeline ID.
LOG: startup process (PID 12323) exited with exit code 1
LOG: aborting startup due to startup process failure

===========

In the log above, the lines tagged [Main] or [pg_restore] come from my
script, which is called via restore_command in recovery.conf.

The postgres server is asking for the 00000001.history file and I do
not have that file. All I have is the
000000010000000000000001.009352E8.backup file and other WAL files
starting from 000000010000000000000001. In the above case, I move
000000010000000000000001.009352E8.backup to pg_xlog/RECOVERYHISTORY.
Note that my backup is in a staging area, so I can safely move files
out of it.

What am I doing wrong?

If I indicate that I do not have the concerned file by returning error
code 1, I get the following error in the log:

============
LOG: starting archive recovery
LOG: restore_command = "/myrestore/pg_restore.sh %f %p"
00000001.history pg_xlog/RECOVERYHISTORY
[Main]: Server requested for 00000001.history to be copied to
pg_xlog/RECOVERYHISTORY
00000001000000000000004F pg_xlog/RECOVERYXLOG
[Main]: Server requested for 00000001000000000000004F to be copied to
pg_xlog/RECOVERYXLOG
LOG: could not open file "pg_xlog/00000001000000000000004F" (log file
0, segment 79): No such file or directory
LOG: invalid primary checkpoint record
00000001000000000000004F pg_xlog/RECOVERYXLOG
[Main]: Server requested for 00000001000000000000004F to be copied to
pg_xlog/RECOVERYXLOG
LOG: could not open file "pg_xlog/00000001000000000000004F" (log file
0, segment 79): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 12222) was terminated by signal 6
LOG: aborting startup due to startup process failure
LOG: database system was shut down at 2007-03-19 03:33:05 PDT
==============

So what am I doing wrong here? Any help in the above matter is greatly
appreciated.

Regards
Dhaval

Tom Lane

tgl@sss.pgh.pa.us

about 19 years ago

In reply to: Dhaval Shah (#1)

Re: Unable to start postgres in recovery mode.

"Dhaval Shah" <dhaval.shah.m@gmail.com> writes:

> What am I doing wrong?

Lying to the server. If you don't have the requested file, return
failure, don't invent something. There are a number of cases where
the recovery process asks for files that are quite likely not to exist.
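
A minimal sketch of that rule, written in Python rather than in the language of the original script: hand back exactly the file the server asked for, or return non-zero. The function name `restore` and its `archive_dir` parameter are assumptions made for illustration; only the `/mybackup` location and the `%f %p` argument order come from this thread.

```python
import shutil
from pathlib import Path

# Staging area named in the thread; everything else here is illustrative.
ARCHIVE_DIR = Path("/mybackup")


def restore(requested: str, dest: str, archive_dir: Path = ARCHIVE_DIR) -> int:
    """Copy exactly the requested file to dest, or report failure.

    Returns 0 on success and 1 when the archive holds no file with that
    exact name. A different file is never substituted.
    """
    src = Path(archive_dir) / requested
    if not src.is_file():
        # Non-zero tells the server the file is not available.
        return 1
    # Copy rather than move, so the archive copy survives a repeat request.
    shutil.copyfile(src, dest)
    return 0
```

A wrapper script would end with `sys.exit(restore(sys.argv[1], sys.argv[2]))`, so a request for 00000001.history against an archive that lacks it fails cleanly instead of delivering a .backup file under the wrong name.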

> If I indicate that I do not have the concerned file by returning error
> code 1, I get the following error in the log:

This may indicate that you have an incomplete backup :-(. It's hard to
tell from this much info though. What is in pg_control (use
pg_controldata to dump) and what is in the backup_label file (that's
plain text)? What WAL segment files do you actually have?
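
To answer the last question, it can help to sort the archive by file name. The sketch below assumes the naming scheme visible in the logs in this thread: a WAL segment is 24 hex digits (timeline, log file, segment, 8 digits each), a backup history file adds `.<offset>.backup`, and a timeline history file is 8 hex digits plus `.history`. The function names are hypothetical helpers, not part of any PostgreSQL tool.

```python
import re

SEGMENT_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")
BACKUP_RE = re.compile(r"^[0-9A-F]{24}\.[0-9A-F]{8}\.backup$")
HISTORY_RE = re.compile(r"^[0-9A-F]{8}\.history$")
START_RE = re.compile(
    r"START WAL LOCATION: [0-9A-F]+/[0-9A-F]+ \(file ([0-9A-F]{24})\)"
)


def classify(name: str) -> str:
    """Label an archive file by its name alone."""
    if SEGMENT_RE.match(name):
        return "wal segment"
    if BACKUP_RE.match(name):
        return "backup history"
    if HISTORY_RE.match(name):
        return "timeline history"
    return "unknown"


def decode_segment(name: str) -> tuple:
    """Split a WAL segment name into (timeline, log file, segment)."""
    m = SEGMENT_RE.match(name)
    if m is None:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return tuple(int(part, 16) for part in m.groups())


def start_wal_file(label_text: str) -> str:
    """Return the first WAL segment a backup needs, per its label text."""
    m = START_RE.search(label_text)
    if m is None:
        raise ValueError("no START WAL LOCATION line found")
    return m.group(1)


print(decode_segment("00000001000000000000004F"))  # (1, 0, 79)
```

The decoded value matches the server's own message, "log file 0, segment 79", for 00000001000000000000004F. Comparing `start_wal_file()` on the backup_label contents with the segments actually present shows quickly whether the archive starts early enough for the backup.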

regards, tom lane

Dhaval Shah

dhaval.shah.m@gmail.com

about 19 years ago

In reply to: Tom Lane (#2)

Re: Unable to start postgres in recovery mode.

Thanks for the email. It helped: after going through it and the docs,
I realized that the "backup" file had the wrong information, or rather
that I had the wrong backup files. That would explain the kind of
errors I have seen.

However, I do have one question. I am setting this up as part of the
HA process. The standby is a "hot" standby. Now, if the primary fails,
how do I tell the secondary to come out of recovery mode, move
recovery.conf to recovery.done, and start the db? I mean, what
error code should I return?

If I return a non-zero error code, I get the following result [from
serverlog]:

====
00000001000000000000001B pg_xlog/RECOVERYXLOG
LOG: restored log file "00000001000000000000001B" from archive
00000001000000000000001C pg_xlog/RECOVERYXLOG
[Main: Triggering Recovery!!!] <---- My script detected that it needs
to trigger recovery...
LOG: could not open file "pg_xlog/00000001000000000000001C" (log file
0, segment 28): No such file or directory
LOG: redo done at 0/1B000070
00000001000000000000001B pg_xlog/RECOVERYXLOG
Main: Triggering Recovery!!! <--- My script is called again and the
script says trigger recovery
PANIC: could not open file "pg_xlog/00000001000000000000001B" (log
file 0, segment 27): No such file or directory
LOG: startup process (PID 32167) was terminated by signal 6
LOG: aborting startup due to startup process failure
====

This is what my script is doing:

if ( triggerRecovery() ) {
    print "Main: Triggering Recovery!!! \n";
    return 1;
}

So the question is: once the script detects that the primary is down
and wants to trigger recovery, what error code should it return? Or do
I have to move recovery.conf to recovery.done myself and restart the db?
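
One possible reading of the log above: after "redo done", the server asks for 00000001000000000000001B a second time, and the script answers 1 for every request once it has decided to fail over, so a segment that was delivered earlier now looks missing. The sketch below illustrates a different ordering, in Python, under these assumptions: archive files are copied rather than moved, a trigger file (a hypothetical mechanism, as are all names here) signals failover, and a file present in the archive is complete. It serves any file that exists, returns non-zero only for files that are absent, and never waits for `.history` files.

```python
import shutil
import time
from pathlib import Path


def restore_or_failover(requested: str, dest: str, archive_dir: str,
                        trigger_file: str, poll_seconds: float = 1.0) -> int:
    """Wait for a WAL file; give up only when it is absent and failover is on.

    Returns 0 after copying the requested file, 1 when the file is not in
    the archive and either the trigger file exists or the request is for a
    timeline history file (which may legitimately never exist).
    """
    src = Path(archive_dir) / requested
    while True:
        if src.is_file():
            # Serve existing files even during failover, including a
            # segment the server has already asked for once.
            shutil.copyfile(src, dest)
            return 0
        if requested.endswith(".history"):
            return 1  # do not block on optional history files
        if Path(trigger_file).exists():
            return 1  # nothing more to replay: let recovery end
        time.sleep(poll_seconds)
```

With this ordering, the non-zero return for the first missing segment is what ends recovery. As far as I know, the server of that era then renames recovery.conf to recovery.done on its own, so the script should not need to do that or restart anything.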

Regards
Dhaval

--
Dhaval Shah