recovery via base + WAL replay failure

Started by Rob Adamsover 17 years ago8 messagesgeneral
Jump to latest
#1Rob Adams
robfadams@cox.net

I'm trying to demonstrate recovery using the "continuous archiving"
backup technique. I'm using 8.3 on Windows.

I made a base backup while the postgres was running using the following
batch file:

--------------------------
psql -d test_database -U user_name -c "SELECT pg_start_backup('test');"

7za a -tzip "C:\backup\base.zip" "C:\Program Files\PostgreSQL\8.3\data"

psql -d test_database -U user_name -c "SELECT pg_stop_backup();"
--------------------------

No indication of any errors. However, I am unable to recover. Here are
my steps:

1. Stop service, replace the data directory w/ the base backup.
2. Create recovery.conf with this setting:
restore_command = 'copy C:\backup\%f "%p"'
3. Try to start service

After about 90 seconds, the dos prompt displays:
"(postgres 8.3) service could not be started"
"The service did not report an error"

Does anyone know what I am probably doing wrong?

Thanks,
--Rob Adams

#2Magnus Hagander
magnus@hagander.net
In reply to: Rob Adams (#1)
Re: recovery via base + WAL replay failure

Rob Adams wrote:

I'm trying to demonstrate recovery using the "continuous archiving"
backup technique. I'm using 8.3 on Windows.

I made a base backup while the postgres was running using the following
batch file:

--------------------------
psql -d test_database -U user_name -c "SELECT pg_start_backup('test');"

7za a -tzip "C:\backup\base.zip" "C:\Program Files\PostgreSQL\8.3\data"

psql -d test_database -U user_name -c "SELECT pg_stop_backup();"
--------------------------

No indication of any errors. However, I am unable to recover. Here are
my steps:

1. Stop service, replace the data directory w/ the base backup.
2. Create recovery.conf with this setting:
restore_command = 'copy C:\backup\%f "%p"'
3. Try to start service

After about 90 seconds, the dos prompt displays:
"(postgres 8.3) service could not be started"
"The service did not report an error"

Does anyone know what I am probably doing wrong?

You'll need to look at the PostgreSQL logs to see what they say.

But I'm curious about the 7za step. It gives no errors or warnings?
IIRC, you need to use VSS on Windows to avoid sharing violations when
opening the files unless a specific open mode is used. It could be that
7za uses that by default, but it's also possible tha tit's not actually
backing up all files...

//Magnus

#3Greg Smith
gsmith@gregsmith.com
In reply to: Rob Adams (#1)
Re: recovery via base + WAL replay failure

On Sun, 3 Aug 2008, Rob Adams wrote:

I made a base backup while the postgres was running using the following batch
file:
psql -d test_database -U user_name -c "SELECT pg_start_backup('test');"

What did you have archive_command set to? That needs to dump the WAL
files generated while the backup is going on somewhere that gets copied
over after the main copy is done, and you need the last of them referenced
by the backup copied over before you can use that backup. Steps (1) and
(5) of
http://www.postgresql.org/docs/current/static/continuous-archiving.html
are the hard parts here and I don't see that you're addressing them so
far, and that will keep the copy from starting if all the files aren't
there.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#4Lennin Caro
lennin.caro@yahoo.com
In reply to: Greg Smith (#3)
Re: recovery via base + WAL replay failure

what error show the log file?

--- On Mon, 8/4/08, Greg Smith <gsmith@gregsmith.com> wrote:
Show quoted text

From: Greg Smith <gsmith@gregsmith.com>
Subject: Re: [GENERAL] recovery via base + WAL replay failure
To: "Rob Adams" <robfadams@cox.net>
Cc: "postgres general" <pgsql-general@postgresql.org>
Date: Monday, August 4, 2008, 5:58 PM
On Sun, 3 Aug 2008, Rob Adams wrote:

I made a base backup while the postgres was running

using the following batch

file:
psql -d test_database -U user_name -c "SELECT

pg_start_backup('test');"

What did you have archive_command set to? That needs to
dump the WAL
files generated while the backup is going on somewhere that
gets copied
over after the main copy is done, and you need the last of
them referenced
by the backup copied over before you can use that backup.
Steps (1) and
(5) of
http://www.postgresql.org/docs/current/static/continuous-archiving.html

are the hard parts here and I don't see that you're
addressing them so
far, and that will keep the copy from starting if all the
files aren't
there.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com
Baltimore, MD

--
Sent via pgsql-general mailing list
(pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Rob Adams
robfadams@cox.net
In reply to: Greg Smith (#3)
Re: recovery via base + WAL replay failure

The WAL file archiving appears to be working correctly. These are the
settings I'm using for archiving the WAL files:

archive_mode = on
archive_command = 'copy %p C:\backup\%f /A'
archive_timeout = 15s

Thanks again,
--Rob

Greg Smith wrote:

Show quoted text

On Sun, 3 Aug 2008, Rob Adams wrote:

I made a base backup while the postgres was running using the
following batch file:
psql -d test_database -U user_name -c "SELECT pg_start_backup('test');"

What did you have archive_command set to? That needs to dump the WAL
files generated while the backup is going on somewhere that gets copied
over after the main copy is done, and you need the last of them
referenced by the backup copied over before you can use that backup.
Steps (1) and (5) of
http://www.postgresql.org/docs/current/static/continuous-archiving.html
are the hard parts here and I don't see that you're addressing them so
far, and that will keep the copy from starting if all the files aren't
there.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#6Rob Adams
robfadams@cox.net
In reply to: Lennin Caro (#4)
Re: recovery via base + WAL replay failure

There is nothing in the log file (in pg_log dir) with regard to this.
Should I set any particular parameter in the postgresql.conf file to log
information about a failed startup? I have not altered or uncommented
any lines in the "ERROR REPORTING AND LOGGING" section of the conf file.

Thanks again,
--Rob Adams

Lennin Caro wrote:

Show quoted text

what error show the log file?

--- On Mon, 8/4/08, Greg Smith <gsmith@gregsmith.com> wrote:

From: Greg Smith <gsmith@gregsmith.com>
Subject: Re: [GENERAL] recovery via base + WAL replay failure
To: "Rob Adams" <robfadams@cox.net>
Cc: "postgres general" <pgsql-general@postgresql.org>
Date: Monday, August 4, 2008, 5:58 PM
On Sun, 3 Aug 2008, Rob Adams wrote:

I made a base backup while the postgres was running

using the following batch

file:
psql -d test_database -U user_name -c "SELECT

pg_start_backup('test');"

What did you have archive_command set to? That needs to
dump the WAL
files generated while the backup is going on somewhere that
gets copied
over after the main copy is done, and you need the last of
them referenced
by the backup copied over before you can use that backup.
Steps (1) and
(5) of
http://www.postgresql.org/docs/current/static/continuous-archiving.html

are the hard parts here and I don't see that you're
addressing them so
far, and that will keep the copy from starting if all the
files aren't
there.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com
Baltimore, MD

--
Sent via pgsql-general mailing list
(pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Rob Adams
robfadams@cox.net
In reply to: Lennin Caro (#4)
Re: recovery via base + WAL replay failure

I found error log entries in the Windows Event Viewer:

----------------
2008-08-01 23:57:55 GMT FATAL: could not remove old lock file
"postmaster.pid": Permission denied

2008-08-01 23:57:55 GMT HINT: The file seems accidentally left over,
but it could not be removed. Please remove the file by hand and try again.
----------------

However, there is no postmaster.pid file in the data directory. (I
can't find one anywhere else, either.)

Any ideas?

Thanks,
--Rob Adams

Lennin Caro wrote:

Show quoted text

what error show the log file?

--- On Mon, 8/4/08, Greg Smith <gsmith@gregsmith.com> wrote:

From: Greg Smith <gsmith@gregsmith.com>
Subject: Re: [GENERAL] recovery via base + WAL replay failure
To: "Rob Adams" <robfadams@cox.net>
Cc: "postgres general" <pgsql-general@postgresql.org>
Date: Monday, August 4, 2008, 5:58 PM
On Sun, 3 Aug 2008, Rob Adams wrote:

I made a base backup while the postgres was running

using the following batch

file:
psql -d test_database -U user_name -c "SELECT

pg_start_backup('test');"

What did you have archive_command set to? That needs to
dump the WAL
files generated while the backup is going on somewhere that
gets copied
over after the main copy is done, and you need the last of
them referenced
by the backup copied over before you can use that backup.
Steps (1) and
(5) of
http://www.postgresql.org/docs/current/static/continuous-archiving.html

are the hard parts here and I don't see that you're
addressing them so
far, and that will keep the copy from starting if all the
files aren't
there.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com
Baltimore, MD

--
Sent via pgsql-general mailing list
(pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Rob Adams
robfadams@cox.net
In reply to: Rob Adams (#7)
Re: recovery via base + WAL replay failure

Finally figured out what was wrong. The data folder had incorrect
permissions after unzipping the base backup. For me, the solution was
unchecking the "Inherit from parent the permission entries that apply to
child objects" option in the Advanced Security Settings dialog for the
data folder & giving the postgres user full control.

Nothing appeared in the log when the database failed to startup b/c the
permissions were wrong. However, an application error did get reported
to Windows which I found using the Event Viewer.

--Rob