Help setting up warm standby replication

Started by Nico Sabbialmost 19 years ago8 messagesgeneral
Jump to latest
#1Nico Sabbi
nsabbi@officinedigitali.it

Hi,
after having read the documentation and the few posts in this list I
tried to set up
a warm standby replication between two instances of postgres running on
my pc
in 2 different base directories and 2 different ports.
(The second one was a cp -a of the first one after having created a
backup point) .
I don't need a realtime update: even few minutes of lag are good enough
for my needs.

To begin with I followed the example of the docs: in the recovery.conf
file of the
slave instance I set

restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f %p'

but at restart I got these messages in the logs (some translation from
italian to english by me) :
LOG: system shutdown at 2007-04-13 13:02:51 CEST
LOG: starting archive recovery
LOG: restore_command = "cp -av /var/lib/pgsql/data/pg_xlog/%f %p"
cp: impossibile fare stat di
`/var/lib/pgsql/data/pg_xlog/00000001.history': No such file or directory
`/var/lib/pgsql/data/pg_xlog/000000010000000000000007' ->
`pg_xlog/RECOVERYXLOG'
LOG: restored file "000000010000000000000007" from archive
LOG: invalid record length at 0/7000218
LOG: invalid primary record checkpoint
`/var/lib/pgsql/data/pg_xlog/000000010000000000000007' ->
`pg_xlog/RECOVERYXLOG'
LOG: restored file "000000010000000000000007" from archive
LOG: invalid record length at 0/70001D0
LOG: invalid primary record checkpoint
PANIC: impossible locating a valid record checkpoint
LOG: startup process (PID 20250) was terminated by signal 6
LOG: aborting startup due to startup process failure

same thing if I empty the pg_xlog/ dir (except archive_status/)

I hope someone can explain what I did wrong.

Another thing I didn't understand is the continous replication mechanism:
will the slave server periodically call the recovery_command or do I
have to setup a cronjob to instruct it to search updates?

Sorry if these questions are stupid :) and thanks in advance.

Nico

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nico Sabbi (#1)
Re: Help setting up warm standby replication

Nico Sabbi <nsabbi@officinedigitali.it> writes:

To begin with I followed the example of the docs: in the recovery.conf
file of the
slave instance I set

restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f %p'

Hm, it looks like you are trying to copy xlog segments straight from the
pg_xlog directory of the master? That's not going to work, because the
master will rename/overwrite those files as soon as it thinks it's done
with them. You need to have an archive_command on the master that is
really truly copying the data to somewhere else, and then the
restore_command should copy from the somewhere else.

regards, tom lane

#3Nico Sabbi
nsabbi@officinedigitali.it
In reply to: Tom Lane (#2)
Re: Help setting up warm standby replication

Tom Lane wrote:

Nico Sabbi <nsabbi@officinedigitali.it> writes:

To begin with I followed the example of the docs: in the recovery.conf
file of the
slave instance I set

restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f %p'

Hm, it looks like you are trying to copy xlog segments straight from the
pg_xlog directory of the master? That's not going to work, because the
master will rename/overwrite those files as soon as it thinks it's done
with them. You need to have an archive_command on the master that is
really truly copying the data to somewhere else, and then the
restore_command should copy from the somewhere else.

after having set up shared directory (one for archive_command and one
for recover_command)
I could successfully start and sync the slave dbms, but after the
initial recovery
the file recovery.conf is renamed to recovery.done and no more
updates/WAL are
asked (the slave's log reads:

LOG: archive recovery complete
LOG: system database ready

)

Is there any parameter that I have to pass to the second server to keep
on requesting WALs?
I still don't understand what instructs the server to continously
request the master's logs.

#4Merlin Moncure
mmoncure@gmail.com
In reply to: Nico Sabbi (#3)
Re: Help setting up warm standby replication

On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote:

Is there any parameter that I have to pass to the second server to keep
on requesting WALs?
I still don't understand what instructs the server to continously
request the master's logs.

google pg_standby. I've set it up and it works.

merlin

#5Nico Sabbi
nsabbi@officinedigitali.it
In reply to: Merlin Moncure (#4)
Re: Help setting up warm standby replication

Merlin Moncure wrote:

On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote:

Is there any parameter that I have to pass to the second server to keep
on requesting WALs?
I still don't understand what instructs the server to continously
request the master's logs.

google pg_standby. I've set it up and it works.

merlin

Hi,
I installed the second version of pg_standby.tar that you posted here.

In recovery.conf I set
restore_command = '/usr/local/bin/pg_standby -d -c -s 5 -w 0 /tmp/pg/ %f %p'

and I see in the log file that something goes on after every update:

Trigger file : (null)
Waiting for WAL file : /tmp/pg//000000010000000000000021
WAL file path : 000000010000000000000021
Restoring to... : pg_xlog/RECOVERYXLOG
Sleep interval : 5 seconds
Max wait interval : 0 forever
Command for restore : cp /tmp/pg//000000010000000000000021
pg_xlog/RECOVERYXLOG
running restore : success
LOG: restored file "000000010000000000000021" from archivio

but "cp /tmp/pg//000000010000000000000021 pg_xlog/RECOVERYXLOG" looks
suspicious to me.

Is it expected? should WAL files overwrite pg_xlog/RECOVERYXLOG ?
If I stop the replication and start the slave after having removed the
recovery.conf
the server doesn't come up:

LOG: aborting startup due to startup process failure
LOG: database system was interrupted while in recovery at log time
2007-04-16 15:29:42 CEST
HINT: If this has occurred more than once some data may be corrupted
and you may need to choose an earlier recovery target.
LOG: impossibile aprire il file "pg_xlog/000000010000000000000009"
(file registro 0, segmento 9): No such file or directory
LOG: invalid primary checkpoint record
LOG: impossibile aprire il file "pg_xlog/000000010000000000000021"
(file registro 0, segmento 33): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: impossibile localizzare un checkpoint record valido
LOG: startup process (PID 10824) was terminated by signal 6
LOG: aborting startup due to startup process failure

infact pg_xlog/ of the slave contains only RECOVERHISTORY:

cat pg_xlog/RECOVERYHISTORY
START WAL LOCATION: 0/9000020 (file 000000010000000000000009)
STOP WAL LOCATION: 0/A000000 (file 00000001000000000000000A)
CHECKPOINT LOCATION: 0/9000020
START TIME: 2007-04-16 15:29:42 CEST
LABEL: label
STOP TIME: 2007-04-16 15:31:39 CEST

Did I do something wrong?
Thanks for your help,
Nico

#6Merlin Moncure
mmoncure@gmail.com
In reply to: Nico Sabbi (#5)
Re: Help setting up warm standby replication

On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote:

Merlin Moncure wrote:

On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote:

Is there any parameter that I have to pass to the second server to keep
on requesting WALs?
I still don't understand what instructs the server to continously
request the master's logs.

google pg_standby. I've set it up and it works.

merlin

Hi,
I installed the second version of pg_standby.tar that you posted here.

In recovery.conf I set
restore_command = '/usr/local/bin/pg_standby -d -c -s 5 -w 0 /tmp/pg/ %f %p'

and I see in the log file that something goes on after every update:

Trigger file : (null)
Waiting for WAL file : /tmp/pg//000000010000000000000021
WAL file path : 000000010000000000000021
Restoring to... : pg_xlog/RECOVERYXLOG
Sleep interval : 5 seconds
Max wait interval : 0 forever
Command for restore : cp /tmp/pg//000000010000000000000021
pg_xlog/RECOVERYXLOG
running restore : success
LOG: restored file "000000010000000000000021" from archivio

but "cp /tmp/pg//000000010000000000000021 pg_xlog/RECOVERYXLOG" looks
suspicious to me.

Is it expected? should WAL files overwrite pg_xlog/RECOVERYXLOG ?
If I stop the replication and start the slave after having removed the
recovery.conf
the server doesn't come up:

LOG: aborting startup due to startup process failure
LOG: database system was interrupted while in recovery at log time
2007-04-16 15:29:42 CEST
HINT: If this has occurred more than once some data may be corrupted
and you may need to choose an earlier recovery target.
LOG: impossibile aprire il file "pg_xlog/000000010000000000000009"
(file registro 0, segmento 9): No such file or directory
LOG: invalid primary checkpoint record
LOG: impossibile aprire il file "pg_xlog/000000010000000000000021"
(file registro 0, segmento 33): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: impossibile localizzare un checkpoint record valido
LOG: startup process (PID 10824) was terminated by signal 6
LOG: aborting startup due to startup process failure

infact pg_xlog/ of the slave contains only RECOVERHISTORY:

cat pg_xlog/RECOVERYHISTORY
START WAL LOCATION: 0/9000020 (file 000000010000000000000009)
STOP WAL LOCATION: 0/A000000 (file 00000001000000000000000A)
CHECKPOINT LOCATION: 0/9000020
START TIME: 2007-04-16 15:29:42 CEST
LABEL: label
STOP TIME: 2007-04-16 15:31:39 CEST

Did I do something wrong?
Thanks for your help,

try link mode, not copy mode (-l). make sure you read about the -k switch.

merlin

#7Nico Sabbi
nsabbi@officinedigitali.it
In reply to: Merlin Moncure (#6)
Re: Help setting up warm standby replication

Merlin Moncure wrote:

try link mode, not copy mode (-l). make sure you read about the -k
switch.

merlin

replaced -c with -l, but the result was the same.
I assume that with -k you mean the socket dir for postgres - since
pg_standby doesn't
seek to recognize that switch - but I didn't need it: I connected to -p
5433.

Anyway, after having stopped the replication I had the same problem;
in order to make the slave start I had to manually copy the log files
from the
archive to pg_xlog/ .

Is pg_standby supposed to work with Postgres 8.2 ? or only with 8.1.?

#8Nico Sabbi
nsabbi@officinedigitali.it
In reply to: Nico Sabbi (#7)
Re: Help setting up warm standby replication

Nico Sabbi wrote:

Merlin Moncure wrote:

try link mode, not copy mode (-l). make sure you read about the -k
switch.

merlin

replaced -c with -l, but the result was the same.
I assume that with -k you mean the socket dir for postgres - since
pg_standby doesn't
seek to recognize that switch - but I didn't need it: I connected to
-p 5433.

Anyway, after having stopped the replication I had the same problem;
in order to make the slave start I had to manually copy the log files
from the
archive to pg_xlog/ .

Is pg_standby supposed to work with Postgres 8.2 ? or only with 8.1.?

update: touching the right trigger makes the replication stop, thus I
can connect to the server
and see the updated data.
Thanks a lot,
Nico