Having trouble configuring a Master with multiple standby Servers in PostgreSQL 9.3.3
Trying to configure a master with two standby server VMs.

postgresql.conf - Master Server VM

wal_level = hot_standby    # minimal, archive, or hot_standby
                           # (change requires restart)
# - Archiving -
archive_mode = on          # allows archiving to be done
                           # (change requires restart)
archive_command = 'test ! -f /mnt/server/master_archivedir/%f && cp %p /mnt/server/master_archivedir/%f'
                           # command to use to archive a logfile segment
                           # placeholders: %p = path of file to archive
#archive_timeout = 0       # force a logfile segment switch after this
                           # number of seconds; 0 disables
#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------
# - Sending Server(s) -
# Set these on the master and on any standby that will send replication data.
max_wal_senders = 3        # max number of walsender processes
                           # (change requires restart)
_________________________________________________________________________________

postgresql.conf - Slave1 Server VM

wal_level = hot_standby    # minimal, archive, or hot_standby
                           # (change requires restart)
# - Archiving -
archive_mode = on          # allows archiving to be done
                           # (change requires restart)
archive_command = 'test ! -f /mnt/server/slave1_archivedir/%f && cp %p /mnt/server/slave1_archivedir/%f'
                           # command to use to archive a logfile segment
                           # placeholders: %p = path of file to archive
#archive_timeout = 0       # force a logfile segment switch after this
                           # number of seconds; 0 disables
#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------
# - Sending Server(s) -
# Set these on the master and on any standby that will send replication data.
max_wal_senders = 3        # max number of walsender processes
                           # (change requires restart)
# - Standby Servers -
# These settings are ignored on a master server.
hot_standby = on

*** Completed this step, with 1.2.3.4 being the IP of slave1:

psql -c "select pg_start_backup('initial_backup');"
rsync -cvar --inplace --exclude=*pg_xlog* /u01/fiber/postgreSQL_data/ postgres@1.2.3.4:/u01/fiber/postgreSQL_data/
psql -c "select pg_stop_backup();"

Now the issue is with the recovery.conf file on slave1: should the restore_command point to the archive logs on the master? Do I run the archive_cleanup_command when I recover slave1, or do I wait until I have finished the backup/copy from slave2?

recovery.conf - Slave1 Server VM

standby_mode = 'on'
primary_conninfo = 'host=<master database ip address> port=5432 dbname=tumsdb user=replication password=<password> application_name=slave1 sslmode=require'
restore_command = 'cp /mnt/server/master_archivedir/%f "%p%"'   <--- **** Is this correct? **** The master remains on-line and is producing archive logs.
archive_cleanup_command = 'pg_archivecleanup /mnt/server/master_archivedir/ %r'
trigger_file = '/opt/PostgreSQL/9.3/data/pgsql.trigger.file'
_________________________________________________________________________________

postgresql.conf - Slave2 Server VM

wal_level = hot_standby    # minimal, archive, or hot_standby
                           # (change requires restart)
# - Archiving -
archive_mode = on          # allows archiving to be done
                           # (change requires restart)
archive_command = 'test ! -f /mnt/server/slave2_archivedir/%f && cp %p /mnt/server/slave2_archivedir/%f'
                           # command to use to archive a logfile segment
                           # placeholders: %p = path of file to archive
#archive_timeout = 0       # force a logfile segment switch after this
                           # number of seconds; 0 disables
#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------
# - Sending Server(s) -
# Set these on the master and on any standby that will send replication data.
max_wal_senders = 3        # max number of walsender processes
                           # (change requires restart)
# - Standby Servers -
# These settings are ignored on a master server.
hot_standby = on

*** I HAVE NOT COMPLETED this step yet from the master, with 1.2.3.5 being the IP of slave2. This takes about 3 days to finish the rsync copy:

psql -c "select pg_start_backup('initial_backup');"
rsync -cvar --inplace --exclude=*pg_xlog* /u01/fiber/postgreSQL_data/ postgres@1.2.3.5:/u01/fiber/postgreSQL_data/
psql -c "select pg_stop_backup();"

recovery.conf - Slave2 Server VM

standby_mode = 'on'
primary_conninfo = 'host=<master database ip address> port=5432 dbname=tumsdb user=replication password=<password> application_name=slave2 sslmode=require'
restore_command = 'cp /mnt/server/slave2_archivedir/%f "%p%"'   <--- **** Is this correct? **** The master remains on-line and is producing archive logs.
archive_cleanup_command = 'pg_archivecleanup /mnt/server/slave2_archivedir/ %r'
trigger_file = '/opt/PostgreSQL/9.3/data/pgsql.trigger.file'

These directories reside on a share accessible to all three VMs:

/mnt/server/master_archivedir
/mnt/server/slave1_archivedir
/mnt/server/slave2_archivedir

Thanks for any assistance.
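As a side note on the archive_command above: its test-and-copy pattern can be exercised on its own. A minimal sketch, using throwaway /tmp paths in place of the real archive share:

```shell
#!/bin/sh
# Stand-ins for the placeholders PostgreSQL substitutes at archive time:
# %p = path of the segment to archive, %f = its file name.
p=/tmp/demo_pg_xlog/000000010000000000000001
f=000000010000000000000001
ARCHIVEDIR=/tmp/demo_master_archivedir   # stand-in for /mnt/server/master_archivedir

mkdir -p /tmp/demo_pg_xlog "$ARCHIVEDIR"
echo "segment contents" > "$p"

# First run: the target is absent, so 'test ! -f' succeeds and cp runs.
test ! -f "$ARCHIVEDIR/$f" && cp "$p" "$ARCHIVEDIR/$f"
echo "first run: $?"

# Second run: the target exists, 'test ! -f' fails, and the command
# returns non-zero; the guard therefore refuses to overwrite an
# already-archived segment.
test ! -f "$ARCHIVEDIR/$f" && cp "$p" "$ARCHIVEDIR/$f"
echo "second run: $?"
```

A non-zero exit from archive_command makes PostgreSQL keep the segment and retry later, which is why the guard is useful.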
On Thu, Apr 17, 2014 at 1:29 AM, <fburgess@radiantblue.com> wrote:
> Now the issue is with the recovery.conf file on slave1, should the
> restore_command point to the archivelogs on the master?
Yes, this is where the master's archive_command copies the WAL files. You
need them for recovery operations on the slaves.
> Do I run the archive_cleanup_command when I recover slave1 or do I wait
> until I have finished backup/copy from the slave2
Be careful here: this command may remove WAL files that are still needed by
other slaves. For example, if slave1 runs this command, it may remove files
that slave2, which has not yet done any recovery, still needs.
> postgresql.conf - Slave1
> restore_command = 'cp /mnt/server/master_archivedir/%f "%p%"' <--- ****
> Is this correct! **** The master remains on-line and is producing archive
> logs.
No need to have that much complexity for %p:
restore_command = 'cp -i /mnt/server/master_archivedir/%f %p'
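To illustrate (not from the thread itself): once PostgreSQL substitutes %f and %p, the simplified restore_command is an ordinary cp run from the standby's data directory. A sketch with throwaway paths, where the segment name and the RECOVERYXLOG destination are example values:

```shell
#!/bin/sh
ARCHIVEDIR=/tmp/demo_restore_archivedir   # stand-in for /mnt/server/master_archivedir
DATADIR=/tmp/demo_standby_datadir
mkdir -p "$ARCHIVEDIR" "$DATADIR/pg_xlog"
echo "wal data" > "$ARCHIVEDIR/000000010000000000000002"

# restore_command runs with the data directory as the current directory,
# and %p is a path relative to it; after substitution the command is just:
cd "$DATADIR"
cp -i "$ARCHIVEDIR/000000010000000000000002" pg_xlog/RECOVERYXLOG < /dev/null
ls pg_xlog
```

The stray trailing % in the original "%p%" would have produced a destination path that does not exist, which is why dropping it matters.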
> postgresql.conf - Slave2 Server VM
> restore_command = 'cp /mnt/server/slave2_archivedir/%f "%p%"' <--- ****
> Is this correct! **** The master remains on-line and is producing archive
> logs.
Please see above; it could be simpler.
--
Michael
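The hazard Michael describes can be sketched without a running cluster. This simulation stands in for pg_archivecleanup with a plain loop (the real tool removes every segment older than the %r restart point); the paths and segment names here are made up:

```shell
#!/bin/sh
ARCHIVE=/tmp/demo_shared_archivedir   # imagine both slaves pointed at one directory
mkdir -p "$ARCHIVE"
touch "$ARCHIVE/000000010000000000000001" \
      "$ARCHIVE/000000010000000000000002" \
      "$ARCHIVE/000000010000000000000003"

# Slave1 has restarted from segment ...03, so its cleanup pass deletes
# everything older. Segment names sort lexically, and globs expand in
# sorted order, so we can stop at the restart segment.
RESTART=000000010000000000000003
for seg in "$ARCHIVE"/*; do
    case "$(basename "$seg")" in
        "$RESTART") break ;;
        *) rm "$seg" ;;
    esac
done

# If slave2 had not yet replayed ...01 and ...02, it can no longer
# fetch them: only ...03 is left.
ls "$ARCHIVE"
```

This is exactly why running cleanup against a directory shared by several standbys is dangerous, while per-standby archive directories are safe.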
Hi Michael, thanks for your reply.

I discussed this with my colleague, and we decided to change the archive_command to execute a shell script:

#!/bin/bash
# archive_command script to replicate archive logs to the standby server slaves
#
# postgresql.conf parameter:
#
# archive_command = '<$PGDATA>/replica_achive_set.sh "%p" "%f"'
#
set -e
set -u
ARCHIVE1="/mnt/server/slave1_archivedir"
ARCHIVE2="/mnt/server/slave2_archivedir"
if [ -f "${ARCHIVE1}/$2" ] && [ -f "${ARCHIVE2}/$2" ]; then
    echo "Archive file $2 already exists in one of the replicated archive sets, skipping" >&2
    exit 0
fi
echoerr() { echo "$@" 1>&2; }
FAIL=0
# Background both rsyncs in parallel and remember their PIDs so each
# exit status can be collected with wait.
/usr/bin/rsync -aq "$1" "${ARCHIVE1}/$2" & pid_1=$!
/usr/bin/rsync -aq "$1" "${ARCHIVE2}/$2" & pid_2=$!
echoerr "Spawned replication processes $pid_1 and $pid_2"
wait $pid_1 || FAIL=$((FAIL+1))
wait $pid_2 || FAIL=$((FAIL+1))
if [ "$FAIL" = "0" ]; then
    echoerr "Replication success $1 $2"
else
    echoerr "Replication failed $1 $2"
    exit 1    # non-zero exit makes PostgreSQL retry the segment later
fi

This will copy the archive logs from the master to both slaves. Will that avoid the issue with removing needed WAL files?

I should be able to use these recovery.conf files:

slave #1

standby_mode = 'on'
primary_conninfo = 'host=<master database ip address> port=5432 dbname=tumsdb user=replication password=<password> application_name=slave1 sslmode=require'
restore_command = 'cp /mnt/server/slave1_archivedir/%f "%p%"'
archive_cleanup_command = 'pg_archivecleanup /mnt/server/slave1_archivedir/ %r'
trigger_file = '/opt/PostgreSQL/9.3/data/pgsql.trigger.file'

slave #2

standby_mode = 'on'
primary_conninfo = 'host=<master database ip address> port=5432 dbname=tumsdb user=replication password=<password> application_name=slave2 sslmode=require'
restore_command = 'cp /mnt/server/slave2_archivedir/%f "%p%"'
archive_cleanup_command = 'pg_archivecleanup /mnt/server/slave2_archivedir/ %r'
trigger_file = '/opt/PostgreSQL/9.3/data/pgsql.trigger.file'

Does this look correct?

Finally, a question about the backup. I did a pg_ctl reload to change the archive log destination from /mnt/server/master_archivedir to be redistributed to slave1 and slave2. Do I need to redo this backup step?

psql -c "select pg_start_backup('initial_backup');"
rsync -cvar --inplace --exclude=*pg_xlog* /u01/fiber/postgreSQL_data/ postgres@1.2.3.5:/u01/fiber/postgreSQL_data/
psql -c "select pg_stop_backup();"

Or can I just copy all of the missing archive log files from /mnt/server/master_archivedir to the slaves, and then restart the slaves in recovery mode?

thanks

-------- Original Message --------
Subject: Re: [BUGS] Having trouble configuring a Master with multiple standby Servers in PostgreSQL 9.3.3
From: Michael Paquier <michael.paquier@gmail.com>
Date: Wed, April 16, 2014 6:07 pm
To: fburgess@radiantblue.com
Cc: pgsql-bugs@postgresql.org
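The background-and-wait structure of the script above can be tried standalone; this sketch substitutes cp for rsync and /tmp directories for the NFS mounts so it runs without the real environment:

```shell
#!/bin/sh
A1=/tmp/demo_slave1_archivedir
A2=/tmp/demo_slave2_archivedir
mkdir -p "$A1" "$A2"
SRC=/tmp/demo_wal_segment
echo "wal payload" > "$SRC"

FAIL=0
# Start both copies in parallel, keeping each PID so their exit codes
# can be collected individually with wait.
cp "$SRC" "$A1/000000010000000000000004" & pid_1=$!
cp "$SRC" "$A2/000000010000000000000004" & pid_2=$!
wait $pid_1 || FAIL=$((FAIL+1))
wait $pid_2 || FAIL=$((FAIL+1))
echo "failures: $FAIL"
```

Note that the rsyncs must be backgrounded with a bare `&`, not wrapped in backticks: backticks run the command as a substitution and background only the expansion of its output.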
On Fri, Apr 18, 2014 at 1:19 AM, <fburgess@radiantblue.com> wrote:
> Hi Michael, thanks for your reply.
> I discussed this with my colleague, and we decided to change the
> archive_command to execute a shell script.
That's wiser as it allows more flexibility.
> This will copy the archivelogs from the master to both slaves. Will that
> avoid the issue with removing needed WAL files?
>
> slave #1
> archive_cleanup_command = 'pg_archivecleanup /mnt/server/slave1_archivedir/ %r'
>
> slave #2
> archive_cleanup_command = 'pg_archivecleanup /mnt/server/slave2_archivedir/ %r'
>
> Does this look correct?
Looks fine. You are copying each WAL file to a different archive
folder, and pg_archivecleanup will clean only the path it is given on
each slave, so there is no risk of a WAL file being removed by one
slave while still needed by the other.
> I did a pg_ctl reload to change the archivelog destination from
> /mnt/server/master_archivedir to be redistributed to slave1 and slave2.
> Do I need to redo this backup step?
Not if the slaves have already fetched necessary WAL files from the
single master archive before you changed the command.
> psql -c "select pg_start_backup('initial_backup');"
> rsync -cvar --inplace --exclude=*pg_xlog*
> /u01/fiber/postgreSQL_data/ postgres@1.2.3.5:/u01/fiber/postgreSQL_data/
> psql -c "select pg_stop_backup();"
>
> or can I just copy all of the missing archivelog files from the
> /mnt/server/master_archivedir to the slaves, and then restart the slaves in
> recovery mode?
Taking a new base backup will be fine, but you actually do not need to
do so if your slaves have already caught up enough. Your slaves are
using streaming replication and are on the same server as the master
AFAIU, so they should be fine; there is always a possibility, though,
that they need some WAL from the archives, for example if one of them
was not able to connect to the master for a long time and the master
has already dropped the necessary WAL files from its pg_xlog.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
I started the recovery yesterday; it ran overnight and is still running. Is the problem that the master is still producing new archive logs that the slave is trying to recover, so that I am currently in a perpetual recovery mode? I can see that the most recent archive log being processed on the master is also being recovered on the slave. Do I need to suspend copying the archive logs to the /mnt/server/slave1_archivedir/ directory, or should I wait?

thanks

Freddie

-------- Original Message --------
Subject: Re: [BUGS] Having trouble configuring a Master with multiple standby Servers in PostgreSQL 9.3.3
From: Michael Paquier <michael.paquier@gmail.com>
Date: Thu, April 17, 2014 5:28 pm
To: fburgess@radiantblue.com
Cc: pgsql-bugs@postgresql.org
Sorry folks, we fixed the problem. It turned out that in the recovery.conf file I had primary_conninfo "... sslmode=require". This caused the error "could not connect to the primary server: sslmode value "require" invalid when SSL support is not compiled", so we just removed that option, bounced the slave, and everything is working now.

thanks

-------- Original Message --------
Subject: Re: [BUGS] Having trouble configuring a Master with multiple standby Servers in PostgreSQL 9.3.3
From: fburgess@radiantblue.com
Date: Fri, April 18, 2014 8:24 am
To: "Michael Paquier" <michael.paquier@gmail.com>
Cc: pgsql-bugs@postgresql.org
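For the record, combining Michael's simplified restore_command with the sslmode fix described in this thread, a slave1 recovery.conf along these lines should work on a build without SSL support (the host and password remain placeholders, and slave2 is analogous with its own archive directory):

```
standby_mode = 'on'
primary_conninfo = 'host=<master database ip address> port=5432 dbname=tumsdb user=replication password=<password> application_name=slave1'
restore_command = 'cp -i /mnt/server/slave1_archivedir/%f %p'
archive_cleanup_command = 'pg_archivecleanup /mnt/server/slave1_archivedir/ %r'
trigger_file = '/opt/PostgreSQL/9.3/data/pgsql.trigger.file'
```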