WAL Log Shipping - Warm Standby not working under 8.3.7

Started by Keaton Adamsover 16 years ago8 messagesgeneral

kadams@mxlogic.com

over 16 years ago

We had WAL Log shipping (warm standby) working fine under 8.1.4 but under 8.3.7 we can't get the slave to come up properly. Nothing has changed in our process with regard to start_backup, rsync, stop_backup, bring up the warm standby server in continuous recovery mode, but the failover DB won't start with the following error:

<2010-01-07 10:54:23 MST>LOG: received immediate shutdown request
/mxl/var/pgsql/data/ha_copy.sh: line 103: 13976 Quit sleep 5
File /mxl/var/pgsql/data/stopslave found. Aborting Process.
<2010-01-07 10:54:28 MST>LOG: could not open file "pg_xlog/00000001000000F6000000E9" (log file 246, segment 233): No such file or directory
<2010-01-07 10:54:28 MST>LOG: redo done at F6/E8FFE378
<2010-01-07 10:54:28 MST>LOG: last completed transaction was at log time 2009-12-28 10:18:04.893307-07
Waiting for log: 00000001000000F6000000E8
<2010-01-07 11:24:49 MST>FATAL: could not restore file "00000001000000F6000000E8" from archive: return code 15

The log file in reference is very old and is not on the Master PG server in pg_xlogs and the "last completed transaction" can't be right either. Is this a bug or it is something we are doing wrong?

Thanks,

Keaton

psql (PostgreSQL) 8.3.7
contains support for command-line editing
RHEL 5 64 Bit
Linux ourservername 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Keaton Adams

kadams@mxlogic.com

over 16 years ago

In reply to: Keaton Adams (#1)

Re: WAL Log Shipping - Warm Standby not working under 8.3.7

I did find some references to a fix of last-completed transaction time and I looked in the postgresql-bugs archive, but I'm not having any luck confirming that this is a problem in 8.3.7 and an upgrade to 8.3.9 would fix the issue.

postgresql 8.3.7 .... Fix incorrect logging of last-completed-transaction time during PITR ..... Last transaction end time is now logged at end of recovery and at each logged restart point (Simon) ...

On 1/7/10 12:53 PM, "Keaton Adams" <kadams@mxlogic.com> wrote:

The log file in reference is very old and is not on the Master PG server in pg_xlogs and the "last completed transaction" can't be right either. Is this a bug or it is something we are doing wrong?

Thanks,

Keaton

psql (PostgreSQL) 8.3.7
contains support for command-line editing
RHEL 5 64 Bit
Linux ourservername 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Keaton Adams

kadams@mxlogic.com

over 16 years ago

In reply to: Keaton Adams (#2)

Re: WAL Log Shipping - Warm Standby not working under 8.3.7

OK,

So what am I doing wrong here?

Installed PG 8.3.7 on Slave machine

Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.

Shut down the PG instance on the slave machine

Ran a script that does the following:

select pg_start_backup('Master_Backup');
rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
select pg_stop_backup();
ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start

When the slave PG database attempts to come up in recovery mode, it aborts because it is looking for a log file that is extremely old and does not exist on the master DB server. I believe the Master PG instance was restarted on 12/28/09 and has been running ever since. Is there a way to reset the "last completed transaction" on a DB? Why is PG looking so far back for a WAL log to begin recovery when so much has been done since the 28th including daily backups?

<2010-01-07 10:54:23 MST>LOG: received immediate shutdown request
/mxl/var/pgsql/data/ha_copy.sh: line 103: 13976 Quit sleep 5
File /mxl/var/pgsql/data/stopslave found. Aborting Process.
<2010-01-07 10:54:28 MST>LOG: could not open file "pg_xlog/00000001000000F6000000E9" (log file 246, segment 233): N
o such file or directory
<2010-01-07 10:54:28 MST>LOG: redo done at F6/E8FFE378
<2010-01-07 10:54:28 MST>LOG: last completed transaction was at log time 2009-12-28 10:18:04.893307-07
Waiting for log: 00000001000000F6000000E8
<2010-01-07 11:24:49 MST>FATAL: could not restore file "00000001000000F6000000E8" from archive: return code 15

Again, nothing was changed with the scripts or the replication process and this worked just fine under 8.1.4.

Thanks!

On 1/8/10 8:10 AM, "Keaton Adams" <kadams@mxlogic.com> wrote:

On 1/7/10 12:53 PM, "Keaton Adams" <kadams@mxlogic.com> wrote:

The log file in reference is very old and is not on the Master PG server in pg_xlogs and the "last completed transaction" can't be right either. Is this a bug or it is something we are doing wrong?

Thanks,

Keaton

psql (PostgreSQL) 8.3.7
contains support for command-line editing
RHEL 5 64 Bit
Linux ourservername 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Erik Jones

ejones@engineyard.com

about 16 years ago

In reply to: Keaton Adams (#3)

Re: WAL Log Shipping - Warm Standby not working under 8.3.7

On Jan 8, 2010, at 9:50 AM, Keaton Adams wrote:

OK,

So what am I doing wrong here?

Installed PG 8.3.7 on Slave machine

Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.

Shut down the PG instance on the slave machine

Ran a script that does the following:

select pg_start_backup('Master_Backup');
rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
select pg_stop_backup();
ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start

Is ${slave_backup_path} your archive directory? Why are you deleting all of you archives there?

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k

Keaton Adams

kadams@mxlogic.com

about 16 years ago

In reply to: Erik Jones (#4)

Re: WAL Log Shipping - Warm Standby not working under 8.3.7

To clean up from a prior run.

Erik Jones <ejones@engineyard.com> wrote:

On Jan 8, 2010, at 9:50 AM, Keaton Adams wrote:

OK,

So what am I doing wrong here?

Installed PG 8.3.7 on Slave machine

Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.

Shut down the PG instance on the slave machine

Ran a script that does the following:

select pg_start_backup('Master_Backup');
rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
select pg_stop_backup();
ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start

Is ${slave_backup_path} your archive directory? Why are you deleting all of you archives there?

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k

Import Notes

Resolved by subject fallback

Erik Jones

ejones@engineyard.com

about 16 years ago

In reply to: Erik Jones (#4)

Re: WAL Log Shipping - Warm Standby not working under 8.3.7

On Jan 8, 2010, at 4:50 PM, Erik Jones wrote:

On Jan 8, 2010, at 9:50 AM, Keaton Adams wrote:

OK,

So what am I doing wrong here?

Installed PG 8.3.7 on Slave machine

Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.

Shut down the PG instance on the slave machine

Ran a script that does the following:

select pg_start_backup('Master_Backup');
rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
select pg_stop_backup();
ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start

Is ${slave_backup_path} your archive directory? Why are you deleting all of you archives there?

Also, what are the contents of your recovery.conf file? Are you using pg_standby? The typical setup is to clear <datadir>/pg_xlog on your standby and use pg_standby to recovery files directly from your archive directory?

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k

Keaton Adams

kadams@mxlogic.com

about 16 years ago

In reply to: Erik Jones (#6)

Re: WAL Log Shipping - Warm Standby not working under 8.3.7

We are still on PG 8.3.7 and I don't see pg_standby. It looks like it was added with 8.3.8. I will put in a request to update to 8.3.9 as a part of our next software upgrade (internal product we write) but for now I need to try to figure out why this isn't working under 8.3.7. I did manage to bring up replication under 8.3.7 in the test environment just fine, so the question is now why the standby PG instance won't come up in recovery mode and why it is looking so far back for the next WAL log file.

-K

On 1/8/10 5:55 PM, "Erik Jones" <ejones@engineyard.com> wrote:

On Jan 8, 2010, at 4:50 PM, Erik Jones wrote:

On Jan 8, 2010, at 9:50 AM, Keaton Adams wrote:

OK,

So what am I doing wrong here?

Installed PG 8.3.7 on Slave machine

Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.

Shut down the PG instance on the slave machine

Ran a script that does the following:

select pg_start_backup('Master_Backup');
rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
select pg_stop_backup();
ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start

Is ${slave_backup_path} your archive directory? Why are you deleting all of you archives there?

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k

Erik Jones

ejones@engineyard.com

about 16 years ago

In reply to: Keaton Adams (#7)

Re: WAL Log Shipping - Warm Standby not working under 8.3.7

On Jan 9, 2010, at 7:24 AM, Keaton Adams wrote:

Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.

Shut down the PG instance on the slave machine

Ran a script that does the following:

select pg_start_backup('Master_Backup');
rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
select pg_stop_backup();
ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start

Is ${slave_backup_path} your archive directory? Why are you deleting all of you archives there?

Also, what are the contents of your recovery.conf file? Are you using pg_standby? The typical setup is to clear <datadir>/pg_xlog on your standby and use pg_standby to recovery files directly from your archive directory?

We are still on PG 8.3.7 and I don’t see pg_standby. It looks like it was added with 8.3.8. I will put in a request to update to 8.3.9 as a part of our next software upgrade (internal product we write) but for now I need to try to figure out why this isn’t working under 8.3.7. I did manage to bring up replication under 8.3.7 in the test environment just fine, so the question is now why the standby PG instance won’t come up in recovery mode and why it is looking so far back for the next WAL log file.

-K

Keaton,

Please, don't top post (I've fixed it in this reply), it makes it difficult for others to join the thread and know what's currently going on.

Anyways, with regards to pg_standby, it should be included in all versions >= 8.3.0. Regardless, it will work with anything version since at least 8.2 so you can just grab it from a source install and build it against your built source tree or use your package manager of choice to get it.

Regardless:

ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null

You are doing that *after* you run pg_start_backup() so you're deleting WAL files that are needed for the standby -- it needs all WAL files generated from the start_backup() call on.

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k