Interrupted streaming replication

Started by Tony Nelsonover 10 years ago8 messagesgeneral
Jump to latest
#1Tony Nelson
tnelson@starpoint.com

Hi all,

I have a small problem of my own making. I have a simple streaming replication setup, one master and two slaves. One slave is in the same rack as the master, the other is across a "slow" link in another server room.

Last night I executed some updates that caused a lot of changes. The slave in the same room did worked just fine, and the slave across the slow link was working fine, applying logs.

Then my backups executed, and moved all of the WAL archives, to an archive directory.

The good news is, I *think* I have them all.

The slave is waiting:

postgres 12562 12561 0 Oct16 ? 00:11:07 postgres: startup process waiting for 00000001000000890000002C

And I definitely have the file in my archive directory on the master.

The master is logging this error:

2015-12-07 08:24:50 EST FATAL: requested WAL segment 00000001000000890000002C has already been removed

Can I simply copy the file from my archive directory back to the WAL directory?

Thanks in advance
Tony Nelson

________________________________
Since 1982, Starpoint Solutions has been a trusted source of human capital and solutions. We are committed to our clients, employees, environment, community and social concerns. We foster an inclusive culture based on trust, respect, honesty and solid performance. Learn more about Starpoint and our social responsibility at http://www.starpoint.com/social_responsibility

________________________________
This email message from Starpoint Solutions LLC is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Opinions, conclusions and other information in this message that do not relate to the official business of Starpoint Solutions shall be understood as neither given nor endorsed by it.

#2Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Tony Nelson (#1)
Re: Interrupted streaming replication

Tony Nelson <tnelson@starpoint.com> wrote:

2015-12-07 08:24:50 EST FATAL: requested WAL segment 00000001000000890000002C
has already been removed

Can I simply copy the file from my archive directory back to the WAL directory?

I'm afraid that won't work, because of the master knows nothing about
this copy and can't stream its content.

Maybe you can create/update your recovery.conf, see restore_command.

for the future: increase wal_keep_segments, or better: use replication
slots (since 9.4 a really good feature to prevent such errors).

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082�, E 13.56889�

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Tony Nelson
tnelson@starpoint.com
In reply to: Andreas Kretschmer (#2)
Re: Interrupted streaming replication

-----Original Message-----
Tony Nelson <tnelson@starpoint.com> wrote:

2015-12-07 08:24:50 EST FATAL: requested WAL segment
00000001000000890000002C has already been removed

Can I simply copy the file from my archive directory back to the WAL

directory?

I'm afraid that won't work, because of the master knows nothing about this
copy and can't stream its content.

Maybe you can create/update your recovery.conf, see restore_command.

for the future: increase wal_keep_segments, or better: use replication slots
(since 9.4 a really good feature to prevent such errors).

Thank you. I'm still running 9.1, but I'll look into 9.4. Fortunately my database is small so doing a new base backup is not the end of the world. It's been a long time since I've set this up, so I'm going to have to go through a refresher course on exactly what I need to do.

Thanks for your help.

Tony

Since 1982, Starpoint Solutions has been a trusted source of human capital and solutions. We are committed to our clients, employees, environment, community and social concerns. We foster an inclusive culture based on trust, respect, honesty and solid performance. Learn more about Starpoint and our social responsibility at http://www.starpoint.com/social_responsibility

This email message from Starpoint Solutions LLC is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Opinions, conclusions and other information in this message that do not relate to the official business of Starpoint Solutions shall be understood as neither given nor endorsed by it.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Melvin Davidson
melvin6925@gmail.com
In reply to: Tony Nelson (#3)
Re: Interrupted streaming replication

As long as you have 00000001000000890000002C and the subsequent WALs in
your archive directory, then you should
be able to simply scp them to you second slave's pg_xlog directory.

On Mon, Dec 7, 2015 at 8:50 AM, Tony Nelson <tnelson@starpoint.com> wrote:

-----Original Message-----
Tony Nelson <tnelson@starpoint.com> wrote:

2015-12-07 08:24:50 EST FATAL: requested WAL segment
00000001000000890000002C has already been removed

Can I simply copy the file from my archive directory back to the WAL

directory?

I'm afraid that won't work, because of the master knows nothing about

this

copy and can't stream its content.

Maybe you can create/update your recovery.conf, see restore_command.

for the future: increase wal_keep_segments, or better: use replication

slots

(since 9.4 a really good feature to prevent such errors).

Thank you. I'm still running 9.1, but I'll look into 9.4. Fortunately my
database is small so doing a new base backup is not the end of the world.
It's been a long time since I've set this up, so I'm going to have to go
through a refresher course on exactly what I need to do.

Thanks for your help.

Tony

Since 1982, Starpoint Solutions has been a trusted source of human capital
and solutions. We are committed to our clients, employees, environment,
community and social concerns. We foster an inclusive culture based on
trust, respect, honesty and solid performance. Learn more about Starpoint
and our social responsibility at
http://www.starpoint.com/social_responsibility

This email message from Starpoint Solutions LLC is for the sole use of
the intended recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or distribution is
prohibited. If you are not the intended recipient, please contact the
sender by reply email and destroy all copies of the original message.
Opinions, conclusions and other information in this message that do not
relate to the official business of Starpoint Solutions shall be understood
as neither given nor endorsed by it.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tony Nelson (#3)
Re: Interrupted streaming replication

On 12/07/2015 05:50 AM, Tony Nelson wrote:

-----Original Message-----
Tony Nelson <tnelson@starpoint.com> wrote:

2015-12-07 08:24:50 EST FATAL: requested WAL segment
00000001000000890000002C has already been removed

Can I simply copy the file from my archive directory back to the WAL

directory?

I'm afraid that won't work, because of the master knows nothing about this
copy and can't stream its content.

Maybe you can create/update your recovery.conf, see restore_command.

for the future: increase wal_keep_segments, or better: use replication slots
(since 9.4 a really good feature to prevent such errors).

Thank you. I'm still running 9.1, but I'll look into 9.4. Fortunately my database is small so doing a new base backup is not the end of the world. It's been a long time since I've set this up, so I'm going to have to go through a refresher course on exactly what I need to do.

I have not done it, but is seems to me given that you have an WAL
archive you could enable archive recovery on the stalled slave and
restart it:

http://www.postgresql.org/docs/9.1/interactive/archive-recovery-settings.html

Thanks for your help.

Tony

Since 1982, Starpoint Solutions has been a trusted source of human capital and solutions. We are committed to our clients, employees, environment, community and social concerns. We foster an inclusive culture based on trust, respect, honesty and solid performance. Learn more about Starpoint and our social responsibility at http://www.starpoint.com/social_responsibility

This email message from Starpoint Solutions LLC is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Opinions, conclusions and other information in this message that do not relate to the official business of Starpoint Solutions shall be understood as neither given nor endorsed by it.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Tony Nelson
tnelson@starpoint.com
In reply to: Melvin Davidson (#4)
Re: Interrupted streaming replication

Thank you very much, this is working perfectly. I copied a couple over and the server applied them, and is waiting for the next set.

Tony

From: Melvin Davidson [mailto:melvin6925@gmail.com]
Sent: Monday, December 07, 2015 8:59 AM
To: Tony Nelson
Cc: Andreas Kretschmer; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Interrupted streaming replication

As long as you have 00000001000000890000002C and the subsequent WALs in your archive directory, then you should
be able to simply scp them to you second slave's pg_xlog directory.

On Mon, Dec 7, 2015 at 8:50 AM, Tony Nelson <tnelson@starpoint.com<mailto:tnelson@starpoint.com>> wrote:

-----Original Message-----
Tony Nelson <tnelson@starpoint.com<mailto:tnelson@starpoint.com>> wrote:

2015-12-07 08:24:50 EST FATAL: requested WAL segment
00000001000000890000002C has already been removed

Can I simply copy the file from my archive directory back to the WAL

directory?

I'm afraid that won't work, because of the master knows nothing about this
copy and can't stream its content.

Maybe you can create/update your recovery.conf, see restore_command.

for the future: increase wal_keep_segments, or better: use replication slots
(since 9.4 a really good feature to prevent such errors).

Thank you. I'm still running 9.1, but I'll look into 9.4. Fortunately my database is small so doing a new base backup is not the end of the world. It's been a long time since I've set this up, so I'm going to have to go through a refresher course on exactly what I need to do.

Thanks for your help.

Tony

Since 1982, Starpoint Solutions has been a trusted source of human capital and solutions. We are committed to our clients, employees, environment, community and social concerns. We foster an inclusive culture based on trust, respect, honesty and solid performance. Learn more about Starpoint and our social responsibility at http://www.starpoint.com/social_responsibility

This email message from Starpoint Solutions LLC is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Opinions, conclusions and other information in this message that do not relate to the official business of Starpoint Solutions shall be understood as neither given nor endorsed by it.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you. [cid:~WRD000.jpg]

________________________________
Since 1982, Starpoint Solutions has been a trusted source of human capital and solutions. We are committed to our clients, employees, environment, community and social concerns. We foster an inclusive culture based on trust, respect, honesty and solid performance. Learn more about Starpoint and our social responsibility at http://www.starpoint.com/social_responsibility

________________________________
This email message from Starpoint Solutions LLC is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Opinions, conclusions and other information in this message that do not relate to the official business of Starpoint Solutions shall be understood as neither given nor endorsed by it.

Attachments:

~WRD000.jpgimage/jpeg; name=~WRD000.jpgDownload
#7Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Melvin Davidson (#4)
Re: Interrupted streaming replication

Melvin Davidson <melvin6925@gmail.com> wrote:

As long as you have 00000001000000890000002C and the subsequent WALs in your
archive directory, then you should
be able to simply scp them to you second slave's pg_xlog directory.

Nice idea ;-)

wasn't sure if that works, but yes, nice.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082�, E 13.56889�

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tony Nelson (#6)
Re: Interrupted streaming replication

On 12/07/2015 06:04 AM, Tony Nelson wrote:

Thank you very much, this is working perfectly. I copied a couple over
and the server applied them, and is waiting for the next set.

To prevent this sort of thing in the future I would still take a look at
the restore_command, for the following reason:

http://www.postgresql.org/docs/9.1/interactive/warm-standby.html#STREAMING-REPLICATION

"If you use streaming replication without file-based continuous
archiving, you have to set wal_keep_segments in the master to a value
high enough to ensure that old WAL segments are not recycled too early,
while the standby might still need them to catch up. If the standby
falls behind too much, it needs to be reinitialized from a new base
backup. If you set up a WAL archive that's accessible from the standby,
wal_keep_segments is not required as the standby can always use the
archive to catch up."

A belt and suspenders approach where streaming is the default and WAL
archive recovery is the backup.

Tony

*From:*Melvin Davidson [mailto:melvin6925@gmail.com]
*Sent:* Monday, December 07, 2015 8:59 AM
*To:* Tony Nelson
*Cc:* Andreas Kretschmer; pgsql-general@postgresql.org
*Subject:* Re: [GENERAL] Interrupted streaming replication

As long as you have 00000001000000890000002C and the subsequent WALs in
your archive directory, then you should
be able to simply scp them to you second slave's pg_xlog directory.

On Mon, Dec 7, 2015 at 8:50 AM, Tony Nelson <tnelson@starpoint.com
<mailto:tnelson@starpoint.com>> wrote:

-----Original Message-----
Tony Nelson <tnelson@starpoint.com <mailto:tnelson@starpoint.com>> wrote:

2015-12-07 08:24:50 EST FATAL: requested WAL segment
00000001000000890000002C has already been removed

Can I simply copy the file from my archive directory back to the WAL

directory?

I'm afraid that won't work, because of the master knows nothing about

this

copy and can't stream its content.

Maybe you can create/update your recovery.conf, see restore_command.

for the future: increase wal_keep_segments, or better: use

replication slots

(since 9.4 a really good feature to prevent such errors).

Thank you. I'm still running 9.1, but I'll look into 9.4. Fortunately
my database is small so doing a new base backup is not the end of the
world. It's been a long time since I've set this up, so I'm going to
have to go through a refresher course on exactly what I need to do.

Thanks for your help.

Tony

Since 1982, Starpoint Solutions has been a trusted source of human
capital and solutions. We are committed to our clients, employees,
environment, community and social concerns. We foster an inclusive
culture based on trust, respect, honesty and solid performance. Learn
more about Starpoint and our social responsibility at
http://www.starpoint.com/social_responsibility

This email message from Starpoint Solutions LLC is for the sole use of
the intended recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact
the sender by reply email and destroy all copies of the original
message. Opinions, conclusions and other information in this message
that do not relate to the official business of Starpoint Solutions shall
be understood as neither given nor endorsed by it.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--

*Melvin Davidson*
*I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you. Image removed by sender.*

------------------------------------------------------------------------
Since 1982, Starpoint Solutions has been a trusted source of human
capital and solutions. We are committed to our clients, employees,
environment, community and social concerns. We foster an inclusive
culture based on trust, respect, honesty and solid performance. Learn
more about Starpoint and our social responsibility at
http://www.starpoint.com/social_responsibility

------------------------------------------------------------------------
This email message from Starpoint Solutions LLC is for the sole use of
the intended recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or distribution is
prohibited. If you are not the intended recipient, please contact the
sender by reply email and destroy all copies of the original message.
Opinions, conclusions and other information in this message that do not
relate to the official business of Starpoint Solutions shall be
understood as neither given nor endorsed by it.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general