Should the backup history file be replicated in Streaming Replication?
Hi,
In current SR, since a backup history file is not replicated,
the standby always starts an archive recovery without a backup
history file, and a wrong minRecoveryPoint might be used. This
is not a problem for SR itself, but would cause trouble when
SR cooperates with Hot Standby.
HS begins accepting read-only queries once recovery reaches
minRecoveryPoint (i.e., once the database has become consistent). So,
a wrong minRecoveryPoint would allow read-only queries to run against
an inconsistent database. A backup history file should be replicated
at the beginning of the standby's recovery.
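To illustrate the danger, here is a minimal sketch, with simplified types and hypothetical names rather than the actual xlog.c code, of the consistency gate that HS effectively relies on; if minRecoveryPoint is wrong, the gate opens too early:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* simplified: a WAL location */

static bool reached_consistency = false;

/* Called after each record is replayed, at WAL position 'replayed_upto'. */
static void
check_consistency(XLogRecPtr replayed_upto, XLogRecPtr min_recovery_point)
{
    /* If min_recovery_point is too small, this fires too early and
     * read-only queries would see an inconsistent database. */
    if (!reached_consistency && replayed_upto >= min_recovery_point)
        reached_consistency = true;     /* HS may now accept queries */
}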
Should this problem be addressed right now? Or should I wait
until the current simple SR patch has been committed?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao wrote:
In current SR, since a backup history file is not replicated,
the standby always starts an archive recovery without a backup
history file, and a wrong minRecoveryPoint might be used. This
is not a problem for SR itself, but would cause trouble when
SR cooperates with Hot Standby.
But the backup history file is included in the base backup you start
replication from, right? After that, minRecoveryPoint is stored in the
control file and advanced as the recovery progresses.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, Nov 26, 2009 at 4:55 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Fujii Masao wrote:
In current SR, since a backup history file is not replicated,
the standby always starts an archive recovery without a backup
history file, and a wrong minRecoveryPoint might be used. This
is not a problem for SR itself, but would cause trouble when
SR cooperates with Hot Standby.
But the backup history file is included in the base backup you start
replication from, right?
No. A backup history file is created by pg_stop_backup().
So it's not included in the base backup.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao wrote:
On Thu, Nov 26, 2009 at 4:55 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Fujii Masao wrote:
In current SR, since a backup history file is not replicated,
the standby always starts an archive recovery without a backup
history file, and a wrong minRecoveryPoint might be used. This
is not a problem for SR itself, but would cause trouble when
SR cooperates with Hot Standby.
But the backup history file is included in the base backup you start
replication from, right?
No. A backup history file is created by pg_stop_backup().
So it's not included in the base backup.
Ah, I see.
Yeah, that needs to be addressed regardless of HS, because you can
otherwise start up (= fail over to) the standby too early, before the
minimum recovery point has been reached.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, Nov 26, 2009 at 5:17 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Yeah, that needs to be addressed regardless of HS, because you can
otherwise start up (= fail over to) the standby too early, before the
minimum recovery point has been reached.
Okay, I'll address that ASAP.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,
On Thu, Nov 26, 2009 at 5:20 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Nov 26, 2009 at 5:17 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Yeah, that needs to be addressed regardless of HS, because you can
otherwise start up (= fail over to) the standby too early, before the
minimum recovery point has been reached.
Okay, I'll address that ASAP.
pg_stop_backup deletes the previous backup history file from pg_xlog.
So replication of a backup history file would fail if even one new
online backup is taken after the base backup for the standby.
This is too aggressive a deletion policy for Streaming Replication, I think.
So I'd like to change pg_stop_backup so that it deletes only backup
history files that are four or more generations old (is four enough?).
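To make this concrete, a rough standalone sketch of the retention policy (illustrative only; the real change would live in xlog.c and use the server's own directory helpers):

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define KEEP_GENERATIONS 4

/* Backup history file names sort lexicographically in WAL order,
 * so a reversed strcmp yields newest-first. */
static int
cmp_desc(const void *a, const void *b)
{
    return strcmp(*(char *const *) b, *(char *const *) a);
}

int
main(void)
{
    char          *names[1024];
    int            n = 0;
    DIR           *dir = opendir("pg_xlog");
    struct dirent *de;

    if (dir == NULL)
        return 1;
    while ((de = readdir(dir)) != NULL && n < 1024)
    {
        const char *dot = strrchr(de->d_name, '.');

        if (dot && strcmp(dot, ".backup") == 0)   /* backup history file */
            names[n++] = strdup(de->d_name);
    }
    closedir(dir);

    qsort(names, n, sizeof(char *), cmp_desc);
    for (int i = KEEP_GENERATIONS; i < n; i++)    /* 5th-newest and older */
    {
        char path[1100];

        snprintf(path, sizeof(path), "pg_xlog/%s", names[i]);
        unlink(path);
    }
    return 0;
}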
Thoughts?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao wrote:
pg_stop_backup deletes the previous backup history file from pg_xlog.
So replication of a backup history file would fail if even one new
online backup is taken after the base backup for the standby.
This is too aggressive a deletion policy for Streaming Replication, I think.
So I'd like to change pg_stop_backup so that it deletes only backup
history files that are four or more generations old (is four enough?).
This is essentially the same problem we have with WAL files and
checkpoints. If the standby falls behind too much, without having an
open connection to the master all the time, the master will delete old
files that are still needed in the standby.
I don't think it's worthwhile to modify pg_stop_backup() like that. We
should address the general problem. At the moment, you're fine if you
also configure WAL archiving and log file shipping, but it would be nice
to have some simpler mechanism to avoid the problem. For example, a GUC
in the master to retain all log files (including backup history files) for X
days. Or some way to register the standby with the master so that
the master knows it's out there, and still needs the logs, even when
it's not connected.
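As a sketch of the first idea (hypothetical code, just to show the shape of it), recycling would be gated on file age:

#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* A WAL segment or backup history file may be recycled only when its
 * last modification is older than the retention window. */
static bool
can_recycle(const char *path, int retain_days)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;           /* be conservative on error */
    return st.st_mtime < time(NULL) - (time_t) retain_days * 86400;
}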
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, Dec 18, 2009 at 11:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Or some way to register the standby with the master so that
the master knows it's out there, and still needs the logs, even when
it's not connected.
That is the real answer, I think.
...Robert
On 18.12.09 17:05 , Robert Haas wrote:
On Fri, Dec 18, 2009 at 11:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Or some way to register the standby with the master so that
the master knows it's out there, and still needs the logs, even when
it's not connected.
That is the real answer, I think.
I'd prefer it if the slave could automatically fetch a new base backup if
it falls behind too far to catch up with the available logs. That way,
old logs don't start piling up on the server if a slave goes offline for
a long time.
The slave could, for example, run a configurable shell script in that
case. You could then use that to rsync the data directory
from the server (after a pg_start_backup() of course).
best regards,
Florian Pflug
On Fri, Dec 18, 2009 at 12:22 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
On 18.12.09 17:05 , Robert Haas wrote:
On Fri, Dec 18, 2009 at 11:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Or some way to register the standby with the master so that
the master knows it's out there, and still needs the logs, even when
it's not connected.
That is the real answer, I think.
I'd prefer it if the slave could automatically fetch a new base backup if it
falls behind too far to catch up with the available logs. That way, old logs
don't start piling up on the server if a slave goes offline for a long time.
The slave could, for example, run a configurable shell script in that case.
You could then use that to rsync the data directory from the
server (after a pg_start_backup() of course).
That would be nice to have too, but it's almost certainly much harder
to implement. In particular, there's no hard and fast rule for
figuring out when you've dropped so far behind that resnapping the
whole thing is faster than replaying the WAL bit by bit. And, of
course, you'll have to take the standby down if you go that route,
whereas trying to catch up the WAL lets it stay up throughout the
process.
I think (as I did/do with Hot Standby) that the most important thing
here is to get to a point where we have a reasonably good feature that
is of some use, and commit it. It will probably have some annoying
limitations; we can remove those later. I have a feeling that what we
have right now is going to be non-robust in the face of network
breaks, but that is a problem that can be fixed by a future patch.
...Robert
Robert Haas wrote:
On Fri, Dec 18, 2009 at 12:22 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
On 18.12.09 17:05 , Robert Haas wrote:
On Fri, Dec 18, 2009 at 11:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Or some way to register the standby with the master so that
the master knows it's out there, and still needs the logs, even when
it's not connected.
That is the real answer, I think.
I'd prefer it if the slave could automatically fetch a new base backup if it
falls behind too far to catch up with the available logs. That way, old logs
don't start piling up on the server if a slave goes offline for a long time.
The slave could, for example, run a configurable shell script in that case.
You could then use that to rsync the data directory from the
server (after a pg_start_backup() of course).
That would be nice to have too,
Yeah, for small databases, it's probably a better tradeoff. The problem
with keeping WAL around in the master indefinitely is that you will
eventually run out of disk space if the standby disappears for too long.
but it's almost certainly much harder
to implement. In particular, there's no hard and fast rule for
figuring out when you've dropped so far behind that resnapping the
whole thing is faster than replaying the WAL bit by bit.
I'd imagine that you take a new base backup only if you have to, i.e. the
old WAL files the slave needs have already been deleted from the master.
And, of
course, you'll have to take the standby down if you go that route,
whereas trying to catch up the WAL lets it stay up throughout the
process.
Good point.
I think (as I did/do with Hot Standby) that the most important thing
here is to get to a point where we have a reasonably good feature that
is of some use, and commit it. It will probably have some annoying
limitations; we can remove those later. I have a feeling that what we
have right now is going to be non-robust in the face of network
breaks, but that is a problem that can be fixed by a future patch.
Agreed. About a year ago, I was vocal about not relying on the file
based shipping, but I don't have a problem with relying on it as an
intermediate step, until we add the other options. It's robust as it is,
if you set up WAL archiving.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Hi,
On Dec 18, 2009, at 19:21, Heikki Linnakangas wrote:
On Fri, Dec 18, 2009 at 12:22 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
I'd prefer it if the slave could automatically fetch a new base backup if it
falls behind too far to catch up with the available logs. That way, old logs
don't start piling up on the server if a slave goes offline for a long time.
Well, I did propose considering a state machine with clear transitions for such problems a while ago, and I think my remarks still apply:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg131511.html
Sorry for the non-archives.postgresql.org link; I couldn't find the mail there.
Yeah, for small databases, it's probably a better tradeoff. The problem
with keeping WAL around in the master indefinitely is that you will
eventually run out of disk space if the standby disappears for too long.
I'd vote for having a setting on the master for how long you keep WALs. If the slave loses sync and then comes back, either you still have the required WALs and you're back to catchup, or you don't and you're back to the base/init dance.
Maybe you want to add a control on the slave to require explicit DBA action before falling back to taking a base backup from the master, though, as that could be provided from a nightly PITR backup rather than the live server.
but it's almost certainly much harder
to implement. In particular, there's no hard and fast rule for
figuring out when you've dropped so far behind that resnapping the
whole thing is faster than replaying the WAL bit by bit.
I'd imagine that you take a new base backup only if you have to, i.e. the
old WAL files the slave needs have already been deleted from the master.
Well, consider that a slave can be in one of these states: base, init, setup, catchup, sync. What you just said then reduces to deciding which transitions you can make without resorting to a base backup, and I don't see that many once the last sync point is no longer available on the master.
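In code terms, roughly (the states from above; the logic is my interpretation, not an implementation):

#include <stdbool.h>

typedef enum { STATE_BASE, STATE_INIT, STATE_SETUP,
               STATE_CATCHUP, STATE_SYNC } SlaveState;

/* Once the master no longer has the WAL back to the slave's last sync
 * point, almost every transition degenerates to a new base backup. */
static SlaveState
state_after_reconnect(bool wal_since_sync_point_available)
{
    if (!wal_since_sync_point_available)
        return STATE_BASE;      /* redo the base/init dance */
    return STATE_CATCHUP;       /* replay the retained WAL instead */
}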
I think (as I did/do with Hot Standby) that the most important thing
here is to get to a point where we have a reasonably good feature that
is of some use, and commit it. It will probably have some annoying
limitations; we can remove those later. I have a feeling that what we
have right now is going to be non-robust in the face of network
breaks, but that is a problem that can be fixed by a future patch.
Agreed. About a year ago, I was vocal about not relying on the file
based shipping, but I don't have a problem with relying on it as an
intermediate step, until we add the other options. It's robust as it is,
if you set up WAL archiving.
I think I'd like to have the feature that a slave never pretends it's in sync, or soon to be, when it clearly isn't. For the asynchronous case, we can live with it. As soon as we're talking synchronous, you really want the master to skip any not-in-sync slave at COMMIT. To be even clearer, a slave that is not in sync is NOT a slave as far as synchronous replication is concerned.
Regards,
--
dim
Dimitri Fontaine wrote:
Well, I did propose considering a state machine with clear transitions for such problems a while ago, and I think my remarks still apply:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg131511.html
Sorry for the non-archives.postgresql.org link; I couldn't find the mail there.
http://archives.postgresql.org/message-id/87fxcxnjwt.fsf%40hi-media-techno.com
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas wrote:
I think (as I did/do with Hot Standby) that the most important thing
here is to get to a point where we have a reasonably good feature that
is of some use, and commit it. It will probably have some annoying
limitations; we can remove those later. I have a feeling that what we
have right now is going to be non-robust in the face of network
breaks, but that is a problem that can be fixed by a future patch.
Improving robustness in all the situations where you'd like things to be
better for replication is a never-ending job. As I understand it, a
major issue with this patch right now is how it links to the client
libpq. That's the sort of problem that can make this uncommittable. As
long as the fundamentals are good, it's important not to get lost in
optimizing the end UI here if it's at the expense of getting something
you can deploy at all in the process. If Streaming Replication ships
with a working core but a horribly complicated setup/failover mechanism,
that's infinitely better than not shipping at all because resources were
diverted toward making things more robust or easier to set up instead.
Also, the pool of authors who can work on tweaking the smaller details
here is larger than those capable of working on the fundamental
streaming replication code.
--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com
On Sat, Dec 19, 2009 at 1:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
I don't think it's worthwhile to modify pg_stop_backup() like that. We
should address the general problem. At the moment, you're fine if you
also configure WAL archiving and log file shipping, but it would be nice
to have some simpler mechanism to avoid the problem. For example, a GUC
in the master to retain all log files (including backup history files) for X
days. Or some way to register the standby with the master so that
the master knows it's out there, and still needs the logs, even when
it's not connected.
I propose a new GUC, replication_reserved_segments (better name?), which
determines the maximum number of WAL files held for the standby.
Design:
* Only WAL files that are more than replication_reserved_segments
segments older than the current write segment can be recycled. IOW, a
standby that has fallen up to replication_reserved_segments segments
behind is treated as if it were still connected to the primary, and the
WAL files it needs are not recycled.
* Disconnect a standby that falls more than replication_reserved_segments
segments behind, in order to avoid a disk-full failure, i.e., a PANIC
error on the primary. This would only be possible in the asynchronous
replication case.
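Here is a rough sketch of the recycling rule (hypothetical names and standalone code, not the actual xlog.c implementation):

#include <stdbool.h>
#include <stdint.h>

static int replication_reserved_segments = 16;  /* the proposed GUC */

/* A segment may be recycled only if it is more than
 * replication_reserved_segments older than the current write segment. */
static bool
can_recycle_segment(uint64_t seg_no, uint64_t current_write_seg_no)
{
    /* Guard against unsigned underflow early in the WAL stream. */
    if (current_write_seg_no < (uint64_t) replication_reserved_segments)
        return false;
    return seg_no < current_write_seg_no - replication_reserved_segments;
}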
Thoughts?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, 2009-11-26 at 17:02 +0900, Fujii Masao wrote:
On Thu, Nov 26, 2009 at 4:55 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Fujii Masao wrote:
In current SR, since a backup history file is not replicated,
the standby always starts an archive recovery without a backup
history file, and a wrong minRecoveryPoint might be used. This
is not a problem for SR itself, but would cause trouble when
SR cooperates with Hot Standby.
But the backup history file is included in the base backup you start
replication from, right?
No. A backup history file is created by pg_stop_backup().
So it's not included in the base backup.
The backup history file is a slightly quirky way of doing things and
was designed when the transfer mechanism was file-based.
Why don't we just write a new xlog record that contains the information
we need? Copy the contents of the backup history file into the WAL
record, just like we do with prepared transactions. That way it will be
streamed to the standby without any other code being needed for SR,
and we won't need to retest warm standby to check that it still works.
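As a sketch, the record's payload could simply mirror the history file's fields (hypothetical names and layout; no such record exists today):

#include "postgres.h"
#include "access/xlogdefs.h"    /* XLogRecPtr */
#include "utils/timestamp.h"    /* TimestampTz */

typedef struct xl_backup_history
{
    XLogRecPtr  startpoint;     /* START WAL LOCATION */
    XLogRecPtr  stoppoint;      /* STOP WAL LOCATION */
    XLogRecPtr  checkpoint_loc; /* CHECKPOINT LOCATION */
    TimestampTz start_time;     /* START TIME */
    TimestampTz stop_time;      /* STOP TIME */
    char        backup_label[MAXPGPATH];   /* LABEL from pg_start_backup */
} xl_backup_history;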
(The thread diverges onto a second point and this first point seems to
have been a little forgotten)
--
Simon Riggs www.2ndQuadrant.com
On Tue, 2009-12-22 at 14:40 +0900, Fujii Masao wrote:
On Sat, Dec 19, 2009 at 1:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
I don't think it's worthwhile to modify pg_stop_backup() like that. We
should address the general problem. At the moment, you're fine if you
also configure WAL archiving and log file shipping, but it would be nice
to have some simpler mechanism to avoid the problem. For example, a GUC
in the master to retain all log files (including backup history files) for X
days. Or some way to register the standby with the master so that
the master knows it's out there, and still needs the logs, even when
it's not connected.
I propose a new GUC, replication_reserved_segments (better name?), which
determines the maximum number of WAL files held for the standby.
Design:
* Only WAL files that are more than replication_reserved_segments
segments older than the current write segment can be recycled. IOW, a
standby that has fallen up to replication_reserved_segments segments
behind is treated as if it were still connected to the primary, and the
WAL files it needs are not recycled.
(I don't fully understand your words above, sorry.)
Possibly an easier way would be to have a size limit, not a number of
segments. Something like max_reserved_wal = 240GB. We made that change
to shared_buffers a few years back and it was really useful.
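With the default 16 MB segment size the conversion is trivial; a toy illustration:

#include <stdint.h>
#include <stdio.h>

#define XLOG_SEG_SIZE (16 * 1024 * 1024)    /* 16 MB default segment */

int
main(void)
{
    uint64_t max_reserved_wal = 240ULL * 1024 * 1024 * 1024;    /* 240GB */

    /* 240GB / 16MB = 15360 reserved segments */
    printf("%llu segments\n",
           (unsigned long long) (max_reserved_wal / XLOG_SEG_SIZE));
    return 0;
}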
The problem I foresee is that doing it this way puts an upper limit on
the size of database we can replicate. While we do base backup and
transfer it we must store WAL somewhere. Forcing us to store the WAL on
the master while this happens could be very limiting.
* Disconnect a standby that falls more than replication_reserved_segments
segments behind, in order to avoid a disk-full failure, i.e., a PANIC
error on the primary. This would only be possible in the asynchronous
replication case.
Or at the start of replication.
--
Simon Riggs www.2ndQuadrant.com
On Wed, Dec 23, 2009 at 2:42 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
The backup history file is a slightly quirky way of doing things and
was designed when the transfer mechanism was file-based.
Why don't we just write a new xlog record that contains the information
we need? Copy the contents of the backup history file into the WAL
record, just like we do with prepared transactions. That way it will be
streamed to the standby without any other code being needed for SR,
and we won't need to retest warm standby to check that it still works.
This means that we can replace a backup history file with the corresponding
xlog record. I think that we should simplify the code by making the
replacement complete rather than just adding a new xlog record. Thoughts?
BTW, the current SR code already implements the capability to replicate a
backup history file. But if there is a better and simpler idea, I'll adopt it.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, 2009-12-23 at 03:28 +0900, Fujii Masao wrote:
On Wed, Dec 23, 2009 at 2:42 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
The backup history file is a slightly quirky way of doing things and
was designed when the transfer mechanism was file-based.
Why don't we just write a new xlog record that contains the information
we need? Copy the contents of the backup history file into the WAL
record, just like we do with prepared transactions. That way it will be
streamed to the standby without any other code being needed for SR,
and we won't need to retest warm standby to check that it still works.
This means that we can replace a backup history file with the corresponding
xlog record. I think that we should simplify the code by making the
replacement complete rather than just adding a new xlog record. Thoughts?
We can't do that because it would stop file-based archiving from
working. I don't think we should deprecate that for another release at
least.
--
Simon Riggs www.2ndQuadrant.com
On Wed, Dec 23, 2009 at 2:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
(I don't fully understand your words above, sorry.)
NM ;-)
Possibly an easier way would be to have a size limit, not a number of
segments. Something like max_reserved_wal = 240GB. We made that change
to shared_buffers a few years back and it was really useful.
For me, a number of segments is more intuitive because checkpoint_segments,
which is closely connected with the new parameter, is also expressed as a
number of segments. But if more people prefer a size limit, I'll make that
change.
The problem I foresee is that doing it this way puts an upper limit on
the size of database we can replicate. While we do base backup and
transfer it we must store WAL somewhere. Forcing us to store the WAL on
the master while this happens could be very limiting.
Yes. If the size of pg_xlog is small relative to the size of the database,
the primary would not be able to hold all the WAL files required for
the standby, so we would need to use restore_command to
retrieve the WAL files from the master's archive area. I'm OK with
requiring that extra step in that special case, for now.
* Disconnect a standby that falls more than replication_reserved_segments
segments behind, in order to avoid a disk-full failure, i.e., a PANIC
error on the primary. This would only be possible in the asynchronous
replication case.
Or at the start of replication.
Yes. I think that avoiding a PANIC error on the primary should take
priority over a smooth start of replication.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center