[BUG] pg_basebackup from disconnected standby fails

Started by Kyotaro Horiguchialmost 10 years ago55 messageshackers
Jump to latest
#1Kyotaro Horiguchi
horikyota.ntt@gmail.com

Hello, I found that pg_basebackup from a replication standby
fails after the following steps, on 9.3 and the master.

- start a replication master
- start a replication standby
- stop the master in the mode other than immediate.

pg_basebackup to the standby will fail with the following error.

pg_basebackup: could not get transaction log end position from
server: ERROR: could not find any WAL files

The immediate cause is that do_pg_stop_backup returns an ealier
LSN to do_pg_start_backup. The backup start point is the redo
point of the last executed restart point. And the backup end
point is the minRecoveryPoint at the call time.

A restart point doesn't update the minRecoveryPoint when it is
actually executed. Even though, ControlFile->checkPointCopy is
updated to the redo point of the restart point just made. The
minRecoveryPoint is too small as the backup end point on this
situation. Thit is, end point can go behind the start point.

This can be caused by the simple steps above but it also can be
occur when pg_basebackup is connected after master's
disconnection during a restart point. (With some other
timing-dependet condition)

So, the following comment in do_pg_stop_backup says as the
following seems somewhat wrong.

* We return the current minimum recovery point as the backup end
* location. Note that it can be greater than the exact backup end
* location if the minimum recovery point is updated after the backup of
* pg_control. This is harmless for current uses.

After looking more closely, I found that the minRecoveryPoint
tends to be too small as the backup end point, and up to the
record at the lastReplayedRecPtr can affect the pages on disk and
they can go into the backup just taken.

My conclusion here is that do_pg_stop_backup should return
lastReplayedRecPtr, not minRecoveryPoint.

The attached small patch does this on the master. The first
problem is fixed by this for me.

Any thoughts?

# Sorry, but I'll be offline 'til Monday.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Make-pg_stop_backup-on-standby-give-proper-end-LSN.patchtext/x-patch; charset=us-asciiDownload+4-9
#2Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#1)
Re: [BUG] pg_basebackup from disconnected standby fails

On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello, I found that pg_basebackup from a replication standby
fails after the following steps, on 9.3 and the master.

- start a replication master
- start a replication standby
- stop the master in the mode other than immediate.

pg_basebackup to the standby will fail with the following error.

pg_basebackup: could not get transaction log end position from server:
ERROR: could not find any WAL files

Indeed, and you could just do the following to reproduce the failure
with the recovery test suite, so I would suggest adding this test in
the patch:
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -24,6 +24,11 @@ $node_standby_1->start;
 # pg_basebackup works on a standby).
 $node_standby_1->backup($backup_name);
+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;
+

After looking more closely, I found that the minRecoveryPoint
tends to be too small as the backup end point, and up to the
record at the lastReplayedRecPtr can affect the pages on disk and
they can go into the backup just taken.

My conclusion here is that do_pg_stop_backup should return
lastReplayedRecPtr, not minRecoveryPoint.

I have been thinking quite a bit about this patch, and this logic
sounds quite right to me. When stopping the backup we need to let the
user know up to which point it needs to replay WAL, and relation pages
are touched up to lastReplayedEndRecPtr. This LSN could be greater
than the minimum recovery point as there is no record to mark the end
of the backup, and pg_control has normally, well surely, being backup
up last but that's not a new problem it would as well have been backup
up before the minimum recovery point has been reached...

Still, perhaps we could improve the documentation regarding that? Say
it is recommended to enforce the minimum recovery point in pg_control
to the stop backup LSN to ensure that the backup recovers to a
consistent state when taking a backup from a standby.

I am attaching an updated patch with the test btw.
--
Michael

Attachments:

backup-standby-v2.patchbinary/octet-stream; name=backup-standby-v2.patchDownload+9-8
#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#2)
Re: [BUG] pg_basebackup from disconnected standby fails

Hello, thank you for looking this.

At Fri, 10 Jun 2016 17:39:59 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com>

On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello, I found that pg_basebackup from a replication standby
fails after the following steps, on 9.3 and the master.

- start a replication master
- start a replication standby
- stop the master in the mode other than immediate.

pg_basebackup to the standby will fail with the following error.

pg_basebackup: could not get transaction log end position from server:
ERROR: could not find any WAL files

Indeed, and you could just do the following to reproduce the failure
with the recovery test suite, so I would suggest adding this test in
the patch:
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -24,6 +24,11 @@ $node_standby_1->start;
# pg_basebackup works on a standby).
$node_standby_1->backup($backup_name);
+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;

I'm not sure that adding the test case for a particular bug like
this is appropriate. But it would be acceptable because it
doesn't take long time and it is separate from standard checks.

After looking more closely, I found that the minRecoveryPoint
tends to be too small as the backup end point, and up to the
record at the lastReplayedRecPtr can affect the pages on disk and
they can go into the backup just taken.

My conclusion here is that do_pg_stop_backup should return
lastReplayedRecPtr, not minRecoveryPoint.

I have been thinking quite a bit about this patch, and this logic
sounds quite right to me. When stopping the backup we need to let the
user know up to which point it needs to replay WAL, and relation pages
are touched up to lastReplayedEndRecPtr.

Yes, but by design, the changes in buffers don't go into disk
until buffer replacing occurs, which updates minRecoveryPoint. My
understanding is that the problem is that a restart point that is
not accompanied with buffer updates advances only the redo point
of the last checkpoint and doesn't update minRecoveryPoint, which
may be behind the redo point at the time.

It seems to me that we could more agressively advance the
minRecoveryPoint (but must not let it go too far..), but it is
right for it to aim a bit smaller than the ideal location.

So I proposed the patch as a solution instead of changing
minRecoveryPoint's movement. As the result, the explanation for
it is accurate enough, but obscuring the root cause.

This LSN could be greater
than the minimum recovery point as there is no record to mark the end
of the backup, and pg_control has normally, well surely, being backup
up last but that's not a new problem it would as well have been backup
up before the minimum recovery point has been reached...

Yes. it would cause no new problem except that the amount of WAL
files to be copied could be larger than ideal amount.

Still, perhaps we could improve the documentation regarding that? Say
it is recommended to enforce the minimum recovery point in pg_control
to the stop backup LSN to ensure that the backup recovers to a
consistent state when taking a backup from a standby.

If I understand that correctly, I don't find a explicit mention
to minimum recovery point (and should not be, I suppose) in PITR
and pg_baseback pages in the documentaiton.

Up to where we need to reach a consistent state from a backup is
not the "Minimum recovery ending point" in pg_control. It holds
the consistency point in the previous recovery. What we need to
reach there for a backup is the WAL files up to the LSN returned
by the (do_)pg_stop_backup(). All of the files should have been
archived on master and should have been automatically transferred
by pg_basebackup with -X/x option. (I don't know what to do when
-X/x is not specified for pg_basebackup to standby, though..)

I am attaching an updated patch with the test btw.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#3)
Re: [BUG] pg_basebackup from disconnected standby fails

On Tue, Jun 14, 2016 at 8:31 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;

I'm not sure that adding the test case for a particular bug like
this is appropriate. But it would be acceptable because it
doesn't take long time and it is separate from standard checks.

We already take a backup from a standby when master is connected, it
should not cost much in terms of time.

It seems to me that we could more agressively advance the
minRecoveryPoint (but must not let it go too far..), but it is
right for it to aim a bit smaller than the ideal location.

It may be risky to propose such a change for a backpatch. Anyway, in
any case there is no guarantee that when using the low-level backup
routines pg_start/stop_backup with a custom backup method the minimum
recovery point will be correct.. pg_basebackup does that a bit more
carefully if I recall correctly (too lazy to look at the code now :)).
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#3)
Re: [BUG] pg_basebackup from disconnected standby fails

Sorry, I'm confused about the minRecoveryPoint.

Reconsidered a bit.

At Tue, 14 Jun 2016 20:31:11 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20160614.203111.229211034.horiguchi.kyotaro@lab.ntt.co.jp>

After looking more closely, I found that the minRecoveryPoint
tends to be too small as the backup end point, and up to the
record at the lastReplayedRecPtr can affect the pages on disk and
they can go into the backup just taken.

My conclusion here is that do_pg_stop_backup should return
lastReplayedRecPtr, not minRecoveryPoint.

I have been thinking quite a bit about this patch, and this logic
sounds quite right to me. When stopping the backup we need to let the
user know up to which point it needs to replay WAL, and relation pages
are touched up to lastReplayedEndRecPtr.

Yes, but by design, the changes in buffers don't go into disk
until buffer replacing occurs, which updates minRecoveryPoint. My
understanding is that the problem is that a restart point that is
not accompanied with buffer updates advances only the redo point
of the last checkpoint and doesn't update minRecoveryPoint, which
may be behind the redo point at the time.

It seems to me that we could more agressively advance the
minRecoveryPoint (but must not let it go too far..), but it is
right for it to aim a bit smaller than the ideal location.

It's wrong. minRecoveryPoint should be greater than or equal to
the maximum buffer-touching LSN reached in previous recoveries,
and it can be the same to replayEndRecPtr but may be behind it if
no acutual modification on page files is done hereafter. xlog.c
works that way. The value of the minRecoveryPoint smaller than
the redo point of the last checkpoint with no buffer flush is
allowable from this point of view but it is not proper for the
end point of a backup.

If we skip recording the last checkpoint position when it
eventually causes no buffer flush, minRecoveryPoint is again
usable for the purpose. However, it causes repeated restartpoint
trial on the same (skipped) checkpoint record.

As the consequence, we can solve this problemn also by explicitly
updating the minRecoveryPoint for an executed restartpoint
without no buffer flush.

The attached patch performs this way and also solves the problem.

Which one do you think is more preferable? Or any other solution?

This patch updates minRecoeryPoint only for restartpoints that
caused no buffer flush but such restriction might not be
necessary.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Advancing-minRecoveryPoint-for-executed-empty-restar.patchtext/x-patch; charset=us-asciiDownload+14-1
#6Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#4)
Re: [BUG] pg_basebackup from disconnected standby fails

Hello,

At Tue, 14 Jun 2016 21:24:58 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqQkJv-3Cbi=yDytoC9eWPfmjrj7-DLOn9C4YsB1twAKiw@mail.gmail.com>

On Tue, Jun 14, 2016 at 8:31 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;

I'm not sure that adding the test case for a particular bug like
this is appropriate. But it would be acceptable because it
doesn't take long time and it is separate from standard checks.

We already take a backup from a standby when master is connected, it
should not cost much in terms of time.

Agreed.

It seems to me that we could more agressively advance the
minRecoveryPoint (but must not let it go too far..), but it is
right for it to aim a bit smaller than the ideal location.

It may be risky to propose such a change for a backpatch. Anyway, in
any case there is no guarantee that when using the low-level backup
routines pg_start/stop_backup with a custom backup method the minimum
recovery point will be correct.. pg_basebackup does that a bit more
carefully if I recall correctly (too lazy to look at the code now :)).

Yes, as written in another mail, minRecoveryPoint seems to be
correctly upadtaed as its definition. The problem is that only
redo point of the last restartpoint is updated, even if no update
happened. But just inhibiting the update of redo point for the
case in turn causes an annoying repetition of restartpoints.

Addition to the TAP test above, the following SQL commands on the
master causes the same problem without disconnection.

=# checkpoint;
=# select pg_switch_xlog();

This is somewhat artifical but the same situation could be made
also in the nature. The exact condition for this is replaying a
checkpoint record having no buffer modification since the
preceding checkpoint in the previous WAL segments.

Returning to the discussion on which way to choose, we agreed
that using replayEndRecPtr instead of minRecoveryPoint in
do_pg_stop_backup(), as the patch attached to the following
message.

/messages/by-id/CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#6)
Re: [BUG] pg_basebackup from disconnected standby fails

On Wed, Jun 15, 2016 at 4:52 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

Sorry for the late reply, Horiguchi-san. I have finally been able to
put some mind power into that.

This is somewhat artificial but the same situation could be made
also in the nature. The exact condition for this is replaying a
checkpoint record having no buffer modification since the
preceding checkpoint in the previous WAL segments.

After thinking about that more, I am seeing your point.
CreateRestartPoint is clearly missing the shot for the update of
minRecoveryPoint even when a restart point can be created. Since
cdd46c7, we assume that a node in recovery will not update it, and
rely on the fact that it will be updated if some buffers are flushed,
but that assumption got broken since the introduction of the
possibility to take base backups from standbys, as in your case there
is no activity on the standby that would allow minRecoveryPoint to be
updated and make the backup finish in a clean state. By the way,
relying on CheckpointStats to decide if it should be updated or not
looks like a broken concept to me as this information is just stored
for logging and has no role in any decision-making, and I think that
it should remain this way. So, instead I think that we had better
update minRecoveryPoint unconditionally as in the attached. I reworked
the surrounding comments in consequence. That's a bit more aggressive,
and that addresses the backup problems.

Returning to the discussion on which way to choose, we agreed
that using replayEndRecPtr instead of minRecoveryPoint in
do_pg_stop_backup(), as the patch attached to the following
message.

Hm.. If there are no buffers flushed to disk that may increase the
time it takes to reach a consistent state at recovery for no real
reasons. So actually it is not that much a good idea...
--
Michael

Attachments:

backup-standby-v4.patchinvalid/octet-stream; name=backup-standby-v4.patchDownload+16-10
#8Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#7)
Re: [BUG] pg_basebackup from disconnected standby fails

On Thu, Jun 23, 2016 at 3:50 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Jun 15, 2016 at 4:52 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

Sorry for the late reply, Horiguchi-san. I have finally been able to
put some mind power into that.

This is somewhat artificial but the same situation could be made
also in the nature. The exact condition for this is replaying a
checkpoint record having no buffer modification since the
preceding checkpoint in the previous WAL segments.

After thinking about that more, I am seeing your point.
CreateRestartPoint is clearly missing the shot for the update of
minRecoveryPoint even when a restart point can be created. Since
cdd46c7, we assume that a node in recovery will not update it, and
rely on the fact that it will be updated if some buffers are flushed,
but that assumption got broken since the introduction of the
possibility to take base backups from standbys, as in your case there
is no activity on the standby that would allow minRecoveryPoint to be
updated and make the backup finish in a clean state. By the way,
relying on CheckpointStats to decide if it should be updated or not
looks like a broken concept to me as this information is just stored
for logging and has no role in any decision-making, and I think that
it should remain this way. So, instead I think that we had better
update minRecoveryPoint unconditionally as in the attached. I reworked
the surrounding comments in consequence. That's a bit more aggressive,
and that addresses the backup problems.

Returning to the discussion on which way to choose, we agreed
that using replayEndRecPtr instead of minRecoveryPoint in
do_pg_stop_backup(), as the patch attached to the following
message.

Hm.. If there are no buffers flushed to disk that may increase the
time it takes to reach a consistent state at recovery for no real
reasons. So actually it is not that much a good idea...

By the way, do you mind if I add this patch to the next CF? Better not
to lose track of it...
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#8)
Re: [BUG] pg_basebackup from disconnected standby fails

On Thu, Jun 23, 2016 at 3:51 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

By the way, do you mind if I add this patch to the next CF? Better not
to lose track of it...

Well, I have added an entry here at the end:
https://commitfest.postgresql.org/10/654/
Better doing it now before I forget about it...
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#9)
Re: [BUG] pg_basebackup from disconnected standby fails

Sorry for the absense. Thank you for registering it.

regards.

At Fri, 24 Jun 2016 14:46:25 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqTLZE77Joz3EREBjunESB4TiZjyji4s_7b+HJnSSDUpLQ@mail.gmail.com>

On Thu, Jun 23, 2016 at 3:51 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

By the way, do you mind if I add this patch to the next CF? Better not
to lose track of it...

Well, I have added an entry here at the end:
https://commitfest.postgresql.org/10/654/
Better doing it now before I forget about it...
--
Michael

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#3)
Re: [BUG] pg_basebackup from disconnected standby fails

Kyotaro HORIGUCHI wrote:

At Fri, 10 Jun 2016 17:39:59 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com>

On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Indeed, and you could just do the following to reproduce the failure
with the recovery test suite, so I would suggest adding this test in
the patch:
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -24,6 +24,11 @@ $node_standby_1->start;
# pg_basebackup works on a standby).
$node_standby_1->backup($backup_name);
+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;

I'm not sure that adding the test case for a particular bug like
this is appropriate. But it would be acceptable because it
doesn't take long time and it is separate from standard checks.

The reason this test is appropiate is that it's testing a feature we
want to support, namely that taking a backup from a standby works even
when the master is stopped. It's not a test for this particular bug,
even though the feature doesn't work because of this bug.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alvaro Herrera (#11)
Re: [BUG] pg_basebackup from disconnected standby fails

Hello, thank you for the comment.

At Fri, 8 Jul 2016 14:42:20 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20160708184220.GA733807@alvherre.pgsql>

Kyotaro HORIGUCHI wrote:

At Fri, 10 Jun 2016 17:39:59 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com>

On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Indeed, and you could just do the following to reproduce the failure
with the recovery test suite, so I would suggest adding this test in
the patch:
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -24,6 +24,11 @@ $node_standby_1->start;
# pg_basebackup works on a standby).
$node_standby_1->backup($backup_name);
+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;

I'm not sure that adding the test case for a particular bug like
this is appropriate. But it would be acceptable because it
doesn't take long time and it is separate from standard checks.

The reason this test is appropiate is that it's testing a feature we
want to support, namely that taking a backup from a standby works even
when the master is stopped. It's not a test for this particular bug,
even though the feature doesn't work because of this bug.

That's true, but we don't always have a perfectly comprehensive
test suite, consciously or unconsciously. The sentence was
inattentive but the "bug" was just the negative comparable to
"feature" in my mind. My point was the comparison between adding
a test for a corner-case and its cost. It must be added if the
fixed feature is fragile. It can be added it doesn't take a too
long time to finish.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#12)
Re: [BUG] pg_basebackup from disconnected standby fails

On Mon, Jul 11, 2016 at 4:40 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

That's true, but we don't always have a perfectly comprehensive
test suite, consciously or unconsciously. The sentence was
inattentive but the "bug" was just the negative comparable to
"feature" in my mind. My point was the comparison between adding
a test for a corner-case and its cost. It must be added if the
fixed feature is fragile. It can be added it doesn't take a too
long time to finish.

I'd just add it to be honest. Taking backups from standbys, with or
without the master being connected is a supported feature, and we want
to follow-up to be sure that it does not in the future. Having now the
infrastructure for more complex scenarios does not mean of course that
we need to test everything and all kind of things, of course, but here
the benefits are good compared to the cost that a single call of
pg_basebackup has in the complete the test suite run.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#7)
Re: [BUG] pg_basebackup from disconnected standby fails

On Thu, Jun 23, 2016 at 12:20 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Jun 15, 2016 at 4:52 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

Sorry for the late reply, Horiguchi-san. I have finally been able to
put some mind power into that.

This is somewhat artificial but the same situation could be made
also in the nature. The exact condition for this is replaying a
checkpoint record having no buffer modification since the
preceding checkpoint in the previous WAL segments.

After thinking about that more, I am seeing your point.
CreateRestartPoint is clearly missing the shot for the update of
minRecoveryPoint even when a restart point can be created.

I think updating minRecoveryPoint unconditionally can change it's
purpose in some cases. Refer below comments in code:

* minRecoveryPoint is updated to the latest replayed LSN whenever we
* flush a data change during archive recovery. That guards against
* starting archive recovery, aborting it, and restarting with an earlier
* stop location. If we've already flushed data changes from WAL record X
* to disk, we mustn't start up until we reach X again.

Now, as per above rule, the value of minRecoveryPoint can be much
smaller than XLogCtl->replayEndRecPtr. I think this can change the
rules when we can allow read-only connections.

Another point to note is that we are updating checkpoint location
during restart points, when the database state is
DB_IN_ARCHIVE_RECOVERY and updating minRecoveryPoint unconditionally
doesn't look in sync with that as well.

I think your and Kyotaro-san's point that minRecoveryPoint should be
updated to support back-ups on standby has merits, but I think doing
it unconditionally might lead to change in behaviour in some cases.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#14)
Re: [BUG] pg_basebackup from disconnected standby fails

On Tue, Jul 12, 2016 at 12:22 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think updating minRecoveryPoint unconditionally can change it's
purpose in some cases. Refer below comments in code:

* minRecoveryPoint is updated to the latest replayed LSN whenever we
* flush a data change during archive recovery. That guards against
* starting archive recovery, aborting it, and restarting with an earlier
* stop location. If we've already flushed data changes from WAL record X
* to disk, we mustn't start up until we reach X again.

Now, as per above rule, the value of minRecoveryPoint can be much
smaller than XLogCtl->replayEndRecPtr. I think this can change the
rules when we can allow read-only connections.

That would delay the moment read-only connections in hot standby are
allowed in the case where a server is stopped with "immediate", then
restarted with a different target if no data has been flushed when
replaying.

I think your and Kyotaro-san's point that minRecoveryPoint should be
updated to support back-ups on standby has merits, but I think doing
it unconditionally might lead to change in behaviour in some cases.

If we want to tackle the case I mentioned above, one way is to just
update minRecoveryPoint when an exclusive or a non-exclusive backup is
being taken by looking at their status in shared memory. See for
example the patch (1) attached, but this does not save from the case
where a node replays WAL, does not have data flushes, and from which a
backup is taken, in the case where this node gets restarted later with
the immediate mode and has different replay targets.

Another way that just popped into my mind is to add dedicated fields
to XLogCtl that set the stop LSN of a backup the way it should be
instead of using minRecoveryPoint. In short we'd update those fields
in CreateRestartPoint and UpdateMinRecoveryPoint under
XLogCtl->info_lck. The good thing is that this lock is already taken
there. See patch (2) accomplishing that.

Thinking about that, patch (2) is far cleaner, and takes care of not
advancing minRecoveryPoint where not needed, but it does it for the
base backups as they should be. So the dependency between the minimum
recovery point and the end locations of a backup get reduced.
--
Michael

Attachments:

backup-standby-v5-1.patchtext/x-patch; charset=US-ASCII; name=backup-standby-v5-1.patchDownload+36-10
backup-standby-v5-2.patchtext/x-patch; charset=US-ASCII; name=backup-standby-v5-2.patchDownload+41-15
#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#15)
Re: [BUG] pg_basebackup from disconnected standby fails

On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Jul 12, 2016 at 12:22 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think updating minRecoveryPoint unconditionally can change it's
purpose in some cases. Refer below comments in code:

* minRecoveryPoint is updated to the latest replayed LSN whenever we
* flush a data change during archive recovery. That guards against
* starting archive recovery, aborting it, and restarting with an earlier
* stop location. If we've already flushed data changes from WAL record X
* to disk, we mustn't start up until we reach X again.

Now, as per above rule, the value of minRecoveryPoint can be much
smaller than XLogCtl->replayEndRecPtr. I think this can change the
rules when we can allow read-only connections.

That would delay the moment read-only connections in hot standby are
allowed in the case where a server is stopped with "immediate", then
restarted with a different target if no data has been flushed when
replaying.

I think your and Kyotaro-san's point that minRecoveryPoint should be
updated to support back-ups on standby has merits, but I think doing
it unconditionally might lead to change in behaviour in some cases.

If we want to tackle the case I mentioned above, one way is to just
update minRecoveryPoint when an exclusive or a non-exclusive backup is
being taken by looking at their status in shared memory. See for
example the patch (1) attached, but this does not save from the case
where a node replays WAL, does not have data flushes, and from which a
backup is taken, in the case where this node gets restarted later with
the immediate mode and has different replay targets.

This looks clumsy as it updates minrecoverypoint for a specific
condition and it doesn't solve the above mentioned inconcistency.

Another way that just popped into my mind is to add dedicated fields
to XLogCtl that set the stop LSN of a backup the way it should be
instead of using minRecoveryPoint. In short we'd update those fields
in CreateRestartPoint and UpdateMinRecoveryPoint under
XLogCtl->info_lck. The good thing is that this lock is already taken
there. See patch (2) accomplishing that.

How is it different/preferable then directly using
XLogCtl->replayEndRecPtr and XLogCtl->replayEndTLI for stop backup
purpose? Do you see any problem if we go with what Kyotaro-san has
proposed in the initial patch [1]/messages/by-id/20160609.215558.118976703.horiguchi.kyotaro@lab.ntt.co.jp (aka using
XLogCtl->lastReplayedEndRecPtr and XLogCtl->lastReplayedTLI as stop
backup location)?

[1]: /messages/by-id/20160609.215558.118976703.horiguchi.kyotaro@lab.ntt.co.jp

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#16)
Re: [BUG] pg_basebackup from disconnected standby fails

On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

If we want to tackle the case I mentioned above, one way is to just
update minRecoveryPoint when an exclusive or a non-exclusive backup is
being taken by looking at their status in shared memory. See for
example the patch (1) attached, but this does not save from the case
where a node replays WAL, does not have data flushes, and from which a
backup is taken, in the case where this node gets restarted later with
the immediate mode and has different replay targets.

This looks clumsy as it updates minrecoverypoint for a specific
condition and it doesn't solve the above mentioned inconcistency.

Yep. I am not saying the contrary. That's why (2) with its separate
fields would make more sense.

Another way that just popped into my mind is to add dedicated fields
to XLogCtl that set the stop LSN of a backup the way it should be
instead of using minRecoveryPoint. In short we'd update those fields
in CreateRestartPoint and UpdateMinRecoveryPoint under
XLogCtl->info_lck. The good thing is that this lock is already taken
there. See patch (2) accomplishing that.

How is it different/preferable then directly using
XLogCtl->replayEndRecPtr and XLogCtl->replayEndTLI for stop backup
purpose? Do you see any problem if we go with what Kyotaro-san has
proposed in the initial patch [1] (aka using
XLogCtl->lastReplayedEndRecPtr and XLogCtl->lastReplayedTLI as stop
backup location)?

Re-reading this thread from scratch and scratching my mind, I am
actually not getting why we bumped into the topic of making
minRecoveryPoint updates more aggressive instead of the first proposal
:)

Knowing that we have no way to be sure if pg_control has been backed
up last or not, using the last replay LSN and TLI would be the most
simple solution, so let's do this for back-branches. It is an
annoyance to not be able to ensure that backups are taken while the
master is stopped or if there is no activity that updates relation
pages.

The thing that is really annoying btw is that there will be always a
gap between minRecoveryPoint and the actual moment where a backup
finishes because there is no way to rely on the XLOG_BACKUP_END
record. On top of that we can not be sure if pg_control has been
backed up last or not. Which is why it would be cool to document that
gap. Another crazy idea would be to return pg_control as an extra
return field of pg_stop_backup() and encourage users to write that
back in the backup itself. This would allow closing any hole in the
current logic for backups taken from live standbys: minRecoveryPoint
would be updated directly to the last replayed LSN/TLI in the control
file.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#17)
Re: [BUG] pg_basebackup from disconnected standby fails

On Tue, Jul 19, 2016 at 10:43 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Another way that just popped into my mind is to add dedicated fields
to XLogCtl that set the stop LSN of a backup the way it should be
instead of using minRecoveryPoint. In short we'd update those fields
in CreateRestartPoint and UpdateMinRecoveryPoint under
XLogCtl->info_lck. The good thing is that this lock is already taken
there. See patch (2) accomplishing that.

How is it different/preferable then directly using
XLogCtl->replayEndRecPtr and XLogCtl->replayEndTLI for stop backup
purpose? Do you see any problem if we go with what Kyotaro-san has
proposed in the initial patch [1] (aka using
XLogCtl->lastReplayedEndRecPtr and XLogCtl->lastReplayedTLI as stop
backup location)?

Re-reading this thread from scratch and scratching my mind, I am
actually not getting why we bumped into the topic of making
minRecoveryPoint updates more aggressive instead of the first proposal
:)

Knowing that we have no way to be sure if pg_control has been backed
up last or not, using the last replay LSN and TLI would be the most
simple solution, so let's do this for back-branches.

Why only for back-branches? Do you have better solution for head?

It is an
annoyance to not be able to ensure that backups are taken while the
master is stopped or if there is no activity that updates relation
pages.

The thing that is really annoying btw is that there will be always a
gap between minRecoveryPoint and the actual moment where a backup
finishes because there is no way to rely on the XLOG_BACKUP_END
record.

Sorry, but I am not able to understand what you mean by above. What
kind of relation you are trying to show between minRecoveryPoint and
backup finish point?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#18)
Re: [BUG] pg_basebackup from disconnected standby fails

On Tue, Jul 19, 2016 at 9:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 19, 2016 at 10:43 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Another way that just popped into my mind is to add dedicated fields
to XLogCtl that set the stop LSN of a backup the way it should be
instead of using minRecoveryPoint. In short we'd update those fields
in CreateRestartPoint and UpdateMinRecoveryPoint under
XLogCtl->info_lck. The good thing is that this lock is already taken
there. See patch (2) accomplishing that.

How is it different/preferable then directly using
XLogCtl->replayEndRecPtr and XLogCtl->replayEndTLI for stop backup
purpose? Do you see any problem if we go with what Kyotaro-san has
proposed in the initial patch [1] (aka using
XLogCtl->lastReplayedEndRecPtr and XLogCtl->lastReplayedTLI as stop
backup location)?

Re-reading this thread from scratch and scratching my mind, I am
actually not getting why we bumped into the topic of making
minRecoveryPoint updates more aggressive instead of the first proposal
:)

Knowing that we have no way to be sure if pg_control has been backed
up last or not, using the last replay LSN and TLI would be the most
simple solution, so let's do this for back-branches.

Why only for back-branches? Do you have better solution for head?

Yes, I mentioned an idea upthread to set up the minimum recovery point
saved in the backup to the last replayed LSN. Though that's not
acceptable for 9.6 as this requires changing the output of
pg_stop_backup() with a new field containing the bytes of pg_control.
I am not sure how others feel about that, so for now and the
back-branches we could just document that updating the field after
taking a backup closes any holes, and prevents to open a hot standby
for read-only connections too early.

It is an
annoyance to not be able to ensure that backups are taken while the
master is stopped or if there is no activity that updates relation
pages.

The thing that is really annoying btw is that there will be always a
gap between minRecoveryPoint and the actual moment where a backup
finishes because there is no way to rely on the XLOG_BACKUP_END
record.

Sorry, but I am not able to understand what you mean by above. What
kind of relation you are trying to show between minRecoveryPoint and
backup finish point?

When taking a backup from a standby, there is no XLOG_BACKUP_END that
can be used to determine when a hot standby is open for read-only
connections.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#19)
Re: [BUG] pg_basebackup from disconnected standby fails

On Wed, Jul 20, 2016 at 5:12 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Jul 19, 2016 at 9:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 19, 2016 at 10:43 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Why only for back-branches? Do you have better solution for head?

Yes, I mentioned an idea upthread to set up the minimum recovery point
saved in the backup to the last replayed LSN. Though that's not
acceptable for 9.6 as this requires changing the output of
pg_stop_backup() with a new field containing the bytes of pg_control.
I am not sure how others feel about that,

Yeah, I think that is totally different angle to fix this issue, so
don't you think it is better to start a separate thread to discuss
about it for 10.0 and mark this patch as ready for committer.

It is an
annoyance to not be able to ensure that backups are taken while the
master is stopped or if there is no activity that updates relation
pages.

The thing that is really annoying btw is that there will be always a
gap between minRecoveryPoint and the actual moment where a backup
finishes because there is no way to rely on the XLOG_BACKUP_END
record.

Sorry, but I am not able to understand what you mean by above. What
kind of relation you are trying to show between minRecoveryPoint and
backup finish point?

When taking a backup from a standby, there is no XLOG_BACKUP_END that
can be used to determine when a hot standby is open for read-only
connections.

Okay, may be we can add a comment in code to indicate the same, but
OTOH, it is apparent from code, so not sure if it is worth to add it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#20)
#22Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#21)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#23)
#25Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#22)
#26Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#13)
#27Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#25)
#28Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#27)
#29Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#28)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#29)
#31Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#30)
#32Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#30)
#33Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#31)
#34Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#33)
#35Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#32)
#36Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#34)
#37Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#36)
#38Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#31)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#32)
#40Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#34)
#41Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#39)
#42Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#34)
#43Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#42)
#44Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#42)
#45Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#41)
#46Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#43)
#48Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#47)
#49Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#46)
#50Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#48)
#51Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#50)
#52Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#51)
#53Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#52)
#54Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#53)
#55Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#53)