pg_stop_backup(wait_for_archive := true) on standby server

Started by Masahiko Sawadaalmost 9 years ago67 messageshackers
Jump to latest
#1Masahiko Sawada
sawada.mshk@gmail.com

Hi,

Since an optional second argument wait_for_archive of pg_stop_backup
has been introduced in PostgreSQL 10 we can choose whether wait for
archiving. But my colleagues found that we can do pg_stop_backup with
wait_for_archive = true on the standby server but it actually doesn't
wait for WAL archiving. Because this behavior is not documented and we
cannot find out it without reading source code it will confuse the
user.

I think we can raise an error when pg_stop_backup with
wait_for_archive = true is executed on the standby. Attached patch
change it so that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

pg_stop_backup_on_standby.patchapplication/octet-stream; name=pg_stop_backup_on_standby.patchDownload+5-0
#2Magnus Hagander
magnus@hagander.net
In reply to: Masahiko Sawada (#1)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Thu, Jun 22, 2017 at 10:12 AM, Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

Hi,

Since an optional second argument wait_for_archive of pg_stop_backup
has been introduced in PostgreSQL 10 we can choose whether wait for
archiving. But my colleagues found that we can do pg_stop_backup with
wait_for_archive = true on the standby server but it actually doesn't
wait for WAL archiving. Because this behavior is not documented and we
cannot find out it without reading source code it will confuse the
user.

I think we can raise an error when pg_stop_backup with
wait_for_archive = true is executed on the standby. Attached patch
change it so that.

Wouldn't it be better to make it *work*? If you have archive_mode=always,
it makes sense to want to wait on the standby as well, does it not?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Magnus Hagander (#2)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Thu, Jun 22, 2017 at 10:36 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Jun 22, 2017 at 10:12 AM, Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

Hi,

Since an optional second argument wait_for_archive of pg_stop_backup
has been introduced in PostgreSQL 10 we can choose whether wait for
archiving. But my colleagues found that we can do pg_stop_backup with
wait_for_archive = true on the standby server but it actually doesn't
wait for WAL archiving. Because this behavior is not documented and we
cannot find out it without reading source code it will confuse the
user.

I think we can raise an error when pg_stop_backup with
wait_for_archive = true is executed on the standby. Attached patch
change it so that.

Wouldn't it be better to make it *work*? If you have archive_mode=always, it
makes sense to want to wait on the standby as well, does it not?

Yes, ideally it will be better to make it wait for WAL archiving on
standby server when archive_mode=always. But I think it would be for
PG11 item, and this item is for PG10.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Magnus Hagander
magnus@hagander.net
In reply to: Masahiko Sawada (#3)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Thu, Jun 22, 2017 at 6:22 PM, Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Thu, Jun 22, 2017 at 10:36 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Thu, Jun 22, 2017 at 10:12 AM, Masahiko Sawada <sawada.mshk@gmail.com

wrote:

Hi,

Since an optional second argument wait_for_archive of pg_stop_backup
has been introduced in PostgreSQL 10 we can choose whether wait for
archiving. But my colleagues found that we can do pg_stop_backup with
wait_for_archive = true on the standby server but it actually doesn't
wait for WAL archiving. Because this behavior is not documented and we
cannot find out it without reading source code it will confuse the
user.

I think we can raise an error when pg_stop_backup with
wait_for_archive = true is executed on the standby. Attached patch
change it so that.

Wouldn't it be better to make it *work*? If you have

archive_mode=always, it

makes sense to want to wait on the standby as well, does it not?

Yes, ideally it will be better to make it wait for WAL archiving on
standby server when archive_mode=always. But I think it would be for
PG11 item, and this item is for PG10.

I'm not sure. I think this can be considered a bug in the implementation
for 10, and as such is "open for fixing". However, it's not a very critical
bug so I doubt it should be a release blocker, but if someone wants to work
on a fix I think we should commit it.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#5Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Magnus Hagander (#4)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Thu, Jun 29, 2017 at 10:30 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Jun 22, 2017 at 6:22 PM, Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Thu, Jun 22, 2017 at 10:36 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Thu, Jun 22, 2017 at 10:12 AM, Masahiko Sawada
<sawada.mshk@gmail.com>
wrote:

Hi,

Since an optional second argument wait_for_archive of pg_stop_backup
has been introduced in PostgreSQL 10 we can choose whether wait for
archiving. But my colleagues found that we can do pg_stop_backup with
wait_for_archive = true on the standby server but it actually doesn't
wait for WAL archiving. Because this behavior is not documented and we
cannot find out it without reading source code it will confuse the
user.

I think we can raise an error when pg_stop_backup with
wait_for_archive = true is executed on the standby. Attached patch
change it so that.

Wouldn't it be better to make it *work*? If you have
archive_mode=always, it
makes sense to want to wait on the standby as well, does it not?

Yes, ideally it will be better to make it wait for WAL archiving on
standby server when archive_mode=always. But I think it would be for
PG11 item, and this item is for PG10.

I'm not sure. I think this can be considered a bug in the implementation for
10, and as such is "open for fixing". However, it's not a very critical bug
so I doubt it should be a release blocker, but if someone wants to work on a
fix I think we should commit it.

I agree with you. I'd like to hear opinions from other hackers as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Masahiko Sawada (#5)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On 6/30/17 04:08, Masahiko Sawada wrote:

I'm not sure. I think this can be considered a bug in the implementation for
10, and as such is "open for fixing". However, it's not a very critical bug
so I doubt it should be a release blocker, but if someone wants to work on a
fix I think we should commit it.

I agree with you. I'd like to hear opinions from other hackers as well.

It's preferable to make it work. If it's not easily possible, then we
should prohibit it.

Comments from Stephen (original committer)?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Stephen Frost
sfrost@snowman.net
In reply to: Peter Eisentraut (#6)
Re: pg_stop_backup(wait_for_archive := true) on standby server

Peter, all,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:

On 6/30/17 04:08, Masahiko Sawada wrote:

I'm not sure. I think this can be considered a bug in the implementation for
10, and as such is "open for fixing". However, it's not a very critical bug
so I doubt it should be a release blocker, but if someone wants to work on a
fix I think we should commit it.

I agree with you. I'd like to hear opinions from other hackers as well.

It's preferable to make it work. If it's not easily possible, then we
should prohibit it.

Comments from Stephen (original committer)?

I agree that it'd be preferable to make it work, but I'm not sure I can
commit to having it done in short order. I'm happy to work to prohibit
it, but if someone has a few spare cycles to make it actually work,
that'd be great.

In short, I agree with Magnus and feel like I'm more-or-less in the same
boat as he is (though slightly jealous as that's not actually physically
the case, for I hear he has a rather nice boat...).

Thanks!

Stephen

#8Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#7)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Sat, Jul 1, 2017 at 3:59 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:

On 6/30/17 04:08, Masahiko Sawada wrote:

I'm not sure. I think this can be considered a bug in the implementation for
10, and as such is "open for fixing". However, it's not a very critical bug
so I doubt it should be a release blocker, but if someone wants to work on a
fix I think we should commit it.

I agree with you. I'd like to hear opinions from other hackers as well.

It's preferable to make it work. If it's not easily possible, then we
should prohibit it.

Comments from Stephen (original committer)?

I agree that it'd be preferable to make it work, but I'm not sure I can
commit to having it done in short order. I'm happy to work to prohibit
it, but if someone has a few spare cycles to make it actually work,
that'd be great.

Fixing the limitation instead of prohibiting it looks like a better
way of doing things to me. It would be hard to explain to users why
the implementation does not consider archive_mode = always. Blocking
it is just four lines of code, still that feels wrong.

In short, I agree with Magnus and feel like I'm more-or-less in the same
boat as he is (though slightly jealous as that's not actually physically
the case, for I hear he has a rather nice boat...).

That means a PG-EU in Sweden at some point?!
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#8)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Sun, Jul 2, 2017 at 4:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Sat, Jul 1, 2017 at 3:59 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:

On 6/30/17 04:08, Masahiko Sawada wrote:

I'm not sure. I think this can be considered a bug in the implementation for
10, and as such is "open for fixing". However, it's not a very critical bug
so I doubt it should be a release blocker, but if someone wants to work on a
fix I think we should commit it.

I agree with you. I'd like to hear opinions from other hackers as well.

It's preferable to make it work. If it's not easily possible, then we
should prohibit it.

Comments from Stephen (original committer)?

I agree that it'd be preferable to make it work, but I'm not sure I can
commit to having it done in short order. I'm happy to work to prohibit
it, but if someone has a few spare cycles to make it actually work,
that'd be great.

Fixing the limitation instead of prohibiting it looks like a better
way of doing things to me. It would be hard to explain to users why
the implementation does not consider archive_mode = always. Blocking
it is just four lines of code, still that feels wrong.

I feel that since we cannot switch the WAL forcibly on the standby
server we need to find a new way to do so. I'm not sure but it might
be a hard work and be late for PG10. Or you meant that you have a idea
for this?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#9)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Wed, Jul 5, 2017 at 10:19 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I feel that since we cannot switch the WAL forcibly on the standby
server we need to find a new way to do so. I'm not sure but it might
be a hard work and be late for PG10. Or you meant that you have a idea
for this?

Why not refactoring a bit do_pg_stop_backup() so as the wait phase
happens even if a backup is started in recovery? Now wait_for_archive
is ignored because no wait is happening and the stop point is directly
returned back to the caller. For the wait actually happening, I don't
have a better idea than documenting the fact that enforcing manually a
segment switch on the primary needs to happen. That's better than
having users including WAL in their base backups but not actually
having everything they need. And I think that documenting that
properly is better than restricting things that should work.

In most workloads, multiple WAL segments can be generated per second,
and in even more of them a new segment generated would happen in less
than a minute, so waiting for a segment switch on the primary should
not be a problem for most users. The log letting user know about the
wait should be more informative when things happen on a standby, like
"waiting for segment to be finished or switched on the primary".

If the restriction approach is preferred, I think that the check
should happen in do_pg_stop_backup as well, and not in
pg_stop_backup_v2 as your patch suggests. pg_basebackup is not able to
do non-exclusive backups but this may happen some day, who knows..
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#10)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Wed, Jul 5, 2017 at 4:57 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Why not refactoring a bit do_pg_stop_backup() so as the wait phase
happens even if a backup is started in recovery? Now wait_for_archive
is ignored because no wait is happening and the stop point is directly
returned back to the caller. For the wait actually happening, I don't
have a better idea than documenting the fact that enforcing manually a
segment switch on the primary needs to happen. That's better than
having users including WAL in their base backups but not actually
having everything they need. And I think that documenting that
properly is better than restricting things that should work.

While looking at that in more details, I got surprised by two things:
1) No backup history file is generated on a standby during a base backup.
2) Because of 1), those files are not archived even if archive_mode = always.

This sounds to me like a missing optimization of archive_mode =
always, and the following comment block in xlog.c is at least
incorrect as an archiver can be invoked in recovery:
* XXX currently a backup history file is for informational and debug
* purposes only. It's not essential for an online backup. Furthermore,
* even if it's created, it will not be archived during recovery because
* an archiver is not invoked. So it doesn't seem worthwhile to write a
* backup history file during recovery.

So I would suggest the following things to address this issue:
1) Generate a backup history file for backups taken at recovery as well.
2) Archive it if archive_mode = always.
3) Wait for both the segment of the stop point and the backup history
files to be archived before returning back to the caller of
pg_stop_backup.

It would be nice to get all that addressed in 10. Thoughts?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#10)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Wed, Jul 5, 2017 at 4:57 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Jul 5, 2017 at 10:19 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I feel that since we cannot switch the WAL forcibly on the standby
server we need to find a new way to do so. I'm not sure but it might
be a hard work and be late for PG10. Or you meant that you have a idea
for this?

Why not refactoring a bit do_pg_stop_backup() so as the wait phase
happens even if a backup is started in recovery? Now wait_for_archive
is ignored because no wait is happening and the stop point is directly
returned back to the caller. For the wait actually happening, I don't
have a better idea than documenting the fact that enforcing manually a
segment switch on the primary needs to happen. That's better than
having users including WAL in their base backups but not actually
having everything they need. And I think that documenting that
properly is better than restricting things that should work.

I agree with this idea. I've tried to make it wait for archiving but
it seems to me that there are other two issues we need to deal with:
the timeline ID on standby server is always reset after created a
restart point, and ThisTimeLineID variable of a backend process is not
set on the standby node when initializing. The former would be easy to
fix because resetting it is for debugging purposes. However, to deal
with latter issue, I'm considering the influence on other things.

In most workloads, multiple WAL segments can be generated per second,
and in even more of them a new segment generated would happen in less
than a minute, so waiting for a segment switch on the primary should
not be a problem for most users. The log letting user know about the
wait should be more informative when things happen on a standby, like
"waiting for segment to be finished or switched on the primary".

If the restriction approach is preferred, I think that the check
should happen in do_pg_stop_backup as well, and not in
pg_stop_backup_v2 as your patch suggests. pg_basebackup is not able to
do non-exclusive backups but this may happen some day, who knows..

Yeah, I understood.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#12)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Fri, Jul 7, 2017 at 4:06 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I agree with this idea. I've tried to make it wait for archiving but
it seems to me that there are other two issues we need to deal with:
the timeline ID on standby server is always reset after created a
restart point, and ThisTimeLineID variable of a backend process is not
set on the standby node when initializing. The former would be easy to
fix because resetting it is for debugging purposes. However, to deal
with latter issue, I'm considering the influence on other things.

ThisTimeLineID does not need to be touched. It seems to me that you
should just use the wait timeline as minRecoveryPointTLI, and wait for
segments using this value. The segment and backup history files to
wait for should be defined with minRecoveryPoint.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#13)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Fri, Jul 7, 2017 at 4:21 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Jul 7, 2017 at 4:06 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I agree with this idea. I've tried to make it wait for archiving but
it seems to me that there are other two issues we need to deal with:
the timeline ID on standby server is always reset after created a
restart point, and ThisTimeLineID variable of a backend process is not
set on the standby node when initializing. The former would be easy to
fix because resetting it is for debugging purposes. However, to deal
with latter issue, I'm considering the influence on other things.

ThisTimeLineID does not need to be touched. It seems to me that you
should just use the wait timeline as minRecoveryPointTLI, and wait for
segments using this value. The segment and backup history files to
wait for should be defined with minRecoveryPoint.

Thank you for the advise. I'll post the patch later.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#11)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Fri, Jul 7, 2017 at 3:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Jul 5, 2017 at 4:57 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Why not refactoring a bit do_pg_stop_backup() so as the wait phase
happens even if a backup is started in recovery? Now wait_for_archive
is ignored because no wait is happening and the stop point is directly
returned back to the caller. For the wait actually happening, I don't
have a better idea than documenting the fact that enforcing manually a
segment switch on the primary needs to happen. That's better than
having users including WAL in their base backups but not actually
having everything they need. And I think that documenting that
properly is better than restricting things that should work.

While looking at that in more details, I got surprised by two things:
1) No backup history file is generated on a standby during a base backup.
2) Because of 1), those files are not archived even if archive_mode = always.

This sounds to me like a missing optimization of archive_mode =
always, and the following comment block in xlog.c is at least
incorrect as an archiver can be invoked in recovery:
* XXX currently a backup history file is for informational and debug
* purposes only. It's not essential for an online backup. Furthermore,
* even if it's created, it will not be archived during recovery because
* an archiver is not invoked. So it doesn't seem worthwhile to write a
* backup history file during recovery.

So I would suggest the following things to address this issue:
1) Generate a backup history file for backups taken at recovery as well.
2) Archive it if archive_mode = always.
3) Wait for both the segment of the stop point and the backup history
files to be archived before returning back to the caller of
pg_stop_backup.

It would be nice to get all that addressed in 10. Thoughts?

Yeah, I agree. Attached patch makes it works and deals with the
history file issue.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

pg_stop_backup_on_standby_v2.patchtext/x-patch; charset=US-ASCII; name=pg_stop_backup_on_standby_v2.patchDownload+38-38
#16Noah Misch
noah@leadboat.com
In reply to: Stephen Frost (#7)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Fri, Jun 30, 2017 at 02:59:11PM -0400, Stephen Frost wrote:

Peter, all,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:

On 6/30/17 04:08, Masahiko Sawada wrote:

I'm not sure. I think this can be considered a bug in the implementation for
10, and as such is "open for fixing". However, it's not a very critical bug
so I doubt it should be a release blocker, but if someone wants to work on a
fix I think we should commit it.

I agree with you. I'd like to hear opinions from other hackers as well.

It's preferable to make it work. If it's not easily possible, then we
should prohibit it.

Comments from Stephen (original committer)?

I agree that it'd be preferable to make it work, but I'm not sure I can
commit to having it done in short order. I'm happy to work to prohibit
it, but if someone has a few spare cycles to make it actually work,
that'd be great.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Stephen,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Stephen Frost
sfrost@snowman.net
In reply to: Noah Misch (#16)
Re: pg_stop_backup(wait_for_archive := true) on standby server

All,

* Noah Misch (noah@leadboat.com) wrote:

On Fri, Jun 30, 2017 at 02:59:11PM -0400, Stephen Frost wrote:

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:

On 6/30/17 04:08, Masahiko Sawada wrote:

I'm not sure. I think this can be considered a bug in the implementation for
10, and as such is "open for fixing". However, it's not a very critical bug
so I doubt it should be a release blocker, but if someone wants to work on a
fix I think we should commit it.

I agree with you. I'd like to hear opinions from other hackers as well.

It's preferable to make it work. If it's not easily possible, then we
should prohibit it.

Comments from Stephen (original committer)?

I agree that it'd be preferable to make it work, but I'm not sure I can
commit to having it done in short order. I'm happy to work to prohibit
it, but if someone has a few spare cycles to make it actually work,
that'd be great.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Stephen,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

I'm out of town this week but will review the patch from Masahiko and
provide a status update by July 17th.

Thanks!

Stephen

#18Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#17)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Sun, Jul 9, 2017 at 8:00 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Noah Misch (noah@leadboat.com) wrote:

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Stephen,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

I'm out of town this week but will review the patch from Masahiko and
provide a status update by July 17th.

Good to know. I was planning to review the patch within 24 hours. I
have spotted at least one bug in the patch: there is no need to wait
for the backup history file and the last segment to be archived on a
standby unless archive_mode = always, meaning that
XLogArchivingAlways() needs to be used instead of
XLogArchivingActive(). There may be other things, but I am lacking
time and brain power now for a complete analysis.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#15)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Sat, Jul 8, 2017 at 12:50 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jul 7, 2017 at 3:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

So I would suggest the following things to address this issue:
1) Generate a backup history file for backups taken at recovery as well.
2) Archive it if archive_mode = always.
3) Wait for both the segment of the stop point and the backup history
files to be archived before returning back to the caller of
pg_stop_backup.

It would be nice to get all that addressed in 10. Thoughts?

Yeah, I agree. Attached patch makes it works and deals with the
history file issue.

I had a look at the proposed patch. Here are some comments.

@@ -11002,10 +11000,10 @@ do_pg_stop_backup(char *labelfile, bool
waitforarchive, TimeLineID *stoptli_p)
    if (waitforarchive && XLogArchivingActive())
    {
        XLByteToPrevSeg(stoppoint, _logSegNo);
-       XLogFileName(lastxlogfilename, ThisTimeLineID, _logSegNo);
+       XLogFileName(lastxlogfilename, stoptli, _logSegNo);

On a standby the wait phase should not happen if archive_mode = on,
but only if archive_mode = always. So I would suggest to change this
upper condition a bit, and shuffle a bit the code to make the wait
phase happen last:
1) stoptli_p first.
2) Check for XLogArchivingActive or XLogArchivingAlways, then with the
NOTICE message.
3) Do the actual wait.
This way the code doing the wait does not need to be in its long
lengthy if() branch. I think that we should replace the pg_usleep()
call with a latch to make this more responsive. That should be a
future patch.

In backup history files generated in standbys, the STOP TIME is not
set and this results in garbage in the file.

+    If second parameter is true and on standby, <function>pg_stop_backup</>
+    waits for WAL to be archived without forcibly switching WAL on standby.
+    So enforcing manually a WAL switch on primary needs to happen.
Here is a reformulation:
If the second parameter wait_for_archive is true and the backup is
taken on a standby, pg_stop_backup waits for WAL to be archived when
archive_mode = always. Enforcing manually a WAL segment switch to
happen with for example pg_switch_wal() may be necessary if the
primary has low activity to allow the backup to complete. Using
statement_timeout to limit the amount of time to wait or switching
wait_for_archive to false will control the wait time, though all the
WAL segments necessary to recover into a consistent state from the
backup taken may not be archived at the time pg_stop_backup returns
its status to the caller.

The errhint() for the wait phase should be reworked for a standby. I
would suggest for errmsg() the same thing, aka:
pg_stop_backup still waiting for all required WAL segments to be
archived (%d seconds elapsed)
But the following for a backup started in recovery. That's long but
things need to be really clear to the user:
Backup has been taken from a standby, check if the WAL segments needed
for this backup have been completed, in which case a forced segment
switch may can be needed on the primary. If a segment switch has
already happened check that your archive_command is executing
properly. pg_stop_backup can be canceled safely, but the database
backup will not be usable without all the WAL segments.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#19)
Re: pg_stop_backup(wait_for_archive := true) on standby server

On Mon, Jul 10, 2017 at 11:56 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Sat, Jul 8, 2017 at 12:50 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jul 7, 2017 at 3:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

So I would suggest the following things to address this issue:
1) Generate a backup history file for backups taken at recovery as well.
2) Archive it if archive_mode = always.
3) Wait for both the segment of the stop point and the backup history
files to be archived before returning back to the caller of
pg_stop_backup.

It would be nice to get all that addressed in 10. Thoughts?

Yeah, I agree. Attached patch makes it works and deals with the
history file issue.

I had a look at the proposed patch. Here are some comments.

Thank you for reviewing the patch!

@@ -11002,10 +11000,10 @@ do_pg_stop_backup(char *labelfile, bool
waitforarchive, TimeLineID *stoptli_p)
if (waitforarchive && XLogArchivingActive())
{
XLByteToPrevSeg(stoppoint, _logSegNo);
-       XLogFileName(lastxlogfilename, ThisTimeLineID, _logSegNo);
+       XLogFileName(lastxlogfilename, stoptli, _logSegNo);

On a standby the wait phase should not happen if archive_mode = on,
but only if archive_mode = always. So I would suggest to change this
upper condition a bit, and shuffle a bit the code to make the wait
phase happen last:
1) stoptli_p first.
2) Check for XLogArchivingActive or XLogArchivingAlways, then with the
NOTICE message.
3) Do the actual wait.

Thank you for the suggestion. Fixed.

This way the code doing the wait does not need to be in its long
lengthy if() branch. I think that we should replace the pg_usleep()
call with a latch to make this more responsive. That should be a
future patch.

Yeah, I agree.

In backup history files generated in standbys, the STOP TIME is not
set and this results in garbage in the file.

Fixed.

+    If second parameter is true and on standby, <function>pg_stop_backup</>
+    waits for WAL to be archived without forcibly switching WAL on standby.
+    So enforcing manually a WAL switch on primary needs to happen.
Here is a reformulation:
If the second parameter wait_for_archive is true and the backup is
taken on a standby, pg_stop_backup waits for WAL to be archived when
archive_mode = always. Enforcing manually a WAL segment switch to
happen with for example pg_switch_wal() may be necessary if the
primary has low activity to allow the backup to complete. Using
statement_timeout to limit the amount of time to wait or switching
wait_for_archive to false will control the wait time, though all the
WAL segments necessary to recover into a consistent state from the
backup taken may not be archived at the time pg_stop_backup returns
its status to the caller.

Thanks, fixed.

The errhint() for the wait phase should be reworked for a standby. I
would suggest for errmsg() the same thing, aka:
pg_stop_backup still waiting for all required WAL segments to be
archived (%d seconds elapsed)
But the following for a backup started in recovery. That's long but
things need to be really clear to the user:
Backup has been taken from a standby, check if the WAL segments needed
for this backup have been completed, in which case a forced segment
switch may can be needed on the primary. If a segment switch has
already happened check that your archive_command is executing
properly. pg_stop_backup can be canceled safely, but the database
backup will not be usable without all the WAL segments.

Fixed.

Attached updated version patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

pg_stop_backup_on_standby_v3.patchapplication/octet-stream; name=pg_stop_backup_on_standby_v3.patchDownload+60-38
#21Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#20)
#22Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#22)
#24Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#23)
#25Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#24)
#26Stephen Frost
sfrost@snowman.net
In reply to: Masahiko Sawada (#22)
#27Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Stephen Frost (#26)
#28Stephen Frost
sfrost@snowman.net
In reply to: Masahiko Sawada (#27)
#29Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Stephen Frost (#28)
#30David Steele
david@pgmasters.net
In reply to: Masahiko Sawada (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: David Steele (#30)
#32Stephen Frost
sfrost@snowman.net
In reply to: David Steele (#30)
#33Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#31)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#33)
#35Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#34)
#36David Steele
david@pgmasters.net
In reply to: Stephen Frost (#32)
#37Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#35)
#38David Steele
david@pgmasters.net
In reply to: David Steele (#36)
#39Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#37)
#40Stephen Frost
sfrost@snowman.net
In reply to: Masahiko Sawada (#39)
#41Noah Misch
noah@leadboat.com
In reply to: Stephen Frost (#28)
#42Stephen Frost
sfrost@snowman.net
In reply to: Noah Misch (#41)
#43Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#42)
#44Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#43)
#45Noah Misch
noah@leadboat.com
In reply to: Stephen Frost (#42)
#46Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#44)
#47Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#46)
#48Stephen Frost
sfrost@snowman.net
In reply to: Noah Misch (#45)
#49Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#47)
#50Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#48)
#51Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#49)
#52Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#50)
#53Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Stephen Frost (#51)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#52)
#55Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#54)
#56Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#55)
#57Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#56)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#57)
#59Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#57)
#60Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#59)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#60)
#62Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Robert Haas (#59)
#63Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Paul Jungwirth (#62)
#64Robert Haas
robertmhaas@gmail.com
In reply to: Paul Jungwirth (#62)
#65Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Robert Haas (#64)
#66Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#64)
#67Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#66)