PATCH: Make pg_stop_backup() archive wait optional
Currently pg_stop_backup() will wait for all WAL to be archived before
returning. If there is a problem with the archive command or a large
backlog it may not return for a very long time (if ever). Backup
software is faced with the choice of cancelling the query and hoping the
stop backup record was written or waiting indefinitely.
The purpose of this patch is to make waiting for archive optional, with
the default being the current behavior, i.e., to wait for all WAL to be
archived. This functionality is already used internally by
pg_basebackup, so the only real change is to expose it through the
pg_stop_backup() function.
I wasn't sure where, if anywhere, to put tests (the underlying
functionality is tested in the pg_basebackup TAP tests). test/recovery
and bin/pg_basebackup did not seem like appropriate places and the
pg_regress tests are not capable enough. It seems like a new
test/backup suite would be a good idea but I wanted to get some feedback
before proceeding in that direction.
I also marked the pg_stop_* functions as parallel restricted, the same
as pg_start_backup(). Previously they were parallel safe which I don't
believe is accurate for the non-exclusive version at the very least,
since it is tied to a particular backend.
The patch applies cleanly on 30df93f but will be broken (and easily
fixed) by any cat version bump in later commits.
--
-David
david@pgmasters.net
Attachments:
pgstopbackup-wait-v1.patchtext/plain; charset=UTF-8; name=pgstopbackup-wait-v1.patch; x-mac-creator=0; x-mac-type=0Download+21-8
On Tue, Feb 28, 2017 at 9:25 AM, David Steele <david@pgmasters.net> wrote:
I also marked the pg_stop_* functions as parallel restricted, the same
as pg_start_backup(). Previously they were parallel safe which I don't
believe is accurate for the non-exclusive version at the very least,
since it is tied to a particular backend.
Yeah, those should really be parallel restricted. For the exclusive
version, having the function run in parallel would also lead to errors
per the presence/lack of backup_label file.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/27/17 7:38 PM, Michael Paquier wrote:
On Tue, Feb 28, 2017 at 9:25 AM, David Steele <david@pgmasters.net> wrote:
I also marked the pg_stop_* functions as parallel restricted, the same
as pg_start_backup(). Previously they were parallel safe which I don't
believe is accurate for the non-exclusive version at the very least,
since it is tied to a particular backend.Yeah, those should really be parallel restricted. For the exclusive
version, having the function run in parallel would also lead to errors
per the presence/lack of backup_label file.
I'm not sure that's the case. It seems like it should lock just as
multiple backends would now. One process would succeed and the others
would error. Maybe I'm missing something?
Either way, I don't think the behavior makes any sense. Parallel safe
seems more sensible.
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Feb 28, 2017 at 9:42 AM, David Steele <david@pgmasters.net> wrote:
On 2/27/17 7:38 PM, Michael Paquier wrote:
On Tue, Feb 28, 2017 at 9:25 AM, David Steele <david@pgmasters.net> wrote:
I also marked the pg_stop_* functions as parallel restricted, the same
as pg_start_backup(). Previously they were parallel safe which I don't
believe is accurate for the non-exclusive version at the very least,
since it is tied to a particular backend.Yeah, those should really be parallel restricted. For the exclusive
version, having the function run in parallel would also lead to errors
per the presence/lack of backup_label file.I'm not sure that's the case. It seems like it should lock just as
multiple backends would now. One process would succeed and the others
would error. Maybe I'm missing something?
Hm, any errors happening in the workers would be reported to the
leader, meaning that even if one worker succeeded to run
pg_start_backup() it would be reported as an error at the end to the
client, no? By marking the exclusive function restricted we get sure
that it is just the leader that fails or succeeds.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/27/17 7:50 PM, Michael Paquier wrote:
On Tue, Feb 28, 2017 at 9:42 AM, David Steele <david@pgmasters.net> wrote:
On 2/27/17 7:38 PM, Michael Paquier wrote:
On Tue, Feb 28, 2017 at 9:25 AM, David Steele <david@pgmasters.net> wrote:
I also marked the pg_stop_* functions as parallel restricted, the same
as pg_start_backup(). Previously they were parallel safe which I don't
believe is accurate for the non-exclusive version at the very least,
since it is tied to a particular backend.Yeah, those should really be parallel restricted. For the exclusive
version, having the function run in parallel would also lead to errors
per the presence/lack of backup_label file.I'm not sure that's the case. It seems like it should lock just as
multiple backends would now. One process would succeed and the others
would error. Maybe I'm missing something?Hm, any errors happening in the workers would be reported to the
leader, meaning that even if one worker succeeded to run
pg_start_backup() it would be reported as an error at the end to the
client, no? By marking the exclusive function restricted we get sure
that it is just the leader that fails or succeeds.
Good point, and it strengthens the argument beyond, "it just seems right."
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Feb 28, 2017 at 6:22 AM, David Steele <david@pgmasters.net> wrote:
I'm not sure that's the case. It seems like it should lock just as
multiple backends would now. One process would succeed and the others
would error. Maybe I'm missing something?Hm, any errors happening in the workers would be reported to the
leader, meaning that even if one worker succeeded to run
pg_start_backup() it would be reported as an error at the end to the
client, no? By marking the exclusive function restricted we get sure
that it is just the leader that fails or succeeds.Good point, and it strengthens the argument beyond, "it just seems right."
I think the argument should be based on whether or not the function
depends on backend-private state that will not be synchronized.
That's the definition of what makes something parallel-restricted or
not.
It looks like pg_start_backup() and pg_stop_backup() depend on the
backend-private global variable nonexclusive_backup_running, so they
should be parallel-restricted.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/28/17 10:22 PM, Robert Haas wrote:
On Tue, Feb 28, 2017 at 6:22 AM, David Steele <david@pgmasters.net> wrote:
I'm not sure that's the case. It seems like it should lock just as
multiple backends would now. One process would succeed and the others
would error. Maybe I'm missing something?Hm, any errors happening in the workers would be reported to the
leader, meaning that even if one worker succeeded to run
pg_start_backup() it would be reported as an error at the end to the
client, no? By marking the exclusive function restricted we get sure
that it is just the leader that fails or succeeds.Good point, and it strengthens the argument beyond, "it just seems right."
I think the argument should be based on whether or not the function
depends on backend-private state that will not be synchronized.
That's the definition of what makes something parallel-restricted or
not.
Absolutely. Yesterday was a long day so I may have (perhaps) become a
bit flippant.
It looks like pg_start_backup() and pg_stop_backup() depend on the
backend-private global variable nonexclusive_backup_running, so they
should be parallel-restricted.
Agreed.
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/27/17 6:25 PM, David Steele wrote:
The purpose of this patch is to make waiting for archive optional, with
the default being the current behavior, i.e., to wait for all WAL to be
archived. This functionality is already used internally by
pg_basebackup, so the only real change is to expose it through the
pg_stop_backup() function.
Do the docs mention anywhere how to monitor WAL archiving to know if
you've got all the necessary WAL? Perhaps a function to do that would be
worth adding.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/1/17 5:11 PM, Jim Nasby wrote:
On 2/27/17 6:25 PM, David Steele wrote:
The purpose of this patch is to make waiting for archive optional, with
the default being the current behavior, i.e., to wait for all WAL to be
archived. This functionality is already used internally by
pg_basebackup, so the only real change is to expose it through the
pg_stop_backup() function.Do the docs mention anywhere how to monitor WAL archiving to know if
you've got all the necessary WAL? Perhaps a function to do that would be
worth adding.
The docs in the patch mention that this option should only be used by
backup solutions that know how to monitor archiving:
+ This behavior is only useful for backup
+ software which independently monitors WAL archiving, otherwise WAL
+ required to make the backup consistent might be missing and make
the backup
+ useless.
There is already a view, pg_stat_archiver, that allows a program to see
what has been archived. I'm not sure we'd be adding much by putting a
function around that.
I would be OK with adding a link to pg_stat_archiver in the
pg_stop_backup() documentation to if that seems appropriate.
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Mar 1, 2017 at 9:07 AM, David Steele <david@pgmasters.net> wrote:
On 2/28/17 10:22 PM, Robert Haas wrote:
On Tue, Feb 28, 2017 at 6:22 AM, David Steele <david@pgmasters.net> wrote:
I'm not sure that's the case. It seems like it should lock just as
multiple backends would now. One process would succeed and the others
would error. Maybe I'm missing something?Hm, any errors happening in the workers would be reported to the
leader, meaning that even if one worker succeeded to run
pg_start_backup() it would be reported as an error at the end to the
client, no? By marking the exclusive function restricted we get sure
that it is just the leader that fails or succeeds.Good point, and it strengthens the argument beyond, "it just seems right."
I think the argument should be based on whether or not the function
depends on backend-private state that will not be synchronized.
That's the definition of what makes something parallel-restricted or
not.Absolutely. Yesterday was a long day so I may have (perhaps) become a
bit flippant.It looks like pg_start_backup() and pg_stop_backup() depend on the
backend-private global variable nonexclusive_backup_running, so they
should be parallel-restricted.Agreed.
How about a separately-committable patch that just does that, and then
a main patch that applies on top of it?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Robert,
On 3/4/17 1:58 AM, Robert Haas wrote:
On Wed, Mar 1, 2017 at 9:07 AM, David Steele <david@pgmasters.net> wrote:
On 2/28/17 10:22 PM, Robert Haas wrote:
On Tue, Feb 28, 2017 at 6:22 AM, David Steele <david@pgmasters.net> wrote:
I'm not sure that's the case. It seems like it should lock just as
multiple backends would now. One process would succeed and the others
would error. Maybe I'm missing something?Hm, any errors happening in the workers would be reported to the
leader, meaning that even if one worker succeeded to run
pg_start_backup() it would be reported as an error at the end to the
client, no? By marking the exclusive function restricted we get sure
that it is just the leader that fails or succeeds.Good point, and it strengthens the argument beyond, "it just seems right."
I think the argument should be based on whether or not the function
depends on backend-private state that will not be synchronized.
That's the definition of what makes something parallel-restricted or
not.Absolutely. Yesterday was a long day so I may have (perhaps) become a
bit flippant.It looks like pg_start_backup() and pg_stop_backup() depend on the
backend-private global variable nonexclusive_backup_running, so they
should be parallel-restricted.Agreed.
How about a separately-committable patch that just does that, and then
a main patch that applies on top of it?
Yes, that makes sense. Attached are two patches as requested:
01 - Just marks pg_stop_backup() variants as parallel restricted
02 - Add the wait_for_archive param to pg_stop_backup().
These apply cleanly on 272adf4.
Thanks,
--
-David
david@pgmasters.net
Attachments:
pgstopbackup-wait-01-flags-v2.patchtext/plain; charset=UTF-8; name=pgstopbackup-wait-01-flags-v2.patch; x-mac-creator=0; x-mac-type=0Download+3-3
pgstopbackup-wait-02-param-v2.patchtext/plain; charset=UTF-8; name=pgstopbackup-wait-02-param-v2.patch; x-mac-creator=0; x-mac-type=0Download+20-7
On Sat, Mar 4, 2017 at 9:12 AM, David Steele <david@pgmasters.net> wrote:
Yes, that makes sense. Attached are two patches as requested:
01 - Just marks pg_stop_backup() variants as parallel restricted
02 - Add the wait_for_archive param to pg_stop_backup().These apply cleanly on 272adf4.
Committed 01. Nobody's offered an opinion about 02 yet, so I'm not
going to commit that, but one minor nitpick:
+ WAL to be archived. This behavior is only useful for backup
+ software which independently monitors WAL archiving, otherwise WAL
+ required to make the backup consistent might be missing and make the backup
I think this should really say "...which independently monitors WAL
archiving. Otherwise, WAL..."
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert, all,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Sat, Mar 4, 2017 at 9:12 AM, David Steele <david@pgmasters.net> wrote:
Yes, that makes sense. Attached are two patches as requested:
01 - Just marks pg_stop_backup() variants as parallel restricted
02 - Add the wait_for_archive param to pg_stop_backup().These apply cleanly on 272adf4.
Committed 01. Nobody's offered an opinion about 02 yet, so I'm not
going to commit that, but one minor nitpick:+ WAL to be archived. This behavior is only useful for backup + software which independently monitors WAL archiving, otherwise WAL + required to make the backup consistent might be missing and make the backupI think this should really say "...which independently monitors WAL
archiving. Otherwise, WAL..."
Regarding 02, I certainly see that as valuable for the reasons which
David outlined in his initial email. I can certainly take point on
getting it committed, but I wouldn't complain if someone else does
either.
Thanks!
Stephen
On Mon, Mar 6, 2017 at 12:53 PM, Stephen Frost <sfrost@snowman.net> wrote:
Regarding 02, I certainly see that as valuable for the reasons which
David outlined in his initial email. I can certainly take point on
getting it committed, but I wouldn't complain if someone else does
either.
Sold, to the snowman in the first row.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Robert,
On 3/6/17 12:48 PM, Robert Haas wrote:
On Sat, Mar 4, 2017 at 9:12 AM, David Steele <david@pgmasters.net> wrote:
Yes, that makes sense. Attached are two patches as requested:
01 - Just marks pg_stop_backup() variants as parallel restricted
02 - Add the wait_for_archive param to pg_stop_backup().These apply cleanly on 272adf4.
Committed 01.
Thanks!
Nobody's offered an opinion about 02 yet, so I'm not
going to commit that, but one minor nitpick:+ WAL to be archived. This behavior is only useful for backup + software which independently monitors WAL archiving, otherwise WAL + required to make the backup consistent might be missing and make the backupI think this should really say "...which independently monitors WAL
archiving. Otherwise, WAL..."
The attached patch udpates the docs per your suggestion and has been
rebased on master at d69fae2.
Thanks,
--
-David
david@pgmasters.net
Attachments:
pgstopbackup-wait-v3.patchtext/plain; charset=UTF-8; name=pgstopbackup-wait-v3.patch; x-mac-creator=0; x-mac-type=0Download+20-7
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of David Steele
The attached patch udpates the docs per your suggestion and has been rebased
on master at d69fae2.
I made this ready for committer. The patch applied except for catversion.h, the patch content looks good, and the target test passed as follows:
I set archive_command to 'sleep 10'. pg_stop_backup() with archive wait took about 10 seconds, emitting NOTICE messages.
postgres=# select pg_stop_backup(false, true);
NOTICE: pg_stop_backup cleanup done, waiting for required WAL segments to be archived
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
pg_stop_backup
---------------------------------------------------------------------------
(0/B0000F8,"START WAL LOCATION: 0/B000028 (file 00000001000000000000000B)+
CHECKPOINT LOCATION: 0/B000060 +
BACKUP METHOD: streamed +
BACKUP FROM: master +
START TIME: 2017-03-17 13:26:47 JST +
LABEL: a +
","")
(1 row)
pg_stop_backup() without archive wait returned immediately without displaying any NOTICE messages.
postgres=# select pg_stop_backup(false, false);
pg_stop_backup
---------------------------------------------------------------------------
(0/D000130,"START WAL LOCATION: 0/D000028 (file 00000001000000000000000D)+
CHECKPOINT LOCATION: 0/D000060 +
BACKUP METHOD: streamed +
BACKUP FROM: master +
START TIME: 2017-03-17 13:29:46 JST +
LABEL: a +
","")
(1 row)
BTW, does the developer of each feature have to modify the catalog version in catversion.h? It's a bit annoying to see the patch application failure on catversion.h. Isn't it enough to modify the catalog version only when alpha/beta/RC/final versions are released?
Regards
Takayuki Tsunakawa
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Mar 17, 2017 at 1:47 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
BTW, does the developer of each feature have to modify the catalog version in catversion.h? It's a bit annoying to see the patch application failure on catversion.h.
Committers take care of this part.
Isn't it enough to modify the catalog version only when alpha/beta/RC/final versions are released?
That's useful at least for developers to bump it even during a
development cycle as you can notice with a hard failure at startup if
a data folder you have created is compatible with the new binaries or
not.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
On Fri, Mar 17, 2017 at 1:47 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:BTW, does the developer of each feature have to modify the catalog version
in catversion.h? It's a bit annoying to see the patch application failure
on catversion.h.Committers take care of this part.
I understood the committer modifies the catalog version based on the patch content, so the patch submitter doesn't have to modify it. I'm relieved.
Isn't it enough to modify the catalog version only when
alpha/beta/RC/final versions are released?
That's useful at least for developers to bump it even during a development
cycle as you can notice with a hard failure at startup if a data folder
you have created is compatible with the new binaries or not.
Oh, you're absolutely right.
Regards
Takayuki Tsunakawa
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tsunakawa,
Takayuki
I made this ready for committer. The patch applied except for catversion.h,
the patch content looks good, and the target test passed as follows:
Sorry, I reverted this to waiting for author, because make check failed. src/test/regress/expected/opr_sanity.out seems to need revision because pg_stop_backup() was added.
Regards
Takayuki Tsunakawa
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/17/17 4:18 AM, Tsunakawa, Takayuki wrote:
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tsunakawa,
Takayuki
I made this ready for committer. The patch applied except for catversion.h,
the patch content looks good, and the target test passed as follows:Sorry, I reverted this to waiting for author, because make check failed. src/test/regress/expected/opr_sanity.out seems to need revision because pg_stop_backup() was added.
Well, that's embarrassing. When I recreated the function to add
defaults I messed up the AS clause and did not pay attention to the
results of the regression tests, apparently.
Attached is a new version rebased on 88e66d1. Catalog version bump has
also been omitted.
Thanks,
--
-David
david@pgmasters.net