Remove Deprecated Exclusive Backup Mode
Hackers,
I propose we remove the deprecated exclusive backup mode of
pg_start_backup()/pg_stop_backup() for Postgres 12.
The exclusive backup mode has a serious known issue. If Postgres
terminates ungracefully during a backup (due to hardware, kernel,
Postgres issues, etc.) then Postgres may refuse to restart.
The reason is that the backup_label file will usually reference a
checkpoint LSN that is older than the WAL available in pg_wal. Postgres
does emit a helpful error message while PANIC'ing but that's cold
comfort to an admin who must manually intervene to get their cluster
operational again.
The deprecated exclusive mode promises to make a difficult problem
simple but doesn't live up to that promise. That's why it was replaced
externally in 9.6 and why pg_basebackup has not used exclusive backups
since it was introduced in 9.2.
Non-exclusive backups have been available since 9.6 and several
third-party solutions support this mode, in addition to pg_basebackup.
The recently introduced recovery changes will break current automated
solutions so this seems like a good opportunity to make improvements on
the backup side as well.
I'll submit a patch for the 2019-01 commitfest.
Regards,
--
-David
david@pgmasters.net
On Mon, Nov 26, 2018 at 10:13:34PM -0500, David Steele wrote:
Non-exclusive backups have been available since 9.6 and several third-party
solutions support this mode, in addition to pg_basebackup.
I think that two releases is not actually that much time to just nuke
the exclusive backup interface. I would be fine if the docs show the
deprecation more aggressively using a warning section, and we could add
an explicit WARNING message about the matter directly when calling
pg_start_backup and pg_stop_backup.
My 2c.
--
Michael
Hi,
On 2018-11-27 12:20:13 +0900, Michael Paquier wrote:
On Mon, Nov 26, 2018 at 10:13:34PM -0500, David Steele wrote:
Non-exclusive backups have been available since 9.6 and several third-party
solutions support this mode, in addition to pg_basebackup.I think that two releases is not actually that much time to just nuke
the exclusive backup interface. I would be fine if the docs show the
deprecation more aggressively using a warning section, and we could add
an explicit WARNING message about the matter directly when calling
pg_start_backup and pg_stop_backup.
That was my gut reaction as well, but I think David's argument about
existing scripts / workflows being broken due the the recovery.conf
is a good reason to be more aggressive here.
Greetings,
Andres Freund
On Mon, Nov 26, 2018 at 10:13 PM David Steele <david@pgmasters.net> wrote:
I propose we remove the deprecated exclusive backup mode of
pg_start_backup()/pg_stop_backup() for Postgres 12.
-1. I don't have a problem with deprecating exclusive backup mode
eventually, but I don't see any good reason to do it this soon.
It's not like the problems with exclusive backup are so serious that
you can't work around them. If you know which machine is your master,
you can arrange to remove backup_label on reboot (only) on the master
(only). Sure, a lot of people probably do this wrong, but it's
infeasible to disallow all the things that people might use
incorrectly without making the system useless.
There must be hundreds or thousands of backup scripts written by
individual users that still use exclusive mode floating around out
there. Forcing all of those to be updated or scrapped will annoy
users to no benefit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Nov 27, 2018 at 4:46 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-11-27 12:20:13 +0900, Michael Paquier wrote:
On Mon, Nov 26, 2018 at 10:13:34PM -0500, David Steele wrote:
Non-exclusive backups have been available since 9.6 and several
third-party
solutions support this mode, in addition to pg_basebackup.
I think that two releases is not actually that much time to just nuke
the exclusive backup interface. I would be fine if the docs show the
deprecation more aggressively using a warning section, and we could add
an explicit WARNING message about the matter directly when calling
pg_start_backup and pg_stop_backup.That was my gut reaction as well, but I think David's argument about
existing scripts / workflows being broken due the the recovery.conf
is a good reason to be more aggressive here.
Yeah, I'm in the same boat here -- it feels like it's a bit too short, but
since we're breaking things for people *anyway*, it's probably better to
break both at once than to have all those people have their things broken
multiple times.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 11/26/18 11:04 PM, Robert Haas wrote:
On Mon, Nov 26, 2018 at 10:13 PM David Steele <david@pgmasters.net> wrote:
I propose we remove the deprecated exclusive backup mode of
pg_start_backup()/pg_stop_backup() for Postgres 12.-1. I don't have a problem with deprecating exclusive backup mode
eventually, but I don't see any good reason to do it this soon.It's not like the problems with exclusive backup are so serious that
you can't work around them. If you know which machine is your master,
you can arrange to remove backup_label on reboot (only) on the master
(only).
I do not recommend that users write automated scripts to delete files
out of $PGDATA. Accidents happen that way.
If this is our recommendation on how the mitigate the problem with
exclusive backup then I think it proves the point that it is broken and
should be removed.
Sure, a lot of people probably do this wrong, but it's
infeasible to disallow all the things that people might use
incorrectly without making the system useless.
But we have already fixed this problem -- it only remains to remove the
old method.
There must be hundreds or thousands of backup scripts written by
individual users that still use exclusive mode floating around out
there. Forcing all of those to be updated or scrapped will annoy
users to no benefit.
But we are OK with forcing them to scrap their recovery scripts?
And there is a benefit -- Postgres will be able to restart if it crashes
during a backup.
Regards,
--
-David
david@pgmasters.net
On 27/11/2018 04:46, Andres Freund wrote:
That was my gut reaction as well, but I think David's argument about
existing scripts / workflows being broken due the the recovery.conf
is a good reason to be more aggressive here.
But backup scripts are not affected by the recovery.conf changes.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 27, 2018 at 02:21:49PM +0100, Peter Eisentraut wrote:
On 27/11/2018 04:46, Andres Freund wrote:
That was my gut reaction as well, but I think David's argument about
existing scripts / workflows being broken due the the recovery.conf
is a good reason to be more aggressive here.But backup scripts are not affected by the recovery.conf changes.
In any of my own backup scripts (yeah!), I don't have any dependency to
that either. Or perhaps pgBackRest has a dependency in this area?
--
Michael
On Tue, 27 Nov 2018 at 03:13, David Steele <david@pgmasters.net> wrote:
The deprecated exclusive mode promises to make a difficult problem
simple but doesn't live up to that promise. That's why it was replaced
externally in 9.6 and why pg_basebackup has not used exclusive backups
since it was introduced in 9.2.Non-exclusive backups have been available since 9.6 and several
third-party solutions support this mode, in addition to pg_basebackup.The recently introduced recovery changes will break current automated
solutions so this seems like a good opportunity to make improvements on
the backup side as well.I'll submit a patch for the 2019-01 commitfest.
-1 for removal, in this release
It's not there because anyone likes it, it's there because removing it is a
risk
Recent changes are the reason to keep it, not a reason to remove.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings,
* Michael Paquier (michael@paquier.xyz) wrote:
On Tue, Nov 27, 2018 at 02:21:49PM +0100, Peter Eisentraut wrote:
On 27/11/2018 04:46, Andres Freund wrote:
That was my gut reaction as well, but I think David's argument about
existing scripts / workflows being broken due the the recovery.conf
is a good reason to be more aggressive here.But backup scripts are not affected by the recovery.conf changes.
In any of my own backup scripts (yeah!), I don't have any dependency to
that either. Or perhaps pgBackRest has a dependency in this area?
If you don't consider your recovery scripts and your backup scripts to
be related then I've really got to wonder how you're regularly testing
your backups to make sure that they're actually valid.
If you aren't regularly testing your backups then I've got little
sympathy.
To be clear, pgbackrest doesn't have any dependency here- but it, like
all of the other 3rd party backup solutions and any restore solution
that a user has come up with, are going to have to be changed to deal
with the changes in how recovery works, so this is a good time to make
these changes.
Thanks!
Stephen
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
There must be hundreds or thousands of backup scripts written by
individual users that still use exclusive mode floating around out
there. Forcing all of those to be updated or scrapped will annoy
users to no benefit.
How about automated recovery scripts?
We've decided it's fine to break all of those which exist out there.
I'm concerned, seriously, that people don't have anywhere near the
concern about the recovery side of things as they do about the backup
side of things and that's really concerning.
Thanks!
Stephen
On 27/11/2018 15:45, Stephen Frost wrote:
But backup scripts are not affected by the recovery.conf changes.
In any of my own backup scripts (yeah!), I don't have any dependency to
that either. Or perhaps pgBackRest has a dependency in this area?If you don't consider your recovery scripts and your backup scripts to
be related then I've really got to wonder how you're regularly testing
your backups to make sure that they're actually valid.
The sort of installations that continue to use the exclusive backup mode
probably have the following tooling: a 20-line shell script to make the
backup and either a 10-line shell script or a similarly sized README or
wiki page to do the recovery. Changing the latter for the recovery.conf
changes is probably a 3-line change. Changing the former for the
removal of exclusive backups would require major changes. (Try writing
a shell script that keeps a psql session open while it takes the backup
from the file system. It's possible, but it requires significantly more
logic.)
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
On 27/11/2018 15:45, Stephen Frost wrote:
But backup scripts are not affected by the recovery.conf changes.
In any of my own backup scripts (yeah!), I don't have any dependency to
that either. Or perhaps pgBackRest has a dependency in this area?If you don't consider your recovery scripts and your backup scripts to
be related then I've really got to wonder how you're regularly testing
your backups to make sure that they're actually valid.The sort of installations that continue to use the exclusive backup mode
probably have the following tooling: a 20-line shell script to make the
backup and either a 10-line shell script or a similarly sized README or
wiki page to do the recovery. Changing the latter for the recovery.conf
changes is probably a 3-line change. Changing the former for the
removal of exclusive backups would require major changes. (Try writing
a shell script that keeps a psql session open while it takes the backup
from the file system. It's possible, but it requires significantly more
logic.)
They're also the sort of installations which don't have reliable backups
and don't have any clue about the danger they are in due to the current
bug/issue/whatever we have with exclusive backups.
No, I don't agree that it's sensible to continue to march on as if
nothing is wrong.
Thanks!
Stephen
On 11/27/18 3:46 PM, Stephen Frost wrote:
I'm concerned, seriously, that people don't have anywhere near the
concern about the recovery side of things as they do about the backup
side of things and that's really concerning.
I agree with your larger point, but in this case the two breakages do
not seem equal. As far as I gather the removal of recovery.conf will in
practice result in a longer downtime when your recovery scripts breaks
but not any data loss. While if we remove the exclusive backup mode then
some people's backup scripts will break and if they do not properly
monitor their backups they risk data loss.
I think we should use more caution when data loss is at stake rather
than "just" downtime. So personally I am in favor of updating the manual
with warnings (right now it does not even say if exclusive or
non-exclusive is the default) and adding a deprecation warning when
people use the exclusive mode.
Best regards,
Andreas
Greetings,
* Andreas Karlsson (andreas@proxel.se) wrote:
On 11/27/18 3:46 PM, Stephen Frost wrote:
I'm concerned, seriously, that people don't have anywhere near the
concern about the recovery side of things as they do about the backup
side of things and that's really concerning.I agree with your larger point, but in this case the two breakages do not
seem equal. As far as I gather the removal of recovery.conf will in practice
result in a longer downtime when your recovery scripts breaks but not any
data loss. While if we remove the exclusive backup mode then some people's
backup scripts will break and if they do not properly monitor their backups
they risk data loss.
Let's please try to remember that this is across a major version upgrade
and is removing a broken capability that's already been deprecated for a
couple years.
If they aren't monitoring their backup scripts today, and aren't
regularly performing restores of those backups, they're already risking
data loss.
I think we should use more caution when data loss is at stake rather than
"just" downtime. So personally I am in favor of updating the manual with
warnings (right now it does not even say if exclusive or non-exclusive is
the default) and adding a deprecation warning when people use the exclusive
mode.
Waiting isn't going to change any of these factors, nor will throwing
warnings about the exclusive mode if people aren't monitoring their
scripts.
If we really want to keep the exclusive backup mode around, then, as
Magnus said on a nearby thread, it needs to be fixed. If it's fixed and
just works and everyone's scripts work, then great, then we can
un-deprecate it and move on.
If we aren't going to fix it then let's remove it.
Thanks!
Stephen
On 11/27/18 4:11 PM, Stephen Frost wrote:
I agree with your larger point, but in this case the two breakages do not
seem equal. As far as I gather the removal of recovery.conf will in practice
result in a longer downtime when your recovery scripts breaks but not any
data loss. While if we remove the exclusive backup mode then some people's
backup scripts will break and if they do not properly monitor their backups
they risk data loss.Let's please try to remember that this is across a major version upgrade
and is removing a broken capability that's already been deprecated for a
couple years.
I know that and you know that, but even our own manual use the exclusive
mode in some of the examples, e.g: "pg_start_backup('label_goes_here')"[1].
Your point about that people who do not monitor their backups wont
notice warnings is fair, but even so I think we should give people more
time to migrate when even our own documentation isn't always clear about
the exclusive mode being deprecated.
1.
https://www.postgresql.org/docs/11/functions-admin.html#FUNCTIONS-ADMIN-BACKUP
Andreas
On 27/11/2018 16:02, Stephen Frost wrote:
They're also the sort of installations which don't have reliable backups
and don't have any clue about the danger they are in due to the current
bug/issue/whatever we have with exclusive backups.No, I don't agree that it's sensible to continue to march on as if
nothing is wrong.
It's fair to argue that this facility is broken and needs to be removed.
But that needs to be its own argument.
The argument in this subthread is that since we're already making
changes in an adjacent area, removing this feature will have a low impact.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/27/18 9:54 AM, Peter Eisentraut wrote:
On 27/11/2018 15:45, Stephen Frost wrote:
But backup scripts are not affected by the recovery.conf changes.
In any of my own backup scripts (yeah!), I don't have any dependency to
that either. Or perhaps pgBackRest has a dependency in this area?If you don't consider your recovery scripts and your backup scripts to
be related then I've really got to wonder how you're regularly testing
your backups to make sure that they're actually valid.The sort of installations that continue to use the exclusive backup mode
probably have the following tooling: a 20-line shell script to make the
backup and either a 10-line shell script or a similarly sized README or
wiki page to do the recovery. Changing the latter for the recovery.conf
changes is probably a 3-line change.
Really? We have gone from a file that can be safely overwritten
(recovery.conf) to a file that might need to be parsed to figure out if
the settings already exists (postgresql.auto.conf). Of course, you can
just continue to append to the file but that seems pretty grotty to me.
This will also require changes to all HA solutions or backup solutions
that deal with recovery. I think you underestimate how big a change
this is. I've been thinking about how to write code for it and it is
significantly more complex -- also more flexible so I'm happy about that.
Changing the former for the
removal of exclusive backups would require major changes. (Try writing
a shell script that keeps a psql session open while it takes the backup
from the file system. It's possible, but it requires significantly more
logic.)
We provide pg_basebackup with core and it is a solid tool for doing
backups. Do we really want to encourage the use of bash scripts to do
what we already have a tool for? If users want to do something more
complex than pg_basebackup then bash is certainly not the tool for that.
Regards,
--
-David
david@pgmasters.net
On 11/27/18 10:29 AM, Peter Eisentraut wrote:
On 27/11/2018 16:02, Stephen Frost wrote:
They're also the sort of installations which don't have reliable backups
and don't have any clue about the danger they are in due to the current
bug/issue/whatever we have with exclusive backups.No, I don't agree that it's sensible to continue to march on as if
nothing is wrong.It's fair to argue that this facility is broken and needs to be removed.
But that needs to be its own argument.
I believe I made this argument in the OP.
The argument in this subthread is that since we're already making
changes in an adjacent area, removing this feature will have a low impact.
Fair enough, but I think the argument against exclusive backups stands
on its own. Also, I don't see backup and restore as adjacent. I see
them as thoroughly intertwined.
--
-David
david@pgmasters.net
On 11/27/18 8:56 AM, Simon Riggs wrote:
On Tue, 27 Nov 2018 at 03:13, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:
The deprecated exclusive mode promises to make a difficult problem
simple but doesn't live up to that promise. That's why it was replaced
externally in 9.6 and why pg_basebackup has not used exclusive backups
since it was introduced in 9.2.Non-exclusive backups have been available since 9.6 and several
third-party solutions support this mode, in addition to pg_basebackup.The recently introduced recovery changes will break current automated
solutions so this seems like a good opportunity to make improvements on
the backup side as well.I'll submit a patch for the 2019-01 commitfest.
-1 for removal, in this release
It's not there because anyone likes it, it's there because removing it
is a riskRecent changes are the reason to keep it, not a reason to remove.
How so?
--
-David
david@pgmasters.net
Greetings,
* David Steele (david@pgmasters.net) wrote:
On 11/27/18 10:29 AM, Peter Eisentraut wrote:
On 27/11/2018 16:02, Stephen Frost wrote:
They're also the sort of installations which don't have reliable backups
and don't have any clue about the danger they are in due to the current
bug/issue/whatever we have with exclusive backups.No, I don't agree that it's sensible to continue to march on as if
nothing is wrong.It's fair to argue that this facility is broken and needs to be removed.
But that needs to be its own argument.
We made that argument, and had agreement on it, and that's why it's
deprecated today. The only question here is how long we keep this
deprecated and broken capability around and given that we're completely
breaking everything around recovery, now is as good a time as any since
users should be looking at how they do backup and restore when they
migrate to v12 due to these changes.
The argument in this subthread is that since we're already making
changes in an adjacent area, removing this feature will have a low impact.Fair enough, but I think the argument against exclusive backups stands
on its own. Also, I don't see backup and restore as adjacent. I see
them as thoroughly intertwined.
Agreed.
Thanks!
Stephen
On Tue, 27 Nov 2018 at 14:45, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Michael Paquier (michael@paquier.xyz) wrote:
On Tue, Nov 27, 2018 at 02:21:49PM +0100, Peter Eisentraut wrote:
On 27/11/2018 04:46, Andres Freund wrote:
That was my gut reaction as well, but I think David's argument about
existing scripts / workflows being broken due the the recovery.conf
is a good reason to be more aggressive here.But backup scripts are not affected by the recovery.conf changes.
In any of my own backup scripts (yeah!), I don't have any dependency to
that either. Or perhaps pgBackRest has a dependency in this area?If you don't consider your recovery scripts and your backup scripts to
be related then I've really got to wonder how you're regularly testing
your backups to make sure that they're actually valid.If you aren't regularly testing your backups then I've got little
sympathy.To be clear, pgbackrest doesn't have any dependency here- but it, like
all of the other 3rd party backup solutions and any restore solution
that a user has come up with, are going to have to be changed to deal
with the changes in how recovery works, so this is a good time to make
these changes.
If those tools are updated and the changes all work, that will be good.
That isn't the same thing as an argument to remove things in this release.
I propose waiting until next release.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings,
* Andreas Karlsson (andreas@proxel.se) wrote:
On 11/27/18 4:11 PM, Stephen Frost wrote:
I agree with your larger point, but in this case the two breakages do not
seem equal. As far as I gather the removal of recovery.conf will in practice
result in a longer downtime when your recovery scripts breaks but not any
data loss. While if we remove the exclusive backup mode then some people's
backup scripts will break and if they do not properly monitor their backups
they risk data loss.Let's please try to remember that this is across a major version upgrade
and is removing a broken capability that's already been deprecated for a
couple years.I know that and you know that, but even our own manual use the exclusive
mode in some of the examples, e.g: "pg_start_backup('label_goes_here')"[1].
Thankfully, that'll get nicely cleaned up by removing the exclusive
backup mode.
Your point about that people who do not monitor their backups wont notice
warnings is fair, but even so I think we should give people more time to
migrate when even our own documentation isn't always clear about the
exclusive mode being deprecated.
I don't buy off on this argument- we also have things like this:
https://www.postgresql.org/docs/11/backup-file.html
Where we show a simple tar command as a way to backup the database, but
then we caveat it. Similairly, we make it clear that users shouldn't be
using the exclusive mode backups:
https://www.postgresql.org/docs/11/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
If users are only reading the system functions description and thinking
that's enough to figure out how to do backups, and didn't read the
chapters of documentation we have about how to perform a backup, then
chances are very good their existing backup systems are broken, and
that's all the more reason to remove the exclusive mode because at least
then it won't look like things are working when they aren't.
Thanks!
Stephen
Hi,
On 2018-11-26 23:04:35 -0500, Robert Haas wrote:
It's not like the problems with exclusive backup are so serious that
you can't work around them. If you know which machine is your master,
you can arrange to remove backup_label on reboot (only) on the master
(only). Sure, a lot of people probably do this wrong, but it's
infeasible to disallow all the things that people might use
incorrectly without making the system useless.
That doesn't protect against postgres, not machine, [crash-]restarts
though.
Greetings,
Andres Freund
On Tue, Nov 27, 2018 at 09:45:04AM -0500, Stephen Frost wrote:
If you don't consider your recovery scripts and your backup scripts to
be related then I've really got to wonder how you're regularly testing
your backups to make sure that they're actually valid.
Base backups can be perfectly self-contained as long as they include all
the WAL segments needed to recover up to the end-of-backup record.
That's what pg_basebackup does with its default options
(--wal-method=stream in particular).
If you aren't regularly testing your backups then I've got little
sympathy.
Fortunately they do, hundreds of time on a daily basis ;)
To be clear, pgbackrest doesn't have any dependency here- but it, like
all of the other 3rd party backup solutions and any restore solution
that a user has come up with, are going to have to be changed to deal
with the changes in how recovery works, so this is a good time to make
these changes.
My point is that base backups do not have a mandatory dependency with
recovery.conf all the time as they can perfectly be restored if they are
standalone backups. I can see a dependency with recovery.conf once you
have a base backup which needs to be fed with WAL segments from an
external archive, or when using a base backup to create a standby.
--
Michael
On 11/27/18 4:25 PM, Michael Paquier wrote:
On Tue, Nov 27, 2018 at 09:45:04AM -0500, Stephen Frost wrote:
If you don't consider your recovery scripts and your backup scripts to
be related then I've really got to wonder how you're regularly testing
your backups to make sure that they're actually valid.Base backups can be perfectly self-contained as long as they include all
the WAL segments needed to recover up to the end-of-backup record.
That's what pg_basebackup does with its default options
(--wal-method=stream in particular).
This is true, of course, but it misses one of the major benefits of
file-level backups which is PITR.
If you aren't regularly testing your backups then I've got little
sympathy.Fortunately they do, hundreds of time on a daily basis ;)
Nice!
To be clear, pgbackrest doesn't have any dependency here- but it, like
all of the other 3rd party backup solutions and any restore solution
that a user has come up with, are going to have to be changed to deal
with the changes in how recovery works, so this is a good time to make
these changes.My point is that base backups do not have a mandatory dependency with
recovery.conf all the time as they can perfectly be restored if they are
standalone backups. I can see a dependency with recovery.conf once you
have a base backup which needs to be fed with WAL segments from an
external archive, or when using a base backup to create a standby.
If you want to do PITR -- which is the default in most situations --
then some interaction with recovery.conf is needed. I think you would
be hard-pressed to find a prominent HA or backup solution that doesn't
do so.
--
-David
david@pgmasters.net
Greetings,
* Michael Paquier (michael@paquier.xyz) wrote:
On Tue, Nov 27, 2018 at 09:45:04AM -0500, Stephen Frost wrote:
If you don't consider your recovery scripts and your backup scripts to
be related then I've really got to wonder how you're regularly testing
your backups to make sure that they're actually valid.Base backups can be perfectly self-contained as long as they include all
the WAL segments needed to recover up to the end-of-backup record.
That's what pg_basebackup does with its default options
(--wal-method=stream in particular).
This really doesn't change my opinion at all. Sure, some custom scripts
might have been written to operate in this way and now they'll need to
be adjusted to work the exact same way pg_basebackup works today and use
the non-exclusive mode, but they'll be better off for it and won't have
the concern about what happens if the system is inadvertantly restarted
during a backup.
I'd also say it's likely that places which know enough to build such a
solution aren't ones that we really need to worry about having an issue
adjusting their scripts.
Thanks!
Stephen
On 11/26/18 10:13 PM, David Steele wrote:
Hackers,
I propose we remove the deprecated exclusive backup mode of
pg_start_backup()/pg_stop_backup() for Postgres 12.The exclusive backup mode has a serious known issue. If Postgres
terminates ungracefully during a backup (due to hardware, kernel,
Postgres issues, etc.) then Postgres may refuse to restart.The reason is that the backup_label file will usually reference a
checkpoint LSN that is older than the WAL available in pg_wal. Postgres
does emit a helpful error message while PANIC'ing but that's cold
comfort to an admin who must manually intervene to get their cluster
operational again.The deprecated exclusive mode promises to make a difficult problem
simple but doesn't live up to that promise. That's why it was replaced
externally in 9.6 and why pg_basebackup has not used exclusive backups
since it was introduced in 9.2.Non-exclusive backups have been available since 9.6 and several
third-party solutions support this mode, in addition to pg_basebackup.The recently introduced recovery changes will break current automated
solutions so this seems like a good opportunity to make improvements on
the backup side as well.I'll submit a patch for the 2019-01 commitfest.
Attached is the patch.
I was a bit surprised by how much code went away. There was a lot of
code dedicated to making sure that backup_label got renamed on shutdown,
that there was not an exclusive backup running, etc.
There were no tests for exclusive backup so the test changes were minimal.
I did have to replace one "hot backup" in
010_logical_decoding_timelines.pl. I'm not sure why the backup was
being done this way, or why the standby needs a copy of pg_replslot
(which is not copied by pg_basebackup). It might be worth looking at
but it seemed out of scope for this patch.
Regards,
--
-David
david@pgmasters.net
Attachments:
remove-exclusive-backup.patchtext/plain; charset=UTF-8; name=remove-exclusive-backup.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..6f45f2e177 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -819,15 +819,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
sequence, and that the success of a step is verified before
proceeding to the next step.
</para>
- <para>
- Low level base backups can be made in a non-exclusive or an exclusive
- way. The non-exclusive method is recommended and the exclusive one is
- deprecated and will eventually be removed.
- </para>
- <sect3 id="backup-lowlevel-base-backup-nonexclusive">
- <title>Making a non-exclusive low level backup</title>
<para>
- A non-exclusive low level backup is one that allows other
+ A low level backup allows other
concurrent backups to be running (both those started using
the same backup API and those started using
<xref linkend="app-pgbasebackup"/>).
@@ -845,7 +838,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
rights to run pg_start_backup (superuser, or a user who has been granted
EXECUTE on the function) and issue the command:
<programlisting>
-SELECT pg_start_backup('label', false, false);
+SELECT pg_start_backup('label', false);
</programlisting>
where <literal>label</literal> is any string you want to use to uniquely
identify this backup operation. The connection
@@ -866,10 +859,6 @@ SELECT pg_start_backup('label', false, false);
issue an immediate checkpoint using as much I/O as available.
</para>
- <para>
- The third parameter being <literal>false</literal> tells
- <function>pg_start_backup</function> to initiate a non-exclusive base backup.
- </para>
</listitem>
<listitem>
<para>
@@ -887,7 +876,7 @@ SELECT pg_start_backup('label', false, false);
<para>
In the same connection as before, issue the command:
<programlisting>
-SELECT * FROM pg_stop_backup(false, true);
+SELECT * FROM pg_stop_backup(true);
</programlisting>
This terminates backup mode. On a primary, it also performs an automatic
switch to the next WAL segment. On a standby, it is not possible to
@@ -945,116 +934,6 @@ SELECT * FROM pg_stop_backup(false, true);
</listitem>
</orderedlist>
</para>
- </sect3>
- <sect3 id="backup-lowlevel-base-backup-exclusive">
- <title>Making an exclusive low level backup</title>
- <para>
- The process for an exclusive backup is mostly the same as for a
- non-exclusive one, but it differs in a few key steps. This type of backup
- can only be taken on a primary and does not allow concurrent backups.
- Prior to <productname>PostgreSQL</productname> 9.6, this
- was the only low-level method available, but it is now recommended that
- all users upgrade their scripts to use non-exclusive backups if possible.
- </para>
- <para>
- <orderedlist>
- <listitem>
- <para>
- Ensure that WAL archiving is enabled and working.
- </para>
- </listitem>
- <listitem>
- <para>
- Connect to the server (it does not matter which database) as a user with
- rights to run pg_start_backup (superuser, or a user who has been granted
- EXECUTE on the function) and issue the command:
-<programlisting>
-SELECT pg_start_backup('label');
-</programlisting>
- where <literal>label</literal> is any string you want to use to uniquely
- identify this backup operation.
- <function>pg_start_backup</function> creates a <firstterm>backup label</firstterm> file,
- called <filename>backup_label</filename>, in the cluster directory with
- information about your backup, including the start time and label string.
- The function also creates a <firstterm>tablespace map</firstterm> file,
- called <filename>tablespace_map</filename>, in the cluster directory with
- information about tablespace symbolic links in <filename>pg_tblspc/</filename> if
- one or more such link is present. Both files are critical to the
- integrity of the backup, should you need to restore from it.
- </para>
-
- <para>
- By default, <function>pg_start_backup</function> can take a long time to finish.
- This is because it performs a checkpoint, and the I/O
- required for the checkpoint will be spread out over a significant
- period of time, by default half your inter-checkpoint interval
- (see the configuration parameter
- <xref linkend="guc-checkpoint-completion-target"/>). This is
- usually what you want, because it minimizes the impact on query
- processing. If you want to start the backup as soon as
- possible, use:
-<programlisting>
-SELECT pg_start_backup('label', true);
-</programlisting>
- This forces the checkpoint to be done as quickly as possible.
- </para>
- </listitem>
- <listitem>
- <para>
- Perform the backup, using any convenient file-system-backup tool
- such as <application>tar</application> or <application>cpio</application> (not
- <application>pg_dump</application> or
- <application>pg_dumpall</application>). It is neither
- necessary nor desirable to stop normal operation of the database
- while you do this. See
- <xref linkend="backup-lowlevel-base-backup-data"/> for things to
- consider during this backup.
- </para>
- <para>
- Note that if the server crashes during the backup it may not be
- possible to restart until the <literal>backup_label</literal> file has been
- manually deleted from the <envar>PGDATA</envar> directory.
- </para>
- </listitem>
- <listitem>
- <para>
- Again connect to the database as a user with rights to run
- pg_stop_backup (superuser, or a user who has been granted EXECUTE on
- the function), and issue the command:
-<programlisting>
-SELECT pg_stop_backup();
-</programlisting>
- This function terminates backup mode and
- performs an automatic switch to the next WAL segment. The reason for the
- switch is to arrange for the last WAL segment written during the backup
- interval to be ready to archive.
- </para>
- </listitem>
- <listitem>
- <para>
- Once the WAL segment files active during the backup are archived, you are
- done. The file identified by <function>pg_stop_backup</function>'s result is
- the last segment that is required to form a complete set of backup files.
- If <varname>archive_mode</varname> is enabled,
- <function>pg_stop_backup</function> does not return until the last segment has
- been archived.
- Archiving of these files happens automatically since you have
- already configured <varname>archive_command</varname>. In most cases this
- happens quickly, but you are advised to monitor your archive
- system to ensure there are no delays.
- If the archive process has fallen behind
- because of failures of the archive command, it will keep retrying
- until the archive succeeds and the backup is complete.
- If you wish to place a time limit on the execution of
- <function>pg_stop_backup</function>, set an appropriate
- <varname>statement_timeout</varname> value, but make note that if
- <function>pg_stop_backup</function> terminates because of this your backup
- may not be valid.
- </para>
- </listitem>
- </orderedlist>
- </para>
- </sect3>
<sect3 id="backup-lowlevel-base-backup-data">
<title>Backing up the data directory</title>
<para>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b3336ea9be..ae7722bb12 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18945,38 +18945,17 @@ SELECT set_config('log_statement_stats', 'off', false);
</row>
<row>
<entry>
- <literal><function>pg_start_backup(<parameter>label</parameter> <type>text</type> <optional>, <parameter>fast</parameter> <type>boolean</type> <optional>, <parameter>exclusive</parameter> <type>boolean</type> </optional></optional>)</function></literal>
+ <literal><function>pg_start_backup(<parameter>label</parameter> <type>text</type> <optional>, <parameter>fast</parameter> <type>boolean</type> </optional>)</function></literal>
</entry>
<entry><type>pg_lsn</type></entry>
<entry>Prepare for performing on-line backup (restricted to superusers by default, but other users can be granted EXECUTE to run the function)</entry>
</row>
<row>
<entry>
- <literal><function>pg_stop_backup()</function></literal>
- </entry>
- <entry><type>pg_lsn</type></entry>
- <entry>Finish performing exclusive on-line backup (restricted to superusers by default, but other users can be granted EXECUTE to run the function)</entry>
- </row>
- <row>
- <entry>
- <literal><function>pg_stop_backup(<parameter>exclusive</parameter> <type>boolean</type> <optional>, <parameter>wait_for_archive</parameter> <type>boolean</type> </optional>)</function></literal>
+ <literal><function>pg_stop_backup(<optional> <parameter>wait_for_archive</parameter> <type>boolean</type> </optional>)</function></literal>
</entry>
<entry><type>setof record</type></entry>
- <entry>Finish performing exclusive or non-exclusive on-line backup (restricted to superusers by default, but other users can be granted EXECUTE to run the function)</entry>
- </row>
- <row>
- <entry>
- <literal><function>pg_is_in_backup()</function></literal>
- </entry>
- <entry><type>bool</type></entry>
- <entry>True if an on-line exclusive backup is still in progress.</entry>
- </row>
- <row>
- <entry>
- <literal><function>pg_backup_start_time()</function></literal>
- </entry>
- <entry><type>timestamp with time zone</type></entry>
- <entry>Get start time of an on-line exclusive backup in progress.</entry>
+ <entry>Finish performing on-line backup (restricted to superusers by default, but other users can be granted EXECUTE to run the function)</entry>
</row>
<row>
<entry>
@@ -19013,16 +18992,9 @@ SELECT set_config('log_statement_stats', 'off', false);
<para>
<function>pg_start_backup</function> accepts an arbitrary user-defined label for
the backup. (Typically this would be the name under which the backup dump
- file will be stored.) When used in exclusive mode, the function writes a
- backup label file (<filename>backup_label</filename>) and, if there are any links
- in the <filename>pg_tblspc/</filename> directory, a tablespace map file
- (<filename>tablespace_map</filename>) into the database cluster's data directory,
- performs a checkpoint, and then returns the backup's starting write-ahead
- log location as text. The user can ignore this result value, but it is
- provided in case it is useful. When used in non-exclusive mode, the
- contents of these files are instead returned by the
- <function>pg_stop_backup</function> function, and should be written to the backup
- by the caller.
+ file will be stored.) It returns the backup's starting write-ahead
+ log location as a <type>pg_lsn</type> which can be ignored by the user, but is
+ provided in case it is useful.
<programlisting>
postgres=# select pg_start_backup('label_goes_here');
@@ -19038,12 +19010,10 @@ postgres=# select pg_start_backup('label_goes_here');
</para>
<para>
- In an exclusive backup, <function>pg_stop_backup</function> removes the label file
- and, if it exists, the <filename>tablespace_map</filename> file created by
- <function>pg_start_backup</function>. In a non-exclusive backup, the contents of
- the <filename>backup_label</filename> and <filename>tablespace_map</filename> are returned
- in the result of the function, and should be written to files in the
- backup (and not in the data directory). There is an optional second
+ The contents of the <filename>backup_label</filename> and
+ <filename>tablespace_map</filename> are returned in the result of
+ <function>pg_stop_backup</function>, and should be written to files in the
+ backup (and not in the data directory). There is an optional
parameter of type <type>boolean</type>. If false, the <function>pg_stop_backup</function>
will return immediately after the backup is completed without waiting for
WAL to be archived. This behavior is only useful for backup
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c80b14ed97..5807616289 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -493,29 +493,6 @@ typedef union WALInsertLockPadded
char pad[PG_CACHE_LINE_SIZE];
} WALInsertLockPadded;
-/*
- * State of an exclusive backup, necessary to control concurrent activities
- * across sessions when working on exclusive backups.
- *
- * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
- * running, to be more precise pg_start_backup() is not being executed for
- * an exclusive backup and there is no exclusive backup in progress.
- * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an
- * exclusive backup.
- * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished
- * running and an exclusive backup is in progress. pg_stop_backup() is
- * needed to finish it.
- * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an
- * exclusive backup.
- */
-typedef enum ExclusiveBackupState
-{
- EXCLUSIVE_BACKUP_NONE = 0,
- EXCLUSIVE_BACKUP_STARTING,
- EXCLUSIVE_BACKUP_IN_PROGRESS,
- EXCLUSIVE_BACKUP_STOPPING
-} ExclusiveBackupState;
-
/*
* Session status of running backup, used for sanity checks in SQL-callable
* functions to start and stop backups.
@@ -563,15 +540,12 @@ typedef struct XLogCtlInsert
bool fullPageWrites;
/*
- * exclusiveBackupState indicates the state of an exclusive backup (see
- * comments of ExclusiveBackupState for more details). nonExclusiveBackups
- * is a counter indicating the number of streaming base backups currently
- * in progress. forcePageWrites is set to true when either of these is
- * non-zero. lastBackupStart is the latest checkpoint redo location used
- * as a starting point for an online backup.
+ * runningBackups is a counter indicating the number of backups currently in
+ * progress. forcePageWrites is set to true when either of these is
+ * non-zero. lastBackupStart is the latest checkpoint redo location used as
+ * a starting point for an online backup.
*/
- ExclusiveBackupState exclusiveBackupState;
- int nonExclusiveBackups;
+ int runningBackups;
XLogRecPtr lastBackupStart;
/*
@@ -916,7 +890,6 @@ static void xlog_outrec(StringInfo buf, XLogReaderState *record);
#endif
static void xlog_outdesc(StringInfo buf, XLogReaderState *record);
static void pg_start_backup_callback(int code, Datum arg);
-static void pg_stop_backup_callback(int code, Datum arg);
static bool read_backup_label(XLogRecPtr *checkPointLoc,
bool *backupEndRequired, bool *backupFromStandby);
static bool read_tablespace_map(List **tablespaces);
@@ -9208,7 +9181,7 @@ CreateRestartPoint(int flags)
* Ensure minRecoveryPoint is past the checkpoint record. Normally,
* this will have happened already while writing out dirty buffers,
* but not necessarily - e.g. because no buffers were dirtied. We do
- * this because a non-exclusive base backup uses minRecoveryPoint to
+ * this because a backup performed in recovery uses minRecoveryPoint to
* determine which WAL files must be included in the backup, and the
* file (or files) containing the checkpoint record must be included,
* at a minimum. Note that for an ordinary restart of recovery there's
@@ -10165,36 +10138,25 @@ XLogFileNameP(TimeLineID tli, XLogSegNo segno)
/*
* do_pg_start_backup is the workhorse of the user-visible pg_start_backup()
* function. It creates the necessary starting checkpoint and constructs the
- * backup label file.
- *
- * There are two kind of backups: exclusive and non-exclusive. An exclusive
- * backup is started with pg_start_backup(), and there can be only one active
- * at a time. The backup and tablespace map files of an exclusive backup are
- * written to $PGDATA/backup_label and $PGDATA/tablespace_map, and they are
- * removed by pg_stop_backup().
- *
- * A non-exclusive backup is used for the streaming base backups (see
- * src/backend/replication/basebackup.c). The difference to exclusive backups
- * is that the backup label and tablespace map files are not written to disk.
- * Instead, their would-be contents are returned in *labelfile and *tblspcmapfile,
- * and the caller is responsible for including them in the backup archive as
- * 'backup_label' and 'tablespace_map'. There can be many non-exclusive backups
- * active at the same time, and they don't conflict with an exclusive backup
- * either.
+ * backup label and tablespace map.
+ *
+ * The backup label and tablespace map contents are returned in *labelfile and
+ * *tblspcmapfile, and the caller is responsible for including them in the
+ * backup archive as 'backup_label' and 'tablespace_map'. There can be many
+ * backups active at the same time.
*
* tblspcmapfile is required mainly for tar format in windows as native windows
* utilities are not able to create symlinks while extracting files from tar.
* However for consistency, the same is used for all platforms.
*
- * needtblspcmapfile is true for the cases (exclusive backup and for
- * non-exclusive backup only when tar format is used for taking backup)
- * when backup needs to generate tablespace_map file, it is used to
- * embed escape character before newline character in tablespace path.
+ * needtblspcmapfile is true for the cases (only when tar format is used for
+ * taking backup) when backup needs to generate tablespace_map file, it is used
+ * to embed escape character before newline character in tablespace path.
*
* Returns the minimum WAL location that must be present to restore from this
* backup, and the corresponding timeline ID in *starttli_p.
*
- * Every successfully started non-exclusive backup must be stopped by calling
+ * Every successfully started backup must be stopped by calling
* do_pg_stop_backup() or do_pg_abort_backup().
*
* It is the responsibility of the caller of this function to verify the
@@ -10206,7 +10168,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
StringInfo tblspcmapfile, bool infotbssize,
bool needtblspcmapfile)
{
- bool exclusive = (labelfile == NULL);
bool backup_started_in_recovery = false;
XLogRecPtr checkpointloc;
XLogRecPtr startpoint;
@@ -10215,20 +10176,9 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
char strfbuf[128];
char xlogfilename[MAXFNAMELEN];
XLogSegNo _logSegNo;
- struct stat stat_buf;
- FILE *fp;
backup_started_in_recovery = RecoveryInProgress();
- /*
- * Currently only non-exclusive backup can be taken during recovery.
- */
- if (backup_started_in_recovery && exclusive)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("recovery is in progress"),
- errhint("WAL control functions cannot be executed during recovery.")));
-
/*
* During recovery, we don't need to check WAL level. Because, if WAL
* level is not sufficient, it's impossible to get here during recovery.
@@ -10267,30 +10217,12 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
* XLogInsertRecord().
*/
WALInsertLockAcquireExclusive();
- if (exclusive)
- {
- /*
- * At first, mark that we're now starting an exclusive backup, to
- * ensure that there are no other sessions currently running
- * pg_start_backup() or pg_stop_backup().
- */
- if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_NONE)
- {
- WALInsertLockRelease();
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress"),
- errhint("Run pg_stop_backup() and try again.")));
- }
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STARTING;
- }
- else
- XLogCtl->Insert.nonExclusiveBackups++;
+ XLogCtl->Insert.runningBackups++;
XLogCtl->Insert.forcePageWrites = true;
WALInsertLockRelease();
/* Ensure we release forcePageWrites if fail below */
- PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
+ PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
{
bool gotUniqueStartpoint = false;
DIR *tblspcdir;
@@ -10417,11 +10349,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
XLogFileName(xlogfilename, starttli, _logSegNo, wal_segment_size);
/*
- * Construct tablespace_map file
+ * Construct tablespace_map
*/
- if (exclusive)
- tblspcmapfile = makeStringInfo();
-
datadirpathlen = strlen(DataDir);
/* Collect information about all tablespaces */
@@ -10511,11 +10440,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
FreeDir(tblspcdir);
/*
- * Construct backup label file
+ * Construct backup_label
*/
- if (exclusive)
- labelfile = makeStringInfo();
-
/* Use the log timezone here, not the session timezone */
stamp_time = (pg_time_t) time(NULL);
pg_strftime(strfbuf, sizeof(strfbuf),
@@ -10525,122 +10451,19 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
(uint32) (startpoint >> 32), (uint32) startpoint, xlogfilename);
appendStringInfo(labelfile, "CHECKPOINT LOCATION: %X/%X\n",
(uint32) (checkpointloc >> 32), (uint32) checkpointloc);
- appendStringInfo(labelfile, "BACKUP METHOD: %s\n",
- exclusive ? "pg_start_backup" : "streamed");
+ appendStringInfo(labelfile, "BACKUP METHOD: streamed\n");
appendStringInfo(labelfile, "BACKUP FROM: %s\n",
backup_started_in_recovery ? "standby" : "master");
appendStringInfo(labelfile, "START TIME: %s\n", strfbuf);
appendStringInfo(labelfile, "LABEL: %s\n", backupidstr);
appendStringInfo(labelfile, "START TIMELINE: %u\n", starttli);
-
- /*
- * Okay, write the file, or return its contents to caller.
- */
- if (exclusive)
- {
- /*
- * Check for existing backup label --- implies a backup is already
- * running. (XXX given that we checked exclusiveBackupState
- * above, maybe it would be OK to just unlink any such label
- * file?)
- */
- if (stat(BACKUP_LABEL_FILE, &stat_buf) != 0)
- {
- if (errno != ENOENT)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress"),
- errhint("If you're sure there is no backup in progress, remove file \"%s\" and try again.",
- BACKUP_LABEL_FILE)));
-
- fp = AllocateFile(BACKUP_LABEL_FILE, "w");
-
- if (!fp)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not create file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- if (fwrite(labelfile->data, labelfile->len, 1, fp) != 1 ||
- fflush(fp) != 0 ||
- pg_fsync(fileno(fp)) != 0 ||
- ferror(fp) ||
- FreeFile(fp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not write file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- /* Allocated locally for exclusive backups, so free separately */
- pfree(labelfile->data);
- pfree(labelfile);
-
- /* Write backup tablespace_map file. */
- if (tblspcmapfile->len > 0)
- {
- if (stat(TABLESPACE_MAP, &stat_buf) != 0)
- {
- if (errno != ENOENT)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- TABLESPACE_MAP)));
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress"),
- errhint("If you're sure there is no backup in progress, remove file \"%s\" and try again.",
- TABLESPACE_MAP)));
-
- fp = AllocateFile(TABLESPACE_MAP, "w");
-
- if (!fp)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not create file \"%s\": %m",
- TABLESPACE_MAP)));
- if (fwrite(tblspcmapfile->data, tblspcmapfile->len, 1, fp) != 1 ||
- fflush(fp) != 0 ||
- pg_fsync(fileno(fp)) != 0 ||
- ferror(fp) ||
- FreeFile(fp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not write file \"%s\": %m",
- TABLESPACE_MAP)));
- }
-
- /* Allocated locally for exclusive backups, so free separately */
- pfree(tblspcmapfile->data);
- pfree(tblspcmapfile);
- }
}
- PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
+ PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
/*
- * Mark that start phase has correctly finished for an exclusive backup.
- * Session-level locks are updated as well to reflect that state.
- *
- * Note that CHECK_FOR_INTERRUPTS() must not occur while updating backup
- * counters and session-level lock. Otherwise they can be updated
- * inconsistently, and which might cause do_pg_abort_backup() to fail.
+ * Mark that the start phase has correctly finished for the backup.
*/
- if (exclusive)
- {
- WALInsertLockAcquireExclusive();
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
-
- /* Set session-level lock */
- sessionBackupState = SESSION_BACKUP_EXCLUSIVE;
- WALInsertLockRelease();
- }
- else
- sessionBackupState = SESSION_BACKUP_NON_EXCLUSIVE;
+ sessionBackupState = SESSION_BACKUP_RUNNING;
/*
* We're done. As a convenience, return the starting WAL location.
@@ -10654,43 +10477,15 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
static void
pg_start_backup_callback(int code, Datum arg)
{
- bool exclusive = DatumGetBool(arg);
-
/* Update backup counters and forcePageWrites on failure */
WALInsertLockAcquireExclusive();
- if (exclusive)
- {
- Assert(XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING);
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;
- }
- else
- {
- Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
- XLogCtl->Insert.nonExclusiveBackups--;
- }
- if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
- XLogCtl->Insert.nonExclusiveBackups == 0)
- {
- XLogCtl->Insert.forcePageWrites = false;
- }
- WALInsertLockRelease();
-}
+ Assert(XLogCtl->Insert.runningBackups > 0);
+ XLogCtl->Insert.runningBackups--;
-/*
- * Error cleanup callback for pg_stop_backup
- */
-static void
-pg_stop_backup_callback(int code, Datum arg)
-{
- bool exclusive = DatumGetBool(arg);
-
- /* Update backup status on failure */
- WALInsertLockAcquireExclusive();
- if (exclusive)
+ if (XLogCtl->Insert.runningBackups == 0)
{
- Assert(XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING);
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
+ XLogCtl->Insert.forcePageWrites = false;
}
WALInsertLockRelease();
}
@@ -10708,9 +10503,6 @@ get_backup_status(void)
* do_pg_stop_backup is the workhorse of the user-visible pg_stop_backup()
* function.
*
- * If labelfile is NULL, this stops an exclusive backup. Otherwise this stops
- * the non-exclusive backup specified by 'labelfile'.
- *
* Returns the last WAL location that must be present to restore from this
* backup, and the corresponding timeline ID in *stoptli_p.
*
@@ -10720,7 +10512,6 @@ get_backup_status(void)
XLogRecPtr
do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
{
- bool exclusive = (labelfile == NULL);
bool backup_started_in_recovery = false;
XLogRecPtr startpoint;
XLogRecPtr stoppoint;
@@ -10734,7 +10525,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
char histfilename[MAXFNAMELEN];
char backupfrom[20];
XLogSegNo _logSegNo;
- FILE *lfp;
FILE *fp;
char ch;
int seconds_before_warning;
@@ -10747,15 +10537,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
backup_started_in_recovery = RecoveryInProgress();
- /*
- * Currently only non-exclusive backup can be taken during recovery.
- */
- if (backup_started_in_recovery && exclusive)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("recovery is in progress"),
- errhint("WAL control functions cannot be executed during recovery.")));
-
/*
* During recovery, we don't need to check WAL level. Because, if WAL
* level is not sufficient, it's impossible to get here during recovery.
@@ -10766,106 +10547,23 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
errmsg("WAL level not sufficient for making an online backup"),
errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));
- if (exclusive)
- {
- /*
- * At first, mark that we're now stopping an exclusive backup, to
- * ensure that there are no other sessions currently running
- * pg_start_backup() or pg_stop_backup().
- */
- WALInsertLockAcquireExclusive();
- if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_PROGRESS)
- {
- WALInsertLockRelease();
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("exclusive backup not in progress")));
- }
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STOPPING;
- WALInsertLockRelease();
-
- /*
- * Remove backup_label. In case of failure, the state for an exclusive
- * backup is switched back to in-progress.
- */
- PG_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) BoolGetDatum(exclusive));
- {
- /*
- * Read the existing label file into memory.
- */
- struct stat statbuf;
- int r;
-
- if (stat(BACKUP_LABEL_FILE, &statbuf))
- {
- /* should not happen per the upper checks */
- if (errno != ENOENT)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is not in progress")));
- }
-
- lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
- if (!lfp)
- {
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- }
- labelfile = palloc(statbuf.st_size + 1);
- r = fread(labelfile, statbuf.st_size, 1, lfp);
- labelfile[statbuf.st_size] = '\0';
-
- /*
- * Close and remove the backup label file
- */
- if (r != 1 || ferror(lfp) || FreeFile(lfp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- durable_unlink(BACKUP_LABEL_FILE, ERROR);
-
- /*
- * Remove tablespace_map file if present, it is created only if
- * there are tablespaces.
- */
- durable_unlink(TABLESPACE_MAP, DEBUG1);
- }
- PG_END_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) BoolGetDatum(exclusive));
- }
-
/*
- * OK to update backup counters, forcePageWrites and session-level lock.
+ * OK to update backup counters and forcePageWrites.
*
* Note that CHECK_FOR_INTERRUPTS() must not occur while updating them.
* Otherwise they can be updated inconsistently, and which might cause
* do_pg_abort_backup() to fail.
*/
WALInsertLockAcquireExclusive();
- if (exclusive)
- {
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;
- }
- else
- {
- /*
- * The user-visible pg_start/stop_backup() functions that operate on
- * exclusive backups can be called at any time, but for non-exclusive
- * backups, it is expected that each do_pg_start_backup() call is
- * matched by exactly one do_pg_stop_backup() call.
- */
- Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
- XLogCtl->Insert.nonExclusiveBackups--;
- }
- if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
- XLogCtl->Insert.nonExclusiveBackups == 0)
+ /*
+ * It is expected that each do_pg_start_backup() call is matched by exactly
+ * one do_pg_stop_backup() call.
+ */
+ Assert(XLogCtl->Insert.runningBackups > 0);
+ XLogCtl->Insert.runningBackups--;
+
+ if (XLogCtl->Insert.runningBackups == 0)
{
XLogCtl->Insert.forcePageWrites = false;
}
@@ -11117,28 +10815,21 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
* This does just the most basic steps of do_pg_stop_backup(), by taking the
* system out of backup mode, thus making it a lot more safe to call from
* an error handler.
- *
- * NB: This is only for aborting a non-exclusive backup that doesn't write
- * backup_label. A backup started with pg_start_backup() needs to be finished
- * with pg_stop_backup().
*/
void
do_pg_abort_backup(void)
{
/*
- * Quick exit if session is not keeping around a non-exclusive backup
- * already started.
+ * Quick exit if session does not have a running backup.
*/
if (sessionBackupState == SESSION_BACKUP_NONE)
return;
WALInsertLockAcquireExclusive();
- Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
- Assert(sessionBackupState == SESSION_BACKUP_NON_EXCLUSIVE);
- XLogCtl->Insert.nonExclusiveBackups--;
+ Assert(XLogCtl->Insert.runningBackups > 0);
+ XLogCtl->Insert.runningBackups--;
- if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
- XLogCtl->Insert.nonExclusiveBackups == 0)
+ if (XLogCtl->Insert.runningBackups == 0)
{
XLogCtl->Insert.forcePageWrites = false;
}
@@ -11441,87 +11132,6 @@ rm_redo_error_callback(void *arg)
pfree(buf.data);
}
-/*
- * BackupInProgress: check if online backup mode is active
- *
- * This is done by checking for existence of the "backup_label" file.
- */
-bool
-BackupInProgress(void)
-{
- struct stat stat_buf;
-
- return (stat(BACKUP_LABEL_FILE, &stat_buf) == 0);
-}
-
-/*
- * CancelBackup: rename the "backup_label" and "tablespace_map"
- * files to cancel backup mode
- *
- * If the "backup_label" file exists, it will be renamed to "backup_label.old".
- * Similarly, if the "tablespace_map" file exists, it will be renamed to
- * "tablespace_map.old".
- *
- * Note that this will render an online backup in progress
- * useless. To correctly finish an online backup, pg_stop_backup must be
- * called.
- */
-void
-CancelBackup(void)
-{
- struct stat stat_buf;
-
- /* if the backup_label file is not there, return */
- if (stat(BACKUP_LABEL_FILE, &stat_buf) < 0)
- return;
-
- /* remove leftover file from previously canceled backup if it exists */
- unlink(BACKUP_LABEL_OLD);
-
- if (durable_rename(BACKUP_LABEL_FILE, BACKUP_LABEL_OLD, DEBUG1) != 0)
- {
- ereport(WARNING,
- (errcode_for_file_access(),
- errmsg("online backup mode was not canceled"),
- errdetail("File \"%s\" could not be renamed to \"%s\": %m.",
- BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
- return;
- }
-
- /* if the tablespace_map file is not there, return */
- if (stat(TABLESPACE_MAP, &stat_buf) < 0)
- {
- ereport(LOG,
- (errmsg("online backup mode canceled"),
- errdetail("File \"%s\" was renamed to \"%s\".",
- BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
- return;
- }
-
- /* remove leftover file from previously canceled backup if it exists */
- unlink(TABLESPACE_MAP_OLD);
-
- if (durable_rename(TABLESPACE_MAP, TABLESPACE_MAP_OLD, DEBUG1) == 0)
- {
- ereport(LOG,
- (errmsg("online backup mode canceled"),
- errdetail("Files \"%s\" and \"%s\" were renamed to "
- "\"%s\" and \"%s\", respectively.",
- BACKUP_LABEL_FILE, TABLESPACE_MAP,
- BACKUP_LABEL_OLD, TABLESPACE_MAP_OLD)));
- }
- else
- {
- ereport(WARNING,
- (errcode_for_file_access(),
- errmsg("online backup mode canceled"),
- errdetail("File \"%s\" was renamed to \"%s\", but "
- "file \"%s\" could not be renamed to \"%s\": %m.",
- BACKUP_LABEL_FILE, BACKUP_LABEL_OLD,
- TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
- }
-}
-
/*
* Read the XLOG page containing RecPtr into readBuf (if not read already).
* Returns number of bytes read, if the page is read successfully, or -1
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index f139eeff9f..469d5e41c1 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -40,17 +40,16 @@
/*
- * Store label file and tablespace map during non-exclusive backups.
+ * Store label file and tablespace map during backups.
*/
static StringInfo label_file;
static StringInfo tblspc_map_file;
/*
- * Called when the backend exits with a running non-exclusive base backup,
- * to clean up state.
+ * Called when the backend exits with a running base backup to clean up state.
*/
static void
-nonexclusive_base_backup_cleanup(int code, Datum arg)
+base_backup_cleanup(int code, Datum arg)
{
do_pg_abort_backup();
ereport(WARNING,
@@ -74,101 +73,40 @@ pg_start_backup(PG_FUNCTION_ARGS)
{
text *backupid = PG_GETARG_TEXT_PP(0);
bool fast = PG_GETARG_BOOL(1);
- bool exclusive = PG_GETARG_BOOL(2);
char *backupidstr;
XLogRecPtr startpoint;
SessionBackupState status = get_backup_status();
+ MemoryContext oldcontext;
backupidstr = text_to_cstring(backupid);
- if (status == SESSION_BACKUP_NON_EXCLUSIVE)
+ if (status == SESSION_BACKUP_RUNNING)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("a backup is already in progress in this session")));
- if (exclusive)
- {
- startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL,
- NULL, NULL, false, true);
- }
- else
- {
- MemoryContext oldcontext;
-
- /*
- * Label file and tablespace map file need to be long-lived, since
- * they are read in pg_stop_backup.
- */
- oldcontext = MemoryContextSwitchTo(TopMemoryContext);
- label_file = makeStringInfo();
- tblspc_map_file = makeStringInfo();
- MemoryContextSwitchTo(oldcontext);
+ /*
+ * Label file and tablespace map file need to be long-lived, since
+ * they are read in pg_stop_backup.
+ */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ label_file = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+ MemoryContextSwitchTo(oldcontext);
- startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file,
- NULL, tblspc_map_file, false, true);
+ startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file,
+ NULL, tblspc_map_file, false, true);
- before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
- }
+ before_shmem_exit(base_backup_cleanup, (Datum) 0);
PG_RETURN_LSN(startpoint);
}
-/*
- * pg_stop_backup: finish taking an on-line backup dump
- *
- * We write an end-of-backup WAL record, and remove the backup label file
- * created by pg_start_backup, creating a backup history file in pg_wal
- * instead (whence it will immediately be archived). The backup history file
- * contains the same info found in the label file, plus the backup-end time
- * and WAL location. Before 9.0, the backup-end time was read from the backup
- * history file at the beginning of archive recovery, but we now use the WAL
- * record for that and the file is for informational and debug purposes only.
- *
- * Note: different from CancelBackup which just cancels online backup mode.
- *
- * Note: this version is only called to stop an exclusive backup. The function
- * pg_stop_backup_v2 (overloaded as pg_stop_backup in SQL) is called to
- * stop non-exclusive backups.
- *
- * Permission checking for this function is managed through the normal
- * GRANT system.
- */
-Datum
-pg_stop_backup(PG_FUNCTION_ARGS)
-{
- XLogRecPtr stoppoint;
- SessionBackupState status = get_backup_status();
-
- if (status == SESSION_BACKUP_NON_EXCLUSIVE)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("non-exclusive backup in progress"),
- errhint("Did you mean to use pg_stop_backup('f')?")));
-
- /*
- * Exclusive backups were typically started in a different connection, so
- * don't try to verify that status of backup is set to
- * SESSION_BACKUP_EXCLUSIVE in this function. Actual verification that an
- * exclusive backup is in fact running is handled inside
- * do_pg_stop_backup.
- */
- stoppoint = do_pg_stop_backup(NULL, true, NULL);
-
- PG_RETURN_LSN(stoppoint);
-}
-
/*
- * pg_stop_backup_v2: finish taking exclusive or nonexclusive on-line backup.
+ * pg_stop_backup: finish taking an on-line backup.
*
- * Works the same as pg_stop_backup, except for non-exclusive backups it returns
- * the backup label and tablespace map files as text fields in as part of the
- * resultset.
- *
- * The first parameter (variable 'exclusive') allows the user to tell us if
- * this is an exclusive or a non-exclusive backup.
- *
- * The second parameter (variable 'waitforarchive'), which is optional,
+ * The first parameter (variable 'waitforarchive'), which is optional,
* allows the user to choose if they want to wait for the WAL to be archived
* or if we should just return as soon as the WAL record is written.
*
@@ -176,7 +114,7 @@ pg_stop_backup(PG_FUNCTION_ARGS)
* GRANT system.
*/
Datum
-pg_stop_backup_v2(PG_FUNCTION_ARGS)
+pg_stop_backup(PG_FUNCTION_ARGS)
{
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
TupleDesc tupdesc;
@@ -186,8 +124,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
Datum values[3];
bool nulls[3];
- bool exclusive = PG_GETARG_BOOL(0);
- bool waitforarchive = PG_GETARG_BOOL(1);
+ bool waitforarchive = PG_GETARG_BOOL(0);
XLogRecPtr stoppoint;
SessionBackupState status = get_backup_status();
@@ -219,52 +156,30 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));
- if (exclusive)
- {
- if (status == SESSION_BACKUP_NON_EXCLUSIVE)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("non-exclusive backup in progress"),
- errhint("Did you mean to use pg_stop_backup('f')?")));
-
- /*
- * Stop the exclusive backup, and since we're in an exclusive backup
- * return NULL for both backup_label and tablespace_map.
- */
- stoppoint = do_pg_stop_backup(NULL, waitforarchive, NULL);
-
- nulls[1] = true;
- nulls[2] = true;
- }
- else
- {
- if (status != SESSION_BACKUP_NON_EXCLUSIVE)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("non-exclusive backup is not in progress"),
- errhint("Did you mean to use pg_stop_backup('t')?")));
-
- /*
- * Stop the non-exclusive backup. Return a copy of the backup label
- * and tablespace map so they can be written to disk by the caller.
- */
- stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, NULL);
- cancel_before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
-
- values[1] = CStringGetTextDatum(label_file->data);
- values[2] = CStringGetTextDatum(tblspc_map_file->data);
-
- /* Free structures allocated in TopMemoryContext */
- pfree(label_file->data);
- pfree(label_file);
- label_file = NULL;
- pfree(tblspc_map_file->data);
- pfree(tblspc_map_file);
- tblspc_map_file = NULL;
- }
+ if (status != SESSION_BACKUP_RUNNING)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("backup is not in progress"),
+ errhint("Did you call pg_start_backup()?")));
+
+ /*
+ * Stop the backup. Return a copy of the backup label and tablespace map so
+ * they can be written to disk by the caller.
+ */
+ stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, NULL);
+ cancel_before_shmem_exit(base_backup_cleanup, (Datum) 0);
- /* Stoppoint is included on both exclusive and nonexclusive backups */
values[0] = LSNGetDatum(stoppoint);
+ values[1] = CStringGetTextDatum(label_file->data);
+ values[2] = CStringGetTextDatum(tblspc_map_file->data);
+
+ /* Free structures allocated in TopMemoryContext */
+ pfree(label_file->data);
+ pfree(label_file);
+ label_file = NULL;
+ pfree(tblspc_map_file->data);
+ pfree(tblspc_map_file);
+ tblspc_map_file = NULL;
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
tuplestore_donestoring(typstore);
@@ -626,81 +541,6 @@ pg_wal_lsn_diff(PG_FUNCTION_ARGS)
PG_RETURN_NUMERIC(result);
}
-/*
- * Returns bool with current on-line backup mode, a global state.
- */
-Datum
-pg_is_in_backup(PG_FUNCTION_ARGS)
-{
- PG_RETURN_BOOL(BackupInProgress());
-}
-
-/*
- * Returns start time of an online exclusive backup.
- *
- * When there's no exclusive backup in progress, the function
- * returns NULL.
- */
-Datum
-pg_backup_start_time(PG_FUNCTION_ARGS)
-{
- Datum xtime;
- FILE *lfp;
- char fline[MAXPGPATH];
- char backup_start_time[30];
-
- /*
- * See if label file is present
- */
- lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
- if (lfp == NULL)
- {
- if (errno != ENOENT)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- PG_RETURN_NULL();
- }
-
- /*
- * Parse the file to find the START TIME line.
- */
- backup_start_time[0] = '\0';
- while (fgets(fline, sizeof(fline), lfp) != NULL)
- {
- if (sscanf(fline, "START TIME: %25[^\n]\n", backup_start_time) == 1)
- break;
- }
-
- /* Check for a read error. */
- if (ferror(lfp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m", BACKUP_LABEL_FILE)));
-
- /* Close the backup label file. */
- if (FreeFile(lfp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not close file \"%s\": %m", BACKUP_LABEL_FILE)));
-
- if (strlen(backup_start_time) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
-
- /*
- * Convert the time string read from file to TimestampTz form.
- */
- xtime = DirectFunctionCall3(timestamptz_in,
- CStringGetDatum(backup_start_time),
- ObjectIdGetDatum(InvalidOid),
- Int32GetDatum(-1));
-
- PG_RETURN_DATUM(xtime);
-}
-
/*
* Promotes a standby server.
*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8630542bb3..bf30693f23 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1018,14 +1018,14 @@ COMMENT ON FUNCTION ts_debug(text) IS
--
CREATE OR REPLACE FUNCTION
- pg_start_backup(label text, fast boolean DEFAULT false, exclusive boolean DEFAULT true)
+ pg_start_backup(label text, fast boolean DEFAULT false)
RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup'
PARALLEL RESTRICTED;
CREATE OR REPLACE FUNCTION pg_stop_backup (
- exclusive boolean, wait_for_archive boolean DEFAULT true,
- OUT lsn pg_lsn, OUT labelfile text, OUT spcmapfile text)
- RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
+ wait_for_archive boolean DEFAULT true, OUT lsn pg_lsn, OUT labelfile text,
+ OUT spcmapfile text)
+ RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup'
PARALLEL RESTRICTED;
CREATE OR REPLACE FUNCTION
@@ -1133,9 +1133,8 @@ AS 'jsonb_insert';
-- can later change who can access these functions, or leave them as only
-- available to superuser / cluster owner, if they choose.
--
-REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public;
-REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public;
-REVOKE EXECUTE ON FUNCTION pg_stop_backup(boolean, boolean) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stop_backup(boolean) FROM public;
REVOKE EXECUTE ON FUNCTION pg_create_restore_point(text) FROM public;
REVOKE EXECUTE ON FUNCTION pg_switch_wal() FROM public;
REVOKE EXECUTE ON FUNCTION pg_wal_replay_pause() FROM public;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a33a131182..3a8653ab4d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -300,8 +300,7 @@ static bool FatalError = false; /* T if recovering from backend crash */
* and we switch to PM_RUN state.
*
* Normal child backends can only be launched when we are in PM_RUN or
- * PM_HOT_STANDBY state. (We also allow launch of normal
- * child backends in PM_WAIT_BACKUP state, but only for superusers.)
+ * PM_HOT_STANDBY state.
* In other states we handle connection requests by launching "dead_end"
* child processes, which will simply send the client an error message and
* quit. (We track these in the BackendList so that we can know when they
@@ -327,7 +326,6 @@ typedef enum
PM_RECOVERY, /* in archive recovery mode */
PM_HOT_STANDBY, /* in hot standby mode */
PM_RUN, /* normal "database is alive" state */
- PM_WAIT_BACKUP, /* waiting for online backup mode to end */
PM_WAIT_READONLY, /* waiting for read only backends to exit */
PM_WAIT_BACKENDS, /* waiting for live backends to exit */
PM_SHUTDOWN, /* waiting for checkpointer to do shutdown
@@ -2241,9 +2239,6 @@ retry1:
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
errmsg("sorry, too many clients already")));
break;
- case CAC_WAITBACKUP:
- /* OK for now, will check in InitPostgres */
- break;
case CAC_OK:
break;
}
@@ -2352,18 +2347,10 @@ canAcceptConnections(void)
/*
* Can't start backends when in startup/shutdown/inconsistent recovery
* state.
- *
- * In state PM_WAIT_BACKUP only superusers can connect (this must be
- * allowed so that a superuser can end online backup mode); we return
- * CAC_WAITBACKUP code to indicate that this must be checked later. Note
- * that neither CAC_OK nor CAC_WAITBACKUP can safely be returned until we
- * have checked for too many children.
*/
if (pmState != PM_RUN)
{
- if (pmState == PM_WAIT_BACKUP)
- result = CAC_WAITBACKUP; /* allow superusers only */
- else if (Shutdown > NoShutdown)
+ if (Shutdown > NoShutdown)
return CAC_SHUTDOWN; /* shutdown is pending */
else if (!FatalError &&
(pmState == PM_STARTUP ||
@@ -2704,7 +2691,7 @@ pmdie(SIGNAL_ARGS)
* and walreceiver processes.
*/
pmState = (pmState == PM_RUN) ?
- PM_WAIT_BACKUP : PM_WAIT_READONLY;
+ PM_WAIT_BACKENDS : PM_WAIT_READONLY;
}
/*
@@ -2754,7 +2741,6 @@ pmdie(SIGNAL_ARGS)
pmState = PM_WAIT_BACKENDS;
}
else if (pmState == PM_RUN ||
- pmState == PM_WAIT_BACKUP ||
pmState == PM_WAIT_READONLY ||
pmState == PM_WAIT_BACKENDS ||
pmState == PM_HOT_STANDBY)
@@ -3571,7 +3557,6 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
if (pmState == PM_RECOVERY ||
pmState == PM_HOT_STANDBY ||
pmState == PM_RUN ||
- pmState == PM_WAIT_BACKUP ||
pmState == PM_WAIT_READONLY ||
pmState == PM_SHUTDOWN)
pmState = PM_WAIT_BACKENDS;
@@ -3663,15 +3648,6 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
static void
PostmasterStateMachine(void)
{
- if (pmState == PM_WAIT_BACKUP)
- {
- /*
- * PM_WAIT_BACKUP state ends when online backup mode is not active.
- */
- if (!BackupInProgress())
- pmState = PM_WAIT_BACKENDS;
- }
-
if (pmState == PM_WAIT_READONLY)
{
/*
@@ -3841,18 +3817,6 @@ PostmasterStateMachine(void)
}
else
{
- /*
- * Terminate exclusive backup mode to avoid recovery after a clean
- * fast shutdown. Since an exclusive backup can only be taken
- * during normal running (and not, for example, while running
- * under Hot Standby) it only makes sense to do this if we reached
- * normal running. If we're still in recovery, the backup file is
- * one we're recovering *from*, and we must keep it around so that
- * recovery restarts from the right place.
- */
- if (ReachedNormalRunning)
- CancelBackup();
-
/* Normal exit from the postmaster is here */
ExitPostmaster(0);
}
@@ -4054,8 +4018,7 @@ BackendStartup(Port *port)
/* Pass down canAcceptConnections state */
port->canAcceptConnections = canAcceptConnections();
- bn->dead_end = (port->canAcceptConnections != CAC_OK &&
- port->canAcceptConnections != CAC_WAITBACKUP);
+ bn->dead_end = port->canAcceptConnections != CAC_OK;
/*
* Unless it's a dead_end child, assign it a child slot number
@@ -5182,7 +5145,7 @@ sigusr1_handler(SIGNAL_ARGS)
}
if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE) &&
- (pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_BACKENDS))
+ pmState == PM_WAIT_BACKENDS)
{
/* Advance postmaster's state machine */
PostmasterStateMachine();
@@ -5775,7 +5738,6 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
case PM_SHUTDOWN:
case PM_WAIT_BACKENDS:
case PM_WAIT_READONLY:
- case PM_WAIT_BACKUP:
break;
case PM_RUN:
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 78ed6cf797..448556d7c3 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -172,10 +172,8 @@ static const char *excludeFiles[] =
RELCACHE_INIT_FILENAME,
/*
- * If there's a backup_label or tablespace_map file, it belongs to a
- * backup started by the user with pg_start_backup(). It is *not* correct
- * for this backup. Our backup_label/tablespace_map is injected into the
- * tar separately.
+ * backup_label and tablespace_map should not exist in in a running cluster
+ * capable of doing an online backup, but exclude then just in case.
*/
BACKUP_LABEL_FILE,
TABLESPACE_MAP,
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index b636b1e262..d868c21fdc 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -788,7 +788,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
*/
if ((!am_superuser || am_walsender) &&
MyProcPort != NULL &&
- MyProcPort->canAcceptConnections == CAC_WAITBACKUP)
+ MyProcPort->canAcceptConnections == CAC_SHUTDOWN)
{
if (am_walsender)
ereport(FATAL,
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 3e1c3863c4..04eea9bfeb 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -158,6 +158,10 @@ isnt(slurp_file("$tempdir/backup/backup_label"),
'DONOTCOPY', 'existing backup_label not copied');
rmtree("$tempdir/backup");
+# Now delete the bogus backup_label file since it will interfere with startup
+unlink("$pgdata/backup_label")
+ or BAIL_OUT("unable to unlink $pgdata/backup_label");
+
$node->command_ok(
[
'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 222b56f58a..084ea06597 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -94,9 +94,9 @@ static const char *excludeFiles[] =
"pg_internal.init", /* defined as RELCACHE_INIT_FILENAME */
/*
- * If there's a backup_label or tablespace_map file, it belongs to a
- * backup started by the user with pg_start_backup(). It is *not* correct
- * for this backup. Our backup_label is written later on separately.
+ * If there is a backup_label or tablespace_map file, it indicates that
+ * a recovery failed and this cluster probably can't be rewound, but
+ * exclude them anyway if they are found.
*/
"backup_label", /* defined as BACKUP_LABEL_FILE */
"tablespace_map", /* defined as TABLESPACE_MAP */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f3a7ba4d42..7d0f45ff99 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -338,8 +338,7 @@ extern void assign_checkpoint_completion_target(double newval, void *extra);
typedef enum SessionBackupState
{
SESSION_BACKUP_NONE,
- SESSION_BACKUP_EXCLUSIVE,
- SESSION_BACKUP_NON_EXCLUSIVE
+ SESSION_BACKUP_RUNNING,
} SessionBackupState;
extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f79fcfe029..4a4c49cf99 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5814,25 +5814,15 @@
proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
{ oid => '2172', descr => 'prepare for taking an online backup',
proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
- prorettype => 'pg_lsn', proargtypes => 'text bool bool',
+ prorettype => 'pg_lsn', proargtypes => 'text bool',
prosrc => 'pg_start_backup' },
-{ oid => '2173', descr => 'finish taking an online backup',
- proname => 'pg_stop_backup', provolatile => 'v', proparallel => 'r',
- prorettype => 'pg_lsn', proargtypes => '', prosrc => 'pg_stop_backup' },
{ oid => '2739', descr => 'finish taking an online backup',
proname => 'pg_stop_backup', prorows => '1', proretset => 't',
provolatile => 'v', proparallel => 'r', prorettype => 'record',
- proargtypes => 'bool bool', proallargtypes => '{bool,bool,pg_lsn,text,text}',
- proargmodes => '{i,i,o,o,o}',
- proargnames => '{exclusive,wait_for_archive,lsn,labelfile,spcmapfile}',
- prosrc => 'pg_stop_backup_v2' },
-{ oid => '3813', descr => 'true if server is in online backup',
- proname => 'pg_is_in_backup', provolatile => 'v', prorettype => 'bool',
- proargtypes => '', prosrc => 'pg_is_in_backup' },
-{ oid => '3814', descr => 'start time of an online backup',
- proname => 'pg_backup_start_time', provolatile => 's',
- prorettype => 'timestamptz', proargtypes => '',
- prosrc => 'pg_backup_start_time' },
+ proargtypes => 'bool', proallargtypes => '{bool,pg_lsn,text,text}',
+ proargmodes => '{i,o,o,o}',
+ proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile}',
+ prosrc => 'pg_stop_backup' },
{ oid => '3436', descr => 'promote standby server',
proname => 'pg_promote', provolatile => 'v', prorettype => 'bool',
proargtypes => 'bool int4', proargnames => '{wait,wait_seconds}',
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index b2c59df54e..3d7b831599 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,8 +70,7 @@ typedef struct
typedef enum CAC_state
{
- CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
- CAC_WAITBACKUP
+ CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY
} CAC_state;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index d6b32c070c..687056ba55 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -447,8 +447,4 @@ extern void process_session_preload_libraries(void);
extern void pg_bindtextdomain(const char *domain);
extern bool has_rolreplication(Oid roleid);
-/* in access/transam/xlog.c */
-extern bool BackupInProgress(void);
-extern void CancelBackup(void);
-
#endif /* MISCADMIN_H */
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 8a2c6fc122..4c27a7d362 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -546,25 +546,6 @@ sub backup
return;
}
-=item $node->backup_fs_hot(backup_name)
-
-Create a backup with a filesystem level copy in subdirectory B<backup_name> of
-B<< $node->backup_dir >>, including WAL.
-
-Archiving must be enabled, as B<pg_start_backup()> and B<pg_stop_backup()> are
-used. This is not checked or enforced.
-
-The backup name is passed as the backup label to B<pg_start_backup()>.
-
-=cut
-
-sub backup_fs_hot
-{
- my ($self, $backup_name) = @_;
- $self->_backup_fs($backup_name, 1);
- return;
-}
-
=item $node->backup_fs_cold(backup_name)
Create a backup with a filesystem level copy in subdirectory B<backup_name> of
@@ -578,53 +559,18 @@ Use B<backup> or B<backup_fs_hot> if you want to back up a running server.
sub backup_fs_cold
{
my ($self, $backup_name) = @_;
- $self->_backup_fs($backup_name, 0);
- return;
-}
-
-
-# Common sub of backup_fs_hot and backup_fs_cold
-sub _backup_fs
-{
- my ($self, $backup_name, $hot) = @_;
- my $backup_path = $self->backup_dir . '/' . $backup_name;
- my $port = $self->port;
- my $name = $self->name;
-
- print "# Taking filesystem backup $backup_name from node \"$name\"\n";
-
- if ($hot)
- {
- my $stdout = $self->safe_psql('postgres',
- "SELECT * FROM pg_start_backup('$backup_name');");
- print "# pg_start_backup: $stdout\n";
- }
RecursiveCopy::copypath(
$self->data_dir,
- $backup_path,
+ $self->backup_dir . '/' . $backup_name,
filterfn => sub {
my $src = shift;
return ($src ne 'log' and $src ne 'postmaster.pid');
});
- if ($hot)
- {
-
- # We ignore pg_stop_backup's return value. We also assume archiving
- # is enabled; otherwise the caller will have to copy the remaining
- # segments.
- my $stdout =
- $self->safe_psql('postgres', 'SELECT * FROM pg_stop_backup();');
- print "# pg_stop_backup: $stdout\n";
- }
-
- print "# Backup finished\n";
return;
}
-
-
=pod
=item $node->init_from_backup(root_node, backup_name)
diff --git a/src/test/recovery/t/010_logical_decoding_timelines.pl b/src/test/recovery/t/010_logical_decoding_timelines.pl
index e582b20005..32dfda454c 100644
--- a/src/test/recovery/t/010_logical_decoding_timelines.pl
+++ b/src/test/recovery/t/010_logical_decoding_timelines.pl
@@ -66,7 +66,9 @@ $node_master->safe_psql('dropme',
$node_master->safe_psql('postgres', 'CHECKPOINT;');
my $backup_name = 'b1';
-$node_master->backup_fs_hot($backup_name);
+$node_master->stop();
+$node_master->backup_fs_cold($backup_name);
+$node_master->start();
$node_master->safe_psql('postgres',
q[SELECT pg_create_physical_replication_slot('phys_slot');]);
On Wed, Dec 12, 2018 at 7:24 AM David Steele <david@pgmasters.net> wrote:
Attached is the patch.
I was a bit surprised by how much code went away. There was a lot of
code dedicated to making sure that backup_label got renamed on shutdown,
that there was not an exclusive backup running, etc.There were no tests for exclusive backup so the test changes were minimal.
I did have to replace one "hot backup" in
010_logical_decoding_timelines.pl. I'm not sure why the backup was
being done this way, or why the standby needs a copy of pg_replslot
(which is not copied by pg_basebackup). It might be worth looking at
but it seemed out of scope for this patch.
I wish to point out that there currently seem to be more votes against
this proposal than for it, and that nobody should commit a patch
unless there is a consensus that it should be committed, whether or
not the committer personally agrees with the arguments against it.
As for my vote, I do not buy the idea that because we're changing some
stuff about recovery.conf we should go ahead and do this too. Yes,
they are related, but just because you adjust your backup/restore
script/tool to cope with one change doesn't mean that you don't have
to adjust it some more to cope with the other change.
I also think that the idea that supporting the exclusive backup
interface is hurting anything is greatly exaggerated. Whether we keep
it or not, we're not forcing anyone to use it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 12, 2018 at 11:05:36AM +0900, Robert Haas wrote:
I wish to point out that there currently seem to be more votes against
this proposal than for it, and that nobody should commit a patch
unless there is a consensus that it should be committed, whether or
not the committer personally agrees with the arguments against it.As for my vote, I do not buy the idea that because we're changing some
stuff about recovery.conf we should go ahead and do this too. Yes,
they are related, but just because you adjust your backup/restore
script/tool to cope with one change doesn't mean that you don't have
to adjust it some more to cope with the other change.I also think that the idea that supporting the exclusive backup
interface is hurting anything is greatly exaggerated. Whether we keep
it or not, we're not forcing anyone to use it.
+1 on all that. Per the trend of this thread, I see a bunch of
committers and contributors commenting about *not* removing this code,
so sending a patch to actually remove it looks like a throw into an
abysmal void.
--
Michael
Hi,
On 2018-12-12 11:05:36 +0900, Robert Haas wrote:
As for my vote, I do not buy the idea that because we're changing some
stuff about recovery.conf we should go ahead and do this too. Yes,
they are related, but just because you adjust your backup/restore
script/tool to cope with one change doesn't mean that you don't have
to adjust it some more to cope with the other change.
Sure, but you have to touch it, and it's not that big of an additional
change. Call the other form of pg_stop_backup(), and write the contents
of the two returned textblobs into backup_label/tablespace_map. As a
reward you get a backup that break crash-safety for the database, and
not having to do the migration later.
I also think that the idea that supporting the exclusive backup
interface is hurting anything is greatly exaggerated. Whether we keep
it or not, we're not forcing anyone to use it.
It's hurting users. I've seen numerous outages due to the backup_label
file being around in the wrong moment.
Greetings,
Andres Freund
On 2018-12-12 11:32:55 +0900, Michael Paquier wrote:
On Wed, Dec 12, 2018 at 11:05:36AM +0900, Robert Haas wrote:
I wish to point out that there currently seem to be more votes against
this proposal than for it, and that nobody should commit a patch
unless there is a consensus that it should be committed, whether or
not the committer personally agrees with the arguments against it.As for my vote, I do not buy the idea that because we're changing some
stuff about recovery.conf we should go ahead and do this too. Yes,
they are related, but just because you adjust your backup/restore
script/tool to cope with one change doesn't mean that you don't have
to adjust it some more to cope with the other change.I also think that the idea that supporting the exclusive backup
interface is hurting anything is greatly exaggerated. Whether we keep
it or not, we're not forcing anyone to use it.+1 on all that. Per the trend of this thread, I see a bunch of
committers and contributors commenting about *not* removing this code,
so sending a patch to actually remove it looks like a throw into an
abysmal void.
I don't think it's as clear as you make it out to be:
In favor:
- David Steele
- Andres Freund
- Stephen Frost, I think
Against:
- Simon Riggs
- Robert Haas
- Michael Paquier
- Andreas Karlsson
- Peter Eisentraut, I think
While that's clearly lopsided, I still think it's pretty darn absurd to
call such a situation "a throw into an abysmal void".
Greetings,
Andres Freund
On 12/11/18 9:05 PM, Robert Haas wrote:
On Wed, Dec 12, 2018 at 7:24 AM David Steele <david@pgmasters.net> wrote:
Attached is the patch.
As for my vote, I do not buy the idea that because we're changing some
stuff about recovery.conf we should go ahead and do this too. Yes,
they are related, but just because you adjust your backup/restore
script/tool to cope with one change doesn't mean that you don't have
to adjust it some more to cope with the other change.
I think the patch stands on its own. Exclusive backup mode is broken
and is know to cause problems in the field.
We deprecated it three years ago and we have a very capable core backup
tool that people can use instead.
I also think that the idea that supporting the exclusive backup
interface is hurting anything is greatly exaggerated. Whether we keep
it or not, we're not forcing anyone to use it.
There are maintenance advantages to getting rid of it. The code has
become simpler and the documentation has become much simpler. Trying to
explain the subtle differences between backup methods is difficult and
only makes the documentation more confusing.
Removing the extra complexity from shutdown also seems like a win. The
gyrations we went through to try and ensure that backup_label was
renamed before shutdown were truly agonizing. None of that is necessary
with the non-exclusive mode.
--
-David
david@pgmasters.net
On Wed, Dec 12, 2018 at 11:44 AM Andres Freund <andres@anarazel.de> wrote:
On 2018-12-12 11:32:55 +0900, Michael Paquier wrote:
On Wed, Dec 12, 2018 at 11:05:36AM +0900, Robert Haas wrote:
I wish to point out that there currently seem to be more votes against
this proposal than for it, and that nobody should commit a patch
unless there is a consensus that it should be committed, whether or
not the committer personally agrees with the arguments against it.As for my vote, I do not buy the idea that because we're changing some
stuff about recovery.conf we should go ahead and do this too. Yes,
they are related, but just because you adjust your backup/restore
script/tool to cope with one change doesn't mean that you don't have
to adjust it some more to cope with the other change.I also think that the idea that supporting the exclusive backup
interface is hurting anything is greatly exaggerated. Whether we keep
it or not, we're not forcing anyone to use it.+1 on all that. Per the trend of this thread, I see a bunch of
committers and contributors commenting about *not* removing this code,
so sending a patch to actually remove it looks like a throw into an
abysmal void.I don't think it's as clear as you make it out to be:
In favor:
- David Steele
- Andres Freund
- Stephen Frost, I think
You forgot me, also in this camp :P
Against:
- Simon Riggs
- Robert Haas
- Michael Paquier
- Andreas Karlsson
- Peter Eisentraut, I thinkWhile that's clearly lopsided, I still think it's pretty darn absurd to
call such a situation "a throw into an abysmal void".
I'd say it's close enough to an even split to say it doesn't go either way.
But we can clearly say it's not consensus for removing it (nor for keeping
it, but that would be the default..)
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Wed, Dec 12, 2018 at 1:09 PM David Steele <david@pgmasters.net> wrote:
On 12/11/18 9:05 PM, Robert Haas wrote:
On Wed, Dec 12, 2018 at 7:24 AM David Steele <david@pgmasters.net>
wrote:
Attached is the patch.
As for my vote, I do not buy the idea that because we're changing some
stuff about recovery.conf we should go ahead and do this too. Yes,
they are related, but just because you adjust your backup/restore
script/tool to cope with one change doesn't mean that you don't have
to adjust it some more to cope with the other change.I think the patch stands on its own. Exclusive backup mode is broken
and is know to cause problems in the field.
Yes, this is a *far* too common thing to be seen in the field IMHO.
Is it super-common? No. But it's by far too common not to be a concern.
We deprecated it three years ago and we have a very capable core backup
tool that people can use instead.
+1.
If we can't get consensus on removing it now, then how about actually
setting a *plan* for when to remove it?
For example, in 12 we could more aggressively deprecate it (such as
actually emitting a WARNING message whenever used?), and then announce
already now that it will be removed in version 13? That will then cover all
those people who didn't read the documentation and find the existing
deprecation notice?
I also think that the idea that supporting the exclusive backup
interface is hurting anything is greatly exaggerated. Whether we keep
it or not, we're not forcing anyone to use it.There are maintenance advantages to getting rid of it. The code has
become simpler and the documentation has become much simpler. Trying to
explain the subtle differences between backup methods is difficult and
only makes the documentation more confusing.Removing the extra complexity from shutdown also seems like a win. The
gyrations we went through to try and ensure that backup_label was
renamed before shutdown were truly agonizing. None of that is necessary
with the non-exclusive mode.
That, too!
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 11/27/18 10:05 AM, Andreas Karlsson wrote:
On 11/27/18 3:46 PM, Stephen Frost wrote:
I'm concerned, seriously, that people don't have anywhere near the
concern about the recovery side of things as they do about the backup
side of things and that's really concerning.I think we should use more caution when data loss is at stake rather
than "just" downtime. So personally I am in favor of updating the manual
with warnings (right now it does not even say if exclusive or
non-exclusive is the default) and adding a deprecation warning when
people use the exclusive mode.
The documentation says (since 9.6):
Low level base backups can be made in a non-exclusive or an exclusive
way. The non-exclusive method is recommended and the exclusive one is
deprecated and will eventually be removed.
That seems clear enough to me.
--
-David
david@pgmasters.net
On 12/11/18 9:44 PM, Andres Freund wrote:
On 2018-12-12 11:32:55 +0900, Michael Paquier wrote:
+1 on all that. Per the trend of this thread, I see a bunch of
committers and contributors commenting about *not* removing this code,
so sending a patch to actually remove it looks like a throw into an
abysmal void.I don't think it's as clear as you make it out to be:
In favor:
- David Steele
- Andres Freund
- Stephen Frost, I think
Against:
- Simon Riggs
- Robert Haas
- Michael Paquier
- Andreas Karlsson
- Peter Eisentraut, I thinkWhile that's clearly lopsided, I still think it's pretty darn absurd to
call such a situation "a throw into an abysmal void".
I didn't get the impression that Peter was against, he just thought that
it needed to stand on its own, rather than be justified by the
recovery.conf changes, which I agree with.
Simon rather clearly said that he thinks we should wait until the next
release, which I don't see as being entirely against.
--
-David
david@pgmasters.net
On Wed, Dec 12, 2018 at 1:23 PM David Steele <david@pgmasters.net> wrote:
I didn't get the impression that Peter was against, he just thought that
it needed to stand on its own, rather than be justified by the
recovery.conf changes, which I agree with.Simon rather clearly said that he thinks we should wait until the next
release, which I don't see as being entirely against.
Well, nobody is saying that we should NEVER remove this. The
discussion is about what to do in v12.
Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.
Actually, I hadn't realized until this discussion that the exclusive
backup interface was actually deprecated -- I thought we were just
recommending the new non-exclusive backup interface should be used.
If we're in a rush to remove this (and apparently many of us are), I
think we should make that warning a lot more prominent, maybe copy it
into a few more places, and back-patch the changes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 12, 2018 at 01:31:53PM +0900, Robert Haas wrote:
Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.Actually, I hadn't realized until this discussion that the exclusive
backup interface was actually deprecated -- I thought we were just
recommending the new non-exclusive backup interface should be used.
If we're in a rush to remove this (and apparently many of us are), I
think we should make that warning a lot more prominent, maybe copy it
into a few more places, and back-patch the changes.
Indeed, I definitely agree that we should move away from the exclusive
backup APIs at some point, still it feels very early to do this move as
of now. Showing a bigger warning in the docs would be nice as a first
step, and then we could wait a couple of extra years before doing the
actual move.
By the way, David and others, my apologies for my latest email, I hope
you did not take that badly, my impression of this thread about the
opinions gathered was wrong.
--
Michael
On December 11, 2018 10:10:00 PM PST, Michael Paquier <michael@paquier.xyz> wrote:
By the way, David and others, my apologies for my latest email, I hope
you did not take that badly, my impression of this thread about the
opinions gathered was wrong.
Even if it had been 10:1 it doesn't seem like it'd have been a good approach of an answer.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Wed, 2018-12-12 at 13:31 +0900, Robert Haas wrote:
On Wed, Dec 12, 2018 at 1:23 PM David Steele <david@pgmasters.net> wrote:
I didn't get the impression that Peter was against, he just thought that
it needed to stand on its own, rather than be justified by the
recovery.conf changes, which I agree with.Simon rather clearly said that he thinks we should wait until the next
release, which I don't see as being entirely against.Well, nobody is saying that we should NEVER remove this. The
discussion is about what to do in v12.Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.Actually, I hadn't realized until this discussion that the exclusive
backup interface was actually deprecated -- I thought we were just
recommending the new non-exclusive backup interface should be used.
If we're in a rush to remove this (and apparently many of us are), I
think we should make that warning a lot more prominent, maybe copy it
into a few more places, and back-patch the changes.
+1
I too only learned about this recently, while the problem with exclusive
backups has been known at least since 2008 (c979a1fe), and nobody felt
this to be a terrible problem back then.
I believe that the danger is greatly overrated. It is not like you end
up with a corrupted database after a crash, and you get a pretty helpful
error message. Many people are happy enough to live with that.
I'm on board with deprecating and removing it eventually, but I see no
problem in waiting for the customary 5 years.
And yes, a prominent warning in the next major release notes would be
a good thing.
Yours,
Laurenz Albe
On 12/11/18 11:31 PM, Robert Haas wrote:
On Wed, Dec 12, 2018 at 1:23 PM David Steele <david@pgmasters.net> wrote:
I didn't get the impression that Peter was against, he just thought that
it needed to stand on its own, rather than be justified by the
recovery.conf changes, which I agree with.Simon rather clearly said that he thinks we should wait until the next
release, which I don't see as being entirely against.Well, nobody is saying that we should NEVER remove this. The
discussion is about what to do in v12.
Fair enough.
Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.
I think this case is different because exclusive backups have known issues.
I also think that having two similar but different methods stifles
development in this area and makes the docs harder to improve. There's
a lot of "if this then that else that" in the docs which make them hard
to maintain and harder to read.
Add the fact that we have *zero* tests for exclusive backups. I only
had to refactor one exclusive backup in the tests and since it did not
archive, exclude pg_wal, postmaster.pid, or do anything else our docs
recommend I wouldn't say it qualifies as a real test. Also, it wasn't
even trying to test exclusive backups -- it was a test for logical
replication following timelines.
So yeah, we'd be removing it in three years instead of five, but there
has been a safe in-core alternative since 9.1 (eight years).
Regards,
--
-David
david@pgmasters.net
On Wed, Dec 12, 2018 at 08:22:10AM -0500, David Steele wrote:
Add the fact that we have *zero* tests for exclusive backups. I only
had to refactor one exclusive backup in the tests and since it did not
archive, exclude pg_wal, postmaster.pid, or do anything else our docs
recommend I wouldn't say it qualifies as a real test. Also, it wasn't
even trying to test exclusive backups -- it was a test for logical
replication following timelines.
This point is really right. The TAP tests rely heavily on pg_basebackup
when taking base backups, still there is an interface to be able to take
filesystem-level backups with the exclusive SQL interface. The test
David is referring to here is backup_fs_hot in PostgresNode.pm.
--
Michael
On 12/12/2018 05:31, Robert Haas wrote:
Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.
I was just looking this up as well, and I find it too fast. The
nonexclusive backups were introduced in 9.6. So I'd say that we could
remove the exclusive ones when 9.5 goes EOL. (That would mean this
patch could be submitted for PostgreSQL 13, since 9.5 will go out of
support around the time PG13 would be released.)
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/12/2018 23:24, David Steele wrote:
@@ -845,7 +838,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0 rights to run pg_start_backup (superuser, or a user who has been granted EXECUTE on the function) and issue the command: <programlisting> -SELECT pg_start_backup('label', false, false); +SELECT pg_start_backup('label', false); </programlisting> where <literal>label</literal> is any string you want to use to uniquely identify this backup operation. The connection
Is it good to change the meaning of an existing interface like that?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/12/2018 05:09, David Steele wrote:
I think the patch stands on its own. Exclusive backup mode is broken
and is know to cause problems in the field.
Is this breakage documented anywhere for users?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/13/18 2:02 PM, Peter Eisentraut wrote:
On 12/12/2018 05:09, David Steele wrote:
I think the patch stands on its own. Exclusive backup mode is broken
and is know to cause problems in the field.Is this breakage documented anywhere for users?
Yes:
https://www.postgresql.org/docs/11/continuous-archiving.html
"Note that if the server crashes during the backup it may not be
possible to restart until the backup_label file has been manually
deleted from the PGDATA directory."
Seems like the wording could be stronger.
--
-David
david@pgmasters.net
On 12/13/18 2:00 PM, Peter Eisentraut wrote:
On 11/12/2018 23:24, David Steele wrote:
@@ -845,7 +838,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0 rights to run pg_start_backup (superuser, or a user who has been granted EXECUTE on the function) and issue the command: <programlisting> -SELECT pg_start_backup('label', false, false); +SELECT pg_start_backup('label', false); </programlisting> where <literal>label</literal> is any string you want to use to uniquely identify this backup operation. The connectionIs it good to change the meaning of an existing interface like that?
Good question.
We could leave the third parameter (changing the default to false) and
error if it has any value but false. It's a bit ugly but it does
maintain compatibility with the current non-exclusive backup interface.
And, the third parameter would never need to be specified unless we add
a fourth.
Or we could add a new function and deprecate this one -- but what to
call it?
I agree it's not very cool to break code for users who actually *did*
migrate to non-exclusive backups.
--
-David
david@pgmasters.net
On Thu, Dec 13, 2018 at 2:29 PM David Steele <david@pgmasters.net> wrote:
Good question.
We could leave the third parameter (changing the default to false) and
error if it has any value but false. It's a bit ugly but it does
maintain compatibility with the current non-exclusive backup interface.
And, the third parameter would never need to be specified unless we add
a fourth.Or we could add a new function and deprecate this one -- but what to
call it?I agree it's not very cool to break code for users who actually *did*
migrate to non-exclusive backups.
Uh, yeah. If you do that, you're not just forcing people off the old
interface -- you're making *everybody* update their stuff again. And
having to have conditional logic to handle different releases presents
its own set of dangers. IMHO, the biggest enemy in this area BY FAR is
human error, not the details of how the interface works. No amount of
providing a better-designed interface will compensate for the
confusion factor involved in changing it.
My vote is ... when we actually deprecate this, change nothing at the
SQL level initially, but just throw an error if the "exclusive"
argument isn't false:
ERROR: exclusive backup mode is no longer supported
That will require everyone to use the three-argument form of
pg_start_backup(), but I think that's good, because if we accept the
one and two argument forms, how do we actually know that the user
updated their code? If you want to actually force people to switch to
the non-exclusive mode, you have to make sure that anything that might
be a residual request for exclusive backup mode errors out. If the
same SQL just does something different, the user might fail to notice.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 13/12/2018 20:17, David Steele wrote:
On 12/13/18 2:02 PM, Peter Eisentraut wrote:
On 12/12/2018 05:09, David Steele wrote:
I think the patch stands on its own. Exclusive backup mode is broken
and is know to cause problems in the field.Is this breakage documented anywhere for users?
Yes:
https://www.postgresql.org/docs/11/continuous-archiving.html
"Note that if the server crashes during the backup it may not be
possible to restart until the backup_label file has been manually
deleted from the PGDATA directory."Seems like the wording could be stronger.
I think that is a pretty liberal interpretation of the word "broken".
All this says is "if you do A, and B happens, then you need to do C".
Clearly, not having to do that at all is better, but if this is all
there is to it, then I'm confused by the characterizations of how awful
and terrible this feature is and how we must rush to remove it.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/14/18 3:01 AM, Peter Eisentraut wrote:
On 13/12/2018 20:17, David Steele wrote:
On 12/13/18 2:02 PM, Peter Eisentraut wrote:
On 12/12/2018 05:09, David Steele wrote:
I think the patch stands on its own. Exclusive backup mode is broken
and is know to cause problems in the field.Is this breakage documented anywhere for users?
Yes:
https://www.postgresql.org/docs/11/continuous-archiving.html
"Note that if the server crashes during the backup it may not be
possible to restart until the backup_label file has been manually
deleted from the PGDATA directory."Seems like the wording could be stronger.
I think that is a pretty liberal interpretation of the word "broken".
All this says is "if you do A, and B happens, then you need to do C".Clearly, not having to do that at all is better, but if this is all
there is to it, then I'm confused by the characterizations of how awful
and terrible this feature is and how we must rush to remove it.
If the server crashes during a backup, the server probably won't
restart. We say "may" here but if your backup runs more than a few
checkpoints it's more like won't. The remedy is to go manually delete a
file the user may have never heard of. They are often hesitant to do so
-- for good reason.
The downtime stretches as they call their support company or consult
Google to determine if they should really do this.
The main thing is that *manual* intervention is required to get the
cluster running again. Sure, you could automate it, but now we have
users writing scripts to delete files out of PGDATA (may as well get
tablespace_map as well) -- but hopefully not in the case of a restore
where you actually need those files. And hopefully not anything else
important.
That seems pretty broken to me.
--
-David
david@pgmasters.net
On Fri, Dec 14, 2018 at 8:27 AM David Steele <david@pgmasters.net> wrote:
If the server crashes during a backup, the server probably won't
restart. We say "may" here but if your backup runs more than a few
checkpoints it's more like won't. The remedy is to go manually delete a
file the user may have never heard of. They are often hesitant to do so
-- for good reason.
Sure. So it's broken in the sense that if you don't read and
carefully follow the directions, you will have problems. But that's
also true of an automobile, a power saw, the DELETE command,
antibiotics, and the Internal Revenue Code. Many things in life are
complicated and require due caution, and yet people regularly navigate
things that are VASTLY more complicated than "if the server crashes
during a hot backup, remove this file." Especially because if we
subsequently fail for this reason, we give you a very precise hint
that tells you exactly what to do:
HINT: If you are not restoring from a backup, try removing the file
"<path to $PGDATA goes here>/backup_label"
Now, I grant that emitting a HINT telling you exactly what to do is
not as good as not requiring manual user invention in the first place,
but it's not like we're requiring you to carry out some byzantine
process that nobody can possible understand. The "if" condition here
is one that any DBA should be able to understand: am I, or am I not,
in the process of restoring a backup? The system does not know, but
the DBA should. And once you know that the answer is "no, I'm not
restoring a backup", then you just remove the exact pathname indicated
and try it again and everything works fine.
Peter's complaint -- that this is a pretty liberal reading of the word
"broken" -- is IMHO very well put. Saying that something is broken
means it doesn't work. But the exclusive backup mode works fine if
used according to the directions. The fact that people often get
confused and don't follow the directions, and that the directions tend
to end up requiring manual intervention after a system crash, is
unfortunate, and it is why we have a new API. But "that thing isn't
as well designed as it could've been" is not the same thing as "that
thing does not work when used as designed," no matter how much you
keep insisting the contrary. It does work. It has worked for a long
time. Many people have used it successfully. It was the ONLY way of
doing this for many years. There is no necessary reason why it has to
be a huge problem, and I don't think it is.
In my experience with EnterpriseDB customers, this IS an occasional
source of confusion, but there are other things that are a problem a
lot more frequently, like work_mem, which you often can't set high
enough to get good query performance without causing OOM events when
the system gets busy. I'd hesitate to call work_mem broken, but a GUC
for which no single value both adequately protects against OOM and
gives workable performance is closer to broken than an exclusive
backup mode, which actually does work - albeit with occasional manual
intervention - if you read and follow the directions.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/12/18 4:44 AM, Andres Freund wrote:
On 2018-12-12 11:32:55 +0900, Michael Paquier wrote:
+1 on all that. Per the trend of this thread, I see a bunch of
committers and contributors commenting about *not* removing this code,
so sending a patch to actually remove it looks like a throw into an
abysmal void.I don't think it's as clear as you make it out to be:
In favor:
- David Steele
- Andres Freund
- Stephen Frost, I think
Against:
- Simon Riggs
- Robert Haas
- Michael Paquier
- Andreas Karlsson
- Peter Eisentraut, I thinkWhile that's clearly lopsided, I still think it's pretty darn absurd to
call such a situation "a throw into an abysmal void".
The way I read the discussion, there is no consensus for getting this
into PG12, but there does seem to be one for PG13.
I'll push this to the first post-PG12 CF when one appears. Can we just
go ahead and create a 2019-07 CF now? I know we generally discuss this
at PGCon, but we can always rename it, right?
--
-David
david@pgmasters.net
On Wed, Dec 19, 2018 at 5:38 PM David Steele <david@pgmasters.net> wrote:
On 12/12/18 4:44 AM, Andres Freund wrote:
On 2018-12-12 11:32:55 +0900, Michael Paquier wrote:
+1 on all that. Per the trend of this thread, I see a bunch of
committers and contributors commenting about *not* removing this code,
so sending a patch to actually remove it looks like a throw into an
abysmal void.I don't think it's as clear as you make it out to be:
In favor:
- David Steele
- Andres Freund
- Stephen Frost, I think
Against:
- Simon Riggs
- Robert Haas
- Michael Paquier
- Andreas Karlsson
- Peter Eisentraut, I thinkWhile that's clearly lopsided, I still think it's pretty darn absurd to
call such a situation "a throw into an abysmal void".The way I read the discussion, there is no consensus for getting this
into PG12, but there does seem to be one for PG13.I'll push this to the first post-PG12 CF when one appears. Can we just
go ahead and create a 2019-07 CF now? I know we generally discuss this
at PGCon, but we can always rename it, right?
We can, I guess. Are you saying you're likely to forget? :P
For 12, it would probably be good to have a more clear deprecation notice
into the documentations. Is that something on your list of things to work
on, or should someone else (like, uh, me)?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 12/19/18 9:02 PM, Magnus Hagander wrote:
On Wed, Dec 19, 2018 at 5:38 PM David Steele <david@pgmasters.net
I'll push this to the first post-PG12 CF when one appears. Can we just
go ahead and create a 2019-07 CF now? I know we generally discuss this
at PGCon, but we can always rename it, right?We can, I guess. Are you saying you're likely to forget? :P
Not likely to forget, but I figured it's not good to have it in at the
start of the next CF. People might think they can review it!
For 12, it would probably be good to have a more clear deprecation
notice into the documentations. Is that something on your list of things
to work on, or should someone else (like, uh, me)?
I'm planning to do that -- I meant to mention that. I may not have time
until after the new year, but I doubt it's something that will cause a
big fuss.
--
-David
david@pgmasters.net
On Wed, Dec 19, 2018 at 06:38:00PM +0200, David Steele wrote:
I'll push this to the first post-PG12 CF when one appears. Can we just go
ahead and create a 2019-07 CF now? I know we generally discuss this at
PGCon, but we can always rename it, right?
The schedule gets decided at the developer meeting, still the commit
fests created can have their name and start/end dates modified as needed
So, I think that it is not an issue to add one ahead now. And here you
go:
https://commitfest.postgresql.org/23/
This request was going to pop out at some point anyway in the follow-up
months.
--
Michael
On 12/20/18 1:35 AM, Michael Paquier wrote:
On Wed, Dec 19, 2018 at 06:38:00PM +0200, David Steele wrote:
I'll push this to the first post-PG12 CF when one appears. Can we just go
ahead and create a 2019-07 CF now? I know we generally discuss this at
PGCon, but we can always rename it, right?The schedule gets decided at the developer meeting, still the commit
fests created can have their name and start/end dates modified as needed
So, I think that it is not an issue to add one ahead now. And here you
go:
https://commitfest.postgresql.org/23/This request was going to pop out at some point anyway in the follow-up
months.
Thanks, Michael. Getting this error, not sure what it means:
Cannot move patch to the same commitfest, and multiple future
commitfests exist!
--
-David
david@pgmasters.net
On Thu, Dec 20, 2018 at 12:29:48PM +0200, David Steele wrote:
Cannot move patch to the same commitfest, and multiple future commitfests
exist!
I am not sure what it means either. What if you just mark the existing
entry as returned with feedback, and create a new one ahead?
--
Michael
On 12/21/18 2:01 AM, Michael Paquier wrote:
On Thu, Dec 20, 2018 at 12:29:48PM +0200, David Steele wrote:
Cannot move patch to the same commitfest, and multiple future commitfests
exist!I am not sure what it means either. What if you just mark the existing
entry as returned with feedback, and create a new one ahead?
Yeah, I can do that, but it would be nice to know why this is not
working. It would also be nice to preserve the history.
Magnus?
--
-David
david@pgmasters.net
On Fri, Dec 21, 2018 at 1:18 AM David Steele <david@pgmasters.net> wrote:
On 12/21/18 2:01 AM, Michael Paquier wrote:
On Thu, Dec 20, 2018 at 12:29:48PM +0200, David Steele wrote:
Cannot move patch to the same commitfest, and multiple future commitfests
exist!I am not sure what it means either. What if you just mark the existing
entry as returned with feedback, and create a new one ahead?Yeah, I can do that, but it would be nice to know why this is not
working. It would also be nice to preserve the history.Magnus?
I think what it means is that it doesn't know which CommitFest is the
"next" one. If the patch were currently in the in-progress
CommitFest, the Open one (if any) would be the next one. Otherwise,
if there were only one Future CommitFest, that would be the next one.
But right now you're moving a patch that is already in the Open
CommitFest and there are two 2 Future CommitFests, so it doesn't know
which one to pick.
I think the old CommitFest application let you set the CommitFest via
the Edit screen also, so you could pick a specific CommitFest to
target. But that doesn't seem to exist in Magnus's version.
If you just wait until the Open CommitFest is marked In Progress and
the next one is marked Open, it should work in any case.
I think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/21/18 6:43 PM, Robert Haas wrote:
If you just wait until the Open CommitFest is marked In Progress and
the next one is marked Open, it should work in any case.
I'll give that a try since it's only a few more days.
I think.
Well, it's an uncertain world.
--
-David
david@pgmasters.net
On Fri, Dec 21, 2018 at 5:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 21, 2018 at 1:18 AM David Steele <david@pgmasters.net> wrote:
On 12/21/18 2:01 AM, Michael Paquier wrote:
On Thu, Dec 20, 2018 at 12:29:48PM +0200, David Steele wrote:
Cannot move patch to the same commitfest, and multiple future
commitfests
exist!
I am not sure what it means either. What if you just mark the existing
entry as returned with feedback, and create a new one ahead?Yeah, I can do that, but it would be nice to know why this is not
working. It would also be nice to preserve the history.Magnus?
I think what it means is that it doesn't know which CommitFest is the
"next" one. If the patch were currently in the in-progress
CommitFest, the Open one (if any) would be the next one. Otherwise,
if there were only one Future CommitFest, that would be the next one.
But right now you're moving a patch that is already in the Open
CommitFest and there are two 2 Future CommitFests, so it doesn't know
which one to pick.I think the old CommitFest application let you set the CommitFest via
the Edit screen also, so you could pick a specific CommitFest to
target. But that doesn't seem to exist in Magnus's version.
That was a specifically requested change, and IIRC has been discussed
before whenever people run into it :)
The basics is, there is not supposed to be more than one Future commitfest,
in order to keep the workflow as simple as possible. When there is, all
sorts of bad things happen. What we really should have at that point is a
blocker so you can't *create* more than one of them, but right now it
naively assumes nobody does...
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 12/27/18 6:53 PM, Magnus Hagander wrote:
On Fri, Dec 21, 2018 at 5:43 PM Robert Haas <robertmhaas@gmail.com
I think the old CommitFest application let you set the CommitFest via
the Edit screen also, so you could pick a specific CommitFest to
target. But that doesn't seem to exist in Magnus's version.The basics is, there is not supposed to be more than one Future
commitfest, in order to keep the workflow as simple as possible. When
there is, all sorts of bad things happen. What we really should have at
that point is a blocker so you can't *create* more than one of them, but
right now it naively assumes nobody does...
Which is ironic since I specifically requested that one be created.
--
-David
david@pgmasters.net
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
I too only learned about this recently, while the problem with exclusive
backups has been known at least since 2008 (c979a1fe), and nobody felt
this to be a terrible problem back then.
My recollection was that back then it wasn't clear what to *do* about
the problem. Once we had a solution (non-exclusive backups) we
deprecated the exclusive backup mechanism.
I believe that the danger is greatly overrated. It is not like you end
up with a corrupted database after a crash, and you get a pretty helpful
error message. Many people are happy enough to live with that.
Unfortunately, people get corrupted databases all the time because the
backups are corrupted. The problem with databases not restarting
correctly is another issue with the exclusive backup method but it's
certainly not the only one.
I'm on board with deprecating and removing it eventually, but I see no
problem in waiting for the customary 5 years.
We don't actually have a 'customary 5 years' rule of any sort.
And yes, a prominent warning in the next major release notes would be
a good thing.
This is two years too late, at best. That is, maybe it would have made
sense to highlight that the exclusive backup method was deprecated when
it was made so, but that ship has sailed. Further, we don't put out
announcements saying "we're going to remove X in the next release" and I
don't see it making sense to start now.
Thanks!
Stephen
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
On 12/12/2018 05:31, Robert Haas wrote:
Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.I was just looking this up as well, and I find it too fast. The
nonexclusive backups were introduced in 9.6. So I'd say that we could
remove the exclusive ones when 9.5 goes EOL. (That would mean this
patch could be submitted for PostgreSQL 13, since 9.5 will go out of
support around the time PG13 would be released.)
I don't agree with either the notion that we have to wait 5 years in
this case or that we've only had a good alternative to the exclusive
backup mode since 9.5 as we've had pg_basebackup since 9.1.
Thanks,
Stephen
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
Clearly, not having to do that at all is better, but if this is all
there is to it, then I'm confused by the characterizations of how awful
and terrible this feature is and how we must rush to remove it.
It's not all there is to it though.
This issue leads to extended downtime regularly and is definitely a huge
'gotcha' for users, even if you don't want to call it outright broken,
but the other issue is that our documentation is ridiculously
complicated around how to do a backup properly because of this and that
also leads to the reality that it's difficult to make improvements to it
because every sentence has to consider both methods, and that's really
assuming that users actively read through the detailed documentation
instead of just looking at those nice simple 'pg_start_backup' and
'pg_stop_backup' methods and use them thinking that's all that's
required.
We see this *all* the time, on the lists, in blog posts (even those from
somewhat reputable companies...), and in other places. The exclusive
backup method is a huge foot-gun for new users to PostgreSQL and leads
to downtime, corrupt backups, and then corrupt databases when backups
get restored.
It really does need to go, and the sooner, the better.
Thanks,
Stephen
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Thu, Dec 13, 2018 at 2:29 PM David Steele <david@pgmasters.net> wrote:
We could leave the third parameter (changing the default to false) and
error if it has any value but false. It's a bit ugly but it does
maintain compatibility with the current non-exclusive backup interface.
And, the third parameter would never need to be specified unless we add
a fourth.Or we could add a new function and deprecate this one -- but what to
call it?
... and would people really realize that they have to do more than just
change the function name? I'm not sure.
I agree it's not very cool to break code for users who actually *did*
migrate to non-exclusive backups.
This bothers me much less. We change our APIs between major releases
from time-to-time and people need to update their code to handle that.
Uh, yeah. If you do that, you're not just forcing people off the old
interface -- you're making *everybody* update their stuff again. And
having to have conditional logic to handle different releases presents
its own set of dangers. IMHO, the biggest enemy in this area BY FAR is
human error, not the details of how the interface works. No amount of
providing a better-designed interface will compensate for the
confusion factor involved in changing it.
I don't agree with the argument that conditional logic for different
releases is an issue- we already require people to have that all over
the place and, while it's a bit annoying, it's part and parcel for
working with multiple major versions of PG.
My vote is ... when we actually deprecate this, change nothing at the
SQL level initially, but just throw an error if the "exclusive"
argument isn't false:
ERROR: exclusive backup mode is no longer supported
That will require everyone to use the three-argument form of
pg_start_backup(), but I think that's good, because if we accept the
one and two argument forms, how do we actually know that the user
updated their code? If you want to actually force people to switch to
the non-exclusive mode, you have to make sure that anything that might
be a residual request for exclusive backup mode errors out. If the
same SQL just does something different, the user might fail to notice.
This is an argument that I can agree with though. We certainly don't
want existing users of the exclusive mode to think "everything is fine!"
after it's been ripped out. I hate having to leave old API cruft around
like an argument that's required to be passed in and required to have a
specific value but it does seem the best way to make it extremely clear
to users of the exclusive mode that they really have to adjust their
code.
Perhaps in some future version we'll redefine the API more broadly and
be able to rip out this API entirely, throwing away such old cruft as we
do it, but until then I think we can live with it.
Thanks!
Stephen
Stephen Frost wrote:
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
Clearly, not having to do that at all is better, but if this is all
there is to it, then I'm confused by the characterizations of how awful
and terrible this feature is and how we must rush to remove it.It's not all there is to it though.
This issue leads to extended downtime regularly and is definitely a huge
'gotcha' for users, even if you don't want to call it outright broken,
Only if PostgreSQL crashes regularly, right?
Yours,
Laurenz Albe
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
Stephen Frost wrote:
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
Clearly, not having to do that at all is better, but if this is all
there is to it, then I'm confused by the characterizations of how awful
and terrible this feature is and how we must rush to remove it.It's not all there is to it though.
This issue leads to extended downtime regularly and is definitely a huge
'gotcha' for users, even if you don't want to call it outright broken,Only if PostgreSQL crashes regularly, right?
This happens anytime a backup is in progress and the system crashes in
any way- PostgreSQL going down, the container Postgres is running in,
the server Postgres is running on, or if the filesystem that Postgres is
running on goes away, etc.
There's certainly no shortage of cases where this can happen.
Thanks!
Stephen
On 2019-01-05 13:19:20 -0500, Stephen Frost wrote:
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
On 12/12/2018 05:31, Robert Haas wrote:
Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.I was just looking this up as well, and I find it too fast. The
nonexclusive backups were introduced in 9.6. So I'd say that we could
remove the exclusive ones when 9.5 goes EOL. (That would mean this
patch could be submitted for PostgreSQL 13, since 9.5 will go out of
support around the time PG13 would be released.)I don't agree with either the notion that we have to wait 5 years in
this case or that we've only had a good alternative to the exclusive
backup mode since 9.5 as we've had pg_basebackup since 9.1.
I don't agree with a general 5 year deprecation window either. But it
seems pretty clear that there's no majority for removing exclusive
backups in v12. I think it'd be good to make the warning about its
impending death more explicit, but otherwise mark this CF entry either
as rejected or returned with feedback.
Greetings,
Andres Freund
On 2/16/19 5:57 AM, Andres Freund wrote:
On 2019-01-05 13:19:20 -0500, Stephen Frost wrote:
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
On 12/12/2018 05:31, Robert Haas wrote:
Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.I was just looking this up as well, and I find it too fast. The
nonexclusive backups were introduced in 9.6. So I'd say that we could
remove the exclusive ones when 9.5 goes EOL. (That would mean this
patch could be submitted for PostgreSQL 13, since 9.5 will go out of
support around the time PG13 would be released.)I don't agree with either the notion that we have to wait 5 years in
this case or that we've only had a good alternative to the exclusive
backup mode since 9.5 as we've had pg_basebackup since 9.1.I don't agree with a general 5 year deprecation window either. But it
seems pretty clear that there's no majority for removing exclusive
backups in v12. I think it'd be good to make the warning about its
impending death more explicit, but otherwise mark this CF entry either
as rejected or returned with feedback.
I think there is support for the patch in PG13 so I was planning to move
it out of the March CF to the first PG13 CF as soon as the app will
allow it, i.e., once there is only a single open CF.
Thanks,
--
-David
david@pgmasters.net
On Mon, Feb 18, 2019 at 6:13 AM David Steele <david@pgmasters.net> wrote:
On 2/16/19 5:57 AM, Andres Freund wrote:
On 2019-01-05 13:19:20 -0500, Stephen Frost wrote:
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
On 12/12/2018 05:31, Robert Haas wrote:
Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.I was just looking this up as well, and I find it too fast. The
nonexclusive backups were introduced in 9.6. So I'd say that we could
remove the exclusive ones when 9.5 goes EOL. (That would mean this
patch could be submitted for PostgreSQL 13, since 9.5 will go out of
support around the time PG13 would be released.)I don't agree with either the notion that we have to wait 5 years in
this case or that we've only had a good alternative to the exclusive
backup mode since 9.5 as we've had pg_basebackup since 9.1.I don't agree with a general 5 year deprecation window either. But it
seems pretty clear that there's no majority for removing exclusive
backups in v12. I think it'd be good to make the warning about its
impending death more explicit, but otherwise mark this CF entry either
as rejected or returned with feedback.I think there is support for the patch in PG13 so I was planning to move
it out of the March CF to the first PG13 CF as soon as the app will
allow it, i.e., once there is only a single open CF.
Agreed, and I think we should also update the documentation for 12 with the
suggested more explicit mention of the deprecation.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Greetings,
* Magnus Hagander (magnus@hagander.net) wrote:
On Mon, Feb 18, 2019 at 6:13 AM David Steele <david@pgmasters.net> wrote:
On 2/16/19 5:57 AM, Andres Freund wrote:
On 2019-01-05 13:19:20 -0500, Stephen Frost wrote:
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
On 12/12/2018 05:31, Robert Haas wrote:
Most of the features I've been involved in removing have been
deprecated for 5+ years. The first release where this one was
deprecated was only 2 years ago. So it feels dramatically faster to
me than what I think we have typically done.I was just looking this up as well, and I find it too fast. The
nonexclusive backups were introduced in 9.6. So I'd say that we could
remove the exclusive ones when 9.5 goes EOL. (That would mean this
patch could be submitted for PostgreSQL 13, since 9.5 will go out of
support around the time PG13 would be released.)I don't agree with either the notion that we have to wait 5 years in
this case or that we've only had a good alternative to the exclusive
backup mode since 9.5 as we've had pg_basebackup since 9.1.I don't agree with a general 5 year deprecation window either. But it
seems pretty clear that there's no majority for removing exclusive
backups in v12. I think it'd be good to make the warning about its
impending death more explicit, but otherwise mark this CF entry either
as rejected or returned with feedback.I think there is support for the patch in PG13 so I was planning to move
it out of the March CF to the first PG13 CF as soon as the app will
allow it, i.e., once there is only a single open CF.Agreed, and I think we should also update the documentation for 12 with the
suggested more explicit mention of the deprecation.
+1.
Thanks!
Stephen
On 2/22/19 6:32 PM, Stephen Frost wrote:
* Magnus Hagander (magnus@hagander.net) wrote:
On Mon, Feb 18, 2019 at 6:13 AM David Steele <david@pgmasters.net> wrote:
On 2/16/19 5:57 AM, Andres Freund wrote:
I don't agree with a general 5 year deprecation window either. But it
seems pretty clear that there's no majority for removing exclusive
backups in v12. I think it'd be good to make the warning about its
impending death more explicit, but otherwise mark this CF entry either
as rejected or returned with feedback.I think there is support for the patch in PG13 so I was planning to move
it out of the March CF to the first PG13 CF as soon as the app will
allow it, i.e., once there is only a single open CF.Agreed, and I think we should also update the documentation for 12 with the
suggested more explicit mention of the deprecation.+1.
I'm working on this for the March CF. Ideally we would back-patch it to
9.6.
--
-David
david@pgmasters.net
On Tue, Nov 27, 2018 at 12:13 PM David Steele <david@pgmasters.net> wrote:
Hackers,
I propose we remove the deprecated exclusive backup mode of
pg_start_backup()/pg_stop_backup() for Postgres 12.
-1 for the removal. I think that there are still many users of an exclusive
backup API, and it's not so easy to migrate their backup scripts so that
only non-exclusive one is used because of the reason I wrote in another thread.
/messages/by-id/CAHGQGwHUkEbkVexVfWNLjmq2rzOS_SHYMiECt+KBn-cBPq5Arg@mail.gmail.com
Regards,
--
Fujii Masao
On Fri, Feb 22, 2019 at 12:35 PM Fujii Masao <masao.fujii@gmail.com> wrote:
-1 for the removal. I think that there are still many users of an exclusive
backup API, and it's not so easy to migrate their backup scripts so that
only non-exclusive one is used because of the reason I wrote in another thread.
/messages/by-id/CAHGQGwHUkEbkVexVfWNLjmq2rzOS_SHYMiECt+KBn-cBPq5Arg@mail.gmail.com
Yeah, those are interesting points. Unfortunately, I think something
as simple as your proposed...
psql -c "SELECT pg_start_backup()"
rsync, cp, tar, storage backup, or something
psql -c "SELECT pg_stop_backup()"
...doesn't have much chance of being correct. If you aren't checking
whether pg_start_backup() throws an error, for example, you have got a
very failure-prone set-up. That being said, it's possible to fix that
problem using some shell logic, whereas the problem of keeping a
connection open for the duration of the backup from the shell is much
harder. I recently ran across a case where someone attempted it but
did not get the logic entirely right, with rather painful
consequences.
Now you might say that people should not write their own tools and
just use professionally produced ones. But I don't really agree with
that, and whatever we think people SHOULD do, there are in fact a lot
of people using their own tools.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Fri, Feb 22, 2019 at 12:35 PM Fujii Masao <masao.fujii@gmail.com> wrote:
-1 for the removal. I think that there are still many users of an exclusive
backup API, and it's not so easy to migrate their backup scripts so that
only non-exclusive one is used because of the reason I wrote in another thread.
/messages/by-id/CAHGQGwHUkEbkVexVfWNLjmq2rzOS_SHYMiECt+KBn-cBPq5Arg@mail.gmail.comYeah, those are interesting points. Unfortunately, I think something
as simple as your proposed...psql -c "SELECT pg_start_backup()"
rsync, cp, tar, storage backup, or something
psql -c "SELECT pg_stop_backup()"...doesn't have much chance of being correct. If you aren't checking
whether pg_start_backup() throws an error, for example, you have got a
very failure-prone set-up. That being said, it's possible to fix that
problem using some shell logic, whereas the problem of keeping a
connection open for the duration of the backup from the shell is much
harder. I recently ran across a case where someone attempted it but
did not get the logic entirely right, with rather painful
consequences.Now you might say that people should not write their own tools and
just use professionally produced ones. But I don't really agree with
that, and whatever we think people SHOULD do, there are in fact a lot
of people using their own tools.
Yes, they are- but I'd argue that's specifically because the
old/deprecated/broken API makes it *look* safe, even though it isn't.
Getting a solid and resiliant backup to work from a shell script is, imv
anyway (though I might have a bit of experience, having tried numerous
times myself and then realizing that it just isn't practical...), a
downright fool's errand.
Perhaps there's some way we can fix that so that people can write a lazy
rsync-based backup script and use it, but I don't see anyone seriously
working on that and I don't think that's because they just really like
to make things difficult for people but rather because it really is
*hard* to do, if not downright impossible.
What we need is a backup tool included in core that users feel
comfortable using instead of trying to write their own. I don't think
they write their own because it's somehow better than the existing tools
or because they really think it's a good idea- it's just what looks like
the "thing to do" based on our documentation and the API we provide
today. Sure, pg_basebackup is good for smaller environments and I
always encourage people to use it when they want something that's in
core that is safe to use and that does things correctly, but it's got
some pretty serious limitations due to using the replication protocol.
Thanks!
Stephen
On Feb 22, 2019, at 15:18, Stephen Frost <sfrost@snowman.net> wrote:
Getting a solid and resiliant backup to work from a shell script is, imv
anyway (though I might have a bit of experience, having tried numerous
times myself and then realizing that it just isn't practical...), a
downright fool's errand.
The reality, though, is that there are a lot of organizations who have invested time and effort into getting a backup strategy working using the existing APIs, and there will be quite a bit of pushback against the version in which the existing exclusive API is removed.
Some of those will be able to move to non-exclusive backups easily; others won't. For the ones that can't move easily, the reaction will not be, "PostgreSQL version x has a safer backup API"; it will be "PostgreSQL version x broke our backups, so we're not upgrading to it."
Rather than deprecate the existing API, I'd rather see the documentation updated to discuss the danger cases.
--
-- Christophe Pettus
xof@thebuild.com
Greetings,
* Christophe Pettus (xof@thebuild.com) wrote:
On Feb 22, 2019, at 15:18, Stephen Frost <sfrost@snowman.net> wrote:
Getting a solid and resiliant backup to work from a shell script is, imv
anyway (though I might have a bit of experience, having tried numerous
times myself and then realizing that it just isn't practical...), a
downright fool's errand.The reality, though, is that there are a lot of organizations who have invested time and effort into getting a backup strategy working using the existing APIs, and there will be quite a bit of pushback against the version in which the existing exclusive API is removed.
Do they realize how that existing backup strategy is flawed? I doubt
it, and when they discover how it's broken, I don't think they're going
to be thanking us for letting them continue to use it.
Some of those will be able to move to non-exclusive backups easily; others won't. For the ones that can't move easily, the reaction will not be, "PostgreSQL version x has a safer backup API"; it will be "PostgreSQL version x broke our backups, so we're not upgrading to it."
We don't cater to this line of argument when it comes to breaking
changes in the backend, or when we break monitoring scripts, and I don't
see a reason why we should do so here. They aren't required to upgrade
immediately- we provide 5 *years* of support for major versions which we
release, and we're going to give advance warning of this change that's
even beyond that 5 years. I have essentially zero sympathy for
organizations which refuse to allocate even the bit of time necessary to
address this change for over 5+ years.
Rather than deprecate the existing API, I'd rather see the documentation updated to discuss the danger cases.
Ok, then please do so, and please be prepared to continue to maintain
the documentation of both methods moving forward, because others have
tried and have (rightfully, in my opinion) decided that it's frankly not
worth the effort and ultimately just terribly confusing for users that
we have these two different backup methods and even just updating the
documentation for one or the other is downright painful (to the point
that people litterally give up on it). That really isn't a good place
to be in.
Thanks!
Stephen
Hi,,
On 2019-02-24 11:52:54 -0800, Christophe Pettus wrote:
On Feb 22, 2019, at 15:18, Stephen Frost <sfrost@snowman.net> wrote:
Getting a solid and resiliant backup to work from a shell script is, imv
anyway (though I might have a bit of experience, having tried numerous
times myself and then realizing that it just isn't practical...), a
downright fool's errand.The reality, though, is that there are a lot of organizations who have
invested time and effort into getting a backup strategy working using
the existing APIs, and there will be quite a bit of pushback against
the version in which the existing exclusive API is removed.Some of those will be able to move to non-exclusive backups easily;
others won't. For the ones that can't move easily, the reaction will
not be, "PostgreSQL version x has a safer backup API"; it will be
"PostgreSQL version x broke our backups, so we're not upgrading to
it."
I think it also depends on how we remove exclusive backups. If we do it
by adding a 'pg_run_hot_backup --conection-options -- shell_command_to_copy'
it ought to be pretty simple to fix scripts. Whereas right now, with the
requirement to manually have a postgres connection alive, which is
annoying to do from shell, is not the case.
I think if we'd introduced a command like that, we'd have seen a higher
adoption of non-exclusive backups.
Rather than deprecate the existing API, I'd rather see the documentation updated to discuss the danger cases.
You can't work around some of danger cases (crash in the wrong moment,
your database doesn't come up properly), so I don't think docs really
address the problem.
Greetings,
Andres Freund
On Feb 24, 2019, at 12:00, Stephen Frost <sfrost@snowman.net> wrote:
Do they realize how that existing backup strategy is flawed?
Undoubtedly, some do, some don't. However, given that it has been the *only* backup API for a very long time, many organizations have spent a lot of time closing all of the holes.
It's not impossible to do safe backups with the existing API. Unquestionably, there are installations doing backups that might end up with a silently badly one, but it's entirely possible to do that unsafely (in many of the same ways) with the non-exclusively API.
The installations that need to fix the scripts are also exactly the ones that can't use pg_basebackup or another pre-packaged solution, usually because they have a specific way of taking the file system copy (SAN snapshot, etc.) that those don't support.
We don't cater to this line of argument when it comes to breaking
changes in the backend, or when we break monitoring scripts, and I don't
see a reason why we should do so here.
Those cases are not analogous.
1. Backend APIs are declared non-stable, and do not have a wide audience compared to backing up the database.
2. Monitoring scripts, while important, are not as critical as the backup system. (And, in fact, I didn't agree with breaking those views either, but that's another discussion.)
Ok, then please do so, and please be prepared to continue to maintain
the documentation of both methods moving forward, because others have
tried and have (rightfully, in my opinion) decided that it's frankly not
worth the effort and ultimately just terribly confusing for users that
we have these two different backup methods and even just updating the
documentation for one or the other is downright painful (to the point
that people litterally give up on it).
We're going to have to do that anyway. For as long as we are maintaining the documentation on a version that has both APIs, we're going to have to say, "Don't use this one, and *here's why*." Saying, "Don't use this one because we said so" when it is an API of long standing that works just as it always did isn't going to cut it.
--
-- Christophe Pettus
xof@thebuild.com
Greetings,
* Christophe Pettus (xof@thebuild.com) wrote:
On Feb 24, 2019, at 12:00, Stephen Frost <sfrost@snowman.net> wrote:
Do they realize how that existing backup strategy is flawed?
Undoubtedly, some do, some don't. However, given that it has been the *only* backup API for a very long time, many organizations have spent a lot of time closing all of the holes.
No, it *hasn't* been the only backup API for a long time- that was only
true up until 9.6 was released, since then we've had both, and it's made
everything a downright mess because of exactly these arguments.
It's not impossible to do safe backups with the existing API. Unquestionably, there are installations doing backups that might end up with a silently badly one, but it's entirely possible to do that unsafely (in many of the same ways) with the non-exclusively API.
Yes, it *is* impossible to do safe backups with the existing API. There
is an unquestionable race condition where a system restart will cause
your system to not come back up without you going in and removing the
backup_label file- and the only way you make that race window small is
to remove the backup_label file right after you run pg_start_backup and
copy it, and then PUT IT BACK at the end before you call pg_stop_backup,
which is insane, but otherwise the 'race window' is the ENTIRE length of
the backup.
This is the reason the non-exclusive backup mode *exists*. All of the
effort that went into building the non-exclusive backup mode wasn't done
just for the fun of it.
The installations that need to fix the scripts are also exactly the ones that can't use pg_basebackup or another pre-packaged solution, usually because they have a specific way of taking the file system copy (SAN snapshot, etc.) that those don't support.
I'm not going to bother going into the issues that SAN snapshots have
here, because it's largely orthogonal to this discussion. What I'll say
is that they'll have over 5 years to adjust those scripts and I don't
see an issue with that.
We don't cater to this line of argument when it comes to breaking
changes in the backend, or when we break monitoring scripts, and I don't
see a reason why we should do so here.Those cases are not analogous.
I disagree *entirely*.
1. Backend APIs are declared non-stable, and do not have a wide audience compared to backing up the database.
*ALL* of our APIs are declared non-stable between major releases.
That's why we have *major* and *minor* releases.
2. Monitoring scripts, while important, are not as critical as the backup system. (And, in fact, I didn't agree with breaking those views either, but that's another discussion.)
Agreed, we should care even more about making sure that the tools we
provide to perform a backup are reliable, which is a good reason to
remove the broken and deprecated exclusive API.
Ok, then please do so, and please be prepared to continue to maintain
the documentation of both methods moving forward, because others have
tried and have (rightfully, in my opinion) decided that it's frankly not
worth the effort and ultimately just terribly confusing for users that
we have these two different backup methods and even just updating the
documentation for one or the other is downright painful (to the point
that people litterally give up on it).We're going to have to do that anyway. For as long as we are maintaining the documentation on a version that has both APIs, we're going to have to say, "Don't use this one, and *here's why*." Saying, "Don't use this one because we said so" when it is an API of long standing that works just as it always did isn't going to cut it.
We have existing documentation for the versions that have both APIs. It
sucks, it's horribly complicated, and no one wants to work on it because
of that.
Sure, it'd be nice if someone would fix it to list out the problems with
the exclusive API, but that requires someone wanting to spend time on
it. If that's not going to happen, I certainly don't think that means
we should *keep* it.
THanks!
Stephen
Andres Freund wrote:
On 2019-02-24 11:52:54 -0800, Christophe Pettus wrote:
On Feb 22, 2019, at 15:18, Stephen Frost <sfrost@snowman.net> wrote:
Getting a solid and resiliant backup to work from a shell script is, imv
anyway (though I might have a bit of experience, having tried numerous
times myself and then realizing that it just isn't practical...), a
downright fool's errand.
Here is my run with the fool's cap on:
https://github.com/cybertec-postgresql/safe-backup
The reality, though, is that there are a lot of organizations who have
invested time and effort into getting a backup strategy working using
the existing APIs, and there will be quite a bit of pushback against
the version in which the existing exclusive API is removed.Some of those will be able to move to non-exclusive backups easily;
others won't. For the ones that can't move easily, the reaction will
not be, "PostgreSQL version x has a safer backup API"; it will be
"PostgreSQL version x broke our backups, so we're not upgrading to
it."
From the reactions I see in the field, I'd agree.
I think it also depends on how we remove exclusive backups. If we do it
by adding a 'pg_run_hot_backup --conection-options -- shell_command_to_copy'
it ought to be pretty simple to fix scripts. Whereas right now, with the
requirement to manually have a postgres connection alive, which is
annoying to do from shell, is not the case.I think if we'd introduced a command like that, we'd have seen a higher
adoption of non-exclusive backups.
I think that is a good idea and will do the trick for many people.
It will be harder for those whose backup solution is driven by
a central backup software that backs up the file system and just offers
"pre-backup" and "post-backup" hooks.
Yours,
Laurenz Albe
Stephen Frost wrote:
Yes, it *is* impossible to do safe backups with the existing API. There
is an unquestionable race condition where a system restart will cause
your system to not come back up without you going in and removing the
backup_label file- and the only way you make that race window small is
to remove the backup_label file right after you run pg_start_backup and
copy it, and then PUT IT BACK at the end before you call pg_stop_backup,
which is insane, but otherwise the 'race window' is the ENTIRE length of
the backup.
I just have an idea:
What about an option to keep WAL around for the duration of an exclusive backup?
That way PostgreSQL can still restart after a crash. It will take longer than
expected, but it will work. But then, perhaps the long recovery time is only
marginally better than having to manually delete the backup_label file...
Yours,
Laurenz Albe
On Feb 24, 2019, at 13:00, Stephen Frost <sfrost@snowman.net> wrote:
No, it *hasn't* been the only backup API for a long time- that was only
true up until 9.6 was released, since then we've had both, and it's made
everything a downright mess because of exactly these arguments.
9.6 has been out for about 2.5 years. How long was the old version of pg_start_backup() the only interface? Considerably longer.
Yes, it *is* impossible to do safe backups with the existing API.
That's an overstatement. "The existing interface has failure conditions that are hard to work around" is true. Saying "it's impossible to do safe backups" implies that no pre-9.6 system has ever been backed up. If we take that line, it's just going to create a "Oh, come *on*" reaction from users.
What I'll say
is that they'll have over 5 years to adjust those scripts and I don't
see an issue with that.
That's not quite the situation. What they will have is a choice between upgrading, and or their existing scripts continuing to work. That choice will start ticking the instant a release without the old interface comes out. The back-pressure on upgrading is already very high; this is going to put a lot of installations into a bind, and a lot will make the decision not to upgrade because of that.
*ALL* of our APIs are declared non-stable between major releases.
That's why we have *major* and *minor* releases.
I note that a word game is being played here. We're calling pg_start_backup() "an API", as if it is the same as, say, the GiST C API, but it's a SQL level construct. SQL level constructs are not promises we need to keep forever, of course, but they're not the same class of promise as a C API.
Sure, it'd be nice if someone would fix it to list out the problems with
the exclusive API, but that requires someone wanting to spend time on
it.
I'll take a turn at it.
--
-- Christophe Pettus
xof@thebuild.com
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
Stephen Frost wrote:
Yes, it *is* impossible to do safe backups with the existing API. There
is an unquestionable race condition where a system restart will cause
your system to not come back up without you going in and removing the
backup_label file- and the only way you make that race window small is
to remove the backup_label file right after you run pg_start_backup and
copy it, and then PUT IT BACK at the end before you call pg_stop_backup,
which is insane, but otherwise the 'race window' is the ENTIRE length of
the backup.I just have an idea:
What about an option to keep WAL around for the duration of an exclusive backup?
That way PostgreSQL can still restart after a crash. It will take longer than
expected, but it will work. But then, perhaps the long recovery time is only
marginally better than having to manually delete the backup_label file...
I'm afraid that we'd end up with many, many complaints about people
running out of disk space on WAL when they are trying to take a backup..
I do applaud your efforts to think of a better solution but I'm afraid
that isn't really workable. While crashing with a backup_label in place
definitely sucks and makes recovering from that not fun, it's probably
better than having people run out of disk space and having the system
PANIC from that during what would otherwise be perfectly normal
operation.
That would also seem like a bit of an odd difference between the
exclusive and non-exclusive backup methods... and another things we'd
have to write up documentation for if we kept both methods around to try
and explain to users and that is just not a pleasant thought.
Thanks!
Stephen
Greetings,
* Christophe Pettus (xof@thebuild.com) wrote:
On Feb 24, 2019, at 13:00, Stephen Frost <sfrost@snowman.net> wrote:
No, it *hasn't* been the only backup API for a long time- that was only
true up until 9.6 was released, since then we've had both, and it's made
everything a downright mess because of exactly these arguments.9.6 has been out for about 2.5 years. How long was the old version of pg_start_backup() the only interface? Considerably longer.
Right, and PG12 will be out for another *5* years beyond that, meaning
people will have had 8.5 years to move from the exclusive API to the
non-exclusive one.
Yes, it *is* impossible to do safe backups with the existing API.
That's an overstatement. "The existing interface has failure conditions that are hard to work around" is true. Saying "it's impossible to do safe backups" implies that no pre-9.6 system has ever been backed up. If we take that line, it's just going to create a "Oh, come *on*" reaction from users.
Every one of the exclusive backups that was taken wasn't *safe*, and a
crash during any of them (which we've certainly heard plenty about
through various support channels over the years...) would have resulted
in a failure to properly restart.
No, I'm not saying that such backups are corrupted, *that* is an
overstatement, but it isn't actually what I'm saying, just a strawman
you've come up with to argue against.
What I'll say
is that they'll have over 5 years to adjust those scripts and I don't
see an issue with that.That's not quite the situation. What they will have is a choice between upgrading, and or their existing scripts continuing to work. That choice will start ticking the instant a release without the old interface comes out. The back-pressure on upgrading is already very high; this is going to put a lot of installations into a bind, and a lot will make the decision not to upgrade because of that.
That's fine- they can push off upgrading, for as long as 5 years, and
we'll still support them. That's the model we operate under for
everything, this isn't any different in that regard and I don't have
sympathy for people who insist on saying that *this*, just *this* one
API of *all* the ones we have simply *can't* be changed *ever*.
*ALL* of our APIs are declared non-stable between major releases.
That's why we have *major* and *minor* releases.I note that a word game is being played here. We're calling pg_start_backup() "an API", as if it is the same as, say, the GiST C API, but it's a SQL level construct. SQL level constructs are not promises we need to keep forever, of course, but they're not the same class of promise as a C API.
I'm not following here, at all... We're actually better, historically,
about not changing SQL-level constructs than C-level APIs, but we can
and do still change them, regularly, with every major release- including
making new reserved keywords and other very clearly potentially breaking
changes.
Sure, it'd be nice if someone would fix it to list out the problems with
the exclusive API, but that requires someone wanting to spend time on
it.I'll take a turn at it.
Just to be clear- I don't agree that simply documenting the issues with
it is a sufficient solution to make us keep it. I'll certainly continue
to advocate for its removal, but it would be nice for people who are
currently using it to be aware of the issues, so I do think it'd be good
to have the documentation improved in that regard, just make sure to not
invalidate the documentation around non-exclusive backup while you're at
it.
Thanks!
Stephen
On Feb 24, 2019, at 13:44, Stephen Frost <sfrost@snowman.net> wrote:
Right, and PG12 will be out for another *5* years beyond that, meaning
people will have had 8.5 years to move from the exclusive API to the
non-exclusive one.
The thing is that for 90% of installations, the clock will start ticking when the deprecation. Until then, most of them are not going to feel any pressure to make the change. Most installations have a lot of other things on their minds. They have the option of not upgrading, and that will be the option a lot of them take.
Every one of the exclusive backups that was taken wasn't *safe*, and a
crash during any of them (which we've certainly heard plenty about
through various support channels over the years...) would have resulted
in a failure to properly restart.
Most installations don't routinely encounter this. I know it happens, and it is definitely a problem, but the field is not strewn with corpses here. The new interface is unquestionably an improvement, but let's not get our messaging amped up to the point that it hurts our credibility.
No, I'm not saying that such backups are corrupted, *that* is an
overstatement, but it isn't actually what I'm saying, just a strawman
you've come up with to argue against.
That may not be what you are saying, but statements like "it *is* impossible to do safe backups with the existing API" will be heard that way. Let's keep the messaging we give users accurate.
this isn't any different in that regard and I don't have
sympathy for people who insist on saying that *this*, just *this* one
API of *all* the ones we have simply *can't* be changed *ever*.
Sure, we can change anything. It's whether this particular deprecation is a good idea.
I'm not following here, at all... We're actually better, historically,
about not changing SQL-level constructs than C-level APIs
That's my point. Calling this an "API" makes the change sound more routine than it is. It's an "API" that a *lot* of installations depend on.
I don't agree that simply documenting the issues with
it is a sufficient solution to make us keep it.
Understood, but I think we need to document it no matter what.
--
-- Christophe Pettus
xof@thebuild.com
Greetings,
* Christophe Pettus (xof@thebuild.com) wrote:
On Feb 24, 2019, at 13:44, Stephen Frost <sfrost@snowman.net> wrote:
Right, and PG12 will be out for another *5* years beyond that, meaning
people will have had 8.5 years to move from the exclusive API to the
non-exclusive one.The thing is that for 90% of installations, the clock will start ticking when the deprecation. Until then, most of them are not going to feel any pressure to make the change. Most installations have a lot of other things on their minds. They have the option of not upgrading, and that will be the option a lot of them take.
Just to be clear- it was *already* deprecated, we are just talking here
about making that more clear/obvious.
And, again, they're welcome to not upgrade for as long as they'd like,
frankly, but we'll only provide back-patches for 5 years.
Every one of the exclusive backups that was taken wasn't *safe*, and a
crash during any of them (which we've certainly heard plenty about
through various support channels over the years...) would have resulted
in a failure to properly restart.Most installations don't routinely encounter this. I know it happens, and it is definitely a problem, but the field is not strewn with corpses here. The new interface is unquestionably an improvement, but let's not get our messaging amped up to the point that it hurts our credibility.
No, I'm not saying that such backups are corrupted, *that* is an
overstatement, but it isn't actually what I'm saying, just a strawman
you've come up with to argue against.That may not be what you are saying, but statements like "it *is* impossible to do safe backups with the existing API" will be heard that way. Let's keep the messaging we give users accurate.
this isn't any different in that regard and I don't have
sympathy for people who insist on saying that *this*, just *this* one
API of *all* the ones we have simply *can't* be changed *ever*.Sure, we can change anything. It's whether this particular deprecation is a good idea.
You say above that the new interface is unquestionably an improvement
and here say that we shouldn't deprecate the old one in favor of it
(even though we actually already have... but that's beside the point I'm
trying to make here), so what you're advocating for is that we keep an
old and known broken interface that we know causes real issues even
after we've developed a new and unquestionably better one.
For my 2c, I think keeping the old interface around at all was a mistake
from the start, just the same as things like pg_user have been kept
around forever because of fear of breaking things, even as we rip out
the recovery.conf file or rename tons of things in the catalog across
major versions.
I really, honestly, believe what we need to *stop* doing is trying to
drag along support for old/deprecated interfaces after we've introduced
new ones, thus avoiding these arguments and the time spent on them, and
the time spent dealing with implementing and documenting new APIs around
old ones. The only thing it seems to unquestionably do is make us argue
about when we are going to *finally* rip out the old/deprecated API (or
interface, or whatever you want to call it).
This is the new "should we call it PG 10.0 or PG 9.7 this time?"
argument, all over again...
I'm not following here, at all... We're actually better, historically,
about not changing SQL-level constructs than C-level APIsThat's my point. Calling this an "API" makes the change sound more routine than it is. It's an "API" that a *lot* of installations depend on.
A lot of them depended on pg_wal being named pg_xlog too, but we seem to
have managed reasonably well through that, not to mention all the
catalog changes that caused issues for monitoring, etc.
I don't agree that simply documenting the issues with
it is a sufficient solution to make us keep it.Understood, but I think we need to document it no matter what.
Sure, go for it.
Thanks!
Stephen
Hi,
On 2019-02-24 17:19:22 -0500, Stephen Frost wrote:
You say above that the new interface is unquestionably an improvement
FWIW, I think we didn't design it even remotely as well as we ought to
have. It was both a mistake to not offer a version of non-exclusive
backups that works with a client connection (for integration with the
TSMs of this world), and it was a mistake not to offer a commandline
tool that does the non-exclusive pg/start backup.
I really, honestly, believe what we need to *stop* doing is trying to
drag along support for old/deprecated interfaces after we've introduced
new ones, thus avoiding these arguments and the time spent on them, and
the time spent dealing with implementing and documenting new APIs around
old ones. The only thing it seems to unquestionably do is make us argue
about when we are going to *finally* rip out the old/deprecated API (or
interface, or whatever you want to call it).
While I agree that we should remove non-exclusive backups, I VEHEMENTLY
oppose the unconditionality of the above statement. Yes, it's annoying
to keep interfaces around, but unnecessarily inflicting pain to everyone
also is annoying.
That's not to say we shouldn't be better at announcing and then
following through on the deprecation of old interfaces.
Greetings,
Andres Freund
On Feb 24, 2019, at 14:19, Stephen Frost <sfrost@snowman.net> wrote:
You say above that the new interface is unquestionably an improvement
and here say that we shouldn't deprecate the old one in favor of it
(even though we actually already have... but that's beside the point I'm
trying to make here), so what you're advocating for is that we keep an
old and known broken interface that we know causes real issues even
after we've developed a new and unquestionably better one.
Yes, I am advocating exactly that. The reason that I think we need to keep the old one (or, at least, not remove it as soon as 12) is that creating an obstacle to upgrades is worse than retaining the old one, and it *will* be an obstacle to upgrades (or to using the community edition at all).
I do think we need a simple, pg_basebackup-level-complexity hot backup tool that allows a forked copy operation, per Andres' suggestion.
There's no single answer to every possible situation like this that pops up; it depends on how widely used the interface was, how reasonable the expectation of permanence is, and the difficulty the users will encounter in changing to the new one, along with the complexity of maintaining the documentation and code for the old interface.
A lot of them depended on pg_wal being named pg_xlog too, but we seem to
have managed reasonably well through that, not to mention all the
catalog changes that caused issues for monitoring, etc.
Some of the incompatible catalog changes (in particular, in pg_stat_activity) I thought were gratuitous, but we did them, and no point in relitigating that now. I'd say that the internal layout of PGDATA is fairly weak promise compared to an SQL-level construct, especially one as widely used as pg_start_backup().
--
-- Christophe Pettus
xof@thebuild.com
Hi,
On 2019-02-24 14:35:04 -0800, Christophe Pettus wrote:
On Feb 24, 2019, at 14:19, Stephen Frost <sfrost@snowman.net> wrote:
You say above that the new interface is unquestionably an improvement
and here say that we shouldn't deprecate the old one in favor of it
(even though we actually already have... but that's beside the point I'm
trying to make here), so what you're advocating for is that we keep an
old and known broken interface that we know causes real issues even
after we've developed a new and unquestionably better one.Yes, I am advocating exactly that. The reason that I think we need to
keep the old one (or, at least, not remove it as soon as 12) is that
creating an obstacle to upgrades is worse than retaining the old one,
and it *will* be an obstacle to upgrades (or to using the community
edition at all).
It sounds to me like you treat this as if having the old method around
had no downsides. But I have seen *numereous* downtimes due to it, and
also corruption triggered by it (people following the hint to remove
backup_label).
A lot of them depended on pg_wal being named pg_xlog too, but we seem to
have managed reasonably well through that, not to mention all the
catalog changes that caused issues for monitoring, etc.Some of the incompatible catalog changes (in particular, in
pg_stat_activity) I thought were gratuitous, but we did them, and no
point in relitigating that now. I'd say that the internal layout of
PGDATA is fairly weak promise compared to an SQL-level construct,
especially one as widely used as pg_start_backup().
I don't buy that, because you normally specifically should exclude
pg_xlog/pg_wal from basebackups.
Greetings,
Andres Freund
On 2/24/19 11:36 PM, Stephen Frost wrote:
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
Stephen Frost wrote:
Yes, it *is* impossible to do safe backups with the existing API. There
is an unquestionable race condition where a system restart will cause
your system to not come back up without you going in and removing the
backup_label file- and the only way you make that race window small is
to remove the backup_label file right after you run pg_start_backup and
copy it, and then PUT IT BACK at the end before you call pg_stop_backup,
which is insane, but otherwise the 'race window' is the ENTIRE length of
the backup.I just have an idea:
What about an option to keep WAL around for the duration of an exclusive backup?
That way PostgreSQL can still restart after a crash. It will take longer than
expected, but it will work. But then, perhaps the long recovery time is only
marginally better than having to manually delete the backup_label file...I'm afraid that we'd end up with many, many complaints about people
running out of disk space on WAL when they are trying to take a backup..
This would also require replaying all that WAL during crash recovery
which could mean a much longer startup time.
--
-David
david@pgmasters.net
On 2/25/19 12:35 AM, Christophe Pettus wrote:
On Feb 24, 2019, at 14:19, Stephen Frost <sfrost@snowman.net> wrote:
You say above that the new interface is unquestionably an improvement
and here say that we shouldn't deprecate the old one in favor of it
(even though we actually already have... but that's beside the point I'm
trying to make here), so what you're advocating for is that we keep an
old and known broken interface that we know causes real issues even
after we've developed a new and unquestionably better one.Yes, I am advocating exactly that. The reason that I think we need to keep the old one (or, at least, not remove it as soon as 12)...
Exclusive backup will not be removed for PG12. There wasn't support for
it so I push it out to PG13.
There does appear to be support for removing it in PG13, though, and I'd
like to see that done sooner than later.
I do think we need a simple, pg_basebackup-level-complexity hot backup tool that allows a forked copy operation, per Andres' suggestion.
This is a good idea -- but I'm still a bit mystified why the ability to
run a backup in shell script is considered to be a hard requirement.
Just about any language you can name performs non-exclusive backups with
ease -- except shell. Perhaps the problem is with the language?
Regards,
--
-David
david@pgmasters.net
On 2/22/19 7:10 PM, Robert Haas wrote:
On Fri, Feb 22, 2019 at 12:35 PM Fujii Masao <masao.fujii@gmail.com> wrote:
-1 for the removal. I think that there are still many users of an exclusive
backup API, and it's not so easy to migrate their backup scripts so that
only non-exclusive one is used because of the reason I wrote in another thread.
/messages/by-id/CAHGQGwHUkEbkVexVfWNLjmq2rzOS_SHYMiECt+KBn-cBPq5Arg@mail.gmail.comYeah, those are interesting points. Unfortunately, I think something
as simple as your proposed...psql -c "SELECT pg_start_backup()"
rsync, cp, tar, storage backup, or something
psql -c "SELECT pg_stop_backup()"...doesn't have much chance of being correct. If you aren't checking
whether pg_start_backup() throws an error, for example, you have got a
very failure-prone set-up. That being said, it's possible to fix that
problem using some shell logic, whereas the problem of keeping a
connection open for the duration of the backup from the shell is much
harder. I recently ran across a case where someone attempted it but
did not get the logic entirely right, with rather painful
consequences.Now you might say that people should not write their own tools and
just use professionally produced ones. But I don't really agree with
that, and whatever we think people SHOULD do, there are in fact a lot
of people using their own tools.
FWIW +1 for not remove exclusive backup.
Maintain a connection during the backup is a hard pre-requisite. In my
previous jobs I already done custom scripts to perform backup by using
pre/post backup hook to control backup software: With vmware to do PITR
backup with VM snapshot or with another file backup tool which perform
block deduplication. I do not see where is the problem if you check
pg_start_backup()/pg_stop_backup() went well?
It will be annoying if after this removal, companies must change their
backup strategy by using specific postgres tools (pgbackrest, barman).
Regards,
David Steele <david@pgmasters.net> writes:
On 2/25/19 12:35 AM, Christophe Pettus wrote:
On Feb 24, 2019, at 14:19, Stephen Frost <sfrost@snowman.net> wrote:
You say above that the new interface is unquestionably an improvement
and here say that we shouldn't deprecate the old one in favor of it
(even though we actually already have... but that's beside the point I'm
trying to make here), so what you're advocating for is that we keep an
old and known broken interface that we know causes real issues even
after we've developed a new and unquestionably better one.Yes, I am advocating exactly that. The reason that I think we need
to keep the old one (or, at least, not remove it as soon as 12)...Exclusive backup will not be removed for PG12. There wasn't support for
it so I push it out to PG13.
How about making the deprecation more obvious in PG12, by making
pg_start_backup() in exclusive mode issue a WARNING that it will be
removed in PG13?
We had a similar problem in Perl, where we got pushback for removing
deprecated features because they had been deprecated for so long (in
some cases over 20 years) with no sign of actually being removed. We
alleviated this by making all deprecation warnings include the version
in which the feature would be removed (at least two major versions after
the one where the versioned warning was introduced).
Having this sort of clarity about what deprecation actually constitutes
is also useful when deciding whether to deprecate or merely discourage a
feature.
My two minor currency units,
- ilmari
--
"A disappointingly low fraction of the human race is,
at any given time, on fire." - Stig Sandbeck Mathisen
Greetings,
* Adrien NAYRAT (adrien.nayrat@anayrat.info) wrote:
On 2/22/19 7:10 PM, Robert Haas wrote:
On Fri, Feb 22, 2019 at 12:35 PM Fujii Masao <masao.fujii@gmail.com> wrote:
-1 for the removal. I think that there are still many users of an exclusive
backup API, and it's not so easy to migrate their backup scripts so that
only non-exclusive one is used because of the reason I wrote in another thread.
/messages/by-id/CAHGQGwHUkEbkVexVfWNLjmq2rzOS_SHYMiECt+KBn-cBPq5Arg@mail.gmail.comYeah, those are interesting points. Unfortunately, I think something
as simple as your proposed...psql -c "SELECT pg_start_backup()"
rsync, cp, tar, storage backup, or something
psql -c "SELECT pg_stop_backup()"...doesn't have much chance of being correct. If you aren't checking
whether pg_start_backup() throws an error, for example, you have got a
very failure-prone set-up. That being said, it's possible to fix that
problem using some shell logic, whereas the problem of keeping a
connection open for the duration of the backup from the shell is much
harder. I recently ran across a case where someone attempted it but
did not get the logic entirely right, with rather painful
consequences.Now you might say that people should not write their own tools and
just use professionally produced ones. But I don't really agree with
that, and whatever we think people SHOULD do, there are in fact a lot
of people using their own tools.FWIW +1 for not remove exclusive backup.
Maintain a connection during the backup is a hard pre-requisite. In my
previous jobs I already done custom scripts to perform backup by using
pre/post backup hook to control backup software: With vmware to do PITR
backup with VM snapshot or with another file backup tool which perform block
deduplication. I do not see where is the problem if you check
pg_start_backup()/pg_stop_backup() went well?
This, really, is a large part of the problem- people don't realize and
just simply don't want to believe, it seems at least, that there's any
problem with what they're doing because it works some, or even most, of
the time.
Making a running PG cluster *look* like it was restored from a backup is
really bad and leads to a lot of confusion and cases where PG won't
start up, but that's what the exclusive pg_start_backup/pg_stop_backup
method does and is more-or-less *required* of any solution that wants to
just be able to tar up the data directory as the backup method, which is
why it ends up just being a bad interface overall.
And then there's other issues- things like making sure that you actually
got all of the WAL, without which the backup is inconsistent being the
leading one but there's others.
It will be annoying if after this removal, companies must change their
backup strategy by using specific postgres tools (pgbackrest, barman).
You don't write your own database system using CSV files and shell
magic, do you? I have to say that it continues to boggle my mind that
people insist that *this* part of the system has to be able to be
implementable using shell scripts.
Folks, these are your backups we're talking about, your last resort if
everything else goes up in flames, why do you want to risk that by
implementing your own one-off solution, particularly when there's known
serious issues using that interface, and you want to just use shell
scripts to do it...?
Thanks!
Stephen
Greetings,
* Dagfinn Ilmari Mannsåker (ilmari@ilmari.org) wrote:
David Steele <david@pgmasters.net> writes:
On 2/25/19 12:35 AM, Christophe Pettus wrote:
On Feb 24, 2019, at 14:19, Stephen Frost <sfrost@snowman.net> wrote:
You say above that the new interface is unquestionably an improvement
and here say that we shouldn't deprecate the old one in favor of it
(even though we actually already have... but that's beside the point I'm
trying to make here), so what you're advocating for is that we keep an
old and known broken interface that we know causes real issues even
after we've developed a new and unquestionably better one.Yes, I am advocating exactly that. The reason that I think we need
to keep the old one (or, at least, not remove it as soon as 12)...Exclusive backup will not be removed for PG12. There wasn't support for
it so I push it out to PG13.How about making the deprecation more obvious in PG12, by making
pg_start_backup() in exclusive mode issue a WARNING that it will be
removed in PG13?
So.. We've more-or-less come full circle back to where the thread had
left off last time- the plan is to update the documentation to make it
clearer that the exclusive mode is deprecated and that it's going to be
removed, and then we'll actually remove it in PG13.
We could also add such a warning but I'm not entirely convinced it's a
good idea. There was some earlier discussion about it that I'd probably
want to go back and reread first.
Thanks!
Stephen
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
On 2019-02-24 17:19:22 -0500, Stephen Frost wrote:
You say above that the new interface is unquestionably an improvement
FWIW, I think we didn't design it even remotely as well as we ought to
have. It was both a mistake to not offer a version of non-exclusive
backups that works with a client connection (for integration with the
TSMs of this world), and it was a mistake not to offer a commandline
tool that does the non-exclusive pg/start backup.
I'm not really following- did you mean to say "without a client
connection" above?
We could still add some kind of commandline tool to help with the
non-exclusive pg start/stop backup but I'm not really sure exactly what
that would look like and I honestly don't have terribly much interest in
spending resources on implementing it since it would lack a great many
features that we'd then end up wanting to add on... and then we end up
backing into building yet another backup tool.
I really, honestly, believe what we need to *stop* doing is trying to
drag along support for old/deprecated interfaces after we've introduced
new ones, thus avoiding these arguments and the time spent on them, and
the time spent dealing with implementing and documenting new APIs around
old ones. The only thing it seems to unquestionably do is make us argue
about when we are going to *finally* rip out the old/deprecated API (or
interface, or whatever you want to call it).While I agree that we should remove non-exclusive backups, I VEHEMENTLY
oppose the unconditionality of the above statement. Yes, it's annoying
to keep interfaces around, but unnecessarily inflicting pain to everyone
also is annoying.
Alright, then how about we provide a bit of help for everyone who's got
a system built around recovery.conf today, instead of just completely
ripping that out?
If we're happy to do that then I really can't agree with these arguments
that there's things we should try to maintain when it comes to
interfaces, and that's far from the only example of a pretty massive
change that just went into version X with zero notice to users about
what they're using today being deprecated.
That's not to say we shouldn't be better at announcing and then
following through on the deprecation of old interfaces.
We announce things have changed every year, and people have five years
from that point, during which we'll continue to support them and fix
bugs and deal with security issues, to make whatever changes they need
to for the new interface.
The argument seems to be that people want new features but they don't
want to have to do any work to get those new features. Instead, they
expect the community to take on the burden of maintaining those old
interfaces for them, so that they can get those new features. That
seems like a pretty raw deal for the community and not one that I
really think we should be supporting, and it isn't like we're actually
consistent with it at all- except that we don't break the back-branches
and we do make breaking changes in major versions.
When is something too big of a change to make in a given major version
and we have to, instead, provide backwards compatibility for it? What
is that policy? How many releases do we have to support it for? How do
we communicate that to our users, so that they know what they can depend
on being there across major releases and what they can't?
Thanks!
Stephen
Stephen Frost wrote:
It will be annoying if after this removal, companies must change their
backup strategy by using specific postgres tools (pgbackrest, barman).You don't write your own database system using CSV files and shell
magic, do you? I have to say that it continues to boggle my mind that
people insist that *this* part of the system has to be able to be
implementable using shell scripts.Folks, these are your backups we're talking about, your last resort if
everything else goes up in flames, why do you want to risk that by
implementing your own one-off solution, particularly when there's known
serious issues using that interface, and you want to just use shell
scripts to do it...?
If we didn't think that simplicity in backup has some value, why does
PostgreSQL provide pg_basebackup?
Not everybody wants to use tools like pgbackrest and barman because it
takes some effort to set them up properly. It seems that you think that
people who want something simpler are irresponsible.
Incidentally, both barman and pgbackrest stream the backup to a backup server.
I think the most important use case for the low level backup API is when
your database is so large that you want to leverage storage techniques
like snapshots for backup, because you can't or don't want to stream all
those terabytes across the network.
I'm not playing devil's advocate here to annoy you. I see the problems
with the exclusive backup, and I see how it can hurt people.
I just think that removing exclusive backup without some kind of help
like Andres sketched above will make people unhappy.
Yours,
Laurenz Albe
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
Stephen Frost wrote:
It will be annoying if after this removal, companies must change their
backup strategy by using specific postgres tools (pgbackrest, barman).You don't write your own database system using CSV files and shell
magic, do you? I have to say that it continues to boggle my mind that
people insist that *this* part of the system has to be able to be
implementable using shell scripts.Folks, these are your backups we're talking about, your last resort if
everything else goes up in flames, why do you want to risk that by
implementing your own one-off solution, particularly when there's known
serious issues using that interface, and you want to just use shell
scripts to do it...?If we didn't think that simplicity in backup has some value, why does
PostgreSQL provide pg_basebackup?
I'm all for people using pg_basebackup and encourage them to do so, but,
really, that tool *is* too simple by itself: you need to combine it with
something else that handles data retention, backup validation, etc.
This could be some independent backup solution though, since
pg_basebackup gives you a simple set of files for the backup that can
then be backed up using those other backup tools. (The same is true of
pgbackrest, btw, and people certainly do use pgbackrest to back up to a
repository and then back that repo up using other tools, but in this
case pgbackrest does have some of those data retention and backup
validation capabilities itself, so it isn't necessary to do that.)
Not everybody wants to use tools like pgbackrest and barman because it
takes some effort to set them up properly. It seems that you think that
people who want something simpler are irresponsible.
I don't mind simpler (we actually try pretty hard to make pgbackrest
simple to use, in fact... the simplest config only requires a 4-line
config file), provided that it actually does everything that you
need a backup tool to do. I do think it's an unfortunate reality that
not enough people take the time to think about what they really need
from a backup solution and spend the time to either find or build a
proper solution.
Incidentally, both barman and pgbackrest stream the backup to a backup server.
I'm pretty sure both can backup to a local repository too (I know
pgbackrest can), and not just stream to a backup server, so I'm not
really sure what you're getting at here.
I think the most important use case for the low level backup API is when
your database is so large that you want to leverage storage techniques
like snapshots for backup, because you can't or don't want to stream all
those terabytes across the network.
Snapshots place an awful lot of faith in your storage layer-
particularly when you aren't doing any kind of validation that the
snapshot is in good shape and that there hasn't been any in-place latent
corruption, ever. If you think about it, a snapshot is actually very
similar to just always doing incremental backups with pgbackrest and
never doing a new full backup or even a check of those old files that
haven't changed to see if they're still valid. It's not an approach I'd
recommend depending on exclusively.
There's ways to address that, of course, such as building a manifest of
checksums of the files in the data directory, such that you can then
validate that the snapshot is correct, and detect if any corruption ends
up happening (and checksum the manifest as well), though you need to
also track the LSN of the pg_start_backup, so you can know that changes
which happened after that LSN are in the WAL that's captured during the
checksum, and you need a place to put all of this for when you want to
go back and re-validate an old snapshot.. and likely other things that
I'm forgetting at the moment, but it's definitely something we've
thought about and have considered how we might add support to pgbackrest
for working with snapshots (and it wouldn't surprise me if that got
implemented before v13 even..).
If you are happy to place complete faith in snapshots and trust that
they're entirely atomic, you can always forgo dealing with any of this
and just snapshot PGDATA and let PG go through crash recovery when you
restore.
I'm not playing devil's advocate here to annoy you. I see the problems
with the exclusive backup, and I see how it can hurt people.
I just think that removing exclusive backup without some kind of help
like Andres sketched above will make people unhappy.
Keeping it will definitely make me unhappy. :)
I'm not against something simple being added to help with the new API,
or with keeping the old API if someone can figure out how to solve for
the known issues with it, just to be clear. I do worry that the
'simple' thing we add either won't help in a lot of cases or will end up
being much more complex and difficult to use than we're thinking, but we
can discuss that if someone actually implements it, I suppose.
Thanks!
Stephen
On Feb 25, 2019, at 05:14, Stephen Frost <sfrost@snowman.net> wrote:
You don't write your own database system using CSV files and shell
magic, do you? I have to say that it continues to boggle my mind that
people insist that *this* part of the system has to be able to be
implementable using shell scripts.
It's not complicated why: Backup has to interact with a lot more components of the overall environment than a CSV export, and those components vary *wildly* from one installation to another, and are often over-contrained by local policy.
--
-- Christophe Pettus
xof@thebuild.com
On Feb 25, 2019, at 05:18, Stephen Frost <sfrost@snowman.net> wrote:
So.. We've more-or-less come full circle back to where the thread had
left off last time- the plan is to update the documentation to make it
clearer that the exclusive mode is deprecated and that it's going to be
removed, and then we'll actually remove it in PG13.
That's not my position, certainly; I still object to its removal.
--
-- Christophe Pettus
xof@thebuild.com
On Feb 24, 2019, at 22:49, David Steele <david@pgmasters.net> wrote:
This is a good idea -- but I'm still a bit mystified why the ability to run a backup in shell script is considered to be a hard requirement. Just about any language you can name performs non-exclusive backups with ease -- except shell. Perhaps the problem is with the language?
Many installations have a high degree of discomfort implementing system management functionality in something other than bash. It can be for any number of reasons: previous bad experiences (*cough* RHEL/Python *cough*), the feeling that too many dependencies are dragged in, or just lack of familiarity with them.
--
-- Christophe Pettus
xof@thebuild.com
Greetings,
* Christophe Pettus (xof@thebuild.com) wrote:
On Feb 25, 2019, at 05:18, Stephen Frost <sfrost@snowman.net> wrote:
So.. We've more-or-less come full circle back to where the thread had
left off last time- the plan is to update the documentation to make it
clearer that the exclusive mode is deprecated and that it's going to be
removed, and then we'll actually remove it in PG13.That's not my position, certainly; I still object to its removal.
It wasn't my intent to outline that as your position, my apologies if it
came across that way.
I do believe that's still the plan though, based on both this and the
prior discussion, unless something else happens to address the known
issues.
Thanks!
Stephen
On Sun, Feb 24, 2019 at 3:00 PM Stephen Frost <sfrost@snowman.net> wrote:
Ok, then please do so, and please be prepared to continue to maintain
the documentation of both methods moving forward, because others have
tried and have (rightfully, in my opinion) decided that it's frankly not
worth the effort and ultimately just terribly confusing for users that
we have these two different backup methods and even just updating the
documentation for one or the other is downright painful (to the point
that people litterally give up on it). That really isn't a good place
to be in.
This, to me, is downright rude. You don't get to tell other people
what to do. Nor do you get to tell other people "hey, I'm going to
make this change that you don't like unless you agree to do the work
which I specify." If you want to make a change, you have to build
consensus for that change. If you can't get people to agree with your
proposed change, what happens is that you don't get to make that
change. Whether other people choose to do any work that you and they
might happen to agree is valuable is up to them.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Feb 25, 2019 at 1:49 AM David Steele <david@pgmasters.net> wrote:
Exclusive backup will not be removed for PG12. There wasn't support for
it so I push it out to PG13.There does appear to be support for removing it in PG13, though, and I'd
like to see that done sooner than later.
Given the additional votes that have appeared since the previous
discussion, I am no longer convinced that there is a consensus for
ripping it out in PG13.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Sun, Feb 24, 2019 at 3:00 PM Stephen Frost <sfrost@snowman.net> wrote:
Ok, then please do so, and please be prepared to continue to maintain
the documentation of both methods moving forward, because others have
tried and have (rightfully, in my opinion) decided that it's frankly not
worth the effort and ultimately just terribly confusing for users that
we have these two different backup methods and even just updating the
documentation for one or the other is downright painful (to the point
that people litterally give up on it). That really isn't a good place
to be in.This, to me, is downright rude. You don't get to tell other people
what to do. Nor do you get to tell other people "hey, I'm going to
make this change that you don't like unless you agree to do the work
which I specify." If you want to make a change, you have to build
consensus for that change. If you can't get people to agree with your
proposed change, what happens is that you don't get to make that
change. Whether other people choose to do any work that you and they
might happen to agree is valuable is up to them.
It wasn't my intent to be rude, and my apologies to you and Christophe
for it coming across that way.
I do think we can ask that people who wish to have a capability included
in PG (or continue to be included when there are serious known issues
with it...) be prepared to either build and maintain it themselves or to
convince someone else to do so (or both, and have a committer agree to
it). That's how we've long operated and it wasn't my intent to imply
otherwise, but I agree that I could have said it in a nicer way to avoid
it coming across as telling Christophe what to do.
I'm also on board with building a consensus for making a change, but a
consensus does not mean that everyone must agree or be supportive of the
change. There's also a practical side to things which is that if the
consensus seems to be "we are going to get rid of X" and someone wants
to work on X, we should probably make it pretty clear to them that
there's a consensus to remove it to allow them the option to decide if
they wish to still work on X, or not. Similairly, I think it's good to
let people know who want to work on Y when Y is really hard, so that
they have some idea what they're getting into.
Thanks!
Stephen
On Mon, Feb 25, 2019 at 8:42 AM Stephen Frost <sfrost@snowman.net> wrote:
Alright, then how about we provide a bit of help for everyone who's got
a system built around recovery.conf today, instead of just completely
ripping that out?If we're happy to do that then I really can't agree with these arguments
that there's things we should try to maintain when it comes to
interfaces, and that's far from the only example of a pretty massive
change that just went into version X with zero notice to users about
what they're using today being deprecated.
It's not the same thing. First of all, the recovery.conf changes have
been under discussion for quite a few years now, whereas this change
has been under discussion for only a few months. It is also worth
noting that those recovery.conf changes would've been made years ago
if people hadn't objected to breaking backward compatibility; people
have just as much of a right to object to these changes on those
grounds as they did to those changes. Second, the recovery.conf
changes are intended to provide a tangible benefit - namely, getting
the recovery.conf settings into the GUC system, so that they can be
queried and changed using methods available for other GUCs. But the
change proposed here breaks things for people who are using the
mechanism in question and it doesn't give them anything else instead.
Indeed, it seems quite obvious that you don't have the LEAST interest
in putting any effort into actually making things in this area better
for users; it appears that you just want to break what they're doing
now and shove the work of coping with those changes onto them.
Christophe is quite right to question whether that will cause users
not to upgrade. I think it's overtly user-hostile. You have referred
to the effort of maintaining this code, but you haven't pointed to any
tangible improvement that you'd like to make in this area that is
actually being blocked by the existence of exclusive backup mode.
It's all just griping about how terrible the whole thing is, and it
seems that there are actually quite a few people here who find it less
terrible than you think it is.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Feb 22, 2019 at 6:18 PM Stephen Frost <sfrost@snowman.net> wrote:
What we need is a backup tool included in core that users feel
comfortable using instead of trying to write their own.
I agree. That's a great idea. Let's talk about how to make that
happen. Providing a tool that gives people MORE AND BETTER options
than what they have today is likely to make users more happy.
Removing an option that people are currently using, and which they
find better than other available options for reasons with which I
understand that you disagree, will make users more sad. Happy is
better.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Mon, Feb 25, 2019 at 8:42 AM Stephen Frost <sfrost@snowman.net> wrote:
Alright, then how about we provide a bit of help for everyone who's got
a system built around recovery.conf today, instead of just completely
ripping that out?If we're happy to do that then I really can't agree with these arguments
that there's things we should try to maintain when it comes to
interfaces, and that's far from the only example of a pretty massive
change that just went into version X with zero notice to users about
what they're using today being deprecated.It's not the same thing. First of all, the recovery.conf changes have
been under discussion for quite a few years now, whereas this change
has been under discussion for only a few months.
This argument doesn't work for me because it doesn't make *any*
difference to our users- the vast majority of whom don't follow
-hackers. Our users will simply see that in v12 the system they've
built around recovery.conf won't work and they'll have to learn the new
way and adjust everything to deal with it.
If that argument did matter, we could go back and find the prior
discussions about the issues around the exclusive backup mode and about
removing it, or next year we could point to this thread about it, or the
year after, and say "well, we talked about it a lot, so now let's just
do it", but it all ends up looking the same to our users unless we
actually write into some kind of user-facing documentation or WARNINGs
or similar that something is going away.
It is also worth
noting that those recovery.conf changes would've been made years ago
if people hadn't objected to breaking backward compatibility; people
have just as much of a right to object to these changes on those
grounds as they did to those changes. Second, the recovery.conf
changes are intended to provide a tangible benefit - namely, getting
the recovery.conf settings into the GUC system, so that they can be
queried and changed using methods available for other GUCs. But the
change proposed here breaks things for people who are using the
mechanism in question and it doesn't give them anything else instead.
This isn't a fair argument either because we're having this *after* the
new API was implemented for backup- namely non-exclusive mode, which
*did* give people something better to use instead.
If we went back and added back into the v12 branch support for a
recovery.conf file, then we'd be having exactly this same argument next
year about if we should remove the recovery.conf file support or leave
it in- and you could make the same argument then that removing the
recovery.conf support doesn't give the user anything tangible to replace
it with- because that would have already been done.
Indeed, it seems quite obvious that you don't have the LEAST interest
in putting any effort into actually making things in this area better
for users; it appears that you just want to break what they're doing
now and shove the work of coping with those changes onto them.
What will make things better for users is actually moving to the new API
that's been available for years now, so that they can avoid the pitfalls
with the old API.
Christophe is quite right to question whether that will cause users
not to upgrade. I think it's overtly user-hostile. You have referred
to the effort of maintaining this code, but you haven't pointed to any
tangible improvement that you'd like to make in this area that is
actually being blocked by the existence of exclusive backup mode.
I disagree quite a bit with this statement- the existing documentation
is absolutely horrid and needs to be completely ripped out and rewritten
and maintaining the exclusive backup mode in such a rewrite would
absolutely be a *lot* of additional work.
We actually took a shot at trying to improve the documentation while
continuing to cover both the exclusive and the non-exclusive mode and
it's hugely painful.
It's all just griping about how terrible the whole thing is, and it
seems that there are actually quite a few people here who find it less
terrible than you think it is.
I used to be one of those people. I know that it looks fine and it
certainly seems appealing but having gone through bad experiences with
it, and seen others stumble through those same experiences time and time
again, I've learned that it really is an issue, and one that I would
very much like to avoid causing future users to stumble over.
Thanks!
Stephen
On Mon, Feb 25, 2019 at 11:09 AM Stephen Frost <sfrost@snowman.net> wrote:
I do think we can ask that people who wish to have a capability included
in PG (or continue to be included when there are serious known issues
with it...) be prepared to either build and maintain it themselves or to
convince someone else to do so (or both, and have a committer agree to
it). That's how we've long operated and it wasn't my intent to imply
otherwise, but I agree that I could have said it in a nicer way to avoid
it coming across as telling Christophe what to do.
I think it is certainly true that if you want a new capability added,
you have to either write it yourself or get someone to do it for you.
There is SOME precedent for the proposition that if you want an
obsolete capability not to be removed, you should be prepared to help
fix it. But frankly I think the latter proposition is one we've taken
only occasionally, and only for things that were a whole lot cruftier
and more problematic than this is. For example, if a library is no
longer available and we have a contrib module that depends on that
library, it's reasonable to say that we can't keep the contrib module
unless somebody rewrites it not to depend on that library any more.
But the current situation is completely different. The code
maintenance burden resulting from keeping exclusive-mode backups is
mostly hypothetical; the code works as well as it ever did. We rarely
take the position that we're going to rip out older code that doesn't
work as nicely as newer code does just because nobody's prepared to
e.g. better-document the old code.
For example, we still have FEBE protocol 2 support floating around.
If you're going to talk about badly-designed footguns that ought to be
excised with extreme prejudice, that would IMHO be a far better place
to start than this. Unlike this code, which it's now obvious is used
by quite a number of people even just among those who read this list
regularly, that code is probably nearly unused and untested and, if we
broke it, we probably wouldn't know.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Fri, Feb 22, 2019 at 6:18 PM Stephen Frost <sfrost@snowman.net> wrote:
What we need is a backup tool included in core that users feel
comfortable using instead of trying to write their own.I agree. That's a great idea. Let's talk about how to make that
happen. Providing a tool that gives people MORE AND BETTER options
than what they have today is likely to make users more happy.
I've been working for years on making this happen, and am currently
putting quite a large amount of resources into what I hope will be the
final step in the process to get a solution that will have a great many
more and better options than what's available today. Maybe it'll end up
happening, which I think would be great, maybe it won't, which would be
sad, but at least we'll have tried.
Removing an option that people are currently using, and which they
find better than other available options for reasons with which I
understand that you disagree, will make users more sad. Happy is
better.
I don't want to see more users stumbling over the issues with the
exclusive backup interface. A better interface exists, and has existed
since 9.6. The exclusive backup method was deprecated in 2016. One of
the really bad things about the exclusive backup method is that it
*looks* like it works well and that there aren't any issues with it, but
when users hit the issues, they get very sad. We might have 1000s of
happy users who never run into those issues and therefore don't want to
see us change anything, but what about the 10s or 100s of users who do
hit those issues, what do we tell them? Seems like we're saying "well,
sorry, we knew those issues existed and it's unfortunate you hit them,
but there's this better thing that you *should* have been using and then
you wouldn't have hit those issues." That doesn't seem likely to make
people happy with us.
Thanks!
Stephen
On Mon, Feb 25, 2019 at 11:23 AM Stephen Frost <sfrost@snowman.net> wrote:
If that argument did matter, we could go back and find the prior
discussions about the issues around the exclusive backup mode and about
removing it, or next year we could point to this thread about it, or the
year after, and say "well, we talked about it a lot, so now let's just
do it", but it all ends up looking the same to our users unless we
actually write into some kind of user-facing documentation or WARNINGs
or similar that something is going away.
That's true. But nobody's objecting to strengthening the warnings in
the documentation. People are objecting to removing the thing itself.
This isn't a fair argument either because we're having this *after* the
new API was implemented for backup- namely non-exclusive mode, which
*did* give people something better to use instead.
There are several people who are saying that doesn't meet all their
needs, giving reasons why it's problematic, and suggesting things that
could be done to make it less problematic. It's not OK to say "we can
remove exclusive backup mode because now we have non-exclusive backup
mode" unless other people actually agree that all the use cases are
covered, and it appears that they don't.
I disagree quite a bit with this statement- the existing documentation
is absolutely horrid and needs to be completely ripped out and rewritten
and maintaining the exclusive backup mode in such a rewrite would
absolutely be a *lot* of additional work.We actually took a shot at trying to improve the documentation while
continuing to cover both the exclusive and the non-exclusive mode and
it's hugely painful.
Well, perhaps that's pain you need to incur. The alternative seems to
be to rip out something that people don't want ripped out.
I used to be one of those people. I know that it looks fine and it
certainly seems appealing but having gone through bad experiences with
it, and seen others stumble through those same experiences time and time
again, I've learned that it really is an issue, and one that I would
very much like to avoid causing future users to stumble over.
Sure, that sounds great. But the way to do that is to continue
improving the system until exclusive-mode backups really are not a
useful thing any more, not to remove it while there are still a lot of
people relying on it who can offer tangible explanations for their
choice to do so.
It feels to me like you are portraying the increasing number of people
objecting to this change as naive or foolish or at least not as
enlightened as you are, and I really object to that. I happen to
think that people like Christophe Pettus and Fujii Masao and Laurenz
Albe are smart people whose opinions ought to be taken as just as
valid as your own.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Feb 25, 2019 at 11:33 AM Stephen Frost <sfrost@snowman.net> wrote:
Removing an option that people are currently using, and which they
find better than other available options for reasons with which I
understand that you disagree, will make users more sad. Happy is
better.I don't want to see more users stumbling over the issues with the
exclusive backup interface. A better interface exists, and has existed
since 9.6. The exclusive backup method was deprecated in 2016. One of
the really bad things about the exclusive backup method is that it
*looks* like it works well and that there aren't any issues with it, but
when users hit the issues, they get very sad. We might have 1000s of
happy users who never run into those issues and therefore don't want to
see us change anything, but what about the 10s or 100s of users who do
hit those issues, what do we tell them? Seems like we're saying "well,
sorry, we knew those issues existed and it's unfortunate you hit them,
but there's this better thing that you *should* have been using and then
you wouldn't have hit those issues." That doesn't seem likely to make
people happy with us.
Sure, nobody wants users to keep hitting the same problems over and
over again, but the above is still reductionist. If we take the view
that anything that can cause data corruption in any scenario, no
matter how remote, is more than everything else, then we should just
stop shipping minor releases and halt all other development work until
we deal with https://commitfest.postgresql.org/22/528/ that has been
open for 14 CommitFests and until we've fully dealt with every aspect
of fsync-gate. I don't see you treating either of those issues with
the same sense of urgency with which you're treating this one, so
evidently you feel entitled to decide which
potentially-data-corrupting issues you want to attack first.
And I agree that you should have exactly that right. Along similar
lines, I'd like our users to have the right to decide when they want
to upgrade from using exclusive backup mode to non-exclusive backup
mode, instead of being forced into making a change at a time decided
by our fiat.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Mon, Feb 25, 2019 at 11:23 AM Stephen Frost <sfrost@snowman.net> wrote:
If that argument did matter, we could go back and find the prior
discussions about the issues around the exclusive backup mode and about
removing it, or next year we could point to this thread about it, or the
year after, and say "well, we talked about it a lot, so now let's just
do it", but it all ends up looking the same to our users unless we
actually write into some kind of user-facing documentation or WARNINGs
or similar that something is going away.That's true. But nobody's objecting to strengthening the warnings in
the documentation. People are objecting to removing the thing itself.
Then how long will we carry it forward? How much warning do we need to
provide?
This isn't a fair argument either because we're having this *after* the
new API was implemented for backup- namely non-exclusive mode, which
*did* give people something better to use instead.There are several people who are saying that doesn't meet all their
needs, giving reasons why it's problematic, and suggesting things that
could be done to make it less problematic. It's not OK to say "we can
remove exclusive backup mode because now we have non-exclusive backup
mode" unless other people actually agree that all the use cases are
covered, and it appears that they don't.
I haven't heard anyone say it doesn't meet their needs, just that it's
not as easy to use, which is a fair critcism, but not the same thing.
I'm all for making the non-exclusive mode less problematic if we're able
to. If that's something that would help us move forward with getting
rid of the exclusive backup mode than that's an area that I'd be willing
to put resources.
I disagree quite a bit with this statement- the existing documentation
is absolutely horrid and needs to be completely ripped out and rewritten
and maintaining the exclusive backup mode in such a rewrite would
absolutely be a *lot* of additional work.We actually took a shot at trying to improve the documentation while
continuing to cover both the exclusive and the non-exclusive mode and
it's hugely painful.Well, perhaps that's pain you need to incur. The alternative seems to
be to rip out something that people don't want ripped out.
... now I feel like I'm being told what to do.
The point is that keeping it actually *is* work, it isn't free, not even
in our tree. Making our users stumble over the issues with it also
isn't free, that's another cost of keeping it.
I used to be one of those people. I know that it looks fine and it
certainly seems appealing but having gone through bad experiences with
it, and seen others stumble through those same experiences time and time
again, I've learned that it really is an issue, and one that I would
very much like to avoid causing future users to stumble over.Sure, that sounds great. But the way to do that is to continue
improving the system until exclusive-mode backups really are not a
useful thing any more, not to remove it while there are still a lot of
people relying on it who can offer tangible explanations for their
choice to do so.It feels to me like you are portraying the increasing number of people
objecting to this change as naive or foolish or at least not as
enlightened as you are, and I really object to that. I happen to
think that people like Christophe Pettus and Fujii Masao and Laurenz
Albe are smart people whose opinions ought to be taken as just as
valid as your own.
I honestly do doubt that they have had the same experiences that I have
had, but that doesn't mean that they're not smart or that I don't value
their opinion- I agree that they're all smart people and I certainly do
value their opinions. That doesn't mean that I can't disagree with them
or can't explain why I disagree.
Thanks!
Stephen
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Mon, Feb 25, 2019 at 11:33 AM Stephen Frost <sfrost@snowman.net> wrote:
Removing an option that people are currently using, and which they
find better than other available options for reasons with which I
understand that you disagree, will make users more sad. Happy is
better.I don't want to see more users stumbling over the issues with the
exclusive backup interface. A better interface exists, and has existed
since 9.6. The exclusive backup method was deprecated in 2016. One of
the really bad things about the exclusive backup method is that it
*looks* like it works well and that there aren't any issues with it, but
when users hit the issues, they get very sad. We might have 1000s of
happy users who never run into those issues and therefore don't want to
see us change anything, but what about the 10s or 100s of users who do
hit those issues, what do we tell them? Seems like we're saying "well,
sorry, we knew those issues existed and it's unfortunate you hit them,
but there's this better thing that you *should* have been using and then
you wouldn't have hit those issues." That doesn't seem likely to make
people happy with us.Sure, nobody wants users to keep hitting the same problems over and
over again, but the above is still reductionist. If we take the view
that anything that can cause data corruption in any scenario, no
matter how remote, is more than everything else, then we should just
stop shipping minor releases and halt all other development work until
we deal with https://commitfest.postgresql.org/22/528/ that has been
open for 14 CommitFests and until we've fully dealt with every aspect
of fsync-gate. I don't see you treating either of those issues with
the same sense of urgency with which you're treating this one, so
evidently you feel entitled to decide which
potentially-data-corrupting issues you want to attack first.
I'm not advocating for removing it today or tomorrow, or in some point
release. We're already to the stage where we're talking about
something that wouldn't hit until late 2020. Also, frankly, while there
might be something I could do to help, the whole madness with fsync gate
seems to be getting dealt with by some very smart people who know a
whole lot more about the internals of Linux and FreeBSD and I'd rather
not get in their way. On the other hand, dealing with backups and
issues around backups has been something that I at least hope I've
gotten decent at.
And I agree that you should have exactly that right. Along similar
lines, I'd like our users to have the right to decide when they want
to upgrade from using exclusive backup mode to non-exclusive backup
mode, instead of being forced into making a change at a time decided
by our fiat.
Our users *have* that option, and have for over two years now, and
it'll have been 4 years by the time v13 comes out. I agree that the
non-exclusive backup mode isn't as simple as the exclusive backup mode-
to use your example, what we *used* to be doing with fsync() was sure a
lot simpler than what we're going to be doing in the future, but
sometimes people have to make changes, even to more complicated APIs,
to make sure that they're doing things the right way to ensure that
their data doesn't get eaten.
Thanks!
Stephen
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Mon, Feb 25, 2019 at 11:09 AM Stephen Frost <sfrost@snowman.net> wrote:
I do think we can ask that people who wish to have a capability included
in PG (or continue to be included when there are serious known issues
with it...) be prepared to either build and maintain it themselves or to
convince someone else to do so (or both, and have a committer agree to
it). That's how we've long operated and it wasn't my intent to imply
otherwise, but I agree that I could have said it in a nicer way to avoid
it coming across as telling Christophe what to do.I think it is certainly true that if you want a new capability added,
you have to either write it yourself or get someone to do it for you.
There is SOME precedent for the proposition that if you want an
obsolete capability not to be removed, you should be prepared to help
fix it. But frankly I think the latter proposition is one we've taken
only occasionally, and only for things that were a whole lot cruftier
and more problematic than this is. For example, if a library is no
longer available and we have a contrib module that depends on that
library, it's reasonable to say that we can't keep the contrib module
unless somebody rewrites it not to depend on that library any more.
But the current situation is completely different. The code
maintenance burden resulting from keeping exclusive-mode backups is
mostly hypothetical; the code works as well as it ever did. We rarely
take the position that we're going to rip out older code that doesn't
work as nicely as newer code does just because nobody's prepared to
e.g. better-document the old code.
I'd argue that if something which has been deprecated for years is
seriously getting in the way of making progress that we should be
willing to accept a proposal to rip it out after giving people
sufficient time to adjust, and that if someone really wants to keep that
deprecated API then they need to be willing to spend the time to address
the reasons why it was deprecated.
For example, we still have FEBE protocol 2 support floating around.
If you're going to talk about badly-designed footguns that ought to be
excised with extreme prejudice, that would IMHO be a far better place
to start than this. Unlike this code, which it's now obvious is used
by quite a number of people even just among those who read this list
regularly, that code is probably nearly unused and untested and, if we
broke it, we probably wouldn't know.
You can probably guess my feelings with regard to the FEBE v2 support.
At least, in that case, people aren't really using it though. What I
really dislike is that we've got a *lot* of people using something that
we *know* has some pretty serious potential data-eating problems.
Saying that we should keep it around because a lot of people are using
it isn't the way to persuade me that we should keep it.
Thanks!
Stephen
On Feb 25, 2019, at 08:55, Stephen Frost <sfrost@snowman.net> wrote:
I honestly do doubt that they have had the same experiences that I have
had
Well, I guarantee you that no two people on this list have had identical experiences. :) I certainly have been bitten by the problems with the current system. But the resistance to major version upgrades is *huge*, and I'm strongly biased against anything that will make that harder. I'm not sure I'm communicating how big a problem telling many large installations, "If you move to v12/13/etc., you will have to change your backup system" is going to be.
--
-- Christophe Pettus
xof@thebuild.com
On Mon, Feb 25, 2019 at 10:49 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Stephen Frost wrote:
It will be annoying if after this removal, companies must change their
backup strategy by using specific postgres tools (pgbackrest, barman).You don't write your own database system using CSV files and shell
magic, do you? I have to say that it continues to boggle my mind that
people insist that *this* part of the system has to be able to be
implementable using shell scripts.Folks, these are your backups we're talking about, your last resort if
everything else goes up in flames, why do you want to risk that by
implementing your own one-off solution, particularly when there's known
serious issues using that interface, and you want to just use shell
scripts to do it...?If we didn't think that simplicity in backup has some value, why does
PostgreSQL provide pg_basebackup?Not everybody wants to use tools like pgbackrest and barman because it
takes some effort to set them up properly. It seems that you think that
people who want something simpler are irresponsible.Incidentally, both barman and pgbackrest stream the backup to a backup server.
I think the most important use case for the low level backup API is when
your database is so large that you want to leverage storage techniques
like snapshots for backup, because you can't or don't want to stream all
those terabytes across the network.I'm not playing devil's advocate here to annoy you. I see the problems
with the exclusive backup, and I see how it can hurt people.
I just think that removing exclusive backup without some kind of help
like Andres sketched above will make people unhappy.
+1
Another idea is to improve an exclusive backup method so that it will never
cause such issue. What about changing an exclusive backup mode of
pg_start_backup() so that it creates something like backup_label.pending file
instead of backup_label? Then if the database cluster has backup_label.pending
file but not recovery.signal (this is the case where the database is recovered
just after the server crashes while an exclusive backup is in progress),
in this idea, the recovery using that database cluster always ignores
(or removes) backup_label.pending file and start replaying WAL from
the REDO location that pg_control file indicates. So this idea enables us to
work around the issue that an exclusive backup could cause.
On the other hand, the downside of this idea is that the users need to change
the recovery procedure. When they want to do PITR using the backup having
backup_label.pending, they need to not only create recovery.signal but also
rename backup_label.pending to backup_label. Rename of backup_label file
is brand-new step for their recovery procedure, and changing the recovery
procedure might be painful for some users. But IMO it's less painful than
removing an exclusive backup API at all.
Thought?
BTW, if recovery.signal is created but backup_label.pending is not renamed
(this is the case where the operator forgets to rename the file even though
she or he create recovery signal file, i.e., mis-configuration), I think that
the recovery should emit PANIC immediately with the HINT like
"HINT: rename backup_label.pening to backup_label if you want to do PITR".
Regards,
--
Fujii Masao
On 2/25/19 7:50 PM, Fujii Masao wrote:
On Mon, Feb 25, 2019 at 10:49 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
I'm not playing devil's advocate here to annoy you. I see the problems
with the exclusive backup, and I see how it can hurt people.
I just think that removing exclusive backup without some kind of help
like Andres sketched above will make people unhappy.+1
Another idea is to improve an exclusive backup method so that it will never
cause such issue. What about changing an exclusive backup mode of
pg_start_backup() so that it creates something like backup_label.pending file
instead of backup_label? Then if the database cluster has backup_label.pending
file but not recovery.signal (this is the case where the database is recovered
just after the server crashes while an exclusive backup is in progress),
in this idea, the recovery using that database cluster always ignores
(or removes) backup_label.pending file and start replaying WAL from
the REDO location that pg_control file indicates. So this idea enables us to
work around the issue that an exclusive backup could cause.
It's an interesting idea.
On the other hand, the downside of this idea is that the users need to change
the recovery procedure. When they want to do PITR using the backup having
backup_label.pending, they need to not only create recovery.signal but also
rename backup_label.pending to backup_label. Rename of backup_label file
is brand-new step for their recovery procedure, and changing the recovery
procedure might be painful for some users. But IMO it's less painful than
removing an exclusive backup API at all.
Well, given that we have invalidated all prior recovery procedures in
PG12 I'm not sure how big a deal that is. Of course, it's too late make
a change like this for PG12.
Thought?
Here's the really obvious bad thing: if users do not update their
procedures and we ignore backup_label.pending on startup then they will
end up with a corrupt database because it will not replay from the
correct checkpoint. If we error on the presence of backup_label.pending
then we are right back to where we started.
I know there are backup solutions that rely on copying all required WAL
to pg_xlog/pg_wal before starting recovery. Those solutions would
silently break in this case and end up in corruption. If we require
recovery.signal then we still have the current problem of the cluster
not starting after a crash.
BTW, if recovery.signal is created but backup_label.pending is not renamed
(this is the case where the operator forgets to rename the file even though
she or he create recovery signal file, i.e., mis-configuration), I think that
the recovery should emit PANIC immediately with the HINT like
"HINT: rename backup_label.pening to backup_label if you want to do PITR".
This causes its own problems, as stated above.
Regards,
--
-David
david@pgmasters.net
On Mon, Feb 25, 2019 at 11:33:33AM -0500, Stephen Frost wrote:
I don't want to see more users stumbling over the issues with the
exclusive backup interface. A better interface exists, and has existed
since 9.6.
Do you really think we would be having this discussion if the
non-exclusive backup method was unequivocally better? It is better for
some use-cases, and impossible for others.
Also, you can't say it will have no impact for five years on people who
do not upgrade. The impact will be that they will have no new Postgres
features for five years.
I am not taking a stance on this issue, but making unclear statements
isn't helping the discussion.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Hi Christophe,
On 2/25/19 7:24 PM, Christophe Pettus wrote:
On Feb 25, 2019, at 08:55, Stephen Frost <sfrost@snowman.net> wrote:
I honestly do doubt that they have had the same experiences that I have
hadWell, I guarantee you that no two people on this list have had identical experiences. :) I certainly have been bitten by the problems with the current system. But the resistance to major version upgrades is *huge*, and I'm strongly biased against anything that will make that harder. I'm not sure I'm communicating how big a problem telling many large installations, "If you move to v12/13/etc., you will have to change your backup system" is going to be.
I honestly think you are underestimating how bad this can be.
The prevailing wisdom is that it's unfortunate that these backup_labels
get left around but they can be removed with scripting, so no big deal.
After that the cluster will start.
But -- if you are too aggressive about removing the backup_label and
accidentally do it before a real restore from backup, then you have a
corrupt cluster. Totally silent, but definitely corrupt. You'll
probably only see it when you start getting consistency errors from the
indexes, if ever. Page checksums won't catch it either unless you are
*lucky* enough to have a torn page.
Erroneous scripting of this kind can also affect backups that were made
with the non-exclusive method since the backups look the same.
fsync() is the major corruption issue we are facing right now but that
doesn't mean there aren't other sources of corruption we should be
thinking about. I've thought about this one a lot and it scares me.
I've worked on ways to make it better, but all of them break something
and involve compromises that are nearly as severe as removing exclusive
backups entirely.
Regards,
--
-David
david@pgmasters.net
Greetings,
* Christophe Pettus (xof@thebuild.com) wrote:
On Feb 25, 2019, at 08:55, Stephen Frost <sfrost@snowman.net> wrote:
I honestly do doubt that they have had the same experiences that I have
hadWell, I guarantee you that no two people on this list have had identical experiences. :)
Indeed! :)
I certainly have been bitten by the problems with the current system. But the resistance to major version upgrades is *huge*, and I'm strongly biased against anything that will make that harder. I'm not sure I'm communicating how big a problem telling many large installations, "If you move to v12/13/etc., you will have to change your backup system" is going to be.
Aren't they going to need to make a change for v12 now anyway?
Hopefully they're regularly testing their backups by doing a restore of
them, and dropping a recovery.conf into the directory of a v12 system
after restore will do exactly nothing and they'll get errors complaining
about how they need to provide a restore_command.
That's actually what prompted bringing this painful topic up again-
there's a large change being done to how backup/restore with PG is done
(the recovery.conf file is going away), and people who have written any
kind of automation (or even just documented procedures...) around that
will have to update their systems. At least from my perspective, making
them have to do such an update once, instead of once now and another
time in the future when we remove exclusive backup (or figure out a way
to do it that's safe and update the instructions for how to do it
right...), is the better option.
* Bruce Momjian (bruce@momjian.us) wrote:
On Mon, Feb 25, 2019 at 11:33:33AM -0500, Stephen Frost wrote:
I don't want to see more users stumbling over the issues with the
exclusive backup interface. A better interface exists, and has existed
since 9.6.Do you really think we would be having this discussion if the
non-exclusive backup method was unequivocally better? It is better for
some use-cases, and impossible for others.
Based on Christophe's comment above, anything which required users to
make a change on upgrade to their backup system would be cause to have
this discussion, which likely includes most possible solutions to the
issues with exclusive backup too, unfortunately..
Also, you can't say it will have no impact for five years on people who
do not upgrade. The impact will be that they will have no new Postgres
features for five years.
I don't think I made the claim that there wouldn't be any impact for
five years, I said they would continue to have support for five years.
Also, this is exactly what we tell them for any other breaking change
(such as removal of recovery.conf).
I am not taking a stance on this issue, but making unclear statements
isn't helping the discussion.
It's not my intent to make unclear statements, so I certainly appreicate
you, and anyone else, pointing out when I do. I'm happy to clarify.
Thanks!
Stephen
Import Notes
Reply to msg id not found: 20190225182311.zuteaz2c5jqcorwv@momjian.usBEBB89D5-4BFA-4D0F-B4C1-202EBF4B1636@thebuild.com | Resolved by subject fallback
Greetings,
* David Steele (david@pgmasters.net) wrote:
On 2/25/19 7:50 PM, Fujii Masao wrote:
Thought?
Here's the really obvious bad thing: if users do not update their procedures
and we ignore backup_label.pending on startup then they will end up with a
corrupt database because it will not replay from the correct checkpoint. If
we error on the presence of backup_label.pending then we are right back to
where we started.
Right, we definitely can't make a change which would cause non-updated
systems using the same APIs to suddenly end up with corruption. If we
require a change be made to the scripts that run the process then we
must make it something the user is essentially opt'ing into- where
they've told us explicitly that they're ready to use the new interface,
which is how the non-exclusive backup mode was added.
Thanks!
Stephen
On Mon, Feb 25, 2019 at 02:24:18PM -0500, Stephen Frost wrote:
* Bruce Momjian (bruce@momjian.us) wrote:
On Mon, Feb 25, 2019 at 11:33:33AM -0500, Stephen Frost wrote:
I don't want to see more users stumbling over the issues with the
exclusive backup interface. A better interface exists, and has existed
since 9.6.Do you really think we would be having this discussion if the
non-exclusive backup method was unequivocally better? It is better for
some use-cases, and impossible for others.Based on Christophe's comment above, anything which required users to
make a change on upgrade to their backup system would be cause to have
this discussion, which likely includes most possible solutions to the
issues with exclusive backup too, unfortunately..
I am not addressing what Christophe said, but rather what you said. We
clearly are fine in requiring people to update their software for major
releases. I think if there was a way for old backup methods to work at
all, this would be a lot simpler converation.
Also, you can't say it will have no impact for five years on people who
do not upgrade. The impact will be that they will have no new Postgres
features for five years.I don't think I made the claim that there wouldn't be any impact for
five years, I said they would continue to have support for five years.Also, this is exactly what we tell them for any other breaking change
(such as removal of recovery.conf).I am not taking a stance on this issue, but making unclear statements
isn't helping the discussion.It's not my intent to make unclear statements, so I certainly appreicate
you, and anyone else, pointing out when I do. I'm happy to clarify.
Good.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Hi,
On 2019-02-25 08:14:16 -0500, Stephen Frost wrote:
It will be annoying if after this removal, companies must change their
backup strategy by using specific postgres tools (pgbackrest, barman).You don't write your own database system using CSV files and shell
magic, do you? I have to say that it continues to boggle my mind that
people insist that *this* part of the system has to be able to be
implementable using shell scripts.Folks, these are your backups we're talking about, your last resort if
everything else goes up in flames, why do you want to risk that by
implementing your own one-off solution, particularly when there's known
serious issues using that interface, and you want to just use shell
scripts to do it...?
FWIW, if you weren't selling backrest quite so hard everywhere backups
are mentioned, I'd find this thread a lot more convicing.
Greetings,
Andres Freund
On 2019-02-25 08:42:02 -0500, Stephen Frost wrote:
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
On 2019-02-24 17:19:22 -0500, Stephen Frost wrote:
You say above that the new interface is unquestionably an improvement
FWIW, I think we didn't design it even remotely as well as we ought to
have. It was both a mistake to not offer a version of non-exclusive
backups that works with a client connection (for integration with the
TSMs of this world), and it was a mistake not to offer a commandline
tool that does the non-exclusive pg/start backup.I'm not really following- did you mean to say "without a client
connection" above?
Yes.
We could still add some kind of commandline tool to help with the
non-exclusive pg start/stop backup
That doesn't help, as long as the connection requirement is there. As
explained elsewhere in this thread, a lot of larger external backup
infrastructure just gives you a start/stop hook, and that makes using a
persistent connection hard and fragile.
but I'm not really sure exactly what that would look like and I
honestly don't have terribly much interest in spending resources on
implementing it since it would lack a great many features that we'd
then end up wanting to add on... and then we end up backing into
building yet another backup tool.
Meh. You are proposing to remove a feature (even if a dangerous
one). Requiring some minimal compat work to make htat more palpaple
isn't crazy. Nor does using backrest or whatnot do *ANYTHING* about the
users of large company wide backup tools.
I really, honestly, believe what we need to *stop* doing is trying to
drag along support for old/deprecated interfaces after we've introduced
new ones, thus avoiding these arguments and the time spent on them, and
the time spent dealing with implementing and documenting new APIs around
old ones. The only thing it seems to unquestionably do is make us argue
about when we are going to *finally* rip out the old/deprecated API (or
interface, or whatever you want to call it).While I agree that we should remove non-exclusive backups, I VEHEMENTLY
oppose the unconditionality of the above statement. Yes, it's annoying
to keep interfaces around, but unnecessarily inflicting pain to everyone
also is annoying.Alright, then how about we provide a bit of help for everyone who's got
a system built around recovery.conf today, instead of just completely
ripping that out?
Oh, comeon. Obviously we're sometimes going to have to make breaking
changes. But you're essentially arguing that we shouldn't provide compat
where we can come up with a reasonable way to provide backward
compatibility. And I think that's crazy and will hurt postgres.
That's not to say we shouldn't be better at announcing and then
following through on the deprecation of old interfaces.We announce things have changed every year, and people have five years
from that point, during which we'll continue to support them and fix
bugs and deal with security issues, to make whatever changes they need
to for the new interface.The argument seems to be that people want new features but they don't
want to have to do any work to get those new features.
I mean, duh? We can't make that happen all the time, but I don't
understand why it's surprising to have that as *one* goal that's in
conflict with other goals? Your analysis also neglects that a lot of
software will have to work across multiple supported versions of
postgres (at the very least in prep for the migration, but also often
longer) - by having a few versions were both interfaces work, that can
be made *MUCH* less painful.
When is something too big of a change to make in a given major version
and we have to, instead, provide backwards compatibility for it? What
is that policy? How many releases do we have to support it for? How do
we communicate that to our users, so that they know what they can depend
on being there across major releases and what they can't?
I literally said that we should have more policy around this.
I'm going to stop here, discussing with you in this thread seems
unproductive.
Greetings,
Andres Freund
Hi,
On 2019-02-25 20:43:01 +0200, David Steele wrote:
fsync() is the major corruption issue we are facing right now but that
doesn't mean there aren't other sources of corruption we should be thinking
about. I've thought about this one a lot and it scares me.
FWIW, I think this kind of issue practially bites a *LOT* more people
than the fsync() issue.
I've worked on ways to make it better, but all of them break something and
involve compromises that are nearly as severe as removing exclusive backups
entirely.
There've been plenty proposals to make non-exclusive backups less
painful in this thread, so ...?
Greetings,
Andres Freund
On Feb 25, 2019, at 11:24, Stephen Frost <sfrost@snowman.net> wrote:
Aren't they going to need to make a change for v12 now anyway?Hopefully they're regularly testing their backups by doing a restore of
them, and dropping a recovery.conf into the directory of a v12 system
after restore will do exactly nothing and they'll get errors complaining
about how they need to provide a restore_command.
It varies with the installation, but backups are 100% - epsilon automated, while restores are very frequently done via runbook. Changing a runbook is institutionally less contentious, and the change is more or less "drop the file into mumble/conf.d rather than mumble", which is less of a break.
--
-- Christophe Pettus
xof@thebuild.com
Hi,
On 2019-02-25 09:24:32 -0800, Christophe Pettus wrote:
But the resistance to major version upgrades is *huge*, and I'm
strongly biased against anything that will make that harder. I'm not
sure I'm communicating how big a problem telling many large
installations, "If you move to v12/13/etc., you will have to change
your backup system" is going to be.
I think you might be right about this specific issue. But to me it
sounds like you also don't appreciate that development resources are
really constrained too, and providing endless backward compatibility for
everything is going to use both resources directly, and indirectly by
making the whole system more complex. I've been on your side of this
fight a couple times (and largely lost), but I think it's important to
appreciate that it's all a balancing, and we all have valid reasons to
keep the balance between development pace / ease of use / ease of
upgrade the way we argue for them. Hard upgrading is going to reduce
adoption, but so is lack of feature development and too many creaky &
redundant & easy to misuse interfaces.
Greetings,
Andres Freund
Hi Andres,
On 2/25/19 10:57 PM, Andres Freund wrote:
Hi,
On 2019-02-25 08:14:16 -0500, Stephen Frost wrote:
It will be annoying if after this removal, companies must change their
backup strategy by using specific postgres tools (pgbackrest, barman).You don't write your own database system using CSV files and shell
magic, do you? I have to say that it continues to boggle my mind that
people insist that *this* part of the system has to be able to be
implementable using shell scripts.Folks, these are your backups we're talking about, your last resort if
everything else goes up in flames, why do you want to risk that by
implementing your own one-off solution, particularly when there's known
serious issues using that interface, and you want to just use shell
scripts to do it...?FWIW, if you weren't selling backrest quite so hard everywhere backups
are mentioned, I'd find this thread a lot more convicing.
pgBackRest has not used exclusive backups since the new API was
introduced in 9.6 so this is not an issue for our users.
Over time we have contributed back to Postgres in areas we thought could
be improved based on our work on the pgBackRest project: 6ad8ac60,
9fe3c644, 017e4f25, 78874531, 449338cc, 98267ee8, 8694cc96, 920a5e50,
c37b3d08, 5fc1670b, b981df4c. This does not include the various backup
related patches that we have reviewed.
If promoting pgBackRest were our primary concern then it would be in our
interest to allow Postgres exclusive backups to stay broken and
pg_basebackup to be as primitive as possible.
Our objections to exclusive backups stem from a desire to do the right
thing, as we see it, and because we have seen first hand all the ways
that backups can go wrong.
--
-David
david@pgmasters.net
On Mon, Feb 25, 2019 at 10:14 PM Andres Freund <andres@anarazel.de> wrote:
On 2019-02-25 08:42:02 -0500, Stephen Frost wrote:
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
On 2019-02-24 17:19:22 -0500, Stephen Frost wrote:
You say above that the new interface is unquestionably an improvement
FWIW, I think we didn't design it even remotely as well as we ought to
have. It was both a mistake to not offer a version of non-exclusive
backups that works with a client connection (for integration with the
TSMs of this world), and it was a mistake not to offer a commandline
tool that does the non-exclusive pg/start backup.I'm not really following- did you mean to say "without a client
connection" above?Yes.
We could still add some kind of commandline tool to help with the
non-exclusive pg start/stop backupThat doesn't help, as long as the connection requirement is there. As
explained elsewhere in this thread, a lot of larger external backup
infrastructure just gives you a start/stop hook, and that makes using a
persistent connection hard and fragile.
In my experience they don't just give you that. They also give you a module
that calls the postgres APIs to do backups.
Some of them are using the old API, some are using the new one. The only
thing that would be required there would be to upgrade the module to one
that supports the 9.6 APIs. (And you have to upgrade those modules whenever
you upgrade oracle or mssql or whatever else you have, so that's not
something those organisations are unused to)
I agree with the need for as simpler way to drive it in the simple case or
in the say filesystem snapshot case, but I'm not sure the pre/post case is
all that common in this context? Some certainly use it today even if they
don't have to, but that can probably be easily changed to such a
"simplified cmd tool" that you have suggested elsewhere as well.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 2/25/19 11:20 PM, Christophe Pettus wrote:
On Feb 25, 2019, at 11:24, Stephen Frost <sfrost@snowman.net> wrote:
Aren't they going to need to make a change for v12 now anyway?Hopefully they're regularly testing their backups by doing a restore of
them, and dropping a recovery.conf into the directory of a v12 system
after restore will do exactly nothing and they'll get errors complaining
about how they need to provide a restore_command.It varies with the installation, but backups are 100% - epsilon automated, while restores are very frequently done via runbook. Changing a runbook is institutionally less contentious, and the change is more or less "drop the file into mumble/conf.d rather than mumble", which is less of a break.
This is true but is becoming far less true over time. Automation of
recovery is now very common -- as it should be. Backups should be
tested regularly and recovery should not be a scary thing that is done
at 2am from an outdated runbook.
The recovery.conf change will have serious impact when it arrives in PG12.
--
-David
david@pgmasters.net
On 2/25/19 11:38 PM, Andres Freund wrote:
Hi,
On 2019-02-25 09:24:32 -0800, Christophe Pettus wrote:
But the resistance to major version upgrades is *huge*, and I'm
strongly biased against anything that will make that harder. I'm not
sure I'm communicating how big a problem telling many large
installations, "If you move to v12/13/etc., you will have to change
your backup system" is going to be.I think you might be right about this specific issue. But to me it
sounds like you also don't appreciate that development resources are
really constrained too, and providing endless backward compatibility for
everything is going to use both resources directly, and indirectly by
making the whole system more complex. I've been on your side of this
fight a couple times (and largely lost), but I think it's important to
appreciate that it's all a balancing, and we all have valid reasons to
keep the balance between development pace / ease of use / ease of
upgrade the way we argue for them. Hard upgrading is going to reduce
adoption, but so is lack of feature development and too many creaky &
redundant & easy to misuse interfaces.
+1.
And those (like me) who are motivated to work in this area are put off
by the byzantine code and how easy it is to break things -- in large
part because of the *complete* absence of tests for exclusive mode.
I did a round of doc updates for non/exclusive backups a while back and
it was excruciating to measure and document the subtle differences in a
way that a user might possibly understand. I'm honestly not motivated
to come back to it until we remove the deprecated interfaces.
--
-David
david@pgmasters.net
On Feb 25, 2019, at 13:38, Andres Freund <andres@anarazel.de> wrote:
I think you might be right about this specific issue. But to me it
sounds like you also don't appreciate that development resources are
really constrained too, and providing endless backward compatibility for
everything is going to use both resources directly, and indirectly by
making the whole system more complex.
One of the things I've tried not to do in this discussion is turn it into a policy debate about backwards compatibility in general, rather than this very specific feature. Of course, there's a cost to keeping around features, and I fully appreciate that. In this discussion, the code complexity didn't seem to be the fundamental driver behind removing the feature; the relative safety of it was, along with maintaining the documentation.
I jumped in because there seemed to be an argument going on that all of the cool kids will have moved off the old interface and there was essentially no cost to removing it in v12 or v13, and that didn't correspond to my experience.
--
-- Christophe Pettus
xof@thebuild.com
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
On 2019-02-25 08:14:16 -0500, Stephen Frost wrote:
It will be annoying if after this removal, companies must change their
backup strategy by using specific postgres tools (pgbackrest, barman).You don't write your own database system using CSV files and shell
magic, do you? I have to say that it continues to boggle my mind that
people insist that *this* part of the system has to be able to be
implementable using shell scripts.Folks, these are your backups we're talking about, your last resort if
everything else goes up in flames, why do you want to risk that by
implementing your own one-off solution, particularly when there's known
serious issues using that interface, and you want to just use shell
scripts to do it...?FWIW, if you weren't selling backrest quite so hard everywhere backups
are mentioned, I'd find this thread a lot more convicing.
I push pgbackrest, just like I push PG, because I view it as the best
open source tool out there for the job, well architected, designed from
the ground-up to do what a backup tool needs to, and it has a strong
community around it with people contributing back, all openly and under
a permissive license which allows anyone to use it, hack on it, and do
what they want with it- basically, I see it as the PG of backup tools
for PG. What PG is to relational databases, pgbackrest is to PG backup
tools, imv.
As David mentioned already, this really doesn't change things for
pgbackrest one way or the other- we don't use the old API and haven't
since the non-exclusive API was available. In addition, pgbackrest
hasn't got a solution for the "we just want to take a snapshot" or "we
just want to use start/stop hooks". If I was really trying to push
people to pgbackrest because of the flaws in the exclusive backup mode,
I'd surely develop answers to those issues first and make them part of
pgbackrest in short order, and then start yelling from the tops of
buildings about how broken the exclusive backup API is and about how
I've got an easy solution for everyone to move to, maybe do a blog post
on planet. I'm not doing that though, instead I'm discussing the
issues here, with a tiny fraction of our community, and pushing for us
to agree to get rid of a dangerous API in favor of an API that anyone
can use w/o pgbackrest, if they want to.
Thanks!
Stephen
On 2/25/19 11:59 PM, Christophe Pettus wrote:
On Feb 25, 2019, at 13:38, Andres Freund <andres@anarazel.de> wrote:
I think you might be right about this specific issue. But to me it
sounds like you also don't appreciate that development resources are
really constrained too, and providing endless backward compatibility for
everything is going to use both resources directly, and indirectly by
making the whole system more complex.One of the things I've tried not to do in this discussion is turn it into a policy debate about backwards compatibility in general, rather than this very specific feature. Of course, there's a cost to keeping around features, and I fully appreciate that. In this discussion, the code complexity didn't seem to be the fundamental driver behind removing the feature; the relative safety of it was, along with maintaining the documentation.
Code complexity is very much an issue. Since the two methods are merged
with a ton of conditionals, working on one without affecting the other
is a real chore, especially since exclusive backups have *zero* tests.
Personally I'm not interested in writing those tests so I try not to
touch the code either.
I jumped in because there seemed to be an argument going on that all of the cool kids will have moved off the old interface and there was essentially no cost to removing it in v12 or v13, and that didn't correspond to my experience.
I don't think anyone believes there will be zero cost, but the issues
with the exclusive method seem to outweigh the benefits, at least in my
experience.
--
-David
david@pgmasters.net
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
On 2019-02-25 08:42:02 -0500, Stephen Frost wrote:
* Andres Freund (andres@anarazel.de) wrote:
On 2019-02-24 17:19:22 -0500, Stephen Frost wrote:
You say above that the new interface is unquestionably an improvement
FWIW, I think we didn't design it even remotely as well as we ought to
have. It was both a mistake to not offer a version of non-exclusive
backups that works with a client connection (for integration with the
TSMs of this world), and it was a mistake not to offer a commandline
tool that does the non-exclusive pg/start backup.I'm not really following- did you mean to say "without a client
connection" above?Yes.
Ah, alright, that makes more sense then.
You might bring that up with the individual who wrote it.
We could still add some kind of commandline tool to help with the
non-exclusive pg start/stop backupThat doesn't help, as long as the connection requirement is there. As
explained elsewhere in this thread, a lot of larger external backup
infrastructure just gives you a start/stop hook, and that makes using a
persistent connection hard and fragile.
I tend to agree w/ Magnus in that I'm not sure that this is actually
as common these days as claimed. That said, I'm not against someone
trying to implement something that doesn't require a connection, though
I understand from discussion with David and his comments elsewhere on
this thread that it isn't that simple a thing to do- and people would
have to migrate to whatever it is anyway.
but I'm not really sure exactly what that would look like and I
honestly don't have terribly much interest in spending resources on
implementing it since it would lack a great many features that we'd
then end up wanting to add on... and then we end up backing into
building yet another backup tool.Meh. You are proposing to remove a feature (even if a dangerous
one). Requiring some minimal compat work to make htat more palpaple
isn't crazy. Nor does using backrest or whatnot do *ANYTHING* about the
users of large company wide backup tools.
Yes, I'm supporting the proposal to remove a dangerous feature. I
really don't think much more needs to be said than that. If the feature
wasn't dangerous then I wouldn't be argueing for its removal. I do feel
bad that pgbackrest hasn't got a solution already to this, but we've had
other priorities. I do think we'll get there, eventually, but not any
time soon. Being told that I have to implement another feature so that
we can remove the dangerous one is something I don't agree with.
I really, honestly, believe what we need to *stop* doing is trying to
drag along support for old/deprecated interfaces after we've introduced
new ones, thus avoiding these arguments and the time spent on them, and
the time spent dealing with implementing and documenting new APIs around
old ones. The only thing it seems to unquestionably do is make us argue
about when we are going to *finally* rip out the old/deprecated API (or
interface, or whatever you want to call it).While I agree that we should remove non-exclusive backups, I VEHEMENTLY
oppose the unconditionality of the above statement. Yes, it's annoying
to keep interfaces around, but unnecessarily inflicting pain to everyone
also is annoying.Alright, then how about we provide a bit of help for everyone who's got
a system built around recovery.conf today, instead of just completely
ripping that out?Oh, comeon. Obviously we're sometimes going to have to make breaking
changes. But you're essentially arguing that we shouldn't provide compat
where we can come up with a reasonable way to provide backward
compatibility. And I think that's crazy and will hurt postgres.
This was discussed, *extensively*, previously on this list with a whole
lot of proposals trying to come up with easy ways to "fix" exclusive
backups and I don't really think that this thread has proposed anything
novel in that regard. Basically, I just don't think it's that easy.
Perhaps I'm wrong, in which case, great, I'd be happy to review a simple
patch that solves it.
That's not to say we shouldn't be better at announcing and then
following through on the deprecation of old interfaces.We announce things have changed every year, and people have five years
from that point, during which we'll continue to support them and fix
bugs and deal with security issues, to make whatever changes they need
to for the new interface.The argument seems to be that people want new features but they don't
want to have to do any work to get those new features.I mean, duh? We can't make that happen all the time, but I don't
understand why it's surprising to have that as *one* goal that's in
conflict with other goals? Your analysis also neglects that a lot of
software will have to work across multiple supported versions of
postgres (at the very least in prep for the migration, but also often
longer) - by having a few versions were both interfaces work, that can
be made *MUCH* less painful.
Having to support multiple APIs like this really is painful.
Also, I'm pretty well versed in software that has to support multiple PG
major versions across time, using specifically these APIs in fact.
Having both interfaces work at the same time didn't make that any less
painful (pgbackrest supported non-exclusive backup in the same version
as 9.6 support was added, and supports backing up PG back to 8.3).
Perhaps it would help some, but I tend to doubt it based both on that
experience and in the experience that there tends to be two times people
make changes- opportunistically when they realize something has been
deprecated, and when they're forced to because the old interface went
away, and the level of effort and time spent tends to be about the same,
all that's different is when it's done.
When is something too big of a change to make in a given major version
and we have to, instead, provide backwards compatibility for it? What
is that policy? How many releases do we have to support it for? How do
we communicate that to our users, so that they know what they can depend
on being there across major releases and what they can't?I literally said that we should have more policy around this.
Great. I was kinda hoping you might propose something by way of
answering those questions so that maybe we could actually get to a point
of having such a policy.
Thanks!
Stephen
On Mon, Feb 25, 2019 at 4:38 PM David Steele <david@pgmasters.net> wrote:
FWIW, if you weren't selling backrest quite so hard everywhere backups
are mentioned, I'd find this thread a lot more convicing.pgBackRest has not used exclusive backups since the new API was
introduced in 9.6 so this is not an issue for our users.Over time we have contributed back to Postgres in areas we thought could
be improved based on our work on the pgBackRest project: 6ad8ac60,
9fe3c644, 017e4f25, 78874531, 449338cc, 98267ee8, 8694cc96, 920a5e50,
c37b3d08, 5fc1670b, b981df4c. This does not include the various backup
related patches that we have reviewed.If promoting pgBackRest were our primary concern then it would be in our
interest to allow Postgres exclusive backups to stay broken and
pg_basebackup to be as primitive as possible.
Hmm, so what you're saying is that you'd like to disable an API that
some non-backrest users are relying upon but which no backrest users
are relying upon. And you don't understand why some non-backrest
users are opposed to that plan. Is that a correct summary of your
position?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert,
* Robert Haas (robertmhaas@gmail.com) wrote:
Hmm, so what you're saying is that you'd like to disable an API that
some non-backrest users are relying upon but which no backrest users
are relying upon. And you don't understand why some non-backrest
users are opposed to that plan. Is that a correct summary of your
position?
This topic has come up more than once and on multiple threads (entirely
independent of pgbackrest and any of the other individuals on this
thread) and I think it needs to be a discussion held at the PGCon
Developer meeting.
I'll propose it.
If you'd like to propose something to fix the, as well discussed,
dangerous exclusive backup method, or further litigate at what point a
dangerous and misleading interface should be removed, I'm sure you'll
find we'd be happy to talk about that.
Thanks,
Stephen
On 26/02/19 4:53 PM, Robert Haas wrote:
On Mon, Feb 25, 2019 at 4:38 PM David Steele <david@pgmasters.net> wrote:
FWIW, if you weren't selling backrest quite so hard everywhere backups
are mentioned, I'd find this thread a lot more convicing.pgBackRest has not used exclusive backups since the new API was
introduced in 9.6 so this is not an issue for our users.Over time we have contributed back to Postgres in areas we thought could
be improved based on our work on the pgBackRest project: 6ad8ac60,
9fe3c644, 017e4f25, 78874531, 449338cc, 98267ee8, 8694cc96, 920a5e50,
c37b3d08, 5fc1670b, b981df4c. This does not include the various backup
related patches that we have reviewed.If promoting pgBackRest were our primary concern then it would be in our
interest to allow Postgres exclusive backups to stay broken and
pg_basebackup to be as primitive as possible.Hmm, so what you're saying is that you'd like to disable an API that
some non-backrest users are relying upon but which no backrest users
are relying upon. And you don't understand why some non-backrest
users are opposed to that plan. Is that a correct summary of your
position?
+1 to Robert's Management Summary.
ISTM that the onus should be on the patch submitter to provide additions
to pg_basebackup that make it as painless as possible for those people
*not* using pgBackRest to continue making backups. Breaking this is just
not right. Submitting patches that mean that people *must* use
pgBackRest is also not right IMHO.
Finally, Open Source is about is working together to make the a common
project (in this case Poistgres) better - not forcing us to use
something else you have written (even if it is good).
regards
Mark
Greetings Mark,
* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
ISTM that the onus should be on the patch submitter to provide additions to
pg_basebackup that make it as painless as possible for those people *not*
using pgBackRest to continue making backups. Breaking this is just not
right. Submitting patches that mean that people *must* use pgBackRest is
also not right IMHO.
I'm sorry that there's some confusion here- to be clear, no one is
required to use pgBackRest. pg_basebackup works quite well and wouldn't
be impacted by the changes proposed no this thread. The arguments
against removing the exclusive backup feature don't have anything to do
with pg_basebackup.
Thanks!
Stephen
On Mon, Feb 25, 2019 at 08:17:27PM +0200, David Steele wrote:
Here's the really obvious bad thing: if users do not update their procedures
and we ignore backup_label.pending on startup then they will end up with a
corrupt database because it will not replay from the correct checkpoint. If
we error on the presence of backup_label.pending then we are right back to
where we started.
Not really. If we error on backup_label.pending, we can make the
difference between a backend which has crashed in the middle of an
exclusive backup without replaying anything and a backend which is
started based on a base backup, so an operator can take some action to
see what's wrong with the server. If you issue an error, users can
also see that their custom backup script is wrong because they forgot
to rename the flag after taking a backup of the data folder(s).
The real pain point about the non-exclusive APIs is really that we
need to keep the connection opened, which can be done easily using
drivers like psycopg2 or such, still that's far from straight-forward
for the average enterprise user, and it can be hard for some companies
to complicate a software stack with more external dependencies. So
yes, while I can see the long-term benefits of non-exclusive backups,
they are much more complicated to use than exclusive backups.
--
Michael
On 26/02/19 5:41 PM, Stephen Frost wrote:
Greetings Mark,
* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
ISTM that the onus should be on the patch submitter to provide additions to
pg_basebackup that make it as painless as possible for those people *not*
using pgBackRest to continue making backups. Breaking this is just not
right. Submitting patches that mean that people *must* use pgBackRest is
also not right IMHO.I'm sorry that there's some confusion here- to be clear, no one is
required to use pgBackRest. pg_basebackup works quite well and wouldn't
be impacted by the changes proposed no this thread. The arguments
against removing the exclusive backup feature don't have anything to do
with pg_basebackup.
Ah yes (checks pg_basbackup code), you are correct! Reading this thread
I thought I saw a comment to the effect that pg_basebackup was being
broken, hence the less than impressed post.
Your relentless bashing of people doing their own backups and heavy
marketing of pgBackRest - unfortunately - made it easy for me to believe
that this was a possibility that you might see as ok. So - apologies for
the misunderstanding, however less marketing of your own product would
avoid me jumping to the wrong conclusion.
regards
Mark
On 2/26/19 6:51 AM, Michael Paquier wrote:
On Mon, Feb 25, 2019 at 08:17:27PM +0200, David Steele wrote:
Here's the really obvious bad thing: if users do not update their procedures
and we ignore backup_label.pending on startup then they will end up with a
corrupt database because it will not replay from the correct checkpoint. If
we error on the presence of backup_label.pending then we are right back to
where we started.Not really. If we error on backup_label.pending, we can make the
difference between a backend which has crashed in the middle of an
exclusive backup without replaying anything and a backend which is
started based on a base backup, so an operator can take some action to
see what's wrong with the server. If you issue an error, users can
also see that their custom backup script is wrong because they forgot
to rename the flag after taking a backup of the data folder(s).
The operator still has a decision to make, manually, just as they do
now. The wrong decision may mean a corrupt database.
Here's the scenario:
1) They do a restore, forget to rename backup_label.pending.
2) Postgres won't start, which is the same action we take now.
3) The user is not sure what to do, rename or delete? They delete, and
the cluster is corrupted.
Worse, they have scripted the deletion of backup_label so that the
cluster will restart on crash. This is the recommendation from our
documentation after all. If that script runs after a restore instead of
a crash, then the cluster will be corrupt -- silently.
--
-David
david@pgmasters.net
Fujii Masao wrote:
Another idea is to improve an exclusive backup method so that it will never
cause such issue. What about changing an exclusive backup mode of
pg_start_backup() so that it creates something like backup_label.pending file
instead of backup_label? Then if the database cluster has backup_label.pending
file but not recovery.signal (this is the case where the database is recovered
just after the server crashes while an exclusive backup is in progress),
in this idea, the recovery using that database cluster always ignores
(or removes) backup_label.pending file and start replaying WAL from
the REDO location that pg_control file indicates. So this idea enables us to
work around the issue that an exclusive backup could cause.
Then if you restore a backup, but forget to add the recovery.signal file,
PostgreSQL will happily recover from a wrong checkpoint and you end up with
a corrupted database.
I think the fundamental problem with all these approaches is that there is
no safe way to distinguish a server crashed in backup mode from a restored
backup. This is what makes the problem so hard.
The existing exclusive backup is in my opinion the safest variant: it refuses
to create a corrupted cluster without manual intervention and gives you a dire
warning to consider if you are doing the right thing.
Yours,
Laurenz Albe
Greetings,
* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
On 26/02/19 5:41 PM, Stephen Frost wrote:
* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
ISTM that the onus should be on the patch submitter to provide additions to
pg_basebackup that make it as painless as possible for those people *not*
using pgBackRest to continue making backups. Breaking this is just not
right. Submitting patches that mean that people *must* use pgBackRest is
also not right IMHO.I'm sorry that there's some confusion here- to be clear, no one is
required to use pgBackRest. pg_basebackup works quite well and wouldn't
be impacted by the changes proposed no this thread. The arguments
against removing the exclusive backup feature don't have anything to do
with pg_basebackup.Ah yes (checks pg_basbackup code), you are correct! Reading this thread I
thought I saw a comment to the effect that pg_basebackup was being broken,
hence the less than impressed post.
No, I don't see breaking pg_basebackup as ok, but I do have concerns
about how it doesn't provide for any validation of the backup unless you
use tar+gzip. Unlike with exclusive mode, at least you don't run the
risk with pg_basebackup that a crash will mean that PG won't come back
up without intervention. I can understand how the comments which you
were responding to might have lead to that confusion.
Your relentless bashing of people doing their own backups and heavy
marketing of pgBackRest - unfortunately - made it easy for me to believe
that this was a possibility that you might see as ok. So - apologies for the
misunderstanding, however less marketing of your own product would avoid me
jumping to the wrong conclusion.
As for my personal recommendations, if you go back far enough, you can
see where I used to discuss how to use the exclusive API for doing
backups with people, because maybe they didn't like the tools available
at the time (I didn't, after all), but after seeing how many issues come
from doing that and realizing how bad it is that things like our
documented archive_command doesn't ensure that WAL is actually written
out to disk, I realized why dedicated tools had been written and started
to recommend that people pick one of the dedicated tools instead of
writing their own (and I used to recommend the other solution like
barman right alongside pgBackRest, and I still do make the generic
recommendation at times that people not try to build their own backup
tool) but then after seeing things like CIFS mounts happily throwing
kernel errors while returning success on (obviously, actually failed)
fsync() calls, resulting in corrupted backups that would silently be
corrupt if restored from, I realized that having an actual manifest of
all files backed up and their checksums wasn't a "nice to have" kind of
feature for a backup tool but something critical to a backup solution
(thankfully, David knew that right from the start and so it was always
part of pgBackRest) and so I have a pretty hard time recommending
solutions that don't have that these days.
As for back to the main topic of this thread, do I hope that removing
the exclusive backup mode will cause people to look at and consider
existing, purpose-build, backup solutions for PG, rather than trying to
update their one-off backup shell scripts? Yes, of course I do. Am I
trying to sneak something in that's going to force them to use
pgBackRest? No, of course not. Note that pg_basebackup, barman and
pgBackRest have had support for the non-exclusive API since it was
introduced (or, for barman, even before then if one installed the
pgespresso extension, and pg_basebackup since it was written because it
uses the replication protocol), and so does WAL-G (though I'm not sure
exactly when it got it, or maybe it always had it), and pg_probackup
(which appears to have gotten it in May 2017, less than a year after 9.6
was released), and even the shell-based pitery (which got it at some
point before pg_probackup, it looks like, based on the commits).
Considering there's at least *5* other backup tools which already
support the non-exclusive API, all of which have done so since the
non-exclusive mode was introduced or shortly after, I really have to
question the angle of attack being used here which is claiming that
this thread is motivated by a goal of forcing people to pgBackRest.
The main group who would be impacted by the exclusive mode going away
are those who aren't using an existing solution and who either didn't
get the memo about exclusive mode being deprecated, or decided just to
ignore it, and those are the installations which really do concern me
the most because they're the highest risk for people ending up losing
their data.
Thanks!
Stephen
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
I think the fundamental problem with all these approaches is that there is
no safe way to distinguish a server crashed in backup mode from a restored
backup. This is what makes the problem so hard.
Right- if you want to just call start/stop and take a snapshot in the
middle and then be able to restore that directly and start up the
database, then there *can't* be any way to distinguish between the two,
which is, I'm pretty sure, where this whole discussion ended up back
during the 9.6 development cycle and why it's still an issue. If there
was an easy way to fix this, I feel like we would have already.
The existing exclusive backup is in my opinion the safest variant: it refuses
to create a corrupted cluster without manual intervention and gives you a dire
warning to consider if you are doing the right thing.
... it's the least dangerous if you limit yourself to that method, but
that doesn't make it safe. :(
In the end, you basically *have* to have a way of extracting out the
data needed for the backup (start/stop WAL and such) that doesn't make
the running cluster look like it's a backup being restored, and you
*have* to make that information available to the database cluster when
it's restored somehow, and notify PG that it's doing backup recovery and
*not* crash recovery, to eliminate this risk, and that's pretty hard to
manage if all you want to do is snapshot the filesystem.
Of course, you have to have a solution for WAL too and the thought has
crossed my mind that maybe there's something we could do when it comes
to stash all the info needed in the WAL archive, but I'm still not sure
how we'd solve for knowing if we're doing backup recovery or crash
recovery in that case without some kind of marker or something external
telling us that's what we're doing. As you proposed previously, but
with a bit of a twist, maybe we could just always do backup recovery if
we find a .backup (or whatever) file in the WAL that, when compared to
pg_control, shows that we were in the process of doing a backup... That
would require that everyone always have a restore_command set, which
wasn't possible before because that went into recovery.conf, but it's
possible to just always have that set now, and that would eliminate the
risk of us running the system out of disk space by keeping all the WAL
that's generated during the backup local.
Obviously, a lot of this is pretty hand-wavy, and you still have the
unfortunate situation that if you're actually recoverying from a crash
that just happened to happen while you were taking a backup then you
could be replaying a heck of a lot more WAL than you needed to, and you
have to have a working restore_command on the primary, and you'd have to
figure out a way for PG to check for these files .backup or whatever
files on startup that doesn't take forever or require stepping through
every WAL segment or something, but maybe those concerns could be
addressed.
Thanks!
Stephen
On Tue, Feb 26, 2019 at 1:49 AM David Steele <david@pgmasters.net> wrote:
The operator still has a decision to make, manually, just as they do
now. The wrong decision may mean a corrupt database.Here's the scenario:
1) They do a restore, forget to rename backup_label.pending.
2) Postgres won't start, which is the same action we take now.
3) The user is not sure what to do, rename or delete? They delete, and
the cluster is corrupted.Worse, they have scripted the deletion of backup_label so that the
cluster will restart on crash. This is the recommendation from our
documentation after all. If that script runs after a restore instead of
a crash, then the cluster will be corrupt -- silently.
Sure, that's true. On the other hand, it's not like someone can't
manage to use the non-exclusive mode and fail to drop the backup_label
and tablespace_map files into the right places. This procedure is
tedious and error-prone no matter which way you do it, which is one
reason I don't believe that the exclusive backup method is nearly as
bad as you're making it out to be.
Mind you, I'm not really defending the proposal to add a new backup
mode along the likes Fujii Masao is proposing. I don't think the
situation will be made less error-prone by having three ways to do it
instead of two. But I think we probably would've been happier if we'd
designed it the way he's now proposing in the first place, instead of
the way we actually did.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Feb 26, 2019 at 3:17 AM David Steele <david@pgmasters.net> wrote:
On 2/25/19 7:50 PM, Fujii Masao wrote:
On Mon, Feb 25, 2019 at 10:49 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
I'm not playing devil's advocate here to annoy you. I see the problems
with the exclusive backup, and I see how it can hurt people.
I just think that removing exclusive backup without some kind of help
like Andres sketched above will make people unhappy.+1
Another idea is to improve an exclusive backup method so that it will never
cause such issue. What about changing an exclusive backup mode of
pg_start_backup() so that it creates something like backup_label.pending file
instead of backup_label? Then if the database cluster has backup_label.pending
file but not recovery.signal (this is the case where the database is recovered
just after the server crashes while an exclusive backup is in progress),
in this idea, the recovery using that database cluster always ignores
(or removes) backup_label.pending file and start replaying WAL from
the REDO location that pg_control file indicates. So this idea enables us to
work around the issue that an exclusive backup could cause.It's an interesting idea.
On the other hand, the downside of this idea is that the users need to change
the recovery procedure. When they want to do PITR using the backup having
backup_label.pending, they need to not only create recovery.signal but also
rename backup_label.pending to backup_label. Rename of backup_label file
is brand-new step for their recovery procedure, and changing the recovery
procedure might be painful for some users. But IMO it's less painful than
removing an exclusive backup API at all.Well, given that we have invalidated all prior recovery procedures in
PG12 I'm not sure how big a deal that is. Of course, it's too late make
a change like this for PG12.Thought?
Here's the really obvious bad thing: if users do not update their
procedures and we ignore backup_label.pending on startup then they will
end up with a corrupt database because it will not replay from the
correct checkpoint. If we error on the presence of backup_label.pending
then we are right back to where we started.
No. In this case, since backup_label.pending and recovery.signal exist,
as I described in my previous post, the server stops the recovery with
PANIC error before corrupting the database. Then the operator can
rename backup_label.pending to backup_label and restart the recovery
safely.
So, let me clarify the situations;
(1) If backup_label and recovery.signal exist, the recovery starts safely.
This is the normal case of recovery from the base backup.
(2)If backup_label.pending and recovery.signal exist, as described above,
PANIC error happens at the start of recovery. This case can happen
if the operator forgets to rename backup_label.pending, i.e.,
operation mistake. So, after PANIC, the operator needs to fix her or
his mistake (i.e., rename backup_label.pending) and restart
the recovery.
(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this idea.
Regards,
--
Fujii Masao
On Tue, Feb 26, 2019 at 12:20 PM Fujii Masao <masao.fujii@gmail.com> wrote:
So, let me clarify the situations;
(1) If backup_label and recovery.signal exist, the recovery starts safely.
This is the normal case of recovery from the base backup.(2)If backup_label.pending and recovery.signal exist, as described above,
PANIC error happens at the start of recovery. This case can happen
if the operator forgets to rename backup_label.pending, i.e.,
operation mistake. So, after PANIC, the operator needs to fix her or
his mistake (i.e., rename backup_label.pending) and restart
the recovery.(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this idea.
The if-conditions for 1 and 2 appear to be the same, which is confusing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Feb 26, 2019, at 09:26, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Feb 26, 2019 at 12:20 PM Fujii Masao <masao.fujii@gmail.com> wrote:
So, let me clarify the situations;
(1) If backup_label and recovery.signal exist, the recovery starts safely.
This is the normal case of recovery from the base backup.(2)If backup_label.pending and recovery.signal exist, as described above,
PANIC error happens at the start of recovery. This case can happen
if the operator forgets to rename backup_label.pending, i.e.,
operation mistake. So, after PANIC, the operator needs to fix her or
his mistake (i.e., rename backup_label.pending) and restart
the recovery.(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this idea.The if-conditions for 1 and 2 appear to be the same, which is confusing.
I believe #1 is when backup_label (no .pending) exists, #2 is when backup_label.pending (with .pending) exists.
At the absolute minimum, this discussion has convinced me that we need to create a wiki page to accurately describe the failure scenarios for both exclusive and non-exclusive backups, and the recovery actions for them. If it exists already, my search attempts weren't successful. If it doesn't, I'm happy to start one.
--
-- Christophe Pettus
xof@thebuild.com
On Wed, Feb 27, 2019 at 2:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Feb 26, 2019 at 12:20 PM Fujii Masao <masao.fujii@gmail.com> wrote:
So, let me clarify the situations;
(1) If backup_label and recovery.signal exist, the recovery starts safely.
This is the normal case of recovery from the base backup.(2)If backup_label.pending and recovery.signal exist, as described above,
PANIC error happens at the start of recovery. This case can happen
if the operator forgets to rename backup_label.pending, i.e.,
operation mistake. So, after PANIC, the operator needs to fix her or
his mistake (i.e., rename backup_label.pending) and restart
the recovery.(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this idea.The if-conditions for 1 and 2 appear to be the same, which is confusing.
Using other name like backup_in_progress instead of backup_label.pending
may help a bit for your concern? That is,
(1) backup_label and recovery.signal
(2) backup_in_progress and recovery.signal
Regards,
--
Fujii Masao
On Tue, Feb 26, 2019 at 12:31 PM Christophe Pettus <xof@thebuild.com> wrote:
I believe #1 is when backup_label (no .pending) exists, #2 is when backup_label.pending (with .pending) exists.
Oh, whoops. I get it now.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Feb 26, 2019 at 3:49 PM David Steele <david@pgmasters.net> wrote:
On 2/26/19 6:51 AM, Michael Paquier wrote:
On Mon, Feb 25, 2019 at 08:17:27PM +0200, David Steele wrote:
Here's the really obvious bad thing: if users do not update their procedures
and we ignore backup_label.pending on startup then they will end up with a
corrupt database because it will not replay from the correct checkpoint. If
we error on the presence of backup_label.pending then we are right back to
where we started.Not really. If we error on backup_label.pending, we can make the
difference between a backend which has crashed in the middle of an
exclusive backup without replaying anything and a backend which is
started based on a base backup, so an operator can take some action to
see what's wrong with the server. If you issue an error, users can
also see that their custom backup script is wrong because they forgot
to rename the flag after taking a backup of the data folder(s).The operator still has a decision to make, manually, just as they do
now. The wrong decision may mean a corrupt database.
Even in non-exclusive backup mode, the wrong decision may mean
a corrupt database. For example, what if the user may forget to save
the backup_label in the backup taken by using non-exclusive backup
method? So I'm not sure if this is the matter only for an exclusive backup.
Regards,
--
Fujii Masao
On Tue, Feb 26, 2019 at 6:31 PM Christophe Pettus <xof@thebuild.com> wrote:
On Feb 26, 2019, at 09:26, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Feb 26, 2019 at 12:20 PM Fujii Masao <masao.fujii@gmail.com>
wrote:
So, let me clarify the situations;
(1) If backup_label and recovery.signal exist, the recovery starts
safely.
This is the normal case of recovery from the base backup.
(2)If backup_label.pending and recovery.signal exist, as described
above,
PANIC error happens at the start of recovery. This case can happen
if the operator forgets to rename backup_label.pending, i.e.,
operation mistake. So, after PANIC, the operator needs to fix heror
his mistake (i.e., rename backup_label.pending) and restart
the recovery.(3) If backup_label.pending exists but recovery.signal doesn't, the
server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this idea.The if-conditions for 1 and 2 appear to be the same, which is confusing.
I believe #1 is when backup_label (no .pending) exists, #2 is when
backup_label.pending (with .pending) exists.At the absolute minimum, this discussion has convinced me that we need to
create a wiki page to accurately describe the failure scenarios for both
exclusive and non-exclusive backups, and the recovery actions for them. If
it exists already, my search attempts weren't successful. If it doesn't,
I'm happy to start one.
That should not be a wiki page, really, that should be part of the main
documentation.
It can be drafted on the wiki of course, but we *really* should get
something like that into the docs. Because that's what people are going to
read. They'd only go look for the information on the wiki if they knew
there was a problem and needed the details, and the big issue is that
people don't know there are issues in the first place.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Hi,
On 2019-02-26 20:38:17 +0100, Magnus Hagander wrote:
That should not be a wiki page, really, that should be part of the main
documentation.
+1
It can be drafted on the wiki of course, but we *really* should get
something like that into the docs. Because that's what people are going to
read. They'd only go look for the information on the wiki if they knew
there was a problem and needed the details, and the big issue is that
people don't know there are issues in the first place.
Also, there's so much wrong stuff on the wiki that people that know to
look there just don't believe what they read.
Greetings,
Andres Freund
On Mon, Feb 25, 2019 at 10:49 PM David Steele <david@pgmasters.net> wrote:
Worse, they have scripted the deletion of backup_label so that the
cluster will restart on crash. This is the recommendation from our
documentation after all. If that script runs after a restore instead of
a crash, then the cluster will be corrupt -- silently.
I've always thought that the biggest problem is the documentation.
While the "Continuous Archiving and Point-in-Time Recovery (PITR)"
section of the user docs isn't quite as bad as it used to be, it's
still fundamentally structured "bottom up". That's bad.
I think that it would improve matters enormously if that section of
the documentation, and possibly the entire chapter was restructured to
be "top down". This should include prominent links to specific third
party projects, including Barman, pgBackRest, and others. This whole
area has become much more contentious than it needs to be, and I fear
that we cannot see the forest for the trees.
That said, I don't think that we should be in a hurry to fully remove
the exclusive backup mode. Home-spun backup solutions are a bad idea,
but removing exclusive mode strikes me as corrective over-reaction,
and a technical solution to an organizational problem. There are
definitely people that use the interface in a sensible way, despite
its shortcomings.
--
Peter Geoghegan
On 2/26/19 2:46 PM, Andres Freund wrote:
Also, there's so much wrong stuff on the wiki that people that know to
look there just don't believe what they read.
Should there be a wiki errata page on the wiki?
I'm fairly serious ... for those times when you're reading the Foo
page on the wiki, and realize it still describes the way Foo worked
before 0abcde, and you don't feel you have time to write a polished
update ... go to the Errata page and add a link to Foo, the date, your
name, and "noticed this still describes how Foo worked before 0abcde".
Or is there a MediaWiki template already for that? You could add the
template right at the top of the page, and it would automagically be
added to the category "pages with errata".
That category could be a starting point for anybody who has a bit of
time to improve some wiki pages, and is looking for likely candidates.
Regards,
-Chap
Fujii Masao wrote:
So, let me clarify the situations;
(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this idea
That's where I see the problem with your idea.
It is fairly easy for someone to restore a backup and then just start
the server without first creating "recovery.signal", and that would
lead to data corruption.
Indeed, if I didn't know much about databases and were faced with
the task of restoring a PostgreSQL database from backup in a hurry,
that is probably what I would do.
Having thought some about the whole problem area, I believe that there
is no safe way to disambiguate a backup from a server crashed in backup
mode.
The non-exclusive backup mode actually suffers from a similar problem ---
if you don't add "backup_label" to the backup afterwards, it will lead
to database corruption.
The only fool-proof tool we have in core is pg_basebackup.
Any really good backup solution that wishes to integrate with a third-
party backup tool will have to get that tool to include "backup_label"
with the backup itself.
For that, it would need to know the contents of "backup_label" *before*
pg_stop_backup(). Is there a good reason why it is pg_stop_backup()
and not pg_start_backup() that provides that information?
Yours,
Laurenz Albe
On Feb 26, 2019, at 11:38, Magnus Hagander <magnus@hagander.net> wrote:
That should not be a wiki page, really, that should be part of the main documentation.
I was just suggesting using a wiki page to draft it before we drop it into the main documentation. I'm open to other suggestions, of course!
--
-- Christophe Pettus
xof@thebuild.com
On Wed, Feb 27, 2019 at 4:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Fujii Masao wrote:
So, let me clarify the situations;
(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this ideaThat's where I see the problem with your idea.
It is fairly easy for someone to restore a backup and then just start
the server without first creating "recovery.signal", and that would
lead to data corruption.
Isn't this case problematic even when a backup was taken by pg_basebackup?
Because of lack of recovery.signal, no archived WAL files are replayed and
the database may not reach to the latest point.
Regards,
--
Fujii Masao
Greetings,
* Fujii Masao (masao.fujii@gmail.com) wrote:
On Wed, Feb 27, 2019 at 4:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Fujii Masao wrote:
So, let me clarify the situations;
(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this ideaThat's where I see the problem with your idea.
It is fairly easy for someone to restore a backup and then just start
the server without first creating "recovery.signal", and that would
lead to data corruption.Isn't this case problematic even when a backup was taken by pg_basebackup?
Because of lack of recovery.signal, no archived WAL files are replayed and
the database may not reach to the latest point.
There's a number of problems, imv, that I think have just been
ignored/missed when it comes to the recovery.conf removal and tools like
pg_basebackup, see this msg also:
/messages/by-id/20181127153405.GX3415@tamriel.snowman.net
At this point, I think we need to put these issues on the v12 open items
page so that we don't forget about them and get these things fixed
before v12 comes out.
Thanks!
Stephen
On 2/28/19 4:44 PM, Fujii Masao wrote:
On Wed, Feb 27, 2019 at 4:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Fujii Masao wrote:
So, let me clarify the situations;
(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this ideaThat's where I see the problem with your idea.
It is fairly easy for someone to restore a backup and then just start
the server without first creating "recovery.signal", and that would
lead to data corruption.Isn't this case problematic even when a backup was taken by pg_basebackup?
Because of lack of recovery.signal, no archived WAL files are replayed and
the database may not reach to the latest point.
It is problematic because we have a message encouraging users to delete
backup_label when in fact they should be creating recovery.signal.
If the required WAL is present the cluster will not be corrupted. It
just may not play forward as far as you would prefer.
--
-David
david@pgmasters.net
On 2/27/19 8:22 PM, Christophe Pettus wrote:
On Feb 26, 2019, at 11:38, Magnus Hagander <magnus@hagander.net> wrote:
That should not be a wiki page, really, that should be part of the main documentation.I was just suggesting using a wiki page to draft it before we drop it into the main documentation. I'm open to other suggestions, of course!
It seems to me that the best way to discuss this is via a patch to the
main documentation. I've written something to get us started:
https://commitfest.postgresql.org/22/2042/
--
-David
david@pgmasters.net
El 28/2/19 a las 15:13, David Steele escribió:
It seems to me that the best way to discuss this is via a patch to the
main documentation. I've written something to get us started:
+ <para>
+ The exclusive backup method is deprecated and should be avoided in
favor
+ of the non-exclusive backup method or
+ <application>pg_basebackup</application>.
+ </para>
Isn't pg_basebackup a non-exclusive backup method?
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Feb 28, 2019 at 11:07:47PM -0300, Martín Marqués wrote:
El 28/2/19 a las 15:13, David Steele escribió: + <para> + The exclusive backup method is deprecated and should be avoided in favor + of the non-exclusive backup method or + <application>pg_basebackup</application>. + </para>Isn't pg_basebackup a non-exclusive backup method?
It seems to me that it is exactly what this sentence means. Perhaps
it should be splitted into two sentences for better clarity?
--
Michael
On Thu, Feb 28, 2019 at 7:51 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Feb 28, 2019 at 11:07:47PM -0300, Martín Marqués wrote:
El 28/2/19 a las 15:13, David Steele escribió: + <para> + The exclusive backup method is deprecated and should be avoided in favor + of the non-exclusive backup method or + <application>pg_basebackup</application>. + </para>Isn't pg_basebackup a non-exclusive backup method?
It seems to me that it is exactly what this sentence means. Perhaps
it should be splitted into two sentences for better clarity?
Comments on the documentation patch should be placed on the thread
linked to the documentation patch as opposed to here.
David J.
El jue., 28 de feb. de 2019 a la(s) 23:50, Michael Paquier
(michael@paquier.xyz) escribió:
On Thu, Feb 28, 2019 at 11:07:47PM -0300, Martín Marqués wrote:
El 28/2/19 a las 15:13, David Steele escribió: + <para> + The exclusive backup method is deprecated and should be avoided in favor + of the non-exclusive backup method or + <application>pg_basebackup</application>. + </para>Isn't pg_basebackup a non-exclusive backup method?
It seems to me that it is exactly what this sentence means. Perhaps
it should be splitted into two sentences for better clarity?
When I read that phrase I see 2 options: non-exclusive backups, and
pg_basebackup. It's as if pg_basebackup wasn't included in the former
group. It might bring some confusion.
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 3/1/19 11:36 AM, Martín Marqués wrote:
El jue., 28 de feb. de 2019 a la(s) 23:50, Michael Paquier
(michael@paquier.xyz) escribió:On Thu, Feb 28, 2019 at 11:07:47PM -0300, Martín Marqués wrote:
El 28/2/19 a las 15:13, David Steele escribió: + <para> + The exclusive backup method is deprecated and should be avoided in favor + of the non-exclusive backup method or + <application>pg_basebackup</application>. + </para>Isn't pg_basebackup a non-exclusive backup method?
It seems to me that it is exactly what this sentence means. Perhaps
it should be splitted into two sentences for better clarity?When I read that phrase I see 2 options: non-exclusive backups, and
pg_basebackup. It's as if pg_basebackup wasn't included in the former
group. It might bring some confusion.
This has been updated on the other thread and I CC'd you.
--
-David
david@pgmasters.net
On Thu, Feb 28, 2019 at 05:08:24PM +0200, David Steele wrote:
On 2/28/19 4:44 PM, Fujii Masao wrote:
On Wed, Feb 27, 2019 at 4:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Fujii Masao wrote:
So, let me clarify the situations;
(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this ideaThat's where I see the problem with your idea.
It is fairly easy for someone to restore a backup and then just start
the server without first creating "recovery.signal", and that would
lead to data corruption.Isn't this case problematic even when a backup was taken by pg_basebackup?
Because of lack of recovery.signal, no archived WAL files are replayed and
the database may not reach to the latest point.It is problematic because we have a message encouraging users to delete
backup_label when in fact they should be creating recovery.signal.
Should we issue a client NOTICE/WARNING message as part of
pg_start_backup to indicate what to do if the server crashes before
pg_stop_backup()?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On 3/5/19 12:36 AM, Bruce Momjian wrote:
On Thu, Feb 28, 2019 at 05:08:24PM +0200, David Steele wrote:
On 2/28/19 4:44 PM, Fujii Masao wrote:
On Wed, Feb 27, 2019 at 4:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Fujii Masao wrote:
So, let me clarify the situations;
(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and do the recovery
starting the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this ideaThat's where I see the problem with your idea.
It is fairly easy for someone to restore a backup and then just start
the server without first creating "recovery.signal", and that would
lead to data corruption.Isn't this case problematic even when a backup was taken by pg_basebackup?
Because of lack of recovery.signal, no archived WAL files are replayed and
the database may not reach to the latest point.It is problematic because we have a message encouraging users to delete
backup_label when in fact they should be creating recovery.signal.Should we issue a client NOTICE/WARNING message as part of
pg_start_backup to indicate what to do if the server crashes before
pg_stop_backup()?
I'm not in favor of this since it will just encourage users to change
their log level and possibly miss important notices/warnings.
Regards,
--
-David
david@pgmasters.net
On 12/11/18 5:24 PM, David Steele wrote:
Attached is the patch.
I was a bit surprised by how much code went away. There was a lot of
code dedicated to making sure that backup_label got renamed on shutdown,
that there was not an exclusive backup running, etc.There were no tests for exclusive backup so the test changes were minimal.
I did have to replace one "hot backup" in
010_logical_decoding_timelines.pl. I'm not sure why the backup was
being done this way, or why the standby needs a copy of pg_replslot
(which is not copied by pg_basebackup). It might be worth looking at
but it seemed out of scope for this patch.
Rebased patch is attached.
I also fixed the issue that Robert pointed out with regard to changing
the function signatures. It's a bit weird that there are optional
parameters when the last parameter is really not optional (i.e. the
default will lead to an error). Any suggestions are appreciated.
Reading back through the thread, the major objection to the original
patch was that it was submitted too soon. Hopefully the elapsed time
addresses that concern.
Another common objection is that the new API is hard to use from bash
scripts. This is arguably still true, but Laurenz has demonstrated that
bash is capable of this feat:
https://github.com/cybertec-postgresql/safe-backup
Lastly, there is some concern about getting the backup label too late
when doing snapshots or using traditional backup software. It would
certainly be possible to return the backup_label and tablespace_map from
pg_start_backup() and let the user make the choice about what to do with
them. Any of these methods will require some scripting so I don't see
why the files couldn't be written as, for example, backup_label.snap and
tablespace_map.snap and then renamed by the restore script. The patch
does not currently make this change, but it could be added pretty easily
if that overcomes this objection.
Regards,
--
-David
david@pgmasters.net
Attachments:
remove-exclusive-backup-v2.patchtext/plain; charset=UTF-8; name=remove-exclusive-backup-v2.patch; x-mac-creator=0; x-mac-type=0Download
From 035153589f47047d38cbb2ffef89d7f1f67e85cc Mon Sep 17 00:00:00 2001
From: David Steele <david@pgmasters.net>
Date: Tue, 30 Jun 2020 17:23:51 -0400
Subject: [PATCH] Remove Deprecated Exclusive Backup Mode.
Remove the exclusive backup mode which has been deprecated since
PostgreSQL 9.6.
---
doc/src/sgml/backup.sgml | 150 +-----
doc/src/sgml/func.sgml | 82 +--
src/backend/access/transam/xlog.c | 476 ++----------------
src/backend/access/transam/xlogfuncs.c | 245 ++-------
src/backend/catalog/system_views.sql | 3 +-
src/backend/postmaster/postmaster.c | 46 +-
src/backend/replication/basebackup.c | 6 +-
src/backend/utils/init/postinit.c | 2 +-
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 4 +
src/bin/pg_rewind/filemap.c | 6 +-
src/include/access/xlog.h | 3 +-
src/include/catalog/pg_proc.dat | 12 +-
src/include/libpq/libpq-be.h | 3 +-
src/include/miscadmin.h | 4 -
src/test/perl/PostgresNode.pm | 56 +--
.../t/010_logical_decoding_timelines.pl | 4 +-
16 files changed, 128 insertions(+), 974 deletions(-)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index bdc9026c62..c544d79a1b 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -819,16 +819,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
sequence, and that the success of a step is verified before
proceeding to the next step.
</para>
- <para>
- Low level base backups can be made in a non-exclusive or an exclusive
- way. The non-exclusive method is recommended and the exclusive one is
- deprecated and will eventually be removed.
- </para>
- <sect3 id="backup-lowlevel-base-backup-nonexclusive">
- <title>Making a Non-Exclusive Low-Level Backup</title>
<para>
- A non-exclusive low level backup is one that allows other
+ A low level backup allows other
concurrent backups to be running (both those started using
the same backup API and those started using
<xref linkend="app-pgbasebackup"/>).
@@ -868,8 +861,9 @@ SELECT pg_start_backup('label', false, false);
</para>
<para>
- The third parameter being <literal>false</literal> tells
- <function>pg_start_backup</function> to initiate a non-exclusive base backup.
+ The third parameter must always be set to <literal>false</literal>. The
+ default of <literal>true</literal> requests an exclusive backup, which is
+ no longer supported.
</para>
</listitem>
<listitem>
@@ -946,142 +940,6 @@ SELECT * FROM pg_stop_backup(false, true);
</listitem>
</orderedlist>
</para>
- </sect3>
- <sect3 id="backup-lowlevel-base-backup-exclusive">
- <title>Making an Exclusive Low-Level Backup</title>
-
- <note>
- <para>
- The exclusive backup method is deprecated and should be avoided.
- Prior to <productname>PostgreSQL</productname> 9.6, this was the only
- low-level method available, but it is now recommended that all users
- upgrade their scripts to use non-exclusive backups.
- </para>
- </note>
-
- <para>
- The process for an exclusive backup is mostly the same as for a
- non-exclusive one, but it differs in a few key steps. This type of
- backup can only be taken on a primary and does not allow concurrent
- backups. Moreover, because it creates a backup label file, as
- described below, it can block automatic restart of the master server
- after a crash. On the other hand, the erroneous removal of this
- file from a backup or standby is a common mistake, which can result
- in serious data corruption. If it is necessary to use this method,
- the following steps may be used.
- </para>
- <para>
- <orderedlist>
- <listitem>
- <para>
- Ensure that WAL archiving is enabled and working.
- </para>
- </listitem>
- <listitem>
- <para>
- Connect to the server (it does not matter which database) as a user with
- rights to run pg_start_backup (superuser, or a user who has been granted
- EXECUTE on the function) and issue the command:
-<programlisting>
-SELECT pg_start_backup('label');
-</programlisting>
- where <literal>label</literal> is any string you want to use to uniquely
- identify this backup operation.
- <function>pg_start_backup</function> creates a <firstterm>backup label</firstterm> file,
- called <filename>backup_label</filename>, in the cluster directory with
- information about your backup, including the start time and label string.
- The function also creates a <firstterm>tablespace map</firstterm> file,
- called <filename>tablespace_map</filename>, in the cluster directory with
- information about tablespace symbolic links in <filename>pg_tblspc/</filename> if
- one or more such link is present. Both files are critical to the
- integrity of the backup, should you need to restore from it.
- </para>
-
- <para>
- By default, <function>pg_start_backup</function> can take a long time to finish.
- This is because it performs a checkpoint, and the I/O
- required for the checkpoint will be spread out over a significant
- period of time, by default half your inter-checkpoint interval
- (see the configuration parameter
- <xref linkend="guc-checkpoint-completion-target"/>). This is
- usually what you want, because it minimizes the impact on query
- processing. If you want to start the backup as soon as
- possible, use:
-<programlisting>
-SELECT pg_start_backup('label', true);
-</programlisting>
- This forces the checkpoint to be done as quickly as possible.
- </para>
- </listitem>
- <listitem>
- <para>
- Perform the backup, using any convenient file-system-backup tool
- such as <application>tar</application> or <application>cpio</application> (not
- <application>pg_dump</application> or
- <application>pg_dumpall</application>). It is neither
- necessary nor desirable to stop normal operation of the database
- while you do this. See
- <xref linkend="backup-lowlevel-base-backup-data"/> for things to
- consider during this backup.
- </para>
- <para>
- As noted above, if the server crashes during the backup it may not be
- possible to restart until the <filename>backup_label</filename> file has
- been manually deleted from the <envar>PGDATA</envar> directory. Note
- that it is very important to never remove the
- <filename>backup_label</filename> file when restoring a backup, because
- this will result in corruption. Confusion about when it is appropriate
- to remove this file is a common cause of data corruption when using this
- method; be very certain that you remove the file only on an existing
- master and never when building a standby or restoring a backup, even if
- you are building a standby that will subsequently be promoted to a new
- master.
- </para>
- </listitem>
- <listitem>
- <para>
- Again connect to the database as a user with rights to run
- pg_stop_backup (superuser, or a user who has been granted EXECUTE on
- the function), and issue the command:
-<programlisting>
-SELECT pg_stop_backup();
-</programlisting>
- This function terminates backup mode and
- performs an automatic switch to the next WAL segment. The reason for the
- switch is to arrange for the last WAL segment written during the backup
- interval to be ready to archive.
- </para>
- </listitem>
- <listitem>
- <para>
- Once the WAL segment files active during the backup are archived, you are
- done. The file identified by <function>pg_stop_backup</function>'s result is
- the last segment that is required to form a complete set of backup files.
- If <varname>archive_mode</varname> is enabled,
- <function>pg_stop_backup</function> does not return until the last segment has
- been archived.
- Archiving of these files happens automatically since you have
- already configured <varname>archive_command</varname>. In most cases this
- happens quickly, but you are advised to monitor your archive
- system to ensure there are no delays.
- If the archive process has fallen behind
- because of failures of the archive command, it will keep retrying
- until the archive succeeds and the backup is complete.
- </para>
-
- <para>
- When using exclusive backup mode, it is absolutely imperative to ensure
- that <function>pg_stop_backup</function> completes successfully at the
- end of the backup. Even if the backup itself fails, for example due to
- lack of disk space, failure to call <function>pg_stop_backup</function>
- will leave the server in backup mode indefinitely, causing future backups
- to fail and increasing the risk of a restart failure during the time that
- <filename>backup_label</filename> exists.
- </para>
- </listitem>
- </orderedlist>
- </para>
- </sect3>
<sect3 id="backup-lowlevel-base-backup-data">
<title>Backing Up the Data Directory</title>
<para>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f065856535..9e262fab3b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -23990,9 +23990,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
The functions shown in <xref
linkend="functions-admin-backup-table"/> assist in making on-line backups.
These functions cannot be executed during recovery (except
- non-exclusive <function>pg_start_backup</function>,
- non-exclusive <function>pg_stop_backup</function>,
- <function>pg_is_in_backup</function>, <function>pg_backup_start_time</function>
+ <function>pg_start_backup</function>, <function>pg_stop_backup</function>,
and <function>pg_wal_lsn_diff</function>).
</para>
@@ -24099,20 +24097,9 @@ SELECT collation for ('foo' COLLATE "de_DE");
it specifies executing <function>pg_start_backup</function> as quickly
as possible. This forces an immediate checkpoint which will cause a
spike in I/O operations, slowing any concurrently executing queries.
- The optional third parameter specifies whether to perform an exclusive
- or non-exclusive backup (default is exclusive).
- </para>
- <para>
- When used in exclusive mode, this function writes a backup label file
- (<filename>backup_label</filename>) and, if there are any links in
- the <filename>pg_tblspc/</filename> directory, a tablespace map file
- (<filename>tablespace_map</filename>) into the database cluster's data
- directory, then performs a checkpoint, and then returns the backup's
- starting write-ahead log location. (The user can ignore this
- result value, but it is provided in case it is useful.) When used in
- non-exclusive mode, the contents of these files are instead returned
- by the <function>pg_stop_backup</function> function, and should be
- copied to the backup area by the user.
+ The third parameter must be set explicitly <literal>false</literal>.
+ The default of <literal>true</literal> for an exclusive backup is no
+ longer supported.
</para>
<para>
This function is restricted to superusers by default, but other users
@@ -24135,15 +24122,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
<parameter>spcmapfile</parameter> <type>text</type> )
</para>
<para>
- Finishes performing an exclusive or non-exclusive on-line backup.
- The <parameter>exclusive</parameter> parameter must match the
+ Finishes performing an on-line backup. The
+ <parameter>exclusive</parameter> parameter must be
+ <literal>false</literal> false to match the
previous <function>pg_start_backup</function> call.
- In an exclusive backup, <function>pg_stop_backup</function> removes
- the backup label file and, if it exists, the tablespace map file
- created by <function>pg_start_backup</function>. In a non-exclusive
- backup, the desired contents of these files are returned as part of
- the result of the function, and should be written to files in the
- backup area (not in the data directory).
</para>
<para>
There is an optional second parameter of type <type>boolean</type>.
@@ -24174,9 +24156,9 @@ SELECT collation for ('foo' COLLATE "de_DE");
The result of the function is a single record.
The <parameter>lsn</parameter> column holds the backup's ending
write-ahead log location (which again can be ignored). The second and
- third columns are <literal>NULL</literal> when ending an exclusive
- backup; after a non-exclusive backup they hold the desired contents of
- the label and tablespace map files.
+ third columns hold the contents of the label and tablespace map files,
+ which should be written to files in the backup area (not in the data
+ directory).
</para>
<para>
This function is restricted to superusers by default, but other users
@@ -24184,50 +24166,6 @@ SELECT collation for ('foo' COLLATE "de_DE");
</para></entry>
</row>
- <row>
- <entry role="func_table_entry"><para role="func_signature">
- <function>pg_stop_backup</function> ()
- <returnvalue>pg_lsn</returnvalue>
- </para>
- <para>
- Finishes performing an exclusive on-line backup. This simplified
- version is equivalent to <literal>pg_stop_backup(true,
- true)</literal>, except that it only returns the <type>pg_lsn</type>
- result.
- </para>
- <para>
- This function is restricted to superusers by default, but other users
- can be granted EXECUTE to run the function.
- </para></entry>
- </row>
-
- <row>
- <entry role="func_table_entry"><para role="func_signature">
- <indexterm>
- <primary>pg_is_in_backup</primary>
- </indexterm>
- <function>pg_is_in_backup</function> ()
- <returnvalue>boolean</returnvalue>
- </para>
- <para>
- Returns true if an on-line exclusive backup is in progress.
- </para></entry>
- </row>
-
- <row>
- <entry role="func_table_entry"><para role="func_signature">
- <indexterm>
- <primary>pg_backup_start_time</primary>
- </indexterm>
- <function>pg_backup_start_time</function> ()
- <returnvalue>timestamp with time zone</returnvalue>
- </para>
- <para>
- Returns the start time of the current on-line exclusive backup if one
- is in progress, otherwise <literal>NULL</literal>.
- </para></entry>
- </row>
-
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fd93bcfaeb..3f3902fddf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -506,29 +506,6 @@ typedef union WALInsertLockPadded
char pad[PG_CACHE_LINE_SIZE];
} WALInsertLockPadded;
-/*
- * State of an exclusive backup, necessary to control concurrent activities
- * across sessions when working on exclusive backups.
- *
- * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
- * running, to be more precise pg_start_backup() is not being executed for
- * an exclusive backup and there is no exclusive backup in progress.
- * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an
- * exclusive backup.
- * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished
- * running and an exclusive backup is in progress. pg_stop_backup() is
- * needed to finish it.
- * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an
- * exclusive backup.
- */
-typedef enum ExclusiveBackupState
-{
- EXCLUSIVE_BACKUP_NONE = 0,
- EXCLUSIVE_BACKUP_STARTING,
- EXCLUSIVE_BACKUP_IN_PROGRESS,
- EXCLUSIVE_BACKUP_STOPPING
-} ExclusiveBackupState;
-
/*
* Session status of running backup, used for sanity checks in SQL-callable
* functions to start and stop backups.
@@ -576,15 +553,12 @@ typedef struct XLogCtlInsert
bool fullPageWrites;
/*
- * exclusiveBackupState indicates the state of an exclusive backup (see
- * comments of ExclusiveBackupState for more details). nonExclusiveBackups
- * is a counter indicating the number of streaming base backups currently
- * in progress. forcePageWrites is set to true when either of these is
- * non-zero. lastBackupStart is the latest checkpoint redo location used
- * as a starting point for an online backup.
+ * runningBackups is a counter indicating the number of backups currently in
+ * progress. forcePageWrites is set to true when either of these is
+ * non-zero. lastBackupStart is the latest checkpoint redo location used as
+ * a starting point for an online backup.
*/
- ExclusiveBackupState exclusiveBackupState;
- int nonExclusiveBackups;
+ int runningBackups;
XLogRecPtr lastBackupStart;
/*
@@ -944,7 +918,6 @@ static void xlog_outrec(StringInfo buf, XLogReaderState *record);
#endif
static void xlog_outdesc(StringInfo buf, XLogReaderState *record);
static void pg_start_backup_callback(int code, Datum arg);
-static void pg_stop_backup_callback(int code, Datum arg);
static bool read_backup_label(XLogRecPtr *checkPointLoc,
bool *backupEndRequired, bool *backupFromStandby);
static bool read_tablespace_map(List **tablespaces);
@@ -9371,7 +9344,7 @@ CreateRestartPoint(int flags)
* Ensure minRecoveryPoint is past the checkpoint record. Normally,
* this will have happened already while writing out dirty buffers,
* but not necessarily - e.g. because no buffers were dirtied. We do
- * this because a non-exclusive base backup uses minRecoveryPoint to
+ * this because a backup performed in recovery uses minRecoveryPoint to
* determine which WAL files must be included in the backup, and the
* file (or files) containing the checkpoint record must be included,
* at a minimum. Note that for an ordinary restart of recovery there's
@@ -10444,25 +10417,14 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
}
/*
- * do_pg_start_backup
- *
- * Utility function called at the start of an online backup. It creates the
- * necessary starting checkpoint and constructs the backup label file.
+ * do_pg_start_backup is the workhorse of the user-visible pg_start_backup()
+ * function. It creates the necessary starting checkpoint and constructs the
+ * backup label and tablespace map.
*
- * There are two kind of backups: exclusive and non-exclusive. An exclusive
- * backup is started with pg_start_backup(), and there can be only one active
- * at a time. The backup and tablespace map files of an exclusive backup are
- * written to $PGDATA/backup_label and $PGDATA/tablespace_map, and they are
- * removed by pg_stop_backup().
- *
- * A non-exclusive backup is used for the streaming base backups (see
- * src/backend/replication/basebackup.c). The difference to exclusive backups
- * is that the backup label and tablespace map files are not written to disk.
- * Instead, their would-be contents are returned in *labelfile and *tblspcmapfile,
- * and the caller is responsible for including them in the backup archive as
- * 'backup_label' and 'tablespace_map'. There can be many non-exclusive backups
- * active at the same time, and they don't conflict with an exclusive backup
- * either.
+ * The backup label and tablespace map contents are returned in *labelfile and
+ * *tblspcmapfile, and the caller is responsible for including them in the
+ * backup archive as 'backup_label' and 'tablespace_map'. There can be many
+ * backups active at the same time.
*
* tablespaces is required only when this function is called while
* the streaming base backup requested by pg_basebackup is running.
@@ -10472,15 +10434,14 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
* utilities are not able to create symlinks while extracting files from tar.
* However for consistency, the same is used for all platforms.
*
- * needtblspcmapfile is true for the cases (exclusive backup and for
- * non-exclusive backup only when tar format is used for taking backup)
- * when backup needs to generate tablespace_map file, it is used to
- * embed escape character before newline character in tablespace path.
+ * needtblspcmapfile is true for the cases (only when tar format is used for
+ * taking backup) when backup needs to generate tablespace_map file, it is used
+ * to embed escape character before newline character in tablespace path.
*
* Returns the minimum WAL location that must be present to restore from this
* backup, and the corresponding timeline ID in *starttli_p.
*
- * Every successfully started non-exclusive backup must be stopped by calling
+ * Every successfully started backup must be stopped by calling
* do_pg_stop_backup() or do_pg_abort_backup().
*
* It is the responsibility of the caller of this function to verify the
@@ -10491,7 +10452,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
StringInfo labelfile, List **tablespaces,
StringInfo tblspcmapfile, bool needtblspcmapfile)
{
- bool exclusive = (labelfile == NULL);
bool backup_started_in_recovery = false;
XLogRecPtr checkpointloc;
XLogRecPtr startpoint;
@@ -10500,20 +10460,9 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
char strfbuf[128];
char xlogfilename[MAXFNAMELEN];
XLogSegNo _logSegNo;
- struct stat stat_buf;
- FILE *fp;
backup_started_in_recovery = RecoveryInProgress();
- /*
- * Currently only non-exclusive backup can be taken during recovery.
- */
- if (backup_started_in_recovery && exclusive)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("recovery is in progress"),
- errhint("WAL control functions cannot be executed during recovery.")));
-
/*
* During recovery, we don't need to check WAL level. Because, if WAL
* level is not sufficient, it's impossible to get here during recovery.
@@ -10552,30 +10501,12 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
* XLogInsertRecord().
*/
WALInsertLockAcquireExclusive();
- if (exclusive)
- {
- /*
- * At first, mark that we're now starting an exclusive backup, to
- * ensure that there are no other sessions currently running
- * pg_start_backup() or pg_stop_backup().
- */
- if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_NONE)
- {
- WALInsertLockRelease();
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress"),
- errhint("Run pg_stop_backup() and try again.")));
- }
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STARTING;
- }
- else
- XLogCtl->Insert.nonExclusiveBackups++;
+ XLogCtl->Insert.runningBackups++;
XLogCtl->Insert.forcePageWrites = true;
WALInsertLockRelease();
/* Ensure we release forcePageWrites if fail below */
- PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
+ PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
{
bool gotUniqueStartpoint = false;
DIR *tblspcdir;
@@ -10702,11 +10633,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
XLogFileName(xlogfilename, starttli, _logSegNo, wal_segment_size);
/*
- * Construct tablespace_map file
+ * Construct tablespace_map
*/
- if (exclusive)
- tblspcmapfile = makeStringInfo();
-
datadirpathlen = strlen(DataDir);
/* Collect information about all tablespaces */
@@ -10796,11 +10724,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
FreeDir(tblspcdir);
/*
- * Construct backup label file
+ * Construct backup_label
*/
- if (exclusive)
- labelfile = makeStringInfo();
-
/* Use the log timezone here, not the session timezone */
stamp_time = (pg_time_t) time(NULL);
pg_strftime(strfbuf, sizeof(strfbuf),
@@ -10810,122 +10735,19 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
(uint32) (startpoint >> 32), (uint32) startpoint, xlogfilename);
appendStringInfo(labelfile, "CHECKPOINT LOCATION: %X/%X\n",
(uint32) (checkpointloc >> 32), (uint32) checkpointloc);
- appendStringInfo(labelfile, "BACKUP METHOD: %s\n",
- exclusive ? "pg_start_backup" : "streamed");
+ appendStringInfo(labelfile, "BACKUP METHOD: streamed\n");
appendStringInfo(labelfile, "BACKUP FROM: %s\n",
backup_started_in_recovery ? "standby" : "master");
appendStringInfo(labelfile, "START TIME: %s\n", strfbuf);
appendStringInfo(labelfile, "LABEL: %s\n", backupidstr);
appendStringInfo(labelfile, "START TIMELINE: %u\n", starttli);
-
- /*
- * Okay, write the file, or return its contents to caller.
- */
- if (exclusive)
- {
- /*
- * Check for existing backup label --- implies a backup is already
- * running. (XXX given that we checked exclusiveBackupState
- * above, maybe it would be OK to just unlink any such label
- * file?)
- */
- if (stat(BACKUP_LABEL_FILE, &stat_buf) != 0)
- {
- if (errno != ENOENT)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress"),
- errhint("If you're sure there is no backup in progress, remove file \"%s\" and try again.",
- BACKUP_LABEL_FILE)));
-
- fp = AllocateFile(BACKUP_LABEL_FILE, "w");
-
- if (!fp)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not create file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- if (fwrite(labelfile->data, labelfile->len, 1, fp) != 1 ||
- fflush(fp) != 0 ||
- pg_fsync(fileno(fp)) != 0 ||
- ferror(fp) ||
- FreeFile(fp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not write file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- /* Allocated locally for exclusive backups, so free separately */
- pfree(labelfile->data);
- pfree(labelfile);
-
- /* Write backup tablespace_map file. */
- if (tblspcmapfile->len > 0)
- {
- if (stat(TABLESPACE_MAP, &stat_buf) != 0)
- {
- if (errno != ENOENT)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- TABLESPACE_MAP)));
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress"),
- errhint("If you're sure there is no backup in progress, remove file \"%s\" and try again.",
- TABLESPACE_MAP)));
-
- fp = AllocateFile(TABLESPACE_MAP, "w");
-
- if (!fp)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not create file \"%s\": %m",
- TABLESPACE_MAP)));
- if (fwrite(tblspcmapfile->data, tblspcmapfile->len, 1, fp) != 1 ||
- fflush(fp) != 0 ||
- pg_fsync(fileno(fp)) != 0 ||
- ferror(fp) ||
- FreeFile(fp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not write file \"%s\": %m",
- TABLESPACE_MAP)));
- }
-
- /* Allocated locally for exclusive backups, so free separately */
- pfree(tblspcmapfile->data);
- pfree(tblspcmapfile);
- }
}
- PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
+ PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
/*
- * Mark that start phase has correctly finished for an exclusive backup.
- * Session-level locks are updated as well to reflect that state.
- *
- * Note that CHECK_FOR_INTERRUPTS() must not occur while updating backup
- * counters and session-level lock. Otherwise they can be updated
- * inconsistently, and which might cause do_pg_abort_backup() to fail.
+ * Mark that the start phase has correctly finished for the backup.
*/
- if (exclusive)
- {
- WALInsertLockAcquireExclusive();
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
-
- /* Set session-level lock */
- sessionBackupState = SESSION_BACKUP_EXCLUSIVE;
- WALInsertLockRelease();
- }
- else
- sessionBackupState = SESSION_BACKUP_NON_EXCLUSIVE;
+ sessionBackupState = SESSION_BACKUP_RUNNING;
/*
* We're done. As a convenience, return the starting WAL location.
@@ -10939,43 +10761,15 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
static void
pg_start_backup_callback(int code, Datum arg)
{
- bool exclusive = DatumGetBool(arg);
-
/* Update backup counters and forcePageWrites on failure */
WALInsertLockAcquireExclusive();
- if (exclusive)
- {
- Assert(XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING);
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;
- }
- else
- {
- Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
- XLogCtl->Insert.nonExclusiveBackups--;
- }
- if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
- XLogCtl->Insert.nonExclusiveBackups == 0)
- {
- XLogCtl->Insert.forcePageWrites = false;
- }
- WALInsertLockRelease();
-}
+ Assert(XLogCtl->Insert.runningBackups > 0);
+ XLogCtl->Insert.runningBackups--;
-/*
- * Error cleanup callback for pg_stop_backup
- */
-static void
-pg_stop_backup_callback(int code, Datum arg)
-{
- bool exclusive = DatumGetBool(arg);
-
- /* Update backup status on failure */
- WALInsertLockAcquireExclusive();
- if (exclusive)
+ if (XLogCtl->Insert.runningBackups == 0)
{
- Assert(XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING);
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
+ XLogCtl->Insert.forcePageWrites = false;
}
WALInsertLockRelease();
}
@@ -10995,9 +10789,6 @@ get_backup_status(void)
* Utility function called at the end of an online backup. It cleans up the
* backup state and can optionally wait for WAL segments to be archived.
*
- * If labelfile is NULL, this stops an exclusive backup. Otherwise this stops
- * the non-exclusive backup specified by 'labelfile'.
- *
* Returns the last WAL location that must be present to restore from this
* backup, and the corresponding timeline ID in *stoptli_p.
*
@@ -11007,7 +10798,6 @@ get_backup_status(void)
XLogRecPtr
do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
{
- bool exclusive = (labelfile == NULL);
bool backup_started_in_recovery = false;
XLogRecPtr startpoint;
XLogRecPtr stoppoint;
@@ -11021,7 +10811,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
char histfilename[MAXFNAMELEN];
char backupfrom[20];
XLogSegNo _logSegNo;
- FILE *lfp;
FILE *fp;
char ch;
int seconds_before_warning;
@@ -11034,15 +10823,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
backup_started_in_recovery = RecoveryInProgress();
- /*
- * Currently only non-exclusive backup can be taken during recovery.
- */
- if (backup_started_in_recovery && exclusive)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("recovery is in progress"),
- errhint("WAL control functions cannot be executed during recovery.")));
-
/*
* During recovery, we don't need to check WAL level. Because, if WAL
* level is not sufficient, it's impossible to get here during recovery.
@@ -11053,106 +10833,23 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
errmsg("WAL level not sufficient for making an online backup"),
errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));
- if (exclusive)
- {
- /*
- * At first, mark that we're now stopping an exclusive backup, to
- * ensure that there are no other sessions currently running
- * pg_start_backup() or pg_stop_backup().
- */
- WALInsertLockAcquireExclusive();
- if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_PROGRESS)
- {
- WALInsertLockRelease();
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("exclusive backup not in progress")));
- }
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STOPPING;
- WALInsertLockRelease();
-
- /*
- * Remove backup_label. In case of failure, the state for an exclusive
- * backup is switched back to in-progress.
- */
- PG_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) BoolGetDatum(exclusive));
- {
- /*
- * Read the existing label file into memory.
- */
- struct stat statbuf;
- int r;
-
- if (stat(BACKUP_LABEL_FILE, &statbuf))
- {
- /* should not happen per the upper checks */
- if (errno != ENOENT)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is not in progress")));
- }
-
- lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
- if (!lfp)
- {
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- }
- labelfile = palloc(statbuf.st_size + 1);
- r = fread(labelfile, statbuf.st_size, 1, lfp);
- labelfile[statbuf.st_size] = '\0';
-
- /*
- * Close and remove the backup label file
- */
- if (r != 1 || ferror(lfp) || FreeFile(lfp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- durable_unlink(BACKUP_LABEL_FILE, ERROR);
-
- /*
- * Remove tablespace_map file if present, it is created only if
- * there are tablespaces.
- */
- durable_unlink(TABLESPACE_MAP, DEBUG1);
- }
- PG_END_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) BoolGetDatum(exclusive));
- }
-
/*
- * OK to update backup counters, forcePageWrites and session-level lock.
+ * OK to update backup counters and forcePageWrites.
*
* Note that CHECK_FOR_INTERRUPTS() must not occur while updating them.
* Otherwise they can be updated inconsistently, and which might cause
* do_pg_abort_backup() to fail.
*/
WALInsertLockAcquireExclusive();
- if (exclusive)
- {
- XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;
- }
- else
- {
- /*
- * The user-visible pg_start/stop_backup() functions that operate on
- * exclusive backups can be called at any time, but for non-exclusive
- * backups, it is expected that each do_pg_start_backup() call is
- * matched by exactly one do_pg_stop_backup() call.
- */
- Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
- XLogCtl->Insert.nonExclusiveBackups--;
- }
- if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
- XLogCtl->Insert.nonExclusiveBackups == 0)
+ /*
+ * It is expected that each do_pg_start_backup() call is matched by exactly
+ * one do_pg_stop_backup() call.
+ */
+ Assert(XLogCtl->Insert.runningBackups > 0);
+ XLogCtl->Insert.runningBackups--;
+
+ if (XLogCtl->Insert.runningBackups == 0)
{
XLogCtl->Insert.forcePageWrites = false;
}
@@ -11410,10 +11107,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
* The caller can pass 'arg' as 'true' or 'false' to control whether a warning
* is emitted.
*
- * NB: This is only for aborting a non-exclusive backup that doesn't write
- * backup_label. A backup started with pg_start_backup() needs to be finished
- * with pg_stop_backup().
- *
* NB: This gets used as a before_shmem_exit handler, hence the odd-looking
* signature.
*/
@@ -11423,18 +11116,16 @@ do_pg_abort_backup(int code, Datum arg)
bool emit_warning = DatumGetBool(arg);
/*
- * Quick exit if session is not keeping around a non-exclusive backup
- * already started.
+ * Quick exit if session does not have a running backup.
*/
- if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
+ if (sessionBackupState != SESSION_BACKUP_RUNNING)
return;
WALInsertLockAcquireExclusive();
- Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
- XLogCtl->Insert.nonExclusiveBackups--;
+ Assert(XLogCtl->Insert.runningBackups > 0);
+ XLogCtl->Insert.runningBackups--;
- if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
- XLogCtl->Insert.nonExclusiveBackups == 0)
+ if (XLogCtl->Insert.runningBackups == 0)
{
XLogCtl->Insert.forcePageWrites = false;
}
@@ -11756,87 +11447,6 @@ rm_redo_error_callback(void *arg)
pfree(buf.data);
}
-/*
- * BackupInProgress: check if online backup mode is active
- *
- * This is done by checking for existence of the "backup_label" file.
- */
-bool
-BackupInProgress(void)
-{
- struct stat stat_buf;
-
- return (stat(BACKUP_LABEL_FILE, &stat_buf) == 0);
-}
-
-/*
- * CancelBackup: rename the "backup_label" and "tablespace_map"
- * files to cancel backup mode
- *
- * If the "backup_label" file exists, it will be renamed to "backup_label.old".
- * Similarly, if the "tablespace_map" file exists, it will be renamed to
- * "tablespace_map.old".
- *
- * Note that this will render an online backup in progress
- * useless. To correctly finish an online backup, pg_stop_backup must be
- * called.
- */
-void
-CancelBackup(void)
-{
- struct stat stat_buf;
-
- /* if the backup_label file is not there, return */
- if (stat(BACKUP_LABEL_FILE, &stat_buf) < 0)
- return;
-
- /* remove leftover file from previously canceled backup if it exists */
- unlink(BACKUP_LABEL_OLD);
-
- if (durable_rename(BACKUP_LABEL_FILE, BACKUP_LABEL_OLD, DEBUG1) != 0)
- {
- ereport(WARNING,
- (errcode_for_file_access(),
- errmsg("online backup mode was not canceled"),
- errdetail("File \"%s\" could not be renamed to \"%s\": %m.",
- BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
- return;
- }
-
- /* if the tablespace_map file is not there, return */
- if (stat(TABLESPACE_MAP, &stat_buf) < 0)
- {
- ereport(LOG,
- (errmsg("online backup mode canceled"),
- errdetail("File \"%s\" was renamed to \"%s\".",
- BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
- return;
- }
-
- /* remove leftover file from previously canceled backup if it exists */
- unlink(TABLESPACE_MAP_OLD);
-
- if (durable_rename(TABLESPACE_MAP, TABLESPACE_MAP_OLD, DEBUG1) == 0)
- {
- ereport(LOG,
- (errmsg("online backup mode canceled"),
- errdetail("Files \"%s\" and \"%s\" were renamed to "
- "\"%s\" and \"%s\", respectively.",
- BACKUP_LABEL_FILE, TABLESPACE_MAP,
- BACKUP_LABEL_OLD, TABLESPACE_MAP_OLD)));
- }
- else
- {
- ereport(WARNING,
- (errcode_for_file_access(),
- errmsg("online backup mode canceled"),
- errdetail("File \"%s\" was renamed to \"%s\", but "
- "file \"%s\" could not be renamed to \"%s\": %m.",
- BACKUP_LABEL_FILE, BACKUP_LABEL_OLD,
- TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
- }
-}
-
/*
* Read the XLOG page containing RecPtr into readBuf (if not read already).
* Returns number of bytes read, if the page is read successfully, or -1
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..ec3669b4ae 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -39,7 +39,7 @@
#include "utils/tuplestore.h"
/*
- * Store label file and tablespace map during non-exclusive backups.
+ * Store label file and tablespace map during backups.
*/
static StringInfo label_file;
static StringInfo tblspc_map_file;
@@ -61,101 +61,46 @@ pg_start_backup(PG_FUNCTION_ARGS)
{
text *backupid = PG_GETARG_TEXT_PP(0);
bool fast = PG_GETARG_BOOL(1);
- bool exclusive = PG_GETARG_BOOL(2);
char *backupidstr;
XLogRecPtr startpoint;
SessionBackupState status = get_backup_status();
+ MemoryContext oldcontext;
backupidstr = text_to_cstring(backupid);
- if (status == SESSION_BACKUP_NON_EXCLUSIVE)
+ /* Error if exclusive backup mode is requested */
+ if (PG_GETARG_BOOL(2))
ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress in this session")));
-
- if (exclusive)
- {
- startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL,
- NULL, NULL, true);
- }
- else
- {
- MemoryContext oldcontext;
-
- /*
- * Label file and tablespace map file need to be long-lived, since
- * they are read in pg_stop_backup.
- */
- oldcontext = MemoryContextSwitchTo(TopMemoryContext);
- label_file = makeStringInfo();
- tblspc_map_file = makeStringInfo();
- MemoryContextSwitchTo(oldcontext);
-
- register_persistent_abort_backup_handler();
-
- startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file,
- NULL, tblspc_map_file, true);
- }
-
- PG_RETURN_LSN(startpoint);
-}
-
-/*
- * pg_stop_backup: finish taking an on-line backup dump
- *
- * We write an end-of-backup WAL record, and remove the backup label file
- * created by pg_start_backup, creating a backup history file in pg_wal
- * instead (whence it will immediately be archived). The backup history file
- * contains the same info found in the label file, plus the backup-end time
- * and WAL location. Before 9.0, the backup-end time was read from the backup
- * history file at the beginning of archive recovery, but we now use the WAL
- * record for that and the file is for informational and debug purposes only.
- *
- * Note: different from CancelBackup which just cancels online backup mode.
- *
- * Note: this version is only called to stop an exclusive backup. The function
- * pg_stop_backup_v2 (overloaded as pg_stop_backup in SQL) is called to
- * stop non-exclusive backups.
- *
- * Permission checking for this function is managed through the normal
- * GRANT system.
- */
-Datum
-pg_stop_backup(PG_FUNCTION_ARGS)
-{
- XLogRecPtr stoppoint;
- SessionBackupState status = get_backup_status();
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("exclusive backup mode is no longer supported")));
- if (status == SESSION_BACKUP_NON_EXCLUSIVE)
+ if (status == SESSION_BACKUP_RUNNING)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("non-exclusive backup in progress"),
- errhint("Did you mean to use pg_stop_backup('f')?")));
+ errmsg("a backup is already in progress in this session")));
/*
- * Exclusive backups were typically started in a different connection, so
- * don't try to verify that status of backup is set to
- * SESSION_BACKUP_EXCLUSIVE in this function. Actual verification that an
- * exclusive backup is in fact running is handled inside
- * do_pg_stop_backup.
+ * Label file and tablespace map file need to be long-lived, since
+ * they are read in pg_stop_backup.
*/
- stoppoint = do_pg_stop_backup(NULL, true, NULL);
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ label_file = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+ MemoryContextSwitchTo(oldcontext);
- PG_RETURN_LSN(stoppoint);
+ register_persistent_abort_backup_handler();
+
+ startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file,
+ NULL, tblspc_map_file, true);
+
+ PG_RETURN_LSN(startpoint);
}
/*
- * pg_stop_backup_v2: finish taking exclusive or nonexclusive on-line backup.
+ * pg_stop_backup: finish taking an on-line backup.
*
- * Works the same as pg_stop_backup, except for non-exclusive backups it returns
- * the backup label and tablespace map files as text fields in as part of the
- * resultset.
- *
- * The first parameter (variable 'exclusive') allows the user to tell us if
- * this is an exclusive or a non-exclusive backup.
- *
- * The second parameter (variable 'waitforarchive'), which is optional,
+ * The first parameter (variable 'waitforarchive'), which is optional,
* allows the user to choose if they want to wait for the WAL to be archived
* or if we should just return as soon as the WAL record is written.
*
@@ -163,7 +108,7 @@ pg_stop_backup(PG_FUNCTION_ARGS)
* GRANT system.
*/
Datum
-pg_stop_backup_v2(PG_FUNCTION_ARGS)
+pg_stop_backup(PG_FUNCTION_ARGS)
{
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
TupleDesc tupdesc;
@@ -173,11 +118,16 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
Datum values[3];
bool nulls[3];
- bool exclusive = PG_GETARG_BOOL(0);
bool waitforarchive = PG_GETARG_BOOL(1);
XLogRecPtr stoppoint;
SessionBackupState status = get_backup_status();
+ /* Error if exclusive backup mode is requested */
+ if (PG_GETARG_BOOL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("exclusive backup mode is no longer supported")));
+
/* check to see if caller supports us returning a tuplestore */
if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
ereport(ERROR,
@@ -205,51 +155,29 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));
- if (exclusive)
- {
- if (status == SESSION_BACKUP_NON_EXCLUSIVE)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("non-exclusive backup in progress"),
- errhint("Did you mean to use pg_stop_backup('f')?")));
-
- /*
- * Stop the exclusive backup, and since we're in an exclusive backup
- * return NULL for both backup_label and tablespace_map.
- */
- stoppoint = do_pg_stop_backup(NULL, waitforarchive, NULL);
-
- nulls[1] = true;
- nulls[2] = true;
- }
- else
- {
- if (status != SESSION_BACKUP_NON_EXCLUSIVE)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("non-exclusive backup is not in progress"),
- errhint("Did you mean to use pg_stop_backup('t')?")));
+ if (status != SESSION_BACKUP_RUNNING)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("backup is not in progress"),
+ errhint("Did you call pg_start_backup()?")));
- /*
- * Stop the non-exclusive backup. Return a copy of the backup label
- * and tablespace map so they can be written to disk by the caller.
- */
- stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, NULL);
-
- values[1] = CStringGetTextDatum(label_file->data);
- values[2] = CStringGetTextDatum(tblspc_map_file->data);
-
- /* Free structures allocated in TopMemoryContext */
- pfree(label_file->data);
- pfree(label_file);
- label_file = NULL;
- pfree(tblspc_map_file->data);
- pfree(tblspc_map_file);
- tblspc_map_file = NULL;
- }
+ /*
+ * Stop the backup. Return a copy of the backup label and tablespace map so
+ * they can be written to disk by the caller.
+ */
+ stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, NULL);
- /* Stoppoint is included on both exclusive and nonexclusive backups */
values[0] = LSNGetDatum(stoppoint);
+ values[1] = CStringGetTextDatum(label_file->data);
+ values[2] = CStringGetTextDatum(tblspc_map_file->data);
+
+ /* Free structures allocated in TopMemoryContext */
+ pfree(label_file->data);
+ pfree(label_file);
+ label_file = NULL;
+ pfree(tblspc_map_file->data);
+ pfree(tblspc_map_file);
+ tblspc_map_file = NULL;
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
tuplestore_donestoring(tupstore);
@@ -627,81 +555,6 @@ pg_wal_lsn_diff(PG_FUNCTION_ARGS)
PG_RETURN_NUMERIC(result);
}
-/*
- * Returns bool with current on-line backup mode, a global state.
- */
-Datum
-pg_is_in_backup(PG_FUNCTION_ARGS)
-{
- PG_RETURN_BOOL(BackupInProgress());
-}
-
-/*
- * Returns start time of an online exclusive backup.
- *
- * When there's no exclusive backup in progress, the function
- * returns NULL.
- */
-Datum
-pg_backup_start_time(PG_FUNCTION_ARGS)
-{
- Datum xtime;
- FILE *lfp;
- char fline[MAXPGPATH];
- char backup_start_time[30];
-
- /*
- * See if label file is present
- */
- lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
- if (lfp == NULL)
- {
- if (errno != ENOENT)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- PG_RETURN_NULL();
- }
-
- /*
- * Parse the file to find the START TIME line.
- */
- backup_start_time[0] = '\0';
- while (fgets(fline, sizeof(fline), lfp) != NULL)
- {
- if (sscanf(fline, "START TIME: %25[^\n]\n", backup_start_time) == 1)
- break;
- }
-
- /* Check for a read error. */
- if (ferror(lfp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m", BACKUP_LABEL_FILE)));
-
- /* Close the backup label file. */
- if (FreeFile(lfp))
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not close file \"%s\": %m", BACKUP_LABEL_FILE)));
-
- if (strlen(backup_start_time) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
-
- /*
- * Convert the time string read from file to TimestampTz form.
- */
- xtime = DirectFunctionCall3(timestamptz_in,
- CStringGetDatum(backup_start_time),
- ObjectIdGetDatum(InvalidOid),
- Int32GetDatum(-1));
-
- PG_RETURN_DATUM(xtime);
-}
-
/*
* Promotes a standby server.
*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..cce0e504a0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1214,7 +1214,7 @@ CREATE OR REPLACE FUNCTION
CREATE OR REPLACE FUNCTION pg_stop_backup (
exclusive boolean, wait_for_archive boolean DEFAULT true,
OUT lsn pg_lsn, OUT labelfile text, OUT spcmapfile text)
- RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
+ RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup'
PARALLEL RESTRICTED;
CREATE OR REPLACE FUNCTION
@@ -1427,7 +1427,6 @@ AS 'unicode_is_normalized';
-- available to superuser / cluster owner, if they choose.
--
REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public;
-REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public;
REVOKE EXECUTE ON FUNCTION pg_stop_backup(boolean, boolean) FROM public;
REVOKE EXECUTE ON FUNCTION pg_create_restore_point(text) FROM public;
REVOKE EXECUTE ON FUNCTION pg_switch_wal() FROM public;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b4d475bb0b..05c44c7ec3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -300,8 +300,7 @@ static bool FatalError = false; /* T if recovering from backend crash */
* and we switch to PM_RUN state.
*
* Normal child backends can only be launched when we are in PM_RUN or
- * PM_HOT_STANDBY state. (We also allow launch of normal
- * child backends in PM_WAIT_BACKUP state, but only for superusers.)
+ * PM_HOT_STANDBY state.
* In other states we handle connection requests by launching "dead_end"
* child processes, which will simply send the client an error message and
* quit. (We track these in the BackendList so that we can know when they
@@ -327,7 +326,6 @@ typedef enum
PM_RECOVERY, /* in archive recovery mode */
PM_HOT_STANDBY, /* in hot standby mode */
PM_RUN, /* normal "database is alive" state */
- PM_WAIT_BACKUP, /* waiting for online backup mode to end */
PM_WAIT_READONLY, /* waiting for read only backends to exit */
PM_WAIT_BACKENDS, /* waiting for live backends to exit */
PM_SHUTDOWN, /* waiting for checkpointer to do shutdown
@@ -2320,9 +2318,6 @@ retry1:
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
errmsg("sorry, too many clients already")));
break;
- case CAC_WAITBACKUP:
- /* OK for now, will check in InitPostgres */
- break;
case CAC_OK:
break;
}
@@ -2440,19 +2435,11 @@ canAcceptConnections(int backend_type)
* state. We treat autovac workers the same as user backends for this
* purpose. However, bgworkers are excluded from this test; we expect
* bgworker_should_start_now() decided whether the DB state allows them.
- *
- * In state PM_WAIT_BACKUP only superusers can connect (this must be
- * allowed so that a superuser can end online backup mode); we return
- * CAC_WAITBACKUP code to indicate that this must be checked later. Note
- * that neither CAC_OK nor CAC_WAITBACKUP can safely be returned until we
- * have checked for too many children.
*/
if (pmState != PM_RUN &&
backend_type != BACKEND_TYPE_BGWORKER)
{
- if (pmState == PM_WAIT_BACKUP)
- result = CAC_WAITBACKUP; /* allow superusers only */
- else if (Shutdown > NoShutdown)
+ if (Shutdown > NoShutdown)
return CAC_SHUTDOWN; /* shutdown is pending */
else if (!FatalError &&
(pmState == PM_STARTUP ||
@@ -2817,7 +2804,7 @@ pmdie(SIGNAL_ARGS)
* and walreceiver processes.
*/
pmState = (pmState == PM_RUN) ?
- PM_WAIT_BACKUP : PM_WAIT_READONLY;
+ PM_WAIT_BACKENDS : PM_WAIT_READONLY;
}
/*
@@ -2867,7 +2854,6 @@ pmdie(SIGNAL_ARGS)
pmState = PM_WAIT_BACKENDS;
}
else if (pmState == PM_RUN ||
- pmState == PM_WAIT_BACKUP ||
pmState == PM_WAIT_READONLY ||
pmState == PM_WAIT_BACKENDS ||
pmState == PM_HOT_STANDBY)
@@ -3709,7 +3695,6 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
if (pmState == PM_RECOVERY ||
pmState == PM_HOT_STANDBY ||
pmState == PM_RUN ||
- pmState == PM_WAIT_BACKUP ||
pmState == PM_WAIT_READONLY ||
pmState == PM_SHUTDOWN)
pmState = PM_WAIT_BACKENDS;
@@ -3793,15 +3778,6 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
static void
PostmasterStateMachine(void)
{
- if (pmState == PM_WAIT_BACKUP)
- {
- /*
- * PM_WAIT_BACKUP state ends when online backup mode is not active.
- */
- if (!BackupInProgress())
- pmState = PM_WAIT_BACKENDS;
- }
-
if (pmState == PM_WAIT_READONLY)
{
/*
@@ -3967,18 +3943,6 @@ PostmasterStateMachine(void)
}
else
{
- /*
- * Terminate exclusive backup mode to avoid recovery after a clean
- * fast shutdown. Since an exclusive backup can only be taken
- * during normal running (and not, for example, while running
- * under Hot Standby) it only makes sense to do this if we reached
- * normal running. If we're still in recovery, the backup file is
- * one we're recovering *from*, and we must keep it around so that
- * recovery restarts from the right place.
- */
- if (ReachedNormalRunning)
- CancelBackup();
-
/* Normal exit from the postmaster is here */
ExitPostmaster(0);
}
@@ -4180,8 +4144,7 @@ BackendStartup(Port *port)
/* Pass down canAcceptConnections state */
port->canAcceptConnections = canAcceptConnections(BACKEND_TYPE_NORMAL);
- bn->dead_end = (port->canAcceptConnections != CAC_OK &&
- port->canAcceptConnections != CAC_WAITBACKUP);
+ bn->dead_end = port->canAcceptConnections != CAC_OK;
/*
* Unless it's a dead_end child, assign it a child slot number
@@ -5898,7 +5861,6 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
case PM_SHUTDOWN:
case PM_WAIT_BACKENDS:
case PM_WAIT_READONLY:
- case PM_WAIT_BACKUP:
break;
case PM_RUN:
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 096b0fcef0..1440180cc4 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -206,10 +206,8 @@ static const struct exclude_list_item excludeFiles[] =
{RELCACHE_INIT_FILENAME, true},
/*
- * If there's a backup_label or tablespace_map file, it belongs to a
- * backup started by the user with pg_start_backup(). It is *not* correct
- * for this backup. Our backup_label/tablespace_map is injected into the
- * tar separately.
+ * backup_label and tablespace_map should not exist in in a running cluster
+ * capable of doing an online backup, but exclude them just in case.
*/
{BACKUP_LABEL_FILE, false},
{TABLESPACE_MAP, false},
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index f4247ea70d..2e1d4e1c40 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -791,7 +791,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
*/
if ((!am_superuser || am_walsender) &&
MyProcPort != NULL &&
- MyProcPort->canAcceptConnections == CAC_WAITBACKUP)
+ MyProcPort->canAcceptConnections == CAC_SHUTDOWN)
{
if (am_walsender)
ereport(FATAL,
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 208df557b8..90af0d6272 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -159,6 +159,10 @@ isnt(slurp_file("$tempdir/backup/backup_label"),
'DONOTCOPY', 'existing backup_label not copied');
rmtree("$tempdir/backup");
+# Now delete the bogus backup_label file since it will interfere with startup
+unlink("$pgdata/backup_label")
+ or BAIL_OUT("unable to unlink $pgdata/backup_label");
+
$node->command_ok(
[
'pg_basebackup', '-D',
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 36a2d62341..1485eff3e1 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -106,9 +106,9 @@ static const struct exclude_list_item excludeFiles[] =
{"pg_internal.init", true}, /* defined as RELCACHE_INIT_FILENAME */
/*
- * If there's a backup_label or tablespace_map file, it belongs to a
- * backup started by the user with pg_start_backup(). It is *not* correct
- * for this backup. Our backup_label is written later on separately.
+ * If there is a backup_label or tablespace_map file, it indicates that
+ * a recovery failed and this cluster probably can't be rewound, but
+ * exclude them anyway if they are found.
*/
{"backup_label", false}, /* defined as BACKUP_LABEL_FILE */
{"tablespace_map", false}, /* defined as TABLESPACE_MAP */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 77ac4e785f..b99ef3ecb4 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -368,8 +368,7 @@ extern void assign_checkpoint_completion_target(double newval, void *extra);
typedef enum SessionBackupState
{
SESSION_BACKUP_NONE,
- SESSION_BACKUP_EXCLUSIVE,
- SESSION_BACKUP_NON_EXCLUSIVE
+ SESSION_BACKUP_RUNNING,
} SessionBackupState;
extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..4135e588fd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6051,23 +6051,13 @@
proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
prorettype => 'pg_lsn', proargtypes => 'text bool bool',
prosrc => 'pg_start_backup' },
-{ oid => '2173', descr => 'finish taking an online backup',
- proname => 'pg_stop_backup', provolatile => 'v', proparallel => 'r',
- prorettype => 'pg_lsn', proargtypes => '', prosrc => 'pg_stop_backup' },
{ oid => '2739', descr => 'finish taking an online backup',
proname => 'pg_stop_backup', prorows => '1', proretset => 't',
provolatile => 'v', proparallel => 'r', prorettype => 'record',
proargtypes => 'bool bool', proallargtypes => '{bool,bool,pg_lsn,text,text}',
proargmodes => '{i,i,o,o,o}',
proargnames => '{exclusive,wait_for_archive,lsn,labelfile,spcmapfile}',
- prosrc => 'pg_stop_backup_v2' },
-{ oid => '3813', descr => 'true if server is in online backup',
- proname => 'pg_is_in_backup', provolatile => 'v', prorettype => 'bool',
- proargtypes => '', prosrc => 'pg_is_in_backup' },
-{ oid => '3814', descr => 'start time of an online backup',
- proname => 'pg_backup_start_time', provolatile => 's',
- prorettype => 'timestamptz', proargtypes => '',
- prosrc => 'pg_backup_start_time' },
+ prosrc => 'pg_stop_backup' },
{ oid => '3436', descr => 'promote standby server',
proname => 'pg_promote', provolatile => 'v', prorettype => 'bool',
proargtypes => 'bool int4', proargnames => '{wait,wait_seconds}',
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 179ebaa104..66e527dec8 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -70,8 +70,7 @@ typedef struct
typedef enum CAC_state
{
- CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
- CAC_WAITBACKUP
+ CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY
} CAC_state;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 18bc8a7b90..ddea0c9f33 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -465,8 +465,4 @@ extern void process_session_preload_libraries(void);
extern void pg_bindtextdomain(const char *domain);
extern bool has_rolreplication(Oid roleid);
-/* in access/transam/xlog.c */
-extern bool BackupInProgress(void);
-extern void CancelBackup(void);
-
#endif /* MISCADMIN_H */
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 1407359aef..f580090e9a 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -557,25 +557,6 @@ sub backup
return;
}
-=item $node->backup_fs_hot(backup_name)
-
-Create a backup with a filesystem level copy in subdirectory B<backup_name> of
-B<< $node->backup_dir >>, including WAL.
-
-Archiving must be enabled, as B<pg_start_backup()> and B<pg_stop_backup()> are
-used. This is not checked or enforced.
-
-The backup name is passed as the backup label to B<pg_start_backup()>.
-
-=cut
-
-sub backup_fs_hot
-{
- my ($self, $backup_name) = @_;
- $self->_backup_fs($backup_name, 1);
- return;
-}
-
=item $node->backup_fs_cold(backup_name)
Create a backup with a filesystem level copy in subdirectory B<backup_name> of
@@ -589,53 +570,18 @@ Use B<backup> or B<backup_fs_hot> if you want to back up a running server.
sub backup_fs_cold
{
my ($self, $backup_name) = @_;
- $self->_backup_fs($backup_name, 0);
- return;
-}
-
-
-# Common sub of backup_fs_hot and backup_fs_cold
-sub _backup_fs
-{
- my ($self, $backup_name, $hot) = @_;
- my $backup_path = $self->backup_dir . '/' . $backup_name;
- my $port = $self->port;
- my $name = $self->name;
-
- print "# Taking filesystem backup $backup_name from node \"$name\"\n";
-
- if ($hot)
- {
- my $stdout = $self->safe_psql('postgres',
- "SELECT * FROM pg_start_backup('$backup_name');");
- print "# pg_start_backup: $stdout\n";
- }
RecursiveCopy::copypath(
$self->data_dir,
- $backup_path,
+ $self->backup_dir . '/' . $backup_name,
filterfn => sub {
my $src = shift;
return ($src ne 'log' and $src ne 'postmaster.pid');
});
- if ($hot)
- {
-
- # We ignore pg_stop_backup's return value. We also assume archiving
- # is enabled; otherwise the caller will have to copy the remaining
- # segments.
- my $stdout =
- $self->safe_psql('postgres', 'SELECT * FROM pg_stop_backup();');
- print "# pg_stop_backup: $stdout\n";
- }
-
- print "# Backup finished\n";
return;
}
-
-
=pod
=item $node->init_from_backup(root_node, backup_name)
diff --git a/src/test/recovery/t/010_logical_decoding_timelines.pl b/src/test/recovery/t/010_logical_decoding_timelines.pl
index 11f5595e2b..b556d7a654 100644
--- a/src/test/recovery/t/010_logical_decoding_timelines.pl
+++ b/src/test/recovery/t/010_logical_decoding_timelines.pl
@@ -66,7 +66,9 @@ $node_master->safe_psql('dropme',
$node_master->safe_psql('postgres', 'CHECKPOINT;');
my $backup_name = 'b1';
-$node_master->backup_fs_hot($backup_name);
+$node_master->stop();
+$node_master->backup_fs_cold($backup_name);
+$node_master->start();
$node_master->safe_psql('postgres',
q[SELECT pg_create_physical_replication_slot('phys_slot');]);
--
2.21.0 (Apple Git-122.2)
Greetings,
* David Steele (david@pgmasters.net) wrote:
Lastly, there is some concern about getting the backup label too late when
doing snapshots or using traditional backup software. It would certainly be
possible to return the backup_label and tablespace_map from
pg_start_backup() and let the user make the choice about what to do with
them. Any of these methods will require some scripting so I don't see why
the files couldn't be written as, for example, backup_label.snap and
tablespace_map.snap and then renamed by the restore script. The patch does
not currently make this change, but it could be added pretty easily if that
overcomes this objection.
Of course, if someone just restored that snapshot without actually
moving the files into place they'd get a corrupted database without any
indication of anything being wrong...
And if we checked for those files on startup and complained about them
being there (called .snap) then we're back into the "crash during backup
causes PG to fail to start" situation.
All of which is, as I recall, is why we have the API we do today.
As such, doing that doesn't strike me as an improvement over using the
new API and making it abundently clear that it's not so simple as people
might like to think it is...
Thanks,
Stephen
On 6/30/20 6:13 PM, Stephen Frost wrote:
Greetings,
* David Steele (david@pgmasters.net) wrote:
Lastly, there is some concern about getting the backup label too late when
doing snapshots or using traditional backup software. It would certainly be
possible to return the backup_label and tablespace_map from
pg_start_backup() and let the user make the choice about what to do with
them. Any of these methods will require some scripting so I don't see why
the files couldn't be written as, for example, backup_label.snap and
tablespace_map.snap and then renamed by the restore script. The patch does
not currently make this change, but it could be added pretty easily if that
overcomes this objection.Of course, if someone just restored that snapshot without actually
moving the files into place they'd get a corrupted database without any
indication of anything being wrong...And if we checked for those files on startup and complained about them
being there (called .snap) then we're back into the "crash during backup
causes PG to fail to start" situation.All of which is, as I recall, is why we have the API we do today.
I'm certainly not proposing that we ignore .snap files (or whatever).
There are a many ways a restore can be done incorrectly and not be
detected. The restore script would be responsible for knowing the rules
and renaming the files or erroring out.
As such, doing that doesn't strike me as an improvement over using the
new API and making it abundently clear that it's not so simple as people
might like to think it is...
Sure, backup is complicated, but I think there's an argument for
providing the backup_label and tablespace_map files at the beginning of
the backup since they are available at that point. And maybe put some
scary language about storing them in PGDATA.
Regards,
--
-David
david@pgmasters.net
Greetings,
* David Steele (david@pgmasters.net) wrote:
On 6/30/20 6:13 PM, Stephen Frost wrote:
* David Steele (david@pgmasters.net) wrote:
Lastly, there is some concern about getting the backup label too late when
doing snapshots or using traditional backup software. It would certainly be
possible to return the backup_label and tablespace_map from
pg_start_backup() and let the user make the choice about what to do with
them. Any of these methods will require some scripting so I don't see why
the files couldn't be written as, for example, backup_label.snap and
tablespace_map.snap and then renamed by the restore script. The patch does
not currently make this change, but it could be added pretty easily if that
overcomes this objection.Of course, if someone just restored that snapshot without actually
moving the files into place they'd get a corrupted database without any
indication of anything being wrong...And if we checked for those files on startup and complained about them
being there (called .snap) then we're back into the "crash during backup
causes PG to fail to start" situation.All of which is, as I recall, is why we have the API we do today.
I'm certainly not proposing that we ignore .snap files (or whatever). There
are a many ways a restore can be done incorrectly and not be detected. The
restore script would be responsible for knowing the rules and renaming the
files or erroring out.
If we don't ignore them then what are we going to do when they exist..?
As such, doing that doesn't strike me as an improvement over using the
new API and making it abundently clear that it's not so simple as people
might like to think it is...Sure, backup is complicated, but I think there's an argument for providing
the backup_label and tablespace_map files at the beginning of the backup
since they are available at that point. And maybe put some scary language
about storing them in PGDATA.
but still require that the connection be kept open until the
pg_stop_backup() is called? I suppose we could do that but it seems
unlikely to change much for anyone.
De-deprecating the exclusive backup mode and returning that info at the
start and telling users "put this anywhere EXCEPT this specific place,
and make sure you DO put these files in that exact place when you
restore" would perhaps be closer to what folks who really, really, want
to do snapshot-based backups would want... if that ends up being a net
positive for users or not is questionable imv. One could imagine a set
of scripts that would save that info to a directory above PGDATA but on
the appropriate volume to be included in the snapshot, and then on
restore would put them into the right place to allow PG to restore from
the backup properly, removing the complication around having to save
that data elsewhere or snapshot another filesystem after the backup is
run or such.
Thanks,
Stephen
On Tue, Jun 30, 2020 at 07:10:46PM -0400, Stephen Frost wrote:
De-deprecating the exclusive backup mode and returning that info at the
start and telling users "put this anywhere EXCEPT this specific place,
and make sure you DO put these files in that exact place when you
restore" would perhaps be closer to what folks who really, really, want
to do snapshot-based backups would want... if that ends up being a net
Agreed.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
On Tue, Jun 30, 2020 at 07:10:46PM -0400, Stephen Frost wrote:
De-deprecating the exclusive backup mode and returning that info at the
start and telling users "put this anywhere EXCEPT this specific place,
and make sure you DO put these files in that exact place when you
restore" would perhaps be closer to what folks who really, really, want
to do snapshot-based backups would want... if that ends up being a netAgreed.
Alright, in that case I think we need a victim^Wvolunteer to step up to
rewrite the documentation for backup for these two different methods,
with very clear instructions on how to do this properly (which really
should also include removing the particularly bad 'cp' command usage).
Maybe I'm wrong, but as I recall David already tried to do that once and
I doubt he's going to be interested in another round of that.
Otherwise, it seems best to stop trying to paint this as something
that's simple to do and rip the bandaid off and remove this long
deprecated method, so we can have some sane documentation for the
software writers who are writing backup solutions for PG.
Thanks,
Stephen
On Tue, Jun 30, 2020 at 07:32:47PM -0400, Stephen Frost wrote:
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
On Tue, Jun 30, 2020 at 07:10:46PM -0400, Stephen Frost wrote:
De-deprecating the exclusive backup mode and returning that info at the
start and telling users "put this anywhere EXCEPT this specific place,
and make sure you DO put these files in that exact place when you
restore" would perhaps be closer to what folks who really, really, want
to do snapshot-based backups would want... if that ends up being a netAgreed.
Alright, in that case I think we need a victim^Wvolunteer to step up to
rewrite the documentation for backup for these two different methods,
with very clear instructions on how to do this properly (which really
should also include removing the particularly bad 'cp' command usage).Maybe I'm wrong, but as I recall David already tried to do that once and
I doubt he's going to be interested in another round of that.Otherwise, it seems best to stop trying to paint this as something
that's simple to do and rip the bandaid off and remove this long
deprecated method, so we can have some sane documentation for the
software writers who are writing backup solutions for PG.
I can work on it but I don't think I know all the details.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
On Tue, 2020-06-30 at 17:38 -0400, David Steele wrote:
Rebased patch is attached.
I remain -1 on removing the exclusive backup API.
People who don't mind manual intervention when PostgreSQL doesn't
start after a crash in backup mode could safely use it.
We have been over this, and there is not much point in re-opening
the discussion. I just had to speak up.
Lastly, there is some concern about getting the backup label too late
when doing snapshots or using traditional backup software. It would
certainly be possible to return the backup_label and tablespace_map from
pg_start_backup() and let the user make the choice about what to do with
them. Any of these methods will require some scripting so I don't see
why the files couldn't be written as, for example, backup_label.snap and
tablespace_map.snap and then renamed by the restore script. The patch
does not currently make this change, but it could be added pretty easily
if that overcomes this objection.
That would be interesting improvements that would make the non-exclusive
backup API easier to use.
Yours,
Laurenz Albe
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
On Tue, 2020-06-30 at 17:38 -0400, David Steele wrote:
Rebased patch is attached.
I remain -1 on removing the exclusive backup API.
It's still deprecated and I'm certainly against removing that status
until/unless someone actually fixes it (including documentation), and if
we're not going to do that then we should remove it.
People who don't mind manual intervention when PostgreSQL doesn't
start after a crash in backup mode could safely use it.
I disagree that that's really appropriate.
We have been over this, and there is not much point in re-opening
the discussion. I just had to speak up.
It's not ok to just leave it as deprecated forever.
Lastly, there is some concern about getting the backup label too late
when doing snapshots or using traditional backup software. It would
certainly be possible to return the backup_label and tablespace_map from
pg_start_backup() and let the user make the choice about what to do with
them. Any of these methods will require some scripting so I don't see
why the files couldn't be written as, for example, backup_label.snap and
tablespace_map.snap and then renamed by the restore script. The patch
does not currently make this change, but it could be added pretty easily
if that overcomes this objection.That would be interesting improvements that would make the non-exclusive
backup API easier to use.
This would presumably be for the exclusive API as a way to make it not
completely broken, maybe.
If we wanted to try and make this work in a non-exclusive manner then
we'd need to do something like have the user save some info out at
pg_start_backup time that they then provide at pg_stop_backup time, so
we can match up the specific backup and write the appropriate WAL
message.
Thanks,
Stephen
On Wed, 2020-07-01 at 08:47 -0400, Stephen Frost wrote:
I remain -1 on removing the exclusive backup API.
It's still deprecated and I'm certainly against removing that status
until/unless someone actually fixes it (including documentation), and if
we're not going to do that then we should remove it.
Well, it doesn't need fixing, since it is working just fine (for certain
values of "just fine").
I agree with the need to document the problems better.
Yours,
Laurenz Albe
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
On Wed, 2020-07-01 at 08:47 -0400, Stephen Frost wrote:
I remain -1 on removing the exclusive backup API.
It's still deprecated and I'm certainly against removing that status
until/unless someone actually fixes it (including documentation), and if
we're not going to do that then we should remove it.Well, it doesn't need fixing, since it is working just fine (for certain
values of "just fine").
imv it's a disservice to our users to claim it's working when there's a
myriad of problems with it that we only allude to in the documentation,
some of which simply have no way of being addressed by a user.
Perhaps if we remove the landmines we can say it works, but imv it
doesn't today. I suspect we'll simply not get to a point where we agree
on this, but if you'd like to propose a patch that fixes the issues and
the documentation, that'd be a great direction to take this. If that's
not going to happen, then I'm in favor of removing it. I really dislike
these half-measures and keeping deprecated things that complicate our
code and our documentation forever.
Thanks,
Stephen
On Wed, Jul 1, 2020 at 04:46:59PM +0200, Laurenz Albe wrote:
On Wed, 2020-07-01 at 08:47 -0400, Stephen Frost wrote:
I remain -1 on removing the exclusive backup API.
It's still deprecated and I'm certainly against removing that status
until/unless someone actually fixes it (including documentation), and if
we're not going to do that then we should remove it.Well, it doesn't need fixing, since it is working just fine (for certain
values of "just fine").I agree with the need to document the problems better.
Agreed, and I don't support the repeated attempts to force things. For
me, the exclusive mode got marked as deprecated because it is "not
optimal", but that doesn't mean it is useless or needs to be removed.
Frankly, I think we should remove deprecated and just mark it as
non-optimal.
I do like the idea of pg_start_backup returning a value that needs to be
supplied to pg_stop_backup.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
On Wed, Jul 1, 2020 at 04:46:59PM +0200, Laurenz Albe wrote:
On Wed, 2020-07-01 at 08:47 -0400, Stephen Frost wrote:
I remain -1 on removing the exclusive backup API.
It's still deprecated and I'm certainly against removing that status
until/unless someone actually fixes it (including documentation), and if
we're not going to do that then we should remove it.Well, it doesn't need fixing, since it is working just fine (for certain
values of "just fine").I agree with the need to document the problems better.
Agreed, and I don't support the repeated attempts to force things. For
me, the exclusive mode got marked as deprecated because it is "not
optimal", but that doesn't mean it is useless or needs to be removed.
I don't consider something that, in entirely normal operation, causes
the server to not restart from a crash to be just "not optimal". You
could argue that doing a non-atomic snapshot of PG as a backup as "not
optimal" too, and certainly there's no shortage of folks who do and it
even works in a lot of cases, but there's no doubt it'll end up
breaking at some point.
Thanks,
Stephen
On Wed, Jul 1, 2020 at 2:47 PM Stephen Frost <sfrost@snowman.net> wrote:
This would presumably be for the exclusive API as a way to make it not
completely broken, maybe.If we wanted to try and make this work in a non-exclusive manner then
we'd need to do something like have the user save some info out at
pg_start_backup time that they then provide at pg_stop_backup time, so
we can match up the specific backup and write the appropriate WAL
message.
We don't even need to make it an exclusive mode -- we can allow
*nonexclusive* backups but remove the requirement to run start and stop
backup in the same connection, which I believe is the problem that people
have with the exclusive mode. So how about something like this:
1. Make pg_start_backup() return the backup label file as well. We can add
an extra column to the output without breaking backwards compatibility. And
since we have all the information for the file at pg_start_backup() time,
the user can then write that into the backup. We clearly document that this
should *not* be written as "backup_label" in the data directory. We can
even define what it should be instead.
2. Make pg_start_backup() also return a "cookie" value with a
unique identifier if asked to run in "disconnectable mode". Store this
identifier in shared state somewhere in the backend.
3. Make a version of pg_stop_backup that takes this cookie as a parameter,
and that way supports stopping a "disconnectable backup".
4. Scream loudly in the logs if we shut down postgres with any open
"disconnectable backups" in progress. (to let people know they screwed up)
5. Perhaps provide a row in pg_stat_progress_basebackup, or in it's own
view, to show these "disconnectable mode backups"?
5b. In fact, maybe provide something like that for the current
non-exclusive ones as well, in case people have hung sessions.
The weak spot in this is still the "don't write it as backup label", but we
can document that. And that would allow us to do non-exclusive base backups
without requiring maintaining the connection. And we can completely get rid
of the exclusive ones.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Greetings,
* Magnus Hagander (magnus@hagander.net) wrote:
On Wed, Jul 1, 2020 at 2:47 PM Stephen Frost <sfrost@snowman.net> wrote:
This would presumably be for the exclusive API as a way to make it not
completely broken, maybe.If we wanted to try and make this work in a non-exclusive manner then
we'd need to do something like have the user save some info out at
pg_start_backup time that they then provide at pg_stop_backup time, so
we can match up the specific backup and write the appropriate WAL
message.We don't even need to make it an exclusive mode -- we can allow
*nonexclusive* backups but remove the requirement to run start and stop
backup in the same connection, which I believe is the problem that people
have with the exclusive mode. So how about something like this:1. Make pg_start_backup() return the backup label file as well. We can add
an extra column to the output without breaking backwards compatibility. And
since we have all the information for the file at pg_start_backup() time,
the user can then write that into the backup. We clearly document that this
should *not* be written as "backup_label" in the data directory. We can
even define what it should be instead.2. Make pg_start_backup() also return a "cookie" value with a
unique identifier if asked to run in "disconnectable mode". Store this
identifier in shared state somewhere in the backend.
Seems like we could possibly just make this be the WAL position the
backup started at, since we use that as the finishing location...? I
get that users could screw up passing the value back in and get a
corrupted backup, but it'd avoid us having to stick anything in shared
memory, wouldn't it?
3. Make a version of pg_stop_backup that takes this cookie as a parameter,
and that way supports stopping a "disconnectable backup".4. Scream loudly in the logs if we shut down postgres with any open
"disconnectable backups" in progress. (to let people know they screwed up)
I'm less sure how useful that'd be, but ok.
5. Perhaps provide a row in pg_stat_progress_basebackup, or in it's own
view, to show these "disconnectable mode backups"?
5b. In fact, maybe provide something like that for the current
non-exclusive ones as well, in case people have hung sessions.
Ideally we'd provide a way for external backup tools to update said
progress with information of their own like the % complete, if they wish
to...
The weak spot in this is still the "don't write it as backup label", but we
can document that. And that would allow us to do non-exclusive base backups
without requiring maintaining the connection. And we can completely get rid
of the exclusive ones.
If this helps us get rid of exclusive backup mode then that's certainly
helpful, though inventing yet another method of doing backup makes me
cringe to think of the documentation complexity since it doesn't seem
like we'd really be reducing that. Now, if we also removed the existing
non-exclusive method and then had a *single* backup method with clear
documentation as to how to use it properly, that'd be a marked
improvement overall.
Thanks,
Stephen
On Wed, Jul 1, 2020 at 6:29 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Magnus Hagander (magnus@hagander.net) wrote:
On Wed, Jul 1, 2020 at 2:47 PM Stephen Frost <sfrost@snowman.net> wrote:
This would presumably be for the exclusive API as a way to make it not
completely broken, maybe.If we wanted to try and make this work in a non-exclusive manner then
we'd need to do something like have the user save some info out at
pg_start_backup time that they then provide at pg_stop_backup time, so
we can match up the specific backup and write the appropriate WAL
message.We don't even need to make it an exclusive mode -- we can allow
*nonexclusive* backups but remove the requirement to run start and stop
backup in the same connection, which I believe is the problem that people
have with the exclusive mode. So how about something like this:1. Make pg_start_backup() return the backup label file as well. We can
add
an extra column to the output without breaking backwards compatibility.
And
since we have all the information for the file at pg_start_backup() time,
the user can then write that into the backup. We clearly document thatthis
should *not* be written as "backup_label" in the data directory. We can
even define what it should be instead.2. Make pg_start_backup() also return a "cookie" value with a
unique identifier if asked to run in "disconnectable mode". Store this
identifier in shared state somewhere in the backend.Seems like we could possibly just make this be the WAL position the
backup started at, since we use that as the finishing location...? I
get that users could screw up passing the value back in and get a
corrupted backup, but it'd avoid us having to stick anything in shared
memory, wouldn't it?
Or, how about we require them to provide the backup label contents in its
entirety? Which I believe does contain that WAL portion? The downside of
that is that it would be multiline which might screw with shellscripts.
We would still need to stick *something* in there so we can keep track of
when it's done. You should only be able to stop each backup once for
example. Otherwise, you'll end up with XLogCtl->Insert.nonExclusiveBackups
being wrong if someone calls stop twice with the same cookie/wal location.
3. Make a version of pg_stop_backup that takes this cookie as a
parameter,
and that way supports stopping a "disconnectable backup".
4. Scream loudly in the logs if we shut down postgres with any open
"disconnectable backups" in progress. (to let people know they screwedup)
I'm less sure how useful that'd be, but ok.
5. Perhaps provide a row in pg_stat_progress_basebackup, or in it's own
view, to show these "disconnectable mode backups"?
5b. In fact, maybe provide something like that for the current
non-exclusive ones as well, in case people have hung sessions.Ideally we'd provide a way for external backup tools to update said
progress with information of their own like the % complete, if they wish
to...
That's definitely moving the goalposts on this several miles :) Not saying
that wouldn't be nice, but let's keep those separate :P
The weak spot in this is still the "don't write it as backup label", but
wecan document that. And that would allow us to do non-exclusive base
backups
without requiring maintaining the connection. And we can completely get
rid
of the exclusive ones.
If this helps us get rid of exclusive backup mode then that's certainly
helpful, though inventing yet another method of doing backup makes me
cringe to think of the documentation complexity since it doesn't seem
like we'd really be reducing that. Now, if we also removed the existing
non-exclusive method and then had a *single* backup method with clear
documentation as to how to use it properly, that'd be a marked
improvement overall.
We could do that, but the passing of cookies or whatever is completely
unnecessary with the current method so it would complicate that one...
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Greetings,
* Magnus Hagander (magnus@hagander.net) wrote:
On Wed, Jul 1, 2020 at 6:29 PM Stephen Frost <sfrost@snowman.net> wrote:
* Magnus Hagander (magnus@hagander.net) wrote:
On Wed, Jul 1, 2020 at 2:47 PM Stephen Frost <sfrost@snowman.net> wrote:
This would presumably be for the exclusive API as a way to make it not
completely broken, maybe.If we wanted to try and make this work in a non-exclusive manner then
we'd need to do something like have the user save some info out at
pg_start_backup time that they then provide at pg_stop_backup time, so
we can match up the specific backup and write the appropriate WAL
message.We don't even need to make it an exclusive mode -- we can allow
*nonexclusive* backups but remove the requirement to run start and stop
backup in the same connection, which I believe is the problem that people
have with the exclusive mode. So how about something like this:1. Make pg_start_backup() return the backup label file as well. We can
add
an extra column to the output without breaking backwards compatibility.
And
since we have all the information for the file at pg_start_backup() time,
the user can then write that into the backup. We clearly document thatthis
should *not* be written as "backup_label" in the data directory. We can
even define what it should be instead.2. Make pg_start_backup() also return a "cookie" value with a
unique identifier if asked to run in "disconnectable mode". Store this
identifier in shared state somewhere in the backend.Seems like we could possibly just make this be the WAL position the
backup started at, since we use that as the finishing location...? I
get that users could screw up passing the value back in and get a
corrupted backup, but it'd avoid us having to stick anything in shared
memory, wouldn't it?Or, how about we require them to provide the backup label contents in its
entirety? Which I believe does contain that WAL portion? The downside of
that is that it would be multiline which might screw with shellscripts.
This is getting a bit beyond the pale imv if we're talking about trying
to cater to software that can't manage to handle a multi-line result or
multi-line argument to a function.
We would still need to stick *something* in there so we can keep track of
when it's done. You should only be able to stop each backup once for
example. Otherwise, you'll end up with XLogCtl->Insert.nonExclusiveBackups
being wrong if someone calls stop twice with the same cookie/wal location.
Hrmpf, yeah, I suppose we can't really avoid having to track it somehow,
in which case, yeah, we might as well just have some unique ID for each
that we then look up in shared memory (though that unique ID could be
the starting WAL if we wanted, but I don't suppose it particularly
matters if it's that or something else). We'd need to have some
parameter like max_concurrent_backups or something tho, which I was
hoping to avoid.
5. Perhaps provide a row in pg_stat_progress_basebackup, or in it's own
view, to show these "disconnectable mode backups"?
5b. In fact, maybe provide something like that for the current
non-exclusive ones as well, in case people have hung sessions.Ideally we'd provide a way for external backup tools to update said
progress with information of their own like the % complete, if they wish
to...That's definitely moving the goalposts on this several miles :) Not saying
that wouldn't be nice, but let's keep those separate :P
Well, you brought up progress monitoring. :)
The weak spot in this is still the "don't write it as backup label", but
wecan document that. And that would allow us to do non-exclusive base
backups
without requiring maintaining the connection. And we can completely get
rid
of the exclusive ones.
If this helps us get rid of exclusive backup mode then that's certainly
helpful, though inventing yet another method of doing backup makes me
cringe to think of the documentation complexity since it doesn't seem
like we'd really be reducing that. Now, if we also removed the existing
non-exclusive method and then had a *single* backup method with clear
documentation as to how to use it properly, that'd be a marked
improvement overall.We could do that, but the passing of cookies or whatever is completely
unnecessary with the current method so it would complicate that one...
No, it wouldn't complicate "that one" since "that one" would be gone.
Yes, it'd be a slightly more complicated process than what they handle
currently but I have confidence that everything that's, today, arranging
to keep a PG connection open would be able to handle instead (or, in
addition to, if they wanted to continue having that connection anyway)
passing the cookie through from start->stop backup, and I do think it
would massively simplify the documentation which would make it well
worth it.
Thanks,
Stephen
On Wed, Jul 1, 2020 at 11:48 AM Magnus Hagander <magnus@hagander.net> wrote:
We don't even need to make it an exclusive mode -- we can allow *nonexclusive* backups but remove the requirement to run start and stop backup in the same connection, which I believe is the problem that people have with the exclusive mode.
Well, one of the big advantages of the non-exclusive mode is that the
backup automatically is terminated if the connection is closed, which
removes the possibility of leaving a backup in progress for a long
time inadvertently. I think that back-tracking on that would probably
be a step in the wrong direction.
I agree with what I understand to be Bruce's position -- non-exclusive
backup mode is better than exclusive mode backup mode, but exclusive
backup mode is not broken and doesn't need to be removed. I think it's
very sensible to continue to support exclusive backup mode but to
encourage users to use non-exclusive backup mode instead. I find
Stephen's position that exclusive backup mode cannot be used safely to
be unsupported by any compelling evidence, and reviewing the thread, I
find that there are a LOT of people who seem to agree.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jul 1, 2020 at 01:23:21PM -0400, Stephen Frost wrote:
Or, how about we require them to provide the backup label contents in its
entirety? Which I believe does contain that WAL portion? The downside of
that is that it would be multiline which might screw with shellscripts.This is getting a bit beyond the pale imv if we're talking about trying
to cater to software that can't manage to handle a multi-line result or
multi-line argument to a function.
Seems to work for me in bash:
$ X='a
b
c'
$ echo "$X"
a
b
c
$ echo $X
a b c
You need the double quotes around the variable name.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
On Wed, Jul 1, 2020 at 9:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 1, 2020 at 11:48 AM Magnus Hagander <magnus@hagander.net>
wrote:We don't even need to make it an exclusive mode -- we can allow
*nonexclusive* backups but remove the requirement to run start and stop
backup in the same connection, which I believe is the problem that people
have with the exclusive mode.Well, one of the big advantages of the non-exclusive mode is that the
backup automatically is terminated if the connection is closed, which
removes the possibility of leaving a backup in progress for a long
time inadvertently. I think that back-tracking on that would probably
be a step in the wrong direction.
I am in no way suggesting we should do this to replace the nonexclusive
backups the way they are now. They would still retain this property, and
should be encouraged to be the default way.
I am only suggesting this as an alternative to the exclusive ones, which
hopefully can get rid of most of the drawbacks of the exclusive mode ones.
Yes, it would still lack the ability to automatically terminate (except
it *would* do such a termination if the server restarted -- instead of like
the current exclusive mode which just breaks the server then).
I agree with what I understand to be Bruce's position -- non-exclusive
backup mode is better than exclusive mode backup mode, but exclusive
backup mode is not broken and doesn't need to be removed. I think it's
very sensible to continue to support exclusive backup mode but to
encourage users to use non-exclusive backup mode instead. I find
Stephen's position that exclusive backup mode cannot be used safely to
be unsupported by any compelling evidence, and reviewing the thread, I
find that there are a LOT of people who seem to agree.
I think that is what I'm suggesting -- that is, I'm suggesting a way to
solve the actual problems that people have with non-exclusive mode, to the
point of making it possible ot run those without the drawbacks.
As far as I've seen, the one thing that people have problems with in the
exclusive mode backups are precisely the fact that they have to keep a
persistent conneciton open, and thus it cannot work together with backup
software that is limited to only supporting running a pre- and a post
script.
Something like I have suggested here is to solve *that* problem. I don't
think anybody actually explicitly wants "exclusive backups" -- they want a
backup solution that plugs into their world of pre/post scripts. And if we
can make that one work in a safer way than the current exclusive backups,
ohw is that not an improvement?
//Magnus
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Wed, Jul 1, 2020 at 11:48 AM Magnus Hagander <magnus@hagander.net> wrote:
We don't even need to make it an exclusive mode -- we can allow *nonexclusive* backups but remove the requirement to run start and stop backup in the same connection, which I believe is the problem that people have with the exclusive mode.
Well, one of the big advantages of the non-exclusive mode is that the
backup automatically is terminated if the connection is closed, which
removes the possibility of leaving a backup in progress for a long
time inadvertently. I think that back-tracking on that would probably
be a step in the wrong direction.
What's the actual downside from having a forgotten backup being left
around for an extended period of time..?
I agree with what I understand to be Bruce's position -- non-exclusive
backup mode is better than exclusive mode backup mode, but exclusive
backup mode is not broken and doesn't need to be removed. I think it's
very sensible to continue to support exclusive backup mode but to
encourage users to use non-exclusive backup mode instead. I find
Stephen's position that exclusive backup mode cannot be used safely to
be unsupported by any compelling evidence, and reviewing the thread, I
find that there are a LOT of people who seem to agree.
I didn't realize there was a question regarding what happens if the DB
server ends up crashing during an exclusive backup. That hardly seems
to be safe if you care about having your database start up after a
crash, which seems like an important thing that we generally consider of
value considering the amount of effort we put into it.
I agree that there seems to be a number of people who want to ignore
that the system breaks down in that situation in order to keep exclusive
backup around, but, unlike the discussion being had between Magnus and
I, that doesn't actually move us forward at all in terms of providing a
good solution or being able to write the documentation in a sane way for
people who aren't PG hackers to be able to follow.
Thanks,
Stephen
On Wed, Jul 1, 2020 at 3:50 PM Magnus Hagander <magnus@hagander.net> wrote:
As far as I've seen, the one thing that people have problems with in the exclusive mode backups are precisely the fact that they have to keep a persistent conneciton open, and thus it cannot work together with backup software that is limited to only supporting running a pre- and a post script.
Something like I have suggested here is to solve *that* problem. I don't think anybody actually explicitly wants "exclusive backups" -- they want a backup solution that plugs into their world of pre/post scripts. And if we can make that one work in a safer way than the current exclusive backups, ohw is that not an improvement?
Yeah, I guess that's a pretty fair point. I have to confess to having
somewhat limited enthusiasm for adding a third mode here, but it might
be worth it.
It seems pretty well inevitable to me that people are going to forget
to end them. I am not sure exactly what the consequences of that will
be, but if for example there's a limited number of shared memory slots
to store information about these backups, then if you leak any, you'll
eventually run out of slots and your backups will start failing. I
feel like that's a going to happen to about 75% of the people who try
to use this new backup mode at some point in time, but maybe I'm a
pessimist.[1]OK, that's not really a "maybe".
If we could jigger things so that you don't need to stop the backup at
all, you only start it, and whether you ever finish copying everything
is something about which the system need not know or care, that would
be a lot nicer. I'm not sure I see how to do that, though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[1]: OK, that's not really a "maybe".
On Wed, Jul 1, 2020 at 9:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 1, 2020 at 3:50 PM Magnus Hagander <magnus@hagander.net>
wrote:As far as I've seen, the one thing that people have problems with in the
exclusive mode backups are precisely the fact that they have to keep a
persistent conneciton open, and thus it cannot work together with backup
software that is limited to only supporting running a pre- and a post
script.Something like I have suggested here is to solve *that* problem. I don't
think anybody actually explicitly wants "exclusive backups" -- they want a
backup solution that plugs into their world of pre/post scripts. And if we
can make that one work in a safer way than the current exclusive backups,
ohw is that not an improvement?Yeah, I guess that's a pretty fair point. I have to confess to having
somewhat limited enthusiasm for adding a third mode here, but it might
be worth it.
The intention is definitely not to have 3 modes. If we build this mode the
intention is that it is strictly better than exclusive mode, and thus
exclusive mode can finally be removed. (Whereas the nonexclusive mode is
better in many ways, but not all)
It seems pretty well inevitable to me that people are going to forget
to end them. I am not sure exactly what the consequences of that will
be, but if for example there's a limited number of shared memory slots
to store information about these backups, then if you leak any, you'll
eventually run out of slots and your backups will start failing. I
feel like that's a going to happen to about 75% of the people who try
to use this new backup mode at some point in time, but maybe I'm a
pessimist.[1]
Agreed. yet this is still strictly better or equal than the current
exclusive backups, which fail after forgetting that *once*.
If we could jigger things so that you don't need to stop the backup at
all, you only start it, and whether you ever finish copying everything
is something about which the system need not know or care, that would
be a lot nicer. I'm not sure I see how to do that, though.
Yeah, I'm not sure about that one either.
//Magnus
On 7/1/20 3:58 PM, Robert Haas wrote:
On Wed, Jul 1, 2020 at 3:50 PM Magnus Hagander <magnus@hagander.net> wrote:
As far as I've seen, the one thing that people have problems with in the exclusive mode backups are precisely the fact that they have to keep a persistent conneciton open, and thus it cannot work together with backup software that is limited to only supporting running a pre- and a post script.
Something like I have suggested here is to solve *that* problem. I don't think anybody actually explicitly wants "exclusive backups" -- they want a backup solution that plugs into their world of pre/post scripts. And if we can make that one work in a safer way than the current exclusive backups, ohw is that not an improvement?
Yeah, I guess that's a pretty fair point. I have to confess to having
somewhat limited enthusiasm for adding a third mode here, but it might
be worth it.It seems pretty well inevitable to me that people are going to forget
to end them. I am not sure exactly what the consequences of that will
be, but if for example there's a limited number of shared memory slots
to store information about these backups, then if you leak any, you'll
eventually run out of slots and your backups will start failing. I
feel like that's a going to happen to about 75% of the people who try
to use this new backup mode at some point in time, but maybe I'm a
pessimist.[1]
I did consider going down this road and I came to more or less the same
conclusion. Backups would accumulate until we just ran out of space
and/or started throwing errors at some limit. Hardly very attractive.
Also, full pages writes will be left on indefinitely.
If we could jigger things so that you don't need to stop the backup at
all, you only start it, and whether you ever finish copying everything
is something about which the system need not know or care, that would
be a lot nicer. I'm not sure I see how to do that, though.
Well, the only thing pg_stop_backup() *really* needs to know is the
starting WAL position. pg_start_backup() gets that info so if it passes
it back to pg_stop_backup() that could be enough. Or as was proposed
above, it just passes the backup_label back to pg_stop_backup() for
parsing. To write the .backup file to WAL (which I personally think is
pretty useless) you'd need backup_label.
The issue I have not been able to work around is that full page writes
need to be on for backups and they won't get turned back off if you
don't end all the backups that get started.
Here's a thought. What if we just stored the oldest starting LSN and a
count of how many backups have been requested. When the backup ends it
checks that backup count is > 0 and starting LSN is <= its starting LSN.
If not, it throws an error. When backups go to 0 FPWs are turned off if
they were off before the first backup.
That way, we could have a single function to cancel all backups in
progress. Even if a new one started, one ending that was started prior
to the cancel would know that it could not end successfully and error.
For the vast majority of people who have full page writes on, it
wouldn't really matter how many backups were running so we could
probably just skip all of that.
If we wanted to really make it more foolproof we could add a checksum to
the backup label to make sure the user doesn't damage it before passing
it back.
Regards,
--
-David
david@pgmasters.net
On Wed, Jul 1, 2020 at 10:24:15PM +0200, Magnus Hagander wrote:
On Wed, Jul 1, 2020 at 9:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, I guess that's a pretty fair point. I have to confess to having
somewhat limited enthusiasm for adding a third mode here, but it might
be worth it.
The intention is definitely not to have 3 modes. If we build this mode the
intention is that it is strictly better than exclusive mode, and thus exclusive
mode can finally be removed. (Whereas the nonexclusive mode is better in many
ways, but not all)
Agreed. I don't think three modes would help anyone.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
On Wed, Jul 1, 2020 at 04:27:58PM -0400, David Steele wrote:
On 7/1/20 3:58 PM, Robert Haas wrote:
If we could jigger things so that you don't need to stop the backup at
all, you only start it, and whether you ever finish copying everything
is something about which the system need not know or care, that would
be a lot nicer. I'm not sure I see how to do that, though.Well, the only thing pg_stop_backup() *really* needs to know is the starting
WAL position. pg_start_backup() gets that info so if it passes it back to
pg_stop_backup() that could be enough. Or as was proposed above, it just
Doesn't pg_start_backup already return this?
SELECT pg_start_backup('test');
pg_start_backup
-----------------
--> 0/2000028
Here's a thought. What if we just stored the oldest starting LSN and a count
of how many backups have been requested. When the backup ends it checks that
backup count is > 0 and starting LSN is <= its starting LSN. If not, it
throws an error. When backups go to 0 FPWs are turned off if they were off
before the first backup.
Can't we just error out of an exclusive-style backup is requested to
start and a previous exclusive-style backup has not been stopped?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
On Wed, Jul 1, 2020 at 4:27 PM David Steele <david@pgmasters.net> wrote:
Well, the only thing pg_stop_backup() *really* needs to know is the
starting WAL position. pg_start_backup() gets that info so if it passes
it back to pg_stop_backup() that could be enough. Or as was proposed
above, it just passes the backup_label back to pg_stop_backup() for
parsing. To write the .backup file to WAL (which I personally think is
pretty useless) you'd need backup_label.The issue I have not been able to work around is that full page writes
need to be on for backups and they won't get turned back off if you
don't end all the backups that get started.Here's a thought. What if we just stored the oldest starting LSN and a
count of how many backups have been requested. When the backup ends it
checks that backup count is > 0 and starting LSN is <= its starting LSN.
If not, it throws an error. When backups go to 0 FPWs are turned off if
they were off before the first backup.That way, we could have a single function to cancel all backups in
progress. Even if a new one started, one ending that was started prior
to the cancel would know that it could not end successfully and error.For the vast majority of people who have full page writes on, it
wouldn't really matter how many backups were running so we could
probably just skip all of that.
Yeah, I like this line of thinking. Not sure about the details.
If we wanted to really make it more foolproof we could add a checksum to
the backup label to make sure the user doesn't damage it before passing
it back.
I don't see a whole lot of point in that, honestly. It doesn't make it
impossible to damage the backup label; it just makes it harder. And
that doesn't seem worth much.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 7/1/20 4:27 PM, David Steele wrote:
Here's a thought. What if we just stored the oldest starting LSN and a
count of how many backups have been requested. When the backup ends it
checks that backup count is > 0 and starting LSN is <= its starting LSN.
If not, it throws an error. When backups go to 0 FPWs are turned off if
they were off before the first backup.
To be clear, the min starting LSN only needs to be reset when
incrementing total running backups from 0 to 1. What we are recording
here is the LSN since FPWs were enabled. As long as it is <= the start
LSN of the backup when it ends then all is well (I think).
Regards,
--
-David
david@pgmasters.net
On Wed, Jul 1, 2020 at 4:29 PM Bruce Momjian <bruce@momjian.us> wrote:
Agreed. I don't think three modes would help anyone.
Well, I think that if and when we remove the existing exclusive mode,
we're going to break a bunch of people's backup scripts. I think it's
appropriate to do that eventually, but I'm not in a big rush. Long
deprecation periods are a feature, not a bug.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jul 1, 2020 at 10:28 PM David Steele <david@pgmasters.net> wrote:
On 7/1/20 3:58 PM, Robert Haas wrote:
On Wed, Jul 1, 2020 at 3:50 PM Magnus Hagander <magnus@hagander.net>
wrote:
As far as I've seen, the one thing that people have problems with in
the exclusive mode backups are precisely the fact that they have to keep a
persistent conneciton open, and thus it cannot work together with backup
software that is limited to only supporting running a pre- and a post
script.Something like I have suggested here is to solve *that* problem. I
don't think anybody actually explicitly wants "exclusive backups" -- they
want a backup solution that plugs into their world of pre/post scripts. And
if we can make that one work in a safer way than the current exclusive
backups, ohw is that not an improvement?Yeah, I guess that's a pretty fair point. I have to confess to having
somewhat limited enthusiasm for adding a third mode here, but it might
be worth it.It seems pretty well inevitable to me that people are going to forget
to end them. I am not sure exactly what the consequences of that will
be, but if for example there's a limited number of shared memory slots
to store information about these backups, then if you leak any, you'll
eventually run out of slots and your backups will start failing. I
feel like that's a going to happen to about 75% of the people who try
to use this new backup mode at some point in time, but maybe I'm a
pessimist.[1]I did consider going down this road and I came to more or less the same
conclusion. Backups would accumulate until we just ran out of space
and/or started throwing errors at some limit. Hardly very attractive.Also, full pages writes will be left on indefinitely.
If we could jigger things so that you don't need to stop the backup at
all, you only start it, and whether you ever finish copying everything
is something about which the system need not know or care, that would
be a lot nicer. I'm not sure I see how to do that, though.Well, the only thing pg_stop_backup() *really* needs to know is the
starting WAL position. pg_start_backup() gets that info so if it passes
it back to pg_stop_backup() that could be enough. Or as was proposed
above, it just passes the backup_label back to pg_stop_backup() for
parsing. To write the .backup file to WAL (which I personally think is
pretty useless) you'd need backup_label.The issue I have not been able to work around is that full page writes
need to be on for backups and they won't get turned back off if you
don't end all the backups that get started.Here's a thought. What if we just stored the oldest starting LSN and a
count of how many backups have been requested. When the backup ends it
checks that backup count is > 0 and starting LSN is <= its starting LSN.
If not, it throws an error. When backups go to 0 FPWs are turned off if
they were off before the first backup.
I guess the weak spot of that one is if some script does stop without doing
start first, it will break somebody else's backup. (And yes, I've seen
scripts make this mistake many times -- it equally breaks the exclusive
backups in the current system...)
And don't we need the combination of the start/stop location for the
history file?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Wed, Jul 1, 2020 at 4:29 PM Bruce Momjian <bruce@momjian.us> wrote:
Agreed. I don't think three modes would help anyone.
Well, I think that if and when we remove the existing exclusive mode,
we're going to break a bunch of people's backup scripts. I think it's
appropriate to do that eventually, but I'm not in a big rush. Long
deprecation periods are a feature, not a bug.
This is pretty tangential to the overall discussion, but I generally
disagree that deprecation periods beyond what we already, always,
provide thanks to supporting 5 major versions concurrently are actually
a feature or are healthy for the project.
I continue to find it curious that we stress a great deal over (very
likely) poorly written backup scripts that haven't been updated in the
5+? years since exclusive mode was deprecated, but we happily add new
keywords in new major versions and remove columns in our catalog tables
or rename columns or entirely redo how recovery works (breaking every
script out there that performed a restore..) or do any number of other
things that potentially break applications that communicate through the
PG protocol and other parts of the system that very likely have scripts
that were written to work with them.
I'm really of the opinion that we need to stop doing that.
If someone could explain what is so special about *this* part of the
system that we absolutely can't possibly accept any change that might
break user's scripts, and why it's worth all of the angst, maintenance,
ridiculously difficult documentation to understand and hacks (the
interface to pg_start_backup is ridiculously warty because of this), I'd
greatly appreciate it.
Thanks,
Stephen
On Wed, Jul 1, 2020 at 04:36:03PM -0400, Robert Haas wrote:
On Wed, Jul 1, 2020 at 4:29 PM Bruce Momjian <bruce@momjian.us> wrote:
Agreed. I don't think three modes would help anyone.
Well, I think that if and when we remove the existing exclusive mode,
we're going to break a bunch of people's backup scripts. I think it's
appropriate to do that eventually, but I'm not in a big rush. Long
deprecation periods are a feature, not a bug.
True, but we have had this marked as deprecated for a while. Our
Postgres people who love backup technology want this improved, and our
operational people are asking for it to stay, so keeping the ability but
changing the API seems reasonable to me. I tend to think our "love
backup technology" folks know more than our operational folks, even
though I am in the operational camp.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
On 7/1/20 4:39 PM, Magnus Hagander wrote:
On Wed, Jul 1, 2020 at 10:28 PM David Steele <david@pgmasters.net
Here's a thought. What if we just stored the oldest starting LSN and a
count of how many backups have been requested. When the backup ends it
checks that backup count is > 0 and starting LSN is <= its starting
LSN.
If not, it throws an error. When backups go to 0 FPWs are turned off if
they were off before the first backup.I guess the weak spot of that one is if some script does stop without
doing start first, it will break somebody else's backup. (And yes, I've
seen scripts make this mistake many times -- it equally breaks the
exclusive backups in the current system...)
Well, they'd have to pass in a backup_label with a start LSN >= the min
LSN or they would just get an error and not decrement the backup count.
The real issue would be if they called pg_stop_backup twice. We might be
able to stop that with a rolling max stop lsn to keep anyone from
calling pg_stop_backup() twice.
But yeah, it would be possible to kill somebody else's session with some
finagling. Still, worse case would be an error'd backup rather than a
corrupt one.
But really, that's only if FPWs are turned off. We can also do some
extra validation if the session is left open, which for most software is
the norm now.
And don't we need the combination of the start/stop location for the
history file?
You mean the .backup file for the WAL? All that needs is the
backup_label and the stop LSN that's determined in pg_stop_backup(). Am
I missing something?
Regards,
--
-David
david@pgmasters.net
On 7/1/20 4:32 PM, Bruce Momjian wrote:
On Wed, Jul 1, 2020 at 04:27:58PM -0400, David Steele wrote:
On 7/1/20 3:58 PM, Robert Haas wrote:
If we could jigger things so that you don't need to stop the backup at
all, you only start it, and whether you ever finish copying everything
is something about which the system need not know or care, that would
be a lot nicer. I'm not sure I see how to do that, though.Well, the only thing pg_stop_backup() *really* needs to know is the starting
WAL position. pg_start_backup() gets that info so if it passes it back to
pg_stop_backup() that could be enough. Or as was proposed above, it justDoesn't pg_start_backup already return this?
SELECT pg_start_backup('test');
pg_start_backup
-----------------
--> 0/2000028
It does, though after looking at the code I think we need the entire
backup_label. For any of this to be useful, pg_start_backup() *must*
return backup_label anyway so it can be stored somewhere the backup
software can find it.
Here's a thought. What if we just stored the oldest starting LSN and a count
of how many backups have been requested. When the backup ends it checks that
backup count is > 0 and starting LSN is <= its starting LSN. If not, it
throws an error. When backups go to 0 FPWs are turned off if they were off
before the first backup.Can't we just error out of an exclusive-style backup is requested to
start and a previous exclusive-style backup has not been stopped?
That means only allowing one exclusive backup but it's not like we don't
already have that limitation. But -- I'd rather not even need to specify
exclusive/non-exclusive when the backup is started. Basically, all
backups would be non-exclusive but we'd allow pg_stop_backup() to be
called in a new session (or the same session).
Regards,
--
-David
david@pgmasters.net
On Wed, Jul 1, 2020 at 05:20:16PM -0400, David Steele wrote:
On 7/1/20 4:32 PM, Bruce Momjian wrote:
On Wed, Jul 1, 2020 at 04:27:58PM -0400, David Steele wrote:
On 7/1/20 3:58 PM, Robert Haas wrote:
If we could jigger things so that you don't need to stop the backup at
all, you only start it, and whether you ever finish copying everything
is something about which the system need not know or care, that would
be a lot nicer. I'm not sure I see how to do that, though.Well, the only thing pg_stop_backup() *really* needs to know is the starting
WAL position. pg_start_backup() gets that info so if it passes it back to
pg_stop_backup() that could be enough. Or as was proposed above, it justDoesn't pg_start_backup already return this?
SELECT pg_start_backup('test');
pg_start_backup
-----------------
--> 0/2000028It does, though after looking at the code I think we need the entire
backup_label. For any of this to be useful, pg_start_backup() *must* return
backup_label anyway so it can be stored somewhere the backup software can
find it.
Sure, sounds good to me.
Here's a thought. What if we just stored the oldest starting LSN and a count
of how many backups have been requested. When the backup ends it checks that
backup count is > 0 and starting LSN is <= its starting LSN. If not, it
throws an error. When backups go to 0 FPWs are turned off if they were off
before the first backup.Can't we just error out of an exclusive-style backup is requested to
start and a previous exclusive-style backup has not been stopped?That means only allowing one exclusive backup but it's not like we don't
already have that limitation. But -- I'd rather not even need to specify
exclusive/non-exclusive when the backup is started. Basically, all backups
would be non-exclusive but we'd allow pg_stop_backup() to be called in a new
session (or the same session).
OK, however you want.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
On Wed, Jul 1, 2020 at 11:08 PM David Steele <david@pgmasters.net> wrote:
On 7/1/20 4:39 PM, Magnus Hagander wrote:
On Wed, Jul 1, 2020 at 10:28 PM David Steele <david@pgmasters.net
Here's a thought. What if we just stored the oldest starting LSN anda
count of how many backups have been requested. When the backup ends
it
checks that backup count is > 0 and starting LSN is <= its starting
LSN.
If not, it throws an error. When backups go to 0 FPWs are turned offif
they were off before the first backup.
I guess the weak spot of that one is if some script does stop without
doing start first, it will break somebody else's backup. (And yes, I've
seen scripts make this mistake many times -- it equally breaks the
exclusive backups in the current system...)Well, they'd have to pass in a backup_label with a start LSN >= the min
LSN or they would just get an error and not decrement the backup count.
Oh as long as we're still requiring pg_stop_backup() to pass in something
that it received from pg_start_backup() so that we can verify it's the
correct one, then that problem wouldn't exist.
The real issue would be if they called pg_stop_backup twice. We might be
able to stop that with a rolling max stop lsn to keep anyone from
calling pg_stop_backup() twice.
But yeah, it would be possible to kill somebody else's session with some
finagling. Still, worse case would be an error'd backup rather than a
corrupt one.
What about the case of:
Session A - start backup
Session B - stop backup (but A is still running of course)
Session C - start backup
Session A - stop backup
At this point, session A can still stop the backup because there is one
running -- but there has been time in between the two when no backup was
running. That could lead to Session A getting a corrupt backup, I think --
unless we pass some unique identifier back in pg_stop_backup that matches
it up. (And if we do pass that up, then session B running pg_stop_backup()
would fail, thus leaving the backup started by A still running.
But really, that's only if FPWs are turned off. We can also do some
extra validation if the session is left open, which for most software is
the norm now.
Session left open should really still be the default, as it's the safest
one :) And yes, most backup *software* does it. But the entire reason we
want another mode from the current non-exclusive backup is people *not*
using one of the ready-made backup solutions.
And don't we need the combination of the start/stop location for the
history file?
You mean the .backup file for the WAL? All that needs is the
backup_label and the stop LSN that's determined in pg_stop_backup(). Am
I missing something?
I mean the .backup file in the archive, yes. That one contains both the
start and the stop location, timeline and time.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Wed, 1 Jul 2020 15:58:57 -0400
Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 1, 2020 at 3:50 PM Magnus Hagander <magnus@hagander.net> wrote:
As far as I've seen, the one thing that people have problems with in the
exclusive mode backups are precisely the fact that they have to keep a
persistent conneciton open, and thus it cannot work together with backup
software that is limited to only supporting running a pre- and a post
script.Something like I have suggested here is to solve *that* problem. I don't
think anybody actually explicitly wants "exclusive backups" -- they want a
backup solution that plugs into their world of pre/post scripts. And if we
can make that one work in a safer way than the current exclusive backups,
ohw is that not an improvement?Yeah, I guess that's a pretty fair point. I have to confess to having
somewhat limited enthusiasm for adding a third mode here, but it might
be worth it.It seems pretty well inevitable to me that people are going to forget
to end them.
There's so many way an admin could break their backup process and not notice
it. Even with current nonexclusive backup, a bad written script might creates
bad unrecoverable backups.
In regard with your concern, monitoring in progress backups might be *one*
answer, from server side. This was "easy" with exclusive backup, we just
monitor the backup_label age and warn if it is older than expected [1]https://github.com/OPMDG/check_pgactivity/#user-content-backup_label_age-8.1. Using
non-exclusive backup...this is not that easy anymore. And pg_is_in_backup() is
quite misleading if the admin found it without reading doc. Maybe an admin
function to list in progress non-exclusive backup and related backend pid might
be a good start?
With such admin function facilities, client side process could monitor backup
and decide to cancel them based on some internal backup policies/process.
Tools like pg_basebackup (and probably pgbackrest/barman to some extends) still
need the backup to abort on disconnection. Maybe it could flag its session
using the replication protocol or call a new admin function or load a hook on
session-shutdown to keep previous API behavior?
Regards,
[1]: https://github.com/OPMDG/check_pgactivity/#user-content-backup_label_age-8.1
On Wed, 2020-07-01 at 16:46 -0400, Stephen Frost wrote:
I continue to find it curious that we stress a great deal over (very
likely) poorly written backup scripts that haven't been updated in the
5+? years since exclusive mode was deprecated, but we happily add new
keywords in new major versions and remove columns in our catalog tables
or rename columns or entirely redo how recovery works (breaking every
script out there that performed a restore..) or do any number of other
things that potentially break applications that communicate through the
PG protocol and other parts of the system that very likely have scripts
that were written to work with them.I'm really of the opinion that we need to stop doing that.
If someone could explain what is so special about *this* part of the
system that we absolutely can't possibly accept any change that might
break user's scripts, and why it's worth all of the angst, maintenance,
ridiculously difficult documentation to understand and hacks (the
interface to pg_start_backup is ridiculously warty because of this), I'd
greatly appreciate it.
Easy. Lots of people write backup scripts, and fewer people write
complicated pg_catalog queries. We would make way more users unhappy.
Besides, there is nothing "poorly written" about a backup script using
exclusive backup as long as the user is aware that there could be some
small degree of trouble in case of a crash.
You have a point about reworking the way recovery works: after teaching
several classes and running into oddities with recovery parameters in
postgresql.conf, I am not sure if that was a good move.
Particularly the necessity for "recovery.signal" makes the New Way no
easier than the Old Way - perhaps something like "pg_ctl start_in_recovery"
would have been smarter.
But one instance where we made users unhappy is not justification
for making them unhappy again.
Yours,
Laurenz Albe
On Thu, Jul 2, 2020 at 1:06 AM Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
wrote:
On Wed, 1 Jul 2020 15:58:57 -0400
Robert Haas <robertmhaas@gmail.com> wrote:On Wed, Jul 1, 2020 at 3:50 PM Magnus Hagander <magnus@hagander.net>
wrote:
As far as I've seen, the one thing that people have problems with in
the
exclusive mode backups are precisely the fact that they have to keep a
persistent conneciton open, and thus it cannot work together withbackup
software that is limited to only supporting running a pre- and a post
script.Something like I have suggested here is to solve *that* problem. I
don't
think anybody actually explicitly wants "exclusive backups" -- they
want a
backup solution that plugs into their world of pre/post scripts. And
if we
can make that one work in a safer way than the current exclusive
backups,
ohw is that not an improvement?
Yeah, I guess that's a pretty fair point. I have to confess to having
somewhat limited enthusiasm for adding a third mode here, but it might
be worth it.It seems pretty well inevitable to me that people are going to forget
to end them.There's so many way an admin could break their backup process and not
notice
it. Even with current nonexclusive backup, a bad written script might
creates
bad unrecoverable backups.
Definitely. And the scariest part of them is that they all work fine in
testing...
In regard with your concern, monitoring in progress backups might be *one*
answer, from server side. This was "easy" with exclusive backup, we just
monitor the backup_label age and warn if it is older than expected [1].
Using
Yeah, but having a monitoring point that will crash your database on
restart isn't very safe :)
non-exclusive backup...this is not that easy anymore. And
pg_is_in_backup() is
quite misleading if the admin found it without reading doc. Maybe an admin
Yeah, as it is now it should really be called pg_is_in_exclusive_backup().
function to list in progress non-exclusive backup and related backend pid
might
be a good start?
I think it would make perfect sense to show manual (exclusive or
non-exclusive) base backups in pg_stat_progress_basebackup. There's no
fundamental need that one should only be for base backups taken with
pg_basebackup. In fact, I'd argue that's a mistake in the view in v13 that
it does not include this information.
It could have "phase" set to "manual non-exclusive" for example, and leave
the other fields at NULL.
I guess in theory even for exclusive ones, with the pid column set to NULL.
(As Stephen mentioned at some point in the future we might also want to
extend it to allow these tools to report their progress as well into it,
probably by just calling an admin function on the connection that they
already have).
Tools like pg_basebackup (and probably pgbackrest/barman to some extends)
still
need the backup to abort on disconnection. Maybe it could flag its session
using the replication protocol or call a new admin function or load a hook
on
session-shutdown to keep previous API behavior?
Suddenly requiring a replication protocol connection for one thing, when
all their other things are done over a normal one, seems really terrible.
But having an admin function would work.
But anything requiring loading of a hook is going to raise the bar
massively as suddenly somebody needs to install a compiled binary on the
server in order to use it. But calling a separate function is pretty much
what I suggested upthread (except I suggested a new version of
pg_stop_backup, but implementation wise)
But I'm not sure what previous API behavior you're looking to preserve here?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Thu, Jul 2, 2020 at 10:35 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:
On Wed, 2020-07-01 at 16:46 -0400, Stephen Frost wrote:
I continue to find it curious that we stress a great deal over (very
likely) poorly written backup scripts that haven't been updated in the
5+? years since exclusive mode was deprecated, but we happily add new
keywords in new major versions and remove columns in our catalog tables
or rename columns or entirely redo how recovery works (breaking every
script out there that performed a restore..) or do any number of other
things that potentially break applications that communicate through the
PG protocol and other parts of the system that very likely have scripts
that were written to work with them.I'm really of the opinion that we need to stop doing that.
If someone could explain what is so special about *this* part of the
system that we absolutely can't possibly accept any change that might
break user's scripts, and why it's worth all of the angst, maintenance,
ridiculously difficult documentation to understand and hacks (the
interface to pg_start_backup is ridiculously warty because of this), I'd
greatly appreciate it.Easy. Lots of people write backup scripts, and fewer people write
complicated pg_catalog queries. We would make way more users unhappy.
It's not about complicated catalog queries.
For example, in PostgreSQL 13 we break what I would say is the vast
majority of every query against pg_stat_statements, by renaming those
fields. Yet the old ones were never deprecated, and certainly not for 5+
years. And I certainly *hope* we have more users who query
pg_stat_statements than write their own backup solutions.
In PostgreSQL 12, we broke every single restore script by moving backup
configuration into recovery.conf. Yet the old ones were never deprecated,
and certainly not for 5+ years.
Or for PostgreSQL 10, we renamed pg_xlog to pg_wal, and pg_log to log, both
which would break the vast majority of existing backup scripts. Yet the old
ones were never deprecated, and certainly not for 5+ years.
In general, we have made backwards incompatible changes in almost every
release of PostgreSQL. And the vast majority of them have *not* been
deprecated for years ahead of making the change -- we just killed them.
So I do agree with Stephen on this -- it seems we are holding this old and
if not broken then at least very *fragile* API to a *very* different set of
standards than our other functionality.
All backup scripts had to be rewritten in 10. All restore scripts had to be
rewritten in 12. What's so special about requiring backup scripts to be
rewritten again in 14?
Now, I understand it if we were removing functionality that you could not
rewrite to make things work. Heck, I can even at least partially understand
that non-exclusive backup mode is too *difficult* for some people to use.
But the currently suggested changes are there specifically to make it as
easy to use as the deprecated one, yet a lot safer.
Besides, there is nothing "poorly written" about a backup script using
exclusive backup as long as the user is aware that there could be some
small degree of trouble in case of a crash.
It's not a poorly written script, but it *is* a poorly designed API. But if
we decide we can never fix things that we have designed poorly, then what
other parts of our development should we stop?
(And no, I don't agree it's a "small degree of trouble". At all. Because
usually you end up with a total system outage sometime in the middle of the
night (when most people run their backups), blocking end user access,
waking up on-call personnel, and dealing with a pretty damn major incident)
But one instance where we made users unhappy is not justification
for making them unhappy again.
This is true.
So which users will actually be unhappy about this?
Yes, they have to make small updates to their scripts. Again, if they don't
want that, they can *never* upgrade PostgreSQL, because we make such
changes in *every* major release. Those are small, and those are trivial
(with the suggested changes, not today). In return, they get a database
that won't crash on them if something happens during backup.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 7/1/20 5:44 PM, Magnus Hagander wrote:
On Wed, Jul 1, 2020 at 11:08 PM David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:But yeah, it would be possible to kill somebody else's session with
some
finagling. Still, worse case would be an error'd backup rather than a
corrupt one.What about the case of:
Session A - start backup
Session B - stop backup (but A is still running of course)
Session C - start backup
Session A - stop backupAt this point, session A can still stop the backup because there is one
running -- but there has been time in between the two when no backup was
running. That could lead to Session A getting a corrupt backup, I think
-- unless we pass some unique identifier back in pg_stop_backup that
matches it up. (And if we do pass that up, then session B running
pg_stop_backup() would fail, thus leaving the backup started by A still
running.
This is fine because the min start LSN would have been advanced after B
stopped. When A tries to stop the min start LSN will be later than its
start LSN so it will error.
It might be easier/better to just keep the one exclusive slot in shared
memory and store the backup label in it. We only allow one exclusive
backup now so it wouldn't be a loss in functionality.
None of this really solves the problem of what happens when the user
dumps the backup_label into the data directory. With traditional backup
software that's pretty much going to be the only choice. Is telling them
not to do it and washing our hands of it really enough?
In particular, I'm worried about the logic in postmaster.c that would be
removed if we no longer save the backup_label explicitly during an
exclusive backup. If backup_label is no longer removed on a clean
shutdown it seems we'll just make the situation worse.
Regards,
--
-David
david@pgmasters.net
On Thu, 2 Jul 2020 12:32:14 +0200
Magnus Hagander <magnus@hagander.net> wrote:
[...]
non-exclusive backup...this is not that easy anymore. And
pg_is_in_backup() is quite misleading if the admin found it without reading
doc. Maybe an adminYeah, as it is now it should really be called pg_is_in_exclusive_backup().
+1
We end up to the same conclusion while discussing with Gilles Darold this
morning. But considering the discussion bellow, this might not be needed anyway.
function to list in progress non-exclusive backup and related backend pid
might
be a good start?I think it would make perfect sense to show manual (exclusive or
non-exclusive) base backups in pg_stat_progress_basebackup. There's no
fundamental need that one should only be for base backups taken with
pg_basebackup. In fact, I'd argue that's a mistake in the view in v13 that
it does not include this information.It could have "phase" set to "manual non-exclusive" for example, and leave
the other fields at NULL.
Even in manual non-exclusive we could define some different phase:
* waiting for checkpoint to finish
* external backup
* waiting for wal archiving to finish
So maybe exposing the backup label and/or the method might be useful as well?
I guess in theory even for exclusive ones, with the pid column set to NULL.
We are discussing removing the mandatory connection during the non-exclusive
backup. If it become optional at some point, this PID might be NULL as well...
(As Stephen mentioned at some point in the future we might also want to
extend it to allow these tools to report their progress as well into it,
probably by just calling an admin function on the connection that they
already have).Tools like pg_basebackup (and probably pgbackrest/barman to some extends)
still need the backup to abort on disconnection. Maybe it could flag its
session using the replication protocol or call a new admin function or load
a hook on session-shutdown to keep previous API behavior?Suddenly requiring a replication protocol connection for one thing, when
all their other things are done over a normal one, seems really terrible.
Sorry, I misexplained. Adding such flag to the replication protocol was only for
pg_basebackup. Other tools would use an admin function to achieve the same goal.
But having an admin function would work.
But anything requiring loading of a hook is going to raise the bar
massively as suddenly somebody needs to install a compiled binary on the
server in order to use it.
Contrib already provide many useful tools. How would it be different from eg.
pg_archivecleanup? IIRC, end of session hook was discussed in the past, but I
did not follow the thread by the time. But anyway, an admin function would
probably make the job as well anyway.
But calling a separate function is pretty much > what I suggested upthread
(except I suggested a new version of pg_stop_backup, but implementation wise)But I'm not sure what previous API behavior you're looking to preserve here?
Current non-exclusive backup are automatically aborted when the
pg_start_backup() session disappear without pg_stop_backup(). I suppose this was
the API behavior that Robert is concerned to loose when he is writing about
people forgetting to end backups.
And now you are talking about Stephen's idea to report the progress of a backup,
this kind of API could be used as watchdog as well: if the frontend did not
report before some optional kind of (user defined) timeout, abort the backup.
We would have a "similar" behavior than current one.
To be clear, I suppose these admin functions could be called from any backend
session, taking some kind of backup id (eg. the token you were talking about
upthread).
Regards,
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
On Wed, 2020-07-01 at 16:46 -0400, Stephen Frost wrote:
If someone could explain what is so special about *this* part of the
system that we absolutely can't possibly accept any change that might
break user's scripts, and why it's worth all of the angst, maintenance,
ridiculously difficult documentation to understand and hacks (the
interface to pg_start_backup is ridiculously warty because of this), I'd
greatly appreciate it.Easy. Lots of people write backup scripts, and fewer people write
complicated pg_catalog queries. We would make way more users unhappy.
And no one writes scripts to do restores? No, I don't believe that
(also don't believe that no one writes scripts or queries that use
various catalog tables). Also, we certainly do get complaints from time
to time from people who got hit by a catalog change, but it's usually
"this stopped working" and you tell them "do this instead" and they're
happy to update their code. That's why we have major versions, and why
we have back-branches and why we support a given major version for 5
years.
You have a point about reworking the way recovery works: after teaching
several classes and running into oddities with recovery parameters in
postgresql.conf, I am not sure if that was a good move.
If it was a good idea or bad idea is really pretty independent of the
point I was trying to make: we *did* completely change the restore API
without any deprecation period for the old one or period of time where
we supported both or any of the foolishness that we've been doing for
the backup API for the past however many years. Perhaps the mob of
users who are going to complain about that change hasn't decided to
upgrade to v12 quite yet, but I doubt it.
I'll say it again- I really think we need to stop this special case
where we've decided that we can't possibly make a breaking change to the
backup API even across major versions. We don't do that, evidently, in
hardly any other cases and it's an altogether good thing- we don't have
ridiculously complicated restore documentation that has to explain two
completely different ways of doing a restore, nor is the code
complicated by having different ways, etc..
Thanks,
Stephen
Greetings,
* Magnus Hagander (magnus@hagander.net) wrote:
On Thu, Jul 2, 2020 at 1:06 AM Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
wrote:function to list in progress non-exclusive backup and related backend pid
might
be a good start?I think it would make perfect sense to show manual (exclusive or
non-exclusive) base backups in pg_stat_progress_basebackup. There's no
fundamental need that one should only be for base backups taken with
pg_basebackup. In fact, I'd argue that's a mistake in the view in v13 that
it does not include this information.
I agree entirely that it was a mistake in v13 to not include this- and
to not include a way for other backup tools to report their progress.
This is a good example of why we really need an in-core non-streamed
backup capability, imv, because if we don't we end up with things like
this that are just thinking about streaming basebackups. We also have
no in-core code that is user-facing that exercises the low-level backup
API.
It could have "phase" set to "manual non-exclusive" for example, and leave
the other fields at NULL.
Yeah.
I guess in theory even for exclusive ones, with the pid column set to NULL.
(As Stephen mentioned at some point in the future we might also want to
extend it to allow these tools to report their progress as well into it,
probably by just calling an admin function on the connection that they
already have).
Right, that wouldn't have been hard to include and would have been quite
nice.
Thanks,
Stephen
On 7/2/20 7:12 AM, David Steele wrote:
None of this really solves the problem of what happens when the user
dumps the backup_label into the data directory. With traditional backup
software that's pretty much going to be the only choice. Is telling them
not to do it and washing our hands of it really enough?
After some thought I believe there are several recommendations we could
give in the docs about what to do with the backup_label file.
This patch, though, looks to be a dead end. I think the best thing to do
is withdraw the patch and end the thread.
I'll start a new thread when I have a patch/design that implements the
new ideas we have discussed.
Regards,
--
-David
david@pgmasters.net