Postgres restart in the middle of exclusive backup and the presence of backup_label file
Hi Hackers,
If Postgres restarts while an exclusive backup is in progress, it runs
recovery from the checkpoint identified by the label file instead of the
one in the control file. This can cause a long recovery, or recovery can
even fail because the WAL records corresponding to that checkpoint location
have been removed. I can write a layer in my control plane to remove the
backup_label file when I know the server is not being restored from a base
backup, but I don't see a reason why everyone has to repeat this step. Am I
missing something?
If there is no standby.signal or recovery.signal, what is the use case for
honoring the backup_label file? Even when they exist, for a long-running
recovery, should we honor the backup_label file when the majority of the WAL
has already been applied? It does slow down recovery on restart, right, as
it has to start all the way from the beginning?
Thanks,
Satya
On Wed, Nov 24, 2021 at 02:12:19PM -0800, SATYANARAYANA NARLAPURAM wrote:
If Postgres restarts while an exclusive backup is in progress, it runs
recovery from the checkpoint identified by the label file instead of the
one in the control file. This can cause a long recovery, or recovery can
even fail because the WAL records corresponding to that checkpoint location
have been removed. I can write a layer in my control plane to remove the
backup_label file when I know the server is not being restored from a base
backup, but I don't see a reason why everyone has to repeat this step. Am I
missing something?
This is a known issue with exclusive backups, which is a reason why
non-exclusive backups have been implemented. pg_basebackup does that,
and using "false" as the third argument of pg_start_backup() would
have the same effect. So I would recommend to switch to that.
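For reference, a minimal sketch in Python of the non-exclusive call sequence
mentioned above. It assumes a DB-API connection (for example one obtained
from psycopg2) and the pg_start_backup()/pg_stop_backup() signatures of
PostgreSQL 9.6 through 14. The function name non_exclusive_backup and the
copy_data_dir callback are hypothetical and only serve the illustration.

```python
def non_exclusive_backup(conn, label, copy_data_dir):
    """Run a non-exclusive low-level backup over one open connection.

    conn          -- any DB-API connection; it has to stay open until
                     pg_stop_backup() returns, or the server aborts the backup
    label         -- free-form backup label string
    copy_data_dir -- hypothetical callback that copies the data directory

    Returns the backup_label contents reported by the server. The caller is
    expected to store them inside the backup copy, not in the live PGDATA.
    """
    cur = conn.cursor()

    # Third argument false selects non-exclusive mode; the second argument
    # (fast checkpoint) is left at false here.
    cur.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    cur.fetchone()

    # No backup_label file exists in PGDATA while the copy is running, so a
    # crash and restart at this point goes through normal crash recovery.
    copy_data_dir()

    # In non-exclusive mode pg_stop_backup() returns the label file contents
    # (columns: lsn, labelfile, spcmapfile) instead of reading them from disk.
    cur.execute("SELECT labelfile FROM pg_stop_backup(false)")
    (labelfile,) = cur.fetchone()
    return labelfile
```

The connection requirement is the price of this mode: the session that
called pg_start_backup() must also call pg_stop_backup().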
--
Michael
Thanks Michael!
This is a known issue with exclusive backups, which is a reason why
non-exclusive backups have been implemented. pg_basebackup does that,
and using "false" as the third argument of pg_start_backup() would
have the same effect. So I would recommend to switch to that.
Is there a plan in place to remove the exclusive backup option from the
core in PG 15/16? If we are keeping it then why not make it better?
On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
Is there a plan in place to remove the exclusive backup option from the
core in PG 15/16?
This was discussed, but removing it could also harm users relying on
it. Perhaps it could be revisited, but I am not sure if this is worth
worrying about either.
If we are keeping it then why not make it better?
Well, non-exclusive backups are better by design in many aspects, so I
don't quite see the point in spending time on something that has more
limitations than what's already in place.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
If we are keeping it then why not make it better?
Well, non-exclusive backups are better by design in many aspects, so I
don't quite see the point in spending time on something that has more
limitations than what's already in place.
IMO the main reason for keeping it is backwards compatibility for users
who have a satisfactory backup arrangement using it. That same argument
implies that we shouldn't change how it works (at least, not very much).
regards, tom lane
On 11/26/21, 7:33 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
If we are keeping it then why not make it better?
Well, non-exclusive backups are better by design in many aspects, so I
don't quite see the point in spending time on something that has more
limitations than what's already in place.
IMO the main reason for keeping it is backwards compatibility for users
who have a satisfactory backup arrangement using it. That same argument
implies that we shouldn't change how it works (at least, not very much).
The issues with exclusive backups seem to be fairly well-documented
(e.g., c900c15), but perhaps there should also be a note in the
"Backup Control Functions" table [0].
Nathan
[0]: https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-ADMIN-BACKUP
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
If we are keeping it then why not make it better?
Well, non-exclusive backups are better by design in many aspects, so I
don't quite see the point in spending time on something that has more
limitations than what's already in place.
IMO the main reason for keeping it is backwards compatibility for users
who have a satisfactory backup arrangement using it. That same argument
implies that we shouldn't change how it works (at least, not very much).
There isn't a satisfactory backup approach using it specifically because
of this issue, hence why we should remove it to make it so users don't
run into this. Would also simplify the documentation around the low
level backup API, which would be a very good thing. Right now, making
improvements in that area is very challenging even if all you want to do
is improve the documentation around the non-exclusive API.
We dealt with this as best as one could in pgbackrest for PG versions
prior to when non-exclusive backup was added, which is to remove the
backup_label file as soon as possible and then put it back right before
you call pg_stop_backup() (since it'll complain otherwise). Not a
perfect answer though and a risk still exists there of a failed restart
happening. Of course, for versions which support non-exclusive backup,
we use that to avoid this issue.
We also extensively changed how restore works a couple releases ago and
while there was some noise about it, it certainly wasn't all that bad.
I don't find the reasons brought up to continue to support exclusive
backup to be at all compelling and the lack of huge issues with the new
way restore works to make it abundantly clear that we can, in fact,
remove exclusive backup in a major version change without the world
coming down.
Thanks,
Stephen
On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote:
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
If we are keeping it then why not make it better?
Well, non-exclusive backups are better by design in many aspects, so I
don't quite see the point in spending time on something that has more
limitations than what's already in place.
IMO the main reason for keeping it is backwards compatibility for users
who have a satisfactory backup arrangement using it. That same argument
implies that we shouldn't change how it works (at least, not very much).
There isn't a satisfactory backup approach using it specifically because
of this issue, hence why we should remove it to make it so users don't
run into this.
There is a satisfactory approach, as long as you are satisfied with
manually restarting the server if it crashed during a backup.
I don't find the reasons brought up to continue to support exclusive
backup to be at all compelling and the lack of huge issues with the new
way restore works to make it abundantly clear that we can, in fact,
remove exclusive backup in a major version change without the world
coming down.
I guess the lack of hue and cry was at least to a certain extent because
the exclusive backup API was deprecated, but not removed.
Yours,
Laurenz Albe
Greetings,
On Tue, Nov 30, 2021 at 11:47 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote:
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
If we are keeping it then why not make it better?
Well, non-exclusive backups are better by design in many aspects, so I
don't quite see the point in spending time on something that has more
limitations than what's already in place.
IMO the main reason for keeping it is backwards compatibility for users
who have a satisfactory backup arrangement using it. That same argument
implies that we shouldn't change how it works (at least, not very much).
There isn't a satisfactory backup approach using it specifically because
of this issue, hence why we should remove it to make it so users don't
run into this.
There is a satisfactory approach, as long as you are satisfied with
manually restarting the server if it crashed during a backup.
I disagree that that’s a satisfactory approach. It certainly wasn’t
intended or documented as part of the original feature and therefore to
call it satisfactory strikes me quite strongly as revisionist history.
I don't find the reasons brought up to continue to support exclusive
backup to be at all compelling and the lack of huge issues with the new
way restore works to make it abundantly clear that we can, in fact,
remove exclusive backup in a major version change without the world
coming down.
I guess the lack of hue and cry was at least to a certain extent because
the exclusive backup API was deprecated, but not removed.
These comments were in reference to the restore API, which was quite
changed (new special files that have to be touched, removing of
recovery.conf, options moved to postgresql.conf/.auto, etc). So, no.
Thanks,
Stephen
On 11/30/21, 9:51 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
I disagree that that’s a satisfactory approach. It certainly wasn’t
intended or documented as part of the original feature and therefore
to call it satisfactory strikes me quite strongly as revisionist
history.
It looks like the exclusive way has been marked deprecated in all
supported versions along with a note that it will eventually be
removed. If it's not going to be removed out of fear of breaking
backward compatibility, I think the documentation should be updated to
say that. However, unless there is something that is preventing users
from switching to the non-exclusive approach, I think it is reasonable
to begin thinking about removing it.
Nathan
"Bossart, Nathan" <bossartn@amazon.com> writes:
It looks like the exclusive way has been marked deprecated in all
supported versions along with a note that it will eventually be
removed. If it's not going to be removed out of fear of breaking
backward compatibility, I think the documentation should be updated to
say that. However, unless there is something that is preventing users
from switching to the non-exclusive approach, I think it is reasonable
to begin thinking about removing it.
If we're willing to outright remove it, I don't have any great objection.
My original two cents was that we shouldn't put effort into improving it;
but removing it isn't that.
regards, tom lane
On 11/30/21, 2:27 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
If we're willing to outright remove it, I don't have any great objection.
My original two cents was that we shouldn't put effort into improving it;
but removing it isn't that.
I might try to put a patch together for the January commitfest, given
there is enough support.
Nathan
On 11/30/21 17:26, Tom Lane wrote:
"Bossart, Nathan" <bossartn@amazon.com> writes:
It looks like the exclusive way has been marked deprecated in all
supported versions along with a note that it will eventually be
removed. If it's not going to be removed out of fear of breaking
backward compatibility, I think the documentation should be updated to
say that. However, unless there is something that is preventing users
from switching to the non-exclusive approach, I think it is reasonable
to begin thinking about removing it.
If we're willing to outright remove it, I don't have any great objection.
My original two cents was that we shouldn't put effort into improving it;
but removing it isn't that.
The main objections as I recall are that it is much harder for simple
backup scripts and commercial backup integrations to hold a connection
to postgres open and write the backup label separately into the backup.
As Stephen noted, working in this area is much harder (even in the docs)
due to the need to keep both methods working. When I removed exclusive
backup it didn't break any tests, other than one that needed to generate
a corrupt backup, so we have virtually no coverage for that method.
I did figure out how to keep the safe part of exclusive backup (not
having to maintain a connection) while removing the dangerous part
(writing backup_label into PGDATA), but it was a substantial amount of
work and I felt that it had little chance of being committed.
Attaching the thread [1] that I started with a patch to remove exclusive
backup for reference.
--
[1]: /messages/by-id/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.net
On 11/30/21, 2:58 PM, "David Steele" <david@pgmasters.net> wrote:
I did figure out how to keep the safe part of exclusive backup (not
having to maintain a connection) while removing the dangerous part
(writing backup_label into PGDATA), but it was a substantial amount of
work and I felt that it had little chance of being committed.
Do you think it's still worth trying to make it safe, or do you think
we should just remove exclusive mode completely?
Attaching the thread [1] that I started with a patch to remove exclusive
backup for reference.
Ah, good, some light reading. :)
Nathan
On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote:
The main objections as I recall are that it is much harder for simple backup
scripts and commercial backup integrations to hold a connection to postgres
open and write the backup label separately into the backup.
I don't quite understand why this argument would not hold even today,
even if I'd like to think that more people are using pg_basebackup.
I did figure out how to keep the safe part of exclusive backup (not having
to maintain a connection) while removing the dangerous part (writing
backup_label into PGDATA), but it was a substantial amount of work and I
felt that it had little chance of being committed.
Which was, I guess, done by storing the backup_label contents within a
file different than backup_label, still maintained in the main data
folder to ensure that it gets included in the backup?
--
Michael
On Tue, Nov 30, 2021 at 4:54 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote:
The main objections as I recall are that it is much harder for simple backup
scripts and commercial backup integrations to hold a connection to postgres
open and write the backup label separately into the backup.
I don't quite understand why this argument would not hold even today,
even if I'd like to think that more people are using pg_basebackup.
I did figure out how to keep the safe part of exclusive backup (not having
to maintain a connection) while removing the dangerous part (writing
backup_label into PGDATA), but it was a substantial amount of work and I
felt that it had little chance of being committed.
Which was, I guess, done by storing the backup_label contents within a
file different than backup_label, still maintained in the main data
folder to ensure that it gets included in the backup?
Non-exclusive backups have significant advantages over exclusive backups, but
I would like to add a few comments on the simplicity of exclusive backups:
1/ It is not uncommon nowadays to take a snapshot based backup. Exclusive
backup simplifies this story as the backup label file is part of the
snapshot. Otherwise, one needs to store it somewhere outside as snapshot
metadata and copy this file over during restore (after creating a disk from
the snapshot) to the data directory. Typical steps are: (a) call
pg_start_backup(), (b) take the disk snapshot, (c) call pg_stop_backup(),
(d) mark the snapshot as consistent and add some create-time metadata.
2/ Control plane code responsible for taking backups is simpler with
exclusive backups than non-exclusive as it doesn't maintain a connection to
the server, particularly when that orchestration is outside the machine the
Postgres server is running on.
IMHO, we should either remove the support for it or improve it but not
leave it hanging there.
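To make the extra bookkeeping in point 1/ concrete, here is a rough sketch
in Python of keeping the label contents as snapshot metadata and putting
them back on restore. Everything in it is an assumption made for the
illustration: the JSON metadata file, the function names
save_snapshot_metadata and restore_backup_label, and the idea that the label
text came from a non-exclusive pg_stop_backup() call.

```python
import json
from pathlib import Path


def save_snapshot_metadata(meta_path, snapshot_id, labelfile):
    """Store the label contents next to the snapshot, outside PGDATA."""
    meta = {"snapshot_id": snapshot_id, "backup_label": labelfile}
    Path(meta_path).write_text(json.dumps(meta))


def restore_backup_label(meta_path, data_dir):
    """After creating a disk from the snapshot, write backup_label into it.

    This has to happen before the restored server is started; otherwise
    recovery begins from pg_control and the restored cluster can end up
    corrupted.
    """
    meta = json.loads(Path(meta_path).read_text())
    target = Path(data_dir) / "backup_label"
    target.write_text(meta["backup_label"])
    return target
```

With an exclusive backup neither helper is needed, because the label file is
already inside the snapshot. That is the simplicity argument made above.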
On 11/30/21 19:54, Michael Paquier wrote:
On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote:
I did figure out how to keep the safe part of exclusive backup (not having
to maintain a connection) while removing the dangerous part (writing
backup_label into PGDATA), but it was a substantial amount of work and I
felt that it had little chance of being committed.
Which was, I guess, done by storing the backup_label contents within a
file different than backup_label, still maintained in the main data
folder to ensure that it gets included in the backup?
That, or emit it from pg_start_backup() so the user can write it
wherever they please. That would include writing it into PGDATA if they
really wanted to, but that would be on them and the default behavior
would be safe. The problem with this is if the user does not
rename/supply backup_label on restore then they will get corruption and
not know it.
Here's another idea. Since the contents of pg_wal are not supposed to be
copied, we could add a file there to indicate that the cluster should
remove backup_label on restart. Our instructions also say to remove the
contents of pg_wal on restore if they were originally copied, so
hopefully one of the two would happen. But, again, if they fail to
follow the directions it would lead to corruption.
Order would be important here. When starting the backup the proper order
would be to write pg_wal/backup_in_progress and then backup_label. When
stopping the backup they would be removed in the reverse order.
On a restart if both are present then delete both in the correct order
and start crash recovery using the info in pg_control. If only
backup_label is present then go into recovery using the info from
backup_label.
It's possible for pg_wal/backup_in_progress to be present by itself if
the server crashes after deleting backup_label but before deleting
pg_wal/backup_in_progress. In that case the server should simply remove
it on start and go into crash recovery using the info from pg_control.
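The restart rules above can be written down as a small decision function.
This is only a sketch of the proposal in Python, not server code: the
function name restart_action and its return shape are made up for the
illustration, and the "neither file present" case is an assumption (ordinary
startup from pg_control), since the proposal does not spell it out.

```python
BACKUP_LABEL = "backup_label"
IN_PROGRESS = "pg_wal/backup_in_progress"


def restart_action(has_backup_label, has_in_progress_marker):
    """Return (files_to_remove_in_order, recovery_start_source).

    Removal order mirrors stopping a backup: backup_label first, then the
    marker, so a crash between the two leaves only the marker behind.
    """
    if has_backup_label and has_in_progress_marker:
        # The server restarted in the middle of an exclusive backup.
        return ([BACKUP_LABEL, IN_PROGRESS], "pg_control")
    if has_backup_label:
        # The marker lives in pg_wal, which is not copied into a backup, so
        # a lone backup_label means this is a restored backup.
        return ([], "backup_label")
    if has_in_progress_marker:
        # Crash after deleting backup_label but before deleting the marker.
        return ([IN_PROGRESS], "pg_control")
    # Assumed: no backup was running, ordinary startup.
    return ([], "pg_control")
```

The write order at backup start is the reverse: the marker is created
first, then backup_label, so backup_label never exists on a running
primary without the marker.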
The advantage of this idea is that it does not change the current
instructions as far as I can see. If the user is already following them,
they'll be fine. If they are not, then they'll need to start doing so.
Of course, none of this affects users who are using non-exclusive
backup, which I do hope covers the majority by now.
Thoughts?
Regards,
-David
On 11/30/21 17:26, Tom Lane wrote:
"Bossart, Nathan" <bossartn@amazon.com> writes:
It looks like the exclusive way has been marked deprecated in all
supported versions along with a note that it will eventually be
removed. If it's not going to be removed out of fear of breaking
backward compatibility, I think the documentation should be updated to
say that. However, unless there is something that is preventing users
from switching to the non-exclusive approach, I think it is reasonable
to begin thinking about removing it.
If we're willing to outright remove it, I don't have any great objection.
My original two cents was that we shouldn't put effort into improving it;
but removing it isn't that.
+1
Let's just remove it. We already know it's a footgun, and there's been
plenty of warning.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 11/30/21 18:31, Bossart, Nathan wrote:
On 11/30/21, 2:58 PM, "David Steele" <david@pgmasters.net> wrote:
I did figure out how to keep the safe part of exclusive backup (not
having to maintain a connection) while removing the dangerous part
(writing backup_label into PGDATA), but it was a substantial amount of
work and I felt that it had little chance of being committed.
Do you think it's still worth trying to make it safe, or do you think
we should just remove exclusive mode completely?
My preference would be to remove it completely, but I haven't gotten a
lot of traction so far.
Attaching the thread [1] that I started with a patch to remove exclusive
backup for reference.
Ah, good, some light reading. :)
Sure, if you say so!
Regards,
-David
On 12/1/21, 8:27 AM, "David Steele" <david@pgmasters.net> wrote:
On 11/30/21 18:31, Bossart, Nathan wrote:
Do you think it's still worth trying to make it safe, or do you think
we should just remove exclusive mode completely?
My preference would be to remove it completely, but I haven't gotten a
lot of traction so far.
In this thread, I count 6 people who seem alright with removing it,
and 2 who might be opposed, although I don't think anyone has
explicitly stated they are against it.
Nathan