Improving Physical Backup/Restore within the Low Level API
Hi!
This email is a first pass at a user-visible design for how our backup and
restore process, as enabled by the Low Level API, can be modified to make
it more mistake-proof. In short, it requires pg_backup_start to further
expand upon what it means for the system to be in the midst of a backup,
pg_backup_stop to reverse those things, and the startup process to be
modified to deal with the server having crashed while the system is in that
backup state. Notes at the end extend the design to handle concurrent backups.
The core functional changes are:
1) pg_backup_start modifies a newly added "in backup" state flag in
pg_control to on.
2) pg_backup_stop modifies that flag back to off.
3) postmaster will refuse to start if that flag is on, unless one of:
a) crash.signal exists in the data directory
b) recovery.signal exists in the data directory
c) standby.signal exists in the data directory
4) Signal file processing causes the in-backup flag in pg_control to be set
to off.
The newly added crash.signal file is required to handle the case where the
server crashes after pg_backup_start and before pg_backup_stop. It
initiates a crash recovery of the instance just as is done today but with
the added change of flipping the flag to off when recovery is complete just
before going live.
The error message for the failed startup while in backup will tell the DBA
that one of the three signal files must exist.
When processing recovery.signal or standby.signal, the presence of the
backup_label and tablespace_map files is mandatory, and the system will
also fail to start should they be missing.
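The sketch below is a minimal Python model of the startup decision in rules 1 to 4 and the mandatory-file check. It illustrates the proposal and is not PostgreSQL code. `choose_startup_mode`, `StartupMode`, `StartupRefused` and `finish_recovery` are hypothetical names. pg_control is reduced to a boolean or dict, and the data directory is reduced to a set of file names. Two assumptions go beyond the text:
- When several signal files exist, standby.signal wins over recovery.signal, which wins over crash.signal.
- backup_label and tablespace_map are only demanded while the in-backup flag is on.

```python
from __future__ import annotations

from enum import Enum


class StartupMode(Enum):
    NORMAL = "normal startup"
    CRASH_RECOVERY = "crash recovery"
    ARCHIVE_RECOVERY = "archive recovery"
    STANDBY = "standby"


class StartupRefused(Exception):
    """Raised where the proposal says the postmaster refuses to start."""


def choose_startup_mode(in_backup: bool, files: set[str]) -> StartupMode:
    # files: names present in the root of the data directory (hypothetical model).
    # Assumed precedence: standby.signal > recovery.signal > crash.signal.
    if "standby.signal" in files:
        mode = StartupMode.STANDBY
    elif "recovery.signal" in files:
        mode = StartupMode.ARCHIVE_RECOVERY
    elif "crash.signal" in files:
        mode = StartupMode.CRASH_RECOVERY
    elif in_backup:
        # Rule 3: the flag is on and none of the three signal files exists.
        raise StartupRefused(
            "in-backup flag is set in pg_control: create crash.signal, "
            "recovery.signal or standby.signal in the data directory")
    else:
        return StartupMode.NORMAL

    # Restoring from a backup: backup_label and tablespace_map are mandatory.
    # Assumption: only enforced while the in-backup flag is on.
    if in_backup and mode in (StartupMode.STANDBY, StartupMode.ARCHIVE_RECOVERY):
        missing = sorted({"backup_label", "tablespace_map"} - files)
        if missing:
            raise StartupRefused("missing required file(s): " + ", ".join(missing))
    return mode


def finish_recovery(control: dict) -> None:
    # Rule 4: processing a signal file ends with the flag switched off,
    # just before the server goes live.
    control["in_backup"] = False
```

The decision is a pure function of (flag, files). That makes it easy to list every combination the documentation would need to explain.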
For non-functional changes I would also suggest doing the following:
pg_backup_start will create a "pg_backup_metadata" directory if there is
not already one, or will empty it if there is.
pg_backup_start will create a crash.signal file in that directory.
pg_backup_stop will create the following files within pg_backup_metadata
upon its completion:
    backup_label
    tablespace_map
    recovery.signal
    standby.signal
All of the instructions regarding what to place in those files should be
removed and instead the system should write them - no copy-paste.
The instructions would be modified to say: "copy the backup_label and
tablespace_map files to the root of the backup directory and the recovery
and standby signal files to the pg_backup_metadata directory in the backup."
Additionally, we document crash recovery by saying "move crash.signal from
pg_backup_metadata to the root of the data directory". We should explicitly
advise excluding or removing pg_backup_metadata/crash.signal from the
backup as well.
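The sketch below models the file-system side effects just described for a single backup, to make the split between server and user concrete. It is only a model of the proposal. The following come from the text:
- `pg_backup_metadata` and `crash.signal`;
- the idea that the server writes these files itself.

The function names and the `label`/`tablespace_map` arguments are hypothetical; the arguments stand in for whatever pg_backup_stop would produce.

```python
import shutil
from pathlib import Path

META = "pg_backup_metadata"


def on_backup_start(data_dir: Path) -> None:
    # Create pg_backup_metadata, or empty it if it already exists, then put
    # crash.signal inside it (not in the data directory root).
    meta = data_dir / META
    if meta.exists():
        shutil.rmtree(meta)
    meta.mkdir()
    (meta / "crash.signal").touch()


def on_backup_stop(data_dir: Path, label: str, tablespace_map: str) -> None:
    # The server writes every file itself; nothing is copy-pasted by the user.
    meta = data_dir / META
    (meta / "backup_label").write_text(label)
    (meta / "tablespace_map").write_text(tablespace_map)
    (meta / "recovery.signal").touch()
    (meta / "standby.signal").touch()


def finish_backup_copy(data_dir: Path, backup_dir: Path) -> None:
    # The documented manual step: backup_label and tablespace_map go to the
    # backup root, the signal files go to the backup's pg_backup_metadata,
    # and the crash.signal copied during the backup is removed.
    src, dst = data_dir / META, backup_dir / META
    dst.mkdir(parents=True, exist_ok=True)
    (dst / "crash.signal").unlink(missing_ok=True)
    for name in ("backup_label", "tablespace_map"):
        shutil.copy2(src / name, backup_dir / name)
    for name in ("recovery.signal", "standby.signal"):
        shutil.copy2(src / name, dst / name)
```

Leaving the signal files inside pg_backup_metadata means a freshly copied backup does nothing by itself. The person restoring it still chooses the kind of recovery by moving one file into the root.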
Extending the above to handle concurrent backups, for pg_control we'd still
use the on/off flag, but we'd need a shared in-memory session lock on
something so that only the last surviving backup process actually changes
it to off, while also dealing with sessions that terminate without issuing
pg_backup_stop and without the server itself crashing. (I'm unfamiliar with
how this is handled today, but I presume a mechanism already exists that
just needs to be extended.)
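The "last surviving process turns it off" rule is essentially a reference count over active backup sessions. The minimal Python model below makes several assumptions:
- A `threading.Lock` stands in for the shared-memory lock.
- A Python attribute stands in for the pg_control flag.
- `session_exit` models a session that ends without calling pg_backup_stop while the server stays up.

`InBackupTracker` and its methods are hypothetical. A real implementation would also have to write the flag durably to pg_control.

```python
import threading


class InBackupTracker:
    """Hypothetical model of the proposed pg_control "in backup" flag under concurrency."""

    def __init__(self) -> None:
        self._lock = threading.Lock()   # stands in for a shared-memory lock
        self._active = set()            # process ids currently in backup mode
        self.in_backup = False          # stands in for the pg_control flag

    def backup_start(self, pid: int) -> None:
        with self._lock:
            if pid in self._active:
                raise RuntimeError(f"process {pid} is already running a backup")
            self._active.add(pid)
            self.in_backup = True       # first starter turns it on; later ones leave it on

    def backup_stop(self, pid: int) -> None:
        with self._lock:
            if pid not in self._active:
                raise RuntimeError("backup is not in progress")
            self._release(pid)

    def session_exit(self, pid: int) -> None:
        # A session that ends without pg_backup_stop (server still running).
        with self._lock:
            if pid in self._active:
                self._release(pid)

    def _release(self, pid: int) -> None:
        # Caller must hold self._lock.
        self._active.remove(pid)
        if not self._active:
            self.in_backup = False      # only the last surviving backup turns it off
```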
For the non-functional stuff, pg_backup_start returns a process id, and
subdirectories under pg_backup_metadata are created named with that id. Add
a pg_backup_cleanup() function that executes while not in backup mode to
clean up those subdirectories. Any subdirectory in the backup other than
the one for the process id returned by pg_backup_start should be
excluded/removed.
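The sketch below models the per-process layout in Python to show what the user and the proposed cleanup function would each remove. The assumptions are:
- The process id is whatever the (hypothetical) pg_backup_start return value would be.
- pg_control's flag is passed in as a boolean.

`make_backup_subdir`, `prune_backup_copy` and `backup_cleanup` are illustrative names; only pg_backup_cleanup() itself is named in the proposal.

```python
import shutil
from pathlib import Path

META = "pg_backup_metadata"


def make_backup_subdir(data_dir: Path, pid: int) -> Path:
    # Each concurrent backup gets its own subdirectory named after its process id.
    sub = data_dir / META / str(pid)
    if sub.exists():
        shutil.rmtree(sub)
    sub.mkdir(parents=True)
    (sub / "crash.signal").touch()
    return sub


def prune_backup_copy(backup_dir: Path, pid: int) -> None:
    # In the copied backup, keep only this backup's subdirectory and drop its
    # crash.signal; every other session's subdirectory is removed.
    meta = backup_dir / META
    for entry in list(meta.iterdir()):
        if entry.name != str(pid):
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    (meta / str(pid) / "crash.signal").unlink(missing_ok=True)


def backup_cleanup(data_dir: Path, in_backup: bool) -> None:
    # Model of the proposed pg_backup_cleanup(): refuses to run during a
    # backup, otherwise removes every per-process subdirectory.
    if in_backup:
        raise RuntimeError("a backup is in progress")
    meta = data_dir / META
    if meta.exists():
        for entry in list(meta.iterdir()):
            if entry.is_dir():
                shutil.rmtree(entry)
```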
David J.
On Mon, 2023-10-16 at 09:26 -0700, David G. Johnston wrote:
> This email is a first pass at a user-visible design for how our backup and restore
> process, as enabled by the Low Level API, can be modified to make it more mistake-proof.
> [...]
> The newly added crash.signal file is required to handle the case where the server
> crashes after pg_backup_start and before pg_backup_stop. It initiates a crash recovery
> of the instance just as is done today but with the added change of flipping the flag
> to off when recovery is complete just before going live.
I see a couple of problems and/or things that need clarification with that idea:
- Two backups can run concurrently. How do you reconcile that with the "in backup"
flag and crash.signal?
- I guess crash.signal is created during pg_backup_start(). So that file will be
included in the backup. How do you handle that during recovery? Ignore it if
another signal file is present? And if the user forgets to create a signal file
for recovery, how do you prevent PostgreSQL from performing crash recovery?
Yours,
Laurenz Albe
On Mon, Oct 16, 2023 at 10:26 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I see a couple of problems and/or things that need clarification with that
> idea:
> - Two backups can run concurrently. How do you reconcile that with the "in backup"
> flag and crash.signal?
> - I guess crash.signal is created during pg_backup_start(). So that file will be
> included in the backup. How do you handle that during recovery? Ignore it if
> another signal file is present? And if the user forgets to create a signal file
> for recovery, how do you prevent PostgreSQL from performing crash recovery?
crash.signal is created in the pg_backup_metadata directory, not the root
directory. Should the server crash while any backup is in progress,
pg_control would be aware of that fact (in_backup=true would still be
there, instead of in_backup=false which only comes back after all backups
have completed) and the server will not restart without user intervention -
specifically, moving the crash.signal file from (one of) the
pg_backup_metadata subdirectories to the root directory. As there is
nothing special about the crash.signal files in the pg_backup_metadata
subdirectories, "touch crash.signal" could be used.
The backed up pg_control file will have in_backup=true (I haven't pondered
the torn reads dynamic of this - I'm supposing that placing a copy of
pg_control into the pg_backup_metadata directory might be part of solving
that problem).
David J.
On Mon, 2023-10-16 at 11:18 -0700, David G. Johnston wrote:
> crash.signal is created in the pg_backup_metadata directory, not the root directory.
> Should the server crash while any backup is in progress pg_control would be aware
> of that fact (in_backup=true would still be there, instead of in_backup=false which
> only comes back after all backups have completed) and the server will not restart
> without user intervention - specifically, moving the crash.signal file from (one of)
> the pg_backup_metadata subdirectories to the root directory. As there is nothing
> special about the crash.signal files in the pg_backup_metadata subdirectories
> "touch crash.signal" could be used.
I see - I missed the part with the pg_backup_metadata directory.
I think it won't meet with favor if there are cases that require manual intervention
for starting the server. That was the main argument for getting rid of the exclusive
backup API, which had a similar problem.
Also, how do you envision two concurrent backups with your setup?
Yours,
Laurenz Albe
On Mon, Oct 16, 2023 at 12:09 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I think it won't meet with favor if there are cases that require manual intervention
> for starting the server. That was the main argument for getting rid of the exclusive
> backup API, which had a similar problem.
In the rare case of a crash of the source database while one or more
backups are in progress. Restoring the backup requires manual
intervention with signal files today.
I get the desire for the live production server to not need intervention to
recover from a crash, but I can't help but feel that this requirement plus
the goal of making this as non-interventionist as possible during recovery
are incompatible. But I haven't given it a great amount of thought, as I
felt the limited scope and situation were an acceptable cost for keeping
the process straightforward (i.e., starting up a backup-mode instance
requires a signal file that dictates the kind of recovery to perform). We
can either make the live backup contents invalid until something happens
after pg_backup_stop ends that makes them valid, or we have to make the
current system being backed up invalid so long as it's in backup mode. The
latter seemed easier and doesn't require actions outside of our control.
> Also, how do you envision two concurrent backups with your setup?
I don't know if I understand the question. If ensuring that "in backup" is
turned on when the first backup starts and turned off when the last backup
ends isn't sufficient for concurrent usage, I don't know what else I need
to deal with. Apparently concurrent backups already work today, and aside
from the process ids for the metadata directories (i.e., the user needs to
remove all but their own process subdirectory from pg_backup_metadata) and
the state flag, I'm not seeing why they wouldn't continue to work as-is.
David J.
On Mon, Oct 16, 2023 at 12:36 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> In the rare case of a crash of the source database while one or more
> backups are in progress.
Or, even more simply, just document that if the process that executed
pg_backup_start (and would eventually execute pg_backup_stop) notices its
session die out from under it, it should add crash.signal to the data
directory (there can probably be a bit more intelligence involved in case
the session crash was isolated). A normal server shutdown should remove any
crash.signal files it sees (and ensure in_backup="false"...). A non-normal
shutdown is going to end up in crash recovery anyway, so having the signal
file there won't harm anything even if pg_control is showing
"in_backup=false".
In short, I probably don't know the details well enough to code the
solution, but this seems solvable for those users that need automatic reboot
and crash recovery during an incomplete backup. But no, by default, and
probably so far as pg_basebackup is concerned, a server crash during backup
results in requiring outside intervention in order to get the server to
restart. It specifically requires the creation of crash.signal, the specific
method being unimportant and its contents being fixed, whether empty or
otherwise.
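The crash.signal handling suggested in this message fits in a few lines. The Python sketch below is a hedged illustration, not a design the thread agreed on. Its assumptions are:
- The function names are hypothetical.
- pg_control is modelled as a dict.
- "any crash.signal files it sees" is taken to mean the data directory root plus the pg_backup_metadata tree.

```python
from pathlib import Path

META = "pg_backup_metadata"


def on_backup_session_lost(data_dir: Path) -> None:
    # Documented step for the backup tool: the session that ran pg_backup_start
    # died out from under it, so request crash recovery on the next start.
    (data_dir / "crash.signal").touch()


def on_normal_shutdown(data_dir: Path, control: dict) -> None:
    # A clean shutdown removes every crash.signal it can see (data directory
    # root and the pg_backup_metadata tree) and leaves the in-backup flag off.
    stale = [data_dir / "crash.signal"]
    meta = data_dir / META
    if meta.is_dir():
        stale.extend(meta.rglob("crash.signal"))
    for path in stale:
        path.unlink(missing_ok=True)
    control["in_backup"] = False

# An abnormal shutdown needs no handling here: the next start runs crash
# recovery anyway, so a leftover crash.signal is harmless.
```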
David J.
On Mon, Oct 16, 2023 at 5:21 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> But no, by default, and probably so far as pg_basebackup is concerned, a server
> crash during backup results in requiring outside intervention in order to get
> the server to restart.
Others may differ, but I think such a proposal is dead on arrival. As
Laurenz says, that's just reinventing one of the main problems with
exclusive backup mode.
The underlying issue here is that, fundamentally, there's no way for
postgres itself to tell the difference between the backup directory on
the primary and an exact copy of it on a standby. There has to be some
mechanism by which the user tells us whether this is the original
directory or a clone of it -- and that's what backup_label,
recovery.signal, and standby.signal are for. Your proposal rejiggers
the details of how we distinguish primary from standby, but it
doesn't, and can't, avoid the need for users to actually follow the
directions, and I don't see why they'd be any more likely to follow
the directions that this proposal would require than the directions
we're giving them now.
I wish I had a better idea here, because the status quo is definitely
not great. The only thought that really occurs to me is that we might
do better if PostgreSQL did more of the work itself and left fewer
steps to the user to perform. If you could click the "take a backup
here" button and the "restore a backup there" button and not think
about what was actually happening, you'd not have the opportunity to
mess up. But, as I understand it, the main motivation for the
continued existence of the low-level API is that the data directory
might be really big, and you might need to clone it using some kind of
special magic that your system has available instead of copying all
the bytes. And that makes it hard to move more of the responsibility
into PostgreSQL itself, because we don't know how that special magic
works.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 10/17/23 14:28, Robert Haas wrote:
> Others may differ, but I think such a proposal is dead on arrival. As
> Laurenz says, that's just reinventing one of the main problems with
> exclusive backup mode.
I concur -- this proposal resurrects the issues we had with exclusive
backups without solving the issues being debated elsewhere, e.g. torn
reads of pg_control or users removing backup_label when they should not.
Regards,
-David
On Tue, Oct 17, 2023 at 12:30 PM David Steele <david@pgmasters.net> wrote:
> I concur -- this proposal resurrects the issues we had with exclusive
> backups without solving the issues being debated elsewhere, e.g. torn
> reads of pg_control or users removing backup_label when they should not.
Thank you all for the feedback.
Admittedly, I don't understand the problem of torn reads well enough to
solve it here, but I figured that by moving the "must not remove" stuff out
of backup_label and into pg_control, the odds of it being removed from the
backup and the backup still booting basically go to zero. I do agree that
renaming backup_label to something like "recovery_stuff_do_not_delete.conf"
probably does that just as well without the downside.
Placing a copy of all relevant files into pg_backup_metadata seems like a
decent shield against accidents and a way to reliably self-document the
backup even if the behavioral changes are not desired. Though doing that
and handling multiple concurrent backups probably makes the cost too high
to move away from relying just on documentation.
I suppose I'd consider having to add one file to the data directory to be
an improvement over having to remove two of them - in terms of what it
takes to recover from system failure during a backup.
David J