The danger of deleting backup_label
Hackers,
While reading through [1]/messages/by-id/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com I saw there were two instances where
backup_label was removed to achieve a "successful" restore. This might
work on trivial test restores but is an invitation to (silent) disaster
in a production environment where the checkpoint stored in backup_label
is almost certain to be earlier than the one stored in pg_control.
A while back I had an idea on how to prevent this so I decided to give
it a try. Basically, before writing pg_control to the backup I set
checkpoint to 0xFFFFFFFFFFFFFFFF.
Recovery worked perfectly as long as backup_label was present and failed
hard when it was not:
LOG: invalid primary checkpoint record
PANIC: could not locate a valid checkpoint record
It's not a very good message, but at least the foot gun has been
removed. We could use this as a special value to give a better message,
and maybe use something a bit more unique like 0xFFFFFFFFFADEFADE (or
whatever) as the value.
This is all easy enough for pg_basebackup to do, but will certainly be
non-trivial for most backup software to implement. In [2]/messages/by-id/CA+hUKGKiZJcfZSA5G5Rm8oC78SNOQ4c8az5Ku=4wMTjw1FZ40g@mail.gmail.com we have
discussed perhaps returning pg_control from pg_backup_stop() for the
backup software to save, or it could become part of the backup_label
(encoded as hex or base64, presumably). I prefer the latter as this
means less work for the backup software (except for the need to exclude
pg_control from the backup).
I don't have a patch for this yet because I did not test this idea using
pg_basebackup, but I'll be happy to work up a patch if there is interest.
I feel like we should do *something* here. If even advanced users are
making this mistake, then we should take it pretty seriously.
Regards,
-David
[1]: /messages/by-id/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com
/messages/by-id/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com
[2]: /messages/by-id/CA+hUKGKiZJcfZSA5G5Rm8oC78SNOQ4c8az5Ku=4wMTjw1FZ40g@mail.gmail.com
/messages/by-id/CA+hUKGKiZJcfZSA5G5Rm8oC78SNOQ4c8az5Ku=4wMTjw1FZ40g@mail.gmail.com
On Thu, Sep 28, 2023 at 05:14:22PM -0400, David Steele wrote:
While reading through [1] I saw there were two instances where backup_label
was removed to achieve a "successful" restore. This might work on trivial
test restores but is an invitation to (silent) disaster in a production
environment where the checkpoint stored in backup_label is almost certain to
be earlier than the one stored in pg_control.
Definitely successful.
Recovery worked perfectly as long as backup_label was present and failed
hard when it was not:LOG: invalid primary checkpoint record
PANIC: could not locate a valid checkpoint recordIt's not a very good message, but at least the foot gun has been removed. We
could use this as a special value to give a better message, and maybe use
something a bit more unique like 0xFFFFFFFFFADEFADE (or whatever) as the
value.
Why not just InvalidXLogRecPtr?
This is all easy enough for pg_basebackup to do, but will certainly be
non-trivial for most backup software to implement. In [2] we have discussed
perhaps returning pg_control from pg_backup_stop() for the backup software
to save, or it could become part of the backup_label (encoded as hex or
base64, presumably). I prefer the latter as this means less work for the
backup software (except for the need to exclude pg_control from the backup).I don't have a patch for this yet because I did not test this idea using
pg_basebackup, but I'll be happy to work up a patch if there is interest.
If the contents of the control file are tweaked before sending it
through a BASE_BACKUP, it would cover more than just pg_basebackup.
Switching the way the control file is sent with new contents in
sendFileWithContent() rather than sendFile() would be one way, for
instance..
--
Michael
On 9/28/23 22:30, Michael Paquier wrote:
On Thu, Sep 28, 2023 at 05:14:22PM -0400, David Steele wrote:
Recovery worked perfectly as long as backup_label was present and failed
hard when it was not:LOG: invalid primary checkpoint record
PANIC: could not locate a valid checkpoint recordIt's not a very good message, but at least the foot gun has been removed. We
could use this as a special value to give a better message, and maybe use
something a bit more unique like 0xFFFFFFFFFADEFADE (or whatever) as the
value.Why not just InvalidXLogRecPtr?
That fails because there is a check to make sure the checkpoint is valid
when pg_control is loaded. Another possibility is to use a special LSN
like we use for unlogged tables. Anything >= 24 and < WAL segment size
will work fine.
This is all easy enough for pg_basebackup to do, but will certainly be
non-trivial for most backup software to implement. In [2] we have discussed
perhaps returning pg_control from pg_backup_stop() for the backup software
to save, or it could become part of the backup_label (encoded as hex or
base64, presumably). I prefer the latter as this means less work for the
backup software (except for the need to exclude pg_control from the backup).I don't have a patch for this yet because I did not test this idea using
pg_basebackup, but I'll be happy to work up a patch if there is interest.If the contents of the control file are tweaked before sending it
through a BASE_BACKUP, it would cover more than just pg_basebackup.
Switching the way the control file is sent with new contents in
sendFileWithContent() rather than sendFile() would be one way, for
instance..
Good point, and that makes this even more compelling. If we include
pg_control into backup_label then there is no need to modify pg_control
(as above) -- we can just exclude it from the backup entirely. That will
certainly require some rejigging in recovery but seems worth it for
backup solutions that can't easily modify pg_control. The C-based
solutions can do this pretty easily but it is a pretty high bar for
anyone else.
Regards,
-David
Hi David,
Even though I spent a whole bunch of time trying to figure out how to
make concurrent reads of the control file sufficiently atomic for
backups (pg_basebackup and low level filesystem tools), and we
explored multiple avenues with varying results, and finally came up
with something that basically works pretty well... actually I just
hate all of that stuff, and I'm hoping to be able to just withdraw
https://commitfest.postgresql.org/45/4025/ and chalk it all up to
discovery/education and call *this* thread the real outcome of that
preliminary work.
So I'm +1 on the idea of putting a control file image into the backup
label and I'm happy that you're looking into it.
We could just leave the control file out of the base backup
completely, as you said, removing a whole foot-gun. People following
the 'low level' instructions will still get a copy of the control file
from the filesystem, and I don't see any reliable way to poison that
file without also making it so that a crash wouldn't also be prevented
from recovering. I have wondered about putting extra "fingerprint"
information into the control file such as the file's path and inode
number etc, so that you can try to distinguish between a control file
written by PostgreSQL, and a control file copied somewhere else, but
that all feels too fragile, and at the end of the day, people
following the low level backup instructions had better follow the low
level backup instructions (hopefully via the intermediary of an
excellent external backup tool).
As Stephen mentioned[1]/messages/by-id/ZL69NXjCNG+WHCqG@tamriel.snowman.net, we could perhaps also complain if both backup
label and control file exist, and then hint that the user should
remove the *control file* (not the backup label!). I had originally
suggested we would just overwrite the control file, but by explicitly
complaining about it we would also bring the matter to tool/script
authors' attention, ie that they shouldn't be backing that file up, or
should be removing it in a later step if they copy everything. He
also mentions that there doesn't seem to be anything stopping us from
back-patching changes to the backup label contents if we go this way.
I don't have a strong opinion on that and we could leave the question
for later.
On Tue, Oct 10, 2023 at 05:06:45PM -0400, David Steele wrote:
That fails because there is a check to make sure the checkpoint is valid
when pg_control is loaded. Another possibility is to use a special LSN like
we use for unlogged tables. Anything >= 24 and < WAL segment size will work
fine.
Do we have any reason to do that in the presence of a backup_label
file anyway? We'll know the LSN of the checkpoint based on what the
base backup wants us to use. Using a fake-still-rather-valid value
for the LSN in the control file to bypass this check does not address
the issue you are pointing at: it is just avoiding this check. A
reasonable answer would be, IMO, to just not do this check at all
based on the control file in this case.
If the contents of the control file are tweaked before sending it
through a BASE_BACKUP, it would cover more than just pg_basebackup.
Switching the way the control file is sent with new contents in
sendFileWithContent() rather than sendFile() would be one way, for
instance..Good point, and that makes this even more compelling. If we include
pg_control into backup_label then there is no need to modify pg_control (as
above) -- we can just exclude it from the backup entirely. That will
certainly require some rejigging in recovery but seems worth it for backup
solutions that can't easily modify pg_control. The C-based solutions can do
this pretty easily but it is a pretty high bar for anyone else.
I have little idea about that, but I guess that you are referring to
backrest here.
--
Michael
Hi Thomas,
On 10/11/23 18:10, Thomas Munro wrote:
Even though I spent a whole bunch of time trying to figure out how to
make concurrent reads of the control file sufficiently atomic for
backups (pg_basebackup and low level filesystem tools), and we
explored multiple avenues with varying results, and finally came up
with something that basically works pretty well... actually I just
hate all of that stuff, and I'm hoping to be able to just withdraw
https://commitfest.postgresql.org/45/4025/ and chalk it all up to
discovery/education and call *this* thread the real outcome of that
preliminary work.So I'm +1 on the idea of putting a control file image into the backup
label and I'm happy that you're looking into it.
Well, hopefully this thread will *at least* be the solution going
forward. Not sure about a back patch yet, see below...
We could just leave the control file out of the base backup
completely, as you said, removing a whole foot-gun.
That's the plan.
People following
the 'low level' instructions will still get a copy of the control file
from the filesystem, and I don't see any reliable way to poison that
file without also making it so that a crash wouldn't also be prevented
from recovering. I have wondered about putting extra "fingerprint"
information into the control file such as the file's path and inode
number etc, so that you can try to distinguish between a control file
written by PostgreSQL, and a control file copied somewhere else, but
that all feels too fragile, and at the end of the day, people
following the low level backup instructions had better follow the low
level backup instructions (hopefully via the intermediary of an
excellent external backup tool).
Not sure about the inode idea, because it seems OK for people to move a
cluster elsewhere under a variety of circumstances. I do have an idea
about how to mark a cluster in "recovery to consistency" mode, but not
quite sure how to atomically turn that off at the end of recovery to
consistency. I have some ideas I'll work on though.
As Stephen mentioned[1], we could perhaps also complain if both backup
label and control file exist, and then hint that the user should
remove the *control file* (not the backup label!). I had originally
suggested we would just overwrite the control file, but by explicitly
complaining about it we would also bring the matter to tool/script
authors' attention, ie that they shouldn't be backing that file up, or
should be removing it in a later step if they copy everything. He
also mentions that there doesn't seem to be anything stopping us from
back-patching changes to the backup label contents if we go this way.
I don't have a strong opinion on that and we could leave the question
for later.
I'm worried about the possibility of back patching this unless the
solution comes out to be simpler than I think and that rarely comes to
pass. Surely throwing errors on something that is currently valid (i.e.
backup_label and pg_control both present).
But perhaps there is a simpler, acceptable solution we could back patch
(transparent to all parties except Postgres) and then a more advanced
solution we could go forward with.
I guess I had better get busy on this.
Regards,
-David
[1]: /messages/by-id/ZL69NXjCNG+WHCqG@tamriel.snowman.net
/messages/by-id/ZL69NXjCNG+WHCqG@tamriel.snowman.net
On 10/11/23 18:22, Michael Paquier wrote:
On Tue, Oct 10, 2023 at 05:06:45PM -0400, David Steele wrote:
That fails because there is a check to make sure the checkpoint is valid
when pg_control is loaded. Another possibility is to use a special LSN like
we use for unlogged tables. Anything >= 24 and < WAL segment size will work
fine.Do we have any reason to do that in the presence of a backup_label
file anyway? We'll know the LSN of the checkpoint based on what the
base backup wants us to use. Using a fake-still-rather-valid value
for the LSN in the control file to bypass this check does not address
the issue you are pointing at: it is just avoiding this check. A
reasonable answer would be, IMO, to just not do this check at all
based on the control file in this case.
Yeah, that's fair. And it looks like we are leaning towards excluding
pg_control from the backup entirely, so the point is probably moot.
If the contents of the control file are tweaked before sending it
through a BASE_BACKUP, it would cover more than just pg_basebackup.
Switching the way the control file is sent with new contents in
sendFileWithContent() rather than sendFile() would be one way, for
instance..Good point, and that makes this even more compelling. If we include
pg_control into backup_label then there is no need to modify pg_control (as
above) -- we can just exclude it from the backup entirely. That will
certainly require some rejigging in recovery but seems worth it for backup
solutions that can't easily modify pg_control. The C-based solutions can do
this pretty easily but it is a pretty high bar for anyone else.I have little idea about that, but I guess that you are referring to
backrest here.
Sure, pgBackRest, but there are other backup solutions written in C. My
point is really that we should not depend on backup solutions being able
to manipulate C structs. It looks the the solution we are working
towards would not require that.
Regards,
-David
On 10/12/23 10:19, David Steele wrote:
On 10/11/23 18:10, Thomas Munro wrote:
As Stephen mentioned[1], we could perhaps also complain if both backup
label and control file exist, and then hint that the user should
remove the *control file* (not the backup label!). I had originally
suggested we would just overwrite the control file, but by explicitly
complaining about it we would also bring the matter to tool/script
authors' attention, ie that they shouldn't be backing that file up, or
should be removing it in a later step if they copy everything. He
also mentions that there doesn't seem to be anything stopping us from
back-patching changes to the backup label contents if we go this way.
I don't have a strong opinion on that and we could leave the question
for later.I'm worried about the possibility of back patching this unless the
solution comes out to be simpler than I think and that rarely comes to
pass. Surely throwing errors on something that is currently valid (i.e.
backup_label and pg_control both present).But perhaps there is a simpler, acceptable solution we could back patch
(transparent to all parties except Postgres) and then a more advanced
solution we could go forward with.I guess I had better get busy on this.
Attached is a very POC attempt at something along these lines that could
be back patched. I stopped when I realized excluding pg_control from the
backup is a necessity to make this work (else we still could end up with
a torn copy of pg_control even if there is a good copy elsewhere). To
enumerate the back patch issues as I see them:
1) We still need a copy of pg_control in order to get Postgres to start
and that copy might be torn (pretty much where we are now). We can
handle this easily in pg_basebackup but most backup software will not be
able to do so. In my opinion teaching Postgres to start without
pg_control is too big a change to possibly back patch.
2) We need to deal with backups made with a prior *minor* version that
did not include pg_control in the backup_label. Doable, but...
3) We need to move backup_label to the end of the main pg_basebackup
tar, which could cause unforeseen breakage for tools.
4) This patch is less efficient for backups taken from standby because
it will overwrite pg_control on restart and force replay back to the
original MinRecoveryPoint.
5) We still need a solution for exclusive backup (still valid in PG <=
14). Doable but it would still have the weakness of 1.
All of this is fixable in HEAD, but seems incredibly dangerous to back
patch. Even so, I have attached the patch in case somebody sees an
opportunity that I do not.
Regards,
-David
Attachments:
v01-pgcontrol-in-backup-label.patchtext/plain; charset=UTF-8; name=v01-pgcontrol-in-backup-label.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c0e4ca50899..e4a4478af75 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -74,6 +74,7 @@
#include "pg_trace.h"
#include "pgstat.h"
#include "port/atomics.h"
+#include "port/pg_crc32c.h"
#include "port/pg_iovec.h"
#include "postmaster/bgwriter.h"
#include "postmaster/startup.h"
@@ -8691,6 +8692,21 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
"and should not be used. "
"Try taking another online backup.")));
+ /*
+ * Create a copy of control data to be stored in the backup label.
+ * Recalculate the CRC so recovery can validate the contents but do not
+ * bother with the timestamp since that will be applied before it is
+ * written by recovery.
+ */
+ LWLockAcquire(ControlFileLock, LW_SHARED);
+ state->controlFile = *ControlFile;
+ LWLockRelease(ControlFileLock);
+
+ INIT_CRC32C(state->controlFile.crc);
+ COMP_CRC32C(state->controlFile.crc, (char *)&state->controlFile,
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(state->controlFile.crc);
+
/*
* During recovery, we don't write an end-of-backup record. We assume that
* pg_control was backed up last and its minimum recovery point can be
@@ -8741,11 +8757,8 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
"Enable full_page_writes and run CHECKPOINT on the primary, "
"and then try an online backup again.")));
-
- LWLockAcquire(ControlFileLock, LW_SHARED);
- state->stoppoint = ControlFile->minRecoveryPoint;
- state->stoptli = ControlFile->minRecoveryPointTLI;
- LWLockRelease(ControlFileLock);
+ state->stoppoint = state->controlFile.minRecoveryPoint;
+ state->stoptli = state->controlFile.minRecoveryPointTLI;
}
else
{
diff --git a/src/backend/access/transam/xlogbackup.c b/src/backend/access/transam/xlogbackup.c
index 21d68133ae1..79559d78acb 100644
--- a/src/backend/access/transam/xlogbackup.c
+++ b/src/backend/access/transam/xlogbackup.c
@@ -16,6 +16,7 @@
#include "access/xlog.h"
#include "access/xlog_internal.h"
#include "access/xlogbackup.h"
+#include "common/base64.h"
/*
* Build contents for backup_label or backup history file.
@@ -76,6 +77,17 @@ build_backup_content(BackupState *state, bool ishistoryfile)
appendStringInfo(result, "STOP TIME: %s\n", stopstrfbuf);
appendStringInfo(result, "STOP TIMELINE: %u\n", state->stoptli);
}
+ /* Include a copy of control data */
+ else
+ {
+ char controlbuf[((sizeof(ControlFileData) + 2) / 3 * 4) + 1];
+
+ pg_b64_encode((char *)&state->controlFile, sizeof(ControlFileData),
+ controlbuf, sizeof(controlbuf) - 1);
+ controlbuf[sizeof(controlbuf) - 1] = '\0';
+
+ appendStringInfo(result, "CONTROL DATA: %s\n", controlbuf);
+ }
data = result->data;
pfree(result);
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index becc2bda62e..1abd9494bf0 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
#include "backup/basebackup.h"
#include "catalog/pg_control.h"
#include "commands/tablespace.h"
+#include "common/base64.h"
#include "common/file_utils.h"
#include "miscadmin.h"
#include "pgstat.h"
@@ -390,7 +391,8 @@ static void readRecoverySignalFile(void);
static void validateRecoveryParameters(void);
static bool read_backup_label(XLogRecPtr *checkPointLoc,
TimeLineID *backupLabelTLI,
- bool *backupEndRequired, bool *backupFromStandby);
+ bool *backupEndRequired, bool *backupFromStandby,
+ ControlFileData *ControlFile);
static bool read_tablespace_map(List **tablespaces);
static void xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI);
@@ -610,7 +612,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
primary_image_masked = (char *) palloc(BLCKSZ);
if (read_backup_label(&CheckPointLoc, &CheckPointTLI, &backupEndRequired,
- &backupFromStandby))
+ &backupFromStandby, ControlFile))
{
List *tablespaces = NIL;
@@ -1167,7 +1169,8 @@ validateRecoveryParameters(void)
*/
static bool
read_backup_label(XLogRecPtr *checkPointLoc, TimeLineID *backupLabelTLI,
- bool *backupEndRequired, bool *backupFromStandby)
+ bool *backupEndRequired, bool *backupFromStandby,
+ ControlFileData *ControlFile)
{
char startxlogfilename[MAXFNAMELEN];
TimeLineID tli_from_walseg,
@@ -1180,6 +1183,7 @@ read_backup_label(XLogRecPtr *checkPointLoc, TimeLineID *backupLabelTLI,
char backuptime[128];
uint32 hi,
lo;
+ char controlB64Buf[684 + 1];
/* suppress possible uninitialized-variable warnings */
*checkPointLoc = InvalidXLogRecPtr;
@@ -1285,6 +1289,51 @@ read_backup_label(XLogRecPtr *checkPointLoc, TimeLineID *backupLabelTLI,
tli_from_file, BACKUP_LABEL_FILE)));
}
+ /*
+ * Read control data to be used to create pg_control. Control data may not
+ * exist if the backup was made with an older version, in which case the
+ * control file read from disk will be used.
+ */
+ if (fscanf(lfp, "CONTROL DATA: %684s\n", controlB64Buf) == 1)
+ {
+ ControlFileData controlBuf;
+ pg_crc32c crc;
+
+ /* Check that the base64 data is the correct length */
+ if (strlen(controlB64Buf) != (sizeof(ControlFileData) + 2) / 3 * 4)
+ ereport(FATAL,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE),
+ errdetail("Control data expected %zu base64 characters.",
+ (sizeof(ControlFileData) + 2) / 3 * 4)));
+
+ /* Decode control file */
+ if (pg_b64_decode(controlB64Buf, strlen(controlB64Buf), (char *)&controlBuf,
+ sizeof(ControlFileData)) == -1)
+ ereport(FATAL,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE),
+ errdetail("Control data contains invalid base64 encoding.")));
+
+ /* CRC check on stored control file */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc, (char *)&controlBuf, offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ if (!EQ_CRC32C(crc, controlBuf.crc))
+ ereport(FATAL,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE),
+ errdetail("Control data contains invalid CRC.")));
+
+ ereport(DEBUG1,
+ (errmsg_internal("backup control data in file \"%s\"",
+ BACKUP_LABEL_FILE)));
+
+ /* Copy control data */
+ memcpy(ControlFile, &controlBuf, sizeof(ControlFileData));
+ }
+
if (ferror(lfp) || FreeFile(lfp))
ereport(FATAL,
(errcode_for_file_access(),
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 7d025bcf382..f49dabf63ef 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -330,13 +330,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
bbsink_begin_archive(sink, "base.tar");
- /* In the main tar, include the backup_label first... */
- backup_label = build_backup_content(backup_state, false);
- sendFileWithContent(sink, BACKUP_LABEL_FILE,
- backup_label, &manifest);
- pfree(backup_label);
-
- /* Then the tablespace_map file, if required... */
+ /* Send the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
sendFileWithContent(sink, TABLESPACE_MAP,
@@ -348,7 +342,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
sendDir(sink, ".", 1, false, state.tablespaces,
sendtblspclinks, &manifest, NULL);
- /* ... and pg_control after everything else. */
+ /* ... and pg_control after everything but backup_label */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
ereport(ERROR,
(errcode_for_file_access(),
@@ -356,6 +350,16 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
XLOG_CONTROL_FILE)));
sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, &manifest, NULL);
+
+ /* End the backup before sending backup_label */
+ basebackup_progress_wait_wal_archive(&state);
+ do_pg_backup_stop(backup_state, !opt->nowait);
+
+ /* Last in the main tar, include backup_label */
+ backup_label = build_backup_content(backup_state, false);
+ sendFileWithContent(sink, BACKUP_LABEL_FILE,
+ backup_label, &manifest);
+ pfree(backup_label);
}
else
{
@@ -389,9 +393,6 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
}
}
- basebackup_progress_wait_wal_archive(&state);
- do_pg_backup_stop(backup_state, !opt->nowait);
-
endptr = backup_state->stoppoint;
endtli = backup_state->stoptli;
diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h
index 1611358137b..bd53405670e 100644
--- a/src/include/access/xlogbackup.h
+++ b/src/include/access/xlogbackup.h
@@ -15,6 +15,7 @@
#define XLOG_BACKUP_H
#include "access/xlogdefs.h"
+#include "catalog/pg_control.h"
#include "pgtime.h"
/* Structure to hold backup state. */
@@ -33,6 +34,7 @@ typedef struct BackupState
XLogRecPtr stoppoint; /* backup stop WAL location */
TimeLineID stoptli; /* backup stop TLI */
pg_time_t stoptime; /* backup stop time */
+ ControlFileData controlFile; /* Copy of control data */
} BackupState;
extern char *build_backup_content(BackupState *state,
On Sat, Oct 14, 2023 at 11:33 AM David Steele <david@pgmasters.net> wrote:
All of this is fixable in HEAD, but seems incredibly dangerous to back
patch. Even so, I have attached the patch in case somebody sees an
opportunity that I do not.
I really do not think we should be even thinking about back-patching
something like this. It's clearly not a bug fix, although I'm sure
that someone can try to characterize it that way, if they want to make
the well-worn argument that any behavior they don't like is a bug. But
that's a pretty lame argument. Usage errors on the part of users are
not bugs, even if we've coded the software in such a way as to make
those errors more likely.
I think what we ought to be talking about is whether a change like
this is a good idea even in master. I don't think it's a terrible
idea, but I'm also not sure that it's a good idea. The problem is that
if you're doing the right thing with your backup_label, then this is
unnecessary, and if you're doing the wrong thing, then why should you
do the right thing about this? I mean, admittedly you can't just
ignore a fatal error, but I think people will just run pg_resetwal,
which is even worse than starting from the wrong checkpoint. I feel
like in cases where a customer I'm working with has a bad backup,
their entire focus is on doing something to that backup to get a
running system back, whatever it takes. It's already too late at that
point to fix the backup procedure - they only have the backups they
have. You could hope people would do test restores before disaster
strikes, but people who are that prepared are probably running a real
backup tool and will never have this problem in the first place.
Perhaps that's all too pessimistic. I don't know. Certainly, other
people can have experiences that are different than mine. But I feel
like I struggle to think of a case where this would have prevented a
bad outcome, and that makes me wonder whether it's really a good idea
to complicate the system.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 10/16/23 10:55, Robert Haas wrote:
On Sat, Oct 14, 2023 at 11:33 AM David Steele <david@pgmasters.net> wrote:
All of this is fixable in HEAD, but seems incredibly dangerous to back
patch. Even so, I have attached the patch in case somebody sees an
opportunity that I do not.I really do not think we should be even thinking about back-patching
something like this. It's clearly not a bug fix, although I'm sure
that someone can try to characterize it that way, if they want to make
the well-worn argument that any behavior they don't like is a bug. But
that's a pretty lame argument. Usage errors on the part of users are
not bugs, even if we've coded the software in such a way as to make
those errors more likely.
Hmmm, the reason to back patch this is that it would fix [1]/messages/by-id/20221123014224.xisi44byq3cf5psi@awork3.anarazel.de, which sure
looks like a problem to me even if it is not a "bug". We can certainly
require backup software to retry pg_control until the checksum is valid
but that seems like a pretty big ask, even considering how complicated
backup is.
I investigated this as a solution to [1]/messages/by-id/20221123014224.xisi44byq3cf5psi@awork3.anarazel.de because it seemed like a better
solution that what we have in [1]/messages/by-id/20221123014224.xisi44byq3cf5psi@awork3.anarazel.de. But once I got into the weeds it was
obvious that this wasn't going to be something we could back patch.
I think what we ought to be talking about is whether a change like
this is a good idea even in master. I don't think it's a terrible
idea, but I'm also not sure that it's a good idea. The problem is that
if you're doing the right thing with your backup_label, then this is
unnecessary, and if you're doing the wrong thing, then why should you
do the right thing about this?
First and foremost, this solves the problem of pg_control being torn
when read by the backup software. It can't be torn if it is not there.
There are also other advantages -- we can massage pg_control before
writing it back out. This already happens, but before that happens there
is a copy of the (prior) running pg_control there that can mess up the
process.
I mean, admittedly you can't just
ignore a fatal error, but I think people will just run pg_resetwal,
which is even worse than starting from the wrong checkpoint.
If you start from the last checkpoint (which is what will generally be
stored in pg_control) then the effect is pretty similar.
I feel
like in cases where a customer I'm working with has a bad backup,
their entire focus is on doing something to that backup to get a
running system back, whatever it takes. It's already too late at that
point to fix the backup procedure - they only have the backups they
have. You could hope people would do test restores before disaster
strikes, but people who are that prepared are probably running a real
backup tool and will never have this problem in the first place.
Right now the user can remove backup_label and get a "successful"
restore and not realize that they have just corrupted their cluster.
This is independent of the backup/restore tool doing all the right things.
My goal here is to narrow the options to try and make it so there is
*one* valid procedure that will work. For this patch the idea is that
they *must* start Postgres to get a valid pg_control from the
backup_label. Any other action leads to a fatal error.
Note that the current patch is very WIP and does not actually do
everything I'm talking about here. I was just trying to see if it could
be used to solve the problem in [1]/messages/by-id/20221123014224.xisi44byq3cf5psi@awork3.anarazel.de. It can't.
Perhaps that's all too pessimistic. I don't know. Certainly, other
people can have experiences that are different than mine. But I feel
like I struggle to think of a case where this would have prevented a
bad outcome, and that makes me wonder whether it's really a good idea
to complicate the system.
I'm specifically addressing cases like those that came up (twice!) in
[2]: [1] /messages/by-id/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com
hackers can make this mistake then we should do better.
Regards,
-David
[1]: /messages/by-id/20221123014224.xisi44byq3cf5psi@awork3.anarazel.de
/messages/by-id/20221123014224.xisi44byq3cf5psi@awork3.anarazel.de
[2]: [1] /messages/by-id/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com
/messages/by-id/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com
On Mon, Oct 16, 2023 at 11:45 AM David Steele <david@pgmasters.net> wrote:
Hmmm, the reason to back patch this is that it would fix [1], which sure
looks like a problem to me even if it is not a "bug". We can certainly
require backup software to retry pg_control until the checksum is valid
but that seems like a pretty big ask, even considering how complicated
backup is.
That seems like a problem with pg_control not being written atomically
when the standby server is updating it during recovery, rather than a
problem with backup_label not being used at the start of recovery.
Unless I'm confused.
If you start from the last checkpoint (which is what will generally be
stored in pg_control) then the effect is pretty similar.
If the backup didn't span a checkpoint, then restoring from the one in
pg_control actually works fine. Not that I'm encouraging that. But if
you replay WAL from the control file, you at least get the last
checkpoint's worth of WAL; if you use pg_resetwal, you get nothing.
I don't really want to get hung up on this though. My main point here
is that I have trouble believing that an error after you've already
screwed up your backup helps much. I think what we need is to make it
less likely that you will screw up your backup in the first place.
Right now the user can remove backup_label and get a "successful"
restore and not realize that they have just corrupted their cluster.
This is independent of the backup/restore tool doing all the right things.
I don't think it's independent of that at all.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 10/16/23 12:25, Robert Haas wrote:
On Mon, Oct 16, 2023 at 11:45 AM David Steele <david@pgmasters.net> wrote:
Hmmm, the reason to back patch this is that it would fix [1], which sure
looks like a problem to me even if it is not a "bug". We can certainly
require backup software to retry pg_control until the checksum is valid
but that seems like a pretty big ask, even considering how complicated
backup is.That seems like a problem with pg_control not being written atomically
when the standby server is updating it during recovery, rather than a
problem with backup_label not being used at the start of recovery.
Unless I'm confused.
You are not confused. But the fact that it practically can be read as
torn at all on the standby is a function of how rapidly it is being
written to update min recovery point. Writing it atomically via a temp
file may affect performance in this area, but only during the backup,
which may cause recovery to run more slowly during a backup.
I don't have proof of this because I don't have any hosts large enough
to test the theory.
Right now the user can remove backup_label and get a "successful"
restore and not realize that they have just corrupted their cluster.
This is independent of the backup/restore tool doing all the right things.I don't think it's independent of that at all.
I think it is. Imagine the user does backup/recovery using their
favorite tool and everything is fine. But due to some misconfiguration
or a problem in the WAL archive, they get this error:
FATAL: could not locate required checkpoint record
2023-10-16 16:42:50.132 UTC HINT: If you are restoring from a backup,
touch "/home/dev/test/data/recovery.signal" or
"/home/dev/test/data/standby.signal" and add required recovery options.
If you are not restoring from a backup, try removing the file
"/home/dev/test/data/backup_label".
Be careful: removing "/home/dev/test/data/backup_label" will
result in a corrupt cluster if restoring from a backup.
I did this by setting restore_command=false, but it could just as easily
be the actual restore command that returns false due to a variety of
reasons. The user has no idea what "could not locate required checkpoint
record" means and if there is enough automation they may not even
realize they just restored from a backup.
After some agonizing (we hope) they decide to delete backup_label and,
wow, it just works! So now they merrily go on their way with a corrupted
cluster. They also remember for the next time that deleting backup_label
is definitely a good procedure.
The idea behind this patch is that deleting backup_label would produce a
hard error because pg_control would be missing as well (if the backup
software did its job). If both pg_control and backup_label are present
(but pg_control has not been loaded with the contents of backup_label,
i.e. it is the running copy from the backup cluster) we can also error.
It's not perfect, because they could backup (or restore) pg_control but
not backup_label, but we are narrowing the cases where something can go
wrong and they have silent corruption, especially if their
backup/restore software follows the directions.
Regards,
-David
On Mon, Oct 16, 2023 at 1:00 PM David Steele <david@pgmasters.net> wrote:
After some agonizing (we hope) they decide to delete backup_label and,
wow, it just works! So now they merrily go on their way with a corrupted
cluster. They also remember for the next time that deleting backup_label
is definitely a good procedure.The idea behind this patch is that deleting backup_label would produce a
hard error because pg_control would be missing as well (if the backup
software did its job). If both pg_control and backup_label are present
(but pg_control has not been loaded with the contents of backup_label,
i.e. it is the running copy from the backup cluster) we can also error.
I mean, I think we're just going in circles, here. I did and do
understand, but I didn't and don't agree. You're hypothesizing a user
who is willing to do ONE thing that they shouldn't do during backup
restoration (namely, remove backup_label) but who won't be willing to
do a SECOND thing that they shouldn't do during backup restoration
(namely, run pg_resetwal). In my experience, users who are willing to
corrupt their database don't typically limit themselves to one bad
decision, and therefore I doubt that this proposal delivers enough
value to justify the complexity.
I understand that you feel differently, and that's fine, but I don't
think our disagreement here stems from me being confused. I just ...
don't agree.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Oct 16, 2023 at 12:25:59PM -0400, Robert Haas wrote:
On Mon, Oct 16, 2023 at 11:45 AM David Steele <david@pgmasters.net> wrote:
If you start from the last checkpoint (which is what will generally be
stored in pg_control) then the effect is pretty similar.If the backup didn't span a checkpoint, then restoring from the one in
pg_control actually works fine. Not that I'm encouraging that. But if
you replay WAL from the control file, you at least get the last
checkpoint's worth of WAL; if you use pg_resetwal, you get nothing.
There's no guarantee that the backend didn't spawn an extra checkpoint
while a base backup was taken, either, because we don't fail hard at
the end of the BASE_BACKUP code paths if the redo LSN has been updated
since the beginning of a BASE_BACKUP. So that's really *never* do it
except if you like silent corruptions.
I don't really want to get hung up on this though. My main point here
is that I have trouble believing that an error after you've already
screwed up your backup helps much. I think what we need is to make it
less likely that you will screw up your backup in the first place.
Yeah.. Now what's the best user experience? Is it better for a base
backup to fail and have a user retry? Or is it better to have the
backend-side backup logic do what we think is safer? The former
(likely with a REDO check or similar), will likely never work on large
instances, while users will likely always find ways to screw up base
backups taken by latter methods. A third approach is to put more
careful checks at restore time, and the latter helps a lot here.
--
Michael
On 10/16/23 15:06, Robert Haas wrote:
On Mon, Oct 16, 2023 at 1:00 PM David Steele <david@pgmasters.net> wrote:
After some agonizing (we hope) they decide to delete backup_label and,
wow, it just works! So now they merrily go on their way with a corrupted
cluster. They also remember for the next time that deleting backup_label
is definitely a good procedure.The idea behind this patch is that deleting backup_label would produce a
hard error because pg_control would be missing as well (if the backup
software did its job). If both pg_control and backup_label are present
(but pg_control has not been loaded with the contents of backup_label,
i.e. it is the running copy from the backup cluster) we can also error.I mean, I think we're just going in circles, here. I did and do
understand, but I didn't and don't agree. You're hypothesizing a user
who is willing to do ONE thing that they shouldn't do during backup
restoration (namely, remove backup_label) but who won't be willing to
do a SECOND thing that they shouldn't do during backup restoration
(namely, run pg_resetwal).
In my experience the first case is much more likely than the second.
Your experience may vary.
Anyway, I think they are pretty different. Deleting backup label appears
to give a perfectly valid restore. Running pg_resetwal is more clearly
(I think) the nuclear solution.
I understand that you feel differently, and that's fine, but I don't
think our disagreement here stems from me being confused. I just ...
don't agree.
Fair enough, we don't agree.
Regards,
-David
Greetings,
* David Steele (david@pgmasters.net) wrote:
On 10/16/23 15:06, Robert Haas wrote:
On Mon, Oct 16, 2023 at 1:00 PM David Steele <david@pgmasters.net> wrote:
After some agonizing (we hope) they decide to delete backup_label and,
wow, it just works! So now they merrily go on their way with a corrupted
cluster. They also remember for the next time that deleting backup_label
is definitely a good procedure.The idea behind this patch is that deleting backup_label would produce a
hard error because pg_control would be missing as well (if the backup
software did its job). If both pg_control and backup_label are present
(but pg_control has not been loaded with the contents of backup_label,
i.e. it is the running copy from the backup cluster) we can also error.I mean, I think we're just going in circles, here. I did and do
understand, but I didn't and don't agree. You're hypothesizing a user
who is willing to do ONE thing that they shouldn't do during backup
restoration (namely, remove backup_label) but who won't be willing to
do a SECOND thing that they shouldn't do during backup restoration
(namely, run pg_resetwal).In my experience the first case is much more likely than the second. Your
experience may vary.
My experience (though perhaps not a surprise) mirrors David's.
Anyway, I think they are pretty different. Deleting backup label appears to
give a perfectly valid restore. Running pg_resetwal is more clearly (I
think) the nuclear solution.
Right, and a delete of backup_label is just an 'rm' that folks may think
"oh, this is just some leftover thing that isn't actually needed".
OTOH, pg_resetwal has an online documentation page and a man page that's
very clear that it's only to be used as a last resort (perhaps we should
pull that into the --help output too..?). It's also pretty clear that
pg_resetwal is actually changing things about the cluster while nuking
backup_label doesn't *seem* to be in that same category, even though we
all know it is because it's needed once recovery begins.
I'd also put out there that while people don't do restore testing
nearly as much as they should, they tend to at _least_ try to do it once
after taking their first backup and if that fails then they try to figure
out why and what they're not doing right.
Thanks,
Stephen
On Tue, Oct 17, 2023 at 3:17 PM Stephen Frost <sfrost@snowman.net> wrote:
I'd also put out there that while people don't do restore testing
nearly as much as they should, they tend to at _least_ try to do it once
after taking their first backup and if that fails then they try to figure
out why and what they're not doing right.
Well, I agree with you on that point, but a lot of people only seem to
realize that after it's too late.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 10/14/23 11:30, David Steele wrote:
On 10/12/23 10:19, David Steele wrote:
On 10/11/23 18:10, Thomas Munro wrote:
As Stephen mentioned[1], we could perhaps also complain if both backup
label and control file exist, and then hint that the user should
remove the *control file* (not the backup label!). I had originally
suggested we would just overwrite the control file, but by explicitly
complaining about it we would also bring the matter to tool/script
authors' attention, ie that they shouldn't be backing that file up, or
should be removing it in a later step if they copy everything. He
also mentions that there doesn't seem to be anything stopping us from
back-patching changes to the backup label contents if we go this way.
I don't have a strong opinion on that and we could leave the question
for later.I'm worried about the possibility of back patching this unless the
solution comes out to be simpler than I think and that rarely comes to
pass. Surely throwing errors on something that is currently valid
(i.e. backup_label and pg_control both present).But perhaps there is a simpler, acceptable solution we could back
patch (transparent to all parties except Postgres) and then a more
advanced solution we could go forward with.I guess I had better get busy on this.
Attached is a very POC attempt at something along these lines that could
be back patched. I stopped when I realized excluding pg_control from the
backup is a necessity to make this work (else we still could end up with
a torn copy of pg_control even if there is a good copy elsewhere). To
enumerate the back patch issues as I see them:
Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need for
recovery into pg_control and return *that* from pg_backup_stop() and
tell the user to store it with their backup. We already have "These
files are vital to the backup working and must be written byte for byte
without modification, which may require opening the file in binary
mode." in the documentation so dealing with pg_control should not be a
problem. pg_control also has a CRC so we will know if it gets munged.
It doesn't really matter where/how they store pg_control as long as it
is written back into PGDATA before the cluster starts. If
backupEndRequired, etc., are set appropriately then recovery will do the
right thing when it starts, just as now if PG crashes after it has
renamed backup_label but before recovery to consistency has completed.
We can still enforce the presence of recovery.signal by checking
backupEndRequired if that's something we want but it seems like
backupEndRequired would be enough. I'm fine either way.
Another thing we can do here is make backup from standby easier. The
user won't need to worry about *when* pg_control is copied. We can just
write the ideal min recovery point into pg_control.
Any informational data currently in backup_label can be returned as
columns (like the start/end lsn is now).
This makes the patch much less invasive and while it definitely,
absolutely cannot be back patched, it seems like a good way forward.
This is the direction I'm planning to work on patch-wise but I'd like to
hear people's thoughts.
Regards,
-David
At Tue, 17 Oct 2023 16:16:42 -0400, David Steele <david@pgmasters.net> wrote in
Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need
for recovery into pg_control and return *that* from pg_backup_stop()
and tell the user to store it with their backup. We already have
"These files are vital to the backup working and must be written byte
for byte without modification, which may require opening the file in
binary mode." in the documentation so dealing with pg_control should
not be a problem. pg_control also has a CRC so we will know if it gets
munged.
I'm somewhat perplexed regarding the objective of this thread.
This thread began with the intent of preventing users from removing
the backup_label from a backup. At the beginning, the proposal aimed
to achieve this by injecting an invalid value to pg_control file
located in the generated backup. However, this (and previous) proposal
seems to deviate from that initial objective. It now eliminates the
need to be concerned about the pg_control version that is coped into
the generated backup. However, if someone removes the backup_label
from a backup, the initial concerns could still surface.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Oct 17, 2023 at 4:17 PM David Steele <david@pgmasters.net> wrote:
Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need for
recovery into pg_control and return *that* from pg_backup_stop() and
tell the user to store it with their backup. We already have "These
files are vital to the backup working and must be written byte for byte
without modification, which may require opening the file in binary
mode." in the documentation so dealing with pg_control should not be a
problem. pg_control also has a CRC so we will know if it gets munged.
Yeah, I was thinking about this kind of idea, too. I think it might be
a good idea, although I'm not completely certain about that, either.
On the positive side, you can't remove backup_label in error if
backup_label is not a thing. You certainly can't remove the control
file. You can, however, use the original control file instead of the
one that you were supposed to use. However, that is not really any
different from failing to write the backup_label into the backup
directory, which you can already do today. Also, it adds very little
net complexity to the low-level backup procedure. Instead of needing
to write the backup_label into the backup directory, you write the
control file -- but that's instead, not in addition. So overall it
seems like the complexity is similar to what we have today but one
possible mistake is eliminated.
Also on the positive side, I suspect we could remove a decent amount
of code for dealing with backup_label files. We wouldn't have to read
them any more (and the code to do that is pretty rough-and-ready) and
we wouldn't have to do different things based on whether the
backup_label exists or not. The logic in xlog*.c is extremely
complicated, and everything that we can do to reduce the number of
cases that need to be considered is not just good, but great.
But there are also some negatives.
First, anything that is stored in the backup_label but not the control
file has to (a) move into the control file, (b) be stored someplace
else, or (c) be eliminated as a concept. We're likely to get
complaints about (a), especially if the data in question is anything
big. Any proposal to do (b) risks undermining the whole theory under
which this is a good proposal, namely that removing backup_label gives
us one less thing to worry about. So that brings us to (c).
Personally, I would lose very little sleep if the LABEL field died and
never came back, and I wouldn't miss START TIME and STOP TIME either,
but somebody else might well feel differently. I don't think it's
trivial to get rid of BACKUP METHOD, as there unfortunately seems to
be code that depends on knowing the difference between BACKUP FROM:
streamed and BACKUP FROM: pg_rewind. I suspect that BACKUP FROM:
primary/standby might have the same issue, but I'm not sure. STOP
TIMELINE could be a problem too. I think that if somebody could do
some rejiggering to eliminate some of the differences between the
cases here, that could be really good general cleanup irrespective of
what we decide about this proposal, and moving some things in to
pg_control is probably reasonable too. For instance, it would seem
crazy to me to argue that storing the backup end location in the
control file is OK, but storing the backup end TLI there would not be
OK. But the point here is that there's probably a good deal of careful
thinking that would need to be done here about exactly where all of
the stuff that currently exists in the backup_label file but not in
pg_control needs to end up.
Second, right now, the stuff that we return at the end of a backup is
all text data. With this proposal, it becomes binary data. I entirely
realize that people should only be doing these kinds of backups using
automated tools that that those automated tools should be perfectly
capable of handling binary data without garbling anything. But that's
about as realistic as supposing that people won't instantly remove the
backup_label file the moment it seems like it will solve some problem,
even when the directions clearly state that this should only be done
in some other situation that is not the one the user is facing. It
just occurred to me that one thing we could do to improve the user
experience here is offer some kind of command-line utility to assist
with taking a low-level backup. This could be done even if we don't
proceed with this proposal e.g.
pg_llbackup -d $CONNTR --backup-label=PATH --tablespace-map=PATH
--copy-data-directory=SHELLCOMMAND
I don't know for sure how much that would help, but I wonder if it
might actually help quite a bit, because right now people do things
like use psql in a shell script to try to juggle a database connection
and then in some other part of the shell script do the data copying.
But it is quite easy to screw up the error handling or the psql
session lifetime or something like that, and this would maybe provide
a nicer interface. Details likely need a good deal of kibitizing.
There might be other problems, too. This is just what occurs to me off
the top of my head. But I think it's an interesting angle to explore
further.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 10/17/23 22:13, Kyotaro Horiguchi wrote:
At Tue, 17 Oct 2023 16:16:42 -0400, David Steele <david@pgmasters.net> wrote in
Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need
for recovery into pg_control and return *that* from pg_backup_stop()
and tell the user to store it with their backup. We already have
"These files are vital to the backup working and must be written byte
for byte without modification, which may require opening the file in
binary mode." in the documentation so dealing with pg_control should
not be a problem. pg_control also has a CRC so we will know if it gets
munged.I'm somewhat perplexed regarding the objective of this thread.
This thread began with the intent of preventing users from removing
the backup_label from a backup. At the beginning, the proposal aimed
to achieve this by injecting an invalid value to pg_control file
located in the generated backup. However, this (and previous) proposal
seems to deviate from that initial objective. It now eliminates the
need to be concerned about the pg_control version that is coped into
the generated backup. However, if someone removes the backup_label
from a backup, the initial concerns could still surface.
Yeah, the discussion has moved around quite a bit, but the goal remains
the same, to make Postgres error when it does not have the information
it needs to proceed with recovery. Right now if you delete backup_label
recovery appears to complete successfully, silently corrupting the database.
In the proposal as it stands now there would be no backup_label at all,
so no danger of removing it.
We have also gotten a bit sidetracked by our hope to use this proposal
to address torn reads of pg_control during the backup, at least in HEAD.
Regards,
-David
On 10/18/23 08:39, Robert Haas wrote:
On Tue, Oct 17, 2023 at 4:17 PM David Steele <david@pgmasters.net> wrote:
Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need for
recovery into pg_control and return *that* from pg_backup_stop() and
tell the user to store it with their backup. We already have "These
files are vital to the backup working and must be written byte for byte
without modification, which may require opening the file in binary
mode." in the documentation so dealing with pg_control should not be a
problem. pg_control also has a CRC so we will know if it gets munged.Yeah, I was thinking about this kind of idea, too. I think it might be
a good idea, although I'm not completely certain about that, either.
<snip>
First, anything that is stored in the backup_label but not the control
file has to (a) move into the control file,
I'd rather avoid this.
(b) be stored someplace
else,
I don't think the additional fields *need* to be stored anywhere at all,
at least not by us. We can provide them as output from pg_backup_stop()
and the caller can do as they please. None of those fields are part of
the restore process.
or (c) be eliminated as a concept.
I think we should keep them as above since I don't believe they hurt
anything.
We're likely to get
complaints about (a), especially if the data in question is anything
big. Any proposal to do (b) risks undermining the whole theory under
which this is a good proposal, namely that removing backup_label gives
us one less thing to worry about. So that brings us to (c).
Personally, I would lose very little sleep if the LABEL field died and
never came back, and I wouldn't miss START TIME and STOP TIME either,
but somebody else might well feel differently. I don't think it's
trivial to get rid of BACKUP METHOD, as there unfortunately seems to
be code that depends on knowing the difference between BACKUP FROM:
streamed and BACKUP FROM: pg_rewind. I suspect that BACKUP FROM:
primary/standby might have the same issue, but I'm not sure.
BACKUP FROM: primary/standby we can definitely handle, and BACKUP
METHOD: pg_rewind just means backupEndRequired is false. That will need
to be written by pg_rewind (instead of writing backup_label).
STOP
TIMELINE could be a problem too. I think that if somebody could do
some rejiggering to eliminate some of the differences between the
cases here, that could be really good general cleanup irrespective of
what we decide about this proposal, and moving some things in to
pg_control is probably reasonable too. For instance, it would seem
crazy to me to argue that storing the backup end location in the
control file is OK, but storing the backup end TLI there would not be
OK.
We have a place in pg_control for the end TLI (minRecoveryPointTLI).
That's only valid for backup from standby since a primary backup can
never change timelines.
But the point here is that there's probably a good deal of careful
thinking that would need to be done here about exactly where all of
the stuff that currently exists in the backup_label file but not in
pg_control needs to end up.
Agreed.
Second, right now, the stuff that we return at the end of a backup is
all text data. With this proposal, it becomes binary data. I entirely
realize that people should only be doing these kinds of backups using
automated tools that that those automated tools should be perfectly
capable of handling binary data without garbling anything. But that's
about as realistic as supposing that people won't instantly remove the
backup_label file the moment it seems like it will solve some problem,
even when the directions clearly state that this should only be done
in some other situation that is not the one the user is facing.
Well, we do specify that backup_label and tablespace_map should be saved
byte for byte. But We've already seen users mess this up in the past and
add \r characters that made the files unreadable.
If it can be done wrong, it will be done wrong by somebody.
It
just occurred to me that one thing we could do to improve the user
experience here is offer some kind of command-line utility to assist
with taking a low-level backup. This could be done even if we don't
proceed with this proposal e.g.pg_llbackup -d $CONNTR --backup-label=PATH --tablespace-map=PATH
--copy-data-directory=SHELLCOMMANDI don't know for sure how much that would help, but I wonder if it
might actually help quite a bit, because right now people do things
like use psql in a shell script to try to juggle a database connection
and then in some other part of the shell script do the data copying.
But it is quite easy to screw up the error handling or the psql
session lifetime or something like that, and this would maybe provide
a nicer interface.
I think in most cases where this would be useful the user should just be
using pg_basebackup. If the backup is trying to use snapshots, then
backup_label needs to be stored outside the snapshot and we won't be
able to easily help.
There might be other problems, too. This is just what occurs to me off
the top of my head. But I think it's an interesting angle to explore
further.
There may definitely be other problems and I'm pretty sure there will
be. My feeling is that they will be surmountable, but I won't know for
sure until I finish the patch.
But I also feel it's a good idea to explore further. I'll work on the
patch and should have something to share soon.
Regards,
-David
On Wednesday, October 18, 2023, David Steele <david@pgmasters.net> wrote:
On 10/18/23 08:39, Robert Haas wrote:
On Tue, Oct 17, 2023 at 4:17 PM David Steele <david@pgmasters.net> wrote:
Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need for
recovery into pg_control and return *that* from pg_backup_stop() and
tell the user to store it with their backup. We already have "These
files are vital to the backup working and must be written byte for byte
without modification, which may require opening the file in binary
mode." in the documentation so dealing with pg_control should not be a
problem. pg_control also has a CRC so we will know if it gets munged.Yeah, I was thinking about this kind of idea, too. I think it might be
a good idea, although I'm not completely certain about that, either.<snip>
First, anything that is stored in the backup_label but not the control
file has to (a) move into the control file,
I'd rather avoid this.
If we are OK outputting custom pg_control file content from pg_backup_end
to govern this then I’d probably just include
“tablespace_map_required=true|false” and “backup_label_required=true” lines
in it and leave everything else mostly the same, including the name. In
order for a server with those lines in its pg_control to boot it requires
that a signal file be present along with any of the named files where
*_required is true. Upon successful completion those lines are removed
from pg_control.
It seem unnecessary to move any and all relevant content into pg_control;
just a flag to ensure that those files that are needed a present in the
backup directory and whatever validation of those files is needed to ensure
they provide sufficient data.
If the user removes those files without a backup the server is not going to
start unless they make further obvious attempts to circumvent the design.
Manually editing pg_comtrol being obvious circumventing.
This would seem to be a compatible change. If you fail to use the
pg_control from pg_backup_stop you don’t get the safeguard but everything
still works. Do we really believe we need to break/force-upgrade tooling
to use this new procedure? Depending on the answer to the torn pg_comtrol
file problem which may indeed warrant such breakage.
David J.
On Wed, Oct 18, 2023 at 7:15 PM David Steele <david@pgmasters.net> wrote:
(b) be stored someplace
else,I don't think the additional fields *need* to be stored anywhere at all,
at least not by us. We can provide them as output from pg_backup_stop()
and the caller can do as they please. None of those fields are part of
the restore process.
Not sure which fields we're talking about here. I agree that if
they're not really needed, we can return them and the user can keep
them or discard them as they wish. But in that case you might also ask
why bother even returning them.
pg_llbackup -d $CONNTR --backup-label=PATH --tablespace-map=PATH
--copy-data-directory=SHELLCOMMANDI think in most cases where this would be useful the user should just be
using pg_basebackup. If the backup is trying to use snapshots, then
backup_label needs to be stored outside the snapshot and we won't be
able to easily help.
Right, the idea of the above was that you would specify paths for the
backup label and the tablespace map that were outside of the snapshot
directory in that kind of case. But you couldn't screw up the line
endings or whatever because pg_llbackup would take care of that aspect
of it for you.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 10/19/23 10:24, Robert Haas wrote:
On Wed, Oct 18, 2023 at 7:15 PM David Steele <david@pgmasters.net> wrote:
(b) be stored someplace
else,I don't think the additional fields *need* to be stored anywhere at all,
at least not by us. We can provide them as output from pg_backup_stop()
and the caller can do as they please. None of those fields are part of
the restore process.Not sure which fields we're talking about here. I agree that if
they're not really needed, we can return them and the user can keep
them or discard them as they wish. But in that case you might also ask
why bother even returning them.
I'm specifically talking about START TIME and LABEL. They are currently
stored in backup_label but not used for recovery. START TIMELINE is also
not used, except as a cross check against START WAL LOCATION.
I'd still like to see most or all of these fields exposed through
pg_backup_stop(). The user can choose to store them or not, but none of
them will be required for recovery.
pg_llbackup -d $CONNTR --backup-label=PATH --tablespace-map=PATH
--copy-data-directory=SHELLCOMMANDI think in most cases where this would be useful the user should just be
using pg_basebackup. If the backup is trying to use snapshots, then
backup_label needs to be stored outside the snapshot and we won't be
able to easily help.Right, the idea of the above was that you would specify paths for the
backup label and the tablespace map that were outside of the snapshot
directory in that kind of case. But you couldn't screw up the line
endings or whatever because pg_llbackup would take care of that aspect
of it for you.
What I meant here (but said badly) is that in the case of snapshot
backups, the backup_label and tablespace_map will likely need to be
stored somewhere off the server since they can't be part of the
snapshot, perhaps in a key store. In that case the backup software would
still need to read the files from wherever we stored then and correctly
handle them when storing elsewhere. If you were moving the files to say,
S3, a similar thing needs to happen. In general, I think a locally
mounted filesystem is very unlikely to be the final destination for
these files, and if it is then probably pg_basebackup is your friend.
Regards,
-David
On Thu, Oct 19, 2023 at 10:43 AM David Steele <david@pgmasters.net> wrote:
What I meant here (but said badly) is that in the case of snapshot
backups, the backup_label and tablespace_map will likely need to be
stored somewhere off the server since they can't be part of the
snapshot, perhaps in a key store. In that case the backup software would
still need to read the files from wherever we stored then and correctly
handle them when storing elsewhere. If you were moving the files to say,
S3, a similar thing needs to happen. In general, I think a locally
mounted filesystem is very unlikely to be the final destination for
these files, and if it is then probably pg_basebackup is your friend.
I mean, writing those tiny little files locally and then uploading
them should be fine in a case like that. It still reduces the surface
for mistakes. And you could also have --backup-label='| whatever' or
something if you wanted. The point is that right now we're asking
people to pull this information out of a query result, and that means
people are trying to do it by calling out to psql, and that is a GREAT
way to screw up the escaping or the newlines or whatever. I don't
think the mistakes people are making here are being made by people
using Perl and DBD::Pg, or Python and psycopg2, or C and libpq.
They're being made by people who are trying to shell script their way
through it, which entails using psql, which makes screwing it up a
breeze.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thursday, October 19, 2023, David Steele <david@pgmasters.net> wrote:
On 10/19/23 10:24, Robert Haas wrote:
On Wed, Oct 18, 2023 at 7:15 PM David Steele <david@pgmasters.net> wrote:
pg_llbackup -d $CONNTR --backup-label=PATH --tablespace-map=PATH
--copy-data-directory=SHELLCOMMANDI think in most cases where this would be useful the user should just be
using pg_basebackup. If the backup is trying to use snapshots, then
backup_label needs to be stored outside the snapshot and we won't be
able to easily help.Right, the idea of the above was that you would specify paths for the
backup label and the tablespace map that were outside of the snapshot
directory in that kind of case. But you couldn't screw up the line
endings or whatever because pg_llbackup would take care of that aspect
of it for you.What I meant here (but said badly) is that in the case of snapshot
backups, the backup_label and tablespace_map will likely need to be stored
somewhere off the server since they can't be part of the snapshot, perhaps
in a key store. In that case the backup software would still need to read
the files from wherever we stored then and correctly handle them when
storing elsewhere. If you were moving the files to say, S3, a similar thing
needs to happen. In general, I think a locally mounted filesystem is very
unlikely to be the final destination for these files, and if it is then
probably pg_basebackup is your friend.
We are choosing to not take responsibility if the procedure used by the dba
is one that takes a full live backup of the data directory as-is and then
brings that backup back into production without making any changes to it.
That restored copy will be bootable but quite probably corrupted. That is
on them as we have no way to make it non-bootable without risking source
database being unable to be restarted automatically upon a crash. In
short, a snapshot methodology is beyond what we are willing to provide
protections for.
What I do kinda gather from the above is we should be providing a
pg_baserestore application if we want to give our users fewer reasons to
write their own tooling against the API and/or want to add more complexity
to pg_basebackup to handle various needs that need corresponding recovery
needs that we should implement as code instead of documentation.
David J.
On 10/19/23 10:56, Robert Haas wrote:
On Thu, Oct 19, 2023 at 10:43 AM David Steele <david@pgmasters.net> wrote:
What I meant here (but said badly) is that in the case of snapshot
backups, the backup_label and tablespace_map will likely need to be
stored somewhere off the server since they can't be part of the
snapshot, perhaps in a key store. In that case the backup software would
still need to read the files from wherever we stored then and correctly
handle them when storing elsewhere. If you were moving the files to say,
S3, a similar thing needs to happen. In general, I think a locally
mounted filesystem is very unlikely to be the final destination for
these files, and if it is then probably pg_basebackup is your friend.I mean, writing those tiny little files locally and then uploading
them should be fine in a case like that. It still reduces the surface
for mistakes. And you could also have --backup-label='| whatever' or
something if you wanted. The point is that right now we're asking
people to pull this information out of a query result, and that means
people are trying to do it by calling out to psql, and that is a GREAT
way to screw up the escaping or the newlines or whatever. I don't
think the mistakes people are making here are being made by people
using Perl and DBD::Pg, or Python and psycopg2, or C and libpq.
They're being made by people who are trying to shell script their way
through it, which entails using psql, which makes screwing it up a
breeze.
OK, I see what you mean and I agree. Better documentation might be the
answer here, but I doubt that psql is a good tool for starting/stopping
backup and not sure we want to encourage it.
Regards,
-David