Add recovery to pg_control and remove backup_label

Started by David Steeleover 2 years ago48 messageshackers
Jump to latest
#1David Steele
david@pgmasters.net

Hackers,

This was originally proposed in [1]/messages/by-id/1330cb48-4e47-03ca-f2fb-b144b49514d8@pgmasters.net but that thread went through a
number of different proposals so it seems better to start anew.

The basic idea here is to simplify and harden recovery by getting rid of
backup_label and storing recovery information directly in pg_control.
Instead of backup software copying pg_control from PGDATA, it stores an
updated version that is returned from pg_backup_stop(). I believe this
is better for the following reasons:

* The user can no longer remove backup_label and get what looks like a
successful restore (while almost certainly causing corruption). If
pg_control is removed the cluster will not start. The user may try
pg_resetwal, but I think that tool makes it pretty clear that corruption
will result from its use. We could also modify pg_resetwal to complain
if recovery info is present in pg_control.

* We don't need to worry about backup software seeing a torn copy of
pg_control, since Postgres can safely read it out of memory and provide
a valid copy via pg_backup_stop(). This solves [2]/messages/by-id/20221123014224.xisi44byq3cf5psi@awork3.anarazel.de without needing to
write pg_control via a temp file, which may affect performance on a
standby. Unfortunately, this solution cannot be back patched.

* For backup from standby, we no longer need to instruct the backup
software to copy pg_control last. In fact the backup software should not
copy pg_control from PGDATA at all.

Since backup_label is now gone, the fields that used to be in
backup_label are now provided as columns returned from pg_backup_start()
and pg_backup_stop() and the backup history file is still written to the
archive. For pg_basebackup we would have the option of writing the
fields into the JSON manifest, storing them to a file (e.g.
backup.info), or just ignoring them. None of the fields are required for
recovery but backup software may be very interested in them.

I updated pg_rewind but I'm not very confident in the tests. When I
removed backup_label processing, but before I updated pg_rewind to write
recovery info into pg_control, all the rewind tests passed.

This patch highlights the fact that we still have no tests for the
low-level backup method. I modified pgBackRest to work with this patch
and the entire test suite ran without any issues, but in-core tests
would be good to have. I'm planning to work on those myself as a
separate patch.

This patch would also make the proposal in [3]/messages/by-id/eb3d1aae-1a75-bcd3-692a-38729423168f@pgmasters.net obsolete since there is
no need to rename backup_label if it is gone.

I know that outputting pg_control as bytea is going to be a bit
controversial. Software that is using psql get run pg_backup_stop()
could use encode() to get pg_control as text and then decode it later.
Alternately, we could update ReadControlFile() to recognize a
base64-encoded pg_control file. I'm not sure dealing with binary data is
that much of a problem, though, and if the backup software gets it wrong
then recovery with fail on an invalid pg_control file.

Lastly, I think there are improvements to be made in recovery that go
beyond this patch. I originally set out to load the recovery info into
*just* the existing fields in pg_control but it required so many changes
to recovery that I decided it was too dangerous to do all in one patch.
This patch very much takes the "backup_label in pg_control" approach,
though I reused fields where possible. The added fields, e.g.
backupRecoveryRequested, also allow us to keep the user experience
pretty much the same in terms of messages and errors.

Thoughts?

Regards,
-David

[1]: /messages/by-id/1330cb48-4e47-03ca-f2fb-b144b49514d8@pgmasters.net
/messages/by-id/1330cb48-4e47-03ca-f2fb-b144b49514d8@pgmasters.net
[2]: /messages/by-id/20221123014224.xisi44byq3cf5psi@awork3.anarazel.de
/messages/by-id/20221123014224.xisi44byq3cf5psi@awork3.anarazel.de
[3]: /messages/by-id/eb3d1aae-1a75-bcd3-692a-38729423168f@pgmasters.net
/messages/by-id/eb3d1aae-1a75-bcd3-692a-38729423168f@pgmasters.net

Attachments:

v01-recovery-in-pgcontrol-remove-backuplabel.patchtext/plain; charset=UTF-8; name=v01-recovery-in-pgcontrol-remove-backuplabel.patchDownload+255-400
#2David G. Johnston
david.g.johnston@gmail.com
In reply to: David Steele (#1)
Re: Add recovery to pg_control and remove backup_label

On Thu, Oct 26, 2023 at 2:02 PM David Steele <david@pgmasters.net> wrote:

Hackers,

This was originally proposed in [1] but that thread went through a
number of different proposals so it seems better to start anew.

The basic idea here is to simplify and harden recovery by getting rid of
backup_label and storing recovery information directly in pg_control.
Instead of backup software copying pg_control from PGDATA, it stores an
updated version that is returned from pg_backup_stop(). I believe this
is better for the following reasons:

* The user can no longer remove backup_label and get what looks like a
successful restore (while almost certainly causing corruption). If
pg_control is removed the cluster will not start. The user may try
pg_resetwal, but I think that tool makes it pretty clear that corruption
will result from its use. We could also modify pg_resetwal to complain
if recovery info is present in pg_control.

* We don't need to worry about backup software seeing a torn copy of
pg_control, since Postgres can safely read it out of memory and provide
a valid copy via pg_backup_stop(). This solves [2] without needing to
write pg_control via a temp file, which may affect performance on a
standby. Unfortunately, this solution cannot be back patched.

Are we planning on dealing with torn writes in the back branches in some
way or are we just throwing in the towel and saying the old method is too
error-prone to exist/retain and therefore the goal of the v17 changes is to
not only provide a better way but also to ensure the old way no longer
works? It seems sufficient to change the output signature of
pg_backup_stop to accomplish that goal though I am pondering whether an
explicit check and error for seeing the backup_label file would be
warranted.

If we are going to solve the torn writes problem completely then while I
agree the new way is superior, implementing it doesn't have to mean
existing tools built to produce backup_label and rely upon the pg_control
in the data directory need to be forcibly broken.

I know that outputting pg_control as bytea is going to be a bit
controversial. Software that is using psql get run pg_backup_stop()
could use encode() to get pg_control as text and then decode it later.
Alternately, we could update ReadControlFile() to recognize a
base64-encoded pg_control file. I'm not sure dealing with binary data is
that much of a problem, though, and if the backup software gets it wrong
then recovery with fail on an invalid pg_control file.

Can we not figure out some way to place the relevant files onto the server
somewhere so that a simple "cp" command would work? Have pg_backup_stop
return paths instead of contents, those paths being "$TEMP_DIR"/<random
unique new directory>/pg_control.conf (and tablespace_map)

David J.

#3David Steele
david@pgmasters.net
In reply to: David G. Johnston (#2)
Re: Add recovery to pg_control and remove backup_label

On 10/26/23 17:27, David G. Johnston wrote:

On Thu, Oct 26, 2023 at 2:02 PM David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:

Are we planning on dealing with torn writes in the back branches in some
way or are we just throwing in the towel and saying the old method is
too error-prone to exist/retain

We are still planning to address this issue in the back branches.

and therefore the goal of the v17
changes is to not only provide a better way but also to ensure the old
way no longer works?  It seems sufficient to change the output signature
of pg_backup_stop to accomplish that goal though I am pondering whether
an explicit check and error for seeing the backup_label file would be
warranted.

Well, if the backup tool is just copying the second column of output to
the backup_label, then it won't break. Of course in that case, restores
won't work correctly but you would not get an error. Testing would show
that it is not working properly and backup tools should certainly be tested.

Even so, I'm OK with an explicit check for backup_label. Let's see what
others think.

If we are going to solve the torn writes problem completely then while I
agree the new way is superior, implementing it doesn't have to mean
existing tools built to produce backup_label and rely upon the
pg_control in the data directory need to be forcibly broken.

It is a pretty easy update to any backup software that supports
non-exclusive backup. I was able to make the changes to pgBackRest in
less than an hour. We've made major changes to backup and restore in
almost every major version of PostgreSQL for a while: non-exlusive
backup in 9.6, dir renames in 10, variable WAL size in 11, new recovery
location in 12, hard recovery target errors in 13, and changes to
non-exclusive backup and removal of exclusive backup in 15. In 17 we are
already looking at new page and segment sizes.

I know that outputting pg_control as bytea is going to be a bit
controversial. Software that is using psql get run pg_backup_stop()
could use encode() to get pg_control as text and then decode it later.
Alternately, we could update ReadControlFile() to recognize a
base64-encoded pg_control file. I'm not sure dealing with binary
data is
that much of a problem, though, and if the backup software gets it
wrong
then recovery with fail on an invalid pg_control file.

Can we not figure out some way to place the relevant files onto the
server somewhere so that a simple "cp" command would work?  Have
pg_backup_stop return paths instead of contents, those paths being
"$TEMP_DIR"/<random unique new directory>/pg_control.conf (and
tablespace_map)

Nobody has been able to figure this out, and some of us have been
thinking about it for years. It just doesn't seem possible to reliably
tell the difference between a cluster that was copied and one that
simply crashed.

If cp is really the backup tool being employed, I would recommend using
pg_basebackup. cp has flaws that could lead to corruption, and of course
does not at all take into account the archive required to make a backup
consistent, directories to be excluded, the order of copying pg_control
on backup from standy, etc., etc.

Backup/restore is not a simple endeavor and we don't do anyone favors
pretending that it is.

Regards,
-David

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: David Steele (#3)
Re: Add recovery to pg_control and remove backup_label

On Fri, Oct 27, 2023 at 7:10 AM David Steele <david@pgmasters.net> wrote:

On 10/26/23 17:27, David G. Johnston wrote:

Can we not figure out some way to place the relevant files onto the
server somewhere so that a simple "cp" command would work? Have
pg_backup_stop return paths instead of contents, those paths being
"$TEMP_DIR"/<random unique new directory>/pg_control.conf (and
tablespace_map)

Nobody has been able to figure this out, and some of us have been
thinking about it for years. It just doesn't seem possible to reliably
tell the difference between a cluster that was copied and one that
simply crashed.

If cp is really the backup tool being employed, I would recommend using
pg_basebackup. cp has flaws that could lead to corruption, and of course
does not at all take into account the archive required to make a backup
consistent, directories to be excluded, the order of copying pg_control
on backup from standy, etc., etc.

Let me modify that to make it a bit more clear, I actually wouldn't care if
pg_backup_end outputs an entire binary pg_control file as part of the SQL
resultset.

My proposal would be to, in addition, place in the temporary directory on
the server, Postgres-written versions of pg_control and tablespace_map as
part of the pg_backup_end processing. The client software would then have
a choice. Write the contents of the SQL resultset to newly created binary
mode files in the destination, or, copy the server-written files from the
temporary directory to the destination.

That said, I'm starting to dislike that idea myself. It only really makes
sense if the files could be placed in the data directory but that isn't
doable given concurrent backups and not wanting to place the source server
into an inconsistent state.

David J.

#5David Steele
david@pgmasters.net
In reply to: David G. Johnston (#4)
Re: Add recovery to pg_control and remove backup_label

On 10/27/23 13:45, David G. Johnston wrote:

Let me modify that to make it a bit more clear, I actually wouldn't care
if pg_backup_end outputs an entire binary pg_control file as part of the
SQL resultset.

My proposal would be to, in addition, place in the temporary directory
on the server, Postgres-written versions of pg_control and
tablespace_map as part of the pg_backup_end processing.  The client
software would then have a choice.  Write the contents of the SQL
resultset to newly created binary mode files in the destination, or,
copy the server-written files from the temporary directory to the
destination.

That said, I'm starting to dislike that idea myself.  It only really
makes sense if the files could be placed in the data directory but that
isn't doable given concurrent backups and not wanting to place the
source server into an inconsistent state.

Pretty much the conclusion I have come to myself over the years.

Regards,
-David

#6David Steele
david@pgmasters.net
In reply to: David Steele (#1)
Re: Add recovery to pg_control and remove backup_label

Rebased on 151ffcf6.

Attachments:

v02-recovery-in-pgcontrol-remove-backuplabel.patchtext/plain; charset=UTF-8; name=v02-recovery-in-pgcontrol-remove-backuplabel.patchDownload+258-405
#7Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#3)
Re: Add recovery to pg_control and remove backup_label

On Fri, Oct 27, 2023 at 10:10:42AM -0400, David Steele wrote:

We are still planning to address this issue in the back branches.

FWIW, redesigning the backend code in charge of doing base backups in
the back branches is out of scope. Based on a read of the proposed
patch, it includes catalog changes which would require a catversion
bump, so that's not going to work anyway.
--
Michael

#8Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#6)
Re: Add recovery to pg_control and remove backup_label

On Sun, Nov 05, 2023 at 01:45:39PM -0400, David Steele wrote:

Rebased on 151ffcf6.

I like this patch a lot. Even if the backup_label file is removed, we
still have all the debug information from the backup history file,
thanks to its LABEL, BACKUP METHOD and BACKUP FROM, so no information
is lost. It does a 1:1 replacement of the contents parsed from the
backup_label needed by recovery by fetching them from the control
file. Sounds like a straight-forward change to me.

The patch is failing the recovery test 039_end_of_wal.pl. Could you
look at the failure?

         /* Build and save the contents of the backup history file */
-        history_file = build_backup_content(state, true);
+        history_file = build_backup_content(state);

build_backup_content() sounds like an incorrect name if it is a
routine onlyused to build the contents of backup history files.

Why is there nothing updated in src/bin/pg_controldata/?

+        /* Clear fields used to initialize recovery */
+        ControlFile->backupCheckPoint = InvalidXLogRecPtr;
+        ControlFile->backupStartPointTLI = 0;
+        ControlFile->backupRecoveryRequired = false;
+        ControlFile->backupFromStandby = false;

These variables in the control file are cleaned up when the
backup_label file was read previously, but backup_label is renamed to
backup_label.old a bit later than that. Your logic looks correct seen
from here, but shouldn't these variables be set much later, aka just
*after* UpdateControlFile(). This gap between the initialization of
the control file and the in-memory reset makes the code quite brittle,
IMO.

- basebackup_progress_wait_wal_archive(&state);
- do_pg_backup_stop(backup_state, !opt->nowait);

Why is that moved?

-    The backup label
-    file includes the label string you gave to <function>pg_backup_start</function>,
-    as well as the time at which <function>pg_backup_start</function> was run, and
-    the name of the starting WAL file.  In case of confusion it is therefore
-    possible to look inside a backup file and determine exactly which
-    backup session the dump file came from.  The tablespace map file includes
+    The tablespace map file includes

It may be worth mentioning that the backup history file holds this
information on the primary's pg_wal, as well.

The changes in sendFileWithContent() may be worth a patch of its own.

--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -146,6 +146,9 @@ typedef struct ControlFileData
@@ -160,14 +163,25 @@ typedef struct ControlFileData
     XLogRecPtr    minRecoveryPoint;
     TimeLineID    minRecoveryPointTLI;
+    XLogRecPtr    backupCheckPoint;
     XLogRecPtr    backupStartPoint;
+    TimeLineID    backupStartPointTLI;
     XLogRecPtr    backupEndPoint;
+    bool         backupRecoveryRequired;
+    bool         backupFromStandby;

This increases the size of the control file from 296B to 312B with an
8-byte alignment, as far as I can see. The size of the control file
has been always a sensitive subject especially with the hard limit of
PG_CONTROL_MAX_SAFE_SIZE. Well, the point of this patch is that this
is the price to pay to prevent users from doing something stupid with
a removal of a backup_label when they should not. Do others have an
opinion about this increase in size?

Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
reduce more the size with some alignment magic, no?

- /*
- * BACKUP METHOD lets us know if this was a typical backup ("streamed",
- * which could mean either pg_basebackup or the pg_backup_start/stop
- * method was used) or if this label came from somewhere else (the only
- * other option today being from pg_rewind). If this was a streamed
- * backup then we know that we need to play through until we get to the
- * end of the WAL which was generated during the backup (at which point we
- * will have reached consistency and backupEndRequired will be reset to be
- * false).
- */
- if (fscanf(lfp, "BACKUP METHOD: %19s\n", backuptype) == 1)
- {
- if (strcmp(backuptype, "streamed") == 0)
- *backupEndRequired = true;
- }

backupRecoveryRequired in the control file is switched to false for
pg_rewind and true for streamed backups. My gut feeling is telling me
that this should be OK, as out-of-core tools would need an upgrade if
they relied on the backend_label file anyway. I can see that this
change makes use lose some documentation, unfortunately. Shouldn't
these removed lines be moved to pg_control.h instead for the
description of backupEndRequired?

doc/src/sgml/ref/pg_rewind.sgml and
src/backend/access/transam/xlogrecovery.c still include references to
the backup_label file.
--
Michael

#9David Steele
david@pgmasters.net
In reply to: Michael Paquier (#7)
Re: Add recovery to pg_control and remove backup_label

On 11/6/23 01:05, Michael Paquier wrote:

On Fri, Oct 27, 2023 at 10:10:42AM -0400, David Steele wrote:

We are still planning to address this issue in the back branches.

FWIW, redesigning the backend code in charge of doing base backups in
the back branches is out of scope. Based on a read of the proposed
patch, it includes catalog changes which would require a catversion
bump, so that's not going to work anyway.

I did not mean this patch -- rather some variation of what Thomas has
been working on, more than likely.

Regards,
-David

#10David Steele
david@pgmasters.net
In reply to: Michael Paquier (#8)
Re: Add recovery to pg_control and remove backup_label

On 11/6/23 02:35, Michael Paquier wrote:

On Sun, Nov 05, 2023 at 01:45:39PM -0400, David Steele wrote:

Rebased on 151ffcf6.

I like this patch a lot. Even if the backup_label file is removed, we
still have all the debug information from the backup history file,
thanks to its LABEL, BACKUP METHOD and BACKUP FROM, so no information
is lost. It does a 1:1 replacement of the contents parsed from the
backup_label needed by recovery by fetching them from the control
file. Sounds like a straight-forward change to me.

That's the plan, at least!

The patch is failing the recovery test 039_end_of_wal.pl. Could you
look at the failure?

I'm not seeing this failure, and CI seems happy [1]https://cirrus-ci.com/build/4939808120766464. Can you give
details of the error message?

/* Build and save the contents of the backup history file */
-        history_file = build_backup_content(state, true);
+        history_file = build_backup_content(state);

build_backup_content() sounds like an incorrect name if it is a
routine onlyused to build the contents of backup history files.

Good point, I have renamed this to build_backup_history_content().

Why is there nothing updated in src/bin/pg_controldata/?

Oops, added.

+        /* Clear fields used to initialize recovery */
+        ControlFile->backupCheckPoint = InvalidXLogRecPtr;
+        ControlFile->backupStartPointTLI = 0;
+        ControlFile->backupRecoveryRequired = false;
+        ControlFile->backupFromStandby = false;

These variables in the control file are cleaned up when the
backup_label file was read previously, but backup_label is renamed to
backup_label.old a bit later than that. Your logic looks correct seen
from here, but shouldn't these variables be set much later, aka just
*after* UpdateControlFile(). This gap between the initialization of
the control file and the in-memory reset makes the code quite brittle,
IMO.

If we set these fields where backup_label was renamed, the logic would
not be exactly the same since pg_control won't be updated until the next
time through the loop. Since the fields should be updated before
UpdateControlFile() I thought it made sense to keep all the updates
together.

Overall I think it is simpler, and we don't need to acquire a lock on
ControlFile.

- basebackup_progress_wait_wal_archive(&state);
- do_pg_backup_stop(backup_state, !opt->nowait);

Why is that moved?

do_pg_backup_stop() generates the updated pg_control so it needs to run
before we transmit pg_control.

-    The backup label
-    file includes the label string you gave to <function>pg_backup_start</function>,
-    as well as the time at which <function>pg_backup_start</function> was run, and
-    the name of the starting WAL file.  In case of confusion it is therefore
-    possible to look inside a backup file and determine exactly which
-    backup session the dump file came from.  The tablespace map file includes
+    The tablespace map file includes

It may be worth mentioning that the backup history file holds this
information on the primary's pg_wal, as well.

OK, reworded.

The changes in sendFileWithContent() may be worth a patch of its own.

Thomas included this change in his pg_basebackup changes so I did the
same. Maybe wait a bit before we split this out? Seems like a pretty
small change...

--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -146,6 +146,9 @@ typedef struct ControlFileData
@@ -160,14 +163,25 @@ typedef struct ControlFileData
XLogRecPtr    minRecoveryPoint;
TimeLineID    minRecoveryPointTLI;
+    XLogRecPtr    backupCheckPoint;
XLogRecPtr    backupStartPoint;
+    TimeLineID    backupStartPointTLI;
XLogRecPtr    backupEndPoint;
+    bool         backupRecoveryRequired;
+    bool         backupFromStandby;

This increases the size of the control file from 296B to 312B with an
8-byte alignment, as far as I can see. The size of the control file
has been always a sensitive subject especially with the hard limit of
PG_CONTROL_MAX_SAFE_SIZE. Well, the point of this patch is that this
is the price to pay to prevent users from doing something stupid with
a removal of a backup_label when they should not. Do others have an
opinion about this increase in size?

Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
reduce more the size with some alignment magic, no?

I thought about this, but it seemed to me that existing fields had been
positioned to make the grouping logical rather than to optimize
alignment, e.g. minRecoveryPointTLI. Ideally that would have been placed
near backupEndRequired (or vice versa). But if the general opinion is to
rearrange for alignment, I'm OK with that.

backupRecoveryRequired in the control file is switched to false for
pg_rewind and true for streamed backups. My gut feeling is telling me
that this should be OK, as out-of-core tools would need an upgrade if
they relied on the backend_label file anyway. I can see that this
change makes use lose some documentation, unfortunately. Shouldn't
these removed lines be moved to pg_control.h instead for the
description of backupEndRequired?

Updated description in pg_control.h -- it's a bit vague but not sure it
is a good idea to get into the inner workings of pg_rewind here?

doc/src/sgml/ref/pg_rewind.sgml and
src/backend/access/transam/xlogrecovery.c still include references to
the backup_label file.

Fixed.

Attached is a new patch based on 18b585155.

Regards,
-David

[1]: https://cirrus-ci.com/build/4939808120766464

Attachments:

v03-recovery-in-pgcontrol-remove-backuplabel.patchtext/plain; charset=UTF-8; name=v03-recovery-in-pgcontrol-remove-backuplabel.patchDownload+273-406
#11Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#10)
Re: Add recovery to pg_control and remove backup_label

On Mon, Nov 06, 2023 at 05:39:02PM -0400, David Steele wrote:

On 11/6/23 02:35, Michael Paquier wrote:

The patch is failing the recovery test 039_end_of_wal.pl. Could you
look at the failure?

I'm not seeing this failure, and CI seems happy [1]. Can you give details of
the error message?

I've retested today, and miss the failure. I'll let you know if I see
this again.

+        /* Clear fields used to initialize recovery */
+        ControlFile->backupCheckPoint = InvalidXLogRecPtr;
+        ControlFile->backupStartPointTLI = 0;
+        ControlFile->backupRecoveryRequired = false;
+        ControlFile->backupFromStandby = false;

These variables in the control file are cleaned up when the
backup_label file was read previously, but backup_label is renamed to
backup_label.old a bit later than that. Your logic looks correct seen
from here, but shouldn't these variables be set much later, aka just
*after* UpdateControlFile(). This gap between the initialization of
the control file and the in-memory reset makes the code quite brittle,
IMO.

Yeah, sorry, there's a think from me here. I meant to reset these
variables just before the UpdateControlFile() after InitWalRecovery()
in UpdateControlFile(), much closer to it.

If we set these fields where backup_label was renamed, the logic would not
be exactly the same since pg_control won't be updated until the next time
through the loop. Since the fields should be updated before
UpdateControlFile() I thought it made sense to keep all the updates
together.

Overall I think it is simpler, and we don't need to acquire a lock on
ControlFile.

What you are proposing is the same as what we already do for
backupEndRequired or backupStartPoint in the control file when
initializing recovery, so objection withdrawn.

Thomas included this change in his pg_basebackup changes so I did the same.
Maybe wait a bit before we split this out? Seems like a pretty small
change...

Seems like a pretty good argument for refactoring that now, and let
any other patches rely on it. Would you like to send a separate
patch?

Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
reduce more the size with some alignment magic, no?

I thought about this, but it seemed to me that existing fields had been
positioned to make the grouping logical rather than to optimize alignment,
e.g. minRecoveryPointTLI. Ideally that would have been placed near
backupEndRequired (or vice versa). But if the general opinion is to
rearrange for alignment, I'm OK with that.

I've not tested, but it looks like moving backupStartPointTLI after
backupEndPoint should shave 8 bytes, if you want to maintain a more
coherent group for the LSNs.
--
Michael

#12Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#11)
Re: Add recovery to pg_control and remove backup_label

On Tue, Nov 07, 2023 at 05:20:27PM +0900, Michael Paquier wrote:

On Mon, Nov 06, 2023 at 05:39:02PM -0400, David Steele wrote:
I've retested today, and miss the failure. I'll let you know if I see
this again.

I've done a few more dozen runs, and still nothing. I am wondering
what this disturbance was.

If we set these fields where backup_label was renamed, the logic would not
be exactly the same since pg_control won't be updated until the next time
through the loop. Since the fields should be updated before
UpdateControlFile() I thought it made sense to keep all the updates
together.

Overall I think it is simpler, and we don't need to acquire a lock on
ControlFile.

What you are proposing is the same as what we already do for
backupEndRequired or backupStartPoint in the control file when
initializing recovery, so objection withdrawn.

Thomas included this change in his pg_basebackup changes so I did the same.
Maybe wait a bit before we split this out? Seems like a pretty small
change...

Seems like a pretty good argument for refactoring that now, and let
any other patches rely on it. Would you like to send a separate
patch?

The split still looks worth doing seen from here, so I am switching
the patch as WoA for now.

Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
reduce more the size with some alignment magic, no?

I thought about this, but it seemed to me that existing fields had been
positioned to make the grouping logical rather than to optimize alignment,
e.g. minRecoveryPointTLI. Ideally that would have been placed near
backupEndRequired (or vice versa). But if the general opinion is to
rearrange for alignment, I'm OK with that.

I've not tested, but it looks like moving backupStartPointTLI after
backupEndPoint should shave 8 bytes, if you want to maintain a more
coherent group for the LSNs.

+    * backupFromStandby indicates that the backup was taken on a standby. It is
+    * require to initialize recovery and set to false afterwards.
s/require/required/.

The term "backup recovery", that we've never used in the tree until
now as far as I know. Perhaps this recovery method should just be
referred as "recovery from backup"?

By the way, there is another thing that this patch has forgotten: the
SQL functions that display data from the control file. Shouldn't
pg_control_recovery() be extended with the new fields? These fields
may be less critical than the other ones related to recovery, but I
suspect that showing them can become handy at least for debugging and
monitoring purposes.

Something in this area is that backupRecoveryRequired is the switch
controlling if the fields set by the recovery initialization. Could
it be actually useful to leave the other fields as they are and only
reset backupRecoveryRequired before the first control file update?
This would leave a trace of the backup history directly in the control
file.

What about pg_resetwal and RewriteControlFile()? Shouldn't these
recovery fields be reset as well?

git diff --check is complaining a bit.
--
Michael

#13David Steele
david@pgmasters.net
In reply to: Michael Paquier (#12)
Re: Add recovery to pg_control and remove backup_label

On 11/10/23 00:37, Michael Paquier wrote:

On Tue, Nov 07, 2023 at 05:20:27PM +0900, Michael Paquier wrote:

On Mon, Nov 06, 2023 at 05:39:02PM -0400, David Steele wrote:
I've retested today, and miss the failure. I'll let you know if I see
this again.

I've done a few more dozen runs, and still nothing. I am wondering
what this disturbance was.

OK, hopefully it was just a blip.

If we set these fields where backup_label was renamed, the logic would not
be exactly the same since pg_control won't be updated until the next time
through the loop. Since the fields should be updated before
UpdateControlFile() I thought it made sense to keep all the updates
together.

Overall I think it is simpler, and we don't need to acquire a lock on
ControlFile.

What you are proposing is the same as what we already do for
backupEndRequired or backupStartPoint in the control file when
initializing recovery, so objection withdrawn.

Thomas included this change in his pg_basebackup changes so I did the same.
Maybe wait a bit before we split this out? Seems like a pretty small
change...

Seems like a pretty good argument for refactoring that now, and let
any other patches rely on it. Would you like to send a separate
patch?

The split still looks worth doing seen from here, so I am switching
the patch as WoA for now.

This has been split out.

Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
reduce more the size with some alignment magic, no?

I thought about this, but it seemed to me that existing fields had been
positioned to make the grouping logical rather than to optimize alignment,
e.g. minRecoveryPointTLI. Ideally that would have been placed near
backupEndRequired (or vice versa). But if the general opinion is to
rearrange for alignment, I'm OK with that.

I've not tested, but it looks like moving backupStartPointTLI after
backupEndPoint should shave 8 bytes, if you want to maintain a more
coherent group for the LSNs.

OK, I have moved backupStartPointTLI.

+    * backupFromStandby indicates that the backup was taken on a standby. It is
+    * require to initialize recovery and set to false afterwards.
s/require/required/.

Fixed.

The term "backup recovery", that we've never used in the tree until
now as far as I know. Perhaps this recovery method should just be
referred as "recovery from backup"?

Well, "backup recovery" is less awkward, I think. For instance "backup
recovery field" vs "recovery from backup field".

By the way, there is another thing that this patch has forgotten: the
SQL functions that display data from the control file. Shouldn't
pg_control_recovery() be extended with the new fields? These fields
may be less critical than the other ones related to recovery, but I
suspect that showing them can become handy at least for debugging and
monitoring purposes.

I guess that depends on whether we reset them or not (discussion below).
Right now they would not be visible since by the time the user could log
on they would be reset.

Something in this area is that backupRecoveryRequired is the switch
controlling if the fields set by the recovery initialization. Could
it be actually useful to leave the other fields as they are and only
reset backupRecoveryRequired before the first control file update?
This would leave a trace of the backup history directly in the control
file.

Since the other recovery fields are cleared in ReachedEndOfBackup() this
would be a change from what we do now.

None of these fields are ever visible (with the exception of
minRecoveryPoint/TLI) since they are reset when the database becomes
consistent and before logons are allowed. Viewing them with
pg_controldata makes sense, but I'm not sure pg_control_recovery() does.

In fact, are backup_start_lsn, backup_end_lsn, and
end_of_backup_record_required ever non-zero when logged onto Postgres?
Maybe I'm missing something?

What about pg_resetwal and RewriteControlFile()? Shouldn't these
recovery fields be reset as well?

Done.

git diff --check is complaining a bit.

Fixed.

New patches attached based on eb81e8e790.

Regards,
-David

Attachments:

recovery-in-pgcontrol-v4-0001-pass-len-to-sendFileWithContent.patchtext/plain; charset=UTF-8; name=recovery-in-pgcontrol-v4-0001-pass-len-to-sendFileWithContent.patchDownload+11-12
recovery-in-pgcontrol-v4-0002-remove-backuplabel.patchtext/plain; charset=UTF-8; name=recovery-in-pgcontrol-v4-0002-remove-backuplabel.patchDownload+268-397
#14Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#13)
Re: Add recovery to pg_control and remove backup_label

On Fri, Nov 10, 2023 at 02:55:19PM -0400, David Steele wrote:

On 11/10/23 00:37, Michael Paquier wrote:

I've done a few more dozen runs, and still nothing. I am wondering
what this disturbance was.

OK, hopefully it was just a blip.

Still nothing on this side. So that seems really like a random blip
in the matrix.

This has been split out.

Thanks, applied 0001.

The term "backup recovery", that we've never used in the tree until
now as far as I know. Perhaps this recovery method should just be
referred as "recovery from backup"?

Well, "backup recovery" is less awkward, I think. For instance "backup
recovery field" vs "recovery from backup field".

Not sure. I've never used this term when referring to recovery from a
backup. Perhaps I'm just not used to it, still that sounds a bit
confusing here.

Something in this area is that backupRecoveryRequired is the switch
controlling if the fields set by the recovery initialization. Could
it be actually useful to leave the other fields as they are and only
reset backupRecoveryRequired before the first control file update?
This would leave a trace of the backup history directly in the control
file.

Since the other recovery fields are cleared in ReachedEndOfBackup() this
would be a change from what we do now.

None of these fields are ever visible (with the exception of
minRecoveryPoint/TLI) since they are reset when the database becomes
consistent and before logons are allowed. Viewing them with pg_controldata
makes sense, but I'm not sure pg_control_recovery() does.

In fact, are backup_start_lsn, backup_end_lsn, and
end_of_backup_record_required ever non-zero when logged onto Postgres? Maybe
I'm missing something?

Yeah, but custom backup/restore tools may want manipulate the contents
of the control file for their own work, so at least for the sake of
visibility it sounds important to me to show all the information at
hand, and that there is no need to.

-    The backup label
-    file includes the label string you gave to <function>pg_backup_start</function>,
+    The backup history file (which is archived like WAL) includes the label
+    string you gave to <function>pg_backup_start</function>,
     as well as the time at which <function>pg_backup_start</function> was run, and
     the name of the starting WAL file.  In case of confusion it is therefore
-    possible to look inside a backup file and determine exactly which
+    possible to look inside a backup history file and determine exactly which

As a side note, it is a bit disappointing that we lose the backup
label from the backup itself, even if the patch changes correctly the
documentation to reflect the new behavior. It is in the backup
history file on the node from where the base backup has been taken or
in the archives, hopefully. However there is nothing that remains in
the base backup itself, and backups can be self-contained (easy with
pg_basebackup --wal-method=stream). I think that we should retain a
minimum amount of information as a replacement for the backup_label,
at least. With this state, the current patch slightly reduces the
debuggability of deployments. That could be annoying for some users.

New patches attached based on eb81e8e790.

Diving into the code for references about the backup label file, I
have spotted this log in pg_rewind that is now incorrect:
if (showprogress)
pg_log_info("creating backup label and updating control file");

+    printf(_("Backup start location's timeline:     %u\n"),
+           ControlFile->backupStartPointTLI);
     printf(_("Backup end location:                  %X/%X\n"),
            LSN_FORMAT_ARGS(ControlFile->backupEndPoint));
Perhaps these two should be reversed to match with the header file.
+    /*
+     * After pg_backup_stop() returns this field will contain a copy of
+     * pg_control that should be stored with the backup. Fields have been
+     * updated for recovery and the CRC has been recalculated. The buffer
+     * is padded to PG_CONTROL_MAX_SAFE_SIZE so that pg_control is always
+     * a consistent size but smaller (and hopefully easier to handle) than
+     * PG_CONTROL_FILE_SIZE. Bytes after sizeof(ControlFileData) are zeroed.
+     */
+    uint8_t controlFile[PG_CONTROL_MAX_SAFE_SIZE];

I don't mind the addition of a control file with the max safe size,
because it will never be higher than that. However:

+                /* End the backup before sending pg_control */
+                basebackup_progress_wait_wal_archive(&state);
+                do_pg_backup_stop(backup_state, !opt->nowait);
+
+                /* Send copy of pg_control containing recovery info */
+                sendFileWithContent(sink, XLOG_CONTROL_FILE,
+                                    (char *)backup_state->controlFile,
+                                    PG_CONTROL_MAX_SAFE_SIZE, &manifest);

It seems to me that the base backup protocol should always send an 8k
file for the control file so as we maintain consistency with the
on-disk format. Currently, a base backup taken with this patch
results in a control file of size 512B.

+	/* Build the contents of pg_control */
+	pg_control_bytea = (bytea *) palloc(PG_CONTROL_MAX_SAFE_SIZE + VARHDRSZ);
+	SET_VARSIZE(pg_control_bytea, PG_CONTROL_MAX_SAFE_SIZE + VARHDRSZ);
+	memcpy(VARDATA(pg_control_bytea), backup_state->controlFile, PG_CONTROL_MAX_SAFE_SIZE);

Similar comment for the control file returned by pg_backup_stop(),
which could just be made a 8k field?

+     <function>pg_backup_stop</function> returns the
+     <filename>pg_control</filename> file, which must be stored in the
+     <filename>global</filename> directory of the backup. It also returns the

And perhaps emphasize that this file should be an 8kB file in the
paragraph mentioning the data returned by pg_backup_stop()?

-      Create a <filename>backup_label</filename> file to begin WAL replay at
+      Update <filename>pg_control</filename> file to begin WAL replay at
       the checkpoint created at failover and configure the
       <filename>pg_control</filename> file with a minimum consistency LSN

pg_control is mentioned twice, so perhaps this could be worded better?

PG_CONTROL_VERSION is important to not forget about.. Perhaps this
should be noted somewhere, or just changed in the patch itself.
Contrary to catalog changes, we do few of these in the control file so
there is close to zero risk of conflicts with other patches in the CF
app.
--
Michael

#15Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#10)
Re: Add recovery to pg_control and remove backup_label

(I am not exactly sure how, but we've lost pgsql-hackers on the way
when you sent v5. Now added back in CC with the two latest patches
you've proposed attached.)

Here is a short summary of what has been missed by the lists:
- I've commented that the patch should not create, not show up in
fields returned the SQL functions or stream control files with a size
of 512B, just stick to 8kB. If this is worth changing this should be
applied consistently across the board including initdb, discussed on
its own thread.
- The backup-related fields in the control file are reset at the end
of recovery. I've suggested to not do that to keep a trace of what
was happening during recovery. The latest version of the patch resets
the fields.
- With the backup_label file gone, we lose some information in the
backups themselves, which is not good. Instead, you have suggested an
approach where this data is added to the backup manifest, meaning that
no information would be lost, particularly useful for self-contained
backups. The fields planned to be added to the backup manifest are:
-- The start and end time of the backup, the end timestamp being
useful to know when stop time can be used for PITR.
-- The backup label.
I've agreed that it may be the best thing to do at this end to not
lose any data related to the removal of the backup_label file.

On Sun, Nov 19, 2023 at 02:14:32PM -0400, David Steele wrote:

On 11/15/23 20:03, Michael Paquier wrote:

As the label is only an informational field, the parsing added to
pg_verifybackup is not really needed because it is used nowhere in the
validation process, so keeping the logic simpler would be the way to
go IMO. This is contrary to the WAL range for example, where start
and end LSNs are used for validation with a pg_waldump command.
Robert, any comments about the addition of the label in the manifest?

I'm sure Robert will comment on this when he gets the time, but for now I
have backed off on passing the new info to pg_verifybackup and added
start/stop time.

FWIW, I'm OK with the bits for the backup manifest as presented. So
if there are no remarks and/or no objections, I'd like to apply it but
let give some room to others to comment on that as there's been a gap
in the emails exchanged on pgsql-hackers. I hope that the summary
I've posted above covers everything. So let's see about doing
something around the middle of next week. With Thanksgiving in the
US, a lot of folks will not have the time to monitor what's happening
on this thread.

+      The end time for the backup. This is when the backup was stopped in
+      <productname>PostgreSQL</productname> and represents the earliest time
+      that can be used for time-based Point-In-Time Recovery.

This one is actually a very good point. We'd lost this capacity with
the backup_label file gone without the end timestamps in the control
file.

New patches attached based on b218fbb7.

I've noticed on the other thread the remark about being less
aggressive with the fields related to recovery in the control file, so
I assume that this patch should leave the fields be after the end of
recovery from the start and only rely on backupRecoveryRequired to
decide if the recovery should use the fields or not:
/messages/by-id/241ccde1-1928-4ba2-a0bb-5350f7b191a8@=pgmasters.net

+	ControlFile->backupCheckPoint = InvalidXLogRecPtr;
 	ControlFile->backupStartPoint = InvalidXLogRecPtr;
+	ControlFile->backupStartPointTLI = 0;
 	ControlFile->backupEndPoint = InvalidXLogRecPtr;
+	ControlFile->backupFromStandby = false;
 	ControlFile->backupEndRequired = false;

Still, I get the temptation of being consistent with the current style
on HEAD to reset everything, as well..
--
Michael

Attachments:

recovery-in-pgcontrol-v7-0001-add-info-to-manifest.patchtext/x-diff; charset=us-asciiDownload+157-5
recovery-in-pgcontrol-v7-0002-remove-backuplabel.patchtext/x-diff; charset=us-asciiDownload+293-410
#16David Steele
david@pgmasters.net
In reply to: Michael Paquier (#15)
Re: Add recovery to pg_control and remove backup_label

On 11/19/23 21:15, Michael Paquier wrote:

(I am not exactly sure how, but we've lost pgsql-hackers on the way
when you sent v5. Now added back in CC with the two latest patches
you've proposed attached.)

Ugh, I must have hit reply instead of reply all. It's a rookie error and
you hate to see it.

Here is a short summary of what has been missed by the lists:
- I've commented that the patch should not create, not show up in
fields returned the SQL functions or stream control files with a size
of 512B, just stick to 8kB. If this is worth changing this should be
applied consistently across the board including initdb, discussed on
its own thread.
- The backup-related fields in the control file are reset at the end
of recovery. I've suggested to not do that to keep a trace of what
was happening during recovery. The latest version of the patch resets
the fields.
- With the backup_label file gone, we lose some information in the
backups themselves, which is not good. Instead, you have suggested an
approach where this data is added to the backup manifest, meaning that
no information would be lost, particularly useful for self-contained
backups. The fields planned to be added to the backup manifest are:
-- The start and end time of the backup, the end timestamp being
useful to know when stop time can be used for PITR.
-- The backup label.
I've agreed that it may be the best thing to do at this end to not
lose any data related to the removal of the backup_label file.

This looks right to me.

On Sun, Nov 19, 2023 at 02:14:32PM -0400, David Steele wrote:

On 11/15/23 20:03, Michael Paquier wrote:

As the label is only an informational field, the parsing added to
pg_verifybackup is not really needed because it is used nowhere in the
validation process, so keeping the logic simpler would be the way to
go IMO. This is contrary to the WAL range for example, where start
and end LSNs are used for validation with a pg_waldump command.
Robert, any comments about the addition of the label in the manifest?

I'm sure Robert will comment on this when he gets the time, but for now I
have backed off on passing the new info to pg_verifybackup and added
start/stop time.

FWIW, I'm OK with the bits for the backup manifest as presented. So
if there are no remarks and/or no objections, I'd like to apply it but
let give some room to others to comment on that as there's been a gap
in the emails exchanged on pgsql-hackers. I hope that the summary
I've posted above covers everything. So let's see about doing
something around the middle of next week. With Thanksgiving in the
US, a lot of folks will not have the time to monitor what's happening
on this thread.

Timing sounds good to me.

+      The end time for the backup. This is when the backup was stopped in
+      <productname>PostgreSQL</productname> and represents the earliest time
+      that can be used for time-based Point-In-Time Recovery.

This one is actually a very good point. We'd lost this capacity with
the backup_label file gone without the end timestamps in the control
file.

Yeah, the end time is very important for recovery scenarios. We
definitely need that recorded somewhere.

I've noticed on the other thread the remark about being less
aggressive with the fields related to recovery in the control file, so
I assume that this patch should leave the fields be after the end of
recovery from the start and only rely on backupRecoveryRequired to
decide if the recovery should use the fields or not:
/messages/by-id/241ccde1-1928-4ba2-a0bb-5350f7b191a8@=pgmasters.net

+	ControlFile->backupCheckPoint = InvalidXLogRecPtr;
ControlFile->backupStartPoint = InvalidXLogRecPtr;
+	ControlFile->backupStartPointTLI = 0;
ControlFile->backupEndPoint = InvalidXLogRecPtr;
+	ControlFile->backupFromStandby = false;
ControlFile->backupEndRequired = false;

Still, I get the temptation of being consistent with the current style
on HEAD to reset everything, as well..

I'd rather reset everything for now (as we do now) and think about
keeping these values as a separate patch. It may be that we don't want
to keep all of them, or we need a separate flag to say recovery was
completed. We are accumulating a lot of booleans here, maybe we need a
state var (recoveryRequired, recoveryInProgress, recoveryComplete) and
then define which other vars are valid in each state.

Regards,
-David

#17Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#15)
Re: Add recovery to pg_control and remove backup_label

On Sun, Nov 19, 2023 at 8:16 PM Michael Paquier <michael@paquier.xyz> wrote:

(I am not exactly sure how, but we've lost pgsql-hackers on the way
when you sent v5. Now added back in CC with the two latest patches
you've proposed attached.)

Here is a short summary of what has been missed by the lists:
- I've commented that the patch should not create, not show up in
fields returned the SQL functions or stream control files with a size
of 512B, just stick to 8kB. If this is worth changing this should be
applied consistently across the board including initdb, discussed on
its own thread.
- The backup-related fields in the control file are reset at the end
of recovery. I've suggested to not do that to keep a trace of what
was happening during recovery. The latest version of the patch resets
the fields.
- With the backup_label file gone, we lose some information in the
backups themselves, which is not good. Instead, you have suggested an
approach where this data is added to the backup manifest, meaning that
no information would be lost, particularly useful for self-contained
backups. The fields planned to be added to the backup manifest are:
-- The start and end time of the backup, the end timestamp being
useful to know when stop time can be used for PITR.
-- The backup label.
I've agreed that it may be the best thing to do at this end to not
lose any data related to the removal of the backup_label file.

I think we need more votes to make a change this big. I have a
concern, which I think I've expressed before, that we keep whacking
around the backup APIs, and that has a cost which is potentially
larger than the benefits. The last time we changed the API, we changed
pg_stop_backup to pg_backup_stop, but this doesn't do that, and I
wonder if that's OK. Even if it is, do we really want to change this
API around again after such a short time?

That said, I don't have an intrinsic problem with moving this
information from the backup_label to the backup_manifest file since it
is purely informational. I do think there should perhaps be some
additions to the test cases, though.

I am concerned about the interaction of this proposal with incremental
backup. When you take an incremental backup, you get something that
looks a lot like a usable data directory but isn't. To prevent that
from causing avoidable disasters, the present version of the patch
adds INCREMENTAL FROM LSN and INCREMENTAL FROM TLI fields to the
backup_label. pg_combinebackup knows to look for those fields, and the
server knows that if they are present it should refuse to start. With
this change, though, I suppose those fields would end up in
pg_control. But that does not feel entirely great, because we have a
goal of keeping the amount of real data in pg_control below 512 bytes,
the traditional sector size, and this adds another 12 bytes of stuff
to that file that currently doesn't need to be there. I feel like
that's kind of a problem.

But my main point here is ... if we have a few more senior hackers
weigh in and vote in favor of this change, well then that's one thing.
But IMHO a discussion that's mostly between 2 people is not nearly a
strong enough consensus to justify this amount of disruption.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18David Steele
david@pgmasters.net
In reply to: Robert Haas (#17)
Re: Add recovery to pg_control and remove backup_label

On 11/20/23 12:11, Robert Haas wrote:

On Sun, Nov 19, 2023 at 8:16 PM Michael Paquier <michael@paquier.xyz> wrote:

(I am not exactly sure how, but we've lost pgsql-hackers on the way
when you sent v5. Now added back in CC with the two latest patches
you've proposed attached.)

Here is a short summary of what has been missed by the lists:
- I've commented that the patch should not create, not show up in
fields returned the SQL functions or stream control files with a size
of 512B, just stick to 8kB. If this is worth changing this should be
applied consistently across the board including initdb, discussed on
its own thread.
- The backup-related fields in the control file are reset at the end
of recovery. I've suggested to not do that to keep a trace of what
was happening during recovery. The latest version of the patch resets
the fields.
- With the backup_label file gone, we lose some information in the
backups themselves, which is not good. Instead, you have suggested an
approach where this data is added to the backup manifest, meaning that
no information would be lost, particularly useful for self-contained
backups. The fields planned to be added to the backup manifest are:
-- The start and end time of the backup, the end timestamp being
useful to know when stop time can be used for PITR.
-- The backup label.
I've agreed that it may be the best thing to do at this end to not
lose any data related to the removal of the backup_label file.

I think we need more votes to make a change this big. I have a
concern, which I think I've expressed before, that we keep whacking
around the backup APIs, and that has a cost which is potentially
larger than the benefits.

From my perspective it's not that big a change for backup software but
it does bring a lot of benefits, including fixing an outstanding bug in
Postgres, i.e. reading pg_control without getting a torn copy.

The last time we changed the API, we changed
pg_stop_backup to pg_backup_stop, but this doesn't do that, and I
wonder if that's OK. Even if it is, do we really want to change this
API around again after such a short time?

This is a good point. We could just rename again, but not sure what
names to go for this time. OTOH if the backup software is selecting
fields then they will get an error because the names have changed. If
the software is grabbing fields by position then they'll get a
valid-looking result (even if querying by position is a terrible idea).

Another thing we could do is explicitly error if we see backup_label in
PGDATA during recovery. That's just a few lines of code so would not be
a big deal to maintain. This error would only be visible on restore, so
it presumes that backup software is being tested.

Maybe just a rename to something like pg_backup_begin/end would be the
way to go.

That said, I don't have an intrinsic problem with moving this
information from the backup_label to the backup_manifest file since it
is purely informational. I do think there should perhaps be some
additions to the test cases, though.

A little hard to add to the tests, I think, since they are purely
informational, i.e. not pushed up by the parser. Maybe we could just
grep for the fields?

I am concerned about the interaction of this proposal with incremental
backup. When you take an incremental backup, you get something that
looks a lot like a usable data directory but isn't. To prevent that
from causing avoidable disasters, the present version of the patch
adds INCREMENTAL FROM LSN and INCREMENTAL FROM TLI fields to the
backup_label. pg_combinebackup knows to look for those fields, and the
server knows that if they are present it should refuse to start. With
this change, though, I suppose those fields would end up in
pg_control. But that does not feel entirely great, because we have a
goal of keeping the amount of real data in pg_control below 512 bytes,
the traditional sector size, and this adds another 12 bytes of stuff
to that file that currently doesn't need to be there. I feel like
that's kind of a problem.

I think these fields would be handled the same as the rest of the fields
in backup_label: returned from pg_backup_stop() and also stored in
backup_manifest. Third-party software can do as they like with them and
pg_combinebackup can just read from backup_manifest.

As for the pg_control file -- it might be best to give it a different
name for backups that are not essentially copies of PGDATA. On the other
hand, pgBackRest has included pg_control in incremental backups since
day one and we've never had a user mistakenly do a manual restore of one
and cause a problem (though manual restores are not the norm). Still,
probably can't hurt to be a bit careful.

But my main point here is ... if we have a few more senior hackers
weigh in and vote in favor of this change, well then that's one thing.
But IMHO a discussion that's mostly between 2 people is not nearly a
strong enough consensus to justify this amount of disruption.

We absolutely need more people to look at this and sign off. I'm glad
they have not so far because it has allowed time to whack the patch
around and get it into better shape.

Regards,
-David

#19Robert Haas
robertmhaas@gmail.com
In reply to: David Steele (#18)
Re: Add recovery to pg_control and remove backup_label

On Mon, Nov 20, 2023 at 12:54 PM David Steele <david@pgmasters.net> wrote:

Another thing we could do is explicitly error if we see backup_label in
PGDATA during recovery. That's just a few lines of code so would not be
a big deal to maintain. This error would only be visible on restore, so
it presumes that backup software is being tested.

I think that if we do decide to adopt this proposal, that would be a
smart precaution.

A little hard to add to the tests, I think, since they are purely
informational, i.e. not pushed up by the parser. Maybe we could just
grep for the fields?

Hmm. Or should they be pushed up by the parser?

I think these fields would be handled the same as the rest of the fields
in backup_label: returned from pg_backup_stop() and also stored in
backup_manifest. Third-party software can do as they like with them and
pg_combinebackup can just read from backup_manifest.

I think that would be a bad plan, because this is critical
information, and a backup manifest is not a thing that you're required
to have. It's not a natural fit at all. We don't want to create a
situation where if you nuke the backup_manifest then the server
forgets that what it has is an incremental backup rather than a usable
data directory.

We absolutely need more people to look at this and sign off. I'm glad
they have not so far because it has allowed time to whack the patch
around and get it into better shape.

Cool.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20David Steele
david@pgmasters.net
In reply to: Robert Haas (#19)
Re: Add recovery to pg_control and remove backup_label

On 11/20/23 14:44, Robert Haas wrote:

On Mon, Nov 20, 2023 at 12:54 PM David Steele <david@pgmasters.net> wrote:

Another thing we could do is explicitly error if we see backup_label in
PGDATA during recovery. That's just a few lines of code so would not be
a big deal to maintain. This error would only be visible on restore, so
it presumes that backup software is being tested.

I think that if we do decide to adopt this proposal, that would be a
smart precaution.

I'd be OK with it -- what do you think, Michael? Would this be enough
that we would not need to rename the functions, or should we just go
with the rename?

A little hard to add to the tests, I think, since they are purely
informational, i.e. not pushed up by the parser. Maybe we could just
grep for the fields?

Hmm. Or should they be pushed up by the parser?

We could do that. I started on that road, but it's a lot of code for
fields that aren't used. I think it would be better if the parser also
loaded a data structure that represented the manifest. Seems to me
there's a lot of duplicated code between pg_verifybackup and
pg_combinebackup the way it is now.

I think these fields would be handled the same as the rest of the fields
in backup_label: returned from pg_backup_stop() and also stored in
backup_manifest. Third-party software can do as they like with them and
pg_combinebackup can just read from backup_manifest.

I think that would be a bad plan, because this is critical
information, and a backup manifest is not a thing that you're required
to have. It's not a natural fit at all. We don't want to create a
situation where if you nuke the backup_manifest then the server
forgets that what it has is an incremental backup rather than a usable
data directory.

I can't see why a backup would continue to be valid without a manifest
-- that's not very standard for backup software. If you have the
critical info in backup_label, you can't afford to lose that, so why
should backup_manifest be any different?

Regards,
-David

#21Robert Haas
robertmhaas@gmail.com
In reply to: David Steele (#20)
#22David Steele
david@pgmasters.net
In reply to: Robert Haas (#21)
#23Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#17)
#24Andres Freund
andres@anarazel.de
In reply to: David Steele (#22)
#25David G. Johnston
david.g.johnston@gmail.com
In reply to: Andres Freund (#23)
#26Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#17)
#27Andres Freund
andres@anarazel.de
In reply to: David G. Johnston (#25)
#28Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#23)
#29Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#28)
#30Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#29)
#31David Steele
david@pgmasters.net
In reply to: Andres Freund (#29)
#32Andres Freund
andres@anarazel.de
In reply to: David Steele (#31)
#33David Steele
david@pgmasters.net
In reply to: Andres Freund (#32)
#34David Steele
david@pgmasters.net
In reply to: Andres Freund (#24)
#35Andres Freund
andres@anarazel.de
In reply to: David Steele (#34)
#36David Steele
david@pgmasters.net
In reply to: Andres Freund (#23)
#37David Steele
david@pgmasters.net
In reply to: Andres Freund (#35)
#38Andres Freund
andres@anarazel.de
In reply to: David Steele (#37)
#39David Steele
david@pgmasters.net
In reply to: Andres Freund (#38)
#40Stephen Frost
sfrost@snowman.net
In reply to: David Steele (#33)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#40)
#42Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#41)
#43vignesh C
vignesh21@gmail.com
In reply to: Michael Paquier (#15)
#44Michael Paquier
michael@paquier.xyz
In reply to: vignesh C (#43)
#45David Steele
david@pgmasters.net
In reply to: Michael Paquier (#44)
#46David Steele
david@pgmasters.net
In reply to: David Steele (#45)
#47Stefan Fercot
stefan.fercot@protonmail.com
In reply to: David Steele (#46)
#48David Steele
david@pgmasters.net
In reply to: Stefan Fercot (#47)