Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?
Hi,
After the commit [1]commit 39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a Author: Stephen Frost <sfrost@snowman.net> Date: Wed Apr 6 14:41:03 2022 -0400, is it correct to say errmsg("invalid data in file
\"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the
contents in backend global memory, not actually reading from backup_label
file? However, it is correct to say that in read_backup_label.
IMO, we can either say "invalid backup_label contents found" or we can be
more descriptive and say "invalid "START WAL LOCATION" line found in
backup_label content" and "invalid "BACKUP FROM" line found in
backup_label content" and so on.
Thoughts?
[1]: commit 39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a Author: Stephen Frost <sfrost@snowman.net> Date: Wed Apr 6 14:41:03 2022 -0400
commit 39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a
Author: Stephen Frost <sfrost@snowman.net>
Date: Wed Apr 6 14:41:03 2022 -0400
Remove exclusive backup mode
errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
Regards,
Bharath Rupireddy.
At Wed, 20 Jul 2022 17:09:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
Hi,
After the commit [1], is it correct to say errmsg("invalid data in file
\"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the
contents in backend global memory, not actually reading from backup_label
file? However, it is correct to say that in read_backup_label.IMO, we can either say "invalid backup_label contents found" or we can be
more descriptive and say "invalid "START WAL LOCATION" line found in
backup_label content" and "invalid "BACKUP FROM" line found in
backup_label content" and so on.Thoughts?
Previously there the case the "char *labelfile" is loaded from a file,
but currently it is alwasy a string build on the process. In that
sense, nowadays it is a kind of internal error, which I think is not
supposed to be exposed to users.
So I think we can leave the code alone to avoid back-patching
obstacles. But if we decided to change the code around, I'd like to
change the string into a C struct, so that we don't need to parse it.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, Jul 21, 2022 at 2:33 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Wed, 20 Jul 2022 17:09:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
Hi,
After the commit [1], is it correct to say errmsg("invalid data in file
\"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the
contents in backend global memory, not actually reading from backup_label
file? However, it is correct to say that in read_backup_label.IMO, we can either say "invalid backup_label contents found" or we can be
more descriptive and say "invalid "START WAL LOCATION" line found in
backup_label content" and "invalid "BACKUP FROM" line found in
backup_label content" and so on.Thoughts?
Previously there the case the "char *labelfile" is loaded from a file,
but currently it is alwasy a string build on the process. In that
sense, nowadays it is a kind of internal error, which I think is not
supposed to be exposed to users.So I think we can leave the code alone to avoid back-patching
obstacles. But if we decided to change the code around, I'd like to
change the string into a C struct, so that we don't need to parse it.
Hm. I think we must take this opportunity to clean it up. You are
right, we don't need to parse the label file contents (just like we
used to do previously after reading it from the file) in
do_pg_backup_stop(), instead we can just pass a structure. Also,
do_pg_backup_stop() isn't modifying any labelfile contents, but using
startxlogfilename, startpoint and backupfrom from the labelfile
contents. I think this information can easily be passed as a single
structure. In fact, I might think a bit more here and wrap label_file,
tblspc_map_file to a single structure something like below and pass it
across the functions.
typedef struct BackupState
{
StringInfo label_file;
StringInfo tblspc_map_file;
char startxlogfilename[MAXFNAMELEN];
XLogRecPtr startpoint;
char backupfrom[20];
} BackupState;
This way, the code is more readable, structured and we can remove 2
sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
strstr() call. Only thing is that it creates code diff from the
previous PG versions which is fine IMO. If okay, I'm happy to prepare
a patch.
Thoughts?
Regards,
Bharath Rupireddy.
At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
Hm. I think we must take this opportunity to clean it up. You are
right, we don't need to parse the label file contents (just like we
used to do previously after reading it from the file) in
do_pg_backup_stop(), instead we can just pass a structure. Also,
do_pg_backup_stop() isn't modifying any labelfile contents, but using
startxlogfilename, startpoint and backupfrom from the labelfile
contents. I think this information can easily be passed as a single
structure. In fact, I might think a bit more here and wrap label_file,
tblspc_map_file to a single structure something like below and pass it
across the functions.typedef struct BackupState
{
StringInfo label_file;
StringInfo tblspc_map_file;
char startxlogfilename[MAXFNAMELEN];
XLogRecPtr startpoint;
char backupfrom[20];
} BackupState;This way, the code is more readable, structured and we can remove 2
sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
strstr() call. Only thing is that it creates code diff from the
previous PG versions which is fine IMO. If okay, I'm happy to prepare
a patch.Thoughts?
It is more or less what was in my mind, but it seems that we don't
need StringInfo there, or should avoid it to signal the strings are
not editable.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 7/25/22 22:49, Kyotaro Horiguchi wrote:
At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
Hm. I think we must take this opportunity to clean it up. You are
right, we don't need to parse the label file contents (just like we
used to do previously after reading it from the file) in
do_pg_backup_stop(), instead we can just pass a structure. Also,
do_pg_backup_stop() isn't modifying any labelfile contents, but using
startxlogfilename, startpoint and backupfrom from the labelfile
contents. I think this information can easily be passed as a single
structure. In fact, I might think a bit more here and wrap label_file,
tblspc_map_file to a single structure something like below and pass it
across the functions.typedef struct BackupState
{
StringInfo label_file;
StringInfo tblspc_map_file;
char startxlogfilename[MAXFNAMELEN];
XLogRecPtr startpoint;
char backupfrom[20];
} BackupState;This way, the code is more readable, structured and we can remove 2
sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
strstr() call. Only thing is that it creates code diff from the
previous PG versions which is fine IMO. If okay, I'm happy to prepare
a patch.Thoughts?
It is more or less what was in my mind, but it seems that we don't
need StringInfo there, or should avoid it to signal the strings are
not editable.
I would prefer to have all the components of backup_label stored
separately and then generate backup_label from them in pg_backup_stop().
For PG16 I am planning to add some fields to backup_label that are not
known when pg_backup_start() is called, e.g. min recovery time.
Regards,
-David
On Tue, Jul 26, 2022 at 5:22 PM David Steele <david@pgmasters.net> wrote:
On 7/25/22 22:49, Kyotaro Horiguchi wrote:
At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
Hm. I think we must take this opportunity to clean it up. You are
right, we don't need to parse the label file contents (just like we
used to do previously after reading it from the file) in
do_pg_backup_stop(), instead we can just pass a structure. Also,
do_pg_backup_stop() isn't modifying any labelfile contents, but using
startxlogfilename, startpoint and backupfrom from the labelfile
contents. I think this information can easily be passed as a single
structure. In fact, I might think a bit more here and wrap label_file,
tblspc_map_file to a single structure something like below and pass it
across the functions.typedef struct BackupState
{
StringInfo label_file;
StringInfo tblspc_map_file;
char startxlogfilename[MAXFNAMELEN];
XLogRecPtr startpoint;
char backupfrom[20];
} BackupState;This way, the code is more readable, structured and we can remove 2
sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
strstr() call. Only thing is that it creates code diff from the
previous PG versions which is fine IMO. If okay, I'm happy to prepare
a patch.Thoughts?
It is more or less what was in my mind, but it seems that we don't
need StringInfo there, or should avoid it to signal the strings are
not editable.I would prefer to have all the components of backup_label stored
separately and then generate backup_label from them in pg_backup_stop().
+1, because pg_backup_stop is the one that's returning backup_label
contents, so it does make sense for it to prepare it once and for all
and return.
For PG16 I am planning to add some fields to backup_label that are not
known when pg_backup_start() is called, e.g. min recovery time.
Can you please point to your patch that does above?
Yes, right now, backup_label or tablespace_map contents are being
filled in by pg_backup_start and are never changed again. But if your
above proposal is for fixing some issue, then it would make sense for
us to carry all the info in a structure to pg_backup_stop and then let
it prepare the backup_label and tablespace_map contents.
If the approach is okay for the hackers, I would like to spend time on it.
Regards,
Bharath Rupireddy.
On 7/26/22 07:59, Bharath Rupireddy wrote:
On Tue, Jul 26, 2022 at 5:22 PM David Steele <david@pgmasters.net> wrote:
On 7/25/22 22:49, Kyotaro Horiguchi wrote:
At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
Hm. I think we must take this opportunity to clean it up. You are
right, we don't need to parse the label file contents (just like we
used to do previously after reading it from the file) in
do_pg_backup_stop(), instead we can just pass a structure. Also,
do_pg_backup_stop() isn't modifying any labelfile contents, but using
startxlogfilename, startpoint and backupfrom from the labelfile
contents. I think this information can easily be passed as a single
structure. In fact, I might think a bit more here and wrap label_file,
tblspc_map_file to a single structure something like below and pass it
across the functions.typedef struct BackupState
{
StringInfo label_file;
StringInfo tblspc_map_file;
char startxlogfilename[MAXFNAMELEN];
XLogRecPtr startpoint;
char backupfrom[20];
} BackupState;This way, the code is more readable, structured and we can remove 2
sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
strstr() call. Only thing is that it creates code diff from the
previous PG versions which is fine IMO. If okay, I'm happy to prepare
a patch.Thoughts?
It is more or less what was in my mind, but it seems that we don't
need StringInfo there, or should avoid it to signal the strings are
not editable.I would prefer to have all the components of backup_label stored
separately and then generate backup_label from them in pg_backup_stop().+1, because pg_backup_stop is the one that's returning backup_label
contents, so it does make sense for it to prepare it once and for all
and return.For PG16 I am planning to add some fields to backup_label that are not
known when pg_backup_start() is called, e.g. min recovery time.Can you please point to your patch that does above?
Currently it is a plan, not a patch. So there is nothing to show yet.
Yes, right now, backup_label or tablespace_map contents are being
filled in by pg_backup_start and are never changed again. But if your
above proposal is for fixing some issue, then it would make sense for
us to carry all the info in a structure to pg_backup_stop and then let
it prepare the backup_label and tablespace_map contents.
I think this makes sense even if I don't get these changes into PG16.
If the approach is okay for the hackers, I would like to spend time on it.
+1 from me.
Regards,
-David
On Tue, Jul 26, 2022 at 5:50 PM David Steele <david@pgmasters.net> wrote:
I would prefer to have all the components of backup_label stored
separately and then generate backup_label from them in pg_backup_stop().+1, because pg_backup_stop is the one that's returning backup_label
contents, so it does make sense for it to prepare it once and for all
and return.For PG16 I am planning to add some fields to backup_label that are not
known when pg_backup_start() is called, e.g. min recovery time.Can you please point to your patch that does above?
Currently it is a plan, not a patch. So there is nothing to show yet.
Yes, right now, backup_label or tablespace_map contents are being
filled in by pg_backup_start and are never changed again. But if your
above proposal is for fixing some issue, then it would make sense for
us to carry all the info in a structure to pg_backup_stop and then let
it prepare the backup_label and tablespace_map contents.I think this makes sense even if I don't get these changes into PG16.
If the approach is okay for the hackers, I would like to spend time on it.
+1 from me.
Here comes the v1 patch. This patch tries to refactor backup related
code, advantages of doing so are following:
1) backup state is more structured now - all in a single structure,
callers can create backup_label contents whenever required, either
during the pg_backup_start or the pg_backup_stop or in between.
2) no parsing of backup_label file lines now in pg_backup_stop, no
error checking for invalid parsing.
3) backup_label and history file contents have most of the things in
common, they can now be created within a single function.
4) makes backup related code extensible and readable.
One downside is that it creates a lot of diff with previous versions.
Please review.
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/
Attachments:
v1-0001-Refactor-backup-related-code.patchapplication/x-patch; name=v1-0001-Refactor-backup-related-code.patchDownload+234-201
On Sat, Jul 30, 2022 at 5:37 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Jul 26, 2022 at 5:50 PM David Steele <david@pgmasters.net> wrote:
I would prefer to have all the components of backup_label stored
separately and then generate backup_label from them in pg_backup_stop().+1, because pg_backup_stop is the one that's returning backup_label
contents, so it does make sense for it to prepare it once and for all
and return.For PG16 I am planning to add some fields to backup_label that are not
known when pg_backup_start() is called, e.g. min recovery time.Can you please point to your patch that does above?
Currently it is a plan, not a patch. So there is nothing to show yet.
Yes, right now, backup_label or tablespace_map contents are being
filled in by pg_backup_start and are never changed again. But if your
above proposal is for fixing some issue, then it would make sense for
us to carry all the info in a structure to pg_backup_stop and then let
it prepare the backup_label and tablespace_map contents.I think this makes sense even if I don't get these changes into PG16.
If the approach is okay for the hackers, I would like to spend time on it.
+1 from me.
Here comes the v1 patch. This patch tries to refactor backup related
code, advantages of doing so are following:1) backup state is more structured now - all in a single structure,
callers can create backup_label contents whenever required, either
during the pg_backup_start or the pg_backup_stop or in between.
2) no parsing of backup_label file lines now in pg_backup_stop, no
error checking for invalid parsing.
3) backup_label and history file contents have most of the things in
common, they can now be created within a single function.
4) makes backup related code extensible and readable.One downside is that it creates a lot of diff with previous versions.
Please review.
I added this to current CF - https://commitfest.postgresql.org/39/3808/
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
On Mon, Aug 8, 2022 at 7:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Please review.
I added this to current CF - https://commitfest.postgresql.org/39/3808/
Here's the v2 patch, no change from v1, just rebased because of commit
a8c012869763c711abc9085f54b2a100b60a85fa (Move basebackup code to new
directory src/backend/backup).
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachments:
v2-0001-Refactor-backup-related-code.patchapplication/octet-stream; name=v2-0001-Refactor-backup-related-code.patchDownload+234-201
On Thu, Aug 11, 2022 at 09:55:13AM +0530, Bharath Rupireddy wrote:
Here's the v2 patch, no change from v1, just rebased because of commit
a8c012869763c711abc9085f54b2a100b60a85fa (Move basebackup code to new
directory src/backend/backup).
I was skimming at this patch, and indeed it is a bit crazy to write
the generate the contents of the backup_label file at backup start,
just to parse them again at backup stop with these extra sscan calls.
-#define PG_STOP_BACKUP_V2_COLS 3
+#define PG_BACKUP_STOP_V2_COLS 3
It seems to me that such changes, while they make sense with the new
naming of the backup start/stop functions are unrelated to what you
are trying to solve primarily here. This justifies a separate
cleanup, but I am perhaps overly-pedantic here :)
--
Michael
On Mon, Sep 12, 2022 at 1:12 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Aug 11, 2022 at 09:55:13AM +0530, Bharath Rupireddy wrote:
Here's the v2 patch, no change from v1, just rebased because of commit
a8c012869763c711abc9085f54b2a100b60a85fa (Move basebackup code to new
directory src/backend/backup).I was skimming at this patch, and indeed it is a bit crazy to write
the generate the contents of the backup_label file at backup start,
just to parse them again at backup stop with these extra sscan calls.
Thanks for taking a look at the patch.
-#define PG_STOP_BACKUP_V2_COLS 3
+#define PG_BACKUP_STOP_V2_COLS 3
It seems to me that such changes, while they make sense with the new
naming of the backup start/stop functions are unrelated to what you
are trying to solve primarily here. This justifies a separate
cleanup, but I am perhaps overly-pedantic here :)
I've posted a separate patch [1]/messages/by-id/CALj2ACXjvC28ppeDTCrfaSyHga0ggP5nRLJbsjx=7N-74UT4QA@mail.gmail.com to adjust the macro name alone.
Please review the attached v3 patch after removing the above macro changes.
[1]: /messages/by-id/CALj2ACXjvC28ppeDTCrfaSyHga0ggP5nRLJbsjx=7N-74UT4QA@mail.gmail.com
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Refactor-backup-related-code.patchapplication/octet-stream; name=v3-0001-Refactor-backup-related-code.patchDownload+231-198
On Mon, Sep 12, 2022 at 5:09 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Please review the attached v3 patch after removing the above macro changes.
I'm attaching the v4 patch that's rebased on to the latest HEAD.
Please consider this for review.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v4-0001-Refactor-backup-related-code.patchapplication/x-patch; name=v4-0001-Refactor-backup-related-code.patchDownload+231-198
On Wed, Sep 14, 2022 at 02:24:12PM +0530, Bharath Rupireddy wrote:
I'm attaching the v4 patch that's rebased on to the latest HEAD.
Please consider this for review.
I have been looking at this patch.
- StringInfo labelfile;
- StringInfo tblspc_map_file;
backup_manifest_info manifest;
+ BackupState backup_state;
You could use initialize the state here with a {0}. That's simpler.
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -380,6 +380,31 @@ GetRmgr(RmgrId rmid)
}
#endif
+/* Structure to hold backup state. */
+typedef struct BackupStateData
+{
Why is that in xlog_internal.h? This header includes a lot of
declarations about the internals of WAL, but the backup state is not
that internal. I'd like to think that we should begin separating the
backup-related routines into their own file, as of a set of
xlogbackup.c/h in this case. That's a split I have been wondering
about for some time now. The internals of xlog.c for the start/stop
backups are tied to XLogCtlData which map such a switch more
complicated than it looks, so we can begin small and have the routines
to create, free and build the label file and the tablespace map in
this new file.
+ state->name = (char *) palloc0(len + 1);
+ memcpy(state->name, name, len);
Or just pstrdup()?
+BackupState
+get_backup_state(const char *name)
+{
I would name this one create_backup_state() instead.
+void
+create_backup_content_str(BackupState state, bool forhistoryfile)
+{
This could be a build_backup_content().
It seems to me that there is no point in having the list of
tablespaces in BackupStateData? This actually makes the code harder
to follow, see for example the changes with do_pg_backup_start(), we
the list of tablespace may or may be not passed down as a pointer of
BackupStateData while BackupStateData is itself the first argument of
this routine. These are independent from the label and backup history
file, as well.
We need to be careful about the file format (see read_backup_label()),
and create_backup_content_str() is careful about that which is good.
Core does not care about the format of the backup history file, though
some community tools may. I agree that what you are proposing here
makes the generation of these files easier to follow, but let's
document what forhistoryfile is here for, at least. Saving the
the backup label and history file strings in BackupState is a
confusing interface IMO. It would be more intuitive to have the
backup state in input, and provide the string generated in output
depending on what we want from the backup state.
- backup_started_in_recovery = RecoveryInProgress();
+ Assert(state != NULL);
+
+ in_recovery = RecoveryInProgress();
[...]
- if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
+ if (state->started_in_recovery == true && in_recovery == false)
I would have kept the naming to backup_started_in_recovery here. What
you are doing is logically right by relying on started_in_recovery to
check if recovery was running when the backup started, but this just
creates useless noise in the refactoring.
Something unrelated to your patch that I am noticing while scanning
the area is that we have been always lazy in freeing the label file
data allocated in TopMemoryContext when using the SQL functions if the
backup is aborted. We are not talking about this much amount of
memory each time a backup is performed, but that could be a cause for
memory bloat in a server if the same session is used and backups keep
failing, as the data is freed only on a successful pg_backup_stop().
Base backups through the replication protocol don't care about that as
the code keeps around the same pointer for the whole duration of
perform_base_backup(). Trying to tie the cleanup of the label file
with the abort phase would be the cause of more complications with
do_pg_backup_stop(), and xlog.c has no need to know about that now.
Just a remark for later.
--
Michael
On Fri, Sep 16, 2022 at 12:01 PM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Sep 14, 2022 at 02:24:12PM +0530, Bharath Rupireddy wrote:
I'm attaching the v4 patch that's rebased on to the latest HEAD.
Please consider this for review.I have been looking at this patch.
Thanks for reviewing it.
- StringInfo labelfile; - StringInfo tblspc_map_file; backup_manifest_info manifest; + BackupState backup_state; You could use initialize the state here with a {0}. That's simpler.
BackupState is a pointer to BackupStateData, we can't initialize that
way. However, I got rid of BackupStateData and used BackupState for
the structure directly, whenever pointer to the structure is required,
I'm using BackupState * to be more clear.
--- a/src/include/access/xlog_internal.h +++ b/src/include/access/xlog_internal.h @@ -380,6 +380,31 @@ GetRmgr(RmgrId rmid) } #endif+/* Structure to hold backup state. */
+typedef struct BackupStateData
+{
Why is that in xlog_internal.h? This header includes a lot of
declarations about the internals of WAL, but the backup state is not
that internal. I'd like to think that we should begin separating the
backup-related routines into their own file, as of a set of
xlogbackup.c/h in this case. That's a split I have been wondering
about for some time now. The internals of xlog.c for the start/stop
backups are tied to XLogCtlData which map such a switch more
complicated than it looks, so we can begin small and have the routines
to create, free and build the label file and the tablespace map in
this new file.
Good idea. It makes a lot more sense to me, because xlog.c is already
a file of 9000 LOC. I've created xlogbackup.c/.h files and added the
new code there. Once this patch gets in, I can offer my hand to move
do_pg_backup_start() and do_pg_backup_stop() from xlog.c and if okay,
pg_backup_start() and pg_backup_stop() from xlogfuncs.c to
xlogbackup.c/.h. Then, we might have to create new get/set APIs for
XLogCtl fields that do_pg_backup_start() and do_pg_backup_stop()
access.
+ state->name = (char *) palloc0(len + 1); + memcpy(state->name, name, len); Or just pstrdup()?
Done.
+BackupState +get_backup_state(const char *name) +{ I would name this one create_backup_state() instead.+void +create_backup_content_str(BackupState state, bool forhistoryfile) +{ This could be a build_backup_content().
I came up with more meaningful names - allocate_backup_state(),
deallocate_backup_state(), build_backup_content().
It seems to me that there is no point in having the list of
tablespaces in BackupStateData? This actually makes the code harder
to follow, see for example the changes with do_pg_backup_start(), we
the list of tablespace may or may be not passed down as a pointer of
BackupStateData while BackupStateData is itself the first argument of
this routine. These are independent from the label and backup history
file, as well.
I haven't stored the list of tablespaces in BackupState, it's the
string that do_pg_backup_start() creates is stored in there for
carrying it till pg_backup_stop(). Adding the tablespace_map,
backup_label, history_file in BackupState makes it easy to carry them
across various backup related functions.
We need to be careful about the file format (see read_backup_label()),
and create_backup_content_str() is careful about that which is good.
Core does not care about the format of the backup history file, though
some community tools may.
Are you suggesting that we need something like check_history_file()
similar to what read_backup_label() does by parsing each line of the
label file and erroring out if not in the required format?
I agree that what you are proposing here
makes the generation of these files easier to follow, but let's
document what forhistoryfile is here for, at least. Saving the
the backup label and history file strings in BackupState is a
confusing interface IMO. It would be more intuitive to have the
backup state in input, and provide the string generated in output
depending on what we want from the backup state.
We need to carry tablespace_map contents from do_pg_backup_start()
till pg_backup_stop(), backup_label and history_file too are easy to
carry across. Hence it will be good to have all of them i.e.
tablespace_map, backup_label and history_file in the BackupState
structure. IMO, this way is good.
- backup_started_in_recovery = RecoveryInProgress(); + Assert(state != NULL); + + in_recovery = RecoveryInProgress(); [...] - if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery) + if (state->started_in_recovery == true && in_recovery == false)I would have kept the naming to backup_started_in_recovery here. What
you are doing is logically right by relying on started_in_recovery to
check if recovery was running when the backup started, but this just
creates useless noise in the refactoring.
PSA new patch.
Something unrelated to your patch that I am noticing while scanning
the area is that we have been always lazy in freeing the label file
data allocated in TopMemoryContext when using the SQL functions if the
backup is aborted. We are not talking about this much amount of
memory each time a backup is performed, but that could be a cause for
memory bloat in a server if the same session is used and backups keep
failing, as the data is freed only on a successful pg_backup_stop().
Base backups through the replication protocol don't care about that as
the code keeps around the same pointer for the whole duration of
perform_base_backup(). Trying to tie the cleanup of the label file
with the abort phase would be the cause of more complications with
do_pg_backup_stop(), and xlog.c has no need to know about that now.
Just a remark for later.
Yeah, I think that can be solved by passing in backup_state to
do_pg_abort_backup(). If okay, I can work on this too as 0002 patch in
this thread or we can discuss this separately to get more attention
after this refactoring patch gets in.
I'm attaching v5 patch with the above review comments addressed,
please review it further.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v5-0001-Refactor-backup-related-code.patchapplication/octet-stream; name=v5-0001-Refactor-backup-related-code.patchDownload+275-193
On 2022/09/17 16:18, Bharath Rupireddy wrote:
Good idea. It makes a lot more sense to me, because xlog.c is already
a file of 9000 LOC. I've created xlogbackup.c/.h files and added the
new code there. Once this patch gets in, I can offer my hand to move
do_pg_backup_start() and do_pg_backup_stop() from xlog.c and if okay,
pg_backup_start() and pg_backup_stop() from xlogfuncs.c to
xlogbackup.c/.h. Then, we might have to create new get/set APIs for
XLogCtl fields that do_pg_backup_start() and do_pg_backup_stop()
access.
The definition of SessionBackupState enum type also should be in xlogbackup.h?
We need to carry tablespace_map contents from do_pg_backup_start()
till pg_backup_stop(), backup_label and history_file too are easy to
carry across. Hence it will be good to have all of them i.e.
tablespace_map, backup_label and history_file in the BackupState
structure. IMO, this way is good.
backup_label and history_file are not carried between pg_backup_start()
and _stop(), so don't need to be saved in BackupState. Their contents
can be easily created from other saved fields in BackupState,
if necessary. So at least for me it's better to get rid of them from
BackupState and don't allocate TopMemoryContext memory for them.
Something unrelated to your patch that I am noticing while scanning
the area is that we have been always lazy in freeing the label file
data allocated in TopMemoryContext when using the SQL functions if the
backup is aborted. We are not talking about this much amount of
memory each time a backup is performed, but that could be a cause for
memory bloat in a server if the same session is used and backups keep
failing, as the data is freed only on a successful pg_backup_stop().
Base backups through the replication protocol don't care about that as
the code keeps around the same pointer for the whole duration of
perform_base_backup(). Trying to tie the cleanup of the label file
with the abort phase would be the cause of more complications with
do_pg_backup_stop(), and xlog.c has no need to know about that now.
Just a remark for later.Yeah, I think that can be solved by passing in backup_state to
do_pg_abort_backup(). If okay, I can work on this too as 0002 patch in
this thread or we can discuss this separately to get more attention
after this refactoring patch gets in.
Or, to avoid such memory bloat, how about allocating the memory for
backup_state only when it's NULL?
I'm attaching v5 patch with the above review comments addressed,
please review it further.
Thanks for updating the patch!
+ char startxlogfile[MAXFNAMELEN_BACKUP]; /* backup start WAL file */
<snip>
+ char stopxlogfile[MAXFNAMELEN_BACKUP]; /* backup stop WAL file */
These file names seem not necessary in BackupState because they can be
calculated from other fields like startpoint and starttli, etc when
making backup_label and history file contents. If we remove them from
BackupState, we can also remove the definition of MAXFNAMELEN_BACKUP
macro from xlogbackup.h.
+ /* construct backup_label contents */
+ build_backup_content(state, false);
In basebackup case, build_backup_content() is called unnecessary twice
because do_pg_stop_backup() and its caller, perform_base_backup() call
that. This makes me think that it's better to get rid of the call to
build_backup_content() from do_pg_backup_stop(). Instead its callers,
perform_base_backup() and pg_backup_stop() should call that.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Sun, Sep 18, 2022 at 7:38 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2022/09/17 16:18, Bharath Rupireddy wrote:
Good idea. It makes a lot more sense to me, because xlog.c is already
a file of 9000 LOC. I've created xlogbackup.c/.h files and added the
new code there. Once this patch gets in, I can offer my hand to move
do_pg_backup_start() and do_pg_backup_stop() from xlog.c and if okay,
pg_backup_start() and pg_backup_stop() from xlogfuncs.c to
xlogbackup.c/.h. Then, we might have to create new get/set APIs for
XLogCtl fields that do_pg_backup_start() and do_pg_backup_stop()
access.The definition of SessionBackupState enum type also should be in xlogbackup.h?
Correct. Basically, all the backup related code from xlog.c,
xlogfuncs.c and elsewhere can go to xlogbackup.c/.h. I will focus on
that refactoring patch once this gets in.
We need to carry tablespace_map contents from do_pg_backup_start()
till pg_backup_stop(), backup_label and history_file too are easy to
carry across. Hence it will be good to have all of them i.e.
tablespace_map, backup_label and history_file in the BackupState
structure. IMO, this way is good.backup_label and history_file are not carried between pg_backup_start()
and _stop(), so don't need to be saved in BackupState. Their contents
can be easily created from other saved fields in BackupState,
if necessary. So at least for me it's better to get rid of them from
BackupState and don't allocate TopMemoryContext memory for them.
Yeah, but they have to be carried from do_pg_backup_stop() to
pg_backup_stop() or callers and also instead of keeping tablespace_map
in BackupState and others elsewhere don't seem to be a good idea to
me. IMO, BackupState is a good place to contain all the information
that's carried across various functions. I've changed the code to
lazily (upon first use in the backend) allocate memory for all of them
as we're concerned of the memory allocation beforehand.
Yeah, I think that can be solved by passing in backup_state to
do_pg_abort_backup(). If okay, I can work on this too as 0002 patch in
this thread or we can discuss this separately to get more attention
after this refactoring patch gets in.Or, to avoid such memory bloat, how about allocating the memory for
backup_state only when it's NULL?
Ah my bad, I missed that. Done now.
I'm attaching v5 patch with the above review comments addressed,
please review it further.Thanks for updating the patch!
Thanks for reviewing it.
+ char startxlogfile[MAXFNAMELEN_BACKUP]; /* backup start WAL file */ <snip> + char stopxlogfile[MAXFNAMELEN_BACKUP]; /* backup stop WAL file */These file names seem not necessary in BackupState because they can be
calculated from other fields like startpoint and starttli, etc when
making backup_label and history file contents. If we remove them from
BackupState, we can also remove the definition of MAXFNAMELEN_BACKUP
macro from xlogbackup.h.
Done.
+ /* construct backup_label contents */ + build_backup_content(state, false);In basebackup case, build_backup_content() is called unnecessary twice
because do_pg_stop_backup() and its caller, perform_base_backup() call
that. This makes me think that it's better to get rid of the call to
build_backup_content() from do_pg_backup_stop(). Instead its callers,
perform_base_backup() and pg_backup_stop() should call that.
Yeah, it's a good idea. Done that way. It's easier because we can
create backup_label file contents at any point of time after
pg_backup_start().
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v6-0001-Refactor-backup-related-code.patchapplication/octet-stream; name=v6-0001-Refactor-backup-related-code.patchDownload+362-201
On Sun, Sep 18, 2022 at 1:23 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
cfbot fails [1]https://cirrus-ci.com/task/5816966114967552?logs=test_world#L720 with v6 patch. I made a silly mistake by not checking
the output of "make check-world -j 16" fully, I just saw the end
message "All tests successful." before posting the v6 patch.
The failure is due to perform_base_backup() accessing BackupState's
tablespace_map without a null check, so I fixed it.
Sorry for the noise. Please review the attached v7 patch further.
[1]: https://cirrus-ci.com/task/5816966114967552?logs=test_world#L720
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v7-0001-Refactor-backup-related-code.patchapplication/octet-stream; name=v7-0001-Refactor-backup-related-code.patchDownload+364-202
On 2022/09/18 23:09, Bharath Rupireddy wrote:
On Sun, Sep 18, 2022 at 1:23 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:cfbot fails [1] with v6 patch. I made a silly mistake by not checking
the output of "make check-world -j 16" fully, I just saw the end
message "All tests successful." before posting the v6 patch.The failure is due to perform_base_backup() accessing BackupState's
tablespace_map without a null check, so I fixed it.Sorry for the noise. Please review the attached v7 patch further.
Thanks for updating the patch!
=# SELECT * FROM pg_backup_start('test', true);
=# SELECT * FROM pg_backup_stop();
LOG: server process (PID 15651) was terminated by signal 11: Segmentation fault: 11
DETAIL: Failed process was running: SELECT * FROM pg_backup_stop();
With v7 patch, pg_backup_stop() caused the segmentation fault.
=# SELECT * FROM pg_backup_start(repeat('test', 1024));
ERROR: backup label too long (max 1024 bytes)
STATEMENT: SELECT * FROM pg_backup_start(repeat('test', 1024));
=# SELECT * FROM pg_backup_start(repeat('testtest', 1024));
LOG: server process (PID 15844) was terminated by signal 11: Segmentation fault: 11
DETAIL: Failed process was running: SELECT * FROM pg_backup_start(repeat('testtest', 1024));
When I specified longer backup label in the second run of pg_backup_start()
after the first run failed, it caused the segmentation fault.
+ state = (BackupState *) palloc0(sizeof(BackupState));
+ state->name = pstrdup(name);
pg_backup_start() calls allocate_backup_state() and allocates the memory for
the input backup label before checking its length in do_pg_backup_start().
This can cause the memory for backup label to be allocated too much
unnecessary. I think that the maximum length of BackupState.name should
be MAXPGPATH (i.e., maximum allowed length for backup label).
The definition of SessionBackupState enum type also should be in xlogbackup.h?
Correct. Basically, all the backup related code from xlog.c,
xlogfuncs.c and elsewhere can go to xlogbackup.c/.h. I will focus on
that refactoring patch once this gets in.
Understood.
Yeah, but they have to be carried from do_pg_backup_stop() to
pg_backup_stop() or callers and also instead of keeping tablespace_map
in BackupState and others elsewhere don't seem to be a good idea to
me. IMO, BackupState is a good place to contain all the information
that's carried across various functions.
In v7 patch, since pg_backup_stop() calls build_backup_content(),
backup_label and history_file seem not to be carried from do_pg_backup_stop()
to pg_backup_stop(). This makes me still think that it's better not to include
them in BackupState...
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Sep 19, 2022 at 2:38 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Thanks for updating the patch!
=# SELECT * FROM pg_backup_start('test', true);
=# SELECT * FROM pg_backup_stop();
LOG: server process (PID 15651) was terminated by signal 11: Segmentation fault: 11
DETAIL: Failed process was running: SELECT * FROM pg_backup_stop();With v7 patch, pg_backup_stop() caused the segmentation fault.
Fixed. I believed that the regression tests cover pg_backup_start()
and pg_backup_stop(), and relied on make check-world, surprisingly
there's no test that covers these functions. Is it a good idea to add
tests for these functions in misc_functions.sql or backup.sql or
somewhere so that they get run as part of make check? Thoughts?
=# SELECT * FROM pg_backup_start(repeat('test', 1024));
ERROR: backup label too long (max 1024 bytes)
STATEMENT: SELECT * FROM pg_backup_start(repeat('test', 1024));=# SELECT * FROM pg_backup_start(repeat('testtest', 1024));
LOG: server process (PID 15844) was terminated by signal 11: Segmentation fault: 11
DETAIL: Failed process was running: SELECT * FROM pg_backup_start(repeat('testtest', 1024));When I specified longer backup label in the second run of pg_backup_start()
after the first run failed, it caused the segmentation fault.+ state = (BackupState *) palloc0(sizeof(BackupState)); + state->name = pstrdup(name);pg_backup_start() calls allocate_backup_state() and allocates the memory for
the input backup label before checking its length in do_pg_backup_start().
This can cause the memory for backup label to be allocated too much
unnecessary. I think that the maximum length of BackupState.name should
be MAXPGPATH (i.e., maximum allowed length for backup label).
That's a good idea. I'm marking a flag if the label name overflows (>
MAXPGPATH), later using it in do_pg_backup_start() to error out. We
could've thrown error directly, but that changes the behaviour - right
now, first "
wal_level must be set to \"replica\" or \"logical\" at server start."
gets emitted and then label name overflow error - I don't want to do
that.
Yeah, but they have to be carried from do_pg_backup_stop() to
pg_backup_stop() or callers and also instead of keeping tablespace_map
in BackupState and others elsewhere don't seem to be a good idea to
me. IMO, BackupState is a good place to contain all the information
that's carried across various functions.In v7 patch, since pg_backup_stop() calls build_backup_content(),
backup_label and history_file seem not to be carried from do_pg_backup_stop()
to pg_backup_stop(). This makes me still think that it's better not to include
them in BackupState...
I'm a bit worried about the backup state being spread across if we
separate out backup_label and history_file from BackupState and keep
tablespace_map contents there. As I said upthread, we are not
allocating memory for them at the beginning, we allocate only when
needed. IMO, this code is readable and more extensible.
I've also taken help of the error callback mechanism to clean up the
allocated memory in case of a failure. For do_pg_abort_backup() cases,
I think we don't need to clean the memory because that gets called on
proc exit (before_shmem_exit()).
Please review the v8 patch further.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com