Timeline ID in backup_label file
Hi all,
Lately, in order to extract some information from a backup_label file
in python I have found myself doing something like the following to
get a timeline number (feel free to reuse that code, especially the
regex pattern):
pattern = re.compile('^START WAL
LOCATION:.*([0-9A-F]+\/[0-9A-F]+).*\(file ([0-9A-F]+)\)')
if pattern.match(backup_lable_line):
current_lsn = m.group(1)
current_segment = m.group(2)
current_tli = str(int(current_segment[:8], 16))
That's more or less similar to what read_backup_label()@xlog.c does
using sscanf(), still I keep wondering why we need to go through this
much complication to get the origin timeline number of a base backup,
information which is IMO as important as the start LSN when working on
backup methods.
I would find interesting to add at the bottom of the backup_label file
a new field called "START TIMELINE: %d" to put this information in a
more understandable shape. Any opinions?
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 26 October 2017 at 02:50, Michael Paquier <michael.paquier@gmail.com> wrote:
Hi all,
Lately, in order to extract some information from a backup_label file
in python I have found myself doing something like the following to
get a timeline number (feel free to reuse that code, especially the
regex pattern):
pattern = re.compile('^START WAL
LOCATION:.*([0-9A-F]+\/[0-9A-F]+).*\(file ([0-9A-F]+)\)')
if pattern.match(backup_lable_line):
current_lsn = m.group(1)
current_segment = m.group(2)
current_tli = str(int(current_segment[:8], 16))That's more or less similar to what read_backup_label()@xlog.c does
using sscanf(), still I keep wondering why we need to go through this
much complication to get the origin timeline number of a base backup,
information which is IMO as important as the start LSN when working on
backup methods.I would find interesting to add at the bottom of the backup_label file
a new field called "START TIMELINE: %d" to put this information in a
more understandable shape. Any opinions?
Strong "yes" from me.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/25/17 10:10 PM, Craig Ringer wrote:
On 26 October 2017 at 02:50, Michael Paquier <michael.paquier@gmail.com> wrote:
I would find interesting to add at the bottom of the backup_label file
a new field called "START TIMELINE: %d" to put this information in a
more understandable shape. Any opinions?Strong "yes" from me.
+1
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 26, 2017 at 5:50 AM, David Steele <david@pgmasters.net> wrote:
On 10/25/17 10:10 PM, Craig Ringer wrote:
On 26 October 2017 at 02:50, Michael Paquier <michael.paquier@gmail.com> wrote:
I would find interesting to add at the bottom of the backup_label file
a new field called "START TIMELINE: %d" to put this information in a
more understandable shape. Any opinions?Strong "yes" from me.
+1
Thanks for the feedback. Attached is a patch to achieve so, I have
added as well a STOP TIMELINE field in the backup history file. Note
that START TIMELINE gets automatically into the backup history file.
Added a CF entry as well.
--
Michael
Attachments:
backup_label_tli.patchapplication/octet-stream; name=backup_label_tli.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a12a4..fe35da28f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10587,6 +10587,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
backup_started_in_recovery ? "standby" : "master");
appendStringInfo(labelfile, "START TIME: %s\n", strfbuf);
appendStringInfo(labelfile, "LABEL: %s\n", backupidstr);
+ appendStringInfo(labelfile, "START TIMELINE: %u\n", starttli);
/*
* Okay, write the file, or return its contents to caller.
@@ -11050,6 +11051,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
/* transfer remaining lines from label to history file */
fprintf(fp, "%s", remaining);
fprintf(fp, "STOP TIME: %s\n", strfbuf);
+ fprintf(fp, "STOP TIMELINE: %u\n", stoptli);
if (fflush(fp) || ferror(fp) || FreeFile(fp))
ereport(ERROR,
(errcode_for_file_access(),
@@ -11252,7 +11254,7 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
bool *backupFromStandby)
{
char startxlogfilename[MAXFNAMELEN];
- TimeLineID tli;
+ TimeLineID tli1, tli2;
FILE *lfp;
char ch;
char backuptype[20];
@@ -11283,7 +11285,7 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
* format).
*/
if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c",
- &hi, &lo, &tli, startxlogfilename, &ch) != 5 || ch != '\n')
+ &hi, &lo, &tli1, startxlogfilename, &ch) != 5 || ch != '\n')
ereport(FATAL,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
@@ -11312,6 +11314,20 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
*backupFromStandby = true;
}
+ /*
+ * START TIMELINE is new as of 11. Its parsing is not mandatory, still
+ * use it as a sanity check if present.
+ */
+ if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1)
+ {
+ if (tli1 != tli2)
+ ereport(FATAL,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE),
+ errdetail("Timeline ID parsed is %u, but expected %u",
+ tli2, tli1)));
+ }
+
if (ferror(lfp) || FreeFile(lfp))
ereport(FATAL,
(errcode_for_file_access(),
Hi Michael,
Find my review below.
On 10/26/17 2:03 PM, Michael Paquier wrote:
Thanks for the feedback. Attached is a patch to achieve so, I have
added as well a STOP TIMELINE field in the backup history file. Note
that START TIMELINE gets automatically into the backup history file.
Added a CF entry as well.
+ TimeLineID tli1, tli2;
I'm not that excited about these names but don't have any better ideas.
+ if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1)
This didn't work when I tested it (I had intentionally munged the "START
TIMELINE" to produce an error). The problem is the "START TIME" and
"LABEL" lines which are not being read. I added a few fgets() calls and
it worked.
Thanks!
--
-David
david@pgmasters.net
On Wed, Nov 15, 2017 at 11:16 PM, David Steele <david@pgmasters.net> wrote:
Find my review below.
On 10/26/17 2:03 PM, Michael Paquier wrote:
Thanks for the feedback. Attached is a patch to achieve so, I have
added as well a STOP TIMELINE field in the backup history file. Note
that START TIMELINE gets automatically into the backup history file.
Added a CF entry as well.+ TimeLineID tli1, tli2;
I'm not that excited about these names but don't have any better ideas.
One is tli_from_segname and the second is tli_from_file. Would
something like that sound better?
+ if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1)
This didn't work when I tested it (I had intentionally munged the "START
TIMELINE" to produce an error). The problem is the "START TIME" and
"LABEL" lines which are not being read. I added a few fgets() calls and
it worked.
Currently backup label files are generated as such:
START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-11-16 07:52:50 JST
LABEL: popo
START TIMELINE: 1
There could be two things that could be done here:
1) Keep read_backup_label order-dependent and look at the START TIME
and LABEL fields, then kick in ereport(LOG) with the information. This
makes the fields in the backup_label still order-dependent.
2) Make read_backup_label smarter by parsing it line-by-line and fill
in the data wanted. This way the parsing logic and sanity checks are
split into two, and Postgres is smarter at looking at backup_label
files generated by any backup tool.
Which one do you prefer? Getting input from backup tool maintainers is
important here. 2) is more extensible if more fields are added to the
backup_label for a reason or another in the future.
--
Michael
On 11/15/17 6:01 PM, Michael Paquier wrote:
On Wed, Nov 15, 2017 at 11:16 PM, David Steele <david@pgmasters.net> wrote:
Find my review below.
On 10/26/17 2:03 PM, Michael Paquier wrote:
Thanks for the feedback. Attached is a patch to achieve so, I have
added as well a STOP TIMELINE field in the backup history file. Note
that START TIMELINE gets automatically into the backup history file.
Added a CF entry as well.+ TimeLineID tli1, tli2;
I'm not that excited about these names but don't have any better ideas.
One is tli_from_segname and the second is tli_from_file. Would
something like that sound better?
Yes, those names are better.
+ if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1)
This didn't work when I tested it (I had intentionally munged the "START
TIMELINE" to produce an error). The problem is the "START TIME" and
"LABEL" lines which are not being read. I added a few fgets() calls and
it worked.Currently backup label files are generated as such:
START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-11-16 07:52:50 JST
LABEL: popo
START TIMELINE: 1There could be two things that could be done here:
1) Keep read_backup_label order-dependent and look at the START TIME
and LABEL fields, then kick in ereport(LOG) with the information. This
makes the fields in the backup_label still order-dependent.
2) Make read_backup_label smarter by parsing it line-by-line and fill
in the data wanted. This way the parsing logic and sanity checks are
split into two, and Postgres is smarter at looking at backup_label
files generated by any backup tool.Which one do you prefer? Getting input from backup tool maintainers is
important here. 2) is more extensible if more fields are added to the
backup_label for a reason or another in the future.
For this patch at least, I think we should do #1. Getting rid of the
order dependency is attractive but there may be other programs that are
depending on the order. I know you are not proposing to change the
order now, but it *could* be changed with #2.
Also, I think DEBUG1 would be a more appropriate log level if any
logging is done.
Regards,
--
-David
david@pgmasters.net
On Thu, Nov 16, 2017 at 9:20 AM, David Steele <david@pgmasters.net> wrote:
For this patch at least, I think we should do #1. Getting rid of the order
dependency is attractive but there may be other programs that are depending
on the order. I know you are not proposing to change the order now, but it
*could* be changed with #2.
read_backup_label() is a static function in the backend code. With #2
I do not imply to change the order of the elements written in the
backup_label file, just to make the way they are parsed smarter.
Also, I think DEBUG1 would be a more appropriate log level if any logging is
done.
OK, the label and time strings can include spaces. The label string is
assumed to not be bigger than 1028 bytes, and the timestamp is assumed
that it can get up to 128. So it is possible to use stuff like
fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the
backend. If a backup_label is generated with strings longer than that,
then read_backup_label would fail to parse the next entries but that's
not really something to worry about IMO as those are only minor sanity
checks. Having a safer coding in the backend is more important, and
the new checks should not trigger any hard failures.
--
Michael
Attachments:
backup_label_tli_v2.patchapplication/octet-stream; name=backup_label_tli_v2.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e729180f82..0521b7b4ce 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10538,6 +10538,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
backup_started_in_recovery ? "standby" : "master");
appendStringInfo(labelfile, "START TIME: %s\n", strfbuf);
appendStringInfo(labelfile, "LABEL: %s\n", backupidstr);
+ appendStringInfo(labelfile, "START TIMELINE: %u\n", starttli);
/*
* Okay, write the file, or return its contents to caller.
@@ -10998,9 +10999,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
(uint32) (startpoint >> 32), (uint32) startpoint, startxlogfilename);
fprintf(fp, "STOP WAL LOCATION: %X/%X (file %s)\n",
(uint32) (stoppoint >> 32), (uint32) stoppoint, stopxlogfilename);
- /* transfer remaining lines from label to history file */
+ /*
+ * Transfer remaining lines including label and start timeline to
+ * history file.
+ */
fprintf(fp, "%s", remaining);
fprintf(fp, "STOP TIME: %s\n", strfbuf);
+ fprintf(fp, "STOP TIMELINE: %u\n", stoptli);
if (fflush(fp) || ferror(fp) || FreeFile(fp))
ereport(ERROR,
(errcode_for_file_access(),
@@ -11203,11 +11208,13 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
bool *backupFromStandby)
{
char startxlogfilename[MAXFNAMELEN];
- TimeLineID tli;
+ TimeLineID tli_from_walseg, tli_from_file;
FILE *lfp;
char ch;
char backuptype[20];
char backupfrom[20];
+ char backuplabel[MAXPGPATH];
+ char backuptime[128];
uint32 hi,
lo;
@@ -11234,7 +11241,7 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
* format).
*/
if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c",
- &hi, &lo, &tli, startxlogfilename, &ch) != 5 || ch != '\n')
+ &hi, &lo, &tli_from_walseg, startxlogfilename, &ch) != 5 || ch != '\n')
ereport(FATAL,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
@@ -11263,6 +11270,39 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
*backupFromStandby = true;
}
+ /*
+ * Parse START TIME and LABEL. Those are not mandatory fields for
+ * recovery but checking for their presence is useful for debugging
+ * and the next sanity checks. Cope also with the fact that the
+ * result buffers have a pre-allocated size, hence if the backup_label
+ * file has been generated with strings longer than the maximum assumed
+ * here an incorrect parsing happens. That's fine as only minor
+ * consistency checks are done afterwards.
+ */
+ if (fscanf(lfp, "START TIME: %127[^\n]\n", backuptime) == 1)
+ ereport(DEBUG1,
+ (errmsg("backup time %s in file \"%s\"",
+ backuptime, BACKUP_LABEL_FILE)));
+
+ if (fscanf(lfp, "LABEL: %1023[^\n]\n", backuplabel) == 1)
+ ereport(DEBUG1,
+ (errmsg("backup label %s in file \"%s\"",
+ backuplabel, BACKUP_LABEL_FILE)));
+
+ /*
+ * START TIMELINE is new as of 11. Its parsing is not mandatory, still
+ * use it as a sanity check if present.
+ */
+ if (fscanf(lfp, "START TIMELINE: %u\n", &tli_from_file) == 1)
+ {
+ if (tli_from_walseg != tli_from_file)
+ ereport(FATAL,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE),
+ errdetail("Timeline ID parsed is %u, but expected %u",
+ tli_from_file, tli_from_walseg)));
+ }
+
if (ferror(lfp) || FreeFile(lfp))
ereport(FATAL,
(errcode_for_file_access(),
Hi Michael,
On 11/15/17 10:09 PM, Michael Paquier wrote:
On Thu, Nov 16, 2017 at 9:20 AM, David Steele <david@pgmasters.net> wrote:
For this patch at least, I think we should do #1. Getting rid of the order
dependency is attractive but there may be other programs that are depending
on the order. I know you are not proposing to change the order now, but it
*could* be changed with #2.read_backup_label() is a static function in the backend code. With #2
I do not imply to change the order of the elements written in the
backup_label file, just to make the way they are parsed smarter.
My point is that the order *could* be changed and it wouldn't be noticed
if the read function were improved as you propose.
I'm not against the idea, just think we should continue to enforce the
order unless we decide an interface break is OK.
Also, I think DEBUG1 would be a more appropriate log level if any logging is
done.OK, the label and time strings can include spaces. The label string is
assumed to not be bigger than 1028 bytes, and the timestamp is assumed
that it can get up to 128. So it is possible to use stuff like
fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the
backend. If a backup_label is generated with strings longer than that,
then read_backup_label would fail to parse the next entries but that's
not really something to worry about IMO as those are only minor sanity
checks. Having a safer coding in the backend is more important, and
the new checks should not trigger any hard failures.
I have tested and get an error as expected when I munge the backup_label
file:
FATAL: invalid data in file "backup_label"
DETAIL: Timeline ID parsed is 2, but expected 1
Everything else looks good so I will mark it ready for committer.
Regards,
--
-David
david@pgmasters.net
On Mon, Nov 27, 2017 at 11:06 PM, David Steele <david@pgmasters.net> wrote:
On 11/15/17 10:09 PM, Michael Paquier wrote:
On Thu, Nov 16, 2017 at 9:20 AM, David Steele <david@pgmasters.net> wrote:
For this patch at least, I think we should do #1. Getting rid of the order
dependency is attractive but there may be other programs that are depending
on the order. I know you are not proposing to change the order now, but it
*could* be changed with #2.read_backup_label() is a static function in the backend code. With #2
I do not imply to change the order of the elements written in the
backup_label file, just to make the way they are parsed smarter.My point is that the order *could* be changed and it wouldn't be noticed
if the read function were improved as you propose.I'm not against the idea, just think we should continue to enforce the
order unless we decide an interface break is OK.
I still don't quite understand here. The order the items are read does
not cause a backward-incompatible change. True that there is no reason
to change that either now.
Also, I think DEBUG1 would be a more appropriate log level if any logging is
done.OK, the label and time strings can include spaces. The label string is
assumed to not be bigger than 1028 bytes, and the timestamp is assumed
that it can get up to 128. So it is possible to use stuff like
fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the
backend. If a backup_label is generated with strings longer than that,
then read_backup_label would fail to parse the next entries but that's
not really something to worry about IMO as those are only minor sanity
checks. Having a safer coding in the backend is more important, and
the new checks should not trigger any hard failures.I have tested and get an error as expected when I munge the backup_label
file:FATAL: invalid data in file "backup_label"
DETAIL: Timeline ID parsed is 2, but expected 1Everything else looks good so I will mark it ready for committer.
Thanks. This maps what I saw.
--
Michael
On 11/27/17 7:11 PM, Michael Paquier wrote:
On Mon, Nov 27, 2017 at 11:06 PM, David Steele <david@pgmasters.net> wrote:
On 11/15/17 10:09 PM, Michael Paquier wrote:
read_backup_label() is a static function in the backend code. With #2
I do not imply to change the order of the elements written in the
backup_label file, just to make the way they are parsed smarter.My point is that the order *could* be changed and it wouldn't be noticed
if the read function were improved as you propose.I'm not against the idea, just think we should continue to enforce the
order unless we decide an interface break is OK.I still don't quite understand here. The order the items are read does
not cause a backward-incompatible change. True that there is no reason
to change that either now.
Perhaps I'm the one who is misunderstanding. If you propose a patch I'll
be happy to review it, though I doubt there is a lot to be gained even
if it would be a better implementation.
Regards,
--
-David
david@pgmasters.net
On Tue, Nov 28, 2017 at 9:40 AM, David Steele <david@pgmasters.net> wrote:
Perhaps I'm the one who is misunderstanding. If you propose a patch I'll be
happy to review it, though I doubt there is a lot to be gained even if it
would be a better implementation.
OK. I'll keep that in mind. Thanks for the input.
--
Michael
On 27 November 2017 at 14:06, David Steele <david@pgmasters.net> wrote:
I have tested and get an error as expected when I munge the backup_label
file:FATAL: invalid data in file "backup_label"
DETAIL: Timeline ID parsed is 2, but expected 1Everything else looks good so I will mark it ready for committer.
Sounds good.
No tests? No docs/extended explanatory comments?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 5, 2018 at 11:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 27 November 2017 at 14:06, David Steele <david@pgmasters.net> wrote:
I have tested and get an error as expected when I munge the backup_label
file:FATAL: invalid data in file "backup_label"
DETAIL: Timeline ID parsed is 2, but expected 1Everything else looks good so I will mark it ready for committer.
Sounds good.
Thanks for the feedback.
No tests?
Do you think that's worth the cycles as a TAP test? This can be done
by generating a dummy backup_label file and then start a standby to
see the failure, but this looks quite costly for minimum gain. This
code path is also taken tested in
src/test/recovery/t/001_stream_rep.pl for example, though that's not
the failure path. So if we add a DEBUG1 line after fetching correctly
the timeline ID this could help by looking at
tmp_check/log/001_stream_rep_standby_1.log, but this needs to set
"log_min_messages = debug1" PostgresNode.pm for example so as this
shows up in the logs (this could be useful if done by default
actually, DEBUG1 is not too talkative). Attached is an example of
patch doing so. See for the addition in PostgresNode.pm and this new
bit:
+ ereport(DEBUG1,
+ (errmsg("backup timeline %u in file \"%s\"",
+ tli_from_file, BACKUP_LABEL_FILE)));
No docs/extended explanatory comments?
There is no existing documentation about the format of the backup_file
in doc/. So it seems to me that it is not the problem of this patch to
fill in the hole. Regarding the comments, perhaps something better
could be done, but I am not quite sure what. We don't much explain
what the current fields mean, except that one can guess what they mean
from their names, which is the intention behind the code.
--
Michael
Attachments:
backup_label_tli_v2.patchapplication/octet-stream; name=backup_label_tli_v2.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e729180f82..0521b7b4ce 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10538,6 +10538,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
backup_started_in_recovery ? "standby" : "master");
appendStringInfo(labelfile, "START TIME: %s\n", strfbuf);
appendStringInfo(labelfile, "LABEL: %s\n", backupidstr);
+ appendStringInfo(labelfile, "START TIMELINE: %u\n", starttli);
/*
* Okay, write the file, or return its contents to caller.
@@ -10998,9 +10999,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
(uint32) (startpoint >> 32), (uint32) startpoint, startxlogfilename);
fprintf(fp, "STOP WAL LOCATION: %X/%X (file %s)\n",
(uint32) (stoppoint >> 32), (uint32) stoppoint, stopxlogfilename);
- /* transfer remaining lines from label to history file */
+ /*
+ * Transfer remaining lines including label and start timeline to
+ * history file.
+ */
fprintf(fp, "%s", remaining);
fprintf(fp, "STOP TIME: %s\n", strfbuf);
+ fprintf(fp, "STOP TIMELINE: %u\n", stoptli);
if (fflush(fp) || ferror(fp) || FreeFile(fp))
ereport(ERROR,
(errcode_for_file_access(),
@@ -11203,11 +11208,13 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
bool *backupFromStandby)
{
char startxlogfilename[MAXFNAMELEN];
- TimeLineID tli;
+ TimeLineID tli_from_walseg, tli_from_file;
FILE *lfp;
char ch;
char backuptype[20];
char backupfrom[20];
+ char backuplabel[MAXPGPATH];
+ char backuptime[128];
uint32 hi,
lo;
@@ -11234,7 +11241,7 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
* format).
*/
if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c",
- &hi, &lo, &tli, startxlogfilename, &ch) != 5 || ch != '\n')
+ &hi, &lo, &tli_from_walseg, startxlogfilename, &ch) != 5 || ch != '\n')
ereport(FATAL,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
@@ -11263,6 +11270,39 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
*backupFromStandby = true;
}
+ /*
+ * Parse START TIME and LABEL. Those are not mandatory fields for
+ * recovery but checking for their presence is useful for debugging
+ * and the next sanity checks. Cope also with the fact that the
+ * result buffers have a pre-allocated size, hence if the backup_label
+ * file has been generated with strings longer than the maximum assumed
+ * here an incorrect parsing happens. That's fine as only minor
+ * consistency checks are done afterwards.
+ */
+ if (fscanf(lfp, "START TIME: %127[^\n]\n", backuptime) == 1)
+ ereport(DEBUG1,
+ (errmsg("backup time %s in file \"%s\"",
+ backuptime, BACKUP_LABEL_FILE)));
+
+ if (fscanf(lfp, "LABEL: %1023[^\n]\n", backuplabel) == 1)
+ ereport(DEBUG1,
+ (errmsg("backup label %s in file \"%s\"",
+ backuplabel, BACKUP_LABEL_FILE)));
+
+ /*
+ * START TIMELINE is new as of 11. Its parsing is not mandatory, still
+ * use it as a sanity check if present.
+ */
+ if (fscanf(lfp, "START TIMELINE: %u\n", &tli_from_file) == 1)
+ {
+ if (tli_from_walseg != tli_from_file)
+ ereport(FATAL,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE),
+ errdetail("Timeline ID parsed is %u, but expected %u",
+ tli_from_file, tli_from_walseg)));
+ }
+
if (ferror(lfp) || FreeFile(lfp))
ereport(FATAL,
(errcode_for_file_access(),
On Sat, Jan 6, 2018 at 9:54 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Jan 5, 2018 at 11:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 27 November 2017 at 14:06, David Steele <david@pgmasters.net> wrote:
I have tested and get an error as expected when I munge the backup_label
file:FATAL: invalid data in file "backup_label"
DETAIL: Timeline ID parsed is 2, but expected 1Everything else looks good so I will mark it ready for committer.
Sounds good.
Thanks for the feedback.
Previous patch was incorrect, this was the same v2 as upthread.
Attached is the correct v3.
--
Michael
Attachments:
backup_label_tli_v3.patchapplication/octet-stream; name=backup_label_tli_v3.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 02974f0e52..e42b828edf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10535,6 +10535,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
backup_started_in_recovery ? "standby" : "master");
appendStringInfo(labelfile, "START TIME: %s\n", strfbuf);
appendStringInfo(labelfile, "LABEL: %s\n", backupidstr);
+ appendStringInfo(labelfile, "START TIMELINE: %u\n", starttli);
/*
* Okay, write the file, or return its contents to caller.
@@ -11015,9 +11016,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
(uint32) (startpoint >> 32), (uint32) startpoint, startxlogfilename);
fprintf(fp, "STOP WAL LOCATION: %X/%X (file %s)\n",
(uint32) (stoppoint >> 32), (uint32) stoppoint, stopxlogfilename);
- /* transfer remaining lines from label to history file */
+ /*
+ * Transfer remaining lines including label and start timeline to
+ * history file.
+ */
fprintf(fp, "%s", remaining);
fprintf(fp, "STOP TIME: %s\n", strfbuf);
+ fprintf(fp, "STOP TIMELINE: %u\n", stoptli);
if (fflush(fp) || ferror(fp) || FreeFile(fp))
ereport(ERROR,
(errcode_for_file_access(),
@@ -11228,11 +11233,13 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
bool *backupFromStandby)
{
char startxlogfilename[MAXFNAMELEN];
- TimeLineID tli;
+ TimeLineID tli_from_walseg, tli_from_file;
FILE *lfp;
char ch;
char backuptype[20];
char backupfrom[20];
+ char backuplabel[MAXPGPATH];
+ char backuptime[128];
uint32 hi,
lo;
@@ -11259,7 +11266,7 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
* format).
*/
if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c",
- &hi, &lo, &tli, startxlogfilename, &ch) != 5 || ch != '\n')
+ &hi, &lo, &tli_from_walseg, startxlogfilename, &ch) != 5 || ch != '\n')
ereport(FATAL,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
@@ -11288,6 +11295,43 @@ read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
*backupFromStandby = true;
}
+ /*
+ * Parse START TIME and LABEL. Those are not mandatory fields for
+ * recovery but checking for their presence is useful for debugging
+ * and the next sanity checks. Cope also with the fact that the
+ * result buffers have a pre-allocated size, hence if the backup_label
+ * file has been generated with strings longer than the maximum assumed
+ * here an incorrect parsing happens. That's fine as only minor
+ * consistency checks are done afterwards.
+ */
+ if (fscanf(lfp, "START TIME: %127[^\n]\n", backuptime) == 1)
+ ereport(DEBUG1,
+ (errmsg("backup time %s in file \"%s\"",
+ backuptime, BACKUP_LABEL_FILE)));
+
+ if (fscanf(lfp, "LABEL: %1023[^\n]\n", backuplabel) == 1)
+ ereport(DEBUG1,
+ (errmsg("backup label %s in file \"%s\"",
+ backuplabel, BACKUP_LABEL_FILE)));
+
+ /*
+ * START TIMELINE is new as of 11. Its parsing is not mandatory, still
+ * use it as a sanity check if present.
+ */
+ if (fscanf(lfp, "START TIMELINE: %u\n", &tli_from_file) == 1)
+ {
+ if (tli_from_walseg != tli_from_file)
+ ereport(FATAL,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE),
+ errdetail("Timeline ID parsed is %u, but expected %u",
+ tli_from_file, tli_from_walseg)));
+
+ ereport(DEBUG1,
+ (errmsg("backup timeline %u in file \"%s\"",
+ tli_from_file, BACKUP_LABEL_FILE)));
+ }
+
if (ferror(lfp) || FreeFile(lfp))
ereport(FATAL,
(errcode_for_file_access(),
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 93faadc20e..80f68df246 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -419,6 +419,7 @@ sub init
print $conf "restart_after_crash = off\n";
print $conf "log_line_prefix = '%m [%p] %q%a '\n";
print $conf "log_statement = all\n";
+ print $conf "log_min_messages = debug1\n";
print $conf "log_replication_commands = on\n";
print $conf "wal_retrieve_retry_interval = '500ms'\n";
print $conf "port = $port\n";
On Sat, Jan 06, 2018 at 09:56:03AM +0900, Michael Paquier wrote:
Previous patch was incorrect, this was the same v2 as upthread.
Attached is the correct v3.
For the archive's sake, this version of the patch has been pushed as
6271fce. Thanks Simon for the commit, and David for the review!
--
Michael