pg_wal/RECOVERYHISTORY file remains after archive recovery
Hi,
When we do archive recovery from the database cluster of which
timeline ID is more than 2 pg_wal/RECOVERYHISTORY is remained even
after archive recovery completed.
The cause of this seems cbc55da556b that moved exitArchiveRecovery()
to before writeTimeLineHistory(). writeTimeLineHIstory() restores the
history file from archive directory and therefore creates
RECOVERYHISTORY file in pg_wal directory. We used to remove such
temporary file by exitArchiveRecovery() but with this commit the order
of calling these functions is reversed. Therefore we create
RECOVERYHISTORY file after exited from archive recovery mode and
remain it.
To fix it I think that we can remove RECOVERYHISTORY file before the
history file is archived in writeTimeLineHIstory(). The commit
cbc55da556b is intended to minimize the window between the moment the
file is written and the end-of-recovery record is generated. So I
think it's not good to put exitArchiveRecovery() after
writeTimeLineHIstory().
This issue seems to exist in all supported version as far as I read
the code, although I don't test all of them yet.
I've attached the draft patch to fix this issue. Regression test might
be required. Feedback and suggestion are very welcome.
Regards,
--
Masahiko Sawada
Attachments:
remove_recovered_historyfile.patchtext/x-patch; charset=US-ASCII; name=remove_recovered_historyfile.patchDownload
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index c2ba480c70..b0fd1dc5ac 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -281,9 +281,9 @@ findNewestTimeLine(TimeLineID startTLI)
* switchpoint: WAL location where the system switched to the new timeline
* reason: human-readable explanation of why the timeline was switched
*
- * Currently this is only used at the end recovery, and so there are no locking
+ * Currently this is only used after the end recovery, and so there are no locking
* considerations. But we should be just as tense as XLogFileInit to avoid
- * emplacing a bogus file.
+ * emplacing a bogus file and need to get rid of recovered timeline-history file.
*/
void
writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
@@ -296,6 +296,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
int srcfd;
int fd;
int nbytes;
+ bool remove_tmphist = false;
Assert(newTLI > parentTLI); /* else bad selection of newTLI */
@@ -319,7 +320,8 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
if (ArchiveRecoveryRequested)
{
TLHistoryFileName(histfname, parentTLI);
- RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
+ remove_tmphist = RestoreArchivedFile(path, histfname, "RECOVERYHISTORY",
+ 0, false);
}
else
TLHistoryFilePath(path, parentTLI);
@@ -375,6 +377,19 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not close file \"%s\": %m", path)));
+
+ /*
+ * Get rid of remaining recovered timeline-history file since we are
+ * already exited from archive recovery mode and therefore there is no
+ * cleanup later.
+ */
+ if (remove_tmphist)
+ {
+ char histpath[MAXPGPATH];
+
+ snprintf(histpath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
+ unlink(histpath); /* ignore any error */
+ }
}
/*
On Thu, Sep 26, 2019 at 5:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Hi,
When we do archive recovery from the database cluster of which
timeline ID is more than 2 pg_wal/RECOVERYHISTORY is remained even
after archive recovery completed.The cause of this seems cbc55da556b that moved exitArchiveRecovery()
to before writeTimeLineHistory(). writeTimeLineHIstory() restores the
history file from archive directory and therefore creates
RECOVERYHISTORY file in pg_wal directory. We used to remove such
temporary file by exitArchiveRecovery() but with this commit the order
of calling these functions is reversed. Therefore we create
RECOVERYHISTORY file after exited from archive recovery mode and
remain it.To fix it I think that we can remove RECOVERYHISTORY file before the
history file is archived in writeTimeLineHIstory(). The commit
cbc55da556b is intended to minimize the window between the moment the
file is written and the end-of-recovery record is generated. So I
think it's not good to put exitArchiveRecovery() after
writeTimeLineHIstory().This issue seems to exist in all supported version as far as I read
the code, although I don't test all of them yet.I've attached the draft patch to fix this issue. Regression test might
be required. Feedback and suggestion are very welcome.
What about moving the logic that removes RECO VERYXLOG and
RECOVERYHISTORY from exitArchiveRecovery() and performing it
just before/after RemoveNonParentXlogFiles()? Which looks simple.
Regards,
--
Fujii Masao
On Thu, Sep 26, 2019 at 6:23 PM Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Sep 26, 2019 at 5:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Hi,
When we do archive recovery from the database cluster of which
timeline ID is more than 2 pg_wal/RECOVERYHISTORY is remained even
after archive recovery completed.The cause of this seems cbc55da556b that moved exitArchiveRecovery()
to before writeTimeLineHistory(). writeTimeLineHIstory() restores the
history file from archive directory and therefore creates
RECOVERYHISTORY file in pg_wal directory. We used to remove such
temporary file by exitArchiveRecovery() but with this commit the order
of calling these functions is reversed. Therefore we create
RECOVERYHISTORY file after exited from archive recovery mode and
remain it.To fix it I think that we can remove RECOVERYHISTORY file before the
history file is archived in writeTimeLineHIstory(). The commit
cbc55da556b is intended to minimize the window between the moment the
file is written and the end-of-recovery record is generated. So I
think it's not good to put exitArchiveRecovery() after
writeTimeLineHIstory().This issue seems to exist in all supported version as far as I read
the code, although I don't test all of them yet.I've attached the draft patch to fix this issue. Regression test might
be required. Feedback and suggestion are very welcome.What about moving the logic that removes RECO VERYXLOG and
RECOVERYHISTORY from exitArchiveRecovery() and performing it
just before/after RemoveNonParentXlogFiles()? Which looks simple.
Agreed. Attached the updated patch.
Regards,
--
Masahiko Sawada
Attachments:
v2_remove_recovered_historyfile.patchapplication/octet-stream; name=v2_remove_recovered_historyfile.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6c69eb6..c57dcc6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5461,7 +5461,6 @@ validateRecoveryParameters(void)
static void
exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
{
- char recoveryPath[MAXPGPATH];
char xlogfname[MAXFNAMELEN];
XLogSegNo endLogSegNo;
XLogSegNo startLogSegNo;
@@ -5542,17 +5541,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
XLogArchiveCleanup(xlogfname);
/*
- * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
- * of it.
- */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
- unlink(recoveryPath); /* ignore any error */
-
- /* Get rid of any remaining recovered timeline-history file, too */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
- unlink(recoveryPath); /* ignore any error */
-
- /*
* Remove the signal files out of the way, so that we don't accidentally
* re-enter archive recovery mode in a subsequent crash.
*/
@@ -7608,6 +7596,19 @@ StartupXLOG(void)
if (ArchiveRecoveryRequested)
{
+ char recoveryPath[MAXPGPATH];
+
+ /*
+ * We exited from archive mode. Since there might be a partial WAL
+ * segment named RECOVERYXLOG, get rid of it.
+ */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
+ unlink(recoveryPath); /* ignore any error */
+
+ /* Get rid of any remaining recovered timeline-history file, too */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
+ unlink(recoveryPath); /* ignore any error */
+
/*
* We switched to a new timeline. Clean up segments on the old
* timeline.
On Fri, Sep 27, 2019 at 01:51:25PM +0900, Masahiko Sawada wrote:
On Thu, Sep 26, 2019 at 6:23 PM Fujii Masao <masao.fujii@gmail.com> wrote:
What about moving the logic that removes RECO VERYXLOG and
RECOVERYHISTORY from exitArchiveRecovery() and performing it
just before/after RemoveNonParentXlogFiles()? Which looks simple.Agreed. Attached the updated patch.
Mea culpa here, I have just noticed the thread. Fujii-san, would you
prefer if I take care of it? And also, what's the issue with not
doing the removal of both files just after writeTimeLineHistory()?
That's actually what happened before cbc55da5.
--
Michael
On Fri, Sep 27, 2019 at 5:09 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Sep 27, 2019 at 01:51:25PM +0900, Masahiko Sawada wrote:
On Thu, Sep 26, 2019 at 6:23 PM Fujii Masao <masao.fujii@gmail.com> wrote:
What about moving the logic that removes RECO VERYXLOG and
RECOVERYHISTORY from exitArchiveRecovery() and performing it
just before/after RemoveNonParentXlogFiles()? Which looks simple.Agreed. Attached the updated patch.
Mea culpa here, I have just noticed the thread. Fujii-san, would you
prefer if I take care of it? And also, what's the issue with not
doing the removal of both files just after writeTimeLineHistory()?
That's actually what happened before cbc55da5.
So you think that it's better to remove them just after writeTimeLineHistory()?
Per the following Sawada-san's comment, I was thinking that idea is basically
not good. And, RemoveNonParentXlogFiles() also removes garbage files from
pg_wal. It's simpler if similar codes exist near. Thought?
Regards,
--
Fujii Masao
On Fri, Sep 27, 2019 at 05:58:21PM +0900, Fujii Masao wrote:
So you think that it's better to remove them just after writeTimeLineHistory()?
Per the following Sawada-san's comment, I was thinking that idea is basically
not good. And, RemoveNonParentXlogFiles() also removes garbage files from
pg_wal. It's simpler if similar codes exist near. Thought?
Sawada-san's argument of upthread is that it is not good to put
exitArchiveRecovery() after writeTimeLineHIstory(), which is what
cbc55da has done per the reasons mentioned in the commit log, and we
should not change that.
My argument is we know that RECOVERYXLOG and RECOVERYHISTORY are not
needed anymore at this stage of recovery, hence we had better remove
them as soon as possible. I am not convinced that it is a good idea
to move the cleanup close to RemoveNonParentXlogFiles(). First, this
is an entirely different part of the logic where the startup process
has already switched to a new timeline. Second, we add more steps
between the moment the two files are not needed and the moment they
are removed, so any failure in-between would cause those files to
still be there (we cannot say either that we will not manipulate this
code later on) and we don't want those files to lie around. So,
mentioning that we do the cleanup just after writeTimeLineHIstory()
because we don't need them anymore is more consistent with what has
been done for ages for the end of archive recovery, something that
cbc55da unfortunately broke.
--
Michael
On Fri, Sep 27, 2019 at 7:16 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Sep 27, 2019 at 05:58:21PM +0900, Fujii Masao wrote:
So you think that it's better to remove them just after writeTimeLineHistory()?
Per the following Sawada-san's comment, I was thinking that idea is basically
not good. And, RemoveNonParentXlogFiles() also removes garbage files from
pg_wal. It's simpler if similar codes exist near. Thought?Sawada-san's argument of upthread is that it is not good to put
exitArchiveRecovery() after writeTimeLineHIstory(), which is what
cbc55da has done per the reasons mentioned in the commit log, and we
should not change that.My argument is we know that RECOVERYXLOG and RECOVERYHISTORY are not
needed anymore at this stage of recovery, hence we had better remove
them as soon as possible. I am not convinced that it is a good idea
to move the cleanup close to RemoveNonParentXlogFiles(). First, this
is an entirely different part of the logic where the startup process
has already switched to a new timeline. Second, we add more steps
between the moment the two files are not needed and the moment they
are removed, so any failure in-between would cause those files to
still be there (we cannot say either that we will not manipulate this
code later on) and we don't want those files to lie around. So,
mentioning that we do the cleanup just after writeTimeLineHIstory()
because we don't need them anymore is more consistent with what has
been done for ages for the end of archive recovery, something that
cbc55da unfortunately broke.
Ok, I have no objection to remove them just after writeTimeLineHistory().
Regards,
--
Fujii Masao
On Fri, Sep 27, 2019 at 8:41 PM Fujii Masao <masao.fujii@gmail.com> wrote:
On Fri, Sep 27, 2019 at 7:16 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Sep 27, 2019 at 05:58:21PM +0900, Fujii Masao wrote:
So you think that it's better to remove them just after writeTimeLineHistory()?
Per the following Sawada-san's comment, I was thinking that idea is basically
not good. And, RemoveNonParentXlogFiles() also removes garbage files from
pg_wal. It's simpler if similar codes exist near. Thought?Sawada-san's argument of upthread is that it is not good to put
exitArchiveRecovery() after writeTimeLineHIstory(), which is what
cbc55da has done per the reasons mentioned in the commit log, and we
should not change that.My argument is we know that RECOVERYXLOG and RECOVERYHISTORY are not
needed anymore at this stage of recovery, hence we had better remove
them as soon as possible. I am not convinced that it is a good idea
to move the cleanup close to RemoveNonParentXlogFiles(). First, this
is an entirely different part of the logic where the startup process
has already switched to a new timeline. Second, we add more steps
between the moment the two files are not needed and the moment they
are removed, so any failure in-between would cause those files to
still be there (we cannot say either that we will not manipulate this
code later on) and we don't want those files to lie around. So,
mentioning that we do the cleanup just after writeTimeLineHIstory()
because we don't need them anymore is more consistent with what has
been done for ages for the end of archive recovery, something that
cbc55da unfortunately broke.Ok, I have no objection to remove them just after writeTimeLineHistory().
I abandoned once to move the removal code to between
writeTimeLineHistory() and timeline switching because of expanding the
window but since unlink itself will complete within a very short time
it would not be problamatic much.
Attached the updated patch that just moves the removal code.
Regards,
--
Masahiko Sawada
Attachments:
v3_remove_recovered_historyfile.patchtext/x-patch; charset=US-ASCII; name=v3_remove_recovered_historyfile.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6c69eb6dd7..dcfb481f38 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5541,17 +5541,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
XLogFileName(xlogfname, ThisTimeLineID, startLogSegNo, wal_segment_size);
XLogArchiveCleanup(xlogfname);
- /*
- * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
- * of it.
- */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
- unlink(recoveryPath); /* ignore any error */
-
- /* Get rid of any remaining recovered timeline-history file, too */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
- unlink(recoveryPath); /* ignore any error */
-
/*
* Remove the signal files out of the way, so that we don't accidentally
* re-enter archive recovery mode in a subsequent crash.
@@ -7475,6 +7464,17 @@ StartupXLOG(void)
*/
writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
EndRecPtr, reason);
+
+ /*
+ * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
+ * of it.
+ */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
+ unlink(recoveryPath); /* ignore any error */
+
+ /* Get rid of any remaining recovered timeline-history file, too */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
+ unlink(recoveryPath); /* ignore any error */
}
/* Save the selected TimeLineID in shared memory, too */
On Fri, Sep 27, 2019 at 10:00:16PM +0900, Masahiko Sawada wrote:
I abandoned once to move the removal code to between
writeTimeLineHistory() and timeline switching because of expanding the
window but since unlink itself will complete within a very short time
it would not be problamatic much.Attached the updated patch that just moves the removal code.
That's not quite it, as you forgot to move the declaration of
recoveryPath so the patch fails to compile.
Adding some tests would be nice, so I updated your patch to include
something. One place where we recover files from archives is
002_archiving.pl, still the files get renamed to the segment names
when recovered so that's difficult to make that part 100%
deterministic yet. Still as a reminder of the properties behind those
files it does not sound bad to document it in the test either, that's
cheap, and we get the future covered.
--
Michael
Attachments:
v4_remove_recovered_historyfile.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6c69eb6dd7..1e5d1691ee 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5461,7 +5461,6 @@ validateRecoveryParameters(void)
static void
exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
{
- char recoveryPath[MAXPGPATH];
char xlogfname[MAXFNAMELEN];
XLogSegNo endLogSegNo;
XLogSegNo startLogSegNo;
@@ -5541,17 +5540,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
XLogFileName(xlogfname, ThisTimeLineID, startLogSegNo, wal_segment_size);
XLogArchiveCleanup(xlogfname);
- /*
- * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
- * of it.
- */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
- unlink(recoveryPath); /* ignore any error */
-
- /* Get rid of any remaining recovered timeline-history file, too */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
- unlink(recoveryPath); /* ignore any error */
-
/*
* Remove the signal files out of the way, so that we don't accidentally
* re-enter archive recovery mode in a subsequent crash.
@@ -7419,6 +7407,7 @@ StartupXLOG(void)
if (ArchiveRecoveryRequested)
{
char reason[200];
+ char recoveryPath[MAXPGPATH];
Assert(InArchiveRecovery);
@@ -7475,6 +7464,17 @@ StartupXLOG(void)
*/
writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
EndRecPtr, reason);
+
+ /*
+ * Since there might be a partial WAL segment named RECOVERYXLOG, get
+ * rid of it.
+ */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
+ unlink(recoveryPath); /* ignore any error */
+
+ /* Get rid of any remaining recovered timeline-history file, too */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
+ unlink(recoveryPath); /* ignore any error */
}
/* Save the selected TimeLineID in shared memory, too */
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index e1bd3c95cc..e71cb8a25a 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 3;
use File::Copy;
# Initialize master node, doing archives
@@ -49,3 +49,14 @@ $node_standby->poll_query_until('postgres', $caughtup_query)
my $result =
$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
is($result, qq(1000), 'check content from archives');
+
+# Promote the standby, and check that files specifically generated during
+# archive recovery are cleaned up.
+$node_standby->promote;
+my $node_standby_data = $node_standby->data_dir;
+isnt(
+ -f "$node_standby_data/pg_wal/RECOVERYHISTORY",
+ "RECOVERYHISTORY removed after promotion");
+isnt(
+ -f "$node_standby_data/pg_wal/RECOVERYXLOG",
+ "RECOVERYXLOG removed after promotion");
On Mon, Sep 30, 2019 at 10:10 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Sep 27, 2019 at 10:00:16PM +0900, Masahiko Sawada wrote:
I abandoned once to move the removal code to between
writeTimeLineHistory() and timeline switching because of expanding the
window but since unlink itself will complete within a very short time
it would not be problamatic much.Attached the updated patch that just moves the removal code.
That's not quite it, as you forgot to move the declaration of
recoveryPath so the patch fails to compile.
Oops, thanks.
Adding some tests would be nice, so I updated your patch to include
something. One place where we recover files from archives is
002_archiving.pl, still the files get renamed to the segment names
when recovered so that's difficult to make that part 100%
deterministic yet. Still as a reminder of the properties behind those
files it does not sound bad to document it in the test either, that's
cheap, and we get the future covered.
Thank you for updating the patch!
+1 to add tests but even the current postgres passes this tests
because of two reasons: one is $node_standby tries to restore
00000001.history but fails and therefore RECOVERYHISTORY isn't
created. Another one is described To reproduce this issue the new
timeline ID of recovered database needs to be more than 3.
+isnt(
+ -f "$node_standby_data/pg_wal/RECOVERYHISTORY",
+ "RECOVERYHISTORY removed after promotion");
+isnt(
+ -f "$node_standby_data/pg_wal/RECOVERYXLOG",
+ "RECOVERYXLOG removed after promotion");
I think that the above checks are always true because isnt() function
checks if the 1st argument and 2nd argument are not the same.
I've attached the updated version patch including the tests. Please review it.
Regards,
--
Masahiko Sawada
Attachments:
v5_remove_recovered_historyfile.patchapplication/octet-stream; name=v5_remove_recovered_historyfile.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61ba6b8..9f8c35b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5461,7 +5461,6 @@ validateRecoveryParameters(void)
static void
exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
{
- char recoveryPath[MAXPGPATH];
char xlogfname[MAXFNAMELEN];
XLogSegNo endLogSegNo;
XLogSegNo startLogSegNo;
@@ -5542,17 +5541,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
XLogArchiveCleanup(xlogfname);
/*
- * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
- * of it.
- */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
- unlink(recoveryPath); /* ignore any error */
-
- /* Get rid of any remaining recovered timeline-history file, too */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
- unlink(recoveryPath); /* ignore any error */
-
- /*
* Remove the signal files out of the way, so that we don't accidentally
* re-enter archive recovery mode in a subsequent crash.
*/
@@ -7433,6 +7421,7 @@ StartupXLOG(void)
if (ArchiveRecoveryRequested)
{
char reason[200];
+ char recoveryPath[MAXPGPATH];
Assert(InArchiveRecovery);
@@ -7489,6 +7478,17 @@ StartupXLOG(void)
*/
writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
EndRecPtr, reason);
+
+ /*
+ * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
+ * of it.
+ */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
+ unlink(recoveryPath); /* ignore any error */
+
+ /* Get rid of any remaining recovered timeline-history file, too */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
+ unlink(recoveryPath); /* ignore any error */
}
/* Save the selected TimeLineID in shared memory, too */
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index e1bd3c9..bb2283d 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 3;
use File::Copy;
# Initialize master node, doing archives
@@ -49,3 +49,22 @@ $node_standby->poll_query_until('postgres', $caughtup_query)
my $result =
$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
is($result, qq(1000), 'check content from archives');
+$node_standby->promote;
+
+# Initialize another standby node from the backup in order to get timeline
+# ID more than 3 and therefore to restore the old timeline history file
+my $node_standby2 = get_new_node('standby2');
+$node_standby2->init_from_backup($node_master, $backup_name,
+ has_restoring => 1);
+$node_standby2->start;
+
+# Promote the standby2, and check that files specifically generated during
+# archive recovery are cleaned up.
+$node_standby2->promote;
+my $node_standby2_data = $node_standby2->data_dir;
+ok(
+ ! -f "$node_standby2_data/pg_wal/RECOVERYHISTORY",
+ "RECOVERYHISTORY removed after promotion");
+ok(
+ ! -f "$node_standby2_data/pg_wal/RECOVERYXLOG",
+ "RECOVERYXLOG removed after promotion");
On Mon, Sep 30, 2019 at 12:53:58PM +0900, Masahiko Sawada wrote:
I think that the above checks are always true because isnt() function
checks if the 1st argument and 2nd argument are not the same.
Dammit. I overlooked this part of the module's doc.
I've attached the updated version patch including the tests. Please
review it.
Thanks, your test allows to reproduce the original problem, so that's
nice. I don't have much to say, except some improvements to the
comments of the test as per the attached. What do you think?
--
Michael
Attachments:
v6_remove_recovered_historyfile.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61ba6b852e..790e2c8714 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5461,7 +5461,6 @@ validateRecoveryParameters(void)
static void
exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
{
- char recoveryPath[MAXPGPATH];
char xlogfname[MAXFNAMELEN];
XLogSegNo endLogSegNo;
XLogSegNo startLogSegNo;
@@ -5541,17 +5540,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
XLogFileName(xlogfname, ThisTimeLineID, startLogSegNo, wal_segment_size);
XLogArchiveCleanup(xlogfname);
- /*
- * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
- * of it.
- */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
- unlink(recoveryPath); /* ignore any error */
-
- /* Get rid of any remaining recovered timeline-history file, too */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
- unlink(recoveryPath); /* ignore any error */
-
/*
* Remove the signal files out of the way, so that we don't accidentally
* re-enter archive recovery mode in a subsequent crash.
@@ -7433,6 +7421,7 @@ StartupXLOG(void)
if (ArchiveRecoveryRequested)
{
char reason[200];
+ char recoveryPath[MAXPGPATH];
Assert(InArchiveRecovery);
@@ -7489,6 +7478,17 @@ StartupXLOG(void)
*/
writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
EndRecPtr, reason);
+
+ /*
+ * Since there might be a partial WAL segment named RECOVERYXLOG, get
+ * rid of it.
+ */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
+ unlink(recoveryPath); /* ignore any error */
+
+ /* Get rid of any remaining recovered timeline-history file, too */
+ snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
+ unlink(recoveryPath); /* ignore any error */
}
/* Save the selected TimeLineID in shared memory, too */
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index e1bd3c95cc..b1c4694cdf 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 3;
use File::Copy;
# Initialize master node, doing archives
@@ -49,3 +49,28 @@ $node_standby->poll_query_until('postgres', $caughtup_query)
my $result =
$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
is($result, qq(1000), 'check content from archives');
+
+# Check the presence of temporary files specifically generated during
+# archive recovery. To ensure the presence of the temporary history
+# file, switch to a timeline large enough to allow a standby to recover
+# a history file from an archive. As this requires at least two timeline
+# switches, promote the existing standby first. Then create a second
+# standby based on the promoted one. Finally, the second standby is
+# promoted.
+$node_standby->promote;
+
+my $node_standby2 = get_new_node('standby2');
+$node_standby2->init_from_backup($node_master, $backup_name,
+ has_restoring => 1);
+$node_standby2->start;
+
+# Now promote standby2, and check that files specifically generated during
+# archive recovery are removed by the end of recovery.
+$node_standby2->promote;
+my $node_standby2_data = $node_standby2->data_dir;
+ok(
+ ! -f "$node_standby2_data/pg_wal/RECOVERYHISTORY",
+ "RECOVERYHISTORY removed after promotion");
+ok(
+ ! -f "$node_standby2_data/pg_wal/RECOVERYXLOG",
+ "RECOVERYXLOG removed after promotion");
On Mon, Sep 30, 2019 at 5:03 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Sep 30, 2019 at 12:53:58PM +0900, Masahiko Sawada wrote:
I think that the above checks are always true because isnt() function
checks if the 1st argument and 2nd argument are not the same.Dammit. I overlooked this part of the module's doc.
I've attached the updated version patch including the tests. Please
review it.Thanks, your test allows to reproduce the original problem, so that's
nice. I don't have much to say, except some improvements to the
comments of the test as per the attached. What do you think?
Thank you for updating! The comment in your patch is much better.
Regards,
--
Masahiko Sawada
On Mon, Sep 30, 2019 at 05:07:08PM +0900, Masahiko Sawada wrote:
Thank you for updating! The comment in your patch is much better.
Thanks, done and back-patched down to 9.5.
--
Michael
On Wed, Oct 2, 2019 at 3:58 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Sep 30, 2019 at 05:07:08PM +0900, Masahiko Sawada wrote:
Thank you for updating! The comment in your patch is much better.
Thanks, done and back-patched down to 9.5.
Thank you!
Regards,
--
Masahiko Sawada