Tracking latest timeline in standby mode

Started by Heikki Linnakangas, about 15 years ago, 15 messages

heikki.linnakangas@enterprisedb.com

about 15 years ago

1 attachment(s)

At the moment, when you specify recovery_target_timeline='latest', we
scan for the latest timeline at the beginning of recovery, and pick that
as the target. If new timelines appear during recovery, we stick to the
target chosen in the beginning; the new timelines are ignored. That's
undesirable if you have one master and two standby servers, and failover
happens to one of the standbys. The other standby won't automatically
start tracking the new TLI created by the promoted new master; it
requires a restart to notice.

This was discussed a while ago:
http://archives.postgresql.org/pgsql-hackers/2010-10/msg00620.php

More work needs to be done to make that work over streaming replication,
sending history files over the wire, for example, but let's take baby
steps. At the very minimum the startup process should notice new
timelines appearing in the archive. The attached patch does that.

Comments?

A related issue is that we should have a check for the issue I also
mentioned in the comments:

/*
* If the current timeline is not part of the history of the
* new timeline, we cannot proceed to it.
*
* XXX This isn't foolproof: The new timeline might have forked from
* the current one, but before the current recovery location. In that
* case we will still switch to the new timeline and proceed replaying
* from it even though the history doesn't match what we already
* replayed. That's not good. We will likely notice at the next online
* checkpoint, as the TLI won't match what we expected, but it's
* not guaranteed. The admin needs to make sure that doesn't happen.
*/

but that's a pre-existing and orthogonal issue; it can happen with the
current code too if you restart the standby, so let's handle that as a
separate patch. I'll focus on that next.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

rescan-latest-tli-1.patch (text/x-diff)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f1fedd..ea624ae 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -206,6 +206,8 @@ static bool recoveryStopAfter;
  *
  * recoveryTargetTLI: the desired timeline that we want to end in.
  *
+ * recoveryTargetIsLatest: was the requested target timeline 'latest'
+ *
  * expectedTLIs: an integer list of recoveryTargetTLI and the TLIs of
  * its known parents, newest first (so recoveryTargetTLI is always the
  * first list member).	Only these TLIs are expected to be seen in the WAL
@@ -219,6 +221,7 @@ static bool recoveryStopAfter;
  * to decrease.
  */
 static TimeLineID recoveryTargetTLI;
+static bool recoveryTargetIsLatest = false;
 static List *expectedTLIs;
 static TimeLineID curFileTLI;
 
@@ -601,6 +604,7 @@ static bool ValidXLOGHeader(XLogPageHeader hdr, int emode);
 static XLogRecord *ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt);
 static List *readTimeLineHistory(TimeLineID targetTLI);
 static bool existsTimeLineHistory(TimeLineID probeTLI);
+static bool rescanLatestTimeLine(void);
 static TimeLineID findNewestTimeLine(TimeLineID startTLI);
 static void writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 					 TimeLineID endTLI,
@@ -4218,6 +4222,61 @@ existsTimeLineHistory(TimeLineID probeTLI)
 }
 
 /*
+ * Scan for new timelines that might have appeared in the archive since we
+ * started recovery.
+ *
+ * If there is any, the function changes recovery target TLI to the latest
+ * one and returns 'true'.
+ */
+static bool
+rescanLatestTimeLine(void)
+{
+	TimeLineID newtarget;
+	newtarget = findNewestTimeLine(recoveryTargetTLI);
+	if (newtarget != recoveryTargetTLI)
+	{
+		/*
+		 * Determine the list of expected TLIs for the new TLI
+		 */
+		List *newExpectedTLIs;
+		newExpectedTLIs = readTimeLineHistory(newtarget);
+
+		/*
+		 * If the current timeline is not part of the history of the
+		 * new timeline, we cannot proceed to it.
+		 *
+		 * XXX This isn't foolproof: The new timeline might have forked from
+		 * the current one, but before the current recovery location. In that
+		 * case we will still switch to the new timeline and proceed replaying
+		 * from it even though the history doesn't match what we already
+		 * replayed. That's not good. We will likely notice at the next online
+		 * checkpoint, as the TLI won't match what we expected, but it's
+		 * not guaranteed. The admin needs to make sure that doesn't happen.
+		 */
+		if (!list_member_int(expectedTLIs,
+							 (int) recoveryTargetTLI))
+			ereport(LOG,
+					(errmsg("new timeline %u is not a child of database system timeline %u",
+							newtarget,
+							ThisTimeLineID)));
+		else
+		{
+			/* Switch target */
+			recoveryTargetTLI = newtarget;
+			expectedTLIs = newExpectedTLIs;
+
+			XLogCtl->RecoveryTargetTLI = recoveryTargetTLI;
+
+			ereport(LOG,
+					(errmsg("new target timeline is %u",
+							recoveryTargetTLI)));
+			return true;
+		}
+	}
+	return false;
+}
+
+/*
  * Find the newest existing timeline, assuming that startTLI exists.
  *
  * Note: while this is somewhat heuristic, it does positively guarantee
@@ -5319,11 +5378,13 @@ readRecoveryCommandFile(void)
 						(errmsg("recovery target timeline %u does not exist",
 								rtli)));
 			recoveryTargetTLI = rtli;
+			recoveryTargetIsLatest = false;
 		}
 		else
 		{
 			/* We start the "latest" search from pg_control's timeline */
 			recoveryTargetTLI = findNewestTimeLine(recoveryTargetTLI);
+			recoveryTargetIsLatest = true;
 		}
 	}
 }
@@ -9483,13 +9544,24 @@ retry:
 					{
 						/*
 						 * We've exhausted all options for retrieving the
-						 * file. Retry ...
+						 * file. Retry.
 						 */
 						failedSources = 0;
 
 						/*
-						 * ... but sleep first if it hasn't been long since
-						 * last attempt.
+						 * Before we sleep, re-scan for possible new timelines
+						 * if we were requested to recover to the latest
+						 * timeline.
+						 */
+						if (recoveryTargetIsLatest)
+						{
+							if (rescanLatestTimeLine())
+								continue;
+						}
+
+						/*
+						 * If it hasn't been long since last attempt, sleep
+						 * to avoid busy-waiting.
 						 */
 						now = (pg_time_t) time(NULL);
 						if ((now - last_fail_time) < 5)

Fujii Masao

masao.fujii@gmail.com

about 15 years ago

In reply to: Heikki Linnakangas (#1)

Re: Tracking latest timeline in standby mode

On Wed, Oct 27, 2010 at 11:42 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

At the moment, when you specify recovery_target_timeline='latest', we scan
for the latest timeline at the beginning of recovery, and pick that as the
target. If new timelines appear during recovery, we stick to the target
chosen in the beginning, the new timelines are ignored. That's undesirable
if you have one master and two standby servers, and failover happens to one
of the standbys. The other standby won't automatically start tracking the
new TLI created by the promoted new master, it requires a restart to notice.

This was discussed a while ago:
http://archives.postgresql.org/pgsql-hackers/2010-10/msg00620.php

More work needs to be done to make that work over streaming replication,
sending history files over the wire, for example, but let's take baby steps.
At the very minimum the startup process should notice new timelines
appearing in the archive. The attached patch does that.

Comments?

Currently the startup process rescans the timeline history files only
when walreceiver is not running. But if walreceiver receives those files
from the master in the future, shouldn't the startup process rescan them
even while walreceiver is running?

A related issue is that we should have a check for the issue I also
mentioned in the comments:

/*
* If the current timeline is not part of the history of the
* new timeline, we cannot proceed to it.
*
* XXX This isn't foolproof: The new timeline might have forked from
* the current one, but before the current recovery location. In that
* case we will still switch to the new timeline and proceed replaying
* from it even though the history doesn't match what we already
* replayed. That's not good. We will likely notice at the next online
* checkpoint, as the TLI won't match what we expected, but it's
* not guaranteed. The admin needs to make sure that doesn't happen.
*/

but that's a pre-existing and orthogonal issue; it can happen with the current
code too if you restart the standby, so let's handle that as a separate patch.

I'm thinking of writing the timeline switch LSN to the timeline history file,
and comparing that LSN with the location of the last applied WAL record when
the file is rescanned. If we have already replayed past the timeline switch
LSN, we cannot do the switch.

Currently the timeline history file contains the timeline switch WAL filename,
but it's not used at all. As a first step, what about replacing that filename
with the switch LSN?
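
The comparison proposed here could look roughly like the following. This is only a sketch under stated assumptions: HistoryEntry is a hypothetical in-memory form of a history file line that records the switch LSN, Lsn is simplified to a single 64-bit number, and the rule applied is the one implied by the XXX comment above, namely that a switch is safe only while replay has not gone past the fork point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t Lsn;           /* simplified: WAL location as one number */

/* Hypothetical history entry: parent_tli ended at switch_lsn. */
typedef struct
{
    TimeLineID  parent_tli;
    Lsn         switch_lsn;
} HistoryEntry;

/*
 * Decide whether a standby on current_tli that has replayed up to
 * replayed_lsn may follow the timeline whose history is entries[].
 */
static bool
can_switch_timeline(const HistoryEntry *entries, int nentries,
                    TimeLineID current_tli, Lsn replayed_lsn)
{
    assert(entries != NULL || nentries == 0);

    for (int i = 0; i < nentries; i++)
    {
        if (entries[i].parent_tli == current_tli)
        {
            /* Safe only if we have not replayed past the fork point. */
            return replayed_lsn <= entries[i].switch_lsn;
        }
    }

    /* The current timeline is not an ancestor of the new one at all. */
    return false;
}
```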

+			/* Switch target */
+			recoveryTargetTLI = newtarget;
+			expectedTLIs = newExpectedTLIs;

Before "expectedTLIs = newExpectedTLIs", we should call
list_free_deep(expectedTLIs)?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

about 15 years ago

In reply to: Fujii Masao (#2)

Re: Tracking latest timeline in standby mode

On 01.11.2010 12:32, Fujii Masao wrote:

A related issue is that we should have a check for the issue I also
mentioned in the comments:

/*
* If the current timeline is not part of the history of the
* new timeline, we cannot proceed to it.
*
* XXX This isn't foolproof: The new timeline might have forked from
* the current one, but before the current recovery location. In that
* case we will still switch to the new timeline and proceed replaying
* from it even though the history doesn't match what we already
* replayed. That's not good. We will likely notice at the next online
* checkpoint, as the TLI won't match what we expected, but it's
* not guaranteed. The admin needs to make sure that doesn't happen.
*/

but that's a pre-existing and orthogonal issue; it can happen with the current
code too if you restart the standby, so let's handle that as a separate patch.

I'm thinking of writing the timeline switch LSN to the timeline history file,
and comparing that LSN with the location of the last applied WAL record when
the file is rescanned. If we have already replayed past the timeline switch
LSN, we cannot do the switch.

Yeah, that's one approach. Another is to validate the TLI in the xlog
page header, it should always match the current timeline we're on. That
would feel more robust to me.

We're a bit fuzzy about what TLI is written in the page header when the
timeline changing checkpoint record is written, though. If the
checkpoint record fits in the previous page, the page will carry the old
TLI, but if the checkpoint record begins a new WAL page, the new page is
initialized with the new TLI. I think we should rearrange that so that
the page header will always carry the old TLI.

+			/* Switch target */
+			recoveryTargetTLI = newtarget;
+			expectedTLIs = newExpectedTLIs;
Before "expectedTLIs = newExpectedTLIs", we should call
list_free_deep(expectedTLIs)?

It's an integer list so list_free(expectedTLIs) is enough, and I doubt
that leakage will ever be a problem in practice, but in principle you're
right.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Fujii Masao

masao.fujii@gmail.com

about 15 years ago

In reply to: Heikki Linnakangas (#3)

Re: Tracking latest timeline in standby mode

On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Yeah, that's one approach. Another is to validate the TLI in the xlog page
header, it should always match the current timeline we're on. That would
feel more robust to me.

Yeah, that seems better.

We're a bit fuzzy about what TLI is written in the page header when the
timeline changing checkpoint record is written, though. If the checkpoint
record fits in the previous page, the page will carry the old TLI, but if
the checkpoint record begins a new WAL page, the new page is initialized
with the new TLI. I think we should rearrange that so that the page header
will always carry the old TLI.

Or after rescanning the timeline history files, what about refetching the last
applied record and checking whether the TLI in the xlog page header is the
same as the previous TLI? IOW, what about using the header of the xlog page
including the last applied record instead of the following checkpoint record?

Anyway ISTM we should also check that the min recovery point is not ahead
of the TLI switch location. So we need to fetch the record in the min recovery
point and validate the TLI of the xlog page header. Otherwise, the database
might get corrupted. This can happen, for example, when you remove all the
WAL files in pg_xlog directory and restart the standby.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

about 15 years ago

In reply to: Fujii Masao (#4)

Re: Tracking latest timeline in standby mode

On 02.11.2010 07:15, Fujii Masao wrote:

On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Yeah, that's one approach. Another is to validate the TLI in the xlog page
header, it should always match the current timeline we're on. That would
feel more robust to me.

Yeah, that seems better.

We're a bit fuzzy about what TLI is written in the page header when the
timeline changing checkpoint record is written, though. If the checkpoint
record fits in the previous page, the page will carry the old TLI, but if
the checkpoint record begins a new WAL page, the new page is initialized
with the new TLI. I think we should rearrange that so that the page header
will always carry the old TLI.

Or after rescanning the timeline history files, what about refetching the last
applied record and checking whether the TLI in the xlog page header is the
same as the previous TLI? IOW, what about using the header of the xlog page
including the last applied record instead of the following checkpoint record?

I guess that would work too, but it seems problematic to move backwards
during recovery.

Anyway ISTM we should also check that the min recovery point is not ahead
of the TLI switch location. So we need to fetch the record in the min recovery
point and validate the TLI of the xlog page header. Otherwise, the database
might get corrupted. This can happen, for example, when you remove all the
WAL files in pg_xlog directory and restart the standby.

Yes, that's another problem. We don't know which timeline the min
recovery point refers to. We should store TLI along with
minRecoveryPoint, then we can at least check that we're on the right
timeline when we reach minRecoveryPoint and throw an error.
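
As an illustration of storing the TLI next to minRecoveryPoint, here is a minimal sketch. ControlInfo and its field names are hypothetical and are not the real pg_control layout; Lsn is simplified to a single 64-bit number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t Lsn;           /* simplified: WAL location as one number */

/* Hypothetical control-file fields: recovery point plus its timeline. */
typedef struct
{
    Lsn         min_recovery_point;
    TimeLineID  min_recovery_tli;
} ControlInfo;

typedef enum
{
    MIN_RECOVERY_NOT_REACHED,   /* keep replaying */
    MIN_RECOVERY_OK,            /* reached, on the recorded timeline */
    MIN_RECOVERY_WRONG_TIMELINE /* reached, but on another timeline */
} MinRecoveryStatus;

/*
 * Once replay reaches the recorded minimum recovery point, verify that
 * it got there on the timeline that was recorded alongside it.
 */
static MinRecoveryStatus
check_min_recovery_point(const ControlInfo *ctl,
                         Lsn replayed_lsn, TimeLineID replay_tli)
{
    assert(ctl != NULL);

    if (replayed_lsn < ctl->min_recovery_point)
        return MIN_RECOVERY_NOT_REACHED;
    if (replay_tli != ctl->min_recovery_tli)
        return MIN_RECOVERY_WRONG_TIMELINE;
    return MIN_RECOVERY_OK;
}
```

A caller would treat MIN_RECOVERY_WRONG_TIMELINE as the point at which to throw the error mentioned above.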

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

about 15 years ago

In reply to: Fujii Masao (#4)

Re: Tracking latest timeline in standby mode

On 02.11.2010 07:15, Fujii Masao wrote:

On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Yeah, that's one approach. Another is to validate the TLI in the xlog page
header, it should always match the current timeline we're on. That would
feel more robust to me.

Yeah, that seems better.

I finally got around to looking at this. I wrote a patch to validate that
the TLI on the xlog page header matches ThisTimeLineID during recovery, and
noticed quickly in testing that it doesn't catch all the cases I'd like
to catch :-(.

The problem scenario is this:

TLI 1 -----------+C-------+------->Standby
                 .
                 .
TLI 2            +C-------+------->

The two horizontal lines represent two timelines. TLI 2 forks off from
TLI 1, because of a failover to a not-completely up-to-date standby
server, for example. The plus-signs represent WAL segment boundaries and
C's represent checkpoint records.

Another standby server has replayed all the WAL on TLI 1. Its latest
restartpoint is C. The checkpoint records on the different timelines are
at the same location, at the beginning of the WAL files, which is not all
that unlikely if you have archive_timeout set, for example.

Now, if you stop and restart the standby, it will try to recover to the
latest timeline, which is TLI 2. But before the restart, it had already
replayed the WAL from TLI 1, so it's wrong to replay the WAL from the
parallel universe of TLI 2. At the moment, it will go ahead and do it,
and you end up with an inconsistent database.

I planned to fix that by checking the TLI on the xlog page header, but
that alone isn't enough in the above scenario. The TLIs on the page
headers on timeline 2 are what's expected: the first page on the segment
has TLI==1, because it was just forked off from timeline 1, and the
subsequent pages have TLI==2, as they should after the checkpoint record.

So we have to remember which timeline we were on before the restart. We
already remember how far we had replayed, that's the minRecoveryPoint we
store in the control file, but we have to record the timeline along with
that.

On reflection, your idea of checking the history file before replaying
anything seems much easier. We'll still need to add the timeline
alongside minRecoveryPoint to do the checking, but it's a lot easier to
do against the history file. And we can validate the TLIs on page
headers against the information from the history file as we read in the WAL.
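
Validating page-header TLIs against the history could be sketched as below. TimelineRange is a hypothetical in-memory form of the history (oldest first, each entry ending where the next timeline forks off), Lsn is simplified to one 64-bit number, and the ambiguity about which TLI the page at the switch point carries is ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t Lsn;           /* simplified: WAL location as one number */

/*
 * Hypothetical history entry: WAL before end_lsn (exclusive) belongs to
 * tli. The newest timeline uses UINT64_MAX as its end.
 */
typedef struct
{
    TimeLineID  tli;
    Lsn         end_lsn;
} TimelineRange;

/* Return the timeline that the history says owns the given location. */
static TimeLineID
expected_tli_at(const TimelineRange *history, int n, Lsn lsn)
{
    assert(history != NULL && n > 0);

    for (int i = 0; i < n; i++)
    {
        if (lsn < history[i].end_lsn)
            return history[i].tli;
    }
    return history[n - 1].tli;
}

/* Check a page header's TLI against what the history expects there. */
static bool
page_tli_matches_history(const TimelineRange *history, int n,
                         Lsn page_lsn, TimeLineID page_tli)
{
    return expected_tli_at(history, n, page_lsn) == page_tli;
}
```

With a history in which TLI 2 forks from TLI 1 at some location, a page from TLI 1 beyond that location fails the check, which is the mismatch in the problem scenario above.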

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Fujii Masao

masao.fujii@gmail.com

almost 15 years ago

In reply to: Heikki Linnakangas (#6)

Re: Tracking latest timeline in standby mode

On Wed, Jan 5, 2011 at 5:08 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I finally got around to look at this. I wrote a patch to validate that the
TLI on xlog page header matches ThisTimeLineID during recovery, and noticed
quickly in testing that it doesn't catch all the cases I'd like to catch
:-(.

The patch added to the CF hasn't solved this problem yet. Are you planning
to solve it in 9.1? Or are you planning to just commit the patch for 9.1, and
postpone the issue to 9.2 or later? I'm OK either way, though the former
would of course be better.

In any case, documentation for this feature needs to be added.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Robert Haas

robertmhaas@gmail.com

almost 15 years ago

In reply to: Fujii Masao (#7)

Re: Tracking latest timeline in standby mode

On Mon, Jan 24, 2011 at 2:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Jan 5, 2011 at 5:08 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I finally got around to look at this. I wrote a patch to validate that the
TLI on xlog page header matches ThisTimeLineID during recovery, and noticed
quickly in testing that it doesn't catch all the cases I'd like to catch
:-(.

The patch added into the CF hasn't solved this problem yet. Are you planning
to solve it in 9.1? Or are you planning to just commit the patch for 9.1, and
postpone the issue to 9.2 or later? I'm OK either way. Of course, the former
is quite better, though.

Anyway, you have to add the documentation about this feature.

This patch is erroneously marked Needs Review in the CommitFest
application, but I think really it's Waiting on Author, and has been
for a long time. I'm thinking we should push this out to 9.2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 15 years ago

In reply to: Robert Haas (#8)

1 attachment(s)

Re: Tracking latest timeline in standby mode

On 08.02.2011 06:27, Robert Haas wrote:

On Mon, Jan 24, 2011 at 2:00 AM, Fujii Masao<masao.fujii@gmail.com> wrote:

On Wed, Jan 5, 2011 at 5:08 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I finally got around to look at this. I wrote a patch to validate that the
TLI on xlog page header matches ThisTimeLineID during recovery, and noticed
quickly in testing that it doesn't catch all the cases I'd like to catch
:-(.

The patch added into the CF hasn't solved this problem yet. Are you planning
to solve it in 9.1? Or are you planning to just commit the patch for 9.1, and
postpone the issue to 9.2 or later? I'm OK either way. Of course, the former
is quite better, though.

Anyway, you have to add the documentation about this feature.

This patch is erroneously marked Needs Review in the CommitFest
application, but I think really it's Waiting on Author, and has been
for a long time. I'm thinking we should push this out to 9.2.

I dropped the ball on this one, but now that we have pg_basebackup and
"pg_ctl promote" which make it easy to set up a standby and failover, I
think we should still do this in 9.1. Otherwise you need a restart to
have a 2nd standby server track the TLI change that failover causes.

I wanted to add those extra safeguards, and to support streaming
replication in addition to restoring from archive, but that's 9.2
material. However, the original patch
(http://archives.postgresql.org/message-id/4CC83A50.7070807@enterprisedb.com)
was non-intrusive and no-one objected. While the extra safeguards
would've been nice, this patch doesn't make the situation any worse than
it is already when you restart the standby.

Here's an updated version of that patch, now with a little bit of
documentation. Barring objections, I'll commit this.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

rescan-latest-tli-2.patch (text/x-diff)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index e30552f..3c98ae6 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -660,7 +660,10 @@ protocol to make nodes agree on a serializable transactional order.
     command file <filename>recovery.conf</> in the standby's cluster data
     directory, and turn on <varname>standby_mode</>. Set
     <varname>restore_command</> to a simple command to copy files from
-    the WAL archive.
+    the WAL archive. If you plan to have multiple standby servers for high
+    availability purposes, set <varname>recovery_target_timeline</> to
+    <literal>latest</>, to make the standby server follow the timeline change
+    that occurs at failover to another standby.
    </para>
 
    <note>
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 602fbe2..e9e95ac 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -240,7 +240,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
        <para>
         Specifies recovering into a particular timeline.  The default is
         to recover along the same timeline that was current when the
-        base backup was taken.  You only need to set this parameter
+        base backup was taken. Setting this to <literal>latest</> recovers
+        to the latest timeline found in the archive, which is useful in
+        a standby server. Other than that you only need to set this parameter
         in complex re-recovery situations, where you need to return to
         a state that itself was reached after a point-in-time recovery.
         See <xref linkend="backup-timelines"> for discussion.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4eb4ac..d1f69cf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -214,6 +214,8 @@ static bool recoveryStopAfter;
  *
  * recoveryTargetTLI: the desired timeline that we want to end in.
  *
+ * recoveryTargetIsLatest: was the requested target timeline 'latest'
+ *
  * expectedTLIs: an integer list of recoveryTargetTLI and the TLIs of
  * its known parents, newest first (so recoveryTargetTLI is always the
  * first list member).	Only these TLIs are expected to be seen in the WAL
@@ -227,6 +229,7 @@ static bool recoveryStopAfter;
  * to decrease.
  */
 static TimeLineID recoveryTargetTLI;
+static bool recoveryTargetIsLatest = false;
 static List *expectedTLIs;
 static TimeLineID curFileTLI;
 
@@ -637,6 +640,7 @@ static bool ValidXLOGHeader(XLogPageHeader hdr, int emode);
 static XLogRecord *ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt);
 static List *readTimeLineHistory(TimeLineID targetTLI);
 static bool existsTimeLineHistory(TimeLineID probeTLI);
+static bool rescanLatestTimeLine(void);
 static TimeLineID findNewestTimeLine(TimeLineID startTLI);
 static void writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 					 TimeLineID endTLI,
@@ -4254,6 +4258,61 @@ existsTimeLineHistory(TimeLineID probeTLI)
 }
 
 /*
+ * Scan for new timelines that might have appeared in the archive since we
+ * started recovery.
+ *
+ * If there is any, the function changes recovery target TLI to the latest
+ * one and returns 'true'.
+ */
+static bool
+rescanLatestTimeLine(void)
+{
+	TimeLineID newtarget;
+	newtarget = findNewestTimeLine(recoveryTargetTLI);
+	if (newtarget != recoveryTargetTLI)
+	{
+		/*
+		 * Determine the list of expected TLIs for the new TLI
+		 */
+		List *newExpectedTLIs;
+		newExpectedTLIs = readTimeLineHistory(newtarget);
+
+		/*
+		 * If the current timeline is not part of the history of the
+		 * new timeline, we cannot proceed to it.
+		 *
+		 * XXX This isn't foolproof: The new timeline might have forked from
+		 * the current one, but before the current recovery location. In that
+		 * case we will still switch to the new timeline and proceed replaying
+		 * from it even though the history doesn't match what we already
+		 * replayed. That's not good. We will likely notice at the next online
+		 * checkpoint, as the TLI won't match what we expected, but it's
+		 * not guaranteed. The admin needs to make sure that doesn't happen.
+		 */
+		if (!list_member_int(expectedTLIs,
+							 (int) recoveryTargetTLI))
+			ereport(LOG,
+					(errmsg("new timeline %u is not a child of database system timeline %u",
+							newtarget,
+							ThisTimeLineID)));
+		else
+		{
+			/* Switch target */
+			recoveryTargetTLI = newtarget;
+			expectedTLIs = newExpectedTLIs;
+
+			XLogCtl->RecoveryTargetTLI = recoveryTargetTLI;
+
+			ereport(LOG,
+					(errmsg("new target timeline is %u",
+							recoveryTargetTLI)));
+			return true;
+		}
+	}
+	return false;
+}
+
+/*
  * Find the newest existing timeline, assuming that startTLI exists.
  *
  * Note: while this is somewhat heuristic, it does positively guarantee
@@ -5327,11 +5386,13 @@ readRecoveryCommandFile(void)
 						(errmsg("recovery target timeline %u does not exist",
 								rtli)));
 			recoveryTargetTLI = rtli;
+			recoveryTargetIsLatest = false;
 		}
 		else
 		{
 			/* We start the "latest" search from pg_control's timeline */
 			recoveryTargetTLI = findNewestTimeLine(recoveryTargetTLI);
+			recoveryTargetIsLatest = true;
 		}
 	}
 
@@ -10032,13 +10093,24 @@ retry:
 					{
 						/*
 						 * We've exhausted all options for retrieving the
-						 * file. Retry ...
+						 * file. Retry.
 						 */
 						failedSources = 0;
 
 						/*
-						 * ... but sleep first if it hasn't been long since
-						 * last attempt.
+						 * Before we sleep, re-scan for possible new timelines
+						 * if we were requested to recover to the latest
+						 * timeline.
+						 */
+						if (recoveryTargetIsLatest)
+						{
+							if (rescanLatestTimeLine())
+								continue;
+						}
+
+						/*
+						 * If it hasn't been long since last attempt, sleep
+						 * to avoid busy-waiting.
 						 */
 						now = (pg_time_t) time(NULL);
 						if ((now - last_fail_time) < 5)

#10

Magnus Hagander

magnus@hagander.net

almost 15 years ago

In reply to: Heikki Linnakangas (#9)

Re: Tracking latest timeline in standby mode

On Mon, Mar 7, 2011 at 11:52, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 08.02.2011 06:27, Robert Haas wrote:

On Mon, Jan 24, 2011 at 2:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Jan 5, 2011 at 5:08 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I finally got around to look at this. I wrote a patch to validate that the
TLI on xlog page header matches ThisTimeLineID during recovery, and noticed
quickly in testing that it doesn't catch all the cases I'd like to catch
:-(.

The patch added into the CF hasn't solved this problem yet. Are you planning
to solve it in 9.1? Or are you planning to just commit the patch for 9.1, and
postpone the issue to 9.2 or later? I'm OK either way. Of course, the former
is quite better, though.

Anyway, you have to add the documentation about this feature.

This patch is erroneously marked Needs Review in the CommitFest
application, but I think really it's Waiting on Author, and has been
for a long time. I'm thinking we should push this out to 9.2.

I dropped the ball on this one, but now that we have pg_basebackup and
"pg_ctl promote" which make it easy to set up a standby and failover, I
think we should still do this in 9.1. Otherwise you need a restart to have a
2nd standby server track the TLI change that failover causes.

+1 for doing this!

(haven't had time to look through the actual patch, so obviously don't
do it if it's broken..)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#11

Fujii Masao

masao.fujii@gmail.com

almost 15 years ago

In reply to: Magnus Hagander (#10)

Re: Tracking latest timeline in standby mode

On Mon, Mar 7, 2011 at 9:06 PM, Magnus Hagander <magnus@hagander.net> wrote:

I dropped the ball on this one, but now that we have pg_basebackup and
"pg_ctl promote" which make it easy to set up a standby and failover, I
think we should still do this in 9.1. Otherwise you need a restart to have a
2nd standby server track the TLI change that failover causes.

+1 for doing this!

Comments:

+		if (!list_member_int(expectedTLIs,
+							 (int) recoveryTargetTLI))
+			ereport(LOG,
+					(errmsg("new timeline %u is not a child of database system timeline %u",

We should check whether recoveryTargetTLI is a member of newExpectedTLIs
instead of expectedTLIs?

+ /* Switch target */
+                       recoveryTargetTLI = newtarget;
+                       expectedTLIs = newExpectedTLIs;
Before "expectedTLIs = newExpectedTLIs", we should call
list_free_deep(expectedTLIs)?

It's an integer list so list_free(expectedTLIs) is enough, and I doubt that leakage will ever be a problem in practice, but in principle you're right.

True. But I think that it's a good habit to fix a leak no matter how
small it is.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#12

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 15 years ago

In reply to: Fujii Masao (#11)

Re: Tracking latest timeline in standby mode

On 07.03.2011 14:35, Fujii Masao wrote:

Comments:
+		if (!list_member_int(expectedTLIs,
+							 (int) recoveryTargetTLI))
+			ereport(LOG,
+					(errmsg("new timeline %u is not a child of database system timeline %u",
We should check whether recoveryTargetTLI is a member of newExpectedTLIs
instead of expectedTLIs?

Thanks, fixed.

+ /* Switch target */
+                       recoveryTargetTLI = newtarget;
+                       expectedTLIs = newExpectedTLIs;
Before "expectedTLIs = newExpectedTLIs", we should call
list_free_deep(expectedTLIs)?

It's an integer list so list_free(expectedTLIs) is enough, and I doubt that leakage will ever be a problem in practice, but in principle you're right.
True. But I think that it's a good habit to fix a leak no matter how
small it is.

Ah, thanks for the reminder.

Added that and committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#13

senthilnathan

senthilnathan.t@gmail.com

over 14 years ago

In reply to: Heikki Linnakangas (#12)

Re: Tracking latest timeline in standby mode

Is this feature available in version 9.1.0?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Tracking-latest-timeline-in-standby-mode-tp3238829p4863900.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#14

Fujii Masao

masao.fujii@gmail.com

over 14 years ago

In reply to: senthilnathan (#13)

Re: Tracking latest timeline in standby mode

On Mon, Oct 3, 2011 at 3:18 PM, senthilnathan <senthilnathan.t@gmail.com> wrote:

Is this feature available in version 9.1.0?

Yes, it's available in 9.1.x.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#15

senthilnathan

senthilnathan.t@gmail.com

about 14 years ago

In reply to: Fujii Masao (#14)

Re: Tracking latest timeline in standby mode

We are using 9.1.

We have a setup with a master and two standby servers: M --> S1, S2. Both
standbys, S1 and S2, share the same archive. The master has a virtual IP,
and both standby servers replicate through this virtual IP.

Assume the master fails. Using our heartbeat mechanism, the virtual IP is
then bound to S1 (if S1's XLOG position is ahead of or equal to S2's).

Is it required to copy the timeline history file that is generated when
S1 is promoted to master into the archive directory of S2 for replication
(from S1, the new master, to S2) to work?

Without copying this history file from S1 to S2, S2 keeps logging the
following error messages:

2011-12-07 17:29:46 IST::@:[18879]:FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

cp: cannot stat `../archive/000000010000000000000005': No such file or directory
2011-12-07 17:29:49 IST::@:[18875]:LOG: record with zero length at 0/5D8FFC0
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/00000002.history': No such file or directory
2011-12-07 17:29:49 IST::@:[20362]:FATAL: timeline 2 of the primary does not match recovery target timeline 1
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/00000002.history': No such file or directory
2011-12-07 17:29:54 IST::@:[20367]:FATAL: timeline 2 of the primary does not match recovery target timeline 1
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/00000002.history': No such file or directory
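
For readers decoding the file names in the log above, the following sketch shows how they relate to the timeline ID and the WAL location. The function names are hypothetical, and it assumes the default 16 MB segment size and the two-part location format of this release, as in "0/5D8FFC0".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define WAL_SEGMENT_SIZE (16u * 1024u * 1024u)  /* assumed default: 16 MB */

/* Build the history file name for a timeline, e.g. "00000002.history". */
static int
history_file_name(char *buf, size_t len, uint32_t tli)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%08X.history", (unsigned int) tli);
}

/*
 * Build the WAL segment file name for a location given as the two
 * halves printed in the log (xlogid/xrecoff).
 */
static int
wal_file_name(char *buf, size_t len, uint32_t tli,
              uint32_t xlogid, uint32_t xrecoff)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%08X%08X%08X",
                    (unsigned int) tli,
                    (unsigned int) xlogid,
                    (unsigned int) (xrecoff / WAL_SEGMENT_SIZE));
}
```

Under these assumptions, timeline 1 at location 0/5D8FFC0 maps to segment 000000010000000000000005, and timeline 2 maps to 00000002.history, which are the two files the restore command keeps looking for.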

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Tracking-latest-timeline-in-standby-mode-tp3238829p5057733.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.