[v9.2] Start new timeline for PITR

Started by David Fetter over 14 years ago, 13 messages
#1David Fetter
david@fetter.org
1 attachment(s)

Folks,

The nice people at VMware, where I work, have come up with a small
patch to allow PITR to create a new timeline. This is useful in cases
where you're using filesystem snapshots of $PGDATA which may be old.

PFA a patch implementing and documenting same :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

new_timeline02.diff (text/plain; charset=us-ascii)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index de60905..0df3977 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -73,6 +73,32 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="create-new-timeline" xreflabel="create_new_timeline">
+      <term><varname>create_new_timeline</varname> (<type>boolean</type>)</term>
+      <indexterm>
+        <primary><varname>create_new_timeline</varname> recovery parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        If set, create a new timeline unconditionally.  This parameter is
+        used in archive recovery scenarios where filesystem snapshots
+        are used.
+       </para>
+       <para>
+        If set to true, this overrides any recovery_target_* setting
+        specified in the recovery.conf file.  Instead, it will perform
+        a crash recovery, then switch to a new timeline.
+       </para>
+       <para>
+        When using create_new_timeline, the restore_command should be
+        the same as for a regular point-in-time recovery.  In this
+        case it only gets used to retrieve the timeline history files
+        from the archive disk, so that postgres can correctly choose a
+        new timeline.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="archive-cleanup-command" xreflabel="archive_cleanup_command">
       <term><varname>archive_cleanup_command</varname> (<type>string</type>)</term>
       <indexterm>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 229c749..e962938 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -44,6 +44,14 @@
 #restore_command = ''		# e.g. 'cp /mnt/server/archivedir/%f %p'
 #
 #
+# create_new_timeline
+#
+# specifies whether we are starting a new timeline for recovery.  This
+# is useful in scenarios using filesystem snapshots.
+#
+#create_new_timeline = false
+#
+#
 # archive_cleanup_command
 #
 # specifies an optional shell command to execute at every restartpoint.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5c3ca47..ee43f44 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -201,6 +201,9 @@ static bool StandbyMode = false;
 static char *PrimaryConnInfo = NULL;
 static char *TriggerFile = NULL;
 
+/* option, possibly overridden by recovery.conf, for creating a new timeline for crash recovery */
+static bool createNewTimeLine = false;
+
 /* if recoveryStopsHere returns true, it saves actual stop xid/time/name here */
 static TransactionId recoveryStopXid;
 static TimestampTz recoveryStopTime;
@@ -5385,6 +5388,15 @@ readRecoveryCommandFile(void)
 					(errmsg("trigger_file = '%s'",
 							TriggerFile)));
 		}
+		else if (strcmp(item->name, "create_new_timeline") == 0)
+		{
+			if (!parse_bool(item->value, &createNewTimeLine))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a Boolean value", item->name)));
+			ereport(DEBUG2,
+					(errmsg("create_new_timeline = '%s'", item->value)));
+		}
 		else
 			ereport(FATAL,
 					(errmsg("unrecognized recovery parameter \"%s\"",
@@ -5410,8 +5422,14 @@ readRecoveryCommandFile(void)
 							RECOVERY_COMMAND_FILE)));
 	}
 
-	/* Enable fetching from archive recovery area */
-	InArchiveRecovery = true;
+	/*
+	 * Check whether we're creating a new timeline.
+	 */
+	if (createNewTimeLine)
+		InRecovery = true;
+	else
+		/* Enable fetching from archive recovery area */
+		InArchiveRecovery = true;
 
 	/*
 	 * If user specified recovery_target_timeline, validate it or compute the
@@ -5524,8 +5542,15 @@ exitArchiveRecovery(TimeLineID endTLI, uint32 endLogId, uint32 endLogSeg)
 		{
 			XLogFileCopy(endLogId, endLogSeg,
 						 endTLI, endLogId, endLogSeg);
-
-			if (XLogArchivingActive())
+			/*
+			 * The PITR script should have set the '.done' flag for
+			 * this file, so we don't want to archive it again, as the
+			 * archive version is newer.
+			 *
+			 * If the '.done' flag was not set, the archiver will
+			 * eventually handle it.
+			 */
+			if (XLogArchivingActive() && !createNewTimeLine)
 			{
 				XLogFileName(xlogpath, endTLI, endLogId, endLogSeg);
 				XLogArchiveNotify(xlogpath);
@@ -6676,6 +6701,13 @@ StartupXLOG(void)
 				ereport(FATAL,
 					  (errmsg("WAL ends before consistent recovery point")));
 		}
+
+		/*
+		 * Check whether we're creating a new timeline, and if we are,
+		 * put us into archive recovery mode.
+		 */
+		if (createNewTimeLine)
+			InArchiveRecovery = true;
 	}
 
 	/*
#2Josh Berkus
josh@agliodbs.com
In reply to: David Fetter (#1)
Re: [v9.2] Start new timeline for PITR

On 6/9/11 4:51 PM, David Fetter wrote:

> Folks,
>
> The nice people at VMware, where I work, have come up with a small
> patch to allow PITR to create a new timeline. This is useful in cases
> where you're using filesystem snapshots of $PGDATA which may be old.
>
> PFA a patch implementing and documenting same :)

Can you explain here in email what the specific goals and expected
behavior of the option are?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#1)
Re: [v9.2] Start new timeline for PITR

David Fetter <david@fetter.org> writes:

> The nice people at VMware, where I work, have come up with a small
> patch to allow PITR to create a new timeline. This is useful in cases
> where you're using filesystem snapshots of $PGDATA which may be old.

Huh? We already start a new timeline when doing a non-crash-recovery
replay scenario.

The code looks pretty confused too, which makes it difficult to
reverse-engineer what your point is.

regards, tom lane

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: [v9.2] Start new timeline for PITR

On Thu, Jun 9, 2011 at 8:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> David Fetter <david@fetter.org> writes:
>
>> The nice people at VMware, where I work, have come up with a small
>> patch to allow PITR to create a new timeline.  This is useful in cases
>> where you're using filesystem snapshots of $PGDATA which may be old.
>
> Huh?  We already start a new timeline when doing a non-crash-recovery
> replay scenario.
>
> The code looks pretty confused too, which makes it difficult to
> reverse-engineer what your point is.

I am guessing that they are taking a filesystem snapshot, and then
using that to fire up PG. So to PG it looks like a crash recovery,
but they want a new timeline anyway.

<waves hands>

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5David Fetter
david@fetter.org
In reply to: Robert Haas (#4)
Re: [v9.2] Start new timeline for PITR

On Fri, Jun 10, 2011 at 01:20:25AM -0400, Robert Haas wrote:

> On Thu, Jun 9, 2011 at 8:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> David Fetter <david@fetter.org> writes:
>>
>>> The nice people at VMware, where I work, have come up with a small
>>> patch to allow PITR to create a new timeline.  This is useful in cases
>>> where you're using filesystem snapshots of $PGDATA which may be old.
>>
>> Huh?  We already start a new timeline when doing a non-crash-recovery
>> replay scenario.
>>
>> The code looks pretty confused too, which makes it difficult to
>> reverse-engineer what your point is.
>
> I am guessing that they are taking a filesystem snapshot, and then
> using that to fire up PG. So to PG it looks like a crash recovery,
> but they want a new timeline anyway.
>
> <waves hands>

That's pretty much it. More detail:

Let's imagine we're taking filesystem snapshots each day by whatever
means. We're also archiving xlogs, but only have space for 48 hours'
worth. Now we want to recover to 3 days ago, but there are no WALs
from that time, so we do a crash recovery from the filesystem
snapshot. Doing continuous archiving from this conflicts with the
existing WALs, which we solve by creating a new timeline.

This also allows subsequent PITR to other times on the original
timeline.

Josh B pointed out that since setting this option to true conflicts with
another option, having both should prevent recovery from even
starting, and I'll work up a patch for this tonight or at latest
tomorrow.

Cheers,
David.

#6Josh Berkus
josh@agliodbs.com
In reply to: David Fetter (#5)
Re: [v9.2] Start new timeline for PITR

David,

> Let's imagine we're taking filesystem snapshots each day by whatever
> means. We're also archiving xlogs, but only have space for 48 hours'
> worth. Now we want to recover to 3 days ago, but there are no WALs
> from that time, so we do a crash recovery from the filesystem
> snapshot. Doing continuous archiving from this conflicts with the
> existing WALs, which we solve by creating a new timeline.

How is this different from just changing the recovery_command?

#8Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#6)
Re: [v9.2] Start new timeline for PITR

On Fri, Jun 10, 2011 at 2:53 PM, Josh Berkus <josh@agliodbs.com> wrote:

>> Let's imagine we're taking filesystem snapshots each day by whatever
>> means.  We're also archiving xlogs, but only have space for 48 hours'
>> worth.  Now we want to recover to 3 days ago, but there are no WALs
>> from that time, so we do a crash recovery from the filesystem
>> snapshot.  Doing continuous archiving from this conflicts with the
>> existing WALs, which we solve by creating a new timeline.
>
> How is this different from just changing the recovery_command?

*scratches head*

How is it the same?

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#8)
Re: [v9.2] Start new timeline for PITR

On 10.06.2011 22:34, Robert Haas wrote:

> On Fri, Jun 10, 2011 at 2:53 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>>> Let's imagine we're taking filesystem snapshots each day by whatever
>>> means. We're also archiving xlogs, but only have space for 48 hours'
>>> worth. Now we want to recover to 3 days ago, but there are no WALs
>>> from that time, so we do a crash recovery from the filesystem
>>> snapshot. Doing continuous archiving from this conflicts with the
>>> existing WALs, which we solve by creating a new timeline.
>>
>> How is this different from just changing the recovery_command?
>
> *scratches head*
>
> How is it the same?

Creating a dummy recovery.conf with bogus recovery_command would do the
trick.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#10Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#8)
Re: [v9.2] Start new timeline for PITR

On 6/10/11 12:34 PM, Robert Haas wrote:

> On Fri, Jun 10, 2011 at 2:53 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>>> Let's imagine we're taking filesystem snapshots each day by whatever
>>> means. We're also archiving xlogs, but only have space for 48 hours'
>>> worth. Now we want to recover to 3 days ago, but there are no WALs
>>> from that time, so we do a crash recovery from the filesystem
>>> snapshot. Doing continuous archiving from this conflicts with the
>>> existing WALs, which we solve by creating a new timeline.
>>
>> How is this different from just changing the recovery_command?
>
> *scratches head*
>
> How is it the same?

Well, presumably I can just change recovery_command to recover from an
empty directory. Then the PITR copy will just come up as soon as it
finishes processing local snapshot WAL, and it'll start its own
timeline. How is this different from what the patch does?

#11Jaime Casanova
jaime@2ndquadrant.com
In reply to: David Fetter (#5)
Re: [v9.2] Start new timeline for PITR

On Fri, Jun 10, 2011 at 11:30 AM, David Fetter <david@fetter.org> wrote:

> This also allows subsequent PITR to other times on the original
> timeline.
>
> Josh B pointed out that since setting this option to true conflicts with
> another option, having both should prevent recovery from even
> starting, and I'll work up a patch for this tonight or at latest
> tomorrow.

Hi,

Are you still working on this? Should we expect a new patch?

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

#12David Fetter
david@fetter.org
In reply to: Jaime Casanova (#11)
Re: [v9.2] Start new timeline for PITR

On Fri, Jun 17, 2011 at 09:57:13AM -0500, Jaime Casanova wrote:

> On Fri, Jun 10, 2011 at 11:30 AM, David Fetter <david@fetter.org> wrote:
>
>> This also allows subsequent PITR to other times on the original
>> timeline.
>>
>> Josh B pointed out that since setting this option to true conflicts with
>> another option, having both should prevent recovery from even
>> starting, and I'll work up a patch for this tonight or at latest
>> tomorrow.
>
> Hi,
>
> Are you still working on this? Should we expect a new patch?

Yes, sorry about that. I let work get on top of me. Will try for a
new patch this evening.

Cheers,
David.

#13Jaime Casanova
jaime@2ndquadrant.com
In reply to: David Fetter (#12)
Re: [v9.2] Start new timeline for PITR

On Fri, Jun 17, 2011 at 1:54 PM, David Fetter <david@fetter.org> wrote:

> On Fri, Jun 17, 2011 at 09:57:13AM -0500, Jaime Casanova wrote:
>
>> Are you still working on this? Should we expect a new patch?
>
> Yes, sorry about that.  I let work get on top of me.  Will try for a
> new patch this evening.

OK... I will wait for it so I can review it... Just in advance, I really don't
like the name "create_new_timeline"... it will cause confusion.