Pause at end of recovery

Started by Magnus Hagander about 14 years ago, 8 messages
#1 Magnus Hagander
magnus@hagander.net

These days we have pause_at_recovery_target, which lets us pause when
we reach a PITR target. Is there a particular reason we don't have a
way to pause at end of recovery if we *didn't* specify a target -
meaning we let it run until the end of the archived log? While it's
too late to change the target, I can see a lot of use cases where you
don't want it to be possible to make changes to the database again
until it has been properly verified - and keeping it up in read-only
mode in that case can be quite useful...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#2 Simon Riggs
simon@2ndQuadrant.com
In reply to: Magnus Hagander (#1)
Re: Pause at end of recovery

On Tue, Dec 20, 2011 at 1:40 PM, Magnus Hagander <magnus@hagander.net> wrote:

> These days we have pause_at_recovery_target, which lets us pause when
> we reach a PITR target. Is there a particular reason we don't have a
> way to pause at end of recovery if we *didn't* specify a target -
> meaning we let it run until the end of the archived log? While it's
> too late to change the target, I can see a lot of use cases where you
> don't want it to be possible to make changes to the database again
> until it has been properly verified - and keeping it up in read-only
> mode in that case can be quite useful...

Useful for what purpose? It's possible to deny access in other ways already.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#3 Magnus Hagander
magnus@hagander.net
In reply to: Simon Riggs (#2)
Re: Pause at end of recovery

On Tue, Dec 20, 2011 at 18:15, Simon Riggs <simon@2ndquadrant.com> wrote:

> On Tue, Dec 20, 2011 at 1:40 PM, Magnus Hagander <magnus@hagander.net> wrote:

> > These days we have pause_at_recovery_target, which lets us pause when
> > we reach a PITR target. Is there a particular reason we don't have a
> > way to pause at end of recovery if we *didn't* specify a target -
> > meaning we let it run until the end of the archived log? While it's
> > too late to change the target, I can see a lot of use cases where you
> > don't want it to be possible to make changes to the database again
> > until it has been properly verified - and keeping it up in read-only
> > mode in that case can be quite useful...

> Useful for what purpose? It's possible to deny access in other ways already.

For validating the restore, while allowing easy read-only access.

If you could declare a read-only connection in pg_hba.conf it would
give the same functionality, but you really can't...

I'm not saying it's a big feature. But the way it looks now it seems
to be artificially restricted from a use case. Or is there a technical
reason why we don't allow it?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#4 Simon Riggs
simon@2ndquadrant.com
In reply to: Magnus Hagander (#3)
Re: Pause at end of recovery

On Wed, Dec 21, 2011 at 12:04 PM, Magnus Hagander <magnus@hagander.net> wrote:

> On Tue, Dec 20, 2011 at 18:15, Simon Riggs <simon@2ndquadrant.com> wrote:

> > On Tue, Dec 20, 2011 at 1:40 PM, Magnus Hagander <magnus@hagander.net> wrote:

> > > These days we have pause_at_recovery_target, which lets us pause when
> > > we reach a PITR target. Is there a particular reason we don't have a
> > > way to pause at end of recovery if we *didn't* specify a target -
> > > meaning we let it run until the end of the archived log? While it's
> > > too late to change the target, I can see a lot of use cases where you
> > > don't want it to be possible to make changes to the database again
> > > until it has been properly verified - and keeping it up in read-only
> > > mode in that case can be quite useful...

> > Useful for what purpose? It's possible to deny access in other ways already.

> For validating the restore, while allowing easy read-only access.

> If you could declare a read-only connection in pg_hba.conf it would
> give the same functionality, but you really can't...

> I'm not saying it's a big feature. But the way it looks now it seems
> to be artificially restricted from a use case. Or is there a technical
> reason why we don't allow it?

I can see a reason to do this now. I've written a patch and will commit
it on Friday. Nudge me if I don't.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#5 Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#4)
1 attachment(s)
Re: Pause at end of recovery

On Thu, Dec 22, 2011 at 6:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> I can see a reason to do this now. I've written a patch and will commit
> it on Friday. Nudge me if I don't.

It's hard to write this so that it pauses in all the cases where it
should and doesn't pause in the cases where it shouldn't.

Basically, we can't get in the way of crash recovery, so the only way
we can currently tell a crash recovery from an archive recovery is the
presence of restore_command.

If you don't have that and you haven't set a recovery target, it won't
pause and there's nothing I can do, AFAICS.

Please test this and review before commit.
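
As a reading aid, the condition described above can be sketched as a
standalone C predicate. This is a minimal model under stated assumptions,
not the patch itself: the RecoveryState struct, its field names,
should_pause_at_end_of_wal() and check_examples() are hypothetical
stand-ins for the server variables the attached patch tests
(recoveryPauseAtTarget, standbyState, recoveryTarget, InArchiveRecovery,
triggered, reachedStopPoint).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the attached patch inspects. */
typedef struct RecoveryState
{
    bool pause_at_recovery_target; /* setting from recovery.conf */
    bool hot_standby_ready;        /* users can connect to resume recovery */
    bool recovery_target_set;      /* an explicit recovery target was given */
    bool in_archive_recovery;      /* per the message: restore_command present */
    bool triggered;                /* promotion was explicitly requested */
    bool reached_stop_point;       /* recovery already stopped at a target */
} RecoveryState;

/*
 * Pause at end of WAL only for archive recovery without a target,
 * and only if somebody can connect to resume it again.
 */
bool
should_pause_at_end_of_wal(const RecoveryState *s)
{
    return s->pause_at_recovery_target &&
           s->hot_standby_ready &&
           !s->recovery_target_set &&
           s->in_archive_recovery &&
           !s->triggered &&
           !s->reached_stop_point;
}

/* A few cases taken from the discussion. */
void
check_examples(void)
{
    /* Archive recovery to end of WAL, no target, hot standby up: pause. */
    RecoveryState archive = {
        .pause_at_recovery_target = true,
        .hot_standby_ready = true,
        .in_archive_recovery = true
    };
    assert(should_pause_at_end_of_wal(&archive));

    /* Crash recovery (no restore_command): must never be paused. */
    RecoveryState crash = archive;
    crash.in_archive_recovery = false;
    assert(!should_pause_at_end_of_wal(&crash));

    /* Promotion already requested: go live instead of pausing. */
    RecoveryState promoted = archive;
    promoted.triggered = true;
    assert(!should_pause_at_end_of_wal(&promoted));
}
```

Calling check_examples() from a small main() exercises the three cases.
The crash-recovery case is the one that makes the condition awkward: with
no restore_command and no recovery target, nothing in this model
distinguishes it from a restore that should have paused.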

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

pause_at_end_of_logs.v2.patch (text/x-patch; charset=US-ASCII)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 8647024..1e1614f 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -263,6 +263,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
        <para>
         Specifies whether recovery should pause when the recovery target
         is reached. The default is true.
+        If <varname>pause_at_recovery_target</> is set yet no recovery target
+        is specified there will be a pause when we reach the end of WAL.
         This is intended to allow queries to be executed against the
         database to check if this recovery target is the most desirable
         point for recovery. The paused state can be resumed by using
@@ -275,7 +277,7 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
        </para>
        <para>
         This setting has no effect if <xref linkend="guc-hot-standby"> is not
-        enabled, or if no recovery target is set.
+        enabled, or if the database has not reached a consistent state.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 41800a4..dc72c00 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -186,6 +186,9 @@ static bool InArchiveRecovery = false;
 /* Was the last xlog file restored from archive, or local? */
 static bool restoredFromArchive = false;
 
+/* Were we explicitly requested to terminate recovery, by any means? */
+static bool triggered = false;
+
 /* options taken from recovery.conf for archive recovery */
 static char *recoveryRestoreCommand = NULL;
 static char *recoveryEndCommand = NULL;
@@ -6569,6 +6572,24 @@ StartupXLOG(void)
 				ereport(LOG,
 					 (errmsg("last completed transaction was at log time %s",
 							 timestamptz_to_str(xtime))));
+
+			/*
+			 * If we are in archive recovery and yet didn't have an explicit
+			 * recovery target then pause at the end of recovery, unless we
+			 * already paused above or we have been triggered to go live.
+			 * Pause only if users can connect to send a resume message
+			 */
+			if (recoveryPauseAtTarget && 
+				standbyState == STANDBY_SNAPSHOT_READY &&
+				recoveryTarget == RECOVERY_TARGET_UNSET &&
+				InArchiveRecovery &&
+				!triggered &&
+				!reachedStopPoint)
+			{
+				SetRecoveryPause(true);
+				recoveryPausesHere();
+			}
+
 			InRedo = false;
 		}
 		else
@@ -9836,7 +9857,7 @@ retry:
 					 * can from archive and pg_xlog before failover.
 					 */
 					if (CheckForStandbyTrigger())
-						goto triggered;
+						goto trigger_received;
 				}
 
 				/*
@@ -9960,7 +9981,7 @@ next_record_is_invalid:
 	else
 		return false;
 
-triggered:
+trigger_received:
 	if (readFile >= 0)
 		close(readFile);
 	readFile = -1;
@@ -10012,7 +10033,6 @@ static bool
 CheckForStandbyTrigger(void)
 {
 	struct stat stat_buf;
-	static bool triggered = false;
 
 	if (triggered)
 		return true;

#6 Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#5)
Re: Pause at end of recovery

On Wed, Dec 28, 2011 at 7:27 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> On Thu, Dec 22, 2011 at 6:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> > I can see a reason to do this now. I've written a patch and will commit
> > it on Friday. Nudge me if I don't.

> It's hard to write this so that it pauses in all the cases where it
> should and doesn't pause in the cases where it shouldn't.

> Basically, we can't get in the way of crash recovery, so the only way
> we can currently tell a crash recovery from an archive recovery is the
> presence of restore_command.

> If you don't have that and you haven't set a recovery target, it won't
> pause and there's nothing I can do, AFAICS.

> Please test this and review before commit.

What if the wrong recovery target is specified and archive recovery
unexpectedly reaches the end of the WAL files? Do we want to pause
recovery at the end in that case too? Otherwise, we lose the chance to
correct the recovery target and retry archive recovery.

One idea: would starting archive recovery with standby_mode=on meet your
needs? When archive recovery reaches the end of the WAL files, it pauses
there regardless of whether a recovery target is specified. If hot_standby
is enabled, you can check the contents and, if they look OK, finish
recovery with pg_ctl promote.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#7 Magnus Hagander
magnus@hagander.net
In reply to: Fujii Masao (#6)
Re: Pause at end of recovery

On Thu, Jan 26, 2012 at 08:42, Fujii Masao <masao.fujii@gmail.com> wrote:

> On Wed, Dec 28, 2011 at 7:27 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> > On Thu, Dec 22, 2011 at 6:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> > > I can see a reason to do this now. I've written a patch and will commit
> > > it on Friday. Nudge me if I don't.

> > It's hard to write this so that it pauses in all the cases where it
> > should and doesn't pause in the cases where it shouldn't.

> > Basically, we can't get in the way of crash recovery, so the only way
> > we can currently tell a crash recovery from an archive recovery is the
> > presence of restore_command.

> > If you don't have that and you haven't set a recovery target, it won't
> > pause and there's nothing I can do, AFAICS.

> > Please test this and review before commit.

> What if the wrong recovery target is specified and archive recovery
> unexpectedly reaches the end of the WAL files? Do we want to pause
> recovery at the end in that case too? Otherwise, we lose the chance to
> correct the recovery target and retry archive recovery.

Yes, we definitely want to pause then.

> One idea: would starting archive recovery with standby_mode=on meet your needs?

I haven't tested, but probably, yes. But in that case, why do we need
the pause_at_recovery_target *at all*? It's basically overloaded
functionality already, but I figured it was set up that way to keep
replication and recovery a bit separated?

> When archive recovery reaches the end of the WAL files, it pauses
> there regardless of whether a recovery target is specified. If hot_standby
> is enabled, you can check the contents and, if they look OK, finish
> recovery with pg_ctl promote.

That is pretty much the use case, yes. Or readjust the recovery target
(or, heck, add more files to the WAL archive because you set it up
wrong somehow) and continue the recovery further along the line.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#8 Fujii Masao
masao.fujii@gmail.com
In reply to: Magnus Hagander (#7)
Re: Pause at end of recovery

On Fri, Jan 27, 2012 at 12:50 AM, Magnus Hagander <magnus@hagander.net> wrote:

> > One idea: would starting archive recovery with standby_mode=on meet your needs?

> I haven't tested, but probably, yes. But in that case, why do we need
> the pause_at_recovery_target *at all*? It's basically overloaded
> functionality already, but I figured it was set up that way to keep
> replication and recovery a bit separated?

AFAIK, when standby_mode = on, archive recovery pauses only at the end of
the WAL files. If a recovery target is specified and archive recovery
reaches it, recovery doesn't pause. OTOH, when pause_at_recovery_target is
set, archive recovery pauses only at the target, not at the end of the WAL
files. Neither covers all the use cases on its own. That is why
pause_at_recovery_target was implemented.
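
The coverage gap described above can be written out as a tiny C model.
This is only a sketch of the behaviour as stated in this message (before
the proposed patch), not server code: the Settings struct, the StopReason
enum, recovery_pauses() and check_coverage() are hypothetical names
invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Where archive recovery came to a stop (hypothetical enum). */
typedef enum StopReason
{
    STOP_AT_TARGET,     /* the specified recovery target was reached */
    STOP_AT_END_OF_WAL  /* no more WAL files were available */
} StopReason;

/* The two settings compared in the message (hypothetical struct). */
typedef struct Settings
{
    bool standby_mode;
    bool pause_at_recovery_target;
} Settings;

/*
 * Behaviour as described above: pause_at_recovery_target pauses only
 * at the target, standby_mode pauses only at the end of the WAL files.
 */
bool
recovery_pauses(Settings cfg, StopReason why)
{
    switch (why)
    {
        case STOP_AT_TARGET:
            return cfg.pause_at_recovery_target;
        case STOP_AT_END_OF_WAL:
            return cfg.standby_mode;
    }
    return false;
}

/* Each setting alone leaves one of the two stop reasons uncovered. */
void
check_coverage(void)
{
    Settings standby_only = { .standby_mode = true };
    Settings pause_only = { .pause_at_recovery_target = true };

    assert(recovery_pauses(standby_only, STOP_AT_END_OF_WAL));
    assert(!recovery_pauses(standby_only, STOP_AT_TARGET));

    assert(recovery_pauses(pause_only, STOP_AT_TARGET));
    assert(!recovery_pauses(pause_only, STOP_AT_END_OF_WAL));
}
```

The last assertion is the case this thread started from: with only
pause_at_recovery_target set, running to the end of the archived WAL
does not pause.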

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center