[proposal] recovery_target "latest"
Hello, hackers!
I`d like to propose a new argument for recovery_target parameter, which
will stand to recovering until all available WAL segments are applied.
Current PostgreSQL recovery default behavior(when no recovery target is
provided) does exactly that, but there are several shortcomings:
- without explicit recovery target standing for default behavior,
recovery_target_action is not coming to action at the end of recovery
- with PG12 changes, the life of all backup tools became very hard,
because now recovery parameters can be set outside of single config
file(recovery.conf), so it is impossible to ensure, that default
recovery behavior, desired in some cases, will not be silently
overwritten by some recovery parameter forgotten by user.
Proposed path is very simple and solves the aforementioned problems by
introducing new argument "latest" for recovery_target parameter.
Old recovery behavior is still available if no recovery target is
provided. I`m not sure, whether it should it be left as it is now, or not.
Another open question is what to do with recovery_target_inclusive if
recovery_target = "latest" is used.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-recovery_target_latest.patchtext/x-patch; name=0001-recovery_target_latest.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0191ec84b1..49675c38da 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3366,21 +3366,19 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
<variablelist>
<varlistentry id="guc-recovery-target" xreflabel="recovery_target">
- <term><varname>recovery_target</varname><literal> = 'immediate'</literal>
+ <term><varname>recovery_target</varname> (<type>string</type>)
<indexterm>
<primary><varname>recovery_target</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
- This parameter specifies that recovery should end as soon as a
- consistent state is reached, i.e. as early as possible. When restoring
- from an online backup, this means the point where taking the backup
- ended.
- </para>
- <para>
- Technically, this is a string parameter, but <literal>'immediate'</literal>
- is currently the only allowed value.
+ This parameter determines how far recovery should proceed. The value
+ <literal>immediate</literal> means that recovery should end as soon as a consistent
+ state is reached, i.e. as early as possible. When restoring from an online
+ backup, this means the point where taking the backup ended.
+ The second possible value <literal>latest</literal> means that recovery
+ should proceed to the end of the available WAL log.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2e3cc51006..faab0b2b4c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6345,6 +6345,9 @@ StartupXLOG(void)
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
ereport(LOG,
(errmsg("starting point-in-time recovery to earliest consistent point")));
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ ereport(LOG,
+ (errmsg("starting point-in-time recovery until all available WAL is applied")));
else
ereport(LOG,
(errmsg("starting archive recovery")));
@@ -7257,6 +7260,13 @@ StartupXLOG(void)
* end of main redo apply loop
*/
+ /*
+ * If all avaialble WAL is replayed,
+ * then RECOVERY_TARGET_LATEST is satisfied
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ reachedStopPoint = true;
+
if (reachedStopPoint)
{
if (!reachedConsistency)
@@ -7459,6 +7469,8 @@ StartupXLOG(void)
recoveryStopName);
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
snprintf(reason, sizeof(reason), "reached consistency");
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ snprintf(reason, sizeof(reason), "applied all available WAL");
else
snprintf(reason, sizeof(reason), "no recovery target specified");
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 31a5ef0474..b652304683 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3503,7 +3503,10 @@ static struct config_string ConfigureNamesString[] =
{
{"recovery_target", PGC_POSTMASTER, WAL_RECOVERY_TARGET,
- gettext_noop("Set to \"immediate\" to end recovery as soon as a consistent state is reached."),
+ gettext_noop("Set to \"immediate\" to end recovery as "
+ "soon as a consistent state is reached or "
+ " to \"latest\" to end recovery after "
+ "replaying all available WAL in the archive."),
NULL
},
&recovery_target_string,
@@ -11527,9 +11530,10 @@ error_multiple_recovery_targets(void)
static bool
check_recovery_target(char **newval, void **extra, GucSource source)
{
- if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+ if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "latest") != 0 &&
+ strcmp(*newval, "") != 0)
{
- GUC_check_errdetail("The only allowed value is \"immediate\".");
+ GUC_check_errdetail("The only allowed values are \"immediate\" and \"latest\".");
return false;
}
return true;
@@ -11539,11 +11543,17 @@ static void
assign_recovery_target(const char *newval, void *extra)
{
if (recoveryTarget != RECOVERY_TARGET_UNSET &&
- recoveryTarget != RECOVERY_TARGET_IMMEDIATE)
+ recoveryTarget != RECOVERY_TARGET_IMMEDIATE &&
+ recoveryTarget != RECOVERY_TARGET_LATEST)
error_multiple_recovery_targets();
if (newval && strcmp(newval, "") != 0)
- recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ {
+ if (strcmp(newval, "immediate") == 0)
+ recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ else if (strcmp(newval, "latest") == 0)
+ recoveryTarget = RECOVERY_TARGET_LATEST;
+ }
else
recoveryTarget = RECOVERY_TARGET_UNSET;
}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..55e7e4bd2c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -84,7 +84,8 @@ typedef enum
RECOVERY_TARGET_TIME,
RECOVERY_TARGET_NAME,
RECOVERY_TARGET_LSN,
- RECOVERY_TARGET_IMMEDIATE
+ RECOVERY_TARGET_IMMEDIATE,
+ RECOVERY_TARGET_LATEST
} RecoveryTargetType;
/*
Hello.
At Mon, 4 Nov 2019 16:03:38 +0300, Grigory Smolkin <g.smolkin@postgrespro.ru> wrote in
Hello, hackers!
I`d like to propose a new argument for recovery_target parameter,
which will stand to recovering until all available WAL segments are
applied.Current PostgreSQL recovery default behavior(when no recovery target
is provided) does exactly that, but there are several shortcomings:
- without explicit recovery target standing for default behavior,
recovery_target_action is not coming to action at the end of recovery
- with PG12 changes, the life of all backup tools became very hard,
because now recovery parameters can be set outside of single config
file(recovery.conf), so it is impossible to ensure, that default
recovery behavior, desired in some cases, will not be silently
overwritten by some recovery parameter forgotten by user.Proposed path is very simple and solves the aforementioned problems by
introducing new argument "latest" for recovery_target parameter.
Does the tool remove or rename recovery.conf to cancel the settings?
And do you intend that the new option is used to override settings by
appending it at the end of postgresql.conf? If so, the commit
f2cbffc7a6 seems to break the assumption. PG12 rejects to start if it
finds two different kinds of recovery target settings.
Old recovery behavior is still available if no recovery target is
provided. I`m not sure, whether it should it be left as it is now, or
not.Another open question is what to do with recovery_target_inclusive if
recovery_target = "latest" is used.
Anyway inclusiveness doesn't affect to "immediate". If we had the
"latest" option, it would behave the same way.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Thank you for you interest in this topic!
On 11/5/19 10:41 AM, Kyotaro Horiguchi wrote:
Hello.
At Mon, 4 Nov 2019 16:03:38 +0300, Grigory Smolkin <g.smolkin@postgrespro.ru> wrote in
Hello, hackers!
I`d like to propose a new argument for recovery_target parameter,
which will stand to recovering until all available WAL segments are
applied.Current PostgreSQL recovery default behavior(when no recovery target
is provided) does exactly that, but there are several shortcomings:
- without explicit recovery target standing for default behavior,
recovery_target_action is not coming to action at the end of recovery
- with PG12 changes, the life of all backup tools became very hard,
because now recovery parameters can be set outside of single config
file(recovery.conf), so it is impossible to ensure, that default
recovery behavior, desired in some cases, will not be silently
overwritten by some recovery parameter forgotten by user.Proposed path is very simple and solves the aforementioned problems by
introducing new argument "latest" for recovery_target parameter.Does the tool remove or rename recovery.conf to cancel the settings?
And do you intend that the new option is used to override settings by
appending it at the end of postgresql.conf? If so, the commit
f2cbffc7a6 seems to break the assumption. PG12 rejects to start if it
finds two different kinds of recovery target settings.
Yes, previously it was possible to remove/rename old recovery.conf, but
not anymore.
My assumption is exactly that PG should reject to start because of
multiple recovery targets.
Failing to start is infinitely better that recovering to the wrong
recovery target.
Old recovery behavior is still available if no recovery target is
provided. I`m not sure, whether it should it be left as it is now, or
not.Another open question is what to do with recovery_target_inclusive if
recovery_target = "latest" is used.Anyway inclusiveness doesn't affect to "immediate". If we had the
"latest" option, it would behave the same way.
Right, thank you.
regards.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attached new version of a patch with TAP test.
On 11/5/19 11:51 AM, Grigory Smolkin wrote:
Thank you for you interest in this topic!
On 11/5/19 10:41 AM, Kyotaro Horiguchi wrote:
Hello.
At Mon, 4 Nov 2019 16:03:38 +0300, Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote inHello, hackers!
I`d like to propose a new argument for recovery_target parameter,
which will stand to recovering until all available WAL segments are
applied.Current PostgreSQL recovery default behavior(when no recovery target
is provided) does exactly that, but there are several shortcomings:
- without explicit recovery target standing for default behavior,
recovery_target_action is not coming to action at the end of recovery
- with PG12 changes, the life of all backup tools became very hard,
because now recovery parameters can be set outside of single config
file(recovery.conf), so it is impossible to ensure, that default
recovery behavior, desired in some cases, will not be silently
overwritten by some recovery parameter forgotten by user.Proposed path is very simple and solves the aforementioned problems by
introducing new argument "latest" for recovery_target parameter.Does the tool remove or rename recovery.conf to cancel the settings?
And do you intend that the new option is used to override settings by
appending it at the end of postgresql.conf? If so, the commit
f2cbffc7a6 seems to break the assumption. PG12 rejects to start if it
finds two different kinds of recovery target settings.Yes, previously it was possible to remove/rename old recovery.conf,
but not anymore.
My assumption is exactly that PG should reject to start because of
multiple recovery targets.
Failing to start is infinitely better that recovering to the wrong
recovery target.Old recovery behavior is still available if no recovery target is
provided. I`m not sure, whether it should it be left as it is now, or
not.Another open question is what to do with recovery_target_inclusive if
recovery_target = "latest" is used.Anyway inclusiveness doesn't affect to "immediate". If we had the
"latest" option, it would behave the same way.Right, thank you.
regards.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0002-recovery_target_latest.patchtext/x-patch; name=0002-recovery_target_latest.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 619ac8c50c..f0fa0b09c8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3316,21 +3316,19 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
<variablelist>
<varlistentry id="guc-recovery-target" xreflabel="recovery_target">
- <term><varname>recovery_target</varname><literal> = 'immediate'</literal>
+ <term><varname>recovery_target</varname> (<type>string</type>)
<indexterm>
<primary><varname>recovery_target</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
- This parameter specifies that recovery should end as soon as a
- consistent state is reached, i.e. as early as possible. When restoring
- from an online backup, this means the point where taking the backup
- ended.
- </para>
- <para>
- Technically, this is a string parameter, but <literal>'immediate'</literal>
- is currently the only allowed value.
+ This parameter determines how far recovery should proceed. The value
+ <literal>immediate</literal> means that recovery should end as soon as a consistent
+ state is reached, i.e. as early as possible. When restoring from an online
+ backup, this means the point where taking the backup ended.
+ The second possible value <literal>latest</literal> means that recovery
+ should proceed to the end of the available WAL log.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 77ad765989..707ebe68a1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6350,6 +6350,9 @@ StartupXLOG(void)
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
ereport(LOG,
(errmsg("starting point-in-time recovery to earliest consistent point")));
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ ereport(LOG,
+ (errmsg("starting point-in-time recovery until all available WAL is applied")));
else
ereport(LOG,
(errmsg("starting archive recovery")));
@@ -7261,6 +7264,13 @@ StartupXLOG(void)
* end of main redo apply loop
*/
+ /*
+ * If all avaialble WAL is replayed,
+ * then RECOVERY_TARGET_LATEST is satisfied
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ reachedStopPoint = true;
+
if (reachedStopPoint)
{
if (!reachedConsistency)
@@ -7462,6 +7472,8 @@ StartupXLOG(void)
recoveryStopName);
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
snprintf(reason, sizeof(reason), "reached consistency");
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ snprintf(reason, sizeof(reason), "applied all available WAL");
else
snprintf(reason, sizeof(reason), "no recovery target specified");
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f0ed326a1b..e7288a79b6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3503,7 +3503,10 @@ static struct config_string ConfigureNamesString[] =
{
{"recovery_target", PGC_POSTMASTER, WAL_RECOVERY_TARGET,
- gettext_noop("Set to \"immediate\" to end recovery as soon as a consistent state is reached."),
+ gettext_noop("Set to \"immediate\" to end recovery as "
+ "soon as a consistent state is reached or "
+ " to \"latest\" to end recovery after "
+ "replaying all available WAL in the archive."),
NULL
},
&recovery_target_string,
@@ -11498,9 +11501,10 @@ error_multiple_recovery_targets(void)
static bool
check_recovery_target(char **newval, void **extra, GucSource source)
{
- if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+ if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "latest") != 0 &&
+ strcmp(*newval, "") != 0)
{
- GUC_check_errdetail("The only allowed value is \"immediate\".");
+ GUC_check_errdetail("The only allowed values are \"immediate\" and \"latest\".");
return false;
}
return true;
@@ -11510,11 +11514,17 @@ static void
assign_recovery_target(const char *newval, void *extra)
{
if (recoveryTarget != RECOVERY_TARGET_UNSET &&
- recoveryTarget != RECOVERY_TARGET_IMMEDIATE)
+ recoveryTarget != RECOVERY_TARGET_IMMEDIATE &&
+ recoveryTarget != RECOVERY_TARGET_LATEST)
error_multiple_recovery_targets();
if (newval && strcmp(newval, "") != 0)
- recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ {
+ if (strcmp(newval, "immediate") == 0)
+ recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ else if (strcmp(newval, "latest") == 0)
+ recoveryTarget = RECOVERY_TARGET_LATEST;
+ }
else
recoveryTarget = RECOVERY_TARGET_UNSET;
}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..55e7e4bd2c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -84,7 +84,8 @@ typedef enum
RECOVERY_TARGET_TIME,
RECOVERY_TARGET_NAME,
RECOVERY_TARGET_LSN,
- RECOVERY_TARGET_IMMEDIATE
+ RECOVERY_TARGET_IMMEDIATE,
+ RECOVERY_TARGET_LATEST
} RecoveryTargetType;
/*
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index d8fbd50011..01dab1aebf 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 8;
+use Test::More tests => 9;
# Create and test a standby from given backup, with a certain recovery target.
# Choose $until_lsn later than the transaction commit that causes the row
@@ -77,7 +77,7 @@ my $lsn3 =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
my $recovery_time = $node_master->safe_psql('postgres', "SELECT now()");
-# Even more data, this time with a recovery target name
+# More data, with a recovery target name
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(3001,4000))");
my $recovery_name = "my_target";
@@ -86,14 +86,17 @@ my $lsn4 =
$node_master->safe_psql('postgres',
"SELECT pg_create_restore_point('$recovery_name');");
-# And now for a recovery target LSN
+# Even more data, this time with a recovery target LSN
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(4001,5000))");
my $lsn5 = my $recovery_lsn =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+# And now for a recovery target 'latest'
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(5001,6000))");
+my $lsn6 =
+ $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
# Force archiving of WAL file
$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
@@ -114,6 +117,9 @@ test_recovery_standby('name', 'standby_4', $node_master, \@recovery_params,
@recovery_params = ("recovery_target_lsn = '$recovery_lsn'");
test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"5000", $lsn5);
+@recovery_params = ("recovery_target = 'latest'");
+test_recovery_standby('latest target', 'standby_6', $node_master, \@recovery_params,
+ "6000", $lsn6);
# Multiple targets
#
@@ -126,9 +132,9 @@ test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"recovery_target_name = ''",
"recovery_target_time = '$recovery_time'");
test_recovery_standby('multiple overriding settings',
- 'standby_6', $node_master, \@recovery_params, "3000", $lsn3);
+ 'standby_7', $node_master, \@recovery_params, "3000", $lsn3);
-my $node_standby = get_new_node('standby_7');
+my $node_standby = get_new_node('standby_8');
$node_standby->init_from_backup($node_master, 'my_backup',
has_restoring => 1);
$node_standby->append_conf(
This seems to also be related to this discussion:
</messages/by-id/993736dd3f1713ec1f63fc3b653839f5@lako.no>
I like this idea.
I don't like the name "latest". What does that mean? Other
documentation talks about the "end of the archive". What does that
mean? It means until restore_command errors. Let's think of a name
that reflects that better. Maybe "all_archive" or something like that.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/6/19 10:39 AM, Peter Eisentraut wrote:
This seems to also be related to this discussion:
</messages/by-id/993736dd3f1713ec1f63fc3b653839f5@lako.no>
Yes, in a way. Strengthening current lax recovery behavior is a very
good idea.
I like this idea.
I don't like the name "latest". What does that mean? Other
documentation talks about the "end of the archive". What does that
mean? It means until restore_command errors. Let's think of a name
that reflects that better. Maybe "all_archive" or something like that.
As with "immediate", "latest" reflects the latest possible state this
PostgreSQL instance can achieve when using PITR. I think it is simple
and easy to understand for an end user, which sees PITR as a way to go
from one state to another. In my experience, at least, which is, of
course, subjective.
But if we want an argument name to be technically accurate, then, I
think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL" is
a way to go.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin <g.smolkin@postgrespro.ru> wrote:
On 11/6/19 10:39 AM, Peter Eisentraut wrote:
This seems to also be related to this discussion:
</messages/by-id/993736dd3f1713ec1f63fc3b653839f5@lako.no>Yes, in a way. Strengthening current lax recovery behavior is a very
good idea.I like this idea.
I don't like the name "latest". What does that mean? Other
documentation talks about the "end of the archive". What does that
mean? It means until restore_command errors. Let's think of a name
that reflects that better. Maybe "all_archive" or something like that.As with "immediate", "latest" reflects the latest possible state this
PostgreSQL instance can achieve when using PITR. I think it is simple
and easy to understand for an end user, which sees PITR as a way to go
from one state to another. In my experience, at least, which is, of
course, subjective.But if we want an argument name to be technically accurate, then, I
think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL" is
a way to go.
What happens if this parameter is set to latest in the standby mode?
Or the combination of those settings should be prohibited?
Regards,
--
Fujii Masao
On 11/6/19 12:56 PM, Fujii Masao wrote:
On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin <g.smolkin@postgrespro.ru> wrote:
On 11/6/19 10:39 AM, Peter Eisentraut wrote:
This seems to also be related to this discussion:
</messages/by-id/993736dd3f1713ec1f63fc3b653839f5@lako.no>Yes, in a way. Strengthening current lax recovery behavior is a very
good idea.I like this idea.
I don't like the name "latest". What does that mean? Other
documentation talks about the "end of the archive". What does that
mean? It means until restore_command errors. Let's think of a name
that reflects that better. Maybe "all_archive" or something like that.As with "immediate", "latest" reflects the latest possible state this
PostgreSQL instance can achieve when using PITR. I think it is simple
and easy to understand for an end user, which sees PITR as a way to go
from one state to another. In my experience, at least, which is, of
course, subjective.But if we want an argument name to be technically accurate, then, I
think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL" is
a way to go.What happens if this parameter is set to latest in the standby mode?
Or the combination of those settings should be prohibited?
Currently it will behave just like regular standby, so I think, to avoid
confusion and keep things simple, the combination of them should be
prohibited.
Thank you for pointing this out, I will work on it.
The other way around, as I see it, is to define RECOVERY_TARGET_LATEST
as something more complex, for example, the latest possible endptr in
latest WAL segment. But it is tricky, because WAL archive may keeps
growing as recovery is progressing or, in case of standby, master keeps
sending more and more WAL.
Regards,
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 11/6/19 1:55 PM, Grigory Smolkin wrote:
On 11/6/19 12:56 PM, Fujii Masao wrote:
On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote:On 11/6/19 10:39 AM, Peter Eisentraut wrote:
This seems to also be related to this discussion:
</messages/by-id/993736dd3f1713ec1f63fc3b653839f5@lako.no>Yes, in a way. Strengthening current lax recovery behavior is a very
good idea.I like this idea.
I don't like the name "latest". What does that mean? Other
documentation talks about the "end of the archive". What does that
mean? It means until restore_command errors. Let's think of a name
that reflects that better. Maybe "all_archive" or something like
that.As with "immediate", "latest" reflects the latest possible state this
PostgreSQL instance can achieve when using PITR. I think it is simple
and easy to understand for an end user, which sees PITR as a way to go
from one state to another. In my experience, at least, which is, of
course, subjective.But if we want an argument name to be technically accurate, then, I
think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL" is
a way to go.What happens if this parameter is set to latest in the standby mode?
Or the combination of those settings should be prohibited?Currently it will behave just like regular standby, so I think, to
avoid confusion and keep things simple, the combination of them should
be prohibited.
Thank you for pointing this out, I will work on it.
Attached new patch revision, now it is impossible to use recovery_target
'latest' in standby mode.
TAP test is updated to reflect this behavior.
The other way around, as I see it, is to define RECOVERY_TARGET_LATEST
as something more complex, for example, the latest possible endptr in
latest WAL segment. But it is tricky, because WAL archive may keeps
growing as recovery is progressing or, in case of standby, master
keeps sending more and more WAL.Regards,
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0003-recovery_target_latest.patchtext/x-patch; name=0003-recovery_target_latest.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 619ac8c50c..f0fa0b09c8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3316,21 +3316,19 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
<variablelist>
<varlistentry id="guc-recovery-target" xreflabel="recovery_target">
- <term><varname>recovery_target</varname><literal> = 'immediate'</literal>
+ <term><varname>recovery_target</varname> (<type>string</type>)
<indexterm>
<primary><varname>recovery_target</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
- This parameter specifies that recovery should end as soon as a
- consistent state is reached, i.e. as early as possible. When restoring
- from an online backup, this means the point where taking the backup
- ended.
- </para>
- <para>
- Technically, this is a string parameter, but <literal>'immediate'</literal>
- is currently the only allowed value.
+ This parameter determines how far recovery should proceed. The value
+ <literal>immediate</literal> means that recovery should end as soon as a consistent
+ state is reached, i.e. as early as possible. When restoring from an online
+ backup, this means the point where taking the backup ended.
+ The second possible value <literal>latest</literal> means that recovery
+ should proceed to the end of the available WAL log.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 77ad765989..6001b0aa34 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5401,6 +5401,15 @@ validateRecoveryParameters(void)
errmsg("must specify restore_command when standby mode is not enabled")));
}
+ /* Combining standby mode with recovery to the end of WAL is
+ * impossible, because there is no end of WAL in this case.
+ */
+ if (StandbyModeRequested && recoveryTarget == RECOVERY_TARGET_LATEST)
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("recovery to the end of WAL is not possible because standby mode is enabled")));
+
+
/*
* Override any inconsistent requests. Note that this is a change of
* behaviour in 9.5; prior to this we simply ignored a request to pause if
@@ -6350,6 +6359,9 @@ StartupXLOG(void)
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
ereport(LOG,
(errmsg("starting point-in-time recovery to earliest consistent point")));
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ ereport(LOG,
+ (errmsg("starting point-in-time recovery until all available WAL is applied")));
else
ereport(LOG,
(errmsg("starting archive recovery")));
@@ -7261,6 +7273,13 @@ StartupXLOG(void)
* end of main redo apply loop
*/
+ /*
+ * If all avaialble WAL is replayed,
+ * then RECOVERY_TARGET_LATEST is satisfied
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ reachedStopPoint = true;
+
if (reachedStopPoint)
{
if (!reachedConsistency)
@@ -7462,6 +7481,8 @@ StartupXLOG(void)
recoveryStopName);
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
snprintf(reason, sizeof(reason), "reached consistency");
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ snprintf(reason, sizeof(reason), "applied all available WAL");
else
snprintf(reason, sizeof(reason), "no recovery target specified");
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f0ed326a1b..e7288a79b6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3503,7 +3503,10 @@ static struct config_string ConfigureNamesString[] =
{
{"recovery_target", PGC_POSTMASTER, WAL_RECOVERY_TARGET,
- gettext_noop("Set to \"immediate\" to end recovery as soon as a consistent state is reached."),
+ gettext_noop("Set to \"immediate\" to end recovery as "
+ "soon as a consistent state is reached or "
+ " to \"latest\" to end recovery after "
+ "replaying all available WAL in the archive."),
NULL
},
&recovery_target_string,
@@ -11498,9 +11501,10 @@ error_multiple_recovery_targets(void)
static bool
check_recovery_target(char **newval, void **extra, GucSource source)
{
- if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+ if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "latest") != 0 &&
+ strcmp(*newval, "") != 0)
{
- GUC_check_errdetail("The only allowed value is \"immediate\".");
+ GUC_check_errdetail("The only allowed values are \"immediate\" and \"latest\".");
return false;
}
return true;
@@ -11510,11 +11514,17 @@ static void
assign_recovery_target(const char *newval, void *extra)
{
if (recoveryTarget != RECOVERY_TARGET_UNSET &&
- recoveryTarget != RECOVERY_TARGET_IMMEDIATE)
+ recoveryTarget != RECOVERY_TARGET_IMMEDIATE &&
+ recoveryTarget != RECOVERY_TARGET_LATEST)
error_multiple_recovery_targets();
if (newval && strcmp(newval, "") != 0)
- recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ {
+ if (strcmp(newval, "immediate") == 0)
+ recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ else if (strcmp(newval, "latest") == 0)
+ recoveryTarget = RECOVERY_TARGET_LATEST;
+ }
else
recoveryTarget = RECOVERY_TARGET_UNSET;
}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..55e7e4bd2c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -84,7 +84,8 @@ typedef enum
RECOVERY_TARGET_TIME,
RECOVERY_TARGET_NAME,
RECOVERY_TARGET_LSN,
- RECOVERY_TARGET_IMMEDIATE
+ RECOVERY_TARGET_IMMEDIATE,
+ RECOVERY_TARGET_LATEST
} RecoveryTargetType;
/*
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index d8fbd50011..05f5f9889c 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 8;
+use Test::More tests => 9;
# Create and test a standby from given backup, with a certain recovery target.
# Choose $until_lsn later than the transaction commit that causes the row
@@ -26,6 +26,23 @@ sub test_recovery_standby
$node_standby->append_conf('postgresql.conf', qq($param_item));
}
+ # It is expected for recovery target latest to fail on standby
+ if ($test_name eq 'latest target')
+ {
+ $node_standby->start(fail_ok => 1);
+ $node_standby->clean_node;
+
+ $node_standby = get_new_node($node_name);
+ $node_standby->init_from_backup($node_master, 'my_backup',
+ has_restoring => 1);
+ unlink($node_standby->data_dir . '/standby.signal');
+
+ foreach my $param_item (@$recovery_params)
+ {
+ $node_standby->append_conf('postgresql.conf', qq($param_item));
+ }
+ }
+
$node_standby->start;
# Wait until standby has replayed enough data
@@ -77,7 +94,7 @@ my $lsn3 =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
my $recovery_time = $node_master->safe_psql('postgres', "SELECT now()");
-# Even more data, this time with a recovery target name
+# More data, with a recovery target name
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(3001,4000))");
my $recovery_name = "my_target";
@@ -86,14 +103,17 @@ my $lsn4 =
$node_master->safe_psql('postgres',
"SELECT pg_create_restore_point('$recovery_name');");
-# And now for a recovery target LSN
+# Even more data, this time with a recovery target LSN
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(4001,5000))");
my $lsn5 = my $recovery_lsn =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+# And now for a recovery target 'latest'
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(5001,6000))");
+my $lsn6 =
+ $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
# Force archiving of WAL file
$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
@@ -114,6 +134,9 @@ test_recovery_standby('name', 'standby_4', $node_master, \@recovery_params,
@recovery_params = ("recovery_target_lsn = '$recovery_lsn'");
test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"5000", $lsn5);
+@recovery_params = ("recovery_target = 'latest'");
+test_recovery_standby('latest target', 'standby_6', $node_master, \@recovery_params,
+ "6000", $lsn6);
# Multiple targets
#
@@ -126,9 +149,9 @@ test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"recovery_target_name = ''",
"recovery_target_time = '$recovery_time'");
test_recovery_standby('multiple overriding settings',
- 'standby_6', $node_master, \@recovery_params, "3000", $lsn3);
+ 'standby_7', $node_master, \@recovery_params, "3000", $lsn3);
-my $node_standby = get_new_node('standby_7');
+my $node_standby = get_new_node('standby_8');
$node_standby->init_from_backup($node_master, 'my_backup',
has_restoring => 1);
$node_standby->append_conf(
At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin <g.smolkin@postgrespro.ru> wrote in
On 11/6/19 1:55 PM, Grigory Smolkin wrote:
On 11/6/19 12:56 PM, Fujii Masao wrote:
On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote:On 11/6/19 10:39 AM, Peter Eisentraut wrote:
This seems to also be related to this discussion:
</messages/by-id/993736dd3f1713ec1f63fc3b653839f5@lako.no>Yes, in a way. Strengthening current lax recovery behavior is a very
good idea.I like this idea.
I don't like the name "latest". What does that mean? Other
documentation talks about the "end of the archive". What does that
mean? It means until restore_command errors. Let's think of a name
that reflects that better. Maybe "all_archive" or something like
that.As with "immediate", "latest" reflects the latest possible state this
PostgreSQL instance can achieve when using PITR. I think it is simple
and easy to understand for an end user, which sees PITR as a way to go
from one state to another. In my experience, at least, which is, of
course, subjective.But if we want an argument name to be technically accurate, then, I
think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL"
is
a way to go.What happens if this parameter is set to latest in the standby mode?
Or the combination of those settings should be prohibited?Currently it will behave just like regular standby, so I think, to
avoid confusion and keep things simple, the combination of them should
be prohibited.
Thank you for pointing this out, I will work on it.Attached new patch revision, now it is impossible to use
recovery_target 'latest' in standby mode.
TAP test is updated to reflect this behavior.
In the first place, latest (or anything it could be named as) is
defined as the explit label for the default behavior. Thus the latest
should work as if nothing is set to recovery_target* following the
definition. That might seems somewhat strange but I think at least it
is harmless.
recovery_target=immediate + r_t_action=shutdown for a standby works as
commanded. Do we need to inhibit that, too?
The other way around, as I see it, is to define RECOVERY_TARGET_LATEST
as something more complex, for example, the latest possible endptr in
latest WAL segment. But it is tricky, because WAL archive may keeps
growing as recovery is progressing or, in case of standby, master
keeps sending more and more WAL.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:
At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin <g.smolkin@postgrespro.ru> wrote in
On 11/6/19 1:55 PM, Grigory Smolkin wrote:
On 11/6/19 12:56 PM, Fujii Masao wrote:
On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote:On 11/6/19 10:39 AM, Peter Eisentraut wrote:
This seems to also be related to this discussion:
</messages/by-id/993736dd3f1713ec1f63fc3b653839f5@lako.no>Yes, in a way. Strengthening current lax recovery behavior is a very
good idea.I like this idea.
I don't like the name "latest". What does that mean? Other
documentation talks about the "end of the archive". What does that
mean? It means until restore_command errors. Let's think of a name
that reflects that better. Maybe "all_archive" or something like
that.As with "immediate", "latest" reflects the latest possible state this
PostgreSQL instance can achieve when using PITR. I think it is simple
and easy to understand for an end user, which sees PITR as a way to go
from one state to another. In my experience, at least, which is, of
course, subjective.But if we want an argument name to be technically accurate, then, I
think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL"
is
a way to go.What happens if this parameter is set to latest in the standby mode?
Or the combination of those settings should be prohibited?Currently it will behave just like regular standby, so I think, to
avoid confusion and keep things simple, the combination of them should
be prohibited.
Thank you for pointing this out, I will work on it.Attached new patch revision, now it is impossible to use
recovery_target 'latest' in standby mode.
TAP test is updated to reflect this behavior.In the first place, latest (or anything it could be named as) is
defined as the explit label for the default behavior. Thus the latest
should work as if nothing is set to recovery_target* following the
definition. That might seems somewhat strange but I think at least it
is harmless.
Well, it was more about getting default behavior by using some explicit
recovery_target, not the other way around. Because it will break some
3rd party backup and replication applications, that may rely on old
behavior of ignoring recovery_target_action when no recovery_target is
provided.
But you think that it is worth pursuing, I can do that.
recovery_target=immediate + r_t_action=shutdown for a standby works as
commanded. Do we need to inhibit that, too?
Why something, that work as expected, should be inhibited?
The other way around, as I see it, is to define RECOVERY_TARGET_LATEST
as something more complex, for example, the latest possible endptr in
latest WAL segment. But it is tricky, because WAL archive may keeps
growing as recovery is progressing or, in case of standby, master
keeps sending more and more WAL.regards.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
At Thu, 7 Nov 2019 12:22:28 +0300, Grigory Smolkin <g.smolkin@postgrespro.ru> wrote in
On 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:
At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote inOn 11/6/19 1:55 PM, Grigory Smolkin wrote:
On 11/6/19 12:56 PM, Fujii Masao wrote:
What happens if this parameter is set to latest in the standby mode?
Or the combination of those settings should be prohibited?Currently it will behave just like regular standby, so I think, to
avoid confusion and keep things simple, the combination of them should
be prohibited.
Thank you for pointing this out, I will work on it.Attached new patch revision, now it is impossible to use
recovery_target 'latest' in standby mode.
TAP test is updated to reflect this behavior.In the first place, latest (or anything it could be named as) is
defined as the explit label for the default behavior. Thus the latest
should work as if nothing is set to recovery_target* following the
definition. That might seems somewhat strange but I think at least it
is harmless.Well, it was more about getting default behavior by using some
explicit recovery_target, not the other way around. Because it will
break some 3rd party backup and replication applications, that may
rely on old behavior of ignoring recovery_target_action when no
recovery_target is provided.
But you think that it is worth pursuing, I can do that.
Ah. Sorry for the misleading statement. What I had in my mind was
somewhat the mixture of them. I thought that recovery_target =''
behaves the same way as now, r_t_action is ignored. And 'latest' just
makes recovery_target_action work as the current non-empty
recovery_target's does. But I'm not confident that it is a good
design.
recovery_target=immediate + r_t_action=shutdown for a standby works as
commanded. Do we need to inhibit that, too?Why something, that work as expected, should be inhibited?
To make sure, I don't think we should do that. I meant by the above
that standby mode is already accepting recovery_target_action so
inhibiting that only for 'latest' is not orthogonal and could be more
confusing for users, and complicatig the code. So my opinion is we
shouldn't inhibit 'latest' unless r_t_action harms.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 11/7/19 12:56 PM, Kyotaro Horiguchi wrote:
At Thu, 7 Nov 2019 12:22:28 +0300, Grigory Smolkin <g.smolkin@postgrespro.ru> wrote in
On 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:
At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote inOn 11/6/19 1:55 PM, Grigory Smolkin wrote:
On 11/6/19 12:56 PM, Fujii Masao wrote:
What happens if this parameter is set to latest in the standby mode?
Or the combination of those settings should be prohibited?Currently it will behave just like regular standby, so I think, to
avoid confusion and keep things simple, the combination of them should
be prohibited.
Thank you for pointing this out, I will work on it.Attached new patch revision, now it is impossible to use
recovery_target 'latest' in standby mode.
TAP test is updated to reflect this behavior.In the first place, latest (or anything it could be named as) is
defined as the explit label for the default behavior. Thus the latest
should work as if nothing is set to recovery_target* following the
definition. That might seems somewhat strange but I think at least it
is harmless.Well, it was more about getting default behavior by using some
explicit recovery_target, not the other way around. Because it will
break some 3rd party backup and replication applications, that may
rely on old behavior of ignoring recovery_target_action when no
recovery_target is provided.
But you think that it is worth pursuing, I can do that.Ah. Sorry for the misleading statement. What I had in my mind was
somewhat the mixture of them. I thought that recovery_target =''
behaves the same way as now, r_t_action is ignored. And 'latest' just
makes recovery_target_action work as the current non-empty
recovery_target's does. But I'm not confident that it is a good
design.recovery_target=immediate + r_t_action=shutdown for a standby works as
commanded. Do we need to inhibit that, too?Why something, that work as expected, should be inhibited?
To make sure, I don't think we should do that. I meant by the above
that standby mode is already accepting recovery_target_action so
inhibiting that only for 'latest' is not orthogonal and could be more
confusing for users, and complicatig the code. So my opinion is we
shouldn't inhibit 'latest' unless r_t_action harms.
I gave it some thought and now think that prohibiting recovery_target
'latest' and standby was a bad idea.
All recovery_targets follow the same pattern of usage, so
recovery_target "latest" also must be capable of working in standby mode.
All recovery_targets have a clear deterministic 'target' where recovery
should end.
In case of recovery_target "latest" this target is the end of available
WAL, therefore the end of available WAL must be more clearly defined.
I will work on it.
Thank you for a feedback.
regards.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 11/7/19 4:36 PM, Grigory Smolkin wrote:
On 11/7/19 12:56 PM, Kyotaro Horiguchi wrote:
At Thu, 7 Nov 2019 12:22:28 +0300, Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote inOn 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:
At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote inOn 11/6/19 1:55 PM, Grigory Smolkin wrote:
On 11/6/19 12:56 PM, Fujii Masao wrote:
What happens if this parameter is set to latest in the standby
mode?
Or the combination of those settings should be prohibited?Currently it will behave just like regular standby, so I think, to
avoid confusion and keep things simple, the combination of them
should
be prohibited.
Thank you for pointing this out, I will work on it.Attached new patch revision, now it is impossible to use
recovery_target 'latest' in standby mode.
TAP test is updated to reflect this behavior.In the first place, latest (or anything it could be named as) is
defined as the explit label for the default behavior. Thus the latest
should work as if nothing is set to recovery_target* following the
definition. That might seems somewhat strange but I think at least it
is harmless.Well, it was more about getting default behavior by using some
explicit recovery_target, not the other way around. Because it will
break some 3rd party backup and replication applications, that may
rely on old behavior of ignoring recovery_target_action when no
recovery_target is provided.
But you think that it is worth pursuing, I can do that.Ah. Sorry for the misleading statement. What I had in my mind was
somewhat the mixture of them. I thought that recovery_target =''
behaves the same way as now, r_t_action is ignored. And 'latest' just
makes recovery_target_action work as the current non-empty
recovery_target's does. But I'm not confident that it is a good
design.recovery_target=immediate + r_t_action=shutdown for a standby works as
commanded. Do we need to inhibit that, too?Why something, that work as expected, should be inhibited?
To make sure, I don't think we should do that. I meant by the above
that standby mode is already accepting recovery_target_action so
inhibiting that only for 'latest' is not orthogonal and could be more
confusing for users, and complicatig the code. So my opinion is we
shouldn't inhibit 'latest' unless r_t_action harms.I gave it some thought and now think that prohibiting recovery_target
'latest' and standby was a bad idea.
All recovery_targets follow the same pattern of usage, so
recovery_target "latest" also must be capable of working in standby mode.
All recovery_targets have a clear deterministic 'target' where
recovery should end.
In case of recovery_target "latest" this target is the end of
available WAL, therefore the end of available WAL must be more clearly
defined.
I will work on it.Thank you for a feedback.
Attached new patch revision, now end of available WAL is defined as the
fact of missing required WAL.
In case of standby, the end of WAL is defined as 2 consecutive switches
of WAL source, that didn`t provided requested record.
In case of streaming standby, each switch of WAL source is forced after
3 failed attempts to get new data from walreceiver.
All constants are taken off the top of my head and serves as proof of
concept.
TAP test is updated.
regards.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0004-recovery_target_latest.patchtext/x-patch; name=0004-recovery_target_latest.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 619ac8c50c..f0fa0b09c8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3316,21 +3316,19 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
<variablelist>
<varlistentry id="guc-recovery-target" xreflabel="recovery_target">
- <term><varname>recovery_target</varname><literal> = 'immediate'</literal>
+ <term><varname>recovery_target</varname> (<type>string</type>)
<indexterm>
<primary><varname>recovery_target</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
- This parameter specifies that recovery should end as soon as a
- consistent state is reached, i.e. as early as possible. When restoring
- from an online backup, this means the point where taking the backup
- ended.
- </para>
- <para>
- Technically, this is a string parameter, but <literal>'immediate'</literal>
- is currently the only allowed value.
+ This parameter determines how far recovery should proceed. The value
+ <literal>immediate</literal> means that recovery should end as soon as a consistent
+ state is reached, i.e. as early as possible. When restoring from an online
+ backup, this means the point where taking the backup ended.
+ The second possible value <literal>latest</literal> means that recovery
+ should proceed to the end of the available WAL log.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 77ad765989..74b1a6696d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -843,6 +843,13 @@ static bool updateMinRecoveryPoint = true;
*/
bool reachedConsistency = false;
+/*
+ * Have we reached an end of available WAL?
+ * Only ever set when RECOVERY_TARGET_LATEST is used.
+ */
+bool reachedEndOfWAL = false;
+int standby_src_switch_count = 0;
+
static bool InRedo = false;
/* Have we launched bgwriter during recovery? */
@@ -4309,6 +4316,14 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
if (record)
{
+ /*
+ * When recovering to the end of WAL in standby mode, reset
+ * the counter of WAL source switches after page
+ * is successfully readed.
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST && StandbyMode)
+ standby_src_switch_count = 0;
+
/* Great, got a record */
return record;
}
@@ -4372,7 +4387,7 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
}
/* In standby mode, loop back to retry. Otherwise, give up. */
- if (StandbyMode && !CheckForStandbyTrigger())
+ if (StandbyMode && !CheckForStandbyTrigger() && !reachedEndOfWAL)
continue;
else
return NULL;
@@ -6350,6 +6365,9 @@ StartupXLOG(void)
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
ereport(LOG,
(errmsg("starting point-in-time recovery to earliest consistent point")));
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ ereport(LOG,
+ (errmsg("starting point-in-time recovery until all available WAL is applied")));
else
ereport(LOG,
(errmsg("starting archive recovery")));
@@ -7261,6 +7279,18 @@ StartupXLOG(void)
* end of main redo apply loop
*/
+ /*
+ * If all avaialble WAL is replayed,
+ * then RECOVERY_TARGET_LATEST is satisfied
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST && reachedEndOfWAL)
+ {
+ ereport(LOG,
+ (errmsg("recovery stopping after reaching the end of available WAL")));
+
+ reachedStopPoint = true;
+ }
+
if (reachedStopPoint)
{
if (!reachedConsistency)
@@ -7462,6 +7492,8 @@ StartupXLOG(void)
recoveryStopName);
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
snprintf(reason, sizeof(reason), "reached consistency");
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ snprintf(reason, sizeof(reason), "applied all available WAL");
else
snprintf(reason, sizeof(reason), "no recovery target specified");
@@ -11737,6 +11769,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
static TimestampTz last_fail_time = 0;
TimestampTz now;
bool streaming_reply_sent = false;
+ int standby_failed_read_attempts = 0;
/*-------
* Standby mode is implemented by a state machine:
@@ -11796,7 +11829,10 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* and pg_wal.
*/
if (!StandbyMode)
+ {
+ reachedEndOfWAL = true;
return false;
+ }
/*
* If primary_conninfo is set, launch walreceiver to try
@@ -11934,9 +11970,27 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
}
if (currentSource != oldSource)
+ {
+ /*
+ * If the amount of WAL source switches exceeds the limit,
+ * when recovering to end of WAL in standby mode,
+ * then declate the current state to be the end of WAL.
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ {
+ standby_src_switch_count++;
+
+ if (standby_src_switch_count >= 2)
+ {
+ reachedEndOfWAL = true;
+ return false;
+ }
+ }
+
elog(DEBUG2, "switched WAL source from %s to %s after %s",
xlogSourceNames[oldSource], xlogSourceNames[currentSource],
lastSourceFailed ? "failure" : "success");
+ }
/*
* We've now handled possible failure. Try to read from the chosen
@@ -11968,9 +12022,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
if (readFile >= 0)
return true; /* success! */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ standby_failed_read_attempts++;
+
/*
* Nope, not found in archive or pg_wal.
*/
+
lastSourceFailed = true;
break;
@@ -12047,6 +12105,11 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
}
break;
}
+ else
+ {
+ if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ standby_failed_read_attempts++;
+ }
/*
* Data not here yet. Check for trigger, then wait for
@@ -12102,6 +12165,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* process.
*/
HandleStartupProcInterrupts();
+
+ /*
+ * If the amount of failed attempts to get data from current
+ * WAL source exceeds the limit, when recovering to the end
+ * of WAL in standby mode, then force the WAL source switch.
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST &&
+ standby_failed_read_attempts >= 3)
+ {
+ lastSourceFailed = true;
+ break;
+ }
}
return false; /* not reached */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f0ed326a1b..e7288a79b6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3503,7 +3503,10 @@ static struct config_string ConfigureNamesString[] =
{
{"recovery_target", PGC_POSTMASTER, WAL_RECOVERY_TARGET,
- gettext_noop("Set to \"immediate\" to end recovery as soon as a consistent state is reached."),
+ gettext_noop("Set to \"immediate\" to end recovery as "
+ "soon as a consistent state is reached or "
+ " to \"latest\" to end recovery after "
+ "replaying all available WAL in the archive."),
NULL
},
&recovery_target_string,
@@ -11498,9 +11501,10 @@ error_multiple_recovery_targets(void)
static bool
check_recovery_target(char **newval, void **extra, GucSource source)
{
- if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+ if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "latest") != 0 &&
+ strcmp(*newval, "") != 0)
{
- GUC_check_errdetail("The only allowed value is \"immediate\".");
+ GUC_check_errdetail("The only allowed values are \"immediate\" and \"latest\".");
return false;
}
return true;
@@ -11510,11 +11514,17 @@ static void
assign_recovery_target(const char *newval, void *extra)
{
if (recoveryTarget != RECOVERY_TARGET_UNSET &&
- recoveryTarget != RECOVERY_TARGET_IMMEDIATE)
+ recoveryTarget != RECOVERY_TARGET_IMMEDIATE &&
+ recoveryTarget != RECOVERY_TARGET_LATEST)
error_multiple_recovery_targets();
if (newval && strcmp(newval, "") != 0)
- recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ {
+ if (strcmp(newval, "immediate") == 0)
+ recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ else if (strcmp(newval, "latest") == 0)
+ recoveryTarget = RECOVERY_TARGET_LATEST;
+ }
else
recoveryTarget = RECOVERY_TARGET_UNSET;
}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..55e7e4bd2c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -84,7 +84,8 @@ typedef enum
RECOVERY_TARGET_TIME,
RECOVERY_TARGET_NAME,
RECOVERY_TARGET_LSN,
- RECOVERY_TARGET_IMMEDIATE
+ RECOVERY_TARGET_IMMEDIATE,
+ RECOVERY_TARGET_LATEST
} RecoveryTargetType;
/*
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index d8fbd50011..01dab1aebf 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 8;
+use Test::More tests => 9;
# Create and test a standby from given backup, with a certain recovery target.
# Choose $until_lsn later than the transaction commit that causes the row
@@ -77,7 +77,7 @@ my $lsn3 =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
my $recovery_time = $node_master->safe_psql('postgres', "SELECT now()");
-# Even more data, this time with a recovery target name
+# More data, with a recovery target name
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(3001,4000))");
my $recovery_name = "my_target";
@@ -86,14 +86,17 @@ my $lsn4 =
$node_master->safe_psql('postgres',
"SELECT pg_create_restore_point('$recovery_name');");
-# And now for a recovery target LSN
+# Even more data, this time with a recovery target LSN
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(4001,5000))");
my $lsn5 = my $recovery_lsn =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+# And now for a recovery target 'latest'
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(5001,6000))");
+my $lsn6 =
+ $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
# Force archiving of WAL file
$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
@@ -114,6 +117,9 @@ test_recovery_standby('name', 'standby_4', $node_master, \@recovery_params,
@recovery_params = ("recovery_target_lsn = '$recovery_lsn'");
test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"5000", $lsn5);
+@recovery_params = ("recovery_target = 'latest'");
+test_recovery_standby('latest target', 'standby_6', $node_master, \@recovery_params,
+ "6000", $lsn6);
# Multiple targets
#
@@ -126,9 +132,9 @@ test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"recovery_target_name = ''",
"recovery_target_time = '$recovery_time'");
test_recovery_standby('multiple overriding settings',
- 'standby_6', $node_master, \@recovery_params, "3000", $lsn3);
+ 'standby_7', $node_master, \@recovery_params, "3000", $lsn3);
-my $node_standby = get_new_node('standby_7');
+my $node_standby = get_new_node('standby_8');
$node_standby->init_from_backup($node_master, 'my_backup',
has_restoring => 1);
$node_standby->append_conf(
On 11/8/19 7:00 AM, Grigory Smolkin wrote:
On 11/7/19 4:36 PM, Grigory Smolkin wrote:
On 11/7/19 12:56 PM, Kyotaro Horiguchi wrote:
At Thu, 7 Nov 2019 12:22:28 +0300, Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote inOn 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:
At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote inOn 11/6/19 1:55 PM, Grigory Smolkin wrote:
On 11/6/19 12:56 PM, Fujii Masao wrote:
What happens if this parameter is set to latest in the standby
mode?
Or the combination of those settings should be prohibited?Currently it will behave just like regular standby, so I think, to
avoid confusion and keep things simple, the combination of them
should
be prohibited.
Thank you for pointing this out, I will work on it.Attached new patch revision, now it is impossible to use
recovery_target 'latest' in standby mode.
TAP test is updated to reflect this behavior.In the first place, latest (or anything it could be named as) is
defined as the explit label for the default behavior. Thus the latest
should work as if nothing is set to recovery_target* following the
definition. That might seems somewhat strange but I think at
least it
is harmless.Well, it was more about getting default behavior by using some
explicit recovery_target, not the other way around. Because it will
break some 3rd party backup and replication applications, that may
rely on old behavior of ignoring recovery_target_action when no
recovery_target is provided.
But you think that it is worth pursuing, I can do that.Ah. Sorry for the misleading statement. What I had in my mind was
somewhat the mixture of them. I thought that recovery_target =''
behaves the same way as now, r_t_action is ignored. And 'latest' just
makes recovery_target_action work as the current non-empty
recovery_target's does. But I'm not confident that it is a good
design.recovery_target=immediate + r_t_action=shutdown for a standby
works as
commanded. Do we need to inhibit that, too?Why something, that work as expected, should be inhibited?
To make sure, I don't think we should do that. I meant by the above
that standby mode is already accepting recovery_target_action so
inhibiting that only for 'latest' is not orthogonal and could be more
confusing for users, and complicatig the code. So my opinion is we
shouldn't inhibit 'latest' unless r_t_action harms.I gave it some thought and now think that prohibiting recovery_target
'latest' and standby was a bad idea.
All recovery_targets follow the same pattern of usage, so
recovery_target "latest" also must be capable of working in standby
mode.
All recovery_targets have a clear deterministic 'target' where
recovery should end.
In case of recovery_target "latest" this target is the end of
available WAL, therefore the end of available WAL must be more
clearly defined.
I will work on it.Thank you for a feedback.
Attached new patch revision, now end of available WAL is defined as
the fact of missing required WAL.
In case of standby, the end of WAL is defined as 2 consecutive
switches of WAL source, that didn`t provided requested record.
In case of streaming standby, each switch of WAL source is forced
after 3 failed attempts to get new data from walreceiver.All constants are taken off the top of my head and serves as proof of
concept.
TAP test is updated.
Attached new revision, it contains some minor refactoring.
While working on it, I stumbled upon something strange:
why DisownLatch(&XLogCtl->recoveryWakeupLatch) is called before
ReadRecord(xlogreader, LastRec, PANIC, false) ?
Isn`t this latch may be accessed in WaitForWALToBecomeAvailable() if
streaming standby gets promoted?
regards.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0005-recovery_target_latest.patchtext/x-patch; name=0005-recovery_target_latest.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0191ec84b1..49675c38da 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3366,21 +3366,19 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
<variablelist>
<varlistentry id="guc-recovery-target" xreflabel="recovery_target">
- <term><varname>recovery_target</varname><literal> = 'immediate'</literal>
+ <term><varname>recovery_target</varname> (<type>string</type>)
<indexterm>
<primary><varname>recovery_target</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
- This parameter specifies that recovery should end as soon as a
- consistent state is reached, i.e. as early as possible. When restoring
- from an online backup, this means the point where taking the backup
- ended.
- </para>
- <para>
- Technically, this is a string parameter, but <literal>'immediate'</literal>
- is currently the only allowed value.
+ This parameter determines how far recovery should proceed. The value
+ <literal>immediate</literal> means that recovery should end as soon as a consistent
+ state is reached, i.e. as early as possible. When restoring from an online
+ backup, this means the point where taking the backup ended.
+ The second possible value <literal>latest</literal> means that recovery
+ should proceed to the end of the available WAL log.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2e3cc51006..e89b4ee75a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -843,6 +843,13 @@ static bool updateMinRecoveryPoint = true;
*/
bool reachedConsistency = false;
+/*
+ * Have we reached an end of available WAL?
+ * Only ever set when RECOVERY_TARGET_LATEST is used.
+ */
+bool reachedEndOfWAL = false;
+int standby_src_switch_count = 0;
+
static bool InRedo = false;
/* Have we launched bgwriter during recovery? */
@@ -4309,6 +4316,14 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
if (record)
{
+ /*
+ * When recovering to the end of WAL in standby mode, reset
+ * the counter of WAL source switches after page
+ * is successfully readed.
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST && StandbyMode)
+ standby_src_switch_count = 0;
+
/* Great, got a record */
return record;
}
@@ -4372,7 +4387,7 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
}
/* In standby mode, loop back to retry. Otherwise, give up. */
- if (StandbyMode && !CheckForStandbyTrigger())
+ if (StandbyMode && !CheckForStandbyTrigger() && !reachedEndOfWAL)
continue;
else
return NULL;
@@ -6345,6 +6360,9 @@ StartupXLOG(void)
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
ereport(LOG,
(errmsg("starting point-in-time recovery to earliest consistent point")));
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ ereport(LOG,
+ (errmsg("starting point-in-time recovery until all available WAL is applied")));
else
ereport(LOG,
(errmsg("starting archive recovery")));
@@ -7257,6 +7275,19 @@ StartupXLOG(void)
* end of main redo apply loop
*/
+ /*
+ * If all avaialble WAL is replayed,
+ * then RECOVERY_TARGET_LATEST is satisfied
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST && reachedEndOfWAL)
+ {
+ ereport(LOG,
+ (errmsg("recovery stopping after reaching the end of available WAL")));
+
+ reachedStopPoint = true;
+ standby_src_switch_count = 0;
+ }
+
if (reachedStopPoint)
{
if (!reachedConsistency)
@@ -7459,6 +7490,8 @@ StartupXLOG(void)
recoveryStopName);
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
snprintf(reason, sizeof(reason), "reached consistency");
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ snprintf(reason, sizeof(reason), "applied all available WAL");
else
snprintf(reason, sizeof(reason), "no recovery target specified");
@@ -11744,7 +11777,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
{
static TimestampTz last_fail_time = 0;
TimestampTz now;
- bool streaming_reply_sent = false;
+ bool streaming_reply_sent = false;
+ int failed_stream_attempts = 0;
/*-------
* Standby mode is implemented by a state machine:
@@ -11804,7 +11838,10 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* and pg_wal.
*/
if (!StandbyMode)
+ {
+ reachedEndOfWAL = true;
return false;
+ }
/*
* If primary_conninfo is set, launch walreceiver to try
@@ -11948,9 +11985,29 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
}
if (currentSource != oldSource)
+ {
+ /*
+ * If the amount of WAL source switches exceeds the limit,
+ * when recovering to end of WAL in standby mode,
+ * then declate the current state to be the end of WAL.
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST && !reachedEndOfWAL)
+ {
+ standby_src_switch_count++;
+
+ if (standby_src_switch_count >= 4)
+ {
+ elog(DEBUG2, "the limit on WAL source switches is exhausted, "
+ "the end of WAL is reached");
+ reachedEndOfWAL = true;
+ return false;
+ }
+ }
+
elog(DEBUG2, "switched WAL source from %s to %s after %s",
xlogSourceNames[oldSource], xlogSourceNames[currentSource],
lastSourceFailed ? "failure" : "success");
+ }
/*
* We've now handled possible failure. Try to read from the chosen
@@ -11985,6 +12042,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
/*
* Nope, not found in archive or pg_wal.
*/
+
lastSourceFailed = true;
break;
@@ -12067,6 +12125,11 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
}
break;
}
+ else
+ {
+ if (recoveryTarget == RECOVERY_TARGET_LATEST && !reachedEndOfWAL)
+ failed_stream_attempts++;
+ }
/*
* Data not here yet. Check for trigger, then wait for
@@ -12122,6 +12185,19 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* process.
*/
HandleStartupProcInterrupts();
+
+ /*
+ * If the amount of failed attempts to get data from current
+ * WAL source exceeds the limit, when recovering to the end
+ * of WAL in standby mode, then force the WAL source switch.
+ */
+ if (recoveryTarget == RECOVERY_TARGET_LATEST &&
+ !reachedEndOfWAL &&
+ failed_stream_attempts >= 3)
+ {
+ lastSourceFailed = true;
+ break;
+ }
}
return false; /* not reached */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 31a5ef0474..b652304683 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3503,7 +3503,10 @@ static struct config_string ConfigureNamesString[] =
{
{"recovery_target", PGC_POSTMASTER, WAL_RECOVERY_TARGET,
- gettext_noop("Set to \"immediate\" to end recovery as soon as a consistent state is reached."),
+ gettext_noop("Set to \"immediate\" to end recovery as "
+ "soon as a consistent state is reached or "
+ " to \"latest\" to end recovery after "
+ "replaying all available WAL in the archive."),
NULL
},
&recovery_target_string,
@@ -11527,9 +11530,10 @@ error_multiple_recovery_targets(void)
static bool
check_recovery_target(char **newval, void **extra, GucSource source)
{
- if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+ if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "latest") != 0 &&
+ strcmp(*newval, "") != 0)
{
- GUC_check_errdetail("The only allowed value is \"immediate\".");
+ GUC_check_errdetail("The only allowed values are \"immediate\" and \"latest\".");
return false;
}
return true;
@@ -11539,11 +11543,17 @@ static void
assign_recovery_target(const char *newval, void *extra)
{
if (recoveryTarget != RECOVERY_TARGET_UNSET &&
- recoveryTarget != RECOVERY_TARGET_IMMEDIATE)
+ recoveryTarget != RECOVERY_TARGET_IMMEDIATE &&
+ recoveryTarget != RECOVERY_TARGET_LATEST)
error_multiple_recovery_targets();
if (newval && strcmp(newval, "") != 0)
- recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ {
+ if (strcmp(newval, "immediate") == 0)
+ recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ else if (strcmp(newval, "latest") == 0)
+ recoveryTarget = RECOVERY_TARGET_LATEST;
+ }
else
recoveryTarget = RECOVERY_TARGET_UNSET;
}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..55e7e4bd2c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -84,7 +84,8 @@ typedef enum
RECOVERY_TARGET_TIME,
RECOVERY_TARGET_NAME,
RECOVERY_TARGET_LSN,
- RECOVERY_TARGET_IMMEDIATE
+ RECOVERY_TARGET_IMMEDIATE,
+ RECOVERY_TARGET_LATEST
} RecoveryTargetType;
/*
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index d8fbd50011..01dab1aebf 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 8;
+use Test::More tests => 9;
# Create and test a standby from given backup, with a certain recovery target.
# Choose $until_lsn later than the transaction commit that causes the row
@@ -77,7 +77,7 @@ my $lsn3 =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
my $recovery_time = $node_master->safe_psql('postgres', "SELECT now()");
-# Even more data, this time with a recovery target name
+# More data, with a recovery target name
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(3001,4000))");
my $recovery_name = "my_target";
@@ -86,14 +86,17 @@ my $lsn4 =
$node_master->safe_psql('postgres',
"SELECT pg_create_restore_point('$recovery_name');");
-# And now for a recovery target LSN
+# Even more data, this time with a recovery target LSN
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(4001,5000))");
my $lsn5 = my $recovery_lsn =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+# And now for a recovery target 'latest'
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(5001,6000))");
+my $lsn6 =
+ $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
# Force archiving of WAL file
$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
@@ -114,6 +117,9 @@ test_recovery_standby('name', 'standby_4', $node_master, \@recovery_params,
@recovery_params = ("recovery_target_lsn = '$recovery_lsn'");
test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"5000", $lsn5);
+@recovery_params = ("recovery_target = 'latest'");
+test_recovery_standby('latest target', 'standby_6', $node_master, \@recovery_params,
+ "6000", $lsn6);
# Multiple targets
#
@@ -126,9 +132,9 @@ test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"recovery_target_name = ''",
"recovery_target_time = '$recovery_time'");
test_recovery_standby('multiple overriding settings',
- 'standby_6', $node_master, \@recovery_params, "3000", $lsn3);
+ 'standby_7', $node_master, \@recovery_params, "3000", $lsn3);
-my $node_standby = get_new_node('standby_7');
+my $node_standby = get_new_node('standby_8');
$node_standby->init_from_backup($node_master, 'my_backup',
has_restoring => 1);
$node_standby->append_conf(
At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin <g.smolkin@postgrespro.ru> wrote in
While working on it, I stumbled upon something strange:
why DisownLatch(&XLogCtl->recoveryWakeupLatch) is called before
ReadRecord(xlogreader, LastRec, PANIC, false) ?
Isn`t this latch may be accessed in WaitForWALToBecomeAvailable() if
streaming standby gets promoted?
The DisownLatch is just for the sake of tidiness and can be placed
anywhere after the ShutdownWalRcv() call but the current place (just
before "StandbyMode = false") seems natural. The ReadRecord call
doesn't launch another wal receiver because we cleard StandbyMode just
before.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin <g.smolkin@postgrespro.ru> wrote in
On 11/8/19 7:00 AM, Grigory Smolkin wrote:
On 11/7/19 4:36 PM, Grigory Smolkin wrote:
I gave it some thought and now think that prohibiting recovery_target
'latest' and standby was a bad idea.
All recovery_targets follow the same pattern of usage, so
recovery_target "latest" also must be capable of working in standby
mode.
All recovery_targets have a clear deterministic 'target' where
recovery should end.
In case of recovery_target "latest" this target is the end of
available WAL, therefore the end of available WAL must be more clearly
defined.
I will work on it.Thank you for a feedback.
Attached new patch revision, now end of available WAL is defined as
the fact of missing required WAL.
In case of standby, the end of WAL is defined as 2 consecutive
switches of WAL source, that didn`t provided requested record.
In case of streaming standby, each switch of WAL source is forced
after 3 failed attempts to get new data from walreceiver.All constants are taken off the top of my head and serves as proof of
concept.
TAP test is updated.Attached new revision, it contains some minor refactoring.
Thanks for the new patch. I found that it needs more than I thought,
but it seems a bit too complicated and less stable.
As the patch does, WaitForWALToBecomeAvailable needs to exit when
avaiable sources are exhausted. However, we don't need to count
failures to do that. It is enough that the function have two more exit
point. When streaming timeout fires, and when we found that streaming
is not set up when archive/wal failed.
In my opinion, it is better that we have less dependency to global
variables in such deep levels in a call hierachy. Such nformation can
be stored in XLogPageReadPrivate.
I think The doc needs to exiplain on the difference between default
and latest.
Please find the attached, which illustrates the first two points of
the aboves.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
target_latest_PoC.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3b766e66b9..8c8a1c6bf0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -808,6 +808,7 @@ typedef struct XLogPageReadPrivate
int emode;
bool fetching_ckpt; /* are we fetching a checkpoint record? */
bool randAccess;
+ bool return_on_eow; /* returns when reaching End of WAL */
} XLogPageReadPrivate;
/*
@@ -887,7 +888,9 @@ static int XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
- bool fetching_ckpt, XLogRecPtr tliRecPtr);
+ bool fetching_ckpt,
+ XLogRecPtr tliRecPtr,
+ bool return_on_eow);
static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
static void XLogFileClose(void);
static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -4253,6 +4256,7 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
private->fetching_ckpt = fetching_ckpt;
private->emode = emode;
private->randAccess = (RecPtr != InvalidXLogRecPtr);
+ private->return_on_eow = (recoveryTarget == RECOVERY_TARGET_LATEST);
/* This is the first attempt to read this page. */
lastSourceFailed = false;
@@ -4371,8 +4375,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
continue;
}
- /* In standby mode, loop back to retry. Otherwise, give up. */
- if (StandbyMode && !CheckForStandbyTrigger())
+ /*
+ * In standby mode, loop back to retry. Otherwise, give up.
+ * If we told to return on end of WAL, also give up.
+ */
+ if (StandbyMode && !CheckForStandbyTrigger() &&
+ !private->return_on_eow)
continue;
else
return NULL;
@@ -7271,7 +7279,7 @@ StartupXLOG(void)
* end of main redo apply loop
*/
- if (reachedStopPoint)
+ if (reachedStopPoint || recoveryTarget == RECOVERY_TARGET_LATEST)
{
if (!reachedConsistency)
ereport(FATAL,
@@ -7473,6 +7481,8 @@ StartupXLOG(void)
recoveryStopName);
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
snprintf(reason, sizeof(reason), "reached consistency");
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ snprintf(reason, sizeof(reason), "end of WAL");
else
snprintf(reason, sizeof(reason), "no recovery target specified");
@@ -11605,7 +11615,8 @@ retry:
if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
private->randAccess,
private->fetching_ckpt,
- targetRecPtr))
+ targetRecPtr,
+ private->return_on_eow))
{
if (readFile >= 0)
close(readFile);
@@ -11745,7 +11756,9 @@ next_record_is_invalid:
*
* If the record is not immediately available, the function returns false
* if we're not in standby mode. In standby mode, waits for it to become
- * available.
+ * available looping over all sources if return_on_eow is false. Otherwise the
+ * function returns false if the record is not immediately available on all
+ * sources.
*
* When the requested record becomes available, the function opens the file
* containing it (if not open already), and returns true. When end of standby
@@ -11754,7 +11767,8 @@ next_record_is_invalid:
*/
static bool
WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
- bool fetching_ckpt, XLogRecPtr tliRecPtr)
+ bool fetching_ckpt, XLogRecPtr tliRecPtr,
+ bool return_on_eow)
{
static TimestampTz last_fail_time = 0;
TimestampTz now;
@@ -11862,6 +11876,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
receivedUpto = 0;
}
+ /*
+ * If we don't get into streaming, this is the end of WAL
+ */
+ else if (return_on_eow)
+ return false;
+
/*
* Move to XLOG_FROM_STREAM state in either case. We'll
* get immediate failure if we didn't launch walreceiver,
@@ -12124,6 +12144,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
WL_EXIT_ON_PM_DEATH,
5000L, WAIT_EVENT_RECOVERY_WAL_ALL);
ResetLatch(&XLogCtl->recoveryWakeupLatch);
+
+ /*
+ * This is not exactly the end of WAL, but assume the
+ * primary has no more record to send.
+ */
+ if (return_on_eow)
+ return false;
+
break;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4b3769b8b0..42367a793d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -11637,9 +11637,11 @@ error_multiple_recovery_targets(void)
static bool
check_recovery_target(char **newval, void **extra, GucSource source)
{
- if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+ if (strcmp(*newval, "immediate") != 0 &&
+ strcmp(*newval, "latest") != 0 &&
+ strcmp(*newval, "") != 0)
{
- GUC_check_errdetail("The only allowed value is \"immediate\".");
+ GUC_check_errdetail("The only allowed value is \"immediate\" and \"latest\".");
return false;
}
return true;
@@ -11649,13 +11651,16 @@ static void
assign_recovery_target(const char *newval, void *extra)
{
if (recoveryTarget != RECOVERY_TARGET_UNSET &&
- recoveryTarget != RECOVERY_TARGET_IMMEDIATE)
+ recoveryTarget != RECOVERY_TARGET_IMMEDIATE &&
+ recoveryTarget != RECOVERY_TARGET_LATEST)
error_multiple_recovery_targets();
- if (newval && strcmp(newval, "") != 0)
+ if (!newval || strcmp(newval, "") == 0)
+ recoveryTarget = RECOVERY_TARGET_UNSET;
+ else if(strcmp(newval, "immediate") == 0)
recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
else
- recoveryTarget = RECOVERY_TARGET_UNSET;
+ recoveryTarget = RECOVERY_TARGET_LATEST;
}
static bool
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..55e7e4bd2c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -84,7 +84,8 @@ typedef enum
RECOVERY_TARGET_TIME,
RECOVERY_TARGET_NAME,
RECOVERY_TARGET_LSN,
- RECOVERY_TARGET_IMMEDIATE
+ RECOVERY_TARGET_IMMEDIATE,
+ RECOVERY_TARGET_LATEST
} RecoveryTargetType;
/*
On 2019-11-08 05:00, Grigory Smolkin wrote:
Attached new patch revision, now end of available WAL is defined as the
fact of missing required WAL.
In case of standby, the end of WAL is defined as 2 consecutive switches
of WAL source, that didn`t provided requested record.
In case of streaming standby, each switch of WAL source is forced after
3 failed attempts to get new data from walreceiver.All constants are taken off the top of my head and serves as proof of
concept.
Well, this is now changing the meaning of the patch quite a bit. I'm on
board with making the existing default behavior explicit. (This is
similar to how we added recovery_target_timeline = 'current' in PG12.)
Still not a fan of the name yet, but that's trivial.
If, however, you want to change the default behavior or introduce a new
behavior, as you are suggesting here, that should be a separate discussion.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/21/19 1:46 PM, Peter Eisentraut wrote:
On 2019-11-08 05:00, Grigory Smolkin wrote:
Attached new patch revision, now end of available WAL is defined as the
fact of missing required WAL.
In case of standby, the end of WAL is defined as 2 consecutive switches
of WAL source, that didn`t provided requested record.
In case of streaming standby, each switch of WAL source is forced after
3 failed attempts to get new data from walreceiver.All constants are taken off the top of my head and serves as proof of
concept.Well, this is now changing the meaning of the patch quite a bit. I'm
on board with making the existing default behavior explicit. (This is
similar to how we added recovery_target_timeline = 'current' in PG12.)
Still not a fan of the name yet, but that's trivial.If, however, you want to change the default behavior or introduce a
new behavior, as you are suggesting here, that should be a separate
discussion.
No, default behavior is not to be changed. As I previously mentioned, it
may break the backward compatibility for 3rd party backup and
replication applications.
At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin<g.smolkin@postgrespro.ru> wrote in
On 11/8/19 7:00 AM, Grigory Smolkin wrote:
On 11/7/19 4:36 PM, Grigory Smolkin wrote:
I gave it some thought and now think that prohibiting recovery_target
'latest' and standby was a bad idea.
All recovery_targets follow the same pattern of usage, so
recovery_target "latest" also must be capable of working in standby
mode.
All recovery_targets have a clear deterministic 'target' where
recovery should end.
In case of recovery_target "latest" this target is the end of
available WAL, therefore the end of available WAL must be more clearly
defined.
I will work on it.Thank you for a feedback.
Attached new patch revision, now end of available WAL is defined as
the fact of missing required WAL.
In case of standby, the end of WAL is defined as 2 consecutive
switches of WAL source, that didn`t provided requested record.
In case of streaming standby, each switch of WAL source is forced
after 3 failed attempts to get new data from walreceiver.All constants are taken off the top of my head and serves as proof of
concept.
TAP test is updated.Attached new revision, it contains some minor refactoring.
Thanks for the new patch. I found that it needs more than I thought,
but it seems a bit too complicated and less stable.As the patch does, WaitForWALToBecomeAvailable needs to exit when
avaiable sources are exhausted. However, we don't need to count
failures to do that. It is enough that the function have two more exit
point. When streaming timeout fires, and when we found that streaming
is not set up when archive/wal failed.In my opinion, it is better that we have less dependency to global
variables in such deep levels in a call hierachy. Such nformation can
be stored in XLogPageReadPrivate.
Many thanks!
It looks much better that my approach with global variables.
I`ve tested it and have some thoughts/concerns:
1. Recovery should report the exact reason why it has been forced to
stop. In case of recovering to the end of WAL, standby promotion request
received during recovery could be mistaken for reaching the end of WAL
and reported as such. To avoid this, I think that reachedEndOfWal
variable should be introduced.
In attached patch it is added as a global variable, but maybe something
more clever may be devised. I was not sure that reachedEndOfWal could be
placed in XLogPageReadPrivate. Because we need to access it at the
higher level than ReadRecord(), and I was under impression placing it in
XLogPageReadPrivate could violate abstraction level of XLogPageReadPrivate.
2. During the testing, I`ve stumbled upon assertion failure in case of
recovering in standby mode to the the end of WAL coupled with
recovery_target_action as "promote", caused by the WAL source in state
machine not been changed after reaching the recovery target (script to
reproduce is attached):
2019-12-07 13:45:54.468 MSK [23067] LOG: starting PostgreSQL 13devel on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 9.2.1 20190827 (Red Hat
9.2.1-1), 64-bit
2019-12-07 13:45:54.468 MSK [23067] LOG: listening on IPv4 address
"127.0.0.1", port 15433
2019-12-07 13:45:54.470 MSK [23067] LOG: listening on Unix socket
"/tmp/.s.PGSQL.15433"
2019-12-07 13:45:54.475 MSK [23068] LOG: database system was
interrupted; last known up at 2019-12-07 13:45:49 MSK
cp: cannot stat '/home/gsmol/task/13_devel/archive/00000002.history': No
such file or directory
2019-12-07 13:45:54.602 MSK [23068] LOG: entering standby mode
2019-12-07 13:45:54.614 MSK [23068] LOG: restored log file
"000000010000000000000002" from archive
2019-12-07 13:45:54.679 MSK [23068] LOG: redo starts at 0/2000028
2019-12-07 13:45:54.682 MSK [23068] LOG: consistent recovery state
reached at 0/2000100
2019-12-07 13:45:54.682 MSK [23067] LOG: database system is ready to
accept read only connections
2019-12-07 13:45:54.711 MSK [23068] LOG: restored log file
"000000010000000000000003" from archive
2019-12-07 13:45:54.891 MSK [23068] LOG: restored log file
"000000010000000000000004" from archive
2019-12-07 13:45:55.046 MSK [23068] LOG: restored log file
"000000010000000000000005" from archive
2019-12-07 13:45:55.210 MSK [23068] LOG: restored log file
"000000010000000000000006" from archive
2019-12-07 13:45:55.377 MSK [23068] LOG: restored log file
"000000010000000000000007" from archive
2019-12-07 13:45:55.566 MSK [23068] LOG: restored log file
"000000010000000000000008" from archive
2019-12-07 13:45:55.737 MSK [23068] LOG: restored log file
"000000010000000000000009" from archive
cp: cannot stat
'/home/gsmol/task/13_devel/archive/00000001000000000000000A': No such
file or directory
2019-12-07 13:45:56.233 MSK [23083] LOG: started streaming WAL from
primary at 0/A000000 on timeline 1
2019-12-07 13:45:56.365 MSK [23068] LOG: recovery stopping after
reaching the end of available WAL
2019-12-07 13:45:56.365 MSK [23068] LOG: redo done at 0/9FFC670
2019-12-07 13:45:56.365 MSK [23068] LOG: last completed transaction was
at log time 2019-12-07 13:45:53.627746+03
2019-12-07 13:45:56.365 MSK [23083] FATAL: terminating walreceiver
process due to administrator command
TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12032)
postgres: startup waiting for
00000001000000000000000A(ExceptionalCondition+0xa8)[0xa88b55]
postgres: startup waiting for 00000001000000000000000A[0x573417]
postgres: startup waiting for 00000001000000000000000A[0x572b68]
postgres: startup waiting for 00000001000000000000000A[0x579066]
postgres: startup waiting for
00000001000000000000000A(XLogReadRecord+0xe3)[0x5788ac]
postgres: startup waiting for 00000001000000000000000A[0x5651f8]
postgres: startup waiting for
00000001000000000000000A(StartupXLOG+0x23aa)[0x56b26e]
postgres: startup waiting for
00000001000000000000000A(StartupProcessMain+0xc7)[0x8642a1]
postgres: startup waiting for
00000001000000000000000A(AuxiliaryProcessMain+0x5b8)[0x5802ad]
postgres: startup waiting for 00000001000000000000000A[0x863175]
postgres: startup waiting for
00000001000000000000000A(PostmasterMain+0x1214)[0x85e0a4]
postgres: startup waiting for 00000001000000000000000A[0x76f247]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f8867958f33]
postgres: startup waiting for
00000001000000000000000A(_start+0x2e)[0x47afee]
#0 0x00007f886796ce75 in raise () from /lib64/libc.so.6
#1 0x00007f8867957895 in abort () from /lib64/libc.so.6
#2 0x0000000000a88b82 in ExceptionalCondition (conditionName=0xb24acc
"StandbyMode", errorType=0xb208a7 "FailedAssertion",
fileName=0xb208a0 "xlog.c", lineNumber=12032) at assert.c:67
#3 0x0000000000573417 in WaitForWALToBecomeAvailable (RecPtr=151003136,
randAccess=true, fetching_ckpt=false, tliRecPtr=167757424,
return_on_eow=true) at xlog.c:12032
#4 0x0000000000572b68 in XLogPageRead (xlogreader=0xf08ed8,
targetPagePtr=150994944, reqLen=8192, targetRecPtr=167757424,
readBuf=0xf37d50 "\002\321\005") at xlog.c:11611
#5 0x0000000000579066 in ReadPageInternal (state=0xf08ed8,
pageptr=167755776, reqLen=1672) at xlogreader.c:579
#6 0x00000000005788ac in XLogReadRecord (state=0xf08ed8,
RecPtr=167757424, errormsg=0x7fff1f5cdeb8) at xlogreader.c:300
#7 0x00000000005651f8 in ReadRecord (xlogreader=0xf08ed8,
RecPtr=167757424, emode=22, fetching_ckpt=false) at xlog.c:4271
#8 0x000000000056b26e in StartupXLOG () at xlog.c:7373
#9 0x00000000008642a1 in StartupProcessMain () at startup.c:196
#10 0x00000000005802ad in AuxiliaryProcessMain (argc=2,
argv=0x7fff1f5cea80) at bootstrap.c:451
#11 0x0000000000863175 in StartChildProcess (type=StartupProcess) at
postmaster.c:5461
#12 0x000000000085e0a4 in PostmasterMain (argc=3, argv=0xf082b0) at
postmaster.c:1392
#13 0x000000000076f247 in main (argc=3, argv=0xf082b0) at main.c:210
(gdb) frame 8
#8 0x000000000056b26e in StartupXLOG () at xlog.c:7373
7373 record = ReadRecord(xlogreader, LastRec, PANIC, false);
(gdb) p StandbyMode
$1 = false
(gdb) list
7368
7369 /*
7370 * Re-fetch the last valid or last applied record, so we
can identify the
7371 * exact endpoint of what we consider the valid portion
of WAL.
7372 */
7373 record = ReadRecord(xlogreader, LastRec, PANIC, false);
7374 EndOfLog = EndRecPtr;
7375
7376 /*
7377 * EndOfLogTLI is the TLI in the filename of the XLOG
segment containing
Both issues are fixed in the new patch version.
Any review and thoughts on the matters would be much appreciated.
I think The doc needs to exiplain on the difference between default
and latest.
Sure, I will work on it.
Please find the attached, which illustrates the first two points of
the aboves.regards.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
target_latest_PoC_v2.patchtext/x-patch; name=target_latest_PoC_v2.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4ec13f3311..bc904a09bd 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3385,21 +3385,19 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
<variablelist>
<varlistentry id="guc-recovery-target" xreflabel="recovery_target">
- <term><varname>recovery_target</varname><literal> = 'immediate'</literal>
+ <term><varname>recovery_target</varname> (<type>string</type>)
<indexterm>
<primary><varname>recovery_target</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
- This parameter specifies that recovery should end as soon as a
- consistent state is reached, i.e. as early as possible. When restoring
- from an online backup, this means the point where taking the backup
- ended.
- </para>
- <para>
- Technically, this is a string parameter, but <literal>'immediate'</literal>
- is currently the only allowed value.
+ This parameter determines how far recovery should proceed. The value
+ <literal>immediate</literal> means that recovery should end as soon as a consistent
+ state is reached, i.e. as early as possible. When restoring from an online
+ backup, this means the point where taking the backup ended.
+ The second possible value <literal>latest</literal> means that recovery
+ should proceed to the end of the available WAL log.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5f0ee50092..db75587f1e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -808,6 +808,7 @@ typedef struct XLogPageReadPrivate
int emode;
bool fetching_ckpt; /* are we fetching a checkpoint record? */
bool randAccess;
+ bool return_on_eow; /* returns when reaching End of WAL */
} XLogPageReadPrivate;
/*
@@ -843,6 +844,9 @@ static bool updateMinRecoveryPoint = true;
*/
bool reachedConsistency = false;
+/* Have we reached End of WAL? Used only in case of RECOVERY_TARGET_LATEST */
+bool reachedEndOfWal = false;
+
static bool InRedo = false;
/* Have we launched bgwriter during recovery? */
@@ -887,7 +891,9 @@ static int XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
- bool fetching_ckpt, XLogRecPtr tliRecPtr);
+ bool fetching_ckpt,
+ XLogRecPtr tliRecPtr,
+ bool return_on_eow);
static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
static void XLogFileClose(void);
static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -4253,6 +4259,7 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
private->fetching_ckpt = fetching_ckpt;
private->emode = emode;
private->randAccess = (RecPtr != InvalidXLogRecPtr);
+ private->return_on_eow = (recoveryTarget == RECOVERY_TARGET_LATEST);
/* This is the first attempt to read this page. */
lastSourceFailed = false;
@@ -4371,8 +4378,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
continue;
}
- /* In standby mode, loop back to retry. Otherwise, give up. */
- if (StandbyMode && !CheckForStandbyTrigger())
+ /*
+ * In standby mode, loop back to retry. Otherwise, give up.
+ * If we told to return on end of WAL, also give up.
+ */
+ if (StandbyMode && !CheckForStandbyTrigger() &&
+ !private->return_on_eow)
continue;
else
return NULL;
@@ -6342,6 +6353,9 @@ StartupXLOG(void)
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
ereport(LOG,
(errmsg("starting point-in-time recovery to earliest consistent point")));
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ ereport(LOG,
+ (errmsg("starting point-in-time recovery until all available WAL is applied")));
else
ereport(LOG,
(errmsg("starting archive recovery")));
@@ -7254,6 +7268,13 @@ StartupXLOG(void)
* end of main redo apply loop
*/
+ if (recoveryTarget == RECOVERY_TARGET_LATEST && reachedEndOfWal)
+ {
+ ereport(LOG,
+ (errmsg("recovery stopping after reaching the end of available WAL")));
+ reachedStopPoint = true;
+ }
+
if (reachedStopPoint)
{
if (!reachedConsistency)
@@ -7456,6 +7477,8 @@ StartupXLOG(void)
recoveryStopName);
else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
snprintf(reason, sizeof(reason), "reached consistency");
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ snprintf(reason, sizeof(reason), "end of WAL");
else
snprintf(reason, sizeof(reason), "no recovery target specified");
@@ -11588,7 +11611,8 @@ retry:
if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
private->randAccess,
private->fetching_ckpt,
- targetRecPtr))
+ targetRecPtr,
+ private->return_on_eow))
{
if (readFile >= 0)
close(readFile);
@@ -11728,7 +11752,9 @@ next_record_is_invalid:
*
* If the record is not immediately available, the function returns false
* if we're not in standby mode. In standby mode, waits for it to become
- * available.
+ * available looping over all sources if return_on_eow is false. Otherwise the
+ * function returns false if the record is not immediately available on all
+ * sources.
*
* When the requested record becomes available, the function opens the file
* containing it (if not open already), and returns true. When end of standby
@@ -11737,7 +11763,8 @@ next_record_is_invalid:
*/
static bool
WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
- bool fetching_ckpt, XLogRecPtr tliRecPtr)
+ bool fetching_ckpt, XLogRecPtr tliRecPtr,
+ bool return_on_eow)
{
static TimestampTz last_fail_time = 0;
TimestampTz now;
@@ -11845,6 +11872,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
receivedUpto = 0;
}
+ /*
+ * If we don't get into streaming, this is the end of WAL
+ */
+ else if (return_on_eow)
+ {
+ reachedEndOfWal = true;
+ return false;
+ }
+
/*
* Move to XLOG_FROM_STREAM state in either case. We'll
* get immediate failure if we didn't launch walreceiver,
@@ -12107,6 +12143,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
WL_EXIT_ON_PM_DEATH,
5000L, WAIT_EVENT_RECOVERY_WAL_ALL);
ResetLatch(&XLogCtl->recoveryWakeupLatch);
+
+ /*
+ * This is not exactly the end of WAL, but assume the
+ * primary has no more record to send.
+ */
+ if (return_on_eow)
+ {
+ reachedEndOfWal = true;
+ currentSource = XLOG_FROM_PG_WAL;
+ return false;
+ }
+
break;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5fccc9683e..3238e4ce98 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3545,7 +3545,10 @@ static struct config_string ConfigureNamesString[] =
{
{"recovery_target", PGC_POSTMASTER, WAL_RECOVERY_TARGET,
- gettext_noop("Set to \"immediate\" to end recovery as soon as a consistent state is reached."),
+ gettext_noop("Set to \"immediate\" to end recovery as "
+ "soon as a consistent state is reached or "
+ " to \"latest\" to end recovery after "
+ "replaying all available WAL in the archive."),
NULL
},
&recovery_target_string,
@@ -11650,9 +11653,10 @@ error_multiple_recovery_targets(void)
static bool
check_recovery_target(char **newval, void **extra, GucSource source)
{
- if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+ if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "latest") != 0 &&
+ strcmp(*newval, "") != 0)
{
- GUC_check_errdetail("The only allowed value is \"immediate\".");
+ GUC_check_errdetail("The only allowed values are \"immediate\" and \"latest\".");
return false;
}
return true;
@@ -11662,11 +11666,17 @@ static void
assign_recovery_target(const char *newval, void *extra)
{
if (recoveryTarget != RECOVERY_TARGET_UNSET &&
- recoveryTarget != RECOVERY_TARGET_IMMEDIATE)
+ recoveryTarget != RECOVERY_TARGET_IMMEDIATE &&
+ recoveryTarget != RECOVERY_TARGET_LATEST)
error_multiple_recovery_targets();
if (newval && strcmp(newval, "") != 0)
- recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ {
+ if (strcmp(newval, "immediate") == 0)
+ recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
+ else if (strcmp(newval, "latest") == 0)
+ recoveryTarget = RECOVERY_TARGET_LATEST;
+ }
else
recoveryTarget = RECOVERY_TARGET_UNSET;
}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..55e7e4bd2c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -84,7 +84,8 @@ typedef enum
RECOVERY_TARGET_TIME,
RECOVERY_TARGET_NAME,
RECOVERY_TARGET_LSN,
- RECOVERY_TARGET_IMMEDIATE
+ RECOVERY_TARGET_IMMEDIATE,
+ RECOVERY_TARGET_LATEST
} RecoveryTargetType;
/*
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index d8fbd50011..70fb79a8fc 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,8 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 8;
+use Test::More tests => 15;
+#use Time::HiRes qw(usleep);
# Create and test a standby from given backup, with a certain recovery target.
# Choose $until_lsn later than the transaction commit that causes the row
@@ -26,6 +27,8 @@ sub test_recovery_standby
$node_standby->append_conf('postgresql.conf', qq($param_item));
}
+ $node_standby->append_conf('postgresql.conf', "recovery_target_action = promote");
+
$node_standby->start;
# Wait until standby has replayed enough data
@@ -39,6 +42,15 @@ sub test_recovery_standby
$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
is($result, qq($num_rows), "check standby content for $test_name");
+ # check that mode was promoted
+ if ($test_name ne 'multiple overriding settings')
+ {
+ $node_standby->safe_psql('postgres', "SELECT pg_sleep(1)");
+ my $is_in_recovery =
+ $node_standby->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+ is($is_in_recovery, 'f', "check that standby was promoted for $test_name");
+ }
+
# Stop standby node
$node_standby->teardown_node;
@@ -77,7 +89,7 @@ my $lsn3 =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
my $recovery_time = $node_master->safe_psql('postgres', "SELECT now()");
-# Even more data, this time with a recovery target name
+# More data, with a recovery target name
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(3001,4000))");
my $recovery_name = "my_target";
@@ -86,14 +98,17 @@ my $lsn4 =
$node_master->safe_psql('postgres',
"SELECT pg_create_restore_point('$recovery_name');");
-# And now for a recovery target LSN
+# Even more data, this time with a recovery target LSN
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(4001,5000))");
my $lsn5 = my $recovery_lsn =
$node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+# And now for a recovery target 'latest'
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(5001,6000))");
+my $lsn6 =
+ $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
# Force archiving of WAL file
$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
@@ -114,6 +129,9 @@ test_recovery_standby('name', 'standby_4', $node_master, \@recovery_params,
@recovery_params = ("recovery_target_lsn = '$recovery_lsn'");
test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"5000", $lsn5);
+@recovery_params = ("recovery_target = 'latest'");
+test_recovery_standby('latest target', 'standby_6', $node_master, \@recovery_params,
+ "6000", $lsn6);
# Multiple targets
#
@@ -126,9 +144,9 @@ test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"recovery_target_name = ''",
"recovery_target_time = '$recovery_time'");
test_recovery_standby('multiple overriding settings',
- 'standby_6', $node_master, \@recovery_params, "3000", $lsn3);
+ 'standby_7', $node_master, \@recovery_params, "3000", $lsn3);
-my $node_standby = get_new_node('standby_7');
+my $node_standby = get_new_node('standby_8');
$node_standby->init_from_backup($node_master, 'my_backup',
has_restoring => 1);
$node_standby->append_conf(
Thanks for the new version.
At Sun, 8 Dec 2019 04:03:01 +0300, Grigory Smolkin <g.smolkin@postgrespro.ru> wrote in
On 11/21/19 1:46 PM, Peter Eisentraut wrote:
On 2019-11-08 05:00, Grigory Smolkin wrote:
I`ve tested it and have some thoughts/concerns:
1. Recovery should report the exact reason why it has been forced to
stop. In case of recovering to the end of WAL, standby promotion
request received during recovery could be mistaken for reaching the
end of WAL and reported as such. To avoid this, I think that
reachedEndOfWal variable should be introduced.In attached patch it is added as a global variable, but maybe
something more clever may be devised. I was not sure that
reachedEndOfWal could be placed in XLogPageReadPrivate. Because we
need to access it at the higher level than ReadRecord(), and I was
under impression placing it in XLogPageReadPrivate could violate
abstraction level of XLogPageReadPrivate.
CheckForStandbyTrigger() always returns true once the trigger is
pulled. We don't care whether end-of-WAL is reached if promote is
already triggered. Thus, we can tell the promote case by asking
CheckForStandbyTrigger() when we exited the redo main loop with
recoveryTarget = RECOVERY_TARGET_LATEST. Is this works as you expect?
2. During the testing, I`ve stumbled upon assertion failure in case of
recovering in standby mode to the the end of WAL coupled with
recovery_target_action as "promote", caused by the WAL source in state
machine not been changed after reaching the recovery target (script to
reproduce is attached):
...
TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12032)
...
#2 0x0000000000a88b82 in ExceptionalCondition (conditionName=0xb24acc
#"StandbyMode", errorType=0xb208a7 "FailedAssertion",
fileName=0xb208a0 "xlog.c", lineNumber=12032) at assert.c:67
#3 0x0000000000573417 in WaitForWALToBecomeAvailable
#(RecPtr=151003136, randAccess=true, fetching_ckpt=false,
#tliRecPtr=167757424,
return_on_eow=true) at xlog.c:12032
...
#7 0x00000000005651f8 in ReadRecord (xlogreader=0xf08ed8,
#RecPtr=167757424, emode=22, fetching_ckpt=false) at xlog.c:4271
..
ReadRecord is called with currentSource=STREAM after StandbyMode was
turned off. I suppose the fix means the "currentSource =
XLOG_FROM_PG_WAL" line but I don't think it not the right way.
Streaming timeout means failure when return_on_eow is true. Thus the
right thing to do there is setting lastSourceFailed to true. The first
half of WaitForWALToBecomeAvailable handles failure of the current
source thus source transition happens only there. The second half just
reports failure to the first half.
Both issues are fixed in the new patch version.
Any review and thoughts on the matters would be much appreciated.I think The doc needs to exiplain on the difference between default
and latest.Sure, I will work on it.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi,
this patch was waiting on author without any update/response since early
December, so I've marked it as returned with feedback. Feel free to
re-submit an updated version to a future CF.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services