Time-Delayed Standbys
Hi all,
The attached patch is a continuation of Robert's work [1].
I made some changes:
- use of Latches instead of pg_usleep, so we don't have to wake up regularly.
- call HandleStartupProcInterrupts() before CheckForStandbyTrigger()
because it might change the trigger file's location
- compute recoveryDelayUntilTime in XLOG_XACT_COMMIT and
XLOG_XACT_COMMIT_COMPACT checks
- don't care about clock drift because it's an admin problem.
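For illustration, a minimal recovery.conf along these lines (the connection string and the delay value are placeholders, not part of the patch):

standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'
# replay each commit only once the standby clock is 30 minutes past its commit time
recovery_time_delay = '30min'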
Regards,
[1]: /messages/by-id/BANLkTi==TTzHDqWzwJDjmOf__8YuA7L1jw@mail.gmail.com
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Attachments:
time-delayed-standby-v1.patch (application/octet-stream)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index c0c543e..641c9c6 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -135,6 +135,27 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
</listitem>
</varlistentry>
+ <varlistentry id="recovery-time-delay" xreflabel="recovery_time_delay">
+ <term><varname>recovery_time_delay</varname> (<type>integer</type>)</term>
+ <indexterm>
+ <primary><varname>recovery_time_delay</> recovery parameter</primary>
+ </indexterm>
+ <listitem>
+ <para>
+ Specifies the amount of time (in milliseconds, if no unit is specified)
+ which recovery of transaction commits should lag the master. This
+ parameter allows creation of a time-delayed standby. For example, if
+ you set this parameter to <literal>5min</literal>, the standby will
+ replay each transaction commit only when the system time on the standby
+ is at least five minutes past the commit time reported by the master.
+ </para>
+ <para>
+ Note that if the master and standby system clocks are not synchronized,
+ this might lead to unexpected results.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect1>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 06f5eb0..02a5bfd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -218,6 +218,8 @@ static bool recoveryPauseAtTarget = true;
static TransactionId recoveryTargetXid;
static TimestampTz recoveryTargetTime;
static char *recoveryTargetName;
+static int recovery_time_delay = 0;
+static TimestampTz recoveryDelayUntilTime;
/* options taken from recovery.conf for XLOG streaming */
static bool StandbyModeRequested = false;
@@ -730,6 +732,7 @@ static void readRecoveryCommandFile(void);
static void exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo);
static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
static void recoveryPausesHere(void);
+static void recoveryDelay(void);
static void SetLatestXTime(TimestampTz xtime);
static void SetCurrentChunkStartTime(TimestampTz xtime);
static void CheckRequiredParameterValues(void);
@@ -5474,6 +5477,19 @@ readRecoveryCommandFile(void)
(errmsg_internal("trigger_file = '%s'",
TriggerFile)));
}
+ else if (strcmp(item->name, "recovery_time_delay") == 0)
+ {
+ const char *hintmsg;
+
+ if (!parse_int(item->value, &recovery_time_delay, GUC_UNIT_MS,
+ &hintmsg))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"),
+ hintmsg ? errhint("%s", _(hintmsg)) : 0));
+ ereport(DEBUG2,
+ (errmsg("recovery_time_delay = '%s'", item->value)));
+ }
else
ereport(FATAL,
(errmsg("unrecognized recovery parameter \"%s\"",
@@ -5623,7 +5639,8 @@ exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo)
* We also track the timestamp of the latest applied COMMIT/ABORT
* record in XLogCtl->recoveryLastXTime, for logging purposes.
* Also, some information is saved in recoveryStopXid et al for use in
- * annotating the new timeline's history file.
+ * annotating the new timeline's history file; and recoveryDelayUntilTime
+ * is updated, for time-delayed standbys.
*/
static bool
recoveryStopsHere(XLogRecord *record, bool *includeThis)
@@ -5633,6 +5650,9 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
TimestampTz recordXtime;
char recordRPName[MAXFNAMELEN];
+ /* Clear any previous recovery delay time */
+ recoveryDelayUntilTime = 0;
+
/* We only consider stopping at COMMIT, ABORT or RESTORE POINT records */
if (record->xl_rmid != RM_XACT_ID && record->xl_rmid != RM_XLOG_ID)
return false;
@@ -5643,6 +5663,11 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
recordXactCommitData = (xl_xact_commit_compact *) XLogRecGetData(record);
recordXtime = recordXactCommitData->xact_time;
+
+ if (recovery_time_delay > 0)
+ recoveryDelayUntilTime =
+ TimestampTzPlusMilliseconds(recordXactCommitData->xact_time,
+ recovery_time_delay);
}
else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT)
{
@@ -5650,6 +5675,11 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
recordXactCommitData = (xl_xact_commit *) XLogRecGetData(record);
recordXtime = recordXactCommitData->xact_time;
+
+ if (recovery_time_delay > 0)
+ recoveryDelayUntilTime =
+ TimestampTzPlusMilliseconds(recordXactCommitData->xact_time,
+ recovery_time_delay);
}
else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_ABORT)
{
@@ -5832,6 +5862,48 @@ SetRecoveryPause(bool recoveryPause)
}
/*
+ * When recovery_time_delay is set, we wait long enough to make sure we are
+ * at least that far behind the master.
+ */
+static void
+recoveryDelay(void)
+{
+ /* main delay loop */
+ while (true)
+ {
+ long secs;
+ int microsecs;
+
+ ResetLatch(&XLogCtl->recoveryWakeupLatch);
+
+ /* might change the trigger file's location */
+ HandleStartupProcInterrupts();
+
+ if (CheckForStandbyTrigger())
+ break;
+
+ TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
+ &secs, &microsecs);
+
+ /*
+ * Wait for difference between GetCurrentTimestamp() and
+ * recoveryDelayUntilTime
+ */
+ if (secs > 0 || microsecs > 0)
+ {
+ elog(DEBUG2, "recovery delay %ld seconds, %d milliseconds",
+ secs, microsecs / 1000);
+
+ WaitLatch(&XLogCtl->recoveryWakeupLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ secs * 1000L + microsecs / 1000);
+ }
+ else
+ break;
+ }
+}
+
+/*
* Save timestamp of latest processed commit/abort record.
*
* We keep this in XLogCtl, not a simple static variable, so that it can be
@@ -6723,6 +6795,18 @@ StartupXLOG(void)
break;
}
+ /*
+ * If we've been asked to lag the master, pause until enough
+ * time has passed.
+ */
+ if (recovery_time_delay > 0)
+ {
+ recoveryDelay();
+
+ if (CheckForStandbyTrigger())
+ break;
+ }
+
/* Setup error traceback support for ereport() */
errcallback.callback = rm_redo_error_callback;
errcallback.arg = (void *) record;
Hi Royes,
I'm sorry for my late review...
I see potential in your patch for PG's replication functionality, and it might be
useful for many people. I checked your patch and have some comments for
improvement. I haven't tested the details of unexpected situations yet, but I think
Problem 2 below is an important functionality problem, so I ask you to
solve that problem first.
* Regress test
No problem.
* Problem1
Your patch does not update recovery.conf.sample for recovery_time_delay.
Please add it.
* Problem2
When I set up a time-delayed standby and start the standby server, I cannot access
the standby server with psql. This is because PG is still in its initial recovery,
during which psql cannot connect. I think a time-delayed standby should only delay
the recovery position; it must not affect other functionality.
I didn't test recovery on the master server with recovery_time_delay. If you have
detailed test results for these cases, please send them to me.
That is all for my first quick review of your patch.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
On Fri, Nov 29, 2013 at 5:49 AM, KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> wrote:
Hi Royes,
I'm sorry for my late review...
No problem...
I see potential in your patch for PG's replication functionality, and it might
be useful for many people. I checked your patch and have some comments for
improvement. I haven't tested the details of unexpected situations yet, but I
think Problem 2 below is an important functionality problem, so I ask you to
solve that problem first.
* Regress test
No problem.
* Problem1
Your patch does not update recovery.conf.sample for recovery_time_delay.
Please add it.
Fixed.
* Problem2
When I set up a time-delayed standby and start the standby server, I cannot access
the standby server with psql. This is because PG is still in its initial recovery,
during which psql cannot connect. I think a time-delayed standby should only delay
the recovery position; it must not affect other functionality.
I didn't test recovery on the master server with recovery_time_delay. If you
have detailed test results for these cases, please send them to me.
Well, I could not reproduce the problem that you described.
I run the following test:
1) Clusters
- build master
- build slave, attach it to the master using SR, and configure
recovery_time_delay to 1min.
2) Stop the slave
3) Run some transactions on the master using pgbench to generate a lot of
archives
4) Start the slave, connect to it using psql, and in another session I
can see all the archive recovery logs
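In shell terms the test was roughly the following (ports, paths and pgbench settings are illustrative, not the exact commands I used):

# 1) master already running on port 5432; build the slave from it
pg_basebackup -D slave -p 5432 -X stream
echo "hot_standby = on" >> slave/postgresql.conf
cat > slave/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'port=5432'
recovery_time_delay = '1min'
EOF
pg_ctl -D slave -o "-p 6543" start      # attach the slave to the master
# 2) stop the slave
pg_ctl -D slave stop
# 3) generate a lot of WAL on the master
pgbench -i -s 10 -p 5432 postgres
pgbench -c 8 -j 4 -T 30 -p 5432 postgres
# 4) start the slave again and connect to it while it replays
pg_ctl -D slave -o "-p 6543" start
psql -p 6543 -c "SELECT pg_is_in_recovery()"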
That is all for my first quick review of your patch.
Thanks.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Attachments:
time-delayed-standby-v2.patch (text/x-diff; charset=US-ASCII)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index c0c543e..641c9c6 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -135,6 +135,27 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
</listitem>
</varlistentry>
+ <varlistentry id="recovery-time-delay" xreflabel="recovery_time_delay">
+ <term><varname>recovery_time_delay</varname> (<type>integer</type>)</term>
+ <indexterm>
+ <primary><varname>recovery_time_delay</> recovery parameter</primary>
+ </indexterm>
+ <listitem>
+ <para>
+ Specifies the amount of time (in milliseconds, if no unit is specified)
+ which recovery of transaction commits should lag the master. This
+ parameter allows creation of a time-delayed standby. For example, if
+ you set this parameter to <literal>5min</literal>, the standby will
+ replay each transaction commit only when the system time on the standby
+ is at least five minutes past the commit time reported by the master.
+ </para>
+ <para>
+ Note that if the master and standby system clocks are not synchronized,
+ this might lead to unexpected results.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 5acfa57..97cc7af 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -123,6 +123,17 @@
#
#trigger_file = ''
#
+# recovery_time_delay
+#
+# By default, a standby server keeps restoring XLOG records from the
+# primary as soon as possible. If you want to delay the replay of
+# commited transactions from the master, specify a recovery time delay.
+# For example, if you set this parameter to 5min, the standby will replay
+# each transaction commit only whe the system time on the standby is least
+# five minutes past the commit time reported by the master.
+#
+#recovery_time_delay = 0
+#
#---------------------------------------------------------------------------
# HOT STANDBY PARAMETERS
#---------------------------------------------------------------------------
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index de19d22..714b1bd 100755
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -218,6 +218,8 @@ static bool recoveryPauseAtTarget = true;
static TransactionId recoveryTargetXid;
static TimestampTz recoveryTargetTime;
static char *recoveryTargetName;
+static int recovery_time_delay = 0;
+static TimestampTz recoveryDelayUntilTime;
/* options taken from recovery.conf for XLOG streaming */
static bool StandbyModeRequested = false;
@@ -730,6 +732,7 @@ static void readRecoveryCommandFile(void);
static void exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo);
static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
static void recoveryPausesHere(void);
+static void recoveryDelay(void);
static void SetLatestXTime(TimestampTz xtime);
static void SetCurrentChunkStartTime(TimestampTz xtime);
static void CheckRequiredParameterValues(void);
@@ -5474,6 +5477,19 @@ readRecoveryCommandFile(void)
(errmsg_internal("trigger_file = '%s'",
TriggerFile)));
}
+ else if (strcmp(item->name, "recovery_time_delay") == 0)
+ {
+ const char *hintmsg;
+
+ if (!parse_int(item->value, &recovery_time_delay, GUC_UNIT_MS,
+ &hintmsg))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"),
+ hintmsg ? errhint("%s", _(hintmsg)) : 0));
+ ereport(DEBUG2,
+ (errmsg("recovery_time_delay = '%s'", item->value)));
+ }
else
ereport(FATAL,
(errmsg("unrecognized recovery parameter \"%s\"",
@@ -5623,7 +5639,8 @@ exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo)
* We also track the timestamp of the latest applied COMMIT/ABORT
* record in XLogCtl->recoveryLastXTime, for logging purposes.
* Also, some information is saved in recoveryStopXid et al for use in
- * annotating the new timeline's history file.
+ * annotating the new timeline's history file; and recoveryDelayUntilTime
+ * is updated, for time-delayed standbys.
*/
static bool
recoveryStopsHere(XLogRecord *record, bool *includeThis)
@@ -5633,6 +5650,9 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
TimestampTz recordXtime;
char recordRPName[MAXFNAMELEN];
+ /* Clear any previous recovery delay time */
+ recoveryDelayUntilTime = 0;
+
/* We only consider stopping at COMMIT, ABORT or RESTORE POINT records */
if (record->xl_rmid != RM_XACT_ID && record->xl_rmid != RM_XLOG_ID)
return false;
@@ -5643,6 +5663,11 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
recordXactCommitData = (xl_xact_commit_compact *) XLogRecGetData(record);
recordXtime = recordXactCommitData->xact_time;
+
+ if (recovery_time_delay > 0)
+ recoveryDelayUntilTime =
+ TimestampTzPlusMilliseconds(recordXactCommitData->xact_time,
+ recovery_time_delay);
}
else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT)
{
@@ -5650,6 +5675,11 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
recordXactCommitData = (xl_xact_commit *) XLogRecGetData(record);
recordXtime = recordXactCommitData->xact_time;
+
+ if (recovery_time_delay > 0)
+ recoveryDelayUntilTime =
+ TimestampTzPlusMilliseconds(recordXactCommitData->xact_time,
+ recovery_time_delay);
}
else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_ABORT)
{
@@ -5832,6 +5862,48 @@ SetRecoveryPause(bool recoveryPause)
}
/*
+ * When recovery_time_delay is set, we wait long enough to make sure we are
+ * at least that far behind the master.
+ */
+static void
+recoveryDelay(void)
+{
+ /* main delay loop */
+ while (true)
+ {
+ long secs;
+ int microsecs;
+
+ ResetLatch(&XLogCtl->recoveryWakeupLatch);
+
+ /* might change the trigger file's location */
+ HandleStartupProcInterrupts();
+
+ if (CheckForStandbyTrigger())
+ break;
+
+ TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
+ &secs, &microsecs);
+
+ /*
+ * Wait for difference between GetCurrentTimestamp() and
+ * recoveryDelayUntilTime
+ */
+ if (secs > 0 || microsecs > 0)
+ {
+ elog(DEBUG2, "recovery delay %ld seconds, %d milliseconds",
+ secs, microsecs / 1000);
+
+ WaitLatch(&XLogCtl->recoveryWakeupLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ secs * 1000L + microsecs / 1000);
+ }
+ else
+ break;
+ }
+}
+
+/*
* Save timestamp of latest processed commit/abort record.
*
* We keep this in XLogCtl, not a simple static variable, so that it can be
@@ -6723,6 +6795,18 @@ StartupXLOG(void)
break;
}
+ /*
+ * If we've been asked to lag the master, pause until enough
+ * time has passed.
+ */
+ if (recovery_time_delay > 0)
+ {
+ recoveryDelay();
+
+ if (CheckForStandbyTrigger())
+ break;
+ }
+
/* Setup error traceback support for ereport() */
errcallback.callback = rm_redo_error_callback;
errcallback.arg = (void *) record;
Hi Fabrizio,
looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:
CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));
The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.
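As a side note, an easy way to watch the effect on each standby is to compare the local clock with the last replayed commit (pg_last_xact_replay_timestamp() is available on hot standbys):

SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

On the delayed standby this should hover around the configured five minutes while replay is ongoing; on the non-delayed one it should stay near zero.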
Greetings,
CK
--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com> wrote:
Hi Fabrizio,
looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:
CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));
The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.
Thanks for your review Christian...
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com> wrote:
Hi Fabrizio,
looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:
CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));
The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.
Thanks for your review Christian...
So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed. Unless some of those people have
changed their minds, I don't think this patch has much future here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/03/2013 10:46 AM, Robert Haas wrote:
On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com> wrote:
Hi Fabrizio,
looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:
CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));
The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.
Thanks for your review Christian...
So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed. Unless some of those people have
changed their minds, I don't think this patch has much future here.
I would agree that it is a good idea.
Joshua D. Drake
--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
a rose in the deeps of my heart. - W.B. Yeats
On 2013-12-03 13:46:28 -0500, Robert Haas wrote:
On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com> wrote:
Hi Fabrizio,
looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:
CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));
The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.
Thanks for your review Christian...
So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed.
I really fail to see why clock drift should be this patch's
responsibility. It's not like the world would go under^W data corruption
would ensue if the clocks drift. Your standby would get delayed
imprecisely. Big deal. From what I know of potential users of this
feature, they would set it to at the very least 30min - that's WAY above
the range for acceptable clock-drift on servers.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 18 October 2013 19:03, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
The attached patch is a continuation of Robert's work [1].
Reviewing v2...
I made some changes:
- use of Latches instead of pg_usleep, so we don't have to wake up regularly.
OK
- call HandleStartupProcInterrupts() before CheckForStandbyTrigger() because
it might change the trigger file's location
OK
- compute recoveryDelayUntilTime in XLOG_XACT_COMMIT and
XLOG_XACT_COMMIT_COMPACT checks
Why just those? Why not aborts and restore points also?
- don't care about clock drift because it's an admin problem.
A few minor points:
* The code with comment "Clear any previous recovery delay time" is in
wrong place, move down or remove completely. Setting the delay to zero
doesn't prevent calling recoveryDelay(), so that logic looks wrong
anyway.
* The loop exit in recoveryDelay() is inelegant, should break if <= 0
* There's a spelling mistake in sample
* The patch has whitespace in one place
and one important point...
* The delay loop happens AFTER we check for a pause. Which means if
the user notices a problem on a commit, then hits pause button on the
standby, the pause will have no effect and the next commit will be
applied anyway. Maybe just one commit, but it's an off-by-one error
that removes the benefit of the patch. So I think we need to test this
again after we finish delaying
if (xlogctl->recoveryPause)
recoveryPausesHere();
We need to explain in the docs that this is intended only for use in a
live streaming deployment. It will have little or no meaning in a
PITR.
I think recovery_time_delay should be called
<something>_apply_delay
to highlight the point that it is the apply of records that is
delayed, not the receipt. And hence the need to document that sync rep
is NOT slowed down by setting this value.
And to make the name consistent with other parameters, I suggest
min_standby_apply_delay
We also need to document caveats about the patch, in that it only
delays on timestamped WAL records and other records may be applied
sooner than the delay in some circumstances, so it is not a way to
avoid all cancellations.
We also need to document the behaviour of the patch is to apply all
data received as quickly as possible once triggered, so the specified
delay does not slow down promoting the server to a master. That might
also be seen as a negative behaviour, since promoting the standby
effectively sets recovery_time_delay to zero.
I will handle the additional documentation, if you can update the
patch with the main review comments. Thanks.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
(2013/11/30 5:34), Fabrízio de Royes Mello wrote:
On Fri, Nov 29, 2013 at 5:49 AM, KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> wrote:
* Problem1
Your patch does not update recovery.conf.sample for recovery_time_delay.
Please add it.
Fixed.
OK. It seems no problem.
* Problem2
When I set up a time-delayed standby and start the standby server, I cannot access
the standby server with psql. This is because PG is still in its initial recovery,
during which psql cannot connect. I think a time-delayed standby should only delay
the recovery position; it must not affect other functionality.
I didn't test recovery on the master server with recovery_time_delay. If you have
detailed test results for these cases, please send them to me.
Well, I could not reproduce the problem that you described.
I run the following test:
1) Clusters
- build master
- build slave, attach it to the master using SR, and configure recovery_time_delay
to 1min.
2) Stop the slave
3) Run some transactions on the master using pgbench to generate a lot of archives
4) Start the slave, connect to it using psql, and in another session I can see
all the archive recovery logs
Hmm... I had thought I had misread your email, but it reproduces again.
When I set a small recovery_time_delay (=30000), it seemed to work correctly. However,
when I set a long recovery_time_delay (=3000000), it didn't work.
My reproduction log is as follows.
[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 68704
latency average: 3.493 ms
tps = 2289.196747 (including connections establishing)
tps = 2290.175129 (excluding connections establishing)
[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was shut down in recovery at 2013-12-03 10:26:41 JST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/5C4D8668
LOG: redo starts at 0/5C4000D8
[mitsu-ko@localhost postgresql]$ FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
I attached my postgresql.conf and recovery.conf. It will be reproduced.
I think your patch needs recovery flags like ArchiveRecoveryRequested and
InArchiveRecovery etc., because a time-delayed standby only applies to a
replication situation. And I hope it doesn't hurt standby server startup and
archive recovery. Is that different from what you have in mind? I think this
patch has a lot of potential, but I think standby functionality is more
important than this feature, and we might need to discuss what the best
behavior is for this patch.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Attachments:
(2013/12/04 4:00), Andres Freund wrote:
On 2013-12-03 13:46:28 -0500, Robert Haas wrote:
On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com> wrote:
Hi Fabrizio,
looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:
CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));
The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.
Thanks for your review Christian...
So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed.
I really fail to see why clock drift should be this patch's
responsibility. It's not like the world would go under^W data corruption
would ensue if the clocks drift. Your standby would get delayed
imprecisely. Big deal. From what I know of potential users of this
feature, they would set it to at the very least 30min - that's WAY above
the range for acceptable clock-drift on servers.
Yes. I think the purpose of this patch is a long delay on the standby server,
not a carefully tuned short delay.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
On 2013-12-04 11:13:58 +0900, KONDO Mitsumasa wrote:
4) Start the slave, connect to it using psql, and in another session I can see
all the archive recovery logs
Hmm... I had thought I had misread your email, but it reproduces again.
When I set a small recovery_time_delay (=30000), it seemed to work correctly.
However, when I set a long recovery_time_delay (=3000000), it didn't work.
My reproduction log is as follows.
[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 68704
latency average: 3.493 ms
tps = 2289.196747 (including connections establishing)
tps = 2290.175129 (excluding connections establishing)
[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was shut down in recovery at 2013-12-03 10:26:41 JST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/5C4D8668
LOG: redo starts at 0/5C4000D8
[mitsu-ko@localhost postgresql]$ FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
I attached my postgresql.conf and recovery.conf. It will be reproduced.
So, you brought up a standby and it took more time to become consistent
because it waited on commits? That's the problem? If so, I don't think
that's a bug?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 04/12/13 11:13, KONDO Mitsumasa wrote:
1) Clusters
- build master
- build slave, attach it to the master using SR, and configure recovery_time_delay
to 1min.
2) Stop the slave
3) Run some transactions on the master using pgbench to generate a lot of archives
4) Start the slave, connect to it using psql, and in another session I can see
all the archive recovery logs
Hmm... I had thought I had misread your email, but it reproduces again.
When I set a small recovery_time_delay (=30000), it seemed to work correctly.
However, when I set a long recovery_time_delay (=3000000), it didn't work.
[…]
I'm not sure if I understand your problem correctly. I'll try to
summarize, please correct me if I'm wrong:
You created a master node and a hot standby with 3000000 delay. Then
you stopped the standby, did the pgbench run and started the hot standby
again. It did not catch up with the master. Is this correct?
I don't see a problem here… the standby should not be in sync with the
master, it should be delayed. I did step by step what you did and
after 50 minutes (3000000ms) the standby was at the same level the
master was.
Did I misunderstand you?
Regards,
Christian Kruse
--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2013/12/4 Andres Freund <andres@2ndquadrant.com>
On 2013-12-04 11:13:58 +0900, KONDO Mitsumasa wrote:
4) Start the slave and connect to it using psql and in another session
I can see
all archive recovery log
Hmm... I had thought I had misread your email, but it reproduces
again.
When I set a small recovery_time_delay (=30000), it seemed to work correctly.
However, when I set a long recovery_time_delay (=3000000), it didn't work.
My reproduction log is as follows.
[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 68704
latency average: 3.493 ms
tps = 2289.196747 (including connections establishing)
tps = 2290.175129 (excluding connections establishing)
[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was shut down in recovery at 2013-12-03 10:26:41 JST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/5C4D8668
LOG: redo starts at 0/5C4000D8
[mitsu-ko@localhost postgresql]$ FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
I attached my postgresql.conf and recovery.conf. It will be reproduced.
So, you brought up a standby and it took more time to become consistent
because it waited on commits? That's the problem? If so, I don't think
that's a bug?
When it happened, psql could not connect to the standby server at all. I think
this behavior is not good.
It should only delay the recovery position, and the old, delayed table data
should still be visible. Not being able to connect to the server is not the
hoped-for behavior.
If you think this behavior is best, I will mark it ready for committer, and
the committer can improve it.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
On 2013-12-04 22:47:47 +0900, Mitsumasa KONDO wrote:
2013/12/4 Andres Freund <andres@2ndquadrant.com>
When it happened, psql could not connect to the standby server at all. I think
this behavior is not good.
It should only delay the recovery position, and the old, delayed table data
should still be visible.
That doesn't sound like a good plan - even if the clients cannot connect
yet, you can still promote the server. Just not taking delay into
consideration at that point seems like it would possibly surprise users
rather badly in situations they really cannot use such surprises.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 2013-12-03 19:33:16 +0000, Simon Riggs wrote:
- compute recoveryDelayUntilTime in XLOG_XACT_COMMIT and
XLOG_XACT_COMMIT_COMPACT checks
Why just those? Why not aborts and restore points also?
What would the advantage of waiting on anything but commits be? If it's
not a commit, the action won't change the state of the database (yesyes,
there are exceptions, but those don't have a timestamp)...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2013/12/4 Christian Kruse <christian@2ndquadrant.com>
You created a master node and a hot standby with 3000000 delay. Then
you stopped the standby, did the pgbench run and started the hot standby
again. It did not catch up with the master. Is this correct?
No. First, I start the master and execute pgbench. Second, I start the standby
with a 3000000ms (50min) delay.
Then psql cannot connect to the standby server at all. I'm not sure why the
standby did not finish starting up.
It might be because the delay feature gets in the way of the REDO loop during
the standby's first start-up.
I don't see a problem here… the standby should not be in sync with the
master, it should be delayed. I did step by step what you did and
after 50 minutes (3000000ms) the standby was at the same level the
master was.
I think we should be able to connect to the standby server at any time, even
with the delay option.
Did I misunderstand you?
I'm not sure... You might be right, or there might be a better way.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
2013/12/4 Andres Freund <andres@2ndquadrant.com>
On 2013-12-04 22:47:47 +0900, Mitsumasa KONDO wrote:
2013/12/4 Andres Freund <andres@2ndquadrant.com>
When it happened, psql could not connect to the standby server at all. I think
this behavior is not good.
It should only delay the recovery position, and the old, delayed table data
should still be visible.
That doesn't sound like a good plan - even if the clients cannot connect
yet, you can still promote the server.
I'm not sure I follow your argument, but doesn't that lose sight of the purpose of this patch?
Just not taking delay into
consideration at that point seems like it would possibly surprise users
rather badly in situations they really cannot use such surprises.
Hmm... I think users would be surprised...
I think it is easy to fix the behavior using a recovery flag.
So we had better wait for other comments.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Robert Haas <robertmhaas@gmail.com> wrote:
So, I proposed this patch previously and I still think it's a
good idea, but it got voted down on the grounds that it didn't
deal with clock drift. I view that as insufficient reason to
reject the feature, but others disagreed. Unless some of those
people have changed their minds, I don't think this patch has
much future here.
There are many things that a system admin can get wrong. Failing
to supply this feature because the sysadmin might not be running
ntpd (or equivalent) correctly seems to me to be like not having
the software do fsync because the sysadmin might not have turned
off write-back buffering on drives without persistent storage.
Either way, poor system management can defeat the feature. Either
way, I see no reason to withhold the feature from those who manage
their systems in a sane fashion.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 04/12/13 07:22, Kevin Grittner wrote:
There are many things that a system admin can get wrong. Failing
to supply this feature because the sysadmin might not be running
ntpd (or equivalent) correctly seems to me to be like not having
the software do fsync because the sysadmin might not have turned
off write-back buffering on drives without persistent storage.
Either way, poor system management can defeat the feature. Either
way, I see no reason to withhold the feature from those who manage
their systems in a sane fashion.
I agree. But maybe we should add a warning in the documentation about
time syncing?
Greetings,
CK
--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
src/backend/access/transam/xlog.c:5889: trailing whitespace.
On 3 December 2013 18:46, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com> wrote:
Hi Fabrizio,
looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:
CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));
The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.
Thanks for your review Christian...
So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed. Unless some of those people have
changed their minds, I don't think this patch has much future here.
I had that objection and others. Since then many people have requested
this feature and have persuaded me that this is worth having and that
my objections are minor points. I now agree with the need for the
feature, almost as written.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 5, 2013 at 1:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 3 December 2013 18:46, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse <christian@2ndquadrant.com> wrote:
Hi Fabrizio,
looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It
applies and compiles w/o errors or warnings. I set up a master and two
hot standbys replicating from the master, one with 5 minutes delay and
one without delay. After that I created a new database and generated
some test data:
CREATE TABLE test (val INTEGER);
INSERT INTO test (val) (SELECT * FROM generate_series(0, 1000000));
The non-delayed standby nearly instantly had the data replicated, the
delayed standby was replicated after exactly 5 minutes. I did not
notice any problems, errors or warnings.
Thanks for your review Christian...
So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed. Unless some of those people have
changed their minds, I don't think this patch has much future here.
I had that objection and others. Since then many people have requested
this feature and have persuaded me that this is worth having and that
my objections are minor points. I now agree with the need for the
feature, almost as written.
Not recalling the older thread, but as for the "breaks on clock drift" issue, I
think we can fairly easily make that situation "good enough". Just have
IDENTIFY_SYSTEM return the current timestamp on the master, and refuse to
start if the time difference is too great. Yes, that doesn't catch the case
when the machines are in perfect sync when they start up and drift *later*,
but it will catch the most common cases I bet. But I think that's good
enough that we can accept the feature, given that *most* people will have
ntp, and that it's a very useful feature for those people. But we could
help people who run into it because of a simple config error..
Or maybe the suggested patch already does this, in which case ignore that
part :)
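Until something like that exists, the skew is at least easy to eyeball by hand, e.g. from the standby host (just an admin-side check, not something the patch itself does):

# compare the master's clock with the standby's local clock
psql -h master.example.com -p 5432 -Atc "SELECT now()"
date +"%Y-%m-%d %H:%M:%S%z"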
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 5 December 2013 08:51, Magnus Hagander <magnus@hagander.net> wrote:
Not recalling the older thread, but as for the "breaks on clock drift" issue, I
think we can fairly easily make that situation "good enough". Just have
IDENTIFY_SYSTEM return the current timestamp on the master, and refuse to
start if the time difference is too great. Yes, that doesn't catch the case
when the machines are in perfect sync when they start up and drift *later*,
but it will catch the most common cases I bet. But I think that's good
enough that we can accept the feature, given that *most* people will have
ntp, and that it's a very useful feature for those people. But we could help
people who run into it because of a simple config error..
Or maybe the suggested patch already does this, in which case ignore that
part :)
I think the very nature of *this* feature is that it doesn't *require*
the clocks to be exactly in sync, even though that is the basis for
measurement.
The setting of this parameter for sane usage would be minimum 5 mins,
but more likely 30 mins, 1 hour or more.
In that case, a few seconds drift either way makes no real difference
to this feature.
So IMHO, without prejudice to other features that may be more
critically reliant on time synchronisation, we are OK to proceed with
this specific feature.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 5, 2013 at 7:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 5 December 2013 08:51, Magnus Hagander <magnus@hagander.net> wrote:
Not recalling the older thread, but as for the "breaks on clock
drift" issue, I
think we can fairly easily make that situation "good enough". Just have
IDENTIFY_SYSTEM return the current timestamp on the master, and refuse to
start if the time difference is too great. Yes, that doesn't catch the case
when the machines are in perfect sync when they start up and drift
*later*,
but it will catch the most common cases I bet. But I think that's good
enough that we can accept the feature, given that *most* people will have
ntp, and that it's a very useful feature for those people. But we could help
people who run into it because of a simple config error..
Or maybe the suggested patch already does this, in which case ignore that
part :)
I think the very nature of *this* feature is that it doesn't *require*
the clocks to be exactly in sync, even though that is the basis for
measurement.
The setting of this parameter for sane usage would be minimum 5 mins,
but more likely 30 mins, 1 hour or more.
In that case, a few seconds drift either way makes no real difference
to this feature.
So IMHO, without prejudice to other features that may be more
critically reliant on time synchronisation, we are OK to proceed with
this specific feature.
Hi all,
I saw the comments from all of you. I'm a bit busy with some customer issues
(it has been a crazy week), but I'll reply and/or apply your suggestions later.
Thanks for all the reviews, and sorry for the delay in replying.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
On Tue, Dec 3, 2013 at 5:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
- compute recoveryDelayUntilTime in XLOG_XACT_COMMIT and
XLOG_XACT_COMMIT_COMPACT checks
Why just those? Why not aborts and restore points also?
I think it makes no sense to execute the delay after aborts and/or restore points,
because they do not change data on the standby server.
- don't care about clock drift because it's an admin problem.
Few minor points on things
* The code with comment "Clear any previous recovery delay time" is in
wrong place, move down or remove completely. Setting the delay to zero
doesn't prevent calling recoveryDelay(), so that logic looks wrong
anyway.
Fixed.
* The loop exit in recoveryDelay() is inelegant, should break if <= 0
Fixed.
* There's a spelling mistake in sample
Fixed.
* The patch has whitespace in one place
Fixed.
and one important point...
* The delay loop happens AFTER we check for a pause. Which means if
the user notices a problem on a commit, then hits pause button on the
standby, the pause will have no effect and the next commit will be
applied anyway. Maybe just one commit, but its an off by one error
that removes the benefit of the patch. So I think we need to test this
again after we finish delaying
if (xlogctl->recoveryPause)
recoveryPausesHere();
Fixed.
We need to explain in the docs that this is intended only for use in a
live streaming deployment. It will have little or no meaning in a
PITR.
Fixed.
I think recovery_time_delay should be called
<something>_apply_delay
to highlight the point that it is the apply of records that is
delayed, not the receipt. And hence the need to document that sync rep
is NOT slowed down by setting this value.
Fixed.
And to make the name consistent with other parameters, I suggest
min_standby_apply_delay
I agree. Fixed!
We also need to document caveats about the patch, in that it only
delays on timestamped WAL records and other records may be applied
sooner than the delay in some circumstances, so it is not a way to
avoid all cancellations.
We also need to document the behaviour of the patch is to apply all
data received as quickly as possible once triggered, so the specified
delay does not slow down promoting the server to a master. That might
also be seen as a negative behaviour, since promoting the standby
effectively sets recovery_time_delay to zero.
I will handle the additional
patch with the main review comments. Thanks.
Thanks, your help is welcome.
Att,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Attachments:
time-delayed-standby-v3.patch (text/x-diff; charset=US-ASCII)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 9d80256..12aa917 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -142,6 +142,31 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
</listitem>
</varlistentry>
+ <varlistentry id="min-standby-apply-delay" xreflabel="min_standby_apply_delay">
+ <term><varname>min_standby_apply_delay</varname> (<type>integer</type>)</term>
+ <indexterm>
+ <primary><varname>min_standby_apply_delay</> recovery parameter</primary>
+ </indexterm>
+ <listitem>
+ <para>
+ Specifies the amount of time (in milliseconds, if no unit is specified)
+ which recovery of transaction commits should lag the master. This
+ parameter allows creation of a time-delayed standby. For example, if
+ you set this parameter to <literal>5min</literal>, the standby will
+ replay each transaction commit only when the system time on the standby
+ is at least five minutes past the commit time reported by the master.
+ </para>
+ <para>
+ Note that if the master and standby system clocks are not synchronized,
+ this might lead to unexpected results.
+ </para>
+ <para>
+ This parameter works only for streaming replication deployments. Synchronous
+ replicas and PITR has not affected.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 5acfa57..e8617db 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -123,6 +123,17 @@
#
#trigger_file = ''
#
+# min_standby_apply_delay
+#
+# By default, a standby server keeps restoring XLOG records from the
+# primary as soon as possible. If you want to delay the replay of
+# commited transactions from the master, specify a recovery time delay.
+# For example, if you set this parameter to 5min, the standby will replay
+# each transaction commit only when the system time on the standby is least
+# five minutes past the commit time reported by the master.
+#
+#min_standby_apply_delay = 0
+#
#---------------------------------------------------------------------------
# HOT STANDBY PARAMETERS
#---------------------------------------------------------------------------
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b68230d..9d43a72 100755
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -218,6 +218,8 @@ static bool recoveryPauseAtTarget = true;
static TransactionId recoveryTargetXid;
static TimestampTz recoveryTargetTime;
static char *recoveryTargetName;
+static int min_standby_apply_delay = 0;
+static TimestampTz recoveryDelayUntilTime;
/* options taken from recovery.conf for XLOG streaming */
static bool StandbyModeRequested = false;
@@ -730,6 +732,7 @@ static void readRecoveryCommandFile(void);
static void exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo);
static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
static void recoveryPausesHere(void);
+static void recoveryDelay(void);
static void SetLatestXTime(TimestampTz xtime);
static void SetCurrentChunkStartTime(TimestampTz xtime);
static void CheckRequiredParameterValues(void);
@@ -5474,6 +5477,19 @@ readRecoveryCommandFile(void)
(errmsg_internal("trigger_file = '%s'",
TriggerFile)));
}
+ else if (strcmp(item->name, "min_standby_apply_delay") == 0)
+ {
+ const char *hintmsg;
+
+ if (!parse_int(item->value, &min_standby_apply_delay, GUC_UNIT_MS,
+ &hintmsg))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("parameter \"%s\" requires a temporal value", "min_standby_apply_delay"),
+ hintmsg ? errhint("%s", _(hintmsg)) : 0));
+ ereport(DEBUG2,
+ (errmsg("min_standby_apply_delay = '%s'", item->value)));
+ }
else
ereport(FATAL,
(errmsg("unrecognized recovery parameter \"%s\"",
@@ -5623,7 +5639,8 @@ exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo)
* We also track the timestamp of the latest applied COMMIT/ABORT
* record in XLogCtl->recoveryLastXTime, for logging purposes.
* Also, some information is saved in recoveryStopXid et al for use in
- * annotating the new timeline's history file.
+ * annotating the new timeline's history file; and recoveryDelayUntilTime
+ * is updated, for time-delayed standbys.
*/
static bool
recoveryStopsHere(XLogRecord *record, bool *includeThis)
@@ -5643,6 +5660,11 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
recordXactCommitData = (xl_xact_commit_compact *) XLogRecGetData(record);
recordXtime = recordXactCommitData->xact_time;
+
+ if (min_standby_apply_delay > 0)
+ recoveryDelayUntilTime =
+ TimestampTzPlusMilliseconds(recordXactCommitData->xact_time,
+ min_standby_apply_delay);
}
else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT)
{
@@ -5650,6 +5672,11 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
recordXactCommitData = (xl_xact_commit *) XLogRecGetData(record);
recordXtime = recordXactCommitData->xact_time;
+
+ if (min_standby_apply_delay > 0)
+ recoveryDelayUntilTime =
+ TimestampTzPlusMilliseconds(recordXactCommitData->xact_time,
+ min_standby_apply_delay);
}
else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_ABORT)
{
@@ -5832,6 +5859,46 @@ SetRecoveryPause(bool recoveryPause)
}
/*
+ * When min_standby_apply_delay is set, we wait long enough to make sure we are
+ * at least that far behind the master.
+ */
+static void
+recoveryDelay(void)
+{
+ /* main delay loop */
+ while (true)
+ {
+ long secs;
+ int microsecs;
+
+ ResetLatch(&XLogCtl->recoveryWakeupLatch);
+
+ /* might change the trigger file's location */
+ HandleStartupProcInterrupts();
+
+ if (CheckForStandbyTrigger())
+ break;
+
+ /*
+ * Wait for difference between GetCurrentTimestamp() and
+ * recoveryDelayUntilTime
+ */
+ TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
+ &secs, &microsecs);
+
+ if (secs <= 0 && microsecs <=0)
+ break;
+
+ elog(DEBUG2, "recovery delay %ld seconds, %d milliseconds",
+ secs, microsecs / 1000);
+
+ WaitLatch(&XLogCtl->recoveryWakeupLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ secs * 1000L + microsecs / 1000);
+ }
+}
+
+/*
* Save timestamp of latest processed commit/abort record.
*
* We keep this in XLogCtl, not a simple static variable, so that it can be
@@ -6732,6 +6799,21 @@ StartupXLOG(void)
break;
}
+ /*
+ * If we've been asked to lag the master, pause until enough
+ * time has passed.
+ */
+ if (min_standby_apply_delay > 0)
+ {
+ recoveryDelay();
+
+ if (xlogctl->recoveryPause)
+ recoveryPausesHere();
+
+ if (CheckForStandbyTrigger())
+ break;
+ }
+
/* Setup error traceback support for ereport() */
errcallback.callback = rm_redo_error_callback;
errcallback.arg = (void *) record;
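To try the patch, the parameter goes into recovery.conf on the standby. A minimal sketch, assuming the min_standby_apply_delay name from the sample file above; the connection string and delay value are only placeholders:
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'
# replay each commit only once the standby clock is at least five minutes
# past the commit time reported by the master
min_standby_apply_delay = '5min'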
On Thu, Dec 5, 2013 at 11:07 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Tue, Dec 3, 2013 at 5:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
- compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
XLOG_XACT_COMMIT_COMPACT checks
Why just those? Why not aborts and restore points also?
I think it makes no sense to execute the delay after aborts and/or restore points,
because they do not change data on a standby server.
I see no reason to pause for aborts. Aside from the fact that it
wouldn't be reliable in corner cases, as Fabrízio says, there's no
user-visible effect, just as there's no user-visible effect from
replaying a transaction up until just prior to the point where it
commits (which we also do).
Waiting for restore points seems like it potentially makes sense. If
the standby is delayed by an hour, and you create a restore point and
wait 55 minutes, you might expect that you can still kill the
standby and recover it to that restore point.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Dec 6, 2013 at 1:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 5, 2013 at 11:07 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Tue, Dec 3, 2013 at 5:33 PM, Simon Riggs <simon@2ndquadrant.com>
wrote:
- compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
XLOG_XACT_COMMIT_COMPACT checks
Why just those? Why not aborts and restore points also?
I think it makes no sense to execute the delay after aborts and/or restore points,
because they do not change data on a standby server.
I see no reason to pause for aborts. Aside from the fact that it
wouldn't be reliable in corner cases, as Fabrízio says, there's no
user-visible effect, just as there's no user-visible effect from
replaying a transaction up until just prior to the point where it
commits (which we also do).
Waiting for restore points seems like it potentially makes sense. If
the standby is delayed by an hour, and you create a restore point and
wait 55 minutes, you might expect that you can still kill the
standby and recover it to that restore point.
Makes sense. Fixed.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Attachments:
time-delayed-standby-v4.patchtext/x-diff; charset=US-ASCII; name=time-delayed-standby-v4.patchDownload
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 9d80256..12aa917 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -142,6 +142,31 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
</listitem>
</varlistentry>
+ <varlistentry id="min-standby-apply-delay" xreflabel="min_standby_apply_delay">
+ <term><varname>min_standby_apply_delay</varname> (<type>integer</type>)</term>
+ <indexterm>
+ <primary><varname>min_standby_apply_delay</> recovery parameter</primary>
+ </indexterm>
+ <listitem>
+ <para>
+ Specifies the amount of time (in milliseconds, if no unit is specified)
+ which recovery of transaction commits should lag the master. This
+ parameter allows creation of a time-delayed standby. For example, if
+ you set this parameter to <literal>5min</literal>, the standby will
+ replay each transaction commit only when the system time on the standby
+ is at least five minutes past the commit time reported by the master.
+ </para>
+ <para>
+ Note that if the master and standby system clocks are not synchronized,
+ this might lead to unexpected results.
+ </para>
+ <para>
+ This parameter works only for streaming replication deployments. Synchronous
+ replicas and PITR has not affected.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 5acfa57..e8617db 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -123,6 +123,17 @@
#
#trigger_file = ''
#
+# min_standby_apply_delay
+#
+# By default, a standby server keeps restoring XLOG records from the
+# primary as soon as possible. If you want to delay the replay of
+# commited transactions from the master, specify a recovery time delay.
+# For example, if you set this parameter to 5min, the standby will replay
+# each transaction commit only when the system time on the standby is least
+# five minutes past the commit time reported by the master.
+#
+#min_standby_apply_delay = 0
+#
#---------------------------------------------------------------------------
# HOT STANDBY PARAMETERS
#---------------------------------------------------------------------------
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b68230d..7ca2f9b 100755
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -218,6 +218,8 @@ static bool recoveryPauseAtTarget = true;
static TransactionId recoveryTargetXid;
static TimestampTz recoveryTargetTime;
static char *recoveryTargetName;
+static int min_standby_apply_delay = 0;
+static TimestampTz recoveryDelayUntilTime;
/* options taken from recovery.conf for XLOG streaming */
static bool StandbyModeRequested = false;
@@ -730,6 +732,7 @@ static void readRecoveryCommandFile(void);
static void exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo);
static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
static void recoveryPausesHere(void);
+static void recoveryDelay(void);
static void SetLatestXTime(TimestampTz xtime);
static void SetCurrentChunkStartTime(TimestampTz xtime);
static void CheckRequiredParameterValues(void);
@@ -5474,6 +5477,19 @@ readRecoveryCommandFile(void)
(errmsg_internal("trigger_file = '%s'",
TriggerFile)));
}
+ else if (strcmp(item->name, "min_standby_apply_delay") == 0)
+ {
+ const char *hintmsg;
+
+ if (!parse_int(item->value, &min_standby_apply_delay, GUC_UNIT_MS,
+ &hintmsg))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("parameter \"%s\" requires a temporal value", "min_standby_apply_delay"),
+ hintmsg ? errhint("%s", _(hintmsg)) : 0));
+ ereport(DEBUG2,
+ (errmsg("min_standby_apply_delay = '%s'", item->value)));
+ }
else
ereport(FATAL,
(errmsg("unrecognized recovery parameter \"%s\"",
@@ -5623,7 +5639,8 @@ exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo)
* We also track the timestamp of the latest applied COMMIT/ABORT
* record in XLogCtl->recoveryLastXTime, for logging purposes.
* Also, some information is saved in recoveryStopXid et al for use in
- * annotating the new timeline's history file.
+ * annotating the new timeline's history file; and recoveryDelayUntilTime
+ * is updated, for time-delayed standbys.
*/
static bool
recoveryStopsHere(XLogRecord *record, bool *includeThis)
@@ -5643,6 +5660,11 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
recordXactCommitData = (xl_xact_commit_compact *) XLogRecGetData(record);
recordXtime = recordXactCommitData->xact_time;
+
+ if (min_standby_apply_delay > 0)
+ recoveryDelayUntilTime =
+ TimestampTzPlusMilliseconds(recordXactCommitData->xact_time,
+ min_standby_apply_delay);
}
else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT)
{
@@ -5650,6 +5672,11 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
recordXactCommitData = (xl_xact_commit *) XLogRecGetData(record);
recordXtime = recordXactCommitData->xact_time;
+
+ if (min_standby_apply_delay > 0)
+ recoveryDelayUntilTime =
+ TimestampTzPlusMilliseconds(recordXactCommitData->xact_time,
+ min_standby_apply_delay);
}
else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_ABORT)
{
@@ -5665,6 +5692,11 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
recordRestorePointData = (xl_restore_point *) XLogRecGetData(record);
recordXtime = recordRestorePointData->rp_time;
strncpy(recordRPName, recordRestorePointData->rp_name, MAXFNAMELEN);
+
+ if (min_standby_apply_delay > 0)
+ recoveryDelayUntilTime =
+ TimestampTzPlusMilliseconds(recordRestorePointData->rp_time,
+ min_standby_apply_delay);
}
else
return false;
@@ -5832,6 +5864,46 @@ SetRecoveryPause(bool recoveryPause)
}
/*
+ * When min_standby_apply_delay is set, we wait long enough to make sure we are
+ * at least that far behind the master.
+ */
+static void
+recoveryDelay(void)
+{
+ /* main delay loop */
+ while (true)
+ {
+ long secs;
+ int microsecs;
+
+ ResetLatch(&XLogCtl->recoveryWakeupLatch);
+
+ /* might change the trigger file's location */
+ HandleStartupProcInterrupts();
+
+ if (CheckForStandbyTrigger())
+ break;
+
+ /*
+ * Wait for difference between GetCurrentTimestamp() and
+ * recoveryDelayUntilTime
+ */
+ TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
+ &secs, &microsecs);
+
+ if (secs <= 0 && microsecs <=0)
+ break;
+
+ elog(DEBUG2, "recovery delay %ld seconds, %d milliseconds",
+ secs, microsecs / 1000);
+
+ WaitLatch(&XLogCtl->recoveryWakeupLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ secs * 1000L + microsecs / 1000);
+ }
+}
+
+/*
* Save timestamp of latest processed commit/abort record.
*
* We keep this in XLogCtl, not a simple static variable, so that it can be
@@ -6732,6 +6804,21 @@ StartupXLOG(void)
break;
}
+ /*
+ * If we've been asked to lag the master, pause until enough
+ * time has passed.
+ */
+ if (min_standby_apply_delay > 0)
+ {
+ recoveryDelay();
+
+ if (xlogctl->recoveryPause)
+ recoveryPausesHere();
+
+ if (CheckForStandbyTrigger())
+ break;
+ }
+
/* Setup error traceback support for ereport() */
errcallback.callback = rm_redo_error_callback;
errcallback.arg = (void *) record;
2013/12/9 KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>
Hi Fabrízio,
I test your v4 patch, and send your review comments.
* Fix typo
49 -# commited transactions from the master, specify a recovery time
delay.
49 +# committed transactions from the master, specify a recovery time
delay.
* Fix white space
177 - if (secs <= 0 && microsecs <=0)
177 + if (secs <= 0 && microsecs <=0 )
* Add functionality (I propose)
We can set negative number at min_standby_apply_delay. I think that this
feature
is for world wide replication situation. For example, master server is in
Japan and the slave server is in San Francisco. Japan time is ahead of San
Francisco time
. And if we want to delay in this situation, it can need negative number
in min_standby_apply_delay. So I propose that time delay conditional branch
change under following.
- if (min_standby_apply_delay > 0) + if (min_standby_apply_delay != 0)
What do you think? It might also work correctly.
what using interval instead absolute time?
Regards
Pavel
* Problem 1
I read your written document. There is "PITR has not affected".
However, when I run PITR with min_standby_apply_delay=3000000, it cannot
start server. The log is under following.
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D data2 start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was interrupted;
last known up at 2013-12-08 18:57:00 JST
LOG: creating missing WAL directory "pg_xlog/archive_status"
cp: cannot stat `../arc/00000002.history':
LOG: starting archive recovery
LOG: restored log file "000000010000000000000041" from archive
LOG: redo starts at 0/41000028
LOG: consistent recovery state reached at 0/410000F0
LOG: database system is ready to accept read only connections
LOG: restored log file "000000010000000000000042" from archive
FATAL: cannot wait on a latch owned by another process
LOG: startup process (PID 30501) exited with exit code 1
LOG: terminating any other active server processes
We need a recovery flag for controlling the PITR situation.
That's all for now.
If you are busy, please fix in your pace. I'm busy and I'd like to wait
your time, too :-)
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Hi Fabrízio,
I test your v4 patch, and send your review comments.
* Fix typo
49 -# commited transactions from the master, specify a recovery time delay.
49 +# committed transactions from the master, specify a recovery time delay.
* Fix white space
177 - if (secs <= 0 && microsecs <=0)
177 + if (secs <= 0 && microsecs <=0 )
* Add functionality (I propose)
We can set negative number at min_standby_apply_delay. I think that this feature
is for world wide replication situation. For example, master server is in Japan
and the slave server is in San Francisco. Japan time is ahead of San Francisco time
. And if we want to delay in this situation, it can need negative number in
min_standby_apply_delay. So I propose that time delay conditional branch change
under following.
- if (min_standby_apply_delay > 0) + if (min_standby_apply_delay != 0)
What do you think? It might also work correctly.
* Problem 1
I read your written document. There is "PITR has not affected".
However, when I run PITR with min_standby_apply_delay=3000000, it cannot start
server. The log is under following.
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D data2 start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was interrupted; last known up at 2013-12-08 18:57:00 JST
LOG: creating missing WAL directory "pg_xlog/archive_status"
cp: cannot stat `../arc/00000002.history':
LOG: starting archive recovery
LOG: restored log file "000000010000000000000041" from archive
LOG: redo starts at 0/41000028
LOG: consistent recovery state reached at 0/410000F0
LOG: database system is ready to accept read only connections
LOG: restored log file "000000010000000000000042" from archive
FATAL: cannot wait on a latch owned by another process
LOG: startup process (PID 30501) exited with exit code 1
LOG: terminating any other active server processes
We need a recovery flag for controlling the PITR situation.
That's all for now.
If you are busy, please fix in your pace. I'm busy and I'd like to wait your
time, too:-)
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
(2013/12/09 19:36), KONDO Mitsumasa wrote:
* Problem 1
I read your written document. There is "PITR has not affected".
However, when I run PITR with min_standby_apply_delay=3000000, it cannot start
server. The log is under following.
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D data2 start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was interrupted; last
known up at 2013-12-08 18:57:00 JST
LOG: creating missing WAL directory "pg_xlog/archive_status"
cp: cannot stat `../arc/00000002.history':
LOG: starting archive recovery
LOG: restored log file "000000010000000000000041" from archive
LOG: redo starts at 0/41000028
LOG: consistent recovery state reached at 0/410000F0
LOG: database system is ready to accept read only connections
LOG: restored log file "000000010000000000000042" from archive
FATAL: cannot wait on a latch owned by another process
LOG: startup process (PID 30501) exited with exit code 1
LOG: terminating any other active server processes
We need a recovery flag for controlling the PITR situation.
Add my comment. We have to consider three situations.
1. PITR
2. replication standby
3. replication standby with restore_command
I think this patch cannot delay in 1 situation. So I think you should add only
StandbyModeRequested flag in conditional branch.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
(2013/12/09 19:35), Pavel Stehule wrote:
2013/12/9 KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp
<mailto:kondo.mitsumasa@lab.ntt.co.jp>>
Hi Fabrízio,
I test your v4 patch, and send your review comments.
* Fix typo
49 -# commited transactions from the master, specify a recovery time delay.
49 +# committed transactions from the master, specify a recovery time delay.
* Fix white space
177 - if (secs <= 0 && microsecs <=0)
177 + if (secs <= 0 && microsecs <=0 )
* Add functionality (I propose)
We can set negative number at min_standby_apply_delay. I think that this feature
is for world wide replication situation. For example, master server is in
Japan and the slave server is in San Francisco. Japan time is ahead of San
Francisco time
. And if we want to delay in this situation, it can need negative number in
min_standby_apply_delay. So I propose that time delay conditional branch
change under following.
- if (min_standby_apply_delay > 0) + if (min_standby_apply_delay != 0)
What do you think? It might also work correctly.
what using interval instead absolute time?
This is because local time is recorded in XLOG. And it has big cost for
calculating global time.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
On 2013-12-09 19:51:01 +0900, KONDO Mitsumasa wrote:
Add my comment. We have to consider three situations.
1. PITR
2. replication standby
3. replication standby with restore_command
I think this patch cannot delay in situation 1.
Why?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/04/2013 02:46 AM, Robert Haas wrote:
Thanks for your review Christian...
So, I proposed this patch previously and I still think it's a good
idea, but it got voted down on the grounds that it didn't deal with
clock drift. I view that as insufficient reason to reject the
feature, but others disagreed. Unless some of those people have
changed their minds, I don't think this patch has much future here.
Surely that's the operating system / VM host / sysadmin / whatever's
problem?
The only way to "deal with" clock drift that isn't fragile in the face
of variable latency, etc, is to basically re-implement (S)NTP in order
to find out what the clock difference with the remote is.
If we're going to do that, why not just let the OS deal with it?
It might well be worth complaining about obvious aberrations like
timestamps in the local future - preferably by complaining and not
actually dying. It does need to be able to cope with a *skewing* clock,
but I'd be surprised if it had any issues there in the first place.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9 Dec 2013 12:16, "Craig Ringer" <craig@2ndquadrant.com> wrote:
The only way to "deal with" clock drift that isn't fragile in the face
of variable latency, etc, is to basically re-implement (S)NTP in order
to find out what the clock difference with the remote is.
There's actually an entirely different way to "deal" with clock drift: test
"master time" and "slave time" as two different incomparable spaces.
Similar to how you would treat measurements in different units.
If you do that then you can measure and manage the delay in the slave
between receiving and applying a record and also measure the amount of
master server time which can be pending. These measurements don't depend at
all on time sync between servers.
The specified feature depends explicitly on the conversion between master
and slave time spaces so it's inevitable that sync would be an issue. It
might be nice to print a warning on connection if the time is far out of
sync or periodically check. But I don't think reimplementing NTP is a good
idea.
(2013/12/09 20:29), Andres Freund wrote:
On 2013-12-09 19:51:01 +0900, KONDO Mitsumasa wrote:
Add my comment. We have to consider three situations.
1. PITR
2. replication standby
3. replication standby with restore_command
I think this patch cannot delay in situation 1.
Why?
I have three reasons.
1. It is written in document. Can we remove it?
2. Name of this feature is "Time-delayed *standbys*", not "Time-delayed
*recovery*". Can we change it?
3. I think it is unnecessary in master PITR. And if it can delay in master
PITR, it will become master at unexpected timing, not to continue to
recovery. It is meaningless.
I'd like to ask you what do you expect from this feature and how to use it.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
On 2013-12-10 13:26:27 +0900, KONDO Mitsumasa wrote:
(2013/12/09 20:29), Andres Freund wrote:
On 2013-12-09 19:51:01 +0900, KONDO Mitsumasa wrote:
Add my comment. We have to consider three situations.
1. PITR
2. replication standby
3. replication standby with restore_command
I think this patch cannot delay in situation 1.
Why?
I have three reasons.
None of these reasons seem to be of technical nature, right?
1. It is written in document. Can we remove it?
2. Name of this feature is "Time-delayed *standbys*", not "Time-delayed
*recovery*". Can we change it?
I don't think that'd be a win in clarity. But perhaps somebody else has
a better suggestion?
3. I think it is unnessesary in master PITR. And if it can delay in master
PITR, it will become master at unexpected timing, not to continue to
recovery. It is meaningless.
"master PITR"? What's that? All PITR is based on recovery.conf and thus
not really a "master"?
Why should we prohibit using this feature in PITR? I don't see any
advantage in doing so. If somebody doesn't want the delay, they
shouldn't set it in the configuration file. End of story.
There's not really a that meaningful distinction between PITR and
replication using archive_command. Especially when using
*pause_after. I think this feature will be used in a lot of scenarios in
which PITR is currently used.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
(2013/12/10 18:38), Andres Freund wrote:
"master PITR"? What's that? All PITR is based on recovery.conf and thus
not really a "master"?
"master PITR" is PITR with "standby_mode = off". It's just recovery from
basebackup. They have difference between "master PITR" and "standby" that the
former will get an independent timelineID, but the latter keeps the same timeline ID that
follows the master server. In the first place, the purposes are different.
Why should we prohibit using this feature in PITR? I don't see any
advantage in doing so. If somebody doesn't want the delay, they
shouldn't set it in the configuration file. End of story.
Unfortunately, there are a lot of stupid in the world... I think you have these
clients, too.
There's not really a that meaningful distinction between PITR and
replication using archive_command. Especially when using
*pause_after.
It is meaningless in "master PITR". It will be master which has new timelineID at
unexpected timing.
I think this feature will be used in a lot of scenarios in
which PITR is currently used.
We have to judge which is better, we get something potential or to protect stupid.
And we had better to wait author's comment...
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
On 11 December 2013 06:36, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:
I think this feature will be used in a lot of scenarios in
which PITR is currently used.
We have to judge which is better, we get something potential or to protect
stupid.
And we had better to wait author's comment...
I'd say just document that it wouldn't make sense to use it for PITR.
There may be some use case we can't see yet, so specifically
prohibiting a use case that is not dangerous seems too much at this
point. I will no doubt be reminded of these words in the future...
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Dec 11, 2013 at 6:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 11 December 2013 06:36, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:
I think this feature will be used in a lot of scenarios in
which PITR is currently used.
We have to judge which is better, we get something potential or to
protect
stupid.
And we had better to wait author's comment...
I'd say just document that it wouldn't make sense to use it for PITR.
There may be some use case we can't see yet, so specifically
prohibiting a use case that is not dangerous seems too much at this
point. I will no doubt be reminded of these words in the future...
Hi all,
I tend to agree with Simon, but I confess that I didn't like delaying a
server with standby_mode = 'off'.
The main goal of this patch is delay the Streaming Replication, so if the
slave server isn't a hot-standby I think makes no sense to delay it.
Mitsumasa suggested to add "StandbyModeRequested" in conditional branch to
skip this situation. I agree with him!
And I'll change 'recoveryDelay' (functions, variables) to 'standbyDelay'.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
On 2013-12-11 16:37:54 -0200, Fabrízio de Royes Mello wrote:
On Wed, Dec 11, 2013 at 6:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
I think this feature will be used in a lot of scenarios in
which PITR is currently used.
We have to judge which is better, we get something potential or to protect
stupid.
And we had better to wait author's comment...
I'd say just document that it wouldn't make sense to use it for PITR.
There may be some use case we can't see yet, so specifically
prohibiting a use case that is not dangerous seems too much at this
point. I will no doubt be reminded of these words in the future...
I tend to agree with Simon, but I confess that I didn't like delaying a
server with standby_mode = 'off'.
The main goal of this patch is delay the Streaming Replication, so if the
slave server isn't a hot-standby I think makes no sense to delay it.
Mitsumasa suggested to add "StandbyModeRequested" in conditional branch to
skip this situation. I agree with him!
I don't think that position has any merit, sorry: Think about the way
this stuff gets setup. The user creates a new basebackup (pg_basebackup,
manual pg_start/stop_backup, shutdown primary). Then he creates a
recovery conf by either starting from scratch, using
--write-recovery-conf or by copying recovery.conf.sample. In none of
these cases delay will be configured.
So, with that in mind, the only way it could have been configured is by
the user *explicitly* writing it into recovery.conf. And now you want to
to react to this explicit step by just *silently* ignoring the setting
based on some random criteria (arguments have been made about
hot_standby=on/off, standby_mode=on/off which aren't directly
related). Why on earth would that be a usability improvement?
Also, you seem to assume there's no point in configuring it for any of
hot_standby=off, standby_mode=off, recovery_target=*. Why? There's
usecases for all of them:
* hot_standby=off: Makes delay useable with wal_level=archive (and thus
a lower WAL volume)
* standby_mode=off: Configurations that use tools like pg_standby and
similar simply don't need standby_mode=on. If you want to trigger
failover from within the restore_command you *cannot* set it.
* recovery_target_*: It can still make sense if you use
pause_at_recovery_target.
In which scenarios does your restriction actually improve anything?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
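A concrete sketch of the setup sequence described above; the host name, data directory, and delay value are illustrative and not taken from the thread:
[user@standby ~]$ pg_basebackup -h master.example.com -U replicator -D /var/lib/pgsql/standby -R
[user@standby ~]$ echo "min_standby_apply_delay = '5min'" >> /var/lib/pgsql/standby/recovery.conf
[user@standby ~]$ pg_ctl -D /var/lib/pgsql/standby start
pg_basebackup -R writes a recovery.conf with standby_mode and primary_conninfo already filled in, so the delay line is the only setting the user adds explicitly, which is exactly the point being made.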
On Wed, Dec 11, 2013 at 7:47 PM, Andres Freund <andres@2ndquadrant.com>
wrote:
I don't think that position has any merit, sorry: Think about the way
this stuff gets setup. The user creates a new basebackup (pg_basebackup,
manual pg_start/stop_backup, shutdown primary). Then he creates a
recovery conf by either starting from scratch, using
--write-recovery-conf or by copying recovery.conf.sample. In none of
these cases delay will be configured.
Ok.
So, with that in mind, the only way it could have been configured is by
the user *explicitly* writing it into recovery.conf. And now you want to
react to this explicit step by just *silently* ignoring the setting
based on some random criteria (arguments have been made about
hot_standby=on/off, standby_mode=on/off which aren't directly
related). Why on earth would that be a usability improvement?
Also, you seem to assume there's no point in configuring it for any of
hot_standby=off, standby_mode=off, recovery_target=*. Why? There's
usecases for all of them:
* hot_standby=off: Makes delay useable with wal_level=archive (and thus
a lower WAL volume)
* standby_mode=off: Configurations that use tools like pg_standby and
similar simply don't need standby_mode=on. If you want to trigger
failover from within the restore_command you *cannot* set it.
* recovery_target_*: It can still make sense if you use
pause_at_recovery_target.
In which scenarios does your restriction actually improve anything?
Given your arguments I'm forced to review my understanding of the problem.
You are absolutely right in your assertions. I was not seeing the scenario
from this perspective.
Anyway we need to improve docs, any suggestions?
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
(2013/12/12 7:23), Fabrízio de Royes Mello wrote:
On Wed, Dec 11, 2013 at 7:47 PM, Andres Freund <andres@2ndquadrant.com
* hot_standby=off: Makes delay useable with wal_level=archive (and thus
a lower WAL volume)
* standby_mode=off: Configurations that use tools like pg_standby and
similar simply don't need standby_mode=on. If you want to trigger
failover from within the restore_command you *cannot* set it.
* recovery_target_*: It can still make sense if you use
pause_at_recovery_target.
I don't think part of his arguments are right very much... We can just set
standby_mode=on when we use "min_standby_apply_delay" with pg_standby and similar
simple tools. However, I tend to agree with not to need to prohibit except for
standby_mode. So I'd like to propose that changing parameter name of
"min_standby_apply_delay" to "min_recovery_apply_delay". It is natural for this
feature.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
On 12 December 2013 08:19, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:
(2013/12/12 7:23), Fabrízio de Royes Mello wrote:
On Wed, Dec 11, 2013 at 7:47 PM, Andres Freund <andres@2ndquadrant.com
* hot_standby=off: Makes delay useable with wal_level=archive (and thus
a lower WAL volume)
* standby_mode=off: Configurations that use tools like pg_standby and
similar simply don't need standby_mode=on. If you want to trigger
failover from within the restore_command you *cannot* set it.
* recovery_target_*: It can still make sense if you use
pause_at_recovery_target.
I don't think part of his arguments are right very much... We can just set
standby_mode=on when we use "min_standby_apply_delay" with pg_standby and
similar simple tools. However, I tend to agree with not to need to prohibit
except for standby_mode. So I'd like to propose that changing parameter name
of "min_standby_apply_delay" to "min_recovery_apply_delay". It is natural
for this feature.
OK
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
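With the rename agreed above, the recovery.conf line would simply read (the delay value is illustrative):
min_recovery_apply_delay = '5min'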
On 9 December 2013 10:54, KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> wrote:
(2013/12/09 19:35), Pavel Stehule wrote:
2013/12/9 KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp
<mailto:kondo.mitsumasa@lab.ntt.co.jp>>
Hi Fabrízio,
I test your v4 patch, and send your review comments.
* Fix typo
49 -# commited transactions from the master, specify a recovery
time delay.
49 +# committed transactions from the master, specify a recovery
time delay.
* Fix white space
177 - if (secs <= 0 && microsecs <=0)
177 + if (secs <= 0 && microsecs <=0 )
* Add functionality (I propose)
We can set negative number at min_standby_apply_delay. I think that
this feature
is for world wide replication situation. For example, master server is
in
Japan and the slave server is in San Francisco. Japan time is ahead of
San
Francisco time
. And if we want to delay in this situation, it can need negative
number in
min_standby_apply_delay. So I propose that time delay conditional
branch
change under following.
- if (min_standby_apply_delay > 0) + if (min_standby_apply_delay != 0)
What do you think? It might also work correctly.
what using interval instead absolute time?
This is because local time is recorded in XLOG. And it has big cost for
calculating global time.
I agree with your request here, but I don't think negative values are
the right way to implement that, at least it would not be very usable.
My suggestion would be to add the TZ to the checkpoint record. This
way all users of WAL can see the TZ of the master and act accordingly.
I'll do a separate patch for that.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
(2013/12/12 18:09), Simon Riggs wrote:
On 9 December 2013 10:54, KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> wrote:
(2013/12/09 19:35), Pavel Stehule wrote:
2013/12/9 KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp
<mailto:kondo.mitsumasa@lab.ntt.co.jp>>
Hi Fabrízio,
I test your v4 patch, and send your review comments.
* Fix typo
49 -# commited transactions from the master, specify a recovery
time delay.
49 +# committed transactions from the master, specify a recovery
time delay.
* Fix white space
177 - if (secs <= 0 && microsecs <=0)
177 + if (secs <= 0 && microsecs <=0 )
* Add functionality (I propose)
We can set negative number at min_standby_apply_delay. I think that
this feature
is for world wide replication situation. For example, master server is
in
Japan and the slave server is in San Francisco. Japan time is ahead of
San
Francisco time
. And if we want to delay in this situation, it can need negative
number in
min_standby_apply_delay. So I propose that time delay conditional
branch
change under following.
- if (min_standby_apply_delay > 0) + if (min_standby_apply_delay != 0)
What do you think? It might also work correctly.
what using interval instead absolute time?
This is because local time is recorded in XLOG. And it has big cost for
calculating global time.
I agree with your request here, but I don't think negative values are
the right way to implement that, at least it would not be very usable.
I think that my proposal is the easiest and simplest way to solve this problem.
And I believe that the man who cannot calculate the difference in time-zone
doesn't set replication cluster across continents.
My suggestion would be to add the TZ to the checkpoint record. This
way all users of WAL can see the TZ of the master and act accordingly.
I'll do a separate patch for that.
It is something useful for also other situations. However, it might be
going to happen long and complicated discussions... I think that our hope is to
commit this patch in this commit-fest or next final commit-fest.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
On 12 December 2013 10:42, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:
I agree with your request here, but I don't think negative values are
the right way to implement that, at least it would not be very usable.
I think that my proposal is the easiest and simplest way to solve this
problem. And I believe that the man who cannot calculate the difference in
time-zone doesn't set replication cluster across continents.
My suggestion would be to add the TZ to the checkpoint record. This
way all users of WAL can see the TZ of the master and act accordingly.
I'll do a separate patch for that.
It is something useful for also other situations. However, it might be
going to happen long and complicated discussions... I think that our hope is
to commit this patch in this commit-fest or next final commit-fest.
Agreed on no delay for the delay patch, as shown by my commit.
Still think we need better TZ handling.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-12 09:09:21 +0000, Simon Riggs wrote:
* Add functionality (I propose)
We can set negative number at min_standby_apply_delay. I think that
this feature
is for world wide replication situation. For example, master server is
in
Japan and slave server is in San Francisco. Japan time fowards than
San
Francisco time
. And if we want to delay in this situation, it can need negative
This is because local time is recorded in XLOG. And it has big cost for
calculating global time.
Uhm? Isn't the timestamp in commit records actually a TimestampTz? And
thus essentially stored as UTC? I don't think this problem actually
exists?
My suggestion would be to add the TZ to the checkpoint record. This
way all users of WAL can see the TZ of the master and act accordingly.
I'll do a separate patch for that.
Intuitively I'd say that might be useful - but I am not really sure what
for. And we don't exactly have a great interface for looking at a
checkpoint's data. Maybe add it to the control file instead?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12 December 2013 11:05, Andres Freund <andres@2ndquadrant.com> wrote:
My suggestion would be to add the TZ to the checkpoint record. This
way all users of WAL can see the TZ of the master and act accordingly.
I'll do a separate patch for that.
Intuitively I'd say that might be useful - but I am not really sure what
for. And we don't exactly have a great interface for looking at a
checkpoint's data. Maybe add it to the control file instead?
That's actually what I had in mind, I just phrased it badly in mid-thought.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2013/12/12 Simon Riggs <simon@2ndquadrant.com>
On 12 December 2013 10:42, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:
I agree with your request here, but I don't think negative values are
the right way to implement that, at least it would not be very usable.
I think that my proposal is the easiest and simplest way to solve this
problem. And I believe that the man who cannot calculate the difference in
time-zone doesn't set replication cluster across continents.
My suggestion would be to add the TZ to the checkpoint record. This
way all users of WAL can see the TZ of the master and act accordingly.
I'll do a separate patch for that.
It is something useful for also other situations. However, it might be
going to happen long and complicated discussions... I think that our hope is
to commit this patch in this commit-fest or next final commit-fest.
Agreed on no delay for the delay patch, as shown by my commit.
Our forecast was very accurate...
Nice commit, Thanks!
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Simon Riggs <simon@2ndQuadrant.com> writes:
On 12 December 2013 11:05, Andres Freund <andres@2ndquadrant.com> wrote:
My suggestion would be to add the TZ to the checkpoint record. This
way all users of WAL can see the TZ of the master and act accordingly.
I'll do a separate patch for that.
Intuitively I'd say that might be useful - but I am not really sure what
for. And we don't exactly have a great interface for looking at a
checkpoint's data. Maybe add it to the control file instead?
That's actually what I had in mind, I just phrased it badly in mid-thought.
I don't think you realize what a can of worms that would be. There's
no compact representation of "a timezone", unless you are only proposing
to store the UTC offset; and frankly I'm not particularly seeing the point
of that.
regards, tom lane
On Thu, Dec 12, 2013 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On 12 December 2013 11:05, Andres Freund <andres@2ndquadrant.com> wrote:
My suggestion would be to add the TZ to the checkpoint record. This
way all users of WAL can see the TZ of the master and act accordingly.
I'll do a separate patch for that.
Intuitively I'd say that might be useful - but I am not really sure what
for. And we don't exactly have a great interface for looking at a
checkpoint's data. Maybe add it to the control file instead?
That's actually what I had in mind, I just phrased it badly in mid-thought.
I don't think you realize what a can of worms that would be. There's
no compact representation of "a timezone", unless you are only proposing
to store the UTC offset; and frankly I'm not particularly seeing the point
of that.
+1. I can see the point of storing a timestamp in each checkpoint
record, if we don't already, but time zones should be completely
irrelevant to this feature. Everything should be reckoned in seconds
since the epoch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12 December 2013 15:03, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 12, 2013 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On 12 December 2013 11:05, Andres Freund <andres@2ndquadrant.com> wrote:
My suggestion would be to add the TZ to the checkpoint record. This
way all users of WAL can see the TZ of the master and act accordingly.
I'll do a separate patch for that.
Intuitively I'd say that might be useful - but I am not really sure what
for. And we don't exactly have a great interface for looking at a
checkpoint's data. Maybe add it to the control file instead?
That's actually what I had in mind, I just phrased it badly in mid-thought.
I don't think you realize what a can of worms that would be. There's
no compact representation of "a timezone", unless you are only proposing
to store the UTC offset; and frankly I'm not particularly seeing the point
of that.
+1. I can see the point of storing a timestamp in each checkpoint
record, if we don't already, but time zones should be completely
irrelevant to this feature. Everything should be reckoned in seconds
since the epoch.
Don't panic guys! I meant UTC offset only. And yes, it may not be
needed, will check.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12 December 2013 15:19, Simon Riggs <simon@2ndquadrant.com> wrote:
Don't panic guys! I meant UTC offset only. And yes, it may not be
needed, will check.
Checked, all non-UTC TZ offsets work without further effort here.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 12, 2013 at 3:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 12 December 2013 15:19, Simon Riggs <simon@2ndquadrant.com> wrote:
Don't panic guys! I meant UTC offset only. And yes, it may not be
needed, will check.
Checked, all non-UTC TZ offsets work without further effort here.
Thanks!
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
On Thu, Dec 12, 2013 at 3:42 PM, Fabrízio de Royes Mello <
fabriziomello@gmail.com> wrote:
On Thu, Dec 12, 2013 at 3:39 PM, Simon Riggs <simon@2ndquadrant.com>
wrote:
On 12 December 2013 15:19, Simon Riggs <simon@2ndquadrant.com> wrote:
Don't panic guys! I meant UTC offset only. And yes, it may not be
needed, will check.
Checked, all non-UTC TZ offsets work without further effort here.
Thanks!
Reviewing the committed patch I noted that the "CheckForStandbyTrigger()"
after the delay was removed.
If we promote the standby during the delay and don't check the trigger
immediately after the delay, then we will replay undesired WAL records.
The attached patch adds this check.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Attachments:
check-for-standby-trigger-on-time-delayed-standby_v1.patchtext/x-diff; charset=US-ASCII; name=check-for-standby-trigger-on-time-delayed-standby_v1.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a76aef3..fbc2d2f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6835,6 +6835,14 @@ StartupXLOG(void)
recoveryApplyDelay();
/*
+ * Check for standby trigger to prevent the
+ * replay of undesired WAL records if the
+ * slave was promoted during the delay.
+ */
+ if (CheckForStandbyTrigger())
+ break;
+
+ /*
* We test for paused recovery again here. If
* user sets delayed apply, it may be because
* they expect to pause recovery in case of
On 12 December 2013 21:58, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Thu, Dec 12, 2013 at 3:42 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
On Thu, Dec 12, 2013 at 3:39 PM, Simon Riggs <simon@2ndquadrant.com>
wrote:
On 12 December 2013 15:19, Simon Riggs <simon@2ndquadrant.com> wrote:
Don't panic guys! I meant UTC offset only. And yes, it may not be
needed, will check.
Checked, all non-UTC TZ offsets work without further effort here.
Thanks!
Reviewing the committed patch I noted that the "CheckForStandbyTrigger()"
after the delay was removed.
If we promote the standby during the delay and don't check the trigger
immediately after the delay, then we will replay undesired WAL records.
The attached patch adds this check.
I removed it because it was after the pause. I'll replace it, but
before the pause.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-13 11:56:47 +0000, Simon Riggs wrote:
On 12 December 2013 21:58, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
Reviewing the committed patch I noted that the "CheckForStandbyTrigger()"
after the delay was removed.
If we promote the standby during the delay and don't check the trigger
immediately after the delay, then we will replay undesired WAL records.
The attached patch adds this check.
I removed it because it was after the pause. I'll replace it, but
before the pause.
Doesn't after the pause make more sense? If somebody promoted while we
were waiting, we want to recognize that before rolling forward? The wait
can take a long while after all?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 13 December 2013 11:58, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-12-13 11:56:47 +0000, Simon Riggs wrote:
On 12 December 2013 21:58, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
Reviewing the committed patch I noted that the "CheckForStandbyTrigger()"
after the delay was removed.
If we promote the standby during the delay and don't check the trigger
immediately after the delay, then we will replay undesired WAL records.
The attached patch adds this check.
I removed it because it was after the pause. I'll replace it, but
before the pause.
Doesn't after the pause make more sense? If somebody promoted while we
were waiting, we want to recognize that before rolling forward? The wait
can take a long while after all?
That would change the way pause currently works, which is OOS for that patch.
I'm happy to discuss such a change, but if agreed, it would need to
apply in all cases, not just this one.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-13 13:09:13 +0000, Simon Riggs wrote:
On 13 December 2013 11:58, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-12-13 11:56:47 +0000, Simon Riggs wrote:
On 12 December 2013 21:58, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
Reviewing the committed patch I noted that the "CheckForStandbyTrigger()"
after the delay was removed.
If we promote the standby during the delay and don't check the trigger
immediately after the delay, then we will replay undesired WAL records.
The attached patch adds this check.
I removed it because it was after the pause. I'll replace it, but
before the pause.
Doesn't after the pause make more sense? If somebody promoted while we
were waiting, we want to recognize that before rolling forward? The wait
can take a long while after all?
That would change the way pause currently works, which is OOS for that patch.
But this feature isn't pause itself - it's imo something
independent. Note that we currently
a) check pause again after recoveryApplyDelay(),
b) do check for promotion if the sleep in recoveryApplyDelay() is
interrupted. So not checking after the final sleep seems confusing.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 13 December 2013 13:22, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-12-13 13:09:13 +0000, Simon Riggs wrote:
On 13 December 2013 11:58, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-12-13 11:56:47 +0000, Simon Riggs wrote:
On 12 December 2013 21:58, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
Reviewing the committed patch I noted that the "CheckForStandbyTrigger()"
after the delay was removed.
If we promote the standby during the delay and don't check the trigger
immediately after the delay, then we will replay undesired WAL records.
The attached patch adds this check.
I removed it because it was after the pause. I'll replace it, but
before the pause.
Doesn't after the pause make more sense? If somebody promoted while we
were waiting, we want to recognize that before rolling forward? The wait
can take a long while after all?
That would change the way pause currently works, which is OOS for that patch.
But this feature isn't pause itself - it's imo something
independent. Note that we currently
a) check pause again after recoveryApplyDelay(),
b) do check for promotion if the sleep in recoveryApplyDelay() is
interrupted. So not checking after the final sleep seems confusing.
I'm proposing the attached patch.
This patch implements a consistent view of recovery pause, which is
that when paused, we don't check for promotion, during or immediately
after. That is user noticeable behaviour and shouldn't be changed
without thought and discussion on a separate thread with a clear
descriptive title. (I might argue in favour of it myself, I'm not yet
decided).
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
snippet.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a76aef3..c0ce170 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6834,6 +6834,9 @@ StartupXLOG(void)
 			{
 				recoveryApplyDelay();
+				if (CheckForStandbyTrigger())
+					break;
+
 				/*
 				 * We test for paused recovery again here. If
 				 * user sets delayed apply, it may be because
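With the snippet applied, the relevant part of StartupXLOG()'s redo loop ends up ordered roughly as follows (a simplified sketch; the pause-request guard is paraphrased rather than the real symbol):
/* Simplified ordering in the redo loop once the snippet is applied. */

recoveryApplyDelay();			/* wait out the configured delay */

if (CheckForStandbyTrigger())	/* added: stop before replaying this */
	break;						/* record if a promotion was requested */

/*
 * Existing behaviour kept: re-check for a requested pause, since the
 * operator may have asked for one while we slept out the delay.
 */
if (pause_requested)			/* paraphrased guard, not the real name */
	recoveryPausesHere();

/* ... the record is then applied as before ... */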
On Fri, Dec 13, 2013 at 11:44 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 13 December 2013 13:22, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-12-13 13:09:13 +0000, Simon Riggs wrote:
On 13 December 2013 11:58, Andres Freund <andres@2ndquadrant.com>
wrote:
On 2013-12-13 11:56:47 +0000, Simon Riggs wrote:
On 12 December 2013 21:58, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
Reviewing the committed patch, I noted that the
"CheckForStandbyTrigger()"
after the delay was removed.
If we promote the standby during the delay and don't check the
trigger
immediately after the delay, then we will replay undesired WAL
records.
The attached patch adds this check.
I removed it because it was after the pause. I'll replace it, but
before the pause.
Doesn't after the pause make more sense? If somebody promoted while
we
were waiting, we want to recognize that before rolling forward? The
wait
can take a long while after all?
That would change the way pause currently works, which is OOS for that
patch.
But this feature isn't pause itself - it's imo something
independent. Note that we currently
a) check pause again after recoveryApplyDelay(),
b) do check for promotion if the sleep in recoveryApplyDelay() is
interrupted. So not checking after the final sleep seems confusing.
I'm proposing the attached patch.
This patch implements a consistent view of recovery pause, which is
that when paused, we don't check for promotion, during or immediately
after. That is user noticeable behaviour and shouldn't be changed
without thought and discussion on a separate thread with a clear
descriptive title. (I might argue in favour of it myself, I'm not yet
decided).
In my previous message [1] I attached a patch equal to yours ;-)
Regards,
[1]: /messages/by-id/CAFcNs+qD0AJ=qzhsHD9+v_Mhz0RTBJ=cJPCT_T=uT_JvvnC1FQ@mail.gmail.com
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
On 2013-12-13 13:44:30 +0000, Simon Riggs wrote:
On 13 December 2013 13:22, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-12-13 13:09:13 +0000, Simon Riggs wrote:
On 13 December 2013 11:58, Andres Freund <andres@2ndquadrant.com> wrote:
I removed it because it was after the pause. I'll replace it, but
before the pause.
Doesn't after the pause make more sense? If somebody promoted while we
were waiting, we want to recognize that before rolling forward? The wait
can take a long while after all?
That would change the way pause currently works, which is OOS for that patch.
But this feature isn't pause itself - it's imo something
independent. Note that we currently
a) check pause again after recoveryApplyDelay(),
b) do check for promotion if the sleep in recoveryApplyDelay() is
interrupted. So not checking after the final sleep seems confusing.
I'm proposing the attached patch.
Looks good, although I'd move it down below the comment ;)
This patch implements a consistent view of recovery pause, which is
that when paused, we don't check for promotion, during or immediately
after. That is user noticeable behaviour and shouldn't be changed
without thought and discussion on a separate thread with a clear
descriptive title. (I might argue in favour of it myself, I'm not yet
decided).
Some more improvements in that area certainly would be good...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services