comment for "fast promote"
Hi,
I'm reading xlog.c in REL9_3_STABLE to study "fast promote",
and I have a question.
I think it has an extra unlink call for the "promote" file
(on line 9937).
-------
9934 if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
9935 {
9936 unlink(FAST_PROMOTE_SIGNAL_FILE);
9937 unlink(PROMOTE_SIGNAL_FILE);
9938 fast_promote = true;
9939 }
-------
Is this command necessary?
regards,
------------------
NTT Software Corporation
Tomonari Katsumata
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jul 25, 2013 at 5:33 PM, Tomonari Katsumata
<katsumata.tomonari@po.ntts.co.jp> wrote:
> Hi,
> Now I'm seeing xlog.c in 93_stable for studying "fast promote",
> and I have a question.
> I think it has an extra unlink command for "promote" file.
> (on 9937 line)
> -------
> 9934 if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
> 9935 {
> 9936 unlink(FAST_PROMOTE_SIGNAL_FILE);
> 9937 unlink(PROMOTE_SIGNAL_FILE);
> 9938 fast_promote = true;
> 9939 }
> -------
> Is this command necessary ?
Yes, it ensures that PROMOTE_SIGNAL_FILE does not remain when
both promote files exist.
One question: do we really still need to support normal promotion?
pg_ctl promote only provides a way to do fast promotion. If we want to
do normal promotion, we need to create PROMOTE_SIGNAL_FILE
and send the SIGUSR1 signal to the postmaster by hand. This seems messy.
I think that we should either remove normal promotion entirely, or change
pg_ctl promote so that it also provides a way to do normal promotion.
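For reference, the by-hand procedure described above could be sketched roughly as follows. This is only an illustration, not code from pg_ctl: the function name request_normal_promotion and the DEMO_PATH_MAX constant are made up for the example, and the caller is assumed to already know the postmaster's PID.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed buffer size for this sketch; PostgreSQL itself uses MAXPGPATH. */
#define DEMO_PATH_MAX 1024

/*
 * Hypothetical helper: create "<pgdata>/promote" and send SIGUSR1 to the
 * given process. Returns 0 on success, -1 on failure.
 */
int
request_normal_promotion(const char *pgdata, pid_t postmaster_pid)
{
    char  promote_file[DEMO_PATH_MAX];
    FILE *fp;
    int   n;

    assert(pgdata != NULL);

    /* Build the signal file path and reject paths that do not fit. */
    n = snprintf(promote_file, sizeof(promote_file), "%s/promote", pgdata);
    if (n < 0 || (size_t) n >= sizeof(promote_file))
        return -1;

    /* An empty file is enough; only its existence is checked. */
    fp = fopen(promote_file, "w");
    if (fp == NULL)
        return -1;
    if (fclose(fp) != 0)
        return -1;

    /* Tell the target process to look for the signal file. */
    if (kill(postmaster_pid, SIGUSR1) != 0)
    {
        /* Do not leave a stale signal file behind if the signal failed. */
        unlink(promote_file);
        return -1;
    }
    return 0;
}
```

Having to do both steps, in this order, without help from pg_ctl is what makes the normal promotion path awkward to use.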
Regards,
--
Fujii Masao
Hi Fujii-san,
Thank you for response.
(2013/07/25 21:15), Fujii Masao wrote:
> On Thu, Jul 25, 2013 at 5:33 PM, Tomonari Katsumata
> <katsumata.tomonari@po.ntts.co.jp> wrote:
>> I think it has an extra unlink command for "promote" file.
>> (on 9937 line)
>> -------
>> 9934 if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
>> 9935 {
>> 9936 unlink(FAST_PROMOTE_SIGNAL_FILE);
>> 9937 unlink(PROMOTE_SIGNAL_FILE);
>> 9938 fast_promote = true;
>> 9939 }
>> -------
>> Is this command necessary ?
> Yes, it prevents PROMOTE_SIGNAL_FILE from remaining even if
> both promote files exist.
The unlink(PROMOTE_SIGNAL_FILE) call here is for an unusual case:
the one where the user does both of the following.
- the user creates the "promote" file in PGDATA
- the user issues "pg_ctl promote"
I understand the reason.
But I think it's better to call unlink(PROMOTE_SIGNAL_FILE) before
unlink(FAST_PROMOTE_SIGNAL_FILE),
because FAST_PROMOTE_SIGNAL_FILE is definitely there but
PROMOTE_SIGNAL_FILE may or may not be there.
I also have another question related to this behavior.
I think TriggerFile should be removed too.
This is a corner case, but it can happen.
What do you think?
> One question is that: we really still need to support normal promote?
> pg_ctl promote provides only way to do fast promotion. If we want to
> do normal promotion, we need to create PROMOTE_SIGNAL_FILE
> and send the SIGUSR1 signal to postmaster by hand. This seems messy.
> I think that we should remove normal promotion at all, or change
> pg_ctl promote so that provides also the way to do normal promotion.
I think the merit of "fast promote" is
- allowing connections quickly by skipping the checkpoint
and its demerit is
- taking a little longer in crash recovery.
If a crash soon after promotion is rare
and "fast promote" never breaks the consistency of the database cluster,
I think we don't need normal promotion.
(Of course, we need to describe promotion in detail in the documentation.)
regards,
--------
NTT Software Corporation
Tomonari Katsumata
On Fri, Jul 26, 2013 at 11:19 AM, Tomonari Katsumata
<katsumata.tomonari@po.ntts.co.jp> wrote:
>> Yes, it prevents PROMOTE_SIGNAL_FILE from remaining even if
>> both promote files exist.
> The command("unlink(PROMOTE_SIGNAL_FILE)") here is for
> unusual case.
> Because the case is when done both procedures below.
> - user create "promote" file on PGDATA
> - user issue "pg_ctl promote"
> I understand the reason.
> But I think it's better to unlink(PROMOTE_SIGNAL_FILE) before
> unlink(FAST_PROMOTE_SIGNAL_FILE).
> Because FAST_PROMOTE_SIGNAL_FILE is definitely there but
> PROMOTE_SIGNAL_FILE is sometimes there or not there.
I don't understand why that's better. Could you elaborate on that?
> And I have another question linking this behavior.
> I think TriggerFile should be removed too.
> This is corner-case but it will happen.
> How do you think of it ?
I don't have a strong opinion about that. I've never heard a complaint
about the current behavior so far.
>> One question is that: we really still need to support normal promote?
>> pg_ctl promote provides only way to do fast promotion. If we want to
>> do normal promotion, we need to create PROMOTE_SIGNAL_FILE
>> and send the SIGUSR1 signal to postmaster by hand. This seems messy.
>> I think that we should remove normal promotion at all, or change
>> pg_ctl promote so that provides also the way to do normal promotion.
> I think the merit of "fast promote" is
> - allowing quick connection by skipping checkpoint
> and its demerit is
> - taking little bit longer when crash-recovery
> If it is seldom to happen its crash soon after promoting
> and "fast promote" never breaks consistency of database cluster,
> I think we don't need normal promotion.
You can execute a checkpoint after fast promotion for that.
Regards,
--
Fujii Masao
Hi,
>> But I think it's better to unlink(PROMOTE_SIGNAL_FILE) before
>> unlink(FAST_PROMOTE_SIGNAL_FILE).
>> Because FAST_PROMOTE_SIGNAL_FILE is definitely there but
>> PROMOTE_SIGNAL_FILE is sometimes there or not there.
> I could not understand why that's better. Could you elaborate that?
I'm sorry for the insufficient explanation.
I thought that errno would be set to ENOENT and
that this might cause something to go wrong.
I checked this and now know it's not a problem.
Sorry for confusing you.
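The check can be reproduced with a small standalone sketch like the one below. It is not the xlog.c code: the file names and helper functions are hypothetical stand-ins, and it only assumes POSIX stat() and unlink(). It shows that unlinking a missing file just returns -1 with errno set to ENOENT, so the order of the two unlink() calls does not matter as long as the return value is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical file names for this sketch, not the real signal files. */
#define DEMO_PROMOTE_FILE      "demo_promote"
#define DEMO_FAST_PROMOTE_FILE "demo_fast_promote"

/* Create an empty file, like "touch". Returns true on success. */
bool
touch_file(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "w");
    if (fp == NULL)
        return false;
    return fclose(fp) == 0;
}

/* Returns true if stat() can see the file. */
bool
file_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}

/*
 * Same shape as the quoted check: when the fast-promote file exists,
 * remove both files. Returns true if fast promotion was requested.
 */
bool
consume_promote_files(void)
{
    if (!file_exists(DEMO_FAST_PROMOTE_FILE))
        return false;

    unlink(DEMO_FAST_PROMOTE_FILE);
    /* May fail with ENOENT if the file is absent; the result is ignored. */
    unlink(DEMO_PROMOTE_FILE);
    return true;
}

/* Returns 0 if unlink() succeeded, otherwise the errno it left behind. */
int
unlink_errno(const char *path)
{
    errno = 0;
    if (unlink(path) == 0)
        return 0;
    return errno;
}
```

Calling unlink_errno() on a file that does not exist returns ENOENT and changes nothing else, which matches what I observed.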
>> And I have another question linking this behavior.
>> I think TriggerFile should be removed too.
>> This is corner-case but it will happen.
>> How do you think of it ?
> I don't have strong opinion about that. I've never heard the complaint
> about that current behavior so far.
For example, imagine a cascading replication environment in which
the old master is reused as a standby without copying the timeline history file
to the new standby.
-------
1. replicating 3 servers (A, B, C)
A->B->C
("trigger_file = /tmp/trig" is set in recovery.conf on B and C.)
2. stop server A and promote server B with "touch /tmp/trig;pg_ctl promote"
B->C
(/tmp/trig file remains on server B)
4. stop server B and promote server C with "pg_ctl promote"
C
5. make server B connect as a standby of server C
C->B
---------
In step 5, server B will promote as soon as it starts,
because "/tmp/trig" is still there.
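The effect of a leftover trigger file can be modelled with a short sketch. This is a simplified stand-in, not CheckForStandbyTrigger() itself: the function names and the remove_file flag are hypothetical and only illustrate the difference between leaving the file in place and unlinking it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an empty trigger file, like "touch". Returns true on success. */
bool
create_trigger(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "w");
    if (fp == NULL)
        return false;
    return fclose(fp) == 0;
}

/*
 * Simplified stand-in for a standby's trigger check: returns true when
 * the trigger file exists. When remove_file is true, the file is
 * unlinked so that it cannot fire again on a later start.
 */
bool
trigger_fires(const char *path, bool remove_file)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return false;

    if (remove_file)
        unlink(path);
    return true;
}
```

If the file is never removed, trigger_fires() keeps returning true on every later check, which is the situation server B is in when it is restarted as a standby of server C.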
>> If it is seldom to happen its crash soon after promoting
>> and "fast promote" never breaks consistency of database cluster,
>> I think we don't need normal promotion.
> You can execute checkpoint after fast promotion for that.
OK.
Then I think we should do the following.
- remove normal promotion entirely from the source
- add the know-how you suggested to the documentation
Are there any objections?
regards,
-------------------
Tomonari Katsumata
On 27-07-2013 06:57, Tomonari Katsumata wrote:
> 1. replicating 3 servers(A,B,C)
> A->B->C
> ("trigger_file = /tmp/trig" is set in recovery.conf on B and C.)
> 2. stop server A and promoting server B with "touch /tmp/trig;pg_ctl
> promote"
> B->C
> (/tmp/trig file remains on server B)
Why don't you set the recovery_end_command parameter? The trigger_file is
important in some (legacy) environments that use an external
tool to handle service initialization.
It seems to me this is an opportunity to improve the trigger_file description
(explaining a way to clean up the created file) rather than to suggest it is not
useful.
--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On Sat, Jul 27, 2013 at 6:57 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:
> For example, please imagine the cascading replication environment and
> using old master as a standby without copying the timeline history file
> to new standby.
> -------
> 1. replicating 3 servers(A,B,C)
> A->B->C
> ("trigger_file = /tmp/trig" is set in recovery.conf on B and C.)
> 2. stop server A and promoting server B with "touch /tmp/trig;pg_ctl
> promote"
Why do you need to both create the trigger file and run pg_ctl promote?
Anyway, if the patch is useful as a fail-safe and it doesn't break the current
behavior, I'd be happy to apply it. You are suggesting that we should remove
the trigger file in CheckForStandbyTrigger() even if pg_ctl promote is executed.
But there can be cases where we get out of the WAL replay loop in another way,
for example, by reaching the recovery_target_xxx. So ISTM we should try to remove
both the trigger file and the "promote" file at the end of recovery
instead. Thoughts?
> B->C
> (/tmp/trig file remains on server B)
> 4. stop server B and promoting server C with "pg_ctl promote"
> C
> 5. making server B connect for standby of server C
> C->B
> ---------
> In step5 server B will promote as soon as it starts,
> because "/tmp/trig" is still there.
>> You can execute checkpoint after fast promotion for that.
> OK.
> Then I think we should do below things.
> - removing normal promotion at all from source
> - adding the know-how you suggest on document
IMO either is necessary.
Regards,
--
Fujii Masao
Hi,
I made a patch for REL9_3_STABLE which gets rid of
the old promote processing. Please check it.
This patch makes PostgreSQL always do fast promotion(*).
(*) which means skipping the long checkpoint before increasing
the timeline.
After this, I'll make another patch for unlinking the files which are
created by the user as a trigger_file or by the "pg_ctl promote" command.
---------------
Tomonari Katsumata
Attachments:
getting_rid_of_old_promote.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 45b17f5..353c259 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -66,7 +66,6 @@ extern uint32 bootstrap_data_checksum_version;
#define RECOVERY_COMMAND_FILE "recovery.conf"
#define RECOVERY_COMMAND_DONE "recovery.done"
#define PROMOTE_SIGNAL_FILE "promote"
-#define FAST_PROMOTE_SIGNAL_FILE "fast_promote"
/* User-settable parameters */
@@ -225,9 +224,6 @@ static char *TriggerFile = NULL;
/* are we currently in standby mode? */
bool StandbyMode = false;
-/* whether request for fast promotion has been made yet */
-static bool fast_promote = false;
-
/* if recoveryStopsHere returns true, it saves actual stop xid/time/name here */
static TransactionId recoveryStopXid;
static TimestampTz recoveryStopTime;
@@ -4866,7 +4862,7 @@ StartupXLOG(void)
DBState dbstate_at_startup;
XLogReaderState *xlogreader;
XLogPageReadPrivate private;
- bool fast_promoted = false;
+ bool checkpoint_skip = false;
/*
* Read control file and check XLOG status looks valid.
@@ -5962,43 +5958,40 @@ StartupXLOG(void)
* the rule that TLI only changes in shutdown checkpoints, which
* allows some extra error checking in xlog_redo.
*
- * In fast promotion, only create a lightweight end-of-recovery record
+ * In promotion, only create a lightweight end-of-recovery record
* instead of a full checkpoint. A checkpoint is requested later,
* after we're fully out of recovery mode and already accepting
* queries.
*/
if (bgwriterLaunched)
{
- if (fast_promote)
+ checkPointLoc = ControlFile->prevCheckPoint;
+
+ /*
+ * Confirm the last checkpoint is available for us to recover
+ * from if we fail. Note that we don't check for the secondary
+ * checkpoint since that isn't available in most base backups.
+ */
+ record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, false);
+ if (record != NULL)
{
- checkPointLoc = ControlFile->prevCheckPoint;
+ checkpoint_skip = true;
/*
- * Confirm the last checkpoint is available for us to recover
- * from if we fail. Note that we don't check for the secondary
- * checkpoint since that isn't available in most base backups.
+ * Insert a special WAL record to mark the end of
+ * recovery, since we aren't doing a checkpoint. That
+ * means that the checkpointer process may likely be in
+ * the middle of a time-smoothed restartpoint and could
+ * continue to be for minutes after this. That sounds
+ * strange, but the effect is roughly the same and it
+ * would be stranger to try to come out of the
+ * restartpoint and then checkpoint. We request a
+ * checkpoint later anyway, just for safety.
*/
- record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, false);
- if (record != NULL)
- {
- fast_promoted = true;
-
- /*
- * Insert a special WAL record to mark the end of
- * recovery, since we aren't doing a checkpoint. That
- * means that the checkpointer process may likely be in
- * the middle of a time-smoothed restartpoint and could
- * continue to be for minutes after this. That sounds
- * strange, but the effect is roughly the same and it
- * would be stranger to try to come out of the
- * restartpoint and then checkpoint. We request a
- * checkpoint later anyway, just for safety.
- */
- CreateEndOfRecoveryRecord();
- }
+ CreateEndOfRecoveryRecord();
}
- if (!fast_promoted)
+ if (!checkpoint_skip)
RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
CHECKPOINT_IMMEDIATE |
CHECKPOINT_WAIT);
@@ -6111,12 +6104,12 @@ StartupXLOG(void)
WalSndWakeup();
/*
- * If this was a fast promotion, request an (online) checkpoint now. This
+ * If checkpoint_skip is true, request an (online) checkpoint now. This
* isn't required for consistency, but the last restartpoint might be far
* back, and in case of a crash, recovering from it might take a longer
* than is appropriate now that we're not in standby mode anymore.
*/
- if (fast_promoted)
+ if (checkpoint_skip)
RequestCheckpoint(CHECKPOINT_FORCE);
}
@@ -9931,16 +9924,11 @@ CheckForStandbyTrigger(void)
* process do the unlink. This allows Startup to know whether we're
* doing fast or normal promotion. Fast promotion takes precedence.
*/
- if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
- {
- unlink(FAST_PROMOTE_SIGNAL_FILE);
- unlink(PROMOTE_SIGNAL_FILE);
- fast_promote = true;
- }
- else if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
+ if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
{
+ if (TriggerFile != NULL)
+ unlink(PROMOTE_SIGNAL_FILE);
unlink(PROMOTE_SIGNAL_FILE);
- fast_promote = false;
}
ereport(LOG, (errmsg("received promote request")));
@@ -9957,9 +9945,9 @@ CheckForStandbyTrigger(void)
{
ereport(LOG,
(errmsg("trigger file found: %s", TriggerFile)));
+ unlink(PROMOTE_SIGNAL_FILE);
unlink(TriggerFile);
triggered = true;
- fast_promote = true;
return true;
}
return false;
@@ -9974,8 +9962,7 @@ CheckPromoteSignal(void)
{
struct stat stat_buf;
- if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0 ||
- stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
+ if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
return true;
return false;
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 9045e00..9e2b6c2 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1099,12 +1099,10 @@ do_promote(void)
}
/*
- * For 9.3 onwards, use fast promotion as the default option. Promotion
- * with a full checkpoint is still possible by writing a file called
- * "promote", e.g. snprintf(promote_file, MAXPGPATH, "%s/promote",
- * pg_data);
+ * For 9.3 onwards, use fast promotion as the default option.
+ * This means a lightweight CheckPoint is executed when promoting.
*/
- snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);
+ snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
if ((prmfile = fopen(promote_file, "w")) == NULL)
{
On Sat, Aug 3, 2013 at 4:31 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:
> Hi,
> I made a patch for REL9_3_STABLE which gets rid of
> old promote processing. please check it.
> This patch make PostgreSQL do fast promoting(*) always.
> (*) which means skipping long checkpoint before increasing
> timeline.
Thanks for the patch!
I fixed the bug where your patch accidentally makes archive recovery
skip the end-of-recovery checkpoint, fixed some typos, refactored the
source code, and posted the updated version of the patch in
/messages/by-id/CAHGQGwGYkF+CvpOMdxaO=+aNAzc1Oo9O4LqWo50MxpvFj+0VOw@mail.gmail.com
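To illustrate the bug, here is a simplified model of the decision at the end of recovery, based only on the hunk of StartupXLOG() shown in the attached diff. It is a sketch, not the real code: the enum and function names are hypothetical, and both functions assume we are inside the "if (bgwriterLaunched)" branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two ways recovery can finish. */
typedef enum
{
    FULL_CHECKPOINT,        /* RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | ...) */
    END_OF_RECOVERY_RECORD  /* CreateEndOfRecoveryRecord(), checkpoint later */
} EndOfRecoveryAction;

/*
 * Behavior before the patch (the removed lines of the diff): the
 * lightweight path is taken only when fast promotion was requested and
 * the last checkpoint record could be read.
 */
EndOfRecoveryAction
action_before_patch(bool fast_promote, bool last_checkpoint_readable)
{
    if (fast_promote && last_checkpoint_readable)
        return END_OF_RECOVERY_RECORD;
    return FULL_CHECKPOINT;
}

/*
 * Behavior with the first version of the patch (the added lines): the
 * promotion flag is no longer consulted, so any recovery that reaches
 * this branch takes the lightweight path if the checkpoint is readable.
 */
EndOfRecoveryAction
action_with_first_patch(bool promotion_requested, bool last_checkpoint_readable)
{
    (void) promotion_requested;     /* ignored: this is the problem */

    if (last_checkpoint_readable)
        return END_OF_RECOVERY_RECORD;
    return FULL_CHECKPOINT;
}
```

With no promotion requested, as in plain archive recovery, the first function still chooses the full checkpoint while the second one skips it.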
Regards,
--
Fujii Masao