Should we remove "not fast" promotion at all?

Started by Fujii Masao over 12 years ago, 31 messages, hackers
#1 Fujii Masao
masao.fujii@gmail.com

Hi all,

We discussed the $SUBJECT in the following threads:
/messages/by-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
/messages/by-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

Our consensus seems to be to remove "not fast" promotion altogether,
because there is no use case for that promotion.

Attached patch removes "not fast" promotion. Barring any objections,
I will commit this patch.

Regards,

On Sat, Aug 3, 2013 at 4:31 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:

Hi,

I made a patch for REL9_3_STABLE that gets rid of
the old promote processing. Please check it.
This patch makes PostgreSQL always do fast promotion(*).
(*) That is, skipping the long checkpoint before incrementing the
timeline.

After this, I'll make another patch for unlinking the files that are
created by the user as a trigger_file or by the "pg_ctl promote" command.

---------------
Tomonari Katsumata
2013/7/30 Fujii Masao <masao.fujii@gmail.com>

On Sat, Jul 27, 2013 at 6:57 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:

Hi,

Yes, it prevents PROMOTE_SIGNAL_FILE from remaining even if
both promote files exist.

The command (unlink(PROMOTE_SIGNAL_FILE)) here is for an
unusual case: the case where both of the procedures below are done.
- user create "promote" file on PGDATA
- user issue "pg_ctl promote"

I understand the reason.
But I think it's better to unlink(PROMOTE_SIGNAL_FILE) before
unlink(FAST_PROMOTE_SIGNAL_FILE),
because FAST_PROMOTE_SIGNAL_FILE is definitely there, but
PROMOTE_SIGNAL_FILE may or may not be there.

I could not understand why that's better. Could you elaborate on that?

I'm sorry for the insufficient explanation.

I thought that errno would be set to ENOENT and
that this might lead to something going wrong.
I checked this, and it's not a problem.

Sorry for confusing you.
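The ENOENT concern above can be checked in isolation. A minimal sketch assuming only POSIX `unlink()`: removing a file that does not exist fails with `errno` set to `ENOENT` and has no other effect, so the order in which the two signal files are unlinked does not matter. The helpers `create_signal_file()` and `unlink_if_present()` are hypothetical names, not PostgreSQL functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create an empty signal file, like "touch". */
int create_signal_file(const char *path)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    return fclose(fp) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: remove a signal file, treating "already gone"
 * (ENOENT) as success. Returns 0 on success, -1 on any other error,
 * with errno left as set by unlink().
 */
int unlink_if_present(const char *path)
{
    assert(path != NULL);

    if (unlink(path) == 0)
        return 0;       /* the file existed and is now removed */
    if (errno == ENOENT)
        return 0;       /* nothing to remove; the ENOENT is harmless */
    return -1;          /* e.g. EACCES: a real error for the caller */
}
```

Because a missing file only yields ENOENT, unlinking PROMOTE_SIGNAL_FILE first or FAST_PROMOTE_SIGNAL_FILE first leaves the directory in the same final state.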

And I have another question related to this behavior.
I think TriggerFile should be removed too.
This is a corner case, but it will happen.
What do you think?

I don't have strong opinion about that. I've never heard the complaint
about that current behavior so far.

For example, imagine a cascading replication environment in which
the old master is used as a standby without copying the timeline
history file to the new standby.

-------
1. replicating 3 servers (A, B, C)
A->B->C
("trigger_file = /tmp/trig" is set in recovery.conf on B and
C.)

2. stop server A and promote server B with "touch /tmp/trig; pg_ctl promote"

Why do you need to both create the trigger file and run pg_ctl promote?

Anyway, if the patch is useful as a fail-safe and it doesn't break the current
behavior, I'd be happy to apply it. You are suggesting that we should remove
the trigger file in CheckForStandbyTrigger() even if pg_ctl promote is executed.
But there are other cases where we can get out of the WAL replay loop,
for example, when we reach recovery_target_xxx. So ISTM we should try to remove
both the trigger file and the "promote" file at the end of recovery
instead. Thoughts?

B->C
(/tmp/trig file remains on server B)

4. stop server B and promote server C with "pg_ctl promote"
C

5. make server B connect as a standby of server C
C->B
---------

In step 5, server B will promote as soon as it starts,
because "/tmp/trig" is still there.
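Fujii's suggestion to remove both the trigger file and the "promote" file at the end of recovery, rather than only in CheckForStandbyTrigger(), might look roughly like this. A minimal sketch under stated assumptions: `cleanup_promotion_files()` is a hypothetical name, the caller passes the full path of the promote file and the configured `trigger_file` (or NULL when none is set), and the real startup-process code may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical end-of-recovery cleanup: remove the promote signal file
 * and, if configured (non-NULL, non-empty), the user's trigger_file, so
 * that neither can re-trigger promotion when this server is later
 * restarted as a standby. A file that is already gone is not an error.
 * Returns the number of files that could not be removed.
 */
int cleanup_promotion_files(const char *promote_file, const char *trigger_file)
{
    const char *paths[2];
    int failures = 0;

    assert(promote_file != NULL);
    paths[0] = promote_file;
    paths[1] = (trigger_file != NULL && trigger_file[0] != '\0')
        ? trigger_file : NULL;

    for (int i = 0; i < 2; i++)
    {
        if (paths[i] == NULL)
            continue;           /* no trigger_file configured */
        if (unlink(paths[i]) != 0 && errno != ENOENT)
        {
            fprintf(stderr, "could not remove \"%s\": %s\n",
                    paths[i], strerror(errno));
            failures++;
        }
    }
    return failures;
}
```

Run at the end of B's recovery, such a cleanup would have removed /tmp/trig, so server B would not promote itself again when restarted as a standby in step 5.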

One question: do we really still need to support normal promotion?
pg_ctl promote only provides a way to do fast promotion. If we want to
do normal promotion, we need to create PROMOTE_SIGNAL_FILE
and send the SIGUSR1 signal to the postmaster by hand. This seems messy.

I think that we should either remove normal promotion entirely, or change
pg_ctl promote so that it also provides a way to do normal promotion.
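The manual procedure described above (create PROMOTE_SIGNAL_FILE, then send SIGUSR1 to the postmaster) can be sketched in C. This assumes, as elsewhere in the thread, that PROMOTE_SIGNAL_FILE is the file named `promote` in the data directory, and that the postmaster's PID is on the first line of `postmaster.pid`; `request_promotion_by_hand()` is a hypothetical name, and this is not the pg_ctl source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch of requesting promotion "by hand":
 *   1. create an empty "promote" signal file in the data directory;
 *   2. read the postmaster PID from the first line of postmaster.pid;
 *   3. send SIGUSR1 to that PID.
 * Returns 0 on success and -1 on any failure.
 */
int request_promotion_by_hand(const char *datadir)
{
    char path[4096];
    FILE *fp;
    long pid;
    int n;

    assert(datadir != NULL);

    /* The signal file can only be created in a writable data directory. */
    if (access(datadir, W_OK) != 0)
        return -1;

    /* Step 1: create <datadir>/promote (empty). */
    n = snprintf(path, sizeof(path), "%s/promote", datadir);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;
    if ((fp = fopen(path, "w")) == NULL)
        return -1;
    if (fclose(fp) != 0)
        return -1;

    /* Step 2: the first line of postmaster.pid holds the postmaster PID. */
    n = snprintf(path, sizeof(path), "%s/postmaster.pid", datadir);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;
    if ((fp = fopen(path, "r")) == NULL)
        return -1;
    if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Step 3: signal the postmaster so that it notices the signal file. */
    return kill((pid_t) pid, SIGUSR1) == 0 ? 0 : -1;
}
```

Three separate steps that a user must get right by hand is the "messy" part; pg_ctl promote wraps the same kind of sequence for the fast mode only.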

I think the merit of "fast promote" is
- allowing quick connections by skipping the checkpoint
and its demerit is
- crash recovery takes a little longer

If a crash soon after promotion rarely happens,
and "fast promote" never breaks the consistency of the database cluster,
I think we don't need normal promotion.

You can execute checkpoint after fast promotion for that.

OK.
Then I think we should do the following:
- remove normal promotion from the source entirely
- add the know-how you suggest to the documentation

IMO either is necessary.

Regards,

--
Fujii Masao


Attachments:

remove_not_fast_promote_v1.patch (application/octet-stream, +34 -64)
#2 Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#1)
Re: Should we remove "not fast" promotion at all?

On Tue, Aug 6, 2013 at 3:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi all,

We discussed the $SUBJECT in the following threads:
/messages/by-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
/messages/by-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

Our consensus seems to be to remove "not fast" promotion altogether,
because there is no use case for that promotion.

Indeed, if two modes of promotion are available, it is not that
user-friendly if pg_ctl does not support both directly.

struct stat stat_buf;

-       if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0 ||
-               stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
+       /*
+        * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
+        * signal handler. We now leave the file in place and let the Startup
+        * process do the unlink. This is the infrastructure for supporting
+        * various promotion modes in the future. This allows Startup to know
+        * the mode from the promote signal file that the postmaster left.
+        */
+       if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
                return true;
Why not reshuffle this comment and remove references to 9.1 and 9.2?
Something like that perhaps:
Leave the promote signal file in place and let the Startup process do
the unlink. This infrastructure lets Startup know the mode from the
promote signal file that the postmaster left, keeping the door open for
supporting multiple promotion modes in the future.
-               /*
-                * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
-                * signal handler. We now leave the file in place and let the Startup
-                * process do the unlink. This allows Startup to know whether we're
-                * doing fast or normal promotion. Fast promotion takes precedence.
-                */
-               if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
-               {
-                       unlink(FAST_PROMOTE_SIGNAL_FILE);
-                       unlink(PROMOTE_SIGNAL_FILE);
-                       fast_promote = true;
-               }
-               else if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
-               {
-                       unlink(PROMOTE_SIGNAL_FILE);
-                       fast_promote = false;
-               }
-
                ereport(LOG, (errmsg("received promote request")));
-
+               unlink(PROMOTE_SIGNAL_FILE);
Wouldn't it make sense to keep the call to stat() to check the file
status before unlinking it?
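For reference, the two-file logic that this hunk removes can be restated as a standalone function: if the fast-promote file exists, both files are removed and fast promotion wins; otherwise the plain promote file selects normal promotion. This is a sketch, not the xlog.c code: the file names `promote` and `fast_promote` follow the thread's usage, and `check_promote_mode()` and `PromoteMode` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

typedef enum
{
    PROMOTE_NONE,       /* no signal file present */
    PROMOTE_NORMAL,     /* only "promote" present */
    PROMOTE_FAST        /* "fast_promote" present; wins over "promote" */
} PromoteMode;

/* Build "<datadir>/<name>" into buf; returns 0 on success, -1 if too long. */
static int build_path(char *buf, size_t len, const char *datadir, const char *name)
{
    int n = snprintf(buf, len, "%s/%s", datadir, name);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/*
 * Mirror of the removed branch: fast promotion takes precedence and
 * consumes both files; otherwise the plain promote file means normal
 * promotion.
 */
PromoteMode check_promote_mode(const char *datadir)
{
    char promote[4096];
    char fast[4096];
    struct stat st;

    assert(datadir != NULL);
    if (build_path(promote, sizeof(promote), datadir, "promote") != 0 ||
        build_path(fast, sizeof(fast), datadir, "fast_promote") != 0)
        return PROMOTE_NONE;

    if (stat(fast, &st) == 0)
    {
        unlink(fast);
        unlink(promote);        /* may not exist; ENOENT is harmless */
        return PROMOTE_FAST;
    }
    if (stat(promote, &st) == 0)
    {
        unlink(promote);
        return PROMOTE_NORMAL;
    }
    return PROMOTE_NONE;
}
```

With only one file left after the patch, this collapses to the single stat() on PROMOTE_SIGNAL_FILE followed by an unconditional unlink.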
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Andres Freund
andres@anarazel.de
In reply to: Fujii Masao (#1)
Re: Should we remove "not fast" promotion at all?

Hi,

On 2013-08-06 03:24:58 +0900, Fujii Masao wrote:

Hi all,

We discussed the $SUBJECT in the following threads:
/messages/by-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
/messages/by-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

Our consensus seems to be to remove "not fast" promotion altogether,
because there is no use case for that promotion.

Attached patch removes "not fast" promotion. Barring any objections,
I will commit this patch.

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#4 Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#2)
Re: Should we remove "not fast" promotion at all?

On Tue, Aug 6, 2013 at 11:20 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Aug 6, 2013 at 3:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi all,

We discussed the $SUBJECT in the following threads:
/messages/by-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
/messages/by-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

Our consensus seems to be to remove "not fast" promotion altogether,
because there is no use case for that promotion.

Indeed, if two modes of promotion are available, it is not that
user-friendly if pg_ctl does not support both directly.

struct stat stat_buf;

-       if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0 ||
-               stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
+       /*
+        * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
+        * signal handler. We now leave the file in place and let the Startup
+        * process do the unlink. This is the infrastructure for supporting
+        * various promotion modes in the future. This allows Startup to know
+        * the mode from the promote signal file that the postmaster left.
+        */
+       if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
                return true;
Why not reshuffle this comment and remove references to 9.1 and 9.2?

I just left the old comment as it is. I'm OK with your suggestion.

-               /*
-                * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
-                * signal handler. We now leave the file in place and let the Startup
-                * process do the unlink. This allows Startup to know whether we're
-                * doing fast or normal promotion. Fast promotion takes precedence.
-                */
-               if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
-               {
-                       unlink(FAST_PROMOTE_SIGNAL_FILE);
-                       unlink(PROMOTE_SIGNAL_FILE);
-                       fast_promote = true;
-               }
-               else if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
-               {
-                       unlink(PROMOTE_SIGNAL_FILE);
-                       fast_promote = false;
-               }
-
                ereport(LOG, (errmsg("received promote request")));
-
+               unlink(PROMOTE_SIGNAL_FILE);
Wouldn't it make sense to keep the call to stat() to check the file
status before unlinking it?

Why do we need to check the existence of the file before removing it
here?

Regards,

--
Fujii Masao


#5 Fujii Masao
masao.fujii@gmail.com
In reply to: Andres Freund (#3)
Re: Should we remove "not fast" promotion at all?

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Hi,

On 2013-08-06 03:24:58 +0900, Fujii Masao wrote:

Hi all,

We discussed the $SUBJECT in the following threads:
/messages/by-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
/messages/by-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

Our consensus seems to be to remove "not fast" promotion altogether,
because there is no use case for that promotion.

Attached patch removes "not fast" promotion. Barring any objections,
I will commit this patch.

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

Or, instead of normal promotion, it might be better to use another promotion
technique like shutdown + remove recovery.conf + restart for that purpose?

Regards,

--
Fujii Masao


#6 Andres Freund
andres@anarazel.de
In reply to: Fujii Masao (#5)
Re: Should we remove "not fast" promotion at all?

Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
wrote:

Hi,

On 2013-08-06 03:24:58 +0900, Fujii Masao wrote:

Hi all,

We discussed the $SUBJECT in the following threads:
/messages/by-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
/messages/by-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

Our consensus seems to be to remove "not fast" promotion altogether,
because there is no use case for that promotion.

Attached patch removes "not fast" promotion. Barring any objections,
I will commit this patch.

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

I am fine with only supporting doing the promotion in the old fashioned way, but I wouldn't protest against an option either.

Or, instead of normal promotion, it might be better to use another promotion
technique like shutdown + remove recovery.conf + restart for that purpose?

That's a very bad thing to do since it suppresses the timeline increase...

Regards,

Andres

Please excuse brevity and formatting - I am writing this on my mobile phone.


#7 Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#4)
Re: Should we remove "not fast" promotion at all?

On Tue, Aug 6, 2013 at 12:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

-
+               unlink(PROMOTE_SIGNAL_FILE);
Wouldn't it make sense to keep the call to stat() to check the file
status before unlinking it?

Why do we need to check the existence of the file before removing it
here?

Forget what I said. I had in mind that it might be better to
silence errors from unlink here, but this is not mandatory.
--
Michael


#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fujii Masao (#5)
Re: Should we remove "not fast" promotion at all?

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

It would be silly to add such an option if we want to remove the old mode
in a release or two.

I think what Andres is suggesting is to leave it as-is for 9.4 and then
remove the old code in 9.5 or 9.6. Which seems prudent to me.

regards, tom lane


#9 Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#6)
Re: Should we remove "not fast" promotion at all?

On Tue, Aug 6, 2013 at 12:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
wrote:

Hi,

On 2013-08-06 03:24:58 +0900, Fujii Masao wrote:

Hi all,

We discussed the $SUBJECT in the following threads:
/messages/by-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
/messages/by-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

Our consensus seems to be to remove "not fast" promotion altogether,
because there is no use case for that promotion.

Attached patch removes "not fast" promotion. Barring any objections,
I will commit this patch.

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

I am fine with only supporting doing the promotion in the old fashioned way, but I wouldn't protest against an option either.

Yeah, an additional option would be the way to go, especially if new
promotion modes are supported in the future. By the way, what I like about
this patch is that it opens the door to easier support for additional
promotion modes. Could we use this advantage to support both
fast and non-fast promotion now, with a fresh new structure?

Or, instead of normal promotion, it might be better to use another promotion
technique like shutdown + remove recovery.conf + restart for that purpose?

That's a very bad thing to do since it suppresses the timeline increase...

Agreed with Andres. This is unsafe, as it also skips all the safety
checks at the end of archive recovery.
--
Michael


#10 Tomonari Katsumata
t.katsumata1122@gmail.com
In reply to: Tom Lane (#8)
Re: Should we remove "not fast" promotion at all?

Hi,

2013/8/6 Tom Lane <tgl@sss.pgh.pa.us>

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

It would be silly to add such an option if we want to remove the old mode
in a release or two.

I think what Andres is suggesting is to leave it as-is for 9.4 and then
remove the old code in 9.5 or 9.6. Which seems prudent to me.

How about giving trigger_file the ability to choose between "fast promote"
and "normal promote", like pg_standby's triggerfile?
That is, if the trigger_file is empty or contains 'smart', do a
"normal promote", and if it contains 'fast', do a "fast promote".

I think this change would be smaller than changing pg_ctl.
It would also allow us to handle only ${PGDATA}/promote and trigger_file
(because ${PGDATA}/fast_promote is not created automatically).
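The proposal is to read the trigger file's contents: empty or `smart` selects normal promotion, `fast` selects fast promotion. A minimal sketch of such a parser, assuming surrounding whitespace (such as the trailing newline left by `echo`) is ignored and any other content is rejected; `read_trigger_mode()` and `TriggerMode` are hypothetical names, not an existing PostgreSQL API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef enum
{
    TRIGGER_ERROR = -1,     /* file unreadable or keyword not recognized */
    TRIGGER_NORMAL = 0,     /* empty file or "smart" */
    TRIGGER_FAST = 1        /* "fast" */
} TriggerMode;

/* Read the trigger file at path and map its keyword to a promotion mode. */
TriggerMode read_trigger_mode(const char *path)
{
    char buf[64];
    char *start;
    size_t len;
    FILE *fp;

    assert(path != NULL);
    if ((fp = fopen(path, "r")) == NULL)
        return TRIGGER_ERROR;
    len = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[len] = '\0';

    /* Trim leading and trailing whitespace, e.g. the newline from echo. */
    start = buf;
    while (*start != '\0' && isspace((unsigned char) *start))
        start++;
    len = strlen(start);
    while (len > 0 && isspace((unsigned char) start[len - 1]))
        start[--len] = '\0';

    if (len == 0 || strcmp(start, "smart") == 0)
        return TRIGGER_NORMAL;
    if (strcmp(start, "fast") == 0)
        return TRIGGER_FAST;
    return TRIGGER_ERROR;
}
```

Rejecting unknown keywords, rather than silently falling back to one mode, is a design choice here; the proposal itself does not say what should happen for other contents.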

regards,
---------------
Tomonari Katsumata

#11 Fujii Masao
masao.fujii@gmail.com
In reply to: Tom Lane (#8)
Re: Should we remove "not fast" promotion at all?

On Tue, Aug 6, 2013 at 1:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

It would be silly to add such an option if we want to remove the old mode
in a release or two.

Without such an option, a user cannot easily trigger the "normal" promotion
when we find some problems in fast promotion. In this case, a user needs to
create the "promote" file and send the SIGUSR1 signal to postmaster by hand.
Or the user needs to run pg_ctl promote using an older version of pg_ctl (e.g., 9.2).
Seems confusing.

Regards,

--
Fujii Masao


#12 Andres Freund
andres@anarazel.de
In reply to: Fujii Masao (#11)
Re: Should we remove "not fast" promotion at all?

On 2013-08-07 22:26:53 +0900, Fujii Masao wrote:

On Tue, Aug 6, 2013 at 1:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

It would be silly to add such an option if we want to remove the old mode
in a release or two.

Without such an option, a user cannot easily trigger the "normal" promotion
when we find some problems in fast promotion. In this case, a user needs to
create the "promote" file and send the SIGUSR1 signal to postmaster by hand.
Or the user needs to run pg_ctl promote using an older version of pg_ctl (e.g., 9.2).
Seems confusing.

Seems fine for debugging to me.

Greetings,

Andres Freund


#13 Michael Paquier
michael@paquier.xyz
In reply to: Tomonari Katsumata (#10)
Re: Should we remove "not fast" promotion at all?

On Tue, Aug 6, 2013 at 8:05 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:

Hi,

2013/8/6 Tom Lane <tgl@sss.pgh.pa.us>

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
wrote:

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

It would be silly to add such an option if we want to remove the old mode
in a release or two.

I think what Andres is suggesting is to leave it as-is for 9.4 and then
remove the old code in 9.5 or 9.6. Which seems prudent to me.

How about giving trigger_file the ability to choose between "fast promote"
and "normal promote", like pg_standby's triggerfile?
That is, if the trigger_file is empty or contains 'smart', do a
"normal promote", and if it contains 'fast', do a "fast promote".
I think this change would be smaller than changing pg_ctl.
It would also allow us to handle only ${PGDATA}/promote and trigger_file
(because ${PGDATA}/fast_promote is not created automatically).

Indeed, this would be the way to go to have an extensible format for
other promotion modes or other actions that could be triggered by a
standby. So why not take the approach suggested by Katsumata-san
now? One single file to rule them all, in this case called promote,
including a keyword indicating the promotion action to take. This
could be controlled by pg_ctl entirely, and opens the door to extra
possible modes.
--
Michael


#14 Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#13)
Re: Should we remove "not fast" promotion at all?

On 2013-08-08 06:40:00 +0900, Michael Paquier wrote:

On Tue, Aug 6, 2013 at 8:05 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:

Hi,

2013/8/6 Tom Lane <tgl@sss.pgh.pa.us>

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
wrote:

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

It would be silly to add such an option if we want to remove the old mode
in a release or two.

I think what Andres is suggesting is to leave it as-is for 9.4 and then
remove the old code in 9.5 or 9.6. Which seems prudent to me.

How about giving trigger_file the ability to choose between "fast promote"
and "normal promote", like pg_standby's triggerfile?
That is, if the trigger_file is empty or contains 'smart', do a
"normal promote", and if it contains 'fast', do a "fast promote".
I think this change would be smaller than changing pg_ctl.
It would also allow us to handle only ${PGDATA}/promote and trigger_file
(because ${PGDATA}/fast_promote is not created automatically).

Indeed, this would be the way to go to have an extensible format for
other promotion modes or other actions that could be triggered by a
standby. So why not take the approach suggested by Katsumata-san
now? One single file to rule them all, in this case called promote,
including a keyword indicating the promotion action to take. This
could be controlled by pg_ctl entirely, and opens the door to extra
possible modes.

Why are we suddenly trying to make this even more complicated? It's too
late to redesign stuff without very good evidence that it's
needed. Renaming trigger files and changing their format certainly
doesn't seem appropriate post-beta.

Let's just leave this as is, and remove the code in 9.4/9.5.

Greetings,

Andres Freund


#15 Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#14)
Re: Should we remove "not fast" promotion at all?

On Thu, Aug 8, 2013 at 12:24 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-08-08 06:40:00 +0900, Michael Paquier wrote:

On Tue, Aug 6, 2013 at 8:05 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:

Hi,

2013/8/6 Tom Lane <tgl@sss.pgh.pa.us>

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
wrote:

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

It would be silly to add such an option if we want to remove the old mode
in a release or two.

I think what Andres is suggesting is to leave it as-is for 9.4 and then
remove the old code in 9.5 or 9.6. Which seems prudent to me.

How about giving trigger_file the ability to choose between "fast promote"
and "normal promote", like pg_standby's triggerfile?
That is, if the trigger_file is empty or contains 'smart', do a
"normal promote", and if it contains 'fast', do a "fast promote".
I think this change would be smaller than changing pg_ctl.
It would also allow us to handle only ${PGDATA}/promote and trigger_file
(because ${PGDATA}/fast_promote is not created automatically).

Indeed, this would be the way to go to have an extensible format for
other promotion modes or other actions that could be triggered by a
standby. So why not take the approach suggested by Katsumata-san
now? One single file to rule them all, in this case called promote,
including a keyword indicating the promotion action to take. This
could be controlled by pg_ctl entirely, and opens the door to extra
possible modes.

Why are we suddenly trying to make this even more complicated? It's too
late to redesign stuff without very good evidence that it's
needed. Renaming trigger files and changing their format certainly
doesn't seem appropriate post-beta.

Let's just leave this as is, and remove the code in 9.4/9.5.

Sorry. I should have been clearer. I meant that for 9.4 and later only.
For 9.3, yes, it's too late.
--
Michael


#16 Bruce Momjian
bruce@momjian.us
In reply to: Michael Paquier (#15)
Re: Should we remove "not fast" promotion at all?

On Thu, Aug 8, 2013 at 01:27:35PM +0900, Michael Paquier wrote:

Why are we suddenly trying to make this even more complicated? It's too
late to redesign stuff without very good evidence that it's
needed. Renaming trigger files and changing their format certainly
doesn't seem appropriate post-beta.

Let's just leave this as is, and remove the code in 9.4/9.5.

Sorry. I should have been clearer. I meant that for 9.4 and later only.
For 9.3, yes, it's too late.

We seem to be all over the map with the fast promotion code --- some
people don't trust it, some people want an option to enable the old
method, and some people want the old method removed.

This has left us in an odd situation where we are going to ship 9.3
old-method code with no way to enable it in case we need it. Adding the
ability to enable it in 9.4 makes no sense --- effectively, if we need
the old promotion code, we are going to have to enable it in a 9.3 minor
release, while if we get to 9.4 final without needing it, we can assume
the fast promotion code is good and we don't need the old code.

I think a prudent plan would be to remove the old promotion code just
before 9.4 beta as we would know by then that the code is no longer
needed. I have added this as a 9.4 open item:

https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


#17 Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#16)
Re: Should we remove "not fast" promotion at all?

On 2013-08-08 12:50:31 -0400, Bruce Momjian wrote:

On Thu, Aug 8, 2013 at 01:27:35PM +0900, Michael Paquier wrote:

Why are we suddenly trying to make this even more complicated? It's too
late to redesign stuff without very good evidence that it's
needed. Renaming trigger files and changing their format certainly
doesn't seem appropriate post-beta.

Let's just leave this as is, and remove the code in 9.4/9.5.

Sorry. I should have been clearer. I meant that for 9.4 and later only.
For 9.3, yes, it's too late.

We seem to be all over the map with the fast promotion code --- some
people don't trust it, some people want an option to enable the old
method, and some people want the old method removed.

This has left us in an odd situation where we are going to ship 9.3
old-method code with no way to enable it in case we need it.

That's not true. It's relatively easy to trigger it; you just can't use
pg_ctl, which seems completely fine for debugging.

IMO there's no need to do anything, which is what we concluded
before...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18Josh Berkus
josh@agliodbs.com
In reply to: Fujii Masao (#1)
Re: Should we remove "not fast" promotion at all?

Bruce, all:

We seem to be all over the map with the fast promotion code --- some
people don't trust it, some people want an option to enable the old
method, and some people want the old method removed.

Having read over this thread, the only reason given for retaining any
ability to use the "old" promotion code is that people are worried about
"fast" promotion being buggy. This seems wrong.

Either we have confidence in fast promotion, or we don't. If we don't
have confidence, then either (a) more testing is needed, or (b) it
shouldn't be the default. Again, here, we are coming up against our
lack of any kind of broad replication failure testing.

Of course, even if we have confidence, bugs are always possible, and
leaving the old promotion code in there would make it somewhat easier to
ship a 9.3.2 update which reverts the behavior. But maybe we should
focus on shipping a version which is relatively bug-free instead?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#19Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#18)
Re: Should we remove "not fast" promotion at all?

On 2013-08-08 10:15:14 -0700, Josh Berkus wrote:

Bruce, all:

We seem to be all over the map with the fast promotion code --- some
people don't trust it, some people want an option to enable the old
method, and some people want the old method removed.

Having read over this thread, the only reason given for retaining any
ability to use the "old" promotion code is that people are worried about
"fast" promotion being buggy. This seems wrong.

Well, it's touching one of the more complex parts of pg.

Either we have confidence in fast promotion, or we don't. If we don't
have confidence, then either (a) more testing is needed, or (b) it
shouldn't be the default. Again, here, we are coming up against our
lack of any kind of broad replication failure testing.

While I think we definitely miss out there, I don't think any regression
suite would help much here. I am wary of unknown problems, not ones
we already have tests for. The subtle ones aren't easy to test, even
with a regression suite.

Of course, even if we have confidence, bugs are always possible, and
leaving the old promotion code in there would make it somewhat easier to
ship a 9.3.2 update which reverts the behavior. But maybe we should
focus on shipping a version which is relatively bug-free instead?

The problem is that, especially where Hot Standby (HS) is involved, there
are lots of subtle corner cases. And those are pretty hard to foresee and
thus hard to test. Being able to tell somebody to touch some file and kill
a certain process instead of triggering via pg_ctl is certainly better
than having them apply complex patches that merely restore the old
behaviour.
It's not about letting people regularly use it or such. It's about being
able to verify problems.
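The manual route described here (touch a file, signal a process) can be sketched in C. This is a minimal sketch under stated assumptions, not PostgreSQL source: it assumes the 9.3 behaviour in which `pg_ctl promote` writes a `fast_promote` file, whereas an empty `promote` file (PROMOTE_SIGNAL_FILE) in the data directory followed by SIGUSR1 to the postmaster requests the old, non-fast promotion. It also assumes the postmaster PID is the first line of `postmaster.pid`. The helper names `request_nonfast_promote` and `read_postmaster_pid` are hypothetical.

```c
/*
 * Hypothetical helper (a sketch, not PostgreSQL source): ask a 9.3 standby
 * for the old "not fast" promotion without pg_ctl.  Assumes that an empty
 * "promote" file in the data directory plus SIGUSR1 to the postmaster is
 * what selects non-fast promotion (pg_ctl promote writes "fast_promote").
 */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* The postmaster PID is the first line of <pgdata>/postmaster.pid. */
static long read_postmaster_pid(const char *pgdata)
{
    char path[4096];
    char line[64];
    long pid = -1;
    FILE *fp;

    snprintf(path, sizeof path, "%s/postmaster.pid", pgdata);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) != NULL)
        pid = strtol(line, NULL, 10);
    fclose(fp);
    return pid > 0 ? pid : -1;
}

/* Returns 0 if the request was sent, -1 otherwise (reason on stderr). */
int request_nonfast_promote(const char *pgdata)
{
    char promote_path[4096];
    long pid;
    FILE *fp;

    assert(pgdata != NULL);

    pid = read_postmaster_pid(pgdata);
    if (pid < 0)
    {
        fprintf(stderr, "could not read postmaster PID from %s/postmaster.pid\n", pgdata);
        return -1;
    }

    /* Only the file's existence is checked, so an empty file is enough. */
    snprintf(promote_path, sizeof promote_path, "%s/promote", pgdata);
    fp = fopen(promote_path, "w");
    if (fp == NULL)
    {
        fprintf(stderr, "could not create \"%s\": %s\n", promote_path, strerror(errno));
        return -1;
    }
    fclose(fp);

    /*
     * Wake the postmaster.  As we understand 9.3, it checks for the signal
     * file and passes the request on to the startup process.
     */
    if (kill((pid_t) pid, SIGUSR1) != 0)
    {
        fprintf(stderr, "could not signal postmaster (PID %ld): %s\n", pid, strerror(errno));
        unlink(promote_path);   /* do not leave a stray trigger behind */
        return -1;
    }
    return 0;
}
```

The caller would need to run as the same OS user as the server. If the signal cannot be delivered, the sketch removes the `promote` file again, so a leftover trigger is not picked up by some later, unrelated signal.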

Greetings,

Andres Freund

#20Josh Berkus
josh@agliodbs.com
In reply to: Andres Freund (#3)
Re: Should we remove "not fast" promotion at all?

On 08/08/2013 10:34 AM, Andres Freund wrote:

On 2013-08-08 10:15:14 -0700, Josh Berkus wrote:

Either we have confidence in fast promotion, or we don't. If we don't
have confidence, then either (a) more testing is needed, or (b) it
shouldn't be the default. Again, here, we are coming up against our
lack of any kind of broad replication failure testing.

While I think we definitely miss out there, I don't think any regression
suite would help much here. I am wary of unknown problems, not ones
we already have tests for. The subtle ones aren't easy to test, even
with a regression suite.

Yeah, that's why we have to get beyond the mentality that regression
testing is the only kind of testing. We need a destruction test for
replication, and that's NOT going to be a regression test. Among other
things, we'll probably need to run it on cloud hosting.

The problem is that, especially where Hot Standby (HS) is involved, there
are lots of subtle corner cases. And those are pretty hard to foresee and
thus hard to test.

It would be useful to assemble a list of corner cases we *do* know
about. This could become a test suite, and we could keep adding to it.

Being able to tell somebody to touch some file and kill a certain
process instead of triggering via pg_ctl is certainly better than having
them apply complex patches that merely restore the old behaviour.
It's not about letting people regularly use it or such. It's about being
able to verify problems.

The problem is, if failover fails badly, the user is probably facing a
corrupt database, downtime, loss of data, and a restore from backup. So
if we don't think that fast failover is rock-solid (or at least as
trustworthy as slow failover was), then we should make it a non-default
option for 9.3. We shouldn't be exposing people who don't need fast
failover to new risks without their knowledge.

#21Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#20)
#22Josh Berkus
josh@agliodbs.com
In reply to: Tomonari Katsumata (#10)
#23Tomonari Katsumata
katsumata.tomonari@po.ntts.co.jp
In reply to: Josh Berkus (#22)
#24Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Josh Berkus (#18)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#24)
#26Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#25)
#27Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#24)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#27)
#29Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#28)
#30Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#28)
#31Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#24)