pg_ctl failover Re: Latches, signals, and waiting

Started by Fujii Masao | over 15 years ago | 30 messages | hackers
#1Fujii Masao
masao.fujii@gmail.com

On Wed, Sep 15, 2010 at 11:14 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 15/09/10 16:55, Tom Lane wrote:

So I'm wondering if we couldn't eliminate the five-second sleep
requirement here too.  It's problematic anyhow, since somebody looking
for energy efficiency will still feel it's too short, while somebody
concerned about fast failover will feel it's too long.

Yep.

 Could the
standby triggering protocol be modified so that it involves sending a
signal, not just creating a file?

Seems reasonable, at least if we still provide an option for more frequent
polling and no need to send signal.

(One issue is that it's not clear what that'd translate to on Windows.)

pg_ctl failover ? At the moment, the location of the trigger file is
configurable, but if we accept a constant location like "$PGDATA/failover"
pg_ctl could do the whole thing, create the file and send signal. pg_ctl on
Windows already knows how to send the "signal" via the named pipe signal
emulation.

The attached patch implements the above-mentioned pg_ctl failover.
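
The patch itself is not reproduced here. As a rough illustration of the idea only (not the patch's actual code), the "create the file, then send the signal" sequence might be sketched as below. The function name, the "promote" file name, and the choice to pass the signal number as a parameter are assumptions made for this sketch; the thread does not say which signal pg_ctl sends to the postmaster.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical name of the fixed trigger file under $PGDATA. */
#define PROMOTE_SIGNAL_FILE "promote"

/*
 * Create <pgdata>/promote, then signal the given process.
 * Returns 0 on success, -1 on failure. If the signal cannot be
 * delivered, the file is removed again so no stale trigger is left.
 */
int
request_promotion(const char *pgdata, pid_t postmaster_pid, int sig)
{
    char path[1024];
    int  n;
    int  fd;

    assert(pgdata != NULL);

    n = snprintf(path, sizeof(path), "%s/%s", pgdata, PROMOTE_SIGNAL_FILE);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;              /* path does not fit in the buffer */

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;              /* could not create the trigger file */
    close(fd);

    if (kill(postmaster_pid, sig) != 0)
    {
        unlink(path);           /* signal failed: clean up the file */
        return -1;
    }
    return 0;
}
```

A caller would read the postmaster PID from wherever it is recorded and pass the signal the server expects; both are outside this sketch.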

Comments? Objections?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

pg_ctl_failover_v1.patch (application/octet-stream, +155 -28)

#2Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Fujii Masao (#1)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Thu, Jan 13, 2011 at 00:14, Fujii Masao <masao.fujii@gmail.com> wrote:

pg_ctl failover ? At the moment, the location of the trigger file is
configurable, but if we accept a constant location like "$PGDATA/failover"
pg_ctl could do the whole thing, create the file and send signal. pg_ctl on
Windows already knows how to send the "signal" via the named pipe signal
emulation.

The attached patch implements the above-mentioned pg_ctl failover.

I have three comments:
- Will we call it "failover"? We will also use the command in "switchover"
operations. "pg_ctl promote" might be more neutral, but users might find it
hard to associate "promote" with the replication feature.

- pg_ctl should unlink failover_files when it fails to send the failover signal.

- "standby_triggered" variable might be renamed to "failover_triggered" or so.

--
Itagaki Takahiro

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Itagaki Takahiro (#2)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Thu, Jan 13, 2011 at 11:29 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

On Thu, Jan 13, 2011 at 00:14, Fujii Masao <masao.fujii@gmail.com> wrote:

pg_ctl failover ? At the moment, the location of the trigger file is
configurable, but if we accept a constant location like "$PGDATA/failover"
pg_ctl could do the whole thing, create the file and send signal. pg_ctl on
Windows already knows how to send the "signal" via the named pipe signal
emulation.

The attached patch implements the above-mentioned pg_ctl failover.

I have three comments:

Thanks for the review!

- Will we call it "failover"? We will use the command also in "switchover"
 operations. "pg_ctl promote" might be more neutral, but users might be
 hard to imagine replication feature from "promote".

OK. Similarly, should I also change the word "failover" used in function and
variable names to "promote"? For example,
#define PROMOTE_SIGNAL_FILE "promote" rather than
#define FAILOVER_SIGNAL_FILE "failover"?

- pg_ctl should unlink failover_files when it failed to send failover signals.

Good catch.

- "standby_triggered" variable might be renamed to "failover_triggered" or so.

Furthermore, should "failover_triggered" be renamed to "promote_triggered"?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Itagaki Takahiro (#2)
Re: pg_ctl failover Re: Latches, signals, and waiting

On 13.01.2011 04:29, Itagaki Takahiro wrote:

On Thu, Jan 13, 2011 at 00:14, Fujii Masao<masao.fujii@gmail.com> wrote:

pg_ctl failover ? At the moment, the location of the trigger file is
configurable, but if we accept a constant location like "$PGDATA/failover"
pg_ctl could do the whole thing, create the file and send signal. pg_ctl on
Windows already knows how to send the "signal" via the named pipe signal
emulation.

The attached patch implements the above-mentioned pg_ctl failover.

I have three comments:
- Will we call it "failover"? We will use the command also in "switchover"
operations. "pg_ctl promote" might be more neutral, but users might be
hard to imagine replication feature from "promote".

I agree that "failover" or even "switchover" is too specific. You might
want to promote a server even if you keep the old master still running, if
you're creating a temporary copy of the master repository for testing
purposes etc.

+1 for "promote". People unfamiliar with the replication stuff might not
immediately understand that it's related to replication, but they
wouldn't have any use for the option anyway. It should be clear to
anyone who needs it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#5Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#4)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Thu, Jan 13, 2011 at 5:00 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 13.01.2011 04:29, Itagaki Takahiro wrote:

On Thu, Jan 13, 2011 at 00:14, Fujii Masao<masao.fujii@gmail.com>  wrote:

pg_ctl failover ? At the moment, the location of the trigger file is
configurable, but if we accept a constant location like
"$PGDATA/failover"
pg_ctl could do the whole thing, create the file and send signal. pg_ctl
on
Windows already knows how to send the "signal" via the named pipe signal
emulation.

The attached patch implements the above-mentioned pg_ctl failover.

I have three comments:
- Will we call it "failover"? We will use the command also in "switchover"
  operations. "pg_ctl promote" might be more neutral, but users might be
  hard to imagine replication feature from "promote".

I agree that "failover" or even "switchover" is too specific. You might want
to promote a server even if you keep the old master still running, if you're
creating a temporary copy of the master repository for testing purposes etc.

+1 for "promote". People unfamiliar with the replication stuff might not
immediately understand that it's related to replication, but they wouldn't
have any use for the option anyway. It should be clear to anyone who needs
it.

I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#4)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Thu, Jan 13, 2011 at 7:00 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

+1 for "promote". People unfamiliar with the replication stuff might not
immediately understand that it's related to replication, but they wouldn't
have any use for the option anyway. It should be clear to anyone who needs
it.

I did s/failover/promote. Here is the updated patch.

- pg_ctl should unlink failover_files when it failed to send failover signals.

Done.

I also changed some of the descriptions of the trigger in high-availability.sgml
and recovery-config.sgml.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

pg_ctl_failover_v2.patch (application/octet-stream, +169 -39)

#7Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#6)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Thu, Jan 13, 2011 at 9:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

I did s/failover/promote. Here is the updated patch.

I rebased the patch to current git master.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

pg_ctl_promote_v3.patch (application/octet-stream, +169 -39)

#8Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Fujii Masao (#7)
Re: pg_ctl failover Re: Latches, signals, and waiting

I did s/failover/promote. Here is the updated patch.

I rebased the patch to current git master.

I'm thinking about implementing a function which promotes
the standby. It would make it a lot easier for pgpool to control the promotion,
since it would allow triggering the promotion (either by creating a
trigger file or by sending a signal) via SQL rather than ssh etc.

If there's enough interest, I will propose such a function for the next CF.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#9Magnus Hagander
magnus@hagander.net
In reply to: Tatsuo Ishii (#8)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Fri, Jan 28, 2011 at 08:44, Tatsuo Ishii <ishii@postgresql.org> wrote:

I did s/failover/promote. Here is the updated patch.

I rebased the patch to current git master.

I'm thinking about implementing a function which does a promotion for
the standby. It will make pgpool lot easier to control the promotion
since it allow to fire the promotion operation (either creating a
trigger file or sending a signal) via SQL, not ssh etc.

I agree that having this available via SQL would be useful in a number
of cases. pgpool or such being one, but also for example pgadmin.

If there's enough interest, I will propose such a function for next CF.

Just as a reminder, remember that next CF means 9.2.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#10Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Magnus Hagander (#9)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Fri, Jan 28, 2011 at 08:44, Tatsuo Ishii <ishii@postgresql.org> wrote:

I did s/failover/promote. Here is the updated patch.

I rebased the patch to current git master.

I'm thinking about implementing a function which does a promotion for
the standby. It will make pgpool lot easier to control the promotion
since it allow to fire the promotion operation (either creating a
trigger file or sending a signal) via SQL, not ssh etc.

I agree that having this available via SQL would be useful in a number
of cases. pgpool or such being one, but also for example pgadmin.

If there's enough interest, I will propose such a function for next CF.

Just as a reminder, remember that next CF means 9.2.

Oh, I meant the current CF (which started in January).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#11Fujii Masao
masao.fujii@gmail.com
In reply to: Magnus Hagander (#9)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Fri, Jan 28, 2011 at 4:57 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Fri, Jan 28, 2011 at 08:44, Tatsuo Ishii <ishii@postgresql.org> wrote:

I did s/failover/promote. Here is the updated patch.

I rebased the patch to current git master.

I'm thinking about implementing a function which does a promotion for
the standby. It will make pgpool lot easier to control the promotion
since it allow to fire the promotion operation (either creating a
trigger file or sending a signal) via SQL, not ssh etc.

I agree that having this available via SQL would be useful in a number
of cases. pgpool or such being one, but also for example pgadmin.

Agreed. I submitted the patch before, but I forgot to update it
and add it to CF.
http://archives.postgresql.org/message-id/AANLkTimuHbxbuM+zLkaEX3aDqSeiMUE3xb4ww1QtsLmf@mail.gmail.com

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#12Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Fujii Masao (#11)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Fri, Jan 28, 2011 at 4:57 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Fri, Jan 28, 2011 at 08:44, Tatsuo Ishii <ishii@postgresql.org> wrote:

I did s/failover/promote. Here is the updated patch.

I rebased the patch to current git master.

I'm thinking about implementing a function which does a promotion for
the standby. It will make pgpool lot easier to control the promotion
since it allow to fire the promotion operation (either creating a
trigger file or sending a signal) via SQL, not ssh etc.

I agree that having this available via SQL would be useful in a number
of cases. pgpool or such being one, but also for example pgadmin.

Agreed. I submitted the patch before, but I forgot to update it
and add it to CF.
http://archives.postgresql.org/message-id/AANLkTimuHbxbuM+zLkaEX3aDqSeiMUE3xb4ww1QtsLmf@mail.gmail.com

Great!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#13Robert Haas
robertmhaas@gmail.com
In reply to: Tatsuo Ishii (#12)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Fri, Jan 28, 2011 at 3:40 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:

On Fri, Jan 28, 2011 at 4:57 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Fri, Jan 28, 2011 at 08:44, Tatsuo Ishii <ishii@postgresql.org> wrote:

I did s/failover/promote. Here is the updated patch.

I rebased the patch to current git master.

I'm thinking about implementing a function which does a promotion for
the standby. It will make pgpool lot easier to control the promotion
since it allow to fire the promotion operation (either creating a
trigger file or sending a signal) via SQL, not ssh etc.

I agree that having this available via SQL would be useful in a number
of cases. pgpool or such being one, but also for example pgadmin.

Agreed. I submitted the patch before, but I forgot to update it
and add it to CF.
http://archives.postgresql.org/message-id/AANLkTimuHbxbuM+zLkaEX3aDqSeiMUE3xb4ww1QtsLmf@mail.gmail.com

Great!

I hate to be a wet blanket, but the number of patches in this CF is
going the wrong direction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#13)
Re: pg_ctl failover Re: Latches, signals, and waiting

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jan 28, 2011 at 3:40 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:

Agreed. I submitted the patch before, but I forgot to update it
and add it to CF.
http://archives.postgresql.org/message-id/AANLkTimuHbxbuM+zLkaEX3aDqSeiMUE3xb4ww1QtsLmf@mail.gmail.com

Great!

I hate to be a wet blanket, but the number of patches in this CF is
going the wrong direction.

Yes. I'm not sure that the fact that something was discussed months ago
entitles the submitter to a free exemption from the requirement to meet
the CF submission deadline.

regards, tom lane

#15Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Magnus Hagander (#9)
Re: pg_ctl failover Re: Latches, signals, and waiting

I did s/failover/promote. Here is the updated patch.

I rebased the patch to current git master.

I'm thinking about implementing a function which does a promotion for
the standby. It will make pgpool lot easier to control the promotion
since it allow to fire the promotion operation (either creating a
trigger file or sending a signal) via SQL, not ssh etc.

I agree that having this available via SQL would be useful in a number
of cases. pgpool or such being one, but also for example pgadmin.

If there's enough interest, I will propose such a function for next CF.

Just as a reminder, remember that next CF means 9.2.

Ok. I will write a C user function and add it to the pgpool source tree. I
think it will be fairly easy to create a trigger file in the function.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Tatsuo Ishii (#15)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Sat, Jan 29, 2011 at 1:11 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:

Ok. I will write a C user function and add to pgpool source tree. I
think it will be fairly easy to create a trigger file in the function.

Once the "pg_ctl promote" patch has been committed, I recommend that
the C function send the signal to the startup process rather than
create the trigger file, because the trigger file is checked only every 5s,
which would lengthen the failover time by 2.5s on average.
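
The 2.5s figure can be reproduced with a short calculation, assuming (this is not stated in the thread) that the trigger file appears at a uniformly random moment within a polling interval of length T = 5s, so that the wait W until the next check is uniform on (0, T):

```latex
W \sim \mathrm{Uniform}(0, T), \qquad
\mathbb{E}[W] = \int_0^T \frac{w}{T}\,dw = \frac{T}{2}, \qquad
T = 5\,\mathrm{s} \;\Rightarrow\; \mathbb{E}[W] = 2.5\,\mathrm{s}.
```

Under the same assumption the worst case is a full interval, i.e. 5s.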

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#17Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Fujii Masao (#16)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Mon, Jan 31, 2011 at 11:52, Fujii Masao <masao.fujii@gmail.com> wrote:

On Sat, Jan 29, 2011 at 1:11 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:

Ok. I will write a C user function and add to pgpool source tree. I
think it will be fairly easy to create a trigger file in the function.

If the "pg_ctl promote" patch will have been committed, I recommend that
the C function should send the signal to the startup process rather than
creating the trigger file.

The C function needs to create a trigger file in $PGDATA/promote
before sending signals, no? system("pg_ctl promote") seems
the easiest way if you use an external module.

--
Itagaki Takahiro

#18Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Fujii Masao (#16)
Re: pg_ctl failover Re: Latches, signals, and waiting

If the "pg_ctl promote" patch will have been committed, I recommend that
the C function should send the signal to the startup process rather than
creating the trigger file. Because the trigger file is checked every for 5s,
which would lengthen the failover time by an average 2.5s.

Ok, I could probably make the function smart enough to decide whether to
signal by looking at the PostgreSQL version.

BTW, is it possible to export the following variable in xlog.c?

static char *TriggerFile = NULL;

That would make coding the C function a lot easier.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#19Fujii Masao
masao.fujii@gmail.com
In reply to: Itagaki Takahiro (#17)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Mon, Jan 31, 2011 at 12:31 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

The C function needs to create a trigger file in $PGDATA/promote
before sending signals, no?

No. At least in the current patch, mere receipt of SIGUSR2 causes the
startup process to end recovery. The startup process doesn't check
for the existence of $PGDATA/promote, though the postmaster does.
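
The general pattern described here, a process that ends a loop when it receives SIGUSR2, might look roughly like the sketch below. It is only an illustration of a flag-setting signal handler, not the startup process's actual code; all function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag: set asynchronously by the handler, read by the main loop. */
static volatile sig_atomic_t promote_requested = 0;

/* Handler does the minimum that is async-signal-safe: set a flag. */
static void
handle_promote_signal(int signo)
{
    (void) signo;
    promote_requested = 1;
}

/* Install the handler for SIGUSR2; returns 0 on success, -1 on failure. */
int
install_promote_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_promote_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* The recovery loop would poll this between units of work. */
bool
promotion_requested(void)
{
    return promote_requested != 0;
}
```

Because the handler only sets a flag, no file needs to exist for the loop to notice the request, which matches the behaviour described above.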

 system("pg_ctl promote") seems
the easiest way if you use an external module.

Yeah, that's true.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#20Fujii Masao
masao.fujii@gmail.com
In reply to: Tatsuo Ishii (#18)
Re: pg_ctl failover Re: Latches, signals, and waiting

On Mon, Jan 31, 2011 at 12:35 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:

If the "pg_ctl promote" patch will have been committed, I recommend that
the C function should send the signal to the startup process rather than
creating the trigger file. Because the trigger file is checked every for 5s,
which would lengthen the failover time by an average 2.5s.

Ok, probably I could make the function smart enough to signal or not
by looking at the PostgreSQL version.

BTW is it possible to export following variable in xlog.c?

static char *TriggerFile = NULL;

That would make coding of the C function lot easier.

If you change the function so that it sends the signal or calls
system("pg_ctl promote"), exporting that variable seems
unnecessary, because pg_ctl promote can promote
the server even if trigger_file is not set. The C function doesn't need
to check whether trigger_file is set.
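
A minimal sketch of the system("pg_ctl promote") approach follows. It assumes that pg_ctl is on the PATH, that the proposed promote mode accepts pg_ctl's usual -D data-directory option, and that the data directory path is trusted. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Build "pg_ctl promote -D '<pgdata>'" into buf.
 * Returns 0 on success, -1 if the path contains a single quote
 * (which would break the shell quoting) or does not fit in buf.
 */
int
build_promote_command(char *buf, size_t buflen, const char *pgdata)
{
    int n;

    if (strchr(pgdata, '\'') != NULL)
        return -1;
    n = snprintf(buf, buflen, "pg_ctl promote -D '%s'", pgdata);
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}

/* Run the command; returns pg_ctl's exit status, or -1 if it could not be run. */
int
promote_via_pg_ctl(const char *pgdata)
{
    char cmd[1024];
    int  status;

    if (build_promote_command(cmd, sizeof(cmd), pgdata) != 0)
        return -1;
    status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Nothing in this sketch inspects trigger_file, in line with the point that the function does not need to know whether it is set.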

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#21Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Fujii Masao (#20)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#7)
#23Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#22)
#24Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#23)
#25Fujii Masao
masao.fujii@gmail.com
In reply to: Magnus Hagander (#24)
#26Stephen Frost
sfrost@snowman.net
In reply to: Fujii Masao (#25)
#27Fujii Masao
masao.fujii@gmail.com
In reply to: Stephen Frost (#26)
#28Stephen Frost
sfrost@snowman.net
In reply to: Fujii Masao (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#25)
#30Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#29)