Synchronous replication: Admin command for replication_timeout_action

Started by K, Niranjan (NSN - IN/Bangalore), over 16 years ago, 3 messages

Hi,

This is a request to support an admin command or utility that can switch
the server to standalone mode when a connection failure between the
primary and the standby server is detected. It is not always necessary to
wait for replication_timeout to expire in order to detect the failure,
because a cluster/heartbeat framework might detect it before
replication_timeout is reached. An admin command that abstracts the
implementation details would make it possible to take the server to
standalone mode earlier than replication_timeout allows.

Are there any suggestions from your side with respect to this?

regards,
Niranjan

#2 Fujii Masao
masao.fujii@gmail.com
In reply to: K, Niranjan (NSN - IN/Bangalore) (#1)
Re: Synchronous replication: Admin command for replication_timeout_action

Hi,

On Tue, May 5, 2009 at 2:37 AM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:

> Hi,
>
> This is a request to support an admin command or utility that can switch
> the server to standalone mode when a connection failure between the
> primary and the standby server is detected. It is not always necessary to
> wait for replication_timeout to expire in order to detect the failure,
> because a cluster/heartbeat framework might detect it before
> replication_timeout is reached. An admin command that abstracts the
> implementation details would make it possible to take the server to
> standalone mode earlier than replication_timeout allows.
>
> Are there any suggestions from your side with respect to this?

Yes. Since walsender is treated as a special backend, we can use
pg_terminate_backend() to terminate replication and let the server run
standalone. This feature is simple but very useful, so I'll address it
(my previous patch does not fully provide it yet).

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
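
The pg_terminate_backend() approach described above could be sketched as follows. This is a minimal illustration and not the patch under discussion. It assumes a current PostgreSQL release, where the pg_stat_replication view lists one row per walsender with a pid column; that view did not exist in the 2009 code base. The helper name terminate_walsenders and the conn argument (any DB-API 2.0 connection to the primary) are hypothetical.

```python
from typing import List

# Assumption: pg_stat_replication exists (current PostgreSQL releases only).
# pg_terminate_backend(pid) returns true when the signal was sent.
TERMINATE_WALSENDERS_SQL = (
    "SELECT pid, pg_terminate_backend(pid) FROM pg_stat_replication"
)


def terminate_walsenders(conn) -> List[int]:
    """Ask the primary to terminate every walsender.

    conn is a hypothetical DB-API 2.0 connection to the primary.
    Returns the PIDs for which pg_terminate_backend() reported success.
    """
    cur = conn.cursor()
    try:
        cur.execute(TERMINATE_WALSENDERS_SQL)
        return [pid for pid, signalled in cur.fetchall() if signalled]
    finally:
        cur.close()
```

The sketch covers only the termination step. Whether the primary then keeps running standalone depends on replication settings that this thread does not spell out.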

#3 Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#2)
Re: Synchronous replication: Admin command for replication_timeout_action

On Tue, 2009-05-26 at 11:06 +0900, Fujii Masao wrote:

> Yes. Since walsender is treated as a special backend, we can use
> pg_terminate_backend() to terminate replication and let the server run
> standalone. This feature is simple but very useful, so I'll address it
> (my previous patch does not fully provide it yet).

I think we need something better than that. We shouldn't be shooting at
pids in a production database: we may get it wrong and take something
else down instead.

We need a graceful termination of replication and an immediate one.
There may be other things we need to add later, so a specific command
will be better and allow us to produce messages like "replication isn't
running" if used inappropriately.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
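
The shape of a dedicated command, as opposed to signalling a PID, can be illustrated with a toy model. Everything in it is hypothetical: the Primary class, the StopMode names, and the stop_replication method are invented for this sketch and do not correspond to any PostgreSQL command. The thread does not define "graceful", so the sketch assumes it means sending pending WAL before stopping.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class StopMode(Enum):
    GRACEFUL = "graceful"    # assumption: send pending WAL first, then stop
    IMMEDIATE = "immediate"  # stop at once, leaving pending WAL unsent


@dataclass
class Primary:
    """Toy model of a primary server; all names are hypothetical."""
    replicating: bool = True
    pending_wal: List[str] = field(default_factory=list)  # not yet sent
    sent_wal: List[str] = field(default_factory=list)     # already sent

    def stop_replication(self, mode: StopMode) -> str:
        # The command targets replication itself, so no PID is involved
        # and it can refuse cleanly when there is nothing to stop.
        if not self.replicating:
            return "replication isn't running"
        if mode is StopMode.GRACEFUL:
            self.sent_wal.extend(self.pending_wal)
            self.pending_wal.clear()
        self.replicating = False
        return f"replication stopped ({mode.value}); server is standalone"
```

Because the command addresses replication by role, a second call simply reports that replication isn't running instead of acting on an unrelated process.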