Synchronous replication: Admin command for replication_timeout_action
Hi,
This is a proposal for an admin command or utility that can trigger the
server to switch to standalone mode when a connection failure is
detected between the primary and the standby server. It is not always
necessary to wait for replication_timeout to expire in order to detect
the connection failure, because a cluster/heartbeat framework might
detect the failure before replication_timeout does. An admin command
that abstracts the implementation details would allow the server to be
taken to standalone mode before replication_timeout expires.
Are there any suggestions from your side with respect to this?
regards,
Niranjan
Hi,
On Tue, May 5, 2009 at 2:37 AM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
Hi,
This is a proposal for an admin command or utility that can trigger the
server to switch to standalone mode when a connection failure is
detected between the primary and the standby server. It is not always
necessary to wait for replication_timeout to expire in order to detect
the connection failure, because a cluster/heartbeat framework might
detect the failure before replication_timeout does. An admin command
that abstracts the implementation details would allow the server to be
taken to standalone mode before replication_timeout expires.
Are there any suggestions from your side with respect to this?
Yes. Since walsender is treated as a special backend, we can use
pg_terminate_backend() to terminate replication and let the server
run standalone. This feature is simple but very useful, so I'll address
it (my previous patch does not fully provide this yet).
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, 2009-05-26 at 11:06 +0900, Fujii Masao wrote:
Yes. Since walsender is treated as a special backend, we can use
pg_terminate_backend() to terminate replication and let the server
run standalone. This feature is simple but very useful, so I'll address
it (my previous patch does not fully provide this yet).
I think we need something better than that. We shouldn't be shooting at
pids in a production database: we may get it wrong and take something
else down instead.
We need a graceful termination of replication and an immediate one.
There may be other things we need to add later, so a specific command
will be better and allow us to produce messages like "replication isn't
running" if used inappropriately.
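To make the two termination modes and the error message concrete, here is
a minimal toy sketch in Python. It is only an illustration of the intended
behaviour, not PostgreSQL code: the names Primary, StopMode,
stop_replication and ReplicationNotRunning are all hypothetical, and
"pending WAL" is modelled as a simple counter.

```python
from enum import Enum


class StopMode(Enum):
    GRACEFUL = "graceful"    # send any pending WAL first, then stop
    IMMEDIATE = "immediate"  # stop at once; pending WAL is not sent


class ReplicationNotRunning(Exception):
    """Raised when the command is used while replication is not active."""


class Primary:
    """Toy model of a primary server (hypothetical, for illustration only)."""

    def __init__(self, replicating, pending_wal=0):
        self.replicating = replicating
        self.pending_wal = pending_wal  # WAL records not yet sent to the standby
        self.sent_wal = 0

    def stop_replication(self, mode):
        # A dedicated command can report misuse clearly, instead of
        # signalling a pid that may belong to something else.
        if not self.replicating:
            raise ReplicationNotRunning("replication isn't running")
        if mode is StopMode.GRACEFUL:
            # Graceful: flush what is still queued before disconnecting.
            self.sent_wal += self.pending_wal
            self.pending_wal = 0
        # Immediate: fall straight through and leave pending WAL unsent.
        self.replicating = False
        return "standalone"
```

A cluster/heartbeat framework would then call something like
stop_replication(StopMode.IMMEDIATE) as soon as it detects the failure,
without having to look up or signal a backend pid.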
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support