equivalent to "replication_timeout" on standby server

Started by Samba over 14 years ago | 4 messages | general

saasira@gmail.com

over 14 years ago

Hi all,

The postgres manual explains the "replication_timeout" to be used to

"Terminate replication connections that are inactive longer than the
specified number of milliseconds. This is useful for the primary server to
detect a standby crash or network outage"

Is there a similar configuration parameter that helps the WAL receiver
processes to terminate the idle connections on the standby servers?

It would be very useful (for monitoring purposes) if the termination of such
an idle connection on either the master or the standby server were logged with
an appropriate message.

Could someone explain whether this is possible with postgres-9.1.1?

Thanks and Regards,
Samba

Fujii Masao

masao.fujii@gmail.com

over 14 years ago

In reply to: Samba (#1)

Re: equivalent to "replication_timeout" on standby server

On Thu, Nov 3, 2011 at 12:25 AM, Samba <saasira@gmail.com> wrote:

The postgres manual explains the "replication_timeout" to be used to

"Terminate replication connections that are inactive longer than the
specified number of milliseconds. This is useful for the primary server to
detect a standby crash or network outage"

Is there a similar configuration parameter that helps the WAL receiver
processes to terminate the idle connections on the standby servers?

No.

But setting keepalive libpq parameters in primary_conninfo might be useful
to detect the termination of connection from the standby server.
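
As a rough illustration of the suggestion above, here is a minimal sketch that assembles a primary_conninfo value with the libpq keepalive options (keepalives, keepalives_idle, keepalives_interval, keepalives_count). The host name, user name, and timing values are hypothetical placeholders, and the helper functions are made up for this example; only the parameter names come from libpq.

```python
# Sketch only: build a primary_conninfo value with libpq TCP keepalive options.
# Host, user, and all timing values below are hypothetical placeholders.

def build_primary_conninfo(host, port, user, idle=60, interval=10, count=5):
    """Return a libpq conninfo string with TCP keepalives enabled."""
    params = [
        ("host", host),
        ("port", port),
        ("user", user),
        ("keepalives", 1),                  # turn on client-side TCP keepalives
        ("keepalives_idle", idle),          # idle seconds before the first probe
        ("keepalives_interval", interval),  # seconds between unanswered probes
        ("keepalives_count", count),        # lost probes before the link counts as dead
    ]
    return " ".join(f"{key}={value}" for key, value in params)


def recovery_conf_line(conninfo):
    """Format the setting as a line for the standby's recovery.conf."""
    return f"primary_conninfo = '{conninfo}'"


conninfo = build_primary_conninfo("primary.example.com", 5432, "replicator")
print(recovery_conf_line(conninfo))
```

With values like these, the standby's operating system would give up on a silent connection after roughly keepalives_idle + keepalives_interval * keepalives_count seconds, although the exact behaviour depends on the platform's TCP stack.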

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Samba

saasira@gmail.com

over 14 years ago

In reply to: Fujii Masao (#2)

Re: equivalent to "replication_timeout" on standby server

Thanks, Fujii, for that hint...

I searched around on the internet for that trick, and it looks like we can
make the standby close its connection to the master much earlier than it
otherwise would; that is good enough for me for now.

But there still seem to be two problem areas that can be improved over
time...

- Although both the master (with replication_timeout) and the slave (with the
TCP timeout options in the primary_conninfo parameter) close the connection
quickly (based on the TCP idle connection timeout), as of now they do not
log such information. It would be really helpful if such disconnects were
logged with appropriate severity, so that the problem can be identified early
and the patterns and history of such issues can be tracked.
- Presently, neither the master nor the standby server attempts
to resume streaming replication when they can reach each other again after
a prolonged disconnect. It would be better if the master, the slave, or
both servers made periodic checks to find out whether the other is reachable
and resumed replication (if possible, or else logged a message that a
full sync may be required).

Thanks and Regards,
Samba


Fujii Masao

masao.fujii@gmail.com

over 14 years ago

In reply to: Samba (#3)

Re: equivalent to "replication_timeout" on standby server

On Fri, Nov 4, 2011 at 10:58 PM, Samba <saasira@gmail.com> wrote:

Although both the master (with replication_timeout) and the slave (with the TCP
timeout options in the primary_conninfo parameter) close the connection quickly
(based on the TCP idle connection timeout), as of now they do not log such
information. It would be really helpful if such disconnects were logged with
appropriate severity, so that the problem can be identified early and the
patterns and history of such issues can be tracked.

Oh, really? Unless I'm missing something, when replication timeout happens,
the following log message would be logged in the master:

terminating walsender process due to replication timeout

OTOH, something like the following would be logged in the standby:

could not receive data from WAL stream......
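
For monitoring, a minimal sketch of a log scanner that looks for the two messages quoted above might look like the following. The function name, the labels, and the sample log lines are made up for illustration (real log lines will carry whatever prefix log_line_prefix produces); only the two message substrings are taken from this thread.

```python
# Sketch only: flag replication disconnects in PostgreSQL server log lines.
# The two substrings come from the messages quoted in this thread;
# the labels and the sample lines further down are hypothetical.

PATTERNS = {
    "primary_timeout": "terminating walsender process due to replication timeout",
    "standby_stream_error": "could not receive data from WAL stream",
}


def find_replication_disconnects(lines):
    """Return (line_number, label, text) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for label, needle in PATTERNS.items():
            if needle in line:
                hits.append((lineno, label, line.rstrip("\n")))
    return hits


# Made-up sample input, not real server output.
sample_log = [
    "2011-11-04 10:00:00 database system is ready to accept connections\n",
    "2011-11-04 10:05:00 terminating walsender process due to replication timeout\n",
    "2011-11-04 10:05:02 could not receive data from WAL stream\n",
]

for lineno, label, text in find_replication_disconnects(sample_log):
    print(f"line {lineno}: {label}: {text}")
```

In practice the same function could be fed an open log file instead of a list, since it only iterates over lines.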

Presently, neither the master nor the standby server attempts to resume
streaming replication when they can reach each other again after a prolonged
disconnect. It would be better if the master, the slave, or both servers
made periodic checks to find out whether the other is reachable and resumed
replication (if possible, or else logged a message that a full sync may be
required).

The standby periodically tries reconnecting to the master after it detects
the termination of replication connection. So even after prolonged disconnect,
replication can automatically resume.
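
To make the idea of periodic reconnection concrete, here is a generic retry-loop sketch. It is not PostgreSQL's actual walreceiver code; the function name, the retry interval, and the stand-in connect function are all assumptions chosen for illustration.

```python
# Sketch only: a generic "keep retrying until the peer is reachable" loop.
# This is NOT the walreceiver implementation; names and the 5-second
# default interval are hypothetical.
import time


def reconnect_until_success(connect, retry_interval=5.0, max_attempts=None,
                            sleep=time.sleep):
    """Call connect() repeatedly until it succeeds; return (result, attempts)."""
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            return connect(), attempt
        except ConnectionError:
            # Peer still unreachable: wait, then try again.
            sleep(retry_interval)
    raise ConnectionError(f"gave up after {attempt} attempts")


# Stand-in for a real connection attempt: fails twice, then succeeds.
state = {"calls": 0}


def flaky_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("primary unreachable")
    return "streaming"


result, attempts = reconnect_until_success(flaky_connect, retry_interval=0)
print(result, attempts)
```

The point of the sketch is only that the side which lost the connection keeps retrying on its own, so no manual step is needed once the other server is reachable again.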

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center