Changing the setting of wal_sender_timeout per standby

Started by Tsunakawa, Takayukiover 7 years ago23 messages

tsunakawa.takay@jp.fujitsu.com

over 7 years ago

Hello,

What do you think about changing wal_sender_timeout from PGC_HUP to PGC_BACKEND or PGC_USERSET?

Some customer wants to change the setting per standby, i.e., a shorter timeout for a standby in the same region to enable faster detection failure and failover, and a longer timeout for a standby in the remote region (for disaster recovery) to avoid mis-judging its health.

The current PGC_HUP allows to change the setting by editing postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific walsender. But that's not easy to use. The user has to do it upon every switchover and failover.

With PGC_BACKEND, the user would be able to tune the timeout as follows:

[recovery.conf]
primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'

With PGC_USERSET, the user would be able to use different user accounts for each standby, and tune the setting as follows:

ALTER USER repluser_remote SET wal_sender_timeout = 60000;

FYI
In Oracle Data Guard, the user configures the timeout for each standby in the primary server's configuration file like this:

LOG_ARCHIVE_DEST_1 = "SERVICE=local_conn_info SYNC NET_TIMEOUT=5"
LOG_ARCHIVE_DEST_2 = "SERVICE=remote_conn_info ASYNC NET_TIMEOUT=60"

Regards
Takayuki Tsunakawa

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Tsunakawa, Takayuki (#1)

Re: Changing the setting of wal_sender_timeout per standby

On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:

Some customer wants to change the setting per standby, i.e., a shorter
timeout for a standby in the same region to enable faster detection
failure and failover, and a longer timeout for a standby in the remote
region (for disaster recovery) to avoid mis-judging its health.

This argument is sensible.

The current PGC_HUP allows to change the setting by editing
postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
walsender. But that's not easy to use. The user has to do it upon
every switchover and failover.

With PGC_BACKEND, the user would be able to tune the timeout as follows:

[recovery.conf]
primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'

With PGC_USERSET, the user would be able to use different user
accounts for each standby, and tune the setting as follows:

ALTER USER repluser_remote SET wal_sender_timeout = 60000;

It seems to me that switching to PGC_BACKENDwould cover already all the
use-cases you are mentioning, as at the end one would just want to
adjust the WAL sender timeout on a connection basis depending on the
geographical location of the receiver and the latency between primary
and standby.
--
Michael

Masahiko Sawada

sawada.mshk@gmail.com

over 7 years ago

In reply to: Michael Paquier (#2)

Re: Changing the setting of wal_sender_timeout per standby

On Thu, Sep 13, 2018 at 12:32 PM, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:

Some customer wants to change the setting per standby, i.e., a shorter
timeout for a standby in the same region to enable faster detection
failure and failover, and a longer timeout for a standby in the remote
region (for disaster recovery) to avoid mis-judging its health.

This argument is sensible.

The current PGC_HUP allows to change the setting by editing
postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
walsender. But that's not easy to use. The user has to do it upon
every switchover and failover.

With PGC_BACKEND, the user would be able to tune the timeout as follows:

[recovery.conf]
primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'

With PGC_USERSET, the user would be able to use different user
accounts for each standby, and tune the setting as follows:

ALTER USER repluser_remote SET wal_sender_timeout = 60000;

It seems to me that switching to PGC_BACKENDwould cover already all the
use-cases you are mentioning, as at the end one would just want to
adjust the WAL sender timeout on a connection basis depending on the
geographical location of the receiver and the latency between primary
and standby.

+1 for PGC_BACKEND. It looks enough for most use cases.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

over 7 years ago

In reply to: Masahiko Sawada (#3)

Re: Changing the setting of wal_sender_timeout per standby

At Fri, 14 Sep 2018 18:22:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBr2Y=N4iH8+6m5ara2GWdKE6ZrzWaqjZux6ErZ9pyAxQ@mail.gmail.com>

On Thu, Sep 13, 2018 at 12:32 PM, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:

Some customer wants to change the setting per standby, i.e., a shorter
timeout for a standby in the same region to enable faster detection
failure and failover, and a longer timeout for a standby in the remote
region (for disaster recovery) to avoid mis-judging its health.

This argument is sensible.

The current PGC_HUP allows to change the setting by editing
postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
walsender. But that's not easy to use. The user has to do it upon
every switchover and failover.

With PGC_BACKEND, the user would be able to tune the timeout as follows:

[recovery.conf]
primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'

With PGC_USERSET, the user would be able to use different user
accounts for each standby, and tune the setting as follows:

ALTER USER repluser_remote SET wal_sender_timeout = 60000;

It seems to me that switching to PGC_BACKENDwould cover already all the
use-cases you are mentioning, as at the end one would just want to
adjust the WAL sender timeout on a connection basis depending on the
geographical location of the receiver and the latency between primary
and standby.

+1 for PGC_BACKEND. It looks enough for most use cases.

+1, and we need a means to see the actual value, in
pg_stat_replication?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Kyotaro HORIGUCHI (#4)

Re: Changing the setting of wal_sender_timeout per standby

On Tue, Sep 18, 2018 at 11:20:03AM +0900, Kyotaro HORIGUCHI wrote:

+1, and we need a means to see the actual value, in
pg_stat_replication?

Well, being able to see what another session is using as settings is
not a trivial problem, perhaps not worth solving, and orthogonal to
what's discussed here...
--
Michael

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

over 7 years ago

In reply to: Kyotaro HORIGUCHI (#4)

1 attachment(s)

RE: Changing the setting of wal_sender_timeout per standby

From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]

At Fri, 14 Sep 2018 18:22:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com>
wrote in
<CAD21AoBr2Y=N4iH8+6m5ara2GWdKE6ZrzWaqjZux6ErZ9pyAxQ@mail.gmail.com>

On Thu, Sep 13, 2018 at 12:32 PM, Michael Paquier <michael@paquier.xyz>

wrote:

On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:

Some customer wants to change the setting per standby, i.e., a shorter
timeout for a standby in the same region to enable faster detection
failure and failover, and a longer timeout for a standby in the remote
region (for disaster recovery) to avoid mis-judging its health.

This argument is sensible.

The current PGC_HUP allows to change the setting by editing
postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
walsender. But that's not easy to use. The user has to do it upon
every switchover and failover.

With PGC_BACKEND, the user would be able to tune the timeout as follows:

[recovery.conf]
primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'

With PGC_USERSET, the user would be able to use different user
accounts for each standby, and tune the setting as follows:

ALTER USER repluser_remote SET wal_sender_timeout = 60000;

It seems to me that switching to PGC_BACKENDwould cover already all

the

use-cases you are mentioning, as at the end one would just want to
adjust the WAL sender timeout on a connection basis depending on the
geographical location of the receiver and the latency between primary
and standby.

+1 for PGC_BACKEND. It looks enough for most use cases.

+1, and we need a means to see the actual value, in
pg_stat_replication?

Thanks, the patch attached. I'll add this to the next CF shortly. As Michael said, I think viewing the configured value would be a separate feature.

Regards
Takayuki Tsunakawa

Attachments:

walsender_timeout_PGC_BACKEND.patchapplication/octet-stream; name=walsender_timeout_PGC_BACKEND.patchDownload

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bee4afb..fad859c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3127,11 +3127,22 @@ include_dir 'conf.d'
         Terminate replication connections that are inactive longer
         than the specified number of milliseconds. This is useful for
         the sending server to detect a standby crash or network outage.
-        A value of zero disables the timeout mechanism.  This parameter
-        can only be set in
-        the <filename>postgresql.conf</filename> file or on the server command line.
+        A value of zero disables the timeout mechanism.
+        Only superusers can change this parameter at session start,
+        and it cannot be changed at all within a session.
         The default value is 60 seconds.
        </para>
+
+       <tip>
+        <para>
+         The <literal>%q</literal> escape is useful when including information that is
+         You can also set this in recovery.conf as follows.  This allows you to set a
+         longer timeout for a standby in the remote data center across the slow WAN.
+<programlisting>
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass options=''-c wal_sender_timeout=5000'''
+</programlisting>
+        </para>
+       </tip>
       </listitem>
      </varlistentry>
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0625eff..66783e4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2539,7 +2539,7 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
+		{"wal_sender_timeout", PGC_BACKEND, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum time to wait for WAL replication."),
 			NULL,
 			GUC_UNIT_MS

Masahiko Sawada

sawada.mshk@gmail.com

over 7 years ago

In reply to: Tsunakawa, Takayuki (#6)

Re: Changing the setting of wal_sender_timeout per standby

On Tue, Sep 18, 2018 at 5:27 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]

At Fri, 14 Sep 2018 18:22:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com>
wrote in
<CAD21AoBr2Y=N4iH8+6m5ara2GWdKE6ZrzWaqjZux6ErZ9pyAxQ@mail.gmail.com>

On Thu, Sep 13, 2018 at 12:32 PM, Michael Paquier <michael@paquier.xyz>

wrote:

On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:

Some customer wants to change the setting per standby, i.e., a shorter
timeout for a standby in the same region to enable faster detection
failure and failover, and a longer timeout for a standby in the remote
region (for disaster recovery) to avoid mis-judging its health.

This argument is sensible.

The current PGC_HUP allows to change the setting by editing
postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
walsender. But that's not easy to use. The user has to do it upon
every switchover and failover.

With PGC_BACKEND, the user would be able to tune the timeout as follows:

[recovery.conf]
primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'

With PGC_USERSET, the user would be able to use different user
accounts for each standby, and tune the setting as follows:

ALTER USER repluser_remote SET wal_sender_timeout = 60000;

It seems to me that switching to PGC_BACKENDwould cover already all

the

use-cases you are mentioning, as at the end one would just want to
adjust the WAL sender timeout on a connection basis depending on the
geographical location of the receiver and the latency between primary
and standby.

+1 for PGC_BACKEND. It looks enough for most use cases.

+1, and we need a means to see the actual value, in
pg_stat_replication?

Thanks, the patch attached. I'll add this to the next CF shortly. As Michael said, I think viewing the configured value would be a separate feature.

Thank you for the patch.

+       <tip>
+        <para>
+         The <literal>%q</literal> escape is useful when including
information that is
+         You can also set this in recovery.conf as follows.  This
allows you to set a
+         longer timeout for a standby in the remote data center
across the slow WAN.
+<programlisting>
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo
password=foopass options=''-c wal_sender_timeout=5000'''
+</programlisting>
+        </para>
+       </tip>

I didn't follow the first sentence of the above hunk. Is the
wal_sender_timeout relevant with %q?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

over 7 years ago

In reply to: Masahiko Sawada (#7)

1 attachment(s)

RE: Changing the setting of wal_sender_timeout per standby

From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]

I didn't follow the first sentence of the above hunk. Is the
wal_sender_timeout relevant with %q?

Ouch, that's a careless mistake. I copied the paragraph from another parameter and failed to remove some sentence. Patch revised.

Regards
Takayuki Tsunakawa

Attachments:

walsender_timeout_PGC_BACKEND_v2.patchapplication/octet-stream; name=walsender_timeout_PGC_BACKEND_v2.patchDownload

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bee4afb..f6ada1f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3127,11 +3127,22 @@ include_dir 'conf.d'
         Terminate replication connections that are inactive longer
         than the specified number of milliseconds. This is useful for
         the sending server to detect a standby crash or network outage.
-        A value of zero disables the timeout mechanism.  This parameter
-        can only be set in
-        the <filename>postgresql.conf</filename> file or on the server command line.
+        A value of zero disables the timeout mechanism.
+        Only superusers can change this parameter at session start,
+        and it cannot be changed at all within a session.
         The default value is 60 seconds.
        </para>
+
+       <tip>
+        <para>
+         You can also set this in <filename>recovery.conf</filename> as follows.  This
+         allows you to set a longer timeout for a standby in the remote data center
+         across the slow WAN.
+<programlisting>
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass options=''-c wal_sender_timeout=5000'''
+</programlisting>
+        </para>
+       </tip>
       </listitem>
      </varlistentry>
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0625eff..66783e4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2539,7 +2539,7 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
+		{"wal_sender_timeout", PGC_BACKEND, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum time to wait for WAL replication."),
 			NULL,
 			GUC_UNIT_MS

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Tsunakawa, Takayuki (#8)

Re: Changing the setting of wal_sender_timeout per standby

On Wed, Sep 19, 2018 at 12:14:57AM +0000, Tsunakawa, Takayuki wrote:

From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]

I didn't follow the first sentence of the above hunk. Is the
wal_sender_timeout relevant with %q?

Ouch, that's a careless mistake. I copied the paragraph from another
parameter and failed to remove some sentence. Patch revised.

-        A value of zero disables the timeout mechanism.  This parameter
-        can only be set in
-        the <filename>postgresql.conf</filename> file or on the server
-        command line.
+        A value of zero disables the timeout mechanism.
+        Only superusers can change this parameter at session start,
+        and it cannot be changed at all within a session.
         The default value is 60 seconds.

Parameters classified as PGC_BACKEND can be updated by any users, and
those marked as PGC_SU_BACKEND can only be updated by superusers.
Replication users are not superusers, which is why PGC_BACKEND is most
adapted. Your description should just say "This parameter cannot be
changed after session start.
--
Michael

#10

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Michael Paquier (#9)

Re: Changing the setting of wal_sender_timeout per standby

On Wed, Sep 19, 2018 at 02:40:31PM +0900, Michael Paquier wrote:

Parameters classified as PGC_BACKEND can be updated by any users, and
those marked as PGC_SU_BACKEND can only be updated by superusers.
Replication users are not superusers, which is why PGC_BACKEND is most
adapted. Your description should just say "This parameter cannot be
changed after session start.

Actually, now that I look at guc.c, using PGC_BACKEND breaks one case:
such parameter types cannot be changed even with SIGHUP, and this even
if the session relies on the default value. So you could break
applications relying on reloads. PGC_USERSET would actually do the work
as if the session uses a non-default value specified by SET or within
the connection string, then SIGHUP updates have no effect. On the
contrary, if the client relies on the default value, then SIGHUP updates
take effect. Sorry for the confusion, I should have looked at guc.c
more carefully.
--
Michael

#11

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

over 7 years ago

In reply to: Michael Paquier (#9)

1 attachment(s)

RE: Changing the setting of wal_sender_timeout per standby

From: Michael Paquier [mailto:michael@paquier.xyz]

Parameters classified as PGC_BACKEND can be updated by any users, and those
marked as PGC_SU_BACKEND can only be updated by superusers.
Replication users are not superusers, which is why PGC_BACKEND is most
adapted. Your description should just say "This parameter cannot be
changed after session start.

How embarrassing... I'm sorry to cause you trouble to point out a silly mistake like this (I thought I would write as you suggested, but it seems that I was not who I usually am.) The revised patch attached.

Regards
Takayuki Tsunakawa

Attachments:

walsender_timeout_PGC_BACKEND_v3.patchapplication/octet-stream; name=walsender_timeout_PGC_BACKEND_v3.patchDownload

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bee4afb..736f73b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3127,11 +3127,21 @@ include_dir 'conf.d'
         Terminate replication connections that are inactive longer
         than the specified number of milliseconds. This is useful for
         the sending server to detect a standby crash or network outage.
-        A value of zero disables the timeout mechanism.  This parameter
-        can only be set in
-        the <filename>postgresql.conf</filename> file or on the server command line.
+        A value of zero disables the timeout mechanism.
+        This parameter cannot be changed after session start.
         The default value is 60 seconds.
        </para>
+
+       <tip>
+        <para>
+         You can also set this in <filename>recovery.conf</filename> as follows.  This
+         allows you to set a longer timeout for a standby in the remote data center
+         across the slow WAN.
+<programlisting>
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass options=''-c wal_sender_timeout=5000'''
+</programlisting>
+        </para>
+       </tip>
       </listitem>
      </varlistentry>
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0625eff..66783e4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2539,7 +2539,7 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
+		{"wal_sender_timeout", PGC_BACKEND, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum time to wait for WAL replication."),
 			NULL,
 			GUC_UNIT_MS

#12

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Tsunakawa, Takayuki (#11)

1 attachment(s)

Re: Changing the setting of wal_sender_timeout per standby

On Wed, Sep 19, 2018 at 06:21:52AM +0000, Tsunakawa, Takayuki wrote:

How embarrassing... I'm sorry to cause you trouble to point out a
silly mistake like this (I thought I would write as you suggested, but
it seems that I was not who I usually am.) The revised patch
attached.

Thanks for the new version. Per my comments up-thread here, you cannot
actually use PGC_BACKEND:
/messages/by-id/20180919061303.GB19808@paquier.xyz

This would break the case where this parameter is reloaded when a
session does not use a custom value for wal_sender_timeout. I have also
looked at all the code paths using wal_sender_timeout, and the change
looks safe to me. Please find attached an update, simplified, version.
Does that look fine to you?
--
Michael

Attachments:

wal-timeout-userset-michael.patchtext/x-diff; charset=us-asciiDownload

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e1073ac6d3..29bd0eca8c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3127,10 +3127,8 @@ include_dir 'conf.d'
         Terminate replication connections that are inactive longer
         than the specified number of milliseconds. This is useful for
         the sending server to detect a standby crash or network outage.
-        A value of zero disables the timeout mechanism.  This parameter
-        can only be set in
-        the <filename>postgresql.conf</filename> file or on the server command line.
-        The default value is 60 seconds.
+        A value of zero disables the timeout mechanism. The default value
+        is 60 seconds.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 77662aff7f..e9f542cfed 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2539,7 +2539,7 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
+		{"wal_sender_timeout", PGC_USERSET, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum time to wait for WAL replication."),
 			NULL,
 			GUC_UNIT_MS

#13

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

over 7 years ago

In reply to: Michael Paquier (#12)

RE: Changing the setting of wal_sender_timeout per standby

From: Michael Paquier [mailto:michael@paquier.xyz]

Thanks for the new version. Per my comments up-thread here, you cannot
actually use PGC_BACKEND:
/messages/by-id/20180919061303.GB19808@paquier.x
yz

This would break the case where this parameter is reloaded when a session
does not use a custom value for wal_sender_timeout. I have also looked
at all the code paths using wal_sender_timeout, and the change looks safe
to me. Please find attached an update, simplified, version.
Does that look fine to you?

Thank you, PGC_USERSET is off course fine to me. I thought PGC_BACKEND includes the capability of PGC_SIGHUP, because PGC_BACKEND is placed after PGC_SIGHUP in the definition of enum GucContext. That was a pitfall. I should have paid more attention.

I find it more user friendly to include a description somewhere that the user can tune the timeout per standby, like I added a tip in the description of wal_sender_timeout. I'm afraid users won't know whether and how to tune the setting per standby, as libpq's options parameter doesn't seem well-known in my experience.

Regards
Takayuki Tsunakawa

#14

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Tsunakawa, Takayuki (#13)

Re: Changing the setting of wal_sender_timeout per standby

On Fri, Sep 21, 2018 at 02:28:07AM +0000, Tsunakawa, Takayuki wrote:

I find it more user friendly to include a description somewhere that
the user can tune the timeout per standby, like I added a tip in the
description of wal_sender_timeout. I'm afraid users won't know
whether and how to tune the setting per standby, as libpq's options
parameter doesn't seem well-known in my experience.

But that does not apply to this single parameter, no? I would think
that a section in recovery.conf is more adapted. I can see that the
patch I proposed up-thread could be more precise though, so why not
adding at an extra paragraph? Here is an idea:
"For a cluster distributed across multiple geographic locations, using
a different value per location brings more flexibility in the cluster
management. A smaller value is useful for faster failure detection with
a standby having a low connection latency, and a larger value helps in
judging better the health of a standby if located in a remote location,
with a longer connection latency."
--
Michael

#15

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

over 7 years ago

In reply to: Michael Paquier (#14)

RE: Changing the setting of wal_sender_timeout per standby

From: Michael Paquier [mailto:michael@paquier.xyz]

But that does not apply to this single parameter, no? I would think that
a section in recovery.conf is more adapted. I can see that the patch I
proposed up-thread could be more precise though, so why not adding at an
extra paragraph? Here is an idea:
"For a cluster distributed across multiple geographic locations, using a
different value per location brings more flexibility in the cluster
management. A smaller value is useful for faster failure detection with
a standby having a low connection latency, and a larger value helps
judging better the health of a standby if located in a remote location,
with a longer connection latency."

That paragraph looks cool, thanks. Regarding where the paragraph should be, there are three candidate locations:

(1) 19.6.1. Sending Servers, wal_sender_timeout description
https://www.postgresql.org/docs/devel/static/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER

(2) 26.2.4. Setting Up a Standby Server
https://www.postgresql.org/docs/devel/static/warm-standby.html#STANDBY-SERVER-SETUP

(3) 27.3. Standby Server Settings, primary_conninfo description
https://www.postgresql.org/docs/devel/static/standby-settings.html

I think all of these are almost equally good. I chose (1) at first, and you chose (3). But (2) may be the best, because it's the natural place the user will see when configuring the standby, and it already contains an example of recovery.conf. We can add "options=''-c wal_sender_timeout=5000''" or something in that example. I'm OK with anyplace, but I recommend adding how to specify wal_sender_timeout in primary_conninfo, because libpq's options parameter may not be well-konown, and it's a bit difficult to figure out the need to enclose the value with double single-quotes.

Regards
Takayuki Tsunakawa

#16

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Tsunakawa, Takayuki (#15)

Re: Changing the setting of wal_sender_timeout per standby

On Fri, Sep 21, 2018 at 05:37:42AM +0000, Tsunakawa, Takayuki wrote:

I think all of these are almost equally good. I chose (1) at first,
and you chose (3). But (2) may be the best, because it's the natural
place the user will see when configuring the standby, and it already
contains an example of recovery.conf. We can add "options=''-c
wal_sender_timeout=5000''" or something in that example. I'm OK with
anyplace, but I recommend adding how to specify wal_sender_timeout in
primary_conninfo, because libpq's options parameter may not be
well-konown, and it's a bit difficult to figure out the need to
enclose the value with double single-quotes.

I think that the description of wal_sender_timeout and its properties
should remain where the parameter is defined, so (3) is not a good
option in my opinion. (2) has a point with the use of quotes actually,
so why not just mention options=''-c wal_sender_timeout=5000'' in the
example of recovery.conf as you suggest and call it a day, but keep the
paragraph I suggested in (1)?
--
Michael

#17

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

over 7 years ago

In reply to: Michael Paquier (#16)

RE: Changing the setting of wal_sender_timeout per standby

From: Michael Paquier [mailto:michael@paquier.xyz]

I think that the description of wal_sender_timeout and its properties should
remain where the parameter is defined, so (3) is not a good option in my
opinion. (2) has a point with the use of quotes actually, so why not just
mention options=''-c wal_sender_timeout=5000'' in the example of
recovery.conf as you suggest and call it a day, but keep the paragraph I
suggested in (1)?

Agreed. Sorry to cause you to take this long time for such a tiny patch...

Regards
Takayuki Tsunakawa

#18

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Tsunakawa, Takayuki (#17)

Re: Changing the setting of wal_sender_timeout per standby

On Fri, Sep 21, 2018 at 06:26:19AM +0000, Tsunakawa, Takayuki wrote:

Agreed. Sorry to cause you to take this long time for such a tiny
patch...

Well, that is arguing about how to shape things and agree on those,
which is not wasted time, far from that.
--
Michael

#19

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Tsunakawa, Takayuki (#17)

Re: Changing the setting of wal_sender_timeout per standby

On Fri, Sep 21, 2018 at 06:26:19AM +0000, Tsunakawa, Takayuki wrote:

Agreed.

Okay, I have pushed the patch with all your suggestions included.
--
Michael

#20

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Michael Paquier (#19)

Re: Changing the setting of wal_sender_timeout per standby

Hi,

On 2018-09-22 15:27:24 +0900, Michael Paquier wrote:

On Fri, Sep 21, 2018 at 06:26:19AM +0000, Tsunakawa, Takayuki wrote:

Agreed.

Okay, I have pushed the patch with all your suggestions included.

Have there been discussions about the security effects of this change?
Previously the server admin could control the timeout, which could
affect things like syncrep, after this it's not possible anymore. I
*think* that's ok, but it should be discussed.

Greetings,

Andres Freund

#21

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#20)

Re: Changing the setting of wal_sender_timeout per standby

Andres Freund <andres@anarazel.de> writes:

Have there been discussions about the security effects of this change?
Previously the server admin could control the timeout, which could
affect things like syncrep, after this it's not possible anymore. I
*think* that's ok, but it should be discussed.

Hm. An evil replication connection could already cause all sorts of
operational problems (and I'm not counting grabbing all your data).
Does this add anything much new in that line? It seems like the
effects would be at least in the same ballpark as not sending
hot-standby-feedback messages in a timely fashion.

regards, tom lane

#22

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Tom Lane (#21)

Re: Changing the setting of wal_sender_timeout per standby

On Sun, Sep 23, 2018 at 10:47:44AM -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Have there been discussions about the security effects of this change?
Previously the server admin could control the timeout, which could
affect things like syncrep, after this it's not possible anymore. I
*think* that's ok, but it should be discussed.

Hm. An evil replication connection could already cause all sorts of
operational problems (and I'm not counting grabbing all your data).
Does this add anything much new in that line? It seems like the
effects would be at least in the same ballpark as not sending
hot-standby-feedback messages in a timely fashion.

Well, a user able to spawn a WAL sender has replication rights, and it
is already entrusted a lot, particularly knowing that this user can run
BASE_BACKUP and fetch a superuser password which could be used for more
evil actions. So I am not sure what is actually worrying with this
change in this area, at least it seems to me that the bar is not really
lowered. An admin can still enforce a value if the client does not
specify it at connection time. What kind of attack would you see? An
evil user connecting with a insanely high value and delaying failure
detection, impacting the system performance?
--
Michael

#23

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

over 7 years ago

In reply to: Michael Paquier (#19)

RE: Changing the setting of wal_sender_timeout per standby

From: Michael Paquier [mailto:michael@paquier.xyz]

Okay, I have pushed the patch with all your suggestions included.

Thanks so much!

Regards
Takayuki Tsunakawa