Timeout parameters

Started by Nagaura, Ryohei, over 7 years ago; 120 messages; hackers
#1Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com

Hi, all.

I'd like to suggest introducing two parameters to handle client-server communication timeouts:
the "tcp_user_timeout" and "socket_timeout" parameters.

I implemented the "tcp_user_timeout" parameter
on both the backend and the frontend side.
This parameter enables us to
use the TCP_USER_TIMEOUT option on Linux.
If the parameter is specified, the process sets its value as the
TCP_USER_TIMEOUT socket option.
In my opinion, this option is needed for the following situation:
If the server can't return an ACK packet for a request from the client,
the client keeps retransmitting.
In this case the TCP keepalive option can't work.
Therefore we need the TCP_USER_TIMEOUT option.
Andrei Yahorau also refers to the necessity of this option in [1].
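
Illustration only (this is not the attached patch): a minimal Python sketch of setting the TCP_USER_TIMEOUT socket option on a client socket. The helper name set_tcp_user_timeout is hypothetical, and the sketch assumes a platform where Python exposes socket.TCP_USER_TIMEOUT (Linux); elsewhere it simply reports that nothing was set.

```python
import socket


def set_tcp_user_timeout(sock, timeout_ms):
    """Apply TCP_USER_TIMEOUT (in milliseconds) to sock if the platform has it.

    Returns True when the option was set, False when it was skipped.
    """
    # The constant is missing on platforms without this option.
    option = getattr(socket, "TCP_USER_TIMEOUT", None)
    if option is None or timeout_ms <= 0:
        # Assumption of this sketch: 0 means "keep the system default".
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, option, timeout_ms)
    except OSError:
        # The running kernel rejected the option.
        return False
    return True
```

A caller would apply this right after creating the socket, for example set_tcp_user_timeout(sock, 15000) for a 15-second limit on unacknowledged data.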

"socket_timeout" is the application layer timeout parameter
covering the time from when the frontend issues an SQL query
to when the frontend receives the execution result from the backend.
When this parameter is active and the timeout expires,
the frontend closes the socket.
It is useful for a client to be able to set the maximum time
to wait for an SQL statement.
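
Illustration only (not the attached patch): a rough Python sketch of the behaviour described above, where the client waits for a reply up to a deadline and closes the socket when the deadline passes. The function name wait_for_result is hypothetical, and a plain socket stands in for the libpq connection.

```python
import select
import socket
import time


def wait_for_result(sock, timeout_s):
    """Wait until sock is readable or timeout_s elapses.

    Returns True if data is ready. On timeout the socket is closed and
    False is returned, mirroring the described socket_timeout behaviour.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            sock.close()  # give up on this connection entirely
            return False
        readable, _, _ = select.select([sock], [], [], remaining)
        if readable:
            return True
```

Closing the socket is the drastic part of this design: the caller has to reconnect before issuing another statement.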

I'm waiting for your opinions or reviews.

[1]: /messages/by-id/OF4C8A68CE.A350F319-ON432582D0.0028A5FF-432582D0.002FEE28@iba.by

Best regards,
---------------------
Ryohei Nagaura

Attachments:

socket_timeout.patch (application/octet-stream, +45 -0)
TCP_USER_TIMEOUT_in_backend.patch (application/octet-stream, +52 -2)
TCP_USER_TIMEOUT_in_interface.patch (application/octet-stream, +123 -0)
#2AYahorau@ibagroup.eu
AYahorau@ibagroup.eu
In reply to: Nagaura, Ryohei (#1)
Re: Timeout parameters

Hello Ryohei,

I took a look at your changes and I have some notes.
I faced the same issue as you did. In my opinion a hanging client is
quite a critical case and it needs to be overcome.
TCP_USER_TIMEOUT option helps to overcome this problem and I agree with
you that it needs to be supported within PostgreSQL.

Nevertheless, it is necessary to take into account that the
TCP_USER_TIMEOUT option is supported by the Linux kernel only since 2.6.37. On
older kernel versions these changes will not take effect.
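
Illustration only: a small Python sketch of a version check against the 2.6.37 threshold mentioned above. The helper name kernel_at_least is hypothetical, and the release-string format (for example "5.15.0-91-generic") is an assumption about typical Linux systems.

```python
import re


def kernel_at_least(release, minimum=(2, 6, 37)):
    """Return True if a Linux release string is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        return False  # unknown format: treat as unsupported
    # A missing third component (e.g. "6.1") counts as 0.
    version = tuple(int(part) if part else 0 for part in match.groups())
    return version >= minimum
```

On a running system the argument could come from platform.release(). Attempting the setsockopt() call and handling its failure is likely the more robust test, since distribution kernels may backport features.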

I am not sure that the “socket_timeout” option you suggested should be
implemented.
I see that you have changed the pqWait() function. In my opinion this
somewhat contradicts the comment on this function:
“We also stop waiting and return if the kernel flags an exception
condition on the socket.” It means that this function should wait for some
condition (ready to read/write) forever. On the other hand, there is a
function pqWaitTimed() which does the same thing but with a timeout.
So, in my opinion such a change to this function can lead to a
backward-compatibility problem: the caller expects to wait
forever but is terminated unexpectedly by a timeout.

As far as I understand the PostgreSQL versioning policy, implementing a
new parameter requires modifying internal PostgreSQL structures. As
you know, that is not possible without stopping the service, which can be done
during migration to the new major version of PostgreSQL, which is expected
to be released in September 2019.

As a workaround I suggest using asynchronous command processing
https://www.postgresql.org/docs/10/static/libpq-async.html.

Best regards,
Andrei Yahorau

From: "Nagaura, Ryohei" <nagaura.ryohei@jp.fujitsu.com>
To: "'pgsql-hackers@postgresql.org'" <pgsql-hackers@postgresql.org>,
Cc: "AYahorau@ibagroup.eu" <AYahorau@ibagroup.eu>
Date: 23/10/2018 07:37
Subject: Timeout parameters

#3Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: AYahorau@ibagroup.eu (#2)
RE: Timeout parameters

Hi Andrei,

Thank you for your response.

> TCP_USER_TIMEOUT option helps to overcome this problem and I agree with
> you that it needs to be supported within PostgreSQL.

I'm glad that you agree.

> Nevertheless, it is necessary to take into account that the option
> TCP_USER_TIMEOUT is supported by Linux kernel starting since 2.6.37. In
> a lower kernel version these changes will not take effect.

Do you mean: how do we support Linux systems whose kernel version is older than 2.6.37?

> I am not sure that suggested by you “socket_timeout” option should be
> implemented.
> As a workaround I suggest using asynchronous command processing
> https://www.postgresql.org/docs/10/static/libpq-async.html

There are many applications implemented with the synchronous API
(e.g. PQexec()), so I think "socket_timeout" is useful.

Best regards,
---------------------
Ryohei Nagaura

#4Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#3)
RE: Timeout parameters

Hi Andrei,

First, please note that I may not be reachable during the following period:
November 1st to November 19th.

Second, I noticed a misunderstanding in my previous mail.

>> Nevertheless, it is necessary to take into account that the option
>> TCP_USER_TIMEOUT is supported by Linux kernel starting since 2.6.37.
>> In a lower kernel version these changes will not take effect.

> Does it mean how do we support Linux OS whose kernel version is less than
> 2.6.37?

I now understand that you were pointing out an issue in my implementation.
I'll rework the patch files when I return.

Finally, here is a rough outline of the test method for each parameter.
You can use the iptables command on Linux when testing TCP_USER_TIMEOUT.
You can use pg_sleep(seconds) in postgres when testing socket_timeout.
I'll write up the details after I return.

Please continue discussing socket_timeout.

Best regards,
---------------------
Ryohei Nagaura

#5Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#1)
Re: Timeout parameters

Hello Ryohei,

> I'd like to suggest introducing two parameters to handle client-server
> communication timeouts.

I'm generally fine with giving more access to low-level parameters to
users. However, I'm not sure I understand the use case you have that needs
these new extensions.

> "socket_timeout" parameter.

About the "socket_timeout" patch:

Patch does not apply cleanly because of a "trailing whitespace" in a
comment. Please remove spaces at the end of lines.

I'd like clarifications about the use case that needs this specific
feature, especially to understand why the server-side "statement_timeout"
setting is not good enough.

> "socket_timeout" is the application layer timeout parameter from when
> frontend issues SQL query to when frontend receives the execution result
> from backend. When this parameter is active and timeout occurs, frontend
> close the socket. It is a merit for client to set the maximum time to
> wait for SQL.

I think that there is some kind of a misnomer: this is not a socket-level
timeout, but a client-side query timeout, so it should be named
differently? I'm not sure how to name it, though.

I checked that the feature works at the psql level.

  sh> psql "socket_timeout=2"

  psql> SELECT 1;
  1

  psql> SELECT pg_sleep(3);
  timeout expired
  The connection to the server was lost. Attempting reset: Succeeded.

The timeout is per statement: if there are several statements, each gets
its own timeout, just like server-side "statement_timeout".

I think that the way it works is a little extreme: basically the
connection is aborted from within pqWait, and then restarted from scratch.
I would not expect that from such a feature. I'm not sure how to
cancel a query from libpq, but it is possible, e.g.:

  psql> SELECT pg_sleep(10);
  ^C Cancel request sent
  ERROR: canceling statement due to user request

  psql>

Would that be better? It probably assumes that the connection is okay.

The implementation looks awkward, because part of the logic of pqWaitTimed
is reimplemented in pqWait. Also, I do not understand the computation
of finish_time, which seems to assume that pqWait is going to be called
immediately after sending a query, which may or may not be the case; if
it is not, the time() call there does not mark the start of the statement.
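
Illustration only: one way to address the finish_time concern is to record the deadline when the query is sent and have any later wait ask how much time is left. The Python sketch below uses hypothetical names (StatementTimer, mark_sent, remaining) and is not how the patch or libpq is structured.

```python
import time


class StatementTimer:
    """Track a per-statement deadline that starts when the query is sent."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.finish_time = None

    def mark_sent(self):
        # Call right after the query has been written to the socket.
        self.finish_time = self.clock() + self.timeout_s

    def remaining(self):
        # Time left for any later wait call; never negative.
        if self.finish_time is None:
            raise RuntimeError("mark_sent() was not called")
        return max(0.0, self.finish_time - self.clock())
```

With this shape, a wait that starts late gets only the time that is still left, rather than a fresh full timeout.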

C style: all commas should be followed by a space (or newline).

There is no clear way to know about the value of the setting (SHOW, \set,
\pset...). Ok, this is already the case of other connection parameters.

Using "atoi" is a bad idea because it accepts trailing garbage and does
not detect overflows. Use the "parse_int_param" function instead.
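
Illustration only: the difference between atoi-style parsing and strict parsing, shown as a Python analogue. The name parse_int_strict is hypothetical and this is not the parse_int_param function itself; the 32-bit range is an assumption standing in for a C int.

```python
import re

INT_MIN, INT_MAX = -(2**31), 2**31 - 1


def parse_int_strict(text):
    """Return the integer in text, or None on garbage or 32-bit overflow."""
    # Only an optional sign and ASCII digits, with surrounding whitespace.
    if re.fullmatch(r"\s*[+-]?[0-9]+\s*", text) is None:
        return None  # rejects "15000abc", "", "1_000"
    value = int(text)
    if not INT_MIN <= value <= INT_MAX:
        return None  # would overflow a 32-bit int
    return value
```

By contrast, C's atoi("15000abc") yields 15000 and gives no way to detect either the trailing garbage or an out-of-range value.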

There are no tests.

There is no documentation.

--
Fabien.

#6Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Fabien COELHO (#5)
RE: Timeout parameters

Hi, Fabien.

Thank you for your review.
And I'm very sorry to have kept you waiting so long.

About "socket_timeout"

> I'm generally fine with giving more access to low-level parameters to users.
> However, I'm not sure I understand the use case you have that needs these
> new extensions.

If you face the following situation, this parameter will be needed.
1. The connection between the server and the client has been established normally.
2. A server process has received an SQL statement.
3. The server OS can return an ACK packet, but it takes time to execute the SQL statement
   or to return the result because the server process is very busy.
4. The client wants to close the connection while leaving the job to the server.
In this case, "statement_timeout" can't satisfy step 4.

> I think that there is some kind of a misnomer: this is not a socket-level
> timeout, but a client-side query timeout, so it should be named differently?

Yes, I think so.

> I'm not sure how to name it, though.

Me too.

> I think that the way it works is a little extreme, basically the connection
> is aborted from within pqWait, and then restarted from scratch.

> There is no clear way to know about the value of the setting (SHOW, \set,
> \pset...). Ok, this is already the case of other connection parameters.

If this parameter is considered necessary, I would like to discuss its design and optional features.
What do you think?
I'll correct the "socket_timeout" patch after that.

About "TCP_USER_TIMEOUT"
I revised the patches based on the previous feedback.
Would you review them, please?

> There are no tests.

Here are the test methods for TCP_USER_TIMEOUT.

Test of client-side TCP_USER_TIMEOUT:
[client operation]
1. Connect to the DB server.
$ psql "postgresql://USERNAME:PASSWORD@hostname:port/dbname?tcp_user_timeout=15000"
2. Get the port number with the following command:
postgres=# select inet_client_port();
3. Block the client port from another console on the client machine.
Replace "56750" with the number obtained in step 2.
$ iptables -I INPUT -p tcp --dport 56750 -j DROP
4. Run the following SQL:
postgres=# select pg_sleep(10);
5. TCP_USER_TIMEOUT works correctly if an error message is output to the console.
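
Illustration only: step 1 above passes the timeout as a query parameter of the connection URI. The Python sketch below assembles such a URI; the helper name build_conninfo_uri and the example credentials are hypothetical, and tcp_user_timeout is the parameter name proposed in this thread.

```python
from urllib.parse import quote, urlencode


def build_conninfo_uri(user, password, host, port, dbname, **params):
    """Assemble a postgresql:// URI with optional connection parameters."""
    # Percent-encode pieces that may contain reserved characters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"postgresql://{credentials}@{host}:{port}/{quote(dbname, safe='')}"
    query = urlencode(params)
    return f"{uri}?{query}" if query else uri
```

For example, build_conninfo_uri("alice", "s3cret", "db.example.com", 5432, "postgres", tcp_user_timeout=15000) produces a URI that can be passed to psql in quotes.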

Test of server-side TCP_USER_TIMEOUT:
[client operation]
1. Connect to the DB server.
2. Get the port number with the following command:
postgres=# select inet_client_port();
3. Set TCP_USER_TIMEOUT with the following command:
postgres=# set tcp_user_timeout=15000;
4. Run the following SQL:
postgres=# select pg_sleep(10);
5. Block the client port from another console.
Replace "56750" with the number obtained in step 2.
$ iptables -I INPUT -p tcp --dport 56750 -j DROP
[server operation]
6. Verify the logfile.
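
Illustration only: the iptables step has to be run as root and undone afterwards. The Python sketch below builds the argument list for the rule used above and for the matching delete; nothing is executed, and the helper name iptables_drop_args is hypothetical.

```python
def iptables_drop_args(port, delete=False):
    """Build the iptables argument list that drops inbound TCP to `port`.

    With delete=True the same rule specification is removed (-D) instead
    of inserted (-I), which restores traffic after the test.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    action = "-D" if delete else "-I"
    return ["iptables", action, "INPUT", "-p", "tcp",
            "--dport", str(port), "-j", "DROP"]
```

The returned list could be handed to subprocess.run() by a root-owned test script, with the port taken from the inet_client_port() result in step 2.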

> There is no documentation.

I made a documentation patch for TCP_USER_TIMEOUT.

Best regards,
---------------------
Ryohei Nagaura

Attachments:

document.patch (application/octet-stream, +32 -0)
TCP_backend.patch (application/octet-stream, +51 -1)
TCP_interface.patch (application/octet-stream, +54 -0)
#7Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#6)
RE: Timeout parameters

Hi,

There was an invisible space, so I removed it.
I registered the patch in the 2019-01 commitfest.

Best regards,
---------------------
Ryohei Nagaura

-----Original Message-----
From: Nagaura, Ryohei [mailto:nagaura.ryohei@jp.fujitsu.com]
Sent: Thursday, December 6, 2018 2:20 PM
To: 'Fabien COELHO' <coelho@cri.ensmp.fr>;
'pgsql-hackers@postgresql.org' <pgsql-hackers@postgresql.org>
Cc: Yahorau, A. (IBA) <AYahorau@ibagroup.eu>
Subject: RE: Timeout parameters


Attachments:

document_v2.patch (application/octet-stream, +32 -0)
TCP_backend_v2.patch (application/octet-stream, +51 -1)
TCP_interface_v2.patch (application/octet-stream, +54 -0)
#8Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#6)
RE: Timeout parameters

Hi, Fabien.

The next CF is about to start, so I would like to restart the discussion.

About "socket_timeout"
> If you face the following situation, this parameter will be needed.

If you feel that this situation can't happen or that the use case is too limited, please point that out.

>> I think that there is some kind of a misnomer: this is not a
>> socket-level timeout, but a client-side query timeout, so it should be
>> named differently?
> Yes, I think so.

>> I'm not sure how to name it, though.
> Me too.

Since I want the parameter name to reflect what is being monitored, let's decide on the name while designing the feature.

> I think that the way it works is a little extreme, basically the
> connection is aborted from within pqWait, and then restarted from scratch.

Which behavior seems uncomfortable, the abort or the restart?
Or both?

> There is no clear way to know about the value of the setting (SHOW,
> \set, \pset...).

That is a nice idea!
If it is decided to implement this parameter, I'll also add these features.

About "TCP_USER_TIMEOUT"
> I introduce the test methods of TCP_USER_TIMEOUT.

I could only come up with this test method, which uses "iptables".
Since this command can be run only by root, I didn't create a script.

Best regards,
---------------------
Ryohei Nagaura

#9Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#7)
RE: Timeout parameters

>> I'm not sure I understand the use case you have that needs these new
>> extensions.

> If you face the following situation, this parameter will be needed.
> 1. The connection between the server and the client has been established
> normally.
> 2. A server process has been received SQL statement.
> 3. The server OS can return an ack packet, but it takes time to execute
> the SQL statement
> Or return the result because the server process is very busy.
> 4. The client wants to close the connection while leaving the job to the
> server.
> In this case, "statement_timeout" can't satisfy at line 4.

Why?

ISTM that "leaving the job" to the server with a client-side connection
closed is basically an abort, no different from what server-side
"statement_timeout" already provides?

Also, from a client perspective, if you use statement_timeout, it
would time out, then the client would process the error and the connection
would be ready for the next query without needing to be re-created, which
is quite costly anyway? Also, if the server is busy, recreating a
connection is expensive so it won't help much, really?

So from your explanation above I must admit that I do not clearly
understand the use case for a client-side libpq-level SQL statement
timeout. I still need some convincing.

About the implementation, I'm wondering whether something simpler could be
done. Check how psql implements "ctrl-c" to abort a running query: it
seems that it sends a cancel message, no need to actually abort the
connection?

>> I think that there is some kind of a misnomer: this is not a
>> socket-level timeout, but a client-side query timeout, so it should be
>> named differently?
> Yes, I think so.

Hmmm.... "client_statement_timeout" maybe?

--
Fabien.

#10Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#7)
RE: Timeout parameters

Hello Ryohei,

About the patches: you are expected to send consistent patches, i.e. one
feature with its associated documentation, not two separate features and
another patch for documenting them.

--
Fabien.

#11Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Fabien COELHO (#9)
RE: Timeout parameters

Hi,

On Tue, Dec 25, 2018 at 2:59 AM, Fabien COELHO wrote:

>> 4. The client wants to close the connection while leaving the job to
>> the server.
>> In this case, "statement_timeout" can't satisfy at line 4.

> Why?
> ISTM that "leaving the job" to the server with a client-side connection
> closed is basically an abort, no different from what server-side
> "statement_timeout" already provides?

"while leaving the job to the server" means "while the server continues the job".
# Sorry for the unclear explanation.
I understand that "statement_timeout" won't do that.

> Also, from a client perspective, if you use statement_timeout, it would
> timeout, then the client would process the error and the connection would
> be ready for the next query without needing to be re-created, which is quite
> costly anyway? Also, if the server is busy, recreating an connection is
> expensive so it won't help much, really?

By the time the connection is re-created, the server may no longer be busy.
In that case, reconnecting isn't so costly.
Also, if a client does not have to execute the remaining queries immediately after the timeout,
it has the choice of waiting until the server is no longer busy.

> About the implementation, I'm wondering whether something simpler could
> be done. Check how psql implements "ctrl-c" to abort a running query: it
> seems that it sends a cancel message, no need to actually abort the
> connection?

This is my homework.

> Hmmm.... "client_statement_timeout" maybe?

I agree.

Best regards,
---------------------
Ryohei Nagaura

#12Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Fabien COELHO (#10)
RE: Timeout parameters

Hi Fabien.

On Wed, Dec 26, 2018 at 3:02 AM, Fabien COELHO wrote:

> About the patches: you are expected to send consistent patches, i.e. one
> feature with its associated documentation, not two separate features and
> another patch for documenting them.

Thank you for explaining this.
I rewrote the patches and attached them to this mail.

Best regards,
---------------------
Ryohei Nagaura

Attachments:

TCP_backend_v3.patch (application/octet-stream, +71 -0)
TCP_interface_v3.patch (application/octet-stream, +65 -0)
#13Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#11)
RE: Timeout parameters

Hello Ryohei,

>>> 4. The client wants to close the connection while leaving the job to
>>> the server.
>>> In this case, "statement_timeout" can't satisfy at line 4.

>> Why?
>> ISTM that "leaving the job" to the server with a client-side connection
>> closed is basically an abort, no different from what server-side
>> "statement_timeout" already provides?

> "while leaving the job to the server" means that "while the server continue the job".
> # Sorry for the inappropriate explanation.
> I understand that "statement_timeout" won't.

I still do not understand the use-case specifics: for me, aborting the
connection, or, more softly, cancelling the statement, will result in the
server stopping the statement, so the server does NOT "continue the job",
so I still do not see how it really differs from the server-side
statement_timeout setting.

--
Fabien.

#14Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Fabien COELHO (#13)
RE: Timeout parameters

From: Fabien COELHO [mailto:coelho@cri.ensmp.fr]

> I still do not understand the use-case specifics: for me, aborting the
> connection, or a softer cancelling the statement, will result in the
> server stopping the statement, so the server does NOT "continue the job",
> so I still do not see how it really differs from the server-side
> statement_timeout setting.

How about when the server is so saturated that statement_timeout cannot work? See SQLNET.SEND_TIMEOUT and SQLNET.RECV_TIMEOUT here:

https://docs.oracle.com/cd/E11882_01/network.112/e10835/sqlnet.htm#NETRF228

As these parameter names suggest, maybe we could use the SO_SNDTIMEO and SO_RCVTIMEO socket options for setsockopt() instead of using pqWaitTimed().
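
Illustration only: a Python sketch of the setsockopt() idea using the SO_RCVTIMEO and SO_SNDTIMEO socket options. The helper name set_send_recv_timeouts is hypothetical, and packing struct timeval as two native longs is an assumption that holds on common 64-bit Linux builds but not necessarily elsewhere.

```python
import socket
import struct


def set_send_recv_timeouts(sock, seconds):
    """Set kernel-level send and receive timeouts on a blocking socket."""
    whole = int(seconds)
    micros = int((seconds - whole) * 1_000_000)
    # struct timeval { long tv_sec; long tv_usec; } on the assumed platform.
    timeval = struct.pack("ll", whole, micros)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, timeval)
```

Once set, a blocking recv() that gets no data within the limit fails with EAGAIN/EWOULDBLOCK, which Python surfaces as an OSError subclass. Note that these timeouts apply to each individual send or receive call, not to the statement as a whole.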

To wrap up, the relevant parameters work like this:

* TCP keepalive and TCP user (retransmission) timeout: for network problems
* statement_timeout: for long-running queries
* socket_timeout (or send/recv_timeout): for saturated servers

FYI, PgJDBC has a parameter named socketTimeout:

https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters

Regards
Takayuki Tsunakawa

#15AYahorau@ibagroup.eu
AYahorau@ibagroup.eu
In reply to: Tsunakawa, Takayuki (#14)
RE: Timeout parameters

Hello,

> To wrap up, the relevant parameters work like this:
>
> * TCP keepalive and TCP user (retransmission) timeout: for network problems
> * statement_timeout: for long-running queries
> * socket_timeout (or send/recv_timeout): for saturated servers

Takayuki Tsunakawa, could you provide a fuller explanation of socket_timeout?
I do not quite understand in which cases this parameter is/can be
used.

I would like to add some more information about the TCP keepalive and
TCP_USER_TIMEOUT mechanisms:
1) Both mechanisms are used to terminate a socket connection in
case of network problems.
2) The TCP connection is terminated at the operating system level
(kernel) and not by the application.
3) TCP keepalive and TCP_USER_TIMEOUT work differently and complement each
other:
* The TCP keepalive mechanism works when a socket is idle
(no data is being exchanged).
* The TCP_USER_TIMEOUT mechanism works when a socket is active
(sending/receiving data).

So, TCP keepalive and TCP_USER_TIMEOUT provide full control over the network
state directly after a TCP socket is created and these parameters are applied
to it. Moreover, this control is delegated to the operating system (kernel)
if it supports such mechanisms.
If TCP_USER_TIMEOUT is not supported by PostgreSQL, it means that TCP
connections are only partly controlled by the operating system (kernel). In this
case pqWaitTimed() should be used on the application layer for connection
control in the data transmission phase.
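
Illustration only: a Python sketch that applies both mechanisms to one socket, keepalive for the idle state and TCP_USER_TIMEOUT for unacknowledged data. The helper name configure_tcp_liveness is hypothetical; the TCP_KEEP* and TCP_USER_TIMEOUT constants are Linux-oriented, so any that are missing or rejected are skipped.

```python
import socket


def configure_tcp_liveness(sock, idle_s, interval_s, count, user_timeout_ms):
    """Enable keepalive and TCP_USER_TIMEOUT; return the options applied."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = ["SO_KEEPALIVE"]
    wanted = [
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", count),         # failed probes before giving up
        ("TCP_USER_TIMEOUT", user_timeout_ms),  # ms for unacknowledged data
    ]
    for name, value in wanted:
        option = getattr(socket, name, None)
        if option is None:
            continue  # not available on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            continue  # rejected by the running kernel
        applied.append(name)
    return applied
```

The returned list lets a caller see which of the two mechanisms are actually in effect on the current system.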

In my opinion, there is no difference between the server and client
sides of a connection. To avoid confusion in the configuration it seems reasonable
to give this option the same name on the client and server sides.

To my mind, this description of tcp_user_timeout option is not correct
(See my comment about TCP_USER_TIMEOUT mechanism above).

+ <listitem>
+ <para>
+ Specify in milliseconds the time to disconnect to the client
+ when there is no ack packet from the client to the server's data transmission.
+ This parameter is supported on linux version 2.6.37 or later.
+ </para>
+ <note>
+ <para>
+ This parameter is not supported on Windows.
+ </para>
+ </note>
+ </listitem>

In my view it would be better to word the description as follows:

<listitem>
<para>
Define a wrapper for TCP_USER_TIMEOUT socket option of libpq connection.
</para>
<para>
Specifies the number of milliseconds after which a TCP connection can be
aborted by the operating system due to network problems while data is being
transmitted over this connection (sending/receiving). A value of 0
uses the system default. This parameter is supported only on systems that
support TCP_USER_TIMEOUT or an equivalent socket option, and on Windows;
on other systems, it must be zero. In sessions connected via a Unix-domain
socket, this parameter is ignored and always reads as zero.
</para>
<note>
<para>
This parameter is not supported on Windows, and must be zero.
</para>
<para>
To enable full control over the TCP connection use this option together with
keepalive.
</para>
</note>
</listitem>

Best regards,
Andrei Yahorau

From: "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>
To: 'Fabien COELHO' <coelho@cri.ensmp.fr>, "Nagaura, Ryohei"
<nagaura.ryohei@jp.fujitsu.com>,
Cc: "'pgsql-hackers@postgresql.org'" <pgsql-hackers@postgresql.org>,
"AYahorau@ibagroup.eu" <AYahorau@ibagroup.eu>
Date: 27/12/2018 11:26
Subject: RE: Timeout parameters

#16Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Fabien COELHO (#13)
RE: Timeout parameters

Hi,

Sorry for my late reply.

On Tue, Dec 25, 2018 at 7:40 PM, Fabien COELHO wrote:

> I still do not understand the use-case specifics: for me, aborting the
> connection, or a softer cancelling the statement, will result in the server
> stopping the statement, so the server does NOT "continue the job",

The server continues the job when the connection is aborted.
You can confirm this with the following procedure.
0. Connect to the server normally.
1. Run the following statement:
=# update tbl set clmn = (select to_number(pg_sleep(10)||'10','20'));
2. Kill the client-side psql process before the result returns.
3. Reconnect to the server and run a "select" query on tbl.

On Thu, Jan 10, 2019 at 3:14 AM, AYahorau@ibagroup.eu wrote:

> Takayuki Tsunakawa, could you provide wider explanation of socket_timeout?
> I'm little bit misunderstanding in which cases this parameter is/can be
> used.

The communication between the client and the server is normal.
The server is so busy that, although its OS can still return ACK packets, statement_timeout cannot work.
In this case, the client may wait for a very long time.
This parameter is effective for such clients.

> If TCP_USER_TIMEOUT is not supported by PostgreSQL, it means that TCP
> connection are partly controlled by the operation system (kernel). In this
> case pqWaitTimed() should be used on the application layer for connection
> control in data transmission phase.

In the current postgres, PQgetResult(), which is called by the synchronous command PQexec(), uses pqWait().
If the user wants synchronous communication, how would you specify a limit on the waiting time?
I think it makes sense to implement the timeout in pqWait(), which can otherwise make clients wait indefinitely.

> As for me It better to specify the description as follows:

Thank you for your comment.
I adopted your documentation in the current patch.

On Wed, Dec 26, 2018 at 8:25 PM, Tsunakawa, Takayuki wrote:

> To wrap up, the relevant parameters work like this:
>
> * TCP keepalive and TCP user (retransmission) timeout: for network problems
> * statement_timeout: for long-running queries
> * socket_timeout (or send/recv_timeout): for saturated servers

Thank you for your summary.

Best regards,
---------------------
Ryohei Nagaura

Attachments:

TCP_backend_v4.patch (application/octet-stream, +94 -0)
TCP_interface_v4.patch (application/octet-stream, +81 -0)
#17Michael Paquier
michael@paquier.xyz
In reply to: Nagaura, Ryohei (#16)
Re: Timeout parameters

On Mon, Jan 28, 2019 at 04:51:11AM +0000, Nagaura, Ryohei wrote:

Sorry for my late reply.

Moved to next CF per the latest updates: there is a patch with no
reviews for it.
--
Michael

#18Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Michael Paquier (#17)
RE: Timeout parameters

Hi Fabien,

Would you please review the TCP_USER_TIMEOUT patches first?
I want to avoid a situation where
the discussion of socket_timeout drags on
and the tcp_user_timeout patch is not committed in the next CF either.

On Mon, Feb 4, 2019 at 2:24 AM, Michael Paquier wrote:

Moved to next CF per the latest updates: there is a patch with no reviews for it.

Thank you.

Best regards,
---------------------
Ryohei Nagaura

#19Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#18)
RE: Timeout parameters

Hi,

I tried to re-read the whole thread.
Based on what I read, there are two proposed timeout parameters,
which I think can be discussed and committed separately:
(1) tcp_user_timeout
(2) tcp_socket_timeout (or suggested client_statement_timeout,
send_timeout/recv_timeout)

Regarding the use-case of each parameter, Tsunakawa-san briefly
explained them above. Quoting again:

* TCP keepalive and TCP user (retransmission) timeout: for network problems
* statement_timeout: for long-running queries
* socket_timeout (or send/recv_timeout): for saturated servers

The already existing statement_timeout mainly limits how long
each statement should run. [1]
However, even if statement_timeout is configured, it does not
cover cases where a network failure occurs,
so the application would not recover from the error.
Therefore, there's a need for these features, to meet the cases
that statement_timeout currently does not handle.

1) tcp_user_timeout parameter
As for user_timeout param, there seems to be a common agreement
with regards to its need.

Just minor nitpick:
+ char *tcp_user_timeout; /* TCP USER TIMEOUT */
I think that's unnecessary capitalization in the user timeout part.

The latest tcp_user_timeout patch seems to be almost in good shape:
the feedback about the doc (clearer description from Andrei)
and the code (whitespace, C-style, parse_int_param, handling old kernel
versions) has been addressed.

I think this can be "committed" separately when it's finalized.
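
For readers who have not used the option, here is a minimal sketch in C of what setting it on a socket looks like. This is not the code from the patch; the function name set_tcp_user_timeout() and its return convention are hypothetical. It assumes Linux, where TCP_USER_TIMEOUT is defined in <netinet/tcp.h> and takes an unsigned int in milliseconds, and it falls back gracefully where the option is not defined.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not taken from the patch): apply a user timeout,
 * in milliseconds, to a TCP socket.
 * Returns 0 on success, -1 if setsockopt() fails, and 1 if this platform
 * does not define TCP_USER_TIMEOUT at all.
 */
int
set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
    assert(fd >= 0);
#ifdef TCP_USER_TIMEOUT
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                   &timeout_ms, sizeof(timeout_ms)) < 0)
        return -1;
    return 0;
#else
    (void) timeout_ms;
    return 1;                   /* option unavailable; caller may just warn */
#endif
}
```

The #ifdef is one possible way to handle platforms or older kernels headers that lack the option; whether to warn or fail in that case is a design choice left to the caller here.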

2) tcp_socket_timeout parameter
On the other hand, there needs to be a further discussion and design
improvement with regards to the implementation of socket_timeout:
- whether (a) it should abort the connection from pqWait() or
other means, or (b) cancel the statement similar to how psql
does it as suggested by Fabien
- proper parameter name

Based on your description below, I agree with Fabien that it's somehow
the application/client side query timeout

"socket_timeout" is the application layer timeout parameter from when
frontend issues SQL query to when frontend receives the execution result
from backend. When this parameter is active and timeout occurs, frontend
close the socket. It is a merit for client to set the maximum time to
wait for SQL.

In PgJDBC, it serves two purposes though: query timeout and network problem
detection. The use of socketTimeout aborts the connection. [2]

The timeout value used for socket read operations. If reading from
the server takes longer than this value, the connection is closed.
This can be used as both a brute force global query timeout and a
method of detecting network problems. The timeout is specified in
seconds and a value of zero means that it is disabled.

Perhaps you could also clarify a bit more through documentation on how
socket_timeout handles the timeout differently from statement_timeout
and tcp_user_timeout.
Then we can decide on which parameter name is better once the
implementation becomes clearer.

[1]: https://www.postgresql.org/docs/devel/runtime-config-client.html
[2]: https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters

Regards,
Kirk Jamison

#20Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Jamison, Kirk (#19)
RE: Timeout parameters

From: Jamison, Kirk [mailto:k.jamison@jp.fujitsu.com]

1) tcp_user_timeout parameter
I think this can be "committed" separately when it's finalized.

Do you mean you've reviewed and tested the patch by simulating a communication failure in the way Nagaura-san suggested?

2) tcp_socket_timeout parameter
- whether (a) it should abort the connection from pqWait() or
other means, or
(b) cancel the statement similar to how psql
does it as suggested by Fabien

We have no choice but to terminate the connection, because we can't tell whether we can recover from the problem and continue to use the connection (e.g. long-running query) or not (permanent server or network failure).

Regarding the place, pqWait() is the best (and possibly only) place. The purpose of this feature is to avoid waiting for response from the server forever (or too long) in any case, as a last resort.

Oracle has similar parameters called SQLNET.RECV_TIMEOUT and SQLNET.SEND_TIMEOUT. From those names, I guess they use the SO_RCVTIMEO and SO_SNDTIMEO socket options. However, we can't use them because we use non-blocking sockets and poll(), while SO_RCVTIMEO and SO_SNDTIMEO do not have an effect on poll():

[excerpt from "man 7 socket"]
--------------------------------------------------
SO_RCVTIMEO and SO_SNDTIMEO
Specify the receiving or sending timeouts until reporting an
error. The argument is a struct timeval. If an input or output
function blocks for this period of time, and data has been sent
or received, the return value of that function will be the
amount of data transferred; if no data has been transferred and
the timeout has been reached then -1 is returned with errno set
to EAGAIN or EWOULDBLOCK just as if the socket was specified to
be non-blocking. If the timeout is set to zero (the default)
then the operation will never timeout. Timeouts only have
effect for system calls that perform socket I/O (e.g., read(2),
recvmsg(2), send(2), sendmsg(2)); timeouts have no effect for
select(2), poll(2), epoll_wait(2), etc.
--------------------------------------------------
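
The difference described in the excerpt can be sketched in C as follows. This is only an illustration under the assumption of a POSIX system; the helper names set_recv_timeout() and poll_readable() are hypothetical and do not come from libpq or the patch. The first helper sets SO_RCVTIMEO, which limits a blocking recv(); the second waits with poll(), where only the timeout argument passed to poll() limits the wait.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: set a receive timeout, in milliseconds, on a socket.
 * This affects blocking I/O calls such as recv(), not poll().
 * Returns 0 on success, -1 on failure.
 */
int
set_recv_timeout(int fd, int timeout_ms)
{
    struct timeval tv;

    assert(timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/*
 * Hypothetical helper: wait until fd is readable using poll().
 * Only timeout_ms limits this wait; SO_RCVTIMEO is not consulted.
 * For brevity, a retry after EINTR starts the full timeout again.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int
poll_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    do
    {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    return (rc > 0) ? 1 : rc;
}
```

With a blocking socket and no incoming data, recv() after set_recv_timeout() fails with EAGAIN or EWOULDBLOCK once the timeout passes, as the manual describes, while a poll()-based wait has to carry its own timeout.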

- proper parameter name

Based on your description below, I agree with Fabien that it's somehow
the application/client side query timeout

I think the name is good because it indicates the socket-level timeout. That's just like PgJDBC and Oracle, and I didn't feel strange when I read their manuals.

Perhaps you could also clarify a bit more through documentation on how
socket_timeout handles the timeout differently from statement_timeout
and tcp_user_timeout.

Maybe. Could you suggest a good description?

Regards
Takayuki Tsunakawa

#21Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Jamison, Kirk (#19)
#22Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#21)
#23Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#20)
#24Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#23)
#25Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#20)
#26MikalaiKeida@ibagroup.eu
MikalaiKeida@ibagroup.eu
In reply to: Jamison, Kirk (#25)
#27Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: MikalaiKeida@ibagroup.eu (#26)
#28Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Jamison, Kirk (#25)
#29Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: MikalaiKeida@ibagroup.eu (#26)
#30Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#28)
#31Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Jamison, Kirk (#30)
#32Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#31)
#33Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#32)
#34Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Jamison, Kirk (#33)
#35Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Jamison, Kirk (#33)
#36Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#35)
#37Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Jamison, Kirk (#36)
#38Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#37)
#39Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#38)
#40Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#38)
#41Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fabien COELHO (#40)
#42Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Fabien COELHO (#41)
#43Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Fabien COELHO (#40)
#44Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#43)
#45Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Fabien COELHO (#44)
#46MikalaiKeida@ibagroup.eu
MikalaiKeida@ibagroup.eu
In reply to: Nagaura, Ryohei (#45)
#47Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: MikalaiKeida@ibagroup.eu (#46)
#48MikalaiKeida@ibagroup.eu
MikalaiKeida@ibagroup.eu
In reply to: Nagaura, Ryohei (#47)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Nagaura, Ryohei (#45)
#50Robert Haas
robertmhaas@gmail.com
In reply to: Nagaura, Ryohei (#38)
#51Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Robert Haas (#49)
#52Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: MikalaiKeida@ibagroup.eu (#48)
#53Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Robert Haas (#50)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Fabien COELHO (#51)
#55Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Robert Haas (#54)
#56Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#1)
#57Robert Haas
robertmhaas@gmail.com
In reply to: Fabien COELHO (#55)
#58Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Fabien COELHO (#56)
#59Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Robert Haas (#57)
#60Robert Haas
robertmhaas@gmail.com
In reply to: Tsunakawa, Takayuki (#59)
#61Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Robert Haas (#60)
#62Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tsunakawa, Takayuki (#61)
#63Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tsunakawa, Takayuki (#58)
#64Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Kyotaro Horiguchi (#62)
#65Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Fabien COELHO (#63)
#66MikalaiKeida@ibagroup.eu
MikalaiKeida@ibagroup.eu
In reply to: Fabien COELHO (#63)
#67Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: MikalaiKeida@ibagroup.eu (#66)
#68Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: MikalaiKeida@ibagroup.eu (#66)
#69Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tsunakawa, Takayuki (#67)
#70MikalaiKeida@ibagroup.eu
MikalaiKeida@ibagroup.eu
In reply to: Tsunakawa, Takayuki (#67)
#71Robert Haas
robertmhaas@gmail.com
In reply to: Tsunakawa, Takayuki (#61)
#72Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#71)
#73Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Robert Haas (#72)
#74Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Robert Haas (#71)
#75Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: MikalaiKeida@ibagroup.eu (#70)
#76MikalaiKeida@ibagroup.eu
MikalaiKeida@ibagroup.eu
In reply to: Tsunakawa, Takayuki (#75)
#77Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: MikalaiKeida@ibagroup.eu (#76)
#78MikalaiKeida@ibagroup.eu
MikalaiKeida@ibagroup.eu
In reply to: Tsunakawa, Takayuki (#77)
#79Fabien COELHO
coelho@cri.ensmp.fr
In reply to: MikalaiKeida@ibagroup.eu (#66)
#80Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: MikalaiKeida@ibagroup.eu (#78)
#81Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Fabien COELHO (#79)
#82Robert Haas
robertmhaas@gmail.com
In reply to: Jamison, Kirk (#81)
#83Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Robert Haas (#82)
#84Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#83)
#85Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#84)
#86Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Jamison, Kirk (#85)
#87Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#86)
#88Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#86)
#89Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Fabien COELHO (#88)
#90Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Fabien COELHO (#88)
#91Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nagaura, Ryohei (#89)
#92Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Kyotaro Horiguchi (#91)
#93Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#92)
#94Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Kyotaro Horiguchi (#91)
#95Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#92)
#96Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#94)
#97Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#96)
#98Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#97)
#99Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Jamison, Kirk (#98)
#100Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#99)
#101Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#100)
#102Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#100)
#103Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#102)
#104Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Jamison, Kirk (#103)
#105Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Jamison, Kirk (#103)
#106Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#105)
#107Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#106)
#108Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nagaura, Ryohei (#107)
#109Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Kyotaro Horiguchi (#108)
#110Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#109)
#111Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Nagaura, Ryohei (#109)
#112Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Fabien COELHO (#111)
#113Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#112)
#114Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Jamison, Kirk (#113)
#115Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Nagaura, Ryohei (#114)
#116Michael Paquier
michael@paquier.xyz
In reply to: Jamison, Kirk (#115)
#117Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Michael Paquier (#116)
#118Michael Paquier
michael@paquier.xyz
In reply to: Tsunakawa, Takayuki (#117)
#119Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Michael Paquier (#118)
#120Nagaura, Ryohei
nagaura.ryohei@jp.fujitsu.com
In reply to: Michael Paquier (#118)