Add client connection check during the execution of the query

Started by Sergey Cherkashin over 7 years ago, 54 messages
#1 Sergey Cherkashin
s.cherkashin@postgrespro.ru

This patch adds verification of the connection with the client during
the execution of an SQL query. The feature is enabled via the GUC
variable ‘client_connection_check_interval’. The default check interval
is 1 second; setting ‘client_connection_check_interval’ to 0 disables
the check.
The feature is useful in cases where, during the execution of a very
long query, the client suddenly terminates the connection: it allows
the backend to cancel further execution of the query and free server
resources.

Attachments:

Add_client_connection_check.patch (text/x-diff, +158 -1)
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sergey Cherkashin (#1)
Re: Add client connection check during the execution of the query

s.cherkashin@postgrespro.ru writes:

This patch adds verification of the connection with the client during
the execution of the SQL query. The feature enables using the GUC
variable ‘client_connection_check_interval’. The default check interval
is 1 second. If you set the value of ‘client_connection_check_interval’
to 0, then the check will not be performed.

I took a quick look through this.

* It won't even compile on non-Linux platforms, because MSG_DONTWAIT
is a Linux-ism. Perhaps that can be replaced by putting the client
socket into nonblock mode, but I'm not very sure that that'll work
(especially when using OpenSSL or another TLS implementation).

* I'm not convinced that this will reliably detect client connection loss.
AFAICS, if there is any unread data pending, it'd report that all is well
even if the client dropped off the net after sending that data. It's hard
to evaluate how likely such a situation is, but one really obvious case
is that the client might've sent an 'X' message to try to close the
connection gracefully. Also, use of TLS would again make things much
harder to reason about, because the TLS layer may send or receive data
that we don't know about.

* The management of the pending timeout interrupt seems like a mess.
Why did you involve ProcessInterrupts in that? It seems likely to queue
extra timeouts at random times due to unrelated interrupts causing that
bit of code to run, and/or cause weird gaps in the timeout intervals due
to not being run promptly. I'd be inclined to set this up so that the
timeout handler itself re-queues the timeout (I think that will work, or
if not, we should probably fix timeout.c so that it can).

* BTW, I am not on board with making this enabled-by-default.

This does seem like possibly a useful option if we can make it
work portably/reliably, but I don't have very high hopes for that.

regards, tom lane

#3 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: Add client connection check during the execution of the query

Hi,

On 2019-01-13 18:05:39 -0500, Tom Lane wrote:

s.cherkashin@postgrespro.ru writes:

This patch adds verification of the connection with the client during
the execution of the SQL query. The feature enables using the GUC
variable ‘client_connection_check_interval’. The default check interval
is 1 second. If you set the value of ‘client_connection_check_interval’
to 0, then the check will not be performed.

I took a quick look through this.

* It won't even compile on non-Linux platforms, because MSG_DONTWAIT
is a Linux-ism. Perhaps that can be replaced by putting the client
socket into nonblock mode, but I'm not very sure that that'll work
(especially when using OpenSSL or another TLS implementation).

* I'm not convinced that this will reliably detect client connection loss.
AFAICS, if there is any unread data pending, it'd report that all is well
even if the client dropped off the net after sending that data. It's hard
to evaluate how likely such a situation is, but one really obvious case
is that the client might've sent an 'X' message to try to close the
connection gracefully. Also, use of TLS would again make things much
harder to reason about, because the TLS layer may send or receive data
that we don't know about.

* The management of the pending timeout interrupt seems like a mess.
Why did you involve ProcessInterrupts in that? It seems likely to queue
extra timeouts at random times due to unrelated interrupts causing that
bit of code to run, and/or cause weird gaps in the timeout intervals due
to not being run promptly. I'd be inclined to set this up so that the
timeout handler itself re-queues the timeout (I think that will work, or
if not, we should probably fix timeout.c so that it can).

* BTW, I am not on board with making this enabled-by-default.

This does seem like possibly a useful option if we can make it
work portably/reliably, but I don't have very high hopes for that.

Given that nothing happened since this message, and the commitfest is
ending, I'm going to mark this as returned with feedback.

Greetings,

Andres Freund

#4 Sergey Cherkashin
s.cherkashin@postgrespro.ru
In reply to: Tom Lane (#2)
Re: Add client connection check during the execution of the query

The purpose of this patch is to stop the execution of long-running
queries when the client disconnects. In most cases, the client must
wait for a response from the server before sending new data, which
means there should be no unread data left on the socket and we will be
able to detect a broken connection.
Perhaps exceptions are possible, but I could not think of such a use
case (except COPY). I would be grateful if someone could offer such
cases or their solutions.
I added a test for the GUC variable when the client connects via SSL,
but I'm not sure that this test is really necessary.

Best regards,
Sergey Cherkashin.

Attachments:

Add_client_connection_check_v2.patch (text/x-diff, +211 -1)
#5 Thomas Munro
thomas.munro@gmail.com
In reply to: Sergey Cherkashin (#4)
Re: Add client connection check during the execution of the query

On Sat, Feb 9, 2019 at 6:16 AM <s.cherkashin@postgrespro.ru> wrote:

The purpose of this patch is to stop the execution of continuous
requests in case of a disconnection from the client. In most cases, the
client must wait for a response from the server before sending new data
- which means there should not remain unread data on the socket and we
will be able to determine a broken connection.
Perhaps exceptions are possible, but I could not think of such a use
case (except COPY). I would be grateful if someone could offer such
cases or their solutions.
I added a test for the GUC variable when the client connects via SSL,
but I'm not sure that this test is really necessary.

Hello Sergey,

This seems like a reasonable idea to me. There is no point in running
a monster 24 hour OLAP query if your client has gone away. It's using
MSG_PEEK which is POSIX, and I can't immediately think of any reason
why it's not safe to try to peek at a byte in that socket at any time.
Could you add a comment to explain why you have !PqCommReadingMsg &&
!PqCommBusy? The tests pass on a couple of different Unixoid OSes I
tried. Is it really necessary to do a bunch of IO and busy CPU work
in $long_query? pg_sleep(60) can do the job, because it includes a
standard CHECK_FOR_INTERRUPTS/latch loop that will loop around on
SIGALRM. I just tested that your patch correctly interrupts
pg_sleep() if I kill -9 my psql process. Why did you call the timeout
SKIP_CLIENT_CHECK_TIMEOUT (I don't understand the "SKIP" part)? Why
not CLIENT_CONNECTION_CHECK_TIMEOUT or something?

I wonder if it's confusing to users that you get "connection to client
lost" if the connection goes away while running a query, but nothing
if the connection goes away without saying goodbye ("X") while idle.

The build fails on Windows. I think it's because
src/tools/msvc/Mkvcbuild.pm is trying to find something to compile
under src/test/modules/connection, and I think the solution is to add
that to the variable @contrib_excludes. (I wonder if that script
should assume there is nothing to build instead of dying with "Could
not determine contrib module type for connection", otherwise every
Unix hacker is bound to get this wrong every time.)

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.45820

Aside from that problem, my Windows CI building thing isn't smart
enough to actually run those extra tests yet, so I don't know if it
actually works on that platform yet (I really need to teach that thing
to use the full buildfarm scripts...)

--
Thomas Munro
https://enterprisedb.com

#6 Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#5)
Re: Add client connection check during the execution of the query

On Fri, Jul 5, 2019 at 5:36 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Feb 9, 2019 at 6:16 AM <s.cherkashin@postgrespro.ru> wrote:

The purpose of this patch is to stop the execution of continuous
requests in case of a disconnection from the client. In most cases, the
client must wait for a response from the server before sending new data
- which means there should not remain unread data on the socket and we
will be able to determine a broken connection.
Perhaps exceptions are possible, but I could not think of such a use
case (except COPY). I would be grateful if someone could offer such
cases or their solutions.
I added a test for the GUC variable when the client connects via SSL,
but I'm not sure that this test is really necessary.

[review]

Email to Sergey is bouncing back. I've set this to "Waiting on
author" in the Commitfest app.

--
Thomas Munro
https://enterprisedb.com

#7 Tatsuo Ishii
ishii@postgresql.org
In reply to: Sergey Cherkashin (#4)
Re: Add client connection check during the execution of the query

The purpose of this patch is to stop the execution of continuous
requests in case of a disconnection from the client.

Pgpool-II already does this by sending a parameter status message to
the client. It is expected that clients are always prepared to receive
the parameter status message. This way I believe we could reliably
detect whether the connection to the client is broken.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#8 Tatsuo Ishii
ishii@postgresql.org
In reply to: Thomas Munro (#5)
Re: Add client connection check during the execution of the query

This seems like a reasonable idea to me. There is no point in running
a monster 24 hour OLAP query if your client has gone away. It's using
MSG_PEEK which is POSIX, and I can't immediately think of any reason
why it's not safe to try to peek at a byte in that socket at any time.

I am not familiar with Windows but I accidentally found this article
written by Microsoft:

https://support.microsoft.com/en-us/help/192599/info-avoid-data-peeking-in-winsock

It seems using MSG_PEEK is not recommended by Microsoft.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#9 Thomas Munro
thomas.munro@gmail.com
In reply to: Tatsuo Ishii (#8)
Re: Add client connection check during the execution of the query

On Fri, Jul 5, 2019 at 6:42 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

This seems like a reasonable idea to me. There is no point in running
a monster 24 hour OLAP query if your client has gone away. It's using
MSG_PEEK which is POSIX, and I can't immediately think of any reason
why it's not safe to try to peek at a byte in that socket at any time.

I am not familiar with Windows but I accidentally found this article
written by Microsoft:

https://support.microsoft.com/en-us/help/192599/info-avoid-data-peeking-in-winsock

It seems using MSG_PEEK is not recommended by Microsoft.

Hmm, interesting. Using it very infrequently just as a way to detect
that the other end has gone away doesn't seem too crazy based on
anything in that article though, does it? What they're saying
actually applies to every operating system, not just Windows, AFAICS.
Namely, don't use MSG_PEEK frequently because it's a syscall and takes
locks in the kernel, and don't use it to wait for full messages to
arrive, or you might effectively deadlock if internal buffers are
full. But Sergey's patch only uses it to check if we could read 1
single byte, and does so very infrequently (the GUC should certainly
be set to at least many seconds).

What else could we do? Assuming the kernel actually knows the
connection has gone away, the WaitEventSetWait() interface is no help
on its own, I think, because it'll just tell you the socket is ready
for reading when it's closed; you still have to actually try to read
to distinguish closed from a data byte.

I tried this patch using a real network with two machines. I was able
to get the new "connection to client lost" error by shutting down a
network interface (effectively yanking a cable), but only with TCP
keepalive configured. That's not too surprising; without that and
without trying to write, there is no way for the kernel to know that
the other end has gone.

--
Thomas Munro
https://enterprisedb.com

#10 Thomas Munro
thomas.munro@gmail.com
In reply to: Tatsuo Ishii (#7)
Re: Add client connection check during the execution of the query

On Fri, Jul 5, 2019 at 6:28 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

The purpose of this patch is to stop the execution of continuous
requests in case of a disconnection from the client.

Pgpool-II already does this by sending a parameter status message to
the client. It is expected that clients are always prepared to receive
the parameter status message. This way I believe we could reliably
detect that the connection to the client is broken or not.

Hmm. If you send a message, it's basically application-level
keepalive. But it's a lot harder to be sure that the protocol and
socket are in the right state to insert a message at every possible
CHECK_FOR_INTERRUPT() location. Sergey's proposal of recv(MSG_PEEK)
doesn't require any knowledge of the protocol at all, though it
probably does need TCP keepalive to be configured to be useful for
remote connections.

--
Thomas Munro
https://enterprisedb.com

#11 Stas Kelvich
s.kelvich@postgrespro.ru
In reply to: Thomas Munro (#10)
Re: Add client connection check during the execution of the query

On 5 Jul 2019, at 11:46, Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Jul 5, 2019 at 6:28 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

The purpose of this patch is to stop the execution of continuous
requests in case of a disconnection from the client.

Pgpool-II already does this by sending a parameter status message to
the client. It is expected that clients are always prepared to receive
the parameter status message. This way I believe we could reliably
detect that the connection to the client is broken or not.

Hmm. If you send a message, it's basically application-level
keepalive. But it's a lot harder to be sure that the protocol and
socket are in the right state to insert a message at every possible
CHECK_FOR_INTERRUPT() location. Sergey's proposal of recv(MSG_PEEK)
doesn't require any knowledge of the protocol at all, though it
probably does need TCP keepalive to be configured to be useful for
remote connections.

Well, indeed, in the case of a cable disconnect the only way to detect
it with the proposed approach is to have TCP keepalive. However, if
the disconnection happens because the client application shut down,
the client OS should itself properly close the connection, and
therefore this patch will detect such a situation without keepalives
configured.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#12 Thomas Munro
thomas.munro@gmail.com
In reply to: Stas Kelvich (#11)
Re: Add client connection check during the execution of the query

On Sat, Jul 6, 2019 at 12:27 AM Stas Kelvich <s.kelvich@postgrespro.ru> wrote:

Well, indeed, in the case of a cable disconnect the only way to detect
it with the proposed approach is to have TCP keepalive. However, if
the disconnection happens because the client application shut down,
the client OS should itself properly close the connection, and
therefore this patch will detect such a situation without keepalives
configured.

Yeah.

+1 for this patch, with a few adjustments including making the test
use pg_sleep() as mentioned. It does something useful, namely
cancelling very long running queries sooner if the client has gone
away instead of discovering that potentially much later when sending a
response. It does so with a portable kernel interface (though we
haven't heard from a Windows tester), and I think it's using it in a
safe way (we're not doing the various bad things you can do with
MSG_PEEK, and the fd is expected to be valid for the process's
lifetime, and the socket is always in non-blocking mode*, so I don't
think there is any bad time for pg_check_client_connection() to run).
It reuses the existing timer infrastructure so there isn't really any
new overhead. One syscall every 10 seconds or whatever at the next
available CFI is basically nothing. On its own, this patch will
reliably detect clients that closed abruptly or exited/crashed (so
the client kernel sends a FIN packet). In combination with TCP
keepalive, it'll also detect clients that went away because the
network or client kernel ceased to exist.

*There are apparently no callers of pg_set_block(), so if you survived
pq_init() you have a non-blocking socket. If you're on Windows, the
code always sets the magic pgwin32_noblock global flag before trying
to peek. I wondered if it's OK that the CFI would effectively clobber
that with 0 on its way out, but that seems to be OK because every
place in the code that sets that flag does so immediately before an IO
operation without a CFI in between. As the comment in pgstat.c says
"This is extremely broken and should be fixed someday.". I wonder if
we even need that flag at all now that all socket IO is non-blocking.

--
Thomas Munro
https://enterprisedb.com

#13 Tatsuo Ishii
ishii@postgresql.org
In reply to: Thomas Munro (#12)
Re: Add client connection check during the execution of the query

Yeah.

+1 for this patch, with a few adjustments including making the test
use pg_sleep() as mentioned. It does something useful, namely
cancelling very long running queries sooner if the client has gone
away instead of discovering that potentially much later when sending a
response. It does so with a portable kernel interface (though we
haven't heard from a Windows tester), and I think it's using it in a
safe way (we're not doing the various bad things you can do with
MSG_PEEK, and the fd is expected to be valid for the process's
lifetime, and the socket is always in non-blocking mode*, so I don't
think there is any bad time for pg_check_client_connection() to run).
It reuses the existing timer infrastructure so there isn't really any
new overhead. One syscall every 10 seconds or whatever at the next
available CFI is basically nothing. On its own, this patch will
reliably detect clients that closed abruptly or exited/crashed (so
the client kernel sends a FIN packet). In combination with TCP
keepalive, it'll also detect clients that went away because the
network or client kernel ceased to exist.

*There are apparently no callers of pg_set_block(), so if you survived
pq_init() you have a non-blocking socket. If you're on Windows, the
code always sets the magic pgwin32_noblock global flag before trying
to peek. I wondered if it's OK that the CFI would effectively clobber
that with 0 on its way out, but that seems to be OK because every
place in the code that sets that flag does so immediately before an IO
operation without a CFI in between. As the comment in pgstat.c says
"This is extremely broken and should be fixed someday.". I wonder if
we even need that flag at all now that all socket IO is non-blocking.

I have looked into the patch and tested a little bit.

First of all, I had to grab a February snapshot to test the patch,
because it does not apply to the current HEAD. I noticed that there
is some confusion in the doc and code regarding what the new
configuration parameter means. According to the doc:

+        Default value is <literal>zero</literal> - it disables connection
+        checks, so the backend will detect client disconnection only when trying
+        to send a response to the query.

But guc.c comment says:

+			gettext_noop("Sets the time interval for checking connection with the client."),
+			gettext_noop("A value of -1 disables this feature. Zero selects a suitable default value."),

Probably the doc is correct since the actual code does so.

Also I found this in postgresql.conf default:

#client_connection_check_interval = 1000 # set time interval between

So here the default value seems to be 1000. If so, guc.c should be
adjusted and the doc should be changed accordingly. I am not sure.

Next I have tested the patch using standard pgbench.

With the feature enabled with 1000ms check interval:
$ pgbench -c 10 -T 300 -S test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 300 s
number of transactions actually processed: 19347995
latency average = 0.155 ms
tps = 64493.278581 (including connections establishing)
tps = 64493.811945 (excluding connections establishing)

Without the feature (client-connection-check-interval = 0)
$ pgbench -c 10 -T 300 -S test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 300 s
number of transactions actually processed: 20314812
latency average = 0.148 ms
tps = 67715.993428 (including connections establishing)
tps = 67717.251843 (excluding connections establishing)

So the performance is about 5% down with the feature enabled in this
case. To me, a 5% drop is not negligible. Probably we should warn
about this in the doc.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#14 Thomas Munro
thomas.munro@gmail.com
In reply to: Tatsuo Ishii (#13)
Re: Add client connection check during the execution of the query

On Thu, Jul 18, 2019 at 3:19 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

So the performance is about 5% down with the feature enabled in this
case. To me, a 5% drop is not negligible. Probably we should warn
about this in the doc.

Yeah, the timer logic is wrong. I didn't have time to look into it
but with truss/strace for some reason I see 3 setitimer() syscalls for
every query, but I think this doesn't even need to set the timer for
every query.

--
Thomas Munro
https://enterprisedb.com

#15 Tatsuo Ishii
ishii@postgresql.org
In reply to: Thomas Munro (#14)
Re: Add client connection check during the execution of the query

Yeah, the timer logic is wrong. I didn't have time to look into it
but with truss/strace for some reason I see 3 setitimer() syscalls for
every query, but I think this doesn't even need to set the timer for
every query.

Hum. I see 2 setitimer() calls, instead of 3.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#15)
Re: Add client connection check during the execution of the query

Tatsuo Ishii <ishii@sraoss.co.jp> writes:

Yeah, the timer logic is wrong. I didn't have time to look into it
but with truss/strace for some reason I see 3 setitimer() syscalls for
every query, but I think this doesn't even need to set the timer for
every query.

Hum. I see 2 setitimer() calls, instead of 3.

src/backend/utils/misc/timeout.c is not really designed for there
to be timeouts that persist across multiple queries. It can probably
be made better, but this patch doesn't appear to have touched any of
that logic.

To point to just one obvious problem, the error recovery path
(sigsetjmp block) in postgres.c does

disable_all_timeouts(false);

which cancels *all* timeouts. Probably don't want that behavior
anymore.

I think the issue you're complaining about may have to do with
the fact that if there's no statement timeout active, both
enable_statement_timeout and disable_statement_timeout will
call "disable_timeout(STATEMENT_TIMEOUT, false);". That does
nothing, as desired, if there are no other active timeouts ...
but if there is one, ie the client_connection timeout, we'll
end up calling schedule_alarm which will call setitimer even
if the desired time-of-nearest-timeout hasn't changed.
That was OK behavior for the set of timeouts that the code
was designed to handle, but we're going to need to be smarter
now.

regards, tom lane

#17 Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#16)
Re: Add client connection check during the execution of the query

On Thu, Jul 18, 2019 at 5:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Tatsuo Ishii <ishii@sraoss.co.jp> writes:

Yeah, the timer logic is wrong. I didn't have time to look into it
but with truss/strace for some reason I see 3 setitimer() syscalls for
every query, but I think this doesn't even need to set the timer for
every query.

Hum. I see 2 setitimer() calls, instead of 3.

src/backend/utils/misc/timeout.c is not really designed for there
to be timeouts that persist across multiple queries. It can probably
be made better, but this patch doesn't appear to have touched any of
that logic.

To point to just one obvious problem, the error recovery path
(sigsetjmp block) in postgres.c does

disable_all_timeouts(false);

which cancels *all* timeouts. Probably don't want that behavior
anymore.

I think the issue you're complaining about may have to do with
the fact that if there's no statement timeout active, both
enable_statement_timeout and disable_statement_timeout will
call "disable_timeout(STATEMENT_TIMEOUT, false);". That does
nothing, as desired, if there are no other active timeouts ...
but if there is one, ie the client_connection timeout, we'll
end up calling schedule_alarm which will call setitimer even
if the desired time-of-nearest-timeout hasn't changed.
That was OK behavior for the set of timeouts that the code
was designed to handle, but we're going to need to be smarter
now.

Ok, I think we like this feature proposal and I think we have a pretty
good handle on what needs to be done next, but without a patch author
that's not going to happen. I've therefore marked it "Returned with
feedback". If Sergey or someone else is interested in picking this up
and doing the work in time for CF2, please feel free to change that to
'moved to next'.

I wonder if it'd be possible, along the way, to make it so that
statement_timeout doesn't require calling setitimer() for every
statement, instead letting the existing timer run if there is a
reasonable amount left to run on it, and then resetting it if needed.
I haven't looked into whether that's hard/impossible due to some race
condition I haven't spent the time to think about, but if Ishii-san
sees a 5% slowdown from a couple of setitimer() calls, I guess using
statement_timeout might currently cause a 2.5% slowdown.

--
Thomas Munro
https://enterprisedb.com

#18 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Tatsuo Ishii (#13)
Re: Add client connection check during the execution of the query

On 18.07.2019 6:19, Tatsuo Ishii wrote:

I noticed that there are some confusions in the doc and code regarding what the new
configuration parameter means. According to the doc:

+        Default value is <literal>zero</literal> - it disables connection
+        checks, so the backend will detect client disconnection only when trying
+        to send a response to the query.

But guc.c comment says:

+			gettext_noop("Sets the time interval for checking connection with the client."),
+			gettext_noop("A value of -1 disables this feature. Zero selects a suitable default value."),

Probably the doc is correct since the actual code does so.

Yes, value -1 is not even accepted due to the specified range.

tps = 67715.993428 (including connections establishing)
tps = 67717.251843 (excluding connections establishing)

So the performance is about 5% down with the feature enabled in this
case. For me, 5% down is not subtle. Probably we should warn this in
the doc.

I also see some performance degradation, although it is not so large in
my case (I used pgbench with scale factor 10 and ran the same command
as you). In my case the difference, 103k vs. 105k TPS, is smaller
than 2%.

It seems to me that it is not necessary to enable the timeout for each command:

@@ -4208,6 +4210,9 @@ PostgresMain(int argc, char *argv[],
           */
          CHECK_FOR_INTERRUPTS();
          DoingCommandRead = false;
+        if (client_connection_check_interval)
+            enable_timeout_after(SKIP_CLIENT_CHECK_TIMEOUT,
+                                 client_connection_check_interval);

         /*
          * (5) turn off the idle-in-transaction timeout

Unlike statement timeout or idle-in-transaction timeout, the precise
start of the measured interval is not important.
So it is possible to enable it once, before the main backend loop:

@@ -3981,6 +3983,10 @@ PostgresMain(int argc, char *argv[],
        if (!IsUnderPostmaster)
                PgStartTime = GetCurrentTimestamp();

+       if (client_connection_check_interval)
+               enable_timeout_after(SKIP_CLIENT_CHECK_TIMEOUT,
+ client_connection_check_interval);
+
         /*
          * POSTGRES main processing loop begins here
          *

But actually I do not see much difference from moving the
timeout-enabling code. Moreover, the difference in performance almost
does not depend on the value of the timeout.
I set it to 100 seconds with a 30-second pgbench run (so the timeout
never fired and recv() was never called), and there is still a small
difference in performance.

After some experiments I found out that the mere presence of an active
timer causes a small performance penalty.
You can easily check it: set, for example, statement_timeout to the
same large value (100 seconds) and you will get the same small
slowdown.

So recv() itself is not the source of the problem.
Actually, any system call (except perhaps fsync) performed less than
once per second cannot have a noticeable impact on performance.
So I do not think that recv(MSG_PEEK) can cause a performance problem
on Windows or any other platform.
But I wonder why we cannot just do a poll() with the POLLOUT flag and
a zero timeout.
If the OS has detected the closed connection, it should return
POLLHUP, shouldn't it? I am not sure whether it is a more portable or
more efficient way; it just seems (to me) a slightly more natural way
to check whether the connection is still alive.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#19 Thomas Munro
thomas.munro@gmail.com
In reply to: Konstantin Knizhnik (#18)
Re: Add client connection check during the execution of the query

On Sat, Aug 3, 2019 at 4:40 AM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

On 18.07.2019 6:19, Tatsuo Ishii wrote:

So the performance is about 5% down with the feature enabled in this
case. To me, a 5% drop is not negligible. Probably we should warn
about this in the doc.

I also see some performance degradation, although it is not so large in
my case (I used pgbench with scale factor 10 and run the same command as
you).
In my case difference is 103k vs. 105k TPS is smaller than 2%.

I didn't test, but hopefully the degradation is fixed by commit 09cf1d52?

If the OS has detected the closed connection, it should return
POLLHUP, shouldn't it? I am not sure whether it is a more portable or
more efficient way; it just seems (to me) a slightly more natural way
to check whether the connection is still alive.

That's if you're sleeping in epoll() etc. This patch is for CPU-bound
backends running a long query. We need to do something special to
find out if the kernel knows that the connection has been closed.

I've done a quick rebase of this the patch and added it to the
commitfest. No other changes. Several things were mentioned earlier
that still need to be tidied up.

Attachments:

v3-0001-Detect-dropped-connections-while-running-queries.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Detect-dropped-connections-while-running-queries.patchDownload+212-2
#20Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#19)
Re: Add client connection check during the execution of the query

On Mon, Mar 1, 2021 at 6:18 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I've done a quick rebase of the patch and added it to the
commitfest. No other changes. Several things were mentioned earlier
that still need to be tidied up.

Rebased again due to bitrot. This time I did some actual work:

1. I didn't like the way it was rearming the timer *in the timer
handler*; I think it should be done in the CFI(), and only if it
determines that you're still running a query (otherwise you'll get
periodic wakeups while you're idle between queries, which is bad for
the arctic ice cap; we already handle the client going away efficiently
between queries with WaitEventSet socket readiness).
2. The timer handler surely has to set the latch to close a race (cf.
other similar handlers); between the CFI() and the beginning of the
sleep, you could handle the signal, set the flag, and then go to sleep
for 100 years.
3. The test might as well use pg_sleep() instead of doing a plpgsql
busy loop of SELECT queries.
4. I prefer the name CLIENT_CONNECTION_CHECK_TIMEOUT instead of
SKIP_CLIENT_CHECK_TIMEOUT; let's make up only one new name for a
concept instead of two.
5. Minuscule doc change.

I put these into a separate patch for ease of review. I don't claim
this is ready -- still needs more testing etc -- but it seems to be
generating the right system calls at the right times now.

Attachments:

v4-0001-Detect-dropped-connections-while-running-queries.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Detect-dropped-connections-while-running-queries.patchDownload+212-2
v4-0002-some-fixups.patchtext/x-patch; charset=US-ASCII; name=v4-0002-some-fixups.patchDownload+15-28
#21Zhihong Yu
zyu@yugabyte.com
In reply to: Thomas Munro (#20)
#22Thomas Munro
thomas.munro@gmail.com
In reply to: Zhihong Yu (#21)
#23Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#22)
#24Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#23)
#25Zhihong Yu
zyu@yugabyte.com
In reply to: Thomas Munro (#24)
#26Thomas Munro
thomas.munro@gmail.com
In reply to: Konstantin Knizhnik (#18)
#27Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#26)
#28Maksim Milyutin
milyutinma@gmail.com
In reply to: Thomas Munro (#26)
#29Thomas Munro
thomas.munro@gmail.com
In reply to: Maksim Milyutin (#28)
#30Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#29)
#31tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Thomas Munro (#30)
#32Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Thomas Munro (#30)
#33Thomas Munro
thomas.munro@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#31)
#34Thomas Munro
thomas.munro@gmail.com
In reply to: Bharath Rupireddy (#32)
#35tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Thomas Munro (#33)
#36Thomas Munro
thomas.munro@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#35)
#37Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#36)
#38Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#37)
#39Zhihong Yu
zyu@yugabyte.com
In reply to: Thomas Munro (#38)
#40Thomas Munro
thomas.munro@gmail.com
In reply to: Zhihong Yu (#39)
#41Zhihong Yu
zyu@yugabyte.com
In reply to: Thomas Munro (#40)
#42Maksim Milyutin
milyutinma@gmail.com
In reply to: Thomas Munro (#38)
#43Thomas Munro
thomas.munro@gmail.com
In reply to: Maksim Milyutin (#42)
#44Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#43)
#45Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#44)
#46Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#45)
#47Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#46)
#48Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#46)
#49Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#48)
#50Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#49)
#51Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#50)
#52Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#51)
#53Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#52)
#54Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#53)