zombie connections

Started by Robert Haas (April 3, 2020) · 12 messages
#1 Robert Haas
robertmhaas@gmail.com

Hi,

Suppose that the server is executing a lengthy query, and the client
breaks the connection. The operating system will be aware that the
connection is no more, but PostgreSQL doesn't notice, because it's not
trying to read from or write to the socket. It's not paying attention to
the socket at all. In theory, the query could be one that runs for a
million years and continues to chew up CPU and I/O, or at the very
least a connection slot, essentially forever. That's sad.

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#2 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#1)
Re: zombie connections

On Fri, Apr 3, 2020 at 14:30, Robert Haas <robertmhaas@gmail.com> wrote:

Hi,

Suppose that the server is executing a lengthy query, and the client
breaks the connection. The operating system will be aware that the
connection is no more, but PostgreSQL doesn't notice, because it's not
trying to read from or write to the socket. It's not paying attention to
the socket at all. In theory, the query could be one that runs for a
million years and continues to chew up CPU and I/O, or at the very
least a connection slot, essentially forever. That's sad.

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

+ 1

Pavel


#3 Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#2)
Re: zombie connections

On Fri, Apr 03, 2020 at 02:40:30PM +0200, Pavel Stehule wrote:

On Fri, Apr 3, 2020 at 14:30, Robert Haas <robertmhaas@gmail.com> wrote:

Suppose that the server is executing a lengthy query, and the client
breaks the connection. The operating system will be aware that the
connection is no more, but PostgreSQL doesn't notice, because it's not
trying to read from or write to the socket. It's not paying attention to
the socket at all. In theory, the query could be one that runs for a
million years and continues to chew up CPU and I/O, or at the very
least a connection slot, essentially forever. That's sad.

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

+ 1

+1 too, I already saw such behavior.

Maybe the postmaster could send a new PROCSIG reason via SIGUSR1 to backends
at a configurable interval and let ProcessInterrupts handle it?

#4 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Robert Haas (#1)
Re: zombie connections

On 03.04.2020 15:29, Robert Haas wrote:

Hi,

Suppose that the server is executing a lengthy query, and the client
breaks the connection. The operating system will be aware that the
connection is no more, but PostgreSQL doesn't notice, because it's not
trying to read from or write to the socket. It's not paying attention to
the socket at all. In theory, the query could be one that runs for a
million years and continues to chew up CPU and I/O, or at the very
least a connection slot, essentially forever. That's sad.

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

There was a patch in the commitfest addressing this problem:
https://commitfest.postgresql.org/21/1882/
It is currently included in Postgres Pro EE, but the author of the patch
no longer works at our company.
Should we resurrect this patch, or is there something wrong with the
proposed approach?

#5 Robert Haas
robertmhaas@gmail.com
In reply to: Konstantin Knizhnik (#4)
Re: zombie connections

On Fri, Apr 3, 2020 at 9:34 AM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

There was a patch in the commitfest addressing this problem:
https://commitfest.postgresql.org/21/1882/
It is currently included in Postgres Pro EE, but the author of the patch
no longer works at our company.
Should we resurrect this patch, or is there something wrong with the
proposed approach?

Thanks for the link.

Tom seems to have offered some fairly specific criticism in
/messages/by-id/31564.1563426253@sss.pgh.pa.us

I haven't studied that thread in detail, but I would suggest that if
you want to resurrect the patch, that might be a good place to start.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6 Isaac Morland
isaac.morland@gmail.com
In reply to: Robert Haas (#1)
Re: zombie connections

On Fri, 3 Apr 2020 at 08:30, Robert Haas <robertmhaas@gmail.com> wrote:

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

Does it make any difference if the query is making changes? If the query is
just computing a result and returning it to the client, there is no point
in continuing once the socket is closed. But if it is updating data or
making DDL changes, then at least some of the time it would be preferable
for the changes to be made. Having said that, in normal operation one
wants, at the client end, to see the message from the server that the
changes have been completed, not just fire off a change and hope it
completes.

#7 Robert Haas
robertmhaas@gmail.com
In reply to: Isaac Morland (#6)
Re: zombie connections

On Fri, Apr 3, 2020 at 9:52 AM Isaac Morland <isaac.morland@gmail.com> wrote:

Does it make any difference if the query is making changes? If the query is just computing a result and returning it to the client, there is no point in continuing once the socket is closed. But if it is updating data or making DDL changes, then at least some of the time it would be preferable for the changes to be made. Having said that, in normal operation one wants, at the client end, to see the message from the server that the changes have been completed, not just fire off a change and hope it completes.

The system can't know whether the query is going to change anything,
because even if the query is a SELECT, it doesn't know whether any of
the functions or operators called from that SELECT might write data.

I don't think it would be smart to make behavior like this depend on
whether the statement is a SELECT vs. INSERT/UPDATE/DELETE, or on
things like whether there is an explicit transaction open. I think we
should just have a feature that kills the server process if the
connection goes away. If some people don't want that, it can be
optional.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Isaac Morland (#6)
Re: zombie connections

On Fri, Apr 3, 2020 at 15:52, Isaac Morland <isaac.morland@gmail.com> wrote:

On Fri, 3 Apr 2020 at 08:30, Robert Haas <robertmhaas@gmail.com> wrote:

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

Does it make any difference if the query is making changes? If the query
is just computing a result and returning it to the client, there is no
point in continuing once the socket is closed. But if it is updating data
or making DDL changes, then at least some of the time it would be
preferable for the changes to be made. Having said that, in normal
operation one wants, at the client end, to see the message from the server
that the changes have been completed, not just fire off a change and hope
it completes.

I prefer a simple solution without any "intelligence". It is much safer to
close the connection and roll back. Then the protocol is clean: if the
server didn't report successful completion of an operation, the operation
was always rolled back.

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#8)
Re: zombie connections

Pavel Stehule <pavel.stehule@gmail.com> writes:

I prefer a simple solution without any "intelligence". It is much safer to
close the connection and roll back. Then the protocol is clean: if the
server didn't report successful completion of an operation, the operation
was always rolled back.

It would be a fatal mistake to imagine that this feature would offer any
greater guarantees in that line than we have today (which is to say,
none really). It can be no better than the OS network stack's error
detection/reporting, which is necessarily pretty weak. The fact that
the kernel accepted a "command complete" message from us doesn't mean
that the client was still alive at that instant, much less that the
message will be deliverable.

In general I think the threshold problem for a patch like this will be
"how do you keep the added overhead down". As Robert noted upthread,
timeout.c is quite a bit shy of being able to handle timeouts that
persist across statements. I don't think that there's any fundamental
reason it can't be improved, but it will need improvements.

regards, tom lane

#10 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#9)
Re: zombie connections

On Fri, Apr 3, 2020 at 10:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In general I think the threshold problem for a patch like this will be
"how do you keep the added overhead down". As Robert noted upthread,
timeout.c is quite a bit shy of being able to handle timeouts that
persist across statements. I don't think that there's any fundamental
reason it can't be improved, but it will need improvements.

Why do we need that? If we're not executing a statement, we're
probably trying to read() from the socket, and we'll notice if that
returns 0 or -1. So it seems like we only need periodic checks while
there's a statement in progress.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: zombie connections

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Apr 3, 2020 at 10:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In general I think the threshold problem for a patch like this will be
"how do you keep the added overhead down". As Robert noted upthread,
timeout.c is quite a bit shy of being able to handle timeouts that
persist across statements. I don't think that there's any fundamental
reason it can't be improved, but it will need improvements.

Why do we need that? If we're not executing a statement, we're
probably trying to read() from the socket, and we'll notice if that
returns 0 or -1. So it seems like we only need periodic checks while
there's a statement in progress.

Maybe you could build it that way, but I'm not sure it's a better way.

(1) You'll need to build a concept of a timeout that's not a statement
timeout, but nonetheless should be canceled exactly when the statement
timeout is (not before or after, unless you'd like to incur additional
setitimer() calls). That's going to involve either timeout.c surgery
or fragile requirements on the callers.

(2) It only wins if a statement timeout is active, otherwise it makes
things worse, because then you need setitimer() at statement start
and end just to enable/disable the socket check timeout. Whereas
if you just let a once-a-minute timeout continue to run, you don't
incur those kernel calls.

It's possible that we should run this timeout differently depending
on whether or not a statement timeout is active, though I'd prefer to
avoid such complexity if possible. On the whole, if we have to
optimize just one of those cases, it should be the no-statement-timeout
case; with that timeout active, you're paying two setitimers per
statement anyway.

Anyway, the core problem with the originally-submitted patch was that
it was totally ignorant that timeout.c had restrictions it was breaking.
You can either fix the restrictions, or you can try to design around them,
but you've got to be aware of what that code can and can't do today.

regards, tom lane

#12 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: zombie connections

On Fri, Apr 3, 2020 at 11:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

(2) It only wins if a statement timeout is active, otherwise it makes
things worse, because then you need setitimer() at statement start
and end just to enable/disable the socket check timeout. Whereas
if you just let a once-a-minute timeout continue to run, you don't
incur those kernel calls.

Oh, that's a really good point. I should have thought of that.

Anyway, the core problem with the originally-submitted patch was that
it was totally ignorant that timeout.c had restrictions it was breaking.
You can either fix the restrictions, or you can try to design around them,
but you've got to be aware of what that code can and can't do today.

No disagreement there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company