Client failure allows backend to continue
As part of the training class I did, some people tested what happens
when the client allocates tons of memory to store a result and aborts.
What we found was that though elog was properly called:
elog(COMMERROR, "pq_recvbuf: recv() failed: %m");
(I think that was the message.) the backend did not exit and kept
eating CPU. I think the problem is that the elog code only exits on
ERROR, not COMMERROR. Is there some way to fix this?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> As part of the training class I did, some people tested what happens
> when the client allocates tons of memory to store a result and aborts.
> What we found was that though elog was properly called:
> elog(COMMERROR, "pq_recvbuf: recv() failed: %m");
> (I think that was the message.) the backend did not exit and kept
> eating CPU. I think the problem is that the elog code only exits on
> ERROR, not COMMERROR. Is there some way to fix this?
There's been talk of setting the QueryCancel flag after detecting a
client communication failure ... but no one has ever done the legwork
to see if that works nicely, or what downsides it might have.
regards, tom lane
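The idea of setting a cancel flag on a communication failure, instead of unwinding on the spot, can be sketched roughly as below. This is only an illustration under stated assumptions: cancel_pending, note_comm_failure(), send_row() and run_query() are hypothetical names invented here, not PostgreSQL functions, and the client going away is simulated rather than detected on a real socket.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag standing in for the QueryCancel flag mentioned above. */
static volatile sig_atomic_t cancel_pending = 0;

/* Called by the I/O layer on a send/recv failure: log it and set the
 * flag, but do not unwind the stack from inside the I/O routine. */
static void note_comm_failure(const char *what)
{
    fprintf(stderr, "COMMERROR: %s\n", what);
    cancel_pending = 1;
}

/* Simulated send: fails for every row once the client has gone away. */
static bool send_row(int row, int client_dies_at)
{
    if (row >= client_dies_at) {
        note_comm_failure("send() failed");
        return false;
    }
    return true;
}

/* Tries to produce nrows rows; returns how many rows were processed
 * before the query stopped. */
int run_query(int nrows, int client_dies_at)
{
    int processed = 0;

    cancel_pending = 0;
    for (int row = 0; row < nrows; row++) {
        /* Safe point: between rows, never partway through a message. */
        if (cancel_pending)
            break;
        send_row(row, client_dies_at);
        processed++;
    }
    return processed;
}
```

In this sketch the row that hits the failure is still counted, and the loop stops at the next check rather than grinding through the remaining rows. Whether a real backend reaches such a check soon enough is exactly the legwork described above as not yet done.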
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> As part of the training class I did, some people tested what happens
>> when the client allocates tons of memory to store a result and aborts.
>> What we found was that though elog was properly called:
>> elog(COMMERROR, "pq_recvbuf: recv() failed: %m");
>> (I think that was the message.) the backend did not exit and kept
>> eating CPU. I think the problem is that the elog code only exits on
>> ERROR, not COMMERROR. Is there some way to fix this?
> There's been talk of setting the QueryCancel flag after detecting a
> client communication failure ... but no one has ever done the legwork
> to see if that works nicely, or what downsides it might have.
Why is COMMERROR not doing the longjmp like ERROR?
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Why is COMMERROR not doing the longjmp like ERROR?
Because it's defined to be like LOG.
A more useful reply might be that I'm not sure it's safe to abort in the
client I/O routines.
regards, tom lane
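To make the distinction concrete, here is a minimal sketch of a reporting function in which only the ERROR-like level unwinds the stack, while the LOG-like levels simply return to the caller. The Level enum, report(), simulate_query() and steps_run_after_failure() are hypothetical simplifications written for this illustration; they are not the actual elog implementation.

```c
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical, simplified severity levels; not PostgreSQL's definitions. */
typedef enum { LVL_LOG, LVL_COMMERROR, LVL_ERROR } Level;

static jmp_buf main_loop_env;    /* where an ERROR unwinds to */
static int steps_after_report;   /* work done after report() returned */

/* Report a message. Only LVL_ERROR jumps back to the main loop. */
static void report(Level lvl, const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    if (lvl == LVL_ERROR)
        longjmp(main_loop_env, 1);
    /* LVL_LOG and LVL_COMMERROR fall through and return to the caller. */
}

/* Simulated query that hits a receive failure partway through. */
static void simulate_query(Level lvl)
{
    report(lvl, "pq_recvbuf: recv() failed");
    steps_after_report++;        /* reached only if report() returned */
}

/* Returns how many post-failure steps ran for the given level. */
int steps_run_after_failure(Level lvl)
{
    steps_after_report = 0;
    if (setjmp(main_loop_env) == 0)
        simulate_query(lvl);
    return steps_after_report;
}
```

With the ERROR-like level no further step runs. With the COMMERROR-like level the caller carries on after the failure, which matches the symptom reported at the start of the thread.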
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Why is COMMERROR not doing the longjmp like ERROR?
> Because it's defined to be like LOG.
> A more useful reply might be that I'm not sure it's safe to abort in the
> client I/O routines.
Well, if we get an I/O error, I can't imagine why we would continue
doing anything --- are any of those recoverable? Do we need a separate
error type for I/O messages?
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, if we get an I/O error, I can't imagine why we would continue
> doing anything --- are any of those recoverable?
Well, that's what's not clear --- it's hard to tell if a write failure
is a hard error or just transient. If we make like elog(ERROR),
returning to the main loop, and then a read from the client *doesn't*
fail, we'll try to continue ... but we've just screwed the pooch,
because we have not sent a complete message and therefore certainly have
messed up frontend/backend synchronization. I have no idea whether it's
really possible to recover from this situation or not, but that approach
surely won't work.
If you want to take a kamikaze any-comm-error-means-we're-dead approach,
you might think about elog(FATAL). But that tries to send a message to
the client. Instant infinite loop, if the error is hard.
Complaints to the postmaster log, and abort at the next safe place
(*not* partway through message output) seem like the way to go to me.
> Do we need a separate error type for I/O messages?
Uh ... see COMMERROR.
regards, tom lane
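The synchronization problem described above can be illustrated with a toy framing. Everything here is an assumption made for the example: the [type][length][payload] layout, put_msg(), read_types() and client_stays_in_sync() are hypothetical and are not the actual frontend/backend protocol. The sketch writes a data message, optionally cut short as if the write failed partway, and then an error message, as a jump back to the main loop would.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical framing for illustration only:
 * [1 type byte][1 length byte][length payload bytes]. */

/* Append one message but write at most max_payload payload bytes.
 * max_payload < strlen(payload) simulates a write that failed partway.
 * Returns the new end of the stream. */
static size_t put_msg(unsigned char *buf, size_t pos, char type,
                      const char *payload, size_t max_payload)
{
    size_t len = strlen(payload);
    size_t n = len < max_payload ? len : max_payload;

    buf[pos++] = (unsigned char) type;
    buf[pos++] = (unsigned char) len;   /* header promises len bytes */
    memcpy(buf + pos, payload, n);      /* but only n may be written */
    return pos + n;
}

/* Parse the stream as a client would. Stores the type byte of each
 * complete message in types[] and returns how many were found. */
static size_t read_types(const unsigned char *buf, size_t end,
                         char *types, size_t max)
{
    size_t pos = 0, count = 0;

    while (pos + 2 <= end && count < max) {
        size_t len = buf[pos + 1];
        if (pos + 2 + len > end)
            break;                      /* incomplete: client keeps waiting */
        types[count++] = (char) buf[pos];
        pos += 2 + len;
    }
    return count;
}

/* Send a data message 'D' (possibly truncated), then an error message
 * 'E'. Returns 1 if the client sees 'D' followed by 'E', else 0. */
int client_stays_in_sync(size_t first_payload_bytes_sent)
{
    unsigned char buf[64];
    char types[4];
    size_t end = 0, n;

    end = put_msg(buf, end, 'D', "row-data", first_payload_bytes_sent);
    end = put_msg(buf, end, 'E', "oops", 4);
    n = read_types(buf, end, types, 4);
    return n == 2 && types[0] == 'D' && types[1] == 'E';
}
```

When the first payload is truncated, the client in this sketch consumes the start of the error message as if it were row data and never sees the error at all. That is why unwinding in the middle of message output looks unsafe here.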
Well, setting query_cancel then seems like a logical solution because it
will exit at a reasonable point, hopefully. Right now we have
statement_timeout and that exits at a given time, but I suppose it
doesn't exit while data is transferring, so it may be different.
---------------------------------------------------------------------------
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Well, if we get an I/O error, I can't imagine why we would continue
>> doing anything --- are any of those recoverable?
> Well, that's what's not clear --- it's hard to tell if a write failure
> is a hard error or just transient. If we make like elog(ERROR),
> returning to the main loop, and then a read from the client *doesn't*
> fail, we'll try to continue ... but we've just screwed the pooch,
> because we have not sent a complete message and therefore certainly have
> messed up frontend/backend synchronization. I have no idea whether it's
> really possible to recover from this situation or not, but that approach
> surely won't work.
> If you want to take a kamikaze any-comm-error-means-we're-dead approach,
> you might think about elog(FATAL). But that tries to send a message to
> the client. Instant infinite loop, if the error is hard.
> Complaints to the postmaster log, and abort at the next safe place
> (*not* partway through message output) seem like the way to go to me.
>> Do we need a separate error type for I/O messages?
> Uh ... see COMMERROR.
> regards, tom lane