Re: SIGPIPE gripe

Started by Tom Laneover 27 years ago8 messages
#1Tom Lane
tgl@sss.pgh.pa.us

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:

If you don't remember, there was a fflush() called in fe-misc.c,
exactly in pqPuts() function that receives EPIPE and it's currently
not ignored.

Yes, I recall the "broken pipe" problem and thought that someone had
fixed it (most platforms didn't seem to see the problem, but Linux did).

fe-connect.c is set up to ignore SIGPIPE while trying to shut down the
connection. (The 6.3.2 release is broken on non-POSIX-signal platforms,
because it resets SIGPIPE to SIG_DFL afterwards, which may well not be
what the application wanted. I've fixed that in the version that I plan
to submit soon.)

There is no equivalent code to ignore SIGPIPE during ordinary writes to
the backend. I'm hesitant to add it on the following grounds:
1. Speed: a write would need three kernel calls instead of one.
2. I'm suspicious of code that alters signal settings during normal
operation. Especially in a library that can't know what else is
going on in the application. Disabling the application's signal
handler, even for short intervals, is best avoided.
3. It's only an issue if the backend crashes, which shouldn't happen
anyway ... shouldn't happen anyway ... shouldn't ... ;-)

The real question is what scenario is causing SIGPIPE to be delivered
in the first place. A search of the pgsql-hackers archives for
"SIGPIPE" yields only a mention of seeing SIGPIPE some of the time
(not always) when trying to connect to a nonexistent database.
If that's what's being complained of here, I'll try to look into it.

regards, tom lane

#2Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Tom Lane (#1)
Re: [HACKERS] Re: SIGPIPE gripe

Yes, I recall the "broken pipe" problem and thought that someone had
fixed it (most platforms didn't seem to see the problem, but Linux
did).

fe-connect.c is set up to ignore SIGPIPE while trying to shut down the
connection. (The 6.3.2 release is broken on non-POSIX-signal
platforms, because it resets SIGPIPE to SIG_DFL afterwards, which may
well not be what the application wanted. I've fixed that in the
version that I plan to submit soon.)
There is no equivalent code to ignore SIGPIPE during ordinary writes
to the backend. I'm hesitant to add it on the following grounds:
1. Speed: a write would need three kernel calls instead of one.
2. I'm suspicious of code that alters signal settings during normal
operation. Especially in a library that can't know what else is
going on in the application. Disabling the application's signal
handler, even for short intervals, is best avoided.
3. It's only an issue if the backend crashes, which shouldn't happen
anyway ... shouldn't happen anyway ... shouldn't ... ;-)
The real question is what scenario is causing SIGPIPE to be delivered
in the first place. A search of the pgsql-hackers archives for
"SIGPIPE" yields only a mention of seeing SIGPIPE some of the time
(not always) when trying to connect to a nonexistent database.
If that's what's being complained of here, I'll try to look into it.

golem$ psql nada
Connection to database 'nada' failed.
FATAL 1: Database nada does not exist in pg_database
golem$ psql nada
Broken pipe
golem$

This is on a Linux box with Postgres code frozen on 980408. I assume
that full v6.3.2 exhibits the same...

- Tom

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas G. Lockhart (#2)

I said:

The real question is what scenario is causing SIGPIPE to be delivered
in the first place. A search of the pgsql-hackers archives for
"SIGPIPE" yields only a mention of seeing SIGPIPE some of the time
(not always) when trying to connect to a nonexistent database.

OK, I've been able to reproduce this; I understand the problem and
I have a proposed fix.

Here's the scenario. On the server side, this happens:

Postmaster receives new connection request from client

(possible authentication cycle here)

Postmaster sends "AUTH OK" to client

Postmaster forks backend

Backend discovers that database name is invalid

Backend sends error message

Backend closes connection and exits

Meanwhile, once the client receives the "AUTH OK" it initiates
an empty query cycle (which is commented as intending to discover
whether the database exists!):

...

Client receives "AUTH_OK"

Client sends "Q " query

Client waits for response

The problem, of course, is that if the backend manages to exit
before the client gets to send its empty query, then the client
is writing on a closed connection. Boom, SIGPIPE.

I thought about hacking around this by having the postmaster check
the validity of the database name before it does the authorization
cycle. But that's a bad idea; first because it allows unauthorized
users to probe the validity of database names, and second because
it only fixes this particular instance of the problem. The general
problem is that the FE/BE protocol does not make provision for errors
reported by the backend during startup. ISTM there are many ways in
which the BE might fail during startup, not all of which could
reasonably be checked in advance by the postmaster.

So ... since we're altering the protocol anyway ... the right fix is
to alter the protocol a little more. Remember that "Z" message that
the backend is now sending at the end of every query cycle? What
we ought to do is make the BE send "Z" at completion of startup,
as well. (In other words, "Z" will really mean "Ready for Query"
rather than "Query Done". This is actually easier to implement in
postgres.c than the other way.) Now the client's startup procedure
looks like

...

Client receives "AUTH_OK"

Client waits for "Z" ; if get "E" instead, BE startup failed.

I suspect it's not really necessary to do an empty query after this,
but we may as well leave that in there for additional reliability.

regards, tom lane

#4Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Lane (#3)
Re: [HACKERS] Re: SIGPIPE gripe

Client receives "AUTH_OK"

Client waits for "Z" ; if get "E" instead, BE startup failed.

I suspect it's not really necessary to do an empty query after this,
but we may as well leave that in there for additional reliability.

I say go without the extra query. We can always add it later if there
is a problem. Backend startup time should be as fast as possible,
especially for short requests like www.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: [HACKERS] Re: SIGPIPE gripe

Bruce Momjian <maillist@candle.pha.pa.us> writes:

I suspect it's not really necessary to do an empty query after this,
but we may as well leave that in there for additional reliability.

I say go without the extra query. We can always add it later if there
is a problem. Backend startup time should be as fast as possible,

Good point. OK, I'll leave the empty-query code in fe-connect.c,
but ifdef it out, and we'll see if anyone has any problems.

regards, tom lane

#6Noname
dg@illustra.com
In reply to: Tom Lane (#3)
Re: [HACKERS] Re: SIGPIPE gripe

So ... since we're altering the protocol anyway ... the right fix is
to alter the protocol a little more. Remember that "Z" message that
the backend is now sending at the end of every query cycle? What
we ought to do is make the BE send "Z" at completion of startup,
as well. (In other words, "Z" will really mean "Ready for Query"
rather than "Query Done". This is actually easier to implement in
postgres.c than the other way.) Now the client's startup procedure
looks like

...

Client receives "AUTH_OK"

Client waits for "Z" ; if get "E" instead, BE startup failed.

BE fails, client gets SIGPIPE? or client waits forever?

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software 300 Lakeside Drive Oakland, CA 94612
- A child of five could understand this! Fetch me a child of five.

#7Peter T Mount
psqlhack@retep.org.uk
In reply to: Tom Lane (#3)
Re: [HACKERS] Re: SIGPIPE gripe

On Mon, 4 May 1998, Tom Lane wrote:

[snip]

Meanwhile, once the client receives the "AUTH OK" it initiates
an empty query cycle (which is commented as intending to discover
whether the database exists!):

...

Client receives "AUTH_OK"

Client sends "Q " query

Client waits for response

The problem, of course, is that if the backend manages to exit
before the client gets to send its empty query, then the client
is writing on a closed connection. Boom, SIGPIPE.

[snip]

So ... since we're altering the protocol anyway ... the right fix is
to alter the protocol a little more. Remember that "Z" message that
the backend is now sending at the end of every query cycle? What
we ought to do is make the BE send "Z" at completion of startup,
as well. (In other words, "Z" will really mean "Ready for Query"
rather than "Query Done". This is actually easier to implement in
postgres.c than the other way.) Now the client's startup procedure
looks like

...

Client receives "AUTH_OK"

Client waits for "Z" ; if get "E" instead, BE startup failed.

This sounds fair enough. Infact we could then throw a more meaningful
error message when this occurs.

I suspect it's not really necessary to do an empty query after this,
but we may as well leave that in there for additional reliability.

In JDBC, I replaced (back in 6.2) the empty query with one to get the
current DateStyle (which the driver then uses to handle dates correctly).

--
Peter T Mount peter@retep.org.uk or petermount@earthling.net
Main Homepage: http://www.retep.org.uk
************ Someday I may rebuild this signature completely ;-) ************
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter T Mount (#7)
Re: [HACKERS] Re: SIGPIPE gripe

dg@illustra.com (David Gould) writes:

So ... since we're altering the protocol anyway ... the right fix is
to alter the protocol a little more.

Client waits for "Z" ; if get "E" instead, BE startup failed.

BE fails, client gets SIGPIPE? or client waits forever?

Neither: the client detects EOF on its input and realizes that the
backend failed. Already done and tested.

(SIGPIPE is only for *write* on a closed channel, not *read*.
Read just returns EOF.)

regards, tom lane