"incomplete startup packet" on SGI

Started by David Rysdam over 20 years ago · 7 messages · general
#1 David Rysdam
drysdam@ll.mit.edu

I have a working 8.1 server running on Linux and I can connect to it
from other Linux clients. I built postgresql 8.1 on an SGI (using
--without-readline but otherwise stock) and it compiled OK and installed
fine. But when I try to connect to the Linux server I get "could not
send startup packet: transport endpoint is not connected" on the client
end and "incomplete startup packet" on the server end. Connectivity
between the two machines is working.

I could find basically no useful references to the former and the only
references to the latter were portscans and the like.

Browsing the source, I see a couple places that message could come
from. One relates to SSL, which the output from configure says is
turned off on both client and server. The other is just a generic comm
error--but what could cause a partial failure like that?
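Some background on what the server is complaining about. In protocol version 3.0 (the one 8.1 speaks), the client's first message is a StartupMessage: a 4-byte big-endian length that counts itself, a 4-byte protocol version (196608, i.e. 3 << 16), then NUL-terminated parameter name/value pairs, ending with one extra NUL. Roughly speaking, the server reads the length word first and logs "incomplete startup packet" when the connection ends before the whole packet has arrived, which fits a client that connected but failed to send. The sketch below only illustrates that layout; `build_startup_packet` and `put_u32` are hypothetical helpers, not libpq functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Protocol 3.0 version code: major version in the high 16 bits, minor in the low 16. */
#define PROTO_V3 ((uint32_t)3 << 16) /* 196608 */

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/*
 * Hypothetical helper: build a StartupMessage in buf.
 * params holds npairs name/value pairs as consecutive C strings.
 * Returns the packet length, or 0 if cap is too small.
 */
static size_t build_startup_packet(unsigned char *buf, size_t cap,
                                   const char *const *params, size_t npairs)
{
    size_t off = 8; /* room for the length word and the version word */

    assert(buf != NULL);
    assert(params != NULL || npairs == 0);
    for (size_t i = 0; i < 2 * npairs; i++) {
        size_t n = strlen(params[i]) + 1; /* copy the NUL terminator too */
        if (off + n + 1 > cap)            /* +1 reserves the final list terminator */
            return 0;
        memcpy(buf + off, params[i], n);
        off += n;
    }
    if (off + 1 > cap)
        return 0;
    buf[off++] = '\0';           /* empty name ends the parameter list */
    put_u32(buf, (uint32_t)off); /* the length includes itself */
    put_u32(buf + 4, PROTO_V3);
    return off;
}
```

For example, a single `user`/`alice` pair yields a 20-byte packet: 8 header bytes, `user\0`, `alice\0`, and the closing NUL. If the socket dies partway through sending those 20 bytes, the server sees a short read.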

#2 David Rysdam
drysdam@ll.mit.edu
In reply to: David Rysdam (#1)
Re: "incomplete startup packet" on SGI

Just finished building and installing on *Sun* (also
"--without-readline", not that I think that could be the issue): Works
fine. So it's something to do with the SGI build in particular.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#3 Doug McNaught
doug@mcnaught.org
In reply to: David Rysdam (#2)
Re: "incomplete startup packet" on SGI

David Rysdam <drysdam@ll.mit.edu> writes:

> Just finished building and installing on *Sun* (also
> "--without-readline", not that I think that could be the issue): Works
> fine. So it's something to do with the SGI build in particular.

IRIX buggy, film at 11. :)

-Doug

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rysdam (#2)
Re: "incomplete startup packet" on SGI

David Rysdam <drysdam@ll.mit.edu> writes:

> Just finished building and installing on *Sun* (also
> "--without-readline", not that I think that could be the issue): Works
> fine. So it's something to do with the SGI build in particular.

More likely it's something to do with weird behavior of the SGI kernel's
TCP stack. I did a little googling for "transport endpoint is not
connected" without turning up anything obviously related, but that or
ENOTCONN is probably what you need to search on.
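One common way to get ENOTCONN from send() is to write to a TCP socket whose connect() has not really completed; on BSD-derived stacks a send issued while the handshake is still in progress can fail that way. The usual guard is sketched below under that assumption: start a non-blocking connect, poll for writability, then read SO_ERROR before sending anything. This is a generic POSIX pattern, not libpq's own code, and whether IRIX misreports completion here is unverified; `connect_and_wait` and `connect_ipv4` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Start a non-blocking connect on fd and wait up to timeout_ms for the
 * handshake to finish. Returns 0 once connected, or -1 with errno set.
 * Leaves fd in non-blocking mode.
 */
static int connect_and_wait(int fd, const struct sockaddr *addr,
                            socklen_t addrlen, int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    if (connect(fd, addr, addrlen) == 0)
        return 0;                 /* connected immediately */
    if (errno != EINPROGRESS)
        return -1;                /* refused, unreachable, ... */

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int r = poll(&pfd, 1, timeout_ms);
    if (r == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (r < 0)
        return -1;

    /* Writable only means "finished"; SO_ERROR says whether it succeeded. */
    int err = 0;
    socklen_t errlen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Hypothetical convenience wrapper: TCP connect to ip:port. Returns fd or -1. */
static int connect_ipv4(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in sa;

    assert(ip != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect_and_wait(fd, (const struct sockaddr *)&sa, sizeof sa,
                         timeout_ms) < 0) {
        int saved = errno;        /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Only after `connect_ipv4` returns a descriptor would the startup packet be sent. If a stack reported success here yet still returned ENOTCONN on the first send, that would point at the kernel rather than the client.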

regards, tom lane

#5 David Rysdam
drysdam@ll.mit.edu
In reply to: Tom Lane (#4)
Re: "incomplete startup packet" on SGI

It's acting like a race condition or pointer problem. When I add random
debug printfs/PQflushs to libpq it sometimes works.

#6 David Rysdam
drysdam@ll.mit.edu
In reply to: David Rysdam (#5)
Re: "incomplete startup packet" on SGI

Not a race condition: there are no threads.
Not a memory bug: Electric Fence reports nothing. And it works when
Electric Fence is running, whereas a binary that uses the same libpq
without linking efence does not work.

#7 David Rysdam
drysdam@ll.mit.edu
In reply to: David Rysdam (#6)
Re: "incomplete startup packet" on SGI

I know nobody is interested in this, but I think I should document the
"solution" for anyone who finds this thread in the archives: My theory
is that Irix is unable to keep up with how fast the postgresql client is
going and that the debug statements/efence stuff are slowing it down
enough that Irix can catch up and make sure the socket really is there,
connected and working. To that end, I inserted a sleep(1) in
fe-connect.c just before the pqPacketSend(...startpacket...) stuff.
It's stupid and hacky, but it gets me where I need to be, and maybe this
hint will inspire somebody who knows (and cares) about Irix to find a
real fix.
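A somewhat less blunt variant of the sleep(1) workaround, sketched under the assumption that the failure really surfaces as ENOTCONN from send() on a blocking socket: retry a few times with a short pause instead of always waiting a full second. `send_all_retry_enotconn` is a hypothetical helper, not part of libpq, and the 10 ms pause and retry count are arbitrary choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: send all len bytes on a blocking socket. If send()
 * fails with ENOTCONN, pause 10 ms and retry, up to max_retries times,
 * instead of sleeping a fixed second before every send.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int send_all_retry_enotconn(int fd, const void *buf, size_t len,
                                   int max_retries)
{
    const char *p = buf;
    int retries = 0;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n == 0) {                       /* should not happen for len > 0 */
            errno = EIO;
            return -1;
        }
        if (errno == EINTR)
            continue;                       /* interrupted: just try again */
        if (errno == ENOTCONN && retries < max_retries) {
            struct timespec pause = { 0, 10 * 1000 * 1000 }; /* 10 ms */
            nanosleep(&pause, NULL);
            retries++;
            continue;
        }
        return -1;                          /* any other error is final */
    }
    return 0;
}
```

In fe-connect.c terms this would wrap the write of the startup packet rather than sit in front of it, so the delay is only paid when the kernel actually reports ENOTCONN. It still works around the symptom and does not explain why the socket looked connected too early.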