HP/Pgsql/DBD::Pg issue

Started by Ed L. almost 19 years ago (7 messages, list: general)
#1 Ed L.
pgsql@bluepolka.net

After a reboot (and usually after an OS patch) on our HP-UX 11.23
64-bit Itanium DB servers, our libpq/DBD::Pg libraries cease to
work. Instead, they give the standard message you get when the
DB cluster is not running. But we *know* it is running and all
access paths are working. We have found a workaround by
switching from 64-bit perl to 32-bit perl, building a 32-bit
pgsql, and rebuilding the perl DBD module with 32-bit perl,
linked against the 32-bit pgsql. But the fact we're having to
do that is a problem for us.

I don't understand this problem and am at a loss as to where to
look. Any ideas?

TIA.

Ed

#2 Ed L.
pgsql@bluepolka.net
In reply to: Ed L. (#1)
Re: HP/Pgsql/DBD::Pg issue

On Thursday 26 April 2007 8:50 am, Ed L. wrote:

> After a reboot (and usually after an OS patch) on our HP-UX
> 11.23 64-bit Itanium DB servers, our libpq/DBD::Pg libraries
> cease to work. Instead, they give the standard message you
> get when the DB cluster is not running. But we *know* it is
> running and all access paths are working. We have found a
> workaround by switching from 64-bit perl to 32-bit perl,
> building a 32-bit pgsql, and rebuilding the perl DBD module
> with 32-bit perl, linked against the 32-bit pgsql. But the
> fact we're having to do that is a problem for us.
>
> I don't understand this problem and am at a loss as to where
> to look. Any ideas?

I should add that the failures occur only in client apps running
on the DB server itself. DBD apps connecting remotely don't
have any problems.

TIA.

Ed

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#1)
Re: HP/Pgsql/DBD::Pg issue

"Ed L." <pgsql@bluepolka.net> writes:

> After a reboot (and usually after an OS patch) on our HP-UX 11.23
> 64-bit Itanium DB servers, our libpq/DBD::Pg libraries cease to
> work. Instead, they give the standard message you get when the
> DB cluster is not running.

Try ktrace'ing the client to see what it's doing at the kernel-call level.
(I think HPUX's equivalent is just called "trace" btw.)

regards, tom lane

#4 Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#3)
Re: HP/Pgsql/DBD::Pg issue

On Thursday 26 April 2007 9:42 am, Tom Lane wrote:

> "Ed L." <pgsql@bluepolka.net> writes:
>
>> After a reboot (and usually after an OS patch) on our HP-UX
>> 11.23 64-bit Itanium DB servers, our libpq/DBD::Pg libraries
>> cease to work. Instead, they give the standard message you
>> get when the DB cluster is not running.
>
> Try ktrace'ing the client to see what it's doing at the
> kernel-call level. (I think HPUX's equivalent is just called
> "trace" btw.)

Attached is a small tar.gz file containing a short perl DBI
connection script that repeatedly demonstrates this problem.
There are also two log files containing tusc output (an HP
syscall trace utility), one for the 32-bit run (which works) and
another for the 64-bit run (which fails). I haven't made much
sense of it yet, so any help deciphering is appreciated.

TIA.

Ed

Attachments:

connfail.tar.gz (application/x-tgz)
#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#4)
Re: HP/Pgsql/DBD::Pg issue

"Ed L." <pgsql@bluepolka.net> writes:

> Attached is a small tar.gz file containing a short perl DBI
> connection script that repeatedly demonstrates this problem.
> There are also two log files containing tusc output (an HP
> syscall trace utility), one for the 32-bit run (which works) and
> another for the 64-bit run (which fails). I haven't made much
> sense of it yet, so any help deciphering is appreciated.

Well, it's going wrong here:

socket(AF_INET, SOCK_STREAM, 0) .......................... = 4
setsockopt(4, 0x6, TCP_NODELAY, 0x9fffffffffffe210, 4) ... = 0
fcntl(4, F_SETFL, 65536) ................................. = 0
fcntl(4, F_SETFD, 1) ..................................... = 0
connect(4, 0x6000000000416ea0, 16) ....................... = 0
getsockopt(4, SOL_SOCKET, SO_ERROR, 0x9fffffffffffe32c, 0x9fffffffffffe338) = 0
close(4) ................................................. = 0

The close() indicates we're into the failure path, so evidently the
getsockopt returned a failure indication (though it's hard to tell what
--- strerror() isn't providing anything useful).  What strikes me as odd
about this is that the connect() really should have returned EINPROGRESS
or some other failure code, because we're doing it in nonblock mode.
A zero return implies that the connection is already made, which it
shouldn't be if you're connecting to some other machine (if this is a
local connection then maybe it's sane, but I don't see that here when
testing loopback TCP connections).  So I wonder if connect() is blowing
it here and claiming the connection is ready when it's not quite yet.
Another possibility is that getsockopt() is returning bad data, which
smells a bit more like the sort of thing that might go wrong in 64 vs
32 bit mode.

You might want to adjust connectFailureMessage() in fe-connect.c to
print the actual numeric value of "errorno" along with its strerror
translation ... that might give a bit more hint.

regards, tom lane

#6 Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#5)
Re: HP/Pgsql/DBD::Pg issue

On Tuesday 01 May 2007 2:23 pm, Tom Lane wrote:

> Well, it's going wrong here:
>
> socket(AF_INET, SOCK_STREAM, 0) .......................... = 4
> setsockopt(4, 0x6, TCP_NODELAY, 0x9fffffffffffe210, 4) ... = 0
> fcntl(4, F_SETFL, 65536) ................................. = 0
> fcntl(4, F_SETFD, 1) ..................................... = 0
> connect(4, 0x6000000000416ea0, 16) ....................... = 0
> getsockopt(4, SOL_SOCKET, SO_ERROR, 0x9fffffffffffe32c, 0x9fffffffffffe338) = 0
> close(4) ................................................. = 0
>
> The close() indicates we're into the failure path, so
> evidently the getsockopt returned a failure indication (though
> it's hard to tell what --- strerror() isn't providing anything
> useful). What strikes me as odd about this is that the
> connect() really should have returned EINPROGRESS or some
> other failure code, because we're doing it in nonblock mode. A
> zero return implies that the connection is already made, which
> it shouldn't be if you're connecting to some other machine (if
> this is a local connection then maybe it's sane, but I don't
> see that here when testing loopback TCP connections). So I
> wonder if connect() is blowing it here and claiming the
> connection is ready when it's not quite yet. Another
> possibility is that getsockopt() is returning bad data, which
> smells a bit more like the sort of thing that might go wrong
> in 64 vs 32 bit mode.

It is indeed a local connection using PGHOST=`hostname`. That
name maps to one of the external NIC IPs, not to the normal
127.0.0.1 loopback address. For context, I've seen this a
number of times over the past couple years, from pgsql 7.3.x to
8.1.x, HPUX 11.00 to 11.23, 32-bit-only and 32/64 Itaniums,
always via a local connection using `hostname` mapping to an
external NIC. What it is about the reboots that triggers this
remains a mystery.

Ed

#7 Ed L.
pgsql@bluepolka.net
In reply to: Ed L. (#6)
Re: HP/Pgsql/DBD::Pg issue

On Tuesday 01 May 2007 2:46 pm, Ed L. wrote:

> It is indeed a local connection using PGHOST=`hostname`.  That
> name maps to one of the external NIC IPs, not to the normal
> 127.0.0.1 loopback address.  For context, I've seen this a
> number of times over the past couple years, from pgsql 7.3.x
> to 8.1.x, HPUX 11.00 to 11.23, 32-bit-only and 32/64 Itaniums,
> always via a local connection using `hostname` mapping to an
> external NIC.  What it is about the reboots that triggers this
> remains a mystery.

Not to create a red herring, I should add that it also fails with
PGHOST=localhost/127... Only relinking/reinstalling with 32-bit
perl seems to fix it. I will see if I can tweak fe-connect.c.

Ed