Tru64/Alpha problems

Started by Andrew Dunstan almost 20 years ago, 8 messages
#1Andrew Dunstan
andrew@dunslane.net
1 attachment(s)

Honda Shigehiro has diagnosed the longstanding problems with his
Tru64/Alpha buildfarm member (bear). See below.

First, it appears that there is a problem with the system getaddrinfo(),
which configure reports as usable, but turns out not to be. Our current
configure test checks the return value of getaddrinfo("", "", NULL,
NULL) but I am wondering if we should test for "localhost" instead of ""
as the first parameter.

Second, it appears that this platform apparently doesn't handle Infinity
and NaN well. The regression diffs are attached.
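
For reference, the failing queries in the attached diffs boil down to parsing the special values and dividing them. The following is a minimal sketch of those operations in plain C, assuming a platform with non-trapping IEEE 754 arithmetic; it is not PostgreSQL's float input code, and `parse_special` and `divide` are names invented for this illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Parse a float literal with strtod(). C99 strtod() accepts "nan" and
 * "inf"/"infinity" in any letter case, after optional leading spaces. */
double parse_special(const char *text)
{
    assert(text != NULL);
    return strtod(text, NULL);
}

/* Divide at run time; volatile keeps the compiler from folding the
 * result at compile time, so the hardware division is really executed. */
double divide(double a, double b)
{
    volatile double x = a;
    volatile double y = b;

    return x / y;
}
```

On a platform that behaves as the expected output assumes, `divide(parse_special("Infinity"), parse_special("Infinity"))` quietly yields NaN instead of raising SIGFPE.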

cheers

andrew

-------- Original Message --------
Subject: Re: postgresql buildfarm member bear
Date: Tue, 28 Mar 2006 21:53:15 +0900 (JST)
From: Honda Shigehiro <fwif0083@mb.infoweb.ne.jp>
To: andrew@dunslane.net
CC: fwif0083@mb.infoweb.ne.jp
References: <44229B69.2090909@dunslane.net>
<20060323.225736.41630581.fwif0083@mb.infoweb.ne.jp>
<4422ACED.7030506@dunslane.net>

I found the cause: something seems to be wrong with Tru64's getaddrinfo.
(I use version 5.0, but according to a Google search the behavior is the
same up to version 5.1B.) Until now I had only used Unix domain sockets.

So the server starts successfully with a Unix domain socket (e.g. make check).
But with "listen_addresses = 'localhost'", it fails with:
LOG: could not translate host name "localhost", service "5432" to address: servname not supported for ai_socktype
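
The message text appears to correspond to the EAI_SERVICE error from getaddrinfo(). A minimal sketch of the kind of lookup that produces it, assuming hints similar to what a TCP listener would pass (passive, SOCK_STREAM); `lookup_tcp_listener` is a hypothetical helper, not a function from the PostgreSQL source.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve host/service for a passive TCP socket.
 * Returns getaddrinfo()'s result code: 0 on success, an EAI_* value otherwise. */
int lookup_tcp_listener(const char *host, const char *service)
{
    struct addrinfo hints;
    struct addrinfo *result = NULL;
    int rc;

    assert(service != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    hints.ai_flags = AI_PASSIVE;      /* address is meant for bind() */

    rc = getaddrinfo(host, service, &hints, &result);
    if (rc != 0)
        fprintf(stderr,
                "could not translate host name \"%s\", service \"%s\" to address: %s\n",
                host ? host : "(null)", service, gai_strerror(rc));
    else
        freeaddrinfo(result);
    return rc;
}
```

Calling `lookup_tcp_listener("localhost", "5432")` on the affected machine would be one way to reproduce the failure outside the server.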

To solve this, I changed the build to use src/port/getaddrinfo.c.
(I have little knowledge of autoconf, so this is ugly.)
Is there a smarter way that does not need code changes?

(1) change configure script and run it
bash-2.05b$ diff configure.aaa configure
14651c14651
< #define HAVE_GETADDRINFO 1
---
> /* #define HAVE_GETADDRINFO 1 */

(2) run make command
It fails with some undefined symbols. After the failure, change directory
to src/port and type:
cc -std -I../../src/port -I../../src/include -I/usr/local/include -c getaddrinfo.c -o getaddrinfo.o
ar crs libpgport.a isinf.o getopt_long.o copydir.o dirmod.o exec.o noblock.o path.o pipe.o pgsleep.o pgstrcasecmp.o sprompt.o thread.o getaddrinfo.o
ar crs libpgport_srv.a isinf.o getopt_long.o copydir.o dirmod_srv.o exec_srv.o noblock.o path.o pipe.o pgsleep.o pgstrcasecmp.o sprompt.o thread_srv.o getaddrinfo.o

(3) re-run make command

(4) run make check and make installcheck
The float4 and float8 tests fail in both cases.

Attachments:

regression.diffs (text/plain)
*** ./expected/float4.out	Thu Apr  7 10:51:40 2005
--- ./results/float4.out	Tue Mar 28 21:03:10 2006
***************
*** 35,69 ****
  ERROR:  invalid input syntax for type real: "123            5"
  -- special inputs
  SELECT 'NaN'::float4;
!  float4 
! --------
!     NaN
! (1 row)
! 
  SELECT 'nan'::float4;
!  float4 
! --------
!     NaN
! (1 row)
! 
  SELECT '   NAN  '::float4;
!  float4 
! --------
!     NaN
! (1 row)
! 
  SELECT 'infinity'::float4;
!   float4  
! ----------
!  Infinity
! (1 row)
! 
  SELECT '          -INFINiTY   '::float4;
!   float4   
! -----------
!  -Infinity
! (1 row)
! 
  -- bad special inputs
  SELECT 'N A N'::float4;
  ERROR:  invalid input syntax for type real: "N A N"
--- 35,54 ----
  ERROR:  invalid input syntax for type real: "123            5"
  -- special inputs
  SELECT 'NaN'::float4;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT 'nan'::float4;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT '   NAN  '::float4;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT 'infinity'::float4;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT '          -INFINiTY   '::float4;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  -- bad special inputs
  SELECT 'N A N'::float4;
  ERROR:  invalid input syntax for type real: "N A N"
***************
*** 72,90 ****
  SELECT ' INFINITY    x'::float4;
  ERROR:  invalid input syntax for type real: " INFINITY    x"
  SELECT 'Infinity'::float4 + 100.0;
! ERROR:  type "double precision" value out of range: overflow
  SELECT 'Infinity'::float4 / 'Infinity'::float4;
!  ?column? 
! ----------
!       NaN
! (1 row)
! 
  SELECT 'nan'::float4 / 'nan'::float4;
!  ?column? 
! ----------
!       NaN
! (1 row)
! 
  SELECT '' AS five, * FROM FLOAT4_TBL;
   five |     f1      
  ------+-------------
--- 57,70 ----
  SELECT ' INFINITY    x'::float4;
  ERROR:  invalid input syntax for type real: " INFINITY    x"
  SELECT 'Infinity'::float4 + 100.0;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT 'Infinity'::float4 / 'Infinity'::float4;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT 'nan'::float4 / 'nan'::float4;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT '' AS five, * FROM FLOAT4_TBL;
   five |     f1      
  ------+-------------

======================================================================

*** ./expected/float8.out	Thu Jun  9 06:15:29 2005
--- ./results/float8.out	Tue Mar 28 21:03:10 2006
***************
*** 35,57 ****
  ERROR:  invalid input syntax for type double precision: "123           5"
  -- special inputs
  SELECT 'NaN'::float8;
!  float8 
! --------
!     NaN
! (1 row)
! 
  SELECT 'nan'::float8;
!  float8 
! --------
!     NaN
! (1 row)
! 
  SELECT '   NAN  '::float8;
!  float8 
! --------
!     NaN
! (1 row)
! 
  SELECT 'infinity'::float8;
    float8  
  ----------
--- 35,48 ----
  ERROR:  invalid input syntax for type double precision: "123           5"
  -- special inputs
  SELECT 'NaN'::float8;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT 'nan'::float8;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT '   NAN  '::float8;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT 'infinity'::float8;
    float8  
  ----------
***************
*** 72,90 ****
  SELECT ' INFINITY    x'::float8;
  ERROR:  invalid input syntax for type double precision: " INFINITY    x"
  SELECT 'Infinity'::float8 + 100.0;
! ERROR:  type "double precision" value out of range: overflow
  SELECT 'Infinity'::float8 / 'Infinity'::float8;
!  ?column? 
! ----------
!       NaN
! (1 row)
! 
  SELECT 'nan'::float8 / 'nan'::float8;
!  ?column? 
! ----------
!       NaN
! (1 row)
! 
  SELECT '' AS five, * FROM FLOAT8_TBL;
   five |          f1          
  ------+----------------------
--- 63,76 ----
  SELECT ' INFINITY    x'::float8;
  ERROR:  invalid input syntax for type double precision: " INFINITY    x"
  SELECT 'Infinity'::float8 + 100.0;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT 'Infinity'::float8 / 'Infinity'::float8;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT 'nan'::float8 / 'nan'::float8;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT '' AS five, * FROM FLOAT8_TBL;
   five |          f1          
  ------+----------------------
***************
*** 342,348 ****
     SET f1 = FLOAT8_TBL.f1 * '-1'
     WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
! ERROR:  type "double precision" value out of range: overflow
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  result is out of range
  SELECT '' AS bad, ln(f.f1) from FLOAT8_TBL f where f.f1 = '0.0' ;
--- 328,335 ----
     SET f1 = FLOAT8_TBL.f1 * '-1'
     WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  result is out of range
  SELECT '' AS bad, ln(f.f1) from FLOAT8_TBL f where f.f1 = '0.0' ;

======================================================================

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: Tru64/Alpha problems

Andrew Dunstan <andrew@dunslane.net> writes:

> Honda Shigehiro has diagnosed the longstanding problems with his
> Tru64/Alpha buildfarm member (bear). See below.
>
> First, it appears that there is a problem with the system getaddrinfo(),
> which configure reports as usable, but turns out not to be. Our current
> configure test checks the return value of getaddrinfo("", "", NULL,
> NULL) but I am wondering if we should test for "localhost" instead of ""
> as the first parameter.

Huh? That's just an AC_TRY_LINK test, we don't actually execute it.
If we did, the test would fail on machines where resolution of "localhost"
is broken, which we already know is a not-so-rare disease ...

I'm not sure that I believe the "getaddrinfo doesn't work" diagnosis
anyway, seeing that bear gets through "make check" okay. Wouldn't that
fail too if there were a problem there?

> Second, it appears that this platform apparently doesn't handle Infinity
> and NaN well. The regression diffs are attached.

On the FPE front, it'd be useful to get a gdb traceback to see where the
SIGFPE is occurring.

regards, tom lane

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: Tru64/Alpha problems

Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> Honda Shigehiro has diagnosed the longstanding problems with his
>> Tru64/Alpha buildfarm member (bear). See below.
>>
>> First, it appears that there is a problem with the system getaddrinfo(),
>> which configure reports as usable, but turns out not to be. Our current
>> configure test checks the return value of getaddrinfo("", "", NULL,
>> NULL) but I am wondering if we should test for "localhost" instead of ""
>> as the first parameter.
>
> Huh? That's just an AC_TRY_LINK test, we don't actually execute it.
> If we did, the test would fail on machines where resolution of "localhost"
> is broken, which we already know is a not-so-rare disease ...
>
> I'm not sure that I believe the "getaddrinfo doesn't work" diagnosis
> anyway, seeing that bear gets through "make check" okay. Wouldn't that
> fail too if there were a problem there?

Now that I look further into it, this machine was working just fine
until we made a change in configure, allegedly to get things right on
Tru64. The first build that went wrong was the one right after
configure.in version 1.450. I see a report from Albert Chin that this
patch worked, but the buildfarm member seems to provide counter-proof.

cheers

andrew

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: Tru64/Alpha problems

Andrew Dunstan <andrew@dunslane.net> writes:

> Tom Lane wrote:
>
>> I'm not sure that I believe the "getaddrinfo doesn't work" diagnosis
>> anyway, seeing that bear gets through "make check" okay. Wouldn't that
>> fail too if there were a problem there?
>
> Now that I look further into it, this machine was working just fine
> until we made a change in configure, allegedly to get things right on
> Tru64. The first build that went wrong was the one right after
> configure.in version 1.450. I see a report from Albert Chin that this
> patch worked, but the buildfarm member seems to provide counter-proof.

Ugh. So probably it depends on just which version of Tru64 you're using
:-(. Maybe earlier versions of Tru64 have a broken getaddrinfo and it's
fixed in later ones? How would we tell the difference?

regards, tom lane

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#4)
Re: Tru64/Alpha problems

Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> Tom Lane wrote:
>>
>>> I'm not sure that I believe the "getaddrinfo doesn't work" diagnosis
>>> anyway, seeing that bear gets through "make check" okay. Wouldn't that
>>> fail too if there were a problem there?
>>
>> Now that I look further into it, this machine was working just fine
>> until we made a change in configure, allegedly to get things right on
>> Tru64. The first build that went wrong was the one right after
>> configure.in version 1.450. I see a report from Albert Chin that this
>> patch worked, but the buildfarm member seems to provide counter-proof.
>
> Ugh. So probably it depends on just which version of Tru64 you're using
> :-(. Maybe earlier versions of Tru64 have a broken getaddrinfo and it's
> fixed in later ones? How would we tell the difference?

I have done some more digging on this. The buildfarm member had a couple
of configuration issues which I have remedied, and which almost
certainly account for the float test errors we saw. However, we still
get an error when we try to start the installed s/w with the default
listen_addresses:

LOG: could not translate host name "localhost", service "5832" to address: servname not supported for ai_socktype

Of course, this won't be seen with "make check", since it starts on Unix
with listen_addresses='', which means we never even look for any sort of
TCP addrinfo.
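
To illustrate why an empty listen_addresses hides the problem, here is a rough sketch of a listener setup loop that resolves each entry of a comma-separated list. It assumes nothing about the real server code beyond what is described above; `resolve_listen_addresses` is a hypothetical name, and the hints are an assumption about what a TCP listener would pass.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Walk a comma-separated listen_addresses value and resolve each entry.
 * Returns how many getaddrinfo() calls were made (or -1 if memory runs
 * out); failed lookups are counted in *failed. An empty list makes no
 * calls at all, so a broken getaddrinfo() is never exercised. */
int resolve_listen_addresses(const char *listen_addresses, const char *port,
                             int *failed)
{
    char *copy;
    char *entry;
    char *save = NULL;
    int calls = 0;

    assert(listen_addresses != NULL && port != NULL && failed != NULL);
    *failed = 0;

    /* strtok_r() modifies its input, so work on a private copy. */
    copy = malloc(strlen(listen_addresses) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, listen_addresses);

    for (entry = strtok_r(copy, ", ", &save); entry != NULL;
         entry = strtok_r(NULL, ", ", &save))
    {
        struct addrinfo hints;
        struct addrinfo *result = NULL;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_PASSIVE;

        calls++;
        if (getaddrinfo(entry, port, &hints, &result) != 0)
            (*failed)++;
        else
            freeaddrinfo(result);
    }

    free(copy);
    return calls;
}
```

With `listen_addresses` set to `""` the loop body never runs, which matches the "make check" behavior; with `"localhost"` it makes exactly one lookup, which is where the installed server trips.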

I found a hint on the web that we should use -D_SOCKADDR_LEN. I tried
this, but got a link failure, complaining about recv and send. This man
page extract explains:

[Tru64 UNIX] The recv() function is identical to the recvfrom() function
with a zero-valued address_len parameter, and to the read() function if no
flags are used. For that reason the recv() function is disabled when
4.4BSD behavior is enabled; that is, when the _SOCKADDR_LEN compile-time
option is defined.
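
The equivalence the man page describes can be sketched in portable C. This only illustrates the recv()/recvfrom() relationship and is not a proposed fix for the link failure; `recv_via_recvfrom` is a made-up name, and passing NULL for both address arguments (rather than a zero-valued length) is a choice made here for simplicity.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* recv() expressed through recvfrom() with no source-address buffer:
 * the caller gets the data but learns nothing about the sender. */
ssize_t recv_via_recvfrom(int fd, void *buf, size_t len, int flags)
{
    assert(buf != NULL);
    return recvfrom(fd, buf, len, flags, NULL, NULL);
}
```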

I'd like to know some settings that we can use that will get Tru64
cleanly through the buildfarm set. If no one offers any, I propose that
we revert the getaddrinfo() test in configure and use our own on Tru64
until they do.

cheers

andrew

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#5)
Re: Tru64/Alpha problems

I wrote:

> I have done some more digging on this. The buildfarm member had a
> couple of configuration issues which I have remedied, and which almost
> certainly account for the float test errors we saw. However, we still
> get an error when we try to start the installed s/w with the default
> listen_addresses:
>
> LOG: could not translate host name "localhost", service "5832" to
> address: servname not supported for ai_socktype
>
> Of course, this won't be seen with "make check", since it starts on
> Unix with listen_addresses='', which means we never even look for any
> sort of TCP addrinfo.
>
> I found a hint on the web that we should use -D_SOCKADDR_LEN. I tried
> this, but got a link failure, complaining about recv and send. This
> man page extract explains:
>
> [Tru64 UNIX] The recv() function is identical to the recvfrom() function
> with a zero-valued address_len parameter, and to the read() function if no
> flags are used. For that reason the recv() function is disabled when
> 4.4BSD behavior is enabled; that is, when the _SOCKADDR_LEN compile-time
> option is defined.
>
> I'd like to know some settings that we can use that will get Tru64
> cleanly through the buildfarm set. If no one offers any, I propose that
> we revert the getaddrinfo() test in configure and use our own on Tru64
> until they do.

I have not had any response to this. Is there any objection to my
reverting the configure changes for the head and 8.1 branches? If not I
intend to do that around the end of the week.

cheers

andrew

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#6)
Re: Tru64/Alpha problems

Andrew Dunstan <andrew@dunslane.net> writes:

>> I'd like to know some settings that we can use that will get Tru64
>> cleanly through the buildfarm set. If no one offers any, I propose that
>> we revert the getaddrinfo() test in configure and use our own on Tru64
>> until they do.
>
> I have not had any response to this. Is there any objection to my
> reverting the configure changes for the head and 8.1 branches?

Presumably, whoever was complaining beforehand will come back ...
but I don't remember who that was.

regards, tom lane

#8Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Tom Lane (#7)
Re: Tru64/Alpha problems

Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>
>>> I'd like to know some settings that we can use that will get Tru64
>>> cleanly through the buildfarm set. If no one offers any, I propose that
>>> we revert the getaddrinfo() test in configure and use our own on Tru64
>>> until they do.
>>
>> I have not had any response to this. Is there any objection to my
>> reverting the configure changes for the head and 8.1 branches?
>
> Presumably, whoever was complaining beforehand will come back ...
> but I don't remember who that was.
>
> regards, tom lane

i think the issue you are referring to comes from a Solaris report.
some patch levels of Solaris have seriously broken getaddrinfo(). in
this case pg_hba.conf cannot be read anymore.
we got a similar report some time ago. we did a simple configure tweak
to make sure that the onboard function is used. it seems to happen only
on some strange patchlevel (god knows which ones).

best regards,

hans

--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at