[pgsql-hackers-owner+M6959@postgresql.org: Majordomo Delivery Error]

Started by Roberto Mello almost 25 years ago, 19 messages
#1Roberto Mello
rmello@cc.usu.edu

I think somebody has the "owner" address forwarding to the list.
Really annoying. Would somebody please stop that?

Thanks,

-Roberto

----- Forwarded message from pgsql-hackers-owner+M6959@postgresql.org -----

From: pgsql-hackers-owner+M6959@postgresql.org
Subject: Majordomo Delivery Error
To: pgsql-hackers-owner+M6959@postgresql.org
X-VMS-To: IN%"pgsql-hackers-owner+M6959@postgresql.org"

This message was created automatically by mail delivery software.
A Majordomo message could not be delivered to the following addresses:

biff@htmlhost.net:
450 4.7.1 <biff@htmlhost.net>... Can not check MX records for recipient host htmlhost.net

-- Original message omitted --

----- End forwarded message -----

-- 
+----| http://fslc.usu.edu USU Free Software & GNU/Linux Club|------+
  Roberto Mello - Computer Science, USU - http://www.brasileiro.net 
      http://www.sdl.usu.edu - Space Dynamics Lab, Developer    
A seminar on Time Travel will be held two weeks ago
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roberto Mello (#1)
Re: [pgsql-hackers-owner+M6959@postgresql.org: Majordomo Delivery Error]

Roberto Mello <rmello@cc.usu.edu> writes:

I think somebody has the "owner" address forwarding to the list.

No, certainly not --- otherwise we'd all be drowning in bounce messages.
The Postgres mailing lists are correctly configured. However, there are
certain sites that run broken mail software that sends bounce messages
back to the author of the individual message being bounced, rather than
sending 'em to the list owner address like it should.

If you get a bounce message for mailing list traffic that you authored,
then forward it to Marc, who otherwise may not find out that the dead
address needs to be removed from the lists.

You might also try complaining to the postmaster at the offending site,
but I've found that's generally a waste of time; if the admin had a clue
you'd never have seen any bounce anyway.

regards, tom lane

#3Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Tom Lane (#2)
Re: [lockhart@alumni.caltech.edu: Third call for platform testing]

(cc'd the -hackers mailing list)

Thanks for the reports Matthew. There is a single failure in the
NetBSD/sparc64 test due to a problem in the reltime test (or in starting
the reltime test). There is a different failure in your NetBSD/sparc
test, but since you are not confident about your installation we'll wait
to diagnose that (unless this rings a bell with someone).

Anyone have suggestions for Matthew?

- Thomas

for postgresql-7.1RC2.tar.gz, here is my `make check' for NetBSD/sparc64:

[ ... ]
reltime ... FAILED
[ ... ]
test horology ... FAILED
[ ... ]
inherit ... FAILED
[ ... ]
test misc ... FAILED
[ ... ]

=======================
4 of 76 tests failed.
=======================

digging into the regression.diffs, i can see that:
- reltime failed because it just had:
! psql: Backend startup failed

Hmm. That one is a problem. Perhaps someone will have a suggestion?

- horology failed because of off-by-one errors somewhere:

Not a problem; I have an unintended dependency on daylight savings time,
which now causes this test to fail for everyone. The test itself should
be fixed for the release.

for several cases. another failure here was due to:
! ERROR: Relation 'reltime_tbl' does not exist
which i guess is caused by the first failure.

Yes, I think you are right.

- inherit fails because the ordering is invalid, eg:

- a | aaa
a | aaaa
a | aaaaa
a | aaaaaa
a | aaaaaaa
a | aaaaaaaa
b | bbb
b | bbbb
b | bbbbb
b | bbbbbb
b | bbbbbbb
- b | bbbbbbbb
c | ccc

vs

a | aaaa
a | aaaaa
a | aaaaaa
a | aaaaaaa
a | aaaaaaaa
+ a | aaa
+ b | bbbbbbbb
b | bbb
b | bbbb
b | bbbbb
b | bbbbbb
b | bbbbbbb

there are dozens of these failures in the inherit test.

- misc fails because of the reltime failure, i guess:

- reltime_tbl

and:

! (90 rows)
--
! (89 rows)

i don't know anything about postgresql (i am merely testing at the
suggestion of a friend) so i'm not very well equipped to debug these
failures without some help.

and for NetBSD/sparc:

[ ... ]
test horology ... FAILED
[ ... ]
create_index ... FAILED
[ ... ]
test sanity_check ... FAILED
[ ... ]

- horology fails for similar reasons as sparc64, but only 2
failures instead of about 15.

- create_index failed because of some weird error that may
have more to do with the quick-n-dirty installation i have
on the SS20 i'm doing the test on:

CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
+ ERROR:  cannot read block 3 of hash_i4_index: Bad address
CREATE INDEX hash_name_index ON hash_name_heap USING hash (random name_ops);
+ ERROR:  cannot read block 3 of hash_name_index: Bad address
CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops);
+ ERROR:  cannot read block 3 of hash_txt_index: Bad address
CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
+ ERROR:  cannot read block 3 of hash_f8_index: Bad address

- sanity_check fails because of the create_index failure:

- hash_f8_heap | t
- hash_i4_heap | t
- hash_name_heap | t
- hash_txt_heap | t

! (45 rows)
vs
! (41 rows)

i will be reinstalling this SS20 with a full installation sometime in
the next few days. i will re-run the testsuite after this to see if
that is causing any of the lossage. none of the sparc64 lossage should
be related, and that was run on an Ultra1/140 FWIW. both of these were
run under NetBSD 1.5S (-current from a few weeks ago.)

.mrg.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#3)
Re: [lockhart@alumni.caltech.edu: Third call for platform testing]

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

Anyone have suggestions for Matthew?

for postgresql-7.1RC2.tar.gz, here is my `make check' for NetBSD/sparc64:

digging into the regression.diffs, i can see that:
- reltime failed because it just had:
! psql: Backend startup failed

The postmaster log file should have more info, but a first thought is
that you ran up against process or swap-space limitations. The parallel
check has fifty-odd processes going at its peak, which is more than the
default per-user process limit on many Unixen.
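
One way to check this suspicion before running the parallel tests is to ask the kernel for the per-user process limit. The following is only a sketch under stated assumptions: the function name nproc_limit_allows is hypothetical, and RLIMIT_NPROC is a BSD/Linux extension rather than something every Unix provides. It reports whether the soft limit leaves room for a given number of processes.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns 1 if the soft
 * per-user process limit allows at least `needed` processes, 0 if it
 * does not, and -1 if the limit cannot be queried.
 */
int nproc_limit_allows(long needed)
{
    struct rlimit rl;

    if (needed <= 0)
        return 1;               /* nothing is being asked for */
    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;              /* errno says why the query failed */
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;               /* no per-user limit at all */
    return rl.rlim_cur >= (rlim_t) needed ? 1 : 0;
}
```

Calling nproc_limit_allows(60) would roughly cover the "fifty-odd" processes mentioned above; the exact number to ask for is a guess, and processes the user already has running count against the same limit.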

- inherit fails because the ordering is invalid, eg:

Ordering issues are not really bugs (cf documentation about interpreting
regression results), although it'd be interesting to know if these diffs
still occur after you resolve the other failures.

- create_index failed because of some weird error that may
have more to do with the quick-n-dirty installation i have
on the SS20 i'm doing the test on:

CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
+ ERROR: cannot read block 3 of hash_i4_index: Bad address

"Bad address"? That seems pretty bizarre.

i will be reinstalling this SS20 with a full installation sometime in
the next few days. i will re-run the testsuite after this to see if
that is causing any of the lossage.

Please let us know.

regards, tom lane

#5Tom Ivar Helbekkmo
tih@kpnQwest.no
In reply to: Tom Lane (#4)
Re: [lockhart@alumni.caltech.edu: Third call for platform testing]

Tom Lane <tgl@sss.pgh.pa.us> writes:

CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
+ ERROR: cannot read block 3 of hash_i4_index: Bad address

"Bad address"? That seems pretty bizarre.

This is obviously something that shows up on _some_ NetBSD platforms.
The above was on sparc64, but that same problem is the only one I see
in the regression testing on NetBSD/vax that isn't just different
floating point (the VAX doesn't have IEEE), different ordering of
(unordered) collections or different wording of strerror() output.

NetBSD/i386 doesn't have the "Bad address" problem.

-tih
--
The basic difference is this: hackers build things, crackers break them.

#6matthew green
mrg@eterna.com.au
In reply to: Tom Ivar Helbekkmo (#5)
re: [lockhart@alumni.caltech.edu: Third call for platform testing]

CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
+ ERROR: cannot read block 3 of hash_i4_index: Bad address

"Bad address"? That seems pretty bizarre.

This is obviously something that shows up on _some_ NetBSD platforms.
The above was on sparc64, but that same problem is the only one I see

that Bad address message was actually from sparc.

#7matthew green
mrg@eterna.com.au
In reply to: Tom Lane (#4)
re: [lockhart@alumni.caltech.edu: Third call for platform testing]

digging into the regression.diffs, i can see that:
- reltime failed because it just had:
! psql: Backend startup failed

The postmaster log file should have more info, but a first thought is
that you ran up against process or swap-space limitations. The parallel
check has fifty-odd processes going at its peak, which is more than the
default per-user process limit on many Unixen.

hmm, maxproc=80 on this system currently and i wasn't really doing anything
else. it has 256MB ram and 280MB swap (unused). exactly what am i looking
for in the postmaster.log file? it is 65kb long...

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: matthew green (#7)
Re: [lockhart@alumni.caltech.edu: Third call for platform testing]

matthew green <mrg@eterna.com.au> writes:

digging into the regression.diffs, i can see that:
- reltime failed because it just had:
! psql: Backend startup failed

The postmaster log file should have more info, but a first thought is
that you ran up against process or swap-space limitations. The parallel
check has fifty-odd processes going at its peak, which is more than the
default per-user process limit on many Unixen.

hmm, maxproc=80 on this system currently and i wasn't really doing anything
else. it has 256MB ram and 280MB swap (unused). exactly what am i looking
for in the postmaster.log file? it is 65kb long...

Look for messages about "fork failed". They should give a kernel error
message too.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Ivar Helbekkmo (#5)
Re: [lockhart@alumni.caltech.edu: Third call for platform testing]

Tom Ivar Helbekkmo <tih@kpnQwest.no> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:
CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
+ ERROR: cannot read block 3 of hash_i4_index: Bad address

"Bad address"? That seems pretty bizarre.

This is obviously something that shows up on _some_ NetBSD platforms.

If it's reproducible on more than one box then we should look into it.
Am I right to guess that "Bad address" means a bogus pointer handed to
a kernel call? If so, it'll probably take some digging with gdb to find
out the cause. I'd be happy to do the digging if anyone can give me an
account reachable via telnet or ssh on one of these machines.

regards, tom lane

#10matthew green
mrg@eterna.com.au
In reply to: Tom Lane (#4)
re: [lockhart@alumni.caltech.edu: Third call for platform testing]

i will be reinstalling this SS20 with a full installation sometime in
the next few days. i will re-run the testsuite after this to see if
that is causing any of the lossage.

Please let us know.

actually, i had a classic i could test with -- all except horology passed,
so if there were two expected failures there, all is fine on NetBSD/sparc.

#11matthew green
mrg@eterna.com.au
In reply to: Tom Lane (#8)
re: [lockhart@alumni.caltech.edu: Third call for platform testing]

matthew green <mrg@eterna.com.au> writes:

digging into the regression.diffs, i can see that:
- reltime failed because it just had:
! psql: Backend startup failed

The postmaster log file should have more info, but a first thought is
that you ran up against process or swap-space limitations. The parallel
check has fifty-odd processes going at its peak, which is more than the
default per-user process limit on many Unixen.

hmm, maxproc=80 on this system currently and i wasn't really doing anything
else. it has 256MB ram and 280MB swap (unused). exactly what am i looking
for in the postmaster.log file? it is 65kb long...

Look for messages about "fork failed". They should give a kernel error
message too.

after running `unlimit' (tcsh) before `make check', the only failures i have
are the horology (expected) and the inherit sorted failures, on NetBSD/sparc64.

i also believe the `Bad address' errors were caused when the test was run in
an NFS mounted directory.

.mrg.

#12Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: matthew green (#11)
Re: [lockhart@alumni.caltech.edu: Third call for platform testing]

after running `unlimit' (tcsh) before `make check', the only failures i have
are the horology (expected) and the inherit sorted failures, on NetBSD/sparc64.

I'll mark NetBSD/sparc as supported for both 32- and 64-bit builds.
Thanks!

- Thomas

#13Tom Ivar Helbekkmo
tih@kpnQwest.no
In reply to: matthew green (#11)
Re: [lockhart@alumni.caltech.edu: Third call for platform testing]

matthew green <mrg@eterna.com.au> writes:

i also believe the `Bad address' errors were caused when the test
was run in an NFS mounted directory.

You may have something, there. My test run on the VAX was over NFS.
I set up NetBSD on a VAX specifically to test PostgreSQL 7.1, but I
didn't have any disk available that it could use, so I went for NFS.

-tih
--
The basic difference is this: hackers build things, crackers break them.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Ivar Helbekkmo (#5)
NetBSD "Bad address" failure (was Re: Third call for platform testing)

Tom Ivar Helbekkmo <tih@kpnQwest.no> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:
CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
+ ERROR: cannot read block 3 of hash_i4_index: Bad address

"Bad address"? That seems pretty bizarre.

This is obviously something that shows up on _some_ NetBSD platforms.
The above was on sparc64, but that same problem is the only one I see
in the regression testing on NetBSD/vax that isn't just different
floating point (the VAX doesn't have IEEE), different ordering of
(unordered) collections or different wording of strerror() output.

NetBSD/i386 doesn't have the "Bad address" problem.

After looking into it, I find that the problem is this: Postgres, or at
least the hash-index part of it, expects to be able to lseek() to a
position past the end of a file and then get a non-failure return from
read(). (This happens indirectly because it uses ReadBuffer for blocks
that it has never yet written.) Given the attached test program, I get
this result on my own machine:

$ touch z -- create an empty file
$ ./a.out z 0 -- read at offset 0
Read 0 bytes
$ ./a.out z 1 -- read at offset 8K
Read 0 bytes

Presumably, the same result appears everywhere else that the regress
tests pass. But NetBSD 1.5T gives

$ touch z
$ ./a.out z 0
Read 0 bytes
$ ./a.out z 1
read: Bad address
$ uname -a
NetBSD varg.i.eunet.no 1.5T NetBSD 1.5T (VARG) #4: Thu Apr 5 23:38:04 CEST 2001 root@varg.i.eunet.no:/usr/src/sys/arch/vax/compile/VARG vax

I think this is indisputably a bug in (some versions of) NetBSD. If I
can seek past the end of file, read() shouldn't consider it a hard error
to read there --- and in any case, EFAULT isn't a very reasonable error
code to return. Since it seems not to be a widespread problem, I'm not
eager to change the hash code to try to avoid it.

regards, tom lane

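#include <stdio.h>
#include <stdlib.h>     /* exit(), atoi() */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Usage: a.out FILE BLOCKNO  (seek to BLOCKNO * 8K in FILE, then read one block) */
int main (int argc, char** argv)
{
    char *fname;
    int fd, readres;
    long seekres;
    char buf[8192];

    if (argc < 3)
    {
        fprintf(stderr, "usage: %s file blockno\n", argv[0]);
        exit(1);
    }
    fname = argv[1];

    fd = open(fname, O_RDONLY, 0);
    if (fd < 0)
    {
        perror(fname);
        exit(1);
    }
    /* Seeking past the end of the file is allowed and should succeed. */
    seekres = lseek(fd, atoi(argv[2]) * 8192, SEEK_SET);
    if (seekres < 0)
    {
        perror("seek");
        exit(1);
    }
    /* At or past EOF, read() is expected to return 0, not an error. */
    readres = read(fd, buf, sizeof(buf));
    if (readres < 0)
    {
        perror("read");
        exit(1);
    }
    printf("Read %d bytes\n", readres);

    exit(0);
}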
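
For anyone who wants to run the same check from other code (for example over a list of mount points), the test can be packaged as a function. This is a sketch, not anything from the PostgreSQL tree: the name probe_read_at_block, the PROBE_BLCKSZ macro and the "negative errno" return convention are assumptions made for illustration. On an affected NFS mount the expectation is that it returns -EFAULT, since "Bad address" is the message for EFAULT.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define PROBE_BLCKSZ 8192       /* same 8K block size as the test program */

/*
 * Hypothetical helper: seek to block `blockno` of `path` and read one
 * block.  Returns the number of bytes read (0 means EOF, which is the
 * expected answer past the end of the file), or -errno on failure.
 */
long probe_read_at_block(const char *path, long blockno)
{
    char buf[PROBE_BLCKSZ];
    long result;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -errno;
    if (lseek(fd, (off_t) blockno * PROBE_BLCKSZ, SEEK_SET) < 0)
        result = -errno;
    else
    {
        ssize_t n = read(fd, buf, sizeof(buf));

        result = (n < 0) ? -errno : (long) n;
    }
    close(fd);
    return result;
}
```

Returning -errno instead of printing keeps the kernel's error code available to the caller, so a "Bad address" result can be told apart from an ordinary open() failure such as a missing file.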

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#14)
Re: NetBSD "Bad address" failure (was Re: Third call for platform testing)

I wrote:

I think this is indisputably a bug in (some versions of) NetBSD. If I
can seek past the end of file, read() shouldn't consider it a hard error
to read there --- and in any case, EFAULT isn't a very reasonable error
code to return. Since it seems not to be a widespread problem, I'm not
eager to change the hash code to try to avoid it.

I forgot to mention a possible contributing factor: the files involved
were NFS-mounted, in the case I was looking at. So this may be an NFS
problem more than a NetBSD problem. Anyone want to try the given test
case on NFS-mounted files on other systems?

regards, tom lane

#16Tom Ivar Helbekkmo
tih@kpnQwest.no
In reply to: Tom Lane (#15)
Re: NetBSD "Bad address" failure (was Re: Third call for platform testing)

Tom Lane <tgl@sss.pgh.pa.us> writes:

I think this is indisputably a bug in (some versions of) NetBSD.

I forgot to mention a possible contributing factor: the files involved
were NFS-mounted, in the case I was looking at. So this may be an NFS
problem more than a NetBSD problem. Anyone want to try the given test
case on NFS-mounted files on other systems?

I can verify that with NetBSD-current on sparc, your test code works
the way you want it to on local disk, but fails (in the way you've
observed) if the target file is on an NFS-mounted file system.

-tih
--
The basic difference is this: hackers build things, crackers break them.

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Ivar Helbekkmo (#16)
Re: NetBSD "Bad address" failure (was Re: Third call for platform testing)

Tom Ivar Helbekkmo <tih@kpnQwest.no> writes:

I can verify, that with NetBSD-current on sparc, your test code works
the way you want it to on local disk, but fails (in the way you've
observed), if the target file is on an NFS-mounted file system.

FWIW, the test program succeeds (no error) using HPUX 10.20 and a couple
different Linux flavors as either client or server. So I'm still
thinking that it's NetBSD-specific. It would be useful to try it on
some other BSD derivatives though ...

regards, tom lane

#18Larry Rosenman
ler@lerctr.org
In reply to: Tom Lane (#17)
Re: NetBSD "Bad address" failure (was Re: Third call for platform testing)

* Tom Lane <tgl@sss.pgh.pa.us> [010414 18:15]:

Tom Ivar Helbekkmo <tih@kpnQwest.no> writes:

I can verify, that with NetBSD-current on sparc, your test code works
the way you want it to on local disk, but fails (in the way you've
observed), if the target file is on an NFS-mounted file system.

FWIW, the test program succeeds (no error) using HPUX 10.20 and a couple
different Linux flavors as either client or server. So I'm still
thinking that it's NetBSD-specific. It would be useful to try it on
some other BSD derivatives though ...

I can arrange a test on FreeBSD 4.3... I'll try it tomorrow...

(Or I can give access....)

LER

regards, tom lane

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

#19matthew green
mrg@eterna.com.au
In reply to: Tom Lane (#14)
re: NetBSD "Bad address" failure (was Re: Third call for platform testing)

yes, this is a bug in netbsd-current that was introduced about 5 months
ago with the new unified buffer cache system. it has been fixed.

thanks.

From: Chuck Silvers <chs@netbsd.org>
To: source-changes@netbsd.org
Subject: CVS commit: syssrc
Date: Mon, 16 Apr 2001 17:37:44 +0300 (EEST)

Module Name: syssrc
Committed By: chs
Date: Mon Apr 16 14:37:44 UTC 2001

Modified Files:
syssrc/sys/nfs: nfs_bio.c

Log Message:
reads at or after EOF should "succeed".

To generate a diff of this commit:
cvs rdiff -r1.65 -r1.66 syssrc/sys/nfs/nfs_bio.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.