What in the world is happening on spoonbill?

Started by Tom Lane over 17 years ago, 25 messages
#1Tom Lane
tgl@sss.pgh.pa.us

Buildfarm member spoonbill's last four HEAD builds have all failed in
the same utterly bizarre way. It looks like about half of the test
results files got truncated at random places --- no errors, no nothing,
the file just ends early. What's up with that?

regards, tom lane

#2Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#1)
Re: What in the world is happening on spoonbill?

Tom Lane wrote:

Buildfarm member spoonbill's last four HEAD builds have all failed in
the same utterly bizarre way. It looks like about half of the test
results files got truncated at random places --- no errors, no nothing,
the file just ends early. What's up with that?

psql is coredumping:

#0 0x0000000000112ea0 in print_aligned_text (cont=0xffffffffffff8a90, fout=0x74c0c8) at print.c:664
664 if (width > 0 && width_wrap[i] &&
(gdb) bt
#0 0x0000000000112ea0 in print_aligned_text (cont=0xffffffffffff8a90, fout=0x74c0c8) at print.c:664
#1 0x0000000000116e40 in printTable (cont=0xffffffffffff8a90, fout=0x74c0c8, flog=0x0) at print.c:2248
#2 0x00000000001170e0 in printQuery (result=0x41a44800, opt=0x4, fout=0x74c0c8, flog=0x0) at print.c:2365
#3 0x0000000000107dc0 in PrintQueryTuples (results=0x41a44800) at common.c:605
#4 0x00000000001080b0 in PrintQueryResults (results=0x41a44800) at common.c:710
#5 0x0000000000108508 in SendQuery (query=0x4f4cd600 "select * from def_test;") at common.c:870
#6 0x000000000010c5f4 in MainLoop (source=0x74c030) at mainloop.c:242
#7 0x000000000010eb40 in main (argc=6, argv=0xffffffffffff91f8) at startup.c:347

which points the finger towards the psql changes ...

Stefan

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stefan Kaltenbrunner (#2)
Re: What in the world is happening on spoonbill?

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:

psql is coredumping:

Huh. I wonder why it's only happening on that one machine.
Is there anything particularly unusual about datatype sizes
or alignment rules on that platform?

regards, tom lane

#4Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#3)
Re: What in the world is happening on spoonbill?

Tom Lane wrote:

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:

psql is coredumping:

Huh. I wonder why it's only happening on that one machine.
Is there anything particularly unusual about datatype sizes
or alignment rules on that platform?

hmm well it is a 64bit Sparc box running OpenBSD which is a tad
"unusual" in itself.
But if I had to guess, this is more likely caused by the special malloc
flags used on spoonbill (FGJPZ) - per your recommendations in:

http://archives.postgresql.org/pgsql-hackers/2005-06/msg00828.php

docs at:

http://www.openbsd.org/cgi-bin/man.cgi?query=malloc.conf&apropos=0&sektion=0&manpath=OpenBSD+4.2&arch=sparc64&format=html

Stefan

#5Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#3)
Re: What in the world is happening on spoonbill?

Tom Lane wrote:

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:

psql is coredumping:

Huh. I wonder why it's only happening on that one machine.
Is there anything particularly unusual about datatype sizes
or alignment rules on that platform?

hmm actually - the Windows buildfarm failures/issues Andrew reported
might be the same issue, judging from his report and the failure after
killing psql ...

Stefan

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stefan Kaltenbrunner (#4)
Re: What in the world is happening on spoonbill?

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:

Tom Lane wrote:

Huh. I wonder why it's only happening on that one machine.

But if I had to guess, this is more likely caused by the special malloc
flags used on spoonbill (FGJPZ) - per your recommendations in:

Hah, yeah, that's it. The code was definitely indexing off the end
of the width_wrap[] array. It's surprising that we didn't get any
more-obvious failures, like bogus output formatting.
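
A minimal sketch of this class of bug and one way to guard against it. It is not the actual print.c code: the `WrapTable` struct and the `wrap_width_for_cell` helper are hypothetical names, and the assumption that the overrun came from indexing a per-column array by cell number is made only for illustration. A read past the end of a heap array often goes unnoticed with a default allocator and only faults when the allocator runs with debugging options like the ones on spoonbill.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-column table of wrap widths. */
typedef struct
{
    const int  *width_wrap;     /* one entry per column */
    size_t      col_count;      /* number of entries in width_wrap */
} WrapTable;

/*
 * Look up the wrap width for a cell.  Cells are numbered row by row, so
 * the cell index can be far larger than col_count.  Mapping it back to a
 * column index keeps the read inside the array; using cell_index directly
 * would run off the end after the first row.
 */
static int
wrap_width_for_cell(const WrapTable *t, size_t cell_index)
{
    assert(t != NULL && t->width_wrap != NULL && t->col_count > 0);
    return t->width_wrap[cell_index % t->col_count];
}
```

With a three-column table, cell 5 belongs to column 2, so the lookup returns the third entry instead of reading three slots past the end of the array.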

Can you modify the buildfarm's description of that machine to mention
the special malloc debug flags? It'd probably stop me from asking
you this question again ;-)

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stefan Kaltenbrunner (#2)
Re: What in the world is happening on spoonbill?

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:

psql is coredumping:

BTW, this exposes a pretty nasty omission in pg_regress: it fails to
say anything about a nonzero exit code from a psql child process
that's running a test. Seems like wait_for_tests() ought to complain
about that. Any objections? Does anyone know how to get the child
process exit status on Windows?
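
On POSIX systems the check could look roughly like the sketch below. It assumes the status comes from `waitpid()` or `system()`, which use the same encoding. The names `describe_wait_status` and `run_test_command` and the message wording are illustrative only, not pg_regress's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: inspect a wait status as returned by waitpid() or
 * system().  Returns 0 if the child exited normally with code 0; otherwise
 * returns 1 and writes a short description into buf.
 */
static int
describe_wait_status(int status, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    if (status == -1)
    {
        snprintf(buf, buflen, "could not obtain test process status");
        return 1;
    }
    if (WIFEXITED(status))
    {
        int         code = WEXITSTATUS(status);

        if (code == 0)
            return 0;
        snprintf(buf, buflen, "test process exited with exit code %d", code);
        return 1;
    }
    if (WIFSIGNALED(status))
    {
        snprintf(buf, buflen, "test process was terminated by signal %d",
                 WTERMSIG(status));
        return 1;
    }
    snprintf(buf, buflen, "test process exited with unrecognized status %d",
             status);
    return 1;
}

/* Hypothetical wrapper: run a shell command and describe how it ended. */
static int
run_test_command(const char *cmd, char *buf, size_t buflen)
{
    return describe_wait_status(system(cmd), buf, buflen);
}
```

A caller would print the contents of `buf` next to the test name whenever the function returns nonzero.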

regards, tom lane

#8Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#6)
Re: What in the world is happening on spoonbill?

Tom Lane wrote:

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:

Tom Lane wrote:

Huh. I wonder why it's only happening on that one machine.

But if I had to guess, this is more likely caused by the special malloc
flags used on spoonbill (FGJPZ) - per your recommendations in:

Hah, yeah, that's it. The code was definitely indexing off the end
of the width_wrap[] array. It's surprising that we didn't get any
more-obvious failures, like bogus output formatting.

Can you modify the buildfarm's description of that machine to mention
the special malloc debug flags? It'd probably stop me from asking
you this question again ;-)

hmm - would take somebody with SQL-level access to do this - the script
to update OS/compiler related data is only partially (i.e. not updating all
information) working...
But maybe it would be nice to have some sort of "notes about this
buildfarm member" text field that contains this information (or other
stuff like "this is a VM running on bar" or "this is really the same
hardware as animal bar just with configuration baz")?

Stefan

#9Jeremy Drake
pgsql@jdrake.com
In reply to: Tom Lane (#7)
Re: What in the world is happening on spoonbill?

On Sat, 17 May 2008, Tom Lane wrote:

Does anyone know how to get the child
process exit status on Windows?

GetExitCodeProcess, if you've got the process handle handy (which I assume
you do, since you most likely were calling one of the WaitFor...Object
family of functions).

http://msdn.microsoft.com/en-us/library/ms683189(VS.85).aspx

--
Then a man said: Speak to us of Expectations.

He then said: If a man does not see or hear the waters of the Jordan,
then he should not taste the pomegranate or ply his wares in an open
market.

If a man would not labour in the salt and rock quarries then he should
not accept of the Earth that which he refuses to give of himself.

Such a man would expect a pear of a peach tree.
Such a man would expect a stone to lay an egg.
Such a man would expect Sears to assemble a lawnmower.
-- Kehlog Albran, "The Profit"

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: What in the world is happening on spoonbill?

I wrote:

BTW, this exposes a pretty nasty omission in pg_regress: it fails to
say anything about a nonzero exit code from a psql child process
that's running a test. Seems like wait_for_tests() ought to complain
about that. Any objections?

So I coded this up, and fortunately thought to try it with ecpg's tests
before committing:

test preproc/define ... ok
test preproc/init ... ok
test preproc/type ... ok
test preproc/variable ... ok
test preproc/whenever ... FAILED: test process exited with exit code 1
test sql/array ... ok
test sql/binary ... ok
test sql/code100 ... ok
test sql/copystdout ... ok

Apparently the exit(1) is intentional in that test.

We could possibly extend the syntax of regression schedule files to have
a way to say what's the expected exit status, but that seems like more
work than it's worth. Would it be all right to just remove the test of
"on error stop" mode?

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: What in the world is happening on spoonbill?

I wrote:

We could possibly extend the syntax of regression schedule files to have
a way to say what's the expected exit status, but that seems like more
work than it's worth. Would it be all right to just remove the test of
"on error stop" mode?

What I did for the moment is just make it annotate the report, rather
than treating nonzero status as a failure in itself. That will at
least help with diagnosing problems.

regards, tom lane

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#10)
Re: What in the world is happening on spoonbill?

Tom Lane wrote:

We could possibly extend the syntax of regression schedule files to have
a way to say what's the expected exit status, but that seems like more
work than it's worth. Would it be all right to just remove the test of
"on error stop" mode?

Wouldn't it be enough to report the exit status if a test fails, instead of
requiring a certain exit status for success?

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#12)
Re: What in the world is happening on spoonbill?

Peter Eisentraut <peter_e@gmx.net> writes:

Wouldn't it be enough to report the exit status if a test fails, instead of
requiring a certain exit status for success?

What I have it doing is reporting the exit status if not zero, but it's
only an annotation on the short-form output; it doesn't control whether
the test is considered to have succeeded or not. I'm not very happy
with that because a crash after all the expected output has been
produced would not result in a report of failure --- and we have seen
problems with psql crashing at exit, so this isn't an academic point.

regards, tom lane

#14Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#13)
Re: What in the world is happening on spoonbill?

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Peter Eisentraut <peter_e@gmx.net> writes:

Wouldn't it be enough to report the exit status if a test fails, instead of
requiring a certain exit status for success?

What I have it doing is reporting the exit status if not zero, but it's
only an annotation on the short-form output; it doesn't control whether
the test is considered to have succeeded or not. I'm not very happy
with that because a crash after all the expected output has been
produced would not result in a report of failure --- and we have seen
problems with psql crashing at exit, so this isn't an academic point.

It might be a bit weird but pg_regress could stick a message in the output
file before it does the comparison with the expected results.
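
A rough sketch of that idea, with hypothetical names (`append_exit_note` is not an existing pg_regress function): if the child's exit code is nonzero, append a line to the result file so that the later comparison with the expected output fails on its own.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: if the test process exited with a nonzero code,
 * append a note to its result file.  Returns 0 on success (including the
 * case where there is nothing to write) and -1 if the file could not be
 * written.
 */
static int
append_exit_note(const char *result_path, int exit_code)
{
    FILE       *f;
    int         ok;

    assert(result_path != NULL);

    if (exit_code == 0)
        return 0;               /* clean exit: leave the file untouched */

    f = fopen(result_path, "a");
    if (f == NULL)
        return -1;

    ok = fprintf(f, "test process exited with exit code %d\n", exit_code) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Because the extra line never appears in the expected file, the diff would report a failure even when the crash happens after all the expected output has been produced.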

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

#15Michael Meskes
meskes@postgresql.org
In reply to: Tom Lane (#10)
Re: What in the world is happening on spoonbill?

On Sat, May 17, 2008 at 03:52:07PM -0400, Tom Lane wrote:

So I coded this up, and fortunately thought to try it with ecpg's tests
before committing:
...
test preproc/whenever ... FAILED: test process exited with exit code 1
...
Apparently the exit(1) is intentional in that test.
..
work than it's worth. Would it be all right to just remove the test of
"on error stop" mode?

I'm fine with removing this test. Granted it leaves a very small code
path untested but I think we can live with this.

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!

#16Alvaro Herrera
alvherre@commandprompt.com
In reply to: Stefan Kaltenbrunner (#8)
Re: What in the world is happening on spoonbill?

Stefan Kaltenbrunner wrote:

Tom Lane wrote:

Can you modify the buildfarm's description of that machine to mention
the special malloc debug flags? It'd probably stop me from asking
you this question again ;-)

hmm - would take somebody with SQL-level access to do this - the script
to update OS/compiler related data is only partially (i.e. not updating all
information) working...

I've changed the compiler to read gcc-malloc-FGJPZ on spoonbill.

BTW this animal has not updated in quite a few days ... is this
expected?

But maybe it would be nice to have some sort of "notes about this
buildfarm member" text field that contains this information (or other
stuff like "this is a VM running on bar" or "this is really the same
hardware as animal bar just with configuration baz")?

Apparently Andrew has been working on it, but it's not yet visible on
the web page anywhere.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#16)
Re: What in the world is happening on spoonbill?

Alvaro Herrera wrote:

But maybe it would be nice to have some sort of "notes about this
buildfarm member" text field that contains this information (or other
stuff like "this is a VM running on bar" or "this is really the same
hardware as animal bar just with configuration baz")?

Apparently Andrew has been working on it, but it's not yet visible on
the web page anywhere.

Yes, I started on it. The problem is that we have very little real
estate available on the dashboard to display it. I tried making it
available as a tooltip but Tom didn't like that much (in private
correspondence), and I didn't get back to doing something else. But the
database changes are there. So, how/where would people like member
annotations displayed?

cheers

andrew

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#17)
Re: What in the world is happening on spoonbill?

Andrew Dunstan <andrew@dunslane.net> writes:

Yes, I started on it. The problem is that we have very little real
estate available on the dashboard to display it. I tried making it
available as a tooltip but Tom didn't like that much (in private
correspondence), and I didn't get back to doing something else. But the
database changes are there. So, how/where would people like member
annotations displayed?

Hmm, well, a tooltip is certainly better than nothing at all.

I don't recall exactly what I said about tooltips in the mail you're
referring to, but the main objection I can think of right now is that
I'm not sure all browsing setups support tooltips nicely.

Perhaps another way is to include the machine description details in
the per-animal status history page, eg in or under the "System Detail"
bit at
http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=spoonbill&br=HEAD

regards, tom lane

#19Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#18)
Re: What in the world is happening on spoonbill?

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Yes, I started on it. The problem is that we have very little real
estate available on the dashboard to display it. I tried making it
available as a tooltip but Tom didn't like that much (in private
correspondence), and I didn't get back to doing something else. But the
database changes are there. So, how/where would people like member
annotations displayed?

Hmm, well, a tooltip is certainly better than nothing at all.

I don't recall exactly what I said about tooltips in the mail you're
referring to, but the main objection I can think of right now is that
I'm not sure all browsing setups support tooltips nicely.

Any half way modern browser that is not text based should support tool tips.

Joshua D. Drake

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#19)
Re: What in the world is happening on spoonbill?

"Joshua D. Drake" <jd@commandprompt.com> writes:

Tom Lane wrote:

I'm not sure all browsing setups support tooltips nicely.

Any half way modern browser that is not text based should support tool tips.

Are we in the business of excluding text-based browsers? Or obsolete
ones, for that matter?

regards, tom lane

#21Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#20)
Re: What in the world is happening on spoonbill?

On Sat, Aug 23, 2008 at 11:44:59PM -0400, Tom Lane wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

Tom Lane wrote:

I'm not sure all browsing setups support tooltips nicely.

Any half way modern browser that is not text based should support tool tips.

Are we in the business of excluding text-based browsers? Or obsolete
ones, for that matter?

You need to decide how you want it to appear on such browsers. If you
use the alt attribute of images, text browsers will simply place the text
inline instead of the image. They'd probably ignore the title attribute. The
title attribute is the best option, and is widely supported.

I'm not sure if any text browsers support CSS, so if implemented that
way, for them the tooltips simply won't appear.

Once you decide on exactly how you want text browsers to be able to see
it then the solution becomes obvious.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#22Steven Lembark
lembark@wrkhors.com
In reply to: Martijn van Oosterhout (#21)
Re: What in the world is happening on spoonbill?

Are we in the business of excluding text-based browsers? Or obsolete
ones, for that matter?

I don't think we would want to be in the business of
dealing successfully with every quirk of every browser
ever released.

Another way to look at it is supporting standards:
If graphical browsers support at least HTML and CSS,
maybe ecmascript, then they are supportable. If text
based ones can handle the necessary alt attributes then we
can also support them.

Beyond that, do you really want to document and code
around every quirk in MSIE 1.0, Netscape 0.50, or any
of the now-extinct text-based browsers for MSDOS?

--
Steven Lembark 85-09 90th St.
Workhorse Computing Woodhaven, NY, 11421
lembark@wrkhors.com +1 888 359 3508

#23Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#20)
Re: What in the world is happening on spoonbill?

Tom Lane wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

Tom Lane wrote:

I'm not sure all browsing setups support tooltips nicely.

Any half way modern browser that is not text based should support tool tips.

Are we in the business of excluding text-based browsers? Or obsolete
ones, for that matter?

Shrug, I was just offering that most browsers should support it.

Joshua D. Drake

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steven Lembark (#22)
Re: What in the world is happening on spoonbill?

Steven Lembark <lembark@wrkhors.com> writes:

Are we in the business of excluding text-based browsers? Or obsolete
ones, for that matter?

I don't think we would want to be in the business of
dealing successfully with every quirk of every browser
ever released.

That's nothing but a straw-man. The point here was to avoid using
constructs that we know won't work on some set of browsers, not to
specifically code around any "quirks". I already suggested a workable
solution that involves no new assumptions at all, which was to put the
added info on the linked-to pages instead of directly on the dashboard.

Now we could do that *and* use tooltips, if we can be fairly sure that
the tooltips will be ignored by browsers that can't handle them as
popups.

regards, tom lane

#25Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Alvaro Herrera (#16)
Re: What in the world is happening on spoonbill?

Alvaro Herrera wrote:

Stefan Kaltenbrunner wrote:

Tom Lane wrote:

Can you modify the buildfarm's description of that machine to mention
the special malloc debug flags? It'd probably stop me from asking
you this question again ;-)

hmm - would take somebody with SQL-level access to do this - the script
to update OS/compiler related data is only partially (i.e. not updating all
information) working...

I've changed the compiler to read gcc-malloc-FGJPZ on spoonbill.

BTW this animal has not updated in quite a few days ... is this
expected?

FWIW: this should be fixed now ...

Stefan