Buildfarm feature request: some way to track/classify failures

Started by Tom Lane · about 19 years ago · 42 messages · hackers
#1 Tom Lane
tgl@sss.pgh.pa.us

The current buildfarm webpages make it easy to see when a branch tip
is seriously broken, but it's not very easy to investigate transient
failures, such as a regression test race condition that only
materializes once in a while. I would like to have a way of seeing
just the failed build attempts across all machines running a given
branch. Ideally it would be possible to tag failures as to the cause
(if known) and/or symptom pattern, and then be able to examine just
the ones without known cause or having similar symptoms.

I'm not sure how much of this is reasonable to try to do with webpages
similar to what we've got. But the data is all in a database AIUI,
so another possibility is to do this work via SQL. That'd require
having the ability to pull the information from the buildfarm database
so someone else could manipulate it.

So I guess the first question is can you make the build data available,
and the second is whether you're interested in building more flexible
views or just want to let someone else do that. Also, if anyone does
make an effort to tag failures, it'd be good to somehow push that data
back into the master database, so that we don't end up duplicating such
work.

regards, tom lane

#2 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#1)
Re: Buildfarm feature request: some way to track/classify failures

Tom Lane wrote:

The current buildfarm webpages make it easy to see when a branch tip
is seriously broken, but it's not very easy to investigate transient
failures, such as a regression test race condition that only
materializes once in awhile. I would like to have a way of seeing
just the failed build attempts across all machines running a given
branch. Ideally it would be possible to tag failures as to the cause
(if known) and/or symptom pattern, and then be able to examine just
the ones without known cause or having similar symptoms.

I'm not sure how much of this is reasonable to try to do with webpages
similar to what we've got. But the data is all in a database AIUI,
so another possibility is to do this work via SQL. That'd require
having the ability to pull the information from the buildfarm database
so someone else could manipulate it.

So I guess the first question is can you make the build data available,
and the second is whether you're interested in building more flexible
views or just want to let someone else do that. Also, if anyone does
make an effort to tag failures, it'd be good to somehow push that data
back into the master database, so that we don't end up duplicating such
work.

If the data is already there and just not represented, just let me know
exactly what you want and I will implement pages for that data happily.

Joshua D. Drake

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

#3 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: Buildfarm feature request: some way to track/classify failures

Tom Lane wrote:

The current buildfarm webpages make it easy to see when a branch tip
is seriously broken, but it's not very easy to investigate transient
failures, such as a regression test race condition that only
materializes once in awhile. I would like to have a way of seeing
just the failed build attempts across all machines running a given
branch. Ideally it would be possible to tag failures as to the cause
(if known) and/or symptom pattern, and then be able to examine just
the ones without known cause or having similar symptoms.

I'm not sure how much of this is reasonable to try to do with webpages
similar to what we've got. But the data is all in a database AIUI,
so another possibility is to do this work via SQL. That'd require
having the ability to pull the information from the buildfarm database
so someone else could manipulate it.

So I guess the first question is can you make the build data available,
and the second is whether you're interested in building more flexible
views or just want to let someone else do that. Also, if anyone does
make an effort to tag failures, it'd be good to somehow push that data
back into the master database, so that we don't end up duplicating such
work.

Well, the db is currently running around 13Gb, so that's not something
to be exported lightly ;-)

If we upgraded from Postgres 8.0.x to 8.2.x we could make use of some
features, like dynamic partitioning and copy from queries, that might
make life easier (CP people: that's a hint :-) )

I don't want to fragment effort, but I also know CP don't want open
access, for obvious reasons.

We can also look at a safe API that we could make available freely. I've
already done this over SOAP (see example client at
http://people.planetpostgresql.org/andrew/index.php?/archives/14-SOAP-server-for-Buildfarm-dashboard.html
). Doing updates is a whole other matter, of course.

Lastly, note that some buildfarm enhancements are on the SOC project
list. I have no idea if anyone will express any interest in that, of
course. It's not very glamorous work.

cheers

andrew

#4 Joshua D. Drake
jd@commandprompt.com
In reply to: Andrew Dunstan (#3)
Re: Buildfarm feature request: some way to track/classify failures

Well, the db is currently running around 13Gb, so that's not something
to be exported lightly ;-)

If we upgraded from Postgres 8.0.x to 8.2.x we could make use of some
features, like dynamic partitioning and copy from queries, that might
make life easier (CP people: that's a hint :-) )

Yeah, Yeah... I need to get you off that machine as a whole :) Which is
on the list but I am waiting for 8.3 *badda bing*.

Sincerely,

Joshua D. Drake


#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: Buildfarm feature request: some way to track/classify failures

Andrew Dunstan <andrew@dunslane.net> writes:

Well, the db is currently running around 13Gb, so that's not something
to be exported lightly ;-)

Yeah. I would assume though that the vast bulk of that is captured log
files. For the purposes I'm imagining, it'd be sufficient to export
only the rest of the database --- or ideally, records including all the
other fields and a URL for each log file. For the small number of log
files you actually need to examine, you'd chase the URL.

regards, tom lane

#6 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#5)
Re: Buildfarm feature request: some way to track/classify failures

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Well, the db is currently running around 13Gb, so that's not something
to be exported lightly ;-)

Yeah. I would assume though that the vast bulk of that is captured log
files. For the purposes I'm imagining, it'd be sufficient to export
only the rest of the database --- or ideally, records including all the
other fields and a URL for each log file. For the small number of log
files you actually need to examine, you'd chase the URL.

OK, for anyone that wants to play, I have created an extract that
contains a summary of every non-CVS-related failure we've had. It's a
single table looking like this:

CREATE TABLE mfailures (
    sysname text,
    snapshot timestamp without time zone,
    stage text,
    conf_sum text,
    branch text,
    changed_this_run text,
    changed_since_success text,
    log_archive_filenames text[],
    build_flags text[]
);

The dump is just under 1Mb and can be downloaded from
http://www.pgbuildfarm.org/mfailures.dump

If this is useful we can create it or something like it on a regular
basis (say nightly).

The summary log for a given build can be got from:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=<sysname>&dt=<snapshot>

To look at the log for a given run stage select
http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=<sysname>&dt=<snapshot>&stg=<stagename>
- the stage names available (if any) are the entries in
log_archive_filenames, stripped of the ".log" suffix.
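
As a quick illustration, the URL scheme described above can be assembled
mechanically (a minimal sketch; the helper name and the sample
sysname/snapshot values are invented for the example):

```python
from urllib.parse import urlencode

BASE = "http://www.pgbuildfarm.org/cgi-bin"

def stage_log_url(sysname, snapshot, log_archive_filename):
    # Stage names are the log_archive_filenames entries minus ".log"
    stage = log_archive_filename.removesuffix(".log")
    query = urlencode({"nm": sysname, "dt": snapshot, "stg": stage})
    return f"{BASE}/show_stage_log.pl?{query}"

# Hypothetical example values:
url = stage_log_url("corgi", "2007-03-16 16:45:01", "check.log")
print(url)
```

urlencode takes care of escaping the spaces and colons in the snapshot
timestamp, which is easy to get wrong when pasting URLs by hand.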

We can make these available over an API that isn't plain HTTP if people
want. Or we can provide a version of the build log that is stripped of
the HTML.

cheers

andrew

#7 Jeremy Drake
pgsql@jdrake.com
In reply to: Andrew Dunstan (#6)
Re: Buildfarm feature request: some way to track/classify failures

On Fri, 16 Mar 2007, Andrew Dunstan wrote:

OK, for anyone that wants to play, I have created an extract that contains a
summary of every non-CVS-related failure we've had. It's a single table
looking like this:

CREATE TABLE mfailures (
sysname text,
snapshot timestamp without time zone,
stage text,
conf_sum text,
branch text,
changed_this_run text,
changed_since_success text,
log_archive_filenames text[],
build_flags text[]
);

Sweet. Should be interesting to look at.

The dump is just under 1Mb and can be downloaded from
http://www.pgbuildfarm.org/mfailures.dump

Sure about that?

--14:45:45-- http://www.pgbuildfarm.org/mfailures.dump
=> `mfailures.dump'
Resolving www.pgbuildfarm.org... 207.173.203.146
Connecting to www.pgbuildfarm.org|207.173.203.146|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9,184,142 (8.8M) [text/plain]

--
BOO! We changed Coke again! BLEAH! BLEAH!

#8 Andrew Dunstan
andrew@dunslane.net
In reply to: Jeremy Drake (#7)
Re: Buildfarm feature request: some way to track/classify failures

Jeremy Drake wrote:

The dump is just under 1Mb and can be downloaded from
http://www.pgbuildfarm.org/mfailures.dump

Sure about that?

HTTP request sent, awaiting response... 200 OK
Length: 9,184,142 (8.8M) [text/plain]

Damn these new specs. They made me skip a digit.

cheers

andrew

#9 Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#3)
Re: Buildfarm feature request: some way to track/classify failures

Andrew,

Lastly, note that some buildfarm enhancements are on the SOC project
list. I have no idea if anyone will express any interest in that, of
course. It's not very glamorous work.

On the other hand, I think there are a lot more student perl hackers and
web people than there are folks with the potential to do backend stuff.
So who knows?

--Josh

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#6)
Re: Buildfarm feature request: some way to track/classify failures

Andrew Dunstan <andrew@dunslane.net> writes:

OK, for anyone that wants to play, I have created an extract that
contains a summary of every non-CVS-related failure we've had. It's a
single table looking like this:

I did some analysis on this data. Attached is a text dump of a table
declared as

CREATE TABLE mreasons (
    sysname text,
    snapshot timestamp without time zone,
    branch text,
    reason text,
    known boolean
);

where the sysname/snapshot/branch data is taken from your table,
"reason" is a brief sketch of the failure, and "known" indicates
whether the cause is known ... although as I went along it sort
of evolved into "does this seem worthy of more investigation?".

I looked at every failure back through early December. I'd intended to
go back further, but decided I'd hit a point of diminishing returns.
However, failures back to the beginning of July that matched grep
searches for recent symptoms are classified in the table.

The gross stats are: 2231 failures classified, 71 distinct reason
codes, 81 failures (with 18 reasons) that seem worthy of closer
investigation:

bfarm=# select reason,branch,max(snapshot) as latest, count(*) from mreasons where not known group by 1,2 order by 1,2 ;
reason | branch | latest | count
------------------------------------------------------------------+---------------+---------------------+-------
Input/output error - possible hardware problem | HEAD | 2007-03-06 10:30:01 | 1
No rule to make target | HEAD | 2007-02-08 15:30:01 | 6
No rule to make target | REL8_0_STABLE | 2007-02-28 03:15:02 | 9
No rule to make target | REL8_2_STABLE | 2006-12-17 20:00:01 | 1
could not open relation with OID | HEAD | 2007-03-16 16:45:01 | 2
could not open relation with OID | REL8_1_STABLE | 2006-08-29 23:30:07 | 2
createlang not found? | REL8_1_STABLE | 2007-02-28 02:50:00 | 1
irreproducible contrib/sslinfo build failure, likely not our bug | HEAD | 2007-02-03 07:03:02 | 1
irreproducible opr_sanity failure | HEAD | 2006-12-18 19:15:02 | 2
libintl.h rejected by configure | HEAD | 2007-01-11 20:35:00 | 3
libintl.h rejected by configure | REL8_0_STABLE | 2007-03-01 20:28:04 | 22
postmaster failed to start | REL7_4_STABLE | 2007-02-28 22:23:20 | 1
postmaster failed to start | REL8_0_STABLE | 2007-02-28 22:30:44 | 1
random Solaris configure breakage | HEAD | 2007-01-14 05:30:00 | 1
random Windows breakage | HEAD | 2007-03-16 09:48:31 | 3
random Windows breakage | REL8_0_STABLE | 2007-03-15 03:15:09 | 7
segfault during bootstrap | HEAD | 2007-03-12 23:03:03 | 1
server does not shut down | HEAD | 2007-01-08 03:03:03 | 3
tablespace is not empty | HEAD | 2007-02-24 15:00:10 | 6
tablespace is not empty | REL8_1_STABLE | 2007-01-25 02:30:01 | 2
unexpected statement_timeout failure | HEAD | 2007-01-25 05:05:06 | 1
unexplained tsearch2 crash | HEAD | 2007-01-10 22:05:02 | 1
weird DST-transition-like timestamp test failure | HEAD | 2007-02-04 07:25:04 | 1
weird assembler failure, likely not our bug | HEAD | 2006-12-26 17:02:01 | 1
weird assembler failure, likely not our bug | REL8_2_STABLE | 2007-02-03 23:47:01 | 1
weird install failure | HEAD | 2007-01-25 12:35:00 | 1
(26 rows)

I think I know the cause of the recent 'could not open relation with
OID' failures in HEAD, but the rest of these maybe need a look.
Any volunteers?
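
The tagging-and-querying approach above amounts to a small relational
exercise; here is a sketch of the same query shape using SQLite in
Python, with a couple of invented sample rows (the real data lives in
the buildfarm database, of course, and uses a proper timestamp type):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mreasons (
        sysname  TEXT,
        snapshot TEXT,    -- timestamp kept as text for the sketch
        branch   TEXT,
        reason   TEXT,
        known    INTEGER  -- 0/1 standing in for boolean
    )
""")
# Invented sample rows, just to exercise the query
conn.executemany(
    "INSERT INTO mreasons VALUES (?, ?, ?, ?, ?)",
    [
        ("corgi",    "2007-03-06 10:30:01", "HEAD", "random Windows breakage", 0),
        ("mongoose", "2007-03-16 16:30:01", "HEAD", 'icc "internal error"', 1),
        ("emu",      "2007-03-15 03:15:09", "REL8_0_STABLE", "random Windows breakage", 0),
    ],
)
# Same shape as the analysis query: unexplained failures by reason/branch
rows = conn.execute("""
    SELECT reason, branch, max(snapshot) AS latest, count(*)
    FROM mreasons
    WHERE NOT known
    GROUP BY 1, 2
    ORDER BY 1, 2
""").fetchall()
for row in rows:
    print(row)
```

The point of the `known` flag is exactly this filter: once a failure is
tagged with a known cause it drops out of the "needs investigation" view.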

Also, for completeness, the causes I wrote off as not interesting
(anymore, in some cases):

bfarm=# select reason,max(snapshot) as latest, count(*) from mreasons where known group by 1 order by 1 ;
reason | latest | count
----------------------------------------------------------------------+---------------------+-------
DST transition test failure | 2007-03-13 04:04:47 | 26
ISO-week-patch regression test breakage | 2007-02-16 15:00:08 | 23
No rule to make Makefile.port | 2007-03-02 12:30:02 | 40
Out of disk space | 2007-02-16 22:30:01 | 67
Out of semaphores | 2007-02-20 02:03:31 | 14
Python not installed | 2007-02-19 22:45:05 | 2
Solaris random conn-refused bug | 2007-03-06 01:20:00 | 37
TCP socket already in use | 2007-01-09 07:03:04 | 13
Too many clients | 2007-02-26 06:06:02 | 90
Too many open files in system | 2007-02-27 20:30:59 | 17
another icc crash | 2007-02-03 10:50:01 | 1
apparently a malloc bug | 2007-03-04 23:00:20 | 27
bogus system clock setting | 1997-12-21 15:20:11 | 6
breakage from changing := to = in makefiles | 2007-02-10 02:15:01 | 4
broken GUC patch | 2007-03-13 15:15:01 | 92
broken float8 hacking | 2007-01-06 20:00:09 | 120
broken fsync-revoke patch | 2007-01-17 16:21:01 | 77
broken inet hacking | 2007-01-03 00:05:01 | 4
broken log_error patch | 2007-01-28 08:15:01 | 15
broken money patch | 2007-01-03 19:05:01 | 78
broken pg_regress change for msvc support | 2007-01-19 22:03:00 | 46
broken plpython patch | 2007-01-25 14:21:00 | 22
broken sys_siglist patch | 2007-01-28 06:06:02 | 18
bug in btree page split patch | 2007-02-08 11:35:03 | 7
buildfarm pilot error | 2007-01-19 03:28:07 | 69
cache flush bug in operator-family patch | 2006-12-31 10:30:03 | 8
ccache failure | 2007-01-25 23:00:34 | 2
could not create shared memory | 2007-02-13 07:00:05 | 32
ecpg regression test teething pains | 2007-02-03 13:30:02 | 516
failure to update PL expected files for may/can/might rewording | 2007-02-01 20:15:01 | 8
failure to update contrib expected files for may/can/might rewording | 2007-02-01 21:15:02 | 11
failure to update expected files for may/can/might rewording | 2007-02-01 19:35:02 | 3
icc "internal error" | 2007-03-16 16:30:01 | 29
image not found (possibly related to too-many-open-files) | 2006-10-25 08:05:02 | 1
largeobject test bugs | 2007-02-17 23:35:03 | 4
ld segfaulted | 2007-03-16 15:30:02 | 3
missing BYTE_ORDER definition for Solaris | 2007-01-10 14:18:23 | 1
pg_regress patch breakage | 2007-02-08 18:30:01 | 1
plancache test race condition | 2007-03-16 11:15:01 | 5
pltcl regression test broken by ORDER BY semantics tightening | 2007-01-09 03:15:01 | 9
previous contrib test still running | 2007-02-13 20:49:33 | 21
random Solaris breakage | 2007-01-05 17:20:01 | 1
random Windows breakage | 2006-12-27 03:15:07 | 1
random Windows permission-denied failures | 2007-02-12 11:00:09 | 5
random ccache breakage | 2007-01-04 01:34:33 | 1
readline misconfiguration | 2007-02-12 17:19:41 | 33
row-ordering discrepancy in rowtypes test | 2007-02-10 03:00:02 | 3
stats test failed | 2007-03-14 13:00:02 | 319
threaded Python library | 2007-01-10 04:05:02 | 6
undefined symbol pg_mic2ascii | 2007-02-03 01:13:40 | 101
unexpected signal 9 | 2006-12-31 06:30:02 | 15
unportable uuid patch | 2007-01-31 17:30:01 | 16
use of // comment | 2007-02-16 09:23:02 | 1
xml code teething problems | 2007-02-16 16:01:05 | 79
(54 rows)

Some of these might possibly be interesting to other people ...

regards, tom lane

#11 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#10)
Re: Buildfarm feature request: some way to track/classify failures

unportable uuid patch | 2007-01-31 17:30:01 | 16

use of // comment | 2007-02-16 09:23:02 | 1
xml code teething problems | 2007-02-16 16:01:05 | 79
(54 rows)

Some of these might possibly be interesting to other people ...

If you provide the various greps, etc... I will put it into the website
proper...

Joshua D. Drake


#12 Jeremy Drake
pgsql@jdrake.com
In reply to: Tom Lane (#10)
Re: Buildfarm feature request: some way to track/classify failures

On Sun, 18 Mar 2007, Tom Lane wrote:

another icc crash | 2007-02-03 10:50:01 | 1
icc "internal error" | 2007-03-16 16:30:01 | 29

These on mongoose are most likely a result of flaky hardware. They tend
to occur most often when either
a) I am doing something else on the box when the build runs, or
b) the ambient temperature in the room is > ~72degF

I need to bring down this box at some point and try to figure out if it is
bad memory or what.

Anyway, ICC seems to be one of the few things that are really susceptible
to hardware issues (on this box at least, it is mostly ICC and firefox),
and I apologize for the noise this caused in the buildfarm logs...

--
American business long ago gave up on demanding that prospective
employees be honest and hardworking. It has even stopped hoping for
employees who are educated enough that they can tell the difference
between the men's room and the women's room without having little
pictures on the doors.
-- Dave Barry, "Urine Trouble, Mister"

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#11)
Re: Buildfarm feature request: some way to track/classify failures

"Joshua D. Drake" <jd@commandprompt.com> writes:

Some of these might possibly be interesting to other people ...

If you provide the various greps, etc... I will put it into the website
proper...

Unfortunately I didn't keep notes on exactly what I searched for in each
case. Some of them were not based on grep at all, but rather "this
failure looks similar to those others and happened in the period between
a known bad patch commit and its fix". The goal was essentially to
group together failures that probably arose from the same cause --- I
may have made a mistake or two along the way ...

regards, tom lane

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeremy Drake (#12)
Re: Buildfarm feature request: some way to track/classify failures

Jeremy Drake <pgsql@jdrake.com> writes:

These on mongoose are most likely a result of flaky hardware.

Yeah, I saw a pretty fair number of irreproducible issues that are
probably hardware flake-outs. Of course you can't tell which are those
and which are low-probability software bugs for many moons...

I believe that a large fraction of the buildfarm consists of
semi-retired equipment that is probably more prone to this sort of
problem than newer stuff would be. But that's the price we must pay
for building such a large test farm on a shoestring. What we need to do
to deal with it, I think, is institutionalize some kind of long-term
tracking so that we can tell the recurrent from the non-recurrent
issues. I don't quite know how to do that; what I did over this past
weekend was labor-intensive and not scalable.

SoC project perhaps?

regards, tom lane

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#14)
Re: Buildfarm feature request: some way to track/classify failures

BTW, before I forget, this little project turned up a couple of
small improvements for the current buildfarm infrastructure:

1. There are half a dozen entries with obviously bogus timestamps:

bfarm=# select sysname,snapshot,branch from mfailures where snapshot < '2004-01-01';
sysname | snapshot | branch
------------+---------------------+--------
corgi | 1997-10-14 14:20:10 | HEAD
kookaburra | 1970-01-01 01:23:00 | HEAD
corgi | 1997-09-30 11:47:08 | HEAD
corgi | 1997-10-17 14:20:11 | HEAD
corgi | 1997-12-21 15:20:11 | HEAD
corgi | 1997-10-15 14:20:10 | HEAD
corgi | 1997-09-28 11:47:09 | HEAD
corgi | 1997-09-28 11:47:08 | HEAD
(8 rows)

indicating wrong system clock settings on these buildfarm machines.
(Indeed, IIRC these failures were actually caused by the ridiculous
clock settings --- we have at least one regression test that checks
century >= 21 ...) Perhaps the buildfarm server should bounce
reports with timestamps more than a day in the past or a few minutes in
the future. I think though that a more useful answer would be to
include "time of receipt of report" in the permanent record, and then
subsequent analysis could make its own decisions about whether to
believe the snapshot timestamp --- plus we could track elapsed times for
builds, which could be interesting in itself.

2. I was annoyed repeatedly that some buildfarm members weren't
reporting log_archive_filenames entries, which forced going the long
way round in the process I was using. Seems like we need some more
proactive means for getting buildfarm owners to keep their script
versions up-to-date. Not sure what that should look like exactly,
as long as it's not "you can run an ancient version as long as you
please".

regards, tom lane

#16 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
Re: Buildfarm feature request: some way to track/classify failures

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Also, for completeness, the causes I wrote off as not interesting
(anymore, in some cases):

missing BYTE_ORDER definition for Solaris | 2007-01-10 14:18:23 | 1

What is this BYTE_ORDER macro? Should I be using it instead of the
AC_C_BIGENDIAN test in configure for the packed varlena patch?

row-ordering discrepancy in rowtypes test | 2007-02-10 03:00:02 | 3

Is this because the test is fixed or unfixable? If not shouldn't the test get
an ORDER BY clause so that it will reliably pass on future versions?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#17 Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Bruce Momjian (#16)
Re: Buildfarm feature request: some way to track/classify failures

Gregory Stark wrote:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Also, for completeness, the causes I wrote off as not interesting
(anymore, in some cases):

missing BYTE_ORDER definition for Solaris | 2007-01-10 14:18:23 | 1

What is this BYTE_ORDER macro? Should I be using it instead of the
AC_C_BIGENDIAN test in configure for the packed varlena patch?

FYI: this is the relevant commit (the affected buildfarm member was
clownfish)
http://archives.postgresql.org/pgsql-committers/2007-01/msg00154.php

Stefan

#18 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#15)
Re: Buildfarm feature request: some way to track/classify failures

Tom Lane wrote:

BTW, before I forget, this little project turned up a couple of
small improvements for the current buildfarm infrastructure:

1. There are half a dozen entries with obviously bogus timestamps:

bfarm=# select sysname,snapshot,branch from mfailures where snapshot < '2004-01-01';
sysname | snapshot | branch
------------+---------------------+--------
corgi | 1997-10-14 14:20:10 | HEAD
kookaburra | 1970-01-01 01:23:00 | HEAD
corgi | 1997-09-30 11:47:08 | HEAD
corgi | 1997-10-17 14:20:11 | HEAD
corgi | 1997-12-21 15:20:11 | HEAD
corgi | 1997-10-15 14:20:10 | HEAD
corgi | 1997-09-28 11:47:09 | HEAD
corgi | 1997-09-28 11:47:08 | HEAD
(8 rows)

indicating wrong system clock settings on these buildfarm machines.
(Indeed, IIRC these failures were actually caused by the ridiculous
clock settings --- we have at least one regression test that checks
century >= 21 ...) Perhaps the buildfarm server should bounce
reports with timestamps more than a day in the past or a few minutes in
the future. I think though that a more useful answer would be to
include "time of receipt of report" in the permanent record, and then
subsequent analysis could make its own decisions about whether to
believe the snapshot timestamp --- plus we could track elapsed times for
builds, which could be interesting in itself.

We actually do timestamp the reports - I just didn't include that in the
extract. I will alter the view it's based on. We started doing this in
Nov 2005, so I'm going to restrict the view to cases where the
report_time is not null - I doubt we're interested in ancient history.

A revised extract is available at
http://www.pgbuildfarm.org/mfailures2.dump

We already reject snapshot times that are in the future.

Use of NTP is highly recommended to buildfarm members, but I'm reluctant
to make it mandatory, as they might not have it available. I think we
can do this: alter the client script to report its idea of current time
at the time it makes the web transaction. If it's off from the server
time by more than some small value (say 60 secs), adjust the snapshot
time accordingly. If they don't report it then we can reject insane
dates (more than 24 hours ago seems about right).
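
A sketch of that adjustment logic in Python (the 60-second and 24-hour
thresholds come from the description above; the function name and example
values are invented):

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(seconds=60)   # tolerated client/server clock drift
MAX_AGE = timedelta(hours=24)      # reject snapshots older than this

def normalize_snapshot(snapshot, client_now, server_now):
    """Adjust a reported snapshot time for client clock skew.
    client_now may be None for old clients that don't report it."""
    if client_now is not None:
        skew = server_now - client_now
        if abs(skew) > MAX_SKEW:
            snapshot = snapshot + skew  # shift onto the server's clock
    # Reject insane dates: too old, or more than a minute in the future
    if snapshot < server_now - MAX_AGE or snapshot > server_now + MAX_SKEW:
        raise ValueError("snapshot time outside acceptable window")
    return snapshot

# Hypothetical example: client clock runs 10 minutes slow
server_now = datetime(2007, 3, 20, 12, 0, 0)
client_now = server_now - timedelta(minutes=10)
snap = normalize_snapshot(datetime(2007, 3, 20, 11, 40, 0),
                          client_now, server_now)
print(snap)  # snapshot shifted forward by the 10-minute skew
```

Old clients that don't report their clock fall through to the sanity
window alone, matching the fallback described above.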

So I agree with both your suggestions ;-)

2. I was annoyed repeatedly that some buildfarm members weren't
reporting log_archive_filenames entries, which forced going the long
way round in the process I was using. Seems like we need some more
proactive means for getting buildfarm owners to keep their script
versions up-to-date. Not sure what that should look like exactly,
as long as it's not "you can run an ancient version as long as you
please".

Modern clients report the versions of the two scripts involved (see
script_version and web_script_version in reported config) so we could
easily enforce a minimum version on these.

cheers

andrew

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#16)
Re: Buildfarm feature request: some way to track/classify failures

Gregory Stark <stark@enterprisedb.com> writes:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

missing BYTE_ORDER definition for Solaris | 2007-01-10 14:18:23 | 1

What is this BYTE_ORDER macro? Should I be using it instead of the
AC_C_BIGENDIAN test in configure for the packed varlena patch?

Actually, if we start to rely on AC_C_BIGENDIAN, I'd prefer to see us
get rid of direct usages of BYTE_ORDER. It looks like only
contrib/pgcrypto is depending on it today, but we've got lots of
cruft in the include/port/ files supporting that.

row-ordering discrepancy in rowtypes test | 2007-02-10 03:00:02 | 3

Is this because the test is fixed or unfixable?

It's fixed.
http://archives.postgresql.org/pgsql-committers/2007-02/msg00228.php

regards, tom lane

#20 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#16)
Re: Buildfarm feature request: some way to track/classify failures

"Gregory Stark" <stark@enterprisedb.com> writes:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

row-ordering discrepancy in rowtypes test | 2007-02-10 03:00:02 | 3

Is this because the test is fixed or unfixable? If not shouldn't the test get
an ORDER BY clause so that it will reliably pass on future versions?

Hm, I took a quick look at this test and while there are a couple tests
missing ORDER BY clauses I can't see how they could possibly generate results
that are out of order. Perhaps the ones that do have ORDER BY clauses only
recently acquired them?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#2)
#22 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#21)
#23 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#18)
#24 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#22)
#25 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#24)
#26 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#25)
#27 Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#26)
#28 Andrew Dunstan
andrew@dunslane.net
In reply to: Martijn van Oosterhout (#27)
#29 Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Andrew Dunstan (#28)
#30 Andrew Dunstan
andrew@dunslane.net
In reply to: Stefan Kaltenbrunner (#29)
#31 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrew Dunstan (#28)
#32 Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#31)
#33 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#28)
#34 Arturo Perez
aperez@hayesinc.com
In reply to: Tom Lane (#33)
#35 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#33)
#36 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#35)
#37 Andrew Dunstan
andrew@dunslane.net
In reply to: Arturo Perez (#34)
#38 Martijn van Oosterhout
kleptog@svana.org
In reply to: Andrew Dunstan (#35)
#39 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#38)
#40 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#39)
#41 Joshua D. Drake
jd@commandprompt.com
In reply to: Andrew Dunstan (#40)
#42 Andrew Dunstan
andrew@dunslane.net
In reply to: Joshua D. Drake (#41)