"CVS-Unknown" buildfarm failures?

Started by Tom Lane over 19 years ago, 26 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

meerkat and snake both have persistent "CVS-Unknown" failures in some
but not all branches. I can't see any evidence of an actual failure
in their logs though. What I do see is "?" entries about files that
shouldn't be there --- for instance, meerkat apparently needs a "make
distclean". If that's what's causing the failure report, could we
get the buildfarm to show a more useful status message? I'd always
assumed that "CVS-Unknown" suggested a transient problem such as
connection loss, and there wasn't any need for human intervention.

A more radical answer is to have the script go ahead and delete the
offending files itself, but I can see where that might not have good
fail-soft behavior ...

regards, tom lane

#2 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#1)
Re: "CVS-Unknown" buildfarm failures?

Tom Lane wrote:

meerkat and snake both have persistent "CVS-Unknown" failures in some
but not all branches. I can't see any evidence of an actual failure
in their logs though. What I do see is "?" entries about files that
shouldn't be there --- for instance, meerkat apparently needs a "make
distclean". If that's what's causing the failure report, could we
get the buildfarm to show a more useful status message? I'd always
assumed that "CVS-Unknown" suggested a transient problem such as
connection loss, and there wasn't any need for human intervention.

A more radical answer is to have the script go ahead and delete the
offending files itself, but I can see where that might not have good
fail-soft behavior ...

I have manually run a distclean on meerkat for 8_0 and 8_1 and am
rerunning the builds now.

Joshua D. Drake


--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#3 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: 'CVS-Unknown' buildfarm failures?

Tom Lane said:

meerkat and snake both have persistent "CVS-Unknown" failures in some
but not all branches. I can't see any evidence of an actual failure in
their logs though. What I do see is "?" entries about files that
shouldn't be there --- for instance, meerkat apparently needs a "make
distclean". If that's what's causing the failure report, could we get
the buildfarm to show a more useful status message? I'd always assumed
that "CVS-Unknown" suggested a transient problem such as
connection loss, and there wasn't any need for human intervention.

A more radical answer is to have the script go ahead and delete the
offending files itself, but I can see where that might not have good
fail-soft behavior ...

cvs-unknown means there are unknown files in the repo:

    # grep in scalar context counts the cvs output lines that start with "?"
    my $unknown_files = grep { /^\?/ } @cvslog;
    ...
    send_result('CVS-Unknown', $unknown_files, \@cvslog)
        if ($unknown_files);

This is almost always a case of operator error. buildfarm only ever builds
in a copy of the repo, not in the permanent repo itself, so there should
NEVER be any file there which does not come from CVS. I have repeatedly
advised buildfarm member owners not to build by hand in the buildfarm repos.
Not everybody listens, apparently.

All this is intended to ensure that we are actually working on a faithful
reflection of the postgresql.org repo, and not something that has been
mangled somehow.

I can call it "CVS-Unknown-Files" if that will make it clearer.

cheers

andrew
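
Editorial sketch: the check described above amounts to collecting the lines of cvs output that begin with "?". The buildfarm client itself is Perl; the Python below is only an illustration of the idea, and the function name `unknown_entries` and the sample log lines are invented for the example, not taken from a real buildfarm run.

```python
def unknown_entries(cvs_log):
    """Return the paths from cvs output lines that start with '?'."""
    paths = []
    for line in cvs_log:
        if line.startswith("?"):
            # A '?' line looks like "? path"; keep only the path part.
            paths.append(line[1:].strip())
    return paths


# Hypothetical cvs update output, not taken from a real buildfarm log.
sample_log = [
    "U src/backend/parser/gram.y",
    "? src/bin/pg_config/pg_config.o",
    "P src/include/pg_config.h.in",
    "? config.log",
]

extraneous = unknown_entries(sample_log)
if extraneous:
    # Any '?' entry means the checkout is no longer a faithful copy.
    print("CVS-Unknown:", len(extraneous), "extraneous entries")
```

Returning the paths rather than just a count makes it possible to name the offending files in the status report.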

#4 Andrew Dunstan
andrew@dunslane.net
In reply to: Joshua D. Drake (#2)
Re: 'CVS-Unknown' buildfarm failures?

Joshua D. Drake said:

Tom Lane wrote:

A more radical answer is to have the script go ahead and delete the
offending files itself, but I can see where that might not have good
fail-soft behavior ...

I have manually run a distclean on meerkat for 8_0 and 8_1 and am
rerunning the builds now.

If that doesn't work, the correct method of recovery is to remove the repo
copy altogether and let the buildfarm script get a completely fresh checkout:

rm -rf <buildroot>/<branch>/pgsql

would do the trick.

cheers

andrew

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: 'CVS-Unknown' buildfarm failures?

"Andrew Dunstan" <andrew@dunslane.net> writes:

Tom Lane said:

meerkat and snake both have persistent "CVS-Unknown" failures in some
but not all branches. I can't see any evidence of an actual failure in
their logs though.

cvs-unknown means there are unknown files in the repo:

Oh. Well, it needs to be renamed then ;-). Per our message style guidelines,
calling an error "unknown" is seldom a good idea.

I can call it "CVS-Unknown-Files" if that will make it clearer.

Maybe CVS-Extraneous-Files?

regards, tom lane

#6 Dave Page
dpage@vale-housing.co.uk
In reply to: Tom Lane (#5)
Re: 'CVS-Unknown' buildfarm failures?

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
Andrew Dunstan
Sent: 02 June 2006 03:31
To: tgl@sss.pgh.pa.us
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] 'CVS-Unknown' buildfarm failures?

cvs-unknown means there are unknown files in the repo:

my $unknown_files = grep {/^\?/ } @cvslog;
...
send_result('CVS-Unknown',$unknown_files,\@cvslog)
if ($unknown_files);

This is almost always a case of operator error. buildfarm only ever builds
in a copy of the repo, not in the permanent repo itself, so there should
NEVER be any file there which does not come from CVS. I have repeatedly
advised buildfarm member owners not to build by hand in the buildfarm repos.
Not everybody listens, apparently.

The owner of snake can guarantee that that is not the case - that box is
not used for *anything* other than the buildfarm and hasn't even been
logged into for weeks, if not months.

The failures come and go, so I have to suspect something other than
operator error.

Regards, Dave

#7 Andrew Dunstan
andrew@dunslane.net
In reply to: Dave Page (#6)
Re: 'CVS-Unknown' buildfarm failures?

Dave Page said:

I have repeatedly advised buildfarm member owners not to build by hand
in the buildfarm repos. Not everybody listens, apparently.

The owner of snake can guarantee that that is not the case - that box
is not used for *anything* other than the buildfarm and hasn't even
been logged into for weeks, if not months.

The failures come and go, so I have to suspect something other than
operator error.

That's why I said "almost always" :-)

I strongly suspect that snake is hitting the "file/directory doesn't
disappear immediately when you unlink/rmdir" problem on Windows that we have
had to code around inside Postgres. It looks like cvs is trying to prune an
empty directory but isn't fast enough.

I assume that snake just uses the Msys DTK's cvs? If so, I think we'll just
have to live with this - it's not very frequent - snake's last occurrence on
HEAD was 62 days ago.

cheers

andrew

#8 Dave Page
dpage@vale-housing.co.uk
In reply to: Andrew Dunstan (#7)
Re: 'CVS-Unknown' buildfarm failures?

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: 02 June 2006 12:18
To: Dave Page
Cc: tgl@sss.pgh.pa.us; pgsql-hackers@postgresql.org
Subject: RE: [HACKERS] 'CVS-Unknown' buildfarm failures?

That's why I said "almost always" :-)

:-)

I strongly suspect that snake is hitting the "file/directory doesn't
disappear immediately when you unlink/rmdir" problem on Windows that we have
had to code around inside Postgres. It looks like cvs is trying to prune an
empty directory but isn't fast enough.

Sounds feasible.

I assume that snake just uses the Msys DTK's cvs?

Yes.

If so, I think we'll just have to live with this - it's not very
frequent - snake's last occurrence on HEAD was 62 days ago.

Strange though, as I'd expect to see more of the problem on HEAD than
the stable branches.

Regards, Dave.

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#7)
Re: 'CVS-Unknown' buildfarm failures?

"Andrew Dunstan" <andrew@dunslane.net> writes:

I strongly suspect that snake is hitting the "file/directory doesn't
disappear immediately when you unlink/rmdir" problem on Windows that we have
had to code around inside Postgres. It looks like cvs is trying to prune an
empty directory but isn't fast enough.

Maybe "sleep 2" or so between "make distclean" and "cvs update" in the
script would help?

regards, tom lane

#10 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#9)
Re: 'CVS-Unknown' buildfarm failures?

Tom Lane wrote:

"Andrew Dunstan" <andrew@dunslane.net> writes:

I strongly suspect that snake is hitting the "file/directory doesn't
disappear immediately when you unlink/rmdir" problem on Windows that we have
had to code around inside Postgres. It looks like cvs is trying to prune an
empty directory but isn't fast enough.

Maybe "sleep 2" or so between "make distclean" and "cvs update" in the
script would help?

buildfarm never does make distclean. It operates on a copy which it
removes at the end of the run.

What's happening here is that cvs actually creates the directory and
then later prunes it when it finds it is empty. Run strace on "cvs
update" and then look for pg-config, or examine src/bin/CVS/Entries.Log
and you should see what's going on. So we'd have to put the sleep inside
cvs ...

cheers

andrew

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#10)
Re: 'CVS-Unknown' buildfarm failures?

Andrew Dunstan <andrew@dunslane.net> writes:

What's happening here is that cvs actually creates the directory and
then later prunes it when it finds it is empty.

I find that explanation pretty unconvincing. Why would cvs print a "?"
for such a directory?

regards, tom lane

#12 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#11)
Re: 'CVS-Unknown' buildfarm failures?

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

What's happening here is that cvs actually creates the directory and
then later prunes it when it finds it is empty.

I find that explanation pretty unconvincing. Why would cvs print a "?"
for such a directory?

Another possibility is that the directory is an artifact left from a
previous run of cvs update in which the rmdir failed, and the present
run prints out the "?" line and subsequently prunes the directory as we
have told it to do with the -P flag.

I don't have time to dig deeply into the CVS sources to debug the
problem comprehensively.

cheers

andrew

#13 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#11)
Re: 'CVS-Unknown' buildfarm failures?

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

What's happening here is that cvs actually creates the directory and
then later prunes it when it finds it is empty.

I find that explanation pretty unconvincing. Why would cvs print a "?"
for such a directory?

cvs will print a ? if it doesn't know what it is... or is that svn?

Joshua D. Drake


#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#13)
Re: 'CVS-Unknown' buildfarm failures?

"Joshua D. Drake" <jd@commandprompt.com> writes:

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

What's happening here is that cvs actually creates the directory and
then later prunes it when it finds it is empty.

I find that explanation pretty unconvincing. Why would cvs print a "?"
for such a directory?

cvs will print a ? if it doesn't know what it is... or is that svn?

But cvs certainly knows "what it is" if it's a subdirectory subject to
creation and pruning; that means the subdirectory exists in the
repository.

I doubt that cvs would complain about a pre-existing subdirectory of
this type either, because that would result in an unreasonable amount of
chatter when adding or removing the -P option.

regards, tom lane

#15 Andrew Dunstan
andrew@dunslane.net
In reply to: Joshua D. Drake (#13)
Re: 'CVS-Unknown' buildfarm failures?

Joshua D. Drake wrote:

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

What's happening here is that cvs actually creates the directory and
then later prunes it when it finds it is empty.

I find that explanation pretty unconvincing. Why would cvs print a "?"
for such a directory?

cvs will print a ? if it doesn't know what it is... or is that svn?

yes, it's a file/directory it doesn't know about.

At one stage I suppressed these checks, but I found that too many times
we saw errors due to unclean repos. So now buildfarm insists on having a
clean repo.

I suppose I could provide a switch to turn it off ... in one recent case
the repo was genuinely not clean, though, so I am not terribly keen on
that approach - but I am open to persuasion.

cheers

andrew

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#15)
Re: 'CVS-Unknown' buildfarm failures?

Andrew Dunstan <andrew@dunslane.net> writes:

I suppose I could provide a switch to turn it off ... in one recent case
the repo was genuinely not clean, though, so I am not terribly keen on
that approach - but I am open to persuasion.

No, I agree it's a good check. Just wondering if we can reduce the
number of false positives. The recent meerkat failures, for instance,
were *not* false positives.

Looking at the snake failures of this type on HEAD, I do see that the
complaints are all about subdirectories that should have been pruned,
which makes Andrew's theory seem plausible. Maybe we should file this
behavior as a cvs bug.

Sudden thought: is there any particularly good reason to use the cvs
update -P switch in buildfarm repositories? If we simply eliminated
the create/prune thrashing for these directories, it'd fix the problem,
if Andrew's idea is correct. Probably save a few cycles too. And since
people are really not supposed to be using these checkouts for anything
else, they don't need to be pretty.

regards, tom lane

#17 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#16)
Re: 'CVS-Unknown' buildfarm failures?

Tom Lane wrote:

Sudden thought: is there any particularly good reason to use the cvs
update -P switch in buildfarm repositories? If we simply eliminated
the create/prune thrashing for these directories, it'd fix the problem,
if Andrew's idea is correct. Probably save a few cycles too. And since
people are really not supposed to be using these checkouts for anything
else, they don't need to be pretty.

Good point. I'll do that, since it's pretty close to cost-free.

There will be a new release of buildfarm client code, with this and the
error name change, in the next day or so.

cheers

andrew

#18 Jim Nasby
jnasby@pervasive.com
In reply to: Andrew Dunstan (#15)
Re: 'CVS-Unknown' buildfarm failures?

On Jun 2, 2006, at 10:27 AM, Andrew Dunstan wrote:

Joshua D. Drake wrote:

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

What's happening here is that cvs actually creates the directory
and then later prunes it when it finds it is empty.

I find that explanation pretty unconvincing. Why would cvs print a "?"
for such a directory?

cvs will print a ? if it doesn't know what it is... or is that svn?

yes, it's a file/directory it doesn't know about.

At one stage I suppressed these checks, but I found that too many
times we saw errors due to unclean repos. So now buildfarm insists
on having a clean repo.

I suppose I could provide a switch to turn it off ... in one recent
case the repo was genuinely not clean, though, so I am not terribly
keen on that approach - but I am open to persuasion.

Another option would be to re-run cvs up one more time if we get any
unexpected files. It sounds like that would fix this issue on windows
machines, while still ensuring we had a clean repo to work from.

#19 Andrew Dunstan
andrew@dunslane.net
In reply to: Jim Nasby (#18)
Re: 'CVS-Unknown' buildfarm failures?

Jim Nasby wrote:

yes, it's a file/directory it doesn't know about.

At one stage I suppressed these checks, but I found that too many
times we saw errors due to unclean repos. So now buildfarm insists
on having a clean repo.

I suppose I could provide a switch to turn it off ... in one recent
case the repo was genuinely not clean, though, so I am not terribly
keen on that approach - but I am open to persuasion.

Another option would be to re-run cvs up one more time if we get any
unexpected files. It sounds like that would fix this issue on windows
machines, while still ensuring we had a clean repo to work from.

please see the new release of the buildfarm client, in which I have
followed Tom's suggestion of removing the -P flag from the checkout and
update commands - that should solve the Windows problem, as it will no
longer try to remove the directory. I hope that solves the problem - if
not I'll have a look at other solutions.

cheers

andrew

#20 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#19)
Re: 'CVS-Unknown' buildfarm failures?

I said:

Another option would be to re-run cvs up one more time if we get any
unexpected files. It sounds like that would fix this issue on windows
machines, while still ensuring we had a clean repo to work from.

please see the new release of the buildfarm client, in which I have
followed Tom's suggestion of removing the -P flag from the checkout and
update commands - that should solve the Windows problem, as it will no
longer try to remove the directory. I hope that solves the problem -
if not I'll have a look at other solutions.

Unfortunately, this fell over first time out:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2006-06-04%2012:09:33
The fix handled directories, but we got a false positive from a rename not
being immediate either, it seems. Bloody Windows!

One thought I had was to force Windows to use CVS export rather than update.
This has 2 disadvantages: it requires a complete repo fetch every run, even
if we don't need to do anything because nothing has changed, and it also
means we can't report the version numbers on files changed. Example:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2006-06-04%2012:21:43
So what I'm going to try instead is a variation on Jim's suggestion above,
but instead of re-running cvs update, what we'll do is a longish sleep (say
10 or 20 secs) which should be enough time for Windows to get its act
together, and then run cvs status, which will also show us extraneous files.

thoughts?

cheers

andrew

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#20)
Re: 'CVS-Unknown' buildfarm failures?

"Andrew Dunstan" <andrew@dunslane.net> writes:

Another option would be to re-run cvs up one more time if we get any
unexpected files. It sounds like that would fix this issue on windows
machines, while still ensuring we had a clean repo to work from.

So what I'm going to try instead is a variation on Jim's suggestion above,
but instead of re-running cvs update, what we'll do is a longish sleep (say
10 or 20 secs) which should be enough time for Windows to get its act
together, and then run cvs status, which will also show us extraneous files.

Yeah, this is probably OK since you only need to do it if you see any ?
entries in the cvs update. Another low-tech solution is to sleep a bit
and then see if any of the files/directories listed in ? entries are
still there.

regards, tom lane
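
Editorial sketch: the low-tech recheck suggested above (wait a bit, then see whether the paths flagged with "?" are still on disk) could look roughly like this. It is a Python illustration rather than buildfarm code; the function name `still_present`, the default delay, and the file names in the demonstration are all assumptions made for the example.

```python
import os
import tempfile
import time


def still_present(paths, root, delay_seconds=10.0):
    """Wait, then return only those flagged paths that still exist under root."""
    time.sleep(delay_seconds)
    # lexists also reports dangling symlinks, which would still be stray entries.
    return [p for p in paths if os.path.lexists(os.path.join(root, p))]


# Demonstration in a scratch directory, with a zero delay to keep it quick.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "leftover.o"), "w").close()
    flagged = ["leftover.o", "already_pruned_dir"]
    print(still_present(flagged, root, delay_seconds=0))
```

Only entries that survive the delay would be reported, so a directory that was merely slow to disappear would no longer count as a failure.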

#22 Magnus Hagander
mha@sollentuna.net
In reply to: Tom Lane (#21)
Re: 'CVS-Unknown' buildfarm failures?

Unfortunately, this fell over first time out:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2006-06-04%2012:09:33

The fix handled directories, but we got a false positive from
a rename not being immediate either, it seems. Bloody Windows!

Are you running this from msys or from "actual windows"? I haven't
observed this outside msys, in which case it might be an idea to execute
it elsewhere, but I don't think I've done "things like it" enough to be
sure that makes a difference.

//Magnus

#23 Greg Stark
gsstark@mit.edu
In reply to: Andrew Dunstan (#20)
Re: 'CVS-Unknown' buildfarm failures?

"Andrew Dunstan" <andrew@dunslane.net> writes:

One thought I had was to force Windows to use CVS export rather than update.
This has 2 disadvantages: it requires a complete repo fetch every run, even
if we don't need to do anything because nothing has changed, and it also
means we can't report the version numbers on files changed.

You could also just have the windows machine rsync the directory from one of
the other build machines?

--
greg

#24 Andrew Dunstan
andrew@dunslane.net
In reply to: Greg Stark (#23)
Re: 'CVS-Unknown' buildfarm failures?

Greg Stark wrote:

"Andrew Dunstan" <andrew@dunslane.net> writes:

One thought I had was to force Windows to use CVS export rather than update.
This has 2 disadvantages: it requires a complete repo fetch every run, even
if we don't need to do anything because nothing has changed, and it also
means we can't report the version numbers on files changed.

You could also just have the windows machine rsync the directory from one of
the other build machines?

The farm is distributed - none of the members have any knowledge of the
others. And it is a design requirement that no inbound access is
required for buildfarm members, and that no tools are required other
than those that are required to build postgres.

Anyway, I think we have it covered now.

cheers

andrew

#25 Jim Nasby
jnasby@pervasive.com
In reply to: Andrew Dunstan (#20)
Re: 'CVS-Unknown' buildfarm failures?

On Jun 4, 2006, at 8:18 AM, Andrew Dunstan wrote:

I said:

Another option would be to re-run cvs up one more time if we get any
unexpected files. It sounds like that would fix this issue on windows
machines, while still ensuring we had a clean repo to work from.

please see the new release of the buildfarm client, in which I have
followed Tom's suggestion of removing the -P flag from the checkout and
update commands - that should solve the Windows problem, as it will no
longer try to remove the directory. I hope that solves the problem -
if not I'll have a look at other solutions.

Unfortunately, this fell over first time out:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2006-06-04%2012:09:33
The fix handled directories, but we got a false positive from a rename not
being immediate either, it seems. Bloody Windows!

One thought I had was to force Windows to use CVS export rather than update.
This has 2 disadvantages: it requires a complete repo fetch every run, even
if we don't need to do anything because nothing has changed, and it also
means we can't report the version numbers on files changed. Example:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2006-06-04%2012:21:43
So what I'm going to try instead is a variation on Jim's suggestion above,
but instead of re-running cvs update, what we'll do is a longish sleep (say
10 or 20 secs) which should be enough time for Windows to get its act
together, and then run cvs status, which will also show us extraneous files.

What about my suggestion of running CVS a second time if we get
extraneous files the first go-round? I'm guessing there'd have to be
a sleep in there as well...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#26 Andrew Dunstan
andrew@dunslane.net
In reply to: Jim Nasby (#25)
Re: 'CVS-Unknown' buildfarm failures?

Jim Nasby wrote:

What about my suggestion of running CVS a second time if we get
extraneous files the first go-round? I'm guessing there'd have to be a
sleep in there as well...

The trouble with running "cvs update" a second time is that it will be
just as liable to fail as the first run. So I am following your
suggestion, but with this modification: after a sleep we will run "cvs
status" which will not have the same issues, because it doesn't create
or delete anything, and will show us any extraneous files/directories
that might be present.

cheers

andrew
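
Editorial sketch: the approach settled on in this thread (if "cvs update" reports "?" entries, sleep, then run "cvs status" and trust only what it still reports) might be outlined as below. This is a Python illustration, not the buildfarm client's Perl. The function names, the 20 second default, the injectable `status_runner` and `sleep` parameters, and the stand-in status lines are assumptions made for the example; only the "cvs status" command itself comes from the thread.

```python
import subprocess
import time


def run_cvs_status(workdir):
    """Run 'cvs status' in workdir and return its output as a list of lines."""
    result = subprocess.run(
        ["cvs", "status"],
        cwd=workdir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.splitlines()


def confirm_extraneous(update_log, workdir, delay_seconds=20.0,
                       status_runner=run_cvs_status, sleep=time.sleep):
    """Recheck with 'cvs status' only if the update log had '?' lines."""
    if not any(line.startswith("?") for line in update_log):
        return []  # clean update, nothing to recheck
    sleep(delay_seconds)  # give the filesystem time to settle
    status_log = status_runner(workdir)
    return [line[1:].strip() for line in status_log if line.startswith("?")]


# Stand-in runner so the sketch can be tried without a CVS checkout;
# its output lines are hypothetical.
def fake_status(workdir):
    return ["? src/stray.o", "File: Makefile  Status: Up-to-date"]


print(confirm_extraneous(["? src/stray.o", "? src/bin/old_dir"], ".",
                         status_runner=fake_status, sleep=lambda s: None))
```

Because "cvs status" does not create or delete anything, the second pass should not reintroduce the race that produced the false positive, which is the reasoning given above for preferring it to a second "cvs update".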