[Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Started by Dave Page over 18 years ago, 13 messages
#1 Dave Page
dpage@postgresql.org

I've been seeing this failure intermittently on Narwhal HEAD, and once
on 8.1. Other branches have been OK, as have other animals running on
the same physical box. Narwhal-HEAD is run more often than any other
builds however.

Anyone have any idea what might be wrong? It seems unlikely to be a
hardware issue given that it's the exact same test failures each time.

Regards, Dave.

-------- Original Message --------
Subject: PGBuildfarm member narwhal Branch HEAD Status changed from OK
to InstallCheck failure
Date: Fri, 20 Apr 2007 13:46:22 -0700 (PDT)
From: PG Build Farm <pgbuildfarm-web@hosting-two.commandprompt.com>
To: pgbuildfarm-status-chngs@pgfoundry.org,
pgbuildfarm-status-green@pgfoundry.org

The PGBuildfarm member narwhal had the following event on branch HEAD:

Status changed from OK to InstallCheck failure

The snapshot timestamp for the build that triggered this notification
is: 2007-04-20 20:00:01

The specs of this machine are:
OS: Windows Server 2003 R2 / 5.2.3790
Arch: i686
Comp: GCC / 3.4.2 (mingw-special)

For more information, see
http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=narwhal&br=HEAD

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#1)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Dave Page <dpage@postgresql.org> writes:

> I've been seeing this failure intermittently on Narwhal HEAD, and once
> on 8.1. Other branches have been OK, as have other animals running on
> the same physical box. Narwhal-HEAD is run more often than any other
> builds however.

> Anyone have any idea what might be wrong? It seems unlikely to be a
> hardware issue given that it's the exact same test failures each time.

Yeah, I'd been wondering about that too, but have no clue what's up.
It seems particularly odd that all the failures are in installcheck
not check.

If you want to poke at it, I'd suggest changing the ERROR to PANIC
(it's in bufmgr.c) to cause a core dump, run installchecks till you
get a panic, and then look around in the dump to see what you can find.
It'd be particularly interesting to see what the buffer actually
contains. Also you could look at the corresponding page of the disk
file (which in theory should be the same as the buffer contents,
since this error check is only made just after a read() ...)

regards, tom lane

#3 Dave Page
dpage@postgresql.org
In reply to: Tom Lane (#2)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Tom Lane wrote:

> Dave Page <dpage@postgresql.org> writes:

>> I've been seeing this failure intermittently on Narwhal HEAD, and once
>> on 8.1. Other branches have been OK, as have other animals running on
>> the same physical box. Narwhal-HEAD is run more often than any other
>> builds however.

>> Anyone have any idea what might be wrong? It seems unlikely to be a
>> hardware issue given that it's the exact same test failures each time.

> Yeah, I'd been wondering about that too, but have no clue what's up.
> It seems particularly odd that all the failures are in installcheck
> not check.

> If you want to poke at it, I'd suggest changing the ERROR to PANIC
> (it's in bufmgr.c) to cause a core dump, run installchecks till you
> get a panic, and then look around in the dump to see what you can find.
> It'd be particularly interesting to see what the buffer actually
> contains. Also you could look at the corresponding page of the disk
> file (which in theory should be the same as the buffer contents,
> since this error check is only made just after a read() ...)

Hmm, I'll give it a go when I'm back in the office, but bear in mind
this is a Mingw build on which debugging is nigh-on impossible.

Regards, Dave.

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#3)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Dave Page <dpage@postgresql.org> writes:

> Tom Lane wrote:

>> If you want to poke at it, I'd suggest changing the ERROR to PANIC
>> (it's in bufmgr.c) to cause a core dump, run installchecks till you
>> get a panic, and then look around in the dump to see what you can find.
>> It'd be particularly interesting to see what the buffer actually
>> contains. Also you could look at the corresponding page of the disk
>> file (which in theory should be the same as the buffer contents,
>> since this error check is only made just after a read() ...)

> Hmm, I'll give it a go when I'm back in the office, but bear in mind
> this is a Mingw build on which debugging is nigh-on impossible.

I was afraid of that. Well, at least get a dump of page 104 in that
index so we can see what's on-disk.

regards, tom lane

#5 Dave Page
dpage@postgresql.org
In reply to: Tom Lane (#4)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Tom Lane wrote:

> Dave Page <dpage@postgresql.org> writes:

>> Tom Lane wrote:

>>> If you want to poke at it, I'd suggest changing the ERROR to PANIC
>>> (it's in bufmgr.c) to cause a core dump, run installchecks till you
>>> get a panic, and then look around in the dump to see what you can find.
>>> It'd be particularly interesting to see what the buffer actually
>>> contains. Also you could look at the corresponding page of the disk
>>> file (which in theory should be the same as the buffer contents,
>>> since this error check is only made just after a read() ...)

>> Hmm, I'll give it a go when I'm back in the office, but bear in mind
>> this is a Mingw build on which debugging is nigh-on impossible.

> I was afraid of that. Well, at least get a dump of page 104 in that
> index so we can see what's on-disk.

Sure - I'll have to try with 8.1/8.2 unless you have a pg_filedump
that'll work with -HEAD?

/D

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#5)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Dave Page <dpage@postgresql.org> writes:

> Tom Lane wrote:

>> I was afraid of that. Well, at least get a dump of page 104 in that
>> index so we can see what's on-disk.

> Sure - I'll have to try with 8.1/8.2 unless you have a pg_filedump
> that'll work with -HEAD?

No, I don't, but a plain hex/ascii dump is probably the best thing
anyway, since we know the page header is wrong. So use any old
version of pg_filedump with -d switch.

regards, tom lane
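
[Editorial sketch, not part of the original message.] The plain hex/ASCII dump suggested above can also be produced without pg_filedump. The following is a minimal Python sketch under stated assumptions: the build uses PostgreSQL's default 8192-byte block size, and the index's relation file has already been located on disk. The helper names read_block and hexdump are hypothetical, chosen only for illustration.

```python
BLOCK_SIZE = 8192  # assumption: the default PostgreSQL block size (BLCKSZ)


def read_block(path, block_no, block_size=BLOCK_SIZE):
    """Return the raw bytes of one block from a relation file.

    Returns an empty bytes object if the block lies past the end of the file.
    """
    with open(path, "rb") as f:
        f.seek(block_no * block_size)
        return f.read(block_size)


def hexdump(data, base_offset=0, width=16):
    """Format bytes as offset, hex, and printable-ASCII columns."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the ASCII column.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(
            f"{base_offset + i:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}"
        )
    return "\n".join(lines)
```

To look at the block discussed in this thread, one would call something like print(hexdump(read_block(path_to_index_file, 104), base_offset=104 * BLOCK_SIZE)), where path_to_index_file is a placeholder for the actual relation file. A page that dumps as all zeros or as obviously foreign data would be the interesting case.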

#7 Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Dave Page (#3)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

> Hmm, I'll give it a go when I'm back in the office, but bear
> in mind this is a Mingw build on which debugging is nigh-on
> impossible.

I use the Snapshot
http://prdownloads.sf.net/mingw/gdb-6.3-2.exe?download from sf.net.
It has some issues, but it is definitely useable.

Andreas

#8 Dave Page
dpage@postgresql.org
In reply to: Zeugswetter Andreas ADI SD (#7)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Zeugswetter Andreas ADI SD wrote:

>> Hmm, I'll give it a go when I'm back in the office, but bear
>> in mind this is a Mingw build on which debugging is nigh-on
>> impossible.

> I use the Snapshot
> http://prdownloads.sf.net/mingw/gdb-6.3-2.exe?download from sf.net.
> It has some issues, but it is definitely useable.

I'll give it a go - thanks.

Regards, Dave.

#9 Dave Page
dpage@postgresql.org
In reply to: Dave Page (#3)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Dave Page wrote:

>> If you want to poke at it, I'd suggest changing the ERROR to PANIC
>> (it's in bufmgr.c) to cause a core dump, run installchecks till you
>> get a panic, and then look around in the dump to see what you can find.

Well, in typical fashion after 25+ runs this morning there's not a
failure in sight :-(. I'll keep trying this afternoon, but in case that
doesn't work, I've tweaked my buildfarm config to leave error trees in
place so maybe we can catch it that way (though that'll be without the
PANIC of course).

Regards, Dave
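
[Editorial sketch, not part of the original message.] Re-running a test suite until an intermittent failure shows up is easy to automate. The following is a minimal Python sketch, assuming the suite is started by a single command whose exit status is non-zero on failure. The function name run_until_failure and the max_runs limit are hypothetical, and the exact command and working directory depend on the local build setup.

```python
import subprocess


def run_until_failure(cmd, max_runs=100, cwd=None):
    """Run cmd repeatedly until it fails.

    Returns the 1-based number of the first run that exited with a
    non-zero status, or None if all max_runs runs succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            # Stop immediately so the failed tree is left untouched
            # for inspection.
            return attempt
    return None
```

A call such as run_until_failure(["make", "installcheck"], cwd=build_dir), with build_dir standing in for the local build tree, would stop at the first failing run so that the data directory can be examined before anything overwrites it.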

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#1)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Dave Page <dpage@postgresql.org> writes:

> I've been seeing this failure intermittently on Narwhal HEAD, and once
> on 8.1. Other branches have been OK, as have other animals running on
> the same physical box. Narwhal-HEAD is run more often than any other
> builds however.

Oh, this is interesting:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=baiji&dt=2007-04-26%2022:00:02

Different compiler, different OS, not quite the same block number (109,
whereas IIRC all the previous examples have complained of block 104).
Is this the same physical machine as narwhal?

regards, tom lane

#11 Dave Page
dpage@postgresql.org
In reply to: Tom Lane (#10)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Tom Lane wrote:

> Dave Page <dpage@postgresql.org> writes:

>> I've been seeing this failure intermittently on Narwhal HEAD, and once
>> on 8.1. Other branches have been OK, as have other animals running on
>> the same physical box. Narwhal-HEAD is run more often than any other
>> builds however.

> Oh, this is interesting:

> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=baiji&dt=2007-04-26%2022:00:02

> Different compiler, different OS, not quite the same block number (109,
> whereas IIRC all the previous examples have complained of block 104).
> Is this the same physical machine as narwhal?

Yes, it is. It's an FC6 box running VMware Server, with a Win 2k3r2 VM
and a Vista ultimate VM, both with mingw and msvc animals.

I'm still not convinced it's a hardware problem - aside from the fact
that it's the same error every time (although, I note in this case it
was in check, not installcheck), I would expect at least one of SMART,
FC6, VMware or 2k3/Vista to spot that there was a problem. I have also
recreated the virtual disks of both VMs since this started happening. I
wonder if we're hitting some odd bug in VMware.

Anyhoo, unfortunately Baiji wasn't set to keep error builds - I've
changed that now and will run it a few times again. I'll also run a
sector level check of Narwhal's virtual disk and see if that complains.

Regards, Dave.

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#11)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Dave Page <dpage@postgresql.org> writes:

> Tom Lane wrote:

>> Is this the same physical machine as narwhal?

> Yes, it is. It's an FC6 box running VMWare server, with a Win 2k3r2 VM
> and a Vista ultimate VM, both with mingw and msvc animals.

> I'm still not convinced it's a hardware problem - aside from the fact
> that it's the same error every time (although, I note in this case it
> was in check, not installcheck), I would expect at least one of SMART,
> FC6, VMware or 2k3/Vista to spot that there was a problem. I have also
> recreated the virtual disks of both VMs since this started happening. I
> wonder if we're hitting some odd bug in VMware.

I concur it's too regular to be a hardware issue. The VMware idea is
a bit plausible though. If that's it, we ought to see failures of this
ilk on all four animals sooner or later ...

regards, tom lane

#13 Dave Page
dpage@postgresql.org
In reply to: Tom Lane (#12)
Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

Tom Lane wrote:

> I concur it's too regular to be a hardware issue. The VMware idea is
> a bit plausible though. If that's it, we ought to see failures of this
> ilk on all four animals sooner or later ...

I've run full disk scans in both Windows VMs, and forced an fsck of the
host just to be on the safe side and nothing showed up. By chance it
seems that VMware released an update just yesterday so I've upgraded
everything. Hopefully the problem will go away now, but I'm not holding
my breath!

If not, one other option would be to roll back a couple of versions of
VMware - an older version hosted Bandicoot for some time with no problems.

Regards, Dave