What is happening on buildfarm member baiji?
The last two runs on baiji have failed at the installcheck stage,
with symptoms that look a heck of a lot like the most recent system
catalog changes haven't taken effect (eg, it doesn't seem to know
about pg_type.typarray). Given that the previous "check" step
passed, the most likely explanation seems to be that some part
of the "install" step failed --- I've not tried to reproduce the
behavior but it looks like it might be explained if the install
target's postgres.bki file was not getting overwritten. So we
have two issues: what exactly is going wrong (some new form of
Vista brain death no doubt), and why isn't the buildfarm script
noticing?
regards, tom lane
Tom Lane wrote:
The last two runs on baiji have failed at the installcheck stage,
with symptoms that look a heck of a lot like the most recent system
catalog changes haven't taken effect (eg, it doesn't seem to know
about pg_type.typarray). Given that the previous "check" step
passed, the most likely explanation seems to be that some part
of the "install" step failed --- I've not tried to reproduce the
behavior but it looks like it might be explained if the install
target's postgres.bki file was not getting overwritten. So we
have two issues: what exactly is going wrong (some new form of
Vista brain death no doubt), and why isn't the buildfarm script
noticing?
The script will not even run if the install directory exists:
die "$buildroot/$branch has $pgsql or inst directories!"
if ((!$from_source && -d $pgsql) || -d "inst");
But the install process is different for MSVC. It could be that we are
screwing up there.
I no longer have an MSVC box, so I can't tell so easily ;-(
cheers
andrew
Andrew Dunstan wrote:
The script will not even run if the install directory exists:
die "$buildroot/$branch has $pgsql or inst directories!"
if ((!$from_source && -d $pgsql) || -d "inst");
But the install process is different for MSVC. It could be that we are
screwing up there.
Uh, but that piece of code you're referring to is from the buildfarm
code, right? Isn't it the same?
I no longer have an MSVC box, so I can't tell so easily ;-(
Non-Vista MSVC boxes seem to pass fine (mastodon and skylark, for
example - skylark fails on something completely different, not fully
investigated yet, but looks to be a buildfarm problem rather than a
backend one), so I don't think it's the MSVC procedure alone that's the
cause of it.
//Magnus
Magnus Hagander wrote:
Uh, but that piece of code you're referring to is from the buildfarm
code, right? Isn't it the same?
Yes, but it might be that the MSVC install doesn't actually use that
location properly. Unfortunately, its logging is less than verbose, unlike
the standard install procedure.
I no longer have an MSVC box, so I can't tell so easily ;-(
Non-Vista MSVC boxes seem to pass fine (mastodon and skylark, for
example - skylark fails on something completely different, not fully
investigated yet, but looks to be a buildfarm problem rather than a
backend one), so I don't think it's the MSVC procedure alone that's the
cause of it.
Possibly. My point was that I can't even investigate how MSVC is working
at all.
cheers
andrew
Magnus Hagander wrote:
My point was that I can't even investigate how MSVC is working
at all.
So what is it you're looking for, specifically, to help with that?
As a very bare minimum, we need to change the installation procedure to
log its destination.
Unless that has somehow got screwed up I can't see how Tom's theory of a
possibly left over .bki file can stand up.
cheers
andrew
Andrew Dunstan wrote:
Possibly. My point was that I can't even investigate how MSVC is working
at all.
So what is it you're looking for, specifically, to help with that?
//Magnus
Andrew Dunstan wrote:
As a very bare minimum, we need to change the installation procedure to
log its destination.
Unless that has somehow got screwed up I can't see how Tom's theory of a
possibly left over .bki file can stand up.
Just to be clear, are you looking for something as simple as this?
Index: Install.pm
===================================================================
RCS file: /cvsroot/pgsql/src/tools/msvc/Install.pm,v
retrieving revision 1.14
diff -c -r1.14 Install.pm
*** Install.pm 25 Apr 2007 19:00:05 -0000 1.14
--- Install.pm 13 May 2007 15:21:51 -0000
***************
*** 35,41 ****
          $conf = "release";
      }
      die "Could not find debug or release binaries" if ($conf eq "");
!     print "Installing for $conf\n";
      EnsureDirectories($target,
          'bin','lib','share','share/timezonesets','share/contrib','doc',
          'doc/contrib', 'symbols');
--- 35,41 ----
          $conf = "release";
      }
      die "Could not find debug or release binaries" if ($conf eq "");
!     print "Installing for $conf in $target\n";
      EnsureDirectories($target,
          'bin','lib','share','share/timezonesets','share/contrib','doc',
          'doc/contrib', 'symbols');
//Magnus
Magnus Hagander wrote:
! print "Installing for $conf in $target\n";
Looks like a good place to start, sure.
cheers
andrew
Andrew Dunstan wrote:
Magnus Hagander wrote:
! print "Installing for $conf in $target\n";
Looks like a good place to start, sure.
Ok. Applied.
//Magnus
"Andrew Dunstan" <andrew@dunslane.net> writes:
Unless that has somehow got screwed up I can't see how Tom's theory of a
possibly left over .bki file can stand up.
Well, I tried inserting a .bki file from April 30 into a HEAD
installation, and that made it dump core during bootstrap, so that
offhand theory was wrong.
However, when I run the HEAD regression tests against that entire
April 30 installation tree, I can duplicate the baiji regression diffs
almost exactly --- the polymorphism test fails for me where it succeeds
on baiji, which I think indicates that baiji has the patch I applied on
May 1 for SQL function inlining.
So I now state fairly confidently that baiji is failing to overwrite
*any* of the installation tree, /share and /bin both, and instead is
testing an installation dating from sometime between May 1 and May 11.
Have there been any recent changes in either the buildfarm script or
the MSVC install code that might have changed where the install is
supposed to go?
regards, tom lane
Tom Lane wrote:
"Andrew Dunstan" <andrew@dunslane.net> writes:
Unless that has somehow got screwed up I can't see how Tom's theory of a
possibly left over .bki file can stand up.
Well, I tried inserting a .bki file from April 30 into a HEAD
installation, and that made it dump core during bootstrap, so that
offhand theory was wrong.
However, when I run the HEAD regression tests against that entire
April 30 installation tree, I can duplicate the baiji regression diffs
almost exactly --- the polymorphism test fails for me where it succeeds
on baiji, which I think indicates that baiji has the patch I applied on
May 1 for SQL function inlining.
So I now state fairly confidently that baiji is failing to overwrite
*any* of the installation tree, /share and /bin both, and instead is
testing an installation dating from sometime between May 1 and May 11.
Have there been any recent changes in either the buildfarm script or
the MSVC install code that might have changed where the install is
supposed to go?
Not to my knowledge, but I have no method of testing what's going on,
and I hate guessing like this - in fact this is what has worried me all
along about supporting MSVC builds - we always said we didn't want to
have to have 2 build environments, but now we have two and we'll be
supporting them forever, even though one of them is not used by 95% of
our developers. I realise that MSVC builds are likely to perform better,
but we have now got a situation where we are likely to have breakage on
a regular basis, ISTM.
(sorry to grumble - it's been a very frustrating 24 hours)
cheers
andrew
Andrew Dunstan wrote:
Not to my knowledge, but I have no method of testing what's going on,
and I hate guessing like this - in fact this is what has worried me all
along about supporting MSVC builds - we always said we didn't want to
have to have 2 build environments, but now we have two and we'll be
supporting them forever, even though one of them is not used by 95% of
our developers. I realise that MSVC builds are likely to perform better,
but we have now got a situation where we are likely to have breakage on
a regular basis, ISTM.
It's not just that they perform better - we also get a debugger that
actually works well (yes, I know newer gdb's apparently do work on
Mingw; but even a fully functional GDB doesn't come close to VC++), but
more importantly it's looking more and more like it'll be our only way
of producing a 64bit build for Windows.
(sorry to grumble - it's been a very frustrating 24 hours)
:-(
Regards, Dave.
Tom Lane wrote:
So I now state fairly confidently that baiji is failing to overwrite
*any* of the installation tree, /share and /bin both, and instead is
testing an installation dating from sometime between May 1 and May 11.
Close. There was an Msys build from the 9th running on port 5432.
So, it seems there are a couple of issues here:
1) There appears to be no way to specify the default port number in the
MSVC build. The buildfarm passes it to configure for regular builds,
which obviously isn't run in VC++ mode, thus leaving the build on 5432.
2) VC++ and Msys builds will both happily start on the same port at the
same time. The first one to start listens on 5432 until it shuts down,
at which point the second server takes over seamlessly! It doesn't
matter which is started first - it's as if Windows is queuing up the
listens on the port.
Confusingly, similar behaviour is reproducible on XP Pro, except the
connection seems to go to the last server started, instead of the first!
Regards, Dave
Close. There was an Msys build from the 9th running on port 5432.
2) VC++ and Msys builds will both happily start on the same
port at the same time. The first one to start listens on 5432
until it shuts down, at which point the second server takes
over seamlessly! It doesn't matter which is started first -
it's as if Windows is queuing up the listens on the port.
Um, we explicitly set SO_REUSEADDR. So the port will happily allow a
second bind.
http://support.microsoft.com/kb/307175 quote:
"If you use SO_REUSADDR to bind multiple servers to the same port at the
same time, only one random listening socket accepts a connection
request."
Andreas
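The double-bind behaviour discussed above can be probed with a small script. This is only a sketch in Python, not the postmaster's actual socket code: the function name `second_bind_succeeds` is made up for illustration, and it uses an ephemeral loopback port rather than 5432. It binds and listens on one socket, then tries to bind a second socket to the same port, with or without SO_REUSEADDR on both.

```python
import socket


def second_bind_succeeds(use_reuseaddr: bool) -> bool:
    """Bind and listen on a loopback port, then try to bind a second
    socket to the same port. Returns True if the second bind worked."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if use_reuseaddr:
            # Same option the thread says the server sets explicitly.
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Port 0 asks the OS for a free ephemeral port (assumption: we
        # avoid 5432 so the probe never collides with a real server).
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
            second.listen(1)
        except OSError:
            return False
        return True
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("without SO_REUSEADDR:", second_bind_succeeds(False))
    print("with SO_REUSEADDR:   ", second_bind_succeeds(True))
```

Without the option the second bind is refused. With it, the outcome depends on the platform's socket semantics, which is the point of the report above, so the script prints the result rather than assuming one.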
Zeugswetter Andreas ADI SD wrote:
Um, we explicitly set SO_REUSEADDR. So the port will happily allow a
second bind.
So we do. I must confess I didn't look at the code, just spoke with
Magnus who agreed it didn't seem like it should be possible.
Regards, Dave
Dave Page wrote:
Close. There was an Msys build from the 9th running on port 5432.
So, it seems there are a couple of issues here:
1) There appears to be no way to specify the default port number in the
MSVC build. The buildfarm passes it to configure for regular builds,
which obviously isn't run in VC++ mode, thus leaving the build on 5432.
2) VC++ and Msys builds will both happily start on the same port at the
same time. The first one to start listens on 5432 until it shuts down,
at which point the second server takes over seamlessly! It doesn't
matter which is started first - it's as if Windows is queuing up the
listens on the port.
Confusingly, similar behaviour is reproducible on XP Pro, except the
connection seems to go to the last server started, instead of the first!
I'll look at the port mess.
Are you running 2 buildfarm members on the same machine? If so, you
should look at using the multi-root facility which is explicitly
designed to avoid clashes of this sort.
cheers
andrew
Dave Page <dpage@postgresql.org> writes:
2) VC++ and Msys builds will both happily start on the same port at the
same time. The first one to start listens on 5432 until it shuts down,
at which point the second server takes over seamlessly!
Uh ... so the lock-file stuff is completely broken on Windows?
The SO_REUSEADDR flag is intentional --- without that, on many
platforms there would be a significant time delay needed between
stopping a postmaster and starting a new one. But our socket lock
file machinery ought to have detected the conflict.
regards, tom lane
Andrew Dunstan wrote:
I'll look at the port mess.
Are you running 2 buildfarm members on the same machine? If so, you
should look at using the multi-root facility which is explicitly
designed to avoid clashes of this sort.
Yes, I've got VC++ and Mingw/Msys animals on each of two (virtual)
machines. They are completely independent of each other - different
configs, different scripts, different ports, different directories etc.
Where can I find out about multi-root? I can't see anything in the
config file, or in PGBuildFarm-HOWTO.txt
Regards, Dave.
I wrote:
Uh ... so the lock-file stuff is completely broken on Windows?
Not so much broken as commented out ... on looking at the code, it's
blindingly obvious that we don't even try to create a socket lock file
if not HAVE_UNIX_SOCKETS. Sigh.
There is a related risk even on Unix machines: two postmasters can be
started on the same port number if they have different settings of
unix_socket_directory, and then it's indeterminate which one you will
contact if you connect to the TCP port. I seem to recall that we
discussed this several years ago, and didn't really find a satisfactory
way of interlocking the TCP port per se.
regards, tom lane
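The gap described above can be illustrated with a simplified model of a per-directory socket lock file. This is a sketch under stated assumptions, not the backend's implementation: the helper names `socket_lock_path`, `try_lock` and `demo` are hypothetical, the lock is a plain exclusive file create, and stale-lock detection is omitted entirely. The lock file name follows the `.s.PGSQL.<port>.lock` pattern used next to the Unix socket.

```python
import os
import tempfile


def socket_lock_path(socket_dir: str, port: int) -> str:
    """Lock file path kept next to the Unix socket for this port."""
    return os.path.join(socket_dir, f".s.PGSQL.{port}.lock")


def try_lock(socket_dir: str, port: int) -> bool:
    """Return True if we created the lock file, False if it already
    existed. Simplified: no check for stale locks left by dead servers."""
    try:
        fd = os.open(socket_lock_path(socket_dir, port),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")
    return True


def demo() -> tuple:
    """Two 'servers' on port 5432: same socket directory vs. different."""
    with tempfile.TemporaryDirectory() as dir_a, \
            tempfile.TemporaryDirectory() as dir_b:
        first = try_lock(dir_a, 5432)      # first server starts
        same_dir = try_lock(dir_a, 5432)   # conflict is detected
        other_dir = try_lock(dir_b, 5432)  # conflict goes unnoticed
        return first, same_dir, other_dir


if __name__ == "__main__":
    print(demo())
```

Because the lock is keyed on the socket directory as well as the port, a second server with a different directory setting acquires its own lock even though both would listen on the same TCP port. A build with no Unix sockets has no such file at all.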
On Mon, May 14, 2007 at 08:50:54AM -0400, Tom Lane wrote:
I wrote:
Uh ... so the lock-file stuff is completely broken on Windows?
Not so much broken as commented out ... on looking at the code, it's
blindingly obvious that we don't even try to create a socket lock file
if not HAVE_UNIX_SOCKETS. Sigh.
There is a related risk even on Unix machines: two postmasters can be
started on the same port number if they have different settings of
unix_socket_directory, and then it's indeterminate which one you will
contact if you connect to the TCP port. I seem to recall that we
discussed this several years ago, and didn't really find a satisfactory
way of interlocking the TCP port per se.
If all we want to do is add a check that prevents two servers from
starting on the same port, we could do that trivially in a win32-specific
way (since we'll never have unix sockets there). Just create an object in
the global namespace named postgresql.interlock.<portnumber> or such a thing.
Worth doing?
//Magnus
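One way the proposal above might look is sketched below, in Python via ctypes rather than in the backend's C. Several things here are assumptions and not part of the proposal: a named mutex is used as the object type, the `Global\` prefix is used to reach the global kernel object namespace, and the helper names `interlock_name` and `try_acquire_interlock` are hypothetical. `CreateMutexW`, `CloseHandle` and the `ERROR_ALREADY_EXISTS` code (183) are the standard Win32 ones.

```python
import ctypes
import sys

ERROR_ALREADY_EXISTS = 183  # Win32 error code set when the name is taken


def interlock_name(port: int) -> str:
    """Name of the per-port interlock object (naming is an assumption)."""
    return f"Global\\postgresql.interlock.{port}"


def try_acquire_interlock(port: int):
    """Create the named mutex for this port.

    Returns the handle if this process created it, or None if another
    process already holds an object with that name. Windows only."""
    if sys.platform != "win32":
        raise NotImplementedError("named kernel objects are Win32-specific")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateMutexW.argtypes = [ctypes.c_void_p, ctypes.c_int,
                                      ctypes.c_wchar_p]
    kernel32.CreateMutexW.restype = ctypes.c_void_p
    kernel32.CloseHandle.argtypes = [ctypes.c_void_p]
    handle = kernel32.CreateMutexW(None, False, interlock_name(port))
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    if ctypes.get_last_error() == ERROR_ALREADY_EXISTS:
        # Someone else created it first: treat as "port already in use".
        kernel32.CloseHandle(handle)
        return None
    return handle  # keep open for the lifetime of the server
```

The object goes away when the last handle to it is closed, including when the owning process exits, so unlike a lock file there is nothing stale to clean up after a crash.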