mingw check hung

Started by Andrew Dunstan, about 17 years ago, 33 messages, hackers
#1 Andrew Dunstan
andrew@dunslane.net

Something happened about 80 hours ago that caused my mingw buildfarm
member (gcc 3.4.2 on Win XP Pro SP2) to hang at the check stage. It
looks like it's hung in initdb.

I wonder if it could be this commit:

Log Message:
-----------
Make win32 builds always do SetEnvironmentVariable() when doing putenv().
Also, if linked against other versions than the default MSVCRT library
(for example the MSVC build which links against MSVCRT80), also update
the cache in the default MSVCRT at the same time.

I note that the change is apparently not limited to MSVC builds. The
MSVC animal that runs on the same machine appears unaffected. I see one
other mingw buildfarm member that has been having problems since a few
days ago (yak), and another that looks to be a few hours overdue to
report, so it might also be hung (vaquita).

cheers

andrew

#2 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#1)
Re: mingw check hung

Further to this:

I see that vaquita has now reported in, and is happy. Also, I can run
happily on my Vista box (vaquita is also a Vista box). I therefore
suspect that we have a problem specifically with XP (both dawn_bat and
yak are XP boxes).

cheers

andrew

#3 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#2)
Re: mingw check hung

Andrew Dunstan wrote:

> I see that vaquita has now reported in, and is happy. Also, I can run
> happily on my Vista box (vaquita is also a Vista box). I therefore
> suspect that we have a problem specifically with XP (both dawn_bat and
> yak are XP boxes).

Have you managed to get gdb running on that box, and if so, can you try
to grab a stacktrace? If not, try a stacktrace from process explorer. It
doesn't actually work with mingw, but it gives you a hint based on DLL
exports...

//Magnus

#4 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#3)
Re: mingw check hung

Magnus Hagander wrote:

> Have you managed to get gdb running on that box, and if so, can you try
> to grab a stacktrace? If not, try a stacktrace from process explorer. It
> doesn't actually work with mingw, but it gives you a hint based on DLL
> exports...

I'll see what I can do. By the time I get to see the problem Dr Watson
already has the process - in fact the run is hanging waiting on a Dr
Watson dialog box ;-(

I've installed drmingw to handle exceptions instead, so we'll see if
that gives us useful info. If not, I'll see what I can do with gdb.

cheers

andrew

#5 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#4)
Re: mingw check hung

Andrew Dunstan wrote:

> I'll see what I can do. By the time I get to see the problem Dr Watson
> already has the process - in fact the run is hanging waiting on a Dr
> Watson dialog box ;-(

There's a commandline parameter to drwatson, iirc, that will make it
stop grabbing them automatically.

> I've installed drmingw to handle exceptions instead, so we'll see if
> that gives us useful info. If not, I'll see what I can do with gdb.

Hadn't heard of drmingw, I see how that can be useful :-)

//Magnus

#6 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#5)
Re: mingw check hung

Magnus Hagander wrote:

> Andrew Dunstan wrote:
>
>> I've installed drmingw to handle exceptions instead, so we'll see if
>> that gives us useful info. If not, I'll see what I can do with gdb.
>
> Hadn't heard of drmingw, I see how that can be useful :-)

report from DrMingw is below

cheers

andrew

initdb.exe caused an Access Violation at location 7c91b1fa in module ntdll.dll Writing to location 20202030.

Registers:
eax=20202020 ebx=00000000 ecx=00000000 edx=003eab70 esi=003eab70 edi=00000000
eip=7c91b1fa esp=0022b820 ebp=0022b894 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206

Call stack:
7C91B1FA ntdll.dll:7C91B1FA RtlpWaitForCriticalSection
7C901046 ntdll.dll:7C901046 RtlEnterCriticalSection
77C3F34F msvcrt.dll:77C3F34F _popen
00401493 initdb.exe:00401493 popen_check initdb.c:477
static FILE * popen_check(
const char * command = ,
const char * mode =
)
...
errno = 0;
cmdfd = popen(command, mode);

if (cmdfd == NULL)

fprintf(stderr, _("%s: could not execute command \"%s\": %s\n"),
progname, command, strerror(errno));
...

00404DA0 initdb.exe:00404DA0 main initdb.c:1650
int main(
int argc = 7,
char * * argv = &0x003e3d21
)
...
DEVNULL);

PG_CMD_OPEN;

for (line = sysviews_setup; *line != NULL; line++)
...

004011E7 initdb.exe:004011E7
00401238 initdb.exe:00401238
7C817067 kernel32.dll:7C817067 RegisterWaitForInputIdle

#7 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#6)
Re: mingw check hung

Andrew Dunstan wrote:

> report from DrMingw is below

Further data point:

The suspect patch is quite definitely the source of the problem. I undid
the configure changes and surrounded the additions to port/win32.h with
#ifdef WIN32_ONLY_COMPILER ... #endif. Result: the problem disappeared,
and "make check" completed perfectly.

cheers

andrew

#8 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#7)
Re: mingw check hung

Andrew Dunstan wrote:

> Further data point:
>
> The suspect patch is quite definitely the source of the problem. I undid
> the configure changes and surrounded the additions to port/win32.h with
> #ifdef WIN32_ONLY_COMPILER ... #endif. Result: the problem disappeared,
> and "make check" completed perfectly.

Per discussion I looked at just reverting that part, but that won't
work. If we do that, the call to SetEnvironmentVariable() will not be
run, which certainly isn't right.

The problem has to be in win32env.c. I originally thought we
accidentally called the putenv function twice in this case, but that
code seems properly #ifdef:ed to MSVC.

I'm not sure I trust the crash point at all - is this compiled with
debug info enabled? It seems like a *very* strange line to crash on...

I can't spot the error right off :-( Can you try to see if it's the
putenv() or the unsetenv() that gets broken? (by making sure just one of
them gets replaced)

//Magnus

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#8)
Re: mingw check hung

Magnus Hagander <magnus@hagander.net> writes:

> Andrew Dunstan wrote:
>
>> The suspect patch is quite definitely the source of the problem.
>
> I can't spot the error right off :-( Can you try to see if it's the
> putenv() or the unsetenv() that gets broken?

Are we sure pgwin32_unsetenv works in this environment? (Or worse,
maybe it's trying to use port/unsetenv.c?)

regards, tom lane

#10 Mark Cave-Ayland
mark.cave-ayland@siriusit.co.uk
In reply to: Magnus Hagander (#8)
Re: mingw check hung

Magnus Hagander wrote:

> Per discussion I looked at just reverting that part, but that won't
> work. If we do that, the call to SetEnvironmentVariable() will not be
> run, which certainly isn't right..
>
> The problem has to be in win32env.c. I originally thought we
> accidentally called the putenv function twice in this case, but that
> code seems properly #ifdef:ed to MSVC.
>
> I'm not sure I trust the crash point at all - is this compiled with
> debug info enabled? It seems like a *very* strange line to crash on...
>
> I can't spot the error right off :-( Can you try to see if it's the
> putenv() or the unsetenv() that gets broken? (by making sure just one of
> them get replaced)

Hi guys,

Don't know if this is relevant at all, but it reminds me of a problem I
had with environment variables in PostGIS with MingW. It was something
along the lines of environment variables set in a MingW program using
putenv() for PGPORT, PGHOST etc. weren't visible to a MSVC-compiled
libpq but were to a MingW-compiled libpq. It's fairly easy to knock up a
quick test program in C to verify this.

I eventually gave up and just built a connection string instead - for
reference the final patch is here
http://postgis.refractions.net/pipermail/postgis-commits/2008-January/000199.html.
I appreciate it may not be 100% relevant, but I thought I'd flag it up
as possibly being a fault with the MingW putenv implementation.

HTH,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

#11 Magnus Hagander
magnus@hagander.net
In reply to: Mark Cave-Ayland (#10)
Re: mingw check hung

Mark Cave-Ayland wrote:

> Don't know if this is relevant at all, but it reminds me of a problem I
> had with environment variables in PostGIS with MingW. It was something
> along the lines of environment variables set in a MingW program using
> putenv() for PGPORT, PGHOST etc. weren't visible to a MSVC-compiled
> libpq but were to a MingW-compiled libpq. It's fairly easy to knock up a
> quick test program in C to verify this.

That's the reason this patch went in in the first place, and that
problem has been fixed. It also seems to have caused crashes on mingw,
which was not expected :-)

It's not actually a fault with mingw putenv, it's just that those go
into the cached environment only.

//Magnus

#12 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#9)
Re: mingw check hung

Tom Lane wrote:

> Magnus Hagander <magnus@hagander.net> writes:
>
>> I can't spot the error right off :-( Can you try to see if it's the
>> putenv() or the unsetenv() that gets broken?
>
> Are we sure pgwin32_unsetenv works in this environment? (Or worse,
> maybe it's trying to use port/unsetenv.c?)

It is the pgwin32_unsetenv() call that is causing the trouble somehow.
That much I have just managed to isolate.

cheers

andrew

#13 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#12)
Re: mingw check hung

Andrew Dunstan wrote:

> It is the pgwin32_unsetenv() call that is causing the trouble somehow.
> That much I have just managed to isolate.

Specifically, it's the SetEnvironmentVariable() call from
pgwin32_putenv() called from pgwin32_unsetenv(). When this is disabled
things work just fine.
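
For readers following along, the call chain named above can be sketched
roughly as below. This is an illustration under stated assumptions, not
the code in win32env.c, which is not shown in this thread: the names
sketch_putenv(), sketch_unsetenv() and os_set_variable() are made up, and
the split-at-'=' logic is only a guess at the general shape. Outside
Windows the sketch falls back to POSIX setenv()/unsetenv() so that it can
be compiled anywhere.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup(), setenv(), unsetenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical stand-in for the OS-level call. On Windows it forwards to
 * SetEnvironmentVariable(), where a NULL value requests deletion; elsewhere
 * POSIX setenv()/unsetenv() are used purely so the sketch is portable.
 */
static int
os_set_variable(const char *name, const char *value)
{
#ifdef _WIN32
    return SetEnvironmentVariableA(name, value) ? 0 : -1;
#else
    if (value == NULL)
        return unsetenv(name);
    return setenv(name, value, 1);
#endif
}

/* Assumed shape of the putenv side: split "NAME=value" and pass both on. */
static int
sketch_putenv(const char *envval)
{
    char *envcpy;
    char *cp;
    int   rc;

    assert(envval != NULL);
    envcpy = strdup(envval);          /* private copy that we may modify */
    if (envcpy == NULL)
        return -1;

    cp = strchr(envcpy, '=');
    if (cp == NULL)
    {
        free(envcpy);
        return -1;                    /* not of the form NAME=value */
    }
    *cp++ = '\0';                     /* envcpy is the name, cp the value */

    if (*cp == '\0')
        cp = NULL;                    /* empty value: ask for deletion */

    rc = os_set_variable(envcpy, cp);
    free(envcpy);
    return rc;
}

/* Assumed shape of the unsetenv side: build "NAME=" and reuse putenv. */
static int
sketch_unsetenv(const char *name)
{
    size_t len = strlen(name) + 2;    /* name + '=' + terminating NUL */
    char  *envbuf = malloc(len);
    int    rc;

    if (envbuf == NULL)
        return -1;
    snprintf(envbuf, len, "%s=", name);
    rc = sketch_putenv(envbuf);
    free(envbuf);
    return rc;
}
```

With these helpers, sketch_unsetenv("LC_ALL") ends up in os_set_variable()
with a NULL value, which is the call being disabled in the experiment
described above.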

cheers

andrew

#14 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#13)
Re: mingw check hung

Andrew Dunstan wrote:

> Specifically, it's the SetEnvironmentVariable() call from
> pgwin32_putenv() called from pgwin32_unsetenv(). When this is disabled
> things work just fine.

That's strange :( What arguments are sent to the function? Since this
is an API function, it really shouldn't behave differently between mingw
and msvc, so it must be something that goes wrong with the arguments.

Also, Tom mentioned earlier that we may be including *two* replacements
for unsetenv(), which could be what's causing the problem. Can you check
if that is happening and try to disable the one in port/unsetenv.c and
see if that changes things?

//Magnus

#15 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#14)
Re: mingw check hung

Magnus Hagander wrote:

> That's strange :( What arguments are it sent to the function? Since this
> is an API function, it really shouldn't behave differently between mingw
> and msvc, so it must be something that goes wrong with the arguments.
>
> Also, Tom mentioned earlier that we may be including *two* replacements
> for unsetenv(), which could be what's causing the problem. Can you check
> if that is happening and try to disable the one in port/unsetenv.c and
> see if that changes things?

I've already ruled out that hypothesis by forcing the call direct to
pgwin32_unsetenv() instead of relying on the macro, in initdb.c.

There are only two such calls in initdb.c: the arguments are "LC_ALL"
and "PGCLIENTENCODING".

I wonder if this version of SetEnvironmentVariable is sufficiently dumb
that it fails badly if given a NULL second argument for a value that is
not in fact in the environment (as I would normally expect of these on
Windows)?

cheers

andrew

#16 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#15)
Re: mingw check hung

Andrew Dunstan wrote:

> There are only two such calls in initdb.c: the arguments are "LC_ALL"
> and "PGCLIENTENCODING".
>
> I wonder if this version of SetEnvironmentVariable is sufficiently dumb
> that it fails badly if given a NULL second argument for a value that is
> not in fact in the environment (as I would normally expect of these on
> Windows)?

But that should be a win32 API call. It's not a runtime call. So it
should be identical between mingw and msvc!

Try removing the code that sets it to NULL if it's an empty string.
Having it as an empty string made it fail on MSVC, and the API
documentation says it should be NULL, but maybe mingw is somehow
intercepting the call and breaking it...

//Magnus

#17 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#16)
Re: mingw check hung

Magnus Hagander wrote:

> But that should be a win32 API call. It's not a runtime call. So it
> should be identical between mingw and msvc!
>
> Try removing the code that sets it to NULL if it's empty string. Having
> it as empty string made it fail on MSVC, and the API documentation says
> it should be NULL, but maybe mingw is somehow intercepting the call and
> breaking it...

Mingw is just passing the call on.

You're right. When I comment out the NULL assignment, it all works.

MSDN says this (<http://msdn.microsoft.com/en-us/library/z46c489x.aspx>):

If the value parameter is not empty and the environment variable
named by the variable parameter does not exist, the environment
variable is created and assigned the contents of value. Solely for
purposes of this operation, value is considered empty if it is a
null reference (Nothing in Visual Basic), contains a zero-length
string, or contains an initial hexadecimal zero character (0x00).

If variable contains a non-initial hexadecimal zero character, the
characters before the zero character are considered the environment
variable name and all subsequent characters are ignored.

If value contains a non-initial hexadecimal zero character, the
characters before the zero character are assigned to the environment
variable and all subsequent characters are ignored.

If value is empty and the environment variable named by variable
exists, the environment variable is deleted. If variable does not
exist, no error occurs even though the operation cannot be performed.

So it looks like we could remove that NULL assignment happily and expect
the right thing to be done.

cheers

andrew

#18 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#17)
Re: mingw check hung

Andrew Dunstan wrote:

> You're right. When I comment out the NULL assignment, it all works.
>
> So it looks like we could remove that NULL assignment happily and expect
> the right thing to be done.

I'm doing training all day today, but I can hopefully look at it this
weekend if you haven't already. However, I do recall *adding* that part
specifically for MSVC compatibility - I got a crash without it. Perhaps
we need to #ifdef it on mingw, but I'd like to understand *why*, since
it's just an API call...

Are we *sure*, btw, that this is actually a mingw issue, and not
something else in the environment? Could you try an MSVC-compiled binary
on the same machine?

//Magnus

#19 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#18)
Re: mingw check hung

Magnus Hagander wrote:

> Are we *sure*, btw, that this is actually a mingw issue, and not
> something else in the environment? Could you try a MSVC compiled binary
> on the same machine?

My MSVC buildfarm animal runs on the same machine, and does not suffer
the same problem.

cheers

andrew

#20 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#19)
Re: mingw check hung

Andrew Dunstan wrote:

> My MSVC buildfarm animal runs on the same machine, and does not suffer
> the same problem.

Meh. Stupid mingw :-)

So how about we #ifdef out that NULL setting based on
WIN32_ONLY_COMPILER, does that seem reasonable?
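
As a rough illustration of that proposal (a sketch only, not a patch
against win32env.c; the helper name value_for_api() is hypothetical), the
empty-string-to-NULL conversion could be kept for WIN32_ONLY_COMPILER
builds and skipped everywhere else:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: choose what to hand to the environment API as the
 * value argument. When WIN32_ONLY_COMPILER is defined, an empty string is
 * turned into NULL; in all other builds the value is passed through
 * unchanged, so an empty string stays an empty string.
 */
static const char *
value_for_api(const char *value)
{
    assert(value != NULL);
#ifdef WIN32_ONLY_COMPILER
    if (value[0] == '\0')
        return NULL;
#endif
    return value;
}
```

A caller would then pass value_for_api(cp) where it previously passed the
possibly-NULLed pointer. Whether this is the right fix is exactly what is
being asked here.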

//Magnus

#21 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#20)

#22 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#21)

#23 Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#22)

#24 Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Andrew Dunstan (#22)

#25 Andrew Dunstan
andrew@dunslane.net
In reply to: Hiroshi Inoue (#24)

#26 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#25)

#27 Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Inoue (#24)

#28 Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Andrew Dunstan (#1)

#29 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#26)

#30 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#29)

#31 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#29)

#32 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#31)

#33 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#32)