windows / initdb oddness
This took me hours to find ...
On my Windows box, CVS HEAD gets an execution failure on "initdb foo"
but succeeds happily with "initdb -D foo".
This is not true for REL8_1_STABLE, nor is it true for all Windows
machines/environments, apparently, otherwise we would be seeing failures
from the buildfarm member snake, since the buildfarm script does "initdb
data".
My suspicion is that this is some side effect of the restricted-exec
patch, but I don't have time to dig further right now.
cheers
andrew
This took me hours to find ...
On my Windows box, CVS HEAD gets an execution failure on "initdb foo"
but succeeds happily with "initdb -D foo".
This is not true for REL8_1_STABLE, nor is it true for all
Windows machines/environments, apparently, otherwise we would
be seeing failures from the buildfarm member snake, since the
buildfarm script does "initdb data".
My suspicion is that this is some side effect of the
restricted-exec patch, but I don't have time to dig further right now.
Um, so what error msg do you get when it's failing?
//Magnus
Magnus Hagander wrote:
This took me hours to find ...
On my Windows box, CVS HEAD gets an execution failure on "initdb foo"
but succeeds happily with "initdb -D foo".
This is not true for REL8_1_STABLE, nor is it true for all
Windows machines/environments, apparently, otherwise we would
be seeing failures from the buildfarm member snake, since the
buildfarm script does "initdb data".
My suspicion is that this is some side effect of the
restricted-exec patch, but I don't have time to dig further right now.
Um, so what error msg do you get when it's failing?
I get a popup box that says:
initdb.exe has encountered a problem and needs to close.
We are sorry for the inconvenience.
Clicking a link gives this info:
AppName: initdb.exe AppVer: 8.2.0.6051 ModName: msvcrt.dll
ModVer: 7.0.2600.1106 Offset: 00033830
It wouldn't let me copy the rest of the info ;-(
cheers
andrew
This took me hours to find ...
On my Windows box, CVS HEAD gets an execution failure on
"initdb foo"
but succeeds happily with "initdb -D foo".
This is not true for REL8_1_STABLE, nor is it true for all Windows
machines/environments, apparently, otherwise we would be seeing
failures from the buildfarm member snake, since the buildfarm script
does "initdb data".
My suspicion is that this is some side effect of the
restricted-exec
patch, but I don't have time to dig further right now.
Um, so what error msg do you get when it's failing?
I get a popup box that says:
initdb.exe has encountered a problem and needs to close.
We are sorry for the inconvenience.
Clicking a link gives this info:
AppName: initdb.exe AppVer: 8.2.0.6051 ModName: msvcrt.dll
ModVer: 7.0.2600.1106 Offset: 00033830
It wouldn't let me copy the rest of the info ;-(
Hm. Crap. (For those not familiar with this, that's a coredump without a
core:-P)
Does it give you an error code? (Nevermind the stackdump etc, just the
code)
Are you running this with an admin account or a non-admin account? If
admin, what are the permissions on the initdb.exe file and libpq.dll?
Anything weird in how you run it - do you specify a path to initdb, or
run it from current directory for example?
And finally, can you check with process explorer if it's the first or
second initdb that dies? (With this patch, initdb will re-exec itself
with lower privs)
//Magnus
Magnus Hagander wrote:
I get a popup box that says:
initdb.exe has encountered a problem and needs to close.
We are sorry for the inconvenience.
Clicking a link gives this info:
AppName: initdb.exe AppVer: 8.2.0.6051 ModName: msvcrt.dll
ModVer: 7.0.2600.1106 Offset: 00033830
It wouldn't let me copy the rest of the info ;-(
Hm. Crap. (For those not familiar with this, that's a coredump without a
core:-P)
Does it give you an error code? (Nevermind the stackdump etc, just the
code)
I'll have a look when I get time to reboot the machine into Windows.
Are you running this with an admin account or a non-admin account? If
admin, what are the permissions on the initdb.exe file and libpq.dll?
Should be unprivileged - it's the account I use to run buildfarm. (and
which has therefore in each case just successfully run "make check" with
the identical binaries).
Anything weird in how you run it - do you specify a path to initdb, or
run it from current directory for example?
relative path: bin/initdb foo
(bin has libpq.dll as well as initdb.exe).
And finally, can you check with process explorer if it's the first or
second initdb that dies? (With this patch, initdb will re-exec itself
with lower privs)
I will add some trace writes when I get a chance. I was rather hoping
something would jump out at you, but obviously it hasn't, so I'll have
to dig into it the slow way. *sigh*
cheers
andrew
Are you running this with an admin account or a non-admin
account? If
admin, what are the permissions on the initdb.exe file and libpq.dll?
Should be unprivileged - it's the account I use to run
buildfarm. (and which has therefore in each case just
successfully run "make check" with the identical binaries).
Hm. Ok. Is that part running under msys or "plain windows"? I haven't
done much testing under msys. (Though it's clearly not always broken)
Anything weird in how you run it - do you specify a path to
initdb, or
run it from current directory for example?
relative path: bin/initdb foo
(bin has libpq.dll as well as initdb.exe).
But if this is from buildfarm, it should be the same on snake, right?
And finally, can you check with process explorer if it's the
first or
second initdb that dies? (With this patch, initdb will
re-exec itself
with lower privs)
I will add some trace writes when I get a chance. I was
rather hoping something would jump out at you, but obviously
it hasn't, so I'll have to dig into it the slow way. *sigh*
Ok. Sorry, can't help much. It's definitely quite possible it's somewhere
around the new priv code. My first guess would be something with the
build-new-commandline pieces, but I don't see what it should be...
//Magnus
I wrote:
I will add some trace writes when I get a chance. I was rather hoping
something would jump out at you, but obviously it hasn't, so I'll have
to dig into it the slow way. *sigh*
Just eyeballing the code it looks to me like the problem is this line:
strcat(cmdline, " --restrictedexec");
which is appending an option type argument after the non-option argument.
That would exactly account for the failure when we call "initdb foo" but
not "initdb -D foo".
The solution would be to put --restrictedexec earlier on the new command
line. I'll work on that.
cheers
andrew
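The ordering fix described above can be sketched in C as follows. This is only an illustration under stated assumptions: `append_word` and `build_cmdline` are hypothetical helpers, not the functions in initdb.c, and no quoting is done, which a real Windows command line would need.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one word to the NUL-terminated string in buf, separated by a
 * space if buf is not empty. Returns 0 on success, -1 if it does not fit.
 */
static int
append_word(char *buf, size_t buflen, const char *word)
{
    size_t used = strlen(buf);
    int    n = snprintf(buf + used, buflen - used, "%s%s",
                        used > 0 ? " " : "", word);

    return (n < 0 || (size_t) n >= buflen - used) ? -1 : 0;
}

/*
 * Rebuild the command line as "argv[0] <flag> argv[1] ... argv[argc-1]",
 * so the internal flag comes before any non-option argument.
 * Hypothetical helper: it does NOT quote arguments.
 */
static int
build_cmdline(char *buf, size_t buflen, int argc, char *const argv[],
              const char *flag)
{
    int i;

    if (buflen == 0 || argc < 1)
        return -1;
    buf[0] = '\0';

    if (append_word(buf, buflen, argv[0]) != 0)
        return -1;
    if (append_word(buf, buflen, flag) != 0)
        return -1;
    for (i = 1; i < argc; i++)
        if (append_word(buf, buflen, argv[i]) != 0)
            return -1;
    return 0;
}
```

With `argv = {"initdb", "foo"}` and the flag `--restrictedexec`, the resulting string places the flag between the program name and `foo`, so an option parser that stops at the first non-option argument still sees it.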
Andrew Dunstan wrote:
I wrote:
I will add some trace writes when I get a chance. I was rather hoping
something would jump out at you, but obviously it hasn't, so I'll
have to dig into it the slow way. *sigh*
Just eyeballing the code it looks to me like the problem is this line:
strcat(cmdline, " --restrictedexec");
which is appending an option type argument after the non-option argument.
That would exactly account for the failure when we call "initdb foo"
but not "initdb -D foo".
The solution would be to put --restrictedexec earlier on the new command
line. I'll work on that.
The problem is apparently the one I identified above, and is fixed by the
attached patch, which I will apply soon unless there are objections.
As for why we saw this on loris but not snake, I suspect they might have
different getopt libraries installed.
cheers
andrew
Attachments:
initdb.patch (text/x-patch, +33 -21)
I will add some trace writes when I get a chance. I was
rather hoping
something would jump out at you, but obviously it hasn't, so I'll
have to dig into it the slow way. *sigh*
Just eyeballing the code it looks to me like the problem is
this line:
strcat(cmdline, " --restrictedexec");
which is appending an option type argument after the
non-option argument.
That would exactly account for the failure when we call
"initdb foo"
but not "initdb -D foo".
The solution would be to put --restrictedexec earlier on the
new command
line. I'll work on that.
The problem is apparently the one I identified above, and is
fixed by the attached patch, which I will apply soon unless
there are objections.
As for why we saw this on loris but not snake, I suspect they
might have different getopt libraries installed.
Isn't that just fixing the symptom and not the actual bug? In this case,
if we cause the bug, we should do this as well, but doesn't it crash the
same way if you *manually* put arguments in the "wrong order" on the
commandline? Like "initdb foo --no-locale" or something like that?
(I still can't reproduce it on my machines, so I guess I have a better
getopt as well.)
//Magnus
Magnus Hagander wrote:
The solution would be to put --restrictedexec earlier on the
new command
line. I'll work on that.
The problem is apparently the one I identified above, and is
fixed by the attached patch, which I will apply soon unless
there are objections.
As for why we saw this on loris but not snake, I suspect they
might have different getopt libraries installed.
Isn't that just fixing the symptom and not the actual bug? In this case,
if we cause the bug, we should do this as well, but doesn't it crash the
same way if you *manually* put arguments in the "wrong order" on the
commandline? Like "initdb foo --no-locale" or something like that?
(I still can't reproduce it on my machines, so I guess I have a better
getopt as well.)
We don't promise that you can put the pgdata argument anywhere except at
the end of the command line. In fact, our manual page requires it at the
end. Even on systems with GNU getopt, if POSIXLY_CORRECT is set then
processing would stop at the first non-getopt argument.
So I can live with bombing, even if it's a bit unpleasant, in the case
of "initdb foo --no-locale", but we cannot *cause* that by appending a
secret argument ourselves, so that "initdb foo" also bombs.
The logic to detect and correct this in the general case before getopt
is called is not worth the pain.
cheers
andrew
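To illustrate the POSIX-style behaviour mentioned above, here is a small C sketch, assuming glibc or musl. It uses a leading `+` in the option string, a GNU extension that makes `getopt()` stop at the first non-option argument, much as setting POSIXLY_CORRECT would. The function name and the single `-n` option are hypothetical and are not initdb's real option set.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/*
 * Count how many "-n" options getopt() recognises before it stops.
 * *first_nonopt receives the index of the first argument left unparsed.
 * The leading '+' in the option string asks for POSIX-style parsing:
 * stop at the first non-option argument instead of permuting argv.
 */
static int
count_leading_options(int argc, char *const argv[], int *first_nonopt)
{
    int c;
    int seen = 0;

    optind = 0;     /* assumption: glibc/musl re-initialise on optind == 0 */
    opterr = 0;     /* keep getopt() quiet about unknown options */

    while ((c = getopt(argc, argv, "+n")) != -1)
        if (c == 'n')
            seen++;

    *first_nonopt = optind;
    return seen;
}
```

Given `{"initdb", "foo", "-n"}` the parser stops at `foo` and never sees `-n`; given `{"initdb", "-n", "foo"}` it sees one option. This only shows why a trailing option can go unnoticed by such a parser. It does not reproduce the crash reported in this thread.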
Andrew Dunstan wrote:
Magnus Hagander wrote:
The solution would be to put --restrictedexec earlier on the
new command
line. I'll work on that.
The problem is apparently the one I identified above, and is fixed by
the attached patch, which I will apply soon unless there are
objections.
As for why we saw this on loris but not snake, I suspect they might
have different getopt libraries installed.
Isn't that just fixing the symptom and not the actual bug? In this case,
if we cause the bug, we should do this as well, but doesn't it crash the
same way if you *manually* put arguments in the "wrong order" on the
commandline? Like "initdb foo --no-locale" or something like that?
(I still can't reproduce it on my machines, so I guess I have a better
getopt as well.)
We don't promise that you can put the pgdata argument anywhere except
at the end of the command line. In fact, our manual page requires it
at the end. Even on systems with GNU getopt, if POSIXLY_CORRECT is set
then processing would stop at the first non-getopt argument.
So I can live with bombing, even if it's a bit unpleasant, in the case
of "initdb foo --no-locale", but we cannot *cause* that by appending
a secret argument ourselves, so that "initdb foo" also bombs.
The logic to detect and correct this in the general case before getopt
is called is not worth the pain.
Thinking about this a tiny bit more, it struck me that by far the best
way to do this is to stop using a magic argument and use the environment
instead. Then we don't need to mangle the command line at all. This
actually results in less code, and should be more robust (mangling the
command line in Windows is dangerous and difficult because of quotes).
Trial patch below, although I don't have a Windows box handy to test it on.
cheers
andrew
Attachments:
initdb.restrict.patch (text/x-patch, +16 -23)
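The environment-based approach can be sketched roughly as below. This is not the attached patch: it is a POSIX-flavoured illustration using `getenv`/`setenv` (a Windows build would need its own equivalents), the function name is hypothetical, and the variable name PG_RESTRICT_EXEC and the value "1" are taken from the follow-up discussion in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

#define RESTRICT_ENV "PG_RESTRICT_EXEC"

/*
 * Decide whether this process still needs to re-exec itself with
 * reduced privileges.
 *
 * Returns 0 if the marker variable is already set to "1" (we are the
 *           re-executed child, or the user set it by hand),
 *         1 if the caller should re-exec; the marker is set first so the
 *           child inherits it,
 *        -1 if the environment could not be updated.
 */
static int
should_reexec_restricted(void)
{
    const char *val = getenv(RESTRICT_ENV);

    if (val != NULL && strcmp(val, "1") == 0)
        return 0;

    if (setenv(RESTRICT_ENV, "1", 1) != 0)
        return -1;

    return 1;
}
```

Because the marker travels in the environment, the original argument vector can be passed to the child unchanged, so no command-line rebuilding or quoting is needed.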
Andrew Dunstan <andrew@dunslane.net> writes:
Thinking about this a tiny bit more, it struck me that by far the best
way to do this is to stop using a magic argument and use the environment
instead. Then we don't need to mangle the command line at all. This
actually results in less code, and should be more robust (mangling the
command line in Windows is dangerous and difficult because of quotes).
This seems like a good idea.
Is there any reason to worry about an accidental environment conflict?
If someone mistakenly did "export PG_RESTRICT_EXEC=1", it looks to me
like this would cause the re-exec bit to be skipped, but I suppose the
worst possible consequence is that the postmaster would refuse to start.
Is there anything I don't see? (Of course, the magic argument method
can be broken manually in just the same way...)
regards, tom lane
Tom Lane wrote:
Is there any reason to worry about an accidental environment conflict?
If someone mistakenly did "export PG_RESTRICT_EXEC=1", it looks to me
like this would cause the re-exec bit to be skipped, but I suppose the
worst possible consequence is that the postmaster would refuse to start.
Is there anything I don't see? (Of course, the magic argument method
can be broken manually in just the same way...)
Yes. The effect would be that we just do exactly what we do today
anyway. We could make the value some more obscure token, but I don't
see much point.
cheers
andrew
Thinking about this a tiny bit more, it struck me that by
far the best
way to do this is to stop using a magic argument and use the
environment instead. Then we don't need to mangle the command line at
all. This actually results in less code, and should be more robust
(mangling the command line in Windows is dangerous and difficult because of quotes).
This seems like a good idea.
Is there any reason to worry about an accidental environment conflict?
If someone mistakenly did "export PG_RESTRICT_EXEC=1", it
looks to me like this would cause the re-exec bit to be
skipped, but I suppose the worst possible consequence is that
the postmaster would refuse to start.
Is there anything I don't see? (Of course, the magic
argument method can be broken manually in just the same way...)
This only affects initdb, not postmaster.
I don't see the risk being bigger with environment than commandline at
all.
//Magnus
Is there any reason to worry about an accidental environment
conflict?
If someone mistakenly did "export PG_RESTRICT_EXEC=1", it
looks to me
like this would cause the re-exec bit to be skipped, but I
suppose the
worst possible consequence is that the postmaster would
refuse to start.
Is there anything I don't see? (Of course, the magic
argument method
can be broken manually in just the same way...)
Yes. The effect would be that we just do exactly what we do
today anyway. We could make the value some more obscure
token, but I don't see much point.
No, if the user wants to break it, go ahead. They're just going to break
things for themselves (since the executed postgres.exe still retains
the admin check and will bail out). I see no reason to make it obscure.
//Magnus