A bit of PG archeology uncovers an interesting Linux/Unix factoid
For reasons, I was trying to compile older versions of Postgres and
ran into a strange behaviour where system() worked normally but then
returned -1 with errno set to ECHILD. Surprisingly, it looks like
we've seen this behaviour in the past, but on Solaris:
commit 07d4d36aae79cf2ac365e381ed3e7ce62dcfa783
Author: Tatsuo Ishii <ishii@postgresql.org>
Date: Thu May 25 06:53:43 2000 +0000
On solaris, createdb/dropdb fails because of strange behavior of system().
(it returns error with errno ECHILD upon successful completion of commands).
This fix ignores an error from system() if errno == ECHILD.
It looks like Linux now behaves similarly. In fact, there's a Red Hat
notice about this causing similar headaches in Oracle:
https://access.redhat.com/solutions/37218
So just in case anyone else wants to use system() in Postgres, or
indeed any other Unix application that twiddles with the SIGCHLD
handler, this is something to beware of. It's not entirely clear to me
that the SA_NOCLDWAIT flag mentioned there is the only way to get this
behaviour; at least one Stack Overflow answer implied that just setting
SIG_IGN was enough.
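A minimal sketch of how the behaviour described above could be
reproduced, assuming a POSIX system with a working /bin/sh. The helper
name system_with_sigchld() is made up for illustration and is not
taken from Postgres. It runs a command through system() with SIGCHLD
set to a chosen disposition and reports what came back:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run `cmd` via system() while SIGCHLD has the given
 * disposition (SIG_IGN or SIG_DFL), then restore the previous disposition.
 * Returns whatever system() returned; *saved_errno receives errno if
 * system() returned -1, and 0 otherwise.
 */
int system_with_sigchld(const char *cmd, void (*disposition)(int),
                        int *saved_errno)
{
    void (*previous)(int);
    int rc;

    assert(saved_errno != NULL);

    previous = signal(SIGCHLD, disposition);

    errno = 0;
    rc = system(cmd);
    *saved_errno = (rc == -1) ? errno : 0;   /* save before anything else runs */

    if (previous != SIG_ERR)
        signal(SIGCHLD, previous);           /* put the old disposition back */
    return rc;
}
```

On Linux I would expect system_with_sigchld("exit 3", SIG_IGN, &err)
to return -1 with err set to ECHILD even though the command ran, and
the same call with SIG_DFL to return a status for which
WEXITSTATUS() is 3. Switching to SIG_DFL around the call is one
possible workaround, not necessarily the right one for every program.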
--
greg
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 02/15/16 13:42, Greg Stark wrote:
> (it returns error with errno ECHILD upon successful completion of commands).
> This fix ignores an error from system() if errno == ECHILD.
>
> It looks like Linux now behaves similarly,
It seems to be official, in the Single Unix Specification:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/sigaction.html
SA_NOCLDWAIT
If set, and sig equals SIGCHLD, child processes of the calling
processes will not be transformed into zombie processes when they
terminate. If the calling process subsequently waits for its
children, and the process has no unwaited for children that were
transformed into zombie processes, it will block until all of its
children terminate, and wait(), wait3(), waitid() and waitpid() will
fail and set errno to [ECHILD]. Otherwise, terminating child
processes will be transformed into zombie processes, unless SIGCHLD
is set to SIG_IGN.
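The quoted text can be illustrated with a small sketch that uses
waitpid() directly instead of system(). The helper name
wait_with_nocldwait() is hypothetical, and the handler is deliberately
left at SIG_DFL so that only the SA_NOCLDWAIT flag is in play:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: set SA_NOCLDWAIT on SIGCHLD (handler left at
 * SIG_DFL), fork a child that exits at once, and try to collect it with
 * waitpid(). Returns waitpid()'s result; *saved_errno receives errno if
 * a call failed, and 0 otherwise.
 */
pid_t wait_with_nocldwait(int *saved_errno)
{
    struct sigaction sa, old;
    pid_t pid, waited = -1;

    assert(saved_errno != NULL);
    *saved_errno = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_DFL;
    sa.sa_flags = SA_NOCLDWAIT;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old) != 0) {
        *saved_errno = errno;
        return -1;
    }

    pid = fork();
    if (pid == 0)
        _exit(7);                        /* child: terminate immediately */

    if (pid < 0) {
        *saved_errno = errno;            /* fork itself failed */
    } else {
        do {
            waited = waitpid(pid, NULL, 0);
        } while (waited < 0 && errno == EINTR);
        if (waited < 0)
            *saved_errno = errno;
    }

    sigaction(SIGCHLD, &old, NULL);      /* restore the previous action */
    return waited;
}
```

If the platform follows the specification text above, the call should
return -1 with ECHILD stored in *saved_errno, and the child's exit
code of 7 is never seen by the parent.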
> So just in case anyone else wants to use system() in Postgres or
> indeed any other Unix application that twiddles with the SIGCHLD
> handler this is something to beware of. It's not entirely clear to me
> that the mention of SA_NOCLDWAIT is the only way to get this
> behaviour, at least one stackoverflow answer implied just setting
> SIG_IGN was enough.
Yup:
• If a process sets the action for the SIGCHLD signal to SIG_IGN, the
behaviour is unspecified, except as specified below. If the action
for the SIGCHLD signal is set to SIG_IGN, child processes of the
calling processes will not be transformed into zombie processes when
they terminate. If the calling process subsequently waits for its
children, and the process has no unwaited for children that were
transformed into zombie processes, it will block until all of its
children terminate, and wait(), wait3(), waitid() and waitpid() will
fail and set errno to [ECHILD].
-Chap
On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack <chap@anastigmatix.net> wrote:
> If the calling process subsequently waits for its
> children, and the process has no unwaited for children that were
> transformed into zombie processes, it will block until all of its
> children terminate, and wait(), wait3(), waitid() and waitpid() will
> fail and set errno to [ECHILD].
Sure, but I don't see anything saying that system() shouldn't be
expected to handle this situation. At least there's nothing in the
system.3 man page that says it should be expected to always return an
error if SIGCHLD is ignored.
And actually, looking at that documentation, it's not clear to me why
that's the case. I would have expected system() to call waitpid
immediately after the fork, and unless the subprocess was very quick
that should be sufficient to get the exit code. One might even imagine
system() intentionally having some kind of interlock to ensure that
the parent has called waitpid before the child execs the shell.
--
greg
On 02/15/16 20:03, Greg Stark wrote:
> On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack <chap@anastigmatix.net> wrote:
>> If the calling process subsequently waits for its
>> children, and the process has no unwaited for children that were
>> transformed into zombie processes, it will block until all of its
>> children terminate, and wait(), wait3(), waitid() and waitpid() will
>> fail and set errno to [ECHILD].
>
> And actually looking at that documentation it's not clear to me why
> it's the case. I would have expected system to immediately call
> waitpid after the fork and unless the subprocess was very quick that
> should be sufficient to get the exit code. One might even imagine
> having system intentionally have some kind of interlock to ensure that
> the parent has called waitpid before the child execs the shell.
Doesn't the wording suggest that even if the parent is fast enough
to call waitpid before the child exits, waitpid will only block until
the child terminates and then say ECHILD anyway?
I wouldn't be surprised if they specified it that way to avoid creating
a race condition where you would *sometimes* think it was doing what you
wanted.
I agree that the "status ... is no longer available" description that
system(3) gives for ECHILD doesn't clearly reflect that.
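The reading above, that waitpid() blocks until the child terminates
and then fails anyway, can be checked with a rough sketch like the
following. The helper name early_wait_with_sigchld_ignored() and the
timing approach are assumptions made for illustration only. The child
sleeps for a while so that the parent is already inside waitpid() well
before the child exits:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: with SIGCHLD ignored, fork a child that sleeps for
 * `child_ms` milliseconds and call waitpid() right away in the parent.
 * Returns waitpid()'s result; *saved_errno receives errno on failure and
 * *elapsed_ms receives roughly how long waitpid() took.
 */
pid_t early_wait_with_sigchld_ignored(long child_ms, int *saved_errno,
                                      long *elapsed_ms)
{
    struct timespec nap, start, end;
    void (*previous)(int);
    pid_t pid, waited = -1;

    assert(saved_errno != NULL && elapsed_ms != NULL);
    *saved_errno = 0;
    *elapsed_ms = 0;

    previous = signal(SIGCHLD, SIG_IGN);

    pid = fork();
    if (pid == 0) {
        nap.tv_sec = child_ms / 1000;
        nap.tv_nsec = (child_ms % 1000) * 1000000L;
        nanosleep(&nap, NULL);           /* child: stay alive for a while */
        _exit(0);
    }

    if (pid < 0) {
        *saved_errno = errno;            /* fork itself failed */
    } else {
        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            waited = waitpid(pid, NULL, 0);
        } while (waited < 0 && errno == EINTR);
        if (waited < 0)
            *saved_errno = errno;        /* save before calling anything else */
        clock_gettime(CLOCK_MONOTONIC, &end);
        *elapsed_ms = (end.tv_sec - start.tv_sec) * 1000L
                    + (end.tv_nsec - start.tv_nsec) / 1000000L;
    }

    if (previous != SIG_ERR)
        signal(SIGCHLD, previous);       /* restore the old disposition */
    return waited;
}
```

With a 300 ms child, I would expect the call to take close to 300 ms
and still come back with -1 and ECHILD, which would mean that winning
the race to waitpid() does not help.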
-Chap