BUG #15449: file_fdw using program cause exit code error when using LIMIT
The following bug has been logged on the website:
Bug reference: 15449
Logged by: Eric Cyr
Email address: eric.cyr@gmail.com
PostgreSQL version: 10.5
Operating system: Ubuntu 16.04.2 LTS (Xenial Xerus)
Description:
CREATE FOREIGN TABLE test_file_fdw_program_limit
(
message text
)
SERVER pg_log_fdw OPTIONS (program 'echo "test"', format 'csv')
;
--
SELECT * FROM test_file_fdw_program_limit;
/*
OK
*/
SELECT * FROM test_file_fdw_program_limit LIMIT 2;
/*
OK
*/
SELECT * FROM test_file_fdw_program_limit LIMIT 0;
/*
[38000] ERROR: program "echo "test"" failed
Detail: child process exited with exit code 1
*/
--
LIMIT 0 is the easiest way we found to reproduce the error, but we
encountered the same issue with LIMIT (< number of file lines) on a
program doing an "if exists" check plus file output stream manipulation.
PG Bug reporting form <noreply@postgresql.org> writes:
SELECT * FROM test_file_fdw_program_limit LIMIT 0;
/*
[38000] ERROR: program "echo "test"" failed Detail: child process exited
with exit code 1
*/
LIMIT 0 is the easiest way we found to reproduce the error, but we
encountered same issue with LIMIT (< file lines) on a program doing an if
exists + file output stream manipulation.
Yeah, I can reproduce this on macOS as well as Linux. Capturing stderr
shows something pretty unsurprising:
sh: line 1: echo: write error: Broken pipe
I get similar behavior from "cat"; didn't try anything else.
So the called program is behaving in a somewhat reasonable way: it's
detecting EPIPE on its stdout (after we close the pipe), reporting that,
and doing exit(1).
Unfortunately, it's not clear what we could do about that, short of
always reading the whole program output, which is likely to be a
cure worse than the disease. If the program were failing thanks to
SIGPIPE, we could recognize that as a case we can ignore ... but with
behavior like this, I don't see a reliable way to tell it apart from
a generic program failure, which surely we'd better report.
The only idea that comes to mind is to introduce a file_fdw option
specifying "read program input to end, even if query doesn't want
it all". Not quite sure which choice should be the default; you
could make an argument either way I think.
Curiously, if you do stuff like "cat longfile | head -10", you don't
get any visible error, though surely cat must be seeing the same
thing underneath ... I wonder how the shell is arranging that?
regards, tom lane
I wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
SELECT * FROM test_file_fdw_program_limit LIMIT 0;
/*
[38000] ERROR: program "echo "test"" failed Detail: child process exited
with exit code 1
*/
Yeah, I can reproduce this on macOS as well as Linux. Capturing stderr
shows something pretty unsurprising:
sh: line 1: echo: write error: Broken pipe
So the called program is behaving in a somewhat reasonable way: it's
detecting EPIPE on its stdout (after we close the pipe), reporting that,
and doing exit(1).
Unfortunately, it's not clear what we could do about that, short of
always reading the whole program output, which is likely to be a
cure worse than the disease. If the program were failing thanks to
SIGPIPE, we could recognize that as a case we can ignore ... but with
behavior like this, I don't see a reliable way to tell it apart from
a generic program failure, which surely we'd better report.
After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.
So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure. The attached POC patch does that and successfully makes
the file_fdw problem go away for me.
It's just a POC because there are some things that need more thought
than I've given them:
1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream? If not, what infrastructure do we
need to add to control that? In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to, not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)
2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)
3. Maybe this should be implemented at some higher level?
4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?
regards, tom lane
Attachments:
process-sigpipe-normally-in-copy-from-program-1.patch (text/x-patch, +21/-8)
Hello.
At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote in <18397.1540297291@sss.pgh.pa.us>
I wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
SELECT * FROM test_file_fdw_program_limit LIMIT 0;
/*
[38000] ERROR: program "echo "test"" failed Detail: child process exited
with exit code 1
*/
Yeah, I can reproduce this on macOS as well as Linux. Capturing stderr
shows something pretty unsurprising:
sh: line 1: echo: write error: Broken pipe
So the called program is behaving in a somewhat reasonable way: it's
detecting EPIPE on its stdout (after we close the pipe), reporting that,
and doing exit(1).
Unfortunately, it's not clear what we could do about that, short of
always reading the whole program output, which is likely to be a
cure worse than the disease. If the program were failing thanks to
SIGPIPE, we could recognize that as a case we can ignore ... but with
behavior like this, I don't see a reliable way to tell it apart from
a generic program failure, which surely we'd better report.
After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.
So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure. The attached POC patch does that and successfully makes
the file_fdw problem go away for me.
It's just a POC because there are some things that need more thought
than I've given them:
1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream? If not, what infrastructure do we
need to add to control that? In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)
Forcing the program to run with SIGPIPE ignored makes it see EPIPE,
which might not be handled properly since it would be unexpected by the
program.
If we don't read from the pipe (that is, we are writing to it), we
shouldn't change the behavior anyway, since a SIGPIPE can come from
another pipe.
2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)
Since the SIGPIPE may come from another pipe, I think we generally
shouldn't. But if we explicitly stop reading from the pipe before
detecting an error, it can be ignored, since we aren't interested in
further failures. Thus ClosePipeToProgram might need an extra
parameter, or CopyState may need an additional flag named, say,
"explicit_close" (or "ignore_sigpipe_on_pclose"..) which would be set
in EndCopy.
3. Maybe this should be implemented at some higher level?
I'm not sure, but it seems to me not needed.
4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?
All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but it
might be too much.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote in <18397.1540297291@sss.pgh.pa.us>
After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.
So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure. The attached POC patch does that and successfully makes
the file_fdw problem go away for me.
Thanks for working on this!
It's just a POC because there are some things that need more thought
than I've given them:
1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream? If not, what infrastructure do we
need to add to control that? In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to, not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)
Forcing the program to run with SIGPIPE ignored makes it see EPIPE,
which might not be handled properly since it would be unexpected by the
program.
If we don't read from the pipe (that is, we are writing to it), we
shouldn't change the behavior anyway, since a SIGPIPE can come from
another pipe.
I'm a bit confused here. Horiguchi-san, are you saying that the called
program that we will read from should inherit SIG_DFL for SIGPIPE, as
proposed in the POC patch, but the called program that we will write to
should inherit SIG_IGN as it currently does?
ISTM that it would be OK to inherit SIG_DFL in both cases, because I
think it would be the responsibility of the called program to handle
SIGPIPE properly (if necessary) in both cases. Maybe I'm missing
something, though.
2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)
Since the SIGPIPE may come from another pipe, I think we
shouldn't generally.
Agreed; if ClosePipeToProgram ignores that failure, we would fail to get
a better error message in CopySendEndOfRow if the called program
(inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
if (cstate->is_program)
{
    if (errno == EPIPE)
    {
        /*
         * The pipe will be closed automatically on error at
         * the end of transaction, but we might get a better
         * error message from the subprocess' exit code than
         * just "Broken Pipe"
         */
        ClosePipeToProgram(cstate);

        /*
         * If ClosePipeToProgram() didn't throw an error, the
         * program terminated normally, but closed the pipe
         * first. Restore errno, and throw an error.
         */
        errno = EPIPE;
    }
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not write to COPY program: %m")));
}
But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failures.
You mean that we should ignore other failures of the called program when
we stop reading from the pipe early?
Thus ClosePipeToProgram
might need an extra parameter or CopyState may need an additional
flag named, say, "explicit_close" (or
"ignore_sigpipe_on_pclose"..) in CopyState which will be set in
EndCopy.
3. Maybe this should be implemented at some higher level?
I'm not sure, but it seems to me not needed.
ISTM that it's a good idea to control the ClosePipeToProgram behavior by
a new member for CopyState.
4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?
All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but it
might be too much.
Not sure, but reverting SIGPIPE to default would be enough as a fix for
the original issue, if we go with the POC patch.
Best regards,
Etsuro Fujita
At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5BDC4BA0.7050106@lab.ntt.co.jp>
(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote
in <18397.1540297291@sss.pgh.pa.us>
After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.
So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure. The attached POC patch does that and successfully makes
the file_fdw problem go away for me.
Thanks for working on this!
It's just a POC because there are some things that need more thought
than I've given them:
1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream? If not, what infrastructure do we
need to add to control that? In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to, not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)
Forcing the program to run with SIGPIPE ignored makes it see EPIPE,
which might not be handled properly since it would be unexpected by the
program.
If we don't read from the pipe (that is, we are writing to it), we
shouldn't change the behavior anyway, since a SIGPIPE can come from
another pipe.
I'm a bit confused here. Horiguchi-san, are you saying that the
called program that we will read from should inherit SIG_DFL for
SIGPIPE, as proposed in the POC patch, but the called program that we
will write to should inherit SIG_IGN as it currently does?
Maybe yes. Your understanding of the first paragraph is right. But in
my second paragraph, I said that we shouldn't set SIGPIPE to anything
other than SIG_DFL at exec() time, even if we are going to read from
the pipe, because the program can be writing to another pipe and the
change can affect its behavior.
ISTM that it would be OK to inherit SIG_DFL in both cases, because I
think it would be the responsibility of the called program to handle
SIGPIPE properly (if necessary) in both cases. Maybe I'm missing
something, though.
So, I think we *should* (not just OK to) restore SIGPIPE to
SIG_DFL in any case here to prevent undetermined situation for
the program.
2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)
Since the SIGPIPE may come from another pipe, I think we
shouldn't generally.
Agreed; if ClosePipeToProgram ignores that failure, we would fail to
get a better error message in CopySendEndOfRow if the called program
(inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
if (cstate->is_program)
{
    if (errno == EPIPE)
    {
        /*
         * The pipe will be closed automatically on error at
         * the end of transaction, but we might get a better
         * error message from the subprocess' exit code than
         * just "Broken Pipe"
         */
        ClosePipeToProgram(cstate);

        /*
         * If ClosePipeToProgram() didn't throw an error, the
         * program terminated normally, but closed the pipe
         * first. Restore errno, and throw an error.
         */
        errno = EPIPE;
    }
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not write to COPY program: %m")));
}
Mmm, that's the EPIPE from fwrite, not a killed-by-SIGPIPE exit status
of the called program. So it is irrelevant to the called program's
SIGPIPE setup; it requires SIGPIPE to be kept at SIG_IGN on the
backend side.
But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failures.
You mean that we should ignore other failures of the called program
when we stop reading from the pipe early?
Yes, we have received sufficient data from the pipe and then closed
it successfully. The program may want to write more, but we don't
want it. We rather should ignore a SIGPIPE exit code from the program,
since closing our end of the pipe is likely to cause it, and even if
it comes from another pipe, we can assume that the SIGPIPE immediately
stopped the program before it returned any garbage to us.
Thus ClosePipeToProgram
might need an extra parameter or CopyState may need an additional
flag named, say, "explicit_close" (or
"ignore_sigpipe_on_pclose"..) in CopyState which will be set in
EndCopy.
3. Maybe this should be implemented at some higher level?
I'm not sure, but it seems to me not needed.
ISTM that it's a good idea to control the ClosePipeToProgram behavior
by a new member for CopyState.
4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?
All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but it
might be too much.
Not sure, but reverting SIGPIPE to default would be enough as a fix
for the original issue, if we go with the POC patch.
Agreed. I wonder why there's no API that resets all handlers to
SIG_DFL at once, or some flag telling exec() that it should
start with the default signal handler set.
I tried that, but found that fread() doesn't detect termination of the
called program by a signal as an error. So just setting a flag in
EndCopyFrom() ends up concealing the error.
Finally, in the attached PoC, I set cstate->eof_reached on failure
of NextCopyFromRawFields; then, if eof_reached, we don't ignore
SIGPIPE. For a program like
"puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
the following behavior. (I got an error without the fflush(), but
that is inevitable.)
=# select * from ft2 limit 1;
a
-------
test1
=# select * from ft2 limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process was terminated by signal 13: Broken pipe
For the original case:
=# select * from ft1 limit 1;
a
------
test
=# select * from ft1 limit 2;
a
------
test
(1 row)
I haven't confirmed whether it is complete.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
process-sigpipe-normally-in-copy-from-program-kyota-1.patch (text/x-patch, +30/-3)
On Wed, Oct 24, 2018 at 1:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
SELECT * FROM test_file_fdw_program_limit LIMIT 0;
/*
[38000] ERROR: program "echo "test"" failed Detail: child process exited
with exit code 1
*/
Yeah, I can reproduce this on macOS as well as Linux. Capturing stderr
shows something pretty unsurprising:
sh: line 1: echo: write error: Broken pipe
So the called program is behaving in a somewhat reasonable way: it's
detecting EPIPE on its stdout (after we close the pipe), reporting that,
and doing exit(1).
Unfortunately, it's not clear what we could do about that, short of
always reading the whole program output, which is likely to be a
cure worse than the disease. If the program were failing thanks to
SIGPIPE, we could recognize that as a case we can ignore ... but with
behavior like this, I don't see a reliable way to tell it apart from
a generic program failure, which surely we'd better report.
After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.
So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure. The attached POC patch does that and successfully makes
the file_fdw problem go away for me.
It's just a POC because there are some things that need more thought
than I've given them:
1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream? If not, what infrastructure do we
need to add to control that? In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to, not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)
2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)
I'm not sure about that. It might in theory be telling you about some
other pipe. If you're OK with that false positive, why not ignore all
errors after you've read enough successful input and decided to close
the pipe?
3. Maybe this should be implemented at some higher level?
It won't work for some programs that ignore or catch the signal, so in
theory you might want to give users the power/responsibility to say
"ignore errors that occur after I decide to hang up". Here are three
different behaviours I found in popular software, showing termination
by signal, custom error handling that we can't distinguish, and a
bonehead strategy:
$ seq 1 1000000 | head -5
1
2
3
4
5
... exit code indicates killed by signal
$ python -c "for i in range(1000000): print i" | head -5
0
1
2
3
4
Traceback (most recent call last):
File "<string>", line 1, in <module>
IOError: [Errno 32] Broken pipe
... exit code 1
$ cat Test.java
public class Test {
public static void main(String[] args) {
for (int i = 0; i < 1000000; ++i) {
System.out.println(Integer.toString(i));
}
}
}
$ javac Test.java
$ java Test | head -5
0
1
2
3
4
... wait a really long time with no output, exit code 0
(Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
exception, except for PrintStreams like System.out which just eat data
after an error...)
4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?
On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM. The differences were:
PIPE, TTIN, TTOU and USR2. For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored. Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?
--
Thomas Munro
http://www.enterprisedb.com
(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5BDC4BA0.7050106@lab.ntt.co.jp>
(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote
in <18397.1540297291@sss.pgh.pa.us>
It's just a POC because there are some things that need more thought
than I've given them:
1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream? If not, what infrastructure do we
need to add to control that? In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to, not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)
ISTM that it would be OK to inherit SIG_DFL in both cases, because I
think it would be the responsibility of the called program to handle
SIGPIPE properly (if necessary) in both cases. Maybe I'm missing
something, though.
So, I think we *should* (not just OK to) restore SIGPIPE to
SIG_DFL in any case here to prevent an undetermined situation for
the program.
OK
2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)
Since the SIGPIPE may come from another pipe, I think we
shouldn't generally.
Agreed; if ClosePipeToProgram ignores that failure, we would fail to
get a better error message in CopySendEndOfRow if the called program
(inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
if (cstate->is_program)
{
    if (errno == EPIPE)
    {
        /*
         * The pipe will be closed automatically on error at
         * the end of transaction, but we might get a better
         * error message from the subprocess' exit code than
         * just "Broken Pipe"
         */
        ClosePipeToProgram(cstate);

        /*
         * If ClosePipeToProgram() didn't throw an error, the
         * program terminated normally, but closed the pipe
         * first. Restore errno, and throw an error.
         */
        errno = EPIPE;
    }
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not write to COPY program: %m")));
}
Mmm, that's the EPIPE from fwrite, not a killed-by-SIGPIPE exit status
of the called program. So it is irrelevant to the called program's
SIGPIPE setup; it requires SIGPIPE to be kept at SIG_IGN on the
backend side.
My explanation might not be enough. Let me explain. If the called
program that inherited SIG_DFL for SIGPIPE terminated on SIGPIPE for
some reason, ClosePipeToProgram *as-is* would create an error message
from that program's exit code. But if we modify ClosePipeToProgram like
the original POC patch, that function would not create that message for
that termination. To avoid that, I think it would be better for
ClosePipeToProgram to ignore the SIGPIPE failure only in the case where
the caller is a COPY FROM PROGRAM that is allowed to terminate early.
(Maybe we could specify that as a new argument for BeginCopyFrom.)
But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failures.
You mean that we should ignore other failures of the called program
when we stop reading from the pipe early?
Yes, we have received sufficient data from the pipe and then closed
it successfully. The program may want to write more, but we don't
want it. We rather should ignore a SIGPIPE exit code from the program,
since closing our end of the pipe is likely to cause it, and
I think so too.
even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.
Sorry, I don't understand this.
4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?
All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but it
might be too much.
Not sure, but reverting SIGPIPE to default would be enough as a fix
for the original issue, if we go with the POC patch.
Agreed. I wonder why there's no API that resets all handlers to
SIG_DFL at once, or some flag telling exec() that it should
start with the default signal handler set.
Such an API would be an improvement, but IMO that would go beyond
the scope of this fix.
Finally, in the attached PoC, I set cstate->eof_reached on failure
of NextCopyFromRawFields; then, if eof_reached, we don't ignore
SIGPIPE. For a program like
"puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
the following behavior. (I got an error without the fflush(), but
that is inevitable.)
=# select * from ft2 limit 1;
a
-------
test1
=# select * from ft2 limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process was terminated by signal 13: Broken pipe
For the original case:
=# select * from ft1 limit 1;
a
------
test
=# select * from ft1 limit 2;
a
------
test
(1 row)
I haven't confirmed whether it is complete.
Sorry, I don't fully understand this, but is the reason for adding the
eof_reached stuff to avoid ignoring the SIGPIPE failure in normal
COPY FROM PROGRAM cases?
Best regards,
Etsuro Fujita
At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5BE18409.2070004@lab.ntt.co.jp>
(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote in <5BDC4BA0.7050106@lab.ntt.co.jp>
(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote
in <18397.1540297291@sss.pgh.pa.us>
It's just a POC because there are some things that need more thought
than I've given them:
1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream? If not, what infrastructure do we
need to add to control that? In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to, not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)
ISTM that it would be OK to inherit SIG_DFL in both cases, because I
think it would be the responsibility of the called program to handle
SIGPIPE properly (if necessary) in both cases. Maybe I'm missing
something, though.
So, I think we *should* (not just OK to) restore SIGPIPE to
SIG_DFL in any case here to prevent an undetermined situation for
the program.
OK
2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)
Since the SIGPIPE may come from another pipe, I think we
shouldn't generally.
Agreed; if ClosePipeToProgram ignores that failure, we would fail to
get a better error message in CopySendEndOfRow if the called program
(inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
if (cstate->is_program)
{
    if (errno == EPIPE)
    {
        /*
         * The pipe will be closed automatically on error at
         * the end of transaction, but we might get a better
         * error message from the subprocess' exit code than
         * just "Broken Pipe"
         */
        ClosePipeToProgram(cstate);

        /*
         * If ClosePipeToProgram() didn't throw an error, the
         * program terminated normally, but closed the pipe
         * first. Restore errno, and throw an error.
         */
        errno = EPIPE;
    }
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not write to COPY program: %m")));
}
Mmm, that's the EPIPE from fwrite, not a killed-by-SIGPIPE exit status
of the called program. So it is irrelevant to the called program's
SIGPIPE setup; it requires SIGPIPE to be kept at SIG_IGN on the
backend side.
My explanation might not be enough. Let me explain. If the called
program that inherited SIG_DFL for SIGPIPE terminated on SIGPIPE for
some reason, ClosePipeToProgram *as-is* would create an error message
from that program's exit code. But if we modify ClosePipeToProgram
like the original POC patch, that function would not create that
message for that termination. To avoid that, I think it would be
better for ClosePipeToProgram to ignore the SIGPIPE failure only in
the case where the caller is a COPY FROM PROGRAM that is allowed to
terminate early. (Maybe we could specify that as a new argument for
BeginCopyFrom.)
I understand. Thanks for the explanation; I agree with that.
But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failures.
You mean that we should ignore other failures of the called program
when we stop reading from the pipe early?
Yes, we have received sufficient data from the pipe and then closed
it successfully. The program may want to write more, but we don't
want it. We rather should ignore a SIGPIPE exit code from the program,
since closing our end of the pipe is likely to cause it, and
I think so too.
even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.
Sorry, I don't understand this.
Mmm. It looks confusing, sorry. In other words:
We don't care about the reason for the program's ending if we received
enough data from the program and we actively (= before receiving
EOF) stopped reading. On the other hand, if SIGPIPE (on another pipe)
or something fatal happened in the program before we finished reading,
we receive EOF just after that, and we are interested in the cause
of the program's death.
# Does the above make sense?
In the sense of "We don't care the reason", negligible reasons
are necessariry restricted to SIGPIPE, evan SIGSEGV could be
theoretically ignored safely. "theoretically" here means it is
another issue whether we rely on the output from a program which
causes SEGV (or any reason other than SIGPIPE, which we caused).
4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?

All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but that
might be too much.

Not sure, but reverting SIGPIPE to default would be enough as a fix
for the original issue, if we go with the POC patch.

Agreed. I wonder why there's no API that resets all handlers to
SIG_DFL at once, or some flag telling exec() that it should
start with the default signal handler set.

Such an API would be an improvement, but I think that would go beyond
the scope of this fix.

Ah, I wanted the OS or standard library to provide it, not us. Anyway,
it is not relevant to the topic here.
Finally, in the attached PoC, I set cstate->eof_reached on failure
of NextCopyFromRawFields, and if eof_reached we don't ignore
SIGPIPE. For a program like
"puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I got
the following behavior. (I got an error without the fflush(), but
that is inevitable.)

=# select * from ft2 limit 1;
   a
-------
 test1
(1 row)

=# select * from ft2 limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process was terminated by signal 13: Broken pipe

For the original case:

=# select * from ft1 limit 1;
  a
------
 test
(1 row)

=# select * from ft1 limit 2;
  a
------
 test
(1 row)

I haven't confirmed whether it is complete.
Sorry, I don't understand this fully, but the reason to add the
eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
COPY FROM PROGRAM cases?
Yes, if we have received EOF, the program ended before we
finished reading from the pipe, thus any error should be
reported. Otherwise we don't care, at least about SIGPIPE. In the
attached patch all kinds of errors are ignored when the pipe is closed
from the reading end on the backend.

As a result, it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".
=# select * from ft limit 1;
a
-------
test1
(1 row)
limit 2 reports the error.
=# select * from ft limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process exited with exit code 1
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
ignore-process-exit-status-after-close.patch (text/x-patch; charset=us-ascii, +28 -4)
(2018/11/06 19:50), Thomas Munro wrote:
On Wed, Oct 24, 2018 at 1:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
SELECT * FROM test_file_fdw_program_limit LIMIT 0;
/*
[38000] ERROR: program "echo "test"" failed Detail: child process exited
with exit code 1
*/

Yeah, I can reproduce this on macOS as well as Linux. Capturing stderr
shows something pretty unsurprising:
sh: line 1: echo: write error: Broken pipe
So the called program is behaving in a somewhat reasonable way: it's
detecting EPIPE on its stdout (after we close the pipe), reporting that,
and doing exit(1).
Unfortunately, it's not clear what we could do about that, short of
always reading the whole program output, which is likely to be a
cure worse than the disease. If the program were failing thanks to
SIGPIPE, we could recognize that as a case we can ignore ... but with
behavior like this, I don't see a reliable way to tell it apart from
a generic program failure, which surely we'd better report.

After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.

So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure. The attached POC patch does that and successfully makes
the file_fdw problem go away for me.

It's just a POC because there are some things that need more thought
than I've given them:

1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream? If not, what infrastructure do we
need to add to control that? In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to, not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)

2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)

I'm not sure about that. It might in theory be telling you about some
other pipe. If you're OK with that false positive, why not ignore all
errors after you've read enough successful input and decided to close
the pipe?
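The inheritance behavior at the heart of this exchange is easy to reproduce outside the server. The sketch below is not PostgreSQL code: it is a standalone Python illustration in which a backend-like parent ignores SIGPIPE, launches seq as the called program, reads a little, and hangs up early, once with the child inheriting SIG_IGN and once with SIGPIPE restored to SIG_DFL in the child before exec (Python's restore_signals flag does exactly that reset):

```python
import signal
import subprocess

signal.signal(signal.SIGPIPE, signal.SIG_IGN)  # backend-like disposition

def early_close_status(restore):
    # restore=True asks Popen to reset SIGPIPE (among others) back to
    # SIG_DFL in the child before exec, roughly what the POC patch does.
    p = subprocess.Popen(["seq", "1", "1000000"], stdout=subprocess.PIPE,
                         stderr=subprocess.DEVNULL, restore_signals=restore)
    p.stdout.read(64)   # take a few rows, like a small LIMIT
    p.stdout.close()    # ... then close the pipe early
    return p.wait()     # a negative value means "killed by that signal"

print(early_close_status(False))  # child kept SIG_IGN: plain nonzero exit
print(early_close_status(True))   # child got SIG_DFL: -signal.SIGPIPE
```

The first call shows the hard-to-classify outcome from the bug report (seq hits EPIPE and exits nonzero); the second shows the recognizable "terminated by signal 13" outcome the POC aims for.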
It's unfortunate to have that false positive, but in my opinion we had
better error out if there is something wrong with the called program,
because in that case the data that we read from the pipe might not be
reliable. IMO it would be the responsibility of the called program to
handle/ignore SIGPIPE properly if necessary.
3. Maybe this should be implemented at some higher level?
It won't work for some programs that ignore or catch the signal, so in
theory you might want to give users the power/responsibility to say
"ignore errors that occur after I decide to hang up". Here are three
different behaviours I found in popular software, showing termination
by signal, custom error handling that we can't distinguish, and a
bonehead strategy:
Interesting!
$ seq 1 1000000 | head -5
1
2
3
4
5
... exit code indicates killed by signal

$ python -c "for i in range(1000000): print i" | head -5
0
1
2
3
4
Traceback (most recent call last):
File "<string>", line 1, in <module>
IOError: [Errno 32] Broken pipe
... exit code 1
That's sad.
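For what it's worth, a Python script run as a called program can opt back into the traditional pipeline behavior by itself. A minimal sketch (the loop bound here is only illustrative; with enough output and a `| head -5` reader, the script dies silently by SIGPIPE instead of printing the traceback above):

```python
import signal
import sys

# CPython sets SIGPIPE to ignore at startup, which is why the example
# above raised IOError instead of dying.  Restoring the default
# disposition re-enables classic "killed by signal 13" behavior.
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

for i in range(5):
    sys.stdout.write(f"{i}\n")
```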
$ cat Test.java
public class Test {
public static void main(String[] args) {
for (int i = 0; i < 1000000; ++i) {
System.out.println(Integer.toString(i));
}
}
}
$ javac Test.java
$ java Test | head -5
0
1
2
3
4
... wait a really long time with no output, exit code 0

(Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
exception, except for PrintStreams like System.out which just eat data
after an error...)
I agree that that is a bonehead strategy, but it seems not that bad to me.
4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?

On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM. The differences were:
PIPE, TTIN, TTOU and USR2. For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored. Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?
So, we should revert SIGUSR2 as well to default processing?
Thanks!
Best regards,
Etsuro Fujita
(2018/11/07 9:22), Kyotaro HORIGUCHI wrote:
At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5BE18409.2070004@lab.ntt.co.jp>
(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote in <5BDC4BA0.7050106@lab.ntt.co.jp>

(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:

But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failure.

You mean that we should ignore other failures of the called program
when we stop reading from the pipe early?

Yes, we have received sufficient data from the pipe and then closed
it successfully. The program may want to write more, but we don't
want it. We should rather ignore the SIGPIPE exit code of the program,
since closing our writing end of the pipe is likely to cause it, and

I think so too.

even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.

Sorry, I don't understand this.

Mmm. It looks confusing, sorry. In other words:

We don't care about the reason for the program's ending if we received
enough data from the program and we actively (= before receiving
EOF) stopped reading. On the other hand, if SIGPIPE (on another pipe)
or something fatal happened in the program before we finished reading,
we receive EOF just after that and we are interested in the cause
of the program's death.

# Does the above make sense?
Yeah, thanks for the explanation!
In the sense of "we don't care about the reason", negligible reasons
are not necessarily restricted to SIGPIPE; even SIGSEGV could
theoretically be ignored safely. "Theoretically" here means it is
another issue whether we rely on the output from a program which
caused a SEGV (or died for any reason other than the SIGPIPE that we
caused).

For the SIGSEGV case, I think it would be better that we don't rely on
the output data, because there might be a possibility that
the program has generated that data incorrectly/unexpectedly.
Finally, in the attached PoC, I set cstate->eof_reached on failure
of NextCopyFromRawFields, and if eof_reached we don't ignore
SIGPIPE. For a program like
"puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I got
the following behavior. (I got an error without the fflush(), but
that is inevitable.)

=# select * from ft2 limit 1;
   a
-------
 test1
(1 row)

=# select * from ft2 limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process was terminated by signal 13: Broken pipe

For the original case:

=# select * from ft1 limit 1;
  a
------
 test
(1 row)

=# select * from ft1 limit 2;
  a
------
 test
(1 row)

I haven't confirmed whether it is complete.
Sorry, I don't understand this fully, but the reason to add the
eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
COPY FROM PROGRAM cases?

Yes, if we have received EOF, the program ended before we
finished reading from the pipe, thus any error should be
reported. Otherwise we don't care, at least about SIGPIPE. In the
attached patch all kinds of errors are ignored when the pipe is closed
from the reading end on the backend.
OK, I understand your idea. Thanks for the patch!
As a result, it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".

=# select * from ft limit 1;
   a
-------
 test1
(1 row)

limit 2 reports the error.
=# select * from ft limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process exited with exit code 1
I think this would be contrary to users' expectations: if the SELECT
command works for limit 1, they would expect that the command would work
for limit 2 as well. So, I think it would be better to error out that
command for limit 1 as well, as-is.
Best regards,
Etsuro Fujita
On Wed, Nov 7, 2018 at 11:03 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
(2018/11/07 9:22), Kyotaro HORIGUCHI wrote:
At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5BE18409.2070004@lab.ntt.co.jp>
(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.

Sorry, I don't understand this.

Mmm. It looks confusing, sorry. In other words:

We don't care about the reason for the program's ending if we received
enough data from the program and we actively (= before receiving
EOF) stopped reading. On the other hand, if SIGPIPE (on another pipe)
or something fatal happened in the program before we finished reading,
we receive EOF just after that and we are interested in the cause
of the program's death.

# Does the above make sense?
Yeah, thanks for the explanation!
+1
I take back what I said earlier about false positives from other
pipes. I think it's only traditional Unix programs designed for use
in pipelines and naive programs that let SIGPIPE terminate the
process. The accepted answer here gives a good way to think about
it:
https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
A program sophisticated enough to be writing to other pipes is no
longer in that category and should be setting up signal dispositions
itself, so I agree that we should enable the default disposition and
ignore WTERMSIG(exit_code) == SIGPIPE, as proposed. That is pretty
close to the intended purpose of that signal AFAICS.
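The proposed rule can be sketched compactly. The function and flag names below are mine, not PostgreSQL's; the point is only the decision logic: decode the wait()-style status and forgive a SIGPIPE death, and nothing else, only when we chose to stop reading:

```python
import os
import signal

def program_failed(raw_status, stopped_reading_early):
    """Decide whether a child's wait()-style status is a reportable failure.

    Only death by SIGPIPE is forgiven, and only when we hung up on the
    child ourselves; any other signal (SIGSEGV, SIGKILL, ...) or a
    nonzero exit code is still reported.
    """
    if os.WIFSIGNALED(raw_status):
        sig = os.WTERMSIG(raw_status)
        return not (sig == signal.SIGPIPE and stopped_reading_early)
    return os.WEXITSTATUS(raw_status) != 0

# In a raw status, the low bits carry the terminating signal and the
# next byte carries the exit code.
print(program_failed(signal.SIGPIPE, True))   # False: expected hangup
print(program_failed(signal.SIGPIPE, False))  # True: died before we hit EOF
print(program_failed(signal.SIGSEGV, True))   # True: never ignore a SEGV
print(program_failed(1 << 8, True))           # True: exit code 1
```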
In the sense of "we don't care about the reason", negligible reasons
are not necessarily restricted to SIGPIPE; even SIGSEGV could
theoretically be ignored safely. "Theoretically" here means it is
another issue whether we rely on the output from a program which
caused a SEGV (or died for any reason other than the SIGPIPE that we
caused).

For the SIGSEGV case, I think it would be better that we don't rely on
the output data, because there might be a possibility that
the program has generated that data incorrectly/unexpectedly.
+1
I don't think we should ignore termination by signals other than
SIGPIPE: that could hide serious problems from users. I want to know
if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
happens after we read enough data; there is a major problem that a
human needs to investigate!
Finally, in the attached PoC, I set cstate->eof_reached on failure
of NextCopyFromRawFields, and if eof_reached we don't ignore
SIGPIPE. For a program like
"puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I got
the following behavior. (I got an error without the fflush(), but
that is inevitable.)

=# select * from ft2 limit 1;
   a
-------
 test1
(1 row)

=# select * from ft2 limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process was terminated by signal 13: Broken pipe

For the original case:

=# select * from ft1 limit 1;
  a
------
 test
(1 row)

=# select * from ft1 limit 2;
  a
------
 test
(1 row)

I haven't confirmed whether it is complete.
Sorry, I don't understand this fully, but the reason to add the
eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
COPY FROM PROGRAM cases?

Yes, if we have received EOF, the program ended before we
finished reading from the pipe, thus any error should be
reported. Otherwise we don't care, at least about SIGPIPE. In the
attached patch all kinds of errors are ignored when the pipe is closed
from the reading end on the backend.

OK, I understand your idea. Thanks for the patch!
+1
As a result, it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".

=# select * from ft limit 1;
   a
-------
 test1
(1 row)

limit 2 reports the error.

=# select * from ft limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process exited with exit code 1

I think this would be contrary to users' expectations: if the SELECT
command works for limit 1, they would expect that the command would work
for limit 2 as well. So, I think it would be better to error out that
command for limit 1 as well, as-is.
I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
error. For LIMIT 1, we got all the rows we wanted, and then we closed
the pipe. If we got a non-zero non-signal exit code, or a signal exit
code and it was SIGPIPE (not any other signal!), then we should
consider that to be expected.
I tried to think of a scenario where the already-received output is
truly invalidated by a later error that happens after we close the
pipe. It could be something involving a program that uses something
optimistic like serializable snapshot isolation that can later decide
that whatever it told you earlier is not valid after all. Suppose the
program is clever enough to expect EPIPE and not consider that to be
an error, but wants to tell us about serialization failure with a
non-zero exit code. To handle that, you'd need a way to provide an
option to file_fdw to tell it not to ignore non-zero exit codes after
close. This seems so exotic and contrived that it's not worth
bothering with for now, but could always be added.
BTW just for curiosity:
perl -e 'for (my $i=0; $i < 1000000; $i++) { print "$i\n"; }' | head -5
Exit code: terminated by SIGPIPE, like seq
ruby -e 'for i in 1..1000000 do puts i; end' | head -5
Exit code: 1, like Python
On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
(2018/11/06 19:50), Thomas Munro wrote:
On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM. The differences were:
PIPE, TTIN, TTOU and USR2. For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored. Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?

So, we should revert SIGUSR2 as well to default processing?
I don't think it matters in practice, but it might be nice to restore
that just for consistency. I'm not sure what to think about the TTIN,
TTOU stuff; I don't understand job control well right now but I don't
think it really applies to programs run by a PostgreSQL backend, so if
we restore those it'd probably again be only for consistency. Then
again, there may be a reason someone decided to ignore those in the
postmaster + regular backends but not the various auxiliary processes.
Anyone?
--
Thomas Munro
http://www.enterprisedb.com
(2018/11/08 10:50), Thomas Munro wrote:
On Wed, Nov 7, 2018 at 11:03 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2018/11/07 9:22), Kyotaro HORIGUCHI wrote:
At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5BE18409.2070004@lab.ntt.co.jp>
(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.

Sorry, I don't understand this.

Mmm. It looks confusing, sorry. In other words:

We don't care about the reason for the program's ending if we received
enough data from the program and we actively (= before receiving
EOF) stopped reading. On the other hand, if SIGPIPE (on another pipe)
or something fatal happened in the program before we finished reading,
we receive EOF just after that and we are interested in the cause
of the program's death.

# Does the above make sense?
Yeah, thanks for the explanation!
+1
I take back what I said earlier about false positives from other
pipes. I think it's only traditional Unix programs designed for use
in pipelines and naive programs that let SIGPIPE terminate the
process. The accepted answer here gives a good way to think about
it:

https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
Thanks for the information!
A program sophisticated enough to be writing to other pipes is no
longer in that category and should be setting up signal dispositions
itself, so I agree that we should enable the default disposition and
ignore WTERMSIG(exit_code) == SIGPIPE, as proposed. That is pretty
close to the intended purpose of that signal AFAICS.
Great!
In the sense of "we don't care about the reason", negligible reasons
are not necessarily restricted to SIGPIPE; even SIGSEGV could
theoretically be ignored safely. "Theoretically" here means it is
another issue whether we rely on the output from a program which
caused a SEGV (or died for any reason other than the SIGPIPE that we
caused).

For the SIGSEGV case, I think it would be better that we don't rely on
the output data, because there might be a possibility that
the program has generated that data incorrectly/unexpectedly.

+1
I don't think we should ignore termination by signals other than
SIGPIPE: that could hide serious problems from users. I want to know
if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
happens after we read enough data; there is a major problem that a
human needs to investigate!
I think so too.
Finally, in the attached PoC, I set cstate->eof_reached on failure
of NextCopyFromRawFields, and if eof_reached we don't ignore
SIGPIPE. For a program like
"puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I got
the following behavior. (I got an error without the fflush(), but
that is inevitable.)

=# select * from ft2 limit 1;
   a
-------
 test1
(1 row)

=# select * from ft2 limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process was terminated by signal 13: Broken pipe

For the original case:

=# select * from ft1 limit 1;
  a
------
 test
(1 row)

=# select * from ft1 limit 2;
  a
------
 test
(1 row)

I haven't confirmed whether it is complete.
Sorry, I don't understand this fully, but the reason to add the
eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
COPY FROM PROGRAM cases?

Yes, if we have received EOF, the program ended before we
finished reading from the pipe, thus any error should be
reported. Otherwise we don't care, at least about SIGPIPE. In the
attached patch all kinds of errors are ignored when the pipe is closed
from the reading end on the backend.

OK, I understand your idea. Thanks for the patch!
+1
As a result, it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".

=# select * from ft limit 1;
   a
-------
 test1
(1 row)

limit 2 reports the error.

=# select * from ft limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process exited with exit code 1

I think this would be contrary to users' expectations: if the SELECT
command works for limit 1, they would expect that the command would work
for limit 2 as well. So, I think it would be better to error out that
command for limit 1 as well, as-is.

I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
error. For LIMIT 1, we got all the rows we wanted, and then we closed
the pipe. If we got a non-zero non-signal exit code, or a signal exit
code and it was SIGPIPE (not any other signal!), then we should
consider that to be expected.
Maybe I'm missing something, but a non-zero non-signal exit code means
that there was something wrong with the called program, so I think a
human had better investigate that as well, though it would probably be
a minor problem. Too restrictive?
I tried to think of a scenario where the already-received output is
truly invalidated by a later error that happens after we close the
pipe. It could be something involving a program that uses something
optimistic like serializable snapshot isolation that can later decide
that whatever it told you earlier is not valid after all. Suppose the
program is clever enough to expect EPIPE and not consider that to be
an error, but wants to tell us about serialization failure with a
non-zero exit code. To handle that, you'd need a way to provide an
option to file_fdw to tell it not to ignore non-zero exit codes after
close. This seems so exotic and contrived that it's not worth
bothering with for now, but could always be added.
Interesting! I agree that such an option could add more flexibility in
handling the non-zero-exit-code case.
BTW just for curiosity:
perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
Exit code: terminated by SIGPIPE, like seq
Good to know!
ruby -e 'for i in 1..1000000 do puts i; end' | head -5
Exit code: 1, like Python
Sad. Anyway, thanks a lot for these experiments in addition to the
previous ones!
On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2018/11/06 19:50), Thomas Munro wrote:
On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM. The differences were:
PIPE, TTIN, TTOU and USR2. For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored. Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?

So, we should revert SIGUSR2 as well to default processing?
I don't think it matters in practice, but it might be nice to restore
that just for consistency.
Agreed.
I'm not sure what to think about the TTIN,
TTOU stuff; I don't understand job control well right now but I don't
think it really applies to programs run by a PostgreSQL backend, so if
we restore those it'd probably again be only for consistency. Then
again, there may be a reason someone decided to ignore those in the
postmaster + regular backends but not the various auxiliary processes.
Anyone?
I don't have any idea about that.
Best regards,
Etsuro Fujita
At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5BE4318F.4040002@lab.ntt.co.jp>
(2018/11/08 10:50), Thomas Munro wrote:
I take back what I said earlier about false positives from other
pipes. I think it's only traditional Unix programs designed for use
in pipelines and naive programs that let SIGPIPE terminate the
process. The accepted answer here gives a good way to think about
it:https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
Thanks for the information!
A program sophisticated enough to be writing to other pipes is no
longer in that category and should be setting up signal dispositions
itself, so I agree that we should enable the default disposition and
ignore WTERMSIG(exit_code) == SIGPIPE, as proposed. That is pretty
close to the intended purpose of that signal AFAICS.

Great!
In the sense of "we don't care about the reason", negligible reasons
are not necessarily restricted to SIGPIPE; even SIGSEGV could
theoretically be ignored safely. "Theoretically" here means it is
another issue whether we rely on the output from a program which
caused a SEGV (or died for any reason other than the SIGPIPE that we
caused).

For the SIGSEGV case, I think it would be better that we don't rely on
the output data, because there might be a possibility that
the program has generated that data incorrectly/unexpectedly.

+1
I don't think we should ignore termination by signals other than
SIGPIPE: that could hide serious problems from users. I want to know
if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
happens after we read enough data; there is a major problem that a
human needs to investigate!

I think so too.
Ok, I can live with that with no problem.
As a result, it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".

=# select * from ft limit 1;
   a
-------
 test1
(1 row)

limit 2 reports the error.

=# select * from ft limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process exited with exit code 1

I think this would be contrary to users' expectations: if the SELECT
command works for limit 1, they would expect that the command would work
for limit 2 as well. So, I think it would be better to error out that
command for limit 1 as well, as-is.

I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
error. For LIMIT 1, we got all the rows we wanted, and then we closed
the pipe. If we got a non-zero non-signal exit code, or a signal exit
code and it was SIGPIPE (not any other signal!), then we should
consider that to be expected.

Maybe I'm missing something, but a non-zero non-signal exit code means
that there was something wrong with the called program, so I think a
human had better investigate that as well, though it would probably be
a minor problem. Too restrictive?
I think Thomas is just saying that reading more lines can expose
problems. According to the current discussion, we should error
out if we got a SEGV even with LIMIT 1.
I tried to think of a scenario where the already-received output is
truly invalidated by a later error that happens after we close the
pipe. It could be something involving a program that uses something
optimistic like serializable snapshot isolation that can later decide
that whatever it told you earlier is not valid after all. Suppose the
program is clever enough to expect EPIPE and not consider that to be
an error, but wants to tell us about serialization failure with a
non-zero exit code. To handle that, you'd need a way to provide an
option to file_fdw to tell it not to ignore non-zero exit codes after
close. This seems so exotic and contrived that it's not worth
bothering with for now, but could always be added.

Interesting! I agree that such an option could add more flexibility
in handling the non-zero-exit-code case.
I think the program shouldn't output a line until all possible
output is validated. Once the data source has emitted a line, the
receiver can't do anything other than believe that it won't be
withdrawn.
BTW just for curiosity:
perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
Exit code: terminated by SIGPIPE, like seq

Good to know!

Mmm... I didn't get an error at hand on either CentOS 7 or High Sierra.
| $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
| 4
| $ echo $?
| 0
ruby -e 'for i in 1..1000000 do puts i; end' | head -5
Exit code: 1, like Python

Sad. Anyway, thanks a lot for these experiments in addition to the
previous ones!
Ruby reported a broken pipe, but the exit status was 0.
create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
select * from ft5 limit 5;
a
---
1
...
5
(5 rows)
(no error)
On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2018/11/06 19:50), Thomas Munro wrote:
On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM. The differences were:
PIPE, TTIN, TTOU and USR2. For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored. Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?

So, we should revert SIGUSR2 as well to default processing?
I don't think it matters in practice, but it might be nice to restore
that just for consistency.

Agreed.
I'm not sure what to think about the TTIN,
TTOU stuff; I don't understand job control well right now but I don't
think it really applies to programs run by a PostgreSQL backend, so if
we restore those it'd probably again be only for consistency. Then
again, there may be a reason someone decided to ignore those in the
postmaster + regular backends but not the various auxiliary processes.
Anyone?

I don't have any idea about that.
In my understanding, processes not connected to a terminal (tty/pts)
cannot receive TTIN/TTOU (unless someone sent them artificially).
Since child processes are detached by setsid() (on Linux), programs
called in that way also won't have a controlling terminal at start
time, and I suppose they have no means to connect to one since they
are no longer in the same session as postmaster.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Nov 9, 2018 at 6:39 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Mmm... I didn't get an error at hand on either CentOS 7 or High Sierra.
| $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
| 4
| $ echo $?
| 0
That's the exit code from head. You can see python or perl's exit
code by adding strace in front (on Linux):
$ strace python -c "for i in range(1000000): print i" | head -5
...
+++ exited with 1 +++
$ strace perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
+++ killed by SIGPIPE +++
create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
select * from ft5 limit 5;
a
---
1
...
5
(5 rows)
(no error)
1000 is not enough... due to buffering, it works. Try 1000000:
postgres=# create foreign table ft5 (a text) server svf1 options
(program 'ruby -e "for i in 1..1000000 do puts i; end"');
CREATE FOREIGN TABLE
postgres=# select * from ft5 limit 5;
ERROR: program "ruby -e "for i in 1..1000000 do puts i; end"" failed
DETAIL: child process exited with exit code 1
--
Thomas Munro
http://www.enterprisedb.com
(2018/11/09 14:39), Kyotaro HORIGUCHI wrote:
At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote in<5BE4318F.4040002@lab.ntt.co.jp>
(2018/11/08 10:50), Thomas Munro wrote:
I take back what I said earlier about false positives from other
pipes. I think it's only traditional Unix programs designed for use
in pipelines and naive programs that let SIGPIPE terminate the
process. The accepted answer here gives a good way to think about
it: https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
Thanks for the information!
A program sophisticated enough to be writing to other pipes is no
longer in that category and should be setting up signal dispositions
itself, so I agree that we should enable the default disposition and
ignore WTERMSIG(exit_code) == SIGPIPE, as proposed. That is pretty
close to the intended purpose of that signal AFAICS.
Great!
In the sense of "we don't care about the reason", the ignorable reasons
are not necessarily restricted to SIGPIPE; even SIGSEGV could
theoretically be ignored safely. "Theoretically" here means it is
another issue whether we rely on the output of a program which
causes SEGV (or dies for any reason other than the SIGPIPE we caused).
For the SIGSEGV case, I think it would be better that we don't rely on
the output data, IMO, because I think there is a possibility that
the program has generated that data incorrectly/unexpectedly.
+1
I don't think we should ignore termination by signals other than
SIGPIPE: that could hide serious problems from users. I want to know
if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
happens after we read enough data; there is a major problem that a
human needs to investigate!
I think so too.
Ok, I can live with that with no problem.
OK
As a result, it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".
=# select * from ft limit 1;
a
-------
test1
(1 row)
limit 2 reports the error.
=# select * from ft limit 2;
ERROR: program "/home/horiguti/work/exprog" failed
DETAIL: child process exited with exit code 1
I think this would be contrary to users' expectations: if the SELECT
command works for limit 1, they would expect that the command would
work for limit 2 as well. So, I think it would be better to error out
that command for limit 1 as well, as-is.
I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
error. For LIMIT 1, we got all the rows we wanted, and then we closed
the pipe. If we got a non-zero non-signal exit code, or a signal exit
code and it was SIGPIPE (not any other signal!), then we should
consider that to be expected.
Maybe I'm missing something, but the non-zero non-signal exit code
means that there was something wrong with the called program, so I
think a human had better investigate that as well IMO, which would
probably be a minor problem, though. Too restrictive?
I think Thomas is just saying that reading more lines can expose more
problems. According to the current discussion, we should error
out if we got SEGV even with limit 1.
Ah, I misread that. Sorry for the noise.
On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
(2018/11/06 19:50), Thomas Munro wrote:
On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM. The differences were:
PIPE, TTIN, TTOU and USR2. For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored. Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?
So, we should revert SIGUSR2 as well to default processing?
I don't think it matters in practice, but it might be nice to restore
that just for consistency.
Agreed.
I'm not sure what to think about the TTIN,
TTOU stuff; I don't understand job control well right now but I don't
think it really applies to programs run by a PostgreSQL backend, so if
we restore those it'd probably again be only for consistency. Then
again, there may be a reason someone decided to ignore those in the
postmaster + regular backends but not the various auxiliary processes.
Anyone?
I don't have any idea about that.
In my understanding, processes not connected to a
terminal (tty/pts) cannot receive TTIN/TTOU (unless someone sent
them artificially). Since child processes are detached by setsid()
(on Linux), programs called in that way also won't have a
controlling terminal at start time, and I suppose they have no
means to connect to one since they are no longer in the same
session as the postmaster.
For TTIN and TTOU, we would first need to make clear the reason for the
inconsistency Thomas pointed out. I'm wondering if we should leave the
TTIN/TTOU stuff for future work.
Best regards,
Etsuro Fujita
At Fri, 9 Nov 2018 20:32:54 +1300, Thomas Munro <thomas.munro@enterprisedb.com> wrote in <CAEepm=21wvBaKY2cbN62xn8JoPygLLhTawK2TkBac8Suw68YBw@mail.gmail.com>
On Fri, Nov 9, 2018 at 6:39 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Mmm... I didn't get an error at hand on either CentOS 7 or High Sierra.
| $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
| 4
| $ echo $?
| 0
That's the exit code from head. You can see python or perl's exit
code by adding strace in front (on Linux):
Sorry, my mistake. I understand it now: head ignores not only
SIGPIPE but also SIGSEGV and any error exit status of the called
program. I tried head with a program that ends with SEGV and saw that
head ignores it.
$ ./t
line 1
Segmentation fault (core dumped)
$ ./t | head -5 # SEGV before head closes the pipe
line 1
$
This is more tolerant than what I proposed before. (I think it is
too tolerant for us.)
create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
select * from ft5 limit 5;
a
---
1
...
5
(5 rows)
(no error)
1000 is not enough... due to buffering, it works. Try 1000000:
Ah. Understood. Thanks. (Ruby's flush doesn't work for pipes...)
postgres=# create foreign table ft5 (a text) server svf1 options
(program 'ruby -e "for i in 1..1000000 do puts i; end"');
CREATE FOREIGN TABLE
postgres=# select * from ft5 limit 5;
ERROR: program "ruby -e "for i in 1..1000000 do puts i; end"" failed
DETAIL: child process exited with exit code 1
I saw the same failure. Ruby handles SIGPIPE and converts it to
exit(1). It cannot be handled by just ignoring SIGPIPE.
Ruby seems to be a friend of my second patch:p
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Fri, 09 Nov 2018 18:27:09 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5BE552ED.4040304@lab.ntt.co.jp>
Ok, I can live with that with no problem.
OK
...
I think Thomas is just saying that reading more lines can expose more
problems. According to the current discussion, we should error
out if we got SEGV even with limit 1.
Ah, I misread that. Sorry for the noise.
That being said, the ruby case may suggest that we should be more
tolerant in the crash-after-limit case.
In my understanding, processes not connected to a
terminal (tty/pts) cannot receive TTIN/TTOU (unless someone sent
them artificially). Since child processes are detached by setsid()
(on Linux), programs called in that way also won't have a
controlling terminal at start time, and I suppose they have no
means to connect to one since they are no longer in the same
session as the postmaster.
For TTIN and TTOU, we would first need to make clear the reason for
the inconsistency Thomas pointed out. I'm wondering if we should
leave the TTIN/TTOU stuff for future work.
Inconsistency? I read Thomas's messages as "TTIN/TTOU are not
relevant to our business, and we don't have a reason to restore them
before calling external programs other than to paper over a
seeming inconsistency." I agree with that analysis, and I agree
with you on the point that it doesn't need consideration just now.
If the consistency means the different behaviors between perl and
ruby, then (as written in another message) I'm inclined to vote for
having a bit more tolerance for errors of external programs, as in my
second patch.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
(2018/11/12 18:52), Kyotaro HORIGUCHI wrote:
At Fri, 09 Nov 2018 18:27:09 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote in<5BE552ED.4040304@lab.ntt.co.jp>
Ok, I can live with that with no problem.
OK
...
I think Thomas is just saying that reading more lines can expose more
problems. According to the current discussion, we should error
out if we got SEGV even with limit 1.
Ah, I misread that. Sorry for the noise.
That being said, the ruby case may suggest that we should be more
tolerant in the crash-after-limit case.
The Ruby case would be sad, but I'm not sure we can handle such a case
safely in general, because core and file_fdw don't have enough knowledge
about whether a non-zero exit code returned from pclose is OK or not,
which would actually depend on the called program. One approach for
that would be to add an option to file_fdw telling it to ignore such
exit codes from the called program, which would be somewhat similar to
what Thomas proposed [1].
In my understanding, processes not connected to a
terminal (tty/pts) cannot receive TTIN/TTOU (unless someone sent
them artificially). Since child processes are detached by setsid()
(on Linux), programs called in that way also won't have a
controlling terminal at start time, and I suppose they have no
means to connect to one since they are no longer in the same
session as the postmaster.
For TTIN and TTOU, we would first need to make clear the reason for
the inconsistency Thomas pointed out. I'm wondering if we should
leave the TTIN/TTOU stuff for future work.
Inconsistency?
By "the inconsistency" I mean his words:
Why do bgwriter.c, startup.c, ... set SIGTTIN and SIGTTOU back to
SIG_DFL, but not regular backends?
I read Thomas's messages as "TTIN/TTOU are not
relevant to our business, and we don't have a reason to restore them
before calling external programs other than to paper over a
seeming inconsistency." I agree with that analysis, and I agree
with you on the point that it doesn't need consideration just now.
OK
If the consistency means the different behaviors between perl and
ruby, then (as written in another message) I'm inclined to vote for
having a bit more tolerance for errors of external programs, as in my
second patch.
Maybe my explanation was not clear enough, but I don't mean that.
Thanks!
Best regards,
Etsuro Fujita
[1]: /messages/by-id/CAEepm=0fBPiRkSiJ3v4ynm+aP-A-dhuHjTFBAxwo59EkE2-E5Q@mail.gmail.com
(2018/11/12 20:41), Etsuro Fujita wrote:
(2018/11/12 18:52), Kyotaro HORIGUCHI wrote:
I read Thomas's messages as "TTIN/TTOU are not
relevant to our business, and we don't have a reason to restore them
before calling external programs other than to paper over a
seeming inconsistency." I agree with that analysis, and I agree
with you on the point that it doesn't need consideration just now.
OK
Attached is an updated version of Tom's POC patch. Here are the changes:
* I modified his patch so that the called program inherits SIG_DFL for
SIGUSR2 as well, as discussed upthread.
* I think it's better to ignore the SIGPIPE failure in
ClosePipeToProgram if we were in a COPY FROM PROGRAM that was allowed to
terminate early and keep the behavior as-is otherwise. If we ignore
that failure unconditionally in that function, eg, COPY TO PROGRAM would
fail to get a (better) error message in CopySendEndOfRow or EndCopy when
the invoked program was terminated on SIGPIPE, as discussed before [1].
And to do so, I added a new argument to BeginCopyFrom to specify
whether COPY FROM PROGRAM can terminate early or not.

[1]: /messages/by-id/5BE18409.2070004@lab.ntt.co.jp
Best regards,
Etsuro Fujita