SIGPIPE in TAP tests
Two buildfarm runs[1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2017-11-23%2008%3A37%3A17[2]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2017-10-30%2000%3A14%3A09 from the last 90 days have failed in
src/test/authentication, like this:
t/001_password.pl ..
Failed 3/8 subtests
t/002_saslprep.pl .. ok 1815 ms ( 0.00 usr 0.00 sys + 0.89 cusr 0.26 csys = 1.15 CPU)
Test Summary Report
-------------------
t/001_password.pl (Wstat: 13 Tests: 5 Failed: 0)
Non-zero wait status: 13
Parse errors: Bad plan. You planned 8 tests but ran 5.
Files=2, Tests=17, 3 wallclock secs ( 0.04 usr 0.00 sys + 1.67 cusr 0.50 csys = 2.21 CPU)
Result: FAIL
Compared to a good run, the other logs just end suddenly after the expected
"FATAL: password authentication failed". "Wstat: 13" means the Perl process
died to signal 13 (SIGPIPE). This test invokes psql in a way that fails
authentication, and it writes "SELECT 1" to psql's stdin. The SIGPIPE happens
if the psql process exits before that write. I can reproduce this
consistently by patching a delay into IPC::Run:
--- IPC/Run.pm~ 2017-05-12 06:46:43.000000000 -0700
+++ IPC/Run.pm 2017-12-09 00:24:06.876132628 -0800
@@ -1449,6 +1449,8 @@
sub run {
local $in_run = 1; ## Allow run()-only optimizations.
my IPC::Run $self = start(@_);
+ use Time::HiRes 'usleep';
+ usleep 100_000;
The two src/test/authentication tests then fail, but nothing else fails.
Let's ignore SIGPIPE in all TAP tests, which leaves some evidence in
regress_log_001_password:
ack Broken pipe: write( 13, 'SELECT 1' ) at /home/nm/sw/cpan/lib/perl5/IPC/Run/IO.pm line 549.
To fix the actual failures, we can cease sending "SELECT 1"; it's enough to
disconnect immediately. Patch attached.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2017-11-23%2008%3A37%3A17
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2017-10-30%2000%3A14%3A09
Attachments:
sigpipe-tap-v1.patchtext/plain; charset=us-asciiDownload
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 2d3f674..9340f2f 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -44,8 +44,7 @@ sub test_role
$status_string = 'success' if ($expected_res eq 0);
- my $res =
- $node->psql('postgres', 'SELECT 1', extra_params => [ '-U', $role ]);
+ my $res = $node->psql('postgres', undef, extra_params => [ '-U', $role ]);
is($res, $expected_res,
"authentication $status_string for method $method, role $role");
}
diff --git a/src/test/authentication/t/002_saslprep.pl b/src/test/authentication/t/002_saslprep.pl
index df9f85d..e09273e 100644
--- a/src/test/authentication/t/002_saslprep.pl
+++ b/src/test/authentication/t/002_saslprep.pl
@@ -41,8 +41,7 @@ sub test_login
$status_string = 'success' if ($expected_res eq 0);
$ENV{"PGPASSWORD"} = $password;
- my $res =
- $node->psql('postgres', 'SELECT 1', extra_params => [ '-U', $role ]);
+ my $res = $node->psql('postgres', undef, extra_params => [ '-U', $role ]);
is($res, $expected_res,
"authentication $status_string for role $role with password $password"
);
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index d1a2eb5..6019040 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -75,6 +75,10 @@ BEGIN
INIT
{
+ # Return EPIPE instead of killing the process with SIGPIPE. An affected
+ # test may still fail, but it's more likely to report useful facts.
+ $SIG{PIPE} = 'IGNORE';
+
# Determine output directories, and create them. The base path is the
# TESTDIR environment variable, which is normally set by the invoking
# Makefile.
On Sun, Dec 10, 2017 at 6:02 AM, Noah Misch <noah@leadboat.com> wrote:
Two buildfarm runs[1][2] from the last 90 days have failed in
src/test/authentication, like this:t/001_password.pl ..
Failed 3/8 subtests
t/002_saslprep.pl .. ok 1815 ms ( 0.00 usr 0.00 sys + 0.89 cusr 0.26 csys = 1.15 CPU)Test Summary Report
-------------------
t/001_password.pl (Wstat: 13 Tests: 5 Failed: 0)
Non-zero wait status: 13
Parse errors: Bad plan. You planned 8 tests but ran 5.
Files=2, Tests=17, 3 wallclock secs ( 0.04 usr 0.00 sys + 1.67 cusr 0.50 csys = 2.21 CPU)
Result: FAILCompared to a good run, the other logs just end suddenly after the expected
"FATAL: password authentication failed". "Wstat: 13" means the Perl process
died to signal 13 (SIGPIPE). This test invokes psql in a way that fails
authentication, and it writes "SELECT 1" to psql's stdin. The SIGPIPE happens
if the psql process exits before that write.
Nice investigation. An interesting coincidence is that I have looked
yesterday at an off-list reported some folks have sent me which is
basically what you have here.
+ # Return EPIPE instead of killing the process with SIGPIPE. An affected
+ # test may still fail, but it's more likely to report useful facts.
+ $SIG{PIPE} = 'IGNORE';
With this portion changed I can indeed see something which is more helpful:
psql: FATAL: password authentication failed for user "saslpreptest1_role"
ack Broken pipe: write( 11, 'SELECT 1' ) at
/Library/Perl/5.18/IPC/Run/IO.pm line 558.
If SIGPIPE is ignored then test output just stops after generating the
FATAL message. Oops.
The two src/test/authentication tests then fail, but nothing else fails.
Let's ignore SIGPIPE in all TAP tests, which leaves some evidence in
regress_log_001_password:ack Broken pipe: write( 13, 'SELECT 1' ) at /home/nm/sw/cpan/lib/perl5/IPC/Run/IO.pm line 549.
To fix the actual failures, we can cease sending "SELECT 1"; it's enough to
disconnect immediately. Patch attached.
Perhaps you could use an empty string instead? I feel a bit uneasy
about passing an undefined object to IPC::Run::run.
--
Michael
On Mon, Dec 11, 2017 at 04:19:52PM +0900, Michael Paquier wrote:
On Sun, Dec 10, 2017 at 6:02 AM, Noah Misch <noah@leadboat.com> wrote:
If SIGPIPE is ignored then test output just stops after generating the
FATAL message. Oops.
You mean "If SIGPIPE is not ignored ...", right?
To fix the actual failures, we can cease sending "SELECT 1"; it's enough to
disconnect immediately. Patch attached.Perhaps you could use an empty string instead? I feel a bit uneasy
about passing an undefined object to IPC::Run::run.
IPC::Run documents the equivalence of undef and '' in this context; search for
"close a child processes stdin" in
http://search.cpan.org/~rbs/IPC-Run-0.78/lib/IPC/Run.pm. Thus, I expect both
spellings to work reliably, and I find "undef" slightly more evocative.
On Mon, Dec 11, 2017 at 4:32 PM, Noah Misch <noah@leadboat.com> wrote:
On Mon, Dec 11, 2017 at 04:19:52PM +0900, Michael Paquier wrote:
On Sun, Dec 10, 2017 at 6:02 AM, Noah Misch <noah@leadboat.com> wrote:
If SIGPIPE is ignored then test output just stops after generating the
FATAL message. Oops.You mean "If SIGPIPE is not ignored ...", right?
Yes, sorry.
To fix the actual failures, we can cease sending "SELECT 1"; it's enough to
disconnect immediately. Patch attached.Perhaps you could use an empty string instead? I feel a bit uneasy
about passing an undefined object to IPC::Run::run.IPC::Run documents the equivalence of undef and '' in this context; search for
"close a child processes stdin" in
http://search.cpan.org/~rbs/IPC-Run-0.78/lib/IPC/Run.pm. Thus, I expect both
spellings to work reliably, and I find "undef" slightly more evocative.
Thanks, I missed this bit. No objections to use undef then.
--
Michael