LDAP check flapping on crake due to race
Hi,
There are one or two failures per month on crake. It looks like when
authentication is rejected, as expected in the tests, the psql process
is exiting, but there is a race where the Perl script still wants to
write a dummy query to its stdin (?), so you get:
psql: FATAL: LDAP authentication failed for user "test1"
ack Broken pipe: write( 13, 'SELECT 1' ) at
/usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.
Example:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2019-11-10%2023%3A36%3A04
tmunro=> select animal, snapshot, branch from run where fail_stage =
'ldapCheck' order by snapshot desc;
animal | snapshot | branch
--------+---------------------+---------------
crake | 2020-08-02 02:32:30 | REL_13_STABLE
crake | 2020-07-22 23:36:04 | REL_12_STABLE
crake | 2020-07-14 00:52:04 | REL_13_STABLE
crake | 2020-05-15 17:35:05 | REL_11_STABLE
crake | 2020-04-07 20:51:03 | REL_12_STABLE
mantid | 2020-03-04 18:17:58 | REL_12_STABLE
mantid | 2020-03-04 17:59:58 | REL_11_STABLE
crake | 2020-01-17 14:33:21 | REL_12_STABLE
crake | 2019-11-10 23:36:04 | REL_11_STABLE
crake | 2019-09-09 08:48:25 | HEAD
crake | 2019-08-05 21:18:23 | REL_12_STABLE
crake | 2019-07-19 01:33:31 | HEAD
crake | 2019-07-16 01:06:02 | REL_11_STABLE
(13 rows)
(Ignore mantid, it had a temporary setup problem that was resolved.)
On Sun, Aug 02, 2020 at 05:29:57PM +1200, Thomas Munro wrote:
There are one or two failures per month on crake. It looks like when
authentication is rejected, as expected in the tests, the psql process
is exiting, but there is a race where the Perl script still wants to
write a dummy query to its stdin (?), so you get:psql: FATAL: LDAP authentication failed for user "test1"
ack Broken pipe: write( 13, 'SELECT 1' ) at
/usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.
Do you suppose a fix like e12a472 would cover this? ("psql <&-" fails with
status 1 after successful authentication, and authentication failure gives
status 2.)
Noah Misch <noah@leadboat.com> writes:
On Sun, Aug 02, 2020 at 05:29:57PM +1200, Thomas Munro wrote:
There are one or two failures per month on crake. It looks like when
authentication is rejected, as expected in the tests, the psql process
is exiting, but there is a race where the Perl script still wants to
write a dummy query to its stdin (?), so you get:
psql: FATAL: LDAP authentication failed for user "test1"
ack Broken pipe: write( 13, 'SELECT 1' ) at
/usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.
Do you suppose a fix like e12a472 would cover this? ("psql <&-" fails with
status 1 after successful authentication, and authentication failure gives
status 2.)
AFAICT the failure is happening down inside PostgresNode::psql's call
of IPC::Run::run, so we don't really have the option to adjust things
in exactly that way. (And messing with sub psql for the benefit of
this one caller seems pretty risky anyway.)
I'm inclined to suggest that the LDAP test's test_access could use
an empty stdin and pass "-c 'SELECT 1'" as a command line option
instead. (Maybe that's exactly what you meant, but I'm not sure.)
I've not been able to duplicate this locally, so I have no idea if
that'd really fix it.
regards, tom lane
On Mon, Aug 3, 2020 at 4:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm inclined to suggest that the LDAP test's test_access could use
an empty stdin and pass "-c 'SELECT 1'" as a command line option
instead. (Maybe that's exactly what you meant, but I'm not sure.)
Good idea. Here's a patch like that.
I've not been able to duplicate this locally, so I have no idea if
that'd really fix it.
Me neither -- I guess someone who enjoys perl could hack IPC::Run to
take a short nap at the right moment.
Attachments:
0001-Fix-rare-failure-in-LDAP-tests.patchapplication/x-patch; name=0001-Fix-rare-failure-in-LDAP-tests.patchDownload
From 22220d01b65cef920c953551ce1c49d071fdf29f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 3 Aug 2020 11:34:40 +1200
Subject: [PATCH] Fix rare failure in LDAP tests.
Instead of writing a query to psql's stdin, use -c. This avoids a
failure where psql exits before we write, seen a few times on the build
farm. Thanks to Noah Misch and Tom Lane for suggesting this fix.
Back-patch to 11, where the LDAP tests arrived.
Discussion: https://postgr.es/m/CA%2BhUKGLFmW%2BHQYPeKiwSp5sdFFHtFViCpw4Mh6yAgEx74r5-Cw%40mail.gmail.com
---
src/test/ldap/t/001_auth.pl | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl
index f8941144f5..3bc7672451 100644
--- a/src/test/ldap/t/001_auth.pl
+++ b/src/test/ldap/t/001_auth.pl
@@ -165,7 +165,8 @@ sub test_access
my ($node, $role, $expected_res, $test_name) = @_;
my $res =
- $node->psql('postgres', 'SELECT 1', extra_params => [ '-U', $role ]);
+ $node->psql('postgres', undef,
+ extra_params => [ '-U', $role, '-c', 'SELECT 1' ]);
is($res, $expected_res, $test_name);
return;
}
--
2.20.1
On Mon, Aug 03, 2020 at 12:12:57PM +1200, Thomas Munro wrote:
On Mon, Aug 3, 2020 at 4:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm inclined to suggest that the LDAP test's test_access could use
an empty stdin and pass "-c 'SELECT 1'" as a command line option
instead. (Maybe that's exactly what you meant, but I'm not sure.)Good idea. Here's a patch like that.
While I had meant a different approach, this is superior.
I've not been able to duplicate this locally, so I have no idea if
that'd really fix it.Me neither -- I guess someone who enjoys perl could hack IPC::Run to
take a short nap at the right moment.
Not essential to reproduce first, I think.
On Mon, Aug 3, 2020 at 12:29 PM Noah Misch <noah@leadboat.com> wrote:
On Mon, Aug 03, 2020 at 12:12:57PM +1200, Thomas Munro wrote:
On Mon, Aug 3, 2020 at 4:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm inclined to suggest that the LDAP test's test_access could use
an empty stdin and pass "-c 'SELECT 1'" as a command line option
instead. (Maybe that's exactly what you meant, but I'm not sure.)Good idea. Here's a patch like that.
While I had meant a different approach, this is superior.
Thanks. Pushed.