On the stability of TAP tests for LDAP

Started by Michael Paquierover 6 years ago10 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,
(Peter Eisentraut in CC)

crake has just complained about a failure with the LDAP test suite:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2019-07-19%2001%3A33%3A31

# Running: /usr/sbin/slapd -f
/home/bf/bfr/root/HEAD/pgsql.build/src/test/ldap/tmp_check/slapd.conf
-h ldap://localhost:55306 ldaps://localhost:55307
# loading LDAP data
# Running: ldapadd -x -y
/home/bf/bfr/root/HEAD/pgsql.build/src/test/ldap/tmp_check/ldappassword
-f authdata.ldif
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

And FWIW, when running a parallel check-world to make my laptop busy
enough, it is rather usual to face failures with this test, which is
annoying. Shouldn't we have at least a number of retries with
intermediate sleeps for the commands run in 001_auth.pl? As far as I
recall, I think that we can run into failures when calling ldapadd and
ldappasswd.

Thanks,
--
Michael

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#1)
Re: On the stability of TAP tests for LDAP

On Fri, Jul 19, 2019 at 3:30 PM Michael Paquier <michael@paquier.xyz> wrote:

# Running: /usr/sbin/slapd -f
/home/bf/bfr/root/HEAD/pgsql.build/src/test/ldap/tmp_check/slapd.conf
-h ldap://localhost:55306 ldaps://localhost:55307
# loading LDAP data
# Running: ldapadd -x -y
/home/bf/bfr/root/HEAD/pgsql.build/src/test/ldap/tmp_check/ldappassword
-f authdata.ldif
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

And FWIW, when running a parallel check-world to make my laptop busy
enough, it is rather usual to face failures with this test, which is
annoying. Shouldn't we have at least a number of retries with
intermediate sleeps for the commands run in 001_auth.pl? As far as I
recall, I think that we can run into failures when calling ldapadd and
ldappasswd.

Yeah, it seems we need to figure out a way to wait for it to be ready
to accept connections. I wondered how other people do this, and found
one example that polls for the .pid file:

https://github.com/tiredofit/docker-openldap/blob/master/install/etc/cont-init.d/10-openldap#L347

That looks nice and tidy but I'm not sure it can be trusted, given
that OpenLDAP's own tests poll a trivial ldapsearch (also based on a
cursory glance at slapd/main.c which appears to write the .pid file a
bit too early, though I may have misread):

https://github.com/openldap/openldap/blob/master/tests/scripts/test039-glue-ldap-concurrency#L59

I guess we should do that too. I don't know how to write Perl but I'll try...

--
Thomas Munro
https://enterprisedb.com

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#2)
Re: On the stability of TAP tests for LDAP

On Wed, Jul 24, 2019 at 3:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I guess we should do that too. I don't know how to write Perl but I'll try...

Does this look about right?

--
Thomas Munro
https://enterprisedb.com

Attachments:

wait-for-slapd.patchapplication/octet-stream; name=wait-for-slapd.patchDownload+9-0
#4Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#3)
Re: On the stability of TAP tests for LDAP

On Wed, Jul 24, 2019 at 04:41:05PM +1200, Thomas Munro wrote:

On Wed, Jul 24, 2019 at 3:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I guess we should do that too. I don't know how to write Perl but I'll try...

Does this look about right?

Some comments from here. I have not tested the patch.

I would recommend using TestLib::system_log instead of plain system().
The command should be a list of arguments with one element per
argument (see call of system_log in PostgresNode.pm for example). The
indentation is incorrect, and that I would make the retry longer as I
got the feeling that on slow machines we could still have issues. We
also usually tend to increase the timeout up to 5 minutes, and the
sleep phases make use of Time::HiRes::usleep.
--
Michael

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#4)
Re: On the stability of TAP tests for LDAP

On Wed, Jul 24, 2019 at 5:26 PM Michael Paquier <michael@paquier.xyz> wrote:

Does this look about right?

Some comments from here. I have not tested the patch.

I would recommend using TestLib::system_log instead of plain system().
The command should be a list of arguments with one element per
argument (see call of system_log in PostgresNode.pm for example). The
indentation is incorrect, and that I would make the retry longer as I
got the feeling that on slow machines we could still have issues. We
also usually tend to increase the timeout up to 5 minutes, and the
sleep phases make use of Time::HiRes::usleep.

Thanks, here's v2.

--
Thomas Munro
https://enterprisedb.com

Attachments:

wait-for-slapd-v2.patchapplication/octet-stream; name=wait-for-slapd-v2.patchDownload+11-0
#6Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#5)
Re: On the stability of TAP tests for LDAP

On Wed, Jul 24, 2019 at 05:47:13PM +1200, Thomas Munro wrote:

Thanks, here's v2.

Perhaps this worked on freebsd? Now that I test it, the test gets
stuck on my Debian box:
# waiting for slapd to accept requests...
# Running: ldapsearch -h localhost -p 49534 -s base -b
dc=example,dc=net -n 'objectclass=*'
SASL/DIGEST-MD5 authentication started
Please enter your password:
ldap_sasl_interactive_bind_s: Invalid credentials (49)
additional info: SASL(-13): user not found: no secret in
database

pgperltidy complains about the patch indentation using perltidy
v20170521 (version mentioned in tools/pgindent/README).
--
Michael

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#6)
Re: On the stability of TAP tests for LDAP

On Wed, Jul 24, 2019 at 7:50 PM Michael Paquier <michael@paquier.xyz> wrote:

Perhaps this worked on freebsd? Now that I test it, the test gets
stuck on my Debian box:
# waiting for slapd to accept requests...
# Running: ldapsearch -h localhost -p 49534 -s base -b
dc=example,dc=net -n 'objectclass=*'
SASL/DIGEST-MD5 authentication started
Please enter your password:
ldap_sasl_interactive_bind_s: Invalid credentials (49)
additional info: SASL(-13): user not found: no secret in
database

Huh, yeah, I don't know why slapd requires credentials on Debian, when
the version that ships with FreeBSD is OK with an anonymous
connection. Rather than worrying about that, I just adjusted it to
supply the credentials. It works on both for me.

pgperltidy complains about the patch indentation using perltidy
v20170521 (version mentioned in tools/pgindent/README).

Fixed.

--
Thomas Munro
https://enterprisedb.com

Attachments:

wait-for-slapd-v3.patchapplication/octet-stream; name=wait-for-slapd-v3.patchDownload+16-0
#8Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#7)
Re: On the stability of TAP tests for LDAP

On Wed, Jul 24, 2019 at 09:01:47PM +1200, Thomas Munro wrote:

Huh, yeah, I don't know why slapd requires credentials on Debian, when
the version that ships with FreeBSD is OK with an anonymous
connection. Rather than worrying about that, I just adjusted it to
supply the credentials. It works on both for me.

Thanks for the updated patch, this looks good. I have done a series
of tests keeping my laptop busy and I haven't seen a failure where I
usually see problems 10%~20% of the time. So that seems to help,
thanks!
--
Michael

#9Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#8)
Re: On the stability of TAP tests for LDAP

On Thu, Jul 25, 2019 at 12:51 PM Michael Paquier <michael@paquier.xyz> wrote:

Thanks for the updated patch, this looks good. I have done a series
of tests keeping my laptop busy and I haven't seen a failure where I
usually see problems 10%~20% of the time. So that seems to help,
thanks!

Pushed, thanks.

--
Thomas Munro
https://enterprisedb.com

#10Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#9)
Re: On the stability of TAP tests for LDAP

On Fri, Jul 26, 2019 at 10:18:48AM +1200, Thomas Munro wrote:

Pushed, thanks.

Thanks for fixing! I'll update this thread if there are still some
problems.
--
Michael