More test/kerberos tweaks

Started by Jacob Championabout 5 years ago4 messageshackers
Jump to latest
#1Jacob Champion
jacob.champion@enterprisedb.com

While working on [1]/messages/by-id/a9ee5e4e8e844d06c2bcf70c6ed3306ccb4897f1.camel@vmware.com I ended up running into a couple issues with the
Kerberos test suite. Attached are two patches with possible
improvements:

Some tests check for expected log messages. They currently search
through the entire log file, from the beginning, for every match. So if
two tests share the same expected log content (which is common), it's
possible for the second test to get a false positive by matching
against the first test's output. (You can see this by modifying one of
the expected-failure tests to expect the same output as a previous
happy-path test.)

The first patch stores the offset of the previous match, and searches
forward from there during the next match, resetting the offset every
time the log file changes. This isn't perfect -- it could still result
in false positives if one test spits out two or more matching log lines
and only matches the first one by itself -- but searching forward
should be an improvement over what's there now.

The second patch is more of a quality-of-life improvement for devs. On
a failed log match, the test will spin for three minutes before giving
up on the match. I think this is excessive, especially since
interrupting the test with Ctrl-C leaves behind a running KDC daemon.
The patch reduces the timeout to three seconds. I guess the only
question I have is whether there are any underpowered machines out
there running this test, relying on the higher timeout to function.

--Jacob

[1]: /messages/by-id/a9ee5e4e8e844d06c2bcf70c6ed3306ccb4897f1.camel@vmware.com
/messages/by-id/a9ee5e4e8e844d06c2bcf70c6ed3306ccb4897f1.camel@vmware.com

Attachments:

0001-test-kerberos-only-search-forward-in-logs.patchtext/x-patch; name=0001-test-kerberos-only-search-forward-in-logs.patchDownload+23-2
0002-test-kerberos-reduce-match-timeout.patchtext/x-patch; name=0002-test-kerberos-reduce-match-timeout.patchDownload+1-2
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#1)
Re: More test/kerberos tweaks

Jacob Champion <pchampion@vmware.com> writes:

The second patch is more of a quality-of-life improvement for devs. On
a failed log match, the test will spin for three minutes before giving
up on the match. I think this is excessive, especially since
interrupting the test with Ctrl-C leaves behind a running KDC daemon.
The patch reduces the timeout to three seconds. I guess the only
question I have is whether there are any underpowered machines out
there running this test, relying on the higher timeout to function.

We have, almost invariably, regretted it when we tried to use short
timeouts in test cases.

I checked the buildfarm logs for the past month to see which machines
are running the kerberos test and what their reported stage runtimes
were. There are just three:

system min time max time

crake | 00:00:09 | 00:01:16
eelpout | 00:00:00 | 00:00:01
elver | 00:00:04 | 00:00:09

I'm not sure what's happening on crake to give it such a wide range
of runtimes on this test, but I can't help thinking it would probably
have failed a few of those runs with a three-second timeout.

More generally, sometimes people want to do things like run a test
under valgrind. So it's not just "underpowered machines" that may
need a generous timeout. Even if we did reduce the default, I'd
want a way (probably via an environment variable, cf PGCTLTIMEOUT)
to kick it back up.

On the whole, I think the right thing to be doing here is not so
much messing with the timeout as fixing the test script to be
more robust against control-C. If it's failing to shut down the
KDC, I'd say that's a test bug.

regards, tom lane

#3Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Tom Lane (#2)
Re: More test/kerberos tweaks

On Fri, 2021-02-05 at 15:55 -0500, Tom Lane wrote:

We have, almost invariably, regretted it when we tried to use short
timeouts in test cases.

That's what I was afraid of. I can work around it easily enough on my
local machine, so it's not really a blocker in any sense.

That just leaves the first patch, then.

On the whole, I think the right thing to be doing here is not so
much messing with the timeout as fixing the test script to be
more robust against control-C. If it's failing to shut down the
KDC, I'd say that's a test bug.

Agreed. I'm trying to limit the amount of test churn I introduce, since
I don't speak Perl very well. :D

Reworking the log checks so that they didn't need timeouts (e.g. by
stopping the server or otherwise flushing the logs, a la the
ssl_passphrase_callback tests) would be another approach. I'll jot it
down to look into later.

--Jacob

#4Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Jacob Champion (#3)
Re: More test/kerberos tweaks

On Fri, 2021-02-05 at 21:54 +0000, Jacob Champion wrote:

That just leaves the first patch, then.

I've moved the first patch into the commitfest entry for [1]/messages/by-id/c55788dd1773c521c862e8e0dddb367df51222be.camel@vmware.com, which
depends on it.

--Jacob

[1]: /messages/by-id/c55788dd1773c521c862e8e0dddb367df51222be.camel@vmware.com