More test/kerberos tweaks

Started by Jacob Championalmost 5 years ago4 messages
#1Jacob Champion
pchampion@vmware.com
2 attachment(s)

While working on [1]/messages/by-id/a9ee5e4e8e844d06c2bcf70c6ed3306ccb4897f1.camel@vmware.com I ended up running into a couple issues with the
Kerberos test suite. Attached are two patches with possible
improvements:

Some tests check for expected log messages. They currently search
through the entire log file, from the beginning, for every match. So if
two tests share the same expected log content (which is common), it's
possible for the second test to get a false positive by matching
against the first test's output. (You can see this by modifying one of
the expected-failure tests to expect the same output as a previous
happy-path test.)

The first patch stores the offset of the previous match, and searches
forward from there during the next match, resetting the offset every
time the log file changes. This isn't perfect -- it could still result
in false positives if one test spits out two or more matching log lines
and only matches the first one by itself -- but searching forward
should be an improvement over what's there now.

The second patch is more of a quality-of-life improvement for devs. On
a failed log match, the test will spin for three minutes before giving
up on the match. I think this is excessive, especially since
interrupting the test with Ctrl-C leaves behind a running KDC daemon.
The patch reduces the timeout to three seconds. I guess the only
question I have is whether there are any underpowered machines out
there running this test, relying on the higher timeout to function.

--Jacob

[1]: /messages/by-id/a9ee5e4e8e844d06c2bcf70c6ed3306ccb4897f1.camel@vmware.com
/messages/by-id/a9ee5e4e8e844d06c2bcf70c6ed3306ccb4897f1.camel@vmware.com

Attachments:

0001-test-kerberos-only-search-forward-in-logs.patchtext/x-patch; name=0001-test-kerberos-only-search-forward-in-logs.patchDownload
From 49bc25665e1a878114d2b89bdab8fd8761e7c432 Mon Sep 17 00:00:00 2001
From: Jacob Champion <pchampion@vmware.com>
Date: Wed, 3 Feb 2021 11:04:42 -0800
Subject: [PATCH 1/8] test/kerberos: only search forward in logs

The log content tests searched through the entire log file, from the
beginning, for every match. If two tests shared the same expected log
content, it was possible for the second test to get a false positive by
matching against the first test's output. (This could be seen by
modifying one of the expected-failure tests to expect the same output as
a previous happy-path test.)

Store the offset of the previous match, and search forward from there
during the next match, resetting the offset every time the log file
changes. This could still result in false positives if one test spits
out two or more matching log lines, but it should be an improvement over
what's there now.
---
 src/test/kerberos/t/001_auth.pl | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 079321bbfc..c721d24dbd 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -182,6 +182,9 @@ $node->safe_psql('postgres', 'CREATE USER test1;');
 
 note "running tests";
 
+my $current_log_name = '';
+my $current_log_position;
+
 # Test connection success or failure, and if success, that query returns true.
 sub test_access
 {
@@ -221,18 +224,37 @@ sub test_access
 		$lfname =~ s/^stderr //;
 		chomp $lfname;
 
+		if ($lfname ne $current_log_name)
+		{
+			$current_log_name = $lfname;
+
+			# Search from the beginning of this new file.
+			$current_log_position = 0;
+			note "current_log_position = $current_log_position";
+		}
+
 		# might need to retry if logging collector process is slow...
 		my $max_attempts = 180 * 10;
 		my $first_logfile;
 		for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
 		{
 			$first_logfile = slurp_file($node->data_dir . '/' . $lfname);
-			last if $first_logfile =~ m/\Q$expect_log_msg\E/;
+
+			# Don't include previously matched text in the search.
+			$first_logfile = substr $first_logfile, $current_log_position;
+			if ($first_logfile =~ m/\Q$expect_log_msg\E/g)
+			{
+				$current_log_position += pos($first_logfile);
+				last;
+			}
+
 			usleep(100_000);
 		}
 
 		like($first_logfile, qr/\Q$expect_log_msg\E/,
 			 'found expected log file content');
+
+		note "current_log_position = $current_log_position";
 	}
 
 	return;
-- 
2.25.1

0002-test-kerberos-reduce-match-timeout.patchtext/x-patch; name=0002-test-kerberos-reduce-match-timeout.patchDownload
From d3629612e6a510c55589c95d2d2e2f648b8b7ea4 Mon Sep 17 00:00:00 2001
From: Jacob Champion <pchampion@vmware.com>
Date: Wed, 3 Feb 2021 13:51:06 -0800
Subject: [PATCH 2/8] test/kerberos: reduce match timeout

Waiting three minutes for every failed match is a bit much, especially
since Ctrl-C'ing the test suite leaves behind a stray KDC daemon.
---
 src/test/kerberos/t/001_auth.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index c721d24dbd..6e5380c2cc 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -234,7 +234,7 @@ sub test_access
 		}
 
 		# might need to retry if logging collector process is slow...
-		my $max_attempts = 180 * 10;
+		my $max_attempts = 3 * 10;
 		my $first_logfile;
 		for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
 		{
-- 
2.25.1

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#1)
Re: More test/kerberos tweaks

Jacob Champion <pchampion@vmware.com> writes:

The second patch is more of a quality-of-life improvement for devs. On
a failed log match, the test will spin for three minutes before giving
up on the match. I think this is excessive, especially since
interrupting the test with Ctrl-C leaves behind a running KDC daemon.
The patch reduces the timeout to three seconds. I guess the only
question I have is whether there are any underpowered machines out
there running this test, relying on the higher timeout to function.

We have, almost invariably, regretted it when we tried to use short
timeouts in test cases.

I checked the buildfarm logs for the past month to see which machines
are running the kerberos test and what their reported stage runtimes
were. There are just three:

system min time max time

crake | 00:00:09 | 00:01:16
eelpout | 00:00:00 | 00:00:01
elver | 00:00:04 | 00:00:09

I'm not sure what's happening on crake to give it such a wide range
of runtimes on this test, but I can't help thinking it would probably
have failed a few of those runs with a three-second timeout.

More generally, sometimes people want to do things like run a test
under valgrind. So it's not just "underpowered machines" that may
need a generous timeout. Even if we did reduce the default, I'd
want a way (probably via an environment variable, cf PGCTLTIMEOUT)
to kick it back up.

On the whole, I think the right thing to be doing here is not so
much messing with the timeout as fixing the test script to be
more robust against control-C. If it's failing to shut down the
KDC, I'd say that's a test bug.

regards, tom lane

#3Jacob Champion
pchampion@vmware.com
In reply to: Tom Lane (#2)
Re: More test/kerberos tweaks

On Fri, 2021-02-05 at 15:55 -0500, Tom Lane wrote:

We have, almost invariably, regretted it when we tried to use short
timeouts in test cases.

That's what I was afraid of. I can work around it easily enough on my
local machine, so it's not really a blocker in any sense.

That just leaves the first patch, then.

On the whole, I think the right thing to be doing here is not so
much messing with the timeout as fixing the test script to be
more robust against control-C. If it's failing to shut down the
KDC, I'd say that's a test bug.

Agreed. I'm trying to limit the amount of test churn I introduce, since
I don't speak Perl very well. :D

Reworking the log checks so that they didn't need timeouts (e.g. by
stopping the server or otherwise flushing the logs, a la the
ssl_passphrase_callback tests) would be another approach. I'll jot it
down to look into later.

--Jacob

#4Jacob Champion
pchampion@vmware.com
In reply to: Jacob Champion (#3)
Re: More test/kerberos tweaks

On Fri, 2021-02-05 at 21:54 +0000, Jacob Champion wrote:

That just leaves the first patch, then.

I've moved the first patch into the commitfest entry for [1]/messages/by-id/c55788dd1773c521c862e8e0dddb367df51222be.camel@vmware.com, which
depends on it.

--Jacob

[1]: /messages/by-id/c55788dd1773c521c862e8e0dddb367df51222be.camel@vmware.com