Fixing cache pollution in the Kerberos test suite

Started by Jacob Championalmost 5 years ago10 messages
#1Jacob Champion
pchampion@vmware.com
1 attachment(s)

Hi all,

I was running tests with a GSS-enabled stack, and ran into some very
long psql timeouts after running the Kerberos test suite. It turns out
the suite pushes test credentials into the user's global cache, and
these no-longer-useful credentials persist after the suite has
finished. (You can see this in action by running the test/kerberos
suite and then running `klist`.) This leads to long hangs, I assume
while the GSS implementation tries to contact a KDC that no longer
exists.
Attached is a patch that initializes a local credentials cache inside
tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
This prevents the global cache pollution. WDYT?

--Jacob

Attachments:

test-kerberos-use-a-local-credentials-cache.patchtext/x-patch; name=test-kerberos-use-a-local-credentials-cache.patchDownload
From c9532e72a762abeef0996ad8df5da9bbdedbccad Mon Sep 17 00:00:00 2001
From: Jacob Champion <pchampion@vmware.com>
Date: Mon, 25 Jan 2021 09:32:44 -0800
Subject: [PATCH] test/kerberos: use a local credentials cache

Previously, the Kerberos test suite pushed credentials into the user's
default credentials cache. This modified any credentials the user
already had, and could cause other psql invocations to misbehave later,
as the GSS implementation attempted to use the globally cached test
credentials.

Use a local credentials cache at tmp_check/krb5cc instead. Clients can
be directed to use this cache via the KRB5CCNAME environment variable.
---
 src/test/kerberos/t/001_auth.pl | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 8625059149..d329cb06e3 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -75,6 +75,10 @@ my $kdc_datadir = "${TestLib::tmp_check}/krb5kdc";
 my $kdc_pidfile = "${TestLib::tmp_check}/krb5kdc.pid";
 my $keytab      = "${TestLib::tmp_check}/krb5.keytab";
 
+# Avoid polluting the global credentials cache by creating our own and pointing
+# psql to it later using KRB5CCNAME.
+my $krb5_cache  = "${TestLib::tmp_check}/krb5cc";
+
 my $dbname = 'postgres';
 my $username = 'test1';
 my $application = '001_auth.pl';
@@ -185,6 +189,9 @@ sub test_access
 {
 	my ($node, $role, $query, $expected_res, $gssencmode, $test_name, $expect_log_msg) = @_;
 
+	# Use our local credentials cache.
+	local $ENV{KRB5CCNAME} = $krb5_cache;
+
 	# need to connect over TCP/IP for Kerberos
 	my ($res, $stdoutres, $stderrres) = $node->psql(
 		'postgres',
@@ -241,6 +248,9 @@ sub test_query
 {
 	my ($node, $role, $query, $expected, $gssencmode, $test_name) = @_;
 
+	# Use our local credentials cache.
+	local $ENV{KRB5CCNAME} = $krb5_cache;
+
 	# need to connect over TCP/IP for Kerberos
 	my ($res, $stdoutres, $stderrres) = $node->psql(
 		'postgres',
@@ -270,7 +280,7 @@ test_access($node, 'test1', 'SELECT true', 2, '',
 	'does not log unauthenticated principal',
 	"(test1:[unknown]) FATAL:  GSSAPI authentication failed");
 
-run_log [ $kinit, 'test1' ], \$test1_password or BAIL_OUT($?);
+run_log [ $kinit, '-c', $krb5_cache, 'test1' ], \$test1_password or BAIL_OUT($?);
 
 test_access($node, 'test1', 'SELECT true', 2, '', 'fails without mapping', '');
 
-- 
2.25.1

#2Stephen Frost
sfrost@snowman.net
In reply to: Jacob Champion (#1)
Re: Fixing cache pollution in the Kerberos test suite

Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:

I was running tests with a GSS-enabled stack, and ran into some very
long psql timeouts after running the Kerberos test suite. It turns out
the suite pushes test credentials into the user's global cache, and
these no-longer-useful credentials persist after the suite has
finished. (You can see this in action by running the test/kerberos
suite and then running `klist`.) This leads to long hangs, I assume
while the GSS implementation tries to contact a KDC that no longer
exists.
Attached is a patch that initializes a local credentials cache inside
tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
This prevents the global cache pollution. WDYT?

Ah, yeah, that generally seems like a good idea.

Thanks,

Stephen

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#2)
Re: Fixing cache pollution in the Kerberos test suite

Stephen Frost <sfrost@snowman.net> writes:

* Jacob Champion (pchampion@vmware.com) wrote:

I was running tests with a GSS-enabled stack, and ran into some very
long psql timeouts after running the Kerberos test suite. It turns out
the suite pushes test credentials into the user's global cache, and
these no-longer-useful credentials persist after the suite has
finished. (You can see this in action by running the test/kerberos
suite and then running `klist`.) This leads to long hangs, I assume
while the GSS implementation tries to contact a KDC that no longer
exists.
Attached is a patch that initializes a local credentials cache inside
tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
This prevents the global cache pollution. WDYT?

Ah, yeah, that generally seems like a good idea.

Yeah, changing global state is just awful. However, I don't
actually see any change here (RHEL8):

$ klist
klist: Credentials cache 'KCM:1001' not found
$ make check
...
Result: PASS
$ klist
klist: Credentials cache 'KCM:1001' not found

I suppose in an environment where someone was really using Kerberos,
the random kinit would be more of a problem.

Also, why are you only setting the ENV variable within narrow parts
of the test script? I'd be inclined to enforce it throughout.

regards, tom lane

#4Jacob Champion
pchampion@vmware.com
In reply to: Tom Lane (#3)
Re: Fixing cache pollution in the Kerberos test suite

On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:

Yeah, changing global state is just awful. However, I don't
actually see any change here (RHEL8):

Interesting. I'm running Ubuntu 20.04:

$ klist
klist: No credentials cache found (filename: /tmp/krb5cc_1000)

$ make check
...

$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: test1@EXAMPLE.COM

Valid starting Expires Service principal
... krbtgt/EXAMPLE.COM@EXAMPLE.COM
... postgres/auth-test-localhost.postgresql.example.com@
... postgres/auth-test-localhost.postgresql.example.com@EXAMPLE.COM

I wonder if your use of a KCM cache type rather than FILE makes the
difference?

Also, why are you only setting the ENV variable within narrow parts
of the test script? I'd be inclined to enforce it throughout.

I considered it and decided I didn't want to pollute the server's
environment with it, since the server shouldn't need the client cache.
But I think it'd be fine (and match the current situation) if it were
set once for the whole script, if you prefer.

--Jacob

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#4)
Re: Fixing cache pollution in the Kerberos test suite

Jacob Champion <pchampion@vmware.com> writes:

On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:

Yeah, changing global state is just awful. However, I don't
actually see any change here (RHEL8):

Interesting. I'm running Ubuntu 20.04:

Hmm. I'll poke harder.

Also, why are you only setting the ENV variable within narrow parts
of the test script? I'd be inclined to enforce it throughout.

I considered it and decided I didn't want to pollute the server's
environment with it, since the server shouldn't need the client cache.

True, but if it did try to access the cache, accessing the user's
normal cache would be strictly worse than accessing the test cache.

regards, tom lane

#6Jacob Champion
pchampion@vmware.com
In reply to: Tom Lane (#5)
1 attachment(s)
Re: Fixing cache pollution in the Kerberos test suite

On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:

Jacob Champion <pchampion@vmware.com> writes:

On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:

Also, why are you only setting the ENV variable within narrow parts
of the test script? I'd be inclined to enforce it throughout.

I considered it and decided I didn't want to pollute the server's
environment with it, since the server shouldn't need the client cache.

True, but if it did try to access the cache, accessing the user's
normal cache would be strictly worse than accessing the test cache.

That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
for a much smaller patch :)

--Jacob

Attachments:

v2-test-kerberos-use-a-local-credentials-cache.patchtext/x-patch; name=v2-test-kerberos-use-a-local-credentials-cache.patchDownload
From 86a7331868e6155488e568864c099caf1f21dffb Mon Sep 17 00:00:00 2001
From: Jacob Champion <pchampion@vmware.com>
Date: Mon, 25 Jan 2021 09:32:44 -0800
Subject: [PATCH] test/kerberos: use a local credentials cache

Previously, the Kerberos test suite pushed credentials into the user's
default credentials cache. This modified any credentials the user
already had, and could cause other psql invocations to misbehave later,
as the GSS implementation attempted to use the globally cached test
credentials.

Use a local credentials cache at tmp_check/krb5cc instead. Clients can
be directed to use this cache via the KRB5CCNAME environment variable.
---
 src/test/kerberos/t/001_auth.pl | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 8625059149..044e58018f 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -79,6 +79,10 @@ my $dbname = 'postgres';
 my $username = 'test1';
 my $application = '001_auth.pl';
 
+# Avoid polluting the global credentials cache by creating our own and pointing
+# the clients to it. kinit and psql will use this implicitly.
+$ENV{KRB5CCNAME} = "${TestLib::tmp_check}/krb5cc";
+
 note "setting up Kerberos";
 
 my ($stdout, $krb5_version);
-- 
2.25.1

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: Fixing cache pollution in the Kerberos test suite

I wrote:

Jacob Champion <pchampion@vmware.com> writes:

Interesting. I'm running Ubuntu 20.04:

Hmm. I'll poke harder.

Ah ... on both RHEL8 and Fedora 33, I find this:

--- snip ---
$ cat /etc/krb5.conf.d/kcm_default_ccache
# This file should normally be installed by your distribution into a
# directory that is included from the Kerberos configuration file (/etc/krb5.conf)
# On Fedora/RHEL/CentOS, this is /etc/krb5.conf.d/
#
# To enable the KCM credential cache enable the KCM socket and the service:
#   systemctl enable sssd-secrets.socket sssd-kcm.socket
#   systemctl start sssd-kcm.socket
#
# To disable the KCM credential cache, comment out the following lines.
[libdefaults]
    default_ccache_name = KCM:
--- snip ---

Even more interesting, that service seems to be enabled by default
(I'm pretty darn sure I didn't ask for it...)

However, this doesn't seem to explain why the test script isn't
causing a global state change. Whether the state is held in a
file or the sssd daemon shouldn't matter, it seems like.

Also, it looks like the test causes /tmp/krb5cc_<uid> to get
created or updated despite this setting. If I force klist
to look at that:

$ KRB5CCNAME=/tmp/krb5cc_1001 klist
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: test1@EXAMPLE.COM

Valid starting Expires Service principal
01/25/21 14:31:57 01/26/21 14:31:57 krbtgt/EXAMPLE.COM@EXAMPLE.COM
01/25/21 14:31:57 01/26/21 14:31:57 postgres/auth-test-localhost.postgresql.example.com@
Ticket server: postgres/auth-test-localhost.postgresql.example.com@EXAMPLE.COM

where the time corresponds to my having just run the test again.

So I'm still mightily confused, but it is clear that the test's
kinit is touching a file it shouldn't.

regards, tom lane

#8Jacob Champion
pchampion@vmware.com
In reply to: Tom Lane (#7)
Re: Fixing cache pollution in the Kerberos test suite

On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:

However, this doesn't seem to explain why the test script isn't
causing a global state change. Whether the state is held in a
file or the sssd daemon shouldn't matter, it seems like.

Also, it looks like the test causes /tmp/krb5cc_<uid> to get
created or updated despite this setting.

Huh. I wonder, if you run `klist -A` after running the tests, do you
get anything more interesting? I am seeing a few bugs on Red Hat's
Bugzilla that center around strange KCM behavior [1]https://bugzilla.redhat.com/show_bug.cgi?id=1712875. But we're now
well outside my area of competence.

--Jacob

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1712875

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#6)
Re: Fixing cache pollution in the Kerberos test suite

Jacob Champion <pchampion@vmware.com> writes:

On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:

True, but if it did try to access the cache, accessing the user's
normal cache would be strictly worse than accessing the test cache.

That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
for a much smaller patch :)

I tweaked this to make it look a bit more like the rest of the script,
and pushed it. Thanks!

regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#8)
Re: Fixing cache pollution in the Kerberos test suite

Jacob Champion <pchampion@vmware.com> writes:

On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:

Also, it looks like the test causes /tmp/krb5cc_<uid> to get
created or updated despite this setting.

Huh. I wonder, if you run `klist -A` after running the tests, do you
get anything more interesting?

"klist -A" prints nothing.

I am seeing a few bugs on Red Hat's
Bugzilla that center around strange KCM behavior [1]. But we're now
well outside my area of competence.

Mine too. But I verified that the /tmp file is no longer modified
with the adjusted script, so one way or the other this is better.

regards, tom lane