Fixing cache pollution in the Kerberos test suite

Started by Jacob Championabout 5 years ago10 messageshackers
Jump to latest
#1Jacob Champion
jacob.champion@enterprisedb.com

Hi all,

I was running tests with a GSS-enabled stack, and ran into some very
long psql timeouts after running the Kerberos test suite. It turns out
the suite pushes test credentials into the user's global cache, and
these no-longer-useful credentials persist after the suite has
finished. (You can see this in action by running the test/kerberos
suite and then running `klist`.) This leads to long hangs, I assume
while the GSS implementation tries to contact a KDC that no longer
exists.
Attached is a patch that initializes a local credentials cache inside
tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
This prevents the global cache pollution. WDYT?

--Jacob

Attachments:

test-kerberos-use-a-local-credentials-cache.patchtext/x-patch; name=test-kerberos-use-a-local-credentials-cache.patchDownload+11-2
#2Stephen Frost
sfrost@snowman.net
In reply to: Jacob Champion (#1)
Re: Fixing cache pollution in the Kerberos test suite

Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:

I was running tests with a GSS-enabled stack, and ran into some very
long psql timeouts after running the Kerberos test suite. It turns out
the suite pushes test credentials into the user's global cache, and
these no-longer-useful credentials persist after the suite has
finished. (You can see this in action by running the test/kerberos
suite and then running `klist`.) This leads to long hangs, I assume
while the GSS implementation tries to contact a KDC that no longer
exists.
Attached is a patch that initializes a local credentials cache inside
tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
This prevents the global cache pollution. WDYT?

Ah, yeah, that generally seems like a good idea.

Thanks,

Stephen

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#2)
Re: Fixing cache pollution in the Kerberos test suite

Stephen Frost <sfrost@snowman.net> writes:

* Jacob Champion (pchampion@vmware.com) wrote:

I was running tests with a GSS-enabled stack, and ran into some very
long psql timeouts after running the Kerberos test suite. It turns out
the suite pushes test credentials into the user's global cache, and
these no-longer-useful credentials persist after the suite has
finished. (You can see this in action by running the test/kerberos
suite and then running `klist`.) This leads to long hangs, I assume
while the GSS implementation tries to contact a KDC that no longer
exists.
Attached is a patch that initializes a local credentials cache inside
tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
This prevents the global cache pollution. WDYT?

Ah, yeah, that generally seems like a good idea.

Yeah, changing global state is just awful. However, I don't
actually see any change here (RHEL8):

$ klist
klist: Credentials cache 'KCM:1001' not found
$ make check
...
Result: PASS
$ klist
klist: Credentials cache 'KCM:1001' not found

I suppose in an environment where someone was really using Kerberos,
the random kinit would be more of a problem.

Also, why are you only setting the ENV variable within narrow parts
of the test script? I'd be inclined to enforce it throughout.

regards, tom lane

#4Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Tom Lane (#3)
Re: Fixing cache pollution in the Kerberos test suite

On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:

Yeah, changing global state is just awful. However, I don't
actually see any change here (RHEL8):

Interesting. I'm running Ubuntu 20.04:

$ klist
klist: No credentials cache found (filename: /tmp/krb5cc_1000)

$ make check
...

$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: test1@EXAMPLE.COM

Valid starting Expires Service principal
... krbtgt/EXAMPLE.COM@EXAMPLE.COM
... postgres/auth-test-localhost.postgresql.example.com@
... postgres/auth-test-localhost.postgresql.example.com@EXAMPLE.COM

I wonder if your use of a KCM cache type rather than FILE makes the
difference?

Also, why are you only setting the ENV variable within narrow parts
of the test script? I'd be inclined to enforce it throughout.

I considered it and decided I didn't want to pollute the server's
environment with it, since the server shouldn't need the client cache.
But I think it'd be fine (and match the current situation) if it were
set once for the whole script, if you prefer.

--Jacob

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#4)
Re: Fixing cache pollution in the Kerberos test suite

Jacob Champion <pchampion@vmware.com> writes:

On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:

Yeah, changing global state is just awful. However, I don't
actually see any change here (RHEL8):

Interesting. I'm running Ubuntu 20.04:

Hmm. I'll poke harder.

Also, why are you only setting the ENV variable within narrow parts
of the test script? I'd be inclined to enforce it throughout.

I considered it and decided I didn't want to pollute the server's
environment with it, since the server shouldn't need the client cache.

True, but if it did try to access the cache, accessing the user's
normal cache would be strictly worse than accessing the test cache.

regards, tom lane

#6Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Tom Lane (#5)
Re: Fixing cache pollution in the Kerberos test suite

On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:

Jacob Champion <pchampion@vmware.com> writes:

On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:

Also, why are you only setting the ENV variable within narrow parts
of the test script? I'd be inclined to enforce it throughout.

I considered it and decided I didn't want to pollute the server's
environment with it, since the server shouldn't need the client cache.

True, but if it did try to access the cache, accessing the user's
normal cache would be strictly worse than accessing the test cache.

That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
for a much smaller patch :)

--Jacob

Attachments:

v2-test-kerberos-use-a-local-credentials-cache.patchtext/x-patch; name=v2-test-kerberos-use-a-local-credentials-cache.patchDownload+4-1
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: Fixing cache pollution in the Kerberos test suite

I wrote:

Jacob Champion <pchampion@vmware.com> writes:

Interesting. I'm running Ubuntu 20.04:

Hmm. I'll poke harder.

Ah ... on both RHEL8 and Fedora 33, I find this:

--- snip ---
$ cat /etc/krb5.conf.d/kcm_default_ccache
# This file should normally be installed by your distribution into a
# directory that is included from the Kerberos configuration file (/etc/krb5.conf)
# On Fedora/RHEL/CentOS, this is /etc/krb5.conf.d/
#
# To enable the KCM credential cache enable the KCM socket and the service:
#   systemctl enable sssd-secrets.socket sssd-kcm.socket
#   systemctl start sssd-kcm.socket
#
# To disable the KCM credential cache, comment out the following lines.
[libdefaults]
    default_ccache_name = KCM:
--- snip ---

Even more interesting, that service seems to be enabled by default
(I'm pretty darn sure I didn't ask for it...)

However, this doesn't seem to explain why the test script isn't
causing a global state change. Whether the state is held in a
file or the sssd daemon shouldn't matter, it seems like.

Also, it looks like the test causes /tmp/krb5cc_<uid> to get
created or updated despite this setting. If I force klist
to look at that:

$ KRB5CCNAME=/tmp/krb5cc_1001 klist
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: test1@EXAMPLE.COM

Valid starting Expires Service principal
01/25/21 14:31:57 01/26/21 14:31:57 krbtgt/EXAMPLE.COM@EXAMPLE.COM
01/25/21 14:31:57 01/26/21 14:31:57 postgres/auth-test-localhost.postgresql.example.com@
Ticket server: postgres/auth-test-localhost.postgresql.example.com@EXAMPLE.COM

where the time corresponds to my having just run the test again.

So I'm still mightily confused, but it is clear that the test's
kinit is touching a file it shouldn't.

regards, tom lane

#8Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Tom Lane (#7)
Re: Fixing cache pollution in the Kerberos test suite

On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:

However, this doesn't seem to explain why the test script isn't
causing a global state change. Whether the state is held in a
file or the sssd daemon shouldn't matter, it seems like.

Also, it looks like the test causes /tmp/krb5cc_<uid> to get
created or updated despite this setting.

Huh. I wonder, if you run `klist -A` after running the tests, do you
get anything more interesting? I am seeing a few bugs on Red Hat's
Bugzilla that center around strange KCM behavior [1]https://bugzilla.redhat.com/show_bug.cgi?id=1712875. But we're now
well outside my area of competence.

--Jacob

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1712875

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#6)
Re: Fixing cache pollution in the Kerberos test suite

Jacob Champion <pchampion@vmware.com> writes:

On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:

True, but if it did try to access the cache, accessing the user's
normal cache would be strictly worse than accessing the test cache.

That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
for a much smaller patch :)

I tweaked this to make it look a bit more like the rest of the script,
and pushed it. Thanks!

regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#8)
Re: Fixing cache pollution in the Kerberos test suite

Jacob Champion <pchampion@vmware.com> writes:

On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:

Also, it looks like the test causes /tmp/krb5cc_<uid> to get
created or updated despite this setting.

Huh. I wonder, if you run `klist -A` after running the tests, do you
get anything more interesting?

"klist -A" prints nothing.

I am seeing a few bugs on Red Hat's
Bugzilla that center around strange KCM behavior [1]. But we're now
well outside my area of competence.

Mine too. But I verified that the /tmp file is no longer modified
with the adjusted script, so one way or the other this is better.

regards, tom lane