RADIUS tests and improvements

Started by Thomas Munroover 3 years ago9 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hi,

Here's a draft patch to tackle a couple of TODOs in the RADIUS code in auth.c.

The first change is to replace select() with a standard latch loop
that responds to interrupts, postmaster death etc promptly. It's not
really too much of a big deal because the timeout was only 3 seconds
(hardcoded), but it's not good to have places that ignore ProcSignal,
and it's good to move code to our modern pattern for I/O multiplexing.

We know from experience that we have to crank timeouts up to be able
to run tests reliably on slow/valgrind/etc systems, so the second
change is to change the timeout to a GUC, as also requested by a
comment. One good side-effect is that it becomes easy and fast to
test the timed-out code path too, with a small value. While adding
the GUC I couldn't help wondering why RADIUS even needs a timeout
separate from authentication_timeout; another way to go here would be
to remove it completely, but that'd be a policy change (removing the 3
second timeout we always had). Thoughts?

The patch looks bigger than it really is because it changes the
indentation level.

But first, some basic tests to show that it works. We can test before
and after the change and have a non-zero level of confidence about
whacking the code around. Like existing similar tests, you need to
install an extra package (FreeRADIUS) and opt in with
PG_EXTRA_TESTS=radius. I figured out how to do that for our 3 CI
Unixen, so cfbot should run the tests and pass once I add this to the
March commitfest. FreeRADIUS claims to work on Windows too, but I
don't know how to set that up; maybe someday someone will fix that for
all the PG_EXTRA_TESTS tests. I've also seen this work on a Mac with
MacPorts. There's only one pathname in there that's a wild guess:
non-Debianoid Linux systems; if you know the answer there please LMK.

Attachments:

0001-Add-simple-test-for-RADIUS-authentication.patchtext/x-patch; charset=US-ASCII; name=0001-Add-simple-test-for-RADIUS-authentication.patchDownload+224-2
0002-ci-Enable-RADIUS-test.patchtext/x-patch; charset=US-ASCII; name=0002-ci-Enable-RADIUS-test.patchDownload+6-2
0003-Use-latch-API-to-wait-for-RADIUS-authentication.patchtext/x-patch; charset=US-ASCII; name=0003-Use-latch-API-to-wait-for-RADIUS-authentication.patchDownload+197-174
#2Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Thomas Munro (#1)
Re: RADIUS tests and improvements

On 1/3/23 04:11, Thomas Munro wrote:

Here's a draft patch to tackle a couple of TODOs in the RADIUS code in auth.c.

Nice to see someone working on this! I know of one company which could
have used the configurable timeout for radius because the 3 second
timeout is too short for 2FA. I think they ended up using PAM or some
other solution in the end, but I am not 100% sure.

[...] While adding
the GUC I couldn't help wondering why RADIUS even needs a timeout
separate from authentication_timeout; another way to go here would be
to remove it completely, but that'd be a policy change (removing the 3
second timeout we always had). Thoughts?

It was some time since I last looked at the code but my impression was
that the reason for having a separate timeout is that you can try the
next server after the first one timed out (multiple radius servers are
allowed). But I wonder if that really is a useful feature or if someone
just was too clever or it just was an accidental feature.

Andreas

#3Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Andreas Karlsson (#2)
Re: RADIUS tests and improvements

On 1/3/23 22:03, Andreas Karlsson wrote:

On 1/3/23 04:11, Thomas Munro wrote:

Here's a draft patch to tackle a couple of TODOs in the RADIUS code in
auth.c.

Nice to see someone working on this!.

Another thing: shouldn't we set some wait event to indicate that we are
waiting the RADIUS server or is that pointless during authentication
since there are no queries running anyway?

Andreas

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Andreas Karlsson (#3)
Re: RADIUS tests and improvements

On Wed, Jan 4, 2023 at 10:07 AM Andreas Karlsson <andreas@proxel.se> wrote:

Another thing: shouldn't we set some wait event to indicate that we are
waiting the RADIUS server or is that pointless during authentication
since there are no queries running anyway?

I initially added a wait_event value, but I couldn't see it
anywhere... there is no entry in pg_stat_activity for a backend that
is in that phase of authentication, so I just set it to zero.

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Andreas Karlsson (#2)
Re: RADIUS tests and improvements

On Wed, Jan 4, 2023 at 10:03 AM Andreas Karlsson <andreas@proxel.se> wrote:

On 1/3/23 04:11, Thomas Munro wrote:

[...] While adding
the GUC I couldn't help wondering why RADIUS even needs a timeout
separate from authentication_timeout; another way to go here would be
to remove it completely, but that'd be a policy change (removing the 3
second timeout we always had). Thoughts?

It was some time since I last looked at the code but my impression was
that the reason for having a separate timeout is that you can try the
next server after the first one timed out (multiple radius servers are
allowed). But I wonder if that really is a useful feature or if someone
just was too clever or it just was an accidental feature.

Ah! Thanks, now that makes sense.

#6Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Thomas Munro (#4)
Re: RADIUS tests and improvements

On 1/3/23 22:16, Thomas Munro wrote:

On Wed, Jan 4, 2023 at 10:07 AM Andreas Karlsson <andreas@proxel.se> wrote:

Another thing: shouldn't we set some wait event to indicate that we are
waiting the RADIUS server or is that pointless during authentication
since there are no queries running anyway?

I initially added a wait_event value, but I couldn't see it
anywhere... there is no entry in pg_stat_activity for a backend that
is in that phase of authentication, so I just set it to zero.

Thanks for the explanation, that makes a lot of sense!

Andreas

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Andreas Karlsson (#6)
Re: RADIUS tests and improvements

New improved version:

* fixed stupid misuse of PG_FINALLY() (oops, must have been thinking
of another language)
* realised that it was strange to have a GUC for the timeout, and made
a new HBA parameter instead
* added documentation for that
* used TimestampDifferenceMilliseconds() instead of open-coded TimestampTz maths

I don't exactly love the PG_TRY()/PG_CATCH() around the
CHECK_FOR_INTERRUPTS(). In fact this kind of CFI-with-cleanup problem
has been haunting me across several projects. For cases that memory
contexts and resource owners can't help with, I don't currently know
what else to do here. Better ideas welcome. If I just let that
socket leak because I know this backend will soon exit, I'd expect a
knock at the door from the programming police.

I don't actually know why we have
src/test/authentication/t/...{password,sasl,peer}..., but then
src/test/{kerberos,ldap,ssl}/t/001_auth.pl. For this one, I just
copied the second style, creating src/test/radius/t/001_auth.pl. I
can't explain why it should be like that, though. If I propose
another test for PAM, where should it go?

Attachments:

v2-0001-Add-simple-test-for-RADIUS-authentication.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Add-simple-test-for-RADIUS-authentication.patchDownload+224-2
v2-0002-ci-Enable-RADIUS-test.patchtext/x-patch; charset=US-ASCII; name=v2-0002-ci-Enable-RADIUS-test.patchDownload+6-2
v2-0003-Use-latch-API-to-wait-for-RADIUS-authentication.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Use-latch-API-to-wait-for-RADIUS-authentication.patchDownload+100-39
#8Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#7)
Re: RADIUS tests and improvements

On Sat, Mar 04, 2023 at 02:23:10PM +1300, Thomas Munro wrote:

I don't exactly love the PG_TRY()/PG_CATCH() around the
CHECK_FOR_INTERRUPTS().

In fact this kind of CFI-with-cleanup problem
has been haunting me across several projects. For cases that memory
contexts and resource owners can't help with, I don't currently know
what else to do here. Better ideas welcome.

Like adding a Open/CloseSocket() in fd.c to control the leaks?

If I just let that
socket leak because I know this backend will soon exit, I'd expect a
knock at the door from the programming police.

Hmm. It seems to me that you'd better have two patches instead of one
here? First, one to introduce the new parameter to control the
timeout, and a second to improve the responsiveness with a
WaitLatch()? If the CFI proves to be an issue, it would be sad to
have to revert the configuration part, which is worth on its own.

I don't actually know why we have
src/test/authentication/t/...{password,sasl,peer}..., but then
src/test/{kerberos,ldap,ssl}/t/001_auth.pl. For this one, I just
copied the second style, creating src/test/radius/t/001_auth.pl. I
can't explain why it should be like that, though. If I propose
another test for PAM, where should it go?

My take would be to keep the number of directories in src/test/ to a
minimum in the long run. Still, this is a case-by-case, as it depends
on if a set of tests needs an expanded set of modules, configuration
files and/or multiple scripts. ssl has its own set of configuration
files with its module, so it makes sense to be independent. ldap has
its LdapServer.pm with a configuration file, again I'm OK with a
separate case. Kerberos has its own README, but IMO it could also be
moved to src/test/authentication/ as it has a simple structure, with
its requirements moved into a different README.

What this patch set does for the RADIUS test is simple enough in
structure that I would also add it in src/test/authentication/. That
means less Make-fu and less Meson-fu.

In 0001, PG_TEST_EXTRA requires radius for the test. This needs an
update of regress.sgml where the values available are listed. I think
that you'd better document that freeradius is required for the test in
one of the README (either create a new one in radius/, or add this
information to the one in authentication, as you feel).
--
Michael

#9Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#8)
Re: RADIUS tests and improvements

Hi Thomas,

Have you have a chance to look at and address the feedback given in this
thread?

--
Daniel Gustafsson