Patch proposal: make use of regular expressions for the username in pg_hba.conf

Started by Drouvot, Bertrandover 3 years ago37 messages

bdrouvot@amazon.com

over 3 years ago

1 attachment(s)

Hi hackers,

Attached is a patch proposal to allow the use of regular expressions for
the username in pg_hba.conf.

Using regular expressions for the username in the pg_hba.conf file is
convenient in situations where an organization has a large number of
users and needs an expressive way to map them.

For example, if an organization wants to allow gss connections only for
users having their principal, e.g. @BDTFOREST.LOCAL, they could make use
of an entry in pg_hba.conf such as:

host all /^.*@BDTFOREST.LOCAL$ 0.0.0.0/0 gss

Without this patch, I can think of three alternatives with existing
functionality, which all of tradeoffs. This includes:

1) Create an entry per user: this is challenging for organizations
managing large numbers of users (e.g. 1000s). This is also not dynamic,
i.e. the HBA file would need to be updated when users are added or removed.

2) Use a mapping in pg_ident.conf, for example:

Here is an entry in pg_hba.conf that uses a map:

host all all 0.0.0.0/0 gss map=mygssmap

and by defining this mapping in pg_ident.conf:

mygssmap /^(.*)@BDTFOREST\.LOCAL$ \1@BDTFOREST.LOCAL

That works for filtering the username.

LOG: connection authenticated: identity="bertrand@BDTFOREST.LOCAL"
method=gss (/pg_installed/data/pg_hba.conf:95)
$ grep -n mygssmap /pg_installed/data/pg_hba.conf
95:host all all 0.0.0.0/0 gss map=mygssmap

However, the behavior is not the same for the ones that don’t match the
mapping in pg_ident.conf: indeed the connection attempt stop here and
the next HBA line won’t be evaluated.

FATAL: GSSAPI authentication failed for user "bdt"
DETAIL: Connection matched pg_hba.conf line 95: "host all
all 0.0.0.0/0 gss map=mygssmap"

3) Make use of a role in pg_hba.conf, e.g. “+BDTONLY”. That would work
too, and also allow the evaluation of the next HBA line for the ones
that are not part of the role.

However:

- That’s not as dynamic as the regular expression, as new users
would need to be granted the role and some users who are moving in the
company may need to have the role revoked.
- Looking at the regular expression in the HBA file makes it clear
what filtering needs to be done. This is not obvious when looking at the
role, even if it has a meaningful name. This can generate “incorrect
filtering” should one user be granted the role by mistake, or make it
more difficult to debug why a user is not being matched to a particular
line in the HBA file.

This is why I think username filtering with regular expressions would
provide its own advantages.

Thoughts? Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-hba_with_regexp.patchtext/plain; charset=UTF-8; name=v1-0001-hba_with_regexp.patchDownload

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 433759928b..f3d0be260c 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -249,7 +249,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        Specifies which database user name(s) this record
        matches. The value <literal>all</literal> specifies that it
        matches all users.  Otherwise, this is either the name of a specific
-       database user, or a group name preceded by <literal>+</literal>.
+       database user, a regular expression preceded by <literal>/</literal>
+       or a group name preceded by <literal>+</literal>.
        (Recall that there is no real distinction between users and groups
        in <productname>PostgreSQL</productname>; a <literal>+</literal> mark really means
        <quote>match any of the roles that are directly or indirectly members
@@ -787,16 +788,18 @@ host    all             all             192.168.12.10/32        gss
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             192.168.0.0/16          ident map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for administrators,
+# users finishing with "helpdesk" and members of role "support",
+# who can connect to all databases.
+# The file$PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 local   sameuser        all                                     md5
 local   all             @admins                                 md5
+local   all             /^.*helpdesk$                           md5
 local   all             +support                                md5
 
 # The last two lines above can be combined into a single line:
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 1447588c4a..e5b95eebf6 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -574,7 +574,8 @@ is_member(Oid userid, const char *role)
 }
 
 /*
- * Check AuthToken list for a match to role, allowing group names.
+ * Check AuthToken list for a match to role.
+ * We are allowing group names and regular expressions.
  */
 static bool
 check_role(const char *role, Oid roleid, List *tokens)
@@ -590,6 +591,56 @@ check_role(const char *role, Oid roleid, List *tokens)
 			if (is_member(roleid, tok->string + 1))
 				return true;
 		}
+		else if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression.
+			 */
+			int			r;
+			pg_wchar   *wstr;
+			int			wlen;
+			regex_t		re;
+
+			wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));
+			wlen = pg_mb2wchar_with_len(tok->string + 1,
+										wstr, strlen(tok->string + 1));
+
+			r = pg_regcomp(&re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
+			if (r)
+			{
+				char		errstr[100];
+
+				pg_regerror(r, &re, errstr, sizeof(errstr));
+				ereport(LOG,
+						(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+						 errmsg("invalid regular expression \"%s\": %s",
+								tok->string + 1, errstr)));
+
+				pfree(wstr);
+				return false;
+			}
+			else
+			{
+				pg_wchar   *wrolestr;
+				int			wrolelen;
+
+				pfree(wstr);
+				wrolestr = palloc((strlen(role) + 1) * sizeof(pg_wchar));
+				wrolelen = pg_mb2wchar_with_len(role, wrolestr, strlen(role));
+
+				if (pg_regexec(&re, wrolestr, wrolelen, 0, NULL, 0, NULL, 0) == REG_OKAY)
+				{
+					pfree(wrolestr);
+					return true;
+				}
+				else
+				{
+					pfree(wrolestr);
+					return false;
+				}
+			}
+		}
 		else if (token_matches(tok, role) ||
 				 token_is_keyword(tok, "all"))
 			return true;
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 3e3079c824..09439fe8e4 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -23,12 +23,13 @@ if (!$use_unix_sockets)
 # and then execute a reload to refresh it.
 sub reset_pg_hba
 {
-	my $node       = shift;
-	my $hba_method = shift;
+	my $node       		= shift;
+	my $hba_user_name 	= shift;
+	my $hba_method 		= shift;
 
 	unlink($node->data_dir . '/pg_hba.conf');
 	# just for testing purposes, use a continuation line
-	$node->append_conf('pg_hba.conf', "local all all\\\n $hba_method");
+	$node->append_conf('pg_hba.conf', "local all $hba_user_name\\\n $hba_method");
 	$node->reload;
 	return;
 }
@@ -76,14 +77,14 @@ $ENV{"PGPASSWORD"} = 'pass';
 
 # For "trust" method, all users should be able to connect. These users are not
 # considered to be authenticated.
-reset_pg_hba($node, 'trust');
+reset_pg_hba($node, 'all', 'trust');
 test_role($node, 'scram_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
 test_role($node, 'md5_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
 
 # For plain "password" method, all users should also be able to connect.
-reset_pg_hba($node, 'password');
+reset_pg_hba($node, 'all', 'password');
 test_role($node, 'scram_role', 'password', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="scram_role" method=password/]);
@@ -92,7 +93,7 @@ test_role($node, 'md5_role', 'password', 0,
 	  [qr/connection authenticated: identity="md5_role" method=password/]);
 
 # For "scram-sha-256" method, user "scram_role" should be able to connect.
-reset_pg_hba($node, 'scram-sha-256');
+reset_pg_hba($node, 'all', 'scram-sha-256');
 test_role(
 	$node,
 	'scram_role',
@@ -112,7 +113,7 @@ $ENV{"PGPASSWORD"} = 'pass';
 
 # For "md5" method, all users should be able to connect (SCRAM
 # authentication will be performed for the user with a SCRAM secret.)
-reset_pg_hba($node, 'md5');
+reset_pg_hba($node, 'all', 'md5');
 test_role($node, 'scram_role', 'md5', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="scram_role" method=md5/]);
@@ -122,11 +123,11 @@ test_role($node, 'md5_role', 'md5', 0,
 
 # Tests for channel binding without SSL.
 # Using the password authentication method; channel binding can't work
-reset_pg_hba($node, 'password');
+reset_pg_hba($node, 'all', 'password');
 $ENV{"PGCHANNELBINDING"} = 'require';
 test_role($node, 'scram_role', 'scram-sha-256', 2);
 # SSL not in use; channel binding still can't work
-reset_pg_hba($node, 'scram-sha-256');
+reset_pg_hba($node, 'all', 'scram-sha-256');
 $ENV{"PGCHANNELBINDING"} = 'require';
 test_role($node, 'scram_role', 'scram-sha-256', 2);
 
@@ -145,7 +146,7 @@ append_to_file(
 !);
 chmod 0600, $pgpassfile or die;
 
-reset_pg_hba($node, 'password');
+reset_pg_hba($node, 'all', 'password');
 test_role($node, 'scram_role', 'password from pgpass', 0);
 test_role($node, 'md5_role',   'password from pgpass', 2);
 
@@ -156,4 +157,8 @@ append_to_file(
 
 test_role($node, 'md5_role', 'password from pgpass', 0);
 
+# Testing with regular expression for username
+reset_pg_hba($node, '/^.*md.*$', 'password');
+test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0);
+
 done_testing();

Jacob Champion

jchampion@timescale.com

over 3 years ago

In reply to: Drouvot, Bertrand (#1)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

This is why I think username filtering with regular expressions would
provide its own advantages.

Thoughts? Looking forward to your feedback,

I think your motivation for the feature is solid. It is killing me a
bit that this is making it easier to switch authentication methods
based on the role name, when I suspect what someone might really want
is to switch authentication methods based on the ID the user is trying
to authenticate with. But that's not your fault or problem to fix,
because the startup packet doesn't currently have that information.
(It does make me wonder whether I withdrew my PGAUTHUSER proposal [1]https://commitfest.postgresql.org/38/3314/
a month too early. And man, do I wish that pg_ident and pg_hba were
one file.)

I think you're going to have to address backwards compatibility
concerns. Today, I can create a role named "/a", and I can put that
into the HBA without quoting it. I'd be unamused if, after an upgrade,
my rule suddenly matched any role name containing an 'a'.

Speaking of partial matches, should this feature allow them? Maybe
rules should have to match the entire username instead, and sidestep
the inevitable "I forgot to anchor my regex" problems?

Thanks,
--Jacob

[1]: https://commitfest.postgresql.org/38/3314/

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Jacob Champion (#2)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Jacob Champion <jchampion@timescale.com> writes:

On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

This is why I think username filtering with regular expressions would
provide its own advantages.

I think your motivation for the feature is solid.

Yeah. I'm not sure that I buy the argument that this is more useful
than writing a role name and controlling things with GRANT ROLE, but
it's a plausible alternative with properties that might win in some
use-cases. So I see little reason not to allow it.

I'd actually ask why stop here? In particular, why not do the same
with the database-name column, especially since that does *not*
have the ability to use roles as a substitute for a wildcard entry?

I think you're going to have to address backwards compatibility
concerns. Today, I can create a role named "/a", and I can put that
into the HBA without quoting it. I'd be unamused if, after an upgrade,
my rule suddenly matched any role name containing an 'a'.

Meh ... that concern seems overblown to me. I guess it's possible
that somebody has an HBA entry that looks like that, but it doesn't
seem very plausible. Note that we made this exact same change in
pg_ident.conf years ago, and AFAIR we got zero complaints.

Speaking of partial matches, should this feature allow them? Maybe
rules should have to match the entire username instead, and sidestep
the inevitable "I forgot to anchor my regex" problems?

I think the pg_ident.conf precedent is binding on us here. If we
make this one work differently, nobody's going to thank us for it,
they're just going to wonder "did the left hand not know what the
right hand already did?"

regards, tom lane

Drouvot, Bertrand

bdrouvot@amazon.com

over 3 years ago

In reply to: Tom Lane (#3)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 9/9/22 2:46 AM, Tom Lane wrote:

Jacob Champion<jchampion@timescale.com> writes:

On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand<bdrouvot@amazon.com> wrote:

This is why I think username filtering with regular expressions would
provide its own advantages.

I think your motivation for the feature is solid.

Yeah. I'm not sure that I buy the argument that this is more useful
than writing a role name and controlling things with GRANT ROLE, but
it's a plausible alternative with properties that might win in some
use-cases. So I see little reason not to allow it.

Thank you both for your feedback.

I'd actually ask why stop here? In particular, why not do the same
with the database-name column, especially since that does *not*
have the ability to use roles as a substitute for a wildcard entry?

I think that's a fair point, I'll look at it.

I think you're going to have to address backwards compatibility
concerns. Today, I can create a role named "/a", and I can put that
into the HBA without quoting it. I'd be unamused if, after an upgrade,
my rule suddenly matched any role name containing an 'a'.

Meh ... that concern seems overblown to me. I guess it's possible
that somebody has an HBA entry that looks like that, but it doesn't
seem very plausible. Note that we made this exact same change in
pg_ident.conf years ago, and AFAIR we got zero complaints.

Agree that it seems unlikely but maybe we could add a new GUC to turn
the regex usage on the hba file on/off (and use off as the default)?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services:https://aws.amazon.com

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Drouvot, Bertrand (#4)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:

Agree that it seems unlikely but maybe we could add a new GUC to turn
the regex usage on the hba file on/off (and use off as the default)?

I think that will just add useless complication.

regards, tom lane

Jacob Champion

jchampion@timescale.com

over 3 years ago

In reply to: Tom Lane (#3)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Thu, Sep 8, 2022 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jacob Champion <jchampion@timescale.com> writes:

I think you're going to have to address backwards compatibility
concerns. Today, I can create a role named "/a", and I can put that
into the HBA without quoting it. I'd be unamused if, after an upgrade,
my rule suddenly matched any role name containing an 'a'.

Meh ... that concern seems overblown to me. I guess it's possible
that somebody has an HBA entry that looks like that, but it doesn't
seem very plausible. Note that we made this exact same change in
pg_ident.conf years ago, and AFAIR we got zero complaints.

What percentage of users actually use pg_ident maps? My assumption
would be that a change to pg_hba would affect many more people, but
then I don't have any proof that there are users with role names that
look like that to begin with. I won't pound the table with it.

Speaking of partial matches, should this feature allow them? Maybe
rules should have to match the entire username instead, and sidestep
the inevitable "I forgot to anchor my regex" problems?

I think the pg_ident.conf precedent is binding on us here. If we
make this one work differently, nobody's going to thank us for it,
they're just going to wonder "did the left hand not know what the
right hand already did?"

Hmm... yeah, I suppose. From the other direction, it'd be bad to train
users that unanchored regexes are safe in pg_hba only to take those
guardrails off in pg_ident. I will tuck that away as a potential
behavior change, for a different thread.

Thanks,
--Jacob

Jacob Champion

jchampion@timescale.com

over 3 years ago

In reply to: Drouvot, Bertrand (#1)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On 8/19/22 01:12, Drouvot, Bertrand wrote:

+           wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));     
+           wlen = pg_mb2wchar_with_len(tok->string + 1,                         
+                                       wstr, strlen(tok->string + 1));

The (tok->string + 1) construction comes up often enough that I think it
should be put in a `regex` variable or similar. That would help my eyes
with the (strlen(tok->string + 1) + 1) construction, especially.

I noticed that for pg_ident, we precompile the regexes per-line and
reuse those in child processes. Whereas here we're compiling, using, and
then discarding the regex for each check. I think the example set by the
pg_ident code is probably the one to follow, unless you have a good
reason not to.

+# Testing with regular expression for username                                  
+reset_pg_hba($node, '/^.*md.*$', 'password');                                   
+test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0);
+

IMO the coverage for this patch needs to be filled out. Negative test
cases are more important than positive ones for security-related code.

Other than that, and Tom's note on potentially expanding this to other
areas, I think this is a pretty straightforward patch.

Thanks,
--Jacob

Drouvot, Bertrand

bdrouvot@amazon.com

over 3 years ago

In reply to: Jacob Champion (#7)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 9/10/22 1:21 AM, Jacob Champion wrote:

On 8/19/22 01:12, Drouvot, Bertrand wrote:
+           wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));
+           wlen = pg_mb2wchar_with_len(tok->string + 1,
+                                       wstr, strlen(tok->string + 1));
The (tok->string + 1) construction comes up often enough that I think it
should be put in a `regex` variable or similar. That would help my eyes
with the (strlen(tok->string + 1) + 1) construction, especially.

I noticed that for pg_ident, we precompile the regexes per-line and
reuse those in child processes. Whereas here we're compiling, using, and
then discarding the regex for each check. I think the example set by the
pg_ident code is probably the one to follow, unless you have a good
reason not to.

Thanks for the feedback.

Yeah fully agree. I'll provide a new version that follow the same logic
as the pg_ident code.

+# Testing with regular expression for username
+reset_pg_hba($node, '/^.*md.*$', 'password');
+test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0);
+
IMO the coverage for this patch needs to be filled out. Negative test
cases are more important than positive ones for security-related code.

Agree, will do.

Other than that, and Tom's note on potentially expanding this to other
areas,

I'll add regexp usage for the database column and also the for the
address one when non CIDR is provided (so host name(s)) (I think it also
makes sense specially as we don't allow multiple values for this column).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services:https://aws.amazon.com

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

over 3 years ago

In reply to: Drouvot, Bertrand (#8)

1 attachment(s)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 9/12/22 9:55 AM, Drouvot, Bertrand wrote:

Hi,

On 9/10/22 1:21 AM, Jacob Champion wrote:
On 8/19/22 01:12, Drouvot, Bertrand wrote:
+           wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));
+           wlen = pg_mb2wchar_with_len(tok->string + 1,
+                                       wstr, strlen(tok->string + 1));
The (tok->string + 1) construction comes up often enough that I think it
should be put in a `regex` variable or similar. That would help my eyes
with the (strlen(tok->string + 1) + 1) construction, especially.

I noticed that for pg_ident, we precompile the regexes per-line and
reuse those in child processes. Whereas here we're compiling, using, and
then discarding the regex for each check. I think the example set by the
pg_ident code is probably the one to follow, unless you have a good
reason not to.
Thanks for the feedback.

Yeah fully agree. I'll provide a new version that follow the same logic
as the pg_ident code.
+# Testing with regular expression for username
+reset_pg_hba($node, '/^.*md.*$', 'password');
+test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0);
+
IMO the coverage for this patch needs to be filled out. Negative test
cases are more important than positive ones for security-related code.
Agree, will do.

Other than that, and Tom's note on potentially expanding this to other
areas,

I'll add regexp usage for the database column and also the for the
address one when non CIDR is provided (so host name(s)) (I think it also
makes sense specially as we don't allow multiple values for this column).

Please find attached v2 addressing the comments mentioned above.

v2 also provides regular expression usage for the database and the
address columns (when a host name is being used).

Remark:

The CF bot is failing for Windows (all other tests are green) and only
for the new tap test related to the regular expression on the host name
(the ones on database and role are fine).

The issue is not related to the patch. The issue is that the Windows
Cirrus test does not like when a host name is provided for a "host"
entry in pg_hba.conf (while it works fine when a CIDR is provided).

You can see an example in [1]https://github.com/bdrouvot/postgres/branches on branch “host_non_cidr” where the only change is to replace the
CIDR by "localhost" in 002_scram.pl. As you can see the Cirrus tests are
failing on Windows only (its log file is here [2]https://api.cirrus-ci.com/v1/artifact/task/6507279833890816/log/src/test/ssl/tmp_check/log/002_scram_primary.log).

I'll look at this "Windows" related issue but would appreciate any
guidance/help if someone has experience in this area on windows.

[1]: https://github.com/bdrouvot/postgres/branches on branch “host_non_cidr”

[2]: https://api.cirrus-ci.com/v1/artifact/task/6507279833890816/log/src/test/ssl/tmp_check/log/002_scram_primary.log
https://api.cirrus-ci.com/v1/artifact/task/6507279833890816/log/src/test/ssl/tmp_check/log/002_scram_primary.log

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-hba_with_regexp.patchtext/plain; charset=UTF-8; name=v2-0001-hba_with_regexp.patchDownload

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..28343445be 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        logical replication connections do specify it.
        Otherwise, this is the name of
        a specific <productname>PostgreSQL</productname> database.
-       Multiple database names can be supplied by separating them with
-       commas.  A separate file containing database names can be specified by
+       Multiple database names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
+       A separate file containing database names can be specified by
        preceding the file name with <literal>@</literal>.
       </para>
      </listitem>
@@ -249,7 +250,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        Specifies which database user name(s) this record
        matches. The value <literal>all</literal> specifies that it
        matches all users.  Otherwise, this is either the name of a specific
-       database user, or a group name preceded by <literal>+</literal>.
+       database user, a regular expression preceded by <literal>/</literal>
+       or a group name preceded by <literal>+</literal>.
        (Recall that there is no real distinction between users and groups
        in <productname>PostgreSQL</productname>; a <literal>+</literal> mark really means
        <quote>match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        considered to be a member of a role if they are explicitly a member
        of the role, directly or indirectly, and not just by virtue of
        being a superuser.
-       Multiple user names can be supplied by separating them with commas.
+       Multiple user names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
        A separate file containing user names can be specified by preceding the
        file name with <literal>@</literal>.
       </para>
@@ -270,8 +273,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
      <listitem>
       <para>
        Specifies the client machine address(es) that this record
-       matches.  This field can contain either a host name, an IP
-       address range, or one of the special key words mentioned below.
+       matches.  This field can contain either a host name, a regular expression
+       preceded by <literal>/</literal> representing host names, an IP address range,
+       or one of the special key words mentioned below.
       </para>
 
       <para>
@@ -785,16 +789,18 @@ host    all             all             192.168.12.10/32        gss
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             192.168.0.0/16          ident map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for administrators,
+# users finishing with "helpdesk" and members of role "support",
+# who can connect to all databases.
+# The file$PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 local   sameuser        all                                     md5
 local   all             @admins                                 md5
+local   all             /^.*helpdesk$                           md5
 local   all             +support                                md5
 
 # The last two lines above can be combined into a single line:
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4637426d62..200de8d7d8 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -117,6 +117,9 @@ static List *tokenize_inc_file(List *tokens, const char *outer_filename,
 							   const char *inc_filename, int elevel, char **err_msg);
 static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 							   int elevel, char **err_msg);
+static bool token_regcomp(regex_t *re, char *string, char *filename,
+						  int line_num, char **err_msg, int elevel);
+static bool token_regexec(const char *match, regex_t *re);
 
 
 /*
@@ -574,13 +577,15 @@ is_member(Oid userid, const char *role)
 }
 
 /*
- * Check AuthToken list for a match to role, allowing group names.
+ * Check AuthToken list for a match to role.
+ * We are allowing group names and regular expressions.
  */
 static bool
-check_role(const char *role, Oid roleid, List *tokens)
+check_role(const char *role, Oid roleid, List *tokens, List *tokens_re)
 {
 	ListCell   *cell;
 	AuthToken  *tok;
+	int			re_num = 0;
 
 	foreach(cell, tokens)
 	{
@@ -590,6 +595,23 @@ check_role(const char *role, Oid roleid, List *tokens)
 			if (is_member(roleid, tok->string + 1))
 				return true;
 		}
+		else if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression.
+			 */
+			ListCell   *cell_re;
+			regex_t    *re;
+
+			cell_re = list_nth_cell(tokens_re, re_num);
+			re = lfirst(cell_re);
+
+			if (token_regexec(role, re))
+				return true;
+
+			re_num++;
+		}
 		else if (token_matches(tok, role) ||
 				 token_is_keyword(tok, "all"))
 			return true;
@@ -601,10 +623,11 @@ check_role(const char *role, Oid roleid, List *tokens)
  * Check to see if db/role combination matches AuthToken list.
  */
 static bool
-check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
+check_db(const char *dbname, const char *role, Oid roleid, List *tokens, List *tokens_re)
 {
 	ListCell   *cell;
 	AuthToken  *tok;
+	int			re_num = 0;
 
 	foreach(cell, tokens)
 	{
@@ -633,6 +656,23 @@ check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
 		}
 		else if (token_is_keyword(tok, "replication"))
 			continue;			/* never match this if not walsender */
+		else if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression.
+			 */
+			ListCell   *cell_re;
+			regex_t    *re;
+
+			cell_re = list_nth_cell(tokens_re, re_num);
+			re = lfirst(cell_re);
+
+			if (token_regexec(dbname, re))
+				return true;
+
+			re_num++;
+		}
 		else if (token_matches(tok, dbname))
 			return true;
 	}
@@ -681,7 +721,7 @@ hostname_match(const char *pattern, const char *actual_hostname)
  * Check to see if a connecting IP matches a given host name.
  */
 static bool
-check_hostname(hbaPort *port, const char *hostname)
+check_hostname(hbaPort *port, const char *hostname, regex_t re)
 {
 	struct addrinfo *gai_result,
 			   *gai;
@@ -712,8 +752,17 @@ check_hostname(hbaPort *port, const char *hostname)
 		port->remote_hostname = pstrdup(remote_hostname);
 	}
 
+	if (hostname[0] == '/')
+	{
+		/*
+		 * When hostname starts with a slash, treat it as a regular
+		 * expression.
+		 */
+		if (!token_regexec(port->remote_hostname, &re))
+			return false;
+	}
 	/* Now see if remote host name matches this pg_hba line */
-	if (!hostname_match(hostname, port->remote_hostname))
+	else if (!hostname_match(hostname, port->remote_hostname))
 		return false;
 
 	/* If we already verified the forward lookup, we're done */
@@ -939,13 +988,13 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	struct addrinfo *gai_result;
 	struct addrinfo hints;
 	int			ret;
-	char	   *cidr_slash;
 	char	   *unsupauth;
 	ListCell   *field;
 	List	   *tokens;
 	ListCell   *tokencell;
 	AuthToken  *token;
 	HbaLine    *parsedline;
+	char       *cidr_slash = NULL; /* keep compiler quiet */
 
 	parsedline = palloc0(sizeof(HbaLine));
 	parsedline->linenumber = line_num;
@@ -1049,9 +1098,27 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		return NULL;
 	}
 	parsedline->databases = NIL;
+	parsedline->databases_re = NIL;
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
+		AuthToken  *tok = lfirst(tokencell);
+
+		if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				parsedline->databases_re = lappend(parsedline->databases_re, re);
+			else
+				return NULL;
+		}
 		parsedline->databases = lappend(parsedline->databases,
 										copy_auth_token(lfirst(tokencell)));
 	}
@@ -1069,9 +1136,27 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		return NULL;
 	}
 	parsedline->roles = NIL;
+	parsedline->roles_re = NIL;
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
+		AuthToken  *tok = lfirst(tokencell);
+
+		if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				parsedline->roles_re = lappend(parsedline->roles_re, re);
+			else
+				return NULL;
+		}
 		parsedline->roles = lappend(parsedline->roles,
 									copy_auth_token(lfirst(tokencell)));
 	}
@@ -1120,6 +1205,8 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		}
 		else
 		{
+			bool is_regexp = token->string[0] == '/' ? true : false;
+
 			/* IP and netmask are specified */
 			parsedline->ip_cmp_method = ipCmpMask;
 
@@ -1127,9 +1214,12 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			str = pstrdup(token->string);
 
 			/* Check if it has a CIDR suffix and if so isolate it */
-			cidr_slash = strchr(str, '/');
-			if (cidr_slash)
-				*cidr_slash = '\0';
+			if (!is_regexp)
+			{
+				cidr_slash = strchr(str, '/');
+				if (cidr_slash)
+					*cidr_slash = '\0';
+			}
 
 			/* Get the IP address either way */
 			hints.ai_flags = AI_NUMERICHOST;
@@ -1168,7 +1258,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			pg_freeaddrinfo_all(hints.ai_family, gai_result);
 
 			/* Get the netmask */
-			if (cidr_slash)
+			if (cidr_slash && !is_regexp)
 			{
 				if (parsedline->hostname)
 				{
@@ -1199,7 +1289,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 				parsedline->masklen = parsedline->addrlen;
 				pfree(str);
 			}
-			else if (!parsedline->hostname)
+			else if (!parsedline->hostname && !is_regexp)
 			{
 				/* Read the mask field. */
 				pfree(str);
@@ -1261,6 +1351,18 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 					return NULL;
 				}
 			}
+			else if (is_regexp)
+			{
+				/*
+				 * When token->string starts with a slash, treat it as a
+				 * regular expression. Pre-compile it.
+				 */
+				if (!token_regcomp(&parsedline->hostname_re,
+								   token->string + 1, HbaFileName,
+								   line_num, err_msg, elevel))
+					return NULL;
+				parsedline->hostname = str;
+			}
 		}
 	}							/* != ctLocal */
 
@@ -2135,7 +2237,8 @@ check_hba(hbaPort *port)
 					if (hba->hostname)
 					{
 						if (!check_hostname(port,
-											hba->hostname))
+											hba->hostname, hba->hostname_re))
+
 							continue;
 					}
 					else
@@ -2162,10 +2265,10 @@ check_hba(hbaPort *port)
 
 		/* Check database and role */
 		if (!check_db(port->database_name, port->user_name, roleid,
-					  hba->databases))
+					  hba->databases, hba->databases_re))
 			continue;
 
-		if (!check_role(port->user_name, roleid, hba->roles))
+		if (!check_role(port->user_name, roleid, hba->roles, hba->roles_re))
 			continue;
 
 		/* Found a record that matched! */
@@ -2342,34 +2445,9 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 		 * When system username starts with a slash, treat it as a regular
 		 * expression. Pre-compile it.
 		 */
-		int			r;
-		pg_wchar   *wstr;
-		int			wlen;
-
-		wstr = palloc((strlen(parsedline->ident_user + 1) + 1) * sizeof(pg_wchar));
-		wlen = pg_mb2wchar_with_len(parsedline->ident_user + 1,
-									wstr, strlen(parsedline->ident_user + 1));
-
-		r = pg_regcomp(&parsedline->re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
-		if (r)
-		{
-			char		errstr[100];
-
-			pg_regerror(r, &parsedline->re, errstr, sizeof(errstr));
-			ereport(elevel,
-					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-					 errmsg("invalid regular expression \"%s\": %s",
-							parsedline->ident_user + 1, errstr),
-					 errcontext("line %d of configuration file \"%s\"",
-							line_num, IdentFileName)));
-
-			*err_msg = psprintf("invalid regular expression \"%s\": %s",
-								parsedline->ident_user + 1, errstr);
-
-			pfree(wstr);
+		if (!token_regcomp(&parsedline->re, parsedline->ident_user + 1,
+						   IdentFileName, line_num, err_msg, elevel))
 			return NULL;
-		}
-		pfree(wstr);
 	}
 
 	return parsedline;
@@ -2706,3 +2784,58 @@ hba_authname(UserAuth auth_method)
 
 	return UserAuthName[auth_method];
 }
+
+static bool
+token_regcomp(regex_t *re, char *string, char *filename, int line_num,
+			  char **err_msg, int elevel)
+{
+	int			r;
+	pg_wchar   *wstr;
+	int			wlen;
+
+	wstr = palloc((strlen(string) + 1) * sizeof(pg_wchar));
+	wlen = pg_mb2wchar_with_len(string,
+								wstr, strlen(string));
+
+	r = pg_regcomp(re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
+	if (r)
+	{
+		char		errstr[100];
+
+		pg_regerror(r, re, errstr, sizeof(errstr));
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+				 errmsg("invalid regular expression \"%s\": %s",
+						string, errstr),
+				 errcontext("line %d of configuration file \"%s\"",
+							line_num, filename)));
+
+		*err_msg = psprintf("invalid regular expression \"%s\": %s",
+							string, errstr);
+
+		pfree(wstr);
+		return false;
+	}
+
+	pfree(wstr);
+	return true;
+}
+
+static bool
+token_regexec(const char *match, regex_t *re)
+{
+	pg_wchar   *wmatchstr;
+	int			wmatchlen;
+
+	wmatchstr = palloc((strlen(match) + 1) * sizeof(pg_wchar));
+	wmatchlen = pg_mb2wchar_with_len(match, wmatchstr, strlen(match));
+
+	if (pg_regexec(re, wmatchstr, wmatchlen, 0, NULL, 0, NULL, 0) == REG_OKAY)
+	{
+		pfree(wmatchstr);
+		return true;
+	}
+
+	pfree(wmatchstr);
+	return false;
+}
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index 90036f7bcd..c24f9166cb 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -120,6 +120,9 @@ typedef struct HbaLine
 	char	   *radiusidentifiers_s;
 	List	   *radiusports;
 	char	   *radiusports_s;
+	List	   *roles_re;
+	List	   *databases_re;
+	regex_t	   hostname_re;
 } HbaLine;
 
 typedef struct IdentLine
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 3e3079c824..049ad1cc89 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -23,29 +23,32 @@ if (!$use_unix_sockets)
 # and then execute a reload to refresh it.
 sub reset_pg_hba
 {
-	my $node       = shift;
-	my $hba_method = shift;
+	my $node		= shift;
+	my $host		= shift;
+	my $database	= shift;
+	my $role		= shift;
+	my $method		= shift;
 
 	unlink($node->data_dir . '/pg_hba.conf');
 	# just for testing purposes, use a continuation line
-	$node->append_conf('pg_hba.conf', "local all all\\\n $hba_method");
+	$node->append_conf('pg_hba.conf', "$host $database $role\\\n $method");
 	$node->reload;
 	return;
 }
 
 # Test access for a single role, useful to wrap all tests into one.  Extra
 # named parameters are passed to connect_ok/fails as-is.
-sub test_role
+sub test_conn
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-	my ($node, $role, $method, $expected_res, %params) = @_;
+	my ($node, $conn, $method, $expected_res, %params) = @_;
 	my $status_string = 'failed';
 	$status_string = 'success' if ($expected_res eq 0);
 
-	my $connstr = "user=$role";
+	my $connstr = "$conn";
 	my $testname =
-	  "authentication $status_string for method $method, role $role";
+	  "authentication $status_string for method $method, conn $conn";
 
 	if ($expected_res eq 0)
 	{
@@ -61,7 +64,11 @@ sub test_role
 # Initialize primary node
 my $node = PostgreSQL::Test::Cluster->new('primary');
 $node->init;
-$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->append_conf(
+    'postgresql.conf', qq{
+listen_addresses = '127.0.0.1'
+log_connections = on
+});
 $node->start;
 
 # Create 3 roles with different password methods for each one. The same
@@ -74,61 +81,66 @@ $node->safe_psql('postgres',
 );
 $ENV{"PGPASSWORD"} = 'pass';
 
+# Create a database to test regular expression
+$node->safe_psql('postgres',
+	"CREATE database testdb;"
+);
+
 # For "trust" method, all users should be able to connect. These users are not
 # considered to be authenticated.
-reset_pg_hba($node, 'trust');
-test_role($node, 'scram_role', 'trust', 0,
+reset_pg_hba($node, 'local','all', 'all', 'trust');
+test_conn($node, 'user=scram_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
-test_role($node, 'md5_role', 'trust', 0,
+test_conn($node, 'user=md5_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
 
 # For plain "password" method, all users should also be able to connect.
-reset_pg_hba($node, 'password');
-test_role($node, 'scram_role', 'password', 0,
+reset_pg_hba($node, 'local', 'all', 'all', 'password');
+test_conn($node, 'user=scram_role', 'password', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="scram_role" method=password/]);
-test_role($node, 'md5_role', 'password', 0,
+test_conn($node, 'user=md5_role', 'password', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="md5_role" method=password/]);
 
 # For "scram-sha-256" method, user "scram_role" should be able to connect.
-reset_pg_hba($node, 'scram-sha-256');
-test_role(
+reset_pg_hba($node, 'local', 'all', 'all', 'scram-sha-256');
+test_conn(
 	$node,
-	'scram_role',
+	'user=scram_role',
 	'scram-sha-256',
 	0,
 	log_like => [
 		qr/connection authenticated: identity="scram_role" method=scram-sha-256/
 	]);
-test_role($node, 'md5_role', 'scram-sha-256', 2,
+test_conn($node, 'user=md5_role', 'scram-sha-256', 2,
 	log_unlike => [qr/connection authenticated:/]);
 
 # Test that bad passwords are rejected.
 $ENV{"PGPASSWORD"} = 'badpass';
-test_role($node, 'scram_role', 'scram-sha-256', 2,
+test_conn($node, 'user=scram_role', 'scram-sha-256', 2,
 	log_unlike => [qr/connection authenticated:/]);
 $ENV{"PGPASSWORD"} = 'pass';
 
 # For "md5" method, all users should be able to connect (SCRAM
 # authentication will be performed for the user with a SCRAM secret.)
-reset_pg_hba($node, 'md5');
-test_role($node, 'scram_role', 'md5', 0,
+reset_pg_hba($node, 'local', 'all', 'all', 'md5');
+test_conn($node, 'user=scram_role', 'md5', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="scram_role" method=md5/]);
-test_role($node, 'md5_role', 'md5', 0,
+test_conn($node, 'user=md5_role', 'md5', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="md5_role" method=md5/]);
 
 # Tests for channel binding without SSL.
 # Using the password authentication method; channel binding can't work
-reset_pg_hba($node, 'password');
+reset_pg_hba($node, 'local', 'all', 'all', 'password');
 $ENV{"PGCHANNELBINDING"} = 'require';
-test_role($node, 'scram_role', 'scram-sha-256', 2);
+test_conn($node, 'user=scram_role', 'scram-sha-256', 2);
 # SSL not in use; channel binding still can't work
-reset_pg_hba($node, 'scram-sha-256');
+reset_pg_hba($node, 'local', 'all', 'all', 'scram-sha-256');
 $ENV{"PGCHANNELBINDING"} = 'require';
-test_role($node, 'scram_role', 'scram-sha-256', 2);
+test_conn($node, 'user=scram_role', 'scram-sha-256', 2);
 
 # Test .pgpass processing; but use a temp file, don't overwrite the real one!
 my $pgpassfile = "${PostgreSQL::Test::Utils::tmp_check}/pgpass";
@@ -145,15 +157,38 @@ append_to_file(
 !);
 chmod 0600, $pgpassfile or die;
 
-reset_pg_hba($node, 'password');
-test_role($node, 'scram_role', 'password from pgpass', 0);
-test_role($node, 'md5_role',   'password from pgpass', 2);
+reset_pg_hba($node, 'local', 'all', 'all', 'password');
+test_conn($node, 'user=scram_role', 'password from pgpass', 0);
+test_conn($node, 'user=md5_role',   'password from pgpass', 2);
 
 append_to_file(
 	$pgpassfile, qq!
 *:*:*:md5_role:p\\ass
 !);
 
-test_role($node, 'md5_role', 'password from pgpass', 0);
+test_conn($node, 'user=md5_role', 'password from pgpass', 0);
+
+# Testing with regular expression for username
+reset_pg_hba($node, 'local', 'all', '/^.*nomatch.*$, baduser, /^.*md.*$', 'password');
+test_conn($node, 'user=md5_role', 'password, matching regexp for username', 0);
+
+reset_pg_hba($node, 'local', 'all', '/^.*nomatch.*$, baduser, /^.*m_d.*$', 'password');
+test_conn($node, 'user=md5_role', 'password, non matching regexp for username', 2,
+		log_unlike => [qr/connection authenticated:/]);
+
+# Testing with regular expression for dbname
+reset_pg_hba($node, 'local', '/^.*nomatch.*$, baddb, /^t.*b$', 'all', 'password');
+test_conn($node, 'user=md5_role dbname=testdb', 'password, matching regexp for dbname', 0);
+
+reset_pg_hba($node, 'local', '/^.*nomatch.*$, baddb, /^t.*ba$', 'all', 'password');
+test_conn($node, 'user=md5_role dbname=testdb', 'password, non matching regexp for dbname', 2,
+		log_unlike => [qr/connection authenticated:/]);
+
+# Testing with regular expression for hostname
+reset_pg_hba($node, 'host', 'all', 'all  /^.*$', 'password');
+test_conn($node, 'user=md5_role host=localhost', 'password, matching regexp for hostname', 0);
+
+reset_pg_hba($node, 'host', 'all', 'all  /^$', 'password');
+test_conn($node, 'user=md5_role host=localhost', 'password, non matching regexp for hostname', 2);
 
 done_testing();

#10

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Drouvot, Bertrand (#9)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote:

The CF bot is failing for Windows (all other tests are green) and only for
the new tap test related to the regular expression on the host name (the
ones on database and role are fine).

The issue is not related to the patch. The issue is that the Windows Cirrus
test does not like when a host name is provided for a "host" entry in
pg_hba.conf (while it works fine when a CIDR is provided).

You can see an example in [1] where the only change is to replace the CIDR
by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing
on Windows only (its log file is here [2]).

I'll look at this "Windows" related issue but would appreciate any
guidance/help if someone has experience in this area on windows.

I recall that being able to do a reverse lookup of a hostname on
Windows for localhost requires a few extra setup steps as that's not
guaranteed to be set in all environments by default, which is why we
go at great length to use 127.0.0.1 in the TAP test setup for example
(see Cluster.pm). Looking at your patch, the goal is to test the
mapping of regular expression for host names, user names and database
names. If the first case is not guaranteed, my guess is that it is
fine to skip this portion of the tests on Windows.

While reading the patch, I am a bit confused about token_regcomp() and
token_regexec(). It would help the review a lot if these were
documented with proper comments, even if these act roughly as wrappers
for pg_regexec() and pg_regcomp().
--
Michael

#11

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

over 3 years ago

In reply to: Michael Paquier (#10)

1 attachment(s)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 9/17/22 8:53 AM, Michael Paquier wrote:

On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote:

The CF bot is failing for Windows (all other tests are green) and only for
the new tap test related to the regular expression on the host name (the
ones on database and role are fine).

The issue is not related to the patch. The issue is that the Windows Cirrus
test does not like when a host name is provided for a "host" entry in
pg_hba.conf (while it works fine when a CIDR is provided).

You can see an example in [1] where the only change is to replace the CIDR
by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing
on Windows only (its log file is here [2]).

I'll look at this "Windows" related issue but would appreciate any
guidance/help if someone has experience in this area on windows.

I recall that being able to do a reverse lookup of a hostname on
Windows for localhost requires a few extra setup steps as that's not
guaranteed to be set in all environments by default, which is why we
go at great length to use 127.0.0.1 in the TAP test setup for example
(see Cluster.pm). Looking at your patch, the goal is to test the
mapping of regular expression for host names, user names and database
names. If the first case is not guaranteed, my guess is that it is
fine to skip this portion of the tests on Windows.

Thanks for looking at it!

That sounds reasonable, v3 attached is skipping the regular expression
tests for the hostname on Windows.

While reading the patch, I am a bit confused about token_regcomp() and
token_regexec(). It would help the review a lot if these were
documented with proper comments, even if these act roughly as wrappers
for pg_regexec() and pg_regcomp().

Fully agree, comments were missing. They've been added in v3 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-hba_with_regexp.patchtext/plain; charset=UTF-8; name=v3-0001-hba_with_regexp.patchDownload

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..28343445be 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        logical replication connections do specify it.
        Otherwise, this is the name of
        a specific <productname>PostgreSQL</productname> database.
-       Multiple database names can be supplied by separating them with
-       commas.  A separate file containing database names can be specified by
+       Multiple database names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
+       A separate file containing database names can be specified by
        preceding the file name with <literal>@</literal>.
       </para>
      </listitem>
@@ -249,7 +250,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        Specifies which database user name(s) this record
        matches. The value <literal>all</literal> specifies that it
        matches all users.  Otherwise, this is either the name of a specific
-       database user, or a group name preceded by <literal>+</literal>.
+       database user, a regular expression preceded by <literal>/</literal>
+       or a group name preceded by <literal>+</literal>.
        (Recall that there is no real distinction between users and groups
        in <productname>PostgreSQL</productname>; a <literal>+</literal> mark really means
        <quote>match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        considered to be a member of a role if they are explicitly a member
        of the role, directly or indirectly, and not just by virtue of
        being a superuser.
-       Multiple user names can be supplied by separating them with commas.
+       Multiple user names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
        A separate file containing user names can be specified by preceding the
        file name with <literal>@</literal>.
       </para>
@@ -270,8 +273,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
      <listitem>
       <para>
        Specifies the client machine address(es) that this record
-       matches.  This field can contain either a host name, an IP
-       address range, or one of the special key words mentioned below.
+       matches.  This field can contain either a host name, a regular expression
+       preceded by <literal>/</literal> representing host names, an IP address range,
+       or one of the special key words mentioned below.
       </para>
 
       <para>
@@ -785,16 +789,18 @@ host    all             all             192.168.12.10/32        gss
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             192.168.0.0/16          ident map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for administrators,
+# users finishing with "helpdesk" and members of role "support",
+# who can connect to all databases.
+# The file$PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 local   sameuser        all                                     md5
 local   all             @admins                                 md5
+local   all             /^.*helpdesk$                           md5
 local   all             +support                                md5
 
 # The last two lines above can be combined into a single line:
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4637426d62..89a4fa8b98 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -117,6 +117,9 @@ static List *tokenize_inc_file(List *tokens, const char *outer_filename,
 							   const char *inc_filename, int elevel, char **err_msg);
 static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 							   int elevel, char **err_msg);
+static bool token_regcomp(regex_t *re, char *string, char *filename,
+						  int line_num, char **err_msg, int elevel);
+static bool token_regexec(const char *match, regex_t *re);
 
 
 /*
@@ -574,13 +577,15 @@ is_member(Oid userid, const char *role)
 }
 
 /*
- * Check AuthToken list for a match to role, allowing group names.
+ * Check AuthToken list for a match to role.
+ * We are allowing group names and regular expressions.
  */
 static bool
-check_role(const char *role, Oid roleid, List *tokens)
+check_role(const char *role, Oid roleid, List *tokens, List *tokens_re)
 {
 	ListCell   *cell;
 	AuthToken  *tok;
+	int			re_num = 0;
 
 	foreach(cell, tokens)
 	{
@@ -590,6 +595,23 @@ check_role(const char *role, Oid roleid, List *tokens)
 			if (is_member(roleid, tok->string + 1))
 				return true;
 		}
+		else if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression.
+			 */
+			ListCell   *cell_re;
+			regex_t    *re;
+
+			cell_re = list_nth_cell(tokens_re, re_num);
+			re = lfirst(cell_re);
+
+			if (token_regexec(role, re))
+				return true;
+
+			re_num++;
+		}
 		else if (token_matches(tok, role) ||
 				 token_is_keyword(tok, "all"))
 			return true;
@@ -601,10 +623,11 @@ check_role(const char *role, Oid roleid, List *tokens)
  * Check to see if db/role combination matches AuthToken list.
  */
 static bool
-check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
+check_db(const char *dbname, const char *role, Oid roleid, List *tokens, List *tokens_re)
 {
 	ListCell   *cell;
 	AuthToken  *tok;
+	int			re_num = 0;
 
 	foreach(cell, tokens)
 	{
@@ -633,6 +656,23 @@ check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
 		}
 		else if (token_is_keyword(tok, "replication"))
 			continue;			/* never match this if not walsender */
+		else if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression.
+			 */
+			ListCell   *cell_re;
+			regex_t    *re;
+
+			cell_re = list_nth_cell(tokens_re, re_num);
+			re = lfirst(cell_re);
+
+			if (token_regexec(dbname, re))
+				return true;
+
+			re_num++;
+		}
 		else if (token_matches(tok, dbname))
 			return true;
 	}
@@ -681,7 +721,7 @@ hostname_match(const char *pattern, const char *actual_hostname)
  * Check to see if a connecting IP matches a given host name.
  */
 static bool
-check_hostname(hbaPort *port, const char *hostname)
+check_hostname(hbaPort *port, const char *hostname, regex_t re)
 {
 	struct addrinfo *gai_result,
 			   *gai;
@@ -712,8 +752,17 @@ check_hostname(hbaPort *port, const char *hostname)
 		port->remote_hostname = pstrdup(remote_hostname);
 	}
 
+	if (hostname[0] == '/')
+	{
+		/*
+		 * When hostname starts with a slash, treat it as a regular
+		 * expression.
+		 */
+		if (!token_regexec(port->remote_hostname, &re))
+			return false;
+	}
 	/* Now see if remote host name matches this pg_hba line */
-	if (!hostname_match(hostname, port->remote_hostname))
+	else if (!hostname_match(hostname, port->remote_hostname))
 		return false;
 
 	/* If we already verified the forward lookup, we're done */
@@ -939,13 +988,13 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	struct addrinfo *gai_result;
 	struct addrinfo hints;
 	int			ret;
-	char	   *cidr_slash;
 	char	   *unsupauth;
 	ListCell   *field;
 	List	   *tokens;
 	ListCell   *tokencell;
 	AuthToken  *token;
 	HbaLine    *parsedline;
+	char       *cidr_slash = NULL; /* keep compiler quiet */
 
 	parsedline = palloc0(sizeof(HbaLine));
 	parsedline->linenumber = line_num;
@@ -1049,9 +1098,27 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		return NULL;
 	}
 	parsedline->databases = NIL;
+	parsedline->databases_re = NIL;
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
+		AuthToken  *tok = lfirst(tokencell);
+
+		if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				parsedline->databases_re = lappend(parsedline->databases_re, re);
+			else
+				return NULL;
+		}
 		parsedline->databases = lappend(parsedline->databases,
 										copy_auth_token(lfirst(tokencell)));
 	}
@@ -1069,9 +1136,27 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		return NULL;
 	}
 	parsedline->roles = NIL;
+	parsedline->roles_re = NIL;
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
+		AuthToken  *tok = lfirst(tokencell);
+
+		if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				parsedline->roles_re = lappend(parsedline->roles_re, re);
+			else
+				return NULL;
+		}
 		parsedline->roles = lappend(parsedline->roles,
 									copy_auth_token(lfirst(tokencell)));
 	}
@@ -1120,6 +1205,8 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		}
 		else
 		{
+			bool is_regexp = token->string[0] == '/' ? true : false;
+
 			/* IP and netmask are specified */
 			parsedline->ip_cmp_method = ipCmpMask;
 
@@ -1127,9 +1214,12 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			str = pstrdup(token->string);
 
 			/* Check if it has a CIDR suffix and if so isolate it */
-			cidr_slash = strchr(str, '/');
-			if (cidr_slash)
-				*cidr_slash = '\0';
+			if (!is_regexp)
+			{
+				cidr_slash = strchr(str, '/');
+				if (cidr_slash)
+					*cidr_slash = '\0';
+			}
 
 			/* Get the IP address either way */
 			hints.ai_flags = AI_NUMERICHOST;
@@ -1168,7 +1258,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			pg_freeaddrinfo_all(hints.ai_family, gai_result);
 
 			/* Get the netmask */
-			if (cidr_slash)
+			if (cidr_slash && !is_regexp)
 			{
 				if (parsedline->hostname)
 				{
@@ -1199,7 +1289,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 				parsedline->masklen = parsedline->addrlen;
 				pfree(str);
 			}
-			else if (!parsedline->hostname)
+			else if (!parsedline->hostname && !is_regexp)
 			{
 				/* Read the mask field. */
 				pfree(str);
@@ -1261,6 +1351,18 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 					return NULL;
 				}
 			}
+			else if (is_regexp)
+			{
+				/*
+				 * When token->string starts with a slash, treat it as a
+				 * regular expression. Pre-compile it.
+				 */
+				if (!token_regcomp(&parsedline->hostname_re,
+								   token->string + 1, HbaFileName,
+								   line_num, err_msg, elevel))
+					return NULL;
+				parsedline->hostname = str;
+			}
 		}
 	}							/* != ctLocal */
 
@@ -2135,7 +2237,8 @@ check_hba(hbaPort *port)
 					if (hba->hostname)
 					{
 						if (!check_hostname(port,
-											hba->hostname))
+											hba->hostname, hba->hostname_re))
+
 							continue;
 					}
 					else
@@ -2162,10 +2265,10 @@ check_hba(hbaPort *port)
 
 		/* Check database and role */
 		if (!check_db(port->database_name, port->user_name, roleid,
-					  hba->databases))
+					  hba->databases, hba->databases_re))
 			continue;
 
-		if (!check_role(port->user_name, roleid, hba->roles))
+		if (!check_role(port->user_name, roleid, hba->roles, hba->roles_re))
 			continue;
 
 		/* Found a record that matched! */
@@ -2342,34 +2445,9 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 		 * When system username starts with a slash, treat it as a regular
 		 * expression. Pre-compile it.
 		 */
-		int			r;
-		pg_wchar   *wstr;
-		int			wlen;
-
-		wstr = palloc((strlen(parsedline->ident_user + 1) + 1) * sizeof(pg_wchar));
-		wlen = pg_mb2wchar_with_len(parsedline->ident_user + 1,
-									wstr, strlen(parsedline->ident_user + 1));
-
-		r = pg_regcomp(&parsedline->re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
-		if (r)
-		{
-			char		errstr[100];
-
-			pg_regerror(r, &parsedline->re, errstr, sizeof(errstr));
-			ereport(elevel,
-					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-					 errmsg("invalid regular expression \"%s\": %s",
-							parsedline->ident_user + 1, errstr),
-					 errcontext("line %d of configuration file \"%s\"",
-							line_num, IdentFileName)));
-
-			*err_msg = psprintf("invalid regular expression \"%s\": %s",
-								parsedline->ident_user + 1, errstr);
-
-			pfree(wstr);
+		if (!token_regcomp(&parsedline->re, parsedline->ident_user + 1,
+						   IdentFileName, line_num, err_msg, elevel))
 			return NULL;
-		}
-		pfree(wstr);
 	}
 
 	return parsedline;
@@ -2706,3 +2784,68 @@ hba_authname(UserAuth auth_method)
 
 	return UserAuthName[auth_method];
 }
+
+/*
+ * Compile the regular expression "re" and return whether it compiles
+ * successfully or not.
+ *
+ * If not, the last 4 parameters are used to add extra details while reporting
+ * the error.
+ */
+static bool
+token_regcomp(regex_t *re, char *string, char *filename, int line_num,
+			  char **err_msg, int elevel)
+{
+	int			r;
+	pg_wchar   *wstr;
+	int			wlen;
+
+	wstr = palloc((strlen(string) + 1) * sizeof(pg_wchar));
+	wlen = pg_mb2wchar_with_len(string,
+								wstr, strlen(string));
+
+	r = pg_regcomp(re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
+	if (r)
+	{
+		char		errstr[100];
+
+		pg_regerror(r, re, errstr, sizeof(errstr));
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+				 errmsg("invalid regular expression \"%s\": %s",
+						string, errstr),
+				 errcontext("line %d of configuration file \"%s\"",
+							line_num, filename)));
+
+		*err_msg = psprintf("invalid regular expression \"%s\": %s",
+							string, errstr);
+
+		pfree(wstr);
+		return false;
+	}
+
+	pfree(wstr);
+	return true;
+}
+
+/*
+ * Return whether "match" is matching the regular expression "re" or not.
+ */
+static bool
+token_regexec(const char *match, regex_t *re)
+{
+	pg_wchar   *wmatchstr;
+	int			wmatchlen;
+
+	wmatchstr = palloc((strlen(match) + 1) * sizeof(pg_wchar));
+	wmatchlen = pg_mb2wchar_with_len(match, wmatchstr, strlen(match));
+
+	if (pg_regexec(re, wmatchstr, wmatchlen, 0, NULL, 0, NULL, 0) == REG_OKAY)
+	{
+		pfree(wmatchstr);
+		return true;
+	}
+
+	pfree(wmatchstr);
+	return false;
+}
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index d06da81806..24fe502d06 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -120,6 +120,9 @@ typedef struct HbaLine
 	char	   *radiusidentifiers_s;
 	List	   *radiusports;
 	char	   *radiusports_s;
+	List	   *roles_re;
+	List	   *databases_re;
+	regex_t	   hostname_re;
 } HbaLine;
 
 typedef struct IdentLine
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 3e3079c824..210583c406 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -23,29 +23,32 @@ if (!$use_unix_sockets)
 # and then execute a reload to refresh it.
 sub reset_pg_hba
 {
-	my $node       = shift;
-	my $hba_method = shift;
+	my $node		= shift;
+	my $host		= shift;
+	my $database	= shift;
+	my $role		= shift;
+	my $method		= shift;
 
 	unlink($node->data_dir . '/pg_hba.conf');
 	# just for testing purposes, use a continuation line
-	$node->append_conf('pg_hba.conf', "local all all\\\n $hba_method");
+	$node->append_conf('pg_hba.conf', "$host $database $role\\\n $method");
 	$node->reload;
 	return;
 }
 
 # Test access for a single role, useful to wrap all tests into one.  Extra
 # named parameters are passed to connect_ok/fails as-is.
-sub test_role
+sub test_conn
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-	my ($node, $role, $method, $expected_res, %params) = @_;
+	my ($node, $conn, $method, $expected_res, %params) = @_;
 	my $status_string = 'failed';
 	$status_string = 'success' if ($expected_res eq 0);
 
-	my $connstr = "user=$role";
+	my $connstr = "$conn";
 	my $testname =
-	  "authentication $status_string for method $method, role $role";
+	  "authentication $status_string for method $method, conn $conn";
 
 	if ($expected_res eq 0)
 	{
@@ -61,7 +64,11 @@ sub test_role
 # Initialize primary node
 my $node = PostgreSQL::Test::Cluster->new('primary');
 $node->init;
-$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->append_conf(
+    'postgresql.conf', qq{
+listen_addresses = '127.0.0.1'
+log_connections = on
+});
 $node->start;
 
 # Create 3 roles with different password methods for each one. The same
@@ -74,61 +81,66 @@ $node->safe_psql('postgres',
 );
 $ENV{"PGPASSWORD"} = 'pass';
 
+# Create a database to test regular expression
+$node->safe_psql('postgres',
+	"CREATE database testdb;"
+);
+
 # For "trust" method, all users should be able to connect. These users are not
 # considered to be authenticated.
-reset_pg_hba($node, 'trust');
-test_role($node, 'scram_role', 'trust', 0,
+reset_pg_hba($node, 'local','all', 'all', 'trust');
+test_conn($node, 'user=scram_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
-test_role($node, 'md5_role', 'trust', 0,
+test_conn($node, 'user=md5_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
 
 # For plain "password" method, all users should also be able to connect.
-reset_pg_hba($node, 'password');
-test_role($node, 'scram_role', 'password', 0,
+reset_pg_hba($node, 'local', 'all', 'all', 'password');
+test_conn($node, 'user=scram_role', 'password', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="scram_role" method=password/]);
-test_role($node, 'md5_role', 'password', 0,
+test_conn($node, 'user=md5_role', 'password', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="md5_role" method=password/]);
 
 # For "scram-sha-256" method, user "scram_role" should be able to connect.
-reset_pg_hba($node, 'scram-sha-256');
-test_role(
+reset_pg_hba($node, 'local', 'all', 'all', 'scram-sha-256');
+test_conn(
 	$node,
-	'scram_role',
+	'user=scram_role',
 	'scram-sha-256',
 	0,
 	log_like => [
 		qr/connection authenticated: identity="scram_role" method=scram-sha-256/
 	]);
-test_role($node, 'md5_role', 'scram-sha-256', 2,
+test_conn($node, 'user=md5_role', 'scram-sha-256', 2,
 	log_unlike => [qr/connection authenticated:/]);
 
 # Test that bad passwords are rejected.
 $ENV{"PGPASSWORD"} = 'badpass';
-test_role($node, 'scram_role', 'scram-sha-256', 2,
+test_conn($node, 'user=scram_role', 'scram-sha-256', 2,
 	log_unlike => [qr/connection authenticated:/]);
 $ENV{"PGPASSWORD"} = 'pass';
 
 # For "md5" method, all users should be able to connect (SCRAM
 # authentication will be performed for the user with a SCRAM secret.)
-reset_pg_hba($node, 'md5');
-test_role($node, 'scram_role', 'md5', 0,
+reset_pg_hba($node, 'local', 'all', 'all', 'md5');
+test_conn($node, 'user=scram_role', 'md5', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="scram_role" method=md5/]);
-test_role($node, 'md5_role', 'md5', 0,
+test_conn($node, 'user=md5_role', 'md5', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="md5_role" method=md5/]);
 
 # Tests for channel binding without SSL.
 # Using the password authentication method; channel binding can't work
-reset_pg_hba($node, 'password');
+reset_pg_hba($node, 'local', 'all', 'all', 'password');
 $ENV{"PGCHANNELBINDING"} = 'require';
-test_role($node, 'scram_role', 'scram-sha-256', 2);
+test_conn($node, 'user=scram_role', 'scram-sha-256', 2);
 # SSL not in use; channel binding still can't work
-reset_pg_hba($node, 'scram-sha-256');
+reset_pg_hba($node, 'local', 'all', 'all', 'scram-sha-256');
 $ENV{"PGCHANNELBINDING"} = 'require';
-test_role($node, 'scram_role', 'scram-sha-256', 2);
+test_conn($node, 'user=scram_role', 'scram-sha-256', 2);
 
 # Test .pgpass processing; but use a temp file, don't overwrite the real one!
 my $pgpassfile = "${PostgreSQL::Test::Utils::tmp_check}/pgpass";
@@ -145,15 +157,45 @@ append_to_file(
 !);
 chmod 0600, $pgpassfile or die;
 
-reset_pg_hba($node, 'password');
-test_role($node, 'scram_role', 'password from pgpass', 0);
-test_role($node, 'md5_role',   'password from pgpass', 2);
+reset_pg_hba($node, 'local', 'all', 'all', 'password');
+test_conn($node, 'user=scram_role', 'password from pgpass', 0);
+test_conn($node, 'user=md5_role',   'password from pgpass', 2);
 
 append_to_file(
 	$pgpassfile, qq!
 *:*:*:md5_role:p\\ass
 !);
 
-test_role($node, 'md5_role', 'password from pgpass', 0);
+test_conn($node, 'user=md5_role', 'password from pgpass', 0);
+
+# Testing with regular expression for username
+reset_pg_hba($node, 'local', 'all', '/^.*nomatch.*$, baduser, /^.*md.*$', 'password');
+test_conn($node, 'user=md5_role', 'password, matching regexp for username', 0);
+
+reset_pg_hba($node, 'local', 'all', '/^.*nomatch.*$, baduser, /^.*m_d.*$', 'password');
+test_conn($node, 'user=md5_role', 'password, non matching regexp for username', 2,
+		log_unlike => [qr/connection authenticated:/]);
+
+# Testing with regular expression for dbname
+reset_pg_hba($node, 'local', '/^.*nomatch.*$, baddb, /^t.*b$', 'all', 'password');
+test_conn($node, 'user=md5_role dbname=testdb', 'password, matching regexp for dbname', 0);
 
+reset_pg_hba($node, 'local', '/^.*nomatch.*$, baddb, /^t.*ba$', 'all', 'password');
+test_conn($node, 'user=md5_role dbname=testdb', 'password, non matching regexp for dbname', 2,
+		log_unlike => [qr/connection authenticated:/]);
+
+# Testing with regular expression for hostname
+SKIP:
+{
+	# Being able to do a reverse lookup of a hostname on Windows for localhost
+	# is not guaranteed on all environments by default.
+	# So, skip the regular expression test for hostname on Windows.
+	skip "Regular expression for hostname not tested on Windows", 2 if ($windows_os);
+
+	reset_pg_hba($node, 'host', 'all', 'all  /^.*$', 'password');
+	test_conn($node, 'user=md5_role host=localhost', 'password, matching regexp for hostname', 0);
+
+	reset_pg_hba($node, 'host', 'all', 'all  /^$', 'password');
+	test_conn($node, 'user=md5_role host=localhost', 'password, non matching regexp for hostname', 2);
+}
 done_testing();

#12

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Jacob Champion (#6)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Fri, Sep 09, 2022 at 03:05:18PM -0700, Jacob Champion wrote:

On Thu, Sep 8, 2022 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jacob Champion <jchampion@timescale.com> writes:

I think you're going to have to address backwards compatibility
concerns. Today, I can create a role named "/a", and I can put that
into the HBA without quoting it. I'd be unamused if, after an upgrade,
my rule suddenly matched any role name containing an 'a'.

Meh ... that concern seems overblown to me. I guess it's possible
that somebody has an HBA entry that looks like that, but it doesn't
seem very plausible. Note that we made this exact same change in
pg_ident.conf years ago, and AFAIR we got zero complaints.

What percentage of users actually use pg_ident maps? My assumption
would be that a change to pg_hba would affect many more people, but
then I don't have any proof that there are users with role names that
look like that to begin with. I won't pound the table with it.

This concern does not sound overblown to me. A change in pg_hba.conf
impacts everybody. I was just looking at this patch, and the logic
with usernames and databases is changed so as we would *always* treat
*any* entries beginning with a slash as a regular expression, skipping
them if they don't match with an error fed to the logs and
pg_hba_file_rules. This could lead to silent security issues as
stricter HBA policies need to be located first in pg_hba.conf and
these could suddenly fail to load. It would be much safer to me if we
had in place some restrictions to avoid such problems to happen,
meaning some extra checks in the DDL code paths for such object names
and perhaps even something in pg_upgrade with a scan at pg_database
and pg_authid.

On the bright side, I have been looking at some of the RFCs covering
the set of characters allowed in DNS names and slashes are not
authorized in hostnames, making this change rather safe AFAIK.
--
Michael

#13

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Michael Paquier (#12)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Michael Paquier <michael@paquier.xyz> writes:

On Thu, Sep 8, 2022 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Meh ... that concern seems overblown to me. I guess it's possible
that somebody has an HBA entry that looks like that, but it doesn't
seem very plausible. Note that we made this exact same change in
pg_ident.conf years ago, and AFAIR we got zero complaints.

This concern does not sound overblown to me.

You have to assume that somebody (a) has a role or DB name starting
with slash, (b) has an explicit reference to that name in their
pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
notice that things are misbehaving until after some hacker manages
to break into their installation on the strength of the misbehaving
entry. OK, I'll grant that the probability of (c) is depressingly
close to unity; but each of the other steps seems quite low probability.
All four of them happening in one installation is something I doubt
will happen.

On the contrary side, if we make this work differently from the
pg_ident.conf precedent, or install weird rules to try to prevent
accidental misinterpretations, that could also lead to security
problems because things don't work as someone would expect. I see
no a-priori reason to believe that this risk is negligible compared
to the other one.

regards, tom lane

#14

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Tom Lane (#13)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote:

You have to assume that somebody (a) has a role or DB name starting
with slash, (b) has an explicit reference to that name in their
pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
notice that things are misbehaving until after some hacker manages
to break into their installation on the strength of the misbehaving
entry. OK, I'll grant that the probability of (c) is depressingly
close to unity; but each of the other steps seems quite low probability.
All four of them happening in one installation is something I doubt
will happen.

It is the kind of things that could blow up as a CVE and some bad PR
for the project, so I cannot get excited about enforcing this new rule
in an authentication file (aka before a role is authenticated) while
we are talking about 3~4 code paths (?) that would need an extra check
to make sure that no instances have such object names.

On the contrary side, if we make this work differently from the
pg_ident.conf precedent, or install weird rules to try to prevent
accidental misinterpretations, that could also lead to security
problems because things don't work as someone would expect. I see
no a-priori reason to believe that this risk is negligible compared
to the other one.

I also do like a lot the idea of making things consistent across all
the auth configuration files for all the fields where this can be
applied.
--
Michael

#15

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

over 3 years ago

In reply to: Michael Paquier (#14)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 9/20/22 6:30 AM, Michael Paquier wrote:

On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote:

You have to assume that somebody (a) has a role or DB name starting
with slash, (b) has an explicit reference to that name in their
pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
notice that things are misbehaving until after some hacker manages
to break into their installation on the strength of the misbehaving
entry. OK, I'll grant that the probability of (c) is depressingly
close to unity; but each of the other steps seems quite low probability.
All four of them happening in one installation is something I doubt
will happen.

It is the kind of things that could blow up as a CVE and some bad PR
for the project, so I cannot get excited about enforcing this new rule
in an authentication file (aka before a role is authenticated) while
we are talking about 3~4 code paths (?) that would need an extra check
to make sure that no instances have such object names.

I also have the feeling that having (a), (b) and (d) is low probability.

That said, If the user "/john" already exists and has a hba entry then
this entry will still match with the patch. Problem is that all the
users that contain "john" would also now match.

But things get worst if say /a is an existing user and hba entry as the
entry would match any users that contains "a" with the patch.

I assume (maybe i should not) that if objects starting with / already
exist there is very good reason(s) behind. Then I don't think that
preventing their creation in the DDL would help (quite the contrary for
the ones that really need them).

It looks to me that adding a GUC (off by default) to enable/disable the
regexp usage in the hba could be a fair compromise. It won't block any
creation starting with a / and won't open more doors (if such objects
exist) by default.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#16

Jacob Champion

jchampion@timescale.com

over 3 years ago

In reply to: Tom Lane (#13)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Mon, Sep 19, 2022 at 9:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

You have to assume that somebody (a) has a role or DB name starting
with slash, (b) has an explicit reference to that name in their
pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
notice that things are misbehaving until after some hacker manages
to break into their installation on the strength of the misbehaving
entry. OK, I'll grant that the probability of (c) is depressingly
close to unity; but each of the other steps seems quite low probability.
All four of them happening in one installation is something I doubt
will happen.

I can't argue with (a) or (b), but (d) seems decently likely to me. If
your normal user base consists of people who are authorized to access
your system, what clues would you have that your HBA is silently
failing open?

--Jacob

#17

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Drouvot, Bertrand (#15)

2 attachment(s)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote:

I assume (maybe i should not) that if objects starting with / already exist
there is very good reason(s) behind. Then I don't think that preventing
their creation in the DDL would help (quite the contrary for the ones that
really need them).

I have been pondering on this point for the last few weeks, and I'd
like to change my opinion and side with Tom on this one as per the
very unlikeliness of this being a problem in the wild. I have studied
the places that would require restrictions but that was just feeling
adding a bit more bloat into the CREATE/ALTER ROLE paths for what's
aimed at providing a consistent experience for the user across
pg_hba.conf and pg_ident.conf.

It looks to me that adding a GUC (off by default) to enable/disable the
regexp usage in the hba could be a fair compromise. It won't block any
creation starting with a / and won't open more doors (if such objects exist)
by default.

Enforcing a behavior change in HBA policies with a GUC does not strike
me as a good thing in the long term. I am ready to bet that it would
just sit around for nothing like the compatibility GUCs.

Anyway, I have looked at the patch.

+ List *roles_re;
+ List *databases_re;
+ regex_t hostname_re;
I am surprised by the approach of using separate lists for the regular
expressions and the raw names. Wouldn't it be better to store
everything in a single list but assign an entry type? In this case it
would be either regex or plain string. This would minimize the
footprint of the changes (no extra arguments *_re in the routines
checking for a match on the roles, databases or hosts). And it seems
to me that this would make unnecessary the use of re_num here and
there. The hostname is different, of course, requiring only an extra
field for its type, or something like that.

Perhaps the documentation would gain in clarity if there were more
examples, like a set of comma-separated examples (mix of regex and raw
strings for example, for all the field types that gain support for
regexes)?

-$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->append_conf(
+    'postgresql.conf', qq{
+listen_addresses = '127.0.0.1'
+log_connections = on
+});
Hmm.  I think that we may need to reconsider the location of the tests
for the regexes with the host name, as the "safe" regression tests
should not switch listen_addresses.  One location where we already do
that is src/test/ssl/, so these could be moved there.  Keeping the
database and user name parts in src/test/authentication/ is fine.

Something that stood out on a first review is the refactoring of
001_password.pl that can be done independently of the main patch:
- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts. The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently, reducing the
footprint of 0002, which is your previous patch left untouched
(mostly!).
--
Michael

Attachments:

v4-0001-Refactor-TAP-test-001_password.pl.patchtext/plain; charset=us-asciiDownload

From 203122ff1eeae942fc78b5d7ad85122547a2a6ed Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 5 Oct 2022 15:14:41 +0900
Subject: [PATCH v4 1/2] Refactor TAP test 001_password.pl

---
 src/test/authentication/t/001_password.pl | 60 ++++++++++++-----------
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 58e4176e80..9e02697355 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -24,28 +24,30 @@ if (!$use_unix_sockets)
 sub reset_pg_hba
 {
 	my $node       = shift;
+	my $host       = shift;
+	my $database   = shift;
+	my $role       = shift;
 	my $hba_method = shift;
 
 	unlink($node->data_dir . '/pg_hba.conf');
 	# just for testing purposes, use a continuation line
-	$node->append_conf('pg_hba.conf', "local all all\\\n $hba_method");
+	$node->append_conf('pg_hba.conf', "$host $database $role\\\n $hba_method");
 	$node->reload;
 	return;
 }
 
-# Test access for a single role, useful to wrap all tests into one.  Extra
-# named parameters are passed to connect_ok/fails as-is.
-sub test_role
+# Test access for a connection string, useful to wrap all tests into one.
+# Extra named parameters are passed to connect_ok/fails as-is.
+sub test_conn
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-	my ($node, $role, $method, $expected_res, %params) = @_;
+	my ($node, $connstr, $method, $expected_res, %params) = @_;
 	my $status_string = 'failed';
 	$status_string = 'success' if ($expected_res eq 0);
 
-	my $connstr = "user=$role";
 	my $testname =
-	  "authentication $status_string for method $method, role $role";
+	  "authentication $status_string for method $method, connstr $connstr";
 
 	if ($expected_res eq 0)
 	{
@@ -81,10 +83,10 @@ $ENV{"PGPASSWORD"} = 'pass';
 
 # For "trust" method, all users should be able to connect. These users are not
 # considered to be authenticated.
-reset_pg_hba($node, 'trust');
-test_role($node, 'scram_role', 'trust', 0,
+reset_pg_hba($node, 'local','all', 'all', 'trust');
+test_conn($node, 'user=scram_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
-test_role($node, 'md5_role', 'trust', 0,
+test_conn($node, 'user=md5_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
 
 # SYSTEM_USER is null when not authenticated.
@@ -106,40 +108,40 @@ is($res, 't',
 );
 
 # For plain "password" method, all users should also be able to connect.
-reset_pg_hba($node, 'password');
-test_role($node, 'scram_role', 'password', 0,
+reset_pg_hba($node, 'local','all', 'all', 'password');
+test_conn($node, 'user=scram_role', 'password', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="scram_role" method=password/]);
-test_role($node, 'md5_role', 'password', 0,
+test_conn($node, 'user=md5_role', 'password', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="md5_role" method=password/]);
 
 # For "scram-sha-256" method, user "scram_role" should be able to connect.
-reset_pg_hba($node, 'scram-sha-256');
-test_role(
+reset_pg_hba($node, 'local','all', 'all', 'scram-sha-256');
+test_conn(
 	$node,
-	'scram_role',
+	'user=scram_role',
 	'scram-sha-256',
 	0,
 	log_like => [
 		qr/connection authenticated: identity="scram_role" method=scram-sha-256/
 	]);
-test_role($node, 'md5_role', 'scram-sha-256', 2,
+test_conn($node, 'user=md5_role', 'scram-sha-256', 2,
 	log_unlike => [qr/connection authenticated:/]);
 
 # Test that bad passwords are rejected.
 $ENV{"PGPASSWORD"} = 'badpass';
-test_role($node, 'scram_role', 'scram-sha-256', 2,
+test_conn($node, 'user=scram_role', 'scram-sha-256', 2,
 	log_unlike => [qr/connection authenticated:/]);
 $ENV{"PGPASSWORD"} = 'pass';
 
 # For "md5" method, all users should be able to connect (SCRAM
 # authentication will be performed for the user with a SCRAM secret.)
-reset_pg_hba($node, 'md5');
-test_role($node, 'scram_role', 'md5', 0,
+reset_pg_hba($node, 'local','all', 'all', 'md5');
+test_conn($node, 'user=scram_role', 'md5', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="scram_role" method=md5/]);
-test_role($node, 'md5_role', 'md5', 0,
+test_conn($node, 'user=md5_role', 'md5', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="md5_role" method=md5/]);
 
@@ -164,13 +166,13 @@ is($res, 't',
 
 # Tests for channel binding without SSL.
 # Using the password authentication method; channel binding can't work
-reset_pg_hba($node, 'password');
+reset_pg_hba($node, 'local','all', 'all', 'password');
 $ENV{"PGCHANNELBINDING"} = 'require';
-test_role($node, 'scram_role', 'scram-sha-256', 2);
+test_conn($node, 'user=scram_role', 'scram-sha-256', 2);
 # SSL not in use; channel binding still can't work
-reset_pg_hba($node, 'scram-sha-256');
+reset_pg_hba($node, 'local','all', 'all', 'scram-sha-256');
 $ENV{"PGCHANNELBINDING"} = 'require';
-test_role($node, 'scram_role', 'scram-sha-256', 2);
+test_conn($node, 'user=scram_role', 'scram-sha-256', 2);
 
 # Test .pgpass processing; but use a temp file, don't overwrite the real one!
 my $pgpassfile = "${PostgreSQL::Test::Utils::tmp_check}/pgpass";
@@ -187,15 +189,15 @@ append_to_file(
 !);
 chmod 0600, $pgpassfile or die;
 
-reset_pg_hba($node, 'password');
-test_role($node, 'scram_role', 'password from pgpass', 0);
-test_role($node, 'md5_role',   'password from pgpass', 2);
+reset_pg_hba($node, 'local','all', 'all', 'password');
+test_conn($node, 'user=scram_role', 'password from pgpass', 0);
+test_conn($node, 'user=md5_role',   'password from pgpass', 2);
 
 append_to_file(
 	$pgpassfile, qq!
 *:*:*:md5_role:p\\ass
 !);
 
-test_role($node, 'md5_role', 'password from pgpass', 0);
+test_conn($node, 'user=md5_role', 'password from pgpass', 0);
 
 done_testing();
-- 
2.37.3

v4-0002-Main-patch-from-Bertrand-Drouvot.patchtext/plain; charset=us-asciiDownload

From cd8debbc5c54f5342cd7db4e43091532fbdb065d Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 5 Oct 2022 16:17:28 +0900
Subject: [PATCH v4 2/2] Main patch from Bertrand Drouvot

---
 src/include/libpq/hba.h                   |   3 +
 src/backend/libpq/hba.c                   | 225 ++++++++++++++++++----
 src/test/authentication/t/001_password.pl |  42 ++++
 doc/src/sgml/client-auth.sgml             |  26 ++-
 4 files changed, 245 insertions(+), 51 deletions(-)

diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index d06da81806..24fe502d06 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -120,6 +120,9 @@ typedef struct HbaLine
 	char	   *radiusidentifiers_s;
 	List	   *radiusports;
 	char	   *radiusports_s;
+	List	   *roles_re;
+	List	   *databases_re;
+	regex_t	   hostname_re;
 } HbaLine;
 
 typedef struct IdentLine
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4637426d62..89a4fa8b98 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -117,6 +117,9 @@ static List *tokenize_inc_file(List *tokens, const char *outer_filename,
 							   const char *inc_filename, int elevel, char **err_msg);
 static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 							   int elevel, char **err_msg);
+static bool token_regcomp(regex_t *re, char *string, char *filename,
+						  int line_num, char **err_msg, int elevel);
+static bool token_regexec(const char *match, regex_t *re);
 
 
 /*
@@ -574,13 +577,15 @@ is_member(Oid userid, const char *role)
 }
 
 /*
- * Check AuthToken list for a match to role, allowing group names.
+ * Check AuthToken list for a match to role.
+ * We are allowing group names and regular expressions.
  */
 static bool
-check_role(const char *role, Oid roleid, List *tokens)
+check_role(const char *role, Oid roleid, List *tokens, List *tokens_re)
 {
 	ListCell   *cell;
 	AuthToken  *tok;
+	int			re_num = 0;
 
 	foreach(cell, tokens)
 	{
@@ -590,6 +595,23 @@ check_role(const char *role, Oid roleid, List *tokens)
 			if (is_member(roleid, tok->string + 1))
 				return true;
 		}
+		else if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression.
+			 */
+			ListCell   *cell_re;
+			regex_t    *re;
+
+			cell_re = list_nth_cell(tokens_re, re_num);
+			re = lfirst(cell_re);
+
+			if (token_regexec(role, re))
+				return true;
+
+			re_num++;
+		}
 		else if (token_matches(tok, role) ||
 				 token_is_keyword(tok, "all"))
 			return true;
@@ -601,10 +623,11 @@ check_role(const char *role, Oid roleid, List *tokens)
  * Check to see if db/role combination matches AuthToken list.
  */
 static bool
-check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
+check_db(const char *dbname, const char *role, Oid roleid, List *tokens, List *tokens_re)
 {
 	ListCell   *cell;
 	AuthToken  *tok;
+	int			re_num = 0;
 
 	foreach(cell, tokens)
 	{
@@ -633,6 +656,23 @@ check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
 		}
 		else if (token_is_keyword(tok, "replication"))
 			continue;			/* never match this if not walsender */
+		else if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression.
+			 */
+			ListCell   *cell_re;
+			regex_t    *re;
+
+			cell_re = list_nth_cell(tokens_re, re_num);
+			re = lfirst(cell_re);
+
+			if (token_regexec(dbname, re))
+				return true;
+
+			re_num++;
+		}
 		else if (token_matches(tok, dbname))
 			return true;
 	}
@@ -681,7 +721,7 @@ hostname_match(const char *pattern, const char *actual_hostname)
  * Check to see if a connecting IP matches a given host name.
  */
 static bool
-check_hostname(hbaPort *port, const char *hostname)
+check_hostname(hbaPort *port, const char *hostname, regex_t re)
 {
 	struct addrinfo *gai_result,
 			   *gai;
@@ -712,8 +752,17 @@ check_hostname(hbaPort *port, const char *hostname)
 		port->remote_hostname = pstrdup(remote_hostname);
 	}
 
+	if (hostname[0] == '/')
+	{
+		/*
+		 * When hostname starts with a slash, treat it as a regular
+		 * expression.
+		 */
+		if (!token_regexec(port->remote_hostname, &re))
+			return false;
+	}
 	/* Now see if remote host name matches this pg_hba line */
-	if (!hostname_match(hostname, port->remote_hostname))
+	else if (!hostname_match(hostname, port->remote_hostname))
 		return false;
 
 	/* If we already verified the forward lookup, we're done */
@@ -939,13 +988,13 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	struct addrinfo *gai_result;
 	struct addrinfo hints;
 	int			ret;
-	char	   *cidr_slash;
 	char	   *unsupauth;
 	ListCell   *field;
 	List	   *tokens;
 	ListCell   *tokencell;
 	AuthToken  *token;
 	HbaLine    *parsedline;
+	char       *cidr_slash = NULL; /* keep compiler quiet */
 
 	parsedline = palloc0(sizeof(HbaLine));
 	parsedline->linenumber = line_num;
@@ -1049,9 +1098,27 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		return NULL;
 	}
 	parsedline->databases = NIL;
+	parsedline->databases_re = NIL;
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
+		AuthToken  *tok = lfirst(tokencell);
+
+		if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				parsedline->databases_re = lappend(parsedline->databases_re, re);
+			else
+				return NULL;
+		}
 		parsedline->databases = lappend(parsedline->databases,
 										copy_auth_token(lfirst(tokencell)));
 	}
@@ -1069,9 +1136,27 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		return NULL;
 	}
 	parsedline->roles = NIL;
+	parsedline->roles_re = NIL;
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
+		AuthToken  *tok = lfirst(tokencell);
+
+		if (!tok->quoted && tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				parsedline->roles_re = lappend(parsedline->roles_re, re);
+			else
+				return NULL;
+		}
 		parsedline->roles = lappend(parsedline->roles,
 									copy_auth_token(lfirst(tokencell)));
 	}
@@ -1120,6 +1205,8 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		}
 		else
 		{
+			bool is_regexp = token->string[0] == '/' ? true : false;
+
 			/* IP and netmask are specified */
 			parsedline->ip_cmp_method = ipCmpMask;
 
@@ -1127,9 +1214,12 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			str = pstrdup(token->string);
 
 			/* Check if it has a CIDR suffix and if so isolate it */
-			cidr_slash = strchr(str, '/');
-			if (cidr_slash)
-				*cidr_slash = '\0';
+			if (!is_regexp)
+			{
+				cidr_slash = strchr(str, '/');
+				if (cidr_slash)
+					*cidr_slash = '\0';
+			}
 
 			/* Get the IP address either way */
 			hints.ai_flags = AI_NUMERICHOST;
@@ -1168,7 +1258,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			pg_freeaddrinfo_all(hints.ai_family, gai_result);
 
 			/* Get the netmask */
-			if (cidr_slash)
+			if (cidr_slash && !is_regexp)
 			{
 				if (parsedline->hostname)
 				{
@@ -1199,7 +1289,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 				parsedline->masklen = parsedline->addrlen;
 				pfree(str);
 			}
-			else if (!parsedline->hostname)
+			else if (!parsedline->hostname && !is_regexp)
 			{
 				/* Read the mask field. */
 				pfree(str);
@@ -1261,6 +1351,18 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 					return NULL;
 				}
 			}
+			else if (is_regexp)
+			{
+				/*
+				 * When token->string starts with a slash, treat it as a
+				 * regular expression. Pre-compile it.
+				 */
+				if (!token_regcomp(&parsedline->hostname_re,
+								   token->string + 1, HbaFileName,
+								   line_num, err_msg, elevel))
+					return NULL;
+				parsedline->hostname = str;
+			}
 		}
 	}							/* != ctLocal */
 
@@ -2135,7 +2237,8 @@ check_hba(hbaPort *port)
 					if (hba->hostname)
 					{
 						if (!check_hostname(port,
-											hba->hostname))
+											hba->hostname, hba->hostname_re))
+
 							continue;
 					}
 					else
@@ -2162,10 +2265,10 @@ check_hba(hbaPort *port)
 
 		/* Check database and role */
 		if (!check_db(port->database_name, port->user_name, roleid,
-					  hba->databases))
+					  hba->databases, hba->databases_re))
 			continue;
 
-		if (!check_role(port->user_name, roleid, hba->roles))
+		if (!check_role(port->user_name, roleid, hba->roles, hba->roles_re))
 			continue;
 
 		/* Found a record that matched! */
@@ -2342,34 +2445,9 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 		 * When system username starts with a slash, treat it as a regular
 		 * expression. Pre-compile it.
 		 */
-		int			r;
-		pg_wchar   *wstr;
-		int			wlen;
-
-		wstr = palloc((strlen(parsedline->ident_user + 1) + 1) * sizeof(pg_wchar));
-		wlen = pg_mb2wchar_with_len(parsedline->ident_user + 1,
-									wstr, strlen(parsedline->ident_user + 1));
-
-		r = pg_regcomp(&parsedline->re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
-		if (r)
-		{
-			char		errstr[100];
-
-			pg_regerror(r, &parsedline->re, errstr, sizeof(errstr));
-			ereport(elevel,
-					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-					 errmsg("invalid regular expression \"%s\": %s",
-							parsedline->ident_user + 1, errstr),
-					 errcontext("line %d of configuration file \"%s\"",
-							line_num, IdentFileName)));
-
-			*err_msg = psprintf("invalid regular expression \"%s\": %s",
-								parsedline->ident_user + 1, errstr);
-
-			pfree(wstr);
+		if (!token_regcomp(&parsedline->re, parsedline->ident_user + 1,
+						   IdentFileName, line_num, err_msg, elevel))
 			return NULL;
-		}
-		pfree(wstr);
 	}
 
 	return parsedline;
@@ -2706,3 +2784,68 @@ hba_authname(UserAuth auth_method)
 
 	return UserAuthName[auth_method];
 }
+
+/*
+ * Compile the regular expression "re" and return whether it compiles
+ * successfully or not.
+ *
+ * If not, the last 4 parameters are used to add extra details while reporting
+ * the error.
+ */
+static bool
+token_regcomp(regex_t *re, char *string, char *filename, int line_num,
+			  char **err_msg, int elevel)
+{
+	int			r;
+	pg_wchar   *wstr;
+	int			wlen;
+
+	wstr = palloc((strlen(string) + 1) * sizeof(pg_wchar));
+	wlen = pg_mb2wchar_with_len(string,
+								wstr, strlen(string));
+
+	r = pg_regcomp(re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
+	if (r)
+	{
+		char		errstr[100];
+
+		pg_regerror(r, re, errstr, sizeof(errstr));
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+				 errmsg("invalid regular expression \"%s\": %s",
+						string, errstr),
+				 errcontext("line %d of configuration file \"%s\"",
+							line_num, filename)));
+
+		*err_msg = psprintf("invalid regular expression \"%s\": %s",
+							string, errstr);
+
+		pfree(wstr);
+		return false;
+	}
+
+	pfree(wstr);
+	return true;
+}
+
+/*
+ * Return whether "match" is matching the regular expression "re" or not.
+ */
+static bool
+token_regexec(const char *match, regex_t *re)
+{
+	pg_wchar   *wmatchstr;
+	int			wmatchlen;
+
+	wmatchstr = palloc((strlen(match) + 1) * sizeof(pg_wchar));
+	wmatchlen = pg_mb2wchar_with_len(match, wmatchstr, strlen(match));
+
+	if (pg_regexec(re, wmatchstr, wmatchlen, 0, NULL, 0, NULL, 0) == REG_OKAY)
+	{
+		pfree(wmatchstr);
+		return true;
+	}
+
+	pfree(wmatchstr);
+	return false;
+}
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 9e02697355..453dafffdb 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -81,6 +81,9 @@ $node->safe_psql(
 	 GRANT ALL ON sysuser_data TO md5_role;");
 $ENV{"PGPASSWORD"} = 'pass';
 
+# Create a database to test regular expression
+$node->safe_psql('postgres', "CREATE database regex_testdb;");
+
 # For "trust" method, all users should be able to connect. These users are not
 # considered to be authenticated.
 reset_pg_hba($node, 'local','all', 'all', 'trust');
@@ -200,4 +203,43 @@ append_to_file(
 
 test_conn($node, 'user=md5_role', 'password from pgpass', 0);
 
+# Testing with regular expression for username.  Note that the first regex
+# matches in this case.
+reset_pg_hba($node, 'local', 'all', '/^.*nomatch.*$, baduser, /^md.*$',
+	     'password');
+test_conn($node, 'user=md5_role', 'password, matching regexp for username', 0);
+# The third regex does not match anymore.
+reset_pg_hba($node, 'local', 'all', '/^.*nomatch.*$, baduser, /^m_d.*$',
+	     'password');
+test_conn($node, 'user=md5_role', 'password, non matching regexp for username', 2,
+              log_unlike => [qr/connection authenticated:/]);
+
+# Testing with regular expression for dbname.  The third regex matches.
+reset_pg_hba($node, 'local', '/^.*nomatch.*$, baddb, /^regex_t.*b$', 'all', 'password');
+test_conn($node, 'user=md5_role dbname=regex_testdb', 'password,
+	matching regexp for dbname', 0);
+# The third regex does not match anymore.
+reset_pg_hba($node, 'local', '/^.*nomatch.*$, baddb, /^regex_t.*ba$', 'all', 'password');
+test_conn($node, 'user=md5_role dbname=regex_testdb', 'password, non matching regexp for dbname', 2,
+              log_unlike => [qr/connection authenticated:/]);
+
+# Testing with regular expression for hostname
+SKIP:
+{
+	# Being able to do a reverse lookup of a hostname on Windows for localhost
+	# is not guaranteed on all environments by default.
+	# So, skip the regular expression test for hostname on Windows.
+	skip "Regular expression for hostname not tested on Windows", 2 if ($windows_os);
+
+	# XXX: this makes the test unsecure.
+	$node->append_conf('postgresql.conf', "listen_addresses = '127.0.0.1'\n");
+	$node->restart;
+
+	reset_pg_hba($node, 'host', 'all', 'all  /^.*$', 'password');
+	test_conn($node, 'user=md5_role host=localhost', 'password, matching regexp for hostname', 0);
+
+	reset_pg_hba($node, 'host', 'all', 'all  /^$', 'password');
+	test_conn($node, 'user=md5_role host=localhost', 'password, non matching regexp for hostname', 2);
+}
+
 done_testing();
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..28343445be 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        logical replication connections do specify it.
        Otherwise, this is the name of
        a specific <productname>PostgreSQL</productname> database.
-       Multiple database names can be supplied by separating them with
-       commas.  A separate file containing database names can be specified by
+       Multiple database names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
+       A separate file containing database names can be specified by
        preceding the file name with <literal>@</literal>.
       </para>
      </listitem>
@@ -249,7 +250,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        Specifies which database user name(s) this record
        matches. The value <literal>all</literal> specifies that it
        matches all users.  Otherwise, this is either the name of a specific
-       database user, or a group name preceded by <literal>+</literal>.
+       database user, a regular expression preceded by <literal>/</literal>
+       or a group name preceded by <literal>+</literal>.
        (Recall that there is no real distinction between users and groups
        in <productname>PostgreSQL</productname>; a <literal>+</literal> mark really means
        <quote>match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        considered to be a member of a role if they are explicitly a member
        of the role, directly or indirectly, and not just by virtue of
        being a superuser.
-       Multiple user names can be supplied by separating them with commas.
+       Multiple user names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
        A separate file containing user names can be specified by preceding the
        file name with <literal>@</literal>.
       </para>
@@ -270,8 +273,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
      <listitem>
       <para>
        Specifies the client machine address(es) that this record
-       matches.  This field can contain either a host name, an IP
-       address range, or one of the special key words mentioned below.
+       matches.  This field can contain either a host name, a regular expression
+       preceded by <literal>/</literal> representing host names, an IP address range,
+       or one of the special key words mentioned below.
       </para>
 
       <para>
@@ -785,16 +789,18 @@ host    all             all             192.168.12.10/32        gss
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             192.168.0.0/16          ident map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for administrators,
+# users finishing with "helpdesk" and members of role "support",
+# who can connect to all databases.
+# The file$PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 local   sameuser        all                                     md5
 local   all             @admins                                 md5
+local   all             /^.*helpdesk$                           md5
 local   all             +support                                md5
 
 # The last two lines above can be combined into a single line:
-- 
2.37.3

#18

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

over 3 years ago

In reply to: Michael Paquier (#17)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/5/22 9:24 AM, Michael Paquier wrote:

Something that stood out on a first review is the refactoring of
001_password.pl that can be done independently of the main patch:

Good idea, thanks for the proposal.

- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts. The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently,

0001 looks good to me.

reducing the
footprint of 0002, which is your previous patch left untouched
(mostly!).

Thanks! I'll look at it and the comments you just made up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#19

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Drouvot, Bertrand (#18)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote:

On 10/5/22 9:24 AM, Michael Paquier wrote:

- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts. The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently,

0001 looks good to me.

Thanks. I have applied this refactoring, leaving the host part out of
the equation as we should rely only on local connections for this
part of the test. The best fit I can think about for the checks on
the hostname patterns would be either the ssl, ldap or krb5 tests.
SSL is more widely tested than the two others.

reducing the
footprint of 0002, which is your previous patch left untouched
(mostly!).

Thanks! I'll look at it and the comments you just made up-thread.

Cool, thanks. One thing that matters a lot IMO (that I forgot to
mention previously) is to preserve the order of the items parsed from
the configuration files.

Also, I am wondering whether we'd better have some regression tests
where a regex includes a comma and a role name itself has a comma,
actually, just to stress more the parsing of individual items in the
HBA file.
--
Michael

#20

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

over 3 years ago

In reply to: Michael Paquier (#17)

1 attachment(s)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/5/22 9:24 AM, Michael Paquier wrote:

On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote:
Anyway, I have looked at the patch.

+ List *roles_re;
+ List *databases_re;
+ regex_t hostname_re;
I am surprised by the approach of using separate lists for the regular
expressions and the raw names. Wouldn't it be better to store
everything in a single list but assign an entry type? In this case it
would be either regex or plain string. This would minimize the
footprint of the changes (no extra arguments *_re in the routines
checking for a match on the roles, databases or hosts). And it seems
to me that this would make unnecessary the use of re_num here and
there.

Please find attached v5 addressing this. I started with an union but it
turns out that we still need the plain string when a regex is used. This
is not needed for the authentication per say but for fill_hba_line(). So
I ended up creating a new struct without union in v5.

The hostname is different, of course, requiring only an extra
field for its type, or something like that.

I'm using the same new struct as described above for the hostname.

Perhaps the documentation would gain in clarity if there were more
examples, like a set of comma-separated examples (mix of regex and raw
strings for example, for all the field types that gain support for
regexes)?

Right, I added more examples in v5.

-$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->append_conf(
+    'postgresql.conf', qq{
+listen_addresses = '127.0.0.1'
+log_connections = on
+});
Hmm.  I think that we may need to reconsider the location of the tests
for the regexes with the host name, as the "safe" regression tests
should not switch listen_addresses.  One location where we already do
that is src/test/ssl/, so these could be moved there.

Good point, I moved the hostname related tests in src/test/ssl.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v5-0001-hba_with_regexp.patchtext/plain; charset=UTF-8; name=v5-0001-hba_with_regexp.patchDownload

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..406628ef35 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        logical replication connections do specify it.
        Otherwise, this is the name of
        a specific <productname>PostgreSQL</productname> database.
-       Multiple database names can be supplied by separating them with
-       commas.  A separate file containing database names can be specified by
+       Multiple database names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
+       A separate file containing database names can be specified by
        preceding the file name with <literal>@</literal>.
       </para>
      </listitem>
@@ -249,7 +250,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        Specifies which database user name(s) this record
        matches. The value <literal>all</literal> specifies that it
        matches all users.  Otherwise, this is either the name of a specific
-       database user, or a group name preceded by <literal>+</literal>.
+       database user, a regular expression preceded by <literal>/</literal>
+       or a group name preceded by <literal>+</literal>.
        (Recall that there is no real distinction between users and groups
        in <productname>PostgreSQL</productname>; a <literal>+</literal> mark really means
        <quote>match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        considered to be a member of a role if they are explicitly a member
        of the role, directly or indirectly, and not just by virtue of
        being a superuser.
-       Multiple user names can be supplied by separating them with commas.
+       Multiple user names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
        A separate file containing user names can be specified by preceding the
        file name with <literal>@</literal>.
       </para>
@@ -270,8 +273,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
      <listitem>
       <para>
        Specifies the client machine address(es) that this record
-       matches.  This field can contain either a host name, an IP
-       address range, or one of the special key words mentioned below.
+       matches.  This field can contain either a host name, a regular expression
+       preceded by <literal>/</literal> representing host names, an IP address range,
+       or one of the special key words mentioned below.
       </para>
 
       <para>
@@ -739,6 +743,24 @@ host    all             all             ::1/128                 trust
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             localhost               trust
 
+# The same using a regular expression for host name, which allows connection for
+# host name ending with "test".
+#
+# TYPE  DATABASE        USER            ADDRESS                 METHOD
+host    all             all             /^.*test$               trust
+
+# The same using regular expression for DATABASE, which allows connection to the
+# db1 and testdb databases and any database with a name ending with "test".
+#
+# TYPE  DATABASE               USER            ADDRESS                 METHOD
+local   db1,/^.*test$,testdb   all             /^.*test$               trust
+
+# The same using regular expression for USER, which allows connection to the
+# user1 and testuser users and any user with a name ending with "test".
+#
+# TYPE  DATABASE                 USER                              ADDRESS                 METHOD
+local   db1,/^.*test$,testdb     user1,/^.*test$,testuser          /^.*test$               trust
+
 # Allow any user from any host with IP address 192.168.93.x to connect
 # to database "postgres" as the same user name that ident reports for
 # the connection (typically the operating system user name).
@@ -785,16 +807,18 @@ host    all             all             192.168.12.10/32        gss
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             192.168.0.0/16          ident map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for administrators,
+# users ending with "helpdesk" and members of role "support",
+# who can connect to all databases.
+# The file$PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 local   sameuser        all                                     md5
 local   all             @admins                                 md5
+local   all             /^.*helpdesk$                           md5
 local   all             +support                                md5
 
 # The last two lines above can be combined into a single line:
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4637426d62..6fe8c40098 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -66,6 +66,7 @@ typedef struct check_network_data
 } check_network_data;
 
 
+#define token_is_regexp(t)	(t->is_regex)
 #define token_is_keyword(t, k)	(!t->quoted && strcmp(t->string, k) == 0)
 #define token_matches(t, k)  (strcmp(t->string, k) == 0)
 
@@ -117,6 +118,9 @@ static List *tokenize_inc_file(List *tokens, const char *outer_filename,
 							   const char *inc_filename, int elevel, char **err_msg);
 static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 							   int elevel, char **err_msg);
+static bool token_regcomp(regex_t *re, char *string, char *filename,
+						  int line_num, char **err_msg, int elevel);
+static bool token_regexec(const char *match, regex_t *re);
 
 
 /*
@@ -574,24 +578,30 @@ is_member(Oid userid, const char *role)
 }
 
 /*
- * Check AuthToken list for a match to role, allowing group names.
+ * Check AuthToken list for a match to role.
+ * We are allowing group names and regular expressions.
  */
 static bool
 check_role(const char *role, Oid roleid, List *tokens)
 {
 	ListCell   *cell;
-	AuthToken  *tok;
+	AuthTokenOrRegex *tokreg;
 
 	foreach(cell, tokens)
 	{
-		tok = lfirst(cell);
-		if (!tok->quoted && tok->string[0] == '+')
+		tokreg = lfirst(cell);
+		if (!token_is_regexp(tokreg))
 		{
-			if (is_member(roleid, tok->string + 1))
+			if (!tokreg->authtoken->quoted && tokreg->authtoken->string[0] == '+')
+			{
+				if (is_member(roleid, tokreg->authtoken->string + 1))
+					return true;
+			}
+			else if (token_matches(tokreg->authtoken, role) ||
+					 token_is_keyword(tokreg->authtoken, "all"))
 				return true;
 		}
-		else if (token_matches(tok, role) ||
-				 token_is_keyword(tok, "all"))
+		else if (token_regexec(role, tokreg->regex))
 			return true;
 	}
 	return false;
@@ -604,36 +614,41 @@ static bool
 check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
 {
 	ListCell   *cell;
-	AuthToken  *tok;
+	AuthTokenOrRegex *tokreg;
 
 	foreach(cell, tokens)
 	{
-		tok = lfirst(cell);
-		if (am_walsender && !am_db_walsender)
-		{
-			/*
-			 * physical replication walsender connections can only match
-			 * replication keyword
-			 */
-			if (token_is_keyword(tok, "replication"))
-				return true;
-		}
-		else if (token_is_keyword(tok, "all"))
-			return true;
-		else if (token_is_keyword(tok, "sameuser"))
+		tokreg = lfirst(cell);
+		if (!token_is_regexp(tokreg))
 		{
-			if (strcmp(dbname, role) == 0)
+			if (am_walsender && !am_db_walsender)
+			{
+				/*
+				 * physical replication walsender connections can only match
+				 * replication keyword
+				 */
+				if (token_is_keyword(tokreg->authtoken, "replication"))
+					return true;
+			}
+			else if (token_is_keyword(tokreg->authtoken, "all"))
 				return true;
-		}
-		else if (token_is_keyword(tok, "samegroup") ||
-				 token_is_keyword(tok, "samerole"))
-		{
-			if (is_member(roleid, dbname))
+			else if (token_is_keyword(tokreg->authtoken, "sameuser"))
+			{
+				if (strcmp(dbname, role) == 0)
+					return true;
+			}
+			else if (token_is_keyword(tokreg->authtoken, "samegroup") ||
+					 token_is_keyword(tokreg->authtoken, "samerole"))
+			{
+				if (is_member(roleid, dbname))
+					return true;
+			}
+			else if (token_is_keyword(tokreg->authtoken, "replication"))
+				continue;		/* never match this if not walsender */
+			else if (token_matches(tokreg->authtoken, dbname))
 				return true;
 		}
-		else if (token_is_keyword(tok, "replication"))
-			continue;			/* never match this if not walsender */
-		else if (token_matches(tok, dbname))
+		else if (token_regexec(dbname, tokreg->regex))
 			return true;
 	}
 	return false;
@@ -681,7 +696,7 @@ hostname_match(const char *pattern, const char *actual_hostname)
  * Check to see if a connecting IP matches a given host name.
  */
 static bool
-check_hostname(hbaPort *port, const char *hostname)
+check_hostname(hbaPort *port, const AuthTokenOrRegex *tok_hostname)
 {
 	struct addrinfo *gai_result,
 			   *gai;
@@ -712,8 +727,13 @@ check_hostname(hbaPort *port, const char *hostname)
 		port->remote_hostname = pstrdup(remote_hostname);
 	}
 
+	if (token_is_regexp(tok_hostname))
+	{
+		if (!token_regexec(port->remote_hostname, tok_hostname->regex))
+			return false;
+	}
 	/* Now see if remote host name matches this pg_hba line */
-	if (!hostname_match(hostname, port->remote_hostname))
+	else if (!hostname_match(tok_hostname->authtoken->string, port->remote_hostname))
 		return false;
 
 	/* If we already verified the forward lookup, we're done */
@@ -761,7 +781,7 @@ check_hostname(hbaPort *port, const char *hostname)
 
 	if (!found)
 		elog(DEBUG2, "pg_hba.conf host name \"%s\" rejected because address resolution did not return a match with IP address of client",
-			 hostname);
+			 tok_hostname->authtoken->string);
 
 	port->remote_hostname_resolv = found ? +1 : -1;
 
@@ -939,13 +959,13 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	struct addrinfo *gai_result;
 	struct addrinfo hints;
 	int			ret;
-	char	   *cidr_slash;
 	char	   *unsupauth;
 	ListCell   *field;
 	List	   *tokens;
 	ListCell   *tokencell;
 	AuthToken  *token;
 	HbaLine    *parsedline;
+	char	   *cidr_slash = NULL;	/* keep compiler quiet */
 
 	parsedline = palloc0(sizeof(HbaLine));
 	parsedline->linenumber = line_num;
@@ -1052,8 +1072,31 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
-		parsedline->databases = lappend(parsedline->databases,
-										copy_auth_token(lfirst(tokencell)));
+		AuthTokenOrRegex *tokreg;
+		AuthToken  *tok = lfirst(tokencell);
+
+		tokreg = (AuthTokenOrRegex *) palloc0(sizeof(AuthTokenOrRegex));
+		tokreg->authtoken = copy_auth_token(lfirst(tokencell));
+		if (tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			tokreg->is_regex = true;
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				tokreg->regex = re;
+			else
+				return NULL;
+		}
+		else
+			tokreg->is_regex = false;
+
+		parsedline->databases = lappend(parsedline->databases, tokreg);
 	}
 
 	/* Get the roles. */
@@ -1072,8 +1115,31 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
-		parsedline->roles = lappend(parsedline->roles,
-									copy_auth_token(lfirst(tokencell)));
+		AuthTokenOrRegex *tokreg;
+		AuthToken  *tok = lfirst(tokencell);
+
+		tokreg = (AuthTokenOrRegex *) palloc0(sizeof(AuthTokenOrRegex));
+		tokreg->authtoken = copy_auth_token(lfirst(tokencell));
+		if (tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			tokreg->is_regex = true;
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				tokreg->regex = re;
+			else
+				return NULL;
+		}
+		else
+			tokreg->is_regex = false;
+
+		parsedline->roles = lappend(parsedline->roles, tokreg);
 	}
 
 	if (parsedline->conntype != ctLocal)
@@ -1120,6 +1186,8 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		}
 		else
 		{
+			bool		is_regexp = token->string[0] == '/' ? true : false;
+
 			/* IP and netmask are specified */
 			parsedline->ip_cmp_method = ipCmpMask;
 
@@ -1127,9 +1195,12 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			str = pstrdup(token->string);
 
 			/* Check if it has a CIDR suffix and if so isolate it */
-			cidr_slash = strchr(str, '/');
-			if (cidr_slash)
-				*cidr_slash = '\0';
+			if (!is_regexp)
+			{
+				cidr_slash = strchr(str, '/');
+				if (cidr_slash)
+					*cidr_slash = '\0';
+			}
 
 			/* Get the IP address either way */
 			hints.ai_flags = AI_NUMERICHOST;
@@ -1149,7 +1220,16 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 				parsedline->addrlen = gai_result->ai_addrlen;
 			}
 			else if (ret == EAI_NONAME)
-				parsedline->hostname = str;
+			{
+				parsedline->tok_hostname.is_regex = false;
+
+				/*
+				 * This is ok to copy the token->string and not str here, as
+				 * we'll error and report "specifying both host name and CIDR
+				 * mask is invalid" below should they differ.
+				 */
+				parsedline->tok_hostname.authtoken = copy_auth_token(token);
+			}
 			else
 			{
 				ereport(elevel,
@@ -1168,9 +1248,9 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			pg_freeaddrinfo_all(hints.ai_family, gai_result);
 
 			/* Get the netmask */
-			if (cidr_slash)
+			if (cidr_slash && !is_regexp)
 			{
-				if (parsedline->hostname)
+				if (parsedline->tok_hostname.authtoken)
 				{
 					ereport(elevel,
 							(errcode(ERRCODE_CONFIG_FILE_ERROR),
@@ -1199,7 +1279,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 				parsedline->masklen = parsedline->addrlen;
 				pfree(str);
 			}
-			else if (!parsedline->hostname)
+			else if (!parsedline->tok_hostname.authtoken && !is_regexp)
 			{
 				/* Read the mask field. */
 				pfree(str);
@@ -1261,9 +1341,25 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 					return NULL;
 				}
 			}
+			else if (is_regexp)
+			{
+				/*
+				 * When token->string starts with a slash, treat it as a
+				 * regular expression. Pre-compile it.
+				 */
+				regex_t    *re;
+
+				re = (regex_t *) palloc(sizeof(regex_t));
+				parsedline->tok_hostname.is_regex = true;
+				if (!token_regcomp(re,
+								   token->string + 1, HbaFileName,
+								   line_num, err_msg, elevel))
+					return NULL;
+
+				parsedline->tok_hostname.regex = re;
+			}
 		}
 	}							/* != ctLocal */
-
 	/* Get the authentication method */
 	field = lnext(tok_line->fields, field);
 	if (!field)
@@ -2132,10 +2228,11 @@ check_hba(hbaPort *port)
 			switch (hba->ip_cmp_method)
 			{
 				case ipCmpMask:
-					if (hba->hostname)
+					if (hba->tok_hostname.authtoken || hba->tok_hostname.is_regex)
 					{
 						if (!check_hostname(port,
-											hba->hostname))
+											&hba->tok_hostname))
+
 							continue;
 					}
 					else
@@ -2342,34 +2439,9 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 		 * When system username starts with a slash, treat it as a regular
 		 * expression. Pre-compile it.
 		 */
-		int			r;
-		pg_wchar   *wstr;
-		int			wlen;
-
-		wstr = palloc((strlen(parsedline->ident_user + 1) + 1) * sizeof(pg_wchar));
-		wlen = pg_mb2wchar_with_len(parsedline->ident_user + 1,
-									wstr, strlen(parsedline->ident_user + 1));
-
-		r = pg_regcomp(&parsedline->re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
-		if (r)
-		{
-			char		errstr[100];
-
-			pg_regerror(r, &parsedline->re, errstr, sizeof(errstr));
-			ereport(elevel,
-					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-					 errmsg("invalid regular expression \"%s\": %s",
-							parsedline->ident_user + 1, errstr),
-					 errcontext("line %d of configuration file \"%s\"",
-							line_num, IdentFileName)));
-
-			*err_msg = psprintf("invalid regular expression \"%s\": %s",
-								parsedline->ident_user + 1, errstr);
-
-			pfree(wstr);
+		if (!token_regcomp(&parsedline->re, parsedline->ident_user + 1,
+						   IdentFileName, line_num, err_msg, elevel))
 			return NULL;
-		}
-		pfree(wstr);
 	}
 
 	return parsedline;
@@ -2706,3 +2778,68 @@ hba_authname(UserAuth auth_method)
 
 	return UserAuthName[auth_method];
 }
+
+/*
+ * Compile the regular expression "re" and return whether it compiles
+ * successfully or not.
+ *
+ * If not, the last 4 parameters are used to add extra details while reporting
+ * the error.
+ */
+static bool
+token_regcomp(regex_t *re, char *string, char *filename, int line_num,
+			  char **err_msg, int elevel)
+{
+	int			r;
+	pg_wchar   *wstr;
+	int			wlen;
+
+	wstr = palloc((strlen(string) + 1) * sizeof(pg_wchar));
+	wlen = pg_mb2wchar_with_len(string,
+								wstr, strlen(string));
+
+	r = pg_regcomp(re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
+	if (r)
+	{
+		char		errstr[100];
+
+		pg_regerror(r, re, errstr, sizeof(errstr));
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+				 errmsg("invalid regular expression \"%s\": %s",
+						string, errstr),
+				 errcontext("line %d of configuration file \"%s\"",
+							line_num, filename)));
+
+		*err_msg = psprintf("invalid regular expression \"%s\": %s",
+							string, errstr);
+
+		pfree(wstr);
+		return false;
+	}
+
+	pfree(wstr);
+	return true;
+}
+
+/*
+ * Return whether "match" is matching the regular expression "re" or not.
+ */
+static bool
+token_regexec(const char *match, regex_t *re)
+{
+	pg_wchar   *wmatchstr;
+	int			wmatchlen;
+
+	wmatchstr = palloc((strlen(match) + 1) * sizeof(pg_wchar));
+	wmatchlen = pg_mb2wchar_with_len(match, wmatchstr, strlen(match));
+
+	if (pg_regexec(re, wmatchstr, wmatchlen, 0, NULL, 0, NULL, 0) == REG_OKAY)
+	{
+		pfree(wmatchstr);
+		return true;
+	}
+
+	pfree(wmatchstr);
+	return false;
+}
diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c
index 9e5794071c..2b7ab0bd63 100644
--- a/src/backend/utils/adt/hbafuncs.c
+++ b/src/backend/utils/adt/hbafuncs.c
@@ -242,9 +242,9 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
 
 			foreach(lc, hba->databases)
 			{
-				AuthToken  *tok = lfirst(lc);
+				AuthTokenOrRegex *tok = lfirst(lc);
 
-				names = lappend(names, tok->string);
+				names = lappend(names, tok->authtoken->string);
 			}
 			values[index++] = PointerGetDatum(strlist_to_textarray(names));
 		}
@@ -259,9 +259,9 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
 
 			foreach(lc, hba->roles)
 			{
-				AuthToken  *tok = lfirst(lc);
+				AuthTokenOrRegex *tok = lfirst(lc);
 
-				roles = lappend(roles, tok->string);
+				roles = lappend(roles, tok->authtoken->string);
 			}
 			values[index++] = PointerGetDatum(strlist_to_textarray(roles));
 		}
@@ -274,10 +274,8 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
 		switch (hba->ip_cmp_method)
 		{
 			case ipCmpMask:
-				if (hba->hostname)
-				{
-					addrstr = hba->hostname;
-				}
+				if (hba->tok_hostname.authtoken)
+					addrstr = hba->tok_hostname.authtoken->string;
 				else
 				{
 					/*
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index d06da81806..7a53a31771 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -77,6 +77,32 @@ typedef enum ClientCertName
 	clientCertDN
 } ClientCertName;
 
+/*
+ * A single string token lexed from an authentication configuration file
+ * (pg_ident.conf or pg_hba.conf), together with whether the token has
+ * been quoted.
+ */
+typedef struct AuthToken
+{
+	char	   *string;
+	bool		quoted;
+} AuthToken;
+
+/*
+ * Distinguish the case a token has to be treated as a regular
+ * expression or not.
+ */
+typedef struct AuthTokenOrRegex
+{
+	bool		is_regex;
+
+	/*
+	 * Not an union as we still need the token string for fill_hba_line().
+	 */
+	AuthToken  *authtoken;
+	regex_t    *regex;
+} AuthTokenOrRegex;
+
 typedef struct HbaLine
 {
 	int			linenumber;
@@ -89,7 +115,7 @@ typedef struct HbaLine
 	struct sockaddr_storage mask;
 	int			masklen;		/* zero if we don't have a valid mask */
 	IPCompareMethod ip_cmp_method;
-	char	   *hostname;
+	AuthTokenOrRegex tok_hostname;
 	UserAuth	auth_method;
 	char	   *usermap;
 	char	   *pamservice;
@@ -132,17 +158,6 @@ typedef struct IdentLine
 	regex_t		re;
 } IdentLine;
 
-/*
- * A single string token lexed from an authentication configuration file
- * (pg_ident.conf or pg_hba.conf), together with whether the token has
- * been quoted.
- */
-typedef struct AuthToken
-{
-	char	   *string;
-	bool		quoted;
-} AuthToken;
-
 /*
  * TokenizedAuthLine represents one line lexed from an authentication
  * configuration file.  Each item in the "fields" list is a sub-list of
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 93df77aa4e..5063a2601c 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -81,6 +81,14 @@ $node->safe_psql(
 	 GRANT ALL ON sysuser_data TO md5_role;");
 $ENV{"PGPASSWORD"} = 'pass';
 
+# Create a role that contains a comma to stress the parsing.
+$node->safe_psql('postgres',
+	q{SET password_encryption='md5'; CREATE ROLE "md5,role" LOGIN PASSWORD 'pass';}
+);
+
+# Create a database to test regular expression.
+$node->safe_psql('postgres', "CREATE database regex_testdb;");
+
 # For "trust" method, all users should be able to connect. These users are not
 # considered to be authenticated.
 reset_pg_hba($node, 'all', 'all', 'trust');
@@ -200,4 +208,37 @@ append_to_file(
 
 test_conn($node, 'user=md5_role', 'password from pgpass', 0);
 
+# Testing with regular expression for username. Note that the third regex
+# matches in this case.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^md.*$', 'password');
+test_conn($node, 'user=md5_role', 'password, matching regexp for username',
+	0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^m_d.*$', 'password');
+test_conn($node, 'user=md5_role',
+	'password, non matching regexp for username',
+	2, log_unlike => [qr/connection authenticated:/]);
+
+# test with a comma in the regular expression
+reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
+test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
+	0);
+
+# Testing with regular expression for dbname. The third regex matches.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*b$', 'all',
+	'password');
+test_conn(
+	$node, 'user=md5_role dbname=regex_testdb', 'password,
+	matching regexp for dbname', 0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*ba$',
+	'all', 'password');
+test_conn(
+	$node,
+	'user=md5_role dbname=regex_testdb',
+	'password, non matching regexp for dbname',
+	2, log_unlike => [qr/connection authenticated:/]);
+
 done_testing();
diff --git a/src/test/ssl/t/002_scram.pl b/src/test/ssl/t/002_scram.pl
index deaa4aa086..b575557b37 100644
--- a/src/test/ssl/t/002_scram.pl
+++ b/src/test/ssl/t/002_scram.pl
@@ -22,7 +22,8 @@ if ($ENV{with_ssl} ne 'openssl')
 }
 elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
 {
-	plan skip_all => 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
+	plan skip_all =>
+	  'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
 }
 
 my $ssl_server = SSL::Server->new();
@@ -37,6 +38,20 @@ sub switch_server_cert
 	$ssl_server->switch_server_cert(@_);
 }
 
+# Delete pg_hba.conf from the given node, add a new entry to it
+# and then execute a reload to refresh it.
+sub reset_pg_hba
+{
+	my $node     = shift;
+	my $hostname = shift;
+
+	unlink($node->data_dir . '/pg_hba.conf');
+	# just for testing purposes, use a continuation line
+	$node->append_conf('pg_hba.conf', "host all all $hostname scram-sha-256");
+	$node->reload;
+	return;
+}
+
 
 # This is the hostname used to connect to the server.
 my $SERVERHOSTADDR = '127.0.0.1';
@@ -136,4 +151,25 @@ $node->connect_ok(
 		qr/connection authenticated: identity="ssltestuser" method=scram-sha-256/
 	]);
 
+# Testing with regular expression for hostname
+SKIP:
+{
+	# Being able to do a reverse lookup of a hostname on Windows for localhost
+	# is not guaranteed on all environments by default.
+	# So, skip the regular expression test for hostname on Windows.
+	skip "Regular expression for hostname not tested on Windows", 2
+	  if ($windows_os);
+
+	# Test regular expression on hostname, this one matches any host.
+	reset_pg_hba($node, '/^.*$');
+	$node->connect_ok("$common_connstr user=ssltestuser",
+		"Basic SCRAM authentication with SSL matching regexp on hostname");
+	# Test regular expression on hostname, this one does not match.
+	reset_pg_hba($node, '/^$');
+	$node->connect_fails(
+		"$common_connstr user=ssltestuser",
+		"Basic SCRAM authentication with SSL non matching regexp on hostname",
+		log_like => [ qr/no pg_hba.conf entry for host/ ]);
+}
+
 done_testing();

#21

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

over 3 years ago

In reply to: Michael Paquier (#19)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/6/22 2:53 AM, Michael Paquier wrote:

On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote:

On 10/5/22 9:24 AM, Michael Paquier wrote:

- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts. The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently,

0001 looks good to me.

Thanks. I have applied this refactoring, leaving the host part out of
the equation as we should rely only on local connections for this
part of the test.

Thanks!

Thanks! I'll look at it and the comments you just made up-thread.

Cool, thanks. One thing that matters a lot IMO (that I forgot to
mention previously) is to preserve the order of the items parsed from
the configuration files.

Fully agree, all the versions that have been submitted in this thread
preserves the ordering.

Also, I am wondering whether we'd better have some regression tests
where a regex includes a comma and a role name itself has a comma,
actually, just to stress more the parsing of individual items in the
HBA file.

Good idea, it has been added in v5 just shared up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#22

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Drouvot, Bertrand (#20)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote:

foreach(cell, tokens)
{
[...]
+		tokreg = lfirst(cell);
+		if (!token_is_regexp(tokreg))
{
-			if (strcmp(dbname, role) == 0)
+			if (am_walsender && !am_db_walsender)
+			{
+				/*
+				 * physical replication walsender connections can only match
+				 * replication keyword
+				 */
+				if (token_is_keyword(tokreg->authtoken, "replication"))
+					return true;
+			}
+			else if (token_is_keyword(tokreg->authtoken, "all"))
return true;

When checking the list of databases in check_db(), physical WAL
senders (aka am_walsender && !am_db_walsender) would be able to accept
regexps, but these should only accept "replication" and never a
regexp, no? The second check on "replication" placed in the branch
for token_is_regexp() in your patch would be correctly placed, though.
This is kind of special in the HBA logic, coming back to 9.0 where
physical replication and this special role property have been
introduced. WAL senders have gained an actual database property later
on in 9.4 with logical decoding, keeping "replication" for
compatibility (connection strings can use replication=database to
connect as a non-physical WAL sender and connect to a specific
database).

+typedef struct AuthToken
+{
+	char	   *string;
+	bool		quoted;
+} AuthToken;
+
+/*
+ * Distinguish the case a token has to be treated as a regular
+ * expression or not.
+ */
+typedef struct AuthTokenOrRegex
+{
+	bool		is_regex;
+
+	/*
+	 * Not an union as we still need the token string for fill_hba_line().
+	 */
+	AuthToken  *authtoken;
+	regex_t    *regex;
+} AuthTokenOrRegex;

Hmm. With is_regex to check if a regex_t exists, both structures may
not be necessary. I have not put my hands on that directly, but if
I guess that I would shape things to have only AuthToken with
(enforcing regex_t in priority if set in the list of elements to check
for a match):
- the string
- quoted
- regex_t
A list member should never have (regex_t != NULL && quoted), right?
Hostnames would never be quoted, as well.

+# test with a comma in the regular expression
+reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
+test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
+	0);

So, we check here that the role includes "5," in its name. This is
getting fun to parse ;)

elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
{
-	plan skip_all => 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
+	plan skip_all =>
+	  'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
}

Unrelated noise from perltidy.
--
Michael

#23

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

over 3 years ago

In reply to: Michael Paquier (#22)

1 attachment(s)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/11/22 8:29 AM, Michael Paquier wrote:

On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote:
foreach(cell, tokens)
{
[...]
+		tokreg = lfirst(cell);
+		if (!token_is_regexp(tokreg))
{
-			if (strcmp(dbname, role) == 0)
+			if (am_walsender && !am_db_walsender)
+			{
+				/*
+				 * physical replication walsender connections can only match
+				 * replication keyword
+				 */
+				if (token_is_keyword(tokreg->authtoken, "replication"))
+					return true;
+			}
+			else if (token_is_keyword(tokreg->authtoken, "all"))
return true;
When checking the list of databases in check_db(), physical WAL
senders (aka am_walsender && !am_db_walsender) would be able to accept
regexps, but these should only accept "replication" and never a
regexp, no?

Oh right, good catch, thanks! Please find attached v6 fixing it.

This is kind of special in the HBA logic, coming back to 9.0 where
physical replication and this special role property have been
introduced. WAL senders have gained an actual database property later
on in 9.4 with logical decoding, keeping "replication" for
compatibility (connection strings can use replication=database to
connect as a non-physical WAL sender and connect to a specific
database).

Thanks for the explanation!

+typedef struct AuthToken
+{
+	char	   *string;
+	bool		quoted;
+} AuthToken;
+
+/*
+ * Distinguish the case a token has to be treated as a regular
+ * expression or not.
+ */
+typedef struct AuthTokenOrRegex
+{
+	bool		is_regex;
+
+	/*
+	 * Not an union as we still need the token string for fill_hba_line().
+	 */
+	AuthToken  *authtoken;
+	regex_t    *regex;
+} AuthTokenOrRegex;

Hmm. With is_regex to check if a regex_t exists, both structures may
not be necessary.

Agree that both struct are not necessary. In v6, AuthTokenOrRegex has
been removed and the regex has been moved to AuthToken. There is no
is_regex bool anymore, as it's enough to test whether regex is NULL or not.

I have not put my hands on that directly, but if
I guess that I would shape things to have only AuthToken with
(enforcing regex_t in priority if set in the list of elements to check
for a match):
- the string
- quoted
- regex_t
A list member should never have (regex_t != NULL && quoted), right?

The patch does allow that. For example it happens for the test where we
add a comma in the role name. As we don't rely on a dedicated char to
mark the end of a reg exp (we only rely on / to mark its start) then
allowing (regex_t != NULL && quoted) seems reasonable to me.

+# test with a comma in the regular expression
+reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
+test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
+	0);
So, we check here that the role includes "5," in its name. This is
getting fun to parse ;)

Indeed, ;-)

elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
{
-	plan skip_all => 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
+	plan skip_all =>
+	  'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
}

Unrelated noise from perltidy.

Right.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-hba_with_regexp.patchtext/plain; charset=UTF-8; name=v6-0001-hba_with_regexp.patchDownload

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..406628ef35 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        logical replication connections do specify it.
        Otherwise, this is the name of
        a specific <productname>PostgreSQL</productname> database.
-       Multiple database names can be supplied by separating them with
-       commas.  A separate file containing database names can be specified by
+       Multiple database names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
+       A separate file containing database names can be specified by
        preceding the file name with <literal>@</literal>.
       </para>
      </listitem>
@@ -249,7 +250,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        Specifies which database user name(s) this record
        matches. The value <literal>all</literal> specifies that it
        matches all users.  Otherwise, this is either the name of a specific
-       database user, or a group name preceded by <literal>+</literal>.
+       database user, a regular expression preceded by <literal>/</literal>
+       or a group name preceded by <literal>+</literal>.
        (Recall that there is no real distinction between users and groups
        in <productname>PostgreSQL</productname>; a <literal>+</literal> mark really means
        <quote>match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        considered to be a member of a role if they are explicitly a member
        of the role, directly or indirectly, and not just by virtue of
        being a superuser.
-       Multiple user names can be supplied by separating them with commas.
+       Multiple user names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
        A separate file containing user names can be specified by preceding the
        file name with <literal>@</literal>.
       </para>
@@ -270,8 +273,9 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
      <listitem>
       <para>
        Specifies the client machine address(es) that this record
-       matches.  This field can contain either a host name, an IP
-       address range, or one of the special key words mentioned below.
+       matches.  This field can contain either a host name, a regular expression
+       preceded by <literal>/</literal> representing host names, an IP address range,
+       or one of the special key words mentioned below.
       </para>
 
       <para>
@@ -739,6 +743,24 @@ host    all             all             ::1/128                 trust
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             localhost               trust
 
+# The same using a regular expression for host name, which allows connection for
+# host name ending with "test".
+#
+# TYPE  DATABASE        USER            ADDRESS                 METHOD
+host    all             all             /^.*test$               trust
+
+# The same using regular expression for DATABASE, which allows connection to the
+# db1 and testdb databases and any database with a name ending with "test".
+#
+# TYPE  DATABASE               USER            ADDRESS                 METHOD
+local   db1,/^.*test$,testdb   all             /^.*test$               trust
+
+# The same using regular expression for USER, which allows connection to the
+# user1 and testuser users and any user with a name ending with "test".
+#
+# TYPE  DATABASE                 USER                              ADDRESS                 METHOD
+local   db1,/^.*test$,testdb     user1,/^.*test$,testuser          /^.*test$               trust
+
 # Allow any user from any host with IP address 192.168.93.x to connect
 # to database "postgres" as the same user name that ident reports for
 # the connection (typically the operating system user name).
@@ -785,16 +807,18 @@ host    all             all             192.168.12.10/32        gss
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             192.168.0.0/16          ident map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for administrators,
+# users ending with "helpdesk" and members of role "support",
+# who can connect to all databases.
+# The file$PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 local   sameuser        all                                     md5
 local   all             @admins                                 md5
+local   all             /^.*helpdesk$                           md5
 local   all             +support                                md5
 
 # The last two lines above can be combined into a single line:
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4637426d62..fc1fb12bb3 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -66,6 +66,7 @@ typedef struct check_network_data
 } check_network_data;
 
 
+#define token_is_regexp(t)	(t->regex)
 #define token_is_keyword(t, k)	(!t->quoted && strcmp(t->string, k) == 0)
 #define token_matches(t, k)  (strcmp(t->string, k) == 0)
 
@@ -117,6 +118,9 @@ static List *tokenize_inc_file(List *tokens, const char *outer_filename,
 							   const char *inc_filename, int elevel, char **err_msg);
 static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 							   int elevel, char **err_msg);
+static bool token_regcomp(regex_t *re, char *string, char *filename,
+						  int line_num, char **err_msg, int elevel);
+static bool token_regexec(const char *match, regex_t *re);
 
 
 /*
@@ -267,7 +271,7 @@ make_auth_token(const char *token, bool quoted)
 
 	toklen = strlen(token);
 	/* we copy string into same palloc block as the struct */
-	authtoken = (AuthToken *) palloc(sizeof(AuthToken) + toklen + 1);
+	authtoken = (AuthToken *) palloc0(sizeof(AuthToken) + toklen + 1);
 	authtoken->string = (char *) authtoken + sizeof(AuthToken);
 	authtoken->quoted = quoted;
 	memcpy(authtoken->string, token, toklen + 1);
@@ -574,7 +578,8 @@ is_member(Oid userid, const char *role)
 }
 
 /*
- * Check AuthToken list for a match to role, allowing group names.
+ * Check AuthToken list for a match to role.
+ * We are allowing group names and regular expressions.
  */
 static bool
 check_role(const char *role, Oid roleid, List *tokens)
@@ -585,13 +590,18 @@ check_role(const char *role, Oid roleid, List *tokens)
 	foreach(cell, tokens)
 	{
 		tok = lfirst(cell);
-		if (!tok->quoted && tok->string[0] == '+')
+		if (!token_is_regexp(tok))
 		{
-			if (is_member(roleid, tok->string + 1))
+			if (!tok->quoted && tok->string[0] == '+')
+			{
+				if (is_member(roleid, tok->string + 1))
+					return true;
+			}
+			else if (token_matches(tok, role) ||
+					 token_is_keyword(tok, "all"))
 				return true;
 		}
-		else if (token_matches(tok, role) ||
-				 token_is_keyword(tok, "all"))
+		else if (token_regexec(role, tok->regex))
 			return true;
 	}
 	return false;
@@ -618,22 +628,27 @@ check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
 			if (token_is_keyword(tok, "replication"))
 				return true;
 		}
-		else if (token_is_keyword(tok, "all"))
-			return true;
-		else if (token_is_keyword(tok, "sameuser"))
+		else if (token_is_keyword(tok, "replication"))
+			continue;			/* never match this if not walsender */
+		else if (!token_is_regexp(tok))
 		{
-			if (strcmp(dbname, role) == 0)
+			if (token_is_keyword(tok, "all"))
 				return true;
-		}
-		else if (token_is_keyword(tok, "samegroup") ||
-				 token_is_keyword(tok, "samerole"))
-		{
-			if (is_member(roleid, dbname))
+			else if (token_is_keyword(tok, "sameuser"))
+			{
+				if (strcmp(dbname, role) == 0)
+					return true;
+			}
+			else if (token_is_keyword(tok, "samegroup") ||
+					 token_is_keyword(tok, "samerole"))
+			{
+				if (is_member(roleid, dbname))
+					return true;
+			}
+			else if (token_matches(tok, dbname))
 				return true;
 		}
-		else if (token_is_keyword(tok, "replication"))
-			continue;			/* never match this if not walsender */
-		else if (token_matches(tok, dbname))
+		else if (token_regexec(dbname, tok->regex))
 			return true;
 	}
 	return false;
@@ -681,7 +696,7 @@ hostname_match(const char *pattern, const char *actual_hostname)
  * Check to see if a connecting IP matches a given host name.
  */
 static bool
-check_hostname(hbaPort *port, const char *hostname)
+check_hostname(hbaPort *port, const AuthToken *tok_hostname)
 {
 	struct addrinfo *gai_result,
 			   *gai;
@@ -712,8 +727,13 @@ check_hostname(hbaPort *port, const char *hostname)
 		port->remote_hostname = pstrdup(remote_hostname);
 	}
 
+	if (token_is_regexp(tok_hostname))
+	{
+		if (!token_regexec(port->remote_hostname, tok_hostname->regex))
+			return false;
+	}
 	/* Now see if remote host name matches this pg_hba line */
-	if (!hostname_match(hostname, port->remote_hostname))
+	else if (!hostname_match(tok_hostname->string, port->remote_hostname))
 		return false;
 
 	/* If we already verified the forward lookup, we're done */
@@ -761,7 +781,7 @@ check_hostname(hbaPort *port, const char *hostname)
 
 	if (!found)
 		elog(DEBUG2, "pg_hba.conf host name \"%s\" rejected because address resolution did not return a match with IP address of client",
-			 hostname);
+			 tok_hostname->string);
 
 	port->remote_hostname_resolv = found ? +1 : -1;
 
@@ -939,13 +959,13 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	struct addrinfo *gai_result;
 	struct addrinfo hints;
 	int			ret;
-	char	   *cidr_slash;
 	char	   *unsupauth;
 	ListCell   *field;
 	List	   *tokens;
 	ListCell   *tokencell;
 	AuthToken  *token;
 	HbaLine    *parsedline;
+	char	   *cidr_slash = NULL;	/* keep compiler quiet */
 
 	parsedline = palloc0(sizeof(HbaLine));
 	parsedline->linenumber = line_num;
@@ -1052,8 +1072,26 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
-		parsedline->databases = lappend(parsedline->databases,
-										copy_auth_token(lfirst(tokencell)));
+		AuthToken  *tok = lfirst(tokencell);
+
+		tok = copy_auth_token(lfirst(tokencell));
+		if (tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				tok->regex = re;
+			else
+				return NULL;
+		}
+
+		parsedline->databases = lappend(parsedline->databases, tok);
 	}
 
 	/* Get the roles. */
@@ -1072,8 +1110,26 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
-		parsedline->roles = lappend(parsedline->roles,
-									copy_auth_token(lfirst(tokencell)));
+		AuthToken  *tok = lfirst(tokencell);
+
+		tok = copy_auth_token(lfirst(tokencell));
+		if (tok->string[0] == '/')
+		{
+			/*
+			 * When tok->string starts with a slash, treat it as a regular
+			 * expression. Pre-compile it.
+			 */
+			regex_t    *re;
+
+			re = (regex_t *) palloc(sizeof(regex_t));
+			if (token_regcomp(re, tok->string + 1, HbaFileName, line_num,
+							  err_msg, elevel))
+				tok->regex = re;
+			else
+				return NULL;
+		}
+
+		parsedline->roles = lappend(parsedline->roles, tok);
 	}
 
 	if (parsedline->conntype != ctLocal)
@@ -1120,6 +1176,8 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 		}
 		else
 		{
+			bool		is_regexp = token->string[0] == '/' ? true : false;
+
 			/* IP and netmask are specified */
 			parsedline->ip_cmp_method = ipCmpMask;
 
@@ -1127,9 +1185,12 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			str = pstrdup(token->string);
 
 			/* Check if it has a CIDR suffix and if so isolate it */
-			cidr_slash = strchr(str, '/');
-			if (cidr_slash)
-				*cidr_slash = '\0';
+			if (!is_regexp)
+			{
+				cidr_slash = strchr(str, '/');
+				if (cidr_slash)
+					*cidr_slash = '\0';
+			}
 
 			/* Get the IP address either way */
 			hints.ai_flags = AI_NUMERICHOST;
@@ -1149,7 +1210,14 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 				parsedline->addrlen = gai_result->ai_addrlen;
 			}
 			else if (ret == EAI_NONAME)
-				parsedline->hostname = str;
+			{
+				/*
+				 * This is ok to copy the token->string and not str here, as
+				 * we'll error and report "specifying both host name and CIDR
+				 * mask is invalid" below should they differ.
+				 */
+				parsedline->tok_hostname = *copy_auth_token(token);
+			}
 			else
 			{
 				ereport(elevel,
@@ -1168,9 +1236,9 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 			pg_freeaddrinfo_all(hints.ai_family, gai_result);
 
 			/* Get the netmask */
-			if (cidr_slash)
+			if (cidr_slash && !is_regexp)
 			{
-				if (parsedline->hostname)
+				if (parsedline->tok_hostname.string)
 				{
 					ereport(elevel,
 							(errcode(ERRCODE_CONFIG_FILE_ERROR),
@@ -1199,7 +1267,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 				parsedline->masklen = parsedline->addrlen;
 				pfree(str);
 			}
-			else if (!parsedline->hostname)
+			else if (!parsedline->tok_hostname.string && !is_regexp)
 			{
 				/* Read the mask field. */
 				pfree(str);
@@ -1261,6 +1329,22 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 					return NULL;
 				}
 			}
+			else if (is_regexp)
+			{
+				/*
+				 * When token->string starts with a slash, treat it as a
+				 * regular expression. Pre-compile it.
+				 */
+				regex_t    *re;
+
+				re = (regex_t *) palloc(sizeof(regex_t));
+				if (!token_regcomp(re,
+								   token->string + 1, HbaFileName,
+								   line_num, err_msg, elevel))
+					return NULL;
+
+				parsedline->tok_hostname.regex = re;
+			}
 		}
 	}							/* != ctLocal */
 
@@ -2132,10 +2216,11 @@ check_hba(hbaPort *port)
 			switch (hba->ip_cmp_method)
 			{
 				case ipCmpMask:
-					if (hba->hostname)
+					if (hba->tok_hostname.string || hba->tok_hostname.regex)
 					{
 						if (!check_hostname(port,
-											hba->hostname))
+											&hba->tok_hostname))
+
 							continue;
 					}
 					else
@@ -2342,34 +2427,9 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 		 * When system username starts with a slash, treat it as a regular
 		 * expression. Pre-compile it.
 		 */
-		int			r;
-		pg_wchar   *wstr;
-		int			wlen;
-
-		wstr = palloc((strlen(parsedline->ident_user + 1) + 1) * sizeof(pg_wchar));
-		wlen = pg_mb2wchar_with_len(parsedline->ident_user + 1,
-									wstr, strlen(parsedline->ident_user + 1));
-
-		r = pg_regcomp(&parsedline->re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
-		if (r)
-		{
-			char		errstr[100];
-
-			pg_regerror(r, &parsedline->re, errstr, sizeof(errstr));
-			ereport(elevel,
-					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-					 errmsg("invalid regular expression \"%s\": %s",
-							parsedline->ident_user + 1, errstr),
-					 errcontext("line %d of configuration file \"%s\"",
-							line_num, IdentFileName)));
-
-			*err_msg = psprintf("invalid regular expression \"%s\": %s",
-								parsedline->ident_user + 1, errstr);
-
-			pfree(wstr);
+		if (!token_regcomp(&parsedline->re, parsedline->ident_user + 1,
+						   IdentFileName, line_num, err_msg, elevel))
 			return NULL;
-		}
-		pfree(wstr);
 	}
 
 	return parsedline;
@@ -2706,3 +2766,68 @@ hba_authname(UserAuth auth_method)
 
 	return UserAuthName[auth_method];
 }
+
+/*
+ * Compile the regular expression "re" and return whether it compiles
+ * successfully or not.
+ *
+ * If not, the last 4 parameters are used to add extra details while reporting
+ * the error.
+ */
+static bool
+token_regcomp(regex_t *re, char *string, char *filename, int line_num,
+			  char **err_msg, int elevel)
+{
+	int			r;
+	pg_wchar   *wstr;
+	int			wlen;
+
+	wstr = palloc((strlen(string) + 1) * sizeof(pg_wchar));
+	wlen = pg_mb2wchar_with_len(string,
+								wstr, strlen(string));
+
+	r = pg_regcomp(re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
+	if (r)
+	{
+		char		errstr[100];
+
+		pg_regerror(r, re, errstr, sizeof(errstr));
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+				 errmsg("invalid regular expression \"%s\": %s",
+						string, errstr),
+				 errcontext("line %d of configuration file \"%s\"",
+							line_num, filename)));
+
+		*err_msg = psprintf("invalid regular expression \"%s\": %s",
+							string, errstr);
+
+		pfree(wstr);
+		return false;
+	}
+
+	pfree(wstr);
+	return true;
+}
+
+/*
+ * Return whether "match" is matching the regular expression "re" or not.
+ */
+static bool
+token_regexec(const char *match, regex_t *re)
+{
+	pg_wchar   *wmatchstr;
+	int			wmatchlen;
+
+	wmatchstr = palloc((strlen(match) + 1) * sizeof(pg_wchar));
+	wmatchlen = pg_mb2wchar_with_len(match, wmatchstr, strlen(match));
+
+	if (pg_regexec(re, wmatchstr, wmatchlen, 0, NULL, 0, NULL, 0) == REG_OKAY)
+	{
+		pfree(wmatchstr);
+		return true;
+	}
+
+	pfree(wmatchstr);
+	return false;
+}
diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c
index 9e5794071c..fd995e91c8 100644
--- a/src/backend/utils/adt/hbafuncs.c
+++ b/src/backend/utils/adt/hbafuncs.c
@@ -274,10 +274,8 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
 		switch (hba->ip_cmp_method)
 		{
 			case ipCmpMask:
-				if (hba->hostname)
-				{
-					addrstr = hba->hostname;
-				}
+				if (hba->tok_hostname.string)
+					addrstr = hba->tok_hostname.string;
 				else
 				{
 					/*
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index d06da81806..595d3c5a46 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -77,6 +77,19 @@ typedef enum ClientCertName
 	clientCertDN
 } ClientCertName;
 
+/*
+ * A single string token lexed from an authentication configuration file
+ * (pg_ident.conf or pg_hba.conf), together with whether the token has
+ * been quoted. It may also contain a non NULL regular expression if the string
+ * starts with a "/".
+ */
+typedef struct AuthToken
+{
+	char	   *string;
+	bool		quoted;
+	regex_t    *regex;
+} AuthToken;
+
 typedef struct HbaLine
 {
 	int			linenumber;
@@ -89,7 +102,7 @@ typedef struct HbaLine
 	struct sockaddr_storage mask;
 	int			masklen;		/* zero if we don't have a valid mask */
 	IPCompareMethod ip_cmp_method;
-	char	   *hostname;
+	AuthToken	tok_hostname;
 	UserAuth	auth_method;
 	char	   *usermap;
 	char	   *pamservice;
@@ -132,17 +145,6 @@ typedef struct IdentLine
 	regex_t		re;
 } IdentLine;
 
-/*
- * A single string token lexed from an authentication configuration file
- * (pg_ident.conf or pg_hba.conf), together with whether the token has
- * been quoted.
- */
-typedef struct AuthToken
-{
-	char	   *string;
-	bool		quoted;
-} AuthToken;
-
 /*
  * TokenizedAuthLine represents one line lexed from an authentication
  * configuration file.  Each item in the "fields" list is a sub-list of
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index ea664d18f5..1c559dfbda 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -81,6 +81,14 @@ $node->safe_psql(
 	 GRANT ALL ON sysuser_data TO md5_role;");
 $ENV{"PGPASSWORD"} = 'pass';
 
+# Create a role that contains a comma to stress the parsing.
+$node->safe_psql('postgres',
+	q{SET password_encryption='md5'; CREATE ROLE "md5,role" LOGIN PASSWORD 'pass';}
+);
+
+# Create a database to test regular expression.
+$node->safe_psql('postgres', "CREATE database regex_testdb;");
+
 # For "trust" method, all users should be able to connect. These users are not
 # considered to be authenticated.
 reset_pg_hba($node, 'all', 'all', 'trust');
@@ -200,6 +208,39 @@ append_to_file(
 
 test_conn($node, 'user=md5_role', 'password from pgpass', 0);
 
+# Testing with regular expression for username. Note that the third regex
+# matches in this case.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^md.*$', 'password');
+test_conn($node, 'user=md5_role', 'password, matching regexp for username',
+	0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^m_d.*$', 'password');
+test_conn($node, 'user=md5_role',
+	'password, non matching regexp for username',
+	2, log_unlike => [qr/connection authenticated:/]);
+
+# test with a comma in the regular expression
+reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
+test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
+	0);
+
+# Testing with regular expression for dbname. The third regex matches.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*b$', 'all',
+	'password');
+test_conn(
+	$node, 'user=md5_role dbname=regex_testdb', 'password,
+   matching regexp for dbname', 0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*ba$',
+	'all', 'password');
+test_conn(
+	$node,
+	'user=md5_role dbname=regex_testdb',
+	'password, non matching regexp for dbname',
+	2, log_unlike => [qr/connection authenticated:/]);
+
 unlink($pgpassfile);
 delete $ENV{"PGPASSFILE"};
 
diff --git a/src/test/ssl/t/002_scram.pl b/src/test/ssl/t/002_scram.pl
index deaa4aa086..471eb3e256 100644
--- a/src/test/ssl/t/002_scram.pl
+++ b/src/test/ssl/t/002_scram.pl
@@ -22,7 +22,8 @@ if ($ENV{with_ssl} ne 'openssl')
 }
 elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
 {
-	plan skip_all => 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
+	plan skip_all =>
+	  'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
 }
 
 my $ssl_server = SSL::Server->new();
@@ -37,6 +38,20 @@ sub switch_server_cert
 	$ssl_server->switch_server_cert(@_);
 }
 
+# Delete pg_hba.conf from the given node, add a new entry to it
+# and then execute a reload to refresh it.
+sub reset_pg_hba
+{
+	my $node     = shift;
+	my $hostname = shift;
+
+	unlink($node->data_dir . '/pg_hba.conf');
+	# just for testing purposes, use a continuation line
+	$node->append_conf('pg_hba.conf', "host all all $hostname scram-sha-256");
+	$node->reload;
+	return;
+}
+
 
 # This is the hostname used to connect to the server.
 my $SERVERHOSTADDR = '127.0.0.1';
@@ -136,4 +151,25 @@ $node->connect_ok(
 		qr/connection authenticated: identity="ssltestuser" method=scram-sha-256/
 	]);
 
+# Testing with regular expression for hostname
+SKIP:
+{
+	# Being able to do a reverse lookup of a hostname on Windows for localhost
+	# is not guaranteed on all environments by default.
+	# So, skip the regular expression test for hostname on Windows.
+	skip "Regular expression for hostname not tested on Windows", 2
+	  if ($windows_os);
+
+	# Test regular expression on hostname, this one matches any host.
+	reset_pg_hba($node, '/^.*$');
+	$node->connect_ok("$common_connstr user=ssltestuser",
+		"Basic SCRAM authentication with SSL matching regexp on hostname");
+	# Test regular expression on hostname, this one does not match.
+	reset_pg_hba($node, '/^$');
+	$node->connect_fails(
+		"$common_connstr user=ssltestuser",
+		"Basic SCRAM authentication with SSL non matching regexp on hostname",
+		log_like => [qr/no pg_hba.conf entry for host/]);
+}
+
 done_testing();

#24

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Drouvot, Bertrand (#23)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:

Indeed, ;-)

So, I have spent the last two days looking at all that, studying the
structure of the patch and the existing HEAD code, and it looks like
that a few things could be consolidated.

First, as of HEAD, AuthToken is only used for elements in a list of
role and database names in hba.conf before filling in each HbaLine,
hence we limit its usage to the initial parsing. The patch assigns an
optional regex_t to it, then extends the use of AuthToken for single
hostname entries in pg_hba.conf. Things going first: shouldn't we
combine ident_user and "re" together in the same structure? Even if
we finish by not using AuthToken to store the computed regex, it seems
to me that we'd better use the same base structure for pg_ident.conf
and pg_hba.conf. While looking closely at the patch, we would expand
the use of AuthToken outside its original context. I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes. This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.). So with all that in mind, it feels right to
not use AuthToken at all when building each HbaLine and each
IdentLine, but a new, separate, structure. We could call that an
AuthItem (string, its compiled regex) perhaps? It could have its own
make() routine, taking in input an AuthToken and process
pg_regcomp(). Better ideas for this new structure would be welcome,
and the idea is that we'd store the post-parsing state of an
AuthToken to something that has a compiled regex. We could finish by
using AuthToken at the end and expand its use, but it does not feel
completely right either to have a make() routine but not be able to
compile its regular expression when creating the AuthToken.

The logic in charge of compiling the regular expressions could be
consolidated more. The patch refactors the logic with
token_regcomp(), uses it for the user names (ident_user in
parse_ident_line() from pg_ident.conf), then extended to the hostnames
(single item) and the role/database names (list possible in these
cases). This approach looks right to me. Once we plug in an AuthItem
to IdentLine, token_regcomp could be changed so as it takes an
AuthToken in input, saving directly the compiled regex_t in the input
structure.

At the end, the global structure of the code should, I guess, respect
the following rules:
- The number of places where we check if a string is a regex should be
minimal (aka string beginning by '/').
- Only one code path of hba.c should call pg_regcomp() (the patch does
that), and only one code path should call pg_regexec() (two code paths
of hba.c do that with the patch, as of the need to store matching
expression). This should minimize the areas where we call
pg_mb2wchar_with_len(), for one.

About this last point, token_regexec() does not include
check_ident_usermap() in its logic, and it seems to me that it should.
The difference is with the expected regmatch_t structures, so
shouldn't token_regexec be extended with two arguments as of an array
of regmatch_t and the number of elements in the array? This would
save a bit some of the logic around pg_mb2wchar_with_len(), for
example. To make all that work, token_regexec() should return an int,
coming from pg_regexec, but no specific error strings as we don't want
to spam the logs when checking hosts, roles and databases in
pg_hba.conf.

       /* Check if it has a CIDR suffix and if so isolate it */
-      cidr_slash = strchr(str, '/');
-      if (cidr_slash)
-          *cidr_slash = '\0';
+      if (!is_regexp)
+      {
+          cidr_slash = strchr(str, '/');
+          if (cidr_slash)
+              *cidr_slash = '\0';
+      }
[...]
            /* Get the netmask */
-           if (cidr_slash)
+           if (cidr_slash && !is_regexp)
            {
Some of the code handling regexes for hostnames itches me a bit, like
this one.  Perhaps it would be better to evaluate this interaction
with regular expressions separately.  The database and role names
don't have this need, so their cases are much simpler to think about.

The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes. (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.
- Handle the case of single host entries in pg_hba.conf.
--
Michael

#25

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Michael Paquier (#24)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote:

First, as of HEAD, AuthToken is only used for elements in a list of
role and database names in hba.conf before filling in each HbaLine,
hence we limit its usage to the initial parsing. The patch assigns an
optional regex_t to it, then extends the use of AuthToken for single
hostname entries in pg_hba.conf. Things going first: shouldn't we
combine ident_user and "re" together in the same structure? Even if
we finish by not using AuthToken to store the computed regex, it seems
to me that we'd better use the same base structure for pg_ident.conf
and pg_hba.conf. While looking closely at the patch, we would expand
the use of AuthToken outside its original context. I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes. This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.). So with all that in mind, it feels right to
not use AuthToken at all when building each HbaLine and each
IdentLine, but a new, separate, structure. We could call that an
AuthItem (string, its compiled regex) perhaps? It could have its own
make() routine, taking in input an AuthToken and process
pg_regcomp(). Better ideas for this new structure would be welcome,
and the idea is that we'd store the post-parsing state of an
AuthToken to something that has a compiled regex. We could finish by
using AuthToken at the end and expand its use, but it does not feel
completely right either to have a make() routine but not be able to
compile its regular expression when creating the AuthToken.

I have have sent this part too quickly. As AuthTokens are used in
check_db() and check_role() when matching entries, it is more
intuitive to store the regex_t directly in it. Changing IdentLine to
use a AuthToken makes the "quoted" part useless in this case, still it
could be used in Assert()s to make sure that the data is shaped as
expected at check-time, enforced at false when creating it in
parse_ident_line()?
--
Michael

#26

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

about 3 years ago

In reply to: Michael Paquier (#25)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/14/22 8:18 AM, Michael Paquier wrote:

On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote:

First, as of HEAD, AuthToken is only used for elements in a list of
role and database names in hba.conf before filling in each HbaLine,
hence we limit its usage to the initial parsing. The patch assigns an
optional regex_t to it, then extends the use of AuthToken for single
hostname entries in pg_hba.conf. Things going first: shouldn't we
combine ident_user and "re" together in the same structure? Even if
we finish by not using AuthToken to store the computed regex, it seems
to me that we'd better use the same base structure for pg_ident.conf
and pg_hba.conf. While looking closely at the patch, we would expand
the use of AuthToken outside its original context. I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes. This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.). So with all that in mind, it feels right to
not use AuthToken at all when building each HbaLine and each
IdentLine, but a new, separate, structure. We could call that an
AuthItem (string, its compiled regex) perhaps? It could have its own
make() routine, taking in input an AuthToken and process
pg_regcomp(). Better ideas for this new structure would be welcome,
and the idea is that we'd store the post-parsing state of an
AuthToken to something that has a compiled regex. We could finish by
using AuthToken at the end and expand its use, but it does not feel
completely right either to have a make() routine but not be able to
compile its regular expression when creating the AuthToken.

I have have sent this part too quickly. As AuthTokens are used in
check_db() and check_role() when matching entries, it is more
intuitive to store the regex_t directly in it.

Yeah, I also think this is the right place for it.

Changing IdentLine to
use a AuthToken makes the "quoted" part useless in this case, still it
could be used in Assert()s to make sure that the data is shaped as
expected at check-time, enforced at false when creating it in
parse_ident_line()?

I agree, that makes sense. I'll work on that.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#27

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

about 3 years ago

In reply to: Michael Paquier (#24)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:

Indeed, ;-)

So, I have spent the last two days looking at all that, studying the
structure of the patch and the existing HEAD code,

Thanks!

The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes. (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.
- Handle the case of single host entries in pg_hba.conf.
--

I agree to work step-by-step.

While looking at it again now, I discovered that the new TAP test for
the regexp on the hostname in ssl/002_scram.pl is failing on some of my
tests environment (and not all..).

So, I agree with the dedicated steps you are proposing and that the
"host case" needs a dedicated attention.

I'm not ignoring all the remarks you've just done up-thread, I'll
address them and/or provide my feedback on them when I'll come back with
the step-by-step sub patches.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#28

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

about 3 years ago

In reply to: Michael Paquier (#24)

1 attachment(s)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:

Indeed, ;-)

I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes.

I think that it makes sense.

This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).

I'm not sure to get the issue here with the proposed approach and
pg_ident.conf.

The new attached patch proposal is making use of make_auth_token()
(through copy_auth_token()) in parse_ident_line(), do you see any issue?

The logic in charge of compiling the regular expressions could be
consolidated more. The patch refactors the logic with
token_regcomp(), uses it for the user names (ident_user in
parse_ident_line() from pg_ident.conf), then extended to the hostnames
(single item) and the role/database names (list possible in these
cases). This approach looks right to me. Once we plug in an AuthItem
to IdentLine, token_regcomp could be changed so as it takes an
AuthToken in input

Right, did it that way in the attached.

- Only one code path of hba.c should call pg_regcomp() (the patch does
that), and only one code path should call pg_regexec() (two code paths
of hba.c do that with the patch, as of the need to store matching
expression). This should minimize the areas where we call
pg_mb2wchar_with_len(), for one.

Right.

About this last point, token_regexec() does not include
check_ident_usermap() in its logic, and it seems to me that it should.
The difference is with the expected regmatch_t structures, so
shouldn't token_regexec be extended with two arguments as of an array
of regmatch_t and the number of elements in the array?

You are right, not using token_regexec() in check_ident_usermap() in the
previous patch versions was not right. That's fixed in the attached,
though the substitution (if any) is still outside of token_regexec(), do
you think it should be part of it? (I think that makes sense to keep it
outside of it as we wont use the substitution logic for roles, databases
and hostname)

The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec()

Agree. Please find attached v1-0001-token_reg-functions.patch for this
first step.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-token_reg-functions.patchtext/plain; charset=UTF-8; name=v1-0001-token_reg-functions.patchDownload

diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4637426d62..d9ef98141c 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -66,6 +66,7 @@ typedef struct check_network_data
 } check_network_data;
 
 
+#define token_is_regexp(t)	(t->regex)
 #define token_is_keyword(t, k)	(!t->quoted && strcmp(t->string, k) == 0)
 #define token_matches(t, k)  (strcmp(t->string, k) == 0)
 
@@ -117,6 +118,9 @@ static List *tokenize_inc_file(List *tokens, const char *outer_filename,
 							   const char *inc_filename, int elevel, char **err_msg);
 static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 							   int elevel, char **err_msg);
+static void token_regcomp(AuthToken *token);
+static int	token_regexec(const char *match, regex_t *re, size_t nmatch,
+						  regmatch_t pmatch[]);
 
 
 /*
@@ -267,11 +271,21 @@ make_auth_token(const char *token, bool quoted)
 
 	toklen = strlen(token);
 	/* we copy string into same palloc block as the struct */
-	authtoken = (AuthToken *) palloc(sizeof(AuthToken) + toklen + 1);
+	authtoken = (AuthToken *) palloc0(sizeof(AuthToken) + toklen + 1);
 	authtoken->string = (char *) authtoken + sizeof(AuthToken);
 	authtoken->quoted = quoted;
+	authtoken->regcomp_rc = 0;
 	memcpy(authtoken->string, token, toklen + 1);
 
+	if (authtoken->string[0] == '/')
+	{
+		/*
+		 * When the string starts with a slash, treat it as a regular
+		 * expression. Pre-compile it.
+		 */
+		token_regcomp(authtoken);
+	}
+
 	return authtoken;
 }
 
@@ -2326,52 +2340,39 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	IDENT_MULTI_VALUE(tokens);
 	token = linitial(tokens);
-	parsedline->ident_user = pstrdup(token->string);
 
-	/* Get the PG rolename token */
-	field = lnext(tok_line->fields, field);
-	IDENT_FIELD_ABSENT(field);
-	tokens = lfirst(field);
-	IDENT_MULTI_VALUE(tokens);
-	token = linitial(tokens);
-	parsedline->pg_role = pstrdup(token->string);
+	/* Copy the ident user token */
+	parsedline->token = copy_auth_token(token);
 
-	if (parsedline->ident_user[0] == '/')
+	if (parsedline->token->regcomp_rc)
 	{
-		/*
-		 * When system username starts with a slash, treat it as a regular
-		 * expression. Pre-compile it.
-		 */
-		int			r;
-		pg_wchar   *wstr;
-		int			wlen;
-
-		wstr = palloc((strlen(parsedline->ident_user + 1) + 1) * sizeof(pg_wchar));
-		wlen = pg_mb2wchar_with_len(parsedline->ident_user + 1,
-									wstr, strlen(parsedline->ident_user + 1));
+		char		errstr[100];
 
-		r = pg_regcomp(&parsedline->re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
-		if (r)
-		{
-			char		errstr[100];
+		Assert(token_is_regexp(parsedline->token));
 
-			pg_regerror(r, &parsedline->re, errstr, sizeof(errstr));
-			ereport(elevel,
-					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-					 errmsg("invalid regular expression \"%s\": %s",
-							parsedline->ident_user + 1, errstr),
-					 errcontext("line %d of configuration file \"%s\"",
+		pg_regerror(parsedline->token->regcomp_rc, parsedline->token->regex,
+					errstr, sizeof(errstr));
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+				 errmsg("invalid regular expression \"%s\": %s",
+						parsedline->token->string + 1, errstr),
+				 errcontext("line %d of configuration file \"%s\"",
 							line_num, IdentFileName)));
 
-			*err_msg = psprintf("invalid regular expression \"%s\": %s",
-								parsedline->ident_user + 1, errstr);
+		*err_msg = psprintf("invalid regular expression \"%s\": %s",
+							parsedline->token->string + 1, errstr);
 
-			pfree(wstr);
-			return NULL;
-		}
-		pfree(wstr);
+		return NULL;
 	}
 
+	/* Get the PG rolename token */
+	field = lnext(tok_line->fields, field);
+	IDENT_FIELD_ABSENT(field);
+	tokens = lfirst(field);
+	IDENT_MULTI_VALUE(tokens);
+	token = linitial(tokens);
+	parsedline->pg_role = pstrdup(token->string);
+
 	return parsedline;
 }
 
@@ -2394,25 +2395,19 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 		return;
 
 	/* Match? */
-	if (identLine->ident_user[0] == '/')
+	if (token_is_regexp(identLine->token))
 	{
 		/*
-		 * When system username starts with a slash, treat it as a regular
-		 * expression. In this case, we process the system username as a
-		 * regular expression that returns exactly one match. This is replaced
-		 * for \1 in the database username string, if present.
+		 * Process the system username as a regular expression that returns
+		 * exactly one match. This is replaced for \1 in the database username
+		 * string, if present.
 		 */
 		int			r;
 		regmatch_t	matches[2];
-		pg_wchar   *wstr;
-		int			wlen;
 		char	   *ofs;
 		char	   *regexp_pgrole;
 
-		wstr = palloc((strlen(ident_user) + 1) * sizeof(pg_wchar));
-		wlen = pg_mb2wchar_with_len(ident_user, wstr, strlen(ident_user));
-
-		r = pg_regexec(&identLine->re, wstr, wlen, 0, NULL, 2, matches, 0);
+		r = token_regexec(ident_user, identLine->token->regex, 2, matches);
 		if (r)
 		{
 			char		errstr[100];
@@ -2420,18 +2415,15 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 			if (r != REG_NOMATCH)
 			{
 				/* REG_NOMATCH is not an error, everything else is */
-				pg_regerror(r, &identLine->re, errstr, sizeof(errstr));
+				pg_regerror(r, identLine->token->regex, errstr, sizeof(errstr));
 				ereport(LOG,
 						(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 						 errmsg("regular expression match for \"%s\" failed: %s",
-								identLine->ident_user + 1, errstr)));
+								identLine->token->string + 1, errstr)));
 				*error_p = true;
 			}
-
-			pfree(wstr);
 			return;
 		}
-		pfree(wstr);
 
 		if ((ofs = strstr(identLine->pg_role, "\\1")) != NULL)
 		{
@@ -2443,7 +2435,7 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 				ereport(LOG,
 						(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 						 errmsg("regular expression \"%s\" has no subexpressions as requested by backreference in \"%s\"",
-								identLine->ident_user + 1, identLine->pg_role)));
+								identLine->token->string + 1, identLine->pg_role)));
 				*error_p = true;
 				return;
 			}
@@ -2490,13 +2482,13 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 		if (case_insensitive)
 		{
 			if (pg_strcasecmp(identLine->pg_role, pg_role) == 0 &&
-				pg_strcasecmp(identLine->ident_user, ident_user) == 0)
+				pg_strcasecmp(identLine->token->string, ident_user) == 0)
 				*found_p = true;
 		}
 		else
 		{
 			if (strcmp(identLine->pg_role, pg_role) == 0 &&
-				strcmp(identLine->ident_user, ident_user) == 0)
+				strcmp(identLine->token->string, ident_user) == 0)
 				*found_p = true;
 		}
 	}
@@ -2646,8 +2638,8 @@ load_ident(void)
 		foreach(parsed_line_cell, new_parsed_lines)
 		{
 			newline = (IdentLine *) lfirst(parsed_line_cell);
-			if (newline->ident_user[0] == '/')
-				pg_regfree(&newline->re);
+			if (token_is_regexp(newline->token))
+				pg_regfree(newline->token->regex);
 		}
 		MemoryContextDelete(ident_context);
 		return false;
@@ -2659,8 +2651,8 @@ load_ident(void)
 		foreach(parsed_line_cell, parsed_ident_lines)
 		{
 			newline = (IdentLine *) lfirst(parsed_line_cell);
-			if (newline->ident_user[0] == '/')
-				pg_regfree(&newline->re);
+			if (token_is_regexp(newline->token))
+				pg_regfree(newline->token->regex);
 		}
 	}
 	if (parsed_ident_context != NULL)
@@ -2706,3 +2698,43 @@ hba_authname(UserAuth auth_method)
 
 	return UserAuthName[auth_method];
 }
+
+/*
+ * Compile the regular expression "re" and register the compilation return code
+ * in the token (so that the caller can decide what do to with it).
+ */
+static void
+token_regcomp(AuthToken *token)
+{
+	pg_wchar   *wstr;
+	int			wlen;
+
+	token->regex = (regex_t *) palloc0(sizeof(regex_t));
+	wstr = palloc((strlen(token->string + 1) + 1) * sizeof(pg_wchar));
+	wlen = pg_mb2wchar_with_len(token->string + 1,
+								wstr, strlen(token->string + 1));
+
+	token->regcomp_rc = pg_regcomp(token->regex, wstr, wlen, REG_ADVANCED,
+								   C_COLLATION_OID);
+
+	pfree(wstr);
+}
+
+/*
+ * Return the pg_regexec()'s result.
+ */
+static int
+token_regexec(const char *match, regex_t *re, size_t nmatch, regmatch_t pmatch[])
+{
+	pg_wchar   *wmatchstr;
+	int			wmatchlen;
+	int			r;
+
+	wmatchstr = palloc((strlen(match) + 1) * sizeof(pg_wchar));
+	wmatchlen = pg_mb2wchar_with_len(match, wmatchstr, strlen(match));
+
+	r = pg_regexec(re, wmatchstr, wmatchlen, 0, NULL, nmatch, pmatch, 0);
+
+	pfree(wmatchstr);
+	return r;
+}
diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c
index 9e5794071c..54f4cfd713 100644
--- a/src/backend/utils/adt/hbafuncs.c
+++ b/src/backend/utils/adt/hbafuncs.c
@@ -467,7 +467,7 @@ fill_ident_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
 	if (ident != NULL)
 	{
 		values[index++] = CStringGetTextDatum(ident->usermap);
-		values[index++] = CStringGetTextDatum(ident->ident_user);
+		values[index++] = CStringGetTextDatum(ident->token->string);
 		values[index++] = CStringGetTextDatum(ident->pg_role);
 	}
 	else
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index d06da81806..27abd5bace 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -77,6 +77,20 @@ typedef enum ClientCertName
 	clientCertDN
 } ClientCertName;
 
+/*
+ * A single string token lexed from an authentication configuration file
+ * (pg_ident.conf or pg_hba.conf), together with whether the token has
+ * been quoted. It may also contain a non NULL regular expression if the string
+ * starts with a "/"., together with the regex compilation return code.
+ */
+typedef struct AuthToken
+{
+	char	   *string;
+	bool		quoted;
+	regex_t    *regex;
+	int			regcomp_rc;
+} AuthToken;
+
 typedef struct HbaLine
 {
 	int			linenumber;
@@ -127,22 +141,10 @@ typedef struct IdentLine
 	int			linenumber;
 
 	char	   *usermap;
-	char	   *ident_user;
 	char	   *pg_role;
-	regex_t		re;
+	AuthToken  *token;
 } IdentLine;
 
-/*
- * A single string token lexed from an authentication configuration file
- * (pg_ident.conf or pg_hba.conf), together with whether the token has
- * been quoted.
- */
-typedef struct AuthToken
-{
-	char	   *string;
-	bool		quoted;
-} AuthToken;
-
 /*
  * TokenizedAuthLine represents one line lexed from an authentication
  * configuration file.  Each item in the "fields" list is a sub-list of

#29

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Drouvot, Bertrand (#28)

1 attachment(s)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Mon, Oct 17, 2022 at 07:56:02PM +0200, Drouvot, Bertrand wrote:

On 10/14/22 7:30 AM, Michael Paquier wrote:

This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).

I'm not sure to get the issue here with the proposed approach and
pg_ident.conf.

My point is about parse_ident_line(), where we need to be careful in
the order of the operations. The macros IDENT_MULTI_VALUE() and
IDENT_FIELD_ABSENT() need to be applied on all the fields first, and
the regex computation needs to be last. Your patch had a subtile
issue here, as users may get errors on the computed regex before the
ordering of the fields as the computation was used *before* the "Get
the PG rolename token" part of the logic.

About this last point, token_regexec() does not include
check_ident_usermap() in its logic, and it seems to me that it should.
The difference is with the expected regmatch_t structures, so
shouldn't token_regexec be extended with two arguments as of an array
of regmatch_t and the number of elements in the array?

You are right, not using token_regexec() in check_ident_usermap() in the
previous patch versions was not right. That's fixed in the attached, though
the substitution (if any) is still outside of token_regexec(), do you think
it should be part of it? (I think that makes sense to keep it outside of it
as we wont use the substitution logic for roles, databases and hostname)

Keeping the substition done with the IdentLine's Authtokens outside of
the internal execution routine is fine by me.

While putting my hands on that, I was also wondering whether we should
have the error string generated after compilation within the internal
regcomp() routine, but that would require more arguments to
pg_regcomp() (as of file name, line number, **err_string), and that
looks more invasive than necessary. Perhaps the follow-up steps will
prove me wrong, though :)

A last thing is the introduction of a free() routine for AuthTokens,
to minimize the number of places where we haev pg_regfree(). The gain
is minimal, but that looks more consistent with the execution and
compilation paths.
--
Michael

Attachments:

v2-0001-Refactor-regex-handling-for-pg_ident.conf-in-hba..patchtext/x-diff; charset=us-asciiDownload

From 89a7ce5bb2338f65ebd42703eed11033881646cb Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 18 Oct 2022 14:51:03 +0900
Subject: [PATCH v2] Refactor regex handling for pg_ident.conf in hba.c

---
 src/include/libpq/hba.h          |  28 +++---
 src/backend/libpq/hba.c          | 162 ++++++++++++++++++++-----------
 src/backend/utils/adt/hbafuncs.c |   2 +-
 3 files changed, 121 insertions(+), 71 deletions(-)

diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index d06da81806..cec2e2665f 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -77,6 +77,20 @@ typedef enum ClientCertName
 	clientCertDN
 } ClientCertName;
 
+/*
+ * A single string token lexed from an authentication configuration file
+ * (pg_ident.conf or pg_hba.conf), together with whether the token has
+ * been quoted.  If "string" begins with a slash, it may optionally
+ * contain a regular expression (currently used for pg_ident.conf when
+ * building IdentLines).
+ */
+typedef struct AuthToken
+{
+	char	   *string;
+	bool		quoted;
+	regex_t    *regex;
+} AuthToken;
+
 typedef struct HbaLine
 {
 	int			linenumber;
@@ -127,22 +141,10 @@ typedef struct IdentLine
 	int			linenumber;
 
 	char	   *usermap;
-	char	   *ident_user;
 	char	   *pg_role;
-	regex_t		re;
+	AuthToken  *token;
 } IdentLine;
 
-/*
- * A single string token lexed from an authentication configuration file
- * (pg_ident.conf or pg_hba.conf), together with whether the token has
- * been quoted.
- */
-typedef struct AuthToken
-{
-	char	   *string;
-	bool		quoted;
-} AuthToken;
-
 /*
  * TokenizedAuthLine represents one line lexed from an authentication
  * configuration file.  Each item in the "fields" list is a sub-list of
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4637426d62..ea4e955d51 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -66,6 +66,7 @@ typedef struct check_network_data
 } check_network_data;
 
 
+#define token_has_regexp(t)	(t->regex != NULL)
 #define token_is_keyword(t, k)	(!t->quoted && strcmp(t->string, k) == 0)
 #define token_matches(t, k)  (strcmp(t->string, k) == 0)
 
@@ -80,9 +81,10 @@ static MemoryContext parsed_hba_context = NULL;
  * pre-parsed content of ident mapping file: list of IdentLine structs.
  * parsed_ident_context is the memory context where it lives.
  *
- * NOTE: the IdentLine structs can contain pre-compiled regular expressions
- * that live outside the memory context. Before destroying or resetting the
- * memory context, they need to be explicitly free'd.
+ * NOTE: the IdentLine structs can contain AuthTokens with pre-compiled
+ * regular expressions that live outside the memory context. Before
+ * destroying or resetting the memory context, they need to be explicitly
+ * free'd.
  */
 static List *parsed_ident_lines = NIL;
 static MemoryContext parsed_ident_context = NULL;
@@ -117,6 +119,9 @@ static List *tokenize_inc_file(List *tokens, const char *outer_filename,
 							   const char *inc_filename, int elevel, char **err_msg);
 static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 							   int elevel, char **err_msg);
+static int regcomp_auth_token(AuthToken *token);
+static int regexec_auth_token(const char *match, AuthToken *token,
+							  size_t nmatch, regmatch_t pmatch[]);
 
 
 /*
@@ -267,14 +272,27 @@ make_auth_token(const char *token, bool quoted)
 
 	toklen = strlen(token);
 	/* we copy string into same palloc block as the struct */
-	authtoken = (AuthToken *) palloc(sizeof(AuthToken) + toklen + 1);
+	authtoken = (AuthToken *) palloc0(sizeof(AuthToken) + toklen + 1);
 	authtoken->string = (char *) authtoken + sizeof(AuthToken);
 	authtoken->quoted = quoted;
+	authtoken->regex = NULL;
 	memcpy(authtoken->string, token, toklen + 1);
 
 	return authtoken;
 }
 
+/*
+ * Free an AuthToken, that may include a regular expression that needs
+ * to be cleaned up explicitly.
+ */
+static void
+free_auth_token(AuthToken *token)
+{
+	if (token_has_regexp(token))
+		pg_regfree(token->regex);
+	pfree(token);
+}
+
 /*
  * Copy a AuthToken struct into freshly palloc'd memory.
  */
@@ -286,6 +304,57 @@ copy_auth_token(AuthToken *in)
 	return out;
 }
 
+/*
+ * Compile the regular expression "re" and store it in the AuthToken
+ * given in input.  Returns the error code of pg_regcomp, to be used
+ * by the caller.
+ */
+static int
+regcomp_auth_token(AuthToken *token)
+{
+	pg_wchar   *wstr;
+	int			wlen;
+	int			rc;
+
+	Assert(token->regex == NULL);
+
+	if (token->string[0] != '/')
+		return 0;	/* nothing to compile */
+
+	token->regex = (regex_t *) palloc0(sizeof(regex_t));
+	wstr = palloc((strlen(token->string + 1) + 1) * sizeof(pg_wchar));
+	wlen = pg_mb2wchar_with_len(token->string + 1,
+								wstr, strlen(token->string + 1));
+
+	rc = pg_regcomp(token->regex, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
+
+	pfree(wstr);
+	return rc;
+}
+
+/*
+ * Execute a regular expression computed in an AuthToken, checking for a match
+ * with the string specified in "match".  The caller may optionally give an
+ * array to store the matches.  Return the result of pg_regexec().
+ */
+static int
+regexec_auth_token(const char *match, AuthToken *token, size_t nmatch,
+				   regmatch_t pmatch[])
+{
+	pg_wchar   *wmatchstr;
+	int			wmatchlen;
+	int			r;
+
+	Assert(token->string[0] == '/' && token->regex);
+
+	wmatchstr = palloc((strlen(match) + 1) * sizeof(pg_wchar));
+	wmatchlen = pg_mb2wchar_with_len(match, wmatchstr, strlen(match));
+
+	r = pg_regexec(token->regex, wmatchstr, wmatchlen, 0, NULL, nmatch, pmatch, 0);
+
+	pfree(wmatchstr);
+	return r;
+}
 
 /*
  * Tokenize one HBA field from a line, handling file inclusion and comma lists.
@@ -2307,6 +2376,7 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 	List	   *tokens;
 	AuthToken  *token;
 	IdentLine  *parsedline;
+	int			rc;
 
 	Assert(tok_line->fields != NIL);
 	field = list_head(tok_line->fields);
@@ -2326,7 +2396,9 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	IDENT_MULTI_VALUE(tokens);
 	token = linitial(tokens);
-	parsedline->ident_user = pstrdup(token->string);
+
+	/* Copy the ident user token */
+	parsedline->token = copy_auth_token(token);
 
 	/* Get the PG rolename token */
 	field = lnext(tok_line->fields, field);
@@ -2336,40 +2408,27 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 	token = linitial(tokens);
 	parsedline->pg_role = pstrdup(token->string);
 
-	if (parsedline->ident_user[0] == '/')
+	/*
+	 * Now that the field validation is done, compile a regex from the user
+	 * token, if necessary.
+	 */
+	rc = regcomp_auth_token(parsedline->token);
+	if (rc)
 	{
-		/*
-		 * When system username starts with a slash, treat it as a regular
-		 * expression. Pre-compile it.
-		 */
-		int			r;
-		pg_wchar   *wstr;
-		int			wlen;
+		char		errstr[100];
 
-		wstr = palloc((strlen(parsedline->ident_user + 1) + 1) * sizeof(pg_wchar));
-		wlen = pg_mb2wchar_with_len(parsedline->ident_user + 1,
-									wstr, strlen(parsedline->ident_user + 1));
-
-		r = pg_regcomp(&parsedline->re, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
-		if (r)
-		{
-			char		errstr[100];
-
-			pg_regerror(r, &parsedline->re, errstr, sizeof(errstr));
-			ereport(elevel,
-					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-					 errmsg("invalid regular expression \"%s\": %s",
-							parsedline->ident_user + 1, errstr),
-					 errcontext("line %d of configuration file \"%s\"",
+		pg_regerror(rc, parsedline->token->regex, errstr, sizeof(errstr));
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+				 errmsg("invalid regular expression \"%s\": %s",
+						parsedline->token->string + 1, errstr),
+				 errcontext("line %d of configuration file \"%s\"",
 							line_num, IdentFileName)));
 
-			*err_msg = psprintf("invalid regular expression \"%s\": %s",
-								parsedline->ident_user + 1, errstr);
+		*err_msg = psprintf("invalid regular expression \"%s\": %s",
+							parsedline->token->string + 1, errstr);
 
-			pfree(wstr);
-			return NULL;
-		}
-		pfree(wstr);
+		return NULL;
 	}
 
 	return parsedline;
@@ -2394,25 +2453,19 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 		return;
 
 	/* Match? */
-	if (identLine->ident_user[0] == '/')
+	if (token_has_regexp(identLine->token))
 	{
 		/*
-		 * When system username starts with a slash, treat it as a regular
-		 * expression. In this case, we process the system username as a
-		 * regular expression that returns exactly one match. This is replaced
-		 * for \1 in the database username string, if present.
+		 * Process the system username as a regular expression that returns
+		 * exactly one match. This is replaced for \1 in the database username
+		 * string, if present.
 		 */
 		int			r;
 		regmatch_t	matches[2];
-		pg_wchar   *wstr;
-		int			wlen;
 		char	   *ofs;
 		char	   *regexp_pgrole;
 
-		wstr = palloc((strlen(ident_user) + 1) * sizeof(pg_wchar));
-		wlen = pg_mb2wchar_with_len(ident_user, wstr, strlen(ident_user));
-
-		r = pg_regexec(&identLine->re, wstr, wlen, 0, NULL, 2, matches, 0);
+		r = regexec_auth_token(ident_user, identLine->token, 2, matches);
 		if (r)
 		{
 			char		errstr[100];
@@ -2420,18 +2473,15 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 			if (r != REG_NOMATCH)
 			{
 				/* REG_NOMATCH is not an error, everything else is */
-				pg_regerror(r, &identLine->re, errstr, sizeof(errstr));
+				pg_regerror(r, identLine->token->regex, errstr, sizeof(errstr));
 				ereport(LOG,
 						(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 						 errmsg("regular expression match for \"%s\" failed: %s",
-								identLine->ident_user + 1, errstr)));
+								identLine->token->string + 1, errstr)));
 				*error_p = true;
 			}
-
-			pfree(wstr);
 			return;
 		}
-		pfree(wstr);
 
 		if ((ofs = strstr(identLine->pg_role, "\\1")) != NULL)
 		{
@@ -2443,7 +2493,7 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 				ereport(LOG,
 						(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 						 errmsg("regular expression \"%s\" has no subexpressions as requested by backreference in \"%s\"",
-								identLine->ident_user + 1, identLine->pg_role)));
+								identLine->token->string + 1, identLine->pg_role)));
 				*error_p = true;
 				return;
 			}
@@ -2490,13 +2540,13 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 		if (case_insensitive)
 		{
 			if (pg_strcasecmp(identLine->pg_role, pg_role) == 0 &&
-				pg_strcasecmp(identLine->ident_user, ident_user) == 0)
+				pg_strcasecmp(identLine->token->string, ident_user) == 0)
 				*found_p = true;
 		}
 		else
 		{
 			if (strcmp(identLine->pg_role, pg_role) == 0 &&
-				strcmp(identLine->ident_user, ident_user) == 0)
+				strcmp(identLine->token->string, ident_user) == 0)
 				*found_p = true;
 		}
 	}
@@ -2646,8 +2696,7 @@ load_ident(void)
 		foreach(parsed_line_cell, new_parsed_lines)
 		{
 			newline = (IdentLine *) lfirst(parsed_line_cell);
-			if (newline->ident_user[0] == '/')
-				pg_regfree(&newline->re);
+			free_auth_token(newline->token);
 		}
 		MemoryContextDelete(ident_context);
 		return false;
@@ -2659,8 +2708,7 @@ load_ident(void)
 		foreach(parsed_line_cell, parsed_ident_lines)
 		{
 			newline = (IdentLine *) lfirst(parsed_line_cell);
-			if (newline->ident_user[0] == '/')
-				pg_regfree(&newline->re);
+			free_auth_token(newline->token);
 		}
 	}
 	if (parsed_ident_context != NULL)
diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c
index cbbe44ff13..cfdc4d8b39 100644
--- a/src/backend/utils/adt/hbafuncs.c
+++ b/src/backend/utils/adt/hbafuncs.c
@@ -467,7 +467,7 @@ fill_ident_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
 	if (ident != NULL)
 	{
 		values[index++] = CStringGetTextDatum(ident->usermap);
-		values[index++] = CStringGetTextDatum(ident->ident_user);
+		values[index++] = CStringGetTextDatum(ident->token->string);
 		values[index++] = CStringGetTextDatum(ident->pg_role);
 	}
 	else
-- 
2.37.2

#30

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

about 3 years ago

In reply to: Michael Paquier (#29)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/18/22 7:51 AM, Michael Paquier wrote:

On Mon, Oct 17, 2022 at 07:56:02PM +0200, Drouvot, Bertrand wrote:

On 10/14/22 7:30 AM, Michael Paquier wrote:

This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).

I'm not sure to get the issue here with the proposed approach and
pg_ident.conf.

My point is about parse_ident_line(), where we need to be careful in
the order of the operations. The macros IDENT_MULTI_VALUE() and
IDENT_FIELD_ABSENT() need to be applied on all the fields first, and
the regex computation needs to be last. Your patch had a subtile
issue here, as users may get errors on the computed regex before the
ordering of the fields as the computation was used *before* the "Get
the PG rolename token" part of the logic.

Gotcha, thanks! I was wondering if we shouldn't add a comment about that
and I see that you've added one in v2, thanks!

BTW, what about adding a new TAP test (dedicated patch) to test the
behavior in case of errors during the regexes compilation in
pg_ident.conf and pg_hba.conf (not done yet)? (we could add it once this
patch series is done).

While putting my hands on that, I was also wondering whether we should
have the error string generated after compilation within the internal
regcomp() routine, but that would require more arguments to
pg_regcomp() (as of file name, line number, **err_string), and that
looks more invasive than necessary. Perhaps the follow-up steps will
prove me wrong, though :)

I've had the same thought (and that was what the previous global patch
was doing). I'm tempted to think that the follow-steps will prove you
right ;-) (specially if at the end those will be the same error messages
for databases and roles).

A last thing is the introduction of a free() routine for AuthTokens,
to minimize the number of places where we haev pg_regfree(). The gain
is minimal, but that looks more consistent with the execution and
compilation paths.

Agree, that looks better.

I had a look at your v2, did a few tests and it looks good to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#31

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Drouvot, Bertrand (#30)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Tue, Oct 18, 2022 at 09:14:21AM +0200, Drouvot, Bertrand wrote:

BTW, what about adding a new TAP test (dedicated patch) to test the behavior
in case of errors during the regexes compilation in pg_ident.conf and
pg_hba.conf (not done yet)? (we could add it once this patch series is
done).

Perhaps, that may become tricky when it comes to -DEXEC_BACKEND (for
cases where no fork() implementation is available, aka Windows). But
a postmaster restart failure would generate logs that could be picked
for a pattern check?

While putting my hands on that, I was also wondering whether we should
have the error string generated after compilation within the internal
regcomp() routine, but that would require more arguments to
pg_regcomp() (as of file name, line number, **err_string), and that
looks more invasive than necessary. Perhaps the follow-up steps will
prove me wrong, though :)

I've had the same thought (and that was what the previous global patch was
doing). I'm tempted to think that the follow-steps will prove you right ;-)
(specially if at the end those will be the same error messages for databases
and roles).

Avoiding three times the same error message seems like a good thing in
the long run, but let's think about this part later as needed. All
these routines are static to hba.c so even if we finish by not
finishing the whole job for this development cycle we can still be
very flexible.
--
Michael

#32

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

about 3 years ago

In reply to: Michael Paquier (#31)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/19/22 3:18 AM, Michael Paquier wrote:

On Tue, Oct 18, 2022 at 09:14:21AM +0200, Drouvot, Bertrand wrote:

BTW, what about adding a new TAP test (dedicated patch) to test the behavior
in case of errors during the regexes compilation in pg_ident.conf and
pg_hba.conf (not done yet)? (we could add it once this patch series is
done).

Perhaps, that may become tricky when it comes to -DEXEC_BACKEND (for
cases where no fork() implementation is available, aka Windows). But
a postmaster restart failure would generate logs that could be picked
for a pattern check?

Right, that's how I'd see it. I'll give it a look.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#33

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

about 3 years ago

In reply to: Michael Paquier (#24)

1 attachment(s)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:

Indeed, ;-)

The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes. (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.

Please find attached
v1-0001-regex-handling-for-db-and-roles-in-hba.patch to implement
regexes for databases and roles in hba.

It does also contain new regexes related TAP tests and doc updates.

It relies on the refactoring made in fc579e11c6 (but changes the
regcomp_auth_token() parameters so that it is now responsible for
emitting the compilation error message (if any), to avoid code
duplication in parse_hba_line() and parse_ident_line() for roles,
databases and user name mapping).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-regex-handling-for-db-and-roles-in-hba.patchtext/plain; charset=UTF-8; name=v1-0001-regex-handling-for-db-and-roles-in-hba.patchDownload

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..30753003ba 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -234,10 +234,13 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        replication connections do not specify any particular database whereas
        logical replication connections do specify it.
        Otherwise, this is the name of
-       a specific <productname>PostgreSQL</productname> database.
-       Multiple database names can be supplied by separating them with
-       commas.  A separate file containing database names can be specified by
-       preceding the file name with <literal>@</literal>.
+       a specific <productname>PostgreSQL</productname> database or a regular
+       expression preceded by <literal>/</literal>.
+       Multiple database names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
+       A separate file containing database names and/or regular expressions preceded
+       by <literal>/</literal> can be specified by preceding the file name
+       with <literal>@</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -249,18 +252,20 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        Specifies which database user name(s) this record
        matches. The value <literal>all</literal> specifies that it
        matches all users.  Otherwise, this is either the name of a specific
-       database user, or a group name preceded by <literal>+</literal>.
+       database user, a regular expression preceded by <literal>/</literal>,
+       or a group name preceded by <literal>+</literal>.
        (Recall that there is no real distinction between users and groups
        in <productname>PostgreSQL</productname>; a <literal>+</literal> mark really means
        <quote>match any of the roles that are directly or indirectly members
        of this role</quote>, while a name without a <literal>+</literal> mark matches
-       only that specific role.) For this purpose, a superuser is only
+       only that specific role or regular expression.) For this purpose, a superuser is only
        considered to be a member of a role if they are explicitly a member
        of the role, directly or indirectly, and not just by virtue of
        being a superuser.
-       Multiple user names can be supplied by separating them with commas.
-       A separate file containing user names can be specified by preceding the
-       file name with <literal>@</literal>.
+       Multiple user names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas. A separate file containing
+       user names and/or regular expressions preceded by <literal>/</literal>
+       can be specified by preceding the file name with <literal>@</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -739,6 +744,18 @@ host    all             all             ::1/128                 trust
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             localhost               trust
 
+# The same using regular expression for DATABASE, which allows connection to the
+# db1 and testdb databases and any database with a name ending with "test".
+#
+# TYPE  DATABASE               USER            ADDRESS          METHOD
+local   db1,/^.*test$,testdb   all             localhost        trust
+
+# The same using regular expression for USER, which allows connection to the
+# user1 and testuser users and any user with a name ending with "test".
+#
+# TYPE  DATABASE                 USER                              ADDRESS          METHOD
+local   db1,/^.*test$,testdb     user1,/^.*test$,testuser          localhost        trust
+
 # Allow any user from any host with IP address 192.168.93.x to connect
 # to database "postgres" as the same user name that ident reports for
 # the connection (typically the operating system user name).
@@ -785,15 +802,17 @@ host    all             all             192.168.12.10/32        gss
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             192.168.0.0/16          ident map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for users ending
+# with "helpdesk", administrators and members of role "support",
+# who can connect to all databases.
+# The file $PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 local   sameuser        all                                     md5
+local   all             /^.*helpdesk$                           md5
 local   all             @admins                                 md5
 local   all             +support                                md5
 
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 947a1edcef..165ef4d397 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -119,7 +119,8 @@ static List *tokenize_inc_file(List *tokens, const char *outer_filename,
 							   const char *inc_filename, int elevel, char **err_msg);
 static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 							   int elevel, char **err_msg);
-static int	regcomp_auth_token(AuthToken *token);
+static int	regcomp_auth_token(AuthToken *token, char *filename, int line_num,
+							   char **err_msg, int elevel);
 static int	regexec_auth_token(const char *match, AuthToken *token,
 							   size_t nmatch, regmatch_t pmatch[]);
 
@@ -308,7 +309,8 @@ copy_auth_token(AuthToken *in)
  * input.  Returns the result of pg_regcomp().
  */
 static int
-regcomp_auth_token(AuthToken *token)
+regcomp_auth_token(AuthToken *token, char *filename, int line_num,
+				   char **err_msg, int elevel)
 {
 	pg_wchar   *wstr;
 	int			wlen;
@@ -325,6 +327,22 @@ regcomp_auth_token(AuthToken *token)
 								wstr, strlen(token->string + 1));
 
 	rc = pg_regcomp(token->regex, wstr, wlen, REG_ADVANCED, C_COLLATION_OID);
+	if (rc)
+	{
+		char		errstr[100];
+
+		pg_regerror(rc, token->regex, errstr, sizeof(errstr));
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+				 errmsg("invalid regular expression \"%s\": %s",
+						token->string + 1, errstr),
+				 errcontext("line %d of configuration file \"%s\"",
+							line_num, filename)));
+
+		*err_msg = psprintf("invalid regular expression \"%s\": %s",
+							token->string + 1, errstr);
+
+	}
 
 	pfree(wstr);
 	return rc;
@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
 	foreach(cell, tokens)
 	{
 		tok = lfirst(cell);
-		if (!tok->quoted && tok->string[0] == '+')
+		if (!token_has_regexp(tok))
 		{
-			if (is_member(roleid, tok->string + 1))
+			if (!tok->quoted && tok->string[0] == '+')
+			{
+				if (is_member(roleid, tok->string + 1))
+					return true;
+			}
+			else if (token_matches(tok, role) ||
+					 token_is_keyword(tok, "all"))
 				return true;
 		}
-		else if (token_matches(tok, role) ||
-				 token_is_keyword(tok, "all"))
+		else if (regexec_auth_token(role, tok, 0, NULL) == REG_OKAY)
 			return true;
 	}
 	return false;
@@ -685,22 +708,27 @@ check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
 			if (token_is_keyword(tok, "replication"))
 				return true;
 		}
-		else if (token_is_keyword(tok, "all"))
-			return true;
-		else if (token_is_keyword(tok, "sameuser"))
+		else if (token_is_keyword(tok, "replication"))
+			continue;			/* never match this if not walsender */
+		else if (!token_has_regexp(tok))
 		{
-			if (strcmp(dbname, role) == 0)
+			if (token_is_keyword(tok, "all"))
 				return true;
-		}
-		else if (token_is_keyword(tok, "samegroup") ||
-				 token_is_keyword(tok, "samerole"))
-		{
-			if (is_member(roleid, dbname))
+			else if (token_is_keyword(tok, "sameuser"))
+			{
+				if (strcmp(dbname, role) == 0)
+					return true;
+			}
+			else if (token_is_keyword(tok, "samegroup") ||
+					 token_is_keyword(tok, "samerole"))
+			{
+				if (is_member(roleid, dbname))
+					return true;
+			}
+			else if (token_matches(tok, dbname))
 				return true;
 		}
-		else if (token_is_keyword(tok, "replication"))
-			continue;			/* never match this if not walsender */
-		else if (token_matches(tok, dbname))
+		else if (regexec_auth_token(dbname, tok, 0, NULL) == REG_OKAY)
 			return true;
 	}
 	return false;
@@ -1119,8 +1147,15 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
-		parsedline->databases = lappend(parsedline->databases,
-										copy_auth_token(lfirst(tokencell)));
+		AuthToken  *tok = copy_auth_token(lfirst(tokencell));
+
+		/*
+		 * Compile a regex from the database token, if necessary.
+		 */
+		if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel))
+			return NULL;
+
+		parsedline->databases = lappend(parsedline->databases, tok);
 	}
 
 	/* Get the roles. */
@@ -1139,8 +1174,15 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
-		parsedline->roles = lappend(parsedline->roles,
-									copy_auth_token(lfirst(tokencell)));
+		AuthToken  *tok = copy_auth_token(lfirst(tokencell));
+
+		/*
+		 * Compile a regex from the role token, if necessary.
+		 */
+		if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel))
+			return NULL;
+
+		parsedline->roles = lappend(parsedline->roles, tok);
 	}
 
 	if (parsedline->conntype != ctLocal)
@@ -2374,7 +2416,6 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 	List	   *tokens;
 	AuthToken  *token;
 	IdentLine  *parsedline;
-	int			rc;
 
 	Assert(tok_line->fields != NIL);
 	field = list_head(tok_line->fields);
@@ -2410,24 +2451,8 @@ parse_ident_line(TokenizedAuthLine *tok_line, int elevel)
 	 * Now that the field validation is done, compile a regex from the user
 	 * token, if necessary.
 	 */
-	rc = regcomp_auth_token(parsedline->token);
-	if (rc)
-	{
-		char		errstr[100];
-
-		pg_regerror(rc, parsedline->token->regex, errstr, sizeof(errstr));
-		ereport(elevel,
-				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-				 errmsg("invalid regular expression \"%s\": %s",
-						parsedline->token->string + 1, errstr),
-				 errcontext("line %d of configuration file \"%s\"",
-							line_num, IdentFileName)));
-
-		*err_msg = psprintf("invalid regular expression \"%s\": %s",
-							parsedline->token->string + 1, errstr);
-
+	if (regcomp_auth_token(parsedline->token, IdentFileName, line_num, err_msg, elevel))
 		return NULL;
-	}
 
 	return parsedline;
 }
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index ea664d18f5..acb8bfaac8 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -81,6 +81,14 @@ $node->safe_psql(
 	 GRANT ALL ON sysuser_data TO md5_role;");
 $ENV{"PGPASSWORD"} = 'pass';
 
+# Create a role that contains a comma to stress the parsing.
+$node->safe_psql('postgres',
+	q{SET password_encryption='md5'; CREATE ROLE "md5,role" LOGIN PASSWORD 'pass';}
+);
+
+# Create a database to test regular expression.
+$node->safe_psql('postgres', "CREATE database regex_testdb;");
+
 # For "trust" method, all users should be able to connect. These users are not
 # considered to be authenticated.
 reset_pg_hba($node, 'all', 'all', 'trust');
@@ -200,6 +208,39 @@ append_to_file(
 
 test_conn($node, 'user=md5_role', 'password from pgpass', 0);
 
+# Testing with regular expression for username. Note that the third regex
+# matches in this case.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^md.*$', 'password');
+test_conn($node, 'user=md5_role', 'password, matching regexp for username',
+	0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^m_d.*$', 'password');
+test_conn($node, 'user=md5_role',
+	'password, non matching regexp for username',
+	2, log_unlike => [qr/connection authenticated:/]);
+
+# Test with a comma in the regular expression
+reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
+test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
+	0);
+
+# Testing with regular expression for dbname. The third regex matches.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*b$', 'all',
+	'password');
+test_conn(
+	$node, 'user=md5_role dbname=regex_testdb', 'password,
+   matching regexp for dbname', 0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*ba$',
+	'all', 'password');
+test_conn(
+	$node,
+	'user=md5_role dbname=regex_testdb',
+	'password, non matching regexp for dbname',
+	2, log_unlike => [qr/connection authenticated:/]);
+
 unlink($pgpassfile);
 delete $ENV{"PGPASSFILE"};

#34

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Drouvot, Bertrand (#33)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:

Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
implement regexes for databases and roles in hba.

It does also contain new regexes related TAP tests and doc updates.

Thanks for the updated version. This is really easy to look at now.

It relies on the refactoring made in fc579e11c6 (but changes the
regcomp_auth_token() parameters so that it is now responsible for emitting
the compilation error message (if any), to avoid code duplication in
parse_hba_line() and parse_ident_line() for roles, databases and user name
mapping).

@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
-       if (!tok->quoted && tok->string[0] == '+')
+       if (!token_has_regexp(tok))
        {
Hmm.  Do we need token_has_regexp() here for all the cases?  We know
that the string can begin with a '+', hence it is no regex.  The same
applies for the case of "all".  The remaining case is the one where
the user name matches exactly the AuthToken string, which should be
last as we want to treat anything beginning with a '/' as a regex.  It
seems like we could do an order like that?  Say:
if (!tok->quoted && tok->string[0] == '+')
    //do
else if (token_is_keyword(tok, "all"))
    //do
else if (token_has_regexp(tok))
    //do regex compilation, handling failures
else if (token_matches(tok, role))
    //do exact match

The same approach with keywords first, regex, and exact match could be
applied as well for the databases? Perhaps it is just mainly a matter
of taste, and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long-run and this is a very sensitive code area, and
the case of physical walsenders (in short specific process types)
requiringx specific conditions is also something to take into account.

    foreach(tokencell, tokens)
    {
-       parsedline->roles = lappend(parsedline->roles,
-                                   copy_auth_token(lfirst(tokencell)));
+       AuthToken  *tok = copy_auth_token(lfirst(tokencell));
+
+       /*
+        * Compile a regex from the role token, if necessary.
+        */
+       if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel))
+           return NULL;
+
+       parsedline->roles = lappend(parsedline->roles, tok);
    }

Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct. However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup. In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.
My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths. Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.

I am wondering whether we should link the regexp code to not use
malloc(), actually.. This would require a separate analysis, though,
and I suspect that palloc() would be very expensive for this job.

For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.
--
Michael

#35

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

about 3 years ago

In reply to: Michael Paquier (#34)

1 attachment(s)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/21/22 2:58 AM, Michael Paquier wrote:

On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:

Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
implement regexes for databases and roles in hba.

It does also contain new regexes related TAP tests and doc updates.

Thanks for the updated version. This is really easy to look at now.

It relies on the refactoring made in fc579e11c6 (but changes the
regcomp_auth_token() parameters so that it is now responsible for emitting
the compilation error message (if any), to avoid code duplication in
parse_hba_line() and parse_ident_line() for roles, databases and user name
mapping).
@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
-       if (!tok->quoted && tok->string[0] == '+')
+       if (!token_has_regexp(tok))
{
Hmm.  Do we need token_has_regexp() here for all the cases?  We know
that the string can begin with a '+', hence it is no regex.  The same
applies for the case of "all".  The remaining case is the one where
the user name matches exactly the AuthToken string, which should be
last as we want to treat anything beginning with a '/' as a regex.  It
seems like we could do an order like that?  Say:
if (!tok->quoted && tok->string[0] == '+')
//do
else if (token_is_keyword(tok, "all"))
//do
else if (token_has_regexp(tok))
//do regex compilation, handling failures
else if (token_matches(tok, role))
//do exact match
The same approach with keywords first, regex, and exact match could be
applied as well for the databases? Perhaps it is just mainly a matter
of taste,

Yeah, I think it is.

and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long-run and this is a very sensitive code area,

And agree that your proposal tastes better ;-): it is easier to
understand, v2 attached has been done that way.

Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct. However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup.

Oops, right, thanks for the call out!

In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.

Right, but I think that should be "parsed_hba_lines != NIL".

My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths.

Right. To avoid code duplication in the !ok/ok cases, the function
free_hba_line() has been added in v2: it goes through the list of
databases and roles tokens and call free_auth_token() for each of them.

Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.

I agree, and v2 is not attempting to unify them.

For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.

Thanks! v2 attached does apply on top of that.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-regex-handling-for-db-and-roles-in-hba.patchtext/plain; charset=UTF-8; name=v2-0001-regex-handling-for-db-and-roles-in-hba.patchDownload

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..30753003ba 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -234,10 +234,13 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        replication connections do not specify any particular database whereas
        logical replication connections do specify it.
        Otherwise, this is the name of
-       a specific <productname>PostgreSQL</productname> database.
-       Multiple database names can be supplied by separating them with
-       commas.  A separate file containing database names can be specified by
-       preceding the file name with <literal>@</literal>.
+       a specific <productname>PostgreSQL</productname> database or a regular
+       expression preceded by <literal>/</literal>.
+       Multiple database names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas.
+       A separate file containing database names and/or regular expressions preceded
+       by <literal>/</literal> can be specified by preceding the file name
+       with <literal>@</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -249,18 +252,20 @@ hostnogssenc  <replaceable>database</replaceable>  <replaceable>user</replaceabl
        Specifies which database user name(s) this record
        matches. The value <literal>all</literal> specifies that it
        matches all users.  Otherwise, this is either the name of a specific
-       database user, or a group name preceded by <literal>+</literal>.
+       database user, a regular expression preceded by <literal>/</literal>,
+       or a group name preceded by <literal>+</literal>.
        (Recall that there is no real distinction between users and groups
        in <productname>PostgreSQL</productname>; a <literal>+</literal> mark really means
        <quote>match any of the roles that are directly or indirectly members
        of this role</quote>, while a name without a <literal>+</literal> mark matches
-       only that specific role.) For this purpose, a superuser is only
+       only that specific role or regular expression.) For this purpose, a superuser is only
        considered to be a member of a role if they are explicitly a member
        of the role, directly or indirectly, and not just by virtue of
        being a superuser.
-       Multiple user names can be supplied by separating them with commas.
-       A separate file containing user names can be specified by preceding the
-       file name with <literal>@</literal>.
+       Multiple user names and/or regular expressions preceded by <literal>/</literal>
+       can be supplied by separating them with commas. A separate file containing
+       user names and/or regular expressions preceded by <literal>/</literal>
+       can be specified by preceding the file name with <literal>@</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -739,6 +744,18 @@ host    all             all             ::1/128                 trust
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             localhost               trust
 
+# The same using regular expression for DATABASE, which allows connection to the
+# db1 and testdb databases and any database with a name ending with "test".
+#
+# TYPE  DATABASE               USER            ADDRESS          METHOD
+local   db1,/^.*test$,testdb   all             localhost        trust
+
+# The same using regular expression for USER, which allows connection to the
+# user1 and testuser users and any user with a name ending with "test".
+#
+# TYPE  DATABASE                 USER                              ADDRESS          METHOD
+local   db1,/^.*test$,testdb     user1,/^.*test$,testuser          localhost        trust
+
 # Allow any user from any host with IP address 192.168.93.x to connect
 # to database "postgres" as the same user name that ident reports for
 # the connection (typically the operating system user name).
@@ -785,15 +802,17 @@ host    all             all             192.168.12.10/32        gss
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 host    all             all             192.168.0.0/16          ident map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for users ending
+# with "helpdesk", administrators and members of role "support",
+# who can connect to all databases.
+# The file $PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 local   sameuser        all                                     md5
+local   all             /^.*helpdesk$                           md5
 local   all             @admins                                 md5
 local   all             +support                                md5
 
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index f3539a7929..0e2185d35e 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -293,6 +293,31 @@ free_auth_token(AuthToken *token)
 		pg_regfree(token->regex);
 }
 
+/*
+ * Free a HbaLine, that may include databases and/or roles AuthToken with a
+ * regular expression that needs to be cleaned up explicitly.
+ */
+static void
+free_hba_line(HbaLine *line)
+{
+	ListCell   *cell;
+	AuthToken  *tok;
+
+	/* Clean up roles regexes */
+	foreach(cell, line->roles)
+	{
+		tok = lfirst(cell);
+		free_auth_token(tok);
+	}
+
+	/* Clean up databases regexes */
+	foreach(cell, line->databases)
+	{
+		tok = lfirst(cell);
+		free_auth_token(tok);
+	}
+}
+
 /*
  * Copy a AuthToken struct into freshly palloc'd memory.
  */
@@ -676,8 +701,14 @@ check_role(const char *role, Oid roleid, List *tokens)
 			if (is_member(roleid, tok->string + 1))
 				return true;
 		}
-		else if (token_matches(tok, role) ||
-				 token_is_keyword(tok, "all"))
+		else if (token_is_keyword(tok, "all"))
+			return true;
+		else if (token_has_regexp(tok))
+		{
+			if (regexec_auth_token(role, tok, 0, NULL) == REG_OKAY)
+				return true;
+		}
+		else if (token_matches(tok, role))
 			return true;
 	}
 	return false;
@@ -719,6 +750,11 @@ check_db(const char *dbname, const char *role, Oid roleid, List *tokens)
 		}
 		else if (token_is_keyword(tok, "replication"))
 			continue;			/* never match this if not walsender */
+		else if (token_has_regexp(tok))
+		{
+			if (regexec_auth_token(dbname, tok, 0, NULL) == REG_OKAY)
+				return true;
+		}
 		else if (token_matches(tok, dbname))
 			return true;
 	}
@@ -1138,8 +1174,15 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
-		parsedline->databases = lappend(parsedline->databases,
-										copy_auth_token(lfirst(tokencell)));
+		AuthToken  *tok = copy_auth_token(lfirst(tokencell));
+
+		/*
+		 * Compile a regex from the database token, if necessary.
+		 */
+		if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel))
+			return NULL;
+
+		parsedline->databases = lappend(parsedline->databases, tok);
 	}
 
 	/* Get the roles. */
@@ -1158,8 +1201,15 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
-		parsedline->roles = lappend(parsedline->roles,
-									copy_auth_token(lfirst(tokencell)));
+		AuthToken  *tok = copy_auth_token(lfirst(tokencell));
+
+		/*
+		 * Compile a regex from the role token, if necessary.
+		 */
+		if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel))
+			return NULL;
+
+		parsedline->roles = lappend(parsedline->roles, tok);
 	}
 
 	if (parsedline->conntype != ctLocal)
@@ -2355,12 +2405,31 @@ load_hba(void)
 
 	if (!ok)
 	{
-		/* File contained one or more errors, so bail out */
+		/*
+		 * File contained one or more errors, so bail out, first being careful
+		 * to clean up whatever we allocated.  Most stuff will go away via
+		 * MemoryContextDelete, but we have to clean up regexes explicitly.
+		 */
+		foreach(line, new_parsed_lines)
+		{
+			HbaLine    *newline = (HbaLine *) lfirst(line);
+
+			free_hba_line(newline);
+		}
 		MemoryContextDelete(hbacxt);
 		return false;
 	}
 
 	/* Loaded new file successfully, replace the one we use */
+	if (parsed_hba_lines != NIL)
+	{
+		foreach(line, parsed_hba_lines)
+		{
+			HbaLine    *newline = (HbaLine *) lfirst(line);
+
+			free_hba_line(newline);
+		}
+	}
 	if (parsed_hba_context != NULL)
 		MemoryContextDelete(parsed_hba_context);
 	parsed_hba_context = hbacxt;
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index ea664d18f5..acb8bfaac8 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -81,6 +81,14 @@ $node->safe_psql(
 	 GRANT ALL ON sysuser_data TO md5_role;");
 $ENV{"PGPASSWORD"} = 'pass';
 
+# Create a role that contains a comma to stress the parsing.
+$node->safe_psql('postgres',
+	q{SET password_encryption='md5'; CREATE ROLE "md5,role" LOGIN PASSWORD 'pass';}
+);
+
+# Create a database to test regular expression.
+$node->safe_psql('postgres', "CREATE database regex_testdb;");
+
 # For "trust" method, all users should be able to connect. These users are not
 # considered to be authenticated.
 reset_pg_hba($node, 'all', 'all', 'trust');
@@ -200,6 +208,39 @@ append_to_file(
 
 test_conn($node, 'user=md5_role', 'password from pgpass', 0);
 
+# Testing with regular expression for username. Note that the third regex
+# matches in this case.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^md.*$', 'password');
+test_conn($node, 'user=md5_role', 'password, matching regexp for username',
+	0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, 'all', '/^.*nomatch.*$, baduser, /^m_d.*$', 'password');
+test_conn($node, 'user=md5_role',
+	'password, non matching regexp for username',
+	2, log_unlike => [qr/connection authenticated:/]);
+
+# Test with a comma in the regular expression
+reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
+test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
+	0);
+
+# Testing with regular expression for dbname. The third regex matches.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*b$', 'all',
+	'password');
+test_conn(
+	$node, 'user=md5_role dbname=regex_testdb', 'password,
+   matching regexp for dbname', 0);
+
+# The third regex does not match anymore.
+reset_pg_hba($node, '/^.*nomatch.*$, baddb, /^regex_t.*ba$',
+	'all', 'password');
+test_conn(
+	$node,
+	'user=md5_role dbname=regex_testdb',
+	'password, non matching regexp for dbname',
+	2, log_unlike => [qr/connection authenticated:/]);
+
 unlink($pgpassfile);
 delete $ENV{"PGPASSFILE"};

#36

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Drouvot, Bertrand (#35)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote:

On 10/21/22 2:58 AM, Michael Paquier wrote:

The same approach with keywords first, regex, and exact match could be
applied as well for the databases? Perhaps it is just mainly a matter
of taste,

Yeah, I think it is.

;)

Still it looks that this makes for less confusion with a minimal
footprint once the new additions are in place.

In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.

Right, but I think that should be "parsed_hba_lines != NIL".

For the second case, where we need to free the past contents after a
success, yes.

Right. To avoid code duplication in the !ok/ok cases, the function
free_hba_line() has been added in v2: it goes through the list of databases
and roles tokens and call free_auth_token() for each of them.

Having a small-ish routine for that is fine.

I have spent a couple of hours doing a pass over v2, playing manually
with regex patterns, reloads, the system views and item lists. The
logic was fine, but I have adjusted a few things related to the
comments and the documentation (particularly with the examples,
removing one example and updating one with a regex that has a comma,
needing double quotes). The CI and all my machines were green, and
the test coverage looked sufficient. So, applied. I'll keep an eye
on the buildfarm.
--
Michael

#37

Drouvot, Bertrand

bertranddrouvot.pg@gmail.com

about 3 years ago

In reply to: Michael Paquier (#36)

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Hi,

On 10/24/22 5:34 AM, Michael Paquier wrote:

On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote:

On 10/21/22 2:58 AM, Michael Paquier wrote:

I have spent a couple of hours doing a pass over v2, playing manually
with regex patterns, reloads, the system views and item lists. The
logic was fine, but I have adjusted a few things related to the
comments and the documentation (particularly with the examples,
removing one example and updating one with a regex that has a comma,
needing double quotes). The CI and all my machines were green, and
the test coverage looked sufficient. So, applied.

Thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com