[PoC] Delegating pg_ident to a third party

Started by Jacob Championabout 4 years ago11 messages

pchampion@vmware.com

about 4 years ago

4 attachment(s)

Hi all,

In keeping with my theme of expanding the authentication/authorization
options for the server, attached is an experimental patchset that lets
Postgres determine an authenticated user's allowed roles by querying an
LDAP server, and enables SASL binding for those queries.

This lets you delegate pieces of pg_ident.conf to a central server, so
that you don't have to run any synchronization scripts (or deal with
associated staleness problems, repeated load on the LDAP deployment,
etc.). And it lets you make those queries with a client certificate
instead of a bind password, or at the very least protect your bind
password with some SCRAM crypto. You don't have to use the LDAP auth
method for this to work; you can combine it with Kerberos or certs or
any auth method that already supports pg_ident.

The target users, in my mind, are admins who are already using an auth
method with user maps, but have many deployments and want easier
control over granting and revoking database access from one location.
This won't help you so much if you need to have exactly one role per
user -- there's no logic to automatically create roles, so it can't
fully replace the existing synchronization scripts that are out there.
But if all you need is "X, Y, and Z are allowed to log in as guest, and
A and B may connect as admins", then this is meant to simplify your
life.

This is a smaller step than my previous proof-of-concept, which handled
fully federated authentication and authorization via an OAuth provider
[1]: /messages/by-id/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
mappings to the LDAP auth method [2]/messages/by-id/1a61806047c536e7528b943d0cfe12608118ca31.camel@vmware.com, though I haven't tried them
together yet. (I've also been thinking about pulling group membership
information out of Kerberos authorization data, for those of you using
Active Directory. Things for later.)

= How-To =

If you want to try it out -- on a non-production system please -- take
a look at the test suite in src/test/ldap, which has been filled out
with some example usage. The core features are the "ldapmap" HBA option
(which you would use instead of "map" in your existing HBA) and the
"ldapsaslmechs" HBA option, which you can set to a list of SASL
mechanisms that you will accept. (The list of supported mechanisms is
determined by both systems' LDAP and SASL libraries, not by Postgres.)

The tricky part is writing the pg_ident line correctly, because it's
currently not a very good user experience. The query is in the form of
an LDAP URL. It needs to return exactly one entry for the user being
authorized; the attribute values contained in that entry will be
interpreted as the list of roles that the user is allowed to connect
as. Regex matching and substitution are supported as they are for
regular maps. Here's a sample:

pg_ident.conf:

myldapmap /^(.*)$ ldap://example.com/dc=example,dc=com?postgresRole?sub?(uid=\1)

pg_hba.conf:

hostssl all all all cert ldapmap=myldapmap ldaptls=1 ldapsaslmechs=scram-sha-1 ldapbinddn=admin ldapbindpasswd=secret

This particular setup can be described as follows:

- Clients must use client certificates to authenticate to Postgres.
- Once the certificate is verified, Postgres will connect to the LDAP
server at example.com, issue StartTLS, and begin a SCRAM-SHA-1 exchange
using the bind username and password (admin/secret).
- Once that completes, Postgres will issue a query for the LDAP user
that has a uid matching the CN of the client certificate. (If more than
one user matches, authorization fails.)
- The client's PGUSER will be compared with the list of postgresRole
attributes belonging to that LDAP user, and if one matches,
authorization succeeds.

= Areas for Improvement =

I think it would be nice to support LDAP group membership in addition
to object attributes.

Settings for the LDAP connection are currently spread between pg_hba,
pg_ident, and environment variables like LDAPTLS_CERT. I made the
situation worse by allowing the pg_ident query to contain a scheme,
host, and port. That makes it seem like you could send different users
to different LDAP servers, but since they would all have to share
exactly the same TLS settings anyway, I think this was a mistake on my
part.

That mistake aside, I think the current URL query syntax is powerful
but unintuitive. I would rather see that as an option for power users,
and let other people just specify the user filter and role attribute
separately. And there needs to be more logging around the feature, to
help debug problems.

Regex substitution of user-controlled data into an LDAP query is
perilous, and I don't like it. For now I have restricted the allowed
characters as a first mitigation.

Is it safe to use listen_addresses in the test suite, as I have done,
as long as the HBA requires authentication? Or is that reopening a
security hole? I seem to recall discussion on this but my search-fu has
failed me.

There's a lot of code duplication in the current patchset that would
need to be undone.

...and more; see TODOs in the patches if you're interested.

= Patch Roadmap =

- 0001 fixes error messages that are printed when ldap_url_parse()
fails. Since the pg_ident queries use LDAP URLs, and it's easy to get
them wrong, that fix is particularly important for this patchset. But I
think it could potentially be applied separately.

- 0002 implements the "ldapmap" HBA option and enables the ldaptls,
ldapbinddn, and ldapbindpasswd options for it. It also adds
corresponding tests to the LDAP suite.

- 0003 tests the use of client certificates via LDAP environment
variables. (This is already supported today but I didn't see any
coverage, which will be important for the last patch.)

- 0004 implements the "ldapsaslmechs" HBA option and adds enough SASL
support for at least the EXTERNAL and SCRAM-* mechanisms. Others may
work but I haven't tested them. This feature is available only if you
have the <sasl/sasl.h> header on your system at build time.

WDYT? (My responses here will be slower than usual. Hope you all have a
great end to the year!)

--Jacob

[1]: /messages/by-id/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
[2]: /messages/by-id/1a61806047c536e7528b943d0cfe12608118ca31.camel@vmware.com

Attachments:

0001-hba-correct-messages-when-ldap_url_parse-fails.patchtext/x-patch; name=0001-hba-correct-messages-when-ldap_url_parse-fails.patchDownload

From 2a88b8cbdbe790df55240850995852e8f2d304eb Mon Sep 17 00:00:00 2001
From: Jacob Champion <pchampion@vmware.com>
Date: Thu, 2 Dec 2021 09:37:08 -0800
Subject: [PATCH 1/4] hba: correct messages when ldap_url_parse() fails

ldap_err2string() doesn't work for the return value of ldap_url_parse();
you end up with strange messages like

    LOG: could not parse LDAP URL "<bad-url>": Time limit exceeded

There doesn't appear to be a corresponding error-to-string function for
the URL codes in OpenLDAP, so add a helper.
---
 src/backend/libpq/hba.c | 49 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4328eb74fe..600972e9a4 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -1691,6 +1691,51 @@ parse_hba_line(TokenizedLine *tok_line, int elevel)
 }
 
 
+#ifdef LDAP_API_FEATURE_X_OPENLDAP
+
+/*
+ * OpenLDAP's ldap_err2string() doesn't work on the return value of
+ * ldap_url_parse(). Provide a helper to do so.
+ */
+static const char *
+ldap_url_err2string(int errcode)
+{
+	switch (errcode)
+	{
+		case LDAP_URL_SUCCESS:
+			return "success";
+
+		/* internal/developer errors */
+		case LDAP_URL_ERR_MEM:
+			return "out of memory";
+		case LDAP_URL_ERR_PARAM:
+			return "invalid parameter";
+
+		/* user errors */
+		case LDAP_URL_ERR_BADSCHEME:
+			return "unsupported scheme";
+		case LDAP_URL_ERR_BADENCLOSURE:
+			return "missing closing bracket";
+		case LDAP_URL_ERR_BADURL:
+			return "malformed URL";
+		case LDAP_URL_ERR_BADHOST:
+			return "bad host/port";
+		case LDAP_URL_ERR_BADATTRS:
+			return "bad/missing attributes";
+		case LDAP_URL_ERR_BADSCOPE:
+			return "bad/missing scope";
+		case LDAP_URL_ERR_BADFILTER:
+			return "bad/missing filter";
+		case LDAP_URL_ERR_BADEXTS:
+			return "bad/missing extensions";
+	}
+
+	return psprintf("unknown error: %d", errcode);
+}
+
+#endif /* LDAP_API_FEATURE_X_OPENLDAP */
+
+
 /*
  * Parse one name-value pair as an authentication option into the given
  * HbaLine.  Return true if we successfully parse the option, false if we
@@ -1818,9 +1863,9 @@ parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 		{
 			ereport(elevel,
 					(errcode(ERRCODE_CONFIG_FILE_ERROR),
-					 errmsg("could not parse LDAP URL \"%s\": %s", val, ldap_err2string(rc))));
+					 errmsg("could not parse LDAP URL \"%s\": %s", val, ldap_url_err2string(rc))));
 			*err_msg = psprintf("could not parse LDAP URL \"%s\": %s",
-								val, ldap_err2string(rc));
+								val, ldap_url_err2string(rc));
 			return false;
 		}
 
-- 
2.25.1

0002-pg_ident-allow-delegation-to-an-LDAP-server.patchtext/x-patch; name=0002-pg_ident-allow-delegation-to-an-LDAP-server.patchDownload

From aec6ad94c1640691f70b0f8d1434a64e25a30755 Mon Sep 17 00:00:00 2001
From: Jacob Champion <pchampion@vmware.com>
Date: Thu, 2 Dec 2021 09:36:36 -0800
Subject: [PATCH 2/4] pg_ident: allow delegation to an LDAP server

This is work-in-progress code, with several deficiencies and TODOs.
Don't use it in production.

The core addition of this patch is the HBA option "ldapmap", which
behaves like the existing "map" option except that the list of Postgres
roles is constructed from the results of an LDAP query. With this
method, role authorization can be controlled centrally, without needing
cron scripts for LDAP-to-Postgres synchronization.

The LDAP query must be in the form of an ldap[s]:// URL and it must
return exactly one entry (for the user that is logging in). That entry's
attribute values are used as the list of roles that an authenticated
user is allowed to log in as. For example, an LDAP admin could give any
authorized users one or more "postgresRole" attributes, and the DBA
could write a role query in pg_ident that looks like

  myldapmap  /^(.*)$  ldaps://example.com/dc=example,dc=com?postgresRole?sub?(uid=\1)

The ldapmap option can only be used for those auth methods that already
support the map option (cert, gss, etc.). The ldaptls, ldapbinddn, and
ldapbindpasswd options for the ldap auth method are now also supported
for any HBA lines that use ldapmap.

Possible remaining work (and known problems):

- It'd be nice to map roles based on LDAP group membership.

- pg_ident's regex substitution into a query URL is fairly unsafe and
  needs to be rethought. For now I've restricted the characters that can
  be substituted.

- The new LDAP tests open up TCP listen_addresses on the test server; I
  don't know if that's actually safe to do.

- Some of the options for specifying the LDAP server are in pg_ident
  (the scheme, host, and port), some are in pg_hba (ldaptls and the bind
  settings), and some are in the environment (client certificates, etc.)
  That is not easy to admin.

- Querying for multiple attributes is allowed, but only the first one
  returned by the server will be honored; the rest are silently ignored.

- The LDAP connection code might need to be deduplicated with
  InitializeLDAPConnection(), as they have substantial similarity.

- There's not much logging to help you when you're writing a role query,
  or to do a postmortem if something is wrong.

- Various others, called out inline...
---
 src/backend/libpq/auth.c          |  13 +-
 src/backend/libpq/hba.c           | 523 ++++++++++++++++++++++++++++--
 src/include/libpq/hba.h           |   5 +-
 src/test/ldap/authdata.ldif       |   3 +
 src/test/ldap/postgresuser.schema |  30 ++
 src/test/ldap/t/001_auth.pl       | 244 ++++++++++++--
 6 files changed, 766 insertions(+), 52 deletions(-)
 create mode 100644 src/test/ldap/postgresuser.schema

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 7bcf52523b..c8c48d487d 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -1137,8 +1137,7 @@ pg_GSS_checkauth(Port *port)
 		return STATUS_ERROR;
 	}
 
-	ret = check_usermap(port->hba->usermap, port->user_name, princ,
-						pg_krb_caseins_users);
+	ret = check_usermap(port, princ, pg_krb_caseins_users);
 
 	pfree(princ);
 
@@ -1477,12 +1476,12 @@ pg_SSPI_recvauth(Port *port)
 		int			retval;
 
 		namebuf = psprintf("%s@%s", accountname, domainname);
-		retval = check_usermap(port->hba->usermap, port->user_name, namebuf, true);
+		retval = check_usermap(port, namebuf, true);
 		pfree(namebuf);
 		return retval;
 	}
 	else
-		return check_usermap(port->hba->usermap, port->user_name, accountname, true);
+		return check_usermap(port, accountname, true);
 }
 
 /*
@@ -1843,7 +1842,7 @@ ident_inet_done:
 		 * usermap, because at this point authentication has succeeded.
 		 */
 		set_authn_id(port, ident_user);
-		return check_usermap(port->hba->usermap, port->user_name, ident_user, false);
+		return check_usermap(port, ident_user, false);
 	}
 	return STATUS_ERROR;
 }
@@ -1906,7 +1905,7 @@ auth_peer(hbaPort *port)
 	 */
 	set_authn_id(port, pw->pw_name);
 
-	ret = check_usermap(port->hba->usermap, port->user_name, port->authn_id, false);
+	ret = check_usermap(port, port->authn_id, false);
 
 	return ret;
 #else
@@ -2799,7 +2798,7 @@ CheckCertAuth(Port *port)
 	}
 
 	/* Just pass the certificate cn/dn to the usermap check */
-	status_check_usermap = check_usermap(port->hba->usermap, port->user_name, peer_username, false);
+	status_check_usermap = check_usermap(port, peer_username, false);
 	if (status_check_usermap != STATUS_OK)
 	{
 		/*
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 600972e9a4..57440107be 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -49,6 +49,7 @@
 #ifdef WIN32
 #include <winldap.h>
 #else
+#define LDAP_DEPRECATED 1
 #include <ldap.h>
 #endif
 #endif
@@ -1679,6 +1680,53 @@ parse_hba_line(TokenizedLine *tok_line, int elevel)
 		}
 	}
 
+	/* Final sanity checks on options. */
+	if (parsedline->usermap && parsedline->ldapmap)
+	{
+		ereport(elevel,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot use map together with ldapmap"),
+				 errcontext("line %d of configuration file \"%s\"",
+							line_num, HbaFileName)));
+		*err_msg = "cannot use map together with ldapmap";
+		return NULL;
+	}
+
+	if ((parsedline->auth_method != uaLDAP) && !parsedline->ldapmap)
+	{
+		/*
+		 * Some options are allowed to be set for the LDAP auth method and/or an
+		 * LDAP user mapping, but if neither is in use then we should complain.
+		 */
+		const char *badopt = NULL;
+
+		if (parsedline->ldaptls)
+		{
+			badopt = "ldaptls";
+		}
+		else if (parsedline->ldapbinddn)
+		{
+			badopt = "ldapbinddn";
+		}
+		else if (parsedline->ldapbindpasswd)
+		{
+			badopt = "ldapbindpasswd";
+		}
+
+		if (badopt)
+		{
+			ereport(elevel, \
+					(errcode(ERRCODE_CONFIG_FILE_ERROR), \
+					 errmsg("authentication option \"%s\" is only valid for the \"ldap\" authentication method or an \"ldapmap\" user mapping", \
+							badopt), \
+					 errcontext("line %d of configuration file \"%s\"", \
+							line_num, HbaFileName))); \
+			*err_msg = psprintf("authentication option \"%s\" is only valid for the \"ldap\" authentication method or an \"ldapmap\" user mapping", \
+								badopt); \
+			return NULL;
+		}
+	}
+
 	/*
 	 * Enforce any parameters implied by other settings.
 	 */
@@ -1752,15 +1800,23 @@ parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 	hbaline->ldapscope = LDAP_SCOPE_SUBTREE;
 #endif
 
-	if (strcmp(name, "map") == 0)
+	if (strcmp(name, "map") == 0 || strcmp(name, "ldapmap") == 0)
 	{
 		if (hbaline->auth_method != uaIdent &&
 			hbaline->auth_method != uaPeer &&
 			hbaline->auth_method != uaGSS &&
 			hbaline->auth_method != uaSSPI &&
 			hbaline->auth_method != uaCert)
-			INVALID_AUTH_OPTION("map", gettext_noop("ident, peer, gssapi, sspi, and cert"));
-		hbaline->usermap = pstrdup(val);
+			INVALID_AUTH_OPTION(name, gettext_noop("ident, peer, gssapi, sspi, and cert"));
+
+		if (name[0] == 'l') /* ldapmap */
+		{
+			hbaline->ldapmap = pstrdup(val);
+		}
+		else /* map */
+		{
+			hbaline->usermap = pstrdup(val);
+		}
 	}
 	else if (strcmp(name, "clientcert") == 0)
 	{
@@ -1904,7 +1960,6 @@ parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 	}
 	else if (strcmp(name, "ldaptls") == 0)
 	{
-		REQUIRE_AUTH_OPTION(uaLDAP, "ldaptls", "ldap");
 		if (strcmp(val, "1") == 0)
 			hbaline->ldaptls = true;
 		else
@@ -1943,12 +1998,10 @@ parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 	}
 	else if (strcmp(name, "ldapbinddn") == 0)
 	{
-		REQUIRE_AUTH_OPTION(uaLDAP, "ldapbinddn", "ldap");
 		hbaline->ldapbinddn = pstrdup(val);
 	}
 	else if (strcmp(name, "ldapbindpasswd") == 0)
 	{
-		REQUIRE_AUTH_OPTION(uaLDAP, "ldapbindpasswd", "ldap");
 		hbaline->ldapbindpasswd = pstrdup(val);
 	}
 	else if (strcmp(name, "ldapsearchattribute") == 0)
@@ -3001,6 +3054,401 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 	}
 }
 
+#ifdef LDAP_API_FEATURE_X_OPENLDAP
+
+static bool query_ldap_roles(const Port *port, LDAPURLDesc *ldapurl, List **roles);
+
+/*
+ *	Process one line from the parsed ident config lines.
+ *
+ *	Compare input parsed ident line to the needed map and ident_user. If it
+ *	matches, perform an LDAP query (using the Port's settings) to obtain the
+ *	allowed set of roles and make sure pg_role is one of them. *found_p and
+ *	*error_p are set according to our results.
+ *
+ *	TODO: each ident line has its own URL, implying that we could query separate
+ *	servers with a map
+ *
+ *	    ldap /.*@example.com ldaps://ldap.example.com/...
+ *	    ldap /.*@example.net ldap://ldap2.example.net/...
+ *
+ *	But the use of the Port means that all those servers have to share certain
+ *	settings (like credentials and the StartTLS setting). So either those
+ *	settings need to be set per ident line, or we should defer to the HBA for
+ *	scheme, host, and port too, and just allow exactly one LDAP server to be
+ *	contacted.
+ *
+ *	TODO: consolidate logic with check_ident_usermap()
+ */
+static void
+check_ldap_usermap(IdentLine *identLine, const char *ldapmap_name,
+				   const Port *port, const char *pg_role,
+				   const char *ident_user, bool *found_p, bool *error_p)
+{
+	char	   *ldapurl = NULL;
+	int			rc;
+	LDAPURLDesc *urldata;
+	List	   *roles;
+	ListCell   *role;
+
+	*found_p = false;
+	*error_p = false;
+
+	if (strcmp(identLine->usermap, ldapmap_name) != 0)
+		/* Line does not match the map name we're looking for, so just abort */
+		return;
+
+	/* Match? */
+	if (identLine->ident_user[0] == '/')
+	{
+		/*
+		 * When system username starts with a slash, treat it as a regular
+		 * expression. In this case, we process the system username as a
+		 * regular expression that returns exactly one match. This is replaced
+		 * for \1 in the LDAP query URI, if present.
+		 */
+		int			r;
+		regmatch_t	matches[2];
+		pg_wchar   *wstr;
+		int			wlen;
+		char	   *ofs;
+
+		wstr = palloc((strlen(ident_user) + 1) * sizeof(pg_wchar));
+		wlen = pg_mb2wchar_with_len(ident_user, wstr, strlen(ident_user));
+
+		r = pg_regexec(&identLine->re, wstr, wlen, 0, NULL, 2, matches, 0);
+		if (r)
+		{
+			char		errstr[100];
+
+			if (r != REG_NOMATCH)
+			{
+				/* REG_NOMATCH is not an error, everything else is */
+				pg_regerror(r, &identLine->re, errstr, sizeof(errstr));
+				ereport(LOG,
+						(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+						 errmsg("regular expression match for \"%s\" failed: %s",
+								identLine->ident_user + 1, errstr)));
+				*error_p = true;
+			}
+
+			pfree(wstr);
+			return;
+		}
+		pfree(wstr);
+
+		if ((ofs = strstr(identLine->pg_role, "\\1")) != NULL)
+		{
+			static const char *unreserved =
+				"abcdefghijklmnopqrstuvwxyz"
+				"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+				"0123456789-._~";
+
+			regoff_t	matchlen;
+			int			offset;
+
+			/* substitution of the first argument requested */
+			if (matches[1].rm_so < 0)
+			{
+				ereport(LOG,
+						(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
+						 errmsg("regular expression \"%s\" has no subexpressions as requested by backreference in \"%s\"",
+								identLine->ident_user + 1, identLine->pg_role)));
+				*error_p = true;
+				return;
+			}
+
+			/*
+			 * Sanity check the substitution.
+			 *
+			 * We're inserting into an LDAP query URL, so we need to prevent
+			 * injection. As a simple solution, limit the characters that can be
+			 * substituted to the URL-unreserved set:
+			 *
+			 *     unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
+			 *
+			 * Note that this is stricter than the preconditions for
+			 * FormatSearchFilter().
+			 *
+			 * XXX This restriction probably indicates that either we should
+			 * break the URL up first and substitute then, or allow the DBA to
+			 * provide the separate pieces instead of a URL, or URL-escape
+			 * before substituting, or...
+			 */
+			matchlen = matches[1].rm_eo - matches[1].rm_so;
+			if (strspn(ident_user + matches[1].rm_so, unreserved) < matchlen)
+			{
+				ereport(LOG,
+						(errmsg("invalid character in matched substitution for LDAP mapping")));
+				*error_p = true;
+				return;
+			}
+
+			/*
+			 * length: original length minus length of \1 plus length of match
+			 * plus null terminator
+			 */
+			ldapurl = palloc0(strlen(identLine->pg_role) - 2 + matchlen + 1);
+			offset = ofs - identLine->pg_role;
+			memcpy(ldapurl, identLine->pg_role, offset);
+			memcpy(ldapurl + offset,
+				   ident_user + matches[1].rm_so,
+				   matchlen);
+			strcat(ldapurl, ofs + 2);
+		}
+		else
+		{
+			/* no substitution, so copy the match */
+			ldapurl = pstrdup(identLine->pg_role);
+		}
+	}
+	else
+	{
+		/* Not regular expression, so use the ident entries as-is */
+		if (strcmp(identLine->ident_user, ident_user) != 0)
+			return;
+
+		ldapurl = pstrdup(identLine->pg_role);
+	}
+
+	/*
+	 * At this point, we know that this ident line matches our map and system
+	 * user, and we've constructed the LDAP URL to use for a role query.
+	 */
+	rc = ldap_url_parse(ldapurl, &urldata);
+	if (rc != LDAP_SUCCESS)
+	{
+		ereport(LOG,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("could not parse LDAP mapping URL \"%s\": %s",
+						ldapurl, ldap_url_err2string(rc))));
+		*error_p = true;
+
+		pfree(ldapurl);
+		return;
+	}
+
+	if (!query_ldap_roles(port, urldata, &roles))
+	{
+		/* Error message already logged */
+		list_free_deep(roles);
+		pfree(ldapurl);
+		return;
+	}
+
+	foreach(role, roles)
+	{
+		const char *allowed_role = lfirst(role);
+
+		if (strcmp(allowed_role, pg_role) == 0)
+		{
+			*found_p = true;
+			break;
+		}
+	}
+
+	list_free_deep(roles);
+	pfree(ldapurl);
+}
+
+/*
+ * Add a detail error message text to the current error if one can be
+ * constructed from the LDAP 'diagnostic message'.
+ *
+ * XXX copied from auth.c
+ */
+static int
+errdetail_for_ldap(LDAP *ldap)
+{
+	char	   *message;
+	int			rc;
+
+	rc = ldap_get_option(ldap, LDAP_OPT_DIAGNOSTIC_MESSAGE, &message);
+	if (rc == LDAP_SUCCESS && message != NULL)
+	{
+		errdetail("LDAP diagnostics: %s", message);
+		ldap_memfree(message);
+	}
+
+	return 0;
+}
+
+/*
+ * Returns a palloc'd list of pointers to role names returned by the LDAP query
+ * contained in ldapurl. Callers must list_free_deep() the return value even if
+ * the function fails.
+ */
+static bool
+query_ldap_roles(const Port *port, LDAPURLDesc *ldapurl, List **roles)
+{
+	char	   *server;
+	LDAP	   *ldap = NULL;
+	int			rc;
+	const int	ldapversion = LDAP_VERSION3;
+	bool		success = false;
+	LDAPMessage *search_message = NULL;
+	LDAPMessage *entry;
+	int			count;
+	struct berval **values = NULL;
+
+	/*
+	 * For now we query for only one attribute (the first).
+	 * TODO: maybe open up this restriction
+	 */
+	char	   *attributes[] = {ldapurl->lud_attrs[0], NULL};
+
+	*roles = NIL;
+
+	/*
+	 * TODO: why does other code prevent ldapi:// connections? Should we do so
+	 * here?
+	 */
+	server = psprintf("%s://%s:%d",
+					  ldapurl->lud_scheme, ldapurl->lud_host, ldapurl->lud_port);
+
+	/*
+	 * TODO: reuse the InitializeLDAPConnection() logic
+	 */
+	rc = ldap_initialize(&ldap, server);
+	if (rc != LDAP_SUCCESS)
+	{
+		ereport(LOG,
+				(errmsg("could not initialize LDAP: %s", ldap_err2string(rc))));
+		goto cleanup;
+	}
+
+	rc = ldap_set_option(ldap, LDAP_OPT_PROTOCOL_VERSION, &ldapversion);
+	if (rc != LDAP_SUCCESS)
+	{
+		ereport(LOG,
+				(errmsg("could not set LDAP protocol version: %s",
+						ldap_err2string(rc)),
+				 errdetail_for_ldap(ldap)));
+		goto cleanup;
+	}
+
+	if (port->hba->ldaptls)
+	{
+		if ((rc = ldap_start_tls_s(ldap, NULL, NULL)) != LDAP_SUCCESS)
+		{
+			ereport(LOG,
+					(errmsg("could not start LDAP TLS session: %s",
+							ldap_err2string(rc)),
+					 errdetail_for_ldap(ldap)));
+			goto cleanup;
+		}
+	}
+
+	rc = ldap_simple_bind_s(ldap,
+							port->hba->ldapbinddn ? port->hba->ldapbinddn : "",
+							port->hba->ldapbindpasswd ? port->hba->ldapbindpasswd : "");
+	if (rc != LDAP_SUCCESS)
+	{
+		ereport(LOG,
+				(errmsg("could not perform initial LDAP bind for ldapbinddn \"%s\" on server \"%s\": %s",
+						port->hba->ldapbinddn ? port->hba->ldapbinddn : "",
+						server,
+						ldap_err2string(rc)),
+				 errdetail_for_ldap(ldap)));
+		goto cleanup;
+	}
+
+	rc = ldap_search_s(ldap,
+					   ldapurl->lud_dn,
+					   ldapurl->lud_scope,
+					   ldapurl->lud_filter,
+					   attributes,
+					   0,
+					   &search_message);
+
+	if (rc != LDAP_SUCCESS)
+	{
+		ereport(LOG,
+				(errmsg("LDAP role search failed on server \"%s\": %s",
+						server, ldap_err2string(rc)),
+				 errdetail_for_ldap(ldap)));
+		goto cleanup;
+	}
+
+	count = ldap_count_entries(ldap, search_message);
+	if (count == 1)
+	{
+		int i = 0;
+
+		/*
+		 * Loop over the returned attributes and add them to our list. Note that
+		 * there doesn't seem to be a way to differentiate between "no
+		 * attributes" and an actual lookup error.
+		 *
+		 * TODO: save off matching DN for later auditing
+		 */
+		entry = ldap_first_entry(ldap, search_message);
+		values = ldap_get_values_len(ldap, entry, ldapurl->lud_attrs[0]);
+		if (values)
+		{
+			for (i = 0; values[i]; ++i)
+			{
+				char   *role = values[i]->bv_val;
+
+				if (strlen(role) != values[i]->bv_len)
+				{
+					ereport(LOG,
+							(errmsg("server returned LDAP role attribute with embedded NULL")));
+					goto cleanup;
+				}
+
+				*roles = lappend(*roles, pstrdup(role));
+			}
+		}
+
+		if (i == 0)
+		{
+			ereport(LOG,
+					(errmsg("matching LDAP entry had no role attributes")));
+		}
+	}
+	else if (count == 0)
+	{
+		/* Not an error condition; this ident line just doesn't "match". */
+		ereport(LOG,
+				(errmsg("LDAP mapping query returned no entries")));
+	}
+	else
+	{
+		/*
+		 * This, however, is an error. The query is malformed if it returns more
+		 * than one matching entry.
+		 */
+		ereport(LOG,
+				(errmsg("LDAP mapping query matched multiple DNs")));
+		goto cleanup;
+	}
+
+	success = true;
+
+cleanup:
+	if (values)
+		ldap_value_free_len(values);
+	if (search_message)
+		ldap_msgfree(search_message);
+	if (ldap)
+	{
+		rc = ldap_unbind(ldap);
+		if (rc != LDAP_SUCCESS)
+		{
+			ereport(LOG,
+					(errmsg("could not unbind from server \"%s\": %s",
+							server, ldap_err2string(rc)),
+					 errdetail_for_ldap(ldap)));
+			success = false;
+		}
+	}
+	pfree(server);
+
+	return success;
+}
+
+#endif /* LDAP_API_FEATURE_X_OPENLDAP */
 
 /*
  *	Scan the (pre-parsed) ident usermap file line by line, looking for a match
@@ -3008,6 +3456,9 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
  *	See if the user with ident username "auth_user" is allowed to act
  *	as Postgres user "pg_role" according to usermap "usermap_name".
  *
+ *	If the HBA has specified an ldapmap instead, the LDAP server will be queried
+ *	here to determine the allowed roles.
+ *
  *	Special case: Usermap NULL, equivalent to what was previously called
  *	"sameuser" or "samerole", means don't look in the usermap file.
  *	That's an implied map wherein "pg_role" must be identical to
@@ -3016,15 +3467,51 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
  *	Iff authorized, return STATUS_OK, otherwise return STATUS_ERROR.
  */
 int
-check_usermap(const char *usermap_name,
-			  const char *pg_role,
+check_usermap(const Port *port,
 			  const char *auth_user,
 			  bool case_insensitive)
 {
 	bool		found_entry = false,
 				error = false;
+	const char *usermap_name = port->hba->usermap;
+	const char *ldapmap_name = port->hba->ldapmap;
+	const char *pg_role = port->user_name;
+	const char *maptype = "usermap";
+
+	/*
+	 * Note that parse_hba_line() prevents cases where both usermap and
+	 * ldapmap are set simultaneously.
+	 */
+	if (usermap_name && usermap_name[0])
+	{
+		ListCell   *line_cell;
 
-	if (usermap_name == NULL || usermap_name[0] == '\0')
+		foreach(line_cell, parsed_ident_lines)
+		{
+			check_ident_usermap(lfirst(line_cell), usermap_name,
+								pg_role, auth_user, case_insensitive,
+								&found_entry, &error);
+			if (found_entry || error)
+				break;
+		}
+	}
+	else if (ldapmap_name && ldapmap_name[0])
+	{
+		ListCell   *line_cell;
+
+		maptype = "ldapmap";
+
+		foreach(line_cell, parsed_ident_lines)
+		{
+			/* TODO: currently we ignore case-insensitivity; how should LDAP
+			 * handle that? */
+			check_ldap_usermap(lfirst(line_cell), ldapmap_name, port,
+							   pg_role, auth_user, &found_entry, &error);
+			if (found_entry || error)
+				break;
+		}
+	}
+	else
 	{
 		if (case_insensitive)
 		{
@@ -3041,24 +3528,14 @@ check_usermap(const char *usermap_name,
 						pg_role, auth_user)));
 		return STATUS_ERROR;
 	}
-	else
-	{
-		ListCell   *line_cell;
 
-		foreach(line_cell, parsed_ident_lines)
-		{
-			check_ident_usermap(lfirst(line_cell), usermap_name,
-								pg_role, auth_user, case_insensitive,
-								&found_entry, &error);
-			if (found_entry || error)
-				break;
-		}
-	}
 	if (!found_entry && !error)
 	{
 		ereport(LOG,
-				(errmsg("no match in usermap \"%s\" for user \"%s\" authenticated as \"%s\"",
-						usermap_name, pg_role, auth_user)));
+				(errmsg("no match in %s \"%s\" for user \"%s\" authenticated as \"%s\"",
+						maptype,
+						(maptype[0] == 'u') ? usermap_name : ldapmap_name,
+						pg_role, auth_user)));
 	}
 	return found_entry ? STATUS_OK : STATUS_ERROR;
 }
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index 8d9f3821b1..a9c709f9b8 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -92,6 +92,7 @@ typedef struct HbaLine
 	char	   *hostname;
 	UserAuth	auth_method;
 	char	   *usermap;
+	char	   *ldapmap; /* alternative to usermap; queries LDAP directly */
 	char	   *pamservice;
 	bool		pam_use_hostname;
 	bool		ldaptls;
@@ -139,8 +140,8 @@ extern bool load_hba(void);
 extern bool load_ident(void);
 extern const char *hba_authname(UserAuth auth_method);
 extern void hba_getauthmethod(hbaPort *port);
-extern int	check_usermap(const char *usermap_name,
-						  const char *pg_role, const char *auth_user,
+extern int	check_usermap(const hbaPort *port,
+						  const char *auth_user,
 						  bool case_sensitive);
 extern bool pg_isblank(const char c);
 
diff --git a/src/test/ldap/authdata.ldif b/src/test/ldap/authdata.ldif
index c0a15daffb..a3e296f3e2 100644
--- a/src/test/ldap/authdata.ldif
+++ b/src/test/ldap/authdata.ldif
@@ -21,6 +21,7 @@ mail: test1@example.net
 dn: uid=test2,dc=example,dc=net
 objectClass: inetOrgPerson
 objectClass: posixAccount
+objectClass: postgresUser
 uid: test2
 sn: Lastname
 givenName: Firstname
@@ -30,3 +31,5 @@ uidNumber: 102
 gidNumber: 100
 homeDirectory: /home/test2
 mail: test2@example.net
+postgresRole: test0
+postgresRole: test2@example.net
diff --git a/src/test/ldap/postgresuser.schema b/src/test/ldap/postgresuser.schema
new file mode 100644
index 0000000000..5459555fc3
--- /dev/null
+++ b/src/test/ldap/postgresuser.schema
@@ -0,0 +1,30 @@
+# PostgresUser
+#
+# A mix-in class that attaches authorized Postgres role names to an LDAP object.
+
+# Well-known OID classes
+objectIdentifier directoryString 1.3.6.1.4.1.1466.115.121.1.15
+objectIdentifier jointUUID 2.25
+
+# The 'postgres' OID is equivalent to
+#
+#     oid:/UUID/b13db941-195e-329e-91fc-adba1a8b6619
+#
+# where {b13db941-195e-329e-91fc-adba1a8b6619} is the UUIDv3 corresponding to
+# the 'postgresql.org' DNS name. That should hopefully be enough to prevent
+# collisions with any other schemas, though since this is test-only it probably
+# doesn't matter in practice.
+objectIdentifier postgres jointUUID:235593842765758976291531166600911349273
+objectIdentifier postgresAttribute postgres:1
+objectIdentifier postgresObject    postgres:2
+
+attributetype ( postgresAttribute:1
+	NAME 'postgresRole'
+	DESC 'Authorized database role'
+	SYNTAX directoryString )
+
+objectclass ( postgresObject:1
+	NAME 'postgresUser'
+	DESC 'PostgreSQL user'
+	AUXILIARY
+	MAY postgresRole )
diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl
index 5a9a009832..781b1e8c78 100644
--- a/src/test/ldap/t/001_auth.pl
+++ b/src/test/ldap/t/001_auth.pl
@@ -9,7 +9,7 @@ use Test::More;
 
 if ($ENV{with_ldap} eq 'yes')
 {
-	plan tests => 28;
+	plan tests => 51;
 }
 else
 {
@@ -75,6 +75,7 @@ append_to_file(
 include $ldap_schema_dir/cosine.schema
 include $ldap_schema_dir/nis.schema
 include $ldap_schema_dir/inetorgperson.schema
+include postgresuser.schema
 
 pidfile $slapd_pidfile
 logfile $slapd_logfile
@@ -113,34 +114,60 @@ system_or_bail "openssl", "x509", "-req", "-in", "$slapd_certs/server.csr",
   "-CA", "$slapd_certs/ca.crt", "-CAkey", "$slapd_certs/ca.key",
   "-CAcreateserial", "-out", "$slapd_certs/server.crt";
 
-system_or_bail $slapd, '-f', $slapd_conf, '-h', "$ldap_url $ldaps_url";
+sub start_slapd
+{
+	system_or_bail $slapd, '-f', $slapd_conf, '-h', "$ldap_url $ldaps_url";
+}
 
-END
+sub wait_for_slapd
+{
+	my ($url) = @_;
+
+	# wait until slapd accepts requests
+	my $retries = 0;
+	while (1)
+	{
+		last
+		  if (
+			system_log(
+				"ldapsearch", "-sbase",
+				"-H",         $url,
+				"-b",         $ldap_basedn,
+				"-D",         $ldap_rootdn,
+				"-y",         $ldap_pwfile,
+				"-n",         "'objectclass=*'") == 0);
+		die "cannot connect to slapd" if ++$retries >= 300;
+		note "waiting for slapd to accept requests...";
+		Time::HiRes::usleep(1000000);
+	}
+}
+
+sub stop_slapd
 {
 	kill 'INT', `cat $slapd_pidfile` if -f $slapd_pidfile;
 }
 
-append_to_file($ldap_pwfile, $ldap_rootpw);
-chmod 0600, $ldap_pwfile or die;
+sub restart_slapd
+{
+	my ($url) = @_;
+
+	stop_slapd();
+	start_slapd();
+	wait_for_slapd($url);
+}
 
-# wait until slapd accepts requests
-my $retries = 0;
-while (1)
+start_slapd();
+
+END
 {
-	last
-	  if (
-		system_log(
-			"ldapsearch", "-sbase",
-			"-H",         $ldap_url,
-			"-b",         $ldap_basedn,
-			"-D",         $ldap_rootdn,
-			"-y",         $ldap_pwfile,
-			"-n",         "'objectclass=*'") == 0);
-	die "cannot connect to slapd" if ++$retries >= 300;
-	note "waiting for slapd to accept requests...";
-	Time::HiRes::usleep(1000000);
+	stop_slapd();
 }
 
+append_to_file($ldap_pwfile, $ldap_rootpw);
+chmod 0600, $ldap_pwfile or die;
+
+wait_for_slapd($ldap_url);
+
 $ENV{'LDAPURI'}    = $ldap_url;
 $ENV{'LDAPBINDDN'} = $ldap_rootdn;
 $ENV{'LDAPCONF'}   = $ldap_conf;
@@ -164,6 +191,12 @@ $node->safe_psql('postgres', 'CREATE USER test0;');
 $node->safe_psql('postgres', 'CREATE USER test1;');
 $node->safe_psql('postgres', 'CREATE USER "test2@example.net";');
 
+my @databases = ( 'anon', 'noattrs', 'badmap', 'starttls', 'bindpw' );
+foreach my $db (@databases)
+{
+	$node->safe_psql('postgres', "CREATE DATABASE $db");
+}
+
 note "running tests";
 
 sub test_access
@@ -367,3 +400,174 @@ $node->restart;
 
 $ENV{"PGPASSWORD"} = 'secret1';
 test_access($node, 'test1', 2, 'bad combination of LDAPS and StartTLS');
+
+note 'LDAP attribute ident mapping';
+
+delete $ENV{"PGPASSWORD"};
+
+# We'll use cert auth for mapping. Reuse the LDAP CA we already have for
+# simplicity (this is a nonsensical setup in practice).
+system_or_bail "openssl", "req", "-new", "-nodes",
+  "-keyout", "$slapd_certs/test1-client.key",
+  "-out", "$slapd_certs/test1-client.csr",
+  "-subj", "/DC=net/DC=example/CN=test1";
+system_or_bail "openssl", "x509", "-req",
+  "-in", "$slapd_certs/test1-client.csr",
+  "-CA", "$slapd_certs/ca.crt", "-CAkey", "$slapd_certs/ca.key",
+  "-CAcreateserial", "-out", "$slapd_certs/test1-client.crt";
+system_or_bail "openssl", "req", "-new", "-nodes",
+  "-keyout", "$slapd_certs/test2-client.key",
+  "-out", "$slapd_certs/test2-client.csr",
+  "-subj", "/DC=net/DC=example/CN=test2";
+system_or_bail "openssl", "x509", "-req",
+  "-in", "$slapd_certs/test2-client.csr",
+  "-CA", "$slapd_certs/ca.crt", "-CAkey", "$slapd_certs/ca.key",
+  "-CAcreateserial", "-out", "$slapd_certs/test2-client.crt";
+
+my $SERVERHOSTADDR = '127.0.0.1';
+
+$node->append_conf('postgresql.conf', qq{
+listen_addresses = '$SERVERHOSTADDR'
+ssl = on
+ssl_ca_file = '$slapd_certs/ca.crt'
+ssl_cert_file = '$slapd_certs/server.crt'
+ssl_key_file = '$slapd_certs/server.key'
+});
+
+# XXX check the other SSL tests' security mitigations for hostssl
+unlink($node->data_dir . '/pg_hba.conf');
+$node->append_conf('pg_hba.conf',
+	qq{
+# TYPE   DATABASE  USER  ADDRESS  METHOD  OPTIONS
+hostssl  anon      all   all      cert    ldapmap=ldap
+hostssl  noattrs   all   all      cert    ldapmap=noattrs
+hostssl  badmap    all   all      cert    ldapmap=badmap
+hostssl  starttls  all   all      cert    ldapmap=ldap ldaptls=1
+hostssl  bindpw    all   all      cert    ldapmap=ldap ldaptls=1 ldapbinddn="$ldap_rootdn" ldapbindpasswd="$ldap_rootpw"
+});
+
+unlink($node->data_dir . '/pg_ident.conf');
+$node->append_conf('pg_ident.conf',
+	qq{
+# This query matches only postgresUser entries, and returns their postgresRole
+# attributes.
+ldap    /^(.*)\$ "$ldap_url/$ldap_basedn?postgresRole?sub?(&(objectClass=postgresUser)(uid=\\1))"
+
+# This query matches any object with the given uid, so it can return entries
+# with no attribute values.
+noattrs /^(.*)\$ "$ldap_url/$ldap_basedn?postgresRole?sub?(&(objectClass=*)(uid=\\1))"
+
+# This query matches multiple DNs and should fail.
+badmap  /^       "$ldap_url/$ldap_basedn?postgresRole?sub?(objectClass=inetOrgPerson)"
+});
+
+$node->restart;
+
+my $common_connstr =
+	"host=server hostaddr=$SERVERHOSTADDR sslmode=verify-full " .
+	"sslrootcert='$slapd_certs/ca.crt' " .
+	"sslcert='$slapd_certs/test2-client.crt' " .
+	"sslkey='$slapd_certs/test2-client.key'";
+
+$node->connect_ok(
+	"$common_connstr dbname=anon user=test0",
+    "ldapmap succeeds with role attribute");
+
+$node->connect_fails(
+	"$common_connstr dbname=anon user=test1",
+	"ldapmap fails without matching role attribute",
+	log_like => [
+		qr/no match in ldapmap "ldap" for user "test1" authenticated as ".*"/,
+	]);
+
+$node->connect_ok(
+	"$common_connstr dbname=anon user='test2\@example.net'",
+	"ldapmap succeeds with another role attribute");
+
+$node->connect_fails(
+	"$common_connstr dbname=badmap user=test0",
+	"ldapmap fails if query matches multiple DNs",
+	log_like => [
+		qr/query matched multiple DNs/,
+		qr/no match in ldapmap "badmap" for user "test0" authenticated as ".*"/,
+	]);
+
+# Switch to the test1 client cert, which does not have a corresponding
+# postgresUser in the LDAP tree.
+$common_connstr =
+	"host=server hostaddr=$SERVERHOSTADDR sslmode=verify-full " .
+	"sslrootcert='$slapd_certs/ca.crt' " .
+	"sslcert='$slapd_certs/test1-client.crt' " .
+	"sslkey='$slapd_certs/test1-client.key'";
+
+$node->connect_fails(
+	"$common_connstr dbname=anon user=test1",
+	"ldapmap fails if query matches no DNs",
+	log_like => [
+		qr/query returned no entries/,
+		qr/no match in ldapmap "ldap" for user "test1" authenticated as ".*"/,
+	]);
+
+$node->connect_fails(
+	"$common_connstr dbname=noattrs user=test1",
+	"ldapmap fails if entry has no attributes",
+	log_like => [
+		qr/entry had no role attributes/,
+		qr/no match in ldapmap "noattrs" for user "test1" authenticated as ".*"/,
+	]);
+
+note 'LDAP ident mapping with StartTLS';
+
+# Force the use of TLS for connections from this point onward.
+append_to_file(
+	$slapd_conf,
+	qq{
+security tls=128
+});
+
+restart_slapd($ldaps_url);
+
+$common_connstr =
+	"host=server hostaddr=$SERVERHOSTADDR sslmode=verify-full " .
+	"sslrootcert='$slapd_certs/ca.crt' " .
+	"sslcert='$slapd_certs/test2-client.crt' " .
+	"sslkey='$slapd_certs/test2-client.key'";
+
+$node->connect_fails(
+	"$common_connstr dbname=anon user=test0",
+	"anonymous ldapmap binding fails with StartTLS enforcement",
+	log_like => [
+		qr/connection authenticated:/,
+		qr/LDAP role search failed on server .*: Confidentiality required/,
+		qr/no match in ldapmap "ldap" for user "test0" authenticated as ".*"/,
+	]);
+
+$node->connect_ok(
+	"$common_connstr dbname=starttls user=test0",
+	"ldapmap works with StartTLS");
+
+note 'LDAP ident mapping with bind password';
+
+# Force the use of authenticated connections from this point onward.
+append_to_file(
+	$slapd_conf,
+	qq{require authc
+});
+
+restart_slapd($ldaps_url);
+
+$node->connect_fails(
+	"$common_connstr dbname=starttls user=test0",
+	"anonymous ldapmap binding fails",
+	log_like => [
+		qr/connection authenticated:/,
+		qr/LDAP diagnostics: authentication required/,
+		qr/no match in ldapmap "ldap" for user "test0" authenticated as ".*"/,
+	]);
+
+$node->connect_ok(
+	"$common_connstr dbname=bindpw user=test0",
+	"ldapmap works with bind password");
+
+note 'LDAP group ident mapping';
+# TODO
-- 
2.25.1

0003-ldapmap-test-binding-with-a-client-cert-key.patchtext/x-patch; name=0003-ldapmap-test-binding-with-a-client-cert-key.patchDownload

From 56ffba733b6dea553b56fde4daad8d32266de592 Mon Sep 17 00:00:00 2001
From: Jacob Champion <pchampion@vmware.com>
Date: Mon, 13 Dec 2021 10:27:08 -0800
Subject: [PATCH 3/4] ldapmap: test binding with a client cert/key

Make sure that ldapmap queries can utilize a client cert. This was
already supported before, but it'll be more important with the next
patch, so test it explicitly.
---
 src/test/ldap/t/001_auth.pl | 42 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl
index 781b1e8c78..6467a6c4af 100644
--- a/src/test/ldap/t/001_auth.pl
+++ b/src/test/ldap/t/001_auth.pl
@@ -9,7 +9,7 @@ use Test::More;
 
 if ($ENV{with_ldap} eq 'yes')
 {
-	plan tests => 51;
+	plan tests => 56;
 }
 else
 {
@@ -79,6 +79,7 @@ include postgresuser.schema
 
 pidfile $slapd_pidfile
 logfile $slapd_logfile
+loglevel conns filter stats
 
 access to *
         by * read
@@ -569,5 +570,44 @@ $node->connect_ok(
 	"$common_connstr dbname=bindpw user=test0",
 	"ldapmap works with bind password");
 
+note 'LDAP ident mapping with client certificate';
+
+# Set up a certificate for the root user.
+system_or_bail "openssl", "req", "-new", "-nodes",
+  "-keyout", "$slapd_certs/root-client.key",
+  "-out", "$slapd_certs/root-client.csr",
+  "-subj", "/DC=net/DC=example/CN=Manager";
+system_or_bail "openssl", "x509", "-req", "-in", "$slapd_certs/root-client.csr",
+  "-CA", "$slapd_certs/ca.crt", "-CAkey", "$slapd_certs/ca.key",
+  "-CAcreateserial", "-out", "$slapd_certs/root-client.crt";
+
+$ENV{'LDAPTLS_CERT'} = "$slapd_certs/root-client.crt";
+$ENV{'LDAPTLS_KEY'}  = "$slapd_certs/root-client.key";
+
+# Force the use of client certificates from this point onward.
+append_to_file(
+	$slapd_conf,
+	qq{TLSVerifyClient demand
+});
+
+restart_slapd($ldaps_url);
+
+$node->connect_fails(
+	"$common_connstr dbname=bindpw user=test0",
+	"ldapmap with bind password fails without client certificate",
+	log_like => [
+		qr/connection authenticated:/,
+		qr/could not perform initial LDAP bind for ldapbinddn "cn=Manager,dc=example,dc=net" on server ".*": Can't contact LDAP server/,
+		qr/no match in ldapmap "ldap" for user "test0" authenticated as ".*"/,
+	]);
+
+# The server needs to be restarted to pick up all the above LDAPTLS_* settings
+# from the environment.
+$node->restart;
+
+$node->connect_ok(
+	"$common_connstr dbname=bindpw user=test0",
+	"ldapmap works with bind certificate");
+
 note 'LDAP group ident mapping';
 # TODO
-- 
2.25.1

0004-ldapmap-implement-SASL-binding.patchtext/x-patch; name=0004-ldapmap-implement-SASL-binding.patchDownload

From 2f3c8959274a53fb6027cd666c5e942c6ab68ed2 Mon Sep 17 00:00:00 2001
From: Jacob Champion <pchampion@vmware.com>
Date: Mon, 13 Dec 2021 15:39:16 -0800
Subject: [PATCH 4/4] ldapmap: implement SASL binding

Add the "ldapsaslmechs" HBA option, which tells the ldapmap query to use
SASL binding (instead of simple username/password binding) and specifies
the SASL mechanism list that is accepted. This allows the DBA to further
lock down the LDAP connection and even get rid of bind passwords
altogether.

This new feature is gated on the existence of the <sasl/sasl.h> header,
which we need in order to implement the SASL callback API. (No new
runtime dependencies are needed.)

The currently tested mechanisms are

- EXTERNAL, which authenticates using a TLS client certificate, and

- SCRAM-SHA-1, which performs mutual password authentication without
  ever sending the password over the connection. (Other SCRAM-*
  mechanisms should work too but I haven't tested them.)

TODOs:

- This seems like it would be useful for the ldap auth method, too.

- I reuse ldapbinddn for the SASL auth name, but it's probably not
  actually a DN in practice (and it's certainly not a DN for the test
  case I provided). Maybe make a new HBA option?
---
 configure                   |  12 +++
 configure.ac                |   1 +
 src/backend/libpq/hba.c     | 165 +++++++++++++++++++++++++++++++++---
 src/include/libpq/hba.h     |   1 +
 src/include/pg_config.h.in  |   3 +
 src/test/ldap/authdata.ldif |   7 ++
 src/test/ldap/t/001_auth.pl |  35 +++++++-
 7 files changed, 210 insertions(+), 14 deletions(-)

diff --git a/configure b/configure
index 3b19105328..e5ac7f441d 100755
--- a/configure
+++ b/configure
@@ -13993,6 +13993,18 @@ else
   as_fn_error $? "header file <ldap.h> is required for LDAP" "$LINENO" 5
 fi
 
+done
+
+     for ac_header in sasl/sasl.h
+do :
+  ac_fn_c_check_header_mongrel "$LINENO" "sasl/sasl.h" "ac_cv_header_sasl_sasl_h" "$ac_includes_default"
+if test "x$ac_cv_header_sasl_sasl_h" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_SASL_SASL_H 1
+_ACEOF
+
+fi
+
 done
 
      { $as_echo "$as_me:${as_lineno-$LINENO}: checking for compatible LDAP implementation" >&5
diff --git a/configure.ac b/configure.ac
index e77d4dcf2d..c1f0048b0f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1523,6 +1523,7 @@ if test "$with_ldap" = yes ; then
   if test "$PORTNAME" != "win32"; then
      AC_CHECK_HEADERS(ldap.h, [],
                       [AC_MSG_ERROR([header file <ldap.h> is required for LDAP])])
+     AC_CHECK_HEADERS(sasl/sasl.h)
      PGAC_LDAP_SAFE
   else
      AC_CHECK_HEADERS(winldap.h, [],
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 57440107be..7516a9681e 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -51,6 +51,9 @@
 #else
 #define LDAP_DEPRECATED 1
 #include <ldap.h>
+#if HAVE_SASL_SASL_H
+#include <sasl/sasl.h> /* header-only dependency for sasl_interact_t */
+#endif
 #endif
 #endif
 
@@ -1712,6 +1715,10 @@ parse_hba_line(TokenizedLine *tok_line, int elevel)
 		{
 			badopt = "ldapbindpasswd";
 		}
+		else if (parsedline->ldapsaslmechs)
+		{
+			badopt = "ldapsaslmechs";
+		}
 
 		if (badopt)
 		{
@@ -2004,6 +2011,10 @@ parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 	{
 		hbaline->ldapbindpasswd = pstrdup(val);
 	}
+	else if (strcmp(name, "ldapsaslmechs") == 0)
+	{
+		hbaline->ldapsaslmechs = pstrdup(val);
+	}
 	else if (strcmp(name, "ldapsearchattribute") == 0)
 	{
 		REQUIRE_AUTH_OPTION(uaLDAP, "ldapsearchattribute", "ldap");
@@ -3273,6 +3284,146 @@ errdetail_for_ldap(LDAP *ldap)
 	return 0;
 }
 
+#if HAVE_SASL_SASL_H
+
+struct sasl_ctx
+{
+	const HbaLine  *hba;
+};
+
+/*
+ * Callback for ldap_sasl_interactive_bind_s(). libsasl asks us for various
+ * authentication parameters, and we fill them in.
+ */
+static int
+sasl_interaction(LDAP *ldap, unsigned flags, void *vctx, void *sasl_interact)
+{
+	struct sasl_ctx *ctx = vctx;
+	sasl_interact_t *prompt = sasl_interact;
+
+	while (true)
+	{
+		switch (prompt->id)
+		{
+			case SASL_CB_LIST_END:
+				return LDAP_SUCCESS;
+
+			case SASL_CB_USER:
+				/*
+				 * This is the authzid; we leave it empty/default.
+				 */
+				prompt->result = prompt->defresult;
+				break;
+
+			case SASL_CB_AUTHNAME:
+				/*
+				 * The username for the authentication; this is our bind DN.
+				 */
+				if (!ctx->hba->ldapbinddn)
+				{
+					ereport(LOG,
+							errmsg("SASL mechanism requires ldapbinddn"));
+					return LDAP_LOCAL_ERROR;
+				}
+
+				prompt->result = ctx->hba->ldapbinddn;
+				prompt->len = strlen(prompt->result);
+
+				break;
+
+			case SASL_CB_PASS:
+				/*
+				 * The password.
+				 */
+				if (!ctx->hba->ldapbindpasswd)
+				{
+					ereport(LOG,
+							errmsg("SASL mechanism requires ldapbindpasswd"));
+					return LDAP_LOCAL_ERROR;
+				}
+
+				prompt->result = ctx->hba->ldapbindpasswd;
+				prompt->len = strlen(prompt->result);
+
+				break;
+
+			default:
+				ereport(LOG,
+						errmsg("SASL interaction type 0x%lX (%s) is unimplemented",
+							   prompt->id, prompt->challenge));
+				return LDAP_LOCAL_ERROR;
+		}
+
+		++prompt;
+	}
+
+	/* unreachable */
+	return LDAP_LOCAL_ERROR;
+}
+
+#endif /* HAVE_SASL_SASL_H */
+
+/*
+ * Performs either a simple or a SASL bind over the LDAP connection, depending
+ * on the HBA settings.
+ */
+static bool
+bind_ldap(const HbaLine *hba, LDAP *ldap, const char *server_name)
+{
+	int			rc;
+#if HAVE_SASL_SASL_H
+	struct sasl_ctx ctx = {0};
+#endif
+
+	if (!(hba->ldapsaslmechs && hba->ldapsaslmechs[0]))
+	{
+		/* Use a simple bind. */
+		rc = ldap_simple_bind_s(ldap,
+								hba->ldapbinddn ? hba->ldapbinddn : "",
+								hba->ldapbindpasswd ? hba->ldapbindpasswd : "");
+		if (rc != LDAP_SUCCESS)
+		{
+			ereport(LOG,
+					(errmsg("could not perform initial LDAP bind for ldapbinddn \"%s\" on server \"%s\": %s",
+							hba->ldapbinddn ? hba->ldapbinddn : "",
+							server_name,
+							ldap_err2string(rc)),
+					 errdetail_for_ldap(ldap)));
+		}
+
+		return (rc == LDAP_SUCCESS);
+	}
+
+#if HAVE_SASL_SASL_H
+	/* DBA has asked for a SASL bind. */
+	ctx.hba = hba;
+
+	rc = ldap_sasl_interactive_bind_s(ldap,
+									  NULL, /* DN is ignored for SASL */
+									  hba->ldapsaslmechs,
+									  NULL, NULL, /* server/client controls */
+									  LDAP_SASL_QUIET, /* don't prompt */
+									  sasl_interaction,
+									  &ctx);
+	if (rc != LDAP_SUCCESS)
+	{
+		ereport(LOG,
+				(errmsg("could not perform SASL bind on server \"%s\": %s",
+						server_name,
+						ldap_err2string(rc)),
+				 errdetail_for_ldap(ldap)));
+	}
+
+	return (rc == LDAP_SUCCESS);
+
+#else
+	ereport(LOG,
+			(errmsg("this build does not support LDAP SASL binding")));
+	return false;
+
+#endif /* HAVE_SASL_SASL_H */
+}
+
 /*
  * Returns a palloc'd list of pointers to role names returned by the LDAP query
  * contained in ldapurl. Callers must list_free_deep() the return value even if
@@ -3339,19 +3490,9 @@ query_ldap_roles(const Port *port, LDAPURLDesc *ldapurl, List **roles)
 		}
 	}
 
-	rc = ldap_simple_bind_s(ldap,
-							port->hba->ldapbinddn ? port->hba->ldapbinddn : "",
-							port->hba->ldapbindpasswd ? port->hba->ldapbindpasswd : "");
-	if (rc != LDAP_SUCCESS)
-	{
-		ereport(LOG,
-				(errmsg("could not perform initial LDAP bind for ldapbinddn \"%s\" on server \"%s\": %s",
-						port->hba->ldapbinddn ? port->hba->ldapbinddn : "",
-						server,
-						ldap_err2string(rc)),
-				 errdetail_for_ldap(ldap)));
+	/* Bind using our HBA settings. */
+	if (!bind_ldap(port->hba, ldap, server))
 		goto cleanup;
-	}
 
 	rc = ldap_search_s(ldap,
 					   ldapurl->lud_dn,
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index a9c709f9b8..c701cb72a7 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -101,6 +101,7 @@ typedef struct HbaLine
 	int			ldapport;
 	char	   *ldapbinddn;
 	char	   *ldapbindpasswd;
+	char	   *ldapsaslmechs;
 	char	   *ldapsearchattribute;
 	char	   *ldapsearchfilter;
 	char	   *ldapbasedn;
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 7525c16597..5479b2d952 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -488,6 +488,9 @@
 /* Define to 1 if you have the `rl_variable_bind' function. */
 #undef HAVE_RL_VARIABLE_BIND
 
+/* Define to 1 if you have the <sasl/sasl.h> header file. */
+#undef HAVE_SASL_SASL_H
+
 /* Define to 1 if you have the <security/pam_appl.h> header file. */
 #undef HAVE_SECURITY_PAM_APPL_H
 
diff --git a/src/test/ldap/authdata.ldif b/src/test/ldap/authdata.ldif
index a3e296f3e2..0a32042d85 100644
--- a/src/test/ldap/authdata.ldif
+++ b/src/test/ldap/authdata.ldif
@@ -33,3 +33,10 @@ homeDirectory: /home/test2
 mail: test2@example.net
 postgresRole: test0
 postgresRole: test2@example.net
+
+# For SCRAM auth only; the rootpw should take care of the rest.
+dn: cn=Manager,dc=example,dc=net
+objectClass: inetOrgPerson
+uid: Manager
+sn: Lastname
+cn: Manager
diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl
index 6467a6c4af..c296c746a1 100644
--- a/src/test/ldap/t/001_auth.pl
+++ b/src/test/ldap/t/001_auth.pl
@@ -9,7 +9,7 @@ use Test::More;
 
 if ($ENV{with_ldap} eq 'yes')
 {
-	plan tests => 56;
+	plan tests => 61;
 }
 else
 {
@@ -192,7 +192,7 @@ $node->safe_psql('postgres', 'CREATE USER test0;');
 $node->safe_psql('postgres', 'CREATE USER test1;');
 $node->safe_psql('postgres', 'CREATE USER "test2@example.net";');
 
-my @databases = ( 'anon', 'noattrs', 'badmap', 'starttls', 'bindpw' );
+my @databases = ( 'anon', 'noattrs', 'badmap', 'starttls', 'bindpw', 'bindscram', 'bindcert' );
 foreach my $db (@databases)
 {
 	$node->safe_psql('postgres', "CREATE DATABASE $db");
@@ -445,6 +445,8 @@ hostssl  noattrs   all   all      cert    ldapmap=noattrs
 hostssl  badmap    all   all      cert    ldapmap=badmap
 hostssl  starttls  all   all      cert    ldapmap=ldap ldaptls=1
 hostssl  bindpw    all   all      cert    ldapmap=ldap ldaptls=1 ldapbinddn="$ldap_rootdn" ldapbindpasswd="$ldap_rootpw"
+hostssl  bindscram all   all      cert    ldapmap=ldap ldaptls=1 ldapsaslmechs=scram-sha-1 ldapbinddn=Manager ldapbindpasswd="$ldap_rootpw"
+hostssl  bindcert  all   all      cert    ldapmap=ldap ldaptls=1 ldapsaslmechs=external
 });
 
 unlink($node->data_dir . '/pg_ident.conf');
@@ -570,6 +572,31 @@ $node->connect_ok(
 	"$common_connstr dbname=bindpw user=test0",
 	"ldapmap works with bind password");
 
+note 'LDAP ident mapping with SCRAM binding';
+
+$node->connect_fails(
+	"$common_connstr dbname=bindscram user=test0",
+	"ldapmap can't perform SCRAM authentication without server setup",
+	log_like => [
+		qr/could not perform SASL bind on server .*: Invalid credentials/,
+		qr/user not found: no secret in database/,
+	]);
+
+# Map the SCRAM-specific authentication DN to our root user.
+append_to_file(
+	$slapd_conf,
+	qq{
+authz-regexp
+	uid=Manager,cn=SCRAM-SHA-1,cn=auth
+	cn=Manager,dc=example,dc=net
+});
+
+restart_slapd($ldaps_url);
+
+$node->connect_ok(
+	"$common_connstr dbname=bindscram user=test0",
+	"ldapmap works with SCRAM authentication to LDAP server");
+
 note 'LDAP ident mapping with client certificate';
 
 # Set up a certificate for the root user.
@@ -609,5 +636,9 @@ $node->connect_ok(
 	"$common_connstr dbname=bindpw user=test0",
 	"ldapmap works with bind certificate");
 
+$node->connect_ok(
+	"$common_connstr dbname=bindcert user=test0",
+	"ldapmap works with client certificate authentication");
+
 note 'LDAP group ident mapping';
 # TODO
-- 
2.25.1

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 4 years ago

In reply to: Jacob Champion (#1)

Re: [PoC] Delegating pg_ident to a third party

On 17.12.21 00:48, Jacob Champion wrote:

WDYT? (My responses here will be slower than usual. Hope you all have a
great end to the year!)

Looks interesting. I wonder whether putting this into pg_ident.conf is
sensible. I suspect people will want to eventually add more features
around this, like automatically creating roles or role memberships, at
which point pg_ident.conf doesn't seem appropriate anymore. Should we
have a new file for this? Do you have any further ideas?

Jacob Champion

pchampion@vmware.com

about 4 years ago

In reply to: Peter Eisentraut (#2)

Re: [PoC] Delegating pg_ident to a third party

On Fri, 2021-12-17 at 10:06 +0100, Peter Eisentraut wrote:

On 17.12.21 00:48, Jacob Champion wrote:

WDYT? (My responses here will be slower than usual. Hope you all have a
great end to the year!)

Looks interesting. I wonder whether putting this into pg_ident.conf is
sensible. I suspect people will want to eventually add more features
around this, like automatically creating roles or role memberships, at
which point pg_ident.conf doesn't seem appropriate anymore.

Yeah, pg_ident is getting too cramped for this.

Should we have a new file for this? Do you have any further ideas?

My experience with these configs is mostly limited to HTTP servers.
That said, it's pretty hard to beat the flexibility of arbitrary key-
value pairs inside nested contexts. It's nice to be able to say things
like

Everyone has to use LDAP auth
With this server
And these TLS settings

Except admins
who additionally need client certificates
with this CA root

And Jacob
who isn't allowed in anymore

Are there any existing discussions along these lines that I should take
a look at?

--Jacob

Stephen Frost

sfrost@snowman.net

about 4 years ago

In reply to: Jacob Champion (#3)

Re: [PoC] Delegating pg_ident to a third party

Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:

On Fri, 2021-12-17 at 10:06 +0100, Peter Eisentraut wrote:

On 17.12.21 00:48, Jacob Champion wrote:

WDYT? (My responses here will be slower than usual. Hope you all have a
great end to the year!)

Looks interesting. I wonder whether putting this into pg_ident.conf is
sensible. I suspect people will want to eventually add more features
around this, like automatically creating roles or role memberships, at
which point pg_ident.conf doesn't seem appropriate anymore.

This is the part that I really wonder about also ... I've always viewed
pg_ident as being intended mainly for one-to-one kind of mappings and
not the "map a bunch of different users into the same role" that this
advocated for. Being able to have roles and memberships automatically
created is much more the direction that I'd say we should be going in,
so that in-database auditing has an actual user to go on and not some
generic role that could be any number of people.

I'd go a step further and suggest that the way to do this is with a
background worker that's started up and connects to an LDAP
infrastructure and listens for changes, allowing the system to pick up
on new roles/memberships as soon as they're created in the LDAP
environment. That would then be controlled by appropriate settings in
postgresql.conf/.auto.conf.

Yeah, pg_ident is getting too cramped for this.

All that said, I do see how having the ability to call out to another
system for mappings may be useful, so I'm not sure that we shouldn't
consider this specific change and have it be specifically just for
mappings, in which case pg_ident seems appropriate.

Should we have a new file for this? Do you have any further ideas?

My experience with these configs is mostly limited to HTTP servers.
That said, it's pretty hard to beat the flexibility of arbitrary key-
value pairs inside nested contexts. It's nice to be able to say things
like

Everyone has to use LDAP auth
With this server
And these TLS settings

Except admins
who additionally need client certificates
with this CA root

And Jacob
who isn't allowed in anymore

I certainly don't think we should have this be limited to LDAP auth-
such an external mapping ability is suitable for any authentication
method that supports a mapping (thinking specifically of GSSAPI, of
course..). Not sure if that's what was meant above but did want to
make sure that was clear. The rest looks a lot more like pg_hba or
perhaps in-database privileges like roles/memberships existing or not
and CONNECT rights. I'm not really sold on the idea of adding yet even
more different ways to control authorization.

Thanks,

Stephen

Jacob Champion

pchampion@vmware.com

about 4 years ago

In reply to: Stephen Frost (#4)

Re: [PoC] Delegating pg_ident to a third party

On Mon, 2022-01-03 at 12:36 -0500, Stephen Frost wrote:

* Jacob Champion (pchampion@vmware.com) wrote:

On Fri, 2021-12-17 at 10:06 +0100, Peter Eisentraut wrote:

On 17.12.21 00:48, Jacob Champion wrote:

WDYT? (My responses here will be slower than usual. Hope you all have a
great end to the year!)

Looks interesting. I wonder whether putting this into pg_ident.conf is
sensible. I suspect people will want to eventually add more features
around this, like automatically creating roles or role memberships, at
which point pg_ident.conf doesn't seem appropriate anymore.

This is the part that I really wonder about also ... I've always viewed
pg_ident as being intended mainly for one-to-one kind of mappings and
not the "map a bunch of different users into the same role" that this
advocated for. Being able to have roles and memberships automatically
created is much more the direction that I'd say we should be going in,
so that in-database auditing has an actual user to go on and not some
generic role that could be any number of people.

That last point was my motivation for the authn_id patch [1]/messages/by-id/E1lTwp4-0002l4-L9@gemulon.postgresql.org -- so that
auditing could see the actual user _and_ the generic role. The
information is already there to be used, it's just not exposed to the
stats framework yet.

Forcing one role per individual end user is wasteful and isn't really
making good use of the role-based system that you already have.
Generally speaking, when administering hundreds or thousands of users,
people start dividing them up into groups as opposed to dealing with
them individually. So I don't think new features should be taking away
flexibility in this area -- if one role per user already works well for
you, great, but don't make everyone do the same.

I'd go a step further and suggest that the way to do this is with a
background worker that's started up and connects to an LDAP
infrastructure and listens for changes, allowing the system to pick up
on new roles/memberships as soon as they're created in the LDAP
environment. That would then be controlled by appropriate settings in
postgresql.conf/.auto.conf.

This is roughly what you can already do with existing (third-party)
tools, and that approach isn't scaling out in practice for some of our
existing customers. The load on the central server, for thousands of
idle databases dialing in just to see if there are any new users, is
huge.

All that said, I do see how having the ability to call out to another
system for mappings may be useful, so I'm not sure that we shouldn't
consider this specific change and have it be specifically just for
mappings, in which case pg_ident seems appropriate.

Yeah, this PoC was mostly an increment on the functionality that
already existed. The division between what goes in pg_hba and what goes
in pg_ident is starting to blur with this patchset, though, and I think
Peter's point is sound.

I certainly don't think we should have this be limited to LDAP auth-
such an external mapping ability is suitable for any authentication
method that supports a mapping (thinking specifically of GSSAPI, of
course..). Not sure if that's what was meant above but did want to
make sure that was clear.

You can't use usermaps with LDAP auth yet, so no, that's not what I
meant. (I have another patch for that feature in commitfest, which
would allow these two things to be used together.)

Thanks,
--Jacob

[1]: /messages/by-id/E1lTwp4-0002l4-L9@gemulon.postgresql.org

Stephen Frost

sfrost@snowman.net

about 4 years ago

In reply to: Jacob Champion (#5)

Re: [PoC] Delegating pg_ident to a third party

Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:

On Mon, 2022-01-03 at 12:36 -0500, Stephen Frost wrote:

* Jacob Champion (pchampion@vmware.com) wrote:

On Fri, 2021-12-17 at 10:06 +0100, Peter Eisentraut wrote:

On 17.12.21 00:48, Jacob Champion wrote:

WDYT? (My responses here will be slower than usual. Hope you all have a
great end to the year!)

Looks interesting. I wonder whether putting this into pg_ident.conf is
sensible. I suspect people will want to eventually add more features
around this, like automatically creating roles or role memberships, at
which point pg_ident.conf doesn't seem appropriate anymore.

This is the part that I really wonder about also ... I've always viewed
pg_ident as being intended mainly for one-to-one kind of mappings and
not the "map a bunch of different users into the same role" that this
advocated for. Being able to have roles and memberships automatically
created is much more the direction that I'd say we should be going in,
so that in-database auditing has an actual user to go on and not some
generic role that could be any number of people.

That last point was my motivation for the authn_id patch [1] -- so that
auditing could see the actual user _and_ the generic role. The
information is already there to be used, it's just not exposed to the
stats framework yet.

While that helps, and I generally support adding that information to the
logs, it's certainly not nearly as good or useful as having the actual
user known to the database.

Forcing one role per individual end user is wasteful and isn't really
making good use of the role-based system that you already have.
Generally speaking, when administering hundreds or thousands of users,
people start dividing them up into groups as opposed to dealing with
them individually. So I don't think new features should be taking away
flexibility in this area -- if one role per user already works well for
you, great, but don't make everyone do the same.

Using the role system we have to assign privileges certainly is useful
and sensible, of course, though I don't see where you've actually made
an argument for why one role per individual is somehow wasteful or
somehow takes away from the role system that we have for granting
rights. I'm also not suggesting that we make everyone do the same
thing, indeed, later on I was supportive of having an external system
provide the mapping. Here, I'm just making the point that we should
also be looking at automatic role/membership creation.

I'd go a step further and suggest that the way to do this is with a
background worker that's started up and connects to an LDAP
infrastructure and listens for changes, allowing the system to pick up
on new roles/memberships as soon as they're created in the LDAP
environment. That would then be controlled by appropriate settings in
postgresql.conf/.auto.conf.

This is roughly what you can already do with existing (third-party)
tools, and that approach isn't scaling out in practice for some of our
existing customers. The load on the central server, for thousands of
idle databases dialing in just to see if there are any new users, is
huge.

If you're referring specifically to cron-based tools which are
constantly hammering on the LDAP servers running the same queries over
and over, sure, I agree that that's creating load on the LDAP
infrastructure (though, well, it was kind of designed to be very
scalable for exactly that kind of load, no? So I'm not really sure why
that's such an issue..). That's also why I specifically wasn't
suggesting that and was instead suggesting that we have something that's
connected to one of the (hopefully, many, many) LDAP servers and is
doing change monitoring, allowing changes to be pushed down to PG,
rather than cronjobs constantly running the same queries and re-checking
things over and over. I appreciate that that's also not free, but I
don't believe it's nearly as bad as the cron-based approach and it's
certainly something that an LDAP infrastructure should be really rather
good at.

All that said, I do see how having the ability to call out to another
system for mappings may be useful, so I'm not sure that we shouldn't
consider this specific change and have it be specifically just for
mappings, in which case pg_ident seems appropriate.

Yeah, this PoC was mostly an increment on the functionality that
already existed. The division between what goes in pg_hba and what goes
in pg_ident is starting to blur with this patchset, though, and I think
Peter's point is sound.

This part I tend to disagree with- pg_ident for mappings and for ways to
call out to other systems to provide those mappings strikes me as
entirely appropriate and doesn't blur the lines and that's really what
this patch seems to be primarily about. Peter noted that there might be
other things we want to do and argued that those might not be
appropriate in pg_ident, which I tend to agree with, but I don't think
we need to invent something entirely new for mappings when we have
pg_ident already.

When it comes to the question of "how to connect to an LDAP server for
$whatever", it seems like it'd be nice to be able to configure that once
and reuse that configuration. Not sure I have a great suggestion for
how to do that. The approach this patch takes of adding options to
pg_hba for that, just like other options in pg_hba do, strikes me as
pretty reasonable. I would advocate for other methods to work when it
comes to authenticating to LDAP from PG though (such as GSSAPI, in
particular, of course...).

I certainly don't think we should have this be limited to LDAP auth-
such an external mapping ability is suitable for any authentication
method that supports a mapping (thinking specifically of GSSAPI, of
course..). Not sure if that's what was meant above but did want to
make sure that was clear.

You can't use usermaps with LDAP auth yet, so no, that's not what I
meant. (I have another patch for that feature in commitfest, which
would allow these two things to be used together.)

Yes, I'm aware of the other patch, just wanted to make sure the intent
is for this to work for all map-supporting auth methods. Figured that
was the case but the examples in the prior email had me concerned and
just wanted to make sure.

Thanks,

Stephen

Jacob Champion

pchampion@vmware.com

about 4 years ago

In reply to: Stephen Frost (#6)

Re: [PoC] Delegating pg_ident to a third party

On Mon, 2022-01-03 at 19:42 -0500, Stephen Frost wrote:

* Jacob Champion (pchampion@vmware.com) wrote:

That last point was my motivation for the authn_id patch [1] -- so that
auditing could see the actual user _and_ the generic role. The
information is already there to be used, it's just not exposed to the
stats framework yet.

While that helps, and I generally support adding that information to the
logs, it's certainly not nearly as good or useful as having the actual
user known to the database.

Could you talk more about the use cases for which having the "actual
user" is better? From an auditing perspective I don't see why
"authenticated as jacob@example.net, logged in as admin" is any worse
than "logged in as jacob".

Forcing one role per individual end user is wasteful and isn't really
making good use of the role-based system that you already have.
Generally speaking, when administering hundreds or thousands of users,
people start dividing them up into groups as opposed to dealing with
them individually. So I don't think new features should be taking away
flexibility in this area -- if one role per user already works well for
you, great, but don't make everyone do the same.

Using the role system we have to assign privileges certainly is useful
and sensible, of course, though I don't see where you've actually made
an argument for why one role per individual is somehow wasteful or
somehow takes away from the role system that we have for granting
rights.

I was responding more to your statement that "Being able to have roles
and memberships automatically created is much more the direction that
I'd say we should be going in". It's not that one-role-per-user is
inherently wasteful, but forcing role proliferation where it's not
needed is. If all users have the same set of permissions, there doesn't
need to be more than one role. But see below.

I'm also not suggesting that we make everyone do the same
thing, indeed, later on I was supportive of having an external system
provide the mapping. Here, I'm just making the point that we should
also be looking at automatic role/membership creation.

Gotcha. Agreed; that would open up the ability to administer role
privileges externally too, which would be cool. That could be used in
tandem with something like this patchset.

I'd go a step further and suggest that the way to do this is with a
background worker that's started up and connects to an LDAP
infrastructure and listens for changes, allowing the system to pick up
on new roles/memberships as soon as they're created in the LDAP
environment. That would then be controlled by appropriate settings in
postgresql.conf/.auto.conf.

This is roughly what you can already do with existing (third-party)
tools, and that approach isn't scaling out in practice for some of our
existing customers. The load on the central server, for thousands of
idle databases dialing in just to see if there are any new users, is
huge.

If you're referring specifically to cron-based tools which are
constantly hammering on the LDAP servers running the same queries over
and over, sure, I agree that that's creating load on the LDAP
infrastructure (though, well, it was kind of designed to be very
scalable for exactly that kind of load, no? So I'm not really sure why
that's such an issue..).

I don't have hands-on experience here -- just going on what I've been
told via field/product teams -- but it seems to me that there's a big
difference between asking an LDAP server to give you information on a
user at the time that user logs in, and asking it to give a list of
_all_ users to every single Postgres instance you have on a regular
timer. The latter is what seems to be problematic.

That's also why I specifically wasn't
suggesting that and was instead suggesting that we have something that's
connected to one of the (hopefully, many, many) LDAP servers and is
doing change monitoring, allowing changes to be pushed down to PG,
rather than cronjobs constantly running the same queries and re-checking
things over and over. I appreciate that that's also not free, but I
don't believe it's nearly as bad as the cron-based approach and it's
certainly something that an LDAP infrastructure should be really rather
good at.

I guess I'd have to see an implementation -- I was under the impression
that persistent search wasn't widely implemented?

All that said, I do see how having the ability to call out to another
system for mappings may be useful, so I'm not sure that we shouldn't
consider this specific change and have it be specifically just for
mappings, in which case pg_ident seems appropriate.

Yeah, this PoC was mostly an increment on the functionality that
already existed. The division between what goes in pg_hba and what goes
in pg_ident is starting to blur with this patchset, though, and I think
Peter's point is sound.

This part I tend to disagree with- pg_ident for mappings and for ways to
call out to other systems to provide those mappings strikes me as
entirely appropriate and doesn't blur the lines and that's really what
this patch seems to be primarily about. Peter noted that there might be
other things we want to do and argued that those might not be
appropriate in pg_ident, which I tend to agree with, but I don't think
we need to invent something entirely new for mappings when we have
pg_ident already.

The current patchset here has pieces of what is usually contained in
HBA (the LDAP host/port/base/filter/etc.) effectively moved into
pg_ident, while other pieces (TLS settings) remain in the HBA and the
environment. That's what I'm referring to. If that is workable for you
in the end, that's fine, but for me it'd be much easier to maintain if
the mapping query and the LDAP connection settings for that mapping
query were next to each other.

When it comes to the question of "how to connect to an LDAP server for
$whatever", it seems like it'd be nice to be able to configure that once
and reuse that configuration. Not sure I have a great suggestion for
how to do that. The approach this patch takes of adding options to
pg_hba for that, just like other options in pg_hba do, strikes me as
pretty reasonable.

Right. That part seems less reasonable to me, given the current format
of the HBA. YMMV.

I would advocate for other methods to work when it comes to
authenticating to LDAP from PG though (such as GSSAPI, in particular,
of course...).

I can take a look at the Cyrus requirements for the GSSAPI mechanism.
Might be tricky to add tests for it, though. Any others you're
interested in?

I certainly don't think we should have this be limited to LDAP auth-
such an external mapping ability is suitable for any authentication
method that supports a mapping (thinking specifically of GSSAPI, of
course..). Not sure if that's what was meant above but did want to
make sure that was clear.

You can't use usermaps with LDAP auth yet, so no, that's not what I
meant. (I have another patch for that feature in commitfest, which
would allow these two things to be used together.)

Yes, I'm aware of the other patch, just wanted to make sure the intent
is for this to work for all map-supporting auth methods. Figured that
was the case but the examples in the prior email had me concerned and
just wanted to make sure.

Correct. The new tests use cert auth, for instance.

Thanks,
--Jacob

Stephen Frost

sfrost@snowman.net

about 4 years ago

In reply to: Jacob Champion (#7)

Re: [PoC] Delegating pg_ident to a third party

Greetings,

On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote:

On Mon, 2022-01-03 at 19:42 -0500, Stephen Frost wrote:

* Jacob Champion (pchampion@vmware.com) wrote:

That last point was my motivation for the authn_id patch [1] -- so that
auditing could see the actual user _and_ the generic role. The
information is already there to be used, it's just not exposed to the
stats framework yet.

While that helps, and I generally support adding that information to the
logs, it's certainly not nearly as good or useful as having the actual
user known to the database.

Could you talk more about the use cases for which having the "actual
user" is better? From an auditing perspective I don't see why
"authenticated as jacob@example.net, logged in as admin" is any worse
than "logged in as jacob".

The above case isn’t what we are talking about, as far as I understand
anyway. You’re suggesting “authenticated as jacob@example.net, logged in as
sales” where the user in the database is “sales”. Consider triggers which
only have access to “sales”, or a tool like pgaudit which only has access
to “sales”. Who was it in sales that updated that record though? We don’t
know- we would have to go try to figure it out from the logs, but even if
we had time stamps on the row update, there could be 50 sales people logged
in at overlapping times.

Forcing one role per individual end user is wasteful and isn't really

making good use of the role-based system that you already have.
Generally speaking, when administering hundreds or thousands of users,
people start dividing them up into groups as opposed to dealing with
them individually. So I don't think new features should be taking away
flexibility in this area -- if one role per user already works well for
you, great, but don't make everyone do the same.

Using the role system we have to assign privileges certainly is useful
and sensible, of course, though I don't see where you've actually made
an argument for why one role per individual is somehow wasteful or
somehow takes away from the role system that we have for granting
rights.

I was responding more to your statement that "Being able to have roles
and memberships automatically created is much more the direction that
I'd say we should be going in". It's not that one-role-per-user is
inherently wasteful, but forcing role proliferation where it's not
needed is. If all users have the same set of permissions, there doesn't
need to be more than one role. But see below.

Just saying it’s wasteful isn’t actually saying what is wasteful about it.

I'm also not suggesting that we make everyone do the same

thing, indeed, later on I was supportive of having an external system
provide the mapping. Here, I'm just making the point that we should
also be looking at automatic role/membership creation.

Gotcha. Agreed; that would open up the ability to administer role
privileges externally too, which would be cool. That could be used in
tandem with something like this patchset.

Not sure exactly what you’re referring to here by “administer role
privileges externally too”..? Curious to hear what you are imagining
specifically.

I'd go a step further and suggest that the way to do this is with a

background worker that's started up and connects to an LDAP
infrastructure and listens for changes, allowing the system to pick

up

on new roles/memberships as soon as they're created in the LDAP
environment. That would then be controlled by appropriate settings

in

postgresql.conf/.auto.conf.

This is roughly what you can already do with existing (third-party)
tools, and that approach isn't scaling out in practice for some of our
existing customers. The load on the central server, for thousands of
idle databases dialing in just to see if there are any new users, is
huge.

If you're referring specifically to cron-based tools which are
constantly hammering on the LDAP servers running the same queries over
and over, sure, I agree that that's creating load on the LDAP
infrastructure (though, well, it was kind of designed to be very
scalable for exactly that kind of load, no? So I'm not really sure why
that's such an issue..).

I don't have hands-on experience here -- just going on what I've been
told via field/product teams -- but it seems to me that there's a big
difference between asking an LDAP server to give you information on a
user at the time that user logs in, and asking it to give a list of
_all_ users to every single Postgres instance you have on a regular
timer. The latter is what seems to be problematic.

And to be clear, I agree that’s not good (though, again, really, your ldap
infrastructure shouldn’t be having all that much trouble with it- you can
scale those out verryyyy far, and far more easily than a relational
database..).

I’d also point out though that having to do an ldap lookup on every login
to PG is *already* an issue in some environments, having to do multiple
amplifies that. Not to mention that when the ldap servers can’t be reached
for some reason, no one can log into the database and that’s rather
unfortunate too. These are, of course, arguments for moving away from
methods that require checking with some other system synchronously during
login- which is another reason why it’s better to have the authentication
credentials easily map to the PG role, without the need for external checks
at login time. That’s done with today’s pg_ident, but this patch would
change that.

Consider the approach I continue to advocate- GSSAPI based authentication,
where a user only needs to contact the Kerberos server perhaps every 8
hours or so for an updated ticket but otherwise can authorize directly to
PG using their existing ticket and credentials, where their role was
previously created and their memberships already exist thanks to a
background worker whose job it is to handle that and which deals with
transient network failures or other issues. In this world, most logins to
PG don’t require any other system to be involved besides the client, the PG
server, and the networking between them; perhaps DNS if things aren’t
cached on the client.

On the other hand, to use ldap authentication (which also happens to be
demonstrable insecure without any reasonable way to fix that), with an ldap
mapping setup, requires two logins to an ldap server every single time a
user logs into PG and if the ldap environment is offline or overloaded for
whatever reason, the login fails or takes an excessively long amount of
time.

That's also why I specifically wasn't

suggesting that and was instead suggesting that we have something that's
connected to one of the (hopefully, many, many) LDAP servers and is
doing change monitoring, allowing changes to be pushed down to PG,
rather than cronjobs constantly running the same queries and re-checking
things over and over. I appreciate that that's also not free, but I
don't believe it's nearly as bad as the cron-based approach and it's
certainly something that an LDAP infrastructure should be really rather
good at.

I guess I'd have to see an implementation -- I was under the impression
that persistent search wasn't widely implemented?

I mean … let’s talk about the one that really matters here:

https://docs.microsoft.com/en-us/windows/win32/ad/change-notifications-in-active-directory-domain-services

OpenLDAP has an audit log system which can be used though it’s certainly
not as nice and would require code specific to it.

This talks a bit about other directories:
https://docs.informatica.com/data-integration/powerexchange-adapters-for-powercenter/10-1/powerexchange-for-ldap-user-guide-for-powercenter/ldap-sessions/configuring-change-data-capture/methods-for-tracking-changes-in-different-directories.html

I do wish they all supported it cleanly in the same way.

All that said, I do see how having the ability to call out to another

system for mappings may be useful, so I'm not sure that we shouldn't
consider this specific change and have it be specifically just for
mappings, in which case pg_ident seems appropriate.

Yeah, this PoC was mostly an increment on the functionality that
already existed. The division between what goes in pg_hba and what goes
in pg_ident is starting to blur with this patchset, though, and I think
Peter's point is sound.

This part I tend to disagree with- pg_ident for mappings and for ways to
call out to other systems to provide those mappings strikes me as
entirely appropriate and doesn't blur the lines and that's really what
this patch seems to be primarily about. Peter noted that there might be
other things we want to do and argued that those might not be
appropriate in pg_ident, which I tend to agree with, but I don't think
we need to invent something entirely new for mappings when we have
pg_ident already.

The current patchset here has pieces of what is usually contained in
HBA (the LDAP host/port/base/filter/etc.) effectively moved into
pg_ident, while other pieces (TLS settings) remain in the HBA and the
environment. That's what I'm referring to. If that is workable for you
in the end, that's fine, but for me it'd be much easier to maintain if
the mapping query and the LDAP connection settings for that mapping
query were next to each other.

I can agree with the point that it would be nicer to have the ldap
host/port/base/filter be in the hba instead, if there is a way to
accomplish that reasonably. Did you have a suggestion in mind for how to do
that..? If there’s an alternative approach to consider, it’d be useful to
see them next to each other and then we could all contemplate which is
better.

When it comes to the question of "how to connect to an LDAP server for

$whatever", it seems like it'd be nice to be able to configure that once
and reuse that configuration. Not sure I have a great suggestion for
how to do that. The approach this patch takes of adding options to
pg_hba for that, just like other options in pg_hba do, strikes me as
pretty reasonable.

Right. That part seems less reasonable to me, given the current format
of the HBA. YMMV.

If the ldap connection info and filters and such could all exist in the
hba, then perhaps a way to define those credentials in one place in the hba
file and then use them on other lines would be possible..? Seems like that
would be easier than having them also in the ident or having the ident
refer to something defined elsewhere.

Consider in the hba having:

LDAPSERVER[ldap1]=“ldaps://whatever other options go here”

Then later:

hostssl all all ::0/0 ldap ldapserver=ldap1 ldapmapserver=ldap1
map=myldapmap

Clearly needs more thought needed due to different requirements for ldap
authentication vs. the map, but still, the general idea being to have all
of it in the hba and then a way to define ldap server configuration in the
hba once and then reused.

I would advocate for other methods to work when it comes to

authenticating to LDAP from PG though (such as GSSAPI, in particular,
of course...).

I can take a look at the Cyrus requirements for the GSSAPI mechanism.
Might be tricky to add tests for it, though. Any others you're
interested in?

GSSAPI is the main one … I suppose client side certificates would be nice
too if that’s possible. I suspect some would like a way to have
username/pw ldap credentials in some other file besides the hba, but that
isn’t as interesting to me, at least.

I certainly don't think we should have this be limited to LDAP auth-

such an external mapping ability is suitable for any authentication
method that supports a mapping (thinking specifically of GSSAPI, of
course..). Not sure if that's what was meant above but did want to
make sure that was clear.

You can't use usermaps with LDAP auth yet, so no, that's not what I
meant. (I have another patch for that feature in commitfest, which
would allow these two things to be used together.)

Yes, I'm aware of the other patch, just wanted to make sure the intent
is for this to work for all map-supporting auth methods. Figured that
was the case but the examples in the prior email had me concerned and
just wanted to make sure.

Correct. The new tests use cert auth, for instance.

Great.

Thanks!

Stephen

Jacob Champion

pchampion@vmware.com

about 4 years ago

In reply to: Stephen Frost (#8)

Re: [PoC] Delegating pg_ident to a third party

On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote:

On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote:

Could you talk more about the use cases for which having the "actual
user" is better? From an auditing perspective I don't see why
"authenticated as jacob@example.net, logged in as admin" is any worse
than "logged in as jacob".

The above case isn’t what we are talking about, as far as I
understand anyway. You’re suggesting “authenticated as
jacob@example.net, logged in as sales” where the user in the database
is “sales”. Consider triggers which only have access to “sales”, or
a tool like pgaudit which only has access to “sales”.

Okay. So an additional getter function in miscadmin.h, and surfacing
that function to trigger languages, are needed to make authn_id more
generally useful. Any other cases you can think of?

I was responding more to your statement that "Being able to have roles
and memberships automatically created is much more the direction that
I'd say we should be going in". It's not that one-role-per-user is
inherently wasteful, but forcing role proliferation where it's not
needed is. If all users have the same set of permissions, there doesn't
need to be more than one role. But see below.

Just saying it’s wasteful isn’t actually saying what is wasteful about it.

Well, I felt like it was irrelevant; you've already said you have no
intention to force one-user-per-role.

But to elaborate: *forcing* one-user-per-role is wasteful, because if I
have a thousand employees, and I want to give all my employees access
to a guest role in the database, then I have to administer a thousand
roles: maintaining them through dump/restores and pg_upgrades, auditing
them to figure out why Bob in Accounting somehow got a different
privilege GRANT than the rest of the users, adding new accounts,
purging old ones, maintaining the inevitable scripts that will result.

If none of the users need to be "special" in any way, that's all wasted
overhead. (If they do actually need to be special, then at least some
of that overhead becomes necessary. Otherwise it's waste.) You may be
able to mitigate the cost of the waste, or absorb the mitigations into
Postgres so that the user can't see the waste, or decide that the waste
is not costly enough to care about. It's still waste.

I'm also not suggesting that we make everyone do the same
thing, indeed, later on I was supportive of having an external system
provide the mapping. Here, I'm just making the point that we should
also be looking at automatic role/membership creation.

Gotcha. Agreed; that would open up the ability to administer role
privileges externally too, which would be cool. That could be used in
tandem with something like this patchset.

Not sure exactly what you’re referring to here by “administer role
privileges externally too”..? Curious to hear what you are imagining
specifically.

Just that it would be nice to centrally provision role GRANTs as well
as role membership, that's all. No specifics in mind, and I'm not even
sure if LDAP would be a helpful place to put that sort of config.

I’d also point out though that having to do an ldap lookup on every
login to PG is *already* an issue in some environments, having to do
multiple amplifies that.

You can't use the LDAP auth method with this patch yet, so this concern
is based on code that doesn't exist. It's entirely possible that you
could do the role query as part of the first bound connection. If that
proves unworkable, then yes, I agree that it's a concern.

Not to mention that when the ldap servers can’t be reached for some
reason, no one can log into the database and that’s rather
unfortunate too.

Assuming you have no caches, then yes. That might be a pretty good
argument for allowing ldapmap and map to be used together, actually, so
that you can have some critical users who can always log in as
"themselves" or "admin" or etc. Or maybe it's an argument for allowing
HBA to handle fallback methods of authentication.

Luckily I think it's pretty easy to communicate to LDAP users that if
*all* your login infrastructure goes down, you will no longer be able
to log in. They're probably used to that idea, if they haven't set up
any availability infra.

These are, of course, arguments for moving away from methods that
require checking with some other system synchronously during login-
which is another reason why it’s better to have the authentication
credentials easily map to the PG role, without the need for external
checks at login time. That’s done with today’s pg_ident, but this
patch would change that.

There are arguments for moving towards synchronous checks as well.
Central revocation of credentials (in timeframes shorter than ticket
expiration) is what comes to mind. Revocation is hard and usually
conflicts with the desire for availability.

What's "better" for me or you is not necessarily "better" overall; it's
all tradeoffs, all the time.

Consider the approach I continue to advocate- GSSAPI based
authentication, where a user only needs to contact the Kerberos
server perhaps every 8 hours or so for an updated ticket but
otherwise can authorize directly to PG using their existing ticket
and credentials, where their role was previously created and their
memberships already exist thanks to a background worker whose job it
is to handle that and which deals with transient network failures or
other issues. In this world, most logins to PG don’t require any
other system to be involved besides the client, the PG server, and
the networking between them; perhaps DNS if things aren’t cached on
the client.

On the other hand, to use ldap authentication (which also happens to
be demonstrable insecure without any reasonable way to fix that),
with an ldap mapping setup, requires two logins to an ldap server
every single time a user logs into PG and if the ldap environment is
offline or overloaded for whatever reason, the login fails or takes
an excessively long amount of time.

The two systems have different architectures, and different security
properties, and you have me at a disadvantage in that you can see the
experimental code I have written and I cannot see the hypothetical code
in your head.

It sounds like I'm more concerned with the ability to have an online
central source of truth for access control, accepting that denial of
service may cause the system to fail shut; and you're more concerned
with availability in the face of network failure, accepting that denial
of service may cause the system to fail open. I think that's a design
decision that belongs to an end user.

The distributed availability problems you're describing are, in my
experience, typically solved by caching. With your not-yet-written
solution, the caching is built into Postgres, and it's on all of the
time, but may (see below) only actually perform well with Active
Directory. With my solution, any caching is optional, because it has to
be implemented/maintained external to Postgres, but because it's just
generic "LDAP caching" then it should be broadly compatible and we
don't have to maintain it. I can see arguments for and against both
approaches.

I guess I'd have to see an implementation -- I was under the impression
that persistent search wasn't widely implemented?

I mean … let’s talk about the one that really matters here:

https://docs.microsoft.com/en-us/windows/win32/ad/change-notifications-in-active-directory-domain-services

That would certainly be a useful thing to implement for deployments
that can use it. But my personal interest in writing "LDAP" code that
only works with AD is nil, at least in the short term.

(The continued attitude that Microsoft Active Directory is "the one
that really matters" is really frustrating. I have users on LDAP
without Active Directory. Postgres tests are written against OpenLDAP.)

OpenLDAP has an audit log system which can be used though it’s
certainly not as nice and would require code specific to it.

This talks a bit about other directories:
https://docs.informatica.com/data-integration/powerexchange-adapters-for-powercenter/10-1/powerexchange-for-ldap-user-guide-for-powercenter/ldap-sessions/configuring-change-data-capture/methods-for-tracking-changes-in-different-directories.html

I do wish they all supported it cleanly in the same way.

Okay. But the answer to "is persistent search widely implemented?"
appears to be "No."

The current patchset here has pieces of what is usually contained in
HBA (the LDAP host/port/base/filter/etc.) effectively moved into
pg_ident, while other pieces (TLS settings) remain in the HBA and the
environment. That's what I'm referring to. If that is workable for you
in the end, that's fine, but for me it'd be much easier to maintain if
the mapping query and the LDAP connection settings for that mapping
query were next to each other.

I can agree with the point that it would be nicer to have the ldap
host/port/base/filter be in the hba instead, if there is a way to
accomplish that reasonably. Did you have a suggestion in mind for how
to do that..? If there’s an alternative approach to consider, it’d
be useful to see them next to each other and then we could all
contemplate which is better.

I didn't say I necessarily wanted it all in the HBA, just that I wanted
it all in the same spot.

I don't see a good way to push the filter back into the HBA, because it
may very well depend on the users being mapped (i.e. there may need to
be multiple lines in the map). Same for the query attributes. In fact
if I'm already using AD Kerberos or SSPI and I want to be able to
handle users coming from multiple domains, couldn't I be querying
entirely different servers depending on the username presented?

When it comes to the question of "how to connect to an LDAP server for
$whatever", it seems like it'd be nice to be able to configure that once
and reuse that configuration. Not sure I have a great suggestion for
how to do that. The approach this patch takes of adding options to
pg_hba for that, just like other options in pg_hba do, strikes me as
pretty reasonable.

Right. That part seems less reasonable to me, given the current format
of the HBA. YMMV.

If the ldap connection info and filters and such could all exist in
the hba, then perhaps a way to define those credentials in one place
in the hba file and then use them on other lines would be
possible..? Seems like that would be easier than having them also in
the ident or having the ident refer to something defined elsewhere.

Consider in the hba having:

LDAPSERVER[ldap1]=“ldaps://whatever other options go here”

Then later:

hostssl all all ::0/0 ldap ldapserver=ldap1 ldapmapserver=ldap1 map=myldapmap

Clearly needs more thought needed due to different requirements for
ldap authentication vs. the map, but still, the general idea being to
have all of it in the hba and then a way to define ldap server
configuration in the hba once and then reused.

You're open to the idea of bolting a new key/value grammar onto the HBA
parser, but not to the idea of brainstorming a different configuration
DSL?

I can take a look at the Cyrus requirements for the GSSAPI mechanism.
Might be tricky to add tests for it, though. Any others you're
interested in?

GSSAPI is the main one … I suppose client side certificates would be
nice too if that’s possible. I suspect some would like a way to have
username/pw ldap credentials in some other file besides the hba, but
that isn’t as interesting to me, at least.

Certificate auth is already there in the patch. See the end of
t/001_ldap.t.

Thanks,
--Jacob

#10

Stephen Frost

sfrost@snowman.net

about 4 years ago

In reply to: Jacob Champion (#9)

Re: [PoC] Delegating pg_ident to a third party

Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:

On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote:

On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote:

Could you talk more about the use cases for which having the "actual
user" is better? From an auditing perspective I don't see why
"authenticated as jacob@example.net, logged in as admin" is any worse
than "logged in as jacob".

The above case isn’t what we are talking about, as far as I
understand anyway. You’re suggesting “authenticated as
jacob@example.net, logged in as sales” where the user in the database
is “sales”. Consider triggers which only have access to “sales”, or
a tool like pgaudit which only has access to “sales”.

Okay. So an additional getter function in miscadmin.h, and surfacing
that function to trigger languages, are needed to make authn_id more
generally useful. Any other cases you can think of?

That would help but now you've got two different things that have to be
tracked, potentially, because for some people you might not want to use
their system auth'd-as ID. I don't see that as a great solution and
instead as a workaround. Yes, we should also do this but it's really an
argument for how to deal with such a setup, not a justification for
going down this route.

I was responding more to your statement that "Being able to have roles
and memberships automatically created is much more the direction that
I'd say we should be going in". It's not that one-role-per-user is
inherently wasteful, but forcing role proliferation where it's not
needed is. If all users have the same set of permissions, there doesn't
need to be more than one role. But see below.

Just saying it’s wasteful isn’t actually saying what is wasteful about it.

Well, I felt like it was irrelevant; you've already said you have no
intention to force one-user-per-role.

Forcing one-user-per-role would be breaking things we already support
so, no, I certainly don't have any intention of requiring such a change.
That said, I do feel it's useful to have these discussions.

But to elaborate: *forcing* one-user-per-role is wasteful, because if I
have a thousand employees, and I want to give all my employees access
to a guest role in the database, then I have to administer a thousand
roles: maintaining them through dump/restores and pg_upgrades, auditing
them to figure out why Bob in Accounting somehow got a different
privilege GRANT than the rest of the users, adding new accounts,
purging old ones, maintaining the inevitable scripts that will result.

pg_upgrade just handles it, no? pg_dumpall -g does too. Having to deal
with roles in general is a pain but the number of them isn't necessarily
an issue. A guest role which doesn't have any auditing requirements
might be a decent use-case for what you're talking about here but I
don't know that we'd implement this for just that case. Part of this
discussion was specifically about addressing the other challenges- like
having automation around the account addition/removal and sync'ing role
membership too. As for auditing privileges, that should be done
regardless and the case you outline isn't somehow different from others
(the same could be as easily said for how the 'guest' account got access
to whatever it did).

If none of the users need to be "special" in any way, that's all wasted
overhead. (If they do actually need to be special, then at least some
of that overhead becomes necessary. Otherwise it's waste.) You may be
able to mitigate the cost of the waste, or absorb the mitigations into
Postgres so that the user can't see the waste, or decide that the waste
is not costly enough to care about. It's still waste.

Except the amount of 'wasted' overhead being claimed here seems to be
hardly any. The biggest complaint levied at this seems to really be
just the issues around the load on the ldap systems from having to deal
with the frequent sync queries, and that's largely a solvable issue in
the majority of environments out there today.

I'm also not suggesting that we make everyone do the same
thing, indeed, later on I was supportive of having an external system
provide the mapping. Here, I'm just making the point that we should
also be looking at automatic role/membership creation.

Gotcha. Agreed; that would open up the ability to administer role
privileges externally too, which would be cool. That could be used in
tandem with something like this patchset.

Not sure exactly what you’re referring to here by “administer role
privileges externally too”..? Curious to hear what you are imagining
specifically.

Just that it would be nice to centrally provision role GRANTs as well
as role membership, that's all. No specifics in mind, and I'm not even
sure if LDAP would be a helpful place to put that sort of config.

GRANT's on objects, you mean? I agree, that would be interesting to
consider though it would involve custom entries in the LDAP directory,
no? Role membership would be able to be sync'd as part of group
membership and that was something I was thinking would be handled as
part of this in a similar manner to what the 3rd party solutions provide
today using the cron-based approach.

I’d also point out though that having to do an ldap lookup on every
login to PG is *already* an issue in some environments, having to do
multiple amplifies that.

You can't use the LDAP auth method with this patch yet, so this concern
is based on code that doesn't exist. It's entirely possible that you
could do the role query as part of the first bound connection. If that
proves unworkable, then yes, I agree that it's a concern.

Perhaps it could be done as part of the same connection but that then
has an impact on what the configuration of the ident LDAP lookup would
be, no? That seems like an important thing to flesh out before we move
too much farther with this patch, to make sure that, if we want that to
work, that there's a clear way to configure it to avoid the double LDAP
connection. I'm guessing you already have an idea how that'll work
though..?

Not to mention that when the ldap servers can’t be reached for some
reason, no one can log into the database and that’s rather
unfortunate too.

Assuming you have no caches, then yes. That might be a pretty good
argument for allowing ldapmap and map to be used together, actually, so
that you can have some critical users who can always log in as
"themselves" or "admin" or etc. Or maybe it's an argument for allowing
HBA to handle fallback methods of authentication.

Ok, so now we're talking about a cache that needs to be implemented
which will ... store the user's password for LDAP authentication? Or
what the mapping is for various LDAP IDs to PG roles? And how will that
cache be managed? Would it be handled by dump/restore? What about
pg_upgrade? How will entries in the cache be removed?

And mainly- how is this different from just having all the roles in PG
to begin with..?

Luckily I think it's pretty easy to communicate to LDAP users that if
*all* your login infrastructure goes down, you will no longer be able
to log in. They're probably used to that idea, if they haven't set up
any availability infra.

Except that most of the rest of the infrastructure may continue to work
just fine except for logging in- which is something most folks only do
once a day. That is, why is the SQL Server system still happily
accepting connections while the AD is being rebooted? Or why can I
still log into the company website even though AD is down, but I can't
get into PG? Not everything in an environment is tied to LDAP being up
and running all the time, so it's not nearly so cut and dry in many,
many cases.

These are, of course, arguments for moving away from methods that
require checking with some other system synchronously during login-
which is another reason why it’s better to have the authentication
credentials easily map to the PG role, without the need for external
checks at login time. That’s done with today’s pg_ident, but this
patch would change that.

There are arguments for moving towards synchronous checks as well.
Central revocation of credentials (in timeframes shorter than ticket
expiration) is what comes to mind. Revocation is hard and usually
conflicts with the desire for availability.

Revocation in less time than ticket lifetime and everything falling over
due to the AD being restarted are very different. The approaches being
discussed are all much shorter than ticket lifetime and so that's hardly
an appropriate comparison to be making. I didn't suggest that waiting
for ticket expiration would be appropriate when it comes to syncing
accounts between AD and PG or that it would be appropriate for
revocation. Regarding the cache'ing proposed above- in such a case,
clearly, revocation wouldn't be syncronous either. Certainly in the
cases today where cronjobs are being used to perform the sync,
revocation also isn't syncronous (unless also using LDAP for
authentication, of course, though that wouldn't do anything for existing
sessions, while removing role memberships does...).

What's "better" for me or you is not necessarily "better" overall; it's
all tradeoffs, all the time.

Sure.

Consider the approach I continue to advocate- GSSAPI based
authentication, where a user only needs to contact the Kerberos
server perhaps every 8 hours or so for an updated ticket but
otherwise can authorize directly to PG using their existing ticket
and credentials, where their role was previously created and their
memberships already exist thanks to a background worker whose job it
is to handle that and which deals with transient network failures or
other issues. In this world, most logins to PG don’t require any
other system to be involved besides the client, the PG server, and
the networking between them; perhaps DNS if things aren’t cached on
the client.

On the other hand, to use ldap authentication (which also happens to
be demonstrable insecure without any reasonable way to fix that),
with an ldap mapping setup, requires two logins to an ldap server
every single time a user logs into PG and if the ldap environment is
offline or overloaded for whatever reason, the login fails or takes
an excessively long amount of time.

The two systems have different architectures, and different security
properties, and you have me at a disadvantage in that you can see the
experimental code I have written and I cannot see the hypothetical code
in your head.

I've barely glanced at the code you've written and it largely hasn't
been driving my comments on this thread- merely the understanding of how
it works. Further, you've stated that you're already familiar with
systems that sync between LDAP and PG and the vast majority of this
discussion has been about that distinction- if we push the mappings into
PG as roles, or if we execute a query out to LDAP on connection to check
the mapping. The above references to tickets and GSSAPI/Kerberos are
all from existing code as well. The only reference to hypothetical code
is the idea of a background or other worker that subscribes to changes
in LDAP and implements those changes in PG instead of having something
cron-based do it, but that doesn't really change anything about the
architectural question of if we cache (either with an explicit cache, as
you've opined us adding above, though which there is no code for today,
or just by using PG's existing role/membership system) or call out to
LDAP for every login.

It sounds like I'm more concerned with the ability to have an online
central source of truth for access control, accepting that denial of
service may cause the system to fail shut; and you're more concerned
with availability in the face of network failure, accepting that denial
of service may cause the system to fail open. I think that's a design
decision that belongs to an end user.

There is more to it than just failing shut/closed. Part of the argument
being used to drive this change was that it would help to reduce the
load on the LDAP servers because there wouldn't be a need to run large
queries on them frequently out of cron to keep PG's understanding of
what the roles are and their mappings is matching what's in LDAP.

The distributed availability problems you're describing are, in my
experience, typically solved by caching. With your not-yet-written
solution, the caching is built into Postgres, and it's on all of the
time, but may (see below) only actually perform well with Active
Directory. With my solution, any caching is optional, because it has to
be implemented/maintained external to Postgres, but because it's just
generic "LDAP caching" then it should be broadly compatible and we
don't have to maintain it. I can see arguments for and against both
approaches.

I'm a bit confused by the this- either you're referring to the cache
being PG's existing system, which certainly has already been written,
and has existed since it was committed and released as part of 8.1, and
is, indeed, on all the time ... or you're talking about something else
which hasn't been written and could therefore be anything, though I'm
generally against the idea of having an independent cache for this, as
described above.

As for optional cacheing with some generic LDAP caching system, that
strikes me as clearly even worse than building something into PG for
this as it requires maintaining yet another system in order to have a
reasonably well working system and that isn't good. While it's good
that we have pgbouncer, it'd certainly be better if we didn't need it
and it's got a bunch of downsides to it. I strongly suspect the same
would be true of some external generic "LDAP cacheing" system as is
referred to above, though as there isn't anything to look at, I can't
say for sure.

Regarding 'performing well', while lots of little queries may be better
in some cases than less frequent larger queries, that's really going to
depend on the frequency of each and therefore really be rather dependent
on the environment and usage. In any case, however, being able to
leverage change modifications instead of fully resyncing will definitely
be better. At the same time, however, if we have the external generic
LDAP cacheing system that's being claimed ... why wouldn't we simply use
that with the cron-based system today to offload those from the main
LDAP systems?

I guess I'd have to see an implementation -- I was under the impression
that persistent search wasn't widely implemented?

I mean … let’s talk about the one that really matters here:

https://docs.microsoft.com/en-us/windows/win32/ad/change-notifications-in-active-directory-domain-services

That would certainly be a useful thing to implement for deployments
that can use it. But my personal interest in writing "LDAP" code that
only works with AD is nil, at least in the short term.

(The continued attitude that Microsoft Active Directory is "the one
that really matters" is really frustrating. I have users on LDAP
without Active Directory. Postgres tests are written against OpenLDAP.)

What would you consider the important directories to worry about beyond
AD? I don't consider the PG testing framework to be particularly
indicative of what enterprises are actually running.

OpenLDAP has an audit log system which can be used though it’s
certainly not as nice and would require code specific to it.

This talks a bit about other directories:
https://docs.informatica.com/data-integration/powerexchange-adapters-for-powercenter/10-1/powerexchange-for-ldap-user-guide-for-powercenter/ldap-sessions/configuring-change-data-capture/methods-for-tracking-changes-in-different-directories.html

I do wish they all supported it cleanly in the same way.

Okay. But the answer to "is persistent search widely implemented?"
appears to be "No."

I'm curious as to how the large environments that you've worked with
have generally solved this issue. Is there a generic LDAP cacheing
system that's been used? What?

The current patchset here has pieces of what is usually contained in
HBA (the LDAP host/port/base/filter/etc.) effectively moved into
pg_ident, while other pieces (TLS settings) remain in the HBA and the
environment. That's what I'm referring to. If that is workable for you
in the end, that's fine, but for me it'd be much easier to maintain if
the mapping query and the LDAP connection settings for that mapping
query were next to each other.

I can agree with the point that it would be nicer to have the ldap
host/port/base/filter be in the hba instead, if there is a way to
accomplish that reasonably. Did you have a suggestion in mind for how
to do that..? If there’s an alternative approach to consider, it’d
be useful to see them next to each other and then we could all
contemplate which is better.

I didn't say I necessarily wanted it all in the HBA, just that I wanted
it all in the same spot.

I don't see a good way to push the filter back into the HBA, because it
may very well depend on the users being mapped (i.e. there may need to
be multiple lines in the map). Same for the query attributes. In fact
if I'm already using AD Kerberos or SSPI and I want to be able to
handle users coming from multiple domains, couldn't I be querying
entirely different servers depending on the username presented?

Yeah, that's a good point and which argues for putting everything into
the ident. In such a situation as you describe above, we wouldn't
actually have any LDAP configuration in the HBA and I'm entirely fine
with that- we'd just have it all in ident. I don't see how you'd make
that work with, as you suggest above, LDAP-based authentication and the
idea of having only one connection be used for the LDAP-based auth and
the mapping lookup, but I'm also not generally worried about LDAP-based
auth and would rather we rip it out entirely. :)

As such, I'd say that you've largely convinced me that we should just
move all of the LDAP configuration for the lookup into the ident and
discourage people from using LDAP-based authentication and from putting
LDAP configuration into the hba. I'm still a fan of the general idea of
having a way to configure such ldap parameters in one place in whatever
file they go into and then re-using that multiple times on the general
assumption that folks are likely to need to reference a particular LDAP
configuration more than once, wherever it's configured.

When it comes to the question of "how to connect to an LDAP server for
$whatever", it seems like it'd be nice to be able to configure that once
and reuse that configuration. Not sure I have a great suggestion for
how to do that. The approach this patch takes of adding options to
pg_hba for that, just like other options in pg_hba do, strikes me as
pretty reasonable.

Right. That part seems less reasonable to me, given the current format
of the HBA. YMMV.

If the ldap connection info and filters and such could all exist in
the hba, then perhaps a way to define those credentials in one place
in the hba file and then use them on other lines would be
possible..? Seems like that would be easier than having them also in
the ident or having the ident refer to something defined elsewhere.

Consider in the hba having:

LDAPSERVER[ldap1]=“ldaps://whatever other options go here”

Then later:

hostssl all all ::0/0 ldap ldapserver=ldap1 ldapmapserver=ldap1 map=myldapmap

Clearly needs more thought needed due to different requirements for
ldap authentication vs. the map, but still, the general idea being to
have all of it in the hba and then a way to define ldap server
configuration in the hba once and then reused.

You're open to the idea of bolting a new key/value grammar onto the HBA
parser, but not to the idea of brainstorming a different configuration
DSL?

Short answer- yes (or, as mentioned just above, into the ident file vs.
the hba). I'd rather we build on the existing configuration systems
that we have rather than invent something new that will then have to
work with the others, as I don't see it as likely that we could just
replace the existing ones with something new and make everyone
change. Having yet another one strikes me as worse than making
improvements to the existing ones (be those 'bolted on' or otherwise).

Thanks,

Stephen

#11

Jacob Champion

pchampion@vmware.com

almost 4 years ago

In reply to: Stephen Frost (#10)

Re: [PoC] Delegating pg_ident to a third party

On Mon, 2022-01-10 at 15:09 -0500, Stephen Frost wrote:

Greetings,

Sorry for the delay, the last few weeks have been insane.

* Jacob Champion (pchampion@vmware.com) wrote:

On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote:

On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote:

Could you talk more about the use cases for which having the "actual
user" is better? From an auditing perspective I don't see why
"authenticated as jacob@example.net, logged in as admin" is any worse
than "logged in as jacob".

The above case isn’t what we are talking about, as far as I
understand anyway. You’re suggesting “authenticated as
jacob@example.net, logged in as sales” where the user in the database
is “sales”. Consider triggers which only have access to “sales”, or
a tool like pgaudit which only has access to “sales”.

Okay. So an additional getter function in miscadmin.h, and surfacing
that function to trigger languages, are needed to make authn_id more
generally useful. Any other cases you can think of?

That would help but now you've got two different things that have to be
tracked, potentially, because for some people you might not want to use
their system auth'd-as ID. I don't see that as a great solution and
instead as a workaround.

There's nothing to be worked around. If you have a user mapping set up
using the features that exist today, and you want to audit who logged
in at some point in the past, then you need to log both the
authenticated ID and the authorized role. There's no getting around
that. It's not enough to say "just check the configuration" because the
config can change over time.

But to elaborate: *forcing* one-user-per-role is wasteful, because if I
have a thousand employees, and I want to give all my employees access
to a guest role in the database, then I have to administer a thousand
roles: maintaining them through dump/restores and pg_upgrades, auditing
them to figure out why Bob in Accounting somehow got a different
privilege GRANT than the rest of the users, adding new accounts,
purging old ones, maintaining the inevitable scripts that will result.

pg_upgrade just handles it, no? pg_dumpall -g does too. Having to deal
with roles in general is a pain but the number of them isn't necessarily
an issue. A guest role which doesn't have any auditing requirements
might be a decent use-case for what you're talking about here but I
don't know that we'd implement this for just that case. Part of this
discussion was specifically about addressing the other challenges- like
having automation around the account addition/removal and sync'ing role
membership too. As for auditing privileges, that should be done
regardless and the case you outline isn't somehow different from others
(the same could be as easily said for how the 'guest' account got access
to whatever it did).

I think there's a difference between auditing a small fixed number of
roles and auditing many thousands of them that change on a weekly or
daily basis. I'd rather maintain the former, given the choice. It's
harder for things to slip through the cracks with fewer moving pieces.

If none of the users need to be "special" in any way, that's all wasted
overhead. (If they do actually need to be special, then at least some
of that overhead becomes necessary. Otherwise it's waste.) You may be
able to mitigate the cost of the waste, or absorb the mitigations into
Postgres so that the user can't see the waste, or decide that the waste
is not costly enough to care about. It's still waste.

Except the amount of 'wasted' overhead being claimed here seems to be
hardly any. The biggest complaint levied at this seems to really be
just the issues around the load on the ldap systems from having to deal
with the frequent sync queries, and that's largely a solvable issue in
the majority of environments out there today.

As long as we're in agreement that there is waste, I don't think I'm
going to convince you about the cost. It's tangential anyway if you're
not going to remove many-to-many maps.

Not sure exactly what you’re referring to here by “administer role
privileges externally too”..? Curious to hear what you are imagining
specifically.

Just that it would be nice to centrally provision role GRANTs as well
as role membership, that's all. No specifics in mind, and I'm not even
sure if LDAP would be a helpful place to put that sort of config.

GRANT's on objects, you mean? I agree, that would be interesting to
consider though it would involve custom entries in the LDAP directory,
no? Role membership would be able to be sync'd as part of group
membership and that was something I was thinking would be handled as
part of this in a similar manner to what the 3rd party solutions provide
today using the cron-based approach.

Agreed. I haven't put too much thought into those use cases yet.

I’d also point out though that having to do an ldap lookup on every
login to PG is *already* an issue in some environments, having to do
multiple amplifies that.

You can't use the LDAP auth method with this patch yet, so this concern
is based on code that doesn't exist. It's entirely possible that you
could do the role query as part of the first bound connection. If that
proves unworkable, then yes, I agree that it's a concern.

Perhaps it could be done as part of the same connection but that then
has an impact on what the configuration of the ident LDAP lookup would
be, no? That seems like an important thing to flesh out before we move
too much farther with this patch, to make sure that, if we want that to
work, that there's a clear way to configure it to avoid the double LDAP
connection. I'm guessing you already have an idea how that'll work
though..?

It's only relevant if the other thread (which you've said you're
ignoring) progresses. The patch discussed here does not touch that code
path.

But yes, I have a general idea that as long as a user can look up (but
not modify) their own role information, this should work just fine.

Not to mention that when the ldap servers can’t be reached for some
reason, no one can log into the database and that’s rather
unfortunate too.

Assuming you have no caches, then yes. That might be a pretty good
argument for allowing ldapmap and map to be used together, actually, so
that you can have some critical users who can always log in as
"themselves" or "admin" or etc. Or maybe it's an argument for allowing
HBA to handle fallback methods of authentication.

Ok, so now we're talking about a cache that needs to be implemented
which will ... store the user's password for LDAP authentication? Or
what the mapping is for various LDAP IDs to PG roles? And how will that
cache be managed? Would it be handled by dump/restore? What about
pg_upgrade? How will entries in the cache be removed?

You keep pulling the authentication discussion, which this patch does
not touch on purpose, into this discussion about authorization. The
authz info requested by this patch seems like it can be cached.

People currently using LDAP authentication (which again, this patch
cannot use because there is no LDAP user mapping) either have existing
HA infrastructure that they're happy with, or they don't. This patch
shouldn't make that situation any better or worse -- *if* the lookup
can be done on one connection.

And mainly- how is this different from just having all the roles in PG
to begin with..?

This comment seems counterproductive. One major difference is that
Postgres doesn't have to duplicate the authentication info that some
other system already holds.

Luckily I think it's pretty easy to communicate to LDAP users that if
*all* your login infrastructure goes down, you will no longer be able
to log in. They're probably used to that idea, if they haven't set up
any availability infra.

Except that most of the rest of the infrastructure may continue to work
just fine except for logging in- which is something most folks only do
once a day. That is, why is the SQL Server system still happily
accepting connections while the AD is being rebooted? Or why can I
still log into the company website even though AD is down, but I can't
get into PG? Not everything in an environment is tied to LDAP being up
and running all the time, so it's not nearly so cut and dry in many,
many cases.

Whatever LDAP users currently deal with, this patch doesn't change
their experience, right? It seems like it's a lot easier to add caching
to a synchronous check, to make it asynchronous and a little more
fault-tolerant, than it is to do the reverse.

These are, of course, arguments for moving away from methods that
require checking with some other system synchronously during login-
which is another reason why it’s better to have the authentication
credentials easily map to the PG role, without the need for external
checks at login time. That’s done with today’s pg_ident, but this
patch would change that.

There are arguments for moving towards synchronous checks as well.
Central revocation of credentials (in timeframes shorter than ticket
expiration) is what comes to mind. Revocation is hard and usually
conflicts with the desire for availability.

Revocation in less time than ticket lifetime and everything falling over
due to the AD being restarted are very different. The approaches being
discussed are all much shorter than ticket lifetime and so that's hardly
an appropriate comparison to be making. I didn't suggest that waiting
for ticket expiration would be appropriate when it comes to syncing
accounts between AD and PG or that it would be appropriate for
revocation. Regarding the cache'ing proposed above- in such a case,
clearly, revocation wouldn't be syncronous either. Certainly in the
cases today where cronjobs are being used to perform the sync,
revocation also isn't syncronous (unless also using LDAP for
authentication, of course, though that wouldn't do anything for existing
sessions, while removing role memberships does...).

Sure. Again: tradeoffs.

The two systems have different architectures, and different security
properties, and you have me at a disadvantage in that you can see the
experimental code I have written and I cannot see the hypothetical code
in your head.

I've barely glanced at the code you've written <snip>

This is frustrating to read. I think we're talking past each other,
because I'm trying to talk about this patch and you're talking about
other things.

The only reference to hypothetical code
is the idea of a background or other worker that subscribes to changes
in LDAP and implements those changes in PG instead of having something
cron-based do it

Yes. That's what I was referring to.

, but that doesn't really change anything about the
architectural question of if we cache (either with an explicit cache, as
you've opined us adding above, though which there is no code for today,

LDAP caches exist... I'm not suggesting we implement a Postgres-branded
LDAP cache.

or just by using PG's existing role/membership system) or call out to
LDAP for every login.

It sounds like I'm more concerned with the ability to have an online
central source of truth for access control, accepting that denial of
service may cause the system to fail shut; and you're more concerned
with availability in the face of network failure, accepting that denial
of service may cause the system to fail open. I think that's a design
decision that belongs to an end user.

There is more to it than just failing shut/closed. Part of the argument
being used to drive this change was that it would help to reduce the
load on the LDAP servers because there wouldn't be a need to run large
queries on them frequently out of cron to keep PG's understanding of
what the roles are and their mappings is matching what's in LDAP.

Yes.

The distributed availability problems you're describing are, in my
experience, typically solved by caching. With your not-yet-written
solution, the caching is built into Postgres, and it's on all of the
time, but may (see below) only actually perform well with Active
Directory. With my solution, any caching is optional, because it has to
be implemented/maintained external to Postgres, but because it's just
generic "LDAP caching" then it should be broadly compatible and we
don't have to maintain it. I can see arguments for and against both
approaches.

I'm a bit confused by the this- either you're referring to the cache
being PG's existing system, which certainly has already been written,
and has existed since it was committed and released as part of 8.1, and
is, indeed, on all the time ... or you're talking about something else
which hasn't been written and could therefore be anything, though I'm
generally against the idea of having an independent cache for this, as
described above.

You just proposed an internal caching system, immediately upthread:
"I'd go a step further and suggest that the way to do this is with a
background worker that's started up and connects to an LDAP
infrastructure and listens for changes, allowing the system to pick up
on new roles/memberships as soon as they're created in the LDAP
environment." That proposal is what I was referring to by "your not-
yet-written solution".

As for optional cacheing with some generic LDAP caching system, that
strikes me as clearly even worse than building something into PG for
this as it requires maintaining yet another system in order to have a
reasonably well working system and that isn't good.

A choice for the end user. If they don't want to deal with LDAP
infrastructure, they don't have to use it.

While it's good
that we have pgbouncer, it'd certainly be better if we didn't need it
and it's got a bunch of downsides to it. I strongly suspect the same
would be true of some external generic "LDAP cacheing" system as is
referred to above, though as there isn't anything to look at, I can't
say for sure.

We can take a look at OpenLDAP's proxy caching for some info. That
won't be perfectly representative but I don't think there's "nothing to
look at".

Regarding 'performing well', while lots of little queries may be better
in some cases than less frequent larger queries, that's really going to
depend on the frequency of each and therefore really be rather dependent
on the environment and usage. In any case, however, being able to
leverage change modifications instead of fully resyncing will definitely
be better. At the same time, however, if we have the external generic
LDAP cacheing system that's being claimed ... why wouldn't we simply use
that with the cron-based system today to offload those from the main
LDAP systems?

I think there's an architectural difference between a proxy cache that
is set up to reduce load on a central server, and one that is set up to
handle network partitions while ensuring liveness. To be fair, I don't
know which use cases existing solutions can handle. But those two don't
seem to be the same to me.

I know that I have users who are okay with the query load from logins,
but not with the query load of their role-sync scripts. That's a good
enough datapoint for me.

That would certainly be a useful thing to implement for deployments
that can use it. But my personal interest in writing "LDAP" code that
only works with AD is nil, at least in the short term.

(The continued attitude that Microsoft Active Directory is "the one
that really matters" is really frustrating. I have users on LDAP
without Active Directory. Postgres tests are written against OpenLDAP.)

What would you consider the important directories to worry about beyond
AD? I don't consider the PG testing framework to be particularly
indicative of what enterprises are actually running.

I have end users running
- NetIQ/Novell eDirectory
- Oracle Directory Server
- Red Hat IdM
in addition to AD.

OpenLDAP has an audit log system which can be used though it’s
certainly not as nice and would require code specific to it.

This talks a bit about other directories:
https://docs.informatica.com/data-integration/powerexchange-adapters-for-powercenter/10-1/powerexchange-for-ldap-user-guide-for-powercenter/ldap-sessions/configuring-change-data-capture/methods-for-tracking-changes-in-different-directories.html

I do wish they all supported it cleanly in the same way.

Okay. But the answer to "is persistent search widely implemented?"
appears to be "No."

I'm curious as to how the large environments that you've worked with
have generally solved this issue. Is there a generic LDAP cacheing
system that's been used? What?

They haven't solved the issue; that's why I'm poking at it. Several
users have to cobble together scripts because of poor interaction with
their existing LDAP deployments (or complete lack of support, in the
case of pgbouncer).

I don't see a good way to push the filter back into the HBA, because it
may very well depend on the users being mapped (i.e. there may need to
be multiple lines in the map). Same for the query attributes. In fact
if I'm already using AD Kerberos or SSPI and I want to be able to
handle users coming from multiple domains, couldn't I be querying
entirely different servers depending on the username presented?

Yeah, that's a good point and which argues for putting everything into
the ident. In such a situation as you describe above, we wouldn't
actually have any LDAP configuration in the HBA and I'm entirely fine
with that- we'd just have it all in ident. I don't see how you'd make
that work with, as you suggest above, LDAP-based authentication and the
idea of having only one connection be used for the LDAP-based auth and
the mapping lookup, but I'm also not generally worried about LDAP-based
auth and would rather we rip it out entirely. :)

As such, I'd say that you've largely convinced me that we should just
move all of the LDAP configuration for the lookup into the ident and
discourage people from using LDAP-based authentication and from putting
LDAP configuration into the hba.

I'm willing to bet that Postgres dropping support will not result in my
end users abandoning their LDAP infrastructure. Either I and others in
my position will need to maintain forks, or my end users will find a
different database.

If there's widespread agreement that the project doesn't want to
maintain an LDAP auth method -- so far I think you've provided the only
such opinion, that I've seen at least -- that might be a good argument
for introducing pluggable auth so that the community can maintain the
methods that are important to them.

I'm still a fan of the general idea of
having a way to configure such ldap parameters in one place in whatever
file they go into and then re-using that multiple times on the general
assumption that folks are likely to need to reference a particular LDAP
configuration more than once, wherever it's configured.

Sure.

You're open to the idea of bolting a new key/value grammar onto the HBA
parser, but not to the idea of brainstorming a different configuration
DSL?

Short answer- yes (or, as mentioned just above, into the ident file vs.
the hba). I'd rather we build on the existing configuration systems
that we have rather than invent something new that will then have to
work with the others, as I don't see it as likely that we could just
replace the existing ones with something new and make everyone
change. Having yet another one strikes me as worse than making
improvements to the existing ones (be those 'bolted on' or otherwise).

I think the key to maintaining incrementally built systems is that at
some point, eventually, you refactor the thing. There was a brief
question on what that might look like, from Peter. You stepped in with
some very strong opinions.

--Jacob