[PATCH] Exponential backoff for auth_delay

Started by Michael Banckabout 2 years ago22 messages
#1Michael Banck
mbanck@gmx.net
1 attachment(s)

Hi,

we had a conversation with a customer about security compliance a while
ago and one thing they were concerned about was avoiding brute-force
attemps for remote password guessing. This is should not be a big
concern if reasonably secure passwords are used and increasing SCRAM
iteration count can also help, but generally auth_delay is recommended
for this if there are concerns.

This patch adds exponential backoff so that one can choose a small
initial value which gets doubled for each failed authentication attempt
until a maximum wait time (which is 10s by default, but can be disabled
if so desired).

Currently, this patch tracks remote hosts but not users, the idea being
that a remote attacker likely tries several users from a particular
host, but this could in theory be extended to users if there are
concerns.

The patch is partly based on an earlier, more ambitious attempt at
extending auth_delay by 成之焕 from a year ago:
/messages/by-id/AHwAxACqIwIVOEhs5YejpqoG.1.1668569845751.Hmail.zhcheng@ceresdata.com

Michael

Attachments:

v1-0001-Add-optional-exponential-backoff-to-auth_delay-co.patchtext/x-diff; charset=utf-8Download
From 4c964c866010bbdbeee9f0b2a755d97c91c5c091 Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Wed, 27 Dec 2023 15:55:39 +0100
Subject: [PATCH v1] Add optional exponential backoff to auth_delay contrib
 module.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds two new GUCs for auth_delay, exp_backoff and max_seconds. The former
controls whether exponential backoff should be used or not, the latter sets an
maximum delay (default is 10s) in case exponential backoff is active.

The exponential backoff is tracked per remote host and doubled for every failed
login attempt (i.e., wrong password, not just missing pg_hba line or database)
and reset to auth_delay.milliseconds after a successful authentication from
that host.

This patch is partly based on a larger (but ultimately rejected) patch by
成之焕.

Authors: Michael Banck, 成之焕
Discussion: https://postgr.es/m/AHwAxACqIwIVOEhs5YejpqoG.1.1668569845751.Hmail.zhcheng@ceresdata.com
---
 contrib/auth_delay/auth_delay.c  | 202 ++++++++++++++++++++++++++++++-
 doc/src/sgml/auth-delay.sgml     |  41 +++++++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 243 insertions(+), 1 deletion(-)

diff --git a/contrib/auth_delay/auth_delay.c b/contrib/auth_delay/auth_delay.c
index 8d6e4d2778..95e56db6ec 100644
--- a/contrib/auth_delay/auth_delay.c
+++ b/contrib/auth_delay/auth_delay.c
@@ -14,24 +14,50 @@
 #include <limits.h>
 
 #include "libpq/auth.h"
+#include "miscadmin.h"
 #include "port.h"
+#include "storage/ipc.h"
+#include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
 
+#define MAX_CONN_RECORDS 50
+
 /* GUC Variables */
 static int	auth_delay_milliseconds = 0;
+static bool auth_delay_exp_backoff = false;
+static int	auth_delay_max_seconds = 0;
 
 /* Original Hook */
 static ClientAuthentication_hook_type original_client_auth_hook = NULL;
 
+typedef struct AuthConnRecord
+{
+	char		remote_host[NI_MAXHOST];
+	bool		used;
+	double		sleep_time;		/* in milliseconds */
+} AuthConnRecord;
+
+static shmem_startup_hook_type shmem_startup_next = NULL;
+static shmem_request_hook_type shmem_request_next = NULL;
+static AuthConnRecord *acr_array = NULL;
+
+static AuthConnRecord *find_conn_record(char *remote_host, int *free_index);
+static double record_failed_conn_auth(Port *port);
+static double find_conn_max_delay(void);
+static void record_conn_failure(AuthConnRecord *acr);
+static void cleanup_conn_record(Port *port);
+
 /*
  * Check authentication
  */
 static void
 auth_delay_checks(Port *port, int status)
 {
+	double		delay;
+
 	/*
 	 * Any other plugins which use ClientAuthentication_hook.
 	 */
@@ -43,8 +69,150 @@ auth_delay_checks(Port *port, int status)
 	 */
 	if (status != STATUS_OK)
 	{
-		pg_usleep(1000L * auth_delay_milliseconds);
+		if (auth_delay_exp_backoff)
+		{
+			/*
+			 * Exponential backoff per remote host.
+			 */
+			delay = record_failed_conn_auth(port);
+			if (auth_delay_max_seconds > 0)
+				delay = Min(delay, 1000L * auth_delay_max_seconds);
+		}
+		else
+			delay = auth_delay_milliseconds;
+		if (delay > 0)
+		{
+			elog(DEBUG1, "Authentication delayed for %g seconds", delay / 1000.0);
+			pg_usleep(1000L * (long) delay);
+		}
+	}
+	else
+	{
+		cleanup_conn_record(port);
+	}
+}
+
+static double
+record_failed_conn_auth(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+	int			j = -1;
+
+	acr = find_conn_record(port->remote_host, &j);
+
+	if (!acr)
+	{
+		if (j == -1)
+
+			/*
+			 * No free space, MAX_CONN_RECORDS reached. Wait as long as the
+			 * largest delay for any remote host.
+			 */
+			return find_conn_max_delay();
+		acr = &acr_array[j];
+		strcpy(acr->remote_host, port->remote_host);
+		acr->used = true;
+		elog(DEBUG1, "new connection: %s, index: %d", acr->remote_host, j);
+	}
+
+	record_conn_failure(acr);
+	return acr->sleep_time;
+}
+
+static AuthConnRecord *
+find_conn_record(char *remote_host, int *free_index)
+{
+	int			i;
+
+	*free_index = -1;
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (!acr_array[i].used)
+		{
+			if (*free_index == -1)
+				/* record unused element */
+				*free_index = i;
+			continue;
+		}
+		if (strcmp(acr_array[i].remote_host, remote_host) == 0)
+			return &acr_array[i];
+	}
+
+	return NULL;
+}
+
+static double
+find_conn_max_delay(void)
+{
+	int			i;
+	double		max_delay = 0.0;
+
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (acr_array[i].used && acr_array[i].sleep_time > max_delay)
+			max_delay = acr_array[i].sleep_time;
 	}
+
+	return max_delay;
+}
+
+static void
+record_conn_failure(AuthConnRecord *acr)
+{
+	if (acr->sleep_time == 0)
+		acr->sleep_time = (double) auth_delay_milliseconds;
+	else
+		acr->sleep_time *= 2;
+}
+
+static void
+cleanup_conn_record(Port *port)
+{
+	int			free_index;
+	AuthConnRecord *acr = NULL;
+
+	acr = find_conn_record(port->remote_host, &free_index);
+	if (acr == NULL)
+		return;
+
+	acr->used = false;
+	acr->sleep_time = 0.0;
+}
+
+/*
+ * Set up shared memory
+ */
+
+static void
+auth_delay_shmem_request(void)
+{
+	Size		required;
+
+	if (shmem_request_next)
+		shmem_request_next();
+
+	required = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	required += sizeof(int);
+	RequestAddinShmemSpace(required);
+}
+
+static void
+auth_delay_shmem_startup(void)
+{
+	Size		required;
+	bool		found;
+
+	if (shmem_startup_next)
+		shmem_startup_next();
+
+	required = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	acr_array = ShmemInitStruct("Array of AuthConnRecord", required, &found);
+	if (found)
+		/* this should not happen ? */
+		elog(DEBUG1, "variable acr_array already exists");
+	/* all fileds are set to 0 */
+	memset(acr_array, 0, required);
 }
 
 /*
@@ -53,6 +221,11 @@ auth_delay_checks(Port *port, int status)
 void
 _PG_init(void)
 {
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("auth_delay must be loaded via shared_preload_libraries")));
+
 	/* Define custom GUC variables */
 	DefineCustomIntVariable("auth_delay.milliseconds",
 							"Milliseconds to delay before reporting authentication failure",
@@ -66,9 +239,36 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomBoolVariable("auth_delay.exp_backoff",
+							 "Exponential backoff for failed connections, per remote host",
+							 NULL,
+							 &auth_delay_exp_backoff,
+							 false,
+							 PGC_SIGHUP,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	DefineCustomIntVariable("auth_delay.max_seconds",
+							"Maximum seconds to wait when login fails during exponential backoff",
+							NULL,
+							&auth_delay_max_seconds,
+							10,
+							0, INT_MAX,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL, NULL, NULL);
+
 	MarkGUCPrefixReserved("auth_delay");
 
 	/* Install Hooks */
 	original_client_auth_hook = ClientAuthentication_hook;
 	ClientAuthentication_hook = auth_delay_checks;
+
+	/* Set up shared memory */
+	shmem_request_next = shmem_request_hook;
+	shmem_request_hook = auth_delay_shmem_request;
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = auth_delay_shmem_startup;
 }
diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml
index 0571f2a99d..2a6efad851 100644
--- a/doc/src/sgml/auth-delay.sgml
+++ b/doc/src/sgml/auth-delay.sgml
@@ -16,6 +16,17 @@
   connection slots.
  </para>
 
+ <para>
+  It is optionally possible to let <filename>auth_delay</filename> wait longer
+  for each successive authentication failure from a particular remote host, if
+  the configuration parameter <varname>auth_delay.exp_backoff</varname> is
+  active.  Once an authentication succeeded from a remote host, the
+  authentication delay is reset to the value of
+  <varname>auth_delay.milliseconds</varname> for this host.  The parmaeter
+  <primary><varname>auth_delay.max_seconds</varname> sets an upper bound for
+  the delay in this case.
+ </para>
+
  <para>
   In order to function, this module must be loaded via
   <xref linkend="guc-shared-preload-libraries"/> in <filename>postgresql.conf</filename>.
@@ -39,6 +50,34 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.exp_backoff</varname> (<type>bool</type>)
+     <indexterm>
+      <primary><varname>auth_delay.exp_backoff</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Whether to use exponential backoff per remote host on authentication
+      failure.  The default is off.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.max_seconds</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>auth_delay.max_seconds</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      How many seconds to wait at most if exponential backoff is active.
+      Setting this parameter to 0 disables it.  The default is 10 seconds.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
@@ -51,6 +90,8 @@
 shared_preload_libraries = 'auth_delay'
 
 auth_delay.milliseconds = '500'
+auth_delay.exp_backoff = 'on'
+auth_delay.max_seconds = '20'
 </programlisting>
  </sect2>
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e37ef9aa76..9b62945f28 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -164,6 +164,7 @@ AttrMap
 AttrMissing
 AttrNumber
 AttributeOpts
+AuthConnRecord
 AuthRequest
 AuthToken
 AutoPrewarmSharedState
-- 
2.39.2

#2Michael Banck
mbanck@gmx.net
In reply to: Michael Banck (#1)
1 attachment(s)
Re: [PATCH] Exponential backoff for auth_delay

Hi,

On Wed, Dec 27, 2023 at 05:19:54PM +0100, Michael Banck wrote:

This patch adds exponential backoff so that one can choose a small
initial value which gets doubled for each failed authentication attempt
until a maximum wait time (which is 10s by default, but can be disabled
if so desired).

Here is a new version, hopefully fixing warnings in the documentation
build, per cfbot.

Michael

Attachments:

v2-0001-Add-optional-exponential-backoff-to-auth_delay-co.patchtext/x-diff; charset=utf-8Download
From 579f6ce8f968464af06a4695b7a3b66ee94716c8 Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Wed, 27 Dec 2023 15:55:39 +0100
Subject: [PATCH v2] Add optional exponential backoff to auth_delay contrib
 module.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds two new GUCs for auth_delay, exp_backoff and max_seconds. The former
controls whether exponential backoff should be used or not, the latter sets an
maximum delay (default is 10s) in case exponential backoff is active.

The exponential backoff is tracked per remote host and doubled for every failed
login attempt (i.e., wrong password, not just missing pg_hba line or database)
and reset to auth_delay.milliseconds after a successful authentication from
that host.

This patch is partly based on a larger (but ultimately rejected) patch by
成之焕.

Authors: Michael Banck, 成之焕
Discussion: https://postgr.es/m/AHwAxACqIwIVOEhs5YejpqoG.1.1668569845751.Hmail.zhcheng@ceresdata.com
---
 contrib/auth_delay/auth_delay.c  | 202 ++++++++++++++++++++++++++++++-
 doc/src/sgml/auth-delay.sgml     |  41 +++++++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 243 insertions(+), 1 deletion(-)

diff --git a/contrib/auth_delay/auth_delay.c b/contrib/auth_delay/auth_delay.c
index 8d6e4d2778..95e56db6ec 100644
--- a/contrib/auth_delay/auth_delay.c
+++ b/contrib/auth_delay/auth_delay.c
@@ -14,24 +14,50 @@
 #include <limits.h>
 
 #include "libpq/auth.h"
+#include "miscadmin.h"
 #include "port.h"
+#include "storage/ipc.h"
+#include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
 
+#define MAX_CONN_RECORDS 50
+
 /* GUC Variables */
 static int	auth_delay_milliseconds = 0;
+static bool auth_delay_exp_backoff = false;
+static int	auth_delay_max_seconds = 0;
 
 /* Original Hook */
 static ClientAuthentication_hook_type original_client_auth_hook = NULL;
 
+typedef struct AuthConnRecord
+{
+	char		remote_host[NI_MAXHOST];
+	bool		used;
+	double		sleep_time;		/* in milliseconds */
+} AuthConnRecord;
+
+static shmem_startup_hook_type shmem_startup_next = NULL;
+static shmem_request_hook_type shmem_request_next = NULL;
+static AuthConnRecord *acr_array = NULL;
+
+static AuthConnRecord *find_conn_record(char *remote_host, int *free_index);
+static double record_failed_conn_auth(Port *port);
+static double find_conn_max_delay(void);
+static void record_conn_failure(AuthConnRecord *acr);
+static void cleanup_conn_record(Port *port);
+
 /*
  * Check authentication
  */
 static void
 auth_delay_checks(Port *port, int status)
 {
+	double		delay;
+
 	/*
 	 * Any other plugins which use ClientAuthentication_hook.
 	 */
@@ -43,8 +69,150 @@ auth_delay_checks(Port *port, int status)
 	 */
 	if (status != STATUS_OK)
 	{
-		pg_usleep(1000L * auth_delay_milliseconds);
+		if (auth_delay_exp_backoff)
+		{
+			/*
+			 * Exponential backoff per remote host.
+			 */
+			delay = record_failed_conn_auth(port);
+			if (auth_delay_max_seconds > 0)
+				delay = Min(delay, 1000L * auth_delay_max_seconds);
+		}
+		else
+			delay = auth_delay_milliseconds;
+		if (delay > 0)
+		{
+			elog(DEBUG1, "Authentication delayed for %g seconds", delay / 1000.0);
+			pg_usleep(1000L * (long) delay);
+		}
+	}
+	else
+	{
+		cleanup_conn_record(port);
+	}
+}
+
+static double
+record_failed_conn_auth(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+	int			j = -1;
+
+	acr = find_conn_record(port->remote_host, &j);
+
+	if (!acr)
+	{
+		if (j == -1)
+
+			/*
+			 * No free space, MAX_CONN_RECORDS reached. Wait as long as the
+			 * largest delay for any remote host.
+			 */
+			return find_conn_max_delay();
+		acr = &acr_array[j];
+		strcpy(acr->remote_host, port->remote_host);
+		acr->used = true;
+		elog(DEBUG1, "new connection: %s, index: %d", acr->remote_host, j);
+	}
+
+	record_conn_failure(acr);
+	return acr->sleep_time;
+}
+
+static AuthConnRecord *
+find_conn_record(char *remote_host, int *free_index)
+{
+	int			i;
+
+	*free_index = -1;
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (!acr_array[i].used)
+		{
+			if (*free_index == -1)
+				/* record unused element */
+				*free_index = i;
+			continue;
+		}
+		if (strcmp(acr_array[i].remote_host, remote_host) == 0)
+			return &acr_array[i];
+	}
+
+	return NULL;
+}
+
+static double
+find_conn_max_delay(void)
+{
+	int			i;
+	double		max_delay = 0.0;
+
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (acr_array[i].used && acr_array[i].sleep_time > max_delay)
+			max_delay = acr_array[i].sleep_time;
 	}
+
+	return max_delay;
+}
+
+static void
+record_conn_failure(AuthConnRecord *acr)
+{
+	if (acr->sleep_time == 0)
+		acr->sleep_time = (double) auth_delay_milliseconds;
+	else
+		acr->sleep_time *= 2;
+}
+
+static void
+cleanup_conn_record(Port *port)
+{
+	int			free_index;
+	AuthConnRecord *acr = NULL;
+
+	acr = find_conn_record(port->remote_host, &free_index);
+	if (acr == NULL)
+		return;
+
+	acr->used = false;
+	acr->sleep_time = 0.0;
+}
+
+/*
+ * Set up shared memory
+ */
+
+static void
+auth_delay_shmem_request(void)
+{
+	Size		required;
+
+	if (shmem_request_next)
+		shmem_request_next();
+
+	required = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	required += sizeof(int);
+	RequestAddinShmemSpace(required);
+}
+
+static void
+auth_delay_shmem_startup(void)
+{
+	Size		required;
+	bool		found;
+
+	if (shmem_startup_next)
+		shmem_startup_next();
+
+	required = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	acr_array = ShmemInitStruct("Array of AuthConnRecord", required, &found);
+	if (found)
+		/* this should not happen ? */
+		elog(DEBUG1, "variable acr_array already exists");
+	/* all fileds are set to 0 */
+	memset(acr_array, 0, required);
 }
 
 /*
@@ -53,6 +221,11 @@ auth_delay_checks(Port *port, int status)
 void
 _PG_init(void)
 {
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("auth_delay must be loaded via shared_preload_libraries")));
+
 	/* Define custom GUC variables */
 	DefineCustomIntVariable("auth_delay.milliseconds",
 							"Milliseconds to delay before reporting authentication failure",
@@ -66,9 +239,36 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomBoolVariable("auth_delay.exp_backoff",
+							 "Exponential backoff for failed connections, per remote host",
+							 NULL,
+							 &auth_delay_exp_backoff,
+							 false,
+							 PGC_SIGHUP,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	DefineCustomIntVariable("auth_delay.max_seconds",
+							"Maximum seconds to wait when login fails during exponential backoff",
+							NULL,
+							&auth_delay_max_seconds,
+							10,
+							0, INT_MAX,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL, NULL, NULL);
+
 	MarkGUCPrefixReserved("auth_delay");
 
 	/* Install Hooks */
 	original_client_auth_hook = ClientAuthentication_hook;
 	ClientAuthentication_hook = auth_delay_checks;
+
+	/* Set up shared memory */
+	shmem_request_next = shmem_request_hook;
+	shmem_request_hook = auth_delay_shmem_request;
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = auth_delay_shmem_startup;
 }
diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml
index 0571f2a99d..2ca9528011 100644
--- a/doc/src/sgml/auth-delay.sgml
+++ b/doc/src/sgml/auth-delay.sgml
@@ -16,6 +16,17 @@
   connection slots.
  </para>
 
+ <para>
+  It is optionally possible to let <filename>auth_delay</filename> wait longer
+  for each successive authentication failure from a particular remote host, if
+  the configuration parameter <varname>auth_delay.exp_backoff</varname> is
+  active.  Once an authentication succeeded from a remote host, the
+  authentication delay is reset to the value of
+  <varname>auth_delay.milliseconds</varname> for this host.  The parameter
+  <varname>auth_delay.max_seconds</varname> sets an upper bound for the delay
+  in this case.
+ </para>
+
  <para>
   In order to function, this module must be loaded via
   <xref linkend="guc-shared-preload-libraries"/> in <filename>postgresql.conf</filename>.
@@ -39,6 +50,34 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.exp_backoff</varname> (<type>bool</type>)
+     <indexterm>
+      <primary><varname>auth_delay.exp_backoff</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Whether to use exponential backoff per remote host on authentication
+      failure.  The default is off.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.max_seconds</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>auth_delay.max_seconds</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      How many seconds to wait at most if exponential backoff is active.
+      Setting this parameter to 0 disables it.  The default is 10 seconds.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
@@ -51,6 +90,8 @@
 shared_preload_libraries = 'auth_delay'
 
 auth_delay.milliseconds = '500'
+auth_delay.exp_backoff = 'on'
+auth_delay.max_seconds = '20'
 </programlisting>
  </sect2>
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e37ef9aa76..9b62945f28 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -164,6 +164,7 @@ AttrMap
 AttrMissing
 AttrNumber
 AttributeOpts
+AuthConnRecord
 AuthRequest
 AuthToken
 AutoPrewarmSharedState
-- 
2.39.2

#3Abhijit Menon-Sen
ams@toroid.org
In reply to: Michael Banck (#2)
Re: [PATCH] Exponential backoff for auth_delay

At 2024-01-04 08:30:36 +0100, mbanck@gmx.net wrote:

+typedef struct AuthConnRecord
+{
+	char		remote_host[NI_MAXHOST];
+	bool		used;
+	double		sleep_time;		/* in milliseconds */
+} AuthConnRecord;

Do we really need a "used" field here? You could avoid it by setting
remote_host[0] = '\0' in cleanup_conn_record.

static void
auth_delay_checks(Port *port, int status)
{
+ double delay;

I would just initialise this to auth_delay_milliseconds here, instead of
the harder-to-notice "else" below.

@@ -43,8 +69,150 @@ auth_delay_checks(Port *port, int status)
*/
if (status != STATUS_OK)
{
-		pg_usleep(1000L * auth_delay_milliseconds);
+		if (auth_delay_exp_backoff)
+		{
+			/*
+			 * Exponential backoff per remote host.
+			 */
+			delay = record_failed_conn_auth(port);
+			if (auth_delay_max_seconds > 0)
+				delay = Min(delay, 1000L * auth_delay_max_seconds);
+		}

I would make this comment more specific, maybe something like "Delay by
2^n seconds after each authentication failure from a particular host,
where n is the number of consecutive authentication failures".

It's slightly discordant to see the new delay being returned by a
function named "record_<something>" (rather than "calculate_delay" or
similar). Maybe a name like "backoff_after_failed_auth" would be better?
Or "increase_delay_on_auth_fail"?

+static double
+record_failed_conn_auth(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+	int			j = -1;
+
+	acr = find_conn_record(port->remote_host, &j);
+
+	if (!acr)
+	{
+		if (j == -1)
+
+			/*
+			 * No free space, MAX_CONN_RECORDS reached. Wait as long as the
+			 * largest delay for any remote host.
+			 */
+			return find_conn_max_delay();

In this extraordinary situation (where there are lots of hosts with lots
of authentication failures), why not delay by auth_delay_max_seconds
straightaway, especially when the default is only 10s? I don't see much
point in coordinating the delay between fifty known hosts and an unknown
number of others.

+ elog(DEBUG1, "new connection: %s, index: %d", acr->remote_host, j);

I think this should be removed, but if you want to leave it in, the
message should be more specific about what this is actually about, and
where the message is from, so as to not confuse debug-log readers.

(The same applies to the other debug messages.)

+static AuthConnRecord *
+find_conn_record(char *remote_host, int *free_index)
+{
+	int			i;
+
+	*free_index = -1;
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (!acr_array[i].used)
+		{
+			if (*free_index == -1)
+				/* record unused element */
+				*free_index = i;
+			continue;
+		}
+		if (strcmp(acr_array[i].remote_host, remote_host) == 0)
+			return &acr_array[i];
+	}
+
+	return NULL;
+}

It's not a big deal, but to be honest, I would much prefer to (get rid
of used, as suggested earlier, and) have separate find_acr_for_host()
and find_free_acr() functions.

+static void
+record_conn_failure(AuthConnRecord *acr)
+{
+	if (acr->sleep_time == 0)
+		acr->sleep_time = (double) auth_delay_milliseconds;
+	else
+		acr->sleep_time *= 2;
+}

I would just roll this function into record_failed_conn_auth (or
whatever it's named), but if you want to keep it a separate function, it
should at least have a name that's sufficiently different from
record_failed_conn_auth.

In terms of the logic, it would have been slightly clearer to store the
number of failures and calculate the delay, but it's easier to double
the sleep time that way you've written it. I think it's fine.

It's worth noting that there is no time-based reset of the delay with
this feature, which I think is something that people might expect to go
hand-in-hand with exponential backoff. I think that's probably OK too.

+static void
+auth_delay_shmem_startup(void)
+{
+	Size		required;
+	bool		found;
+
+	if (shmem_startup_next)
+		shmem_startup_next();
+
+	required = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	acr_array = ShmemInitStruct("Array of AuthConnRecord", required, &found);
+	if (found)
+		/* this should not happen ? */
+		elog(DEBUG1, "variable acr_array already exists");
+	/* all fileds are set to 0 */
+	memset(acr_array, 0, required);
}

I think you can remove the elog and just do the memset if (!found). Also
if you're changing it anyway, I'd suggest something like "total_size"
instead of "required".

+	DefineCustomBoolVariable("auth_delay.exp_backoff",
+							 "Exponential backoff for failed connections, per remote host",
+							 NULL,
+							 &auth_delay_exp_backoff,
+							 false,
+							 PGC_SIGHUP,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);

Maybe "Double the delay after each authentication failure from a
particular host". (Note: authentication failed, not connection.)

I would also mildly prefer to spell out "exponential_backoff" (but leave
auth_delay_exp_backoff as-is).

+	DefineCustomIntVariable("auth_delay.max_seconds",
+							"Maximum seconds to wait when login fails during exponential backoff",
+							NULL,
+							&auth_delay_max_seconds,
+							10,
+							0, INT_MAX,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL, NULL, NULL);
+

Maybe just "Maximum delay when exponential backoff is enabled".

(Parameter indentation doesn't match the earlier block.)

I'm not able to make up my mind if I think 10s is a good default or not.
In practice, it means that after the first three consecutive failures,
we'll delay by 10s for every subsequent failure. That sounds OK. But is
is much more useful than, for example, delaying the first three failures
by auth_delay_milliseconds and then jumping straight to max_seconds?

I can't really imagine wanting to increase max_seconds to, say, 128 and
keep a bunch of backends sleeping while someone's trying to brute-force
a password. And with a reasonably short max_seconds, I'm not sure if
having the backoff be _exponential_ is particularly important.

Or maybe because this is a contrib module, we don't have to think about
it to that extent?

diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml
index 0571f2a99d..2ca9528011 100644
--- a/doc/src/sgml/auth-delay.sgml
+++ b/doc/src/sgml/auth-delay.sgml
@@ -16,6 +16,17 @@
connection slots.
</para>
+ <para>
+  It is optionally possible to let <filename>auth_delay</filename> wait longer
+  for each successive authentication failure from a particular remote host, if
+  the configuration parameter <varname>auth_delay.exp_backoff</varname> is
+  active.  Once an authentication succeeded from a remote host, the
+  authentication delay is reset to the value of
+  <varname>auth_delay.milliseconds</varname> for this host.  The parameter
+  <varname>auth_delay.max_seconds</varname> sets an upper bound for the delay
+  in this case.
+ </para>

How about something like this…

If you enable exponential_backoff, auth_delay will double the delay
after each consecutive authentication failure from a particular
host, up to the given max_seconds (default: 10s). If the host
authenticates successfully, the delay is reset.

+   <varlistentry>
+    <term>
+     <varname>auth_delay.max_seconds</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>auth_delay.max_seconds</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      How many seconds to wait at most if exponential backoff is active.
+      Setting this parameter to 0 disables it.  The default is 10 seconds.
+     </para>
+    </listitem>
+   </varlistentry>

I suggest "The maximum delay, in seconds, when exponential backoff is
enabled."

-- Abhijit

#4Michael Banck
mbanck@gmx.net
In reply to: Abhijit Menon-Sen (#3)
1 attachment(s)
Re: [PATCH] Exponential backoff for auth_delay

Hi,

many thanks for the review!

I went through your comments (a lot of which pertained to the original
larger patch I took code from), attached is a reworked version 2.

Other changes:

1. Ignore STATUS_EOF, this led to auth_delay being applied twice (maybe
due to the gss/kerberos auth psql is trying by default? Is that legit
and should this change be reverted?) - i.e. handle STATUS_OK and
STATUS_ERROR explicitly.

2. Guard ShmemInitStruct() with LWLockAcquire(AddinShmemInitLock,
LW_EXCLUSIVE) / LWLockRelease(AddinShmemInitLock), as is done in
pg_prewarm and pg_stat_statements as well.

3. Added an additional paragraph discussing the value of
auth_delay.milliseconds when auth_delay.exponential_backoff is on, see
below.

I wonder whether maybe auth_delay.max_seconds should either be renamed
to auth_delay.exponential_backoff_max_seconds (but then it is rather
long) in order to make it clearer it only applies in that context or
alternatively just apply to auth_delay.milliseconds as well (though that
would be somewhat weird).

Further comments to your comments:

On Tue, Jan 16, 2024 at 12:00:49PM +0530, Abhijit Menon-Sen wrote:

At 2024-01-04 08:30:36 +0100, mbanck@gmx.net wrote:

+typedef struct AuthConnRecord
+{
+	char		remote_host[NI_MAXHOST];
+	bool		used;
+	double		sleep_time;		/* in milliseconds */
+} AuthConnRecord;

Do we really need a "used" field here? You could avoid it by setting
remote_host[0] = '\0' in cleanup_conn_record.

Ok, removed.

static void
auth_delay_checks(Port *port, int status)
{
+ double delay;

I would just initialise this to auth_delay_milliseconds here, instead of
the harder-to-notice "else" below.

Done.

@@ -43,8 +69,150 @@ auth_delay_checks(Port *port, int status)
*/
if (status != STATUS_OK)
{
-		pg_usleep(1000L * auth_delay_milliseconds);
+		if (auth_delay_exp_backoff)
+		{
+			/*
+			 * Exponential backoff per remote host.
+			 */
+			delay = record_failed_conn_auth(port);
+			if (auth_delay_max_seconds > 0)
+				delay = Min(delay, 1000L * auth_delay_max_seconds);
+		}

I would make this comment more specific, maybe something like "Delay by
2^n seconds after each authentication failure from a particular host,
where n is the number of consecutive authentication failures".

Done.

It's slightly discordant to see the new delay being returned by a
function named "record_<something>" (rather than "calculate_delay" or
similar). Maybe a name like "backoff_after_failed_auth" would be better?
Or "increase_delay_on_auth_fail"?

I called it increase_delay_after_failed_conn_auth() now.

+static double
+record_failed_conn_auth(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+	int			j = -1;
+
+	acr = find_conn_record(port->remote_host, &j);
+
+	if (!acr)
+	{
+		if (j == -1)
+
+			/*
+			 * No free space, MAX_CONN_RECORDS reached. Wait as long as the
+			 * largest delay for any remote host.
+			 */
+			return find_conn_max_delay();

In this extraordinary situation (where there are lots of hosts with lots
of authentication failures), why not delay by auth_delay_max_seconds
straightaway, especially when the default is only 10s? I don't see much
point in coordinating the delay between fifty known hosts and an unknown
number of others.

I was a bit worried about legitimate users suffering here if (for some
reason) a lot of different hosts try to guess passwords, but only once
or twice or something. But I have changed it now as you suggested as
that makes it simpler and I guess the problem I mentioned above is
rather contrived.

+ elog(DEBUG1, "new connection: %s, index: %d", acr->remote_host, j);

I think this should be removed, but if you want to leave it in, the
message should be more specific about what this is actually about, and
where the message is from, so as to not confuse debug-log readers.

I left it in but mentioned auth_delay in it now. I wonder though whether
this might be a useful message to have at some more standard level like
INFO?

(The same applies to the other debug messages.)

Those are all gone.

+static AuthConnRecord *
+find_conn_record(char *remote_host, int *free_index)
+{
+	int			i;
+
+	*free_index = -1;
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (!acr_array[i].used)
+		{
+			if (*free_index == -1)
+				/* record unused element */
+				*free_index = i;
+			continue;
+		}
+		if (strcmp(acr_array[i].remote_host, remote_host) == 0)
+			return &acr_array[i];
+	}
+
+	return NULL;
+}

It's not a big deal, but to be honest, I would much prefer to (get rid
of used, as suggested earlier, and) have separate find_acr_for_host()
and find_free_acr() functions.

Done.

+static void
+record_conn_failure(AuthConnRecord *acr)
+{
+	if (acr->sleep_time == 0)
+		acr->sleep_time = (double) auth_delay_milliseconds;
+	else
+		acr->sleep_time *= 2;
+}

I would just roll this function into record_failed_conn_auth (or
whatever it's named),

Done.

In terms of the logic, it would have been slightly clearer to store the
number of failures and calculate the delay, but it's easier to double
the sleep time that way you've written it. I think it's fine.

I kept it as-is for now.

It's worth noting that there is no time-based reset of the delay with
this feature, which I think is something that people might expect to go
hand-in-hand with exponential backoff. I think that's probably OK too.

You mean something like "after 5 minutes, reset the delay to 0 again"? I
agree that this would probably be useful, but would also make the change
more complex.

+static void
+auth_delay_shmem_startup(void)
+{
+	Size		required;
+	bool		found;
+
+	if (shmem_startup_next)
+		shmem_startup_next();
+
+	required = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	acr_array = ShmemInitStruct("Array of AuthConnRecord", required, &found);
+	if (found)
+		/* this should not happen ? */
+		elog(DEBUG1, "variable acr_array already exists");
+	/* all fileds are set to 0 */
+	memset(acr_array, 0, required);
}

I think you can remove the elog and just do the memset if (!found). Also
if you're changing it anyway, I'd suggest something like "total_size"
instead of "required".

Done.

+	DefineCustomBoolVariable("auth_delay.exp_backoff",
+							 "Exponential backoff for failed connections, per remote host",
+							 NULL,
+							 &auth_delay_exp_backoff,
+							 false,
+							 PGC_SIGHUP,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);

Maybe "Double the delay after each authentication failure from a
particular host". (Note: authentication failed, not connection.)

Done.

I would also mildly prefer to spell out "exponential_backoff" (but leave
auth_delay_exp_backoff as-is).

Done.

+	DefineCustomIntVariable("auth_delay.max_seconds",
+							"Maximum seconds to wait when login fails during exponential backoff",
+							NULL,
+							&auth_delay_max_seconds,
+							10,
+							0, INT_MAX,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL, NULL, NULL);
+

Maybe just "Maximum delay when exponential backoff is enabled".

Done.

(Parameter indentation doesn't match the earlier block.)

I noticed that as well, but pgindent keeps changing it back to this, not
sure why...

I'm not able to make up my mind if I think 10s is a good default or not.
In practice, it means that after the first three consecutive failures,
we'll delay by 10s for every subsequent failure. That sounds OK. But is
is much more useful than, for example, delaying the first three failures
by auth_delay_milliseconds and then jumping straight to max_seconds?

What I had in mind is that admins would lower auth_delay.milliseconds to
something like 100 or 125 when exponential_backoff is on, so that the
first few (possibley honest) auth failures do not get an annoying 1
seconds penalty, but later ones then wil. In that case, 10 seconds is
probably ok cause you'd need more than a handful of auth failures.

I added a paragraph to the documentation to this end.

I can't really imagine wanting to increase max_seconds to, say, 128 and
keep a bunch of backends sleeping while someone's trying to brute-force
a password. And with a reasonably short max_seconds, I'm not sure if
having the backoff be _exponential_ is particularly important.

Or maybe because this is a contrib module, we don't have to think about
it to that extent?

Well, not sure. I think something like 10 seconds should be fine for
most brute-force attacks in practise, and it is configurable (and turned
off by default).

diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml
index 0571f2a99d..2ca9528011 100644
--- a/doc/src/sgml/auth-delay.sgml
+++ b/doc/src/sgml/auth-delay.sgml
@@ -16,6 +16,17 @@
connection slots.
</para>
+ <para>
+  It is optionally possible to let <filename>auth_delay</filename> wait longer
+  for each successive authentication failure from a particular remote host, if
+  the configuration parameter <varname>auth_delay.exp_backoff</varname> is
+  active.  Once an authentication succeeded from a remote host, the
+  authentication delay is reset to the value of
+  <varname>auth_delay.milliseconds</varname> for this host.  The parameter
+  <varname>auth_delay.max_seconds</varname> sets an upper bound for the delay
+  in this case.
+ </para>

How about something like this…

If you enable exponential_backoff, auth_delay will double the delay
after each consecutive authentication failure from a particular
host, up to the given max_seconds (default: 10s). If the host
authenticates successfully, the delay is reset.

Done, mostly.

+   <varlistentry>
+    <term>
+     <varname>auth_delay.max_seconds</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>auth_delay.max_seconds</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      How many seconds to wait at most if exponential backoff is active.
+      Setting this parameter to 0 disables it.  The default is 10 seconds.
+     </para>
+    </listitem>
+   </varlistentry>

I suggest "The maximum delay, in seconds, when exponential backoff is
enabled."

Done.

Michael

Attachments:

v2-0001-Add-optional-exponential-backoff-to-auth_delay-co.patchtext/x-diff; charset=utf-8Download
From e60cdca73d384fa467f8e6becdd85d9a0c530b60 Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Wed, 27 Dec 2023 15:55:39 +0100
Subject: [PATCH v2] Add optional exponential backoff to auth_delay contrib
 module.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds two new GUCs for auth_delay, exponential_backoff and max_seconds. The
former controls whether exponential backoff should be used or not, the latter
sets an maximum delay (default is 10s) in case exponential backoff is active.

The exponential backoff is tracked per remote host and doubled for every failed
login attempt (i.e., wrong password, not just missing pg_hba line or database)
and reset to auth_delay.milliseconds after a successful authentication from
that host.

This patch is partly based on a larger (but ultimately rejected) patch by
成之焕.

Authors: Michael Banck, 成之焕
Reviewed-by: Abhijit Menon-Sen
Discussion: https://postgr.es/m/AHwAxACqIwIVOEhs5YejpqoG.1.1668569845751.Hmail.zhcheng@ceresdata.com
---
 contrib/auth_delay/auth_delay.c  | 197 ++++++++++++++++++++++++++++++-
 doc/src/sgml/auth-delay.sgml     |  48 +++++++-
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 243 insertions(+), 3 deletions(-)

diff --git a/contrib/auth_delay/auth_delay.c b/contrib/auth_delay/auth_delay.c
index ff0e1fd461..06d2f5f280 100644
--- a/contrib/auth_delay/auth_delay.c
+++ b/contrib/auth_delay/auth_delay.c
@@ -14,24 +14,49 @@
 #include <limits.h>
 
 #include "libpq/auth.h"
+#include "miscadmin.h"
 #include "port.h"
+#include "storage/ipc.h"
+#include "storage/lwlock.h"
+#include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
 
+#define MAX_CONN_RECORDS 50
+
 /* GUC Variables */
 static int	auth_delay_milliseconds = 0;
+static bool auth_delay_exp_backoff = false;
+static int	auth_delay_max_seconds = 0;
 
 /* Original Hook */
 static ClientAuthentication_hook_type original_client_auth_hook = NULL;
 
+typedef struct AuthConnRecord
+{
+	char		remote_host[NI_MAXHOST];
+	double		sleep_time;		/* in milliseconds */
+} AuthConnRecord;
+
+static shmem_startup_hook_type shmem_startup_next = NULL;
+static shmem_request_hook_type shmem_request_next = NULL;
+static AuthConnRecord *acr_array = NULL;
+
+static AuthConnRecord *find_acr_for_host(char *remote_host);
+static AuthConnRecord *find_free_acr(void);
+static double increase_delay_after_failed_conn_auth(Port *port);
+static void cleanup_conn_record(Port *port);
+
 /*
  * Check authentication
  */
 static void
 auth_delay_checks(Port *port, int status)
 {
+	double		delay = auth_delay_milliseconds;
+
 	/*
 	 * Any other plugins which use ClientAuthentication_hook.
 	 */
@@ -41,10 +66,146 @@ auth_delay_checks(Port *port, int status)
 	/*
 	 * Inject a short delay if authentication failed.
 	 */
-	if (status != STATUS_OK)
+	if (status == STATUS_ERROR)
 	{
-		pg_usleep(1000L * auth_delay_milliseconds);
+		if (auth_delay_exp_backoff)
+		{
+			/*
+			 * Delay by 2^n seconds after each authentication failure from a
+			 * particular host, where n is the number of consecutive
+			 * authentication failures.
+			 */
+			delay = increase_delay_after_failed_conn_auth(port);
+
+			/*
+			 * Clamp delay to a maximum of auth_delay_max_seconds.
+			 */
+			if (auth_delay_max_seconds > 0) {
+				delay = Min(delay, 1000L * auth_delay_max_seconds);
+			}
+		}
+
+		if (delay > 0)
+		{
+			elog(DEBUG1, "Authentication delayed for %g seconds due to auth_delay", delay / 1000.0);
+			pg_usleep(1000L * (long) delay);
+		}
 	}
+
+	/*
+	 * Remove host-specific delay if authentication succeeded.
+	 */
+	if (status == STATUS_OK)
+		cleanup_conn_record(port);
+}
+
+static double
+increase_delay_after_failed_conn_auth(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+
+	acr = find_acr_for_host(port->remote_host);
+
+	if (!acr)
+	{
+		acr = find_free_acr();
+
+		if (!acr)
+		{
+			/*
+			 * No free space, MAX_CONN_RECORDS reached. Wait for the
+			 * configured maximum amount.
+			 */
+			return 1000L * auth_delay_max_seconds;
+		}
+		strcpy(acr->remote_host, port->remote_host);
+	}
+	if (acr->sleep_time == 0)
+		acr->sleep_time = (double) auth_delay_milliseconds;
+	else
+		acr->sleep_time *= 2;
+
+	return acr->sleep_time;
+}
+
+static AuthConnRecord *
+find_acr_for_host(char *remote_host)
+{
+	int			i;
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (strcmp(acr_array[i].remote_host, remote_host) == 0)
+			return &acr_array[i];
+	}
+
+	return NULL;
+}
+
+static AuthConnRecord *
+find_free_acr(void)
+{
+	int			i;
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (!acr_array[i].remote_host[0])
+			return &acr_array[i];
+	}
+
+	return 0;
+}
+
+static void
+cleanup_conn_record(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+
+	acr = find_acr_for_host(port->remote_host);
+	if (acr == NULL)
+		return;
+
+	port->remote_host[0] = '\0';
+
+	acr->sleep_time = 0.0;
+}
+
+/*
+ * Set up shared memory
+ */
+
+static void
+auth_delay_shmem_request(void)
+{
+	Size		shm_size;
+
+	if (shmem_request_next)
+		shmem_request_next();
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	shm_size += sizeof(int);
+	RequestAddinShmemSpace(shm_size);
+}
+
+static void
+auth_delay_shmem_startup(void)
+{
+	bool		found;
+	Size		shm_size;
+
+	if (shmem_startup_next)
+		shmem_startup_next();
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	acr_array = ShmemInitStruct("Array of AuthConnRecord", shm_size, &found);
+	if (!found)
+	{
+		/* First time through ... */
+		memset(acr_array, 0, shm_size);
+	}
+	LWLockRelease(AddinShmemInitLock);
 }
 
 /*
@@ -53,6 +214,11 @@ auth_delay_checks(Port *port, int status)
 void
 _PG_init(void)
 {
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("auth_delay must be loaded via shared_preload_libraries")));
+
 	/* Define custom GUC variables */
 	DefineCustomIntVariable("auth_delay.milliseconds",
 							"Milliseconds to delay before reporting authentication failure",
@@ -66,9 +232,36 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomBoolVariable("auth_delay.exponential_backoff",
+							 "Double the delay after each authentication failure from a particular host",
+							 NULL,
+							 &auth_delay_exp_backoff,
+							 false,
+							 PGC_SIGHUP,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	DefineCustomIntVariable("auth_delay.max_seconds",
+							"Maximum delay when exponential backoff is enabled",
+							NULL,
+							&auth_delay_max_seconds,
+							10,
+							0, INT_MAX,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL, NULL, NULL);
+
 	MarkGUCPrefixReserved("auth_delay");
 
 	/* Install Hooks */
 	original_client_auth_hook = ClientAuthentication_hook;
 	ClientAuthentication_hook = auth_delay_checks;
+
+	/* Set up shared memory */
+	shmem_request_next = shmem_request_hook;
+	shmem_request_hook = auth_delay_shmem_request;
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = auth_delay_shmem_startup;
 }
diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml
index 0571f2a99d..89585ea3c8 100644
--- a/doc/src/sgml/auth-delay.sgml
+++ b/doc/src/sgml/auth-delay.sgml
@@ -16,6 +16,22 @@
   connection slots.
  </para>
 
+ <para>
+  It is optionally possible to let <filename>auth_delay</filename> wait longer
+  on each successive authentication failure if the configuration parameter
+  <varname>auth_delay.exponential_backoff</varname> is active.  If enabled,
+  <filename>auth_delay</filename> will double the delay after each consecutive
+  authentication failure from a particular host, up to the given
+  <varname>auth_delay.max_seconds</varname> (default: 10s). If the host
+  authenticates successfully, the delay is reset.
+ </para>
+
+ <para>
+  In this case, it might be desirable to decrease the configuration parameter
+  <varname>auth_delay.milliseconds</varname> from the default of 1 second in
+  order to not delay as long for single accidental authentication failures.
+ </para>
+
  <para>
   In order to function, this module must be loaded via
   <xref linkend="guc-shared-preload-libraries"/> in <filename>postgresql.conf</filename>.
@@ -39,6 +55,34 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.exp_backoff</varname> (<type>bool</type>)
+     <indexterm>
+      <primary><varname>auth_delay.exp_backoff</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Whether to use exponential backoff per remote host on authentication
+      failure.  The default is off.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.max_seconds</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>auth_delay.max_seconds</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The maximum delay, in seconds, when exponential backoff is
+      enabled.  The default is 10 seconds.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
@@ -50,7 +94,9 @@
 # postgresql.conf
 shared_preload_libraries = 'auth_delay'
 
-auth_delay.milliseconds = '500'
+auth_delay.milliseconds = '125'
+auth_delay.exp_backoff = 'on'
+auth_delay.max_seconds = '20'
 </programlisting>
  </sect2>
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 29fd1cae64..ee30969f0c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -165,6 +165,7 @@ AttrMap
 AttrMissing
 AttrNumber
 AttributeOpts
+AuthConnRecord
 AuthRequest
 AuthToken
 AutoPrewarmSharedState
-- 
2.39.2

#5Michael Banck
mbanck@gmx.net
In reply to: Michael Banck (#4)
1 attachment(s)
Re: [PATCH] Exponential backoff for auth_delay

On Fri, Jan 19, 2024 at 03:08:36PM +0100, Michael Banck wrote:

I went through your comments (a lot of which pertained to the original
larger patch I took code from), attached is a reworked version 2.

Oops, we are supposed to be at version 3, attached.

Michael

Attachments:

v3-0001-Add-optional-exponential-backoff-to-auth_delay-co.patchtext/x-diff; charset=utf-8Download
From e60cdca73d384fa467f8e6becdd85d9a0c530b60 Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Wed, 27 Dec 2023 15:55:39 +0100
Subject: [PATCH v3] Add optional exponential backoff to auth_delay contrib
 module.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds two new GUCs for auth_delay, exponential_backoff and max_seconds. The
former controls whether exponential backoff should be used or not, the latter
sets an maximum delay (default is 10s) in case exponential backoff is active.

The exponential backoff is tracked per remote host and doubled for every failed
login attempt (i.e., wrong password, not just missing pg_hba line or database)
and reset to auth_delay.milliseconds after a successful authentication from
that host.

This patch is partly based on a larger (but ultimately rejected) patch by
成之焕.

Authors: Michael Banck, 成之焕
Reviewed-by: Abhijit Menon-Sen
Discussion: https://postgr.es/m/AHwAxACqIwIVOEhs5YejpqoG.1.1668569845751.Hmail.zhcheng@ceresdata.com
---
 contrib/auth_delay/auth_delay.c  | 197 ++++++++++++++++++++++++++++++-
 doc/src/sgml/auth-delay.sgml     |  48 +++++++-
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 243 insertions(+), 3 deletions(-)

diff --git a/contrib/auth_delay/auth_delay.c b/contrib/auth_delay/auth_delay.c
index ff0e1fd461..06d2f5f280 100644
--- a/contrib/auth_delay/auth_delay.c
+++ b/contrib/auth_delay/auth_delay.c
@@ -14,24 +14,49 @@
 #include <limits.h>
 
 #include "libpq/auth.h"
+#include "miscadmin.h"
 #include "port.h"
+#include "storage/ipc.h"
+#include "storage/lwlock.h"
+#include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
 
+#define MAX_CONN_RECORDS 50
+
 /* GUC Variables */
 static int	auth_delay_milliseconds = 0;
+static bool auth_delay_exp_backoff = false;
+static int	auth_delay_max_seconds = 0;
 
 /* Original Hook */
 static ClientAuthentication_hook_type original_client_auth_hook = NULL;
 
+typedef struct AuthConnRecord
+{
+	char		remote_host[NI_MAXHOST];
+	double		sleep_time;		/* in milliseconds */
+} AuthConnRecord;
+
+static shmem_startup_hook_type shmem_startup_next = NULL;
+static shmem_request_hook_type shmem_request_next = NULL;
+static AuthConnRecord *acr_array = NULL;
+
+static AuthConnRecord *find_acr_for_host(char *remote_host);
+static AuthConnRecord *find_free_acr(void);
+static double increase_delay_after_failed_conn_auth(Port *port);
+static void cleanup_conn_record(Port *port);
+
 /*
  * Check authentication
  */
 static void
 auth_delay_checks(Port *port, int status)
 {
+	double		delay = auth_delay_milliseconds;
+
 	/*
 	 * Any other plugins which use ClientAuthentication_hook.
 	 */
@@ -41,10 +66,146 @@ auth_delay_checks(Port *port, int status)
 	/*
 	 * Inject a short delay if authentication failed.
 	 */
-	if (status != STATUS_OK)
+	if (status == STATUS_ERROR)
 	{
-		pg_usleep(1000L * auth_delay_milliseconds);
+		if (auth_delay_exp_backoff)
+		{
+			/*
+			 * Delay by 2^n seconds after each authentication failure from a
+			 * particular host, where n is the number of consecutive
+			 * authentication failures.
+			 */
+			delay = increase_delay_after_failed_conn_auth(port);
+
+			/*
+			 * Clamp delay to a maximum of auth_delay_max_seconds.
+			 */
+			if (auth_delay_max_seconds > 0) {
+				delay = Min(delay, 1000L * auth_delay_max_seconds);
+			}
+		}
+
+		if (delay > 0)
+		{
+			elog(DEBUG1, "Authentication delayed for %g seconds due to auth_delay", delay / 1000.0);
+			pg_usleep(1000L * (long) delay);
+		}
 	}
+
+	/*
+	 * Remove host-specific delay if authentication succeeded.
+	 */
+	if (status == STATUS_OK)
+		cleanup_conn_record(port);
+}
+
+static double
+increase_delay_after_failed_conn_auth(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+
+	acr = find_acr_for_host(port->remote_host);
+
+	if (!acr)
+	{
+		acr = find_free_acr();
+
+		if (!acr)
+		{
+			/*
+			 * No free space, MAX_CONN_RECORDS reached. Wait for the
+			 * configured maximum amount.
+			 */
+			return 1000L * auth_delay_max_seconds;
+		}
+		strcpy(acr->remote_host, port->remote_host);
+	}
+	if (acr->sleep_time == 0)
+		acr->sleep_time = (double) auth_delay_milliseconds;
+	else
+		acr->sleep_time *= 2;
+
+	return acr->sleep_time;
+}
+
+static AuthConnRecord *
+find_acr_for_host(char *remote_host)
+{
+	int			i;
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (strcmp(acr_array[i].remote_host, remote_host) == 0)
+			return &acr_array[i];
+	}
+
+	return NULL;
+}
+
+static AuthConnRecord *
+find_free_acr(void)
+{
+	int			i;
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (!acr_array[i].remote_host[0])
+			return &acr_array[i];
+	}
+
+	return 0;
+}
+
+static void
+cleanup_conn_record(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+
+	acr = find_acr_for_host(port->remote_host);
+	if (acr == NULL)
+		return;
+
+	port->remote_host[0] = '\0';
+
+	acr->sleep_time = 0.0;
+}
+
+/*
+ * Set up shared memory
+ */
+
+static void
+auth_delay_shmem_request(void)
+{
+	Size		shm_size;
+
+	if (shmem_request_next)
+		shmem_request_next();
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	shm_size += sizeof(int);
+	RequestAddinShmemSpace(shm_size);
+}
+
+static void
+auth_delay_shmem_startup(void)
+{
+	bool		found;
+	Size		shm_size;
+
+	if (shmem_startup_next)
+		shmem_startup_next();
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	acr_array = ShmemInitStruct("Array of AuthConnRecord", shm_size, &found);
+	if (!found)
+	{
+		/* First time through ... */
+		memset(acr_array, 0, shm_size);
+	}
+	LWLockRelease(AddinShmemInitLock);
 }
 
 /*
@@ -53,6 +214,11 @@ auth_delay_checks(Port *port, int status)
 void
 _PG_init(void)
 {
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("auth_delay must be loaded via shared_preload_libraries")));
+
 	/* Define custom GUC variables */
 	DefineCustomIntVariable("auth_delay.milliseconds",
 							"Milliseconds to delay before reporting authentication failure",
@@ -66,9 +232,36 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomBoolVariable("auth_delay.exponential_backoff",
+							 "Double the delay after each authentication failure from a particular host",
+							 NULL,
+							 &auth_delay_exp_backoff,
+							 false,
+							 PGC_SIGHUP,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	DefineCustomIntVariable("auth_delay.max_seconds",
+							"Maximum delay when exponential backoff is enabled",
+							NULL,
+							&auth_delay_max_seconds,
+							10,
+							0, INT_MAX,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL, NULL, NULL);
+
 	MarkGUCPrefixReserved("auth_delay");
 
 	/* Install Hooks */
 	original_client_auth_hook = ClientAuthentication_hook;
 	ClientAuthentication_hook = auth_delay_checks;
+
+	/* Set up shared memory */
+	shmem_request_next = shmem_request_hook;
+	shmem_request_hook = auth_delay_shmem_request;
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = auth_delay_shmem_startup;
 }
diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml
index 0571f2a99d..89585ea3c8 100644
--- a/doc/src/sgml/auth-delay.sgml
+++ b/doc/src/sgml/auth-delay.sgml
@@ -16,6 +16,22 @@
   connection slots.
  </para>
 
+ <para>
+  It is optionally possible to let <filename>auth_delay</filename> wait longer
+  on each successive authentication failure if the configuration parameter
+  <varname>auth_delay.exponential_backoff</varname> is active.  If enabled,
+  <filename>auth_delay</filename> will double the delay after each consecutive
+  authentication failure from a particular host, up to the given
+  <varname>auth_delay.max_seconds</varname> (default: 10s). If the host
+  authenticates successfully, the delay is reset.
+ </para>
+
+ <para>
+  In this case, it might be desirable to decrease the configuration parameter
+  <varname>auth_delay.milliseconds</varname> from the default of 1 second in
+  order to not delay as long for single accidental authentication failures.
+ </para>
+
  <para>
   In order to function, this module must be loaded via
   <xref linkend="guc-shared-preload-libraries"/> in <filename>postgresql.conf</filename>.
@@ -39,6 +55,34 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.exp_backoff</varname> (<type>bool</type>)
+     <indexterm>
+      <primary><varname>auth_delay.exp_backoff</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Whether to use exponential backoff per remote host on authentication
+      failure.  The default is off.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.max_seconds</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>auth_delay.max_seconds</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The maximum delay, in seconds, when exponential backoff is
+      enabled.  The default is 10 seconds.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
@@ -50,7 +94,9 @@
 # postgresql.conf
 shared_preload_libraries = 'auth_delay'
 
-auth_delay.milliseconds = '500'
+auth_delay.milliseconds = '125'
+auth_delay.exp_backoff = 'on'
+auth_delay.max_seconds = '20'
 </programlisting>
  </sect2>
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 29fd1cae64..ee30969f0c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -165,6 +165,7 @@ AttrMap
 AttrMissing
 AttrNumber
 AttributeOpts
+AuthConnRecord
 AuthRequest
 AuthToken
 AutoPrewarmSharedState
-- 
2.39.2

#6Abhijit Menon-Sen
ams@toroid.org
In reply to: Michael Banck (#4)
Re: [PATCH] Exponential backoff for auth_delay

At 2024-01-19 15:08:36 +0100, mbanck@gmx.net wrote:

I wonder whether maybe auth_delay.max_seconds should either be renamed
to auth_delay.exponential_backoff_max_seconds (but then it is rather
long) in order to make it clearer it only applies in that context or
alternatively just apply to auth_delay.milliseconds as well (though
that would be somewhat weird).

I think it's OK as-is. The description/docs are pretty clear.

I wonder though whether this might be a useful message to have at some
more standard level like INFO?

I don't have a strong opinion about this, but I suspect anyone who is
annoyed enough by repeated authentication failures to use auth_delay
will also be happy to have less noise in the logs about it.

You mean something like "after 5 minutes, reset the delay to 0 again"?
I agree that this would probably be useful, but would also make the
change more complex.

Yes, that's the kind of thing I meant.

I agree that it would make this patch more complex, and I don't think
it's necessary to implement. However, since it's a feature that seems to
go hand-in-hand with exponential backoff in general, it _may_ be good to
mention in the docs that the sleep time for a host is reset only by
successful authentication, not by any timeout. Not sure.

What I had in mind is that admins would lower auth_delay.milliseconds to
something like 100 or 125 when exponential_backoff is on

Ah, that makes a lot of sense. Thanks for explaining.

Your new v3 patch looks fine to me. I'm marking it as ready for
committer.

-- Abhijit

#7Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Michael Banck (#5)
Re: [PATCH] Exponential backoff for auth_delay

Hi,

Thanks for the patch. I took a closer look at v3, so let me share some
review comments. Please push back if you happen to disagree with some of
it, some of this is going to be more a matter of personal preference.

1) I think it's a bit weird to have two options specifying amount of
time, but one is in seconds and one in milliseconds. Won't that be
unnecessarily confusing? Could we do both in milliseconds?

2) The C code defines the GUC as auth_delay.exponential_backoff, but the
SGML docs say <varname>auth_delay.exp_backoff</varname>.

3) Do we actually need the exponential_backoff GUC? Wouldn't it be
sufficient to have the maximum value, and if it's -1 treat that as no
backoff?

4) I think the SGML docs should more clearly explain that the delay is
initialized to auth_delay.milliseconds, and reset to this value after
successful authentication. The wording kinda implies that, but it's not
quite clear I think.

4) I've been really confused by the change from

if (status != STATUS_OK)

to

if (status == STATUS_ERROR)

in auth_delay_checks, until I realized those two codes are not the only
ones, and we intentionally ignore STATUS_EOF. I think it'd be good to
mention that in a comment, it's not quite obvious (I only realized it
because the e-mail mentioned it).

5) I kinda like the custom that functions in a contrib module start with
a clear common prefix, say auth_delay_ in this case. Matter of personal
preference, ofc.

6) I'm a bit skeptical about some acr_array details. Firstly, why would
50 entries be enough? Seems like a pretty low and arbitrary number ...
Also, what happens if someone attempts to authenticate, fails to do so,
and then disconnects and never tries again? Or just changes IP? Seems
like the entry will remain in the array forever, no?

Seems like a way to cause a "limited" DoS - do auth failure from 50
different hosts, to fill the table, and now everyone has to wait the
maximum amount of time (if they fail to authenticate).

I think it'd be good to consider:

- Making the acr_array a hash table, and larger than 50 entries (I
wonder if that should be dynamic / customizable by GUC?).

- Make sure the entries are eventually expired, based on time (for
example after 2*max_delay?).

- It would be a good idea to log something when we get into the "full
table" and start delaying everyone by max_delay_seconds. (How would
anyone know that's what's happening, seems rather confusing.)

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Michael Banck
mbanck@gmx.net
In reply to: Tomas Vondra (#7)
1 attachment(s)
Re: [PATCH] Exponential backoff for auth_delay

Hi,

On Wed, Feb 21, 2024 at 10:26:11PM +0100, Tomas Vondra wrote:

Thanks for the patch. I took a closer look at v3, so let me share some
review comments. Please push back if you happen to disagree with some of
it, some of this is going to be more a matter of personal preference.

Thanks! As my patch was based on a previous patch, some of decisions
were carry-overs I am not overly attached to.

1) I think it's a bit weird to have two options specifying amount of
time, but one is in seconds and one in milliseconds. Won't that be
unnecessarily confusing? Could we do both in milliseconds?

Alright, I changed that.

See below for a discussion about the GUCs in general.

2) The C code defines the GUC as auth_delay.exponential_backoff, but the
SGML docs say <varname>auth_delay.exp_backoff</varname>.

Right, an oversight from the last version where the GUC name got changed
but I forgot to change the documentation, fixed.

3) Do we actually need the exponential_backoff GUC? Wouldn't it be
sufficient to have the maximum value, and if it's -1 treat that as no
backoff?

That is a good question, I guess that makes sense.

The next question then is: what should the default for (now)
auth_delay.max_milliseconds be in this case, -1? Or do we say that as
the default for auth_delay.milliseconds is 0 anyway, why would somebody
not want exponential backoff when they switch it on and keep it at the
current 10s/10000ms)?

I have not changed that for now, pending further input.

4) I think the SGML docs should more clearly explain that the delay is
initialized to auth_delay.milliseconds, and reset to this value after
successful authentication. The wording kinda implies that, but it's not
quite clear I think.

Ok, I added some text to that end. I also added a not that
auth_delay.max_milliseconds will mean that the delay doubling never
stops.

4) I've been really confused by the change from

if (status != STATUS_OK)
to
if (status == STATUS_ERROR)

in auth_delay_checks, until I realized those two codes are not the only
ones, and we intentionally ignore STATUS_EOF. I think it'd be good to
mention that in a comment, it's not quite obvious (I only realized it
because the e-mail mentioned it).

Yeah I agree, I tried to explain that now.

5) I kinda like the custom that functions in a contrib module start with
a clear common prefix, say auth_delay_ in this case. Matter of personal
preference, ofc.

Ok, I changed the functions to have an auth_delay_ prefix throughout..

6) I'm a bit skeptical about some acr_array details. Firstly, why would
50 entries be enough? Seems like a pretty low and arbitrary number ...
Also, what happens if someone attempts to authenticate, fails to do so,
and then disconnects and never tries again? Or just changes IP? Seems
like the entry will remain in the array forever, no?

Yeah, that is how v3 of this patch worked. I have changed that now, see
below.

Seems like a way to cause a "limited" DoS - do auth failure from 50
different hosts, to fill the table, and now everyone has to wait the
maximum amount of time (if they fail to authenticate).

Right, though the problem would only exist on authentication failures,
so it is really rather limited.

I think it'd be good to consider:

- Making the acr_array a hash table, and larger than 50 entries (I
wonder if that should be dynamic / customizable by GUC?).

I would say a GUC should be overkill for this as this would mostly be an
implementation detail.

More generally, I think now that entries are expired (see below), this
should be less of a concern, so I have not changed this to a hash table
for now but doubled MAX_CONN_RECORDS to 100 entries.

- Make sure the entries are eventually expired, based on time (for
example after 2*max_delay?).

I went with 5*max_milliseconds - the AuthConnRecord struct now has a
last_failed_auth timestamp member; if we increase the delay for a host,
we check if any other host expired in the meantime and remove it if so.

- It would be a good idea to log something when we get into the "full
table" and start delaying everyone by max_delay_seconds. (How would
anyone know that's what's happening, seems rather confusing.)

Right, I added a log line for that. However, I made it LOG instead of
WARNING as I don't think the client would ever see it, would he?

Attached is v4 with the above changes.

Cheers,

Michael

Attachments:

v4-0001-Add-optional-exponential-backoff-to-auth_delay-co.patchtext/x-diff; charset=utf-8Download
From 6520fb1e768bb8bc26cafad014d08d7e9ac7f12a Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Wed, 27 Dec 2023 15:55:39 +0100
Subject: [PATCH v4] Add optional exponential backoff to auth_delay contrib
 module.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds two new GUCs for auth_delay, exponential_backoff and
max_milliseconds. The former controls whether exponential backoff should be
used or not, the latter sets an maximum delay (default is 10s) in case
exponential backoff is active.

The exponential backoff is tracked per remote host and doubled for every failed
login attempt (i.e., wrong password, not just missing pg_hba line or database)
and reset to auth_delay.milliseconds after a successful authentication or when
no authentication attempts have been made for 5*max_milliseconds from that
host.

Authors: Michael Banck, based on an earlier patch by 成之焕
Reviewed-by: Abhijit Menon-Sen, Tomas Vondra
Discussion: https://postgr.es/m/AHwAxACqIwIVOEhs5YejpqoG.1.1668569845751.Hmail.zhcheng@ceresdata.com
---
 contrib/auth_delay/auth_delay.c  | 231 ++++++++++++++++++++++++++++++-
 doc/src/sgml/auth-delay.sgml     |  45 +++++-
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 273 insertions(+), 4 deletions(-)

diff --git a/contrib/auth_delay/auth_delay.c b/contrib/auth_delay/auth_delay.c
index ff0e1fd461..647aa4e1cd 100644
--- a/contrib/auth_delay/auth_delay.c
+++ b/contrib/auth_delay/auth_delay.c
@@ -14,24 +14,51 @@
 #include <limits.h>
 
 #include "libpq/auth.h"
+#include "miscadmin.h"
 #include "port.h"
+#include "storage/dsm_registry.h"
+#include "storage/ipc.h"
+#include "storage/lwlock.h"
+#include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
 
+#define MAX_CONN_RECORDS 100
+
 /* GUC Variables */
 static int	auth_delay_milliseconds = 0;
+static bool auth_delay_exp_backoff = false;
+static int	auth_delay_max_milliseconds = 0;
 
 /* Original Hook */
 static ClientAuthentication_hook_type original_client_auth_hook = NULL;
 
+typedef struct AuthConnRecord
+{
+	char		remote_host[NI_MAXHOST];
+	double		sleep_time;		/* in milliseconds */
+	TimestampTz last_failed_auth;
+} AuthConnRecord;
+
+static shmem_startup_hook_type shmem_startup_next = NULL;
+static AuthConnRecord *acr_array = NULL;
+
+static AuthConnRecord *auth_delay_find_acr_for_host(char *remote_host);
+static AuthConnRecord *auth_delay_find_free_acr(void);
+static double auth_delay_increase_delay_after_failed_conn_auth(Port *port);
+static void auth_delay_cleanup_conn_record(Port *port);
+static void auth_delay_expire_conn_records(Port *port);
+
 /*
  * Check authentication
  */
 static void
 auth_delay_checks(Port *port, int status)
 {
+	double		delay = auth_delay_milliseconds;
+
 	/*
 	 * Any other plugins which use ClientAuthentication_hook.
 	 */
@@ -39,20 +66,193 @@ auth_delay_checks(Port *port, int status)
 		original_client_auth_hook(port, status);
 
 	/*
-	 * Inject a short delay if authentication failed.
+	 * We handle both STATUS_ERROR and STATUS_OK - the third option
+	 * (STATUS_EOF) is disregarded.
+	 *
+	 * In case of STATUS_ERROR we inject a short delay, optionally with
+	 * exponential backoff.
+	 */
+	if (status == STATUS_ERROR)
+	{
+		if (auth_delay_exp_backoff)
+		{
+			/*
+			 * Delay by 2^n seconds after each authentication failure from a
+			 * particular host, where n is the number of consecutive
+			 * authentication failures.
+			 */
+			delay = auth_delay_increase_delay_after_failed_conn_auth(port);
+
+			/*
+			 * Clamp delay to a maximum of auth_delay_max_milliseconds.
+			 */
+			if (auth_delay_max_milliseconds > 0)
+			{
+				delay = Min(delay, auth_delay_max_milliseconds);
+			}
+		}
+
+		if (delay > 0)
+		{
+			elog(DEBUG1, "Authentication delayed for %g seconds due to auth_delay", delay / 1000.0);
+			pg_usleep(1000L * (long) delay);
+		}
+
+		/*
+		 * Expire delays from other hosts after auth_delay_max_milliseconds *
+		 * 5.
+		 */
+		auth_delay_expire_conn_records(port);
+	}
+
+	/*
+	 * Remove host-specific delay if authentication succeeded.
+	 */
+	if (status == STATUS_OK)
+		auth_delay_cleanup_conn_record(port);
+}
+
+static double
+auth_delay_increase_delay_after_failed_conn_auth(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+
+	acr = auth_delay_find_acr_for_host(port->remote_host);
+
+	if (!acr)
+	{
+		acr = auth_delay_find_free_acr();
+
+		if (!acr)
+		{
+			/*
+			 * No free space, MAX_CONN_RECORDS reached. Wait for the
+			 * configured maximum amount.
+			 */
+			elog(LOG, "auth_delay: host connection list full, waiting maximum amount");
+			return auth_delay_max_milliseconds;
+		}
+		strcpy(acr->remote_host, port->remote_host);
+	}
+	if (acr->sleep_time == 0)
+		acr->sleep_time = (double) auth_delay_milliseconds;
+	else
+		acr->sleep_time *= 2;
+
+	/*
+	 * Set current timestamp for later expiry.
 	 */
-	if (status != STATUS_OK)
+	acr->last_failed_auth = GetCurrentTimestamp();
+
+	return acr->sleep_time;
+}
+
+static AuthConnRecord *
+auth_delay_find_acr_for_host(char *remote_host)
+{
+	int			i;
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (strcmp(acr_array[i].remote_host, remote_host) == 0)
+			return &acr_array[i];
+	}
+
+	return NULL;
+}
+
+static AuthConnRecord *
+auth_delay_find_free_acr(void)
+{
+	int			i;
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (!acr_array[i].remote_host[0])
+			return &acr_array[i];
+	}
+
+	return 0;
+}
+
+static void
+auth_delay_cleanup_conn_record(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+
+	acr = auth_delay_find_acr_for_host(port->remote_host);
+	if (acr == NULL)
+		return;
+
+	port->remote_host[0] = '\0';
+
+	acr->sleep_time = 0.0;
+	acr->last_failed_auth = 0.0;
+}
+
+static void
+auth_delay_expire_conn_records(Port *port)
+{
+	int			i;
+	TimestampTz now = GetCurrentTimestamp();
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
 	{
-		pg_usleep(1000L * auth_delay_milliseconds);
+		/*
+		 * Do not expire the host from which the current authentication
+		 * failure originated.
+		 */
+		if (strcmp(acr_array[i].remote_host, port->remote_host) == 0)
+			continue;
+
+		if (acr_array[i].last_failed_auth > 0 && (long) ((now - acr_array[i].last_failed_auth) / 1000) > 5 * auth_delay_max_milliseconds)
+		{
+			acr_array[i].remote_host[0] = '\0';
+			acr_array[i].sleep_time = 0.0;
+			acr_array[i].last_failed_auth = 0.0;
+		}
 	}
 }
 
+/*
+ * Set up shared memory
+ */
+
+static void
+auth_delay_init_state(void *ptr)
+{
+	Size		shm_size;
+	AuthConnRecord *array = (AuthConnRecord *) ptr;
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+
+	memset(array, 0, shm_size);
+}
+
+static void
+auth_delay_shmem_startup(void)
+{
+	bool		found;
+	Size		shm_size;
+
+	if (shmem_startup_next)
+		shmem_startup_next();
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	acr_array = GetNamedDSMSegment("auth_delay", shm_size, auth_delay_init_state, &found);
+}
+
 /*
  * Module Load Callback
  */
 void
 _PG_init(void)
 {
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("auth_delay must be loaded via shared_preload_libraries")));
+
 	/* Define custom GUC variables */
 	DefineCustomIntVariable("auth_delay.milliseconds",
 							"Milliseconds to delay before reporting authentication failure",
@@ -66,9 +266,34 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomBoolVariable("auth_delay.exponential_backoff",
+							 "Double the delay after each authentication failure from a particular host",
+							 NULL,
+							 &auth_delay_exp_backoff,
+							 false,
+							 PGC_SIGHUP,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	DefineCustomIntVariable("auth_delay.max_milliseconds",
+							"Maximum delay when exponential backoff is enabled",
+							NULL,
+							&auth_delay_max_milliseconds,
+							10000,
+							0, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_MS,
+							NULL, NULL, NULL);
+
 	MarkGUCPrefixReserved("auth_delay");
 
 	/* Install Hooks */
 	original_client_auth_hook = ClientAuthentication_hook;
 	ClientAuthentication_hook = auth_delay_checks;
+
+	/* Set up shared memory */
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = auth_delay_shmem_startup;
 }
diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml
index 0571f2a99d..b02b6b0b30 100644
--- a/doc/src/sgml/auth-delay.sgml
+++ b/doc/src/sgml/auth-delay.sgml
@@ -16,6 +16,18 @@
   connection slots.
  </para>
 
+ <para>
+  It is optionally possible to let <filename>auth_delay</filename> wait longer
+  on each successive authentication failure if the configuration parameter
+  <varname>auth_delay.exponential_backoff</varname> is active.  If enabled,
+  <filename>auth_delay</filename> will start with a delay of
+  <varname>auth_delay.milliseconds</varname> and double the delay after each
+  consecutive authentication failure from a particular host, up to the given
+  <varname>auth_delay.max_milliseconds</varname> (default: 10s). If the host
+  authenticates successfully or after a timeout of five times
+  <varname>auth_delay.max_milliseconds</varname>, the delay is reset.
+ </para>
+
  <para>
   In order to function, this module must be loaded via
   <xref linkend="guc-shared-preload-libraries"/> in <filename>postgresql.conf</filename>.
@@ -39,6 +51,35 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.exponential_backoff</varname> (<type>bool</type>)
+     <indexterm>
+      <primary><varname>auth_delay.exponential_backoff</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Whether to use exponential backoff per remote host on authentication
+      failure.  The default is off.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.max_milliseconds</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>auth_delay.max_milliseconds</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The maximum delay, in milliseconds, when exponential backoff is
+      enabled.  The default is 10 seconds.  If set to 0, authentication delays
+      will increase without limit.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
@@ -50,7 +91,9 @@
 # postgresql.conf
 shared_preload_libraries = 'auth_delay'
 
-auth_delay.milliseconds = '500'
+auth_delay.milliseconds = '125'
+auth_delay.exponential_backoff = 'on'
+auth_delay.max_milliseconds = '20000'
 </programlisting>
  </sect2>
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 95ae7845d8..dd9fb9e530 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -166,6 +166,7 @@ AttrMap
 AttrMissing
 AttrNumber
 AttributeOpts
+AuthConnRecord
 AuthRequest
 AuthToken
 AutoPrewarmSharedState
-- 
2.39.2

#9Robert Haas
robertmhaas@gmail.com
In reply to: Michael Banck (#8)
Re: [PATCH] Exponential backoff for auth_delay

On Mon, Mar 4, 2024 at 2:43 PM Michael Banck <mbanck@gmx.net> wrote:

3) Do we actually need the exponential_backoff GUC? Wouldn't it be
sufficient to have the maximum value, and if it's -1 treat that as no
backoff?

That is a good question, I guess that makes sense.

The next question then is: what should the default for (now)
auth_delay.max_milliseconds be in this case, -1? Or do we say that as
the default for auth_delay.milliseconds is 0 anyway, why would somebody
not want exponential backoff when they switch it on and keep it at the
current 10s/10000ms)?

I have not changed that for now, pending further input.

I agree that two GUCs here seems to be one more than necessary, but I
wonder whether we couldn't just say 0 means no exponential backoff and
any other value is the maximum time. The idea that 0 means unlimited
doesn't seem useful in practice. There's always going to be some
limit, at least by the number of bits we have in the data type that
we're using to do the calculation. But that limit is basically never
the right answer. I mean, I think 2^31 milliseconds is roughly 25
days, but it seems unlikely to me that delays measured in days
helpfully more secure than delays measured in minutes, and they don't
seem very convenient for users either, and do you really want a failed
connection to linger for days before failing? That seems like you're
DOSing yourself. If somebody wants to configure a very large value
explicitly, cool, they can do as they like, but we don't need to
complicate the interface to make it easier for them to do so.

--
Robert Haas
EDB: http://www.enterprisedb.com

#10Michael Banck
mbanck@gmx.net
In reply to: Robert Haas (#9)
1 attachment(s)
Re: [PATCH] Exponential backoff for auth_delay

Hi,

On Mon, Mar 04, 2024 at 03:50:07PM -0500, Robert Haas wrote:

I agree that two GUCs here seems to be one more than necessary, but I
wonder whether we couldn't just say 0 means no exponential backoff and
any other value is the maximum time.

Alright, I have changed it so that auth_delay.milliseconds and
auth_delay.max_milliseconds are the only GUCs, their default being 0. If
the latter is 0, the former's value is always taken. If the latter is
non-zero and larger than the former, exponential backoff is applied with
the latter's value as maximum delay.

If the latter is smaller than the former then auth_delay just sets the
delay to the latter, I don't think this is problem or confusing, or
should this be considered a misconfiguration?

The idea that 0 means unlimited doesn't seem useful in practice.

Yeah, that was more how it was coded than a real policy decision, so
let's do away with it.

V5 attached.

Michael

Attachments:

v5-0001-Add-optional-exponential-backoff-to-auth_delay-co.patchtext/x-diff; charset=utf-8Download
From 3563d77061480b7e022255b968a39086b0cc8814 Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Wed, 27 Dec 2023 15:55:39 +0100
Subject: [PATCH v5] Add optional exponential backoff to auth_delay contrib
 module.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds the new auth_delay.max_milliseconds GUC. If set (its default is 0),
auth_delay adds exponential backoff with this GUC's value as maximum delay.

The exponential backoff is tracked per remote host and doubled for every failed
login attempt (i.e., wrong password, not just missing pg_hba line or database)
and reset to auth_delay.milliseconds after a successful authentication or when
no authentication attempts have been made for 5*max_milliseconds from that
host.

Authors: Michael Banck, based on an earlier patch by 成之焕
Reviewed-by: Abhijit Menon-Sen, Tomas Vondra
Discussion: https://postgr.es/m/AHwAxACqIwIVOEhs5YejpqoG.1.1668569845751.Hmail.zhcheng@ceresdata.com
---
 contrib/auth_delay/auth_delay.c  | 216 ++++++++++++++++++++++++++++++-
 doc/src/sgml/auth-delay.sgml     |  31 ++++-
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 244 insertions(+), 4 deletions(-)

diff --git a/contrib/auth_delay/auth_delay.c b/contrib/auth_delay/auth_delay.c
index ff0e1fd461..5fb123d133 100644
--- a/contrib/auth_delay/auth_delay.c
+++ b/contrib/auth_delay/auth_delay.c
@@ -14,24 +14,50 @@
 #include <limits.h>
 
 #include "libpq/auth.h"
+#include "miscadmin.h"
 #include "port.h"
+#include "storage/dsm_registry.h"
+#include "storage/ipc.h"
+#include "storage/lwlock.h"
+#include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
 
+#define MAX_CONN_RECORDS 100
+
 /* GUC Variables */
 static int	auth_delay_milliseconds = 0;
+static int	auth_delay_max_milliseconds = 0;
 
 /* Original Hook */
 static ClientAuthentication_hook_type original_client_auth_hook = NULL;
 
+typedef struct AuthConnRecord
+{
+	char		remote_host[NI_MAXHOST];
+	double		sleep_time;		/* in milliseconds */
+	TimestampTz last_failed_auth;
+} AuthConnRecord;
+
+static shmem_startup_hook_type shmem_startup_next = NULL;
+static AuthConnRecord *acr_array = NULL;
+
+static AuthConnRecord *auth_delay_find_acr_for_host(char *remote_host);
+static AuthConnRecord *auth_delay_find_free_acr(void);
+static double auth_delay_increase_delay_after_failed_conn_auth(Port *port);
+static void auth_delay_cleanup_conn_record(Port *port);
+static void auth_delay_expire_conn_records(Port *port);
+
 /*
  * Check authentication
  */
 static void
 auth_delay_checks(Port *port, int status)
 {
+	double		delay = auth_delay_milliseconds;
+
 	/*
 	 * Any other plugins which use ClientAuthentication_hook.
 	 */
@@ -39,20 +65,190 @@ auth_delay_checks(Port *port, int status)
 		original_client_auth_hook(port, status);
 
 	/*
-	 * Inject a short delay if authentication failed.
+	 * We handle both STATUS_ERROR and STATUS_OK - the third option
+	 * (STATUS_EOF) is disregarded.
+	 *
+	 * In case of STATUS_ERROR we inject a short delay, optionally with
+	 * exponential backoff.
+	 */
+	if (status == STATUS_ERROR)
+	{
+		if (auth_delay_max_milliseconds > 0)
+		{
+			/*
+			 * Delay by 2^n seconds after each authentication failure from a
+			 * particular host, where n is the number of consecutive
+			 * authentication failures.
+			 */
+			delay = auth_delay_increase_delay_after_failed_conn_auth(port);
+
+			/*
+			 * Clamp delay to a maximum of auth_delay_max_milliseconds.
+			 */
+			delay = Min(delay, auth_delay_max_milliseconds);
+		}
+
+		if (delay > 0)
+		{
+			elog(DEBUG1, "Authentication delayed for %g seconds due to auth_delay", delay / 1000.0);
+			pg_usleep(1000L * (long) delay);
+		}
+
+		/*
+		 * Expire delays from other hosts after auth_delay_max_milliseconds *
+		 * 5.
+		 */
+		auth_delay_expire_conn_records(port);
+	}
+
+	/*
+	 * Remove host-specific delay if authentication succeeded.
+	 */
+	if (status == STATUS_OK)
+		auth_delay_cleanup_conn_record(port);
+}
+
+static double
+auth_delay_increase_delay_after_failed_conn_auth(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+
+	acr = auth_delay_find_acr_for_host(port->remote_host);
+
+	if (!acr)
+	{
+		acr = auth_delay_find_free_acr();
+
+		if (!acr)
+		{
+			/*
+			 * No free space, MAX_CONN_RECORDS reached. Wait for the
+			 * configured maximum amount.
+			 */
+			elog(LOG, "auth_delay: host connection list full, waiting maximum amount");
+			return auth_delay_max_milliseconds;
+		}
+		strcpy(acr->remote_host, port->remote_host);
+	}
+	if (acr->sleep_time == 0)
+		acr->sleep_time = (double) auth_delay_milliseconds;
+	else
+		acr->sleep_time *= 2;
+
+	/*
+	 * Set current timestamp for later expiry.
 	 */
-	if (status != STATUS_OK)
+	acr->last_failed_auth = GetCurrentTimestamp();
+
+	return acr->sleep_time;
+}
+
+static AuthConnRecord *
+auth_delay_find_acr_for_host(char *remote_host)
+{
+	int			i;
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (strcmp(acr_array[i].remote_host, remote_host) == 0)
+			return &acr_array[i];
+	}
+
+	return NULL;
+}
+
+static AuthConnRecord *
+auth_delay_find_free_acr(void)
+{
+	int			i;
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
+	{
+		if (!acr_array[i].remote_host[0])
+			return &acr_array[i];
+	}
+
+	return 0;
+}
+
+static void
+auth_delay_cleanup_conn_record(Port *port)
+{
+	AuthConnRecord *acr = NULL;
+
+	acr = auth_delay_find_acr_for_host(port->remote_host);
+	if (acr == NULL)
+		return;
+
+	port->remote_host[0] = '\0';
+
+	acr->sleep_time = 0.0;
+	acr->last_failed_auth = 0.0;
+}
+
+static void
+auth_delay_expire_conn_records(Port *port)
+{
+	int			i;
+	TimestampTz now = GetCurrentTimestamp();
+
+	for (i = 0; i < MAX_CONN_RECORDS; i++)
 	{
-		pg_usleep(1000L * auth_delay_milliseconds);
+		/*
+		 * Do not expire the host from which the current authentication
+		 * failure originated.
+		 */
+		if (strcmp(acr_array[i].remote_host, port->remote_host) == 0)
+			continue;
+
+		if (acr_array[i].last_failed_auth > 0 && (long) ((now - acr_array[i].last_failed_auth) / 1000) > 5 * auth_delay_max_milliseconds)
+		{
+			acr_array[i].remote_host[0] = '\0';
+			acr_array[i].sleep_time = 0.0;
+			acr_array[i].last_failed_auth = 0.0;
+		}
 	}
 }
 
+/*
+ * Set up shared memory
+ */
+
+static void
+auth_delay_init_state(void *ptr)
+{
+	Size		shm_size;
+	AuthConnRecord *array = (AuthConnRecord *) ptr;
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+
+	memset(array, 0, shm_size);
+}
+
+static void
+auth_delay_shmem_startup(void)
+{
+	bool		found;
+	Size		shm_size;
+
+	if (shmem_startup_next)
+		shmem_startup_next();
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	acr_array = GetNamedDSMSegment("auth_delay", shm_size, auth_delay_init_state, &found);
+}
+
 /*
  * Module Load Callback
  */
 void
 _PG_init(void)
 {
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("auth_delay must be loaded via shared_preload_libraries")));
+
 	/* Define custom GUC variables */
 	DefineCustomIntVariable("auth_delay.milliseconds",
 							"Milliseconds to delay before reporting authentication failure",
@@ -66,9 +262,23 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomIntVariable("auth_delay.max_milliseconds",
+							"Maximum delay for exponential backoff",
+							NULL,
+							&auth_delay_max_milliseconds,
+							0,
+							0, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_MS,
+							NULL, NULL, NULL);
+
 	MarkGUCPrefixReserved("auth_delay");
 
 	/* Install Hooks */
 	original_client_auth_hook = ClientAuthentication_hook;
 	ClientAuthentication_hook = auth_delay_checks;
+
+	/* Set up shared memory */
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = auth_delay_shmem_startup;
 }
diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml
index 0571f2a99d..e3c182cd45 100644
--- a/doc/src/sgml/auth-delay.sgml
+++ b/doc/src/sgml/auth-delay.sgml
@@ -16,6 +16,19 @@
   connection slots.
  </para>
 
+ <para>
+  It is optionally possible to let <filename>auth_delay</filename> wait longer
+  on each successive authentication failure if the configuration parameter
+  <varname>auth_delay.max_milliseconds</varname> is set.  In this case,
+  <filename>auth_delay</filename> will start with a delay of
+  <varname>auth_delay.milliseconds</varname> and double the delay after each
+  consecutive authentication failure from a particular host, up to the given
+  <varname>auth_delay.max_milliseconds</varname>. If the host authenticates
+  successfully or after a timeout of five times
+  <varname>auth_delay.max_milliseconds</varname>, the delay is reset to
+  <varname>auth_delay.milliseconds</varname>.
+ </para>
+
  <para>
   In order to function, this module must be loaded via
   <xref linkend="guc-shared-preload-libraries"/> in <filename>postgresql.conf</filename>.
@@ -39,6 +52,21 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term>
+     <varname>auth_delay.max_milliseconds</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>auth_delay.max_milliseconds</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The maximum delay in milliseconds, implying exponential backoff between
+      each successive failed attempt.  The default is 0, meaning exponential
+      backoff is not active.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
@@ -50,7 +78,8 @@
 # postgresql.conf
 shared_preload_libraries = 'auth_delay'
 
-auth_delay.milliseconds = '500'
+auth_delay.milliseconds = '125'
+auth_delay.max_milliseconds = '20000'
 </programlisting>
  </sect2>
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 95ae7845d8..dd9fb9e530 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -166,6 +166,7 @@ AttrMap
 AttrMissing
 AttrNumber
 AttributeOpts
+AuthConnRecord
 AuthRequest
 AuthToken
 AutoPrewarmSharedState
-- 
2.39.2

#11Robert Haas
robertmhaas@gmail.com
In reply to: Michael Banck (#10)
Re: [PATCH] Exponential backoff for auth_delay

On Mon, Mar 4, 2024 at 4:58 PM Michael Banck <mbanck@gmx.net> wrote:

Alright, I have changed it so that auth_delay.milliseconds and
auth_delay.max_milliseconds are the only GUCs, their default being 0. If
the latter is 0, the former's value is always taken. If the latter is
non-zero and larger than the former, exponential backoff is applied with
the latter's value as maximum delay.

If the latter is smaller than the former then auth_delay just sets the
delay to the latter, I don't think this is problem or confusing, or
should this be considered a misconfiguration?

Seems fine to me. We may need to think about what the documentation
should say about this, if anything.

--
Robert Haas
EDB: http://www.enterprisedb.com

#12Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Banck (#8)
Re: [PATCH] Exponential backoff for auth_delay

On Mon, Mar 04, 2024 at 08:42:55PM +0100, Michael Banck wrote:

On Wed, Feb 21, 2024 at 10:26:11PM +0100, Tomas Vondra wrote:

I think it'd be good to consider:

- Making the acr_array a hash table, and larger than 50 entries (I
wonder if that should be dynamic / customizable by GUC?).

I would say a GUC should be overkill for this as this would mostly be an
implementation detail.

More generally, I think now that entries are expired (see below), this
should be less of a concern, so I have not changed this to a hash table
for now but doubled MAX_CONN_RECORDS to 100 entries.

I don't have a strong opinion about making this configurable, but I do
think we should consider making this a hash table so that we can set
MAX_CONN_RECORDS much higher.

Also, since these records are stored in shared memory, don't we need to
lock them when searching/updating?

+static void
+auth_delay_init_state(void *ptr)
+{
+	Size		shm_size;
+	AuthConnRecord *array = (AuthConnRecord *) ptr;
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+
+	memset(array, 0, shm_size);
+}
+
+static void
+auth_delay_shmem_startup(void)
+{
+	bool		found;
+	Size		shm_size;
+
+	if (shmem_startup_next)
+		shmem_startup_next();
+
+	shm_size = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
+	acr_array = GetNamedDSMSegment("auth_delay", shm_size, auth_delay_init_state, &found);
+}

Great to see the DSM registry getting some use. This example makes me
wonder whether the size of the segment should be passed to the
init_callback.

/*
* Module Load Callback
*/
void
_PG_init(void)
{
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("auth_delay must be loaded via shared_preload_libraries")));
+

This change seems like a good idea independent of this feature.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#13Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Nathan Bossart (#12)
Re: [PATCH] Exponential backoff for auth_delay

On Tue, Mar 5, 2024 at 1:51 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

I don't have a strong opinion about making this configurable, but I do
think we should consider making this a hash table so that we can set
MAX_CONN_RECORDS much higher.

I'm curious why? It seems like the higher you make MAX_CONN_RECORDS,
the easier it is to put off the brute-force protection. (My assumption
is that anyone mounting a serious attack is not going to be doing this
from their own computer; they'll be doing it from many devices they
don't own -- a botnet, or a series of proxies, or something.)

--

Drive-by microreview -- auth_delay_cleanup_conn_record() has

+ port->remote_host[0] = '\0';

which doesn't seem right. I assume acr->remote_host was meant?

--Jacob

#14Nathan Bossart
nathandbossart@gmail.com
In reply to: Jacob Champion (#13)
Re: [PATCH] Exponential backoff for auth_delay

On Tue, Mar 05, 2024 at 05:14:46PM -0800, Jacob Champion wrote:

On Tue, Mar 5, 2024 at 1:51 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

I don't have a strong opinion about making this configurable, but I do
think we should consider making this a hash table so that we can set
MAX_CONN_RECORDS much higher.

I'm curious why? It seems like the higher you make MAX_CONN_RECORDS,
the easier it is to put off the brute-force protection. (My assumption
is that anyone mounting a serious attack is not going to be doing this
from their own computer; they'll be doing it from many devices they
don't own -- a botnet, or a series of proxies, or something.)

Assuming you are referring to the case where we run out of free slots in
acr_array, I'm not sure I see that as desirable behavior. Once you run out
of slots, all failed authentication attempts are now subject to the maximum
delay, which is arguably a denial-of-service scenario, albeit not a
particularly worrisome one.

I also think the linear search approach artifically constrains the value of
MAX_CONN_RECORDS, so even if a user wanted to bump it up substantially for
their own build, they'd potentially begin noticing the effects of the O(n)
behavior. AFAICT this is really the only reason this value is set so low
at the moment, as I don't think the memory usage or code complexity of a
hash table with thousands of slots would be too bad. In fact, it might
even be simpler to use hash_search() instead of hard-coding linear searches
in multiple places.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#15Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Nathan Bossart (#14)
Re: [PATCH] Exponential backoff for auth_delay

On Wed, Mar 6, 2024 at 8:10 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

Assuming you are referring to the case where we run out of free slots in
acr_array, I'm not sure I see that as desirable behavior. Once you run out
of slots, all failed authentication attempts are now subject to the maximum
delay, which is arguably a denial-of-service scenario, albeit not a
particularly worrisome one.

Maybe I've misunderstood the attack vector, but I thought the point of
the feature was to deny service when the server is under attack. If we
don't deny service, what does the feature do?

And I may have introduced a red herring in talking about the number of
hosts, because an attacker operating from a single host is under no
obligation to actually wait for the authentication delay. Once we hit
some short timeout, we can safely assume the password is wrong,
abandon the request, and open up a new connection. It seems like the
thing limiting our attack is the number of connection slots, not
MAX_CONN_RECORDS. Am I missing something crucial?

--Jacob

#16Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Jacob Champion (#15)
Re: [PATCH] Exponential backoff for auth_delay

On 3/6/24 19:24, Jacob Champion wrote:

On Wed, Mar 6, 2024 at 8:10 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

Assuming you are referring to the case where we run out of free slots in
acr_array, I'm not sure I see that as desirable behavior. Once you run out
of slots, all failed authentication attempts are now subject to the maximum
delay, which is arguably a denial-of-service scenario, albeit not a
particularly worrisome one.

Maybe I've misunderstood the attack vector, but I thought the point of
the feature was to deny service when the server is under attack. If we
don't deny service, what does the feature do?

And I may have introduced a red herring in talking about the number of
hosts, because an attacker operating from a single host is under no
obligation to actually wait for the authentication delay. Once we hit
some short timeout, we can safely assume the password is wrong,
abandon the request, and open up a new connection. It seems like the
thing limiting our attack is the number of connection slots, not
MAX_CONN_RECORDS. Am I missing something crucial?

Doesn't this mean this approach (as implemented) doesn't actually work
as a protection against this sort of DoS?

If I'm an attacker, and I can just keep opening new connections for each
auth request, am I even subject to any auth delay?

ISTM the problem lies in the fact that we apply the delay only *after*
the failed auth attempt. Which makes sense, because until now we didn't
have any state with information for new connections. But with the new
acr_array, we could check that first, and do the delay before trying to
athenticate, no?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Michael Banck
mbanck@gmx.net
In reply to: Tomas Vondra (#16)
Re: [PATCH] Exponential backoff for auth_delay

Hi,

On Wed, Mar 06, 2024 at 09:34:37PM +0100, Tomas Vondra wrote:

On 3/6/24 19:24, Jacob Champion wrote:

On Wed, Mar 6, 2024 at 8:10 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

Assuming you are referring to the case where we run out of free slots in
acr_array, I'm not sure I see that as desirable behavior. Once you run out
of slots, all failed authentication attempts are now subject to the maximum
delay, which is arguably a denial-of-service scenario, albeit not a
particularly worrisome one.

Maybe I've misunderstood the attack vector, but I thought the point of
the feature was to deny service when the server is under attack. If we
don't deny service, what does the feature do?

I think there are two attack vectors under discussion:

1. Somebody tries to brute-force a password. The original auth_delay
delays auth for a bit everytime authentication fails. If you configure
the delay to be very small, maybe it does not bother the attacker too
much. If you configure it to be long enough, legitimate users might be
annoyed when typoing their password. The suggested feature tries to help
here by initially delaying authentication just a bit and then gradually
increasing the delay.

2. Somebody tries to denial-of-service a server by exhausting all
(remaining) available connections with failed authentication requests
that are being delayed. For this attack, the suggested feature is
hurting more than doing good as it potentially keeps a failed
authentication attempt's connection hanging around longer than before
and makes it easier to denial-of-service a server in this way.

In order to at least make case 2 not worse for exponential backoff, we
could maybe disable it (and just wait for auth_delay.milliseconds) once
MAX_CONN_RECORDS is full. In addition, maybe MAX_CONN_RECORDS should be
some fraction of max_connections, like 25%?

And I may have introduced a red herring in talking about the number of
hosts, because an attacker operating from a single host is under no
obligation to actually wait for the authentication delay. Once we hit
some short timeout, we can safely assume the password is wrong,
abandon the request, and open up a new connection.

That is a valid point.

Maybe this could be averted if we either outright deny even a successful
authentication request if the host it originates from has a
max_milliseconds delay on file (i.e. has been trying to brute-force the
password for a while) or at least delay a successful authentication
request for some delay, if the host it originates on has a
max_milliseconds delay on file (assuming it will close the connection
beforehand as it thinks the password guess was wrong)?

It seems like the thing limiting our attack is the number of
connection slots, not MAX_CONN_RECORDS. Am I missing something
crucial?

Doesn't this mean this approach (as implemented) doesn't actually work
as a protection against this sort of DoS?

If I'm an attacker, and I can just keep opening new connections for each
auth request, am I even subject to any auth delay?

Yeah, but see above.

ISTM the problem lies in the fact that we apply the delay only *after*
the failed auth attempt. Which makes sense, because until now we didn't
have any state with information for new connections. But with the new
acr_array, we could check that first, and do the delay before trying to
athenticate, no?

I don't think we have a hook for that yet, do we?

Michael

#18Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Tomas Vondra (#16)
Re: [PATCH] Exponential backoff for auth_delay

On Wed, Mar 6, 2024 at 12:34 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Doesn't this mean this approach (as implemented) doesn't actually work
as a protection against this sort of DoS?

If I'm an attacker, and I can just keep opening new connections for each
auth request, am I even subject to any auth delay?

s/DoS/brute-force/, but yeah, that's basically the question at the
heart of my comment. _If_ the point of auth_delay is to tie up
connection slots in order to degrade service during an attack, then I
think this addition works, in the sense that it makes the self-imposed
DoS more draconian the more failures occur.

But I don't know if that's actually the intended goal. For one, I'm
not convinced that the host tracking part buys you anything more in
practice, under that model. And if people are trying to *avoid* the
DoS somehow, then I just really don't understand the feature.

ISTM the problem lies in the fact that we apply the delay only *after*
the failed auth attempt. Which makes sense, because until now we didn't
have any state with information for new connections. But with the new
acr_array, we could check that first, and do the delay before trying to
athenticate, no?

Yeah, I think one key point is to apply the delay to both successful
and failed connections. That probably opens up a bunch more questions,
though? It seems like a big change from the previous behavior. An
attacker can still front-load a bunch of connections in parallel. And
the end state of the working feature is probably still slot exhaustion
during an attack, so...

I looked around a bit at other designs. [1]https://owasp.org/www-community/controls/Blocking_Brute_Force_Attacks is HTTP-focused, but it
talks about some design tradeoffs. I wonder if flipping the sense of
the host tracking [2]https://owasp.org/www-community/Slow_Down_Online_Guessing_Attacks_with_Device_Cookies, to keep track of well-behaved clients and let
them through the service degradation that we're applying to the
masses, might be more robust. But I don't know how to let those
clients punch through slot exhaustion without a lot more work.

--Jacob

[1]: https://owasp.org/www-community/controls/Blocking_Brute_Force_Attacks
[2]: https://owasp.org/www-community/Slow_Down_Online_Guessing_Attacks_with_Device_Cookies

#19Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Michael Banck (#17)
Re: [PATCH] Exponential backoff for auth_delay

On Wed, Mar 6, 2024 at 2:45 PM Michael Banck <mbanck@gmx.net> wrote:

In order to at least make case 2 not worse for exponential backoff, we
could maybe disable it (and just wait for auth_delay.milliseconds) once
MAX_CONN_RECORDS is full. In addition, maybe MAX_CONN_RECORDS should be
some fraction of max_connections, like 25%?

(Our mails crossed; hopefully I've addressed the other points.)

I think solutions for case 1 and case 2 are necessarily at odds under
the current design, if auth_delay relies on slot exhaustion to do its
work effectively. Weakening that on purpose doesn't make much sense to
me; if a DBA is uncomfortable with the DoS implications then I'd argue
they need a different solution. (Which we could theoretically
implement, but it's not my intention to sign you up for that. :D )

--Jacob

#20Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Jacob Champion (#19)
Re: [PATCH] Exponential backoff for auth_delay

On Wed, Mar 20, 2024 at 2:15 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:

I think solutions for case 1 and case 2 are necessarily at odds under
the current design, if auth_delay relies on slot exhaustion to do its
work effectively. Weakening that on purpose doesn't make much sense to
me; if a DBA is uncomfortable with the DoS implications then I'd argue
they need a different solution. (Which we could theoretically
implement, but it's not my intention to sign you up for that. :D )

The thread got quiet, and I'm nervous that I squashed it unintentionally. :/

Is there consensus on whether the backoff is useful, even without the
host tracking? (Or, alternatively, is the host tracking helpful in a
way I'm not seeing?) Failing those, is there a way forward that could
make it useful in the future?

--Jacob

#21Daniel Gustafsson
daniel@yesql.se
In reply to: Jacob Champion (#20)
Re: [PATCH] Exponential backoff for auth_delay

On 20 Mar 2024, at 22:21, Jacob Champion <jacob.champion@enterprisedb.com> wrote:

On Wed, Mar 20, 2024 at 2:15 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:

I think solutions for case 1 and case 2 are necessarily at odds under
the current design, if auth_delay relies on slot exhaustion to do its
work effectively. Weakening that on purpose doesn't make much sense to
me; if a DBA is uncomfortable with the DoS implications then I'd argue
they need a different solution. (Which we could theoretically
implement, but it's not my intention to sign you up for that. :D )

The thread got quiet, and I'm nervous that I squashed it unintentionally. :/

Is there consensus on whether the backoff is useful, even without the
host tracking? (Or, alternatively, is the host tracking helpful in a
way I'm not seeing?) Failing those, is there a way forward that could
make it useful in the future?

I actually wrote more or less the same patch with rudimentary attacker
fingerprinting, and after some off-list discussion decided to abandon it for
the reasons discussed in this thread. It's unlikely to protect against the
attackers we wan't to protect the cluster against since they won't wait for the
delay anyways.

--
Daniel Gustafsson

#22Michael Banck
mbanck@gmx.net
In reply to: Daniel Gustafsson (#21)
Re: [PATCH] Exponential backoff for auth_delay

Hi,

On Wed, Mar 20, 2024 at 11:22:12PM +0100, Daniel Gustafsson wrote:

On 20 Mar 2024, at 22:21, Jacob Champion <jacob.champion@enterprisedb.com> wrote:

On Wed, Mar 20, 2024 at 2:15 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:

I think solutions for case 1 and case 2 are necessarily at odds under
the current design, if auth_delay relies on slot exhaustion to do its
work effectively. Weakening that on purpose doesn't make much sense to
me; if a DBA is uncomfortable with the DoS implications then I'd argue
they need a different solution. (Which we could theoretically
implement, but it's not my intention to sign you up for that. :D )

The thread got quiet, and I'm nervous that I squashed it unintentionally. :/

Is there consensus on whether the backoff is useful, even without the
host tracking? (Or, alternatively, is the host tracking helpful in a
way I'm not seeing?) Failing those, is there a way forward that could
make it useful in the future?

I actually wrote more or less the same patch with rudimentary attacker
fingerprinting, and after some off-list discussion decided to abandon it for
the reasons discussed in this thread. It's unlikely to protect against the
attackers we wan't to protect the cluster against since they won't wait for the
delay anyways.

I have marked the patch "Returned with Feedback" now. Maybe I will get
back to this for v18, but it was clearly not ready for v17.

Michael