dynamic background workers

Started by Robert Haas over 12 years ago · 17 messages
#1 Robert Haas
robertmhaas@gmail.com
2 attachment(s)

Parallel query, or any subset of that project such as parallel sort,
will require a way to start background workers on demand. Thanks to
Alvaro's work on 9.3, we now have the ability to configure background
workers via shared_preload_libraries. But if you don't have the right
library loaded at startup time, and subsequently wish to add a
background worker while the server is running, you are out of luck.
Even if you do have the right library loaded, but want to start
workers in response to user activity, rather than when the database
comes on-line, you are also out of luck. Relaxing these restrictions
is essential for parallel query (or parallel processing of any kind),
and useful apart from that. Two patches are attached.

The first patch, max-worker-processes-v1.patch, adds a new GUC
max_worker_processes, which defaults to 8. This fixes the problem
discussed here:

/messages/by-id/CA+TgmobguVO+qHnHvxBA2TFkDhw67Y=4Bp405FVABEc_EtO4VQ@mail.gmail.com

Apart from fixing that problem, it's a pretty boring patch.

The second patch, dynamic-bgworkers-v1.patch, revises the background
worker API to allow background workers to be started dynamically.
This requires some communication channel from ordinary workers to the
postmaster, because it is the postmaster that must ultimately start
the newly-registered workers. However, that communication channel has
to be designed pretty carefully, lest a shared memory corruption take
out the postmaster and lead to inadvertent failure to restart after a
crash. Here's how I implemented that: there's an array in shared
memory of a size equal to max_worker_processes. This array is
separate from the backend-private list of workers maintained by the
postmaster, but the two are kept in sync. When a new background
worker registration is added to the shared data structure, the backend
adding it uses the existing pmsignal mechanism to kick the postmaster,
which then scans the array for new registrations. I have attempted to
make the code that transfers the shared memory state into the
postmaster's private state as paranoid as humanly possible. The
precautions taken are documented in the comments. Conversely, when a
background worker flagged as BGW_NEVER_RESTART is considered for
restart (and we decide against it), the corresponding slot in the
shared memory array is marked as no longer in use, allowing it to be
reused for a new registration.

Since the postmaster cannot take locks, synchronization between the
postmaster and other backends using the shared memory segment has to
be lockless. This mechanism is also documented in the comments. An
lwlock is used to prevent two backends that are both registering a new
worker at about the same time from stomping on each other, but the
postmaster need not care about that lwlock.

This patch also extends worker_spi as a demonstration of the new
interface. With this patch, you can CREATE EXTENSION worker_spi and
then call worker_spi_launch(int4) to launch a new background worker,
or combine it with generate_series() to launch a bunch at once. Then
you can kill them off with pg_terminate_backend() and start some new
ones. That, in my humble opinion, is pretty cool.
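Based on the description above, the workflow would look roughly like this (exact function signature per the attached patch; the generate_series range is arbitrary):

```sql
CREATE EXTENSION worker_spi;

-- launch a single background worker
SELECT worker_spi_launch(1);

-- or launch several at once
SELECT worker_spi_launch(n) FROM generate_series(2, 5) AS n;

-- kill one off again by pid, as for any backend
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity WHERE query LIKE '%worker_spi%';
```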

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

max-worker-processes-v1.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c7d84b5..df4255d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1595,6 +1595,25 @@ include 'filename'
         </para>
        </listitem>
       </varlistentry>
+
+      <varlistentry id="guc-max-worker-processes" xreflabel="max_worker_processes">
+       <term><varname>max_worker_processes</varname> (<type>integer</type>)</term>
+       <indexterm>
+        <primary><varname>max_worker_processes</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+         Sets the maximum number of background processes that the system
+         can support.  This parameter can only be set at server start.
+        </para>
+
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
      </variablelist>
     </sect2>
    </sect1>
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 2bad527..7a2f4a9 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -117,8 +117,9 @@ xlog_desc(StringInfo buf, uint8 xl_info, char *rec)
 			}
 		}
 
-		appendStringInfo(buf, "parameter change: max_connections=%d max_prepared_xacts=%d max_locks_per_xact=%d wal_level=%s",
+		appendStringInfo(buf, "parameter change: max_connections=%d max_worker_processes=%d max_prepared_xacts=%d max_locks_per_xact=%d wal_level=%s",
 						 xlrec.MaxConnections,
+						 xlrec.max_worker_processes,
 						 xlrec.max_prepared_xacts,
 						 xlrec.max_locks_per_xact,
 						 wal_level_str);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 654c9c1..a6b7797 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4126,6 +4126,7 @@ BootStrapXLOG(void)
 
 	/* Set important parameter values for use when replaying WAL */
 	ControlFile->MaxConnections = MaxConnections;
+	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
@@ -4833,6 +4834,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_connections",
 									 MaxConnections,
 									 ControlFile->MaxConnections);
+		RecoveryRequiresIntParameter("max_worker_processes",
+									 max_worker_processes,
+									 ControlFile->max_worker_processes);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -7759,6 +7763,7 @@ XLogReportParameters(void)
 {
 	if (wal_level != ControlFile->wal_level ||
 		MaxConnections != ControlFile->MaxConnections ||
+		max_worker_processes != ControlFile->max_worker_processes ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact)
 	{
@@ -7775,6 +7780,7 @@ XLogReportParameters(void)
 			xl_parameter_change xlrec;
 
 			xlrec.MaxConnections = MaxConnections;
+			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
@@ -7788,6 +7794,7 @@ XLogReportParameters(void)
 		}
 
 		ControlFile->MaxConnections = MaxConnections;
+		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
@@ -8173,6 +8180,7 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
 
 		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 		ControlFile->MaxConnections = xlrec.MaxConnections;
+		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 87e6062..5c77743 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -397,7 +397,6 @@ static void reaper(SIGNAL_ARGS);
 static void sigusr1_handler(SIGNAL_ARGS);
 static void startup_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
-static int	GetNumRegisteredBackgroundWorkers(int flags);
 static void StartupPacketTimeoutHandler(void);
 static void CleanupBackend(int pid, int exitstatus);
 static bool CleanupBackgroundWorker(int pid, int exitstatus);
@@ -5132,7 +5131,7 @@ int
 MaxLivePostmasterChildren(void)
 {
 	return 2 * (MaxConnections + autovacuum_max_workers + 1 +
-				GetNumRegisteredBackgroundWorkers(0));
+				max_worker_processes);
 }
 
 /*
@@ -5146,7 +5145,6 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 {
 	RegisteredBgWorker *rw;
 	int			namelen = strlen(worker->bgw_name);
-	static int	maxworkers;
 	static int	numworkers = 0;
 
 #ifdef EXEC_BACKEND
@@ -5158,11 +5156,6 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 	static int	BackgroundWorkerCookie = 1;
 #endif
 
-	/* initialize upper limit on first call */
-	if (numworkers == 0)
-		maxworkers = MAX_BACKENDS -
-			(MaxConnections + autovacuum_max_workers + 1);
-
 	if (!IsUnderPostmaster)
 		ereport(LOG,
 			(errmsg("registering background worker: %s", worker->bgw_name)));
@@ -5218,17 +5211,17 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 	/*
 	 * Enforce maximum number of workers.  Note this is overly restrictive: we
 	 * could allow more non-shmem-connected workers, because these don't count
-	 * towards the MAX_BACKENDS limit elsewhere.  This doesn't really matter
-	 * for practical purposes; several million processes would need to run on
-	 * a single server.
+	 * towards the MAX_BACKENDS limit elsewhere.  For now, it doesn't seem
+	 * important to relax this restriction.
 	 */
-	if (++numworkers > maxworkers)
+	if (++numworkers > max_worker_processes)
 	{
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
 				 errmsg("too many background workers"),
 				 errdetail("Up to %d background workers can be registered with the current settings.",
-						   maxworkers)));
+						   max_worker_processes),
+				 errhint("Consider increasing the configuration parameter \"max_worker_processes\".")));
 		return;
 	}
 
@@ -5509,41 +5502,6 @@ do_start_bgworker(void)
 	proc_exit(0);
 }
 
-/*
- * Return the number of background workers registered that have at least
- * one of the passed flag bits set.
- */
-static int
-GetNumRegisteredBackgroundWorkers(int flags)
-{
-	slist_iter	iter;
-	int			count = 0;
-
-	slist_foreach(iter, &BackgroundWorkerList)
-	{
-		RegisteredBgWorker *rw;
-
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-
-		if (flags != 0 &&
-			!(rw->rw_worker.bgw_flags & flags))
-			continue;
-
-		count++;
-	}
-
-	return count;
-}
-
-/*
- * Return the number of bgworkers that need to have PGPROC entries.
- */
-int
-GetNumShmemAttachedBgworkers(void)
-{
-	return GetNumRegisteredBackgroundWorkers(BGWORKER_SHMEM_ACCESS);
-}
-
 #ifdef EXEC_BACKEND
 static pid_t
 bgworker_forkexec(int cookie)
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6d72a63..25bd528 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -140,10 +140,8 @@ ProcGlobalSemas(void)
  *	  running out when trying to start another backend is a common failure.
  *	  So, now we grab enough semaphores to support the desired max number
  *	  of backends immediately at initialization --- if the sysadmin has set
- *	  MaxConnections or autovacuum_max_workers higher than his kernel will
- *	  support, he'll find out sooner rather than later.  (The number of
- *	  background worker processes registered by loadable modules is also taken
- *	  into consideration.)
+ *	  MaxConnections, max_worker_processes, or autovacuum_max_workers higher
+ *	  than his kernel will support, he'll find out sooner rather than later.
  *
  *	  Another reason for creating semaphores here is that the semaphore
  *	  implementation typically requires us to create semaphores in the
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 9f51929..33efb3c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -109,6 +109,7 @@ int			maintenance_work_mem = 16384;
  */
 int			NBuffers = 1000;
 int			MaxConnections = 90;
+int			max_worker_processes = 8;
 int			MaxBackends = 0;
 
 int			VacuumCostPageHit = 1;		/* GUC parameters for vacuum */
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e0abff1..0ca40da 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -441,7 +441,7 @@ InitializeMaxBackends(void)
 
 	/* the extra unit accounts for the autovacuum launcher */
 	MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-		GetNumShmemAttachedBgworkers();
+		max_worker_processes;
 
 	/* internal error because the values were all checked previously */
 	if (MaxBackends > MAX_BACKENDS)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea16c64..670f75e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -189,6 +189,7 @@ static const char *show_tcp_keepalives_idle(void);
 static const char *show_tcp_keepalives_interval(void);
 static const char *show_tcp_keepalives_count(void);
 static bool check_maxconnections(int *newval, void **extra, GucSource source);
+static bool check_max_worker_processes(int *newval, void **extra, GucSource source);
 static bool check_autovacuum_max_workers(int *newval, void **extra, GucSource source);
 static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source);
 static void assign_effective_io_concurrency(int newval, void *extra);
@@ -2158,6 +2159,18 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"max_worker_processes",
+			PGC_POSTMASTER,
+			RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Maximum number of concurrent worker processes."),
+			NULL,
+		},
+		&max_worker_processes,
+		8, 1, MAX_BACKENDS,
+		check_max_worker_processes, NULL, NULL
+	},
+
+	{
 		{"log_rotation_age", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Automatic log file rotation will occur after N minutes."),
 			NULL,
@@ -8645,8 +8658,8 @@ show_tcp_keepalives_count(void)
 static bool
 check_maxconnections(int *newval, void **extra, GucSource source)
 {
-	if (*newval + GetNumShmemAttachedBgworkers() + autovacuum_max_workers + 1 >
-		MAX_BACKENDS)
+	if (*newval + autovacuum_max_workers + 1 +
+		max_worker_processes > MAX_BACKENDS)
 		return false;
 	return true;
 }
@@ -8654,8 +8667,15 @@ check_maxconnections(int *newval, void **extra, GucSource source)
 static bool
 check_autovacuum_max_workers(int *newval, void **extra, GucSource source)
 {
-	if (MaxConnections + *newval + 1 + GetNumShmemAttachedBgworkers() >
-		MAX_BACKENDS)
+	if (MaxConnections + *newval + 1 + max_worker_processes > MAX_BACKENDS)
+		return false;
+	return true;
+}
+
+static bool
+check_max_worker_processes(int *newval, void **extra, GucSource source)
+{
+	if (MaxConnections + autovacuum_max_workers + 1 + *newval > MAX_BACKENDS)
 		return false;
 	return true;
 }
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 0303ac7..3589767 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -152,6 +152,7 @@
 # - Asynchronous Behavior -
 
 #effective_io_concurrency = 1		# 1-1000; 0 disables prefetching
+#max_worker_processes = 8
 
 
 #------------------------------------------------------------------------------
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index a790f99..fde483a 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -260,6 +260,8 @@ main(int argc, char *argv[])
 		   wal_level_str(ControlFile.wal_level));
 	printf(_("Current max_connections setting:      %d\n"),
 		   ControlFile.MaxConnections);
+	printf(_("Current max_worker_processes setting: %d\n"),
+		   ControlFile.max_worker_processes);
 	printf(_("Current max_prepared_xacts setting:   %d\n"),
 		   ControlFile.max_prepared_xacts);
 	printf(_("Current max_locks_per_xact setting:   %d\n"),
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 82018e4..211a5a7 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -518,6 +518,7 @@ GuessControlValues(void)
 
 	ControlFile.wal_level = WAL_LEVEL_MINIMAL;
 	ControlFile.MaxConnections = 100;
+	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
@@ -664,6 +665,7 @@ RewriteControlFile(void)
 	 */
 	ControlFile.wal_level = WAL_LEVEL_MINIMAL;
 	ControlFile.MaxConnections = 100;
+	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index ee12d1a..c3e1731 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -55,7 +55,7 @@ typedef struct BkpBlock
 /*
  * Each page of XLOG file has a header like this:
  */
-#define XLOG_PAGE_MAGIC 0xD075	/* can be used as WAL version indicator */
+#define XLOG_PAGE_MAGIC 0xD076	/* can be used as WAL version indicator */
 
 typedef struct XLogPageHeaderData
 {
@@ -205,6 +205,7 @@ typedef XLogLongPageHeaderData *XLogLongPageHeader;
 typedef struct xl_parameter_change
 {
 	int			MaxConnections;
+	int			max_worker_processes;
 	int			max_prepared_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 4f154a9..91577de 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -172,6 +172,7 @@ typedef struct ControlFileData
 	 */
 	int			wal_level;
 	int			MaxConnections;
+	int			max_worker_processes;
 	int			max_prepared_xacts;
 	int			max_locks_per_xact;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index be3add9..48985b3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -141,6 +141,7 @@ extern PGDLLIMPORT char *DataDir;
 extern PGDLLIMPORT int NBuffers;
 extern int	MaxBackends;
 extern int	MaxConnections;
+extern int	max_worker_processes;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
dynamic-bgworkers-v1.patch (application/octet-stream)
diff --git a/contrib/worker_spi/Makefile b/contrib/worker_spi/Makefile
index edf4105..fbb29b4 100644
--- a/contrib/worker_spi/Makefile
+++ b/contrib/worker_spi/Makefile
@@ -2,6 +2,9 @@
 
 MODULES = worker_spi
 
+EXTENSION = worker_spi
+DATA = worker_spi--1.0.sql
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/worker_spi/worker_spi--1.0.sql b/contrib/worker_spi/worker_spi--1.0.sql
new file mode 100644
index 0000000..a56b42c
--- /dev/null
+++ b/contrib/worker_spi/worker_spi--1.0.sql
@@ -0,0 +1,9 @@
+/* contrib/worker_spi/worker_spi--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION worker_spi" to load this file. \quit
+
+CREATE FUNCTION worker_spi_launch(pg_catalog.int4)
+RETURNS pg_catalog.bool STRICT
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
diff --git a/contrib/worker_spi/worker_spi.c b/contrib/worker_spi/worker_spi.c
index 414721a..4152aec 100644
--- a/contrib/worker_spi/worker_spi.c
+++ b/contrib/worker_spi/worker_spi.c
@@ -42,8 +42,11 @@
 #include "tcop/utility.h"
 
 PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(worker_spi_launch);
 
 void		_PG_init(void);
+void		worker_spi_main(Datum);
+Datum		worker_spi_launch(PG_FUNCTION_ARGS);
 
 /* flags set by signal handlers */
 static volatile sig_atomic_t got_sighup = false;
@@ -153,11 +156,18 @@ initialize_worker_spi(worktable *table)
 	pgstat_report_activity(STATE_IDLE, NULL);
 }
 
-static void
-worker_spi_main(void *main_arg)
+void
+worker_spi_main(Datum main_arg)
 {
-	worktable  *table = (worktable *) main_arg;
+	int			index = DatumGetInt32(main_arg);
+	worktable  *table;
 	StringInfoData buf;
+	char		name[20];
+
+	table = palloc(sizeof(worktable));
+	sprintf(name, "schema%d", index);
+	table->schema = pstrdup(name);
+	table->name = pstrdup("counted");
 
 	/* We're now ready to receive signals */
 	BackgroundWorkerUnblockSignals();
@@ -279,7 +289,7 @@ worker_spi_main(void *main_arg)
 		pgstat_report_activity(STATE_IDLE, NULL);
 	}
 
-	proc_exit(0);
+	proc_exit(1);
 }
 
 /*
@@ -292,9 +302,7 @@ void
 _PG_init(void)
 {
 	BackgroundWorker worker;
-	worktable  *table;
 	unsigned int i;
-	char		name[20];
 
 	/* get the configuration */
 	DefineCustomIntVariable("worker_spi.naptime",
@@ -309,6 +317,10 @@ _PG_init(void)
 							NULL,
 							NULL,
 							NULL);
+
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
 	DefineCustomIntVariable("worker_spi.total_workers",
 							"Number of workers.",
 							NULL,
@@ -336,15 +348,33 @@ _PG_init(void)
 	 */
 	for (i = 1; i <= worker_spi_total_workers; i++)
 	{
-		sprintf(name, "worker %d", i);
-		worker.bgw_name = pstrdup(name);
-
-		table = palloc(sizeof(worktable));
-		sprintf(name, "schema%d", i);
-		table->schema = pstrdup(name);
-		table->name = pstrdup("counted");
-		worker.bgw_main_arg = (void *) table;
+		snprintf(worker.bgw_name, BGW_MAXLEN, "worker %d", i);
+		worker.bgw_main_arg = Int32GetDatum(i);
 
 		RegisterBackgroundWorker(&worker);
 	}
 }
+
+/*
+ * Dynamically launch an SPI worker.
+ */
+Datum
+worker_spi_launch(PG_FUNCTION_ARGS)
+{
+	int32		i = PG_GETARG_INT32(0);
+	BackgroundWorker worker;
+
+	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	worker.bgw_main = NULL;		/* new worker might not have library loaded */
+	sprintf(worker.bgw_library_name, "worker_spi");
+	sprintf(worker.bgw_function_name, "worker_spi_main");
+	worker.bgw_sighup = worker_spi_sighup;
+	worker.bgw_sigterm = worker_spi_sigterm;
+	snprintf(worker.bgw_name, BGW_MAXLEN, "worker %d", i);
+	worker.bgw_main_arg = Int32GetDatum(i);
+
+	PG_RETURN_BOOL(RegisterDynamicBackgroundWorker(&worker));
+}
diff --git a/contrib/worker_spi/worker_spi.control b/contrib/worker_spi/worker_spi.control
new file mode 100644
index 0000000..84d6294
--- /dev/null
+++ b/contrib/worker_spi/worker_spi.control
@@ -0,0 +1,5 @@
+# worker_spi extension
+comment = 'Sample background worker'
+default_version = '1.0'
+module_pathname = '$libdir/worker_spi'
+relocatable = true
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3056b09..71c2321 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgwriter.o fork_process.o pgarch.o pgstat.o postmaster.o \
-	startup.o syslogger.o walwriter.o checkpointer.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
+	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
new file mode 100644
index 0000000..10aac7b
--- /dev/null
+++ b/src/backend/postmaster/bgworker.c
@@ -0,0 +1,481 @@
+/*--------------------------------------------------------------------
+ * bgworker.c
+ *		POSTGRES pluggable background workers implementation
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/bgworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "postmaster/bgworker_internals.h"
+#include "storage/barrier.h"
+#include "storage/lwlock.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "utils/ascii.h"
+
+/*
+ * The postmaster's list of registered background workers, in private memory.
+ */
+slist_head BackgroundWorkerList = SLIST_STATIC_INIT(BackgroundWorkerList);
+
+/*
+ * BackgroundWorkerSlots exist in shared memory and can be accessed (via
+ * the BackgroundWorkerArray) by both the postmaster and by regular backends.
+ * However, the postmaster cannot take locks, even spinlocks, because this
+ * might allow it to crash or become wedged if shared memory gets corrupted.
+ * Such an outcome is intolerable.  Therefore, we need a lockless protocol
+ * for coordinating access to this data.
+ *
+ * The 'in_use' flag is used to hand off responsibility for the slot between
+ * the postmaster and the rest of the system.  When 'in_use' is false,
+ * the postmaster will ignore the slot entirely, except for the 'in_use' flag
+ * itself, which it may read.  In this state, regular backends may modify the
+ * slot.  Once a backend sets 'in_use' to true, the slot becomes the
+ * responsibility of the postmaster.  Regular backends may no longer modify it,
+ * but the postmaster may examine it.  Thus, a backend initializing a slot
+ * must fully initialize the slot - and insert a write memory barrier - before
+ * marking it as in use.
+ *
+ * In addition to coordinating with the postmaster, backends modifying this
+ * data structure must coordinate with each other.  Since they can take locks,
+ * this is straightforward: any backend wishing to manipulate a slot must
+ * take BackgroundWorkerLock in exclusive mode.  Backends wishing to read
+ * data that might get concurrently modified by other backends should take
+ * this lock in shared mode.  No matter what, backends reading this data
+ * structure must be able to tolerate concurrent modifications by the
+ * postmaster.
+ */
+typedef struct BackgroundWorkerSlot
+{
+	bool	in_use;
+	BackgroundWorker worker;
+} BackgroundWorkerSlot;
+
+typedef struct BackgroundWorkerArray
+{
+	int		total_slots;
+	BackgroundWorkerSlot slot[FLEXIBLE_ARRAY_MEMBER];
+} BackgroundWorkerArray;
+
+BackgroundWorkerArray *BackgroundWorkerData;
+
+/*
+ * Calculate shared memory needed.
+ */
+Size
+BackgroundWorkerShmemSize(void)
+{
+	Size		size;
+
+	/* Array of workers is variably sized. */
+	size = offsetof(BackgroundWorkerArray, slot);
+	size = add_size(size, mul_size(max_worker_processes,
+								   sizeof(BackgroundWorkerSlot)));
+
+	return size;
+}
+
+/*
+ * Initialize shared memory.
+ */
+void
+BackgroundWorkerShmemInit(void)
+{
+	bool		found;
+
+	BackgroundWorkerData = ShmemInitStruct("Background Worker Data",
+										   BackgroundWorkerShmemSize(),
+										   &found);
+	if (!IsUnderPostmaster)
+	{
+		slist_iter	siter;
+		int			slotno = 0;
+
+		BackgroundWorkerData->total_slots = max_worker_processes;
+
+		/*
+		 * Copy contents of worker list into shared memory.  Record the
+		 * shared memory slot assigned to each worker.  This ensures
+		 * a 1-to-1 correspondence between the postmaster's private list and
+		 * the array in shared memory.
+		 */
+		slist_foreach(siter, &BackgroundWorkerList)
+		{
+			BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
+			RegisteredBgWorker *rw;
+
+			rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+			Assert(slotno < max_worker_processes);
+			slot->in_use = true;
+			rw->rw_shmem_slot = slotno;
+			memcpy(&slot->worker, &rw->rw_worker, sizeof(BackgroundWorker));
+			++slotno;
+		}
+
+		/*
+		 * Mark any remaining slots as not in use.
+		 */
+		while (slotno < max_worker_processes)
+		{
+			BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
+
+			slot->in_use = false;
+			++slotno;
+		}
+	}
+	else
+		Assert(found);
+}
+
+static RegisteredBgWorker *
+FindRegisteredWorkerBySlotNumber(int slotno)
+{
+	slist_iter	siter;
+
+	/*
+	 * Search the postmaster's private list of registered background
+	 * workers for the one assigned to the given shared memory slot.
+	 */
+	slist_foreach(siter, &BackgroundWorkerList)
+	{
+		RegisteredBgWorker *rw;
+
+		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		if (rw->rw_shmem_slot == slotno)
+			return rw;
+	}
+
+	return NULL;
+}
+
+/*
+ * Notice changes to shared_memory made by other backends.  This code
+ * runs in the postmaster, so we must be very careful not to assume that
+ * shared memory contents are sane.  Otherwise, a rogue backend could take
+ * out the postmaster.
+ */
+void
+BackgroundWorkerStateChange(void)
+{
+	int		slotno;
+
+	/*
+	 * The total number of slots stored in shared memory should match our
+	 * notion of max_worker_processes.  If it does not, something is very
+	 * wrong.  Further down, we always refer to this value as
+	 * max_worker_processes, in case shared memory gets corrupted while
+	 * we're looping.
+	 */
+	if (max_worker_processes != BackgroundWorkerData->total_slots)
+	{
+		elog(LOG,
+			 "inconsistent background worker state (max_worker_processes=%d, total_slots=%d)",
+			max_worker_processes,
+			BackgroundWorkerData->total_slots);
+		return;
+	}
+
+	/*
+	 * Iterate through slots, looking for newly-registered workers or
+	 * workers who must die.
+	 */
+	for (slotno = 0; slotno < max_worker_processes; ++slotno)
+	{
+		BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
+		RegisteredBgWorker *rw;
+
+		if (!slot->in_use)
+			continue;
+
+		/*
+		 * Make sure we don't see the in_use flag before the updated slot
+		 * contents.
+		 */
+		pg_read_barrier();
+
+		/*
+		 * See whether we already know about this worker.  If not, we need
+		 * to update our backend-private BackgroundWorkerList to match shared
+		 * memory.
+		 */
+		rw = FindRegisteredWorkerBySlotNumber(slotno);
+		if (rw != NULL)
+			continue;
+
+		/*
+		 * Copy the registration data into the registered workers list.
+		 */
+		rw = malloc(sizeof(RegisteredBgWorker));
+		if (rw == NULL)
+		{
+			ereport(LOG,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+			return;
+		}
+
+		/*
+		 * Copy strings in a paranoid way.  If shared memory is corrupted,
+		 * the source data might not even be NUL-terminated.
+		 */
+		ascii_safe_strlcpy(rw->rw_worker.bgw_name,
+						   slot->worker.bgw_name, BGW_MAXLEN);
+		ascii_safe_strlcpy(rw->rw_worker.bgw_library_name,
+						   slot->worker.bgw_library_name, BGW_MAXLEN);
+		ascii_safe_strlcpy(rw->rw_worker.bgw_function_name,
+						   slot->worker.bgw_function_name, BGW_MAXLEN);
+
+		/*
+		 * Copy remaining fields.
+		 *
+		 * flags, start_time, and restart_time are examined by the
+		 * postmaster, but nothing too bad will happen if they are
+		 * corrupted.  The remaining fields will only be examined by the
+		 * child process.  It might crash, but we won't.
+		 */
+		rw->rw_worker.bgw_flags = slot->worker.bgw_flags;
+		rw->rw_worker.bgw_start_time = slot->worker.bgw_start_time;
+		rw->rw_worker.bgw_restart_time = slot->worker.bgw_restart_time;
+		rw->rw_worker.bgw_main = slot->worker.bgw_main;
+		rw->rw_worker.bgw_main_arg = slot->worker.bgw_main_arg;
+		rw->rw_worker.bgw_sighup = slot->worker.bgw_sighup;
+		rw->rw_worker.bgw_sigterm = slot->worker.bgw_sigterm;
+
+		/* Initialize postmaster bookkeeping. */
+		rw->rw_backend = NULL;
+		rw->rw_pid = 0;
+		rw->rw_child_slot = 0;
+		rw->rw_crashed_at = 0;
+		rw->rw_shmem_slot = slotno;
+
+		/* Log it! */
+		ereport(LOG,
+				(errmsg("registering background worker: %s",
+					rw->rw_worker.bgw_name)));
+
+		slist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
+	}
+}
+
+/*
+ * Forget about a background worker that's no longer needed.
+ *
+ * At present, this only happens when a background worker marked
+ * BGW_NEVER_RESTART exits.  This function should only be invoked in
+ * the postmaster.
+ */
+void
+ForgetBackgroundWorker(RegisteredBgWorker *rw)
+{
+	BackgroundWorkerSlot *slot;
+
+	Assert(rw->rw_shmem_slot < max_worker_processes);
+	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
+	slot->in_use = false;
+
+	ereport(LOG,
+			(errmsg("unregistering background worker: %s",
+				rw->rw_worker.bgw_name)));
+
+	slist_delete(&BackgroundWorkerList, &rw->rw_lnode);
+	free(rw);
+}
+
+#ifdef EXEC_BACKEND
+/*
+ * In EXEC_BACKEND mode, workers use this to retrieve their details from
+ * shared memory.
+ */
+BackgroundWorker *
+BackgroundWorkerEntry(int slotno)
+{
+	BackgroundWorkerSlot *slot;
+
+	Assert(slotno < BackgroundWorkerData->total_slots);
+	slot = &BackgroundWorkerData->slot[slotno];
+	Assert(slot->in_use);
+	return &slot->worker;		/* can't become free while we're still here */
+}
+#endif
+
+/*
+ * Complain about the BackgroundWorker definition using error level elevel.
+ * Return true if it looks ok, false if not (unless elevel >= ERROR, in
+ * which case we won't return at all in the not-OK case).
+ */
+static bool
+SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
+{
+	/* sanity check for flags */
+	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
+	{
+		if (!(worker->bgw_flags & BGWORKER_SHMEM_ACCESS))
+		{
+			ereport(elevel,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("background worker \"%s\": must attach to shared memory in order to request a database connection",
+							worker->bgw_name)));
+			return false;
+		}
+
+		if (worker->bgw_start_time == BgWorkerStart_PostmasterStart)
+		{
+			ereport(elevel,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("background worker \"%s\": cannot request database access if starting at postmaster start",
+							worker->bgw_name)));
+			return false;
+		}
+
+		/* XXX other checks? */
+	}
+
+	if ((worker->bgw_restart_time < 0 &&
+		 worker->bgw_restart_time != BGW_NEVER_RESTART) ||
+		(worker->bgw_restart_time > USECS_PER_DAY / 1000))
+	{
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("background worker \"%s\": invalid restart interval",
+						worker->bgw_name)));
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * Register a new background worker while processing shared_preload_libraries.
+ *
+ * This can only be called in the _PG_init function of a module library
+ * that's loaded by shared_preload_libraries; otherwise it has no effect.
+ */
+void
+RegisterBackgroundWorker(BackgroundWorker *worker)
+{
+	RegisteredBgWorker *rw;
+	static int	numworkers = 0;
+
+	if (!IsUnderPostmaster)
+		ereport(LOG,
+			(errmsg("registering background worker: %s", worker->bgw_name)));
+
+	if (!process_shared_preload_libraries_in_progress)
+	{
+		if (!IsUnderPostmaster)
+			ereport(LOG,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("background worker \"%s\": must be registered in shared_preload_libraries",
+							worker->bgw_name)));
+		return;
+	}
+
+	if (!SanityCheckBackgroundWorker(worker, LOG))
+		return;
+
+	/*
+	 * Enforce maximum number of workers.  Note this is overly restrictive: we
+	 * could allow more non-shmem-connected workers, because these don't count
+	 * towards the MAX_BACKENDS limit elsewhere.  For now, it doesn't seem
+	 * important to relax this restriction.
+	 */
+	if (++numworkers > max_worker_processes)
+	{
+		ereport(LOG,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("too many background workers"),
+				 errdetail("Up to %d background workers can be registered with the current settings.",
+						   max_worker_processes),
+				 errhint("Consider increasing the configuration parameter \"max_worker_processes\".")));
+		return;
+	}
+
+	/*
+	 * Copy the registration data into the registered workers list.
+	 */
+	rw = malloc(sizeof(RegisteredBgWorker));
+	if (rw == NULL)
+	{
+		ereport(LOG,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+		return;
+	}
+
+	rw->rw_worker = *worker;
+	rw->rw_backend = NULL;
+	rw->rw_pid = 0;
+	rw->rw_child_slot = 0;
+	rw->rw_crashed_at = 0;
+
+	slist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
+}
+
+/*
+ * Register a new background worker from a regular backend.
+ *
+ * Returns true on success and false on failure.  Failure typically indicates
+ * that no background worker slots are currently available.
+ */
+bool
+RegisterDynamicBackgroundWorker(BackgroundWorker *worker)
+{
+	int		slotno;
+	bool	success = false;
+
+	/*
+	 * We can't register dynamic background workers from the postmaster.
+	 * If this is a standalone backend, we're the only process and can't
+	 * start any more.  In a multi-process environment, it might be
+	 * theoretically possible, but we don't currently support it due to
+	 * locking considerations; see comments on the BackgroundWorkerSlot
+	 * data structure.
+	 */
+	if (!IsUnderPostmaster)
+		return false;
+
+	if (!SanityCheckBackgroundWorker(worker, ERROR))
+		return false;
+
+	LWLockAcquire(BackgroundWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * Look for an unused slot.  If we find one, grab it.
+	 */
+	for (slotno = 0; slotno < BackgroundWorkerData->total_slots; ++slotno)
+	{
+		BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
+
+		if (!slot->in_use)
+		{
+			memcpy(&slot->worker, worker, sizeof(BackgroundWorker));
+
+			/*
+			 * Make sure postmaster doesn't see the slot as in use before
+			 * it sees the new contents.
+			 */
+			pg_write_barrier();
+
+			slot->in_use = true;
+			success = true;
+			break;
+		}
+	}
+
+	LWLockRelease(BackgroundWorkerLock);
+
+	/* If we found a slot, tell the postmaster to notice the change. */
+	if (success)
+		SendPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE);
+
+	return success;
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5c77743..4e65509 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -103,7 +103,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
-#include "postmaster/bgworker.h"
+#include "postmaster/bgworker_internals.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
@@ -117,6 +117,7 @@
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/datetime.h"
+#include "utils/dynamic_loader.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -178,29 +179,6 @@ static dlist_head BackendList = DLIST_STATIC_INIT(BackendList);
 static Backend *ShmemBackendArray;
 #endif
 
-
-/*
- * List of background workers.
- *
- * A worker that requests a database connection during registration will have
- * rw_backend set, and will be present in BackendList.	Note: do not rely on
- * rw_backend being non-NULL for shmem-connected workers!
- */
-typedef struct RegisteredBgWorker
-{
-	BackgroundWorker rw_worker; /* its registry entry */
-	Backend    *rw_backend;		/* its BackendList entry, or NULL */
-	pid_t		rw_pid;			/* 0 if not running */
-	int			rw_child_slot;
-	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
-#ifdef EXEC_BACKEND
-	int			rw_cookie;
-#endif
-	slist_node	rw_lnode;		/* list link */
-} RegisteredBgWorker;
-
-static slist_head BackgroundWorkerList = SLIST_STATIC_INIT(BackgroundWorkerList);
-
 BackgroundWorker *MyBgworkerEntry = NULL;
 
 
@@ -526,8 +504,6 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 
 static void ShmemBackendArrayAdd(Backend *bn);
 static void ShmemBackendArrayRemove(Backend *bn);
-
-static BackgroundWorker *find_bgworker_entry(int cookie);
 #endif   /* EXEC_BACKEND */
 
 #define StartupDataBase()		StartChildProcess(StartupProcess)
@@ -1440,7 +1416,7 @@ DetermineSleepTime(struct timeval * timeout)
 
 	if (HaveCrashedWorker)
 	{
-		slist_iter	siter;
+		slist_mutable_iter	siter;
 
 		/*
 		 * When there are crashed bgworkers, we sleep just long enough that
@@ -1448,7 +1424,7 @@ DetermineSleepTime(struct timeval * timeout)
 		 * determine the minimum of all wakeup times according to most recent
 		 * crash time and requested restart interval.
 		 */
-		slist_foreach(siter, &BackgroundWorkerList)
+		slist_foreach_modify(siter, &BackgroundWorkerList)
 		{
 			RegisteredBgWorker *rw;
 			TimestampTz this_wakeup;
@@ -1459,7 +1435,10 @@ DetermineSleepTime(struct timeval * timeout)
 				continue;
 
 			if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
+			{
+				ForgetBackgroundWorker(rw);
 				continue;
+			}
 
 			this_wakeup = TimestampTzPlusMilliseconds(rw->rw_crashed_at,
 									 1000L * rw->rw_worker.bgw_restart_time);
@@ -4540,7 +4519,7 @@ SubPostmasterMain(int argc, char *argv[])
 	}
 	if (strncmp(argv[1], "--forkbgworker=", 15) == 0)
 	{
-		int			cookie;
+		int			shmem_slot;
 
 		/* Close the postmaster's sockets */
 		ClosePostmasterPorts(false);
@@ -4554,8 +4533,8 @@ SubPostmasterMain(int argc, char *argv[])
 		/* Attach process to shared data structures */
 		CreateSharedMemoryAndSemaphores(false, 0);
 
-		cookie = atoi(argv[1] + 15);
-		MyBgworkerEntry = find_bgworker_entry(cookie);
+		shmem_slot = atoi(argv[1] + 15);
+		MyBgworkerEntry = BackgroundWorkerEntry(shmem_slot);
 		do_start_bgworker();
 	}
 	if (strcmp(argv[1], "--forkarch") == 0)
@@ -4618,9 +4597,17 @@ static void
 sigusr1_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
+	bool		start_bgworker = false;
 
 	PG_SETMASK(&BlockSig);
 
+	/* Process background worker state change. */
+	if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
+	{
+		BackgroundWorkerStateChange();
+		start_bgworker = true;
+	}
+
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
 	 * unexpected states. If the startup process quickly starts up, completes
@@ -4657,11 +4644,13 @@ sigusr1_handler(SIGNAL_ARGS)
 		(errmsg("database system is ready to accept read only connections")));
 
 		pmState = PM_HOT_STANDBY;
-
 		/* Some workers may be scheduled to start now */
-		StartOneBackgroundWorker();
+		start_bgworker = true;
 	}
 
+	if (start_bgworker)
+		StartOneBackgroundWorker();
+
 	if (CheckPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) &&
 		PgArchPID != 0)
 	{
@@ -5135,124 +5124,6 @@ MaxLivePostmasterChildren(void)
 }
 
 /*
- * Register a new background worker.
- *
- * This can only be called in the _PG_init function of a module library
- * that's loaded by shared_preload_libraries; otherwise it has no effect.
- */
-void
-RegisterBackgroundWorker(BackgroundWorker *worker)
-{
-	RegisteredBgWorker *rw;
-	int			namelen = strlen(worker->bgw_name);
-	static int	numworkers = 0;
-
-#ifdef EXEC_BACKEND
-
-	/*
-	 * Use 1 here, not 0, to avoid confusing a possible bogus cookie read by
-	 * atoi() in SubPostmasterMain.
-	 */
-	static int	BackgroundWorkerCookie = 1;
-#endif
-
-	if (!IsUnderPostmaster)
-		ereport(LOG,
-			(errmsg("registering background worker: %s", worker->bgw_name)));
-
-	if (!process_shared_preload_libraries_in_progress)
-	{
-		if (!IsUnderPostmaster)
-			ereport(LOG,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("background worker \"%s\": must be registered in shared_preload_libraries",
-							worker->bgw_name)));
-		return;
-	}
-
-	/* sanity check for flags */
-	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
-	{
-		if (!(worker->bgw_flags & BGWORKER_SHMEM_ACCESS))
-		{
-			if (!IsUnderPostmaster)
-				ereport(LOG,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("background worker \"%s\": must attach to shared memory in order to request a database connection",
-								worker->bgw_name)));
-			return;
-		}
-
-		if (worker->bgw_start_time == BgWorkerStart_PostmasterStart)
-		{
-			if (!IsUnderPostmaster)
-				ereport(LOG,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("background worker \"%s\": cannot request database access if starting at postmaster start",
-								worker->bgw_name)));
-			return;
-		}
-
-		/* XXX other checks? */
-	}
-
-	if ((worker->bgw_restart_time < 0 &&
-		 worker->bgw_restart_time != BGW_NEVER_RESTART) ||
-		(worker->bgw_restart_time > USECS_PER_DAY / 1000))
-	{
-		if (!IsUnderPostmaster)
-			ereport(LOG,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("background worker \"%s\": invalid restart interval",
-						worker->bgw_name)));
-		return;
-	}
-
-	/*
-	 * Enforce maximum number of workers.  Note this is overly restrictive: we
-	 * could allow more non-shmem-connected workers, because these don't count
-	 * towards the MAX_BACKENDS limit elsewhere.  For now, it doesn't seem
-	 * important to relax this restriction.
-	 */
-	if (++numworkers > max_worker_processes)
-	{
-		ereport(LOG,
-				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
-				 errmsg("too many background workers"),
-				 errdetail("Up to %d background workers can be registered with the current settings.",
-						   max_worker_processes),
-				 errhint("Consider increasing the configuration parameter \"max_worker_processes\".")));
-		return;
-	}
-
-	/*
-	 * Copy the registration data into the registered workers list.
-	 */
-	rw = malloc(sizeof(RegisteredBgWorker) + namelen + 1);
-	if (rw == NULL)
-	{
-		ereport(LOG,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-		return;
-	}
-
-	rw->rw_worker = *worker;
-	rw->rw_worker.bgw_name = ((char *) rw) + sizeof(RegisteredBgWorker);
-	strlcpy(rw->rw_worker.bgw_name, worker->bgw_name, namelen + 1);
-
-	rw->rw_backend = NULL;
-	rw->rw_pid = 0;
-	rw->rw_child_slot = 0;
-	rw->rw_crashed_at = 0;
-#ifdef EXEC_BACKEND
-	rw->rw_cookie = BackgroundWorkerCookie++;
-#endif
-
-	slist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
-}
-
-/*
  * Connect background worker to a database.
  */
 void
@@ -5290,25 +5161,6 @@ BackgroundWorkerUnblockSignals(void)
 	PG_SETMASK(&UnBlockSig);
 }
 
-#ifdef EXEC_BACKEND
-static BackgroundWorker *
-find_bgworker_entry(int cookie)
-{
-	slist_iter	iter;
-
-	slist_foreach(iter, &BackgroundWorkerList)
-	{
-		RegisteredBgWorker *rw;
-
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-		if (rw->rw_cookie == cookie)
-			return &rw->rw_worker;
-	}
-
-	return NULL;
-}
-#endif
-
 static void
 bgworker_quickdie(SIGNAL_ARGS)
 {
@@ -5371,6 +5223,7 @@ do_start_bgworker(void)
 	sigjmp_buf	local_sigjmp_buf;
 	char		buf[MAXPGPATH];
 	BackgroundWorker *worker = MyBgworkerEntry;
+	bgworker_main_type entrypt;
 
 	if (worker == NULL)
 		elog(FATAL, "unable to find bgworker entry");
@@ -5487,6 +5340,23 @@ do_start_bgworker(void)
 #endif
 
 	/*
+	 * If bgw_main is set, we use that value as the initial entrypoint.
+	 * However, if the library containing the entrypoint wasn't loaded at
+	 * postmaster startup time, passing it as a direct function pointer is
+	 * not possible.  To work around that, we allow callers for whom a
+	 * function pointer is not available to pass a library name (which will
+	 * be loaded, if necessary) and a function name (which will be looked up
+	 * in the named library).
+	 */
+	if (worker->bgw_main != NULL)
+		entrypt = worker->bgw_main;
+	else
+		entrypt = (bgworker_main_type)
+			load_external_function(worker->bgw_library_name,
+								   worker->bgw_function_name,
+								   true, NULL);
+
+	/*
 	 * Note that in normal processes, we would call InitPostgres here.	For a
 	 * worker, however, we don't know what database to connect to, yet; so we
 	 * need to wait until the user code does it via
@@ -5496,7 +5366,7 @@ do_start_bgworker(void)
 	/*
 	 * Now invoke the user-defined worker code
 	 */
-	worker->bgw_main(worker->bgw_main_arg);
+	entrypt(worker->bgw_main_arg);
 
 	/* ... and if it returns, we're done */
 	proc_exit(0);
@@ -5504,13 +5374,13 @@ do_start_bgworker(void)
 
 #ifdef EXEC_BACKEND
 static pid_t
-bgworker_forkexec(int cookie)
+bgworker_forkexec(int shmem_slot)
 {
 	char	   *av[10];
 	int			ac = 0;
 	char		forkav[MAXPGPATH];
 
-	snprintf(forkav, MAXPGPATH, "--forkbgworker=%d", cookie);
+	snprintf(forkav, MAXPGPATH, "--forkbgworker=%d", shmem_slot);
 
 	av[ac++] = "postgres";
 	av[ac++] = forkav;
@@ -5539,7 +5409,7 @@ start_bgworker(RegisteredBgWorker *rw)
 					rw->rw_worker.bgw_name)));
 
 #ifdef EXEC_BACKEND
-	switch ((worker_pid = bgworker_forkexec(rw->rw_cookie)))
+	switch ((worker_pid = bgworker_forkexec(rw->rw_shmem_slot)))
 #else
 	switch ((worker_pid = fork_process()))
 #endif
@@ -5667,7 +5537,7 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
 static void
 StartOneBackgroundWorker(void)
 {
-	slist_iter	iter;
+	slist_mutable_iter	iter;
 	TimestampTz now = 0;
 
 	if (FatalError)
@@ -5679,7 +5549,7 @@ StartOneBackgroundWorker(void)
 
 	HaveCrashedWorker = false;
 
-	slist_foreach(iter, &BackgroundWorkerList)
+	slist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
@@ -5699,7 +5569,10 @@ StartOneBackgroundWorker(void)
 		if (rw->rw_crashed_at != 0)
 		{
 			if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
+			{
+				ForgetBackgroundWorker(rw);
 				continue;
+			}
 
 			if (now == 0)
 				now = GetCurrentTimestamp();
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b34ba44..a0b741b 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -24,6 +24,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
+#include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
 #include "postmaster/postmaster.h"
 #include "replication/walreceiver.h"
@@ -113,6 +114,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, CLOGShmemSize());
 		size = add_size(size, SUBTRANSShmemSize());
 		size = add_size(size, TwoPhaseShmemSize());
+		size = add_size(size, BackgroundWorkerShmemSize());
 		size = add_size(size, MultiXactShmemSize());
 		size = add_size(size, LWLockShmemSize());
 		size = add_size(size, ProcArrayShmemSize());
@@ -214,6 +216,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	CreateSharedProcArray();
 	CreateSharedBackendStatus();
 	TwoPhaseShmemInit();
+	BackgroundWorkerShmemInit();
 
 	/*
 	 * Set up shared-inval messaging
diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index 5316705..794eb39 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -52,7 +52,7 @@
 #define BGWORKER_BACKEND_DATABASE_CONNECTION		0x0002
 
 
-typedef void (*bgworker_main_type) (void *main_arg);
+typedef void (*bgworker_main_type) (Datum main_arg);
 typedef void (*bgworker_sighdlr_type) (SIGNAL_ARGS);
 
 /*
@@ -67,22 +67,28 @@ typedef enum
 
 #define BGW_DEFAULT_RESTART_INTERVAL	60
 #define BGW_NEVER_RESTART				-1
+#define BGW_MAXLEN						64
 
 typedef struct BackgroundWorker
 {
-	char	   *bgw_name;
+	char	    bgw_name[BGW_MAXLEN];
 	int			bgw_flags;
 	BgWorkerStartTime bgw_start_time;
 	int			bgw_restart_time;		/* in seconds, or BGW_NEVER_RESTART */
 	bgworker_main_type bgw_main;
-	void	   *bgw_main_arg;
+	char		bgw_library_name[BGW_MAXLEN];	/* only if bgw_main is NULL */
+	char		bgw_function_name[BGW_MAXLEN];	/* only if bgw_main is NULL */
+	Datum		bgw_main_arg;
 	bgworker_sighdlr_type bgw_sighup;
 	bgworker_sighdlr_type bgw_sigterm;
 } BackgroundWorker;
 
-/* Register a new bgworker */
+/* Register a new bgworker during shared_preload_libraries */
 extern void RegisterBackgroundWorker(BackgroundWorker *worker);
 
+/* Register a new bgworker from a regular backend */
+extern bool RegisterDynamicBackgroundWorker(BackgroundWorker *worker);
+
 /* This is valid in a running worker */
 extern BackgroundWorker *MyBgworkerEntry;
 
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
new file mode 100644
index 0000000..6484cfb
--- /dev/null
+++ b/src/include/postmaster/bgworker_internals.h
@@ -0,0 +1,48 @@
+/*--------------------------------------------------------------------
+ * bgworker_internals.h
+ *		POSTGRES pluggable background workers internals
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		src/include/postmaster/bgworker_internals.h
+ *--------------------------------------------------------------------
+ */
+#ifndef BGWORKER_INTERNALS_H
+#define BGWORKER_INTERNALS_H
+
+#include "datatype/timestamp.h"
+#include "lib/ilist.h"
+#include "postmaster/bgworker.h"
+
+/*
+ * List of background workers, private to postmaster.
+ *
+ * A worker that requests a database connection during registration will have
+ * rw_backend set, and will be present in BackendList.	Note: do not rely on
+ * rw_backend being non-NULL for shmem-connected workers!
+ */
+typedef struct RegisteredBgWorker
+{
+	BackgroundWorker rw_worker; /* its registry entry */
+	struct bkend *rw_backend;		/* its BackendList entry, or NULL */
+	pid_t		rw_pid;			/* 0 if not running */
+	int			rw_child_slot;
+	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
+	int			rw_shmem_slot;
+	slist_node	rw_lnode;		/* list link */
+} RegisteredBgWorker;
+
+extern slist_head BackgroundWorkerList;
+
+extern Size BackgroundWorkerShmemSize(void);
+extern void BackgroundWorkerShmemInit(void);
+extern void BackgroundWorkerStateChange(void);
+extern void ForgetBackgroundWorker(RegisteredBgWorker *);
+
+#ifdef EXEC_BACKEND
+extern BackgroundWorker *BackgroundWorkerEntry(int slotno);
+#endif
+
+#endif   /* BGWORKER_INTERNALS_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index d8f7e9d..ce9df9b 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -79,6 +79,7 @@ typedef enum LWLockId
 	SerializablePredicateLockListLock,
 	OldSerXidLock,
 	SyncRepLock,
+	BackgroundWorkerLock,
 	/* Individual lock IDs end here */
 	FirstBufMappingLock,
 	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index a6cb844..d894edf 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -28,6 +28,7 @@ typedef enum
 	PMSIGNAL_ROTATE_LOGFILE,	/* send SIGUSR1 to syslogger to rotate logfile */
 	PMSIGNAL_START_AUTOVAC_LAUNCHER,	/* start an autovacuum launcher */
 	PMSIGNAL_START_AUTOVAC_WORKER,		/* start an autovacuum worker */
+	PMSIGNAL_BACKGROUND_WORKER_CHANGE,	/* background worker state change */
 	PMSIGNAL_START_WALRECEIVER, /* start a walreceiver */
 	PMSIGNAL_ADVANCE_STATE_MACHINE,		/* advance postmaster's state machine */
 
#2Michael Paquier
michael.paquier@gmail.com
In reply to: Robert Haas (#1)
Re: dynamic background workers

On Sat, Jun 15, 2013 at 6:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The second patch, dynamic-bgworkers-v1.patch, revises the background
worker API to allow background workers to be started dynamically.
This requires some communication channel from ordinary workers to the
postmaster, because it is the postmaster that must ultimately start
the newly-registered workers. However, that communication channel has
to be designed pretty carefully, lest a shared memory corruption take
out the postmaster and lead to inadvertent failure to restart after a
crash. Here's how I implemented that: there's an array in shared
memory of a size equal to max_worker_processes. This array is
separate from the backend-private list of workers maintained by the
postmaster, but the two are kept in sync. When a new background
worker registration is added to the shared data structure, the backend
adding it uses the existing pmsignal mechanism to kick the postmaster,
which then scans the array for new registrations. I have attempted to
make the code that transfers the shared_memory state into the
postmaster's private state as paranoid as humanly possible. The
precautions taken are documented in the comments. Conversely, when a
background worker flagged as BGW_NEVER_RESTART is considered for
restart (and we decide against it), the corresponding slot in the
shared memory array is marked as no longer in use, allowing it to be
reused for a new registration.

Since the postmaster cannot take locks, synchronization between the
postmaster and other backends using the shared memory segment has to
be lockless. This mechanism is also documented in the comments. An
lwlock is used to prevent two backends that are both registering a new
worker at about the same time from stomping on each other, but the
postmaster need not care about that lwlock.

This patch also extends worker_spi as a demonstration of the new
interface. With this patch, you can CREATE EXTENSION worker_spi and
then call worker_spi_launch(int4) to launch a new background worker,
or combine it with generate_series() to launch a bunch at once. Then
you can kill them off with pg_terminate_backend() and start some new
ones. That, in my humble opinion, is pretty cool.

This looks really interesting, +1. I'll test the patch if possible next
week.
--
Michael

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#1)
Re: dynamic background workers

On 14 June 2013 22:00, Robert Haas <robertmhaas@gmail.com> wrote:

Parallel query, or any subset of that project such as parallel sort,
will require a way to start background workers on demand. Thanks to
Alvaro's work on 9.3, we now have the ability to configure background
workers via shared_preload_libraries. But if you don't have the right
library loaded at startup time, and subsequently wish to add a
background worker while the server is running, you are out of luck.
Even if you do have the right library loaded, but want to start
workers in response to user activity, rather than when the database
comes on-line, you are also out of luck. Relaxing these restrictions
is essential for parallel query (or parallel processing of any kind),
and useful apart from that. Two patches are attached.

Your proposal is exactly what we envisaged and parallel query always
was a target for background workers. The restrictions were only there
to ensure we got the feature into 9.3, rather than trying to implement
everything and then having it pushed back a release.

So +1.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Christopher Browne
cbbrowne@gmail.com
In reply to: Robert Haas (#1)
Re: dynamic background workers

BTW, one of the ideas that popped up in the unConference session on
replication was "why couldn't we use a background worker as a replication
agent?"

The main reason pointed out was 'because that means you have to restart the
postmaster to add a replication agent.' (e.g. - like a Slony "slon"
process)

There may well be other better reasons not to do so, but it would be nice
to eliminate this reason. It seems seriously limiting to the bg-worker
concept for them to be thus restricted.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#1)
Re: dynamic background workers

On Fri, 2013-06-14 at 17:00 -0400, Robert Haas wrote:

Alvaro's work on 9.3, we now have the ability to configure background
workers via shared_preload_libraries. But if you don't have the right
library loaded at startup time, and subsequently wish to add a
background worker while the server is running, you are out of luck.

We could tweak shared_preload_libraries so that it reacts sensibly to
reloads. I basically gave up on that by writing
session_preload_libraries, but if there is more general use for that, we
could try.

(That doesn't invalidate your work, but it's a thought.)


#6Michael Paquier
michael.paquier@gmail.com
In reply to: Robert Haas (#1)
Re: dynamic background workers

Hi,

On Sat, Jun 15, 2013 at 6:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The first patch, max-worker-processes-v1.patch, adds a new GUC
max_worker_processes, which defaults to 8. This fixes the problem
discussed here:

/messages/by-id/CA+TgmobguVO+qHnHvxBA2TFkDhw67Y=4Bp405FVABEc_EtO4VQ@mail.gmail.com

Apart from fixing that problem, it's a pretty boring patch.

I just had a look at the first patch which is pretty simple before
looking in details at the 2nd patch.

Here are some minor comments:
1) Correction of postgresql.conf.sample
Putting the new parameter in the resource usage section is appropriate;
however, why not add a new sub-section of this type, with comments like
below?
# - Background workers -
#max_worker_processes = 8 # Maximum number of background worker subprocesses
# (change requires restart)
2) Perhaps it would be better to specify in the docs that if the
number of bgworkers the server tries to start is higher than
max_worker_processes, startup of the extra bgworkers will fail but
the server will continue running as if nothing happened. This is
something users should be made aware of.
3) In InitProcGlobal:proc.c, wouldn't it be more consistent to do that
when assigning new slots in PGPROC:
else if (i < MaxConnections + autovacuum_max_workers + max_worker_processes + 1)
{
/* PGPROC for bgworker, add to bgworkerFreeProcs list */
procs[i].links.next = (SHM_QUEUE *) ProcGlobal->bgworkerFreeProcs;
ProcGlobal->bgworkerFreeProcs = &procs[i];
}
instead of that?
else if (i < MaxBackends)
{
/* PGPROC for bgworker, add to bgworkerFreeProcs list */
procs[i].links.next = (SHM_QUEUE *) ProcGlobal->bgworkerFreeProcs;
ProcGlobal->bgworkerFreeProcs = &procs[i];
}

I have also done many tests with worker_spi and some home-made
bgworkers, and the patch works as expected: the extra bgworkers
are not started once the configured maximum number is reached.
I'll try to look at the other patch soon, but I think that the real
discussion on the topic is just beginning... Btw, IMHO, this first
patch can safely be committed as we would have a nice base for future
discussions/reviews.
Regards,
--
Michael


#7Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#5)
Re: dynamic background workers

On Mon, Jun 17, 2013 at 10:45 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

On Fri, 2013-06-14 at 17:00 -0400, Robert Haas wrote:

Alvaro's work on 9.3, we now have the ability to configure background
workers via shared_preload_libraries. But if you don't have the right
library loaded at startup time, and subsequently wish to add a
background worker while the server is running, you are out of luck.

We could tweak shared_preload_libraries so that it reacts sensibly to
reloads. I basically gave up on that by writing
session_preload_libraries, but if there is more general use for that, we
could try.

(That doesn't invalidate your work, but it's a thought.)

Yeah, I thought about that. But it doesn't seem possible to do
anything all that sane. You can't unload libraries if they've been
removed; you can potentially load new ones if they've been added. But
that's a bit confusing, if the config file says that's what's loaded
is bar, and what's actually loaded is foo, bar, baz, bletch, and quux.

Some variant of this might still be worth doing, but figuring out the
details sounded like more than I wanted to get into, so I punted. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#8Markus Wanner
markus@bluegap.ch
In reply to: Robert Haas (#1)
Re: dynamic background workers

Robert,

On 06/14/2013 11:00 PM, Robert Haas wrote:

Parallel query, or any subset of that project such as parallel sort,
will require a way to start background workers on demand.

thanks for continuing this, very much appreciated. Postgres-R, and thus
TransLattice, have successfully used a similar approach for years now.

I've only had a quick glance over the patch so far. Some comments on the design:

This requires some communication channel from ordinary workers to the
postmaster, because it is the postmaster that must ultimately start
the newly-registered workers. However, that communication channel has
to be designed pretty carefully, lest a shared memory corruption take
out the postmaster and lead to inadvertent failure to restart after a
crash. Here's how I implemented that: there's an array in shared
memory of a size equal to max_worker_processes. This array is
separate from the backend-private list of workers maintained by the
postmaster, but the two are kept in sync. When a new background
worker registration is added to the shared data structure, the backend
adding it uses the existing pmsignal mechanism to kick the postmaster,
which then scans the array for new registrations.

That sounds like a good simplification. Even if it's an O(n) operation,
the n in question here has relatively low practical limits. It's
unlikely to be much of a concern, I guess.

Back then, I solved it by having a "fork request slot". After starting,
the bgworker then had to clear that slot and register with a coordinator
process (i.e. the original requestor), so that one learned its fork
request was successful. At some point I expanded that to multiple
request slots to better handle multiple concurrent fork requests.
However, it was difficult to get right and requires more IPC than your
approach.

On the pro side: The shared memory area used by the postmaster was very
small in size and read-only to the postmaster. These were my main goals,
which I'm not sure were the best ones, now that I read your concept.

I have attempted to
make the code that transfers the shared_memory state into the
postmaster's private state as paranoid as humanly possible. The
precautions taken are documented in the comments. Conversely, when a
background worker flagged as BGW_NEVER_RESTART is considered for
restart (and we decide against it), the corresponding slot in the
shared memory array is marked as no longer in use, allowing it to be
reused for a new registration.

Sounds like the postmaster is writing to shared memory. Not sure why
I've been trying so hard to avoid that, though. After all, it can hardly
hurt itself *writing* to shared memory.

Since the postmaster cannot take locks, synchronization between the
postmaster and other backends using the shared memory segment has to
be lockless. This mechanism is also documented in the comments. An
lwlock is used to prevent two backends that are both registering a new
worker at about the same time from stomping on each other, but the
postmaster need not care about that lwlock.

This patch also extends worker_spi as a demonstration of the new
interface. With this patch, you can CREATE EXTENSION worker_spi and
then call worker_spi_launch(int4) to launch a new background worker,
or combine it with generate_series() to launch a bunch at once. Then
you can kill them off with pg_terminate_backend() and start some new
ones. That, in my humble opinion, is pretty cool.

It definitely is. Thanks again.

Regards

Markus Wanner


#9Robert Haas
robertmhaas@gmail.com
In reply to: Markus Wanner (#8)
Re: dynamic background workers

On Thu, Jun 20, 2013 at 9:57 AM, Markus Wanner <markus@bluegap.ch> wrote:

That sounds like a good simplification. Even if it's an O(n) operation,
the n in question here has relatively low practical limits. It's
unlikely to be much of a concern, I guess.

The constant factor is also very small. Generally, I would expect
num_worker_processes <~ # CPUs, and scanning a 32, 64, or even 128
element array is not a terribly time-consuming operation. We might
need to re-think this when systems with 4096 processors become
commonplace, but considering how many other things would also need to
be fixed to work well in that universe, I'm not too concerned about it
just yet.

One thing I think we probably want to explore in the future, for both
worker backends and regular backends, is pre-forking. We could avoid
some of the latency associated with starting up a new backend or
opening a new connection in that way. However, there are quite a few
details to be thought through there, so I'm not eager to pursue that
just yet. Once we have enough infrastructure to implement meaningful
parallelism, we can benchmark it and find out where the bottlenecks
are, and which solutions actually help most.

Back then, I solved it by having a "fork request slot". After starting,
the bgworker then had to clear that slot and register with a coordinator
process (i.e. the original requestor), so that one learned its fork
request was successful. At some point I expanded that to multiple
request slots to better handle multiple concurrent fork requests.
However, it was difficult to get right and requires more IPC than your
approach.

I do think we need a mechanism to allow the backend that requested the
bgworker to know whether or not the bgworker got started, and whether
it unexpectedly died. Once we get to the point of calling user code
within the bgworker process, it can use any number of existing
mechanisms to make sure that it won't die without notifying the
backend that started it (short of a PANIC, in which case it won't
matter anyway). But we need a way to report failures that happen
before that point. I have some ideas about that, but decided to leave
them for future passes. The remit of this patch is just to make it
possible to dynamically register bgworkers. Allowing a bgworker to be
"tied" to the session that requested it via some sort of feedback loop
is a separate project - which I intend to tackle before CF2, assuming
this gets committed (and so far nobody is objecting to that).

I have attempted to
make the code that transfers the shared_memory state into the
postmaster's private state as paranoid as humanly possible. The
precautions taken are documented in the comments. Conversely, when a
background worker flagged as BGW_NEVER_RESTART is considered for
restart (and we decide against it), the corresponding slot in the
shared memory array is marked as no longer in use, allowing it to be
reused for a new registration.

Sounds like the postmaster is writing to shared memory. Not sure why
I've been trying so hard to avoid that, though. After all, it can hardly
hurt itself *writing* to shared memory.

I think there's ample room for paranoia about postmaster interaction
with shared memory, but all it's doing is setting a flag, which is no
different from what CheckPostmasterSignal() already does.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#10Markus Wanner
markus@bluegap.ch
In reply to: Robert Haas (#9)
Re: dynamic background workers

On 06/20/2013 04:41 PM, Robert Haas wrote:

The constant factor is also very small. Generally, I would expect
num_worker_processes <~ # CPUs

That assumption might hold for parallel querying, yes. In case of
Postgres-R, it doesn't. In the worst case, i.e. with a 100% write load,
a cluster of n nodes, each with m backends performing transactions, all
of them replicated to all other (n-1) nodes, you end up with ((n-1) * m)
bgworkers. Which is pretty likely to be way above the # CPUs on any
single node.

I can imagine other extensions or integral features like autonomous
transactions that might possibly want many more bgworkers as well.

and scanning a 32, 64, or even 128
element array is not a terribly time-consuming operation.

I'd extend that to say scanning an array with a few thousand elements is
not terribly time-consuming, either. IMO the simplicity is worth it,
ATM. It's all relative to your definition of ... eh ... "terribly".

.oO( ... premature optimization ... all evil ... )

We might
need to re-think this when systems with 4096 processors become
commonplace, but considering how many other things would also need to
be fixed to work well in that universe, I'm not too concerned about it
just yet.

Agreed.

One thing I think we probably want to explore in the future, for both
worker backends and regular backends, is pre-forking. We could avoid
some of the latency associated with starting up a new backend or
opening a new connection in that way. However, there are quite a few
details to be thought through there, so I'm not eager to pursue that
just yet. Once we have enough infrastructure to implement meaningful
parallelism, we can benchmark it and find out where the bottlenecks
are, and which solutions actually help most.

Do you mean pre-forking and connecting to a specific database? Or really
just the forking?

I do think we need a mechanism to allow the backend that requested the
bgworker to know whether or not the bgworker got started, and whether
it unexpectedly died. Once we get to the point of calling user code
within the bgworker process, it can use any number of existing
mechanisms to make sure that it won't die without notifying the
backend that started it (short of a PANIC, in which case it won't
matter anyway). But we need a way to report failures that happen
before that point. I have some ideas about that, but decided to leave
them for future passes. The remit of this patch is just to make it
possible to dynamically register bgworkers. Allowing a bgworker to be
"tied" to the session that requested it via some sort of feedback loop
is a separate project - which I intend to tackle before CF2, assuming
this gets committed (and so far nobody is objecting to that).

Okay, sounds good. Given my background, I considered that a solved
problem. Thanks for pointing it out.

Sounds like the postmaster is writing to shared memory. Not sure why
I've been trying so hard to avoid that, though. After all, it can hardly
hurt itself *writing* to shared memory.

I think there's ample room for paranoia about postmaster interaction
with shared memory, but all it's doing is setting a flag, which is no
different from what CheckPostmasterSignal() already does.

Sounds good to me.

Regards

Markus Wanner


#11Robert Haas
robertmhaas@gmail.com
In reply to: Markus Wanner (#10)
Re: dynamic background workers

On Thu, Jun 20, 2013 at 10:59 AM, Markus Wanner <markus@bluegap.ch> wrote:

On 06/20/2013 04:41 PM, Robert Haas wrote:

The constant factor is also very small. Generally, I would expect
num_worker_processes <~ # CPUs

That assumption might hold for parallel querying, yes. In case of
Postgres-R, it doesn't. In the worst case, i.e. with a 100% write load,
a cluster of n nodes, each with m backends performing transactions, all
of them replicated to all other (n-1) nodes, you end up with ((n-1) * m)
bgworkers. Which is pretty likely to be way above the # CPUs on any
single node.

I can imagine other extensions or integral features like autonomous
transactions that might possibly want many more bgworkers as well.

Yeah, maybe. I think in general it's not going to work great to have
zillions of backends floating around, because eventually the OS
scheduler overhead - and the memory overhead - are going to become
pain points. And I'm hopeful that autonomous transactions can be
implemented without needing to start a new backend for each one,
because that sounds pretty expensive. Some users of other database
products will expect autonomous transactions to be cheap; aside from
that, cheap is better than expensive. But we will see. At any rate I
think your basic point is that people might end up creating a lot more
background workers than I'm imagining, which is certainly a fair
point.

and scanning a 32, 64, or even 128
element array is not a terribly time-consuming operation.

I'd extend that to say scanning an array with a few thousand elements is
not terribly time-consuming, either. IMO the simplicity is worth it,
ATM. It's all relative to your definition of ... eh ... "terribly".

.oO( ... premature optimization ... all evil ... )

Yeah, that thing.

One thing I think we probably want to explore in the future, for both
worker backends and regular backends, is pre-forking. We could avoid
some of the latency associated with starting up a new backend or
opening a new connection in that way. However, there are quite a few
details to be thought through there, so I'm not eager to pursue that
just yet. Once we have enough infrastructure to implement meaningful
parallelism, we can benchmark it and find out where the bottlenecks
are, and which solutions actually help most.

Do you mean pre-forking and connecting to a specific database? Or really
just the forking?

I've considered both at various times, although in this context I was
mostly thinking about just the forking. Pre-connecting to a specific
database would save an unknown but possibly significant amount of
additional latency. Against that, it's more complex (because we've
got to track which preforked workers are associated with which
databases) and there's some cost to guessing wrong (because then we're
keeping workers around that we can't use, or maybe even having to turn
around and kill them to make slots for the workers we actually need).
I suspect we'll want to pursue the idea at some point but it's not
near the top of my list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#12Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#11)
Re: dynamic background workers

On 2013-06-20 11:29:27 -0400, Robert Haas wrote:

Do you mean pre-forking and connecting to a specific database? Or really
just the forking?

I've considered both at various times, although in this context I was
mostly thinking about just the forking. Pre-connecting to a specific
database would save an unknown but possibly significant amount of
additional latency. Against that, it's more complex (because we've
got to track which preforked workers are associated with which
databases) and there's some cost to guessing wrong (because then we're
keeping workers around that we can't use, or maybe even having to turn
around and kill them to make slots for the workers we actually need).
I suspect we'll want to pursue the idea at some point but it's not
near the top of my list.

Just as a datapoint, if you benchmark the number of forks that can be
performed by a single process (i.e. postmaster), the number is easily in
the 10s of thousands. Now forking that much has some scalability
implications inside the kernel, but still.
I'd be surprised if the actual fork is more than 5-10% of the current
cost of starting a new backend.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13Michael Paquier
michael.paquier@gmail.com
In reply to: Robert Haas (#1)
Re: dynamic background workers

Hi,

Please find some review for the 2nd patch, with the 1st patch applied
on top of it.

On Sat, Jun 15, 2013 at 6:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The second patch, dynamic-bgworkers-v1.patch, revises the background
worker API to allow background workers to be started dynamically.
This requires some communication channel from ordinary workers to the
postmaster, because it is the postmaster that must ultimately start
the newly-registered workers. However, that communication channel has
to be designed pretty carefully, lest a shared memory corruption take
out the postmaster and lead to inadvertent failure to restart after a
crash. Here's how I implemented that: there's an array in shared
memory of a size equal to max_worker_processes. This array is
separate from the backend-private list of workers maintained by the
postmaster, but the two are kept in sync. When a new background
worker registration is added to the shared data structure, the backend
adding it uses the existing pmsignal mechanism to kick the postmaster,
which then scans the array for new registrations. I have attempted to
make the code that transfers the shared_memory state into the
postmaster's private state as paranoid as humanly possible. The
precautions taken are documented in the comments. Conversely, when a
background worker flagged as BGW_NEVER_RESTART is considered for
restart (and we decide against it), the corresponding slot in the
shared memory array is marked as no longer in use, allowing it to be
reused for a new registration.

Since the postmaster cannot take locks, synchronization between the
postmaster and other backends using the shared memory segment has to
be lockless. This mechanism is also documented in the comments. An
lwlock is used to prevent two backends that are both registering a new
worker at about the same time from stomping on each other, but the
postmaster need not care about that lwlock.

This patch also extends worker_spi as a demonstration of the new
interface. With this patch, you can CREATE EXTENSION worker_spi and
then call worker_spi_launch(int4) to launch a new background worker,
or combine it with generate_series() to launch a bunch at once. Then
you can kill them off with pg_terminate_backend() and start some new
ones. That, in my humble opinion, is pretty cool.

The patch applies cleanly, though I found a couple of whitespace issues:
/home/ioltas/download/dynamic-bgworkers-v1.patch:452: space before tab
in indent.
slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
/home/ioltas/download/dynamic-bgworkers-v1.patch:474: space before tab
in indent.
slot = &BackgroundWorkerData->slot[slotno];
/home/ioltas/download/dynamic-bgworkers-v1.patch:639: trailing whitespace.
success = true;
warning: 3 lines add whitespace errors.

The code compiles, has no warnings, and make check passes.

Then, here are some impressions after reading the code. It is good to
see that all the bgworker APIs are moved under the same banner
bgworker.c.
1) Just for clarity, I think that this code in worker_spi.c deserves a
comment mentioning that this code path cannot be taken for a
bgworker not loaded via shared_preload_libraries.
+
+       if (!process_shared_preload_libraries_in_progress)
+               return;
+
2) s/NUL-terminated/NULL-terminated @ bgworker.c
3) Why not add another function in worker_spi.c that does the opposite
of worker_spi_launch, to stop a dynamic bgworker for a given index
number? This would only be a wrapper around pg_terminate_backend, OK,
but at least it would give the user all the information needed to start
and to stop a dynamic bgworker with a single extension, here
worker_spi.c. It can be painful to stop
4) Not completely related to this patch, but one sanity check in
SanityCheckBackgroundWorker:bgworker.c is not listed in the
documentation: when requesting a database connection, a bgworker needs
to have access to shmem. It looks like this should also be fixed in
REL9_3_STABLE.
5) Why not add some documentation? Both dynamic and static
bgworkers share the same infrastructure, so some lines in the existing
chapter might be fine?
6) Just wondering: it looks like the current code is not able to
identify how a given bgworker was started. Would it be interesting to
be able to identify whether a bgworker has been registered through
RegisterBackgroundWorker or RegisterDynamicBackgroundWorker?

I have also done some tests, and the infrastructure is working nicely.
The workers started dynamically are able to receive SIGHUP and
SIGTERM. Workers are also not started once the maximum number of
authorized bgworkers is reached. It is really a nice feature!

Also, I will try to do some more tests to verify the robustness of the
slist and the protocol used.

Regards,
--
Michael


#14Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#13)
Re: dynamic background workers

On Mon, Jun 24, 2013 at 3:51 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

3) Why not add another function in worker_spi.c that does the opposite
of worker_spi_launch, to stop a dynamic bgworker for a given index
number? This would only be a wrapper around pg_terminate_backend, OK,
but at least it would give the user all the information needed to start
and to stop a dynamic bgworker with a single extension, here
worker_spi.c. It can be painful to stop

Well, there's currently no mechanism for the person who starts a new
backend to know the PID of the process that actually got started. I
plan to write a patch to address that problem, but it's not this
patch.

4) Not completely related to this patch, but one sanity check in
SanityCheckBackgroundWorker:bgworker.c is not listed in the
documentation: when requesting a database connection, a bgworker needs
to have access to shmem. It looks like this should also be fixed in
REL9_3_STABLE.

That's fine; I think it's separate from this patch. Please feel free
to propose something.

5) Why not add some documentation? Both dynamic and static
bgworkers share the same infrastructure, so some lines in the existing
chapter might be fine?

I'll take a look.

6) Just wondering: it looks like the current code is not able to
identify how a given bgworker was started. Would it be interesting to
be able to identify whether a bgworker has been registered through
RegisterBackgroundWorker or RegisterDynamicBackgroundWorker?

I don't think that's a good thing to expose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#12)
Re: dynamic background workers

Andres Freund escribió:

Just as a datapoint, if you benchmark the number of forks that can be
performed by a single process (i.e. postmaster), the number is easily in
the 10s of thousands. Now forking that much has some scalability
implications inside the kernel, but still.
I'd be surprised if the actual fork is more than 5-10% of the current
cost of starting a new backend.

I played around with having some thousands of registered bgworkers on my
laptop, and there wasn't even that much load. So yeah, you can have lots of
forks.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#16Michael Paquier
michael.paquier@gmail.com
In reply to: Robert Haas (#14)
Re: dynamic background workers

On Wed, Jul 3, 2013 at 11:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jun 24, 2013 at 3:51 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

3) Why not add another function in worker_spi.c that does the opposite
of worker_spi_launch, to stop a dynamic bgworker for a given index
number? This would only be a wrapper around pg_terminate_backend, OK,
but at least it would give the user all the information needed to start
and to stop a dynamic bgworker with a single extension, here
worker_spi.c. It can be painful to stop

Well, there's currently no mechanism for the person who starts a new
backend to know the PID of the process that actually got started. I
plan to write a patch to address that problem, but it's not this
patch.

OK. Understood, this functionality would be a good addition to have.

4) Not completely related to this patch, but one sanity check in
SanityCheckBackgroundWorker:bgworker.c is not listed in the
documentation: when requesting a database connection, a bgworker needs
to have access to shmem. It looks like this should also be fixed in
REL9_3_STABLE.

That's fine; I think it's separate from this patch. Please feel free
to propose something.

I'll send a patch about that.

6) Just wondering: it looks like the current code is not able to
identify how a given bgworker was started. Would it be interesting to
be able to identify whether a bgworker has been registered through
RegisterBackgroundWorker or RegisterDynamicBackgroundWorker?

I don't think that's a good thing to expose.

My concerns here are covered by the functionality you propose in 1),
where a user who launched a custom bgworker would know its PID; this
would allow users to keep track of which bgworkers have been started
dynamically.

Regards,
--
Michael


#17Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#15)
Re: dynamic background workers

On Wed, Jul 3, 2013 at 11:15 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Andres Freund escribió:

Just as a datapoint, if you benchmark the number of forks that can be
performed by a single process (i.e. postmaster), the number is easily in
the 10s of thousands. Now forking that much has some scalability
implications inside the kernel, but still.
I'd be surprised if the actual fork is more than 5-10% of the current
cost of starting a new backend.

I played around with having some thousands of registered bgworkers on my
laptop, and there wasn't even that much load. So yeah, you can have lots of
forks.

Since no one seems to be objecting to this patch beyond the lack of
documentation, I've added documentation and committed it, with
appropriate rebasing and a few minor cleanups. One loose end is
around the bgw_sighup and bgw_sigterm structure members. If you're
registering a background worker for a library that is not loaded in
the postmaster, you can't (safely) use these for anything, because
it's possible (though maybe not likely) for the worker process to map
the shared library at a different address than where it is mapped
in the backend that requests the new process to be started. However,
that doesn't really matter; AFAICS, you can just as well call pqsignal
to set the handlers to anything you want from the main entrypoint
before unblocking signals. So I'm inclined to say we should just
remove bgw_sighup and bgw_sigterm altogether and tell people to do it
that way.

Alternatively, we could give them the same treatment that I gave
bgw_main: let the user specify a function name and we'll search the
appropriate DSO for it. But that's probably less convenient for
anyone using this facility than just calling pqsignal() before
unblocking signals, so I don't see any real reason to go that route.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
