pg_background (and more parallelism infrastructure patches)

Started by Robert Haas over 11 years ago · 83 messages
#1 Robert Haas
robertmhaas@gmail.com
6 attachment(s)

Attached is a contrib module that lets you launch an arbitrary command
in a background worker, plus supporting infrastructure patches for
core.  You can launch queries and fetch the results back, much as you
could do with a dblink connection back to the local database but
without the hassle of dealing with authentication; and you can also
run utility commands, like VACUUM.  For people who have always wanted
to be able to launch a vacuum (or an autonomous transaction, or a
background task) from a procedural language ... enjoy.

Here's an example of running VACUUM and then fetching the results.
Notice that the notices emitted in the background worker's session are
propagated to our session; if an error had occurred, it would be
re-thrown locally when we try to read the results.

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# select pg_background_launch('vacuum verbose foo');
pg_background_launch
----------------------
51101
(1 row)

rhaas=# select * from pg_background_result(51101) as (x text);
INFO: vacuuming "public.foo"
INFO: "foo": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
x
--------
VACUUM
(1 row)

Here's an overview of the attached patches:

Patches 1 and 2 add a few new interfaces to the shm_mq and dsm APIs
that happen to be convenient for later patches in the series. I'm
pretty sure I could make all this work without these, but it would
take more code and be less efficient, so here they are.

Patch 3 adds the ability for a backend to request that the protocol
messages it would normally send to the frontend get redirected to a
shm_mq. I did this by adding a couple of hook functions. The best
design is definitely arguable here, so if you'd like to bikeshed, this
is probably the patch to look at. This patch also adds a function to
help you parse an ErrorResponse or NoticeResponse and re-throw the
error or notice in the originating backend.  Obviously, parallelism is
going to need this kind of functionality, but I suspect that a variety
of other applications people develop using background workers will
want it too; and it's certainly important for pg_background itself.

Patch 4 adds infrastructure that allows one session to save all of its
non-default GUC values and another session to reload those values.
This was written by Amit Khandekar and Noah Misch. It allows
pg_background to start up the background worker with the same GUC
settings that the launching process is using. I intend this as a
demonstration of how to synchronize any given piece of state between
cooperating backends. For real parallelism, we'll need to synchronize
snapshots, combo CIDs, transaction state, and so on, in addition to
GUCs. But GUCs are ONE of the things that we'll need to synchronize
in that context, and this patch shows the kind of API we're thinking
about for these sorts of problems.

Patch 5 is a trivial patch to add a function to get the authenticated
user ID. Noah pointed out to me that it's important for the
authenticated user ID, session user ID, and current user ID to all
match between the original session and the background worker.
Otherwise, pg_background could be used to circumvent restrictions that
we normally impose when those values differ from each other. The
session and current user IDs are restored by the GUC save-and-restore
machinery ("session_authorization" and "role") but the authenticated
user ID requires special treatment. To make that happen, it has to be
exposed somehow.

Patch 6 is pg_background itself.  I'm quite pleased with how easily
this came together.  The existing background worker, dsm, shm_toc, and
shm_mq infrastructure handles most of the heavy lifting here -
obviously with some exceptions addressed by the preceding patches.
Again, this is the kind of set-up that I'm expecting will happen in a
background worker used for actual parallelism - clearly, more state
will need to be restored there than here, but nonetheless the general
flow of the code here is about what I'm imagining, just with somewhat
more kinds of state.  Most of the work of writing this patch was
actually figuring out how to execute the query itself; what I ended up
with is mostly copied from exec_simple_query, but with some
differences here and there.  I'm not sure whether it would be
possible or advisable to refactor to reduce the duplication.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Extend-shm_mq-API-with-new-functions-shm_mq_sendv-an.patch (text/x-patch)
From 82c93205d2dba8cf82c2753dc0b90540810f0a57 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 17 Jul 2014 14:51:27 -0400
Subject: [PATCH 1/6] Extend shm_mq API with new functions shm_mq_sendv and
 shm_mq_set_handle.

shm_mq_sendv sends a message to the queue assembled from multiple
locations.  This is expected to be used by forthcoming patches to
allow frontend/backend protocol messages to be sent via shm_mq, but
might be useful for other purposes as well.

shm_mq_set_handle associates a BackgroundWorkerHandle with an
already-existing shm_mq_handle.  This solves a timing problem when
creating a shm_mq to communicate with a newly-launched background
worker: if you attach to the queue first, and the background worker
fails to start, you might block forever trying to do I/O on the
queue; but if you start the background worker first and then die
before attaching to the queue, the background worker might block
forever trying to do I/O on the queue.  This lets you attach before
starting the worker (so that the worker is protected) and then associate
the BackgroundWorkerHandle later (so that you are also protected).
---
 src/backend/storage/ipc/shm_mq.c |  126 +++++++++++++++++++++++++++++++++-----
 src/include/storage/shm_mq.h     |   14 ++++-
 2 files changed, 124 insertions(+), 16 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index d96627a..90df593 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -139,7 +139,7 @@ struct shm_mq_handle
 };
 
 static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mq, Size nbytes,
-				  void *data, bool nowait, Size *bytes_written);
+				  const void *data, bool nowait, Size *bytes_written);
 static shm_mq_result shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed,
 					 bool nowait, Size *nbytesp, void **datap);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile * ptr,
@@ -301,7 +301,33 @@ shm_mq_attach(shm_mq *mq, dsm_segment *seg, BackgroundWorkerHandle *handle)
 }
 
 /*
+ * Associate a BackgroundWorkerHandle with a shm_mq_handle just as if it had
+ * been passed to shm_mq_attach.
+ */
+void
+shm_mq_set_handle(shm_mq_handle *mqh, BackgroundWorkerHandle *handle)
+{
+	Assert(mqh->mqh_handle == NULL);
+	mqh->mqh_handle = handle;
+}
+
+/*
  * Write a message into a shared message queue.
+ */
+shm_mq_result
+shm_mq_send(shm_mq_handle *mqh, Size nbytes, const void *data, bool nowait)
+{
+	shm_mq_iovec	iov;
+
+	iov.data = data;
+	iov.len = nbytes;
+
+	return shm_mq_sendv(mqh, &iov, 1, nowait);
+}
+
+/*
+ * Write a message into a shared message queue, gathered from multiple
+ * addresses.
  *
  * When nowait = false, we'll wait on our process latch when the ring buffer
  * fills up, and then continue writing once the receiver has drained some data.
@@ -315,14 +341,22 @@ shm_mq_attach(shm_mq *mq, dsm_segment *seg, BackgroundWorkerHandle *handle)
  * the length or payload will corrupt the queue.)
  */
 shm_mq_result
-shm_mq_send(shm_mq_handle *mqh, Size nbytes, void *data, bool nowait)
+shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 {
 	shm_mq_result res;
 	shm_mq	   *mq = mqh->mqh_queue;
+	Size		nbytes = 0;
 	Size		bytes_written;
+	int			i;
+	int			which_iov = 0;
+	Size		offset;
 
 	Assert(mq->mq_sender == MyProc);
 
+	/* Compute total size of write. */
+	for (i = 0; i < iovcnt; ++i)
+		nbytes += iov[i].len;
+
 	/* Try to write, or finish writing, the length word into the buffer. */
 	while (!mqh->mqh_length_word_complete)
 	{
@@ -348,18 +382,80 @@ shm_mq_send(shm_mq_handle *mqh, Size nbytes, void *data, bool nowait)
 
 	/* Write the actual data bytes into the buffer. */
 	Assert(mqh->mqh_partial_bytes <= nbytes);
-	res = shm_mq_send_bytes(mqh, nbytes - mqh->mqh_partial_bytes,
-							((char *) data) + mqh->mqh_partial_bytes,
-							nowait, &bytes_written);
-	if (res == SHM_MQ_WOULD_BLOCK)
-		mqh->mqh_partial_bytes += bytes_written;
-	else
+	offset = mqh->mqh_partial_bytes;
+	do
 	{
-		mqh->mqh_partial_bytes = 0;
-		mqh->mqh_length_word_complete = false;
-	}
-	if (res != SHM_MQ_SUCCESS)
-		return res;
+		Size	chunksize;
+
+		/* Figure out which bytes need to be sent next. */
+		if (offset >= iov[which_iov].len)
+		{
+			offset -= iov[which_iov].len;
+			++which_iov;
+			if (which_iov >= iovcnt)
+				break;
+			continue;
+		}
+
+		/*
+		 * We want to avoid copying the data if at all possible, but every
+		 * chunk of bytes we write into the queue has to be MAXALIGN'd,
+		 * except the last.  Thus, if a chunk other than the last one ends
+		 * on a non-MAXALIGN'd boundary, we have to combine the tail end of
+		 * its data with data from one or more following chunks until we
+		 * either reach the last chunk or accumulate a number of bytes which
+		 * is MAXALIGN'd.
+		 */
+		if (which_iov + 1 < iovcnt &&
+			offset + MAXIMUM_ALIGNOF > iov[which_iov].len)
+		{
+			char	tmpbuf[MAXIMUM_ALIGNOF];
+			int		j = 0;
+
+			for (;;)
+			{
+				if (offset < iov[which_iov].len)
+				{
+					tmpbuf[j] = iov[which_iov].data[offset];
+					j++;
+					offset++;
+					if (j == MAXIMUM_ALIGNOF)
+						break;
+				}
+				else
+				{
+					offset -= iov[which_iov].len;
+					which_iov++;
+					if (which_iov >= iovcnt)
+						break;
+				}
+			}
+			res = shm_mq_send_bytes(mqh, j, tmpbuf, nowait, &bytes_written);
+			mqh->mqh_partial_bytes += bytes_written;
+			if (res != SHM_MQ_SUCCESS)
+				return res;
+			continue;
+		}
+
+		/*
+		 * If this is the last chunk, we can write all the data, even if it
+		 * isn't a multiple of MAXIMUM_ALIGNOF.  Otherwise, we need to
+		 * MAXALIGN_DOWN the write size.
+		 */
+		chunksize = iov[which_iov].len - offset;
+		if (which_iov + 1 < iovcnt)
+			chunksize = MAXALIGN_DOWN(chunksize);
+		res = shm_mq_send_bytes(mqh, chunksize, &iov[which_iov].data[offset],
+								nowait, &bytes_written);
+		mqh->mqh_partial_bytes += bytes_written;
+		offset += bytes_written;
+		if (res != SHM_MQ_SUCCESS)
+			return res;
+	} while (mqh->mqh_partial_bytes < nbytes);
+
+	/* Reset for next message. */
+	mqh->mqh_partial_bytes = 0;
+	mqh->mqh_length_word_complete = false;
 
 	/* Notify receiver of the newly-written data, and return. */
 	return shm_mq_notify_receiver(mq);
@@ -653,8 +749,8 @@ shm_mq_detach(shm_mq *mq)
  * Write bytes into a shared message queue.
  */
 static shm_mq_result
-shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, void *data, bool nowait,
-				  Size *bytes_written)
+shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
+				  bool nowait, Size *bytes_written)
 {
 	shm_mq	   *mq = mqh->mqh_queue;
 	Size		sent = 0;
diff --git a/src/include/storage/shm_mq.h b/src/include/storage/shm_mq.h
index 5bae380..063400a 100644
--- a/src/include/storage/shm_mq.h
+++ b/src/include/storage/shm_mq.h
@@ -25,6 +25,13 @@ typedef struct shm_mq shm_mq;
 struct shm_mq_handle;
 typedef struct shm_mq_handle shm_mq_handle;
 
+/* Descriptors for a single write spanning multiple locations. */
+typedef struct
+{
+	const char  *data;
+	Size	len;
+} shm_mq_iovec;
+
 /* Possible results of a send or receive operation. */
 typedef enum
 {
@@ -52,12 +59,17 @@ extern PGPROC *shm_mq_get_sender(shm_mq *);
 extern shm_mq_handle *shm_mq_attach(shm_mq *mq, dsm_segment *seg,
 			  BackgroundWorkerHandle *handle);
 
+/* Associate worker handle with shm_mq. */
+extern void shm_mq_set_handle(shm_mq_handle *, BackgroundWorkerHandle *);
+
 /* Break connection. */
 extern void shm_mq_detach(shm_mq *);
 
 /* Send or receive messages. */
 extern shm_mq_result shm_mq_send(shm_mq_handle *mqh,
-			Size nbytes, void *data, bool nowait);
+			Size nbytes, const void *data, bool nowait);
+extern shm_mq_result shm_mq_sendv(shm_mq_handle *mqh,
+			shm_mq_iovec *iov, int iovcnt, bool nowait);
 extern shm_mq_result shm_mq_receive(shm_mq_handle *mqh,
 			   Size *nbytesp, void **datap, bool nowait);
 
-- 
1.7.9.6 (Apple Git-31.1)

0002-Extend-dsm-API-with-a-new-function-dsm_unkeep_mappin.patch (text/x-patch)
From d982dbb620b6a5d9992b3259550c3e16e7e8e28a Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 23 Jul 2014 11:16:44 -0400
Subject: [PATCH 2/6] Extend dsm API with a new function dsm_unkeep_mapping.

This reassociates a dynamic shared memory segment previously passed to
dsm_keep_mapping with the current resource owner, so that it will be
cleaned up at the end of the current query.
---
 src/backend/storage/ipc/dsm.c |   18 ++++++++++++++++++
 src/include/storage/dsm.h     |    1 +
 2 files changed, 19 insertions(+)

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index a5c0084..6039beb 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -796,6 +796,24 @@ dsm_keep_mapping(dsm_segment *seg)
 }
 
 /*
+ * Arrange to remove a dynamic shared memory mapping at cleanup time.
+ *
+ * dsm_keep_mapping() can be used to preserve a mapping for the entire
+ * lifetime of a process; this function reverses that decision, making
+ * the segment owned by the current resource owner.  This may be useful
+ * just before performing some operation that will invalidate the segment
+ * for future use by this backend.
+ */
+void
+dsm_unkeep_mapping(dsm_segment *seg)
+{
+	Assert(seg->resowner == NULL);
+	ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+	seg->resowner = CurrentResourceOwner;
+	ResourceOwnerRememberDSM(seg->resowner, seg);
+}
+
+/*
  * Keep a dynamic shared memory segment until postmaster shutdown.
  *
  * This function should not be called more than once per segment;
diff --git a/src/include/storage/dsm.h b/src/include/storage/dsm.h
index 1d0110d..1694409 100644
--- a/src/include/storage/dsm.h
+++ b/src/include/storage/dsm.h
@@ -37,6 +37,7 @@ extern void dsm_detach(dsm_segment *seg);
 
 /* Resource management functions. */
 extern void dsm_keep_mapping(dsm_segment *seg);
+extern void dsm_unkeep_mapping(dsm_segment *seg);
 extern void dsm_keep_segment(dsm_segment *seg);
 extern dsm_segment *dsm_find_mapping(dsm_handle h);
 
-- 
1.7.9.6 (Apple Git-31.1)

0003-Support-frontend-backend-protocol-communication-usin.patch (text/x-patch)
From 5840a0f2c7a7546c98485382e2c173b6be943f54 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 28 May 2014 19:30:21 -0400
Subject: [PATCH 3/6] Support frontend-backend protocol communication using a
 shm_mq.

A background worker can use pq_redirect_to_shm_mq() to direct protocol
messages that would normally be sent to the frontend to a shm_mq so
that another process may read them.

The receiving process may use pq_parse_errornotice() to parse an
ErrorResponse or NoticeResponse from the background worker and, if
it wishes, ThrowErrorData() to propagate the error (with or without
further modification).
---
 src/backend/libpq/Makefile       |    2 +-
 src/backend/libpq/pqcomm.c       |   13 +++
 src/backend/libpq/pqmq.c         |  202 ++++++++++++++++++++++++++++++++++++++
 src/backend/utils/adt/numutils.c |    2 +-
 src/backend/utils/error/elog.c   |   51 ++++++++++
 src/include/libpq/libpq.h        |    3 +
 src/include/libpq/pqmq.h         |   22 +++++
 src/include/utils/builtins.h     |    2 +-
 src/include/utils/elog.h         |    1 +
 9 files changed, 295 insertions(+), 3 deletions(-)
 create mode 100644 src/backend/libpq/pqmq.c
 create mode 100644 src/include/libpq/pqmq.h

diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index e929864..f548b2f 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -15,6 +15,6 @@ include $(top_builddir)/src/Makefile.global
 # be-fsstubs is here for historical reasons, probably belongs elsewhere
 
 OBJS = be-fsstubs.o be-secure.o auth.o crypt.o hba.o ip.o md5.o pqcomm.o \
-       pqformat.o pqsignal.o
+       pqformat.o pqmq.o pqsignal.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..b02fb0e 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -102,6 +102,9 @@
 int			Unix_socket_permissions;
 char	   *Unix_socket_group;
 
+/* Hooks for protocol message redirect */
+int	(*pq_putmessage_hook)(char msgtype, const char *s, size_t len);
+int	(*pq_flush_hook)(void);
 
 /* Where the Unix socket files are (list of palloc'd strings) */
 static List *sock_paths = NIL;
@@ -792,6 +795,11 @@ TouchSocketFiles(void)
 static void
 pq_set_nonblocking(bool nonblocking)
 {
+	if (MyProcPort == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
+				 errmsg("there is no client connection")));
+
 	if (MyProcPort->noblock == nonblocking)
 		return;
 
@@ -1220,6 +1228,9 @@ pq_flush(void)
 {
 	int			res;
 
+	if (pq_flush_hook != NULL)
+		return pq_flush_hook();
+
 	/* No-op if reentrant call */
 	if (PqCommBusy)
 		return 0;
@@ -1378,6 +1389,8 @@ pq_is_send_pending(void)
 int
 pq_putmessage(char msgtype, const char *s, size_t len)
 {
+	if (pq_putmessage_hook != NULL)
+		return pq_putmessage_hook(msgtype, s, len);
 	if (DoingCopyOut || PqCommBusy)
 		return 0;
 	PqCommBusy = true;
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
new file mode 100644
index 0000000..c29cc7c
--- /dev/null
+++ b/src/backend/libpq/pqmq.c
@@ -0,0 +1,202 @@
+/*-------------------------------------------------------------------------
+ *
+ * pqmq.c
+ *	  Use the frontend/backend protocol for communication over a shm_mq
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *	src/backend/libpq/pqmq.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+static shm_mq *pq_mq;
+static shm_mq_handle *pq_mq_handle;
+static bool pq_mq_busy = false;
+
+static int pq_putmessage_mq(char msgtype, const char *s, size_t len);
+static int pq_flush_mq(void);
+
+/*
+ * Arrange to redirect frontend/backend protocol messages to a shared-memory
+ * message queue.
+ */
+void
+pq_redirect_to_shm_mq(shm_mq *mq, shm_mq_handle *mqh)
+{
+	pq_putmessage_hook = pq_putmessage_mq;
+	pq_flush_hook = pq_flush_mq;
+	pq_mq = mq;
+	pq_mq_handle = mqh;
+	whereToSendOutput = DestRemote;
+	FrontendProtocol = PG_PROTOCOL_LATEST;
+}
+
+/*
+ * Transmit a libpq protocol message to the shared memory message queue
+ * selected via pq_mq_handle.  We don't include a length word, because the
+ * receiver will know the length of the message from shm_mq_receive().
+ */
+static int
+pq_putmessage_mq(char msgtype, const char *s, size_t len)
+{
+	shm_mq_iovec	iov[2];
+	shm_mq_result	result;
+
+	/*
+	 * If we're sending a message, and we have to wait because the
+	 * queue is full, and then we get interrupted, and that interrupt
+	 * results in trying to send another message, we respond by detaching
+	 * the queue.  There's no way to return to the original context, but
+	 * even if there were, just queueing the message would amount to
+	 * indefinitely postponing the response to the interrupt.  So we do
+	 * this instead.
+	 */
+	if (pq_mq_busy)
+	{
+		if (pq_mq != NULL)
+			shm_mq_detach(pq_mq);
+		pq_mq = NULL;
+		return EOF;
+	}
+
+	pq_mq_busy = true;
+
+	iov[0].data = &msgtype;
+	iov[0].len = 1;
+	iov[1].data = s;
+	iov[1].len = len;
+
+	Assert(pq_mq_handle != NULL);
+	result = shm_mq_sendv(pq_mq_handle, iov, 2, false);
+
+	pq_mq_busy = false;
+
+	Assert(result == SHM_MQ_SUCCESS || result == SHM_MQ_DETACHED);
+	if (result != SHM_MQ_SUCCESS)
+		return EOF;
+	return 0;
+}
+
+/*
+ * Flush is not required.
+ */
+static int
+pq_flush_mq(void)
+{
+	return 0;
+}
+
+/*
+ * Parse an ErrorResponse or NoticeResponse payload and populate an ErrorData
+ * structure with the results.
+ */
+void
+pq_parse_errornotice(StringInfo msg, ErrorData *edata)
+{
+	/* Initialize edata with reasonable defaults. */
+	MemSet(edata, 0, sizeof(ErrorData));
+	edata->elevel = ERROR;
+	edata->assoc_context = CurrentMemoryContext;
+
+	/* Loop over fields and extract each one. */
+	for (;;)
+	{
+		char	code = pq_getmsgbyte(msg);
+		const char *value;
+
+		if (code == '\0')
+		{
+			pq_getmsgend(msg);
+			break;
+		}
+	 	value = pq_getmsgstring(msg);
+
+		switch (code)
+		{
+			case PG_DIAG_SEVERITY:
+				if (strcmp(value, "DEBUG") == 0)
+					edata->elevel = DEBUG1;	/* or some other DEBUG level */
+				else if (strcmp(value, "LOG") == 0)
+					edata->elevel = LOG;	/* can't be COMMERROR */
+				else if (strcmp(value, "INFO") == 0)
+					edata->elevel = INFO;
+				else if (strcmp(value, "NOTICE") == 0)
+					edata->elevel = NOTICE;
+				else if (strcmp(value, "WARNING") == 0)
+					edata->elevel = WARNING;
+				else if (strcmp(value, "ERROR") == 0)
+					edata->elevel = ERROR;
+				else if (strcmp(value, "FATAL") == 0)
+					edata->elevel = FATAL;
+				else if (strcmp(value, "PANIC") == 0)
+					edata->elevel = PANIC;
+				else
+					elog(ERROR, "unknown error severity");
+				break;
+			case PG_DIAG_SQLSTATE:
+				if (strlen(value) != 5)
+					elog(ERROR, "malformed sql state");
+				edata->sqlerrcode = MAKE_SQLSTATE(value[0], value[1], value[2],
+												  value[3], value[4]);
+				break;
+			case PG_DIAG_MESSAGE_PRIMARY:
+				edata->message = pstrdup(value);
+				break;
+			case PG_DIAG_MESSAGE_DETAIL:
+				edata->detail = pstrdup(value);
+				break;
+			case PG_DIAG_MESSAGE_HINT:
+				edata->hint = pstrdup(value);
+				break;
+			case PG_DIAG_STATEMENT_POSITION:
+				edata->cursorpos = pg_atoi(value, sizeof(int), '\0');
+				break;
+			case PG_DIAG_INTERNAL_POSITION:
+				edata->internalpos = pg_atoi(value, sizeof(int), '\0');
+				break;
+			case PG_DIAG_INTERNAL_QUERY:
+				edata->internalquery = pstrdup(value);
+				break;
+			case PG_DIAG_CONTEXT:
+				edata->context = pstrdup(value);
+				break;
+			case PG_DIAG_SCHEMA_NAME:
+				edata->schema_name = pstrdup(value);
+				break;
+			case PG_DIAG_TABLE_NAME:
+				edata->table_name = pstrdup(value);
+				break;
+			case PG_DIAG_COLUMN_NAME:
+				edata->column_name = pstrdup(value);
+				break;
+			case PG_DIAG_DATATYPE_NAME:
+				edata->datatype_name = pstrdup(value);
+				break;
+			case PG_DIAG_CONSTRAINT_NAME:
+				edata->constraint_name = pstrdup(value);
+				break;
+			case PG_DIAG_SOURCE_FILE:
+				edata->filename = pstrdup(value);
+				break;
+			case PG_DIAG_SOURCE_LINE:
+				edata->lineno = pg_atoi(value, sizeof(int), '\0');
+				break;
+			case PG_DIAG_SOURCE_FUNCTION:
+				edata->funcname = pstrdup(value);
+				break;
+			default:
+				elog(ERROR, "unknown error field: %d", (int) code);
+				break;
+		}
+	}
+}
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index ca5a8a5..1d13363 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -34,7 +34,7 @@
  * overflow.
  */
 int32
-pg_atoi(char *s, int size, int c)
+pg_atoi(const char *s, int size, int c)
 {
 	long		l;
 	char	   *badp;
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 32a9663..2316464 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1577,6 +1577,57 @@ FlushErrorState(void)
 }
 
 /*
+ * ThrowErrorData --- report an error described by an ErrorData structure
+ *
+ * This is intended to be used to re-report errors originally thrown by
+ * background worker processes and then propagated (with or without
+ * modification) to the backend responsible for them.
+ */
+void
+ThrowErrorData(ErrorData *edata)
+{
+	ErrorData *newedata;
+	MemoryContext	oldcontext;
+
+	if (!errstart(edata->elevel, edata->filename, edata->lineno,
+				  edata->funcname, NULL))
+		return;
+
+	newedata = &errordata[errordata_stack_depth];
+	oldcontext = MemoryContextSwitchTo(edata->assoc_context);
+
+	/* Copy the supplied fields to the error stack. */
+	if (edata->sqlerrcode > 0)
+		newedata->sqlerrcode = edata->sqlerrcode;
+	if (edata->message)
+		newedata->message = pstrdup(edata->message);
+	if (edata->detail)
+		newedata->detail = pstrdup(edata->detail);
+	if (edata->detail_log)
+		newedata->detail_log = pstrdup(edata->detail_log);
+	if (edata->hint)
+		newedata->hint = pstrdup(edata->hint);
+	if (edata->context)
+		newedata->context = pstrdup(edata->context);
+	if (edata->schema_name)
+		newedata->schema_name = pstrdup(edata->schema_name);
+	if (edata->table_name)
+		newedata->table_name = pstrdup(edata->table_name);
+	if (edata->column_name)
+		newedata->column_name = pstrdup(edata->column_name);
+	if (edata->datatype_name)
+		newedata->datatype_name = pstrdup(edata->datatype_name);
+	if (edata->constraint_name)
+		newedata->constraint_name = pstrdup(edata->constraint_name);
+	if (edata->internalquery)
+		newedata->internalquery = pstrdup(edata->internalquery);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	errfinish(0);
+}
+
+/*
  * ReThrowError --- re-throw a previously copied error
  *
  * A handler can do CopyErrorData/FlushErrorState to get out of the error
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index e4e354d..eee80b3 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -75,6 +75,9 @@ extern char *ssl_key_file;
 extern char *ssl_ca_file;
 extern char *ssl_crl_file;
 
+extern int	(*pq_putmessage_hook)(char msgtype, const char *s, size_t len);
+extern int  (*pq_flush_hook)(void);
+
 extern int	secure_initialize(void);
 extern bool secure_loaded_verify_locations(void);
 extern void secure_destroy(void);
diff --git a/src/include/libpq/pqmq.h b/src/include/libpq/pqmq.h
new file mode 100644
index 0000000..6bb24d9
--- /dev/null
+++ b/src/include/libpq/pqmq.h
@@ -0,0 +1,22 @@
+/*-------------------------------------------------------------------------
+ *
+ * pqmq.h
+ *	  Use the frontend/backend protocol for communication over a shm_mq
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/libpq/pqmq.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PQMQ_H
+#define PQMQ_H
+
+#include "storage/shm_mq.h"
+
+extern void	pq_redirect_to_shm_mq(shm_mq *, shm_mq_handle *);
+
+extern void pq_parse_errornotice(StringInfo str, ErrorData *edata);
+
+#endif   /* PQMQ_H */
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index bbb5d39..2e8783f 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -285,7 +285,7 @@ extern Datum current_schema(PG_FUNCTION_ARGS);
 extern Datum current_schemas(PG_FUNCTION_ARGS);
 
 /* numutils.c */
-extern int32 pg_atoi(char *s, int size, int c);
+extern int32 pg_atoi(const char *s, int size, int c);
 extern void pg_itoa(int16 i, char *a);
 extern void pg_ltoa(int32 l, char *a);
 extern void pg_lltoa(int64 ll, char *a);
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 92073be..87438b8 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -415,6 +415,7 @@ extern ErrorData *CopyErrorData(void);
 extern void FreeErrorData(ErrorData *edata);
 extern void FlushErrorState(void);
 extern void ReThrowError(ErrorData *edata) __attribute__((noreturn));
+extern void ThrowErrorData(ErrorData *edata);
 extern void pg_re_throw(void) __attribute__((noreturn));
 
 extern char *GetErrorContextStack(void);
-- 
1.7.9.6 (Apple Git-31.1)

0004-Add-infrastructure-to-save-and-restore-GUC-values.patch (text/x-patch)
From 59d3cb93c566368d9cc9dd26b3ffd0924c98ee36 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 17 Jul 2014 07:58:32 -0400
Subject: [PATCH 4/6] Add infrastructure to save and restore GUC values.

Amit Khandekar and Noah Misch, with a bit of further hacking by me.
---
 src/backend/utils/misc/guc.c |  397 ++++++++++++++++++++++++++++++++++++++++++
 src/include/utils/guc.h      |    5 +
 2 files changed, 402 insertions(+)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6c52db8..cb324f3 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -117,6 +117,12 @@
 #define S_PER_D (60 * 60 * 24)
 #define MS_PER_D (1000 * 60 * 60 * 24)
 
+/*
+ * Precision with which REAL type guc values are to be printed for GUC
+ * serialization.
+ */
+#define REALTYPE_PRECISION 17
+
 /* XXX these should appear in other modules' header files */
 extern bool Log_disconnections;
 extern int	CommitDelay;
@@ -146,6 +152,10 @@ char	   *GUC_check_errmsg_string;
 char	   *GUC_check_errdetail_string;
 char	   *GUC_check_errhint_string;
 
+static void
+do_serialize(char **destptr, Size *maxbytes, const char *fmt,...)
+/* This lets gcc check the format string for consistency. */
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 4)));
 
 static void set_config_sourcefile(const char *name, char *sourcefile,
 					  int sourceline);
@@ -8409,6 +8419,393 @@ read_nondefault_variables(void)
 }
 #endif   /* EXEC_BACKEND */
 
+/*
+ * can_skip_gucvar:
+ * When serializing, determine whether to skip this GUC.  When restoring, the
+ * negation of this test determines whether to restore the compiled-in default
+ * value before processing serialized values.
+ *
+ * A PGC_S_DEFAULT setting on the serialize side will typically match new
+ * postmaster children, but that can be false when got_SIGHUP == true and the
+ * pending configuration change modifies this setting.  Nonetheless, we omit
+ * PGC_S_DEFAULT settings from serialization and make up for that by restoring
+ * defaults before applying serialized values.
+ *
+ * PGC_POSTMASTER variables always have the same value in every child of a
+ * particular postmaster.  Most PGC_INTERNAL variables are compile-time
+ * constants; a few, like server_encoding and lc_ctype, are handled specially
+ * outside the serialize/restore procedure.  Therefore, SerializeGUCState()
+ * never sends these, and RestoreGUCState() never changes them.
+ */
+static bool
+can_skip_gucvar(struct config_generic * gconf)
+{
+	return gconf->context == PGC_POSTMASTER ||
+		gconf->context == PGC_INTERNAL || gconf->source == PGC_S_DEFAULT;
+}
+
+/*
+ * estimate_variable_size:
+ * Estimate max size for dumping the given GUC variable.
+ */
+static Size
+estimate_variable_size(struct config_generic * gconf)
+{
+	Size		size;
+	Size		valsize;
+
+	if (can_skip_gucvar(gconf))
+		return 0;
+
+	size = 0;
+
+	size = add_size(size, strlen(gconf->name) + 1);
+
+	/* Get the maximum display length of the GUC value. */
+	switch (gconf->vartype)
+	{
+		case PGC_BOOL:
+			{
+				valsize = 5;	/* max(strlen('true'), strlen('false')) */
+			}
+			break;
+
+		case PGC_INT:
+			{
+				struct config_int *conf = (struct config_int *) gconf;
+
+				/*
+				 * Instead of getting the exact display length, use max
+				 * length.  Also reduce the max length for typical ranges of
+				 * small values.  Maximum value is 2147483647, i.e. 10 chars.
+				 * Include one byte for sign.
+				 */
+				if (abs(*conf->variable) < 1000)
+					valsize = 3 + 1;
+				else
+					valsize = 10 + 1;
+			}
+			break;
+
+		case PGC_REAL:
+			{
+				/*
+				 * We are going to print it with %.*g using
+				 * REALTYPE_PRECISION digits.  Account for sign, decimal
+				 * point, and e+nnn notation.  E.g. -3.9932904234000002e+110
+				 */
+				valsize = REALTYPE_PRECISION + 1 + 1 + 5;
+			}
+			break;
+
+		case PGC_STRING:
+			{
+				struct config_string *conf = (struct config_string *) gconf;
+
+				valsize = strlen(*conf->variable);
+			}
+			break;
+
+		case PGC_ENUM:
+			{
+				struct config_enum *conf = (struct config_enum *) gconf;
+
+				valsize = strlen(config_enum_lookup_by_value(conf, *conf->variable));
+			}
+			break;
+	}
+
+	/* Valsize + NULL terminator */
+	size = add_size(size, valsize + 1);
+
+	if (gconf->sourcefile)
+		size = add_size(size, strlen(gconf->sourcefile));
+
+	/* NULL char */
+	size = add_size(size, 1);
+
+	/* Include line whenever we include file. */
+	if (gconf->sourcefile)
+		size = add_size(size, sizeof(gconf->sourceline));
+
+	size = add_size(size, sizeof(gconf->source));
+	size = add_size(size, sizeof(gconf->scontext));
+
+	return size;
+}
+
+/*
+ * EstimateGUCStateSpace:
+ * Returns the size needed to store the GUC state for the current process
+ */
+Size
+EstimateGUCStateSpace(void)
+{
+	Size		size;
+	int			i;
+
+	/* Add space required for saving the data size of the guc state */
+	size = sizeof(Size);
+
+	/* Add up the space needed for each GUC variable */
+	for (i = 0; i < num_guc_variables; i++)
+		size = add_size(size,
+						estimate_variable_size(guc_variables[i]));
+
+	return size;
+}
+
+/*
+ * do_serialize:
+ * Copies the formatted string into the destination.  Moves ahead the
+ * destination pointer, and decrements the maxbytes by that many bytes. If
+ * maxbytes is not sufficient to copy the string, error out.
+ */
+static void
+do_serialize(char **destptr, Size *maxbytes, const char *fmt,...)
+{
+	va_list		vargs;
+	int			n;
+
+	/* Size is unsigned, so testing for <= 0 would never catch anything */
+	if (*maxbytes == 0)
+		elog(ERROR, "not enough space to serialize GUC state");
+
+	va_start(vargs, fmt);
+	n = vsnprintf(*destptr, *maxbytes, fmt, vargs);
+	va_end(vargs);
+
+	/*
+	 * Cater to portability hazards in the vsnprintf() return value just like
+	 * appendPQExpBufferVA() does.  Note that this requires an extra byte of
+	 * slack at the end of the buffer.  Since serialize_variable() ends with a
+	 * do_serialize_binary() rather than a do_serialize(), we'll always have
+	 * that slack; estimate_variable_size() need not add a byte for it.
+	 */
+	if (n < 0 || n >= *maxbytes - 1)
+	{
+		if (n < 0 && errno != 0 && errno != ENOMEM)
+			/* Shouldn't happen. Better show errno description. */
+			elog(ERROR, "vsnprintf failed: %m");
+		else
+			elog(ERROR, "not enough space to serialize GUC state");
+	}
+
+	/* Shift the destptr ahead of the null terminator */
+	*destptr += n + 1;
+	*maxbytes -= n + 1;
+}
+
+/* Binary copy version of do_serialize() */
+static void
+do_serialize_binary(char **destptr, Size *maxbytes, void *val, Size valsize)
+{
+	if (valsize > *maxbytes)
+		elog(ERROR, "not enough space to serialize GUC state");
+
+	memcpy(*destptr, val, valsize);
+	*destptr += valsize;
+	*maxbytes -= valsize;
+}
+
+/*
+ * serialize_variable:
+ * Dumps name, value and other information of a GUC variable into destptr.
+ */
+static void
+serialize_variable(char **destptr, Size *maxbytes,
+				   struct config_generic * gconf)
+{
+	if (can_skip_gucvar(gconf))
+		return;
+
+	do_serialize(destptr, maxbytes, "%s", gconf->name);
+
+	switch (gconf->vartype)
+	{
+		case PGC_BOOL:
+			{
+				struct config_bool *conf = (struct config_bool *) gconf;
+
+				do_serialize(destptr, maxbytes,
+							 (*conf->variable ? "true" : "false"));
+			}
+			break;
+
+		case PGC_INT:
+			{
+				struct config_int *conf = (struct config_int *) gconf;
+
+				do_serialize(destptr, maxbytes, "%d", *conf->variable);
+			}
+			break;
+
+		case PGC_REAL:
+			{
+				struct config_real *conf = (struct config_real *) gconf;
+
+				do_serialize(destptr, maxbytes, "%.*g",
+							 REALTYPE_PRECISION, *conf->variable);
+			}
+			break;
+
+		case PGC_STRING:
+			{
+				struct config_string *conf = (struct config_string *) gconf;
+
+				do_serialize(destptr, maxbytes, "%s", *conf->variable);
+			}
+			break;
+
+		case PGC_ENUM:
+			{
+				struct config_enum *conf = (struct config_enum *) gconf;
+
+				do_serialize(destptr, maxbytes, "%s",
+						 config_enum_lookup_by_value(conf, *conf->variable));
+			}
+			break;
+	}
+
+	do_serialize(destptr, maxbytes, "%s",
+				 (gconf->sourcefile ? gconf->sourcefile : ""));
+
+	if (gconf->sourcefile)
+		do_serialize_binary(destptr, maxbytes, &gconf->sourceline,
+							sizeof(gconf->sourceline));
+
+	do_serialize_binary(destptr, maxbytes, &gconf->source,
+						sizeof(gconf->source));
+	do_serialize_binary(destptr, maxbytes, &gconf->scontext,
+						sizeof(gconf->scontext));
+}
+
+/*
+ * SerializeGUCState:
+ * Dumps the complete GUC state onto the memory location at start_address.
+ */
+void
+SerializeGUCState(Size maxsize, char *start_address)
+{
+	char	   *curptr;
+	Size		actual_size;
+	Size		bytes_left;
+	int			i;
+	int			i_role = -1;
+
+	/* Reserve space for saving the actual size of the guc state */
+	curptr = start_address + sizeof(actual_size);
+	bytes_left = maxsize - sizeof(actual_size);
+
+	for (i = 0; i < num_guc_variables; i++)
+	{
+		/*
+		 * It's pretty ugly, but we've got to force "role" to be initialized
+		 * after "session_authorization"; otherwise, the latter will override
+		 * the former.
+		 */
+		if (strcmp(guc_variables[i]->name, "role") == 0)
+			i_role = i;
+		else
+			serialize_variable(&curptr, &bytes_left, guc_variables[i]);
+	}
+	if (i_role >= 0)
+		serialize_variable(&curptr, &bytes_left, guc_variables[i_role]);
+
+	/* Store actual size without assuming alignment of start_address. */
+	actual_size = maxsize - bytes_left - sizeof(actual_size);
+	memcpy(start_address, &actual_size, sizeof(actual_size));
+}
+
+/*
+ * read_gucstate:
+ * Returns the string at *srcptr and advances *srcptr past its terminating
+ * NUL character, so that the caller is positioned to read the next string.
+ */
+static char *
+read_gucstate(char **srcptr, char *srcend)
+{
+	char	   *retptr = *srcptr;
+	char	   *ptr;
+
+	if (*srcptr >= srcend)
+		elog(ERROR, "incomplete GUC state");
+
+	/* The string variables are all null terminated */
+	for (ptr = *srcptr; ptr < srcend && *ptr != '\0'; ptr++)
+		;
+
+	if (ptr >= srcend)
+		elog(ERROR, "could not find null terminator in GUC state");
+
+	/* Set the new position at the position after the NULL character */
+	*srcptr = ptr + 1;
+
+	return retptr;
+}
+
+/* Binary read version of read_gucstate(). Copies into dest */
+static void
+read_gucstate_binary(char **srcptr, char *srcend, void *dest, Size size)
+{
+	if (*srcptr + size > srcend)
+		elog(ERROR, "incomplete GUC state");
+
+	memcpy(dest, *srcptr, size);
+	*srcptr += size;
+}
+
+/*
+ * RestoreGUCState:
+ * Reads the GUC state at the specified address and updates the GUCs with the
+ * values read from the GUC state.
+ */
+void
+RestoreGUCState(void *gucstate)
+{
+	char	   *varname,
+			   *varvalue,
+			   *varsourcefile;
+	int			varsourceline;
+	GucSource	varsource;
+	GucContext	varscontext;
+	char	   *srcptr = (char *) gucstate;
+	char	   *srcend;
+	Size		len;
+	int			i;
+
+	/* See comment at can_skip_gucvar(). */
+	for (i = 0; i < num_guc_variables; i++)
+		if (!can_skip_gucvar(guc_variables[i]))
+			InitializeOneGUCOption(guc_variables[i]);
+
+	/* First item is the length of the subsequent data */
+	memcpy(&len, gucstate, sizeof(len));
+
+	srcptr += sizeof(len);
+	srcend = srcptr + len;
+
+	while (srcptr < srcend)
+	{
+		varname = read_gucstate(&srcptr, srcend);
+
+		varvalue = read_gucstate(&srcptr, srcend);
+		varsourcefile = read_gucstate(&srcptr, srcend);
+		if (varsourcefile[0])
+			read_gucstate_binary(&srcptr, srcend,
+								 &varsourceline, sizeof(varsourceline));
+		read_gucstate_binary(&srcptr, srcend,
+							 &varsource, sizeof(varsource));
+		read_gucstate_binary(&srcptr, srcend,
+							 &varscontext, sizeof(varscontext));
+
+		(void) set_config_option(varname, varvalue,
+								 varscontext, varsource,
+								 GUC_ACTION_SET, true, 0);
+		if (varsourcefile[0])
+			set_config_sourcefile(varname, varsourcefile, varsourceline);
+	}
+}
 
 /*
  * A little "long argument" simulation, although not quite GNU
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 0a729c1..47cac5c 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -351,6 +351,11 @@ extern void write_nondefault_variables(GucContext context);
 extern void read_nondefault_variables(void);
 #endif
 
+/* GUC serialization */
+extern Size EstimateGUCStateSpace(void);
+extern void SerializeGUCState(Size maxsize, char *start_address);
+extern void RestoreGUCState(void *gucstate);
+
 /* Support for messages reported from GUC check hooks */
 
 extern PGDLLIMPORT char *GUC_check_errmsg_string;
-- 
1.7.9.6 (Apple Git-31.1)
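As a standalone illustration of the wire format the GUC-serialization patch above uses, each variable is emitted as a run of NUL-terminated strings (name, value, sourcefile) followed by fixed-width binary fields (sourceline, source, scontext). This is a toy sketch, not PostgreSQL code: `serialize_var` and `get_string` are hypothetical names modeled on `serialize_variable` and `read_gucstate`, and only the name/value/sourcefile/sourceline portion is shown.

```c
#include <stdint.h>
#include <string.h>

/* Append a NUL-terminated string to the buffer; return bytes written. */
static size_t
put_string(char *dest, const char *s)
{
	size_t		n = strlen(s) + 1;

	memcpy(dest, s, n);
	return n;
}

/* Return the string at *cursor and advance past its NUL terminator,
 * analogous to read_gucstate(). */
static const char *
get_string(const char **cursor)
{
	const char *start = *cursor;

	*cursor += strlen(start) + 1;
	return start;
}

/* Serialize one variable: name, value, sourcefile as strings, then the
 * source line as a binary field, but only when a sourcefile is present
 * (mirroring serialize_variable()).  Returns total bytes written. */
static size_t
serialize_var(char *buf, const char *name, const char *value,
			  const char *sourcefile, int32_t sourceline)
{
	char	   *p = buf;

	p += put_string(p, name);
	p += put_string(p, value);
	p += put_string(p, sourcefile);
	if (sourcefile[0] != '\0')
	{
		memcpy(p, &sourceline, sizeof(sourceline));
		p += sizeof(sourceline);
	}
	return (size_t) (p - buf);
}
```

A parser can walk the same buffer with `get_string` plus a `memcpy` for the binary fields, which is exactly the shape of the RestoreGUCState() loop.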

0005-Add-a-function-to-get-the-authenticated-user-ID.patchtext/x-patch; charset=US-ASCII; name=0005-Add-a-function-to-get-the-authenticated-user-ID.patchDownload
From 654e3c0371662533874323ef99ca06f3ab180d25 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 17 Jul 2014 10:33:50 -0400
Subject: [PATCH 5/6] Add a function to get the authenticated user ID.

Previously, this was not exposed outside of miscinit.c.
---
 src/backend/utils/init/miscinit.c |   10 ++++++++++
 src/include/miscadmin.h           |    1 +
 2 files changed, 11 insertions(+)

diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index a703c67..8fccb4c 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -222,6 +222,16 @@ SetSessionUserId(Oid userid, bool is_superuser)
 	CurrentUserId = userid;
 }
 
+/*
+ * GetAuthenticatedUserId - get the authenticated user ID
+ */
+Oid
+GetAuthenticatedUserId(void)
+{
+	AssertState(OidIsValid(AuthenticatedUserId));
+	return AuthenticatedUserId;
+}
+
 
 /*
  * GetUserIdAndSecContext/SetUserIdAndSecContext - get/set the current user ID
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index c2b786e..4badcbc 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -282,6 +282,7 @@ extern char *GetUserNameFromId(Oid roleid);
 extern Oid	GetUserId(void);
 extern Oid	GetOuterUserId(void);
 extern Oid	GetSessionUserId(void);
+extern Oid	GetAuthenticatedUserId(void);
 extern void GetUserIdAndSecContext(Oid *userid, int *sec_context);
 extern void SetUserIdAndSecContext(Oid userid, int sec_context);
 extern bool InLocalUserIdChange(void);
-- 
1.7.9.6 (Apple Git-31.1)
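Patch 5 exposes the authenticated user separately from the current user because a worker first becomes the authenticated user and then adopts the launching session's current user and security context as a pair (as patch 6 does with the data it passes through shared memory). A toy model of that save/impersonate pattern follows; the names and globals here are standalone illustrations, not the miscinit.c implementation.

```c
typedef unsigned int Oid;

/* Toy stand-ins for the backend-global identity state. */
static Oid	current_user_id;
static int	sec_context;

/* The current user ID and the security-context flags travel as a pair,
 * so a caller can impersonate another role and later restore both
 * atomically, like GetUserIdAndSecContext/SetUserIdAndSecContext. */
static void
get_user_id_and_sec_context(Oid *userid, int *ctx)
{
	*userid = current_user_id;
	*ctx = sec_context;
}

static void
set_user_id_and_sec_context(Oid userid, int ctx)
{
	current_user_id = userid;
	sec_context = ctx;
}

/* What a worker would do: start out as the authenticated user, then
 * adopt the (user, context) pair saved by the launching session. */
static void
adopt_identity(Oid authenticated_user, Oid saved_user, int saved_ctx)
{
	current_user_id = authenticated_user;
	sec_context = 0;
	set_user_id_and_sec_context(saved_user, saved_ctx);
}
```

Restoring the pair together is what keeps SECURITY DEFINER-style context flags consistent with the user ID they were captured with.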

0006-pg_background-Run-commands-in-a-background-worker-an.patchtext/x-patch; charset=US-ASCII; name=0006-pg_background-Run-commands-in-a-background-worker-an.patchDownload
From c835a06f20792556d35a0eee4c2fa21f5f23e8a3 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 11 Jul 2014 09:53:40 -0400
Subject: [PATCH 6/6] pg_background: Run commands in a background worker, and
 get the results.

The currently-active GUC values from the user session will be copied
to the background worker.  If the command returns a result set, you
can retrieve the result set; if not, you can retrieve the command
tags.  If the command fails with an error, the same error will be
thrown in the launching process when the results are retrieved.
Warnings and other messages generated by the background worker, and
notifications received by it, are also propagated to the foreground
process.
---
 contrib/Makefile                             |    1 +
 contrib/pg_background/Makefile               |   18 +
 contrib/pg_background/pg_background--1.0.sql |   17 +
 contrib/pg_background/pg_background.c        |  919 ++++++++++++++++++++++++++
 contrib/pg_background/pg_background.control  |    4 +
 5 files changed, 959 insertions(+)
 create mode 100644 contrib/pg_background/Makefile
 create mode 100644 contrib/pg_background/pg_background--1.0.sql
 create mode 100644 contrib/pg_background/pg_background.c
 create mode 100644 contrib/pg_background/pg_background.control

diff --git a/contrib/Makefile b/contrib/Makefile
index b37d0dd..11d6116 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
 		pageinspect	\
 		passwordcheck	\
 		pg_archivecleanup \
+		pg_background \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
diff --git a/contrib/pg_background/Makefile b/contrib/pg_background/Makefile
new file mode 100644
index 0000000..c4e717d
--- /dev/null
+++ b/contrib/pg_background/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_background/Makefile
+
+MODULE_big = pg_background
+OBJS = pg_background.o
+
+EXTENSION = pg_background
+DATA = pg_background--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_background
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_background/pg_background--1.0.sql b/contrib/pg_background/pg_background--1.0.sql
new file mode 100644
index 0000000..5fa5ddb
--- /dev/null
+++ b/contrib/pg_background/pg_background--1.0.sql
@@ -0,0 +1,17 @@
+/* contrib/pg_background/pg_background--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_background" to load this file. \quit
+
+CREATE FUNCTION pg_background_launch(sql pg_catalog.text,
+					   queue_size pg_catalog.int4 DEFAULT 65536)
+    RETURNS pg_catalog.int4 STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_result(pid pg_catalog.int4)
+    RETURNS SETOF pg_catalog.record STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_detach(pid pg_catalog.int4)
+    RETURNS pg_catalog.void STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/contrib/pg_background/pg_background.c b/contrib/pg_background/pg_background.c
new file mode 100644
index 0000000..19d139b
--- /dev/null
+++ b/contrib/pg_background/pg_background.c
@@ -0,0 +1,919 @@
+/*--------------------------------------------------------------------------
+ *
+ * pg_background.c
+ *		Run SQL commands using a background worker.
+ *
+ * Copyright (C) 2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pg_background/pg_background.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/printtup.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "commands/async.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
+#include "miscadmin.h"
+#include "parser/analyze.h"
+#include "pgstat.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/shm_mq.h"
+#include "storage/shm_toc.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/snapmgr.h"
+#include "utils/timeout.h"
+
+/* Table-of-contents constants for our dynamic shared memory segment. */
+#define PG_BACKGROUND_MAGIC				0x50674267
+#define PG_BACKGROUND_KEY_FIXED_DATA	0
+#define PG_BACKGROUND_KEY_SQL			1
+#define PG_BACKGROUND_KEY_GUC			2
+#define PG_BACKGROUND_KEY_QUEUE			3
+#define PG_BACKGROUND_NKEYS				4
+
+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct pg_background_fixed_data
+{
+	Oid	database_id;
+	Oid	authenticated_user_id;
+	Oid	current_user_id;
+	int	sec_context;
+	char database[NAMEDATALEN];
+	char authenticated_user[NAMEDATALEN];
+} pg_background_fixed_data;
+
+/* Private state maintained by the launching backend for IPC. */
+typedef struct pg_background_worker_info
+{
+	pid_t		pid;
+	dsm_segment *seg;
+	BackgroundWorkerHandle *handle;
+	shm_mq_handle *responseq;
+	bool		consumed;
+} pg_background_worker_info;
+
+/* Private state maintained across calls to pg_background_result. */
+typedef struct pg_background_result_state
+{
+	pg_background_worker_info *info;
+	FmgrInfo   *receive_functions;
+	Oid		   *typioparams;
+	bool		has_row_description;
+	List	   *command_tags;
+	bool		complete;
+} pg_background_result_state;
+
+static HTAB *worker_hash;
+
+static void cleanup_worker_info(dsm_segment *, Datum pid_datum);
+static pg_background_worker_info *find_worker_info(pid_t pid);
+static void save_worker_info(pid_t pid, dsm_segment *seg,
+				 BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq);
+
+static HeapTuple form_result_tuple(pg_background_result_state *state,
+								   TupleDesc tupdesc, StringInfo msg);
+
+static void handle_sigterm(SIGNAL_ARGS);
+static void execute_sql_string(const char *sql);
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(pg_background_launch);
+PG_FUNCTION_INFO_V1(pg_background_result);
+PG_FUNCTION_INFO_V1(pg_background_detach);
+
+void pg_background_worker_main(Datum);
+
+/*
+ * Start a dynamic background worker to run a user-specified SQL command.
+ */
+Datum
+pg_background_launch(PG_FUNCTION_ARGS)
+{
+	text	   *sql = PG_GETARG_TEXT_PP(0);
+	int32		queue_size = PG_GETARG_INT32(1);
+	int32		sql_len = VARSIZE_ANY_EXHDR(sql);
+	Size		guc_len;
+	Size		segsize;
+	dsm_segment *seg;
+	shm_toc_estimator e;
+	shm_toc    *toc;
+	char	   *sqlp;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *worker_handle;
+	pg_background_fixed_data *fdata;
+	pid_t		pid;
+	shm_mq_handle *responseq;
+	MemoryContext	oldcontext;
+
+	/* Ensure a valid queue size. */
+	if (queue_size < 0 || ((uint64) queue_size) < shm_mq_minimum_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("queue size must be at least %zu bytes",
+						shm_mq_minimum_size)));
+
+	/* Create dynamic shared memory and table of contents. */
+	shm_toc_initialize_estimator(&e);
+	shm_toc_estimate_chunk(&e, sizeof(pg_background_fixed_data));
+	shm_toc_estimate_chunk(&e, sql_len + 1);
+	guc_len = EstimateGUCStateSpace();
+	shm_toc_estimate_chunk(&e, guc_len);
+	shm_toc_estimate_chunk(&e, (Size) queue_size);
+	shm_toc_estimate_keys(&e, PG_BACKGROUND_NKEYS);
+	segsize = shm_toc_estimate(&e);
+	seg = dsm_create(segsize);
+	toc = shm_toc_create(PG_BACKGROUND_MAGIC, dsm_segment_address(seg),
+						 segsize);
+
+	/* Store fixed-size data in dynamic shared memory. */
+	fdata = shm_toc_allocate(toc, sizeof(pg_background_fixed_data));
+	fdata->database_id = MyDatabaseId;
+	fdata->authenticated_user_id = GetAuthenticatedUserId();
+	GetUserIdAndSecContext(&fdata->current_user_id, &fdata->sec_context);
+	strlcpy(fdata->database, get_database_name(MyDatabaseId), NAMEDATALEN);
+	strlcpy(fdata->authenticated_user,
+			GetUserNameFromId(fdata->authenticated_user_id), NAMEDATALEN);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_FIXED_DATA, fdata);
+
+	/* Store SQL query in dynamic shared memory. */
+	sqlp = shm_toc_allocate(toc, sql_len + 1);
+	memcpy(sqlp, VARDATA(sql), sql_len);
+	sqlp[sql_len] = '\0';
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_SQL, sqlp);
+
+	/* Store GUC state in dynamic shared memory. */
+	gucstate = shm_toc_allocate(toc, guc_len);
+	SerializeGUCState(guc_len, gucstate);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_GUC, gucstate);
+
+	/* Establish message queue in dynamic shared memory. */
+	mq = shm_mq_create(shm_toc_allocate(toc, (Size) queue_size),
+					   (Size) queue_size);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_QUEUE, mq);
+	shm_mq_set_receiver(mq, MyProc);
+
+	/*
+	 * Attach the queue before launching a worker, so that we'll automatically
+	 * detach the queue if we error out.  (Otherwise, the worker might sit
+	 * there trying to write the queue long after we've gone away.)
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	responseq = shm_mq_attach(mq, seg, NULL);
+	MemoryContextSwitchTo(oldcontext);
+
+	/* Configure a worker. */
+	worker.bgw_flags =
+		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	worker.bgw_main = NULL;		/* new worker might not have library loaded */
+	sprintf(worker.bgw_library_name, "pg_background");
+	sprintf(worker.bgw_function_name, "pg_background_worker_main");
+	snprintf(worker.bgw_name, BGW_MAXLEN,
+			 "pg_background by PID %d", MyProcPid);
+	worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
+	/* set bgw_notify_pid, so we can detect if the worker stops */
+	worker.bgw_notify_pid = MyProcPid;
+
+	/*
+	 * Register the worker.
+	 *
+	 * We switch contexts so that the background worker handle can outlast
+	 * this transaction.
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	if (!RegisterDynamicBackgroundWorker(&worker, &worker_handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not register background process"),
+			 errhint("You may need to increase max_worker_processes.")));
+	MemoryContextSwitchTo(oldcontext);
+	shm_mq_set_handle(responseq, worker_handle);
+
+	/* Wait for the worker to start. */
+	switch (WaitForBackgroundWorkerStartup(worker_handle, &pid))
+	{
+		case BGWH_STARTED:
+			/* Success. */
+			break;
+		case BGWH_STOPPED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("could not start background process"),
+					 errhint("More details may be available in the server log.")));
+			break;
+		case BGWH_POSTMASTER_DIED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("cannot start background processes without postmaster"),
+					 errhint("Kill all remaining database processes and restart the database.")));
+			break;
+		default:
+			elog(ERROR, "unexpected bgworker handle status");
+			break;
+	}
+
+	/* Store the relevant details about this worker for future use. */
+	save_worker_info(pid, seg, worker_handle, responseq);
+
+	/*
+	 * Now that the worker info is saved, we do not need to, and should not,
+	 * automatically detach the segment at resource-owner cleanup time.
+	 */
+	dsm_keep_mapping(seg);
+
+	/* Return the worker's PID. */
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * Retrieve the results of a background query previously launched in this
+ * session.
+ */
+Datum
+pg_background_result(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	shm_mq_result	res;
+	FuncCallContext *funcctx;
+	TupleDesc	tupdesc;
+	StringInfoData	msg;
+	pg_background_result_state *state;
+
+	/* First-time setup. */
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+		pg_background_worker_info *info;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* See if we have a connection to the specified PID. */
+		if ((info = find_worker_info(pid)) == NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("PID %d is not attached to this session", pid)));
+
+		/* Can't read results twice. */
+		if (info->consumed)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("results for PID %d have already been consumed", pid)));
+		info->consumed = true;
+
+		/*
+		 * Whether we succeed or fail, a future invocation of this function
+		 * may not try to read from the DSM once we've begun to do so.
+		 * Accordingly, make arrangements to clean things up at end of query.
+		 */
+		dsm_unkeep_mapping(info->seg);
+
+		/* Set up tuple-descriptor based on column definition list. */
+		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record"),
+					 errhint("Try calling the function in the FROM clause "
+							 "using a column definition list.")));
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* Cache state that will be needed on every call. */
+		state = palloc0(sizeof(pg_background_result_state));
+		state->info = info;
+		if (funcctx->tuple_desc->natts > 0)
+		{
+			int	natts = funcctx->tuple_desc->natts;
+			int	i;
+
+			state->receive_functions = palloc(sizeof(FmgrInfo) * natts);
+			state->typioparams = palloc(sizeof(Oid) * natts);
+
+			for (i = 0;	i < natts; ++i)
+			{
+				Oid	receive_function_id;
+
+				getTypeBinaryInputInfo(funcctx->tuple_desc->attrs[i]->atttypid,
+									   &receive_function_id,
+									   &state->typioparams[i]);
+				fmgr_info(receive_function_id, &state->receive_functions[i]);
+			}
+		}
+		funcctx->user_fctx = state;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+	funcctx = SRF_PERCALL_SETUP();
+	tupdesc = funcctx->tuple_desc;
+	state = funcctx->user_fctx;
+
+	/* Initialize message buffer. */
+	initStringInfo(&msg);
+
+	/* Read and process messages from the shared memory queue. */
+	for (;;)
+	{
+		char		msgtype;
+		Size		nbytes;
+		void	   *data;
+
+		/* Get next message. */
+		res = shm_mq_receive(state->info->responseq, &nbytes, &data, false);
+		if (res != SHM_MQ_SUCCESS)
+			break;
+
+		/*
+		 * Message-parsing routines operate on a null-terminated StringInfo,
+		 * so we must construct one.
+		 */
+		resetStringInfo(&msg);
+		enlargeStringInfo(&msg, nbytes);
+		msg.len = nbytes;
+		memcpy(msg.data, data, nbytes);
+		msg.data[nbytes] = '\0';
+		msgtype = pq_getmsgbyte(&msg);
+
+		/* Dispatch on message type. */
+		switch (msgtype)
+		{
+			case 'E':
+			case 'N':
+				{
+					ErrorData	edata;
+
+					/* Parse ErrorResponse or NoticeResponse. */
+					pq_parse_errornotice(&msg, &edata);
+
+					/*
+					 * Limit the maximum error level to ERROR.  We don't want
+					 * a FATAL inside the background worker to kill the user
+					 * session.
+					 */
+					if (edata.elevel > ERROR)
+						edata.elevel = ERROR;
+
+					/* Rethrow the error. */
+					ThrowErrorData(&edata);
+					break;
+				}
+			case 'A':
+				{
+					/* Propagate NotifyResponse. */
+					pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+					break;
+				}
+			case 'T':
+				{
+					int16	natts = pq_getmsgint(&msg, 2);
+					int16	i;
+
+					if (state->has_row_description)
+						elog(ERROR, "multiple RowDescription messages");
+					state->has_row_description = true;
+					if (natts != tupdesc->natts)
+						ereport(ERROR,
+								(errcode(ERRCODE_DATATYPE_MISMATCH),
+								 errmsg("remote query result rowtype does not match "
+									"the specified FROM clause rowtype")));
+
+					for (i = 0; i < natts; ++i)
+					{
+						Oid		type_id;
+
+						(void) pq_getmsgstring(&msg);	/* name */
+						(void) pq_getmsgint(&msg, 4);	/* table OID */
+						(void) pq_getmsgint(&msg, 2);	/* table attnum */
+						type_id = pq_getmsgint(&msg, 4);	/* type OID */
+						(void) pq_getmsgint(&msg, 2);	/* type length */
+						(void) pq_getmsgint(&msg, 4);	/* typmod */
+						(void) pq_getmsgint(&msg, 2);	/* format code */
+
+						if (type_id != tupdesc->attrs[i]->atttypid)
+							ereport(ERROR,
+									(errcode(ERRCODE_DATATYPE_MISMATCH),
+									 errmsg("remote query result rowtype does not match "
+										"the specified FROM clause rowtype")));
+					}
+
+					pq_getmsgend(&msg);
+
+					break;
+				}
+			case 'D':
+				{
+					/* Handle DataRow message. */
+					HeapTuple	result;
+
+					result = form_result_tuple(state, tupdesc, &msg);
+					SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+				}
+			case 'C':
+				{
+					/* Handle CommandComplete message. */
+					MemoryContext	oldcontext;
+					const char  *tag = pq_getmsgstring(&msg);
+
+					oldcontext =
+						MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+					state->command_tags = lappend(state->command_tags,
+												  pstrdup(tag));
+					MemoryContextSwitchTo(oldcontext);
+					break;
+				}
+			case 'Z':
+				{
+					/* Handle ReadyForQuery message. */
+					state->complete = true;
+					break;
+				}
+			default:
+				elog(WARNING, "unknown message type: %c (%zu bytes)",
+					 msg.data[0], nbytes);
+				break;
+		}
+	}
+
+	/* Check whether the connection was broken prematurely. */
+	if (!state->complete)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONNECTION_FAILURE),
+				 errmsg("lost connection to worker process with PID %d",
+					pid)));
+
+	/* If no data rows, return the command tags instead. */
+	if (!state->has_row_description)
+	{
+		if (tupdesc->natts != 1 || tupdesc->attrs[0]->atttypid != TEXTOID)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query did not return a result set, but "
+							"result rowtype is not a single text column")));
+		if (state->command_tags != NIL)
+		{
+			char *tag = linitial(state->command_tags);
+			Datum	value;
+			bool	isnull;
+			HeapTuple	result;
+
+			state->command_tags = list_delete_first(state->command_tags);
+			value = PointerGetDatum(cstring_to_text(tag));
+			isnull = false;
+			result = heap_form_tuple(tupdesc, &value, &isnull);
+			SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+		}
+	}
+
+	/* We're done! */
+	dsm_detach(state->info->seg);
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Parse a DataRow message and form a result tuple.
+ */
+static HeapTuple
+form_result_tuple(pg_background_result_state *state, TupleDesc tupdesc,
+				  StringInfo msg)
+{
+	/* Handle DataRow message. */
+	int16	natts = pq_getmsgint(msg, 2);
+	int16	i;
+	Datum  *values = NULL;
+	bool   *isnull = NULL;
+	StringInfoData	buf;
+
+	if (!state->has_row_description)
+		elog(ERROR, "DataRow not preceded by RowDescription");
+	if (natts != tupdesc->natts)
+		elog(ERROR, "malformed DataRow");
+	if (natts > 0)
+	{
+		values = palloc(natts * sizeof(Datum));
+		isnull = palloc(natts * sizeof(bool));
+	}
+	initStringInfo(&buf);
+
+	for (i = 0; i < natts; ++i)
+	{
+		int32	bytes = pq_getmsgint(msg, 4);
+
+		if (bytes < 0)
+		{
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											NULL,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = true;
+		}
+		else
+		{
+			resetStringInfo(&buf);
+			appendBinaryStringInfo(&buf, pq_getmsgbytes(msg, bytes), bytes);
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											&buf,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = false;
+		}
+	}
+
+	pq_getmsgend(msg);
+
+	return heap_form_tuple(tupdesc, values, isnull);
+}
+
+/*
+ * Detach from the dynamic shared memory segment used for communication with
+ * a background worker.  This prevents the worker from stalling waiting for
+ * us to read its results.
+ */
+Datum
+pg_background_detach(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	pg_background_worker_info *info;
+
+	info = find_worker_info(pid);
+	if (info == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("PID %d is not attached to this session", pid)));
+	dsm_detach(info->seg);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * When the dynamic shared memory segment associated with a worker is
+ * cleaned up, we need to clean up our associated private data structures.
+ */
+static void
+cleanup_worker_info(dsm_segment *seg, Datum pid_datum)
+{
+	pid_t	pid = DatumGetInt32(pid_datum);
+	bool	found;
+	pg_background_worker_info *info;
+
+	/* Find any worker info entry for this PID.  If none, we're done. */
+	if ((info = find_worker_info(pid)) == NULL)
+		return;
+
+	/* Free memory used by the BackgroundWorkerHandle. */
+	if (info->handle != NULL)
+	{
+		pfree(info->handle);
+		info->handle = NULL;
+	}
+
+	/* Remove the hashtable entry. */
+	hash_search(worker_hash, (void *) &pid, HASH_REMOVE, &found);
+	if (!found)
+		elog(ERROR, "pg_background worker_hash table corrupted");
+}
+
+/*
+ * Find the background worker information for the worker with a given PID.
+ */
+static pg_background_worker_info *
+find_worker_info(pid_t pid)
+{
+	pg_background_worker_info *info = NULL;
+
+	if (worker_hash != NULL)
+		info = hash_search(worker_hash, (void *) &pid, HASH_FIND, NULL);
+
+	return info;
+}
+
+/*
+ * Save worker information for future IPC.
+ */
+static void
+save_worker_info(pid_t pid, dsm_segment *seg, BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq)
+{
+	pg_background_worker_info *info;
+
+	/* If the hash table hasn't been set up yet, do that now. */
+	if (worker_hash == NULL)
+	{
+		HASHCTL	ctl;
+
+		/* Zero the control struct so hash_create sees no garbage fields. */
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(pid_t);
+		ctl.entrysize = sizeof(pg_background_worker_info);
+		ctl.hash = tag_hash;	/* the key is binary, not a C string */
+		worker_hash = hash_create("pg_background worker_hash", 8, &ctl,
+								  HASH_ELEM | HASH_FUNCTION);
+	}
+
+	/* Detach any older worker with this PID. */
+	if ((info = find_worker_info(pid)) != NULL)
+		dsm_detach(info->seg);
+
+	/* When the DSM is unmapped, clean everything up. */
+	on_dsm_detach(seg, cleanup_worker_info, Int32GetDatum(pid));
+
+	/* Create a new entry for this worker. */
+	info = hash_search(worker_hash, (void *) &pid, HASH_ENTER, NULL);
+	info->seg = seg;
+	info->handle = handle;
+	info->responseq = responseq;
+	info->consumed = false;
+}
+
+/*
+ * Background worker entrypoint.
+ */
+void
+pg_background_worker_main(Datum main_arg)
+{
+	dsm_segment *seg;
+	shm_toc    *toc;
+	pg_background_fixed_data *fdata;
+	char	   *sql;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	shm_mq_handle *responseq;
+
+	/* Establish signal handlers. */
+	pqsignal(SIGTERM, handle_sigterm);
+	BackgroundWorkerUnblockSignals();
+
+	/* Set up a memory context and resource owner. */
+	Assert(CurrentResourceOwner == NULL);
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "pg_background");
+	CurrentMemoryContext = AllocSetContextCreate(TopMemoryContext,
+												 "pg_background session",
+												 ALLOCSET_DEFAULT_MINSIZE,
+												 ALLOCSET_DEFAULT_INITSIZE,
+												 ALLOCSET_DEFAULT_MAXSIZE);
+
+	/* Connect to the dynamic shared memory segment. */
+	seg = dsm_attach(DatumGetInt32(main_arg));
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("unable to map dynamic shared memory segment")));
+	toc = shm_toc_attach(PG_BACKGROUND_MAGIC, dsm_segment_address(seg));
+	if (toc == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			   errmsg("bad magic number in dynamic shared memory segment")));
+
+	/* Find data structures in dynamic shared memory. */
+	fdata = shm_toc_lookup(toc, PG_BACKGROUND_KEY_FIXED_DATA);
+	sql = shm_toc_lookup(toc, PG_BACKGROUND_KEY_SQL);
+	gucstate = shm_toc_lookup(toc, PG_BACKGROUND_KEY_GUC);
+	mq = shm_toc_lookup(toc, PG_BACKGROUND_KEY_QUEUE);
+	shm_mq_set_sender(mq, MyProc);
+	responseq = shm_mq_attach(mq, seg, NULL);
+
+	/* Redirect protocol messages to responseq. */
+	pq_redirect_to_shm_mq(mq, responseq);
+
+	/*
+	 * Initialize our user and database IDs based on the string versions of
+	 * the data, and then go back and check that we actually got the database
+	 * and user ID that we intended to get.  We do this because it's not
+	 * impossible for the process that started us to die before we get here,
+	 * and the user or database could be renamed in the meantime.  We don't
+	 * want to latch onto the wrong object by accident.  There should probably
+	 * be a variant of BackgroundWorkerInitializeConnection that accepts OIDs
+	 * rather than strings.
+	 */
+	BackgroundWorkerInitializeConnection(fdata->database,
+										 fdata->authenticated_user);
+	if (fdata->database_id != MyDatabaseId ||
+		fdata->authenticated_user_id != GetAuthenticatedUserId())
+		ereport(ERROR,
+			(errmsg("user or database renamed during pg_background startup")));
+
+	/* Restore GUC values from launching backend. */
+	StartTransactionCommand();
+	RestoreGUCState(gucstate);
+	CommitTransactionCommand();
+
+	/* Restore user ID and security context. */
+	SetUserIdAndSecContext(fdata->current_user_id, fdata->sec_context);
+
+	/* Prepare to execute the query. */
+	SetCurrentStatementStartTimestamp();
+	debug_query_string = sql;
+	pgstat_report_activity(STATE_RUNNING, sql);
+	StartTransactionCommand();
+	if (StatementTimeout > 0)
+		enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
+	else
+		disable_timeout(STATEMENT_TIMEOUT, false);
+
+	/* Execute the query. */
+	execute_sql_string(sql);
+
+	/* Post-execution cleanup. */
+	disable_timeout(STATEMENT_TIMEOUT, false);
+	CommitTransactionCommand();
+	ProcessCompletedNotifies();
+	pgstat_report_activity(STATE_IDLE, sql);
+	pgstat_report_stat(true);
+
+	/* Signal that we are done. */
+	ReadyForQuery(DestRemote);
+}
+
+/*
+ * Execute given SQL string.
+ *
+ * Using SPI here would preclude backgrounding commands like VACUUM, which
+ * one might very well wish to run this way.  So we do this instead.
+ */
+static void
+execute_sql_string(const char *sql)
+{
+	List	   *raw_parsetree_list;
+	ListCell   *lc1;
+	bool		isTopLevel;
+	int			commands_remaining;
+	MemoryContext	parsecontext;
+	MemoryContext	oldcontext;
+
+	/*
+	 * Parse the SQL string into a list of raw parse trees.
+	 *
+	 * Because we allow statements that perform internal transaction control,
+	 * we can't do this in TopTransactionContext; the parse trees might get
+	 * blown away before we're done executing them.
+	 */
+	parsecontext = AllocSetContextCreate(TopMemoryContext,
+										 "pg_background parse/plan",
+										 ALLOCSET_DEFAULT_MINSIZE,
+										 ALLOCSET_DEFAULT_INITSIZE,
+										 ALLOCSET_DEFAULT_MAXSIZE);
+	oldcontext = MemoryContextSwitchTo(parsecontext);
+	raw_parsetree_list = pg_parse_query(sql);
+	commands_remaining = list_length(raw_parsetree_list);
+	isTopLevel = commands_remaining == 1;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * Do parse analysis, rule rewrite, planning, and execution for each raw
+	 * parsetree.  We must fully execute each query before beginning parse
+	 * analysis on the next one, since there may be interdependencies.
+	 */
+	foreach(lc1, raw_parsetree_list)
+	{
+		Node	   *parsetree = (Node *) lfirst(lc1);
+		const char *commandTag;
+		char		completionTag[COMPLETION_TAG_BUFSIZE];
+		List	   *querytree_list,
+				   *plantree_list;
+		bool		snapshot_set = false;
+		Portal		portal;
+		DestReceiver *receiver;
+		int16		format = 1;
+
+		/*
+		 * We don't allow transaction-control commands like COMMIT and ABORT
+		 * here.  The entire SQL statement is executed as a single transaction
+		 * which commits if no errors are encountered.
+		 */
+		if (IsA(parsetree, TransactionStmt))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("transaction control statements are not allowed in pg_background")));
+
+		/*
+		 * Get the command name for use in status display (it also becomes the
+		 * default completion tag, down inside PortalRun).  Set ps_status and
+		 * do any special start-of-SQL-command processing needed by the
+		 * destination.
+		 */
+		commandTag = CreateCommandTag(parsetree);
+		set_ps_display(commandTag, false);
+		BeginCommand(commandTag, DestNone);
+
+		/* Set up a snapshot if parse analysis/planning will need one. */
+		if (analyze_requires_snapshot(parsetree))
+		{
+			PushActiveSnapshot(GetTransactionSnapshot());
+			snapshot_set = true;
+		}
+
+		/*
+		 * OK to analyze, rewrite, and plan this query.
+		 *
+		 * As with parsing, we need to make sure this data outlives the
+		 * transaction, because of the possibility that the statement might
+		 * perform internal transaction control.
+		 */
+		oldcontext = MemoryContextSwitchTo(parsecontext);
+		querytree_list = pg_analyze_and_rewrite(parsetree, sql, NULL, 0);
+		plantree_list = pg_plan_queries(querytree_list, 0, NULL);
+
+		/* Done with the snapshot used for parsing/planning */
+		if (snapshot_set)
+			PopActiveSnapshot();
+
+		/* If we got a cancel signal in analysis or planning, quit */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Execute the query using the unnamed portal.
+		 */
+		portal = CreatePortal("", true, true);
+		/* Don't display the portal in pg_cursors */
+		portal->visible = false;
+		PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
+		PortalStart(portal, NULL, 0, InvalidSnapshot);
+		PortalSetResultFormat(portal, 1, &format);		/* binary format */
+
+		/*
+		 * Tuples returned by any command other than the last are simply
+		 * discarded; but those returned by the last (or only) command are
+		 * redirected to the shared memory queue we're using for communication
+		 * with the launching backend. If the launching backend is gone or has
+		 * detached us, these messages will just get dropped on the floor.
+		 */
+		--commands_remaining;
+		if (commands_remaining > 0)
+			receiver = CreateDestReceiver(DestNone);
+		else
+		{
+			receiver = CreateDestReceiver(DestRemote);
+			SetRemoteDestReceiverParams(receiver, portal);
+		}
+
+		/*
+		 * Only once the portal and destreceiver have been established can
+		 * we return to the transaction context.  All that stuff needs to
+		 * survive an internal commit inside PortalRun!
+		 */
+		MemoryContextSwitchTo(oldcontext);
+
+		/* Here's where we actually execute the command. */
+		(void) PortalRun(portal, FETCH_ALL, isTopLevel, receiver, receiver,
+						 completionTag);
+
+		/* Clean up the receiver. */
+		(*receiver->rDestroy) (receiver);
+
+		/*
+		 * Send a CommandComplete message even if we suppressed the query
+		 * results.  The user backend will report these in the absence of
+		 * any true query results.
+		 */
+		EndCommand(completionTag, DestRemote);
+
+		/* Clean up the portal. */
+		PortalDrop(portal, false);
+	}
+
+	/* Be sure to advance the command counter after the last script command */
+	CommandCounterIncrement();
+}
+
+/*
+ * When we receive a SIGTERM, we set InterruptPending and ProcDiePending just
+ * like a normal backend.  The next CHECK_FOR_INTERRUPTS() will do the right
+ * thing.
+ */
+static void
+handle_sigterm(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	if (!proc_exit_inprogress)
+	{
+		InterruptPending = true;
+		ProcDiePending = true;
+	}
+
+	errno = save_errno;
+}
diff --git a/contrib/pg_background/pg_background.control b/contrib/pg_background/pg_background.control
new file mode 100644
index 0000000..733d0e1
--- /dev/null
+++ b/contrib/pg_background/pg_background.control
@@ -0,0 +1,4 @@
+comment = 'Run SQL queries in the background'
+default_version = '1.0'
+module_pathname = '$libdir/pg_background'
+relocatable = true
-- 
1.7.9.6 (Apple Git-31.1)

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#1)
Re: pg_background (and more parallelism infrastructure patches)

On Fri, Jul 25, 2014 at 02:11:32PM -0400, Robert Haas wrote:

+	pq_mq_busy = true;
+
+	iov[0].data = &msgtype;
+	iov[0].len = 1;
+	iov[1].data = s;
+	iov[1].len = len;
+
+	Assert(pq_mq_handle != NULL);
+	result = shm_mq_sendv(pq_mq_handle, iov, 2, false);
+
+	pq_mq_busy = false;

Don't you need a PG_TRY block here to reset pq_mq_busy?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#1)
Re: pg_background (and more parallelism infrastructure patches)

Hi,

On 2014-07-25 14:11:32 -0400, Robert Haas wrote:

Attached is a contrib module that lets you launch arbitrary command in
a background worker, and supporting infrastructure patches for core.

Cool.

I assume this 'fell out' of the work towards parallelism? Do you think
all of the patches (except the contrib one) are required for that or is
some, e.g. 3), only required to demonstrate the others?

Patch 3 adds the ability for a backend to request that the protocol
messages it would normally send to the frontend get redirected to a
shm_mq. I did this by adding a couple of hook functions. The best
design is definitely arguable here, so if you'd like to bikeshed, this
is probably the patch to look at.

Uh. This doesn't sound particularly nice. Shouldn't this rather be
clearly layered by making reading/writing from the client a proper API
instead of adding hook functions here and there?

Also, you seem to have only touched receiving from the client, and not
sending back to the subprocess. Is that actually sufficient? I'd expect
that for this facility to be fully useful it'd have to be two way
communication. But perhaps I'm overestimating what it could be used for.

This patch also adds a function to
help you parse an ErrorResponse or NoticeResponse and re-throw the
error or notice in the originating backend. Obviously, parallelism is
going to need this kind of functionality, but I suspect a variety of
other applications people may develop using background workers may
want it too; and it's certainly important for pg_background itself.

I would have had use for it previously.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#4Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#2)
Re: pg_background (and more parallelism infrastructure patches)

On Fri, Jul 25, 2014 at 4:16 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

On Fri, Jul 25, 2014 at 02:11:32PM -0400, Robert Haas wrote:

+     pq_mq_busy = true;
+
+     iov[0].data = &msgtype;
+     iov[0].len = 1;
+     iov[1].data = s;
+     iov[1].len = len;
+
+     Assert(pq_mq_handle != NULL);
+     result = shm_mq_sendv(pq_mq_handle, iov, 2, false);
+
+     pq_mq_busy = false;

Don't you need a PG_TRY block here to reset pq_mq_busy?

No. If shm_mq_sendv is interrupted, we can't use the shm_mq any more.
But since that should only happen if an interrupt arrives while the
queue is full, I think that's OK. (Think about the alternatives: if
the queue is full, we have no way of notifying the launching process
without waiting for it to retrieve the results, but it might not do
that right away, and if we've been killed we need to die *now* not
later.)
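
The invariant being argued here can be modeled in a few lines of standalone C: once a send fails partway through a message, the queue is treated as permanently detached, so there is nothing a PG_TRY cleanup of the busy flag could usefully restore. All names below (ModelQueue, model_send) are invented for illustration; this is not the actual shm_mq API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { MQ_OK, MQ_DETACHED } ModelQueueState;

typedef struct ModelQueue
{
	ModelQueueState state;
	bool		busy;			/* plays the role of pq_mq_busy */
	char		buf[64];
	size_t		used;
} ModelQueue;

/*
 * Send len bytes.  If fail_midway is set, simulate an interrupt arriving
 * while the queue is full: the half-written message poisons the queue, and
 * every later send fails immediately, busy flag or no busy flag.
 */
static bool
model_send(ModelQueue *q, const char *data, size_t len, bool fail_midway)
{
	if (q->state == MQ_DETACHED)
		return false;			/* queue already unusable */
	q->busy = true;
	if (fail_midway)
	{
		/* The "error" escapes here; busy stays set, but the queue is dead. */
		q->state = MQ_DETACHED;
		return false;
	}
	memcpy(q->buf + q->used, data, len);
	q->used += len;
	q->busy = false;
	return true;
}
```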

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#5Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#3)
Re: pg_background (and more parallelism infrastructure patches)

On Sat, Jul 26, 2014 at 4:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-07-25 14:11:32 -0400, Robert Haas wrote:

Attached is a contrib module that lets you launch arbitrary command in
a background worker, and supporting infrastructure patches for core.

Cool.

I assume this 'fell out' of the work towards parallelism? Do you think
all of the patches (except the contrib one) are required for that or is
some, e.g. 3), only required to demonstrate the others?

I'm fairly sure that patches 3, 4, and 5 are all required in some form
as building blocks for parallelism. Patch 1 contains two functions,
one of which (shm_mq_set_handle) I think is generally useful for
people using background workers, but not absolutely required; and one
of which is infrastructure for patch 3 which might not be necessary
with different design choices (shm_mq_sendv). Patch 2 is only
included because pg_background can benefit from it; we could instead
use an eoxact callback, at the expense of doing cleanup at
end-of-transaction rather than end-of-query. But it's a mighty small
patch and seems like a reasonable extension to the API, so I lean
toward including it.

Patch 3 adds the ability for a backend to request that the protocol
messages it would normally send to the frontend get redirected to a
shm_mq. I did this by adding a couple of hook functions. The best
design is definitely arguable here, so if you'd like to bikeshed, this
is probably the patch to look at.

Uh. This doesn't sound particularly nice. Shouldn't this rather be
clearly layered by making reading/writing from the client a proper API
instead of adding hook functions here and there?

I don't know exactly what you have in mind here. There is an API for
writing to the client that is used throughout the backend, but right
now "the client" always has to be a socket. Hooking a couple of parts
of that API lets us write someplace else instead. If you've got
another idea how to do this, suggest away...

Also, you seem to have only touched receiving from the client, and not
sending back to the subprocess. Is that actually sufficient? I'd expect
that for this facility to be fully useful it'd have to be two way
communication. But perhaps I'm overestimating what it could be used for.

Well, the basic shm_mq infrastructure can be used to send any kind of
messages you want between any pair of processes that care to establish
them. But in general I expect that data is going to flow mostly in
one direction - the user backend will launch workers and give them an
initial set of instructions, and then results will stream back from
the workers to the user backend. Other messaging topologies are
certainly possible, and probably useful for something, but I don't
really know exactly what those things will be yet, and I'm not sure
the FEBE protocol will be the right tool for the job anyway. But
error propagation, which is the main thrust of this, seems like a need
that will likely be pretty well ubiquitous.
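
As a toy illustration of that round trip — the worker flattens an error into a protocol-style message, and the launching backend parses it and re-reports it locally — here is a standalone sketch. The field layout (one severity byte followed by NUL-terminated text) is invented for the example and is not the actual FEBE ErrorResponse encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct ModelError
{
	char		severity;		/* 'E' = error, 'N' = notice */
	char		message[128];
} ModelError;

/* Worker side: flatten the error into msg; returns bytes written, 0 if short. */
static size_t
pack_error(const ModelError *err, char *msg, size_t msglen)
{
	size_t		need = 1 + strlen(err->message) + 1;

	if (need > msglen)
		return 0;
	msg[0] = err->severity;
	memcpy(msg + 1, err->message, need - 1);	/* text plus its NUL */
	return need;
}

/* Launching backend side: parse the message so it can be re-thrown locally. */
static bool
unpack_error(const char *msg, size_t len, ModelError *out)
{
	if (len < 2 || msg[len - 1] != '\0')
		return false;			/* malformed message */
	out->severity = msg[0];
	strncpy(out->message, msg + 1, sizeof(out->message) - 1);
	out->message[sizeof(out->message) - 1] = '\0';
	return true;
}
```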

This patch also adds a function to
help you parse an ErrorResponse or NoticeResponse and re-throw the
error or notice in the originating backend. Obviously, parallelism is
going to need this kind of functionality, but I suspect a variety of
other applications people may develop using background workers may
want it too; and it's certainly important for pg_background itself.

I would have had use for it previously.

Cool. I know Petr was interested as well (possibly for the same project?).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#6Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#5)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-07-26 12:20:34 -0400, Robert Haas wrote:

On Sat, Jul 26, 2014 at 4:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-07-25 14:11:32 -0400, Robert Haas wrote:

Attached is a contrib module that lets you launch arbitrary command in
a background worker, and supporting infrastructure patches for core.

Cool.

I assume this 'fell out' of the work towards parallelism? Do you think
all of the patches (except the contrib one) are required for that or is
some, e.g. 3), only required to demonstrate the others?

I'm fairly sure that patches 3, 4, and 5 are all required in some form
as building blocks for parallelism. Patch 1 contains two functions,
one of which (shm_mq_set_handle) I think is generally useful for
people using background workers, but not absolutely required; and one
of which is infrastructure for patch 3 which might not be necessary
with different design choices (shm_mq_sendv). Patch 2 is only
included because pg_background can benefit from it; we could instead
use an eoxact callback, at the expense of doing cleanup at
end-of-transaction rather than end-of-query. But it's a mighty small
patch and seems like a reasonable extension to the API, so I lean
toward including it.

Don't get me wrong, I don't object to anything in here. It's just that
the bigger picture can help giving sensible feedback.

Patch 3 adds the ability for a backend to request that the protocol
messages it would normally send to the frontend get redirected to a
shm_mq. I did this by adding a couple of hook functions. The best
design is definitely arguable here, so if you'd like to bikeshed, this
is probably the patch to look at.

Uh. This doesn't sound particularly nice. Shouldn't this rather be
clearly layered by making reading/writing from the client a proper API
instead of adding hook functions here and there?

I don't know exactly what you have in mind here. There is an API for
writing to the client that is used throughout the backend, but right
now "the client" always has to be a socket. Hooking a couple of parts
of that API lets us write someplace else instead. If you've got
another idea how to do this, suggest away...

What I'm thinking of is providing an actual API for the writes instead
of hooking into the socket API in a couple places. I.e. have something
like

typedef struct DestIO DestIO;

struct DestIO
{
void (*flush)(struct DestIO *io);
int (*putbytes)(struct DestIO *io, const char *s, size_t len);
int (*getbytes)(struct DestIO *io, char *s, size_t len);
...
}

and do everything through it. I haven't thought much about the specific
API we want, but abstracting the communication properly instead of
adding hooks here and there is imo much more likely to succeed in the
long term.
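
For what it's worth, that idea can be fleshed out in standalone C with one in-memory backend standing in for "write to a shm_mq". The vtable names follow the sketch above; the MemIO backend and its init function are invented for the example, and getbytes takes a non-const buffer so it can actually be written into.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct DestIO DestIO;

struct DestIO
{
	void		(*flush) (DestIO *io);
	int			(*putbytes) (DestIO *io, const char *s, size_t len);
	int			(*getbytes) (DestIO *io, char *s, size_t len);
};

/* In-memory backend: "put" appends to a buffer, "get" reads it back. */
typedef struct MemIO
{
	DestIO		base;			/* must be first, so the casts below work */
	char		buf[256];
	size_t		wpos;
	size_t		rpos;
} MemIO;

static void
mem_flush(DestIO *io)
{
	(void) io;					/* nothing buffered separately */
}

static int
mem_putbytes(DestIO *io, const char *s, size_t len)
{
	MemIO	   *m = (MemIO *) io;

	if (m->wpos + len > sizeof(m->buf))
		return -1;				/* "queue" full */
	memcpy(m->buf + m->wpos, s, len);
	m->wpos += len;
	return 0;
}

static int
mem_getbytes(DestIO *io, char *s, size_t len)
{
	MemIO	   *m = (MemIO *) io;

	if (m->rpos + len > m->wpos)
		return -1;				/* not enough data available */
	memcpy(s, m->buf + m->rpos, len);
	m->rpos += len;
	return 0;
}

static void
mem_io_init(MemIO *m)
{
	m->base.flush = mem_flush;
	m->base.putbytes = mem_putbytes;
	m->base.getbytes = mem_getbytes;
	m->wpos = m->rpos = 0;
}
```

Callers then go through the DestIO pointer only, so a socket backend and a shm_mq backend would be interchangeable.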

Also, you seem to have only touched receiving from the client, and not
sending back to the subprocess. Is that actually sufficient? I'd expect
that for this facility to be fully useful it'd have to be two way
communication. But perhaps I'm overestimating what it could be used for.

Well, the basic shm_mq infrastructure can be used to send any kind of
messages you want between any pair of processes that care to establish
them. But in general I expect that data is going to flow mostly in
one direction - the user backend will launch workers and give them an
initial set of instructions, and then results will stream back from
the workers to the user backend. Other messaging topologies are
certainly possible, and probably useful for something, but I don't
really know exactly what those things will be yet, and I'm not sure
the FEBE protocol will be the right tool for the job anyway.

It's imo not particularly unreasonable to e.g. COPY to/from a bgworker. Which
would require the ability to both read/write from the other side.

But
error propagation, which is the main thrust of this, seems like a need
that will likely be pretty well ubiquitous.

Agreed.

This patch also adds a function to
help you parse an ErrorResponse or NoticeResponse and re-throw the
error or notice in the originating backend. Obviously, parallelism is
going to need this kind of functionality, but I suspect a variety of
other applications people may develop using background workers may
want it too; and it's certainly important for pg_background itself.

I would have had use for it previously.

Cool. I know Petr was interested as well (possibly for the same project?).

Well, I was aware of Petr's project, but I also have my own pet project
I'd been playing with :). Nothing real.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#7Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#6)
Re: pg_background (and more parallelism infrastructure patches)

On Mon, Jul 28, 2014 at 1:50 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Don't get me wrong, I don't object to anything in here. It's just that
the bigger picture can help giving sensible feedback.

Right. I did not get you wrong. :-)

The reason I'm making a point of it is that, if somebody wants to
object to the way those facilities are designed, it'd be good to get
that out of the way now rather than waiting until 2 or 3 patch sets
from now and then saying "Uh, could you guys go back and rework all
that stuff?". I'm not going to complain too loudly now if somebody
wants something in there done in a different way, but it's easier to
do that now while there's only pg_background sitting on top of it.

What I'm thinking of is providing an actual API for the writes instead
of hooking into the socket API in a couple places. I.e. have something
like

typedef struct DestIO DestIO;

struct DestIO
{
void (*flush)(struct DestIO *io);
int (*putbytes)(struct DestIO *io, const char *s, size_t len);
int (*getbytes)(struct DestIO *io, char *s, size_t len);
...
}

and do everything through it. I haven't thought much about the specific
API we want, but abstracting the communication properly instead of
adding hooks here and there is imo much more likely to succeed in the
long term.

This sounds suspiciously like the DestReceiver thing we've already
got, except that the DestReceiver only applies to tuple results, not
errors and notices and so on. I'm not totally unamenable to a bigger
refactoring here, but right now it looks to me like a solution in
search of a problem. The hooks are simple and seem to work fine; I
don't want to add notation for its own sake.

Also, you seem to have only touched receiving from the client, and not
sending back to the subprocess. Is that actually sufficient? I'd expect
that for this facility to be fully useful it'd have to be two way
communication. But perhaps I'm overestimating what it could be used for.

Well, the basic shm_mq infrastructure can be used to send any kind of
messages you want between any pair of processes that care to establish
them. But in general I expect that data is going to flow mostly in
one direction - the user backend will launch workers and give them an
initial set of instructions, and then results will stream back from
the workers to the user backend. Other messaging topologies are
certainly possible, and probably useful for something, but I don't
really know exactly what those things will be yet, and I'm not sure
the FEBE protocol will be the right tool for the job anyway.

It's imo not particularly unreasonable to e.g. COPY to/from a bgworker. Which
would require the ability to both read/write from the other side.

Well, that should work fine if the background worker and user backend
generate the CopyData messages via some bespoke code rather than
expecting to be able to jump into copy.c and have everything work. If
you want that to work, why? It doesn't make much sense for
pg_background, because I don't think it would be sensible for SELECT
pg_background_result(...) to return CopyInResponse or CopyOutResponse,
and even if it were sensible it doesn't seem useful. And I can't
think of any other application off-hand, either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#4)
Re: pg_background (and more parallelism infrastructure patches)

On Sat, Jul 26, 2014 at 9:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jul 25, 2014 at 4:16 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

On Fri, Jul 25, 2014 at 02:11:32PM -0400, Robert Haas wrote:

+     pq_mq_busy = true;
+
+     iov[0].data = &msgtype;
+     iov[0].len = 1;
+     iov[1].data = s;
+     iov[1].len = len;
+
+     Assert(pq_mq_handle != NULL);
+     result = shm_mq_sendv(pq_mq_handle, iov, 2, false);
+
+     pq_mq_busy = false;

Don't you need a PG_TRY block here to reset pq_mq_busy?

No. If shm_mq_sendv is interrupted, we can't use the shm_mq any more.
But since that should only happen if an interrupt arrives while the
queue is full, I think that's OK.

I think it's not only an interrupt here; any other error in the
shm_mq_sendv() path (one example is WaitLatch())
could lead to the same behaviour.

(Think about the alternatives: if
the queue is full, we have no way of notifying the launching process
without waiting for it to retrieve the results, but it might not do
that right away, and if we've been killed we need to die *now* not
later.)

So in such cases, what is the advice to users? Currently they will
see the below message:
postgres=# select * from pg_background_result(5124) as (x int);
ERROR: lost connection to worker process with PID 5124

One way is to ask them to check the logs, but what if they want
to handle the error and take some action?

Another point about error handling: to execute the SQL in
pg_background_worker_main(), it starts a transaction
which I think doesn't get aborted if an error occurs; similarly, handling
for the timeout seems to be missing in the error path.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#8)
Re: pg_background (and more parallelism infrastructure patches)

On Mon, Sep 8, 2014 at 10:39 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Sat, Jul 26, 2014 at 9:32 PM, Robert Haas <robertmhaas@gmail.com>

wrote:

On Fri, Jul 25, 2014 at 4:16 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

On Fri, Jul 25, 2014 at 02:11:32PM -0400, Robert Haas wrote:

+     pq_mq_busy = true;
+
+     iov[0].data = &msgtype;
+     iov[0].len = 1;
+     iov[1].data = s;
+     iov[1].len = len;
+
+     Assert(pq_mq_handle != NULL);
+     result = shm_mq_sendv(pq_mq_handle, iov, 2, false);
+
+     pq_mq_busy = false;

Don't you need a PG_TRY block here to reset pq_mq_busy?

No. If shm_mq_sendv is interrupted, we can't use the shm_mq any more.
But since that should only happen if an interrupt arrives while the
queue is full, I think that's OK.

I think it's not only an interrupt here; any other error in the
shm_mq_sendv() path (one example is WaitLatch())
could lead to the same behaviour.

(Think about the alternatives: if
the queue is full, we have no way of notifying the launching process
without waiting for it to retrieve the results, but it might not do
that right away, and if we've been killed we need to die *now* not
later.)

So in such cases, what is the advice to users? Currently they will
see the below message:
postgres=# select * from pg_background_result(5124) as (x int);
ERROR: lost connection to worker process with PID 5124

One way is to ask them to check the logs, but what if they want
to handle the error and take some action?

Another point about error handling: to execute the SQL in
pg_background_worker_main(), it starts a transaction
which I think doesn't get aborted if an error occurs

For this I was just referring to the error handling code of
StartBackgroundWorker(); however, during shutdown of the process, it
will call AbortOutOfAnyTransaction(), which will take care of aborting
the transaction.

and similarly, handling
for the timeout seems to be missing in the error path.

As we are going to exit on error anyway, I am not sure if this will be
required; however, having it for a clean exit seems better.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#10Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#8)
Re: pg_background (and more parallelism infrastructure patches)

On Mon, Sep 8, 2014 at 1:09 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Don't you need a PG_TRY block here to reset pq_mq_busy?

No. If shm_mq_sendv is interrupted, we can't use the shm_mq any more.
But since that should only happen if an interrupt arrives while the
queue is full, I think that's OK.

I think not only an interrupt, but any other error in the
shm_mq_sendv() path (one example is WaitLatch()) could lead to the
same behaviour.

Probably true. But I think we generally count on that to be no-fail,
or close to it, so I'm not really worried about it.

(Think about the alternatives: if
the queue is full, we have no way of notifying the launching process
without waiting for it to retrieve the results, but it might not do
that right away, and if we've been killed we need to die *now* not
later.)

So in such cases, what is the advice to users? Currently they will
see the below message:
postgres=# select * from pg_background_result(5124) as (x int);
ERROR: lost connection to worker process with PID 5124

One way is to ask them to check the logs, but what if they want
to handle the error and take some action?

They have to check the logs. To reiterate what I said before, there
is no reasonable way to both have the background worker terminate
quickly and also guarantee that the full error message is received by
the process that started it. You have to pick one, and I stick by the
one I picked.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#9)
Re: pg_background (and more parallelism infrastructure patches)

On Mon, Sep 8, 2014 at 2:05 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Another point about error handling is that to execute the SQL in
the function pg_background_worker_main(), it starts a transaction,
which I think doesn't get aborted if an error occurs

For this I was just referring to the error-handling code of
StartBackgroundWorker(); however, during shutdown of the process, it
will call AbortOutOfAnyTransaction(), which will take care of aborting
the transaction.

Yeah.

and similarly, handling for the timeout seems to be missing in the
error path.

As we are going to exit on error anyway, I'm not sure if this will be
required; however, having it for a clean exit seems better.

Can you be more specific?

I'm generally of the view that there's little point in spending time
cleaning things up that will go away anyway when the process exits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#11)
Re: pg_background (and more parallelism infrastructure patches)

On Tue, Sep 9, 2014 at 2:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Sep 8, 2014 at 2:05 AM, Amit Kapila <amit.kapila16@gmail.com>

wrote:

Another point about error handling is that to execute the SQL in
the function pg_background_worker_main(), it starts a transaction,
which I think doesn't get aborted if an error occurs

For this I was just referring to the error-handling code of
StartBackgroundWorker(); however, during shutdown of the process, it
will call AbortOutOfAnyTransaction(), which will take care of aborting
the transaction.

Yeah.

and similarly, handling for the timeout seems to be missing in the
error path.

As we are going to exit on error anyway, I'm not sure if this will be
required; however, having it for a clean exit seems better.

Can you be more specific?

I was thinking to handle errors similar to what PostgresMain does;
however, if it is going to exit anyway, then it might not be worth it.

I'm generally of the view that there's little point in spending time
cleaning things up that will go away anyway when the process exits.

Yeah, in that case it might not make much sense. The argument here
could be: why does it have to exit, rather than waiting till the launcher
asks it to exit? You have mentioned in a previous mail that you want to
stick to the approach taken by the patch, however it is not clear why you
think that is better. If I try to think about how the worker backend
should behave in case the master backend assigns some task to the worker,
then I think it would be sensible for the worker not to exit after
completing its task (whether it has finished the execution of the work
assigned or otherwise), as the master backend can assign further tasks to
the same worker. Another advantage would be that the setup and teardown
cost of the worker will be saved. Now, if we just think w.r.t.
pg_background it might not be so important to let the worker wait till the
launcher asks it to quit; however, as you have mentioned in your mail
upthread that you are expecting this kind of set-up for actual
parallelism, it might not be completely insensible to expect the worker
to wait till it has been asked to quit.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#13Petr Jelinek
petr@2ndquadrant.com
In reply to: Robert Haas (#7)
Re: pg_background (and more parallelism infrastructure patches)

On 29/07/14 18:51, Robert Haas wrote:

On Mon, Jul 28, 2014 at 1:50 PM, Andres Freund <andres@2ndquadrant.com> wrote:

What I'm thinking of is providing an actual API for the writes instead
of hooking into the socket API in a couple places. I.e. have something
like

typedef struct DestIO DestIO;

struct DestIO
{
void (*flush)(struct DestIO *io);
int (*putbytes)(struct DestIO *io, const char *s, size_t len);
int (*getbytes)(struct DestIO *io, const char *s, size_t len);
...
}

and do everything through it. I haven't thought much about the specific
API we want, but abstracting the communication properly instead of
adding hooks here and there is imo much more likely to succeed in the
long term.

This sounds suspiciously like the DestReceiver thing we've already
got, except that the DestReceiver only applies to tuple results, not
errors and notices and so on. I'm not totally unamenable to a bigger
refactoring here, but right now it looks to me like a solution in
search of a problem. The hooks are simple and seem to work fine; I
don't want to add notation for its own sake.

I am not sure if what Andres proposed is the right solution, but I do
agree here that the hook definitely isn't.

I am also not sure that I like pq_redirect_to_shm_mq being directly
exposed for use in a bgworker; what I would like is to have an elog
interface to which you tell that you want errors sent to a shm_mq handle,
and which would then do the necessary calls (it's basically the same
principle, but for the sake of cleanliness/correctness I think it should
be elog and not pq from the point of view of the bgworker).

So the interface needs work.

I do agree that we don't need two-way communication over this channel; I
think the dsm_mq can be used directly quite well for the generic use case.

Otherwise I like the patch, and I am glad you also made the GUC
save/restore infrastructure.

Please don't forget to add this to next commitfest.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#14Robert Haas
robertmhaas@gmail.com
In reply to: Petr Jelinek (#13)
Re: pg_background (and more parallelism infrastructure patches)

On Tue, Sep 9, 2014 at 12:33 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

I am not sure if what Andres proposed is the right solution, but I do agree
here that the hook definitely isn't.

Well, that doesn't get me very far. Andres never responded to my
previous comments about why I did it that way, and you're not
proposing a clear alternative that I can either implement or comment
on.

I am also not sure that I like pq_redirect_to_shm_mq being directly
exposed for use in a bgworker; what I would like is to have an elog
interface to which you tell that you want errors sent to a shm_mq handle,
and which would then do the necessary calls (it's basically the same
principle, but for the sake of cleanliness/correctness I think it should
be elog and not pq from the point of view of the bgworker).

I think that's completely wrong. As the patch series demonstrates,
it's not limited to propagating ErrorResponse and NoticeResponse. It
can also propagate NotifyResponse and RowDescription and DataRow and
anything else that comes along. We are not just propagating errors;
we are propagating all protocol messages of whatever type. So tying
it to elog specifically is not right.

So the interface needs work.

I do agree that we don't need two-way communication over this channel; I
think the dsm_mq can be used directly quite well for the generic use case.

I'm glad you agree, but that leaves me baffled as to what's wrong with
the hook approach. There's no crying need for extensibility in this
code; it's barely changed in a very long time, and I know of no other
need which might soon require changing it again. The hook is very
lightweight and shouldn't materially affect performance when it's not
used, as a more complex approach might.

Otherwise I like the patch, and I am glad you also made the GUC save/restore
infrastructure.

Cool.

Please don't forget to add this to next commitfest.

OK, done. But it would be awfully nice if we could actually make some
progress on hammering out a design everyone can live with here.
Letting it sit for another 5 weeks is not going to help anything, and
it will hold up getting any more stuff after this done in time for
9.5.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#15Petr Jelinek
petr@2ndquadrant.com
In reply to: Robert Haas (#14)
Re: pg_background (and more parallelism infrastructure patches)

On 09/09/14 18:49, Robert Haas wrote:

On Tue, Sep 9, 2014 at 12:33 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

I am also not sure that I like pq_redirect_to_shm_mq being directly
exposed for use in a bgworker; what I would like is to have an elog
interface to which you tell that you want errors sent to a shm_mq handle,
and which would then do the necessary calls (it's basically the same
principle, but for the sake of cleanliness/correctness I think it should
be elog and not pq from the point of view of the bgworker).

I think that's completely wrong. As the patch series demonstrates,
it's not limited to propagating ErrorResponse and NoticeResponse. It
can also propagate NotifyResponse and RowDescription and DataRow and
anything else that comes along. We are not just propagating errors;
we are propagating all protocol messages of whatever type. So tying
it to elog specifically is not right.

Oh, in that case I think what Andres proposed is actually quite good. I
know the hook works fine; it just seems like a somewhat hackish
solution to save 20 lines of code.
The reason why I am not proposing anything better is that my solution
when I did prototyping in this area was to just check whether my pq_dsm_mq
handle is set or not and call the appropriate function from the wrapper
based on that, which does not seem like a big improvement over the hook
approach... Anything better/more flexible that I could come up with is
basically the same idea as what Andres already wrote.

Please don't forget to add this to next commitfest.

OK, done. But it would be awfully nice if we could actually make some
progress on hammering out a design everyone can live with here.
Letting it sit for another 5 weeks is not going to help anything, and
it will hold up getting any more stuff after this done in time for
9.5.

Yeah, I said that just as a formality; I wrote the response now and not
in 5 weeks exactly for this reason. I am all for discussing this now, and
I think it's important to hammer this infrastructure out sooner rather
than later.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#16Robert Haas
robertmhaas@gmail.com
In reply to: Petr Jelinek (#15)
Re: pg_background (and more parallelism infrastructure patches)

On Tue, Sep 9, 2014 at 1:18 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

I think that's completely wrong. As the patch series demonstrates,
it's not limited to propagating ErrorResponse and NoticeResponse. It
can also propagate NotifyResponse and RowDescription and DataRow and
anything else that comes along. We are not just propagating errors;
we are propagating all protocol messages of whatever type. So tying
it to elog specifically is not right.

Oh, in that case I think what Andres proposed is actually quite good. I
know the hook works fine; it just seems like a somewhat hackish solution
to save 20 lines of code.

If it's 20 lines of code, I'm probably fine to go that way. Let's see
if we can figure out what those 20 lines look like.

libpq.h exports 29 functions that do a variety of different things.
Of those, 20 are in pqcomm.c and the others are in be-secure.c. I
presume that pluggability for the latter group, if needed at all, is a
separate project. The remaining ones break down like this:

- StreamServerPort, StreamConnection, StreamClose, and
TouchSocketFiles are intended to be called only from the postmaster,
to set up and tear down the listening socket and individual
connections. Presumably this is not what we care about here.
- pq_getbytes(), pq_getstring(), pq_getmessage(), pq_getbyte(),
pq_peekbyte(), and pq_getbyte_if_available() handle *reading* data
from the socket. Since you previously agreed that we didn't need to
build two-way communication on top of this, I would think that would
mean that these don't need to be pluggable either. But maybe I'm
wrong.
- The remaining functions are pq_init(), pq_comm_reset(), pq_flush(),
pq_flush_if_writable(), pq_is_send_pending(), pq_putmessage(),
pq_putmessage_noblock(), pq_startcopyout(), and pq_endcopyout().
These are the ones that I think are potentially interesting.

I didn't choose to provide hooks for all of these in the submitted
patch because they're not all needed for what I want to do here:
pq_startcopyout() and pq_endcopyout() are only needed for V2 protocol
support, which did not interest me (nor did COPY, really);
pq_putmessage_noblock(), pq_flush_if_writable(), and
pq_is_send_pending() are only used for the walsender protocol, which
doesn't seem useful to redirect to a non-socket; and I just didn't
happen to have any use for pq_init() or pq_comm_reset(). Hence what I
ended up with.

But, I could revisit that. Suppose I provide a structure with 10
function pointers for all ten of those functions, or maybe 9 since
pq_init() is called so early that it's not likely we'll have control
to put the hooks in place before that point, and anyway whatever code
installs the hooks can do its own initialization then. Then make a
global variable like pqSendMethods and #define pq_comm_reset() to be
pqSendMethods->comm_reset(), pq_flush() to be pqSendMethods->flush(),
and so on for all 9 or 10 methods. Then the pqmq code could just
change pqSendMethods to point to its own method structure instead of
the default one. Would that address this concern? It's
more than 20 lines of code, but it's not TOO bad.
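The pqSendMethods scheme described above might be sketched like this; the struct fields, stub bodies, and names (PQsendMethods, PqSendMethods, the socket_* functions) are illustrative assumptions, not the patch's actual code:

```c
#include <stddef.h>

/* Hypothetical method table; only a few of the 9-10 methods shown. */
typedef struct PQsendMethods
{
	void	(*comm_reset) (void);
	int		(*flush) (void);
	int		(*putmessage) (char msgtype, const char *s, size_t len);
} PQsendMethods;

/* Default socket-based implementations (stubbed out for the sketch). */
static void socket_comm_reset(void) { }
static int  socket_flush(void) { return 0; }
static int  socket_putmessage(char msgtype, const char *s, size_t len)
{
	return 0;
}

static PQsendMethods PqSendSocketMethods = {
	socket_comm_reset,
	socket_flush,
	socket_putmessage
};

/* Global pointer; the pqmq code would repoint this at its own table. */
PQsendMethods *PqSendMethods = &PqSendSocketMethods;

/* Existing call sites keep compiling unchanged via these macros. */
#define pq_comm_reset()           PqSendMethods->comm_reset()
#define pq_flush()                PqSendMethods->flush()
#define pq_putmessage(t, s, len)  PqSendMethods->putmessage(t, s, len)
```

With this shape, switching a backend to shm_mq output is a single pointer assignment, and the per-call cost is one indirect jump instead of a null-test plus branch.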

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#17Petr Jelinek
petr@2ndquadrant.com
In reply to: Robert Haas (#16)
Re: pg_background (and more parallelism infrastructure patches)

On 09/09/14 22:48, Robert Haas wrote:

On Tue, Sep 9, 2014 at 1:18 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

I think that's completely wrong. As the patch series demonstrates,
it's not limited to propagating ErrorResponse and NoticeResponse. It
can also propagate NotifyResponse and RowDescription and DataRow and
anything else that comes along. We are not just propagating errors;
we are propagating all protocol messages of whatever type. So tying
it to elog specifically is not right.

Oh in that case, I think what Andres proposed is actually quite good. I know
the hook works fine it just seems like using somewhat hackish solution to
save 20 lines of code.

If it's 20 lines of code, I'm probably fine to go that way. Let's see
if we can figure out what those 20 lines look like.

libpq.h exports 29 functions that do a variety of different things.
Of those, 20 are in pqcomm.c and the others are in be-secure.c. I
presume that pluggability for the latter group, if needed at all, is a
separate project. The remaining ones break down like this:

Ugh

- The remaining functions are pq_init(), pq_comm_reset(), pq_flush(),
pq_flush_if_writable(), pq_is_send_pending(), pq_putmessage(),
pq_putmessage_noblock(), pq_startcopyout(), and pq_endcopyout().
These are the ones that I think are potentially interesting.

I didn't choose to provide hooks for all of these in the submitted
patch because they're not all needed for what I want to do here:
pq_startcopyout() and pq_endcopyout() are only needed for V2 protocol
support, which did not interest me (nor did COPY, really);
pq_putmessage_noblock(), pq_flush_if_writable(), and
pq_is_send_pending() are only used for the walsender protocol, which
doesn't seem useful to redirect to a non-socket; and I just didn't
happen to have any use for pq_init() or pq_comm_reset(). Hence what I
ended up with.

But, I could revisit that. Suppose I provide a structure with 10
function pointers for all ten of those functions, or maybe 9 since
pq_init() is called so early that it's not likely we'll have control
to put the hooks in place before that point, and anyway whatever code
installs the hooks can do its own initialization then. Then make a
global variable like pqSendMethods and #define pq_comm_reset() to be
pqSendMethods->comm_reset(), pq_flush() to be pqSendMethods->flush(),
and so on for all 9 or 10 methods. Then the pqmq code could just
change pqSendMethods to point to its own method structure instead of
the default one. Would that address this concern? It's
more than 20 lines of code, but it's not TOO bad.

Yes, although my issue with the hooks was not that you only provided
them for 2 functions, but the fact that it had no structure and the
implementation was "if hook set do this, else do that", which I don't
see as a good way of doing it.

All I personally want is structure and extensibility, so a struct with
just the needed functions is good enough for me, and I would be OK to
leave the other 8 functions as an option for whoever needs to make them
overridable in the future.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#16)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Sep 10, 2014 at 2:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Sep 9, 2014 at 1:18 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

I think that's completely wrong. As the patch series demonstrates,
it's not limited to propagating ErrorResponse and NoticeResponse. It
can also propagate NotifyResponse and RowDescription and DataRow and
anything else that comes along. We are not just propagating errors;
we are propagating all protocol messages of whatever type. So tying
it to elog specifically is not right.

Oh, in that case I think what Andres proposed is actually quite good. I
know the hook works fine; it just seems like a somewhat hackish solution
to save 20 lines of code.

If it's 20 lines of code, I'm probably fine to go that way. Let's see
if we can figure out what those 20 lines look like.

libpq.h exports 29 functions that do a variety of different things.
Of those, 20 are in pqcomm.c and the others are in be-secure.c. I
presume that pluggability for the latter group, if needed at all, is a
separate project. The remaining ones break down like this:

- StreamServerPort, StreamConnection, StreamClose, and
TouchSocketFiles are intended to be called only from the postmaster,
to set up and tear down the listening socket and individual
connections. Presumably this is not what we care about here.
- pq_getbytes(), pq_getstring(), pq_getmessage(), pq_getbyte(),
pq_peekbyte(), and pq_getbyte_if_available() handle *reading* data
from the socket. Since you previously agreed that we didn't need to
build two-way communication on top of this, I would think that would
mean that these don't need to be pluggable either. But maybe I'm
wrong.
- The remaining functions are pq_init(), pq_comm_reset(), pq_flush(),
pq_flush_if_writable(), pq_is_send_pending(), pq_putmessage(),
pq_putmessage_noblock(), pq_startcopyout(), and pq_endcopyout().
These are the ones that I think are potentially interesting.

I didn't choose to provide hooks for all of these in the submitted
patch because they're not all needed for what I want to do here:
pq_startcopyout() and pq_endcopyout() are only needed for V2 protocol
support, which did not interest me (nor did COPY, really);
pq_putmessage_noblock(), pq_flush_if_writable(), and
pq_is_send_pending() are only used for the walsender protocol, which
doesn't seem useful to redirect to a non-socket; and I just didn't
happen to have any use for pq_init() or pq_comm_reset(). Hence what I
ended up with.

But, I could revisit that. Suppose I provide a structure with 10
function pointers for all ten of those functions, or maybe 9 since
pq_init() is called so early that it's not likely we'll have control
to put the hooks in place before that point, and anyway whatever code
installs the hooks can do its own initialization then.

Can we use pq_init() to install the function pointers?
Let us say that it will take a COMM_METHOD (tcp, shm_mq) as
input and then install the function pointers based on the communication
method. We can call this from the main function of the bgworker (in the
case of this patch, from pg_background_worker_main()) with COMM_METHOD
as shm_mq, and BackendInitialize() will pass it as tcp.

Then make a
global variable like pqSendMethods and #define pq_comm_reset() to be
pqSendMethods->comm_reset(), pq_flush() to be pqSendMethods->flush(),
and so on for all 9 or 10 methods. Then the pqmq code could just
change pqSendMethods to point to its own method structure instead of
the default one. Would that address this concern? It's
more than 20 lines of code, but it's not TOO bad.

This idea seems better than directly using hooks; however, I
don't see any harm in defining pqReceiveMethods for the get APIs as
well, because it can make the whole layer extensible. Having said
that, since there is currently no usage of it, we can leave it
for another patch.
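A minimal sketch of the pq_init()-based dispatch suggested here; the CommMethod enum, the two method tables, and the stub flush implementations are invented purely for illustration:

```c
/* Invented names for illustration of the COMM_METHOD idea. */
typedef enum CommMethod
{
	COMM_TCP,						/* normal client socket */
	COMM_SHM_MQ						/* redirect output to a shm_mq */
} CommMethod;

typedef struct PQsendMethods
{
	int		(*flush) (void);
} PQsendMethods;

static int socket_flush(void) { return 0; }
static int mq_flush(void)     { return 1; }	/* stub: would push to shm_mq */

static PQsendMethods PqSendSocketMethods = { socket_flush };
static PQsendMethods PqSendMqMethods     = { mq_flush };

static PQsendMethods *PqSendMethods = &PqSendSocketMethods;

/* pq_init() selects the method table from the communication method:
 * BackendInitialize() would pass COMM_TCP, and a bgworker main such as
 * pg_background_worker_main() would pass COMM_SHM_MQ. */
void
pq_init(CommMethod method)
{
	PqSendMethods = (method == COMM_SHM_MQ) ? &PqSendMqMethods
											: &PqSendSocketMethods;
}

int
pq_flush(void)
{
	return PqSendMethods->flush();
}
```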

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#19Robert Haas
robertmhaas@gmail.com
In reply to: Petr Jelinek (#17)
Re: pg_background (and more parallelism infrastructure patches)

On Tue, Sep 9, 2014 at 6:03 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

- The remaining functions are pq_init(), pq_comm_reset(), pq_flush(),
pq_flush_if_writable(), pq_is_send_pending(), pq_putmessage(),
pq_putmessage_noblock(), pq_startcopyout(), and pq_endcopyout().
These are the ones that I think are potentially interesting.

I didn't choose to provide hooks for all of these in the submitted
patch because they're not all needed for what I want to do here:
pq_startcopyout() and pq_endcopyout() are only needed for V2 protocol
support, which did not interest me (nor did COPY, really);
pq_putmessage_noblock(), pq_flush_if_writable(), and
pq_is_send_pending() are only used for the walsender protocol, which
doesn't seem useful to redirect to a non-socket; and I just didn't
happen to have any use for pq_init() or pq_comm_reset(). Hence what I
ended up with.

But, I could revisit that. Suppose I provide a structure with 10
function pointers for all ten of those functions, or maybe 9 since
pq_init() is called so early that it's not likely we'll have control
to put the hooks in place before that point, and anyway whatever code
installs the hooks can do its own initialization then. Then make a
global variable like pqSendMethods and #define pq_comm_reset() to be
pqSendMethods->comm_reset(), pq_flush() to be pqSendMethods->flush(),
and so on for all 9 or 10 methods. Then the pqmq code could just
change pqSendMethods to point to its own method structure instead of
the default one. Would that address this concern? It's
more than 20 lines of code, but it's not TOO bad.

Yes, although my issue with the hooks was not that you only provided
them for 2 functions, but the fact that it had no structure and the
implementation was "if hook set do this, else do that", which I don't
see as a good way of doing it.

We've done it that way in a bunch of other places, like ExecutorRun().
An advantage of this approach (I think) is that jumping to a fixed
address is faster than jumping through a function pointer - so with
the approach I've taken here, the common case where we're talking to
the client incurs only the overhead of a null-test, and the larger
overhead of the function pointer jump is incurred only when the hook
is in use. Maybe that's not enough of a difference to matter to
anything, but I think the contention that I've invented some novel
kind of interface here doesn't hold up to scrutiny. We have lots of
hooks that work just like what I did here.
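The null-tested hook pattern being defended here (the same shape as ExecutorRun_hook) might look roughly like this; the hook name, signature, and stub bodies are illustrative, not the patch's actual identifiers:

```c
#include <stddef.h>

/* Illustrative hook: NULL means "talk to the client socket as usual". */
typedef int (*putmessage_hook_type) (char msgtype, const char *s, size_t len);
putmessage_hook_type putmessage_hook = NULL;

static int
socket_putmessage(char msgtype, const char *s, size_t len)
{
	return 0;						/* stub for the normal socket path */
}

static int
mq_putmessage(char msgtype, const char *s, size_t len)
{
	return 1;						/* stub: would write to a shm_mq */
}

int
pq_putmessage(char msgtype, const char *s, size_t len)
{
	/* The common case pays only this null-test; the function-pointer
	 * jump happens only when a hook (e.g. the shm_mq redirector, here
	 * mq_putmessage) has been installed. */
	if (putmessage_hook != NULL)
		return putmessage_hook(msgtype, s, len);
	return socket_putmessage(msgtype, s, len);
}
```

A background worker would install the redirection by assigning `putmessage_hook = mq_putmessage;` once at startup.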

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#20Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#19)
1 attachment(s)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Sep 10, 2014 at 4:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Yes, although my issue with the hooks was not that you only provided
them for 2 functions, but the fact that it had no structure and the
implementation was "if hook set do this, else do that", which I don't
see as a good way of doing it.

We've done it that way in a bunch of other places, like ExecutorRun().
An advantage of this approach (I think) is that jumping to a fixed
address is faster than jumping through a function pointer - so with
the approach I've taken here, the common case where we're talking to
the client incurs only the overhead of a null-test, and the larger
overhead of the function pointer jump is incurred only when the hook
is in use. Maybe that's not enough of a difference to matter to
anything, but I think the contention that I've invented some novel
kind of interface here doesn't hold up to scrutiny. We have lots of
hooks that work just like what I did here.

Here's what the other approach looks like. I can't really see doing it
this way and then only providing hooks for those two functions, so
this is with hooks for all the send-side stuff.

Original version: 9 files changed, 295 insertions(+), 3 deletions(-)
This version: 9 files changed, 428 insertions(+), 47 deletions(-)

There is admittedly a certain elegance to providing a complete set of
hooks, so maybe this is the way to go. The remaining patches in the
patch series work with either the old version or this one; the changes
here don't affect anything else.

Anyone else have an opinion on which way is better here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0003-Support-frontend-backend-protocol-communication-usin-v2.patchtext/x-patch; charset=US-ASCII; name=0003-Support-frontend-backend-protocol-communication-usin-v2.patchDownload
commit 76b9b171f81b1639e9d0379f5066212107a740f6
Author: Robert Haas <rhaas@postgresql.org>
Date:   Wed May 28 19:30:21 2014 -0400

    Support frontend-backend protocol communication using a shm_mq.
    
    A background worker can use pq_redirect_to_shm_mq() to direct protocol
    that would normally be sent to the frontend to a shm_mq so that another
    process may read them.
    
    The receiving process may use pq_parse_errornotice() to parse an
    ErrorResponse or NoticeResponse from the background worker and, if
    it wishes, ThrowErrorData() to propagate the error (with or without
    further modification).
    
    V2: Lots more hooks, and a struct!

diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 8be0572..09410c4 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 # be-fsstubs is here for historical reasons, probably belongs elsewhere
 
 OBJS = be-fsstubs.o be-secure.o auth.o crypt.o hba.o ip.o md5.o pqcomm.o \
-       pqformat.o pqsignal.o
+       pqformat.o pqmq.o pqsignal.o
 
 ifeq ($(with_openssl),yes)
 OBJS += be-secure-openssl.o
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..7c4252d 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -102,7 +102,6 @@
 int			Unix_socket_permissions;
 char	   *Unix_socket_group;
 
-
 /* Where the Unix socket files are (list of palloc'd strings) */
 static List *sock_paths = NIL;
 
@@ -134,16 +133,38 @@ static bool DoingCopyOut;
 
 
 /* Internal functions */
-static void pq_close(int code, Datum arg);
+static void socket_comm_reset(void);
+static void socket_close(int code, Datum arg);
+static void socket_set_nonblocking(bool nonblocking);
+static int	socket_flush(void);
+static int	socket_flush_if_writable(void);
+static bool socket_is_send_pending(void);
+static int	socket_putmessage(char msgtype, const char *s, size_t len);
+static void socket_putmessage_noblock(char msgtype, const char *s, size_t len);
+static void socket_startcopyout(void);
+static void socket_endcopyout(bool errorAbort);
 static int	internal_putbytes(const char *s, size_t len);
 static int	internal_flush(void);
-static void pq_set_nonblocking(bool nonblocking);
+static void socket_set_nonblocking(bool nonblocking);
 
 #ifdef HAVE_UNIX_SOCKETS
 static int	Lock_AF_UNIX(char *unixSocketDir, char *unixSocketPath);
 static int	Setup_AF_UNIX(char *sock_path);
 #endif   /* HAVE_UNIX_SOCKETS */
 
+PQsendMethods PQsendSocketMethods;
+
+static PQsendMethods PqSendSocketMethods = {
+	socket_comm_reset,
+	socket_flush,
+	socket_flush_if_writable,
+	socket_is_send_pending,
+	socket_putmessage,
+	socket_putmessage_noblock,
+	socket_startcopyout,
+	socket_endcopyout
+};
+
 
 /* --------------------------------
  *		pq_init - initialize libpq at backend startup
@@ -152,24 +173,25 @@ static int	Setup_AF_UNIX(char *sock_path);
 void
 pq_init(void)
 {
+	PqSendMethods = &PqSendSocketMethods;
 	PqSendBufferSize = PQ_SEND_BUFFER_SIZE;
 	PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize);
 	PqSendPointer = PqSendStart = PqRecvPointer = PqRecvLength = 0;
 	PqCommBusy = false;
 	DoingCopyOut = false;
-	on_proc_exit(pq_close, 0);
+	on_proc_exit(socket_close, 0);
 }
 
 /* --------------------------------
- *		pq_comm_reset - reset libpq during error recovery
+ *		socket_comm_reset - reset libpq during error recovery
  *
  * This is called from error recovery at the outer idle loop.  It's
  * just to get us out of trouble if we somehow manage to elog() from
  * inside a pqcomm.c routine (which ideally will never happen, but...)
  * --------------------------------
  */
-void
-pq_comm_reset(void)
+static void
+socket_comm_reset(void)
 {
 	/* Do not throw away pending data, but do reset the busy flag */
 	PqCommBusy = false;
@@ -178,14 +200,14 @@ pq_comm_reset(void)
 }
 
 /* --------------------------------
- *		pq_close - shutdown libpq at backend exit
+ *		socket_close - shutdown libpq at backend exit
  *
  * Note: in a standalone backend MyProcPort will be null,
  * don't crash during exit...
  * --------------------------------
  */
 static void
-pq_close(int code, Datum arg)
+socket_close(int code, Datum arg)
 {
 	if (MyProcPort != NULL)
 	{
@@ -783,15 +805,20 @@ TouchSocketFiles(void)
  */
 
 /* --------------------------------
- *			  pq_set_nonblocking - set socket blocking/non-blocking
+ *			  socket_set_nonblocking - set socket blocking/non-blocking
  *
  * Sets the socket non-blocking if nonblocking is TRUE, or sets it
  * blocking otherwise.
  * --------------------------------
  */
 static void
-pq_set_nonblocking(bool nonblocking)
+socket_set_nonblocking(bool nonblocking)
 {
+	if (MyProcPort == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
+				 errmsg("there is no client connection")));
+
 	if (MyProcPort->noblock == nonblocking)
 		return;
 
@@ -844,7 +871,7 @@ pq_recvbuf(void)
 	}
 
 	/* Ensure that we're in blocking mode */
-	pq_set_nonblocking(false);
+	socket_set_nonblocking(false);
 
 	/* Can fill buffer from PqRecvLength and upwards */
 	for (;;)
@@ -935,7 +962,7 @@ pq_getbyte_if_available(unsigned char *c)
 	}
 
 	/* Put the socket into non-blocking mode */
-	pq_set_nonblocking(true);
+	socket_set_nonblocking(true);
 
 	r = secure_read(MyProcPort, c, 1);
 	if (r < 0)
@@ -1194,7 +1221,7 @@ internal_putbytes(const char *s, size_t len)
 		/* If buffer is full, then flush it out */
 		if (PqSendPointer >= PqSendBufferSize)
 		{
-			pq_set_nonblocking(false);
+			socket_set_nonblocking(false);
 			if (internal_flush())
 				return EOF;
 		}
@@ -1210,13 +1237,13 @@ internal_putbytes(const char *s, size_t len)
 }
 
 /* --------------------------------
- *		pq_flush		- flush pending output
+ *		socket_flush		- flush pending output
  *
  *		returns 0 if OK, EOF if trouble
  * --------------------------------
  */
-int
-pq_flush(void)
+static int
+socket_flush(void)
 {
 	int			res;
 
@@ -1224,7 +1251,7 @@ pq_flush(void)
 	if (PqCommBusy)
 		return 0;
 	PqCommBusy = true;
-	pq_set_nonblocking(false);
+	socket_set_nonblocking(false);
 	res = internal_flush();
 	PqCommBusy = false;
 	return res;
@@ -1310,8 +1337,8 @@ internal_flush(void)
  * Returns 0 if OK, or EOF if trouble.
  * --------------------------------
  */
-int
-pq_flush_if_writable(void)
+static int
+socket_flush_if_writable(void)
 {
 	int			res;
 
@@ -1324,7 +1351,7 @@ pq_flush_if_writable(void)
 		return 0;
 
 	/* Temporarily put the socket into non-blocking mode */
-	pq_set_nonblocking(true);
+	socket_set_nonblocking(true);
 
 	PqCommBusy = true;
 	res = internal_flush();
@@ -1333,11 +1360,11 @@ pq_flush_if_writable(void)
 }
 
 /* --------------------------------
- *		pq_is_send_pending	- is there any pending data in the output buffer?
+ *	socket_is_send_pending	- is there any pending data in the output buffer?
  * --------------------------------
  */
-bool
-pq_is_send_pending(void)
+static bool
+socket_is_send_pending(void)
 {
 	return (PqSendStart < PqSendPointer);
 }
@@ -1351,7 +1378,7 @@ pq_is_send_pending(void)
 
 
 /* --------------------------------
- *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
+ *		socket_putmessage - send a normal message (suppressed in COPY OUT mode)
  *
  *		If msgtype is not '\0', it is a message type code to place before
  *		the message body.  If msgtype is '\0', then the message has no type
@@ -1375,8 +1402,8 @@ pq_is_send_pending(void)
  *		returns 0 if OK, EOF if trouble
  * --------------------------------
  */
-int
-pq_putmessage(char msgtype, const char *s, size_t len)
+static int
+socket_putmessage(char msgtype, const char *s, size_t len)
 {
 	if (DoingCopyOut || PqCommBusy)
 		return 0;
@@ -1408,8 +1435,8 @@ fail:
  *		If the output buffer is too small to hold the message, the buffer
  *		is enlarged.
  */
-void
-pq_putmessage_noblock(char msgtype, const char *s, size_t len)
+static void
+socket_putmessage_noblock(char msgtype, const char *s, size_t len)
 {
 	int res		PG_USED_FOR_ASSERTS_ONLY;
 	int			required;
@@ -1431,18 +1458,18 @@ pq_putmessage_noblock(char msgtype, const char *s, size_t len)
 
 
 /* --------------------------------
- *		pq_startcopyout - inform libpq that an old-style COPY OUT transfer
+ *		socket_startcopyout - inform libpq that an old-style COPY OUT transfer
  *			is beginning
  * --------------------------------
  */
-void
-pq_startcopyout(void)
+static void
+socket_startcopyout(void)
 {
 	DoingCopyOut = true;
 }
 
 /* --------------------------------
- *		pq_endcopyout	- end an old-style COPY OUT transfer
+ *		socket_endcopyout	- end an old-style COPY OUT transfer
  *
  *		If errorAbort is indicated, we are aborting a COPY OUT due to an error,
  *		and must send a terminator line.  Since a partial data line might have
@@ -1451,8 +1478,8 @@ pq_startcopyout(void)
  *		not allow binary transfers, so a textual terminator is always correct.
  * --------------------------------
  */
-void
-pq_endcopyout(bool errorAbort)
+static void
+socket_endcopyout(bool errorAbort)
 {
 	if (!DoingCopyOut)
 		return;
@@ -1462,7 +1489,6 @@ pq_endcopyout(bool errorAbort)
 	DoingCopyOut = false;
 }
 
-
 /*
  * Support for TCP Keepalive parameters
  */
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
new file mode 100644
index 0000000..4a3c8b1
--- /dev/null
+++ b/src/backend/libpq/pqmq.c
@@ -0,0 +1,261 @@
+/*-------------------------------------------------------------------------
+ *
+ * pqmq.c
+ *	  Use the frontend/backend protocol for communication over a shm_mq
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *	src/backend/libpq/pqmq.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+static shm_mq *pq_mq;
+static shm_mq_handle *pq_mq_handle;
+static bool pq_mq_busy = false;
+
+static void mq_comm_reset(void);
+static int	mq_flush(void);
+static int	mq_flush_if_writable(void);
+static bool mq_is_send_pending(void);
+static int	mq_putmessage(char msgtype, const char *s, size_t len);
+static void mq_putmessage_noblock(char msgtype, const char *s, size_t len);
+static void mq_startcopyout(void);
+static void mq_endcopyout(bool errorAbort);
+
+static PQsendMethods PqSendMqMethods = {
+	mq_comm_reset,
+	mq_flush,
+	mq_flush_if_writable,
+	mq_is_send_pending,
+	mq_putmessage,
+	mq_putmessage_noblock,
+	mq_startcopyout,
+	mq_endcopyout
+};
+
+/*
+ * Arrange to redirect frontend/backend protocol messages to a shared-memory
+ * message queue.
+ */
+void
+pq_redirect_to_shm_mq(shm_mq *mq, shm_mq_handle *mqh)
+{
+	PqSendMethods = &PqSendMqMethods;
+	pq_mq = mq;
+	pq_mq_handle = mqh;
+	whereToSendOutput = DestRemote;
+	FrontendProtocol = PG_PROTOCOL_LATEST;
+}
+
+static void
+mq_comm_reset(void)
+{
+	/* Nothing to do. */
+}
+
+static int
+mq_flush(void)
+{
+	/* Nothing to do. */
+	return 0;
+}
+
+static int
+mq_flush_if_writable(void)
+{
+	/* Nothing to do. */
+	return 0;
+}
+
+static bool
+mq_is_send_pending(void)
+{
+	/* There's never anything pending. */
+	return false;
+}
+
+/*
+ * Transmit a libpq protocol message to the shared memory message queue
+ * selected via pq_mq_handle.  We don't include a length word, because the
+ * receiver will know the length of the message from shm_mq_receive().
+ */
+static int
+mq_putmessage(char msgtype, const char *s, size_t len)
+{
+	shm_mq_iovec	iov[2];
+	shm_mq_result	result;
+
+	/*
+	 * If we're sending a message, and we have to wait because the
+	 * queue is full, and then we get interrupted, and that interrupt
+	 * results in trying to send another message, we respond by detaching
+	 * the queue.  There's no way to return to the original context, but
+	 * even if there were, just queueing the message would amount to
+	 * indefinitely postponing the response to the interrupt.  So we do
+	 * this instead.
+	 */
+	if (pq_mq_busy)
+	{
+		if (pq_mq != NULL)
+			shm_mq_detach(pq_mq);
+		pq_mq = NULL;
+		return EOF;
+	}
+
+	pq_mq_busy = true;
+
+	iov[0].data = &msgtype;
+	iov[0].len = 1;
+	iov[1].data = s;
+	iov[1].len = len;
+
+	Assert(pq_mq_handle != NULL);
+	result = shm_mq_sendv(pq_mq_handle, iov, 2, false);
+
+	pq_mq_busy = false;
+
+	Assert(result == SHM_MQ_SUCCESS || result == SHM_MQ_DETACHED);
+	if (result != SHM_MQ_SUCCESS)
+		return EOF;
+	return 0;
+}
+
+static void
+mq_putmessage_noblock(char msgtype, const char *s, size_t len)
+{
+	/*
+	 * While the shm_mq machinery does support sending a message in
+	 * non-blocking mode, there's currently no way to begin sending a
+	 * message without also committing ourselves to completing the
+	 * transmission.  This could be improved in the future, but for now
+	 * we don't need it.
+	 */
+	elog(ERROR, "not currently supported");
+}
+
+static void
+mq_startcopyout(void)
+{
+	/* Nothing to do. */
+}
+
+static void
+mq_endcopyout(bool errorAbort)
+{
+	/* Nothing to do. */
+}
+
+/*
+ * Parse an ErrorResponse or NoticeResponse payload and populate an ErrorData
+ * structure with the results.
+ */
+void
+pq_parse_errornotice(StringInfo msg, ErrorData *edata)
+{
+	/* Initialize edata with reasonable defaults. */
+	MemSet(edata, 0, sizeof(ErrorData));
+	edata->elevel = ERROR;
+	edata->assoc_context = CurrentMemoryContext;
+
+	/* Loop over fields and extract each one. */
+	for (;;)
+	{
+		char	code = pq_getmsgbyte(msg);
+		const char *value;
+
+		if (code == '\0')
+		{
+			pq_getmsgend(msg);
+			break;
+		}
+		value = pq_getmsgstring(msg);
+
+		switch (code)
+		{
+			case PG_DIAG_SEVERITY:
+				if (strcmp(value, "DEBUG") == 0)
+					edata->elevel = DEBUG1;	/* or some other DEBUG level */
+				else if (strcmp(value, "LOG") == 0)
+					edata->elevel = LOG;	/* can't be COMMERROR */
+				else if (strcmp(value, "INFO") == 0)
+					edata->elevel = INFO;
+				else if (strcmp(value, "NOTICE") == 0)
+					edata->elevel = NOTICE;
+				else if (strcmp(value, "WARNING") == 0)
+					edata->elevel = WARNING;
+				else if (strcmp(value, "ERROR") == 0)
+					edata->elevel = ERROR;
+				else if (strcmp(value, "FATAL") == 0)
+					edata->elevel = FATAL;
+				else if (strcmp(value, "PANIC") == 0)
+					edata->elevel = PANIC;
+				else
+					elog(ERROR, "unknown error severity");
+				break;
+			case PG_DIAG_SQLSTATE:
+				if (strlen(value) != 5)
+					elog(ERROR, "malformed sql state");
+				edata->sqlerrcode = MAKE_SQLSTATE(value[0], value[1], value[2],
+												  value[3], value[4]);
+				break;
+			case PG_DIAG_MESSAGE_PRIMARY:
+				edata->message = pstrdup(value);
+				break;
+			case PG_DIAG_MESSAGE_DETAIL:
+				edata->detail = pstrdup(value);
+				break;
+			case PG_DIAG_MESSAGE_HINT:
+				edata->hint = pstrdup(value);
+				break;
+			case PG_DIAG_STATEMENT_POSITION:
+				edata->cursorpos = pg_atoi(value, sizeof(int), '\0');
+				break;
+			case PG_DIAG_INTERNAL_POSITION:
+				edata->internalpos = pg_atoi(value, sizeof(int), '\0');
+				break;
+			case PG_DIAG_INTERNAL_QUERY:
+				edata->internalquery = pstrdup(value);
+				break;
+			case PG_DIAG_CONTEXT:
+				edata->context = pstrdup(value);
+				break;
+			case PG_DIAG_SCHEMA_NAME:
+				edata->schema_name = pstrdup(value);
+				break;
+			case PG_DIAG_TABLE_NAME:
+				edata->table_name = pstrdup(value);
+				break;
+			case PG_DIAG_COLUMN_NAME:
+				edata->column_name = pstrdup(value);
+				break;
+			case PG_DIAG_DATATYPE_NAME:
+				edata->datatype_name = pstrdup(value);
+				break;
+			case PG_DIAG_CONSTRAINT_NAME:
+				edata->constraint_name = pstrdup(value);
+				break;
+			case PG_DIAG_SOURCE_FILE:
+				edata->filename = pstrdup(value);
+				break;
+			case PG_DIAG_SOURCE_LINE:
+				edata->lineno = pg_atoi(value, sizeof(int), '\0');
+				break;
+			case PG_DIAG_SOURCE_FUNCTION:
+				edata->funcname = pstrdup(value);
+				break;
+			default:
+				elog(ERROR, "unknown error field: %d", (int) code);
+				break;
+		}
+	}
+}
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index ca5a8a5..1d13363 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -34,7 +34,7 @@
  * overflow.
  */
 int32
-pg_atoi(char *s, int size, int c)
+pg_atoi(const char *s, int size, int c)
 {
 	long		l;
 	char	   *badp;
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 32a9663..2316464 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1577,6 +1577,57 @@ FlushErrorState(void)
 }
 
 /*
+ * ThrowErrorData --- report an error described by an ErrorData structure
+ *
+ * This is intended to be used to re-report errors originally thrown by
+ * background worker processes and then propagated (with or without
+ * modification) to the backend responsible for them.
+ */
+void
+ThrowErrorData(ErrorData *edata)
+{
+	ErrorData *newedata;
+	MemoryContext	oldcontext;
+
+	if (!errstart(edata->elevel, edata->filename, edata->lineno,
+				  edata->funcname, NULL))
+		return;
+
+	newedata = &errordata[errordata_stack_depth];
+	oldcontext = MemoryContextSwitchTo(edata->assoc_context);
+
+	/* Copy the supplied fields to the error stack. */
+	if (edata->sqlerrcode > 0)
+		newedata->sqlerrcode = edata->sqlerrcode;
+	if (edata->message)
+		newedata->message = pstrdup(edata->message);
+	if (edata->detail)
+		newedata->detail = pstrdup(edata->detail);
+	if (edata->detail_log)
+		newedata->detail_log = pstrdup(edata->detail_log);
+	if (edata->hint)
+		newedata->hint = pstrdup(edata->hint);
+	if (edata->context)
+		newedata->context = pstrdup(edata->context);
+	if (edata->schema_name)
+		newedata->schema_name = pstrdup(edata->schema_name);
+	if (edata->table_name)
+		newedata->table_name = pstrdup(edata->table_name);
+	if (edata->column_name)
+		newedata->column_name = pstrdup(edata->column_name);
+	if (edata->datatype_name)
+		newedata->datatype_name = pstrdup(edata->datatype_name);
+	if (edata->constraint_name)
+		newedata->constraint_name = pstrdup(edata->constraint_name);
+	if (edata->internalquery)
+		newedata->internalquery = pstrdup(edata->internalquery);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	errfinish(0);
+}
+
+/*
  * ReThrowError --- re-throw a previously copied error
  *
  * A handler can do CopyErrorData/FlushErrorState to get out of the error
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 5da9d8d..0b8db42 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -37,6 +37,31 @@ typedef struct
 	}			u;
 } PQArgBlock;
 
+typedef struct
+{
+	void (*comm_reset)(void);
+	int	(*flush)(void);
+	int	(*flush_if_writable)(void);
+	bool (*is_send_pending)(void);
+	int	(*putmessage)(char msgtype, const char *s, size_t len);
+	void (*putmessage_noblock)(char msgtype, const char *s, size_t len);
+	void (*startcopyout)(void);
+	void (*endcopyout)(bool errorAbort);
+} PQsendMethods;
+
+extern PQsendMethods *PqSendMethods;
+
+#define pq_comm_reset()	(PqSendMethods->comm_reset())
+#define pq_flush() (PqSendMethods->flush())
+#define pq_flush_if_writable() (PqSendMethods->flush_if_writable())
+#define pq_is_send_pending() (PqSendMethods->is_send_pending())
+#define pq_putmessage(msgtype, s, len) \
+	(PqSendMethods->putmessage(msgtype, s, len))
+#define pq_putmessage_noblock(msgtype, s, len) \
+	(PqSendMethods->putmessage_noblock(msgtype, s, len))
+#define pq_startcopyout() (PqSendMethods->startcopyout())
+#define pq_endcopyout(errorAbort) (PqSendMethods->endcopyout(errorAbort))
+
 /*
  * External functions.
  */
@@ -51,7 +76,6 @@ extern int	StreamConnection(pgsocket server_fd, Port *port);
 extern void StreamClose(pgsocket sock);
 extern void TouchSocketFiles(void);
 extern void pq_init(void);
-extern void pq_comm_reset(void);
 extern int	pq_getbytes(char *s, size_t len);
 extern int	pq_getstring(StringInfo s);
 extern int	pq_getmessage(StringInfo s, int maxlen);
@@ -59,13 +83,6 @@ extern int	pq_getbyte(void);
 extern int	pq_peekbyte(void);
 extern int	pq_getbyte_if_available(unsigned char *c);
 extern int	pq_putbytes(const char *s, size_t len);
-extern int	pq_flush(void);
-extern int	pq_flush_if_writable(void);
-extern bool pq_is_send_pending(void);
-extern int	pq_putmessage(char msgtype, const char *s, size_t len);
-extern void pq_putmessage_noblock(char msgtype, const char *s, size_t len);
-extern void pq_startcopyout(void);
-extern void pq_endcopyout(bool errorAbort);
 
 /*
  * prototypes for functions in be-secure.c
@@ -75,6 +92,9 @@ extern char *ssl_key_file;
 extern char *ssl_ca_file;
 extern char *ssl_crl_file;
 
+extern int	(*pq_putmessage_hook)(char msgtype, const char *s, size_t len);
+extern int  (*pq_flush_hook)(void);
+
 extern int	secure_initialize(void);
 extern bool secure_loaded_verify_locations(void);
 extern void secure_destroy(void);
diff --git a/src/include/libpq/pqmq.h b/src/include/libpq/pqmq.h
new file mode 100644
index 0000000..6bb24d9
--- /dev/null
+++ b/src/include/libpq/pqmq.h
@@ -0,0 +1,22 @@
+/*-------------------------------------------------------------------------
+ *
+ * pqmq.h
+ *	  Use the frontend/backend protocol for communication over a shm_mq
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/libpq/pqmq.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PQMQ_H
+#define PQMQ_H
+
+#include "storage/shm_mq.h"
+
+extern void	pq_redirect_to_shm_mq(shm_mq *, shm_mq_handle *);
+
+extern void pq_parse_errornotice(StringInfo str, ErrorData *edata);
+
+#endif   /* PQMQ_H */
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 78cc0a0..e1bea54 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -285,7 +285,7 @@ extern Datum current_schema(PG_FUNCTION_ARGS);
 extern Datum current_schemas(PG_FUNCTION_ARGS);
 
 /* numutils.c */
-extern int32 pg_atoi(char *s, int size, int c);
+extern int32 pg_atoi(const char *s, int size, int c);
 extern void pg_itoa(int16 i, char *a);
 extern void pg_ltoa(int32 l, char *a);
 extern void pg_lltoa(int64 ll, char *a);
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 92073be..87438b8 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -415,6 +415,7 @@ extern ErrorData *CopyErrorData(void);
 extern void FreeErrorData(ErrorData *edata);
 extern void FlushErrorState(void);
 extern void ReThrowError(ErrorData *edata) __attribute__((noreturn));
+extern void ThrowErrorData(ErrorData *edata);
 extern void pg_re_throw(void) __attribute__((noreturn));
 
 extern char *GetErrorContextStack(void);
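As a side note for reviewers, the ErrorResponse/NoticeResponse payload that pq_parse_errornotice() walks is just a sequence of (one-byte field code, NUL-terminated value) pairs ended by a single zero byte. Here is a standalone sketch of that walk with plain-C stand-ins for the pqformat routines; the struct and the subset of field codes handled are invented for illustration, this is not the patch code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct errfields
{
	char		severity[16];
	char		sqlstate[6];
	char		message[128];
};

/* Returns 0 on success, -1 on a malformed payload. */
static int
parse_errornotice(const char *buf, size_t len, struct errfields *out)
{
	size_t		pos = 0;

	memset(out, 0, sizeof(*out));
	while (pos < len)
	{
		char		code = buf[pos++];
		const char *value;
		const char *nul;
		size_t		vlen;

		if (code == '\0')
			return 0;			/* terminator byte: done */

		value = buf + pos;
		nul = memchr(value, '\0', len - pos);
		if (nul == NULL)
			return -1;			/* value not NUL-terminated */
		vlen = (size_t) (nul - value);
		pos += vlen + 1;

		switch (code)
		{
			case 'S':			/* PG_DIAG_SEVERITY */
				snprintf(out->severity, sizeof(out->severity), "%s", value);
				break;
			case 'C':			/* PG_DIAG_SQLSTATE: always five characters */
				if (vlen != 5)
					return -1;
				memcpy(out->sqlstate, value, 6);	/* 5 chars plus NUL */
				break;
			case 'M':			/* PG_DIAG_MESSAGE_PRIMARY */
				snprintf(out->message, sizeof(out->message), "%s", value);
				break;
			default:
				break;			/* ignore fields not modeled here */
		}
	}
	return -1;					/* ran off the end without a terminator */
}
```

The real function instead fills an ErrorData and elogs on malformed input, but the field-by-field walk is the same shape.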
#21Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#1)
Re: pg_background (and more parallelism infrastructure patches)

On Fri, Jul 25, 2014 at 11:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Patch 4 adds infrastructure that allows one session to save all of its
non-default GUC values and another session to reload those values.
This was written by Amit Khandekar and Noah Misch. It allows
pg_background to start up the background worker with the same GUC
settings that the launching process is using. I intend this as a
demonstration of how to synchronize any given piece of state between
cooperating backends. For real parallelism, we'll need to synchronize
snapshots, combo CIDs, transaction state, and so on, in addition to
GUCs. But GUCs are ONE of the things that we'll need to synchronize
in that context, and this patch shows the kind of API we're thinking
about for these sorts of problems.

Don't we need some way to prohibit changing GUC by launching process,
once it has shared the existing GUC?

Patch 6 is pg_background itself. I'm quite pleased with how easily
this came together. The existing background worker, dsm, shm_toc, and
shm_mq infrastructure handles most of the heavily lifting here -
obviously with some exceptions addressed by the preceding patches.
Again, this is the kind of set-up that I'm expecting will happen in a
background worker used for actual parallelism - clearly, more state
will need to be restored there than here, but nonetheless the general
flow of the code here is about what I'm imagining, just with somewhat
more different kinds of state. Most of the work of writing this patch
was actually figuring out how to execute the query itself; what I
ended up with is mostly copied form exec_simple_query, but with some
difference here and there. I'm not sure if it would be
possible/advisable to try to refactor to reduce duplication.

1. This patch generates warning on windows
1>pg_background.obj : error LNK2001: unresolved external symbol
StatementTimeout

You need to add PGDLLIMPORT for StatementTimeout

2.
CREATE FUNCTION pg_background_launch(sql pg_catalog.text,
queue_size pg_catalog.int4 DEFAULT 65536)

Here shouldn't queue_size be pg_catalog.int8, as I could see
some related functions in test_shm_mq use int8?

CREATE FUNCTION test_shm_mq(queue_size pg_catalog.int8,
CREATE FUNCTION test_shm_mq_pipelined(queue_size pg_catalog.int8,

Anyway, I think the corresponding C function doesn't use the matching
function to extract the function args.

pg_background_launch(PG_FUNCTION_ARGS)
{
text *sql = PG_GETARG_TEXT_PP(0);
int32 queue_size = PG_GETARG_INT64(1);

Here it should use the _INT32 variant to match the current SQL definition;
otherwise it leads to the below error.

postgres=# select pg_background_launch('vacuum verbose foo');
ERROR: queue size must be at least 64 bytes

3.
Comparing execute_sql_string() and exec_simple_query(), I could see below
main differences:
a. Allocate a new memory context different from message context
b. Start/commit control of transaction is outside execute_sql_string
c. enable/disable statement timeout is done from outside in case of
execute_sql_string()
d. execute_sql_string() prohibits Start/Commit/Abort transaction statements.
e. execute_sql_string() doesn't log statements
f. execute_sql_string() uses binary format for the result whereas
exec_simple_query() uses TEXT as the default format
g. processed stat related info is handled by the caller in case of
execute_sql_string().

Can't we handle all these or other changes inside exec_simple_query()
based on some parameter, or maybe use a hook similar to what we
do in ProcessUtility?

Basically it looks a bit odd to have a (mostly) duplicate copy of
exec_simple_query().

4.
Won't it be better if pg_background_worker_main() can look more
like PostgresMain() (in terms of handling different kinds of messages),
so that it can be extended in the future to handle parallel workers?

5.
pg_background_result()
{
..
/* Read and process messages from the shared memory queue. */
}

Shouldn't the processing of messages be a separate function, as
we do for pqParseInput3()?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#22Petr Jelinek
petr@2ndquadrant.com
In reply to: Robert Haas (#20)
Re: pg_background (and more parallelism infrastructure patches)

On 10/09/14 22:53, Robert Haas wrote:

Here's what the other approach looks like. I can't really see doing
this way and then only providing hooks for those two functions, so
this is with hooks for all the send-side stuff.

Original version: 9 files changed, 295 insertions(+), 3 deletions(-)
This version: 9 files changed, 428 insertions(+), 47 deletions(-)

There is admittedly a certain elegance to providing a complete set of
hooks, so maybe this is the way to go. The remaining patches in the
patch series work with either the old version or this one; the changes
here don't affect anything else.

Anyone else have an opinion on which way is better here?

Ok so it is more than 20 lines :)

I do like this approach more though, it looks cleaner to me.
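For readers skimming the patch, the "complete set of hooks" variant boils down to a swappable struct of function pointers that all the pq_* calls go through. A minimal standalone miniature of the idea (all names here are invented for illustration; none of this is the patch code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A miniature send-methods table: each "transport" fills one in. */
typedef struct
{
	int			(*putmessage) (char msgtype, const char *s, size_t len);
	int			(*flush) (void);
} SendMethods;

static char sent[64];			/* records what the last transport "wrote" */

/* "Socket" implementation. */
static int
socket_putmessage(char msgtype, const char *s, size_t len)
{
	snprintf(sent, sizeof(sent), "socket:%c:%.*s", msgtype, (int) len, s);
	return 0;
}

static int
socket_flush(void)
{
	return 0;
}

/* "Queue" implementation: a different transport behind the same interface. */
static int
mq_putmessage(char msgtype, const char *s, size_t len)
{
	snprintf(sent, sizeof(sent), "mq:%c:%.*s", msgtype, (int) len, s);
	return 0;
}

static int
mq_flush(void)
{
	return 0;					/* nothing is buffered, nothing to flush */
}

static const SendMethods socket_methods = {socket_putmessage, socket_flush};
static const SendMethods mq_methods = {mq_putmessage, mq_flush};

/* The global pointer everything calls through, as pq_putmessage() does. */
static const SendMethods *send_methods = &socket_methods;

static int
putmessage(char msgtype, const char *s, size_t len)
{
	return send_methods->putmessage(msgtype, s, len);
}

/* Redirection is just swapping the table, like pq_redirect_to_shm_mq(). */
static void
redirect_to_mq(void)
{
	send_methods = &mq_methods;
}
```

The appeal is that callers never branch on the transport; a background worker swaps the table once at startup and every protocol message silently goes to the queue.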

On 10/09/14 22:53, Robert Haas wrote:

+extern int	(*pq_putmessage_hook)(char msgtype, const char *s, size_t len);
+extern int  (*pq_flush_hook)(void);

Btw you forgot to remove the above.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#21)
Re: pg_background (and more parallelism infrastructure patches)

On Thu, Sep 11, 2014 at 7:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Don't we need some way to prohibit changing GUC by launching process,
once it has shared the existing GUC?

Nope. I mean, eventually, for true parallelism ... we absolutely will
need that. But pg_background itself doesn't need that; it's perfectly
fine for configuration settings to get changed in the background
worker. So it's a different piece of infrastructure from this patch
set.

1. This patch generates warning on windows
1>pg_background.obj : error LNK2001: unresolved external symbol
StatementTimeout

You need to add PGDLLIMPORT for StatementTimeout

OK. I still think we should go back and PGDLLIMPORT-ize all the GUC variables.

2.
CREATE FUNCTION pg_background_launch(sql pg_catalog.text,
queue_size pg_catalog.int4 DEFAULT 65536)

Here shouldn't queue_size be pg_catalog.int8, as I could see
some related functions in test_shm_mq use int8?

I think if you think you want a queue that's more than 2GB in size,
you should re-think.

pg_background_launch(PG_FUNCTION_ARGS)
{
text *sql = PG_GETARG_TEXT_PP(0);
int32 queue_size = PG_GETARG_INT64(1);

Here it should use the _INT32 variant to match the current SQL definition;
otherwise it leads to the below error.

postgres=# select pg_background_launch('vacuum verbose foo');
ERROR: queue size must be at least 64 bytes

Oops.

3.
Comparing execute_sql_string() and exec_simple_query(), I could see below
main differences:
a. Allocate a new memory context different from message context
b. Start/commit control of transaction is outside execute_sql_string
c. enable/disable statement timeout is done from outside in case of
execute_sql_string()
d. execute_sql_string() prohibits Start/Commit/Abort transaction statements.
e. execute_sql_string() doesn't log statements
f. execute_sql_string() uses binary format for the result whereas
exec_simple_query() uses TEXT as the default format
g. processed stat related info is handled by the caller in case of
execute_sql_string().

Can't we handle all these or other changes inside exec_simple_query()
based on some parameter, or maybe use a hook similar to what we
do in ProcessUtility?

Basically it looks a bit odd to have a (mostly) duplicate copy of
exec_simple_query().

It is. But I didn't think hacking up exec_simple_query() was a better
option. We could do that if most people like that approach, but to me
it seemed there were enough differences to make it unappealing.

4.
Won't it be better if pg_background_worker_main() can look more
like PostgresMain() (in terms of handling different kinds of messages),
so that it can be extended in the future to handle parallel workers?

I don't think that a parallel worker will look like pg_background in
much more than broad outline. Some of the same constructs will get
reused, but overall I think it's a different problem that I'd rather
not conflate with this patch.

5.
pg_background_result()
{
..
/* Read and process messages from the shared memory queue. */
}

Shouldn't the processing of messages be a separate function, as
we do for pqParseInput3()?

I guess we could. It's not all that much code, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#24Petr Jelinek
petr@2ndquadrant.com
In reply to: Robert Haas (#23)
Re: pg_background (and more parallelism infrastructure patches)

On 11/09/14 20:37, Robert Haas wrote:

1. This patch generates warning on windows
1>pg_background.obj : error LNK2001: unresolved external symbol
StatementTimeout

You need to add PGDLLIMPORT for StatementTimeout

OK. I still think we should go back and PGDLLIMPORT-ize all the GUC variables.

+1

4.
Won't it be better if pg_background_worker_main() can look more
like PostgresMain() (in terms of handling different kind of messages),
so that it can be extended in future to handle parallel worker.

I don't think that a parallel worker will look like pg_background in
much more than broad outline. Some of the same constructs will get
reused, but overall I think it's a different problem that I'd rather
not conflate with this patch.

Yeah, I agree here. While the patch provides a lot of necessary plumbing
that any kind of parallel processing needs, pg_background itself
does not seem to be the base for that.
Actually, when I first saw pg_background, I was thinking that it
looks like a good base for some kind of background job scheduling
mechanism that has been requested a few times over the years (the actual
scheduling is the only part missing now IMHO, but that's a separate
discussion).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#25Andres Freund
andres@2ndquadrant.com
In reply to: Petr Jelinek (#24)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-09-11 21:27:15 +0200, Petr Jelinek wrote:

On 11/09/14 20:37, Robert Haas wrote:

OK. I still think we should go back and PGDLLIMPORT-ize all the GUC variables.

+1

Same here. I think Tom was the only one against it, but pretty much
everyone else was for it. We should fix all the GUCs declared as externs
in multiple .c files at the same time imo.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#23)
Re: pg_background (and more parallelism infrastructure patches)

On Fri, Sep 12, 2014 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Sep 11, 2014 at 7:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Don't we need some way to prohibit changing GUC by launching process,
once it has shared the existing GUC?

Nope. I mean, eventually, for true parallelism ... we absolutely will
need that. But pg_background itself doesn't need that; it's perfectly
fine for configuration settings to get changed in the background
worker. So it's a different piece of infrastructure from this patch
set.

Okay, but as there is no predictability about which GUC values the
background worker will use (they can either be the same as what the
launching process had when it launched the background worker, or
different values if they got changed later due to SIGHUP), it might be
good to clarify this in the pg_background docs (which are not there
currently, but I assume they will eventually be part of this patch).

3.
Can't we handle all these or other changes inside exec_simple_query()
based on some parameter, or maybe use a hook similar to what we
do in ProcessUtility?

Basically it looks a bit odd to have a (mostly) duplicate copy of
exec_simple_query().

It is. But I didn't think hacking up exec_simple_query() was a better
option. We could do that if most people like that approach, but to me
it seemed there were enough differences to make it unappealing.

Okay, but I think in that case we need to carefully evaluate the
differences, else it might lead to behaviour differences in statement
execution. A few observations related to the differences are as follows:

1.
Keeping transaction control (Start/commit) outside the function
execute_sql_string() could lead to EndCommand() message being
sent before transaction end which could be a problem in case
transaction commit fails due to any reason. exec_simple_query() takes
care of the same by calling finish_xact_command() before reporting
command-complete for last parse tree. It even has comment indicating
that we should follow this protocol.

2.
static void
execute_sql_string(const char *sql)
{
..
	/* Be sure to advance the command counter after the last script command */
	CommandCounterIncrement();
}

Won't CommandCounterIncrement() be required after every command, as
in exec_simple_query()?

3.
static void
execute_sql_string(const char *sql)
{
..
	/*
	 * Send a CommandComplete message even if we suppressed the query
	 * results.  The user backend will report these in the absence of
	 * any true query results.
	 */
	EndCommand(completionTag, DestRemote);

	/* Clean up the portal. */
	PortalDrop(portal, false);
..
}

What's the need to reverse the order of execution of EndCommand()
and PortalDrop()? Any error in PortalDrop() will lead to the wrong
message being sent to the client.

4.
What is the reason for not logging statements executed via
pg_background, even though it allows the SQL statement to be reported
to various monitoring tools by setting debug_query_string?

5.
Isn't it better to add a comment explaining why execute_sql_string()
uses binary format for the result?

4.
Won't it be better if pg_background_worker_main() looked more
like PostgresMain() (in terms of handling different kinds of messages),
so that it can be extended in the future to handle parallel workers?

I don't think that a parallel worker will look like pg_background in
much more than broad outline. Some of the same constructs will get
reused, but overall I think it's a different problem that I'd rather
not conflate with this patch.

No issues.

5.
pg_background_result()
{
..
	/* Read and process messages from the shared memory queue. */
}

Shouldn't the processing of messages be a separate function, as we
do with pqParseInput3()?

I guess we could. It's not all that much code, though.

Sure, please make the call based on what you feel is right here; I
mentioned it because I felt it might be a little easier for other people
to understand that code.

Some other comments are as follows:

1.
pg_background_result(PG_FUNCTION_ARGS)
{
..
	/*
	 * Whether we succeed or fail, a future invocation of this function
	 * may not try to read from the DSM once we've begun to do so.
	 * Accordingly, make arrangements to clean things up at end of query.
	 */
	dsm_unkeep_mapping(info->seg);

There can be certain scenarios where the user might like to invoke this
again. Assume the user calls
pg_background_launch('select count(*) from t1') and execution of this
statement via the background worker is going to take a very long time
before returning anything. After some time the user tries to retrieve
data via pg_background_result(), but the call doesn't return as it is
waiting for results, so the user presses cancel; on trying again after
some time, he won't get any data. I think this behaviour is a bit
annoying.

I am able to reproduce this by halting the background worker via
debugger.

postgres=# select pg_background_launch('select count(*) from t1');
pg_background_launch
----------------------
656
(1 row)

postgres=# select * from pg_background_result(656) as (cnt int);
Cancel request sent
ERROR: canceling statement due to user request
postgres=# select * from pg_background_result(656) as (cnt int);
ERROR: PID 656 is not attached to this session

2.
To avoid making the user wait too long, the function
pg_background_result() could take an additional parameter where the
user can specify whether to WAIT in case results are not yet available.

3.
	case 'E':
	case 'N':
		{
			ErrorData	edata;

			/* Parse ErrorResponse or NoticeResponse. */
			pq_parse_errornotice(&msg, &edata);

			/*
			 * Limit the maximum error level to ERROR.  We don't want
			 * a FATAL inside the background worker to kill the user
			 * session.
			 */
			if (edata.elevel > ERROR)
				edata.elevel = ERROR;

Why is a FATAL inside the background worker not propagated at the same
level by the launcher process?
If a PANIC in the background worker can kill other backends and restart
the server, then ideally a FATAL in the background worker should also be
treated at the same level by the client backend.

4.
void
pg_background_worker_main(Datum main_arg)
{
..
	responseq = shm_mq_attach(mq, seg, NULL);

	/* Redirect protocol messages to responseq. */
	pq_redirect_to_shm_mq(mq, responseq);

Any error ("unable to map dynamic shared memory segment") before
pq_redirect_to_shm_mq() will not reach the launcher. The launcher's
client will get "ERROR: lost connection to worker process with PID 4020".

I think it is very difficult for the user to know why such an error has
occurred and what to do to proceed. I am not sure if there is any
sensible way to report such an error, but OTOH it seems we should
provide the client with some information regarding what has happened.

5.
Commands not supported in pg_background should produce a proper
error message.

postgres=# select pg_background_launch('copy t1 from stdin');
pg_background_launch
----------------------
4672
(1 row)

postgres=# select * from pg_background_result(4672) as (result TEXT);
WARNING: unknown message type: G (6 bytes)
ERROR: there is no client connection
CONTEXT: COPY t1, line 1: ""

Something similar to what is returned for transaction statements
("transaction control statements are not allowed in pg_background")
would be better.

6.
	/*
	 * Tuples returned by any command other than the last are simply
	 * discarded; but those returned by the last (or only) command are
	 * redirected to the shared memory queue we're using for communication
	 * with the launching backend.  If the launching backend is gone or has
	 * detached us, these messages will just get dropped on the floor.
	 */
	--commands_remaining;
	if (commands_remaining > 0)
		receiver = CreateDestReceiver(DestNone);
	else
	{
		receiver = CreateDestReceiver(DestRemote);
		SetRemoteDestReceiverParams(receiver, portal);
	}

If you have to discard the results of statements other than the last,
why allow multiple statements in the first place?

As in the case below, how will the user identify whether the result is
for the first statement or the second?

postgres=# select pg_background_launch('select count(*) from t1;select count(*) from t1');
pg_background_launch
----------------------
3996
(1 row)

postgres=# select * from pg_background_result(3996) as (result bigint);
result
--------
11
(1 row)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#27Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#26)
1 attachment(s)
Re: pg_background (and more parallelism infrastructure patches)

On Sat, Sep 20, 2014 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, but since there is no predictability about which GUC values the
background worker will use (they can either be the same as what the
launching process had when it launched the worker, or some different
values if settings were later changed due to SIGHUP), it might be good
to clarify this in the pg_background docs (which are not there currently,
but I assume they will eventually be part of this patch).

OK, I will mention that in the documentation when I write it. I
didn't sweat that too much originally because I wasn't sure how much
churn there was going to be in the user-visible API, but so far
everybody seems happy with that, so maybe it's time to go document it.
It's a pretty simple API but, as you say, there are a few details
worth mentioning. I still need some review of the earlier patches in
the series before this really gets urgent, though: so far no one has
commented on #1, #2, #4, or #5, and I'm not entirely sure whether my
revised version of #3 passed muster.

Keeping transaction control (Start/commit) outside the function
execute_sql_string() could lead to EndCommand() message being
sent before transaction end which could be a problem in case
transaction commit fails due to any reason. exec_simple_query() takes
care of the same by calling finish_xact_command() before reporting
command-complete for last parse tree. It even has comment indicating
that we should follow this protocol.

Fixed in the attached version.

Won't CommandCounterIncrement() required after every command like
we do in exec_simple_query()?

Fixed in the attached version.

Whats the need to reverse the order of execution for EndCommand()
and PortalDrop()? Any error in PortalDrop() will lead to wrong
message being sent to client.

Fixed in the attached version.

What is the reason for not logging statements executed via
pg_background, even though it allows to report the sql statement
to various monitoring tools by setting debug_query_string?

I wasn't really sure whether core GUCs should affect the behavior of a
contrib module, and I wasn't excited about duplicating the code.

Isn't it better to add a comment why execute_sql_string() uses
binary format for result?

Done in the attached version.

Sure, please take a call based on what you feel is right here, I
mentioned it because I felt it might be little bit easier for other people
to understand that code.

I played around with this a bit but it didn't seem like it worked out
to a win. There were a bunch of things that had to be passed down
into that function and it didn't seem like it was really reducing
complexity. What I think makes sense is to keep an eye on the
complexity of the handling for each individual message type and move
any handlers that get complex to their own functions.

There can be certain scenarios where the user might like to invoke this
again. Assume the user calls
pg_background_launch('select count(*) from t1') and execution of this
statement via the background worker is going to take a very long time
before returning anything. After some time the user tries to retrieve
data via pg_background_result(), but the call doesn't return as it is
waiting for results, so the user presses cancel; on trying again after
some time, he won't get any data. I think this behaviour is a bit
annoying.

Yep. I don't have a better solution at the moment, but there may be one.

To avoid user to wait for longer, function pg_background_result()
can take an additional parameter where user can specify whether
to WAIT incase results are not available.

That gets complicated. Until any results are available? Until all
results are available? What if we try to read from the queue to find
out if results are available, and the first message in the queue is
long enough that it wraps the queue, so that we have to block and wait
for the background worker to send more data before we can continue?

Why FATAL inside background worker is not propagated at same level by
launcher process?
If PANIC in background worker can kill other backends and restart server
then ideally FATAL in background worker should also be treated at same
level by client backend.

It was initially like that, but then I realized it was silly. If the
background worker hits some error that forces its session to
terminate, there is no need to terminate the user's session too - and
in fact doing so is really annoying, as I rapidly found out while
experimenting with this. Generally a FATAL is something we do because
backend-local state is corrupted to a degree that makes it impractical
to continue, but the fact that that other backend is messed up does
not mean our backend is messed up too.

Any error ("unable to map dynamic shared memory segment") before
pq_redirect_to_shm_mq() will not reach launcher. Launcher client will
get "ERROR: lost connection to worker process with PID 4020".

I think it is very difficult for user to know why such an error has
occurred and what he can do to proceed. I am not sure if there is any
sensible way to report such an error, but OTOH it seems we should
provide some information regarding what has happened to client.

I don't think this is really a fixable problem. There's no way to
communicate an error that happens before you establish communications.
The same problem exists for user connections, but it's not serious in
practice because it's rare. I expect the same to be true here.

postgres=# select * from pg_background_result(4672) as (result TEXT);
WARNING: unknown message type: G (6 bytes)
ERROR: there is no client connection
CONTEXT: COPY t1, line 1: ""

Something similar to what is returned for transaction statements
("transaction control statements are not allowed in pg_background")
would be better.

Fixed in the attached version.

If you have to discard results of statements other than last,
then why at first place you want to allow multiple statements?

You can run statements with side effects, or you can run multiple
utility commands.

Like in below case, how will user identify whether the result is
for first statement or second statement?

By reading the documentation that I will write.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0006-pg_background-Run-commands-in-a-background-worker-an.patch (text/x-patch; charset=US-ASCII)
From c835a06f20792556d35a0eee4c2fa21f5f23e8a3 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 11 Jul 2014 09:53:40 -0400
Subject: [PATCH 6/6] pg_background: Run commands in a background worker, and
 get the results.

The currently-active GUC values from the user session will be copied
to the background worker.  If the command returns a result set, you
can retrieve the result set; if not, you can retrieve the command
tags.  If the command fails with an error, the same error will be
thrown in the launching process when the results are retrieved.
Warnings and other messages generated by the background worker, and
notifications received by it, are also propagated to the foreground
process.
---
 contrib/Makefile                             |    1 +
 contrib/pg_background/Makefile               |   18 +
 contrib/pg_background/pg_background--1.0.sql |   17 +
 contrib/pg_background/pg_background.c        |  919 ++++++++++++++++++++++++++
 contrib/pg_background/pg_background.control  |    4 +
 5 files changed, 959 insertions(+)
 create mode 100644 contrib/pg_background/Makefile
 create mode 100644 contrib/pg_background/pg_background--1.0.sql
 create mode 100644 contrib/pg_background/pg_background.c
 create mode 100644 contrib/pg_background/pg_background.control

diff --git a/contrib/Makefile b/contrib/Makefile
index b37d0dd..11d6116 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
 		pageinspect	\
 		passwordcheck	\
 		pg_archivecleanup \
+		pg_background \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
diff --git a/contrib/pg_background/Makefile b/contrib/pg_background/Makefile
new file mode 100644
index 0000000..c4e717d
--- /dev/null
+++ b/contrib/pg_background/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_background/Makefile
+
+MODULE_big = pg_background
+OBJS = pg_background.o
+
+EXTENSION = pg_background
+DATA = pg_background--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_background
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_background/pg_background--1.0.sql b/contrib/pg_background/pg_background--1.0.sql
new file mode 100644
index 0000000..5fa5ddb
--- /dev/null
+++ b/contrib/pg_background/pg_background--1.0.sql
@@ -0,0 +1,17 @@
+/* contrib/pg_background/pg_background--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_background" to load this file. \quit
+
+CREATE FUNCTION pg_background_launch(sql pg_catalog.text,
+					   queue_size pg_catalog.int4 DEFAULT 65536)
+    RETURNS pg_catalog.int4 STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_result(pid pg_catalog.int4)
+    RETURNS SETOF pg_catalog.record STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_detach(pid pg_catalog.int4)
+    RETURNS pg_catalog.void STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/contrib/pg_background/pg_background.c b/contrib/pg_background/pg_background.c
new file mode 100644
index 0000000..19d139b
--- /dev/null
+++ b/contrib/pg_background/pg_background.c
@@ -0,0 +1,919 @@
+/*--------------------------------------------------------------------------
+ *
+ * pg_background.c
+ *		Run SQL commands using a background worker.
+ *
+ * Copyright (C) 2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pg_background/pg_background.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/printtup.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "commands/async.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
+#include "miscadmin.h"
+#include "parser/analyze.h"
+#include "pgstat.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/shm_mq.h"
+#include "storage/shm_toc.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/snapmgr.h"
+#include "utils/timeout.h"
+
+/* Table-of-contents constants for our dynamic shared memory segment. */
+#define PG_BACKGROUND_MAGIC				0x50674267
+#define PG_BACKGROUND_KEY_FIXED_DATA	0
+#define PG_BACKGROUND_KEY_SQL			1
+#define PG_BACKGROUND_KEY_GUC			2
+#define PG_BACKGROUND_KEY_QUEUE			3
+#define PG_BACKGROUND_NKEYS				4
+
+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct pg_background_fixed_data
+{
+	Oid	database_id;
+	Oid	authenticated_user_id;
+	Oid	current_user_id;
+	int	sec_context;
+	char database[NAMEDATALEN];
+	char authenticated_user[NAMEDATALEN];
+} pg_background_fixed_data;
+
+/* Private state maintained by the launching backend for IPC. */
+typedef struct pg_background_worker_info
+{
+	pid_t		pid;
+	dsm_segment *seg;
+	BackgroundWorkerHandle *handle;	
+	shm_mq_handle *responseq;	
+	bool		consumed;
+} pg_background_worker_info;
+
+/* Private state maintained across calls to pg_background_result. */
+typedef struct pg_background_result_state
+{
+	pg_background_worker_info *info;
+	FmgrInfo   *receive_functions;
+	Oid		   *typioparams;
+	bool		has_row_description;
+	List	   *command_tags;
+	bool		complete;
+} pg_background_result_state;
+
+static HTAB *worker_hash;
+
+static void cleanup_worker_info(dsm_segment *, Datum pid_datum);
+static pg_background_worker_info *find_worker_info(pid_t pid);
+static void save_worker_info(pid_t pid, dsm_segment *seg,
+				 BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq);
+
+static HeapTuple form_result_tuple(pg_background_result_state *state,
+								   TupleDesc tupdesc, StringInfo msg);
+
+static void handle_sigterm(SIGNAL_ARGS);
+static void execute_sql_string(const char *sql);
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(pg_background_launch);
+PG_FUNCTION_INFO_V1(pg_background_result);
+PG_FUNCTION_INFO_V1(pg_background_detach);
+
+void pg_background_worker_main(Datum);
+
+/*
+ * Start a dynamic background worker to run a user-specified SQL command.
+ */
+Datum
+pg_background_launch(PG_FUNCTION_ARGS)
+{
+	text	   *sql = PG_GETARG_TEXT_PP(0);
	int32		queue_size = PG_GETARG_INT32(1);
+	int32		sql_len = VARSIZE_ANY_EXHDR(sql);
+	Size		guc_len;
+	Size		segsize;
+	dsm_segment *seg;
+	shm_toc_estimator e;
+	shm_toc    *toc;
	char	   *sqlp;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *worker_handle;
+	pg_background_fixed_data *fdata;
+	pid_t		pid;
+	shm_mq_handle *responseq;
+	MemoryContext	oldcontext;
+
+	/* Ensure a valid queue size. */
+	if (queue_size < 0 || ((uint64) queue_size) < shm_mq_minimum_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("queue size must be at least %zu bytes",
+						shm_mq_minimum_size)));
+
+	/* Create dynamic shared memory and table of contents. */
+	shm_toc_initialize_estimator(&e);
+	shm_toc_estimate_chunk(&e, sizeof(pg_background_fixed_data));
+	shm_toc_estimate_chunk(&e, sql_len + 1);
+	guc_len = EstimateGUCStateSpace();
+	shm_toc_estimate_chunk(&e, guc_len);
+	shm_toc_estimate_chunk(&e, (Size) queue_size);
+	shm_toc_estimate_keys(&e, PG_BACKGROUND_NKEYS);
+	segsize = shm_toc_estimate(&e);
+	seg = dsm_create(segsize);
+	toc = shm_toc_create(PG_BACKGROUND_MAGIC, dsm_segment_address(seg),
+						 segsize);
+
+	/* Store fixed-size data in dynamic shared memory. */
+	fdata = shm_toc_allocate(toc, sizeof(pg_background_fixed_data));
+	fdata->database_id = MyDatabaseId;
+	fdata->authenticated_user_id = GetAuthenticatedUserId();
+	GetUserIdAndSecContext(&fdata->current_user_id, &fdata->sec_context);
+	strlcpy(fdata->database, get_database_name(MyDatabaseId), NAMEDATALEN);
+	strlcpy(fdata->authenticated_user,
+			GetUserNameFromId(fdata->authenticated_user_id), NAMEDATALEN);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_FIXED_DATA, fdata);
+
+	/* Store SQL query in dynamic shared memory. */
+	sqlp = shm_toc_allocate(toc, sql_len + 1);
+	memcpy(sqlp, VARDATA(sql), sql_len);
+	sqlp[sql_len] = '\0';
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_SQL, sqlp);
+
+	/* Store GUC state in dynamic shared memory. */
+	gucstate = shm_toc_allocate(toc, guc_len);
+	SerializeGUCState(guc_len, gucstate);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_GUC, gucstate);
+
+	/* Establish message queue in dynamic shared memory. */
+	mq = shm_mq_create(shm_toc_allocate(toc, (Size) queue_size),
+					   (Size) queue_size);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_QUEUE, mq);
+	shm_mq_set_receiver(mq, MyProc);
+
+	/*
+	 * Attach the queue before launching a worker, so that we'll automatically
+	 * detach the queue if we error out.  (Otherwise, the worker might sit
+	 * there trying to write the queue long after we've gone away.)
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	responseq = shm_mq_attach(mq, seg, NULL);
+	MemoryContextSwitchTo(oldcontext);
+
+	/* Configure a worker. */
+	worker.bgw_flags =
+		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	worker.bgw_main = NULL;		/* new worker might not have library loaded */
+	sprintf(worker.bgw_library_name, "pg_background");
+	sprintf(worker.bgw_function_name, "pg_background_worker_main");
+	snprintf(worker.bgw_name, BGW_MAXLEN,
+			 "pg_background by PID %d", MyProcPid);
+	worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
+	/* set bgw_notify_pid, so we can detect if the worker stops */
+	worker.bgw_notify_pid = MyProcPid;
+
+	/*
+	 * Register the worker.
+	 *
+	 * We switch contexts so that the background worker handle can outlast
+	 * this transaction.
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	if (!RegisterDynamicBackgroundWorker(&worker, &worker_handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not register background process"),
+			 errhint("You may need to increase max_worker_processes.")));
+	MemoryContextSwitchTo(oldcontext);
+	shm_mq_set_handle(responseq, worker_handle);
+
+	/* Wait for the worker to start. */
+	switch (WaitForBackgroundWorkerStartup(worker_handle, &pid))
+	{
+		case BGWH_STARTED:
+			/* Success. */
+			break;
+		case BGWH_STOPPED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("could not start background process"),
+					 errhint("More details may be available in the server log.")));
+			break;
		case BGWH_POSTMASTER_DIED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("cannot start background processes without postmaster"),
+					 errhint("Kill all remaining database processes and restart the database.")));
+			break;
+		default:
+			elog(ERROR, "unexpected bgworker handle status");
+			break;
+	}
+
+	/* Store the relevant details about this worker for future use. */
+	save_worker_info(pid, seg, worker_handle, responseq);
+
+	/*
+	 * Now that the worker info is saved, we do not need to, and should not,
+	 * automatically detach the segment at resource-owner cleanup time.
+	 */
+	dsm_keep_mapping(seg);
+
+	/* Return the worker's PID. */
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * Retrieve the results of a background query previously launched in this
+ * session.
+ */
+Datum
+pg_background_result(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	shm_mq_result	res;
+	FuncCallContext *funcctx;
+	TupleDesc	tupdesc;
+	StringInfoData	msg;
+	pg_background_result_state *state;
+
+	/* First-time setup. */
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+		pg_background_worker_info *info;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* See if we have a connection to the specified PID. */
+		if ((info = find_worker_info(pid)) == NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("PID %d is not attached to this session", pid)));
+
+		/* Can't read results twice. */
+		if (info->consumed)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("results for PID %d have already been consumed", pid)));
+		info->consumed = true;		
+
+		/*
+		 * Whether we succeed or fail, a future invocation of this function
+		 * may not try to read from the DSM once we've begun to do so.
+		 * Accordingly, make arrangements to clean things up at end of query.
+		 */
+		dsm_unkeep_mapping(info->seg);
+
+		/* Set up tuple-descriptor based on column definition list. */
+		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record"),
+					 errhint("Try calling the function in the FROM clause "
+							 "using a column definition list.")));
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* Cache state that will be needed on every call. */
+		state = palloc0(sizeof(pg_background_result_state));
+		state->info = info;
+		if (funcctx->tuple_desc->natts > 0)
+		{
+			int	natts = funcctx->tuple_desc->natts;
+			int	i;
+
+			state->receive_functions = palloc(sizeof(FmgrInfo) * natts);
+			state->typioparams = palloc(sizeof(Oid) * natts);
+
+			for (i = 0;	i < natts; ++i)
+			{
+				Oid	receive_function_id;
+
+				getTypeBinaryInputInfo(funcctx->tuple_desc->attrs[i]->atttypid,
+									   &receive_function_id,
+									   &state->typioparams[i]);
+				fmgr_info(receive_function_id, &state->receive_functions[i]);
+			}
+		}
+		funcctx->user_fctx = state;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+	funcctx = SRF_PERCALL_SETUP();
+	tupdesc = funcctx->tuple_desc;
+	state = funcctx->user_fctx;
+
+	/* Initialize message buffer. */
+	initStringInfo(&msg);
+
+	/* Read and process messages from the shared memory queue. */
+	for (;;)
+	{
+		char		msgtype;
+		Size		nbytes;
+		void	   *data;
+
+		/* Get next message. */
+		res = shm_mq_receive(state->info->responseq, &nbytes, &data, false);
+		if (res != SHM_MQ_SUCCESS)
+			break;
+
+		/*
+		 * Message-parsing routines operate on a null-terminated StringInfo,
+		 * so we must construct one.
+		 */
+		resetStringInfo(&msg);
+		enlargeStringInfo(&msg, nbytes);
+		msg.len = nbytes;
+		memcpy(msg.data, data, nbytes);
+		msg.data[nbytes] = '\0';
+		msgtype = pq_getmsgbyte(&msg);
+
+		/* Dispatch on message type. */
+		switch (msgtype)
+		{
+			case 'E':
+			case 'N':
+				{
+					ErrorData	edata;
+
+					/* Parse ErrorResponse or NoticeResponse. */
+					pq_parse_errornotice(&msg, &edata);
+
+					/*
+					 * Limit the maximum error level to ERROR.  We don't want
+					 * a FATAL inside the background worker to kill the user
+					 * session.
+					 */
+					if (edata.elevel > ERROR)
+						edata.elevel = ERROR;
+
+					/* Rethrow the error. */
+					ThrowErrorData(&edata);	
+					break;
+				}
+			case 'A':
+				{
+					/* Propagate NotifyResponse. */
+					pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+					break;
+				}
+			case 'T':
+				{
+					int16	natts = pq_getmsgint(&msg, 2);
+					int16	i;
+
+					if (state->has_row_description)
+						elog(ERROR, "multiple RowDescription messages");
+					state->has_row_description = true;
+   					if (natts != tupdesc->natts)
+						ereport(ERROR,
+								(errcode(ERRCODE_DATATYPE_MISMATCH),
+								 errmsg("remote query result rowtype does not match "
+									"the specified FROM clause rowtype")));
+
+					for (i = 0; i < natts; ++i)
+					{
+						Oid		type_id;
+
+						(void) pq_getmsgstring(&msg);	/* name */
+						(void) pq_getmsgint(&msg, 4);	/* table OID */
+						(void) pq_getmsgint(&msg, 2);	/* table attnum */
+						type_id = pq_getmsgint(&msg, 4);	/* type OID */
+						(void) pq_getmsgint(&msg, 2);	/* type length */
+						(void) pq_getmsgint(&msg, 4);	/* typmod */
+						(void) pq_getmsgint(&msg, 2);	/* format code */
+
+						if (type_id != tupdesc->attrs[i]->atttypid)
+							ereport(ERROR,
+									(errcode(ERRCODE_DATATYPE_MISMATCH),
+									 errmsg("remote query result rowtype does not match "
+										"the specified FROM clause rowtype")));
+					}
+
+					pq_getmsgend(&msg);
+
+					break;
+				}
+			case 'D':
+				{
+					/* Handle DataRow message. */
+					HeapTuple	result;
+
+					result = form_result_tuple(state, tupdesc, &msg);
+					SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+				}
+			case 'C':
+				{
+					/* Handle CommandComplete message. */
+					MemoryContext	oldcontext;
+					const char  *tag = pq_getmsgstring(&msg);
+
+					oldcontext =
+						MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+					state->command_tags = lappend(state->command_tags,
+												  pstrdup(tag));
+					MemoryContextSwitchTo(oldcontext);
+					break;
+				}
+			case 'Z':
+				{
+					/* Handle ReadyForQuery message. */
+					state->complete = true;
+					break;
+				}
+			default:
+				elog(WARNING, "unknown message type: %c (%zu bytes)",
+					 msg.data[0], nbytes);
+				break;
+		}
+	}
+
+	/* Check whether the connection was broken prematurely. */
+	if (!state->complete)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONNECTION_FAILURE),
+				 errmsg("lost connection to worker process with PID %d",
+					pid)));
+
+	/* If no data rows, return the command tags instead. */
+	if (!state->has_row_description)
+	{
+		if (tupdesc->natts != 1 || tupdesc->attrs[0]->atttypid != TEXTOID)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query did not return a result set, but "
+							"result rowtype is not a single text column")));
+		if (state->command_tags != NIL)
+		{
+			char *tag = linitial(state->command_tags);
+			Datum	value;
+			bool	isnull;
+			HeapTuple	result;
+
+			state->command_tags = list_delete_first(state->command_tags);
+			value = PointerGetDatum(cstring_to_text(tag));
+			isnull = false;
+			result = heap_form_tuple(tupdesc, &value, &isnull);
+			SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+		}			
+	}
+
+	/* We're done! */
+	dsm_detach(state->info->seg);
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Parse a DataRow message and form a result tuple.
+ */
+static HeapTuple
+form_result_tuple(pg_background_result_state *state, TupleDesc tupdesc,
+				  StringInfo msg)
+{
+	/* Handle DataRow message. */
+	int16	natts = pq_getmsgint(msg, 2);
+	int16	i;
+	Datum  *values = NULL;
+	bool   *isnull = NULL;
+	StringInfoData	buf;
+
+	if (!state->has_row_description)
+		elog(ERROR, "DataRow not preceded by RowDescription");
+	if (natts != tupdesc->natts)
+		elog(ERROR, "malformed DataRow");
+	if (natts > 0)
+	{
+		values = palloc(natts * sizeof(Datum));
+		isnull = palloc(natts * sizeof(bool));
+	}
+	initStringInfo(&buf);
+
+	for (i = 0; i < natts; ++i)
+	{
+		int32	bytes = pq_getmsgint(msg, 4);
+
+		if (bytes < 0)
+		{
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											NULL,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = true;
+		}
+		else
+		{
+			resetStringInfo(&buf);
+			appendBinaryStringInfo(&buf, pq_getmsgbytes(msg, bytes), bytes);
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											&buf,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = false;
+		}
+	}
+
+	pq_getmsgend(msg);
+
+	return heap_form_tuple(tupdesc, values, isnull);
+}
+
+/*
+ * Detach from the dynamic shared memory segment used for communication with
+ * a background worker.  This prevents the worker from stalling waiting for
+ * us to read its results.
+ */
+Datum
+pg_background_detach(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	pg_background_worker_info *info;
+
+	info = find_worker_info(pid);
+	if (info == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("PID %d is not attached to this session", pid)));
+	dsm_detach(info->seg);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * When the dynamic shared memory segment associated with a worker is
+ * cleaned up, we need to clean up our associated private data structures.
+ */
+static void
+cleanup_worker_info(dsm_segment *seg, Datum pid_datum)
+{
+	pid_t	pid = DatumGetInt32(pid_datum);
+	bool	found;
+	pg_background_worker_info *info;
+
+	/* Find any worker info entry for this PID.  If none, we're done. */
+	if ((info = find_worker_info(pid)) == NULL)
+		return;
+
+	/* Free memory used by the BackgroundWorkerHandle. */
+	if (info->handle != NULL)
+	{
+		pfree(info->handle);
+		info->handle = NULL;
+	}
+
+	/* Remove the hashtable entry. */
+	hash_search(worker_hash, (void *) &pid, HASH_REMOVE, &found);
+	if (!found)
+		elog(ERROR, "pg_background worker_hash table corrupted");
+}
+
+/*
+ * Find the background worker information for the worker with a given PID.
+ */
+static pg_background_worker_info *
+find_worker_info(pid_t pid)
+{
+	pg_background_worker_info *info = NULL;
+
+	if (worker_hash != NULL)
+		info = hash_search(worker_hash, (void *) &pid, HASH_FIND, NULL);
+
+	return info;
+}
+
+/*
+ * Save worker information for future IPC.
+ */
+static void
+save_worker_info(pid_t pid, dsm_segment *seg, BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq)
+{
+	pg_background_worker_info *info;
+
+	/* If the hash table hasn't been set up yet, do that now. */
+	if (worker_hash == NULL)
+	{
+		HASHCTL	ctl;
+
+		ctl.keysize = sizeof(pid_t);
+		ctl.entrysize = sizeof(pg_background_worker_info);
+		worker_hash = hash_create("pg_background worker_hash", 8, &ctl,
+								  HASH_ELEM);
+	}
+
+	/* Detach any older worker with this PID. */
+	if ((info = find_worker_info(pid)) != NULL)
+		dsm_detach(info->seg);
+
+	/* When the DSM is unmapped, clean everything up. */
+	on_dsm_detach(seg, cleanup_worker_info, Int32GetDatum(pid));
+
+	/* Create a new entry for this worker. */
+	info = hash_search(worker_hash, (void *) &pid, HASH_ENTER, NULL);
+	info->seg = seg;
+	info->handle = handle;
+	info->responseq = responseq;
+	info->consumed = false;
+}
+
+/*
+ * Background worker entrypoint.
+ */
+void
+pg_background_worker_main(Datum main_arg)
+{
+	dsm_segment *seg;
+	shm_toc    *toc;
+	pg_background_fixed_data *fdata;
+	char	   *sql;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	shm_mq_handle *responseq;
+
+	/* Establish signal handlers. */
+	pqsignal(SIGTERM, handle_sigterm);
+	BackgroundWorkerUnblockSignals();
+
+	/* Set up a memory context and resource owner. */
+	Assert(CurrentResourceOwner == NULL);
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "pg_background");
+	CurrentMemoryContext = AllocSetContextCreate(TopMemoryContext,
+												 "pg_background session",
+												 ALLOCSET_DEFAULT_MINSIZE,
+												 ALLOCSET_DEFAULT_INITSIZE,
+												 ALLOCSET_DEFAULT_MAXSIZE);
+
+	/* Connect to the dynamic shared memory segment. */
+	seg = dsm_attach(DatumGetInt32(main_arg));
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("unable to map dynamic shared memory segment")));
+	toc = shm_toc_attach(PG_BACKGROUND_MAGIC, dsm_segment_address(seg));
+	if (toc == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			   errmsg("bad magic number in dynamic shared memory segment")));
+
+	/* Find data structures in dynamic shared memory. */
+	fdata = shm_toc_lookup(toc, PG_BACKGROUND_KEY_FIXED_DATA);
+	sql = shm_toc_lookup(toc, PG_BACKGROUND_KEY_SQL);
+	gucstate = shm_toc_lookup(toc, PG_BACKGROUND_KEY_GUC);
+	mq = shm_toc_lookup(toc, PG_BACKGROUND_KEY_QUEUE);
+	shm_mq_set_sender(mq, MyProc);
+	responseq = shm_mq_attach(mq, seg, NULL);
+
+	/* Redirect protocol messages to responseq. */
+	pq_redirect_to_shm_mq(mq, responseq);
+
+	/*
+	 * Initialize our user and database ID based on the strings version of
+	 * the data, and then go back and check that we actually got the database
+	 * and user ID that we intended to get.  We do this because it's not
+	 * impossible for the process that started us to die before we get here,
+	 * and the user or database could be renamed in the meantime.  We don't
+	 * want to latch on the wrong object by accident.  There should probably
+	 * be a variant of BackgroundWorkerInitializeConnection that accepts OIDs
+	 * rather than strings.
+	 */
+	BackgroundWorkerInitializeConnection(fdata->database,
+										 fdata->authenticated_user);
+	if (fdata->database_id != MyDatabaseId ||
+		fdata->authenticated_user_id != GetAuthenticatedUserId())
+		ereport(ERROR,
+			(errmsg("user or database renamed during pg_background startup")));
+
+	/* Restore GUC values from launching backend. */
+	StartTransactionCommand();
+	RestoreGUCState(gucstate);
+	CommitTransactionCommand();
+
+	/* Restore user ID and security context. */
+	SetUserIdAndSecContext(fdata->current_user_id, fdata->sec_context);
+
+	/* Prepare to execute the query. */
+	SetCurrentStatementStartTimestamp();
+	debug_query_string = sql;
+	pgstat_report_activity(STATE_RUNNING, sql);
+	StartTransactionCommand();
+	if (StatementTimeout > 0)
+		enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
+	else
+		disable_timeout(STATEMENT_TIMEOUT, false);
+
+	/* Execute the query. */
+	execute_sql_string(sql);
+
+	/* Post-execution cleanup. */
+	disable_timeout(STATEMENT_TIMEOUT, false);
+	CommitTransactionCommand();
+	ProcessCompletedNotifies();
+	pgstat_report_activity(STATE_IDLE, sql);
+	pgstat_report_stat(true);
+
+	/* Signal that we are done. */
+	ReadyForQuery(DestRemote);
+}
+
+/*
+ * Execute given SQL string.
+ *
+ * Using SPI here would preclude backgrounding commands like VACUUM which one
+ * might very well wish to launch in the background.  So we do this instead.
+ */
+static void
+execute_sql_string(const char *sql)
+{
+	List	   *raw_parsetree_list;
+	ListCell   *lc1;
+	bool		isTopLevel;
+	int			commands_remaining;
+	MemoryContext	parsecontext;
+	MemoryContext	oldcontext;
+
+	/*
+	 * Parse the SQL string into a list of raw parse trees.
+	 *
+	 * Because we allow statements that perform internal transaction control,
+	 * we can't do this in TopTransactionContext; the parse trees might get
+	 * blown away before we're done executing them.
+	 */
+	parsecontext = AllocSetContextCreate(TopMemoryContext,
+										 "pg_background parse/plan",
+										 ALLOCSET_DEFAULT_MINSIZE,
+										 ALLOCSET_DEFAULT_INITSIZE,
+										 ALLOCSET_DEFAULT_MAXSIZE);
+	oldcontext = MemoryContextSwitchTo(parsecontext);
+	raw_parsetree_list = pg_parse_query(sql);
+	commands_remaining = list_length(raw_parsetree_list);
+	isTopLevel = commands_remaining == 1;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * Do parse analysis, rule rewrite, planning, and execution for each raw
+	 * parsetree.  We must fully execute each query before beginning parse
+	 * analysis on the next one, since there may be interdependencies.
+	 */
+	foreach(lc1, raw_parsetree_list)
+	{
+		Node	   *parsetree = (Node *) lfirst(lc1);
+        const char *commandTag;
+        char        completionTag[COMPLETION_TAG_BUFSIZE];
+        List       *querytree_list,
+                   *plantree_list;
+		bool		snapshot_set = false;
+		Portal		portal;
+		DestReceiver *receiver;
+		int16		format = 1;
+
+		/*
+		 * We don't allow transaction-control commands like COMMIT and ABORT
+		 * here.  The entire SQL statement is executed as a single transaction
+		 * which commits if no errors are encountered.
+		 */
+		if (IsA(parsetree, TransactionStmt))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("transaction control statements are not allowed in pg_background")));
+
+        /*
+         * Get the command name for use in status display (it also becomes the
+         * default completion tag, down inside PortalRun).  Set ps_status and
+         * do any special start-of-SQL-command processing needed by the
+         * destination.
+         */
+        commandTag = CreateCommandTag(parsetree);
+        set_ps_display(commandTag, false);
+        BeginCommand(commandTag, DestNone);
+
+		/* Set up a snapshot if parse analysis/planning will need one. */
+		if (analyze_requires_snapshot(parsetree))
+		{
+			PushActiveSnapshot(GetTransactionSnapshot());
+			snapshot_set = true;
+		}
+
+		/*
+		 * OK to analyze, rewrite, and plan this query.
+		 *
+		 * As with parsing, we need to make sure this data outlives the
+		 * transaction, because of the possibility that the statement might
+		 * perform internal transaction control.
+		 */
+		oldcontext = MemoryContextSwitchTo(parsecontext);
+		querytree_list = pg_analyze_and_rewrite(parsetree, sql, NULL, 0);
+		plantree_list = pg_plan_queries(querytree_list, 0, NULL);
+
+		/* Done with the snapshot used for parsing/planning */
+        if (snapshot_set)
+            PopActiveSnapshot();
+
+        /* If we got a cancel signal in analysis or planning, quit */
+        CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Execute the query using the unnamed portal.
+		 */
+        portal = CreatePortal("", true, true);
+        /* Don't display the portal in pg_cursors */
+        portal->visible = false;
+		PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
+		PortalStart(portal, NULL, 0, InvalidSnapshot);
+		PortalSetResultFormat(portal, 1, &format);		/* binary format */
+
+		/*
+		 * Tuples returned by any command other than the last are simply
+		 * discarded; but those returned by the last (or only) command are
+		 * redirected to the shared memory queue we're using for communication
+		 * with the launching backend. If the launching backend is gone or has
+		 * detached us, these messages will just get dropped on the floor.
+		 */
+		--commands_remaining;
+		if (commands_remaining > 0)
+			receiver = CreateDestReceiver(DestNone);
+		else
+		{
+			receiver = CreateDestReceiver(DestRemote);
+			SetRemoteDestReceiverParams(receiver, portal);
+		}
+
+		/*
+		 * Only once the portal and destreceiver have been established can
+		 * we return to the transaction context.  All that stuff needs to
+		 * survive an internal commit inside PortalRun!
+		 */
+		MemoryContextSwitchTo(oldcontext);
+
+		/* Here's where we actually execute the command. */
+		(void) PortalRun(portal, FETCH_ALL, isTopLevel, receiver, receiver,
+                         completionTag);
+
+		/* Clean up the receiver. */
+		(*receiver->rDestroy) (receiver);
+
+		/*
+		 * Send a CommandComplete message even if we suppressed the query
+		 * results.  The user backend will report these in the absence of
+		 * any true query results.
+		 */
+		EndCommand(completionTag, DestRemote);
+
+		/* Clean up the portal. */
+		PortalDrop(portal, false);
+	}
+
+	/* Be sure to advance the command counter after the last script command */
+	CommandCounterIncrement();
+}
+
+/*
+ * When we receive a SIGTERM, we set InterruptPending and ProcDiePending just
+ * like a normal backend.  The next CHECK_FOR_INTERRUPTS() will do the right
+ * thing.
+ */
+static void
+handle_sigterm(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	if (!proc_exit_inprogress)
+	{
+		InterruptPending = true;
+		ProcDiePending = true;
+	}
+
+	errno = save_errno;
+}
diff --git a/contrib/pg_background/pg_background.control b/contrib/pg_background/pg_background.control
new file mode 100644
index 0000000..733d0e1
--- /dev/null
+++ b/contrib/pg_background/pg_background.control
@@ -0,0 +1,4 @@
+comment = 'Run SQL queries in the background'
+default_version = '1.0'
+module_pathname = '$libdir/pg_background'
+relocatable = true
-- 
1.7.9.6 (Apple Git-31.1)

#28Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#1)
1 attachment(s)
Re: pg_background (and more parallelism infrastructure patches)

Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:

Attached is a contrib module that lets you launch arbitrary commands in
a background worker, and supporting infrastructure patches for core.

Very cool! Started looking into this while waiting on a few
CLOBBER_CACHE_ALWAYS runs to finish (ugh...).

Perhaps I'm just being a bit over the top, but all this per-character
work feels a bit ridiculous.. When we're using MAXIMUM_ALIGNOF, I
suppose it's not so bad, but is there no hope to increase that and make
this whole process more efficient? Just a thought.

After reading through the code for 0001, I decided to actually take it
out for a spin- see attached. I then passed a few megabytes of data
through it and it seemed to work just fine.

In general, I'm quite excited about this capability and will be looking
over the later patches also. I also prefer the function-pointer based
approach which was taken up in later versions to the hook-based approach
in the initial patches, so glad to see things going in that direction.
Lastly, I will say that I feel it'd be good to support bi-directional
communication as I think it'll be needed eventually, but I'm not sure
that has to happen now.

Thanks!

Stephen

Attachments:

test-shm_mq_sendv.c
#29Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#28)
Re: pg_background (and more parallelism infrastructure patches)

On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost <sfrost@snowman.net> wrote:

Perhaps I'm just being a bit over the top, but all this per-character
work feels a bit ridiculous.. When we're using MAXIMUM_ALIGNOF, I
suppose it's not so bad, but is there no hope to increase that and make
this whole process more efficient? Just a thought.

I'm not sure I understand what you're getting at here.

After reading through the code for 0001, I decided to actually take it
out for a spin- see attached. I then passed a few megabytes of data
through it and it seemed to work just fine.

That's good.

In general, I'm quite excited about this capability and will be looking
over the later patches also. I also prefer the function-pointer based
approach which was taken up in later versions to the hook-based approach
in the initial patches, so glad to see things going in that direction.

OK.

Lastly, I will say that I feel it'd be good to support bi-directional
communication as I think it'll be needed eventually, but I'm not sure
that has to happen now.

I agree we need bidirectional communication; I just don't agree that
the other direction should use the libpq format. The data going from
the worker to the process that launched it is stuff like errors and
tuples, for which we already have a wire format. The data going in
the other direction is going to be things like plan trees to be
executed, for which we don't. But if we can defer the issue, so much
the better. Things will become clearer as we get closer to being
done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#29)
Re: pg_background (and more parallelism infrastructure patches)

* Robert Haas (robertmhaas@gmail.com) wrote:

On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost <sfrost@snowman.net> wrote:

Perhaps I'm just being a bit over the top, but all this per-character
work feels a bit ridiculous.. When we're using MAXIMUM_ALIGNOF, I
suppose it's not so bad, but is there no hope to increase that and make
this whole process more efficient? Just a thought.

I'm not sure I understand what you're getting at here.

Was just thinking that we might be able to work out what needs to be
done without having to actually do it on a per-character basis. That
said, I'm not sure it's really worth the effort given that we're talking
about at most 8 bytes currently. I had originally been thinking that we
might increase the minimum size as it might make things more efficient,
but it's not clear that'd really end up being the case either and,
regardless, it's probably not worth worrying about at this point.

After reading through the code for 0001, I decided to actually take it
out for a spin- see attached. I then passed a few megabytes of data
through it and it seemed to work just fine.

That's good.

Just to note this- the testing which I did used a random number of
segments and random amounts of data with each and hit specific edge
cases and all looked good.

Lastly, I will say that I feel it'd be good to support bi-directional
communication as I think it'll be needed eventually, but I'm not sure
that has to happen now.

I agree we need bidirectional communication; I just don't agree that
the other direction should use the libpq format. The data going from
the worker to the process that launched it is stuff like errors and
tuples, for which we already have a wire format. The data going in
the other direction is going to be things like plan trees to be
executed, for which we don't. But if we can defer the issue, so much
the better. Things will become clearer as we get closer to being
done.

Sounds good to me.

Thanks,

Stephen

#31Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#30)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Oct 1, 2014 at 4:56 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:

On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost <sfrost@snowman.net> wrote:

Perhaps I'm just being a bit over the top, but all this per-character
work feels a bit ridiculous.. When we're using MAXIMUM_ALIGNOF, I
suppose it's not so bad, but is there no hope to increase that and make
this whole process more efficient? Just a thought.

I'm not sure I understand what you're getting at here.

Was just thinking that we might be able to work out what needs to be
done without having to actually do it on a per-character basis. That
said, I'm not sure it's really worth the effort given that we're talking
about at most 8 bytes currently. I had originally been thinking that we
might increase the minimum size as it might make things more efficient,
but it's not clear that'd really end up being the case either and,
regardless, it's probably not worth worrying about at this point.

I'm still not entirely sure we're on the same page. Most of the data
movement for shm_mq is done via memcpy(), which I think is about as
efficient as it gets. The detailed character-by-character handling
only really comes up when shm_mq_send() is passed multiple chunks with
lengths that are not a multiple of MAXIMUM_ALIGNOF. Then we have to
fiddle a bit more. So making MAXIMUM_ALIGNOF bigger would actually
cause us to do more fiddling, not less.

When I originally designed this queue, I had the idea in mind that
life would be simpler if the queue head and tail pointers always
advanced in multiples of MAXIMUM_ALIGNOF. That didn't really work out
as well as I'd hoped; maybe I would have been better off having the
queue pack everything in tightly and ignore alignment. However, there
is some possible efficiency advantage of the present system: when a
message fits in the queue without wrapping, shm_mq_receive() returns a
pointer to the message, and the caller can assume that message is
properly aligned. If the queue didn't maintain alignment internally,
you'd need to do a copy there before accessing anything inside the
message that might be aligned. Granted, we don't have any code that
takes advantage of that yet, at least not in core, but the potential
is there. Still, I may have made the wrong call. But, I don't think
it's the job of this patch to rework the whole framework; we can do that
in the future after this has a few more users and the pros and cons of
various approaches are (hopefully) more clear. It's not a complex
API, so substituting a different implementation later on probably
wouldn't be too hard.

Anyway, it sounds like you're on board with me committing the first
patch of the series, so I'll do that next week absent objections.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#32Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#31)
Re: pg_background (and more parallelism infrastructure patches)

Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Wed, Oct 1, 2014 at 4:56 PM, Stephen Frost <sfrost@snowman.net> wrote:

Was just thinking that we might be able to work out what needs to be
done without having to actually do it on a per-character basis. That
said, I'm not sure it's really worth the effort given that we're talking
about at most 8 bytes currently. I had originally been thinking that we
might increase the minimum size as it might make things more efficient,
but it's not clear that'd really end up being the case either and,
regardless, it's probably not worth worrying about at this point.

I'm still not entirely sure we're on the same page. Most of the data
movement for shm_mq is done via memcpy(), which I think is about as
efficient as it gets.

Right- agreed. I had originally thought we were doing things on a
per-MAXIMUM_ALIGNOF-basis somewhere else, but that appears to be an
incorrect assumption (which I'm glad for).

The detailed character-by-character handling
only really comes up when shm_mq_send() is passed multiple chunks with
lengths that are not a multiple of MAXIMUM_ALIGNOF. Then we have to
fiddle a bit more. So making MAXIMUM_ALIGNOF bigger would actually
cause us to do more fiddling, not less.

Sorry- those were two independent items. Regarding the per-character
work, I was thinking we could work out the number of characters which
need to be moved and then use memcpy directly rather than doing the
per-character work, but as noted, we shouldn't be going through that
loop very often or for very many iterations anyway, and we have to deal
with moving forward through the iovs, so we'd still have to have a loop
there regardless.

When I originally designed this queue, I had the idea in mind that
life would be simpler if the queue head and tail pointers always
advanced in multiples of MAXIMUM_ALIGNOF. That didn't really work out
as well as I'd hoped; maybe I would have been better off having the
queue pack everything in tightly and ignore alignment. However, there
is some possible efficiency advantage of the present system: when a
message fits in the queue without wrapping, shm_mq_receive() returns a
pointer to the message, and the caller can assume that message is
properly aligned. If the queue didn't maintain alignment internally,
you'd need to do a copy there before accessing anything inside the
message that might be aligned. Granted, we don't have any code that
takes advantage of that yet, at least not in core, but the potential
is there. Still, I may have made the wrong call. But, I don't think
it's the job of this patch to rework the whole framework; we can do that
in the future after this has a few more users and the pros and cons of
various approaches are (hopefully) more clear. It's not a complex
API, so substituting a different implementation later on probably
wouldn't be too hard.

Agreed.

Anyway, it sounds like you're on board with me committing the first
patch of the series, so I'll do that next week absent objections.

Works for me.

Thanks!

Stephen

#33Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#29)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-09-29 13:38:37 -0400, Robert Haas wrote:

On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost <sfrost@snowman.net> wrote:

Lastly, I will say that I feel it'd be good to support bi-directional
communication as I think it'll be needed eventually, but I'm not sure
that has to happen now.

I agree we need bidirectional communication; I just don't agree that
the other direction should use the libpq format. The data going from
the worker to the process that launched it is stuff like errors and
tuples, for which we already have a wire format. The data going in
the other direction is going to be things like plan trees to be
executed, for which we don't. But if we can defer the issue, so much
the better. Things will become clearer as we get closer to being
done.

I think that might be true for your usecase, but not for others. It's
perfectly conceivable that one might want to ship tuples to a couple
bgworkers using the COPY protocol or such.

I don't think it needs to be fully implemented, but I think we should
design it a way that it's unlikely to require larger changes to the
added code from here to add it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#34Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#20)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-09-10 16:53:12 -0400, Robert Haas wrote:

diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 5da9d8d..0b8db42 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -37,6 +37,31 @@ typedef struct
}			u;
} PQArgBlock;
+typedef struct
+{
+	void (*comm_reset)(void);
+	int	(*flush)(void);
+	int	(*flush_if_writable)(void);
+	bool (*is_send_pending)(void);
+	int	(*putmessage)(char msgtype, const char *s, size_t len);
+	void (*putmessage_noblock)(char msgtype, const char *s, size_t len);
+	void (*startcopyout)(void);
+	void (*endcopyout)(bool errorAbort);
+} PQsendMethods;
+
+PQsendMethods *PqSendMethods;

WRT my complaint in the other subthread, I think PQsendMethods should
just be renamed to PQcommMethods or similar. That'd, in combination with
a /* at some point we might want to add methods for receiving data here
*/ looks like it'd already satisfy my complaint ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#35Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#34)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Oct 8, 2014 at 10:09 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-09-10 16:53:12 -0400, Robert Haas wrote:

diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 5da9d8d..0b8db42 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -37,6 +37,31 @@ typedef struct
}                       u;
} PQArgBlock;
+typedef struct
+{
+     void (*comm_reset)(void);
+     int     (*flush)(void);
+     int     (*flush_if_writable)(void);
+     bool (*is_send_pending)(void);
+     int     (*putmessage)(char msgtype, const char *s, size_t len);
+     void (*putmessage_noblock)(char msgtype, const char *s, size_t len);
+     void (*startcopyout)(void);
+     void (*endcopyout)(bool errorAbort);
+} PQsendMethods;
+
+PQsendMethods *PqSendMethods;

WRT my complaint in the other subthread, I think PQsendMethods should
just be renamed to PQcommMethods or similar. That'd, in combination with
a /* at some point we might want to add methods for receiving data here
*/ looks like it'd already satisfy my complaint ;)

Well, I'm happy enough to go ahead and change that, if that's all it
takes to make you happy. Is it?

So far the status of these patches AIUI is:

#1 - Just committed, per discussion last week.
#2 - No comments, possibly because it's pretty boring.
#3 - Most people seem happy with v2, modulo the above renaming request.
#4 - No comments.
#5 - No comments; trivial.
#6 - Updated per Amit's feedback. Not sure if anyone else wants to
review. Still needs docs.

I'm tempted to go ahead and do the renaming you've requested here and
then commit #2, #3, #4, and #5. But I'll wait if people want to
review more first. #6 probably needs at least one more thorough
review, and of course the documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#36Petr Jelinek
petr@2ndquadrant.com
In reply to: Robert Haas (#35)
Re: pg_background (and more parallelism infrastructure patches)

On 08/10/14 21:03, Robert Haas wrote:

On Wed, Oct 8, 2014 at 10:09 AM, Andres Freund <andres@2ndquadrant.com> wrote:

WRT my complaint in the other subthread, I think PQsendMethods should
just be renamed to PQcommMethods or similar. That'd, in combination with
a /* at some point we might want to add methods for receiving data here
*/ looks like it'd already satisfy my complaint ;)

Well, I'm happy enough to go ahead and change that, if that's all it
takes to make you happy. Is it?

So far the status of these patches AIUI is:

#1 - Just committed, per discussion last week.
#2 - No comments, possibly because it's pretty boring.

Yeah there is nothing much to say there, it can just go in.

#4 - No comments.

I think I said I like it, by which I meant it's good as it is; no
objections to committing that one either.

#6 - Updated per Amit's feedback. Not sure if anyone else wants to
review. Still needs docs.

I'd like to see this one when it's updated, since I lost track a little
of how the changes look exactly.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#37Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#1)
Re: pg_background (and more parallelism infrastructure patches)
/*
+ * Arrange to remove a dynamic shared memory mapping at cleanup time.
+ *
+ * dsm_keep_mapping() can be used to preserve a mapping for the entire
+ * lifetime of a process; this function reverses that decision, making
+ * the segment owned by the current resource owner.  This may be useful
+ * just before performing some operation that will invalidate the segment
+ * for future use by this backend.
+ */
+void
+dsm_unkeep_mapping(dsm_segment *seg)
+{
+	Assert(seg->resowner == NULL);
+	ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+	seg->resowner = CurrentResourceOwner;
+	ResourceOwnerRememberDSM(seg->resowner, seg);
+}

Hm, I dislike the name unkeep. I guess you want to be symmetric to
dsm_keep_mapping? dsm_manage_mapping(), dsm_ensure_mapping_cleanup(),
dsm_remember_mapping()?

From c835a06f20792556d35a0eee4c2fa21f5f23e8a3 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 11 Jul 2014 09:53:40 -0400
Subject: [PATCH 6/6] pg_background: Run commands in a background worker, and
get the results.

The currently-active GUC values from the user session will be copied
to the background worker. If the command returns a result set, you
can retrieve the result set; if not, you can retrieve the command
tags. If the command fails with an error, the same error will be
thrown in the launching process when the results are retrieved.
Warnings and other messages generated by the background worker, and
notifications received by it, are also propagated to the foreground
process.

I've got to ask: why is it helpful that we have this in contrib? I have a
good share of blame to bear for that, but I think we need to stop
diluting contrib ever more with test programs. These have a place, but I
don't think that place is contrib.

+/* Table-of-contents constants for our dynamic shared memory segment. */
+#define PG_BACKGROUND_MAGIC				0x50674267
+#define PG_BACKGROUND_KEY_FIXED_DATA	0
+#define PG_BACKGROUND_KEY_SQL			1
+#define PG_BACKGROUND_KEY_GUC			2
+#define PG_BACKGROUND_KEY_QUEUE			3
+#define PG_BACKGROUND_NKEYS				4
+
+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct pg_background_fixed_data
+{
+	Oid	database_id;
+	Oid	authenticated_user_id;
+	Oid	current_user_id;
+	int	sec_context;
+	char database[NAMEDATALEN];
+	char authenticated_user[NAMEDATALEN];
+} pg_background_fixed_data;

Why not NameData?

+/* Private state maintained by the launching backend for IPC. */
+typedef struct pg_background_worker_info
+{
+	pid_t		pid;
+	dsm_segment *seg;
+	BackgroundWorkerHandle *handle;	
+	shm_mq_handle *responseq;	
+	bool		consumed;
+} pg_background_worker_info;

whitespace damage.

+static HTAB *worker_hash;

Hm. So the lifetime of this hash's contents is managed via
on_dsm_detach(), do I understand that correctly?

+PG_FUNCTION_INFO_V1(pg_background_launch);
+PG_FUNCTION_INFO_V1(pg_background_result);
+PG_FUNCTION_INFO_V1(pg_background_detach);
+void pg_background_worker_main(Datum);
+
+/*
+ * Start a dynamic background worker to run a user-specified SQL command.
+ */
+Datum
+pg_background_launch(PG_FUNCTION_ARGS)
+{
+	text	   *sql = PG_GETARG_TEXT_PP(0);
+	int32		queue_size = PG_GETARG_INT64(1);
+	int32		sql_len = VARSIZE_ANY_EXHDR(sql);
+	Size		guc_len;
+	Size		segsize;
+	dsm_segment *seg;
+	shm_toc_estimator e;
+	shm_toc    *toc;
+    char	   *sqlp;

wrong indentation.

+	char	   *gucstate;
+	shm_mq	   *mq;
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *worker_handle;
+	pg_background_fixed_data *fdata;
+	pid_t		pid;
+	shm_mq_handle *responseq;
+	MemoryContext	oldcontext;
+
+	/* Ensure a valid queue size. */
+	if (queue_size < 0 || ((uint64) queue_size) < shm_mq_minimum_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("queue size must be at least %zu bytes",
+						shm_mq_minimum_size)));

Hm. So every user can do this once the extension is created as the
functions are most likely to be PUBLIC. Is this a good idea?

+	/* Wait for the worker to start. */
+	switch (WaitForBackgroundWorkerStartup(worker_handle, &pid))
+	{
+		case BGWH_STARTED:
+			/* Success. */
+			break;
+		case BGWH_STOPPED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("could not start background process"),
+					 errhint("More details may be available in the server log.")));
+			break;
+        case BGWH_POSTMASTER_DIED:

space vs. tab.

+/*
+ * Retrieve the results of a background query previously launched in this
+ * session.
+ */
+Datum
+pg_background_result(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	shm_mq_result	res;
+	FuncCallContext *funcctx;
+	TupleDesc	tupdesc;
+	StringInfoData	msg;
+	pg_background_result_state *state;
+
+	/* First-time setup. */
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+		pg_background_worker_info *info;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* See if we have a connection to the specified PID. */
+		if ((info = find_worker_info(pid)) == NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("PID %d is not attached to this session", pid)));
+
+		/* Can't read results twice. */
+		if (info->consumed)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("results for PID %d have already been consumed", pid)));
+		info->consumed = true;		

trailing whitespace.

+		/*
+		 * Whether we succeed or fail, a future invocation of this function
+		 * may not try to read from the DSM once we've begun to do so.
+		 * Accordingly, make arrangements to clean things up at end of query.
+		 */
+		dsm_unkeep_mapping(info->seg);
+
+		/* Set up tuple-descriptor based on column definition list. */
+		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record"),
+					 errhint("Try calling the function in the FROM clause "
+							 "using a column definition list.")));

Hm, normally we don't add linebreaks inside error messages.

+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* Cache state that will be needed on every call. */
+		state = palloc0(sizeof(pg_background_result_state));
+		state->info = info;
+		if (funcctx->tuple_desc->natts > 0)
+		{
+			int	natts = funcctx->tuple_desc->natts;
+			int	i;
+
+			state->receive_functions = palloc(sizeof(FmgrInfo) * natts);
+			state->typioparams = palloc(sizeof(Oid) * natts);
+
+			for (i = 0;	i < natts; ++i)
+			{
+				Oid	receive_function_id;
+
+				getTypeBinaryInputInfo(funcctx->tuple_desc->attrs[i]->atttypid,
+									   &receive_function_id,
+									   &state->typioparams[i]);
+				fmgr_info(receive_function_id, &state->receive_functions[i]);
+			}
+		}

I'm unsure right now about the rules surrounding this, but shouldn't we
check that the user is allowed to execute these? And shouldn't we fall
back to non-binary functions if no binary ones are available?

+	/* Read and processes messages from the shared memory queue. */
+	for (;;)
+	{
+		char		msgtype;
+		Size		nbytes;
+		void	   *data;
+
+		/* Get next message. */
+		res = shm_mq_receive(state->info->responseq, &nbytes, &data, false);
+		if (res != SHM_MQ_SUCCESS)
+			break;
+
+		/*
+		 * Message-parsing routines operate on a null-terminated StringInfo,
+		 * so we must construct one.
+		 */
+		resetStringInfo(&msg);
+		enlargeStringInfo(&msg, nbytes);
+		msg.len = nbytes;
+		memcpy(msg.data, data, nbytes);
+		msg.data[nbytes] = '\0';
+		msgtype = pq_getmsgbyte(&msg);
+
+		/* Dispatch on message type. */
+		switch (msgtype)
+		{
+			case 'E':
+			case 'N':
+				{
+					ErrorData	edata;
+
+					/* Parse ErrorResponse or NoticeResponse. */
+					pq_parse_errornotice(&msg, &edata);
+
+					/*
+					 * Limit the maximum error level to ERROR.  We don't want
+					 * a FATAL inside the background worker to kill the user
+					 * session.
+					 */
+					if (edata.elevel > ERROR)
+						edata.elevel = ERROR;

Hm. But we should still report that it FATALed, no? Maybe add an error
context notice about it? Not nice, but I don't have an immediately
better idea. I think it would generally be rather helpful to add the
information that this wasn't originally an error triggered by this
process; the user might otherwise be confused when looking for the
origin of the error in the log.

+			case 'A':
+				{
+					/* Propagate NotifyResponse. */
+					pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+					break;

Hm. Are we sure we're in a situation where the client expects these? And
are we sure their encoding is correct? The other data goes through
input/output functions that check for that, but here we don't. And the
other side AFAICS could have done a SET client_encoding.

+	/* If no data rows, return the command tags instead. */
+	if (!state->has_row_description)
+	{
+		if (tupdesc->natts != 1 || tupdesc->attrs[0]->atttypid != TEXTOID)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query did not return a result set, but "
+							"result rowtype is not a single text column")));
+		if (state->command_tags != NIL)
+		{
+			char *tag = linitial(state->command_tags);
+			Datum	value;
+			bool	isnull;
+			HeapTuple	result;
+
+			state->command_tags = list_delete_first(state->command_tags);
+			value = PointerGetDatum(cstring_to_text(tag));
+			isnull = false;
+			result = heap_form_tuple(tupdesc, &value, &isnull);
+			SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+		}			

trailing whitespace.

+/*
+ * Parse a DataRow message and form a result tuple.
+ */
+static HeapTuple
+form_result_tuple(pg_background_result_state *state, TupleDesc tupdesc,
+				  StringInfo msg)
+{
+	/* Handle DataRow message. */
+	int16	natts = pq_getmsgint(msg, 2);
+	int16	i;
+	Datum  *values = NULL;
+	bool   *isnull = NULL;
+	StringInfoData	buf;
+
+	if (!state->has_row_description)
+		elog(ERROR, "DataRow not preceded by RowDescription");
+   	if (natts != tupdesc->natts)
+		elog(ERROR, "malformed DataRow");
+	if (natts > 0)
+	{
+		values = palloc(natts * sizeof(Datum));
+		isnull = palloc(natts * sizeof(bool));
+	}
+	initStringInfo(&buf);
+
+	for (i = 0; i < natts; ++i)
+	{
+		int32	bytes = pq_getmsgint(msg, 4);
+
+		if (bytes < 0)
+		{
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											NULL,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = true;
+		}
+		else
+		{
+			resetStringInfo(&buf);
+			appendBinaryStringInfo(&buf, pq_getmsgbytes(msg, bytes), bytes);
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											&buf,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = false;
+		}
+	}

Hm. I think you didn't check that the typemods are the same above.

+/*
+ * Detach from the dynamic shared memory segment used for communication with
+ * a background worker.  This prevents the worker from stalling waiting for
+ * us to read its results.
+ */
+Datum
+pg_background_detach(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	pg_background_worker_info *info;
+
+	info = find_worker_info(pid);
+	if (info == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("PID %d is not attached to this session", pid)));
+	dsm_detach(info->seg);
+
+	PG_RETURN_VOID();
+}

So there's really no limit on who is allowed to do stuff like this;
i.e., any SECURITY DEFINER function and such may do the same.

+/*
+ * Background worker entrypoint.
+ */
+void
+pg_background_worker_main(Datum main_arg)
+{
+	dsm_segment *seg;
+	shm_toc    *toc;
+	pg_background_fixed_data *fdata;
+	char	   *sql;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	shm_mq_handle *responseq;
+
+	/* Establish signal handlers. */
+	pqsignal(SIGTERM, handle_sigterm);

Hm. No SIGINT?

+	/* Find data structures in dynamic shared memory. */
+	fdata = shm_toc_lookup(toc, PG_BACKGROUND_KEY_FIXED_DATA);
+	sql = shm_toc_lookup(toc, PG_BACKGROUND_KEY_SQL);
+	gucstate = shm_toc_lookup(toc, PG_BACKGROUND_KEY_GUC);
+	mq = shm_toc_lookup(toc, PG_BACKGROUND_KEY_QUEUE);
+	shm_mq_set_sender(mq, MyProc);
+	responseq = shm_mq_attach(mq, seg, NULL);

Don't these need to ensure that values have been found? shm_toc_lookup
returns NULL for unknown items and the like.

+	/* Redirect protocol messages to responseq. */
+	pq_redirect_to_shm_mq(mq, responseq);
+
+	/*
+	 * Initialize our user and database ID based on the strings version of
+	 * the data, and then go back and check that we actually got the database
+	 * and user ID that we intended to get.  We do this because it's not
+	 * impossible for the process that started us to die before we get here,
+	 * and the user or database could be renamed in the meantime.  We don't
+	 * want to latch on the wrong object by accident.  There should probably
+	 * be a variant of BackgroundWorkerInitializeConnection that accepts OIDs
+	 * rather than strings.
+	 */
+	BackgroundWorkerInitializeConnection(fdata->database,
+										 fdata->authenticated_user);
+	if (fdata->database_id != MyDatabaseId ||
+		fdata->authenticated_user_id != GetAuthenticatedUserId())
+		ereport(ERROR,
+			(errmsg("user or database renamed during pg_background startup")));
+
+	/* Restore GUC values from launching backend. */
+	StartTransactionCommand();
+	RestoreGUCState(gucstate);
+	CommitTransactionCommand();

I haven't read the GUC save patch, but is it a) required to do this in a
transaction? We normally reload the config even without one. b) correct
to do? What about SET LOCAL variables?

+	/* Restore user ID and security context. */
+	SetUserIdAndSecContext(fdata->current_user_id, fdata->sec_context);
+
+	/* Prepare to execute the query. */
+	SetCurrentStatementStartTimestamp();
+	debug_query_string = sql;
+   	pgstat_report_activity(STATE_RUNNING, sql);
+	StartTransactionCommand();
+	if (StatementTimeout > 0)
+		enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
+	else
+		disable_timeout(STATEMENT_TIMEOUT, false);

I doubt that actually works correctly without a SIGINT handler as
statement timeout just falls back to kill(SIGINT)? Or does it, because
it falls back to just doing a proc_exit()? If so, is that actually safe?

+	/* Execute the query. */
+	execute_sql_string(sql);
+
+	/* Post-execution cleanup. */
+	disable_timeout(STATEMENT_TIMEOUT, false);
+	CommitTransactionCommand();

So, we're allowed to do nearly arbitrary nastiness here...

+/*
+ * Execute given SQL string.
+ *
+ * Using SPI here would preclude backgrounding commands like VACUUM which one
+ * might very well wish to launch in the background.  So we do this instead.
+ */
+static void
+execute_sql_string(const char *sql)
+{
+	List	   *raw_parsetree_list;
+	ListCell   *lc1;
+	bool		isTopLevel;
+	int			commands_remaining;
+	MemoryContext	parsecontext;
+	MemoryContext	oldcontext;
+
+	/*
+	 * Parse the SQL string into a list of raw parse trees.
+	 *
+	 * Because we allow statements that perform internal transaction control,
+	 * we can't do this in TopTransactionContext; the parse trees might get
+	 * blown away before we're done executing them.
+	 */
+	parsecontext = AllocSetContextCreate(TopMemoryContext,
+										 "pg_background parse/plan",
+										 ALLOCSET_DEFAULT_MINSIZE,
+										 ALLOCSET_DEFAULT_INITSIZE,
+										 ALLOCSET_DEFAULT_MAXSIZE);

Not that it hugely matters, but shouldn't this rather be
TopTransactionContext?

+	oldcontext = MemoryContextSwitchTo(parsecontext);
+	raw_parsetree_list = pg_parse_query(sql);
+	commands_remaining = list_length(raw_parsetree_list);
+	isTopLevel = commands_remaining == 1;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * Do parse analysis, rule rewrite, planning, and execution for each raw
+	 * parsetree.  We must fully execute each query before beginning parse
+	 * analysis on the next one, since there may be interdependencies.
+	 */
+	foreach(lc1, raw_parsetree_list)
+	{
+		Node	   *parsetree = (Node *) lfirst(lc1);
+        const char *commandTag;
+        char        completionTag[COMPLETION_TAG_BUFSIZE];
+        List       *querytree_list,
+                   *plantree_list;

borked indentation.

+		bool		snapshot_set = false;
+		Portal		portal;
+		DestReceiver *receiver;
+		int16		format = 1;
+
+		/*
+		 * We don't allow transaction-control commands like COMMIT and ABORT
+		 * here.  The entire SQL statement is executed as a single transaction
+		 * which commits if no errors are encountered.
+		 */
+		if (IsA(parsetree, TransactionStmt))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("transaction control statements are not allowed in pg_background")));

Hm. I don't think that goes far enough. This allows commands that
internally stop/start transactions, like CREATE INDEX CONCURRENTLY. Are
you sure that's working right now?

+        /*
+         * Get the command name for use in status display (it also becomes the
+         * default completion tag, down inside PortalRun).  Set ps_status and
+         * do any special start-of-SQL-command processing needed by the
+         * destination.
+         */
+        commandTag = CreateCommandTag(parsetree);
+        set_ps_display(commandTag, false);
+        BeginCommand(commandTag, DestNone);

indentation.

+		/* Set up a snapshot if parse analysis/planning will need one. */
+		if (analyze_requires_snapshot(parsetree))
+		{
+			PushActiveSnapshot(GetTransactionSnapshot());
+			snapshot_set = true;
+		}
+
+		/*
+		 * OK to analyze, rewrite, and plan this query.
+		 *
+		 * As with parsing, we need to make sure this data outlives the
+		 * transaction, because of the possibility that the statement might
+		 * perform internal transaction control.
+		 */
+		oldcontext = MemoryContextSwitchTo(parsecontext);
+		querytree_list = pg_analyze_and_rewrite(parsetree, sql, NULL, 0);
+		plantree_list = pg_plan_queries(querytree_list, 0, NULL);
+
+		/* Done with the snapshot used for parsing/planning */
+        if (snapshot_set)
+            PopActiveSnapshot();
+        /* If we got a cancel signal in analysis or planning, quit */
+        CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Execute the query using the unnamed portal.
+		 */
+        portal = CreatePortal("", true, true);
+        /* Don't display the portal in pg_cursors */
+        portal->visible = false;

indentation.

+		PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
+		PortalStart(portal, NULL, 0, InvalidSnapshot);
+		PortalSetResultFormat(portal, 1, &format);		/* binary format */
+
+		/*
+		 * Tuples returned by any command other than the last are simply
+		 * discarded; but those returned by the last (or only) command are
+		 * redirected to the shared memory queue we're using for communication
+		 * with the launching backend. If the launching backend is gone or has
+		 * detached us, these messages will just get dropped on the floor.
+		 */
+		--commands_remaining;
+		if (commands_remaining > 0)
+			receiver = CreateDestReceiver(DestNone);
+		else
+		{
+			receiver = CreateDestReceiver(DestRemote);
+			SetRemoteDestReceiverParams(receiver, portal);
+		}
+
+		/*
+		 * Only once the portal and destreceiver have been established can
+		 * we return to the transaction context.  All that stuff needs to
+		 * survive an internal commit inside PortalRun!
+		 */
+		MemoryContextSwitchTo(oldcontext);
+
+		/* Here's where we actually execute the command. */
+		(void) PortalRun(portal, FETCH_ALL, isTopLevel, receiver, receiver,
+                         completionTag);
+
+		/* Clean up the receiver. */
+		(*receiver->rDestroy) (receiver);
+
+		/*
+		 * Send a CommandComplete message even if we suppressed the query
+		 * results.  The user backend will report these in the absence of
+		 * any true query results.
+		 */
+		EndCommand(completionTag, DestRemote);

Hm. This is a fair amount of code copied from postgres.c.

I think this is interesting work, but I doubt it's ready yet. I need to
read the preceding patches, to really understand where breakage lies
hidden.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#38Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#35)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-10-08 15:03:01 -0400, Robert Haas wrote:

#1 - Just committed, per discussion last week.
#2 - No comments, possibly because it's pretty boring.

fine with me.

#3 - Most people seem happy with v2, modulo the above renaming request.

If you do it without the error handling stuff, I'm good with committing
it. I'm not yet convinced that rethrowing errors from background
workers verbatim, without any indication that the process changed, is a
good idea.

#4 - No comments.

This needs review imo.

#5 - No comments; trivial.

fine.

#6 - Updated per Amit's feedback. Not sure if anyone else wants to
review. Still needs docs.

This definitely needs more review.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#39Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#37)
Re: pg_background (and more parallelism infrastructure patches)

On Thu, Oct 9, 2014 at 4:02 AM, Andres Freund <andres@2ndquadrant.com>
wrote:

/*
+ * Arrange to remove a dynamic shared memory mapping at cleanup time.
+ *
+ * dsm_keep_mapping() can be used to preserve a mapping for the entire
+ * lifetime of a process; this function reverses that decision, making
+ * the segment owned by the current resource owner.  This may be useful
+ * just before performing some operation that will invalidate the

segment

+ * for future use by this backend.
+ */
+void
+dsm_unkeep_mapping(dsm_segment *seg)
+{
+     Assert(seg->resowner == NULL);
+     ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+     seg->resowner = CurrentResourceOwner;
+     ResourceOwnerRememberDSM(seg->resowner, seg);
+}

Hm, I dislike the name unkeep.

I also think the function name is not appropriate for what it does.

I guess you want to be symmetric to
dsm_keep_mapping? dsm_manage_mapping(), dsm_ensure_mapping_cleanup(),
dsm_remember_mapping()?

Another could be dsm_change_mapping(). Yet another idea could be to use
a single function (dsm_manage_mapping(), with an additional parameter
to indicate the scope of the segment) instead of the two different
functions dsm_keep_mapping() and dsm_unkeep_mapping().

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#40Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#37)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Oct 8, 2014 at 6:32 PM, Andres Freund <andres@2ndquadrant.com> wrote:

/*
+ * Arrange to remove a dynamic shared memory mapping at cleanup time.
+ *
+ * dsm_keep_mapping() can be used to preserve a mapping for the entire
+ * lifetime of a process; this function reverses that decision, making
+ * the segment owned by the current resource owner.  This may be useful
+ * just before performing some operation that will invalidate the segment
+ * for future use by this backend.
+ */
+void
+dsm_unkeep_mapping(dsm_segment *seg)
+{
+     Assert(seg->resowner == NULL);
+     ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+     seg->resowner = CurrentResourceOwner;
+     ResourceOwnerRememberDSM(seg->resowner, seg);
+}

Hm, I dislike the name unkeep. I guess you want to be symmetric to
dsm_keep_mapping? dsm_manage_mapping(), dsm_ensure_mapping_cleanup(),
dsm_remember_mapping()?

Yes, I want to be symmetrical with dsm_keep_mapping, which is itself
symmetrical with dsm_keep_segment(). Your proposed names don't sound
good to me because it's not obvious that "manage" or "remember" is the
opposite of "keep". It's pretty easy to remember that "unkeep" is the
opposite of "keep" though.

If I had to pick one of those, I'd probably pick
dsm_ensure_mapping_cleanup(), but I don't like that much. It
describes what the function does fairly well, but it's totally unlike
the function it pairs with.

Another idea is to add another argument to the existing function
(possibly renaming it along the way). For example, we could have
dsm_keep_mapping(seg, true) mean keep it, and dsm_keep_mapping(seg,
false) meaning don't keep it. That would break existing callers, but
there probably aren't many of those. We could even - and much more
generally - replace it with dsm_set_resowner(seg, res), but that would
require #including some stuff into dsm.h that we might rather not
have there.

I'll respond to the rest of this later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#41Petr Jelinek
petr@2ndquadrant.com
In reply to: Andres Freund (#37)
Re: pg_background (and more parallelism infrastructure patches)

On 09/10/14 00:32, Andres Freund wrote:

From c835a06f20792556d35a0eee4c2fa21f5f23e8a3 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 11 Jul 2014 09:53:40 -0400
Subject: [PATCH 6/6] pg_background: Run commands in a background worker, and
get the results.

The currently-active GUC values from the user session will be copied
to the background worker. If the command returns a result set, you
can retrieve the result set; if not, you can retrieve the command
tags. If the command fails with an error, the same error will be
thrown in the launching process when the results are retrieved.
Warnings and other messages generated by the background worker, and
notifications received by it, are also propagated to the foreground
process.

I've got to ask: why is it helpful that we have this in contrib? I have a
good share of blame to bear for that, but I think we need to stop
diluting contrib ever more with test programs. These have a place, but I
don't think that place is contrib.

I don't see this as a mere test module; it seems to provide actually
useful functionality (I still dream of it being extended with a scheduler
eventually, and even if it's not, you still get "asynchronous transactions").

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#42Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#37)
4 attachment(s)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Oct 8, 2014 at 6:32 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I got to ask: Why is it helpful that we have this in contrib? I have a
good share of blame to bear for that, but I think we need to stop
dilluting contrib evermore with test programs. These have a place, but I
don't think it should be contrib.

I don't think pg_background is merely a test program: I think it's a
quite useful piece of functionality. It can be used for running
VACUUM from a procedure, which is something people have asked for more
than once, or for simulating an autonomous transaction. Granted,
it'll be a lot slower than a real autonomous transaction, but it's
still better than doing it via dblink, because you don't have to futz
with authentication.

I would be all in favor of moving things like test_decoding,
test_parser, and test_shm_mq to src/test or contrib/test or wherever
else we want to put them, but I think pg_background belongs in
contrib.

+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct pg_background_fixed_data
+{
+     Oid     database_id;
+     Oid     authenticated_user_id;
+     Oid     current_user_id;
+     int     sec_context;
+     char database[NAMEDATALEN];
+     char authenticated_user[NAMEDATALEN];
+} pg_background_fixed_data;

Why not NameData?

No particular reason. Changed.

whitespace damage.

I went through and fixed everything that git diff --check complained
about. Let me know if you still see other problems.

+static HTAB *worker_hash;

Hm. So the lifetime of this hash's contents is managed via
on_dsm_detach(), do I understand that correctly?

More or less, yes.

Hm. So every user can do this once the extension is created as the
functions are most likely to be PUBLIC. Is this a good idea?

Why not? If they can log in, they could start separate sessions with
similar effect.

+             /*
+              * Whether we succeed or fail, a future invocation of this function
+              * may not try to read from the DSM once we've begun to do so.
+              * Accordingly, make arrangements to clean things up at end of query.
+              */
+             dsm_unkeep_mapping(info->seg);
+
+             /* Set up tuple-descriptor based on column definition list. */
+             if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                      errmsg("function returning record called in context "
+                                                     "that cannot accept type record"),
+                                      errhint("Try calling the function in the FROM clause "
+                                                      "using a column definition list.")));

Hm, normally we don't add linebreaks inside error messages.

I copied it from dblink.

I'm unsure right now about the rules surrounding this, but shouldn't we
check that the user is allowed to execute these? And shouldn't we fall
back to non-binary functions if no binary ones are available?

I can't see any reason to do either of those things. I'm not aware
that returning data in binary format is in any way intended to be a
security-restricted operation, or that any data types that actually
matter lack send and receive functions. If some do, I think the
solution is to add them, not make this more complicated.

+                                     /*
+                                      * Limit the maximum error level to ERROR.  We don't want
+                                      * a FATAL inside the background worker to kill the user
+                                      * session.
+                                      */
+                                     if (edata.elevel > ERROR)
+                                             edata.elevel = ERROR;

Hm. But we should still report that it FATALed, no? Maybe add an error
context notice about it? Not nice, but I don't have an immediately
better idea. I think it would generally be rather helpful to add the
information that this wasn't originally an error triggered by this
process; the user might otherwise be confused when looking for the
origin of the error in the log.

Yeah, I was wondering if we needed some kind of annotation here. What
I'm wondering about is appending something to the errcontext, perhaps
"background worker, PID %d".

+                     case 'A':
+                             {
+                                     /* Propagate NotifyResponse. */
+                                     pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+                                     break;

Hm. Are we sure we're in a situation where the client expects these? And
are we sure their encoding is correct? The other data goes through
input/output functions that check for that, but here we don't. And the
other side AFAICS could have done a SET client_encoding.

I think there's no problem at the protocol level; I think the server
can send NotifyResponse pretty much whenever. It could be argued that
this is a POLA violation, but dropping the notify messages on the
floor (which seems to be the other option) doesn't seem superior. So I
think this is mostly a matter of documentation.

+/*
+ * Parse a DataRow message and form a result tuple.
+ */
+static HeapTuple
+form_result_tuple(pg_background_result_state *state, TupleDesc tupdesc,
+                               StringInfo msg)
+{
+     /* Handle DataRow message. */
+     int16   natts = pq_getmsgint(msg, 2);
+     int16   i;
+     Datum  *values = NULL;
+     bool   *isnull = NULL;
+     StringInfoData  buf;
+
+     if (!state->has_row_description)
+             elog(ERROR, "DataRow not preceded by RowDescription");
+     if (natts != tupdesc->natts)
+             elog(ERROR, "malformed DataRow");
+     if (natts > 0)
+     {
+             values = palloc(natts * sizeof(Datum));
+             isnull = palloc(natts * sizeof(bool));
+     }
+     initStringInfo(&buf);
+
+     for (i = 0; i < natts; ++i)
+     {
+             int32   bytes = pq_getmsgint(msg, 4);
+
+             if (bytes < 0)
+             {
+                     values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+                                                                                     NULL,
+                                                                                     state->typioparams[i],
+                                                                                     tupdesc->attrs[i]->atttypmod);
+                     isnull[i] = true;
+             }
+             else
+             {
+                     resetStringInfo(&buf);
+                     appendBinaryStringInfo(&buf, pq_getmsgbytes(msg, bytes), bytes);
+                     values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+                                                                                     &buf,
+                                                                                     state->typioparams[i],
+                                                                                     tupdesc->attrs[i]->atttypmod);
+                     isnull[i] = false;
+             }
+     }

Hm. I think you didn't check that the typemods are the same above.

The same as what?

+Datum
+pg_background_detach(PG_FUNCTION_ARGS)
+{
+     int32           pid = PG_GETARG_INT32(0);
+     pg_background_worker_info *info;
+
+     info = find_worker_info(pid);
+     if (info == NULL)
+             ereport(ERROR,
+                             (errcode(ERRCODE_UNDEFINED_OBJECT),
+                              errmsg("PID %d is not attached to this session", pid)));
+     dsm_detach(info->seg);
+
+     PG_RETURN_VOID();
+}

So there's really no limit on who is allowed to do stuff like
this. I.e. any SECURITY DEFINER and such may do the same.

Do you think we need a restriction? It's not obvious to me that there
are any security-relevant consequences to this, but it's an important
question, and I might be missing something.

+ /* Establish signal handlers. */
+ pqsignal(SIGTERM, handle_sigterm);

Hm. No SIGINT?

Nope; bgworker.c sets it to StatementCancelHandler, which should be
fine. Ideally I wouldn't have to do anything with SIGTERM either, but
bgworker.c sets it to bgworker_die(), which is pretty much complete
junk. It's not safe to just ereport() from within whatever the heck
the caller is doing. We should probably drop a small thermonuclear
weapon on bgworker_die(), but that's a problem for another patch.

+     /* Find data structures in dynamic shared memory. */
+     fdata = shm_toc_lookup(toc, PG_BACKGROUND_KEY_FIXED_DATA);
+     sql = shm_toc_lookup(toc, PG_BACKGROUND_KEY_SQL);
+     gucstate = shm_toc_lookup(toc, PG_BACKGROUND_KEY_GUC);
+     mq = shm_toc_lookup(toc, PG_BACKGROUND_KEY_QUEUE);
+     shm_mq_set_sender(mq, MyProc);
+     responseq = shm_mq_attach(mq, seg, NULL);

Don't these need to ensure that values have been found? shm_toc_lookup
returns NULL for unknown items and the like?

Meh. The launching process would have errored out if it hadn't been
able to set up the segment correctly. We could add some Assert()
statements if you really feel strongly about it, but it seems fairly
pointless to me. Any situation where those pointers come back NULL is
presumably going to be some sort of really stupid bug that will be
found even by trivial testing.

+     /* Restore GUC values from launching backend. */
+     StartTransactionCommand();
+     RestoreGUCState(gucstate);
+     CommitTransactionCommand();

I haven't read the guc save patch, but is it a) required to this in a
transaction? We normally reload the config even without. b) correct to
do? What's with SET LOCAL variables?

(a) Yeah, it doesn't work without that. I forget what breaks, but if
you take those out, it will blow up.

(b) Do those need special handling for some reason?

I doubt that actually works correctly without a SIGINT handler as
statement timeout just falls back to kill(SIGINT)? Or does it, because
it falls back to just doing a proc_exit()? If so, is that actually safe?

See above kvetching.

+     /* Post-execution cleanup. */
+     disable_timeout(STATEMENT_TIMEOUT, false);
+     CommitTransactionCommand();

So, we're allowed to do nearly arbitrary nastiness here...

Can you be more specific about the nature of your concern? This is no
different than finish_xact_command().

+     /*
+      * Parse the SQL string into a list of raw parse trees.
+      *
+      * Because we allow statements that perform internal transaction control,
+      * we can't do this in TopTransactionContext; the parse trees might get
+      * blown away before we're done executing them.
+      */
+     parsecontext = AllocSetContextCreate(TopMemoryContext,
+                                                                              "pg_background parse/plan",
+                                                                              ALLOCSET_DEFAULT_MINSIZE,
+                                                                              ALLOCSET_DEFAULT_INITSIZE,
+                                                                              ALLOCSET_DEFAULT_MAXSIZE);

Not that it hugely matters, but shouldn't this rather be
TopTransactionContext?

No, for the reasons explained in the comment that you quoted right, uh, there.

+             bool            snapshot_set = false;
+             Portal          portal;
+             DestReceiver *receiver;
+             int16           format = 1;
+
+             /*
+              * We don't allow transaction-control commands like COMMIT and ABORT
+              * here.  The entire SQL statement is executed as a single transaction
+              * which commits if no errors are encountered.
+              */
+             if (IsA(parsetree, TransactionStmt))
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                      errmsg("transaction control statements are not allowed in pg_background")));

Hm. I don't think that goes far enough. This allows commands that
internally stop/start transactions like CREATE INDEX CONCURRENTLY. Are
you sure that's working right now?

I tested VACUUM without naming a specific table and that seems to work
correctly. I could try CIC, but I think the issues are the same. If
we only get one parse tree, then isTopLevel will be true and it's safe
for commands to do their own transaction control. If we get multiple
parse trees, then PreventTransactionChain will do its thing. This is
not novel territory.

Hm. This is a fair amount of code copied from postgres.c.

Yes. I'm open to suggestions, but I don't immediately see a better way.

I think this is interesting work, but I doubt it's ready yet. I need to
read the preceding patches to really understand where breakage lies
hidden.

Breakage??? In my code??? Surely not. :-)

I'm reattaching all the uncommitted patches here. #3 and #6 have been
updated; #2 and #4 are unchanged.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0002-Extend-dsm-API-with-a-new-function-dsm_unkeep_mappin.patch (text/x-patch)
From d982dbb620b6a5d9992b3259550c3e16e7e8e28a Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 23 Jul 2014 11:16:44 -0400
Subject: [PATCH 2/6] Extend dsm API with a new function dsm_unkeep_mapping.

This reassociates a dynamic shared memory handle previously passed to
dsm_keep_mapping with the current resource owner, so that it will be
cleaned up at the end of the current query.
---
 src/backend/storage/ipc/dsm.c |   18 ++++++++++++++++++
 src/include/storage/dsm.h     |    1 +
 2 files changed, 19 insertions(+)

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index a5c0084..6039beb 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -796,6 +796,24 @@ dsm_keep_mapping(dsm_segment *seg)
 }
 
 /*
+ * Arrange to remove a dynamic shared memory mapping at cleanup time.
+ *
+ * dsm_keep_mapping() can be used to preserve a mapping for the entire
+ * lifetime of a process; this function reverses that decision, making
+ * the segment owned by the current resource owner.  This may be useful
+ * just before performing some operation that will invalidate the segment
+ * for future use by this backend.
+ */
+void
+dsm_unkeep_mapping(dsm_segment *seg)
+{
+	Assert(seg->resowner == NULL);
+	ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+	seg->resowner = CurrentResourceOwner;
+	ResourceOwnerRememberDSM(seg->resowner, seg);
+}
+
+/*
  * Keep a dynamic shared memory segment until postmaster shutdown.
  *
  * This function should not be called more than once per segment;
diff --git a/src/include/storage/dsm.h b/src/include/storage/dsm.h
index 1d0110d..1694409 100644
--- a/src/include/storage/dsm.h
+++ b/src/include/storage/dsm.h
@@ -37,6 +37,7 @@ extern void dsm_detach(dsm_segment *seg);
 
 /* Resource management functions. */
 extern void dsm_keep_mapping(dsm_segment *seg);
+extern void dsm_unkeep_mapping(dsm_segment *seg);
 extern void dsm_keep_segment(dsm_segment *seg);
 extern dsm_segment *dsm_find_mapping(dsm_handle h);
 
-- 
1.7.9.6 (Apple Git-31.1)

0003-Support-frontend-backend-protocol-communication-usin-v3.patch (text/x-patch)
commit e23e994817526afb5de0106e5ea612894ae23fbd
Author: Robert Haas <rhaas@postgresql.org>
Date:   Wed May 28 19:30:21 2014 -0400

    Support frontend-backend protocol communication using a shm_mq.
    
    A background worker can use pq_redirect_to_shm_mq() to direct the
    protocol messages that would normally be sent to the frontend to a
    shm_mq so that another process may read them.
    
    The receiving process may use pq_parse_errornotice() to parse an
    ErrorResponse or NoticeResponse from the background worker and, if
    it wishes, ThrowErrorData() to propagate the error (with or without
    further modification).
    
    V2: Lots more hooks, and a struct!
    V3: Fix whitespace damage; rename send-methods to comm-methods.

diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 8be0572..09410c4 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 # be-fsstubs is here for historical reasons, probably belongs elsewhere
 
 OBJS = be-fsstubs.o be-secure.o auth.o crypt.o hba.o ip.o md5.o pqcomm.o \
-       pqformat.o pqsignal.o
+       pqformat.o pqmq.o pqsignal.o
 
 ifeq ($(with_openssl),yes)
 OBJS += be-secure-openssl.o
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..dcbb704 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -102,7 +102,6 @@
 int			Unix_socket_permissions;
 char	   *Unix_socket_group;
 
-
 /* Where the Unix socket files are (list of palloc'd strings) */
 static List *sock_paths = NIL;
 
@@ -134,16 +133,38 @@ static bool DoingCopyOut;
 
 
 /* Internal functions */
-static void pq_close(int code, Datum arg);
+static void socket_comm_reset(void);
+static void socket_close(int code, Datum arg);
+static void socket_set_nonblocking(bool nonblocking);
+static int	socket_flush(void);
+static int	socket_flush_if_writable(void);
+static bool socket_is_send_pending(void);
+static int	socket_putmessage(char msgtype, const char *s, size_t len);
+static void socket_putmessage_noblock(char msgtype, const char *s, size_t len);
+static void socket_startcopyout(void);
+static void socket_endcopyout(bool errorAbort);
 static int	internal_putbytes(const char *s, size_t len);
 static int	internal_flush(void);
-static void pq_set_nonblocking(bool nonblocking);
+static void socket_set_nonblocking(bool nonblocking);
 
 #ifdef HAVE_UNIX_SOCKETS
 static int	Lock_AF_UNIX(char *unixSocketDir, char *unixSocketPath);
 static int	Setup_AF_UNIX(char *sock_path);
 #endif   /* HAVE_UNIX_SOCKETS */
 
+PQcommMethods *PqCommMethods;
+
+static PQcommMethods PqCommSocketMethods = {
+	socket_comm_reset,
+	socket_flush,
+	socket_flush_if_writable,
+	socket_is_send_pending,
+	socket_putmessage,
+	socket_putmessage_noblock,
+	socket_startcopyout,
+	socket_endcopyout
+};
+
 
 /* --------------------------------
  *		pq_init - initialize libpq at backend startup
@@ -152,24 +173,25 @@ static int	Setup_AF_UNIX(char *sock_path);
 void
 pq_init(void)
 {
+	PqCommMethods = &PqCommSocketMethods;
 	PqSendBufferSize = PQ_SEND_BUFFER_SIZE;
 	PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize);
 	PqSendPointer = PqSendStart = PqRecvPointer = PqRecvLength = 0;
 	PqCommBusy = false;
 	DoingCopyOut = false;
-	on_proc_exit(pq_close, 0);
+	on_proc_exit(socket_close, 0);
 }
 
 /* --------------------------------
- *		pq_comm_reset - reset libpq during error recovery
+ *		socket_comm_reset - reset libpq during error recovery
  *
  * This is called from error recovery at the outer idle loop.  It's
  * just to get us out of trouble if we somehow manage to elog() from
  * inside a pqcomm.c routine (which ideally will never happen, but...)
  * --------------------------------
  */
-void
-pq_comm_reset(void)
+static void
+socket_comm_reset(void)
 {
 	/* Do not throw away pending data, but do reset the busy flag */
 	PqCommBusy = false;
@@ -178,14 +200,14 @@ pq_comm_reset(void)
 }
 
 /* --------------------------------
- *		pq_close - shutdown libpq at backend exit
+ *		socket_close - shutdown libpq at backend exit
  *
  * Note: in a standalone backend MyProcPort will be null,
  * don't crash during exit...
  * --------------------------------
  */
 static void
-pq_close(int code, Datum arg)
+socket_close(int code, Datum arg)
 {
 	if (MyProcPort != NULL)
 	{
@@ -783,15 +805,20 @@ TouchSocketFiles(void)
  */
 
 /* --------------------------------
- *			  pq_set_nonblocking - set socket blocking/non-blocking
+ *			  socket_set_nonblocking - set socket blocking/non-blocking
  *
  * Sets the socket non-blocking if nonblocking is TRUE, or sets it
  * blocking otherwise.
  * --------------------------------
  */
 static void
-pq_set_nonblocking(bool nonblocking)
+socket_set_nonblocking(bool nonblocking)
 {
+	if (MyProcPort == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
+				 errmsg("there is no client connection")));
+
 	if (MyProcPort->noblock == nonblocking)
 		return;
 
@@ -844,7 +871,7 @@ pq_recvbuf(void)
 	}
 
 	/* Ensure that we're in blocking mode */
-	pq_set_nonblocking(false);
+	socket_set_nonblocking(false);
 
 	/* Can fill buffer from PqRecvLength and upwards */
 	for (;;)
@@ -935,7 +962,7 @@ pq_getbyte_if_available(unsigned char *c)
 	}
 
 	/* Put the socket into non-blocking mode */
-	pq_set_nonblocking(true);
+	socket_set_nonblocking(true);
 
 	r = secure_read(MyProcPort, c, 1);
 	if (r < 0)
@@ -1194,7 +1221,7 @@ internal_putbytes(const char *s, size_t len)
 		/* If buffer is full, then flush it out */
 		if (PqSendPointer >= PqSendBufferSize)
 		{
-			pq_set_nonblocking(false);
+			socket_set_nonblocking(false);
 			if (internal_flush())
 				return EOF;
 		}
@@ -1210,13 +1237,13 @@ internal_putbytes(const char *s, size_t len)
 }
 
 /* --------------------------------
- *		pq_flush		- flush pending output
+ *		socket_flush		- flush pending output
  *
  *		returns 0 if OK, EOF if trouble
  * --------------------------------
  */
-int
-pq_flush(void)
+static int
+socket_flush(void)
 {
 	int			res;
 
@@ -1224,7 +1251,7 @@ pq_flush(void)
 	if (PqCommBusy)
 		return 0;
 	PqCommBusy = true;
-	pq_set_nonblocking(false);
+	socket_set_nonblocking(false);
 	res = internal_flush();
 	PqCommBusy = false;
 	return res;
@@ -1310,8 +1337,8 @@ internal_flush(void)
  * Returns 0 if OK, or EOF if trouble.
  * --------------------------------
  */
-int
-pq_flush_if_writable(void)
+static int
+socket_flush_if_writable(void)
 {
 	int			res;
 
@@ -1324,7 +1351,7 @@ pq_flush_if_writable(void)
 		return 0;
 
 	/* Temporarily put the socket into non-blocking mode */
-	pq_set_nonblocking(true);
+	socket_set_nonblocking(true);
 
 	PqCommBusy = true;
 	res = internal_flush();
@@ -1333,11 +1360,11 @@ pq_flush_if_writable(void)
 }
 
 /* --------------------------------
- *		pq_is_send_pending	- is there any pending data in the output buffer?
+ *	socket_is_send_pending	- is there any pending data in the output buffer?
  * --------------------------------
  */
-bool
-pq_is_send_pending(void)
+static bool
+socket_is_send_pending(void)
 {
 	return (PqSendStart < PqSendPointer);
 }
@@ -1351,7 +1378,7 @@ pq_is_send_pending(void)
 
 
 /* --------------------------------
- *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
+ *		socket_putmessage - send a normal message (suppressed in COPY OUT mode)
  *
  *		If msgtype is not '\0', it is a message type code to place before
  *		the message body.  If msgtype is '\0', then the message has no type
@@ -1375,8 +1402,8 @@ pq_is_send_pending(void)
  *		returns 0 if OK, EOF if trouble
  * --------------------------------
  */
-int
-pq_putmessage(char msgtype, const char *s, size_t len)
+static int
+socket_putmessage(char msgtype, const char *s, size_t len)
 {
 	if (DoingCopyOut || PqCommBusy)
 		return 0;
@@ -1408,8 +1435,8 @@ fail:
  *		If the output buffer is too small to hold the message, the buffer
  *		is enlarged.
  */
-void
-pq_putmessage_noblock(char msgtype, const char *s, size_t len)
+static void
+socket_putmessage_noblock(char msgtype, const char *s, size_t len)
 {
 	int res		PG_USED_FOR_ASSERTS_ONLY;
 	int			required;
@@ -1431,18 +1458,18 @@ pq_putmessage_noblock(char msgtype, const char *s, size_t len)
 
 
 /* --------------------------------
- *		pq_startcopyout - inform libpq that an old-style COPY OUT transfer
+ *		socket_startcopyout - inform libpq that an old-style COPY OUT transfer
  *			is beginning
  * --------------------------------
  */
-void
-pq_startcopyout(void)
+static void
+socket_startcopyout(void)
 {
 	DoingCopyOut = true;
 }
 
 /* --------------------------------
- *		pq_endcopyout	- end an old-style COPY OUT transfer
+ *		socket_endcopyout	- end an old-style COPY OUT transfer
  *
  *		If errorAbort is indicated, we are aborting a COPY OUT due to an error,
  *		and must send a terminator line.  Since a partial data line might have
@@ -1451,8 +1478,8 @@ pq_startcopyout(void)
  *		not allow binary transfers, so a textual terminator is always correct.
  * --------------------------------
  */
-void
-pq_endcopyout(bool errorAbort)
+static void
+socket_endcopyout(bool errorAbort)
 {
 	if (!DoingCopyOut)
 		return;
@@ -1462,7 +1489,6 @@ pq_endcopyout(bool errorAbort)
 	DoingCopyOut = false;
 }
 
-
 /*
  * Support for TCP Keepalive parameters
  */
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
new file mode 100644
index 0000000..6e6b429
--- /dev/null
+++ b/src/backend/libpq/pqmq.c
@@ -0,0 +1,261 @@
+/*-------------------------------------------------------------------------
+ *
+ * pqmq.c
+ *	  Use the frontend/backend protocol for communication over a shm_mq
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *	src/backend/libpq/pqmq.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+static shm_mq *pq_mq;
+static shm_mq_handle *pq_mq_handle;
+static bool pq_mq_busy = false;
+
+static void mq_comm_reset(void);
+static int	mq_flush(void);
+static int	mq_flush_if_writable(void);
+static bool mq_is_send_pending(void);
+static int	mq_putmessage(char msgtype, const char *s, size_t len);
+static void mq_putmessage_noblock(char msgtype, const char *s, size_t len);
+static void mq_startcopyout(void);
+static void mq_endcopyout(bool errorAbort);
+
+static PQcommMethods PqCommMqMethods = {
+	mq_comm_reset,
+	mq_flush,
+	mq_flush_if_writable,
+	mq_is_send_pending,
+	mq_putmessage,
+	mq_putmessage_noblock,
+	mq_startcopyout,
+	mq_endcopyout
+};
+
+/*
+ * Arrange to redirect frontend/backend protocol messages to a shared-memory
+ * message queue.
+ */
+void
+pq_redirect_to_shm_mq(shm_mq *mq, shm_mq_handle *mqh)
+{
+	PqCommMethods = &PqCommMqMethods;
+	pq_mq = mq;
+	pq_mq_handle = mqh;
+	whereToSendOutput = DestRemote;
+	FrontendProtocol = PG_PROTOCOL_LATEST;
+}
+
+static void
+mq_comm_reset(void)
+{
+	/* Nothing to do. */
+}
+
+static int
+mq_flush(void)
+{
+	/* Nothing to do. */
+	return 0;
+}
+
+static int
+mq_flush_if_writable(void)
+{
+	/* Nothing to do. */
+	return 0;
+}
+
+static bool
+mq_is_send_pending(void)
+{
+	/* There's never anything pending. */
+	return 0;
+}
+
+/*
+ * Transmit a libpq protocol message to the shared memory message queue
+ * selected via pq_mq_handle.  We don't include a length word, because the
+ * receiver will know the length of the message from shm_mq_receive().
+ */
+static int
+mq_putmessage(char msgtype, const char *s, size_t len)
+{
+	shm_mq_iovec	iov[2];
+	shm_mq_result	result;
+
+	/*
+	 * If we're sending a message, and we have to wait because the
+	 * queue is full, and then we get interrupted, and that interrupt
+	 * results in trying to send another message, we respond by detaching
+	 * the queue.  There's no way to return to the original context, but
+	 * even if there were, just queueing the message would amount to
+	 * indefinitely postponing the response to the interrupt.  So we do
+	 * this instead.
+	 */
+	if (pq_mq_busy)
+	{
+		if (pq_mq != NULL)
+			shm_mq_detach(pq_mq);
+		pq_mq = NULL;
+		return EOF;
+	}
+
+	pq_mq_busy = true;
+
+	iov[0].data = &msgtype;
+	iov[0].len = 1;
+	iov[1].data = s;
+	iov[1].len = len;
+
+	Assert(pq_mq_handle != NULL);
+	result = shm_mq_sendv(pq_mq_handle, iov, 2, false);
+
+	pq_mq_busy = false;
+
+	Assert(result == SHM_MQ_SUCCESS || result == SHM_MQ_DETACHED);
+	if (result != SHM_MQ_SUCCESS)
+		return EOF;
+	return 0;
+}
+
+static void
+mq_putmessage_noblock(char msgtype, const char *s, size_t len)
+{
+	/*
+	 * While the shm_mq machinery does support sending a message in
+	 * non-blocking mode, there's currently no way to try sending beginning
+	 * non-blocking mode, there's currently no way to begin sending a
+	 * message without also committing ourselves to completing the
+	 * we don't need it.
+	 */
+	elog(ERROR, "not currently supported");
+}
+
+static void
+mq_startcopyout(void)
+{
+	/* Nothing to do. */
+}
+
+static void
+mq_endcopyout(bool errorAbort)
+{
+	/* Nothing to do. */
+}
+
+/*
+ * Parse an ErrorResponse or NoticeResponse payload and populate an ErrorData
+ * structure with the results.
+ */
+void
+pq_parse_errornotice(StringInfo msg, ErrorData *edata)
+{
+	/* Initialize edata with reasonable defaults. */
+	MemSet(edata, 0, sizeof(ErrorData));
+	edata->elevel = ERROR;
+	edata->assoc_context = CurrentMemoryContext;
+
+	/* Loop over fields and extract each one. */
+	for (;;)
+	{
+		char	code = pq_getmsgbyte(msg);
+		const char *value;
+
+		if (code == '\0')
+		{
+			pq_getmsgend(msg);
+			break;
+		}
+		value = pq_getmsgstring(msg);
+
+		switch (code)
+		{
+			case PG_DIAG_SEVERITY:
+				if (strcmp(value, "DEBUG") == 0)
+					edata->elevel = DEBUG1;	/* or some other DEBUG level */
+				else if (strcmp(value, "LOG") == 0)
+					edata->elevel = LOG;	/* can't be COMMERROR */
+				else if (strcmp(value, "INFO") == 0)
+					edata->elevel = INFO;
+				else if (strcmp(value, "NOTICE") == 0)
+					edata->elevel = NOTICE;
+				else if (strcmp(value, "WARNING") == 0)
+					edata->elevel = WARNING;
+				else if (strcmp(value, "ERROR") == 0)
+					edata->elevel = ERROR;
+				else if (strcmp(value, "FATAL") == 0)
+					edata->elevel = FATAL;
+				else if (strcmp(value, "PANIC") == 0)
+					edata->elevel = PANIC;
+				else
+					elog(ERROR, "unknown error severity");
+				break;
+			case PG_DIAG_SQLSTATE:
+				if (strlen(value) != 5)
+					elog(ERROR, "malformed sql state");
+				edata->sqlerrcode = MAKE_SQLSTATE(value[0], value[1], value[2],
+												  value[3], value[4]);
+				break;
+			case PG_DIAG_MESSAGE_PRIMARY:
+				edata->message = pstrdup(value);
+				break;
+			case PG_DIAG_MESSAGE_DETAIL:
+				edata->detail = pstrdup(value);
+				break;
+			case PG_DIAG_MESSAGE_HINT:
+				edata->hint = pstrdup(value);
+				break;
+			case PG_DIAG_STATEMENT_POSITION:
+				edata->cursorpos = pg_atoi(value, sizeof(int), '\0');
+				break;
+			case PG_DIAG_INTERNAL_POSITION:
+				edata->internalpos = pg_atoi(value, sizeof(int), '\0');
+				break;
+			case PG_DIAG_INTERNAL_QUERY:
+				edata->internalquery = pstrdup(value);
+				break;
+			case PG_DIAG_CONTEXT:
+				edata->context = pstrdup(value);
+				break;
+			case PG_DIAG_SCHEMA_NAME:
+				edata->schema_name = pstrdup(value);
+				break;
+			case PG_DIAG_TABLE_NAME:
+				edata->table_name = pstrdup(value);
+				break;
+			case PG_DIAG_COLUMN_NAME:
+				edata->column_name = pstrdup(value);
+				break;
+			case PG_DIAG_DATATYPE_NAME:
+				edata->datatype_name = pstrdup(value);
+				break;
+			case PG_DIAG_CONSTRAINT_NAME:
+				edata->constraint_name = pstrdup(value);
+				break;
+			case PG_DIAG_SOURCE_FILE:
+				edata->filename = pstrdup(value);
+				break;
+			case PG_DIAG_SOURCE_LINE:
+				edata->lineno = pg_atoi(value, sizeof(int), '\0');
+				break;
+			case PG_DIAG_SOURCE_FUNCTION:
+				edata->funcname = pstrdup(value);
+				break;
+			default:
+				elog(ERROR, "unknown error field: %d", (int) code);
+				break;
+		}
+	}
+}
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index ca5a8a5..1d13363 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -34,7 +34,7 @@
  * overflow.
  */
 int32
-pg_atoi(char *s, int size, int c)
+pg_atoi(const char *s, int size, int c)
 {
 	long		l;
 	char	   *badp;
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 32a9663..2316464 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1577,6 +1577,57 @@ FlushErrorState(void)
 }
 
 /*
+ * ThrowErrorData --- report an error described by an ErrorData structure
+ *
+ * This is intended to be used to re-report errors originally thrown by
+ * background worker processes and then propagated (with or without
+ * modification) to the backend responsible for them.
+ */
+void
+ThrowErrorData(ErrorData *edata)
+{
+	ErrorData *newedata;
+	MemoryContext	oldcontext;
+
+	if (!errstart(edata->elevel, edata->filename, edata->lineno,
+				  edata->funcname, NULL))
+		return;
+
+	newedata = &errordata[errordata_stack_depth];
+	oldcontext = MemoryContextSwitchTo(edata->assoc_context);
+
+	/* Copy the supplied fields to the error stack. */
+	if (edata->sqlerrcode > 0)
+		newedata->sqlerrcode = edata->sqlerrcode;
+	if (edata->message)
+		newedata->message = pstrdup(edata->message);
+	if (edata->detail)
+		newedata->detail = pstrdup(edata->detail);
+	if (edata->detail_log)
+		newedata->detail_log = pstrdup(edata->detail_log);
+	if (edata->hint)
+		newedata->hint = pstrdup(edata->hint);
+	if (edata->context)
+		newedata->context = pstrdup(edata->context);
+	if (edata->schema_name)
+		newedata->schema_name = pstrdup(edata->schema_name);
+	if (edata->table_name)
+		newedata->table_name = pstrdup(edata->table_name);
+	if (edata->column_name)
+		newedata->column_name = pstrdup(edata->column_name);
+	if (edata->datatype_name)
+		newedata->datatype_name = pstrdup(edata->datatype_name);
+	if (edata->constraint_name)
+		newedata->constraint_name = pstrdup(edata->constraint_name);
+	if (edata->internalquery)
+		newedata->internalquery = pstrdup(edata->internalquery);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	errfinish(0);
+}
+
+/*
  * ReThrowError --- re-throw a previously copied error
  *
  * A handler can do CopyErrorData/FlushErrorState to get out of the error
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 5da9d8d..409f3d7 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -37,6 +37,31 @@ typedef struct
 	}			u;
 } PQArgBlock;
 
+typedef struct
+{
+	void (*comm_reset)(void);
+	int	(*flush)(void);
+	int	(*flush_if_writable)(void);
+	bool (*is_send_pending)(void);
+	int	(*putmessage)(char msgtype, const char *s, size_t len);
+	void (*putmessage_noblock)(char msgtype, const char *s, size_t len);
+	void (*startcopyout)(void);
+	void (*endcopyout)(bool errorAbort);
+} PQcommMethods;
+
+extern PQcommMethods *PqCommMethods;
+
+#define pq_comm_reset()	(PqCommMethods->comm_reset())
+#define pq_flush() (PqCommMethods->flush())
+#define pq_flush_if_writable() (PqCommMethods->flush_if_writable())
+#define pq_is_send_pending() (PqCommMethods->is_send_pending())
+#define pq_putmessage(msgtype, s, len) \
+	(PqCommMethods->putmessage(msgtype, s, len))
+#define pq_putmessage_noblock(msgtype, s, len) \
+	(PqCommMethods->putmessage_noblock(msgtype, s, len))
+#define pq_startcopyout() (PqCommMethods->startcopyout())
+#define pq_endcopyout(errorAbort) (PqCommMethods->endcopyout(errorAbort))
+
 /*
  * External functions.
  */
@@ -51,7 +76,6 @@ extern int	StreamConnection(pgsocket server_fd, Port *port);
 extern void StreamClose(pgsocket sock);
 extern void TouchSocketFiles(void);
 extern void pq_init(void);
-extern void pq_comm_reset(void);
 extern int	pq_getbytes(char *s, size_t len);
 extern int	pq_getstring(StringInfo s);
 extern int	pq_getmessage(StringInfo s, int maxlen);
@@ -59,13 +83,6 @@ extern int	pq_getbyte(void);
 extern int	pq_peekbyte(void);
 extern int	pq_getbyte_if_available(unsigned char *c);
 extern int	pq_putbytes(const char *s, size_t len);
-extern int	pq_flush(void);
-extern int	pq_flush_if_writable(void);
-extern bool pq_is_send_pending(void);
-extern int	pq_putmessage(char msgtype, const char *s, size_t len);
-extern void pq_putmessage_noblock(char msgtype, const char *s, size_t len);
-extern void pq_startcopyout(void);
-extern void pq_endcopyout(bool errorAbort);
 
 /*
  * prototypes for functions in be-secure.c
@@ -75,6 +92,9 @@ extern char *ssl_key_file;
 extern char *ssl_ca_file;
 extern char *ssl_crl_file;
 
+extern int	(*pq_putmessage_hook)(char msgtype, const char *s, size_t len);
+extern int  (*pq_flush_hook)(void);
+
 extern int	secure_initialize(void);
 extern bool secure_loaded_verify_locations(void);
 extern void secure_destroy(void);
diff --git a/src/include/libpq/pqmq.h b/src/include/libpq/pqmq.h
new file mode 100644
index 0000000..6bb24d9
--- /dev/null
+++ b/src/include/libpq/pqmq.h
@@ -0,0 +1,22 @@
+/*-------------------------------------------------------------------------
+ *
+ * pqmq.h
+ *	  Use the frontend/backend protocol for communication over a shm_mq
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/libpq/pqmq.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PQMQ_H
+#define PQMQ_H
+
+#include "storage/shm_mq.h"
+
+extern void	pq_redirect_to_shm_mq(shm_mq *, shm_mq_handle *);
+
+extern void pq_parse_errornotice(StringInfo str, ErrorData *edata);
+
+#endif   /* PQMQ_H */
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index fb1b4a4..4e74d85 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -285,7 +285,7 @@ extern Datum current_schema(PG_FUNCTION_ARGS);
 extern Datum current_schemas(PG_FUNCTION_ARGS);
 
 /* numutils.c */
-extern int32 pg_atoi(char *s, int size, int c);
+extern int32 pg_atoi(const char *s, int size, int c);
 extern void pg_itoa(int16 i, char *a);
 extern void pg_ltoa(int32 l, char *a);
 extern void pg_lltoa(int64 ll, char *a);
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 92073be..87438b8 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -415,6 +415,7 @@ extern ErrorData *CopyErrorData(void);
 extern void FreeErrorData(ErrorData *edata);
 extern void FlushErrorState(void);
 extern void ReThrowError(ErrorData *edata) __attribute__((noreturn));
+extern void ThrowErrorData(ErrorData *edata);
 extern void pg_re_throw(void) __attribute__((noreturn));
 
 extern char *GetErrorContextStack(void);
Attachment: 0004-Add-infrastructure-to-save-and-restore-GUC-values.patch (text/x-patch; charset=US-ASCII)
From 59d3cb93c566368d9cc9dd26b3ffd0924c98ee36 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 17 Jul 2014 07:58:32 -0400
Subject: [PATCH 4/6] Add infrastructure to save and restore GUC values.

Amit Khandekar and Noah Misch, with a bit of further hacking by me.
---
 src/backend/utils/misc/guc.c |  397 ++++++++++++++++++++++++++++++++++++++++++
 src/include/utils/guc.h      |    5 +
 2 files changed, 402 insertions(+)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6c52db8..cb324f3 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -117,6 +117,12 @@
 #define S_PER_D (60 * 60 * 24)
 #define MS_PER_D (1000 * 60 * 60 * 24)
 
+/*
+ * Precision with which REAL type guc values are to be printed for GUC
+ * serialization.
+ */
+#define REALTYPE_PRECISION 17
+
 /* XXX these should appear in other modules' header files */
 extern bool Log_disconnections;
 extern int	CommitDelay;
@@ -146,6 +152,10 @@ char	   *GUC_check_errmsg_string;
 char	   *GUC_check_errdetail_string;
 char	   *GUC_check_errhint_string;
 
+static void
+do_serialize(char **destptr, Size *maxbytes, const char *fmt,...)
+/* This lets gcc check the format string for consistency. */
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 4)));
 
 static void set_config_sourcefile(const char *name, char *sourcefile,
 					  int sourceline);
@@ -8409,6 +8419,393 @@ read_nondefault_variables(void)
 }
 #endif   /* EXEC_BACKEND */
 
+/*
+ * can_skip_gucvar:
+ * When serializing, determine whether to skip this GUC.  When restoring, the
+ * negation of this test determines whether to restore the compiled-in default
+ * value before processing serialized values.
+ *
+ * A PGC_S_DEFAULT setting on the serialize side will typically match new
+ * postmaster children, but that can be false when got_SIGHUP == true and the
+ * pending configuration change modifies this setting.  Nonetheless, we omit
+ * PGC_S_DEFAULT settings from serialization and make up for that by restoring
+ * defaults before applying serialized values.
+ *
+ * PGC_POSTMASTER variables always have the same value in every child of a
+ * particular postmaster.  Most PGC_INTERNAL variables are compile-time
+ * constants; a few, like server_encoding and lc_ctype, are handled specially
+ * outside the serialize/restore procedure.  Therefore, SerializeGUCState()
+ * never sends these, and RestoreGUCState() never changes them.
+ */
+static bool
+can_skip_gucvar(struct config_generic * gconf)
+{
+	return gconf->context == PGC_POSTMASTER ||
+		gconf->context == PGC_INTERNAL || gconf->source == PGC_S_DEFAULT;
+}
+
+/*
+ * estimate_variable_size:
+ * Estimate max size for dumping the given GUC variable.
+ */
+static Size
+estimate_variable_size(struct config_generic * gconf)
+{
+	Size		size;
+	Size		valsize;
+
+	if (can_skip_gucvar(gconf))
+		return 0;
+
+	size = 0;
+
+	size = add_size(size, strlen(gconf->name) + 1);
+
+	/* Get the maximum display length of the GUC value. */
+	switch (gconf->vartype)
+	{
+		case PGC_BOOL:
+			{
+				valsize = 5;	/* max(strlen('true'), strlen('false')) */
+			}
+			break;
+
+		case PGC_INT:
+			{
+				struct config_int *conf = (struct config_int *) gconf;
+
+				/*
+				 * Instead of getting the exact display length, use max
+				 * length.  Also reduce the max length for typical ranges of
+				 * small values.  Maximum value is 2147483647, i.e. 10 chars.
+				 * Include one byte for sign.
+				 */
+				if (abs(*conf->variable) < 1000)
+					valsize = 3 + 1;
+				else
+					valsize = 10 + 1;
+			}
+			break;
+
+		case PGC_REAL:
+			{
+				/*
+				 * We are going to print it with %.17g. Account for sign,
+				 * decimal point, and e+nnn notation. E.g.
+				 * -3.9932904234000002e+110
+				 */
+				valsize = REALTYPE_PRECISION + 1 + 1 + 5;
+			}
+			break;
+
+		case PGC_STRING:
+			{
+				struct config_string *conf = (struct config_string *) gconf;
+
+				valsize = strlen(*conf->variable);
+			}
+			break;
+
+		case PGC_ENUM:
+			{
+				struct config_enum *conf = (struct config_enum *) gconf;
+
+				valsize = strlen(config_enum_lookup_by_value(conf, *conf->variable));
+			}
+			break;
+	}
+
+	/* Valsize + NULL terminator */
+	size = add_size(size, valsize + 1);
+
+	if (gconf->sourcefile)
+		size = add_size(size, strlen(gconf->sourcefile));
+
+	/* NULL char */
+	size = add_size(size, 1);
+
+	/* Include line whenever we include file. */
+	if (gconf->sourcefile)
+		size = add_size(size, sizeof(gconf->sourceline));
+
+	size = add_size(size, sizeof(gconf->source));
+	size = add_size(size, sizeof(gconf->scontext));
+
+	return size;
+}
+
+/*
+ * EstimateGUCStateSpace:
+ * Returns the size needed to store the GUC state for the current process
+ */
+Size
+EstimateGUCStateSpace(void)
+{
+	Size		size;
+	int			i;
+
+	/* Add space required for saving the data size of the guc state */
+	size = sizeof(Size);
+
+	/* Add up the space needed for each GUC variable */
+	for (i = 0; i < num_guc_variables; i++)
+		size = add_size(size,
+						estimate_variable_size(guc_variables[i]));
+
+	return size;
+}
+
+/*
+ * do_serialize:
+ * Copies the formatted string into the destination.  Moves ahead the
+ * destination pointer, and decrements the maxbytes by that many bytes. If
+ * maxbytes is not sufficient to copy the string, error out.
+ */
+static void
+do_serialize(char **destptr, Size *maxbytes, const char *fmt,...)
+{
+	va_list		vargs;
+	int			n;
+
+	if (*maxbytes <= 0)
+		elog(ERROR, "not enough space to serialize GUC state");
+
+	va_start(vargs, fmt);
+	n = vsnprintf(*destptr, *maxbytes, fmt, vargs);
+	va_end(vargs);
+
+	/*
+	 * Cater to portability hazards in the vsnprintf() return value just like
+	 * appendPQExpBufferVA() does.  Note that this requires an extra byte of
+	 * slack at the end of the buffer.  Since serialize_variable() ends with a
+	 * do_serialize_binary() rather than a do_serialize(), we'll always have
+	 * that slack; estimate_variable_size() need not add a byte for it.
+	 */
+	if (n < 0 || n >= *maxbytes - 1)
+	{
+		if (n < 0 && errno != 0 && errno != ENOMEM)
+			/* Shouldn't happen. Better show errno description. */
+			elog(ERROR, "vsnprintf failed: %m");
+		else
+			elog(ERROR, "not enough space to serialize GUC state");
+	}
+
+	/* Shift the destptr ahead of the null terminator */
+	*destptr += n + 1;
+	*maxbytes -= n + 1;
+}
+
+/* Binary copy version of do_serialize() */
+static void
+do_serialize_binary(char **destptr, Size *maxbytes, void *val, Size valsize)
+{
+	if (valsize > *maxbytes)
+		elog(ERROR, "not enough space to serialize GUC state");
+
+	memcpy(*destptr, val, valsize);
+	*destptr += valsize;
+	*maxbytes -= valsize;
+}
+
+/*
+ * serialize_variable:
+ * Dumps name, value and other information of a GUC variable into destptr.
+ */
+static void
+serialize_variable(char **destptr, Size *maxbytes,
+				   struct config_generic * gconf)
+{
+	if (can_skip_gucvar(gconf))
+		return;
+
+	do_serialize(destptr, maxbytes, "%s", gconf->name);
+
+	switch (gconf->vartype)
+	{
+		case PGC_BOOL:
+			{
+				struct config_bool *conf = (struct config_bool *) gconf;
+
+				do_serialize(destptr, maxbytes,
+							 (*conf->variable ? "true" : "false"));
+			}
+			break;
+
+		case PGC_INT:
+			{
+				struct config_int *conf = (struct config_int *) gconf;
+
+				do_serialize(destptr, maxbytes, "%d", *conf->variable);
+			}
+			break;
+
+		case PGC_REAL:
+			{
+				struct config_real *conf = (struct config_real *) gconf;
+
+				do_serialize(destptr, maxbytes, "%.*g",
+							 REALTYPE_PRECISION, *conf->variable);
+			}
+			break;
+
+		case PGC_STRING:
+			{
+				struct config_string *conf = (struct config_string *) gconf;
+
+				do_serialize(destptr, maxbytes, "%s", *conf->variable);
+			}
+			break;
+
+		case PGC_ENUM:
+			{
+				struct config_enum *conf = (struct config_enum *) gconf;
+
+				do_serialize(destptr, maxbytes, "%s",
+						 config_enum_lookup_by_value(conf, *conf->variable));
+			}
+			break;
+	}
+
+	do_serialize(destptr, maxbytes, "%s",
+				 (gconf->sourcefile ? gconf->sourcefile : ""));
+
+	if (gconf->sourcefile)
+		do_serialize_binary(destptr, maxbytes, &gconf->sourceline,
+							sizeof(gconf->sourceline));
+
+	do_serialize_binary(destptr, maxbytes, &gconf->source,
+						sizeof(gconf->source));
+	do_serialize_binary(destptr, maxbytes, &gconf->scontext,
+						sizeof(gconf->scontext));
+}
+
+/*
+ * SerializeGUCState:
+ * Dumps the complete GUC state onto the memory location at start_address.
+ */
+void
+SerializeGUCState(Size maxsize, char *start_address)
+{
+	char	   *curptr;
+	Size		actual_size;
+	Size		bytes_left;
+	int			i;
+	int			i_role;
+
+	/* Reserve space for saving the actual size of the guc state */
+	curptr = start_address + sizeof(actual_size);
+	bytes_left = maxsize - sizeof(actual_size);
+
+	for (i = 0; i < num_guc_variables; i++)
+	{
+		/*
+		 * It's pretty ugly, but we've got to force "role" to be initialized
+		 * after "session_authorization"; otherwise, the latter will override
+		 * the former.
+		 */
+		if (strcmp(guc_variables[i]->name, "role") == 0)
+			i_role = i;
+		else
+			serialize_variable(&curptr, &bytes_left, guc_variables[i]);
+	}
+	serialize_variable(&curptr, &bytes_left, guc_variables[i_role]);
+
+	/* Store actual size without assuming alignment of start_address. */
+	actual_size = maxsize - bytes_left - sizeof(actual_size);
+	memcpy(start_address, &actual_size, sizeof(actual_size));
+}
+
+/*
+ * read_gucstate:
+ * Returns the string beginning at *srcptr without copying it, and advances
+ * *srcptr past the terminating NUL so that the caller is positioned to read
+ * the next item.
+ */
+static char *
+read_gucstate(char **srcptr, char *srcend)
+{
+	char	   *retptr = *srcptr;
+	char	   *ptr;
+
+	if (*srcptr >= srcend)
+		elog(ERROR, "incomplete GUC state");
+
+	/* The string variables are all null terminated */
+	for (ptr = *srcptr; ptr < srcend && *ptr != '\0'; ptr++)
+		;
+
+	if (ptr >= srcend)
+		elog(ERROR, "could not find null terminator in GUC state");
+
+	/* Set the new position at the position after the NULL character */
+	*srcptr = ptr + 1;
+
+	return retptr;
+}
+
+/* Binary read version of read_gucstate(). Copies into dest */
+static void
+read_gucstate_binary(char **srcptr, char *srcend, void *dest, Size size)
+{
+	if (*srcptr + size > srcend)
+		elog(ERROR, "incomplete GUC state");
+
+	memcpy(dest, *srcptr, size);
+	*srcptr += size;
+}
+
+/*
+ * RestoreGUCState:
+ * Reads the GUC state at the specified address and updates the GUCs with the
+ * values read from the GUC state.
+ */
+void
+RestoreGUCState(void *gucstate)
+{
+	char	   *varname,
+			   *varvalue,
+			   *varsourcefile;
+	int			varsourceline;
+	GucSource	varsource;
+	GucContext	varscontext;
+	char	   *srcptr = (char *) gucstate;
+	char	   *srcend;
+	Size		len;
+	int			i;
+
+	/* See comment at can_skip_gucvar(). */
+	for (i = 0; i < num_guc_variables; i++)
+		if (!can_skip_gucvar(guc_variables[i]))
+			InitializeOneGUCOption(guc_variables[i]);
+
+	/* First item is the length of the subsequent data */
+	memcpy(&len, gucstate, sizeof(len));
+
+	srcptr += sizeof(len);
+	srcend = srcptr + len;
+
+	while (srcptr < srcend)
+	{
+		if ((varname = read_gucstate(&srcptr, srcend)) == NULL)
+			break;
+
+		varvalue = read_gucstate(&srcptr, srcend);
+		varsourcefile = read_gucstate(&srcptr, srcend);
+		if (varsourcefile[0])
+			read_gucstate_binary(&srcptr, srcend,
+								 &varsourceline, sizeof(varsourceline));
+		read_gucstate_binary(&srcptr, srcend,
+							 &varsource, sizeof(varsource));
+		read_gucstate_binary(&srcptr, srcend,
+							 &varscontext, sizeof(varscontext));
+
+		(void) set_config_option(varname, varvalue,
+								 varscontext, varsource,
+								 GUC_ACTION_SET, true, 0);
+		if (varsourcefile[0])
+			set_config_sourcefile(varname, varsourcefile, varsourceline);
+	}
+}
 
 /*
  * A little "long argument" simulation, although not quite GNU
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 0a729c1..47cac5c 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -351,6 +351,11 @@ extern void write_nondefault_variables(GucContext context);
 extern void read_nondefault_variables(void);
 #endif
 
+/* GUC serialization */
+extern Size EstimateGUCStateSpace(void);
+extern void SerializeGUCState(Size maxsize, char *start_address);
+extern void RestoreGUCState(void *gucstate);
+
 /* Support for messages reported from GUC check hooks */
 
 extern PGDLLIMPORT char *GUC_check_errmsg_string;
-- 
1.7.9.6 (Apple Git-31.1)

Attachment: 0006-pg_background-Run-commands-in-a-background-worker-an-v3.patch (text/x-patch; charset=US-ASCII)
commit 5d550ec6627d25932bebaedf589e6928b6f21ea1
Author: Robert Haas <rhaas@postgresql.org>
Date:   Fri Jul 11 09:53:40 2014 -0400

    pg_background: Run commands in a background worker, and get the results.
    
    The currently-active GUC values from the user session will be copied
    to the background worker.  If the command returns a result set, you
    can retrieve the result set; if not, you can retrieve the command
    tags.  If the command fails with an error, the same error will be
    thrown in the launching process when the results are retrieved.
    Warnings and other messages generated by the background worker, and
    notifications received by it, are also propagated to the foreground
    process.
    
    Patch by me; review by Amit Kapila and Andres Freund.
    
    V2: Refactor to reduce differences with exec_simple_query; other
    cleanups per Amit.
    
    V3: Per Andres, fix whitespace damage and use NameData instead of
    char[NAMEDATALEN].

diff --git a/contrib/Makefile b/contrib/Makefile
index b37d0dd..11d6116 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
 		pageinspect	\
 		passwordcheck	\
 		pg_archivecleanup \
+		pg_background \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
diff --git a/contrib/pg_background/Makefile b/contrib/pg_background/Makefile
new file mode 100644
index 0000000..c4e717d
--- /dev/null
+++ b/contrib/pg_background/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_background/Makefile
+
+MODULE_big = pg_background
+OBJS = pg_background.o
+
+EXTENSION = pg_background
+DATA = pg_background--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_background
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_background/pg_background--1.0.sql b/contrib/pg_background/pg_background--1.0.sql
new file mode 100644
index 0000000..5fa5ddb
--- /dev/null
+++ b/contrib/pg_background/pg_background--1.0.sql
@@ -0,0 +1,17 @@
+/* contrib/pg_background/pg_background--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_background" to load this file. \quit
+
+CREATE FUNCTION pg_background_launch(sql pg_catalog.text,
+					   queue_size pg_catalog.int4 DEFAULT 65536)
+    RETURNS pg_catalog.int4 STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_result(pid pg_catalog.int4)
+    RETURNS SETOF pg_catalog.record STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_detach(pid pg_catalog.int4)
+    RETURNS pg_catalog.void STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/contrib/pg_background/pg_background.c b/contrib/pg_background/pg_background.c
new file mode 100644
index 0000000..f807781
--- /dev/null
+++ b/contrib/pg_background/pg_background.c
@@ -0,0 +1,935 @@
+/*--------------------------------------------------------------------------
+ *
+ * pg_background.c
+ *		Run SQL commands using a background worker.
+ *
+ * Copyright (C) 2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pg_background/pg_background.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/printtup.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "commands/async.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
+#include "miscadmin.h"
+#include "parser/analyze.h"
+#include "pgstat.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/shm_mq.h"
+#include "storage/shm_toc.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/snapmgr.h"
+#include "utils/timeout.h"
+
+/* Table-of-contents constants for our dynamic shared memory segment. */
+#define PG_BACKGROUND_MAGIC				0x50674267
+#define PG_BACKGROUND_KEY_FIXED_DATA	0
+#define PG_BACKGROUND_KEY_SQL			1
+#define PG_BACKGROUND_KEY_GUC			2
+#define PG_BACKGROUND_KEY_QUEUE			3
+#define PG_BACKGROUND_NKEYS				4
+
+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct pg_background_fixed_data
+{
+	Oid	database_id;
+	Oid	authenticated_user_id;
+	Oid	current_user_id;
+	int	sec_context;
+	NameData	database;
+	NameData	authenticated_user;
+} pg_background_fixed_data;
+
+/* Private state maintained by the launching backend for IPC. */
+typedef struct pg_background_worker_info
+{
+	pid_t		pid;
+	dsm_segment *seg;
+	BackgroundWorkerHandle *handle;
+	shm_mq_handle *responseq;
+	bool		consumed;
+} pg_background_worker_info;
+
+/* Private state maintained across calls to pg_background_result. */
+typedef struct pg_background_result_state
+{
+	pg_background_worker_info *info;
+	FmgrInfo   *receive_functions;
+	Oid		   *typioparams;
+	bool		has_row_description;
+	List	   *command_tags;
+	bool		complete;
+} pg_background_result_state;
+
+static HTAB *worker_hash;
+
+static void cleanup_worker_info(dsm_segment *, Datum pid_datum);
+static pg_background_worker_info *find_worker_info(pid_t pid);
+static void save_worker_info(pid_t pid, dsm_segment *seg,
+				 BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq);
+
+static HeapTuple form_result_tuple(pg_background_result_state *state,
+								   TupleDesc tupdesc, StringInfo msg);
+
+static void handle_sigterm(SIGNAL_ARGS);
+static void execute_sql_string(const char *sql);
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(pg_background_launch);
+PG_FUNCTION_INFO_V1(pg_background_result);
+PG_FUNCTION_INFO_V1(pg_background_detach);
+
+void pg_background_worker_main(Datum);
+
+/*
+ * Start a dynamic background worker to run a user-specified SQL command.
+ */
+Datum
+pg_background_launch(PG_FUNCTION_ARGS)
+{
+	text	   *sql = PG_GETARG_TEXT_PP(0);
+	int32		queue_size = PG_GETARG_INT32(1);
+	int32		sql_len = VARSIZE_ANY_EXHDR(sql);
+	Size		guc_len;
+	Size		segsize;
+	dsm_segment *seg;
+	shm_toc_estimator e;
+	shm_toc    *toc;
+	char	   *sqlp;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *worker_handle;
+	pg_background_fixed_data *fdata;
+	pid_t		pid;
+	shm_mq_handle *responseq;
+	MemoryContext	oldcontext;
+
+	/* Ensure a valid queue size. */
+	if (queue_size < 0 || ((uint64) queue_size) < shm_mq_minimum_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("queue size must be at least %zu bytes",
+						shm_mq_minimum_size)));
+
+	/* Create dynamic shared memory and table of contents. */
+	shm_toc_initialize_estimator(&e);
+	shm_toc_estimate_chunk(&e, sizeof(pg_background_fixed_data));
+	shm_toc_estimate_chunk(&e, sql_len + 1);
+	guc_len = EstimateGUCStateSpace();
+	shm_toc_estimate_chunk(&e, guc_len);
+	shm_toc_estimate_chunk(&e, (Size) queue_size);
+	shm_toc_estimate_keys(&e, PG_BACKGROUND_NKEYS);
+	segsize = shm_toc_estimate(&e);
+	seg = dsm_create(segsize);
+	toc = shm_toc_create(PG_BACKGROUND_MAGIC, dsm_segment_address(seg),
+						 segsize);
+
+	/* Store fixed-size data in dynamic shared memory. */
+	fdata = shm_toc_allocate(toc, sizeof(pg_background_fixed_data));
+	fdata->database_id = MyDatabaseId;
+	fdata->authenticated_user_id = GetAuthenticatedUserId();
+	GetUserIdAndSecContext(&fdata->current_user_id, &fdata->sec_context);
+	namestrcpy(&fdata->database, get_database_name(MyDatabaseId));
+	namestrcpy(&fdata->authenticated_user,
+			   GetUserNameFromId(fdata->authenticated_user_id));
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_FIXED_DATA, fdata);
+
+	/* Store SQL query in dynamic shared memory. */
+	sqlp = shm_toc_allocate(toc, sql_len + 1);
+	memcpy(sqlp, VARDATA(sql), sql_len);
+	sqlp[sql_len] = '\0';
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_SQL, sqlp);
+
+	/* Store GUC state in dynamic shared memory. */
+	gucstate = shm_toc_allocate(toc, guc_len);
+	SerializeGUCState(guc_len, gucstate);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_GUC, gucstate);
+
+	/* Establish message queue in dynamic shared memory. */
+	mq = shm_mq_create(shm_toc_allocate(toc, (Size) queue_size),
+					   (Size) queue_size);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_QUEUE, mq);
+	shm_mq_set_receiver(mq, MyProc);
+
+	/*
+	 * Attach the queue before launching a worker, so that we'll automatically
+	 * detach the queue if we error out.  (Otherwise, the worker might sit
+	 * there trying to write the queue long after we've gone away.)
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	responseq = shm_mq_attach(mq, seg, NULL);
+	MemoryContextSwitchTo(oldcontext);
+
+	/* Configure a worker. */
+	worker.bgw_flags =
+		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	worker.bgw_main = NULL;		/* new worker might not have library loaded */
+	sprintf(worker.bgw_library_name, "pg_background");
+	sprintf(worker.bgw_function_name, "pg_background_worker_main");
+	snprintf(worker.bgw_name, BGW_MAXLEN,
+			 "pg_background by PID %d", MyProcPid);
+	worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
+	/* set bgw_notify_pid, so we can detect if the worker stops */
+	worker.bgw_notify_pid = MyProcPid;
+
+	/*
+	 * Register the worker.
+	 *
+	 * We switch contexts so that the background worker handle can outlast
+	 * this transaction.
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	if (!RegisterDynamicBackgroundWorker(&worker, &worker_handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not register background process"),
+			 errhint("You may need to increase max_worker_processes.")));
+	MemoryContextSwitchTo(oldcontext);
+	shm_mq_set_handle(responseq, worker_handle);
+
+	/* Wait for the worker to start. */
+	switch (WaitForBackgroundWorkerStartup(worker_handle, &pid))
+	{
+		case BGWH_STARTED:
+			/* Success. */
+			break;
+		case BGWH_STOPPED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("could not start background process"),
+					 errhint("More details may be available in the server log.")));
+			break;
+		case BGWH_POSTMASTER_DIED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("cannot start background processes without postmaster"),
+					 errhint("Kill all remaining database processes and restart the database.")));
+			break;
+		default:
+			elog(ERROR, "unexpected bgworker handle status");
+			break;
+	}
+
+	/* Store the relevant details about this worker for future use. */
+	save_worker_info(pid, seg, worker_handle, responseq);
+
+	/*
+	 * Now that the worker info is saved, we do not need to, and should not,
+	 * automatically detach the segment at resource-owner cleanup time.
+	 */
+	dsm_keep_mapping(seg);
+
+	/* Return the worker's PID. */
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * Retrieve the results of a background query previously launched in this
+ * session.
+ */
+Datum
+pg_background_result(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	shm_mq_result	res;
+	FuncCallContext *funcctx;
+	TupleDesc	tupdesc;
+	StringInfoData	msg;
+	pg_background_result_state *state;
+
+	/* First-time setup. */
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+		pg_background_worker_info *info;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* See if we have a connection to the specified PID. */
+		if ((info = find_worker_info(pid)) == NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("PID %d is not attached to this session", pid)));
+
+		/* Can't read results twice. */
+		if (info->consumed)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("results for PID %d have already been consumed", pid)));
+		info->consumed = true;
+
+		/*
+		 * Whether we succeed or fail, a future invocation of this function
+		 * may not try to read from the DSM once we've begun to do so.
+		 * Accordingly, make arrangements to clean things up at end of query.
+		 */
+		dsm_unkeep_mapping(info->seg);
+
+		/* Set up tuple descriptor based on the column definition list. */
+		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record"),
+					 errhint("Try calling the function in the FROM clause "
+							 "using a column definition list.")));
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* Cache state that will be needed on every call. */
+		state = palloc0(sizeof(pg_background_result_state));
+		state->info = info;
+		if (funcctx->tuple_desc->natts > 0)
+		{
+			int	natts = funcctx->tuple_desc->natts;
+			int	i;
+
+			state->receive_functions = palloc(sizeof(FmgrInfo) * natts);
+			state->typioparams = palloc(sizeof(Oid) * natts);
+
+			for (i = 0;	i < natts; ++i)
+			{
+				Oid	receive_function_id;
+
+				getTypeBinaryInputInfo(funcctx->tuple_desc->attrs[i]->atttypid,
+									   &receive_function_id,
+									   &state->typioparams[i]);
+				fmgr_info(receive_function_id, &state->receive_functions[i]);
+			}
+		}
+		funcctx->user_fctx = state;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+	funcctx = SRF_PERCALL_SETUP();
+	tupdesc = funcctx->tuple_desc;
+	state = funcctx->user_fctx;
+
+	/* Initialize message buffer. */
+	initStringInfo(&msg);
+
+	/* Read and process messages from the shared memory queue. */
+	for (;;)
+	{
+		char		msgtype;
+		Size		nbytes;
+		void	   *data;
+
+		/* Get next message. */
+		res = shm_mq_receive(state->info->responseq, &nbytes, &data, false);
+		if (res != SHM_MQ_SUCCESS)
+			break;
+
+		/*
+		 * Message-parsing routines operate on a null-terminated StringInfo,
+		 * so we must construct one.
+		 */
+		resetStringInfo(&msg);
+		enlargeStringInfo(&msg, nbytes);
+		msg.len = nbytes;
+		memcpy(msg.data, data, nbytes);
+		msg.data[nbytes] = '\0';
+		msgtype = pq_getmsgbyte(&msg);
+
+		/* Dispatch on message type. */
+		switch (msgtype)
+		{
+			case 'E':
+			case 'N':
+				{
+					ErrorData	edata;
+
+					/* Parse ErrorResponse or NoticeResponse. */
+					pq_parse_errornotice(&msg, &edata);
+
+					/*
+					 * Limit the maximum error level to ERROR.  We don't want
+					 * a FATAL inside the background worker to kill the user
+					 * session.
+					 */
+					if (edata.elevel > ERROR)
+						edata.elevel = ERROR;
+
+					/* Rethrow the error. */
+					ThrowErrorData(&edata);
+					break;
+				}
+			case 'A':
+				{
+					/* Propagate NotifyResponse. */
+					pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+					break;
+				}
+			case 'T':
+				{
+					int16	natts = pq_getmsgint(&msg, 2);
+					int16	i;
+
+					if (state->has_row_description)
+						elog(ERROR, "multiple RowDescription messages");
+					state->has_row_description = true;
+					if (natts != tupdesc->natts)
+						ereport(ERROR,
+								(errcode(ERRCODE_DATATYPE_MISMATCH),
+								 errmsg("remote query result rowtype does not match "
+									"the specified FROM clause rowtype")));
+
+					for (i = 0; i < natts; ++i)
+					{
+						Oid		type_id;
+
+						(void) pq_getmsgstring(&msg);	/* name */
+						(void) pq_getmsgint(&msg, 4);	/* table OID */
+						(void) pq_getmsgint(&msg, 2);	/* table attnum */
+						type_id = pq_getmsgint(&msg, 4);	/* type OID */
+						(void) pq_getmsgint(&msg, 2);	/* type length */
+						(void) pq_getmsgint(&msg, 4);	/* typmod */
+						(void) pq_getmsgint(&msg, 2);	/* format code */
+
+						if (type_id != tupdesc->attrs[i]->atttypid)
+							ereport(ERROR,
+									(errcode(ERRCODE_DATATYPE_MISMATCH),
+									 errmsg("remote query result rowtype does not match "
+										"the specified FROM clause rowtype")));
+					}
+
+					pq_getmsgend(&msg);
+
+					break;
+				}
+			case 'D':
+				{
+					/* Handle DataRow message. */
+					HeapTuple	result;
+
+					result = form_result_tuple(state, tupdesc, &msg);
+					SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+				}
+			case 'C':
+				{
+					/* Handle CommandComplete message. */
+					MemoryContext	oldcontext;
+					const char  *tag = pq_getmsgstring(&msg);
+
+					oldcontext =
+						MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+					state->command_tags = lappend(state->command_tags,
+												  pstrdup(tag));
+					MemoryContextSwitchTo(oldcontext);
+					break;
+				}
+			case 'G':
+			case 'H':
+			case 'W':
+				{
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("COPY protocol not allowed in pg_background")));
+				}
+
+			case 'Z':
+				{
+					/* Handle ReadyForQuery message. */
+					state->complete = true;
+					break;
+				}
+			default:
+				elog(WARNING, "unknown message type: %c (%zu bytes)",
+					 msg.data[0], nbytes);
+				break;
+		}
+	}
+
+	/* Check whether the connection was broken prematurely. */
+	if (!state->complete)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONNECTION_FAILURE),
+				 errmsg("lost connection to worker process with PID %d",
+					pid)));
+
+	/* If no data rows, return the command tags instead. */
+	if (!state->has_row_description)
+	{
+		if (tupdesc->natts != 1 || tupdesc->attrs[0]->atttypid != TEXTOID)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query did not return a result set, but "
+							"result rowtype is not a single text column")));
+		if (state->command_tags != NIL)
+		{
+			char *tag = linitial(state->command_tags);
+			Datum	value;
+			bool	isnull;
+			HeapTuple	result;
+
+			state->command_tags = list_delete_first(state->command_tags);
+			value = PointerGetDatum(cstring_to_text(tag));
+			isnull = false;
+			result = heap_form_tuple(tupdesc, &value, &isnull);
+			SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+		}
+	}
+
+	/* We're done! */
+	dsm_detach(state->info->seg);
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Parse a DataRow message and form a result tuple.
+ */
+static HeapTuple
+form_result_tuple(pg_background_result_state *state, TupleDesc tupdesc,
+				  StringInfo msg)
+{
+	/* Handle DataRow message. */
+	int16	natts = pq_getmsgint(msg, 2);
+	int16	i;
+	Datum  *values = NULL;
+	bool   *isnull = NULL;
+	StringInfoData	buf;
+
+	if (!state->has_row_description)
+		elog(ERROR, "DataRow not preceded by RowDescription");
+	if (natts != tupdesc->natts)
+		elog(ERROR, "malformed DataRow");
+	if (natts > 0)
+	{
+		values = palloc(natts * sizeof(Datum));
+		isnull = palloc(natts * sizeof(bool));
+	}
+	initStringInfo(&buf);
+
+	for (i = 0; i < natts; ++i)
+	{
+		int32	bytes = pq_getmsgint(msg, 4);
+
+		if (bytes < 0)
+		{
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											NULL,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = true;
+		}
+		else
+		{
+			resetStringInfo(&buf);
+			appendBinaryStringInfo(&buf, pq_getmsgbytes(msg, bytes), bytes);
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											&buf,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = false;
+		}
+	}
+
+	pq_getmsgend(msg);
+
+	return heap_form_tuple(tupdesc, values, isnull);
+}
+
+/*
+ * Detach from the dynamic shared memory segment used for communication with
+ * a background worker.  This prevents the worker from stalling waiting for
+ * us to read its results.
+ */
+Datum
+pg_background_detach(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	pg_background_worker_info *info;
+
+	info = find_worker_info(pid);
+	if (info == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("PID %d is not attached to this session", pid)));
+	dsm_detach(info->seg);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * When the dynamic shared memory segment associated with a worker is
+ * cleaned up, we need to clean up our associated private data structures.
+ */
+static void
+cleanup_worker_info(dsm_segment *seg, Datum pid_datum)
+{
+	pid_t	pid = DatumGetInt32(pid_datum);
+	bool	found;
+	pg_background_worker_info *info;
+
+	/* Find any worker info entry for this PID.  If none, we're done. */
+	if ((info = find_worker_info(pid)) == NULL)
+		return;
+
+	/* Free memory used by the BackgroundWorkerHandle. */
+	if (info->handle != NULL)
+	{
+		pfree(info->handle);
+		info->handle = NULL;
+	}
+
+	/* Remove the hashtable entry. */
+	hash_search(worker_hash, (void *) &pid, HASH_REMOVE, &found);
+	if (!found)
+		elog(ERROR, "pg_background worker_hash table corrupted");
+}
+
+/*
+ * Find the background worker information for the worker with a given PID.
+ */
+static pg_background_worker_info *
+find_worker_info(pid_t pid)
+{
+	pg_background_worker_info *info = NULL;
+
+	if (worker_hash != NULL)
+		info = hash_search(worker_hash, (void *) &pid, HASH_FIND, NULL);
+
+	return info;
+}
+
+/*
+ * Save worker information for future IPC.
+ */
+static void
+save_worker_info(pid_t pid, dsm_segment *seg, BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq)
+{
+	pg_background_worker_info *info;
+
+	/* If the hash table hasn't been set up yet, do that now. */
+	if (worker_hash == NULL)
+	{
+		HASHCTL	ctl;
+
+		ctl.keysize = sizeof(pid_t);
+		ctl.entrysize = sizeof(pg_background_worker_info);
+		worker_hash = hash_create("pg_background worker_hash", 8, &ctl,
+								  HASH_ELEM);
+	}
+
+	/* Detach any older worker with this PID. */
+	if ((info = find_worker_info(pid)) != NULL)
+		dsm_detach(info->seg);
+
+	/* When the DSM is unmapped, clean everything up. */
+	on_dsm_detach(seg, cleanup_worker_info, Int32GetDatum(pid));
+
+	/* Create a new entry for this worker. */
+	info = hash_search(worker_hash, (void *) &pid, HASH_ENTER, NULL);
+	info->seg = seg;
+	info->handle = handle;
+	info->responseq = responseq;
+	info->consumed = false;
+}
+
+/*
+ * Background worker entrypoint.
+ */
+void
+pg_background_worker_main(Datum main_arg)
+{
+	dsm_segment *seg;
+	shm_toc    *toc;
+	pg_background_fixed_data *fdata;
+	char	   *sql;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	shm_mq_handle *responseq;
+
+	/* Establish signal handlers. */
+	pqsignal(SIGTERM, handle_sigterm);
+	BackgroundWorkerUnblockSignals();
+
+	/* Set up a memory context and resource owner. */
+	Assert(CurrentResourceOwner == NULL);
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "pg_background");
+	CurrentMemoryContext = AllocSetContextCreate(TopMemoryContext,
+												 "pg_background session",
+												 ALLOCSET_DEFAULT_MINSIZE,
+												 ALLOCSET_DEFAULT_INITSIZE,
+												 ALLOCSET_DEFAULT_MAXSIZE);
+
+	/* Connect to the dynamic shared memory segment. */
+	seg = dsm_attach(DatumGetInt32(main_arg));
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("unable to map dynamic shared memory segment")));
+	toc = shm_toc_attach(PG_BACKGROUND_MAGIC, dsm_segment_address(seg));
+	if (toc == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			   errmsg("bad magic number in dynamic shared memory segment")));
+
+	/* Find data structures in dynamic shared memory. */
+	fdata = shm_toc_lookup(toc, PG_BACKGROUND_KEY_FIXED_DATA);
+	sql = shm_toc_lookup(toc, PG_BACKGROUND_KEY_SQL);
+	gucstate = shm_toc_lookup(toc, PG_BACKGROUND_KEY_GUC);
+	mq = shm_toc_lookup(toc, PG_BACKGROUND_KEY_QUEUE);
+	shm_mq_set_sender(mq, MyProc);
+	responseq = shm_mq_attach(mq, seg, NULL);
+
+	/* Redirect protocol messages to responseq. */
+	pq_redirect_to_shm_mq(mq, responseq);
+
+	/*
+	 * Initialize our user and database ID based on the strings version of
+	 * the data, and then go back and check that we actually got the database
+	 * and user ID that we intended to get.  We do this because it's not
+	 * impossible for the process that started us to die before we get here,
+	 * and the user or database could be renamed in the meantime.  We don't
+	 * want to latch on the wrong object by accident.  There should probably
+	 * be a variant of BackgroundWorkerInitializeConnection that accepts OIDs
+	 * rather than strings.
+	 */
+	BackgroundWorkerInitializeConnection(NameStr(fdata->database),
+										 NameStr(fdata->authenticated_user));
+	if (fdata->database_id != MyDatabaseId ||
+		fdata->authenticated_user_id != GetAuthenticatedUserId())
+		ereport(ERROR,
+			(errmsg("user or database renamed during pg_background startup")));
+
+	/* Restore GUC values from launching backend. */
+	StartTransactionCommand();
+	RestoreGUCState(gucstate);
+	CommitTransactionCommand();
+
+	/* Restore user ID and security context. */
+	SetUserIdAndSecContext(fdata->current_user_id, fdata->sec_context);
+
+	/* Prepare to execute the query. */
+	SetCurrentStatementStartTimestamp();
+	debug_query_string = sql;
+	pgstat_report_activity(STATE_RUNNING, sql);
+
+	/* Execute the query. */
+	execute_sql_string(sql);
+
+	/* Post-execution cleanup. */
+	ProcessCompletedNotifies();
+	pgstat_report_activity(STATE_IDLE, sql);
+	pgstat_report_stat(true);
+
+	/* Signal that we are done. */
+	ReadyForQuery(DestRemote);
+}
+
+/*
+ * Execute given SQL string.
+ *
+ * Using SPI here would preclude backgrounding commands like VACUUM which one
+ * might very well wish to launch in the background.  So we do this instead.
+ */
+static void
+execute_sql_string(const char *sql)
+{
+	List	   *raw_parsetree_list;
+	ListCell   *lc1;
+	bool		isTopLevel;
+	int			commands_remaining;
+	MemoryContext	parsecontext;
+	MemoryContext	oldcontext;
+
+	/* Start up a transaction command. */
+	start_xact_command();
+
+	/*
+	 * Parse the SQL string into a list of raw parse trees.
+	 *
+	 * Because we allow statements that perform internal transaction control,
+	 * we can't do this in TopTransactionContext; the parse trees might get
+	 * blown away before we're done executing them.
+	 */
+	parsecontext = AllocSetContextCreate(TopMemoryContext,
+										 "pg_background parse/plan",
+										 ALLOCSET_DEFAULT_MINSIZE,
+										 ALLOCSET_DEFAULT_INITSIZE,
+										 ALLOCSET_DEFAULT_MAXSIZE);
+	oldcontext = MemoryContextSwitchTo(parsecontext);
+	raw_parsetree_list = pg_parse_query(sql);
+	commands_remaining = list_length(raw_parsetree_list);
+	isTopLevel = commands_remaining == 1;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * Do parse analysis, rule rewrite, planning, and execution for each raw
+	 * parsetree.  We must fully execute each query before beginning parse
+	 * analysis on the next one, since there may be interdependencies.
+	 */
+	foreach(lc1, raw_parsetree_list)
+	{
+		Node	   *parsetree = (Node *) lfirst(lc1);
+		const char *commandTag;
+		char        completionTag[COMPLETION_TAG_BUFSIZE];
+		List       *querytree_list,
+				   *plantree_list;
+		bool		snapshot_set = false;
+		Portal		portal;
+		DestReceiver *receiver;
+		int16		format = 1;
+
+		/*
+		 * We don't allow transaction-control commands like COMMIT and ABORT
+		 * here.  The entire SQL statement is executed as a single transaction
+		 * which commits if no errors are encountered.
+		 */
+		if (IsA(parsetree, TransactionStmt))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("transaction control statements are not allowed in pg_background")));
+
+		/*
+		 * Get the command name for use in status display (it also becomes the
+		 * default completion tag, down inside PortalRun).  Set ps_status and
+		 * do any special start-of-SQL-command processing needed by the
+		 * destination.
+		 */
+		commandTag = CreateCommandTag(parsetree);
+		set_ps_display(commandTag, false);
+		BeginCommand(commandTag, DestNone);
+
+		/* Set up a snapshot if parse analysis/planning will need one. */
+		if (analyze_requires_snapshot(parsetree))
+		{
+			PushActiveSnapshot(GetTransactionSnapshot());
+			snapshot_set = true;
+		}
+
+		/*
+		 * OK to analyze, rewrite, and plan this query.
+		 *
+		 * As with parsing, we need to make sure this data outlives the
+		 * transaction, because of the possibility that the statement might
+		 * perform internal transaction control.
+		 */
+		oldcontext = MemoryContextSwitchTo(parsecontext);
+		querytree_list = pg_analyze_and_rewrite(parsetree, sql, NULL, 0);
+		plantree_list = pg_plan_queries(querytree_list, 0, NULL);
+
+		/* Done with the snapshot used for parsing/planning */
+		if (snapshot_set)
+			PopActiveSnapshot();
+
+		/* If we got a cancel signal in analysis or planning, quit */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Execute the query using the unnamed portal.
+		 */
+		portal = CreatePortal("", true, true);
+		/* Don't display the portal in pg_cursors */
+		portal->visible = false;
+		PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
+		PortalStart(portal, NULL, 0, InvalidSnapshot);
+
+		/* We always use binary format, for efficiency. */
+		PortalSetResultFormat(portal, 1, &format);
+
+		/*
+		 * Tuples returned by any command other than the last are simply
+		 * discarded; but those returned by the last (or only) command are
+		 * redirected to the shared memory queue we're using for communication
+		 * with the launching backend. If the launching backend is gone or has
+		 * detached us, these messages will just get dropped on the floor.
+		 */
+		--commands_remaining;
+		if (commands_remaining > 0)
+			receiver = CreateDestReceiver(DestNone);
+		else
+		{
+			receiver = CreateDestReceiver(DestRemote);
+			SetRemoteDestReceiverParams(receiver, portal);
+		}
+
+		/*
+		 * Only once the portal and destreceiver have been established can
+		 * we return to the transaction context.  All that stuff needs to
+		 * survive an internal commit inside PortalRun!
+		 */
+		MemoryContextSwitchTo(oldcontext);
+
+		/* Here's where we actually execute the command. */
+		(void) PortalRun(portal, FETCH_ALL, isTopLevel, receiver, receiver,
+						 completionTag);
+
+		/* Clean up the receiver. */
+		(*receiver->rDestroy) (receiver);
+
+		/* Clean up the portal. */
+		PortalDrop(portal, false);
+
+		/*
+		 * If this is the last parsetree, close down transaction statement
+		 * before reporting CommandComplete.  Otherwise, we need a
+		 * CommandCounterIncrement.
+		 */
+		if (lnext(lc1) == NULL)
+			finish_xact_command();
+		else
+			CommandCounterIncrement();
+
+		/*
+		 * Send a CommandComplete message even if we suppressed the query
+		 * results.  The user backend will report the command tags in the
+		 * absence of any true query results.
+		 */
+		EndCommand(completionTag, DestRemote);
+	}
+
+	/* Make sure there's not still a transaction open. */
+	finish_xact_command();
+}
+
+/*
+ * When we receive a SIGTERM, we set InterruptPending and ProcDiePending just
+ * like a normal backend.  The next CHECK_FOR_INTERRUPTS() will do the right
+ * thing.
+ */
+static void
+handle_sigterm(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	if (!proc_exit_inprogress)
+	{
+		InterruptPending = true;
+		ProcDiePending = true;
+	}
+
+	errno = save_errno;
+}
diff --git a/contrib/pg_background/pg_background.control b/contrib/pg_background/pg_background.control
new file mode 100644
index 0000000..733d0e1
--- /dev/null
+++ b/contrib/pg_background/pg_background.control
@@ -0,0 +1,4 @@
+comment = 'Run SQL queries in the background'
+default_version = '1.0'
+module_pathname = '$libdir/pg_background'
+relocatable = true
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 61f17bf..e44f5fa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -190,8 +190,6 @@ static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
 static int	errdetail_recovery_conflict(void);
-static void start_xact_command(void);
-static void finish_xact_command(void);
 static bool IsTransactionExitStmt(Node *parsetree);
 static bool IsTransactionExitStmtList(List *parseTrees);
 static bool IsTransactionStmtList(List *parseTrees);
@@ -2373,7 +2371,7 @@ exec_describe_portal_message(const char *portal_name)
 /*
  * Convenience routines for starting/committing a single command.
  */
-static void
+void
 start_xact_command(void)
 {
 	if (!xact_started)
@@ -2393,7 +2391,7 @@ start_xact_command(void)
 	}
 }
 
-static void
+void
 finish_xact_command(void)
 {
 	if (xact_started)
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index c23f4da..8e84ee7 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -224,7 +224,7 @@ extern PGPROC *PreparedXactProcs;
 
 /* configurable options */
 extern int	DeadlockTimeout;
-extern int	StatementTimeout;
+extern PGDLLIMPORT int	StatementTimeout;
 extern int	LockTimeout;
 extern bool log_lock_waits;
 
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 60f7532..fd3df58 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -69,6 +69,8 @@ extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from S
 																 * handler */
 extern void prepare_for_client_read(void);
 extern void client_read_ended(void);
+extern void start_xact_command(void);
+extern void finish_xact_command(void);
 extern void process_postgres_switches(int argc, char *argv[],
 						  GucContext ctx, const char **dbname);
 extern void PostgresMain(int argc, char *argv[],
#43Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Haas (#42)
Re: pg_background (and more parallelism infrastructure patches)

On 10/22/2014 04:03 PM, Robert Haas wrote:
> On Wed, Oct 8, 2014 at 6:32 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I got to ask: Why is it helpful that we have this in contrib? I have a
>> good share of blame to bear for that, but I think we need to stop
>> diluting contrib evermore with test programs. These have a place, but I
>> don't think it should be contrib.
>
> I don't think pg_background is merely a test program: I think it's a
> quite useful piece of functionality. It can be used for running
> VACUUM from a procedure, which is something people have asked for more
> than once, or for simulating an autonomous transaction. Granted,
> it'll be a lot slower than a real autonomous transaction, but it's
> still better than doing it via dblink, because you don't have to futz
> with authentication.

I think this is a very useful program but I wonder if it makes more
sense to push it out to pgxn? Why do we need an ever growing contrib?
Further, if it is good enough to go into contrib, why isn't it just in core?

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should
not be surprised when they come back as Romans."

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44Robert Haas
robertmhaas@gmail.com
In reply to: Joshua D. Drake (#43)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Oct 22, 2014 at 7:12 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> I don't think pg_background is merely a test program: I think it's a
>> quite useful piece of functionality. It can be used for running
>> VACUUM from a procedure, which is something people have asked for more
>> than once, or for simulating an autonomous transaction. Granted,
>> it'll be a lot slower than a real autonomous transaction, but it's
>> still better than doing it via dblink, because you don't have to futz
>> with authentication.
>
> I think this is a very useful program but I wonder if it makes more sense to
> push it out to pgxn? Why do we need an ever growing contrib? Further, if it
> is good enough to go into contrib, why isn't it just in core?

Sure, there's always this debate. Certainly, there's no reason it has
to be in contrib rather than core or PGXN. From my point of view,
putting it on PGXN would be a less-than-awesome development because
then the infrastructure that the patch introduces (patches 1-5 of the
series) would have no users in core or contrib, which means that (A)
if anyone is thinking of modifying that code in the future, they'll
have more difficulty knowing whether their changes break anything and
(B) if someone breaks something unwittingly, the buildfarm will not
tell us.

Now, admittedly, I expect that eventually all of this stuff will be
used in core - for parallelism. But in the meantime I think having at
least one example in the PostgreSQL source tree of how each piece of
extensibility infrastructure that we have can be used is handy and
useful and generally something that we ought to encourage. And this
one has the advantage - unlike some stuff in contrib - of actually
being somewhat useful for real work, which a decent amount of the
stuff in contrib isn't.

What about moving it the opposite direction, from contrib all the way
into core? The main advantage of that in my book is that it might
make it easier to reduce the code duplication between pg_background
and exec_simple_query; the main disadvantage is that it reduces the
super-user's control, because now the functionality is always
available instead of being something that the super-user can choose to
install, or not.

But these are all judgement calls, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#45Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Haas (#44)
Re: pg_background (and more parallelism infrastructure patches)

> What about moving it the opposite direction, from contrib all the way
> into core? The main advantage of that in my book is that it might
> make it easier to reduce the code duplication between pg_background
> and exec_simple_query; the main disadvantage is that it reduces the
> super-user's control, because now the functionality is always
> available instead of being something that the super-user can choose to
> install, or not.
>
> But these are all judgement calls, of course.

I lean toward a push into core because:

1. "I expect that eventually all of this stuff will be
used in core - for parallelism."

2. Contrib is already bloated out. (which is a whole other discussion)

3. I am not sure I buy into the super-user argument. Just because the
functionality is there, doesn't mean it has to be used.

4. "The main advantage of that in my book is that it might
make it easier to reduce the code duplication between pg_background
and exec_simple_query;"

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should
not be surprised when they come back as Romans."


#46Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Joshua D. Drake (#45)
Re: pg_background (and more parallelism infrastructure patches)

On 10/22/14, 7:49 PM, Joshua D. Drake wrote:
> I lean toward a push into core because:

+1

> 3. I am not sure I buy into the super-user argument. Just because the functionality is there, doesn't mean it has to be used.

It's a valid concern, but I think the way to handle it if needed is to limit the number of connections a user can open. Or perhaps another option would be to change the permissions on the related functions (do we check ACLs for internal functions?)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#47Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#46)
Re: pg_background (and more parallelism infrastructure patches)

On Fri, Oct 24, 2014 at 1:13 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> It's a valid concern, but I think the way to handle it if needed is to limit
> the number of connections a user can open. Or perhaps another option would
> be to change the permissions on the related functions (do we check ACLs for
> internal functions?)

I'm not sure dump-and-restore would preserve any properties of
anything in pg_catalog.

Anyway, I think we're getting a bit ahead of ourselves here. The
questions I need answers to right now are:

- What should we call dsm_unkeep_mapping, if not that?
- Are there remaining complaints about patch #3?
- How can I get somebody to review patch #4?
- Does anyone have a tangible suggestion for how to reduce the code
duplication in patch #6?

The question of where pg_background should ultimately live does
matter, but the answer will be "the -hackers mailing list archives"
unless we can get agreement on the prerequisite patches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#48Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#47)
Re: pg_background (and more parallelism infrastructure patches)

On 10/24/14, 12:21 PM, Robert Haas wrote:
> On Fri, Oct 24, 2014 at 1:13 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> It's a valid concern, but I think the way to handle it if needed is to limit
>> the number of connections a user can open. Or perhaps another option would
>> be to change the permissions on the related functions (do we check ACLs for
>> internal functions?)
>
> I'm not sure dump-and-restore would preserve any properties of
> anything in pg_catalog.
>
> Anyway, I think we're getting a bit ahead of ourselves here. The
> questions I need answers to right now are:
>
> - What should we call dsm_unkeep_mapping, if not that?
> - Are there remaining complaints about patch #3?
> - How can I get somebody to review patch #4?

Here's a review of patch 4.

Perhaps it would be good to document the serialization format. I at least would have found it helpful, especially when reading estimate_variable_size(). I can take a stab at that if you'd rather not be bothered.

estimate_variable_size():
+				valsize = 5;	/* max(strlen('true'), strlen('false')) */
...
+				if (abs(*conf->variable) < 1000)
+					valsize = 3 + 1;

If we're worried about size, perhaps we should use 1/0 for bool instead of true/false?

On the serialization structure itself, should we be worried about a mismatch between available GUCs on the sender vs the receiver? Presumably if the sender outputs a GUC that the receiver doesn't know about we'll get an error, but what if the sender didn't include something? Maybe not an issue today, but could this cause problems down the road if we wanted to use the serialized data some other way? But maybe I'm just being paranoid. :)
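To make the mismatch scenario concrete, here is a toy standalone sketch of a name/value serialization where the receiving side fails on any variable name it does not recognize. The "name\0value\0" layout and the function names serialize_gucs()/restore_gucs() are hypothetical illustrations for this discussion, not the format the patch actually uses:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy layout: each variable is written as "name\0value\0"; the stream
 * ends with an empty name.  (Illustration only, not the patch's format.)
 */
static size_t
serialize_gucs(char *buf, size_t buflen,
			   const char *names[], const char *values[], int n)
{
	size_t		used = 0;

	for (int i = 0; i < n; i++)
	{
		size_t		need = strlen(names[i]) + strlen(values[i]) + 2;

		assert(used + need + 1 <= buflen);	/* +1 for the terminator */
		strcpy(buf + used, names[i]);
		used += strlen(names[i]) + 1;
		strcpy(buf + used, values[i]);
		used += strlen(values[i]) + 1;
	}
	buf[used++] = '\0';			/* empty name terminates the stream */
	return used;
}

/*
 * Receiver side: apply each pair, returning -1 on a name we don't
 * know -- the sender/receiver mismatch case discussed above.
 */
static int
restore_gucs(const char *buf, const char *known[], int nknown)
{
	while (*buf != '\0')
	{
		const char *name = buf;
		const char *value;
		int			found = 0;

		buf += strlen(buf) + 1;
		value = buf;
		buf += strlen(buf) + 1;

		for (int i = 0; i < nknown; i++)
			if (strcmp(name, known[i]) == 0)
				found = 1;
		if (!found)
			return -1;			/* unknown GUC: report an error */
		printf("SET %s = %s\n", name, value);
	}
	return 0;
}
```

Note that the other direction Jim raises, where the sender simply omits a variable, produces no error at all in a scheme like this: the receiver just keeps its default.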
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#49Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#48)
Re: pg_background (and more parallelism infrastructure patches)

On 10/24/14, 2:23 PM, Jim Nasby wrote:
> On the serialization structure itself, should we be worried about a mismatch between available GUCs on the sender vs the receiver? Presumably if the sender outputs a GUC that the receiver doesn't know about we'll get an error, but what if the sender didn't include something? Maybe not an issue today, but could this cause problems down the road if we wanted to use the serialized data some other way? But maybe I'm just being paranoid. :)

I just realized there's a bigger problem there; this isn't portable against any changes to any of the binary elements.

So I guess it's really a question of would we ever want this to function (as-is) cross-version.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#50Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#48)
Re: pg_background (and more parallelism infrastructure patches)

On Fri, Oct 24, 2014 at 3:23 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Here's a review of patch 4.

Thanks!

> Perhaps it would be good to document the serialization format. I at least
> would have found it helpful, especially when reading
> estimate_variable_size(). I can take a stab at that if you'd rather not be
> bothered.

I find the existing code pretty clear: to my eyes,
serialize_variable() really couldn't be much more transparent. But if
you want to suggest something I'm all ears.

> estimate_variable_size():
> +                               valsize = 5;    /* max(strlen('true'), strlen('false')) */
> ...
> +                               if (abs(*conf->variable) < 1000)
> +                                       valsize = 3 + 1;
>
> If we're worried about size, perhaps we should use 1/0 for bool instead of
> true/false?

I guess we could. I'm not particularly worried about size, myself.
We need all the rigamarole about estimating the amount of space needed
because we can't allocate more space once we begin serializing. If
the DSM where this data is getting stored turns out not to be big
enough, we can't resize it. So we need to *know* how big the data is
going to be, but I don't think we need to *care* very much.
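The two-pass pattern described here -- size everything up front, then write into a buffer that cannot grow -- can be sketched in miniature. This is a simplified stand-in, not the patch's actual estimate_variable_size()/serialize_variable() code; estimate_int_size() and serialize_ints() are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Pass 1: upper-bound the space each value will need once printed. */
static size_t
estimate_int_size(int value)
{
	/* mirror the style quoted above: small ints need few digits */
	if (abs(value) < 1000)
		return 3 + 1 + 1;		/* digits + optional sign + NUL */
	return 11 + 1;				/* enough for any 32-bit int + NUL */
}

/*
 * Pass 2: serialize into a fixed-size buffer (a stand-in for a DSM
 * segment, which cannot be resized once created).  The assert is the
 * whole point: the estimate must never be exceeded.
 */
static size_t
serialize_ints(char *buf, size_t buflen, const int *vals, int n)
{
	size_t		used = 0;

	for (int i = 0; i < n; i++)
	{
		int			written = snprintf(buf + used, buflen - used,
									   "%d", vals[i]);

		assert(written >= 0 && used + written + 1 <= buflen);
		used += written + 1;	/* keep the NUL as a separator */
	}
	return used;
}
```

The estimates only need to be upper bounds, which matches the point above: the size must be *known* (never undershot) but not *cared about* (overshooting a few bytes is harmless).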

> On the serialization structure itself, should we be worried about a mismatch
> between available GUCs on the sender vs the receiver? Presumably if the
> sender outputs a GUC that the receiver doesn't know about we'll get an
> error, but what if the sender didn't include something? Maybe not an issue
> today, but could this cause problems down the road if we wanted to use the
> serialized data some other way? But maybe I'm just being paranoid. :)

I think the principal danger here is that there could be some loadable
module which is present in one backend but not the other, or which (in
its inimitable wisdom) chose to register a different set of GUCs or
valid values for those GUCs in one backend than the other. That would
indeed be sad. Given that our rules for registering GUCs resemble
nothing so much as the wild west, I don't believe there's any ironclad
way to defend against it, either.

One thing we could do - not in this patch - is provide a mechanism
that would cause a background worker to load every loadable module
that's present in the launching worker. I'm a little reluctant to
spend time on that now. If pg_background doesn't work with certain
combinations of loadable modules and GUC settings... it's not a
catastrophe. It might be more of an issue for parallel query proper,
because expectations may be higher there, but if so it won't hurt to
have some experience seeing whether things go wrong with this simple
system, and in what manner. At least IMHO, it doesn't seem like a
good idea to put fiddling with the edges of this ahead of other
parallelism-related projects.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#51Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#49)
Re: pg_background (and more parallelism infrastructure patches)

On Fri, Oct 24, 2014 at 3:30 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 10/24/14, 2:23 PM, Jim Nasby wrote:
>> On the serialization structure itself, should we be worried about a
>> mismatch between available GUCs on the sender vs the receiver? Presumably if
>> the sender outputs a GUC that the receiver doesn't know about we'll get an
>> error, but what if the sender didn't include something? Maybe not an issue
>> today, but could this cause problems down the road if we wanted to use the
>> serialized data some other way? But maybe I'm just being paranoid. :)
>
> I just realized there's a bigger problem there; this isn't portable against
> any changes to any of the binary elements.
>
> So I guess it's really a question of would we ever want this to function
> (as-is) cross-version.

I think that would be pretty hard to make work, but I don't mind if
someone else wants to try for some use case that they want to meet.
My goal is to make parallel query work, so the data will just be
getting transferred between two simultaneously-running children of the
same postmaster.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#52Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#47)
Re: pg_background (and more parallelism infrastructure patches)

On 10/24/14, 12:21 PM, Robert Haas wrote:

- What should we call dsm_unkeep_mapping, if not that?

Only option I can think of beyond unkeep would be dsm_(un)register_keep_mapping. Dunno that it's worth it.

- Does anyone have a tangible suggestion for how to reduce the code
duplication in patch #6?

Between execute_sql_string() and tcop/exec_simple_query()? Is there stuff in exec_simple that's not safe for a bgworker? I'm not seeing why we can't use exec_simple. :/

BTW, it does occur to me that we could do something to combine AllocSetContextCreate() followed by oldcontext=MemoryContextSwitchTo().

pg_background_result()
+		dsm_unkeep_mapping(info->seg);
+
+		/* Set up tuple-descriptor based on colum definition list. */
+		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+			ereport(ERROR,
Is there a reason we can't check the result type before unkeeping? Seems silly to throw the results away just because someone flubbed a function call...
+			default:
+				elog(WARNING, "unknown message type: %c (%zu bytes)",
+					 msg.data[0], nbytes);
It'd be useful to also provide DEBUG output with the message itself...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

#53Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#51)
Re: pg_background (and more parallelism infrastructure patches)

On 10/24/14, 3:26 PM, Robert Haas wrote:

On Fri, Oct 24, 2014 at 3:30 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 10/24/14, 2:23 PM, Jim Nasby wrote:

On the serialization structure itself, should we be worried about a
mismatch between available GUCs on the sender vs the receiver? Presumably if
the sender outputs a GUC that the receiver doesn't know about we'll get an
error, but what if the sender didn't include something? Maybe not an issue
today, but could this cause problems down the road if we wanted to use the
serialized data some other way? But maybe I'm just being paranoid. :)

I just realized there's a bigger problem there; this isn't portable against
any changes to any of the binary elements.

So I guess it's really a question of would we ever want this to function
(as-is) cross-version.

I think that would be pretty hard to make work, but I don't mind if
someone else wants to try for some use case that they want to meet.
My goal is to make parallel query work, so the data will just be
getting transferred between two simultaneously-running children of the
same postmaster.

The only case I can think of would be actually connecting to a remote database; in that case would we even want something as raw as this? I suspect not, in which case I don't see an issue. On the other hand, if we ever think we might want to do that, we should probably at least stick a version number field in there...

But my suspicion is if we ever wanted to do something more with this then we'd want some kind of text-based format that could be passed into a SQL command (ie: SET ENVIRONMENT TO blah;)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

#54Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#52)
Re: pg_background (and more parallelism infrastructure patches)

On Fri, Oct 24, 2014 at 4:46 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 10/24/14, 12:21 PM, Robert Haas wrote:

- What should we call dsm_unkeep_mapping, if not that?

Only option I can think of beyond unkeep would be
dsm_(un)register_keep_mapping. Dunno that it's worth it.

Hmm, we could rename dsm_keep_mapping() to dsm_unregister_mapping(),
since it's arranging to keep it by unregistering it from the resource
owner. And then we could call the new function
dsm_register_mapping(). That has the appeal that "unregister" is a
word, whereas "unkeep" isn't, but it's a little confusing otherwise,
because the sense is reversed vs. the current naming. Or we could
just leave dsm_keep_mapping() alone and call the new function
dsm_register_mapping(). A little non-orthogonal, but I think it'd be
OK.

- Does anyone have a tangible suggestion for how to reduce the code
duplication in patch #6?

Between execute_sql_string() and tcop/exec_simple_query()? Is there stuff in
exec_simple that's not safe for a bgworker? I'm not seeing why we can't use
exec_simple. :/

There are some differences if you compare them closely.

BTW, it does occur to me that we could do something to combine
AllocSetContextCreate() followed by oldcontext=MemoryContextSwitchTo().

Meh.

pg_background_result()
+               dsm_unkeep_mapping(info->seg);
+
+               /* Set up tuple-descriptor based on colum definition list.
*/
+               if (get_call_result_type(fcinfo, NULL, &tupdesc) !=
TYPEFUNC_COMPOSITE)
+                       ereport(ERROR,
Is there a reason we can't check the result type before unkeeping? Seems
silly to throw the results away just because someone flubbed a function
call...

Hmm, yeah, I see no reason why we couldn't move that up higher in the
function. It's probably a pretty common failure mode, too, so I can
see the convenience factor there.

+                       default:
+                               elog(WARNING, "unknown message type: %c (%zu
bytes)",
+                                        msg.data[0], nbytes);
It'd be useful to also provide DEBUG output with the message itself...

I think that's more work than is justified. We'd have to hexdump it
or something because it might not be valid in the client encoding, and
it's a case that shouldn't happen anyway.

(Hmm, that reminds me that I haven't thought of a solution to the
problem that the background worker might try to SET client_encoding,
which would be bad. Not sure what the best fix for that is, off-hand.
I'd rather not prohibit ALL GUC changes in the background worker, as
there's no good reason for such a draconian restriction.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#55Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#53)
Re: pg_background (and more parallelism infrastructure patches)

On Fri, Oct 24, 2014 at 4:53 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

The only case I can think of would be actually connecting to a remote
database; in that case would we even want something as raw as this? I
suspect not, in which case I don't see an issue. On the other hand, if we
ever think we might want to do that, we should probably at least stick a
version number field in there...

But my suspicion is if we ever wanted to do something more with this then
we'd want some kind of text-based format that could be passed into a SQL
command (ie: SET ENVIRONMENT TO blah;)

I mean, I don't think this is much different than what we're already
doing to transfer variables from the postmaster to other backends in
EXEC_BACKEND builds; see write_nondefault_variables(). It doesn't
have a version number or anything either. I don't see a reason why
this code needs to be held to a different standard; the purpose is
fundamentally quite similar.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#56Petr Jelinek
petr@2ndquadrant.com
In reply to: Robert Haas (#54)
Re: pg_background (and more parallelism infrastructure patches)

On 24/10/14 23:03, Robert Haas wrote:

On Fri, Oct 24, 2014 at 4:46 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 10/24/14, 12:21 PM, Robert Haas wrote:

- What should we call dsm_unkeep_mapping, if not that?

Only option I can think of beyond unkeep would be
dsm_(un)register_keep_mapping. Dunno that it's worth it.

Hmm, we could rename dsm_keep_mapping() to dsm_unregister_mapping(),
since it's arranging to keep it by unregistering it from the resource
owner. And then we could call the new function
dsm_register_mapping(). That has the appeal that "unregister" is a
word, whereas "unkeep" isn't, but it's a little confusing otherwise,
because the sense is reversed vs. the current naming. Or we could
just leave dsm_keep_mapping() alone and call the new function
dsm_register_mapping(). A little non-orthogonal, but I think it'd be
OK.

I don't like that too much, but I don't have a better suggestion. If we
went with one of these, I would prefer taking the route of renaming
dsm_keep_mapping to dsm_unregister_mapping.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#57Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#54)
Re: pg_background (and more parallelism infrastructure patches)

On 10/24/14, 4:03 PM, Robert Haas wrote:

On Fri, Oct 24, 2014 at 4:46 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 10/24/14, 12:21 PM, Robert Haas wrote:

- What should we call dsm_unkeep_mapping, if not that?

Only option I can think of beyond unkeep would be
dsm_(un)register_keep_mapping. Dunno that it's worth it.

Hmm, we could rename dsm_keep_mapping() to dsm_unregister_mapping(),
since it's arranging to keep it by unregistering it from the resource
owner. And then we could call the new function
dsm_register_mapping(). That has the appeal that "unregister" is a
word, whereas "unkeep" isn't, but it's a little confusing otherwise,
because the sense is reversed vs. the current naming. Or we could
just leave dsm_keep_mapping() alone and call the new function
dsm_register_mapping(). A little non-orthogonal, but I think it'd be
OK.

I think it's actually important to keep the names matching, even if it means "unkeep".

dsm_unregister_mapping seems like the opposite of what you'd want to do to have the mapping be remembered. I get that it works that way because of how it's implemented, but that seems like a lot of leakage of implementation into API...

FWIW, I don't see "unkeep" as being that horrible.

- Does anyone have a tangible suggestion for how to reduce the code
duplication in patch #6?

Between execute_sql_string() and tcop/exec_simple_query()? Is there stuff in
exec_simple that's not safe for a bgworker? I'm not seeing why we can't use
exec_simple. :/

There are some differences if you compare them closely.

Without doing a deep dive, I'm guessing that most of the extra stuff would be safe to re-use; it just wouldn't affect execute_sql_string. Obviously we could add a boolean to exec_simple_query for the case when it's being used by a bgworker. Though, it seems like the biggest differences have to do with logging.

Here's the differences I see:

- Disallowing transaction commands
- Logging
- What memory context is used (could we just use that differently in a pg_background backend?)
- Portal output format
- What to do with the output of intermediate commands (surely there's other places we need to handle that? plpgsql maybe?)

I think all of those except logging could be handled by a function serving both exec_simple_query and execute_sql_string that accepts a few booleans (or maybe just one to indicate the caller) and some if's. At least I don't think it'd be too bad, without actually writing it.

I'm not sure what to do about logging. If the original backend has logging turned on, is it that horrible to do the same logging as exec_simple_query would do?

Another option would be factoring out parts of exec_simple_query; the for loop over the parse tree might be a good candidate. But I suspect that would be uglier than a separate support function.

I do feel uncomfortable with the amount of duplication there is right now though...

BTW, it does occur to me that we could do something to combine
AllocSetContextCreate() followed by oldcontext=MemoryContextSwitchTo().

Meh.

Well, it does seem to be a fairly common pattern throughout the codebase. And you were pretty vague when you asked for ideas to reduce duplication. ;P

+                       default:
+                               elog(WARNING, "unknown message type: %c (%zu
bytes)",
+                                        msg.data[0], nbytes);
It'd be useful to also provide DEBUG output with the message itself...

I think that's more work than is justified. We'd have to hexdump it
or something because it might not be valid in the client encoding, and
it's a case that shouldn't happen anyway.

I'm worried that a user wouldn't have any good way to see what the unexpected data is... but I guess if a user ever saw this we've got bigger problems anyway...

(Hmm, that reminds me that I haven't thought of a solution to the
problem that the background worker might try to SET client_encoding,
which would be bad. Not sure what the best fix for that is, off-hand.
I'd rather not prohibit ALL GUC changes in the background worker, as
there's no good reason for such a draconian restriction.)

Perhaps a new GucContext for background workers? Or maybe a new GucSource, though I'm not sure if that'd work and it seems pretty wrong anyway.

BTW, perhaps the amount of discussion on internal parts is an indication this shouldn't be in contrib. I think it's a damn strong indication it shouldn't be in PGXN. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

#58Petr Jelinek
petr@2ndquadrant.com
In reply to: Robert Haas (#55)
Re: pg_background (and more parallelism infrastructure patches)

On 24/10/14 23:07, Robert Haas wrote:

On Fri, Oct 24, 2014 at 4:53 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

The only case I can think of would be actually connecting to a remote
database; in that case would we even want something as raw as this? I
suspect not, in which case I don't see an issue. On the other hand, if we
ever think we might want to do that, we should probably at least stick a
version number field in there...

But my suspicion is if we ever wanted to do something more with this then
we'd want some kind of text-based format that could be passed into a SQL
command (ie: SET ENVIRONMENT TO blah;)

I mean, I don't think this is much different than what we're already
doing to transfer variables from the postmaster to other backends in
EXEC_BACKEND builds; see write_nondefault_variables(). It doesn't
have a version number or anything either. I don't see a reason why
this code needs to be held to a different standard; the purpose is
fundamentally quite similar.

I really do think that the GUC patch is ok as it is, we might want to do
1/0 for bools but I don't really see that as important thing. And it
works for the use-cases it's intended for. I don't see why this should
be required to work cross version really.
Loading of modules by the bgworker is probably a bigger issue than just
GUCs, and I think it's out of scope for the current patch(set).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#59Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#54)
Re: pg_background (and more parallelism infrastructure patches)

Robert Haas wrote:

On Fri, Oct 24, 2014 at 4:46 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 10/24/14, 12:21 PM, Robert Haas wrote:

- What should we call dsm_unkeep_mapping, if not that?

Only option I can think of beyond unkeep would be
dsm_(un)register_keep_mapping. Dunno that it's worth it.

Hmm, we could rename dsm_keep_mapping() to dsm_unregister_mapping(),
since it's arranging to keep it by unregistering it from the resource
owner. And then we could call the new function
dsm_register_mapping(). That has the appeal that "unregister" is a
word, whereas "unkeep" isn't, but it's a little confusing otherwise,
because the sense is reversed vs. the current naming. Or we could
just leave dsm_keep_mapping() alone and call the new function
dsm_register_mapping(). A little non-orthogonal, but I think it'd be
OK.

I do think that dsm_keep_mapping is a strange name for what it does. If
it were a single function maybe it'd be okay because you read the
comments and know what it does, but if we have to invent the "unkeep"
stuff then it gets weird enough that a better solution seems worthwhile
to me. So +1 for renaming it to something else. I like the "pin"
analogy we use for buffers; so if you pin a mapping (dsm_pin_mapping)
then it doesn't go away automatically with the resowner, and then you
unpin it (dsm_unpin_mapping) to make the system collect it.

(Not sure what this means for the per-segment equivalent function. It
might be worth keeping both namings consistent.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#60Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#59)
Re: pg_background (and more parallelism infrastructure patches)

On Sat, Oct 25, 2014 at 7:01 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I do think that dsm_keep_mapping is a strange name for what it does.

OK, so let me see if I can summarize the votes so far on this (highly
important?) naming issue:

- Andres doesn't like "unkeep". He suggests dsm_manage_mapping(),
dsm_ensure_mapping_cleanup(), and dsm_remember_mapping() as possible
alternatives.
- Amit also doesn't like "unkeep". He suggests dsm_change_mapping().
Alternatively, he suggests having a function called
dsm_manage_mapping() with an additional boolean parameter to indicate
whether we are keeping or unkeeping.
- Jim, without taking a position on whether the current name is
problematic, suggested dsm_(un)register_keep_mapping.
- I am unbothered by the name "unkeep". But I suggested renaming
"dsm_keep_mapping" to "dsm_unregister_mapping" and adding
"dsm_register_mapping" as an alternative.
- Petr liked that proposal better than the others, although it wasn't
clear that he was unhappy with my original proposal.
- Alvaro proposes dsm_pin_mapping/dsm_unpin_mapping.
- Nobody has commented on any proposal made subsequent to the proposal
they made themselves.

After reviewing all of those possibilities with the sort of laser-like
focus the situation demands, I'm inclined to endorse Alvaro's proposal
to rename the existing dsm_keep_mapping() function to
dsm_pin_mapping() and the existing dsm_keep_segment() function to
dsm_pin_segment(). Then, I will add the new function as
dsm_unpin_mapping(). So:

1. Does anyone strongly object to that course of action?

2. Does anyone wish to argue for or against back-patching the name
changes to 9.4? My feeling is that we may as well, because either
nobody's using this yet, in which case back-patching it won't break
anything, or somebody is, in which case we'll cause less pain by
breaking it before release than a year on. But I don't care that much
either way, so I'll defer if others disagree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#61Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#60)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-10-29 15:00:36 -0400, Robert Haas wrote:

1. Does anyone strongly object to that course of action?

I don't.

2. Does anyone wish to argue for or against back-patching the name
changes to 9.4?

+1.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#62Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#42)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-10-22 19:03:28 -0400, Robert Haas wrote:

On Wed, Oct 8, 2014 at 6:32 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I got to ask: Why is it helpful that we have this in contrib? I have a
good share of blame to bear for that, but I think we need to stop
diluting contrib evermore with test programs. These have a place, but I
don't think it should be contrib.

I don't think pg_background is merely a test program: I think it's a
quite useful piece of functionality. It can be used for running
VACUUM from a procedure, which is something people have asked for more
than once, or for simulating an autonomous transaction. Granted,
it'll be a lot slower than a real autonomous transaction, but it's
still better than doing it via dblink, because you don't have to futz
with authentication.

Fair enough.

Hm. So every user can do this once the extension is created as the
functions are most likely to be PUBLIC. Is this a good idea?

Why not? If they can log in, they could start separate sessions with
similar effect.

Connection limits and revoked connection permissions are possible
reasons.

I'm unsure right now about the rules surrounding this, but shouldn't we
check that the user is allowed to execute these? And shouldn't we fall
back to non binary functions if no binary ones are available?

I can't see any reason to do either of those things. I'm not aware
that returning data in binary format is in any way intended to be a
security-restricted operation, or that we have any data types that
actually matter without send and receive functions. If we do, I think
the solution is to add them, not make this more complicated.

We do have a couple of types that don't have binary send/recv
functions. And there are many more out there. I don't think we can make
that a hard prerequisite.

+                                     /*
+                                      * Limit the maximum error level to ERROR.  We don't want
+                                      * a FATAL inside the background worker to kill the user
+                                      * session.
+                                      */
+                                     if (edata.elevel > ERROR)
+                                             edata.elevel = ERROR;

Hm. But we still should report that it FATALed? Maybe add error context
notice about it? Not nice, but I don't have a immediately better idea. I
think it generally would be rather helpful to add the information that
this wasn't originally an error triggered by this process. The user
might otherwise be confused when looking for the origin of the error in
the log.

Yeah, I was wondering if we needed some kind of annotation here. What
I'm wondering about is appending something to the errcontext, perhaps
"background worker, PID %d".

Can't we just add another error context? That seems cleaner to me. It
should be sufficient to push something onto the relevant stack.

+                     case 'A':
+                             {
+                                     /* Propagate NotifyResponse. */
+                                     pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+                                     break;

Hm. Are we sure to be in a situation where the client expects these? And
are we sure their encoding is correct? The other data goe through
input/output methods checking for that, but here we don't. And the other
side AFAICS could have done a SET client_encoding.

I think there's no problem at the protocol level; I think the server
can send NotifyResponse pretty much whenever.

I think the server not having done so for pretty much forever will still
confuse clients.

It could be argued that this is a POLA violation, but dropping the
notify messages on the floor (which seems to be the other option)
doesn't seem like superior. So I think this is mostly a matter of
documentation.

Maybe.

+Datum
+pg_background_detach(PG_FUNCTION_ARGS)
+{
+     int32           pid = PG_GETARG_INT32(0);
+     pg_background_worker_info *info;
+
+     info = find_worker_info(pid);
+     if (info == NULL)
+             ereport(ERROR,
+                             (errcode(ERRCODE_UNDEFINED_OBJECT),
+                              errmsg("PID %d is not attached to this session", pid)));
+     dsm_detach(info->seg);
+
+     PG_RETURN_VOID();
+}

So there 's really no limit of who is allowed to do stuff like
this. I.e. any SECURITY DEFINER and such may do the same.

Do you think we need a restriction? It's not obvious to me that there
are any security-relevant consequences to this, but it's an important
question, and I might be missing something.

Well, a security definer function might have started the worker. And
then the plain user kills it. Or the other way round.

+ /* Establish signal handlers. */
+ pqsignal(SIGTERM, handle_sigterm);

Hm. No SIGINT?

Nope; bgworker.c sets it to StatementCancelHandler, which should be
fine.

Ah, right.

+     /* Find data structures in dynamic shared memory. */
+     fdata = shm_toc_lookup(toc, PG_BACKGROUND_KEY_FIXED_DATA);
+     sql = shm_toc_lookup(toc, PG_BACKGROUND_KEY_SQL);
+     gucstate = shm_toc_lookup(toc, PG_BACKGROUND_KEY_GUC);
+     mq = shm_toc_lookup(toc, PG_BACKGROUND_KEY_QUEUE);
+     shm_mq_set_sender(mq, MyProc);
+     responseq = shm_mq_attach(mq, seg, NULL);

Don't these need to ensure that values have been found? shm_toc_lookup
returns NULL for unknown items and such?

Meh. The launching process would have errored out if it hadn't been
able to set up the segment correctly. We could add some Assert()
statements if you really feel strongly about it, but it seems fairly
pointless to me. Any situation where those pointers come back NULL is
presumably going to be some sort of really stupid bug that will be
found even by trivial testing.

If we write code like this we really should have made it an error to
look something up that doesn't exist in the toc. But whatever.

+     /* Restore GUC values from launching backend. */
+     StartTransactionCommand();
+     RestoreGUCState(gucstate);
+     CommitTransactionCommand();

I haven't read the guc save patch, but is it a) required to do this in a
transaction? We normally reload the config even without. b) correct to
do? What's with SET LOCAL variables?

(a) Yeah, it doesn't work without that. I forget what breaks, but if
you take those out, it will blow up.

That seems bad. Because we actually parse gucs without a surrounding
transaction in some cases. But maybe it's just NOT_IN_FILE (or however
they're called) variables that are problematic?

Btw, how are we dealing with connection gucs?

(b) Do those need special handling for some reason?

Well, the CommitTransactionCommand() will reset them (via AtEOXact_GUC),
no? Or am I missing something?

+     /* Post-execution cleanup. */
+     disable_timeout(STATEMENT_TIMEOUT, false);
+     CommitTransactionCommand();

So, we're allowed to do nearly arbitrary nastiness here...

Can you be more specific about the nature of your concern? This is no
different than finish_xact_command().

What I was thinking about was that the background process can do stuff
that writes.

+     /*
+      * Parse the SQL string into a list of raw parse trees.
+      *
+      * Because we allow statements that perform internal transaction control,
+      * we can't do this in TopTransactionContext; the parse trees might get
+      * blown away before we're done executing them.
+      */
+     parsecontext = AllocSetContextCreate(TopMemoryContext,
+                                                                              "pg_background parse/plan",
+                                                                              ALLOCSET_DEFAULT_MINSIZE,
+                                                                              ALLOCSET_DEFAULT_INITSIZE,
+                                                                              ALLOCSET_DEFAULT_MAXSIZE);

Not that it hugely matters, but shouldn't this rather be
TopTransactionContext?

No, for the reasons explained in the comment that you quoted right, uh, there.

Man. You expect me to read comments? .oO(hide)

+             bool            snapshot_set = false;
+             Portal          portal;
+             DestReceiver *receiver;
+             int16           format = 1;
+
+             /*
+              * We don't allow transaction-control commands like COMMIT and ABORT
+              * here.  The entire SQL statement is executed as a single transaction
+              * which commits if no errors are encountered.
+              */
+             if (IsA(parsetree, TransactionStmt))
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                      errmsg("transaction control statements are not allowed in pg_background")));

Hm. I don't think that goes far enough. This allows commands that
internally stop/start transactions like CREATE INDEX CONCURRENTLY. Are
you sure that's working right now?

I tested VACUUM not of a specific table and that seems to work
correctly. I could try CIC, but I think the issues are the same. If
we only get one parse tree, then isTopLevel will be true and it's safe
for commands to do their own transaction control. If we get multiple
parse trees, then PreventTransactionChain will do its thing. This is
not novel territory.

It's not novel, but damned easy to get wrong.

Hm. This is a fair amount of code copied from postgres.c.

Yes. I'm open to suggestions, but I don't immediately see a better way.

I don't immediately see something better either. But I definitely
dislike the amount of code it duplicates.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#63Petr Jelinek
petr@2ndquadrant.com
In reply to: Robert Haas (#60)
Re: pg_background (and more parallelism infrastructure patches)

On 29/10/14 20:00, Robert Haas wrote:

After reviewing all of those possibilities with the sort of laser-like
focus the situation demands, I'm inclined to endorse Alvaro's proposal
to rename the existing dsm_keep_mapping() function to
dsm_pin_mapping() and the existing dsm_keep_segment() function to
dsm_pin_segment(). Then, I will add the new function as
dsm_unpin_mapping(). So:

1. Does anyone strongly object to that course of action?

Nah, it sounds reasonable.

2. Does anyone wish to argue for or against back-patching the name
changes to 9.4? My feeling is that we may as well, because either
nobody's using this yet, in which case back-patching it won't break
anything, or somebody is, in which case we'll cause less pain by
breaking it before release than a year on. But I don't care that much
either way, so I'll defer if others disagree.

+1

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#64Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#62)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Oct 29, 2014 at 3:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Hm. So every user can do this once the extension is created as the
functions are most likely to be PUBLIC. Is this a good idea?

Why not? If they can log in, they could start separate sessions with
similar effect.

Connection limits and revoked connection permissions are possible
reasons.

That's true. I don't know what to do about it. I'm somewhat inclined
to think that, if this remains in contrib, it's OK to ignore those
issues until such time as people complain about them, because anybody
who dislikes the things that can be done with this extension doesn't
have to install it. Also, the people complaining might have useful
ideas about what a good fix would look like, which I currently don't.
There's some push to move this into core, which I think is overkill,
but if we do it then we'd better have a good solution to this problem.

We could try to make connection limits apply to pg_background, and we
could also check CONNECT permission when starting a background worker.
Both of those things feel slightly odd because there's no actual
server connection. There *might* be a connection to the user backend
that started it, but it's sort of a "virtual" connection through
shared memory, and the background process continues running unimpeded
if it goes away, so there might be no actual connection at all. At
the other extreme, we could invent a BACKGROUND permission and
background_limit as analogues of what we have now and enforce those
instead. But I'm not keen on getting sucked into a long process of
perfecting pg_background. It's here; I think some people will like
it, but it exists mostly to show that, dude, we can use dynamic shared
memory to talk to a background worker and do real stuff.

I'm unsure right now about the rules surrounding this, but shouldn't we
check that the user is allowed to execute these? And shouldn't we fall
back to non binary functions if no binary ones are available?

I can't see any reason to do either of those things. I'm not aware
that returning data in binary format is in any way intended to be a
security-restricted operation, or that we have any data types that
actually matter without send and receive functions. If we do, I think
the solution is to add them, not make this more complicated.

We do have a couple of types that don't have binary send/recv
functions. And there are many more out there. I don't think we can make
that a hard prerequisite.

Well, again, I think this comes down partly to the question of whether
this is a toy, a first-class facility, or something in between. But I
also think there's absolutely no rule that all types need to work with
every feature. Some types, for example, don't provide a hash opclass.
You can't build hash indexes on those data types; more importantly,
you can't do hash joins. Anyone who doesn't like that can write a
patch to provide a hash opclass. Similarly, if you try to retrieve
results in binary from the server and there are no send/recv
functions, it will fail. MUCH less significantly, the type also can't
be returned via pg_background. I don't see why that should be viewed
as a problem pg_background has to fix rather than a problem with the
type.

Yeah, I was wondering if we needed some kind of annotation here. What
I'm wondering about is appending something to the errcontext, perhaps
"background worker, PID %d".

Can't we just add another error context? That seems cleaner to me. It
should be sufficient to push something onto the relevant stack.

Right now, pq_parse_errornotice() just reconstructs the exact ErrorData
that was present in the other backend, and ThrowErrorData() just
rethrows it as-is. I think that's the right design. It's for the
application code (patch 6) to figure out what to pass to
ThrowErrorData(). The error propagation machinery itself (patch 3)
should content itself with serializing and deserializing the data it's
given, and should not attempt to second-guess which parts of the error
data the user may want to change.

+                     case 'A':
+                             {
+                                     /* Propagate NotifyResponse. */
+                                     pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+                                     break;

Hm. Are we sure to be in a situation where the client expects these? And
are we sure their encoding is correct? The other data goes through
input/output methods checking for that, but here we don't. And the other
side AFAICS could have done a SET client_encoding.

I think there's no problem at the protocol level; I think the server
can send NotifyResponse pretty much whenever.

I think the server not having done so for pretty much forever will still
confuse clients.

It could be argued that this is a POLA violation, but dropping the
notify messages on the floor (which seems to be the other option)
doesn't seem superior. So I think this is mostly a matter of
documentation.

Maybe.

Yeah, I'm not totally sure about this either. Perhaps some other
people will weigh in.

Do you think we need a restriction? It's not obvious to me that there
are any security-relevant consequences to this, but it's an important
question, and I might be missing something.

Well, a security definer function might have started the worker. And
then the plain user kills it. Or the other way round.

Detaching the worker doesn't kill it; it just causes us not to wait
for the results. It does mean that if a security definer function
starts a worker, and returns without detaching it or cleaning it up,
the unprivileged user could then read the data back from that worker.
That's more insidious than it may at first appear, because the
security definer function could have been intending to read back the
data before returning, and then a transaction abort happens. We could
add a guard requiring that the data be read by the same effective user
ID that started the worker, although that might also foreclose useful
things people would otherwise be able to do with this.

Meh. The launching process would have errored out if it hadn't been
able to set up the segment correctly. We could add some Assert()
statements if you really feel strongly about it, but it seems fairly
pointless to me. Any situation where those pointers come back NULL is
presumably going to be some sort of really stupid bug that will be
found even by trivial testing.

If we write code like this we really should have made it an error to
look something up that doesn't exist in the toc. But whatever.

You might be right. I think there could be other cases where the data
could exist or be omitted and the worker would have to know to cope
with it, but having the dsm_toc stuff error out directly wouldn't have
been stupid either.

+     /* Restore GUC values from launching backend. */
+     StartTransactionCommand();
+     RestoreGUCState(gucstate);
+     CommitTransactionCommand();

I haven't read the guc save patch, but is it a) required to do this in a
transaction? We normally reload the config even without. b) correct to
do? What's with SET LOCAL variables?

(a) Yeah, it doesn't work without that. I forget what breaks, but if
you take those out, it will blow up.

That seems bad. Because we actually parse gucs without a surrounding
transaction in some cases. But maybe it's just NOT_IN_FILE (or however
they're called) variables that are problematic?

I just tried this and got:

ERROR: invalid value for parameter "session_authorization": "rhaas"

Btw, how are we dealing with connection gucs?

What do you mean by that?

(b) Do those need special handling for some reason?

Well, the CommitTransactionCommand() will reset them (via AtEOXact_GUC),
no? Or am I missing something?

We don't serialize and restore the GucStackState.

+     /* Post-execution cleanup. */
+     disable_timeout(STATEMENT_TIMEOUT, false);
+     CommitTransactionCommand();

So, we're allowed to do nearly arbitrary nastyness here...

Can you be more specific about the nature of your concern? This is no
different than finish_xact_command().

What I was thinking about was that the background process can do stuff
that writes.

So what?

Hm. This is a fair amount of code copied from postgres.c.

Yes. I'm open to suggestions, but I don't immediately see a better way.

I don't immediately see something better either. But I definitely
dislike the amount of code it duplicates.

I'm not crazy about it myself. But I don't think polluting
exec_simple_query() with pg_background's concerns is a good plan
either. That'd really be getting our priorities mixed up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#65Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#64)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-10-29 16:38:59 -0400, Robert Haas wrote:

On Wed, Oct 29, 2014 at 3:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Hm. So every user can do this once the extension is created as the
functions are most likely to be PUBLIC. Is this a good idea?

Why not? If they can log in, they could start separate sessions with
similar effect.

Connection limits and revoked connection permissions are possible
reasons.

That's true. I don't know what to do about it. I'm somewhat inclined
to think that, if this remains in contrib, it's OK to ignore those
issues until such time as people complain about them, because anybody
who dislikes the things that can be done with this extension doesn't
have to install it. Also, the people complaining might have useful
ideas about what a good fix would look like, which I currently don't.
There's some push to move this into core, which I think is overkill,
but if we do it then we'd better have a good solution to this problem.

At the very least it needs to be clearly documented. Another solution
would be to simply not give out PUBLIC rights, and restrict it to the
owner/superusers unless somebody makes explicit grants. I favor
combining those two.

We could try to make connection limits apply to pg_background, and we
could also check CONNECT permission when starting a background worker.
Both of those things feel slightly odd because there's no actual
server connection. There *might* be a connection to the user backend
that started it, but it's sort of a "virtual" connection through
shared memory, and the background process continues running unimpeded
if it goes away, so there might be no actual connection at all.

I think that'd not be bad.

I'm unsure right now about the rules surrounding this, but shouldn't we
check that the user is allowed to execute these? And shouldn't we fall
back to non binary functions if no binary ones are available?

I can't see any reason to do either of those things. I'm not aware
that returning data in binary format is in any way intended to be a
security-restricted operation, or that we have any data types that
actually matter without send and receive functions. If we do, I think
the solution is to add them, not make this more complicated.

We do have a couple of types that don't have binary send/recv
functions. And there are many more out there. I don't think we can make
that a hard prerequisite.

Well, again, I think this comes down partly to the question of whether
this is a toy, a first-class facility, or something in between. But I
also think there's absolutely no rule that all types need to work with
every feature. Some types, for example, don't provide a hash opclass.
You can't build hash indexes on those data types; more importantly,
you can't do hash joins. Anyone who doesn't like that can write a
patch to provide a hash opclass. Similarly, if you try to retrieve
results in binary from the server and there are no send/recv
functions, it will fail. MUCH less significantly, the type also can't
be returned via pg_background. I don't see why that should be viewed
as a problem pg_background has to fix rather than a problem with the
type.

Hm. I'm unconvinced. It looks almost trivial to fall back to the
text-based protocol.

Yeah, I was wondering if we needed some kind of annotation here. What
I'm wondering about is appending something to the errcontext, perhaps
"background worker, PID %d".

Can't we just add another error context? That seems cleaner to me. It
should be sufficient to push something onto the relevant stack.

Right now, pq_parse_errornotice() just reconstructs the exact ErrorData
that was present in the other backend, and ThrowErrorData() just
rethrows it as-is. I think that's the right design. It's for the
application code (patch 6) to figure out what to pass to
ThrowErrorData(). The error propagation machinery itself (patch 3)
should content itself with serializing and deserializing the data it's
given, and should not attempt to second-guess which parts of the error
data the user may want to change.

I don't see how that follows. The error context logic is there to make
it clearer where an error originated from. It'll be really confusing if
ERRORs jump out of a block of code that has set an error context
without emitting that context.

Do you think we need a restriction? It's not obvious to me that there
are any security-relevant consequences to this, but it's an important
question, and I might be missing something.

Well, a security definer function might have started the worker. And
then the plain user kills it. Or the other way round.

Detaching the worker doesn't kill it; it just causes us not to wait
for the results.

I think in many cases that's an academic distinction.

It does mean that if a security definer function
starts a worker, and returns without detaching it or cleaning it up,
the unprivileged user could then read the data back from that worker.
That's more insidious than it may at first appear, because the
security definer function could have been intending to read back the
data before returning, and then a transaction abort happens. We could
add a guard requiring that the data be read by the same effective user
ID that started the worker, although that might also foreclose useful
things people would otherwise be able to do with this.

I think such a restriction would be a good idea for now.

Meh. The launching process would have errored out if it hadn't been
able to set up the segment correctly. We could add some Assert()
statements if you really feel strongly about it, but it seems fairly
pointless to me. Any situation where those pointers come back NULL is
presumably going to be some sort of really stupid bug that will be
found even by trivial testing.

If we write code like this we really should have made it an error to
look something up that doesn't exist in the toc. But whatever.

You might be right. I think there could be other cases where the data
could exist or be omitted and the worker would have to know to cope
with it, but having the dsm_toc stuff error out directly wouldn't have
been stupid either.

There could be an explicit boolean function to check for the presence...

+     /* Restore GUC values from launching backend. */
+     StartTransactionCommand();
+     RestoreGUCState(gucstate);
+     CommitTransactionCommand();

I haven't read the guc save patch, but is it a) required to do this in a
transaction? We normally reload the config even without. b) correct to
do? What's with SET LOCAL variables?

(a) Yeah, it doesn't work without that. I forget what breaks, but if
you take those out, it will blow up.

That seems bad. Because we actually parse gucs without a surrounding
transaction in some cases. But maybe it's just NOT_IN_FILE (or however
they're called) variables that are problematic?

I just tried this and got:

ERROR: invalid value for parameter "session_authorization": "rhaas"

Hm. Ok.

Btw, how are we dealing with connection gucs?

What do you mean by that?

There are some PGC_BACKEND GUCs that normally are only allowed to be set
at connection start. I'm not sure whether you're just circumventing that
check or whether you didn't hit a problematic case.

What does e.g. happen if you set PGOPTIONS='-c
local_preload_libraries=auto_explain' after linking auto_explain into
the plugin directory?

(b) Do those need special handling for some reason?

Well, the CommitTransactionCommand() will reset them (via AtEOXact_GUC),
no? Or am I missing something?

We don't serialize and restore the GucStackState.

Ah, ok.

Hm. This is a fair amount of code copied from postgres.c.

Yes. I'm open to suggestions, but I don't immediately see a better way.

I don't immediately see something better either. But I definitely
dislike the amount of code it duplicates.

I'm not crazy about it myself. But I don't think polluting
exec_simple_query() with pg_background's concerns is a good plan
either. That'd really be getting our priorities mixed up.

Agreed that that'd be the wrong way round.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#66Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#65)
1 attachment(s)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Oct 29, 2014 at 4:58 PM, Andres Freund <andres@2ndquadrant.com> wrote:

That's true. I don't know what to do about it. I'm somewhat inclined
to think that, if this remains in contrib, it's OK to ignore those
issues until such time as people complain about them, because anybody
who dislikes the things that can be done with this extension doesn't
have to install it. Also, the people complaining might have useful
ideas about what a good fix would look like, which I currently don't.
There's some push to move this into core, which I think is overkill,
but if we do it then we'd better have a good solution to this problem.

At the very least it needs to be clearly documented. Another solution
would be to simply not give out PUBLIC rights, and restrict it to the
owner/superusers unless somebody makes explicit grants. I favor
combining those two.

I don't think it's appropriate to put superuser() checks in the code,
if that's what you are proposing. Forcing this to be super-user only
is hitting a mouse with a battleship. If you're saying we should put
REVOKE commands into the install script as we do for some other
modules, like dblink, that makes sense to me.

We could try to make connection limits apply to pg_background, and we
could also check CONNECT permission when starting a background worker.
Both of those things feel slightly odd because there's no actual
server connection. There *might* be a connection to the user backend
that started it, but it's sort of a "virtual" connection through
shared memory, and the background process continues running unimpeded
if it goes away, so there might be no actual connection at all.

I think that'd not be bad.

Looks like those checks happen in InitializeSessionUserId(), which is
called from InitPostgres(), which is called from
BackgroundWorkerInitializeConnection(). That makes me think we're
already applying these checks.

rhaas=# select * from
pg_background_result(pg_background_launch('vacuum pg_enum')) as (x
text);
x
--------
VACUUM
(1 row)

rhaas=# alter role rhaas nologin;
ALTER ROLE
rhaas=# select * from
pg_background_result(pg_background_launch('vacuum pg_enum')) as (x
text);
ERROR: role "rhaas" is not permitted to log in
CONTEXT: background worker, pid 64311
rhaas=# alter role rhaas login;
ALTER ROLE
rhaas=# select * from
pg_background_result(pg_background_launch('vacuum pg_enum')) as (x
text);
x
--------
VACUUM
(1 row)

Hm. I'm unconvinced. It looks almost trivial to fall back to the
text-based protocol.

It's hard to argue with "I'm unconvinced". What specifically about
that argument do you think isn't valid?

While I am sure the problem can be worked around, it isn't trivial.
Right now, execute_sql_string() just requests binary format
unconditionally. To do what you're talking about, we'd need to
iterate through all of the types and figure out which ones have
typsend/typreceive functions. If there's a convenience function that
will do that for us, I don't see it, probably because I believe there
are exactly zero situations where we do that kind of inference today.
Then, the user backend has to save the format codes from the
RowDescription message and decide whether to use text or binary. That
just seems like a silly waste of code and cycles.

I think this actually matters, too, because the question is what we're
going to do with full-blown parallelism. Best would be to actually
shuttle entire raw tuples between backends; second best, binary
format; third best, text format or mixture of text and binary. I'm
not sure what it's reasonable to try to get away with there, but I
certainly think minimizing IPC costs is going to be an important goal.
I didn't try to get away with shipping raw tuples here because we
don't lock types while they're in use, and I'm worried that Bad Things
Could Happen. But I'm sure somebody's going to care about the
overhead of converting back and forth at some point.

I don't see how that follows. The error context logic is there to make
it clearer where an error originated from. It'll be really confusing if
there's ERRORs jumping out of a block of code without emitting context
that has set a error context set.

I don't think I was proposing that, but I think I may have gotten a
little off-track here. See what you think of the attached, which
seems to work.

It does mean that if a security definer function
starts a worker, and returns without detaching it or cleaning it up,
the unprivileged user could then read the data back from that worker.
That's more insidious than it may at first appear, because the
security definer function could have been intending to read back the
data before returning, and then a transaction abort happens. We could
add a guard requiring that the data be read by the same effective user
ID that started the worker, although that might also foreclose useful
things people would otherwise be able to do with this.

I think such a restriction would be a good idea for now.

Done in the attached version.

Btw, how are we dealing with connection gucs?

What do you mean by that?

There are some PGC_BACKEND GUCs that normally are only allowed to be set
at connection start. I'm not sure whether you're just circumventing that
check or whether you didn't hit a problematic case.

What does e.g. happen if you set PGOPTIONS='-c
local_preload_libraries=auto_explain' after linking auto_explain into
the plugin directory?

Eh, I'm embarrassed to admit I don't know exactly how to do this. My
install directory doesn't seem to have a subdirectory called "plugin".

BTW, are we in agreement, more or less, on patch #3 now, the support
for error propagation? I'd like to go ahead and commit that if there
are no further review comments there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0006-pg_background-Run-commands-in-a-background-worker-an-v4.patch (text/x-patch)
commit d15a03840ab5540584b19bb8ef295ff46b051d35
Author: Robert Haas <rhaas@postgresql.org>
Date:   Fri Jul 11 09:53:40 2014 -0400

    pg_background: Run commands in a background worker, and get the results.
    
    The currently-active GUC values from the user session will be copied
    to the background worker.  If the command returns a result set, you
    can retrieve the result set; if not, you can retrieve the command
    tags.  If the command fails with an error, the same error will be
    thrown in the launching process when the results are retrieved.
    Warnings and other messages generated by the background worker, and
    notifications received by it, are also propagated to the foreground
    process.
    
    Patch by me; review by Amit Kapila and Andres Freund.
    
    V2: Refactor to reduce differences with exec_simple_query; other
    cleanups per Amit.
    
    V3: Per Andres, fix whitespace damage and use NameData instead of
    char[NAMEDATALEN].
    
    V4: dsm_(un)keep_mapping -> dsm_(un)pin_mapping.  Error context:
    background worker, pid %d.  Permissions check: user must have
    permission to read results.

diff --git a/contrib/Makefile b/contrib/Makefile
index b37d0dd..11d6116 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
 		pageinspect	\
 		passwordcheck	\
 		pg_archivecleanup \
+		pg_background \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
diff --git a/contrib/pg_background/Makefile b/contrib/pg_background/Makefile
new file mode 100644
index 0000000..c4e717d
--- /dev/null
+++ b/contrib/pg_background/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_background/Makefile
+
+MODULE_big = pg_background
+OBJS = pg_background.o
+
+EXTENSION = pg_background
+DATA = pg_background--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_background
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_background/pg_background--1.0.sql b/contrib/pg_background/pg_background--1.0.sql
new file mode 100644
index 0000000..5fa5ddb
--- /dev/null
+++ b/contrib/pg_background/pg_background--1.0.sql
@@ -0,0 +1,17 @@
+/* contrib/pg_background/pg_background--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_background" to load this file. \quit
+
+CREATE FUNCTION pg_background_launch(sql pg_catalog.text,
+					   queue_size pg_catalog.int4 DEFAULT 65536)
+    RETURNS pg_catalog.int4 STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_result(pid pg_catalog.int4)
+    RETURNS SETOF pg_catalog.record STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_detach(pid pg_catalog.int4)
+    RETURNS pg_catalog.void STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/contrib/pg_background/pg_background.c b/contrib/pg_background/pg_background.c
new file mode 100644
index 0000000..a566ffb
--- /dev/null
+++ b/contrib/pg_background/pg_background.c
@@ -0,0 +1,997 @@
+/*--------------------------------------------------------------------------
+ *
+ * pg_background.c
+ *		Run SQL commands using a background worker.
+ *
+ * Copyright (C) 2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pg_background/pg_background.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/printtup.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "commands/async.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
+#include "miscadmin.h"
+#include "parser/analyze.h"
+#include "pgstat.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/shm_mq.h"
+#include "storage/shm_toc.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/snapmgr.h"
+#include "utils/timeout.h"
+
+/* Table-of-contents constants for our dynamic shared memory segment. */
+#define PG_BACKGROUND_MAGIC				0x50674267
+#define PG_BACKGROUND_KEY_FIXED_DATA	0
+#define PG_BACKGROUND_KEY_SQL			1
+#define PG_BACKGROUND_KEY_GUC			2
+#define PG_BACKGROUND_KEY_QUEUE			3
+#define PG_BACKGROUND_NKEYS				4
+
+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct pg_background_fixed_data
+{
+	Oid	database_id;
+	Oid	authenticated_user_id;
+	Oid	current_user_id;
+	int	sec_context;
+	NameData	database;
+	NameData	authenticated_user;
+} pg_background_fixed_data;
+
+/* Private state maintained by the launching backend for IPC. */
+typedef struct pg_background_worker_info
+{
+	pid_t		pid;
+	Oid			current_user_id;
+	dsm_segment *seg;
+	BackgroundWorkerHandle *handle;
+	shm_mq_handle *responseq;
+	bool		consumed;
+} pg_background_worker_info;
+
+/* Private state maintained across calls to pg_background_result. */
+typedef struct pg_background_result_state
+{
+	pg_background_worker_info *info;
+	FmgrInfo   *receive_functions;
+	Oid		   *typioparams;
+	bool		has_row_description;
+	List	   *command_tags;
+	bool		complete;
+} pg_background_result_state;
+
+static HTAB *worker_hash;
+
+static void cleanup_worker_info(dsm_segment *, Datum pid_datum);
+static pg_background_worker_info *find_worker_info(pid_t pid);
+static void check_rights(pg_background_worker_info *info);
+static void save_worker_info(pid_t pid, dsm_segment *seg,
+				 BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq);
+static void pg_background_error_callback(void *arg);
+
+static HeapTuple form_result_tuple(pg_background_result_state *state,
+								   TupleDesc tupdesc, StringInfo msg);
+
+static void handle_sigterm(SIGNAL_ARGS);
+static void execute_sql_string(const char *sql);
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(pg_background_launch);
+PG_FUNCTION_INFO_V1(pg_background_result);
+PG_FUNCTION_INFO_V1(pg_background_detach);
+
+void pg_background_worker_main(Datum);
+
+/*
+ * Start a dynamic background worker to run a user-specified SQL command.
+ */
+Datum
+pg_background_launch(PG_FUNCTION_ARGS)
+{
+	text	   *sql = PG_GETARG_TEXT_PP(0);
+	int32		queue_size = PG_GETARG_INT32(1);
+	int32		sql_len = VARSIZE_ANY_EXHDR(sql);
+	Size		guc_len;
+	Size		segsize;
+	dsm_segment *seg;
+	shm_toc_estimator e;
+	shm_toc    *toc;
+	char	   *sqlp;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *worker_handle;
+	pg_background_fixed_data *fdata;
+	pid_t		pid;
+	shm_mq_handle *responseq;
+	MemoryContext	oldcontext;
+
+	/* Ensure a valid queue size. */
+	if (queue_size < 0 || ((uint64) queue_size) < shm_mq_minimum_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("queue size must be at least %zu bytes",
+						shm_mq_minimum_size)));
+
+	/* Create dynamic shared memory and table of contents. */
+	shm_toc_initialize_estimator(&e);
+	shm_toc_estimate_chunk(&e, sizeof(pg_background_fixed_data));
+	shm_toc_estimate_chunk(&e, sql_len + 1);
+	guc_len = EstimateGUCStateSpace();
+	shm_toc_estimate_chunk(&e, guc_len);
+	shm_toc_estimate_chunk(&e, (Size) queue_size);
+	shm_toc_estimate_keys(&e, PG_BACKGROUND_NKEYS);
+	segsize = shm_toc_estimate(&e);
+	seg = dsm_create(segsize);
+	toc = shm_toc_create(PG_BACKGROUND_MAGIC, dsm_segment_address(seg),
+						 segsize);
+
+	/* Store fixed-size data in dynamic shared memory. */
+	fdata = shm_toc_allocate(toc, sizeof(pg_background_fixed_data));
+	fdata->database_id = MyDatabaseId;
+	fdata->authenticated_user_id = GetAuthenticatedUserId();
+	GetUserIdAndSecContext(&fdata->current_user_id, &fdata->sec_context);
+	namestrcpy(&fdata->database, get_database_name(MyDatabaseId));
+	namestrcpy(&fdata->authenticated_user,
+			   GetUserNameFromId(fdata->authenticated_user_id));
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_FIXED_DATA, fdata);
+
+	/* Store SQL query in dynamic shared memory. */
+	sqlp = shm_toc_allocate(toc, sql_len + 1);
+	memcpy(sqlp, VARDATA(sql), sql_len);
+	sqlp[sql_len] = '\0';
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_SQL, sqlp);
+
+	/* Store GUC state in dynamic shared memory. */
+	gucstate = shm_toc_allocate(toc, guc_len);
+	SerializeGUCState(guc_len, gucstate);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_GUC, gucstate);
+
+	/* Establish message queue in dynamic shared memory. */
+	mq = shm_mq_create(shm_toc_allocate(toc, (Size) queue_size),
+					   (Size) queue_size);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_QUEUE, mq);
+	shm_mq_set_receiver(mq, MyProc);
+
+	/*
+	 * Attach the queue before launching a worker, so that we'll automatically
+	 * detach the queue if we error out.  (Otherwise, the worker might sit
+	 * there trying to write the queue long after we've gone away.)
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	responseq = shm_mq_attach(mq, seg, NULL);
+	MemoryContextSwitchTo(oldcontext);
+
+	/* Configure a worker. */
+	worker.bgw_flags =
+		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	worker.bgw_main = NULL;		/* new worker might not have library loaded */
+	sprintf(worker.bgw_library_name, "pg_background");
+	sprintf(worker.bgw_function_name, "pg_background_worker_main");
+	snprintf(worker.bgw_name, BGW_MAXLEN,
+			 "pg_background by PID %d", MyProcPid);
+	worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
+	/* set bgw_notify_pid, so we can detect if the worker stops */
+	worker.bgw_notify_pid = MyProcPid;
+
+	/*
+	 * Register the worker.
+	 *
+	 * We switch contexts so that the background worker handle can outlast
+	 * this transaction.
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	if (!RegisterDynamicBackgroundWorker(&worker, &worker_handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not register background process"),
+			 errhint("You may need to increase max_worker_processes.")));
+	MemoryContextSwitchTo(oldcontext);
+	shm_mq_set_handle(responseq, worker_handle);
+
+	/* Wait for the worker to start. */
+	switch (WaitForBackgroundWorkerStartup(worker_handle, &pid))
+	{
+		case BGWH_STARTED:
+			/* Success. */
+			break;
+		case BGWH_STOPPED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("could not start background process"),
+					 errhint("More details may be available in the server log.")));
+			break;
+		case BGWH_POSTMASTER_DIED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("cannot start background processes without postmaster"),
+					 errhint("Kill all remaining database processes and restart the database.")));
+			break;
+		default:
+			elog(ERROR, "unexpected bgworker handle status");
+			break;
+	}
+
+	/* Store the relevant details about this worker for future use. */
+	save_worker_info(pid, seg, worker_handle, responseq);
+
+	/*
+	 * Now that the worker info is saved, we do not need to, and should not,
+	 * automatically detach the segment at resource-owner cleanup time.
+	 */
+	dsm_pin_mapping(seg);
+
+	/* Return the worker's PID. */
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * Retrieve the results of a background query previously launched in this
+ * session.
+ */
+Datum
+pg_background_result(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	shm_mq_result	res;
+	FuncCallContext *funcctx;
+	TupleDesc	tupdesc;
+	StringInfoData	msg;
+	pg_background_result_state *state;
+
+	/* First-time setup. */
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+		pg_background_worker_info *info;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* See if we have a connection to the specified PID. */
+		if ((info = find_worker_info(pid)) == NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("PID %d is not attached to this session", pid)));
+		check_rights(info);
+
+		/* Can't read results twice. */
+		if (info->consumed)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("results for PID %d have already been consumed", pid)));
+		info->consumed = true;
+
+		/*
+		 * Whether we succeed or fail, a future invocation of this function
+		 * may not try to read from the DSM once we've begun to do so.
+		 * Accordingly, make arrangements to clean things up at end of query.
+		 */
+		dsm_unpin_mapping(info->seg);
+
+		/* Set up tuple descriptor based on column definition list. */
+		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record"),
+					 errhint("Try calling the function in the FROM clause "
+							 "using a column definition list.")));
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* Cache state that will be needed on every call. */
+		state = palloc0(sizeof(pg_background_result_state));
+		state->info = info;
+		if (funcctx->tuple_desc->natts > 0)
+		{
+			int	natts = funcctx->tuple_desc->natts;
+			int	i;
+
+			state->receive_functions = palloc(sizeof(FmgrInfo) * natts);
+			state->typioparams = palloc(sizeof(Oid) * natts);
+
+			for (i = 0;	i < natts; ++i)
+			{
+				Oid	receive_function_id;
+
+				getTypeBinaryInputInfo(funcctx->tuple_desc->attrs[i]->atttypid,
+									   &receive_function_id,
+									   &state->typioparams[i]);
+				fmgr_info(receive_function_id, &state->receive_functions[i]);
+			}
+		}
+		funcctx->user_fctx = state;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+	funcctx = SRF_PERCALL_SETUP();
+	tupdesc = funcctx->tuple_desc;
+	state = funcctx->user_fctx;
+
+	/* Initialize message buffer. */
+	initStringInfo(&msg);
+
+	/* Read and process messages from the shared memory queue. */
+	for (;;)
+	{
+		char		msgtype;
+		Size		nbytes;
+		void	   *data;
+
+		/* Get next message. */
+		res = shm_mq_receive(state->info->responseq, &nbytes, &data, false);
+		if (res != SHM_MQ_SUCCESS)
+			break;
+
+		/*
+		 * Message-parsing routines operate on a null-terminated StringInfo,
+		 * so we must construct one.
+		 */
+		resetStringInfo(&msg);
+		enlargeStringInfo(&msg, nbytes);
+		msg.len = nbytes;
+		memcpy(msg.data, data, nbytes);
+		msg.data[nbytes] = '\0';
+		msgtype = pq_getmsgbyte(&msg);
+
+		/* Dispatch on message type. */
+		switch (msgtype)
+		{
+			case 'E':
+			case 'N':
+				{
+					ErrorData	edata;
+					ErrorContextCallback context;
+
+					/* Parse ErrorResponse or NoticeResponse. */
+					pq_parse_errornotice(&msg, &edata);
+
+					/*
+					 * Limit the maximum error level to ERROR.  We don't want
+					 * a FATAL inside the background worker to kill the user
+					 * session.
+					 */
+					if (edata.elevel > ERROR)
+						edata.elevel = ERROR;
+
+					/*
+					 * Rethrow the error with an appropriate context method.
+					 */
+					context.callback = pg_background_error_callback;
+					context.arg = (void *) &pid;
+					context.previous = error_context_stack;
+					error_context_stack = &context;
+					ThrowErrorData(&edata);
+					error_context_stack = context.previous;
+
+					break;
+				}
+			case 'A':
+				{
+					/* Propagate NotifyResponse. */
+					pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+					break;
+				}
+			case 'T':
+				{
+					int16	natts = pq_getmsgint(&msg, 2);
+					int16	i;
+
+					if (state->has_row_description)
+						elog(ERROR, "multiple RowDescription messages");
+					state->has_row_description = true;
+					if (natts != tupdesc->natts)
+						ereport(ERROR,
+								(errcode(ERRCODE_DATATYPE_MISMATCH),
+								 errmsg("remote query result rowtype does not match "
+									"the specified FROM clause rowtype")));
+
+					for (i = 0; i < natts; ++i)
+					{
+						Oid		type_id;
+
+						(void) pq_getmsgstring(&msg);	/* name */
+						(void) pq_getmsgint(&msg, 4);	/* table OID */
+						(void) pq_getmsgint(&msg, 2);	/* table attnum */
+						type_id = pq_getmsgint(&msg, 4);	/* type OID */
+						(void) pq_getmsgint(&msg, 2);	/* type length */
+						(void) pq_getmsgint(&msg, 4);	/* typmod */
+						(void) pq_getmsgint(&msg, 2);	/* format code */
+
+						if (type_id != tupdesc->attrs[i]->atttypid)
+							ereport(ERROR,
+									(errcode(ERRCODE_DATATYPE_MISMATCH),
+									 errmsg("remote query result rowtype does not match "
+										"the specified FROM clause rowtype")));
+					}
+
+					pq_getmsgend(&msg);
+
+					break;
+				}
+			case 'D':
+				{
+					/* Handle DataRow message. */
+					HeapTuple	result;
+
+					result = form_result_tuple(state, tupdesc, &msg);
+					SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+				}
+			case 'C':
+				{
+					/* Handle CommandComplete message. */
+					MemoryContext	oldcontext;
+					const char  *tag = pq_getmsgstring(&msg);
+
+					oldcontext =
+						MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+					state->command_tags = lappend(state->command_tags,
+												  pstrdup(tag));
+					MemoryContextSwitchTo(oldcontext);
+					break;
+				}
+			case 'G':
+			case 'H':
+			case 'W':
+				{
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("COPY protocol not allowed in pg_background")));
+				}
+
+			case 'Z':
+				{
+					/* Handle ReadyForQuery message. */
+					state->complete = true;
+					break;
+				}
+			default:
+				elog(WARNING, "unknown message type: %c (%zu bytes)",
+					 msg.data[0], nbytes);
+				break;
+		}
+	}
+
+	/* Check whether the connection was broken prematurely. */
+	if (!state->complete)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONNECTION_FAILURE),
+				 errmsg("lost connection to worker process with PID %d",
+					pid)));
+
+	/* If no data rows, return the command tags instead. */
+	if (!state->has_row_description)
+	{
+		if (tupdesc->natts != 1 || tupdesc->attrs[0]->atttypid != TEXTOID)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query did not return a result set, but "
+							"result rowtype is not a single text column")));
+		if (state->command_tags != NIL)
+		{
+			char *tag = linitial(state->command_tags);
+			Datum	value;
+			bool	isnull;
+			HeapTuple	result;
+
+			state->command_tags = list_delete_first(state->command_tags);
+			value = PointerGetDatum(cstring_to_text(tag));
+			isnull = false;
+			result = heap_form_tuple(tupdesc, &value, &isnull);
+			SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+		}
+	}
+
+	/* We're done! */
+	dsm_detach(state->info->seg);
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Parse a DataRow message and form a result tuple.
+ */
+static HeapTuple
+form_result_tuple(pg_background_result_state *state, TupleDesc tupdesc,
+				  StringInfo msg)
+{
+	/* Handle DataRow message. */
+	int16	natts = pq_getmsgint(msg, 2);
+	int16	i;
+	Datum  *values = NULL;
+	bool   *isnull = NULL;
+	StringInfoData	buf;
+
+	if (!state->has_row_description)
+		elog(ERROR, "DataRow not preceded by RowDescription");
+	if (natts != tupdesc->natts)
+		elog(ERROR, "malformed DataRow");
+	if (natts > 0)
+	{
+		values = palloc(natts * sizeof(Datum));
+		isnull = palloc(natts * sizeof(bool));
+	}
+	initStringInfo(&buf);
+
+	for (i = 0; i < natts; ++i)
+	{
+		int32	bytes = pq_getmsgint(msg, 4);
+
+		if (bytes < 0)
+		{
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											NULL,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = true;
+		}
+		else
+		{
+			resetStringInfo(&buf);
+			appendBinaryStringInfo(&buf, pq_getmsgbytes(msg, bytes), bytes);
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											&buf,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = false;
+		}
+	}
+
+	pq_getmsgend(msg);
+
+	return heap_form_tuple(tupdesc, values, isnull);
+}
+
+/*
+ * Detach from the dynamic shared memory segment used for communication with
+ * a background worker.  This prevents the worker from stalling waiting for
+ * us to read its results.
+ */
+Datum
+pg_background_detach(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	pg_background_worker_info *info;
+
+	info = find_worker_info(pid);
+	if (info == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("PID %d is not attached to this session", pid)));
+	check_rights(info);
+	dsm_detach(info->seg);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * When the dynamic shared memory segment associated with a worker is
+ * cleaned up, we need to clean up our associated private data structures.
+ */
+static void
+cleanup_worker_info(dsm_segment *seg, Datum pid_datum)
+{
+	pid_t	pid = DatumGetInt32(pid_datum);
+	bool	found;
+	pg_background_worker_info *info;
+
+	/* Find any worker info entry for this PID.  If none, we're done. */
+	if ((info = find_worker_info(pid)) == NULL)
+		return;
+
+	/* Free memory used by the BackgroundWorkerHandle. */
+	if (info->handle != NULL)
+	{
+		pfree(info->handle);
+		info->handle = NULL;
+	}
+
+	/* Remove the hashtable entry. */
+	hash_search(worker_hash, (void *) &pid, HASH_REMOVE, &found);
+	if (!found)
+		elog(ERROR, "pg_background worker_hash table corrupted");
+}
+
+/*
+ * Find the background worker information for the worker with a given PID.
+ */
+static pg_background_worker_info *
+find_worker_info(pid_t pid)
+{
+	pg_background_worker_info *info = NULL;
+
+	if (worker_hash != NULL)
+		info = hash_search(worker_hash, (void *) &pid, HASH_FIND, NULL);
+
+	return info;
+}
+
+/*
+ * Check whether the current user has rights to manipulate the background
+ * worker with the given PID.
+ */
+static void
+check_rights(pg_background_worker_info *info)
+{
+	Oid	current_user_id;
+	int	sec_context;
+
+	GetUserIdAndSecContext(&current_user_id, &sec_context);
+	if (!has_privs_of_role(current_user_id, info->current_user_id))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+			 errmsg("permission denied for background worker with PID %d",
+						info->pid)));
+}
+
+/*
+ * Save worker information for future IPC.
+ */
+static void
+save_worker_info(pid_t pid, dsm_segment *seg, BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq)
+{
+	pg_background_worker_info *info;
+	Oid current_user_id;
+	int	sec_context;
+
+	/* If the hash table hasn't been set up yet, do that now. */
+	if (worker_hash == NULL)
+	{
+		HASHCTL	ctl;
+
+		/* The key is binary (a pid_t), so don't use the default string hash. */
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(pid_t);
+		ctl.entrysize = sizeof(pg_background_worker_info);
+		ctl.hash = tag_hash;
+		worker_hash = hash_create("pg_background worker_hash", 8, &ctl,
+								  HASH_ELEM | HASH_FUNCTION);
+	}
+
+	/* Get current authentication information. */
+	GetUserIdAndSecContext(&current_user_id, &sec_context);
+
+	/*
+	 * In the unlikely event that there's an older worker with this PID,
+	 * just detach it - unless it has a different user ID than the
+	 * currently-active one, in which case someone might be trying to pull
+	 * a fast one.  Let's kill the backend to make sure we don't break
+	 * anyone's expectations.
+	 */
+	if ((info = find_worker_info(pid)) != NULL)
+	{
+		if (current_user_id != info->current_user_id)
+			ereport(FATAL,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+			 errmsg("background worker with PID %d already exists",
+						pid)));
+		dsm_detach(info->seg);
+	}
+
+	/* When the DSM is unmapped, clean everything up. */
+	on_dsm_detach(seg, cleanup_worker_info, Int32GetDatum(pid));
+
+	/* Create a new entry for this worker. */
+	info = hash_search(worker_hash, (void *) &pid, HASH_ENTER, NULL);
+	info->seg = seg;
+	info->handle = handle;
+	info->responseq = responseq;
+	info->consumed = false;
+}
+
+/*
+ * Indicate that an error came from a particular background worker.
+ */
+static void
+pg_background_error_callback(void *arg)
+{
+	pid_t	pid = * (pid_t *) arg;
+
+	errcontext("background worker, pid %d", pid);
+}
+
+/*
+ * Background worker entrypoint.
+ */
+void
+pg_background_worker_main(Datum main_arg)
+{
+	dsm_segment *seg;
+	shm_toc    *toc;
+	pg_background_fixed_data *fdata;
+	char	   *sql;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	shm_mq_handle *responseq;
+
+	/* Establish signal handlers. */
+	pqsignal(SIGTERM, handle_sigterm);
+	BackgroundWorkerUnblockSignals();
+
+	/* Set up a memory context and resource owner. */
+	Assert(CurrentResourceOwner == NULL);
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "pg_background");
+	CurrentMemoryContext = AllocSetContextCreate(TopMemoryContext,
+												 "pg_background session",
+												 ALLOCSET_DEFAULT_MINSIZE,
+												 ALLOCSET_DEFAULT_INITSIZE,
+												 ALLOCSET_DEFAULT_MAXSIZE);
+
+	/* Connect to the dynamic shared memory segment. */
+	seg = dsm_attach(DatumGetUInt32(main_arg));
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("unable to map dynamic shared memory segment")));
+	toc = shm_toc_attach(PG_BACKGROUND_MAGIC, dsm_segment_address(seg));
+	if (toc == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			   errmsg("bad magic number in dynamic shared memory segment")));
+
+	/* Find data structures in dynamic shared memory. */
+	fdata = shm_toc_lookup(toc, PG_BACKGROUND_KEY_FIXED_DATA);
+	sql = shm_toc_lookup(toc, PG_BACKGROUND_KEY_SQL);
+	gucstate = shm_toc_lookup(toc, PG_BACKGROUND_KEY_GUC);
+	mq = shm_toc_lookup(toc, PG_BACKGROUND_KEY_QUEUE);
+	shm_mq_set_sender(mq, MyProc);
+	responseq = shm_mq_attach(mq, seg, NULL);
+
+	/* Redirect protocol messages to responseq. */
+	pq_redirect_to_shm_mq(mq, responseq);
+
+	/*
+	 * Initialize our user and database ID based on the string versions of
+	 * the data, and then go back and check that we actually got the database
+	 * and user ID that we intended to get.  We do this because it's not
+	 * impossible for the process that started us to die before we get here,
+	 * and the user or database could be renamed in the meantime.  We don't
+	 * want to latch onto the wrong object by accident.  There should probably
+	 * be a variant of BackgroundWorkerInitializeConnection that accepts OIDs
+	 * rather than strings.
+	 */
+	BackgroundWorkerInitializeConnection(NameStr(fdata->database),
+										 NameStr(fdata->authenticated_user));
+	if (fdata->database_id != MyDatabaseId ||
+		fdata->authenticated_user_id != GetAuthenticatedUserId())
+		ereport(ERROR,
+			(errmsg("user or database renamed during pg_background startup")));
+
+	/* Restore GUC values from launching backend. */
+	StartTransactionCommand();
+	RestoreGUCState(gucstate);
+	CommitTransactionCommand();
+
+	/* Restore user ID and security context. */
+	SetUserIdAndSecContext(fdata->current_user_id, fdata->sec_context);
+
+	/* Prepare to execute the query. */
+	SetCurrentStatementStartTimestamp();
+	debug_query_string = sql;
+	pgstat_report_activity(STATE_RUNNING, sql);
+
+	/* Execute the query. */
+	execute_sql_string(sql);
+
+	/* Post-execution cleanup. */
+	ProcessCompletedNotifies();
+	pgstat_report_activity(STATE_IDLE, sql);
+	pgstat_report_stat(true);
+
+	/* Signal that we are done. */
+	ReadyForQuery(DestRemote);
+}
+
+/*
+ * Execute given SQL string.
+ *
+ * Using SPI here would preclude backgrounding commands like VACUUM which one
+ * might very well wish to launch in the background.  So we do this instead.
+ */
+static void
+execute_sql_string(const char *sql)
+{
+	List	   *raw_parsetree_list;
+	ListCell   *lc1;
+	bool		isTopLevel;
+	int			commands_remaining;
+	MemoryContext	parsecontext;
+	MemoryContext	oldcontext;
+
+	/* Start up a transaction command. */
+	start_xact_command();
+
+	/*
+	 * Parse the SQL string into a list of raw parse trees.
+	 *
+	 * Because we allow statements that perform internal transaction control,
+	 * we can't do this in TopTransactionContext; the parse trees might get
+	 * blown away before we're done executing them.
+	 */
+	parsecontext = AllocSetContextCreate(TopMemoryContext,
+										 "pg_background parse/plan",
+										 ALLOCSET_DEFAULT_MINSIZE,
+										 ALLOCSET_DEFAULT_INITSIZE,
+										 ALLOCSET_DEFAULT_MAXSIZE);
+	oldcontext = MemoryContextSwitchTo(parsecontext);
+	raw_parsetree_list = pg_parse_query(sql);
+	commands_remaining = list_length(raw_parsetree_list);
+	isTopLevel = commands_remaining == 1;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * Do parse analysis, rule rewrite, planning, and execution for each raw
+	 * parsetree.  We must fully execute each query before beginning parse
+	 * analysis on the next one, since there may be interdependencies.
+	 */
+	foreach(lc1, raw_parsetree_list)
+	{
+		Node	   *parsetree = (Node *) lfirst(lc1);
+		const char *commandTag;
+		char        completionTag[COMPLETION_TAG_BUFSIZE];
+		List       *querytree_list,
+				   *plantree_list;
+		bool		snapshot_set = false;
+		Portal		portal;
+		DestReceiver *receiver;
+		int16		format = 1;
+
+		/*
+		 * We don't allow transaction-control commands like COMMIT and ABORT
+		 * here.  The entire SQL string is executed as a single transaction
+		 * which commits if no errors are encountered.
+		 */
+		if (IsA(parsetree, TransactionStmt))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("transaction control statements are not allowed in pg_background")));
+
+		/*
+		 * Get the command name for use in status display (it also becomes the
+		 * default completion tag, down inside PortalRun).  Set ps_status and
+		 * do any special start-of-SQL-command processing needed by the
+		 * destination.
+		 */
+		commandTag = CreateCommandTag(parsetree);
+		set_ps_display(commandTag, false);
+		BeginCommand(commandTag, DestNone);
+
+		/* Set up a snapshot if parse analysis/planning will need one. */
+		if (analyze_requires_snapshot(parsetree))
+		{
+			PushActiveSnapshot(GetTransactionSnapshot());
+			snapshot_set = true;
+		}
+
+		/*
+		 * OK to analyze, rewrite, and plan this query.
+		 *
+		 * As with parsing, we need to make sure this data outlives the
+		 * transaction, because of the possibility that the statement might
+		 * perform internal transaction control.
+		 */
+		oldcontext = MemoryContextSwitchTo(parsecontext);
+		querytree_list = pg_analyze_and_rewrite(parsetree, sql, NULL, 0);
+		plantree_list = pg_plan_queries(querytree_list, 0, NULL);
+
+		/* Done with the snapshot used for parsing/planning */
+		if (snapshot_set)
+			PopActiveSnapshot();
+
+		/* If we got a cancel signal in analysis or planning, quit */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Execute the query using the unnamed portal.
+		 */
+		portal = CreatePortal("", true, true);
+		/* Don't display the portal in pg_cursors */
+		portal->visible = false;
+		PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
+		PortalStart(portal, NULL, 0, InvalidSnapshot);
+
+		/* We always use binary format, for efficiency. */
+		PortalSetResultFormat(portal, 1, &format);
+
+		/*
+		 * Tuples returned by any command other than the last are simply
+		 * discarded; but those returned by the last (or only) command are
+		 * redirected to the shared memory queue we're using for communication
+		 * with the launching backend. If the launching backend is gone or has
+		 * detached us, these messages will just get dropped on the floor.
+		 */
+		--commands_remaining;
+		if (commands_remaining > 0)
+			receiver = CreateDestReceiver(DestNone);
+		else
+		{
+			receiver = CreateDestReceiver(DestRemote);
+			SetRemoteDestReceiverParams(receiver, portal);
+		}
+
+		/*
+		 * Only once the portal and destreceiver have been established can
+		 * we return to the transaction context.  All that stuff needs to
+		 * survive an internal commit inside PortalRun!
+		 */
+		MemoryContextSwitchTo(oldcontext);
+
+		/* Here's where we actually execute the command. */
+		(void) PortalRun(portal, FETCH_ALL, isTopLevel, receiver, receiver,
+						 completionTag);
+
+		/* Clean up the receiver. */
+		(*receiver->rDestroy) (receiver);
+
+		/* Clean up the portal. */
+		PortalDrop(portal, false);
+
+		/*
+		 * If this is the last parsetree, close down transaction statement
+		 * before reporting CommandComplete.  Otherwise, we need a
+		 * CommandCounterIncrement.
+		 */
+		if (lnext(lc1) == NULL)
+			finish_xact_command();
+		else
+			CommandCounterIncrement();
+
+		/*
+		 * Send a CommandComplete message even if we suppressed the query
+		 * results.  The user backend will report the command tags in the
+		 * absence of any true query results.
+		 */
+		EndCommand(completionTag, DestRemote);
+	}
+
+	/* Make sure there's not still a transaction open. */
+	finish_xact_command();
+}
+
+/*
+ * When we receive a SIGTERM, we set InterruptPending and ProcDiePending just
+ * like a normal backend.  The next CHECK_FOR_INTERRUPTS() will do the right
+ * thing.
+ */
+static void
+handle_sigterm(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	if (!proc_exit_inprogress)
+	{
+		InterruptPending = true;
+		ProcDiePending = true;
+	}
+
+	errno = save_errno;
+}
diff --git a/contrib/pg_background/pg_background.control b/contrib/pg_background/pg_background.control
new file mode 100644
index 0000000..733d0e1
--- /dev/null
+++ b/contrib/pg_background/pg_background.control
@@ -0,0 +1,4 @@
+comment = 'Run SQL queries in the background'
+default_version = '1.0'
+module_pathname = '$libdir/pg_background'
+relocatable = true
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 61f17bf..e44f5fa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -190,8 +190,6 @@ static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
 static int	errdetail_recovery_conflict(void);
-static void start_xact_command(void);
-static void finish_xact_command(void);
 static bool IsTransactionExitStmt(Node *parsetree);
 static bool IsTransactionExitStmtList(List *parseTrees);
 static bool IsTransactionStmtList(List *parseTrees);
@@ -2373,7 +2371,7 @@ exec_describe_portal_message(const char *portal_name)
 /*
  * Convenience routines for starting/committing a single command.
  */
-static void
+void
 start_xact_command(void)
 {
 	if (!xact_started)
@@ -2393,7 +2391,7 @@ start_xact_command(void)
 	}
 }
 
-static void
+void
 finish_xact_command(void)
 {
 	if (xact_started)
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index c23f4da..8e84ee7 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -224,7 +224,7 @@ extern PGPROC *PreparedXactProcs;
 
 /* configurable options */
 extern int	DeadlockTimeout;
-extern int	StatementTimeout;
+extern PGDLLIMPORT int	StatementTimeout;
 extern int	LockTimeout;
 extern bool log_lock_waits;
 
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 60f7532..fd3df58 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -69,6 +69,8 @@ extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from S
 																 * handler */
 extern void prepare_for_client_read(void);
 extern void client_read_ended(void);
+extern void start_xact_command(void);
+extern void finish_xact_command(void);
 extern void process_postgres_switches(int argc, char *argv[],
 						  GucContext ctx, const char **dbname);
 extern void PostgresMain(int argc, char *argv[],
#67 Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#66)
Re: pg_background (and more parallelism infrastructure patches)

Hi,

On 2014-10-30 18:03:27 -0400, Robert Haas wrote:

On Wed, Oct 29, 2014 at 4:58 PM, Andres Freund <andres@2ndquadrant.com> wrote:

That's true. I don't know what to do about it. I'm somewhat inclined
to think that, if this remains in contrib, it's OK to ignore those
issues until such time as people complain about them, because anybody
who dislikes the things that can be done with this extension doesn't
have to install it. Also, the people complaining might have useful
ideas about what a good fix would look like, which I currently don't.
There's some push to move this into core, which I think is overkill,
but if we do it then we'd better have a good solution to this problem.

At the very least it needs to be clearly documented. Another solution
would be to simply not give out PUBLIC rights, and restrict it to the
owner/superusers unless somebody makes explicit grants. I favor
combining those two.

I don't think it's appropriate to put superuser() checks in the code,
if that's what you are proposing.

That's not at all what I'm thinking of... The superuser reference was
only that explicit EXECUTE rights aren't required for superusers.

Forcing this to be super-user only is hitting a mouse with a
battleship. If you're saying we should put REVOKE commands into the
install script as we do for some other modules, like dblink, that
makes sense to me.

But instead this.

We could try to make connection limits apply to pg_background, and we
could also check CONNECT permission when starting a background worker.
Both of those things feel slightly odd because there's no actual
server connection. There *might* be a connection to the user backend
that started it, but it's sort of a "virtual" connection through
shared memory, and the background process continues running unimpeded
if it goes away, so there might be no actual connection at all.

I think that'd not be bad.

Looks like those checks happen in InitializeSessionUserId(), which is
called from InitPostgres(), which is called from
BackgroundWorkerInitializeConnection(). That makes me think we're
already applying these checks.

Oh, neat.

Hm. I'm unconvinced. It looks almost trivial to fall back to the
text-based protocol.

It's hard to argue with "I'm unconvinced". What specifically about
that argument do you think isn't valid?

We don't normally just say "this is unsupported" for things that are
perfectly valid. And types without binary send/recv functions imo are
perfectly valid. You've made that kind of argument yourself more than
once, no?
It's really not an academic thing. Popular extensions like postgis don't
provide send/recv for all their types. Same with a couple of our own
contrib types (intarray, chkpass, seg, cube).

I guess we need input from others on this point.

While I am sure the problem can be worked around, it isn't trivial.
Right now, execute_sql_string() just requests binary format
unconditionally. To do what you're talking about, we'd need to
iterate through all of the types and figure out which ones have
typsend/typreceive functions. If there's a convenience function that
will do that for us, I don't see it, probably because I believe there
are exactly zero situations where we do that kind of inference today.
Then, the user backend has to save the format codes from the
RowDescription message and decide whether to use text or binary. That
just seems like a silly waste of code and cycles.
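FWIW, the per-column inference Robert describes could be sketched roughly
like this. This is a hypothetical helper, not code from the patch; the Oid
typedef and OidIsValid macro are illustrative stand-ins for PostgreSQL's
real definitions, and in a real backend the typsend OIDs would come from a
pg_type lookup for each result column:

```c
/* Illustrative stand-ins for PostgreSQL's Oid machinery. */
typedef unsigned int Oid;
#define InvalidOid		((Oid) 0)
#define OidIsValid(oid)	((oid) != InvalidOid)

/*
 * Build per-column format codes: request binary (1) only for columns
 * whose type has a valid typsend function, text (0) otherwise.
 */
static void
build_result_formats(const Oid *typsend_oids, int natts, short *formats)
{
	int			i;

	for (i = 0; i < natts; i++)
		formats[i] = OidIsValid(typsend_oids[i]) ? 1 : 0;
}
```

The launching backend would then keep this array around when parsing
DataRow messages, calling the binary receive function for columns marked
1 and the text input function for columns marked 0.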

The current client/server protocol definitely can do it. You can specify
per column whether you want binary/textual data. I'm pretty sure that at
least the jdbc driver does that.

It's also what I've decided to do for BDR's output plugin. Some builtin
types will be transferred 'as is' if the architecture/version
matches. Otherwise, if available and the version matches, send/recv will
be used (excluding some corner cases, but ..). Only if neither is the case
does it fall back to the textual protocol.

I think this actually matters, too, because the question is what we're
going to do with full-blown parallelism.

I think for parallelism it'll most definitely not be acceptable to not
support types without send/recv. So building some convenience
infrastructure here imo is going to pay off. It's not like it's that
hard to detect - just check OidIsValid(pg_type->typsend). The pg_type
tuple has to be looked up anyway to find the send/recv functions...

Best would be to actually
shuttle entire raw tuples between backends; second best, binary
format; third best, text format or mixture of text and binary. I'm
not sure what it's reasonable to try to get away with there, but I
certainly think minimizing IPC costs is going to be an important goal.
I didn't try to get away with shipping raw tuples here because we
don't lock types while they're in use, and I'm worried that Bad Things
Could Happen.

Yea, because of type changes and such I went with using the 'as-is'
format only for builtin types (via att->atttypid < FirstNormalObjectId)
- we know those aren't changed. And the majority of the data usually
uses such types...
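The decision described in the last few messages (ship built-in types as-is when the binary layout matches, use send/recv when a valid typsend exists, otherwise fall back to text) can be mocked up outside the backend. To be clear, the struct-free function below is an illustrative stand-in, not PostgreSQL's actual API; only the constant values (InvalidOid, FirstNormalObjectId) mirror the real catalog definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; in a real backend these come from pg_type/pg_attribute. */
typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)
#define FirstNormalObjectId 16384	/* first OID assigned to user-defined objects */

typedef enum
{
	FORMAT_RAW,		/* ship the on-disk datum as-is */
	FORMAT_BINARY,	/* use typsend/typreceive */
	FORMAT_TEXT		/* fall back to typoutput/typinput */
} TransferFormat;

/*
 * Per-column format choice: built-in types can go as-is (their definitions
 * can't change under us), types with a valid typsend get the binary
 * protocol, everything else falls back to text.  "same_binary_layout"
 * stands in for the architecture/version match that the as-is path needs.
 */
static TransferFormat
choose_transfer_format(Oid atttypid, Oid typsend, bool same_binary_layout)
{
	if (same_binary_layout && atttypid < FirstNormalObjectId)
		return FORMAT_RAW;
	if (typsend != InvalidOid)	/* i.e. OidIsValid(typsend) */
		return FORMAT_BINARY;
	return FORMAT_TEXT;
}
```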

There's some PGC_BACKEND guc's that normally are only allowed to be set
at connection start. I'm not sure whether you're just circumventing that
check or whether you didn't hit a problematic case.

What does e.g. happen if you set PGOPTIONS='-c
local_preload_libraries=auto_explain' after linking auto_explain into
the plugin directory?

Eh, I'm embarrassed to admit I don't know exactly how to do this. My
install directory doesn't seem to have a subdirectory called "plugin".

Sadly you need to create it. Inside the normal $libdir. I.e. $(pg_config
--libdir)/plugins. And then (sym-)link/copy libraries that you want
every user to use in there.

BTW, are we in agreement, more or less, on patch #3 now, the support
for error propagation? I'd like to go ahead and commit that if there
are no further review comments there.

You're talking about
0003-Support-frontend-backend-protocol-communication-usin-v3.patch from
http://archives.postgresql.org/message-id/CA%2BTgmoY9B0mL7Ci1fG1cZ4-zwDHOjjQaKyLowB3F-jsbuvVw2g%40mail.gmail.com?

If so, go ahead from my POV. It seems likely that we'll whack it around
some more, but I don't see an advantage in waiting longer.

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#68Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#57)
2 attachment(s)
Re: pg_background (and more parallelism infrastructure patches)

On 10/24/14, 6:17 PM, Jim Nasby wrote:

- Does anyone have a tangible suggestion for how to reduce the code
duplication in patch #6?

Between execute_sql_string() and tcop/exec_simple_query()? Is there stuff in
exec_simple that's not safe for a bgworker? I'm not seeing why we can't use
exec_simple. :/

There are some differences if you compare them closely.

Without doing a deep dive, I'm guessing that most of the extra stuff would be safe to re-use; it just wouldn't affect execute_sql_string. Obviously we could add a boolean to exec_simple_query for the case when it's being used by a bgworker. Though, it seems like the biggest differences have to do with logging.

Here's the differences I see:

- Disallowing transaction commands
- Logging
- What memory context is used (could we just use that differently in a pg_background worker?)
- Portal output format
- What to do with the output of intermediate commands (surely there's other places we need to handle that? plpgsql maybe?)

I think all of those except logging could be handled by a function serving both exec_simple_query and execute_sql_string that accepts a few booleans (or maybe just one to indicate the caller) and some if's. At least I don't think it'd be too bad, without actually writing it.

I'm not sure what to do about logging. If the original backend has logging turned on, is it that horrible to do the same logging as exec_simple_query would do?

Another option would be factoring out parts of exec_simple_query; the for loop over the parse tree might be a good candidate. But I suspect that would be uglier than a separate support function.

I do feel uncomfortable with the amount of duplication there is right now though...

I took a stab at this by refactoring the guts of exec_simple_query (patch attached) into a new function called exec_query_string (also attached in raw form). As indicated it needs a bit more work. In particular, how it's dealing with excluding transactional commands is rather ugly. Why do we need to do that in pg_background?

Andres was concerned about the performance impact of doing this. I tested this by doing

for i in {1..999999}; do echo 'SELECT 1;' >> test.sql; done

and then

time psql -f test.sql > /dev/null

It appears there may be a ~1% performance hit, but my laptop isn't repeatable enough to be sure. I did try manually in-lining exec_query_string to see if it was the function call causing the issue; it didn't seem to make a difference.
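For anyone wanting to reproduce this, the workload file can be generated far faster than with a per-iteration shell loop; this is just a sketch of the same setup (same filename as the example above):

```shell
# Build the same single-statement workload without forking per line.
yes 'SELECT 1;' | head -n 1000000 > test.sql

# Then time it a few runs to smooth out noise, e.g.:
# for run in 1 2 3; do time psql -f test.sql > /dev/null; done
```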
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments:

0001-Refactor-the-guts-of-exec_simple_query.patchtext/plain; charset=UTF-8; name=0001-Refactor-the-guts-of-exec_simple_query.patch; x-mac-creator=0; x-mac-type=0Download
Author: Jim Nasby <Jim.Nasby@BlueTreble.com>
Date: Thu, 30 Oct 2014 20:25:34 -0500

Move the bulk of exec_simple_query into exec_query_string() so that
pg_background can also make use of it.

I’m not thrilled about the name, and I’m not sure tcopprot.h is the
right place to expose this. Also note the XXX comments.
---
 contrib/pg_background/pg_background.c | 141 ++--------------------------------
 src/backend/tcop/postgres.c           |  83 ++++++++++++++++----
 src/include/tcop/tcopprot.h           |   7 ++
 3 files changed, 80 insertions(+), 151 deletions(-)

diff --git a/contrib/pg_background/pg_background.c b/contrib/pg_background/pg_background.c
index a566ffb..075ecd8 100644
--- a/contrib/pg_background/pg_background.c
+++ b/contrib/pg_background/pg_background.c
@@ -817,10 +817,6 @@ pg_background_worker_main(Datum main_arg)
 static void
 execute_sql_string(const char *sql)
 {
-	List	   *raw_parsetree_list;
-	ListCell   *lc1;
-	bool		isTopLevel;
-	int			commands_remaining;
 	MemoryContext	parsecontext;
 	MemoryContext	oldcontext;
 
@@ -839,139 +835,16 @@ execute_sql_string(const char *sql)
 										 ALLOCSET_DEFAULT_MINSIZE,
 										 ALLOCSET_DEFAULT_INITSIZE,
 										 ALLOCSET_DEFAULT_MAXSIZE);
-	oldcontext = MemoryContextSwitchTo(parsecontext);
-	raw_parsetree_list = pg_parse_query(sql);
-	commands_remaining = list_length(raw_parsetree_list);
-	isTopLevel = commands_remaining == 1;
-	MemoryContextSwitchTo(oldcontext);
 
 	/*
-	 * Do parse analysis, rule rewrite, planning, and execution for each raw
-	 * parsetree.  We must fully execute each query before beginning parse
-	 * analysis on the next one, since there may be interdependencies.
+	 * Do the real work
 	 */
-	foreach(lc1, raw_parsetree_list)
-	{
-		Node	   *parsetree = (Node *) lfirst(lc1);
-		const char *commandTag;
-		char        completionTag[COMPLETION_TAG_BUFSIZE];
-		List       *querytree_list,
-				   *plantree_list;
-		bool		snapshot_set = false;
-		Portal		portal;
-		DestReceiver *receiver;
-		int16		format = 1;
-
-		/*
-		 * We don't allow transaction-control commands like COMMIT and ABORT
-		 * here.  The entire SQL statement is executed as a single transaction
-		 * which commits if no errors are encountered.
-		 */
-		if (IsA(parsetree, TransactionStmt))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("transaction control statements are not allowed in pg_background")));
-
-		/*
-		 * Get the command name for use in status display (it also becomes the
-		 * default completion tag, down inside PortalRun).  Set ps_status and
-		 * do any special start-of-SQL-command processing needed by the
-		 * destination.
-		 */
-		commandTag = CreateCommandTag(parsetree);
-		set_ps_display(commandTag, false);
-		BeginCommand(commandTag, DestNone);
-
-		/* Set up a snapshot if parse analysis/planning will need one. */
-		if (analyze_requires_snapshot(parsetree))
-		{
-			PushActiveSnapshot(GetTransactionSnapshot());
-			snapshot_set = true;
-		}
-
-		/*
-		 * OK to analyze, rewrite, and plan this query.
-		 *
-		 * As with parsing, we need to make sure this data outlives the
-		 * transaction, because of the possibility that the statement might
-		 * perform internal transaction control.
-		 */
-		oldcontext = MemoryContextSwitchTo(parsecontext);
-		querytree_list = pg_analyze_and_rewrite(parsetree, sql, NULL, 0);
-		plantree_list = pg_plan_queries(querytree_list, 0, NULL);
-
-		/* Done with the snapshot used for parsing/planning */
-		if (snapshot_set)
-			PopActiveSnapshot();
-
-		/* If we got a cancel signal in analysis or planning, quit */
-		CHECK_FOR_INTERRUPTS();
-
-		/*
-		 * Execute the query using the unnamed portal.
-		 */
-		portal = CreatePortal("", true, true);
-		/* Don't display the portal in pg_cursors */
-		portal->visible = false;
-		PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
-		PortalStart(portal, NULL, 0, InvalidSnapshot);
-
-		/* We always use binary format, for efficiency. */
-		PortalSetResultFormat(portal, 1, &format);
-
-		/*
-		 * Tuples returned by any command other than the last are simply
-		 * discarded; but those returned by the last (or only) command are
-		 * redirected to the shared memory queue we're using for communication
-		 * with the launching backend. If the launching backend is gone or has
-		 * detached us, these messages will just get dropped on the floor.
-		 */
-		--commands_remaining;
-		if (commands_remaining > 0)
-			receiver = CreateDestReceiver(DestNone);
-		else
-		{
-			receiver = CreateDestReceiver(DestRemote);
-			SetRemoteDestReceiverParams(receiver, portal);
-		}
-
-		/*
-		 * Only once the portal and destreceiver have been established can
-		 * we return to the transaction context.  All that stuff needs to
-		 * survive an internal commit inside PortalRun!
-		 */
-		MemoryContextSwitchTo(oldcontext);
-
-		/* Here's where we actually execute the command. */
-		(void) PortalRun(portal, FETCH_ALL, isTopLevel, receiver, receiver,
-						 completionTag);
-
-		/* Clean up the receiver. */
-		(*receiver->rDestroy) (receiver);
-
-		/* Clean up the portal. */
-		PortalDrop(portal, false);
-
-		/*
-		 * If this is the last parsetree, close down transaction statement
-		 * before reporting CommandComplete.  Otherwise, we need a
-		 * CommandCounterIncrement.
-		 */
-		if (lnext(lc1) == NULL)
-			finish_xact_command();
-		else
-			CommandCounterIncrement();
-
-		/*
-		 * Send a CommandComplete message even if we suppressed the query
-		 * results.  The user backend will report the command tags in the
-		 * absence of any true query results.
-		 */
-		EndCommand(completionTag, DestRemote);
-	}
-
-	/* Make sure there's not still a transaction open. */
-	finish_xact_command();
+	exec_query_string(sql,
+					  DestRemote,
+					  parsecontext,
+					  true, /* allow_transactions */
+					  true /* last_result_only */
+					  );
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e44f5fa..3b998fc 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -823,13 +823,7 @@ static void
 exec_simple_query(const char *query_string)
 {
 	CommandDest dest = whereToSendOutput;
-	MemoryContext oldcontext;
-	List	   *parsetree_list;
-	ListCell   *parsetree_item;
 	bool		save_log_statement_stats = log_statement_stats;
-	bool		was_logged = false;
-	bool		isTopLevel;
-	char		msec_str[32];
 
 
 	/*
@@ -849,6 +843,46 @@ exec_simple_query(const char *query_string)
 		ResetUsage();
 
 	/*
+	 * Do the real work
+	 */
+	exec_query_string(query_string,
+					  dest,
+					  MessageContext,
+					  true, /* allow_transactions */
+					  false /* last_result_only */
+					  );
+
+	if (save_log_statement_stats)
+		ShowUsage("QUERY STATISTICS");
+
+	TRACE_POSTGRESQL_QUERY_DONE(query_string);
+
+	debug_query_string = NULL;
+}
+
+/*
+ * exec_query_string
+ *
+ * This is the guts of exec_simple_query, but is also used by
+ * execute_sql_string in contrib/pg_background
+ */
+void
+exec_query_string(const char *query_string,
+				  CommandDest dest, /* Where to send output */
+				  MemoryContext parsecontext, /* Context we should use to parse query_string */
+				  bool allow_transactions, /* Should we allow transaction statements? */
+				  bool last_result_only /* If true, drop all command output except for the last command */
+				  )
+{
+	MemoryContext oldcontext;
+	List	   *parsetree_list;
+	ListCell   *parsetree_item;
+	bool		was_logged = false;
+	bool		isTopLevel;
+	char		msec_str[32];
+	int			commands_remaining;
+
+	/*
 	 * Start up a transaction command.  All queries generated by the
 	 * query_string will be in this same command block, *unless* we find a
 	 * BEGIN/COMMIT/ABORT statement; we have to force a new xact command after
@@ -875,6 +909,7 @@ exec_simple_query(const char *query_string)
 	 * we are in aborted transaction state!)
 	 */
 	parsetree_list = pg_parse_query(query_string);
+	commands_remaining = list_length(parsetree_list);
 
 	/* Log immediately if dictated by log_statement */
 	if (check_log_statement(parsetree_list))
@@ -915,6 +950,13 @@ exec_simple_query(const char *query_string)
 		DestReceiver *receiver;
 		int16		format;
 
+		if (!allow_transactions && IsA(parsetree, TransactionStmt))
+			/* XXX this needs to be more flexible */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("transaction control statements are not allowed in pg_background")));
+
+
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and
@@ -925,6 +967,7 @@ exec_simple_query(const char *query_string)
 
 		set_ps_display(commandTag, false);
 
+		/* XXX what should we do here in the case of last_result_only = true? */
 		BeginCommand(commandTag, dest);
 
 		/*
@@ -944,7 +987,8 @@ exec_simple_query(const char *query_string)
 					 errdetail_abort()));
 
 		/* Make sure we are in a transaction command */
-		start_xact_command();
+		if (allow_transactions)
+			start_xact_command();
 
 		/* If we got a cancel signal in parsing or prior command, quit */
 		CHECK_FOR_INTERRUPTS();
@@ -1027,10 +1071,19 @@ exec_simple_query(const char *query_string)
 
 		/*
 		 * Now we can create the destination receiver object.
+		 *
+		 * If we're sending only the last result, throw output away unless it's
+		 * the last command.
 		 */
-		receiver = CreateDestReceiver(dest);
-		if (dest == DestRemote)
-			SetRemoteDestReceiverParams(receiver, portal);
+		--commands_remaining;
+		if (last_result_only && commands_remaining > 0)
+			receiver = CreateDestReceiver(DestNone);
+		else
+		{
+			receiver = CreateDestReceiver(dest);
+			if (dest == DestRemote)
+				SetRemoteDestReceiverParams(receiver, portal);
+		}
 
 		/*
 		 * Switch back to transaction context for execution.
@@ -1055,7 +1108,9 @@ exec_simple_query(const char *query_string)
 		{
 			/*
 			 * If this was a transaction control statement, commit it. We will
-			 * start a new xact command for the next command (if any).
+			 * start a new xact command for the next command (if any). Note
+			 * that if allow_transactions is false then we should never get to
+			 * this point.
 			 */
 			finish_xact_command();
 		}
@@ -1121,12 +1176,6 @@ exec_simple_query(const char *query_string)
 			break;
 	}
 
-	if (save_log_statement_stats)
-		ShowUsage("QUERY STATISTICS");
-
-	TRACE_POSTGRESQL_QUERY_DONE(query_string);
-
-	debug_query_string = NULL;
 }
 
 /*
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index fd3df58..39c6a39 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -86,4 +86,11 @@ extern bool set_plan_disabling_options(const char *arg,
 						   GucContext context, GucSource source);
 extern const char *get_stats_option_name(const char *arg);
 
+extern void exec_query_string(const char *query_string,
+				  CommandDest dest, /* Where to send output */
+				  MemoryContext parsecontext, /* Context we should use to parse query_string */
+				  bool allow_transactions, /* Should we allow transaction statements? */
+				  bool last_result_only /* If true, drop all command output except for the last command */
+				  );
+
 #endif   /* TCOPPROT_H */
-- 
2.1.2

exec_query_string.ctext/plain; charset=UTF-8; name=exec_query_string.c; x-mac-creator=0; x-mac-type=0Download
#69Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#67)
Re: pg_background (and more parallelism infrastructure patches)

So, summarizing the state of this patch set:

- Patches 1, 2, 3, and 5 are committed.

- Patch 4 has had some review and really, as far as I can tell, the
only really major issue that has cropped up is the possibility that
the GUC settings that are legal in backend A aren't legal in backend B
for some reason; in that case, restoring the GUC settings there will
probably fail. While this is a legitimate concern, I think what it
really boils down to is that there's a possibility that there are
libraries loaded in the user backend that are not loaded in the
postmaster and therefore won't get loaded into the background worker.
That's an issue to which we may need to engineer some solution, but
not in this patch. Overall, the patch's architecture is modeled
closely on the way we synchronize GUCs to new backends when using
EXEC_BACKEND, and I don't think we're going to do any better than starting
with that and working to file down the rough edges as we go. So I'd
like to go ahead and commit this.

- Patch 6, pg_background itself, has a number of outstanding issues.
Some people aren't sure it's useful enough to really be worth
bothering with; other people seem quite excited about it, even to the
point of wanting to push it all the way into core. Whatever we
ultimately decide to do there, the patch as it stands today is clearly
not ready for commit. The issues I know of are:

-- I still haven't written documentation for it.
-- There's some code duplication with exec_simple_query() that doesn't
feel great, but it's not clear that there's a superior alternative. I
think we certainly don't want to complicate exec_simple_query() to
make pg_background happy.
-- We should add REVOKE to the SQL script that installs it so that
non-superusers can't use it unless the superuser decides to grant them
rights.
-- It doesn't handle types without send/receive functions. I thought
that was tolerable since the types without such functions seem like
mostly edge cases, but Andres doesn't like it. Apparently, BDR
makes provisions to either ship the tuple as one big blob - if all
built-in types - or else use binary format where supported and text
format otherwise. Since this issue is common to BDR, parallelism, and
pg_background, we might want to introduce some common infrastructure
for it, but it's not too clear to me right now what that should look
like.
-- Evil things can happen if the background process changes client_encoding.
-- Even if I fix all that stuff, it's not entirely clear that there's
a consensus in favor of the patch at all.

I certainly think that as far as this CommitFest is concerned, patch 6
ought to be considered Returned with Feedback. Many thanks to Andres
and all for reviewing. What I am less certain about is whether it
makes sense to spend the energy fixing the above issues in the
short-term. I wrote pg_background mostly a demonstration that the
rest of these patches work and can be used to accomplish useful stuff,
whether as part of parallelism or otherwise; and there's a lot of work
left to be done on parallelism proper that won't necessarily benefit
from polishing pg_background first.

So, questions:

1. Any other opinions for or against pg_background as a concept? I
thought the ability to kick off vacuums (or other stuff with
PreventTransactionChain) asynchronously was pretty cool, as well as the
ability to use it as an authentication-free loopback dblink. But
opinions obviously vary.

2. Is anyone sufficiently interested in pg_background as a concept
that they'd be willing to take over the patch and work on the TODO
list mentioned above?

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#70Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#69)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-11-10 12:10:49 -0500, Robert Haas wrote:

- Patch 4 has had some review and really, as far as I can tell, the
only really major issue that has cropped up is the possibility that
the GUC settings that are legal in backend A aren't legal in backend B
for some reason; in that case, restoring the GUC settings there will
probably fail. While this is a legitimate concern, I think what it
really boils down to is that there's a possibility that there are
libraries loaded in the user backend that are not loaded in the
postmaster and therefore won't get loaded into the background worker.

I'm not too concerned about that one.

That's an issue to which we may need to engineer some solution, but
not in this patch. Overall, the patch's architecture is modeled
closely on the way we synchronize GUCs to new backends when using
EXEC_BACKEND, and I don't think we're going to do any better than starting
with that and working to file down the rough edges as we go. So I'd
like to go ahead and commit this.

Did you check out whether PGC_BACKEND GUCs work? Other than that I agree
that we can just solve further issues as we notice them. This isn't
particularly intrusive.

-- There's some code duplication with exec_simple_query() that doesn't
feel great, but it's not clear that there's a superior alternative. I
think we certainly don't want to complicate exec_simple_query() to
make pg_background happy.

Yea, I think we can live with it if really necessary.

-- We should add REVOKE to the SQL script that installs it so that
non-superusers can't use it unless the superuser decides to grant them
rights.

Right. That seems trivial enough.

-- It doesn't handle types without send/receive functions. I thought
that was tolerable since the types without such functions seem like
mostly edge cases, but Andres doesn't like it. Apparently, BDR
makes provisions to either ship the tuple as one big blob - if all
built-in types - or else use binary format where supported and text
format otherwise. Since this issue is common to BDR, parallelism, and
pg_background, we might want to introduce some common infrastructure
for it, but it's not too clear to me right now what that should look
like.

I think we definitely need to solve this - but I'm also not at all that
sure how.

For something like pg_background it's pretty trivial to fall back to
in/out when there's no send/recv. It's just a couple of lines, and the
portal code has the necessary provisions. So solving it for
pg_background itself is pretty trivial. But, as you say, pg_background
itself isn't a primary goal (although a nice demonstration, with some
real world applications). There's enough differences between the
parallelism and the replication cases that I'm not entirely sure how
much common ground there is. Namely replication has the additional
concerns of version differences, trust (don't use blobs if you don't
fully trust the other side), and differences in the allocated oids
(especially relevant for arrays which, very annoyingly, embed the oid in
the send/recv format).

1. Any other opinions for or against pg_background as a concept? I
thought the ability to kick off vacuums (or other stuff with
PreventTransactionChain) asynchronously was pretty cool, as well as the
ability to use it as an authentication-free loopback dblink. But
opinions obviously vary.

I think it's a cool concept, but I'm not sure if it's worth the work to
make it fully usable. Or rather, I think it's worthy enough, but I
personally would priorize other stuff.

2. Is anyone sufficiently interested in pg_background as a concept
that they'd be willing to take over the patch and work on the TODO
list mentioned above?

I personally won't. If we can come up with a sketch of how to deal with
the data transport encoding issue above, I'd be willing to to work on
that specific part. But not pg_background in itself.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#71Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#70)
Re: pg_background (and more parallelism infrastructure patches)

* Andres Freund (andres@2ndquadrant.com) wrote:

1. Any other opinions for or against pg_background as a concept? I
thought the ability to kick off vacuums (or other stuff with
PreventTransactionChain) asynchronously was pretty cool, as well as the
ability to use it as an authentication-free loopback dblink. But
opinions obviously vary.

I think it's a cool concept, but I'm not sure if it's worth the work to
make it fully usable. Or rather, I think it's worthy enough, but I
personally would priorize other stuff.

I've not read through the whole thread/discussion, but I'd put myself in
more-or-less the same boat at this point. I've got a number of other
things on my plate already that need to get done (more RLS cleanup /
consistency, back-patching the ereport() column-privs issue, reviewing
pgAudit, the less-than-superuser privileges work, actually helping out
with the in-progress commitfest..) and so I really doubt I'd be able to
seriously help with pg_background- at least for 9.5, which is coming up
awful fast at this point, if we're going to stick with the 'normal'
schedule and freeze in the spring.

That said, I love the concept and had really been hoping to see it in
9.5, and maybe some at- or cron-like ability happening later (yes, I
absolutely feel we need this, though I know others have different
opinions..).

2. Is anyone sufficiently interested in pg_background as a concept
that they'd be willing to take over the patch and work on the TODO
list mentioned above?

I personally won't. If we can come up with a sketch of how to deal with
the data transport encoding issue above, I'd be willing to to work on
that specific part. But not pg_background in itself.

If other things get done or additional resources show up, I'd be
interested in helping, but I don't see either happening in time for 9.5.

Thanks!

Stephen

#72Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Stephen Frost (#71)
Re: pg_background (and more parallelism infrastructure patches)

Stephen Frost wrote:

at least for 9.5, which is coming up awful fast at this point, if
we're going to stick with the 'normal' schedule and freeze in the
spring.

https://wiki.postgresql.org/wiki/PgCon_2014_Developer_Meeting#9.5_Schedule

Doesn't look all that "normal" to me, with that commitfest in Feb.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#73Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#70)
Re: pg_background (and more parallelism infrastructure patches)

On Mon, Nov 10, 2014 at 1:29 PM, Andres Freund <andres@2ndquadrant.com> wrote:

That's an issue to which we may need to engineer some solution, but
not in this patch. Overall, the patch's architecture is modeled
closely on the way we synchronize GUCs to new backends when using
EXEC_BACKEND, and I don't think we're going to do any better than starting
with that and working to file down the rough edges as we go. So I'd
like to go ahead and commit this.

Did you check out whether PGC_BACKEND GUCs work? Other than that I agree
that we can just solve further issues as we notice them. This isn't
particularly intrusive.

Is the scenario you're concerned about this one:

1. Start postmaster.
2. Start a user session.
3. Change a PGC_BACKEND GUC in postgresql.conf.
4. Reload.
5. Launch a background worker that uses this code.

It should end up with the same values there as the active session, not
the current one from postgresql.conf. But we want to verify that's
the behavior we actually get. Right?

-- It doesn't handle types without send/receive functions. I thought
that was tolerable since the types without such functions seem like
mostly edge cases, but Andres doesn't like it. Apparently, BDR
makes provisions to either ship the tuple as one big blob - if all
built-in types - or else use binary format where supported and text
format otherwise. Since this issue is common to BDR, parallelism, and
pg_background, we might want to introduce some common infrastructure
for it, but it's not too clear to me right now what that should look
like.

I think we definitely need to solve this - but I'm also not at all that
sure how.

For something like pg_background it's pretty trivial to fall back to
in/out when there's no send/recv. It's just a couple of lines, and the
portal code has the necessary provisions. So solving it for
pg_background itself is pretty trivial.

That's not really the interesting part of the problem, I think. Yeah,
pg_background can be made to speak text format if we really care. But
the real issue is that even with the binary format, converting a tuple
in on-disk format into a DataRow message so that the other side can
reconstruct the tuple and shove it into an executor slot (or wherever
it wants to shove it) seems like it might be expensive. You might
have data on that; if it's not expensive, stop here. If it is
expensive, what we really want to do is figure out some way to make it
safe to copy the tuple into the shared memory queue, or send it out
over the socket, and have the process on the other end use it without
having to revalidate the whole thing column-by-column.

But, as you say, pg_background
itself isn't a primary goal (although a nice demonstration, with some
real world applications). There's enough differences between the
parallelism and the replication cases that I'm not entirely sure how
much common ground there is. Namely replication has the additional
concerns of version differences, trust (don't use blobs if you don't
fully trust the other side), and differences in the allocated oids
(especially relevant for arrays which, very annoyingly, embed the oid in
the send/recv format).

Yeah. It would be nice to use the same code for deciding what to do
in a particular case. It seems like it ought to be possible to have
one rule for whether a tuple with a given set of data types is safe to
ship in the on-disk format. For pg_background/parallelism, it's
enough if that function returns true; for BDR, there might be
additional criteria applied before deciding to do it that way. But
they could use the same decision tree for part of it.

What would be even better is to find some way to MAKE IT SAFE to send
the undecoded tuple. I'm not sure what that would look like. Locking
the types against concurrent changes? Marking dropped types as
dropped without actually dropping them, and garbage collecting them
later? Providing safer ways to deconstruct tuples? It might help to
start by enumerating what exactly can go wrong here. As far as I can
see, the main risks for pg_background and parallelism are (1) that the
type might get concurrently dropped, leaving us unable to interpret
the received tuple and (2) that the type might get concurrently
modified in some way that leaves us confused about how to interpret
the tuple, as by having the type size change. The chances of a
problem in practice seem remote, since you can't do either of those
things if a table column with that type exists, but I can't convince
myself that there's no way for the type to be modified under us in
such a way that two different backends have different ideas about
whether it exists or how wide it is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#74Stephen Frost
sfrost@snowman.net
In reply to: Alvaro Herrera (#72)
Re: pg_background (and more parallelism infrastructure patches)

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

Stephen Frost wrote:

at least for 9.5, which is coming up awful fast at this point, if
we're going to stick with the 'normal' schedule and freeze in the
spring.

https://wiki.postgresql.org/wiki/PgCon_2014_Developer_Meeting#9.5_Schedule

Doesn't look all that "normal" to me, with that commitfest in Feb.

I'd be more inclined to expect the late release of 9.4 to impact things
than that schedule to really change when we freeze.. Just my 2c though.

Thanks,

Stephen

#75Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#73)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-11-10 14:54:15 -0500, Robert Haas wrote:

On Mon, Nov 10, 2014 at 1:29 PM, Andres Freund <andres@2ndquadrant.com> wrote:

That's an issue to which we may need to engineer some solution, but
not in this patch. Overall, the patch's architecture is modeled
closely on the way we synchronize GUCs to new backends when using
EXEC_BACKEND, and I don't think we're going to do any better than starting
with that and working to file down the rough edges as we go. So I'd
like to go ahead and commit this.

Did you check out whether PGC_BACKEND GUCs work? Other than that I agree
that we can just solve further issues as we notice them. This isn't
particularly intrusive.

Is the scenario you're concerned about this one:

1. Start postmaster.
2. Start a user session.
3. Change a PGC_BACKEND GUC in postgresql.conf.
4. Reload.
5. Launch a background worker that uses this code.

No, that's not what I was thinking of. Consider:

export pg_libdir=$(pg_config --libdir)
mkdir $pg_libdir/plugins
ln -s $pg_libdir/auto_explain.so $pg_libdir/plugins/
PG_OPTIONS='-c local_preload_libraries=auto_explain' psql

and use pg_background() (or something else using this infrastructure) in
that context.
Without having reread your relevant code I'd expect a:
ERROR: 55P02: parameter "local_preload_libraries" cannot be set after connection start
because the code would try to import the "user session's"
local_preload_libraries values into the background process - after that
actually already has done its initialization.

It should end up with the same values there as the active session, not
the current one from postgresql.conf. But we want to verify that's
the behavior we actually get. Right?

But that's also something worthwhile to check.

-- It doesn't handle types without send/receive functions. I thought
that was tolerable since the functions without such types seem like
mostly edge cases, but Andres doesn't like it. Apparently, BDR
makes provisions to either ship the tuple as one big blob - if all the
types are built-in - or else use binary format where supported and text
format otherwise. Since this issue is common to BDR, parallelism, and
pg_background, we might want to introduce some common infrastructure
for it, but it's not too clear to me right now what that should look
like.

I think we definitely need to solve this - but I'm also not at all sure
how.

For something like pg_background it's pretty trivial to fall back to
in/out when there's no send/recv. It's just a couple of lines, and the
portal code has the necessary provisions. So solving it for
pg_background itself is pretty trivial.

That's not really the interesting part of the problem, I think. Yeah,
pg_background can be made to speak text format if we really care. But
the real issue is that even with the binary format, converting a tuple
in on-disk format into a DataRow message so that the other side can
reconstruct the tuple and shove it into an executor slot (or wherever
it wants to shove it) seems like it might be expensive. You might
have data on that; if it's not expensive, stop here. If it is
expensive, what we really want to do is figure out some way to make it
safe to copy the tuple into the shared memory queue, or send it out
over the socket, and have the process on the other end use it without
having to revalidate the whole thing column-by-column.

Yes, sending the tuples as-is is a noticeable win. I've not fully analyzed
where the difference is - I suspect it's to a large degree because it
requires less copying/memory allocations. But also some of the send/recv
functions aren't exactly cheap in themselves.

For BDR we've sidestepped problems around changing type definitions and such
by saying only builtin types get to do the 'as-is' stuff. That can be
expanded later, but that's it for now. For those there's no chance that
the type definition changes or anything.

A general problem is that you really can't allow this to happen outside
a controlled setting - manipulating data on the 'Datum' level allows you
to do bad things.

But, as you say, pg_background
itself isn't a primary goal (although a nice demonstration, with some
real world applications). There's enough differences between the
parallelism and the replication cases that I'm not entirely sure how
much common ground there is. Namely replication has the additional
concerns of version differences, trust (don't use blobs if you don't
fully trust the other side), and differences in the allocated oids
(especially relevant for arrays which, very annoyingly, embed the oid in
the send/recv format).

Yeah. It would be nice to use the same code for deciding what to do
in a particular case. It seems like it ought to be possible to have
one rule for whether a tuple with a given set of data types is safe to
ship in the on-disk format. For pg_background/parallelism, it's
enough if that function returns true; for BDR, there might be
additional criteria applied before deciding to do it that way. But
they could use the same decision tree for part of it.

Yes, that'd be a good idea.

What would be even better is to find some way to MAKE IT SAFE to send
the undecoded tuple. I'm not sure what that would look like.

How do you define 'SAFE' here? Safe against concurrent SQL level
activity? Safe against receiving arbitrary data?

I see little chance of being safe against the latter. But imo that's
fine. Neither is WAL shipping. And that's perfectly fine for most
usecases...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#76Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#75)
Re: pg_background (and more parallelism infrastructure patches)

On Mon, Nov 10, 2014 at 4:58 PM, Andres Freund <andres@2ndquadrant.com> wrote:

No, that's not what I was thinking of. Consider:

export pg_libdir=$(pg_config --libdir)
mkdir $pg_libdir/plugins
ln -s $pg_libdir/auto_explain.so $pg_libdir/plugins/
PG_OPTIONS='-c local_preload_libraries=auto_explain' psql

and use pg_background() (or something else using this infrastructure) in
that context.
Without having reread your relevant code I'd expect a:
ERROR: 55P02: parameter "local_preload_libraries" cannot be set after connection start
because the code would try to import the "user session's"
local_preload_libraries values into the background process - after that
actually already has done its initialization.

Thanks for these pointers. Your directions turn out to be wrong in
detail, because it's PGOPTIONS not PG_OPTIONS, and the plugins
directory needs to be under $(pg_config --libdir)/postgresql, not just
$(pg_config --libdir). But it was enough to get me on the right
track. Long story short, it seems to work:

[rhaas ~]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql
psql (9.5devel)
Type "help" for help.

rhaas=# show local_preload_libraries;
local_preload_libraries
-------------------------
auto_explain
(1 row)

rhaas=# select * from pg_background_result(pg_background_launch('show
local_preload_libraries')) as (x text);
x
--------------
auto_explain
(1 row)

This is what I expected, and I think what we want. Background workers
enlisted for parallelism (or pg_background) need to be able to have
the same values for GUCs as the user backend that started them, and
our definition of setting things "at backend start" needs to be
expansive enough to include stuff we pull out of a dynamic shared
memory segment shortly thereafter.

It should end up with the same values there as the active session, not
the current one from postgresql.conf. But we want to verify that's
the behavior we actually get. Right?

But that's also something worthwile to check.

Mmph. There's a problem here. I tried changing
local_preload_libraries='auto_explain' in postgresql.conf and then
sending SIGHUP. A session started before that change does indeed
create a worker with that value still empty, but a session started
after that change also creates a worker with that value still empty.
Oops. Not sure why that's happening yet, will investigate.

What would be even better is to find some way to MAKE IT SAFE to send
the undecoded tuple. I'm not sure what that would look like.

How do you define 'SAFE' here? Safe against concurrent SQL level
activity? Safe against receiving arbitrary data?

The first one. The code powering the background worker is just as
trustworthy as any other part of the backend, so it can be subverted
if you load malicious C code into the backend, but not otherwise.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#77Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#76)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-11-12 11:15:26 -0500, Robert Haas wrote:

On Mon, Nov 10, 2014 at 4:58 PM, Andres Freund <andres@2ndquadrant.com> wrote:

No, that's not what I was thinking of. Consider:

export pg_libdir=$(pg_config --libdir)
mkdir $pg_libdir/plugins
ln -s $pg_libdir/auto_explain.so $pg_libdir/plugins/
PG_OPTIONS='-c local_preload_libraries=auto_explain' psql

and use pg_background() (or something else using this infrastructure) in
that context.
Without having reread your relevant code I'd expect a:
ERROR: 55P02: parameter "local_preload_libraries" cannot be set after connection start
because the code would try to import the "user session's"
local_preload_libraries values into the background process - after that
actually already has done its initialization.

Thanks for these pointers. Your directions turn out to be wrong in
detail, because it's PGOPTIONS not PG_OPTIONS, and the plugins
directory needs to be under $(pg_config --libdir)/postgresql, not just
$(pg_config --libdir). But it was enough to get me on the right
track. Long story short, it seems to work:

Sorry, was extracting it from a larger local script...

[rhaas ~]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql
psql (9.5devel)
Type "help" for help.

rhaas=# show local_preload_libraries;
local_preload_libraries
-------------------------
auto_explain
(1 row)

rhaas=# select * from pg_background_result(pg_background_launch('show
local_preload_libraries')) as (x text);
x
--------------
auto_explain
(1 row)

The question is whether the library is actually loaded in that case?
Because that normally only happens early during startup - which is why
it's a PGC_BACKEND guc.

How do you define 'SAFE' here? Safe against concurrent SQL level
activity? Safe against receiving arbitrary data?

The first one. The code powering the background worker is just as
trustworthy as any other part of the backend, so it can be subverted
if you load malicious C code into the backend, but not otherwise.

Ah, good. Because the latter sounds quite hard, if not impossible, unless
we recreate send/recv ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#78Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#77)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Nov 12, 2014 at 11:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:

The question is whether the library is actually loaded in that case?
Because that normally only happens early during startup - which is why
it's a PGC_BACKEND guc.

It looks like that does not work.

[rhaas pgsql]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql
psql (9.5devel)
Type "help" for help.

rhaas=# select * from pg_background_result(pg_background_launch('show
auto_explain.log_min_duration')) as (x text);
ERROR: unrecognized configuration parameter "auto_explain.log_min_duration"
CONTEXT: background worker, pid 31316

So, there's more to be done here. Rats.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#79Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#78)
Re: pg_background (and more parallelism infrastructure patches)

On 2014-11-12 11:36:14 -0500, Robert Haas wrote:

On Wed, Nov 12, 2014 at 11:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:

The question is whether the library is actually loaded in that case?
Because that normally only happens early during startup - which is why
it's a PGC_BACKEND guc.

It looks like that does not work.

[rhaas pgsql]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql
psql (9.5devel)
Type "help" for help.

rhaas=# select * from pg_background_result(pg_background_launch('show
auto_explain.log_min_duration')) as (x text);
ERROR: unrecognized configuration parameter "auto_explain.log_min_duration"
CONTEXT: background worker, pid 31316

So, there's more to be done here. Rats.

We could just say that PGC_BACKEND GUCs which aren't set to their
config-file values aren't supported for anything parallel. I find that a
reasonable thing - otherwise pooling of bgworkers and all that will
likely be out. And it's not that there are that many important
PGC_BACKEND gucs.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#80Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#78)
2 attachment(s)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Nov 12, 2014 at 11:36 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Nov 12, 2014 at 11:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:

The question is whether the library is actually loaded in that case?
Because that normally only happens early during startup - which is why
it's a PGC_BACKEND guc.

It looks like that does not work.

[rhaas pgsql]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql
psql (9.5devel)
Type "help" for help.

rhaas=# select * from pg_background_result(pg_background_launch('show
auto_explain.log_min_duration')) as (x text);
ERROR: unrecognized configuration parameter "auto_explain.log_min_duration"
CONTEXT: background worker, pid 31316

So, there's more to be done here. Rats.

It turned out to be quite simple to fix both problems. This
particular case fails because the call that loads the libraries
specified by session_preload_libraries and local_preload_libraries is
in PostgresMain() and thus never gets called by pg_background. I
fixed that by adding that call to pg_background in the appropriate
place. While I was at it, I added the REVOKE statements we discussed
earlier to pg_background's .sql file.

The other problem was due to this code in set_config_option:

/*
 * If a PGC_BACKEND or PGC_SU_BACKEND parameter is changed in
 * the config file, we want to accept the new value in the
 * postmaster (whence it will propagate to
 * subsequently-started backends), but ignore it in existing
 * backends.  This is a tad klugy, but necessary because we
 * don't re-read the config file during backend start.
 *
 * In EXEC_BACKEND builds, this works differently: we load all
 * nondefault settings from the CONFIG_EXEC_PARAMS file during
 * backend start.  In that case we must accept PGC_SIGHUP
 * settings, so as to have the same value as if we'd forked
 * from the postmaster.  We detect this situation by checking
 * IsInitProcessingMode, which is a bit ugly, but it doesn't
 * seem worth passing down an explicit flag saying we're doing
 * read_nondefault_variables().
 */
#ifdef EXEC_BACKEND
if (IsUnderPostmaster &&
!IsInitProcessingMode())
return -1;
#else
if (IsUnderPostmaster)
return -1;
#endif

When restoring variables via RestoreGUCState(), we need the same kind
of special handling that we do when running in EXEC_BACKEND mode and
restoring variables via read_nondefault_variables(). Extending the
IsInitProcessingMode() kludge doesn't look appealing, so I instead
added the flag contemplated by the comment.

Updated patches attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0004-Add-infrastructure-to-save-and-restore-GUC-values-v2.patchtext/x-patch; charset=US-ASCII; name=0004-Add-infrastructure-to-save-and-restore-GUC-values-v2.patchDownload
commit dc8980bdc9be89acf6d61bbfa5b1d7391992e8e8
Author: Robert Haas <rhaas@postgresql.org>
Date:   Thu Jul 17 07:58:32 2014 -0400

    Add infrastructure to save and restore GUC values.
    
    Amit Khandekar, Noah Misch, Robert Haas
    
    V2: Fix misuse of NULL vs. NUL/zero-byte.
    V2: Check return value of set_config_option.
    V2: Add an explicit is_reload flag to set_config_option.

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 9a0afa4..6692bb5 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -814,11 +814,11 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 	if (client_min_messages < WARNING)
 		(void) set_config_option("client_min_messages", "warning",
 								 PGC_USERSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 	if (log_min_messages < WARNING)
 		(void) set_config_option("log_min_messages", "warning",
 								 PGC_SUSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 
 	/*
 	 * Set up the search path to contain the target schema, then the schemas
@@ -843,7 +843,7 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 
 	(void) set_config_option("search_path", pathbuf.data,
 							 PGC_USERSET, PGC_S_SESSION,
-							 GUC_ACTION_SAVE, true, 0);
+							 GUC_ACTION_SAVE, true, 0, false);
 
 	/*
 	 * Set creating_extension and related variables so that
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index c0156fa..2f02303 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2422,7 +2422,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	snprintf(workmembuf, sizeof(workmembuf), "%d", maintenance_work_mem);
 	(void) set_config_option("work_mem", workmembuf,
 							 PGC_USERSET, PGC_S_SESSION,
-							 GUC_ACTION_SAVE, true, 0);
+							 GUC_ACTION_SAVE, true, 0, false);
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index bb9207a..6dca5df 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -318,7 +318,7 @@ ProcessConfigFile(GucContext context)
 		/* Now we can re-apply the wired-in default (i.e., the boot_val) */
 		if (set_config_option(gconf->name, NULL,
 							  context, PGC_S_DEFAULT,
-							  GUC_ACTION_SET, true, 0) > 0)
+							  GUC_ACTION_SET, true, 0, false) > 0)
 		{
 			/* Log the change if appropriate */
 			if (context == PGC_SIGHUP)
@@ -373,7 +373,7 @@ ProcessConfigFile(GucContext context)
 
 		scres = set_config_option(item->name, item->value,
 								  context, PGC_S_FILE,
-								  GUC_ACTION_SET, true, 0);
+								  GUC_ACTION_SET, true, 0, false);
 		if (scres > 0)
 		{
 			/* variable was updated, so log the change if appropriate */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6547dfa..f5a706e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -110,6 +110,12 @@
 #define S_PER_D (60 * 60 * 24)
 #define MS_PER_D (1000 * 60 * 60 * 24)
 
+/*
+ * Precision with which REAL type guc values are to be printed for GUC
+ * serialization.
+ */
+#define REALTYPE_PRECISION 17
+
 /* XXX these should appear in other modules' header files */
 extern bool Log_disconnections;
 extern int	CommitDelay;
@@ -136,6 +142,10 @@ char	   *GUC_check_errmsg_string;
 char	   *GUC_check_errdetail_string;
 char	   *GUC_check_errhint_string;
 
+static void
+do_serialize(char **destptr, Size *maxbytes, const char *fmt,...)
+/* This lets gcc check the format string for consistency. */
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 4)));
 
 static void set_config_sourcefile(const char *name, char *sourcefile,
 					  int sourceline);
@@ -5612,7 +5622,8 @@ validate_conf_option(struct config_generic * record, const char *name,
 int
 set_config_option(const char *name, const char *value,
 				  GucContext context, GucSource source,
-				  GucAction action, bool changeVal, int elevel)
+				  GucAction action, bool changeVal, int elevel,
+				  bool is_reload)
 {
 	struct config_generic *record;
 	bool		prohibitValueChange = false;
@@ -5726,18 +5737,13 @@ set_config_option(const char *name, const char *value,
 				 * nondefault settings from the CONFIG_EXEC_PARAMS file during
 				 * backend start.  In that case we must accept PGC_SIGHUP
 				 * settings, so as to have the same value as if we'd forked
-				 * from the postmaster.  We detect this situation by checking
-				 * IsInitProcessingMode, which is a bit ugly, but it doesn't
-				 * seem worth passing down an explicit flag saying we're doing
-				 * read_nondefault_variables().
+				 * from the postmaster.  This can also happen when using
+				 * RestoreGUCState() within a background worker that needs to
+				 * have the same settings as the user backend that started it.
+				 * is_reload will be true when either situation applies.
 				 */
-#ifdef EXEC_BACKEND
-				if (IsUnderPostmaster && !IsInitProcessingMode())
-					return -1;
-#else
-				if (IsUnderPostmaster)
+				if (IsUnderPostmaster && !is_reload)
 					return -1;
-#endif
 			}
 			else if (context != PGC_POSTMASTER &&
 					 context != PGC_BACKEND &&
@@ -6343,7 +6349,7 @@ SetConfigOption(const char *name, const char *value,
 				GucContext context, GucSource source)
 {
 	(void) set_config_option(name, value, context, source,
-							 GUC_ACTION_SET, true, 0);
+							 GUC_ACTION_SET, true, 0, false);
 }
 
 
@@ -6923,9 +6929,7 @@ ExecSetVariableStmt(VariableSetStmt *stmt, bool isTopLevel)
 									 ExtractSetVariableArgs(stmt),
 									 (superuser() ? PGC_SUSET : PGC_USERSET),
 									 PGC_S_SESSION,
-									 action,
-									 true,
-									 0);
+									 action, true, 0, false);
 			break;
 		case VAR_SET_MULTI:
 
@@ -7012,9 +7016,7 @@ ExecSetVariableStmt(VariableSetStmt *stmt, bool isTopLevel)
 									 NULL,
 									 (superuser() ? PGC_SUSET : PGC_USERSET),
 									 PGC_S_SESSION,
-									 action,
-									 true,
-									 0);
+									 action, true, 0, false);
 			break;
 		case VAR_RESET_ALL:
 			ResetAllOptions();
@@ -7059,8 +7061,7 @@ SetPGVariable(const char *name, List *args, bool is_local)
 							 (superuser() ? PGC_SUSET : PGC_USERSET),
 							 PGC_S_SESSION,
 							 is_local ? GUC_ACTION_LOCAL : GUC_ACTION_SET,
-							 true,
-							 0);
+							 true, 0, false);
 }
 
 /*
@@ -7103,8 +7104,7 @@ set_config_by_name(PG_FUNCTION_ARGS)
 							 (superuser() ? PGC_SUSET : PGC_USERSET),
 							 PGC_S_SESSION,
 							 is_local ? GUC_ACTION_LOCAL : GUC_ACTION_SET,
-							 true,
-							 0);
+							 true, 0, false);
 
 	/* get the new current value */
 	new_value = GetConfigOptionByName(name, NULL);
@@ -7225,7 +7225,7 @@ define_custom_variable(struct config_generic * variable)
 		(void) set_config_option(name, pHolder->reset_val,
 								 pHolder->gen.reset_scontext,
 								 pHolder->gen.reset_source,
-								 GUC_ACTION_SET, true, WARNING);
+								 GUC_ACTION_SET, true, WARNING, false);
 	/* That should not have resulted in stacking anything */
 	Assert(variable->stack == NULL);
 
@@ -7281,30 +7281,35 @@ reapply_stacked_values(struct config_generic * variable,
 			case GUC_SAVE:
 				(void) set_config_option(name, curvalue,
 										 curscontext, cursource,
-										 GUC_ACTION_SAVE, true, WARNING);
+										 GUC_ACTION_SAVE, true,
+										 WARNING, false);
 				break;
 
 			case GUC_SET:
 				(void) set_config_option(name, curvalue,
 										 curscontext, cursource,
-										 GUC_ACTION_SET, true, WARNING);
+										 GUC_ACTION_SET, true,
+										 WARNING, false);
 				break;
 
 			case GUC_LOCAL:
 				(void) set_config_option(name, curvalue,
 										 curscontext, cursource,
-										 GUC_ACTION_LOCAL, true, WARNING);
+										 GUC_ACTION_LOCAL, true,
+										 WARNING, false);
 				break;
 
 			case GUC_SET_LOCAL:
 				/* first, apply the masked value as SET */
 				(void) set_config_option(name, stack->masked.val.stringval,
 									   stack->masked_scontext, PGC_S_SESSION,
-										 GUC_ACTION_SET, true, WARNING);
+										 GUC_ACTION_SET, true,
+										 WARNING, false);
 				/* then apply the current value as LOCAL */
 				(void) set_config_option(name, curvalue,
 										 curscontext, cursource,
-										 GUC_ACTION_LOCAL, true, WARNING);
+										 GUC_ACTION_LOCAL, true,
+										 WARNING, false);
 				break;
 		}
 
@@ -7328,7 +7333,7 @@ reapply_stacked_values(struct config_generic * variable,
 		{
 			(void) set_config_option(name, curvalue,
 									 curscontext, cursource,
-									 GUC_ACTION_SET, true, WARNING);
+									 GUC_ACTION_SET, true, WARNING, false);
 			variable->stack = NULL;
 		}
 	}
@@ -8452,7 +8457,7 @@ read_nondefault_variables(void)
 
 		(void) set_config_option(varname, varvalue,
 								 varscontext, varsource,
-								 GUC_ACTION_SET, true, 0);
+								 GUC_ACTION_SET, true, 0, true);
 		if (varsourcefile[0])
 			set_config_sourcefile(varname, varsourcefile, varsourceline);
 
@@ -8465,6 +8470,398 @@ read_nondefault_variables(void)
 }
 #endif   /* EXEC_BACKEND */
 
+/*
+ * can_skip_gucvar:
+ * When serializing, determine whether to skip this GUC.  When restoring, the
+ * negation of this test determines whether to restore the compiled-in default
+ * value before processing serialized values.
+ *
+ * A PGC_S_DEFAULT setting on the serialize side will typically match new
+ * postmaster children, but that can be false when got_SIGHUP == true and the
+ * pending configuration change modifies this setting.  Nonetheless, we omit
+ * PGC_S_DEFAULT settings from serialization and make up for that by restoring
+ * defaults before applying serialized values.
+ *
+ * PGC_POSTMASTER variables always have the same value in every child of a
+ * particular postmaster.  Most PGC_INTERNAL variables are compile-time
+ * constants; a few, like server_encoding and lc_ctype, are handled specially
+ * outside the serialize/restore procedure.  Therefore, SerializeGUCState()
+ * never sends these, and RestoreGUCState() never changes them.
+ */
+static bool
+can_skip_gucvar(struct config_generic * gconf)
+{
+	return gconf->context == PGC_POSTMASTER ||
+		gconf->context == PGC_INTERNAL || gconf->source == PGC_S_DEFAULT;
+}
+
+/*
+ * estimate_variable_size:
+ * Estimate max size for dumping the given GUC variable.
+ */
+static Size
+estimate_variable_size(struct config_generic * gconf)
+{
+	Size		size;
+	Size		valsize;
+
+	if (can_skip_gucvar(gconf))
+		return 0;
+
+	size = 0;
+
+	size = add_size(size, strlen(gconf->name) + 1);
+
+	/* Get the maximum display length of the GUC value. */
+	switch (gconf->vartype)
+	{
+		case PGC_BOOL:
+			{
+				valsize = 5;	/* max(strlen('true'), strlen('false')) */
+			}
+			break;
+
+		case PGC_INT:
+			{
+				struct config_int *conf = (struct config_int *) gconf;
+
+				/*
+				 * Instead of getting the exact display length, use max
+				 * length.  Also reduce the max length for typical ranges of
+				 * small values.  Maximum value is 2147483647, i.e. 10 chars.
+				 * Include one byte for sign.
+				 */
+				if (abs(*conf->variable) < 1000)
+					valsize = 3 + 1;
+				else
+					valsize = 10 + 1;
+			}
+			break;
+
+		case PGC_REAL:
+			{
+				/*
+				 * We are going to print it with %.17g. Account for sign,
+				 * decimal point, and e+nnn notation. E.g.
+				 * -3.9932904234000002e+110
+				 */
+				valsize = REALTYPE_PRECISION + 1 + 1 + 5;
+			}
+			break;
+
+		case PGC_STRING:
+			{
+				struct config_string *conf = (struct config_string *) gconf;
+
+				valsize = strlen(*conf->variable);
+			}
+			break;
+
+		case PGC_ENUM:
+			{
+				struct config_enum *conf = (struct config_enum *) gconf;
+
+				valsize = strlen(config_enum_lookup_by_value(conf, *conf->variable));
+			}
+			break;
+	}
+
+	/* Allow space for terminating zero-byte */
+	size = add_size(size, valsize + 1);
+
+	if (gconf->sourcefile)
+		size = add_size(size, strlen(gconf->sourcefile));
+
+	/* Allow space for terminating zero-byte */
+	size = add_size(size, 1);
+
+	/* Include line whenever we include file. */
+	if (gconf->sourcefile)
+		size = add_size(size, sizeof(gconf->sourceline));
+
+	size = add_size(size, sizeof(gconf->source));
+	size = add_size(size, sizeof(gconf->scontext));
+
+	return size;
+}
+
+/*
+ * EstimateGUCStateSpace:
+ * Returns the size needed to store the GUC state for the current process
+ */
+Size
+EstimateGUCStateSpace(void)
+{
+	Size		size;
+	int			i;
+
+	/* Add space reqd for saving the data size of the guc state */
+	size = sizeof(Size);
+
+	/* Add up the space needed for each GUC variable */
+	for (i = 0; i < num_guc_variables; i++)
+		size = add_size(size,
+						estimate_variable_size(guc_variables[i]));
+
+	return size;
+}
+
+/*
+ * do_serialize:
+ * Copies the formatted string into the destination.  Moves ahead the
+ * destination pointer, and decrements the maxbytes by that many bytes. If
+ * maxbytes is not sufficient to copy the string, error out.
+ */
+static void
+do_serialize(char **destptr, Size *maxbytes, const char *fmt,...)
+{
+	va_list		vargs;
+	int			n;
+
+	if (*maxbytes <= 0)
+		elog(ERROR, "not enough space to serialize GUC state");
+
+	va_start(vargs, fmt);
+	n = vsnprintf(*destptr, *maxbytes, fmt, vargs);
+	va_end(vargs);
+
+	/*
+	 * Cater to portability hazards in the vsnprintf() return value just like
+	 * appendPQExpBufferVA() does.  Note that this requires an extra byte of
+	 * slack at the end of the buffer.  Since serialize_variable() ends with a
+	 * do_serialize_binary() rather than a do_serialize(), we'll always have
+	 * that slack; estimate_variable_size() need not add a byte for it.
+	 */
+	if (n < 0 || n >= *maxbytes - 1)
+	{
+		if (n < 0 && errno != 0 && errno != ENOMEM)
+			/* Shouldn't happen. Better show errno description. */
+			elog(ERROR, "vsnprintf failed: %m");
+		else
+			elog(ERROR, "not enough space to serialize GUC state");
+	}
+
+	/* Shift the destptr ahead of the null terminator */
+	*destptr += n + 1;
+	*maxbytes -= n + 1;
+}
+
+/* Binary copy version of do_serialize() */
+static void
+do_serialize_binary(char **destptr, Size *maxbytes, void *val, Size valsize)
+{
+	if (valsize > *maxbytes)
+		elog(ERROR, "not enough space to serialize GUC state");
+
+	memcpy(*destptr, val, valsize);
+	*destptr += valsize;
+	*maxbytes -= valsize;
+}
+
+/*
+ * serialize_variable:
+ * Dumps name, value and other information of a GUC variable into destptr.
+ */
+static void
+serialize_variable(char **destptr, Size *maxbytes,
+				   struct config_generic * gconf)
+{
+	if (can_skip_gucvar(gconf))
+		return;
+
+	do_serialize(destptr, maxbytes, "%s", gconf->name);
+
+	switch (gconf->vartype)
+	{
+		case PGC_BOOL:
+			{
+				struct config_bool *conf = (struct config_bool *) gconf;
+
+				do_serialize(destptr, maxbytes,
+							 (*conf->variable ? "true" : "false"));
+			}
+			break;
+
+		case PGC_INT:
+			{
+				struct config_int *conf = (struct config_int *) gconf;
+
+				do_serialize(destptr, maxbytes, "%d", *conf->variable);
+			}
+			break;
+
+		case PGC_REAL:
+			{
+				struct config_real *conf = (struct config_real *) gconf;
+
+				do_serialize(destptr, maxbytes, "%.*g",
+							 REALTYPE_PRECISION, *conf->variable);
+			}
+			break;
+
+		case PGC_STRING:
+			{
+				struct config_string *conf = (struct config_string *) gconf;
+
+				do_serialize(destptr, maxbytes, "%s", *conf->variable);
+			}
+			break;
+
+		case PGC_ENUM:
+			{
+				struct config_enum *conf = (struct config_enum *) gconf;
+
+				do_serialize(destptr, maxbytes, "%s",
+						 config_enum_lookup_by_value(conf, *conf->variable));
+			}
+			break;
+	}
+
+	do_serialize(destptr, maxbytes, "%s",
+				 (gconf->sourcefile ? gconf->sourcefile : ""));
+
+	if (gconf->sourcefile)
+		do_serialize_binary(destptr, maxbytes, &gconf->sourceline,
+							sizeof(gconf->sourceline));
+
+	do_serialize_binary(destptr, maxbytes, &gconf->source,
+						sizeof(gconf->source));
+	do_serialize_binary(destptr, maxbytes, &gconf->scontext,
+						sizeof(gconf->scontext));
+}
+
+/*
+ * SerializeGUCState:
+ * Dumps the complete GUC state onto the memory location at start_address.
+ */
+void
+SerializeGUCState(Size maxsize, char *start_address)
+{
+	char	   *curptr;
+	Size		actual_size;
+	Size		bytes_left;
+	int			i;
+	int			i_role = -1;
+
+	/* Reserve space for saving the actual size of the GUC state */
+	curptr = start_address + sizeof(actual_size);
+	bytes_left = maxsize - sizeof(actual_size);
+
+	for (i = 0; i < num_guc_variables; i++)
+	{
+		/*
+		 * It's pretty ugly, but we've got to force "role" to be initialized
+		 * after "session_authorization"; otherwise, the latter will override
+		 * the former.
+		 */
+		if (strcmp(guc_variables[i]->name, "role") == 0)
+			i_role = i;
+		else
+			serialize_variable(&curptr, &bytes_left, guc_variables[i]);
+	}
+	Assert(i_role >= 0);
+	serialize_variable(&curptr, &bytes_left, guc_variables[i_role]);
+
+	/* Store actual size without assuming alignment of start_address. */
+	actual_size = maxsize - bytes_left - sizeof(actual_size);
+	memcpy(start_address, &actual_size, sizeof(actual_size));
+}
+
+/*
+ * read_gucstate:
+ * Returns the string at *srcptr, and advances *srcptr past its terminating
+ * zero byte so that the caller is positioned to read the next value.
+ */
+static char *
+read_gucstate(char **srcptr, char *srcend)
+{
+	char	   *retptr = *srcptr;
+	char	   *ptr;
+
+	if (*srcptr >= srcend)
+		elog(ERROR, "incomplete GUC state");
+
+	/* The string variables are all null terminated */
+	for (ptr = *srcptr; ptr < srcend && *ptr != '\0'; ptr++)
+		;
+
+	if (ptr >= srcend)
+		elog(ERROR, "could not find null terminator in GUC state");
+
+	/* Set the new position to the byte following the terminating NUL */
+	*srcptr = ptr + 1;
+
+	return retptr;
+}
+
+/* Binary read version of read_gucstate(). Copies into dest */
+static void
+read_gucstate_binary(char **srcptr, char *srcend, void *dest, Size size)
+{
+	if (*srcptr + size > srcend)
+		elog(ERROR, "incomplete GUC state");
+
+	memcpy(dest, *srcptr, size);
+	*srcptr += size;
+}
+
+/*
+ * RestoreGUCState:
+ * Reads the GUC state at the specified address and updates the GUCs with the
+ * values read from the GUC state.
+ */
+void
+RestoreGUCState(void *gucstate)
+{
+	char	   *varname,
+			   *varvalue,
+			   *varsourcefile;
+	int			varsourceline;
+	GucSource	varsource;
+	GucContext	varscontext;
+	char	   *srcptr = (char *) gucstate;
+	char	   *srcend;
+	Size		len;
+	int			i;
+
+	/* See comment at can_skip_gucvar(). */
+	for (i = 0; i < num_guc_variables; i++)
+		if (!can_skip_gucvar(guc_variables[i]))
+			InitializeOneGUCOption(guc_variables[i]);
+
+	/* First item is the length of the subsequent data */
+	memcpy(&len, gucstate, sizeof(len));
+
+	srcptr += sizeof(len);
+	srcend = srcptr + len;
+
+	while (srcptr < srcend)
+	{
+		int		result;
+
+		varname = read_gucstate(&srcptr, srcend);
+
+		varvalue = read_gucstate(&srcptr, srcend);
+		varsourcefile = read_gucstate(&srcptr, srcend);
+		if (varsourcefile[0])
+			read_gucstate_binary(&srcptr, srcend,
+								 &varsourceline, sizeof(varsourceline));
+		read_gucstate_binary(&srcptr, srcend,
+							 &varsource, sizeof(varsource));
+		read_gucstate_binary(&srcptr, srcend,
+							 &varscontext, sizeof(varscontext));
+
+		result = set_config_option(varname, varvalue, varscontext, varsource,
+								   GUC_ACTION_SET, true, ERROR, true);
+		if (result <= 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("parameter \"%s\" could not be set", varname)));
+		if (varsourcefile[0])
+			set_config_sourcefile(varname, varsourcefile, varsourceline);
+	}
+}
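For readers following along, the serialization format that SerializeGUCState and RestoreGUCState implement above can be sketched in a language-neutral way: a leading length word, then for each variable the NUL-terminated name, value, and sourcefile strings, an optional sourceline (present only when sourcefile is non-empty), and the binary source/scontext values. The Python below is purely illustrative (the field widths are assumptions standing in for the C types, not the actual struct layout):

```python
import struct

def serialize_state(variables):
    """Serialize GUC-like variables: NUL-terminated strings plus binary
    trailers, prefixed by the total payload length (mirrors SerializeGUCState)."""
    body = b""
    for v in variables:
        body += v["name"].encode() + b"\x00"
        body += v["value"].encode() + b"\x00"
        body += v["sourcefile"].encode() + b"\x00"
        if v["sourcefile"]:
            body += struct.pack("i", v["sourceline"])  # only if sourcefile set
        body += struct.pack("i", v["source"])
        body += struct.pack("i", v["scontext"])
    # Leading size_t-like length word, as the C code reserves sizeof(Size).
    return struct.pack("N", len(body)) + body

def restore_state(buf):
    """Inverse of serialize_state (mirrors RestoreGUCState/read_gucstate)."""
    (length,) = struct.unpack_from("N", buf)
    pos = struct.calcsize("N")
    end = pos + length
    out = []

    def read_str():
        # Like read_gucstate(): find the NUL, return the string, skip past it.
        nonlocal pos
        nul = buf.index(b"\x00", pos, end)
        s = buf[pos:nul].decode()
        pos = nul + 1
        return s

    def read_int():
        # Like read_gucstate_binary(): straight binary copy, then advance.
        nonlocal pos
        (n,) = struct.unpack_from("i", buf, pos)
        pos += struct.calcsize("i")
        return n

    while pos < end:
        name, value, sourcefile = read_str(), read_str(), read_str()
        sourceline = read_int() if sourcefile else 0
        out.append({"name": name, "value": value, "sourcefile": sourcefile,
                    "sourceline": sourceline, "source": read_int(),
                    "scontext": read_int()})
    return out
```

The length prefix lets the reader know where the variable-length payload ends without a sentinel, which is why RestoreGUCState can loop with a simple `srcptr < srcend` test.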
 
 /*
  * A little "long argument" simulation, although not quite GNU
@@ -8555,7 +8952,7 @@ ProcessGUCArray(ArrayType *array,
 
 		(void) set_config_option(name, value,
 								 context, source,
-								 action, true, 0);
+								 action, true, 0, false);
 
 		free(name);
 		if (value)
@@ -8858,7 +9255,7 @@ validate_option_array_item(const char *name, const char *value,
 	/* test for permissions and valid option value */
 	(void) set_config_option(name, value,
 							 superuser() ? PGC_SUSET : PGC_USERSET,
-							 PGC_S_TEST, GUC_ACTION_SET, false, 0);
+							 PGC_S_TEST, GUC_ACTION_SET, false, 0, false);
 
 	return true;
 }
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 2b2aaf4..395858b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -339,7 +339,8 @@ extern bool parse_int(const char *value, int *result, int flags,
 extern bool parse_real(const char *value, double *result);
 extern int set_config_option(const char *name, const char *value,
 				  GucContext context, GucSource source,
-				  GucAction action, bool changeVal, int elevel);
+				  GucAction action, bool changeVal, int elevel,
+				  bool is_reload);
 extern void AlterSystemSetConfigFile(AlterSystemStmt *setstmt);
 extern char *GetConfigOptionByName(const char *name, const char **varname);
 extern void GetConfigOptionByNum(int varnum, const char **values, bool *noshow);
@@ -363,6 +364,11 @@ extern void write_nondefault_variables(GucContext context);
 extern void read_nondefault_variables(void);
 #endif
 
+/* GUC serialization */
+extern Size EstimateGUCStateSpace(void);
+extern void SerializeGUCState(Size maxsize, char *start_address);
+extern void RestoreGUCState(void *gucstate);
+
 /* Support for messages reported from GUC check hooks */
 
 extern PGDLLIMPORT char *GUC_check_errmsg_string;
Attachment: 0006-pg_background-Run-commands-in-a-background-worker-an-v5.patch (text/x-patch, US-ASCII)
commit be835602577c179a1adf6562c3391266465db71a
Author: Robert Haas <rhaas@postgresql.org>
Date:   Fri Jul 11 09:53:40 2014 -0400

    pg_background: Run commands in a background worker, and get the results.
    
    The currently-active GUC values from the user session will be copied
    to the background worker.  If the command returns a result set, you
    can retrieve the result set; if not, you can retrieve the command
    tags.  If the command fails with an error, the same error will be
    thrown in the launching process when the results are retrieved.
    Warnings and other messages generated by the background worker, and
    notifications received by it, are also propagated to the foreground
    process.
    
    Patch by me; review by Amit Kapila and Andres Freund.
    
    V2: Refactor to reduce differences with exec_simple_query; other
    cleanups per Amit.
    
    V3: Per Andres, fix whitespace damage and use NameData instead of
    char[NAMEDATALEN].
    
    V4: dsm_(un)keep_mapping -> dsm_(un)pin_mapping.  Error context:
    background worker, pid %d.  Permissions check: user must have
    permission to read results.
    
    V5: Revoke permissions on pg_background functions from public.
    Add missing call to process_session_preload_libraries().

diff --git a/contrib/Makefile b/contrib/Makefile
index b37d0dd..11d6116 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
 		pageinspect	\
 		passwordcheck	\
 		pg_archivecleanup \
+		pg_background \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
diff --git a/contrib/pg_background/Makefile b/contrib/pg_background/Makefile
new file mode 100644
index 0000000..c4e717d
--- /dev/null
+++ b/contrib/pg_background/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_background/Makefile
+
+MODULE_big = pg_background
+OBJS = pg_background.o
+
+EXTENSION = pg_background
+DATA = pg_background--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_background
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_background/pg_background--1.0.sql b/contrib/pg_background/pg_background--1.0.sql
new file mode 100644
index 0000000..8ac804a
--- /dev/null
+++ b/contrib/pg_background/pg_background--1.0.sql
@@ -0,0 +1,24 @@
+/* contrib/pg_background/pg_background--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_background" to load this file. \quit
+
+CREATE FUNCTION pg_background_launch(sql pg_catalog.text,
+					   queue_size pg_catalog.int4 DEFAULT 65536)
+    RETURNS pg_catalog.int4 STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_result(pid pg_catalog.int4)
+    RETURNS SETOF pg_catalog.record STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION pg_background_detach(pid pg_catalog.int4)
+    RETURNS pg_catalog.void STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
+
+REVOKE ALL ON FUNCTION pg_background_launch(pg_catalog.text, pg_catalog.int4)
+	FROM public;
+REVOKE ALL ON FUNCTION pg_background_result(pg_catalog.int4)
+	FROM public;
+REVOKE ALL ON FUNCTION pg_background_detach(pg_catalog.int4)
+	FROM public;
diff --git a/contrib/pg_background/pg_background.c b/contrib/pg_background/pg_background.c
new file mode 100644
index 0000000..c04de26
--- /dev/null
+++ b/contrib/pg_background/pg_background.c
@@ -0,0 +1,1000 @@
+/*--------------------------------------------------------------------------
+ *
+ * pg_background.c
+ *		Run SQL commands using a background worker.
+ *
+ * Copyright (C) 2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pg_background/pg_background.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/printtup.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "commands/async.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
+#include "miscadmin.h"
+#include "parser/analyze.h"
+#include "pgstat.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/shm_mq.h"
+#include "storage/shm_toc.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/snapmgr.h"
+#include "utils/timeout.h"
+
+/* Table-of-contents constants for our dynamic shared memory segment. */
+#define PG_BACKGROUND_MAGIC				0x50674267
+#define PG_BACKGROUND_KEY_FIXED_DATA	0
+#define PG_BACKGROUND_KEY_SQL			1
+#define PG_BACKGROUND_KEY_GUC			2
+#define PG_BACKGROUND_KEY_QUEUE			3
+#define PG_BACKGROUND_NKEYS				4
+
+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct pg_background_fixed_data
+{
+	Oid	database_id;
+	Oid	authenticated_user_id;
+	Oid	current_user_id;
+	int	sec_context;
+	NameData	database;
+	NameData	authenticated_user;
+} pg_background_fixed_data;
+
+/* Private state maintained by the launching backend for IPC. */
+typedef struct pg_background_worker_info
+{
+	pid_t		pid;
+	Oid			current_user_id;
+	dsm_segment *seg;
+	BackgroundWorkerHandle *handle;
+	shm_mq_handle *responseq;
+	bool		consumed;
+} pg_background_worker_info;
+
+/* Private state maintained across calls to pg_background_result. */
+typedef struct pg_background_result_state
+{
+	pg_background_worker_info *info;
+	FmgrInfo   *receive_functions;
+	Oid		   *typioparams;
+	bool		has_row_description;
+	List	   *command_tags;
+	bool		complete;
+} pg_background_result_state;
+
+static HTAB *worker_hash;
+
+static void cleanup_worker_info(dsm_segment *, Datum pid_datum);
+static pg_background_worker_info *find_worker_info(pid_t pid);
+static void check_rights(pg_background_worker_info *info);
+static void save_worker_info(pid_t pid, dsm_segment *seg,
+				 BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq);
+static void pg_background_error_callback(void *arg);
+
+static HeapTuple form_result_tuple(pg_background_result_state *state,
+								   TupleDesc tupdesc, StringInfo msg);
+
+static void handle_sigterm(SIGNAL_ARGS);
+static void execute_sql_string(const char *sql);
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(pg_background_launch);
+PG_FUNCTION_INFO_V1(pg_background_result);
+PG_FUNCTION_INFO_V1(pg_background_detach);
+
+void pg_background_worker_main(Datum);
+
+/*
+ * Start a dynamic background worker to run a user-specified SQL command.
+ */
+Datum
+pg_background_launch(PG_FUNCTION_ARGS)
+{
+	text	   *sql = PG_GETARG_TEXT_PP(0);
+	int32		queue_size = PG_GETARG_INT32(1);
+	int32		sql_len = VARSIZE_ANY_EXHDR(sql);
+	Size		guc_len;
+	Size		segsize;
+	dsm_segment *seg;
+	shm_toc_estimator e;
+	shm_toc    *toc;
+	char	   *sqlp;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *worker_handle;
+	pg_background_fixed_data *fdata;
+	pid_t		pid;
+	shm_mq_handle *responseq;
+	MemoryContext	oldcontext;
+
+	/* Ensure a valid queue size. */
+	if (queue_size < 0 || ((uint64) queue_size) < shm_mq_minimum_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("queue size must be at least %zu bytes",
+						shm_mq_minimum_size)));
+
+	/* Create dynamic shared memory and table of contents. */
+	shm_toc_initialize_estimator(&e);
+	shm_toc_estimate_chunk(&e, sizeof(pg_background_fixed_data));
+	shm_toc_estimate_chunk(&e, sql_len + 1);
+	guc_len = EstimateGUCStateSpace();
+	shm_toc_estimate_chunk(&e, guc_len);
+	shm_toc_estimate_chunk(&e, (Size) queue_size);
+	shm_toc_estimate_keys(&e, PG_BACKGROUND_NKEYS);
+	segsize = shm_toc_estimate(&e);
+	seg = dsm_create(segsize);
+	toc = shm_toc_create(PG_BACKGROUND_MAGIC, dsm_segment_address(seg),
+						 segsize);
+
+	/* Store fixed-size data in dynamic shared memory. */
+	fdata = shm_toc_allocate(toc, sizeof(pg_background_fixed_data));
+	fdata->database_id = MyDatabaseId;
+	fdata->authenticated_user_id = GetAuthenticatedUserId();
+	GetUserIdAndSecContext(&fdata->current_user_id, &fdata->sec_context);
+	namestrcpy(&fdata->database, get_database_name(MyDatabaseId));
+	namestrcpy(&fdata->authenticated_user,
+			   GetUserNameFromId(fdata->authenticated_user_id));
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_FIXED_DATA, fdata);
+
+	/* Store SQL query in dynamic shared memory. */
+	sqlp = shm_toc_allocate(toc, sql_len + 1);
+	memcpy(sqlp, VARDATA(sql), sql_len);
+	sqlp[sql_len] = '\0';
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_SQL, sqlp);
+
+	/* Store GUC state in dynamic shared memory. */
+	gucstate = shm_toc_allocate(toc, guc_len);
+	SerializeGUCState(guc_len, gucstate);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_GUC, gucstate);
+
+	/* Establish message queue in dynamic shared memory. */
+	mq = shm_mq_create(shm_toc_allocate(toc, (Size) queue_size),
+					   (Size) queue_size);
+	shm_toc_insert(toc, PG_BACKGROUND_KEY_QUEUE, mq);
+	shm_mq_set_receiver(mq, MyProc);
+
+	/*
+	 * Attach the queue before launching a worker, so that we'll automatically
+	 * detach the queue if we error out.  (Otherwise, the worker might sit
+	 * there trying to write the queue long after we've gone away.)
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	responseq = shm_mq_attach(mq, seg, NULL);
+	MemoryContextSwitchTo(oldcontext);
+
+	/* Configure a worker. */
+	worker.bgw_flags =
+		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	worker.bgw_main = NULL;		/* new worker might not have library loaded */
+	sprintf(worker.bgw_library_name, "pg_background");
+	sprintf(worker.bgw_function_name, "pg_background_worker_main");
+	snprintf(worker.bgw_name, BGW_MAXLEN,
+			 "pg_background by PID %d", MyProcPid);
+	worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
+	/* set bgw_notify_pid, so we can detect if the worker stops */
+	worker.bgw_notify_pid = MyProcPid;
+
+	/*
+	 * Register the worker.
+	 *
+	 * We switch contexts so that the background worker handle can outlast
+	 * this transaction.
+	 */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+	if (!RegisterDynamicBackgroundWorker(&worker, &worker_handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not register background process"),
+			 errhint("You may need to increase max_worker_processes.")));
+	MemoryContextSwitchTo(oldcontext);
+	shm_mq_set_handle(responseq, worker_handle);
+
+	/* Wait for the worker to start. */
+	switch (WaitForBackgroundWorkerStartup(worker_handle, &pid))
+	{
+		case BGWH_STARTED:
+			/* Success. */
+			break;
+		case BGWH_STOPPED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("could not start background process"),
+					 errhint("More details may be available in the server log.")));
+			break;
+		case BGWH_POSTMASTER_DIED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("cannot start background processes without postmaster"),
+					 errhint("Kill all remaining database processes and restart the database.")));
+			break;
+		default:
+			elog(ERROR, "unexpected bgworker handle status");
+			break;
+	}
+
+	/* Store the relevant details about this worker for future use. */
+	save_worker_info(pid, seg, worker_handle, responseq);
+
+	/*
+	 * Now that the worker info is saved, we do not need to, and should not,
+	 * automatically detach the segment at resource-owner cleanup time.
+	 */
+	dsm_pin_mapping(seg);
+
+	/* Return the worker's PID. */
+	PG_RETURN_INT32(pid);
+}
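pg_background_launch() above follows the standard shm_toc discipline: estimate every chunk and every key up front, create a segment of exactly that size, then allocate each chunk out of the segment and insert it under its key so the worker can look it up later. A toy model of that allocator (purely illustrative; the real shm_toc also handles alignment, 64-bit magic checking, and concurrent attachment) might look like:

```python
class ShmToc:
    """Toy table-of-contents allocator: reserve a budget up front, then
    hand out chunks and record them under integer keys (cf. shm_toc_*)."""

    def __init__(self, size):
        self.remaining = size
        self.entries = {}

    @classmethod
    def estimate(cls, chunk_sizes):
        # shm_toc_estimate_chunk()/shm_toc_estimate() rolled into one:
        # total payload bytes needed for all chunks.
        return sum(chunk_sizes)

    def allocate(self, nbytes):
        # shm_toc_allocate(): carve a chunk out of the fixed budget.
        if nbytes > self.remaining:
            raise MemoryError("out of shared memory")
        self.remaining -= nbytes
        return bytearray(nbytes)

    def insert(self, key, chunk):
        # shm_toc_insert(): record the chunk under its key, exactly once.
        assert key not in self.entries
        self.entries[key] = chunk

    def lookup(self, key):
        # shm_toc_lookup(), as used by the worker on attach.
        return self.entries[key]

# Mirror the four chunks pg_background_launch reserves (sizes are made up).
FIXED, SQL, GUC, QUEUE = range(4)
sizes = [64, 32, 128, 65536]   # fixed data, SQL text, GUC state, message queue
toc = ShmToc(ShmToc.estimate(sizes))
for key, nbytes in zip((FIXED, SQL, GUC, QUEUE), sizes):
    toc.insert(key, toc.allocate(nbytes))
```

The point of the estimate-then-allocate split is that a DSM segment cannot grow, so every byte must be accounted for before dsm_create() is called.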
+
+/*
+ * Retrieve the results of a background query previously launched in this
+ * session.
+ */
+Datum
+pg_background_result(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	shm_mq_result	res;
+	FuncCallContext *funcctx;
+	TupleDesc	tupdesc;
+	StringInfoData	msg;
+	pg_background_result_state *state;
+
+	/* First-time setup. */
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+		pg_background_worker_info *info;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* See if we have a connection to the specified PID. */
+		if ((info = find_worker_info(pid)) == NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("PID %d is not attached to this session", pid)));
+		check_rights(info);
+
+		/* Can't read results twice. */
+		if (info->consumed)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("results for PID %d have already been consumed", pid)));
+		info->consumed = true;
+
+		/*
+		 * Whether we succeed or fail, a future invocation of this function
+		 * may not try to read from the DSM once we've begun to do so.
+		 * Accordingly, make arrangements to clean things up at end of query.
+		 */
+		dsm_unpin_mapping(info->seg);
+
+		/* Set up tuple descriptor based on the column definition list. */
+		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record"),
+					 errhint("Try calling the function in the FROM clause "
+							 "using a column definition list.")));
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* Cache state that will be needed on every call. */
+		state = palloc0(sizeof(pg_background_result_state));
+		state->info = info;
+		if (funcctx->tuple_desc->natts > 0)
+		{
+			int	natts = funcctx->tuple_desc->natts;
+			int	i;
+
+			state->receive_functions = palloc(sizeof(FmgrInfo) * natts);
+			state->typioparams = palloc(sizeof(Oid) * natts);
+
+			for (i = 0;	i < natts; ++i)
+			{
+				Oid	receive_function_id;
+
+				getTypeBinaryInputInfo(funcctx->tuple_desc->attrs[i]->atttypid,
+									   &receive_function_id,
+									   &state->typioparams[i]);
+				fmgr_info(receive_function_id, &state->receive_functions[i]);
+			}
+		}
+		funcctx->user_fctx = state;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+	funcctx = SRF_PERCALL_SETUP();
+	tupdesc = funcctx->tuple_desc;
+	state = funcctx->user_fctx;
+
+	/* Initialize message buffer. */
+	initStringInfo(&msg);
+
+	/* Read and process messages from the shared memory queue. */
+	for (;;)
+	{
+		char		msgtype;
+		Size		nbytes;
+		void	   *data;
+
+		/* Get next message. */
+		res = shm_mq_receive(state->info->responseq, &nbytes, &data, false);
+		if (res != SHM_MQ_SUCCESS)
+			break;
+
+		/*
+		 * Message-parsing routines operate on a null-terminated StringInfo,
+		 * so we must construct one.
+		 */
+		resetStringInfo(&msg);
+		enlargeStringInfo(&msg, nbytes);
+		msg.len = nbytes;
+		memcpy(msg.data, data, nbytes);
+		msg.data[nbytes] = '\0';
+		msgtype = pq_getmsgbyte(&msg);
+
+		/* Dispatch on message type. */
+		switch (msgtype)
+		{
+			case 'E':
+			case 'N':
+				{
+					ErrorData	edata;
+					ErrorContextCallback context;
+
+					/* Parse ErrorResponse or NoticeResponse. */
+					pq_parse_errornotice(&msg, &edata);
+
+					/*
+					 * Limit the maximum error level to ERROR.  We don't want
+					 * a FATAL inside the background worker to kill the user
+					 * session.
+					 */
+					if (edata.elevel > ERROR)
+						edata.elevel = ERROR;
+
+					/*
+					 * Rethrow the error with an appropriate context method.
+					 */
+					context.callback = pg_background_error_callback;
+					context.arg = (void *) &pid;
+					context.previous = error_context_stack;
+					error_context_stack = &context;
+					ThrowErrorData(&edata);
+					error_context_stack = context.previous;
+
+					break;
+				}
+			case 'A':
+				{
+					/* Propagate NotifyResponse. */
+					pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+					break;
+				}
+			case 'T':
+				{
+					int16	natts = pq_getmsgint(&msg, 2);
+					int16	i;
+
+					if (state->has_row_description)
+						elog(ERROR, "multiple RowDescription messages");
+					state->has_row_description = true;
+					if (natts != tupdesc->natts)
+						ereport(ERROR,
+								(errcode(ERRCODE_DATATYPE_MISMATCH),
+								 errmsg("remote query result rowtype does not match "
+									"the specified FROM clause rowtype")));
+
+					for (i = 0; i < natts; ++i)
+					{
+						Oid		type_id;
+
+						(void) pq_getmsgstring(&msg);	/* name */
+						(void) pq_getmsgint(&msg, 4);	/* table OID */
+						(void) pq_getmsgint(&msg, 2);	/* table attnum */
+						type_id = pq_getmsgint(&msg, 4);	/* type OID */
+						(void) pq_getmsgint(&msg, 2);	/* type length */
+						(void) pq_getmsgint(&msg, 4);	/* typmod */
+						(void) pq_getmsgint(&msg, 2);	/* format code */
+
+						if (type_id != tupdesc->attrs[i]->atttypid)
+							ereport(ERROR,
+									(errcode(ERRCODE_DATATYPE_MISMATCH),
+									 errmsg("remote query result rowtype does not match "
+										"the specified FROM clause rowtype")));
+					}
+
+					pq_getmsgend(&msg);
+
+					break;
+				}
+			case 'D':
+				{
+					/* Handle DataRow message. */
+					HeapTuple	result;
+
+					result = form_result_tuple(state, tupdesc, &msg);
+					SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+				}
+			case 'C':
+				{
+					/* Handle CommandComplete message. */
+					MemoryContext	oldcontext;
+					const char  *tag = pq_getmsgstring(&msg);
+
+					oldcontext =
+						MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+					state->command_tags = lappend(state->command_tags,
+												  pstrdup(tag));
+					MemoryContextSwitchTo(oldcontext);
+					break;
+				}
+			case 'G':
+			case 'H':
+			case 'W':
+				{
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("COPY protocol not allowed in pg_background")));
+				}
+
+			case 'Z':
+				{
+					/* Handle ReadyForQuery message. */
+					state->complete = true;
+					break;
+				}
+			default:
+				elog(WARNING, "unknown message type: %c (%zu bytes)",
+					 msg.data[0], nbytes);
+				break;
+		}
+	}
+
+	/* Check whether the connection was broken prematurely. */
+	if (!state->complete)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONNECTION_FAILURE),
+				 errmsg("lost connection to worker process with PID %d",
+					pid)));
+
+	/* If no data rows, return the command tags instead. */
+	if (!state->has_row_description)
+	{
+		if (tupdesc->natts != 1 || tupdesc->attrs[0]->atttypid != TEXTOID)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query did not return a result set, but "
+							"result rowtype is not a single text column")));
+		if (state->command_tags != NIL)
+		{
+			char *tag = linitial(state->command_tags);
+			Datum	value;
+			bool	isnull;
+			HeapTuple	result;
+
+			state->command_tags = list_delete_first(state->command_tags);
+			value = PointerGetDatum(cstring_to_text(tag));
+			isnull = false;
+			result = heap_form_tuple(tupdesc, &value, &isnull);
+			SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(result));
+		}
+	}
+
+	/* We're done! */
+	dsm_detach(state->info->seg);
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Parse a DataRow message and form a result tuple.
+ */
+static HeapTuple
+form_result_tuple(pg_background_result_state *state, TupleDesc tupdesc,
+				  StringInfo msg)
+{
+	/* Handle DataRow message. */
+	int16	natts = pq_getmsgint(msg, 2);
+	int16	i;
+	Datum  *values = NULL;
+	bool   *isnull = NULL;
+	StringInfoData	buf;
+
+	if (!state->has_row_description)
+		elog(ERROR, "DataRow not preceded by RowDescription");
+	if (natts != tupdesc->natts)
+		elog(ERROR, "malformed DataRow");
+	if (natts > 0)
+	{
+		values = palloc(natts * sizeof(Datum));
+		isnull = palloc(natts * sizeof(bool));
+	}
+	initStringInfo(&buf);
+
+	for (i = 0; i < natts; ++i)
+	{
+		int32	bytes = pq_getmsgint(msg, 4);
+
+		if (bytes < 0)
+		{
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											NULL,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = true;
+		}
+		else
+		{
+			resetStringInfo(&buf);
+			appendBinaryStringInfo(&buf, pq_getmsgbytes(msg, bytes), bytes);
+			values[i] = ReceiveFunctionCall(&state->receive_functions[i],
+											&buf,
+											state->typioparams[i],
+											tupdesc->attrs[i]->atttypmod);
+			isnull[i] = false;
+		}
+	}
+
+	pq_getmsgend(msg);
+
+	return heap_form_tuple(tupdesc, values, isnull);
+}
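form_result_tuple() above parses a standard frontend/backend protocol DataRow body: a big-endian int16 column count, then per column an int32 byte length (-1 meaning NULL, with no payload bytes) followed by that many data bytes. The wire layout is the real v3 protocol; the standalone parser below is only an illustration of it:

```python
import struct

def parse_datarow(body):
    """Parse a protocol-v3 DataRow message body: int16 column count, then
    per column an int32 byte length (-1 => NULL) and that many data bytes."""
    (natts,) = struct.unpack_from("!h", body)
    pos = 2
    values = []
    for _ in range(natts):
        (nbytes,) = struct.unpack_from("!i", body, pos)
        pos += 4
        if nbytes < 0:
            values.append(None)              # NULL column, no payload bytes
        else:
            values.append(body[pos:pos + nbytes])
            pos += nbytes
    if pos != len(body):                     # cf. pq_getmsgend()
        raise ValueError("trailing junk in DataRow")
    return values

# Two columns: the text "VACUUM" and a NULL.
body = (struct.pack("!h", 2) +
        struct.pack("!i", 6) + b"VACUUM" +
        struct.pack("!i", -1))
```

In the C code the per-column bytes are then handed to the type's binary receive function; here they are simply returned raw.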
+
+/*
+ * Detach from the dynamic shared memory segment used for communication with
+ * a background worker.  This prevents the worker from stalling waiting for
+ * us to read its results.
+ */
+Datum
+pg_background_detach(PG_FUNCTION_ARGS)
+{
+	int32		pid = PG_GETARG_INT32(0);
+	pg_background_worker_info *info;
+
+	info = find_worker_info(pid);
+	if (info == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("PID %d is not attached to this session", pid)));
+	check_rights(info);
+	dsm_detach(info->seg);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * When the dynamic shared memory segment associated with a worker is
+ * cleaned up, we need to clean up our associated private data structures.
+ */
+static void
+cleanup_worker_info(dsm_segment *seg, Datum pid_datum)
+{
+	pid_t	pid = DatumGetInt32(pid_datum);
+	bool	found;
+	pg_background_worker_info *info;
+
+	/* Find any worker info entry for this PID.  If none, we're done. */
+	if ((info = find_worker_info(pid)) == NULL)
+		return;
+
+	/* Free memory used by the BackgroundWorkerHandle. */
+	if (info->handle != NULL)
+	{
+		pfree(info->handle);
+		info->handle = NULL;
+	}
+
+	/* Remove the hashtable entry. */
+	hash_search(worker_hash, (void *) &pid, HASH_REMOVE, &found);
+	if (!found)
+		elog(ERROR, "pg_background worker_hash table corrupted");
+}
+
+/*
+ * Find the background worker information for the worker with a given PID.
+ */
+static pg_background_worker_info *
+find_worker_info(pid_t pid)
+{
+	pg_background_worker_info *info = NULL;
+
+	if (worker_hash != NULL)
+		info = hash_search(worker_hash, (void *) &pid, HASH_FIND, NULL);
+
+	return info;
+}
+
+/*
+ * Check whether the current user has rights to manipulate the background
+ * worker with the given PID.
+ */
+static void
+check_rights(pg_background_worker_info *info)
+{
+	Oid	current_user_id;
+	int	sec_context;
+
+	GetUserIdAndSecContext(&current_user_id, &sec_context);
+	if (!has_privs_of_role(current_user_id, info->current_user_id))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+			 errmsg("permission denied for background worker with PID %d",
+						info->pid)));
+}
+
+/*
+ * Save worker information for future IPC.
+ */
+static void
+save_worker_info(pid_t pid, dsm_segment *seg, BackgroundWorkerHandle *handle,
+				 shm_mq_handle *responseq)
+{
+	pg_background_worker_info *info;
+	Oid current_user_id;
+	int	sec_context;
+
+	/* If the hash table hasn't been set up yet, do that now. */
+	if (worker_hash == NULL)
+	{
+		HASHCTL	ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(pid_t);
+		ctl.entrysize = sizeof(pg_background_worker_info);
+		worker_hash = hash_create("pg_background worker_hash", 8, &ctl,
+								  HASH_ELEM);
+	}
+
+	/* Get current authentication information. */
+	GetUserIdAndSecContext(&current_user_id, &sec_context);
+
+	/*
+	 * In the unlikely event that there's an older worker with this PID,
+	 * just detach it - unless it has a different user ID than the
+	 * currently-active one, in which case someone might be trying to pull
+	 * a fast one.  Let's kill the backend to make sure we don't break
+	 * anyone's expectations.
+	 */
+	if ((info = find_worker_info(pid)) != NULL)
+	{
+		if (current_user_id != info->current_user_id)
+			ereport(FATAL,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+			 errmsg("background worker with PID %d already exists",
+						pid)));
+		dsm_detach(info->seg);
+	}
+
+	/* When the DSM is unmapped, clean everything up. */
+	on_dsm_detach(seg, cleanup_worker_info, Int32GetDatum(pid));
+
+	/* Create a new entry for this worker. */
+	info = hash_search(worker_hash, (void *) &pid, HASH_ENTER, NULL);
+	info->seg = seg;
+	info->handle = handle;
+	info->responseq = responseq;
+	info->consumed = false;
+}
+
+/*
+ * Indicate that an error came from a particular background worker.
+ */
+static void
+pg_background_error_callback(void *arg)
+{
+	pid_t	pid = * (pid_t *) arg;
+
+	errcontext("background worker, pid %d", pid);
+}
+
+/*
+ * Background worker entrypoint.
+ */
+void
+pg_background_worker_main(Datum main_arg)
+{
+	dsm_segment *seg;
+	shm_toc    *toc;
+	pg_background_fixed_data *fdata;
+	char	   *sql;
+	char	   *gucstate;
+	shm_mq	   *mq;
+	shm_mq_handle *responseq;
+
+	/* Establish signal handlers. */
+	pqsignal(SIGTERM, handle_sigterm);
+	BackgroundWorkerUnblockSignals();
+
+	/* Set up a memory context and resource owner. */
+	Assert(CurrentResourceOwner == NULL);
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "pg_background");
+	CurrentMemoryContext = AllocSetContextCreate(TopMemoryContext,
+												 "pg_background session",
+												 ALLOCSET_DEFAULT_MINSIZE,
+												 ALLOCSET_DEFAULT_INITSIZE,
+												 ALLOCSET_DEFAULT_MAXSIZE);
+
+	/* Connect to the dynamic shared memory segment. */
+	seg = dsm_attach(DatumGetInt32(main_arg));
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("unable to map dynamic shared memory segment")));
+	toc = shm_toc_attach(PG_BACKGROUND_MAGIC, dsm_segment_address(seg));
+	if (toc == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			   errmsg("bad magic number in dynamic shared memory segment")));
+
+	/* Find data structures in dynamic shared memory. */
+	fdata = shm_toc_lookup(toc, PG_BACKGROUND_KEY_FIXED_DATA);
+	sql = shm_toc_lookup(toc, PG_BACKGROUND_KEY_SQL);
+	gucstate = shm_toc_lookup(toc, PG_BACKGROUND_KEY_GUC);
+	mq = shm_toc_lookup(toc, PG_BACKGROUND_KEY_QUEUE);
+	shm_mq_set_sender(mq, MyProc);
+	responseq = shm_mq_attach(mq, seg, NULL);
+
+	/* Redirect protocol messages to responseq. */
+	pq_redirect_to_shm_mq(mq, responseq);
+
+	/*
+	 * Initialize our user and database ID based on the string versions of
+	 * the data, and then go back and check that we actually got the database
+	 * and user ID that we intended to get.  We do this because it's not
+	 * impossible for the process that started us to die before we get here,
+	 * and the user or database could be renamed in the meantime.  We don't
+	 * want to latch onto the wrong object by accident.  There should probably
+	 * be a variant of BackgroundWorkerInitializeConnection that accepts OIDs
+	 * rather than strings.
+	 */
+	BackgroundWorkerInitializeConnection(NameStr(fdata->database),
+										 NameStr(fdata->authenticated_user));
+	if (fdata->database_id != MyDatabaseId ||
+		fdata->authenticated_user_id != GetAuthenticatedUserId())
+		ereport(ERROR,
+			(errmsg("user or database renamed during pg_background startup")));
+
+	/* Restore GUC values from launching backend. */
+	StartTransactionCommand();
+	RestoreGUCState(gucstate);
+	CommitTransactionCommand();
+
+	/* Handle local_preload_libraries and session_preload_libraries. */
+	process_session_preload_libraries();
+
+	/* Restore user ID and security context. */
+	SetUserIdAndSecContext(fdata->current_user_id, fdata->sec_context);
+
+	/* Prepare to execute the query. */
+	SetCurrentStatementStartTimestamp();
+	debug_query_string = sql;
+	pgstat_report_activity(STATE_RUNNING, sql);
+
+	/* Execute the query. */
+	execute_sql_string(sql);
+
+	/* Post-execution cleanup. */
+	ProcessCompletedNotifies();
+	pgstat_report_activity(STATE_IDLE, sql);
+	pgstat_report_stat(true);
+
+	/* Signal that we are done. */
+	ReadyForQuery(DestRemote);
+}
+
+/*
+ * Execute given SQL string.
+ *
+ * Using SPI here would preclude backgrounding commands like VACUUM which one
+ * might very well wish to launch in the background.  So we do this instead.
+ */
+static void
+execute_sql_string(const char *sql)
+{
+	List	   *raw_parsetree_list;
+	ListCell   *lc1;
+	bool		isTopLevel;
+	int			commands_remaining;
+	MemoryContext	parsecontext;
+	MemoryContext	oldcontext;
+
+	/* Start up a transaction command. */
+	start_xact_command();
+
+	/*
+	 * Parse the SQL string into a list of raw parse trees.
+	 *
+	 * Because we allow statements that perform internal transaction control,
+	 * we can't do this in TopTransactionContext; the parse trees might get
+	 * blown away before we're done executing them.
+	 */
+	parsecontext = AllocSetContextCreate(TopMemoryContext,
+										 "pg_background parse/plan",
+										 ALLOCSET_DEFAULT_MINSIZE,
+										 ALLOCSET_DEFAULT_INITSIZE,
+										 ALLOCSET_DEFAULT_MAXSIZE);
+	oldcontext = MemoryContextSwitchTo(parsecontext);
+	raw_parsetree_list = pg_parse_query(sql);
+	commands_remaining = list_length(raw_parsetree_list);
+	isTopLevel = commands_remaining == 1;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * Do parse analysis, rule rewrite, planning, and execution for each raw
+	 * parsetree.  We must fully execute each query before beginning parse
+	 * analysis on the next one, since there may be interdependencies.
+	 */
+	foreach(lc1, raw_parsetree_list)
+	{
+		Node	   *parsetree = (Node *) lfirst(lc1);
+		const char *commandTag;
+		char        completionTag[COMPLETION_TAG_BUFSIZE];
+		List       *querytree_list,
+				   *plantree_list;
+		bool		snapshot_set = false;
+		Portal		portal;
+		DestReceiver *receiver;
+		int16		format = 1;
+
+		/*
+		 * We don't allow transaction-control commands like COMMIT and ABORT
+		 * here.  The entire SQL statement is executed as a single transaction
+		 * which commits if no errors are encountered.
+		 */
+		if (IsA(parsetree, TransactionStmt))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("transaction control statements are not allowed in pg_background")));
+
+		/*
+		 * Get the command name for use in status display (it also becomes the
+		 * default completion tag, down inside PortalRun).  Set ps_status and
+		 * do any special start-of-SQL-command processing needed by the
+		 * destination.
+		 */
+		commandTag = CreateCommandTag(parsetree);
+		set_ps_display(commandTag, false);
+		BeginCommand(commandTag, DestNone);
+
+		/* Set up a snapshot if parse analysis/planning will need one. */
+		if (analyze_requires_snapshot(parsetree))
+		{
+			PushActiveSnapshot(GetTransactionSnapshot());
+			snapshot_set = true;
+		}
+
+		/*
+		 * OK to analyze, rewrite, and plan this query.
+		 *
+		 * As with parsing, we need to make sure this data outlives the
+		 * transaction, because of the possibility that the statement might
+		 * perform internal transaction control.
+		 */
+		oldcontext = MemoryContextSwitchTo(parsecontext);
+		querytree_list = pg_analyze_and_rewrite(parsetree, sql, NULL, 0);
+		plantree_list = pg_plan_queries(querytree_list, 0, NULL);
+
+		/* Done with the snapshot used for parsing/planning */
+		if (snapshot_set)
+			PopActiveSnapshot();
+
+		/* If we got a cancel signal in analysis or planning, quit */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Execute the query using the unnamed portal.
+		 */
+		portal = CreatePortal("", true, true);
+		/* Don't display the portal in pg_cursors */
+		portal->visible = false;
+		PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
+		PortalStart(portal, NULL, 0, InvalidSnapshot);
+
+		/* We always use binary format, for efficiency. */
+		PortalSetResultFormat(portal, 1, &format);
+
+		/*
+		 * Tuples returned by any command other than the last are simply
+		 * discarded; but those returned by the last (or only) command are
+		 * redirected to the shared memory queue we're using for communication
+		 * with the launching backend. If the launching backend is gone or has
+		 * detached us, these messages will just get dropped on the floor.
+		 */
+		--commands_remaining;
+		if (commands_remaining > 0)
+			receiver = CreateDestReceiver(DestNone);
+		else
+		{
+			receiver = CreateDestReceiver(DestRemote);
+			SetRemoteDestReceiverParams(receiver, portal);
+		}
+
+		/*
+		 * Only once the portal and destreceiver have been established can
+		 * we return to the transaction context.  All that stuff needs to
+		 * survive an internal commit inside PortalRun!
+		 */
+		MemoryContextSwitchTo(oldcontext);
+
+		/* Here's where we actually execute the command. */
+		(void) PortalRun(portal, FETCH_ALL, isTopLevel, receiver, receiver,
+						 completionTag);
+
+		/* Clean up the receiver. */
+		(*receiver->rDestroy) (receiver);
+
+		/* Clean up the portal. */
+		PortalDrop(portal, false);
+
+		/*
+		 * If this is the last parsetree, close down transaction statement
+		 * before reporting CommandComplete.  Otherwise, we need a
+		 * CommandCounterIncrement.
+		 */
+		if (lnext(lc1) == NULL)
+			finish_xact_command();
+		else
+			CommandCounterIncrement();
+
+		/*
+		 * Send a CommandComplete message even if we suppressed the query
+		 * results.  The user backend will report the command tags in the
+		 * absence of any true query results.
+		 */
+		EndCommand(completionTag, DestRemote);
+	}
+
+	/* Make sure there's not still a transaction open. */
+	finish_xact_command();
+}
+
+/*
+ * When we receive a SIGTERM, we set InterruptPending and ProcDiePending just
+ * like a normal backend.  The next CHECK_FOR_INTERRUPTS() will do the right
+ * thing.
+ */
+static void
+handle_sigterm(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	if (!proc_exit_inprogress)
+	{
+		InterruptPending = true;
+		ProcDiePending = true;
+	}
+
+	errno = save_errno;
+}
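A note on handle_sigterm() above: it is deliberately minimal, because a signal handler may only set flags and poke a latch, deferring the real work to the next CHECK_FOR_INTERRUPTS().  A standalone sketch of the flag-setting half of that pattern (names are illustrative, not PostgreSQL's):

```c
#include <assert.h>
#include <signal.h>

/*
 * Sketch of the async-signal-safe SIGTERM pattern: the handler only writes
 * sig_atomic_t flags; the main loop is expected to poll them later, the way
 * CHECK_FOR_INTERRUPTS() does in a real backend.
 */
static volatile sig_atomic_t interrupt_pending = 0;
static volatile sig_atomic_t die_pending = 0;

static void
sketch_handle_sigterm(int signo)
{
	(void) signo;
	interrupt_pending = 1;
	die_pending = 1;
}
```

The real handler additionally saves and restores errno and sets the process latch so a blocked backend wakes up promptly.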
diff --git a/contrib/pg_background/pg_background.control b/contrib/pg_background/pg_background.control
new file mode 100644
index 0000000..733d0e1
--- /dev/null
+++ b/contrib/pg_background/pg_background.control
@@ -0,0 +1,4 @@
+comment = 'Run SQL queries in the background'
+default_version = '1.0'
+module_pathname = '$libdir/pg_background'
+relocatable = true
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 61f17bf..e44f5fa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -190,8 +190,6 @@ static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
 static int	errdetail_recovery_conflict(void);
-static void start_xact_command(void);
-static void finish_xact_command(void);
 static bool IsTransactionExitStmt(Node *parsetree);
 static bool IsTransactionExitStmtList(List *parseTrees);
 static bool IsTransactionStmtList(List *parseTrees);
@@ -2373,7 +2371,7 @@ exec_describe_portal_message(const char *portal_name)
 /*
  * Convenience routines for starting/committing a single command.
  */
-static void
+void
 start_xact_command(void)
 {
 	if (!xact_started)
@@ -2393,7 +2391,7 @@ start_xact_command(void)
 	}
 }
 
-static void
+void
 finish_xact_command(void)
 {
 	if (xact_started)
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index c23f4da..8e84ee7 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -224,7 +224,7 @@ extern PGPROC *PreparedXactProcs;
 
 /* configurable options */
 extern int	DeadlockTimeout;
-extern int	StatementTimeout;
+extern PGDLLIMPORT int	StatementTimeout;
 extern int	LockTimeout;
 extern bool log_lock_waits;
 
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 60f7532..fd3df58 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -69,6 +69,8 @@ extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from S
 																 * handler */
 extern void prepare_for_client_read(void);
 extern void client_read_ended(void);
+extern void start_xact_command(void);
+extern void finish_xact_command(void);
 extern void process_postgres_switches(int argc, char *argv[],
 						  GucContext ctx, const char **dbname);
 extern void PostgresMain(int argc, char *argv[],
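A note on the postgres.c/tcopprot.h change above: exporting start_xact_command() and finish_xact_command() is safe because each is a no-op when called redundantly, guarded by the xact_started flag; that is why execute_sql_string() can call finish_xact_command() unconditionally at the end.  A standalone sketch of that idempotent pairing (the counters are illustrative stand-ins for StartTransactionCommand()/CommitTransactionCommand(), not PostgreSQL code):

```c
#include <assert.h>
#include <stdbool.h>

/* One flag guards both helpers, so redundant calls are harmless no-ops. */
static bool xact_started = false;
static int	xacts_begun = 0;
static int	xacts_committed = 0;

static void
start_xact_command(void)
{
	if (!xact_started)
	{
		xacts_begun++;			/* stands in for StartTransactionCommand() */
		xact_started = true;
	}
}

static void
finish_xact_command(void)
{
	if (xact_started)
	{
		xacts_committed++;		/* stands in for CommitTransactionCommand() */
		xact_started = false;
	}
}
```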
#81Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#80)
Re: pg_background (and more parallelism infrastructure patches)

On Wed, Nov 12, 2014 at 11:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Nov 12, 2014 at 11:36 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Nov 12, 2014 at 11:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:

The question is whether the library is actually loaded in that case?
Because that normally only happens early during startup - which is why
it's a PGC_BACKEND guc.

It looks like that does not work.

[rhaas pgsql]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql
psql (9.5devel)
Type "help" for help.

rhaas=# select * from pg_background_result(pg_background_launch('show
auto_explain.log_min_duration')) as (x text);
ERROR: unrecognized configuration parameter "auto_explain.log_min_duration"

CONTEXT: background worker, pid 31316

So, there's more to be done here. Rats.

It turned out to be quite simple to fix both problems.

Updated patches attached.

Few compilation errors in the patch:
1>contrib\postgres_fdw\postgres_fdw.c(2107): error C2198:
'set_config_option' : too few arguments for call
1>contrib\postgres_fdw\postgres_fdw.c(2111): error C2198:
'set_config_option' : too few arguments for call
1>contrib\postgres_fdw\postgres_fdw.c(2115): error C2198:
'set_config_option' : too few arguments for call
2>contrib\dblink\dblink.c(2983): error C2198: 'set_config_option' : too few
arguments for call

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#82Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#81)
1 attachment(s)
Re: pg_background (and more parallelism infrastructure patches)

On Thu, Nov 20, 2014 at 7:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Few compilation errors in the patch:
1>contrib\postgres_fdw\postgres_fdw.c(2107): error C2198:
'set_config_option' : too few arguments for call
1>contrib\postgres_fdw\postgres_fdw.c(2111): error C2198:
'set_config_option' : too few arguments for call
1>contrib\postgres_fdw\postgres_fdw.c(2115): error C2198:
'set_config_option' : too few arguments for call
2>contrib\dblink\dblink.c(2983): error C2198: 'set_config_option' : too few
arguments for call

Oops. Good catch. Fixed in the attached version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0004-Add-infrastructure-to-save-and-restore-GUC-values-v3.patch (text/x-patch)
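In the patch below, do_serialize() wraps vsnprintf() in a bounded-buffer discipline: format into the space that remains, fail rather than truncate, then advance past the terminating NUL so each field becomes a zero-separated record.  A standalone sketch of that pattern (hypothetical names and a simplified overflow check, not the patch's code):

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format into the remaining space; return 0 instead of truncating on
 * overflow (the patch would elog(ERROR) here).  On success, advance the
 * destination past the terminating NUL and shrink the remaining byte count,
 * so consecutive calls produce zero-separated records.
 */
static int
serialize_str(char **destptr, size_t *maxbytes, const char *fmt, ...)
{
	va_list		vargs;
	int			n;

	va_start(vargs, fmt);
	n = vsnprintf(*destptr, *maxbytes, fmt, vargs);
	va_end(vargs);

	if (n < 0 || (size_t) n >= *maxbytes)
		return 0;				/* would not fit (or vsnprintf failed) */

	*destptr += n + 1;			/* keep the NUL as a record separator */
	*maxbytes -= n + 1;
	return 1;
}
```

The restore side then walks the buffer string by string, which is exactly what the patch's read_gucstate() does.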
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index d77b3ee..18ae318 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2980,7 +2980,7 @@ applyRemoteGucs(PGconn *conn)
 		/* Apply the option (this will throw error on failure) */
 		(void) set_config_option(gucName, remoteVal,
 								 PGC_USERSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 	}
 
 	return nestlevel;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 905a821..76bda73 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2104,15 +2104,15 @@ set_transmission_modes(void)
 	if (DateStyle != USE_ISO_DATES)
 		(void) set_config_option("datestyle", "ISO",
 								 PGC_USERSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 	if (IntervalStyle != INTSTYLE_POSTGRES)
 		(void) set_config_option("intervalstyle", "postgres",
 								 PGC_USERSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 	if (extra_float_digits < 3)
 		(void) set_config_option("extra_float_digits", "3",
 								 PGC_USERSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 
 	return nestlevel;
 }
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 9a0afa4..6692bb5 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -814,11 +814,11 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 	if (client_min_messages < WARNING)
 		(void) set_config_option("client_min_messages", "warning",
 								 PGC_USERSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 	if (log_min_messages < WARNING)
 		(void) set_config_option("log_min_messages", "warning",
 								 PGC_SUSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 
 	/*
 	 * Set up the search path to contain the target schema, then the schemas
@@ -843,7 +843,7 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 
 	(void) set_config_option("search_path", pathbuf.data,
 							 PGC_USERSET, PGC_S_SESSION,
-							 GUC_ACTION_SAVE, true, 0);
+							 GUC_ACTION_SAVE, true, 0, false);
 
 	/*
 	 * Set creating_extension and related variables so that
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index c0156fa..2f02303 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2422,7 +2422,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	snprintf(workmembuf, sizeof(workmembuf), "%d", maintenance_work_mem);
 	(void) set_config_option("work_mem", workmembuf,
 							 PGC_USERSET, PGC_S_SESSION,
-							 GUC_ACTION_SAVE, true, 0);
+							 GUC_ACTION_SAVE, true, 0, false);
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index bb9207a..6dca5df 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -318,7 +318,7 @@ ProcessConfigFile(GucContext context)
 		/* Now we can re-apply the wired-in default (i.e., the boot_val) */
 		if (set_config_option(gconf->name, NULL,
 							  context, PGC_S_DEFAULT,
-							  GUC_ACTION_SET, true, 0) > 0)
+							  GUC_ACTION_SET, true, 0, false) > 0)
 		{
 			/* Log the change if appropriate */
 			if (context == PGC_SIGHUP)
@@ -373,7 +373,7 @@ ProcessConfigFile(GucContext context)
 
 		scres = set_config_option(item->name, item->value,
 								  context, PGC_S_FILE,
-								  GUC_ACTION_SET, true, 0);
+								  GUC_ACTION_SET, true, 0, false);
 		if (scres > 0)
 		{
 			/* variable was updated, so log the change if appropriate */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 23cbe90..f04757c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -110,6 +110,12 @@
 #define S_PER_D (60 * 60 * 24)
 #define MS_PER_D (1000 * 60 * 60 * 24)
 
+/*
+ * Precision with which REAL type guc values are to be printed for GUC
+ * serialization.
+ */
+#define REALTYPE_PRECISION 17
+
 /* XXX these should appear in other modules' header files */
 extern bool Log_disconnections;
 extern int	CommitDelay;
@@ -136,6 +142,10 @@ char	   *GUC_check_errmsg_string;
 char	   *GUC_check_errdetail_string;
 char	   *GUC_check_errhint_string;
 
+static void
+do_serialize(char **destptr, Size *maxbytes, const char *fmt,...)
+/* This lets gcc check the format string for consistency. */
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 4)));
 
 static void set_config_sourcefile(const char *name, char *sourcefile,
 					  int sourceline);
@@ -5612,7 +5622,8 @@ validate_conf_option(struct config_generic * record, const char *name,
 int
 set_config_option(const char *name, const char *value,
 				  GucContext context, GucSource source,
-				  GucAction action, bool changeVal, int elevel)
+				  GucAction action, bool changeVal, int elevel,
+				  bool is_reload)
 {
 	struct config_generic *record;
 	bool		prohibitValueChange = false;
@@ -5726,18 +5737,13 @@ set_config_option(const char *name, const char *value,
 				 * nondefault settings from the CONFIG_EXEC_PARAMS file during
 				 * backend start.  In that case we must accept PGC_SIGHUP
 				 * settings, so as to have the same value as if we'd forked
-				 * from the postmaster.  We detect this situation by checking
-				 * IsInitProcessingMode, which is a bit ugly, but it doesn't
-				 * seem worth passing down an explicit flag saying we're doing
-				 * read_nondefault_variables().
+				 * from the postmaster.  This can also happen when using
+				 * RestoreGUCState() within a background worker that needs to
+				 * have the same settings as the user backend that started it.
+				 * is_reload will be true when either situation applies.
 				 */
-#ifdef EXEC_BACKEND
-				if (IsUnderPostmaster && !IsInitProcessingMode())
-					return -1;
-#else
-				if (IsUnderPostmaster)
+				if (IsUnderPostmaster && !is_reload)
 					return -1;
-#endif
 			}
 			else if (context != PGC_POSTMASTER &&
 					 context != PGC_BACKEND &&
@@ -6343,7 +6349,7 @@ SetConfigOption(const char *name, const char *value,
 				GucContext context, GucSource source)
 {
 	(void) set_config_option(name, value, context, source,
-							 GUC_ACTION_SET, true, 0);
+							 GUC_ACTION_SET, true, 0, false);
 }
 
 
@@ -6923,9 +6929,7 @@ ExecSetVariableStmt(VariableSetStmt *stmt, bool isTopLevel)
 									 ExtractSetVariableArgs(stmt),
 									 (superuser() ? PGC_SUSET : PGC_USERSET),
 									 PGC_S_SESSION,
-									 action,
-									 true,
-									 0);
+									 action, true, 0, false);
 			break;
 		case VAR_SET_MULTI:
 
@@ -7012,9 +7016,7 @@ ExecSetVariableStmt(VariableSetStmt *stmt, bool isTopLevel)
 									 NULL,
 									 (superuser() ? PGC_SUSET : PGC_USERSET),
 									 PGC_S_SESSION,
-									 action,
-									 true,
-									 0);
+									 action, true, 0, false);
 			break;
 		case VAR_RESET_ALL:
 			ResetAllOptions();
@@ -7059,8 +7061,7 @@ SetPGVariable(const char *name, List *args, bool is_local)
 							 (superuser() ? PGC_SUSET : PGC_USERSET),
 							 PGC_S_SESSION,
 							 is_local ? GUC_ACTION_LOCAL : GUC_ACTION_SET,
-							 true,
-							 0);
+							 true, 0, false);
 }
 
 /*
@@ -7103,8 +7104,7 @@ set_config_by_name(PG_FUNCTION_ARGS)
 							 (superuser() ? PGC_SUSET : PGC_USERSET),
 							 PGC_S_SESSION,
 							 is_local ? GUC_ACTION_LOCAL : GUC_ACTION_SET,
-							 true,
-							 0);
+							 true, 0, false);
 
 	/* get the new current value */
 	new_value = GetConfigOptionByName(name, NULL);
@@ -7225,7 +7225,7 @@ define_custom_variable(struct config_generic * variable)
 		(void) set_config_option(name, pHolder->reset_val,
 								 pHolder->gen.reset_scontext,
 								 pHolder->gen.reset_source,
-								 GUC_ACTION_SET, true, WARNING);
+								 GUC_ACTION_SET, true, WARNING, false);
 	/* That should not have resulted in stacking anything */
 	Assert(variable->stack == NULL);
 
@@ -7281,30 +7281,35 @@ reapply_stacked_values(struct config_generic * variable,
 			case GUC_SAVE:
 				(void) set_config_option(name, curvalue,
 										 curscontext, cursource,
-										 GUC_ACTION_SAVE, true, WARNING);
+										 GUC_ACTION_SAVE, true,
+										 WARNING, false);
 				break;
 
 			case GUC_SET:
 				(void) set_config_option(name, curvalue,
 										 curscontext, cursource,
-										 GUC_ACTION_SET, true, WARNING);
+										 GUC_ACTION_SET, true,
+										 WARNING, false);
 				break;
 
 			case GUC_LOCAL:
 				(void) set_config_option(name, curvalue,
 										 curscontext, cursource,
-										 GUC_ACTION_LOCAL, true, WARNING);
+										 GUC_ACTION_LOCAL, true,
+										 WARNING, false);
 				break;
 
 			case GUC_SET_LOCAL:
 				/* first, apply the masked value as SET */
 				(void) set_config_option(name, stack->masked.val.stringval,
 									   stack->masked_scontext, PGC_S_SESSION,
-										 GUC_ACTION_SET, true, WARNING);
+										 GUC_ACTION_SET, true,
+										 WARNING, false);
 				/* then apply the current value as LOCAL */
 				(void) set_config_option(name, curvalue,
 										 curscontext, cursource,
-										 GUC_ACTION_LOCAL, true, WARNING);
+										 GUC_ACTION_LOCAL, true,
+										 WARNING, false);
 				break;
 		}
 
@@ -7328,7 +7333,7 @@ reapply_stacked_values(struct config_generic * variable,
 		{
 			(void) set_config_option(name, curvalue,
 									 curscontext, cursource,
-									 GUC_ACTION_SET, true, WARNING);
+									 GUC_ACTION_SET, true, WARNING, false);
 			variable->stack = NULL;
 		}
 	}
@@ -8452,7 +8457,7 @@ read_nondefault_variables(void)
 
 		(void) set_config_option(varname, varvalue,
 								 varscontext, varsource,
-								 GUC_ACTION_SET, true, 0);
+								 GUC_ACTION_SET, true, 0, true);
 		if (varsourcefile[0])
 			set_config_sourcefile(varname, varsourcefile, varsourceline);
 
@@ -8465,6 +8470,398 @@ read_nondefault_variables(void)
 }
 #endif   /* EXEC_BACKEND */
 
+/*
+ * can_skip_gucvar:
+ * When serializing, determine whether to skip this GUC.  When restoring, the
+ * negation of this test determines whether to restore the compiled-in default
+ * value before processing serialized values.
+ *
+ * A PGC_S_DEFAULT setting on the serialize side will typically match new
+ * postmaster children, but that can be false when got_SIGHUP == true and the
+ * pending configuration change modifies this setting.  Nonetheless, we omit
+ * PGC_S_DEFAULT settings from serialization and make up for that by restoring
+ * defaults before applying serialized values.
+ *
+ * PGC_POSTMASTER variables always have the same value in every child of a
+ * particular postmaster.  Most PGC_INTERNAL variables are compile-time
+ * constants; a few, like server_encoding and lc_ctype, are handled specially
+ * outside the serialize/restore procedure.  Therefore, SerializeGUCState()
+ * never sends these, and RestoreGUCState() never changes them.
+ */
+static bool
+can_skip_gucvar(struct config_generic * gconf)
+{
+	return gconf->context == PGC_POSTMASTER ||
+		gconf->context == PGC_INTERNAL || gconf->source == PGC_S_DEFAULT;
+}
+
+/*
+ * estimate_variable_size:
+ * Estimate max size for dumping the given GUC variable.
+ */
+static Size
+estimate_variable_size(struct config_generic * gconf)
+{
+	Size		size;
+	Size		valsize;
+
+	if (can_skip_gucvar(gconf))
+		return 0;
+
+	size = 0;
+
+	size = add_size(size, strlen(gconf->name) + 1);
+
+	/* Get the maximum display length of the GUC value. */
+	switch (gconf->vartype)
+	{
+		case PGC_BOOL:
+			{
+				valsize = 5;	/* max(strlen('true'), strlen('false')) */
+			}
+			break;
+
+		case PGC_INT:
+			{
+				struct config_int *conf = (struct config_int *) gconf;
+
+				/*
+				 * Instead of getting the exact display length, use max
+				 * length.  Also reduce the max length for typical ranges of
+				 * small values.  Maximum value is 2147483647, i.e. 10 chars.
+				 * Include one byte for sign.
+				 */
+				if (abs(*conf->variable) < 1000)
+					valsize = 3 + 1;
+				else
+					valsize = 10 + 1;
+			}
+			break;
+
+		case PGC_REAL:
+			{
+				/*
+				 * We are going to print it with %.17g. Account for sign,
+				 * decimal point, and e+nnn notation. E.g.
+				 * -3.9932904234000002e+110
+				 */
+				valsize = REALTYPE_PRECISION + 1 + 1 + 5;
+			}
+			break;
+
+		case PGC_STRING:
+			{
+				struct config_string *conf = (struct config_string *) gconf;
+
+				valsize = strlen(*conf->variable);
+			}
+			break;
+
+		case PGC_ENUM:
+			{
+				struct config_enum *conf = (struct config_enum *) gconf;
+
+				valsize = strlen(config_enum_lookup_by_value(conf, *conf->variable));
+			}
+			break;
+	}
+
+	/* Allow space for terminating zero-byte */
+	size = add_size(size, valsize + 1);
+
+	if (gconf->sourcefile)
+		size = add_size(size, strlen(gconf->sourcefile));
+
+	/* Allow space for terminating zero-byte */
+	size = add_size(size, 1);
+
+	/* Include line whenever we include file. */
+	if (gconf->sourcefile)
+		size = add_size(size, sizeof(gconf->sourceline));
+
+	size = add_size(size, sizeof(gconf->source));
+	size = add_size(size, sizeof(gconf->scontext));
+
+	return size;
+}
+
+/*
+ * EstimateGUCStateSpace:
+ * Returns the size needed to store the GUC state for the current process
+ */
+Size
+EstimateGUCStateSpace(void)
+{
+	Size		size;
+	int			i;
+
+	/* Add space required for saving the data size of the GUC state */
+	size = sizeof(Size);
+
+	/* Add up the space needed for each GUC variable */
+	for (i = 0; i < num_guc_variables; i++)
+		size = add_size(size,
+						estimate_variable_size(guc_variables[i]));
+
+	return size;
+}
+
+/*
+ * do_serialize:
+ * Copies the formatted string into the destination.  Moves ahead the
+ * destination pointer, and decrements the maxbytes by that many bytes. If
+ * maxbytes is not sufficient to copy the string, error out.
+ */
+static void
+do_serialize(char **destptr, Size *maxbytes, const char *fmt,...)
+{
+	va_list		vargs;
+	int			n;
+
+	if (*maxbytes <= 0)
+		elog(ERROR, "not enough space to serialize GUC state");
+
+	va_start(vargs, fmt);
+	n = vsnprintf(*destptr, *maxbytes, fmt, vargs);
+	va_end(vargs);
+
+	/*
+	 * Cater to portability hazards in the vsnprintf() return value just like
+	 * appendPQExpBufferVA() does.  Note that this requires an extra byte of
+	 * slack at the end of the buffer.  Since serialize_variable() ends with a
+	 * do_serialize_binary() rather than a do_serialize(), we'll always have
+	 * that slack; estimate_variable_size() need not add a byte for it.
+	 */
+	if (n < 0 || n >= *maxbytes - 1)
+	{
+		if (n < 0 && errno != 0 && errno != ENOMEM)
+			/* Shouldn't happen. Better show errno description. */
+			elog(ERROR, "vsnprintf failed: %m");
+		else
+			elog(ERROR, "not enough space to serialize GUC state");
+	}
+
+	/* Shift the destptr ahead of the null terminator */
+	*destptr += n + 1;
+	*maxbytes -= n + 1;
+}
+
+/* Binary copy version of do_serialize() */
+static void
+do_serialize_binary(char **destptr, Size *maxbytes, void *val, Size valsize)
+{
+	if (valsize > *maxbytes)
+		elog(ERROR, "not enough space to serialize GUC state");
+
+	memcpy(*destptr, val, valsize);
+	*destptr += valsize;
+	*maxbytes -= valsize;
+}
+
+/*
+ * serialize_variable:
+ * Dumps name, value and other information of a GUC variable into destptr.
+ */
+static void
+serialize_variable(char **destptr, Size *maxbytes,
+				   struct config_generic * gconf)
+{
+	if (can_skip_gucvar(gconf))
+		return;
+
+	do_serialize(destptr, maxbytes, "%s", gconf->name);
+
+	switch (gconf->vartype)
+	{
+		case PGC_BOOL:
+			{
+				struct config_bool *conf = (struct config_bool *) gconf;
+
+				do_serialize(destptr, maxbytes,
+							 (*conf->variable ? "true" : "false"));
+			}
+			break;
+
+		case PGC_INT:
+			{
+				struct config_int *conf = (struct config_int *) gconf;
+
+				do_serialize(destptr, maxbytes, "%d", *conf->variable);
+			}
+			break;
+
+		case PGC_REAL:
+			{
+				struct config_real *conf = (struct config_real *) gconf;
+
+				do_serialize(destptr, maxbytes, "%.*g",
+							 REALTYPE_PRECISION, *conf->variable);
+			}
+			break;
+
+		case PGC_STRING:
+			{
+				struct config_string *conf = (struct config_string *) gconf;
+
+				do_serialize(destptr, maxbytes, "%s", *conf->variable);
+			}
+			break;
+
+		case PGC_ENUM:
+			{
+				struct config_enum *conf = (struct config_enum *) gconf;
+
+				do_serialize(destptr, maxbytes, "%s",
+						 config_enum_lookup_by_value(conf, *conf->variable));
+			}
+			break;
+	}
+
+	do_serialize(destptr, maxbytes, "%s",
+				 (gconf->sourcefile ? gconf->sourcefile : ""));
+
+	if (gconf->sourcefile)
+		do_serialize_binary(destptr, maxbytes, &gconf->sourceline,
+							sizeof(gconf->sourceline));
+
+	do_serialize_binary(destptr, maxbytes, &gconf->source,
+						sizeof(gconf->source));
+	do_serialize_binary(destptr, maxbytes, &gconf->scontext,
+						sizeof(gconf->scontext));
+}
+
+/*
+ * SerializeGUCState:
+ * Dumps the complete GUC state onto the memory location at start_address.
+ */
+void
+SerializeGUCState(Size maxsize, char *start_address)
+{
+	char	   *curptr;
+	Size		actual_size;
+	Size		bytes_left;
+	int			i;
+	int			i_role = -1;
+
+	/* Reserve space for saving the actual size of the guc state */
+	curptr = start_address + sizeof(actual_size);
+	bytes_left = maxsize - sizeof(actual_size);
+
+	for (i = 0; i < num_guc_variables; i++)
+	{
+		/*
+		 * It's pretty ugly, but we've got to force "role" to be initialized
+		 * after "session_authorization"; otherwise, the latter will override
+		 * the former.
+		 */
+		if (strcmp(guc_variables[i]->name, "role") == 0)
+			i_role = i;
+		else
+			serialize_variable(&curptr, &bytes_left, guc_variables[i]);
+	}
+	serialize_variable(&curptr, &bytes_left, guc_variables[i_role]);
+
+	/* Store actual size without assuming alignment of start_address. */
+	actual_size = maxsize - bytes_left - sizeof(actual_size);
+	memcpy(start_address, &actual_size, sizeof(actual_size));
+}
+
+/*
+ * read_gucstate:
+ * Returns the null-terminated string starting at *srcptr, and advances
+ * *srcptr past the terminating NUL so that the caller is positioned to
+ * read the next item.
+ */
+static char *
+read_gucstate(char **srcptr, char *srcend)
+{
+	char	   *retptr = *srcptr;
+	char	   *ptr;
+
+	if (*srcptr >= srcend)
+		elog(ERROR, "incomplete GUC state");
+
+	/* The string variables are all null terminated */
+	for (ptr = *srcptr; ptr < srcend && *ptr != '\0'; ptr++)
+		;
+
+	if (ptr >= srcend)
+		elog(ERROR, "could not find null terminator in GUC state");
+
+	/* Set the new position to the byte following the terminating NUL */
+	*srcptr = ptr + 1;
+
+	return retptr;
+}
+
+/* Binary read version of read_gucstate(). Copies into dest */
+static void
+read_gucstate_binary(char **srcptr, char *srcend, void *dest, Size size)
+{
+	if (*srcptr + size > srcend)
+		elog(ERROR, "incomplete GUC state");
+
+	memcpy(dest, *srcptr, size);
+	*srcptr += size;
+}
+
+/*
+ * RestoreGUCState:
+ * Reads the GUC state from the given address and applies the values to
+ * the current process's GUC settings.
+ */
+void
+RestoreGUCState(void *gucstate)
+{
+	char	   *varname,
+			   *varvalue,
+			   *varsourcefile;
+	int			varsourceline;
+	GucSource	varsource;
+	GucContext	varscontext;
+	char	   *srcptr = (char *) gucstate;
+	char	   *srcend;
+	Size		len;
+	int			i;
+
+	/* See comment at can_skip_gucvar(). */
+	for (i = 0; i < num_guc_variables; i++)
+		if (!can_skip_gucvar(guc_variables[i]))
+			InitializeOneGUCOption(guc_variables[i]);
+
+	/* First item is the length of the subsequent data */
+	memcpy(&len, gucstate, sizeof(len));
+
+	srcptr += sizeof(len);
+	srcend = srcptr + len;
+
+	while (srcptr < srcend)
+	{
+		int		result;
+
+		if ((varname = read_gucstate(&srcptr, srcend)) == NULL)
+			break;
+
+		varvalue = read_gucstate(&srcptr, srcend);
+		varsourcefile = read_gucstate(&srcptr, srcend);
+		if (varsourcefile[0])
+			read_gucstate_binary(&srcptr, srcend,
+								 &varsourceline, sizeof(varsourceline));
+		read_gucstate_binary(&srcptr, srcend,
+							 &varsource, sizeof(varsource));
+		read_gucstate_binary(&srcptr, srcend,
+							 &varscontext, sizeof(varscontext));
+
+		result = set_config_option(varname, varvalue, varscontext, varsource,
+								   GUC_ACTION_SET, true, ERROR, true);
+		if (result <= 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("parameter \"%s\" could not be set", varname)));
+		if (varsourcefile[0])
+			set_config_sourcefile(varname, varsourcefile, varsourceline);
+	}
+}
 
 /*
  * A little "long argument" simulation, although not quite GNU
@@ -8555,7 +8952,7 @@ ProcessGUCArray(ArrayType *array,
 
 		(void) set_config_option(name, value,
 								 context, source,
-								 action, true, 0);
+								 action, true, 0, false);
 
 		free(name);
 		if (value)
@@ -8858,7 +9255,7 @@ validate_option_array_item(const char *name, const char *value,
 	/* test for permissions and valid option value */
 	(void) set_config_option(name, value,
 							 superuser() ? PGC_SUSET : PGC_USERSET,
-							 PGC_S_TEST, GUC_ACTION_SET, false, 0);
+							 PGC_S_TEST, GUC_ACTION_SET, false, 0, false);
 
 	return true;
 }
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 2b2aaf4..395858b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -339,7 +339,8 @@ extern bool parse_int(const char *value, int *result, int flags,
 extern bool parse_real(const char *value, double *result);
 extern int set_config_option(const char *name, const char *value,
 				  GucContext context, GucSource source,
-				  GucAction action, bool changeVal, int elevel);
+				  GucAction action, bool changeVal, int elevel,
+				  bool is_reload);
 extern void AlterSystemSetConfigFile(AlterSystemStmt *setstmt);
 extern char *GetConfigOptionByName(const char *name, const char **varname);
 extern void GetConfigOptionByNum(int varnum, const char **values, bool *noshow);
@@ -363,6 +364,11 @@ extern void write_nondefault_variables(GucContext context);
 extern void read_nondefault_variables(void);
 #endif
 
+/* GUC serialization */
+extern Size EstimateGUCStateSpace(void);
+extern void SerializeGUCState(Size maxsize, char *start_address);
+extern void RestoreGUCState(void *gucstate);
+
 /* Support for messages reported from GUC check hooks */
 
 extern PGDLLIMPORT char *GUC_check_errmsg_string;
#83Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#82)
Re: pg_background (and more parallelism infrastructure patches)

On Thu, Nov 20, 2014 at 11:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 20, 2014 at 7:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Few compilation errors in the patch:
1>contrib\postgres_fdw\postgres_fdw.c(2107): error C2198:
'set_config_option' : too few arguments for call
1>contrib\postgres_fdw\postgres_fdw.c(2111): error C2198:
'set_config_option' : too few arguments for call
1>contrib\postgres_fdw\postgres_fdw.c(2115): error C2198:
'set_config_option' : too few arguments for call
2>contrib\dblink\dblink.c(2983): error C2198: 'set_config_option' : too few
arguments for call

Oops. Good catch. Fixed in the attached version.

Committed.

Unfortunately, I forgot to credit you and Andres as reviewers; sorry about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers