Re: Proof of concept: standalone backend with full FE/BE protocol

Started by Amit Kapilaabout 12 years ago49 messages
#1Amit Kapila
amit.kapila16@gmail.com
1 attachment(s)

I have gone through the mail chain of this thread and tried to find
the different concerns or open ends for this patch.

Summarisation of the discussion and concerns for this patch:

1. Security concern in interface
2. Security concern in Windows implementation
3. Handling of Ctrl-C/SIGTERM
4. Secondary connections for maintenance activities, replication
5. Windows Implementation - what should be behaviour for admin users?
6. Restricting operation's in single backend mode
7. Proposal related to maintenance activities

Description of each concern
----------------------------------------------

1. Security concern in interface -

Interface
---------------
$ psql "standalone_datadir = $PGDATA dbname = regression"
There is another option "standalone_backend", which can be set to
specify which postgres executable to launch.
If the latter isn't specified, libpq defaults to trying the
installation PGBINDIR that was selected by configure.

Security Concern
-----------------------------
If a user can specify libpq connection options, he can now execute
any file he wants by passing it as standalone_backend.

Method to resolve Security concern
--------------------------------------------------------
If an application wants to allow these connection parameters to be
used, it would need to do PQenableStartServer() first. If it doesn't,
those connection parameters will be rejected.

2. Security concern in Windows implementation -

Interface
---------------
PQcancel -
In Unix, need to use kill(conn->postgres_pid, SIGINT)
In Windows, pgkill(int pid, int sig) API can be used.

Security concern
---------------------------
pgkill is used to send cancel signal using pipe mechanism, so
someone else can create a pipe with our name before we do (since we
use the actual name - it's \\.\pipe\pgsinal_<pid>), by
guessing what pid we will have. If that happens, we'll go into a
loop and try to recreate it while logging a warning message to
eventlog/stderr. (this happens for every backend). We can't
throw an error on this and kill the backend because the pipe is
created in the background thread not the main one.

Some suggestions
------------------------------
Once it is detected that already a same Named Pipe already exists,
there can be following options:
a. try to create with some other name, but in that case how to
communicate the new name to client end of pipe. Some solution can be
thought if this approach seems to be reasonable,
though currently I don't have any in mind.
b. give error, as creation of pipe is generally at beginning of
process creation(backend)
c. any other better solution?

3. Handling of Ctrl-C/SIGTERM

Behaviour
---------------
If you kill the client, the child postgres will see connection
closure and will shut down.

Concern
--------------
will make scripting harder because you cannot start another single
backend pg_dump before the old backend noticed it, checkpointed and
shut down. It can happen if you forcibly kill
pg_dump (or some other client) and then immediately try to start a
new one, it's not clear how long you'll have to wait.

Suggestions for alternatives for this case
-------------------------------------------------------------
a. There is no expectation that a standalone PG implementation
would provide performance for a series of standalone sessions that is
equivalent to what you'd get from a persistent
server. If that scenario is what's important to you, you'd use
a persistent server.
b. An extra libpq call to handle this case can be helpful.

4. Secondary connections for data access

Proposal
---------------
A single-user connection database with "no administrative hassles"

Concerns
-----------------
As this proposal will not allow any data it stores to be accessed
by another connection, so all forms of replication are excluded and
all maintenance actions force the database to be
unavailable for a period of time. Those two things are barriers of
the most major kind to anybody working in an enterprise with connected
data and devices.

Suggestions for it's use or make it usable
----------------------------------------------------------------
a. a usable & scriptable --single mode is justification enough.
Having to wait for hours just enter one more command because --single
doesn't support any scripts sucks. Especially in
recovery situations.
b. it's worth having this particular thing because it makes
pg_upgrade more robust.
c. some competing solutions already provide similar solution
(http://www.firebirdsql.org/manual/fbmetasecur-embedded.html).
d. we need to make sure that this isn't foreclosing the option of
having a multi-process environment with a single user connection. I
don't see that it is, but it might be wise to sketch
exactly how that case would work before accepting this.

5. Windows Implementation - what should be behaviour for admin users

Behavior clarification -
does this follow the behavior that admin users will not be allowed
to invoke postgres child process?

6. Restricting operation's in single backend mode

Serializable transactions could skip all the SSI predicate locking
and conflict checking when in single-connection mode. With only one
connection the transactions could never overlap, so
there would be no chance of serialization anomalies when running
snapshot isolation.

It could be of use if someone had code they wanted to run under
both normal and single-connection modes. For single-connection only,
they could just choose REPEATABLE READ to
get exactly the same semantics.

7. Proposal related to maintainence activities

For maintainence activities, in longer run, we can have a
postmaster process that isn't listening on any ports, but is managing
background processes in addition to a single child backend.

As per my understanding, to complete this patch we need to
a. complete the work for #1, #2, #5
b. #6 and #7 can be done as enhancements after the initial feature is committed
c. need to decide what should we do for #3 and #4.

Rebased the patch (changes in patch)
-----------------------------------------------------------
a. fillPGconn(), loops through each connection option, so no need to
do it separately for standalone_datadir and standalone_backend
b. In function, ChildPostgresMain()->PostgresMain() pass third
parameter dbname as NULL.
c. Changed second parameter of read_standalone_child_variables() from
"int *" to "pgsocket *" to remove warning.
d. removed trailing white spaces.
e. update PQconninfoOptions array to include offset.

Any objections for adding this idea/patch to CF?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

standalone_backend.3.patchapplication/octet-stream; name=standalone_backend.3.patchDownload
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index d71885d..66e68e5 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -188,6 +188,8 @@ main(int argc, char *argv[])
 		AuxiliaryProcessMain(argc, argv);		/* does not return */
 	else if (argc > 1 && strcmp(argv[1], "--describe-config") == 0)
 		GucInfoMain();			/* does not return */
+	else if (argc > 1 && strncmp(argv[1], "--child=", 8) == 0)
+		ChildPostgresMain(argc, argv, get_current_username(progname)); /* does not return */
 	else if (argc > 1 && strcmp(argv[1], "--single") == 0)
 		PostgresMain(argc, argv,
 					 NULL,		/* no dbname */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d294a5a..e319825 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -501,6 +501,7 @@ typedef struct
 
 static void read_backend_variables(char *id, Port *port);
 static void restore_backend_variables(BackendParameters *param, Port *port);
+static void read_standalone_child_variables(char *id, pgsocket *psock);
 
 #ifndef WIN32
 static bool save_backend_variables(BackendParameters *param, Port *port);
@@ -4710,6 +4711,97 @@ ExitPostmaster(int status)
 	proc_exit(status);
 }
 
+
+/*
+ * ChildPostgresMain - start a new-style standalone postgres process
+ *
+ * This may not belong here, but it does share a lot of code with ConnCreate
+ * and BackendInitialize.  Basically what it has to do is set up a
+ * MyProcPort structure and then hand off control to PostgresMain.
+ * Beware that not very much stuff is initialized yet.
+ *
+ * In the future it might be interesting to support a "standalone
+ * multiprocess" mode in which we have a postmaster process that doesn't
+ * listen for connections, but does supervise autovacuum, bgwriter, etc
+ * auxiliary processes.  So that's another reason why postmaster.c might be
+ * the right place for this.
+ */
+void
+ChildPostgresMain(int argc, char *argv[], const char *username)
+{
+	Port	   *port;
+#ifdef WIN32
+	char        paramHandleStr[32];
+#endif
+
+	/*
+	 * Fire up essential subsystems: error and memory management
+	 */
+	MemoryContextInit();
+
+	/*
+	 * Build a Port structure for the client connection
+	 */
+	if (!(port = (Port *) calloc(1, sizeof(Port))))
+		ereport(FATAL,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/*
+	 * GSSAPI specific state struct must exist even though we won't use it
+	 */
+#if defined(ENABLE_GSS) || defined(ENABLE_SSPI)
+	port->gss = (pg_gssinfo *) calloc(1, sizeof(pg_gssinfo));
+	if (!port->gss)
+		ereport(FATAL,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+#endif
+
+#ifndef WIN32
+	/* The file descriptor of the client socket is the argument of --child */
+	if (sscanf(argv[1], "--child=%d", &port->sock) != 1)
+		ereport(FATAL,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid argument for --child: \"%s\"", argv[1])));
+#else
+/* The file descriptor of the client socket is the argument of --child */
+	if (sscanf(argv[1], "--child=%s", paramHandleStr) != 1)
+		ereport(FATAL,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid argument for --child: \"%s\"", argv[1])));
+
+	read_standalone_child_variables(paramHandleStr, &port->sock);
+
+#endif
+
+	/* Default assumption about protocol to use */
+	FrontendProtocol = port->proto = PG_PROTOCOL_LATEST;
+
+	/* save process start time */
+	port->SessionStartTime = GetCurrentTimestamp();
+	MyStartTime = timestamptz_to_time_t(port->SessionStartTime);
+
+	/* set these to empty in case they are needed */
+	port->remote_host = "";
+	port->remote_port = "";
+
+	MyProcPort = port;
+
+	/*
+	 * We can now initialize libpq and then enable reporting of ereport errors
+	 * to the client.
+	 */
+	pq_init();					/* initialize libpq to talk to client */
+	whereToSendOutput = DestRemote;	/* now safe to ereport to client */
+
+	/* And pass off control to PostgresMain */
+	PostgresMain(argc, argv, NULL, username);
+
+	abort();					/* not reached */
+}
+
+
 /*
  * sigusr1_handler - handle signal conditions from child processes
  */
@@ -5891,6 +5983,47 @@ restore_backend_variables(BackendParameters *param, Port *port)
 }
 
 
+#ifdef WIN32
+static void
+read_standalone_child_variables(char *id, pgsocket *psock)
+{
+	HANDLE		paramHandle;
+	InheritableSocket param;
+	InheritableSocket *paramp;
+
+/* Win32 version uses mapped file */
+#ifdef _WIN64
+	paramHandle = (HANDLE) _atoi64(id);
+#else
+	paramHandle = (HANDLE) atol(id);
+#endif
+	paramp = MapViewOfFile(paramHandle, FILE_MAP_READ, 0, 0, 0);
+	if (!paramp)
+	{
+		write_stderr("could not map view of standalone child variables: error code %lu\n",
+					 GetLastError());
+		exit(1);
+	}
+
+	memcpy(&param, paramp, sizeof(InheritableSocket));
+
+	if (!UnmapViewOfFile(paramp))
+	{
+		write_stderr("could not unmap view of standalone child variables: error code %lu\n",
+					 GetLastError());
+		exit(1);
+	}
+
+	if (!CloseHandle(paramHandle))
+	{
+		write_stderr("could not close handle to standalone child parameter variables: error code %lu\n",
+					 GetLastError());
+		exit(1);
+	}
+
+	read_inheritable_socket(psock, &param);
+}
+#endif
 Size
 ShmemBackendArraySize(void)
 {
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 181e3fe..f5a09c8 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3271,8 +3271,10 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
 	{
 		gucsource = PGC_S_ARGV; /* switches came from command line */
 
-		/* Ignore the initial --single argument, if present */
-		if (argc > 1 && strcmp(argv[1], "--single") == 0)
+		/* Ignore the initial --single or --child argument, if present */
+		if (argc > 1 &&
+			(strcmp(argv[1], "--single") == 0 ||
+			 strncmp(argv[1], "--child=", 8) == 0))
 		{
 			argv++;
 			argc--;
@@ -3536,18 +3538,18 @@ PostgresMain(int argc, char *argv[],
 	 * standalone).
 	 */
 	if (!IsUnderPostmaster)
-	{
 		MyProcPid = getpid();
 
+	if (MyProcPort == NULL)
 		MyStartTime = time(NULL);
-	}
 
 	/*
 	 * Fire up essential subsystems: error and memory management
 	 *
-	 * If we are running under the postmaster, this is done already.
+	 * If we are running under the postmaster, this is done already; likewise
+	 * in child-postgres mode.
 	 */
-	if (!IsUnderPostmaster)
+	if (MyProcPort == NULL)
 		MemoryContextInit();
 
 	SetProcessingMode(InitProcessing);
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 5dc6dc5..35099d8 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -47,6 +47,9 @@ extern int	postmaster_alive_fds[2];
 extern const char *progname;
 
 extern void PostmasterMain(int argc, char *argv[]) __attribute__((noreturn));
+
+extern void ChildPostgresMain(int argc, char *argv[], const char *username) __attribute__((noreturn));
+
 extern void ClosePostmasterPorts(bool am_syslogger);
 
 extern int	MaxLivePostmasterChildren(void);
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 18fcb0c..1da022c 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -17,6 +17,7 @@
 
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/wait.h>
 #include <fcntl.h>
 #include <ctype.h>
 #include <time.h>
@@ -300,6 +301,14 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 		"Replication", "D", 5,
 	offsetof(struct pg_conn, replication)},
 
+	{"standalone_datadir", NULL, NULL, NULL,
+	"Standalone-Data-Directory", "D", 64,
+	offsetof(struct pg_conn, standalone_datadir)},
+
+	{"standalone_backend", NULL, NULL, NULL,
+	"Standalone-Backend", "D", 64,
+	offsetof(struct pg_conn, standalone_backend)},
+
 	/* Terminating entry --- MUST BE LAST */
 	{NULL, NULL, NULL, NULL,
 	NULL, NULL, 0}
@@ -401,6 +410,27 @@ pqDropConnection(PGconn *conn)
 	if (conn->sock >= 0)
 		closesocket(conn->sock);
 	conn->sock = -1;
+	/* If we forked a child postgres process, wait for it to exit */
+	if (conn->postgres_pid > 0)
+	{
+#ifdef WIN32
+		while (WaitForSingleObject(conn->proc_handle, INFINITE) != WAIT_OBJECT_0)
+		{
+			_dosmaperr(GetLastError());
+			/* If interrupted by signal, keep waiting */
+			if (errno != EINTR)
+				break;
+		}
+#else
+		while (waitpid(conn->postgres_pid, NULL, 0) < 0)
+		{
+			/* If interrupted by signal, keep waiting */
+			if (errno != EINTR)
+				break;
+		}
+#endif
+	}
+	conn->postgres_pid = -1;
 	/* Discard any unread/unsent data */
 	conn->inStart = conn->inCursor = conn->inEnd = 0;
 	conn->outCount = 0;
@@ -1299,6 +1329,340 @@ setKeepalivesWin32(PGconn *conn)
 #endif   /* SIO_KEEPALIVE_VALS */
 #endif   /* WIN32 */
 
+#ifdef WIN32
+int
+pgsockpair(int handles[2], PGconn *conn)
+{
+	SOCKET		s;
+	struct sockaddr_in serv_addr;
+	int			len = sizeof(serv_addr);
+
+	handles[0] = handles[1] = INVALID_SOCKET;
+
+	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+				   libpq_gettext("pgsockpair could not create socket: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+
+		return -1;
+	}
+
+	memset((void *) &serv_addr, 0, sizeof(serv_addr));
+	serv_addr.sin_family = AF_INET;
+	serv_addr.sin_port = htons(0);
+	serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+	if (bind(s, (SOCKADDR *) & serv_addr, len) == SOCKET_ERROR)
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("pgsockpair could not bind: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+
+		closesocket(s);
+		return -1;
+	}
+	if (listen(s, 1) == SOCKET_ERROR)
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("pgsockpair could not listen: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+		closesocket(s);
+		return -1;
+	}
+	if (getsockname(s, (SOCKADDR *) & serv_addr, &len) == SOCKET_ERROR)
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+					 libpq_gettext("pgsockpair could not getsockname: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+		closesocket(s);
+		return -1;
+	}
+	if ((handles[1] = socket(PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+				 libpq_gettext("pgsockpair could not create socket 2: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+
+		closesocket(s);
+		return -1;
+	}
+
+	if (connect(handles[1], (SOCKADDR *) & serv_addr, len) == SOCKET_ERROR)
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+				  libpq_gettext("pgsockpair could not connect socket: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+		closesocket(s);
+		return -1;
+	}
+	if ((handles[0] = accept(s, (SOCKADDR *) & serv_addr, &len)) == INVALID_SOCKET)
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+				   libpq_gettext("pgsockpair could not accept socket: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+		closesocket(handles[1]);
+		handles[1] = INVALID_SOCKET;
+		closesocket(s);
+		return -1;
+	}
+	closesocket(s);
+	return 0;
+}
+
+/*
+ * Fork and exec a command, connecting up a pair of anonymous sockets for
+ * communication.  argv[fdarg] is modified by appending the child-side
+ * socket's file descriptor number.  The parent-side socket's FD is returned
+ * in *psock, and the function result is the child's PID.
+ */
+static pid_t
+fork_backend_child(char **argv, int fdarg, PGconn *conn)
+{
+	int			socks[2];
+	char		newfdarg[32];
+	STARTUPINFO si;
+	PROCESS_INFORMATION pi;
+	int			i;
+	int			j;
+	char		cmdLine[MAXPGPATH * 2];
+	typedef struct
+	{
+		SOCKET		origsocket; /* Original socket value, or PGINVALID_SOCKET
+								 * if not a socket */
+		WSAPROTOCOL_INFO wsainfo;
+	} InheritableSocket;
+	InheritableSocket *param;
+	HANDLE		paramHandle;
+	SECURITY_ATTRIBUTES sa;
+	char		paramHandleStr[32];
+
+
+	if (pgsockpair(socks, conn) < 0)
+		return -1;
+
+	/* Set up shared memory for parameter passing */
+	ZeroMemory(&sa, sizeof(sa));
+	sa.nLength = sizeof(sa);
+	sa.bInheritHandle = TRUE;
+	paramHandle = CreateFileMapping(INVALID_HANDLE_VALUE,
+									&sa,
+									PAGE_READWRITE,
+									0,
+									sizeof(InheritableSocket),
+									NULL);
+	if (paramHandle == INVALID_HANDLE_VALUE)
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("could not create standalone backend parameter file mapping: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+		return -1;
+	}
+
+	param = MapViewOfFile(paramHandle, FILE_MAP_WRITE, 0, 0, sizeof(InheritableSocket));
+	if (!param)
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+			   libpq_gettext("could not map backend parameter memory: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+
+		CloseHandle(paramHandle);
+		return -1;
+	}
+
+	/* Insert temp file name instead of sockid incase of windows */
+#ifdef _WIN64
+	sprintf(paramHandleStr, "%llu", (LONG_PTR) paramHandle);
+#else
+	sprintf(paramHandleStr, "%lu", (DWORD) paramHandle);
+#endif
+
+	snprintf(newfdarg, sizeof(newfdarg), "%s%s", argv[fdarg], paramHandleStr);
+	argv[fdarg] = newfdarg;
+
+	/* Format the cmd line */
+	cmdLine[sizeof(cmdLine) - 1] = '\0';
+	cmdLine[sizeof(cmdLine) - 2] = '\0';
+	snprintf(cmdLine, sizeof(cmdLine) - 1, "\"%s\"", argv[0]);
+
+	i = 0;
+	while (argv[++i] != NULL)
+	{
+		j = strlen(cmdLine);
+		snprintf(cmdLine + j, sizeof(cmdLine) - 1 - j, " \"%s\"", argv[i]);
+	}
+	if (cmdLine[sizeof(cmdLine) - 2] != '\0')
+	{
+		appendPQExpBuffer(&conn->errorMessage,
+					libpq_gettext("subprocess command line too long: %s\n"));
+
+		return -1;
+	}
+
+	memset(&pi, 0, sizeof(pi));
+	memset(&si, 0, sizeof(si));
+	si.cb = sizeof(si);
+
+	if (!CreateProcess(NULL, cmdLine, NULL, NULL, TRUE, CREATE_SUSPENDED,
+					   NULL, NULL, &si, &pi))
+	{
+		char		sebuf[256];
+
+		appendPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("CreateProcess call failed: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+
+		return -1;
+	}
+
+
+	param->origsocket = socks[0];
+	if (socks[0] != 0 && socks[0] != PGINVALID_SOCKET)
+	{
+		/* Actual socket */
+		if (WSADuplicateSocket(socks[0], pi.dwProcessId, &param->wsainfo) != 0)
+		{
+			char		sebuf[256];
+
+			if (!TerminateProcess(pi.hProcess, 255))
+			{
+				appendPQExpBuffer(&conn->errorMessage,
+				libpq_gettext("could not terminate unstarted process: %s\n"),
+							SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+
+				CloseHandle(pi.hProcess);
+				CloseHandle(pi.hThread);
+				return -1;
+			}
+
+			appendPQExpBuffer(&conn->errorMessage,
+					   libpq_gettext("could not duplicate the socket: %s\n"),
+							SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+
+			CloseHandle(pi.hProcess);
+			CloseHandle(pi.hThread);
+			return -1;
+		}
+	}
+
+	/*
+	 * Drop the parameter shared memory that is now inherited to the
+	 * standalone chile
+	 */
+	if (!UnmapViewOfFile(param))
+	{
+		char		sebuf[256];
+
+		fprintf(stderr,
+				libpq_gettext("WARNING: could not unmap view of standalone child variables: %s\n"),
+				SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+	}
+
+	if (!CloseHandle(paramHandle))
+	{
+		char		sebuf[256];
+
+		fprintf(stderr,
+				libpq_gettext("WARNING: could not close handle of standalone child variables: %s\n"),
+				SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+	}
+
+	/*
+	 * Now that the socket information is shared, we start the child thread
+	 */
+	if (ResumeThread(pi.hThread) == -1)
+	{
+		char		sebuf[256];
+
+		if (!TerminateProcess(pi.hProcess, 255))
+		{
+			appendPQExpBuffer(&conn->errorMessage,
+				libpq_gettext("could not terminate unstarted process: %s\n"),
+							SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+
+			CloseHandle(pi.hProcess);
+			CloseHandle(pi.hThread);
+			return -1;
+		}
+		appendPQExpBuffer(&conn->errorMessage,
+		 libpq_gettext("could not resume thread of unstarted process: %s\n"),
+						  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+		CloseHandle(pi.hProcess);
+		CloseHandle(pi.hThread);
+		return -1;
+	}
+	/* Don't close pi.hProcess here - the wait thread needs access to it */
+	CloseHandle(pi.hThread);
+
+	conn->sock = socks[1];
+	conn->proc_handle = pi.hProcess;
+	return pi.dwProcessId;
+}
+#else
+
+/*
+ * Fork and exec a command, connecting up a pair of anonymous sockets for
+ * communication.  argv[fdarg] is modified by appending the child-side
+ * socket's file descriptor number.  The parent-side socket's FD is returned
+ * in *psock, and the function result is the child's PID.
+ */
+static pid_t
+fork_backend_child(char **argv, int fdarg, int *psock)
+{
+	pid_t		pid;
+	int			socks[2];
+	char		newfdarg[32];
+
+	if (socketpair(AF_UNIX, SOCK_STREAM, 0, socks) < 0)
+		return -1;
+
+	pid = fork();
+
+	if (pid < 0)
+	{
+		/* fork failed */
+		close(socks[0]);
+		close(socks[1]);
+		return pid;
+	}
+	else if (pid == 0)
+	{
+		/* successful, in child process */
+		close(socks[1]);
+		snprintf(newfdarg, sizeof(newfdarg), "%s%d", argv[fdarg], socks[0]);
+		argv[fdarg] = newfdarg;
+		execv(argv[0], argv);
+		perror(argv[0]);
+		exit(1);
+	}
+	else
+	{
+		/* successful, in parent process */
+		close(socks[0]);
+		*psock = socks[1];
+	}
+
+	return pid;
+}
+#endif
+
 /* ----------
  * connectDBStart -
  *		Begin the process of making a connection to the backend.
@@ -1327,6 +1691,61 @@ connectDBStart(PGconn *conn)
 	conn->outCount = 0;
 
 	/*
+	 * If the standalone_datadir option was specified, ignore any host or
+	 * port specifications and just try to fork a standalone backend.
+	 */
+	if (conn->standalone_datadir && conn->standalone_datadir[0])
+	{
+		char   *be_argv[10];
+
+		/*
+		 * We set up the backend's command line in execv(3) style, so that
+		 * we don't need to cope with shell quoting rules.
+		 */
+		if (conn->standalone_backend && conn->standalone_backend[0])
+			be_argv[0] = conn->standalone_backend;
+		else					/* assume we should use hard-wired path */
+			be_argv[0] = PGBINDIR "/postgres";
+
+		be_argv[1] = "--child=";
+		be_argv[2] = "-D";
+		be_argv[3] = conn->standalone_datadir;
+		be_argv[4] = (conn->dbName && conn->dbName[0]) ? conn->dbName : NULL;
+		be_argv[5] = NULL;
+
+#ifdef WIN32
+		conn->postgres_pid = fork_backend_child(be_argv, 1, conn);
+		if (conn->postgres_pid < 0)
+			goto connect_errReturn;
+#else
+		conn->postgres_pid = fork_backend_child(be_argv, 1, &conn->sock);
+		if (conn->postgres_pid < 0)
+		{
+			char		sebuf[256];
+
+			appendPQExpBuffer(&conn->errorMessage,
+					libpq_gettext("could not fork standalone backend: %s\n"),
+							  pqStrerror(errno, sebuf, sizeof(sebuf)));
+			goto connect_errReturn;
+		}
+#endif
+
+		/*
+		 * Go directly to CONNECTION_AUTH_OK state, since the standalone
+		 * backend is not going to issue any authentication challenge to us.
+		 * We're just waiting for startup to conclude.
+		 */
+#ifdef USE_SSL
+		conn->allow_ssl_try = false;
+#endif
+		conn->pversion = PG_PROTOCOL(3, 0);
+		conn->status = CONNECTION_AUTH_OK;
+		conn->asyncStatus = PGASYNC_BUSY;
+
+		return 1;
+	}
+
+	/*
 	 * Determine the parameters to pass to pg_getaddrinfo_all.
 	 */
 
@@ -2706,6 +3125,7 @@ makeEmptyPGconn(void)
 	conn->auth_req_received = false;
 	conn->password_needed = false;
 	conn->dot_pgpass_used = false;
+	conn->postgres_pid = -1;
 #ifdef USE_SSL
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
@@ -2782,6 +3202,10 @@ freePGconn(PGconn *conn)
 		free(conn->pgport);
 	if (conn->pgunixsocket)
 		free(conn->pgunixsocket);
+	if (conn->standalone_datadir)
+		free(conn->standalone_datadir);
+	if (conn->standalone_backend)
+		free(conn->standalone_backend);
 	if (conn->pgtty)
 		free(conn->pgtty);
 	if (conn->connect_timeout)
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 408aeb1..1ac3088 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -306,6 +306,8 @@ struct pg_conn
 	char	   *pgunixsocket;	/* the Unix-domain socket that the server is
 								 * listening on; if NULL, uses a default
 								 * constructed from pgport */
+	char	   *standalone_datadir;	/* data directory for standalone backend */
+	char	   *standalone_backend;	/* executable for standalone backend */
 	char	   *pgtty;			/* tty on which the backend messages is
 								 * displayed (OBSOLETE, NOT USED) */
 	char	   *connect_timeout;	/* connection timeout (numeric string) */
@@ -376,6 +378,9 @@ struct pg_conn
 	bool		sigpipe_so;		/* have we masked SIGPIPE via SO_NOSIGPIPE? */
 	bool		sigpipe_flag;	/* can we mask SIGPIPE via MSG_NOSIGNAL? */
 
+	/* If we forked a child postgres process, its PID is kept here */
+	pid_t		postgres_pid;	/* PID, or -1 if none */
+
 	/* Transient state needed while establishing connection */
 	struct addrinfo *addrlist;	/* list of possible backend addresses */
 	struct addrinfo *addr_cur;	/* the one currently being tried */
@@ -418,6 +423,10 @@ struct pg_conn
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
 	PGresult   *next_result;	/* next result (used in single-row mode) */
+#ifdef WIN32
+	/* If we forked a child postgres process, process handle */
+	HANDLE      proc_handle;
+#endif
 
 	/* Assorted state for SSL, GSS, etc */
 
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#1)

Amit Kapila <amit.kapila16@gmail.com> writes:

Any objections for adding this idea/patch to CF?

You should certainly add it to the CF. You've listed lots of matters
for review, but that's no reason to not get it in the queue to be
reviewed.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Amit Kapila (#1)

On 14 November 2013 03:41, Amit Kapila <amit.kapila16@gmail.com> wrote:

I have gone through the mail chain of this thread and tried to find
the different concerns or open ends for this patch.

Not enough. This feature is clearly being suggested as a way to offer
Postgres in embedded mode for users by a back door. Doing that forces
us to turn off many of the server's features and we will take a huge
step backwards in features, testing, maintainability of code and
wasted community time.

"No administrative hassles" is just a complete fiction. Admin will
become a huge burden for any user in this mode, which will bite the
community and cause us to waste much time redesigning the server to
operate on a single session.

-1 from me

4. Secondary connections for data access

Proposal
---------------
A single-user connection database with "no administrative hassles"

Concerns
-----------------
As this proposal will not allow any data it stores to be accessed
by another connection, so all forms of replication are excluded and
all maintenance actions force the database to be
unavailable for a period of time. Those two things are barriers of
the most major kind to anybody working in an enterprise with connected
data and devices.

Suggestions for it's use or make it usable
----------------------------------------------------------------
a. a usable & scriptable --single mode is justification enough.
Having to wait for hours just enter one more command because --single
doesn't support any scripts sucks. Especially in
recovery situations.
b. it's worth having this particular thing because it makes
pg_upgrade more robust.
c. some competing solutions already provide similar solution
(http://www.firebirdsql.org/manual/fbmetasecur-embedded.html).
d. we need to make sure that this isn't foreclosing the option of
having a multi-process environment with a single user connection. I
don't see that it is, but it might be wise to sketch
exactly how that case would work before accepting this.

Why is not feasible to run a normal server with 1 connection.

Are we really following what Firebird is doing? Why?

6. Restricting operation's in single backend mode

Serializable transactions could skip all the SSI predicate locking
and conflict checking when in single-connection mode. With only one
connection the transactions could never overlap, so
there would be no chance of serialization anomalies when running
snapshot isolation.

It could be of use if someone had code they wanted to run under
both normal and single-connection modes. For single-connection only,
they could just choose REPEATABLE READ to
get exactly the same semantics.

This is an example of my concern that we would begin optimising for
the case of single user mode and encourage its use by users. This
shows that the feature is not being suggested just for recovery.

PostgreSQL has been designed from the ground up to support
concurrency. If we begin optimising for single user mode it will take
years to unpick our work and eventually we'll have a conflict and
someone will say "we can't do that because it will be a problem in
single user mode".

7. Proposal related to maintainence activities

For maintainence activities, in longer run, we can have a
postmaster process that isn't listening on any ports, but is managing
background processes in addition to a single child backend.

As per my understanding, to complete this patch we need to
a. complete the work for #1, #2, #5
b. #6 and #7 can be done as enhancements after the initial feature is committed
c. need to decide what should we do for #3 and #4.

Huh? Multi-process mode already works. Why would do we need "a
solution" for the problem that single process mode uses only one
process? Don't use it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@2ndquadrant.com
In reply to: Simon Riggs (#3)

On 2013-11-15 09:51:28 -0200, Simon Riggs wrote:

On 14 November 2013 03:41, Amit Kapila <amit.kapila16@gmail.com> wrote:

I have gone through the mail chain of this thread and tried to find
the different concerns or open ends for this patch.

Not enough. This feature is clearly being suggested as a way to offer
Postgres in embedded mode for users by a back door.

I think fixing single user mode to work halfway reasonable is enough
justification for the feature. Having to deal with that when solving
critical issues is just embarassing.

Doing that forces
us to turn off many of the server's features and we will take a huge
step backwards in features, testing, maintainability of code and
wasted community time.

I think the patch as proposed actually reduces maintenance overhead
since we don't have to deal with the strange separate codepaths for
single user mode.

But: I very, very much agree with the other concerns around this. This
should be a patch to fix single user mode, not one to make postgres into
a single process database. It's not, and trying to make it by using
single user mode for it will start to hinder development of normal
postgres because we suddenly need to be concerned about performance and
features in situations where we previously weren't.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Andres Freund (#4)

Andres Freund <andres@2ndquadrant.com> writes:

I think fixing single user mode to work halfway reasonable is enough
justification for the feature. Having to deal with that when solving
critical issues is just embarassing.

+1

But: I very, very much agree with the other concerns around this. This
should be a patch to fix single user mode, not one to make postgres into
a single process database. It's not, and trying to make it by using
single user mode for it will start to hinder development of normal
postgres because we suddenly need to be concerned about performance and
features in situations where we previously weren't.

+1

Maybe what needs to happen to this patch is away to restrict its usage
to --single. I'm thinking that postgres --single maybe could be made to
fork the server process underneath the psql controler client process
transparently.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#4)

On 15 November 2013 09:00, Andres Freund <andres@2ndquadrant.com> wrote:

This
should be a patch to fix single user mode, not one to make postgres into
a single process database.

+1

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Merlin Moncure
mmoncure@gmail.com
In reply to: Dimitri Fontaine (#5)

On Fri, Nov 15, 2013 at 6:06 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:

But: I very, very much agree with the other concerns around this. This
should be a patch to fix single user mode, not one to make postgres into
a single process database. It's not, and trying to make it by using
single user mode for it will start to hinder development of normal
postgres because we suddenly need to be concerned about performance and
features in situations where we previously weren't.

+1

Maybe what needs to happen to this patch is away to restrict its usage
to --single. I'm thinking that postgres --single maybe could be made to
fork the server process underneath the psql controler client process
transparently.

I personally would prefer not to do that. My main non-administrative
interest in this mode is doing things like benchmarking protocol
overhead. I'm OK with not supporting (and optimizing) for single user
code paths but I don't like the idea of building walls that serve no
purpose other than to make it difficult for other people mess around.
Just document strenuously that this mode is not intended for
application bundling and move on...

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Simon Riggs (#3)

On Fri, Nov 15, 2013 at 5:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 14 November 2013 03:41, Amit Kapila <amit.kapila16@gmail.com> wrote:

I have gone through the mail chain of this thread and tried to find
the different concerns or open ends for this patch.

Not enough. This feature is clearly being suggested as a way to offer
Postgres in embedded mode for users by a back door.

Current patch doesn't have such facility and I don't think somebody
can use it as an embedded database.

Doing that forces
us to turn off many of the server's features and we will take a huge
step backwards in features, testing, maintainability of code and
wasted community time.

"No administrative hassles" is just a complete fiction. Admin will
become a huge burden for any user in this mode, which will bite the
community and cause us to waste much time redesigning the server to
operate on a single session.

-1 from me

What I could understand from your objection is that you don't want
users to get the impression of this feature as an embedded database.
I think as the patch stands, it doesn't have such facility, so
advertising it as an substitute for embedded database would be anyway
inappropriate.
The use case is to provide a standalone mode which will be useful for
cases where today --single mode is required/used and I think
documenting the feature that way is the right way to proceed. If this
addresses your concern, then we can proceed to discuss solutions for
other concerns like security?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#3)

On Fri, Nov 15, 2013 at 6:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Not enough. This feature is clearly being suggested as a way to offer
Postgres in embedded mode for users by a back door. Doing that forces
us to turn off many of the server's features and we will take a huge
step backwards in features, testing, maintainability of code and
wasted community time.

That's not clear to me at all. IIRC, the original idea was Tom's, and
the idea is to make it possible to have, for example, a psql session
connected to a standalone database, which can't be done right now. I
don't use standalone mode much, but when I do, I'd sure like to have
the psql interface rather than the existing standalone mode interface.
I'm not aware that there's anything in this patch which targets any
other use case; if there is, sure, rip it out. But let's not assume
this is going in a bad direction, especially considering who it was
that suggested the idea originally.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Nov 15, 2013 at 6:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Not enough. This feature is clearly being suggested as a way to offer
Postgres in embedded mode for users by a back door. Doing that forces
us to turn off many of the server's features and we will take a huge
step backwards in features, testing, maintainability of code and
wasted community time.

That's not clear to me at all. IIRC, the original idea was Tom's, and
the idea is to make it possible to have, for example, a psql session
connected to a standalone database, which can't be done right now. I
don't use standalone mode much, but when I do, I'd sure like to have
the psql interface rather than the existing standalone mode interface.

pg_dump from a standalone backend seems like another core use case.
That's not just more convenience, it's functionality you just don't
have right now.

I think that it might someday be interesting to allow a full server to be
started on-demand in this way. But the patch as proposed is not that,
it's just a nicer interface to the existing standalone mode.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Amit Kapila (#1)

On 11/14/13, 1:41 AM, Amit Kapila wrote:

Security Concern
-----------------------------
If a user can specify libpq connection options, he can now execute
any file he wants by passing it as standalone_backend.

Method to resolve Security concern
--------------------------------------------------------
If an application wants to allow these connection parameters to be
used, it would need to do PQenableStartServer() first. If it doesn't,
those connection parameters will be rejected.

I don't think this really helps. You can't tell with reasonable effort
or certainty whether a given program is calling PQenableStartServer(),
so you cannot audit this from the outside. Also, someone could,
depending on circumstances, dynamically load a module that calls
PQenableStartServer(), thus circumventing this check. And maybe before
long someone will patch up all drivers to call PQenableStartServer()
automatically, because why shouldn't I be able to run a standalone
backend from PHP or Ruby? Also, at some point at least, something like
phpPgAdmin called pg_dump internally, so you could imagine that in
situations like that, assuming that pg_dump called
PQenableStartServer(), with a little bit craftiness, you could expose
the execute-any-file hole through a web server.

I don't have a better idea right now how to set up these connection
parameters in a way that you can only set them in certain "safe"
circumstances.

I would consider sidestepping this entire issue by having the
stand-alone backend create a Unix-domain socket and have a client
connect to that in the normal way. At least if you split the patch that
way, you might alleviate some concerns of others about whether this
patch is about fixing standalone mode vs. allowing using standalone mode
with psql vs. making a fully embedded database.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#11)

Peter Eisentraut <peter_e@gmx.net> writes:

I would consider sidestepping this entire issue by having the
stand-alone backend create a Unix-domain socket and have a client
connect to that in the normal way.

Hmm. But that requires the "stand-alone backend" to take on at least
some properties of a postmaster; at the very least, it would need to
accept some form of shutdown signal (not just EOF on its stdin).

Perhaps more to the point, I think this approach actually breaks one of
the principal good-thing-in-emergencies attributes of standalone mode,
namely being sure that nobody but you can connect. With this, you're
right back to having a race condition as to whether your psql command
gets to the socket before somebody else.

I think we'd be better off trying to fix the security issue by
constraining what can be executed as a "standalone backend". Would
it work to insist that psql/pg_dump launch the program named postgres
from the same bin directory they're in, rather than accepting a path
from the connection string?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#12)

On 2013-11-20 10:48:20 -0500, Tom Lane wrote:

constraining what can be executed as a "standalone backend". Would
it work to insist that psql/pg_dump launch the program named postgres
from the same bin directory they're in, rather than accepting a path
from the connection string?

But why do we want to start the server through the connection string
using PQconnectb() in the first place? That doesn't really seem right to
me.
Something like PQstartSingleUser(dsn) returning a established connection
seems better to me.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#13)

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-11-20 10:48:20 -0500, Tom Lane wrote:

constraining what can be executed as a "standalone backend". Would
it work to insist that psql/pg_dump launch the program named postgres
from the same bin directory they're in, rather than accepting a path
from the connection string?

But why do we want to start the server through the connection string
using PQconnectb() in the first place? That doesn't really seem right to
me.
Something like PQstartSingleUser(dsn) returning a established connection
seems better to me.

That just pushes the problem up a level --- how are you going to tell
psql, pg_dump, or other programs that they should do that?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#14)

On 2013-11-20 11:08:33 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-11-20 10:48:20 -0500, Tom Lane wrote:

constraining what can be executed as a "standalone backend". Would
it work to insist that psql/pg_dump launch the program named postgres
from the same bin directory they're in, rather than accepting a path
from the connection string?

But why do we want to start the server through the connection string
using PQconnectb() in the first place? That doesn't really seem right to
me.
Something like PQstartSingleUser(dsn) returning a established connection
seems better to me.

That just pushes the problem up a level --- how are you going to tell
psql, pg_dump, or other programs that they should do that?

An explicit parameter. A program imo explicitly needs to be aware that a
PQconnect() suddenly starts forking and such. What if it is using
threads? What if it has it's own SIGCHLD handler for other business it's
doing?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#12)

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

I think we'd be better off trying to fix the security issue by
constraining what can be executed as a "standalone backend". Would
it work to insist that psql/pg_dump launch the program named postgres
from the same bin directory they're in, rather than accepting a path
from the connection string?

Couldn't that be an issue for people who have multiple major versions of
binaries installed? In particular, the "default" on the system for psql
might be 9.3 while the cluster you're trying to recover may be 9.2. Of
course, in that case you might say to use the 9.2 psql, which would be
fair, but what if you're looking to get the data out of the 9.2 DB and
into the 9.3? In that case, we'd recommend using the 9.3 pg_dump.

Basically, I'd suggest that we try and avoid things like "the binaries
have to be in the same directory".. With regard to access to the
socket, perhaps we create our own socket w/ 0600 and use that? Seems
like it'd be sufficient to prevent the 'normal' users from getting into
the DB while we're working on it. If there's two different individuals
gettings into the same system and trying to start the same cluster as
the same unix user, well.. I'm not convinced we'd be able to come up
with a perfect solution to that anyway.

Thanks,

Stephen

#17Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#15)

On 2013-11-20 17:19:42 +0100, Andres Freund wrote:

That just pushes the problem up a level --- how are you going to tell
psql, pg_dump, or other programs that they should do that?

An explicit parameter. A program imo explicitly needs to be aware that a
PQconnect() suddenly starts forking and such. What if it is using
threads? What if it has it's own SIGCHLD handler for other business it's
doing?

Just as an example, consider what happens if somebody does pg_dump -j?
Or somebody specifies such a connection for primary_conninfo?

I am also not sure whether vacuumdb -a/reindexdb -a (both not unlikely
commands to use for single user mode) are careful enough not to have
parallel connections open?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#15)

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-11-20 11:08:33 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

Something like PQstartSingleUser(dsn) returning a established connection
seems better to me.

That just pushes the problem up a level --- how are you going to tell
psql, pg_dump, or other programs that they should do that?

An explicit parameter. A program imo explicitly needs to be aware that a
PQconnect() suddenly starts forking and such. What if it is using
threads? What if it has it's own SIGCHLD handler for other business it's
doing?

Hm. That's a fair point. I don't especially buy your other argument
about additional connections --- if the program tries such, they'll
just fail, which can hardly be said to be unexpected. But it's reasonable
to worry that programs might need to be aware that they now have a child
process. (It occurs to me that we'll need to provide a way to get the
PID of the child, too.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Peter Eisentraut
peter_e@gmx.net
In reply to: Stephen Frost (#16)

On 11/20/13, 11:31 AM, Stephen Frost wrote:

Couldn't that be an issue for people who have multiple major versions of
binaries installed? In particular, the "default" on the system for psql
might be 9.3 while the cluster you're trying to recover may be 9.2. Of
course, in that case you might say to use the 9.2 psql, which would be
fair, but what if you're looking to get the data out of the 9.2 DB and
into the 9.3? In that case, we'd recommend using the 9.3 pg_dump.

Right. And also, in emergency situations you might have a custom built
postgres binary lying around in a separate path that includes a patch
from a mailing list you're supposed to test or something. Best not to
make that even more difficult.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#12)

On 11/20/13, 10:48 AM, Tom Lane wrote:

Perhaps more to the point, I think this approach actually breaks one of
the principal good-thing-in-emergencies attributes of standalone mode,
namely being sure that nobody but you can connect. With this, you're
right back to having a race condition as to whether your psql command
gets to the socket before somebody else.

I don't disagree, except maybe about the relative gravity of the various
competing concerns. But I want to see if we can split the proposed
patch into smaller, more acceptable parts.

There is elegance in being able to start a standalone backend from libpq
connection parameters. But there are also security concerns and some
general concerns about promoting an embedded database mode.

If we allow single-user backends to speak protocol over sockets, then we
have at least solved the problem of being able to use standard tools in
emergency mode. And I don't think it precludes adding some of the other
functionality later.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#11)

On Wed, Nov 20, 2013 at 10:13 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

On 11/14/13, 1:41 AM, Amit Kapila wrote:

Security Concern
-----------------------------
If a user can specify libpq connection options, he can now execute
any file he wants by passing it as standalone_backend.

Method to resolve Security concern
--------------------------------------------------------
If an application wants to allow these connection parameters to be
used, it would need to do PQenableStartServer() first. If it doesn't,
those connection parameters will be rejected.

I don't think this really helps. You can't tell with reasonable effort
or certainty whether a given program is calling PQenableStartServer(),
so you cannot audit this from the outside. Also, someone could,
depending on circumstances, dynamically load a module that calls
PQenableStartServer(), thus circumventing this check.

What?! The complaint is that somebody who only has access to set
connection parameters could cause a server to be started. There's a
tremendous gulf between "I can set the connection string" and "I can
set LD_PRELOAD". If you can set LD_PRELOAD to a value of your choice,
I'm pretty sure you can do things that are far more entertaining than
calling a hypothetical PQenableStartServer() function.

And maybe before
long someone will patch up all drivers to call PQenableStartServer()
automatically, because why shouldn't I be able to run a standalone
backend from PHP or Ruby? Also, at some point at least, something like
phpPgAdmin called pg_dump internally, so you could imagine that in
situations like that, assuming that pg_dump called
PQenableStartServer(), with a little bit craftiness, you could expose
the execute-any-file hole through a web server.

The point is that client applications should expose whether or not to
set this function as a command-line switch separate from whatever they
accept in terms of connection strings. So pg_dump should have a flag
called --standalone-server or something like, and it should all
PQenableStartServer() only when that flag is used. So if the user has
a shell script that invokes pg_dump -d "$1", the user cannot contrive
a server. If they write the script as pg_dump --standalone-server -d
"$1", then they can, but by putting that option in there you pretty
much bought the farm. Any program that calls that function
unconditionally while at the same time accepting untrusted user input
will be insecure, but chmod -R u+s /bin is insecure, too. That's why
we don't do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#21)

On 11/20/13, 3:24 PM, Robert Haas wrote:

The point is that client applications should expose whether or not to
set this function as a command-line switch separate from whatever they
accept in terms of connection strings. So pg_dump should have a flag
called --standalone-server or something like, and it should all
PQenableStartServer() only when that flag is used.

The argument elsewhere in this thread was that the reason for putting
this in the connection options was so that you do *not* have to patch up
every client to be able to use this functionality. If you have to add
separate options everywhere, then you might as well just have a separate
libpq function to initiate the session.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#22)

On Wed, Nov 20, 2013 at 3:32 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

On 11/20/13, 3:24 PM, Robert Haas wrote:

The point is that client applications should expose whether or not to
set this function as a command-line switch separate from whatever they
accept in terms of connection strings. So pg_dump should have a flag
called --standalone-server or something like, and it should all
PQenableStartServer() only when that flag is used.

The argument elsewhere in this thread was that the reason for putting
this in the connection options was so that you do *not* have to patch up
every client to be able to use this functionality. If you have to add
separate options everywhere, then you might as well just have a separate
libpq function to initiate the session.

Well, that's fair enough. I don't care much what the syntax is for
invoking the postmaster this way, as long as it's reasonably
convenient. I just want there to be one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#22)

Peter Eisentraut <peter_e@gmx.net> writes:

The argument elsewhere in this thread was that the reason for putting
this in the connection options was so that you do *not* have to patch up
every client to be able to use this functionality. If you have to add
separate options everywhere, then you might as well just have a separate
libpq function to initiate the session.

Right, Andres was saying that we had to do both (special switches that
lead to calling a special connection function). I'm not terribly happy
about that, because it will greatly constrain the set of programs that are
able to connect to standalone backends --- but I think that there are some
in this discussion who want that, anyway. In practice, as long as psql
and pg_dump and pg_upgrade can do it, I think we've covered most of the
interesting bases.

To my mind, the "create a socket and hope nobody else can get to it"
approach is exactly one of the main things we're trying to avoid here.
If you'll recall, awhile back we had a big discussion about how pg_upgrade
could positively guarantee that nobody messed with the source database
while it was working, and we still don't have a bulletproof guarantee
there. I would like to fix that by making pg_upgrade use only standalone
backends to talk to the source database, never starting a real postmaster
at all. But if the standalone-pg_dump mode goes through a socket, we're
back to square one on that concern.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Tom Lane (#24)

On Wed, Nov 20, 2013 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

To my mind, the "create a socket and hope nobody else can get to it"
approach is exactly one of the main things we're trying to avoid here.
If you'll recall, awhile back we had a big discussion about how pg_upgrade
could positively guarantee that nobody messed with the source database
while it was working, and we still don't have a bulletproof guarantee
there. I would like to fix that by making pg_upgrade use only standalone
backends to talk to the source database, never starting a real postmaster
at all. But if the standalone-pg_dump mode goes through a socket, we're
back to square one on that concern.

(I couldn't find the pg_upgrade-related thread mentioned above).

I am not sure of the mechanics of this, but can we not launch the
postmaster with a random magic-cookie, and use that cookie while initiating
the connection from libpq. The postmaster will then reject any connections
that don't provide the cookie.

We do something similar to enable applications to send cancellation signals
(postmaster.c:Backend.cancel_key), just that it's establishing trust in the
opposite direction.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

EnterprsieDB Inc. www.enterprisedb.com

#26Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#24)

On 2013-11-20 15:44:03 -0500, Tom Lane wrote:

In practice, as long as psql and pg_dump and pg_upgrade can do it, I
think we've covered most of the interesting bases.

I'd say vacuumdb/reindexdb should be added to that list. In my
experience xid wraparound and corrupted system indexes are the most
frequent use-case of single user mode.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Bruce Momjian
bruce@momjian.us
In reply to: Gurjeet Singh (#25)

On Wed, Nov 20, 2013 at 05:38:14PM -0500, Gurjeet Singh wrote:

On Wed, Nov 20, 2013 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

To my mind, the "create a socket and hope nobody else can get to it"
approach is exactly one of the main things we're trying to avoid here.
If you'll recall, awhile back we had a big discussion about how pg_upgrade
could positively guarantee that nobody messed with the source database
while it was working, and we still don't have a bulletproof guarantee
there. I would like to fix that by making pg_upgrade use only standalone
backends to talk to the source database, never starting a real postmaster
at all. But if the standalone-pg_dump mode goes through a socket, we're
back to square one on that concern.

(I couldn't find the pg_upgrade-related thread mentioned above).

I am not sure of the mechanics of this, but can we not launch the postmaster
with a random magic-cookie, and use that cookie while initiating the connection
from libpq. The postmaster will then reject any connections that don't provide
the cookie.

We do something similar to enable applications to send cancellation signals
(postmaster.c:Backend.cancel_key), just that it's establishing trust in the
opposite direction.

The magic cookie can be tha application_name. I had pg_upgrade code to
prevent anyone from connecting unless their application_name was
"pg_upgrade", but the idea was rejected.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#24)

On Thu, Nov 21, 2013 at 2:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

The argument elsewhere in this thread was that the reason for putting
this in the connection options was so that you do *not* have to patch up
every client to be able to use this functionality. If you have to add
separate options everywhere, then you might as well just have a separate
libpq function to initiate the session.

Right, Andres was saying that we had to do both (special switches that
lead to calling a special connection function).

Doesn't the new option 'standalone_datadir' (which is already in
patch) a good candidate for special switch?
How does having one more new switch helps better?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#28)

On Thu, Nov 21, 2013 at 9:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Nov 21, 2013 at 2:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

The argument elsewhere in this thread was that the reason for putting
this in the connection options was so that you do *not* have to patch up
every client to be able to use this functionality. If you have to add
separate options everywhere, then you might as well just have a separate
libpq function to initiate the session.

Right, Andres was saying that we had to do both (special switches that
lead to calling a special connection function).

Doesn't the new option 'standalone_datadir' (which is already in
patch) a good candidate for special switch?
How does having one more new switch helps better?

Here what I have in mind is that:
a. In pg_dump or other internal utilities where we want to use this
feature, they should call PQenableStart() or some other API before
calling PQConnect() which will indicate that it wants to operate
as a standalone mode.
b. In psql, if user specifies this special switch (
'standalone_datadir'), then internally we will call PQenableStart()
and use postgres from same
directory.

So standalone_backend option will not be exposed through psql, but
other internal tools can use it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#29)

Amit Kapila <amit.kapila16@gmail.com> writes:

Here what I have in mind is that:
a. In pg_dump or other internal utilities where we want to use this
feature, they should call PQenableStart() or some other API before
calling PQConnect() which will indicate that it wants to operate
as a standalone mode.
b. In psql, if user specifies this special switch (
'standalone_datadir'), then internally we will call PQenableStart()
and use postgres from same
directory.

Why would you make psql behave differently from our other command-line
clients? That seems bizarre. If we're going to use a special switch
to enable standalone mode, it should be the same on every program
that supports it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#30)

On Thu, Nov 21, 2013 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

Here what I have in mind is that:
a. In pg_dump or other internal utilities where we want to use this
feature, they should call PQenableStart() or some other API before
calling PQConnect() which will indicate that it wants to operate
as a standalone mode.
b. In psql, if user specifies this special switch (
'standalone_datadir'), then internally we will call PQenableStart()
and use postgres from same
directory.

Why would you make psql behave differently from our other command-line
clients?

No, psql should not behave different from other clients. Sorry, I
was under assumption that for other programs we will not take backend
executable
path.
One other thing which is not clear to me is that how by calling
some special/new API we can ensure that the path provided by user is
a valid path, are we going to validate file given in
'standalone_backend' switch in some way?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#31)

On Thu, Nov 21, 2013 at 9:54 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Nov 21, 2013 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

Here what I have in mind is that:

Why would you make psql behave differently from our other command-line
clients?

No, psql should not behave different from other clients. Sorry, I
was under assumption that for other programs we will not take backend
executable
path.
One other thing which is not clear to me is that how by calling
some special/new API we can ensure that the path provided by user is
a valid path, are we going to validate file given in
'standalone_backend' switch in some way?

Here, do we mean that if user specifies special switch, then
psql/pg_dump will call new API (eg. PQstartSingleUser(dsn)) which
will use postgres from same bin directory they are in, rather than
using from standalone_backend.
Can we consider this as a way to proceed for this patch?

On a side note, today while reading what other software's does to
protect them from such a security threat, I came across this link
(http://osxbook.com/book/bonus/chapter7/binaryprotection/index.html)
which suggests to have encrypted binaries. The use for encrypted
binaries is somewhat similar to what we discussed as a security threat
in this mail thread. Some text from link which made me think that
this is relevant.
For example, "one could turn the requirement around and say that a
given system must not run any binaries unless they are from a
certain source (or set of sources). This could be used to create an
admission-control mechanism for executables, which in turn could
be used in defending against malware. In a draconian managed
environment, it might be desired to limit program execution on managed
systems to a predefined set of programs—nothing else will execute."

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33Peter Eisentraut
peter_e@gmx.net
In reply to: Amit Kapila (#1)

On Thu, 2013-11-14 at 12:11 +0530, Amit Kapila wrote:

If an application wants to allow these connection parameters to be
used, it would need to do PQenableStartServer() first. If it doesn't,
those connection parameters will be rejected.

Stupid idea: Would it work that we require an environment variable to be
set before we allow the standalone_backend connection parameter? That's
easy to do, easy to audit, and doesn't require any extra code in the
individual clients.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#33)

On Thu, Dec 5, 2013 at 7:25 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

On Thu, 2013-11-14 at 12:11 +0530, Amit Kapila wrote:

If an application wants to allow these connection parameters to be
used, it would need to do PQenableStartServer() first. If it doesn't,
those connection parameters will be rejected.

Stupid idea: Would it work that we require an environment variable to be
set before we allow the standalone_backend connection parameter? That's
easy to do, easy to audit, and doesn't require any extra code in the
individual clients.

This is certainly not a stupid idea, rather something on similar lines
has been discussed previously in this thread.
Tom has suggested something similar, but I am not sure if there was a
conclusion on that point. Please see the
relavant discussion at below link:
/messages/by-id/17384.1346645480@sss.pgh.pa.us

I think the basic question at that time was why should we consider an
environment variable more safe.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35Peter Eisentraut
peter_e@gmx.net
In reply to: Amit Kapila (#34)

On Thu, 2013-12-05 at 09:02 +0530, Amit Kapila wrote:

This is certainly not a stupid idea, rather something on similar lines
has been discussed previously in this thread.
Tom has suggested something similar, but I am not sure if there was a
conclusion on that point. Please see the
relavant discussion at below link:
/messages/by-id/17384.1346645480@sss.pgh.pa.us

Yeah, I think the environment variable idea wasn't actually refuted
there.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36Simon Riggs
simon@2ndQuadrant.com
In reply to: Peter Eisentraut (#33)

On 5 December 2013 01:55, Peter Eisentraut <peter_e@gmx.net> wrote:

On Thu, 2013-11-14 at 12:11 +0530, Amit Kapila wrote:

If an application wants to allow these connection parameters to be
used, it would need to do PQenableStartServer() first. If it doesn't,
those connection parameters will be rejected.

Stupid idea: Would it work that we require an environment variable to be
set before we allow the standalone_backend connection parameter? That's
easy to do, easy to audit, and doesn't require any extra code in the
individual clients.

I like the idea... should it be in pg_hba.conf ?
Or should it be next to listen_addresses in postgresql.conf?

hba might be less convenient but seems like the correct place

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37Andres Freund
andres@2ndquadrant.com
In reply to: Peter Eisentraut (#33)

On 2013-12-04 20:55:08 -0500, Peter Eisentraut wrote:

On Thu, 2013-11-14 at 12:11 +0530, Amit Kapila wrote:

If an application wants to allow these connection parameters to be
used, it would need to do PQenableStartServer() first. If it doesn't,
those connection parameters will be rejected.

Stupid idea: Would it work that we require an environment variable to be
set before we allow the standalone_backend connection parameter? That's
easy to do, easy to audit, and doesn't require any extra code in the
individual clients.

I still don't think it's ok to start forking in arbitrary applications
without their knowledge. So I don't think that buys us enough.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38Peter Eisentraut
peter_e@gmx.net
In reply to: Simon Riggs (#36)

On 12/5/13, 6:07 AM, Simon Riggs wrote:

On 5 December 2013 01:55, Peter Eisentraut <peter_e@gmx.net> wrote:

On Thu, 2013-11-14 at 12:11 +0530, Amit Kapila wrote:

If an application wants to allow these connection parameters to be
used, it would need to do PQenableStartServer() first. If it doesn't,
those connection parameters will be rejected.

Stupid idea: Would it work that we require an environment variable to be
set before we allow the standalone_backend connection parameter? That's
easy to do, easy to audit, and doesn't require any extra code in the
individual clients.

I like the idea... should it be in pg_hba.conf ?
Or should it be next to listen_addresses in postgresql.conf?

No, it's an environment variable.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#39Peter Eisentraut
peter_e@gmx.net
In reply to: Amit Kapila (#1)

I think this proposal is a bit deadlocked now.

- There are technical concerns about launching a server executable from
within a client.

- There are conceptual concerns about promoting an embedded database mode.

On the other hand:

- Everyone would like to have a way to use psql (and other basic
clients) in stand-alone mode.

The compromise would be to not launch the server from within the client,
but have client and server communicate over external mechanisms (e.g.,
Unix-domain socket).

The concern about that was that it would open up standalone mode to
accidental third-party connections. While there are some ways around
that (socket in private directory), they are not easy and not portable.
So standalone mode would became less robust and reliable overall.

The only solutions I see are:

1. do nothing

2. do everything (i.e., existing terminal mode plus socket mode plus
embedded mode), letting the user work out the differences

Pick one. ;-)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40Andres Freund
andres@2ndquadrant.com
In reply to: Peter Eisentraut (#39)

On 2013-12-05 11:39:29 -0500, Peter Eisentraut wrote:

I think this proposal is a bit deadlocked now.

- There are technical concerns about launching a server executable from
within a client.

- There are conceptual concerns about promoting an embedded database mode.

On the other hand:

- Everyone would like to have a way to use psql (and other basic
clients) in stand-alone mode.
The only solutions I see are:

1. do nothing

2. do everything (i.e., existing terminal mode plus socket mode plus
embedded mode), letting the user work out the differences

Pick one. ;-)

3) make it an explicit parameter, outside the database DSN, and let the
clients contain a tiny bit of explict code about it. There really
aren't that many clients that can use such a mode sensibly.

If we ever want to support a real embedded mode, much, much more than
this is needed. I don't think we should let that stop us from improving
single user mode.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#40)

On Thu, Dec 5, 2013 at 11:52 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-12-05 11:39:29 -0500, Peter Eisentraut wrote:

I think this proposal is a bit deadlocked now.

- There are technical concerns about launching a server executable from
within a client.

- There are conceptual concerns about promoting an embedded database mode.

On the other hand:

- Everyone would like to have a way to use psql (and other basic
clients) in stand-alone mode.
The only solutions I see are:

1. do nothing

2. do everything (i.e., existing terminal mode plus socket mode plus
embedded mode), letting the user work out the differences

Pick one. ;-)

3) make it an explicit parameter, outside the database DSN, and let the
clients contain a tiny bit of explict code about it. There really
aren't that many clients that can use such a mode sensibly.

If we ever want to support a real embedded mode, much, much more than
this is needed. I don't think we should let that stop us from improving
single user mode.

Yeah, seriously. I don't understand what the big deal is here. The
right design here is 99.44% clear here, and the committer (presumably
Tom) can handle the other 0.56% however he'd like. Let's do this and
move on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#41)

Robert Haas <robertmhaas@gmail.com> writes:

Yeah, seriously. I don't understand what the big deal is here. The
right design here is 99.44% clear here, and the committer (presumably
Tom) can handle the other 0.56% however he'd like. Let's do this and
move on.

Yeah, but the remaining 0.56% is an important decision, not least because
it's got security implications. I think we need some consensus not just
a unilateral committer decision.

I'm pretty much persuaded by Andres' point that we should not allow a
child process to be launched under a client app without clear permission
from the code of the app (and *not* just some environment variable that
might have been set far away, perhaps by someone who doesn't know what the
app assumes about SIGCHLD etc). So a separate connection call seems like
not a bad idea. In the case of psql and pg_dump it'd be reasonable to
invent a separate command line switch that drives use of this call instead
of normal PQconnect. Doing that, and *not* allowing the text of the
connection string to determine it, seems like it pretty well solves any
security objections. It might be unpleasant to use in some cases, though.

Another issue is that we have too many variants of PQconnect already;
which of them are we prepared to clone for this hypothetical new
connection method?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#42)

On Thu, Dec 5, 2013 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm pretty much persuaded by Andres' point that we should not allow a
child process to be launched under a client app without clear permission
from the code of the app (and *not* just some environment variable that
might have been set far away, perhaps by someone who doesn't know what the
app assumes about SIGCHLD etc). So a separate connection call seems like
not a bad idea. In the case of psql and pg_dump it'd be reasonable to
invent a separate command line switch that drives use of this call instead
of normal PQconnect. Doing that, and *not* allowing the text of the
connection string to determine it, seems like it pretty well solves any
security objections.

Yep.

It might be unpleasant to use in some cases, though.

Why would there be more than a few cases in the first place? Who is
going to use this beyond psql, pg_dump(all), and pg_upgrade, and why?

Another issue is that we have too many variants of PQconnect already;
which of them are we prepared to clone for this hypothetical new
connection method?

PQconnectdbParams, I assume. Isn't that the one to rule them all,
modulo async connect which I can't think is relevant here?

Or don't clone that one but instead have
PQnextConnectionShouldForkThisBinary('...') and let the psql/pg_dump
switch be --standalone=full-path-to-the-postgres-binary.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Robert Haas (#43)

On 12/05/2013 10:37 PM, Robert Haas wrote:

On Thu, Dec 5, 2013 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be unpleasant to use in some cases, though.

Why would there be more than a few cases in the first place? Who is
going to use this beyond psql, pg_dump(all), and pg_upgrade, and why?

Well, you might want to use pgAdmin, or your other favorite admin tool.
I'm not sure how well it would work, and I think it's OK if we say
"sorry, can't do that", but it's not a crazy thing to want.

Another issue is that we have too many variants of PQconnect already;
which of them are we prepared to clone for this hypothetical new
connection method?

PQconnectdbParams, I assume. Isn't that the one to rule them all,
modulo async connect which I can't think is relevant here?

Right. Not all of the parameters will make sense for a stand-alone
backend though, like the hostname and port number. And I think you need
need a new parameter to pass the path to the 'postgres' executable,
unless we re-use the host parameter for that.

Or don't clone that one but instead have
PQnextConnectionShouldForkThisBinary('...') and let the psql/pg_dump
switch be --standalone=full-path-to-the-postgres-binary.

I think a separate function makes more sense.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#45Andres Freund
andres@2ndquadrant.com
In reply to: Heikki Linnakangas (#44)

On 2013-12-05 23:01:28 +0200, Heikki Linnakangas wrote:

On 12/05/2013 10:37 PM, Robert Haas wrote:

On Thu, Dec 5, 2013 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be unpleasant to use in some cases, though.

Why would there be more than a few cases in the first place? Who is
going to use this beyond psql, pg_dump(all), and pg_upgrade, and why?

Well, you might want to use pgAdmin, or your other favorite admin tool. I'm
not sure how well it would work, and I think it's OK if we say "sorry, can't
do that", but it's not a crazy thing to want.

Pgadmin wouldn't work, it uses multiple connections for anything but the
most trivial tasks. You can't even send a manual sql query using only
one connection.
I think that's true for most of the non-trivial tools.

Another issue is that we have too many variants of PQconnect
already; which of them are we prepared to clone for this
hypothetical new connection method?

PQconnectdbParams, I assume. Isn't that the one to rule them all,
modulo async connect which I can't think is relevant here?

Right. Not all of the parameters will make sense for a stand-alone backend
though, like the hostname and port number. And I think you need need a new
parameter to pass the path to the 'postgres' executable, unless we re-use
the host parameter for that.

Hm. I'd guessed that we wouldn't use the connection string to pass down
the executable name and the datadir now that we're inventing a separate
function. But maybe that's unneccessary.

What parameters do we require to be set for that mode:
* path to postgres
* data directory
* database name (single mode after all)
* port, because of the shmem key? I'd say that's not important enough

I think we also need to be able to pass some additional parameters to
postgres:
- config_file, hba_file, ... might be required to start pg in some environments
- -P, -O , are sometimes required in cases single user mode is
neccessary for data recovery.

So I think we should just allow passing through arguments to postgres.

Not sure if we need anything but the pid of the postmaster be returned?

Or don't clone that one but instead have
PQnextConnectionShouldForkThisBinary('...') and let the psql/pg_dump
switch be --standalone=full-path-to-the-postgres-binary.

Yuck, that's ugly.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#45)

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-12-05 23:01:28 +0200, Heikki Linnakangas wrote:

Right. Not all of the parameters will make sense for a stand-alone backend
though, like the hostname and port number. And I think you need need a new
parameter to pass the path to the 'postgres' executable, unless we re-use
the host parameter for that.

Hm. I'd guessed that we wouldn't use the connection string to pass down
the executable name and the datadir now that we're inventing a separate
function. But maybe that's unneccessary.

What parameters do we require to be set for that mode:
* path to postgres
* data directory
* database name (single mode after all)
* port, because of the shmem key? I'd say that's not important enough

I think we also need to be able to pass some additional parameters to
postgres:
- config_file, hba_file, ... might be required to start pg in some environments
- -P, -O , are sometimes required in cases single user mode is
neccessary for data recovery.

Right, so by the time we're done, we'd still need a connection string or
the moral equivalent.

My feeling is that we should just treat the executable name and data
directory path as new connection parameters, which'd be ignored in
normal-connection mode, just as some other parameters will be ignored in
single-user mode. Otherwise we'll find ourselves building parameter
setting infrastructure that pretty much duplicates what's there for the
existing connection parameters.

I think the special-purpose command line switches you mention can be
passed through PGOPTIONS, rather than inventing a new parameter -- do you
have an objection to that?

Not sure if we need anything but the pid of the postmaster be returned?

The new PQconnect routine would certainly hand back a PGconn. I think
we'd need a new query function PQchildPid(PGconn *) or some such to
provide access to the child process PID.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#47Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#46)

On 2013-12-06 11:02:48 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:
My feeling is that we should just treat the executable name and data
directory path as new connection parameters, which'd be ignored in
normal-connection mode, just as some other parameters will be ignored in
single-user mode. Otherwise we'll find ourselves building parameter
setting infrastructure that pretty much duplicates what's there for the
existing connection parameters.

Right.

I think the special-purpose command line switches you mention can be
passed through PGOPTIONS, rather than inventing a new parameter -- do you
have an objection to that?

I am not sure if they currently will get recognized early enough and
whether permission checking will interferes, but if so, that's probably
fixable.

Not sure if we need anything but the pid of the postmaster be returned?

The new PQconnect routine would certainly hand back a PGconn. I think
we'd need a new query function PQchildPid(PGconn *) or some such to
provide access to the child process PID.

I was thinking of a pid_t* argument to the new routine, but it's likely
unneccessary as we're probably going to end up storing it in PGconn
anyway.

There's the question what we're going to end up doing with the current
single user mode? There's some somewhat ugly code around for it...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#47)

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-12-06 11:02:48 -0500, Tom Lane wrote:

I think the special-purpose command line switches you mention can be
passed through PGOPTIONS, rather than inventing a new parameter -- do you
have an objection to that?

I am not sure if they currently will get recognized early enough and
whether permission checking will interferes, but if so, that's probably
fixable.

Shouldn't be a problem --- the single-user mode will just concatenate
the options parameter onto the command line it builds.

There's the question what we're going to end up doing with the current
single user mode? There's some somewhat ugly code around for it...

Nothing, in the short term. In a release or two we can get rid of it,
probably, but I'd hesitate to provide no overlap at all of these
usage modes.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#45)

On Fri, Dec 6, 2013 at 04:04:36PM +0100, Andres Freund wrote:

On 2013-12-05 23:01:28 +0200, Heikki Linnakangas wrote:

On 12/05/2013 10:37 PM, Robert Haas wrote:

On Thu, Dec 5, 2013 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be unpleasant to use in some cases, though.

Why would there be more than a few cases in the first place? Who is
going to use this beyond psql, pg_dump(all), and pg_upgrade, and why?

Well, you might want to use pgAdmin, or your other favorite admin tool. I'm
not sure how well it would work, and I think it's OK if we say "sorry, can't
do that", but it's not a crazy thing to want.

Pgadmin wouldn't work, it uses multiple connections for anything but the
most trivial tasks. You can't even send a manual sql query using only
one connection.
I think that's true for most of the non-trivial tools.

FYI, pg_upgrade in parallel mode needs multiple database connections
too.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers