TODO: Allow parallel cores to be used by vacuumdb [WIP]
This patch implements the following TODO item:
Allow parallel cores to be used by vacuumdb
/messages/by-id/4F10A728.7090403@agliodbs.com
Like parallel pg_dump, vacuumdb is provided with an option to run the vacuum of multiple tables in parallel. [ vacuumdb -j ]
1. One new option is provided with vacuumdb to give the number of workers.
2. All workers are started at the beginning and wait for vacuum instructions from the master.
3. If a table list is provided in the vacuumdb command using -t, the master sends the vacuum of one table to one idle worker, the next table to the next idle worker, and so on (see the example below).
4. If vacuum is requested for one whole DB, the master executes a SELECT on pg_class to get the table list, fetches the table names one by one, and assigns the vacuum of each to an idle worker.
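For example, a run over an explicit table list with four workers would look like this (table names are placeholders):

	vacuumdb -j 4 -d postgres -t tbl1 -t tbl2 -t tbl3 -t tbl4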
Performance data with parallel vacuumdb:
Machine configuration:
Cores: 8
RAM: 24GB
Test Scenario:
16 tables, each with 4M records. [Many records are deleted and inserted using a pattern; the file is attached to the mail.]
Test Result:

{Base Code}
Time(s)   %CPU Usage   Avg Read(kB/s)   Avg Write(kB/s)
521       3%           12000            20000

{With Parallel Vacuum Patch}
Workers   Time(s)   %CPU Usage   Avg Read(kB/s)   Avg Write(kB/s)
1         518       3%           12000            20000    (takes the same path as base code)
2         390       5%           14000            30000
8         235       7%           18000            40000
16        197       8%           20000            50000
Conclusion:
Running vacuumdb in parallel increases CPU and I/O throughput and can give more than a 50% performance improvement (521 s with the base code down to 197 s with 16 workers, a roughly 62% reduction).
Work still to be done:
1. Documentation of the new option.
2. Parallel support for vacuuming all databases.
Is it required to move the common code for parallel operation of pg_dump and vacuumdb to one place and reuse it?
A prototype patch is attached to the mail; please provide your feedback/suggestions.
Thanks & Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_v1.patch (application/octet-stream)
diff --git a/src/bin/scripts/vac_parallel.c b/src/bin/scripts/vac_parallel.c
new file mode 100644
index 0000000..da56f86
--- /dev/null
+++ b/src/bin/scripts/vac_parallel.c
@@ -0,0 +1,1007 @@
+/*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+#include "vac_parallel.h"
+
+#ifndef WIN32
+#include <sys/types.h>
+#include <sys/wait.h>
+#include "signal.h"
+#include <unistd.h>
+#include <fcntl.h>
+#endif
+
+#include "common.h"
+
+#define PIPE_READ 0
+#define PIPE_WRITE 1
+
+/* file-scope variables */
+#ifdef WIN32
+static unsigned int tMasterThreadId = 0;
+static HANDLE termEvent = INVALID_HANDLE_VALUE;
+static int pgpipe(int handles[2]);
+static int piperead(int s, char *buf, int len);
+bool parallel_init_done = false;
+DWORD mainThreadId;
+
+/*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+typedef struct
+{
+ VacuumOption *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+} WorkerInfo;
+
+#define pipewrite(a,b,c) send(a,b,c,0)
+#else
+/*
+ * aborting is only ever used in the master, the workers are fine with just
+ * wantAbort.
+ */
+static bool aborting = false;
+static volatile sig_atomic_t wantAbort = 0;
+
+#define pgpipe(a) pipe(a)
+#define piperead(a,b,c) read(a,b,c)
+#define pipewrite(a,b,c) write(a,b,c)
+#endif
+
+static const char *modulename = gettext_noop("parallel vacuum");
+
+typedef struct ShutdownInformation
+{
+ ParallelState *pstate;
+ PGconn *conn;
+} ShutdownInformation;
+
+static ShutdownInformation shutdown_info;
+
+static char *readMessageFromPipe(int fd);
+static void SetupWorker(PGconn *connection, int pipefd[2], int worker);
+static void WaitForCommands(PGconn *connection, int pipefd[2]);
+static char *getMessageFromMaster(int pipefd[2]);
+#define messageStartsWith(msg, prefix) \
+	(strncmp(msg, prefix, strlen(prefix)) == 0)
+#define messageEquals(msg, pattern) \
+	(strcmp(msg, pattern) == 0)
+static char *getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker);
+static void sendMessageToMaster(int pipefd[2], const char *str);
+
+#ifndef WIN32
+static void sigTermHandler(int signum);
+#endif
+
+static ParallelSlot *GetMyPSlot(ParallelState *pstate);
+static int	select_loop(int maxFd, fd_set *workerset);
+void		exit_horribly(const char *modulename, const char *fmt,...);
+void		init_parallel_vacuum_utils(void);
+
+static void exit_nicely(int code);
+
+static bool HasEveryWorkerTerminated(ParallelState *pstate);
+
+static ParallelSlot *
+GetMyPSlot(ParallelState *pstate)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+#ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+#else
+ if (pstate->parallelSlot[i].pid == getpid())
+#endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+}
+
+/* Sends the error message from the worker to the master process */
+static void
+parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+{
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+}
+
+/*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+static void
+WaitForTerminatingWorkers(ParallelState *pstate)
+{
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+#ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+#else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
+ int nrun = 0;
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+#endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+}
+
+
+#ifdef WIN32
+static unsigned __stdcall
+init_spawned_worker_win32(WorkerInfo *wi)
+{
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacuumOption *vopt = wi->vopt;
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport, vopt->username, 1,
+ vopt->progname, false);
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker);
+ _endthreadex(0);
+ return 0;
+}
+#endif
+
+
+
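+/*
+ * Set up the parallel state and spawn numWorkers workers (processes on
+ * Unix, threads on Windows), each with its own database connection.
+ * With numWorkers == 1 no workers are spawned at all; otherwise all
+ * workers start out idle, waiting for commands from the master.
+ */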
+ParallelState *
+ParallelVacuumStart(VacuumOption *vopt, int numWorkers)
+{
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+	/*
+	 * Set the pstate in the shutdown_info. The exit handler uses pstate if
+	 * set and falls back to the plain connection otherwise.
+	 */
+	shutdown_info.pstate = pstate;
+
+#ifdef WIN32
+	tMasterThreadId = GetCurrentThreadId();
+#endif
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+#ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+#else
+ pid_t pid;
+#endif
+ int pipeMW[2],
+ pipeWM[2];
+
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ exit_horribly(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+#ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+#else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ pstate->parallelSlot[i].args->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, 1,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(pstate->parallelSlot[i].args->connection, pipefd, i);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ exit_horribly(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+#endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+}
+
+/*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+void
+ParallelVacuumEnd(ParallelState *pstate)
+{
+ int i;
+
+ if (pstate->numWorkers == 1)
+ return;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+	/*
+	 * Remove the pstate again, so the exit handler in the parent will now
+	 * again fall back to closing shutdown_info.conn (if connected).
+	 */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+}
+
+
+/*
+ * Find the first free parallel slot (if any).
+ */
+int
+GetIdleWorker(ParallelState *pstate)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+}
+
+/*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+static bool
+HasEveryWorkerTerminated(ParallelState *pstate)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+}
+
+
+
+/*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+static void
+SetupWorker(PGconn *connection, int pipefd[2], int worker)
+{
+	/*
+	 * Enter the command loop. We are handed the raw connection only so
+	 * that we can close it properly when the worker shuts down, including
+	 * when it is brought down because of an error.
+	 */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+}
+
+
+
+/*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally it will receive a TERMINATE command and
+ * exit.
+ */
+static void
+WaitForCommands(PGconn * connection, int pipefd[2])
+{
+ char *command;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+		/*
+		 * Execute the command and report the result back to the master.
+		 */
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ sendMessageToMaster(pipefd, "ERROR : Execute failed");
+
+ /* command was pg_malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+}
+
+/*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+ *	DispatchJob				WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+void
+ListenToWorkers(ParallelState *pstate, bool do_wait)
+{
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ exit_horribly(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_TERMINATED;
+ exit_horribly(modulename, "%s", msg + strlen("ERROR "));
+ }
+ else
+ exit_horribly(modulename, "invalid message received from worker: %s\n", msg);
+
+ /* both Unix and Win32 return pg_malloc()ed space, so we free it */
+ free(msg);
+}
+
+/*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+int
+ReapWorkerStatus(ParallelState *pstate, int *status)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+
+ return NO_SLOT;
+}
+
+/*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+void
+EnsureIdleWorker(ParallelState *pstate)
+{
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ exit_horribly(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+}
+
+
+/*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+bool
+IsEveryWorkerIdle(ParallelState *pstate)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+}
+
+
+/*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+void
+EnsureWorkersFinished(ParallelState *pstate)
+{
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ exit_horribly(modulename,
+ "error processing a parallel work item\n");
+ }
+}
+
+/*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+static char *
+getMessageFromMaster(int pipefd[2])
+{
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+}
+
+/*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+static void
+sendMessageToMaster(int pipefd[2], const char *str)
+{
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+}
+
+/*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+static char *
+getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+{
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ exit_horribly(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+
+ Assert(false);
+ return NULL;
+}
+
+/*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+static void
+sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+{
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+#ifndef WIN32
+ if (!aborting)
+#endif
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+}
+
+/*
+ * The underlying function to read a message from the communication channel
+ * (fd) with optional blocking (do_wait).
+ */
+static char *
+readMessageFromPipe(int fd)
+{
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+	 * The problem here is that we need to deal with several possibilities: we
+ * could receive only a partial message or several messages at once. The
+ * caller expects us to return exactly one message however.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *) pg_malloc(bufsize);
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+		/* worker has closed the connection or another error happened */
+		if (ret <= 0)
+		{
+			free(msg);
+			return NULL;
+		}
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ /* could be any number */
+ bufsize += 16;
+			msg = (char *) pg_realloc(msg, bufsize);
+ }
+ }
+}
+
+/*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+static int
+select_loop(int maxFd, fd_set *workerset)
+{
+ int i;
+ fd_set saveSet = *workerset;
+
+#ifdef WIN32
+ /* should always be the master */
+ Assert(tMasterThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+#else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+		 * If we Ctrl-C the master process, it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ exit_horribly(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+#endif
+
+ return i;
+}
+
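+/*
+ * Send one command to an idle worker and mark that worker as working.
+ * The caller must make sure that at least one worker is idle.
+ */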
+void
+DispatchJob(ParallelState *pstate, char * command)
+{
+ int worker;
+
+ /* our caller makes sure that at least one worker is idle */
+ Assert(GetIdleWorker(pstate) != NO_SLOT);
+ worker = GetIdleWorker(pstate);
+ Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+}
+
+#ifdef WIN32
+/*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send.
+ */
+static int
+pgpipe(int handles[2])
+{
+ SOCKET s;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ handles[0] = handles[1] = INVALID_SOCKET;
+
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[1] = socket(PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ closesocket(s);
+ return -1;
+ }
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[0] = accept(s, (SOCKADDR *) &serv_addr, &len)) == INVALID_SOCKET)
+ {
+ closesocket(handles[1]);
+ handles[1] = INVALID_SOCKET;
+ closesocket(s);
+ return -1;
+ }
+ closesocket(s);
+ return 0;
+}
+
+static int
+piperead(int s, char *buf, int len)
+{
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+}
+
+#endif
+
+static void
+shutdown_parallel_vacuum_utils(void)
+{
+#ifdef WIN32
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+#endif
+}
+
+void
+init_parallel_vacuum_utils(void)
+{
+#ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ exit_nicely(1);
+ }
+ parallel_init_done = true;
+ }
+#endif
+}
+
+void
+on_exit_close_connection(PGconn *conn)
+{
+ shutdown_info.conn = conn;
+}
+
+/*
+ * Fail and die, with a message to stderr. Parameters as for write_msg.
+ *
+ * This is defined in vac_parallel.c, because in parallel mode, things are more
+ * complicated. If the worker process does exit_horribly(), we forward its
+ * last words to the master process. The master process then does
+ * exit_horribly() with this error message itself and prints it normally.
+ * After printing the message, exit_horribly() on the master will shut down
+ * the remaining worker processes.
+ */
+void
+exit_horribly(const char *modulename, const char *fmt,...)
+{
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+
+ va_start(ap, fmt);
+
+	slot = pstate ? GetMyPSlot(pstate) : NULL;
+	if (!slot)
+	{
+		/* We're the parent, just write the message out */
+		vfprintf(stderr, _(fmt), ap);
+		if (pstate)
+			ParallelVacuumEnd(pstate);
+		PQfinish(shutdown_info.conn);
+	}
+ else
+ {
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ PQfinish(slot->args->connection);
+ }
+
+ va_end(ap);
+ exit_nicely(1);
+}
+
+static void
+exit_nicely(int code)
+{
+ shutdown_parallel_vacuum_utils();
+ exit(code);
+}
+
+
diff --git a/src/bin/scripts/vac_parallel.h b/src/bin/scripts/vac_parallel.h
new file mode 100644
index 0000000..21100b1
--- /dev/null
+++ b/src/bin/scripts/vac_parallel.h
@@ -0,0 +1,102 @@
+/*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef VAC_PARALLEL_H
+#define VAC_PARALLEL_H
+
+#include "postgres_fe.h"
+#include <time.h>
+#include "libpq-fe.h"
+
+
+typedef enum
+{
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+} T_WorkerStatus;
+
+/* Arguments needed for a worker process */
+typedef struct ParallelArgs
+{
+ PGconn *connection;
+} ParallelArgs;
+
+/* State for each parallel activity slot */
+typedef struct ParallelSlot
+{
+	ParallelArgs *args;			/* can pass connection handle here */
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+#ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+#else
+ pid_t pid;
+#endif
+} ParallelSlot;
+
+#define NO_SLOT (-1)
+
+typedef struct ParallelState
+{
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+} ParallelState;
+
+
+
+typedef struct VacuumOption
+{
+	const char *dbname;
+	const char *pgport;
+	const char *pghost;
+	const char *username;
+	const char *progname;
+} VacuumOption;
+
+
+#ifdef WIN32
+extern bool parallel_init_done;
+extern DWORD mainThreadId;
+#endif
+
+extern ParallelState * ParallelVacuumStart(VacuumOption *vopt, int numWorkers);
+extern int GetIdleWorker(ParallelState *pstate);
+extern bool IsEveryWorkerIdle(ParallelState *pstate);
+extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+extern void EnsureIdleWorker(ParallelState *pstate);
+extern void EnsureWorkersFinished(ParallelState *pstate);
+
+extern void DispatchJob(ParallelState *pstate, char * command);
+extern void
+exit_horribly(const char *modulename, const char *fmt,...)
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+extern void init_parallel_vacuum_utils(void);
+extern void on_exit_close_connection(PGconn *conn);
+extern void ParallelVacuumEnd(ParallelState *pstate);
+
+
+#endif /* VAC_PARALLEL_H */
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index e4dde1f..4d43ae9 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -13,6 +13,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "dumputils.h"
+#include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
@@ -29,6 +30,18 @@ static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
static void help(const char *progname);
+void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool freeze,
+ const char *table, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables);
+
+void run_command(ParallelState *pstate, char *command);
+
+void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+
int
main(int argc, char *argv[])
@@ -49,6 +62,7 @@ main(int argc, char *argv[])
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"parallel", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{NULL, 0, NULL, 0}
};
@@ -72,13 +86,14 @@ main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
- while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
@@ -127,6 +142,9 @@ main(int argc, char *argv[])
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
@@ -136,6 +154,7 @@ main(int argc, char *argv[])
}
}
/*
* Non-option argument specifies database name as long as it wasn't
@@ -209,21 +228,49 @@ main(int argc, char *argv[])
{
SimpleStringListCell *cell;
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo);
- }
+ if (parallel < 2)
+ {
+ for (cell = tables.head; cell; cell = cell->next)
+ {
+ vacuum_one_database(dbname, full, verbose, and_analyze,
+ analyze_only,
+ freeze, cell->val,
+ host, port, username, prompt_password,
+ progname, echo);
+ }
+ }
+ else
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only,
+ freeze, NULL,
+ host, port, username, prompt_password,
+ progname, echo, parallel, &tables);
+
+ }
}
else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
+ {
+
+ if (parallel < 2)
+ {
+ vacuum_one_database(dbname, full, verbose, and_analyze,
+ analyze_only,
+ freeze, NULL,
+ host, port, username, prompt_password,
+ progname, echo);
+ }
+ else
+ {
+
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only,
+ freeze, NULL,
+ host, port, username, prompt_password,
+ progname, echo, parallel, NULL);
+ }
+ }
+
}
exit(0);
@@ -351,6 +398,173 @@ vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
}
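+/*
+ * Vacuum the given tables (or, when tables is NULL, every ordinary table
+ * listed in pg_class) by handing one maintenance command at a time to the
+ * pool of parallel workers.
+ */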
+void
+vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool freeze,
+ const char *table, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables)
+{
+ PQExpBufferData sql;
+
+ PGconn *conn;
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ VacuumOption vopt;
+ ParallelState *pstate;
+
+ init_parallel_vacuum_utils();
+
+ initPQExpBuffer(&sql);
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ vopt.dbname = dbname;
+ vopt.pghost = host;
+ vopt.pgport = port;
+ vopt.username = username;
+ vopt.progname = progname;
+
+ on_exit_close_connection(conn);
+
+ pstate = ParallelVacuumStart(&vopt, parallel);
+
+ if (tables == NULL)
+ {
+ res = executeQuery(conn,
+					   "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+					   " WHERE c.relkind = 'r' AND c.relnamespace = ns.oid",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s.%s", nspace, relName);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+ }
+ else
+ {
+ SimpleStringListCell *cell;
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+
+ PQfinish(conn);
+ termPQExpBuffer(&sql);
+}
+
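+/*
+ * Send one command to an idle worker, then wait until at least one worker
+ * is idle again so that the next call can dispatch immediately.
+ */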
+void
+run_command(ParallelState *pstate, char *command)
+{
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+	/* Listen to the workers and collect their results, if any */
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before
+ * re-running the loop. If nTerm > 0 we already have that (quick
+ * check).
+ */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+}
+
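+/*
+ * Assemble the VACUUM/ANALYZE command to be sent to a worker; servers of
+ * version 9.0 and later get the parenthesized option syntax.
+ */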
+void
+prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+				bool analyze_only, bool freeze, PQExpBuffer sql)
+{
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+}
+
+
static void
help(const char *progname)
{
On 07-11-2013 09:42, Dilip kumar wrote:
Dilip, this is on my TODO for 9.4. I already have a half-baked patch
for it. Let's see what I can come up with.
Is it required to move the common code for parallel operation of pg_dump and vacuumdb to one place and reuse it?
I'm not sure about that, because the pg_dump parallel code is tied to
TOC entries. Also, dependency matters for pg_dump, while in the scripts
case a chosen order will be used. However, vacuumdb can share
the parallel code with clusterdb and reindexdb (my patch does it).
Of course, a refactor to unify parallel code (pg_dump and scripts) can
be done in a separate patch.
Prototype patch is attached in the mail, please provide your feedback/Suggestions...
I'll try to merge your patch with the one I have here by the next CF.
--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training
On 08 November 2013 03:22, Euler Taveira wrote:
On 07-11-2013 09:42, Dilip kumar wrote:
Dilip, this is on my TODO for 9.4. I already have a half-baked patch
for it. Let's see what I can come up with.
Ok, Let me know if I can contribute to this..
Is it required to move the common code for parallel operation of
pg_dump and vacuumdb to one place and reuse it?
I'm not sure about that, because the pg_dump parallel code is tied to
TOC entries. Also, dependency matters for pg_dump, while in the scripts
case a chosen order will be used. However, vacuumdb can share
the parallel code with clusterdb and reindexdb (my patch does it).
+1
Of course, a refactor to unify parallel code (pg_dump and scripts) can
be done in a separate patch.
A prototype patch is attached to the mail; please provide your
feedback/suggestions.
I'll try to merge your patch with the one I have here by the next CF.
Regards,
Dilip
On 07.11.2013 12:42, Dilip kumar wrote:
This patch implements the following TODO item:
Allow parallel cores to be used by vacuumdb
/messages/by-id/4F10A728.7090403@agliodbs.com
Like parallel pg_dump, vacuumdb is provided with the option to run
the vacuum of multiple tables in parallel. [ vacuumdb -j ]
1. One new option is provided with vacuumdb to give the number of workers.
2. All workers will be started in the beginning and will be waiting for
vacuum instructions from the master.
3. Now, if a table list is provided in the vacuumdb command using -t,
it will send the vacuum of one table to one idle worker, the next table
to the next idle worker, and so on.
4. If vacuum is given for one DB, it will execute a select on pg_class
to get the table list, fetch the table names one by one, and assign the
vacuum responsibility to idle workers.
[...]
For this use case, would it make sense to queue work (tables) in order
of their size, starting with the largest one?
For the case where you have tables of varying size, this would lead to a
reduced overall processing time, as it prevents large (read: long
processing time) tables from being processed in the last step. Processing
large tables first and filling up processing slots/jobs with smaller
tables as they become free would save overall execution time.
Regards
Jan
--
professional: http://www.oscar-consult.de
On 08 November 2013 13:38, Jan Lentfer wrote:
For this use case, would it make sense to queue work (tables) in order of their size, starting with the largest one?
For the case where you have tables of varying size, this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step. Processing large tables first and filling up processing slots/jobs with smaller tables as they become free would save overall execution time.
Good point, I have made the change and attached the modified patch.
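For reference, a minimal sketch of what the size-ordered catalog query could look like, reusing the executeQuery() call from the v1 patch (relpages as the size proxy is an assumption here; the actual v2 change may differ in detail):

	/*
	 * Sketch only: fetch the largest tables first so that long-running
	 * vacuums are dispatched early instead of ending up in the last step.
	 */
	res = executeQuery(conn,
					   "SELECT c.relname, ns.nspname"
					   " FROM pg_class c, pg_namespace ns"
					   " WHERE c.relkind = 'r' AND c.relnamespace = ns.oid"
					   " ORDER BY c.relpages DESC",
					   progname, echo);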
Regards,
Dilip
Attachments:
vacuumdb_parallel_v2.patch (application/octet-stream)
diff --git a/src/bin/scripts/vac_parallel.c b/src/bin/scripts/vac_parallel.c
new file mode 100644
index 0000000..da56f86
--- /dev/null
+++ b/src/bin/scripts/vac_parallel.c
@@ -0,0 +1,1007 @@
+/*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+#include "vac_parallel.h"
+
+#ifndef WIN32
+#include <sys/types.h>
+#include <sys/wait.h>
+#include "signal.h"
+#include <unistd.h>
+#include <fcntl.h>
+#endif
+
+#include "common.h"
+
+#define PIPE_READ 0
+#define PIPE_WRITE 1
+
+/* file-scope variables */
+#ifdef WIN32
+static unsigned int tMasterThreadId = 0;
+static HANDLE termEvent = INVALID_HANDLE_VALUE;
+static int pgpipe(int handles[2]);
+static int piperead(int s, char *buf, int len);
+bool parallel_init_done = false;
+DWORD mainThreadId;
+
+/*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+typedef struct
+{
+ VacuumOption *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+} WorkerInfo;
+
+#define pipewrite(a,b,c) send(a,b,c,0)
+#else
+/*
+ * aborting is only ever used in the master, the workers are fine with just
+ * wantAbort.
+ */
+static bool aborting = false;
+static volatile sig_atomic_t wantAbort = 0;
+
+#define pgpipe(a) pipe(a)
+#define piperead(a,b,c) read(a,b,c)
+#define pipewrite(a,b,c) write(a,b,c)
+#endif
+
+static const char *modulename = gettext_noop("parallel vacuum");
+
+typedef struct ShutdownInformation
+{
+ ParallelState *pstate;
+ PGconn *conn;
+} ShutdownInformation;
+
+static ShutdownInformation shutdown_info;
+
+static char *readMessageFromPipe(int fd);
+static void SetupWorker(PGconn *connection, int pipefd[2], int worker);
+static void WaitForCommands(PGconn *connection, int pipefd[2]);
+static char *getMessageFromMaster(int pipefd[2]);
+#define messageStartsWith(msg, prefix) \
+	(strncmp(msg, prefix, strlen(prefix)) == 0)
+#define messageEquals(msg, pattern) \
+	(strcmp(msg, pattern) == 0)
+static char *getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker);
+static void sendMessageToMaster(int pipefd[2], const char *str);
+
+#ifndef WIN32
+static void sigTermHandler(int signum);
+#endif
+
+static ParallelSlot *GetMyPSlot(ParallelState *pstate);
+static int	select_loop(int maxFd, fd_set *workerset);
+void		exit_horribly(const char *modulename, const char *fmt,...);
+void		init_parallel_vacuum_utils(void);
+
+static void exit_nicely(int code);
+
+static bool HasEveryWorkerTerminated(ParallelState *pstate);
+
+static ParallelSlot *
+GetMyPSlot(ParallelState *pstate)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+#ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+#else
+ if (pstate->parallelSlot[i].pid == getpid())
+#endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+}
+
+/* Sends the error message from the worker to the master process */
+static void
+parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+{
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+}
+
+/*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+static void
+WaitForTerminatingWorkers(ParallelState *pstate)
+{
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+#ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+#else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
+ int nrun = 0;
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+#endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+}
+
+
+#ifdef WIN32
+static unsigned __stdcall
+init_spawned_worker_win32(WorkerInfo *wi)
+{
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacuumOption *vopt = wi->vopt;
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport, vopt->username, 1,
+ vopt->progname, false);
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker);
+ _endthreadex(0);
+ return 0;
+}
+#endif
+
+
+
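+/*
+ * Set up the parallel state and spawn numWorkers workers (processes on
+ * Unix, threads on Windows), each with its own database connection.
+ * With numWorkers == 1 no workers are spawned at all; otherwise all
+ * workers start out idle, waiting for commands from the master.
+ */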
+ParallelState *
+ParallelVacuumStart(VacuumOption *vopt, int numWorkers)
+{
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+	/*
+	 * Set the pstate in the shutdown_info. The exit handler uses pstate if
+	 * set and falls back to the plain connection otherwise.
+	 */
+	shutdown_info.pstate = pstate;
+
+#ifdef WIN32
+	tMasterThreadId = GetCurrentThreadId();
+#endif
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+#ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+#else
+ pid_t pid;
+#endif
+ int pipeMW[2],
+ pipeWM[2];
+
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ exit_horribly(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+#ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+#else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ pstate->parallelSlot[i].args->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, 1,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(pstate->parallelSlot[i].args->connection, pipefd, i);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ exit_horribly(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+#endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+}
+
+/*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+void
+ParallelVacuumEnd(ParallelState *pstate)
+{
+ int i;
+
+ if (pstate->numWorkers == 1)
+ return;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+	/*
+	 * Remove the pstate again, so the exit handler in the parent will now
+	 * again fall back to closing shutdown_info.conn (if connected).
+	 */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+}
+
+
+/*
+ * Find the first free parallel slot (if any).
+ */
+int
+GetIdleWorker(ParallelState *pstate)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+}
+
+/*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+static bool
+HasEveryWorkerTerminated(ParallelState *pstate)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+}
+
+
+
+/*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+static void
+SetupWorker(PGconn *connection, int pipefd[2], int worker)
+{
+	/*
+	 * Enter the command loop. We are handed the raw connection only so
+	 * that we can close it properly when the worker shuts down, including
+	 * when it is brought down because of an error.
+	 */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+}
+
+
+
+/*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally it will receive a TERMINATE command and
+ * exit.
+ */
+static void
+WaitForCommands(PGconn * connection, int pipefd[2])
+{
+ char *command;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+		/*
+		 * Execute the command and report the result back to the master.
+		 */
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ sendMessageToMaster(pipefd, "ERROR : Execute failed");
+
+ /* command was pg_malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+}
+
+/*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+ *	DispatchJob				WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+void
+ListenToWorkers(ParallelState *pstate, bool do_wait)
+{
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ exit_horribly(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_TERMINATED;
+ exit_horribly(modulename, "%s", msg + strlen("ERROR "));
+ }
+ else
+ exit_horribly(modulename, "invalid message received from worker: %s\n", msg);
+
+ /* both Unix and Win32 return pg_malloc()ed space, so we free it */
+ free(msg);
+}
+
+/*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+int
+ReapWorkerStatus(ParallelState *pstate, int *status)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+
+ return NO_SLOT;
+}
+
+/*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+void
+EnsureIdleWorker(ParallelState *pstate)
+{
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ exit_horribly(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+}
+
+
+/*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+bool
+IsEveryWorkerIdle(ParallelState *pstate)
+{
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+}
+
+
+/*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+void
+EnsureWorkersFinished(ParallelState *pstate)
+{
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ exit_horribly(modulename,
+ "error processing a parallel work item\n");
+ }
+}
+
+/*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+static char *
+getMessageFromMaster(int pipefd[2])
+{
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+}
+
+/*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+static void
+sendMessageToMaster(int pipefd[2], const char *str)
+{
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+}
+
+/*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+static char *
+getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+{
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ exit_horribly(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+
+ Assert(false);
+ return NULL;
+}
+
+/*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+static void
+sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+{
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+#ifndef WIN32
+ if (!aborting)
+#endif
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+}
+
+/*
+ * The underlying function to read a message from the communication channel
+ * (fd) with optional blocking (do_wait).
+ */
+static char *
+readMessageFromPipe(int fd)
+{
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+	 * The problem here is that we need to deal with several possibilities: we
+	 * could receive only a partial message or several messages at once.
+	 * However, the caller expects us to return exactly one message.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *) pg_malloc(bufsize);
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+		/* worker has closed the connection or another error happened */
+		if (ret <= 0)
+		{
+			free(msg);
+			return NULL;
+		}
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ /* could be any number */
+ bufsize += 16;
+			msg = (char *) pg_realloc(msg, bufsize);
+ }
+ }
+}
+
+/*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+static int
+select_loop(int maxFd, fd_set *workerset)
+{
+ int i;
+ fd_set saveSet = *workerset;
+
+#ifdef WIN32
+ /* should always be the master */
+ Assert(tMasterThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+#else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+		 * If we Ctrl-C the master process, it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ exit_horribly(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+#endif
+
+ return i;
+}
+
+void
+DispatchJob(ParallelState *pstate, char *command)
+{
+ int worker;
+
+	/* our caller makes sure that at least one worker is idle */
+	worker = GetIdleWorker(pstate);
+	Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+}
+
+#ifdef WIN32
+/*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send.
+ */
+static int
+pgpipe(int handles[2])
+{
+ SOCKET s;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ handles[0] = handles[1] = INVALID_SOCKET;
+
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[1] = socket(PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ closesocket(s);
+ return -1;
+ }
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[0] = accept(s, (SOCKADDR *) &serv_addr, &len)) == INVALID_SOCKET)
+ {
+ closesocket(handles[1]);
+ handles[1] = INVALID_SOCKET;
+ closesocket(s);
+ return -1;
+ }
+ closesocket(s);
+ return 0;
+}
+
+static int
+piperead(int s, char *buf, int len)
+{
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+}
+
+#endif
+
+static void
+shutdown_parallel_vacuum_utils(void)
+{
+#ifdef WIN32
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+#endif
+}
+
+void
+init_parallel_vacuum_utils(void)
+{
+#ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ exit_nicely(1);
+ }
+ parallel_init_done = true;
+ }
+#endif
+}
+
+void
+on_exit_close_connection(PGconn *conn)
+{
+ shutdown_info.conn = conn;
+}
+
+/*
+ * Fail and die, with a message to stderr. Parameters as for write_msg.
+ *
+ * This is defined in vac_parallel.c, because in parallel mode, things are more
+ * complicated. If the worker process does exit_horribly(), we forward its
+ * last words to the master process. The master process then does
+ * exit_horribly() with this error message itself and prints it normally.
+ * After printing the message, exit_horribly() on the master will shut down
+ * the remaining worker processes.
+ */
+void
+exit_horribly(const char *modulename, const char *fmt,...)
+{
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+
+ va_start(ap, fmt);
+
+	slot = pstate ? GetMyPSlot(pstate) : NULL;
+	if (!slot)
+	{
+		/* We're the parent, just write the message out */
+		vfprintf(stderr, _(fmt), ap);
+		if (pstate)
+			ParallelVacuumEnd(pstate);
+		PQfinish(shutdown_info.conn);
+	}
+ else
+ {
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ PQfinish(slot->args->connection);
+ }
+
+ va_end(ap);
+ exit_nicely(1);
+}
+
+static void
+exit_nicely(int code)
+{
+ shutdown_parallel_vacuum_utils();
+ exit(code);
+}
+
+
diff --git a/src/bin/scripts/vac_parallel.h b/src/bin/scripts/vac_parallel.h
new file mode 100644
index 0000000..21100b1
--- /dev/null
+++ b/src/bin/scripts/vac_parallel.h
@@ -0,0 +1,102 @@
+/*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef VAC_PARALLEL_H
+#define VAC_PARALLEL_H
+
+#include "postgres_fe.h"
+#include <time.h>
+#include "libpq-fe.h"
+
+
+typedef enum
+{
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+} T_WorkerStatus;
+
+/* Arguments needed for a worker process */
+typedef struct ParallelArgs
+{
+ PGconn *connection;
+} ParallelArgs;
+
+/* State for each parallel activity slot */
+typedef struct ParallelSlot
+{
+	ParallelArgs *args;			/* holds the worker's connection handle */
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+#ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+#else
+ pid_t pid;
+#endif
+} ParallelSlot;
+
+#define NO_SLOT (-1)
+
+typedef struct ParallelState
+{
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+} ParallelState;
+
+
+
+typedef struct VacuumOption
+{
+	const char *dbname;
+	const char *pgport;
+	const char *pghost;
+	const char *username;
+	const char *progname;
+} VacuumOption;
+
+
+#ifdef WIN32
+extern bool parallel_init_done;
+extern DWORD mainThreadId;
+#endif
+
+extern ParallelState *ParallelVacuumStart(VacuumOption *vopt, int numWorkers);
+extern int GetIdleWorker(ParallelState *pstate);
+extern bool IsEveryWorkerIdle(ParallelState *pstate);
+extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+extern void EnsureIdleWorker(ParallelState *pstate);
+extern void EnsureWorkersFinished(ParallelState *pstate);
+
+extern void DispatchJob(ParallelState *pstate, char *command);
+extern void
+exit_horribly(const char *modulename, const char *fmt,...)
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+extern void init_parallel_vacuum_utils(void);
+extern void on_exit_close_connection(PGconn *conn);
+extern void ParallelVacuumEnd(ParallelState *pstate);
+
+
+#endif /* VAC_PARALLEL_H */
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index e4dde1f..766ed0e 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -13,6 +13,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "dumputils.h"
+#include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
@@ -29,6 +30,18 @@ static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
static void help(const char *progname);
+void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool freeze,
+ const char *table, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables);
+
+void run_command(ParallelState *pstate, char *command);
+
+void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+
int
main(int argc, char *argv[])
@@ -49,6 +62,7 @@ main(int argc, char *argv[])
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"parallel", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{NULL, 0, NULL, 0}
};
@@ -72,13 +86,14 @@ main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
- while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
@@ -127,6 +142,9 @@ main(int argc, char *argv[])
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
@@ -136,6 +154,7 @@ main(int argc, char *argv[])
}
}
/*
* Non-option argument specifies database name as long as it wasn't
@@ -209,21 +228,49 @@ main(int argc, char *argv[])
{
SimpleStringListCell *cell;
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo);
- }
+ if (parallel < 2)
+ {
+ for (cell = tables.head; cell; cell = cell->next)
+ {
+ vacuum_one_database(dbname, full, verbose, and_analyze,
+ analyze_only,
+ freeze, cell->val,
+ host, port, username, prompt_password,
+ progname, echo);
+ }
+ }
+ else
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only,
+ freeze, NULL,
+ host, port, username, prompt_password,
+ progname, echo, parallel, &tables);
+
+ }
}
else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
+ {
+
+ if (parallel < 2)
+ {
+ vacuum_one_database(dbname, full, verbose, and_analyze,
+ analyze_only,
+ freeze, NULL,
+ host, port, username, prompt_password,
+ progname, echo);
+ }
+ else
+ {
+
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only,
+ freeze, NULL,
+ host, port, username, prompt_password,
+ progname, echo, parallel, NULL);
+ }
+ }
+
}
exit(0);
@@ -351,6 +398,174 @@ vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
}
+void
+vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool freeze,
+ const char *table, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables)
+{
+ PQExpBufferData sql;
+
+ PGconn *conn;
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ VacuumOption vopt;
+ ParallelState *pstate;
+
+ init_parallel_vacuum_utils();
+
+ initPQExpBuffer(&sql);
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ vopt.dbname = dbname;
+ vopt.pghost = host;
+ vopt.pgport = port;
+ vopt.username = username;
+ vopt.progname = progname;
+
+ on_exit_close_connection(conn);
+
+ pstate = ParallelVacuumStart(&vopt, parallel);
+
+ if (tables == NULL)
+ {
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s.%s", nspace, relName);
+ run_command(pstate, sql.data);
+ }
+ }
+ else
+ {
+ SimpleStringListCell *cell;
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ }
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+
+ PQfinish(conn);
+ termPQExpBuffer(&sql);
+}
+
+void
+run_command(ParallelState *pstate, char *command)
+{
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+	/* Listen for a worker and get its message */
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+		/*
+		 * We need to make sure that we have an idle worker before
+		 * dispatching the next item. If nTerm > 0 we already have that
+		 * (quick check).
+		 */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+}
+
+void
+prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+{
+	resetPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+}
+
+
static void
help(const char *progname)
{
On 08-11-2013 05:07, Jan Lentfer wrote:
For the case where you have tables of varying size this would lead to
a reduced overall processing time, as it prevents large (read: long
processing time) tables from being processed in the last step.
Processing the large tables first and then filling up the freed
"processing slots/jobs" with the smaller tables one after the other
would save overall execution time.
That is certainly a good strategy (though not the optimal one [1] -- that is
hard to achieve). Also, the strategy must:
(i) consider the relation age before size (for vacuum);
(ii) consider that you can't pick two indexes of the same relation at the
same time (for reindex).
[1]: /messages/by-id/CA+TgmobwxqsagXKtyQ1S8+gMpqxF_MLXv=4350tFZVqAwKEqgQ@mail.gmail.com
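
For illustration, here is a minimal sketch of how the master's catalog query
could encode such an ordering, assuming the query is built with a PQExpBuffer
as in the attached patch (ordering by age(relfrozenxid) first is a suggested
refinement for the vacuum case, not something the patch currently does):

/*
 * Sketch only: prefer the relations with the oldest relfrozenxid, then
 * the biggest ones.  The attached patch orders by relpages alone.
 */
appendPQExpBuffer(&sql,
                  "select c.relname, ns.nspname"
                  "  from pg_class c, pg_namespace ns"
                  " where c.relkind = 'r' and c.relnamespace = ns.oid"
                  " order by age(c.relfrozenxid) desc, c.relpages desc");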
--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On Thu, Nov 7, 2013 at 8:42 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
This patch implementing the following TODO item
Allow parallel cores to be used by vacuumdb
/messages/by-id/4F10A728.7090403@agliodbs.com
Cool. Could you add this patch to the next commit fest for 9.4? It
begins officially in a couple of days. Here is the URL to it:
https://commitfest.postgresql.org/action/commitfest_view?id=20
Regards,
--
Michael
On 08-11-2013 06:20, Dilip kumar wrote:
On 08 November 2013 13:38, Jan Lentfer wrote:
For this use case, would it make sense to queue work (tables) in order of their size, starting on the largest one?
For the case where you have tables of varying size this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step. Processing the large tables first and filling up the freed "processing slots/jobs" with the smaller tables one after the other would save overall execution time.
Good point, I have made the change and attached the modified patch.
Won't you submit it to a CF? Or is it too late for this one?
--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Euler Taveira wrote:
Won't you submit it to a CF? Or is it too late for this one?
Not too late.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 16 January 2014 19:53, Euler Taveira wrote:
Good point, I have made the change and attached the modified patch.
Won't you submit it to a CF? Or is it too late for this one?
Attached is the latest updated patch:
1. Rebased the patch onto the current git HEAD.
2. Updated the documentation.
3. Added parallel execution support for the all-databases option as well.
I will add it to the currently open commitfest.
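
To make the control flow easier to review, here is a condensed sketch of the
call pattern that vacuum_parallel() in the attached patch drives; ntables,
commands and numWorkers are placeholders, and error handling is omitted. A
typical invocation would be vacuumdb -j 8 mydb.

/*
 * Condensed sketch of the master's driving loop; see vacuum_parallel()
 * in the attached patch for the real thing.
 */
VacOpt vopt = {0};                        /* dbname, host, port, ... */
ParallelState *pstate;
int i;

init_parallel_vacuum_utils();
pstate = ParallelVacuumStart(&vopt, numWorkers);   /* spawns the workers */

for (i = 0; i < ntables; i++)
{
    EnsureIdleWorker(pstate);             /* block until a slot frees up */
    DispatchJob(pstate, commands[i]);     /* e.g. "VACUUM public.t1" */
}

EnsureWorkersFinished(pstate);            /* wait for the in-flight jobs */
ParallelVacuumEnd(pstate);                /* shut the workers down */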
Regards,
Dilip
Attachments:
vacuumdb_parallel_v3.patch (application/octet-stream)
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 204,209 **** PostgreSQL documentation
--- 204,219 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">parallel_threads</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+ No of parallel jobs to perform the operation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/Makefile
--- b/src/bin/scripts/Makefile
***************
*** 32,38 **** dropdb: dropdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq subm
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
--- 32,38 ----
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o vac_parallel.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
***************
*** 66,70 **** uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
--- 66,70 ----
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o vac_parallel.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
*** /dev/null
--- b/src/bin/scripts/vac_parallel.c
***************
*** 0 ****
--- 1,1112 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "vac_parallel.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+ #endif
+
+ #include "common.h"
+
+
+ #define PIPE_READ 0
+ #define PIPE_WRITE 1
+
+ /* file-scope variables */
+ #ifdef WIN32
+ static unsigned int tMasterThreadId = 0;
+ static HANDLE termEvent = INVALID_HANDLE_VALUE;
+ static int pgpipe(int handles[2]);
+ static int piperead(int s, char *buf, int len);
+ bool parallel_init_done = false;
+ DWORD mainThreadId;
+
+ /*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+ typedef struct
+ {
+ VacOpt *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+ } WorkerInfo;
+
+ #define pipewrite(a,b,c) send(a,b,c,0)
+ #else
+ /*
+ * aborting is only ever used in the master, the workers are fine with just
+ * wantAbort.
+ */
+ static bool aborting = false;
+ static volatile sig_atomic_t wantAbort = 0;
+
+ #define pgpipe(a) pipe(a)
+ #define piperead(a,b,c) read(a,b,c)
+ #define pipewrite(a,b,c) write(a,b,c)
+ #endif
+
+ static const char *modulename = gettext_noop("parallel vacuum");
+
+ typedef struct ShutdownInformation
+ {
+ ParallelState *pstate;
+ PGconn *conn;
+ } ShutdownInformation;
+
+ static ShutdownInformation shutdown_info;
+
+ static char *readMessageFromPipe(int fd);
+ static void SetupWorker(PGconn *connection, int pipefd[2], int worker);
+ static void WaitForCommands(PGconn *connection, int pipefd[2]);
+ static char *getMessageFromMaster(int pipefd[2]);
+ #define messageStartsWith(msg, prefix) \
+ 	(strncmp(msg, prefix, strlen(prefix)) == 0)
+ #define messageEquals(msg, pattern) \
+ 	(strcmp(msg, pattern) == 0)
+ static char *getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker);
+ static void sendMessageToMaster(int pipefd[2], const char *str);
+
+ #ifndef WIN32
+ static void sigTermHandler(int signum);
+ #endif
+
+ static ParallelSlot *GetMyPSlot(ParallelState *pstate);
+ static int select_loop(int maxFd, fd_set *workerset);
+ static void exit_nicely(int code);
+ static bool HasEveryWorkerTerminated(ParallelState *pstate);
+ static void checkAborting(void);
+
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ #ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+ #else
+ if (pstate->parallelSlot[i].pid == getpid())
+ #endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+ }
+
+ /* Sends the error message from the worker to the master process */
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ {
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+ }
+
+ /*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+ static void
+ WaitForTerminatingWorkers(ParallelState *pstate)
+ {
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+ #ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+ #else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
+ int nrun = 0;
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+ #endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+ }
+
+
+ #ifdef WIN32
+ static unsigned __stdcall
+ init_spawned_worker_win32(WorkerInfo *wi)
+ {
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacOpt *vopt = wi->vopt;
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword, vopt->progname,
+ false);
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker);
+ _endthreadex(0);
+ return 0;
+ }
+ #endif
+
+
+
+ ParallelState *
+ ParallelVacuumStart(VacOpt *vopt, int numWorkers)
+ {
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+ shutdown_info.pstate = pstate;
+
+ #ifdef WIN32
+ tMasterThreadId = GetCurrentThreadId();
+ termEvent = CreateEvent(NULL, true, false, "Terminate");
+ #else
+ signal(SIGTERM, sigTermHandler);
+ signal(SIGINT, sigTermHandler);
+ signal(SIGQUIT, sigTermHandler);
+ #endif
+
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ #ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+ #else
+ pid_t pid;
+ #endif
+ int pipeMW[2],
+ pipeWM[2];
+
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ exit_horribly(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+ #ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+ #else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ pstate->parallelSlot[i].args->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(pstate->parallelSlot[i].args->connection, pipefd, i);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ exit_horribly(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+ #endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+ }
+
+ /*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+ void
+ ParallelVacuumEnd(ParallelState *pstate)
+ {
+ int i;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+ /*
+ * Remove the pstate again, so the exit handler in the parent will now
+ * again fall back to closing AH->connection (if connected).
+ */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+ }
+
+
+ /*
+ * Find the first free parallel slot (if any).
+ */
+ int
+ GetIdleWorker(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+ }
+
+ /*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+ }
+
+ /*
+ * If we have one worker that terminates for some reason, we'd like the other
+  * threads to terminate as well (and not finish with their 70 GB table vacuum
+ * first...). Now in UNIX we can just kill these processes, and let the signal
+ * handler set wantAbort to 1. In Windows we set a termEvent and this serves
+ * as the signal for everyone to terminate.
+ */
+ static void
+ checkAborting()
+ {
+ #ifdef WIN32
+ if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
+ #else
+ if (wantAbort)
+ #endif
+ exit_horribly(modulename, "worker is terminating\n");
+ }
+
+ /*
+ * Shut down any remaining workers, this has an implicit do_wait == true.
+ *
+ * The fastest way we can make the workers terminate gracefully is when
+ * they are listening for new commands and we just tell them to terminate.
+ */
+ static void
+ ShutdownWorkersHard(ParallelState *pstate)
+ {
+ #ifndef WIN32
+ int i;
+
+ signal(SIGPIPE, SIG_IGN);
+
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ kill(pstate->parallelSlot[i].pid, SIGTERM);
+ #else
+ /* The workers monitor this event via checkAborting(). */
+ SetEvent(termEvent);
+ #endif
+
+ WaitForTerminatingWorkers(pstate);
+ }
+
+ #ifndef WIN32
+ /* Signal handling (UNIX only) */
+ static void
+ sigTermHandler(int signum)
+ {
+ wantAbort = 1;
+ }
+ #endif
+
+
+
+ /*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker)
+ {
+ /*
+ 	 * Enter the worker's command loop.
+ 	 *
+ 	 * We are handed the raw connection only so that we can close it
+ 	 * properly when we shut down; that path is taken only when the worker
+ 	 * is brought down because of an error.
+ */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+ }
+
+
+
+ /*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally it will receive a TERMINATE command and
+ * exit.
+ */
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2])
+ {
+ char *command;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+
+ /* check if master has set the terminate event*/
+ checkAborting();
+
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ sendMessageToMaster(pipefd, "ERROR : Execute failed");
+
+ /* command was pg_malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+ }
+
+ /*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+  * DispatchJob				WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+ void
+ ListenToWorkers(ParallelState *pstate, bool do_wait)
+ {
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ exit_horribly(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_TERMINATED;
+ exit_horribly(modulename, "%s", msg + strlen("ERROR "));
+ }
+ else
+ exit_horribly(modulename, "invalid message received from worker: %s\n", msg);
+
+ /* both Unix and Win32 return pg_malloc()ed space, so we free it */
+ free(msg);
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+ int
+ ReapWorkerStatus(ParallelState *pstate, int *status)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+
+ return NO_SLOT;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+ void
+ EnsureIdleWorker(ParallelState *pstate)
+ {
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ exit_horribly(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+ }
+
+
+ /*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+ bool
+ IsEveryWorkerIdle(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+ }
+
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+ void
+ EnsureWorkersFinished(ParallelState *pstate)
+ {
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ exit_horribly(modulename,
+ "error processing a parallel work item\n");
+ }
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+ static char *
+ getMessageFromMaster(int pipefd[2])
+ {
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+ {
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ exit_horribly(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+
+ Assert(false);
+ return NULL;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+ static void
+ sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+ #ifndef WIN32
+ if (!aborting)
+ #endif
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+ }
+
+ /*
+ * The underlying function to read a message from the communication channel
+ * (fd) with optional blocking (do_wait).
+ */
+ static char *
+ readMessageFromPipe(int fd)
+ {
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+ 	 * The problem here is that we need to deal with several possibilities: we
+ 	 * could receive only a partial message or several messages at once.
+ 	 * However, the caller expects us to return exactly one message.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *) pg_malloc(bufsize);
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+ 		/* worker has closed the connection or another error happened */
+ 		if (ret <= 0)
+ 		{
+ 			free(msg);
+ 			return NULL;
+ 		}
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ /* could be any number */
+ bufsize += 16;
+ 			msg = (char *) pg_realloc(msg, bufsize);
+ }
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ Assert(tMasterThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+ 		 * If we Ctrl-C the master process, it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ exit_horribly(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ void
+ DispatchJob(ParallelState *pstate, char *command)
+ {
+ int worker;
+
+ 	/* our caller makes sure that at least one worker is idle */
+ 	worker = GetIdleWorker(pstate);
+ 	Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+ }
+
+ #ifdef WIN32
+ /*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send.
+ */
+ static int
+ pgpipe(int handles[2])
+ {
+ SOCKET s;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ handles[0] = handles[1] = INVALID_SOCKET;
+
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[1] = socket(PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ closesocket(s);
+ return -1;
+ }
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[0] = accept(s, (SOCKADDR *) &serv_addr, &len)) == INVALID_SOCKET)
+ {
+ closesocket(handles[1]);
+ handles[1] = INVALID_SOCKET;
+ closesocket(s);
+ return -1;
+ }
+ closesocket(s);
+ return 0;
+ }
+
+ static int
+ piperead(int s, char *buf, int len)
+ {
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+ }
+
+ #endif
+
+ static void
+ shutdown_parallel_vacuum_utils(void)
+ {
+ #ifdef WIN32
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+ #endif
+ }
+
+ void
+ init_parallel_vacuum_utils(void)
+ {
+ #ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ exit_nicely(1);
+ }
+
+ parallel_init_done = true;
+ }
+ #endif
+ }
+
+ void
+ on_exit_close_connection(PGconn *conn)
+ {
+ shutdown_info.conn = conn;
+ }
+
+ /*
+ * Fail and die, with a message to stderr. Parameters as for write_msg.
+ *
+  * This is defined in vac_parallel.c, because in parallel mode, things are more
+ * complicated. If the worker process does exit_horribly(), we forward its
+ * last words to the master process. The master process then does
+ * exit_horribly() with this error message itself and prints it normally.
+ * After printing the message, exit_horribly() on the master will shut down
+ * the remaining worker processes.
+ */
+ void
+ exit_horribly(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+
+ va_start(ap, fmt);
+
+ if (pstate == NULL)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+ }
+ else
+ {
+ slot = GetMyPSlot(pstate);
+ if (!slot)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+
+ }
+ else
+ {
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ }
+ }
+
+ va_end(ap);
+
+ exit_nicely(1);
+ }
+
+ static void
+ exit_nicely(int code)
+ {
+ if (shutdown_info.pstate)
+ {
+ ParallelSlot *slot = GetMyPSlot(shutdown_info.pstate);
+
+ if (!slot)
+ {
+ /*
+ * We're the master: We have already printed out the message
+ * passed to exit_horribly() either from the master itself or from
+ * a worker process. Now we need to close our own database
+ * connection (only open during parallel dump but not restore) and
+ * shut down the remaining workers.
+ */
+ PQfinish(shutdown_info.conn);
+ #ifndef WIN32
+
+ /*
+ * Setting aborting to true switches to best-effort-mode
+ * (send/receive but ignore errors) in communicating with our
+ * workers.
+ */
+ aborting = true;
+ #endif
+ ShutdownWorkersHard(shutdown_info.pstate);
+ }
+ else if (slot->args->connection)
+ PQfinish(slot->args->connection);
+
+ }
+ else if (shutdown_info.conn)
+ PQfinish(shutdown_info.conn);
+
+ shutdown_parallel_vacuum_utils();
+ exit(code);
+ }
+
+
*** /dev/null
--- b/src/bin/scripts/vac_parallel.h
***************
*** 0 ****
--- 1,103 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #ifndef VAC_PARALLEL_H
+ #define VAC_PARALLEL_H
+
+ #include "postgres_fe.h"
+ #include <time.h>
+ #include "libpq-fe.h"
+ #include "common.h"
+
+ typedef enum
+ {
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+ } T_WorkerStatus;
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelArgs
+ {
+ PGconn *connection;
+ } ParallelArgs;
+
+ /* State for each parallel activity slot */
+ typedef struct ParallelSlot
+ {
+ 	ParallelArgs *args;			/* holds the worker's connection handle */
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+ #ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+ #else
+ pid_t pid;
+ #endif
+ } ParallelSlot;
+
+ #define NO_SLOT (-1)
+
+ typedef struct ParallelState
+ {
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+ } ParallelState;
+
+
+
+ typedef struct VacOpt
+ {
+ 	const char *dbname;
+ 	const char *pgport;
+ 	const char *pghost;
+ 	const char *username;
+ 	const char *progname;
+ 	enum trivalue promptPassword;
+ } VacOpt;
+
+
+ #ifdef WIN32
+ extern bool parallel_init_done;
+ extern DWORD mainThreadId;
+ #endif
+
+ extern ParallelState *ParallelVacuumStart(VacOpt *vopt, int numWorkers);
+ extern int GetIdleWorker(ParallelState *pstate);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+ extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+ extern void EnsureIdleWorker(ParallelState *pstate);
+ extern void EnsureWorkersFinished(ParallelState *pstate);
+
+ extern void DispatchJob(ParallelState *pstate, char *command);
+ extern void
+ exit_horribly(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+ extern void init_parallel_vacuum_utils(void);
+ extern void on_exit_close_connection(PGconn *conn);
+ extern void ParallelVacuumEnd(ParallelState *pstate);
+
+
+ #endif /* VAC_PARALLEL_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 11,18 ****
*/
#include "postgres_fe.h"
- #include "common.h"
#include "dumputils.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
--- 11,18 ----
*/
#include "postgres_fe.h"
#include "dumputils.h"
+ #include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
***************
*** 25,34 **** static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 25,46 ----
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool freeze,
+ const char *table, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables);
+
+ void run_command(ParallelState *pstate, char *command);
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 61,67 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"parallel", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{NULL, 0, NULL, 0}
};
***************
*** 72,84 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 85,98 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 127,132 **** main(int argc, char *argv[])
--- 141,149 ----
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
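[Review aside: atoi() silently turns non-numeric input into 0 and accepts negative values, which the parallel-vs-serial branch later treats as non-parallel. A stricter variant of the 'j' case, sketched here rather than taken from the patch, might be:]

case 'j':
	parallel = atoi(optarg);
	if (parallel < 1)
	{
		fprintf(stderr, _("%s: number of parallel jobs must be at least 1\n"),
				progname);
		exit(1);
	}
	break;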
***************
*** 136,141 **** main(int argc, char *argv[])
--- 153,159 ----
}
}
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 191,197 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 209,215 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, parallel);
}
else
{
***************
*** 205,229 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 223,262 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
! }
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo);
!
}
}
}
exit(0);
***************
*** 321,327 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
bool freeze, const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 354,360 ----
bool freeze, const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 342,356 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! freeze, NULL, host, port, username, prompt_password,
! progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
{
--- 375,581 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, freeze, NULL, host, port,
! username, prompt_password, progname, echo);
! }
}
PQclear(result);
}
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool freeze,
+ const char *table, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables)
+ {
+ PQExpBufferData sql;
+
+ PGconn *conn;
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ VacOpt vopt = {0};
+ ParallelState *pstate;
+
+ init_parallel_vacuum_utils();
+
+ initPQExpBuffer(&sql);
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ if (dbname)
+ vopt.dbname = pg_strdup(dbname);
+
+ if (host)
+ vopt.pghost = pg_strdup(host);
+
+ if (port)
+ vopt.pgport = pg_strdup(port);
+
+ if (username)
+ vopt.username = pg_strdup(username);
+
+ if (progname)
+ vopt.progname = pg_strdup(progname);
+
+ vopt.promptPassword = prompt_password;
+
+ on_exit_close_connection(conn);
+
+ pstate = ParallelVacuumStart(&vopt, parallel);
+
+ if (tables && tables->head)
+ {
+ SimpleStringListCell *cell;
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+ }
+ else
+ {
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s.%s", nspace, relName);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+
+ PQfinish(conn);
+ termPQExpBuffer(&sql);
+ }
+
+ void run_command(ParallelState *pstate, char *command)
+ {
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+ /* Listen for a worker and fetch its message */
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before
+ * re-running the loop. If nTerm > 0 we already have that (quick
+ * check).
+ */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+ }
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
+
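[For illustration only: with a hypothetical table public.pgbench_accounts appended by the callers above, prepare_command() yields command strings such as these.]

/*
 * Example outputs (illustration, not patch code); server >= 9.0 unless noted:
 *   full + and_analyze:      VACUUM (FULL, ANALYZE) public.pgbench_accounts
 *   analyze_only + verbose:  ANALYZE VERBOSE public.pgbench_accounts
 *   freeze, pre-9.0 server:  VACUUM FREEZE public.pgbench_accounts
 */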
static void
help(const char *progname)
{
***************
*** 369,374 **** help(const char *progname)
--- 594,600 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
On Fri, Mar 21, 2014 at 12:48 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 16 January 2014 19:53, Euler Taveira wrote:
For the case where you have tables of varying size this would lead
to a reduced overall processing time, as it prevents large (read: long
processing time) tables from being processed in the last step. Processing
large tables first, and filling up the "processing slots/jobs" with
smaller tables one after the other as they get free, would save overall
execution time.
Good point, I have made the change and attached the modified patch.
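[In the attached patch this largest-first scheduling shows up as the ORDER BY in the table-list query; condensed from the patch code:]

/* tables are fetched biggest-first, so long-running vacuums start early */
res = executeQuery(conn,
				   "select relname, nspname from pg_class c, pg_namespace ns"
				   " where relkind = 'r' and c.relnamespace = ns.oid"
				   " order by relpages desc",
				   progname, echo);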
Didn't you submit it for a CF? Is it too late for this CF?
Attached the latest updated patch
1. Rebased the patch to current GIT head.
2. Doc is updated.
3. Supported parallel execution for the all-databases option as well.
This patch needs to be rebased after the analyze-in-stages patch,
c92c3d50d7fbe7391b5fc864b44434.
Although that patch still needs some work itself, despite being
committed, as it still loops over the stages for each db, rather than
over the dbs for each stage.
So I don't know if this patch is really reviewable at this point, as
it is not clear how those things are going to interact with each
other.
Cheers,
Jeff
On 24 June 2014 03:31, Jeff wrote:
Attached the latest updated patch
1. Rebased the patch to current GIT head.
2. Doc is updated.
3. Supported parallel execution for the all-databases option as well.
This patch needs to be rebased after the analyze-in-stages patch,
c92c3d50d7fbe7391b5fc864b44434.
Thank you for giving your attention to this; I will rebase it.
Although that patch still needs some work itself, despite being
committed, as it still loops over the stages for each db, rather than
over the dbs for each stage.
If I understood your comment properly, you mean to say that in
vacuum_all_databases, instead of running all DBs in parallel, we are
running them db by db, each in parallel?
I think we can fix this.
So I don't know if this patch is really reviewable at this point, as it
is not clear how those things are going to interact with each other.
Exactly which points do you want to mention here?
Regards,
Dilip Kumar
On Monday, June 23, 2014, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 24 June 2014 03:31, Jeff wrote:
Attached the latest updated patch
1. Rebased the patch to current GIT head.
2. Doc is updated.
3. Supported parallel execution for the all-databases option as well.
This patch needs to be rebased after the analyze-in-stages patch,
c92c3d50d7fbe7391b5fc864b44434.
Thank you for giving your attention to this; I will rebase it.
Although that patch still needs some work itself, despite being
committed, as it still loops over the stages for each db, rather than
over the dbs for each stage.
If I understood your comment properly, you mean to say that in
vacuum_all_databases, instead of running all DBs in parallel, we are
running them db by db, each in parallel?
I mean that the other commit, the one conflicting with your patch, is still
not finished. It probably would not have been committed if we realized the
problem at the time. That other patch runs analyze in stages at different
settings of default_statistics_target, but it has the loops in the wrong
order, so it analyzes one database in all three stages, then moves to the
next database. I think that these two changes are going to interact with
each other. But I can't predict right now what that interaction will look
like. So it is hard for me to evaluate your patch, until the other one is
resolved.
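[A sketch of the loop-order point, with hypothetical names; neither patch contains this code:]

int db, stage;

/* committed behavior: one database runs through all three stages */
for (db = 0; db < ndb; db++)
	for (stage = 0; stage < 3; stage++)
		analyze_one_stage(db, stage);

/* intended behavior: finish each stage across all databases first */
for (stage = 0; stage < 3; stage++)
	for (db = 0; db < ndb; db++)
		analyze_one_stage(db, stage);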
Normally I would evaluate your patch in isolation, but since the
conflicting patch is already committed (and is in the 9.4 branch) that
would probably not be very useful in this case.
Cheers,
Jeff
On 24 June 2014 11:02, Jeff wrote:
I mean that the other commit, the one conflicting with your patch, is still not finished. It probably would not have been committed if we realized the problem at the time. That other patch runs analyze in stages at
different settings of default_statistics_target, but it has the loops in the wrong order, so it analyzes one database in all three stages, then moves to the next database. I think that these two changes are going to
interact with each other. But I can't predict right now what that interaction will look like. So it is hard for me to evaluate your patch, until the other one is resolved.
Normally I would evaluate your patch in isolation, but since the conflicting patch is already committed (and is in the 9.4 branch) that would probably not be very useful in this case.
OK, got your point. I will also try to think about how these two patches can interact together.
Regards,
Dilip
Hi,
I got the following FAILED hunks when I applied v3 to HEAD.
$ patch -d. -p1 < ../patch/vacuumdb_parallel_v3.patch
patching file doc/src/sgml/ref/vacuumdb.sgml
Hunk #1 succeeded at 224 (offset 20 lines).
patching file src/bin/scripts/Makefile
Hunk #2 succeeded at 65 with fuzz 2 (offset -1 lines).
patching file src/bin/scripts/vac_parallel.c
patching file src/bin/scripts/vac_parallel.h
patching file src/bin/scripts/vacuumdb.c
Hunk #3 succeeded at 61 with fuzz 2.
Hunk #4 succeeded at 87 (offset 2 lines).
Hunk #5 succeeded at 143 (offset 2 lines).
Hunk #6 succeeded at 158 (offset 5 lines).
Hunk #7 succeeded at 214 with fuzz 2 (offset 5 lines).
Hunk #8 FAILED at 223.
Hunk #9 succeeded at 374 with fuzz 1 (offset 35 lines).
Hunk #10 FAILED at 360.
Hunk #11 FAILED at 387.
3 out of 11 hunks FAILED -- saving rejects to file
src/bin/scripts/vacuumdb.c.rej
---
Sawada Masahiko
On Friday, March 21, 2014, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 16 January 2014 19:53, Euler Taveira wrote:
For the case where you have tables of varying size this would lead
to a reduced overall processing time, as it prevents large (read: long
processing time) tables from being processed in the last step. Processing
large tables first, and filling up the "processing slots/jobs" with
smaller tables one after the other as they get free, would save overall
execution time.
Good point, I have made the change and attached the modified patch.
Didn't you submit it for a CF? Is it too late for this CF?
Attached the latest updated patch
1. Rebased the patch to current GIT head.
2. Doc is updated.
3. Supported parallel execution for the all-databases option as well.
I will add the same to the current open commitfest.
Regards,
Dilip
--
Regards,
-------
Sawada Masahiko
On 25 June 2014 23:37, Sawada Masahiko wrote:
I got the following FAILED hunks when I applied v3 to HEAD.
$ patch -d. -p1 < ../patch/vacuumdb_parallel_v3.patch
patching file doc/src/sgml/ref/vacuumdb.sgml
Hunk #1 succeeded at 224 (offset 20 lines).
patching file src/bin/scripts/Makefile
Hunk #2 succeeded at 65 with fuzz 2 (offset -1 lines).
patching file src/bin/scripts/vac_parallel.c
patching file src/bin/scripts/vac_parallel.h
patching file src/bin/scripts/vacuumdb.c
Hunk #3 succeeded at 61 with fuzz 2.
Hunk #4 succeeded at 87 (offset 2 lines).
Hunk #5 succeeded at 143 (offset 2 lines).
Hunk #6 succeeded at 158 (offset 5 lines).
Hunk #7 succeeded at 214 with fuzz 2 (offset 5 lines).
Hunk #8 FAILED at 223.
Hunk #9 succeeded at 374 with fuzz 1 (offset 35 lines).
Hunk #10 FAILED at 360.
Hunk #11 FAILED at 387.
3 out of 11 hunks FAILED -- saving rejects to file src/bin/scripts/vacuumdb.c.rej
Thank you for giving your time. Please review the updated patch attached to the mail.
1. Rebased the patch
2. Implemented parallel execution for the new option --analyze-in-stages
Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_v4.patch (application/octet-stream)
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,239 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+ Number of parallel processes to use for the operation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/Makefile
--- b/src/bin/scripts/Makefile
***************
*** 32,38 **** dropdb: dropdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq subm
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
--- 32,38 ----
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o vac_parallel.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
***************
*** 65,71 **** uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
--- 65,71 ----
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o vac_parallel.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
*** /dev/null
--- b/src/bin/scripts/vac_parallel.c
***************
*** 0 ****
--- 1,1112 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "vac_parallel.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+ #endif
+
+ #include "common.h"
+
+
+ #define PIPE_READ 0
+ #define PIPE_WRITE 1
+
+ /* file-scope variables */
+ #ifdef WIN32
+ static unsigned int tMasterThreadId = 0;
+ static HANDLE termEvent = INVALID_HANDLE_VALUE;
+ static int pgpipe(int handles[2]);
+ static int piperead(int s, char *buf, int len);
+ bool parallel_init_done = false;
+ DWORD mainThreadId;
+
+ /*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+ typedef struct
+ {
+ VacOpt *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+ } WorkerInfo;
+
+ #define pipewrite(a,b,c) send(a,b,c,0)
+ #else
+ /*
+ * aborting is only ever used in the master, the workers are fine with just
+ * wantAbort.
+ */
+ static bool aborting = false;
+ static volatile sig_atomic_t wantAbort = 0;
+
+ #define pgpipe(a) pipe(a)
+ #define piperead(a,b,c) read(a,b,c)
+ #define pipewrite(a,b,c) write(a,b,c)
+ #endif
+
+ static const char *modulename = gettext_noop("parallel vacuum");
+
+ typedef struct ShutdownInformation
+ {
+ ParallelState *pstate;
+ PGconn *conn;
+ } ShutdownInformation;
+
+ static ShutdownInformation shutdown_info;
+
+ static char *readMessageFromPipe(int fd);
+ static void SetupWorker(PGconn *connection, int pipefd[2], int worker);
+ static void WaitForCommands(PGconn *connection, int pipefd[2]);
+ static char *getMessageFromMaster(int pipefd[2]);
+ #define messageStartsWith(msg, prefix) \
+ (strncmp(msg, prefix, strlen(prefix)) == 0)
+ #define messageEquals(msg, pattern) \
+ (strcmp(msg, pattern) == 0)
+ static char *getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker);
+
+ static void sendMessageToMaster(int pipefd[2], const char *str);
+
+ #ifndef WIN32
+ static void sigTermHandler(int signum);
+ #endif
+
+ static ParallelSlot *GetMyPSlot(ParallelState *pstate);
+ static int select_loop(int maxFd, fd_set *workerset);
+ void exit_horribly(const char *modulename, const char *fmt,...);
+ void init_parallel_vacuum_utils(void);
+
+ static void exit_nicely(int code);
+
+ static bool HasEveryWorkerTerminated(ParallelState *pstate);
+
+ static void checkAborting(void);
+
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ #ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+ #else
+ if (pstate->parallelSlot[i].pid == getpid())
+ #endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+ }
+
+ /* Sends the error message from the worker to the master process */
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ {
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+ }
+
+ /*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+ static void
+ WaitForTerminatingWorkers(ParallelState *pstate)
+ {
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+ #ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+ #else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
+ int nrun = 0;
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+ #endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+ }
+
+
+ #ifdef WIN32
+ static unsigned __stdcall
+ init_spawned_worker_win32(WorkerInfo *wi)
+ {
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacOpt *vopt = wi->vopt;
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword, vopt->progname,
+ false);
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker);
+ _endthreadex(0);
+ return 0;
+ }
+ #endif
+
+
+
+ ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers)
+ {
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+ shutdown_info.pstate = pstate;
+
+ #ifdef WIN32
+ tMasterThreadId = GetCurrentThreadId();
+ termEvent = CreateEvent(NULL, true, false, "Terminate");
+ #else
+ signal(SIGTERM, sigTermHandler);
+ signal(SIGINT, sigTermHandler);
+ signal(SIGQUIT, sigTermHandler);
+ #endif
+
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ #ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+ #else
+ pid_t pid;
+ #endif
+ int pipeMW[2],
+ pipeWM[2];
+
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ exit_horribly(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+ #ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+ #else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ pstate->parallelSlot[i].args->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(pstate->parallelSlot[i].args->connection, pipefd, i);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ exit_horribly(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+ #endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+ }
+
+ /*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+ void
+ ParallelVacuumEnd(ParallelState *pstate)
+ {
+ int i;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+ /*
+ * Clear the pstate again, so the exit handler in the parent will
+ * fall back to closing shutdown_info.conn (if connected).
+ */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+ }
+
+
+ /*
+ * Find the first free parallel slot (if any).
+ */
+ int
+ GetIdleWorker(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+ }
+
+ /*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+ }
+
+ /*
+ * If we have one worker that terminates for some reason, we'd like the
+ * other workers to terminate as well (and not first finish vacuuming
+ * their own large tables). On UNIX we can just kill these processes,
+ * and let the signal handler set wantAbort to 1. On Windows we set a
+ * termEvent and this serves as the signal for everyone to terminate.
+ */
+ static void
+ checkAborting(void)
+ {
+ #ifdef WIN32
+ if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
+ #else
+ if (wantAbort)
+ #endif
+ exit_horribly(modulename, "worker is terminating\n");
+ }
+
+ /*
+ * Shut down any remaining workers; this has an implicit do_wait == true.
+ *
+ * The fastest way we can make the workers terminate gracefully is when
+ * they are listening for new commands and we just tell them to terminate.
+ */
+ static void
+ ShutdownWorkersHard(ParallelState *pstate)
+ {
+ #ifndef WIN32
+ int i;
+
+ signal(SIGPIPE, SIG_IGN);
+
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ kill(pstate->parallelSlot[i].pid, SIGTERM);
+ #else
+ /* The workers monitor this event via checkAborting(). */
+ SetEvent(termEvent);
+ #endif
+
+ WaitForTerminatingWorkers(pstate);
+ }
+
+ #ifndef WIN32
+ /* Signal handling (UNIX only) */
+ static void
+ sigTermHandler(int signum)
+ {
+ wantAbort = 1;
+ }
+ #endif
+
+
+
+ /*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker)
+ {
+ /*
+ * Enter the worker's command loop.
+ *
+ * We get the raw connection only so that we can close it properly
+ * when we shut down; that matters when the worker is brought down
+ * because of an error.
+ */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+ }
+
+
+
+ /*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally, when the master closes the command
+ * pipe, the worker exits.
+ */
+ static void
+ WaitForCommands(PGconn *connection, int pipefd[2])
+ {
+ char *command;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+
+ /* check if the master has set the terminate event */
+ checkAborting();
+
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ sendMessageToMaster(pipefd, "ERROR : Execute failed");
+
+ /* command was pg_malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+ }
+
+ /*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+ * DispatchJob WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+ void
+ ListenToWorkers(ParallelState *pstate, bool do_wait)
+ {
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ exit_horribly(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_TERMINATED;
+ exit_horribly(modulename, "%s", msg + strlen("ERROR "));
+ }
+ else
+ exit_horribly(modulename, "invalid message received from worker: %s\n", msg);
+
+ /* both Unix and Win32 return pg_malloc()ed space, so we free it */
+ free(msg);
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+ int
+ ReapWorkerStatus(ParallelState *pstate, int *status)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+
+ return NO_SLOT;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+ void
+ EnsureIdleWorker(ParallelState *pstate)
+ {
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ exit_horribly(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+ }
+
+
+ /*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+ bool
+ IsEveryWorkerIdle(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+ }
+
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+ void
+ EnsureWorkersFinished(ParallelState *pstate)
+ {
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ exit_horribly(modulename,
+ "error processing a parallel work item\n");
+ }
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+ static char *
+ getMessageFromMaster(int pipefd[2])
+ {
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+ {
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ exit_horribly(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+
+ Assert(false);
+ return NULL;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+ static void
+ sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+ #ifndef WIN32
+ if (!aborting)
+ #endif
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+ }
+
+ /*
+ * The underlying function to read a message from the communication channel
+ * (fd) with optional blocking (do_wait).
+ */
+ static char *
+ readMessageFromPipe(int fd)
+ {
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+ * The problem here is that we need to deal with several possibilities: we
+ * could receive only a partial message or several messages at once. The
+ * caller expects us to return exactly one message however.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *) pg_malloc(bufsize);
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+ /* worker has closed the connection or another error happened */
+ if (ret <= 0)
+ return NULL;
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ /* could be any number */
+ bufsize += 16;
+ msg = (char *) pg_realloc(msg, bufsize);
+ }
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ Assert(tMasterThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+ * If we Ctrl-C the master process, it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ exit_horribly(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ void
+ DispatchJob(ParallelState *pstate, char * command)
+ {
+ int worker;
+
+ /* our caller makes sure that at least one worker is idle */
+ Assert(GetIdleWorker(pstate) != NO_SLOT);
+ worker = GetIdleWorker(pstate);
+ Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+ }
+
+ #ifdef WIN32
+ /*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send.
+ */
+ static int
+ pgpipe(int handles[2])
+ {
+ SOCKET s;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ handles[0] = handles[1] = INVALID_SOCKET;
+
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[1] = socket(PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ closesocket(s);
+ return -1;
+ }
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[0] = accept(s, (SOCKADDR *) &serv_addr, &len)) == INVALID_SOCKET)
+ {
+ closesocket(handles[1]);
+ handles[1] = INVALID_SOCKET;
+ closesocket(s);
+ return -1;
+ }
+ closesocket(s);
+ return 0;
+ }
+
+ static int
+ piperead(int s, char *buf, int len)
+ {
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+ }
+
+ #endif
+
+ static void
+ shutdown_parallel_vacuum_utils(void)
+ {
+ #ifdef WIN32
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+ #endif
+ }
+
+ void
+ init_parallel_vacuum_utils(void)
+ {
+ #ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ exit_nicely(1);
+ }
+
+ parallel_init_done = true;
+ }
+ #endif
+ }
+
+ void
+ on_exit_close_connection(PGconn *conn)
+ {
+ shutdown_info.conn = conn;
+ }
+
+ /*
+ * Fail and die, with a message to stderr.
+ *
+ * In parallel mode things are more complicated: if a worker process does
+ * exit_horribly(), we forward its last words to the master process. The
+ * master process then does exit_horribly() with this error message itself
+ * and prints it normally. After printing the message, exit_horribly() on
+ * the master will shut down the remaining worker processes.
+ */
+ void
+ exit_horribly(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+
+ va_start(ap, fmt);
+
+ if (pstate == NULL)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+ }
+ else
+ {
+ slot = GetMyPSlot(pstate);
+ if (!slot)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+
+ }
+ else
+ {
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ }
+ }
+
+ va_end(ap);
+
+ exit_nicely(1);
+ }
+
+ static void
+ exit_nicely(int code)
+ {
+ if (shutdown_info.pstate)
+ {
+ ParallelSlot *slot = GetMyPSlot(shutdown_info.pstate);
+
+ if (!slot)
+ {
+ /*
+ * We're the master: we have already printed out the message
+ * passed to exit_horribly(), either from the master itself or
+ * from a worker process. Now we need to close our own database
+ * connection (if any) and shut down the remaining workers.
+ */
+ PQfinish(shutdown_info.conn);
+ #ifndef WIN32
+
+ /*
+ * Setting aborting to true switches to best-effort-mode
+ * (send/receive but ignore errors) in communicating with our
+ * workers.
+ */
+ aborting = true;
+ #endif
+ ShutdownWorkersHard(shutdown_info.pstate);
+ }
+ else if (slot->args->connection)
+ PQfinish(slot->args->connection);
+
+ }
+ else if (shutdown_info.conn)
+ PQfinish(shutdown_info.conn);
+
+ shutdown_parallel_vacuum_utils();
+ exit(code);
+ }
+
+
*** /dev/null
--- b/src/bin/scripts/vac_parallel.h
***************
*** 0 ****
--- 1,103 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #ifndef VAC_PARALLEL_H
+ #define VAC_PARALLEL_H
+
+ #include "postgres_fe.h"
+ #include <time.h>
+ #include "libpq-fe.h"
+ #include "common.h"
+
+ typedef enum
+ {
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+ } T_WorkerStatus;
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelArgs
+ {
+ PGconn *connection;
+ } ParallelArgs;
+
+ /* State for each parallel activity slot */
+ typedef struct ParallelSlot
+ {
+ ParallelArgs *args; /* holds this worker's connection handle */
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+ #ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+ #else
+ pid_t pid;
+ #endif
+ } ParallelSlot;
+
+ #define NO_SLOT (-1)
+
+ typedef struct ParallelState
+ {
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+ } ParallelState;
+
+
+
+ typedef struct VacOpt
+ {
+ char *dbname;
+ char *pgport;
+ char *pghost;
+ char *username;
+ char *progname;
+ enum trivalue promptPassword;
+ }VacOpt;
+
+
+ #ifdef WIN32
+ extern bool parallel_init_done;
+ extern DWORD mainThreadId;
+ #endif
+
+ extern ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers);
+ extern int GetIdleWorker(ParallelState *pstate);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+ extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+ extern void EnsureIdleWorker(ParallelState *pstate);
+ extern void EnsureWorkersFinished(ParallelState *pstate);
+
+ extern void DispatchJob(ParallelState *pstate, char * command);
+ extern void exit_horribly(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+ extern void init_parallel_vacuum_utils(void);
+ extern void on_exit_close_connection(PGconn *conn);
+ extern void ParallelVacuumEnd(ParallelState *pstate);
+
+
+ #endif /* VAC_PARALLEL_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 11,34 ****
*/
#include "postgres_fe.h"
- #include "common.h"
#include "dumputils.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *table, const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 11,54 ----
*/
#include "postgres_fe.h"
#include "dumputils.h"
+ #include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, int parallel, SimpleStringList *tables);
+
+ void run_command(ParallelState *pstate, char *command);
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt, const char *progname);
+
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 69,75 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"parallel", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 95,108 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 129,134 **** main(int argc, char *argv[])
--- 151,159 ----
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 166,172 ----
}
}
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 222,228 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, parallel);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 236,274 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
! }
! }
! else
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo);
!
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 293,304 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 393,404 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 419,668 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
+ }
}
PQclear(result);
}
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt, const char *progname)
+ {
+ ParallelState *pstate;
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+
+ initPQExpBuffer(&sql);
+
+ pstate = ParallelVacuumStart(vopt, parallel);
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+ termPQExpBuffer(&sql);
+ }
+
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ VacOpt vopt = {0};
+
+ init_parallel_vacuum_utils();
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ if (dbname)
+ vopt.dbname = pg_strdup(dbname);
+
+ if (host)
+ vopt.pghost = pg_strdup(host);
+
+ if (port)
+ vopt.pgport = pg_strdup(port);
+
+ if (username)
+ vopt.username = pg_strdup(username);
+
+ if (progname)
+ vopt.progname = pg_strdup(progname);
+
+ vopt.promptPassword = prompt_password;
+
+ on_exit_close_connection(conn);
+
+ /* If a table list is not provided then we need to vacuum the whole DB:
+ get the list of all tables and prepare the list */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, " %s.%s", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+
+ tables = &dbtables;
+
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"
+ };
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ puts(gettext(stage_messages[i]));
+ executeCommand(conn, stage_commands[i], progname, echo);
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt,
+ progname);
+ }
+ }
+ else
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt,
+ progname);
+
+ PQfinish(conn);
+ }
+
+ void run_command(ParallelState *pstate, char *command)
+ {
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+ /* Listen for a worker and get its message */
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before
+ * re-running the loop. If nTerm > 0 we already have that (quick
+ * check).
+ */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+ }
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 682,688 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Thu, Jun 26, 2014 at 2:35 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
Thank you for giving your time. Please review the updated patch attached in
the mail.
1. Rebased the patch
2. Implemented parallel execution for the new option --analyze-in-stages
Hi Dilip,
Thanks for rebasing.
I haven't done an architectural or code review on it, I just applied
it and used it a little on Linux.
Based on that, I find most importantly that it doesn't seem to
correctly vacuum tables which have upper case letters in the name,
because it does not quote the table names when they need quotes.
Of course that needs to be fixed, but taking it as it is, the
resulting error message to the console is just:
: Execute failed
Which is not very informative. I get the same error if I do a "pg_ctl
shutdown -mi" while running the parallel vacuumdb. Without the -j
option it produces a more descriptive error message "FATAL:
terminating connection due to administrator command", so something
about the new feature suppresses the informative error messages.
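One way to keep that console message informative is for a failing worker to forward the server's error text to the master before it shuts down -- a minimal sketch reusing the patch's own pipe protocol (executeMaintenanceCommand, sendMessageToMaster and pipefd are the patch's names; this is essentially what the later revision's WaitForCommands() ends up doing):

/* Worker side: on failure, ship PQerrorMessage() to the master so the
 * user sees the server's message rather than a bare "Execute failed". */
if (executeMaintenanceCommand(connection, command, false))
    sendMessageToMaster(pipefd, "OK");
else
{
    PQExpBufferData buf;

    initPQExpBuffer(&buf);
    appendPQExpBuffer(&buf, "ERROR : %s", PQerrorMessage(connection));
    sendMessageToMaster(pipefd, buf.data);
    termPQExpBuffer(&buf);
}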
I get some compiler warnings with the new patch:
vac_parallel.c: In function 'parallel_msg_master':
vac_parallel.c:147: warning: function might be possible candidate for
'gnu_printf' format attribute
vac_parallel.c:147: warning: function might be possible candidate for
'gnu_printf' format attribute
vac_parallel.c: In function 'exit_horribly':
vac_parallel.c:1071: warning: 'noreturn' function does return
In the usage message, the string has a tab embedded within it
(immediately before "use") that should be converted to literal spaces,
otherwise the output of --help gets misaligned:
printf(_(" -j, --jobs=NUM use this many parallel
jobs to vacuum\n"));
Thanks for the work on this.
Cheers,
Jeff
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 27 June 2014 02:57, Jeff wrote:
Based on that, I find most importantly that it doesn't seem to
correctly vacuum tables which have upper case letters in the name,
because it does not quote the table names when they need quotes.
Thanks for your comments...
There are two problems:
First -> When vacuuming the complete database, any table whose name contains an uppercase letter caused an error.
-- FIXED by adding quotes around the table name.
Second -> When the user passes a table with the -t option and its name contains an uppercase letter.
-- This is an existing problem (present even without the parallel implementation).
One solution is to always quote the relation name passed by the user, but that could break existing applications for some users.
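For the names fetched from pg_class, a minimal sketch of the quoting fix using libpq's PQescapeIdentifier() (append_qualified_name is a hypothetical helper; assumes string.h, libpq-fe.h and pqexpbuffer.h):

/* PQescapeIdentifier() returns a malloc'd, double-quoted identifier
 * (embedded quote characters are doubled); release it with PQfreemem(). */
static void
append_qualified_name(PGconn *conn, PQExpBuffer buf,
                      const char *nspname, const char *relname)
{
    char *qnsp = PQescapeIdentifier(conn, nspname, strlen(nspname));
    char *qrel = PQescapeIdentifier(conn, relname, strlen(relname));

    if (!qnsp || !qrel)
        exit(1);            /* out of memory or encoding error */

    /* e.g. public / MyTable  ->  "public"."MyTable" */
    appendPQExpBuffer(buf, "%s.%s", qnsp, qrel);
    PQfreemem(qnsp);
    PQfreemem(qrel);
}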
Of course that needs to be fixed, but taking it as it is, the resulting
error message to the console is just:
FIXED
Which is not very informative. I get the same error if I do a "pg_ctl
shutdown -mi" while running the parallel vacuumdb. Without the -j
option it produces a more descriptive error message "FATAL:
terminating connection due to administrator command", so something
about the new feature suppresses the informative error messages.
I get some compiler warnings with the new patch:
vac_parallel.c: In function 'parallel_msg_master':
vac_parallel.c:147: warning: function might be possible candidate for
'gnu_printf' format attribute
vac_parallel.c:147: warning: function might be possible candidate for
'gnu_printf' format attribute
vac_parallel.c: In function 'exit_horribly':
vac_parallel.c:1071: warning: 'noreturn' function does return
FIXED
In the usage message, the string has a tab embedded within it
(immediately before "use") that should be converted to literal spaces,
otherwise the output of --help gets misaligned:
printf(_(" -j, --jobs=NUM use this many parallel
jobs to vacuum\n"));
FIXED
Updated patch is attached in the mail.
Thanks & Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_v5.patchapplication/octet-stream; name=vacuumdb_parallel_v5.patchDownload
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,239 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+ Number of parallel processes to use for the operation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/Makefile
--- b/src/bin/scripts/Makefile
***************
*** 32,38 **** dropdb: dropdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq subm
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
--- 32,38 ----
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o vac_parallel.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
***************
*** 65,71 **** uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
--- 65,71 ----
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o vac_parallel.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
*** /dev/null
--- b/src/bin/scripts/vac_parallel.c
***************
*** 0 ****
--- 1,1144 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "vac_parallel.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+ #endif
+
+ #include "common.h"
+
+
+ #define PIPE_READ 0
+ #define PIPE_WRITE 1
+
+ /* file-scope variables */
+ #ifdef WIN32
+ static unsigned int tMasterThreadId = 0;
+ static HANDLE termEvent = INVALID_HANDLE_VALUE;
+ static int pgpipe(int handles[2]);
+ static int piperead(int s, char *buf, int len);
+ bool parallel_init_done = false;
+ DWORD mainThreadId;
+
+ /*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+ typedef struct
+ {
+ VacOpt *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+ } WorkerInfo;
+
+ #define pipewrite(a,b,c) send(a,b,c,0)
+ #else
+ /*
+ * aborting is only ever used in the master, the workers are fine with just
+ * wantAbort.
+ */
+ static bool aborting = false;
+ static volatile sig_atomic_t wantAbort = 0;
+
+ #define pgpipe(a) pipe(a)
+ #define piperead(a,b,c) read(a,b,c)
+ #define pipewrite(a,b,c) write(a,b,c)
+ #endif
+
+ static const char *modulename = gettext_noop("parallel vacuum");
+
+ typedef struct ShutdownInformation
+ {
+ ParallelState *pstate;
+ PGconn *conn;
+ } ShutdownInformation;
+
+ static ShutdownInformation shutdown_info;
+
+ static char *readMessageFromPipe(int fd);
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage);
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2]);
+ static char *
+ getMessageFromMaster(int pipefd[2]);
+ #define messageStartsWith(msg, prefix) \
+ (strncmp(msg, prefix, strlen(prefix)) == 0)
+ #define messageEquals(msg, pattern) \
+ (strcmp(msg, pattern) == 0)
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker);
+
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str);
+
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename, const char *fmt,
+ va_list ap)__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
+
+
+ #ifndef WIN32
+ static void sigTermHandler(int signum);
+ #endif
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate);
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+ void horribly_exit(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+
+ void
+ init_parallel_vacuum_utils(void);
+
+ static void exit_nicely(int code);
+
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate);
+
+ static void checkAborting();
+
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ #ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+ #else
+ if (pstate->parallelSlot[i].pid == getpid())
+ #endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+ }
+
+ /* Sends the error message from the worker to the master process */
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ {
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+ }
+
+ /*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+ static void
+ WaitForTerminatingWorkers(ParallelState *pstate)
+ {
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+ #ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+ #else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
+ int nrun = 0;
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+ #endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+ }
+
+
+ #ifdef WIN32
+ static unsigned __stdcall
+ init_spawned_worker_win32(WorkerInfo *wi)
+ {
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacOpt *vopt = wi->vopt;
+ ParallelSlot *mySlot = &shutdown_info.pstate->parallelSlot[worker];
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword, vopt->progname,
+ false);
+
+ mySlot->args->connection = conn;
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker, vopt->analyze_stage);
+ _endthreadex(0);
+ return 0;
+ }
+ #endif
+
+
+
+ ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers)
+ {
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+ shutdown_info.pstate = pstate;
+
+ #ifdef WIN32
+ tMasterThreadId = GetCurrentThreadId();
+ termEvent = CreateEvent(NULL, true, false, "Terminate");
+ #else
+ signal(SIGTERM, sigTermHandler);
+ signal(SIGINT, sigTermHandler);
+ signal(SIGQUIT, sigTermHandler);
+ #endif
+
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ #ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+ #else
+ pid_t pid;
+ #endif
+ int pipeMW[2],
+ pipeWM[2];
+
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ horribly_exit(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+ pstate->parallelSlot[i].args->vopt = vopt;
+ #ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+ #else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ pstate->parallelSlot[i].args->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(pstate->parallelSlot[i].args->connection,
+ pipefd, i, vopt->analyze_stage);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ horribly_exit(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+ #endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+ }
+
+ /*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+ void
+ ParallelVacuumEnd(ParallelState *pstate)
+ {
+ int i;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+ /*
+ * Remove the pstate again, so the exit handler in the parent will now
+ * again fall back to closing AH->connection (if connected).
+ */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+ }
+
+
+ /*
+ * Find the first free parallel slot (if any).
+ */
+ int
+ GetIdleWorker(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+ }
+
+ /*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+ }
+
+ /*
+ * If we have one worker that terminates for some reason, we'd like the other
+ * threads to terminate as well (and not finish with their 70 GB table dump
+ * first...). Now in UNIX we can just kill these processes, and let the signal
+ * handler set wantAbort to 1. In Windows we set a termEvent and this serves
+ * as the signal for everyone to terminate.
+ */
+ static void
+ checkAborting()
+ {
+ #ifdef WIN32
+ if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
+ #else
+ if (wantAbort)
+ #endif
+ horribly_exit(modulename, "worker is terminating\n");
+ }
+
+ /*
+ * Shut down any remaining workers, this has an implicit do_wait == true.
+ *
+ * The fastest way we can make the workers terminate gracefully is when
+ * they are listening for new commands and we just tell them to terminate.
+ */
+ static void
+ ShutdownWorkersHard(ParallelState *pstate)
+ {
+ #ifndef WIN32
+ int i;
+
+ signal(SIGPIPE, SIG_IGN);
+
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ kill(pstate->parallelSlot[i].pid, SIGTERM);
+ #else
+ /* The workers monitor this event via checkAborting(). */
+ SetEvent(termEvent);
+ #endif
+
+ WaitForTerminatingWorkers(pstate);
+ }
+
+ #ifndef WIN32
+ /* Signal handling (UNIX only) */
+ static void
+ sigTermHandler(int signum)
+ {
+ wantAbort = 1;
+ }
+ #endif
+
+
+
+ /*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage)
+ {
+
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ /* vacStage is -1 when --analyze-in-stages was not given */
+ if (vacStage >= 0)
+ executeMaintenanceCommand(connection, stage_commands[vacStage], false);
+
+ /*
+ * Call the setup worker function that's defined in the ArchiveHandle.
+ *
+ * We get the raw connection only for the reason that we can close it
+ * properly when we shut down. This happens only that way when it is
+ * brought down because of an error.
+ */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+ }
+
+
+
+ /*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally it will receive a TERMINATE command and
+ * exit.
+ */
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2])
+ {
+ char *command;
+ PQExpBufferData sql;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+
+ /* check if master has set the terminate event*/
+ checkAborting();
+
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ {
+ initPQExpBuffer(&sql);
+ appendPQExpBuffer(&sql, "ERROR : %s",
+ PQerrorMessage(connection));
+ sendMessageToMaster(pipefd, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ /* command was pg_malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+ }
+
+ /*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+ * DispatchJobForTocEntry WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+ void
+ ListenToWorkers(ParallelState *pstate, bool do_wait)
+ {
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ horribly_exit(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+
+ ParallelSlot *mySlot = &pstate->parallelSlot[worker];
+
+ mySlot->workerStatus = WRKR_TERMINATED;
+
+ horribly_exit(modulename,
+ "%s: vacuuming of database \"%s\" failed %s",
+ mySlot->args->vopt->progname,
+ mySlot->args->vopt->dbname, msg + strlen("ERROR "));
+ }
+ else
+ horribly_exit(modulename, "invalid message received from worker: %s\n", msg);
+
+ /* both Unix and Win32 return pg_malloc()ed space, so we free it */
+ free(msg);
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+ int
+ ReapWorkerStatus(ParallelState *pstate, int *status)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+
+ return NO_SLOT;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+ void
+ EnsureIdleWorker(ParallelState *pstate)
+ {
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ horribly_exit(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+ }
+
+
+ /*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+ bool
+ IsEveryWorkerIdle(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+ }
+
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+ void
+ EnsureWorkersFinished(ParallelState *pstate)
+ {
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ horribly_exit(modulename,
+ "error processing a parallel work item\n");
+ }
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+ static char *
+ getMessageFromMaster(int pipefd[2])
+ {
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ horribly_exit(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+ {
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ horribly_exit(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+
+ Assert(false);
+ return NULL;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+ static void
+ sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+ #ifndef WIN32
+ if (!aborting)
+ #endif
+ horribly_exit(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+ }
+
+ /*
+ * The underlying function to read a message from the communication channel
+ * (fd) with optional blocking (do_wait).
+ */
+ static char *
+ readMessageFromPipe(int fd)
+ {
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+ * The problem here is that we need to deal with several possibilities: we
+ * could receive only a partial message or several messages at once. The
+ * caller expects us to return exactly one message however.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *) pg_malloc(bufsize);
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+ /* worker has closed the connection or another error happened */
+ if (ret <= 0)
+ return NULL;
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ /* could be any number */
+ bufsize += 16;
+ msg = (char *) realloc(msg, bufsize);
+ }
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ Assert(tMasterThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+ * If we Ctrl-C the master process, it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ horribly_exit(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ void
+ DispatchJob(ParallelState *pstate, char * command)
+ {
+ int worker;
+
+ /* our caller makes sure that at least one worker is idle */
+ Assert(GetIdleWorker(pstate) != NO_SLOT);
+ worker = GetIdleWorker(pstate);
+ Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+ }
+
+ #ifdef WIN32
+ /*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send.
+ */
+ static int
+ pgpipe(int handles[2])
+ {
+ SOCKET s;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ handles[0] = handles[1] = INVALID_SOCKET;
+
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[1] = socket(PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ closesocket(s);
+ return -1;
+ }
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[0] = accept(s, (SOCKADDR *) &serv_addr, &len)) == INVALID_SOCKET)
+ {
+ closesocket(handles[1]);
+ handles[1] = INVALID_SOCKET;
+ closesocket(s);
+ return -1;
+ }
+ closesocket(s);
+ return 0;
+ }
+
+ static int
+ piperead(int s, char *buf, int len)
+ {
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+ }
+
+ #endif
+
+ static void
+ shutdown_parallel_vacuum_utils()
+ {
+ #ifdef WIN32
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+ #endif
+ }
+
+ void
+ init_parallel_vacuum_utils(void)
+ {
+ #ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ exit_nicely(1);
+ }
+
+ parallel_init_done = true;
+ }
+ #endif
+ }
+
+ void
+ on_exit_close_connection(PGconn *conn)
+ {
+ shutdown_info.conn = conn;
+ }
+
+ /*
+ * Fail and die, with a message to stderr. Parameters as for write_msg.
+ *
+ * This is defined in vac_parallel.c, because in parallel mode, things are more
+ * complicated. If the worker process does horribly_exit(), we forward its
+ * last words to the master process. The master process then does
+ * horribly_exit() with this error message itself and prints it normally.
+ * After printing the message, horribly_exit() on the master will shut down
+ * the remaining worker processes.
+ */
+ void
+ horribly_exit(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+ va_start(ap, fmt);
+
+ if (pstate == NULL)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+ }
+ else
+ {
+ slot = GetMyPSlot(pstate);
+ if (!slot)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+
+ }
+ else
+ {
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ }
+ }
+
+ va_end(ap);
+
+ exit_nicely(1);
+ }
+
+ static void
+ exit_nicely(int code)
+ {
+ if (shutdown_info.pstate)
+ {
+ ParallelSlot *slot = GetMyPSlot(shutdown_info.pstate);
+
+ if (!slot)
+ {
+ /*
+ * We're the master: We have already printed out the message
+ * passed to horribly_exit() either from the master itself or from
+ * a worker process. Now we need to close our own database
+ * connection (the master keeps one open during a parallel run) and
+ * shut down the remaining workers.
+ */
+ PQfinish(shutdown_info.conn);
+ #ifndef WIN32
+
+ /*
+ * Setting aborting to true switches to best-effort-mode
+ * (send/receive but ignore errors) in communicating with our
+ * workers.
+ */
+ aborting = true;
+ #endif
+ ShutdownWorkersHard(shutdown_info.pstate);
+ }
+ else if (slot->args->connection)
+ PQfinish(slot->args->connection);
+
+ }
+ else if (shutdown_info.conn)
+ PQfinish(shutdown_info.conn);
+
+ shutdown_parallel_vacuum_utils();
+ exit(code);
+ }
+
+
*** /dev/null
--- b/src/bin/scripts/vac_parallel.h
***************
*** 0 ****
--- 1,105 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #ifndef VAC_PARALLEL_H
+ #define VAC_PARALLEL_H
+
+ #include "postgres_fe.h"
+ #include <time.h>
+ #include "libpq-fe.h"
+ #include "common.h"
+
+ typedef enum
+ {
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+ } T_WorkerStatus;
+
+
+ typedef struct VacOpt
+ {
+ char *dbname;
+ char *pgport;
+ char *pghost;
+ char *username;
+ char *progname;
+ enum trivalue promptPassword;
+ int analyze_stage;
+ }VacOpt;
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelArgs
+ {
+ PGconn *connection;
+ VacOpt *vopt;
+ } ParallelArgs;
+
+ /* State for each parallel activity slot */
+ typedef struct ParallelSlot
+ {
+ ParallelArgs *args; /* can pass connection handle here */
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+ #ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+ #else
+ pid_t pid;
+ #endif
+ } ParallelSlot;
+
+ #define NO_SLOT (-1)
+
+ typedef struct ParallelState
+ {
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+ } ParallelState;
+
+
+
+ #ifdef WIN32
+ extern bool parallel_init_done;
+ extern DWORD mainThreadId;
+ #endif
+
+ extern ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers);
+ extern int GetIdleWorker(ParallelState *pstate);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+ extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+ extern void EnsureIdleWorker(ParallelState *pstate);
+ extern void EnsureWorkersFinished(ParallelState *pstate);
+
+ extern void DispatchJob(ParallelState *pstate, char * command);
+ extern void
+ horribly_exit(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+ extern void init_parallel_vacuum_utils(void);
+ extern void on_exit_close_connection(PGconn *conn);
+ extern void ParallelVacuumEnd(ParallelState *pstate);
+
+
+ #endif /* VAC_PARALLEL_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 11,34 ****
*/
#include "postgres_fe.h"
- #include "common.h"
#include "dumputils.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *table, const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 11,54 ----
*/
#include "postgres_fe.h"
#include "dumputils.h"
+ #include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, int parallel, SimpleStringList *tables);
+
+ void run_command(ParallelState *pstate, char *command);
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt, const char *progname);
+
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 69,75 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 95,108 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 129,134 **** main(int argc, char *argv[])
--- 151,159 ----
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 166,172 ----
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 222,228 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, parallel);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 236,274 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
! }
! }
! else
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo);
!
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 293,304 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 393,404 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 419,666 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
+ }
}
PQclear(result);
}
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt, const char *progname)
+ {
+ ParallelState *pstate;
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+
+ initPQExpBuffer(&sql);
+
+ pstate = ParallelVacuumStart(vopt, parallel);
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+ termPQExpBuffer(&sql);
+ }
+
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ VacOpt vopt = {0};
+
+ init_parallel_vacuum_utils();
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ if (dbname)
+ vopt.dbname = pg_strdup(dbname);
+
+ if (host)
+ vopt.pghost = pg_strdup(host);
+
+ if (port)
+ vopt.pgport = pg_strdup(port);
+
+ if (username)
+ vopt.username = pg_strdup(username);
+
+ if (progname)
+ vopt.progname = pg_strdup(progname);
+
+ vopt.promptPassword = prompt_password;
+
+ on_exit_close_connection(conn);
+
+ /* If a table list is not provided then we need to vacuum the whole DB:
+ get the list of all tables and prepare the list */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, " %s.\"%s\"", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+
+ tables = &dbtables;
+
+ }
+
+ if (analyze_in_stages)
+ {
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")};
+
+ puts(gettext(stage_messages[i]));
+ vopt.analyze_stage = i;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt,
+ progname);
+ }
+ }
+ else
+ {
+ vopt.analyze_stage = -1;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt,
+ progname);
+ }
+
+ PQfinish(conn);
+ }
+
+ void run_command(ParallelState *pstate, char *command)
+ {
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+ /* Listen for a worker and get its message */
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before
+ * re-running the loop. If nTerm > 0 we already have that (quick
+ * check).
+ */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+ }
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 680,686 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
...
Updated patch is attached in the mail.
Thanks Dilip.
I get a compiler warning when building on Windows. When I started
looking into that, I saw that two files have too much code duplication
between them:
src/bin/scripts/vac_parallel.c (new file)
src/bin/pg_dump/parallel.c (existing file)
In particular, pgpipe is almost an exact duplicate between them,
except the copy in vac_parallel.c has fallen behind changes made to
parallel.c. (Those changes would have fixed the Windows warnings). I
think that this function (and perhaps other parts as
well--"exit_horribly" for example) need to refactored into a common
file that both files can include. I don't know where the best place
for that would be, though. (I haven't done this type of refactoring
myself.)
Also, there are several places in the patch which use spaces for
indentation where tabs are called for by the coding style. It looks
like you may have copied the code from one terminal window and pasted
it into another one, converting tabs to spaces in the process. This
makes it hard to evaluate the amount of code duplication.
In some places the code spins in a tight loop while waiting for a
worker process to become free. If I strace the process, I get a long
list of selects with zero timeouts:
select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout)
I have not tried to track down the code that causes it. I did notice
that vacuumdb spends an awful lot of time at the top of the Linux
"top" output, and this is probably why.
Cheers,
Jeff
Jeff Janes wrote:
In particular, pgpipe is almost an exact duplicate between them,
except the copy in vac_parallel.c has fallen behind changes made to
parallel.c. (Those changes would have fixed the Windows warnings). I
think that this function (and perhaps other parts as
well--"exit_horribly" for example) need to refactored into a common
file that both files can include. I don't know where the best place
for that would be, though. (I haven't done this type of refactoring
myself.)
I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
Maybe we should move pgpipe back to src/port and have pg_dump and this
new thing use that. I'm not sure about the rest of duplication in
vac_parallel.c; there might be a lot in common with what
pg_dump/parallel.c does too. Having two copies of code is frowned upon
for good reasons. This patch introduces 1200 lines of new code in
vac_parallel.c, ugh.
If we really require 1200 lines to get parallel vacuum working for
vacuumdb, I would question the wisdom of this effort. To me, it seems
better spent improving autovacuum to cover whatever it is that this
patch is supposed to be good for --- or maybe just enable having a shell
script that launches multiple vacuumdb instances in parallel ...
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01 July 2014 03:31, Jeff Janes wrote:
I get a compiler warning when building on Windows. When I started
looking into that, I saw that two files have too much code duplication
between them:
Thanks for reviewing.
src/bin/scripts/vac_parallel.c (new file)
src/bin/pg_dump/parallel.c (existing file)
In particular, pgpipe is almost an exact duplicate between them, except
the copy in vac_parallel.c has fallen behind changes made to parallel.c.
(Those changes would have fixed the Windows warnings). I think that
this function (and perhaps other parts as well--"exit_horribly" for
example) need to be refactored into a common file that both files can
include. I don't know where the best place for that would be, though.
(I haven't done this type of refactoring
myself.)
When I started on this patch, I thought of sharing the common code between vacuumdb and pg_dump. But the pg_dump code is tightly coupled with ArchiveHandle: almost all functions take it as a parameter or operate on it, and the remaining functions use structures like ParallelState or ParallelSlot, which have an ArchiveHandle member. I think making this code common would require changing the complete code of
parallel pg_dump.
However, there are some functions that are independent of ArchiveHandle and can be moved directly into common code;
as you mention, pgpipe, piperead, readMessageFromPipe and select_loop.
To move them to a common place, we need to decide where the common file should be kept.
Thoughts?
Also, there are several places in the patch which use spaces for
indentation where tabs are called for by the coding style. It looks
like you may have copied the code from one terminal window and pasted
it into another one, converting tabs to spaces in the process. This
makes it hard to evaluate the amount of code duplication.
In some places the code spins in a tight loop while waiting for a
worker process to become free. If I strace the process, I get a long
list of selects with zero timeouts:
select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout)
I have not tried to track down the code that causes it. I did notice
that vacuumdb spends an awful lot of time at the top of the Linux "top"
output, and this is probably why.
I will look into these and fix them.
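For reference, the zero-timeout selects seem to come from run_command(): it dispatches the job and then keeps calling ListenToWorkers(pstate, false), which returns immediately when no worker has reported yet. A minimal non-polling sketch, reusing the patch's own EnsureIdleWorker() (which blocks inside select() via ListenToWorkers(pstate, true)) -- untested:

	void
	run_command(ParallelState *pstate, char *command)
	{
		/*
		 * Block until some worker is free instead of polling with a
		 * zero timeout: EnsureIdleWorker() reaps finished workers and
		 * returns only once an idle slot exists.
		 */
		EnsureIdleWorker(pstate);
		DispatchJob(pstate, command);
	}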
Thanks & Regards,
Dilip Kumar
On 01 July 2014 03:48, Alvaro wrote:
In particular, pgpipe is almost an exact duplicate between them,
except the copy in vac_parallel.c has fallen behind changes made to
parallel.c. (Those changes would have fixed the Windows warnings). I
think that this function (and perhaps other parts as
well--"exit_horribly" for example) need to refactored into a common
file that both files can include. I don't know where the best place
for that would be, though. (I haven't done this type of refactoring
myself.)
I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
Maybe we should move pgpipe back to src/port and have pg_dump and this
new thing use that. I'm not sure about the rest of duplication in
vac_parallel.c; there might be a lot in common with what
pg_dump/parallel.c does too. Having two copies of code is frowned upon
for good reasons. This patch introduces 1200 lines of new code in
vac_parallel.c, ugh.
If we really require 1200 lines to get parallel vacuum working for
vacuumdb, I would question the wisdom of this effort. To me, it seems
better spent improving autovacuum to cover whatever it is that this
patch is supposed to be good for --- or maybe just enable having a
shell script that launches multiple vacuumdb instances in parallel ...
Thanks for looking into the patch.
I think that if we use a shell script to launch parallel vacuumdb instances, we cannot get complete control over dividing the task.
If we statically divide the tables between multiple processes, some process may get all the very big tables, and then it is as good as a single process doing the whole operation.
In this patch we assign only one table at a time to each process, and whichever process finishes first is assigned the next table; this way all processes get an equal share of the task.
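Concretely, the dispatch loop in run_parallel_vacuum() boils down to this (a slightly simplified copy of the patch's own code):

	for (cell = tables->head; cell; cell = cell->next)
	{
		prepare_command(conn, full, verbose, and_analyze,
						analyze_only, freeze, &sql);
		appendPQExpBuffer(&sql, " %s", cell->val);
		run_command(pstate, sql.data);	/* goes to the next free worker */
		termPQExpBuffer(&sql);
	}
	EnsureWorkersFinished(pstate);		/* wait for all workers to go idle */

so a big table only ever ties up one worker while the others keep draining the list.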
Thanks & Regards,
Dilip Kumar
On Tue, Jul 1, 2014 at 1:25 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 01 July 2014 03:48, Alvaro wrote:
In particular, pgpipe is almost an exact duplicate between them,
except the copy in vac_parallel.c has fallen behind changes made to
parallel.c. (Those changes would have fixed the Windows warnings). I
think that this function (and perhaps other parts as
well--"exit_horribly" for example) need to refactored into a common
file that both files can include. I don't know where the best place
for that would be, though. (I haven't done this type of refactoring
myself.)
I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
Maybe we should move pgpipe back to src/port and have pg_dump and this
new thing use that. I'm not sure about the rest of duplication in
vac_parallel.c; there might be a lot in common with what
pg_dump/parallel.c does too. Having two copies of code is frowned upon
for good reasons. This patch introduces 1200 lines of new code in
vac_parallel.c, ugh.
If we really require 1200 lines to get parallel vacuum working for
vacuumdb, I would question the wisdom of this effort. To me, it seems
better spent improving autovacuum to cover whatever it is that this
patch is supposed to be good for --- or maybe just enable having a
shell script that launches multiple vacuumdb instances in parallel ...
Thanks for looking into the patch.
I think that if we use a shell script to launch parallel vacuumdb instances, we cannot get complete control over dividing the task.
If we statically divide the tables between multiple processes, some process may get all the very big tables, and then it is as good as a single process doing the whole operation.
In this patch we assign only one table at a time to each process, and whichever process finishes first is assigned the next table; this way all processes get an equal share of the task.
Thanks & Regards,
Dilip Kumar
I have run the latest patch.
One question: is my usage of the --jobs option correct?
$ vacuumdb -d postgres --jobs=30
I got the following error.
vacuumdb: unrecognized option '--jobs=30'
Try "vacuumdb --help" for more information.
Regards,
-------
Sawada Masahiko
On 01 July 2014 22:17, Sawada Masahiko wrote:
I have run the latest patch.
One question: is my usage of the --jobs option correct?
$ vacuumdb -d postgres --jobs=30
I got the following error.
vacuumdb: unrecognized option '--jobs=30'
Try "vacuumdb --help" for more information.
Thanks for the comments. Your usage is correct, but there were some problems in the code, and I have fixed them in the attached patch.
Apart from this fix, I am currently working on Jeff's comments about making the code common between pg_dump/parallel.c and scripts/vac_parallel.c.
I found that almost 300 lines of code can be moved to a common place; the only problem is where to keep the common code.
I am thinking of
1. Keep a common folder under bin --> src/bin/common, and move the code that is specific to parallel operation into src/bin/common/parallel_common.c.
2. Both vacuumdb and pg_dump will compile this file in while generating their executables (in the future other executables, like reindexdb, can also use it for parallel functionality); see the sketch below.
Thoughts?
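To illustrate option 1, the build hookup might look roughly like this (a hypothetical sketch -- the src/bin/common location and the parallel_common.c name are only a proposal):

	# src/bin/scripts/Makefile (sketch)
	vacuumdb: vacuumdb.o parallel_common.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport

	# src/bin/pg_dump/Makefile would likewise link parallel_common.o
	# into pg_dump for its parallel machinery.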
Thanks & Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_v6.patchapplication/octet-stream; name=vacuumdb_parallel_v6.patchDownload
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,239 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+ Number of parallel processes used to perform the operation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/Makefile
--- b/src/bin/scripts/Makefile
***************
*** 32,38 **** dropdb: dropdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq subm
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
--- 32,38 ----
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o vac_parallel.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
***************
*** 65,71 **** uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
--- 65,71 ----
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o vac_parallel.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
*** /dev/null
--- b/src/bin/scripts/vac_parallel.c
***************
*** 0 ****
--- 1,1144 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "vac_parallel.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+ #endif
+
+ #include "common.h"
+
+
+ #define PIPE_READ 0
+ #define PIPE_WRITE 1
+
+ /* file-scope variables */
+ #ifdef WIN32
+ static unsigned int tMasterThreadId = 0;
+ static HANDLE termEvent = INVALID_HANDLE_VALUE;
+ static int pgpipe(int handles[2]);
+ static int piperead(int s, char *buf, int len);
+ bool parallel_init_done = false;
+ DWORD mainThreadId;
+
+ /*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+ typedef struct
+ {
+ VacOpt *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+ } WorkerInfo;
+
+ #define pipewrite(a,b,c) send(a,b,c,0)
+ #else
+ /*
+ * aborting is only ever used in the master, the workers are fine with just
+ * wantAbort.
+ */
+ static bool aborting = false;
+ static volatile sig_atomic_t wantAbort = 0;
+
+ #define pgpipe(a) pipe(a)
+ #define piperead(a,b,c) read(a,b,c)
+ #define pipewrite(a,b,c) write(a,b,c)
+ #endif
+
+ static const char *modulename = gettext_noop("parallel vacuum");
+
+ typedef struct ShutdownInformation
+ {
+ ParallelState *pstate;
+ PGconn *conn;
+ } ShutdownInformation;
+
+ static ShutdownInformation shutdown_info;
+
+ static char *readMessageFromPipe(int fd);
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage);
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2]);
+ static char *
+ getMessageFromMaster(int pipefd[2]);
+ #define messageStartsWith(msg, prefix) \
+ (strncmp(msg, prefix, strlen(prefix)) == 0)
+ #define messageEquals(msg, pattern) \
+ (strcmp(msg, pattern) == 0)
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker);
+
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str);
+
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename, const char *fmt,
+ va_list ap)__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
+
+
+ #ifndef WIN32
+ static void sigTermHandler(int signum);
+ #endif
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate);
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+ void horribly_exit(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+
+ void
+ init_parallel_vacuum_utils(void);
+
+ static void exit_nicely(int code);
+
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate);
+
+ static void checkAborting();
+
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ #ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+ #else
+ if (pstate->parallelSlot[i].pid == getpid())
+ #endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+ }
+
+ /* Sends the error message from the worker to the master process */
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ {
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+ }
+
+ /*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+ static void
+ WaitForTerminatingWorkers(ParallelState *pstate)
+ {
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+ #ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+ #else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
+ int nrun = 0;
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+ #endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+ }
+
+
+ #ifdef WIN32
+ static unsigned __stdcall
+ init_spawned_worker_win32(WorkerInfo *wi)
+ {
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacOpt *vopt = wi->vopt;
+ ParallelSlot *mySlot = &shutdown_info.pstate->parallelSlot[worker];
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword, vopt->progname,
+ false);
+
+ mySlot->args->connection = conn;
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker, vopt->analyze_stage);
+ _endthreadex(0);
+ return 0;
+ }
+ #endif
+
+
+
+ ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers)
+ {
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+ shutdown_info.pstate = pstate;
+
+ #ifdef WIN32
+ tMasterThreadId = GetCurrentThreadId();
+ termEvent = CreateEvent(NULL, true, false, "Terminate");
+ #else
+ signal(SIGTERM, sigTermHandler);
+ signal(SIGINT, sigTermHandler);
+ signal(SIGQUIT, sigTermHandler);
+ #endif
+
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ #ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+ #else
+ pid_t pid;
+ #endif
+ int pipeMW[2],
+ pipeWM[2];
+
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ horribly_exit(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+ pstate->parallelSlot[i].args->vopt = vopt;
+ #ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+ #else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ pstate->parallelSlot[i].args->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(pstate->parallelSlot[i].args->connection,
+ pipefd, i, vopt->analyze_stage);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ horribly_exit(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+ #endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+ }
+
+ /*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+ void
+ ParallelVacuumEnd(ParallelState *pstate)
+ {
+ int i;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+ /*
+ * Remove the pstate again, so the exit handler in the parent will now
+ * again fall back to closing AH->connection (if connected).
+ */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+ }
+
+
+ /*
+ * Find the first free parallel slot (if any).
+ */
+ int
+ GetIdleWorker(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+ }
+
+ /*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+ }
+
+ /*
+ * If we have one worker that terminates for some reason, we'd like the other
+ * threads to terminate as well (and not finish with their 70 GB table dump
+ * first...). Now in UNIX we can just kill these processes, and let the signal
+ * handler set wantAbort to 1. In Windows we set a termEvent and this serves
+ * as the signal for everyone to terminate.
+ */
+ static void
+ checkAborting()
+ {
+ #ifdef WIN32
+ if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
+ #else
+ if (wantAbort)
+ #endif
+ horribly_exit(modulename, "worker is terminating\n");
+ }
+
+ /*
+ * Shut down any remaining workers, this has an implicit do_wait == true.
+ *
+ * The fastest way we can make the workers terminate gracefully is when
+ * they are listening for new commands and we just tell them to terminate.
+ */
+ static void
+ ShutdownWorkersHard(ParallelState *pstate)
+ {
+ #ifndef WIN32
+ int i;
+
+ signal(SIGPIPE, SIG_IGN);
+
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ kill(pstate->parallelSlot[i].pid, SIGTERM);
+ #else
+ /* The workers monitor this event via checkAborting(). */
+ SetEvent(termEvent);
+ #endif
+
+ WaitForTerminatingWorkers(pstate);
+ }
+
+ #ifndef WIN32
+ /* Signal handling (UNIX only) */
+ static void
+ sigTermHandler(int signum)
+ {
+ wantAbort = 1;
+ }
+ #endif
+
+
+
+ /*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage)
+ {
+
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ executeMaintenanceCommand(connection, stage_commands[vacStage], false);
+
+ /*
+ * Call the setup worker function that's defined in the ArchiveHandle.
+ *
+ * We get the raw connection only for the reason that we can close it
+ * properly when we shut down. This happens only that way when it is
+ * brought down because of an error.
+ */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+ }
+
+
+
+ /*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally it will receive a TERMINATE command and
+ * exit.
+ */
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2])
+ {
+ char *command;
+ PQExpBufferData sql;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+
+ /* check if master has set the terminate event*/
+ checkAborting();
+
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ {
+ initPQExpBuffer(&sql);
+ appendPQExpBuffer(&sql, "ERROR : %s",
+ PQerrorMessage(connection));
+ sendMessageToMaster(pipefd, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ /* command was pg_malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+ }
+
+ /*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+ * DispatchJobForTocEntry WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+ void
+ ListenToWorkers(ParallelState *pstate, bool do_wait)
+ {
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ horribly_exit(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+
+ ParallelSlot *mySlot = &pstate->parallelSlot[worker];
+
+ mySlot->workerStatus = WRKR_TERMINATED;
+
+ horribly_exit(modulename,
+ "%s: vacuuming of database \"%s\" failed %s",
+ mySlot->args->vopt->progname,
+ mySlot->args->vopt->dbname, msg + strlen("ERROR "));
+ }
+ else
+ horribly_exit(modulename, "invalid message received from worker: %s\n", msg);
+
+ /* both Unix and Win32 return pg_malloc()ed space, so we free it */
+ free(msg);
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+ int
+ ReapWorkerStatus(ParallelState *pstate, int *status)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+
+ return NO_SLOT;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+ void
+ EnsureIdleWorker(ParallelState *pstate)
+ {
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ horribly_exit(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+ }
+
+
+ /*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+ bool
+ IsEveryWorkerIdle(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+ }
+
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+ void
+ EnsureWorkersFinished(ParallelState *pstate)
+ {
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ horribly_exit(modulename,
+ "error processing a parallel work item\n");
+ }
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+ static char *
+ getMessageFromMaster(int pipefd[2])
+ {
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ horribly_exit(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+ {
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ horribly_exit(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+
+ Assert(false);
+ return NULL;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+ static void
+ sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+ #ifndef WIN32
+ if (!aborting)
+ #endif
+ horribly_exit(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+ }
+
+ /*
+ * The underlying function to read a message from the communication channel
+ * (fd) with optional blocking (do_wait).
+ */
+ static char *
+ readMessageFromPipe(int fd)
+ {
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+ * The problem here is that we need to deal with several possibilities: we
+ * could receive only a partial message or several messages at once. The
+ * caller expects us to return exactly one message however.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *) pg_malloc(bufsize);
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+ /* worker has closed the connection or another error happened */
+ if (ret <= 0)
+ return NULL;
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ /* could be any number */
+ bufsize += 16;
+ msg = (char *) realloc(msg, bufsize);
+ }
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ Assert(tMasterThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+ * If we Ctrl-C the master process, it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ horribly_exit(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ void
+ DispatchJob(ParallelState *pstate, char * command)
+ {
+ int worker;
+
+ /* our caller makes sure that at least one worker is idle */
+ Assert(GetIdleWorker(pstate) != NO_SLOT);
+ worker = GetIdleWorker(pstate);
+ Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+ }
+
+ #ifdef WIN32
+ /*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send.
+ */
+ static int
+ pgpipe(int handles[2])
+ {
+ SOCKET s;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ handles[0] = handles[1] = INVALID_SOCKET;
+
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[1] = socket(PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ closesocket(s);
+ return -1;
+ }
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[0] = accept(s, (SOCKADDR *) &serv_addr, &len)) == INVALID_SOCKET)
+ {
+ closesocket(handles[1]);
+ handles[1] = INVALID_SOCKET;
+ closesocket(s);
+ return -1;
+ }
+ closesocket(s);
+ return 0;
+ }
+
+ static int
+ piperead(int s, char *buf, int len)
+ {
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+ }
+
+ #endif
+
+ static void
+ shutdown_parallel_vacuum_utils()
+ {
+ #ifdef WIN32
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+ #endif
+ }
+
+ void
+ init_parallel_vacuum_utils(void)
+ {
+ #ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ exit_nicely(1);
+ }
+
+ parallel_init_done = true;
+ }
+ #endif
+ }
+
+ void
+ on_exit_close_connection(PGconn *conn)
+ {
+ shutdown_info.conn = conn;
+ }
+
+ /*
+ * Fail and die, with a message to stderr. Parameters as for write_msg.
+ *
+ * This is defined in parallel.c, because in parallel mode, things are more
+ * complicated. If the worker process does horribly_exit(), we forward its
+ * last words to the master process. The master process then does
+ * horribly_exit() with this error message itself and prints it normally.
+ * After printing the message, horribly_exit() on the master will shut down
+ * the remaining worker processes.
+ */
+ void
+ horribly_exit(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+ va_start(ap, fmt);
+
+ if (pstate == NULL)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+ }
+ else
+ {
+ slot = GetMyPSlot(pstate);
+ if (!slot)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+
+ }
+ else
+ {
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ }
+ }
+
+ va_end(ap);
+
+ exit_nicely(1);
+ }
+
+ static void
+ exit_nicely(int code)
+ {
+ if (shutdown_info.pstate)
+ {
+ ParallelSlot *slot = GetMyPSlot(shutdown_info.pstate);
+
+ if (!slot)
+ {
+ /*
+ * We're the master: We have already printed out the message
+ * passed to horribly_exit() either from the master itself or from
+ * a worker process. Now we need to close our own database
+ * connection (only open during parallel dump but not restore) and
+ * shut down the remaining workers.
+ */
+ PQfinish(shutdown_info.conn);
+ #ifndef WIN32
+
+ /*
+ * Setting aborting to true switches to best-effort-mode
+ * (send/receive but ignore errors) in communicating with our
+ * workers.
+ */
+ aborting = true;
+ #endif
+ ShutdownWorkersHard(shutdown_info.pstate);
+ }
+ else if (slot->args->connection)
+ PQfinish(slot->args->connection);
+
+ }
+ else if (shutdown_info.conn)
+ PQfinish(shutdown_info.conn);
+
+ shutdown_parallel_vacuum_utils();
+ exit(code);
+ }
+
+
*** /dev/null
--- b/src/bin/scripts/vac_parallel.h
***************
*** 0 ****
--- 1,105 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #ifndef VAC_PARALLEL_H
+ #define VAC_PARALLEL_H
+
+ #include "postgres_fe.h"
+ #include <time.h>
+ #include "libpq-fe.h"
+ #include "common.h"
+
+ typedef enum
+ {
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+ } T_WorkerStatus;
+
+
+ typedef struct VacOpt
+ {
+ char *dbname;
+ char *pgport;
+ char *pghost;
+ char *username;
+ char *progname;
+ enum trivalue promptPassword;
+ int analyze_stage;
+ }VacOpt;
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelArgs
+ {
+ PGconn *connection;
+ VacOpt *vopt;
+ } ParallelArgs;
+
+ /* State for each parallel activity slot */
+ typedef struct ParallelSlot
+ {
+ ParallelArgs *args; /* can pass a connection handle here */
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+ #ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+ #else
+ pid_t pid;
+ #endif
+ } ParallelSlot;
+
+ #define NO_SLOT (-1)
+
+ typedef struct ParallelState
+ {
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+ } ParallelState;
+
+
+
+ #ifdef WIN32
+ extern bool parallel_init_done;
+ extern DWORD mainThreadId;
+ #endif
+
+ extern ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers);
+ extern int GetIdleWorker(ParallelState *pstate);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+ extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+ extern void EnsureIdleWorker(ParallelState *pstate);
+ extern void EnsureWorkersFinished(ParallelState *pstate);
+
+ extern void DispatchJob(ParallelState *pstate, char * command);
+ extern void
+ horribly_exit(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+ extern void init_parallel_vacuum_utils(void);
+ extern void on_exit_close_connection(PGconn *conn);
+ extern void ParallelVacuumEnd(ParallelState *pstate);
+
+
+ #endif /* VAC_PARALLEL_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 11,34 ****
*/
#include "postgres_fe.h"
- #include "common.h"
#include "dumputils.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *table, const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 11,54 ----
*/
#include "postgres_fe.h"
#include "dumputils.h"
+ #include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, int parallel, SimpleStringList *tables);
+
+ void run_command(ParallelState *pstate, char *command);
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt, const char *progname);
+
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 69,75 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 95,108 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 129,134 **** main(int argc, char *argv[])
--- 151,159 ----
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 166,172 ----
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 222,228 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, parallel);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 236,274 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
! }
! }
! else
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo);
!
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 293,304 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 393,404 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 419,666 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
+ }
}
PQclear(result);
}
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt, const char *progname)
+ {
+ ParallelState *pstate;
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+
+ initPQExpBuffer(&sql);
+
+ pstate = ParallelVacuumStart(vopt, parallel);
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+ termPQExpBuffer(&sql);
+ }
+
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ VacOpt vopt = {0};
+
+ init_parallel_vacuum_utils();
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ if (dbname)
+ vopt.dbname = pg_strdup(dbname);
+
+ if (host)
+ vopt.pghost = pg_strdup(host);
+
+ if (port)
+ vopt.pgport = pg_strdup(port);
+
+ if (username)
+ vopt.username = pg_strdup(username);
+
+ if (progname)
+ vopt.progname = pg_strdup(progname);
+
+ vopt.promptPassword = prompt_password;
+
+ on_exit_close_connection(conn);
+
+ /* If a table list is not provided then we need to vacuum the whole DB:
+ get the list of all tables and prepare the list. */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, " %s.\"%s\"", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+
+ tables = &dbtables;
+
+ }
+
+ if (analyze_in_stages)
+ {
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")};
+
+ puts(gettext(stage_messages[i]));
+ vopt.analyze_stage = i;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt,
+ progname);
+ }
+ }
+ else
+ {
+ vopt.analyze_stage = -1;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt,
+ progname);
+ }
+
+ PQfinish(conn);
+ }
+
+ void run_command(ParallelState *pstate, char *command)
+ {
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+ /* Listen for a worker and get its message */
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before
+ * re-running the loop. If nTerm > 0 we already have that (quick
+ * check).
+ */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+ }
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 680,686 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Mon, Jun 30, 2014 at 3:17 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Jeff Janes wrote:
In particular, pgpipe is almost an exact duplicate between them,
except the copy in vac_parallel.c has fallen behind changes made to
parallel.c. (Those changes would have fixed the Windows warnings). I
think that this function (and perhaps other parts as
well--"exit_horribly" for example) need to refactored into a common
file that both files can include. I don't know where the best place
for that would be, though. (I haven't done this type of refactoring
myself.)
I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
Maybe we should move pgpipe back to src/port and have pg_dump and this
new thing use that. I'm not sure about the rest of duplication in
vac_parallel.c; there might be a lot in common with what
pg_dump/parallel.c does too. Having two copies of code is frowned upon
for good reasons. This patch introduces 1200 lines of new code in
vac_parallel.c, ugh.
If we really require 1200 lines to get parallel vacuum working for
vacuumdb, I would question the wisdom of this effort. To me, it seems
better spent improving autovacuum to cover whatever it is that this
patch is supposed to be good for --- or maybe just enable having a shell
script that launches multiple vacuumdb instances in parallel ...
I would only envision using the parallel feature for vacuumdb after a
pg_upgrade or some other major maintenance window (that is the only
time I ever envision using vacuumdb at all). I don't think autovacuum
can be expected to handle such situations well, as it is designed to
be a smooth background process.
I guess the ideal solution would be for manual VACUUM to have a
PARALLEL option, then vacuumdb could just invoke that one table at a
time. That way you would get within-table parallelism which would be
important if one table dominates the entire database cluster. But I
don't foresee that happening any time soon.
I don't know how to calibrate the number of lines that is worthwhile.
If you write in C and need to have cross-platform compatibility and
robust error handling, it seems to take hundreds of lines to do much
of anything. The code duplication is a problem, but I don't think
just raw line count is, especially since it has already been written.
The trend in this project seems to be for shell scripts to eventually
get converted into C programs. In fact, src/bin/scripts now has no
scripts at all. Also it is important to vacuum/analyze tables in the
same database at the same time, otherwise you will not get much
speed-up in the ordinary case where there is only one meaningful
database. Doing that in a shell script would be fairly hard. It
should be pretty easy in Perl (at least for me--I'm sure others
disagree), but that also doesn't seem to be the way we do things for
programs intended for end users.
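To make the same-database case concrete: the core of it is just driving
several libpq connections asynchronously. A minimal sketch (connection
string and table names are placeholders, error handling trimmed):

#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
	const char *tables[] = {"t1", "t2", "t3", "t4"};	/* placeholders */
	PGconn	   *conns[4];
	char		sql[128];
	int			i;

	/* Dispatch one VACUUM per connection; PQsendQuery does not block. */
	for (i = 0; i < 4; i++)
	{
		conns[i] = PQconnectdb("dbname=postgres");		/* placeholder */
		if (PQstatus(conns[i]) != CONNECTION_OK)
			return 1;
		snprintf(sql, sizeof(sql), "VACUUM ANALYZE %s", tables[i]);
		if (!PQsendQuery(conns[i], sql))
			return 1;
	}

	/* Now collect the results; the commands run concurrently. */
	for (i = 0; i < 4; i++)
	{
		PGresult   *res;

		while ((res = PQgetResult(conns[i])) != NULL)
		{
			if (PQresultStatus(res) != PGRES_COMMAND_OK)
				fprintf(stderr, "%s: %s", tables[i],
						PQerrorMessage(conns[i]));
			PQclear(res);
		}
		PQfinish(conns[i]);
	}
	return 0;
}

Of course that has no job queue, no per-platform pipes and no robust
error paths, which is where the hundreds of lines come from.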
Cheers,
Jeff
Jeff Janes wrote:
I would only envision using the parallel feature for vacuumdb after a
pg_upgrade or some other major maintenance window (that is the only
time I ever envision using vacuumdb at all). I don't think autovacuum
can be expected to handle such situations well, as it is designed to
be a smooth background process.
That's a fair point. One thing that would be pretty neat but I don't
think I would get anyone to implement it, is having the user control the
autovacuum launcher in some way. For instance "please vacuum this set
of tables as quickly as possible", and it would launch as many workers
as are configured. It would take months to get a UI settled for this,
however.
I guess the ideal solution would be for manual VACUUM to have a
PARALLEL option, then vacuumdb could just invoke that one table at a
time. That way you would get within-table parallelism which would be
important if one table dominates the entire database cluster. But I
don't foresee that happening any time soon.
I see this as a completely different feature, which might also be pretty
neat, at least if you're open to spending more I/O bandwidth processing
a single table: have several processes scanning the heap simultaneously.
Since I think vacuum is mostly I/O bound at the moment, I'm not sure
there is much point in this currently.
I don't know how to calibrate the number of lines that is worthwhile.
If you write in C and need to have cross-platform compatibility and
robust error handling, it seems to take hundreds of lines to do much
of anything. The code duplication is a problem, but I don't think
just raw line count is, especially since it has already been written.
Well, there are (at least) two types of duplicate code: first you have
these common routines such as pgpipe that are duplicates for no good
reason. Just move them to src/port or something and it's all good. But
the OP said there is code that cannot be shared even though it's very
similar in both incarnations. That means we cannot (or it's difficult
to) just have one copy, which means as they fix bugs in one copy we need
to update the other. This is bad -- witness the situation with ecpg's
copy of date/time code, where there are bugs fixed in the backend
version but the ecpg version does not have the fix. It's difficult to
keep track of these things.
The trend in this project seems to be for shell scripts to eventually
get converted into C programs. In fact, src/bin/scripts now has no
scripts at all. Also it is important to vacuum/analyze tables in the
same database at the same time, otherwise you will not get much
speed-up in the ordinary case where there is only one meaningful
database. Doing that in a shell script would be fairly hard. It
should be pretty easy in Perl (at least for me--I'm sure others
disagree), but that also doesn't seem to be the way we do things for
programs intended for end users.
Yeah, shipping shell scripts doesn't work very well for us. I'm
thinking perhaps we can have sample scripts in which we show how to use
parallel(1) to run multiple vacuumdb's in parallel in Unix and some
similar mechanism in Windows, and that's it. So we wouldn't provide the
complete toolset, but the platform surely has ways to make it happen.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jul 2, 2014 at 2:27 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 01 July 2014 22:17, Sawada Masahiko Wrote,
I have executed the latest patch.
One question: is my usage of the --jobs option correct?
$ vacuumdb -d postgres --jobs=30
I got the following error:
vacuumdb: unrecognized option '--jobs=30'
Try "vacuumdb --help" for more information.
Thanks for the comments. Your usage is correct, but there were some problems in the code, and I have fixed them in the attached patch.
Does this patch allow setting the -j option to 0?
When I set -j to 0, the behavior seems to be the same as setting it to 1.
Regards,
-------
Sawada Masahiko
On 03 July 2014 00:01, Sawada Masahiko Wrote,
Does this patch allow setting the -j option to 0?
When I set -j to 0, the behavior seems to be the same as setting it to 1.
I have changed the patch; it will now return an error if -j is set to 0 or less:
"vacuumdb: Number of parallel "jobs" should be at least 1"
Thanks & Regards,
Dilip
Attachments:
vacuumdb_parallel_v7.patch (application/octet-stream)
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,239 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+ Number of parallel processes to perform the operation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/Makefile
--- b/src/bin/scripts/Makefile
***************
*** 32,38 **** dropdb: dropdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq subm
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
--- 32,38 ----
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o vac_parallel.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
***************
*** 65,71 **** uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
--- 65,71 ----
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o vac_parallel.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
*** /dev/null
--- b/src/bin/scripts/vac_parallel.c
***************
*** 0 ****
--- 1,1144 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "vac_parallel.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+ #endif
+
+ #include "common.h"
+
+
+ #define PIPE_READ 0
+ #define PIPE_WRITE 1
+
+ /* file-scope variables */
+ #ifdef WIN32
+ static unsigned int tMasterThreadId = 0;
+ static HANDLE termEvent = INVALID_HANDLE_VALUE;
+ static int pgpipe(int handles[2]);
+ static int piperead(int s, char *buf, int len);
+ bool parallel_init_done = false;
+ DWORD mainThreadId;
+
+ /*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+ typedef struct
+ {
+ VacOpt *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+ } WorkerInfo;
+
+ #define pipewrite(a,b,c) send(a,b,c,0)
+ #else
+ /*
+ * aborting is only ever used in the master, the workers are fine with just
+ * wantAbort.
+ */
+ static bool aborting = false;
+ static volatile sig_atomic_t wantAbort = 0;
+
+ #define pgpipe(a) pipe(a)
+ #define piperead(a,b,c) read(a,b,c)
+ #define pipewrite(a,b,c) write(a,b,c)
+ #endif
+
+ static const char *modulename = gettext_noop("parallel vacuum");
+
+ typedef struct ShutdownInformation
+ {
+ ParallelState *pstate;
+ PGconn *conn;
+ } ShutdownInformation;
+
+ static ShutdownInformation shutdown_info;
+
+ static char *readMessageFromPipe(int fd);
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage);
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2]);
+ static char *
+ getMessageFromMaster(int pipefd[2]);
+ #define messageStartsWith(msg, prefix) \
+ (strncmp(msg, prefix, strlen(prefix)) == 0)
+ #define messageEquals(msg, pattern) \
+ (strcmp(msg, pattern) == 0)
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker);
+
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str);
+
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename, const char *fmt,
+ va_list ap)__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
+
+
+ #ifndef WIN32
+ static void sigTermHandler(int signum);
+ #endif
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate);
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+ void horribly_exit(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+
+ void
+ init_parallel_vacuum_utils(void);
+
+ static void exit_nicely(int code);
+
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate);
+
+ static void checkAborting();
+
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ #ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+ #else
+ if (pstate->parallelSlot[i].pid == getpid())
+ #endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+ }
+
+ /* Sends the error message from the worker to the master process */
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ {
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+ }
+
+ /*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+ static void
+ WaitForTerminatingWorkers(ParallelState *pstate)
+ {
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+ #ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+ #else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
+ int nrun = 0;
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+ #endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+ }
+
+
+ #ifdef WIN32
+ static unsigned __stdcall
+ init_spawned_worker_win32(WorkerInfo *wi)
+ {
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacOpt *vopt = wi->vopt;
+ ParallelSlot *mySlot = &shutdown_info.pstate->parallelSlot[worker];
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword, vopt->progname,
+ false);
+
+ mySlot->args->connection = conn;
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker, vopt->analyze_stage);
+ _endthreadex(0);
+ return 0;
+ }
+ #endif
+
+
+
+ ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers)
+ {
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+ shutdown_info.pstate = pstate;
+
+ #ifdef WIN32
+ tMasterThreadId = GetCurrentThreadId();
+ termEvent = CreateEvent(NULL, true, false, "Terminate");
+ #else
+ signal(SIGTERM, sigTermHandler);
+ signal(SIGINT, sigTermHandler);
+ signal(SIGQUIT, sigTermHandler);
+ #endif
+
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ #ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+ #else
+ pid_t pid;
+ #endif
+ int pipeMW[2],
+ pipeWM[2];
+
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ horribly_exit(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+ pstate->parallelSlot[i].args->vopt = vopt;
+ #ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+ #else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ pstate->parallelSlot[i].args->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(pstate->parallelSlot[i].args->connection,
+ pipefd, i, vopt->analyze_stage);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ horribly_exit(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+ #endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+ }
+
+ /*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+ void
+ ParallelVacuumEnd(ParallelState *pstate)
+ {
+ int i;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+ /*
+ * Remove the pstate again, so the exit handler in the parent will now
+ * again fall back to closing AH->connection (if connected).
+ */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+ }
+
+
+ /*
+ * Find the first free parallel slot (if any).
+ */
+ int
+ GetIdleWorker(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+ }
+
+ /*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+ }
+
+ /*
+ * If we have one worker that terminates for some reason, we'd like the other
+ * threads to terminate as well (and not finish with their 70 GB table vacuum
+ * first...). Now in UNIX we can just kill these processes, and let the signal
+ * handler set wantAbort to 1. In Windows we set a termEvent and this serves
+ * as the signal for everyone to terminate.
+ */
+ static void
+ checkAborting()
+ {
+ #ifdef WIN32
+ if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
+ #else
+ if (wantAbort)
+ #endif
+ horribly_exit(modulename, "worker is terminating\n");
+ }
+
+ /*
+ * Shut down any remaining workers, this has an implicit do_wait == true.
+ *
+ * The fastest way we can make the workers terminate gracefully is when
+ * they are listening for new commands and we just tell them to terminate.
+ */
+ static void
+ ShutdownWorkersHard(ParallelState *pstate)
+ {
+ #ifndef WIN32
+ int i;
+
+ signal(SIGPIPE, SIG_IGN);
+
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ kill(pstate->parallelSlot[i].pid, SIGTERM);
+ #else
+ /* The workers monitor this event via checkAborting(). */
+ SetEvent(termEvent);
+ #endif
+
+ WaitForTerminatingWorkers(pstate);
+ }
+
+ #ifndef WIN32
+ /* Signal handling (UNIX only) */
+ static void
+ sigTermHandler(int signum)
+ {
+ wantAbort = 1;
+ }
+ #endif
+
+
+
+ /*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage)
+ {
+
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ executeMaintenanceCommand(connection, stage_commands[vacStage], false);
+
+ /*
+ * Enter the worker's command loop, waiting for vacuum commands from the master.
+ *
+ * We get the raw connection only for the reason that we can close it
+ * properly when we shut down. This happens only that way when it is
+ * brought down because of an error.
+ */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+ }
+
+
+
+ /*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally it will receive a TERMINATE command and
+ * exit.
+ */
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2])
+ {
+ char *command;
+ PQExpBufferData sql;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+
+ /* check if master has set the terminate event*/
+ checkAborting();
+
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ {
+ initPQExpBuffer(&sql);
+ appendPQExpBuffer(&sql, "ERROR : %s",
+ PQerrorMessage(connection));
+ sendMessageToMaster(pipefd, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ /* command was pg_malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+ }
+
+ /*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+ * DispatchJobForTocEntry WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+ void
+ ListenToWorkers(ParallelState *pstate, bool do_wait)
+ {
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ horribly_exit(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+
+ ParallelSlot *mySlot = &pstate->parallelSlot[worker];
+
+ mySlot->workerStatus = WRKR_TERMINATED;
+
+ horribly_exit(modulename,
+ "%s: vacuuming of database \"%s\" failed %s",
+ mySlot->args->vopt->progname,
+ mySlot->args->vopt->dbname, msg + strlen("ERROR "));
+ }
+ else
+ horribly_exit(modulename, "invalid message received from worker: %s\n", msg);
+
+ /* both Unix and Win32 return pg_malloc()ed space, so we free it */
+ free(msg);
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+ int
+ ReapWorkerStatus(ParallelState *pstate, int *status)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+
+ return NO_SLOT;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+ void
+ EnsureIdleWorker(ParallelState *pstate)
+ {
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ horribly_exit(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+ }
+
+
+ /*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+ bool
+ IsEveryWorkerIdle(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+ }
+
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+ void
+ EnsureWorkersFinished(ParallelState *pstate)
+ {
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ horribly_exit(modulename,
+ "error processing a parallel work item\n");
+ }
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+ static char *
+ getMessageFromMaster(int pipefd[2])
+ {
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ horribly_exit(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+ {
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ horribly_exit(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+
+ Assert(false);
+ return NULL;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+ static void
+ sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+ #ifndef WIN32
+ if (!aborting)
+ #endif
+ horribly_exit(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+ }
+
+ /*
+ * The underlying function to read a message from the communication channel
+ * (fd) with optional blocking (do_wait).
+ */
+ static char *
+ readMessageFromPipe(int fd)
+ {
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+ * The problem here is that we need to deal with several possibilities: we
+ * could receive only a partial message or several messages at once. The
+ * caller expects us to return exactly one message however.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *) pg_malloc(bufsize);
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+ /* worker has closed the connection or another error happened */
+ if (ret <= 0)
+ return NULL;
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ /* could be any number */
+ bufsize += 16;
+ msg = (char *) realloc(msg, bufsize);
+ }
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ Assert(tMasterThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+ * If we Ctrl-C the master process , it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ horribly_exit(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ void
+ DispatchJob(ParallelState *pstate, char * command)
+ {
+ int worker;
+
+ /* our caller makes sure that at least one worker is idle */
+ Assert(GetIdleWorker(pstate) != NO_SLOT);
+ worker = GetIdleWorker(pstate);
+ Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+ }
+
+ #ifdef WIN32
+ /*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send.
+ */
+ static int
+ pgpipe(int handles[2])
+ {
+ SOCKET s;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ handles[0] = handles[1] = INVALID_SOCKET;
+
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[1] = socket(PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+ {
+ closesocket(s);
+ return -1;
+ }
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ closesocket(s);
+ return -1;
+ }
+ if ((handles[0] = accept(s, (SOCKADDR *) &serv_addr, &len)) == INVALID_SOCKET)
+ {
+ closesocket(handles[1]);
+ handles[1] = INVALID_SOCKET;
+ closesocket(s);
+ return -1;
+ }
+ closesocket(s);
+ return 0;
+ }
+
+ static int
+ piperead(int s, char *buf, int len)
+ {
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+ }
+
+ #endif
+
+ static void
+ shutdown_parallel_vacuum_utils()
+ {
+ #ifdef WIN32
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+ #endif
+ }
+
+ void
+ init_parallel_vacuum_utils(void)
+ {
+ #ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ exit_nicely(1);
+ }
+
+ parallel_init_done = true;
+ }
+ #endif
+ }
+
+ void
+ on_exit_close_connection(PGconn *conn)
+ {
+ shutdown_info.conn = conn;
+ }
+
+ /*
+ * Fail and die, with a message to stderr. Parameters as for write_msg.
+ *
+ * This is defined in vac_parallel.c, because in parallel mode, things are more
+ * complicated. If the worker process does horribly_exit(), we forward its
+ * last words to the master process. The master process then does
+ * horribly_exit() with this error message itself and prints it normally.
+ * After printing the message, horribly_exit() on the master will shut down
+ * the remaining worker processes.
+ */
+ void
+ horribly_exit(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+ va_start(ap, fmt);
+
+ if (pstate == NULL)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+ }
+ else
+ {
+ slot = GetMyPSlot(pstate);
+ if (!slot)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+
+ }
+ else
+ {
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ }
+ }
+
+ va_end(ap);
+
+ exit_nicely(1);
+ }
+
+ static void
+ exit_nicely(int code)
+ {
+ if (shutdown_info.pstate)
+ {
+ ParallelSlot *slot = GetMyPSlot(shutdown_info.pstate);
+
+ if (!slot)
+ {
+ /*
+ * We're the master: We have already printed out the message
+ * passed to horribly_exit() either from the master itself or from
+ * a worker process. Now we need to close our own database
+ * connection (only open during parallel dump but not restore) and
+ * shut down the remaining workers.
+ */
+ PQfinish(shutdown_info.conn);
+ #ifndef WIN32
+
+ /*
+ * Setting aborting to true switches to best-effort-mode
+ * (send/receive but ignore errors) in communicating with our
+ * workers.
+ */
+ aborting = true;
+ #endif
+ ShutdownWorkersHard(shutdown_info.pstate);
+ }
+ else if (slot->args->connection)
+ PQfinish(slot->args->connection);
+
+ }
+ else if (shutdown_info.conn)
+ PQfinish(shutdown_info.conn);
+
+ shutdown_parallel_vacuum_utils();
+ exit(code);
+ }
+
+
*** /dev/null
--- b/src/bin/scripts/vac_parallel.h
***************
*** 0 ****
--- 1,105 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #ifndef VAC_PARALLEL_H
+ #define VAC_PARALLEL_H
+
+ #include "postgres_fe.h"
+ #include <time.h>
+ #include "libpq-fe.h"
+ #include "common.h"
+
+ typedef enum
+ {
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+ } T_WorkerStatus;
+
+
+ typedef struct VacOpt
+ {
+ char *dbname;
+ char *pgport;
+ char *pghost;
+ char *username;
+ char *progname;
+ enum trivalue promptPassword;
+ int analyze_stage;
+ }VacOpt;
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelArgs
+ {
+ PGconn *connection;
+ VacOpt *vopt;
+ } ParallelArgs;
+
+ /* State for each parallel activity slot */
+ typedef struct ParallelSlot
+ {
+ ParallelArgs *args; /* can pass connection handle here */
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+ #ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+ #else
+ pid_t pid;
+ #endif
+ } ParallelSlot;
+
+ #define NO_SLOT (-1)
+
+ typedef struct ParallelState
+ {
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+ } ParallelState;
+
+
+
+ #ifdef WIN32
+ extern bool parallel_init_done;
+ extern DWORD mainThreadId;
+ #endif
+
+ extern ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers);
+ extern int GetIdleWorker(ParallelState *pstate);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+ extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+ extern void EnsureIdleWorker(ParallelState *pstate);
+ extern void EnsureWorkersFinished(ParallelState *pstate);
+
+ extern void DispatchJob(ParallelState *pstate, char * command);
+ extern void
horribly_exit(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+ extern void init_parallel_vacuum_utils(void);
+ extern void on_exit_close_connection(PGconn *conn);
+ extern void ParallelVacuumEnd(ParallelState *pstate);
+
+
+ #endif /* VAC_PARALLEL_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 11,34 ****
*/
#include "postgres_fe.h"
- #include "common.h"
#include "dumputils.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *table, const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 11,54 ----
*/
#include "postgres_fe.h"
#include "dumputils.h"
+ #include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, int parallel, SimpleStringList *tables);
+
+ void run_command(ParallelState *pstate, char *command);
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt, const char *progname);
+
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 69,75 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 95,108 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 129,134 **** main(int argc, char *argv[])
--- 151,166 ----
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ if (parallel <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 173,179 ----
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 229,235 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, parallel);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 243,281 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
! }
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo);
!
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 300,311 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 400,411 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 426,673 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
+ }
}
PQclear(result);
}
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt, const char *progname)
+ {
+ ParallelState *pstate;
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+
+ initPQExpBuffer(&sql);
+
+ pstate = ParallelVacuumStart(vopt, parallel);
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+ termPQExpBuffer(&sql);
+ }
+
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ VacOpt vopt = {0};
+
+ init_parallel_vacuum_utils();
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ if (dbname)
+ vopt.dbname = pg_strdup(dbname);
+
+ if (host)
+ vopt.pghost = pg_strdup(host);
+
+ if (port)
+ vopt.pgport = pg_strdup(port);
+
+ if (username)
+ vopt.username = pg_strdup(username);
+
+ if (progname)
+ vopt.progname = pg_strdup(progname);
+
+ vopt.promptPassword = prompt_password;
+
+ on_exit_close_connection(conn);
+
+ /* if table list is not provided then we need to do vacuum for whole DB:
+ get the list of all tables and prepare the list */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, " %s.\"%s\"", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+
+ tables = &dbtables;
+
+ }
+
+ if (analyze_in_stages)
+ {
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")};
+
+ puts(gettext(stage_messages[i]));
+ vopt.analyze_stage = i;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt,
+ progname);
+ }
+ }
+ else
+ {
+ vopt.analyze_stage = -1;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt,
+ progname);
+ }
+
+ PQfinish(conn);
+ }
+
+ void run_command(ParallelState *pstate, char *command)
+ {
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+ /*Listen for worker and get message*/
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before
+ * re-running the loop. If nTerm > 0 we already have that (quick
+ * check).
+ */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+ }
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 687,693 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Wed, Jul 2, 2014 at 11:45 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Jeff Janes wrote:
I would only envision using the parallel feature for vacuumdb after a
pg_upgrade or some other major maintenance window (that is the only
time I ever envision using vacuumdb at all). I don't think autovacuum
can be expected to handle such situations well, as it is designed to
be a smooth background process.
That's a fair point. One thing that would be pretty neat but I don't
think I would get anyone to implement it, is having the user control the
autovacuum launcher in some way. For instance "please vacuum this set
of tables as quickly as possible", and it would launch as many workers
as are configured. It would take months to get a UI settled for this,
however.
This sounds like a better way to have multiple workers working
on vacuuming tables. For vacuum we already have some sort
of infrastructure (vacuum workers) to perform tasks in parallel, so why
not leverage that instead of inventing a new one, even if we assume
that we can reduce the duplicate code.
I don't know how to calibrate the number of lines that is worthwhile.
If you write in C and need to have cross-platform compatibility and
robust error handling, it seems to take hundreds of lines to do much
of anything. The code duplication is a problem, but I don't think
just raw line count is, especially since it has already been written.
Well, there are (at least) two types of duplicate code: first you have
these common routines such as pgpipe that are duplicates for no good
reason. Just move them to src/port or something and it's all good. But
the OP said there is code that cannot be shared even though it's very
similar in both incarnations. That means we cannot (or it's difficult
to) just have one copy, which means as they fix bugs in one copy we need
to update the other.
I briefly checked the duplicate code in both versions, and I think
we could reduce it by a significant amount by making common
functions and passing AH where needed (as an example, I checked the
function ParallelBackupStart(), which is more than 100 lines). If you see
code duplication as the major point against this patch,
then I think it can be ameliorated, or at least it is worth a try.
However, it might be better to achieve this in the way you suggested,
using the autovacuum launcher.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 02 July 2014 23:45, Alvaro Herrera wrote:
Well, there are (at least) two types of duplicate code: first you have these common routines such as pgpipe that are duplicates for no good reason. Just move them to src/port or something and it's all good. But the OP said there is code that cannot be shared even though it's very similar in both incarnations. That means we cannot (or it's difficult to) just have one copy, which means as they fix bugs in one copy we need to update the other. This is bad -- witness the situation with ecpg's copy of date/time code, where there are bugs fixed in the backend version but the ecpg version does not have the fix. It's difficult to keep track of these things.
In the attached patch, I have moved the pgpipe and piperead functions to src/port/pipe.c.
There are some more common functions, which Jeff and Amit also mentioned, that should move to a common place; currently I am not sure where those other functions belong.
Can we move the other parallel functions to src/port as well, perhaps in a new file parallel.c under src/port?
Thanks & Regards,
Dilip Kumar
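To make the shim concrete, here is a minimal POSIX-only sketch (written for illustration; it is not part of any attached patch) of how the pgpipe()/piperead()/pipewrite() names behave once centralized in src/port. On Unix they map straight to pipe()/read()/write(); on Windows the socket-based replacements are used instead so that the returned descriptors can be passed to select().

/* standalone illustration; compiles on POSIX systems */
#include <stdio.h>
#include <unistd.h>

#define pgpipe(a)        pipe(a)
#define piperead(a,b,c)  read(a,b,c)
#define pipewrite(a,b,c) write(a,b,c)

int
main(void)
{
	int		fds[2];
	char	buf[16];

	if (pgpipe(fds) < 0)
		return 1;

	/* messages on the command channel are NUL-terminated strings */
	pipewrite(fds[1], "OK", 3);
	if (piperead(fds[0], buf, sizeof(buf)) > 0)
		printf("received: %s\n", buf);
	return 0;
}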
Attachments:
vacuumdb_parallel_v8.patch
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,239 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+ Number of parallel processes to use for the operation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
***************
*** 36,43 ****
#ifdef WIN32
static unsigned int tMasterThreadId = 0;
static HANDLE termEvent = INVALID_HANDLE_VALUE;
- static int pgpipe(int handles[2]);
- static int piperead(int s, char *buf, int len);
/*
* Structure to hold info passed by _beginthreadex() to the function it calls
--- 36,41 ----
***************
*** 61,69 **** typedef struct
static bool aborting = false;
static volatile sig_atomic_t wantAbort = 0;
- #define pgpipe(a) pipe(a)
- #define piperead(a,b,c) read(a,b,c)
- #define pipewrite(a,b,c) write(a,b,c)
#endif
typedef struct ShutdownInformation
--- 59,64 ----
***************
*** 1315,1417 **** readMessageFromPipe(int fd)
return NULL;
}
-
- #ifdef WIN32
- /*
- * This is a replacement version of pipe for Win32 which allows returned
- * handles to be used in select(). Note that read/write calls must be replaced
- * with recv/send. "handles" have to be integers so we check for errors then
- * cast to integers.
- */
- static int
- pgpipe(int handles[2])
- {
- pgsocket s, tmp_sock;
- struct sockaddr_in serv_addr;
- int len = sizeof(serv_addr);
-
- /* We have to use the Unix socket invalid file descriptor value here. */
- handles[0] = handles[1] = -1;
-
- /*
- * setup listen socket
- */
- if ((s = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
- {
- write_msg(modulename, "pgpipe: could not create socket: error code %d\n",
- WSAGetLastError());
- return -1;
- }
-
- memset((void *) &serv_addr, 0, sizeof(serv_addr));
- serv_addr.sin_family = AF_INET;
- serv_addr.sin_port = htons(0);
- serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
- if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: could not bind: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- if (listen(s, 1) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: could not listen: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: getsockname() failed: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
-
- /*
- * setup pipe handles
- */
- if ((tmp_sock = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
- {
- write_msg(modulename, "pgpipe: could not create second socket: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- handles[1] = (int) tmp_sock;
-
- if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: could not connect socket: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- if ((tmp_sock = accept(s, (SOCKADDR *) &serv_addr, &len)) == PGINVALID_SOCKET)
- {
- write_msg(modulename, "pgpipe: could not accept connection: error code %d\n",
- WSAGetLastError());
- closesocket(handles[1]);
- handles[1] = -1;
- closesocket(s);
- return -1;
- }
- handles[0] = (int) tmp_sock;
-
- closesocket(s);
- return 0;
- }
-
- static int
- piperead(int s, char *buf, int len)
- {
- int ret = recv(s, buf, len, 0);
-
- if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
- /* EOF on the pipe! (win32 socket based implementation) */
- ret = 0;
- return ret;
- }
-
- #endif
--- 1310,1312 ----
*** a/src/bin/scripts/Makefile
--- b/src/bin/scripts/Makefile
***************
*** 32,38 **** dropdb: dropdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq subm
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
--- 32,38 ----
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o vac_parallel.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
***************
*** 65,71 **** uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
--- 65,71 ----
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o vac_parallel.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
*** /dev/null
--- b/src/bin/scripts/vac_parallel.c
***************
*** 0 ****
--- 1,1070 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "vac_parallel.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+ #endif
+
+ #include "common.h"
+
+
+ #define PIPE_READ 0
+ #define PIPE_WRITE 1
+
+ /* file-scope variables */
+ #ifdef WIN32
+ static unsigned int tMasterThreadId = 0;
+ static HANDLE termEvent = INVALID_HANDLE_VALUE;
+ bool parallel_init_done = false;
+ DWORD mainThreadId;
+
+ /*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+ typedef struct
+ {
+ VacOpt *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+ } WorkerInfo;
+
+ #else
+ /*
+ * aborting is only ever used in the master, the workers are fine with just
+ * wantAbort.
+ */
+ static bool aborting = false;
+ static volatile sig_atomic_t wantAbort = 0;
+
+ #endif
+
+ static const char *modulename = gettext_noop("parallel vacuum");
+
+ typedef struct ShutdownInformation
+ {
+ ParallelState *pstate;
+ PGconn *conn;
+ } ShutdownInformation;
+
+ static ShutdownInformation shutdown_info;
+
+ static char *readMessageFromPipe(int fd);
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage);
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2]);
+ static char *
+ getMessageFromMaster(int pipefd[2]);
+ #define messageStartsWith(msg, prefix) \
+ (strncmp(msg, prefix, strlen(prefix)) == 0)
+ #define messageEquals(msg, pattern) \
+ (strcmp(msg, pattern) == 0)
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker);
+
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str);
+
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename, const char *fmt,
+ va_list ap)__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
+
+
+ #ifndef WIN32
+ static void sigTermHandler(int signum);
+ #endif
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate);
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+ void horribly_exit(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+
+ void
+ init_parallel_vacuum_utils(void);
+
+ static void exit_nicely(int code);
+
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate);
+
+ static void checkAborting(void);
+
+
+ static ParallelSlot *
+ GetMyPSlot(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ #ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+ #else
+ if (pstate->parallelSlot[i].pid == getpid())
+ #endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+ }
+
+ /* Sends the error message from the worker to the master process */
+ static void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ {
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+ }
+
+ /*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+ static void
+ WaitForTerminatingWorkers(ParallelState *pstate)
+ {
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+ #ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+ #else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
+ int nrun = 0;
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+ #endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+ }
+
+
+ #ifdef WIN32
+ static unsigned __stdcall
+ init_spawned_worker_win32(WorkerInfo *wi)
+ {
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacOpt *vopt = wi->vopt;
+ ParallelSlot *mySlot = &shutdown_info.pstate->parallelSlot[worker];
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword, vopt->progname,
+ false);
+
+ mySlot->args->connection = conn;
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker, vopt->analyze_stage);
+ _endthreadex(0);
+ return 0;
+ }
+ #endif
+
+
+
+ ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers)
+ {
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+ shutdown_info.pstate = pstate;
+
+ #ifdef WIN32
+ tMasterThreadId = GetCurrentThreadId();
+ termEvent = CreateEvent(NULL, true, false, "Terminate");
+ #else
+ signal(SIGTERM, sigTermHandler);
+ signal(SIGINT, sigTermHandler);
+ signal(SIGQUIT, sigTermHandler);
+ #endif
+
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ #ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+ #else
+ pid_t pid;
+ #endif
+ int pipeMW[2],
+ pipeWM[2];
+
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ horribly_exit(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+ pstate->parallelSlot[i].args->vopt = vopt;
+ #ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+ #else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ pstate->parallelSlot[i].args->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(pstate->parallelSlot[i].args->connection,
+ pipefd, i, vopt->analyze_stage);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ horribly_exit(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+ #endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+ }
+
+ /*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+ void
+ ParallelVacuumEnd(ParallelState *pstate)
+ {
+ int i;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+ /*
+ * Remove the pstate again, so the exit handler in the parent will now
+ * again fall back to closing its own connection (if connected).
+ */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+ }
+
+
+ /*
+ * Find the first free parallel slot (if any).
+ */
+ int
+ GetIdleWorker(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+ }
+
+ /*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+ static bool
+ HasEveryWorkerTerminated(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+ }
+
+ /*
+ * If we have one worker that terminates for some reason, we'd like the other
+ * threads to terminate as well (and not finish with their 70 GB table vacuum
+ * first...). Now in UNIX we can just kill these processes, and let the signal
+ * handler set wantAbort to 1. In Windows we set a termEvent and this serves
+ * as the signal for everyone to terminate.
+ */
+ static void
+ checkAborting(void)
+ {
+ #ifdef WIN32
+ if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
+ #else
+ if (wantAbort)
+ #endif
+ horribly_exit(modulename, "worker is terminating\n");
+ }
+
+ /*
+ * Shut down any remaining workers, this has an implicit do_wait == true.
+ *
+ * The fastest way we can make the workers terminate gracefully is when
+ * they are listening for new commands and we just tell them to terminate.
+ */
+ static void
+ ShutdownWorkersHard(ParallelState *pstate)
+ {
+ #ifndef WIN32
+ int i;
+
+ signal(SIGPIPE, SIG_IGN);
+
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ kill(pstate->parallelSlot[i].pid, SIGTERM);
+ #else
+ /* The workers monitor this event via checkAborting(). */
+ SetEvent(termEvent);
+ #endif
+
+ WaitForTerminatingWorkers(pstate);
+ }
+
+ #ifndef WIN32
+ /* Signal handling (UNIX only) */
+ static void
+ sigTermHandler(int signum)
+ {
+ wantAbort = 1;
+ }
+ #endif
+
+
+
+ /*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage)
+ {
+
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ executeMaintenanceCommand(connection, stage_commands[vacStage], false);
+
+ /*
+ * Enter the worker's command loop.
+ *
+ * We keep the raw connection only so that we can close it
+ * properly when we shut down; that matters when the worker
+ * is brought down because of an error.
+ */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+ }
+
+
+
+ /*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally it will receive a TERMINATE command and
+ * exit.
+ */
+ static void
+ WaitForCommands(PGconn *connection, int pipefd[2])
+ {
+ char *command;
+ PQExpBufferData sql;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+
+ /* check if the master has set the terminate event */
+ checkAborting();
+
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ {
+ initPQExpBuffer(&sql);
+ appendPQExpBuffer(&sql, "ERROR : %s",
+ PQerrorMessage(connection));
+ sendMessageToMaster(pipefd, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ /* command was pg_malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+ }
+
+ /*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+ * DispatchJob WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+ void
+ ListenToWorkers(ParallelState *pstate, bool do_wait)
+ {
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ horribly_exit(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+ ParallelSlot *mySlot = &pstate->parallelSlot[worker];
+
+ mySlot->workerStatus = WRKR_TERMINATED;
+ horribly_exit(modulename,
+ "vacuuming of database \"%s\" failed %s",
+ mySlot->args->vopt->dbname, msg + strlen("ERROR "));
+ }
+ else
+ {
+ horribly_exit(modulename,
+ "invalid message received from worker: %s\n", msg);
+ }
+
+ /* both Unix and Win32 return pg_malloc()ed space, so we free it */
+ free(msg);
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+ int
+ ReapWorkerStatus(ParallelState *pstate, int *status)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+
+ return NO_SLOT;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+ void
+ EnsureIdleWorker(ParallelState *pstate)
+ {
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ horribly_exit(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+ }
+
+
+ /*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+ bool
+ IsEveryWorkerIdle(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+ }
+
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+ void
+ EnsureWorkersFinished(ParallelState *pstate)
+ {
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ horribly_exit(modulename,
+ "error processing a parallel work item\n");
+ }
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+ static char *
+ getMessageFromMaster(int pipefd[2])
+ {
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+ static void
+ sendMessageToMaster(int pipefd[2], const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ horribly_exit(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+ static char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+ {
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ horribly_exit(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+
+ Assert(false);
+ return NULL;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+ static void
+ sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+ #ifndef WIN32
+ if (!aborting)
+ #endif
+ horribly_exit(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+ }
+
+ /*
+ * The underlying function to read a message from the communication channel
+ * (fd) with optional blocking (do_wait).
+ */
+ static char *
+ readMessageFromPipe(int fd)
+ {
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+ * The problem here is that we need to deal with several possibilities: we
+ * could receive only a partial message or several messages at once. The
+ * caller expects us to return exactly one message however.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *) pg_malloc(bufsize);
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+ /* worker has closed the connection or another error happened */
+ if (ret <= 0)
+ return NULL;
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ /* could be any number */
+ bufsize += 16;
+ msg = (char *) realloc(msg, bufsize);
+ }
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ Assert(tMasterThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+ * If we Ctrl-C the master process, it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ horribly_exit(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ void
+ DispatchJob(ParallelState *pstate, char * command)
+ {
+ int worker;
+
+ /* our caller makes sure that at least one worker is idle */
+ Assert(GetIdleWorker(pstate) != NO_SLOT);
+ worker = GetIdleWorker(pstate);
+ Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+ }
+
+ static void
+ shutdown_parallel_vacuum_utils(void)
+ {
+ #ifdef WIN32
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+ #endif
+ }
+
+ void
+ init_parallel_vacuum_utils(void)
+ {
+ #ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ exit_nicely(1);
+ }
+
+ parallel_init_done = true;
+ }
+ #endif
+ }
+
+ void
+ on_exit_close_connection(PGconn *conn)
+ {
+ shutdown_info.conn = conn;
+ }
+
+ /*
+ * Fail and die, with a message to stderr. Parameters as for write_msg.
+ *
+ * This is defined in parallel.c, because in parallel mode, things are more
+ * complicated. If the worker process does horribly_exit(), we forward its
+ * last words to the master process. The master process then does
+ * horribly_exit() with this error message itself and prints it normally.
+ * After printing the message, horribly_exit() on the master will shut down
+ * the remaining worker processes.
+ */
+ void
+ horribly_exit(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+ va_start(ap, fmt);
+
+ if (pstate == NULL)
+ {
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+ }
+ else
+ {
+ slot = GetMyPSlot(pstate);
+ if (!slot)
+ {
+ if (progname)
+ {
+ if (modulename)
+ fprintf(stderr, "%s: [%s] ", progname, _(modulename));
+ else
+ fprintf(stderr, "%s: ", progname);
+ }
+
+ /* We're the parent, just write the message out */
+ vfprintf(stderr, _(fmt), ap);
+ }
+ else
+ {
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ }
+ }
+
+ va_end(ap);
+
+ exit_nicely(1);
+ }
+
+ static void
+ exit_nicely(int code)
+ {
+ if (shutdown_info.pstate)
+ {
+ ParallelSlot *slot = GetMyPSlot(shutdown_info.pstate);
+
+ if (!slot)
+ {
+ /*
+ * We're the master: We have already printed out the message
+ * passed to horribly_exit() either from the master itself or from
+ * a worker process. Now we need to close our own database
+ * connection (if one is open) and
+ * shut down the remaining workers.
+ */
+ PQfinish(shutdown_info.conn);
+ #ifndef WIN32
+
+ /*
+ * Setting aborting to true switches to best-effort-mode
+ * (send/receive but ignore errors) in communicating with our
+ * workers.
+ */
+ aborting = true;
+ #endif
+ ShutdownWorkersHard(shutdown_info.pstate);
+ }
+ else if (slot->args->connection)
+ PQfinish(slot->args->connection);
+
+ }
+ else if (shutdown_info.conn)
+ PQfinish(shutdown_info.conn);
+
+ shutdown_parallel_vacuum_utils();
+ exit(code);
+ }
+
+
*** /dev/null
--- b/src/bin/scripts/vac_parallel.h
***************
*** 0 ****
--- 1,107 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #ifndef VAC_PARALLEL_H
+ #define VAC_PARALLEL_H
+
+ #include "postgres_fe.h"
+ #include <time.h>
+ #include "libpq-fe.h"
+ #include "common.h"
+
+ typedef enum
+ {
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+ } T_WorkerStatus;
+
+
+ typedef struct VacOpt
+ {
+ char *dbname;
+ char *pgport;
+ char *pghost;
+ char *username;
+ char *progname;
+ enum trivalue promptPassword;
+ int analyze_stage;
+ } VacOpt;
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelArgs
+ {
+ PGconn *connection;
+ VacOpt *vopt;
+ } ParallelArgs;
+
+ /* State for each parallel activity slot */
+ typedef struct ParallelSlot
+ {
+ ParallelArgs *args; /* can pass connection handle here */
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+ #ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+ #else
+ pid_t pid;
+ #endif
+ } ParallelSlot;
+
+ #define NO_SLOT (-1)
+
+ typedef struct ParallelState
+ {
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+ } ParallelState;
+
+
+
+ #ifdef WIN32
+ extern bool parallel_init_done;
+ extern DWORD mainThreadId;
+ #endif
+
+ extern const char *progname;
+
+ extern ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers);
+ extern int GetIdleWorker(ParallelState *pstate);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+ extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+ extern void EnsureIdleWorker(ParallelState *pstate);
+ extern void EnsureWorkersFinished(ParallelState *pstate);
+
+ extern void DispatchJob(ParallelState *pstate, char * command);
+ extern void
+ horribly_exit(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+ extern void init_parallel_vacuum_utils(void);
+ extern void on_exit_close_connection(PGconn *conn);
+ extern void ParallelVacuumEnd(ParallelState *pstate);
+
+
+ #endif /* VAC_PARALLEL_H */
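Purely as a reading aid (this sketch is not part of the patch, and vacuum_list_in_parallel is a hypothetical helper name), a master-side caller of the API declared above would look roughly like the following; run_parallel_vacuum() in the vacuumdb.c diff below is the real implementation:

/*
 * Sketch: dispatch one VACUUM command per idle worker, then drain.
 * Assumes vopt and tables were filled in by the caller.
 */
static void
vacuum_list_in_parallel(VacOpt *vopt, SimpleStringList *tables, int jobs)
{
	ParallelState *pstate = ParallelVacuumStart(vopt, jobs);
	SimpleStringListCell *cell;

	for (cell = tables->head; cell; cell = cell->next)
	{
		EnsureIdleWorker(pstate);		/* block until a slot is free */
		DispatchJob(pstate, cell->val);	/* WRKR_IDLE -> WRKR_WORKING */
	}

	EnsureWorkersFinished(pstate);		/* wait for every OK/ERROR reply */
	ParallelVacuumEnd(pstate);
}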
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 11,34 ****
*/
#include "postgres_fe.h"
- #include "common.h"
#include "dumputils.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *table, const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 11,53 ----
*/
#include "postgres_fe.h"
#include "dumputils.h"
+ #include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password,
+ bool echo, int parallel, SimpleStringList *tables);
+
+ void run_command(ParallelState *pstate, char *command);
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt);
+
+ const char *progname = NULL;
int
main(int argc, char *argv[])
***************
*** 49,60 **** main(int argc, char *argv[])
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
};
- const char *progname;
int optindex;
int c;
--- 68,79 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
};
int optindex;
int c;
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 93,106 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 129,134 **** main(int argc, char *argv[])
--- 149,164 ----
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ if (parallel <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 227,233 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, echo, quiet, parallel);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 241,279 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! echo);
! }
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL,
! host, port, username, prompt_password,
! echo);
!
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 298,308 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 397,408 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
! bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
! progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 423,667 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
! echo);
! }
}
PQclear(result);
}
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt)
+ {
+ ParallelState *pstate;
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+
+ initPQExpBuffer(&sql);
+
+ pstate = ParallelVacuumStart(vopt, parallel);
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+ termPQExpBuffer(&sql);
+ }
+
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ bool echo, int parallel, SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ VacOpt vopt = {0};
+
+ init_parallel_vacuum_utils();
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ if (dbname)
+ vopt.dbname = pg_strdup(dbname);
+
+ if (host)
+ vopt.pghost = pg_strdup(host);
+
+ if (port)
+ vopt.pgport = pg_strdup(port);
+
+ if (username)
+ vopt.username = pg_strdup(username);
+
+ if (progname)
+ vopt.progname = pg_strdup(progname);
+
+ vopt.promptPassword = prompt_password;
+
+ on_exit_close_connection(conn);
+
+ /* If the table list is not provided then we need to vacuum the whole DB;
+ get the list of all tables and prepare the list */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, " %s.\"%s\"", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+
+ tables = &dbtables;
+
+ }
+
+ if (analyze_in_stages)
+ {
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")};
+
+ puts(gettext(stage_messages[i]));
+ vopt.analyze_stage = i;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt);
+ }
+ }
+ else
+ {
+ vopt.analyze_stage = -1;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt);
+ }
+
+ PQfinish(conn);
+ }
+
+ void run_command(ParallelState *pstate, char *command)
+ {
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+ /* Listen for the workers and fetch a message */
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before
+ * re-running the loop. If nTerm > 0 we already have that (quick
+ * check).
+ */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+ }
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 681,687 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
*** a/src/include/port.h
--- b/src/include/port.h
***************
*** 216,221 **** extern char *pgwin32_setlocale(int category, const char *locale);
--- 216,241 ----
/* Portable prompt handling */
extern char *simple_prompt(const char *prompt, int maxlen, bool echo);
+ /*
+ * WIN32 doesn't allow descriptors returned by pipe() to be used in select(),
+ * so for that platform we use socket() instead of pipe().
+ * There is some inconsistency here because sometimes we require pg*, like
+ * pgpipe, but in other cases we define rename to pgrename just on Win32.
+ */
+ #ifndef WIN32
+ /*
+ * The function prototypes are not supplied because every C file
+ * includes this file.
+ */
+ #define pgpipe(a) pipe(a)
+ #define piperead(a,b,c) read(a,b,c)
+ #define pipewrite(a,b,c) write(a,b,c)
+ #else
+ extern int pgpipe(int handles[2]);
+ extern int piperead(int s, char *buf, int len);
+ #define pipewrite(a,b,c) send(a,b,c,0)
+ #endif
+
#ifdef WIN32
#define PG_SIGNAL_COUNT 32
#define kill(pid,sig) pgkill(pid,sig)
*** /dev/null
--- b/src/port/pipe.c
***************
*** 0 ****
--- 1,122 ----
+ /*-------------------------------------------------------------------------
+ *
+ * pipe.c
+ * pipe()
+ *
+ * Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ *
+ * This is a replacement version of pipe for Win32 which allows
+ * returned handles to be used in select(). Note that read/write calls
+ * must be replaced with recv/send.
+ *
+ * IDENTIFICATION
+ * src/port/pipe.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "postgres.h"
+ #include "port.h"
+ #include "common/fe_memutils.h"
+
+ #ifdef WIN32
+ /*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send. "handles" have to be integers so we check for errors then
+ * cast to integers.
+ */
+ int
+ pgpipe(int handles[2])
+ {
+ pgsocket s, tmp_sock;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ /* We have to use the Unix socket invalid file descriptor value here. */
+ handles[0] = handles[1] = -1;
+
+ /*
+ * setup listen socket
+ */
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
+ {
+ fprintf(stderr, _("pgpipe: could not create socket: error code %d\n"),
+ WSAGetLastError());
+
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: could not bind: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: could not listen: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: getsockname() failed: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+
+ /*
+ * setup pipe handles
+ */
+ if ((tmp_sock = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
+ {
+ fprintf(stderr, _("pgpipe: could not create second socket: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ handles[1] = (int) tmp_sock;
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: could not connect socket: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ if ((tmp_sock = accept(s, (SOCKADDR *) &serv_addr, &len)) == PGINVALID_SOCKET)
+ {
+ fprintf(stderr, _("pgpipe: could not accept connection: error code %d\n"),
+ WSAGetLastError());
+ closesocket(handles[1]);
+ handles[1] = -1;
+ closesocket(s);
+ return -1;
+ }
+ handles[0] = (int) tmp_sock;
+
+ closesocket(s);
+ return 0;
+ }
+
+ int
+ piperead(int s, char *buf, int len)
+ {
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+ }
+
+ #endif
+
*** a/src/tools/msvc/Mkvcbuild.pm
--- b/src/tools/msvc/Mkvcbuild.pm
***************
*** 71,77 **** sub mkvcbuild
pgcheckdir.c pg_crc.c pgmkdirp.c pgsleep.c pgstrcasecmp.c pqsignal.c
mkdtemp.c qsort.c qsort_arg.c quotes.c system.c
sprompt.c tar.c thread.c getopt.c getopt_long.c dirent.c
! win32env.c win32error.c win32setlocale.c);
push(@pgportfiles, 'rint.c') if ($vsVersion < '12.00');
--- 71,77 ----
pgcheckdir.c pg_crc.c pgmkdirp.c pgsleep.c pgstrcasecmp.c pqsignal.c
mkdtemp.c qsort.c qsort_arg.c quotes.c system.c
sprompt.c tar.c thread.c getopt.c getopt_long.c dirent.c
! win32env.c win32error.c win32setlocale.c pipe.c);
push(@pgportfiles, 'rint.c') if ($vsVersion < '12.00');
On Fri, Jul 4, 2014 at 1:15 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
In attached patch, I have moved pgpipe, piperead functions to src/port/pipe.c
If we want to consider proceeding with this approach, you should
probably separate this into a refactoring patch that doesn't do
anything but move code around and a feature patch that applies on top
of it.
(As to whether this is the right approach, I'm not sure.)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 07 July 2014 17:55, Robert Haas wrote:
On Fri, Jul 4, 2014 at 1:15 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
In the attached patch, I have moved the pgpipe and piperead functions to src/port/pipe.c.
If we want to consider proceeding with this approach, you should probably separate this into a refactoring patch that doesn't do anything but move code around and a feature patch that applies on top of it.
(As to whether this is the right approach, I'm not sure.)
I have done the refactoring of the code.
Two patches are attached:
1. vacuumdb_parallel_refactor.patch --> Moves the pg_dump parallel code to port/parallel_utils.c (almost 800 lines are moved into the common code).
2. vacuumdb_parallel_v9.patch --> Feature changes for parallel vacuumdb (created on top of the first patch).
I think with these changes we are able to address all the concerns we had about the duplicate code.
Thanks & Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_refactor.patch
*** a/src/bin/pg_dump/common.c
--- b/src/bin/pg_dump/common.c
***************
*** 15,21 ****
*/
#include "pg_backup_archiver.h"
#include "pg_backup_utils.h"
!
#include <ctype.h>
#include "catalog/pg_class.h"
--- 15,21 ----
*/
#include "pg_backup_archiver.h"
#include "pg_backup_utils.h"
! #include "parallel_utils.h"
#include <ctype.h>
#include "catalog/pg_class.h"
*** a/src/bin/pg_dump/compress_io.c
--- b/src/bin/pg_dump/compress_io.c
***************
*** 184,190 **** WriteDataToArchive(ArchiveHandle *AH, CompressorState *cs,
const void *data, size_t dLen)
{
/* Are we aborting? */
! checkAborting(AH);
switch (cs->comprAlg)
{
--- 184,190 ----
const void *data, size_t dLen)
{
/* Are we aborting? */
! checkAborting();
switch (cs->comprAlg)
{
***************
*** 351,357 **** ReadDataFromArchiveZlib(ArchiveHandle *AH, ReadFunc readF)
while ((cnt = readF(AH, &buf, &buflen)))
{
/* Are we aborting? */
! checkAborting(AH);
zp->next_in = (void *) buf;
zp->avail_in = cnt;
--- 351,357 ----
while ((cnt = readF(AH, &buf, &buflen)))
{
/* Are we aborting? */
! checkAborting();
zp->next_in = (void *) buf;
zp->avail_in = cnt;
***************
*** 414,420 **** ReadDataFromArchiveNone(ArchiveHandle *AH, ReadFunc readF)
while ((cnt = readF(AH, &buf, &buflen)))
{
/* Are we aborting? */
! checkAborting(AH);
ahwrite(buf, 1, cnt, AH);
}
--- 414,420 ----
while ((cnt = readF(AH, &buf, &buflen)))
{
/* Are we aborting? */
! checkAborting();
ahwrite(buf, 1, cnt, AH);
}
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
***************
*** 20,25 ****
--- 20,26 ----
#include "pg_backup_utils.h"
#include "parallel.h"
+ #include "parallel_utils.h"
#ifndef WIN32
#include <sys/types.h>
***************
*** 35,43 ****
/* file-scope variables */
#ifdef WIN32
static unsigned int tMasterThreadId = 0;
- static HANDLE termEvent = INVALID_HANDLE_VALUE;
- static int pgpipe(int handles[2]);
- static int piperead(int s, char *buf, int len);
/*
* Structure to hold info passed by _beginthreadex() to the function it calls
--- 36,41 ----
***************
*** 53,228 **** typedef struct
} WorkerInfo;
#define pipewrite(a,b,c) send(a,b,c,0)
- #else
- /*
- * aborting is only ever used in the master, the workers are fine with just
- * wantAbort.
- */
- static bool aborting = false;
- static volatile sig_atomic_t wantAbort = 0;
- #define pgpipe(a) pipe(a)
- #define piperead(a,b,c) read(a,b,c)
- #define pipewrite(a,b,c) write(a,b,c)
#endif
- typedef struct ShutdownInformation
- {
- ParallelState *pstate;
- Archive *AHX;
- } ShutdownInformation;
-
- static ShutdownInformation shutdown_info;
-
static const char *modulename = gettext_noop("parallel archiver");
- static ParallelSlot *GetMyPSlot(ParallelState *pstate);
- static void
- parallel_msg_master(ParallelSlot *slot, const char *modulename,
- const char *fmt, va_list ap)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
static void archive_close_connection(int code, void *arg);
- static void ShutdownWorkersHard(ParallelState *pstate);
- static void WaitForTerminatingWorkers(ParallelState *pstate);
- #ifndef WIN32
- static void sigTermHandler(int signum);
- #endif
static void SetupWorker(ArchiveHandle *AH, int pipefd[2], int worker,
RestoreOptions *ropt);
- static bool HasEveryWorkerTerminated(ParallelState *pstate);
static void lockTableNoWait(ArchiveHandle *AH, TocEntry *te);
static void WaitForCommands(ArchiveHandle *AH, int pipefd[2]);
- static char *getMessageFromMaster(int pipefd[2]);
- static void sendMessageToMaster(int pipefd[2], const char *str);
- static int select_loop(int maxFd, fd_set *workerset);
- static char *getMessageFromWorker(ParallelState *pstate,
- bool do_wait, int *worker);
- static void sendMessageToWorker(ParallelState *pstate,
- int worker, const char *str);
- static char *readMessageFromPipe(int fd);
#define messageStartsWith(msg, prefix) \
(strncmp(msg, prefix, strlen(prefix)) == 0)
#define messageEquals(msg, pattern) \
(strcmp(msg, pattern) == 0)
- #ifdef WIN32
- static void shutdown_parallel_dump_utils(int code, void *unused);
- bool parallel_init_done = false;
- static DWORD tls_index;
- DWORD mainThreadId;
- #endif
-
-
- #ifdef WIN32
- static void
- shutdown_parallel_dump_utils(int code, void *unused)
- {
- /* Call the cleanup function only from the main thread */
- if (mainThreadId == GetCurrentThreadId())
- WSACleanup();
- }
- #endif
-
- void
- init_parallel_dump_utils(void)
- {
- #ifdef WIN32
- if (!parallel_init_done)
- {
- WSADATA wsaData;
- int err;
-
- tls_index = TlsAlloc();
- mainThreadId = GetCurrentThreadId();
- err = WSAStartup(MAKEWORD(2, 2), &wsaData);
- if (err != 0)
- {
- fprintf(stderr, _("%s: WSAStartup failed: %d\n"), progname, err);
- exit_nicely(1);
- }
- on_exit_nicely(shutdown_parallel_dump_utils, NULL);
- parallel_init_done = true;
- }
- #endif
- }
-
- static ParallelSlot *
- GetMyPSlot(ParallelState *pstate)
- {
- int i;
-
- for (i = 0; i < pstate->numWorkers; i++)
- #ifdef WIN32
- if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
- #else
- if (pstate->parallelSlot[i].pid == getpid())
- #endif
- return &(pstate->parallelSlot[i]);
-
- return NULL;
- }
-
- /*
- * Fail and die, with a message to stderr. Parameters as for write_msg.
- *
- * This is defined in parallel.c, because in parallel mode, things are more
- * complicated. If the worker process does exit_horribly(), we forward its
- * last words to the master process. The master process then does
- * exit_horribly() with this error message itself and prints it normally.
- * After printing the message, exit_horribly() on the master will shut down
- * the remaining worker processes.
- */
- void
- exit_horribly(const char *modulename, const char *fmt,...)
- {
- va_list ap;
- ParallelState *pstate = shutdown_info.pstate;
- ParallelSlot *slot;
-
- va_start(ap, fmt);
-
- if (pstate == NULL)
- {
- /* Not in parallel mode, just write to stderr */
- vwrite_msg(modulename, fmt, ap);
- }
- else
- {
- slot = GetMyPSlot(pstate);
-
- if (!slot)
- /* We're the parent, just write the message out */
- vwrite_msg(modulename, fmt, ap);
- else
- /* If we're a worker process, send the msg to the master process */
- parallel_msg_master(slot, modulename, fmt, ap);
- }
-
- va_end(ap);
-
- exit_nicely(1);
- }
-
- /* Sends the error message from the worker to the master process */
- static void
- parallel_msg_master(ParallelSlot *slot, const char *modulename,
- const char *fmt, va_list ap)
- {
- char buf[512];
- int pipefd[2];
-
- pipefd[PIPE_READ] = slot->pipeRevRead;
- pipefd[PIPE_WRITE] = slot->pipeRevWrite;
-
- strcpy(buf, "ERROR ");
- vsnprintf(buf + strlen("ERROR "),
- sizeof(buf) - strlen("ERROR "), fmt, ap);
-
- sendMessageToMaster(pipefd, buf);
- }
/*
* A thread-local version of getLocalPQExpBuffer().
--- 51,75 ----
***************
*** 280,286 **** getThreadLocalPQExpBuffer(void)
void
on_exit_close_archive(Archive *AHX)
{
! shutdown_info.AHX = AHX;
on_exit_nicely(archive_close_connection, &shutdown_info);
}
--- 127,133 ----
void
on_exit_close_archive(Archive *AHX)
{
! shutdown_info.handle = (void*)AHX;
on_exit_nicely(archive_close_connection, &shutdown_info);
}
***************
*** 306,312 **** archive_close_connection(int code, void *arg)
* connection (only open during parallel dump but not restore) and
* shut down the remaining workers.
*/
! DisconnectDatabase(si->AHX);
#ifndef WIN32
/*
--- 153,159 ----
* connection (only open during parallel dump but not restore) and
* shut down the remaining workers.
*/
! DisconnectDatabase((Archive*)si->handle);
#ifndef WIN32
/*
***************
*** 318,436 **** archive_close_connection(int code, void *arg)
#endif
ShutdownWorkersHard(si->pstate);
}
! else if (slot->args->AH)
! DisconnectDatabase(&(slot->args->AH->public));
}
! else if (si->AHX)
! DisconnectDatabase(si->AHX);
}
/*
- * If we have one worker that terminates for some reason, we'd like the other
- * threads to terminate as well (and not finish with their 70 GB table dump
- * first...). Now in UNIX we can just kill these processes, and let the signal
- * handler set wantAbort to 1. In Windows we set a termEvent and this serves
- * as the signal for everyone to terminate.
- */
- void
- checkAborting(ArchiveHandle *AH)
- {
- #ifdef WIN32
- if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
- #else
- if (wantAbort)
- #endif
- exit_horribly(modulename, "worker is terminating\n");
- }
-
- /*
- * Shut down any remaining workers, this has an implicit do_wait == true.
- *
- * The fastest way we can make the workers terminate gracefully is when
- * they are listening for new commands and we just tell them to terminate.
- */
- static void
- ShutdownWorkersHard(ParallelState *pstate)
- {
- #ifndef WIN32
- int i;
-
- signal(SIGPIPE, SIG_IGN);
-
- /*
- * Close our write end of the sockets so that the workers know they can
- * exit.
- */
- for (i = 0; i < pstate->numWorkers; i++)
- closesocket(pstate->parallelSlot[i].pipeWrite);
-
- for (i = 0; i < pstate->numWorkers; i++)
- kill(pstate->parallelSlot[i].pid, SIGTERM);
- #else
- /* The workers monitor this event via checkAborting(). */
- SetEvent(termEvent);
- #endif
-
- WaitForTerminatingWorkers(pstate);
- }
-
- /*
- * Wait for the termination of the processes using the OS-specific method.
- */
- static void
- WaitForTerminatingWorkers(ParallelState *pstate)
- {
- while (!HasEveryWorkerTerminated(pstate))
- {
- ParallelSlot *slot = NULL;
- int j;
-
- #ifndef WIN32
- int status;
- pid_t pid = wait(&status);
-
- for (j = 0; j < pstate->numWorkers; j++)
- if (pstate->parallelSlot[j].pid == pid)
- slot = &(pstate->parallelSlot[j]);
- #else
- uintptr_t hThread;
- DWORD ret;
- uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
- int nrun = 0;
-
- for (j = 0; j < pstate->numWorkers; j++)
- if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
- {
- lpHandles[nrun] = pstate->parallelSlot[j].hThread;
- nrun++;
- }
- ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
- Assert(ret != WAIT_FAILED);
- hThread = lpHandles[ret - WAIT_OBJECT_0];
-
- for (j = 0; j < pstate->numWorkers; j++)
- if (pstate->parallelSlot[j].hThread == hThread)
- slot = &(pstate->parallelSlot[j]);
-
- free(lpHandles);
- #endif
- Assert(slot);
-
- slot->workerStatus = WRKR_TERMINATED;
- }
- Assert(HasEveryWorkerTerminated(pstate));
- }
-
- #ifndef WIN32
- /* Signal handling (UNIX only) */
- static void
- sigTermHandler(int signum)
- {
- wantAbort = 1;
- }
- #endif
-
- /*
* This function is called by both UNIX and Windows variants to set up a
* worker process.
*/
--- 165,178 ----
#endif
ShutdownWorkersHard(si->pstate);
}
! else if (((ParallelArgs*)slot->args)->AH)
! DisconnectDatabase(&(((ParallelArgs*)slot->args)->AH->public));
}
! else if ((ArchiveHandle*)si->handle)
! DisconnectDatabase((Archive*)si->handle);
}
/*
* This function is called by both UNIX and Windows variants to set up a
* worker process.
*/
***************
*** 537,544 **** ParallelBackupStart(ArchiveHandle *AH, RestoreOptions *ropt)
pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
! pstate->parallelSlot[i].args->AH = NULL;
! pstate->parallelSlot[i].args->te = NULL;
#ifdef WIN32
/* Allocate a new structure for every worker */
wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
--- 279,286 ----
pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
! ((ParallelArgs*)pstate->parallelSlot[i].args)->AH = NULL;
! ((ParallelArgs*)pstate->parallelSlot[i].args)->te = NULL;
#ifdef WIN32
/* Allocate a new structure for every worker */
wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
***************
*** 581,587 **** ParallelBackupStart(ArchiveHandle *AH, RestoreOptions *ropt)
* and also clones the database connection (for parallel dump)
* which both seem kinda helpful.
*/
! pstate->parallelSlot[i].args->AH = CloneArchive(AH);
/* close read end of Worker -> Master */
closesocket(pipeWM[PIPE_READ]);
--- 323,329 ----
* and also clones the database connection (for parallel dump)
* which both seem kinda helpful.
*/
! ((ParallelArgs*)pstate->parallelSlot[i].args)->AH = CloneArchive(AH);
/* close read end of Worker -> Master */
closesocket(pipeWM[PIPE_READ]);
***************
*** 598,604 **** ParallelBackupStart(ArchiveHandle *AH, RestoreOptions *ropt)
closesocket(pstate->parallelSlot[j].pipeWrite);
}
! SetupWorker(pstate->parallelSlot[i].args->AH, pipefd, i, ropt);
exit(0);
}
--- 340,346 ----
closesocket(pstate->parallelSlot[j].pipeWrite);
}
! SetupWorker(((ParallelArgs*)pstate->parallelSlot[i].args)->AH, pipefd, i, ropt);
exit(0);
}
***************
*** 738,786 **** DispatchJobForTocEntry(ArchiveHandle *AH, ParallelState *pstate, TocEntry *te,
sendMessageToWorker(pstate, worker, arg);
pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
! pstate->parallelSlot[worker].args->te = te;
! }
!
! /*
! * Find the first free parallel slot (if any).
! */
! int
! GetIdleWorker(ParallelState *pstate)
! {
! int i;
!
! for (i = 0; i < pstate->numWorkers; i++)
! if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
! return i;
! return NO_SLOT;
! }
!
! /*
! * Return true iff every worker process is in the WRKR_TERMINATED state.
! */
! static bool
! HasEveryWorkerTerminated(ParallelState *pstate)
! {
! int i;
!
! for (i = 0; i < pstate->numWorkers; i++)
! if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
! return false;
! return true;
! }
!
! /*
! * Return true iff every worker is in the WRKR_IDLE state.
! */
! bool
! IsEveryWorkerIdle(ParallelState *pstate)
! {
! int i;
!
! for (i = 0; i < pstate->numWorkers; i++)
! if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
! return false;
! return true;
}
/*
--- 480,486 ----
sendMessageToWorker(pstate, worker, arg);
pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
! ((ParallelArgs*)pstate->parallelSlot[worker].args)->te = te;
}
/*
***************
*** 966,972 **** ListenToWorkers(ArchiveHandle *AH, ParallelState *pstate, bool do_wait)
TocEntry *te;
pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
! te = pstate->parallelSlot[worker].args->te;
if (messageStartsWith(msg, "OK RESTORE "))
{
statusString = msg + strlen("OK RESTORE ");
--- 666,672 ----
TocEntry *te;
pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
! te = ((ParallelArgs*)pstate->parallelSlot[worker].args)->te;
if (messageStartsWith(msg, "OK RESTORE "))
{
statusString = msg + strlen("OK RESTORE ");
***************
*** 1001,1031 **** ListenToWorkers(ArchiveHandle *AH, ParallelState *pstate, bool do_wait)
/*
* This function is executed in the master process.
*
- * This function is used to get the return value of a terminated worker
- * process. If a process has terminated, its status is stored in *status and
- * the id of the worker is returned.
- */
- int
- ReapWorkerStatus(ParallelState *pstate, int *status)
- {
- int i;
-
- for (i = 0; i < pstate->numWorkers; i++)
- {
- if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
- {
- *status = pstate->parallelSlot[i].status;
- pstate->parallelSlot[i].status = 0;
- pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
- return i;
- }
- }
- return NO_SLOT;
- }
-
- /*
- * This function is executed in the master process.
- *
* It looks for an idle worker process and only returns if there is one.
*/
void
--- 701,706 ----
***************
*** 1089,1417 **** EnsureWorkersFinished(ArchiveHandle *AH, ParallelState *pstate)
}
}
- /*
- * This function is executed in the worker process.
- *
- * It returns the next message on the communication channel, blocking until it
- * becomes available.
- */
- static char *
- getMessageFromMaster(int pipefd[2])
- {
- return readMessageFromPipe(pipefd[PIPE_READ]);
- }
-
- /*
- * This function is executed in the worker process.
- *
- * It sends a message to the master on the communication channel.
- */
- static void
- sendMessageToMaster(int pipefd[2], const char *str)
- {
- int len = strlen(str) + 1;
-
- if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
- exit_horribly(modulename,
- "could not write to the communication channel: %s\n",
- strerror(errno));
- }
-
- /*
- * A select loop that repeats calling select until a descriptor in the read
- * set becomes readable. On Windows we have to check for the termination event
- * from time to time, on Unix we can just block forever.
- */
- static int
- select_loop(int maxFd, fd_set *workerset)
- {
- int i;
- fd_set saveSet = *workerset;
-
- #ifdef WIN32
- /* should always be the master */
- Assert(tMasterThreadId == GetCurrentThreadId());
-
- for (;;)
- {
- /*
- * sleep a quarter of a second before checking if we should terminate.
- */
- struct timeval tv = {0, 250000};
-
- *workerset = saveSet;
- i = select(maxFd + 1, workerset, NULL, NULL, &tv);
-
- if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
- continue;
- if (i)
- break;
- }
- #else /* UNIX */
-
- for (;;)
- {
- *workerset = saveSet;
- i = select(maxFd + 1, workerset, NULL, NULL, NULL);
-
- /*
- * If we Ctrl-C the master process , it's likely that we interrupt
- * select() here. The signal handler will set wantAbort == true and
- * the shutdown journey starts from here. Note that we'll come back
- * here later when we tell all workers to terminate and read their
- * responses. But then we have aborting set to true.
- */
- if (wantAbort && !aborting)
- exit_horribly(modulename, "terminated by user\n");
-
- if (i < 0 && errno == EINTR)
- continue;
- break;
- }
- #endif
-
- return i;
- }
-
-
- /*
- * This function is executed in the master process.
- *
- * It returns the next message from the worker on the communication channel,
- * optionally blocking (do_wait) until it becomes available.
- *
- * The id of the worker is returned in *worker.
- */
- static char *
- getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
- {
- int i;
- fd_set workerset;
- int maxFd = -1;
- struct timeval nowait = {0, 0};
-
- FD_ZERO(&workerset);
-
- for (i = 0; i < pstate->numWorkers; i++)
- {
- if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
- continue;
- FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
- /* actually WIN32 ignores the first parameter to select()... */
- if (pstate->parallelSlot[i].pipeRead > maxFd)
- maxFd = pstate->parallelSlot[i].pipeRead;
- }
-
- if (do_wait)
- {
- i = select_loop(maxFd, &workerset);
- Assert(i != 0);
- }
- else
- {
- if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
- return NULL;
- }
-
- if (i < 0)
- exit_horribly(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
-
- for (i = 0; i < pstate->numWorkers; i++)
- {
- char *msg;
-
- if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
- continue;
-
- msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
- *worker = i;
- return msg;
- }
- Assert(false);
- return NULL;
- }
-
- /*
- * This function is executed in the master process.
- *
- * It sends a message to a certain worker on the communication channel.
- */
- static void
- sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
- {
- int len = strlen(str) + 1;
-
- if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
- {
- /*
- * If we're already aborting anyway, don't care if we succeed or not.
- * The child might have gone already.
- */
- #ifndef WIN32
- if (!aborting)
- #endif
- exit_horribly(modulename,
- "could not write to the communication channel: %s\n",
- strerror(errno));
- }
- }
-
- /*
- * The underlying function to read a message from the communication channel
- * (fd) with optional blocking (do_wait).
- */
- static char *
- readMessageFromPipe(int fd)
- {
- char *msg;
- int msgsize,
- bufsize;
- int ret;
-
- /*
- * The problem here is that we need to deal with several possibilites: we
- * could receive only a partial message or several messages at once. The
- * caller expects us to return exactly one message however.
- *
- * We could either read in as much as we can and keep track of what we
- * delivered back to the caller or we just read byte by byte. Once we see
- * (char) 0, we know that it's the message's end. This would be quite
- * inefficient for more data but since we are reading only on the command
- * channel, the performance loss does not seem worth the trouble of
- * keeping internal states for different file descriptors.
- */
- bufsize = 64; /* could be any number */
- msg = (char *) pg_malloc(bufsize);
-
- msgsize = 0;
- for (;;)
- {
- Assert(msgsize <= bufsize);
- ret = piperead(fd, msg + msgsize, 1);
-
- /* worker has closed the connection or another error happened */
- if (ret <= 0)
- break;
-
- Assert(ret == 1);
-
- if (msg[msgsize] == '\0')
- return msg;
-
- msgsize++;
- if (msgsize == bufsize)
- {
- /* could be any number */
- bufsize += 16;
- msg = (char *) realloc(msg, bufsize);
- }
- }
-
- /*
- * Worker has closed the connection, make sure to clean up before return
- * since we are not returning msg (but did allocate it).
- */
- free(msg);
-
- return NULL;
- }
-
- #ifdef WIN32
- /*
- * This is a replacement version of pipe for Win32 which allows returned
- * handles to be used in select(). Note that read/write calls must be replaced
- * with recv/send. "handles" have to be integers so we check for errors then
- * cast to integers.
- */
- static int
- pgpipe(int handles[2])
- {
- pgsocket s, tmp_sock;
- struct sockaddr_in serv_addr;
- int len = sizeof(serv_addr);
-
- /* We have to use the Unix socket invalid file descriptor value here. */
- handles[0] = handles[1] = -1;
-
- /*
- * setup listen socket
- */
- if ((s = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
- {
- write_msg(modulename, "pgpipe: could not create socket: error code %d\n",
- WSAGetLastError());
- return -1;
- }
-
- memset((void *) &serv_addr, 0, sizeof(serv_addr));
- serv_addr.sin_family = AF_INET;
- serv_addr.sin_port = htons(0);
- serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
- if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: could not bind: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- if (listen(s, 1) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: could not listen: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: getsockname() failed: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
-
- /*
- * setup pipe handles
- */
- if ((tmp_sock = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
- {
- write_msg(modulename, "pgpipe: could not create second socket: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- handles[1] = (int) tmp_sock;
-
- if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: could not connect socket: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- if ((tmp_sock = accept(s, (SOCKADDR *) &serv_addr, &len)) == PGINVALID_SOCKET)
- {
- write_msg(modulename, "pgpipe: could not accept connection: error code %d\n",
- WSAGetLastError());
- closesocket(handles[1]);
- handles[1] = -1;
- closesocket(s);
- return -1;
- }
- handles[0] = (int) tmp_sock;
-
- closesocket(s);
- return 0;
- }
-
- static int
- piperead(int s, char *buf, int len)
- {
- int ret = recv(s, buf, len, 0);
-
- if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
- /* EOF on the pipe! (win32 socket based implementation) */
- ret = 0;
- return ret;
- }
-
- #endif
--- 764,766 ----
*** a/src/bin/pg_dump/parallel.h
--- b/src/bin/pg_dump/parallel.h
***************
*** 20,37 ****
#define PG_DUMP_PARALLEL_H
#include "pg_backup_db.h"
struct _archiveHandle;
struct _tocEntry;
- typedef enum
- {
- WRKR_TERMINATED = 0,
- WRKR_IDLE,
- WRKR_WORKING,
- WRKR_FINISHED
- } T_WorkerStatus;
-
/* Arguments needed for a worker process */
typedef struct ParallelArgs
{
--- 20,30 ----
#define PG_DUMP_PARALLEL_H
#include "pg_backup_db.h"
+ #include "parallel_utils.h"
struct _archiveHandle;
struct _tocEntry;
/* Arguments needed for a worker process */
typedef struct ParallelArgs
{
***************
*** 39,81 **** typedef struct ParallelArgs
struct _tocEntry *te;
} ParallelArgs;
- /* State for each parallel activity slot */
- typedef struct ParallelSlot
- {
- ParallelArgs *args;
- T_WorkerStatus workerStatus;
- int status;
- int pipeRead;
- int pipeWrite;
- int pipeRevRead;
- int pipeRevWrite;
- #ifdef WIN32
- uintptr_t hThread;
- unsigned int threadId;
- #else
- pid_t pid;
- #endif
- } ParallelSlot;
-
- #define NO_SLOT (-1)
-
- typedef struct ParallelState
- {
- int numWorkers;
- ParallelSlot *parallelSlot;
- } ParallelState;
-
- #ifdef WIN32
- extern bool parallel_init_done;
- extern DWORD mainThreadId;
- #endif
-
- extern void init_parallel_dump_utils(void);
-
- extern int GetIdleWorker(ParallelState *pstate);
- extern bool IsEveryWorkerIdle(ParallelState *pstate);
extern void ListenToWorkers(struct _archiveHandle * AH, ParallelState *pstate, bool do_wait);
- extern int ReapWorkerStatus(ParallelState *pstate, int *status);
extern void EnsureIdleWorker(struct _archiveHandle * AH, ParallelState *pstate);
extern void EnsureWorkersFinished(struct _archiveHandle * AH, ParallelState *pstate);
--- 32,38 ----
***************
*** 86,95 **** extern void DispatchJobForTocEntry(struct _archiveHandle * AH,
struct _tocEntry * te, T_Action act);
extern void ParallelBackupEnd(struct _archiveHandle * AH, ParallelState *pstate);
- extern void checkAborting(struct _archiveHandle * AH);
-
- extern void
- exit_horribly(const char *modulename, const char *fmt,...)
- __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
-
#endif /* PG_DUMP_PARALLEL_H */
--- 43,46 ----
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 3825,3834 **** get_next_work_item(ArchiveHandle *AH, TocEntry *ready_list,
if (pref_non_data)
{
int count = 0;
!
for (k = 0; k < pstate->numWorkers; k++)
! if (pstate->parallelSlot[k].args->te != NULL &&
! pstate->parallelSlot[k].args->te->section == SECTION_DATA)
count++;
if (pstate->numWorkers == 0 || count * 4 < pstate->numWorkers)
pref_non_data = false;
--- 3825,3834 ----
if (pref_non_data)
{
int count = 0;
!
for (k = 0; k < pstate->numWorkers; k++)
! if (((ParallelArgs*)pstate->parallelSlot[k].args)->te != NULL &&
! ((ParallelArgs*)pstate->parallelSlot[k].args)->te->section == SECTION_DATA)
count++;
if (pstate->numWorkers == 0 || count * 4 < pstate->numWorkers)
pref_non_data = false;
***************
*** 3852,3858 **** get_next_work_item(ArchiveHandle *AH, TocEntry *ready_list,
if (pstate->parallelSlot[i].workerStatus != WRKR_WORKING)
continue;
! running_te = pstate->parallelSlot[i].args->te;
if (has_lock_conflicts(te, running_te) ||
has_lock_conflicts(running_te, te))
--- 3852,3858 ----
if (pstate->parallelSlot[i].workerStatus != WRKR_WORKING)
continue;
! running_te = ((ParallelArgs*)pstate->parallelSlot[i].args)->te;
if (has_lock_conflicts(te, running_te) ||
has_lock_conflicts(running_te, te))
***************
*** 3926,3932 **** mark_work_done(ArchiveHandle *AH, TocEntry *ready_list,
{
TocEntry *te = NULL;
! te = pstate->parallelSlot[worker].args->te;
if (te == NULL)
exit_horribly(modulename, "could not find slot of finished worker\n");
--- 3926,3932 ----
{
TocEntry *te = NULL;
! te = ((ParallelArgs*)pstate->parallelSlot[worker].args)->te;
if (te == NULL)
exit_horribly(modulename, "could not find slot of finished worker\n");
*** a/src/bin/pg_dump/pg_backup_directory.c
--- b/src/bin/pg_dump/pg_backup_directory.c
***************
*** 35,42 ****
#include "compress_io.h"
#include "pg_backup_utils.h"
#include "parallel.h"
-
#include <dirent.h>
#include <sys/stat.h>
--- 35,42 ----
#include "compress_io.h"
#include "pg_backup_utils.h"
+ #include "parallel_utils.h"
#include "parallel.h"
#include <dirent.h>
#include <sys/stat.h>
***************
*** 356,362 **** _WriteData(ArchiveHandle *AH, const void *data, size_t dLen)
lclContext *ctx = (lclContext *) AH->formatData;
/* Are we aborting? */
! checkAborting(AH);
if (dLen > 0 && cfwrite(data, dLen, ctx->dataFH) != dLen)
WRITE_ERROR_EXIT;
--- 356,362 ----
lclContext *ctx = (lclContext *) AH->formatData;
/* Are we aborting? */
! checkAborting();
if (dLen > 0 && cfwrite(data, dLen, ctx->dataFH) != dLen)
WRITE_ERROR_EXIT;
***************
*** 524,530 **** _WriteBuf(ArchiveHandle *AH, const void *buf, size_t len)
lclContext *ctx = (lclContext *) AH->formatData;
/* Are we aborting? */
! checkAborting(AH);
if (cfwrite(buf, len, ctx->dataFH) != len)
WRITE_ERROR_EXIT;
--- 524,530 ----
lclContext *ctx = (lclContext *) AH->formatData;
/* Are we aborting? */
! checkAborting();
if (cfwrite(buf, len, ctx->dataFH) != len)
WRITE_ERROR_EXIT;
*** a/src/bin/pg_dump/pg_backup_utils.c
--- b/src/bin/pg_dump/pg_backup_utils.c
***************
*** 15,33 ****
#include "pg_backup_utils.h"
#include "parallel.h"
!
/* Globals exported by this file */
const char *progname = NULL;
- #define MAX_ON_EXIT_NICELY 20
-
- static struct
- {
- on_exit_nicely_callback function;
- void *arg;
- } on_exit_nicely_list[MAX_ON_EXIT_NICELY];
-
- static int on_exit_nicely_index;
/*
* Parse a --section=foo command line argument.
--- 15,24 ----
#include "pg_backup_utils.h"
#include "parallel.h"
! #include "parallel_utils.h"
/* Globals exported by this file */
const char *progname = NULL;
/*
* Parse a --section=foo command line argument.
***************
*** 60,126 **** set_dump_section(const char *arg, int *dumpSections)
}
- /*
- * Write a printf-style message to stderr.
- *
- * The program name is prepended, if "progname" has been set.
- * Also, if modulename isn't NULL, that's included too.
- * Note that we'll try to translate the modulename and the fmt string.
- */
- void
- write_msg(const char *modulename, const char *fmt,...)
- {
- va_list ap;
-
- va_start(ap, fmt);
- vwrite_msg(modulename, fmt, ap);
- va_end(ap);
- }
-
- /*
- * As write_msg, but pass a va_list not variable arguments.
- */
- void
- vwrite_msg(const char *modulename, const char *fmt, va_list ap)
- {
- if (progname)
- {
- if (modulename)
- fprintf(stderr, "%s: [%s] ", progname, _(modulename));
- else
- fprintf(stderr, "%s: ", progname);
- }
- vfprintf(stderr, _(fmt), ap);
- }
-
- /* Register a callback to be run when exit_nicely is invoked. */
- void
- on_exit_nicely(on_exit_nicely_callback function, void *arg)
- {
- if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
- exit_horribly(NULL, "out of on_exit_nicely slots\n");
- on_exit_nicely_list[on_exit_nicely_index].function = function;
- on_exit_nicely_list[on_exit_nicely_index].arg = arg;
- on_exit_nicely_index++;
- }
-
- /*
- * Run accumulated on_exit_nicely callbacks in reverse order and then exit
- * quietly. This needs to be thread-safe.
- */
- void
- exit_nicely(int code)
- {
- int i;
-
- for (i = on_exit_nicely_index - 1; i >= 0; i--)
- (*on_exit_nicely_list[i].function) (code,
- on_exit_nicely_list[i].arg);
-
- #ifdef WIN32
- if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
- ExitThread(code);
- #endif
-
- exit(code);
- }
--- 51,53 ----
*** a/src/bin/pg_dump/pg_backup_utils.h
--- b/src/bin/pg_dump/pg_backup_utils.h
***************
*** 23,40 **** typedef enum /* bits returned by set_dump_section */
DUMP_UNSECTIONED = 0xff
} DumpSections;
- typedef void (*on_exit_nicely_callback) (int code, void *arg);
-
- extern const char *progname;
-
extern void set_dump_section(const char *arg, int *dumpSections);
- extern void
- write_msg(const char *modulename, const char *fmt,...)
- __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
- extern void
- vwrite_msg(const char *modulename, const char *fmt, va_list ap)
- __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 0)));
- extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
- extern void exit_nicely(int code) __attribute__((noreturn));
-
#endif /* PG_BACKUP_UTILS_H */
--- 23,27 ----
*** /dev/null
--- b/src/include/parallel_utils.h
***************
*** 0 ****
--- 1,176 ----
+ /*-------------------------------------------------------------------------
+ *
+ * parallel_utils.h
+ * Header for src/port/ parallel execution functions.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/parallel_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+ #ifndef PARALLEL_UTILS_H
+ #define PARALLEL_UTILS_H
+
+ #include <ctype.h>
+ #include <netdb.h>
+ #include <pwd.h>
+
+ /*
+ * WIN32 doesn't allow descriptors returned by pipe() to be used in select(),
+ * so for that platform we use socket() instead of pipe().
+ * There is some inconsistency here because sometimes we require pg*, like
+ * pgpipe, but in other cases we define rename to pgrename just on Win32.
+ */
+ #ifndef WIN32
+ /*
+ * The function prototypes are not supplied because every C file
+ * includes this file.
+ */
+ #define pgpipe(a) pipe(a)
+ #define piperead(a,b,c) read(a,b,c)
+ #define pipewrite(a,b,c) write(a,b,c)
+ #else
+ extern int pgpipe(int handles[2]);
+ extern int piperead(int s, char *buf, int len);
+ #define pipewrite(a,b,c) send(a,b,c,0)
+ #endif
+
+ #ifdef WIN32
+ #define PG_SIGNAL_COUNT 32
+ #define kill(pid,sig) pgkill(pid,sig)
+ extern int pgkill(int pid, int sig);
+ #endif
+
+ #ifdef WIN32
+ extern DWORD mainThreadId;
+ extern bool parallel_init_done;
+ extern HANDLE termEvent;
+
+ #endif
+
+
+ typedef enum
+ {
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+ } T_WorkerStatus;
+
+ #define PIPE_READ 0
+ #define PIPE_WRITE 1
+
+ #define NO_SLOT (-1)
+
+ /* State for each parallel activity slot */
+ typedef struct ParallelSlot
+ {
+ void *args;
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+ #ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+ #else
+ pid_t pid;
+ #endif
+ } ParallelSlot;
+
+ typedef struct ParallelState
+ {
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+ } ParallelState;
+
+ typedef struct ShutdownInformation
+ {
+ ParallelState *pstate;
+ void *handle;
+ } ShutdownInformation;
+
+ extern ShutdownInformation shutdown_info;
+
+ #define MAX_ON_EXIT_NICELY 20
+
+ typedef void (*on_exit_nicely_callback) (int code, void *arg);
+
+ typedef struct on_exit_nicely_stru
+ {
+ on_exit_nicely_callback function;
+ void *arg;
+ }on_exit_nicely_stru;
+
+ extern on_exit_nicely_stru on_exit_nicely_list[MAX_ON_EXIT_NICELY];
+
+ extern int on_exit_nicely_index;
+
+
+ #ifdef WIN32
+ extern bool parallel_init_done;
+ extern DWORD tls_index;
+
+ #else
+ extern bool aborting;
+
+ #endif
+
+ extern const char *progname;
+
+ extern char *
+ readMessageFromPipe(int fd);
+
+ void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
+
+ extern void ShutdownWorkersHard(ParallelState *pstate);
+ extern void WaitForTerminatingWorkers(ParallelState *pstate);
+
+ extern bool HasEveryWorkerTerminated(ParallelState *pstate);
+
+ extern char *getMessageFromMaster(int pipefd[2]);
+ extern void sendMessageToMaster(int pipefd[2], const char *str);
+ extern int select_loop(int maxFd, fd_set *workerset);
+ extern char *getMessageFromWorker(ParallelState *pstate,
+ bool do_wait, int *worker);
+ extern void sendMessageToWorker(ParallelState *pstate,
+ int worker, const char *str);
+ extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+
+ extern ParallelSlot *
+ GetMyPSlot(ParallelState *pstate);
+
+ extern void init_parallel_dump_utils(void);
+ extern int GetIdleWorker(ParallelState *pstate);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void checkAborting(void);
+
+ extern void
+ write_msg(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+ extern void
+ vwrite_msg(const char *modulename, const char *fmt, va_list ap)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 0)));
+ extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
+ extern void exit_nicely(int code) __attribute__((noreturn));
+
+ extern void
+ exit_horribly(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+ #ifndef WIN32
+ extern void sigTermHandler(int signum);
+ #endif
+
+ #ifdef WIN32
+ extern void shutdown_parallel_dump_utils(int code, void *unused);
+ #endif
+
+ #endif /* PARALLEL_UTILS_H */
*** a/src/port/Makefile
--- b/src/port/Makefile
***************
*** 33,39 **** LIBS += $(PTHREAD_LIBS)
OBJS = $(LIBOBJS) chklocale.o dirmod.o erand48.o fls.o inet_net_ntop.o \
noblock.o path.o pgcheckdir.o pg_crc.o pgmkdirp.o pgsleep.o \
pgstrcasecmp.o pqsignal.o \
! qsort.o qsort_arg.o quotes.o sprompt.o tar.o thread.o
# foo_srv.o and foo.o are both built from foo.c, but only foo.o has -DFRONTEND
OBJS_SRV = $(OBJS:%.o=%_srv.o)
--- 33,39 ----
OBJS = $(LIBOBJS) chklocale.o dirmod.o erand48.o fls.o inet_net_ntop.o \
noblock.o path.o pgcheckdir.o pg_crc.o pgmkdirp.o pgsleep.o \
pgstrcasecmp.o pqsignal.o \
! qsort.o qsort_arg.o quotes.o sprompt.o tar.o thread.o parallel_utils.o
# foo_srv.o and foo.o are both built from foo.c, but only foo.o has -DFRONTEND
OBJS_SRV = $(OBJS:%.o=%_srv.o)
*** /dev/null
--- b/src/port/parallel_utils.c
***************
*** 0 ****
--- 1,737 ----
+ /*-------------------------------------------------------------------------
+ *
+ * parallel_utils.c
+ * Utility functions for supporting parallel execution in client tools
+ *
+ * Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/port/parallel_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "postgres.h"
+ #include "parallel_utils.h"
+ #include "common/fe_memutils.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+
+ #endif
+
+
+ #ifdef WIN32
+ DWORD mainThreadId;
+ bool parallel_init_done = false;
+ HANDLE termEvent = INVALID_HANDLE_VALUE;
+ DWORD tls_index;
+ #else
+ bool aborting = false;
+ static volatile sig_atomic_t wantAbort = 0;
+
+ #endif
+
+ static const char *modulename = gettext_noop("parallel utils");
+
+ ShutdownInformation shutdown_info;
+ int on_exit_nicely_index;
+ on_exit_nicely_stru on_exit_nicely_list[MAX_ON_EXIT_NICELY];
+
+
+
+ /*
+ * The underlying function to read a message from the communication channel
+ * (fd), blocking until a complete message or EOF arrives.
+ */
+ char *
+ readMessageFromPipe(int fd)
+ {
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+ * The problem here is that we need to deal with several possibilities: we
+ * could receive only a partial message or several messages at once. The
+ * caller expects us to return exactly one message however.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *)malloc(bufsize);
+ if (!msg)
+ {
+ fprintf(stderr, _("out of memory\n"));
+ exit(1);
+ }
+
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+ /* worker has closed the connection or another error happened */
+ if (ret <= 0)
+ {
+ /* free the partial message so the early return does not leak it */
+ free(msg);
+ return NULL;
+ }
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ /* could be any number */
+ bufsize += 16;
+ msg = (char *) realloc(msg, bufsize);
+ }
+ }
+ }
+
+ #ifdef WIN32
+ /*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send. "handles" have to be integers so we check for errors then
+ * cast to integers.
+ */
+ int
+ pgpipe(int handles[2])
+ {
+ pgsocket s, tmp_sock;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ /* We have to use the Unix socket invalid file descriptor value here. */
+ handles[0] = handles[1] = -1;
+
+ /*
+ * setup listen socket
+ */
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
+ {
+ fprintf(stderr, _("pgpipe: could not create socket: error code %d\n"),
+ WSAGetLastError());
+
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: could not bind: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: could not listen: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: getsockname() failed: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+
+ /*
+ * setup pipe handles
+ */
+ if ((tmp_sock = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
+ {
+ fprintf(stderr, _("pgpipe: could not create second socket: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ handles[1] = (int) tmp_sock;
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: could not connect socket: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ if ((tmp_sock = accept(s, (SOCKADDR *) &serv_addr, &len)) == PGINVALID_SOCKET)
+ {
+ fprintf(stderr, _("pgpipe: could not accept connection: error code %d\n"),
+ WSAGetLastError());
+ closesocket(handles[1]);
+ handles[1] = -1;
+ closesocket(s);
+ return -1;
+ }
+ handles[0] = (int) tmp_sock;
+
+ closesocket(s);
+ return 0;
+ }
+
+ int
+ piperead(int s, char *buf, int len)
+ {
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+ }
+
+ #endif
+
+
+ #ifdef WIN32
+ void
+ shutdown_parallel_dump_utils(int code, void *unused)
+ {
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+ }
+ #endif
+
+ void
+ init_parallel_dump_utils(void)
+ {
+ #ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ tls_index = TlsAlloc();
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ fprintf(stderr, _("%s: WSAStartup failed: %d\n"), progname, err);
+ exit_nicely(1);
+ }
+ on_exit_nicely(shutdown_parallel_dump_utils, NULL);
+ parallel_init_done = true;
+ }
+ #endif
+ }
+
+ ParallelSlot *
+ GetMyPSlot(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ #ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+ #else
+ if (pstate->parallelSlot[i].pid == getpid())
+ #endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+ }
+
+
+ /*
+ * Fail and die, with a message to stderr. Parameters as for write_msg.
+ *
+ * This is defined in parallel.c, because in parallel mode, things are more
+ * complicated. If the worker process does exit_horribly(), we forward its
+ * last words to the master process. The master process then does
+ * exit_horribly() with this error message itself and prints it normally.
+ * After printing the message, exit_horribly() on the master will shut down
+ * the remaining worker processes.
+ */
+ void
+ exit_horribly(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+
+ va_start(ap, fmt);
+
+ if (pstate == NULL)
+ {
+ /* Not in parallel mode, just write to stderr */
+ vwrite_msg(modulename, fmt, ap);
+ }
+ else
+ {
+ slot = GetMyPSlot(pstate);
+
+ if (!slot)
+ /* We're the parent, just write the message out */
+ vwrite_msg(modulename, fmt, ap);
+ else
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ }
+
+ va_end(ap);
+
+ exit_nicely(1);
+ }
+
+ /* Sends the error message from the worker to the master process */
+ void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ {
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+ }
+
+ /*
+ * Shut down any remaining workers, this has an implicit do_wait == true.
+ *
+ * The fastest way we can make the workers terminate gracefully is when
+ * they are listening for new commands and we just tell them to terminate.
+ */
+ void
+ ShutdownWorkersHard(ParallelState *pstate)
+ {
+ int i;
+
+ #ifndef WIN32
+ signal(SIGPIPE, SIG_IGN);
+
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ kill(pstate->parallelSlot[i].pid, SIGTERM);
+ #else
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ /* The workers monitor this event via checkAborting(). */
+ SetEvent(termEvent);
+ #endif
+
+ WaitForTerminatingWorkers(pstate);
+ }
+
+
+ /*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+ void
+ WaitForTerminatingWorkers(ParallelState *pstate)
+ {
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+ #ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+ #else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles;
+ int nrun = 0;
+
+ lpHandles = malloc(sizeof(HANDLE) * pstate->numWorkers);
+ if (!lpHandles)
+ {
+ fprintf(stderr, _("out of memory\n"));
+ exit(1);
+ }
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+ #endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+ }
+
+ #ifndef WIN32
+ /* Signal handling (UNIX only) */
+ void
+ sigTermHandler(int signum)
+ {
+ wantAbort = 1;
+ }
+ #endif
+
+ /*
+ * Find the first free parallel slot (if any).
+ */
+ int
+ GetIdleWorker(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+ }
+
+ /*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+ bool
+ HasEveryWorkerTerminated(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+ }
+
+ /*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+ bool
+ IsEveryWorkerIdle(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+ char *
+ getMessageFromMaster(int pipefd[2])
+ {
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+ void
+ sendMessageToMaster(int pipefd[2], const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ Assert(tMasterThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+ * If we Ctrl-C the master process , it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ exit_horribly(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+ char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+ {
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ exit_horribly(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+ Assert(false);
+ return NULL;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+ void
+ sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+ #ifndef WIN32
+ if (!aborting)
+ #endif
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+ int
+ ReapWorkerStatus(ParallelState *pstate, int *status)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+ return NO_SLOT;
+ }
+
+ /*
+ * If we have one worker that terminates for some reason, we'd like the other
+ * threads to terminate as well (and not finish with their 70 GB table dump
+ * first...). Now in UNIX we can just kill these processes, and let the signal
+ * handler set wantAbort to 1. In Windows we set a termEvent and this serves
+ * as the signal for everyone to terminate.
+ */
+ void
+ checkAborting(void)
+ {
+ #ifdef WIN32
+ if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
+ #else
+ if (wantAbort)
+ #endif
+ exit_horribly(modulename, "worker is terminating\n");
+ }
+
+
+ /*
+ * Write a printf-style message to stderr.
+ *
+ * The program name is prepended, if "progname" has been set.
+ * Also, if modulename isn't NULL, that's included too.
+ * Note that we'll try to translate the modulename and the fmt string.
+ */
+ void
+ write_msg(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+
+ va_start(ap, fmt);
+ vwrite_msg(modulename, fmt, ap);
+ va_end(ap);
+ }
+
+ /*
+ * As write_msg, but pass a va_list not variable arguments.
+ */
+ void
+ vwrite_msg(const char *modulename, const char *fmt, va_list ap)
+ {
+ if (progname)
+ {
+ if (modulename)
+ fprintf(stderr, "%s: [%s] ", progname, _(modulename));
+ else
+ fprintf(stderr, "%s: ", progname);
+ }
+ vfprintf(stderr, _(fmt), ap);
+ }
+
+ /* Register a callback to be run when exit_nicely is invoked. */
+ void
+ on_exit_nicely(on_exit_nicely_callback function, void *arg)
+ {
+ if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
+ exit_horribly(NULL, "out of on_exit_nicely slots\n");
+ on_exit_nicely_list[on_exit_nicely_index].function = function;
+ on_exit_nicely_list[on_exit_nicely_index].arg = arg;
+ on_exit_nicely_index++;
+ }
+
+ /*
+ * Run accumulated on_exit_nicely callbacks in reverse order and then exit
+ * quietly. This needs to be thread-safe.
+ */
+ void
+ exit_nicely(int code)
+ {
+ int i;
+
+ for (i = on_exit_nicely_index - 1; i >= 0; i--)
+ (*on_exit_nicely_list[i].function) (code,
+ on_exit_nicely_list[i].arg);
+
+ #ifdef WIN32
+ if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
+ ExitThread(code);
+ #endif
+
+ exit(code);
+ }
+
*** a/src/tools/msvc/Mkvcbuild.pm
--- b/src/tools/msvc/Mkvcbuild.pm
***************
*** 71,77 **** sub mkvcbuild
pgcheckdir.c pg_crc.c pgmkdirp.c pgsleep.c pgstrcasecmp.c pqsignal.c
mkdtemp.c qsort.c qsort_arg.c quotes.c system.c
sprompt.c tar.c thread.c getopt.c getopt_long.c dirent.c
! win32env.c win32error.c win32setlocale.c);
push(@pgportfiles, 'rint.c') if ($vsVersion < '12.00');
--- 71,77 ----
pgcheckdir.c pg_crc.c pgmkdirp.c pgsleep.c pgstrcasecmp.c pqsignal.c
mkdtemp.c qsort.c qsort_arg.c quotes.c system.c
sprompt.c tar.c thread.c getopt.c getopt_long.c dirent.c
! win32env.c win32error.c win32setlocale.c parallel_utils.c);
push(@pgportfiles, 'rint.c') if ($vsVersion < '12.00');
vacuumdb_parallel_v9.patchapplication/octet-stream; name=vacuumdb_parallel_v9.patchDownload
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,239 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+ Number of parallel processes to use for the operation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/Makefile
--- b/src/bin/scripts/Makefile
***************
*** 32,38 **** dropdb: dropdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq subm
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
--- 32,38 ----
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o vac_parallel.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
***************
*** 65,71 **** uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
--- 65,71 ----
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o vac_parallel.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
*** /dev/null
--- b/src/bin/scripts/vac_parallel.c
***************
*** 0 ****
--- 1,498 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "vac_parallel.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+ #endif
+ #include "common.h"
+
+ /* file-scope variables */
+ #ifdef WIN32
+ static unsigned int tMasterThreadId = 0;
+
+ /*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+ typedef struct
+ {
+ VacOpt *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+ } WorkerInfo;
+ #endif
+
+ static const char *modulename = gettext_noop("parallel vacuum");
+
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage);
+
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2]);
+
+ static void
+ vacuum_close_connection(int code, void *arg);
+
+ #define messageStartsWith(msg, prefix) \
+ (strncmp(msg, prefix, strlen(prefix)) == 0)
+ #define messageEquals(msg, pattern) \
+ (strcmp(msg, pattern) == 0)
+
+
+ #ifdef WIN32
+ static unsigned __stdcall
+ init_spawned_worker_win32(WorkerInfo *wi)
+ {
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacOpt *vopt = wi->vopt;
+ ParallelSlot *mySlot = &shutdown_info.pstate->parallelSlot[worker];
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword, vopt->progname,
+ false);
+
+ ((ParallelArgs*)mySlot->args)->connection = conn;
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker, vopt->analyze_stage);
+ _endthreadex(0);
+ return 0;
+ }
+ #endif
+
+
+
+ ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers)
+ {
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+ shutdown_info.pstate = pstate;
+
+ #ifdef WIN32
+ tMasterThreadId = GetCurrentThreadId();
+ termEvent = CreateEvent(NULL, true, false, "Terminate");
+ #else
+ signal(SIGTERM, sigTermHandler);
+ signal(SIGINT, sigTermHandler);
+ signal(SIGQUIT, sigTermHandler);
+ #endif
+
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ #ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+ #else
+ pid_t pid;
+ #endif
+ int pipeMW[2],
+ pipeWM[2];
+
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ exit_horribly(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+ ((ParallelArgs*)pstate->parallelSlot[i].args)->vopt = vopt;
+ #ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+ #else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ ((ParallelArgs*)pstate->parallelSlot[i].args)->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(((ParallelArgs*)pstate->parallelSlot[i].args)->connection,
+ pipefd, i, vopt->analyze_stage);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ exit_horribly(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+ #endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+ }
+
+ /*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+ void
+ ParallelVacuumEnd(ParallelState *pstate)
+ {
+ int i;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+ /*
+ * Remove the pstate again, so the exit handler in the parent will
+ * fall back to closing the plain database connection (if connected).
+ */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+ }
+
+ /*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage)
+ {
+
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ if (vacStage >= 0)
+ executeMaintenanceCommand(connection, stage_commands[vacStage], false);
+
+ /*
+ * Enter the command loop and wait for work from the master.
+ *
+ * We keep the raw connection only so that we can close it properly
+ * when we shut down, in particular when the worker is brought down
+ * because of an error.
+ */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+ }
+
+
+
+ /*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally it will receive a TERMINATE command and
+ * exit.
+ */
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2])
+ {
+ char *command;
+ PQExpBufferData sql;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+
+ /* check if the master has set the terminate event */
+ checkAborting();
+
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ {
+ initPQExpBuffer(&sql);
+ appendPQExpBuffer(&sql, "ERROR : %s",
+ PQerrorMessage(connection));
+ sendMessageToMaster(pipefd, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ /* command was pg_malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+ }
+
+ /*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+ * DispatchJob WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+ void
+ ListenToWorkers(ParallelState *pstate, bool do_wait)
+ {
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ exit_horribly(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+ ParallelSlot *mySlot = &pstate->parallelSlot[worker];
+
+ mySlot->workerStatus = WRKR_TERMINATED;
+ exit_horribly(modulename,
+ "vacuuming of database \"%s\" failed %s",
+ ((ParallelArgs*)mySlot->args)->vopt->dbname, msg + strlen("ERROR "));
+ }
+ else
+ {
+ exit_horribly(modulename,
+ "invalid message received from worker: %s\n", msg);
+ }
+
+ /* both Unix and Win32 return pg_malloc()ed space, so we free it */
+ free(msg);
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+ void
+ EnsureIdleWorker(ParallelState *pstate)
+ {
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ exit_horribly(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+ void
+ EnsureWorkersFinished(ParallelState *pstate)
+ {
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ exit_horribly(modulename,
+ "error processing a parallel work item\n");
+ }
+ }
+
+ void
+ DispatchJob(ParallelState *pstate, char * command)
+ {
+ int worker;
+
+ /* our caller makes sure that at least one worker is idle */
+ Assert(GetIdleWorker(pstate) != NO_SLOT);
+ worker = GetIdleWorker(pstate);
+ Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+ }
+
+ void
+ on_exit_close_vacuum(PGconn *conn)
+ {
+ shutdown_info.handle = (void*)conn;
+ on_exit_nicely(vacuum_close_connection, &shutdown_info);
+ }
+
+ /*
+ * This function can close the database connection in both the parallel
+ * and non-parallel case.
+ */
+ static void
+ vacuum_close_connection(int code, void *arg)
+ {
+ ShutdownInformation *si = (ShutdownInformation *) arg;
+
+ if (si->pstate)
+ {
+ ParallelSlot *slot = GetMyPSlot(si->pstate);
+
+ if (!slot)
+ {
+ PQfinish((PGconn*)si->handle);
+ #ifndef WIN32
+
+ /*
+ * Setting aborting to true switches to best-effort-mode
+ * (send/receive but ignore errors) in communicating with our
+ * workers.
+ */
+ aborting = true;
+ #endif
+ ShutdownWorkersHard(si->pstate);
+ }
+ else if (((ParallelArgs*)slot->args)->connection)
+ PQfinish((((ParallelArgs*)slot->args)->connection));
+ }
+ else if ((PGconn*)si->handle)
+ PQfinish((PGconn*)si->handle);
+ }
+
*** /dev/null
--- b/src/bin/scripts/vac_parallel.h
***************
*** 0 ****
--- 1,58 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #ifndef VAC_PARALLEL_H
+ #define VAC_PARALLEL_H
+
+ #include "postgres_fe.h"
+ #include <time.h>
+ #include "libpq-fe.h"
+ #include "common.h"
+ #include "parallel_utils.h"
+
+
+ typedef struct VacOpt
+ {
+ char *dbname;
+ char *pgport;
+ char *pghost;
+ char *username;
+ char *progname;
+ enum trivalue promptPassword;
+ int analyze_stage;
+ }VacOpt;
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelArgs
+ {
+ PGconn *connection;
+ VacOpt *vopt;
+ } ParallelArgs;
+
+
+ extern ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+ extern void EnsureIdleWorker(ParallelState *pstate);
+ extern void EnsureWorkersFinished(ParallelState *pstate);
+
+ extern void DispatchJob(ParallelState *pstate, char * command);
+ extern void on_exit_close_vacuum(PGconn *conn);
+ extern void ParallelVacuumEnd(ParallelState *pstate);
+
+ #endif /* VAC_PARALLEL_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 13,34 ****
#include "postgres_fe.h"
#include "common.h"
#include "dumputils.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *table, const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 13,54 ----
#include "postgres_fe.h"
#include "common.h"
#include "dumputils.h"
+ #include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password,
+ bool echo, int parallel, SimpleStringList *tables);
+
+ void run_command(ParallelState *pstate, char *command);
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt);
+
+ const char *progname = NULL;
int
main(int argc, char *argv[])
***************
*** 49,60 **** main(int argc, char *argv[])
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
};
- const char *progname;
int optindex;
int c;
--- 69,80 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
};
int optindex;
int c;
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 94,107 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 129,134 **** main(int argc, char *argv[])
--- 150,165 ----
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ if (parallel <= 0)
+ {
+ fprintf(stderr, _("%s: number of parallel jobs must be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 172,178 ----
}
}
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 228,234 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, echo, quiet, parallel);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 242,280 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! echo);
! }
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL,
! host, port, username, prompt_password,
! echo);
!
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 299,309 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 398,409 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
! bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
! progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 424,668 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
! echo);
! }
}
PQclear(result);
}
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt)
+ {
+ ParallelState *pstate;
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+
+ initPQExpBuffer(&sql);
+
+ pstate = ParallelVacuumStart(vopt, parallel);
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+ termPQExpBuffer(&sql);
+ }
+
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ bool echo, int parallel, SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ VacOpt vopt = {0};
+
+ init_parallel_dump_utils();
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ if (dbname)
+ vopt.dbname = pg_strdup(dbname);
+
+ if (host)
+ vopt.pghost = pg_strdup(host);
+
+ if (port)
+ vopt.pgport = pg_strdup(port);
+
+ if (username)
+ vopt.username = pg_strdup(username);
+
+ if (progname)
+ vopt.progname = pg_strdup(progname);
+
+ vopt.promptPassword = prompt_password;
+
+ on_exit_close_vacuum(conn);
+
+ /* If a table list is not provided then we need to vacuum the whole DB:
+ get the list of all tables and prepare the list. */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, " %s.\"%s\"", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+
+ tables = &dbtables;
+
+ }
+
+ if (analyze_in_stages)
+ {
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")};
+
+ puts(gettext(stage_messages[i]));
+ vopt.analyze_stage = i;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt);
+ }
+ }
+ else
+ {
+ vopt.analyze_stage = -1;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt);
+ }
+
+ PQfinish(conn);
+ }
+
+ void run_command(ParallelState *pstate, char *command)
+ {
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+ /* Listen for a worker and get its message */
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before
+ * re-running the loop. If nTerm > 0 we already have that (quick
+ * check).
+ */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+ }
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 682,688 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 27 June 2014 02:57, Jeff wrote:
Based on that, I find most importantly that it doesn't seem to
correctly vacuum tables which have upper case letters in the name,
because it does not quote the table names when they need quotes.

Thanks for your comments.

There are two problems:
First -> When vacuuming the complete database, any table with an upper-case letter in its name caused an error.
--FIXED by adding quotes around the table name.
Second -> When the user passes a table using the -t option and its name has an upper-case letter,
--this is an existing problem (present even without the parallel implementation).
Just for the record, I don't think the second one is actually a bug.
If someone uses the -t option from the command line, they are required to
provide the quotes if quotes are needed, just like they would need to
in psql. That can be annoying to do from a shell, as you then need to
protect the quotes themselves from the shell, but that is the way it
is.
vacuumdb -t '"CrAzY QuOtE"'
or
vacuumdb -t \"CrAzY\ QuOtE\"
Cheers,
Jeff
On Tue, Jul 1, 2014 at 6:25 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 01 July 2014 03:48, Alvaro wrote:
In particular, pgpipe is almost an exact duplicate between them,
except the copy in vac_parallel.c has fallen behind changes made to
parallel.c. (Those changes would have fixed the Windows warnings.)

I think that this function (and perhaps other parts as
well -- "exit_horribly" for example) needs to be refactored into a common
file that both files can include. I don't know where the best place
for that would be, though. (I haven't done this type of refactoring
myself.)

I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
Maybe we should move pgpipe back to src/port and have pg_dump and this
new thing use that. I'm not sure about the rest of the duplication in
vac_parallel.c; there might be a lot in common with what
pg_dump/parallel.c does too. Having two copies of code is frowned upon
for good reasons. This patch introduces 1200 lines of new code in
vac_parallel.c, ugh.

If we really require 1200 lines to get parallel vacuum working for
vacuumdb, I would question the wisdom of this effort. To me, it seems
better spent improving autovacuum to cover whatever it is that this
patch is supposed to be good for --- or maybe just enable having a
shell script that launches multiple vacuumdb instances in parallel ...

Thanks for looking into the patch.

I think if we use a shell script to launch parallel vacuumdb instances, we cannot get complete control over dividing the task:
if we directly divide the tables between multiple processes, some process may get only the very big tables, and the result is then no better than a single process doing all the work.

In this patch we assign only one table at a time to each process, and whichever process finishes first is assigned a new table, so all processes get an equal share of the task. (A sketch of this dispatch loop follows.)
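To make that concrete, here is a minimal sketch of the master's side, built from the helpers in the posted patch (ParallelVacuumStart, EnsureIdleWorker, DispatchJob, EnsureWorkersFinished, ParallelVacuumEnd); error handling and the VACUUM option flags are left out:

    ParallelState *pstate = ParallelVacuumStart(&vopt, parallel);
    SimpleStringListCell *cell;
    PQExpBufferData sql;

    initPQExpBuffer(&sql);
    for (cell = tables->head; cell; cell = cell->next)
    {
        EnsureIdleWorker(pstate);           /* block until a slot is WRKR_IDLE */
        printfPQExpBuffer(&sql, "VACUUM %s;", cell->val);
        DispatchJob(pstate, sql.data);      /* hand the table to that worker */
    }
    EnsureWorkersFinished(pstate);          /* wait for the stragglers */
    ParallelVacuumEnd(pstate);
    termPQExpBuffer(&sql);

Because a worker becomes idle only when its previous table is done, a process that drew small tables keeps coming back for more, while one stuck on a huge table does not hold anybody else up.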
I am late to this game, but the first thing to my mind was - do we
really need the whole forking/threading thing on the client at all? We
need it for things like pg_dump/pg_restore because they can themselves
benefit from parallelism at the client level, but for something like
this, might the code become a lot simpler if we just use multiple
database connections and async queries? That would also bring the
benefit of less platform dependent code, less cleanup needs etc.
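To illustrate what that could look like, here is a hypothetical sketch (not from any posted patch) that drives N plain connections with libpq's asynchronous API; conns[], pending[], next_vacuum_command() and have_more_tables() are placeholder names, and connection setup and error handling are omitted:

    /* issue one command per connection, then multiplex with select() */
    for (i = 0; i < nconn; i++)
    {
        PQsendQuery(conns[i], next_vacuum_command());
        pending[i] = true;
    }
    busy = nconn;

    while (busy > 0)
    {
        fd_set      rset;
        int         maxfd = -1;
        PGresult   *res;

        FD_ZERO(&rset);
        for (i = 0; i < nconn; i++)
            if (pending[i])
            {
                FD_SET(PQsocket(conns[i]), &rset);
                if (PQsocket(conns[i]) > maxfd)
                    maxfd = PQsocket(conns[i]);
            }
        select(maxfd + 1, &rset, NULL, NULL, NULL);

        for (i = 0; i < nconn; i++)
        {
            if (!pending[i] || !FD_ISSET(PQsocket(conns[i]), &rset))
                continue;
            PQconsumeInput(conns[i]);
            if (PQisBusy(conns[i]))
                continue;           /* result not fully arrived yet */
            while ((res = PQgetResult(conns[i])) != NULL)
                PQclear(res);       /* drain the finished command */
            if (have_more_tables())
                PQsendQuery(conns[i], next_vacuum_command());
            else
            {
                pending[i] = false;
                busy--;
            }
        }
    }

The same table-at-a-time load balancing falls out naturally: whichever connection finishes first gets the next command, without any fork/thread machinery.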
(Oh, and in my quick review I also noticed - you added
quoting of the table name, but forgot to do it for the schema name.
You should probably also look at using something like
quote_identifier(), that'll make things easier).
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 15 July 2014 19:01, Magnus Hagander wrote:
I am late to this game, but the first thing to my mind was - do we
really need the whole forking/threading thing on the client at all? We
need it for things like pg_dump/pg_restore because they can themselves
benefit from parallelism at the client level, but for something like
this, might the code become a lot simpler if we just use multiple
database connections and async queries? That would also bring the
benefit of less platform dependent code, less cleanup needs etc.
Thanks for the review. I understand your point, but I think if we do this directly with independent connections,
it's difficult to divide the jobs equally between them.
With this implementation we are able to share the load between the processes quite well:
1. If one process finishes its work faster, it can take on more of the load.
2. Especially while vacuuming a whole database, it's very difficult to divide the load without centralized control.
For the above reasons, I think we can keep this patch.
(Oh, and in my quick review I also noticed - you added
quoting of the table name, but forgot to do it for the schema name.
You should probably also look at using something like
quote_identifier(), that'll make things easier).
Thanks for the comments, I have attached the updated patch.
vacuumdb_parallel_refactor --> No change
vacuumdb_parallel_v9 --> Quotes added for namespace
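A sketch of the quoting pattern (assuming fmtId() from dumputils.c, which the scripts already link against; this is illustrative, not a verbatim excerpt from the patch, and since fmtId() reuses a shared buffer the schema part must be appended before calling it again for the relation):

    appendPQExpBuffer(&sql, " %s.", fmtId(nspace));
    appendPQExpBufferStr(&sql, fmtId(relName));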
Thanks & Regards,
Dilip
Attachments:
vacuumdb_parallel_refactor.patch (application/octet-stream)
*** a/src/bin/pg_dump/common.c
--- b/src/bin/pg_dump/common.c
***************
*** 15,21 ****
*/
#include "pg_backup_archiver.h"
#include "pg_backup_utils.h"
!
#include <ctype.h>
#include "catalog/pg_class.h"
--- 15,21 ----
*/
#include "pg_backup_archiver.h"
#include "pg_backup_utils.h"
! #include "parallel_utils.h"
#include <ctype.h>
#include "catalog/pg_class.h"
*** a/src/bin/pg_dump/compress_io.c
--- b/src/bin/pg_dump/compress_io.c
***************
*** 184,190 **** WriteDataToArchive(ArchiveHandle *AH, CompressorState *cs,
const void *data, size_t dLen)
{
/* Are we aborting? */
! checkAborting(AH);
switch (cs->comprAlg)
{
--- 184,190 ----
const void *data, size_t dLen)
{
/* Are we aborting? */
! checkAborting();
switch (cs->comprAlg)
{
***************
*** 351,357 **** ReadDataFromArchiveZlib(ArchiveHandle *AH, ReadFunc readF)
while ((cnt = readF(AH, &buf, &buflen)))
{
/* Are we aborting? */
! checkAborting(AH);
zp->next_in = (void *) buf;
zp->avail_in = cnt;
--- 351,357 ----
while ((cnt = readF(AH, &buf, &buflen)))
{
/* Are we aborting? */
! checkAborting();
zp->next_in = (void *) buf;
zp->avail_in = cnt;
***************
*** 414,420 **** ReadDataFromArchiveNone(ArchiveHandle *AH, ReadFunc readF)
while ((cnt = readF(AH, &buf, &buflen)))
{
/* Are we aborting? */
! checkAborting(AH);
ahwrite(buf, 1, cnt, AH);
}
--- 414,420 ----
while ((cnt = readF(AH, &buf, &buflen)))
{
/* Are we aborting? */
! checkAborting();
ahwrite(buf, 1, cnt, AH);
}
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
***************
*** 20,25 ****
--- 20,26 ----
#include "pg_backup_utils.h"
#include "parallel.h"
+ #include "parallel_utils.h"
#ifndef WIN32
#include <sys/types.h>
***************
*** 35,43 ****
/* file-scope variables */
#ifdef WIN32
static unsigned int tMasterThreadId = 0;
- static HANDLE termEvent = INVALID_HANDLE_VALUE;
- static int pgpipe(int handles[2]);
- static int piperead(int s, char *buf, int len);
/*
* Structure to hold info passed by _beginthreadex() to the function it calls
--- 36,41 ----
***************
*** 53,228 **** typedef struct
} WorkerInfo;
#define pipewrite(a,b,c) send(a,b,c,0)
- #else
- /*
- * aborting is only ever used in the master, the workers are fine with just
- * wantAbort.
- */
- static bool aborting = false;
- static volatile sig_atomic_t wantAbort = 0;
- #define pgpipe(a) pipe(a)
- #define piperead(a,b,c) read(a,b,c)
- #define pipewrite(a,b,c) write(a,b,c)
#endif
- typedef struct ShutdownInformation
- {
- ParallelState *pstate;
- Archive *AHX;
- } ShutdownInformation;
-
- static ShutdownInformation shutdown_info;
-
static const char *modulename = gettext_noop("parallel archiver");
- static ParallelSlot *GetMyPSlot(ParallelState *pstate);
- static void
- parallel_msg_master(ParallelSlot *slot, const char *modulename,
- const char *fmt, va_list ap)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
static void archive_close_connection(int code, void *arg);
- static void ShutdownWorkersHard(ParallelState *pstate);
- static void WaitForTerminatingWorkers(ParallelState *pstate);
- #ifndef WIN32
- static void sigTermHandler(int signum);
- #endif
static void SetupWorker(ArchiveHandle *AH, int pipefd[2], int worker,
RestoreOptions *ropt);
- static bool HasEveryWorkerTerminated(ParallelState *pstate);
static void lockTableNoWait(ArchiveHandle *AH, TocEntry *te);
static void WaitForCommands(ArchiveHandle *AH, int pipefd[2]);
- static char *getMessageFromMaster(int pipefd[2]);
- static void sendMessageToMaster(int pipefd[2], const char *str);
- static int select_loop(int maxFd, fd_set *workerset);
- static char *getMessageFromWorker(ParallelState *pstate,
- bool do_wait, int *worker);
- static void sendMessageToWorker(ParallelState *pstate,
- int worker, const char *str);
- static char *readMessageFromPipe(int fd);
#define messageStartsWith(msg, prefix) \
(strncmp(msg, prefix, strlen(prefix)) == 0)
#define messageEquals(msg, pattern) \
(strcmp(msg, pattern) == 0)
- #ifdef WIN32
- static void shutdown_parallel_dump_utils(int code, void *unused);
- bool parallel_init_done = false;
- static DWORD tls_index;
- DWORD mainThreadId;
- #endif
-
-
- #ifdef WIN32
- static void
- shutdown_parallel_dump_utils(int code, void *unused)
- {
- /* Call the cleanup function only from the main thread */
- if (mainThreadId == GetCurrentThreadId())
- WSACleanup();
- }
- #endif
-
- void
- init_parallel_dump_utils(void)
- {
- #ifdef WIN32
- if (!parallel_init_done)
- {
- WSADATA wsaData;
- int err;
-
- tls_index = TlsAlloc();
- mainThreadId = GetCurrentThreadId();
- err = WSAStartup(MAKEWORD(2, 2), &wsaData);
- if (err != 0)
- {
- fprintf(stderr, _("%s: WSAStartup failed: %d\n"), progname, err);
- exit_nicely(1);
- }
- on_exit_nicely(shutdown_parallel_dump_utils, NULL);
- parallel_init_done = true;
- }
- #endif
- }
-
- static ParallelSlot *
- GetMyPSlot(ParallelState *pstate)
- {
- int i;
-
- for (i = 0; i < pstate->numWorkers; i++)
- #ifdef WIN32
- if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
- #else
- if (pstate->parallelSlot[i].pid == getpid())
- #endif
- return &(pstate->parallelSlot[i]);
-
- return NULL;
- }
-
- /*
- * Fail and die, with a message to stderr. Parameters as for write_msg.
- *
- * This is defined in parallel.c, because in parallel mode, things are more
- * complicated. If the worker process does exit_horribly(), we forward its
- * last words to the master process. The master process then does
- * exit_horribly() with this error message itself and prints it normally.
- * After printing the message, exit_horribly() on the master will shut down
- * the remaining worker processes.
- */
- void
- exit_horribly(const char *modulename, const char *fmt,...)
- {
- va_list ap;
- ParallelState *pstate = shutdown_info.pstate;
- ParallelSlot *slot;
-
- va_start(ap, fmt);
-
- if (pstate == NULL)
- {
- /* Not in parallel mode, just write to stderr */
- vwrite_msg(modulename, fmt, ap);
- }
- else
- {
- slot = GetMyPSlot(pstate);
-
- if (!slot)
- /* We're the parent, just write the message out */
- vwrite_msg(modulename, fmt, ap);
- else
- /* If we're a worker process, send the msg to the master process */
- parallel_msg_master(slot, modulename, fmt, ap);
- }
-
- va_end(ap);
-
- exit_nicely(1);
- }
-
- /* Sends the error message from the worker to the master process */
- static void
- parallel_msg_master(ParallelSlot *slot, const char *modulename,
- const char *fmt, va_list ap)
- {
- char buf[512];
- int pipefd[2];
-
- pipefd[PIPE_READ] = slot->pipeRevRead;
- pipefd[PIPE_WRITE] = slot->pipeRevWrite;
-
- strcpy(buf, "ERROR ");
- vsnprintf(buf + strlen("ERROR "),
- sizeof(buf) - strlen("ERROR "), fmt, ap);
-
- sendMessageToMaster(pipefd, buf);
- }
/*
* A thread-local version of getLocalPQExpBuffer().
--- 51,75 ----
***************
*** 280,286 **** getThreadLocalPQExpBuffer(void)
void
on_exit_close_archive(Archive *AHX)
{
! shutdown_info.AHX = AHX;
on_exit_nicely(archive_close_connection, &shutdown_info);
}
--- 127,133 ----
void
on_exit_close_archive(Archive *AHX)
{
! shutdown_info.handle = (void*)AHX;
on_exit_nicely(archive_close_connection, &shutdown_info);
}
***************
*** 306,312 **** archive_close_connection(int code, void *arg)
* connection (only open during parallel dump but not restore) and
* shut down the remaining workers.
*/
! DisconnectDatabase(si->AHX);
#ifndef WIN32
/*
--- 153,159 ----
* connection (only open during parallel dump but not restore) and
* shut down the remaining workers.
*/
! DisconnectDatabase((Archive*)si->handle);
#ifndef WIN32
/*
***************
*** 318,436 **** archive_close_connection(int code, void *arg)
#endif
ShutdownWorkersHard(si->pstate);
}
! else if (slot->args->AH)
! DisconnectDatabase(&(slot->args->AH->public));
}
! else if (si->AHX)
! DisconnectDatabase(si->AHX);
}
/*
- * If we have one worker that terminates for some reason, we'd like the other
- * threads to terminate as well (and not finish with their 70 GB table dump
- * first...). Now in UNIX we can just kill these processes, and let the signal
- * handler set wantAbort to 1. In Windows we set a termEvent and this serves
- * as the signal for everyone to terminate.
- */
- void
- checkAborting(ArchiveHandle *AH)
- {
- #ifdef WIN32
- if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
- #else
- if (wantAbort)
- #endif
- exit_horribly(modulename, "worker is terminating\n");
- }
-
- /*
- * Shut down any remaining workers, this has an implicit do_wait == true.
- *
- * The fastest way we can make the workers terminate gracefully is when
- * they are listening for new commands and we just tell them to terminate.
- */
- static void
- ShutdownWorkersHard(ParallelState *pstate)
- {
- #ifndef WIN32
- int i;
-
- signal(SIGPIPE, SIG_IGN);
-
- /*
- * Close our write end of the sockets so that the workers know they can
- * exit.
- */
- for (i = 0; i < pstate->numWorkers; i++)
- closesocket(pstate->parallelSlot[i].pipeWrite);
-
- for (i = 0; i < pstate->numWorkers; i++)
- kill(pstate->parallelSlot[i].pid, SIGTERM);
- #else
- /* The workers monitor this event via checkAborting(). */
- SetEvent(termEvent);
- #endif
-
- WaitForTerminatingWorkers(pstate);
- }
-
- /*
- * Wait for the termination of the processes using the OS-specific method.
- */
- static void
- WaitForTerminatingWorkers(ParallelState *pstate)
- {
- while (!HasEveryWorkerTerminated(pstate))
- {
- ParallelSlot *slot = NULL;
- int j;
-
- #ifndef WIN32
- int status;
- pid_t pid = wait(&status);
-
- for (j = 0; j < pstate->numWorkers; j++)
- if (pstate->parallelSlot[j].pid == pid)
- slot = &(pstate->parallelSlot[j]);
- #else
- uintptr_t hThread;
- DWORD ret;
- uintptr_t *lpHandles = pg_malloc(sizeof(HANDLE) * pstate->numWorkers);
- int nrun = 0;
-
- for (j = 0; j < pstate->numWorkers; j++)
- if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
- {
- lpHandles[nrun] = pstate->parallelSlot[j].hThread;
- nrun++;
- }
- ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
- Assert(ret != WAIT_FAILED);
- hThread = lpHandles[ret - WAIT_OBJECT_0];
-
- for (j = 0; j < pstate->numWorkers; j++)
- if (pstate->parallelSlot[j].hThread == hThread)
- slot = &(pstate->parallelSlot[j]);
-
- free(lpHandles);
- #endif
- Assert(slot);
-
- slot->workerStatus = WRKR_TERMINATED;
- }
- Assert(HasEveryWorkerTerminated(pstate));
- }
-
- #ifndef WIN32
- /* Signal handling (UNIX only) */
- static void
- sigTermHandler(int signum)
- {
- wantAbort = 1;
- }
- #endif
-
- /*
* This function is called by both UNIX and Windows variants to set up a
* worker process.
*/
--- 165,178 ----
#endif
ShutdownWorkersHard(si->pstate);
}
! else if (((ParallelArgs*)slot->args)->AH)
! DisconnectDatabase(&(((ParallelArgs*)slot->args)->AH->public));
}
! else if ((ArchiveHandle*)si->handle)
! DisconnectDatabase((Archive*)si->handle);
}
/*
* This function is called by both UNIX and Windows variants to set up a
* worker process.
*/
***************
*** 537,544 **** ParallelBackupStart(ArchiveHandle *AH, RestoreOptions *ropt)
pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
! pstate->parallelSlot[i].args->AH = NULL;
! pstate->parallelSlot[i].args->te = NULL;
#ifdef WIN32
/* Allocate a new structure for every worker */
wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
--- 279,286 ----
pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
! ((ParallelArgs*)pstate->parallelSlot[i].args)->AH = NULL;
! ((ParallelArgs*)pstate->parallelSlot[i].args)->te = NULL;
#ifdef WIN32
/* Allocate a new structure for every worker */
wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
***************
*** 581,587 **** ParallelBackupStart(ArchiveHandle *AH, RestoreOptions *ropt)
* and also clones the database connection (for parallel dump)
* which both seem kinda helpful.
*/
! pstate->parallelSlot[i].args->AH = CloneArchive(AH);
/* close read end of Worker -> Master */
closesocket(pipeWM[PIPE_READ]);
--- 323,329 ----
* and also clones the database connection (for parallel dump)
* which both seem kinda helpful.
*/
! ((ParallelArgs*)pstate->parallelSlot[i].args)->AH = CloneArchive(AH);
/* close read end of Worker -> Master */
closesocket(pipeWM[PIPE_READ]);
***************
*** 598,604 **** ParallelBackupStart(ArchiveHandle *AH, RestoreOptions *ropt)
closesocket(pstate->parallelSlot[j].pipeWrite);
}
! SetupWorker(pstate->parallelSlot[i].args->AH, pipefd, i, ropt);
exit(0);
}
--- 340,346 ----
closesocket(pstate->parallelSlot[j].pipeWrite);
}
! SetupWorker(((ParallelArgs*)pstate->parallelSlot[i].args)->AH, pipefd, i, ropt);
exit(0);
}
***************
*** 738,786 **** DispatchJobForTocEntry(ArchiveHandle *AH, ParallelState *pstate, TocEntry *te,
sendMessageToWorker(pstate, worker, arg);
pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
! pstate->parallelSlot[worker].args->te = te;
! }
!
! /*
! * Find the first free parallel slot (if any).
! */
! int
! GetIdleWorker(ParallelState *pstate)
! {
! int i;
!
! for (i = 0; i < pstate->numWorkers; i++)
! if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
! return i;
! return NO_SLOT;
! }
!
! /*
! * Return true iff every worker process is in the WRKR_TERMINATED state.
! */
! static bool
! HasEveryWorkerTerminated(ParallelState *pstate)
! {
! int i;
!
! for (i = 0; i < pstate->numWorkers; i++)
! if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
! return false;
! return true;
! }
!
! /*
! * Return true iff every worker is in the WRKR_IDLE state.
! */
! bool
! IsEveryWorkerIdle(ParallelState *pstate)
! {
! int i;
!
! for (i = 0; i < pstate->numWorkers; i++)
! if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
! return false;
! return true;
}
/*
--- 480,486 ----
sendMessageToWorker(pstate, worker, arg);
pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
! ((ParallelArgs*)pstate->parallelSlot[worker].args)->te = te;
}
/*
***************
*** 966,972 **** ListenToWorkers(ArchiveHandle *AH, ParallelState *pstate, bool do_wait)
TocEntry *te;
pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
! te = pstate->parallelSlot[worker].args->te;
if (messageStartsWith(msg, "OK RESTORE "))
{
statusString = msg + strlen("OK RESTORE ");
--- 666,672 ----
TocEntry *te;
pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
! te = ((ParallelArgs*)pstate->parallelSlot[worker].args)->te;
if (messageStartsWith(msg, "OK RESTORE "))
{
statusString = msg + strlen("OK RESTORE ");
***************
*** 1001,1031 **** ListenToWorkers(ArchiveHandle *AH, ParallelState *pstate, bool do_wait)
/*
* This function is executed in the master process.
*
- * This function is used to get the return value of a terminated worker
- * process. If a process has terminated, its status is stored in *status and
- * the id of the worker is returned.
- */
- int
- ReapWorkerStatus(ParallelState *pstate, int *status)
- {
- int i;
-
- for (i = 0; i < pstate->numWorkers; i++)
- {
- if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
- {
- *status = pstate->parallelSlot[i].status;
- pstate->parallelSlot[i].status = 0;
- pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
- return i;
- }
- }
- return NO_SLOT;
- }
-
- /*
- * This function is executed in the master process.
- *
* It looks for an idle worker process and only returns if there is one.
*/
void
--- 701,706 ----
***************
*** 1089,1417 **** EnsureWorkersFinished(ArchiveHandle *AH, ParallelState *pstate)
}
}
- /*
- * This function is executed in the worker process.
- *
- * It returns the next message on the communication channel, blocking until it
- * becomes available.
- */
- static char *
- getMessageFromMaster(int pipefd[2])
- {
- return readMessageFromPipe(pipefd[PIPE_READ]);
- }
-
- /*
- * This function is executed in the worker process.
- *
- * It sends a message to the master on the communication channel.
- */
- static void
- sendMessageToMaster(int pipefd[2], const char *str)
- {
- int len = strlen(str) + 1;
-
- if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
- exit_horribly(modulename,
- "could not write to the communication channel: %s\n",
- strerror(errno));
- }
-
- /*
- * A select loop that repeats calling select until a descriptor in the read
- * set becomes readable. On Windows we have to check for the termination event
- * from time to time, on Unix we can just block forever.
- */
- static int
- select_loop(int maxFd, fd_set *workerset)
- {
- int i;
- fd_set saveSet = *workerset;
-
- #ifdef WIN32
- /* should always be the master */
- Assert(tMasterThreadId == GetCurrentThreadId());
-
- for (;;)
- {
- /*
- * sleep a quarter of a second before checking if we should terminate.
- */
- struct timeval tv = {0, 250000};
-
- *workerset = saveSet;
- i = select(maxFd + 1, workerset, NULL, NULL, &tv);
-
- if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
- continue;
- if (i)
- break;
- }
- #else /* UNIX */
-
- for (;;)
- {
- *workerset = saveSet;
- i = select(maxFd + 1, workerset, NULL, NULL, NULL);
-
- /*
- * If we Ctrl-C the master process , it's likely that we interrupt
- * select() here. The signal handler will set wantAbort == true and
- * the shutdown journey starts from here. Note that we'll come back
- * here later when we tell all workers to terminate and read their
- * responses. But then we have aborting set to true.
- */
- if (wantAbort && !aborting)
- exit_horribly(modulename, "terminated by user\n");
-
- if (i < 0 && errno == EINTR)
- continue;
- break;
- }
- #endif
-
- return i;
- }
-
-
- /*
- * This function is executed in the master process.
- *
- * It returns the next message from the worker on the communication channel,
- * optionally blocking (do_wait) until it becomes available.
- *
- * The id of the worker is returned in *worker.
- */
- static char *
- getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
- {
- int i;
- fd_set workerset;
- int maxFd = -1;
- struct timeval nowait = {0, 0};
-
- FD_ZERO(&workerset);
-
- for (i = 0; i < pstate->numWorkers; i++)
- {
- if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
- continue;
- FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
- /* actually WIN32 ignores the first parameter to select()... */
- if (pstate->parallelSlot[i].pipeRead > maxFd)
- maxFd = pstate->parallelSlot[i].pipeRead;
- }
-
- if (do_wait)
- {
- i = select_loop(maxFd, &workerset);
- Assert(i != 0);
- }
- else
- {
- if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
- return NULL;
- }
-
- if (i < 0)
- exit_horribly(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
-
- for (i = 0; i < pstate->numWorkers; i++)
- {
- char *msg;
-
- if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
- continue;
-
- msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
- *worker = i;
- return msg;
- }
- Assert(false);
- return NULL;
- }
-
- /*
- * This function is executed in the master process.
- *
- * It sends a message to a certain worker on the communication channel.
- */
- static void
- sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
- {
- int len = strlen(str) + 1;
-
- if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
- {
- /*
- * If we're already aborting anyway, don't care if we succeed or not.
- * The child might have gone already.
- */
- #ifndef WIN32
- if (!aborting)
- #endif
- exit_horribly(modulename,
- "could not write to the communication channel: %s\n",
- strerror(errno));
- }
- }
-
- /*
- * The underlying function to read a message from the communication channel
- * (fd) with optional blocking (do_wait).
- */
- static char *
- readMessageFromPipe(int fd)
- {
- char *msg;
- int msgsize,
- bufsize;
- int ret;
-
- /*
- * The problem here is that we need to deal with several possibilites: we
- * could receive only a partial message or several messages at once. The
- * caller expects us to return exactly one message however.
- *
- * We could either read in as much as we can and keep track of what we
- * delivered back to the caller or we just read byte by byte. Once we see
- * (char) 0, we know that it's the message's end. This would be quite
- * inefficient for more data but since we are reading only on the command
- * channel, the performance loss does not seem worth the trouble of
- * keeping internal states for different file descriptors.
- */
- bufsize = 64; /* could be any number */
- msg = (char *) pg_malloc(bufsize);
-
- msgsize = 0;
- for (;;)
- {
- Assert(msgsize <= bufsize);
- ret = piperead(fd, msg + msgsize, 1);
-
- /* worker has closed the connection or another error happened */
- if (ret <= 0)
- break;
-
- Assert(ret == 1);
-
- if (msg[msgsize] == '\0')
- return msg;
-
- msgsize++;
- if (msgsize == bufsize)
- {
- /* could be any number */
- bufsize += 16;
- msg = (char *) realloc(msg, bufsize);
- }
- }
-
- /*
- * Worker has closed the connection, make sure to clean up before return
- * since we are not returning msg (but did allocate it).
- */
- free(msg);
-
- return NULL;
- }
-
- #ifdef WIN32
- /*
- * This is a replacement version of pipe for Win32 which allows returned
- * handles to be used in select(). Note that read/write calls must be replaced
- * with recv/send. "handles" have to be integers so we check for errors then
- * cast to integers.
- */
- static int
- pgpipe(int handles[2])
- {
- pgsocket s, tmp_sock;
- struct sockaddr_in serv_addr;
- int len = sizeof(serv_addr);
-
- /* We have to use the Unix socket invalid file descriptor value here. */
- handles[0] = handles[1] = -1;
-
- /*
- * setup listen socket
- */
- if ((s = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
- {
- write_msg(modulename, "pgpipe: could not create socket: error code %d\n",
- WSAGetLastError());
- return -1;
- }
-
- memset((void *) &serv_addr, 0, sizeof(serv_addr));
- serv_addr.sin_family = AF_INET;
- serv_addr.sin_port = htons(0);
- serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
- if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: could not bind: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- if (listen(s, 1) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: could not listen: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: getsockname() failed: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
-
- /*
- * setup pipe handles
- */
- if ((tmp_sock = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
- {
- write_msg(modulename, "pgpipe: could not create second socket: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- handles[1] = (int) tmp_sock;
-
- if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
- {
- write_msg(modulename, "pgpipe: could not connect socket: error code %d\n",
- WSAGetLastError());
- closesocket(s);
- return -1;
- }
- if ((tmp_sock = accept(s, (SOCKADDR *) &serv_addr, &len)) == PGINVALID_SOCKET)
- {
- write_msg(modulename, "pgpipe: could not accept connection: error code %d\n",
- WSAGetLastError());
- closesocket(handles[1]);
- handles[1] = -1;
- closesocket(s);
- return -1;
- }
- handles[0] = (int) tmp_sock;
-
- closesocket(s);
- return 0;
- }
-
- static int
- piperead(int s, char *buf, int len)
- {
- int ret = recv(s, buf, len, 0);
-
- if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
- /* EOF on the pipe! (win32 socket based implementation) */
- ret = 0;
- return ret;
- }
-
- #endif
--- 764,766 ----
*** a/src/bin/pg_dump/parallel.h
--- b/src/bin/pg_dump/parallel.h
***************
*** 20,37 ****
#define PG_DUMP_PARALLEL_H
#include "pg_backup_db.h"
struct _archiveHandle;
struct _tocEntry;
- typedef enum
- {
- WRKR_TERMINATED = 0,
- WRKR_IDLE,
- WRKR_WORKING,
- WRKR_FINISHED
- } T_WorkerStatus;
-
/* Arguments needed for a worker process */
typedef struct ParallelArgs
{
--- 20,30 ----
#define PG_DUMP_PARALLEL_H
#include "pg_backup_db.h"
+ #include "parallel_utils.h"
struct _archiveHandle;
struct _tocEntry;
/* Arguments needed for a worker process */
typedef struct ParallelArgs
{
***************
*** 39,81 **** typedef struct ParallelArgs
struct _tocEntry *te;
} ParallelArgs;
- /* State for each parallel activity slot */
- typedef struct ParallelSlot
- {
- ParallelArgs *args;
- T_WorkerStatus workerStatus;
- int status;
- int pipeRead;
- int pipeWrite;
- int pipeRevRead;
- int pipeRevWrite;
- #ifdef WIN32
- uintptr_t hThread;
- unsigned int threadId;
- #else
- pid_t pid;
- #endif
- } ParallelSlot;
-
- #define NO_SLOT (-1)
-
- typedef struct ParallelState
- {
- int numWorkers;
- ParallelSlot *parallelSlot;
- } ParallelState;
-
- #ifdef WIN32
- extern bool parallel_init_done;
- extern DWORD mainThreadId;
- #endif
-
- extern void init_parallel_dump_utils(void);
-
- extern int GetIdleWorker(ParallelState *pstate);
- extern bool IsEveryWorkerIdle(ParallelState *pstate);
extern void ListenToWorkers(struct _archiveHandle * AH, ParallelState *pstate, bool do_wait);
- extern int ReapWorkerStatus(ParallelState *pstate, int *status);
extern void EnsureIdleWorker(struct _archiveHandle * AH, ParallelState *pstate);
extern void EnsureWorkersFinished(struct _archiveHandle * AH, ParallelState *pstate);
--- 32,38 ----
***************
*** 86,95 **** extern void DispatchJobForTocEntry(struct _archiveHandle * AH,
struct _tocEntry * te, T_Action act);
extern void ParallelBackupEnd(struct _archiveHandle * AH, ParallelState *pstate);
- extern void checkAborting(struct _archiveHandle * AH);
-
- extern void
- exit_horribly(const char *modulename, const char *fmt,...)
- __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
-
#endif /* PG_DUMP_PARALLEL_H */
--- 43,46 ----
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 3825,3834 **** get_next_work_item(ArchiveHandle *AH, TocEntry *ready_list,
if (pref_non_data)
{
int count = 0;
!
for (k = 0; k < pstate->numWorkers; k++)
! if (pstate->parallelSlot[k].args->te != NULL &&
! pstate->parallelSlot[k].args->te->section == SECTION_DATA)
count++;
if (pstate->numWorkers == 0 || count * 4 < pstate->numWorkers)
pref_non_data = false;
--- 3825,3834 ----
if (pref_non_data)
{
int count = 0;
!
for (k = 0; k < pstate->numWorkers; k++)
! if (((ParallelArgs*)pstate->parallelSlot[k].args)->te != NULL &&
! ((ParallelArgs*)pstate->parallelSlot[k].args)->te->section == SECTION_DATA)
count++;
if (pstate->numWorkers == 0 || count * 4 < pstate->numWorkers)
pref_non_data = false;
***************
*** 3852,3858 **** get_next_work_item(ArchiveHandle *AH, TocEntry *ready_list,
if (pstate->parallelSlot[i].workerStatus != WRKR_WORKING)
continue;
! running_te = pstate->parallelSlot[i].args->te;
if (has_lock_conflicts(te, running_te) ||
has_lock_conflicts(running_te, te))
--- 3852,3858 ----
if (pstate->parallelSlot[i].workerStatus != WRKR_WORKING)
continue;
! running_te = ((ParallelArgs*)pstate->parallelSlot[i].args)->te;
if (has_lock_conflicts(te, running_te) ||
has_lock_conflicts(running_te, te))
***************
*** 3926,3932 **** mark_work_done(ArchiveHandle *AH, TocEntry *ready_list,
{
TocEntry *te = NULL;
! te = pstate->parallelSlot[worker].args->te;
if (te == NULL)
exit_horribly(modulename, "could not find slot of finished worker\n");
--- 3926,3932 ----
{
TocEntry *te = NULL;
! te = ((ParallelArgs*)pstate->parallelSlot[worker].args)->te;
if (te == NULL)
exit_horribly(modulename, "could not find slot of finished worker\n");
*** a/src/bin/pg_dump/pg_backup_directory.c
--- b/src/bin/pg_dump/pg_backup_directory.c
***************
*** 35,42 ****
#include "compress_io.h"
#include "pg_backup_utils.h"
#include "parallel.h"
-
#include <dirent.h>
#include <sys/stat.h>
--- 35,42 ----
#include "compress_io.h"
#include "pg_backup_utils.h"
+ #include "parallel_utils.h"
#include "parallel.h"
#include <dirent.h>
#include <sys/stat.h>
***************
*** 356,362 **** _WriteData(ArchiveHandle *AH, const void *data, size_t dLen)
lclContext *ctx = (lclContext *) AH->formatData;
/* Are we aborting? */
! checkAborting(AH);
if (dLen > 0 && cfwrite(data, dLen, ctx->dataFH) != dLen)
WRITE_ERROR_EXIT;
--- 356,362 ----
lclContext *ctx = (lclContext *) AH->formatData;
/* Are we aborting? */
! checkAborting();
if (dLen > 0 && cfwrite(data, dLen, ctx->dataFH) != dLen)
WRITE_ERROR_EXIT;
***************
*** 524,530 **** _WriteBuf(ArchiveHandle *AH, const void *buf, size_t len)
lclContext *ctx = (lclContext *) AH->formatData;
/* Are we aborting? */
! checkAborting(AH);
if (cfwrite(buf, len, ctx->dataFH) != len)
WRITE_ERROR_EXIT;
--- 524,530 ----
lclContext *ctx = (lclContext *) AH->formatData;
/* Are we aborting? */
! checkAborting();
if (cfwrite(buf, len, ctx->dataFH) != len)
WRITE_ERROR_EXIT;
*** a/src/bin/pg_dump/pg_backup_utils.c
--- b/src/bin/pg_dump/pg_backup_utils.c
***************
*** 15,33 ****
#include "pg_backup_utils.h"
#include "parallel.h"
!
/* Globals exported by this file */
const char *progname = NULL;
- #define MAX_ON_EXIT_NICELY 20
-
- static struct
- {
- on_exit_nicely_callback function;
- void *arg;
- } on_exit_nicely_list[MAX_ON_EXIT_NICELY];
-
- static int on_exit_nicely_index;
/*
* Parse a --section=foo command line argument.
--- 15,24 ----
#include "pg_backup_utils.h"
#include "parallel.h"
! #include "parallel_utils.h"
/* Globals exported by this file */
const char *progname = NULL;
/*
* Parse a --section=foo command line argument.
***************
*** 60,126 **** set_dump_section(const char *arg, int *dumpSections)
}
- /*
- * Write a printf-style message to stderr.
- *
- * The program name is prepended, if "progname" has been set.
- * Also, if modulename isn't NULL, that's included too.
- * Note that we'll try to translate the modulename and the fmt string.
- */
- void
- write_msg(const char *modulename, const char *fmt,...)
- {
- va_list ap;
-
- va_start(ap, fmt);
- vwrite_msg(modulename, fmt, ap);
- va_end(ap);
- }
-
- /*
- * As write_msg, but pass a va_list not variable arguments.
- */
- void
- vwrite_msg(const char *modulename, const char *fmt, va_list ap)
- {
- if (progname)
- {
- if (modulename)
- fprintf(stderr, "%s: [%s] ", progname, _(modulename));
- else
- fprintf(stderr, "%s: ", progname);
- }
- vfprintf(stderr, _(fmt), ap);
- }
-
- /* Register a callback to be run when exit_nicely is invoked. */
- void
- on_exit_nicely(on_exit_nicely_callback function, void *arg)
- {
- if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
- exit_horribly(NULL, "out of on_exit_nicely slots\n");
- on_exit_nicely_list[on_exit_nicely_index].function = function;
- on_exit_nicely_list[on_exit_nicely_index].arg = arg;
- on_exit_nicely_index++;
- }
-
- /*
- * Run accumulated on_exit_nicely callbacks in reverse order and then exit
- * quietly. This needs to be thread-safe.
- */
- void
- exit_nicely(int code)
- {
- int i;
-
- for (i = on_exit_nicely_index - 1; i >= 0; i--)
- (*on_exit_nicely_list[i].function) (code,
- on_exit_nicely_list[i].arg);
-
- #ifdef WIN32
- if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
- ExitThread(code);
- #endif
-
- exit(code);
- }
--- 51,53 ----
*** a/src/bin/pg_dump/pg_backup_utils.h
--- b/src/bin/pg_dump/pg_backup_utils.h
***************
*** 23,40 **** typedef enum /* bits returned by set_dump_section */
DUMP_UNSECTIONED = 0xff
} DumpSections;
- typedef void (*on_exit_nicely_callback) (int code, void *arg);
-
- extern const char *progname;
-
extern void set_dump_section(const char *arg, int *dumpSections);
- extern void
- write_msg(const char *modulename, const char *fmt,...)
- __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
- extern void
- vwrite_msg(const char *modulename, const char *fmt, va_list ap)
- __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 0)));
- extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
- extern void exit_nicely(int code) __attribute__((noreturn));
-
#endif /* PG_BACKUP_UTILS_H */
--- 23,27 ----
*** /dev/null
--- b/src/include/parallel_utils.h
***************
*** 0 ****
--- 1,176 ----
+ /*-------------------------------------------------------------------------
+ *
+ * parallel_utils.h
+ * Header for src/port/ parallel execution functions.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *	  src/include/parallel_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+ #ifndef PARALLEL_UTILS_H
+ #define PARALLEL_UTILS_H
+
+ #include <ctype.h>
+ #ifndef WIN32
+ #include <netdb.h>
+ #include <pwd.h>
+ #endif
+
+ /*
+ * WIN32 doesn't allow descriptors returned by pipe() to be used in select(),
+ * so for that platform we use socket() instead of pipe().
+ * There is some inconsistency here because sometimes we require pg*, like
+ * pgpipe, but in other cases we define rename to pgrename just on Win32.
+ */
+ #ifndef WIN32
+ /*
+ * The function prototypes are not supplied because every C file
+ * includes this file.
+ */
+ #define pgpipe(a) pipe(a)
+ #define piperead(a,b,c) read(a,b,c)
+ #define pipewrite(a,b,c) write(a,b,c)
+ #else
+ extern int pgpipe(int handles[2]);
+ extern int piperead(int s, char *buf, int len);
+ #define pipewrite(a,b,c) send(a,b,c,0)
+ #endif
+
+ #ifdef WIN32
+ #define PG_SIGNAL_COUNT 32
+ #define kill(pid,sig) pgkill(pid,sig)
+ extern int pgkill(int pid, int sig);
+ #endif
+
+ #ifdef WIN32
+ extern DWORD mainThreadId;
+ extern bool parallel_init_done;
+ extern HANDLE termEvent;
+
+ #endif
+
+
+ typedef enum
+ {
+ WRKR_TERMINATED = 0,
+ WRKR_IDLE,
+ WRKR_WORKING,
+ WRKR_FINISHED
+ } T_WorkerStatus;
+
+ #define PIPE_READ 0
+ #define PIPE_WRITE 1
+
+ #define NO_SLOT (-1)
+
+ /* State for each parallel activity slot */
+ typedef struct ParallelSlot
+ {
+ void *args;
+ T_WorkerStatus workerStatus;
+ int status;
+ int pipeRead;
+ int pipeWrite;
+ int pipeRevRead;
+ int pipeRevWrite;
+ #ifdef WIN32
+ uintptr_t hThread;
+ unsigned int threadId;
+ #else
+ pid_t pid;
+ #endif
+ } ParallelSlot;
+
+ typedef struct ParallelState
+ {
+ int numWorkers;
+ ParallelSlot *parallelSlot;
+ } ParallelState;
+
+ typedef struct ShutdownInformation
+ {
+ ParallelState *pstate;
+ void *handle;
+ } ShutdownInformation;
+
+ extern ShutdownInformation shutdown_info;
+
+ #define MAX_ON_EXIT_NICELY 20
+
+ typedef void (*on_exit_nicely_callback) (int code, void *arg);
+
+ typedef struct on_exit_nicely_stru
+ {
+ on_exit_nicely_callback function;
+ void *arg;
+ } on_exit_nicely_stru;
+
+ extern on_exit_nicely_stru on_exit_nicely_list[MAX_ON_EXIT_NICELY];
+
+ extern int on_exit_nicely_index;
+
+
+ #ifdef WIN32
+ extern bool parallel_init_done;
+ extern DWORD tls_index;
+
+ #else
+ extern bool aborting;
+
+ #endif
+
+ extern const char *progname;
+
+ extern char *readMessageFromPipe(int fd);
+
+ void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
+
+ extern void ShutdownWorkersHard(ParallelState *pstate);
+ extern void WaitForTerminatingWorkers(ParallelState *pstate);
+
+ extern bool HasEveryWorkerTerminated(ParallelState *pstate);
+
+ extern char *getMessageFromMaster(int pipefd[2]);
+ extern void sendMessageToMaster(int pipefd[2], const char *str);
+ extern int select_loop(int maxFd, fd_set *workerset);
+ extern char *getMessageFromWorker(ParallelState *pstate,
+ bool do_wait, int *worker);
+ extern void sendMessageToWorker(ParallelState *pstate,
+ int worker, const char *str);
+ extern int ReapWorkerStatus(ParallelState *pstate, int *status);
+
+ extern ParallelSlot *GetMyPSlot(ParallelState *pstate);
+
+ extern void init_parallel_dump_utils(void);
+ extern int GetIdleWorker(ParallelState *pstate);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void checkAborting(void);
+
+ extern void
+ write_msg(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+ extern void
+ vwrite_msg(const char *modulename, const char *fmt, va_list ap)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 0)));
+ extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
+ extern void exit_nicely(int code) __attribute__((noreturn));
+
+ extern void
+ exit_horribly(const char *modulename, const char *fmt,...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3), noreturn));
+
+ #ifndef WIN32
+ extern void sigTermHandler(int signum);
+ #endif
+
+ #ifdef WIN32
+ extern void shutdown_parallel_dump_utils(int code, void *unused);
+ #endif
+
+ #endif /* PARALLEL_UTILS_H */
*** a/src/port/Makefile
--- b/src/port/Makefile
***************
*** 33,39 **** LIBS += $(PTHREAD_LIBS)
OBJS = $(LIBOBJS) chklocale.o dirmod.o erand48.o fls.o inet_net_ntop.o \
noblock.o path.o pgcheckdir.o pg_crc.o pgmkdirp.o pgsleep.o \
pgstrcasecmp.o pqsignal.o \
! qsort.o qsort_arg.o quotes.o sprompt.o tar.o thread.o
# foo_srv.o and foo.o are both built from foo.c, but only foo.o has -DFRONTEND
OBJS_SRV = $(OBJS:%.o=%_srv.o)
--- 33,39 ----
OBJS = $(LIBOBJS) chklocale.o dirmod.o erand48.o fls.o inet_net_ntop.o \
noblock.o path.o pgcheckdir.o pg_crc.o pgmkdirp.o pgsleep.o \
pgstrcasecmp.o pqsignal.o \
! qsort.o qsort_arg.o quotes.o sprompt.o tar.o thread.o parallel_utils.o
# foo_srv.o and foo.o are both built from foo.c, but only foo.o has -DFRONTEND
OBJS_SRV = $(OBJS:%.o=%_srv.o)
*** /dev/null
--- b/src/port/parallel_utils.c
***************
*** 0 ****
--- 1,737 ----
+ /*-------------------------------------------------------------------------
+ *
+ * parallel_utils.c
+ *	  Utility functions supporting parallel execution in client tools
+ *
+ * Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/port/parallel_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "postgres.h"
+ #include "parallel_utils.h"
+ #include "common/fe_memutils.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+
+ #endif
+
+
+ #ifdef WIN32
+ DWORD mainThreadId;
+ bool parallel_init_done = false;
+ HANDLE termEvent = INVALID_HANDLE_VALUE;
+ DWORD tls_index;
+ #else
+ bool aborting = false;
+ static volatile sig_atomic_t wantAbort = 0;
+
+ #endif
+
+ static const char *modulename = gettext_noop("parallel utils");
+
+ ShutdownInformation shutdown_info;
+ int on_exit_nicely_index;
+ on_exit_nicely_stru on_exit_nicely_list[MAX_ON_EXIT_NICELY];
+
+
+
+ /*
+ * The underlying function to read a message from the communication channel
+ * (fd) with optional blocking (do_wait).
+ */
+ char *
+ readMessageFromPipe(int fd)
+ {
+ char *msg;
+ int msgsize,
+ bufsize;
+ int ret;
+
+ /*
+ 	 * The problem here is that we need to deal with several possibilities: we
+ * could receive only a partial message or several messages at once. The
+ * caller expects us to return exactly one message however.
+ *
+ * We could either read in as much as we can and keep track of what we
+ * delivered back to the caller or we just read byte by byte. Once we see
+ * (char) 0, we know that it's the message's end. This would be quite
+ * inefficient for more data but since we are reading only on the command
+ * channel, the performance loss does not seem worth the trouble of
+ * keeping internal states for different file descriptors.
+ */
+ bufsize = 64; /* could be any number */
+ msg = (char *)malloc(bufsize);
+ if (!msg)
+ {
+ fprintf(stderr, _("out of memory\n"));
+ exit(1);
+ }
+
+
+ msgsize = 0;
+ for (;;)
+ {
+ Assert(msgsize <= bufsize);
+ ret = piperead(fd, msg + msgsize, 1);
+
+ 		/* worker has closed the connection or another error happened */
+ 		if (ret <= 0)
+ 		{
+ 			/* free the buffer since we are not returning a message */
+ 			free(msg);
+ 			return NULL;
+ 		}
+
+ Assert(ret == 1);
+
+ if (msg[msgsize] == '\0')
+ return msg;
+
+ msgsize++;
+ if (msgsize == bufsize)
+ {
+ 			/* could be any number */
+ 			bufsize += 16;
+ 			msg = (char *) realloc(msg, bufsize);
+ 			if (!msg)
+ 			{
+ 				fprintf(stderr, _("out of memory\n"));
+ 				exit(1);
+ 			}
+ }
+ }
+ }
+
+ #ifdef WIN32
+ /*
+ * This is a replacement version of pipe for Win32 which allows returned
+ * handles to be used in select(). Note that read/write calls must be replaced
+ * with recv/send. "handles" have to be integers so we check for errors then
+ * cast to integers.
+ */
+ int
+ pgpipe(int handles[2])
+ {
+ pgsocket s, tmp_sock;
+ struct sockaddr_in serv_addr;
+ int len = sizeof(serv_addr);
+
+ /* We have to use the Unix socket invalid file descriptor value here. */
+ handles[0] = handles[1] = -1;
+
+ /*
+ * setup listen socket
+ */
+ if ((s = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
+ {
+ fprintf(stderr, _("pgpipe: could not create socket: error code %d\n"),
+ WSAGetLastError());
+
+ return -1;
+ }
+
+ memset((void *) &serv_addr, 0, sizeof(serv_addr));
+ serv_addr.sin_family = AF_INET;
+ serv_addr.sin_port = htons(0);
+ serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ if (bind(s, (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: could not bind: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ if (listen(s, 1) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: could not listen: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ if (getsockname(s, (SOCKADDR *) &serv_addr, &len) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: getsockname() failed: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+
+ /*
+ * setup pipe handles
+ */
+ if ((tmp_sock = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
+ {
+ fprintf(stderr, _("pgpipe: could not create second socket: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ handles[1] = (int) tmp_sock;
+
+ if (connect(handles[1], (SOCKADDR *) &serv_addr, len) == SOCKET_ERROR)
+ {
+ fprintf(stderr, _("pgpipe: could not connect socket: error code %d\n"),
+ WSAGetLastError());
+ closesocket(s);
+ return -1;
+ }
+ if ((tmp_sock = accept(s, (SOCKADDR *) &serv_addr, &len)) == PGINVALID_SOCKET)
+ {
+ fprintf(stderr, _("pgpipe: could not accept connection: error code %d\n"),
+ WSAGetLastError());
+ closesocket(handles[1]);
+ handles[1] = -1;
+ closesocket(s);
+ return -1;
+ }
+ handles[0] = (int) tmp_sock;
+
+ closesocket(s);
+ return 0;
+ }
+
+ int
+ piperead(int s, char *buf, int len)
+ {
+ int ret = recv(s, buf, len, 0);
+
+ if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+ /* EOF on the pipe! (win32 socket based implementation) */
+ ret = 0;
+ return ret;
+ }
+
+ #endif
+
+
+ #ifdef WIN32
+ void
+ shutdown_parallel_dump_utils(int code, void *unused)
+ {
+ /* Call the cleanup function only from the main thread */
+ if (mainThreadId == GetCurrentThreadId())
+ WSACleanup();
+ }
+ #endif
+
+ void
+ init_parallel_dump_utils(void)
+ {
+ #ifdef WIN32
+ if (!parallel_init_done)
+ {
+ WSADATA wsaData;
+ int err;
+
+ tls_index = TlsAlloc();
+ mainThreadId = GetCurrentThreadId();
+ err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+ if (err != 0)
+ {
+ fprintf(stderr, _("%s: WSAStartup failed: %d\n"), progname, err);
+ exit_nicely(1);
+ }
+ on_exit_nicely(shutdown_parallel_dump_utils, NULL);
+ parallel_init_done = true;
+ }
+ #endif
+ }
+
+ ParallelSlot *
+ GetMyPSlot(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ #ifdef WIN32
+ if (pstate->parallelSlot[i].threadId == GetCurrentThreadId())
+ #else
+ if (pstate->parallelSlot[i].pid == getpid())
+ #endif
+ return &(pstate->parallelSlot[i]);
+
+ return NULL;
+ }
+
+
+ /*
+ * Fail and die, with a message to stderr. Parameters as for write_msg.
+ *
+ * This is defined in parallel.c, because in parallel mode, things are more
+ * complicated. If the worker process does exit_horribly(), we forward its
+ * last words to the master process. The master process then does
+ * exit_horribly() with this error message itself and prints it normally.
+ * After printing the message, exit_horribly() on the master will shut down
+ * the remaining worker processes.
+ */
+ void
+ exit_horribly(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+ ParallelState *pstate = shutdown_info.pstate;
+ ParallelSlot *slot;
+
+ va_start(ap, fmt);
+
+ if (pstate == NULL)
+ {
+ /* Not in parallel mode, just write to stderr */
+ vwrite_msg(modulename, fmt, ap);
+ }
+ else
+ {
+ slot = GetMyPSlot(pstate);
+
+ if (!slot)
+ /* We're the parent, just write the message out */
+ vwrite_msg(modulename, fmt, ap);
+ else
+ /* If we're a worker process, send the msg to the master process */
+ parallel_msg_master(slot, modulename, fmt, ap);
+ }
+
+ va_end(ap);
+
+ exit_nicely(1);
+ }
+
+ /* Sends the error message from the worker to the master process */
+ void
+ parallel_msg_master(ParallelSlot *slot, const char *modulename,
+ const char *fmt, va_list ap)
+ {
+ char buf[512];
+ int pipefd[2];
+
+ pipefd[PIPE_READ] = slot->pipeRevRead;
+ pipefd[PIPE_WRITE] = slot->pipeRevWrite;
+
+ strcpy(buf, "ERROR ");
+ vsnprintf(buf + strlen("ERROR "),
+ sizeof(buf) - strlen("ERROR "), fmt, ap);
+
+ sendMessageToMaster(pipefd, buf);
+ }
+
+ /*
+ * Shut down any remaining workers, this has an implicit do_wait == true.
+ *
+ * The fastest way we can make the workers terminate gracefully is when
+ * they are listening for new commands and we just tell them to terminate.
+ */
+ void
+ ShutdownWorkersHard(ParallelState *pstate)
+ {
+ int i;
+
+ #ifndef WIN32
+ signal(SIGPIPE, SIG_IGN);
+
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ kill(pstate->parallelSlot[i].pid, SIGTERM);
+ #else
+ /*
+ * Close our write end of the sockets so that the workers know they can
+ * exit.
+ */
+ for (i = 0; i < pstate->numWorkers; i++)
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+
+ /* The workers monitor this event via checkAborting(). */
+ SetEvent(termEvent);
+ #endif
+
+ WaitForTerminatingWorkers(pstate);
+ }
+
+
+ /*
+ * Wait for the termination of the processes using the OS-specific method.
+ */
+ void
+ WaitForTerminatingWorkers(ParallelState *pstate)
+ {
+ while (!HasEveryWorkerTerminated(pstate))
+ {
+ ParallelSlot *slot = NULL;
+ int j;
+
+ #ifndef WIN32
+ int status;
+ pid_t pid = wait(&status);
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].pid == pid)
+ slot = &(pstate->parallelSlot[j]);
+ #else
+ uintptr_t hThread;
+ DWORD ret;
+ uintptr_t *lpHandles;
+ int nrun = 0;
+
+ lpHandles = malloc(sizeof(HANDLE) * pstate->numWorkers);
+ if (!lpHandles)
+ {
+ fprintf(stderr, _("out of memory\n"));
+ exit(1);
+ }
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].workerStatus != WRKR_TERMINATED)
+ {
+ lpHandles[nrun] = pstate->parallelSlot[j].hThread;
+ nrun++;
+ }
+ ret = WaitForMultipleObjects(nrun, (HANDLE *) lpHandles, false, INFINITE);
+ Assert(ret != WAIT_FAILED);
+ hThread = lpHandles[ret - WAIT_OBJECT_0];
+
+ for (j = 0; j < pstate->numWorkers; j++)
+ if (pstate->parallelSlot[j].hThread == hThread)
+ slot = &(pstate->parallelSlot[j]);
+
+ free(lpHandles);
+ #endif
+ Assert(slot);
+
+ slot->workerStatus = WRKR_TERMINATED;
+ }
+ Assert(HasEveryWorkerTerminated(pstate));
+ }
+
+ #ifndef WIN32
+ /* Signal handling (UNIX only) */
+ void
+ sigTermHandler(int signum)
+ {
+ wantAbort = 1;
+ }
+ #endif
+
+ /*
+ * Find the first free parallel slot (if any).
+ */
+ int
+ GetIdleWorker(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus == WRKR_IDLE)
+ return i;
+ return NO_SLOT;
+ }
+
+ /*
+ * Return true iff every worker process is in the WRKR_TERMINATED state.
+ */
+ bool
+ HasEveryWorkerTerminated(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_TERMINATED)
+ return false;
+ return true;
+ }
+
+ /*
+ * Return true iff every worker is in the WRKR_IDLE state.
+ */
+ bool
+ IsEveryWorkerIdle(ParallelState *pstate)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ if (pstate->parallelSlot[i].workerStatus != WRKR_IDLE)
+ return false;
+ return true;
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It returns the next message on the communication channel, blocking until it
+ * becomes available.
+ */
+ char *
+ getMessageFromMaster(int pipefd[2])
+ {
+ return readMessageFromPipe(pipefd[PIPE_READ]);
+ }
+
+ /*
+ * This function is executed in the worker process.
+ *
+ * It sends a message to the master on the communication channel.
+ */
+ void
+ sendMessageToMaster(int pipefd[2], const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pipefd[PIPE_WRITE], str, len) != len)
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ 	Assert(mainThreadId == GetCurrentThreadId());
+
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ /*
+ 		 * If we Ctrl-C the master process, it's likely that we interrupt
+ * select() here. The signal handler will set wantAbort == true and
+ * the shutdown journey starts from here. Note that we'll come back
+ * here later when we tell all workers to terminate and read their
+ * responses. But then we have aborting set to true.
+ */
+ if (wantAbort && !aborting)
+ exit_horribly(modulename, "terminated by user\n");
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It returns the next message from the worker on the communication channel,
+ * optionally blocking (do_wait) until it becomes available.
+ *
+ * The id of the worker is returned in *worker.
+ */
+ char *
+ getMessageFromWorker(ParallelState *pstate, bool do_wait, int *worker)
+ {
+ int i;
+ fd_set workerset;
+ int maxFd = -1;
+ struct timeval nowait = {0, 0};
+
+ FD_ZERO(&workerset);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_TERMINATED)
+ continue;
+ FD_SET(pstate->parallelSlot[i].pipeRead, &workerset);
+ /* actually WIN32 ignores the first parameter to select()... */
+ if (pstate->parallelSlot[i].pipeRead > maxFd)
+ maxFd = pstate->parallelSlot[i].pipeRead;
+ }
+
+ if (do_wait)
+ {
+ i = select_loop(maxFd, &workerset);
+ Assert(i != 0);
+ }
+ else
+ {
+ if ((i = select(maxFd + 1, &workerset, NULL, NULL, &nowait)) == 0)
+ return NULL;
+ }
+
+ if (i < 0)
+ exit_horribly(modulename, "error in ListenToWorkers(): %s\n", strerror(errno));
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ char *msg;
+
+ if (!FD_ISSET(pstate->parallelSlot[i].pipeRead, &workerset))
+ continue;
+
+ msg = readMessageFromPipe(pstate->parallelSlot[i].pipeRead);
+ *worker = i;
+ return msg;
+ }
+ Assert(false);
+ return NULL;
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It sends a message to a certain worker on the communication channel.
+ */
+ void
+ sendMessageToWorker(ParallelState *pstate, int worker, const char *str)
+ {
+ int len = strlen(str) + 1;
+
+ if (pipewrite(pstate->parallelSlot[worker].pipeWrite, str, len) != len)
+ {
+ /*
+ * If we're already aborting anyway, don't care if we succeed or not.
+ * The child might have gone already.
+ */
+ #ifndef WIN32
+ if (!aborting)
+ #endif
+ exit_horribly(modulename,
+ "could not write to the communication channel: %s\n",
+ strerror(errno));
+ }
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * This function is used to get the return value of a terminated worker
+ * process. If a process has terminated, its status is stored in *status and
+ * the id of the worker is returned.
+ */
+ int
+ ReapWorkerStatus(ParallelState *pstate, int *status)
+ {
+ int i;
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ if (pstate->parallelSlot[i].workerStatus == WRKR_FINISHED)
+ {
+ *status = pstate->parallelSlot[i].status;
+ pstate->parallelSlot[i].status = 0;
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ return i;
+ }
+ }
+ return NO_SLOT;
+ }
+
+ /*
+ * If we have one worker that terminates for some reason, we'd like the other
+ * threads to terminate as well (and not finish with their 70 GB table dump
+ * first...). Now in UNIX we can just kill these processes, and let the signal
+ * handler set wantAbort to 1. In Windows we set a termEvent and this serves
+ * as the signal for everyone to terminate.
+ */
+ void
+ checkAborting(void)
+ {
+ #ifdef WIN32
+ if (WaitForSingleObject(termEvent, 0) == WAIT_OBJECT_0)
+ #else
+ if (wantAbort)
+ #endif
+ exit_horribly(modulename, "worker is terminating\n");
+ }
+
+
+ /*
+ * Write a printf-style message to stderr.
+ *
+ * The program name is prepended, if "progname" has been set.
+ * Also, if modulename isn't NULL, that's included too.
+ * Note that we'll try to translate the modulename and the fmt string.
+ */
+ void
+ write_msg(const char *modulename, const char *fmt,...)
+ {
+ va_list ap;
+
+ va_start(ap, fmt);
+ vwrite_msg(modulename, fmt, ap);
+ va_end(ap);
+ }
+
+ /*
+ * As write_msg, but pass a va_list not variable arguments.
+ */
+ void
+ vwrite_msg(const char *modulename, const char *fmt, va_list ap)
+ {
+ if (progname)
+ {
+ if (modulename)
+ fprintf(stderr, "%s: [%s] ", progname, _(modulename));
+ else
+ fprintf(stderr, "%s: ", progname);
+ }
+ vfprintf(stderr, _(fmt), ap);
+ }
+
+ /* Register a callback to be run when exit_nicely is invoked. */
+ void
+ on_exit_nicely(on_exit_nicely_callback function, void *arg)
+ {
+ if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
+ exit_horribly(NULL, "out of on_exit_nicely slots\n");
+ on_exit_nicely_list[on_exit_nicely_index].function = function;
+ on_exit_nicely_list[on_exit_nicely_index].arg = arg;
+ on_exit_nicely_index++;
+ }
+
+ /*
+ * Run accumulated on_exit_nicely callbacks in reverse order and then exit
+ * quietly. This needs to be thread-safe.
+ */
+ void
+ exit_nicely(int code)
+ {
+ int i;
+
+ for (i = on_exit_nicely_index - 1; i >= 0; i--)
+ (*on_exit_nicely_list[i].function) (code,
+ on_exit_nicely_list[i].arg);
+
+ #ifdef WIN32
+ if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
+ ExitThread(code);
+ #endif
+
+ exit(code);
+ }
+
*** a/src/tools/msvc/Mkvcbuild.pm
--- b/src/tools/msvc/Mkvcbuild.pm
***************
*** 71,77 **** sub mkvcbuild
pgcheckdir.c pg_crc.c pgmkdirp.c pgsleep.c pgstrcasecmp.c pqsignal.c
mkdtemp.c qsort.c qsort_arg.c quotes.c system.c
sprompt.c tar.c thread.c getopt.c getopt_long.c dirent.c
! win32env.c win32error.c win32setlocale.c);
push(@pgportfiles, 'rint.c') if ($vsVersion < '12.00');
--- 71,77 ----
pgcheckdir.c pg_crc.c pgmkdirp.c pgsleep.c pgstrcasecmp.c pqsignal.c
mkdtemp.c qsort.c qsort_arg.c quotes.c system.c
sprompt.c tar.c thread.c getopt.c getopt_long.c dirent.c
! win32env.c win32error.c win32setlocale.c parallel_utils.c);
push(@pgportfiles, 'rint.c') if ($vsVersion < '12.00');
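Both patch versions use the same wire protocol between master and workers: every message on the command channel is a NUL-terminated string, read back one byte at a time until the terminator (readMessageFromPipe() above), and a worker answers each command with either "OK" or "ERROR <text>". A minimal POSIX-only sketch of that framing, with hypothetical helper names (the patches route the same logic through piperead()/pipewrite() so it also works on Win32):

#include <string.h>
#include <unistd.h>

/* hypothetical helper: send one framed message, including its '\0' */
static int
send_message(int fd, const char *str)
{
	int			len = strlen(str) + 1;

	return write(fd, str, len) == len ? 0 : -1;
}

/* hypothetical helper: read bytes until '\0' to obtain exactly one message */
static int
recv_message(int fd, char *buf, int bufsize)
{
	int			n = 0;

	while (n < bufsize && read(fd, buf + n, 1) == 1)
	{
		if (buf[n++] == '\0')
			return 0;			/* complete message received */
	}
	return -1;					/* EOF, error, or buffer too small */
}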
Attachment: vacuumdb_parallel_v10.patch (application/octet-stream)
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,239 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+        Number of parallel processes used to perform the operation.  Each
+        worker opens a separate connection to the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/Makefile
--- b/src/bin/scripts/Makefile
***************
*** 32,38 **** dropdb: dropdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq subm
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
--- 32,38 ----
droplang: droplang.o common.o print.o mbprint.o | submake-libpq submake-libpgport
dropuser: dropuser.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
clusterdb: clusterdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
! vacuumdb: vacuumdb.o vac_parallel.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
reindexdb: reindexdb.o common.o dumputils.o kwlookup.o keywords.o | submake-libpq submake-libpgport
pg_isready: pg_isready.o common.o | submake-libpq submake-libpgport
***************
*** 65,71 **** uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
--- 65,71 ----
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
! rm -f common.o dumputils.o kwlookup.o keywords.o print.o mbprint.o vac_parallel.o $(WIN32RES)
rm -f dumputils.c print.c mbprint.c kwlookup.c keywords.c
rm -rf tmp_check
*** /dev/null
--- b/src/bin/scripts/vac_parallel.c
***************
*** 0 ****
--- 1,498 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.c
+ *
+ * Parallel support for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "vac_parallel.h"
+
+ #ifndef WIN32
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include "signal.h"
+ #include <unistd.h>
+ #include <fcntl.h>
+ #endif
+ #include "common.h"
+
+ /* file-scope variables */
+ #ifdef WIN32
+ static unsigned int tMasterThreadId = 0;
+
+ /*
+ * Structure to hold info passed by _beginthreadex() to the function it calls
+ * via its single allowed argument.
+ */
+ typedef struct
+ {
+ VacOpt *vopt;
+ int worker;
+ int pipeRead;
+ int pipeWrite;
+ } WorkerInfo;
+ #endif
+
+ static const char *modulename = gettext_noop("parallel vacuum");
+
+ static void SetupWorker(PGconn *connection, int pipefd[2], int worker,
+ 			int vacStage);
+ static void WaitForCommands(PGconn *connection, int pipefd[2]);
+ static void vacuum_close_connection(int code, void *arg);
+
+ #define messageStartsWith(msg, prefix) \
+ (strncmp(msg, prefix, strlen(prefix)) == 0)
+ #define messageEquals(msg, pattern) \
+ (strcmp(msg, pattern) == 0)
+
+
+ #ifdef WIN32
+ static unsigned __stdcall
+ init_spawned_worker_win32(WorkerInfo *wi)
+ {
+ PGconn *conn;
+ int pipefd[2] = {wi->pipeRead, wi->pipeWrite};
+ int worker = wi->worker;
+ VacOpt *vopt = wi->vopt;
+ ParallelSlot *mySlot = &shutdown_info.pstate->parallelSlot[worker];
+
+ conn = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword, vopt->progname,
+ false);
+
+ ((ParallelArgs*)mySlot->args)->connection = conn;
+
+ free(wi);
+ SetupWorker(conn, pipefd, worker, vopt->analyze_stage);
+ _endthreadex(0);
+ return 0;
+ }
+ #endif
+
+
+
+ ParallelState *
+ ParallelVacuumStart(VacOpt *vopt, int numWorkers)
+ {
+ ParallelState *pstate;
+ int i;
+ const size_t slotSize = numWorkers * sizeof(ParallelSlot);
+
+ Assert(numWorkers > 0);
+
+ /* Ensure stdio state is quiesced before forking */
+ fflush(NULL);
+
+ pstate = (ParallelState *) pg_malloc(sizeof(ParallelState));
+
+ pstate->numWorkers = numWorkers;
+ pstate->parallelSlot = NULL;
+
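+ 	/* nothing to parallelize with a single worker: no slots, no pipes */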
+ if (numWorkers == 1)
+ return pstate;
+
+ pstate->parallelSlot = (ParallelSlot *) pg_malloc(slotSize);
+ memset((void *) pstate->parallelSlot, 0, slotSize);
+
+ shutdown_info.pstate = pstate;
+
+ #ifdef WIN32
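+ 	/* manual-reset event; workers poll it in checkAborting() to notice an abort */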
+ tMasterThreadId = GetCurrentThreadId();
+ termEvent = CreateEvent(NULL, true, false, "Terminate");
+ #else
+ signal(SIGTERM, sigTermHandler);
+ signal(SIGINT, sigTermHandler);
+ signal(SIGQUIT, sigTermHandler);
+ #endif
+
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ #ifdef WIN32
+ WorkerInfo *wi;
+ uintptr_t handle;
+ #else
+ pid_t pid;
+ #endif
+ int pipeMW[2],
+ pipeWM[2];
+
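+ 		/*
+ 		 * Two pipes per worker: pipeMW carries commands master -> worker,
+ 		 * pipeWM carries replies worker -> master.
+ 		 */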
+ if (pgpipe(pipeMW) < 0 || pgpipe(pipeWM) < 0)
+ exit_horribly(modulename,
+ "could not create communication channels: %s\n",
+ strerror(errno));
+
+ pstate->parallelSlot[i].workerStatus = WRKR_IDLE;
+ pstate->parallelSlot[i].args = (ParallelArgs *) pg_malloc(sizeof(ParallelArgs));
+ ((ParallelArgs*)pstate->parallelSlot[i].args)->vopt = vopt;
+ #ifdef WIN32
+ /* Allocate a new structure for every worker */
+ wi = (WorkerInfo *) pg_malloc(sizeof(WorkerInfo));
+ wi->worker = i;
+ wi->pipeRead = pstate->parallelSlot[i].pipeRevRead = pipeMW[PIPE_READ];
+ wi->pipeWrite = pstate->parallelSlot[i].pipeRevWrite = pipeWM[PIPE_WRITE];
+ wi->vopt = vopt;
+ handle = _beginthreadex(NULL, 0, (void *) &init_spawned_worker_win32,
+ wi, 0, &(pstate->parallelSlot[i].threadId));
+ pstate->parallelSlot[i].hThread = handle;
+ #else
+ pid = fork();
+ if (pid == 0)
+ {
+ /* we are the worker */
+ int j;
+ int pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+
+ /*
+ * Store the fds for the reverse communication in pstate. Actually
+ * we only use this in case of an error and don't use pstate
+ * otherwise in the worker process. On Windows we write to the
+ * global pstate, in Unix we write to our process-local copy but
+ * that's also where we'd retrieve this information back from.
+ */
+ pstate->parallelSlot[i].pipeRevRead = pipefd[PIPE_READ];
+ pstate->parallelSlot[i].pipeRevWrite = pipefd[PIPE_WRITE];
+ pstate->parallelSlot[i].pid = getpid();
+
+ ((ParallelArgs*)pstate->parallelSlot[i].args)->connection
+ = connectDatabase(vopt->dbname, vopt->pghost, vopt->pgport,
+ vopt->username, vopt->promptPassword,
+ vopt->progname, false);
+
+ /* close read end of Worker -> Master */
+ closesocket(pipeWM[PIPE_READ]);
+
+ /* close write end of Master -> Worker */
+ closesocket(pipeMW[PIPE_WRITE]);
+
+ /*
+ * Close all inherited fds for communication of the master with
+ * the other workers.
+ */
+ for (j = 0; j < i; j++)
+ {
+ closesocket(pstate->parallelSlot[j].pipeRead);
+ closesocket(pstate->parallelSlot[j].pipeWrite);
+ }
+
+ SetupWorker(((ParallelArgs*)pstate->parallelSlot[i].args)->connection,
+ pipefd, i, vopt->analyze_stage);
+ exit(0);
+ }
+ else if (pid < 0)
+ /* fork failed */
+ exit_horribly(modulename,
+ "could not create worker process: %s\n",
+ strerror(errno));
+
+ /* we are the Master, pid > 0 here */
+ Assert(pid > 0);
+
+ /* close read end of Master -> Worker */
+ closesocket(pipeMW[PIPE_READ]);
+
+ /* close write end of Worker -> Master */
+ closesocket(pipeWM[PIPE_WRITE]);
+
+ pstate->parallelSlot[i].pid = pid;
+ #endif
+
+ pstate->parallelSlot[i].pipeRead = pipeWM[PIPE_READ];
+ pstate->parallelSlot[i].pipeWrite = pipeMW[PIPE_WRITE];
+ }
+
+ return pstate;
+ }
+
+ /*
+ * Tell all of our workers to terminate.
+ *
+ * Pretty straightforward routine, first we tell everyone to terminate, then
+ * we listen to the workers' replies and finally close the sockets that we
+ * have used for communication.
+ */
+ void
+ ParallelVacuumEnd(ParallelState *pstate)
+ {
+ int i;
+
+ Assert(IsEveryWorkerIdle(pstate));
+
+ /* close the sockets so that the workers know they can exit */
+ for (i = 0; i < pstate->numWorkers; i++)
+ {
+ closesocket(pstate->parallelSlot[i].pipeRead);
+ closesocket(pstate->parallelSlot[i].pipeWrite);
+ }
+
+ WaitForTerminatingWorkers(pstate);
+
+ /*
+ * Remove the pstate again, so the exit handler in the parent will now
+ * again fall back to closing AH->connection (if connected).
+ */
+ shutdown_info.pstate = NULL;
+
+ free(pstate->parallelSlot);
+ free(pstate);
+ }
+
+ /*
+ * This function is called by both UNIX and Windows variants to set up a
+ * worker process.
+ */
+ static void
+ SetupWorker(PGconn *connection, int pipefd[2], int worker, int vacStage)
+ {
+
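+ 	/*
+ 	 * Session setup for --analyze-in-stages: stage 0 computes the cheapest
+ 	 * possible statistics, stage 1 a middle ground, and stage 2 restores
+ 	 * the full default_statistics_target.
+ 	 */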
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ if (vacStage >= 0)
+ executeMaintenanceCommand(connection, stage_commands[vacStage], false);
+
+ /*
+ 	 * Enter the command loop and wait for work from the master.
+ 	 *
+ 	 * We keep the raw connection only so that we can close it properly
+ 	 * when the worker shuts down, in particular when it is brought down
+ 	 * because of an error.
+ */
+ WaitForCommands(connection, pipefd);
+ closesocket(pipefd[PIPE_READ]);
+ closesocket(pipefd[PIPE_WRITE]);
+ }
+
+
+
+ /*
+ * That's the main routine for the worker.
+ * When it starts up it enters this routine and waits for commands from the
+ * master process. After having processed a command it comes back to here to
+ * wait for the next command. Finally it will receive a TERMINATE command and
+ * exit.
+ */
+ static void
+ WaitForCommands(PGconn * connection, int pipefd[2])
+ {
+ char *command;
+ PQExpBufferData sql;
+
+ for (;;)
+ {
+ if (!(command = getMessageFromMaster(pipefd)))
+ {
+ PQfinish(connection);
+ connection = NULL;
+ return;
+ }
+
+
+ 		/* check if master has set the terminate event */
+ checkAborting();
+
+ if (executeMaintenanceCommand(connection, command, false))
+ sendMessageToMaster(pipefd, "OK");
+ else
+ {
+ initPQExpBuffer(&sql);
+ appendPQExpBuffer(&sql, "ERROR : %s",
+ PQerrorMessage(connection));
+ sendMessageToMaster(pipefd, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ 		/* command was malloc'd and we are responsible for free()ing it. */
+ free(command);
+ }
+ }
+
+ /*
+ * ---------------------------------------------------------------------
+ * Note the status change:
+ *
+  * DispatchJob				WRKR_IDLE -> WRKR_WORKING
+ * ListenToWorkers WRKR_WORKING -> WRKR_FINISHED / WRKR_TERMINATED
+ * ReapWorkerStatus WRKR_FINISHED -> WRKR_IDLE
+ * ---------------------------------------------------------------------
+ *
+ * Just calling ReapWorkerStatus() when all workers are working might or might
+ * not give you an idle worker because you need to call ListenToWorkers() in
+ * between and only thereafter ReapWorkerStatus(). This is necessary in order
+ * to get and deal with the status (=result) of the worker's execution.
+ */
+ void
+ ListenToWorkers(ParallelState *pstate, bool do_wait)
+ {
+ int worker;
+ char *msg;
+
+ msg = getMessageFromWorker(pstate, do_wait, &worker);
+
+ if (!msg)
+ {
+ if (do_wait)
+ exit_horribly(modulename, "a worker process died unexpectedly\n");
+ return;
+ }
+
+ if (messageStartsWith(msg, "OK"))
+ {
+ pstate->parallelSlot[worker].workerStatus = WRKR_FINISHED;
+ }
+ else if (messageStartsWith(msg, "ERROR "))
+ {
+ ParallelSlot *mySlot = &pstate->parallelSlot[worker];
+
+ mySlot->workerStatus = WRKR_TERMINATED;
+ exit_horribly(modulename,
+ "vacuuming of database \"%s\" failed %s",
+ ((ParallelArgs*)mySlot->args)->vopt->dbname, msg + strlen("ERROR "));
+ }
+ else
+ {
+ exit_horribly(modulename,
+ "invalid message received from worker: %s\n", msg);
+ }
+
+ 	/* both Unix and Win32 return malloc()ed space, so we free it */
+ free(msg);
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It looks for an idle worker process and only returns if there is one.
+ */
+ void
+ EnsureIdleWorker(ParallelState *pstate)
+ {
+ int ret_worker;
+ int work_status;
+
+ for (;;)
+ {
+ int nTerm = 0;
+
+ while ((ret_worker = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ if (work_status != 0)
+ exit_horribly(modulename, "error processing a parallel work item\n");
+
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before dispatching
+ * the next item. If nTerm > 0 we already have that (quick check).
+ */
+ if (nTerm > 0)
+ return;
+
+ /* explicit check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ return;
+
+ /*
+ * If we have no idle worker, read the result of one or more workers
+ * and loop the loop to call ReapWorkerStatus() on them
+ */
+ ListenToWorkers(pstate, true);
+ }
+ }
+
+ /*
+ * This function is executed in the master process.
+ *
+ * It waits for all workers to terminate.
+ */
+ void
+ EnsureWorkersFinished(ParallelState *pstate)
+ {
+ int work_status;
+
+ if (!pstate || pstate->numWorkers == 1)
+ return;
+
+ /* Waiting for the remaining worker processes to finish */
+ while (!IsEveryWorkerIdle(pstate))
+ {
+ if (ReapWorkerStatus(pstate, &work_status) == NO_SLOT)
+ ListenToWorkers(pstate, true);
+ else if (work_status != 0)
+ exit_horribly(modulename,
+ "error processing a parallel work item\n");
+ }
+ }
+
+ void
+ DispatchJob(ParallelState *pstate, char * command)
+ {
+ int worker;
+
+ /* our caller makes sure that at least one worker is idle */
+ Assert(GetIdleWorker(pstate) != NO_SLOT);
+ worker = GetIdleWorker(pstate);
+ Assert(worker != NO_SLOT);
+
+ sendMessageToWorker(pstate, worker, command);
+ pstate->parallelSlot[worker].workerStatus = WRKR_WORKING;
+ }
+
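To make the status flow above concrete: the master-side dispatch in vacuumdb.c (run_command(), declared there but not shown in this hunk) only needs to combine the two functions above. A minimal sketch, assuming the signatures from vac_parallel.h:

	/* sketch of a run_command() body; the real one is in the attached patch */
	void
	run_command(ParallelState *pstate, char *command)
	{
		/* blocks until ListenToWorkers()/ReapWorkerStatus() free a slot */
		EnsureIdleWorker(pstate);

		/* WRKR_IDLE -> WRKR_WORKING; the worker replies "OK" or "ERROR ..." */
		DispatchJob(pstate, command);
	}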
+ void
+ on_exit_close_vacuum(PGconn *conn)
+ {
+ shutdown_info.handle = (void*)conn;
+ on_exit_nicely(vacuum_close_connection, &shutdown_info);
+ }
+
+ /*
+ * This function can close archives in both the parallel and non-parallel
+ * case.
+ */
+ static void
+ vacuum_close_connection(int code, void *arg)
+ {
+ ShutdownInformation *si = (ShutdownInformation *) arg;
+
+ if (si->pstate)
+ {
+ ParallelSlot *slot = GetMyPSlot(si->pstate);
+
+ if (!slot)
+ {
+ PQfinish((PGconn*)si->handle);
+ #ifndef WIN32
+
+ /*
+ * Setting aborting to true switches to best-effort-mode
+ * (send/receive but ignore errors) in communicating with our
+ * workers.
+ */
+ aborting = true;
+ #endif
+ ShutdownWorkersHard(si->pstate);
+ }
+ else if (((ParallelArgs*)slot->args)->connection)
+ PQfinish((((ParallelArgs*)slot->args)->connection));
+ }
+ else if ((PGconn*)si->handle)
+ PQfinish((PGconn*)si->handle);
+ }
+
*** /dev/null
--- b/src/bin/scripts/vac_parallel.h
***************
*** 0 ****
--- 1,58 ----
+ /*-------------------------------------------------------------------------
+ *
+ * vac_parallel.h
+ *
+ * Parallel support header file for the vacuumdb
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * The author is not responsible for loss or damages that may
+ * result from its use.
+ *
+ * IDENTIFICATION
+ * src/bin/scripts/vac_parallel.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #ifndef VAC_PARALLEL_H
+ #define VAC_PARALLEL_H
+
+ #include "postgres_fe.h"
+ #include <time.h>
+ #include "libpq-fe.h"
+ #include "common.h"
+ #include "parallel_utils.h"
+
+
+ typedef struct VacOpt
+ {
+ char *dbname;
+ char *pgport;
+ char *pghost;
+ char *username;
+ char *progname;
+ enum trivalue promptPassword;
+ int analyze_stage;
+ } VacOpt;
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelArgs
+ {
+ PGconn *connection;
+ VacOpt *vopt;
+ } ParallelArgs;
+
+
+ extern ParallelState * ParallelVacuumStart(VacOpt *vopt, int numWorkers);
+ extern bool IsEveryWorkerIdle(ParallelState *pstate);
+ extern void ListenToWorkers(ParallelState *pstate, bool do_wait);
+ extern void EnsureIdleWorker(ParallelState *pstate);
+ extern void EnsureWorkersFinished(ParallelState *pstate);
+
+ extern void DispatchJob(ParallelState *pstate, char * command);
+ extern void on_exit_close_vacuum(PGconn *conn);
+ extern void ParallelVacuumEnd(ParallelState *pstate);
+
+ #endif /* VAC_PARALLEL_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 13,34 ****
#include "postgres_fe.h"
#include "common.h"
#include "dumputils.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *table, const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 13,54 ----
#include "postgres_fe.h"
#include "common.h"
#include "dumputils.h"
+ #include "vac_parallel.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password,
+ bool echo, int parallel, SimpleStringList *tables);
+
+ void run_command(ParallelState *pstate, char *command);
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt);
+
+ const char *progname = NULL;
int
main(int argc, char *argv[])
***************
*** 49,60 **** main(int argc, char *argv[])
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
};
- const char *progname;
int optindex;
int c;
--- 69,80 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
};
int optindex;
int c;
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 94,107 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! 	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 129,134 **** main(int argc, char *argv[])
--- 150,165 ----
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ if (parallel <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 228,234 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, echo, quiet, parallel);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 242,280 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! echo);
! }
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL,
! host, port, username, prompt_password,
! echo);
!
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 299,309 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 398,409 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
! bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
! progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 424,668 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
! echo);
! }
}
PQclear(result);
}
+ static void
+ run_parallel_vacuum(PGconn *conn, bool echo,
+ const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ VacOpt *vopt)
+ {
+ ParallelState *pstate;
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+
+ initPQExpBuffer(&sql);
+
+ pstate = ParallelVacuumStart(vopt, parallel);
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ prepare_command(conn, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ run_command(pstate, sql.data);
+ termPQExpBuffer(&sql);
+ }
+
+ EnsureWorkersFinished(pstate);
+ ParallelVacuumEnd(pstate);
+ termPQExpBuffer(&sql);
+ }
+
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ bool echo, int parallel, SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ VacOpt vopt = {0};
+
+ init_parallel_dump_utils();
+
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+
+ if (dbname)
+ vopt.dbname = pg_strdup(dbname);
+
+ if (host)
+ vopt.pghost = pg_strdup(host);
+
+ if (port)
+ vopt.pgport = pg_strdup(port);
+
+ if (username)
+ vopt.username = pg_strdup(username);
+
+ if (progname)
+ vopt.progname = pg_strdup(progname);
+
+ vopt.promptPassword = prompt_password;
+
+ on_exit_close_vacuum(conn);
+
+ /*
+ * If a table list is not provided then we need to vacuum the whole DB:
+ * get the list of all tables and prepare the list.
+ */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, " \"%s\".\"%s\"", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+
+ tables = &dbtables;
+
+ }
+
+ if (analyze_in_stages)
+ {
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")};
+
+ puts(gettext(stage_messages[i]));
+ vopt.analyze_stage = i;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt);
+ }
+ }
+ else
+ {
+ vopt.analyze_stage = -1;
+ run_parallel_vacuum(conn, echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel, &vopt);
+ }
+
+ PQfinish(conn);
+ }
+
+ void run_command(ParallelState *pstate, char *command)
+ {
+ int work_status;
+ int ret_child;
+
+ DispatchJob(pstate, command);
+
+ /* Listen to the workers and collect their messages */
+ for (;;)
+ {
+ int nTerm = 0;
+
+ ListenToWorkers(pstate, false);
+ while ((ret_child = ReapWorkerStatus(pstate, &work_status)) != NO_SLOT)
+ {
+ nTerm++;
+ }
+
+ /*
+ * We need to make sure that we have an idle worker before
+ * re-running the loop. If nTerm > 0 we already have that (quick
+ * check).
+ */
+ if (nTerm > 0)
+ break;
+
+ /* if nobody terminated, explicitly check for an idle worker */
+ if (GetIdleWorker(pstate) != NO_SLOT)
+ break;
+ }
+ }
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
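+ /*
+ * Build the VACUUM/ANALYZE command text from the option flags. For
+ * example, with full and and_analyze set this produces
+ * "VACUUM (FULL, ANALYZE)" on a 9.0 or newer server, or
+ * "VACUUM FULL ANALYZE" on an older one; the caller then appends the
+ * target table name.
+ */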
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 682,688 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
Dilip kumar <dilip.kumar@huawei.com> writes:
On 15 July 2014 19:01, Magnus Hagander wrote:
I am late to this game, but the first thing to my mind was - do we
really need the whole forking/threading thing on the client at all?
Thanks for the review. I understand your point, but I think if we do this directly with independent connections,
it's difficult to divide the jobs equally between multiple independent connections.
That argument seems like complete nonsense. You're confusing work
allocation strategy with the implementation technology for the multiple
working threads. I see no reason why a good allocation strategy couldn't
work with either approach; indeed, I think it would likely be easier to
do some things *without* client-side physical parallelism, because that
makes it much simpler to handle feedback between the results of different
operational threads.
regards, tom lane
So you would have one initial connection, which generates a task list;
then open N libpq connections. Launch one vacuum on each, and then
sleep on select() on the N sockets. Whenever one returns
read-ready, the vacuuming is done and we send another item from the task
list. Repeat until tasklist is empty. No need to fork anything.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
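As a rough sketch of the scheme described above (an illustration, not code from any of the attached patches), using libpq's asynchronous API; the conns array and tasks list are assumed to have been prepared by the caller, and error handling is omitted:

#include <sys/select.h>
#include "libpq-fe.h"

static void
dispatch_tasks(PGconn **conns, int nconns, const char **tasks, int ntasks)
{
    int         next = 0;       /* next task to hand out */
    int         busy = 0;       /* connections with a command in flight */
    int         i;

    /* prime every connection with one task */
    for (i = 0; i < nconns && next < ntasks; i++, busy++)
        PQsendQuery(conns[i], tasks[next++]);

    while (busy > 0)
    {
        fd_set      rset;
        int         maxfd = -1;

        FD_ZERO(&rset);
        for (i = 0; i < nconns; i++)
        {
            int         sock = PQsocket(conns[i]);

            FD_SET(sock, &rset);
            if (sock > maxfd)
                maxfd = sock;
        }

        /* block until at least one connection has data for us */
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
            continue;           /* e.g. EINTR: just retry */

        for (i = 0; i < nconns; i++)
        {
            PGresult   *res;

            if (!FD_ISSET(PQsocket(conns[i]), &rset))
                continue;
            PQconsumeInput(conns[i]);
            if (PQisBusy(conns[i]))
                continue;       /* command not finished yet */

            /* drain all results of the finished command */
            while ((res = PQgetResult(conns[i])) != NULL)
                PQclear(res);
            busy--;

            /* hand the now-idle connection the next task, if any */
            if (next < ntasks)
            {
                PQsendQuery(conns[i], tasks[next++]);
                busy++;
            }
        }
    }
}

This is essentially the structure that the later versions of the patch adopt with GetIdleSlot() and select_loop().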
On Jul 16, 2014 7:05 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
Yeah, those are exactly my points. I think it would be significantly
simpler to do it that way, rather than forking and threading. And also
easier to make portable...
(and as an optimization on Alvaro's suggestion, you can of course reuse the
initial connection as one of the workers, as long as you got the full list
of tasks from it up front, which I think you do anyway in order to do
sorting of tasks...)
/Magnus
On 16 July 2014 12:13, Magnus Hagander wrote:
Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...
Oh, I got your point; I will update my patch and send it.
Now we can completely remove the vac_parallel.h file, and no refactoring is needed either :)
Thanks & Regards,
Dilip Kumar
On 16 July 2014 12:13, Magnus Hagander wrote:
Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...
I have modified the patch as per the suggestion.
Now all connections are created at the beginning, and the first connection is used to fetch the table list; after that, all connections take part in the vacuum work.
Please have a look and provide your opinion…
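For illustration, with this patch applied a whole database can be vacuumed over four concurrent connections with a command like:

vacuumdb -d postgres -j 4

Each connection is handed one table at a time until the task list is empty.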
Thanks & Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_v11.patch (application/octet-stream)
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,239 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+       Number of parallel processes to use for the operation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 14,34 ****
#include "common.h"
#include "dumputils.h"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *table, const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 14,71 ----
#include "common.h"
#include "dumputils.h"
+ #define NO_SLOT (-1)
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelSlot
+ {
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+ } ParallelSlot;
+
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
const char *progname, bool echo);
static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, int parallel, SimpleStringList *tables);
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot);
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname, const char *progname);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname, const char *progname);
+
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 86,92 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 112,125 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! 	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 129,134 **** main(int argc, char *argv[])
--- 168,183 ----
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ if (parallel <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 190,196 ----
}
}
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 246,252 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, parallel);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 260,298 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
! }
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo);
!
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 317,328 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 417,428 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 443,849 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
+ }
}
PQclear(result);
}
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot)
+ {
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = parallel;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ /*
+ * This will give a free connection slot; if no slot is free it will
+ * wait for at least one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose, and_analyze,
+ analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connections to return their results */
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+ fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ exit(1);
+ }
+ }
+
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname)
+ {
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * No slot is free yet; wait until some command finishes and process
+ * the results for whichever slots become free.
+ */
+ do
+ {
+ i = select_loop(maxFd, &slotset);
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
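+ /* absorb whatever input is available; if still busy, keep waiting */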
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ } while (firstFree < 0);
+
+ return firstFree;
+ }
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname, const char *progname)
+ {
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+ while ((result = PQgetResult(conn)) != NULL)
+ {
+ /* keep only the last result; clear any earlier ones */
+ if (lastResult)
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ PQclear(lastResult);
+
+ if (!r)
+ {
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+ return false;
+ }
+
+ return true;
+ }
+
+
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+
+ connSlot = (ParallelSlot*)pg_malloc(parallel * sizeof(ParallelSlot));
+ for (i = 0; i < parallel; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ conn = connSlot[0].connection;
+
+ /*
+ * If a table list is not provided then we need to vacuum the whole DB:
+ * get the list of all tables and prepare the list.
+ */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, "\"%s\".\"%s\"", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+ }
+
+
+ if (analyze_in_stages)
+ {
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")};
+
+ puts(gettext(stage_messages[i]));
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel,
+ progname, i, connSlot);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ parallel, progname, -1, connSlot);
+ }
+
+ for (i = 0; i < parallel; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
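+ /*
+ * If a command is still running on this connection, try to cancel it
+ * first so the server does not continue vacuuming after we exit.
+ */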
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection = NULL;
+ }
+
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 863,869 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Wed, Jul 16, 2014 at 5:30 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
Now we can completely remove the vac_parallel.h file, and no refactoring is needed either :)
Should we push the refactoring through anyway? I have a hard time
believing that pg_dump is going to be the only client program we ever have
that will need process-level parallelism, even if this feature itself does
not need it. Why make the next person who comes along re-invent that
re-factoring of this wheel?
Cheers,
Jeff
I gave the refactoring patch a look some days ago, and my conclusion was
that it is reasonably sound but it needed quite some cleanup in order
for it to be committable. Without any immediate use case, it's hard to
justify going through all that effort. Maybe we can add a TODO item and
have it point to the posted patch, so that if in the future we see a
need for another parallel client program we can easily rebase the
current patch.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jul 18, 2014 at 10:22 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
I have modified the patch as per the suggestion. Now all connections are created at the beginning, and the first connection is used to fetch the table list; after that, all connections take part in the vacuum work.
1.
+ connSlot = (ParallelSlot*)pg_malloc(parallel * sizeof(ParallelSlot));
+ for (i = 0; i < parallel; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
Here it seems to me that you are opening connections before getting or
checking the table list, so if there are fewer tables than jobs, won't
the extra connections always be idle?
A simple case to verify this is the following example:
vacuumdb -t t1 -d postgres -j 4
2.
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
Here it is just fetching the list of plain relations; however, the
Vacuum command processes materialized views as well, so I think the
list should include materialized views too, unless you have a specific
reason for excluding them.
3. In function vacuum_parallel(), if the user has not provided a list of
tables, it retrieves all the tables in the database, and
run_parallel_vacuum() then vacuums each table using the async mechanism.
Now consider a case where, after the list has been fetched, a table is
dropped by a user in another session: the patch will error out. Without
the patch, the Vacuum command internally ignores such a case and
completes the vacuum of the other tables. Don't you think the patch
should maintain the existing behaviour?
4.
+ <term><option>-j <replaceable
class="parameter">jobs</replaceable></></term>
+ Number of parallel process to perform the operation.
Change this description to match the new implementation. Also, I think
this new option needs some further explanation.
5.
It seems there is no change in the function declaration below:
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
const char *progname, bool echo);
6.
+ printf(_(" -j, --jobs=NUM use this many parallel jobs
to vacuum\n"));
Change the description to match the new implementation.
7.
/* This will give the free connection slot, if no slot is free it will
wait for atleast one slot to get free.*/
Multiline comments should be written like (refer other places)
/*
* This will give the free connection slot, if no slot is free it will
* wait for atleast one slot to get free.
*/
Kindly correct other places too, if similar instances exist in the patch.
8.
Isn't it a good idea to check the performance of this new patch,
especially for some worst cases, like when there is not much to vacuum
in the tables of a database? The reason I want to check is that with the
new algorithm (for a database-wide vacuum, it now gets the list of
tables and vacuums them individually), certain server-side actions, such
as allocation/deallocation of memory contexts and sending stats, are
repeated per table where they would otherwise have been done once.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 31 July 2014 10:59, Amit Kapila wrote:
Thanks for the review and the valuable comments.
I have fixed all the review comments and attached the revised patch.
As per your suggestion, I have also collected a performance report…
Test 1:
Machine Configuration:
Core : 8 (Intel(R) Xeon(R) CPU E5520 @ 2.27GHz)
RAM: 48GB
Test Scenario:
8 tables, all with 1M+ records. [Many records are deleted and inserted using some pattern; the file is attached in the mail.]
Test Result
Base Code: 43.126s
Parallel Vacuum Code
2 Threads : 29.687s
8 Threads : 14.647s
Test 2: (as per your scenario, where the actual vacuum time is very small)
Vacuum done for the complete DB
8 tables, all with 10000 records and a few dead tuples
Test Result
Base Code: 0.59s
Parallel Vacuum Code
2 Threads : 0.50s
4 Threads : 0.29s
8 Threads : 0.18s
Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_v12.patch (application/octet-stream)
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,241 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">jobs</replaceable></></term>
+ <listitem>
+ <para>
+       Number of parallel connections to use for the operation. This option
+       runs the vacuum over several connections at once; each connection
+       processes one table at a time, so at any moment as many tables are
+       vacuumed in parallel as there are jobs.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 14,19 ****
--- 14,29 ----
#include "common.h"
#include "dumputils.h"
+ #define NO_SLOT (-1)
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelSlot
+ {
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+ } ParallelSlot;
+
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
***************
*** 25,34 **** static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 35,72 ----
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet, int parallel);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, int parallel, SimpleStringList *tables);
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 87,93 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 113,127 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int parallel = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! 	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 121,134 **** main(int argc, char *argv[])
--- 162,188 ----
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ parallel = atoi(optarg);
+ if (parallel <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 195,201 ----
}
}
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 179,184 **** main(int argc, char *argv[])
--- 234,245 ----
setup_cancel_handler();
+ /*
+ * If the user supplied a table list that is shorter than the number
+ * of requested jobs, there is no point in opening more connections
+ * than there are tables.
+ */
+ if (tbl_count && (parallel > tbl_count))
+ parallel = tbl_count;
+
if (alldb)
{
if (dbname)
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 257,263 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, parallel);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 271,309 ----
dbname = get_user_name_or_exit(progname);
}
! if (parallel > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, &tables);
! }
! else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
! }
! }
! else
{
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo);
!
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 328,339 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 428,439 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int parallel)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 454,877 ----
fflush(stdout);
}
!
! if (parallel > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, parallel, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
+ }
}
PQclear(result);
}
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int parallel,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+ {
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = parallel;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ /*
+ * This will give a free connection slot; if no slot is free it will
+ * wait for at least one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connections to return their results */
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+ fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ exit(1);
+ }
+ }
+
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * No slot is free yet; wait until some command finishes and process
+ * the results for whichever slots become free.
+ */
+ do
+ {
+ i = select_loop(maxFd, &slotset);
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ } while (firstFree < 0);
+
+ return firstFree;
+ }
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+ while ((result = PQgetResult(conn)) != NULL)
+ {
+ /* keep only the last result; clear any earlier ones */
+ if (lastResult)
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ PQclear(lastResult);
+
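+ /*
+ * For a database-wide vacuum (completedb), a failure on an individual
+ * table, for example one dropped by a concurrent session, is not
+ * treated as fatal; this matches the behaviour of plain VACUUM.
+ */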
+ if (!r && !completedb)
+ {
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+ return false;
+ }
+
+ return true;
+ }
+
+
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int parallel,
+ SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ bool completedb = false;
+ int tbl_count = 0;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /* if table list is not provided then we need to do vaccum for whole DB
+ * get the list of all tables and prpare the list
+ */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid order by relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, "\"%s\".\"%s\"", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ tbl_count++;
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* Vaccuming full database*/
+ vacuum_tables = false;
+
+ if (parallel > tbl_count)
+ parallel = tbl_count;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(parallel * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ for (i = 1; i < parallel; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")};
+
+ puts(gettext(stage_messages[i]));
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, parallel,
+ progname, i, connSlot, vacuum_tables);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ parallel, progname, -1, connSlot, vacuum_tables);
+ }
+
+ for (i = 0; i < parallel; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection= NULL;
+ }
+
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 891,897 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Mon, Aug 4, 2014 at 11:41 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 31 July 2014 10:59, Amit Kapila wrote:
Thanks for the review and valuable comments.
I have fixed all the comments and attached the revised patch.
I have again looked into your revised patch and would like
to share my findings with you.
1.
+ Number of parallel connections to perform the operation. This
option will enable the vacuum
+ operation to run on parallel connections, at a time one table will
be operated on one connection.
a. How about describing w.r.t asynchronous connections
instead of parallel connections?
b. It is better to keep the line length under 80.
c. As you are using multiple connections to achieve parallelism,
I suggest you add a line in your description indicating user should
verify max_connections parameters. Something similar to pg_dump:
"pg_dump will open njobs + 1 connections to the database, so make
sure your max_connections setting is high enough to accommodate
all connections."
2.
+ So at one time as many tables will be vacuumed parallely as number
of jobs.
can you briefly mention the case when the number of jobs
is more than the number of tables?
3.
+ /* When user is giving the table list, and list is smaller then
+ * number of tables
+ */
+ if (tbl_count && (parallel > tbl_count))
+ parallel = tbl_count;
+
Again here multiline comments are wrong.
Some other instances are as below:
a.
/* This will give the free connection slot, if no slot is free it will
* wait for atleast one slot to get free.
*/
b.
/* if table list is not provided then we need to do vaccum for whole DB
* get the list of all tables and prpare the list
*/
c.
/* Some of the slot are free, Process the results for slots whichever
* are free
*/
4.
src/bin/scripts/vacuumdb.c:51: indent with spaces.
+ bool analyze_only, bool freeze, PQExpBuffer sql);
src/bin/scripts/vacuumdb.c:116: indent with spaces.
+ int parallel = 0;
src/bin/scripts/vacuumdb.c:198: indent with spaces.
+ optind++;
src/bin/scripts/vacuumdb.c:299: space before tab in indent.
+ vacuum_one_database(dbname, full, verbose,
and_analyze,
There are a lot of redundant whitespaces; check with
git diff --check and fix them.
5.
res = executeQuery(conn,
"select relname, nspname from pg_class c, pg_namespace ns"
" where (relkind = \'r\' or relkind = \'m\')"
" and c.relnamespace = ns.oid order by relpages desc",
progname, echo);
a. Here you need to use SQL keywords in capital letters; refer to one
of the other callers of executeQuery() in vacuumdb.c
b. Why do you need this condition c.relnamespace = ns.oid in above
query?
I think to get the list of required objects from pg_class, you don't
need to have a join with pg_namespace.
6.
vacuum_parallel()
{
..
if (!tables || !tables->head)
{
..
tbl_count++;
}
..
}
a. Here why do you need a separate variable (tbl_count) to verify the number
of asynchronous/parallel connections you want? Why can't we use ntuple?
b. there is a warning in code (I have compiled it on windows) as well
related to this variable.
vacuumdb.c(695): warning C4700: uninitialized local variable 'tbl_count'
used
7.
Fix for one of my previous comment is as below:
GetQueryResult()
{
..
if (!r && !completedb)
..
}
Here I think some generic errors, like a broken connection, will
also get ignored. Is it possible to ignore only the particular
error which we want to ignore, without complicating the code?
Also, in any case, add comments to explain why you are ignoring
the error for a particular case.
8.
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least
1\n"),
+ progname);
formatting of the 2nd line (progname) is not as per standard (you can refer
to other fprintf calls in the same file).
9. + int parallel = 0;
I think it is better to name it as numAsyncCons or something similar.
10. It is better if you can add function headers for the newly added
functions.
Test2: (as per your scenario, where actual vacuum time is very less)
Vacuum done for complete DB
8 tables all with 10000 records and few dead tuples
I think this test is missing in the attached file. What does "few" mean?
Can you try with 0.1% and 1% of dead tuples in the table, and
measure the time in milliseconds if it is taking less time to complete
the test?
PS -
It is better if you mention, against each review comment/suggestion,
what you have done, because in some cases it will help the reviewer
understand your fix easily, and as the author you will also be sure that
everything got fixed.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 11 August 2014 10:29, Amit Kapila wrote:
1. I have fixed all the review comments except a few; the modified patch is attached.
2. For the comments that are not fixed, find my inline replies in the mail.
1.
+ Number of parallel connections to perform the operation. This
option will enable the vacuum
+ operation to run on parallel connections, at a time one table will
be operated on one connection.
a. How about describing w.r.t asynchronous connections
instead of parallel connections?
b. It is better to keep the line length under 80.
c. As you are using multiple connections to achieve parallelism,
I suggest you add a line in your description indicating user should
verify max_connections parameters. Something similar to pg_dump:
2.
+ So at one time as many tables will be vacuumed parallely as number of jobs.
can you briefly mention the case when the number of jobs
is more than the number of tables?
1 and 2 ARE FIXED in DOC.
3.
+ /* When user is giving the table list, and list is smaller then
+ * number of tables
+ */
+ if (tbl_count && (parallel > tbl_count))
+ parallel = tbl_count;
+
Again here multiline comments are wrong.
Some other instances are as below:
a.
/* This will give the free connection slot, if no slot is free it will
* wait for atleast one slot to get free.
*/
b.
/* if table list is not provided then we need to do vaccum for whole DB
* get the list of all tables and prpare the list
*/
c.
/* Some of the slot are free, Process the results for slots whichever
* are free
*/
COMMENTS are FIXED
4.
src/bin/scripts/vacuumdb.c:51: indent with spaces.
+ bool analyze_only, bool freeze, PQExpBuffer sql);
src/bin/scripts/vacuumdb.c:116: indent with spaces.
+ int parallel = 0;
src/bin/scripts/vacuumdb.c:198: indent with spaces.
+ optind++;
src/bin/scripts/vacuumdb.c:299: space before tab in indent.
+ vacuum_one_database(dbname, full, verbose, and_analyze,
There are a lot of redundant whitespaces; check with
git diff --check and fix them.
ALL are FIXED
5.
res = executeQuery(conn,
"select relname, nspname from pg_class c, pg_namespace ns"
" where (relkind = \'r\' or relkind = \'m\')"
" and c.relnamespace = ns.oid order by relpages desc",
progname, echo);
a. Here you need to use SQL keywords in capital letters; refer to one
of the other callers of executeQuery() in vacuumdb.c
b. Why do you need this condition c.relnamespace = ns.oid in above
query?
IT IS POSSIBLE THAT TWO NAMESPACES HAVE THE SAME TABLE NAME, SO WHEN WE SEND THE COMMAND FROM THE CLIENT WE NEED TO QUALIFY IT WITH THE NAMESPACE, BECAUSE WE NEED TO VACUUM ALL THE
TABLES. (OTHERWISE TWO TABLES WITH THE SAME NAME FROM DIFFERENT NAMESPACES WILL BE TREATED AS THE SAME.)
I think to get the list of required objects from pg_class, you don't
need to have a join with pg_namespace.
DONE
6.
vacuum_parallel()
{
..
if (!tables || !tables->head)
{
..
tbl_count++;
}
..
}
a. Here why do you need a separate variable (tbl_count) to verify the number
of asynchronous/parallel connections you want? Why can't we use ntuple?
b. there is a warning in code (I have compiled it on windows) as well
related to this variable.
vacuumdb.c(695): warning C4700: uninitialized local variable 'tbl_count' used
Variable REMOVED
7.
Fix for one of my previous comment is as below:
GetQueryResult()
{
..
if (!r && !completedb)
..
}
Here I think some generic errors, like a broken connection, will
also get ignored. Is it possible to ignore only the particular
error which we want to ignore, without complicating the code?
Also, in any case, add comments to explain why you are ignoring
the error for a particular case.
Here we are getting a message string; I think if we need to find the error code then we need to parse the string, and after that compare it with the error codes.
Is there any other way to do this?
Comments are added
8.
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
formatting of the 2nd line (progname) is not as per standard (you can refer to other fprintf calls in the same file).
DONE
9. + int parallel = 0;
I think it is better to name it as numAsyncCons or something similar.
CHANGED as PER SUGGESTION
10. It is better if you can add function headers for the newly added
functions.
ADDED
Test2: (as per your scenario, where actual vacuum time is very less)
Vacuum done for complete DB
8 tables all with 10000 records and few dead tuples
I think this test is missing in the attached file. What does "few" mean?
Can you try with 0.1% and 1% of dead tuples in the table, and
measure the time in milliseconds if it is taking less time to complete
the test?
TESTED with 1%, 0.1% and 0.01%, and the results are as follows:
1. With 1% (file test1%.sql)
Base Code : 22.26 s
2 Threads : 12.82 s
4 Threads : 9.19s
8 Threads : 8.89s
2. With 0.1%
Base Code : 3.83 s
2 Threads : 2.01 s
4 Threads : 2.02s
8 Threads : 2.25s
3. With 0.01%
Base Code : 0.60 s
2 Threads : 0.32 s
4 Threads : 0.26s
8 Threads : 0.31s
Thanks & Regards,
Dilip
Attachments:
vacuumdb_parallel_v13.patch (application/octet-stream)
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 224,229 **** PostgreSQL documentation
--- 224,250 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></></term>
+ <listitem>
+ <para>
+ Number of asynchronous connections to perform the operation. This option
+ will enable the vacuum operation to run on asynchronous connections,
+ at a time one table will be operated on one connection. So at one time
+ as many tables will be vacuumed parallely as number of jobs.
+ If number of jobs given are more than number of tables then number of
+ jobs will be set to number of tables.
+ </para>
+ <para><application>vacuumdb</> will open <replaceable class="parameter">
+ njobs</replaceable> connections to the database, so make sure your
+ <xref linkend="guc-max-connections"> setting is high enough to
+ accommodate all connections.
+ </para>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-?</></term>
<term><option>--help</></term>
<listitem>
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 14,19 ****
--- 14,29 ----
#include "common.h"
#include "dumputils.h"
+ #define NO_SLOT (-1)
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelSlot
+ {
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+ } ParallelSlot;
+
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, bool freeze,
***************
*** 25,34 **** static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 35,73 ----
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet,
! int numAsyncCons);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, int numAsyncCons, SimpleStringList *tables);
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int numAsyncCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 88,94 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 114,128 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int numAsyncCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 121,134 **** main(int argc, char *argv[])
--- 163,189 ----
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ numAsyncCons = atoi(optarg);
+ if (numAsyncCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 196,202 ----
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 179,184 **** main(int argc, char *argv[])
--- 235,247 ----
setup_cancel_handler();
+ /*
+ * When user is giving the table list, and list is smaller then
+ * number of tables
+ */
+ if (tbl_count && (numAsyncCons > tbl_count))
+ numAsyncCons = tbl_count;
+
if (alldb)
{
if (dbname)
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 259,265 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, numAsyncCons);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages,
! freeze, cell->val,
host, port, username, prompt_password,
progname, echo);
}
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo);
}
exit(0);
--- 273,311 ----
dbname = get_user_name_or_exit(progname);
}
! if (numAsyncCons > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, numAsyncCons, &tables);
! }
! else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo);
! }
! }
! else
{
vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages,
! freeze, NULL,
host, port, username, prompt_password,
progname, echo);
+
}
}
}
exit(0);
***************
*** 253,263 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze, const char *table,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo)
{
PQExpBufferData sql;
--- 330,341 ----
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo)
{
PQExpBufferData sql;
***************
*** 352,362 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 430,441 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int numAsyncCons)
{
PGconn *conn;
PGresult *result;
***************
*** 377,391 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
}
PQclear(result);
}
static void
help(const char *progname)
--- 456,913 ----
fflush(stdout);
}
!
! if (numAsyncCons > 1)
! {
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, host, port, username, prompt_password,
! progname, echo, numAsyncCons, NULL);
!
! }
! else
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages,
! freeze, NULL, host, port, username, prompt_password,
progname, echo);
+ }
}
PQclear(result);
}
+ /*
+ * run_parallel_vacuum
+ * This function process the table list,
+ * pick the object on by one and get the Free connections slot, once it
+ * get the free slot send the job on the free connection.
+ */
+
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int numAsyncCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+ {
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = numAsyncCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ /*
+ * This will give the free connection slot, if no slot is free it will
+ * wait for atleast one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connection to return the results*/
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+ fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ exit(1);
+ }
+ }
+
+ /*
+ * GetIdleSlot
+ * Process the slot list, if any free slot available return the slotid
+ * If no slot is free, Then perform select on all the socket and wait until
+ * atleast one slot is free.
+ */
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * Some of the slot are free, Process the results for slots whichever
+ * are free
+ */
+ do
+ {
+ i = select_loop(maxFd, &slotset);
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ }while(firstFree < 0);
+
+ return firstFree;
+ }
+
+ /*
+ * GetQueryResult
+ * Process the query result.
+ */
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+ while((result = PQgetResult(conn)) != NULL)
+ lastResult = result;
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ PQclear(lastResult);
+
+ /*
+ * If user has not given the vacuum of complete db, then if
+ * any of the object vacuum failed it can be ignored and vacuuming
+ * of other object can be continued, this is the same behaviour as
+ * vacuuming of complete db is handled without --jobs option
+ */
+ if (!r && !completedb)
+ {
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+ return false;
+ }
+
+ return true;
+ }
+
+ /*
+ * vacuum_parallel
+ * This function will open the multiple asynchronous connection as
+ * suggested by used, it will derive the table list using server call
+ * if table list is not given by user and perform the vacuum call
+ */
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int numAsyncCons,
+ SimpleStringList *tables)
+ {
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ bool vacuum_tables = true;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * if table list is not provided then we need to do vaccum for whole DB
+ * get the list of all tables and prpare the list
+ */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ char *relName;
+ char *nspace;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT relname, nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ relName = PQgetvalue(res, i, 0);
+ nspace = PQgetvalue(res, i, 1);
+
+ appendPQExpBuffer(&sql, "\"%s\".\"%s\"", nspace, relName);
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* Vaccuming full database*/
+ vacuum_tables = false;
+
+ if (numAsyncCons > ntuple)
+ numAsyncCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(numAsyncCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ for (i = 1; i < numAsyncCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ int i;
+
+ for (i = 0; i < 3; i++)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")};
+
+ puts(gettext(stage_messages[i]));
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, numAsyncCons,
+ progname, i, connSlot, vacuum_tables);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ numAsyncCons, progname, -1, connSlot, vacuum_tables);
+ }
+
+ for (i = 0; i < numAsyncCons; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ /*
+ * DisconnectDatabase
+ * disconnect all the connections.
+ */
+ void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection= NULL;
+ }
+
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 405,410 **** help(const char *progname)
--- 927,933 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many asynchronous connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ Number of parallel connections to perform the operation. This option will enable the vacuum
+ operation to run on parallel connections, at a time one table will be operated on one connection.
a. How about describing w.r.t asynchronous connections
instead of parallel connections?
I don't think "asynchronous" is a good choice of word. Maybe "simultaneous"?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 13, 2014 at 4:01 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 11 August 2014 10:29, Amit Kapila wrote:
5.
res = executeQuery(conn,
"select relname, nspname from pg_class c, pg_namespace ns"
" where (relkind = \'r\' or relkind = \'m\')"
" and c.relnamespace = ns.oid order by relpages desc",
progname, echo);
a. Here you need to use SQL keywords in capital letters; refer to one
of the other callers of executeQuery() in vacuumdb.c
b. Why do you need this condition c.relnamespace = ns.oid in above
query?
IT IS POSSIBLE THAT TWO NAMESPACES HAVE THE SAME TABLE NAME, SO WHEN WE
SEND THE COMMAND FROM THE CLIENT WE NEED TO QUALIFY IT WITH THE
NAMESPACE, BECAUSE WE NEED TO VACUUM ALL THE TABLES. (OTHERWISE TWO
TABLES WITH THE SAME NAME FROM DIFFERENT NAMESPACES WILL BE TREATED AS
THE SAME.)
That's right; however, writing the query in the below way might
make it more understandable:
+ "SELECT relname, nspname FROM pg_class c, pg_namespace ns"
"SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
7.
Here we are getting a message string; I think if we need to find the error
code then we need to parse the string, and after that compare it with
the error codes.
Is there any other way to do this?
You can compare against SQLSTATE by using below API.
val = PQresultErrorField(res, PG_DIAG_SQLSTATE);
You need to handle *42P01* SQLSTATE, also please refer below
usage where we are checking SQLSTATE.
fe-connect.c
PQresultErrorField(conn->result, PG_DIAG_SQLSTATE),
ERRCODE_INVALID_PASSWORD) == 0)
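For illustration, a minimal sketch of such a check (the helper name is hypothetical, and treating an undefined table as the only ignorable error is an assumption, not something the thread settled):

/*
 * Sketch only: decide whether a failed result can be ignored.
 * "42P01" (ERRCODE_UNDEFINED_TABLE) means the table was dropped
 * after the table list was built. Assumes libpq-fe.h and string.h.
 */
static bool
error_is_ignorable(const PGresult *res)
{
	const char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);

	return sqlState != NULL && strcmp(sqlState, "42P01") == 0;
}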
Few more comments:
1.
* If user has not given the vacuum of complete db, then if
I think here you have said reverse of what code is doing.
You don't need *not* in above sentence.
2.
+ appendPQExpBuffer(&sql, "\"%s\".\"%s\"", nspace, relName);
I think here you need to use function fmtQualifiedId() or fmtId()
or something similar to handle quotes appropriately.
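A minimal sketch of that suggestion (fmtId() comes from dumputils.h, which vacuumdb.c already includes; note that fmtId() returns a pointer into a static buffer, so the namespace and the relation name have to be formatted in two separate appends):

/* Sketch only: build a safely quoted, schema-qualified name. */
appendPQExpBuffer(&sql, "%s", fmtId(nspace));
appendPQExpBuffer(&sql, ".%s", fmtId(relName));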
3.
+ */
+ if (!r && !completedb)
Here the usage of completedb variable is reversed which means
that it goes into error path when actually whole database is
getting vacuumed and the reason is that you are setting it
to false in below code:
+ /* Vaccuming full database*/
+ vacuum_tables = false;
4.
Functions prepare_command() and vacuum_one_database() contain
duplicate code, is there any problem in using prepare_command()
function in vacuum_one_database(). Another point in this context
is that I think it is better to name function prepare_command()
as append_vacuum_options() or something on that lines, also it will
be better if you can write function header for this function as well.
5.
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
Here why do we need DisconnectDatabase() type of function?
Why can't we simply call PQfinish() as in base code?
6.
+ /*
+ * if table list is not provided then we need to do vaccum for whole DB
+ * get the list of all tables and prpare the list
+ */
spelling of prepare is wrong. I have noticed spelling mistakes
in comments at some other places as well; please check all
comments once.
7. I think in new mechanism cancel handler will not work.
In single connection vacuum it was always set/reset
in function executeMaintenanceCommand(). You might need
to set/reset it in function run_parallel_vacuum().
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
1.
+ Number of parallel connections to perform the operation. This
option will enable the vacuum
+ operation to run on parallel connections, at a time one table will
be operated on one connection.
a. How about describing w.r.t asynchronous connections
instead of parallel connections?
I don't think "asynchronous" is a good choice of word.
Agreed.
Maybe "simultaneous"?
Not sure. How about *concurrent* or *multiple*?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
1.
+ Number of parallel connections to perform the operation. This
option will enable the vacuum
+ operation to run on parallel connections, at a time one table will
be operated on one connection.
a. How about describing w.r.t asynchronous connections
instead of parallel connections?
I don't think "asynchronous" is a good choice of word.
Agreed.
Maybe "simultaneous"?
Not sure. How about *concurrent* or *multiple*?
multiple isn't right, but we could say concurrent.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 21, 2014 at 12:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas <robertmhaas@gmail.com>
wrote:
On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
a. How about describing w.r.t asynchronous connections
instead of parallel connections?
I don't think "asynchronous" is a good choice of word.
Agreed.
Maybe "simultaneous"?
Not sure. How about *concurrent* or *multiple*?
multiple isn't right, but we could say concurrent.
I also find concurrent more appropriate.
Dilip, could you please change it to concurrent in doc updates,
variables, functions unless you see any objection for the same.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 21 August 2014 08:31, Amit Kapila wrote:
Not sure. How about *concurrent* or *multiple*?
multiple isn't right, but we could say concurrent.
I also find concurrent more appropriate.
Dilip, could you please change it to concurrent in doc updates,
variables, functions unless you see any objection for the same.
OK, I will take care of it along with the other comment fixes.
Regards,
Dilip Kumar
On Tue, Aug 19, 2014 at 4:27 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
Few more comments:
Some more comments:
1. I could see one shortcoming in the way the patch has currently
parallelized the work for --analyze-in-stages. Basically, the patch is
performing the work for each stage for multiple tables in concurrent
connections. That seems okay for the cases when the number of parallel
connections is less than or equal to the number of tables, but for the
case when the user has asked for more connections than there are tables,
I think this strategy will not be able to use the extra connections.
2. Similarly, for the case of multiple databases, currently it will not
be able to use more connections than the number of tables in each
database, because the parallelizing strategy is to just use the
concurrent connections for tables inside a single database.
I am not completely sure whether current strategy is good enough or
we should try to address the above problems. What do you think?
3.
+ do
+ {
+ i = select_loop(maxFd, &slotset);
+ Assert(i != 0);
Could you explain the reason for using this loop? I think you
want to wait for data on a socket descriptor, but why for maxFd?
Also, it is better if you explain this logic in comments.
4.
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
I think it is better to call PQconsumeInput() only if you find
the connection is busy.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 24 August 2014 11:33, Amit Kapila wrote:
Thanks for your comments. I have worked on both review comment lists, sent on 19 August and 24 August.
The latest patch is attached to the mail.
on 19 August:
------------
You can compare against SQLSTATE by using below API.
val = PQresultErrorField(res, PG_DIAG_SQLSTATE);
You need to handle *42P01* SQLSTATE, also please refer below
usage where we are checking SQLSTATE.
fe-connect.c
PQresultErrorField(conn->result, PG_DIAG_SQLSTATE),
ERRCODE_INVALID_PASSWORD) == 0)
DONE
Few more comments:
1.
* If user has not given the vacuum of complete db, then if
I think here you have said reverse of what code is doing.
You don't need *not* in above sentence.
DONE
2.
+ appendPQExpBuffer(&sql, "\"%s\".\"%s\"", nspace, relName);
I think here you need to use function fmtQualifiedId() or fmtId()
or something similar to handle quotes appropriately.
DONE
3.
+ */ + if (!r && !completedb) Here the usage of completedb variable is reversed which means that it goes into error path when actually whole database is getting vacuumed and the reason is that you are setting it to false in below code: + /* Vaccuming full database*/ + vacuum_tables = false;
FIXED
4.
Functions prepare_command() and vacuum_one_database() contain
duplicate code, is there any problem in using prepare_command()
function in vacuum_one_database(). Another point in this context
is that I think it is better to name function prepare_command()
as append_vacuum_options() or something on that lines, also it will
be better if you can write function header for this function as well.
DONE
5.
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
Here why do we need DisconnectDatabase() type of function?
Why can't we simply call PQfinish() as in base code?
Because the base code just needs to handle the main connection, and when it sends PQfinish it means the run has either completed or failed;
in both cases control is back with the client. But in the multi-connection case, if one connection fails then we need to send
a cancel on the other connections, and that is what DisconnectDatabase does.
6.
+ /*
+ * if table list is not provided then we need to do vaccum for whole DB
+ * get the list of all tables and prpare the list
+ */
spelling of prepare is wrong. I have noticed spelling mistakes in comments at some other places as well; please check all comments once.
FIXED
7. I think in new mechanism cancel handler will not work.
In single connection vacuum it was always set/reset
in function executeMaintenanceCommand(). You might need
to set/reset it in function run_parallel_vacuum().
Good catch. Now I call SetCancelConn(pSlot[0].connection) on the first connection; this enables the cancel
handler to cancel the query on the first connection, so that the select loop comes out.
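As a rough sketch of where that hook sits (names follow the patch; the exact placement inside GetIdleSlot() is my reading, not a quote from the patch):

/*
 * Sketch only: register the first connection with the SIGINT handler
 * before blocking in select_loop(), and clear it again afterwards,
 * mirroring what executeMaintenanceCommand() does for one connection.
 */
SetCancelConn(pSlot[0].connection);
i = select_loop(maxFd, &slotset);
ResetCancelConn();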
24 August
---------
1. I could see one shortcoming in the way the patch has currently parallelized the
work for --analyze-in-stages. Basically, the patch is performing the work for each stage
for multiple tables in concurrent connections. That seems okay for the cases when
the number of parallel connections is less than or equal to the number of tables, but for
the case when the user has asked for more connections than there are tables,
I think this strategy will not be able to use the extra connections.
I think --analyze-in-stages should maintain the order.
2. Similarly, for the case of multiple databases, currently it will not be able
to use more connections than the number of tables in each database, because the
parallelizing strategy is to just use the concurrent connections for
tables inside a single database.
I think for multiple databases, to do the parallel execution we would need to maintain multiple connections to multiple databases,
and we would need to maintain a table list for all the databases together to run them concurrently. I think this may impact the startup cost,
as we would need to create multiple connections and disconnect just to prepare the list, and I think it will increase the complexity as well.
I am not completely sure whether current strategy is good enough or
we should try to address the above problems. What do you think?
3.
+ do
+ {
+ i = select_loop(maxFd, &slotset);
+ Assert(i != 0);
Could you explain the reason for using this loop? I think you
want to wait for data on a socket descriptor, but why for maxFd?
Also, it is better if you explain this logic in comments.
We send a vacuum job to a connection, and when none of the connection slots is free, we wait on all the sockets
until one of them gets freed.
4.
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
I think it is better to call PQconsumeInput() only if you find
the connection is busy.
I think the logic here is a bit different. In other places in the code, PQconsumeInput() is called in a loop, while PQisBusy() reports busy, to consume all the data;
but in our case we have already sent the query, so we consume whatever is on the network and then check PQisBusy();
if the connection is not busy, it means this slot is free again.
Also, PQconsumeInput() only consumes input when input is actually available, so I think we do not need an external PQisBusy() check.
The loop of PQisBusy() followed by PQconsumeInput() is for the case where the caller has to fetch the complete data; there it keeps PQconsumeInput()
from being called in a tight loop, calling it only while the connection is busy.
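For reference, a minimal sketch of the polling step under discussion (names follow the patch; checking the return value of PQconsumeInput() is an addition of this sketch, not something the patch does):

/*
 * Sketch only: one pass over the slots whose sockets select() marked
 * readable. PQconsumeInput() reads whatever has arrived (it is cheap
 * when nothing is pending) and returns 0 on connection trouble;
 * PQisBusy() then tells whether a complete result is ready.
 */
for (i = 0; i < max_slot; i++)
{
	if (!FD_ISSET(pSlot[i].sock, &slotset))
		continue;			/* nothing arrived on this socket */

	if (!PQconsumeInput(pSlot[i].connection))
		return NO_SLOT;		/* broken connection; caller bails out */

	if (PQisBusy(pSlot[i].connection))
		continue;			/* command still in progress */

	pSlot[i].isFree = true;	/* complete result available; slot idle */
}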
Regards,
Dilip
Attachments:
vacuumdb_parallel_v14.patch (application/octet-stream)
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 204,209 **** PostgreSQL documentation
--- 204,231 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></></term>
+ <listitem>
+ <para>
+ Number of concurrent connections to perform the operation.
+ This option will enable the vacuum operation to run on asynchronous connections,
+ at a time one table will be operated on one connection.
+ So at one time as many tables will be vacuumed parallely as number of jobs.
+ If number of jobs given are more than number of tables then number of
+ jobs will be set to number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</> will open <replaceable class="parameter">
+ njobs</replaceable> connections to the database, so make sure your
+ <xref linkend="guc-max-connections"> setting is high enough to
+ accommodate all connections.
+ </para>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
*** a/src/bin/scripts/common.c
--- b/src/bin/scripts/common.c
***************
*** 19,26 ****
#include "common.h"
! static void SetCancelConn(PGconn *conn);
! static void ResetCancelConn(void);
static PGcancel *volatile cancelConn = NULL;
--- 19,25 ----
#include "common.h"
!
static PGcancel *volatile cancelConn = NULL;
***************
*** 291,297 **** yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
! static void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
--- 290,296 ----
*
* Set cancelConn to point to the current database connection.
*/
! void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
***************
*** 321,327 **** SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
! static void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
--- 320,326 ----
*
* Free the current cancel connection, if any, and set to NULL.
*/
! void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
*** a/src/bin/scripts/common.h
--- b/src/bin/scripts/common.h
***************
*** 49,52 **** extern bool yesno_prompt(const char *question);
--- 49,56 ----
extern void setup_cancel_handler(void);
+ extern void SetCancelConn(PGconn *conn);
+ extern void ResetCancelConn(void);
+
+
#endif /* COMMON_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 14,19 ****
--- 14,30 ----
#include "common.h"
#include "dumputils.h"
+ #define NO_SLOT (-1)
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelSlot
+ {
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+ } ParallelSlot;
+
+ #define ERRCODE_UNDEFINED_TABLE "42P01"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
***************
*** 25,34 **** static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 36,74 ----
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet,
! int numAsyncCons);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet);
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int numAsyncCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 89,95 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 115,129 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 121,134 **** main(int argc, char *argv[])
--- 164,190 ----
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 197,203 ----
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 179,184 **** main(int argc, char *argv[])
--- 236,248 ----
setup_cancel_handler();
+ /*
+ * When user is giving the table list, and list is smaller then
+ * number of tables
+ */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 260,266 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, concurrentCons);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo, quiet);
- }
}
else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
}
exit(0);
--- 274,309 ----
dbname = get_user_name_or_exit(progname);
}
! if (concurrentCons > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, host, port, username, prompt_password,
! progname, echo, concurrentCons, &tables, quiet);
}
else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo, quiet);
! }
! }
! else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
+ }
}
exit(0);
***************
*** 268,323 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! if (analyze_only)
! {
! appendPQExpBufferStr(&sql, "ANALYZE");
! if (verbose)
! appendPQExpBufferStr(&sql, " VERBOSE");
! }
! else
! {
! appendPQExpBufferStr(&sql, "VACUUM");
! if (PQserverVersion(conn) >= 90000)
! {
! const char *paren = " (";
! const char *comma = ", ";
! const char *sep = paren;
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (and_analyze)
- {
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
- }
- }
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
--- 343,351 ----
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! prepare_command(conn, full, verbose,
! and_analyze, analyze_only, freeze, &sql);
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
***************
*** 374,384 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 402,413 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int numAsyncCons)
{
PGconn *conn;
PGresult *result;
***************
*** 407,412 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 436,450 ----
fflush(stdout);
}
+ if (numAsyncCons > 1)
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, stage,
+ freeze, host, port, username, prompt_password,
+ progname, echo, numAsyncCons, NULL, quiet);
+
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
analyze_in_stages, stage,
freeze, NULL, host, port, username, prompt_password,
***************
*** 417,422 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 455,923 ----
PQclear(result);
}
+ /*
+ * run_parallel_vacuum
+ * This function processes the table list, picking the objects one by
+ * one and getting a free connection slot; once it gets a free slot it
+ * sends the job on that connection.
+ */
+
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+ {
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = concurrentCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ /*
+ * This will give a free connection slot; if no slot is free it will
+ * wait for at least one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connections to return their results */
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+ fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ exit(1);
+ }
+ }
+
+ /*
+ * GetIdleSlot
+ * Process the slot list; if any free slot is available, return its
+ * index. If no slot is free, select() on all the sockets and wait
+ * until at least one slot is free.
+ */
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * No slot is free yet; wait on the sockets and process the results
+ * for whichever connections finish.
+ */
+ do
+ {
+ SetCancelConn(pSlot[0].connection);
+ i = select_loop(maxFd, &slotset);
+
+ ResetCancelConn();
+
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ } while (firstFree < 0);
+
+ return firstFree;
+ }
+
+ /*
+ * GetQueryResult
+ * Process the query result.
+ */
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+ while((result = PQgetResult(conn)) != NULL)
+ lastResult = result;
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ PQclear(lastResult);
+
+ /*
+ * If user has given the vacuum of complete db, then if
+ * any of the object vacuum failed it can be ignored and vacuuming
+ * of other object can be continued, this is the same behavior as
+ * vacuuming of complete db is handled without --jobs option
+ */
+ if (!r)
+ {
+ char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+
+ if(!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ return false;
+ }
+ }
+
+ return true;
+ }
+
+ /*
+ * vacuum_parallel
+ * This function will open multiple concurrent connections as
+ * suggested by the user; it derives the table list via a server call
+ * if a table list is not given, and performs the vacuum calls.
+ */
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet)
+ {
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ bool completeDb = false;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * If a table list is not provided then we need to vacuum the whole
+ * DB; get the list of all tables and prepare the list.
+ */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY c.relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ appendPQExpBuffer(&sql, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* Vacuuming the full database */
+ completeDb = true;
+
+ if (concurrentCons > ntuple)
+ concurrentCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ for (i = 1; i < concurrentCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ if (stage == -1)
+ {
+ int i;
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[i]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, i, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ /* Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one. */
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[stage]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, stage, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, -1, connSlot, completeDb);
+ }
+
+ for (i = 0; i < concurrentCons; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ /*
+ * DisconnectDatabase
+ * cancel any active command and close the given connection.
+ */
+ void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection= NULL;
+ }
+
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 436,441 **** help(const char *progname)
--- 937,943 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Wed, Sep 24, 2014 at 2:48 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 24 August 2014 11:33, Amit Kapila wrote:
Thanks for your comments, I have worked on both the review comment lists sent on 19 August and 24 August. The latest patch is attached to the mail.
Hi Dilip,
I think you have an off-by-one error in the index into the array of file
handles.
vacuumdb runs at full CPU, and if I run:
strace -ttt -T -f ../parallel_vac/bin/vacuumdb -z -a -j 8
I get the select returning immediately with a bad file descriptor error:
1411663937.641177 select(55, [4 5 6 7 8 9 10 54], NULL, NULL, NULL) = -1
EBADF (Bad file descriptor) <0.000012>
1411663937.641232 recvfrom(3, 0x104e3f0, 16384, 0, 0, 0) = -1 EAGAIN
(Resource temporarily unavailable) <0.000012>
1411663937.641279 recvfrom(4, 0x1034bc0, 16384, 0, 0, 0) = -1 EAGAIN
(Resource temporarily unavailable) <0.000011>
1411663937.641326 recvfrom(5, 0x10017c0, 16384, 0, 0, 0) = -1 EAGAIN
(Resource temporarily unavailable) <0.000014>
1411663937.641415 recvfrom(6, 0x10097e0, 16384, 0, 0, 0) = -1 EAGAIN
(Resource temporarily unavailable) <0.000020>
1411663937.641487 recvfrom(7, 0x1012330, 16384, 0, 0, 0) = -1 EAGAIN
(Resource temporarily unavailable) <0.000013>
1411663937.641538 recvfrom(8, 0x101af00, 16384, 0, 0, 0) = -1 EAGAIN
(Resource temporarily unavailable) <0.000011>
1411663937.641584 recvfrom(9, 0x1023af0, 16384, 0, 0, 0) = -1 EAGAIN
(Resource temporarily unavailable) <0.000012>
1411663937.641631 recvfrom(10, 0x1054600, 16384, 0, 0, 0) = -1 EAGAIN
(Resource temporarily unavailable) <0.000012>
Cheers,
Jeff
On Thu, Sep 25, 2014 at 10:00 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Sep 24, 2014 at 2:48 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 24 August 2014 11:33, Amit Kapila wrote:
Thanks for your comments, I have worked on both the review comment lists sent on 19 August and 24 August. The latest patch is attached to the mail.
Hi Dilip,
I think you have an off-by-one error in the index into the array of file
handles.
Actually the problem is that the socket for the master connection was not
getting initialized, see my one line addition here.
connSlot = (ParallelSlot*)pg_malloc(concurrentCons *
sizeof(ParallelSlot));
connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
However, I don't think it is good to just ignore errors from the select
call (like the EBADF) and go into a busy loop instead, so there are more
changes needed than this.
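(Roughly, the loop needs to distinguish an interrupted call from a real failure and hand the failure back to the caller instead of spinning; a sketch only, assuming the patch's select_loop() shape and surrounding includes:)
static int
select_loop(int maxFd, fd_set *workerset)
{
	int		i;
	fd_set	saveSet = *workerset;

	for (;;)
	{
		*workerset = saveSet;
		i = select(maxFd + 1, workerset, NULL, NULL, NULL);
		if (i < 0 && errno == EINTR)
			continue;		/* interrupted by a signal: retry */
		break;				/* success, or a hard error such as EBADF */
	}
	return i;				/* caller must treat i < 0 as fatal, not retry */
}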
Also, cancelling the run (by hitting ctrl-C in the shell that invoked it)
does not seem to work on linux. I get a message that says "Cancel request
sent", but then it continues to finish the job anyway.
Cheers,
Jeff
On Wed, Sep 24, 2014 at 3:18 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 24 August 2014 11:33, Amit Kapila wrote:
7. I think in the new mechanism the cancel handler will not work.
In single-connection vacuum it was always set/reset
in function executeMaintenanceCommand(). You might need
to set/reset it in function run_parallel_vacuum().
Good catch. Now I have called SetCancelConn(pSlot[0].connection) on the first connection. This will enable the cancel handler to cancel the query on the first connection so that the select loop will come out.
I don't think this can handle cancel requests properly, because you are just setting it in GetIdleSlot(); what if the cancel request comes during GetQueryResult(), after sending SQL on all connections? (Probably that's the reason why Jeff is not able to cancel vacuumdb when using the parallel option.)
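(Something along these lines, i.e. keeping the cancel connection pointed at whichever connection is being drained; a simplified sketch only, using the SetCancelConn()/ResetCancelConn() helpers from common.c and omitting the patch's error reporting:)
static bool
GetQueryResult(PGconn *conn)
{
	PGresult   *result;
	bool		ok = true;

	SetCancelConn(conn);	/* Ctrl+C now cancels this connection's query */
	while ((result = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(result) != PGRES_COMMAND_OK)
			ok = false;
		PQclear(result);
	}
	ResetCancelConn();		/* done draining; stop routing cancels here */
	return ok;
}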
1. I could see one shortcoming in the way the patch currently parallelizes the work for --analyze-in-stages. Basically the patch performs the work for each stage for multiple tables in concurrent connections. That seems okay for the cases when the number of parallel connections is less than or equal to the number of tables, but when the user has asked for more connections than there are tables, this strategy will not be able to use the extra connections.
I think --analyze-in-stages should maintain the order.
Yes, you are right. So let's keep the code as it is for this case.
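(In the attached patch, run_parallel_vacuum() runs the per-stage SET commands from its stage_commands[] array on every worker connection before dispatching any tables for that stage, so the stage order is preserved across all connections.)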
2. Similarly for the case of multiple databases, currently it will not be able to use more connections than the number of tables in each database, because the parallelizing strategy is to just use the concurrent connections for tables inside a single database.
I think for multiple databases, to do the parallel execution we would need to maintain multiple connections to multiple databases, and we would need to maintain a table list for all the databases together to run them concurrently. I think this may impact the startup cost, as we need to create multiple connections and disconnect them for preparing the list.
Yeah, probably the startup cost will be a bit higher, but we will anyway incur that cost during the overall operation.
And I think it will increase the complexity also.
I understand that the complexity of the code might increase, and the strategy to parallelize can also be different in case we want to parallelize the all-databases case, so let's leave it as it is unless someone else raises a voice for the same.
Today while again thinking about the strategy used in the patch to parallelize the operation (vacuum database), I think we can improve it for cases when the number of connections is smaller than the number of tables in the database (which I presume will normally be the case). Currently we are sending a command to vacuum one table per connection; how about sending multiple commands (for example, VACUUM t1; VACUUM t2) on one connection? It seems to me there is an extra roundtrip for cases when there are many small tables in the database and few large tables. Do you think we should optimize for any such cases?
Few other points
1.
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
{
..
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
a.
Does the above memory get freed anywhere? If not, isn't it a good idea to do so?
b. For slot 0, you are not setting it with PQsetnonblocking, whereas I think it can be used to run commands like any other connection.
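(That is, slot 0 would get the same setup as the other slots, and the slot array would be released once all connections are finished; a sketch using the patch's own identifiers:)
	connSlot = (ParallelSlot *) pg_malloc(concurrentCons * sizeof(ParallelSlot));

	/* slot 0 reuses the already-open connection but needs the same setup */
	connSlot[0].connection = conn;
	connSlot[0].sock = PQsocket(conn);
	connSlot[0].isFree = true;
	PQsetnonblocking(connSlot[0].connection, 1);

	for (i = 1; i < concurrentCons; i++)
	{
		connSlot[i].connection = connectDatabase(dbname, host, port, username,
												 prompt_password, progname, false);
		PQsetnonblocking(connSlot[i].connection, 1);
		connSlot[i].isFree = true;
		connSlot[i].sock = PQsocket(connSlot[i].connection);
	}

	/* ... dispatch the per-table commands ... */

	for (i = 0; i < concurrentCons; i++)
		PQfinish(connSlot[i].connection);
	pfree(connSlot);		/* (a) free the slot array when done */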
2.
+ /*
+ * If user has given the vacuum of complete db, then if
+ * any of the object vacuum failed it can be ignored and vacuuming
+ * of other object can be continued, this is the same behavior as
+ * vacuuming of complete db is handled without --jobs option
+ */
s/object/object's
3.
+ if(!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
Indentation in both places is wrong. Check other places for similar issues.
4.
+ bool analyze_only, bool freeze, int numAsyncCons,
In the code there is still a reference to AsyncCons; as decided, let's change it to concurrent_connections | conc_cons.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Amit Kapila wrote:
Today while again thinking about the strategy used in the patch to parallelize the operation (vacuum database), I think we can improve it for cases when the number of connections is smaller than the number of tables in the database (which I presume will normally be the case). Currently we are sending a command to vacuum one table per connection; how about sending multiple commands (for example, VACUUM t1; VACUUM t2) on one connection? It seems to me there is an extra roundtrip for cases when there are many small tables in the database and few large tables. Do you think we should optimize for any such cases?
I don't think this is a good idea; at least not in a first cut of this
patch. It's easy to imagine that a table you initially think is small
enough turns out to have grown much larger since last analyze. In that
case, putting one worker to process that one together with some other
table could end up being bad for parallelism, if later it turns out that
some other worker has no table to process. (Table t2 in your example
could grow between the time the command is sent and t1 is vacuumed.)
It's simpler to have workers do one thing at a time only.
I don't think it's a very good idea to call pg_relation_size() on every
table in the database from vacuumdb.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Sep 26, 2014 at 7:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Amit Kapila wrote:
Today while again thinking about the strategy used in the patch to parallelize the operation (vacuum database), I think we can improve it for cases when the number of connections is smaller than the number of tables in the database (which I presume will normally be the case). Currently we are sending a command to vacuum one table per connection; how about sending multiple commands (for example, VACUUM t1; VACUUM t2) on one connection? It seems to me there is an extra roundtrip for cases when there are many small tables in the database and few large tables. Do you think we should optimize for any such cases?
I don't think this is a good idea; at least not in a first cut of this patch. It's easy to imagine that a table you initially think is small enough turns out to have grown much larger since the last analyze.
That could be possible, but currently it vacuums even system tables one by one (where I think the chances of growth would be comparatively less), which was the main reason I thought it might be worth considering whether the current work distribution strategy is good enough.
In that case, putting one worker to process that one together with some other table could end up being bad for parallelism, if later it turns out that some other worker has no table to process. (Table t2 in your example could grow between the time the command is sent and t1 is vacuumed.)
It's simpler to have workers do one thing at a time only.
Yeah, probably that is best, at least for the initial patch.
I don't think it's a very good idea to call pg_relation_size() on every
table in the database from vacuumdb.
You might be right; however, I was a bit skeptical about the current strategy, where the work unit is one object and each object is considered the same irrespective of its size/bloat. OTOH I agree with you that it is good to keep the first version simpler.
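(The posted patch already approximates this by fetching the per-database table list with ORDER BY c.relpages DESC, so the largest tables are dispatched to workers first.)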
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 27/09/14 01:36, Alvaro Herrera wrote:
Amit Kapila wrote:
Today while again thinking about the strategy used in the patch to parallelize the operation (vacuum database), I think we can improve it for cases when the number of connections is smaller than the number of tables in the database (which I presume will normally be the case). Currently we are sending a command to vacuum one table per connection; how about sending multiple commands (for example, VACUUM t1; VACUUM t2) on one connection? It seems to me there is an extra roundtrip for cases when there are many small tables in the database and few large tables. Do you think we should optimize for any such cases?
I don't think this is a good idea; at least not in a first cut of this patch. It's easy to imagine that a table you initially think is small enough turns out to have grown much larger since the last analyze. In that case, putting one worker to process that one together with some other table could end up being bad for parallelism, if later it turns out that some other worker has no table to process. (Table t2 in your example could grow between the time the command is sent and t1 is vacuumed.)
It's simpler to have workers do one thing at a time only.
I don't think it's a very good idea to call pg_relation_size() on every
table in the database from vacuumdb.
Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? They could each start at a different position in the file.
Cheers,
Gavin
Gavin Flower wrote:
Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? They could each start at a different position in the file.
Feasible: no. Useful: maybe, we don't really know. (You could just as
well have a worker at double the speed, i.e. double vacuum_cost_limit).
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9/26/14, 2:38 PM, Gavin Flower wrote:
Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? They could each start at a different position in the file.
Not really feasible without a major overhaul. It might be mildly useful
in one rare case. Occasionally I'll find very hot single tables that
vacuum is constantly processing, despite mostly living in RAM because
the server has a lot of memory. You can set vacuum_cost_page_hit=0 in
order to get vacuum to chug through such a table as fast as possible.
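(That is, something like SET vacuum_cost_page_hit = 0 in the session running the vacuum, so buffer hits are no longer charged against the cost limit.)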
However, the speed at which that happens will often then be limited by
how fast a single core can read from memory, for things in
shared_buffers. That is limited by the speed of memory transfers from a
single NUMA memory bank. Which bank you get will vary depending on the
core that owns that part of shared_buffers' memory, but it's only one at
a time.
On large servers, that can be only a small fraction of the total memory
bandwidth the server is able to reach. I've attached a graph showing
how this works on a system with many NUMA banks of RAM, and this is only
a medium sized system. This server can hit 40GB/s of memory transfers
in total; no one process will ever see more than 8GB/s.
If we had more vacuum processes running against the same table, there
would then be more situations where they were doing work against
different NUMA memory banks at the same time, therefore making faster
progress through the hits in shared_buffers possible. In the real world,
this situation is rare enough compared to disk-bound vacuum work that I
doubt it's worth getting excited over. Systems with lots of RAM where
performance is regularly dominated by one big ugly table are common
though, so I wouldn't just rule the idea out as not useful either.
--
Greg Smith greg.smith@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Attachments:
stream.pngimage/png; name=stream.pngDownload
[stream.png: binary image data (memory-bandwidth graph) omitted]
On Fri, Sep 26, 2014 at 11:47 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Gavin Flower wrote:
Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? They could each start at a different position in the file.
Feasible: no. Useful: maybe, we don't really know. (You could just as well have a worker at double the speed, i.e. double vacuum_cost_limit).
Vacuum_cost_delay is already 0 by default. So unless you changed that,
vacuum_cost_limit does not take effect under vacuumdb.
It is pretty easy for vacuum to be CPU limited, and even easier for analyze to be CPU limited (it does a lot of sorting). I think analyzing is the main use case for this patch, to shorten the pg_upgrade window. At least, that is how I anticipate using it.
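(For example, a hypothetical invocation of the patched tool such as "vacuumdb --all --analyze-only -j 8" right after the upgrade would rebuild statistics over eight connections at once.)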
Cheers,
Jeff
On 27/09/14 11:36, Gregory Smith wrote:
On 9/26/14, 2:38 PM, Gavin Flower wrote:
Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? They could each start at a different position in the file.
Not really feasible without a major overhaul. It might be mildly
useful in one rare case. Occasionally I'll find very hot single
tables that vacuum is constantly processing, despite mostly living in
RAM because the server has a lot of memory. You can set
vacuum_cost_page_hit=0 in order to get vacuum to chug through such a table as fast as possible.
However, the speed at which that happens will often then be limited by
how fast a single core can read from memory, for things in
shared_buffers. That is limited by the speed of memory transfers from
a single NUMA memory bank. Which bank you get will vary depending on
the core that owns that part of shared_buffers' memory, but it's only
one at a time.
On large servers, that can be only a small fraction of the total
memory bandwidth the server is able to reach. I've attached a graph
showing how this works on a system with many NUMA banks of RAM, and
this is only a medium sized system. This server can hit 40GB/s of
memory transfers in total; no one process will ever see more than 8GB/s.
If we had more
would then be more situations where they were doing work against
different NUMA memory banks at the same time, therefore making faster
progress through the hits in shared_buffers possible. In the real
world, this situation is rare enough compared to disk-bound vacuum
work that I doubt it's worth getting excited over. Systems with lots
of RAM where performance is regularly dominated by one big ugly table
are common though, so I wouldn't just rule the idea out as not useful
either.
Thanks for the very detailed reply of yours, and the comments from others.
Cheers,
Gavin
On 26 September 2014 01:24, Jeff Janes wrote:
I think you have an off-by-one error in the index into the array of file handles.
Actually the problem is that the socket for the master connection was not getting initialized, see my one line addition here.
connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
Thanks for the review, I have fixed this.
However, I don't think it is good to just ignore errors from the select call (like the EBADF) and go into a busy loop instead, so there are more changes needed than this.
Actually I have implemented this select_loop function the same way other client applications handle it, i.e. pg_dump in parallel.c; however, parallel.c also handles the case where the process is aborting (if Ctrl+C is received). We need to handle the same, so I have fixed this in the attached patch.
Also, cancelling the run (by hitting ctrl-C in the shell that invoked it) does not seem to work on linux. I get a message that says "Cancel request sent", but then it continues to finish the job anyway.
Apart from the above mentioned reason, GetQueryResult was also not setting SetCancelConn, as Amit pointed out; now this is also fixed.
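(Concretely, in the attached patch select_loop() now returns -1 once in_abort() reports that a cancel request was sent, GetIdleSlot() maps that to NO_SLOT, and GetQueryResult() brackets PQgetResult() with SetCancelConn()/ResetCancelConn().)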
Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_v15.patchapplication/octet-stream; name=vacuumdb_parallel_v15.patchDownload
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 204,209 **** PostgreSQL documentation
--- 204,231 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></></term>
+ <listitem>
+ <para>
+ Number of concurrent connections to use when performing the operation.
+ This option enables the vacuum operation to run on asynchronous
+ connections; one table is operated on per connection at a time, so
+ at most as many tables are vacuumed in parallel as there are jobs.
+ If the number of jobs given is more than the number of tables, the
+ number of jobs will be set to the number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</> will open <replaceable class="parameter">
+ njobs</replaceable> connections to the database, so make sure your
+ <xref linkend="guc-max-connections"> setting is high enough to
+ accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
*** a/src/bin/scripts/common.c
--- b/src/bin/scripts/common.c
***************
*** 19,28 ****
#include "common.h"
! static void SetCancelConn(PGconn *conn);
! static void ResetCancelConn(void);
static PGcancel *volatile cancelConn = NULL;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
--- 19,28 ----
#include "common.h"
!
static PGcancel *volatile cancelConn = NULL;
+ static bool inAbort = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
***************
*** 291,297 **** yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
! static void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
--- 291,297 ----
*
* Set cancelConn to point to the current database connection.
*/
! void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
***************
*** 321,327 **** SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
! static void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
--- 321,327 ----
*
* Free the current cancel connection, if any, and set to NULL.
*/
! void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
***************
*** 358,363 **** handle_sigint(SIGNAL_ARGS)
--- 358,364 ----
/* Send QueryCancel if we are processing a database query */
if (cancelConn != NULL)
{
+ inAbort = true;
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
***************
*** 391,396 **** consoleHandler(DWORD dwCtrlType)
--- 392,399 ----
EnterCriticalSection(&cancelConnLock);
if (cancelConn != NULL)
{
+ inAbort = true;
+
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
***************
*** 414,416 **** setup_cancel_handler(void)
--- 417,424 ----
}
#endif /* WIN32 */
+
+ bool in_abort()
+ {
+ return inAbort;
+ }
*** a/src/bin/scripts/common.h
--- b/src/bin/scripts/common.h
***************
*** 49,52 **** extern bool yesno_prompt(const char *question);
--- 49,57 ----
extern void setup_cancel_handler(void);
+ extern void SetCancelConn(PGconn *conn);
+ extern void ResetCancelConn(void);
+ extern bool in_abort(void);
+
+
#endif /* COMMON_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 14,19 ****
--- 14,30 ----
#include "common.h"
#include "dumputils.h"
+ #define NO_SLOT (-1)
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelSlot
+ {
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+ } ParallelSlot;
+
+ #define ERRCODE_UNDEFINED_TABLE "42P01"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
***************
*** 25,34 **** static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 36,74 ----
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet,
! int concurrentCons);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet);
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 89,95 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 115,129 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 121,134 **** main(int argc, char *argv[])
--- 164,190 ----
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 197,203 ----
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 179,184 **** main(int argc, char *argv[])
--- 236,248 ----
setup_cancel_handler();
+ /*
+ * When the user gives a table list and asks for more jobs than
+ * there are tables, cap the number of jobs at the table count
+ */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 260,266 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, concurrentCons);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo, quiet);
- }
}
else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
}
exit(0);
--- 274,309 ----
dbname = get_user_name_or_exit(progname);
}
! if (concurrentCons > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, host, port, username, prompt_password,
! progname, echo, concurrentCons, &tables, quiet);
}
else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo, quiet);
! }
! }
! else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
+ }
}
exit(0);
***************
*** 268,323 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! if (analyze_only)
! {
! appendPQExpBufferStr(&sql, "ANALYZE");
! if (verbose)
! appendPQExpBufferStr(&sql, " VERBOSE");
! }
! else
! {
! appendPQExpBufferStr(&sql, "VACUUM");
! if (PQserverVersion(conn) >= 90000)
! {
! const char *paren = " (";
! const char *comma = ", ";
! const char *sep = paren;
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (and_analyze)
- {
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
- }
- }
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
--- 343,351 ----
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! prepare_command(conn, full, verbose,
! and_analyze, analyze_only, freeze, &sql);
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
***************
*** 374,384 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 402,413 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int concurrentCons)
{
PGconn *conn;
PGresult *result;
***************
*** 407,412 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 436,450 ----
fflush(stdout);
}
+ if (concurrentCons > 1)
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, stage,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, NULL, quiet);
+
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
analyze_in_stages, stage,
freeze, NULL, host, port, username, prompt_password,
***************
*** 417,422 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 455,956 ----
PQclear(result);
}
+ /*
+ * run_parallel_vacuum
+ * This function processes the table list, picking the objects one by
+ * one and getting a free connection slot; once it gets a free slot it
+ * sends the job on that connection.
+ */
+
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+ {
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = concurrentCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ /*
+ * This will give a free connection slot; if no slot is free it will
+ * wait for at least one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connections to return their results */
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+ fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ pfree(connSlot);
+
+ exit(1);
+ }
+ }
+
+ /*
+ * GetIdleSlot
+ * Process the slot list; if any free slot is available, return its
+ * index. If no slot is free, select() on all the sockets and wait
+ * until at least one slot is free.
+ */
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * No slot is free yet; wait on the sockets and process the results
+ * for whichever connections finish.
+ */
+
+ do
+ {
+ SetCancelConn(pSlot[0].connection);
+
+ i = select_loop(maxFd, &slotset);
+
+ ResetCancelConn();
+
+ if (i < 0)
+ {
+ /* This can only happen if the user has sent a cancel request using
+ * Ctrl+C. Cancel is handled by the 0th slot, so fetch the error result
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
+ return NO_SLOT;
+ }
+
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ } while (firstFree < 0);
+
+ return firstFree;
+ }
+
+ /*
+ * GetQueryResult
+ * Process the query result.
+ */
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+
+ SetCancelConn(conn);
+ while((result = PQgetResult(conn)) != NULL)
+ {
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ ResetCancelConn();
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ /*
+ * If user has given the vacuum of complete db, then if
+ * any of the object's vacuum failed it can be ignored and vacuuming
+ * of other objects can be continued, this is the same behavior as
+ * vacuuming of complete db is handled without --jobs option
+ */
+ if (!r)
+ {
+ char *sqlState = PQresultErrorField(lastResult, PG_DIAG_SQLSTATE);
+
+ if (!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ PQclear(lastResult);
+ return false;
+ }
+ }
+
+ PQclear(lastResult);
+ return true;
+ }
+
+ /*
+ * vacuum_parallel
+ * This function will open multiple concurrent connections as
+ * suggested by the user; it derives the table list via a server call
+ * if a table list is not given, and performs the vacuum calls.
+ */
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet)
+ {
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ bool completeDb = false;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * If a table list is not provided then we need to vacuum the whole
+ * DB; get the list of all tables and prepare the list.
+ */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY c.relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ appendPQExpBuffer(&sql, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* Vacuuming the full database */
+ completeDb = true;
+
+ if (concurrentCons > ntuple)
+ concurrentCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
+
+ PQsetnonblocking(connSlot[0].connection, 1);
+
+ for (i = 1; i < concurrentCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ if (stage == -1)
+ {
+ int i;
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[i]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, i, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ /* Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one. */
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[stage]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, stage, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, -1, connSlot, completeDb);
+ }
+
+ for (i = 0; i < concurrentCons; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+
+ pfree(connSlot);
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ if (in_abort())
+ {
+ return -1;
+ }
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ /*
+ * DisconnectDatabase
+ * disconnect all the connections.
+ */
+ void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection = NULL;
+ }
+
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 436,441 **** help(const char *progname)
--- 970,976 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On 26 September 2014 12:24, Amit Kapila wrote:
I don't think this can handle cancel requests properly, because
you are just setting it in GetIdleSlot(); what if the cancel
request came during GetQueryResult(), after sending sql for
all connections (probably that's the reason why Jeff is not able
to cancel the vacuumdb when using the parallel option).
You are right, I have fixed it in the latest patch; please check the latest patch at /messages/by-id/4205E661176A124FAF891E0A6BA9135266363710@szxeml509-mbs.china.huawei.com
dilip@linux-ltr9:/home/dilip/9.4/install/bin> ./vacuumdb -z -a -j 8 -p 9005
vacuumdb: vacuuming database "db1"
vacuumdb: vacuuming database "postgres"
Cancel request sent
vacuumdb: vacuuming of database "postgres" failed: ERROR: canceling statement due to user request
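To make the fix concrete, the relevant mechanism (condensed from the patch itself, not new code) is that GetQueryResult() wraps the result-draining loop in SetCancelConn()/ResetCancelConn(), so a Ctrl+C arriving while results are being collected still has a live connection to target:

SetCancelConn(conn); /* route a Ctrl+C to this connection */
while ((result = PQgetResult(conn)) != NULL)
{
PQclear(lastResult);
lastResult = result; /* keep only the last result for the status check */
}
ResetCancelConn();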
Few other points:
1.
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
{
..
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
a. Does the above memory get freed anywhere? If not, isn't it a
good idea to do so?
Fixed.
b. For slot 0, you are not setting it with PQsetnonblocking,
whereas I think it can be used to run commands like any other
connection.
Yes, this was missing in the code; I have fixed it.
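For reference, the corrected setup (condensed from the patch; nothing here is new API) puts every slot, including slot 0 which reuses the initial connection, into nonblocking mode before any query is dispatched:

connSlot[0].connection = conn; /* slot 0 reuses the initial connection */
connSlot[0].sock = PQsocket(conn);
PQsetnonblocking(connSlot[0].connection, 1);

for (i = 1; i < concurrentCons; i++)
{
connSlot[i].connection = connectDatabase(dbname, host, port, username,
prompt_password, progname, false);
/* nonblocking mode lets the master multiplex all slots via select() */
PQsetnonblocking(connSlot[i].connection, 1);
connSlot[i].isFree = true;
connSlot[i].sock = PQsocket(connSlot[i].connection);
}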
2.
+ /*
+ * If user has given the vacuum of complete db, then if
+ * any of the object vacuum failed it can be ignored and vacuuming
+ * of other object can be continued, this is the same behavior as
+ * vacuuming of complete db is handled without --jobs option
+ */
s/object/object's
Fixed.
3.
+ if (!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
Indentation in both places is wrong. Check other places for
similar issues.
Fixed.
4.
+ bool analyze_only, bool freeze, int numAsyncCons,
In the code there is still a reference to AsyncCons; as decided, let's
change it to concurrent_connections | conc_cons.
Fixed.
Regards,
Dilip
On 27 September 2014 03:55, Jeff Janes <jeff.janes@gmail.com> wrote:
On Fri, Sep 26, 2014 at 11:47 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Gavin Flower wrote:
Curious: would it be both feasible and useful to have multiple
workers process a 'large' table, without complicating things too
much? They could each start at a different position in the file.
Feasible: no. Useful: maybe, we don't really know. (You could just as
well have a worker at double the speed, i.e. double vacuum_cost_limit.)
vacuum_cost_delay is already 0 by default. So unless you changed that,
vacuum_cost_limit does not take effect under vacuumdb. It is pretty easy
for vacuum to be CPU limited, and even easier for analyze to be CPU
limited (it does a lot of sorting). I think analyzing is the main use
case for this patch, to shorten the pg_upgrade window. At least, that
is how I anticipate using it.
I've been trying to review this thread with the thought "what does
this give me?". I am keen to encourage contributions and also keen to
extend our feature set, but I do not wish to complicate our code base.
Dilip's developments do seem to be good quality; what I question is
whether we want this feature.
This patch seems to allow me to run multiple VACUUMs at once. But I
can already do this, with autovacuum.
Is there anything this patch can do that cannot be already done with autovacuum?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
I've been trying to review this thread with the thought "what does
this give me?". I am keen to encourage contributions and also keen to
extend our feature set, but I do not wish to complicate our code base.
Dilip's developments do seem to be good quality; what I question is
whether we want this feature.
This patch seems to allow me to run multiple VACUUMs at once. But I
can already do this, with autovacuum.
Is there anything this patch can do that cannot be already done with
autovacuum?
The difference lies in the fact that vacuumdb (or VACUUM) gives
the user the option to control vacuum activity for cases where
autovacuum doesn't suffice; one example is performing vacuum via
vacuumdb after pg_upgrade or some other maintenance activity (as
mentioned by Jeff upthread). So I think in all such cases having a
parallel option can give a performance benefit, which Dilip has
already shown upthread by running some tests (with and without the
patch).
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 16 October 2014 06:05, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
I've been trying to review this thread with the thought "what does
this give me?". I am keen to encourage contributions and also keen to
extend our feature set, but I do not wish to complicate our code base.
Dilip's developments do seem to be good quality; what I question is
whether we want this feature.
This patch seems to allow me to run multiple VACUUMs at once. But I
can already do this, with autovacuum.
Is there anything this patch can do that cannot be already done with
autovacuum?
The difference lies in the fact that vacuumdb (or VACUUM) gives
the user the option to control vacuum activity for cases where
autovacuum doesn't suffice; one example is performing vacuum via
vacuumdb after pg_upgrade or some other maintenance activity (as
mentioned by Jeff upthread). So I think in all such cases having a
parallel option can give a performance benefit, which Dilip has
already shown upthread by running some tests (with and without the
patch).
Why do we need 2 ways to do the same thing?
Why not ask autovacuum to do this for you?
Just send a message to autovacuum to request an immediate action. Let
it manage the children and the tasks.
SELECT pg_autovacuum_immediate(nworkers = N, list_of_tables);
Request would allocate an additional N workers and immediately begin
vacuuming the stated tables.
vacuumdb can still issue the request, but the guts of this are done by
the server, not a heavily modified client.
If we are going to heavily modify a client then it needs to be able to
run more than just one thing. Parallel psql would be nice. pg_batch?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 16, 2014 at 1:56 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 16 October 2014 06:05, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs <simon@2ndquadrant.com>
wrote:
This patch seems to allow me to run multiple VACUUMs at once. But I
can already do this, with autovacuum.
Is there anything this patch can do that cannot be already done with
autovacuum?
The difference lies in the fact that vacuumdb (or VACUUM) gives
the user the option to control vacuum activity for cases where
autovacuum doesn't suffice; one example is performing vacuum via
vacuumdb after pg_upgrade or some other maintenance activity (as
mentioned by Jeff upthread). So I think in all such cases having a
parallel option can give a performance benefit, which Dilip has
already shown upthread by running some tests (with and without the
patch).
Why do we need 2 ways to do the same thing?
Don't multiple ways to do the same thing already exist, like
vacuumdb | VACUUM and autovacuum?
Why not ask autovacuum to do this for you?
Just send a message to autovacuum to request an immediate action. Let
it manage the children and the tasks.
SELECT pg_autovacuum_immediate(nworkers = N, list_of_tables);
Request would allocate an additional N workers and immediately begin
vacuuming the stated tables.
I think doing anything on the server side can have higher complexity, like:
a. Does this function return immediately after sending the request to
autovacuum? If yes, then the behaviour of this new functionality
will be different from vacuumdb, which the user might not
expect.
b. We need to have some way to handle errors that can occur in
autovacuum (maybe we need to find a way to pass them back to the user);
vacuumdb or VACUUM can report errors to the user.
c. How does the nworkers input relate to autovacuum_max_workers,
which is needed at startup for shared memory initialization and in the
calculation of MaxBackends?
d. How to handle database-wide vacuum, which is possible via vacuumdb?
e. What is the best UI (a command like above, or via config parameters)?
I think we can find a way for the above, and maybe other similar things
that need to be taken care of, but still I think it is better that we have
this feature via vacuumdb rather than adding complexity in server code.
Also, the current algorithm used in the patch was discussed and agreed
upon in this thread, and if we now want to go via some other method
(autovacuum), it might take much more time to build consensus on all
the points.
vacuumdb can still issue the request, but the guts of this are done by
the server, not a heavily modified client.
If we are going to heavily modify a client then it needs to be able to
run more than just one thing. Parallel psql would be nice. pg_batch?
Could you be more specific on this point? I am not able to see how
the vacuumdb utility has anything to do with parallel psql.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 16 October 2014 15:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
Just send a message to autovacuum to request an immediate action. Let
it manage the children and the tasks.
SELECT pg_autovacuum_immediate(nworkers = N, list_of_tables);
Request would allocate an additional N workers and immediately begin
vacuuming the stated tables.
I think doing anything on the server side can have higher complexity, like:
a. Does this function return immediately after sending the request to
autovacuum? If yes, then the behaviour of this new functionality
will be different from vacuumdb, which the user might not
expect.
b. We need to have some way to handle errors that can occur in
autovacuum (maybe we need to find a way to pass them back to the user);
vacuumdb or VACUUM can report errors to the user.
c. How does the nworkers input relate to autovacuum_max_workers,
which is needed at startup for shared memory initialization and in the
calculation of MaxBackends?
d. How to handle database-wide vacuum, which is possible via vacuumdb?
e. What is the best UI (a command like above, or via config parameters)?
c) seems like the only issue that needs any thought. I don't think it's
going to be that hard.
I don't see any problems with the other points. You can make a
function wait, if you wish.
I think we can find a way for the above, and maybe other similar things
that need to be taken care of, but still I think it is better that we have
this feature via vacuumdb rather than adding complexity in server code.
Also, the current algorithm used in the patch was discussed and agreed
upon in this thread, and if we now want to go via some other method
(autovacuum), it might take much more time to build consensus on all
the points.
Well, I read Alvaro's point from earlier in the thread and agreed with
it. All we really need is an instruction to autovacuum to say "be
aggressive".
Just because somebody added something to the TODO list doesn't make it
a good idea. I apologise to Dilip for saying this; it is not anything
against him, just the idea.
Perhaps we just accept the patch and change AV in the future.
vacuumdb can still issue the request, but the guts of this are done by
the server, not a heavily modified client.
If we are going to heavily modify a client then it needs to be able to
run more than just one thing. Parallel psql would be nice. pg_batch?
Could you be more specific on this point? I am not able to see how
the vacuumdb utility has anything to do with parallel psql.
That's my point. All this code in vacuumdb just for this one isolated
use case? Twice the maintenance burden.
A more generic utility to run commands in parallel would be useful.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 17, 2014 at 1:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 16 October 2014 15:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think doing anything on the server side can have higher complexity, like:
a. Does this function return immediately after sending the request to
autovacuum? If yes, then the behaviour of this new functionality
will be different from vacuumdb, which the user might not
expect.
b. We need to have some way to handle errors that can occur in
autovacuum (maybe we need to find a way to pass them back to the user);
vacuumdb or VACUUM can report errors to the user.
c. How does the nworkers input relate to autovacuum_max_workers,
which is needed at startup for shared memory initialization and in the
calculation of MaxBackends?
d. How to handle database-wide vacuum, which is possible via vacuumdb?
e. What is the best UI (a command like above, or via config parameters)?
c) seems like the only issue that needs any thought. I don't think it's
going to be that hard.
I don't see any problems with the other points. You can make a
function wait, if you wish.
It is quite possible, but still I think that to accomplish such a function
we need to have some mechanism by which it can inform autovacuum,
and then some changes in autovacuum to receive/read that information
and reply back. I don't think any such mechanism exists.
I think we can find a way for the above, and maybe other similar things
that need to be taken care of, but still I think it is better that we have
this feature via vacuumdb rather than adding complexity in server code.
Also, the current algorithm used in the patch was discussed and agreed
upon in this thread, and if we now want to go via some other method
(autovacuum), it might take much more time to build consensus on all
the points.
Well, I read Alvaro's point from earlier in the thread and agreed with
it. All we really need is an instruction to autovacuum to say "be
aggressive".
Just because somebody added something to the TODO list doesn't make it
a good idea. I apologise to Dilip for saying this; it is not anything
against him, just the idea.
Perhaps we just accept the patch and change AV in the future.
So shall we move ahead with the review of this patch and make a note
of changing AV in the future (maybe a TODO)?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 17 October 2014 12:52, Amit Kapila <amit.kapila16@gmail.com> wrote:
It is quite possible, but still I think that to accomplish such a function
we need to have some mechanism by which it can inform autovacuum,
and then some changes in autovacuum to receive/read that information
and reply back. I don't think any such mechanism exists.
That's exactly how the CHECKPOINT command works, in conjunction with
the checkpointer process.
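For reference, a from-memory sketch of that path (simplified; the exact flags and error text may differ between versions): the CHECKPOINT utility statement does no work itself, it only records a request, signals the checkpointer process, and waits:

/* Sketch of the backend's handling of CHECKPOINT (simplified) */
case T_CheckPointStmt:
if (!superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser to do CHECKPOINT")));
/* set the request, signal the checkpointer process, wait for it */
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT | CHECKPOINT_FORCE);
break;

An autovacuum equivalent would need a similar request channel plus a way to hand per-table errors back to the requesting backend, which is where the extra design work comes in.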
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Amit Kapila wrote:
On Fri, Oct 17, 2014 at 1:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 16 October 2014 15:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
c) seems like the only issue that needs any thought. I don't think it's
going to be that hard.
I don't see any problems with the other points. You can make a
function wait, if you wish.
It is quite possible, but still I think that to accomplish such a function
we need to have some mechanism by which it can inform autovacuum,
and then some changes in autovacuum to receive/read that information
and reply back. I don't think any such mechanism exists.
You're right, it doesn't. I think we have plenty more infrastructure
for that than we had when autovacuum was initially developed. It
shouldn't be that hard.
Of course, this is a task that requires much more thinking, design, and
discussion than just adding multi-process capability to vacuumdb ...
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 17 October 2014 14:05, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Of course, this is a task that requires much more thinking, design, and
discussion than just adding multi-process capability to vacuumdb ...
Yes, please proceed with this patch as originally envisaged. No more
comments from me.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 7, 2014 at 11:10 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 26 September 2014 12:24, Amit Kapila wrote:
I don't think this can handle cancel requests properly, because
you are just setting it in GetIdleSlot(); what if the cancel
request came during GetQueryResult(), after sending sql for
all connections (probably that's the reason why Jeff is not able
to cancel the vacuumdb when using the parallel option).
You are right, I have fixed it in the latest patch; please check the latest
patch at
4205E661176A124FAF891E0A6BA9135266363710@szxeml509-mbs.china.huawei.com
***************
*** 358,363 **** handle_sigint(SIGNAL_ARGS)
--- 358,364 ----
/* Send QueryCancel if we are processing a database query */
if (cancelConn != NULL)
{
+ inAbort = true;
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
Do we need to set the inAbort flag only in case PQcancel is successful?
Basically, if PQcancel fails for any reason, I think the behaviour
can be undefined, as the executing thread can assume that the cancel is
done.
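One way the handler could be tightened along those lines (a hedged sketch of this suggestion, not code from the patch) is to set the flag only when the cancel request was actually delivered:

/* Sketch: mark abort only on a successful cancel, so a failed
* PQcancel() does not make select_loop() think the query was cancelled. */
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
{
inAbort = true;
fprintf(stderr, _("Cancel request sent\n"));
}
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}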
*** 391,396 **** consoleHandler(DWORD dwCtrlType)
--- 392,399 ----
EnterCriticalSection(&cancelConnLock);
if (cancelConn != NULL)
{
+ inAbort = true;
+
You have set this flag in the Windows handler; however, the same
is never used in the case of Windows. Are you expecting any use of this
flag for Windows?
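For what it's worth, the Windows branch of select_loop() wakes every quarter second precisely so it can check for termination, but as written it never performs that check. One hedged sketch of how the flag could be used there (an illustration only, not part of the patch):

/* Sketch: WIN32 loop of select_loop(), now consulting the abort flag
* that consoleHandler() sets. */
for (;;)
{
struct timeval tv = {0, 250000}; /* wake up every 250 ms */

*workerset = saveSet;
i = select(maxFd + 1, workerset, NULL, NULL, &tv);

if (in_abort()) /* console handler requested cancellation */
return -1;

if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
continue;
if (i)
break;
}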
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Oct 25, 2014 at 5:52 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
***************
*** 358,363 **** handle_sigint(SIGNAL_ARGS)
--- 358,364 ----
/* Send QueryCancel if we are processing a database query */
if (cancelConn != NULL)
{
+ inAbort = true;
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
Do we need to set the inAbort flag only in case PQcancel is successful?
Basically, if PQcancel fails for any reason, I think the behaviour
can be undefined, as the executing thread can assume that the cancel is
done.
*** 391,396 **** consoleHandler(DWORD dwCtrlType)
--- 392,399 ----
EnterCriticalSection(&cancelConnLock);
if (cancelConn != NULL)
{
+ inAbort = true;
+
You have set this flag in the Windows handler; however, the same
is never used in the case of Windows. Are you expecting any use of this
flag for Windows?
Going further with verification of this patch, I found below issue:
Run the testcase.sql file at below link:
/messages/by-id/4205E661176A124FAF891E0A6BA9135266347F25@szxeml509-mbs.china.huawei.com
./vacuumdb --analyze-in-stages -j 8 -d postgres
Generating minimal optimizer statistics (1 target)
Segmentation fault
Server Log:
ERROR: syntax error at or near "minimal" at character 12
STATEMENT: ANALYZE ng minimal optimizer statistics (1 target)
LOG: could not receive data from client: Connection reset by peer
I have fixed the below issues and attached an updated patch with this mail:
1.
make check for the docs gives the errors below:
{ \
echo "<!ENTITY version \"9.5devel\">"; \
echo "<!ENTITY majorversion \"9.5\">"; \
} > version.sgml
'/usr/bin/perl' ./mk_feature_tables.pl YES
../../../src/backend/catalog/sql_feature_packages.txt
../../../src/backend/catalog/sql_features.txt > features-supported.sgml
'/usr/bin/perl' ./mk_feature_tables.pl NO
../../../src/backend/catalog/sql_feature_packages.txt
../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml
'/usr/bin/perl' ./generate-errcodes-table.pl
../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -s
postgres.sgml
onsgmls:ref/vacuumdb.sgml:224:15:E: end tag for "LISTITEM" omitted, but
OMITTAG NO was specified
onsgmls:ref/vacuumdb.sgml:209:8: start tag was here
onsgmls:ref/vacuumdb.sgml:224:15:E: end tag for "VARLISTENTRY" omitted, but
OMITTAG NO was specified
onsgmls:ref/vacuumdb.sgml:206:5: start tag was here
onsgmls:ref/vacuumdb.sgml:224:15:E: end tag for "VARIABLELIST" omitted, but
OMITTAG NO was specified
onsgmls:ref/vacuumdb.sgml:79:4: start tag was here
onsgmls:ref/vacuumdb.sgml:225:18:E: end tag for element "LISTITEM" which is
not open
onsgmls:ref/vacuumdb.sgml:226:21:E: end tag for element "VARLISTENTRY"
which is not open
onsgmls:ref/vacuumdb.sgml:228:18:E: document type does not allow element
"VARLISTENTRY" here; assuming
missing "VARIABLELIST" start-tag
onsgmls:ref/vacuumdb.sgml:260:9:E: end tag for element "PARA" which is not
open
make: *** [check] Error 1
Fixed.
2.
Below multi-line comment is wrong:
/* Otherwise, we got a stage from vacuum_all_databases(), so run
* only that one. */
Fixed.
3.
fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
progname, dbname, PQerrorMessage(conn));
indentation of fprintf is not proper.
Fixed.
4.
/* This can only happen if user has sent the cacle request using
* Ctrl+C, Cancle is handled by 0th slot, so fetch the error result
*/
spelling of cancel is wrong and multi-line comment is not proper.
Fixed
5.
/* This can only happen if user has sent the cacle request using
* Ctrl+C, Cancle is handled by 0th slot, so fetch the error result
*/
GetQueryResult(pSlot[0].connection, dbname, progname,
completedb);
indentation of completedb parameter is wrong.
Fixed.
6.
/*
* vacuum_parallel
* This function will open the multiple concurrent connections as
* suggested by used, it will derive the table list using server call
* if table list is not given by user and perform the vacuum call
*/
s/used/user
Fixed.
In general, I think you can go through all the comments
and code once to see if similar issues exist at other places as well.
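As a reference point for the comment-style items above, the project convention (as applied in the fixes in the attached patch) is that multi-line comments keep the opening and closing markers on their own lines:

/* Wrong: prose begins on the opening line of a
* multi-line comment. */

/*
* Right: the markers stand alone and each continuation
* line begins with an aligned asterisk.
*/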
I have done some performance tests with the patch; the data is
as below:
Performance Data
------------------------------
IBM POWER-7 16 cores, 64 hardware threads
RAM = 64GB
max_connections = 128
checkpoint_segments=256
checkpoint_timeout =15min
shared_buffers = 1GB
Before each test, run the testcase.sql file at below link:
/messages/by-id/4205E661176A124FAF891E0A6BA9135266347F25@szxeml509-mbs.china.huawei.com
Un-patched -
time ./vacuumdb -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -d postgres
real 0m2.454s
user 0m0.002s
sys 0m0.006s
Patched -
time ./vacuumdb -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 4 -d
postgres
real 0m1.691s
user 0m0.001s
sys 0m0.004s
time ./vacuumdb -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d
postgres
real 0m1.496s
user 0m0.002s
sys 0m0.004s
The above data indicates that the patch improves performance when used
with a higher number of concurrent connections. I have done similar tests
for the whole database as well, and the results are quite similar to the above.
I think you can also run the performance test for the --analyze-in-stages
option after fixing the issue reported above in this mail.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
vacuumdb_parallel_v16.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 3ecd999..e4a971f 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -204,6 +204,27 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+ Number of concurrent connections to use to perform the operation.
+ This option enables the vacuum operation to run on asynchronous
+ connections; at any one time, one table is operated on per connection,
+ so as many tables are vacuumed in parallel as there are jobs. If the
+ number of jobs given is more than the number of tables, the number of
+ jobs is set to the number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</application> will open
+ <replaceable class="parameter"> njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections">
+ setting is high enough to accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 311fed5..bc5336a 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -19,10 +19,10 @@
#include "common.h"
-static void SetCancelConn(PGconn *conn);
-static void ResetCancelConn(void);
+
static PGcancel *volatile cancelConn = NULL;
+static bool inAbort = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
@@ -291,7 +291,7 @@ yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
-static void
+void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
@@ -321,7 +321,7 @@ SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
-static void
+void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
@@ -358,6 +358,7 @@ handle_sigint(SIGNAL_ARGS)
/* Send QueryCancel if we are processing a database query */
if (cancelConn != NULL)
{
+ inAbort = true;
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
@@ -391,6 +392,8 @@ consoleHandler(DWORD dwCtrlType)
EnterCriticalSection(&cancelConnLock);
if (cancelConn != NULL)
{
+ inAbort = true;
+
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
@@ -414,3 +417,8 @@ setup_cancel_handler(void)
}
#endif /* WIN32 */
+
+bool in_abort()
+{
+ return inAbort;
+}
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index 691f6c6..3bafde3 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -49,4 +49,9 @@ extern bool yesno_prompt(const char *question);
extern void setup_cancel_handler(void);
+extern void SetCancelConn(PGconn *conn);
+extern void ResetCancelConn(void);
+extern bool in_abort(void);
+
+
#endif /* COMMON_H */
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 86e6ab3..8c2abb7 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -14,6 +14,17 @@
#include "common.h"
#include "dumputils.h"
+#define NO_SLOT (-1)
+
+/* Arguments needed for a worker process */
+typedef struct ParallelSlot
+{
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+} ParallelSlot;
+
+#define ERRCODE_UNDEFINED_TABLE "42P01"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
@@ -25,10 +36,39 @@ static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
- const char *progname, bool echo, bool quiet);
+ const char *progname, bool echo, bool quiet,
+ int concurrentCons);
static void help(const char *progname);
+void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet);
+
+
+void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+static void
+run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+static int
+GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+static int
+select_loop(int maxFd, fd_set *workerset);
+
+static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
@@ -49,6 +89,7 @@ main(int argc, char *argv[])
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
@@ -74,13 +115,15 @@ main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
- while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
@@ -121,14 +164,27 @@ main(int argc, char *argv[])
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
@@ -141,6 +197,7 @@ main(int argc, char *argv[])
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
@@ -179,6 +236,13 @@ main(int argc, char *argv[])
setup_cancel_handler();
+ /*
+ * When the user gives a table list and the list is smaller than the
+ * number of jobs requested
+ */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
@@ -196,7 +260,7 @@ main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
- prompt_password, progname, echo, quiet);
+ prompt_password, progname, echo, quiet, concurrentCons);
}
else
{
@@ -210,25 +274,36 @@ main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
- if (tables.head != NULL)
+ if (concurrentCons > 1)
{
- SimpleStringListCell *cell;
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, -1,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, &tables, quiet);
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo, quiet);
- }
}
else
- vacuum_one_database(dbname, full, verbose, and_analyze,
+ {
+ if (tables.head != NULL)
+ {
+ SimpleStringListCell *cell;
+
+ for (cell = tables.head; cell; cell = cell->next)
+ {
+ vacuum_one_database(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, -1,
+ freeze, cell->val,
+ host, port, username, prompt_password,
+ progname, echo, quiet);
+ }
+ }
+ else
+ vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
+ }
}
exit(0);
@@ -268,56 +343,9 @@ vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
- if (analyze_only)
- {
- appendPQExpBufferStr(&sql, "ANALYZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- }
- else
- {
- appendPQExpBufferStr(&sql, "VACUUM");
- if (PQserverVersion(conn) >= 90000)
- {
- const char *paren = " (";
- const char *comma = ", ";
- const char *sep = paren;
+ prepare_command(conn, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (and_analyze)
- {
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
- }
- }
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
@@ -353,8 +381,10 @@ vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
}
else
{
- /* Otherwise, we got a stage from vacuum_all_databases(), so run
- * only that one. */
+ /*
+ * Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
@@ -374,11 +404,12 @@ vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
-vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
- bool analyze_in_stages, bool freeze, const char *maintenance_db,
- const char *host, const char *port,
- const char *username, enum trivalue prompt_password,
- const char *progname, bool echo, bool quiet)
+vacuum_all_databases(bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool analyze_in_stages, bool freeze,
+ const char *maintenance_db, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, bool quiet, int concurrentCons)
{
PGconn *conn;
PGresult *result;
@@ -407,6 +438,15 @@ vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
+ if (concurrentCons > 1)
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, stage,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, NULL, quiet);
+
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
analyze_in_stages, stage,
freeze, NULL, host, port, username, prompt_password,
@@ -417,6 +457,503 @@ vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
PQclear(result);
}
+/*
+ * run_parallel_vacuum
+ * This function processes the table list, picking the objects one
+ * by one and getting a free connection slot; once it gets a free
+ * slot it sends the job on that free connection.
+ */
+
+static void
+run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+{
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = concurrentCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ /*
+ * This will give a free connection slot; if no slot is free it will
+ * wait for at least one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connections to return their results */
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ pfree(connSlot);
+
+ exit(1);
+ }
+}
+
+/*
+ * GetIdleSlot
+ * Process the slot list; if any free slot is available return the slot id.
+ * If no slot is free, then perform select on all the sockets and wait until
+ * at least one slot is free.
+ */
+static int
+GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+{
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * No slot is free right now; wait for results and free whichever
+ * slots complete
+ */
+
+ do
+ {
+ SetCancelConn(pSlot[0].connection);
+
+ i = select_loop(maxFd, &slotset);
+
+ ResetCancelConn();
+
+ if (i < 0)
+ {
+ /*
+ * This can only happen if user has sent the cancel request using
+ * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
+ return NO_SLOT;
+ }
+
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ } while (firstFree < 0);
+
+ return firstFree;
+}
+
+/*
+ * GetQueryResult
+ * Process the query result.
+ */
+
+static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+{
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+
+ SetCancelConn(conn);
+ while ((result = PQgetResult(conn)) != NULL)
+ {
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ ResetCancelConn();
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ /*
+ * If user has given the vacuum of complete db, then if
+ * any of the object's vacuum failed it can be ignored and vacuuming
+ * of other objects can be continued, this is the same behavior as
+ * vacuuming of complete db is handled without --jobs option
+ */
+ if (!r)
+ {
+ char *sqlState = PQresultErrorField(lastResult, PG_DIAG_SQLSTATE);
+
+ if (!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ PQclear(lastResult);
+ return false;
+ }
+ }
+
+ PQclear(lastResult);
+ return true;
+}
+
+/*
+ * vacuum_parallel
+ * This function will open the multiple concurrent connections as
+ * suggested by user, it will derive the table list using server call
+ * if table list is not given by user and perform the vacuum call
+ */
+
+void
+vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet)
+{
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ bool completeDb = false;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * if table list is not provided then we need to do vacuum for the whole DB;
+ * get the list of all tables and prepare the list
+ */
+ if (!tables || !tables->head)
+ {
+ SimpleStringList dbtables = {NULL, NULL};
+ PGresult *res;
+ int ntuple;
+ int i;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY c.relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ appendPQExpBuffer(&sql, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* Vacuuming whole database */
+ completeDb = true;
+
+ if (concurrentCons > ntuple)
+ concurrentCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
+
+ PQsetnonblocking(connSlot[0].connection, 1);
+
+ for (i = 1; i < concurrentCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ if (stage == -1)
+ {
+ int i;
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[i]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, i, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ /*
+ * Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[stage]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, stage, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, -1, connSlot, completeDb);
+ }
+
+ for (i = 0; i < concurrentCons; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+
+ pfree(connSlot);
+}
+
+/*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+static int
+select_loop(int maxFd, fd_set *workerset)
+{
+ int i;
+ fd_set saveSet = *workerset;
+
+#ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+#else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ if (in_abort())
+ {
+ return -1;
+ }
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+#endif
+
+ return i;
+}
+
+/*
+ * DisconnectDatabase
+ * disconnect all the connections.
+ */
+void
+DisconnectDatabase(ParallelSlot *slot)
+{
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection = NULL;
+}
+
+
+
+void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+{
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+}
+
static void
help(const char *progname)
@@ -436,6 +973,7 @@ help(const char *progname)
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On 25 October 2014 17:52, Amit Kapila wrote:
***************
*** 358,363 **** handle_sigint(SIGNAL_ARGS)
--- 358,364 ----
/* Send QueryCancel if we are processing a database query */
if (cancelConn != NULL)
{
+ inAbort = true;
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
Do we need to set the inAbort flag only in case PQcancel is successful?
Basically, if PQcancel fails for any reason, I think the behaviour
can be undefined, as the executing thread can assume that the cancel is
done.
*** 391,396 **** consoleHandler(DWORD dwCtrlType)
--- 392,399 ----
EnterCriticalSection(&cancelConnLock);
if (cancelConn != NULL)
{
+ inAbort = true;
+
In the "handle_sigint" function, if we are going to cancel the query, I set
the inAbort flag at that time (even when the cancel succeeds), so that in the
"select_loop" function, if select(maxFd + 1, workerset, NULL, NULL, &tv) comes
out, we can know whether it came out because of a cancelled query and handle
it accordingly:
i = select(maxFd + 1, workerset, NULL, NULL, NULL);
if (in_abort()) /* loop broke because of a cancelled query, so return failure */
{
return -1;
}
if (i < 0 && errno == EINTR)
continue;
Regards,
Dilip Kumar
On Tue, Oct 28, 2014 at 9:03 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 25 October 2014 17:52, Amit Kapila wrote:
***************
*** 358,363 **** handle_sigint(SIGNAL_ARGS)
--- 358,364 ----
/* Send QueryCancel if we are processing a database query */
if (cancelConn != NULL)
{
+ inAbort = true;
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
Do we need to set the inAbort flag only in case PQcancel is successful?
Basically, if PQcancel fails for any reason, I think the behaviour
can be undefined, as the executing thread can assume that the cancel is
done.
*** 391,396 **** consoleHandler(DWORD dwCtrlType)
--- 392,399 ----
EnterCriticalSection(&cancelConnLock);
if (cancelConn != NULL)
{
+ inAbort = true;
+
In the "handle_sigint" function, if we are going to cancel the query, I set
the inAbort flag at that time (even when the cancel succeeds), so that in the
"select_loop" function
I am worried about the case where, after setting the inAbort flag,
PQcancel() fails (returns an error).
If select(maxFd + 1, workerset, NULL, NULL, &tv) comes out, we can know
whether it came out because of a cancelled query and handle it accordingly.
Yeah, it is fine for the case when PQcancel() is successful; what
if it fails?
I think even if select comes out due to any other reason, it will behave
as if it came out due to a cancel, even though the cancel actually failed.
How are you planning to handle that case?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 28 October 2014 09:18, Amit Kapila wrote:
I am worried about the case where, after setting the inAbort flag,
PQcancel() fails (returns an error).
If select(maxFd + 1, workerset, NULL, NULL, &tv) comes out, we can
know whether it came out because of a cancelled query and handle it accordingly.
Yeah, it is fine for the case when PQcancel() is successful; what
if it fails?
I think even if select comes out due to any other reason, it will behave
as if it came out due to a cancel, even though the cancel actually failed.
How are you planning to handle that case?
I think even if PQcancel fails there is no problem, because we set the inAbort flag in the handle_sigint handler, which means the user has tried to terminate.
So in this case as well we will find that inAbort is set and return an error; in the error case we finally call DisconnectDatabase, and in this function we send PQcancel for all active connections and only then disconnect.
Regards,
DIlip
On Tue, Oct 28, 2014 at 9:27 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 28 October 2014 09:18, Amit Kapila wrote:
I am worried about the case where, after setting the inAbort flag,
PQcancel() fails (returns an error).
If select(maxFd + 1, workerset, NULL, NULL, &tv) comes out, we can
know whether it came out because of a cancelled query and handle it accordingly.
Yeah, it is fine for the case when PQcancel() is successful; what
if it fails?
I think even if select comes out due to any other reason, it will behave
as if it came out due to a cancel, even though the cancel actually failed.
How are you planning to handle that case?
I think even if PQcancel fails there is no problem, because we set the
inAbort flag in the handle_sigint handler, which means the user has tried to
terminate.
Yeah, the user has tried to terminate; however, the utility will emit the
message "Could not send cancel request" in such a case and still
silently try to cancel and disconnect all connections.
One other related point is that I think the cancel handling mechanism
is still not completely right: the code handles it when there are not
enough free slots, but otherwise it won't handle the cancel request.
Basically I am referring to the below part of the code:
run_parallel_vacuum()
{
..
for (cell = tables->head; cell; cell = cell->next)
{
/*
* This will give the free connection slot, if no slot is free it will
* wait for atleast one slot to get free.
*/
free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
completedb);
if (free_slot == NO_SLOT)
{
error = true;
goto fail;
}
prepare_command(connSlot[free_slot].connection, full, verbose,
and_analyze, analyze_only, freeze, &sql);
appendPQExpBuffer(&sql, " %s", cell->val);
connSlot[free_slot].isFree = false;
slotconn = connSlot[free_slot].connection;
PQsendQuery(slotconn, sql.data);
resetPQExpBuffer(&sql);
}
..
}
I am wondering if it would be better to SetCancelConn in the above loop.
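For illustration, one hedged sketch of that suggestion (not code from the patch): make the connection being used the cancel target while its command is dispatched, so a Ctrl+C arriving mid-send has somewhere to go:

/* Sketch: point the cancel machinery at the sending connection,
* dispatch the command, then clear it again. */
SetCancelConn(connSlot[free_slot].connection);
PQsendQuery(connSlot[free_slot].connection, sql.data);
ResetCancelConn();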
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 27, 2014 at 5:26 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
Going further with verification of this patch, I found the below issue:
Run the testcase.sql file at the below link:
/messages/by-id/4205E661176A124FAF891E0A6BA9135266347F25@szxeml509-mbs.china.huawei.com
./vacuumdb --analyze-in-stages -j 8 -d postgres
Generating minimal optimizer statistics (1 target)
Segmentation fault
Server Log:
ERROR: syntax error at or near "minimal" at character 12
STATEMENT: ANALYZE ng minimal optimizer statistics (1 target)
LOG: could not receive data from client: Connection reset by peer
As you mentioned offlist that you are not able to reproduce this
issue, I have tried again, and what I observe is that I can
reproduce it only on a *release* build; some cases also work
without hitting the issue,
example:
./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t
t8 -j 8 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
So to me it looks like a timing issue; please notice how, in the error,
the statement looks like
"ANALYZE ng minimal optimizer statistics (1 target)", which is
not a valid statement.
Let me know if you still cannot reproduce it.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 13 November 2014 15:35 Amit Kapila Wrote,
As you mentioned offlist that you are not able to reproduce this
issue, I have tried again, and what I observe is that I can
reproduce it only on a *release* build; some cases also work
without hitting the issue,
example:
./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
So to me it looks like a timing issue; please notice how, in the error,
the statement looks like
"ANALYZE ng minimal optimizer statistics (1 target)", which is
not a valid statement.
Let me know if you still cannot reproduce it.
Thank you for looking into it once again.
I have tried with a release build, but could not reproduce it. Looking at the server log you sent ("ANALYZE ng minimal optimizer statistics (1 target)"), it seems like some corruption.
So there actually appear to be two issues here:
1. The query string sent to the server is getting corrupted.
2. The client is crashing.
I will review the code and try to track this down; meanwhile, if you can find some time to debug it, that would be really helpful.
Regards,
Dilip
On Mon, Nov 17, 2014 at 8:55 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 13 November 2014 15:35 Amit Kapila Wrote,
As you mentioned offlist that you are not able to reproduce this
issue, I have tried again, and what I observe is that I can
reproduce it only on a *release* build; some cases also work
without hitting the issue,
example:
./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7
-t t8 -j 8 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
So to me it looks like a timing issue; please notice how, in the error,
the statement looks like
"ANALYZE ng minimal optimizer statistics (1 target)", which is
not a valid statement.
Let me know if you still cannot reproduce it.
I will review the code and try to track this down; meanwhile, if you
can find some time to debug it, that would be really helpful.
I think I have found the problem and a fix for it.
The stack trace of the crash is as below:
#0 0x00000080108cf3a4 in .strlen () from /lib64/libc.so.6
#1 0x00000080108925bc in ._IO_vfprintf () from /lib64/libc.so.6
#2 0x00000080108bc1e0 in .__GL__IO_vsnprintf_vsnprintf () from
/lib64/libc.so.6
#3 0x00000fff7e581590 in .appendPQExpBufferVA () from
/data/akapila/workspace/master/installation/lib/libpq.so.5
#4 0x00000fff7e581774 in .appendPQExpBuffer () from
/data/akapila/workspace/master/installation/lib/libpq.so.5
#5 0x0000000010003748 in .run_parallel_vacuum ()
#6 0x0000000010003f60 in .vacuum_parallel ()
#7 0x0000000010002ae4 in .main ()
(gdb) f 5
#5 0x0000000010003748 in .run_parallel_vacuum ()
So the real reason here is that the list of tables passed to the
function is corrupted. The below code seems to be the real
culprit:
vacuum_parallel()
{
..
if (!tables || !tables->head)
{
SimpleStringList dbtables = {NULL, NULL};
...
..
tables = &dbtables;
}
..
}
In the above code, dbtables is local to the if block, and the code
assigns its address to tables, which is then used outside the
if block's scope; moving the declaration to the outer scope
fixes the problem in my environment. Find the updated patch
that fixes this problem attached to this mail. Let me know
your opinion.
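For anyone following along, the underlying C lifetime issue can be shown in isolation (a hypothetical minimal example, not the patch code):
/* mirrors SimpleStringList from dumputils.h */
typedef struct SimpleStringList
{
    struct SimpleStringListCell *head;
    struct SimpleStringListCell *tail;
} SimpleStringList;

static void
vacuum_parallel_sketch(SimpleStringList *tables)
{
    SimpleStringList dbtables = {NULL, NULL};   /* fixed: function scope,
                                                 * outlives the if block */
    if (!tables || !tables->head)
    {
        /*
         * The broken version declared dbtables *inside* this block, so
         * &dbtables became a dangling pointer as soon as the block
         * ended -- the corruption behind the reported stack trace.
         */
        tables = &dbtables;
    }
    /* tables is dereferenced later, outside the if block, so it must
     * point at storage that is still alive here */
}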
While looking at this problem, I have noticed a couple of other
possible improvements:
a. In the prepare_command() function, the patch initializes the sql
buffer (initPQExpBuffer(sql);), which I think is not required,
as the caller already does it at both places from which this
function is called. I think this will lead to a memory leak.
b. In the prepare_command() function, for fixed strings you can
use appendPQExpBufferStr(), which is what the original code
uses as well.
c.
run_parallel_vacuum()
{
..
prepare_command(connSlot[free_slot].connection, full, verbose,
and_analyze, analyze_only, freeze, &sql);
appendPQExpBuffer(&sql, " %s", cell->val);
..
}
I think it is better to end the command with ';' by using
appendPQExpBufferStr(&sql, ";"); in the above code.
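Taken together, (b) and (c) would make the relevant code look roughly like this (a sketch against the posted patch):
/* (c) in run_parallel_vacuum(): terminate the command */
prepare_command(connSlot[free_slot].connection, full, verbose,
                and_analyze, analyze_only, freeze, &sql);
appendPQExpBuffer(&sql, " %s", cell->val);
appendPQExpBufferStr(&sql, ";");

/* (b) inside prepare_command(): fixed strings via appendPQExpBufferStr */
appendPQExpBufferStr(sql, "ANALYZE");
if (verbose)
    appendPQExpBufferStr(sql, " VERBOSE");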
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
vacuumdb_parallel_v17.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 3ecd999..e4a971f 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -204,6 +204,27 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+ Number of concurrent connections to perform the operation.
+ This option will enable the vacuum operation to run on asynchronous
+ connections, at a time one table will be operated on one connection.
+ So at one time as many tables will be vacuumed parallely as number of
+ jobs. If number of jobs given are more than number of tables then
+ number of jobs will be set to number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</application> will open
+ <replaceable class="parameter"> njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections">
+ setting is high enough to accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 311fed5..bc5336a 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -19,10 +19,10 @@
#include "common.h"
-static void SetCancelConn(PGconn *conn);
-static void ResetCancelConn(void);
+
static PGcancel *volatile cancelConn = NULL;
+static bool inAbort = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
@@ -291,7 +291,7 @@ yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
-static void
+void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
@@ -321,7 +321,7 @@ SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
-static void
+void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
@@ -358,6 +358,7 @@ handle_sigint(SIGNAL_ARGS)
/* Send QueryCancel if we are processing a database query */
if (cancelConn != NULL)
{
+ inAbort = true;
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
@@ -391,6 +392,8 @@ consoleHandler(DWORD dwCtrlType)
EnterCriticalSection(&cancelConnLock);
if (cancelConn != NULL)
{
+ inAbort = true;
+
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
@@ -414,3 +417,8 @@ setup_cancel_handler(void)
}
#endif /* WIN32 */
+
+bool in_abort()
+{
+ return inAbort;
+}
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index 691f6c6..3bafde3 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -49,4 +49,9 @@ extern bool yesno_prompt(const char *question);
extern void setup_cancel_handler(void);
+extern void SetCancelConn(PGconn *conn);
+extern void ResetCancelConn(void);
+extern bool in_abort(void);
+
+
#endif /* COMMON_H */
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 86e6ab3..f45b2f0 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -14,6 +14,17 @@
#include "common.h"
#include "dumputils.h"
+#define NO_SLOT (-1)
+
+/* Arguments needed for a worker process */
+typedef struct ParallelSlot
+{
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+} ParallelSlot;
+
+#define ERRCODE_UNDEFINED_TABLE "42P01"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
@@ -25,10 +36,39 @@ static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
- const char *progname, bool echo, bool quiet);
+ const char *progname, bool echo, bool quiet,
+ int concurrentCons);
static void help(const char *progname);
+void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet);
+
+
+void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+static void
+run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+static int
+GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+static int
+select_loop(int maxFd, fd_set *workerset);
+
+static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
@@ -49,6 +89,7 @@ main(int argc, char *argv[])
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
@@ -74,13 +115,15 @@ main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
- while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
@@ -121,14 +164,27 @@ main(int argc, char *argv[])
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
@@ -141,6 +197,7 @@ main(int argc, char *argv[])
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
@@ -179,6 +236,13 @@ main(int argc, char *argv[])
setup_cancel_handler();
+ /*
+ * When user is giving the table list, and list is smaller then
+ * number of tables
+ */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
@@ -196,7 +260,7 @@ main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
- prompt_password, progname, echo, quiet);
+ prompt_password, progname, echo, quiet, concurrentCons);
}
else
{
@@ -210,25 +274,36 @@ main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
- if (tables.head != NULL)
+ if (concurrentCons > 1)
{
- SimpleStringListCell *cell;
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, -1,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, &tables, quiet);
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo, quiet);
- }
}
else
- vacuum_one_database(dbname, full, verbose, and_analyze,
+ {
+ if (tables.head != NULL)
+ {
+ SimpleStringListCell *cell;
+
+ for (cell = tables.head; cell; cell = cell->next)
+ {
+ vacuum_one_database(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, -1,
+ freeze, cell->val,
+ host, port, username, prompt_password,
+ progname, echo, quiet);
+ }
+ }
+ else
+ vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
+ }
}
exit(0);
@@ -268,56 +343,9 @@ vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
- if (analyze_only)
- {
- appendPQExpBufferStr(&sql, "ANALYZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- }
- else
- {
- appendPQExpBufferStr(&sql, "VACUUM");
- if (PQserverVersion(conn) >= 90000)
- {
- const char *paren = " (";
- const char *comma = ", ";
- const char *sep = paren;
+ prepare_command(conn, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (and_analyze)
- {
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
- }
- }
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
@@ -353,8 +381,10 @@ vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
}
else
{
- /* Otherwise, we got a stage from vacuum_all_databases(), so run
- * only that one. */
+ /*
+ * Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
@@ -374,11 +404,12 @@ vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
-vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
- bool analyze_in_stages, bool freeze, const char *maintenance_db,
- const char *host, const char *port,
- const char *username, enum trivalue prompt_password,
- const char *progname, bool echo, bool quiet)
+vacuum_all_databases(bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool analyze_in_stages, bool freeze,
+ const char *maintenance_db, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, bool quiet, int concurrentCons)
{
PGconn *conn;
PGresult *result;
@@ -407,6 +438,15 @@ vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
+ if (concurrentCons > 1)
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, stage,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, NULL, quiet);
+
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
analyze_in_stages, stage,
freeze, NULL, host, port, username, prompt_password,
@@ -417,6 +457,503 @@ vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
PQclear(result);
}
+/*
+ * run_parallel_vacuum
+ * This function process the table list,
+ * pick the object on by one and get the Free connections slot, once it
+ * get the free slot send the job on the free connection.
+ */
+
+static void
+run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+{
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = concurrentCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ /*
+ * This will give the free connection slot, if no slot is free it will
+ * wait for atleast one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connection to return the results*/
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ pfree(connSlot);
+
+ exit(1);
+ }
+}
+
+/*
+ * GetIdleSlot
+ * Process the slot list, if any free slot available return the slotid
+ * If no slot is free, Then perform select on all the socket and wait until
+ * atleast one slot is free.
+ */
+static int
+GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+{
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * Some of the slot are free, Process the results for slots whichever
+ * are free
+ */
+
+ do
+ {
+ SetCancelConn(pSlot[0].connection);
+
+ i = select_loop(maxFd, &slotset);
+
+ ResetCancelConn();
+
+ if (i < 0)
+ {
+ /*
+ * This can only happen if user has sent the cancel request using
+ * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
+ return NO_SLOT;
+ }
+
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ }while(firstFree < 0);
+
+ return firstFree;
+}
+
+/*
+ * GetQueryResult
+ * Process the query result.
+ */
+
+static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+{
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+
+ SetCancelConn(conn);
+ while((result = PQgetResult(conn)) != NULL)
+ {
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ ResetCancelConn();
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ /*
+ * If user has given the vacuum of complete db, then if
+ * any of the object's vacuum failed it can be ignored and vacuuming
+ * of other objects can be continued, this is the same behavior as
+ * vacuuming of complete db is handled without --jobs option
+ */
+ if (!r)
+ {
+ char *sqlState = PQresultErrorField(lastResult, PG_DIAG_SQLSTATE);
+
+ if (!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ PQclear(lastResult);
+ return false;
+ }
+ }
+
+ PQclear(lastResult);
+ return true;
+}
+
+/*
+ * vacuum_parallel
+ * This function will open the multiple concurrent connections as
+ * suggested by user, it will derive the table list using server call
+ * if table list is not given by user and perform the vacuum call
+ */
+
+void
+vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet)
+{
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ SimpleStringList dbtables = {NULL, NULL};
+ bool completeDb = false;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * if table list is not provided then we need to do vaccum for whole DB
+ * get the list of all tables and prepare the list
+ */
+ if (!tables || !tables->head)
+ {
+ PGresult *res;
+ int ntuple;
+ int i;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY c.relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ appendPQExpBuffer(&sql, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* Vaccuming full database*/
+ completeDb = true;
+
+ if (concurrentCons > ntuple)
+ concurrentCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
+
+ PQsetnonblocking(connSlot[0].connection, 1);
+
+ for (i = 1; i < concurrentCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ if (stage == -1)
+ {
+ int i;
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[i]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, i, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ /* Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one. */
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[stage]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, stage, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, -1, connSlot, completeDb);
+ }
+
+ for (i = 0; i < concurrentCons; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+
+ pfree(connSlot);
+}
+
+/*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+static int
+select_loop(int maxFd, fd_set *workerset)
+{
+ int i;
+ fd_set saveSet = *workerset;
+
+#ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+#else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ if (in_abort())
+ {
+ return -1;
+ }
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+#endif
+
+ return i;
+}
+
+/*
+ * DisconnectDatabase
+ * disconnect all the connections.
+ */
+void
+DisconnectDatabase(ParallelSlot *slot)
+{
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection= NULL;
+}
+
+
+
+void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+{
+ initPQExpBuffer(sql);
+
+ if (analyze_only)
+ {
+ appendPQExpBuffer(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBuffer(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBuffer(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBuffer(sql, " FULL");
+ if (freeze)
+ appendPQExpBuffer(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBuffer(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBuffer(sql, " ANALYZE");
+ }
+ }
+}
+
static void
help(const char *progname)
@@ -436,6 +973,7 @@ help(const char *progname)
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On 23 November 2014 14:45, Amit Kapila Wrote,
Thanks a lot for debugging and fixing the issue.
The stack trace of the crash is as below:
#0 0x00000080108cf3a4 in .strlen () from /lib64/libc.so.6
#1 0x00000080108925bc in ._IO_vfprintf () from /lib64/libc.so.6
#2 0x00000080108bc1e0 in .__GL__IO_vsnprintf_vsnprintf () from /lib64/libc.so.6
#3 0x00000fff7e581590 in .appendPQExpBufferVA () from
/data/akapila/workspace/master/installation/lib/libpq.so.5
#4 0x00000fff7e581774 in .appendPQExpBuffer () from
/data/akapila/workspace/master/installation/lib/libpq.so.5
#5 0x0000000010003748 in .run_parallel_vacuum ()
#6 0x0000000010003f60 in .vacuum_parallel ()
#7 0x0000000010002ae4 in .main ()
(gdb) f 5
#5 0x0000000010003748 in .run_parallel_vacuum ()
So the real reason here is that the list of tables passed to the
function is corrupted. The below code seems to be the real
culprit:
vacuum_parallel()
{
..
if (!tables || !tables->head)
{
SimpleStringList dbtables = {NULL, NULL};
...
..
tables = &dbtables;
}
..
}
In the above code, dbtables is local to the if block, and the code
assigns its address to tables, which is then used outside the
if block's scope; moving the declaration to the outer scope
fixes the problem in my environment. Find the updated patch
that fixes this problem attached to this mail. Let me know
your opinion.
Yes, that's the reason for the corruption; this must be causing both issues, the corrupted query sent to the server as well as the crash on the client side.
While looking at this problem, I have noticed a couple of other
possible improvements:
a. In the prepare_command() function, the patch initializes the sql
buffer (initPQExpBuffer(sql);), which I think is not required,
as the caller already does it at both places from which this
function is called. I think this will lead to a memory leak.
Fixed.
b. In the prepare_command() function, for fixed strings you can
use appendPQExpBufferStr(), which is what the original code
uses as well.
Changed as per the comment.
c.
run_parallel_vacuum()
{
..
prepare_command(connSlot[free_slot].connection, full, verbose,
and_analyze, analyze_only, freeze, &sql);
appendPQExpBuffer(&sql, " %s", cell->val);
..
}
I think it is better to end the command with ';' by using
appendPQExpBufferStr(&sql, ";"); in the above code.
Done
The latest patch is attached; please have a look.
Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_v18.patch (application/octet-stream)
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 204,209 **** PostgreSQL documentation
--- 204,230 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+ Number of concurrent connections to perform the operation.
+ This option will enable the vacuum operation to run on asynchronous
+ connections, at a time one table will be operated on one connection.
+ So at one time as many tables will be vacuumed parallely as number of
+ jobs. If number of jobs given are more than number of tables then
+ number of jobs will be set to number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</application> will open
+ <replaceable class="parameter"> njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections">
+ setting is high enough to accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
*** a/src/bin/scripts/common.c
--- b/src/bin/scripts/common.c
***************
*** 19,28 ****
#include "common.h"
! static void SetCancelConn(PGconn *conn);
! static void ResetCancelConn(void);
static PGcancel *volatile cancelConn = NULL;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
--- 19,28 ----
#include "common.h"
!
static PGcancel *volatile cancelConn = NULL;
+ static bool inAbort = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
***************
*** 291,297 **** yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
! static void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
--- 291,297 ----
*
* Set cancelConn to point to the current database connection.
*/
! void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
***************
*** 321,327 **** SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
! static void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
--- 321,327 ----
*
* Free the current cancel connection, if any, and set to NULL.
*/
! void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
***************
*** 358,363 **** handle_sigint(SIGNAL_ARGS)
--- 358,364 ----
/* Send QueryCancel if we are processing a database query */
if (cancelConn != NULL)
{
+ inAbort = true;
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
***************
*** 391,396 **** consoleHandler(DWORD dwCtrlType)
--- 392,399 ----
EnterCriticalSection(&cancelConnLock);
if (cancelConn != NULL)
{
+ inAbort = true;
+
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
***************
*** 414,416 **** setup_cancel_handler(void)
--- 417,424 ----
}
#endif /* WIN32 */
+
+ bool in_abort()
+ {
+ return inAbort;
+ }
*** a/src/bin/scripts/common.h
--- b/src/bin/scripts/common.h
***************
*** 49,52 **** extern bool yesno_prompt(const char *question);
--- 49,57 ----
extern void setup_cancel_handler(void);
+ extern void SetCancelConn(PGconn *conn);
+ extern void ResetCancelConn(void);
+ extern bool in_abort(void);
+
+
#endif /* COMMON_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 14,19 ****
--- 14,30 ----
#include "common.h"
#include "dumputils.h"
+ #define NO_SLOT (-1)
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelSlot
+ {
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+ } ParallelSlot;
+
+ #define ERRCODE_UNDEFINED_TABLE "42P01"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
***************
*** 25,34 **** static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 36,74 ----
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet,
! int concurrentCons);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet);
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 89,95 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 115,129 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 121,134 **** main(int argc, char *argv[])
--- 164,190 ----
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 197,203 ----
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 179,184 **** main(int argc, char *argv[])
--- 236,248 ----
setup_cancel_handler();
+ /*
+ * When user is giving the table list, and list is smaller then
+ * number of tables
+ */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 260,266 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, concurrentCons);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo, quiet);
- }
}
else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
}
exit(0);
--- 274,309 ----
dbname = get_user_name_or_exit(progname);
}
! if (concurrentCons > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, host, port, username, prompt_password,
! progname, echo, concurrentCons, &tables, quiet);
}
else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo, quiet);
! }
! }
! else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
+ }
}
exit(0);
***************
*** 268,323 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! if (analyze_only)
! {
! appendPQExpBufferStr(&sql, "ANALYZE");
! if (verbose)
! appendPQExpBufferStr(&sql, " VERBOSE");
! }
! else
! {
! appendPQExpBufferStr(&sql, "VACUUM");
! if (PQserverVersion(conn) >= 90000)
! {
! const char *paren = " (";
! const char *comma = ", ";
! const char *sep = paren;
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (and_analyze)
- {
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
- }
- }
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
--- 343,351 ----
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! prepare_command(conn, full, verbose,
! and_analyze, analyze_only, freeze, &sql);
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
***************
*** 353,360 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
}
else
{
! /* Otherwise, we got a stage from vacuum_all_databases(), so run
! * only that one. */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
--- 381,390 ----
}
else
{
! /*
! * Otherwise, we got a stage from vacuum_all_databases(), so run
! * only that one.
! */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
***************
*** 374,384 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 404,415 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int concurrentCons)
{
PGconn *conn;
PGresult *result;
***************
*** 407,412 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 438,452 ----
fflush(stdout);
}
+ if (concurrentCons > 1)
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, stage,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, NULL, quiet);
+
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
analyze_in_stages, stage,
freeze, NULL, host, port, username, prompt_password,
***************
*** 417,422 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 457,958 ----
PQclear(result);
}
+ /*
+ * run_parallel_vacuum
+ * This function process the table list,
+ * pick the object one by one and get the Free connections slot, once it
+ * get the free slot send the job on the free connection.
+ */
+
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+ {
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = concurrentCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ /*
+ * This will give the free connection slot, if no slot is free it will
+ * wait for atleast one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ appendPQExpBufferStr(&sql, ";");
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connection to return the results*/
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+ fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ pfree(connSlot);
+
+ exit(1);
+ }
+ }
+
+ /*
+ * GetIdleSlot
+ * Process the slot list, if any free slot available return the slotid
+ * If no slot is free, Then perform select on all the socket and wait until
+ * atleast one slot is free.
+ */
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * Some of the slot are free, Process the results for slots whichever
+ * are free
+ */
+
+ do
+ {
+ SetCancelConn(pSlot[0].connection);
+
+ i = select_loop(maxFd, &slotset);
+
+ ResetCancelConn();
+
+ if (i < 0)
+ {
+ /*
+ * This can only happen if user has sent the cancel request using
+ * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
+ return NO_SLOT;
+ }
+
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ }while(firstFree < 0);
+
+ return firstFree;
+ }
+
+ /*
+ * GetQueryResult
+ * Process the query result.
+ */
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+
+ SetCancelConn(conn);
+ while((result = PQgetResult(conn)) != NULL)
+ {
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ ResetCancelConn();
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ /*
+ * If user has given the vacuum of complete db, then if
+ * any of the object's vacuum failed it can be ignored and vacuuming
+ * of other objects can be continued, this is the same behavior as
+ * vacuuming of complete db is handled without --jobs option
+ */
+ if (!r)
+ {
+ char *sqlState = PQresultErrorField(lastResult, PG_DIAG_SQLSTATE);
+
+ if (!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ PQclear(lastResult);
+ return false;
+ }
+ }
+
+ PQclear(lastResult);
+ return true;
+ }
+
+ /*
+ * vacuum_parallel
+ * This function will open the multiple concurrent connections as
+ * suggested by user, it will derive the table list using server call
+ * if table list is not given by user and perform the vacuum call
+ */
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet)
+ {
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ SimpleStringList dbtables = {NULL, NULL};
+ bool completeDb = false;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * if table list is not provided then we need to do vaccum for whole DB
+ * get the list of all tables and prepare the list
+ */
+ if (!tables || !tables->head)
+ {
+ PGresult *res;
+ int ntuple;
+ int i;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY c.relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ appendPQExpBuffer(&sql, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* Vaccuming full database*/
+ completeDb = true;
+
+ if (concurrentCons > ntuple)
+ concurrentCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
+
+ PQsetnonblocking(connSlot[0].connection, 1);
+
+ for (i = 1; i < concurrentCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ if (stage == -1)
+ {
+ int i;
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[i]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, i, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ /* Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one. */
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[stage]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, stage, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, -1, connSlot, completeDb);
+ }
+
+ for (i = 0; i < concurrentCons; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+
+ pfree(connSlot);
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+
+ if (in_abort())
+ {
+ return -1;
+ }
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ /*
+ * DisconnectDatabase
+ * disconnect all the connections.
+ */
+ void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection= NULL;
+ }
+
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ if (analyze_only)
+ {
+ appendPQExpBufferStr(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBufferStr(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBufferStr(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBufferStr(sql, " FULL");
+ if (freeze)
+ appendPQExpBufferStr(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBufferStr(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 436,441 **** help(const char *progname)
--- 972,978 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Mon, Nov 24, 2014 at 7:34 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 23 November 2014 14:45, Amit Kapila Wrote,
Thanks a lot for debugging and fixing the issue.
The latest patch is attached; please have a look.
I think some of the comments given upthread are still not handled:
a. About cancel handling
b. Setting of the inAbort flag for the case where PQcancel is successful
c. Performance data for the --analyze-in-stages switch
d. Take one pass over the comments in the patch; I can still see some
incorrectly formatted multiline comments. Refer below:
+ /* Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one. */
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 24 November 2014 11:29, Amit Kapila Wrote,
I think some of the comments given upthread are still not handled:
a. About cancel handling
Your actual comment was -->
One other related point is that I think the cancel handling mechanism
is still not completely right: the code handles it when there are not
enough free slots, but otherwise it won't handle the cancel request.
Basically I am referring to the below part of the code:
run_parallel_vacuum()
{
..
for (cell = tables->head; cell; cell = cell->next)
{
..
free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
completedb);
…
PQsendQuery(slotconn, sql.data);
resetPQExpBuffer(&sql);
}
1. I think the connection blocks only while waiting for a free slot or while in GetQueryResult, so we handle the cancel request (via SetCancelConn) in both of those places.
2. Now for the case (as you mentioned) where there are enough slots and the above for loop is running: if the user presses Ctrl+C, the loop will not break. I have handled this by checking the inAbort
flag inside the for loop before sending a new command. I think we cannot do this just by setting the cancel connection, because a query only realizes it was cancelled, and fails, once it receives some data; until the connection receives data, it will not see the failure. So I check inAbort directly.
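A sketch of what that check looks like (based on the description above; the full change is in the attached patch):
for (cell = tables->head; cell; cell = cell->next)
{
    /* user pressed Ctrl+C while slots were still free: stop handing
     * out new work instead of waiting for a query to fail */
    if (in_abort())
    {
        error = true;
        goto fail;
    }
    free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
                            completedb);
    if (free_slot == NO_SLOT)
    {
        error = true;
        goto fail;
    }
    /* ... prepare the command and PQsendQuery as before ... */
}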
b. Setting of the inAbort flag for the case where PQcancel is successful
Your Actual comment was -->
Yeah, the user has tried to terminate; however, the utility will emit the
message "Could not send cancel request" in such a case and still
silently try to cancel and disconnect all connections.
You are right. I have fixed the code; now, when sending the cancel request fails, we no longer set the inAbort flag.
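For reference, the signal handler now sets the flag only when the cancel request is actually sent (condensed from the attached patch):

    if (cancelConn != NULL)
    {
        if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
        {
            inAbort = true;                /* cancel request reached the server */
            fprintf(stderr, _("Cancel request sent\n"));
        }
        else
            fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
    }
    else
        inAbort = true;                    /* no query in flight: just abort */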
c. Performance data of --analyze-in-stages switch
Performance Data
------------------------------
CPU 8 cores
RAM = 16GB
checkpoint_segments=256
Before each test, run the test.sql (attached)
Un-patched -
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 --analyze-in-stages -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.843s
user 0m0.000s
sys 0m0.000s
Patched
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 --analyze-in-stages -j 2 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.593s
user 0m0.004s
sys 0m0.004s
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 --analyze-in-stages -j 4 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.421s
user 0m0.004s
sys 0m0.004s
So even with 2 connections we can get about a 30% improvement.
d. Have one more pass over the comments in the patch. I can still see some wrongly formatted multiline comments. Refer below: + /* Otherwise, we got a stage from vacuum_all_databases(), so run + * only that one. */
Checked all of them, and fixed.
While testing, I found one more behavioral difference compared to the base code.
Base Code:
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 -t "t1" -t "t2" -t "t3" -t "t4" --analyze-in-stages -d Postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.605s
user 0m0.004s
sys 0m0.000s
I think it should be:
SET default_statistics_target=1; run it for all the tables
SET default_statistics_target=10; run it for all the tables, and so on.
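In loop form, the expected ordering would nest the table loop inside the stage loop, roughly like this (a sketch only; analyze_table_at_current_target() is a hypothetical helper, not patch code):

    for (stage = 0; stage < 3; stage++)
    {
        /* targets 1, 10, then the default, as in --analyze-in-stages */
        executeCommand(conn, stage_commands[stage], progname, echo);

        for (cell = tables->head; cell; cell = cell->next)
            analyze_table_at_current_target(conn, cell->val);
    }

while the base code instead runs all three stages to completion for one table before moving to the next.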
With Patched
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 -t "t1" -t "t2" -t "t3" -t "t4" --analyze-in-stages -j 2 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.395s
user 0m0.000s
sys 0m0.004s
here we set each target once and run that stage for all the tables.
Please provide your opinion.
Regards,
Dilip Kumar
Attachments:
vacuumdb_parallel_v19.patchapplication/octet-stream; name=vacuumdb_parallel_v19.patchDownload
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 204,209 **** PostgreSQL documentation
--- 204,230 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+ Number of concurrent connections to perform the operation.
+ This option will enable the vacuum operation to run on asynchronous
+ connections, at a time one table will be operated on one connection.
+ So at one time as many tables will be vacuumed parallely as number of
+ jobs. If number of jobs given are more than number of tables then
+ number of jobs will be set to number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</application> will open
+ <replaceable class="parameter"> njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections">
+ setting is high enough to accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
*** a/src/bin/scripts/common.c
--- b/src/bin/scripts/common.c
***************
*** 19,28 ****
#include "common.h"
! static void SetCancelConn(PGconn *conn);
! static void ResetCancelConn(void);
static PGcancel *volatile cancelConn = NULL;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
--- 19,28 ----
#include "common.h"
!
static PGcancel *volatile cancelConn = NULL;
+ static bool inAbort = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
***************
*** 291,297 **** yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
! static void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
--- 291,297 ----
*
* Set cancelConn to point to the current database connection.
*/
! void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
***************
*** 321,327 **** SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
! static void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
--- 321,327 ----
*
* Free the current cancel connection, if any, and set to NULL.
*/
! void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
***************
*** 359,368 **** handle_sigint(SIGNAL_ARGS)
--- 359,375 ----
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
+ inAbort = true;
fprintf(stderr, _("Cancel request sent\n"));
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ {
+ inAbort = true;
+ }
errno = save_errno; /* just in case the write changed it */
}
***************
*** 392,401 **** consoleHandler(DWORD dwCtrlType)
--- 399,416 ----
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
fprintf(stderr, _("Cancel request sent\n"));
+ inAbort = true;
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ {
+ inAbort = true;
+ }
+
LeaveCriticalSection(&cancelConnLock);
return TRUE;
***************
*** 414,416 **** setup_cancel_handler(void)
--- 429,436 ----
}
#endif /* WIN32 */
+
+ bool in_abort()
+ {
+ return inAbort;
+ }
*** a/src/bin/scripts/common.h
--- b/src/bin/scripts/common.h
***************
*** 49,52 **** extern bool yesno_prompt(const char *question);
--- 49,57 ----
extern void setup_cancel_handler(void);
+ extern void SetCancelConn(PGconn *conn);
+ extern void ResetCancelConn(void);
+ extern bool in_abort(void);
+
+
#endif /* COMMON_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 14,19 ****
--- 14,30 ----
#include "common.h"
#include "dumputils.h"
+ #define NO_SLOT (-1)
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelSlot
+ {
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+ } ParallelSlot;
+
+ #define ERRCODE_UNDEFINED_TABLE "42P01"
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
***************
*** 25,34 **** static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 36,74 ----
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet,
! int concurrentCons);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet);
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 89,95 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 115,129 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 121,134 **** main(int argc, char *argv[])
--- 164,190 ----
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 197,203 ----
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 179,184 **** main(int argc, char *argv[])
--- 236,248 ----
setup_cancel_handler();
+ /*
+ * When user is giving the table list, and list is smaller then
+ * number of tables
+ */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 260,266 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, concurrentCons);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo, quiet);
- }
}
else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
}
exit(0);
--- 274,309 ----
dbname = get_user_name_or_exit(progname);
}
! if (concurrentCons > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, host, port, username, prompt_password,
! progname, echo, concurrentCons, &tables, quiet);
}
else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo, quiet);
! }
! }
! else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
+ }
}
exit(0);
***************
*** 268,323 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! if (analyze_only)
! {
! appendPQExpBufferStr(&sql, "ANALYZE");
! if (verbose)
! appendPQExpBufferStr(&sql, " VERBOSE");
! }
! else
! {
! appendPQExpBufferStr(&sql, "VACUUM");
! if (PQserverVersion(conn) >= 90000)
! {
! const char *paren = " (";
! const char *comma = ", ";
! const char *sep = paren;
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (and_analyze)
- {
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
- }
- }
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
--- 343,351 ----
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! prepare_command(conn, full, verbose,
! and_analyze, analyze_only, freeze, &sql);
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
***************
*** 353,360 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
}
else
{
! /* Otherwise, we got a stage from vacuum_all_databases(), so run
! * only that one. */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
--- 381,390 ----
}
else
{
! /*
! * Otherwise, we got a stage from vacuum_all_databases(), so run
! * only that one.
! */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
***************
*** 374,384 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 404,415 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int concurrentCons)
{
PGconn *conn;
PGresult *result;
***************
*** 390,396 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
PQfinish(conn);
/* If analyzing in stages, then run through all stages. Otherwise just
! * run once, passing -1 as the stage. */
for (stage = (analyze_in_stages ? 0 : -1);
stage < (analyze_in_stages ? 3 : 0);
stage++)
--- 421,428 ----
PQfinish(conn);
/* If analyzing in stages, then run through all stages. Otherwise just
! * run once, passing -1 as the stage.
! */
for (stage = (analyze_in_stages ? 0 : -1);
stage < (analyze_in_stages ? 3 : 0);
stage++)
***************
*** 407,412 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 439,453 ----
fflush(stdout);
}
+ if (concurrentCons > 1)
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, stage,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, NULL, quiet);
+
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
analyze_in_stages, stage,
freeze, NULL, host, port, username, prompt_password,
***************
*** 417,422 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 458,969 ----
PQclear(result);
}
+ /*
+ * run_parallel_vacuum
+ * This function process the table list,
+ * pick the object one by one and get the Free connections slot, once it
+ * get the free slot send the job on the free connection.
+ */
+
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+ {
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = concurrentCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ if (in_abort())
+ {
+ error = true;
+ goto fail;
+ }
+
+ /*
+ * This will give the free connection slot, if no slot is free it will
+ * wait for atleast one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ appendPQExpBufferStr(&sql, ";");
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connection to return the results*/
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+ fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ pfree(connSlot);
+
+ exit(1);
+ }
+ }
+
+ /*
+ * GetIdleSlot
+ * Process the slot list, if any free slot available return the slotid
+ * If no slot is free, Then perform select on all the socket and wait until
+ * atleast one slot is free.
+ */
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * Some of the slot are free, Process the results for slots whichever
+ * are free
+ */
+
+ do
+ {
+ SetCancelConn(pSlot[0].connection);
+
+ i = select_loop(maxFd, &slotset);
+
+ ResetCancelConn();
+
+ if (i < 0)
+ {
+ /*
+ * This can only happen if user has sent the cancel request using
+ * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
+ return NO_SLOT;
+ }
+
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ }while(firstFree < 0);
+
+ return firstFree;
+ }
+
+ /*
+ * GetQueryResult
+ * Process the query result.
+ */
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+
+ SetCancelConn(conn);
+ while((result = PQgetResult(conn)) != NULL)
+ {
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ ResetCancelConn();
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ /*
+ * If user has given the vacuum of complete db, then if
+ * any of the object's vacuum failed it can be ignored and vacuuming
+ * of other objects can be continued, this is the same behavior as
+ * vacuuming of complete db is handled without --jobs option
+ */
+ if (!r)
+ {
+ char *sqlState = PQresultErrorField(lastResult, PG_DIAG_SQLSTATE);
+
+ if (!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ PQclear(lastResult);
+ return false;
+ }
+ }
+
+ PQclear(lastResult);
+ return true;
+ }
+
+ /*
+ * vacuum_parallel
+ * This function will open the multiple concurrent connections as
+ * suggested by user, it will derive the table list using server call
+ * if table list is not given by user and perform the vacuum call
+ */
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet)
+ {
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ SimpleStringList dbtables = {NULL, NULL};
+ bool completeDb = false;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * if table list is not provided then we need to do vaccum for whole DB
+ * get the list of all tables and prepare the list
+ */
+ if (!tables || !tables->head)
+ {
+ PGresult *res;
+ int ntuple;
+ int i;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY c.relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ appendPQExpBuffer(&sql, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* Vaccuming full database*/
+ completeDb = true;
+
+ if (concurrentCons > ntuple)
+ concurrentCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
+
+ PQsetnonblocking(connSlot[0].connection, 1);
+
+ for (i = 1; i < concurrentCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ if (stage == -1)
+ {
+ int i;
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[i]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, i, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ /* Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[stage]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, stage, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, -1, connSlot, completeDb);
+ }
+
+ for (i = 0; i < concurrentCons; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+
+ pfree(connSlot);
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+ if (in_abort())
+ {
+ return -1;
+ }
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+ if (in_abort())
+ {
+ return -1;
+ }
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ /*
+ * DisconnectDatabase
+ * disconnect all the connections.
+ */
+ void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection= NULL;
+ }
+
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ if (analyze_only)
+ {
+ appendPQExpBufferStr(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBufferStr(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBufferStr(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBufferStr(sql, " FULL");
+ if (freeze)
+ appendPQExpBufferStr(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBufferStr(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 436,441 **** help(const char *progname)
--- 983,989 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Mon, Dec 1, 2014 at 12:18 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 24 November 2014 11:29, Amit Kapila Wrote,
I have verified that all previous comments are addressed and
the new version is much better than the previous one.
here we set each target once and run that stage for all the tables.
Hmm, theoretically I think the new behaviour could lead to more I/O in
certain cases as compared to the existing behaviour. The reason is
that in the new behaviour, between the Analyze runs for a particular
table at different targets, Analyzes of other tables are interleaved,
so the pages of that table in shared buffers or the OS cache need to
be reloaded for each new target. Currently all stages of Analyze for a
particular table run in one go, which means each stage can benefit
from the pages loaded by the previous stage. If you agree, then we
should try to avoid this change in the new behaviour.
Please provide your opinion.
I have a few questions regarding the function GetIdleSlot():
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+             const char *progname, bool completedb)
{
..
+ /*
+  * Some of the slot are free, Process the results for slots whichever
+  * are free
+  */
+
+ do
+ {
+     SetCancelConn(pSlot[0].connection);
+
+     i = select_loop(maxFd, &slotset);
+
+     ResetCancelConn();
+
+     if (i < 0)
+     {
+         /*
+          * This can only happen if user has sent the cancel request using
+          * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+          */
+
+         GetQueryResult(pSlot[0].connection, dbname, progname,
+                        completedb);
+         return NO_SLOT;
+     }
+
+     Assert(i != 0);
+
+     for (i = 0; i < max_slot; i++)
+     {
+         if (!FD_ISSET(pSlot[i].sock, &slotset))
+             continue;
+
+         PQconsumeInput(pSlot[i].connection);
+         if (PQisBusy(pSlot[i].connection))
+             continue;
+
+         pSlot[i].isFree = true;
+
+         if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+                             completedb))
+             return NO_SLOT;
+
+         if (firstFree < 0)
+             firstFree = i;
+     }
+ } while (firstFree < 0);
}
I wanted to understand what exactly the above loop is doing.
a.
first of all, the comment on top of it says "Some of the slot
are free, ..."; if some slot is free, then why do you want
to process the results? (Do you mean to say that *none* of
the slots is free?)
b.
IIUC, you have called the function select_loop(maxFd, &slotset)
to check whether a socket descriptor is readable; if so, then why
does the do..while loop always check the same maxFd? Don't
you want to check the different socket descriptors? I am not sure
if I am missing something here.
c.
After checking the socket descriptors, why do you want
to run the below for loop over all slots?
for (i = 0; i < max_slot; i++)
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Dec 6, 2014 at 9:01 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
If you agree, then we should try to avoid this change in new behaviour.
Still seeing many concerns about this patch, so marking it as returned
with feedback. If possible, switching it to the next CF would be fine
I guess as this work is still being continued.
--
Michael
On 06 December 2014 20:01 Amit Kapila Wrote
I wanted to understand what exactly the above loop is doing.
a.
first of all, the comment on top of it says "Some of the slot
are free, ..."; if some slot is free, then why do you want
to process the results? (Do you mean to say that *none* of
the slots is free?)
This comment is wrong; I will remove it.
b.
IIUC, you have called the function select_loop(maxFd, &slotset)
to check whether a socket descriptor is readable; if so, then why
does the do..while loop always check the same maxFd? Don't
you want to check the different socket descriptors? I am not sure
if I am missing something here.
select_loop(maxFd, &slotset)
maxFd is the highest descriptor among all the sockets in the set, and slotset contains all of the descriptors, so select_loop returns as soon as any descriptor receives some message. Once it returns,
we need to check which descriptors actually got a message from the server, so we loop over the slots and process the results.
So it is not only for maxFd; it covers all the descriptors. It is in a do..while loop because select_loop may return due to some intermediate message on one of the sockets while the query is still not complete,
and if none of the sockets has become free (which we check in the below for loop), we go back to select_loop again.
c.
After checking the socket descriptors, why do you want
to run the below for loop over all slots?
for (i = 0; i < max_slot; i++)
After the select loop returns, we may have got results on multiple connections, so for each one we consume the input and check whether it is still busy; if it is, there is nothing to do, but if it has finished we process the result and mark the connection free.
And if any of the connections has become free, we break out of the do..while loop.
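Put as code, the wait-and-check cycle looks roughly like this (condensed from GetIdleSlot() in the attached patch, with the cancel handling left out):

    do
    {
        /* blocks until at least one socket in the set becomes readable */
        i = select_loop(maxFd, &slotset);

        for (i = 0; i < max_slot; i++)
        {
            if (!FD_ISSET(pSlot[i].sock, &slotset))
                continue;                 /* nothing arrived on this socket */

            PQconsumeInput(pSlot[i].connection);
            if (PQisBusy(pSlot[i].connection))
                continue;                 /* partial data; query still running */

            pSlot[i].isFree = true;       /* query finished: slot is reusable */
            if (firstFree < 0)
                firstFree = i;
        }
    } while (firstFree < 0);              /* nothing freed yet: select again */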
On Mon, Dec 8, 2014 at 7:33 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 06 December 2014 20:01 Amit Kapila Wrote
I wanted to understand what exactly the above loop is doing.
a.
first of all, the comment on top of it says "Some of the slot
are free, ..."; if some slot is free, then why do you want
to process the results? (Do you mean to say that *none* of
the slots is free?)
This comment is wrong; I will remove it.
I suggest that rather than removing it, you edit the comment to explain
the idea behind the code at that place.
b.
IIUC, you have called the function select_loop(maxFd, &slotset)
to check whether a socket descriptor is readable; if so, then why
does the do..while loop always check the same maxFd? Don't
you want to check the different socket descriptors? I am not sure
if I am missing something here.
select_loop(maxFd, &slotset)
So it is not only for maxFd; it covers all the descriptors. It is in a
do..while loop because select_loop may return due to some intermediate
message on one of the sockets while the query is still not
complete,
Okay, I think this part of the code is somewhat similar to what
is done in pg_dump/parallel.c, with some differences related
to the handling of inAbort. One thing I have noticed here which
could lead to a problem is that the caller of the select_loop() function
assumes that the return value is less than zero only if there is a
cancel request, which I think is wrong, because the select system
call can also return -1 in case of error. I am referring to the below
code in the above context:
+ if (i < 0)
+ {
+ /*
+ * This can only happen if user has sent the cancel request using
+ * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On December 2014 17:31 Amit Kapila Wrote,
I suggest that rather than removing it, you edit the comment to explain
the idea behind the code at that place.
Done
Okay, I think this part of the code is somewhat similar to what
is done in pg_dump/parallel.c, with some differences related
to the handling of inAbort. One thing I have noticed here which
could lead to a problem is that the caller of the select_loop() function
assumes that the return value is less than zero only if there is a
cancel request, which I think is wrong, because the select system
call can also return -1 in case of error. I am referring to the below
code in the above context:
+ if (i < 0)
+ {
+ /*
+ * This can only happen if user has sent the cancel request using
+ * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
For the abort case I am now using a special error code, and for any other negative return we will assert; this behavior is the same as in pg_dump.
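A sketch of the adjusted select_loop() contract on the Unix side (ERROR_IN_ABORT is the new sentinel defined in v20; the exact code may differ):

    for (;;)
    {
        *workerset = saveSet;
        i = select(maxFd + 1, workerset, NULL, NULL, NULL);

        if (in_abort())
            return ERROR_IN_ABORT;   /* user cancel: the only expected failure */

        if (i < 0 && errno == EINTR)
            continue;                /* interrupted by a signal: retry */

        break;
    }

    Assert(i > 0);                   /* any other select() error is unexpected */
    return i;

so the caller in GetIdleSlot() tests for the sentinel explicitly (ERROR_IN_ABORT == i) instead of treating every negative value as a cancel.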
Hmm, theoretically I think the new behaviour could lead to more I/O in
certain cases as compared to the existing behaviour. The reason is
that in the new behaviour, between the Analyze runs for a particular
table at different targets, Analyzes of other tables are interleaved,
so the pages of that table in shared buffers or the OS cache need to
be reloaded for each new target. Currently all stages of Analyze for a
particular table run in one go, which means each stage can benefit
from the pages loaded by the previous stage. If you agree, then we
should try to avoid this change in the new behaviour.
I will work on this comment and post an updated patch.
I will move this patch to the latest commitfest.
Regards,
Dilip
Attachments:
vacuumdb_parallel_v20.patchapplication/octet-stream; name=vacuumdb_parallel_v20.patchDownload
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 204,209 **** PostgreSQL documentation
--- 204,230 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+ Number of concurrent connections to perform the operation.
+ This option will enable the vacuum operation to run on asynchronous
+ connections, at a time one table will be operated on one connection.
+ So at one time as many tables will be vacuumed parallely as number of
+ jobs. If number of jobs given are more than number of tables then
+ number of jobs will be set to number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</application> will open
+ <replaceable class="parameter"> njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections">
+ setting is high enough to accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
*** a/src/bin/scripts/common.c
--- b/src/bin/scripts/common.c
***************
*** 19,28 ****
#include "common.h"
! static void SetCancelConn(PGconn *conn);
! static void ResetCancelConn(void);
static PGcancel *volatile cancelConn = NULL;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
--- 19,28 ----
#include "common.h"
!
static PGcancel *volatile cancelConn = NULL;
+ static bool inAbort = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
***************
*** 291,297 **** yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
! static void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
--- 291,297 ----
*
* Set cancelConn to point to the current database connection.
*/
! void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
***************
*** 321,327 **** SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
! static void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
--- 321,327 ----
*
* Free the current cancel connection, if any, and set to NULL.
*/
! void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
***************
*** 359,368 **** handle_sigint(SIGNAL_ARGS)
--- 359,375 ----
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
+ inAbort = true;
fprintf(stderr, _("Cancel request sent\n"));
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ {
+ inAbort = true;
+ }
errno = save_errno; /* just in case the write changed it */
}
***************
*** 392,401 **** consoleHandler(DWORD dwCtrlType)
--- 399,416 ----
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
fprintf(stderr, _("Cancel request sent\n"));
+ inAbort = true;
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ {
+ inAbort = true;
+ }
+
LeaveCriticalSection(&cancelConnLock);
return TRUE;
***************
*** 414,416 **** setup_cancel_handler(void)
--- 429,436 ----
}
#endif /* WIN32 */
+
+ bool in_abort()
+ {
+ return inAbort;
+ }
*** a/src/bin/scripts/common.h
--- b/src/bin/scripts/common.h
***************
*** 49,52 **** extern bool yesno_prompt(const char *question);
--- 49,57 ----
extern void setup_cancel_handler(void);
+ extern void SetCancelConn(PGconn *conn);
+ extern void ResetCancelConn(void);
+ extern bool in_abort(void);
+
+
#endif /* COMMON_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 14,19 ****
--- 14,31 ----
#include "common.h"
#include "dumputils.h"
+ #define NO_SLOT (-1)
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelSlot
+ {
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+ } ParallelSlot;
+
+ #define ERRCODE_UNDEFINED_TABLE "42P01"
+ #define ERROR_IN_ABORT -2
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
***************
*** 25,34 **** static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 37,75 ----
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet,
! int concurrentCons);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet);
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 90,96 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 116,130 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 121,134 **** main(int argc, char *argv[])
--- 165,191 ----
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,146 **** main(int argc, char *argv[])
--- 198,204 ----
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
***************
*** 179,184 **** main(int argc, char *argv[])
--- 237,249 ----
setup_cancel_handler();
+ /*
+ * When user is giving the table list, and list is smaller then
+ * number of tables
+ */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 261,267 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, concurrentCons);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo, quiet);
- }
}
else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
}
exit(0);
--- 275,310 ----
dbname = get_user_name_or_exit(progname);
}
! if (concurrentCons > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, host, port, username, prompt_password,
! progname, echo, concurrentCons, &tables, quiet);
}
else
! {
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo, quiet);
! }
! }
! else
! vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
+ }
}
exit(0);
***************
*** 268,323 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! if (analyze_only)
! {
! appendPQExpBufferStr(&sql, "ANALYZE");
! if (verbose)
! appendPQExpBufferStr(&sql, " VERBOSE");
! }
! else
! {
! appendPQExpBufferStr(&sql, "VACUUM");
! if (PQserverVersion(conn) >= 90000)
! {
! const char *paren = " (";
! const char *comma = ", ";
! const char *sep = paren;
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (and_analyze)
- {
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
- }
- }
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
--- 344,352 ----
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! prepare_command(conn, full, verbose,
! and_analyze, analyze_only, freeze, &sql);
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
***************
*** 353,360 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
}
else
{
! /* Otherwise, we got a stage from vacuum_all_databases(), so run
! * only that one. */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
--- 382,391 ----
}
else
{
! /*
! * Otherwise, we got a stage from vacuum_all_databases(), so run
! * only that one.
! */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
***************
*** 374,384 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 405,416 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int concurrentCons)
{
PGconn *conn;
PGresult *result;
***************
*** 390,396 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
PQfinish(conn);
/* If analyzing in stages, then run through all stages. Otherwise just
! * run once, passing -1 as the stage. */
for (stage = (analyze_in_stages ? 0 : -1);
stage < (analyze_in_stages ? 3 : 0);
stage++)
--- 422,429 ----
PQfinish(conn);
/* If analyzing in stages, then run through all stages. Otherwise just
! * run once, passing -1 as the stage.
! */
for (stage = (analyze_in_stages ? 0 : -1);
stage < (analyze_in_stages ? 3 : 0);
stage++)
***************
*** 407,412 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 440,454 ----
fflush(stdout);
}
+ if (concurrentCons > 1)
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, stage,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, NULL, quiet);
+
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
analyze_in_stages, stage,
freeze, NULL, host, port, username, prompt_password,
***************
*** 417,422 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 459,972 ----
PQclear(result);
}
+ /*
+ * run_parallel_vacuum
+ * This function process the table list,
+ * pick the object one by one and get the Free connections slot, once it
+ * get the free slot send the job on the free connection.
+ */
+
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+ {
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = concurrentCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ if (in_abort())
+ {
+ error = true;
+ goto fail;
+ }
+
+ /*
+ * This will give the free connection slot, if no slot is free it will
+ * wait for atleast one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ appendPQExpBufferStr(&sql, ";");
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connection to return the results*/
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+ fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ pfree(connSlot);
+
+ exit(1);
+ }
+ }
+
+ /*
+ * GetIdleSlot
+ * Process the slot list, if any free slot available return the slotid
+ * If no slot is free, Then perform select on all the socket and wait until
+ * atleast one slot is free.
+ */
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * No free slot found, so wait on all the connections; once any of the
+ * connections gets a response, check whether any of them has finished
+ * its task. If yes, return that slot, otherwise wait again.
+ */
+
+ do
+ {
+ SetCancelConn(pSlot[0].connection);
+
+ i = select_loop(maxFd, &slotset);
+
+ ResetCancelConn();
+
+ if (ERROR_IN_ABORT == i)
+ {
+ /*
+ * This can only happen if the user has sent a cancel request using
+ * Ctrl+C; cancellation is handled by the 0th slot, so fetch the error result.
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
+ return NO_SLOT;
+ }
+
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ } while (firstFree < 0);
+
+ return firstFree;
+ }
+
+ /*
+ * GetQueryResult
+ * Process the query result.
+ */
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+
+ SetCancelConn(conn);
+ while((result = PQgetResult(conn)) != NULL)
+ {
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ ResetCancelConn();
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ /*
+ * If the user has asked for a vacuum of the complete db, then if the
+ * vacuum of any object fails it can be ignored and vacuuming of the
+ * other objects can be continued; this is the same behavior as when a
+ * vacuum of the complete db is run without the --jobs option.
+ */
+ if (!r)
+ {
+ char *sqlState = PQresultErrorField(lastResult, PG_DIAG_SQLSTATE);
+
+ if (!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ PQclear(lastResult);
+ return false;
+ }
+ }
+
+ PQclear(lastResult);
+ return true;
+ }
+
+ /*
+ * vacuum_parallel
+ * This function will open multiple concurrent connections, as
+ * requested by the user; if the table list is not given by the user it
+ * will derive the table list using a server call, and then perform the vacuum calls.
+ */
+
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet)
+ {
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ SimpleStringList dbtables = {NULL, NULL};
+ bool completeDb = false;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * if a table list is not provided then we need to vacuum the whole DB;
+ * get the list of all tables and prepare the list
+ */
+ if (!tables || !tables->head)
+ {
+ PGresult *res;
+ int ntuple;
+ int i;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY c.relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ appendPQExpBuffer(&sql, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* Vacuuming the full database */
+ completeDb = true;
+
+ if (concurrentCons > ntuple)
+ concurrentCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
+
+ PQsetnonblocking(connSlot[0].connection, 1);
+
+ for (i = 1; i < concurrentCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ if (stage == -1)
+ {
+ int i;
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[i]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, i, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ /* Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[stage]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze, concurrentCons,
+ progname, stage, connSlot, completeDb);
+ }
+ }
+ else
+ {
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, -1, connSlot, completeDb);
+ }
+
+ for (i = 0; i < concurrentCons; i++)
+ {
+ PQfinish(connSlot[i].connection);
+ }
+
+ pfree(connSlot);
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+ if (in_abort())
+ {
+ return ERROR_IN_ABORT;
+ }
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+ if (in_abort())
+ {
+ return -1;
+ }
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ /*
+ * DisconnectDatabase
+ * disconnect the connection of the given slot, after cancelling any active command.
+ */
+ void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection = NULL;
+ }
+
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ if (analyze_only)
+ {
+ appendPQExpBufferStr(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBufferStr(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBufferStr(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBufferStr(sql, " FULL");
+ if (freeze)
+ appendPQExpBufferStr(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBufferStr(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 436,441 **** help(const char *progname)
--- 986,992 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On Mon, Dec 15, 2014 at 4:18 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On December 2014 17:31 Amit Kapila Wrote,
Hmm, theoretically I think the new behaviour could lead to more I/O in
certain cases as compared to the existing behaviour. The reason for more
I/O is that in the new behaviour, while doing Analyze for a particular
table at different targets, it runs Analyze of different tables in
between, so the pages in shared buffers or the OS cache for a particular
table need to be reloaded again for a new target, whereas currently it
does all stages of Analyze for a particular table in one go, which means
each stage of Analyze can benefit from the pages of the table loaded by
the previous stage. If you agree, then we should try to avoid this change
in the new behaviour.
I will work on this comment and post the updated patch.
One idea is to send all the stages and corresponding Analyze commands
to server in one go which means something like
"BEGIN; SET default_statistics_target=1; SET vacuum_cost_delay=0;
Analyze t1; COMMIT;"
"BEGIN; SET default_statistics_target=10; RESET vacuum_cost_delay;
Analyze t1; COMMIT;"
"BEGIN; RESET default_statistics_target;
Analyze t1; COMMIT;"
Now, still parallel operations in other backends could lead to
page misses, but I think the impact will be minimized.
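For illustration only, a minimal libpq sketch of this idea; analyze_table_in_stages and the fixed buffer size are hypothetical, not part of any patch in this thread:

#include <stdio.h>
#include <stdbool.h>
#include "libpq-fe.h"

/*
 * Sketch: run all three analyze stages for one table back-to-back on a
 * single connection, each stage as one multi-statement batch, so the
 * table's pages stay warm in cache between stages.
 */
static const char *stage_batches[] = {
    "BEGIN; SET default_statistics_target=1; SET vacuum_cost_delay=0; ANALYZE %s; COMMIT;",
    "BEGIN; SET default_statistics_target=10; RESET vacuum_cost_delay; ANALYZE %s; COMMIT;",
    "BEGIN; RESET default_statistics_target; ANALYZE %s; COMMIT;"
};

static bool
analyze_table_in_stages(PGconn *conn, const char *table)
{
    char    sql[1024];
    int     stage;

    for (stage = 0; stage < 3; stage++)
    {
        PGresult   *res;

        snprintf(sql, sizeof(sql), stage_batches[stage], table);
        res = PQexec(conn, sql);    /* one round trip per stage */
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            PQclear(res);
            return false;
        }
        PQclear(res);
    }
    return true;
}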
I will move this patch to the latest commitfest.
By the way, I think this patch should be in Waiting On Author stage.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 19 December 2014 16:41, Amit Kapila Wrote,
One idea is to send all the stages and corresponding Analyze commands
to server in one go which means something like
"BEGIN; SET default_statistics_target=1; SET vacuum_cost_delay=0;
Analyze t1; COMMIT;"
"BEGIN; SET default_statistics_target=10; RESET vacuum_cost_delay;
Analyze t1; COMMIT;"
"BEGIN; RESET default_statistics_target;
Analyze t1; COMMIT;"
Case 1: In the case of a complete DB:
In the base code it will first process all the tables in stage 1, then in stage 2 and so on, so that at any point in time all the tables have been analyzed at least up to a certain stage.
But if we process all the stages for one table first, and then take the next table for processing stage 1, then it may happen that for some tables all the stages are processed while others are still waiting for even the first stage to be processed; this will defeat the purpose of analyze-in-stages.
Case 2: In the case of independent tables, like -t "t1" -t "t2":
In the base code we currently process all the stages for the first table, then do the same for the next table and so on.
I think if the user gives multiple tables together then his purpose might be to analyze those tables together stage by stage,
but in our code we analyze table1 in all stages and only then consider the next table.
So for tables also the ordering should be (see the sketch after the list):
Stage1:
T1
T2
..
Stage2:
T1
T2
…
Thoughts?
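To make the two orderings concrete, a sketch in C; analyze_one is a hypothetical helper that runs one analyze stage for one table, not a function from the patch:

extern void analyze_one(const char *table, int stage);   /* hypothetical */

/* Table-major (argued against above): all stages of one table are
 * finished before the next table is even started. */
static void
analyze_table_major(const char *tables[], int ntables)
{
    for (int t = 0; t < ntables; t++)
        for (int stage = 0; stage < 3; stage++)
            analyze_one(tables[t], stage);
}

/* Stage-major (the proposed behaviour): every table gets stage 1
 * before any table moves on to stage 2. */
static void
analyze_stage_major(const char *tables[], int ntables)
{
    for (int stage = 0; stage < 3; stage++)
        for (int t = 0; t < ntables; t++)
            analyze_one(tables[t], stage);
}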
Regards,
Dilip
On Wed, Dec 24, 2014 at 4:00 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
Case 1: In the case of a complete DB:
In the base code it will first process all the tables in stage 1, then in
stage 2 and so on, so that at any point in time all the tables have been
analyzed at least up to a certain stage.
But if we process all the stages for one table first, and then take the
next table for processing stage 1, then it may happen that for some
tables all the stages are processed while others are still waiting for
even the first stage to be processed; this will defeat the purpose of
analyze-in-stages.
Case 2: In the case of independent tables, like -t "t1" -t "t2":
In the base code we currently process all the stages for the first
table, then do the same for the next table and so on.
I think if the user gives multiple tables together then his purpose
might be to analyze those tables together stage by stage,
but in our code we analyze table1 in all stages and only then consider
the next table.
So basically you want to say that currently the processing of
tables with the --analyze-in-stages switch is different when the user
executes vacuumdb for the whole database versus when it is done for
individual tables (multiple tables together). In the proposed patch
the processing of tables will be the same in either case (whole
database or independent tables). I think your point has merit, so
let's proceed with this as it is in your patch.
Do you have anything more to handle in the patch, or shall I take
another look and pass it to the committer if it is ready?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 29 December 2014 10:22 Amit Kapila Wrote,
Do you have anything more to handle in the patch, or shall I take
another look and pass it to the committer if it is ready?
I think there is nothing more to be handled from my side; you can go ahead with the review.
Regards,
Dilip
On Mon, Dec 29, 2014 at 11:10 AM, Dilip kumar <dilip.kumar@huawei.com>
wrote:
On 29 December 2014 10:22 Amit Kapila Wrote,
I think there is nothing more to be handled from my side; you can go
ahead with the review.
The patch looks good to me. I have made a couple of
cosmetic changes (spelling mistakes, improved some comments,
etc.); please check them once and if you are okay, we can move
ahead.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
vacuumdb_parallel_v21.patchapplication/octet-stream; name=vacuumdb_parallel_v21.patchDownload
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 3ecd999..e4a971f 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -204,6 +204,27 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+ Number of concurrent connections to perform the operation.
+ This option will enable the vacuum operation to run on asynchronous
+ connections, at a time one table will be operated on one connection.
+ So at one time as many tables will be vacuumed parallely as number of
+ jobs. If number of jobs given are more than number of tables then
+ number of jobs will be set to number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</application> will open
+ <replaceable class="parameter"> njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections">
+ setting is high enough to accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 311fed5..2220a4e 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -19,10 +19,9 @@
#include "common.h"
-static void SetCancelConn(PGconn *conn);
-static void ResetCancelConn(void);
static PGcancel *volatile cancelConn = NULL;
+static bool inAbort = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
@@ -291,7 +290,7 @@ yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
-static void
+void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
@@ -321,7 +320,7 @@ SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
-static void
+void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
@@ -359,10 +358,15 @@ handle_sigint(SIGNAL_ARGS)
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
+ inAbort = true;
fprintf(stderr, _("Cancel request sent\n"));
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ inAbort = true;
errno = save_errno; /* just in case the write changed it */
}
@@ -392,10 +396,16 @@ consoleHandler(DWORD dwCtrlType)
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
fprintf(stderr, _("Cancel request sent\n"));
+ inAbort = true;
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ inAbort = true;
+
LeaveCriticalSection(&cancelConnLock);
return TRUE;
@@ -414,3 +424,8 @@ setup_cancel_handler(void)
}
#endif /* WIN32 */
+
+bool in_abort()
+{
+ return inAbort;
+}
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index 691f6c6..3bafde3 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -49,4 +49,9 @@ extern bool yesno_prompt(const char *question);
extern void setup_cancel_handler(void);
+extern void SetCancelConn(PGconn *conn);
+extern void ResetCancelConn(void);
+extern bool in_abort(void);
+
+
#endif /* COMMON_H */
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 86e6ab3..05cd74f 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -14,6 +14,18 @@
#include "common.h"
#include "dumputils.h"
+#define NO_SLOT (-1)
+
+/* Arguments needed for a worker process */
+typedef struct ParallelSlot
+{
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+} ParallelSlot;
+
+#define ERRCODE_UNDEFINED_TABLE "42P01"
+#define ERROR_IN_ABORT -2
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
@@ -25,10 +37,40 @@ static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
- const char *progname, bool echo, bool quiet);
+ const char *progname, bool echo, bool quiet,
+ int concurrentCons);
static void help(const char *progname);
+void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only,
+ bool analyze_in_stages, int stage, bool freeze,
+ const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet);
+
+
+void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+static void
+run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+static int
+GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+static int
+select_loop(int maxFd, fd_set *workerset);
+
+static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
@@ -49,6 +91,7 @@ main(int argc, char *argv[])
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
@@ -74,13 +117,15 @@ main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
- while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv:j:", long_options, &optindex)) != -1)
{
switch (c)
{
@@ -121,14 +166,27 @@ main(int argc, char *argv[])
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
@@ -141,6 +199,7 @@ main(int argc, char *argv[])
}
}
+ optind++;
/*
* Non-option argument specifies database name as long as it wasn't
@@ -179,6 +238,10 @@ main(int argc, char *argv[])
setup_cancel_handler();
+ /* Avoid opening extra connections. */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
@@ -196,7 +259,7 @@ main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
- prompt_password, progname, echo, quiet);
+ prompt_password, progname, echo, quiet, concurrentCons);
}
else
{
@@ -210,25 +273,36 @@ main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
- if (tables.head != NULL)
+ if (concurrentCons > 1)
{
- SimpleStringListCell *cell;
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, -1,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, &tables, quiet);
- for (cell = tables.head; cell; cell = cell->next)
+ }
+ else
+ {
+ if (tables.head != NULL)
{
+ SimpleStringListCell *cell;
+
+ for (cell = tables.head; cell; cell = cell->next)
+ {
+ vacuum_one_database(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, -1,
+ freeze, cell->val,
+ host, port, username, prompt_password,
+ progname, echo, quiet);
+ }
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
- freeze, cell->val,
+ freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
- }
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo, quiet);
}
exit(0);
@@ -268,56 +342,9 @@ vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
- if (analyze_only)
- {
- appendPQExpBufferStr(&sql, "ANALYZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- }
- else
- {
- appendPQExpBufferStr(&sql, "VACUUM");
- if (PQserverVersion(conn) >= 90000)
- {
- const char *paren = " (";
- const char *comma = ", ";
- const char *sep = paren;
+ prepare_command(conn, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (and_analyze)
- {
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
- }
- }
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
@@ -353,8 +380,10 @@ vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
}
else
{
- /* Otherwise, we got a stage from vacuum_all_databases(), so run
- * only that one. */
+ /*
+ * Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
@@ -374,11 +403,12 @@ vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
-vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
- bool analyze_in_stages, bool freeze, const char *maintenance_db,
- const char *host, const char *port,
- const char *username, enum trivalue prompt_password,
- const char *progname, bool echo, bool quiet)
+vacuum_all_databases(bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool analyze_in_stages, bool freeze,
+ const char *maintenance_db, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password, const char *progname,
+ bool echo, bool quiet, int concurrentCons)
{
PGconn *conn;
PGresult *result;
@@ -390,7 +420,8 @@ vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
PQfinish(conn);
/* If analyzing in stages, then run through all stages. Otherwise just
- * run once, passing -1 as the stage. */
+ * run once, passing -1 as the stage.
+ */
for (stage = (analyze_in_stages ? 0 : -1);
stage < (analyze_in_stages ? 3 : 0);
stage++)
@@ -407,6 +438,15 @@ vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
fflush(stdout);
}
+ if (concurrentCons > 1)
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, stage,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, NULL, quiet);
+
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
analyze_in_stages, stage,
freeze, NULL, host, port, username, prompt_password,
@@ -417,6 +457,507 @@ vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
PQclear(result);
}
+/*
+ * run_parallel_vacuum
+ * This function does the actual work of sending the jobs
+ * concurrently to the server.
+ */
+static void
+run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+{
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = concurrentCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ if (in_abort())
+ {
+ error = true;
+ goto fail;
+ }
+
+ /*
+ * This will give a free connection slot; if no slot is free it will
+ * wait for at least one slot to get free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ appendPQExpBufferStr(&sql, ";");
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connections to return their results */
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ pfree(connSlot);
+
+ exit(1);
+ }
+}
+
+/*
+ * GetIdleSlot
+ * Process the slot list; if any free slot is available then return
+ * its id, else perform a select() on all the sockets and wait
+ * until at least one slot becomes available.
+ */
+static int
+GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+{
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * No free slot found, so wait until one of the connections
+ * has finished its task and return the available slot.
+ */
+ do
+ {
+ SetCancelConn(pSlot[0].connection);
+
+ i = select_loop(maxFd, &slotset);
+
+ ResetCancelConn();
+
+ if (i == ERROR_IN_ABORT)
+ {
+ /*
+ * This can only happen if the user has sent a cancel request using
+ * Ctrl+C; cancellation is handled by the 0th slot, so fetch the error result.
+ */
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
+ return NO_SLOT;
+ }
+
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ } while (firstFree < 0);
+
+ return firstFree;
+}
+
+/*
+ * GetQueryResult
+ * Process the query result.
+ */
+static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+{
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+
+ SetCancelConn(conn);
+ while((result = PQgetResult(conn)) != NULL)
+ {
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ ResetCancelConn();
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ /*
+ * If the user has asked for a vacuum of the complete db and the vacuum
+ * of any single object fails, it can be ignored and vacuuming
+ * of the other objects can be continued; this is the same behavior as
+ * when a vacuum of the complete db is run without the --jobs option.
+ */
+ if (!r)
+ {
+ char *sqlState = PQresultErrorField(lastResult, PG_DIAG_SQLSTATE);
+
+ if (!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ PQclear(lastResult);
+ return false;
+ }
+ }
+
+ PQclear(lastResult);
+ return true;
+}
+
+/*
+ * vacuum_parallel
+ * This function will open multiple connections to perform the
+ * vacuum on tables concurrently. In case the vacuum needs to be performed
+ * on the whole database, it retrieves the list of tables and then performs
+ * the vacuum.
+ */
+void
+vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet)
+{
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ SimpleStringList dbtables = {NULL, NULL};
+ bool completeDb = false;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * if a table list is not provided then we need to vacuum the whole DB;
+ * get the list of all tables and prepare the list
+ */
+ if (!tables || !tables->head)
+ {
+ PGresult *res;
+ int ntuple;
+ int i;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY c.relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ appendPQExpBuffer(&sql, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* remember that we are vacuuming the full database. */
+ completeDb = true;
+
+ if (concurrentCons > ntuple)
+ concurrentCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
+
+ PQsetnonblocking(connSlot[0].connection, 1);
+
+ for (i = 1; i < concurrentCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ if (stage == -1)
+ {
+ int i;
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[i]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, i, connSlot,
+ completeDb);
+ }
+ }
+ else
+ {
+ /*
+ * Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[stage]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, stage,
+ connSlot, completeDb);
+ }
+ }
+ else
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, -1, connSlot,
+ completeDb);
+
+ for (i = 0; i < concurrentCons; i++)
+ PQfinish(connSlot[i].connection);
+
+ pfree(connSlot);
+}
+
+/*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+static int
+select_loop(int maxFd, fd_set *workerset)
+{
+ int i;
+ fd_set saveSet = *workerset;
+
+#ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * sleep a quarter of a second before checking if we should terminate.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+ if (in_abort())
+ {
+ return ERROR_IN_ABORT;
+ }
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+#else /* UNIX */
+
+ for (;;)
+ {
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+ if (in_abort())
+ {
+ return -1;
+ }
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+#endif
+
+ return i;
+}
+
+/*
+ * DisconnectDatabase
+ * disconnect the connection of the given slot, after cancelling any active command.
+ */
+void
+DisconnectDatabase(ParallelSlot *slot)
+{
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection = NULL;
+}
+
+
+
+void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+{
+ if (analyze_only)
+ {
+ appendPQExpBufferStr(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBufferStr(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBufferStr(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBufferStr(sql, " FULL");
+ if (freeze)
+ appendPQExpBufferStr(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBufferStr(sql, " ANALYZE");
+ }
+ }
+}
+
static void
help(const char *progname)
@@ -436,6 +977,7 @@ help(const char *progname)
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On 31 December 2014 18:36, Amit Kapila Wrote,
The patch looks good to me. I have made a couple of
cosmetic changes (spelling mistakes, improved some comments,
etc.); please check them once and if you are okay, we can move
ahead.
Thanks for the review and changes; the changes look fine to me.
Regards,
Dilip
On Fri, Jan 2, 2015 at 11:47 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 31 December 2014 18:36, Amit Kapila Wrote,
The patch looks good to me. I have made a couple of
cosmetic changes (spelling mistakes, improved some comments,
etc.); please check them once and if you are okay, we can move
ahead.
Thanks for the review and changes; the changes look fine to me.
Okay, I have marked this patch as "Ready For Committer".
Notes for Committer -
There is one behavioural difference in the handling of the --analyze-in-stages
switch: when individual tables (using the -t option) are analyzed with
this switch, the patch will process (in the case of concurrent jobs) all the
given tables for stage 1, then for stage 2 and so on, whereas the
unpatched code will process all three stages table by table
(table-1 all three stages, table-2 all three stages, and so on). I think
the new behaviour is okay, as the same is done when this utility
vacuums the whole database. As there was no input from any committer
on this point, I thought it better to settle it this way rather than wait
longer just for this one point.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> wrote:
Notes for Committer -
There is one behavioural difference in the handling of --analyze-in-stages
switch, when individual tables (by using -t option) are analyzed by
using this switch, patch will process (in case of concurrent jobs) all the
given tables for stage-1 and then for stage-2 and so on whereas in the
unpatched code it will process all the three stages table by table
(table-1 all three stages, table-2 all three stages and so on). I think
the new behaviour is okay as the same is done when this utility does
vacuum for whole database.
IMV, the change is for the better. The whole point of
--analyze-in-stages is to get minimal statistics so that "good
enough" plans will be built for most queries to allow a production
database to be brought back on-line quickly, followed by generating
increasing granularity (which takes longer but should help ensure
"best plan") while the database is in use with the initial
statistics. Doing all three levels for one table before generating
the rough statistics for the others doesn't help with that, so I
see this change as fixing a bug. Is it feasible to break that part
out as a separate patch?
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jan 2, 2015 at 8:34 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
IMV, the change is for the better. I see this change as fixing a bug.
Is it feasible to break that part out as a separate patch?
Currently, as the patch stands, the fix (or new behaviour) is only
implemented for the multiple-jobs option; to fix this in the base
code a separate patch is required.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
+    <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+    <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+    <listitem>
+     <para>
+      Number of concurrent connections to perform the operation.
+      This option will enable the vacuum operation to run on asynchronous
+      connections, at a time one table will be operated on one connection.
+      So at one time as many tables will be vacuumed parallely as number of
+      jobs. If number of jobs given are more than number of tables then
+      number of jobs will be set to number of tables.
"asynchronous connections" isn't a very well defined term. Also, the
second part of that sentence doesn't seem to be gramattically correct.
+     </para>
+     <para>
+      <application>vacuumdb</application> will open
+      <replaceable class="parameter"> njobs</replaceable> connections to the
+      database, so make sure your <xref linkend="guc-max-connections">
+      setting is high enough to accommodate all connections.
+     </para>
Isn't it njobs+1?
@@ -141,6 +199,7 @@ main(int argc, char *argv[])
}
 }
+ optind++;
Hm, where's that coming from?
+    PQsetnonblocking(connSlot[0].connection, 1);
+
+    for (i = 1; i < concurrentCons; i++)
+    {
+        connSlot[i].connection = connectDatabase(dbname, host, port, username,
+                        prompt_password, progname, false);
+
+        PQsetnonblocking(connSlot[i].connection, 1);
+        connSlot[i].isFree = true;
+        connSlot[i].sock = PQsocket(connSlot[i].connection);
+    }
Are you sure about this global PQsetnonblocking()? This means that you
might not be able to send queries... And you don't seem to be waiting
for sockets waiting for writes in the select loop - which means you
might end up being stuck waiting for reads when you haven't submitted
the query.
I think you might need a more complex select() loop. On nonfree
connections also wait for writes if PQflush() returns != 0.
+/*
+ * GetIdleSlot
+ *    Process the slot list; if any free slot is available then return
+ *    its id, else perform a select() on all the sockets and wait
+ *    until at least one slot becomes available.
+ */
+static int
+GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+            const char *progname, bool completedb)
+{
+    int            i;
+    fd_set        slotset;
Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.
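A guard along those lines could look like this; a sketch only, with placement and message wording being assumptions:

#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>        /* defines FD_SETSIZE */

/* A select()-based loop cannot watch more than FD_SETSIZE descriptors,
 * so reject -j values that could overflow the fd_set (one descriptor
 * is kept in reserve here). */
static void
check_jobs_limit(int concurrentCons, const char *progname)
{
    if (concurrentCons > FD_SETSIZE - 1)
    {
        fprintf(stderr, "%s: too many parallel jobs requested (maximum: %d)\n",
                progname, FD_SETSIZE - 1);
        exit(1);
    }
}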
+    int            firstFree = -1;
+    pgsocket    maxFd;
+
+    for (i = 0; i < max_slot; i++)
+        if (pSlot[i].isFree)
+            return i;
+
+    FD_ZERO(&slotset);
+
+    maxFd = pSlot[0].sock;
+
+    for (i = 0; i < max_slot; i++)
+    {
+        FD_SET(pSlot[i].sock, &slotset);
+        if (pSlot[i].sock > maxFd)
+            maxFd = pSlot[i].sock;
+    }
So we're waiting for idle connections?
I think you'll have to use two fdsets here, and set the write
set based on PQflush() != 0.
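In other words, something like the following; a sketch under the reviewer's suggestion, not code from the patch (it reuses the ParallelSlot struct the patch defines, and wait_on_slots is a hypothetical name):

#include <sys/select.h>
#include "libpq-fe.h"

/*
 * Wait on busy slots with two fd sets: a connection with unsent output
 * (PQflush() returns 1, or -1 on failure, treated the same here) is
 * watched for writability, the rest for readability.
 */
static int
wait_on_slots(ParallelSlot *slots, int nslots)
{
    fd_set  readset;
    fd_set  writeset;
    int     maxFd = 0;
    int     i;

    FD_ZERO(&readset);
    FD_ZERO(&writeset);

    for (i = 0; i < nslots; i++)
    {
        if (slots[i].isFree)
            continue;
        if (PQflush(slots[i].connection) != 0)  /* output still pending */
            FD_SET(slots[i].sock, &writeset);
        else                                    /* waiting for a result */
            FD_SET(slots[i].sock, &readset);
        if (slots[i].sock > maxFd)
            maxFd = slots[i].sock;
    }

    return select(maxFd + 1, &readset, &writeset, NULL, NULL);
}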
+/*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
Should a) mention why we have to check regularly on windows b) that on
linux we don't have to because we send a cancel event from the signal
handler.
+static int
+select_loop(int maxFd, fd_set *workerset)
+{
+    int            i;
+    fd_set        saveSet = *workerset;
+
+#ifdef WIN32
+    /* should always be the master */
Hm?
I have to say, this is a fairly large patch for such a minor feature...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jan 4, 2015 at 10:57 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I have to say, this is a fairly large patch for such a minor feature...
Andres, this patch needs more effort from the author, right? So
marking it as returned with feedback.
--
Michael
Michael Paquier wrote:
Andres, this patch needs more effort from the author, right? So
marking it as returned with feedback.
I will give this patch a look in the current commitfest; if you can
please set it as 'needs review' instead, with me as reviewer, so that I
don't forget, I would appreciate it.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 16, 2015 at 12:53 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Michael Paquier wrote:
Andres, this patch needs more effort from the author, right? So
marking it as returned with feedback.
I will give this patch a look in the current commitfest; if you can
please set it as 'needs review' instead, with me as reviewer, so that I
don't forget, I would appreciate it.
Fine for me, done this way.
--
Michael
On 04 January 2015 07:27, Andres Freund Wrote,
On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
+    <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+    <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+    <listitem>
+     <para>
+      Number of concurrent connections to perform the operation.
+      This option will enable the vacuum operation to run on asynchronous
+      connections, at a time one table will be operated on one connection.
+      So at one time as many tables will be vacuumed parallely as number of
+      jobs. If number of jobs given are more than number of tables then
+      number of jobs will be set to number of tables.
"asynchronous connections" isn't a very well defined term. Also, the
second part of that sentence doesn't seem to be gramattically correct.
I have changed this to "concurrent connections"; is this OK?
+     </para>
+     <para>
+      <application>vacuumdb</application> will open
+      <replaceable class="parameter"> njobs</replaceable> connections to the
+      database, so make sure your <xref linkend="guc-max-connections">
+      setting is high enough to accommodate all connections.
+     </para>
Isn't it njobs+1?
The main connection, which we use for getting the table information, is reused as the first slot's connection, so the total number of connections is still njobs.
@@ -141,6 +199,7 @@ main(int argc, char *argv[])
}
 }
+ optind++;
Hm, where's that coming from?
This is wrong, I have removed it.
+    PQsetnonblocking(connSlot[0].connection, 1);
+
+    for (i = 1; i < concurrentCons; i++)
+    {
+        connSlot[i].connection = connectDatabase(dbname, host, port, username,
+                        prompt_password, progname, false);
+
+        PQsetnonblocking(connSlot[i].connection, 1);
+        connSlot[i].isFree = true;
+        connSlot[i].sock = PQsocket(connSlot[i].connection);
+    }
Are you sure about this global PQsetnonblocking()? This means that you
might not be able to send queries... And you don't seem to be waiting
for sockets waiting for writes in the select loop - which means you
might end up being stuck waiting for reads when you haven't submitted
the query.
I think you might need a more complex select() loop. On nonfree
connections also wait for writes if PQflush() returns != 0.
1. In GetIdleSlot we make sure that we wait only if a connection is busy, i.e. only if we have already sent a query on that connection.
2. When all the connections are busy, we select() on all the FDs and wait for a response on any of the connections. When select() returns,
we consume the input and check whether the connection has become idle or it was just an intermediate response; if it is no longer busy, we process all its results and mark the slot as free.
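A compact sketch of the check described in point 2 (slot_is_done is a hypothetical name, not a function from the patch):

#include <stdbool.h>
#include "libpq-fe.h"

/*
 * After select() reports activity, absorb whatever arrived and test
 * whether the connection now has a complete result to hand back.
 */
static bool
slot_is_done(PGconn *conn)
{
    if (!PQconsumeInput(conn))  /* returns 0 on error */
        return true;            /* report "done"; the caller will see the
                                 * failure when it calls PQgetResult() */
    return !PQisBusy(conn);     /* still assembling a result => busy */
}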
+/*
+ * GetIdleSlot
+ *    Process the slot list; if any free slot is available then return
+ *    its id, else perform a select() on all the sockets and wait
+ *    until at least one slot becomes available.
+ */
+static int
+GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+            const char *progname, bool completedb)
+{
+    int            i;
+    fd_set        slotset;
Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.
I will change this in the next patch.
+    int            firstFree = -1;
+    pgsocket    maxFd;
+
+    for (i = 0; i < max_slot; i++)
+        if (pSlot[i].isFree)
+            return i;
+
+    FD_ZERO(&slotset);
+
+    maxFd = pSlot[0].sock;
+
+    for (i = 0; i < max_slot; i++)
+    {
+        FD_SET(pSlot[i].sock, &slotset);
+        if (pSlot[i].sock > maxFd)
+            maxFd = pSlot[i].sock;
+    }
So we're waiting for idle connections?
I think you'll have to use two fdsets here, and set the write
set based on PQflush() != 0.
I did not get this?
The logic here is that we wait, using select() on all the FDs, for any connection to respond.
When select() returns, we check which sockets are no longer busy and mark all the finished connections as idle at once.
If none of the connections is free, we go back to select(); otherwise we return the first idle connection.
+/*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
Should a) mention why we have to check regularly on windows b) that on
linux we don't have to because we send a cancel event from the signal
handler.
I have added the comments.
+static int
+select_loop(int maxFd, fd_set *workerset)
+{
+    int            i;
+    fd_set        saveSet = *workerset;
+
+#ifdef WIN32
+    /* should always be the master */
Hm?
I have to say, this is a fairly large patch for such a minor feature...
Attachments:
vacuumdb_parallel_v22.patchapplication/octet-stream; name=vacuumdb_parallel_v22.patchDownload
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 204,209 **** PostgreSQL documentation
--- 204,228 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+      This option will enable the vacuum operation to run on concurrent
+      connections. The maximum number of tables that can be vacuumed
+      concurrently is equal to the number of jobs. If the number of jobs
+      given is more than the number of tables then the number of jobs
+      will be set to the number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</application> will open
+ <replaceable class="parameter">njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections">
+ setting is high enough to accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
*** a/src/bin/scripts/common.c
--- b/src/bin/scripts/common.c
***************
*** 19,28 ****
#include "common.h"
- static void SetCancelConn(PGconn *conn);
- static void ResetCancelConn(void);
static PGcancel *volatile cancelConn = NULL;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
--- 19,27 ----
#include "common.h"
static PGcancel *volatile cancelConn = NULL;
+ static bool inAbort = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
***************
*** 291,297 **** yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
! static void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
--- 290,296 ----
*
* Set cancelConn to point to the current database connection.
*/
! void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
***************
*** 321,327 **** SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
! static void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
--- 320,326 ----
*
* Free the current cancel connection, if any, and set to NULL.
*/
! void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
***************
*** 359,368 **** handle_sigint(SIGNAL_ARGS)
--- 358,372 ----
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
+ inAbort = true;
fprintf(stderr, _("Cancel request sent\n"));
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ inAbort = true;
errno = save_errno; /* just in case the write changed it */
}
***************
*** 392,401 **** consoleHandler(DWORD dwCtrlType)
--- 396,411 ----
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
fprintf(stderr, _("Cancel request sent\n"));
+ inAbort = true;
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ inAbort = true;
+
LeaveCriticalSection(&cancelConnLock);
return TRUE;
***************
*** 414,416 **** setup_cancel_handler(void)
--- 424,431 ----
}
#endif /* WIN32 */
+
+ bool in_abort()
+ {
+ return inAbort;
+ }
*** a/src/bin/scripts/common.h
--- b/src/bin/scripts/common.h
***************
*** 49,52 **** extern bool yesno_prompt(const char *question);
--- 49,57 ----
extern void setup_cancel_handler(void);
+ extern void SetCancelConn(PGconn *conn);
+ extern void ResetCancelConn(void);
+ extern bool in_abort(void);
+
+
#endif /* COMMON_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 14,19 ****
--- 14,31 ----
#include "common.h"
#include "dumputils.h"
+ #define NO_SLOT (-1)
+
+ /* Arguments needed for a worker process */
+ typedef struct ParallelSlot
+ {
+ PGconn *connection;
+ bool isFree;
+ pgsocket sock;
+ } ParallelSlot;
+
+ #define ERRCODE_UNDEFINED_TABLE "42P01"
+ #define ERROR_IN_ABORT -2
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
***************
*** 25,34 **** static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 37,76 ----
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet,
! int concurrentCons);
static void help(const char *progname);
+ void vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only,
+ bool analyze_in_stages, int stage, bool freeze,
+ const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet);
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql);
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb);
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb);
+
+ static int
+ select_loop(int maxFd, fd_set *workerset);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 91,97 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 74,86 **** main(int argc, char *argv[])
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 117,131 ----
bool full = false;
bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 121,134 **** main(int argc, char *argv[])
--- 166,192 ----
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
full = true;
break;
case 'v':
verbose = true;
break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
+ progname);
+ exit(1);
+ }
+
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
***************
*** 141,147 **** main(int argc, char *argv[])
}
}
-
/*
* Non-option argument specifies database name as long as it wasn't
* already specified with -d / --dbname
--- 199,204 ----
***************
*** 179,184 **** main(int argc, char *argv[])
--- 236,245 ----
setup_cancel_handler();
+ /* Avoid opening extra connections. */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
***************
*** 196,202 **** main(int argc, char *argv[])
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 257,263 ----
vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet, concurrentCons);
}
else
{
***************
*** 210,234 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
{
! SimpleStringListCell *cell;
! for (cell = tables.head; cell; cell = cell->next)
{
vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
! freeze, cell->val,
host, port, username, prompt_password,
progname, echo, quiet);
- }
}
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo, quiet);
}
exit(0);
--- 271,306 ----
dbname = get_user_name_or_exit(progname);
}
! if (concurrentCons > 1)
{
! vacuum_parallel(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, host, port, username, prompt_password,
! progname, echo, concurrentCons, &tables, quiet);
! }
! else
! {
! if (tables.head != NULL)
{
+ SimpleStringListCell *cell;
+
+ for (cell = tables.head; cell; cell = cell->next)
+ {
+ vacuum_one_database(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, -1,
+ freeze, cell->val,
+ host, port, username, prompt_password,
+ progname, echo, quiet);
+ }
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze,
analyze_only, analyze_in_stages, -1,
! freeze, NULL,
host, port, username, prompt_password,
progname, echo, quiet);
}
}
exit(0);
***************
*** 268,323 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! if (analyze_only)
! {
! appendPQExpBufferStr(&sql, "ANALYZE");
! if (verbose)
! appendPQExpBufferStr(&sql, " VERBOSE");
! }
! else
! {
! appendPQExpBufferStr(&sql, "VACUUM");
! if (PQserverVersion(conn) >= 90000)
! {
! const char *paren = " (";
! const char *comma = ", ";
! const char *sep = paren;
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (and_analyze)
- {
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
- }
- }
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
--- 340,348 ----
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! prepare_command(conn, full, verbose,
! and_analyze, analyze_only, freeze, &sql);
if (table)
appendPQExpBuffer(&sql, " %s", table);
appendPQExpBufferStr(&sql, ";");
***************
*** 353,360 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
}
else
{
! /* Otherwise, we got a stage from vacuum_all_databases(), so run
! * only that one. */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
--- 378,387 ----
}
else
{
! /*
! * Otherwise, we got a stage from vacuum_all_databases(), so run
! * only that one.
! */
if (!quiet)
{
puts(gettext(stage_messages[stage]));
***************
*** 374,384 **** vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyz
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
--- 401,412 ----
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password, const char *progname,
! bool echo, bool quiet, int concurrentCons)
{
PGconn *conn;
PGresult *result;
***************
*** 390,396 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
PQfinish(conn);
/* If analyzing in stages, then run through all stages. Otherwise just
! * run once, passing -1 as the stage. */
for (stage = (analyze_in_stages ? 0 : -1);
stage < (analyze_in_stages ? 3 : 0);
stage++)
--- 418,425 ----
PQfinish(conn);
/* If analyzing in stages, then run through all stages. Otherwise just
! * run once, passing -1 as the stage.
! */
for (stage = (analyze_in_stages ? 0 : -1);
stage < (analyze_in_stages ? 3 : 0);
stage++)
***************
*** 407,412 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 436,450 ----
fflush(stdout);
}
+ if (concurrentCons > 1)
+ {
+ vacuum_parallel(dbname, full, verbose, and_analyze,
+ analyze_only, analyze_in_stages, stage,
+ freeze, host, port, username, prompt_password,
+ progname, echo, concurrentCons, NULL, quiet);
+
+ }
+ else
vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
analyze_in_stages, stage,
freeze, NULL, host, port, username, prompt_password,
***************
*** 417,422 **** vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_onl
--- 455,967 ----
PQclear(result);
}
+ /*
+ * run_parallel_vacuum
+ * This function does the actual work for sending the jobs
+ * concurrently to server.
+ */
+ static void
+ run_parallel_vacuum(bool echo, const char *dbname, SimpleStringList *tables,
+ bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, int concurrentCons,
+ const char *progname, int analyze_stage,
+ ParallelSlot *connSlot, bool completedb)
+ {
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int max_slot = concurrentCons;
+ int i;
+ int free_slot;
+ PGconn *slotconn;
+ bool error = false;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"};
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ executeCommand(connSlot[i].connection,
+ stage_commands[analyze_stage], progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ if (in_abort())
+ {
+ error = true;
+ goto fail;
+ }
+
+ /*
+ * This will give a free connection slot; if no slot is free it will
+ * wait until at least one slot gets free.
+ */
+ free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
+ completedb);
+ if (free_slot == NO_SLOT)
+ {
+ error = true;
+ goto fail;
+ }
+
+ prepare_command(connSlot[free_slot].connection, full, verbose,
+ and_analyze, analyze_only, freeze, &sql);
+
+ appendPQExpBuffer(&sql, " %s", cell->val);
+ appendPQExpBufferStr(&sql, ";");
+
+ connSlot[free_slot].isFree = false;
+
+ slotconn = connSlot[free_slot].connection;
+ PQsendQuery(slotconn, sql.data);
+
+ resetPQExpBuffer(&sql);
+ }
+
+ for (i = 0; i < max_slot; i++)
+ {
+ /* wait for all connections to return their results */
+ if (!GetQueryResult(connSlot[i].connection, dbname, progname,
+ completedb))
+ {
+ error = true;
+ goto fail;
+ }
+
+ connSlot[i].isFree = true;
+ }
+
+ fail:
+
+ termPQExpBuffer(&sql);
+
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }
+
+ pfree(connSlot);
+
+ exit(1);
+ }
+ }
+
+ /*
+ * GetIdleSlot
+ * Process the slot list; if any free slot is available then return
+ * the slot id, else perform select() on all the sockets and wait
+ * until at least one slot becomes available.
+ */
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ int i;
+ fd_set slotset;
+ int firstFree = -1;
+ pgsocket maxFd;
+
+ for (i = 0; i < max_slot; i++)
+ if (pSlot[i].isFree)
+ return i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = pSlot[0].sock;
+
+ for (i = 0; i < max_slot; i++)
+ {
+ FD_SET(pSlot[i].sock, &slotset);
+ if (pSlot[i].sock > maxFd)
+ maxFd = pSlot[i].sock;
+ }
+
+ /*
+ * No free slot found, so wait until one of the connections
+ * has finished its task and return the available slot.
+ */
+ do
+ {
+ SetCancelConn(pSlot[0].connection);
+
+ i = select_loop(maxFd, &slotset);
+
+ ResetCancelConn();
+
+ if (i == ERROR_IN_ABORT)
+ {
+ /*
+ * This can only happen if the user has sent a cancel request using
+ * Ctrl+C; the cancel is handled by the 0th slot, so fetch the error result.
+ */
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
+ return NO_SLOT;
+ }
+
+ Assert(i != 0);
+
+ for (i = 0; i < max_slot; i++)
+ {
+ if (!FD_ISSET(pSlot[i].sock, &slotset))
+ continue;
+
+ PQconsumeInput(pSlot[i].connection);
+ if (PQisBusy(pSlot[i].connection))
+ continue;
+
+ pSlot[i].isFree = true;
+
+ if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+ completedb))
+ return NO_SLOT;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+	} while (firstFree < 0);
+
+ return firstFree;
+ }
+
+ /*
+ * GetQueryResult
+ * Process the query result.
+ */
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname, bool completedb)
+ {
+ PGresult *result = NULL;
+ PGresult *lastResult = NULL;
+ bool r;
+
+
+ SetCancelConn(conn);
+ while((result = PQgetResult(conn)) != NULL)
+ {
+ PQclear(lastResult);
+ lastResult = result;
+ }
+
+ ResetCancelConn();
+
+ if (!lastResult)
+ return true;
+
+ r = (PQresultStatus(lastResult) == PGRES_COMMAND_OK);
+
+ /*
+ * If the user asked to vacuum the complete db and the vacuum of any
+ * one object fails, the failure can be ignored and vacuuming of the
+ * other objects can continue; this is the same behavior as when a
+ * complete-db vacuum is run without the --jobs option
+ */
+ if (!r)
+ {
+ char *sqlState = PQresultErrorField(lastResult, PG_DIAG_SQLSTATE);
+
+ if (!completedb ||
+ (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0))
+ {
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ PQclear(lastResult);
+ return false;
+ }
+ }
+
+ PQclear(lastResult);
+ return true;
+ }
+
+ /*
+ * vacuum_parallel
+ * This function opens multiple connections to vacuum tables
+ * concurrently. In case the vacuum needs to be performed on the whole
+ * database, it retrieves the list of tables and then performs the
+ * vacuum.
+ */
+ void
+ vacuum_parallel(const char *dbname, bool full, bool verbose,
+ bool and_analyze, bool analyze_only, bool analyze_in_stages,
+ int stage, bool freeze, const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ const char *progname, bool echo, int concurrentCons,
+ SimpleStringList *tables, bool quiet)
+ {
+
+ PGconn *conn;
+ int i;
+ ParallelSlot *connSlot;
+ SimpleStringList dbtables = {NULL, NULL};
+ bool completeDb = false;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * If a table list is not provided then we need to vacuum the whole DB;
+ * get the list of all tables and prepare the list.
+ */
+ if (!tables || !tables->head)
+ {
+ PGresult *res;
+ int ntuple;
+ int i;
+ PQExpBufferData sql;
+
+ initPQExpBuffer(&sql);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"
+ " WHERE (relkind = \'r\' or relkind = \'m\')"
+ " and c.relnamespace = ns.oid ORDER BY c.relpages desc",
+ progname, echo);
+
+ ntuple = PQntuples(res);
+ for (i = 0; i < ntuple; i++)
+ {
+ appendPQExpBuffer(&sql, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, sql.data);
+ resetPQExpBuffer(&sql);
+ }
+
+ termPQExpBuffer(&sql);
+ tables = &dbtables;
+
+ /* remember that we are vacuuming the full database. */
+ completeDb = true;
+
+ if (concurrentCons > ntuple)
+ concurrentCons = ntuple;
+ }
+
+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
+ connSlot[0].connection = conn;
+ connSlot[0].sock = PQsocket(conn);
+
+ PQsetnonblocking(connSlot[0].connection, 1);
+
+ for (i = 1; i < concurrentCons; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
+
+ if (analyze_in_stages)
+ {
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ if (stage == -1)
+ {
+ int i;
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[i]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, i, connSlot,
+ completeDb);
+ }
+ }
+ else
+ {
+ /*
+ * Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
+ if (!quiet)
+ {
+ puts(gettext(stage_messages[stage]));
+ fflush(stdout);
+ }
+
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, stage,
+ connSlot, completeDb);
+ }
+ }
+ else
+ run_parallel_vacuum(echo, dbname, tables, full, verbose,
+ and_analyze, analyze_only, freeze,
+ concurrentCons, progname, -1, connSlot,
+ completeDb);
+
+ for (i = 0; i < concurrentCons; i++)
+ PQfinish(connSlot[i].connection);
+
+ pfree(connSlot);
+ }
+
+ /*
+ * A select loop that repeats calling select until a descriptor in the read
+ * set becomes readable. On Windows we have to check for the termination event
+ * from time to time, on Unix we can just block forever.
+ */
+ static int
+ select_loop(int maxFd, fd_set *workerset)
+ {
+ int i;
+ fd_set saveSet = *workerset;
+
+ #ifdef WIN32
+ /* should always be the master */
+ for (;;)
+ {
+ /*
+ * Sleep a quarter of a second before checking whether we should
+ * terminate; we need to check regularly whether the user has
+ * requested termination using Ctrl+C.
+ */
+ struct timeval tv = {0, 250000};
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, &tv);
+ if (in_abort())
+ {
+ return ERROR_IN_ABORT;
+ }
+
+ if (i == SOCKET_ERROR && WSAGetLastError() == WSAEINTR)
+ continue;
+ if (i)
+ break;
+ }
+ #else /* UNIX */
+
+ for (;;)
+ {
+ /*
+ * On Linux, sleep forever, because the cancel event is handled by the
+ * signal handler.
+ */
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, NULL);
+ if (in_abort())
+ {
+	return ERROR_IN_ABORT;
+ }
+
+ if (i < 0 && errno == EINTR)
+ continue;
+ break;
+ }
+ #endif
+
+ return i;
+ }
+
+ /*
+ * DisconnectDatabase
+ * Disconnect the connection associated with the given slot.
+ */
+ void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ PGcancel *cancel;
+ char errbuf[1];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+	slot->connection = NULL;
+ }
+
+
+
+ void prepare_command(PGconn *conn, bool full, bool verbose, bool and_analyze,
+ bool analyze_only, bool freeze, PQExpBuffer sql)
+ {
+ if (analyze_only)
+ {
+ appendPQExpBufferStr(sql, "ANALYZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBufferStr(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBufferStr(sql, ")");
+ }
+ else
+ {
+ if (full)
+ appendPQExpBufferStr(sql, " FULL");
+ if (freeze)
+ appendPQExpBufferStr(sql, " FREEZE");
+ if (verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ if (and_analyze)
+ appendPQExpBufferStr(sql, " ANALYZE");
+ }
+ }
+ }
+
static void
help(const char *progname)
***************
*** 436,441 **** help(const char *progname)
--- 981,987 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
I didn't understand the coding in GetQueryResult(); why do we check the
result status of the last returned result only? It seems simpler to me
to check it inside the loop, but maybe there's a reason you didn't do it
like that?
Also, what is the reason we were ignoring those errors only in
"completedb" mode? It doesn't seem like it would cause any harm if we
did it in all cases. That means we can just not have the "completeDb"
flag at all.
Finally, I think it's better to report the "missing relation" error,
even if we're going to return true to continue processing other tables.
That makes the situation clearer to the user.
So the function would end up looking like this:
/*
* GetQueryResult
*
* Process the query result. Returns true if there's no error, false
* otherwise -- but errors about trying to vacuum a missing relation are
* ignored.
*/
static bool
GetQueryResult(PGconn *conn, errorOptions *erropts)
{
PGresult *result = NULL;
SetCancelConn(conn);
while ((result = PQgetResult(conn)) != NULL)
{
/*
* If errors are found, report them. Errors about a missing table are
* harmless so we continue processing, but die for other errors.
*/
if (PQresultStatus(result) != PGRES_COMMAND_OK)
{
char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
erropts->progname, erropts->dbname, PQerrorMessage(conn));
if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
{
PQclear(result);
return false;
}
}
PQclear(result);
}
ResetCancelConn();
return true;
}
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 21, 2015 at 8:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
I didn't understand the coding in GetQueryResult(); why do we check the
result status of the last returned result only? It seems simpler to me
to check it inside the loop, but maybe there's a reason you didn't do it
like that?
Also, what is the reason we were ignoring those errors only in
"completedb" mode? It doesn't seem like it would cause any harm if we
did it in all cases. That means we can just not have the "completeDb"
flag at all.
IIRC it is done to match the existing behaviour, where such errors are
ignored when we use this utility to vacuum a database.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Amit Kapila wrote:
On Wed, Jan 21, 2015 at 8:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
I didn't understand the coding in GetQueryResult(); why do we check the
result status of the last returned result only? It seems simpler to me
to check it inside the loop, but maybe there's a reason you didn't do it
like that?
Also, what is the reason we were ignoring those errors only in
"completedb" mode? It doesn't seem like it would cause any harm if we
did it in all cases. That means we can just not have the "completeDb"
flag at all.
IIRC it is done to match the existing behaviour, where such errors are
ignored when we use this utility to vacuum a database.
I think that's fine, but we should do it always, not just in
whole-database mode.
I've been hacking this patch today BTW; hope to have something to show
tomorrow.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jan 22, 2015 at 8:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Amit Kapila wrote:
On Wed, Jan 21, 2015 at 8:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
I didn't understand the coding in GetQueryResult(); why do we check the
result status of the last returned result only? It seems simpler to me
to check it inside the loop, but maybe there's a reason you didn't do it
like that?
Also, what is the reason we were ignoring those errors only in
"completedb" mode? It doesn't seem like it would cause any harm if we
did it in all cases. That means we can just not have the "completeDb"
flag at all.
IIRC it is done to match the existing behaviour, where such errors are
ignored when we use this utility to vacuum a database.
I think that's fine, but we should do it always, not just in
whole-database mode.
I've been hacking this patch today BTW; hope to have something to show
tomorrow.
Great!
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Here's v23.
I reworked a number of things. First, I changed trivial stuff like
grouping all the vacuuming options in a struct, to avoid passing an
excessive number of arguments to functions. full, freeze, analyze_only,
and_analyze and verbose are all in a single struct now. Also, the
stage_commands and stage_messages was duplicated by your patch; I moved
them to a file-level static struct.
I made prepare_command reset the string buffer and receive an optional
table name, so that it can append it to the generated command, and
append the semicolon as well. Forcing the callers to reset the string
before calling, and having them add the table name and semicolon
afterwards was awkward and unnecessarily verbose.
You had a new in_abort() function in common.c which seems an unnecessary
layer; in its place I just exported the inAbort boolean flag it was
returning, and renamed to CancelRequested.
I was then troubled by the fact that vacuum_one_database() was being
called in a loop by main() when multiple tables are vacuumed, but
vacuum_parallel() was doing the loop internally. I found this
discrepancy confusing, so I renamed that new function to
vacuum_one_database_parallel and modified the original
vacuum_one_database to do the loop internally as well. Now they are, in
essence, a mirror of each other, one doing the parallel stuff and one
doing it serially. This seems to make more sense to me -- but see
below.
I also modified some underlying stuff like GetIdleSlot returning a
ParallelSlot pointer instead of an array index. Since its caller always
has to dereference the array with the given index, it makes more sense
to return the right element pointer instead, so I made it do that.
Also, that way, instead of returning NO_SLOT in case of error it can
just return NULL; no need for extra cognitive burden.
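The caller in run_parallel_vacuum() now reads like this (excerpt from the
attached patch):

	free_slot = GetIdleSlot(slots, concurrentCons, dbname, progname);
	if (!free_slot)		/* NULL instead of NO_SLOT signals an error */
		goto fail;

	free_slot->isFree = false;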
I also changed select_loop. In your patch it had two implementations,
one WIN32 and another one for the rest. It looks nicer to me to have
only one with small exceptions in the places that need it. (I haven't
tested the WIN32 path.) Also, instead of returning ERROR_IN_ABORT I
made it set a boolean flag in case of error, which seems cleaner.
I changed GetQueryResult as I described in a previous message.
There are two things that continue to bother me and I would like you,
dear patch author, to change them before committing this patch:
1. I don't like having vacuum_one_database() and a separate
vacuum_one_database_parallel(). I think we should merge them into one
function, which does either thing according to parameters. There's
plenty in there that's duplicated.
2. In particular, the above means that run_parallel_vacuum can no longer
exist as it is. Right now vacuum_one_database_parallel relies on
run_parallel_vacuum to do the actual job parallellization. I would like
to have that looping in the improved vacuum_one_database() function
instead.
Looking forward to v24,
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
vacuumdb_parallel_v23.patchtext/x-diff; charset=us-asciiDownload
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 3ecd999..211235a 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -204,6 +204,25 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+ This option causes the vacuum operation to run on concurrent
+ connections. The maximum number of tables vacuumed concurrently is
+ equal to the number of jobs. If the number of jobs given is more than
+ the number of tables, the number of jobs is reduced to the number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</application> will open
+ <replaceable class="parameter">njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections">
+ setting is high enough to accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index d942a75..1bf7611 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -1160,7 +1160,7 @@ select_loop(int maxFd, fd_set *workerset)
i = select(maxFd + 1, workerset, NULL, NULL, NULL);
/*
- * If we Ctrl-C the master process , it's likely that we interrupt
+ * If we Ctrl-C the master process, it's likely that we interrupt
* select() here. The signal handler will set wantAbort == true and
* the shutdown journey starts from here. Note that we'll come back
* here later when we tell all workers to terminate and read their
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 6bfe2e6..da142aa 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -19,10 +19,9 @@
#include "common.h"
-static void SetCancelConn(PGconn *conn);
-static void ResetCancelConn(void);
static PGcancel *volatile cancelConn = NULL;
+bool CancelRequested = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
@@ -291,7 +290,7 @@ yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
-static void
+void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
@@ -321,7 +320,7 @@ SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
-static void
+void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
@@ -345,9 +344,8 @@ ResetCancelConn(void)
#ifndef WIN32
/*
- * Handle interrupt signals by canceling the current command,
- * if it's being executed through executeMaintenanceCommand(),
- * and thus has a cancelConn set.
+ * Handle interrupt signals by canceling the current command, if a cancelConn
+ * is set.
*/
static void
handle_sigint(SIGNAL_ARGS)
@@ -359,10 +357,15 @@ handle_sigint(SIGNAL_ARGS)
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
+ CancelRequested = true;
fprintf(stderr, _("Cancel request sent\n"));
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ CancelRequested = true;
errno = save_errno; /* just in case the write changed it */
}
@@ -392,10 +395,16 @@ consoleHandler(DWORD dwCtrlType)
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
fprintf(stderr, _("Cancel request sent\n"));
+ CancelRequested = true;
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ CancelRequested = true;
+
LeaveCriticalSection(&cancelConnLock);
return TRUE;
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index c0c1715..b5ce1ed 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -21,6 +21,8 @@ enum trivalue
TRI_YES
};
+extern bool CancelRequested;
+
typedef void (*help_handler) (const char *progname);
extern void handle_help_version_opts(int argc, char *argv[],
@@ -49,4 +51,8 @@ extern bool yesno_prompt(const char *question);
extern void setup_cancel_handler(void);
+extern void SetCancelConn(PGconn *conn);
+extern void ResetCancelConn(void);
+
+
#endif /* COMMON_H */
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 957fdb6..89af9d5 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -11,24 +11,108 @@
*/
#include "postgres_fe.h"
+
#include "common.h"
#include "dumputils.h"
-static void vacuum_one_database(const char *dbname, bool full, bool verbose,
- bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
- const char *table, const char *host, const char *port,
+#define ERRCODE_UNDEFINED_TABLE "42P01"
+
+/* Parallel vacuuming stuff */
+typedef struct ParallelSlot
+{
+ PGconn *connection;
+ pgsocket sock;
+ bool isFree;
+} ParallelSlot;
+
+/* vacuum options controlled by user flags */
+typedef struct vacuumingOptions
+{
+ bool analyze_only;
+ bool verbose;
+ bool and_analyze;
+ bool full;
+ bool freeze;
+} vacuumingOptions;
+
+
+static void vacuum_one_database(const char *dbname, vacuumingOptions *vacopts,
+ bool analyze_in_stages, int stage,
+ SimpleStringList *tables,
+ const char *host, const char *port,
const char *username, enum trivalue prompt_password,
const char *progname, bool echo, bool quiet);
-static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
- bool analyze_only, bool analyze_in_stages, bool freeze,
+static void vacuum_one_database_parallel(const char *dbname, vacuumingOptions *vacopts,
+ bool analyze_in_stages, int stage,
+ SimpleStringList *tables,
+ const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ int concurrentCons,
+ const char *progname, bool echo, bool quiet);
+static void vacuum_all_databases(vacuumingOptions *vacopts,
+ bool analyze_in_stages,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
+ int concurrentCons,
const char *progname, bool echo, bool quiet);
+static void vacuum_database_stage(const char *dbname, vacuumingOptions *vacopts,
+ bool analyze_in_stages, int stage,
+ SimpleStringList *tables,
+ const char *host, const char *port, const char *username,
+ enum trivalue prompt_password,
+ int concurrentCons,
+ const char *progname, bool echo, bool quiet);
static void help(const char *progname);
+static void prepare_command(PQExpBuffer sql, PGconn *conn,
+ vacuumingOptions *vacopts, const char *table);
+static void run_parallel_vacuum(bool echo,
+ SimpleStringList *tables, vacuumingOptions *vacopts,
+ int concurrentCons, int analyze_stage,
+ ParallelSlot slots[],
+ const char *dbname, const char *progname);
+static ParallelSlot *GetIdleSlot(ParallelSlot slots[], int numslots,
+ const char *dbname, const char *progname);
+
+static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname);
+
+static int select_loop(int maxFd, fd_set *workerset, bool *aborting);
+
+static void DisconnectDatabase(ParallelSlot *slot);
+
+
+/*
+ * Preparatory commands and corresponding user-visible message for the
+ * analyze-in-stages feature. Note the ANALYZE command itself must be sent
+ * separately.
+ */
+static const struct
+{
+ const char *prepcmd;
+ const char *message;
+}
+staged_analyze[3] =
+{
+ {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ gettext_noop("Generating minimal optimizer statistics (1 target)")
+ },
+ {
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ gettext_noop("Generating medium optimizer statistics (10 targets)")
+ },
+ {
+ "RESET default_statistics_target;",
+ gettext_noop("Generating default (full) optimizer statistics")
+ }
+};
+
+#define ANALYZE_ALL_STAGES -1
+
int
main(int argc, char *argv[])
@@ -49,6 +133,7 @@ main(int argc, char *argv[])
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
@@ -57,7 +142,6 @@ main(int argc, char *argv[])
const char *progname;
int optindex;
int c;
-
const char *dbname = NULL;
const char *maintenance_db = NULL;
char *host = NULL;
@@ -66,21 +150,23 @@ main(int argc, char *argv[])
enum trivalue prompt_password = TRI_DEFAULT;
bool echo = false;
bool quiet = false;
- bool and_analyze = false;
- bool analyze_only = false;
+ vacuumingOptions vacopts;
bool analyze_in_stages = false;
- bool freeze = false;
bool alldb = false;
- bool full = false;
- bool verbose = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
+
+ /* initialize options to all false */
+ memset(&vacopts, 0, sizeof(vacopts));
progname = get_progname(argv[0]);
+
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
- while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
@@ -109,31 +195,43 @@ main(int argc, char *argv[])
dbname = pg_strdup(optarg);
break;
case 'z':
- and_analyze = true;
+ vacopts.and_analyze = true;
break;
case 'Z':
- analyze_only = true;
+ vacopts.analyze_only = true;
break;
case 'F':
- freeze = true;
+ vacopts.freeze = true;
break;
case 'a':
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
- full = true;
+ vacopts.full = true;
break;
case 'v':
- verbose = true;
+ vacopts.verbose = true;
+ break;
+ case 'j':
+ concurrentCons = atoi(optarg);
+ if (concurrentCons <= 0)
+ {
+ fprintf(stderr, _("%s: number of parallel \"jobs\" must be at least 1\n"),
+ progname);
+ exit(1);
+ }
break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
case 3:
- analyze_in_stages = analyze_only = true;
+ analyze_in_stages = vacopts.analyze_only = true;
break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
@@ -141,7 +239,6 @@ main(int argc, char *argv[])
}
}
-
/*
* Non-option argument specifies database name as long as it wasn't
* already specified with -d / --dbname
@@ -160,18 +257,18 @@ main(int argc, char *argv[])
exit(1);
}
- if (analyze_only)
+ if (vacopts.analyze_only)
{
- if (full)
+ if (vacopts.full)
{
- fprintf(stderr, _("%s: cannot use the \"full\" option when performing only analyze\n"),
- progname);
+ fprintf(stderr, _("%s: cannot use the \"%s\" option when performing only analyze\n"),
+ progname, "full");
exit(1);
}
- if (freeze)
+ if (vacopts.freeze)
{
- fprintf(stderr, _("%s: cannot use the \"freeze\" option when performing only analyze\n"),
- progname);
+ fprintf(stderr, _("%s: cannot use the \"%s\" option when performing only analyze\n"),
+ progname, "freeze");
exit(1);
}
/* allow 'and_analyze' with 'analyze_only' */
@@ -179,6 +276,10 @@ main(int argc, char *argv[])
setup_cancel_handler();
+ /* Avoid opening extra connections. */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
@@ -194,9 +295,12 @@ main(int argc, char *argv[])
exit(1);
}
- vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
- maintenance_db, host, port, username,
- prompt_password, progname, echo, quiet);
+ vacuum_all_databases(&vacopts,
+ analyze_in_stages,
+ maintenance_db,
+ host, port, username, prompt_password,
+ concurrentCons,
+ progname, echo, quiet);
}
else
{
@@ -210,35 +314,35 @@ main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
- if (tables.head != NULL)
- {
- SimpleStringListCell *cell;
-
- for (cell = tables.head; cell; cell = cell->next)
- {
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, cell->val,
- host, port, username, prompt_password,
- progname, echo, quiet);
- }
- }
- else
- vacuum_one_database(dbname, full, verbose, and_analyze,
- analyze_only, analyze_in_stages, -1,
- freeze, NULL,
- host, port, username, prompt_password,
- progname, echo, quiet);
+ vacuum_database_stage(dbname, &vacopts,
+ analyze_in_stages, ANALYZE_ALL_STAGES,
+ &tables,
+ host, port, username, prompt_password,
+ concurrentCons,
+ progname, echo, quiet);
}
exit(0);
}
-
+/*
+ * Execute a vacuum/analyze command to the server.
+ *
+ * Result status is checked only if 'async' is false.
+ */
static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname, const char *table, const char *progname)
+run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+ const char *dbname, const char *table,
+ const char *progname, bool async)
{
- if (!executeMaintenanceCommand(conn, sql, echo))
+ if (async)
+ {
+ if (echo)
+ printf("%s\n", sql);
+
+ PQsendQuery(conn, sql);
+ }
+ else if (!executeMaintenanceCommand(conn, sql, echo))
{
if (table)
fprintf(stderr, _("%s: vacuuming of table \"%s\" in database \"%s\" failed: %s"),
@@ -251,172 +355,648 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
}
}
-
+/*
+ * vacuum_one_database
+ *
+ * Process tables in the given database. If the 'tables' list is empty,
+ * process all tables in the database. Note there is no parallelization here.
+ */
static void
-vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
- bool analyze_only, bool analyze_in_stages, int stage, bool freeze, const char *table,
+vacuum_one_database(const char *dbname, vacuumingOptions *vacopts,
+ bool analyze_in_stages, int stage,
+ SimpleStringList *tables,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
const char *progname, bool echo, bool quiet)
{
PQExpBufferData sql;
-
PGconn *conn;
-
- initPQExpBuffer(&sql);
+ SimpleStringListCell *cell;
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
- if (analyze_only)
+ initPQExpBuffer(&sql);
+
+ cell = tables ? tables->head : NULL;
+ do
{
- appendPQExpBufferStr(&sql, "ANALYZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- }
- else
- {
- appendPQExpBufferStr(&sql, "VACUUM");
- if (PQserverVersion(conn) >= 90000)
+ const char *tabname;
+
+ tabname = cell ? cell->val : NULL;
+ prepare_command(&sql, conn, vacopts, tabname);
+
+ if (analyze_in_stages)
{
- const char *paren = " (";
- const char *comma = ", ";
- const char *sep = paren;
+ if (stage == ANALYZE_ALL_STAGES)
+ {
+ int i;
- if (full)
- {
- appendPQExpBuffer(&sql, "%sFULL", sep);
- sep = comma;
- }
- if (freeze)
- {
- appendPQExpBuffer(&sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (verbose)
- {
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
- sep = comma;
+ /* Run all stages. */
+ for (i = 0; i < 3; i++)
+ {
+ if (!quiet)
+ {
+ puts(gettext(staged_analyze[i].message));
+ fflush(stdout);
+ }
+ executeCommand(conn, staged_analyze[i].prepcmd, progname, echo);
+ run_vacuum_command(conn, sql.data, echo, dbname, tabname, progname, false);
+ }
}
- if (and_analyze)
+ else
{
- appendPQExpBuffer(&sql, "%sANALYZE", sep);
- sep = comma;
+ /*
+ * Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
+ if (!quiet)
+ {
+ puts(gettext(staged_analyze[stage].message));
+ fflush(stdout);
+ }
+ executeCommand(conn, staged_analyze[stage].prepcmd, progname, echo);
+ run_vacuum_command(conn, sql.data, echo, dbname, tabname, progname, false);
}
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
}
else
+ run_vacuum_command(conn, sql.data, echo, dbname, NULL, progname, false);
+
+		if (cell)
+			cell = cell->next;
+ } while (cell != NULL);
+
+ PQfinish(conn);
+ termPQExpBuffer(&sql);
+}
+
+static void
+init_slot(ParallelSlot *slot, PGconn *conn)
+{
+ slot->connection = conn;
+ slot->isFree = true;
+ slot->sock = PQsocket(conn);
+}
+
+/*
+ * vacuum_one_database_parallel
+ *
+ * Like vacuum_one_database, but drive multiple connections in parallel.
+ * Another significant difference is that if the table list is empty, rather
+ * than running unadorned VACUUM commands (which would vacuum all the tables in
+ * the database, as vacuum_one_database does) we need to query the catalogs to
+ * obtain the list of tables first.
+ */
+static void
+vacuum_one_database_parallel(const char *dbname, vacuumingOptions *vacopts,
+ bool analyze_in_stages, int stage,
+ SimpleStringList *tables,
+ const char *host, const char *port,
+ const char *username, enum trivalue prompt_password,
+ int concurrentCons,
+ const char *progname, bool echo, bool quiet)
+{
+ PGconn *conn;
+ ParallelSlot *slots;
+ SimpleStringList dbtables = {NULL, NULL};
+ int i;
+
+ conn = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ /*
+ * If a table list is not provided then we need to vacuum the whole
+ * database; prepare the list of tables.
+ */
+ if (!tables || !tables->head)
+ {
+ PQExpBufferData buf;
+ PGresult *res;
+ int ntups;
+ int i;
+
+ initPQExpBuffer(&buf);
+
+ res = executeQuery(conn,
+ "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns\n"
+ " WHERE relkind IN (\'r\', \'m\') AND c.relnamespace = ns.oid\n"
+ " ORDER BY c.relpages DESC",
+ progname, echo);
+
+ ntups = PQntuples(res);
+ for (i = 0; i < ntups; i++)
{
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
+ appendPQExpBuffer(&buf, "%s",
+ fmtQualifiedId(PQserverVersion(conn),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0)));
+
+ simple_string_list_append(&dbtables, buf.data);
+ resetPQExpBuffer(&buf);
}
+
+ termPQExpBuffer(&buf);
+ tables = &dbtables;
+
+ /*
+ * If there are more connections than vacuumable relations, we don't
+ * need to use them all.
+ */
+ if (concurrentCons > ntups)
+ concurrentCons = ntups;
+ }
+
+ slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) * concurrentCons);
+ init_slot(slots, conn);
+
+ for (i = 1; i < concurrentCons; i++)
+ {
+ conn = connectDatabase(dbname, host, port, username, prompt_password,
+ progname, false);
+ init_slot(slots + i, conn);
}
- if (table)
- appendPQExpBuffer(&sql, " %s", table);
- appendPQExpBufferStr(&sql, ";");
if (analyze_in_stages)
{
- const char *stage_commands[] = {
- "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
- "SET default_statistics_target=10; RESET vacuum_cost_delay;",
- "RESET default_statistics_target;"
- };
- const char *stage_messages[] = {
- gettext_noop("Generating minimal optimizer statistics (1 target)"),
- gettext_noop("Generating medium optimizer statistics (10 targets)"),
- gettext_noop("Generating default (full) optimizer statistics")
- };
-
- if (stage == -1)
+ if (stage == ANALYZE_ALL_STAGES)
{
- int i;
+ int i;
- /* Run all stages. */
for (i = 0; i < 3; i++)
{
if (!quiet)
{
- puts(gettext(stage_messages[i]));
+ puts(gettext(staged_analyze[i].message));
fflush(stdout);
}
- executeCommand(conn, stage_commands[i], progname, echo);
- run_vacuum_command(conn, sql.data, echo, dbname, table, progname);
+
+ run_parallel_vacuum(echo, tables, vacopts,
+ concurrentCons, i, slots,
+ dbname, progname);
}
}
else
{
- /* Otherwise, we got a stage from vacuum_all_databases(), so run
- * only that one. */
+ /*
+ * Otherwise, we got a stage from vacuum_all_databases(), so run
+ * only that one.
+ */
if (!quiet)
{
- puts(gettext(stage_messages[stage]));
+ puts(gettext(staged_analyze[stage].message));
fflush(stdout);
}
- executeCommand(conn, stage_commands[stage], progname, echo);
- run_vacuum_command(conn, sql.data, echo, dbname, table, progname);
- }
+ run_parallel_vacuum(echo, tables, vacopts,
+ concurrentCons, stage, slots,
+ dbname, progname);
+ }
}
else
- run_vacuum_command(conn, sql.data, echo, dbname, NULL, progname);
+ run_parallel_vacuum(echo, tables, vacopts,
+ concurrentCons, ANALYZE_ALL_STAGES, slots,
+ dbname, progname);
- PQfinish(conn);
- termPQExpBuffer(&sql);
+ for (i = 0; i < concurrentCons; i++)
+ DisconnectDatabase(&slots[i]);
+
+ pfree(slots);
}
-
static void
-vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
- bool analyze_in_stages, bool freeze, const char *maintenance_db,
- const char *host, const char *port,
- const char *username, enum trivalue prompt_password,
+vacuum_all_databases(vacuumingOptions *vacopts,
+ bool analyze_in_stages,
+ const char *maintenance_db, const char *host,
+ const char *port, const char *username,
+ enum trivalue prompt_password,
+ int concurrentCons,
const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
int stage;
+ int i;
conn = connectMaintenanceDatabase(maintenance_db, host, port,
username, prompt_password, progname);
- result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;", progname, echo);
+ result = executeQuery(conn,
+ "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;",
+ progname, echo);
PQfinish(conn);
- /* If analyzing in stages, then run through all stages. Otherwise just
- * run once, passing -1 as the stage. */
- for (stage = (analyze_in_stages ? 0 : -1);
- stage < (analyze_in_stages ? 3 : 0);
- stage++)
+ if (analyze_in_stages)
{
- int i;
+ for (stage = 0; stage < 3; stage++)
+ {
+ for (i = 0; i < PQntuples(result); i++)
+ {
+ const char *dbname;
+ dbname = PQgetvalue(result, i, 0);
+ vacuum_database_stage(dbname, vacopts,
+ analyze_in_stages, stage,
+ NULL,
+ host, port, username, prompt_password,
+ concurrentCons,
+ progname, echo, quiet);
+ }
+ }
+ }
+ else
+ {
for (i = 0; i < PQntuples(result); i++)
{
- char *dbname = PQgetvalue(result, i, 0);
+ const char *dbname;
- if (!quiet)
- {
- printf(_("%s: vacuuming database \"%s\"\n"), progname, dbname);
- fflush(stdout);
- }
-
- vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
- analyze_in_stages, stage,
- freeze, NULL, host, port, username, prompt_password,
- progname, echo, quiet);
+ dbname = PQgetvalue(result, i, 0);
+ vacuum_database_stage(dbname, vacopts,
+ analyze_in_stages, ANALYZE_ALL_STAGES,
+ NULL,
+ host, port, username, prompt_password,
+ concurrentCons,
+ progname, echo, quiet);
}
}
PQclear(result);
}
+static void
+vacuum_database_stage(const char *dbname, vacuumingOptions *vacopts,
+ bool analyze_in_stages, int stage,
+ SimpleStringList *tables,
+ const char *host, const char *port, const char *username,
+ enum trivalue prompt_password,
+ int concurrentCons,
+ const char *progname, bool echo, bool quiet)
+{
+ if (!quiet)
+ {
+ printf(_("%s: vacuuming database \"%s\"\n"), progname, dbname);
+ fflush(stdout);
+ }
+
+ if (concurrentCons > 1)
+ vacuum_one_database_parallel(dbname, vacopts,
+ analyze_in_stages, stage,
+ tables,
+ host, port, username, prompt_password,
+ concurrentCons,
+ progname, echo, quiet);
+ else
+ vacuum_one_database(dbname, vacopts,
+ analyze_in_stages, stage,
+ tables,
+ host, port, username, prompt_password,
+ progname, echo, quiet);
+}
+
+/*
+ * run_parallel_vacuum
+ *
+ * This function does the actual work for sending the jobs concurrently to
+ * server.
+ */
+static void
+run_parallel_vacuum(bool echo, SimpleStringList *tables,
+ vacuumingOptions *vacopts, int concurrentCons,
+ int analyze_stage, ParallelSlot slots[],
+ const char *dbname, const char *progname)
+{
+ PQExpBufferData sql;
+ SimpleStringListCell *cell;
+ int i;
+
+ initPQExpBuffer(&sql);
+
+ if (analyze_stage >= 0)
+ {
+ for (i = 0; i < concurrentCons; i++)
+ {
+ executeCommand((slots + i)->connection,
+ staged_analyze[analyze_stage].prepcmd,
+ progname, echo);
+ }
+ }
+
+ for (cell = tables->head; cell; cell = cell->next)
+ {
+ ParallelSlot *free_slot;
+
+ if (CancelRequested)
+ goto fail;
+
+ /*
+ * Get a free slot, waiting until one becomes free if none currently
+ * is.
+ */
+ free_slot = GetIdleSlot(slots, concurrentCons, dbname, progname);
+ if (!free_slot)
+ goto fail;
+
+ free_slot->isFree = false;
+
+ prepare_command(&sql, free_slot->connection, vacopts, cell->val);
+ run_vacuum_command(free_slot->connection, sql.data,
+ echo, dbname, cell->val, progname, true);
+ }
+
+ for (i = 0; i < concurrentCons; i++)
+ {
+ /* wait for all connections to return their results */
+ if (!GetQueryResult((slots + i)->connection, dbname, progname))
+ goto fail;
+
+ (slots + i)->isFree = true; /* XXX what's the point? */
+ }
+
+ if (false)
+ {
+fail:
+ for (i = 0; i < concurrentCons; i++)
+ DisconnectDatabase(slots + i);
+ exit(1);
+ }
+
+ termPQExpBuffer(&sql);
+}
+
+/*
+ * GetIdleSlot
+ * Return a connection slot that is ready to execute a command.
+ *
+ * We return the first slot we find that is marked isFree, if one is;
+ * otherwise, we loop on select() until one socket becomes available. When
+ * this happens, we read the whole set and mark as free all sockets that become
+ * available.
+ *
+ * Process the slot list; if any free slot is available, return it, else
+ * select() on all the sockets and wait until at least one slot becomes
+ * available.
+ *
+ * If an error occurs, NULL is returned.
+ */
+static ParallelSlot *
+GetIdleSlot(ParallelSlot slots[], int numslots, const char *dbname,
+ const char *progname)
+{
+ int i;
+ int firstFree = -1;
+ fd_set slotset;
+ pgsocket maxFd;
+
+ for (i = 0; i < numslots; i++)
+ if ((slots + i)->isFree)
+ return slots + i;
+
+ FD_ZERO(&slotset);
+
+ maxFd = slots->sock;
+ for (i = 0; i < numslots; i++)
+ {
+ FD_SET((slots + i)->sock, &slotset);
+ if ((slots + i)->sock > maxFd)
+ maxFd = (slots + i)->sock;
+ }
+
+ /*
+ * No free slot found, so wait until one of the connections has finished
+ * its task and return the available slot.
+ */
+ for (firstFree = -1; firstFree < 0; )
+ {
+ bool aborting;
+
+ SetCancelConn(slots->connection);
+ i = select_loop(maxFd, &slotset, &aborting);
+ ResetCancelConn();
+
+ if (aborting)
+ {
+ /*
+ * We set the cancel-receiving connection to the one in the zeroth
+ * slot above, so fetch the error from there.
+ */
+ GetQueryResult(slots->connection, dbname, progname);
+ return NULL;
+ }
+ Assert(i != 0);
+
+ for (i = 0; i < numslots; i++)
+ {
+ if (!FD_ISSET((slots + i)->sock, &slotset))
+ continue;
+
+ PQconsumeInput((slots + i)->connection);
+ if (PQisBusy((slots + i)->connection))
+ continue;
+
+ (slots + i)->isFree = true;
+
+ if (!GetQueryResult((slots + i)->connection, dbname, progname))
+ return NULL;
+
+ if (firstFree < 0)
+ firstFree = i;
+ }
+ }
+
+ return slots + firstFree;
+}
+
+/*
+ * GetQueryResult
+ *
+ * Process the query result. Returns true if there's no error, false
+ * otherwise -- but errors about trying to vacuum a missing relation are
+ * reported and subsequently ignored.
+ */
+static bool
+GetQueryResult(PGconn *conn, const char *dbname, const char *progname)
+{
+ PGresult *result;
+
+ SetCancelConn(conn);
+ while ((result = PQgetResult(conn)) != NULL)
+ {
+ /*
+ * If errors are found, report them. Errors about a missing table are
+ * harmless so we continue processing; but die for other errors.
+ */
+ if (PQresultStatus(result) != PGRES_COMMAND_OK)
+ {
+ char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+
+ fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
+ progname, dbname, PQerrorMessage(conn));
+
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
+ {
+ PQclear(result);
+ return false;
+ }
+ }
+
+ PQclear(result);
+ }
+ ResetCancelConn();
+
+ return true;
+}
+
+/*
+ * Loop on select() until a descriptor from the given set becomes readable.
+ *
+ * If we get a cancel request while we're waiting, we forego all further
+ * processing and set the *aborting flag to true. The return value must be
+ * ignored in this case. Otherwise, *aborting is set to false.
+ */
+static int
+select_loop(int maxFd, fd_set *workerset, bool *aborting)
+{
+ int i;
+ fd_set saveSet = *workerset;
+
+ if (CancelRequested)
+ {
+ *aborting = true;
+ return -1;
+ }
+ else
+ *aborting = false;
+
+ for (;;)
+ {
+ /*
+ * On Windows, we need to check once in a while for cancel requests; on
+ * other platforms we rely on select() returning when interrupted.
+ */
+ struct timeval *tvp;
+#ifdef WIN32
+		struct timeval tv = {0, 1000000};
+
+ tvp = &tv;
+#else
+ tvp = NULL;
+#endif
+
+ *workerset = saveSet;
+ i = select(maxFd + 1, workerset, NULL, NULL, tvp);
+
+#ifdef WIN32
+ if (i == SOCKET_ERROR)
+ {
+ i = -1;
+ TranslateSocketError();
+ }
+#endif
+
+ if (i < 0 && errno == EINTR)
+ continue; /* ignore this */
+ if (i < 0 || CancelRequested)
+ *aborting = true; /* but not this */
+ if (i == 0)
+ continue; /* timeout (Win32 only) */
+ break;
+ }
+
+ return i;
+}
+
+/*
+ * DisconnectDatabase
+ * Disconnect the connection associated with the given slot
+ */
+static void
+DisconnectDatabase(ParallelSlot *slot)
+{
+ char errbuf[256];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ PGcancel *cancel;
+
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection = NULL;
+}
+
+/*
+ * Construct a vacuum/analyze command to run based on the given options, in the
+ * given string buffer, which may contain previous garbage.
+ *
+ * An optional table name can be passed; this must already be properly
+ * quoted. The command is semicolon-terminated.
+ */
+static void
+prepare_command(PQExpBuffer sql, PGconn *conn, vacuumingOptions *vacopts,
+ const char *table)
+{
+ resetPQExpBuffer(sql);
+
+ if (vacopts->analyze_only)
+ {
+ appendPQExpBufferStr(sql, "ANALYZE");
+ if (vacopts->verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBufferStr(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (vacopts->full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (vacopts->freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (vacopts->verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (vacopts->and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBufferStr(sql, ")");
+ }
+ else
+ {
+ if (vacopts->full)
+ appendPQExpBufferStr(sql, " FULL");
+ if (vacopts->freeze)
+ appendPQExpBufferStr(sql, " FREEZE");
+ if (vacopts->verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ if (vacopts->and_analyze)
+ appendPQExpBufferStr(sql, " ANALYZE");
+ }
+ }
+
+ if (table)
+ appendPQExpBuffer(sql, " %s;", table);
+}
static void
help(const char *progname)
@@ -436,6 +1016,7 @@ help(const char *progname)
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
On 22 January 2015 23:16, Alvaro Herrera wrote:
Here's v23.
There are two things that continue to bother me and I would like you,
dear patch author, to change them before committing this patch:1. I don't like having vacuum_one_database() and a separate
vacuum_one_database_parallel(). I think we should merge them into one
function, which does either thing according to parameters. There's
plenty in there that's duplicated.2. in particular, the above means that run_parallel_vacuum can no
longer exist as it is. Right now vacuum_one_database_parallel relies
on run_parallel_vacuum to do the actual job parallellization. I would
like to have that looping in the improved vacuum_one_database()
function instead.
Looking forward to v24,
Thank you for your effort. I have tried to change the patch as per your instructions and come up with v24.
Changes:
1. In the current patch, vacuum_one_database (for a table list) has the table
loop outside and the analyze_stage loop inside, so it finishes all three
stages for one table first and then picks the next table. But
vacuum_one_database_parallel does the stage loop outside and calls
run_parallel_vacuum, which has the table loop, so for one stage all the
tables are vacuumed first before moving to the next stage.
So to merge the two functions, their behavior has to be identical. I think
that if the user has given a list of tables with analyze-in-stages, doing
all the tables for at least one stage and then picking the next stage is
the better solution, so I have done it that way.
2. In select_loop, for WIN32 I replaced the TranslateSocketError function with
if (WSAGetLastError() == WSAEINTR)
    errno = EINTR;
since otherwise I would have to expose the TranslateSocketError function from socket.c and include an extra header.
I have also tested this on Windows, and it works fine.
Regards,
Dilip
Attachments:
vacuumdb_parallel_v24.patchapplication/octet-stream; name=vacuumdb_parallel_v24.patchDownload
*** a/doc/src/sgml/ref/vacuumdb.sgml
--- b/doc/src/sgml/ref/vacuumdb.sgml
***************
*** 204,209 **** PostgreSQL documentation
--- 204,228 ----
</varlistentry>
<varlistentry>
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+ This option enables the vacuum operation to run on concurrent
+ connections. The maximum number of tables that can be vacuumed
+ concurrently is equal to the number of jobs. If the number of jobs
+ given is more than the number of tables, the number of jobs is
+ reduced to the number of tables.
+ </para>
+ <para>
+ <application>vacuumdb</application> will open
+ <replaceable class="parameter"> njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections">
+ setting is high enough to accommodate all connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--analyze-in-stages</option></term>
<listitem>
<para>
*** a/src/backend/port/win32/socket.c
--- b/src/backend/port/win32/socket.c
***************
*** 42,48 **** int pgwin32_noblock = 0;
/*
* Convert the last socket error code into errno
*/
! static void
TranslateSocketError(void)
{
switch (WSAGetLastError())
--- 42,48 ----
/*
* Convert the last socket error code into errno
*/
! void
TranslateSocketError(void)
{
switch (WSAGetLastError())
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
***************
*** 1160,1166 **** select_loop(int maxFd, fd_set *workerset)
i = select(maxFd + 1, workerset, NULL, NULL, NULL);
/*
! * If we Ctrl-C the master process , it's likely that we interrupt
* select() here. The signal handler will set wantAbort == true and
* the shutdown journey starts from here. Note that we'll come back
* here later when we tell all workers to terminate and read their
--- 1160,1166 ----
i = select(maxFd + 1, workerset, NULL, NULL, NULL);
/*
! * If we Ctrl-C the master process, it's likely that we interrupt
* select() here. The signal handler will set wantAbort == true and
* the shutdown journey starts from here. Note that we'll come back
* here later when we tell all workers to terminate and read their
*** a/src/bin/scripts/common.c
--- b/src/bin/scripts/common.c
***************
*** 19,28 ****
#include "common.h"
- static void SetCancelConn(PGconn *conn);
- static void ResetCancelConn(void);
static PGcancel *volatile cancelConn = NULL;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
--- 19,27 ----
#include "common.h"
static PGcancel *volatile cancelConn = NULL;
+ bool CancelRequested = false;
#ifdef WIN32
static CRITICAL_SECTION cancelConnLock;
***************
*** 291,297 **** yesno_prompt(const char *question)
*
* Set cancelConn to point to the current database connection.
*/
! static void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
--- 290,296 ----
*
* Set cancelConn to point to the current database connection.
*/
! void
SetCancelConn(PGconn *conn)
{
PGcancel *oldCancelConn;
***************
*** 321,327 **** SetCancelConn(PGconn *conn)
*
* Free the current cancel connection, if any, and set to NULL.
*/
! static void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
--- 320,326 ----
*
* Free the current cancel connection, if any, and set to NULL.
*/
! void
ResetCancelConn(void)
{
PGcancel *oldCancelConn;
***************
*** 345,353 **** ResetCancelConn(void)
#ifndef WIN32
/*
! * Handle interrupt signals by canceling the current command,
! * if it's being executed through executeMaintenanceCommand(),
! * and thus has a cancelConn set.
*/
static void
handle_sigint(SIGNAL_ARGS)
--- 344,351 ----
#ifndef WIN32
/*
! * Handle interrupt signals by canceling the current command, if a cancelConn
! * is set.
*/
static void
handle_sigint(SIGNAL_ARGS)
***************
*** 359,368 **** handle_sigint(SIGNAL_ARGS)
--- 357,371 ----
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
+ CancelRequested = true;
fprintf(stderr, _("Cancel request sent\n"));
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ CancelRequested = true;
errno = save_errno; /* just in case the write changed it */
}
***************
*** 392,401 **** consoleHandler(DWORD dwCtrlType)
--- 395,410 ----
if (cancelConn != NULL)
{
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
+ {
fprintf(stderr, _("Cancel request sent\n"));
+ CancelRequested = true;
+ }
else
fprintf(stderr, _("Could not send cancel request: %s"), errbuf);
}
+ else
+ CancelRequested = true;
+
LeaveCriticalSection(&cancelConnLock);
return TRUE;
*** a/src/bin/scripts/common.h
--- b/src/bin/scripts/common.h
***************
*** 21,26 **** enum trivalue
--- 21,28 ----
TRI_YES
};
+ extern bool CancelRequested;
+
typedef void (*help_handler) (const char *progname);
extern void handle_help_version_opts(int argc, char *argv[],
***************
*** 49,52 **** extern bool yesno_prompt(const char *question);
--- 51,58 ----
extern void setup_cancel_handler(void);
+ extern void SetCancelConn(PGconn *conn);
+ extern void ResetCancelConn(void);
+
+
#endif /* COMMON_H */
*** a/src/bin/scripts/vacuumdb.c
--- b/src/bin/scripts/vacuumdb.c
***************
*** 11,34 ****
*/
#include "postgres_fe.h"
#include "common.h"
#include "dumputils.h"
! static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages, int stage, bool freeze,
! const char *table, const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet);
! static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, bool freeze,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
const char *progname, bool echo, bool quiet);
static void help(const char *progname);
int
main(int argc, char *argv[])
--- 11,111 ----
*/
#include "postgres_fe.h"
+
#include "common.h"
#include "dumputils.h"
! #define ERRCODE_UNDEFINED_TABLE "42P01"
!
! /* Parallel vacuuming stuff */
! typedef struct ParallelSlot
! {
! PGconn *connection;
! pgsocket sock;
! bool isFree;
! } ParallelSlot;
!
! /* vacuum options controlled by user flags */
! typedef struct vacuumingOptions
! {
! bool analyze_only;
! bool verbose;
! bool and_analyze;
! bool full;
! bool freeze;
! } vacuumingOptions;
!
!
! static void vacuum_one_database(const char *dbname, vacuumingOptions *vacopts,
! bool analyze_in_stages, int stage,
! SimpleStringList *tables,
! const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet,
! int concurrentCons);
!
! static void vacuum_all_databases(vacuumingOptions *vacopts,
! bool analyze_in_stages,
const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
+ int concurrentCons,
const char *progname, bool echo, bool quiet);
+ static void vacuum_database_stage(const char *dbname, vacuumingOptions *vacopts,
+ bool analyze_in_stages, int stage,
+ SimpleStringList *tables,
+ const char *host, const char *port, const char *username,
+ enum trivalue prompt_password,
+ int concurrentCons,
+ const char *progname, bool echo, bool quiet);
static void help(const char *progname);
+ static void prepare_command(PQExpBuffer sql, PGconn *conn,
+ vacuumingOptions *vacopts, const char *table);
+
+ static ParallelSlot *GetIdleSlot(ParallelSlot slots[], int numslots,
+ const char *dbname, const char *progname);
+
+ static bool GetQueryResult(PGconn *conn, const char *dbname,
+ const char *progname);
+
+ static int select_loop(int maxFd, fd_set *workerset, bool *aborting);
+
+ static void DisconnectDatabase(ParallelSlot *slot);
+ static void init_slot(ParallelSlot *slot, PGconn *conn);
+
+
+
+ /*
+ * Preparatory commands and corresponding user-visible message for the
+ * analyze-in-stages feature. Note the ANALYZE command itself must be sent
+ * separately.
+ */
+ static const struct
+ {
+ const char *prepcmd;
+ const char *message;
+ }
+ staged_analyze[3] =
+ {
+ {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ gettext_noop("Generating minimal optimizer statistics (1 target)")
+ },
+ {
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ gettext_noop("Generating medium optimizer statistics (10 targets)")
+ },
+ {
+ "RESET default_statistics_target;",
+ gettext_noop("Generating default (full) optimizer statistics")
+ }
+ };
+
+ #define ANALYZE_ALL_STAGES -1
+
int
main(int argc, char *argv[])
***************
*** 49,54 **** main(int argc, char *argv[])
--- 126,132 ----
{"table", required_argument, NULL, 't'},
{"full", no_argument, NULL, 'f'},
{"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
{"maintenance-db", required_argument, NULL, 2},
{"analyze-in-stages", no_argument, NULL, 3},
{NULL, 0, NULL, 0}
***************
*** 57,63 **** main(int argc, char *argv[])
const char *progname;
int optindex;
int c;
-
const char *dbname = NULL;
const char *maintenance_db = NULL;
char *host = NULL;
--- 135,140 ----
***************
*** 66,86 **** main(int argc, char *argv[])
enum trivalue prompt_password = TRI_DEFAULT;
bool echo = false;
bool quiet = false;
! bool and_analyze = false;
! bool analyze_only = false;
bool analyze_in_stages = false;
- bool freeze = false;
bool alldb = false;
- bool full = false;
- bool verbose = false;
SimpleStringList tables = {NULL, NULL};
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fv", long_options, &optindex)) != -1)
{
switch (c)
{
--- 143,165 ----
enum trivalue prompt_password = TRI_DEFAULT;
bool echo = false;
bool quiet = false;
! vacuumingOptions vacopts;
bool analyze_in_stages = false;
bool alldb = false;
SimpleStringList tables = {NULL, NULL};
+ int concurrentCons = 0;
+ int tbl_count = 0;
+
+ /* initialize options to all false */
+ memset(&vacopts, 0, sizeof(vacopts));
progname = get_progname(argv[0]);
+
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
handle_help_version_opts(argc, argv, "vacuumdb", help);
! while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:", long_options, &optindex)) != -1)
{
switch (c)
{
***************
*** 109,139 **** main(int argc, char *argv[])
dbname = pg_strdup(optarg);
break;
case 'z':
! and_analyze = true;
break;
case 'Z':
! analyze_only = true;
break;
case 'F':
! freeze = true;
break;
case 'a':
alldb = true;
break;
case 't':
simple_string_list_append(&tables, optarg);
break;
case 'f':
! full = true;
break;
case 'v':
! verbose = true;
break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
case 3:
! analyze_in_stages = analyze_only = true;
break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
--- 188,230 ----
dbname = pg_strdup(optarg);
break;
case 'z':
! vacopts.and_analyze = true;
break;
case 'Z':
! vacopts.analyze_only = true;
break;
case 'F':
! vacopts.freeze = true;
break;
case 'a':
alldb = true;
break;
case 't':
+ {
simple_string_list_append(&tables, optarg);
+ tbl_count++;
break;
+ }
case 'f':
! vacopts.full = true;
break;
case 'v':
! vacopts.verbose = true;
! break;
! case 'j':
! concurrentCons = atoi(optarg);
! if (concurrentCons <= 0)
! {
! fprintf(stderr, _("%s: number of parallel \"jobs\" must be at least 1\n"),
! progname);
! exit(1);
! }
break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
case 3:
! analyze_in_stages = vacopts.analyze_only = true;
break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
***************
*** 141,147 **** main(int argc, char *argv[])
}
}
-
/*
* Non-option argument specifies database name as long as it wasn't
* already specified with -d / --dbname
--- 232,237 ----
***************
*** 160,177 **** main(int argc, char *argv[])
exit(1);
}
! if (analyze_only)
{
! if (full)
{
! fprintf(stderr, _("%s: cannot use the \"full\" option when performing only analyze\n"),
! progname);
exit(1);
}
! if (freeze)
{
! fprintf(stderr, _("%s: cannot use the \"freeze\" option when performing only analyze\n"),
! progname);
exit(1);
}
/* allow 'and_analyze' with 'analyze_only' */
--- 250,267 ----
exit(1);
}
! if (vacopts.analyze_only)
{
! if (vacopts.full)
{
! fprintf(stderr, _("%s: cannot use the \"%s\" option when performing only analyze\n"),
! progname, "full");
exit(1);
}
! if (vacopts.freeze)
{
! fprintf(stderr, _("%s: cannot use the \"%s\" option when performing only analyze\n"),
! progname, "freeze");
exit(1);
}
/* allow 'and_analyze' with 'analyze_only' */
***************
*** 179,184 **** main(int argc, char *argv[])
--- 269,278 ----
setup_cancel_handler();
+ /* Avoid opening extra connections. */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
if (alldb)
{
if (dbname)
***************
*** 194,202 **** main(int argc, char *argv[])
exit(1);
}
! vacuum_all_databases(full, verbose, and_analyze, analyze_only, analyze_in_stages, freeze,
! maintenance_db, host, port, username,
! prompt_password, progname, echo, quiet);
}
else
{
--- 288,299 ----
exit(1);
}
! vacuum_all_databases(&vacopts,
! analyze_in_stages,
! maintenance_db,
! host, port, username, prompt_password,
! concurrentCons,
! progname, echo, quiet);
}
else
{
***************
*** 210,244 **** main(int argc, char *argv[])
dbname = get_user_name_or_exit(progname);
}
! if (tables.head != NULL)
! {
! SimpleStringListCell *cell;
!
! for (cell = tables.head; cell; cell = cell->next)
! {
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, cell->val,
! host, port, username, prompt_password,
! progname, echo, quiet);
! }
! }
! else
! vacuum_one_database(dbname, full, verbose, and_analyze,
! analyze_only, analyze_in_stages, -1,
! freeze, NULL,
! host, port, username, prompt_password,
! progname, echo, quiet);
}
exit(0);
}
!
static void
! run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname, const char *table, const char *progname)
{
! if (!executeMaintenanceCommand(conn, sql, echo))
{
if (table)
fprintf(stderr, _("%s: vacuuming of table \"%s\" in database \"%s\" failed: %s"),
--- 307,341 ----
dbname = get_user_name_or_exit(progname);
}
! vacuum_database_stage(dbname, &vacopts,
! analyze_in_stages, ANALYZE_ALL_STAGES,
! &tables,
! host, port, username, prompt_password,
! concurrentCons,
! progname, echo, quiet);
}
exit(0);
}
! /*
! * Execute a vacuum/analyze command to the server.
! *
! * Result status is checked only if 'async' is false.
! */
static void
! run_vacuum_command(PGconn *conn, const char *sql, bool echo,
! const char *dbname, const char *table,
! const char *progname, bool async)
{
! if (async)
! {
! if (echo)
! printf("%s\n", sql);
!
! PQsendQuery(conn, sql);
! }
! else if (!executeMaintenanceCommand(conn, sql, echo))
{
if (table)
fprintf(stderr, _("%s: vacuuming of table \"%s\" in database \"%s\" failed: %s"),
***************
*** 251,422 **** run_vacuum_command(PGconn *conn, const char *sql, bool echo, const char *dbname,
}
}
!
static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
! bool analyze_only, bool analyze_in_stages, int stage, bool freeze, const char *table,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet)
{
PQExpBufferData sql;
-
PGconn *conn;
!
! initPQExpBuffer(&sql);
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! if (analyze_only)
{
! appendPQExpBufferStr(&sql, "ANALYZE");
! if (verbose)
! appendPQExpBufferStr(&sql, " VERBOSE");
}
! else
{
! appendPQExpBufferStr(&sql, "VACUUM");
! if (PQserverVersion(conn) >= 90000)
{
! const char *paren = " (";
! const char *comma = ", ";
! const char *sep = paren;
! if (full)
{
! appendPQExpBuffer(&sql, "%sFULL", sep);
! sep = comma;
}
! if (freeze)
{
! appendPQExpBuffer(&sql, "%sFREEZE", sep);
! sep = comma;
}
! if (verbose)
{
! appendPQExpBuffer(&sql, "%sVERBOSE", sep);
! sep = comma;
}
! if (and_analyze)
{
! appendPQExpBuffer(&sql, "%sANALYZE", sep);
! sep = comma;
}
- if (sep != paren)
- appendPQExpBufferStr(&sql, ")");
- }
- else
- {
- if (full)
- appendPQExpBufferStr(&sql, " FULL");
- if (freeze)
- appendPQExpBufferStr(&sql, " FREEZE");
- if (verbose)
- appendPQExpBufferStr(&sql, " VERBOSE");
- if (and_analyze)
- appendPQExpBufferStr(&sql, " ANALYZE");
}
- }
- if (table)
- appendPQExpBuffer(&sql, " %s", table);
- appendPQExpBufferStr(&sql, ";");
! if (analyze_in_stages)
! {
! const char *stage_commands[] = {
! "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
! "SET default_statistics_target=10; RESET vacuum_cost_delay;",
! "RESET default_statistics_target;"
! };
! const char *stage_messages[] = {
! gettext_noop("Generating minimal optimizer statistics (1 target)"),
! gettext_noop("Generating medium optimizer statistics (10 targets)"),
! gettext_noop("Generating default (full) optimizer statistics")
! };
!
! if (stage == -1)
{
! int i;
! /* Run all stages. */
! for (i = 0; i < 3; i++)
{
! if (!quiet)
{
! puts(gettext(stage_messages[i]));
! fflush(stdout);
}
! executeCommand(conn, stage_commands[i], progname, echo);
! run_vacuum_command(conn, sql.data, echo, dbname, table, progname);
}
! }
! else
{
! /* Otherwise, we got a stage from vacuum_all_databases(), so run
! * only that one. */
! if (!quiet)
{
! puts(gettext(stage_messages[stage]));
! fflush(stdout);
}
- executeCommand(conn, stage_commands[stage], progname, echo);
- run_vacuum_command(conn, sql.data, echo, dbname, table, progname);
}
}
else
! run_vacuum_command(conn, sql.data, echo, dbname, NULL, progname);
- PQfinish(conn);
termPQExpBuffer(&sql);
}
static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool analyze_only,
! bool analyze_in_stages, bool freeze, const char *maintenance_db,
! const char *host, const char *port,
! const char *username, enum trivalue prompt_password,
const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
int stage;
conn = connectMaintenanceDatabase(maintenance_db, host, port,
username, prompt_password, progname);
! result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;", progname, echo);
PQfinish(conn);
! /* If analyzing in stages, then run through all stages. Otherwise just
! * run once, passing -1 as the stage. */
! for (stage = (analyze_in_stages ? 0 : -1);
! stage < (analyze_in_stages ? 3 : 0);
! stage++)
{
- int i;
-
for (i = 0; i < PQntuples(result); i++)
{
! char *dbname = PQgetvalue(result, i, 0);
! if (!quiet)
{
! printf(_("%s: vacuuming database \"%s\"\n"), progname, dbname);
! fflush(stdout);
}
! vacuum_one_database(dbname, full, verbose, and_analyze, analyze_only,
! analyze_in_stages, stage,
! freeze, NULL, host, port, username, prompt_password,
! progname, echo, quiet);
}
}
! PQclear(result);
}
static void
help(const char *progname)
--- 348,914 ----
}
}
! /*
! * vacuum_one_database
! *
! * Process tables in the given database. If the 'tables' list is empty,
! * process all tables in the database. Note there is no parallelization here.
! */
static void
! vacuum_one_database(const char *dbname, vacuumingOptions *vacopts,
! bool analyze_in_stages, int stage,
! SimpleStringList *tables,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
! const char *progname, bool echo, bool quiet,
! int concurrentCons)
{
PQExpBufferData sql;
PGconn *conn;
! SimpleStringListCell *cell;
! ParallelSlot *slots = NULL;
! SimpleStringList dbtables = {NULL, NULL};
! int i;
!	int			result = 0;
conn = connectDatabase(dbname, host, port, username, prompt_password,
progname, false);
! initPQExpBuffer(&sql);
!
! /*
! * If a table list is not provided and the concurrentCons option is given,
! * then we need to vacuum the whole database; prepare the list of tables.
! */
! if (concurrentCons && (!tables || !tables->head))
{
! PQExpBufferData buf;
! PGresult *res;
! int ntups;
! int i;
!
! initPQExpBuffer(&buf);
!
! res = executeQuery(conn,
! "SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns\n"
! " WHERE relkind IN (\'r\', \'m\') AND c.relnamespace = ns.oid\n"
! " ORDER BY c.relpages DESC",
! progname, echo);
!
! ntups = PQntuples(res);
! for (i = 0; i < ntups; i++)
! {
! appendPQExpBuffer(&buf, "%s",
! fmtQualifiedId(PQserverVersion(conn),
! PQgetvalue(res, i, 1),
! PQgetvalue(res, i, 0)));
!
! simple_string_list_append(&dbtables, buf.data);
! resetPQExpBuffer(&buf);
! }
!
! termPQExpBuffer(&buf);
! tables = &dbtables;
!
! /*
! * If there are more connections than vacuumable relations, we don't
! * need to use them all.
! */
! if (concurrentCons > ntups)
! concurrentCons = ntups;
}
!
! if (concurrentCons)
{
! slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) * concurrentCons);
! init_slot(slots, conn);
!
! for (i = 1; i < concurrentCons; i++)
{
! conn = connectDatabase(dbname, host, port, username, prompt_password,
! progname, false);
! init_slot(slots + i, conn);
! }
! }
!
! for (i = 0; i < 3; i++)
! {
! cell = tables ? tables->head : NULL;
!
! if (analyze_in_stages)
! {
! int currentStage;
! if (stage == ANALYZE_ALL_STAGES)
{
! currentStage = i;
}
! else
{
! currentStage = stage;
}
!
! if (!quiet)
{
! puts(gettext(staged_analyze[currentStage].message));
! fflush(stdout);
}
!
! if (concurrentCons)
{
! int j;
! for (j = 0; j < concurrentCons; j++)
! {
! executeCommand((slots + j)->connection,
! staged_analyze[currentStage].prepcmd, progname, echo);
! }
! }
! else
! {
! executeCommand(conn, staged_analyze[currentStage].prepcmd, progname, echo);
}
}
! do
{
! const char *tabname;
! tabname = cell ? cell->val : NULL;
! prepare_command(&sql, conn, vacopts, tabname);
! if (concurrentCons)
{
! ParallelSlot *free_slot;
!
! if (CancelRequested)
! {
! result = -1;
! goto finish;
! }
!
! /*
! * Get a free slot, waiting until one becomes free if none currently
! * is.
! */
! free_slot = GetIdleSlot(slots, concurrentCons, dbname, progname);
! if (!free_slot)
{
! result = -1;
! goto finish;
}
!
! free_slot->isFree = false;
!
! run_vacuum_command(free_slot->connection, sql.data,
! echo, dbname, cell->val, progname, true);
}
! else
! run_vacuum_command(conn, sql.data, echo, dbname, NULL, progname, false);
!
! if (cell)
! cell = cell->next;
! } while (cell != NULL);
!
! if (concurrentCons)
{
! int j;
!
! for (j = 0; j < concurrentCons; j++)
{
!			/* wait for all connections to return their results */
! if (!GetQueryResult((slots + j)->connection, dbname, progname))
! goto finish;
!
! (slots + j)->isFree = true; /* XXX what's the point? */
}
}
+ if (!analyze_in_stages || stage != ANALYZE_ALL_STAGES)
+ break;
+ }
+
+ finish:
+ if (concurrentCons)
+ {
+ for (i = 0; i < concurrentCons; i++)
+ DisconnectDatabase(&slots[i]);
+
+ pfree(slots);
}
else
! PQfinish(conn);
termPQExpBuffer(&sql);
+
+ if (result == -1)
+ exit(1);
}
+ static void
+ init_slot(ParallelSlot *slot, PGconn *conn)
+ {
+ slot->connection = conn;
+ slot->isFree = true;
+ slot->sock = PQsocket(conn);
+ }
static void
! vacuum_all_databases(vacuumingOptions *vacopts,
! bool analyze_in_stages,
! const char *maintenance_db, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
! int concurrentCons,
const char *progname, bool echo, bool quiet)
{
PGconn *conn;
PGresult *result;
int stage;
+ int i;
conn = connectMaintenanceDatabase(maintenance_db, host, port,
username, prompt_password, progname);
! result = executeQuery(conn,
! "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;",
! progname, echo);
PQfinish(conn);
! if (analyze_in_stages)
! {
! for (stage = 0; stage < 3; stage++)
! {
! for (i = 0; i < PQntuples(result); i++)
! {
! const char *dbname;
!
! dbname = PQgetvalue(result, i, 0);
! vacuum_database_stage(dbname, vacopts,
! analyze_in_stages, stage,
! NULL,
! host, port, username, prompt_password,
! concurrentCons,
! progname, echo, quiet);
! }
! }
! }
! else
{
for (i = 0; i < PQntuples(result); i++)
{
! const char *dbname;
!
! dbname = PQgetvalue(result, i, 0);
! vacuum_database_stage(dbname, vacopts,
! analyze_in_stages, ANALYZE_ALL_STAGES,
! NULL,
! host, port, username, prompt_password,
! concurrentCons,
! progname, echo, quiet);
! }
! }
! PQclear(result);
! }
!
! static void
! vacuum_database_stage(const char *dbname, vacuumingOptions *vacopts,
! bool analyze_in_stages, int stage,
! SimpleStringList *tables,
! const char *host, const char *port, const char *username,
! enum trivalue prompt_password,
! int concurrentCons,
! const char *progname, bool echo, bool quiet)
! {
! if (!quiet)
! {
! printf(_("%s: vacuuming database \"%s\"\n"), progname, dbname);
! fflush(stdout);
! }
!
! vacuum_one_database(dbname, vacopts,
! analyze_in_stages, stage,
! tables,
! host, port, username, prompt_password,
! progname, echo, quiet, concurrentCons);
! }
!
! /*
! * GetIdleSlot
! * Return a connection slot that is ready to execute a command.
! *
! * We return the first slot we find that is marked isFree, if one is;
! * otherwise, we loop on select() until one socket becomes available. When
! * this happens, we read the whole set and mark as free all sockets that become
! * available.
! *
! * Process the slot list; if any free slot is available, return it, else
! * select() on all the sockets and wait until at least one slot becomes
! * available.
! *
! * If an error occurs, NULL is returned.
! */
! static ParallelSlot *
! GetIdleSlot(ParallelSlot slots[], int numslots, const char *dbname,
! const char *progname)
! {
! int i;
! int firstFree = -1;
! fd_set slotset;
! pgsocket maxFd;
!
! for (i = 0; i < numslots; i++)
! if ((slots + i)->isFree)
! return slots + i;
!
! FD_ZERO(&slotset);
!
! maxFd = slots->sock;
! for (i = 0; i < numslots; i++)
! {
! FD_SET((slots + i)->sock, &slotset);
! if ((slots + i)->sock > maxFd)
! maxFd = (slots + i)->sock;
! }
!
! /*
! * No free slot found, so wait until one of the connections has finished
! * its task and return the available slot.
! */
! for (firstFree = -1; firstFree < 0; )
! {
! bool aborting;
!
! SetCancelConn(slots->connection);
! i = select_loop(maxFd, &slotset, &aborting);
! ResetCancelConn();
!
! if (aborting)
! {
! /*
! * We set the cancel-receiving connection to the one in the zeroth
! * slot above, so fetch the error from there.
! */
! GetQueryResult(slots->connection, dbname, progname);
! return NULL;
! }
! Assert(i != 0);
!
! for (i = 0; i < numslots; i++)
! {
! if (!FD_ISSET((slots + i)->sock, &slotset))
! continue;
!
! PQconsumeInput((slots + i)->connection);
! if (PQisBusy((slots + i)->connection))
! continue;
!
! (slots + i)->isFree = true;
!
! if (!GetQueryResult((slots + i)->connection, dbname, progname))
! return NULL;
!
! if (firstFree < 0)
! firstFree = i;
! }
! }
!
! return slots + firstFree;
! }
!
! /*
! * GetQueryResult
! *
! * Process the query result. Returns true if there's no error, false
! * otherwise -- but errors about trying to vacuum a missing relation are
! * reported and subsequently ignored.
! */
! static bool
! GetQueryResult(PGconn *conn, const char *dbname, const char *progname)
! {
! PGresult *result;
!
! SetCancelConn(conn);
! while ((result = PQgetResult(conn)) != NULL)
! {
! /*
! * If errors are found, report them. Errors about a missing table are
! * harmless so we continue processing; but die for other errors.
! */
! if (PQresultStatus(result) != PGRES_COMMAND_OK)
! {
! char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
!
! fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
! progname, dbname, PQerrorMessage(conn));
!
! if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
{
! PQclear(result);
! return false;
}
+ }
+
+ PQclear(result);
+ }
+ ResetCancelConn();
+
+ return true;
+ }
! /*
! * Loop on select() until a descriptor from the given set becomes readable.
! *
! * If we get a cancel request while we're waiting, we forego all further
! * processing and set the *aborting flag to true. The return value must be
! * ignored in this case. Otherwise, *aborting is set to false.
! */
! static int
! select_loop(int maxFd, fd_set *workerset, bool *aborting)
! {
! int i;
! fd_set saveSet = *workerset;
!
! if (CancelRequested)
! {
! *aborting = true;
! return -1;
! }
! else
! *aborting = false;
!
! for (;;)
! {
! /*
! * On Windows, we need to check once in a while for cancel requests; on
! * other platforms we rely on select() returning when interrupted.
! */
! struct timeval *tvp;
! #ifdef WIN32
! struct timeval tv = {0, 1000000};
!
! tvp = &tv;
! #else
! tvp = NULL;
! #endif
!
! *workerset = saveSet;
! i = select(maxFd + 1, workerset, NULL, NULL, tvp);
!
! #ifdef WIN32
! if (i == SOCKET_ERROR)
! {
! i = -1;
!
! if (WSAGetLastError() == WSAEINTR)
!			errno = EINTR;
}
+ #endif
+
+ if (i < 0 && errno == EINTR)
+ continue; /* ignore this */
+ if (i < 0 || CancelRequested)
+ *aborting = true; /* but not this */
+ if (i == 0)
+ continue; /* timeout (Win32 only) */
+ break;
}
! return i;
}
+ /*
+ * DisconnectDatabase
+ * Disconnect the connection associated with the given slot
+ */
+ static void
+ DisconnectDatabase(ParallelSlot *slot)
+ {
+ char errbuf[256];
+
+ if (!slot->connection)
+ return;
+
+ if (PQtransactionStatus(slot->connection) == PQTRANS_ACTIVE)
+ {
+ PGcancel *cancel;
+
+ if ((cancel = PQgetCancel(slot->connection)))
+ {
+ PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(slot->connection);
+ slot->connection = NULL;
+ }
+
+ /*
+ * Construct a vacuum/analyze command to run based on the given options, in the
+ * given string buffer, which may contain previous garbage.
+ *
+ * An optional table name can be passed; this must already be properly
+ * quoted. The command is semicolon-terminated.
+ */
+ static void
+ prepare_command(PQExpBuffer sql, PGconn *conn, vacuumingOptions *vacopts,
+ const char *table)
+ {
+ resetPQExpBuffer(sql);
+
+ if (vacopts->analyze_only)
+ {
+ appendPQExpBufferStr(sql, "ANALYZE");
+ if (vacopts->verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ }
+ else
+ {
+ appendPQExpBufferStr(sql, "VACUUM");
+ if (PQserverVersion(conn) >= 90000)
+ {
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ if (vacopts->full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (vacopts->freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (vacopts->verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (vacopts->and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBufferStr(sql, ")");
+ }
+ else
+ {
+ if (vacopts->full)
+ appendPQExpBufferStr(sql, " FULL");
+ if (vacopts->freeze)
+ appendPQExpBufferStr(sql, " FREEZE");
+ if (vacopts->verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ if (vacopts->and_analyze)
+ appendPQExpBufferStr(sql, " ANALYZE");
+ }
+ }
+
+ if (table)
+ appendPQExpBuffer(sql, " %s;", table);
+ }
static void
help(const char *progname)
***************
*** 436,441 **** help(const char *progname)
--- 928,934 ----
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
" stages for faster results\n"));
printf(_(" -?, --help show this help, then exit\n"));
***************
*** 449,451 **** help(const char *progname)
--- 942,945 ----
printf(_("\nRead the description of the SQL command VACUUM for details.\n"));
printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
}
+
Dilip kumar wrote:
I think if the user has given a list of tables with analyze-in-stages,
doing all the tables for at least one stage and then picking the next
stage is the better solution, so I have done it that way.
Yeah, I think the stages loop should be outermost, as discussed upthread
somewhere -- it's better to have initial stats for all tables as soon as
possible, and improve them later, than have some tables/dbs with no
stats for a longer period while full stats are computed for some
specific tables/database.
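In loop-nesting terms, the agreed behavior is stage-major rather than table-major. A tiny standalone demo (hypothetical; it only prints the dispatch order) makes the difference concrete:

#include <stdio.h>

int
main(void)
{
	const char *tables[] = {"t1", "t2", "t3"};
	int			stage,
				t;

	/* stage-major: every table gets minimal stats before any table
	 * proceeds to the next, more expensive stage */
	puts("stage-major:");
	for (stage = 0; stage < 3; stage++)
		for (t = 0; t < 3; t++)
			printf("  stage %d: ANALYZE %s\n", stage, tables[t]);

	/* table-major: each table runs all three stages before the next
	 * table gets any statistics at all */
	puts("table-major:");
	for (t = 0; t < 3; t++)
		for (stage = 0; stage < 3; stage++)
			printf("  stage %d: ANALYZE %s\n", stage, tables[t]);

	return 0;
}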
I'm tweaking your v24 a bit more now, thanks -- main change is to make
vacuum_one_database be called only to run one analyze stage, so it never
iterates for each stage; callers must iterate calling it multiple times
in those cases. (There's only one callsite that needs changing anyway.)
2. In select_loop, for WIN32 I replaced the TranslateSocketError function with
if (WSAGetLastError() == WSAEINTR)
    errno = EINTR;
since otherwise I would have to expose the TranslateSocketError function from socket.c and include an extra header.
Grumble. Don't like this bit, but moving TranslateSocketError to
src/common is outside the scope of this patch, so okay. (pg_dump's
parallel stuff has the same issue anyway.)
In case you're up for doing some more work later on, there are two ideas
here: move the backend's TranslateSocketError to src/common, and try to
merge pg_dump's select_loop function with the one in this new code. But
that's for another patch anyway (and this new function needs a little
battle-testing, of course.)
I have tested in windows also its working fine.
Great, thanks.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andres Freund wrote:
On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
+	PQsetnonblocking(connSlot[0].connection, 1);
+
+	for (i = 1; i < concurrentCons; i++)
+	{
+		connSlot[i].connection = connectDatabase(dbname, host, port, username,
+										prompt_password, progname, false);
+
+		PQsetnonblocking(connSlot[i].connection, 1);
+		connSlot[i].isFree = true;
+		connSlot[i].sock = PQsocket(connSlot[i].connection);
+	}
Are you sure about this global PQsetnonblocking()? This means that you
might not be able to send queries... And you don't seem to be waiting
for sockets waiting for writes in the select loop - which means you
might end up being stuck waiting for reads when you haven't submitted
the query. I think you might need a more complex select() loop. On nonfree
connections also wait for writes if PQflush() returns != 0.
I removed the PQsetnonblocking() calls. They were a misunderstanding, I
think.
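For reference, a write-aware wait along the lines Andres describes would also watch each socket for writability while libpq reports queued output. A sketch against plain libpq (wait_on_slots() is a hypothetical helper, not the committed code):

#include <sys/select.h>
#include <libpq-fe.h>

static int
wait_on_slots(PGconn **conns, int nconns)
{
	fd_set		rset,
				wset;
	int			maxfd = -1;
	int			i;

	FD_ZERO(&rset);
	FD_ZERO(&wset);

	for (i = 0; i < nconns; i++)
	{
		int			sock = PQsocket(conns[i]);

		FD_SET(sock, &rset);
		/* PQflush() returns 1 while output remains queued (-1 on error),
		 * so in that case also wait for the socket to become writable */
		if (PQflush(conns[i]) != 0)
			FD_SET(sock, &wset);
		if (sock > maxfd)
			maxfd = sock;
	}

	return select(maxfd + 1, &rset, &wset, NULL, NULL);
}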
+/*
+ * GetIdleSlot
+ *	Process the slot list, if any free slot is available then return
+ *	the slotid else perform the select on all the socket's and wait
+ *	until atleast one slot becomes available.
+ */
+static int
+GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+			const char *progname, bool completedb)
+{
+	int			i;
+	fd_set		slotset;
Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.
I tried without the check to use 1500 connections, and select() didn't
even blink -- everything worked fine vacuuming 1500 tables in parallel
on a set of 2000 tables. Not sure what's the actual limit but my
FD_SETSIZE says 1024, so I'm clearly over the limit. (I tried to run it
with -j2000 but the server didn't start with that many connections. I
didn't try any intermediate numbers.) Anyway I added the check.
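The check itself is small; something along these lines in the option handling caps the job count so that every connection's socket fits in an fd_set (a sketch; the committed wording may differ):

#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>		/* for FD_SETSIZE */

static void
check_jobs_limit(int concurrentCons, const char *progname)
{
	/* one socket per connection has to fit in an fd_set */
	if (concurrentCons > FD_SETSIZE - 1)
	{
		fprintf(stderr, "%s: too many parallel jobs requested (maximum: %d)\n",
				progname, FD_SETSIZE - 1);
		exit(1);
	}
}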
I fixed some more minor issues and pushed.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:
I'm tweaking your v24 a bit more now, thanks -- main change is to make
vacuum_one_database be called only to run one analyze stage, so it never
iterates for each stage; callers must iterate calling it multiple times
in those cases. (There's only one callsite that needs changing anyway.)
I made some more changes, particularly so that the TAP tests pass (we
were missing the semicolon when a table name was not specified to
prepare_vacuum_command). I reordered the code in a more sensible
manner, removed the vacuum_database_stage layer (which was pretty
useless), and changed the analyze-in-stages mode: if we pass a valid
stage number, run that stage; if not, then we're not in analyze-in-stages
mode. So I got rid of the boolean flag altogether. I also moved the
per-stage commands and messages back into a struct inside a function,
since there's no need to have them be file-level variables anymore.
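Schematically, dropping the boolean flag means the stage argument alone carries the mode (a sketch; the v24 patch above calls the sentinel ANALYZE_ALL_STAGES, and the names here are assumptions):

#include <stdbool.h>

#define ANALYZE_NO_STAGE	-1	/* assumed sentinel: not in stage mode */
#define ANALYZE_NUM_STAGES	3

/* A valid stage number selects analyze-in-stages; the sentinel (or any
 * out-of-range value) means a plain vacuum/analyze run, so no separate
 * boolean flag is needed. */
static bool
in_stage_mode(int stage)
{
	return stage >= 0 && stage < ANALYZE_NUM_STAGES;
}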
-j1 is now the same as not specifying anything, and vacuum_one_database
uses more common code in the parallel and not-parallel cases: the
not-parallel case is just the parallel case with a single connection, so
the setup and shutdown is mostly the same in both cases.
I pushed the result. Please test, particularly on Windows. If you can
use configure --enable-tap-tests and run them ("make check" in the
src/bin/scripts subdir) that would be good too .. not sure whether
that's expected to work on Windows.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 23 January 2015 21:10, Alvaro Herrera wrote:
In case you're up for doing some more work later on, there are two
ideas
here: move the backend's TranslateSocketError to src/common, and try to
merge pg_dump's select_loop function with the one in this new code.
But that's for another patch anyway (and this new function needs a
little battle-testing, of course.)
Thanks for your effort, I will take it up for the next commitfest.
On 23 January 2015 23:55, Alvaro Herrera wrote:
-j1 is now the same as not specifying anything, and vacuum_one_database
uses more common code in the parallel and not-parallel cases: the
not-parallel case is just the parallel case with a single connection, so
the setup and shutdown is mostly the same in both cases.
I pushed the result. Please test, particularly on Windows. If you can
use configure --enable-tap-tests and run them ("make check" in the
src/bin/scripts subdir) that would be good too .. not sure whether
that's expected to work on Windows.
I have tested on Windows, and it's working fine.
I'm not sure how to enable the TAP tests on Windows; I will check and run them if possible.
Thanks,
Dilip
Hi
I am testing this feature on a relatively complex schema (38619 tables in
the db) and I got a deadlock:
[pavel@localhost bin]$ /usr/local/pgsql/bin/vacuumdb test2 -fz -j 4
vacuumdb: vacuuming database "test2"
vacuumdb: vacuuming of database "test2" failed: ERROR: deadlock detected
DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of
database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of database
194769; blocked by process 24689.
HINT: See server log for query details.
ERROR: deadlock detected
DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of
database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of
database 194769; blocked by process 24689.
Process 24689: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
Process 24690: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
HINT: See server log for query details.
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_class;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
LOG: could not send data to client: Broken pipe
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
FATAL: connection to client lost
LOG: could not send data to client: Broken pipe
ERROR: canceling statement due to user request
FATAL: connection to client lost
Schema | Name | Type | Owner | Size |
Description
------------+-------------------------+-------+----------+------------+-------------
pg_catalog | pg_attribute | table | postgres | 439 MB |
pg_catalog | pg_rewrite | table | postgres | 314 MB |
pg_catalog | pg_proc | table | postgres | 136 MB |
pg_catalog | pg_depend | table | postgres | 133 MB |
pg_catalog | pg_class | table | postgres | 69 MB |
pg_catalog | pg_attrdef | table | postgres | 55 MB |
pg_catalog | pg_trigger | table | postgres | 47 MB |
pg_catalog | pg_type | table | postgres | 31 MB |
pg_catalog | pg_description | table | postgres | 23 MB |
pg_catalog | pg_index | table | postgres | 20 MB |
pg_catalog | pg_constraint | table | postgres | 17 MB |
pg_catalog | pg_shdepend | table | postgres | 17 MB |
pg_catalog | pg_statistic | table | postgres | 928 kB |
pg_catalog | pg_operator | table | postgres | 552 kB |
pg_catalog | pg_collation | table | postgres | 232 kB |
Regards
Pavel Stehule
On Thursday, 29 January 2015, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi
I am testing this feature on a relatively complex schema (38619 tables
in the db) and I got a deadlock [...]
There is a warning in the docs to be careful when using the -f (full)
option with -j.
Regards,
Fabrízio
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello
2015-01-29 10:28 GMT+01:00 Fabrízio de Royes Mello <fabriziomello@gmail.com>:
On Thursday, 29 January 2015, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I am testing this feature on a relatively complex schema (38619 tables
in the db) and I got a deadlock [...]
Shouldn't pessimistic locking be used instead?
Regards
Pavel
Pavel Stehule wrote:
Shouldn't a pessimistic, controlled locking strategy be used instead?
Patches welcome.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
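For illustration only, here is a minimal sketch of the pessimistic direction
Pavel's question hints at: funnel every pg_catalog relation through a single
connection so their exclusive locks are requested strictly one at a time, while
ordinary tables still fan out across the idle workers. ParallelSlot,
GetIdleSlot() and run_vacuum_command() are hypothetical stand-ins, not symbols
from the patch.

#include <string.h>

/* Hypothetical stand-in for vacuumdb's per-connection state. */
typedef struct ParallelSlot
{
    void   *connection;         /* PGconn * in the real client code */
} ParallelSlot;

/* Hypothetical helpers: wait for an idle worker / issue one VACUUM. */
extern ParallelSlot *GetIdleSlot(ParallelSlot *slots, int numslots);
extern void run_vacuum_command(void *connection, const char *table);

static void
vacuum_tables_pessimistic(ParallelSlot *slots, int numslots,
                          const char **tables, int ntables)
{
    int     i;

    for (i = 0; i < ntables; i++)
    {
        ParallelSlot *slot;

        /*
         * Catalog relations are the ones that deadlocked above under
         * VACUUM FULL, so send them all through slot 0: one connection
         * runs one command at a time, so their exclusive locks are
         * requested serially instead of concurrently.
         */
        if (strncmp(tables[i], "pg_catalog.", 11) == 0)
            slot = &slots[0];
        else
            slot = GetIdleSlot(slots, numslots);    /* any idle worker */

        run_vacuum_command(slot->connection, tables[i]);
    }
}

This narrows rather than provably eliminates the window (a VACUUM FULL of a
user table still updates pg_class rows), which may be why what actually
shipped is the documentation warning about combining -f with -j mentioned
earlier in the thread.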
On Fri, Jan 2, 2015 at 3:18 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Okay, I have marked this patch as "Ready For Committer"
Notes for Committer -
There is one behavioural difference in the handling of the --analyze-in-stages
switch: when individual tables (via the -t option) are analyzed using this
switch, the patch will (in the case of concurrent jobs) process all the given
tables for stage 1, then all of them for stage 2, and so on, whereas the
unpatched code processes all three stages table by table (table 1 through all
three stages, then table 2 through all three stages, and so on); see the
sketch of both orderings below. I think the new behaviour is okay, as the same
thing is done when this utility vacuums a whole database. As there was no
input from any committer on this point, I thought it better to proceed as-is
rather than wait longer on just this one point.
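To make the difference concrete, here is a minimal sketch of the two loop
orderings described above; analyze_table_stage() and the function names are
illustrative only, not the patch's actual code.

/* Hypothetical helper: run the ANALYZE appropriate to one stage. */
extern void analyze_table_stage(const char *table, int stage);

#define NUM_STAGES 3

/* Unpatched vacuumdb: table-major, each table runs all three stages. */
static void
analyze_in_stages_table_major(const char **tables, int ntables)
{
    int     t,
            stage;

    for (t = 0; t < ntables; t++)
        for (stage = 0; stage < NUM_STAGES; stage++)
            analyze_table_stage(tables[t], stage);
}

/*
 * Patched vacuumdb with -j: stage-major, every table at stage 1, then
 * every table at stage 2, and so on.  Each inner loop forms a batch
 * that can be fanned out across the worker connections at once.
 */
static void
analyze_in_stages_stage_major(const char **tables, int ntables)
{
    int     t,
            stage;

    for (stage = 0; stage < NUM_STAGES; stage++)
        for (t = 0; t < ntables; t++)
            analyze_table_stage(tables[t], stage);
}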
Friendly greetings !
What's the status of parallel clusterdb, please?
I'm having fun (and trouble) applying the vacuumdb patch to clusterdb.
This thread also talks about unifying the code for parallelizing clusterdb and
reindex.
Was anything done about it? I can't see it, and my work currently involves a
lot of copy/pasting from vacuumdb to clusterdb :)
And no, I'm pretty sure I don't have the required PostgreSQL knowledge to do
this unification if it isn't done already.
Thank you :)
(And sorry about the thread-necromancy)
--
Laurent "ker2x" Laborde
DBA \o/ Gandi.net
Laurent Laborde wrote:
Friendly greetings !
What's the status of parallel clusterdb, please?
I'm having fun (and trouble) applying the vacuumdb patch to clusterdb.
This thread also talks about unifying the code for parallelizing clusterdb and
reindex.
Was anything done about it? I can't see it, and my work currently involves a
lot of copy/pasting from vacuumdb to clusterdb :)
Honestly, I have to wonder whether there are really valid use cases for
clusterdb. Are you actually using it and want to see it improved, or is
this just an academic exercise?
And no, I'm pretty sure I don't have the required PostgreSQL knowledge to do
this unification if it isn't done already.
You may or may not lack it *now*, but that doesn't mean you will
continue to lack it forever.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 23 Jul 2015 at 19:27, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
Laurent Laborde wrote:
Friendly greetings !
What's the status of parallel clusterdb, please?
I'm having fun (and trouble) applying the vacuumdb patch to clusterdb.
This thread also talks about unifying the code for parallelizing clusterdb and
reindex.
Was anything done about it? I can't see it, and my work currently involves a
lot of copy/pasting from vacuumdb to clusterdb :)
Honestly, I have to wonder whether there are really valid use cases for
clusterdb. Are you actually using it and want to see it improved, or is
this just an academic exercise?
Purely academic. I don't use it.
And no, I'm pretty sure I don't have the required PostgreSQL knowledge to do
this unification if it isn't done already.
You may or may not lack it *now*, but that doesn't mean you will
continue to lack it forever.
That's why i'm working on it :)
--
Laurent "ker2x" Laborde
DBA \o/ gandi.net