Allowing parallel pg_restore from pipe
Hi All,
Currently the -j option to pg_restore, which allows for parallelization of
the restore, can only be used if the input file is a regular file and not,
for example, a pipe. However, this is a pretty common case for us (usually
in the form of pg_dump | pg_restore to copy an individual database, or some
tables thereof, from one machine to another). While there's no good way to
parallelize the data load steps when reading from a pipe, the index and
constraint building can still be parallelized, and since they are generally
CPU bound on our machines we've found quite a bit of speedup from doing so.
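For concreteness, the kind of invocation we have in mind looks roughly like
the following (host and database names are just placeholders):

# Stream a custom-format dump straight into a parallel restore on another
# host. pg_restore currently can't use -j here because it can't seek or
# reopen stdin; with the patch the index/constraint steps still run in parallel.
pg_dump -Fc -h source-host source_db \
  | pg_restore -j 8 -h dest-host -d dest_db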
Attached are two diffs off of the REL9_2_4 tag that I've been using. The
first is a simple change that serially loads the data section before
handing off the remainder of the restore to the existing parallelized
restore code (the .ALT. diff). The second, which gets more parallelization
but is a bit more of a change, uses the existing dependency analysis code
to allow index building etc. to occur in parallel with data loading. The
data loading tasks are still performed serially in the main thread, but
non-data loading tasks are scheduled in parallel as their dependencies are
satisfied (with the caveat that the main thread can only dispatch new tasks
between data loads).
Anyways, the question is if people think this is generally useful. If so I
can clean up the preferred choice a bit and rebase it off of master, etc.
Tim
Attachments:
0003-patch-pg_restore-to-allow-parallel-restore-when-the.ALT.patch (application/octet-stream)
From 8b205bd8a109892779dc804fb75d06d9a71db396 Mon Sep 17 00:00:00 2001
From: Timothy Garnett <tgarnett@panjiva.com>
Date: Tue, 23 Apr 2013 11:21:25 -0400
Subject: [PATCH 1/1] patch pg_restore to support partial parallel restore
(parallel restore of indexes/constraints, but not data) of custom format
dumps received via stdin
---
src/bin/pg_dump/pg_backup_archiver.c | 66 ++++++++++++++++++++++--------------
src/bin/pg_dump/pg_backup_custom.c | 5 +++
2 files changed, 45 insertions(+), 26 deletions(-)
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index c176b65..537b0c1 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -115,7 +115,6 @@ typedef struct _outputContext
/* translator: this is a module name */
static const char *modulename = gettext_noop("archiver");
-
static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
const int compression, ArchiveMode mode);
static void _getObjectDescription(PQExpBuffer buf, TocEntry *te,
@@ -316,18 +315,12 @@ RestoreArchive(Archive *AHX)
if (parallel_mode)
{
/* We haven't got round to making this work for all archive formats */
- if (AH->ClonePtr == NULL || AH->ReopenPtr == NULL)
+ if (AH->ClonePtr == NULL)
exit_horribly(modulename, "parallel restore is not supported with this archive file format\n");
/* Doesn't work if the archive represents dependencies as OIDs */
if (AH->version < K_VERS_1_8)
exit_horribly(modulename, "parallel restore is not supported with archives made by pre-8.0 pg_dump\n");
-
- /*
- * It's also not gonna work if we can't reopen the input file, so
- * let's try that immediately.
- */
- (AH->ReopenPtr) (AH);
}
/*
@@ -3465,12 +3458,19 @@ on_exit_close_archive(Archive *AHX)
* Main engine for parallel restore.
*
* Work is done in three phases.
- * First we process all SECTION_PRE_DATA tocEntries, in a single connection,
- * just as for a standard restore. Second we process the remaining non-ACL
- * steps in parallel worker children (threads on Windows, processes on Unix),
- * each of which connects separately to the database. Finally we process all
- * the ACL entries in a single connection (that happens back in
- * RestoreArchive).
+ * 1) First we process all SECTION_PRE_DATA tocEntries, in a single connection,
+ * just as for a standard restore. If we are unable to reopen the archive
+ * (say from STDIN) we also do SECTION_DATA tocEntries as well.
+ * 2) Second we process the remaining non-ACL steps in parallel worker children
+ * (threads on Windows, processes on Unix), each of which connects separately
+ * to the database.
+ * 3) Finally we process all the ACL entries in a single connection
+ * (that happens back in RestoreArchive).
+ *
+ * As a future improvement, in the case where we are unable to reopen the
+ * archive, it should be possible to do SECTION_POST_DATA entries in parallel
+ * with the SECTION_DATA entries as their dependencies are met. But that
+ * requires some re-working of the flow here.
*/
static void
restore_toc_entries_parallel(ArchiveHandle *AH)
@@ -3520,23 +3520,32 @@ restore_toc_entries_parallel(ArchiveHandle *AH)
/* NB: process-or-continue logic must be the inverse of loop below */
if (next_work_item->section != SECTION_PRE_DATA)
{
- /* DATA and POST_DATA items are just ignored for now */
- if (next_work_item->section == SECTION_DATA ||
- next_work_item->section == SECTION_POST_DATA)
+ /* Detect whether we can re-open the archive (needed for parallel data
+ * restore). If not, process SECTION_DATA entries here as well. */
+ if (next_work_item->section == SECTION_DATA && AH->ReopenPtr == NULL)
{
- skipped_some = true;
- continue;
+ /* fall through to process toc entry */
}
else
{
- /*
- * SECTION_NONE items, such as comments, can be processed now
- * if we are still in the PRE_DATA part of the archive. Once
- * we've skipped any items, we have to consider whether the
- * comment's dependencies are satisfied, so skip it for now.
- */
- if (skipped_some)
+ /* DATA and POST_DATA items are just ignored for now */
+ if (next_work_item->section == SECTION_DATA ||
+ next_work_item->section == SECTION_POST_DATA)
+ {
+ skipped_some = true;
continue;
+ }
+ else
+ {
+ /*
+ * SECTION_NONE items, such as comments, can be processed now
+ * if we are still in the PRE_DATA part of the archive. Once
+ * we've skipped any items, we have to consider whether the
+ * comment's dependencies are satisfied, so skip it for now.
+ */
+ if (skipped_some)
+ continue;
+ }
}
}
@@ -3595,6 +3604,11 @@ restore_toc_entries_parallel(ArchiveHandle *AH)
/* All PRE_DATA items were dealt with above */
continue;
}
+ if (next_work_item->section == SECTION_DATA && AH->ReopenPtr == NULL)
+ {
+ /* If not able to reopen we handled SECTION_DATA above */
+ continue;
+ }
if (next_work_item->section == SECTION_DATA ||
next_work_item->section == SECTION_POST_DATA)
{
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index f7dc5be..4c9aa44 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -184,6 +184,11 @@ InitArchiveFmt_Custom(ArchiveHandle *AH)
ctx->dataStart = _getFilePos(AH, ctx);
}
+ /* clear the ReopenPtr if we can't actually reopen the file (say STDIN) */
+ if (AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0 || !ctx->hasSeek)
+ {
+ AH->ReopenPtr = NULL;
+ }
}
/*
--
1.8.2.1
0003-patch-pg_restore-to-allow-parallel-restore-when-the-.patch (application/octet-stream)
From 3c40cbcf33478e99ee70be7667cc89640049f014 Mon Sep 17 00:00:00 2001
From: Timothy Garnett <tgarnett@panjiva.com>
Date: Tue, 23 Apr 2013 18:22:51 -0400
Subject: [PATCH 3/3] patch pg_restore to allow parallel restore when the
archive file is piped in via STDIN; data is still loaded serially even in
parallel mode, but we parallelize indexes and constraints and intermix them
with the data loading; because data loading blocks dispatching it's possible
we could do slightly better, but this is pretty good
---
src/bin/pg_dump/pg_backup_archiver.c | 219 +++++++++++++++++++++++------------
src/bin/pg_dump/pg_backup_custom.c | 5 +
2 files changed, 151 insertions(+), 73 deletions(-)
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index c176b65..df6fe22 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -151,6 +151,8 @@ static void RestoreOutput(ArchiveHandle *AH, OutputContext savedContext);
static int restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
RestoreOptions *ropt, bool is_parallel);
static void restore_toc_entries_parallel(ArchiveHandle *AH);
+static void parallel_restore_close_db_connection(ArchiveHandle *AH);
+static void parallel_restore_reestablish_db_connection(ArchiveHandle *AH);
static thandle spawn_restore(RestoreArgs *args);
static thandle reap_child(ParallelSlot *slots, int n_slots, int *work_status);
static bool work_in_progress(ParallelSlot *slots, int n_slots);
@@ -160,7 +162,7 @@ static void par_list_append(TocEntry *l, TocEntry *te);
static void par_list_remove(TocEntry *te);
static TocEntry *get_next_work_item(ArchiveHandle *AH,
TocEntry *ready_list,
- ParallelSlot *slots, int n_slots);
+ ParallelSlot *slots, int n_slots, bool pref_non_data);
static parallel_restore_result parallel_restore(RestoreArgs *args);
static void mark_work_done(ArchiveHandle *AH, TocEntry *ready_list,
thandle worker, int status,
@@ -171,6 +173,7 @@ static void repoint_table_dependencies(ArchiveHandle *AH);
static void identify_locking_dependencies(ArchiveHandle *AH, TocEntry *te);
static void reduce_dependencies(ArchiveHandle *AH, TocEntry *te,
TocEntry *ready_list);
+static void add_dependency(TocEntry *te, DumpId refId);
static void mark_create_done(ArchiveHandle *AH, TocEntry *te);
static void inhibit_data_for_failed_table(ArchiveHandle *AH, TocEntry *te);
static ArchiveHandle *CloneArchive(ArchiveHandle *AH);
@@ -316,18 +319,12 @@ RestoreArchive(Archive *AHX)
if (parallel_mode)
{
/* We haven't got round to making this work for all archive formats */
- if (AH->ClonePtr == NULL || AH->ReopenPtr == NULL)
+ if (AH->ClonePtr == NULL)
exit_horribly(modulename, "parallel restore is not supported with this archive file format\n");
/* Doesn't work if the archive represents dependencies as OIDs */
if (AH->version < K_VERS_1_8)
exit_horribly(modulename, "parallel restore is not supported with archives made by pre-8.0 pg_dump\n");
-
- /*
- * It's also not gonna work if we can't reopen the input file, so
- * let's try that immediately.
- */
- (AH->ReopenPtr) (AH);
}
/*
@@ -3461,16 +3458,60 @@ on_exit_close_archive(Archive *AHX)
on_exit_nicely(archive_close_connection, &shutdown_info);
}
+void
+parallel_restore_close_db_connection(ArchiveHandle *AH)
+{
+ DisconnectDatabase(&AH->public);
+
+ /* blow away any transient state from the old connection */
+ if (AH->currUser)
+ free(AH->currUser);
+ AH->currUser = NULL;
+ if (AH->currSchema)
+ free(AH->currSchema);
+ AH->currSchema = NULL;
+ if (AH->currTablespace)
+ free(AH->currTablespace);
+ AH->currTablespace = NULL;
+ AH->currWithOids = -1;
+}
+
+void
+parallel_restore_reestablish_db_connection(ArchiveHandle *AH)
+{
+ RestoreOptions *ropt = AH->ropt;
+
+ ConnectDatabase((Archive *) AH, ropt->dbname,
+ ropt->pghost, ropt->pgport, ropt->username,
+ ropt->promptPassword);
+
+ _doSetFixedOutputState(AH);
+}
+
/*
* Main engine for parallel restore.
*
* Work is done in three phases.
- * First we process all SECTION_PRE_DATA tocEntries, in a single connection,
- * just as for a standard restore. Second we process the remaining non-ACL
- * steps in parallel worker children (threads on Windows, processes on Unix),
- * each of which connects separately to the database. Finally we process all
- * the ACL entries in a single connection (that happens back in
- * RestoreArchive).
+ * 1) First we process all SECTION_PRE_DATA tocEntries, in a single connection,
+ * just as for a standard restore.
+ * 2) Second we process the remaining non-ACL steps in parallel worker children
+ * (threads on Windows, processes on Unix), each of which connects separately
+ * to the database. In the case where we are unable to reopen the archive
+ * (STDIN) we add a dependency between each SECTION_DATA item and the next
+ * so they are processed one at a time and in order obviating the need to
+ * reopen the archive. This is handled by fix_dependencies.
+ * 3) Finally we process all the ACL entries in a single connection
+ * (that happens back in RestoreArchive).
+ *
+ * NOTE - in the case where we are unable to reopen the archive, we load data
+ * in the parent thread which blocks dispatching other work entries. We do this
+ * to avoid issues with the archive file state and the need to communicate that
+ * up to the parent thread from any data loading child. To keep the parallelism
+ * up we favor non-data loading tasks over data loading tasks, but this does
+ * limit parallelism slightly over the optimal case where one thread would
+ * continually load data. However tracking that would seem to involve a lot
+ * of extra complexity around inter-process communication and this gets us
+ * pretty close.
*/
static void
restore_toc_entries_parallel(ArchiveHandle *AH)
@@ -3555,7 +3596,7 @@ restore_toc_entries_parallel(ArchiveHandle *AH)
* mainly to ensure that we don't exceed the specified number of parallel
* connections.
*/
- DisconnectDatabase(&AH->public);
+ parallel_restore_close_db_connection(AH);
/*
* Set the pstate in the shutdown_info. The exit handler uses pstate if
@@ -3563,18 +3604,6 @@ restore_toc_entries_parallel(ArchiveHandle *AH)
*/
shutdown_info.pstate = pstate;
- /* blow away any transient state from the old connection */
- if (AH->currUser)
- free(AH->currUser);
- AH->currUser = NULL;
- if (AH->currSchema)
- free(AH->currSchema);
- AH->currSchema = NULL;
- if (AH->currTablespace)
- free(AH->currTablespace);
- AH->currTablespace = NULL;
- AH->currWithOids = -1;
-
/*
* Initialize the lists of pending and ready items. After this setup, the
* pending list is everything that needs to be done but is blocked by one
@@ -3623,8 +3652,10 @@ restore_toc_entries_parallel(ArchiveHandle *AH)
ahlog(AH, 1, "entering main parallel loop\n");
+ /* Since we'll block while loading data when we can't reopen the file,
+ * we prefer non-data work items in the selection here for parallelism */
while ((next_work_item = get_next_work_item(AH, &ready_list,
- slots, n_slots)) != NULL ||
+ slots, n_slots, AH->ReopenPtr == NULL)) != NULL ||
work_in_progress(slots, n_slots))
{
if (next_work_item != NULL)
@@ -3645,29 +3676,52 @@ restore_toc_entries_parallel(ArchiveHandle *AH)
if ((next_slot = get_next_slot(slots, n_slots)) != NO_SLOT)
{
- /* There is work still to do and a worker slot available */
- thandle child;
- RestoreArgs *args;
+ /* to avoid the need for inter-process communication around file
+ * positioning etc. we do all data loading in the main loop */
+ if (AH->ReopenPtr == NULL && next_work_item->section == SECTION_DATA)
+ {
+ /* Load data, this will block dispatching work items till finished */
+ par_list_remove(next_work_item);
+ /* re-establish db connection; we know there's at least one
+ * connection slot free, so we haven't exceeded the requested max */
+ shutdown_info.pstate = NULL;
+ parallel_restore_reestablish_db_connection(AH);
- ahlog(AH, 1, "launching item %d %s %s\n",
- next_work_item->dumpId,
- next_work_item->desc, next_work_item->tag);
+ (void) restore_toc_entry(AH, next_work_item, ropt, false);
+ reduce_dependencies(AH, next_work_item, &ready_list);
- par_list_remove(next_work_item);
+ /* Release db connection so can be used in parallel children */
+ parallel_restore_close_db_connection(AH);
+ shutdown_info.pstate = pstate;
+
+ continue;
+ }
+ else
+ {
+ /* There is work still to do and a worker slot available */
+ thandle child;
+ RestoreArgs *args;
- /* this memory is dealloced in mark_work_done() */
- args = pg_malloc(sizeof(RestoreArgs));
- args->AH = CloneArchive(AH);
- args->te = next_work_item;
- args->pse = &pstate->pse[next_slot];
+ ahlog(AH, 1, "launching item %d %s %s\n",
+ next_work_item->dumpId,
+ next_work_item->desc, next_work_item->tag);
- /* run the step in a worker child */
- child = spawn_restore(args);
+ par_list_remove(next_work_item);
- slots[next_slot].child_id = child;
- slots[next_slot].args = args;
+ /* this memory is dealloced in mark_work_done() */
+ args = pg_malloc(sizeof(RestoreArgs));
+ args->AH = CloneArchive(AH);
+ args->te = next_work_item;
+ args->pse = &pstate->pse[next_slot];
- continue;
+ /* run the step in a worker child */
+ child = spawn_restore(args);
+
+ slots[next_slot].child_id = child;
+ slots[next_slot].args = args;
+
+ continue;
+ }
}
}
@@ -3703,11 +3757,7 @@ restore_toc_entries_parallel(ArchiveHandle *AH)
/*
* Now reconnect the single parent connection.
*/
- ConnectDatabase((Archive *) AH, ropt->dbname,
- ropt->pghost, ropt->pgport, ropt->username,
- ropt->promptPassword);
-
- _doSetFixedOutputState(AH);
+ parallel_restore_reestablish_db_connection(AH);
/*
* Make sure there is no non-ACL work left due to, say, circular
@@ -3907,35 +3957,18 @@ par_list_remove(TocEntry *te)
* The caller must do that after successfully dispatching the item.
*
* pref_non_data is for an alternative selection algorithm that gives
- * preference to non-data items if there is already a data load running.
- * It is currently disabled.
+ * preference to non-data items
*/
static TocEntry *
get_next_work_item(ArchiveHandle *AH, TocEntry *ready_list,
- ParallelSlot *slots, int n_slots)
+ ParallelSlot *slots, int n_slots, bool pref_non_data)
{
- bool pref_non_data = false; /* or get from AH->ropt */
TocEntry *data_te = NULL;
TocEntry *te;
int i,
k;
/*
- * Bogus heuristics for pref_non_data
- */
- if (pref_non_data)
- {
- int count = 0;
-
- for (k = 0; k < n_slots; k++)
- if (slots[k].args->te != NULL &&
- slots[k].args->te->section == SECTION_DATA)
- count++;
- if (n_slots == 0 || count * 4 < n_slots)
- pref_non_data = false;
- }
-
- /*
* Search the ready_list until we find a suitable item.
*/
for (te = ready_list->par_next; te != ready_list; te = te->par_next)
@@ -4102,6 +4135,10 @@ mark_work_done(ArchiveHandle *AH, TocEntry *ready_list,
/*
* Process the dependency information into a form useful for parallel restore.
*
+ * In the case where we can't reopen the archive file (STDIN) we add
+ * dependencies to ensure the data sections are dumped serially and in order
+ * to match file pointer progression.
+ *
* This function takes care of fixing up some missing or badly designed
* dependencies, and then prepares subsidiary data structures that will be
* used in the main parallel-restore logic, including:
@@ -4116,6 +4153,7 @@ static void
fix_dependencies(ArchiveHandle *AH)
{
TocEntry *te;
+ TocEntry *last_data_te = NULL;
int i;
/*
@@ -4155,10 +4193,7 @@ fix_dependencies(ArchiveHandle *AH)
{
if (strcmp(te2->desc, "BLOBS") == 0)
{
- te->dependencies = (DumpId *) pg_malloc(sizeof(DumpId));
- te->dependencies[0] = te2->dumpId;
- te->nDeps++;
- te->depCount++;
+ add_dependency(te, te2->dumpId);
break;
}
}
@@ -4168,6 +4203,25 @@ fix_dependencies(ArchiveHandle *AH)
}
/*
+ * Add dependencies to ensure SECTION_DATA items are dumped serially and in
+ * order in the case where we are unable to reopen the archive file (STDIN)
+ */
+ if (AH->ReopenPtr == NULL)
+ {
+ for (te = AH->toc->next; te != AH->toc; te = te->next)
+ {
+ if (te->section == SECTION_DATA)
+ {
+ if (last_data_te)
+ add_dependency(te, last_data_te->dumpId);
+
+ last_data_te = te;
+ }
+ }
+ }
+
+
+ /*
* At this point we start to build the revDeps reverse-dependency arrays,
* so all changes of dependencies must be complete.
*/
@@ -4341,6 +4395,25 @@ reduce_dependencies(ArchiveHandle *AH, TocEntry *te, TocEntry *ready_list)
}
/*
+ * Adds a dependency to a TocEntry. We use this to add dependencies that are
+ * known to be missing in the archive and setup constraints on the order in
+ * which table data is restored during parallel restore when we are unable to
+ * reopen the archive file.
+ */
+static void
+add_dependency(TocEntry *te, DumpId refId)
+{
+ if (te->dependencies == NULL)
+ te->dependencies = (DumpId *) pg_malloc(sizeof(DumpId));
+ else
+ te->dependencies = (DumpId *) pg_realloc(te->dependencies, sizeof(DumpId) * (te->nDeps + 1));
+
+ te->dependencies[te->nDeps] = refId;
+ te->nDeps++;
+ te->depCount++;
+}
+
+/*
* Set the created flag on the DATA member corresponding to the given
* TABLE member
*/
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index f7dc5be..4c9aa44 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -184,6 +184,11 @@ InitArchiveFmt_Custom(ArchiveHandle *AH)
ctx->dataStart = _getFilePos(AH, ctx);
}
+ /* clear the ReopenPtr if we can't actually reopen the file (say STDIN) */
+ if (AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0 || !ctx->hasSeek)
+ {
+ AH->ReopenPtr = NULL;
+ }
}
/*
--
1.8.2.1
On 04/23/2013 07:53 PM, Timothy Garnett wrote:
Hi All,
Currently the -j option to pg_restore, which allows for
parallelization in the restore, can only be used if the input file is
a regular file and not, for ex., a pipe. However this is a pretty
common occurrence for us (usually in the form of pg_dump | pg_restore
to copy an individual database or some tables thereof from one machine
to another). While there's no good way to parallelize the data load
steps when reading from a pipe, the index and constraint building can
still be parallelized and as they are generally CPU bound on our
machines we've found quite a bit of speedup from doing so.
Attached is two diffs off of the REL9_2_4 tag that I've been using.
The first is a simple change that serially loads the data section
before handing off the remainder of the restore to the existing
parallelized restore code (the .ALT. diff). The second which gets
more parallelization but is a bit more of a change uses the existing
dependency analysis code to allow index building etc. to occur in
parallel with data loading. The data loading tasks are still performed
serially in the main thread, but non-data loading tasks are scheduled
in parallel as their dependencies are satisfied (with the caveat that
the main thread can only dispatch new tasks between data loads).
Anyways, the question is if people think this is generally useful. If
so I can clean up the preferred choice a bit and rebase it off of
master, etc.
I don't think these are bad ideas at all, and probably worth doing. Note
that there are some fairly hefty changes affecting this code in master,
so your rebasing could be tricky.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
On 04/23/2013 07:53 PM, Timothy Garnett wrote:
Anyways, the question is if people think this is generally useful. If so
I can clean up the preferred choice a bit and rebase it off of master,
etc.
I find this idea very useful yes.
Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all,
it just sounds like something worthy of doing too.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 04/24/2013 03:40 PM, Dimitri Fontaine wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
On 04/23/2013 07:53 PM, Timothy Garnett wrote:
Anyways, the question is if people think this is generally useful. If so
I can clean up the preferred choice a bit and rebase it off of master,
etc.
I find this idea very useful yes.
Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all,
it just sound something worthy of doing too.
That's not going to work, the output from parallel pg_dump is inherently
multiple streams. That's why it ONLY supports directory format, and not
even custom format on disk, let alone a pipe.
cheers
andrew
On 04/24/2013 03:49 PM, Andrew Dunstan wrote:
On 04/24/2013 03:40 PM, Dimitri Fontaine wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
On 04/23/2013 07:53 PM, Timothy Garnett wrote:
Anyways, the question is if people think this is generally useful.
If so
I can clean up the preferred choice a bit and rebase it off of master,
etc.
I find this idea very useful yes.
Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all,
it just sound something worthy of doing too.
That's not going to work, the output from parallel pg_dump is
inherently multiple streams. That's why it ONLY supports directory
format, and not even custom format on disk, let alone a pipe.
What might make sense is something like pg_dump_restore which would have
no intermediate storage at all, just pump the data etc from one source
to another in parallel. But I pity the poor guy who has to write it :-)
cheers
andrew
On 04/24/2013 09:51 PM, Andrew Dunstan wrote:
On 04/24/2013 03:49 PM, Andrew Dunstan wrote:
On 04/24/2013 03:40 PM, Dimitri Fontaine wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
On 04/23/2013 07:53 PM, Timothy Garnett wrote:
Anyways, the question is if people think this is generally useful.
If so
I can clean up the preferred choice a bit and rebase it off of master,
etc.
I find this idea very useful yes.
Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all,
it just sound something worthy of doing too.
That's not going to work, the output from parallel pg_dump is
inherently multiple streams. That's why it ONLY supports directory
format, and not even custom format on disk, let alone a pipe.
What might make sense is something like pg_dump_restore which would have
no intermediate storage at all, just pump the data etc from one source
to another in parallel. But I pity the poor guy who has to write it :-)
hmm, pretty sure that Joachim's initial patch for parallel dump actually
had a PoC for something very similar to that...
Stefan
On Wed, Apr 24, 2013 at 4:05 PM, Stefan Kaltenbrunner <
stefan@kaltenbrunner.cc> wrote:
What might make sense is something like pg_dump_restore which would have
no intermediate storage at all, just pump the data etc from one source
to another in parallel. But I pity the poor guy who has to write it :-)
hmm pretty sure that Joachims initial patch for parallel dump actually
had a PoC for something very similiar to that...
That's right, I implemented that as its own output format and named it
"migrator" I think, which wouldn't write each stream to a file as the
directory output format does but instead pumps it back into a restore
client.
Actually I think the logic was even reversed, it was a parallel restore
that got the data from internally calling pg_dump functionality instead of
from reading files... The neat thing about this approach was that the order
was optimized and correct, i.e. largest tables start first and dependencies
get resolved in the right order.
I could revisit that patch for 9.4 if enough people are interested.
Joachim
On Wed, Apr 24, 2013 at 6:47 PM, Joachim Wieland <joe@mcknight.de> wrote:
On Wed, Apr 24, 2013 at 4:05 PM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:What might make sense is something like pg_dump_restore which would have
no intermediate storage at all, just pump the data etc from one source
to another in parallel. But I pity the poor guy who has to write it :-)
hmm pretty sure that Joachims initial patch for parallel dump actually
had a PoC for something very similiar to that...
That's right, I implemented that as an own output format and named it
"migrator" I think, which wouldn't write each stream to a file as the
directory output format does but that instead pumps it back into a restore
client.
Actually I think the logic was even reversed, it was a parallel restore that
got the data from internally calling pg_dump functionality instead of from
reading files... The neat thing about this approach was that the order was
optimized and correct, i.e. largest tables start first and dependencies get
resolved in the right order.
I could revisit that patch for 9.4 if enough people are interested.
Indeed... I've wasted hours copying databases for test environments,
when that could've been an hour (singular).
As the OP, I'll just note that my organization would definitely find use
for a parallel migrator tool as long as it supported doing a selection of
tables (i.e. -t / -T) in addition to the whole database, and it supported
(or we were able to patch in) an option to cluster as part of the migration
(the equivalent of something like
https://github.com/tgarnett/postgres/commit/cc320a71 ).
Tim
On Wed, Apr 24, 2013 at 5:47 PM, Joachim Wieland <joe@mcknight.de> wrote:
On Wed, Apr 24, 2013 at 4:05 PM, Stefan Kaltenbrunner <
stefan@kaltenbrunner.cc> wrote:
What might make sense is something like pg_dump_restore which would have
no intermediate storage at all, just pump the data etc from one source
to another in parallel. But I pity the poor guy who has to write it :-)
hmm pretty sure that Joachims initial patch for parallel dump actually
had a PoC for something very similiar to that...
That's right, I implemented that as an own output format and named it
"migrator" I think, which wouldn't write each stream to a file as the
directory output format does but that instead pumps it back into a restore
client.
Actually I think the logic was even reversed, it was a parallel restore
that got the data from internally calling pg_dump functionality instead of
from reading files... The neat thing about this approach was that the order
was optimized and correct, i.e. largest tables start first and dependencies
get resolved in the right order.
I could revisit that patch for 9.4 if enough people are interested.
Joachim
On 04/25/2013 12:56 PM, Timothy Garnett wrote:
As the OP, I'll just note that my organization would definitely find use
for a parallel migrator tool as long as it supported doing a selection
of tables (i.e. -t / -T) in addition to the whole database and it
supported or we were able to patch in an option to cluster as part of
the migration (the equivalent of something like
https://github.com/tgarnett/postgres/commit/cc320a71 ).
If you need something like this short term, we actually found a way to
do it ourselves for a migration we performed back in October. The secret
is xargs with the -P option:
xargs -I{} -P 8 -a table-list.txt \
bash -c "pg_dump -Fc -t {} my_db | pg_restore -h remote -d my_db"
Fill table-list.txt with as many, or as few tables as you want. The
above example would give you 8 parallel threads. Well equipped systems
may be able to increase this.
Admittedly it's a gross hack, but it works. :)
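If it helps, one possible way to build table-list.txt is a quick psql query
(database name is a placeholder; this assumes plain lower-case table names
and skips the system schemas):

# list schema-qualified table names, one per line
psql -At -d my_db \
  -c "SELECT schemaname || '.' || tablename FROM pg_tables
      WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" \
  > table-list.txt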
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
If you need something like this short term, we actually found a way to do it
ourselves for a migration we performed back in October. The secret is xargs
with the -P option:
xargs -I{} -P 8 -a table-list.txt \
bash -c "pg_dump -Fc -t {} my_db | pg_restore -h remote -d my_db"
Fill table-list.txt with as many, or as few tables as you want. The above
example would give you 8 parallel threads. Well equipped systems may be able
to increase this.
Admittedly it's a gross hack, but it works. :)
I think you'd have to be real careful around foreign key constraints
for that to work.
Tim
On 05/16/2013 12:16 PM, Timothy Garnett wrote:
I think you'd have to be real careful around foreign key constraints
for that to work.
Not especially. All you need to do is bootstrap the database with a
bunch of empty table targets (no constraints, keys, etc), then restore
with the xargs command. Then you can apply the constraints, keys, and
indexes later by doing a schema-only parallel pg_restore.
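Spelling that out as a rough sketch (host and database names are
placeholders, and I'm using data-only dumps in step 2 so the pre-created
tables don't get re-created):

# 1. Dump the schema once, then create just the bare tables on the target.
pg_dump -Fc -s my_db > schema.dump
pg_restore -h remote -d my_db --section=pre-data schema.dump

# 2. Load table data in parallel, one pg_dump | pg_restore pipe per table.
xargs -I{} -P 8 -a table-list.txt \
  bash -c "pg_dump -Fc -a -t {} my_db | pg_restore -h remote -d my_db"

# 3. Add indexes, keys and other constraints afterwards, in parallel.
pg_restore -j 8 -h remote -d my_db --section=post-data schema.dump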
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
On Wed, Apr 24, 2013 at 03:33:42PM -0400, Andrew Dunstan wrote:
On 04/23/2013 07:53 PM, Timothy Garnett wrote:
Hi All,
Currently the -j option to pg_restore, which allows for
parallelization in the restore, can only be used if the input file
is a regular file and not, for ex., a pipe. However this is a
pretty common occurrence for us (usually in the form of pg_dump |
pg_restore to copy an individual database or some tables thereof
from one machine to another). While there's no good way to
parallelize the data load steps when reading from a pipe, the
index and constraint building can still be parallelized and as
they are generally CPU bound on our machines we've found quite a
bit of speedup from doing so.
Attached is two diffs off of the REL9_2_4 tag that I've been
using. The first is a simple change that serially loads the data
section before handing off the remainder of the restore to the
existing parallelized restore code (the .ALT. diff). The second
which gets more parallelization but is a bit more of a change uses
the existing dependency analysis code to allow index building etc.
to occur in parallel with data loading. The data loading tasks are
still performed serially in the main thread, but non-data loading
tasks are scheduled in parallel as their dependencies are
satisfied (with the caveat that the main thread can only dispatch
new tasks between data loads).
Anyways, the question is if people think this is generally useful.
If so I can clean up the preferred choice a bit and rebase it off
of master, etc.
I don't think these are bad ideas at all, and probably worth doing.
Note that there are some fairly hefty changes affecting this code in
master, so your rebasing could be tricky.
Is there any progress on this: doing parallel pg_restore from a pipe?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On Tue, Dec 3, 2013 at 12:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Apr 24, 2013 at 03:33:42PM -0400, Andrew Dunstan wrote:
On 04/23/2013 07:53 PM, Timothy Garnett wrote:
...
Attached is two diffs off of the REL9_2_4 tag that I've been
using. The first is a simple change that serially loads the data
section before handing off the remainder of the restore to the
existing parallelized restore code (the .ALT. diff). The second
which gets more parallelization but is a bit more of a change uses
the existing dependency analysis code to allow index building etc.
to occur in parallel with data loading. The data loading tasks are
still performed serially in the main thread, but non-data loading
tasks are scheduled in parallel as their dependencies are
satisfied (with the caveat that the main thread can only dispatch
new tasks between data loads)....
I don't think these are bad ideas at all, and probably worth doing.
Note that there are some fairly hefty changes affecting this code in
master, so your rebasing could be tricky.
Is there any progress on this: doing parallel pg_restore from a pipe?
We're on 9.2 and make extensive use of the patch in the original post. I
will probably forward-port it to 9.3 when we migrate to that (probably
sometime in Q1), since we pretty much require it in our setup.
Tim